Artificial Intelligence Programme - Computer Systems for AI Programmers

Course Computer Systems for AI-programmers

"Computersystemen voor AI-programmeurs"

Week 40, 2013

Description

Although modern compilers can use many tricks to generate efficient code, much can be done by a programmer to assist the compiler in this task. In this class we show that optimization blockers, such as memory aliasing and procedure calls, seriously restrict the ability of compilers to perform extensive optimizations. We demonstrate the effect of a number of techniques, including loop unrolling and iteration splitting. We show that this requires knowledge about the architecture of modern processors, including the number and type of functional units.
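Why memory aliasing blocks optimization can be illustrated with a small sketch. The function names add_twice_v1 and add_twice_v2 are hypothetical, chosen only for this illustration. The second version looks like an obvious improvement on the first (fewer memory references), but a compiler may not make that rewrite on its own, because the two versions give different results when both pointers refer to the same location.

```c
/* Adds *yp to *xp twice; *yp is read from memory before each addition. */
void add_twice_v1(long *xp, const long *yp) {
    *xp += *yp;
    *xp += *yp;
}

/* Hand-optimized variant: a single read of *yp and a single update of *xp.
 * Equivalent to add_twice_v1 only when xp and yp do not alias. */
void add_twice_v2(long *xp, const long *yp) {
    *xp += 2 * *yp;
}
```

With distinct variables x = 5 and y = 3, both versions leave x equal to 11. With a = 5 and the aliased call add_twice_v1(&a, &a), a becomes 20, whereas add_twice_v2(&a, &a) yields 15. Because the compiler must assume such aliasing is possible, it has to keep the extra memory references unless the programmer rewrites the code.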

In this class the following concepts are introduced:

pipelining, stalling, bubbles
machine independent optimizations, reducing procedure calls, reducing memory references
machine dependent optimizations, loop unrolling, instruction parallelism, iteration splitting
profiling, Amdahl's law
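As a minimal sketch of the machine-dependent techniques listed above, the following combines loop unrolling with two independent accumulators to expose instruction-level parallelism. The function names product_simple and product_unrolled2 are assumptions made for this illustration and are not taken from the course material.

```c
#include <stddef.h>

/* Baseline: every multiplication depends on the result of the previous one. */
double product_simple(const double *v, size_t n) {
    double acc = 1.0;
    for (size_t i = 0; i < n; i++)
        acc *= v[i];
    return acc;
}

/* Unrolled by a factor of two with two accumulators: the two multiplications
 * in the loop body are independent of each other, so a processor with a
 * pipelined multiplier can overlap them. */
double product_unrolled2(const double *v, size_t n) {
    double acc0 = 1.0, acc1 = 1.0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        acc0 *= v[i];      /* even-indexed elements */
        acc1 *= v[i + 1];  /* odd-indexed elements */
    }
    for (; i < n; i++)     /* leftover element when n is odd */
        acc0 *= v[i];
    return acc0 * acc1;
}
```

The unrolled version changes the order in which floating-point values are combined, so its result can differ slightly in rounding from the baseline. That is one reason a compiler will typically not apply this transformation to floating-point code unless explicitly allowed to.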

Literature

The class is based on chapter 5 of the book Computer Systems: A Programmer's Perspective by R.E. Bryant and D.R. O'Hallaron.

Recommended reading (62 pages, 3 hours):

5.1 Capabilities and Limitations of Optimizing Compilers

5.2 Expressing Program Performance

5.3 Program Example

5.4 Eliminating Loop Inefficiencies

5.5 Reducing Procedure Calls

5.6 Eliminating Unneeded Memory References

5.7 Understanding Modern Processors

5.8 Loop Unrolling

5.9 Enhancing Parallelism

5.10 Summary of Results for Optimizing Code

5.11 Some Limiting Factors

5.13 Life in the Real World: Performance Improvement Techniques

5.14 Identifying and Eliminating Performance Bottlenecks

5.15 Summary
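Amdahl's law, one of the concepts listed for this class, bounds the overall speedup obtained when only part of a program is improved. A minimal sketch, assuming a fraction alpha of the original run time is sped up by a factor k (the function and parameter names are illustrative assumptions):

```c
/* Overall speedup when a fraction alpha (0..1) of the original run time
 * is made k times faster and the remaining (1 - alpha) is unchanged:
 *     S = 1 / ((1 - alpha) + alpha / k)                                  */
double amdahl_speedup(double alpha, double k) {
    return 1.0 / ((1.0 - alpha) + alpha / k);
}
```

For example, speeding up 60% of the run time by a factor of 3 gives an overall speedup of 1 / (0.4 + 0.2), about 1.67, which is why profiling to find the dominant part of the run time should come before optimizing.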

Schedule

The class is scheduled over three hours:

Lecture (Recording): 'Computer system - Optimizing program performance (machine independent)'
Practice Problem 5.1: 'What effect does the call swap(&xp, &xp) have?'
Practice Problem 5.3: 'Indicate the number of function calls in three code fragments'
Lecture (Recording): 'Computer system - Optimizing program performance (machine dependent)'
Practice Problem 5.8: 'Different associations of aprod'
Exercise: 'Optimize the memory usage of the procedure transpose'
Description, code, background article about a technique to reduce cache misses.
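One commonly used technique for reducing cache misses in a matrix transpose is loop blocking (tiling). Whether this is the technique described in the background article is an assumption; the sketch below is not the exercise's reference solution. The function names, the row-major n x n layout, and the tile size BLOCK are all illustrative choices.

```c
#include <stddef.h>

enum { BLOCK = 16 };  /* assumed tile size; should be tuned to the cache */

/* Straightforward transpose: dst[j][i] = src[i][j] for an n x n row-major
 * matrix. The writes to dst stride through memory by n elements. */
void transpose_naive(double *dst, const double *src, size_t n) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            dst[j * n + i] = src[i * n + j];
}

/* Blocked transpose: the matrix is processed in BLOCK x BLOCK tiles so that
 * the rows of src and dst touched by one tile can stay in the cache. */
void transpose_blocked(double *dst, const double *src, size_t n) {
    for (size_t ii = 0; ii < n; ii += BLOCK)
        for (size_t jj = 0; jj < n; jj += BLOCK) {
            /* Clamp the tile at the matrix edge when n is not a multiple of BLOCK. */
            size_t i_end = ii + BLOCK < n ? ii + BLOCK : n;
            size_t j_end = jj + BLOCK < n ? jj + BLOCK : n;
            for (size_t i = ii; i < i_end; i++)
                for (size_t j = jj; j < j_end; j++)
                    dst[j * n + i] = src[i * n + j];
        }
}
```

Both functions produce the same result; only the order of memory accesses differs. Any performance gain depends on the matrix size and the cache parameters of the machine, so it should be measured rather than assumed.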

Last updated 29 September 2014

This web page and the list of participants in this course are maintained by Arnoud Visser (arnoud@science.uva.nl)
Faculty of Science
University of Amsterdam