Course Computer Systems for AI-programmers
"Computersystemen voor AI-programmeurs"
Week 40, 2013
Description
Although modern compilers can use many tricks to generate efficient
code, much can be done by a programmer to assist the compiler in
this task. In this class we show that optimization blockers, such
as memory aliasing and procedure calls, seriously restrict
the ability of compilers to perform extensive optimizations.
We demonstrate the effect of a number of optimization techniques, including
loop unrolling and iteration splitting. We show that applying these
techniques effectively requires knowledge of the architecture of modern
processors, including the number and type of functional units.
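To make the aliasing blocker concrete, here is a minimal sketch in C. The function names add_twice and add_double are hypothetical and are not taken from the course material. The two functions look equivalent, but they differ when both pointers refer to the same location, so a compiler that cannot rule out aliasing may not rewrite the first into the second.

```c
/* Adds *yp to *xp twice. *yp must be read again after the first store,
   because that store may have changed *yp if xp and yp alias. */
void add_twice(unsigned long *xp, unsigned long *yp)
{
    *xp += *yp;
    *xp += *yp;
}

/* Reads *yp once. This matches add_twice only when xp != yp. */
void add_double(unsigned long *xp, unsigned long *yp)
{
    *xp += 2 * *yp;
}
```

When both functions are called with the same pointer to a variable holding 1, add_twice leaves 4 in it (1 + 1 = 2, then 2 + 2 = 4), whereas add_double leaves 3 (1 + 2 * 1). With distinct pointers both give the same result.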
In this class the following concepts are introduced:
- pipelining, stalling, bubbles
- machine-independent optimizations: reducing procedure calls, reducing memory references
- machine-dependent optimizations: loop unrolling, instruction parallelism, iteration splitting
- profiling, Amdahl's law
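As an illustration of the machine-dependent items in this list, the sketch below sums an array in two ways. The function names are hypothetical, and "iteration splitting" is interpreted here, as an assumption, as dividing the iterations over two independent accumulators. Whether the second version is actually faster depends on the processor's functional units and on the compiler, so it should be measured rather than assumed.

```c
#include <stddef.h>

/* Baseline: one element per iteration, a single accumulator.
   Each addition depends on the previous one. */
double sum_simple(const double *a, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += a[i];
    return acc;
}

/* Unrolled by two with two accumulators: the two additions in the
   loop body are independent, so they can overlap in the pipeline. */
double sum_unroll2x2(const double *a, size_t n)
{
    double acc0 = 0.0, acc1 = 0.0;
    size_t i = 0;

    for (; i + 1 < n; i += 2) {
        acc0 += a[i];       /* even-indexed elements */
        acc1 += a[i + 1];   /* odd-indexed elements */
    }
    for (; i < n; i++)      /* leftover element when n is odd */
        acc0 += a[i];

    return acc0 + acc1;
}
```

Because floating-point addition is not associative, the second version may round differently from the first. This is one reason a compiler will usually not apply this transformation to floating-point code on its own unless it is explicitly allowed to.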
Literature
The class is based on chapter 5 of the book
Computer Systems: A Programmer's Perspective by R.E. Bryant and D.R. O'Hallaron.
Recommended reading (62 pages, 3 hours):
5.1  | Capabilities and Limitations of Optimizing Compilers
5.2  | Expressing Program Performance
5.3  | Program Example
5.4  | Eliminating Loop Inefficiencies
5.5  | Reducing Procedure Calls
5.6  | Eliminating Unneeded Memory References
5.7  | Understanding Modern Processors
5.8  | Loop Unrolling
5.9  | Enhancing Parallelism
5.10 | Summary of Results for Optimizing Code
5.11 | Some Limiting Factors
5.13 | Life in the Real World: Performance Improvement Techniques
5.14 | Identifying and Eliminating Performance Bottlenecks
5.15 | Summary
Schedule
The class is divided into three hours:
- Lecture (Recording):
  'Computer system - Optimizing program performance (machine independent)'
  Practice Problem 5.1: 'What effect has the call swap(&xp, &xp)?'
  Practice Problem 5.3: 'Indicate the number of function calls in 3 code-fragments'
- Lecture (Recording):
  'Computer system - Optimizing program performance (machine dependent)'
  Practice Problem 5.8: 'Different associations of aprod'
- Exercise: 'Optimize the memory usage of the procedure transpose'
  Description, code, Background article about a technique to reduce cache misses.
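The transpose procedure used in the exercise is not reproduced on this page. The sketch below therefore assumes a square n x n matrix of int stored in row-major order, and assumes that the cache-miss technique in question is blocking (tiling). The names transpose_naive, transpose_blocked and the tile size BLOCK are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK 8  /* hypothetical tile size; should be tuned to the cache */

/* Straightforward version: src is read row by row, but dst is written
   with stride n, which tends to cause cache misses for large n. */
void transpose_naive(const int *src, int *dst, size_t n)
{
    assert(src != dst);  /* out-of-place only: src and dst must not alias */
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            dst[j * n + i] = src[i * n + j];
}

/* Blocked version: works on BLOCK x BLOCK tiles so that the rows of
   src and dst touched by one tile can stay in the cache together. */
void transpose_blocked(const int *src, int *dst, size_t n)
{
    assert(src != dst);
    for (size_t ii = 0; ii < n; ii += BLOCK)
        for (size_t jj = 0; jj < n; jj += BLOCK) {
            /* Clamp the tile at the matrix edge when n is not a multiple of BLOCK. */
            size_t i_end = ii + BLOCK < n ? ii + BLOCK : n;
            size_t j_end = jj + BLOCK < n ? jj + BLOCK : n;
            for (size_t i = ii; i < i_end; i++)
                for (size_t j = jj; j < j_end; j++)
                    dst[j * n + i] = src[i * n + j];
        }
}
```

Both functions produce the same result; only the order in which memory is visited differs. Whether the blocked version helps, and which tile size works best, depends on the matrix size and the cache of the machine, so it should be checked by profiling.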
Last updated 29 September 2014
This web page and the list of participants in this course are maintained by
Arnoud Visser (arnoud@science.uva.nl)
Faculty of Science
University of Amsterdam