Course Computer Systems for AI-programmers

"Computersystemen voor AI-programmeurs"

Week 40, 2013

Description

Although modern compilers can use many tricks to generate efficient code, much can be done by a programmer to assist the compiler in this task. In this class we show that optimization blockers, such as memory aliasing and procedure calls, seriously restrict the ability of compilers to perform extensive optimizations. We demonstrate the effect of a number of techniques, including loop unrolling and iteration splitting. We show that this requires knowledge about the architecture of modern processors, including the number and type of functional units.

In this class the following concepts are introduced:

  • pipelining, stalling, bubbles
  • machine independent optimizations, reducing procedure calls, reducing memory references
  • machine dependent optimizations, loop unrolling, instruction parallelism, iteration splitting
  • profiling, van Amdahl's law

Literature

The class is based on chapter of the book Computer Systems: A programmer's perspective by R.E. Bryant and D.R. O'Hallaron.

Recommanded reading (62 pages, 3 hours):

5.1 Capabilities and Limitations of Optimizing Compilers
5.2 Expressing Program Performance
5.3 Program Example
5.4 Eliminating Loop Inefficiencies
5.5 Reducing Procedure Calls
5.6 Eliminating Unneeded Memory References
5.7 Understanding Modern Processors
5.8Loop Unrolling
5.9Enhancing Parallism
5.10Summery of Results for Optimizing Code
5.11Some Limiting Factors
5.13Life in the Real World: Performance Improvement Techniques
5.14Identifying and Eliminating Performance Bottlenecks
5.15Summary


Schedule

The class is scheduled in three hours:

  1. Lecture (Recording): 'Computer system - Optimizing program performance (machine independent)'
    Practice Problem 5.1: 'What effect has the call swap(&xp, &xp)?'
    Practice Problem 5.3: 'Indicate the number of function calls in 3 code-fragments'
  2. Lecture (Recording) : 'Computer system - Optimizing program performance (machine dependent)'
    Practice Problem 5.8: 'Different associations of aprod'
  3. Exercise: 'Optimize the memory usage of the procedure transpose'
    Description, code, Background article about a technique to reduce cache misses .

Last updated 29 September 2014

o This web-page and the list of participants to this course is maintained by Arnoud Visser (arnoud@science.uva.nl)
Faculty of Science
University of Amsterdam

visitors in arnoud@science.uva.nl