ECE 454 Computer Systems Programming & Optimization

ECE 454
Computer Systems Programming
Compiler and Optimization (I)
Ding Yuan
ECE Dept., University of Toronto
http://www.eecg.toronto.edu/~yuan
Content
Compiler Basics
Understanding Compiler Optimization
Manual Optimization (Next lecture)
Advanced Optimizations (Next lecture)
Parallel Unrolling
Profile-Directed Feedback
Ding Yuan, ECE454
2
A Brief History of Compilation
Ding Yuan, ECE454
3
In the Beginning…
Programmer
1010010010
0101101010
1010010100
1010001010
 
Programmers wrote machine instructions
Ding Yuan, ECE454
4
Then Came the Assembler
Processor
 
Programmers wrote human-readable 
assembly
Assembler
Machine Instructions
Ding Yuan, ECE454
5
Then Came the Compiler
Processor
 
Programmers wrote high-level language (HLL)
Assembly
Compiler
Machine Instructions
Ding Yuan, ECE454
6
Overview: Compilers &
Optimizations
Ding Yuan, ECE454
7
Goals of a Compiler
 
Correct program executes correctly
Provide support for debugging incorrect programs
Program executes fast
Compilation is fast?
Small code size?
More efficient use of energy?
Ding Yuan, ECE454
8
Inside a Basic Compiler
Front End
High-level 
language
(C, C++, Java)
Low-level 
language
(IA64)
HLL
IR
Code
Generator
LLL
Intermediate
Representation
(similar to assembly)
CSC488 Compilers and Interpreters
Ding Yuan, ECE454
9
Inside an optimizing compiler
Front End
Optimizer
High-level 
language
(C, C++, Java)
Low-level 
language
(IA32)
HLL
IR
IR
(Improved)
Code
Generator
LLL
ECE540  Optimizing Compilers
Ding Yuan, ECE454
10
Control Flow Graph:
(how a compiler sees your program)
    add …
L1
: add …
    add …
    branch L2
    add …
L2
: add …
    branch L1
    return …
Example IR
:
Basic Blocks
:
Basic Block
: 
a group of consecutive
instructions with a single entry point
and a single exit point
Ding Yuan, ECE454
11
Performance Optimization:
3 Requirements
 
1)
Preserve correctness
the speed of an incorrect program is irrelevant
2)
On average improve performance
Optimized may be worse than original if unlucky
3)
Be “worth the effort”
Is this example worth it?
1 person-year of work to implement compiler optimization
2x increase in compilation time
0.1% improvement in speed
Ding Yuan, ECE454
12
How do optimizations improve performance?
Execution_time = num_instructions * CPI * 
time/cycle
Fewer cycles per instruction
E.g.: Schedule instructions to avoid hazards
E.g.: Improve cache/memory behavior
Eg., prefetching, locality
Fewer instructions
E.g.: Target special/new instructions
Ding Yuan, ECE454
13
Role of Optimizing Compilers 
 
Provide efficient mapping of program to machine
eliminating minor inefficiencies
code selection and ordering
register allocation
Don’t (usually) improve asymptotic efficiency
up to programmer to select best overall algorithm
big-O savings are (often) more important than constant factors
but constant factors also matter
Ding Yuan, ECE454
14
Limitations of Optimizing Compilers
Operate Under Fundamental Constraints
Must not cause any change in program behavior under any
possible condition
Most analysis is performed only within procedures
inter-procedural analysis is too expensive in most cases
Most analysis is based only on 
static
 information
compiler has difficulty anticipating run-time inputs
When in doubt, the compiler must be conservative
Ding Yuan, ECE454
15
Role of the Programmer
How should I write my programs, given that I have a good, optimizing compiler?
Don’t: Smash Code into Oblivion
Hard to read, maintain, & assure correctness
Do:
Select best algorithm
Write code that’s readable & maintainable
Procedures, recursion
Even though these factors can slow down code
Eliminate optimization blockers
Allows compiler to do its job
Focus on Inner Loops
Do detailed optimizations where code will be executed repeatedly
Will get most performance gain here
Ding Yuan, ECE454
16
Optimization Basics
Ding Yuan, ECE454
17
Compiler Optimizations
Machine independent (apply equally well to most CPUs)
Constant propagation
Constant folding
Common Subexpression Elimination
Dead Code Elimination
Loop Invariant Code Motion
Function Inlining
Machine dependent (apply differently to different CPUs)
Instruction Scheduling
Loop unrolling
Parallel unrolling
Could do these manually, better if compiler does them
Many optimizations make code less readable/maintainable
Ding Yuan, ECE454
18
Constant Propagation (CP)
a = 5;
b = 3;
   :
   :
n = a + b;
for
 (i = 0 ; i < n ; ++i) {
      :
}
Replace variables with constants when possible
Ding Yuan, ECE454
19
Constant Folding (CF)
Evaluate expressions containing constants
Can lead to further optimization
E.g., another round of constant propagation
   :
   :
   :
   :
n = 5 + 3;
for
 (i = 0 ; i < n ; ++i) {
      :
}
Ding Yuan, ECE454
20
Common  Sub-expression Elimination (CSE)
a = c * d;
      :
      :
d = (c * d + t) * u
Try to only compute a given expression once
 
(assuming the variables have not been modified)
 
a = c * d;
      :
d = (a + t) * u
Ding Yuan, ECE454
21
Dead Code Elimination (DCE)
debug = 0;  
// set to False
    :
if
 (debug) {
    :
    :
}
a = f(b);
 Compiler can determine if certain code will never execute:
 
Compiler will remove that code
 You don’t have to worry about such code impacting performance
i.e., you are more free to have readable/debugable programs
!
 
debug = 0;
     :
     :
a = f(b);
Ding Yuan, ECE454
22
Loop Invariant Code Motion (LICM)
for
 (i=0; i < 100 ;  ++i) {
  
for
 (j=0; j < 100 ; ++j) {
    
for
 (k=0 ; k < 100 ; ++k) { 
      a[i][j][k] = i*j*k;
    }
  }
}
 
C: multi-dimensional array is stored in 
row-major order
a[I][J][K]
Ding Yuan, ECE454
23
 
a[i][j][k] = addr_of_a + i x J x K + j x K + k
Loop Invariant Code Motion (LICM)
for
 (i=0; i < 100 ;  ++i) {
  
for
 (j=0; j < 100 ; ++j) {
    
for
 (k=0 ; k < 100 ; ++k) { 
      a[i][j][k] = i*j*k;
    }
  }
}
 
Loop invariant: value does not change across iterations
LICM: move invariant code out of the loop
Big performance wins:
Inner loop will execute 1,000,000 times
Moving code out of inner loop results in big savings
 
for
 (i = 0; i < 100 ; ++i) {
  
t1 = a[i]; // t1=a+ixJxK
  
for
 (j = 0; j < 100 ; ++j) {
    
tmp = i * j;
    t2 = t1[j]; // t2=t1+jxK
    
for
 (k = 0 ; k < 100 ; ++k) {
      
t2[k] = tmp * k;
    }
  }
}
Ding Yuan, ECE454
24
Function Inlining
main(){
  
x =
 foo(
x
);
}
foo(int z){
  int m = 5;
  return z + m;
}
Code size
 can decrease if small procedure body and few calls
 can increase if big procedure body and many calls
Performance
 eliminates call/return overhead
 can expose potential optimizations
 can be hard on instruction-cache if many copies made
As a Programmer:
 a good compiler should inline for best performance
 feel free to use procedure calls to make your code readable!
Ding Yuan, ECE454
25
Loop Unrolling
reduces loop overhead
Fewer adds to update j
Fewer loop condition tests
enables more aggressive instruction scheduling
more instructions for scheduler to move around
j = 0;
while
 (j < 100){
   a[j] = b[j+1];
   j += 1;
}
j = 0;
while
 (j < 99){
   a[j] = b[j+1];
   a[j+1] = b[j+2];
   j += 2;
}
Ding Yuan, ECE454
26
Summary: gcc Optimization Levels
-g:
Include debug information, no optimization
-O0:
Default, no optimization
-O1:
Do optimizations that don’t take too long
CP, CF, CSE, DCE, LICM, inline functions called once
-O2:
Take longer optimizing, more aggressive scheduling (e.g., inline small functions)
-O3:
Make space/speed trade-offs: loop unrolling, more inlining
-Os:
Optimize program size
 
Ding Yuan, ECE454
27
Slide Note
Embed
Share

Explore the world of computer systems programming and optimization with ECE 454 at the University of Toronto. Dive into compiler basics, manual optimization, advanced techniques, and more through a comprehensive overview of compiler history. From programmer-machine instructions to high-level languages, discover the evolution of programming tools and techniques. Uncover the goals of a compiler, the inner workings of a basic compiler, and insights into optimizing compilers. Delve into topics like low-level and high-level languages, code generation, intermediate representation, and the process of compiler optimization.

  • Computer Systems Programming
  • Optimization
  • Compiler Basics
  • University of Toronto
  • ECE 454

Uploaded on Feb 17, 2025 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. ECE 454 Computer Systems Programming Compiler and Optimization (I) Ding Yuan ECE Dept., University of Toronto http://www.eecg.toronto.edu/~yuan

  2. Content Compiler Basics Understanding Compiler Optimization Manual Optimization (Next lecture) Advanced Optimizations (Next lecture) Parallel Unrolling Profile-Directed Feedback 2 Ding Yuan, ECE454

  3. A Brief History of Compilation 3 Ding Yuan, ECE454

  4. In the Beginning Programmer Processor 1010010010 0101101010 1010010100 1010001010 Programmers wrote machine instructions 4 Ding Yuan, ECE454

  5. Then Came the Assembler Programmer Machine Instructions Processor Add r3,r3,r1 Cmp r3,r1 Bge 0x3340a Mulu r3,r5,r2 Sub r1,r3,r4 1010010010 0101101010 1010010100 1010001010 Assembler Programmers wrote human-readable assembly 5 Ding Yuan, ECE454

  6. Then Came the Compiler Programmer Assembly Machine Instructions int Foo (int x){ return x + 5; } Add r3,r3,r1 Cmp r3,r1 Bge 0x3340a Mulu r3,r5,r2 Sub r1,r3,r4 1010010010 0101101010 1010010100 1010001010 Compiler Processor Programmers wrote high-level language (HLL) 6 Ding Yuan, ECE454

  7. Overview: Compilers & Optimizations 7 Ding Yuan, ECE454

  8. Goals of a Compiler Correct program executes correctly Provide support for debugging incorrect programs Program executes fast Compilation is fast? Small code size? More efficient use of energy? 8 Ding Yuan, ECE454

  9. Inside a Basic Compiler CSC488 Compilers and Interpreters Low-level language High-level language Code Generator Front End (IA64) (C, C++, Java) IR LLL HLL Intermediate Representation (similar to assembly) 9 Ding Yuan, ECE454

  10. Inside an optimizing compiler ECE540 Optimizing Compilers Low-level language High-level language Code Generator Front End Optimizer (IA32) (C, C++, Java) HLL IR LLL IR (Improved) 10 Ding Yuan, ECE454

  11. Control Flow Graph: (how a compiler sees your program) Example IR: Basic Blocks: add add L1: add add branch L2 add L2: add branch L1 return L1: add add branch L2 add L2: add branch L1 Basic Block: a group of consecutive instructions with a single entry point and a single exit point return 11 Ding Yuan, ECE454

  12. Performance Optimization: 3 Requirements 1) Preserve correctness the speed of an incorrect program is irrelevant 2) On average improve performance Optimized may be worse than original if unlucky 3) Be worth the effort Is this example worth it? 1 person-year of work to implement compiler optimization 2x increase in compilation time 0.1% improvement in speed 12 Ding Yuan, ECE454

  13. How do optimizations improve performance? Execution_time = num_instructions * CPI * time/cycle Fewer cycles per instruction E.g.: Schedule instructions to avoid hazards E.g.: Improve cache/memory behavior Eg., prefetching, locality Fewer instructions E.g.: Target special/new instructions 13 Ding Yuan, ECE454

  14. Role of Optimizing Compilers Provide efficient mapping of program to machine eliminating minor inefficiencies code selection and ordering register allocation Don t (usually) improve asymptotic efficiency up to programmer to select best overall algorithm big-O savings are (often) more important than constant factors but constant factors also matter 14 Ding Yuan, ECE454

  15. Limitations of Optimizing Compilers Operate Under Fundamental Constraints Must not cause any change in program behavior under any possible condition Most analysis is performed only within procedures inter-procedural analysis is too expensive in most cases Most analysis is based only on static information compiler has difficulty anticipating run-time inputs When in doubt, the compiler must be conservative 15 Ding Yuan, ECE454

  16. Role of the Programmer How should I write my programs, given that I have a good, optimizing compiler? Don t: Smash Code into Oblivion Hard to read, maintain, & assure correctness Do: Select best algorithm Write code that s readable & maintainable Procedures, recursion Even though these factors can slow down code Eliminate optimization blockers Allows compiler to do its job Focus on Inner Loops Do detailed optimizations where code will be executed repeatedly Will get most performance gain here 16 Ding Yuan, ECE454

  17. Optimization Basics 17 Ding Yuan, ECE454

  18. Compiler Optimizations Machine independent (apply equally well to most CPUs) Constant propagation Constant folding Common Subexpression Elimination Dead Code Elimination Loop Invariant Code Motion Function Inlining Machine dependent (apply differently to different CPUs) Instruction Scheduling Loop unrolling Parallel unrolling Could do these manually, better if compiler does them Many optimizations make code less readable/maintainable 18 Ding Yuan, ECE454

  19. Constant Propagation (CP) a = 5; b = 3; : : n = a + b; for (i = 0 ; i < n ; ++i) { : } n = 5 + 3 Replace variables with constants when possible 19 Ding Yuan, ECE454

  20. Constant Folding (CF) : : : : n = 5 + 3; for (i = 0 ; i < n ; ++i) { : } n = 8 8 Evaluate expressions containing constants Can lead to further optimization E.g., another round of constant propagation 20 Ding Yuan, ECE454

  21. Common Sub-expression Elimination (CSE) a = c * d; : : d = (c * d + t) * u a = c * d; : d = (a + t) * u Try to only compute a given expression once (assuming the variables have not been modified) 21 Ding Yuan, ECE454

  22. Dead Code Elimination (DCE) debug = 0; // set to False : if (debug) { : : } a = f(b); debug = 0; : : a = f(b); Compiler can determine if certain code will never execute: Compiler will remove that code You don t have to worry about such code impacting performance i.e., you are more free to have readable/debugable programs! 22 Ding Yuan, ECE454

  23. Loop Invariant Code Motion (LICM) for (i=0; i < 100 ; ++i) { for (j=0; j < 100 ; ++j) { for (k=0 ; k < 100 ; ++k) { a[i][j][k] = i*j*k; } } } C: multi-dimensional array is stored in row-major order a[I][J][K] a[0][0][0] a[0][0][1] a[0][0][K-1] a[0][1][0] a[I-1][J-1][0] .. a[I-1][J-1][K-1] a[i][j][k] = addr_of_a + i x J x K + j x K + k 23 Ding Yuan, ECE454

  24. Loop Invariant Code Motion (LICM) for (i = 0; i < 100 ; ++i) { t1 = a[i]; // t1=a+ixJxK for (j = 0; j < 100 ; ++j) { tmp = i * j; t2 = t1[j]; // t2=t1+jxK for (k = 0 ; k < 100 ; ++k) { t2[k] = tmp * k; } } } for (i=0; i < 100 ; ++i) { for (j=0; j < 100 ; ++j) { for (k=0 ; k < 100 ; ++k) { a[i][j][k] = i*j*k; } } } Loop invariant: value does not change across iterations LICM: move invariant code out of the loop Big performance wins: Inner loop will execute 1,000,000 times Moving code out of inner loop results in big savings 24 Ding Yuan, ECE454

  25. Function Inlining main(){ x = x + 5; } main(){ { int foo_z = x; int m = 5; int foo_return = foo_z + m; x = foo_return; } } foo(int z){ int m = 5; return z + m; } main(){ x = foo(x); } Code size can decrease if small procedure body and few calls can increase if big procedure body and many calls Performance eliminates call/return overhead can expose potential optimizations can be hard on instruction-cache if many copies made As a Programmer: a good compiler should inline for best performance feel free to use procedure calls to make your code readable! 25 Ding Yuan, ECE454

  26. Loop Unrolling j = 0; while (j < 99){ a[j] = b[j+1]; a[j+1] = b[j+2]; j += 2; } j = 0; while (j < 100){ a[j] = b[j+1]; j += 1; } reduces loop overhead Fewer adds to update j Fewer loop condition tests enables more aggressive instruction scheduling more instructions for scheduler to move around 26 Ding Yuan, ECE454

  27. Summary: gcc Optimization Levels -g: Include debug information, no optimization -O0: Default, no optimization -O1: Do optimizations that don t take too long CP, CF, CSE, DCE, LICM, inline functions called once -O2: Take longer optimizing, more aggressive scheduling (e.g., inline small functions) -O3: Make space/speed trade-offs: loop unrolling, more inlining -Os: Optimize program size 27 Ding Yuan, ECE454

Related


More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#