Modern CPU Core Architecture: Superscalar, Branch Prediction, Out-of-Order Execution

 
A Parallelization Framework
for Recursive Tree Programs
by Paul Feautrier
 
Kirshanthan Sundararajah
School of Electrical and Computer Engineering
 
1
 
Regular Vs Irregular Programs
 
Arrays Vs Pointers
Iteration space, data space, execution order,
dependence
Finding parallelism depends on our ability to answer
questions about the associated Z-polytopes.
Is it possible to construct such a framework for
irregular programs?
 
2
 
Polytope Model
 
Iteration Vector
Bounds
Scanning a Polytope
 
3
 
for(i=0; i < N; ++i)
    for(j=0; j < i; ++j)
        S;
 
Definition of Dependence
 
“Two operations are dependent if they access the
same memory cell, one at least of the two accesses
being a write.”
 
4
 
Contributions
 
Control Domain of Tree Programs
Locations and Function From Location to Values
Address Functions
 
5
 
Control Domain
 
6
 
Control Automaton
 
7
 
Address Function (Rational Transduction)
 
8
 
Toy Language Tau
 
No pointers are allowed. They are replaced by
addresses.
The only data structures are scalars and trees.
Trees are always global.
No function returns an address.
The only control structures are conditionals function
calls
 
9
 
Dependence Analysis in Tau
 
Parallelization Model:
Control Parallelism and Data Parallelism
 
10
 
Synthetic Dependence Graph
 
Consider a function 
foo
 and the statements {
S1, S2 .., Sn
}
of its body.
There is a dependence edge from 
Si
 to 
Sj
, 
i < j 
iff there
exists three iteration words u, v, w such that,
u is an iteration of 
foo
.
Both u.i.v and u.j.w are iterations of some terminal
statements 
Sk
 and 
Sl
.
u.i.v and u.j.w are in dependence.
 
11
 
Synthetic Dependence Graph
 
12
 
Parallelization: Topological Sorting of SDG
The Dependence Test (Algorithm D)
 
Parallelization of Sum
 
13
 
Conclusion and Future Work
 
Data Structures are restricted to trees
Operations on the addresses are limited to postfixing
The analysis is operation oriented, meaning that
addresses are associated to operations, not to
statements. This allows to get more precise results
when computing dependences.
 
14
 
Conclusion and Future Work
 
Trees of arrays extension for multigrid method
Extending to DAGs and Cyclic Graphs by prefix
operator.
 
15
 
A Parallelization Framework
for Recursive Tree Programs
by Paul Feautrier
 
Kirshanthan Sundararajah
School of Electrical and Computer Engineering
 
16
Slide Note
Embed
Share

This review delves into the advanced features of modern CPU cores including superscalar pipelines, branch prediction, out-of-order execution, and sophisticated memory hierarchies. Explore the implications of these architectural elements on software performance, focusing on concepts like temporal and spatial locality, dependence, and semantically equivalent programs. Gain insights into loop optimizations for achieving higher performance, with a spotlight on affine loops and array accesses.

  • CPU Core
  • Architecture Review
  • Modern Technology
  • Software Performance
  • Loop Optimizations

Uploaded on Feb 27, 2025 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. A Parallelization Framework for Recursive Tree Programs by Paul Feautrier Kirshanthan Sundararajah School of Electrical and Computer Engineering 1

  2. Regular Vs Irregular Programs Arrays Vs Pointers Iteration space, data space, execution order, dependence Finding parallelism depends on our ability to answer questions about the associated Z-polytopes. Is it possible to construct such a framework for irregular programs? 2

  3. Polytope Model Iteration Vector Bounds Scanning a Polytope for(i=0; i < N; ++i) for(j=0; j < i; ++j) S; 3

  4. Definition of Dependence Two operations are dependent if they access the same memory cell, one at least of the two accesses being a write. 4

  5. Contributions Control Domain of Tree Programs Locations and Function From Location to Values Address Functions 5

  6. Control Domain 6

  7. Control Automaton 7

  8. Address Function (Rational Transduction) 8

  9. Toy Language Tau No pointers are allowed. They are replaced by addresses. The only data structures are scalars and trees. Trees are always global. No function returns an address. The only control structures are conditionals function calls 9

  10. Dependence Analysis in Tau Parallelization Model: Control Parallelism and Data Parallelism 10

  11. Synthetic Dependence Graph Consider a function foo and the statements {S1, S2 .., Sn} of its body. There is a dependence edge from Si to Sj, i < j iff there exists three iteration words u, v, w such that, u is an iteration of foo. Both u.i.v and u.j.w are iterations of some terminal statements Sk and Sl. u.i.v and u.j.w are in dependence. 11

  12. Synthetic Dependence Graph Parallelization: Topological Sorting of SDG The Dependence Test (Algorithm D) 12

  13. Parallelization of Sum 13

  14. Conclusion and Future Work Data Structures are restricted to trees Operations on the addresses are limited to postfixing The analysis is operation oriented, meaning that addresses are associated to operations, not to statements. This allows to get more precise results when computing dependences. 14

  15. Conclusion and Future Work Trees of arrays extension for multigrid method Extending to DAGs and Cyclic Graphs by prefix operator. 15

  16. A Parallelization Framework for Recursive Tree Programs by Paul Feautrier Kirshanthan Sundararajah School of Electrical and Computer Engineering 16

More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#