Processor Architectures and Compiler Principles

undefined
C
o
m
p
i
l
e
r
 
p
r
i
n
c
i
p
l
e
s
Processor architectures
Jakub Yaghob
 
P
r
o
c
e
s
s
o
r
 
a
r
c
h
i
t
e
c
t
u
r
e
s
Processor architecture is a target language
For compilers targeting a processor code
It influences the compiler backend
, 
namely code
generator
It can affect an intermediate code form and
instructions, used high-level optimizations
 
R
e
g
i
s
t
e
r
s
The fastest memory
Precious resource
x86 
has 
7 
of 
32-
bits integer registers
x86_64 has 15 of 64-bit integer registers
IA-64 
has 
128 
of 
64-
bits integer registers
Different register types for different data types
Integer
, 
floating point
, 
address
, 
vector
Different access modes
Direct
Stack 
(
x87 
FPU)
 
I
n
s
t
r
u
c
t
i
o
n
 
s
e
t
RISC
Small number of simple instructions
e.g. no division
CISC
Complex instructions
, 
wide repertoire
Load-Execute-Store
Register-only operands for arithmetics
Orthogonality
x86 
has non-orthogonal instruction set
 
P
i
p
e
l
i
n
i
n
g
Next instruction fetch and decode started
before the previous instruction was finished
Each stage and each instruction has a
latency
Problem with operand dependency
 (RAW)
 
S
u
p
e
r
s
c
a
l
a
r
 
p
r
o
c
e
s
s
o
r
More equal units
capable of parallel
instruction execution
 
O
u
t
-
o
f
-
O
r
d
e
r
 
E
x
e
c
u
t
i
o
n
Instruction fetch
Instruction dispatch to an instruction queue
The instruction waits until its input operands
are available
The instruction is issued to its execution unit
The results are queued
When all older instructions written their results
to the register file, the instruction writes its
result
 
O
u
t
-
o
f
-
O
r
d
e
r
 
E
x
e
c
u
t
i
o
n
 
 
A
M
D
 
O
u
t
-
o
f
-
O
r
d
e
r
 
E
x
e
c
u
t
i
o
n
 
 
A
R
M
C
o
r
t
e
x
-
A
7
2
 
B
r
a
n
c
h
 
p
r
e
d
i
c
t
i
o
n
Deep pipelines have a problem with not taken conditional jumps
Dynamic branch prediction
BTB
CPU 
uses history
 (
patterns of some length
) 
for branch prediction
Static prediction used when no history is available
Static branch prediction
No hint
Forward jump is not taken, backward jump is taken
Hint
A compiler computes branch probability and generates appropriate
branch hint
Delay slot
 
S
p
e
c
u
l
a
t
i
v
e
 
e
x
e
c
u
t
i
o
n
Execution of a code in advance which is not used later
Significant disproportion between CPU and memory
speed
Usually used for read operations in advance
CPU 
postpones write operations
Memory barriers
Code speculation
Check for successful execution
Data speculation
Moved forward even with possible pointer-aliasing
 
S
I
M
D
 
i
n
s
t
r
u
c
t
i
o
n
s
Sometimes called multimedia instructions
Integer and float data types
Expression vectorization already in
intermediate code or later in code generation
Considerable (constant) execution speedup
Sometimes difficult detection of vectorization
opportunity
 
V
L
I
W
Instruction encoding
Templates in
 IA-64
ILP (
Instruction Level Parallelism
)
Concurrency encoded directly in instructions
Dependencies in concurrency group
RAW, WAW
 disallowed
WAR
 allowed
 
S
o
C
 
(
S
y
s
t
e
m
 
o
n
 
C
h
i
p
)
 
+
m
i
c
r
o
c
o
n
t
r
o
l
l
e
r
s
Small memory space
Optimization for size of code and data
One-bit variables
Separated memory spaces
Code and several data address spaces
Different access
Simple pipeline, no superscalarity
 
C
o
d
e
 
g
e
n
e
r
a
t
i
o
n
Memory allocation
Instruction selection
Register allocation
Instruction scheduling
Output
Output file format
Objects
Code, data, relocations, external and public symbols,
debug information
 
M
e
m
o
r
y
 
a
l
l
o
c
a
t
i
o
n
Code
Jump resolving
Static data
Global data
Stack
Local variables and function parameters
Size of data types
Memory placement
Data alignment
Decide, what is in memory and what gets an
allocated register
 
I
n
s
t
r
u
c
t
i
o
n
 
s
e
l
e
c
t
i
o
n
1:N
One intermediate code instruction corresponds with N target
instructions
Simple, the generated code is considerably suboptimal
M:N
M 
intermediate code instructions correspond with 
N 
target
instructions 
NP-
complete problem of selection
Heuristics
More possibilities of code generation
CISC, SIMD, VLIW
Non-orthogonal instruction set problem
 
R
e
g
i
s
t
e
r
 
a
l
l
o
c
a
t
i
o
n
What is placed to the memory and what is placed
into a register of a required register type
Non-orthogonal instruction set problem
Coloring of a graph of variable liveness
The graph is constructed only for a selected variable type
(e.g. integer variables)
The number of colors is equal to the number of available
registers of the selected type
When there is no graph coloring, remove “less important”
graph nodes and try again
We don’t need to find an optimal graph coloring, we are
looking for any graph coloring
 
I
n
s
t
r
u
c
t
i
o
n
 
s
c
h
e
d
u
l
i
n
g
Rearrange instructions keeping semantic and
avoiding pipeline stalls
RAW, WAW, WAR
 dependencies
Construct graph of data dependency
Oriented graph, where an edge represents instruction
ordering for data availability
Find any topological order
Instruction latency, speculations, superscalarity, out-of-
order
 
execution
Usually performed upon one basic block
Slide Note
Embed
Share

In the realm of processor architectures and compiler principles lies a crucial interplay that influences code generation and optimizations. The architecture of a processor directly impacts the backend operations of a compiler, guiding the generation of intermediate code and instructions. Explore the significance of registers, instruction sets, pipelining, superscalar processing, and out-of-order execution in optimizing performance and efficiency.

  • Processor Architectures
  • Compiler Principles
  • Registers
  • Instruction Sets
  • Pipelining

Uploaded on Feb 28, 2025 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. Compiler principles Processor architectures Jakub Yaghob

  2. Processor architectures Processor architecture is a target language For compilers targeting a processor code It influences the compiler backend, namely code generator It can affect an intermediate code form and instructions, used high-level optimizations

  3. Registers The fastest memory Precious resource x86 has 7 of 32-bits integer registers x86_64 has 15 of 64-bit integer registers IA-64 has 128 of 64-bits integer registers Different register types for different data types Integer, floating point, address, vector Different access modes Direct Stack (x87 FPU)

  4. Instruction set RISC Small number of simple instructions e.g. no division CISC Complex instructions, wide repertoire Load-Execute-Store Register-only operands for arithmetics Orthogonality x86 has non-orthogonal instruction set

  5. Pipelining Next instruction fetch and decode started before the previous instruction was finished Each stage and each instruction has a latency Problem with operand dependency (RAW)

  6. Superscalar processor More equal units capable of parallel instruction execution

  7. Out-of-Order Execution Instruction fetch Instruction dispatch to an instruction queue The instruction waits until its input operands are available The instruction is issued to its execution unit The results are queued When all older instructions written their results to the register file, the instruction writes its result

  8. Out-of-Order Execution AMD

  9. Out-of-Order Execution ARM Cortex-A72

  10. Branch prediction Deep pipelines have a problem with not taken conditional jumps Dynamic branch prediction BTB CPU uses history (patterns of some length) for branch prediction Static prediction used when no history is available Static branch prediction No hint Forward jump is not taken, backward jump is taken Hint A compiler computes branch probability and generates appropriate branch hint Delay slot

  11. Speculative execution Execution of a code in advance which is not used later Significant disproportion between CPU and memory speed Usually used for read operations in advance CPU postpones write operations Memory barriers Code speculation Check for successful execution Data speculation Moved forward even with possible pointer-aliasing

  12. SIMD instructions Sometimes called multimedia instructions Integer and float data types Expression vectorization already in intermediate code or later in code generation Considerable (constant) execution speedup Sometimes difficult detection of vectorization opportunity

  13. VLIW Instruction encoding Templates in IA-64 ILP (Instruction Level Parallelism) Concurrency encoded directly in instructions Dependencies in concurrency group RAW, WAW disallowed WAR allowed

  14. SoC (System on Chip) + microcontrollers Small memory space Optimization for size of code and data One-bit variables Separated memory spaces Code and several data address spaces Different access Simple pipeline, no superscalarity

  15. Code generation Memory allocation Instruction selection Register allocation Instruction scheduling Output Output file format Objects Code, data, relocations, external and public symbols, debug information

  16. Memory allocation Code Jump resolving Static data Global data Stack Local variables and function parameters Size of data types Memory placement Data alignment Decide, what is in memory and what gets an allocated register

  17. Instruction selection 1:N One intermediate code instruction corresponds with N target instructions Simple, the generated code is considerably suboptimal M:N M intermediate code instructions correspond with N target instructions NP-complete problem of selection Heuristics More possibilities of code generation CISC, SIMD, VLIW Non-orthogonal instruction set problem

  18. Register allocation What is placed to the memory and what is placed into a register of a required register type Non-orthogonal instruction set problem Coloring of a graph of variable liveness The graph is constructed only for a selected variable type (e.g. integer variables) The number of colors is equal to the number of available registers of the selected type When there is no graph coloring, remove less important graph nodes and try again We don t need to find an optimal graph coloring, we are looking for any graph coloring

  19. Instruction scheduling Rearrange instructions keeping semantic and avoiding pipeline stalls RAW, WAW, WAR dependencies Construct graph of data dependency Oriented graph, where an edge represents instruction ordering for data availability Find any topological order Instruction latency, speculations, superscalarity, out-of- order execution Usually performed upon one basic block

More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#