Advanced Branch Prediction

undefined
CS5100 Advanced Computer Architecture
Advanced Branch Prediction
Prof. Chung-Ta King
Department of Computer Science
National Tsing Hua University, Taiwan
(Slides are from textbook, Prof. Hsien-Hsin Lee, Prof. Yasun Hsu, 
Prof. Onur Mutlu
)
1
1
About This Lecture
 
Goal:
To understand the techniques for reducing the cost of
branches
 
Outline:
Reducing branch cost with advanced branch prediction
(Sec. 3.3)
Prediction of branch direction: static, dynamic, branch
correlation
Prediction of branch target
 
Control Speculation with Branch Prediction
 
Modern processors have deep pipelines
Branch penalty limits performance of deep pipelines
Want to execute instructions beyond a branch even
before that branch is resolved 
 use
 
speculative
execution
Branch prediction: dynamic vs. static
 
What to predict?
2
What to Predict?
 
Direction (1-bit)
Single direction for unconditional jumps and calls/returns
Binary for conditional branches
Target (32-bit or 64-bit addresses)
Some are easy
One address: uni-directional jumps
Two: addresses: fall through (not taken) vs. taken
Many: function pointer or indirect jump (e.g. jr r31)
 
Ideally, one predictor for direction and one predictor
for target for each branch in the code
3
Static Branch Prediction for Direction
 
Uni-directional: always predict taken (or not taken)
Always-not-taken: easy (does not need  branch target
address), not effective for loops
Always-taken: branch target address needs to be
computed before the instruction flow can continue (may
take extra cycles)
Backward taken, forward not taken
Check sign of branch displacement: taken if negative, not-
taken if positive 
 no extra hardware needed
Good for, e.g., loops
Do not require HW support since the sign of target
displacement is already encoded in the branch instruction
4
Static Branch Prediction for Direction
Compiler hints with branch annotation
Run instrumented program with sample input data
Collect info on branch direction (profiling)
Use this profile info for prediction
Use a bit in branch instruction
Set to 1 if taken
Set to 0 if un-taken
Bits set by compiler or user
Once set, same behavior every time
5
Dynamic Branch Prediction for Direction
Predict branch based on past history of branch
One-bit 
Branch History Table 
(BHT)
6
PC
 
.
.
.
.
.
 
2
N 
entries
Prediction
 
N bits
FSM
Update
Logic
T
able update
Actual outcome
 
BHT: a cache of recent branches
Each entry stores last direction that
the indexed branch went (1 bit to
encode taken/not-taken)
No need to decode to know if it is a
branch, just look at instr. address
Hash
Problems with the Simple Predictor
Aliasing:
Two branches may be hashed to the same entry
 branch prediction history is polluted
Solution: make the table bigger, apply other cache
optimization strategies
Always mispredict twice for a loop, e.g.,
 
for (i=0; i<
4
; i++) {   …  }
7
0
 
Pred
 
Actual
 
T
 
T
 
 
 
T
 
T
 
 
NT
 
 
 
T
 
 
T
 
 
 
T
 
T
 
 
NT
 
 
T
 
2-bit Counter
2-bit saturating up/down counter predictor
8
Not Taken
Taken
Predict Not taken
Predict taken
ST: Strongly Taken
WT: Weakly Taken
WN: Weakly Not Taken
SN: Strongly Not Taken
Give inertial in responding external changes
For More Advanced Branch Prediction …
Hypothesis: recent branches are correlated; that is,
behavior of recently executed branches affects
prediction of current branch
Two possibilities: current branch depends on
Local 
behavior: Last 
m
 outcomes of the same branch (
local
branch predictor), e.g., a loop of 3 iterations is executed
repetitively 
 a history record of the loop branch of the
last 6 iterations should be able to predict the direction of
that branch correctly
Global 
behavior: Last 
m
 most recently executed branches
 
because branches are often 
correlated
!
9
BHT predicts this
Branches Are Correlated!
Branch direction of multiple branches
Not independent but correlated to the path taken
Example: path 1-1 of b3 can be known beforehand
10
if (aa==2)   // b1
   aa = 0;
if (bb==2)   // b2
   bb = 0;
if (aa!=bb) {// b3
       …… 
}
b1
b2
b2
b3
b3
b3
1 (T)
1
1
0 (NT)
0
b3
0
Path: A:1-1
B:1-0
C:0-1
D:0-0
 
How to capture global behavior?
Capturing Global Branch Correlation
 
Idea: associate branch outcomes with global T/NT
history of “all” branches
Make a prediction based on outcome of the branch the
last time the same 
global branch history 
was encountered
Implementation:
Keep track of the 
global T/NT history
 of all branches in a
register 
 
Global History Register 
(GHR)
Use GHR to index into a table that records the outcome
that was seen for each GHR value in the recent past 
Pattern History Table 
(table of 2-bit counters)
Global history/branch predictor
Uses two levels of history (GHR + history at that GHR)
11
Two Level Global Branch Prediction
1
st
 level: Global Branch History Register (N bits)
The direction of last N branches
2
nd
 level: Table of saturating counters for each history entry
12
. . . . .
00…..00
00…..01
00…..10
11…..11
11…..10
Branch History Pattern
Pattern History
Table (PHT)
Prediction 
Rc-k
Rc-1
Actual 
b
ranch 
o
utcome
FSM Update
Logic
Branch History Register (BHR)
(Shift left when update)
N
2
N 
entries
Current state 
PHT update
How Does the Global Predictor Work?
13
 
Branch b1 tests i & last 3 branches test j.
 
History: TTN
Predict taken for i 
 
Next history: TNT
  (shift in last outcome)
BHR
for(i=0; i<100; i++) {
   for(j=1; j<3; j++) {
      
 
...
   }  // b2
 
...
}  // b1
Outcome of b1 at i=7
b2 at i=7, j=1
b2 at i=7, j=2
b2 at i=7, j=3
b1 at i=8
Outcome of b2 at i=6, j=3
Differentiating Per Branch Behavior
Two different branches may have the same global
branch history but behave differently
14
Global 
BHR
Global PHT
GAg
Global 
BHR
..
Addr(B)
Per-addr PHTs
(PPHTs)
GAp
Capturing Local Correlation
 
But, we still want to capture the behavior of the
same branch
    
for(i=0; i<100; i++)
    for(j=0; j<3; j++)
 
{
 
 if (aa==2) aa = 0;
 
 if (bb==2) bb = 0;
 
 if (aa!=bb) {...}
    }
Idea: have a per-branch history
register
15
Hybrid Branch Predictor
Some branches correlated to global history, some
correlated to local history
Use more than one type of predictors and select “
best”
16
 
P0
 
 
 P1
 
 
.
.
.
Choice (or Meta) Predictor
Branch PC
Final Prediction
Tradeoff between Cost and Precision
 
Idea: add more context infor. to the 
global predictor
to take into account which branch is being predicted
(local predictor)
Gshare: GHR hashed with the Branch PC
+ Better utilization of PHT
-- Increases access latency
17
Outline
Prediction of branch direction:
Static
Dynamic
Branch correlation
Prediction of branch target
18
Prediction of Branch Targets
Need target address at same time as prediction
Branch Target Buffer 
(BTB): use PC to
 access I$ and
simultaneously
 look up BTB to get prediction AND branch
address (if taken)
19
=?
PC of instruction
Fetch
Yes: instruction is
branch and use
predicted PC as
next PC
No: branch not predicted,
proceed normally
Branch predicted
t
aken or untaken
How about Subroutine Returns?
 
Different call sites make return address hard to
predict
printf() may be called by many callers
Target of “return” instruction in printf() is a moving target
But return address is actually easy to predict
It is the address after the last call instruction that have not
returned from yet
Can use a 
Return Address Stack 
(RAS)
RAS:
Call will push return address on the stack
Return uses the prediction of top-of-stack
20
Return Address Stack
21
+
+
4
  
Call 
PC   
Push
Return
Address
BTB
 
May not know if it is a return instruction prior to decoding
Rely on BTB for speculation
Fix once recognize Return
Outline
Prediction of branch direction:
Static
Dynamic
Branch correlation
Prediction of branch target
Predicated execution
22
Predicated Execution
Idea: compiler converts control dependence into data
dependence 
 branch is eliminated
Each instr. has a predicate bit set based on the predicate computation
Only instr. with TRUE predicates are committed (others become NOPs)
23
 
if (cond) {
      b = 0;
}
else {
      b = 1;
}
Conditional Move Operations
Very limited form of predicated execution
CMOV R1 
 
R2
R1 = (ConditionCode == true) ? R2 : R1
Employed in most modern ISAs (x86, Alpha)
if (a == 5) {b = 4;} else {b = 3;}
CMPEQ condition, a, 5;
CMOV condition, b 
 4;
CMOV !condition, b 
 3;
24
Recap
Branch History Table: 2 bits for loop accuracy
Correlation: recently executed branches correlated
with next branch.
Either different branches
Or different executions of same branches
2-level predictor
Branch history and pattern history
Branch Target Buffer: include branch address and
prediction
Return address stack for return address of calls
25
Slide Note
Embed
Share

Techniques for reducing branch cost through advanced branch prediction methods such as static and dynamic prediction, branch correlation, and prediction of branch targets are essential for enhancing processor performance. Control speculation with branch prediction is utilized in modern processors with deep pipelines to enable speculative execution and mitigate branch penalties. Various strategies for predicting branch direction and target addresses are explored, including static branch prediction methods that utilize compiler hints and profiling information.

  • Branch Prediction
  • Computer Architecture
  • Advanced Techniques
  • Pipeline Optimization
  • Processor Performance

Uploaded on Feb 26, 2025 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. CS5100 Advanced Computer Architecture Advanced Branch Prediction Prof. Chung-Ta King Department of Computer Science National Tsing Hua University, Taiwan (Slides are from textbook, Prof. Hsien-Hsin Lee, Prof. Yasun Hsu, Prof. Onur Mutlu) National Tsing Hua University

  2. About This Lecture Goal: To understand the techniques for reducing the cost of branches Outline: Reducing branch cost with advanced branch prediction (Sec. 3.3) Prediction of branch direction: static, dynamic, branch correlation Prediction of branch target 1 1 National Tsing Hua University

  3. Control Speculation with Branch Prediction Modern processors have deep pipelines Branch penalty limits performance of deep pipelines Want to execute instructions beyond a branch even before that branch is resolved use speculative execution Branch prediction: dynamic vs. static What to predict? 2 National Tsing Hua University

  4. What to Predict? Direction (1-bit) Single direction for unconditional jumps and calls/returns Binary for conditional branches Target (32-bit or 64-bit addresses) Some are easy One address: uni-directional jumps Two: addresses: fall through (not taken) vs. taken Many: function pointer or indirect jump (e.g. jr r31) Ideally, one predictor for direction and one predictor for target for each branch in the code 3 National Tsing Hua University

  5. Static Branch Prediction for Direction Uni-directional: always predict taken (or not taken) Always-not-taken: easy (does not need branch target address), not effective for loops Always-taken: branch target address needs to be computed before the instruction flow can continue (may take extra cycles) Backward taken, forward not taken Check sign of branch displacement: taken if negative, not- taken if positive no extra hardware needed Good for, e.g., loops Do not require HW support since the sign of target displacement is already encoded in the branch instruction 4 National Tsing Hua University

  6. Static Branch Prediction for Direction Compiler hints with branch annotation Run instrumented program with sample input data Collect info on branch direction (profiling) Use this profile info for prediction Use a bit in branch instruction Set to 1 if taken Set to 0 if un-taken Bits set by compiler or user Once set, same behavior every time 5 National Tsing Hua University

  7. Dynamic Branch Prediction for Direction Predict branch based on past history of branch One-bit Branch History Table (BHT) PC Hash 2N entries . . . . . Table update N bits BHT: a cache of recent branches Each entry stores last direction that the indexed branch went (1 bit to encode taken/not-taken) No need to decode to know if it is a branch, just look at instr. address FSM Update Logic Actual outcome Prediction 6 National Tsing Hua University

  8. Problems with the Simple Predictor Aliasing: Two branches may be hashed to the same entry branch prediction history is polluted Solution: make the table bigger, apply other cache optimization strategies Always mispredict twice for a loop, e.g., for (i=0; i<4; i++) { } 1 1 1 Pred 0 1 1 1 0 1 1 0 1 T T NT T NT T T T T T T Actual 7 National Tsing Hua University

  9. 2-bit Counter 2-bit saturating up/down counter predictor Taken Not Taken 10/ WT 11/ ST Predict Not taken Predict taken ST: Strongly Taken WT: Weakly Taken WN: Weakly Not Taken SN: Strongly Not Taken 01/ WN 00/ SN Give inertial in responding external changes 8 National Tsing Hua University

  10. For More Advanced Branch Prediction Hypothesis: recent branches are correlated; that is, behavior of recently executed branches affects prediction of current branch BHT predicts this Two possibilities: current branch depends on Local behavior: Last m outcomes of the same branch (local branch predictor), e.g., a loop of 3 iterations is executed repetitively a history record of the loop branch of the last 6 iterations should be able to predict the direction of that branch correctly Global behavior: Last m most recently executed branches because branches are often correlated! 9 National Tsing Hua University

  11. Branches Are Correlated! Branch direction of multiple branches Not independent but correlated to the path taken Example: path 1-1 of b3 can be known beforehand if (aa==2) // b1 aa = 0; if (bb==2) // b2 bb = 0; if (aa!=bb) {// b3 } b1 1 (T) 0 (NT) b2 b2 1 0 1 0 b3 b3 b3 B:1-0 C:0-1 aa=0 bb 2 b3 D:0-0 aa 2 bb 2 Path: A:1-1 aa 2 bb=0 aa=0 bb=0 How to capture global behavior? 10 National Tsing Hua University

  12. Capturing Global Branch Correlation Idea: associate branch outcomes with global T/NT history of all branches Make a prediction based on outcome of the branch the last time the same global branch history was encountered Implementation: Keep track of the global T/NT history of all branches in a register Global History Register (GHR) Use GHR to index into a table that records the outcome that was seen for each GHR value in the recent past Pattern History Table (table of 2-bit counters) Global history/branch predictor Uses two levels of history (GHR + history at that GHR) 11 National Tsing Hua University

  13. Two Level Global Branch Prediction 1st level: Global Branch History Register (N bits) The direction of last N branches 2nd level: Table of saturating counters for each history entry 00 ..00 00 ..01 00 ..10 2N entries Branch History Register (BHR) (Shift left when update) Pattern History Table (PHT) Rc-1 Rc-k 1 1 . . . . . 1 0 N Prediction 11 ..10 Current state PHT update 11 ..11 Branch History Pattern FSM Update Logic Actual branch outcome 12 National Tsing Hua University

  14. How Does the Global Predictor Work? for(i=0; i<100; i++) { for(j=1; j<3; j++) { } // b2 ... } // b1 Outcome of b2 at i=6, j=3 ... Outcome of b1 at i=7 BHR b2 at i=7, j=1 b2 at i=7, j=2 b2 at i=7, j=3 b1 at i=8 Branch b1 tests i & last 3 branches test j. History: TTN Predict taken for i Next history: TNT (shift in last outcome) 13 National Tsing Hua University

  15. Differentiating Per Branch Behavior Two different branches may have the same global branch history but behave differently GAg Per-addr PHTs (PPHTs) Addr(B) GAp Global PHT Global BHR Global BHR . . . . . . . . . . . . .. 14 National Tsing Hua University

  16. Capturing Local Correlation But, we still want to capture the behavior of the same branch for(i=0; i<100; i++) for(j=0; j<3; j++) { if (aa==2) aa = 0; if (bb==2) bb = 0; if (aa!=bb) {...} } Idea: have a per-branch history register PAp Addr(B)Per-addr PHTs (PPHTs) Per-addr BHT (PBHT) . . . . . . . . . . . . Addr(B) .. 15 National Tsing Hua University

  17. Hybrid Branch Predictor Some branches correlated to global history, some correlated to local history Use more than one type of predictors and select best Branch PC P0 P1 . . . Final Prediction Choice (or Meta) Predictor 16 National Tsing Hua University

  18. Tradeoff between Cost and Precision Idea: add more context infor. to the global predictor to take into account which branch is being predicted (local predictor) Gshare: GHR hashed with the Branch PC + Better utilization of PHT -- Increases access latency 17 National Tsing Hua University

  19. Outline Prediction of branch direction: Static Dynamic Branch correlation Prediction of branch target 18 National Tsing Hua University

  20. Prediction of Branch Targets Need target address at same time as prediction Branch Target Buffer (BTB): use PC to access I$ and simultaneously look up BTB to get prediction AND branch address (if taken) Branch PC PC of instruction Predicted PC Fetch Yes: instruction is branch and use predicted PC as next PC =? Branch predicted taken or untaken No: branch not predicted, proceed normally 19 National Tsing Hua University

  21. How about Subroutine Returns? Different call sites make return address hard to predict printf() may be called by many callers Target of return instruction in printf() is a moving target But return address is actually easy to predict It is the address after the last call instruction that have not returned from yet Can use a Return Address Stack (RAS) RAS: Call will push return address on the stack Return uses the prediction of top-of-stack 20 National Tsing Hua University

  22. Return Address Stack Call PC Return PC 4 + Push Return Address BTB BTB Return? May not know if it is a return instruction prior to decoding Rely on BTB for speculation Fix once recognize Return 21 National Tsing Hua University

  23. Outline Prediction of branch direction: Static Dynamic Branch correlation Prediction of branch target Predicated execution 22 National Tsing Hua University

  24. Predicated Execution Idea: compiler converts control dependence into data dependence branch is eliminated Each instr. has a predicate bit set based on the predicate computation Only instr. with TRUE predicates are committed (others become NOPs) A (normal branch code) A (predicated code) T N B if (cond) { b = 0; } else { b = 1; } C B C D D A A p1 = (cond) branch p1, TARGET mov b, 1 jmp JOIN p1 = (cond) B B (!p1) mov b, 1 (p1) mov b, 0 C C TARGET: mov b, 0 D D add x, b, 1 add x, b, 1 23 National Tsing Hua University

  25. Conditional Move Operations Very limited form of predicated execution CMOV R1 R2 R1 = (ConditionCode == true) ? R2 : R1 Employed in most modern ISAs (x86, Alpha) if (a == 5) {b = 4;} else {b = 3;} CMPEQ condition, a, 5; CMOV condition, b 4; CMOV !condition, b 3; 24 National Tsing Hua University

  26. Recap Branch History Table: 2 bits for loop accuracy Correlation: recently executed branches correlated with next branch. Either different branches Or different executions of same branches 2-level predictor Branch history and pattern history Branch Target Buffer: include branch address and prediction Return address stack for return address of calls 25 National Tsing Hua University

Related


More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#