Address Prediction and Recovery in EECS 470 Lecture Winter 2024

 
EECS 470
 
Branches:
Address prediction and recovery
(And interrupt recovery too.)
Lecture 6 – Winter 2024
 
 
 
Slides developed in part by Profs. Austin, Brehob, Falsafi, 
Hill, Hoe, Lipasti,
Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar, and Wenisch of Carnegie
Mellon University, Purdue University, University of Michigan, University of
Pennsylvania, and University of Wisconsin.
 
1
Warning: Crazy times coming
 
P2 due today!
HW2 due on Friday 2/2
Will do group formation and project materials on Tuesday 2/6.
G
r
o
u
p
 
f
o
r
m
a
t
i
o
n
 
w
i
l
l
 
b
e
 
d
o
n
e
 
i
n
 
c
l
a
s
s
 
o
n
 
T
h
u
r
s
d
a
y
 
2
/
8
P3 is due on Sunday 2/9
It’s a lot of work (20 hours?)
Proposal is due on Tuesday 2/13
It’s not a lot of work (1 hour?) to do the write-up, but you’ll need to meet
with your group and discuss things.
Don’t worry too much about getting this right.  You’ll be allowed to change
(we’ll meet the following Friday).  Just a line in the sand.  Will discuss on 2/6.
HW3 is due on Wednesday 2/14.  No late homework. Answers
posted right after it’s due!
Midterm is on Thursday 2/15 in the evening
Review session over that weekend (date/time TBA)
Q&A in class 2/15
Best way to study is look at old exams (posted on-line later this week)
20 minute proposal meetings on Friday 2/16 rather than lab
 
Announcements:
 
Combining Branch Predictors
, S.
McFarling, WRL Technical Note TN-36,
June 1993.
On the website
Part of HW2.
Expect a (likely brief) question on the
midterm.
 
Last time:
 
Started on branch predictors
Branch prediction consists of
Branch taken predictor
Address predictor
Mis-predict recovery.
Discussed direction predictors and started
looking at some variations
 
 
 
Today
 
Predictors
Bimodal
Local history predictor
Global history predictor
Gshare
Tournament
Do detailed example
Start on the “P6 scheme”
 
What are the limitations of
Tomasulo’s Algorithm?
 
Branches are a pain.
 
 
Instructions that 
might
 throw an exception
are a pain.
Using history—bimodal
1-bit history (direction predictor)
Remember the last direction for a branch
branchPC
 
Using history—bimodal
 
2-bit history (direction predictor)
branchPC
 
Using History Patterns
 
~80 percent of branches are either heavily
TAKEN or heavily NOT-TAKEN
For the other 20%, we need to look a
patterns of reference to see if they are
predictable using a more complex
predictor
Example: 
gcc has a branch that flips each time
 
T(1)  NT(0)     10101010101010101010101010101010101010
Local history
branchPC
 
10101010
 
 
Pattern History
Table
 
Branch History
Table
What is the prediction
for this BHT 10101010?
When do I update the tables?
Local history
branchPC
01010101
 
Pattern History
Table
Branch History
Table
On the next execution of this
branch instruction, the branch
history table is 01010101, 
pointing to a different pattern
What is the accuracy of a flip/flop branch 0101010101010…?
Global history
01110101
Pattern History
Table
Branch History
Register
if (aa == 2)
 
aa = 0;
if (bb == 2)
 
bb = 0;
if (aa != bb) { …
How can branches interfere with each other?
 
Relative performance (Spec ‘89)
Gshare predictor
Ref: Combining Branch Predictors
branchPC
01110101
Pattern History
Table
Branch History
Register
xor
Hybrid predictors
Local predictor
(e.g. 2-bit)
Global/gshare predictor
(much more state)
Prediction
    1
              Prediction
            2
Selection table
(2-bit state machine)
 
How do you select which predictor to use?
How do you update the various predictor/selector?
Prediction
 
“Trivial” example:
Tournament Branch Predictor
 
Local
8-entry 3-bit local history table indexed by PC
8-entry 2-bit up/down counter indexed by local history
Global
8-entry 2-bit up/down counter indexed by global history
Tournament
8-entry 2-bit up/down counter indexed by PC
 
Branch
History
Register
 
r1=2, r2=6, r3=10, r4=12, r5=4
Address of joe =0x100 and each instruction is 4 bytes.
Branch History Register = 110
joe:
 
 add r1 r2 
r3
  
 beq r3 r4 next
  
 bgt r2 r3  skip // if r2>r3 branch
  
 lw 
r6
 4(r5)
  
 add r6 r8 
r8
skip:
 
 add r5 r2 
r2
  
 bne r4 r5 joe
next:
 
 noop
 
Overriding Predictors
 
Big predictors are slow, but more accurate
Use a single cycle predictor in fetch
Start the multi-cycle predictor
When it completes, compare it to the fast prediction.
If same, do nothing
If different, assume the slow predictor is right and flush
pipline.
Advantage: reduced branch penalty for those
branches mispredicted by the fast predictor and
correctly predicted by the slow predictor
 
BTB
(Chapter 3.9)
 
Branch Target Buffer
Addresses predictor
Lots of variations
Keep the target of “likely taken” branches
in a buffer
With each branch, associate the expected
target.
 
 
BTB indexed by current PC
If entry is in BTB fetch target address next
Generally set associative (too slow as FA)
Often qualified by branch taken predictor
So…
 
B
T
B
 
l
e
t
s
 
y
o
u
 
p
r
e
d
i
c
t
 
t
a
r
g
e
t
 
a
d
d
r
e
s
s
 
d
u
r
i
n
g
 
t
h
e
f
e
t
c
h
 
o
f
 
t
h
e
 
b
r
a
n
c
h
!
If BTB gets a miss, pretty much stuck with not-
taken as a prediction
So limits prediction accuracy.
Can use BTB as a predictor.
If it is there, predict taken.
Replacement is an issue
LRU seems reasonable, but only really want
branches that are taken at least a fair amount.
 
What branches will a BTB struggle with?
How to address that?
 
Pipeline recovery is pretty simple
 
Squash and restart fetch with right
address
Just have to be sure that nothing has
“committed” its state yet.
In our 5-stage pipe, state is only
committed during MEM (for stores) and
WB (for registers)
 
Tomasulo’s
 
Recovery seems really hard
What if instructions after the branch finish
after we find that the branch was wrong?
This could happen.  Imagine
R1=MEM[R2+0]
BEQ R1, R3 DONE 
 Predicted not taken
R4=R5+R6
So we have to not speculate on branches or
not let anything pass a branch
Which is really the same thing.
Branches become serializing instructions.
Note that can be executing some things before and
after the branch once branch resolves.
 
What we need is:
 
Some way to not commit instructions until
all branches before it are committed.
Just like in the pipeline, something could have
finished execution, but not updated anything
“real” yet.
 
Interrupts
 
These have a similar problem.
If we can execute out-of-order a “slower”
instruction might not generate an interrupt
until an instruction in front of it has finished.
This sounds like the end of out-of-order
execution
I mean, if we can’t finish out-of-order, isn’t this
pointless?
 
Exceptions and Interrupts
 
Precise Interrupts
 
Implementation
approaches
Don’t
E.g., Cray-1
Buffer speculative results
E.g., P4, Alpha 21264
History buffer
Future file/Reorder buffer
 
Instructions
Completely
Finished
 
No Instruction
Has Executed
At All
 
PC
 
Precise State
 
Speculative State
M
E
M
 
Precise Interrupts via the Reorder Buffer
 
@
 
A
l
l
o
c
Allocate result storage at Tail
@
 
S
c
h
e
d
Get inputs (ROB T-to-H then ARF)
Wait until all inputs ready
@
 
W
B
Write results/fault to ROB
Indicate result is ready
@
 
C
T
Wait until inst @ Head is done
If fault, initiate handler
Else, write results to ARF
Deallocate entry from ROB
I
F
I
D
A
l
l
o
c
S
c
h
e
d
E
X
C
T
 
Head
 
Tail
PC
Dst regID
Dst value
Except?
 
Reorder Buffer (ROB)
Circular queue of spec state
May contain multiple definitions
of 
same
 register
 
In-order
 
In-order
 
Any order
A
R
F
 
Reorder Buffer Example
 
Code Sequence
 
  f1 = f2 / f3
  r3 = r2 + r3
  r4 = r3 – r2
 
Initial Conditions
 
  - reorder buffer empty
  - f2 = 3.0
  - f3 = 2.0
  - r2 = 6
  - r3 = 5
 
ROB
 
Time
 
H
 
T
 
regID: f1
result: ?
Except: ?
 
H
 
T
 
regID: f1
result: ?
Except: ?
 
regID: r3
result: ?
Except: ?
 
H
 
T
 
regID: f1
result: ?
Except: ?
 
regID: r3
result: 11
Except: N
 
regID: r4
result: ?
Except: ?
r3
 
regID: r8
result: 2
Except: n
 
regID: r8
result: 2
Except: n
 
regID: r8
result: 2
Except: n
 
Reorder Buffer Example
 
Code Sequence
 
  f1 = f2 / f3
  r3 = r2 + r3
  r4 = r3 – r2
 
Initial Conditions
 
  - reorder buffer empty
  - f2 = 3.0
  - f3 = 2.0
  - r2 = 6
  - r3 = 5
 
ROB
 
Time
 
H
 
T
 
regID: f1
result: ?
Except: ?
 
regID: r3
result: 11
Except: n
 
regID: r4
result: 5
Except: n
 
H
 
T
 
regID: f1
result: ?
Except: y
 
regID: r3
result: 11
Except: n
 
regID: r4
result: 5
Except: n
 
regID: r8
result: 2
Except: n
 
regID: r8
result: 2
Except: n
 
H
 
T
 
regID: f1
result: ?
Except: y
 
regID: r3
result: 11
Except: n
 
regID: r4
result: 5
Except: n
 
Reorder Buffer Example
 
Code Sequence
 
  f1 = f2 / f3
  r3 = r2 + r3
  r4 = r3 – r2
 
Initial Conditions
 
  - reorder buffer empty
  - f2 = 3.0
  - f3 = 2.0
  - r2 = 6
  - r3 = 5
 
ROB
 
Time
 
H
 
T
 
H
 
T
 
first inst
of fault
handler
 
There is more complexity here
 
Rename table needs to be cleared
Everything is in the ARF
Really do need to finish everything which was
before the faulting instruction in program order.
What about branches?
Would need to drain everything before the branch.
Why not just squash everything that follows it?
 
And while we’re at it…
 
Does the ROB replace the RS?
Is this a good thing?  Bad thing?
 
ROB
 
ROB
ROB is an 
in-order
 queue where instructions are placed.
Instructions 
complete
 
(retire) in-order
Instructions still 
execute
 out-of-order
Still use RS
Instructions are issued to RS and ROB at the same time
Rename is to ROB entry, not RS.
When 
execute
 done instruction leaves RS
Only when all instructions in before it in program order are
done does the instruction retire.
 
Adding a Reorder Buffer
 
Tomasulo Data Structures
(Timing Free Example)
 
Review Questions
 
Could we make this work without a RS?
If so, why do we use it?
Why is it important to retire in order?
Why must branches wait until retirement before
they announce their mispredict?
Any other ways to do this?
 
 
More review questions
 
1.
What is the purpose of the RoB?
2.
Why do we have both a RoB and a RS?
Yes, that was pretty much on the last page…
3.
Misprediction
a)
When to we resolve a mis-prediction?
b)
What happens to the main structures (RS, RoB,
ARF, Rename Table) when we mispredict?
4.
What is the whole purpose of OoO execution?
 
Can We Add Superscalar?
 
Dynamic scheduling and multiple issue are orthogonal
E.g., Pentium4: dynamically scheduled 5-way superscalar
Two dimensions
N
:
 
s
u
p
e
r
s
c
a
l
a
r
 
w
i
d
t
h
 
(
n
u
m
b
e
r
 
o
f
 
p
a
r
a
l
l
e
l
 
o
p
e
r
a
t
i
o
n
s
)
W
:
 
(
n
u
m
b
e
r
 
o
f
 
r
e
s
e
r
v
a
t
i
o
n
 
s
t
a
t
i
o
n
s
)
 
W
h
a
t
 
d
o
 
w
e
 
n
e
e
d
 
f
o
r
 
a
n
 
N
-
b
y
-
W
 
T
o
m
a
s
u
l
o
?
R
S
:
 
N
 
t
a
g
/
v
a
l
u
e
 
w
-
p
o
r
t
s
 
(
D
)
,
 
N
 
v
a
l
u
e
 
r
-
p
o
r
t
s
 
(
S
)
,
 
2
N
 
t
a
g
 
C
A
M
s
 
(
W
)
S
e
l
e
c
t
 
l
o
g
i
c
:
 
W
N
 
p
r
i
o
r
i
t
y
 
e
n
c
o
d
e
r
 
(
S
)
M
T
:
 
2
N
 
r
e
a
d
-
p
o
r
t
s
 
(
D
)
,
 
N
 
w
r
i
t
e
-
p
o
r
t
s
 
(
D
)
R
F
:
 
2
N
 
r
e
a
d
-
p
o
r
t
s
 
(
D
)
,
 
N
 
w
r
i
t
e
-
p
o
r
t
s
 
(
W
)
C
D
B
:
 
N
 
(
W
)
Which are the expensive pieces?
 
Superscalar Select Logic
 
Superscalar select logic: W
N priority encoder
Somewhat complicated (N
2
 logW)
Can simplify using different RS designs
S
p
l
i
t
 
d
e
s
i
g
n
Divide RS 
into N banks: 1 per FU?
Implement N separate W/N

+


F
I
F
O
 
d
e
s
i
g
n
 
[
P
a
l
a
c
h
a
r
l
a
+
]
Can issue only head of each RS 
bank
+
Simpler: no select logic at all
Less scheduling flexibility (but surprisingly not that bad)
Slide Note
Embed
Share

Explore the concepts of address prediction, recovery, and interrupt recovery in EECS 470 lecture featuring slides developed by prominent professors. Topics include branch predictors, limitations of Tomasulo's Algorithm, various prediction schemes, branch history tables, and more. Dive into bimodal, global, and local history predictors, tournament schemes, and example applications. Understand the complexities of branch prediction algorithms and enhancing prediction accuracy in modern computing systems.


Uploaded on Mar 22, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. EECS 470 Branches: Address prediction and recovery (And interrupt recovery too.) Lecture 6 Winter 2024 Slides developed in part by Profs. Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar, and Wenisch of Carnegie Mellon University, Purdue University, University of Michigan, University of Pennsylvania, and University of Wisconsin. 1

  2. Announcements: Combining Branch Predictors, S. McFarling, WRL Technical Note TN-36, June 1993. On the website Part of HW2. Expect a (likely brief) question on the midterm.

  3. Last time: Started on branch predictors Branch prediction consists of Branch taken predictor Address predictor Mis-predict recovery. Discussed direction predictors and started looking at some variations

  4. Today Predictors Bimodal Local history predictor Global history predictor Gshare Tournament Do detailed example Start on the P6 scheme

  5. What are the limitations of Tomasulo s Algorithm? Branches are a pain. Instructions that might throw an exception are a pain.

  6. Using historybimodal 1-bit history (direction predictor) Remember the last direction for a branch branchPC T NT

  7. Using historybimodal 2-bit history (direction predictor) branchPC NT ST SN T

  8. Using History Patterns ~80 percent of branches are either heavily TAKEN or heavily NOT-TAKEN For the other 20%, we need to look a patterns of reference to see if they are predictable using a more complex predictor Example: gcc has a branch that flips each time T(1) NT(0) 10101010101010101010101010101010101010

  9. Local history branchPC Branch History Table Pattern History Table 10101010 What is the prediction for this BHT 10101010? T NT When do I update the tables?

  10. Local history branchPC Branch History Table Pattern History Table 01010101 T NT On the next execution of this branch instruction, the branch history table is 01010101, pointing to a different pattern What is the accuracy of a flip/flop branch 0101010101010 ?

  11. Global history Pattern History Table Branch History Register 01110101 if (aa == 2) if (bb == 2) if (aa != bb) { aa = 0; bb = 0; How can branches interfere with each other?

  12. Relative performance (Spec 89)

  13. Gshare predictor branchPC Pattern History Table Branch History Register xor 01110101 Must read! Ref: Combining Branch Predictors

  14. Hybrid predictors Local predictor (e.g. 2-bit) Global/gshare predictor (much more state) Prediction 1 Prediction 2 Selection table (2-bit state machine) Prediction How do you select which predictor to use? How do you update the various predictor/selector?

  15. Trivial example: Tournament Branch Predictor Local 8-entry 3-bit local history table indexed by PC 8-entry 2-bit up/down counter indexed by local history Global 8-entry 2-bit up/down counter indexed by global history Tournament 8-entry 2-bit up/down counter indexed by PC

  16. Local predictor 2nd level table (PHT) 00=NT, 11=T Local predictor 1st level table (BHT) 0=NT, 1=T ADR[4:2] 0 1 2 3 4 5 6 7 History 001 101 100 110 110 001 111 101 History 0 1 2 3 4 5 6 7 Pred. state 00 11 10 00 01 01 11 11 Tournament selector 00=local, 11=global ADR[4:2] 0 1 2 3 4 5 6 7 Pred. state 00 01 00 10 11 00 11 10 Global predictor table 00=NT, 11=T History 0 1 2 3 4 5 6 7 Pred. state 11 10 00 00 00 11 11 00 Branch History Register

  17. Global predictor table 00=NT, 11=T Local predictor 2nd level table (PHT) 00=NT, 11=T Local predictor 1st level table (BHT) 0=NT, 1=T Tournament selector 00=local, 11=global ADR[4:2] 0 1 2 3 4 5 6 7 Pred. state 00 01 00 10 11 00 11 10 History 0 1 2 3 4 5 6 7 Pred. state 11 10 00 00 00 11 11 00 ADR[4:2] 0 1 2 3 4 5 6 7 History 001 101 100 110 110 001 111 101 History 0 1 2 3 4 5 6 7 Pred. state 00 11 10 00 01 01 11 11 r1=2, r2=6, r3=10, r4=12, r5=4 Address of joe =0x100 and each instruction is 4 bytes. Branch History Register = 110 joe: skip: next: add r1 r2 r3 beq r3 r4 next bgt r2 r3 skip // if r2>r3 branch lw r6 4(r5) add r6 r8 r8 add r5 r2 r2 bne r4 r5 joe noop

  18. Overriding Predictors Big predictors are slow, but more accurate Use a single cycle predictor in fetch Start the multi-cycle predictor When it completes, compare it to the fast prediction. If same, do nothing If different, assume the slow predictor is right and flush pipline. Advantage: reduced branch penalty for those branches mispredicted by the fast predictor and correctly predicted by the slow predictor

  19. Address

  20. BTB (Chapter 3.9) Branch Target Buffer Addresses predictor Lots of variations Keep the target of likely taken branches in a buffer With each branch, associate the expected target.

  21. BTB indexed by current PC If entry is in BTB fetch target address next Generally set associative (too slow as FA) Often qualified by branch taken predictor Branch PC Target address 0x05360AF0 0x05360000

  22. So BTB lets you predict target address during the fetch of the branch! If BTB gets a miss, pretty much stuck with not- taken as a prediction So limits prediction accuracy. Can use BTB as a predictor. If it is there, predict taken. Replacement is an issue LRU seems reasonable, but only really want branches that are taken at least a fair amount. What branches will a BTB struggle with? How to address that?

  23. Pipeline recovery is pretty simple Squash and restart fetch with right address Just have to be sure that nothing has committed its state yet. In our 5-stage pipe, state is only committed during MEM (for stores) and WB (for registers)

  24. Tomasulos Recovery seems really hard What if instructions after the branch finish after we find that the branch was wrong? This could happen. Imagine R1=MEM[R2+0] BEQ R1, R3 DONE Predicted not taken R4=R5+R6 So we have to not speculate on branches or not let anything pass a branch Which is really the same thing. Branches become serializing instructions. Note that can be executing some things before and after the branch once branch resolves.

  25. What we need is: Some way to not commit instructions until all branches before it are committed. Just like in the pipeline, something could have finished execution, but not updated anything real yet.

  26. Interrupt!!!

  27. Interrupts These have a similar problem. If we can execute out-of-order a slower instruction might not generate an interrupt until an instruction in front of it has finished. This sounds like the end of out-of-order execution I mean, if we can t finish out-of-order, isn t this pointless?

  28. Exceptions and Interrupts Exception Type I/O request System call Breakpoint Overflow Page fault Misaligned access Memory Protect Machine Check Power failure Sync/Async Async Maskable? Restartable? Yes No Yes Yes No No No No No Yes Yes Yes Yes Yes Yes Yes No No Sync Sync Sync Sync Sync Sync Async/Sync Async

  29. Precise Interrupts Implementation approaches Don t E.g., Cray-1 Buffer speculative results E.g., P4, Alpha 21264 History buffer Future file/Reorder buffer Instructions Completely Finished Precise State PC Speculative State No Instruction Has Executed At All

  30. Precise Interrupts via the Reorder Buffer @ Alloc Allocate result storage at Tail @ Sched Get inputs (ROB T-to-H then ARF) Wait until all inputs ready @ WB Write results/fault to ROB Indicate result is ready @ CT Wait until inst @ Head is done If fault, initiate handler Else, write results to ARF Deallocate entry from ROB Any order MEM EX IF ID CT Alloc Sched In-order In-order ROB ARF PC Dst regID Dst value Except? Reorder Buffer (ROB) Circular queue of spec state May contain multiple definitions of same register Head Tail

  31. Reorder Buffer Example ROB Code Sequence regID: r8 result: 2 Except: n regID: f1 result: ? Except: ? f1 = f2 / f3 r3 = r2 + r3 r4 = r3 r2 H T regID: r8 result: 2 Except: n regID: r3 result: ? Except: ? regID: f1 result: ? Except: ? Time Initial Conditions - reorder buffer empty - f2 = 3.0 - f3 = 2.0 - r2 = 6 - r3 = 5 H T regID: r8 result: 2 Except: n regID: r4 result: ? Except: ? regID: r3 result: 11 Except: N regID: f1 result: ? Except: ? r3 H T

  32. Reorder Buffer Example ROB Code Sequence regID: r8 result: 2 Except: n regID: r4 result: 5 Except: n regID: r3 result: 11 Except: n regID: f1 result: ? Except: ? f1 = f2 / f3 r3 = r2 + r3 r4 = r3 r2 H T regID: r4 result: 5 Except: n regID: r8 result: 2 Except: n regID: r3 result: 11 Except: n regID: f1 result: ? Except: y Time Initial Conditions - reorder buffer empty - f2 = 3.0 - f3 = 2.0 - r2 = 6 - r3 = 5 H T regID: r4 result: 5 Except: n regID: r3 result: 11 Except: n regID: f1 result: ? Except: y H T

  33. Reorder Buffer Example ROB Code Sequence f1 = f2 / f3 r3 = r2 + r3 r4 = r3 r2 HT first inst of fault handler Time Initial Conditions - reorder buffer empty - f2 = 3.0 - f3 = 2.0 - r2 = 6 - r3 = 5 H T

  34. There is more complexity here Rename table needs to be cleared Everything is in the ARF Really do need to finish everything which was before the faulting instruction in program order. What about branches? Would need to drain everything before the branch. Why not just squash everything that follows it?

  35. And while were at it Does the ROB replace the RS? Is this a good thing? Bad thing?

  36. ROB ROB ROB is an in-order queue where instructions are placed. Instructions complete (retire) in-order Instructions still execute out-of-order Still use RS Instructions are issued to RS and ROB at the same time Rename is to ROB entry, not RS. When execute done instruction leaves RS Only when all instructions in before it in program order are done does the instruction retire.

  37. Adding a Reorder Buffer

  38. CDB T Tomasulo Data Structures (Timing Free Example) V Map Table Reg Tag r0 r1 r2 r3 r4 Reservation Stations (RS) T FU busy op 1 2 3 4 5 ARF Reg V r0 r1 r2 r3 r4 RoB T1 T2 V1 V2 Instruction r0=r1*r2 r1=r2*r3 Branch if r1=0 r0=r1+r1 r2=r2+1 Reorder Buffer (RoB) RoB Number Dest. Reg. Value 0 1 2 3 4 5 6

  39. Review Questions Could we make this work without a RS? If so, why do we use it? Why is it important to retire in order? Why must branches wait until retirement before they announce their mispredict? Any other ways to do this?

  40. More review questions 1. What is the purpose of the RoB? 2. Why do we have both a RoB and a RS? Yes, that was pretty much on the last page 3. Misprediction a) When to we resolve a mis-prediction? b) What happens to the main structures (RS, RoB, ARF, Rename Table) when we mispredict? 4. What is the whole purpose of OoO execution?

  41. Can We Add Superscalar? Dynamic scheduling and multiple issue are orthogonal E.g., Pentium4: dynamically scheduled 5-way superscalar Two dimensions N: superscalar width (number of parallel operations) W: (number of reservation stations) What do we need for an N-by-W Tomasulo? RS: N tag/value w-ports (D), N value r-ports (S), 2N tag CAMs (W) Select logic: W N priority encoder (S) MT: 2N read-ports (D), N write-ports (D) RF: 2N read-ports (D), N write-ports (W) CDB: N (W) Which are the expensive pieces?

  42. Superscalar Select Logic Superscalar select logic: W Somewhat complicated (N2 logW) Can simplify using different RS designs Split design Divide RS into N banks: 1 per FU? Implement N separate W/N 1 encoders + Simpler: N * logW/N Less scheduling flexibility FIFO design [Palacharla+] Can issue only head of each RS bank + Simpler: no select logic at all Less scheduling flexibility (but surprisingly not that bad) N priority encoder

Related


More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#