Intel CPU Architectures Overview: Evolution and Features

 
Figures and data from Arstechnica
arstechnica.com/old/content/2004/07/pentium-1.ars
arstechnica.com/old/content/2001/05/p4andg4e.ars
arstechnica.com/old/content/2004/02/pentium-m.ars
arstechnica.com/hardware/news/2006/04/core.ars
arstechnica.com/hardware/news/2008/04/what-you-need-to-know-about-nehalem.ars
Some Intel CPU examples
 
Pentium
 
Dual issue
Two 5-stage integer pipes (some pairing restrictions)
  1: Prefetch/fetch
  2: Decode 1
     Branch prediction (75% accurate)
  3: Decode 2
     Address computation
  4: Execute
  5: Write back
6-stage floating-point pipe
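A minimal Python sketch of dual issue with a pairing restriction. It assumes one cycle per stage, full forwarding, and no cache or branch stalls, and it models only one hypothetical pairing rule (the second instruction of a pair may not read or overwrite the first one's destination); the real Pentium pairing rules are more detailed. `Instr`, `can_pair`, and `issue_schedule` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    dest: str     # register written
    srcs: tuple   # registers read


def can_pair(first: Instr, second: Instr) -> bool:
    # Simplified pairing rule (assumption): the second instruction may not
    # read or overwrite the register written by the first.
    return first.dest not in second.srcs and first.dest != second.dest


def issue_schedule(prog: list, stages: int = 5) -> list:
    """Cycle in which each instruction finishes write back, assuming one
    cycle per stage, full forwarding, and no cache or branch stalls."""
    done = []
    cycle = 0   # cycle in which the current issue group enters stage 1
    i = 0
    while i < len(prog):
        cycle += 1
        if i + 1 < len(prog) and can_pair(prog[i], prog[i + 1]):
            done += [cycle + stages - 1] * 2   # both pipes advance in lockstep
            i += 2
        else:
            done.append(cycle + stages - 1)    # issue alone; second pipe idles
            i += 1
    return done


prog = [
    Instr("eax", ("ebx",)),
    Instr("ecx", ("edx",)),   # independent: pairs with the first
    Instr("esi", ("ecx",)),
    Instr("edi", ("esi",)),   # reads esi: cannot pair with the previous one
]
print(issue_schedule(prog))   # → [5, 5, 6, 7]
```

Independent neighbours finish in the same cycle; a dependent neighbour falls back to single issue and leaves the second pipe idle for that cycle.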
 
Pentium Pro, II, III
 
3-instruction issue
  2 simple decoders, 1 complex decoder
40-entry ROB
  Rotating (circular) queue
Execution
  5 issue ports
  Separate store-address and store-data ports
  1-cycle execute for most instructions
  *÷: 4-cycle latency, 1-cycle issue
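The "rotating queue" organization of the ROB can be sketched as a circular buffer: entries are allocated at the tail in program order, may complete out of order, and retire only from the head. This is a simplified illustration, not Intel's design; the `ReorderBuffer` class and its methods are hypothetical, and the default size of 40 simply mirrors the slide.

```python
class ReorderBuffer:
    """Circular ROB: allocate at the tail, complete in any order, retire from the head."""

    def __init__(self, size: int = 40):
        self.size = size
        self.slots = [None] * size   # each slot holds [tag, finished]
        self.head = 0                # oldest in-flight instruction
        self.tail = 0                # next free slot
        self.count = 0               # number of occupied slots

    def allocate(self, tag: str) -> int:
        if self.count == self.size:
            raise RuntimeError("ROB full: the front end must stall")
        idx = self.tail
        self.slots[idx] = [tag, False]
        self.tail = (self.tail + 1) % self.size   # rotate past the last slot
        self.count += 1
        return idx

    def complete(self, idx: int) -> None:
        # Execution units may finish entries out of program order.
        self.slots[idx][1] = True

    def retire(self) -> list:
        # Commit finished entries strictly in program order, oldest first.
        retired = []
        while self.count and self.slots[self.head][1]:
            retired.append(self.slots[self.head][0])
            self.slots[self.head] = None
            self.head = (self.head + 1) % self.size
            self.count -= 1
        return retired


rob = ReorderBuffer(size=4)
i0, i1, i2 = rob.allocate("i0"), rob.allocate("i1"), rob.allocate("i2")
rob.complete(i2)
rob.complete(i1)
print(rob.retire())   # → []  (i0 has not finished, so nothing younger may retire)
rob.complete(i0)
print(rob.retire())   # → ['i0', 'i1', 'i2']
```

The modulo on `head` and `tail` is what makes the queue "rotate": slots freed at retirement are reused without shifting any entries.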
 
Pentium Pro, II, III
 
12-stage pipe
1-4.5: BTB lookup & instruction fetch (IF)
  Branch prediction 90+% accurate
4.5-6: Decode
7: ROB allocation & register rename
8: Write to reservation station (RS, 20 entries)
9: Issue
10: Execute
11-12: Retire
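To make the prediction-accuracy figure concrete, here is a classic table of 2-bit saturating counters in Python. It is a textbook scheme chosen for illustration, not the Pentium Pro's own predictor; the table size, the initial counter value, and the branch trace are assumptions.

```python
class TwoBitPredictor:
    """2-bit saturating counters indexed by the low bits of the branch address.
    Counter values 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, entries: int = 512):
        self.mask = entries - 1        # entries assumed to be a power of two
        self.table = [1] * entries     # start every counter weakly not-taken

    def predict(self, pc: int) -> bool:
        return self.table[pc & self.mask] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = pc & self.mask
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


def accuracy(predictor, trace) -> float:
    hits = 0
    for pc, taken in trace:
        hits += predictor.predict(pc) == taken
        predictor.update(pc, taken)    # train on the actual outcome
    return hits / len(trace)


# Hypothetical trace: a loop branch taken 9 times, then not taken, repeated 10 times.
trace = [(0x40, i % 10 != 9) for i in range(100)]
print(accuracy(TwoBitPredictor(), trace))   # → 0.89
```

On this loop branch the counters mispredict every exit plus once while warming up, which lands in the same range as the slide's 90+% figure.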
 
P4 (Pentium 4)
 
Trace cache
  Stores decoded µops (internal RISC-like ISA)
  90% hit rate
  Microcode ROM for long instructions
  Mini BTB for trace-cache branches
20+ stage pipeline
  More stages on a trace-cache miss
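The trace-cache idea can be sketched as a map from a trace's starting address plus the branch directions it follows to the already-decoded µops. On a hit the front end skips x86 decoding; on a miss it decodes along the slow path and fills the line. This is a generic illustration rather than the Pentium 4's actual organization; `TraceCache` and `toy_decode` are hypothetical, and the toy decoder emits one µop per instruction.

```python
class TraceCache:
    """Caches decoded µop traces keyed by (start address, branch-direction path)."""

    def __init__(self, decode):
        self.decode = decode   # function: x86 address -> list of µops
        self.lines = {}        # (start_ip, path) -> decoded µop trace
        self.hits = 0
        self.misses = 0

    def fetch(self, start_ip: int, path: tuple, ips: list) -> list:
        """`ips` is the dynamic instruction sequence the trace covers,
        already following the branch directions recorded in `path`."""
        key = (start_ip, path)
        if key in self.lines:
            self.hits += 1     # hit: µops go straight on, no x86 decode
            return self.lines[key]
        self.misses += 1       # miss: slow path through the x86 decoders
        uops = [u for ip in ips for u in self.decode(ip)]
        self.lines[key] = uops
        return uops


def toy_decode(ip: int) -> list:
    # Hypothetical decoder: one µop per x86 instruction.
    return [f"uop@{ip:#x}"]


tc = TraceCache(toy_decode)
for _ in range(10):            # the same hot path, fetched 10 times
    tc.fetch(0x100, (True,), [0x100, 0x104, 0x200])
print(tc.hits, tc.misses)      # → 9 1
```

Because the key includes the branch path, the same start address with a different branch outcome is a separate trace and misses on first use.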
 
P4 (Pentium 4)
 
1-2: Trace cache next IP
3-4: Trace cache fetch
5: Drive signals
6-8: Allocate & rename
  128 rename registers (µregs)
9: Queue
10-12: Schedule
13-14: Dispatch
  Up to 6 µops per cycle
15-16: Register file read
17: Execute
18: Flags
19: Branch check
20: Drive signals
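One consequence of a deeper pipeline is a larger branch-misprediction penalty. The arithmetic below estimates the cycles lost per instruction as (branch fraction) × (misprediction rate) × (penalty). The 20% branch fraction, the 10% misprediction rate, and the use of the branch-resolution stage (Execute at stage 10 in the Pentium Pro pipeline, Branch check at stage 19 here) as the penalty are all assumptions for illustration, not measured values.

```python
def branch_cpi_penalty(branch_frac: float, mispredict_rate: float,
                       penalty_cycles: int) -> float:
    """Average cycles lost per instruction to branch mispredictions."""
    return branch_frac * mispredict_rate * penalty_cycles


# Assumed workload: 20% branches, 10% of them mispredicted.
p6 = branch_cpi_penalty(0.20, 0.10, 10)   # branch resolves around stage 10 (Execute)
p4 = branch_cpi_penalty(0.20, 0.10, 19)   # branch resolves around stage 19 (Branch check)
print(f"P6: {p6:.2f}  P4: {p4:.2f}")      # → P6: 0.20  P4: 0.38
```

Under these assumptions the same prediction accuracy costs the deeper pipeline nearly twice as many cycles per instruction.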
 
 
Pentium M
 
Branch prediction
  4k BTB
  Loop predictor
  Indirect predictor
µop fusion
  Fewer ROB entries used (a fused µop pair occupies one entry)
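A loop predictor targets branches that a plain counter scheme gets wrong once per loop exit. Below is a minimal single-branch sketch, assuming the loop always runs the same trip count: it counts taken outcomes, records the count when the branch falls through, and afterwards predicts the exit exactly. `LoopPredictor` and `run` are hypothetical names; the slide does not describe the Pentium M's actual structure.

```python
class LoopPredictor:
    """Learns how many times a loop branch is taken before it falls through."""

    def __init__(self):
        self.trip = None   # learned number of taken outcomes before the exit
        self.count = 0     # taken outcomes seen in the current pass

    def predict(self) -> bool:
        if self.trip is None:
            return True    # untrained: assume the loop keeps going
        return self.count < self.trip

    def update(self, taken: bool) -> None:
        if taken:
            self.count += 1
        else:
            self.trip = self.count   # exit observed: remember the trip count
            self.count = 0


def run(trace) -> float:
    lp = LoopPredictor()
    hits = 0
    for taken in trace:
        hits += lp.predict() == taken
        lp.update(taken)
    return hits / len(trace)


# Hypothetical trace: taken 9 times, then not taken, repeated 10 times.
trace = [i % 10 != 9 for i in range(100)]
print(run(trace))   # → 0.99
```

After one training pass only the first exit is mispredicted, whereas a simple per-branch counter keeps missing every exit.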
 
Core
 
[Figure: Core block diagram, callout marking the 96-entry ROB]
 
Core Decode
 
4-7 issue to 7 µops
Multiple x86 instructions to one µop
  Macro-fusion merges adjacent x86 instructions (e.g., cmp + jcc)
µop fusion to reduce ROB usage
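Macro-fusion can be illustrated as a pass over the decoded instruction stream that merges a compare or test with the conditional jump right after it. This sketch works on textual instructions; the sets of fusible opcodes below are an assumption, and the actual fusion rules vary by processor generation. `macro_fuse` is a hypothetical name.

```python
# Assumed fusible opcodes (illustrative only; real rules are generation-specific).
COMPARES = {"cmp", "test"}
COND_JUMPS = {"je", "jne", "jl", "jge", "jle", "jg", "jb", "jae", "jbe", "ja"}


def macro_fuse(instrs: list) -> list:
    """Merge each cmp/test that is immediately followed by a conditional
    jump into a single fused entry; all other instructions pass through."""
    out = []
    i = 0
    while i < len(instrs):
        op = instrs[i].split()[0]
        if (op in COMPARES and i + 1 < len(instrs)
                and instrs[i + 1].split()[0] in COND_JUMPS):
            out.append(f"{instrs[i]} + {instrs[i + 1]}")   # two x86 ops, one slot
            i += 2
        else:
            out.append(instrs[i])
            i += 1
    return out


loop = ["add eax, [esi]", "add esi, 4", "cmp esi, edi", "jne top"]
print(macro_fuse(loop))
# → ['add eax, [esi]', 'add esi, 4', 'cmp esi, edi + jne top']
```

The loop's four x86 instructions now occupy three decode slots, which is the bandwidth saving macro-fusion is after.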
 
Memory Speculation
 
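Without speculation (program order):
  store A, addr1
  -stall-
  load addr2, B
  -stall-
  add B,C,D
  If addr1 = addr2: aliasing, so the load must read the value just stored

With speculation (load hoisted above the store):
  load addr2, B
  store A, addr1
  add B,C,D
  If addr1 ≠ addr2: no aliasing, so the reordering is safe
  Assume no aliasing; restart if wrong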
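The speculative sequence can be mimicked in a few lines of Python: execute the load early, perform the older store, then compare addresses and replay the load if they alias. This models only the three-instruction example above; `run_speculative` is a hypothetical name, memory is a plain dict, and `add B,C,D` is read as D = B + C (an assumption about operand order).

```python
def run_speculative(mem: dict, addr1: int, addr2: int, a: int, c: int):
    """Hoist `load addr2, B` above `store A, addr1`, then check for aliasing.

    Returns (B, D, restarted)."""
    restarted = False
    b = mem.get(addr2, 0)      # speculative load: assume no aliasing
    mem[addr1] = a             # the older store executes afterwards
    if addr1 == addr2:         # alias check: the store hit the address already loaded
        restarted = True
        b = mem[addr2]         # restart: re-execute the load with the stored value
    d = b + c                  # add B,C,D  (assumed D = B + C)
    return b, d, restarted


print(run_speculative({0x10: 5}, 0x20, 0x10, a=7, c=1))   # → (5, 6, False)
print(run_speculative({0x10: 5}, 0x10, 0x10, a=7, c=1))   # → (7, 8, True)
```

In both cases B and D match what strict program order would produce; the restart only costs time when the no-aliasing guess was wrong.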
 
Nehalem
 
Relies on Hyper-Threading (2 threads per core)
128-entry ROB
36-entry RS
 

Uploaded on Oct 06, 2024

