Amoeba Cache: Adaptive Blocks for Memory Hierarchy Optimization

Amoeba-Cache
Adaptive Blocks for Eliminating
Waste in the Memory Hierarchy
Snehasish Kumar
Arrvindh Shriraman
Eric Matthews
Lesley Shannon
Hongzhou Zhao
Sandhya Dwarkadas
On-chip Storage
Amoeba Cache : Adaptive blocks for
Eliminating Waste in the Memory Hierarchy
2
       Fixed granularity cache
Tag Array
Data Array
 
Cache data utilization
Tag Array: Tags
Data Array: Data, Untouched Data
Utilization = fraction of words touched in a cache block at the time of eviction

Cache utilization
Figure: cache utilization (64K L1, 4 ways, 64B/block) for apache, canneal, eclipse, firefox, h2, jbb, lbm, mcf, tpcc and x264
 
Block Distribution
Figure: share of blocks by # words touched (1-2, 3-4, 5-6, 7-8) for Apache, Eclipse, Firefox and Canneal; 64K, 64B/block
 
 Block Distribution
Figure: share of Canneal blocks by # words touched (1-2, 3-4, 5-6, 7-8); 64K, 64B/block compared with 1M, 64B/block

Factors affecting cache utilization
Application specific behaviour: inefficient data structure access patterns
Interaction with cache geometry: way conflicts reduce block lifetime and cause poor utilization
 
Application Specific Behaviour
struct TIE {
    long long X, Y, Z;
    long long V, H;
    long long data[3];
} Imperial[1024];

/* The loop writes only X, Y, Z and V; H and data[3] are never touched. */
for (int i = 0; i < 1024; i++) {
    Imperial[i].X = …;
    Imperial[i].Y = …;
    Imperial[i].Z = …;
    Imperial[i].V = …;
}
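
As a rough check on the example above, the following sketch computes the utilization of one 64-byte block that holds a single struct TIE. It assumes that long long is 8 bytes, that the struct is aligned to a block boundary, and that a word is 8 bytes. The helper names (touch, loop_touch_mask, popcount8, utilization) are illustrative and do not come from the slides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed geometry: 8-byte words, 64-byte blocks (8 words per block). */
enum { WORD_BYTES = 8, BLOCK_BYTES = 64 };

struct TIE {
    long long X, Y, Z;
    long long V, H;
    long long data[3];
};

/* Set the bit of the word that contains the field at this byte offset. */
uint8_t touch(uint8_t mask, size_t offset) {
    assert(offset < (size_t)BLOCK_BYTES);
    return (uint8_t)(mask | (1u << (offset / WORD_BYTES)));
}

/* Words of one struct TIE written by the loop body: X, Y, Z and V. */
uint8_t loop_touch_mask(void) {
    uint8_t m = 0;
    m = touch(m, offsetof(struct TIE, X));
    m = touch(m, offsetof(struct TIE, Y));
    m = touch(m, offsetof(struct TIE, Z));
    m = touch(m, offsetof(struct TIE, V));
    return m;
}

/* Count the set bits of an 8-bit mask. */
int popcount8(uint8_t m) {
    int n = 0;
    while (m) {
        n += m & 1;
        m >>= 1;
    }
    return n;
}

/* Utilization = touched words / words per block. */
double utilization(uint8_t mask) {
    return (double)popcount8(mask) / (BLOCK_BYTES / WORD_BYTES);
}
```

Under these assumptions the loop touches 4 of the 8 words in every block, so half of each fetched block is never used.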
 
Cache Geometry
Data Array – 4 ways
Problem : Lots of data map to same set
 
Figure: blocks 1 to 5 map to the same 4-way set

Implications
1. Shrinks effective cache space
2. Increases miss rate
3. Wastes on-chip bandwidth
4. Increases on-chip cache energy consumption
Target Metrics
Miss Rate
Space Utilisation
Bandwidth
 
Variable Granularity Blocks
Tag Array
Data Array
How to support a variable # of blocks / set?
How to support variable granularity for each block?
 
Our Approach : Amoeba Cache
Unified SRAM Array
Amoeba Cache
Insert
Lookup
Partial Miss
Overheads
 
SRAM Array
SRAM Array
 
Tag
 
Data Block
 
Tag - Regions
Memory Region: RMAX bytes
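
The transcript of this slide lists the fields of the 64 bit address as Region Tag, Set Index, Start / End and Byte, with two 3-bit fields. The sketch below shows one way such an address could be split. The bit widths are assumptions: 3 byte bits (8-byte words), 3 word bits (an RMAX of 64 bytes) and a hypothetical 8 set-index bits. The names amoeba_addr and split_address are illustrative only.

```c
#include <stdint.h>

/* Assumed field widths; only the two 3-bit fields are suggested by the slide. */
enum {
    BYTE_BITS = 3,  /* byte within an 8-byte word */
    WORD_BITS = 3,  /* word within a region, assuming RMAX = 64 bytes */
    SET_BITS  = 8   /* hypothetical: 256 sets */
};

struct amoeba_addr {
    uint64_t region_tag; /* identifies the RMAX-byte memory region */
    uint32_t set_index;
    uint8_t  word;       /* W, compared against a block's Start / End */
    uint8_t  byte;
};

/* Split an address into region tag | set index | word | byte. */
struct amoeba_addr split_address(uint64_t addr) {
    struct amoeba_addr a;
    a.byte = (uint8_t)(addr & ((1u << BYTE_BITS) - 1u));
    a.word = (uint8_t)((addr >> BYTE_BITS) & ((1u << WORD_BITS) - 1u));
    a.set_index = (uint32_t)((addr >> (BYTE_BITS + WORD_BITS)) &
                             ((1u << SET_BITS) - 1u));
    a.region_tag = addr >> (BYTE_BITS + WORD_BITS + SET_BITS);
    return a;
}
```

With these widths every word of one region shares the same region tag and set index, so all blocks cut from a region land in the same set.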
 
 
Example
 
struct TIE {
    long long X, Y, Z;
    long long V, H;
    long long data[3];
} Imperial;

Imperial.X = … ;
Invoke Spatial Granularity Predictor (PC/Region based)
Amoeba Cache – Insert (8 words/set)
SRAM Array / Set
Tag? 00000000
Valid? 00000000
Miss: insert 4+1 words at Pos: 0
Amoeba Cache – Insert (8 words/set)
SRAM Array / Set
2. Refill: Valid? 00000000 becomes 11111000
3. Tag? 00000000 becomes 10000000
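
The insert steps on these two slides can be sketched as a search over the Valid? bitmap followed by two bitmap updates. This is a minimal software model under stated assumptions, not the hardware design: position 0 is taken to be the leftmost bit (matching how the slides print 11111000), the first free run is used, and the names bitmap_set, run_mask, find_free_run and insert_block are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { SET_WORDS = 8 };

struct bitmap_set {
    uint8_t valid; /* V? bitmap: bit set where a word is in use */
    uint8_t tag;   /* T? bitmap: bit set where a word holds a tag */
};

/* Mask covering n words starting at position pos (position 0 = leftmost bit). */
uint8_t run_mask(int pos, int n) {
    assert(n >= 1 && pos >= 0 && pos + n <= SET_WORDS);
    return (uint8_t)(((1u << n) - 1u) << (SET_WORDS - pos - n));
}

/* First position with n consecutive free words, or -1 if there is none. */
int find_free_run(const struct bitmap_set *s, int n) {
    for (int pos = 0; pos + n <= SET_WORDS; pos++) {
        if ((s->valid & run_mask(pos, n)) == 0) {
            return pos;
        }
    }
    return -1;
}

/* Insert a block of data_words data words plus one tag word.
   Returns the tag position, or -1 when no run is free (the caller
   would then evict a block and retry). */
int insert_block(struct bitmap_set *s, int data_words) {
    int n = data_words + 1; /* the "4+1 words" of the slide */
    if (data_words < 1 || n > SET_WORDS) {
        return -1;
    }
    int pos = find_free_run(s, n);
    if (pos < 0) {
        return -1;
    }
    s->valid |= run_mask(pos, n); /* tag and data words become valid */
    s->tag   |= run_mask(pos, 1); /* only the first word is a tag */
    return pos;
}
```

Starting from an empty set, inserting a 4-word block in this model yields Valid? 11111000 and Tag? 10000000, the same bitmaps as on the slide.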
 
Example
struct TIE {
    long long X, Y, Z;
    long long V, H;
    long long data[3];
} Imperial;

Imperial.Y = … ;
Lookup Data from the cache
Figure: struct words X, Y, Z, V, H, Data[3]; the cached block holds X, Y, Z, V
 
Amoeba Cache – Lookup (8words/set)
Address fields: Region Tag, Set Index, Word (W)
SRAM Array / Set: Tag X Y Z V
Tag? 10000000
Word Selector: four 2x1 multiplexers
Output Buffer: Tag X Y Z V
Critical Path
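
The lookup can be modelled in software as a scan over the words that the Tag? bitmap marks as tags. The sketch below is an illustration under assumptions: the hardware compares all tags in parallel while this model loops, the tag-word packing in make_tag is hypothetical, and End is treated as exclusive because the transcript shows "End > W" (the comparator next to Start is garbled there, so "Start <= W" is assumed).

```c
#include <stdbool.h>
#include <stdint.h>

enum { SET_WORDS = 8 };

struct word_set {
    uint64_t word[SET_WORDS]; /* tags and data share one array */
    uint8_t  tag_bitmap;      /* T? bitmap, position 0 = leftmost bit */
};

/* Hypothetical tag-word packing: region tag | start (4 bits) | end (4 bits). */
uint64_t make_tag(uint64_t region, unsigned start, unsigned end) {
    return (region << 8) | ((uint64_t)(start & 0xFu) << 4) | (end & 0xFu);
}

/* Returns true on a hit and stores the requested word in *out. */
bool lookup(const struct word_set *s, uint64_t region, unsigned w,
            uint64_t *out) {
    for (int pos = 0; pos < SET_WORDS; pos++) {
        if (!(s->tag_bitmap & (0x80u >> pos))) {
            continue; /* not a tag word */
        }
        uint64_t t = s->word[pos];
        unsigned start = (unsigned)((t >> 4) & 0xFu);
        unsigned end = (unsigned)(t & 0xFu);
        /* Hit condition: same region and Start <= W < End. */
        if ((t >> 8) == region && start <= w && end > w) {
            unsigned idx = (unsigned)pos + 1u + (w - start);
            if (idx >= SET_WORDS) {
                return false; /* malformed set in this model */
            }
            *out = s->word[idx]; /* data words follow their tag */
            return true;
        }
    }
    return false;
}
```

For the block Tag X Y Z V stored at position 0, a request for word 1 of the region returns Y, while a request for word 4 (H) misses because it lies outside the block's range.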
 
Partial Miss
Identify Sub-Blocks
Step 1 of 2
2. Evict Overlap to the MSHR
Overlapping sub-blocks: Tag X Y and Tag V H
 
Partial Miss
Insert New Block
Step 2 of 2
MSHR: X Y ? V H, with Z patching the missing word
Allocate 6 words
Occurs ≈ 5 in 1000 accesses
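
The two partial-miss steps can be sketched as a range merge: find the cached sub-blocks that overlap the newly requested range, take the union as the new block, and work out which words still have to be patched from the refill. The representation below is an assumption made for illustration (half-open word ranges within one region, bit i standing for word i); the names word_range, merged and merge_partial_miss do not come from the slides.

```c
#include <assert.h>
#include <stdint.h>

enum { REGION_WORDS = 8 };

struct word_range {
    unsigned start, end; /* [start, end) in words within the region */
};

struct merged {
    struct word_range block; /* new block to allocate */
    uint8_t present;         /* words supplied by evicted sub-blocks (MSHR) */
    uint8_t missing;         /* words to patch from the refill */
};

/* Bitmap of the words in a range, bit i = word i. */
uint8_t range_bits(struct word_range r) {
    assert(r.start < r.end && r.end <= REGION_WORDS);
    return (uint8_t)(((1u << (r.end - r.start)) - 1u) << r.start);
}

/* Merge the requested range with every cached sub-block it overlaps. */
struct merged merge_partial_miss(const struct word_range *cached, int n,
                                 struct word_range req) {
    struct merged m = { req, 0, 0 };
    for (int i = 0; i < n; i++) {
        /* Overlap test on half-open ranges. */
        if (cached[i].start < req.end && req.start < cached[i].end) {
            m.present |= range_bits(cached[i]);
            if (cached[i].start < m.block.start) m.block.start = cached[i].start;
            if (cached[i].end > m.block.end) m.block.end = cached[i].end;
        }
    }
    m.missing = (uint8_t)(range_bits(m.block) & ~m.present);
    return m;
}
```

With cached sub-blocks X Y (words 0 to 1) and V H (words 3 to 4) and a request for X Y Z V, the merged block covers five data words, which with one tag word gives the 6 words allocated on the slide, and only Z is missing.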
 
Hardware Overheads
SRAM Array
Metadata: Valid? and Tag? bitmaps
Extra: 1 KB
Amoeba Critical Path: Latency +4%
Evaluation
Parameters for latency and energy
Workloads
 
Latency Parameters (cycles)
Figure: latency in cycles along the path CPU, 64K L1, 1M LLC (values shown: 3, 20, 300)
Fixed Granularity: 1
Amoeba Cache: 1.04 (Latency +4%)
 
On-Chip Energy Parameters (pJ)
                     64K L1    1M LLC
Fixed Granularity    101       230
Amoeba Cache         105       238
≈ 7 / word
Workloads
22 diverse workloads from
PARSEC
SPEC-CPU 2000 & 2006
DaCapo (Java Benchmarks)
Apache, Firefox and PostgreSQL

Results
 
 % Improvement in L1 Miss-Rate
Reduces L1 and L2 miss rate by 18%
 
% Improvement in L1 Miss-Bandwidth
Reduces on-chip bandwidth by 46%
Reduces off-chip bandwidth by 38%
 
% Improvement in memory energy
Reduces energy by 11%
 
% Improvement in execution time
Improves performance by 10%
 
Results Summary
Amoeba-Cache
Reduce cache pollution for applications with low cache utilization
Improve performance for moderate cache utilization
Maintain performance for high cache utilization workloads
Save energy for streaming applications by keeping out unused words
 
Additional Results
Lookup as an extra cache pipeline stage vs. throttling the CPU
For extra pipeline stage, 8 of 22 applications show improvement
Spatial Granularity Predictor
Indexing: 18 of 22 – Address region better
Training: Evictions and First Touch
Table Size: 256 – PC and 1024 – Region
 
Additional Results
Multicore Shared Cache
Reduces miss rate (avg 18%) and LLC miss bandwidth (16%-39%)
Comparison against other designs
Fixed Granularity 2X
Sector Cache variants
Multi-$
 
Amoeba Cache
What? 
Enable variable granularity data caching
Why?
Eliminate waste 
How?
Unify tag and data into a single SRAM array
Afforded by recent technology trends
Where?
Definitely at the L2, possibly at the L1
 
Frequently Asked Questions
1. Multiple threads?
2. Compare against other designs
3. Spatial Pattern Predictor
4. Replacement Policy
 
 
Multicore Shared Cache
 
Comparison
Designs: Amoeba Cache, Multi-$, Sector Variants
Criteria: Impact on Miss-Rate, Impact on Bandwidth, Low tag overhead, Tradeoff data and tag space, Dynamically resize blocks
Entries as extracted (cell positions not recoverable): Yes, Yes, ~, ~, No, Yes, No, No, No, No
 
Comparison – Moderate Group – 64K
 
Spatial Pattern Predictor
Predictor History Table
1. PC : Read Addr indexes the table; pattern 0 0 0 1 1 1 0 1
2. Critical Word
Policy Miss vs Policy-Bandwidth
What to do when there is no entry?
 
Predictor Training
Add / update entry on evict
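
A minimal sketch of the predictor's two operations, training on eviction and prediction on a miss, is given below. Several points are assumptions rather than details from the slides: the table is direct-mapped with 1024 entries (the slides describe an 8-way by 128-set table), bit i of a pattern stands for word i, the predicted range is grown around the critical word over contiguous touched words, and an untrained entry falls back to fetching only the critical word. All names are hypothetical.

```c
#include <stdint.h>

enum { TABLE_ENTRIES = 1024, REGION_WORDS = 8 };

struct predictor {
    uint8_t pattern[TABLE_ENTRIES]; /* bit i set = word i was touched */
    uint8_t known[TABLE_ENTRIES];   /* nonzero once an entry is trained */
};

struct pred_range {
    unsigned start, end; /* [start, end) in words */
};

/* Index by PC or region address; a simple modulo stands in for the hash. */
unsigned table_index(uint64_t key) {
    return (unsigned)(key % TABLE_ENTRIES);
}

/* Training: on eviction, record which words of the block were touched. */
void train_on_evict(struct predictor *p, uint64_t key, uint8_t touched) {
    unsigned i = table_index(key);
    p->pattern[i] = touched;
    p->known[i] = 1;
}

/* Prediction: grow a contiguous range around the critical word. */
struct pred_range predict(const struct predictor *p, uint64_t key,
                          unsigned critical) {
    struct pred_range r = { critical, critical + 1 };
    unsigned i = table_index(key);
    if (!p->known[i]) {
        return r; /* assumed fallback: critical word only */
    }
    uint8_t pat = p->pattern[i];
    while (r.start > 0 && (pat & (1u << (r.start - 1)))) {
        r.start--;
    }
    while (r.end < REGION_WORDS && (pat & (1u << r.end))) {
        r.end++;
    }
    return r;
}
```

The fallback for a missing entry is exactly the open question the slide raises; fetching a larger default range would trade bandwidth for miss rate.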
 
Predictor – L1 Miss Rate (1 of 2)
 
Predictor – L1 Miss Rate (2 of 2)
 
Predictor – L1 Miss Bandwidth (1 of 2)
 
Predictor – L1 Miss Bandwidth (2 of 2)
 
Predictor – Summary
For the majority of applications: Region Predictor with a 1024-entry table
Table with 8 ways x 128 sets
PC Predictor is good for 5 applications: apache, art, mcf, lbm and omnetpp
 
Pseudo LRU Replacement
Logically partition the set into N ways
Pick a block at random from the way
Unset the T? (Tag) and V? (Valid) bits
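
The three replacement steps can be sketched on the two bitmaps alone. In this model the victim way is passed in by the caller (the pseudo-LRU choice of way is not modelled), position 0 is the leftmost bit, and a block is taken to be a tag word followed by valid non-tag words. These conventions and the names victim_set, bit_at and evict_from_way are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

enum { SET_WORDS = 8 };

struct victim_set {
    uint8_t valid; /* V? bitmap, position 0 = leftmost bit */
    uint8_t tag;   /* T? bitmap */
};

/* Read the bit at a word position. */
int bit_at(uint8_t bitmap, int pos) {
    return (bitmap >> (SET_WORDS - 1 - pos)) & 1;
}

/* Evict one randomly chosen block whose tag lies in the given logical way.
   Returns the tag position of the victim, or -1 if the way holds no tag. */
int evict_from_way(struct victim_set *s, int way, int n_ways) {
    assert(n_ways > 0 && SET_WORDS % n_ways == 0);
    assert(way >= 0 && way < n_ways);
    int way_words = SET_WORDS / n_ways;
    int first = way * way_words;
    int candidates[SET_WORDS];
    int n = 0;
    for (int pos = first; pos < first + way_words; pos++) {
        if (bit_at(s->tag, pos)) {
            candidates[n++] = pos;
        }
    }
    if (n == 0) {
        return -1;
    }
    int victim = candidates[rand() % n];
    /* Clear the tag word, then data words up to the next tag or free word. */
    int pos = victim;
    do {
        uint8_t m = (uint8_t)(0x80u >> pos);
        s->valid &= (uint8_t)~m;
        s->tag &= (uint8_t)~m;
        pos++;
    } while (pos < SET_WORDS && bit_at(s->valid, pos) && !bit_at(s->tag, pos));
    return victim;
}
```

Because the ways are only a logical partition, a block whose tag sits in one way may extend into the next; the sketch frees the whole block in that case.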
 
Access Distribution for L1
Word distribution for 64K L1
 
Amoeba block size distribution for L1
Block distribution for 64K L1
 
L1 FSM
 
 Miss-Rate ( 64K L1 )
 
 Miss Bandwidth Rate ( 64K L1 )
 
Energy Rate ( L1 + LLC ) – (nJ/KI)
 
Reduction in execution time

The Amoeba Cache introduces adaptive blocks to optimize memory hierarchy utilization, eliminating waste by dynamically adjusting storage allocations. Factors influencing cache efficiency and application-specific behaviors are explored. Images and data distributions illustrate the effectiveness of this innovative approach in improving cache utilization.


Uploaded on Sep 13, 2024



Presentation Transcript


  1. Amoeba-Cache: Adaptive Blocks for Eliminating Waste in the Memory Hierarchy. Snehasish Kumar, Arrvindh Shriraman, Eric Matthews, Lesley Shannon, Hongzhou Zhao, Sandhya Dwarkadas

  2. On-chip Storage

  3. Fixed granularity cache: Tag Array, Data Array

  4. Cache data utilization. Tag Array: Tags. Data Array: Data, Untouched Data. Utilization = fraction of words touched in a cache block at the time of eviction

  5. Cache utilization (64K L1, 4 ways, 64B/block). Figure: utilization (0% to 100%) for apache, eclipse, firefox, canneal, x264, tpcc, lbm, mcf, jbb, h2

  6. Block Distribution (64K, 64B/block). Figure: share of blocks by # words touched (1-2, 3-4, 5-6, 7-8) for Apache, Firefox, Canneal, Eclipse

  7. Block Distribution. Figure: share of Canneal blocks by # words touched (1-2, 3-4, 5-6, 7-8) for 1M, 64B/block and 64K, 64B/block

  8. Factors affecting cache utilization. Application specific behaviour: inefficient data structure access patterns. Interaction with cache geometry: way conflicts reduce block lifetime and cause poor utilization

  9. Application Specific Behaviour: struct TIE { long long X, Y, Z; long long V, H; long long data[3]; } Imperial[1024]; for (int i=0; i<1024; i++) { Imperial[i].X = …; Imperial[i].Y = …; Imperial[i].Z = …; Imperial[i].V = …; } Access in a loop. Data Array: X Y Z V H Data[3]

  10. Cache Geometry: Data Array, 4 ways, blocks 1 to 5. Problem: lots of data map to the same set

  11. Implications: 1. Shrinks effective cache space 2. Increases miss rate 3. Wastes on-chip bandwidth 4. Increases on-chip cache energy consumption

  12. Target Metrics for Amoeba Cache: Bandwidth, Space Utilisation, Miss Rate

  13. Variable Granularity Blocks (Tag Array, Data Array): How to support a variable # of blocks / set? How to support variable granularity for each block?

  14. Our Approach: Amoeba Cache. Unified SRAM Array

  15. Amoeba Cache: Insert, Lookup, Partial Miss, Overheads

  16. SRAM Array. Bitmaps: Valid? and Tag?. Tag (1 word): Region Tag, Start, End. Data Block: 1+ words

  17. Tag - Regions. Memory Region: RMAX bytes. 64 bit address: Region Tag, Set Index, Start / End (3 bits), Byte (3 bits)

  18. Example: struct TIE { long long X, Y, Z; long long V, H; long long data[3]; } Imperial; Imperial.X = … ; Miss: invoke Spatial Granularity Predictor (PC/Region based), fetch Tag X Y Z V

  19. Amoeba Cache Insert (8 words/set). Miss: insert 4+1 words (Tag X Y Z V). Tag? 00000000, Valid? 00000000. 1. substring() search for 00000 gives Pos: 0 in the SRAM Array / Set

  20. Amoeba Cache Insert (8 words/set). 2. Refill Tag X Y Z V into the SRAM Array / Set; Valid? 00000000 becomes 11111000. 3. Tag? 00000000 becomes 10000000

  21. Example: struct TIE { long long X, Y, Z; long long V, H; long long data[3]; } Imperial; Imperial.Y = … ; Lookup data from the cache (cached block: Tag X Y Z V; struct words: X Y Z V H Data[3])

  22. Amoeba Cache Lookup (8 words/set). Address fields: Region Tag, Set Index, Word (W). 1. Read SRAM Array / Set (Tag X Y Z V), Tag? 10000000. 2. Hit? Region == Region Tag and W within Start and End (End > W). 3. Word Selector (2x1 multiplexers) fills the Output Buffer. Critical Path

  23. Partial Miss, Step 1 of 2: Identify Sub-Blocks. 1. Fetch New (Tag X Y Z V), New Tags. 2. Evict Overlap (Tag X Y, Tag V H) to the MSHR

  24. Partial Miss, Step 2 of 2: Insert New Block. 3. Allocate 6 words (Tag X Y Z V H). 4. Patch missing ?s (MSHR: X Y ? V H). 5. Miss. Occurs ≈ 5 in 1000 accesses

  25. Hardware Overheads. Metadata: Valid? and Tag? bitmaps in the SRAM Array; Extra: 1 KB. Amoeba Critical Path: Latency +4%

  26. Evaluation: Parameters for latency and energy; Workloads

  27. Latency Parameters (cycles). Figure: path CPU, 64K L1, 1M LLC (values shown: 3, 20, 300). Fixed Granularity: 1; Amoeba Cache: 1.04 (Latency +4%)

  28. On-Chip Energy Parameters (pJ). 64K L1: Fixed Granularity 101, Amoeba Cache 105. 1M LLC: Fixed Granularity 230, Amoeba Cache 238. ≈ 7 / word

  29. Workloads: 22 diverse workloads from PARSEC, SPEC-CPU 2000 & 2006, DaCapo (Java Benchmarks), and Apache, Firefox and PostgreSQL

  30. Results

  31. % Improvement in L1 Miss-Rate (axis 0% to 40%). Reduces L1 and L2 miss rate by 18%

  32. % Improvement in L1 Miss-Bandwidth (axis 0% to 75%). Reduces on-chip bandwidth by 46%. Reduces off-chip bandwidth by 38%

  33. % Improvement in memory energy (axis 0% to 40%). Reduces energy by 11%

  34. % Improvement in execution time (axis 0% to 20%, one value labelled 21%). Improves performance by 10%

  35. Results Summary. Amoeba-Cache: Reduce cache pollution for applications with low cache utilization. Improve performance for moderate cache utilization. Maintain performance for high cache utilization workloads. Save energy for streaming applications by keeping out unused words

  36. Additional Results. Lookup as an extra cache pipeline stage vs. throttling the CPU: for extra pipeline stage, 8 of 22 applications show improvement. Spatial Granularity Predictor: Indexing: 18 of 22, Address region better; Training: Evictions and First Touch; Table Size: 256 for PC and 1024 for Region

  37. Additional Results. Multicore Shared Cache: reduces miss rate (avg 18%) and LLC miss bandwidth (16%-39%). Comparison against other designs: Fixed Granularity 2X, Sector Cache variants, Multi-$

  38. Amoeba Cache. What? Enable variable granularity data caching. Why? Eliminate waste. How? Unify tag and data into a single SRAM array; afforded by recent technology trends. Where? Definitely at the L2, possibly at the L1

  39. Frequently Asked Questions: 1. Multiple threads? 2. Compare against other designs 3. Spatial Pattern Predictor 4. Replacement Policy

  40. Multicore Shared Cache
      Mix                              Miss T1   Miss T2   Miss T3   Miss T4   BW (All)
      jbb x2, tpc-c x2                 12.38%    12.38%    22.29%    22.37%    39.07%
      Firefox x2, x264 x2              3.82%     3.61%     2.44%     0.43%     15.71%
      cactus, fluid., omnet., sopl.    1.01%     1.86%     22.38%    0.59%     18.62%
      canneal, astar, ferret, milc     4.85%     2.75%     19.39%    4.07%     17.77%

  41. Comparison. Designs: Multi-$, Sector Variants, Amoeba Cache. Criteria: Impact on Miss-Rate, Impact on Bandwidth, Low tag overhead, Tradeoff data and tag space, Dynamically resize blocks. Entries as extracted (cell positions not recoverable): ~, No, No, No, Yes, No, No, ~, Yes, Yes

  42. Comparison, Moderate Group, 64K. Figure: Bandwidth Ratio (0.4 to 1.0) against Miss Rate Ratio (1.0 to 1.6) for Fixed-2X, Sector (x:2.9), Amoeba, Multi$-25, Sector-Pre, Multi$-50

  43. Spatial Pattern Predictor. Predictor History Table (Index: PC / Region; Pattern: 01011111, 00011101). 1. PC : Read Addr indexes the table; pattern 0 0 0 1 1 1 0 1. 2. Critical Word. Policy Miss vs Policy-Bandwidth. What to do when there is no entry?

  44. Predictor Training: add / update entry on evict from the Data Array (Index: PC / Region; Pattern: 01011111, 00011101)

  45. Predictor L1 Miss Rate (1 of 2). Figure: MPKI (0 to 10) for h2, canneal, x264, tpc-c, eclipse, firefox; series: Aligned, Finite, Infinite, Finite+FT, History

  46. Predictor L1 Miss Rate (2 of 2). Figure: MPKI (0 to 140) for mcf, apache, lbm, jbb; series: Aligned, Finite, Infinite, Finite+FT, History

  47. Predictor L1 Miss Bandwidth (1 of 2). Figure: Bandwidth Rate (0 to 1800) for h2, canneal, firefox, x264, tpc-c, eclipse; series: Aligned, Finite, Infinite, Finite+FT, History

  48. Predictor L1 Miss Bandwidth (2 of 2). Figure: Bandwidth Rate (0 to 10000) for mcf, apache, lbm, jbb; series: Aligned, Finite, Infinite, Finite+FT, History

  49. Predictor Summary. For the majority of applications: Region Predictor with a 1024-entry table (table with 8 ways x 128 sets). PC Predictor is good for 5 applications: apache, art, mcf, lbm and omnetpp

  50. Pseudo LRU Replacement. Logically partition the set into N ways (Way 0, Way 1). Pick a block at random from the way. Unset the T? (Tag) and V? (Valid) bits
