Adaptive Insertion Policies for High-Performance Caching

Moinuddin K. Qureshi, Yale N. Patt, Aamer Jaleel, Simon C. Steely Jr., Joel Emer
Dead on Arrival (DoA) Lines
DoA lines: lines unused between insertion and eviction.
For the 1MB 16-way L2, 60% of lines are DoA: an ineffective use of cache space.
[Figure: (%) DoA lines per benchmark]
Working Set > Cache Size: Example
for (j = 0; j < M; j++)
   for (i = 0; i < LARGE_N; i++)
      a[i] = a[i] * 10;
[Figure: array a[] streaming through the cache]
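The loop above sweeps an array larger than the cache over and over. A small simulation (illustrative only: a fully associative cache and the helper name `lru_hit_rate` are my own, not from the talk) shows why LRU gets nothing out of such a sweep:

```python
from collections import OrderedDict

def lru_hit_rate(num_lines, working_set, passes):
    """Simulate a fully associative LRU cache of `num_lines` lines while
    cyclically streaming over `working_set` distinct lines, `passes` times."""
    cache = OrderedDict()              # keys kept in LRU -> MRU order
    hits = accesses = 0
    for _ in range(passes):
        for line in range(working_set):
            accesses += 1
            if line in cache:
                hits += 1
                cache.move_to_end(line)        # hit: promote to MRU
            else:
                if len(cache) >= num_lines:
                    cache.popitem(last=False)  # miss: evict the LRU line
                cache[line] = True
    return hits / accesses

# Working set fits: after the first pass, every access hits.
print(lru_hit_rate(num_lines=1024, working_set=512, passes=4))   # 0.75
# Working set barely exceeds capacity: cyclic eviction, zero hits.
print(lru_hit_rate(num_lines=1024, working_set=1025, passes=4))  # 0.0
```

The second call is the pathology from the slide: each line is evicted just before it is needed again, so even one extra line beyond capacity drives the hit rate to zero.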
Why DoA Lines?
- Streaming data: never reused, so L2 caches don't help.
- Working set of the application greater than the cache size.
[Figure: misses per 1000 instructions vs. cache size (MB) for art and mcf]
Solution: if working set > cache size, retain some of the working set.
Working Set > Cache Size: Example
for (j = 0; j < M; j++)
   for (i = 0; i < LARGE_N; i++)
      a[i] = a[i] * 10;
Goal: keep a cache-sized portion of a[] in the cache.
Cache Insertion Policy
Simple changes to the insertion policy can greatly improve cache performance for memory-intensive workloads.
Two components of cache replacement:
1. Victim selection: which line to replace for the incoming line? (e.g. LRU, Random, FIFO, LFU)
2. Insertion policy: where is the incoming line placed in the replacement list? (e.g. insert the incoming line at the MRU position)
LRU-Insertion Policy (LIP)
Choose a victim, but do NOT promote the incoming line to MRU.
Recency stack (MRU ... LRU): a b c d e f g h
- Reference to i with the traditional LRU policy: i a b c d e f g
- Reference to i with LIP: a b c d e f g i
Lines do not enter non-LRU positions unless they are reused.
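The LRU-versus-LIP contrast can be sketched as a one-set recency-stack model (my own toy code, not the paper's hardware; `simulate` and the trace are illustrative):

```python
def simulate(ways, trace, lip):
    """One cache set; `stack` holds lines from MRU (front) to LRU (back).
    With lip=True, incoming lines are inserted at the LRU position."""
    stack, hits = [], 0
    for line in trace:
        if line in stack:
            hits += 1
            stack.remove(line)
            stack.insert(0, line)      # hit: promote to MRU (both policies)
        else:
            if len(stack) >= ways:
                stack.pop()            # evict the LRU line (back of stack)
            if lip:
                stack.append(line)     # LIP: insert at the LRU position
            else:
                stack.insert(0, line)  # LRU: insert at the MRU position
    return hits

# Cyclic working set of 20 lines vs. 16 ways, 100 passes.
trace = [x for _ in range(100) for x in range(20)]
print(simulate(16, trace, lip=False))  # LRU thrashes: 0 hits
print(simulate(16, trace, lip=True))   # LIP keeps 15 lines resident: 1485 hits
```

With LIP, the first 15 lines inserted are reused every pass and get promoted, so only the last way of each set thrashes, exactly the behavior the next slide describes.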
How it works for our example?
for (j = 0; j < M; j++)
   for (i = 0; i < LARGE_N; i++)
      a[i] = a[i] * 10;
Assume the cache is empty. The first set of accesses fills all the ways; after that, thrashing occurs only on the last way of each set, so the rest of a[] stays in the cache.
What about a change in working set?
First stream over a[] (which we want kept in the cache), followed by b[] (which we now want kept instead).
With LIP, a[] will occupy N-1 ways of each set and will never leave. b[] does not stand a chance.
Bimodal-Insertion Policy (BIP)
LIP does not age older lines (think of two streaming working sets, back to back).
BIP: infrequently insert lines at the MRU position.
Let ε = bimodal throttle parameter:
if ( rand() < ε )
   Insert at MRU position;
else
   Insert at LRU position;
For small ε, BIP retains the thrashing protection of LIP while responding to changes in the working set.
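A sketch of that rule on the back-to-back streaming scenario (again a toy one-set model with names of my own choosing; ε = 0 degenerates to LIP):

```python
import random

def bip_hits(ways, trace, epsilon):
    """One cache set with BIP insertion: on a miss, insert at MRU with
    probability epsilon, otherwise at LRU. Hits always promote to MRU."""
    stack, hits = [], 0                # front = MRU, back = LRU
    for line in trace:
        if line in stack:
            hits += 1
            stack.remove(line)
            stack.insert(0, line)
        else:
            if len(stack) >= ways:
                stack.pop()
            if random.random() < epsilon:
                stack.insert(0, line)  # rare MRU insertion ages old lines
            else:
                stack.append(line)     # common case: LRU insertion, as in LIP
    return hits

random.seed(0)
phase_a = [("a", x) for _ in range(200) for x in range(20)]  # stream a[]
phase_b = [("b", x) for _ in range(200) for x in range(20)]  # then b[]
lip_like = bip_hits(16, phase_a + phase_b, epsilon=0.0)   # pure LIP: a[] never leaves
adaptive = bip_hits(16, phase_a + phase_b, epsilon=1/32)
print(lip_like, adaptive)  # BIP recovers many hits once b[] replaces a[]
```

With ε = 0, b[] gets zero hits: a[] holds 15 of the 16 ways forever. With ε = 1/32, roughly one miss in 32 lands at MRU, so b[] gradually displaces a[] and starts hitting.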
Results for LIP and BIP
[Figure: (%) reduction in L2 MPKI for LIP and BIP (ε = 1/32)]
Changes to the insertion policy increase misses for LRU-friendly workloads.
Bigger Lesson
Interesting programs run for a long time: billions of instructions per second, several orders of magnitude larger than your cache size.
You don't have to rush to do the "right thing" immediately; even "infrequent" changes will eventually affect the whole cache.
Dynamic-Insertion Policy (DIP)
Two types of workloads: LRU-friendly or BIP-friendly.
DIP can be implemented by:
1. Monitor both policies (LRU and BIP)
2. Choose the best-performing policy
3. Apply the best policy to the cache
This needs a cost-effective implementation: Set Dueling.
DIP via "Set Dueling"
Divide the cache in three:
- Dedicated LRU sets
- Dedicated BIP sets
- Follower sets (use the winner of LRU vs. BIP)
An n-bit saturating counter tracks which policy misses more:
- a miss in an LRU set: counter++
- a miss in a BIP set: counter--
The counter's MSB decides the policy for the follower sets:
- MSB = 0: use LRU
- MSB = 1: use BIP
Monitor, choose, and apply, using a single counter.
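The counter logic above can be sketched as follows (the class name, the default sizes, and the static choice of which sets are dedicated are illustrative; the paper's hardware picks dedicated sets differently):

```python
class SetDueling:
    """Set-dueling monitor: a single n-bit saturating counter, updated only
    on misses in the dedicated sets; its MSB picks the follower policy."""
    def __init__(self, n_bits=10, dedicated=32):
        self.n_bits = n_bits
        self.max = (1 << n_bits) - 1
        self.counter = 0                  # MSB = 0: followers start on LRU
        # simple static mapping for illustration:
        self.lru_sets = set(range(dedicated))
        self.bip_sets = set(range(dedicated, 2 * dedicated))

    def on_miss(self, set_index):
        if set_index in self.lru_sets:
            self.counter = min(self.counter + 1, self.max)  # charged to LRU
        elif set_index in self.bip_sets:
            self.counter = max(self.counter - 1, 0)         # charged to BIP

    def policy(self, set_index):
        if set_index in self.lru_sets:
            return "LRU"                  # dedicated sets always use their policy
        if set_index in self.bip_sets:
            return "BIP"
        msb = self.counter >> (self.n_bits - 1)
        return "BIP" if msb else "LRU"    # MSB = 1 means LRU missed more

duel = SetDueling()
print(duel.policy(100))      # followers start on "LRU"
for _ in range(600):
    duel.on_miss(0)          # the LRU dedicated sets keep missing...
print(duel.policy(100))      # ...so the followers switch to "BIP"
```

The storage cost is just the counter plus the (implicit) mapping of dedicated sets, which is where the "< two bytes" overhead on the next slide comes from.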
Results for DIP
[Figure: (%) reduction in L2 MPKI for BIP and DIP (32 dedicated sets)]
DIP reduces average MPKI by 21% and requires less than two bytes of storage overhead.
