Adaptive Insertion Policies for High-Performance Caching
Explore adaptive insertion policies in high-performance caching systems, focusing on mitigating Dead on Arrival (DoA) lines through simple changes to the cache insertion policy. Understanding the two components of cache replacement, victim selection and insertion policy, can significantly improve cache performance for memory-intensive workloads. Learn how the LRU-Insertion Policy (LIP) improves cache utilization and reduces cache thrashing, ensuring efficient data access and retention in the cache.
- Adaptive Insertion Policies
- High-Performance Caching
- Cache Replacement
- LRU-Insertion Policy
- Memory-Intensive Workloads
Presentation Transcript
Adaptive Insertion Policies for High-Performance Caching
Moinuddin K. Qureshi, Yale N. Patt, Aamer Jaleel, Simon C. Steely Jr., Joel Emer
Dead on Arrival (DoA) Lines
DoA lines: lines that go unused between insertion and eviction.
[Figure: % of DoA lines per benchmark]
For the 1MB 16-way L2, 60% of lines are DoA: an ineffective use of cache space.
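As a rough illustration (mine, not from the presentation), one way a simulator could count DoA lines is a per-line reused bit, set on every hit and checked at eviction; the struct and function names below are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t tag;
    bool     valid;
    bool     reused;   /* referenced again after insertion? */
} cache_line_t;

static uint64_t evictions, doa_evictions;

void on_insert(cache_line_t *line, uint64_t tag) {
    line->tag = tag;
    line->valid = true;
    line->reused = false;          /* cleared at insertion */
}

void on_hit(cache_line_t *line) {
    line->reused = true;           /* any reuse clears DoA status */
}

void on_evict(cache_line_t *line) {
    evictions++;
    if (!line->reused)             /* never touched: Dead on Arrival */
        doa_evictions++;
    line->valid = false;
}
```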
Working Set > Cache Size: Example

```c
for (j = 0; j < M; j++)
    for (i = 0; i < LARGE_N; i++)
        a[i] = a[i] * 10;
```

[Figure: array a[] streaming through the cache]
Why DoA Lines?
- Streaming data: never reused, so L2 caches don't help.
- Working set of the application greater than the cache size.
[Figure: misses per 1000 instructions vs. cache size in MB, for art and mcf]
Solution: if the working set > cache size, retain some of the working set.
Working Set > Cache Size: Example

```c
for (j = 0; j < M; j++)
    for (i = 0; i < LARGE_N; i++)
        a[i] = a[i] * 10;
```

[Figure: keep a cache-sized portion of a[] in the cache]
Cache Insertion Policy
Two components of cache replacement:
1. Victim selection: which line to replace for the incoming line? (e.g., LRU, Random, FIFO, LFU)
2. Insertion policy: where is the incoming line placed in the replacement list? (e.g., insert the incoming line at the MRU position)
Simple changes to the insertion policy can greatly improve cache performance for memory-intensive workloads.
LRU-Insertion Policy (LIP)
Recency stack, MRU to LRU: a b c d e f g h
Reference to i with the traditional LRU policy: i a b c d e f g
Reference to i with LIP: a b c d e f g i
LIP chooses the same victim but does NOT promote the incoming line to MRU: lines do not enter non-LRU positions unless they are reused. A sketch of both insertions follows.
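To make the split between victim selection and insertion concrete, here is a minimal C sketch (an illustration, not the authors' code): both policies evict the LRU way, and only the insertion position differs. The `set_t` layout and helper names are assumptions.

```c
#define WAYS 16

/* One set's recency stack: stack[0] is MRU, stack[WAYS-1] is LRU.
   Entries are way indices. (Hypothetical layout for illustration.) */
typedef struct { int stack[WAYS]; } set_t;

/* On a hit, both LRU and LIP promote the touched way to MRU. */
void promote_to_mru(set_t *s, int way) {
    int pos = 0;
    while (s->stack[pos] != way) pos++;           /* locate the way */
    for (; pos > 0; pos--) s->stack[pos] = s->stack[pos - 1];
    s->stack[0] = way;                            /* now MRU */
}

/* Victim selection is identical under both policies: evict the LRU way.
   The policies differ only in where the incoming line is placed. */
int insert_lru(set_t *s) {            /* traditional LRU insertion */
    int victim = s->stack[WAYS - 1];
    promote_to_mru(s, victim);        /* incoming line enters at MRU */
    return victim;
}

int insert_lip(set_t *s) {            /* LRU-Insertion Policy */
    int victim = s->stack[WAYS - 1];
    /* no stack update: incoming line stays at the LRU position */
    return victim;
}
```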
How does it work for our example?

```c
for (j = 0; j < M; j++)
    for (i = 0; i < LARGE_N; i++)
        a[i] = a[i] * 10;
```

Assume the cache is empty. The first pass of accesses fills all ways, and those lines are then kept in the cache. Thrashing occurs only in the last (LRU) way of each set.
What about a change in working set?
First the loop streams over a[], followed by b[], and we would like to keep each in the cache in turn.
[Figure: cache holding a[] while b[] arrives]
Under LIP, a[] occupies N-1 of the N ways in each set and never leaves; b[] does not stand a chance.
Bimodal-Insertion Policy (BIP)
Consider two streaming working sets back to back: LIP never ages out the older lines.
Idea: infrequently insert lines at the MRU position. Let ε be the bimodal throttle parameter: if (rand() < ε), insert at the MRU position; else, insert at the LRU position.
For small ε, BIP retains the thrashing protection of LIP while responding to changes in the working set; a sketch follows.
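A short C sketch of the BIP decision, reusing the hypothetical `set_t` helpers from the LIP sketch above; ε = 1/32 here matches the throttle used on the results slide.

```c
#include <stdlib.h>

/* Uses set_t, WAYS, and promote_to_mru from the LIP sketch above. */
#define EPSILON_RECIP 32               /* epsilon = 1/32 */

int insert_bip(set_t *s) {
    int victim = s->stack[WAYS - 1];
    if (rand() % EPSILON_RECIP == 0)   /* with probability epsilon */
        promote_to_mru(s, victim);     /* rare MRU insertion lets old lines age out */
    /* otherwise behave like LIP: incoming line stays at LRU */
    return victim;
}
```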
Results for LIP and BIP
[Figure: % reduction in L2 MPKI for LIP and BIP (ε = 1/32) across benchmarks]
Changing the insertion policy increases misses for LRU-friendly workloads.
Bigger Lesson
Interesting programs run for a long time: billions of instructions per second, several orders of magnitude larger than your cache size. You don't have to rush to do the right thing immediately; even infrequent changes will eventually affect the whole cache.
Dynamic-Insertion Policy (DIP)
Two types of workloads: LRU-friendly or BIP-friendly. DIP can be implemented by:
1. Monitoring both policies (LRU and BIP)
2. Choosing the best-performing policy
3. Applying that policy to the cache
This needs a cost-effective implementation: Set Dueling.
DIP via Set Dueling
Divide the cache into three groups: dedicated LRU sets, dedicated BIP sets, and follower sets (which follow the winner of LRU vs. BIP).
A single n-bit saturating counter does the monitoring: a miss in the dedicated LRU sets increments the counter; a miss in the dedicated BIP sets decrements it. The counter's MSB then decides the policy for the follower sets: MSB = 0, use LRU; MSB = 1, use BIP. One counter monitors, chooses, and applies, as in the sketch below.
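Below is a hedged C sketch of set dueling with a single saturating counter (the selection counter). The counter width and the mapping from set index to dedicated or follower groups are illustrative assumptions, not the paper's exact layout.

```c
#include <stdbool.h>
#include <stdint.h>

#define PSEL_BITS 10
#define PSEL_MAX  ((1u << PSEL_BITS) - 1)

static uint32_t psel = PSEL_MAX / 2;   /* n-bit saturating counter */

typedef enum { DEDICATED_LRU, DEDICATED_BIP, FOLLOWER } group_t;

/* Assumed mapping: a couple of sets out of every 32 are dedicated. */
group_t set_group(int set_index) {
    if (set_index % 32 == 0) return DEDICATED_LRU;
    if (set_index % 32 == 1) return DEDICATED_BIP;
    return FOLLOWER;
}

/* Monitor: misses in the dedicated sets move the counter. */
void on_miss(int set_index) {
    group_t g = set_group(set_index);
    if (g == DEDICATED_LRU && psel < PSEL_MAX) psel++;
    if (g == DEDICATED_BIP && psel > 0)        psel--;
}

/* Choose and apply: the MSB decides the followers' policy. */
bool use_bip(int set_index) {
    switch (set_group(set_index)) {
    case DEDICATED_LRU: return false;
    case DEDICATED_BIP: return true;
    default:            /* follower: MSB = 1 means LRU misses more */
        return (psel >> (PSEL_BITS - 1)) & 1;
    }
}
```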
Results for DIP
[Figure: % reduction in L2 MPKI for BIP and DIP (32 dedicated sets) across benchmarks]
DIP reduces average MPKI by 21% and requires less than two bytes of storage overhead.