
ChargeCache: Reducing DRAM Latency by Exploiting Row Access Locality

Hasan Hassan, Gennady Pekhimenko, Nandita Vijaykumar, Vivek Seshadri, Donghyuk Lee, Oguz Ergin, Onur Mutlu
Executive Summary

Goal: Reduce average DRAM access latency with no modification to the existing DRAM chips.

Observations:
1) A highly-charged DRAM row can be accessed with low latency
2) A row's charge is restored when the row is accessed
3) A recently-accessed row is likely to be accessed again: Row Level Temporal Locality (RLTL)

Key Idea: Track recently-accessed DRAM rows and use lower timing parameters if such rows are accessed again.

ChargeCache:
- Low cost & no modifications to the DRAM
- Higher performance (8.6-10.6% on average for 8-core)
- Lower DRAM energy (7.9% on average)
Outline
1. DRAM Operation Basics
2. Accessing Highly-charged Rows
3. Row Level Temporal Locality (RLTL)
4. ChargeCache
5. Evaluation
6. Conclusion
DRAM Stores Data as Charge

A DRAM cell stores data as charge, and accessing it moves that charge in three steps:
1. Sensing
2. Restore
3. Precharge
(Figure: charge movement over time between the cell and the sense-amplifier.)
DRAM Charge over Time

(Figure: charge level of the cell and sense-amplifier over time.) After ACT, the Sensing phase runs until tRCD, at which point the row is ready to access (R/W). The Restore phase continues until tRAS, at which point the row is ready to precharge (PRE).
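As a rough illustration of these timing constraints (a sketch, not anything from the talk), a memory controller can be modeled as refusing commands until tRCD and tRAS have elapsed since the activation; the cycle values below are the defaults quoted later in the deck:

```python
# Illustrative sketch: how a memory controller might enforce the tRCD and
# tRAS constraints between DRAM commands to one bank.
TRCD = 11  # ACT -> first R/W: wait for sensing to complete
TRAS = 28  # ACT -> PRE: wait for restore to complete

class BankTimingChecker:
    def __init__(self):
        self.act_cycle = None  # cycle of the last ACT; None if precharged

    def can_issue(self, cmd, cycle):
        if cmd == "ACT":
            return self.act_cycle is None          # bank must be precharged
        if cmd == "RW":
            return self.act_cycle is not None and cycle >= self.act_cycle + TRCD
        if cmd == "PRE":
            return self.act_cycle is not None and cycle >= self.act_cycle + TRAS
        raise ValueError(cmd)

    def issue(self, cmd, cycle):
        assert self.can_issue(cmd, cycle), f"{cmd} violates timing at cycle {cycle}"
        if cmd == "ACT":
            self.act_cycle = cycle
        elif cmd == "PRE":
            self.act_cycle = None

bank = BankTimingChecker()
bank.issue("ACT", 0)
assert not bank.can_issue("RW", 10)   # sensing not complete before tRCD
bank.issue("RW", 11)                  # ready to access at tRCD
assert not bank.can_issue("PRE", 27)  # restore not complete before tRAS
bank.issue("PRE", 28)                 # ready to precharge at tRAS
```

Lowering TRCD and TRAS for a given activation is exactly the lever the rest of the deck exploits.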
Accessing Highly-charged Rows

(Figure: when an activated row is still highly charged, sensing completes sooner, so R/W and PRE can be issued earlier than the default tRCD and tRAS allow.)
Observation 1

A highly-charged DRAM row can be accessed with low latency:
- tRCD: 44%
- tRAS: 37%

How does a row become highly-charged?
How Does a Row Become Highly-Charged?

DRAM cells lose charge over time. There are two ways of restoring a row's charge:
- Refresh operation
- Access
(Figure: cell charge over time, replenished by each Refresh and each Access.)
Observation 2

A row's charge is restored when the row is accessed.

How likely is a recently-accessed row to be accessed again?
Row Level Temporal Locality (RLTL)

A recently-accessed DRAM row is likely to be accessed again.

t-RLTL: Fraction of rows that are accessed within time t after their previous access.
- 8ms-RLTL for single-core workloads: 86%
- 8ms-RLTL for eight-core workloads: 97%
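The t-RLTL metric above can be sketched in code. This is a minimal sketch with a hypothetical trace format, under one plausible reading of the definition (counting, per access, whether the row was touched within t of its previous access):

```python
# Sketch: compute t-RLTL from a row-access trace.
# trace: list of (time, row) pairs in time order; the trace format is an
# assumption for illustration, not from the paper.
def t_rltl(trace, t):
    last_access = {}
    hits = 0
    for time, row in trace:
        # Count this access if the same row was touched at most t ago.
        if row in last_access and time - last_access[row] <= t:
            hits += 1
        last_access[row] = time
    return hits / len(trace)

trace = [(0, "A"), (1, "B"), (3, "A"), (100, "A"), (101, "C")]
print(t_rltl(trace, 8))   # only the access at time 3 re-touches a row within t=8
```

With t = 8 ms of simulated time, the deck reports this fraction reaching 86% (single-core) and 97% (eight-core).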
Summary of the Observations

1. A highly-charged DRAM row can be accessed with low latency
2. A row's charge is restored when the row is accessed
3. A recently-accessed DRAM row is likely to be accessed again: Row Level Temporal Locality (RLTL)
Key Idea

Track recently-accessed DRAM rows and use lower timing parameters if such rows are accessed again.
ChargeCache Overview

The memory controller keeps a ChargeCache of recently-accessed row addresses and consults it on each request:
- ChargeCache Miss: use Default timings
- ChargeCache Hit: use Lower timings
(Figure: among requests to rows A-F, the requests to rows A and D hit in the ChargeCache; the rest miss.)
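The hit/miss decision above can be sketched as a small software model. This is an illustrative sketch, not the paper's hardware design: the LRU replacement policy and entry format are assumptions, and it omits the invalidation of aged entries covered in the backup slides. The timing values are the defaults and on-hit values quoted later in the deck:

```python
# Minimal model of the ChargeCache lookup in the memory controller.
from collections import OrderedDict

DEFAULT = {"tRCD": 11, "tRAS": 28}  # default timings (cycles)
LOWERED = {"tRCD": 7,  "tRAS": 20}  # timings used on a ChargeCache hit

class ChargeCache:
    def __init__(self, capacity=128):
        self.capacity = capacity
        self.rows = OrderedDict()  # recently-accessed row addresses (LRU order)

    def access(self, row):
        """Return the timing parameters to use for this row activation."""
        hit = row in self.rows
        if hit:
            self.rows.move_to_end(row)  # refresh LRU position
        else:
            if len(self.rows) >= self.capacity:
                self.rows.popitem(last=False)  # evict least recently used
            self.rows[row] = True
        return LOWERED if hit else DEFAULT

cc = ChargeCache(capacity=2)
print(cc.access(0xA))  # miss -> default timings
print(cc.access(0xA))  # hit  -> lowered timings
```

Because the table stores only row addresses, it stays small (the deck reports ~5KB for 128 entries), which is what keeps the mechanism low-cost.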
Area and Power Overhead

Modeled with CACTI.
- Area: ~5KB for a 128-entry ChargeCache; 0.24% of a 4MB Last-Level Cache (LLC) area
- Power consumption: 0.15 mW on average (static + dynamic); 0.23% of the 4MB LLC power consumption
Methodology

Simulator:
- DRAM simulator (Ramulator [Kim+, CAL'15]): https://github.com/CMU-SAFARI/ramulator

Workloads:
- 22 single-core workloads (SPEC CPU2006, TPC, STREAM)
- 20 multi-programmed 8-core workloads, built by randomly choosing from the single-core workloads
- Execute at least 1 billion representative instructions per core (Pinpoints)

System Parameters:
- 1/8-core system with 4MB LLC
- Default tRCD/tRAS of 11/28 cycles
Mechanisms Evaluated

Non-Uniform Access Time Memory Controller (NUAT) [Shin+, HPCA'14]
- Key idea: access only recently-refreshed rows with lower timing parameters
- Recently-refreshed rows can be accessed faster
- Only a small fraction (10-12%) of accesses go to recently-refreshed rows

ChargeCache
- Recently-accessed rows can be accessed faster
- A large fraction (86-97%) of accesses go to recently-accessed rows (RLTL)
- 128 entries per core; on hit: tRCD-7, tRAS-20 cycles

Upper Bound: Low-Latency DRAM
- Works as ChargeCache with a 100% hit ratio
- On all DRAM accesses: tRCD-7, tRAS-20 cycles
Single-core Performance

(Figure: speedup of NUAT, ChargeCache, LL-DRAM (upper bound), and ChargeCache + NUAT over the baseline.) ChargeCache improves single-core performance.

Eight-core Performance

(Figure: speedup over the baseline, with annotated values of 2.5%, 9%, and 13%.) ChargeCache significantly improves multi-core performance.

DRAM Energy Savings

(Figure: average and maximum DRAM energy reduction for single-core and eight-core systems.) ChargeCache reduces DRAM energy.
Other Results In The Paper

- Detailed analysis of the Row Level Temporal Locality phenomenon
- ChargeCache hit-rate analysis
- Sensitivity studies:
  - Sensitivity to t in t-RLTL
  - ChargeCache capacity
Conclusion

ChargeCache reduces average DRAM access latency at low cost.

Observations:
1) A highly-charged DRAM row can be accessed with low latency
2) A row's charge is restored when the row is accessed
3) A recently-accessed row is likely to be accessed again: Row Level Temporal Locality (RLTL)

Key Idea: Track recently-accessed DRAM rows and use lower timing parameters if such rows are accessed again.

ChargeCache:
- Low cost & no modifications to the DRAM
- Higher performance (8.6-10.6% on average for 8-core)
- Lower DRAM energy (7.9% on average)

Source code will be available in May: https://github.com/CMU-SAFARI
Backup Slides

Detailed Design

The Highly-charged Row Address Cache (HCRAC):
1. PRE: insert the row address
2. ACT: look up the address
3. Invalidation mechanism
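The three steps above can be sketched as follows. This is a software sketch of the backup-slide design, not the actual hardware: the caching-duration value and the eviction policy are assumptions for illustration only.

```python
# Sketch of the HCRAC: insert on PRE, look up on ACT, invalidate aged entries.
CACHING_DURATION = 1_000_000  # cycles; an assumed value, not from the paper

class HCRAC:
    def __init__(self, capacity=128):
        self.capacity = capacity
        self.entries = {}  # row address -> cycle at which it was inserted

    def on_precharge(self, row, cycle):
        """Step 1: on PRE, insert the row address (it was just restored)."""
        if len(self.entries) >= self.capacity and row not in self.entries:
            oldest = min(self.entries, key=self.entries.get)
            del self.entries[oldest]  # assumed policy: evict the oldest entry
        self.entries[row] = cycle

    def on_activate(self, row, cycle):
        """Step 2: on ACT, look up the address; True means lowered timings."""
        inserted = self.entries.get(row)
        return inserted is not None and cycle - inserted <= CACHING_DURATION

    def invalidate_expired(self, cycle):
        """Step 3: drop entries whose row may no longer be highly charged."""
        self.entries = {r: c for r, c in self.entries.items()
                        if cycle - c <= CACHING_DURATION}
```

Invalidation matters for correctness: once an entry ages past the caching duration, the row has leaked enough charge that the lowered timings would no longer be safe.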
Further backup slides: RLTL Distribution, Sensitivity on Capacity, Hit-rate Analysis, Sensitivity on t-RLTL.
Slide Note

Hello, my name is Hasan Hassan. Today, I will present our work, ChargeCache. This work is done in collaboration with co-authors from Carnegie Mellon University and TOBB University of Economics & Technology.
