Enhancing Memory Cache Efficiency with DRAM Compression Techniques
This presentation examines the bandwidth wall confronting Moore's Law scaling and solutions such as 3D-DRAM caches and compressed memory systems. It shows how compressing DRAM caches can improve both bandwidth and capacity, boosting performance in memory-intensive applications, and discusses practical implementations and benefits of DRAM caching.
Presentation Transcript
DICE: Compressing DRAM Caches for Bandwidth and Capacity
Vinson Young, Prashant Nair, Moinuddin Qureshi
MOORE'S LAW HITS THE BANDWIDTH WALL
Moore's-Law scaling is running into a bandwidth wall.
3D-DRAM MITIGATES THE BANDWIDTH WALL
3D-DRAM: Hybrid Memory Cube (HMC) from Micron, High Bandwidth Memory (HBM) from Samsung.
3D-DRAM improves bandwidth, but does not have the capacity to replace conventional DIMM memory.
3D-DRAM AS A CACHE (3D-DRAM CACHE)
[Figure: memory hierarchy from fast to slow: CPU, L1$, L2$, L3$, 3D-DRAM cache, system memory (OS-visible space)]
Examples: MCDRAM from Intel, HBC from AMD.
Architecting 3D-DRAM as a cache can improve memory bandwidth (and avoid OS/software changes).
PRACTICAL 3D-DRAM CACHE: ALLOY CACHE
Tags are kept as part of the line (alloyed tag+data), so one combined tag+data access avoids tag serialization.
Similar to the DRAM cache in KNL: direct-mapped, tags stored in ECC bits.
A practical DRAM cache: low latency and bandwidth-efficient.
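The alloyed tag+data idea above can be illustrated with a toy model. This is a minimal sketch, not the KNL or Alloy Cache implementation: the slot count and the tag derivation from the line address are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class TAD:
    """Tag-And-Data unit: an 8B tag alloyed with its 64B data block."""
    tag: int
    data: bytes

class AlloyCache:
    """Toy direct-mapped Alloy-style cache. Each access performs one
    combined tag+data (TAD) read, so there is no separate, serialized
    tag lookup before the data fetch."""
    def __init__(self, num_sets=1024):
        self.num_sets = num_sets
        self.sets = [None] * num_sets  # one TAD slot per set

    def access(self, line_addr):
        tad = self.sets[line_addr % self.num_sets]  # single TAD read
        if tad is not None and tad.tag == line_addr // self.num_sets:
            return tad.data  # hit: tag matched
        return None          # miss

    def install(self, line_addr, data):
        self.sets[line_addr % self.num_sets] = TAD(
            tag=line_addr // self.num_sets, data=data)
```

Because the tag travels with the data, a hit costs exactly one DRAM access, which is what makes the design bandwidth-efficient.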
3D-DRAM CACHE BANDWIDTH IS IMPORTANT
[Figure: speedup on an 8-CPU, 1GB DRAM cache configuration]
A 2x-capacity cache improves performance by 10%, and an additional 2x bandwidth increases the speedup to 22%. Improving both bandwidth and capacity is valuable.
INTRODUCTION: DRAM CACHE
Baseline: direct-mapped, one data block per access.
[Figure: baseline layout with lines A, B, C, D, one per set]
INTRODUCTION: COMPRESSED DRAM CACHE
Compression adds capacity, but does it improve bandwidth?
[Figure: four layouts, traditional compression and spatial indexing, each with compressible and incompressible lines A, B, C, D and W, X, Y, Z]
Traditional compression packs compressed lines into the same sets, so reading four lines still takes four accesses: 1x bandwidth, at 1x-2x capacity.
Spatial indexing co-locates neighboring lines, so one access fetches two compressed lines: 2x bandwidth. But when lines are incompressible (B and D spill elsewhere), extra accesses are needed: less than 1x bandwidth.
INTRODUCTION: COMPRESSED DRAM CACHE
Compression adds capacity, but does it improve bandwidth?
Traditional Compression: 1x bandwidth (compressible), 1x bandwidth (incompressible).
Spatial Indexing: 2x bandwidth (compressible), less than 1x bandwidth (incompressible).
INTRODUCTION: TRADITIONAL COMPRESSION
[Figure: speedup of traditional compression]
Traditional compression improves capacity with no degradation, but compression for capacity (TSI) sees little speedup (7%) due to diminishing returns on giga-scale caches.
INTRODUCTION: SPATIAL INDEXING
[Figure: speedup of spatial indexing]
Spatial indexing improves bandwidth: it gets both the bandwidth and capacity benefits when lines are compressible. But it hurts performance when lines are incompressible.
INTRODUCTION: COMPRESSED DRAM CACHE
Goal: compression for capacity AND bandwidth.
Traditional Compression gives 1x bandwidth whether lines are compressible or not; Spatial Indexing gives 2x bandwidth when compressible but less than 1x when incompressible.
DICE (Dynamic Index): 19% speedup and 36% EDP improvement.
DICE OVERVIEW
Compressed DRAM Cache Organization
Flexible Mapping for Quick Switching
Dynamic Indexing ComprEssion (DICE): Insertion Policy, Index Prediction
PRACTICAL DRAM CACHE COMPRESSION
[Figure: on-chip L4 cache controller between the L3 cache and the off-chip compressed DRAM cache, with compression logic on the writeback/install path and decompression logic on the read path]
Compression requires only simple changes within the controller.
DRAM CACHE TAG FORMAT
[Figure: 64 bytes of data (Data A) plus 8 bytes of tag (Tag A), separated by the tag boundary]
The cache controller receives 72B of tag+data and can flexibly interpret bits as tag bits or data bits.
PROPOSED FLEXIBLE TAG FORMAT
[Figure: a slot holding multiple tags (A, B, X, I) and compressed data blocks (B, A), with "Is Tag?" bits marking the tag boundary]
We create tag space as needed, for up to 28 lines. This achieves 1.6x effective capacity.
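The flexible tag format above can be sketched with a simplified packing model. This is an illustration under stated assumptions, not the paper's exact layout: we assume a 2KB row of 28 slots of 72B, an 8B tag per line, and a hard cap of 28 lines of tag space, and we ignore alignment details.

```python
ROW_BYTES = 28 * 72  # a 2KB DRAM row seen as 28 slots of 72B tag+data
TAG_BYTES = 8        # per-line tag
MAX_LINES = 28       # tag space is created as needed, for up to 28 lines

def lines_per_row(compressed_sizes):
    """Simplified packing model for the flexible tag format: each
    inserted line consumes an 8B tag plus its compressed data, with
    tag space carved out of the row on demand. Returns how many of
    the given lines fit in one row."""
    used = count = 0
    for size in compressed_sizes:
        if count == MAX_LINES or used + TAG_BYTES + size > ROW_BYTES:
            break
        used += TAG_BYTES + size
        count += 1
    return count
```

With incompressible 64B lines each entry costs a full 72B slot, while compressible lines pack more entries into the same row, which is where the extra effective capacity comes from.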
FLEXIBLE MAPPING (TSI OR BAI)
[Figure: line-to-set mappings of lines 0-7 under Naive Spatial Indexing, Bandwidth-Aware Indexing (BAI), and Traditional Set Indexing (TSI)]
Bandwidth-Aware Indexing (BAI) facilitates quick switching between the two indices, TSI and BAI.
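The two index functions can be sketched as below. The formulas are hypothetical, chosen only to illustrate the key property: BAI co-locates a pair of lines so one access can return both when they compress 2:1, while keeping half of the lines in the same set they would occupy under TSI, which is what makes switching between the indices cheap. The paper's exact bit selection differs.

```python
NUM_SETS = 8                 # toy number of 72B set slots
PAIR_STRIDE = NUM_SETS // 2  # lines a and a+PAIR_STRIDE pair up under BAI

def tsi(line_addr):
    """Traditional Set Indexing: one line per set slot."""
    return line_addr % NUM_SETS

def bai(line_addr):
    """Bandwidth-Aware Indexing (illustrative formula): co-locates
    lines a and a+PAIR_STRIDE so a single access can fetch both when
    they are compressible. For the first half of the lines the BAI
    set equals the TSI set, enabling quick switching."""
    return line_addr % PAIR_STRIDE
```

Each line thus has only two candidate locations (its TSI set and its BAI set), so a lookup never has to search the whole cache.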
DICE: DYNAMIC-INDEXED COMPRESSED CACHE
[Figure: on install, the DRAM cache chooses between the Traditional Set Index and the Bandwidth-Aware Index via compressibility-based insertion; on read, a cache index predictor picks which index to try]
DICE (Dynamic-Indexing Cache comprEssion) decides the index on install and predicts the index on read.
COMPRESSIBILITY-BASED INSERTION
Compressibility-based insertion uses Bandwidth-Aware Indexing when lines are compressible (compressed size at most half a line), and TSI otherwise (compressed size more than half a line). There are no explicit swaps: eviction and install decide the policy. On a read, checking both candidate locations would waste bandwidth, which is why the index must be predicted.
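The insertion decision can be sketched in a few lines. The half-line threshold is our reading of the slide (two compressed lines must fit in one 64B data slot for BAI to pay off); the exact cutoff in the paper may differ.

```python
LINE_SIZE = 64  # bytes per cache line

def choose_index(compressed_size):
    """Compressibility-based insertion: pick Bandwidth-Aware Indexing
    only when the line compresses to half size or less, so that a
    pair of lines fits in one data slot; otherwise fall back to
    Traditional Set Indexing."""
    return "BAI" if compressed_size <= LINE_SIZE // 2 else "TSI"
```

Because the decision is made per install, there is no background migration: a line simply lands in the set its chosen index dictates.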
SIMILAR INTRA-PAGE COMPRESSIBILITY
[Figure: indices seen in a compressible page (all Bandwidth-Aware Index) versus an incompressible page (all Traditional Set Index)]
Lines within a page have similar compressibility, so DICE is likely to install the lines of a page with the same index: a compressible page sees BAI installs and reads, while an incompressible page sees TSI installs and reads. A second access is needed only on a misprediction. Thus, page-based last-time prediction of the index can be accurate (94%).
PAGE-BASED CACHE INDEX PREDICTOR (CIP)
[Figure: the page number of a demand access is hashed into a Last-Time Table (LTT) of one-bit entries; 0 = Traditional Set Index, 1 = Bandwidth-Aware Index]
Page-based last-time prediction exploits similar intra-page compressibility to achieve high prediction accuracy (94%).
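A minimal sketch of the predictor follows. The table size and hash function are illustrative assumptions, not the paper's configuration; only the one-bit last-time scheme is taken from the slide.

```python
class CacheIndexPredictor:
    """Page-based Cache Index Predictor (sketch): a Last-Time Table
    (LTT) of one-bit entries, indexed by a hash of the page number.
    0 = Traditional Set Index, 1 = Bandwidth-Aware Index."""
    def __init__(self, entries=4096):
        self.entries = entries
        self.ltt = [0] * entries  # default: predict TSI

    def _slot(self, page_num):
        return hash(page_num) % self.entries  # illustrative hash

    def predict(self, page_num):
        return "BAI" if self.ltt[self._slot(page_num)] else "TSI"

    def record(self, page_num, used_bai):
        # Last-time update: remember the index this page actually used,
        # so the next access to the page predicts the same index.
        self.ltt[self._slot(page_num)] = 1 if used_bai else 0
```

On a misprediction the read simply retries the other candidate location, costing one extra access, which the 94% accuracy keeps rare.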
DICE OVERVIEW
Compressed DRAM Cache Organization
Flexible Mapping for Quick Switching
Dynamic Indexing (DICE): Insertion Policy, Index Prediction
Results
METHODOLOGY (1/8TH KNIGHTS LANDING)
[Figure: CPU chip with stacked DRAM cache and commodity DRAM memory]
Cores: 8 out-of-order cores, 3.2 GHz, 4-wide, with an 8MB shared last-level cache.
Compression: FPC + BDI.
Stacked DRAM: 1GB capacity, DDR 1.6GHz 128-bit bus, 4 channels, 100 GB/s bandwidth, 35ns latency.
Commodity DRAM: 32GB capacity, DDR 1.6GHz 64-bit bus, 1 channel, 12.5 GB/s bandwidth, 35ns latency.
Other sensitivity studies are in the paper.
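To give a feel for the compression step the methodology assumes, here is a heavily simplified sketch of the Base-Delta-Immediate (BDI) idea: view the 64B line as eight 8-byte words, take the first word as the base, and find the narrowest delta width that covers every word. Real BDI (and FPC) try multiple bases, widths, and value patterns; this single-base variant is only an illustration.

```python
import struct

def bdi_size(line):
    """Estimate the compressed size of a 64B line under a simplified,
    single-base BDI: 8B base plus eight deltas of 1, 2, or 4 bytes.
    Returns 64 if no delta width covers all words (incompressible)."""
    assert len(line) == 64
    words = struct.unpack("<8Q", line)  # eight little-endian u64 words
    base = words[0]
    for delta_bytes in (1, 2, 4):
        limit = 1 << (8 * delta_bytes - 1)  # signed delta range
        if all(-limit <= w - base < limit for w in words):
            return 8 + 8 * delta_bytes      # base + eight deltas
    return len(line)
```

A line of nearby pointers or array indices compresses to a small fraction of its size, while lines with widely scattered values stay at 64B, which is exactly the compressible/incompressible split the indexing policies above react to.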
DICE RESULTS
[Figure: speedup of Traditional Set Indexing, Spatial Indexing, and DICE across workloads]
Where Spatial Indexing wins, DICE performs like Spatial Indexing; where Traditional Indexing wins, DICE performs like Traditional Indexing; and on some workloads DICE outperforms both. With its fine-grain decisions, DICE improves performance over both Spatial Indexing and Traditional Indexing (19%).
THANK YOU
EXTRA SLIDES