Enhancing Memory Cache Efficiency with DRAM Compression Techniques


Explore the challenges faced by Moore's Law in relation to bandwidth limitations and the innovative solutions such as 3D-DRAM caches and compressed memory systems. Discover how compressing DRAM caches can improve bandwidth and capacity, leading to enhanced performance in memory-intensive applications. Practical implementations and benefits of DRAM caching are discussed, highlighting the importance of optimizing both bandwidth and capacity for improved system performance.


Uploaded on Sep 07, 2024



Presentation Transcript


  1. DICE: Compressing DRAM Caches for Bandwidth and Capacity. Vinson Young, Prashant Nair, Moinuddin Qureshi.

  2. MOORE'S LAW HITS BANDWIDTH WALL. Moore's scaling encounters the Bandwidth Wall.

  3. 3D-DRAM MITIGATES BANDWIDTH WALL. 3D-DRAM: Hybrid Memory Cube (HMC) from Micron, High Bandwidth Memory (HBM) from Samsung. 3D-DRAM improves bandwidth, but does not have the capacity to replace conventional DIMM memory.

  4. 3D-DRAM AS A CACHE (3D-DRAM CACHE). [Diagram: memory hierarchy from fast to slow: CPU, L1$, L2$, L3$, 3D-DRAM Cache, System Memory (OS-visible space).] Examples: MCDRAM from Intel, HBC from AMD. Architecting 3D-DRAM as a cache can improve memory bandwidth (and avoid OS/software changes).

  5. PRACTICAL 3D-DRAM CACHE: ALLOY CACHE. Alloy Tag+Data: tags are stored as part of the line, so one combined tag+data access avoids tag serialization. Similar to the DRAM cache in KNL: direct-mapped, tags in ECC bits. A practical DRAM cache is low-latency and bandwidth-efficient.
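The alloyed lookup above can be sketched in software. This is a minimal, illustrative model (class and parameter names are not from the talk): tag and data live in one entry, so a single access returns both and no separate tag read is needed.

```python
# Toy model of an Alloy-style direct-mapped DRAM cache: tag and data are
# stored adjacently, so one access resolves hit/miss and returns the data.
NUM_SETS = 1024     # illustrative size
LINE_BYTES = 64

class AlloyCache:
    def __init__(self):
        # Direct-mapped: each set holds exactly one (tag, data) entry.
        self.sets = [None] * NUM_SETS

    def _index(self, line_addr):
        return line_addr % NUM_SETS

    def _tag(self, line_addr):
        return line_addr // NUM_SETS

    def install(self, line_addr, data):
        self.sets[self._index(line_addr)] = (self._tag(line_addr), data)

    def lookup(self, line_addr):
        # Single combined access: the burst that returns the data also
        # returns the tag, avoiding tag serialization.
        entry = self.sets[self._index(line_addr)]
        if entry is not None and entry[0] == self._tag(line_addr):
            return entry[1]      # hit
        return None              # miss: fetch from memory instead

cache = AlloyCache()
cache.install(0x1234, b"A" * LINE_BYTES)
print(cache.lookup(0x1234) is not None)             # hit
print(cache.lookup(0x1234 + NUM_SETS) is not None)  # same set, different tag: miss
```

The direct-mapped, one-entry-per-set organization is what makes the single combined access possible.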

  6. 3D-DRAM CACHE BANDWIDTH IS IMPORTANT. [Speedup chart, 8-CPU, 1GB DRAM cache configuration.] A 2x-capacity cache improves performance by 10%, and an additional 2x bandwidth increases the speedup to 22%. Improving both bandwidth and capacity is valuable.

  7. INTRODUCTION: DRAM CACHE. [Diagram: lines A, B, C, D under four layouts: Baseline, Traditional Compression, Spatial Indexing (compressible), Spatial Indexing (incompressible).] Baseline: direct-mapped, one data block per access.

  8. INTRODUCTION: COMPRESSED DRAM CACHE. Compression adds capacity; does it also improve bandwidth? [Diagram: Traditional Compression and Spatial Indexing, each with compressible and incompressible lines.] Traditional Compression with compressible lines: 1x bandwidth.


  10. INTRODUCTION: COMPRESSED DRAM CACHE. [Same diagram.] Traditional Compression: still 4 accesses for lines A-D, at 1x-2x capacity.

  11. INTRODUCTION: COMPRESSED DRAM CACHE. [Same diagram.] Spatial Indexing with compressible lines: 2x bandwidth.

  12. INTRODUCTION: COMPRESSED DRAM CACHE. [Same diagram.] Spatial Indexing with incompressible lines: < 1x bandwidth (lines such as B and D can require additional accesses).

  13. INTRODUCTION: COMPRESSED DRAM CACHE. Compression adds capacity; does it also improve bandwidth?

                                   Compressible       Incompressible
      Traditional Compression      1x bandwidth       1x bandwidth
      Spatial Indexing             2x bandwidth       < 1x bandwidth

  14. INTRODUCTION: TRADITIONAL COMPRESSION. [Speedup chart.] Improves capacity; no degradation, but little speedup. Compression for capacity (TSI) sees little speedup (7%) due to diminishing returns on giga-scale caches.

  15. INTRODUCTION: SPATIAL INDEXING. [Speedup chart.] Improves bandwidth, but can degrade performance. Spatial Indexing compression gets both the bandwidth and capacity benefits when lines are compressible, but hurts performance when lines are incompressible.

  16. INTRODUCTION: COMPRESSED DRAM CACHE. Goal: compression for capacity AND bandwidth.

                                   Compressible       Incompressible
      Traditional Compression      1x bandwidth       1x bandwidth
      Spatial Indexing             2x bandwidth       < 1x bandwidth

      DICE (Dynamic Index): 19% speedup + 36% EDP improvement.

  17. DICE OVERVIEW
      - Compressed DRAM Cache Organization
      - Flexible Mapping for Quick Switching
      - Dynamic Indexing ComprEssion (DICE): Insertion Policy, Index Prediction

  18. PRACTICAL DRAM CACHE COMPRESSION. [Diagram: on-chip L3 cache; the L4 cache controller holds compression logic on the writeback/install path and decompression logic on the read path; the DRAM cache (compressed) and memory sit off-chip.] Compression: simple changes within the controller.
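The controller datapath above can be sketched in a few lines. In this illustrative model, zlib stands in for the hardware compressor (the design uses FPC + BDI), and the function names and addresses are assumptions, not the talk's:

```python
# Sketch of the L4 controller path: writebacks are compressed on install,
# reads are decompressed before returning data to the L3. zlib is only a
# software stand-in for a hardware compressor such as FPC or BDI.
import zlib

dram_cache = {}   # line address -> compressed bytes

def install(line_addr, data):
    # Writeback/install path: compress before writing to the DRAM cache.
    dram_cache[line_addr] = zlib.compress(data)

def read(line_addr):
    # Read path: decompress before handing the line back on-chip.
    blob = dram_cache.get(line_addr)
    return zlib.decompress(blob) if blob is not None else None

line = bytes(64)          # a highly compressible 64B line of zeros
install(0x40, line)
print(read(0x40) == line)                # round-trips correctly
print(len(dram_cache[0x40]) < 64)        # compressible line takes less space
```

The point of the slide is that only the controller changes; the rest of the hierarchy and the OS are untouched, which this model mirrors by keeping compression entirely inside `install` and `read`.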

  19. DRAM CACHE TAG FORMAT. [Diagram: 64 bytes of data (Data A) plus 8 bytes of tag (Tag A), with a tag boundary between them.] The cache controller receives 72B of tag+data and can flexibly interpret bits as tag bits or data bits.

  20. PROPOSED FLEXIBLE TAG FORMAT. [Diagram: the tag boundary moves; "Is Tag?" bits mark where tag space ends and data begins.] We create tag space as needed, for up to 28 lines. Achieves 1.6x effective capacity.
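The trade the flexible format makes, creating tag space out of the same 72B as lines are added, can be sketched with a simple packing count. The per-line tag cost below is an assumption for illustration, not the paper's exact encoding:

```python
# Sketch of the flexible 72B tag+data bucket: each additional compressed
# line stored in the bucket consumes some of the same 72 bytes as tag
# space. TAG_BYTES is an illustrative per-line tag cost.
BUCKET_BYTES = 72
TAG_BYTES = 2

def lines_that_fit(compressed_sizes):
    """Greedily count how many compressed lines fit in one bucket,
    charging TAG_BYTES of tag space per stored line."""
    used, stored = 0, 0
    for size in compressed_sizes:
        if used + size + TAG_BYTES <= BUCKET_BYTES:
            used += size + TAG_BYTES
            stored += 1
    return stored

print(lines_that_fit([64]))              # incompressible: one line per bucket
print(lines_that_fit([30, 30]))          # a 2:1-compressible pair fits
print(lines_that_fit([16, 16, 16, 16]))  # highly compressible: more lines fit
```

As compressibility improves, the bucket holds more lines, which is where the effective-capacity gain comes from.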

  21. DICE OVERVIEW
      - Compressed DRAM Cache Organization
      - Flexible Mapping for Quick Switching
      - Dynamic Indexing ComprEssion (DICE): Insertion Policy, Index Prediction

  22. FLEXIBLE MAPPING (TSI OR BAI). [Diagram: Traditional Set Indexing (TSI) places lines 0-7 in sets 0-7; Naive Spatial Indexing co-locates neighbors (0,1), (2,3), (4,5), (6,7); Bandwidth-Aware Indexing (BAI) co-locates (0,4), (1,5), (2,6), (3,7).] Bandwidth-Aware Indexing (BAI) facilitates quick switching between the two indices, TSI and BAI.

  23. FLEXIBLE MAPPING (TSI OR BAI). [Animation step of the same diagram.] Bandwidth-Aware Indexing (BAI) facilitates quick switching between the two indices, TSI and BAI.

  24. FLEXIBLE MAPPING (TSI OR BAI). [Animation step of the same diagram.] Bandwidth-Aware Indexing (BAI) facilitates quick switching between the two indices, TSI and BAI.
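The mapping difference can be sketched with toy index functions. This reading is inferred from the slides' grids (8 lines, 4 physical sets, 2:1 compression) and is illustrative only: the key property is that a BAI pair partner lands in the same set that TSI would use, while a naive spatial pair does not.

```python
# Toy index functions for the three mappings in the slides.
NUM_SETS = 4   # physical sets in this example; each can hold 2 compressed lines

def tsi_index(addr):
    # Traditional Set Indexing: consecutive lines map to consecutive sets.
    return addr % NUM_SETS

def naive_spatial_index(addr):
    # Naive spatial indexing co-locates neighbors (0,1), (2,3), ...
    # Good for bandwidth, but every line's set differs from its TSI set.
    return (addr // 2) % NUM_SETS

def bai_index(addr):
    # BAI uses the same index function as TSI but co-locates line A with
    # its partner A + NUM_SETS, which also maps to the same TSI set, so a
    # line can switch between TSI and BAI layouts without moving sets.
    return addr % NUM_SETS

# BAI pair partners share a set under TSI; naive neighbors do not.
print(all(tsi_index(a) == tsi_index(a + NUM_SETS) for a in range(NUM_SETS)))
print(tsi_index(0) == tsi_index(1))   # naive pair (0,1) spans two TSI sets
```

This is what "quick switching" means on the slide: changing between TSI and BAI changes how lines are packed within a set, not which set a line lives in.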

  25. DICE OVERVIEW
      - Compressed DRAM Cache Organization
      - Flexible Mapping for Quick Switching
      - Dynamic Indexing ComprEssion (DICE): Insertion Policy, Index Prediction

  26. DICE: DYNAMIC-INDEXED COMPRESSED CACHE. [Diagram: on install, compressibility-based insertion chooses the Traditional Set Index or the Bandwidth-Aware Index; on read, cache index prediction chooses which to probe; for some lines TSI = BAI.] DICE (Dynamic-Indexing Cache comprEssion) decides the index on install, and predicts the index on read.

  27. COMPRESSIBILITY-BASED INSERTION. Insert with the Bandwidth-Aware Index when a line compresses to half size or less, and with the Traditional Set Index otherwise. Checking both indices on every read would waste bandwidth. No explicit swaps: eviction and install decide the policy.
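The insertion rule is simple enough to state as a one-line policy. The half-size threshold follows the slide; the function name and byte values are illustrative:

```python
# Sketch of compressibility-based insertion: lines that compress to half a
# 64B line or less are installed with the Bandwidth-Aware Index (BAI),
# everything else with the Traditional Set Index (TSI).
HALF_LINE = 32   # bytes

def choose_index(compressed_size):
    # Decided once, at install time; there are no later swaps.
    return "BAI" if compressed_size <= HALF_LINE else "TSI"

print(choose_index(20))   # compressible line -> bandwidth-aware index
print(choose_index(60))   # incompressible line -> traditional index
```

Deciding only at install time is what avoids the bandwidth cost of probing both indices or migrating lines between them.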

  28. SIMILAR INTRA-PAGE COMPRESSIBILITY. [Diagram: the indices seen in a compressible page are all the Bandwidth-Aware Index.] Lines within a page have similar compressibility, so DICE is likely to install the lines of a page with the same index.

  29. SIMILAR INTRA-PAGE COMPRESSIBILITY. [Diagram: the indices seen in an incompressible page are almost all the Traditional Set Index; a second access is needed only on a mispredict.] Thus, page-based last-time prediction of the index can be accurate (94%).

  30. PAGE-BASED CACHE INDEX PREDICTOR (CIP). [Diagram: on a demand access, the page number is hashed into a Last-Time Table (LTT) of 1-bit entries (0 = Traditional Set Index, 1 = Bandwidth-Aware Index) to predict the index.] Page-based last-time prediction exploits similar intra-page compressibility to achieve high prediction accuracy (94%).
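The predictor structure above can be sketched directly. The table size and hash function here are illustrative assumptions; only the last-time, 1-bit-per-entry scheme comes from the slide:

```python
# Sketch of the page-based Cache Index Predictor (CIP): a Last-Time Table
# (LTT) of 1-bit entries, indexed by a hash of the page number.
LTT_ENTRIES = 4096                 # illustrative table size
ltt = [0] * LTT_ENTRIES            # 0 = predict TSI, 1 = predict BAI

def _slot(page):
    return hash(page) % LTT_ENTRIES

def predict(page):
    return "BAI" if ltt[_slot(page)] else "TSI"

def update(page, index_used):
    # Last-time training: remember the index this page's lines actually used.
    ltt[_slot(page)] = 1 if index_used == "BAI" else 0

update(0x77, "BAI")
print(predict(0x77))     # pages tend to reuse their last index
```

Because lines within a page share compressibility, a single bit per page is enough for the 94% accuracy cited on the slide.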

  31. DICE OVERVIEW
      - Compressed DRAM Cache Organization
      - Flexible Mapping for Quick Switching
      - Dynamic Indexing (DICE): Insertion Policy, Index Prediction
      - Results

  32. METHODOLOGY (1/8TH KNIGHTS LANDING). [Diagram: CPU chip with stacked DRAM and commodity DRAM.] Cores: 8 cores, 3.2GHz, 4-wide out-of-order; 8MB shared last-level cache. Compression: FPC + BDI.

  33. METHODOLOGY (1/8TH KNIGHTS LANDING). Other sensitivities in the paper.

                     Stacked DRAM            Commodity DRAM
      Capacity       1GB                     32GB
      Bus            DDR 1.6GHz, 128-bit     DDR 1.6GHz, 64-bit
      Channels       4                       1
      Bandwidth      100 GBps                12.5 GBps
      Latency        35ns                    35ns

  34. DICE RESULTS. [Speedup chart: Traditional Set Indexing, Spatial Indexing, DICE. Where Spatial Indexing wins, DICE performs like Spatial Indexing; where Traditional Indexing wins, DICE performs like Traditional Indexing; on some workloads DICE outperforms both.] DICE improves performance over both Spatial Indexing and Traditional Indexing with its fine-grain decision (19%).

  35. INTRODUCTION: COMPRESSED DRAM CACHE. Goal: compression for capacity AND bandwidth.

                                   Compressible       Incompressible
      Traditional Compression      1x bandwidth       1x bandwidth
      Spatial Indexing             2x bandwidth       < 1x bandwidth

      DICE (Dynamic Index): 19% speedup + 36% EDP improvement.

  36. THANK YOU

  37. EXTRA SLIDES

  38. DIFFERENT CACHE SENSITIVITIES

  39. COMPARISON TO PREFETCH

  40. COMPARISON TO SRAM/MEMORY COMPRESSION

  41. FULL RESULTS (MIXED COMPRESSIBILITY)

  42. SRAM CACHE COMPRESSION ON DRAM CACHE

  43. DISTRIBUTION FOR INDEX DECISION

  44. DICE INSERTION THRESHOLD

  45. EFFECTIVE CAPACITY

  46. L3 HIT RATE IMPROVEMENT

  47. LARGER TSI VS. BAI EXAMPLE

