Maximizing Cache Hit Rate with LHD: An Overview
This presentation discusses Least Hit Density (LHD), an eviction policy for improving cache hit rates. It covers the role of key-value caches in web application performance, the shortcomings of heuristic policies such as LRU, and how LHD and its implementation, RankCache, optimize cache performance under variable workloads.
Presentation Transcript
LHD: Improving Cache Hit Rate by Maximizing Hit Density Nathan Beckmann Haoxian Chen Asaf Cidon CMU U. Penn Stanford & Barracuda Networks USENIX NSDI 2018
Outline Background and motivation Least hit density (LHD) RankCache Conclusion and future work
Key-value cache A key-value cache is 100× faster than the database. Web application servers typically send requests to the cache cluster over the network with latencies of about 100 μs, while querying the much slower database has latencies of about 10 ms.
Key-value cache Key-value cache hit rate determines web application performance. At 98% cache hit rate, +1% hit rate --> 35% speedup: old latency 374 μs, new latency 278 μs [Facebook study: Atikoglu, Sigmetrics 12]. Even small hit rate improvements cause significant speedup.
Key-value cache Goal: maximize cache hit rate. Constraint: limited cache space. Uncertainty: in practice, we don't know what is accessed when. Difficulty: objects have variable sizes.
Eviction policy Eviction policies show up in many contexts, e.g., OS page management, database buffer management, web proxies, and processors. Least recently used (LRU) is widely used. It uses only recency, i.e., how long it has been since an object was last referenced, to decide which object to evict. However, LRU has common pathologies that hurt its performance: it assumes that recently used objects are always more valuable, but common access patterns like scans (e.g., AB...Z AB...Z ...) violate this assumption.
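The scan pathology is easy to reproduce in a few lines of simulation (unit-size objects and a hypothetical 25-slot cache scanning 26 keys; the sizes are invented for illustration):

```python
from collections import OrderedDict

def lru_hit_rate(trace, capacity):
    """Simulate an LRU cache of unit-size objects and return its hit rate."""
    cache = OrderedDict()
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used
            cache[key] = True
    return hits / len(trace)

# A repeating scan over 26 objects with room for only 25:
trace = [chr(ord('A') + i) for i in range(26)] * 10
print(lru_hit_rate(trace, capacity=25))  # 0.0 -- each object is evicted just before reuse
```

With the cache one slot too small, LRU gets zero hits on the scan, while one extra slot would make every non-compulsory access a hit.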
Eviction policy Choosing the right eviction policy is hard. Key-value caches have unique challenges: variable object sizes and variable workloads. Prior policies are heuristics that combine recency and frequency; they have no theoretical foundation, require hand-tuning (and are therefore fragile to workload changes), and no policy works for all workloads. One prior system simulates many cache policy configurations to find the right one per workload [Waldspurger, ATC 17].
Outline Background and motivation Least hit density (LHD) RankCache Conclusion and future work
Cache space The figure shows how cache space is used over time. Throughout this paper, time is measured in accesses, not wall-clock time. Each block represents an object; cache space is shown vertically, and time increases from left to right.
Hit density (HD) Cache space is shown vertically (Space axis) and time increases from left to right (Time axis). Each block thus represents a single object lifetime, colored green or red to indicate whether it ends in a hit or an eviction, respectively. An object's cost is proportional to its size and the time it spends in the cache.
Hit density (HD) Hit density combines hit probability and expected time; it measures an object's contribution to the cache's hit rate, in units of hits per byte-access: HD = P / (S × L), where P is the object's hit probability, S is its size, and L is its expected lifetime. The least hit density (LHD) policy evicts the object with the smallest hit density.
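The ranking rule can be sketched directly from the definition. The objects and their parameters below are made-up illustrations, not data from the paper:

```python
# Hit density HD = P / (S * L): P = hit probability, S = size in bytes,
# L = expected lifetime in accesses. Units: hits per byte-access.
def hd(p, size, lifetime):
    return p / (size * lifetime)

objects = {
    "small-hot":   hd(0.9, 100, 1_000),      # likely to hit soon, cheap to keep
    "large-cold":  hd(0.9, 100_000, 1_000),  # same hit odds, 1000x the space cost
    "small-stale": hd(0.1, 100, 50_000),     # unlikely to hit, long lifetime
}
victim = min(objects, key=objects.get)  # LHD evicts the lowest hit density
print(victim)  # "large-cold"
```

Note how the policy trades off all three factors at once: the large object loses despite its high hit probability, because it ties up 1000× the space for the same expected hits.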
Hit density (HD) Age is the number of accesses since an object was last referenced. For example, if an object enters the cache at access T, hits at accesses T+4 and T+6, and is evicted at access T+12, then it has hits at ages 4 and 2 and is evicted at age 6 (each reference resets the age to zero).
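The age bookkeeping in the example can be checked mechanically (the helper name is invented for illustration):

```python
# Ages for the example lifetime: enters at access T, hits at T+4 and T+6,
# and is evicted at T+12. Each reference resets the age to zero.
def event_ages(enter, hits, evict):
    """Return (list of hit ages, eviction age) for one object lifetime."""
    last = enter
    hit_ages = []
    for t in hits:
        hit_ages.append(t - last)  # age at this hit = time since last reference
        last = t                   # the hit resets the age
    return hit_ages, evict - last

T = 1000  # arbitrary starting access
print(event_ages(T, [T + 4, T + 6], T + 12))  # ([4, 2], 6)
```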
LHD by example Consider an example application that scans repeatedly over a few objects and accesses many other objects with a Zipf-like popularity distribution. Each scanned object is accessed frequently, whereas each Zipf-like object is accessed much less frequently. The cache should therefore ideally keep the scanned objects and evict the Zipf-like objects when necessary.
LHD by example This figure illustrates the application's access pattern, namely the distribution of time (measured in accesses) between references to the same object. Summary of the access pattern: scanned objects produce a characteristic peak around a single reference time, since they are all accessed together, while Zipf-like objects yield a long tail of reference times.
LHD by example This figure illustrates LHD's behavior on the example application, showing the distribution of hits and evictions vs. an object's age. LHD keeps the scanned objects and the popular Zipf-like references, as desired.
LHD by example This figure plots the predicted hit density for objects of different ages. The hit density is high up until the scanning peak, because LHD predicts such objects may be one of the scanned objects and might hit quickly. It drops after the scanning peak because LHD learns those objects are Zipf-like and therefore unlikely to hit quickly.
Hit density (HD) Recall HD = P / (S × L), and that LHD evicts the object with the smallest hit density. LHD's challenge is to predict an object's hit density without knowing whether the object will result in a hit or an eviction, nor how long it will spend in the cache.
Hit density (HD) LHD infers these quantities in real time using probability distributions; specifically, LHD uses the distributions of hit age and eviction age. Random variables: H = hit age (e.g., P[H = 100] is the probability that an object hits after 100 accesses); L = lifetime (e.g., P[L = 100] is the probability that an object hits or is evicted after 100 accesses). It is easy to estimate HD from these quantities:
Hit density (HD) Estimate HD using conditional probability, monitoring the distributions of H and L online. By definition, an object of age a wasn't requested at any age ≤ a, so ignore all events before a:

Hit probability = P[hit | age a] = Σ_{x>a} P[H = x] / Σ_{x>a} P[L = x]

Expected remaining lifetime = E[L − a | L > a] = Σ_{x>a} (x − a) P[L = x] / Σ_{x>a} P[L = x]

Combining the two (the shared normalization cancels): HD(a) = Σ_{x>a} P[H = x] / (S × Σ_{x>a} (x − a) P[L = x]).
Improving predictions To incorporate reference frequency, they count how many times each object has been referenced and gather separate hit and eviction age distributions for each reference count. More generally, an object belongs to an equivalence class c; e.g., c could be all objects that have been referenced twice. LHD predicts this object's hit density as HD(a, c) = Σ_{x>a} P[H = x | c] / (S × Σ_{x>a} (x − a) P[L = x | c]), where P[H = a | c] and P[L = a | c] are the conditional hit and lifetime age distributions for class c.
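The per-class bookkeeping can be sketched as histograms keyed by reference count; the structure names and the cap on class count are illustrative assumptions, not the paper's exact implementation:

```python
from collections import defaultdict

# class -> age -> event count; lifetimes end in either a hit or an eviction,
# so every event contributes to the lifetime histogram.
hit_ages = defaultdict(lambda: defaultdict(int))
life_ages = defaultdict(lambda: defaultdict(int))

def record(refs, age, was_hit):
    """Record one lifetime-ending event for an object referenced `refs` times."""
    c = min(refs, 4)           # cap classes, e.g. a combined "4+" class
    if was_hit:
        hit_ages[c][age] += 1
    life_ages[c][age] += 1

record(refs=2, age=4, was_hit=True)    # twice-referenced object hits at age 4
record(refs=2, age=6, was_hit=False)   # another one is evicted at age 6
print(dict(hit_ages[2]), dict(life_ages[2]))  # {4: 1} {4: 1, 6: 1}
```

Normalizing each class's histograms yields the conditional distributions P[H = a | c] and P[L = a | c] used in the formula above.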
LHD LHD gets more hits than prior policies (figure: miss ratio (%) vs. cache size (GB)).
LHD LHD needs much less space
Outline Background and motivation Least hit density (LHD) RankCache Conclusion and future work
The problem Prior complex policies require complex data structures Synchronization --> poor scalability --> unacceptable request throughput Policies like GDSF require O(log N) heaps Even O(1) LRU is sometimes too slow because of synchronization Many key-value systems approximate LRU with CLOCK / FIFO MemC3 [Fan, NSDI 13], MICA [Lim, NSDI 14]... Can LHD achieve similar request throughput to production systems?
RankCache makes LHD fast Track information approximately (e.g., coarsen ages). Precompute HD as a table indexed by age, app id, etc. Randomly sample objects to find a victim. Similar to Redis, Memshare [Cidon, ATC 17], and [Psounis, INFOCOM 01].
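These ideas compose into a simple eviction path: hit densities live in a precomputed table indexed by (coarsened age bucket, class), and eviction samples a few objects and evicts the sampled one with the lowest table entry. The bucket width, table shape, and names below are illustrative assumptions:

```python
import random

def find_victim(objects, hd_table, now, sample_size=64):
    """objects maps key -> (last_access_time, class). Returns the victim key."""
    candidates = random.sample(list(objects), min(sample_size, len(objects)))
    def hd(key):
        last, cls = objects[key]
        bucket = min((now - last) // 16, 7)  # coarsened age: 8 buckets of 16 accesses
        return hd_table[(bucket, cls)]       # table lookup, no per-object recompute
    return min(candidates, key=hd)           # no lists, no heaps, no global locks

# Toy table: older age buckets carry lower precomputed hit density.
hd_table = {(b, 0): 1.0 / (b + 1) for b in range(8)}
objects = {"a": (100, 0), "b": (40, 0), "c": (90, 0)}
print(find_victim(objects, hd_table, now=110))  # "b" (oldest, lowest table HD)
```

Because eviction only needs a random sample plus table lookups, there is no ordered structure to maintain on every hit, which is what removes the synchronization bottleneck.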
Making evictions fast Two key techniques make LHD practical: (i) precomputation and (ii) sampling. No global synchronization --> Great scalability!
Making hits fast Metadata updated locally --> no global data structure Same scalability benefits as CLOCK, FIFO vs. LRU
Throughput They present throughput at 90% and 100% hit ratio, where the former represents a realistic deployment and the latter peak performance. Serial bottlenecks dominate --> LHD has the best throughput (figure panels: (a) 90% hit ratio, (b) 100% hit ratio).
Outline Background and motivation Least hit density (LHD) RankCache Conclusion and future work
Conclusion The authors introduce hit density, an intuitive, workload-agnostic metric for ranking objects during eviction. Dynamic ranking enables LHD to adapt its eviction strategy to different application workloads over time without any hand-tuning. RankCache is a key-value cache based on memcached that efficiently implements LHD (and other policies). On average, LHD needs 8× less space than LRU, 2.4× less than GDSF, and 2.9× less than AdaptSize. Finally, at 16 threads, RankCache achieves 16× higher throughput than list-based LRU and, at 90% hit rate, 2× higher throughput than CLOCK.
Future directions Dynamic latency / bandwidth optimization Smoothly and dynamically switch between optimized hit ratio and byte-hit ratio Optimizing end-to-end response latency App touches multiple objects per request One such object evicted --> others should be evicted too Modeling cost, e.g., to maximize write endurance in FLASH / NVM Predict which objects are worth writing to 2nd tier storage from memory
Related Work Using conditional probabilities for eviction policies in CPU caches: EVA [Beckmann, HPCA 16, 17] (fixed object sizes, different ranking function). Prior replacement policies. Key-value: Hyperbolic [Blankstein, ATC 17], Simulations [Waldspurger, ATC 17], AdaptSize [Berger, NSDI 17], Cliffhanger [Cidon, NSDI 16]... Non key-value: ARC [Megiddo, FAST 03], SLRU [Karedla, Computer 94], LRU-K [O'Neil, SIGMOD 93]... These are heuristic-based and require tuning or simulation.
LHD Case study vs. AdaptSize [Berger et al, NSDI 17]. AdaptSize improves LRU by bypassing most large objects. LHD admits all objects --> more hits from big objects. LHD evicts big objects quickly --> small objects survive longer --> more hits (figure panels: (a) AdaptSize, (b) LHD).
Memory management Many key-value caches use slab allocators (e.g., memcached): bounded fragmentation and fast... but no global eviction policy --> poor hit ratio. Strategy: balance victim hit density across slab classes. Similar to Cliffhanger [Cidon, NSDI 16] and GDWheel [Li, EuroSys 15]. Slab classes incur negligible impact on hit rate.
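The balancing rule reduces to comparing each slab class's best eviction candidate; whichever class holds the lowest-value victim gives up space. A minimal sketch, with invented hit-density numbers:

```python
# Evict from the slab class whose eviction candidate has the lowest hit
# density, so no class hoards low-value objects while another class is
# forced to evict high-value ones.
def pick_slab_class(victim_hd_by_class):
    """victim_hd_by_class: slab class (bytes) -> hit density of its candidate."""
    return min(victim_hd_by_class, key=victim_hd_by_class.get)

# Hypothetical candidates in the 64 B, 256 B, and 1 KB classes:
print(pick_slab_class({64: 3e-7, 256: 9e-9, 1024: 5e-8}))  # 256
```

Applying eviction pressure where the victim hit density is lowest approximates a single global LHD policy on top of per-class slab allocation.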