Enhancing Key-Value Storage with MemC3 and Cuckoo Hashing

Slide Note
Embed
Share

MemC3 is a specialized key-value store that combines CLOCK and Concurrent Cuckoo Hashing to improve performance and efficiency. Memcached, an established DRAM-based key-value store, is also discussed along with its LRU eviction strategy. The use of internal chaining hashtable and LRU caching in Memcached is explored, highlighting space overhead issues. The goals of reducing space overhead and improving throughput in a read-intensive workload are achieved with a 3X throughput increase and 30% more objects stored. Additionally, Cuckoo Hashing is detailed as a method using two hash functions to optimize key placements.


Uploaded on Sep 27, 2024 | 1 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. MemC3: MemCache with CLOCK and Concurrent Cuckoo Hashing Bin Fan (CMU) Dave Andersen (CMU), Michael Kaminsky (Intel Labs) Presenter: Shouqian Shi Credit to: Son Nguyen

  2. Memcached DRAM-based key- value store to alleviate database load Set(key, value) Get(key) -> value

  3. Memcached LRU (Least Recently Used) eviction Often used for small objects (Facebook [Atikoglu12] ) 90% keys < 31 bytes 10xM queries per second (Facebook [Atikoglu12] )

  4. Memcached internal Chaining hashtable LRU caching using doubly linked list Global Lock Huge Space Overhead

  5. Goals Target: read-intensive workload with small objects Reduce space overhead (bytes/key) Improve throughput (queries/sec) Result: 3X throughput, 30% more objects

  6. MemCachd: Chaining Hashtable Use linked list costly space overhead for pointers Pointer dereference is slow (no advantage from CPU cache) Read is not constant time (due to possibly long list)

  7. Cuckoo Hashing Use 2 hash functions to pick two candidate positions HASH1(ka) Each bucket has exactly 4 slots (fits in CPU cache) (ka,va) Each (key, value) object therefore can reside at one of the 8 possible slots HASH2(ka)

  8. Cuckoo Hashing X b a X X X Insert a: HASH1(ka) X X X X X (ka,va) X X X HASH2(ka) X c X X X X X X X X X X

  9. Cuckoo Hashing X X X a X Insert b: HASH2(kb) X X X X X (kb,vb) X X X c b X X X HASH1(kb) X X X X X X X X

  10. Cuckoo Hashing X X X a X Insert c: X HASH1(kc) X X c X X (kc,vc) X X X X b X X HASH2(kc) X X X X X X X Done !!! X

  11. Cuckoo Hashing Read: 4 lookups on average Write: write(ka, va) Find an empty slot in 8 possible slots of ka If all are full then randomly kick some (kb, vb) out Now find an empty slot for (kb, vb) Repeat 500 times or until an empty slot is found If still not found then do table expansion

  12. Cuckoos advantages Concurrency: multiple readers/single writer Read optimized (entries fit in CPU cache) Still O(1) amortized time for write 30% less space overhead 95% table occupancy

  13. Floating Problem Always one guy is outside during the insertion false cache miss Solution: Compute the kick out path (Cuckoo path) first, then move items backward

  14. Computed Cuckoo path X X X b X Insert a: HASH1(ka) X X X X X (ka,va) X X X HASH2(ka) X c X X X X X X X X X X

  15. Cuckoo path backward insert X b a X X X Insert a: HASH1(ka) X X X X X (ka,va) X X X HASH2(ka) X X X c X X X X X X X X

  16. Cuckoo and optimistic lock

  17. MemCachd: Doubly-linked-list At least two pointers per item expensive for small key-value pair Both read and write update the position change the list s structure need locking between threads (no concurrency)

  18. Solution: CLOCK-based LRU Only for multiple readers / single writer Approximate LRU Circular queue instead of linked list less space overhead 1 bit per entry vs 16 Bytes

  19. CLOCK example entry (ka, va) (kb, vb) (kc, vc) (kd, vd) (ke, ve) Originally: recency 0 1 0 0 1 entry (ka, va) (kb, vb) (kc, vc) (kd, vd) (ke, ve) Read(kd): recency 0 1 0 1 1 entry (ka, va) (kb, vb) (kf, vf) (kd, vd) (ke, ve) Write(kf, vf): recency 0 0 1 1 1 entry (kg, vg) (kb, vb) (kf, vf) (kd, vd) (ke, ve) Write(kg, vg): recency 1 0 1 0 0

  20. Eviction and lock

  21. Evaluation 68% throughput improvement in all hit case. 235% for all miss

  22. Evaluation 3x throughput on real workload

  23. Discussion Single machine multi-core optimization no cooperation between machines need a load balancer cannot address hotspot on the cluster level Atomic insertion lock along the path The impact of hotspot false eviction?

More Related Content