Enhancing Key-Value Storage with MemC3 and Cuckoo Hashing

MemC3: MemCache with CLOCK
and Concurrent Cuckoo Hashing
Bin Fan (CMU)
Dave Andersen (CMU), Michael Kaminsky (Intel Labs)
Presenter: 
Shouqian Shi
Credit to: Son Nguyen
 
DRAM-based key-
value store 
to alleviate
database load
Set(key, value)
Get(key) -> value
Memcached
 
 
LRU (Least Recently Used) eviction
Often used for small objects 
(Facebook 
[Atikoglu12] 
)
90% keys < 31 bytes
10xM queries per second 
(Facebook 
[Atikoglu12] 
)
Memcached
Memcached internal
 
Chaining hashtable
LRU caching using doubly linked list
Huge Space Overhead
Global Lock
Goals
 
Target: read-intensive workload with
small objects
 
Reduce space overhead (bytes/key)
 
Improve throughput (queries/sec)
 
Result: 3X throughput, 30% more objects
MemCachd: Chaining Hashtable
 
Use linked list
costly space overhead for pointers
 
Pointer dereference is slow
 
(no advantage from CPU cache)
 
Read is not constant time
 
(due to possibly long list)
Cuckoo Hashing
 
Use 2 hash functions to
pick two candidate
positions
 
Each bucket has exactly 4
slots (fits in CPU cache)
 
Each (key, value) object
therefore can reside at
one of the 8 possible slots
(ka,va)
HASH1(ka)
HASH2(ka)
Cuckoo Hashing
(ka,va)
HASH1(ka)
HASH2(ka)
 
b
 
a
Insert a:
Cuckoo Hashing
(kb,vb)
HASH2(kb)
HASH1(kb)
 
c
 
b
Insert b:
Cuckoo Hashing
(kc,vc)
HASH1(kc)
HASH2(kc)
 
c
Insert c:
 
Done !!!
Cuckoo Hashing
Read: 4 lookups on average
Write: write(ka, va)
Find an empty slot in 8 possible slots of ka
If all are full then randomly kick some (kb,
vb) out
Now find an empty slot for (kb, vb)
Repeat 500 times or until an empty slot is
found
If still not found then do table expansion
Cuckoo’s advantages
Concurrency: multiple readers/single writer
Read optimized (entries fit in CPU cache)
Still O(1) amortized time for write
30% less space overhead
95% table occupancy
Floating Problem
Always one guy is outside during the
insertion
false cache miss
Solution: Compute the kick out path
(Cuckoo path) first, then move items
backward
Computed Cuckoo path
(ka,va)
HASH1(ka)
HASH2(ka)
Insert a:
Cuckoo path backward insert
(ka,va)
HASH1(ka)
HASH2(ka)
Insert a:
 
c
 
b
 
a
Cuckoo and optimistic lock
MemCachd: 
Doubly-linked-list
At least two pointers per item
expensive for small key-value pair
Both read and write update the position
change the list’s structure
need locking between threads (no
concurrency)
Solution: CLOCK-based LRU
Only for multiple readers / single writer
Approximate LRU
Circular queue instead of linked list
less space overhead
1 bit per entry vs 16 Bytes
CLOCK example
Read(kd):
Write(kf, vf):
Write(kg, vg):
Originally:
Eviction and lock
Evaluation
68% throughput improvement in all hit case. 235% for all miss
Evaluation
3x throughput on “real” workload
Discussion
Single machine multi-core optimization
no cooperation between machines
need a load balancer
cannot address hotspot on the cluster level
Atomic insertion
lock along the path
The impact of hotspot false eviction?
Slide Note

MemC3: Internal Improvements of memcached servers

Concurrency, memory efficient and better performance

Embed
Share

MemC3 is a specialized key-value store that combines CLOCK and Concurrent Cuckoo Hashing to improve performance and efficiency. Memcached, an established DRAM-based key-value store, is also discussed along with its LRU eviction strategy. The use of internal chaining hashtable and LRU caching in Memcached is explored, highlighting space overhead issues. The goals of reducing space overhead and improving throughput in a read-intensive workload are achieved with a 3X throughput increase and 30% more objects stored. Additionally, Cuckoo Hashing is detailed as a method using two hash functions to optimize key placements.

  • MemC3
  • Cuckoo Hashing
  • Key-Value Store
  • Performance
  • Efficiency

Uploaded on Sep 27, 2024 | 1 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. MemC3: MemCache with CLOCK and Concurrent Cuckoo Hashing Bin Fan (CMU) Dave Andersen (CMU), Michael Kaminsky (Intel Labs) Presenter: Shouqian Shi Credit to: Son Nguyen

  2. Memcached DRAM-based key- value store to alleviate database load Set(key, value) Get(key) -> value

  3. Memcached LRU (Least Recently Used) eviction Often used for small objects (Facebook [Atikoglu12] ) 90% keys < 31 bytes 10xM queries per second (Facebook [Atikoglu12] )

  4. Memcached internal Chaining hashtable LRU caching using doubly linked list Global Lock Huge Space Overhead

  5. Goals Target: read-intensive workload with small objects Reduce space overhead (bytes/key) Improve throughput (queries/sec) Result: 3X throughput, 30% more objects

  6. MemCachd: Chaining Hashtable Use linked list costly space overhead for pointers Pointer dereference is slow (no advantage from CPU cache) Read is not constant time (due to possibly long list)

  7. Cuckoo Hashing Use 2 hash functions to pick two candidate positions HASH1(ka) Each bucket has exactly 4 slots (fits in CPU cache) (ka,va) Each (key, value) object therefore can reside at one of the 8 possible slots HASH2(ka)

  8. Cuckoo Hashing X b a X X X Insert a: HASH1(ka) X X X X X (ka,va) X X X HASH2(ka) X c X X X X X X X X X X

  9. Cuckoo Hashing X X X a X Insert b: HASH2(kb) X X X X X (kb,vb) X X X c b X X X HASH1(kb) X X X X X X X X

  10. Cuckoo Hashing X X X a X Insert c: X HASH1(kc) X X c X X (kc,vc) X X X X b X X HASH2(kc) X X X X X X X Done !!! X

  11. Cuckoo Hashing Read: 4 lookups on average Write: write(ka, va) Find an empty slot in 8 possible slots of ka If all are full then randomly kick some (kb, vb) out Now find an empty slot for (kb, vb) Repeat 500 times or until an empty slot is found If still not found then do table expansion

  12. Cuckoos advantages Concurrency: multiple readers/single writer Read optimized (entries fit in CPU cache) Still O(1) amortized time for write 30% less space overhead 95% table occupancy

  13. Floating Problem Always one guy is outside during the insertion false cache miss Solution: Compute the kick out path (Cuckoo path) first, then move items backward

  14. Computed Cuckoo path X X X b X Insert a: HASH1(ka) X X X X X (ka,va) X X X HASH2(ka) X c X X X X X X X X X X

  15. Cuckoo path backward insert X b a X X X Insert a: HASH1(ka) X X X X X (ka,va) X X X HASH2(ka) X X X c X X X X X X X X

  16. Cuckoo and optimistic lock

  17. MemCachd: Doubly-linked-list At least two pointers per item expensive for small key-value pair Both read and write update the position change the list s structure need locking between threads (no concurrency)

  18. Solution: CLOCK-based LRU Only for multiple readers / single writer Approximate LRU Circular queue instead of linked list less space overhead 1 bit per entry vs 16 Bytes

  19. CLOCK example entry (ka, va) (kb, vb) (kc, vc) (kd, vd) (ke, ve) Originally: recency 0 1 0 0 1 entry (ka, va) (kb, vb) (kc, vc) (kd, vd) (ke, ve) Read(kd): recency 0 1 0 1 1 entry (ka, va) (kb, vb) (kf, vf) (kd, vd) (ke, ve) Write(kf, vf): recency 0 0 1 1 1 entry (kg, vg) (kb, vb) (kf, vf) (kd, vd) (ke, ve) Write(kg, vg): recency 1 0 1 0 0

  20. Eviction and lock

  21. Evaluation 68% throughput improvement in all hit case. 235% for all miss

  22. Evaluation 3x throughput on real workload

  23. Discussion Single machine multi-core optimization no cooperation between machines need a load balancer cannot address hotspot on the cluster level Atomic insertion lock along the path The impact of hotspot false eviction?

More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#