Enhancing Key-Value Storage with MemC3 and Cuckoo Hashing
MemC3 is a specialized key-value store that combines CLOCK-based eviction with concurrent cuckoo hashing to improve performance and memory efficiency. The presentation first reviews Memcached, an established DRAM-based key-value store, and its LRU eviction strategy, highlighting the space overhead of its internal chaining hashtable and linked-list LRU. For read-intensive workloads with small objects, MemC3 achieves a 3x throughput increase while storing 30% more objects. Cuckoo hashing is then detailed as a method that uses two hash functions to optimize key placement.
MemC3: MemCache with CLOCK and Concurrent Cuckoo Hashing Bin Fan (CMU) Dave Andersen (CMU), Michael Kaminsky (Intel Labs) Presenter: Shouqian Shi Credit to: Son Nguyen
Memcached A DRAM-based key-value store to alleviate database load. Interface: Set(key, value); Get(key) -> value
Memcached LRU (Least Recently Used) eviction. Often used for small objects (Facebook [Atikoglu12]): 90% of keys < 31 bytes; tens of millions of queries per second (Facebook [Atikoglu12])
Memcached internals Chaining hashtable; LRU caching using a doubly linked list; a global lock; huge space overhead
Goals Target: read-intensive workloads with small objects. Reduce space overhead (bytes/key); improve throughput (queries/sec). Result: 3x throughput, 30% more objects stored
Memcached: Chaining Hashtable Uses linked lists: costly space overhead for pointers; pointer dereferences are slow (no benefit from the CPU cache); reads are not constant time (a chain may be long)
Cuckoo Hashing Use two hash functions, HASH1 and HASH2, to pick two candidate buckets for each key. Each bucket has exactly 4 slots (fits in the CPU cache). Each (key, value) object can therefore reside in one of 8 possible slots.
Cuckoo Hashing [Figures: insertion walkthrough for keys a, b, and c. Each key is placed in one of its two candidate buckets, HASH1(k) or HASH2(k); when both buckets are full, an existing key is kicked out to its alternate bucket to make room. Done!]
Cuckoo Hashing Read: 4 lookups on average. Write(ka, va): find an empty slot among the 8 possible slots for ka; if all are full, randomly kick some (kb, vb) out; now find an empty slot for (kb, vb); repeat 500 times or until an empty slot is found; if still not found, do a table expansion
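The read and write procedures above can be sketched as follows. This is a minimal single-threaded illustration, not MemC3's implementation: the bucket count, the salted second hash, and the use of Python tuples for entries are all assumptions made for brevity.

```python
import random

BUCKETS = 16   # number of buckets (illustrative; a real table is far larger)
SLOTS = 4      # 4 slots per bucket, as on the slide
MAX_KICKS = 500

table = [[None] * SLOTS for _ in range(BUCKETS)]

def hash1(key):
    return hash(key) % BUCKETS

def hash2(key):
    # A salted hash stands in for a second, independent hash function
    return hash((key, "salt")) % BUCKETS

def lookup(key):
    # Read: probe at most 2 buckets (8 slots total)
    for b in (hash1(key), hash2(key)):
        for entry in table[b]:
            if entry is not None and entry[0] == key:
                return entry[1]
    return None

def insert(key, value):
    # Write: try both candidate buckets; if full, kick a random victim
    # out and re-insert it, repeating up to MAX_KICKS times
    for _ in range(MAX_KICKS):
        for b in (hash1(key), hash2(key)):
            for i in range(SLOTS):
                if table[b][i] is None:
                    table[b][i] = (key, value)
                    return True
        b = random.choice((hash1(key), hash2(key)))
        i = random.randrange(SLOTS)
        # Swap the new item in and continue the loop with the victim
        (key, value), table[b][i] = table[b][i], (key, value)
    return False  # caller would trigger a table expansion
```

Note that `insert` reuses the loop variables for the displaced victim, so each iteration of the outer loop tries to re-home whichever item is currently "outside" the table.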
Cuckoo's advantages Concurrency: multiple readers / single writer. Read-optimized (entries fit in the CPU cache). Still O(1) amortized time for writes. 30% less space overhead. 95% table occupancy
Floating Problem During an insertion, one item is always temporarily outside the table, so a concurrent reader can see a false cache miss. Solution: compute the kick-out path (cuckoo path) first, then move items backward along it
Computed Cuckoo path / Cuckoo path backward insert [Figures: inserting a with the optimization. The cuckoo path starting at HASH1(ka) is computed first, without moving anything; items are then shifted backward along the path, starting from the empty slot at its end, so each key is copied to its new slot before its old slot is overwritten and no key is ever missing from the table.]
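The two-phase insertion shown in the figures can be sketched as below. This is a single-threaded sketch under the same assumed table layout (two hash functions, 4 slots per bucket); the depth-limited DFS path search and the salted second hash are illustrative choices, not MemC3's exact mechanism.

```python
# Minimal standalone table (same layout assumptions as before)
BUCKETS, SLOTS = 16, 4
table = [[None] * SLOTS for _ in range(BUCKETS)]

def hash1(key): return hash(key) % BUCKETS
def hash2(key): return hash((key, "salt")) % BUCKETS

def find_path(bucket, depth):
    """Depth-limited DFS (moves nothing) for a chain of (bucket, slot)
    positions ending at an empty slot."""
    for i in range(SLOTS):
        if table[bucket][i] is None:
            return [(bucket, i)]          # path ends at an empty slot
    if depth == 0:
        return None
    for i in range(SLOTS):
        victim = table[bucket][i][0]
        # the victim's alternate bucket
        alt = hash2(victim) if hash1(victim) == bucket else hash1(victim)
        rest = find_path(alt, depth - 1)
        if rest is not None:
            return [(bucket, i)] + rest
    return None

def insert(key, value):
    path = None
    for b in (hash1(key), hash2(key)):
        path = find_path(b, depth=5)
        if path:
            break
    if not path:
        return False                      # would trigger a table expansion
    # Move backward: fill the empty slot at the end of the path first,
    # so every key stays readable in some slot throughout the insertion.
    for j in range(len(path) - 1, 0, -1):
        db, ds = path[j]                  # destination (vacated or empty)
        sb, ss = path[j - 1]              # source = next victim on the path
        table[db][ds] = table[sb][ss]
    b0, s0 = path[0]
    table[b0][s0] = (key, value)
    return True
```

Because each move copies a key into its alternate bucket before clearing its old slot, a concurrent reader probing either candidate bucket never observes the false miss described on the previous slide.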
Memcached: Doubly linked list At least two pointers per item: expensive for small key-value pairs. Both reads and writes update an item's position and change the list's structure, so threads must lock it (no concurrency)
Solution: CLOCK-based LRU Only supports multiple readers / single writer. Approximate LRU: a circular queue instead of a linked list means less space overhead, 1 bit per entry vs. 16 bytes of pointers
CLOCK example

               entry:    (ka,va) (kb,vb) (kc,vc) (kd,vd) (ke,ve)
Originally:    recency:     0       1       0       0       1

Read(kd):      recency:     0       1       0       1       1

Write(kf,vf):  entry:    (ka,va) (kb,vb) (kf,vf) (kd,vd) (ke,ve)
               recency:     0       0       1       1       1

Write(kg,vg):  entry:    (kg,vg) (kb,vb) (kf,vf) (kd,vd) (ke,ve)
               recency:     1       0       1       0       0
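The CLOCK policy in this example can be sketched as a small class. This is an illustrative standalone sketch: in MemC3 the slot would be located through the cuckoo hashtable rather than the linear scan used here, and the starting hand position (at kb) is an assumption chosen to reproduce the example above.

```python
class ClockCache:
    """Approximate LRU: one recency bit per entry plus a circular hand,
    instead of a doubly linked list with two pointers per item."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.keys = [None] * capacity
        self.vals = [None] * capacity
        self.recency = [0] * capacity
        self.hand = 0

    def get(self, key):
        # (A real store finds the slot via the hashtable; a linear scan
        # keeps this sketch short.)
        for i in range(self.capacity):
            if self.keys[i] == key:
                self.recency[i] = 1      # mark as recently used: 1 bit
                return self.vals[i]
        return None

    def put(self, key, value):
        # Sweep the hand, clearing recency bits, until an entry with
        # bit 0 is found; evict it and install the new entry with bit 1.
        while True:
            i = self.hand
            self.hand = (self.hand + 1) % self.capacity
            if self.recency[i] == 0:
                self.keys[i], self.vals[i], self.recency[i] = key, value, 1
                return
            self.recency[i] = 0
```

Readers only ever set a recency bit, so multiple readers can proceed concurrently while the single writer sweeps the hand, matching the multiple-readers/single-writer model above.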
Evaluation 68% throughput improvement in the all-hit case; 235% in the all-miss case
Evaluation 3x throughput on a real workload
Discussion Single-machine, multi-core optimization: no cooperation between machines, so a load balancer is needed and hotspots cannot be addressed at the cluster level. Atomic insertion: a lock is held along the cuckoo path. The impact of hotspots: false evictions?