Overlay Networks and Consistent Hashing in Distributed Systems

CS 3700
Networks and Distributed Systems
 
Overlay Networks
(P2P DHT via KBR FTW)
 
Revised 11/04/2019

Outline

Consistent Hashing
Structured Overlays / DHTs
Key/Value Storage Service

Imagine a simple service that stores key/value pairs
Similar to memcached or redis
put(“ajackson”, “abc”); get(“ajackson”) → “abc”
One server is probably fine as long as total pairs < 1M
How do we scale the service as the number of pairs grows?
Add more servers and distribute the data across them
Mapping Keys to Servers

Problem: how do you map keys to servers?
<“key1”, “value1”>, <“key2”, “value2”>, <“key3”, “value3”> → ?
Keep in mind, the number of servers may change (e.g. we could add a new server, or a server could crash)
Hash Tables

hash(key) % n → array index
Array (length = n)
<“key1”, “value1”>, <“key2”, “value2”>, <“key3”, “value3”> each hash to a slot in the array
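To make the mapping concrete, here is a minimal sketch (the md5-based hash and the array length are illustrative, not from the slides):

```python
import hashlib

def bucket(key: str, n: int) -> int:
    """hash(key) % n -> array index, as on the slide."""
    # Use a stable hash so every client computes the same index.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % n

n = 4  # array length (illustrative)
for key in ["key1", "key2", "key3"]:
    print(key, "-> slot", bucket(key, n))
```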
(Bad) Distributed Key/Value Service

hash(str) % n → array index
Array (length = n): IP address of node A, node B, node C, node D
After adding a server (length = n + 1): IP addresses of nodes A–E, and <“key1”>…<“key3”> must be re-placed
Number of servers (n) will change
Need a “deterministic” mapping
As few changes as possible when machines join/leave
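Why is this design bad? A quick illustrative experiment: growing from n to n+1 servers under hash % n remaps roughly n/(n+1) of all keys, the opposite of “as few changes as possible”:

```python
import hashlib

def slot(key: str, n: int) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % n

keys = [f"key{i}" for i in range(100_000)]
n = 4
moved = sum(slot(k, n) != slot(k, n + 1) for k in keys)
print(f"{moved / len(keys):.0%} of keys move")  # ~ n/(n+1), i.e. ~80% here
```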
Consistent Hashing

Alternative hashing algorithm with many beneficial characteristics
1. Deterministic (just like normal hashing algorithms)
2. Balanced: given n servers, each server should get roughly 1/n of the keys
3. Locality sensitive: if a server is added, only 1/(n+1) of the keys need to be moved
Conceptually simple
Imagine a circular number line from 0 → 1
Hash the ID of each server and place it on the number line
Hash each key and place it at the next server on the number line
Move around the circle clockwise to find the next server
Consistent Hashing Example

(hash(str) % 256) / 256 → ring location
[Ring diagram: servers A–E and keys key1–key3 hashed onto the 0 → 1 circle; each key is stored at the next server clockwise]
Practical Implementation

In practice, no need to implement complicated number lines
Store a list of servers, sorted by their hash (floats from 0 → 1)
To put() or get() a pair, hash the key and search the list for the first server where hash(server) >= hash(key); if no such server exists, wrap around to the first server in the list
O(log n) search time if we use a sorted data structure (e.g. a sorted array with binary search, or a balanced tree)
O(log n) time to insert a new server into the list
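A minimal sketch of this implementation (the md5-based ring_hash is an illustrative stand-in for the example slide’s (hash(str) % 256)/256):

```python
import bisect
import hashlib

def ring_hash(s: str) -> float:
    """Hash a string to a float in [0, 1) -- a ring location."""
    return int(hashlib.md5(s.encode()).hexdigest(), 16) / 2**128

class Ring:
    def __init__(self, servers):
        # Servers sorted by their hash, as the slide describes.
        self.ring = sorted((ring_hash(s), s) for s in servers)

    def lookup(self, key: str) -> str:
        """First server with hash(server) >= hash(key), wrapping past 1."""
        i = bisect.bisect_left(self.ring, (ring_hash(key),))
        return self.ring[i % len(self.ring)][1]

ring = Ring(["server A", "server B", "server C", "server D", "server E"])
print(ring.lookup("key1"))
```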
Improvements to Consistent Hashing

Problem: hashing may not result in perfect balance (1/n items per server)
Solution: balance the load by hashing each server multiple times
consistent_hash(“serverA_1”) = …
consistent_hash(“serverA_2”) = …
consistent_hash(“serverA_3”) = …
Problem: if a server fails, data may be lost
Solution: replicate key/value pairs on multiple servers
(e.g. <“key1”> with consistent_hash(“key1”) = 0.4 is stored on several servers near 0.4 on the ring)
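A sketch of the multiple-hashing fix (often called virtual nodes), reusing Ring and ring_hash from the previous sketch; the replica count of 3 is illustrative:

```python
class VirtualNodeRing(Ring):
    REPLICAS = 3  # times each server is hashed onto the ring (illustrative)

    def __init__(self, servers):
        # "server A_1", "server A_2", ... each claim a spot on the ring,
        # but all of them resolve back to the same physical server.
        self.ring = sorted(
            (ring_hash(f"{s}_{i}"), s)
            for s in servers
            for i in range(1, self.REPLICAS + 1)
        )

ring = VirtualNodeRing(["server A", "server B", "server C"])
print(ring.lookup("key1"))
```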
 
Consistent Hashing Summary

Consistent hashing is a simple, powerful tool for building distributed systems
Provides a consistent, deterministic mapping between names and servers
Often called locality sensitive hashing
Ideal algorithm for systems that need to scale up or down gracefully
Many, many systems use consistent hashing
CDNs
Databases: memcached, redis, Voldemort, Dynamo, Cassandra, etc.
Overlay networks (more on this coming up…)

Outline

Consistent Hashing
Structured Overlays / DHTs
 
Layering, Revisited

[Diagram: protocol stacks — Host 1 and Host 2 run Application / Transport / Network / Data Link / Physical; the Router in between runs only Network / Data Link / Physical]
Layering hides low level details from higher layers
IP is a logical, point-to-point overlay
Towards Network Overlays

IP provides best-effort, point-to-point datagram service
Maybe you want additional features not supported by IP or even TCP
Multicast
Security
Reliable, performance-based routing
Content addressing, reliable data storage
Idea: overlay an additional routing layer on top of IP that adds these features
Example: Virtual Private Network (VPN)

VPNs encapsulate IP packets over an IP network
VPN is an IP-over-IP overlay, but not all overlays need to be IP-based
[Diagram: private hosts 34.67.0.1–34.67.0.4 on two private networks, connected across the public Internet by gateways 74.11.0.1 and 74.11.0.2; the inner packet (Dest: 34.67.0.4) travels inside an outer packet (Dest: 74.11.0.2)]
Network Overlays

[Diagram: the Host 1 / Router / Host 2 stacks again, now with overlay layers on the hosts — a P2P Overlay above Transport and a VPN Network layer between Transport and Network; the Router still sees only Network / Data Link / Physical]
 
Network Layer, version 2?

Function:
Provide natural, resilient routes based on keys
Enable new classes of P2P applications
Key challenge:
Routing table overhead
Performance penalty vs. IP
[Diagram: a host stack with the overlay as a second Network layer, inserted between Application and Transport]
Unstructured P2P Review

Search is broken
Redundancy and high traffic overhead
What if the file is rare or far away?
No guarantee the search will work
Why Do We Need Structure?

Without structure, it is difficult to search
Any file can be on any machine
Centralization can solve this (e.g. Napster), but we know how that ends
How do you build a P2P network with structure?
1. Give every machine and object a unique name
2. Map from objects → machines
Looking for object A? Map(A) → X, talk to machine X
Looking for object B? Map(B) → Y, talk to machine Y
Is this starting to sound familiar?
Naïve Overlay Network

P2P file-sharing network
Peers choose random IDs on the 0 → 1 ring
Locate files by hashing their names: hash(“GoT_s03e04.mkv”) = 0.314 → stored at the peer at 0.322
Problems?
How do you know the IP addresses of arbitrary peers?
There may be millions of peers
Peers come and go at random (churn)
Structured Overlay Fundamentals

Every machine chooses a unique, random ID
Used for routing and object location, instead of IP addresses
Deterministic Key → Node mapping
Consistent hashing
Allows peer rendezvous using a common name
Advantages: completely decentralized, self organizing, infinitely scalable
Key-based routing
Scalable to any network of size N
Each node needs to know the IP of b * log_b(N) other nodes
Much better scalability than OSPF/RIP/BGP
Routing from node A → B takes at most log_b(N) hops
Structured Overlays at 10,000ft.

Node IDs and keys come from a randomized namespace
Incrementally route toward the destination ID
Each node knows a small number of IDs + IPs
Each node has a routing table; forward to the longest prefix match
Example: a message To: ABCD hops A930 → AB5F → ABC0 → ABCE, matching one more prefix digit per hop
Structured Overlay Implementations

Many P2P structured overlay implementations
Generation 1: Chord, Tapestry, Pastry, CAN
Generation 2: Kademlia, SkipNet, Viceroy, Symphony, Koorde, Ulysseus, …
Shared goals and design
Large, sparse, randomized ID space
All nodes choose IDs randomly
Nodes insert themselves into the overlay based on ID
Given a key k, the overlay deterministically maps k to its root node (a live node in the overlay)
Details

Structured overlay APIs
route(key, msg): route msg to the node responsible for key
Just like sending a packet to an IP address
Distributed hash table (DHT) functionality
put(key, value): store value at the node responsible for key
get(key): retrieve the stored value for key from that node
Key questions:
Node ID space, what does it represent?
How do you route within the ID space?
How big are the routing tables?
How many hops to a destination (in the worst case)?
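A sketch of how the two APIs relate (the message framing and class names are invented for illustration; only route/put/get come from the slide):

```python
from abc import ABC, abstractmethod

class KeyBasedRouting(ABC):
    """route(key, msg): deliver msg to the node responsible for key."""
    @abstractmethod
    def route(self, key: str, msg: bytes) -> None: ...

class DHT:
    """put/get layered on top of key-based routing."""
    def __init__(self, kbr: KeyBasedRouting):
        self.kbr = kbr

    def put(self, key: str, value: bytes) -> None:
        # Store value at the node responsible for key.
        self.kbr.route(key, b"PUT " + value)

    def get(self, key: str) -> None:
        # Ask the node responsible for key to reply with the stored value.
        self.kbr.route(key, b"GET")
```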
 
Tapestry/Pastry

Node IDs are numbers on a ring
160-bit circular ID space
Node IDs chosen at random
A message for key X is routed to the live node with the longest prefix match to X
Incremental prefix routing toward 1110: 1XXX → 11XX → 111X → 1110
[Ring diagram: 4-bit ring with nodes 0010, 0100, 0110, 1000, 1010, 1100, 1110, 1111|0 and a message addressed To: 1110]
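A toy sketch of the longest-prefix-match rule over 4-bit IDs (the routing-table entries are illustrative):

```python
def prefix_len(a: str, b: str) -> int:
    """Number of leading digits two IDs share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def next_hop(my_id: str, key: str, table: list[str]) -> str:
    """Forward to the known node with the longest prefix match to key."""
    return max(table + [my_id], key=lambda node: prefix_len(node, key))

# Toy routing table for node 0010 (entries are illustrative).
table = ["1000", "0100", "0011"]
print(next_hop("0010", "1110", table))  # -> 1000: fixes the first digit
```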
Physical and Virtual Routing

[Diagram: the same message To: 1110 shown two ways — as virtual hops around the ID ring (0010 → 1010 → 1100 → 1111|0 → 1110) and as the corresponding physical paths through the underlying IP network]
Problem: Routing Table Size
 
27
 
Definitions:
N 
is the size of the network
b
 is the base of the node IDs
d 
is the number of digits in node IDs
b
d
 = N
If 
N
 is large, then a naïve routing table is going to be huge
Assume a flat naming space (kind of like MAC addresses)
A client knows its own ID
To send to any other node, would need to know 
N-1
 other IP addresses
Suppose 
N
 = 1 billion :(
Tapestry/Pastry Routing Tables

Incremental prefix routing
Definitions:
N is the size of the network
b is the base of the node IDs
d is the number of digits in node IDs
b^d = N
How many neighbors at each prefix digit? b-1
How big is the routing table? Total size: b * d
Or, equivalently: b * log_b N
log_b N hops to any destination
[Ring diagram: routing to 1011 — 0011 → 1110 → 1000 → 1010, fixing one more prefix digit per hop]
Derivation

Definitions:
N is the size of the network
b is the base of the node IDs
d is the number of digits in node IDs
b^d = N (b^d can enumerate up to N items)
Routing table size is b * d
b^d = N
d * log b = log N
d = log N / log b
d = log_b N
Thus, the routing table has size b * log_b N
Key result! The size of the routing tables grows logarithmically with the size of the network
Huge P2P overlays are totally feasible
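Plugging in the earlier worst case (N = 1 billion) with hexadecimal IDs (b = 16), just to see the arithmetic:

```python
import math

N = 1_000_000_000  # the scary network size from before
b = 16             # hexadecimal node IDs

d = math.ceil(math.log(N, b))  # log_16(1e9) ~ 7.5, so d = 8 digits
print("routing table:", b * d, "entries")  # 128
print("worst-case hops:", d)               # 8
# vs. N - 1 = 999,999,999 entries for the flat naming space
```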
Routing Table Example

Hexadecimal (base-16), node ID = 65a1fc4
[Table: d rows (d = length of the node ID), one per prefix length — Row 0, Row 1, Row 2, Row 3, … — with b columns each; each entry x is the IP address of a peer]
Routing, One More Time

Each node has a routing table
Routing table size: b * d, or equivalently b * log_b N
Hops to any destination: log_b N
[Ring diagram: routing To: 1110 across the 4-bit ring]
Leaf Sets
 
32
 
Each node has an additional table of the 
L
/2 numerically closest
neighbors
Larger and smaller
Uses
Alternate routes
Fault detection (keep-alive)
Replication of data
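A small sketch of selecting a leaf set (L = 4 is illustrative; a real node learns these IDs incrementally rather than from a global list):

```python
def leaf_set(my_id: int, all_ids: list[int], L: int = 4) -> list[int]:
    """The L/2 closest smaller and L/2 closest larger IDs, wrapping the ring."""
    ring = sorted(all_ids)
    i = ring.index(my_id)
    n = len(ring)
    smaller = [ring[(i - k) % n] for k in range(1, L // 2 + 1)]
    larger = [ring[(i + k) % n] for k in range(1, L // 2 + 1)]
    return smaller[::-1] + larger

ids = [0b0010, 0b0100, 0b0110, 0b1000, 0b1010, 0b1100, 0b1110, 0b1111]
print([f"{x:04b}" for x in leaf_set(0b1000, ids)])
# ['0100', '0110', '1010', '1100']
```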
Joining the Overlay

1. Pick a new ID X
2. Contact an arbitrary bootstrap node
3. Route a message to X, discover the current owner
4. Add the new node to the ring
5. Download routes from new neighbors, update leaf sets
[Ring diagram: new node 0011 joins the 4-bit ring between 0010 and 0100]
 
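Expressed against the route() API from the Details slide, the join steps look roughly like this (the JOIN message and ID format are invented for illustration):

```python
import random

def join(bootstrap: "KeyBasedRouting") -> None:
    # 1. Pick a new random ID X in the 160-bit space.
    x = f"{random.getrandbits(160):040x}"
    # 2-3. Route a JOIN message to X via any bootstrap node; the current
    #      owner of X receives it, since X's root is always a live node.
    bootstrap.route(x, f"JOIN {x}".encode())
    # 4-5. (On the replies) the new node is spliced into the ring, copies
    #      routing-table rows from its new neighbors, and the neighbors
    #      update their leaf sets.
```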
Node Departure
 
34
 
Leaf set members exchange periodic keep-alive messages
Handles local failures
Leaf set repair:
Request the leaf set from the farthest node in the set
Routing table repair:
Get table from peers in row 0, then row 1, …
Periodic, lazy
DHTs and Consistent Hashing

[Ring diagram: a message To: 1101 routed to the node responsible for 1101]
Mappings are deterministic in consistent hashing
Nodes can leave
Nodes can enter
Most data does not move
Only local changes impact data placement
Data is replicated among the leaf set
 
Summary of Structured Overlays
 
A namespace
For most, this is a linear range from 0 to some large number
A mapping from key to node
Chord: keys between node X and its predecessor belong to X
Tapestry/Pastry: keys belong to node w/ closest identifier
CAN: well defined N-dimensional space for each node
 
36
 
Summary, Continued

A routing algorithm
Numeric (Chord), prefix-based (Tapestry/Pastry/Chimera), hypercube (CAN)
Routing state: how much info is kept per node
Tapestry/Pastry: b * log_b N
The i-th column specifies nodes that match an i-digit prefix, but differ on the (i+1)-th digit
Chord: log_2 N pointers
The i-th pointer points to MyID + N * (0.5)^i
CAN: 2*d neighbors for d dimensions
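A quick check of the Chord pointer rule above with a toy namespace of N = 16 (so log_2 N = 4 pointers):

```python
N = 16      # toy namespace size
my_id = 3

# The i-th pointer targets MyID + N * (0.5)^i, wrapping around the ring.
fingers = [(my_id + int(N * 0.5**i)) % N for i in range(1, 5)]
print(fingers)  # [11, 7, 5, 4] -- distances halve: N/2, N/4, N/8, N/16
```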
 
Structured Overlay Advantages and Uses

High level advantages
Completely decentralized
Self-organizing
Scalable and (relatively) robust
Applications
Reliable distributed storage: OceanStore (FAST’03), Mnemosyne (IPTPS’02)
Resilient anonymous communication: Cashmere (NSDI’05)
Consistent state management: Dynamo (SOSP’07)
Many, many others: multicast, spam filtering, reliable routing, email services, even distributed mutexes
Trackerless BitTorrent

[Ring diagram: Torrent Hash 1101 names a key on the DHT ring; the node that owns 1101 plays the Tracker’s role, and the Initial Seed, Leechers, and the rest of the Swarm find one another by looking up that key]
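In terms of the DHT API from the Details slide, a trackerless tracker is just a key/value convention (a hedged sketch; real trackerless BitTorrent uses a Kademlia-based DHT with its own message types):

```python
torrent_hash = "1101"  # from the torrent metadata

def announce(dht: "DHT", my_addr: str) -> None:
    # Each peer stores its address under the torrent hash; the root
    # node for that key plays the tracker's role.
    dht.put(torrent_hash, my_addr.encode())

def find_swarm(dht: "DHT") -> None:
    # New leechers look up the same key to discover the swarm.
    dht.get(torrent_hash)
```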