Evolution of Peer-to-Peer Networks and Distributed Hash Tables

Peer to Peer Networks
Distributed Hash Tables
Chord, Kelips, Dynamo
Galen Marchetti, Cornell University
1960s – 1999: Research Origins
ARPANET
every node requests and serves content
No self-organization
USENET
Decentralized messaging system
Self-organized
World Wide Web
Originally imagined as fundamentally P2P
Each node providing and receiving content
Sean Parker
1999-2001: Industry Popularity
Napster
Centralized index
P2P file transfer
FastTrack / Kazaa
“Supernodes” can act as proxy servers and routers
 P2P file transfer
Gnutella
Fully distributed P2P network
Robert Morris
  
Chord Protocol
 Handles one operation:
Map keys to nodes
 With system properties:
Completely decentralized
Dynamic membership
 Hits performance goals:
High availability
Scalable in number of nodes
Decentralization Requirements
User has a key k, wants a node n_k
Given k, every node must:
locate n_k
OR
Delegate the location of n_k to another node
System must eventually return n_k
Consistent Hashing System
Given k, every node can locate n_k
Hash every node’s IP address
Map these values onto a circle
Given a key k, hash k
k is assigned to the closest node on the circle, moving clockwise.
Consistent Hashing System
(figure: identifier circle with three nodes 0, 1, and 3; key 1 is located at node 1, key 2 at node 3, and key 6 at node 0)
Consistent Hashing System
 Pros:
Load Balanced
Dynamic membership
when the Nth node joins the network, only O(1/N) keys are moved to rebalance
 Con:
Every node must know about every
other node
O(N) memory, O(1) communication
Not scalable in number of nodes
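As an illustration of the scheme above (the full-membership variant whose O(N) memory cost motivates Chord), here is a minimal Python sketch; the ConsistentHashRing class and helper names are hypothetical, with SHA-1 as the base hash as in the Chord paper:

import hashlib
from bisect import bisect_left

M = 2 ** 160  # SHA-1 identifier space

def ident(value: str) -> int:
    # Hash a node address or a key onto the identifier circle.
    return int(hashlib.sha1(value.encode()).hexdigest(), 16) % M

class ConsistentHashRing:
    def __init__(self, node_addresses):
        # Every node knows every other node: the sorted identifiers form the circle.
        self.ring = sorted((ident(a), a) for a in node_addresses)

    def successor(self, key: str) -> str:
        # The key is assigned to the first node clockwise from hash(key).
        k = ident(key)
        idx = bisect_left(self.ring, (k,))
        return self.ring[idx % len(self.ring)][1]   # wrap around past the top of the circle

ring = ConsistentHashRing(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print(ring.successor("cnn.com"))   # O(1) lookup, but O(N) membership state per node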
Scaling Consistent Hashing
 Approach 0:
Each node keeps track of only
their successor
 Resolution of hash function done
through routing
 O(1) memory
 O(N) communication
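A rough sketch of Approach 0, assuming hypothetical node objects that carry only an id and a successor reference; the lookup walks the ring one hop at a time until the key falls between a node and its successor:

def in_range(x, a, b):
    # True if x lies in the ring interval (a, b], wrapping around zero.
    return (a < x <= b) if a < b else (x > a or x <= b)

def lookup(start_node, key_id):
    # O(1) state per node, but up to O(N) hops per lookup.
    node, hops = start_node, 0
    while not in_range(key_id, node.id, node.successor.id):
        node = node.successor
        hops += 1
    return node.successor, hops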
Scaling Consistent Hashing
 Approach 1:
Each node keeps track of O(log N)
successors in a “finger table”
 O(log N) memory
 O(log N) communication
Finger Table Pointers
Routing with Finger Tables
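Below is a loose, single-process sketch of finger-table routing in the spirit of Chord's find_successor; the Node class, its fields, and the direct method calls standing in for RPCs are illustrative assumptions, not the paper's pseudocode:

M_BITS = 8
MOD = 2 ** M_BITS

def between(x, a, b):
    # True if x lies in the ring interval (a, b], wrapping modulo 2^m.
    return (a < x <= b) if a < b else (x > a or x <= b)

class Node:
    def __init__(self, ident, network):
        self.id = ident
        self.network = network          # id -> Node map, standing in for the overlay
        self.successor_id = None
        self.fingers = []               # fingers[i] ~ id of successor(self.id + 2^i)

    def build_fingers(self, sorted_ids):
        self.fingers = []
        for i in range(M_BITS):
            start = (self.id + 2 ** i) % MOD
            self.fingers.append(next((x for x in sorted_ids if x >= start), sorted_ids[0]))
        self.successor_id = self.fingers[0]

    def closest_preceding(self, key):
        # Walk the finger table from the farthest pointer back toward ourselves.
        for f in reversed(self.fingers):
            if between(f, self.id, key) and f != key:
                return f
        return self.id

    def find_successor(self, key):
        if between(key, self.id, self.successor_id):
            return self.successor_id
        nxt = self.closest_preceding(key)
        if nxt == self.id:
            return self.successor_id
        return self.network[nxt].find_successor(key)   # roughly O(log N) hops

Each hop at least halves the remaining identifier distance to the key, which is where the O(log N) bound on both memory and messages comes from.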
Node Joins
Learn finger table from predecessor
O(log n)
Update other nodes’ tables
O(log² n)
Notify application for state transfer
O(1)
Concurrent Joins
Maintain correctness by ensuring
successor is likely correct
“Stabilize” periodically
Verify successor
Update a random finger table entry
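A minimal sketch of the stabilize/notify idea, with assumed field names and direct object references in place of RPCs; a real deployment would run stabilize and a random finger refresh on a timer at every node:

import random

def between(x, a, b):
    # True if x lies strictly inside the ring interval (a, b).
    return (a < x < b) if a < b else (x > a or x < b)

class Node:
    def __init__(self, ident, m_bits=8):
        self.id = ident
        self.m_bits = m_bits
        self.successor = self
        self.predecessor = None
        self.fingers = [self] * m_bits

    def notify(self, candidate):
        # Adopt the candidate as predecessor if it falls between the old one and us.
        if self.predecessor is None or between(candidate.id, self.predecessor.id, self.id):
            self.predecessor = candidate

    def stabilize(self):
        # Verify the successor: if its predecessor sits between us and it, adopt it,
        # then remind the successor that we might be its predecessor.
        x = self.successor.predecessor
        if x is not None and between(x.id, self.id, self.successor.id):
            self.successor = x
        self.successor.notify(self)

    def fix_random_finger(self, find_successor):
        # Refresh one random finger table entry per round.
        i = random.randrange(self.m_bits)
        self.fingers[i] = find_successor((self.id + 2 ** i) % (2 ** self.m_bits))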
Handling Failures
Maintain list of “r” immediate
successors
To higher level applications, this list
may be a list of replicas
Chord Shortcomings
High churn rate really hurts the ability
to find keys
Transient network partitions can
permanently disrupt network
Chord does not converge – nodes are
not eventually reachable
Researched with Alloy Modeling by
Pamela Zave at AT&T
Two Circle Failure
Zave, Pamela. "Using lightweight modeling to understand chord." ACM SIGCOMM Computer Communication Review 42.2 (2012): 49-57.
Cornell’s Response
Kelips: Building an Efficient and Stable
P2P DHT Through Increased Memory
and Background Overhead
Gossip!
Kelips
Take a collection of “nodes”
(figure: nodes 30, 110, 230, 202)
Taken from Gossip-Based Networking Workshop: Leiden ‘06
Kelips
Map nodes to affinity groups
Affinity Groups: peer membership thru consistent hash
√N members per affinity group
(figure: affinity groups 0, 1, 2; nodes 30, 110, 230, 202)
Taken from Gossip-Based Networking Workshop: Leiden ‘06
Kelips
Affinity Groups: peer membership thru consistent hash
Affinity group pointers
Affinity group view: 110 knows about other members – 230, 30…
√N members per affinity group
(figure: affinity groups 0, 1, 2; nodes 30, 110, 230, 202)
Taken from Gossip-Based Networking Workshop: Leiden ‘06
Kelips
Affinity Groups: peer membership thru consistent hash
Contact pointers
Contacts: 202 is a “contact” for 110 in group 2
Affinity group view
(figure: affinity groups 0, 1, 2; nodes 30, 110, 230, 202; √N members per affinity group)
Taken from Gossip-Based Networking Workshop: Leiden ‘06
Kelips
Affinity Groups: peer membership thru consistent hash
Resource Tuples
“cnn.com” maps to group 2. So 110 tells group 2 to “route” inquiries about cnn.com to it.
Gossip protocol replicates data cheaply
(figure: affinity group view, contacts, and resource tuples at node 110; groups 0, 1, 2; nodes 30, 110, 230, 202)
Taken from Gossip-Based Networking Workshop: Leiden ‘06
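Pulling the preceding Kelips slides together, here is a rough sketch of the per-node state and the one-hop lookup; the class, field names, and group-assignment details are assumptions for illustration, not the paper's implementation:

import hashlib
import math

def kelips_hash(s: str) -> int:
    return int(hashlib.sha1(s.encode()).hexdigest(), 16)

class KelipsNode:
    def __init__(self, address: str, n_estimate: int):
        self.address = address
        self.groups = max(1, math.isqrt(n_estimate))      # ~sqrt(N) affinity groups
        self.group = kelips_hash(address) % self.groups   # this node's own affinity group
        self.group_view = {}   # member address -> (heartbeat, rtt), maintained by gossip
        self.contacts = {}     # foreign group id -> a few contact addresses
        self.tuples = {}       # resource key -> home node, replicated within our group

    def group_of(self, key: str) -> int:
        return kelips_hash(key) % self.groups

    def lookup(self, key: str):
        gid = self.group_of(key)
        if gid == self.group:
            # Resource tuples for our own group are already replicated here by gossip.
            return self.tuples.get(key)
        # Otherwise ask a contact in the key's group: one hop, O(1) routing.
        return ("ask", self.contacts.get(gid))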
How it works
Kelips is entirely gossip based!
Gossip about membership
Gossip to replicate and repair data
Gossip about “last heard from” time used to discard failed nodes
Gossip “channel” uses fixed bandwidth
… fixed rate, packets of limited size
Taken from Gossip-Based Networking Workshop: Leiden ‘06
Connection to self-stabilization
Self-stabilization theory
Describe a system and a desired property
Assume a failure in which code remains correct but node states
are corrupted
Proof obligation: property reestablished within bounded time
Kelips is self-stabilizing.  Chord isn’t.
Taken from Gossip-Based Networking Workshop: Leiden ‘06
Amazon Dynamo
Highly available distributed hash table
Uses Chord-like ring structure
Two operations:
get()
put()
Following “CAP Theorem” lore…
Sacrifice consistency
Gain availability
No “ACID” transactions
Performance Requirements
Service Level Agreement (SLA)
Cloud providers must maintain certain
performance levels according to
contracts
Clients describe an expected request
rate distribution: SLA describes
expected latency
Amazon expresses SLAs at the 99.9th percentile of latency
High Availability for Writes
Clients write to first node they find
Vector clocks timestamp writes
Different versions of key’s value live
on different nodes
Conflicts are resolved during reads
Like git: “automerge conflict” is
handled by end application
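A minimal vector-clock sketch of this versioning scheme; the clock layout (coordinator id mapped to a counter) and helper names are assumptions:

def increment(clock: dict, coordinator: str) -> dict:
    # New version of a value: bump the coordinator's counter.
    updated = dict(clock)
    updated[coordinator] = updated.get(coordinator, 0) + 1
    return updated

def descends(a: dict, b: dict) -> bool:
    # True if version a includes everything version b has seen.
    return all(a.get(node, 0) >= count for node, count in b.items())

def reconcile(versions):
    # Drop versions that some other version strictly dominates; anything left over
    # is a set of siblings the reading application must merge itself.
    return [v for v in versions
            if not any(descends(o["clock"], v["clock"]) and o["clock"] != v["clock"]
                       for o in versions)]

v1 = {"value": "cart:[book]", "clock": {"A": 1}}
v2 = {"value": "cart:[book,pen]", "clock": {"A": 1, "B": 1}}   # extends v1
v3 = {"value": "cart:[book,mug]", "clock": {"A": 1, "C": 1}}   # concurrent with v2
print(reconcile([v1, v2, v3]))   # v1 is dropped; v2 and v3 come back as siblings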
Incremental Scalability
Consistent Hashing a la Chord
Utilize “virtual nodes” along ring
Many virtual nodes per physical
node
larger machines can hold more
virtual nodes
Heterogeneous hardware is properly
load balanced
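An illustrative sketch of virtual nodes, assuming a hypothetical weight (virtual-node count) per physical host; machines with more weight place more tokens on the ring and therefore own proportionally more keys:

import hashlib
from bisect import bisect_left

def token(s: str) -> int:
    return int(hashlib.sha1(s.encode()).hexdigest(), 16)

def build_ring(hosts: dict) -> list:
    # hosts maps host name -> number of virtual nodes to place on the ring.
    ring = []
    for host, weight in hosts.items():
        for i in range(weight):
            ring.append((token(f"{host}#vnode{i}"), host))
    return sorted(ring)

def owner(ring: list, key: str) -> str:
    idx = bisect_left(ring, (token(key),))
    return ring[idx % len(ring)][1]           # first virtual node clockwise from the key

ring = build_ring({"small-box": 8, "big-box": 32})   # big-box takes roughly 4x the keys
print(owner(ring, "cnn.com"))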
Membership
Background gossip
propagates membership knowledge
Gives O(1) hops for routing
Heartbeats and timeouts detect failures
Replication: Sloppy Quorum
Each node maintains a “preference
list” of replicas for its own data
Replicas are made on the first N healthy nodes from the preference list
require R nodes to respond for get()
require W nodes to respond for put()
Replication: Sloppy Quorum
Quorum System: R + W > N, W > N/2
Dynamo: W < N, R < N
R, W, N are tunable
Blessing: highly flexible
Curse: developers must know how to work with Dynamo correctly
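A sketch of how a coordinator might apply these R and W thresholds against the first N healthy nodes of the preference list; the per-node store/fetch interface is a hypothetical stand-in for Dynamo's internal messaging:

N, R, W = 3, 2, 2   # tunable, as the slide notes; Dynamo permits R < N and W < N

def put(healthy_replicas, key, value, clock):
    # Succeed once W of the first N healthy replicas acknowledge the write.
    acks = sum(1 for node in healthy_replicas[:N] if node.store(key, value, clock))
    return acks >= W

def get(healthy_replicas, key):
    # Succeed once R replicas respond; divergent versions go back to the caller
    # for vector-clock reconciliation.
    responses = [node.fetch(key) for node in healthy_replicas[:N]]
    responses = [r for r in responses if r is not None]
    return responses if len(responses) >= R else None

In a strict quorum (R + W > N) every read overlaps at least one replica that saw the latest write; the sloppy variant relaxes this for availability and pushes reconciliation to the reader.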
Replication: Hinted Handoff
If replica node is down
Use next node on preference list as
replica
Include “hint” declaring the original
replica
Periodically check if the original comes back up: update the replica
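A rough sketch of hinted handoff, assuming node objects with alive, store, and hints attributes; a stand-in node beyond the intended replica set stores the write together with a hint naming the replica it is covering for:

def write_with_handoff(preference_list, key, value, n):
    intended = preference_list[:n]                          # the "real" replica set
    down = [node for node in intended if not node.alive]
    targets = [node for node in preference_list if node.alive][:n]
    for node in targets:
        hint = down.pop(0) if (node not in intended and down) else None
        node.store(key, value, hint=hint)                   # hint = original replica to repay
    return len(targets) > 0                                 # writes do not block on failed replicas

def handoff_pass(node):
    # Background task: once a hinted owner is reachable again, hand its data back.
    for key, (value, owner) in list(node.hints.items()):
        if owner.alive:
            owner.store(key, value, hint=None)
            del node.hints[key]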
Permanent Failure Recovery
Anti-Entropy: Merkle Trees
Maintain a tree per virtual node
Every leaf is a hash of a block of
data (value with an individual key)
Every internal node is the hash of its children
Quick check for consistency
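A small Merkle-tree sketch for this anti-entropy check, with an assumed leaf encoding; two replicas compare root hashes, and only on a mismatch do they walk down the differing subtrees to find the exact keys to repair:

import hashlib

def digest(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_merkle(leaf_blocks):
    # leaf_blocks: ordered (key, value) pairs held by one virtual node.
    level = [digest(f"{k}={v}".encode()) for k, v in leaf_blocks]
    tree = [level]
    while len(level) > 1:
        if len(level) % 2:                    # duplicate the last hash on odd-sized levels
            level = level + [level[-1]]
        level = [digest(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        tree.append(level)
    return tree                               # tree[-1][0] is the root hash

replica_a = build_merkle([("k1", "v1"), ("k2", "v2"), ("k3", "v3")])
replica_b = build_merkle([("k1", "v1"), ("k2", "v2-stale"), ("k3", "v3")])
print(replica_a[-1][0] == replica_b[-1][0])   # False: roots differ, so walk down to find k2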