Evolution of Peer-to-Peer Networks and Distributed Hash Tables
Peer-to-peer networks and distributed hash tables have evolved significantly over the years, from the early days of ARPANET to decentralized systems like Chord, Kelips, and Dynamo. This evolution has brought a shift toward greater decentralization, improved scalability, and better performance in peer-to-peer communication and data storage. Figures such as Sean Parker (Napster) and Robert Morris (a co-author of Chord) helped shape the development of these technologies, paving the way for more efficient and robust distributed systems.
Presentation Transcript
Peer to Peer Networks: Distributed Hash Tables (Chord, Kelips, Dynamo). Galen Marchetti, Cornell University
1960s-1999: Research Origins
- ARPANET: every node requests and serves content; no self-organization
- USENET: decentralized messaging system; self-organized
- World Wide Web: originally imagined as fundamentally P2P, with each node providing and receiving content
1999-2001: Industry Popularity
- Napster: centralized index, P2P file transfer
- FastTrack / Kazaa: supernodes can act as proxy servers and routers; P2P file transfer
- Gnutella: fully distributed P2P network
Chord Protocol
Handles one operation: map keys to nodes.
System properties: completely decentralized, dynamic membership.
Performance goals: high availability, scalable in the number of nodes.
Decentralization Requirements
A user has a key k and wants the responsible node n_k. Given k, every node must either locate n_k or delegate the location of n_k to another node. The system must eventually return n_k.
Consistent Hashing System
Given k, every node can locate n_k. Hash every node's IP address and map these values onto a circle. Given a key k, hash k; k is assigned to the closest node on the circle, moving clockwise.
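As a rough illustration of the rule just described (not code from the talk), here is a minimal Python sketch of a consistent-hashing ring: node addresses and keys are hashed onto one circular identifier space, and a key belongs to the first node clockwise from its hash. The class name and the example addresses are made up.

import hashlib
from bisect import bisect_left

def ring_id(value: str) -> int:
    # Hash a node address or a key into the shared circular identifier space.
    return int.from_bytes(hashlib.sha1(value.encode()).digest(), "big")

class ConsistentHashRing:
    def __init__(self, node_addresses):
        # Place every node on the circle by hashing its IP address.
        self.points = sorted((ring_id(addr), addr) for addr in node_addresses)

    def successor(self, key: str) -> str:
        # The key is assigned to the closest node moving clockwise from its hash.
        kid = ring_id(key)
        ids = [pid for pid, _ in self.points]
        idx = bisect_left(ids, kid) % len(self.points)   # wrap past the largest id
        return self.points[idx][1]

ring = ConsistentHashRing(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print(ring.successor("cnn.com"))   # every node with the same view computes the same owner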
Consistent Hashing System
successor(1) = 1, successor(2) = 3, successor(6) = 0
[Figure 2 of the Chord paper: an identifier circle consisting of the three nodes 0, 1, and 3. Key 1 is located at node 1, key 2 at node 3, and key 6 at node 0.]
[Figure 1 of the Chord paper: structure of an example Chord-based distributed storage system, with a file-system layer over a block storage layer over Chord.]
The slide reproduces the surrounding text from the Chord paper. Chord improves the scalability of consistent hashing by avoiding the requirement that every node know about every other node: a Chord node needs only a small amount of routing information about other nodes, and because this information is distributed, a node resolves the hash function by communicating with a few other nodes. In an N-node network, each node maintains information about only O(log N) other nodes, and a lookup requires O(log N) messages; a join or leave requires O(log^2 N) messages to update routing information.
The consistent hash function assigns each node and key an m-bit identifier using a base hash function such as SHA-1 [9]: a node's identifier is the hash of its IP address, and a key's identifier is the hash of the key. The identifier length m must be large enough to make the probability of two nodes or keys hashing to the same identifier negligible. Identifiers are ordered on an identifier circle modulo 2^m; key k is assigned to the first node whose identifier is equal to or follows k in the identifier space, called the successor node of key k and denoted successor(k). When a node n joins the network, certain keys previously assigned to n's successor become assigned to n; when n leaves, all of its keys are reassigned to its successor. In the example above, a node joining with identifier 7 would capture the key with identifier 6 from the node with identifier 0.
Example applications from the paper: cooperative mirroring (the three-layer structure of Figure 1); time-shared storage, where occasionally available machines store each other's data while they are up, in return for having their own data stored elsewhere when they are down; distributed indexes supporting Gnutella- or Napster-like keyword search, where keys are derived from keywords and values are lists of machines offering matching documents; and large-scale combinatorial search such as code breaking, where keys are candidate solutions and Chord maps them to the machines responsible for testing them.
Theorem 1 (proven in the consistent hashing papers [11, 13]): for any set of N nodes and K keys, with high probability each node is responsible for at most (1 + epsilon)K/N keys, and when an (N+1)st node joins or leaves the network, responsibility for only O(K/N) keys changes hands.
Consistent Hashing System
Pros: load balanced; dynamic membership (when the Nth node joins the network, only an O(1/N) fraction of the keys move to rebalance).
Con: every node must know about every other node: O(N) memory, O(1) communication. Not scalable in the number of nodes.
Scaling Consistent Hashing
Approach 0: each node keeps track of only its successor; resolution of the hash function is done through routing around the ring. O(1) memory, O(N) communication.
Scaling Consistent Hashing
Approach 1: each node keeps track of O(log N) other nodes in a finger table, where entry i points to the successor of (n + 2^i). O(log N) memory, O(log N) communication (sketched below).
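The following Python sketch (illustrative, not from the talk) shows Approach 1's finger-table routing on a tiny 6-bit identifier space with made-up node ids; the finger tables are filled from global knowledge rather than through the join protocol, just to show how each hop moves much closer to the key.

from bisect import bisect_left

M = 6                       # identifier bits; tiny so the example stays readable
RING = 2 ** M

def between(x, a, b):
    # True if x lies in the half-open circular interval (a, b].
    if a < b:
        return a < x <= b
    return x > a or x <= b  # the interval wraps past zero

class ChordNode:
    def __init__(self, node_id):
        self.id = node_id
        self.successor = None
        self.finger = []    # finger[i] = successor(id + 2**i), i = 0..M-1

    def closest_preceding_finger(self, key_id):
        # Scan fingers from farthest to nearest for the closest node before key_id.
        for f in reversed(self.finger):
            if between(f.id, self.id, key_id) and f.id != key_id:
                return f
        return self

def build_ring(ids):
    # Fill successor pointers and finger tables directly from the sorted id list.
    ids = sorted(ids)
    nodes = {i: ChordNode(i) for i in ids}
    def successor_of(x):
        return nodes[ids[bisect_left(ids, x % RING) % len(ids)]]
    for n in nodes.values():
        n.successor = successor_of(n.id + 1)
        n.finger = [successor_of(n.id + 2 ** k) for k in range(M)]
    return nodes

def find_successor(start, key_id):
    # Hop along fingers until the key falls between a node and its successor.
    n = start
    while not between(key_id, n.id, n.successor.id):
        nxt = n.closest_preceding_finger(key_id)
        if nxt is n:        # no closer finger known: stop at the successor
            break
        n = nxt
    return n.successor

nodes = build_ring([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])
print(find_successor(nodes[8], 54).id)   # -> 56, reached in a few finger hops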
Node Joins
Learn finger table from predecessor: O(log N)
Update other nodes' tables: O(log^2 N)
Notify application for state transfer: O(1)
Concurrent Joins
Maintain correctness by ensuring the successor is likely correct: stabilize periodically, verify the successor, and update a random finger table entry (sketched below).
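Below is a hedged Python sketch of that periodic stabilization, following the pseudocode in the Chord paper and reusing M, RING, between(), ChordNode, and find_successor() from the previous sketch; in a running system, stabilize() and fix_fingers() would be called on timers, and all ring members would be StabilizingNode instances.

import random

class StabilizingNode(ChordNode):
    def __init__(self, node_id):
        super().__init__(node_id)
        self.predecessor = None
        self.successor = self          # a brand-new node is its own one-node ring
        self.finger = [self] * M       # fingers start at self until fix_fingers runs

    def join(self, bootstrap):
        # Join an existing ring through any known node; stabilization does the rest.
        self.predecessor = None
        self.successor = find_successor(bootstrap, self.id)

    def stabilize(self):
        # Verify the successor: adopt its predecessor if that node sits between us.
        x = getattr(self.successor, "predecessor", None)
        if x is not None and x.id != self.successor.id and between(x.id, self.id, self.successor.id):
            self.successor = x
        self.successor.notify(self)

    def notify(self, candidate):
        # candidate believes it may be our predecessor.
        if self.predecessor is None or between(candidate.id, self.predecessor.id, self.id):
            self.predecessor = candidate

    def fix_fingers(self):
        # Refresh one random finger table entry per period.
        i = random.randrange(M)
        self.finger[i] = find_successor(self, (self.id + 2 ** i) % RING)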
Handling Failures
Maintain a list of the r immediate successors; to higher-level applications, this list may serve as a list of replicas.
Chord Shortcomings
A high churn rate really hurts the ability to find keys. Transient network partitions can permanently disrupt the network: Chord does not converge, and nodes are not eventually reachable. Researched with Alloy modeling by Pamela Zave at AT&T.
Two Circle Failure Zave, Pamela. "Using lightweight modeling to understand chord." ACM SIGCOMM Computer Communication Review 42.2 (2012): 49-57.
Cornell's Response: Gossip! Kelips: Building an Efficient and Stable P2P DHT Through Increased Memory and Background Overhead
Kelips
Take a collection of nodes. [Diagram: nodes 110, 230, 202, and 30.]
Taken from Gossip-Based Networking Workshop: Leiden, Dec '06
Kelips
Affinity groups: peer membership through consistent hashing. Each node is mapped to one of the affinity groups, numbered 0, 1, 2, and so on, with roughly √N members per affinity group. [Diagram: nodes 110, 230, and 30 share one affinity group; node 202 sits in another.]
Taken from Gossip-Based Networking Workshop: Leiden, Dec '06
Kelips
Node 110 knows about the other members of its own affinity group, 230 and 30, through its affinity group view:
id 30: hbeat 234, rtt 90 ms
id 230: hbeat 322, rtt 30 ms
Taken from Gossip-Based Networking Workshop: Leiden, Dec '06
Kelips
Each node also keeps contact pointers into the other affinity groups; 202 is a contact for 110 in group 2. Contacts at 110:
affinity group 2: contactNode 202
Taken from Gossip-Based Networking Workshop: Leiden, Dec '06
Kelips
cnn.com maps to group 2, so 110 tells group 2 to route inquiries about cnn.com to it. The gossip protocol replicates the resulting resource tuple cheaply. Resource tuples in group 2:
resource cnn.com: info 110
Taken from Gossip-Based Networking Workshop: Leiden, Dec '06
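A hedged Python sketch of the Kelips soft state walked through in the last few slides follows: affinity groups chosen by hashing, a per-group membership view with heartbeats and round-trip times, contacts in foreign groups, and replicated resource tuples. Class names, field names, and the three-group layout are illustrative, and the example pins nodes to groups only so that it mirrors the slide.

import hashlib

NUM_GROUPS = 3                      # Kelips uses about sqrt(N) affinity groups

def key_group(name: str) -> int:
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % NUM_GROUPS

class KelipsNode:
    def __init__(self, addr: str, group: int):
        self.addr = addr
        self.group = group          # normally key_group(addr); pinned here for the example
        self.group_view = {}        # member addr -> {"hbeat": ..., "rtt_ms": ...}
        self.contacts = {}          # foreign group id -> a KelipsNode in that group
        self.resource_tuples = {}   # key -> addr of the node holding the value

    def lookup(self, key: str) -> str:
        # Resolve a key in O(1) hops: in our own group, or via one contact.
        g = key_group(key)
        if g == self.group:
            return self.resource_tuples[key]
        return self.contacts[g].resource_tuples[key]

cnn_group = key_group("cnn.com")
n202 = KelipsNode("202", group=cnn_group)                     # 202 sits in cnn.com's group
n110 = KelipsNode("110", group=(cnn_group + 1) % NUM_GROUPS)  # 110 lives in another group
n110.group_view = {"30": {"hbeat": 234, "rtt_ms": 90},
                   "230": {"hbeat": 322, "rtt_ms": 30}}
n110.contacts[cnn_group] = n202
n202.resource_tuples["cnn.com"] = "110"   # gossip has replicated the tuple to 202's group
print(n110.lookup("cnn.com"))             # -> "110"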
How it works
Kelips is entirely gossip based!
- Gossip about membership
- Gossip to replicate and repair data
- Gossip about "last heard from" time, used to discard failed nodes
- The gossip channel uses fixed bandwidth: fixed rate, packets of limited size
Taken from Gossip-Based Networking Workshop: Leiden, Dec '06
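A small Python sketch of one such fixed-bandwidth gossip round (views are plain dicts of member id to heartbeat counter, and MAX_ENTRIES_PER_PACKET is an invented cap, not a Kelips constant):

import random

MAX_ENTRIES_PER_PACKET = 16      # keeps every gossip packet the same small size

def gossip_round(my_view, peer_views):
    # Send a size-limited digest of our freshest entries to one random peer.
    freshest = sorted(my_view.items(), key=lambda kv: kv[1], reverse=True)
    packet = dict(freshest[:MAX_ENTRIES_PER_PACKET])
    target = random.choice(peer_views)
    for member, hbeat in packet.items():
        target[member] = max(hbeat, target.get(member, 0))   # keep the freshest heartbeat

view_110 = {"30": 234, "230": 322}
view_30 = {"230": 300}
gossip_round(view_110, [view_30])
print(view_30)   # {'230': 322, '30': 234}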
Connection to self-stabilization
Self-stabilization theory: describe a system and a desired property, assume a failure in which code remains correct but node states are corrupted, and take on the proof obligation that the property is reestablished within bounded time. Kelips is self-stabilizing; Chord isn't.
Taken from Gossip-Based Networking Workshop: Leiden, Dec '06
Amazon Dynamo
A highly available distributed hash table that uses a Chord-like ring structure and exposes two operations: get() and put(). Following CAP-theorem lore, it sacrifices consistency to gain availability; there are no ACID transactions.
Performance Requirements
Service Level Agreement (SLA): cloud providers must maintain certain performance levels according to contracts. Clients describe an expected request rate distribution, and the SLA describes the expected latency. Amazon expresses SLAs at the 99.9th percentile of latency.
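Since an SLA at the 99.9th percentile bounds tail latency rather than the average, a tiny illustrative calculation (with made-up sample latencies) makes the difference concrete:

import math

def percentile(samples, p):
    # Nearest-rank percentile: the smallest sample that at least p% of samples do not exceed.
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

latencies_ms = [12, 15, 14, 300, 18, 16, 13, 17, 15, 14]   # made-up request latencies
print(sum(latencies_ms) / len(latencies_ms))               # the average hides the outlier
print(percentile(latencies_ms, 99.9))                      # the SLA bounds this value instead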
High Availability for Writes
Clients write to the first node they find. Vector clocks timestamp writes, so different versions of a key's value can live on different nodes. Conflicts are resolved during reads; like a git merge conflict, resolution is handled by the end application.
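A hedged Python sketch of the vector-clock versioning just described (the clock layout and the shopping-cart example are made up): a clock maps a coordinating node to a counter, one version descends from another if its clock is component-wise greater or equal, and versions that are not ordered are returned together as a conflict for the application to resolve.

def descends(a: dict, b: dict) -> bool:
    # a descends from b if a has seen at least everything b has seen.
    return all(a.get(node, 0) >= count for node, count in b.items())

def reconcile(versions):
    # Drop versions that are ancestors of another; the survivors are conflicting siblings.
    survivors = []
    for v_clock, v_value in versions:
        if not any(descends(o_clock, v_clock) and o_clock != v_clock
                   for o_clock, _ in versions):
            survivors.append((v_clock, v_value))
    return survivors

v1 = ({"nodeA": 1}, "cart:[book]")
v2 = ({"nodeA": 1, "nodeB": 1}, "cart:[book, pen]")   # descends from v1
v3 = ({"nodeA": 2}, "cart:[book, mug]")               # concurrent with v2
print(reconcile([v1, v2, v3]))   # v1 is pruned; v2 and v3 come back as a conflict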
Incremental Scalability
Consistent hashing a la Chord, using virtual nodes along the ring. Each physical node hosts many virtual nodes, and larger machines can hold more virtual nodes, so heterogeneous hardware is properly load balanced.
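Here is an illustrative Python sketch (hostnames, weights, and the hash choice are assumptions) of virtual nodes on the ring: each physical host claims several positions, and a larger host claims more, so it receives a proportionally larger share of keys.

import hashlib
from bisect import bisect_left
from collections import Counter

def ring_pos(value: str) -> int:
    return int.from_bytes(hashlib.md5(value.encode()).digest(), "big")

def build_vnode_ring(hosts):
    # hosts: {hostname: number of virtual nodes}. More vnodes -> more of the ring.
    ring = []
    for host, vnodes in hosts.items():
        for i in range(vnodes):
            ring.append((ring_pos(f"{host}#vnode{i}"), host))
    return sorted(ring)

def owner(ring, key: str) -> str:
    positions = [pos for pos, _ in ring]
    idx = bisect_left(positions, ring_pos(key)) % len(ring)   # first vnode clockwise
    return ring[idx][1]

ring = build_vnode_ring({"small-host": 8, "large-host": 24})   # bigger machine, more vnodes
load = Counter(owner(ring, f"key-{i}") for i in range(10_000))
print(load)   # the larger host ends up owning the larger share of keys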
Membership
Background gossip propagates membership knowledge, giving O(1) hops for routing. Heartbeats and timeouts detect failures.
Replication: Sloppy Quorum
Each node maintains a preference list of replicas for its own data. Replicas are made on the first N healthy nodes from the preference list; get() requires R nodes to respond, and put() requires W nodes to respond.
Replication: Sloppy Quorum
A strict quorum system requires R + W > N and W > N/2. Dynamo allows W < N and R < N, with R, W, and N all tunable. Blessing: highly flexible. Curse: developers must know how to work with Dynamo correctly.
Replication: Hinted Handoff
If a replica node is down, use the next node on the preference list as a replica and include a hint declaring the original replica. Periodically check whether the original comes back up; if it does, update that replica (sketched below).
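The following Python sketch (a simulation with plain dicts; N, W, the node names, and the helper names are all assumptions, not Dynamo's API) ties the last three slides together: a put() writes to the first N healthy nodes of the preference list, succeeds once W of them acknowledge, and a stand-in node records a hint so it can hand the write back when the intended replica recovers.

N, W = 3, 2      # illustrative replication and write-quorum settings

class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.store = {}          # key -> (value, hint); hint names the intended replica

def sloppy_put(key, value, preference_list, n=N, w=W):
    # Write to the first n healthy nodes; stand-ins remember whose data they hold.
    intended = preference_list[:n]
    down = [node for node in intended if not node.up]
    acks = 0
    for node in [node for node in preference_list if node.up][:n]:
        hint = None
        if node not in intended and down:
            hint = down.pop(0).name      # this write was meant for a down replica
        node.store[key] = (value, hint)
        acks += 1
    return acks >= w                     # succeed once w replicas acknowledge

def handoff(stand_in, nodes_by_name):
    # Periodically push hinted writes back to their intended replica once it is up.
    for key, (value, hint) in list(stand_in.store.items()):
        if hint and nodes_by_name[hint].up:
            nodes_by_name[hint].store[key] = (value, None)
            del stand_in.store[key]

a, b, c, d = Node("A"), Node("B"), Node("C"), Node("D")
b.up = False
print(sloppy_put("cart:42", "[book]", [a, b, c, d]))   # True; D stands in for B with a hint
b.up = True
handoff(d, {"A": a, "B": b, "C": c, "D": d})           # the hinted write flows back to B
print(b.store["cart:42"])                              # ('[book]', None)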
Permanent Failure Recovery
Anti-entropy with Merkle trees: maintain a tree per virtual node, where every leaf is a hash of a block of data (the value for an individual key) and every interior node is the hash of its children. This gives a quick check for consistency.
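A hedged Python sketch of that anti-entropy check (the per-key leaves and the two-replica comparison are a simplification of the per-virtual-node key ranges described above): equal roots mean the replicas' ranges agree, and unequal roots mean the replicas would descend into subtrees to find exactly which keys differ.

import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(items: dict) -> bytes:
    # Leaves hash individual key/value pairs; each parent hashes its two children.
    level = [h(f"{k}={v}".encode()) for k, v in sorted(items.items())]
    if not level:
        return h(b"")
    while len(level) > 1:
        if len(level) % 2:                   # duplicate the last hash when the level is odd
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

replica_a = {"k1": "v1", "k2": "v2", "k3": "v3"}
replica_b = {"k1": "v1", "k2": "stale", "k3": "v3"}
print(merkle_root(replica_a) == merkle_root(replica_b))   # False: the replicas must sync k2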