Distributed Hash Tables in Peer-to-Peer Systems

undefined
 
Distributed Hash Tables
 
 
Distributed Hash Table
 
A common approach is to use a 
Distributed Hash
Table
 (DHT)
 to organize nodes
 
Traditional hash functions convert a key to a hash
value, which can be used as an index into a hash table.
Keys are unique
 – Each represents an object to store in the
table
The hash function value is used to insert an object in the hash
table and to retrieve it.
 
 
Distributed Hash Table (DHT)
 
DHT = distributed P2P database
Database has 
(key, value) 
pairs;
key: ss number; value: human name
key: content type; value: IP address
Peers 
query
 DB with key
DB returns values that match the key
Peers can also 
insert
 (key, value) peers
 
DHT Identifiers
 
Assign integer identifier to each peer in range [0,2
n
-1].
Each identifier can be represented by n bits.
Require each key to be an integer in 
same range
.
To get integer keys, hash original key.
eg, key = h(
Led Zeppelin IV
)
This is why they call it a distributed 
hash
 table
 
How to assign keys to peers?
 
Central issue:
Assigning (key, value) pairs to peers.
Rule: assign key to the peer that has the 
closest
 ID.
Convention in lecture: closest is the 
immediate
successor 
of the key.
Ex: n=4; peers: 1,3,4,5,8,10,12,14;
key = 13, then successor  peer = 14
key = 15, then successor peer = 1
 
Circular DHT (1)
 
Each peer only aware of immediate successor and
predecessor.
Overlay network
Circle DHT  (2)
0001
0011
0100
0101
1000
1010
1100
1111
 
O(N) messages
on avg to resolve
query, when there
are N peers
 
1110
 
1110
 
1110
 
1110
 
1110
 
1110
Define 
closest
as closest
successor
Circular DHT with Shortcuts
Each peer keeps track of IP addresses of predecessor, successor,
short cuts.
Reduced from 6 to 2 messages.
Possible to design shortcuts so O(log N) neighbors, O(log N)
messages in query
Peer Churn
 
Peer 5 abruptly leaves
Peer 4 detects; makes 8 its immediate successor; asks 8
who its immediate successor is; makes 8
s immediate
successor its second successor.
What if peer 13 wants to join?
To handle peer churn, require 
each peer to know the IP address 
of its two successors. 
 Each peer periodically pings its 
two successors to see if they 
are still alive
. 
undefined
 
S
S
Y
Y
N
N
C
C
H
H
O
O
R
R
N
N
I
I
Z
Z
A
A
T
T
I
I
O
O
N
N
 
C
C
LOCK
LOCK
 S
 S
YNCHRONIZATION
YNCHRONIZATION
 
Motivation
 
Why 
clock synchronization 
matters?
Control access 
to a single, shared resource
Agree on the 
ordering of events
Time-based
 computations on multiple machines
Many applications require that clocks advance at
similar rates
Real-time scheduling 
events based on processor clock
Setting timeouts 
and 
measuring latencies
 
Synchronization
 
Synchronization within one system is hard enough
Messages
Semaphores,
Monitors,
Barriers,
Synchronization among processes in a distributed
system is much harder
Two approaches to achieve synchronization
Synchronization based on physical time
Synchronization using 
relative ordering 
rather than
absolute time
 
Clock Synchronization
 
In a centralized system, time is 
unambiguous
 
In a distributed system, achieving agreement
on time is 
hard
It is impossible to guarantee that independent
physical clocks are fully synchronized
 
Lack of a single global time source can cause errors
undefined
 
S
S
Y
Y
N
N
C
C
H
H
O
O
R
R
N
N
I
I
Z
Z
A
A
T
T
I
I
O
O
N
N
 
P
P
HYSICAL
HYSICAL
 C
 C
LOCK
LOCK
s
s
 
Clock Synchronization – Actual Time
 
For time dependent applications the 
actual time 
is
important
External physical clocks become critical
For efficiency and redundancy, multiple physical clocks
are needed
Two issues arise:
1.
How to synchronize the physical clocks with real-world
clocks?
2.
How to synchronize these clocks with each other?
 
 
Time Measurement
 
Atomic clocks make it possible to measure time
accurately
Several laboratories, equipped with Cesium 133 clocks,
periodically report the number of times their clock has
ticket to the Bureau International de l’Heure (BIH)
BIH computes the average to produce the International
Atomic Time (TAI), the number of seconds since
01/01/1958 midnight divided by 9,192,631,370
TAI seconds are of constant length, unlike solar seconds.
As the mean solar day is getting longer, TAI get out of synch
Now, 86,600 TAI seconds are about 3 msec less than a mean
solar day
 
 
Physical clocks
 
Computer Clocks are acutually Timers,
 a precisely machined
quartz crystal
When kept under tension, crystals oscillate at a known fixed frequency
 
Associated with the crystal are two registers, a counter and
holding register
At each oscillation, the counter decrements by 1
When counter reaches zero, an interrupt is generated
Each interrupt corresponds to a clock tick
 
Timers have manufacturing defects
On a single node a clock being slightly off is tolerable
With multiple nodes clock skew is problematic
 
Relation between Clock Time and UTC
 
Dealing with drift
 
Set computer to true time
Generally not good idea to set clock back
Software can get very confused
 
Gradual
 clock correction
If
 fast 
then
 make clock run slower until it synchronizes
If
 slow 
then
 make clock run faster until it synchronizes
 
 
 
Compensating for a fast clock
UTC time, 
t
Computer’s time, 
C
Compensating for a fast clock
UTC time, 
t
Computer’s time, 
C
undefined
 
S
S
Y
Y
N
N
C
C
H
H
O
O
R
R
N
N
I
I
Z
Z
A
A
T
T
I
I
O
O
N
N
 
Network Time Protocol
Network Time Protocol
 
Clients contact a time server for synchronization,
Time server is typically equipped with an accurate time source
 
 
 
 
 
 
 
Does not account for processing and communication
delay
Delay Approximation
 
Network Time Protocols
 
client
 
server
 
Request ()
 
Reply(Crnt_Time)
 
Network Time Protocol
 
Getting the current time from a time server.
 
NTP Skew
 
Assume delay is nearly symmetric
Offset estimation

= [(T2 – T1) + (T3 – T4 )]/2.
If 


Not advisable
Change must be introduced gradually, until
the correction is made
 
NTP Basic Operation
 
NTP is set up pairwise
A probes B for its current time and B also probes A
 Compute the offset, 


(T
4
 – T
1
) - (T
3
 – T
2
 )





 
NTP Hierarchy
 
NTP divides servers into strata
Strata-1 Servers – Servers with a reference clock
Clock itself is said to operate at stratum 0
undefined
 
S
S
Y
Y
N
N
C
C
H
H
O
O
R
R
N
N
I
I
Z
Z
A
A
T
T
I
I
O
O
N
N
 
Berkeley Protocol
Berkeley Protocol
 
The Berkeley Algorithm – Request
 
For systems where no
machine has an
accurate time source
 
The time daemon asks
all the other machines
for their clock values.
 
 
The Berkeley Algorithm – Reply
 
The machines answer
with their current time
Daemon computes the
average of the received
answers
 
 
The Berkeley Algorithm (3)
 
The time  daemon tells
everyone how to adjust
their clock.
Speed up or jump ahead
Slow down
 
Slide Note
Embed
Share

Distributed Hash Tables (DHTs) are a fundamental component in organizing nodes in peer-to-peer networks. By using hash functions to assign keys to peers, DHTs enable efficient storage and retrieval of objects. Peers in a DHT are responsible for storing and managing key-value pairs, with each key being converted to an integer identifier. Assigning keys to peers based on the closest identifier allows for effective data retrieval. Circular DHTs, with or without shortcuts, optimize the network structure to reduce message overhead and enhance query resolution. Additionally, handling peer churn is essential for maintaining network stability.

  • DHT
  • Peer-to-Peer
  • Hash Tables
  • Distributed Systems
  • Network Structure

Uploaded on Sep 11, 2024 | 1 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. Distributed Hash Tables

  2. Distributed Hash Table A common approach is to use a Distributed Hash Table (DHT) to organize nodes Traditional hash functions convert a key to a hash value, which can be used as an index into a hash table. Keys are unique Each represents an object to store in the table The hash function value is used to insert an object in the hash table and to retrieve it.

  3. Distributed Hash Table (DHT) DHT = distributed P2P database Database has (key, value) pairs; key: ss number; value: human name key: content type; value: IP address Peers query DB with key DB returns values that match the key Peers can also insert (key, value) peers

  4. DHT Identifiers Assign integer identifier to each peer in range [0,2n-1]. Each identifier can be represented by n bits. Require each key to be an integer in same range. To get integer keys, hash original key. eg, key = h( Led Zeppelin IV ) This is why they call it a distributed hash table

  5. How to assign keys to peers? Central issue: Assigning (key, value) pairs to peers. Rule: assign key to the peer that has the closest ID. Convention in lecture: closest is the immediate successor of the key. Ex: n=4; peers: 1,3,4,5,8,10,12,14; key = 13, then successor peer = 14 key = 15, then successor peer = 1

  6. Circular DHT (1) 1 3 15 4 12 5 10 8 Each peer only aware of immediate successor and predecessor. Overlay network

  7. Circle DHT (2) 0001 O(N) messages on avg to resolve query, when there are N peers Who s resp for key 1110 ? I am 0011 1111 1110 0100 1110 1110 1100 0101 1110 1110 Define closest as closest successor 1110 1010 1000

  8. Circular DHT with Shortcuts 1 Who s resp for key 1110? 3 15 4 12 5 10 8 Each peer keeps track of IP addresses of predecessor, successor, short cuts. Reduced from 6 to 2 messages. Possible to design shortcuts so O(log N) neighbors, O(log N) messages in query

  9. Peer Churn 1 To handle peer churn, require each peer to know the IP address of its two successors. Each peer periodically pings its two successors to see if they are still alive. 3 15 4 12 5 10 8 Peer 5 abruptly leaves Peer 4 detects; makes 8 its immediate successor; asks 8 who its immediate successor is; makes 8 s immediate successor its second successor. What if peer 13 wants to join?

  10. SYNCHORNIZATION CLOCK SYNCHRONIZATION

  11. Motivation Why clock synchronization matters? Control access to a single, shared resource Agree on the ordering of events Time-based computations on multiple machines Many applications require that clocks advance at similar rates Real-time scheduling events based on processor clock Setting timeouts and measuring latencies

  12. Synchronization Synchronization within one system is hard enough Messages Semaphores, Monitors, Barriers, Synchronization among processes in a distributed system is much harder Two approaches to achieve synchronization Synchronization based on physical time Synchronization using relative ordering rather than absolute time

  13. Clock Synchronization In a centralized system, time is unambiguous In a distributed system, achieving agreement on time is hard It is impossible to guarantee that independent physical clocks are fully synchronized Lack of a single global time source can cause errors

  14. SYNCHORNIZATION PHYSICAL CLOCKs

  15. Clock Synchronization Actual Time For time dependent applications the actual time is important External physical clocks become critical For efficiency and redundancy, multiple physical clocks are needed Two issues arise: 1. How to synchronize the physical clocks with real-world clocks? 2. How to synchronize these clocks with each other?

  16. Time Measurement Atomic clocks make it possible to measure time accurately Several laboratories, equipped with Cesium 133 clocks, periodically report the number of times their clock has ticket to the Bureau International de l Heure (BIH) BIH computes the average to produce the International Atomic Time (TAI), the number of seconds since 01/01/1958 midnight divided by 9,192,631,370 TAI seconds are of constant length, unlike solar seconds. As the mean solar day is getting longer, TAI get out of synch Now, 86,600 TAI seconds are about 3 msec less than a mean solar day

  17. Physical clocks Computer Clocks are acutually Timers, a precisely machined quartz crystal When kept under tension, crystals oscillate at a known fixed frequency Associated with the crystal are two registers, a counter and holding register At each oscillation, the counter decrements by 1 When counter reaches zero, an interrupt is generated Each interrupt corresponds to a clock tick Timers have manufacturing defects On a single node a clock being slightly off is tolerable With multiple nodes clock skew is problematic

  18. Relation between Clock Time and UTC

  19. Dealing with drift Set computer to true time Generally not good idea to set clock back Software can get very confused Gradual clock correction If fast then make clock run slower until it synchronizes If slow then make clock run faster until it synchronizes

  20. Compensating for a fast clock ?? ?? > 1 Computer s time, C Clock Synchronized skew Linear Compensating Function Applied UTC time, t

  21. Compensating for a fast clock Computer s time, C ?? ?? < 1 UTC time, t

  22. SYNCHORNIZATION Network Time Protocol

  23. Network Time Protocols Clients contact a time server for synchronization, Time server is typically equipped with an accurate time source client server Request () Reply(Crnt_Time) Does not account for processing and communication delay Delay Approximation

  24. Network Time Protocol Getting the current time from a time server.

  25. NTP Skew Assume delay is nearly symmetric Offset estimation = [(T2 T1) + (T3 T4 )]/2. If < 0 then clock must be set backward Not advisable Change must be introduced gradually, until the correction is made

  26. NTP Basic Operation NTP is set up pairwise A probes B for its current time and B also probes A Compute the offset, , and the delay estimation, = (T4 T1) - (T3 T2 ) Buffer 8 pairs of ( , ), and select the minimal value for as the best delay estimation between the two servers The associated with the minimal delay is the most reliable estimation of the offset

  27. NTP Hierarchy NTP divides servers into strata Strata-1 Servers Servers with a reference clock Clock itself is said to operate at stratum 0

  28. SYNCHORNIZATION Berkeley Protocol

  29. The Berkeley Algorithm Request For systems where no machine has an accurate time source The time daemon asks all the other machines for their clock values.

  30. The Berkeley Algorithm Reply The machines answer with their current time Daemon computes the average of the received answers

  31. The Berkeley Algorithm (3) The time daemon tells everyone how to adjust their clock. Speed up or jump ahead Slow down

More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#