Paxos and Consensus in Distributed Systems

undefined
 
LECTURE 8
Paxos and Consensus
Availability of P/B-based RSM
 
 
When is RSM unavailable to serve requests?
 
Replica is down but viewservice yet to detect
 
How to …
… make RSM tolerant to network partitions?
… ensure that operations don’t block even if some
machines are unavailable?
2
 
Analogy
 
 
US Senate needs to pass laws
 
Senators are often on travel
Common case: Not all senators present
 
How to pass laws successfully?
3
RSM via Consensus
 
 
Key idea:
 Apply an update if 
majority of replicas
commit to it
If 2f+1 replicas, need f+1 to commit
 
Why majority? Why not fewer or more?
Remaining replicas cannot accept some other
update
4
 
Context for Today’s Lecture
 
 
Say all replicas are in sync with each other
 
First:
 Among several concurrent new updates, how to
pick next update to apply?
 
Later:
 How to apply all updates in a consistent
order at all replicas?
5
Strawman Approaches
 
Every client proposes its value to all replicas
Every replica accepts first proposal received
Value accepted by majority is applied
Why might this not work?
 
Every client tags its proposal with seq number
Every replica collects proposals and accepts lowest
seq number proposal
Why might this not work?
6
 
Paxos
 
 
 
 
 
Original paper submitted in 1990
Tells mythical story of Greek island of Paxos with
“legislators” and “current law” passed through
parliamentary voting protocol
 
Widely used in industry today
7
Desirable Properties
 
Safety
“No bad things happen”
System 
never
 reaches an undesirable state
 
Liveness
“Good things eventually happen”
System makes progress 
eventually
 
Tradeoff between consistency and latency
8
Desired Properties of Solution
 
 
Safety:
Accept a value only if accepted by a majority
Accept a value only if proposed by some client
 
Liveness:
If any values are proposed, one of them will eventually
be accepted
If a value is accepted, all replicas will eventually
discover that it was chosen
9
Roles of a Process
 
Three conceptual roles
Proposers
 propose values
Acceptors
 accept values; chosen if majority accept
Learners
 learn the outcome (chosen value)
 
In reality, a process can play any/all roles
Roles in bank account example?
Roles in US Senate example?
10
Paxos Overview
 
Three phases within each round
Prepare Phase:
Proposer sends a unique proposal number to all
acceptors
Waits to get commitment from majority of acceptors
Accept Phase:
Proposer sends proposed value to all acceptors
Waits to get proposal accepted by majority
Learn Phase:
Learners discover value accepted by majority
11
Paxos State
 
 
Every acceptor maintains three values:
n
p
 
 highest proposal number promised to accept
n
a
 
 highest proposal number accepted
v
a
 
 value accepted  (operation)
 
This state must persist across restarts
Learners can re-discover accepted value (if any)
from acceptors
12
Paxos Phase 1
 
Proposer:
Choose unique proposal number n
S
end <prepare, n> to all acceptors
Acceptors:
If n > n
p
n
p
 = n     
 promise not to accept any new proposals n’ < n
If no prior proposal accepted
Reply < promise, n, Ø >
Else
Reply < promise, n, (n
a , 
v
a
)  >
Else
Reply < prepare-failed >
13
Prepare Phase
14
14
P
1
P
2
A
1
A
2
A
3
 
n=2
 
n=1
 
n=3
October 2, 2017
 
n
p
=_
 
n
p
=_
 
n
p
=_
 
n
p
=2
 
n
p
=2
 
n
p
=1
 
n
p
=3
 
n
p
=3
Paxos Phase 1
Proposer:
Choose unique proposal number n
S
end <prepare, n> to all acceptors
Acceptors:
If n > n
p
n
p
 = n     
 promise not to accept any new proposals n’ < n
If no prior proposal accepted
Reply < promise, n, Ø >
Else
Reply < promise, n, (n
a , 
v
a
)  >
Else
Reply < prepare-failed >
15
 
Why all? Why not majority
?
 
How to pick unique proposal number?
 
 What else is worth including
?
Paxos Phase 2
 
Proposer:
Once 
received promises from majority of acceptors,
v’ = v
a
 returned with highest n
a
, if exists, else own v
Send  <accept, (n, v’)>  to acceptors
Acceptors:
Upon receiving (n, v),  if n ≥ n
p
,
Accept proposal and notify learner(s)
n
a
 = n
p
 = n
v
a
 = v
16
 
When would majority not promise?
 
Why not stop if v
a
 != v?
Accept Phase
17
17
P
1
 (v
1
)
P
2
 (v
2
)
A
1
A
2
A
3
 
n=1
 
n=1
v=v
1
 
n=2
 
n=2
v=v
1
Accept Phase
18
18
P
1
 (v
1
)
P
2
 (v
2
)
A
1
A
2
A
3
 
n=1
 
n=1
v=v
1
 
n=2
 
n=2
v=v
2
Accept Phase
19
19
P
1
 (v
1
)
P
2
 (v
2
)
A
1
A
2
A
3
 
n=1
 
n=1
v=v
1
 
n=2
 
n=2
v=v
1
 
Paxos: Sample Execution
 
 
Acceptor1:   
P1
   
A1-X
    
P2
 
Acceptor2:   
P1
   
A1-X    P2    P3
 
Acceptor3:   
P1   A1-X             P3
20
 
Paxos: Sample Execution
 
 
Acceptor1:   
P1
          
A1-X
    
P3
             
A3-X
 
Acceptor2:   
P1
   
P2
   
A1-X
   
P3   
A2-Y
   A3-X
 
Acceptor3:          
P2                      A2-Y
21
 
Paxos: Sample Execution
 
 
Acceptor1:   
P1
   
A1-X
 
Acceptor2:   
P1
   
A1-X    P2
 
Acceptor3:   
                   P2
22
 
Paxos: Sample Execution
 
 
Acceptor1:   
P1
   
A1-X
 
Acceptor2:   
P1
   
A1-X    P2    A2-X
 
Acceptor3:   
                   P2     A2-X
23
 
Paxos: Sample Execution
 
 
Acceptor1:   
P1
 
Acceptor2:   
P1
   
A1-X    P2
 
Acceptor3:   
                   P2
24
 
Paxos: Sample Execution
 
 
Acceptor1:   
P1
 
Acceptor2:   
P1
   
A1-X    P2    A2-X
 
Acceptor3:   
                   P2     A2-X
25
 
Paxos: Sample Execution
 
 
Acceptor1:   
P1
   
A1-X
 
Acceptor2:   
P1
   
           P2
 
Acceptor3:   
                   P2
26
 
Paxos: Sample Execution
 
 
Acceptor1:   
P1
   
A1-X
 
Acceptor2:   
P1
   
           P2    A2-Y
 
Acceptor3:   
                   P2    A2-Y
27
Intuition: Once proposal with value v accepted, then
every higher-numbered proposal issued by any
proposer has value v
28
Paxos is safe
 
Majority of
acceptors
accept 
(n, v):
v
 is decided
 
Next prepare request
with proposal n+1
 
Desired Properties of Solution
 
 
Safety:
Accept a value only if accepted by a majority
Accept a value only if proposed by some client
 
Liveness:
If any values are proposed, one of them will eventually
be accepted
If a value is accepted, all replicas will eventually
discover that it was chosen
29
Race condition leads to liveness problem
 
Completes phase 1
with proposal n0
30
 
Starts and completes phase 1
with proposal n1 > n0
 
Performs phase 2,
acceptors reject
 
Retries and completes phase 1 with
proposal n2 > n1
Process 0
Process 1
 
Performs phase 2, acceptors
reject
 
… can go on indefinitely …
Paxos: Race condition
Acceptor1:   
P1
          
A1-X
    
P3
Acceptor2:   
P1
   
P2
   
A1-X
   
P3   
A2-Y    
P4
Acceptor3:          
P2                      A2-Y    P4
31
 
How to fix this?
Fixes to liveness problem
 
 
When proposal fails, 
back off for a random period
of time
 before retrying
 
Pre-determined ordering of proposers
Negative response from acceptor includes ID of
proposer to whom the acceptor has committed
Back off period chosen based on ordering
 
Note co-operative nature of protocol
32
Why two phases?
 
Liveness problem is partly due to two phases
Between one proposer’s Prepare and Accept phases, n_p
updated by another proposer
 
Alternate design:
Proposer sends propose messages to all acceptors
Retry with higher proposal no. if majority don’t accept
 
Problem?
Once a value is accepted by majority, we don’t want
another value accepted by a majority
33
 
Paxos: Three Phases
 
Prepare Phase:
Proposer gets commitment from majority of acceptors
 
Accept Phase:
Proposer sends proposed value to all acceptors
Waits to get proposal accepted by majority
 
Learn Phase:
Learners discover value accepted by majority
34
Paxos in Action
35
R
R
R
R
R
 
P   L
 
A   L
 
A   L
 
A   L
 
A   L
 
A   L
Paxos Phase 3
 
Goal:
 For all learners to discover if any value was
accepted by majority
 
Potential approaches:
Proposer who has proposal accepted by majority of
acceptors informs all learners
Acceptor broadcasts to all learners whenever it accepts
any value
Acceptors notify distinguished learner, which informs
others
36
 
Paxos Phase 3
 
 
Learners mimic proposers
 
Discover value accepted by each acceptor in
response to prepare messages
37
 
Paxos: Sample Execution
 
 
Acceptor1:   
P1
   
A1-X
    
P2
 
Acceptor2:   
P1
   
A1-X    P2    P3
 
Acceptor3:   
P1   A1-X             P3
38
 
RSM with Paxos
 
 
Log of updates at every replica
Replicas execute updates in order in log
 
Use Paxos to come to 
consensus about each slot of
the log
39
 
RSM with Paxos
40
 
Slots in log
RSM with Paxos
 
Example:
 updates from MapReduce workers
submitted to replicated Master
 
Whenever an update is submitted:
Attempt to get update accepted to a particular slot in
replicated log
If unsuccessful, retry proposing to higher slot
 
Challenge: 
Must guess slot at end of log
41
RSM with Paxos
 
 
e1: an operation is accepted to i
th
 slot in log
e2: i
th
 operation is executed at all replicas
 
Arbitrarily large delay between events e1 and e2
 
Consequence: 
Local state at any replica differs from
state of replicated log
42
Comparing with P/B Replication
43
B
B
P
B
B
44
R
R
R
R
R
P   L
A   L
A   L
A   L
A   L
A   L
Comparing with P/B Replication
 
 
 
 
 
l
Benefit of Paxos:
u
Need only majority of
replicas to be up
l
Downside of Paxos?
u
Need two rounds of inter-
replica communication
 
Leader-based Paxos
45
R
R
R
R
R
 
P   L
 
A   L
 
A   L
 
A   L
 
A   L
 
A   L
Leader-based Paxos
 
Pick one of the acceptors as the leader
All clients submit proposals to leader
Leader can directly skip to Accept phase because no
contention
Learn phase executed asynchronously
 
How to pick a leader?
Paxos!
Drawbacks compared to leaderless Paxos?
Leader may be far from client
46
Slide Note
Embed
Share

This lecture covers the concept of Paxos and achieving consensus in distributed systems. It discusses the availability of P/B-based RSM, RSM via consensus, the context for today's lecture, and desirable properties of solutions. The analogy of the US Senate passing laws is used to explain the need for consensus. Various approaches like Paxos and desirable properties of solutions are explored to ensure the safety and liveness of distributed systems.

  • Consensus
  • Paxos
  • Distributed Systems
  • RSM
  • Availability

Uploaded on Sep 19, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. LECTURE 8 Paxos and Consensus

  2. Availability of P/B-based RSM 2 When is RSM unavailable to serve requests? Replica is down but viewservice yet to detect How to make RSM tolerant to network partitions? ensure that operations don t block even if some machines are unavailable?

  3. Analogy 3 US Senate needs to pass laws Senators are often on travel Common case: Not all senators present How to pass laws successfully?

  4. RSM via Consensus 4 Key idea: Apply an update if majority of replicas commit to it If 2f+1 replicas, need f+1 to commit Why majority? Why not fewer or more? Remaining replicas cannot accept some other update

  5. Context for Todays Lecture 5 Say all replicas are in sync with each other First: Among several concurrent new updates, how to pick next update to apply? Later: How to apply all updates in a consistent order at all replicas?

  6. Strawman Approaches 6 Every client proposes its value to all replicas Every replica accepts first proposal received Value accepted by majority is applied Why might this not work? Every client tags its proposal with seq number Every replica collects proposals and accepts lowest seq number proposal Why might this not work?

  7. Paxos 7 Original paper submitted in 1990 Tells mythical story of Greek island of Paxos with legislators and current law passed through parliamentary voting protocol Widely used in industry today

  8. Desirable Properties 8 Safety No bad things happen System never reaches an undesirable state Liveness Good things eventually happen System makes progress eventually Tradeoff between consistency and latency

  9. Desired Properties of Solution 9 Safety: Accept a value only if accepted by a majority Accept a value only if proposed by some client Liveness: If any values are proposed, one of them will eventually be accepted If a value is accepted, all replicas will eventually discover that it was chosen

  10. Roles of a Process 10 Three conceptual roles Proposers propose values Acceptors accept values; chosen if majority accept Learners learn the outcome (chosen value) In reality, a process can play any/all roles Roles in bank account example? Roles in US Senate example?

  11. Paxos Overview 11 Three phases within each round Prepare Phase: Proposer sends a unique proposal number to all acceptors Waits to get commitment from majority of acceptors Accept Phase: Proposer sends proposed value to all acceptors Waits to get proposal accepted by majority Learn Phase: Learners discover value accepted by majority

  12. Paxos State 12 Every acceptor maintains three values: np highest proposal number promised to accept na highest proposal number accepted va value accepted (operation) This state must persist across restarts Learners can re-discover accepted value (if any) from acceptors

  13. Paxos Phase 1 13 Proposer: Choose unique proposal number n Send <prepare, n> to all acceptors Acceptors: If n > np np = n promise not to accept any new proposals n < n If no prior proposal accepted Reply < promise, n, > Else Reply < promise, n, (na , va) > Else Reply < prepare-failed >

  14. Prepare Phase 14 P2 P1 A1 A2 A3 n=2 n=1 n=3 np=_ np=_ np=_ np=2 np=2 np=1 np=3 np=3 October 2, 2017

  15. Paxos Phase 1 15 Proposer: How to pick unique proposal number? Choose unique proposal number n Send <prepare, n> to all acceptors Why all? Why not majority? Acceptors: If n > np np = n promise not to accept any new proposals n < n If no prior proposal accepted Reply < promise, n, > Else Reply < promise, n, (na , va) > Else Reply < prepare-failed > What else is worth including?

  16. Paxos Phase 2 16 Proposer: Once received promises from majority of acceptors, v = va returned with highest na, if exists, else own v Send <accept, (n, v )> to acceptors When would majority not promise? Why not stop if va != v? Acceptors: Upon receiving (n, v), if n np, Accept proposal and notify learner(s) na = np = n va = v

  17. Accept Phase 17 P2 (v2) P1 (v1) n=1 A1 A2 A3 n=1 v=v1 n=2 n=2 v=v1

  18. Accept Phase 18 P2 (v2) P1 (v1) n=1 A1 A2 A3 n=1 v=v1 n=2 n=2 v=v2

  19. Accept Phase 19 P2 (v2) P1 (v1) n=1 A1 A2 A3 n=1 v=v1 n=2 n=2 v=v1

  20. Paxos: Sample Execution 20 Acceptor1: P1 A1-X P2 Acceptor2: P1 A1-X P2 P3 Acceptor3: P1 A1-X P3

  21. Paxos: Sample Execution 21 Acceptor1: P1 A1-X P3 A3-X Acceptor2: P1 P2 A1-X P3 A2-Y A3-X Acceptor3: P2 A2-Y

  22. Paxos: Sample Execution 22 Acceptor1: P1 A1-X Acceptor2: P1 A1-X P2 Acceptor3: P2

  23. Paxos: Sample Execution 23 Acceptor1: P1 A1-X Acceptor2: P1 A1-X P2 A2-X Acceptor3: P2 A2-X

  24. Paxos: Sample Execution 24 Acceptor1: P1 Acceptor2: P1 A1-X P2 Acceptor3: P2

  25. Paxos: Sample Execution 25 Acceptor1: P1 Acceptor2: P1 A1-X P2 A2-X Acceptor3: P2 A2-X

  26. Paxos: Sample Execution 26 Acceptor1: P1 A1-X Acceptor2: P1 P2 Acceptor3: P2

  27. Paxos: Sample Execution 27 Acceptor1: P1 A1-X Acceptor2: P1 P2 A2-Y Acceptor3: P2 A2-Y

  28. Paxos is safe 28 Intuition: Once proposal with value v accepted, then every higher-numbered proposal issued by any proposer has value v Majority of acceptors accept (n, v): Next prepare request with proposal n+1 v is decided

  29. Desired Properties of Solution 29 Safety: Accept a value only if accepted by a majority Accept a value only if proposed by some client Liveness: If any values are proposed, one of them will eventually be accepted If a value is accepted, all replicas will eventually discover that it was chosen

  30. Race condition leads to liveness problem 30 Process 0 Process 1 Completes phase 1 with proposal n0 Starts and completes phase 1 with proposal n1 > n0 Performs phase 2, acceptors reject Retries and completes phase 1 with proposal n2 > n1 Performs phase 2, acceptors reject can go on indefinitely

  31. Paxos: Race condition 31 Acceptor1: P1 A1-X P3 Acceptor2: P1 P2 A1-X P3 A2-Y P4 Acceptor3: P2 A2-Y P4 How to fix this?

  32. Fixes to liveness problem 32 When proposal fails, back off for a random period of time before retrying Pre-determined ordering of proposers Negative response from acceptor includes ID of proposer to whom the acceptor has committed Back off period chosen based on ordering Note co-operative nature of protocol

  33. Why two phases? 33 Liveness problem is partly due to two phases Between one proposer s Prepare and Accept phases, n_p updated by another proposer Alternate design: Proposer sends propose messages to all acceptors Retry with higher proposal no. if majority don t accept Problem? Once a value is accepted by majority, we don t want another value accepted by a majority

  34. Paxos: Three Phases 34 Prepare Phase: Proposer gets commitment from majority of acceptors Accept Phase: Proposer sends proposed value to all acceptors Waits to get proposal accepted by majority Learn Phase: Learners discover value accepted by majority

  35. Paxos in Action 35 P L A L R A L R R A L R R A L A L

  36. Paxos Phase 3 36 Goal: For all learners to discover if any value was accepted by majority Potential approaches: Proposer who has proposal accepted by majority of acceptors informs all learners Acceptor broadcasts to all learners whenever it accepts any value Acceptors notify distinguished learner, which informs others

  37. Paxos Phase 3 37 Learners mimic proposers Discover value accepted by each acceptor in response to prepare messages

  38. Paxos: Sample Execution 38 Acceptor1: P1 A1-X P2 Acceptor2: P1 A1-X P2 P3 Acceptor3: P1 A1-X P3

  39. RSM with Paxos 39 Log of updates at every replica Replicas execute updates in order in log Use Paxos to come to consensus about each slot of the log

  40. RSM with Paxos 40 Slots in log

  41. RSM with Paxos 41 Example: updates from MapReduce workers submitted to replicated Master Whenever an update is submitted: Attempt to get update accepted to a particular slot in replicated log If unsuccessful, retry proposing to higher slot Challenge: Must guess slot at end of log

  42. RSM with Paxos 42 e1: an operation is accepted to ith slot in log e2: ith operation is executed at all replicas Arbitrarily large delay between events e1 and e2 Consequence: Local state at any replica differs from state of replicated log

  43. Comparing with P/B Replication 43 P B B B B

  44. Comparing with P/B Replication 44 P L A L R A L R R A L l Benefit of Paxos: R u Need only majority of replicas to be up R A L A L l Downside of Paxos? u Need two rounds of inter- replica communication

  45. Leader-based Paxos 45 P L A L R A L R R A L R R A L A L

  46. Leader-based Paxos 46 Pick one of the acceptors as the leader All clients submit proposals to leader Leader can directly skip to Accept phase because no contention Learn phase executed asynchronously How to pick a leader? Paxos! Drawbacks compared to leaderless Paxos? Leader may be far from client

More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#