Consistency Protocols in Distributed Systems

 
Distributed Systems
CS 15-440
 
Replication – Part III
Lecture 24, November 06, 2022
 
Mohammad Hammoud
 
Today…
 
Last Session:
Replication – Part II
Data- and client-centric consistency models
 
Today’s Session:
Replication – Part III
Consistency protocols
 
 
Announcements:
PS5 is due today by midnight
Project 4 is due on Wed Nov. 09 by midnight
The final exam is on Wednesday Nov. 16 from 2:30PM to 5:30PM in
Room 1190
 
Overview
Motivation (last two lectures)
Consistency Models (last two lectures)
Data-Centric Consistency Models
Client-Centric Consistency Models
Consistency Protocols (today's lecture)
  
 
Overview
 
Motivation
 
Consistency Models
Data-Centric Consistency Models
Client-Centric Consistency Models
 
 
Consistency Protocols
Consistency Protocols
 
A consistency protocol describes the implementation of a specific consistency model (e.g., strict consistency)

We will study 2 types of consistency protocols:
Primary-based Protocols
One primary coordinator is elected to control replication across multiple replicas
 
Replicated-write Protocols
Multiple replicas coordinate to provide consistency guarantees
Consistency Protocols
 
Primary-Based Protocols
 
In primary-based protocols, a simple centralized design is used to
implement consistency models
Each data-item x has an associated "primary replica"
The primary replica is responsible for coordinating write operations

We will study one example of primary-based protocols that implements the Strict Consistency Model:
The Remote-Write Protocol
Remote-Write Protocol
 
Two Rules:
All write operations are forwarded to the primary replica
Read operations are carried out locally at each replica
 
Approach for write operations:
 
Client connects to some replica RC
If the client issues a write operation to RC:
RC forwards the request to the primary replica RP, which:
Updates its local value
Then forwards the update to the other replicas Ri
The other replicas Ri perform the update and send ACKs back to RP
After RP receives all ACKs, it informs RC that the write operation was successful
RC acknowledges the client, stating that the write operation was successful
(Diagram: Client 1 issues x+=5; the primary replica applies the write and propagates it to the other replicas, so x1, x2, and x3 in the data-store (replicas R1, R2, R3) all change from 0 to 5.)
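To make the write path concrete, here is a minimal, single-process Python sketch of the two rules above. The class and method names are illustrative, not from the lecture; a real deployment would use RPCs, timeouts, and failure handling.

```python
# A minimal, single-process sketch of the Remote-Write write path (illustration only).

class Replica:
    def __init__(self, name):
        self.name = name
        self.store = {}           # local copy of the replicated data items
        self.primary = None       # reference to the primary replica

    def apply(self, key, value):
        self.store[key] = value   # install the update locally
        return "ACK"              # acknowledgment sent back to the primary

    def read(self, key):
        return self.store.get(key)   # reads are served locally

    def client_write(self, key, value):
        # Rule 1: all writes are forwarded to the primary replica.
        return self.primary.write(key, value)


class PrimaryReplica(Replica):
    def __init__(self, name, others):
        super().__init__(name)
        self.others = others      # the other replicas R_i

    def write(self, key, value):
        self.apply(key, value)                              # update local value
        acks = [r.apply(key, value) for r in self.others]   # forward update to every R_i
        assert all(a == "ACK" for a in acks)                # block until all ACKs arrive
        return "write successful"                           # reported back to R_C, then the client


# Example: Client 1 connects to R2 and writes x = 5; all replicas converge to 5.
r2, r3 = Replica("R2"), Replica("R3")
r1 = PrimaryReplica("R1", others=[r2, r3])
for r in (r1, r2, r3):
    r.primary = r1

print(r2.client_write("x", 5))      # write successful
print(r2.read("x"), r3.read("x"))   # 5 5
```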
Remote-Write Protocol – Discussion
 
The Remote-Write Protocol
Provides a simple way to implement strict consistency
Guarantees that clients always see the most recent values
 
However, latency is high in the Remote-Write Protocol
The client blocks until all the replicas are updated
In what scenarios would you use the Remote-Write protocol?
Typically, for distributed databases and file systems in data-centers
(i.e., in LAN settings)
Replicas are placed on the same LAN to reduce latency
 
Consistency Protocols
 
Replicated-Write Protocols
 
In replicated-write protocols, updates can be carried out at multiple replicas
 
We will study two examples of the replicated-write protocols
Active Replication Protocol
Clients write at any replica (no primary replicas)
The altered replica will propagate updates to other replicas

Quorum-Based Protocol
A voting scheme is used
Active Replication Protocol
 
Protocol: when a client writes at a replica, the replica will send the
update to all other replicas
 
Challenges with Active Replication
Ordering of operations can differ leading to conflicts/inconsistencies
So how to maintain consistent ordering?
(Diagram: Client 1 issues x+=2 at one replica and Client 2 issues x*=3 at another; each replica propagates its update to the others. In this run, x1, x2, and x3 in the data-store go from 0 to 2 and then to 6, but nothing enforces that every replica applies the two updates in the same order.)
Centralized Active Replication Protocol
 
A Possible Approach:
Elect a centralized coordinator (let us call it sequencer (Seq))
When a client connects to a replica RC and issues a write operation:
RC forwards the update to Seq
Seq assigns a sequence number to the update operation
RC propagates the sequence number and the operation to the other replicas
Operations are carried out at all replicas in the order of the sequence numbers
(Diagram: Client 1 issues x+=5 and Client 2 issues x-=2; Seq assigns sequence numbers 10 and 11 to the two updates, and replicas R1, R2, and R3 apply them in that order.)
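A minimal sketch of the sequencer idea, assuming in-memory objects in place of real replicas and a Python function per write operation (all names are hypothetical):

```python
# Sketch of sequencer-based (centralized) active replication: each write gets a unique
# sequence number from Seq; every replica buffers updates and applies them in that order.

import heapq

class Sequencer:
    def __init__(self, start=10):
        self.next_seq = start            # e.g., 10, 11, ... as in the diagram

    def assign(self):
        n = self.next_seq
        self.next_seq += 1
        return n

class Replica:
    def __init__(self, name, first_seq=10):
        self.name = name
        self.x = 0
        self.pending = []                # min-heap of (sequence number, operation)
        self.next_to_apply = first_seq   # next expected sequence number

    def deliver(self, seq, op):
        heapq.heappush(self.pending, (seq, op))
        # Apply operations only once the next expected sequence number is available,
        # so all replicas execute updates in exactly the same order.
        while self.pending and self.pending[0][0] == self.next_to_apply:
            _, ready_op = heapq.heappop(self.pending)
            self.x = ready_op(self.x)
            self.next_to_apply += 1

seq = Sequencer()
replicas = [Replica("R1"), Replica("R2"), Replica("R3")]

def client_write(op):
    # R_C forwards the operation to Seq, gets a number, then propagates it to all replicas.
    n = seq.assign()
    for r in replicas:
        r.deliver(n, op)

client_write(lambda x: x + 5)    # Client 1: x += 5  (sequence number 10)
client_write(lambda x: x - 2)    # Client 2: x -= 2  (sequence number 11)
print([r.x for r in replicas])   # [3, 3, 3] -- all replicas agree
```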
Replicated-Write Protocols
In replicated-write protocols, updates can be carried out at multiple replicas
We will study two examples of the replicated-write protocols
Active Replication Protocol
Clients write at any replica (no primary replicas)
The replica will propagate updates to other replicas
Quorum-Based Protocol
A voting scheme is used
 
Quorum-Based Protocols
 
Replicated writes can also be accomplished by using a voting scheme, originally proposed by Thomas (1979) and then generalized by Gifford (1979)

Basic Idea (Recap):
Clients are required to request and acquire the permission of multiple servers before either reading from or writing to a replicated data item
Rules on reads and writes should be established
Each replica is assigned a version number, which is incremented on each write

Another protocol was proposed by Lamport in 1998 and referred to as Paxos
Quorum-Based Protocols
 
Working Example:
Consider a distributed file system and suppose that a file is replicated on N servers
Write Rule:
A client must first contact N/2 + 1 servers (a majority) before updating a file
Once majority votes are attained, the file is updated and its version number is incremented
This is pursued at the replica sites
Quorum-Based Protocols
 
Working Example:
Consider a distributed file system and suppose that a file is replicated on N servers
Read Rule:
A client must contact N/2 + 1 servers, asking them to send the version numbers of its requested file
If all the version numbers are equal, this must be the most recent version of the file
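A small sketch of the majority rule from this working example, assuming in-memory FileServer objects (hypothetical names). Taking the highest version seen in the read quorum generalizes the "all versions equal" case on the slide.

```python
# Majority-based read/write with version numbers: a write reaches N/2 + 1 servers
# and bumps the version; a read contacts N/2 + 1 servers and picks the newest copy.

import random

class FileServer:
    def __init__(self):
        self.content = ""
        self.version = 0

N = 5
servers = [FileServer() for _ in range(N)]
MAJORITY = N // 2 + 1

def write_file(new_content):
    quorum = random.sample(servers, MAJORITY)          # any majority of the N servers
    new_version = max(s.version for s in quorum) + 1   # next version, based on the quorum's view
    for s in quorum:
        s.content = new_content
        s.version = new_version                        # version incremented on each write

def read_file():
    quorum = random.sample(servers, MAJORITY)
    # Any two majorities intersect, so the highest version in the read quorum
    # identifies the most recently written copy.
    newest = max(quorum, key=lambda s: s.version)
    return newest.content, newest.version

write_file("v1 of the file")
print(read_file())   # ('v1 of the file', 1)
```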
 
Quorum-Based Protocols
 
Gifford's scheme generalizes Thomas’ one
 
Gifford's Scheme:
Read Rule:
A client needs to assemble a read quorum, which is an arbitrary collection of any NR servers, or more

Write Rule:
To modify a file, a write quorum of at least NW servers is required
Quorum-Based Protocols
The values of NR and NW are subject to the following two constraints:
Constraint 1 (or C1): NR + NW > N
Constraint 2 (or C2): NW > N/2
Claim:
C1 prevents read-write (RW) conflicts
C2 prevents write-write (WW) conflicts
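A tiny helper that checks C1 and C2 for candidate (NR, NW) pairs; the example values below are illustrative, not taken from the lecture.

```python
# Check Gifford's two constraints for a candidate (N_R, N_W) pair.

def valid_quorums(n, n_r, n_w):
    c1 = n_r + n_w > n        # C1: every read quorum intersects every write quorum (no RW conflict)
    c2 = n_w > n / 2          # C2: any two write quorums intersect (no WW conflict)
    return c1 and c2

N = 12
print(valid_quorums(N, 3, 10))   # True: cheap reads, expensive writes
print(valid_quorums(N, 7, 6))    # False: C2 fails, two disjoint write quorums are possible
print(valid_quorums(N, 1, 12))   # True: read-one / write-all
```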
Another protocol was proposed by Lamport in 1998 and referred to as Paxos
Assumptions in Paxos
 
Paxos assumes an asynchronous, non-Byzantine model (more on this under fault-tolerance), in which:
Processes:
Operate at arbitrary speeds
May fail by stopping, but may restart
Since any process may fail after a value is chosen and then restart, a solution is impossible unless some information can be remembered (e.g., through logging) by a process that has failed and restarted

Messages:
May be lost, duplicated, or delayed (and thus reordered), but not corrupted
 
 
 
 
Roles in Paxos
 
Processes can take different roles:
Client:
Issues a request (e.g., a write on a replicated file) to the distributed system and waits for a response
Proposer (or a process bidding to become a coordinator/leader):
Advocates for a Client and suggests values for consideration by Acceptors
Acceptor (or a voter):
Considers the values proposed by Proposers and renders an accept/reject decision
Learner:
Once a Client's request has been agreed upon by the Acceptors, the Learner can take action (e.g., execute the request and send a response to the Client)
Quorums in Paxos
 
Any message sent to an Acceptor must be sent to a quorum of Acceptors consisting of more than half of all Acceptors (i.e., a majority, not unanimity)

Any two quorums should have a nonempty intersection
The common node acts as a "tie-breaker"
This helps avoid the "split-brain" problem (or a situation in which Acceptors' decisions are not in agreement)

In a system with 2m+1 Acceptors, m Acceptors can fail and consensus can still be reached
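A quick brute-force sanity check of both claims (any two majority quorums intersect, and a majority survives m failures), for a small m:

```python
# Verify the quorum properties for 2m+1 Acceptors with majority-sized quorums.

from itertools import combinations

m = 2
acceptors = set(range(2 * m + 1))   # 5 Acceptors
majority = m + 1                    # "more than half" of 2m+1

# Any two majority quorums share at least one Acceptor (the tie-breaker).
assert all(set(q1) & set(q2)
           for q1 in combinations(acceptors, majority)
           for q2 in combinations(acceptors, majority))

# Even if m Acceptors fail, a full majority quorum is still available.
alive = acceptors - set(range(m))   # suppose Acceptors 0..m-1 fail
assert len(alive) >= majority

print("both quorum claims hold for m =", m)
```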
 
 
 
 
 
Paxos Algorithm: Phase I
 
Phase I:
Step 1: Prepare
The Proposer selects a unique sequence (or round) number n and sends a prepare(n) request to a quorum of Acceptors
Step 2: Promise
Each Acceptor does the following:
If n > the sequence number of any of its previous promises or acceptances:
It writes n to stable storage, promising that it will never accept any future proposed number less than n
It sends a promise(n, (N, U)) response, where N and U are the last sequence number and value it accepted so far (if any)

Note that multiple processes can bid to become coordinators
Hence, how can each coordinator select a unique sequence number?
Every process, P, can be assigned a unique IDP, between 0 and k - 1, assuming a total of k processes
P can select the smallest sequence number, s, that is larger than all sequence numbers seen thus far, such that s % k = IDP
E.g., P will pick a sequence number of 23 for its next bid if IDP = 3, k = 5, and the largest number seen so far = 20
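A one-function sketch of this numbering rule, reproducing the slide's example (IDP = 3, k = 5, largest seen = 20 gives 23):

```python
# Process P owns all sequence numbers s with s % k == ID_P, and bids with the
# smallest such s that is larger than anything it has seen so far.

def next_bid(id_p, k, largest_seen):
    s = largest_seen + 1
    while s % k != id_p:      # walk forward to the next number owned by this process
        s += 1
    return s

print(next_bid(3, 5, 20))     # 23, as in the slide's example
```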
 
Paxos Algorithm: Phase I
Example
(Message flow: the Client sends a request to the Proposer; the Proposer sends prepare(n) to all three Acceptors; each Acceptor replies with promise(n, NULL). Quorum size = 3, which is decided by the Proposer.)
Example
(Message flow: the Client sends a request to the Proposer; the Proposer sends prepare(n) to two of the three Acceptors; each replies with promise(n, NULL). Quorum size = 2, which is the minimum acceptable quorum size in this example.)
 
Paxos Algorithm: Phase II

Phase II:
Step 1: Accept
If the Proposer receives promise responses from a quorum of Acceptors, it sends an accept(n, v) request to those Acceptors, where v is the value of the highest-numbered proposal among the promise responses, or any value if no promise contained a proposal
Step 2: Accepted
Each Acceptor does the following:
If n >= the sequence number of any previous promise:
It writes (n, v) to stable storage, indicating that it accepts the proposal
It sends an accepted(n, v) response
Else, it does not accept (it sends a NAK)
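To tie the two phases together, here is a minimal sketch of the Acceptor side of the rules above. In-memory fields stand in for stable storage, and method calls stand in for messages; all names are illustrative.

```python
# Minimal sketch of an Acceptor implementing the Phase I and Phase II rules above.

class Acceptor:
    def __init__(self):
        self.promised_n = -1        # highest sequence number promised so far
        self.accepted_n = None      # sequence number of the last accepted proposal
        self.accepted_v = None      # value of the last accepted proposal

    def on_prepare(self, n):
        """Phase I: return a promise, or None (a NAK) if n is not new enough."""
        if n > self.promised_n:
            self.promised_n = n     # never accept anything numbered less than n from now on
            return ("promise", n, (self.accepted_n, self.accepted_v))
        return None                 # NAK(n)

    def on_accept(self, n, v):
        """Phase II: accept (n, v) unless a higher-numbered promise was made."""
        if n >= self.promised_n:
            self.promised_n = n
            self.accepted_n, self.accepted_v = n, v
            return ("accepted", n, v)
        return None                 # NAK(n)


# A proposal numbered 1 loses to one numbered 2, mirroring the dueling-Proposers
# examples that follow.
a = Acceptor()
print(a.on_prepare(1))      # ('promise', 1, (None, None))
print(a.on_prepare(2))      # ('promise', 2, (None, None))
print(a.on_accept(1, "A"))  # None -- NAK(1), the older proposal is rejected
print(a.on_accept(2, "B"))  # ('accepted', 2, 'B')
```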
Example
(Message flow: the Client sends a request to the Proposer; the Proposer sends prepare(n) to two of the three Acceptors and receives promise(n, NULL) from both; it then sends accept(n, v), and both Acceptors reply with accepted(n, v).)
But, an Acceptor can accept multiple concurrent proposals!
Example
(Message flow with two Proposers and three Acceptors: prepare(1) and prepare(2) are issued, each answered by two promise(1, NULL) and two promise(2, NULL) responses; accept(1, A) gets one accepted(1, A) and one NAK(1), while accept(2, B) gets accepted(2, B) from two Acceptors.)
But, what if before the blue Proposer sends its accept message,
another Proposer (could be the green one again) submits a new
proposal with a higher sequence number?
 
Example
(Same two-Proposer message flow as in the previous example.)
The blue round will fail also!
 
Example
(Same two-Proposer message flow as in the previous example.)
What if this keeps happening?
 
Example
(Same two-Proposer message flow as in the previous example.)
Paxos will not commit until this scenario stops!
A Note on Liveness
 
If two Proposers keep concurrently issuing proposals with increasing sequence numbers, none of them will succeed
Hence, Paxos cannot guarantee liveness (i.e., cannot guarantee that a proposed value will be chosen within a finite time)

Is there a way liveness can be guaranteed in Basic Paxos?
Short Answer: No
But: We can apply an optimization to potentially expedite (not guarantee) liveness in the presence of multiple concurrent Proposers
 
 
A Note on Liveness
 
To expedite liveness:
A distinguished Proposer can be selected as the only entity to try submitting proposals
If this distinguished Proposer:
Can communicate successfully with a majority of Acceptors
And uses a sequence number that is greater than any number used already
Then it will succeed in issuing a proposal that can be accepted, assuming enough of the system (Proposer, Acceptors, and network) is working properly

Clearly, liveness remains impossible to guarantee in finite time since any component in the system could fail (e.g., a network partition can arise)
 
 
 
Possible Failures in Paxos
 
Would a network partition impact Paxos's correctness (NOT liveness)?
No, due to the quorum mechanism

What if an Acceptor fails?
Case 1: The Acceptor is not a member of the Proposer's quorum
No recovery is needed
Case 2: The Acceptor is a member of the Proposer's quorum, but the quorum size > the majority of Acceptors
No recovery is needed
 
 
 
 
 
Possible Failures in Paxos
 
Would a network partition impact Paxos's correctness (NOT liveness)?
No, due to the quorum mechanism

What if an Acceptor fails?
Case 3: The Acceptor is a member of the Proposer's quorum and the quorum size equals the majority of Acceptors
Sub-case 3.1: The Acceptor fails after accepting the proposal
No recovery is needed, assuming the Proposer will receive (or has already received) its acceptance message
Sub-case 3.2: The Acceptor fails before accepting the proposal
Worst case: a new quorum and round can be established
 
 
 
 
 
 
 
Possible Failures in Paxos
 
What if a Proposer fails?
Case 1: The Proposer fails after proposing a value, but before a consensus is reached
A new Proposer can take over
Case 2: The Proposer fails after a consensus is reached, but before it gets to know about it
Either its failure gets detected and a new round is launched
Or, it recovers and starts a new round itself
Case 3: The Proposer fails after a consensus is reached and after it gets to know about it (but before letting the Learner know)
Either its failure gets detected and a new round is launched
Or, it recovers and learns again from its stable storage that it has succeeded in its bidding
 
 
 
 
 
 
 
 
 
Next Lecture…
 
Fault-tolerance