Self-Stabilization in Distributed Systems: A Concept of Fault-Tolerance
Self-stabilization is a fault-tolerance concept ensuring a system reaches a legitimate state without external intervention. The system model involves communication among computers or state machines using message passing or shared memory. A self-stabilizing system guarantees convergence and closure from any state, maintaining correctness. Design issues include state count, network types, token ring optimization, mutual exclusion, and cost considerations.
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
E N D
Presentation Transcript
Self- Stabilization Deepan Sathyamoorthy Krishna Bharadwaj Sai Sundaram Parthasarathi CS553 Self-stabilization 2/20/2025 1
Introduction It is a concept of fault-tolerance in distributed systems Regardless of a system's initial state, the system will end up in a legitimate state in a finite amount of time without any external intervention Example: Making a group of people to build a self stabilizing circle CS553 Self-stabilization 2/20/2025 2
System Model Consists of a set of n computers or state machines that communicate with each other Communication can be either message passing or shared memory Configuration for message passing system o Processes Pi and Pj are neighbors. Si is the state of the processor i and Qi,j is FIFO queue which contains all messages sent by processor Pi to its neighbor Pj CS553 Self-stabilization 2/20/2025 3
System Model contd. Configuration for shared memory model o si is the state of Pi and ri is the content of the communication register Read, Write and State transition operations can be done in a single atomic operation(Composite atomicity) CS553 Self-stabilization 2/20/2025 4
Self-Stabilizing systems Formal Definition A system is self-stabilizing if and only if o Starting from any state, it is guaranteed that the system will eventually reach a correct state (convergence) o Given that the system is in a correct state, it is guaranteed to stay in a correct state, provided that no fault happens (closure) CS553 Self-stabilization 2/20/2025 5
Issues in the design of self- stabilizing algorithms The main issues in the design of self-stabilization algorithm o Number of states in each of the machines o Uniform and Non-Uniform networks o Central and Distributed demon o Reducing the number of the states in the token ring o Shared Memory models o Mutual Exclusion o Cost of self-stabilization CS553 Self-stabilization 2/20/2025 6
Self-Stabilization in spite of distributed control Basis: The relation The system is in a legitimate state is kept invariant o Test each individual step to ensure that it doesn't violate the invariant and delay steps that could possibly cause violation until a more favorable system state has been reached Issue: Due to the lack of a shared memory, local actions taken on account of local information must accomplish a global objective. CS553 Self-stabilization 2/20/2025 7
Problem definition A system built from N+1 finite state machines numbered 0 through N. (State space is the Cartesian product of individual state spaces of N+1 machines) Machines are arranged in a ring as below: http://deptinfo.unice.fr/twiki/pub/Minfo/DistributedAlgo/Cours_Failures Detectors-Consensus-SelfStabilization.pdf 2/20/2025 8
Problemdefinition contd. A demon gives each time, one of the machines the command to adjust itself (In fair random order) As a function of its own state, (and states of its neighbors) a machine may be privileged The legitimate states are those in which exactly one machine is privileged CS553 Self-stabilization 2/20/2025 9
Observations The problem cannot be solved if we require our machines to be identical because then, one privileged machine cannot be chosen o We can make all the machines different from one another or, o We can make one machine exceptional and all other mutually equal The adjustment command when given to a non privileged machine will have no effect on it. When it is given to a privileged machine, after adjustment, it loses its privilege Thus a legitimate state is one where exactly one machine is privileged at any given point CS553 Self-stabilization 2/20/2025 10
Observations We must have one machine exceptional and all others to be mutually equal to one another. Let us suppose machine 0 is exceptional and rest of them from 1 to N are mutually equal. The 'being privileged' function can be defined to affect this behavior. Machine i is privileged if x[i] x[i-1]. Due to transitivity nature of equality, we can say that when every machine except for machine 0 is non privileged, then x[0] = x[N]. (x[i] is the state of i) If we define this to be the condition for machine 0 to be privileged then at least one machine will be privileged at any point. CS553 Self-stabilization 2/20/2025 11
Observations If the adjustment that we make should cause the system to lose it privilege then it must be o if x[i] x[i-1] then x[i] := x[i-1] //for any non exceptional machine o if x[0] = x[N] then x[0] := (x[0] + 1) mod k // for the exceptional machine CS553 Self-stabilization 2/20/2025 12
Circulation of token http://deptinfo.unice.fr/twiki/pub/Minfo/DistributedAlgo/Cours_FailuresDetect ors-Consensus-SelfStabilization.pdf 2/20/2025 13
Dijkstras Self-stabilization with four-state machines Idea: For each machine i (0 i N) we introduce two Booleans called x[i] and up[i] respectively. Let machine 0 be bottommachine and machine N be topmachine . In the bottom machine up[0] = true holds permanently, in the top machine up[N] = false holds permanently. All other Booleans are variables. (Two Boolean variables make up for the four states, hence the title!) CS553 Self-stabilization 2/20/2025 14
Dijkstras Self-stabilization with four-state machines contd. Two kinds of privileges are defined: privilege from below and privilege from above . For all machines except for the bottom machine, privilege from below is defined as: x[i] x[i-1] For the normal machines, corresponding move is defined as: x[i] := !x[i]; up[i] := true For top machine, it is reduced to: x[N] := !x[N] CS553 Self-stabilization 2/20/2025 15
Dijkstras Self-stabilization with four-state machines contd. For all machines except for the top machine, privilege from above is defined as o x[i] = x[i+1] and up[i] and !up[i+1] For the normal machines, corresponding move is defined as o up[i] := false For bottom machine, it is reduced to o x[0] := !x[0] CS553 Self-stabilization 2/20/2025 16
Dijkstras self-stabilizing token ring system contd. Third solution o When the number of states, k = 3 CS553 Self-stabilization 2/20/2025 17
Dijkstras self-stabilizing token ring system contd. The algorithm for the bottom machine: o If S = 0 and R = 1, then the state of S is changed to 2 o If S = 1 and R = 2, then the state of S is changed to 0 o If S = 2 and R = 0, then the state of S is changed to 1 CS553 Self-stabilization 2/20/2025 18
Dijkstras self-stabilizing token ring system contd. The algorithm for the top machine o If L = 0, then the state of S is changed to 1 o If L = 1, then the state of S is changed to 2 o If L = 2, then the state of S is changed to 0 CS553 Self-stabilization 2/20/2025 19
Dijkstras self-stabilizing token ring system contd. The algorithm for all other machines: o If s = 0 and L = 1, then s = 1 o If s = 1 and L = 2, then s = 2 o If s = 2 and L = 0. then s = 0 CS553 Self-stabilization 2/20/2025 20
Dijkstras self-stabilizing token ring system contd. CS553 Self-stabilization 2/20/2025 21
Ghoshs solution Special networks where the number of states required by each processor is either 0 or 1 A node needs to use information from all its neighbors. Let s[i] denote the state of a machine i Let b denote an arbitrary state and its complementary state be CS553 Self-stabilization 2/20/2025 22
Ghoshs solution CS553 Self-stabilization 2/20/2025 23
Ghoshs solution contd. This algorithm assumes a large atomicity because each machine must be able to examine the states of all its neighbors in one atomic step(composite atomicity) CS553 Self-stabilization 2/20/2025 24
Uniform vs. non-uniform networks In a uniform network, all machine execute the same algorithm It is often necessary to have non-uniformity among machines Exceptional machines executing different algorithm thus indicating non-uniformity CS553 Self-stabilization 2/20/2025 25
Central and Distributed demons The presence of central demon is an undesirable constraint in self-stabilizing algorithms With distributed demon, each privileged machine makes a decision unilaterally Central demons help in verifying the weak correctness of the algorithm Burns et al. showed that central demon assumption is not necessary for three and four state algorithms CS553 Self-stabilization 2/20/2025 26
Number of states in the token ring The objective is to minimize the number of states of a machine for efficient implementation Ghosh s solution uses only two states in a special network non-ring topology CS553 Self-stabilization 2/20/2025 27
Shared Memory Models No processor has direct access to the state of its neighbors Only way state of the neighbors is through shared registers The system is represented as a graph in which each processor is represented as a node and there is a link between two nodes which denote that there is communication between two processors In self-stabilization algorithm, only one process can change a register at any instance and this happens after the system is stabilized CS553 Self-stabilization 2/20/2025 28
Mutual Exclusion In mutual exclusion, each process has a critical section of code, and only one process can enter its critical section at any time, and every process that wants to enter its critical section must be able to enter its critical section in finite time If the process does not want to enter its critical section, it simply passes its privilege to its neighbor Since there is only one privilege in the system, each process enjoys a privilege an infinite amount of times Example: Token system, which has processes circulating tokens. If a process has a token, it enters its critical section. After a finite time, only one token exists in the system CS553 Self-stabilization 2/20/2025 29
Cost of self-stabilization Definition of self-stabilization does not put any upper bound on the number of transitions to reach a safe state starting from an unsafe state Two concepts related to cost of self-stabilization o Convergence span: Maximum number of transitions that can be executed in a system starting from an arbitrary state, before it reaches a safe state o Response span: The maximum number of transitions that can be executed in a system to reach a specified target state, starting from some initial state Aim of the designer is to reduce the convergence span and response span CS553 Self-stabilization 2/20/2025 30
Methodologies for design of self- stabilizing systems A malicious adversary(virus or hardware failure) might disrupt the distributed system. To be called self-stabilizing, a system must have the capability to recover when exposed to such attacks However, if the system is destroyed completely, then the self stabilization is not possible Layering and Modularization o How it works? Divide the system into smaller components, making each component self stabilizing and then integrate them to compose the system o First step is to build a self-stabilizing platform and any program written on that platform becomes self-stabilizing o We require primitives to provide a platform on which algorithms are built o Two primitives Common clock and Topology-based CS553 Self-stabilization 2/20/2025 31
Methodologies for design of self- stabilizing systems contd. Common clock primitives o Maintaining time through the use of local clocks in shared memory systems o Two properties safety and progress For synchronous shared memory system o Safety: All clocks have the same value o Progress: At each step, each clock is incremented by the same amount For asynchronous systems with shared memory: o Safety: Clocks of 2 neighboring nodes differ by at most 1 o Progress: A clock is incremented to i+1 when clocks at all neighboring nodes have i or i+1 CS553 Self-stabilization 2/20/2025 32
Methodologies for design of self- stabilizing systems contd. Topology based primitives Layer 1 o Elect a leader and construct a spanning tree Layer 2 o Algorithms for mutual exclusion or reset can be easily built on top of layer 1 Example o A two-layered self stabilizing algorithm for mutual exclusion First layer: creates a spanning tree from an arbitrarily connected graph Second layer: token-based system. When a node receives the token/privilege, it executes its critical section in a depth first manner The above two layers are superimposed to obtain a single self-stabilizing algorithm CS553 Self-stabilization 2/20/2025 33
Communication Protocols Collection of processes that exchange messages over communication links in a network Adversely affected for several reasons 1. Initialization to an illegal state 2. A change in the mode of operation 3. Transmission error because loss of message 4. Process failure and recovery 5. A local memory crash CS553 Self-stabilization 2/20/2025 34
Communication Protocols Protocol is stabilizing if and only if starting from any unsafe state, the protocol is guaranteed to converge to safe state within finite number of state transitions Gouda and Multari Properties to be self-stabilizing 1. It must be non-terminating 2. There are infinite number of safe states 3. There are timeout actions in a non-empty subset of process CS553 Self-stabilization 2/20/2025 35
Dolev, Israeli & Morgan Algorithm Self-stabilizing BFS spanning tree construction Algorithm for - Semi-uniform system - Central demon - Read/Write atomicity Every node maintains two variables - Pointer to one of its incoming edges - distance in hops to the root http://www.dcg.ethz.ch/lectures/ss06/distcomp/lecture/dolev_israeli_moran.pdf 2/20/2025 36
Dolev, Israeli & Morgan Algorithm Nodes periodically exchange their distance value Root node always send 0 Each node chooses the neighbor with minimum distance as its new parent and updates its distance accordingly After reading all the neighbor for k times, distance value of a process is at least k+1 After all the nodes at distance d have computed their distance and updated register, their register no longer change Eventually, entire tree will be stabilized http://www.dcg.ethz.ch/lectures/ss06/distcomp/lecture/dolev_israeli_moran.pdf 2/20/2025 37
Dolev, Israeli & Morgan Algorithm http://www.dcg.ethz.ch/lectures/ss06/distcomp/lecture/dolev_israeli_moran.pdf 2/20/2025 38
Dolev, Israeli & Morgan Algorithm http://www.dcg.ethz.ch/lectures/ss06/distcomp/lecture/dolev_israeli_moran.pdf 2/20/2025 39
Similar Algorithms Collin & Dolev Algorithm - Semi-uniform - DFS spanning-tree - Central demon - Read/Write atomicity Herman Algorithm - Semi-uniform - DFS spanning-tree - Central demon - Composite atomicity CS553 Self-stabilization 2/20/2025 40
Afek, Kutten & Yung Algorithm BFS spanning tree in the read/write atomicity No assumption of a distinguished root process All nodes have globally unique identifier that can be totally ordered Largest identifier will be root of the tree Periodically, nodes exchange identifier information If a node has maximum identifier, then it becomes root If there is larger root identifier, then send join request and receive grant from the root CS553 Self-stabilization 2/20/2025 41
Arora and Gouda Algorithm BFS spanning tree in the read/write atomicity with central demon No assumption of a distinguished root process Like Afek et al., all nodes have globally unique identifier that can be totally ordered and largest identifier will be root of the tree Needs a bound N on the number n of nodes in the network to work correctly Cycles are detected when distance bound grows to exceed N CS553 Self-stabilization 2/20/2025 42
Afek and Bremler Algorithm For synchronous and asynchronous networks Node with smallest identifier is considered as root Based on new idea called Power Supply Nodes are expected to receive power (a continuous flow of message, one per round) from root Only legal roots may be source of power Nodes attached to fake roots eventually fail to receive power and make themselves the root of new tree When a node receives power from a neighbor with small identifier, it attaches itself to the tree CS553 Self-stabilization 2/20/2025 43
1-Maximal Independent Set A maximal independent set is a set of nodes such that every node not in the set is adjacent to a node in the set A 1-maximal independent set is a maximal independent set provided one cannot increase the cardinality of the independent set by removing one node and adding more nodes 2/20/2025 44 http://people.cs.clemson.edu/~goddard/papers/maxIndependent.pdf
Shi et al., Algorithm A connected, undirected graph with node set V and edge set E N(i) denotes set of neighbor of node I Algorithm is presented as a set of rules, each with Boolean predicate and action. A node will be privileged if predicate is true If a node is privileged, it may execute the corresponding action, called move A central demon is assumed to exist Node is state 0 will be in desired set 2/20/2025 45 http://people.cs.clemson.edu/~goddard/papers/maxIndependent.pdf
Shi et al., Algorithm Rules 1. If not involved in a transition process, then set state to the number of neighbors in state 0 or state 0'. 2. If in state 0 and adjacent to at least two 1s, change to state 0. 3. If in state 1 and adjacent to a 0', change state to 1 . 4. If in state 0' and adjacent to at least two 1's, change state to 2 . 5. If in state 1' and adjacent to no 0', change state to 0. 6. If in state 2' and adjacent to no 1', change state to 2. 2/20/2025 46 http://people.cs.clemson.edu/~goddard/papers/maxIndependent.pdf
Shi et al., Algorithm 2/20/2025 47 CS553 Self-stabilization
Shi et al., Algorithm 2/20/2025 48 CS553 Self-stabilization
Probabilistic self-stabilizing leader election algorithm Dolev et al. algorithm to choose leader among n station During a time unit, stations can detect either silence, success or collision Silence no station tried to transmit a message Success only one station transmit a message Collision at least two stations attempted to transmit a message CS553 Self-stabilization 2/20/2025 49
Probabilistic self-stabilizing leader election algorithm CS553 Self-stabilization 2/20/2025 50