Understanding the Raft Consensus Algorithm

The Raft consensus algorithm, developed by Diego Ongaro and John Ousterhout at Stanford University, ensures agreement on shared state and autonomous recovery from server failures. The presentation covers eliminating single points of failure and building consistent storage systems with replicated state machines. It compares Raft to Paxos, highlighting Raft's simpler decomposition and easier understandability, and discusses consensus services such as ZooKeeper, etcd, and consul, emphasizing the role of consensus in keeping a system consistent.

  • Raft Consensus Algorithm
  • Shared State Agreement
  • Server Failures
  • Replicated State Machines
  • Consensus Mechanisms


Presentation Transcript


  1. The Raft Consensus Algorithm. Diego Ongaro and John Ousterhout, Stanford University. http://raftconsensus.github.io

  2. What is Consensus?
  • Agreement on shared state (single system image)
  • Recovers from server failures autonomously
  • Minority of servers fail: no problem
  • Majority fail: lose availability, retain consistency (quorum arithmetic sketched below)
  • Key to building consistent storage systems
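
The minority/majority distinction is quorum arithmetic. A minimal sketch (illustrative Go, not from the slides): with n servers, any decision needs a quorum of floor(n/2)+1 votes, so the cluster keeps making progress as long as no more than n minus that quorum servers have failed.

```go
// Quorum arithmetic sketch: how many failures an n-server cluster tolerates.
package main

import "fmt"

// quorum returns the smallest majority of an n-server cluster.
func quorum(n int) int { return n/2 + 1 }

func main() {
	for _, n := range []int{3, 5, 7} {
		q := quorum(n)
		fmt.Printf("cluster of %d: quorum %d, tolerates %d failures\n", n, q, n-q)
	}
}
```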

  3. Inside a Consistent System
  TODO: eliminate the single point of failure. Options:
  • An ad hoc algorithm ("This case is rare and typically occurs as a result of a network partition with replication lag."), OR
  • A consensus algorithm (built-in or library): Paxos, Raft, ...
  • A consensus service: ZooKeeper, etcd, consul, ...

  4. Replicated State Machines
  [Diagram: clients send commands to servers; each server runs a Consensus Module, a Log of commands (x←3, y←2, x←1, z←6), and a State Machine]
  • Replicated log: all servers execute the same commands in the same order, giving a replicated state machine (see the sketch below)
  • The consensus module ensures proper log replication
  • The system makes progress as long as any majority of servers are up
  • Failure model: fail-stop (not Byzantine), delayed/lost messages
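
The replicated-state-machine idea can be sketched as a tiny key-value machine that applies a log of commands in order. The names (Command, KVStateMachine, Apply) are illustrative, not code from the talk; the point is that any server applying the same log ends up in the same state.

```go
// Sketch of a deterministic state machine driven by a replicated log.
package main

import "fmt"

// Command is one log entry, e.g. "x <- 3".
type Command struct {
	Key   string
	Value int
}

// KVStateMachine is deterministic: same log in, same state out.
type KVStateMachine struct {
	state map[string]int
}

func NewKVStateMachine() *KVStateMachine {
	return &KVStateMachine{state: make(map[string]int)}
}

// Apply executes one committed command. The consensus module's only job is to
// make every server call Apply with the same commands in the same order.
func (m *KVStateMachine) Apply(c Command) {
	m.state[c.Key] = c.Value
}

func main() {
	log := []Command{{"x", 3}, {"y", 2}, {"x", 1}, {"z", 6}} // the replicated log
	sm := NewKVStateMachine()
	for _, c := range log {
		sm.Apply(c)
	}
	fmt.Println(sm.state) // map[x:1 y:2 z:6] on every server that applied this log
}
```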

  5. How Is Consensus Used?
  • Top-level system configuration
  • Replicate entire database state

  6. Paxos Protocol
  • Leslie Lamport, 1989
  • Nearly synonymous with consensus
  • "The dirty little secret of the NSDI community is that at most five people really, truly understand every part of Paxos ;-)." (NSDI reviewer)
  • "There are significant gaps between the description of the Paxos algorithm and the needs of a real-world system ... the final system will be based on an unproven protocol." (Chubby authors)

  7. Raft's Design for Understandability
  • We wanted the best algorithm for building real systems: it must be correct, complete, and perform well, but it must also be understandable
  • Guiding question: "What would be easier to understand or explain?"
  • Fundamentally different decomposition than Paxos
  • Less complexity in state space, less mechanism

  8. Raft User Study
  [Charts: quiz grades; survey results]

  9. Raft Overview
  1. Leader election: select one of the servers to act as cluster leader; detect crashes, choose a new leader
  2. Log replication (normal operation): the leader takes commands from clients and appends them to its log, then replicates its log to the other servers (overwriting inconsistencies); the two RPCs that drive this are sketched below
  3. Safety: only a server with an up-to-date log can become leader
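
The two RPCs behind leader election and log replication can be sketched as Go structs. The field names follow the Raft paper; this is an illustration of the message shapes, not a complete implementation.

```go
// Sketch of Raft's two RPC request types (field names as in the Raft paper).
package raft

// LogEntry pairs a client command with the term in which it was appended.
type LogEntry struct {
	Term    int
	Command []byte
}

// RequestVoteArgs is sent by candidates during leader election. The last-log
// fields let voters refuse candidates whose logs are not up to date (safety).
type RequestVoteArgs struct {
	Term         int // candidate's term
	CandidateID  int
	LastLogIndex int
	LastLogTerm  int
}

// AppendEntriesArgs is sent by the leader, both as a heartbeat (empty Entries)
// and to replicate log entries during normal operation.
type AppendEntriesArgs struct {
	Term         int // leader's term
	LeaderID     int
	PrevLogIndex int // index of the entry immediately preceding the new ones
	PrevLogTerm  int // term of that entry, used for the consistency check
	Entries      []LogEntry
	LeaderCommit int // leader's commit index
}
```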

  10. RaftScope Visualization

  11. Core Raft Review
  1. Leader election: heartbeats and timeouts to detect crashes; randomized timeouts to avoid split votes; majority voting to guarantee at most one leader per term
  2. Log replication (normal operation): the leader takes commands from clients, appends them to its log, and replicates its log to the other servers (overwriting inconsistencies); a built-in consistency check simplifies how logs may differ (sketched below)
  3. Safety: only elect leaders with all committed entries in their logs; a new leader defers committing entries from prior terms
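
The "built-in consistency check" in step 2 is the follower-side test on AppendEntries: accept new entries only if the follower's log already contains the entry the leader says precedes them; otherwise the leader backs up and retries, eventually overwriting any inconsistent suffix. A minimal sketch, using 1-indexed log positions as in the paper and treating index 0 as the empty-log sentinel:

```go
// Sketch of the AppendEntries consistency check on the follower side.
package main

import "fmt"

// followerTerms[i] is the term of the follower's log entry at index i+1.
func consistencyCheck(followerTerms []int, prevLogIndex, prevLogTerm int) bool {
	if prevLogIndex == 0 {
		return true // leader is appending from the very start of the log
	}
	if prevLogIndex > len(followerTerms) {
		return false // follower's log is too short to contain prevLogIndex
	}
	return followerTerms[prevLogIndex-1] == prevLogTerm
}

func main() {
	follower := []int{1, 1, 2}                    // terms of the follower's three entries
	fmt.Println(consistencyCheck(follower, 3, 2)) // true: logs match through index 3
	fmt.Println(consistencyCheck(follower, 3, 3)) // false: term conflict at index 3
	fmt.Println(consistencyCheck(follower, 5, 3)) // false: follower's log too short
}
```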

  12. Randomized Timeouts
  • How much randomization is needed to avoid split votes?
  • Conservatively, use a random range roughly 10x the network latency (see the timeout sketch below)
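
A sketch of how a follower might pick its election timeout. The 150-300 ms range is a commonly cited illustrative example (roughly 10x a typical datacenter round trip, in the spirit of the slide), not a value from this deck; each follower draws a fresh random timeout, and whoever fires first becomes a candidate, making split votes unlikely.

```go
// Sketch of a randomized election timeout (illustrative range).
package main

import (
	"fmt"
	"math/rand"
	"time"
)

const (
	electionTimeoutMin = 150 * time.Millisecond
	electionTimeoutMax = 300 * time.Millisecond
)

// randomElectionTimeout returns a duration drawn uniformly from [min, max).
func randomElectionTimeout() time.Duration {
	return electionTimeoutMin +
		time.Duration(rand.Int63n(int64(electionTimeoutMax-electionTimeoutMin)))
}

func main() {
	// A server would reset a timer like this whenever it hears from the leader.
	for i := 0; i < 3; i++ {
		fmt.Println("election timeout:", randomElectionTimeout())
	}
}
```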

  13. Raft Implementations (Stale)
  • go-raft (Go): Ben Johnson (Sky) and Xiang Li (CoreOS)
  • kanaka/raft.js (JS): Joel Martin
  • hashicorp/raft (Go): Armon Dadgar (HashiCorp)
  • rafter (Erlang): Andrew Stone (Basho)
  • ckite (Scala): Pablo Medina
  • kontiki (Haskell): Nicolas Trangez
  • LogCabin (C++): Diego Ongaro (Stanford)
  • akka-raft (Scala): Konrad Malawski
  • floss (Ruby): Alexander Flatten
  • CRaft (C): Willem-Hendrik Thiart
  • barge (Java): Dave Rusek
  • harryw/raft (Ruby): Harry Wilkinson
  • py-raft (Python): Toby Burress

  14. Facebook HydraBase Example: https://code.facebook.com/posts/321111638043166/hydrabase-the-evolution-of-hbase-facebook/

  15. Conclusions
  • Consensus is widely regarded as difficult
  • Raft is designed for understandability: easier to teach in classrooms, and a better foundation for building practical systems
  • The paper/thesis covers much more: cluster membership changes (simpler in the thesis), log compaction (expanded in the tech report/thesis), client interaction (expanded in the tech report/thesis), evaluation (thesis)

  16. Questions: raftconsensus.github.io
