Why Consensus?
Imagine a distributed system with multiple servers. How do they agree on a
single truth, like the sequence of updates to a database? That’s where
consensus comes in, ensuring reliability even if some servers fail. Raft is
a consensus algorithm designed to be simple and understandable, making it a
favorite over the notoriously complex Paxos.
If you are familiar with the CAP theorem, Raft favors consistency over availability when the network partitions (in CAP terms, it is a CP system).
Part 1: Understanding Raft in Simple Steps
How Raft Works in 4 Simple Steps
Leader Election
Every node in Raft starts as a follower. When a follower hears no heartbeat from a leader within its election timeout (because of a failure or network issues), it becomes a candidate and starts an election: it requests votes from the other nodes. If it gets votes from a majority, it becomes the leader.
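To make the election step concrete, here is a minimal sketch in Python. The `Node` class, its field names, and `run_election` are hypothetical names chosen for illustration, and the sketch assumes every peer is reachable and skips the log comparison that real Raft performs before granting a vote.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical, simplified Raft node: only the fields an election needs."""
    node_id: int
    current_term: int = 0
    voted_for: Optional[int] = None
    role: str = "follower"

    def handle_vote_request(self, candidate_id: int, term: int) -> bool:
        """Grant at most one vote per term (log up-to-date check omitted)."""
        if term < self.current_term:
            return False  # request comes from a stale term
        if term > self.current_term:
            # Newer term seen: adopt it, step down, forget any earlier vote.
            self.current_term = term
            self.voted_for = None
            self.role = "follower"
        if self.voted_for is None or self.voted_for == candidate_id:
            self.voted_for = candidate_id
            return True
        return False  # already voted for someone else in this term


def run_election(candidate: Node, peers: List[Node]) -> bool:
    """Candidate bumps its term, votes for itself, then asks every peer."""
    candidate.current_term += 1
    candidate.role = "candidate"
    candidate.voted_for = candidate.node_id
    votes = 1  # its own vote
    for peer in peers:
        if peer.handle_vote_request(candidate.node_id, candidate.current_term):
            votes += 1
    cluster_size = len(peers) + 1
    if votes > cluster_size // 2:  # strict majority of the whole cluster
        candidate.role = "leader"
        return True
    return False


if __name__ == "__main__":
    nodes = [Node(i) for i in range(5)]
    print(run_election(nodes[0], nodes[1:]), nodes[0].role)
```

The one-vote-per-term rule is what prevents two leaders from being elected in the same term: two candidates cannot both collect a majority if each node votes only once.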
"Leader is the boss, followers are the team players."
Log Replication
The leader takes commands from clients (API requests, or whatever your
Raft-enabled service does that results in database updates), adds them to
its log, and sends the new log entries to followers. Once a majority
acknowledges an entry, the leader commits it, ensuring consistency across
the system.
Depending on your system, you can apply committed entries to your state as they are committed, or replay the log to recover the state after a failure.
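The commit rule can be sketched in a few lines of Python. Everything here is a simplification under stated assumptions: `Leader`, `Follower`, `match_index`, and `handle_client_command` are hypothetical names, terms are left out entirely, and network failures are modeled with a plain `reachable` flag.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Follower:
    """Hypothetical follower that stores whatever the leader sends."""
    reachable: bool = True
    log: List[str] = field(default_factory=list)

    def append_entries(self, start: int, entries: List[str]) -> bool:
        if not self.reachable:
            return False  # models a crashed or partitioned node
        self.log[start:] = entries  # overwrite from `start` onward
        return True


class Leader:
    def __init__(self, followers: List[Follower]) -> None:
        self.followers = followers
        self.log: List[str] = []
        # Number of entries known to be stored on each follower.
        self.match_index = [0] * len(followers)
        self.commit_index = 0  # number of committed entries

    def replicate(self) -> None:
        for i, follower in enumerate(self.followers):
            start = self.match_index[i]
            if follower.append_entries(start, self.log[start:]):
                self.match_index[i] = len(self.log)
        # An entry is committed once a majority (leader included) stores it.
        stored = sorted(self.match_index + [len(self.log)], reverse=True)
        majority = (len(self.followers) + 1) // 2 + 1
        self.commit_index = max(self.commit_index, stored[majority - 1])

    def handle_client_command(self, command: str) -> bool:
        """Return True only if the new entry is committed right away."""
        self.log.append(command)
        self.replicate()
        return self.commit_index == len(self.log)


if __name__ == "__main__":
    cluster = [Follower(), Follower(), Follower(reachable=False), Follower(reachable=False)]
    leader = Leader(cluster)
    print(leader.handle_client_command("x=1"))  # leader + 2 followers of 5 nodes
```

A follower that was unreachable simply receives everything it missed the next time replication succeeds, which is why the leader tracks progress per follower instead of assuming everyone is in step.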
Heartbeat Mechanism
The leader sends regular heartbeats to followers to maintain its authority and prevent them from starting new elections. Heartbeats also act as "checkpoints" for logs, making sure no one is out of sync (in case a node has just joined, or has fallen behind due to network issues).
“I’m still in charge, listen to me!”
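On the follower side, the heartbeat boils down to a timer that is reset whenever the leader is heard from. The sketch below is one possible shape for it in Python; `FollowerTimer` and its method names are hypothetical, the 0.15 to 0.30 second range is only an assumed default, and the clock is passed in explicitly so the example needs no real waiting.

```python
import random
from typing import Optional


class FollowerTimer:
    """Hypothetical election timer driven by an explicit clock value (seconds)."""

    def __init__(
        self,
        now: float,
        min_timeout: float = 0.15,  # assumed values, tune for your network
        max_timeout: float = 0.30,
        rng: Optional[random.Random] = None,
    ) -> None:
        self._rng = rng if rng is not None else random.Random()
        self.min_timeout = min_timeout
        self.max_timeout = max_timeout
        self.deadline = 0.0
        self.reset(now)

    def reset(self, now: float) -> None:
        # A randomized timeout makes it unlikely that two followers
        # become candidates at the same moment and split the vote.
        self.deadline = now + self._rng.uniform(self.min_timeout, self.max_timeout)

    def on_heartbeat(self, now: float) -> None:
        """Leader is alive: push the election deadline further out."""
        self.reset(now)

    def should_start_election(self, now: float) -> bool:
        return now >= self.deadline


if __name__ == "__main__":
    timer = FollowerTimer(now=0.0)
    timer.on_heartbeat(now=0.1)
    print(timer.should_start_election(now=0.2))
```

As long as heartbeats arrive more often than the minimum timeout, `should_start_election` never fires; once they stop, the follower turns candidate after at most `max_timeout` seconds.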
Consensus achieved. Data saved.
What Makes Raft Reliable?
- Safety: No conflicting logs, even during failures.
- Liveness: Progress continues as long as a majority of nodes are up and can reach each other, even if some nodes fail or the network gets messy.
Raft handles leader crashes and network splits gracefully, making it a robust choice for real-world systems.
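Both properties rest on majorities, so the arithmetic is worth spelling out. The helper names below are hypothetical; the formulas are just the usual "more than half" rule.

```python
def majority(cluster_size: int) -> int:
    """Smallest number of nodes that is more than half of the cluster."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """How many nodes can be down while a majority is still available."""
    return cluster_size - majority(cluster_size)


def can_make_progress(cluster_size: int, reachable_nodes: int) -> bool:
    """True if the reachable nodes (leader included) form a majority."""
    return reachable_nodes >= majority(cluster_size)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, majority(n), tolerated_failures(n))
```

Note that a 4-node cluster tolerates only one failure, the same as a 3-node cluster, which is why odd cluster sizes are the usual choice.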
Part 2: Comparing Raft to Paxos
Paxos laid the groundwork for consensus, but decoding it is no
minuscule task, let alone implementing it.
It’s powerful and proven correct, but complicated.
Simplicity:
Raft is like a well-illustrated guidebook. Paxos? More like a riddle.
Structure:
Raft separates leader election and log replication clearly. Paxos blends these phases, leading to confusion. Or rather, it leaves it up to the developer to figure out how to use the "consensus".
The same goes for log management. Raft links log entries to leader terms, making consistency easy to enforce. Paxos lacks this explicit link, leaving more room for interpretation.
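As an illustration of how terms make consistency easy to enforce, here is a rough Python sketch of the check a follower can run before accepting new entries. The function name `append_entries` and its parameters mirror the idea of Raft's AppendEntries request, but the signature is an assumption made for this example, and unlike real Raft the sketch drops the follower's suffix unconditionally instead of only on an actual conflict.

```python
from typing import List, Tuple

Entry = Tuple[int, str]  # (term in which the entry was created, command)


def append_entries(
    log: List[Entry],
    prev_index: int,  # 1-based index of the entry just before the new ones; 0 = log start
    prev_term: int,   # term the leader has recorded at prev_index
    entries: List[Entry],
) -> bool:
    """Follower-side consistency check, simplified."""
    if prev_index > len(log):
        return False  # follower is missing entries before prev_index
    if prev_index > 0 and log[prev_index - 1][0] != prev_term:
        return False  # same position, different term: histories diverge
    # Logs agree up to prev_index, so replace whatever follows it.
    # (Simplification: real Raft truncates only when entries conflict.)
    del log[prev_index:]
    log.extend(entries)
    return True


if __name__ == "__main__":
    follower_log = [(1, "a"), (1, "b"), (2, "c")]
    print(append_entries(follower_log, 2, 1, [(3, "d")]), follower_log)
```

When the check fails, the leader can retry with an earlier `prev_index` until the two logs agree, and everything after that point is overwritten with the leader's version.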
Also, since there are many implementations and some production uses of Raft, its cost (message overhead, etc.) is known to be acceptable in practice, which makes it a practical default choice right now.
Check out my (not a hundred percent robust) implementation on GitHub
