[HN Gopher] The Raft Consensus Algorithm (2015)
       ___________________________________________________________________
        
       The Raft Consensus Algorithm (2015)
        
       Author : oumua_don17
       Score  : 240 points
       Date   : 2023-09-03 12:22 UTC (10 hours ago)
        
 (HTM) web link (raft.github.io)
 (TXT) w3m dump (raft.github.io)
        
       | henrik_w wrote:
       | Can't resist posting this (from Classic Programmer Paintings)
       | 
       | "Raft Consensus Algorithm Failure",Theodore Gericault, 1819
       | 
       | https://classicprogrammerpaintings.com/post/6141087496359280...
        
       | galenmarchetti wrote:
       | If you're interested in consensus algorithms, you might be
        | interested in this book that I used in a theoretical course on
        | distributed systems, called "Reasoning about Knowledge"
       | (https://mitpress.mit.edu/9780262562003/reasoning-about-
       | knowl...).
       | 
       | You have to invest a bit in learning about modal logic, but once
       | you do get past that part, this book provides proofs of why
       | things like Raft or Paxos work that are super intuitive and
       | straightforward. Basically pushing the complexity of proving
       | these algorithms into the logic structure used to form the proof
       | (in an intuitive way). Highly recommend, changed how I think
       | about consensus!
        
       | bjornasm wrote:
       | Here is their answer to their own question - "What is Raft?"
       | 
       | >Raft is a consensus algorithm that is designed to be easy to
       | understand. It's equivalent to Paxos in fault-tolerance and
       | performance. The difference is that it's decomposed into
       | relatively independent subproblems, and it cleanly addresses all
       | major pieces needed for practical systems. We hope Raft will make
       | consensus available to a wider audience, and that this wider
       | audience will be able to develop a variety of higher quality
       | consensus-based systems than are available today.
       | 
        | After reading that I still have no idea. They are not alone in
        | doing this, but I think it's a shame that people don't spend the
        | extra time and effort to properly describe their work.
        
         | oasisaimlessly wrote:
         | Did you try reading the next paragraph ("Hold on--what is
         | consensus?")?
         | 
         | Sometimes concepts are too "big" to be introduced to someone
         | with no background in a single paragraph; such is life.
         | However, the linked article is a great intro if you manage to
         | keep reading.
        
         | BirdieNZ wrote:
         | Arguably it's a good description because it automatically
         | filters out audiences who don't know what it means; if you're
         | building a distributed system and looking at different
         | consensus algorithms then it's a simple and clear explanation,
         | but if you aren't then it's not a relevant algorithm for you
         | anyway!
         | 
          | The general class of consensus algorithms tries to solve the
          | problem of what to do when you have multiple replicas of a
          | data store across several physical devices, and one or many
          | of the devices or their connections fail in some fashion.
          | They are called "consensus" algorithms because the machines
          | need to come to a consensus on what decision to make about a
          | piece of data when a failure event occurs.
         | 
         | For example, you have three servers all replicating the same
         | SQL database:
         | 
         | (A) - (B) - (C)
         | 
         | The network connection linking (C) to the other two drops; (A)
         | and (B) are notified and (B) is promoted to be the primary (it
         | receives the writes and then distributes them to the replicas).
         | 
         | However, (C) doesn't know what's happened, and continues to
         | receive some writes. The network connection is restored, and
         | now (A) and (B) and (C) need to decide what to do. (B) and (C)
         | have both independently received a different set of writes, and
         | the servers need to come to a consensus on what to do with the
         | data.
         | 
         | This is what Raft, Paxos etc. are attempting to solve in a
         | consistent and performant fashion.
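          | 
          | A minimal sketch (in Go, assuming a hypothetical three-node
          | cluster) of the majority rule consensus protocols like Raft
          | lean on to avoid exactly this split: an entry only counts as
          | committed once a majority of the cluster has acknowledged it,
          | so the partitioned (C) can never commit conflicting writes on
          | its own.
          | 
          |     package main
          | 
          |     import "fmt"
          | 
          |     // committed reports whether an entry acknowledged by acks
          |     // servers out of clusterSize has reached a majority.
          |     func committed(acks, clusterSize int) bool {
          |         return acks > clusterSize/2
          |     }
          | 
          |     func main() {
          |         // (C) alone in the minority partition: 1 of 3 acks.
          |         fmt.Println(committed(1, 3)) // false, cannot commit
          |         // (A) and (B) in the majority partition: 2 of 3 acks.
          |         fmt.Println(committed(2, 3)) // true, safe to commit
          |     }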
        
       | EGreg wrote:
       | Is Raft byzantine-fault-tolerant though? Can it be made so?
       | 
       | Paxos can:
       | http://en.wikipedia.org/wiki/Paxos_(computer_science)#Byzant...
        
         | intelVISA wrote:
         | No, Raft is mostly for learning the basics of distributed
         | consensus afaik.
        
         | kevdev wrote:
         | Raft is not byzantine fault tolerant.
        
         | posix86 wrote:
          | It assumes no malicious agents. If you're malicious you can
          | take control of everything with a single message (advertising
          | yourself as the leader of the next term).
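          | 
          | A rough sketch of the rule in question (Go, with made-up field
          | names; real implementations also reset their vote for the new
          | term here): a Raft node adopts any higher term it sees and
          | drops back to follower, with no check that the sender is
          | honest.
          | 
          |     package raftsketch
          | 
          |     // node is an illustrative stand-in for a Raft server.
          |     type node struct {
          |         currentTerm int
          |         state       string // "follower", "candidate", "leader"
          |     }
          | 
          |     // observeTerm applies Raft's term rule: a message carrying
          |     // a newer term is simply believed, which is why a single
          |     // forged "term+1" message can depose the current leader.
          |     func (n *node) observeTerm(msgTerm int) {
          |         if msgTerm > n.currentTerm {
          |             n.currentTerm = msgTerm
          |             n.state = "follower"
          |         }
          |     }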
        
         | sethev wrote:
         | There's a BFT version: https://www.scs.stanford.edu/17au-
         | cs244b/labs/projects/clow_...
        
         | dwheeler wrote:
          | Raft cannot handle Byzantine faults by itself. There are variant
         | algorithms that can.
        
         | u320 wrote:
         | Paxos is a family of algorithms of which Raft is a member. You
         | can change Raft in various ways but then it's not Raft anymore,
         | but it can still be Paxos. So of course Paxos can "do more".
        
       | henrik_w wrote:
       | When I was studying the Raft algorithm a year and a half ago, I
       | found this video on it by John Ousterhout to be a good
       | complement:
       | 
       | Designing for Understandability: The Raft Consensus Algorithm
       | 
       | https://www.youtube.com/watch?v=vYp4LYbnnW8
        
       | eatonphil wrote:
       | I had a fun time recently implementing Raft leader election and
       | log replication (i.e. I didn't get to
       | snapshotting/checkpointing). One of the most challenging projects
       | I've tried to do.
       | 
       | The Raft paper is very gentle to read, and gives you a great
       | intuition on its own. Even if you don't want to implement it, you
       | probably use software that uses it: like etcd or consul or
       | cockroach or tidb, etc.
       | 
       | I collected all the resources I found useful while doing it here:
       | https://github.com/eatonphil/goraft#references. This includes
       | Diego Ongaro's thesis and his TLA+ spec.
       | 
        | Some people say Figure 2 of the Raft paper has everything you
        | need, but I'm pretty sure that's just not true. To me it's a bit
        | vaguer than the TLA+ spec, anyway.
        
         | agonz253 wrote:
         | This looks great. However, without a truly comprehensive test
         | suite it's guaranteed to have a great many subtle bugs :). I
         | recommend trying to hook it up with the tests from MIT's
         | distributed systems course:
         | https://pdos.csail.mit.edu/6.824/labs/lab-raft.html.
         | 
          | It looks as though it would take only minor refactoring, at
          | least for the leader election and log replication parts.
        
         | kevdev wrote:
         | Figure 2 is great, but I would agree the entire paper is needed
         | if you are implementing raft. There are a few specifics in the
         | paper that you need when implementing it.
         | 
         | P.S. love your blog :)
        
           | eatonphil wrote:
           | Cheers!
           | 
           | > Figure 2 is great, but I would agree the entire paper is
           | needed if you are implementing raft. There are a few
           | specifics in the paper that you need when implementing it.
           | 
           | It was more than that. I'm blanking on what it was but there
           | were parts where I really couldn't find _anything_ about the
           | intended behavior in the paper (let alone in Figure 2) except
           | for in Diego 's thesis or in the TLA+ spec.
           | 
           | Though maybe I was just not reading the paper correctly.
        
       | lpage wrote:
       | Maelstrom [1], a workbench for learning distributed systems from
       | the creator of Jepsen, includes a simple (model-checked)
       | implementation of Raft and an excellent tutorial on implementing
       | it.
       | 
       | Raft is a simple algorithm, but as others have noted, the
       | original paper includes many correctness details often brushed
       | over in toy implementations. Furthermore, the fallibility of
       | real-world hardware (handling memory/disk corruption and grey
       | failures), the requirements of real-world systems with tight
       | latency SLAs, and a need for things like flexible quorum/dynamic
       | cluster membership make implementing it for production a long and
        | daunting task. The commit histories of etcd and hashicorp/raft,
        | likely the two most battle-tested open source implementations of
        | Raft, still surface correctness bugs on the regular and tell you
        | all you need to know.
       | 
        | The tigerbeetle team talks in detail about the real-world aspects
        | of distributed systems on imperfect hardware/non-abstracted
        | system models, and why they chose Viewstamped Replication, which
        | predates Paxos but looks more like Raft [2].
       | 
       | [1]: https://github.com/jepsen-io/maelstrom/
       | 
       | [2]:
       | https://github.com/tigerbeetle/tigerbeetle/blob/main/docs/DE...
        
         | mananaysiempre wrote:
         | > [Viewstamped replication] predates Paxos but looks more like
         | Raft.
         | 
         | Heidi Howard and Richard Mortier's paper[1] on the topic of
         | Paxos vs Raft has (multi-decree) Paxos and Raft written out in
         | a way that makes it clear that they are very, very close. I'm
         | very far from knowing what consequences (if any) this has for
         | the implementation concerns you state, but the paper is lovely
         | and I wanted to plug it. (There was also a presentation[2], but
         | IMO the text works better when you want to refer back and
         | forth.)
         | 
         | [1] https://doi.org/10.1145/3380787.3393681
         | 
         | [2] https://www.youtube.com/watch?v=0K6kt39wyH0
        
         | LAC-Tech wrote:
          | The Viewstamped Replication paper was surprisingly readable -
         | I'd never looked at consensus algorithms before in my life and
         | I found I could kind of follow it after a couple of reads.
         | 
         | https://dspace.mit.edu/bitstream/handle/1721.1/71763/MIT-CSA...
        
         | m00dy wrote:
          | Looks like a great playground to get familiar with distributed
          | systems.
        
       | kevdev wrote:
       | I love this site. When I was learning & implementing raft in my
       | distributed systems course, this page was invaluable. Plus the
       | paper itself is pretty easy to read.
        
         | dwheeler wrote:
          | I teach a class on distributed systems, and this site is one of
          | the references. Thanks for making Raft so clear!
        
       | candiddevmike wrote:
       | Anyone using object storage like S3 for cluster
       | coordination/election instead?
        
         | hotpotamus wrote:
         | I want to say that jgroups has an S3 coordination method. I
         | came across quite a few methods while trying to set up a
         | Keycloak cluster in Docker Swarm, but ultimately the Kubernetes
         | DNS method worked best for my case (Swarm seems close enough
         | that I was able to use it with no problem). I'd also note that
         | my understanding is that Swarm uses the Raft algorithm.
        
         | ithkuil wrote:
          | Not sure about S3. IIRC they only relatively recently
          | implemented consistent writes.
          | 
          | But this article explains how to "piggyback" on stores
          | providing consistent writes in order to implement your own
          | little leader election algo on top of a foundation layer:
         | 
         | https://cloud.google.com/blog/topics/developers-practitioner...
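          | 
          | The gist of that approach, sketched in Go against a
          | hypothetical store interface (the CompareAndSwap method below
          | is an assumption for illustration, not any specific product's
          | API): whoever wins a conditional write of a lease record is
          | leader until the lease expires.
          | 
          |     package leasesketch
          | 
          |     import "time"
          | 
          |     // Store is a hypothetical consistent key-value store with
          |     // conditional writes. Real systems would map this onto
          |     // object-store preconditions, conditional puts, etc.
          |     type Store interface {
          |         // CompareAndSwap writes value under key only if the
          |         // current value equals old, and reports whether it won.
          |         CompareAndSwap(key, old, value string) (bool, error)
          |     }
          | 
          |     // tryAcquireLease attempts to become leader by writing a
          |     // lease naming us and an expiry; only one contender's
          |     // conditional write can succeed per lease generation.
          |     func tryAcquireLease(
          |         s Store, me, old string, ttl time.Duration,
          |     ) (bool, error) {
          |         lease := me + "|" + time.Now().Add(ttl).Format(time.RFC3339)
          |         return s.CompareAndSwap("leader-lease", old, lease)
          |     }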
        
         | cweld510 wrote:
         | I think S3 is too slow to put in your control plane's critical
         | path; better to use something like Redis.
        
       | maxpert wrote:
        | I've written a whole SQLite replication system that works on top
        | of Raft ( https://github.com/maxpert/marmot ). The best part is
        | that Raft has a well-understood and strong library ecosystem.
        | I started off with libraries, and when I noticed I was
        | reimplementing distributed streams, I just took an off-the-shelf
        | implementation (https://docs.nats.io/nats-concepts/jetstream) and
        | embedded it in the system. I love the simplicity and reasoning
        | that comes with Raft. However, I am playing with EPaxos these
        | days (https://www.cs.cmu.edu/~dga/papers/epaxos-sosp2013.pdf),
        | because then I can truly decentralize the implementation for a
        | truly masterless system. Right now I've added a sharding
        | mechanism on various streams so that in high-load cases masters
        | can be distributed across nodes too.
        
       | dang wrote:
       | I took a crack at finding the interesting past related threads.
       | Any others?
       | 
       | (I've left out posts about particular implementations,
       | extensions, and so on--there are too many. The intention is
       | threads about the algorithm itself.)
       | 
       |  _Raft Is So Fetch: The Raft Consensus Algorithm Explained
       | Through Mean Girls_ -
       | https://news.ycombinator.com/item?id=33071069 - Oct 2022 (53
       | comments)
       | 
       |  _Raft Consensus Animated (2014)_ -
       | https://news.ycombinator.com/item?id=32484584 - Aug 2022 (67
       | comments)
       | 
       |  _Why use Paxos instead of Raft?_ -
       | https://news.ycombinator.com/item?id=32467962 - Aug 2022 (45
       | comments)
       | 
       |  _In Search of an Understandable Consensus Algorithm (2014)
       | [pdf]_ - https://news.ycombinator.com/item?id=29837995 - Jan 2022
       | (12 comments)
       | 
       |  _Raft Consensus Protocol_ -
       | https://news.ycombinator.com/item?id=29079079 - Nov 2021 (51
       | comments)
       | 
       |  _Paxos vs. Raft: Have we reached consensus on distributed
       | consensus?_ - https://news.ycombinator.com/item?id=27831576 -
       | July 2021 (48 comments)
       | 
       |  _Raft Visualization_ -
       | https://news.ycombinator.com/item?id=25326645 - Dec 2020 (35
       | comments)
       | 
       |  _Raft: A Fantastical and Absurd Exploration_ -
       | https://news.ycombinator.com/item?id=23129707 - May 2020 (1
       | comment)
       | 
       |  _Understanding Raft Consensus_ -
       | https://news.ycombinator.com/item?id=23128787 - May 2020 (3
       | comments)
       | 
       |  _In Search of an Understandable Consensus Algorithm (2014)
       | [pdf]_ - https://news.ycombinator.com/item?id=23113419 - May 2020
       | (26 comments)
       | 
       |  _Paxos vs. Raft: Have we reached consensus on distributed
       | consensus?_ - https://news.ycombinator.com/item?id=22994420 -
       | April 2020 (65 comments)
       | 
       |  _Raft Is So Fetch: The Raft Consensus Algorithm Explained
       | Through Mean Girls_ -
       | https://news.ycombinator.com/item?id=22520040 - March 2020 (4
       | comments)
       | 
       |  _Implementing Raft: Part 2: Commands and Log Replication_ -
       | https://news.ycombinator.com/item?id=22451959 - Feb 2020 (16
       | comments)
       | 
       |  _Building a Large-Scale Distributed Storage System Based on
       | Raft_ - https://news.ycombinator.com/item?id=21447528 - Nov 2019
       | (5 comments)
       | 
       |  _In Search of an Understandable Consensus Algorithm [pdf]_ -
       | https://news.ycombinator.com/item?id=14724883 - July 2017 (14
       | comments)
       | 
       |  _Instructors ' Guide to Raft_ -
       | https://news.ycombinator.com/item?id=11300428 - March 2016 (3
       | comments)
       | 
       |  _Fuzzing Raft for Fun and Publication_ -
       | https://news.ycombinator.com/item?id=10432062 - Oct 2015 (10
       | comments)
       | 
       |  _Prove Raft Correct_ -
       | https://news.ycombinator.com/item?id=10017549 - Aug 2015 (27
       | comments)
       | 
       |  _Scaling Raft_ - https://news.ycombinator.com/item?id=9725094 -
       | June 2015 (12 comments)
       | 
       |  _Raft Consensus Algorithm_ -
       | https://news.ycombinator.com/item?id=9613493 - May 2015 (24
       | comments)
       | 
       |  _Creator of Raft is speaking at our meetup. What questions do
       | you want answered?_ -
       | https://news.ycombinator.com/item?id=9351794 - April 2015 (6
       | comments)
       | 
       |  _Replicating SQLite using Raft Consensus_ -
       | https://news.ycombinator.com/item?id=9092110 - Feb 2015 (21
       | comments)
       | 
       |  _Raft Refloated: Do We Have Consensus? [pdf]_ -
       | https://news.ycombinator.com/item?id=9015085 - Feb 2015 (4
       | comments)
       | 
       |  _Analysis of Raft Consensus [pdf]_ -
       | https://news.ycombinator.com/item?id=8736868 - Dec 2014 (3
       | comments)
       | 
       |  _The Raft Consensus Algorithm_ -
       | https://news.ycombinator.com/item?id=8527440 - Oct 2014 (27
       | comments)
       | 
       |  _Raft: Understandable Distributed Consensus_ -
       | https://news.ycombinator.com/item?id=8271957 - Sept 2014 (79
       | comments)
       | 
       |  _Raft - The Understandable Distributed Protocol_ -
       | https://news.ycombinator.com/item?id=6859101 - Dec 2013 (10
       | comments)
       | 
       |  _Raft, a scrutable successor to Paxos_ -
       | https://news.ycombinator.com/item?id=5624627 - April 2013 (2
       | comments)
        
       | jmholla wrote:
        | Are there consensus algorithms that don't require changes to go
        | through a leader? In many distributed systems, you want to
        | distribute intake as well.
        
         | dboreham wrote:
         | Blockchains
        
           | Rhapso wrote:
           | No, those only allow 1 leader and 1 intake at a time. They
           | just pick the leader by proof of waste first.
        
             | lucb1e wrote:
             | I think you may be confusing proof of work with blockchain
             | as a general principle. (I'm not a fan of this
             | cryptocurrency tech anymore either, my comment is purely
             | about the technical properties which can be interesting to
             | study.)
        
               | Detrytus wrote:
                | No, GP is right. Without Proof of Work, blockchains have
                | no leader - think of a Git repo with multiple branches:
                | which one is main? Nothing in the blockchain itself
                | determines this; only we, humans, chose to name one of
                | them "main" (formerly "master").
        
               | lucb1e wrote:
               | > Without Proof of Work blockchains have no leader
               | 
               | But, proof of stake?
        
         | mrkeen wrote:
         | Paxos doesn't favour any node above any other node.
         | 
         | That said, if you're trying to distribute for the purpose of
         | throughput, then it might be more efficient to only need one
         | leader rather than the quorum that Paxos requires. Only
         | speculating though.
         | 
         | Paxos is also more efficient if calls go to the same place each
         | time - you avoid contention and revotes, etc.
        
         | darkmarmot wrote:
         | Many systems have lots of leaders to handle concurrency. Ours
         | has millions of them for instance.
        
         | havnagiggle wrote:
         | There's a sharded paxos with a shardmaster. Essentially each
         | shard has its own paxos consensus.
        
         | User23 wrote:
          | This paper[1] discusses "termination detection", on which
          | consensus can be built directly with a trivial proof. It
          | assumes a single environment, and while mathematically that is
          | a kind of graph node, it really maps to all outside sources of
          | input rather than to a cluster member.
         | 
         | [1]
         | https://www.cs.utexas.edu/users/EWD/transcriptions/EWD06xx/E...
        
         | LAC-Tech wrote:
         | I think in this case there's no longer a 'consensus' - since
         | any node can accept any write immediately.
         | 
         | Look into CRDTs, which are essentially algebraic laws for
         | systems that do this and are guaranteed to eventually converge.
        
         | Racing0461 wrote:
          | The original Dynamo paper (not DynamoDB, its predecessor)
          | didn't use leader election. It used quorums with some speed
          | hacks (hinted handoffs and sloppy quorums).
        
         | vhiremath4 wrote:
         | There are algos which prioritize high fault tolerance and do so
         | by increasing the number of leaders (or the minimum number of
         | nodes that must have a copy of the lookup data).
         | 
         | One such algo is Chord:
         | 
         | https://en.m.wikipedia.org/wiki/Chord_(peer-to-peer)
         | 
          | It's a peer-to-peer ring of nodes which have their values
          | consistently hashed between them. The network leverages these
          | things called "finger tables", which essentially store routing
          | information in the form of a table. This table can hold
          | information which is incorrect or outdated, and the peer you
          | go to can tell you to go to another peer (usually the
          | "next"/"successor") until you find the value (or don't).
         | 
          | The reason this algo can be used with no "leader" is that it
          | can also work by just going to a node and doing a linear scan
          | across all nodes. You don't need a finger table to speed up
          | queries.
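          | 
          | A rough sketch (Go, with the hashing simplified for
          | illustration; Chord proper uses SHA-1 IDs) of that
          | successor-walk fallback - finger tables only make finding the
          | responsible node faster, the answer is the same.
          | 
          |     package chordsketch
          | 
          |     import (
          |         "hash/fnv"
          |         "sort"
          |     )
          | 
          |     // id hashes a name onto the ring.
          |     func id(name string) uint32 {
          |         h := fnv.New32a()
          |         h.Write([]byte(name))
          |         return h.Sum32()
          |     }
          | 
          |     // successor returns the node responsible for key: the
          |     // first node whose ID is at or after the key's hash,
          |     // wrapping around the ring.
          |     func successor(nodes []string, key string) string {
          |         sort.Slice(nodes, func(i, j int) bool {
          |             return id(nodes[i]) < id(nodes[j])
          |         })
          |         k := id(key)
          |         for _, n := range nodes {
          |             if id(n) >= k {
          |                 return n
          |             }
          |         }
          |         return nodes[0] // wrapped around the ring
          |     }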
        
         | toast0 wrote:
         | Electing a leader and sending changes through the leader
         | simplifies the system and improves throughput and
         | predictability in the presence of contention.
         | 
         | If your transactions are on independent topics, you can
         | distribute the load by sharding leaders: assign ranges of the
         | key space to different leaders and manipulate the elections so
         | each node has a reasonable share of leadership.
         | 
         | You can go leaderless and structure each write as essentially
          | an election: broadcast the tentative transaction (or a request
         | to transact, if the transaction is large enough) to all nodes,
         | if you get a quorum of acceptance, you win and can commit the
         | transaction. But if multiple nodes attempt transactions near
         | the same time, consensus may be time consuming. If you have
         | many nodes, and they all have pending transactions for the same
         | topic, electing a leader and sending all transactions through
         | the leader is going to be a lot faster than establishing
         | consensus on every transaction individually.
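          | 
          | A bare-bones sketch (Go; peers are faked as functions, and
          | proposal numbering/retry is left out, which is exactly where
          | the contention cost above shows up) of that leaderless
          | write-as-election idea:
          | 
          |     package quorumsketch
          | 
          |     // peer is an abstract vote on a proposed transaction; in a
          |     // real system this would be an RPC that could also carry
          |     // back a competing proposal.
          |     type peer func(proposal string) bool
          | 
          |     // tryCommit broadcasts one proposal and commits only if a
          |     // majority (counting ourselves) accepts it. Competing
          |     // proposals can each fail to reach a majority and must
          |     // retry, so throughput suffers under contention.
          |     func tryCommit(peers []peer, proposal string) bool {
          |         acks := 1 // our own acceptance
          |         for _, vote := range peers {
          |             if vote(proposal) {
          |                 acks++
          |             }
          |         }
          |         return acks > (len(peers)+1)/2
          |     }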
        
           | lostcolony wrote:
           | Or more succinctly - a quorum agreeing to a leader to
           | serialize writes through is both less work than trying to get
           | a quorum to agree on every individual write, and equally as
           | consistent.
        
       | htowerad3242 wrote:
       | It's a shame many undergraduate CS curricula are allergic to
       | distributed systems and type systems.
       | 
       | Even the grad program I was looking at is hot dog water.
       | 
       | I've been playing with raft and paxos. Employers will not care as
       | these were learned out-of-band from degree mills.
        
       | badcarbine wrote:
       | ELI5
        
         | quest88 wrote:
         | You need to at least try.
        
           | badcarbine wrote:
           | You need to, too.
        
         | cweagans wrote:
         | http://thesecretlivesofdata.com/raft/
        
         | mrkeen wrote:
         | * 1 computer will eventually break (or one service will crash,
         | or be replaced). To keep processing requests, you need more
         | than 1 computer.
         | 
         | * Two-or-more computers can be fed conflicting information.
         | 
         | * Consensus algorithms prevent such conflicts from being
         | accepted by the system.
         | 
         | * Raft is one such algorithm. It operates by choosing a leader
         | and then routing all requests via that leader, so that no
         | observers will see a different sequence of requests.
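          | 
          | A toy sketch of the "choosing a leader" step in Go (terms and
          | the log checks of real Raft are omitted or faked; the timeout
          | range is the one suggested in the paper):
          | 
          |     package electionsketch
          | 
          |     import (
          |         "math/rand"
          |         "time"
          |     )
          | 
          |     // requestVote asks one peer for its vote in the given
          |     // term; real Raft also checks whose log is more up to date.
          |     type requestVote func(term int) bool
          | 
          |     // runElection waits a randomized timeout (so nodes rarely
          |     // start elections at the same moment), bumps the term, asks
          |     // everyone for a vote, and wins only with a majority.
          |     func runElection(term int, peers []requestVote) (int, bool) {
          |         timeout := 150 + rand.Intn(150) // ms, per the Raft paper
          |         time.Sleep(time.Duration(timeout) * time.Millisecond)
          | 
          |         newTerm := term + 1
          |         votes := 1 // vote for ourselves
          |         for _, ask := range peers {
          |             if ask(newTerm) {
          |                 votes++
          |             }
          |         }
          |         return newTerm, votes > (len(peers)+1)/2
          |     }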
        
       | lucb1e wrote:
       | If anyone else doesn't understand what the visualisation is
       | supposed to show, note that you can click on one of the nodes and
       | make them fail. Particularly try this with the current "leader"
       | (the thing that's sending and receiving all the packets). Press
       | the little pause icon next to the first slider to turn it back
       | into a clock and resume the simulation.
       | 
        | Has someone else figured out what the spreadsheet on the right
        | is? It looks broken to me (though I also thought that of the rest
        | of the simulation before understanding that it only shows the
        | happy flow by default), as it always remains empty. The clickable
       | elements I discovered so far are the two sliders, the clock/pause
       | icon, and the individual servers.
        
         | sethev wrote:
         | It shows the state of each replica's log. Click on the leader
         | and select 'request' to simulate sending a command. If you take
         | a replica offline, you can see that it falls behind and then
         | gets caught up when it comes back.
        
           | lucb1e wrote:
           | Ah, thanks, though I pushed that button a couple times (also
           | before your comment) but it didn't/doesn't do anything that I
           | can tell.
        
       | _daor wrote:
       | [flagged]
        
         | capableweb wrote:
         | > ??? That isn't easy to understand
         | 
         | Distributed systems aren't simple, but compared to Paxos, Raft
          | is definitely on the simpler side. I think they also have a
          | specific audience in mind (distributed systems engineers)
          | rather than the general public when they write "it's easy to
          | understand", which makes sense.
        
       | 63 wrote:
       | > Raft is a consensus algorithm that is designed to be easy to
       | understand.
       | 
       | > Consensus typically arises in the context of replicated state
       | machines, a general approach to building fault-tolerant systems.
       | 
       | I recognize that I'm not the intended audience but I do think I
       | would be a lot more capable of understanding this article if it
       | used less jargon or at least defined what it meant. I'm only
       | mentioning this because ease of understanding is an explicit
       | goal.
       | 
        | Can someone give a real-world example of where this would be used
        | in a production app? I'm sure it's very practical, but I'm
        | getting caught up in trying to understand what it's saying.
        
         | galenmarchetti wrote:
          | Consensus algorithms underlie distributed databases like
          | Cassandra, or distributed caches like Redis (in fact, Redis
          | does use a version of Raft).
         | 
          | Simple use case: you're running a database across three
         | servers, so that users can ping any one of the servers and it
         | still works (fault tolerance).
         | 
         | User X pings server A to write something to a table. User Y
         | pings server B to read from that table.
         | 
         | How do you make sure that User Y reads what User X wrote, so
         | that User X and User Y are on the same page with what happened?
         | 
         | That's the consensus algo
        
           | couchand wrote:
           | Not to be too pedantic, but the step of actually shipping
           | those changes across servers is usually outside the scope of
           | consensus algorithms. Generally they are limited to picking a
           | single server to act as a short-term leader. That server then
           | is responsible for managing the data itself.
           | 
           | Though you can conceive of a system where all data flows
           | through the consensus algorithm, practically speaking that
           | would introduce significant overhead at a granularity where
           | it isn't adding value.
           | 
            | There isn't necessarily one dictator for the whole cluster,
           | but rather usually they are scoped to each domain that
           | requires serialization.
        
         | adunk wrote:
         | Wikipedia has a nice list of software using the algorithm:
         | https://en.wikipedia.org/wiki/Raft_(algorithm)#Production_us...
        
           | oriettaxx wrote:
            | Apparently there is no reference to Docker in swarm mode,
            | where it is used to decide which node should be the leader.
        
         | dcchuck wrote:
          | Simplest terms - your app needs several different services or
          | instances of a service to agree upon a value. There are a lot
          | of reasons you can't use things like the system clock to agree
          | upon when something happened, for example - this is where Raft
          | steps in.
         | 
         | You'll see "fault-tolerant" and "replicated state machines"
         | often alongside them. Let's break those down in this context.
         | 
         | For "fault-tolerance" - think production environments where I
         | need to plan for hardware failure. If one of my services goes
         | down, I want to be able to continue operating - so we run a few
         | copies of the app, and when one goes down, another operating
         | instance will step up.
         | 
         | In that case - how do we pick what's in charge? How do all
         | copies agree on things while everything is working smoothly?
         | Raft.
         | 
         | For "replicated state machines" - let's stay in this world of
         | fault-tolerance, where we have multiple instances of our app
          | running. In each service could reside a state machine. The
          | state machine promises to take any ordered series of events
          | and always arrive at the same value. Meaning - if all of our
          | instances get the same events in the same order, they will all
          | have the same state. Determinism.
         | 
         | This is where it all comes together, and why I think the jargon
         | becomes tightly coupled to an "easy to understand" definition.
         | 
         | You will reach for replicated state machines when you need
         | deterministic state across multiple service instances. But the
         | replicated state machines need a way to agree on order and
         | messages received. That's the contract - if you give everything
         | all the messages in the same order, everything will be in the
         | same state.
         | 
         | But how do we agree on order? How do we agree on what messages
          | were actually received? Just because "Client A" sends messages
          | "1" and "2" in a specific order does not guarantee they are
          | delivered at all, let alone in that order.
         | 
         | Raft creates "consensus" around these values. It allows the
         | copies to settle on which messages were actually received and
         | when.
         | 
         | So, you could use other approaches to manage "all your service
         | copies getting along" but a replicated state machine is a nice
         | approach. That replicated state machine architecture needs some
         | way to agree on order, and Raft is a great choice for that.
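          | 
          | The "replicated state machine" half of that contract is tiny; a
          | minimal sketch in Go (just a map of keys to values, purely for
          | illustration):
          | 
          |     package rsmsketch
          | 
          |     // entry is one agreed-upon command in the replicated log.
          |     type entry struct {
          |         key, value string
          |     }
          | 
          |     // apply replays a log against an empty state machine. Any
          |     // replica that feeds the same entries through in the same
          |     // order ends up with an identical map - that determinism is
          |     // the state machine's side of the bargain; agreeing on the
          |     // log itself is Raft's.
          |     func apply(log []entry) map[string]string {
          |         state := make(map[string]string)
          |         for _, e := range log {
          |             state[e.key] = e.value
          |         }
          |         return state
          |     }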
        
         | bjornasm wrote:
          | I made a similar comment. If it is meant for a wider audience,
          | you would think it would be beneficial to explain it in a way
          | that the wider audience understands.
        
         | uw_rob wrote:
         | Think databases which run across many different machines.
         | 
         | Distributed databases are often conceptually modeled as a state
         | machine. Writes are then mutations on the state machine.
         | 
          | With a starting state (an empty database), if everyone agrees
          | on a fixed list of mutations executed in a rigidly defined
          | ordering, you will get the same final state.
         | 
         | Which makes sense, right? If you run the following commands on
         | an empty database, you would expect the same final state:
         | 
          |   1. CREATE TABLE FOO (Columns = A, B)
          |   2. INSERT INTO FOO (1, 2)
          |   3. INSERT INTO FOO (3, 4)
         | 
         | Which would be:
         | 
          |   FOO:
          |     A  B
          |     -  -
          |     1  2
          |     3  4
         | 
         | So where does "consensus" come into play? Consensus is needed
          | to determine mutation 4. If the user sends a request to HOST 1
          | saying "mutation 4 should be `INSERT INTO FOO (5, 6)`", then
          | HOST 1 will need to coordinate with all of the other hosts, and
          | hopefully all of the hosts can agree that this is the 4th
          | mutation and then enact that change on their local replica.
         | 
         | This ordering of mutations is called the transaction log.
         | 
         | So, why is this such a hard problem? Most of the reasons are in
         | [Fallacies of distributed computing](https://en.wikipedia.org/w
         | iki/Fallacies_of_distributed_compu...) but the tl;dr is that
         | everything in distributed computing is hard because hardware is
         | unreliable and anything that can go wrong will go wrong. Also,
         | because multiple things can be happening at the same time in
         | multiple places so it's hard to figure out who came first, etc.
         | 
          | Raft is one such algorithm: it lets all of these hosts
          | coordinate in a fault-tolerant way to figure out the 4th
          | mutation.
         | 
          | Disclaimer: the above is just one use of Raft. Another way Raft
          | is used in distributed databases is as a mechanism for the
          | hosts to coordinate a hierarchy of communication among
          | themselves, and when a host in the hierarchy is having
          | problems, Raft can be used again to figure out another
          | hierarchy. (Think: consensus is reached on leader election to
          | improve throughput.)
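          | 
          | A toy sketch (Go, heavily simplified) of the check a host might
          | perform before accepting "mutation 4": it appends only if it
          | already holds exactly the three earlier mutations; otherwise
          | the sender has to back up and resend what is missing. Raft's
          | AppendEntries RPC does a stricter version of this, also
          | comparing terms at the previous index.
          | 
          |     package logsketch
          | 
          |     // appendAt accepts a mutation only at the next expected
          |     // 1-based index, rejecting gaps and duplicates alike.
          |     func appendAt(
          |         log []string, index int, mutation string,
          |     ) ([]string, bool) {
          |         if index != len(log)+1 {
          |             // missing earlier mutations, or already applied
          |             return log, false
          |         }
          |         return append(log, mutation), true
          |     }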
        
       ___________________________________________________________________
       (page generated 2023-09-03 23:00 UTC)