[HN Gopher] Raft: Understandable Distributed Consensus (2014)
___________________________________________________________________
Raft: Understandable Distributed Consensus (2014)
Author : Hrun0
Score : 218 points
Date : 2024-09-27 12:54 UTC (1 days ago)
(HTM) web link (thesecretlivesofdata.com)
(TXT) w3m dump (thesecretlivesofdata.com)
| ko_pivot wrote:
| The writing and the visualizations are great. The 'continue'
| button is way too frequent.
| MarkMarine wrote:
| This is one of my favorite pieces of software engineering because
| it took something difficult and tried to design something easy to
| understand as a main criterion for success. The PhD thesis has a
| lot more info about this if anyone is curious; it is approachable
| and easy to read:
|
| https://web.stanford.edu/~ouster/cgi-bin/papers/OngaroPhD.pd...
|
| I think this was core to Raft's success, and I strive to create
| systems like this with understandability as a first goal.
| throwawaymaths wrote:
| Weirdly it's also kinda a case of worse-is-better: raft is
| non-deterministic and has an unbounded election cycle time.
| IIRC:
|
| - it assumes no hysteresis in network latencies, and if there is
| hysteresis it's possible that elections can be deterministically
| infinite.
|
| - this fact and the use of raft in production has caused real,
| large scale network outages.
|
| Paxos is of course a beast and hard to understand. There is an
| alternative, VSR (which was developed around the same time as
| Paxos), which is easy to understand and does not have the issues
| caused by election nondeterminism in raft.
|
| Of course everyone uses raft so raft dominates.
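|
| (That nondeterminism comes from the randomized election timeouts;
| roughly, in Go, with made-up names and the paper's suggested
| 150-300ms range:)
|
|     package main
|
|     import (
|         "math/rand"
|         "time"
|     )
|
|     // Hypothetical follower with just enough state for the sketch.
|     type node struct {
|         heartbeatCh chan struct{} // signalled on AppendEntries from a leader
|     }
|
|     // Each cycle picks a fresh random timeout in the 150-300ms range.
|     func electionTimeout() time.Duration {
|         return 150*time.Millisecond +
|             time.Duration(rand.Int63n(int64(150*time.Millisecond)))
|     }
|
|     // If no heartbeat arrives before the timer fires, the follower
|     // stands for election. Two nodes timing out together can split
|     // the vote and force another randomized round, so there is no
|     // hard upper bound on how long an election takes.
|     func (n *node) runFollower() {
|         for {
|             timer := time.NewTimer(electionTimeout())
|             select {
|             case <-n.heartbeatCh:
|                 timer.Stop()
|             case <-timer.C:
|                 return // becomeCandidate() in a real implementation
|             }
|         }
|     }
|
|     func main() {}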
| eatonphil wrote:
| > - this fact and the use of raft in production has caused
| real, large scale network outages.
|
| While this has surely happened, I am not so confident about
| what the reasons were for this. If you've got links on
| details I'd love to read.
|
| > which is easy to understand
|
| I've implemented core bits of Raft twice now and have looked
| at VSR a couple of times and VSR wasn't easier for me to
| understand. I'm sure I could implement VSR and would like to
| some day, but just comparing the papers alone I personally
| felt like Raft was better presented (i.e. easier to
| understand).
|
| Also keep in mind that nobody ships consensus implementations
| exactly in line with the original paper. There are dozens or
| hundreds of papers on variations and extensions of Raft/Paxos
| and every actual implementation is going to implement some
| selection of these extensions/variations. You have to look at
| each implementation carefully to know how it diverges from
| the original paper.
| throwawaymaths wrote:
| > A known limitation of the base Raft protocol is that
| partial/asymmetric network partitions can cause a loss of
| liveness [27, 32]. For instance, if a leader can no longer
| make progress because it cannot receive messages from the
| other nodes, it continues to send AE heartbeats to
| followers, preventing them from timing out and from
| electing a new leader who can make progress.
|
| (Howard, Abram et al)
|
| Me: note this can also occur if there isn't a complete
| outage, if the latency back to the shit leader is different
| from the latency out of the shit leader.
|
| > nobody ships consensus implementations exactly in line
| with the original paper. There are dozens or hundreds of
| papers on variations
|
| As the paper above explains, once you add extensions you
| might have broken the correctness proofs in raft. More to
| the original point, you're now in a state where it's no
| longer "simple"... I would go so far as to say if you
| _have_ to consider the extensions, which are distributed
| over several papers and sometimes not even papers at all,
| you're in "deceptively simple" land.
|
| As a pedagogical tool, raft is valuable because it can be a
| launching ground for conversations like these... But maybe
| we shouldn't use it in prod when there are better,
| straightforward options? I get the feeling that being hard-
| sold as simple nerd-sniped devs into writing it, someone
| r/very smart put it into prod, and with social proof more
| people did, and now here we are.
| lifeinthevoid wrote:
| > A known limitation of the base Raft protocol is that
| partial/asymmetric network partitions can cause a loss of
| liveness [27, 32]. For instance, if a leader can no
| longer make progress because it cannot receive messages
| from the other nodes, it continues to send AE heartbeats
| to followers, preventing them from timing out and from
| electing a new leader who can make progress.
|
| Real-world raft implementations make the leader step down
| if it hasn't heard from a quorum for a while. Not part of
| vanilla raft though.
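|
| Roughly (a minimal Go sketch with made-up names, assuming the
| leader records when each follower last acknowledged an
| AppendEntries):
|
|     package main
|
|     import "time"
|
|     type leader struct {
|         lastAck         map[string]time.Time // follower -> last ack
|         electionTimeout time.Duration
|     }
|
|     // Called periodically. If the leader (counting itself) cannot
|     // reach a majority within an election timeout, it steps down
|     // rather than keep sending heartbeats that suppress an election
|     // it can no longer win.
|     func (l *leader) shouldStepDown(now time.Time) bool {
|         reachable := 1 // the leader itself
|         for _, t := range l.lastAck {
|             if now.Sub(t) < l.electionTimeout {
|                 reachable++
|             }
|         }
|         clusterSize := len(l.lastAck) + 1
|         return reachable <= clusterSize/2 // no majority reachable
|     }
|
|     func main() {}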
| eatonphil wrote:
| The thesis does describe doing this fwiw while the paper
| does not.
| convolvatron wrote:
| don't forget about CALM. It's a shame that that isn't the
| default we reach for, and that we only struggle when we really
| need stronger consistency.
|
| https://arxiv.org/abs/1901.01930
| prydt wrote:
| I think the CALM theorem and this whole line of research are
| so interesting, and it is still carried on by the CRDT
| people. But I would love to see more of this.
|
| I feel like it doesn't get as much attention as it
| deserves.
| kfrzcode wrote:
| I've made a longer comment in another thread; but have you
| investigated the hashgraph algorithm? Gossip-about-gossip and
| virtual voting combine to result in leaderless consensus,
| fair ordering and aBFT. It's very performant with over 10k
| TPS on-chain. I'm learning about DLT from the perspective of
| hashgraph which is why I don't understand why it doesn't get
| love - it seems to have all of the good and none of the bad.
| MarkMarine wrote:
| > this fact and the use of raft in production has caused
| real, large scale network outages.
|
| Paxos as well: I remember a full GCP cloud outage that had
| something to do with Paxos, and I can't find the data on it,
| but I thought there was a nasty bug in ZooKeeper's Paxos
| implementation.
|
| That isn't to say any of these are perfect or bug-free; they're
| made by humans and we're going to make mistakes. But my
| experience implementing both was that I ended up with a working
| raft implementation, while paxos baked my brain until I gave up.
|
| I think everyone uses raft _because_ it was possible for a
| working dev to implement, so there are a number of
| implementations, and it's easier to understand the phases the
| application is in.
|
| I'll check out VSR, I appreciate the rec.
| lanstin wrote:
| My zookeeper outages are always due to simple things like
| all the workers having the same basic image and then having
| increased write rate filling up the disk of all workers at
| the same time.
| rapsey wrote:
| While understandable, implementing it is nevertheless far from easy.
| MarkMarine wrote:
| Right, but Paxos is doubly hard in comparison. I've read both
| papers multiple times, tried to implement and failed, and I
| still don't think I understand Paxos.
| candiddevmike wrote:
| IMO, Paxos has a lot fewer edge cases than Raft, mostly
| because the complexity of the implementation covers them /
| forces you to think about how to handle them.
| hinkley wrote:
| How many people actually understand Paxos?
|
| The assertion at the time was that only a few people
| understood it well enough to make a correct implementation,
| the others were full of bugs.
|
| The problem we have with Lamport is that he's very good at
| talking to computers but not so good at talking to humans.
| I think the world would be a better place today if someone
| had forced him to learn to speak human.
|
| He did a presentation at MS just after he won the Turing
| Award. He prefaces it with how important writing is to
| thinking, and how you don't really know what you think
| until you write it down. Those are the words and thinking
| of an introvert. Writing is still the shallow end of
| understanding. The deep end is teaching. If you understand
| something and you teach it to others, then you have proven
| that you understand it, and caused that understanding not
| to be lost to posterity. If you only write about it, it
| might work as instructional material, or it may require
| some very clever people who can teach themselves using your
| words. But they may also get it wrong, and not have you for
| feedback.
|
| The latter is where we seem to be with Leslie's works. The
| consensus is that few people actually understand what he's
| talking about well enough to implement it correctly.
|
| I had a coworker once who was shocked to learn I read the
| ACM SIGPLAN proceedings. "You can read those??" I knew what
| he meant and yeah, a lot of those were very unapproachable
| and I understood two thirds of them and only half of each
| of the rest. Before I committed to using Raft I gave
| Lamport's paper a try. It was a slog and he doesn't sell
| the why of each part. He's just giving you a very, very
| long recipe without the mental models necessary to
| reproduce it robustly.
| mrkeen wrote:
| I'm on the other side.
|
| I think Leslie Lamport asserted that Paxos is minimal, and
| that "all other consensus algorithms are just Paxos with more
| steps". I'm inclined to believe him.
|
| I've implemented Paxos but I can't get through "Raft for
| dummies" style blog posts.
|
| Regarding Raft [1]:
|
| > The consensus problem is divided into three sub-problems:
| Leader election, Replication and Safety.
|
| What is leader election? It's a distributed system coming to
| consensus on a fact (i.e. who the leader is.) Then once you
| have the leader, you do additional steps. _The entirety_ of
| Paxos is a distributed system coming to consensus on a fact.
|
| When I read these posts, I see things like "timeout",
| "heartbeat", and I think: timeout according to whom? I read
| "once the leader has been elected", um, hang on, according to
| whom? Has node 1 finally agreed on the leader, just while
| node 3 has given up and started another election? I don't
| doubt that Raft is correct, but the writing about it seems
| simple by glossing over details.
|
| Paxos, on the other hand, seems timeless. (And the writing
| about it doesn't trigger my "distributed system fallacies"
| reaction)
|
| [1] https://www.brianstorti.com/raft/
| spmurrayzzz wrote:
| I tend to agree that many explanations of raft don't get
| into the useful details and handwave some of the hard
| problems. But the original paper does do a good job of this
| and is pretty accessible to read, IMO.
|
| > I read "once the leader has been elected", um, hangon,
| according to whom? Has node 1 finally agreed on the leader,
| just while node 3 has given up and started another
| election?
|
| The simple response I think to "according to whom" is "the
| majority of voting nodes". When the leader assumes its
| role, it sends heartbeats which are then accepted by the
| other nodes in the cluster. Even if (in your example) node
| 3 starts a new election, it will only succeed if it can get
| a majority of votes. If node 2 has already acknowledged a
| leader, it won't vote for node 3 in the same term.
|
| There are some implicit concessions inherent there around
| eventual consistency, but I don't think that's novel to Raft
| compared to other distributed consensus protocols.
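|
| For reference, the RequestVote rule from the paper, as a rough
| Go sketch (names made up): at most one vote per term, and only
| for a candidate whose log is at least as up to date.
|
|     package main
|
|     type voteRequest struct {
|         term         int
|         candidateID  string
|         lastLogIndex int
|         lastLogTerm  int
|     }
|
|     type node struct {
|         currentTerm  int
|         votedFor     string // "" = no vote granted in this term
|         lastLogIndex int
|         lastLogTerm  int
|     }
|
|     func (n *node) handleRequestVote(req voteRequest) bool {
|         if req.term < n.currentTerm {
|             return false // stale candidate
|         }
|         if req.term > n.currentTerm {
|             n.currentTerm = req.term // newer term: forget our old vote
|             n.votedFor = ""
|         }
|         if n.votedFor != "" && n.votedFor != req.candidateID {
|             return false // already voted for someone else this term
|         }
|         // Candidate's log must be at least as up to date as ours.
|         upToDate := req.lastLogTerm > n.lastLogTerm ||
|             (req.lastLogTerm == n.lastLogTerm &&
|                 req.lastLogIndex >= n.lastLogIndex)
|         if !upToDate {
|             return false
|         }
|         n.votedFor = req.candidateID
|         return true
|     }
|
|     func main() {}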
| withinboredom wrote:
| > The simple response I think to "according to whom" is
| "the majority of voting nodes".
|
| Reminds me of this one time we had a Raft cluster arguing
| over who was the leader for 20 minutes in production.
| Raft leader election is non-deterministic, while Paxos is
| deterministic. It can 'randomly' get into a situation it
| cannot resolve for quite a long time.
| spmurrayzzz wrote:
| > Reminds me of this one time we had a Raft cluster
| arguing over who was the leader for 20 minutes in
| production
|
| That's certainly an interesting failure mode. Do you
| recall the details around root cause? I could imagine
| ephemeral network partitions (flapping interfaces?
| peering loss?) causing something like this for sure.
|
| In my own experience, I've been running services that use
| Raft under the hood for the last ~10 years in production
| and haven't seen this happen myself. Though I do
| absolutely remember having misconfigured election
| timeouts causing very painful latency issues in failover
| scenarios.
| ivankelly wrote:
| 100% agree. I haven't read the raft paper in years, but I
| remember thinking there's just too much stuff in there. That
| stuff is important, but if you want people to understand what's
| happening, they need to internalize the fundamental idea of
| being able to block other writers by bumping a number, which is
| all covered in the single-decree Paxos section of The Part-Time
| Parliament.
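|
| That "bump a number" idea, as a rough Go sketch (names made up)
| of the acceptor side of the prepare phase: a proposer that
| collects promises for a higher ballot from a majority blocks any
| writer still working with a lower one.
|
|     package main
|
|     type acceptor struct {
|         promisedBallot int    // highest ballot promised so far
|         acceptedBallot int    // ballot of last accepted value (0 = none)
|         acceptedValue  string
|     }
|
|     type promise struct {
|         ok             bool
|         acceptedBallot int
|         acceptedValue  string
|     }
|
|     func (a *acceptor) prepare(ballot int) promise {
|         if ballot <= a.promisedBallot {
|             return promise{ok: false} // a higher number already bumped us
|         }
|         a.promisedBallot = ballot
|         // Report any previously accepted value so the new proposer
|         // must carry it forward instead of proposing its own.
|         return promise{
|             ok:             true,
|             acceptedBallot: a.acceptedBallot,
|             acceptedValue:  a.acceptedValue,
|         }
|     }
|
|     func main() {}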
| kfrzcode wrote:
| Paxos is nice, sure, but Hedera does DLT _with_ aBFT and
| much more efficiently, as well as being faster and ensuring
| fairness. It's leaderless, and achieves incredible TPS
| (10k+ in practice, 100k+ in theory).
|
| I am curious on your thoughts here.
| Vervious wrote:
| Raft has a problem where, in the protocol description,
| sometimes I have no idea why some line is there, but if you
| take it out the protocol comes to a grinding halt... it's
| really a mumble jumble. It has good intuition, but the
| details are really messy, it's edge cases all the way down.
|
| Whereas I think each line of pseudocode in Paxos is much more
| motivated.
|
| In other words, if a philosopher had to design a crash-fault
| protocol from scratch, without having seen any before, I
| think 80% of the time it would look exactly like Paxos.
| jeffbee wrote:
| Students of raft say they understand it but when they have to
| implement it they make as many mistakes as students of paxos.
| The simplicity is misleading.
| ukd1 wrote:
| Having implemented it twice (for fun/learning, though) I think
| you're right - yet it's also the easiest, imho. The paper, and
| resources around implementing it (posts, other implementations)
| are great. Also, the authors still reply on github when issues
| are opened, which is great.
| jimbokun wrote:
| How do you write a testing framework/system for evaluating
| distributed algorithms like this?
|
| Doing that in a way that's easy to automate, set up, evaluate,
| and tear down various scenarios seems like the hardest part to
| me.
| skilning wrote:
| I was asked to click "continue" after each of the first two
| sentences, and the fade-in of the text took longer than reading
| the text.
|
| This may be a great article, but I'll never know because it's
| frustrating to try and read.
| benbjohnson wrote:
| Author here. I'm happy to answer any questions although this
| project was from 10+ years ago so I could be a little rusty.
|
| Over the years I've been trying to find better ways to do this
| kind of visualization but for other CS topics. Moving to video is
| the most realistic option but using something like After Effects
| takes A LOT of time and energy for long-form visualizations. It
| also doesn't produce a readable output file format that could be
| shared, diff'd, & tweaked.
|
| I spent some time on a project recently to build out an SVG-based
| video generation tool that can use a sidecar file for defining
| animations. It's still a work in progress but hopefully I can get
| it to a place where making this style of visualization isn't so
| time intensive.
| mgenglder wrote:
| This is wonderful. Can I ask how you created it? Stack used and
| source code? I'd love to create something like this to help
| visualize things I'm working with currently.
| benbjohnson wrote:
| It's all done with d3 and JavaScript. The visualizations
| aren't deterministic so I ended up writing a shitty Raft
| implementation in JS. Overall it was a terrible approach
| because it was so time consuming but I made it work. You can
| find all the source code in this repo:
| https://github.com/benbjohnson/thesecretlivesofdata
| kfrzcode wrote:
| What are your thoughts on Dr. Leemon Baird's Hedera Hashgraph?
|
| https://www.swirlds.com/downloads/SWIRLDS-TR-2016-01.pdf
| benbjohnson wrote:
| I haven't read that paper but it seems like it's fixing a
| different problem of Byzantine fault tolerance. Most
| consensus systems that are internal for an organization don't
| have the Byzantine issue so it simplifies the problem.
| huntaub wrote:
| I just want you to know how much this visualization was
| appreciated. In my time working at AWS, I recommended this
| website to every one of our new hires to learn how distributed
| consensus works. Know that this has taught probably 50+ people.
| Thank you for what you've built.
| benbjohnson wrote:
| Thanks so much for letting me know! It's always hard to tell
| when I put something out there if it just gets lost in the
| ether. I'm glad to hear it helped so many folks.
| eatonphil wrote:
| Ben's visualization here is great.
|
| The other biggest help to me aside from the paper and the thesis
| was Ongaro's TLA+ spec:
| https://github.com/ongardie/raft.tla/blob/master/raft.tla. It's
| the only super concise "implementation" I found that is free of
| production-grade tricks, optimizations, and abstractions.
|
| And for building an intuition, TigerBeetle's sim.tigerbeetle.com
| is great. What happens to consensus when there's high latency to
| disk or network? Or as processes crash more frequently? It
| demonstrates.
| pa7ch wrote:
| Interesting that TigerBeetle uses Viewstamped Replication over
| Paxos/Raft. TB says Viewstamped Replication lends itself to a
| more performant implementation and doesn't rely on disk storage
| as much. I'm surprised I'm not seeing this brought up more in
| Paxos/Raft discussions.
| butterisgood wrote:
| VR is often overlooked and really pretty easily understood
| _russross wrote:
| I am in the minority who thinks Raft is overrated.
|
| I tried teaching Raft one year instead of Paxos but ended up
| switching back. While it was much easier to understand how to
| implement Raft, I think my students gained deeper insight when
| focusing on single-decision Paxos. There is a lightbulb moment
| when they first understand that consensus is a property of the
| system that happens first (and they can point at the moment it
| happens) and then the nodes discover that it has been achieved
| later. Exploring various failure modes and coming to understand
| how Paxos is robust against them seems to work better in this
| setting as well.
|
| I think this paper by Heidi Howard and Richard Mortier is a great
| way to move on to Multipaxos:
|
| https://arxiv.org/abs/2004.05074
|
| They present Multipaxos in a similar style to how Raft is laid
| out and show that Multipaxos as it is commonly implemented and
| Raft are almost the same protocol.
|
| Raft was a great contribution to the engineering community to
| make implementing consensus more approachable, but in the end I
| don't think the protocol itself is actually more understandable.
| It was presented better for implementers, but the implementation
| focus obscures some of the deep insights that plain Paxos
| exposes.
| withinboredom wrote:
| Know that I join you in thinking Raft is overrated. I'm working
| on a multipaxos implementation right now. There are some really
| neat capabilities/properties that Paxos has that Raft can never
| achieve (see WPaxos, for example, which lets keys migrate to
| nodes near the client).
| weinzierl wrote:
| This is an interesting insight into the educational side, but
| now I am curious about the implementation side. Raft is easier
| to implement, but that's just one factor. Looking at real-world
| usage there seems to be a draw: I could easily count as many
| Paxos implementations as Raft. Is this just historical or are
| there good reasons for a new project to still implement Paxos?
| bfdes wrote:
| Paxos and Multi-Paxos have been around much longer than Raft.
| The paper that introduced Raft was published in 2014.
| alexchamberlain wrote:
| I've read both the Paxos and Raft papers a few times, and
| hacked on some implementations, but never quite got one over
| the line to working...
|
| Raft strikes me as a particular set of decisions made within a
| Paxos framework, such as having one entity play the roles of
| Proposer, Acceptor, and Follower. It's frustrating that there
| isn't a clearly written de facto paper on Paxos - the story
| style confused the monkeys out of me.
| jimbokun wrote:
| > but never quite got one over the line to working...
|
| I've never implemented something like this. But my first
| thought is "how do you implement the testing system?"
|
| I feel like once you had a robust testing system that can
| verify things work correctly in all the different network
| partition and other scenarios, and allowing rapid iteration
| of setting up those scenarios, the implementation would be
| comparatively easy.
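|
| One common shape for that is a simulated network the test
| controls, so partitions, drops, and reordering are just code; a
| rough Go sketch with made-up names (seed the RNG to make runs
| repeatable):
|
|     package main
|
|     import "math/rand"
|
|     type message struct {
|         from, to string
|         payload  string
|     }
|
|     type simNet struct {
|         rng       *rand.Rand
|         queue     []message
|         partition map[string]bool // node ID -> currently cut off
|         dropRate  float64
|     }
|
|     func (s *simNet) send(m message) {
|         if s.partition[m.from] || s.partition[m.to] {
|             return // link cut by a partition
|         }
|         if s.rng.Float64() < s.dropRate {
|             return // random packet loss
|         }
|         s.queue = append(s.queue, m)
|     }
|
|     // step delivers one pending message, chosen at random to model
|     // reordering; the test loop calls this until the scenario
|     // settles, then asserts on the nodes' state.
|     func (s *simNet) step() (message, bool) {
|         if len(s.queue) == 0 {
|             return message{}, false
|         }
|         i := s.rng.Intn(len(s.queue))
|         m := s.queue[i]
|         s.queue = append(s.queue[:i], s.queue[i+1:]...)
|         return m, true
|     }
|
|     func main() {}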
| alexchamberlain wrote:
| Yeah, you kind of can't test any of it until you test all
| of it...
| heromal wrote:
| Check out maelstrom
| senderista wrote:
| Agree, Raft is less modular and therefore harder to understand
| than MultiPaxos:
|
| https://maheshba.bitbucket.io/blog/2021/12/14/Modularity.htm...
| alexnewman wrote:
| You are telling me your students can safely implement single-
| decree Paxos... I worked on a few Paxos production
| implementations before the Raft paper, single and multi decree.
| The idea that Paxos, as it is to be implemented, is easy for
| students to understand... Well, let me re-read the paper, but I
| assure you, Raft was a big deal.
| hintymad wrote:
| Is there any paper/handout/video that explains Paxos in depth,
| especially its implementations and intuitions? Paxos Made
| Simple gave intuitive explanations, but I feel it still misses
| a lot of intricate details if I were to build Paxos for
| production use.
| mananaysiempre wrote:
| The papers and talks[1] by Heidi Howard and Richard Mortier,
| in particular "Paxos vs Raft: Have we reached consensus on
| distributed consensus?"[2,3] and "Flexible Paxos"[4,5], are
| what finally made things click for me. A real implementation
| also needs other stuff[6], though, such as dynamic membership
| and state machine replication, which I still don't know how
| to do.
|
| [1] https://fpaxos.github.io/
|
| [2] https://dl.acm.org/doi/abs/10.1145/3380787.3393681 or
| https://arxiv.org/abs/2004.05074
|
| [3] https://www.youtube.com/watch?v=0K6kt39wyH0
|
| [4] https://doi.org/10.4230/LIPIcs.OPODIS.2016.25 or
| https://arxiv.org/abs/1608.06696
|
| [5] https://www.youtube.com/watch?v=r6NG_1HM0lA
|
| [6] https://github.com/heidihoward/distributed-consensus-
| reading...
| nvarsj wrote:
| The best approach is to implement Paxos.
|
| I suggest https://github.com/emichael/dslabs.
|
| It will take 100-200 hours to fully implement with all tests
| passing.
| withinboredom wrote:
| Bummer that it requires java. Would be awesome to have a
| 'networked' version where you just need to implement the
| protocol.
| sriram_malhar wrote:
| I have had the opposite trajectory. Used to teach Paxos, but
| was so relieved to switch to Raft when the paper came out.
|
| I discuss distributed data structures in the context of maps
| and sequences.
|
| For maps, I discuss key-value stores (NUMA, Redis). I have them
| implement cache coherence (MESI protocol, TARDIS 2.0), then
| linearizable, fault-tolerant, wait-free shared memory registers
| (the Attiya/Bar-Noy/Dolev algorithm[1]).
|
| For sequences, I cover shared logs and state machine
| replication, including database log shipping, Kafka, queues and
| Raft.
|
| I like Raft because it cuts down the design space by making
| certain very intuitive and pragmatic choices, like using
| timeouts (which are almost beneath Lamport to discuss :), or
| idioms like "follow the leader", "if the leader is unreachable,
| stand for election", "elect the latest & most informed leader",
| (how I wish that was true in real life!), "always append" etc.
| There are simple mechanisms to preserve invariants.
|
| The problem with Paxos is that there is such a large range of
| papers that there is no one paper that makes the leap in easily
| digestible chunks from Basic to MultiPaxos. When I got students
| to implement MultiPaxos, I never could get sufficient
| confidence that it was done right (esp. the "disorderly"
| filling in of log slots).
|
| Paxos is like Monads; when you get it, you feel compelled to
| write a "Paxos explained" paper :)
|
| [1]https://groups.csail.mit.edu/tds/papers/Attiya/PODC90.pdf
| withinboredom wrote:
| I feel like you are doing your students a disservice.
| (multi-)Paxos, while complex to wrap your head around,
| enables far more modes of consensus. The possibilities and
| papers out there are amazing.
|
| Raft essentially only allows a single mode. Moreover, you are
| starting to see people putting things on top of Raft instead
| of something like Paxos, in the enterprise, because they
| don't know any better nor have the foundation to understand
| what they are doing is "wrong."
|
| > When I got students to implement MultiPaxos, I never could
| get sufficient confidence that it was done right
|
| Testing this is fairly straightforward, they should be able
| to join an already existing cluster. If they got it wrong, it
| shouldn't take down the cluster, and they should be able to
| step through their own code. There aren't any timeouts, so
| they can take their time, going through each step of the
| process until a value is committed.
|
| At that point, you simply explain each step as an individual
| algorithm, not the sum of its parts. You can even build each
| part individually because an existing cluster should recover
| from a misbehaving peer.
|
| From there, it is a rather simple visualization process to
| see what is going on.
|
| The hard part of paxos is building it from scratch.
| sriram_malhar wrote:
| Can you tell more about what other interesting modes
| multiPaxos allows? For an example of what I find
| uninteresting, it is proposers or acceptors not having a
| local disk (I know there are some uses for it, but there
| are relatively straightforward ways of solving that issue
| without requiring a whole new protocol). In all examples I
| have seen, multipaxos and raft are fairly alike, except for
| parts of their leadership election as Howard and Mortier
| also describe.
|
| Raft's simplicity and the fact that there's exactly one way
| to do it is what I find most comforting in the most
| important component of a distributed system. I can look at
| an implementation of raft and immediately understand what's
| going on, and what is missing.
| withinboredom wrote:
| What makes multiPaxos a better learning tool isn't
| multiPaxos for the sake of multiPaxos, but rather Paxos
| itself. The write-once consensus primitive is a valuable
| tool (whether you are implementing Raft or multiPaxos) or
| thing to know. Especially when it comes to flexible /
| compartmentalized Paxos. Don't get me wrong, the same
| things can theoretically be done with Raft (if they
| haven't already), but that single-decree primitive makes
| it make that much more sense. This is missing from Raft.
|
| It's like learning about a byte and never learning about
| endianness because everything is pretty much little-
| endian these days.
| kfrzcode wrote:
| DLT technology discussions are entirely incomplete without
| consideration of Hedera Hashgraph [0], an aBFT, leaderless, fair
| and _fast_ DLT using a gossip-about-gossip consensus mechanism.
| It's absolutely a more robust and scalable technology than Paxos
| or any other DLT for that matter. I'd love to know what the HN
| crowd thinks about Hedera as the trust layer of the internet
| but... nobody around here seems to have any opinion. It's like
| ignoring Linux while comparing Mac and Windows based computing.
|
| [0]: https://www.swirlds.com/downloads/SWIRLDS-TR-2016-01.pdf
| ryanthemadone wrote:
| Paxos isn't a DLT, it's a consensus algorithm -- granted, DLTs
| tend to require a consensus algorithm, but they're not the same
| things.
|
| As for Hedera Hashgraph being the trust layer of the internet,
| we tend to build the internet through the IETF and standards
| setting. Unfortunately HH is an endeavour from a private
| company so isn't especially likely to be taken on in that
| context.
|
| I'd also wonder what you mean by the trust layer of the
| internet, what are the use cases that you'd like to see solved
| with such a trust layer?
| kfrzcode wrote:
| Great point, I wasn't being precise. My point stands,
| however! I must also correct your understanding re: "private
| company."
|
| The DLT in question - Hedera - is built on the unique
| consensus algorithm, the "hashgraph". The hashgraph algorithm
| combines a gossip-about-gossip protocol with virtual voting.
|
| In fact, services development is now in the hands of the
| largest open source foundation in the world. These
| implementations are entirely open source [0], and very
| recently the codebase has been donated in whole to the Linux
| Foundation.
|
| Furthermore, Hedera is a Pioneer member of the newly founded
| Linux Foundation Decentralized Trust organization [1] - along
| with Chainlink, Deloitte, Hitachi, many other major
| organizations, some of whom are also on the Hedera governing
| council [2]. This foundation will be a big player in the
| future of decentralized web, and Hedera is the only L1 that I
| know of which is both primed for this future and actually
| scalable.
|
| I understand the IETF and standards approach; and am not
| aware of a current draft or intention for such a draft. The
| idea of Hedera being the 'trust layer' of the internet is
| more about use cases like decentralized recovery, process
| validation, carbon offsets, extremely granular supply chain
| auditing, and any other application you might imagine that
| would benefit from having extremely fast (10k+ TPS), 100%
| guaranteed aBFT consensus on-chain. I'd love to hear what you
| might think up or where this could be particularly useful.
| Strong governance with 39 council members - including Google,
| IBM, Boeing, Tata, AP+, Hitachi and more... with
| decentralized network operation and stable fees (essential
| for enterprise application).
|
| > Additional use cases include, but are not limited to,
| financial markets, matching engines (like those used in Uber
| or AirBnb), or supply chain negotiations (e.g., several
| competing factories bidding on parts from several competing
| parts suppliers).
|
| So, admittedly the use cases maybe aren't evident or
| interesting to you and I right now at the TCP/IP layer, but I
| can certainly say there are a plethora of trust-based
| problems that could be solved with consensus only needing a
| few thousand ms. Think digital identity, healthcare,
| financial markets, IoT, supply chain, real-world asset
| tokenization... For any real-world, scaled application, a DLT
| must have very high throughput at high performance. It's the
| absolute highest performance and security possible in a
| leaderless consensus-based DLT as far as I know.
|
| Literally carbon neutral or negative because of buybacks but
| even without Hedera buying carbon credits, it's the single
| most "green" i.e. power-efficient DLT on the market. Does HN
| still care about Bitcoin using too much power? They would
| like Hedera, to that extent. Predictable, very low fixed
| fees. Long list of the biggest tech players leading the open
| development process. What's not to love?
|
| I urge you to help me invalidate these claims as it's pretty
| important I understand the tech here... But I'm very bullish
| on the token price as the technology is proven, robust, and
| overall extremely undervalued by retail - in my opinion. NFA.
|
| > The Hashgraph consensus algorithm is an algorithm for
| asynchronous Byzantine fault tolerance intended for
| distributed shared ledgers. Its main distinguishing
| characteristic is it achieves consensus without exchanging
| any extra messages; each participant's votes can be
| determined from public information, so votes need not be
| transmitted.
|
| For more rigorous explanation, see [3] and the associated Coq
| proof of the algorithm [4].
|
| [0]: https://github.com/hashgraph/hedera-services
|
| [1]: https://www.lfdecentralizedtrust.org/
|
| [2]: https://hedera.com/ecosystem/governing-council
|
| [3]: https://hedera.com/papers
|
| [4]: https://www.cs.cmu.edu/~crary/papers/2021/hashgraph.pdf
| lapinot wrote:
| I got interested in the hashgraph algorithm quite early and
| wrote a toy implementation in Python (in fact discovering an
| error in the paper in the process). Unless something changed
| it's entirely useless for open-membership internet-scale
| consensus. As I remember, the processing time of a message at a
| node is linear in the number of nodes and same for the local
| storage at a node, meaning it's not very scalable. Moreover,
| the nodes must somehow a priori agree on the list of
| participants of the consensus process, again something which is
| not realistic for internet-wide consensus. The protocol is
| quite neat and not too hard to implement but it's similar in
| scope to paxos/raft: consensus inside an organization, where
| some things are a priori agreed upon.
| cedws wrote:
| Can't proof-of-work be used as a leader election algorithm? If
| the proof is hard enough to generate then one node should be able
| to generate one and broadcast it before the other nodes can, then
| that node becomes the leader.
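|
| Roughly the hashcash-style version of that, as a toy Go sketch
| (constants and names made up): the winner is whoever first finds
| a nonce whose hash has enough leading zero bits, and everyone
| else can verify it cheaply. The bound is only probabilistic.
|
|     package main
|
|     import (
|         "crypto/sha256"
|         "encoding/binary"
|         "math/bits"
|     )
|
|     // Leading zero bits of the hash of (round, nonce).
|     func leadingZeroBits(round, nonce uint64) int {
|         var buf [16]byte
|         binary.BigEndian.PutUint64(buf[:8], round)
|         binary.BigEndian.PutUint64(buf[8:], nonce)
|         sum := sha256.Sum256(buf[:])
|         return bits.LeadingZeros64(binary.BigEndian.Uint64(sum[:8]))
|     }
|
|     // solve searches for a proof; difficulty tunes the average time.
|     func solve(round uint64, difficulty int) uint64 {
|         for nonce := uint64(0); ; nonce++ {
|             if leadingZeroBits(round, nonce) >= difficulty {
|                 return nonce
|             }
|         }
|     }
|
|     func verify(round, nonce uint64, difficulty int) bool {
|         return leadingZeroBits(round, nonce) >= difficulty
|     }
|
|     func main() {
|         nonce := solve(1, 16) // ~2^16 hashes on average
|         _ = verify(1, nonce, 16)
|     }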
| withinboredom wrote:
| There's a paper about that, using paxos as the base. Can't find
| it right now, it is called "chained paxos" or "block paxos" or
| something like that.
| ryanthemadone wrote:
| Proof of work is a leader election algorithm!
| mbivert wrote:
| In case this is of interest, MIT's 6.5840[0], distributed
| systems, has a series of labs, implementing Raft in Go. Haven't
| made it through the whole thing yet, but it's quite entertaining
| so far.
|
| The teachers provide you with some code templates, a bunch of
| tests, and a progressive way to implement it all.
|
| [0]: https://pdos.csail.mit.edu/6.824/index.html
| lifeinthevoid wrote:
| really liked the course, it was also instrumental in landing me
| my previous job :-)
| prydt wrote:
| I've run a reading group for distributed systems for the last 2
| years now and I do think that Raft is a better introduction to
| Consensus than any Paxos paper I have seen (I mean the Paxos Made
| Simple paper literally has bugs in it). But when I learned
| consensus in school, we used Paxos and Multi-Paxos and I do
| believe that there was a lot to be gained by learning both
| approaches.
|
| Heidi Howard has several amazing papers about how the differences
| between Raft and Multi-Paxos are very surface level and that
| Raft's key contribution is its presentation as well as being a
| more "complete" presentation since there are so many fragmented
| different presentations of Multi-Paxos.
|
| As a bonus, one of my favorite papers I have read recently is
| Compartmentalized Paxos:
| https://vldb.org/pvldb/vol14/p2203-whittaker.pdf which is just a
| brilliant piece on how to scale Multi-Paxos
| senderista wrote:
| There are several Multi-Paxos papers (some of them dating
| before Raft) that are intended as guidance for implementers:
|
| https://paper-notes.zhjwpku.com/assets/pdfs/paxos_for_system...
|
| https://www.cs.cornell.edu/home/rvr/Paxos/paxos.pdf
|
| https://www.scs.stanford.edu/~dm/home/papers/paxos.pdf
| prydt wrote:
| Ah thank you. That is a good list although I personally
| dislike the "Paxos Made Moderately Complex" paper... I think
| it adds too many different roles for very little benefit.
| When implementing multi-Paxos for class, I used that paper
| and felt it was more trouble than it needed to be.
|
| I'll check out the other two papers though! Also just looking
| around and I found this paper https://arxiv.org/pdf/1103.2408
| [PDF] which looks useful as well.
| shepherdjerred wrote:
| What's your reading group?
|
| I took a DS class and (poorly) implemented Paxos a few years
| ago. I'm curious about how others continue learning about DS.
| nano_o wrote:
| The bug in the Paxos Made Simple paper is that Lamport forgot
| to mention that, upon accepting a proposal, an acceptor also
| implicitly promises not to accept any proposal in lower
| ballots. It's discussed at length here:
| https://stackoverflow.com/questions/29880949/contradiction-i...
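|
| In code, the fix looks roughly like this (Go sketch, names made
| up): the accept handler also raises the promised ballot, so
| nothing in a lower ballot can be prepared or accepted afterwards.
|
|     package main
|
|     type acceptor struct {
|         promisedBallot int
|         acceptedBallot int
|         acceptedValue  string
|     }
|
|     func (a *acceptor) accept(ballot int, value string) bool {
|         if ballot < a.promisedBallot {
|             return false // promised a higher ballot; reject
|         }
|         a.promisedBallot = ballot // the promise left implicit in the paper
|         a.acceptedBallot = ballot
|         a.acceptedValue = value
|         return true
|     }
|
|     func main() {}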
| dang wrote:
| Related:
|
| _Raft Consensus Animated (2014)_ -
| https://news.ycombinator.com/item?id=32484584 - Aug 2022 (67
| comments)
|
| _Raft Visualization_ -
| https://news.ycombinator.com/item?id=25326645 - Dec 2020 (35
| comments)
|
| _Raft: Understandable Distributed Consensus_ -
| https://news.ycombinator.com/item?id=8271957 - Sept 2014 (79
| comments)
| shiredude95 wrote:
| "Paxos Made Moderately Complex" by Robert van Renesse and Deniz
| Altinbuken:
| http://www.cs.cornell.edu/courses/cs7412/2011sp/paxos.pdf is a
| great starting point for implementing multi-paxos. The authors
| also provide a working python implementation.
| jhanschoo wrote:
| Just earlier this month I was going through
| https://github.com/jepsen-io/maelstrom and following the demo
| implementing (a cut-corners version of) Raft, and I found it
| quite elucidating. The sample (ruby) code contains some bugs, and
| I had to use some understanding to fix them. (The bugs were of
| the kind where a dummy implementation from an earlier step isn't
| correctly changed)
| matthewfcarlson wrote:
| This really teaches Raft well. Is there a good example of this
| but for Paxos?
| quelltext wrote:
| In the log replication example, after healing the partition the
| uncommitted log changes in the minority group are rolled back and
| the leader's log is used.
|
| However it's not clear how that log is transmitted. Until this
| point only heartbeats via append entry were discussed, so it's
| not clear if the followers pull that information from the leader
| somehow via a different mechanism, or whether it's the leader's
| responsibility to detect followers that are left behind and
| replay everything. That would seem rather error prone and a lot
| of coordination effort. So how's it actually done?
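|
| For what it's worth, in the paper's version the same
| AppendEntries RPC that serves as a heartbeat also carries log
| entries: the leader keeps a nextIndex per follower, walks it back
| each time the follower rejects the consistency check
| (prevLogIndex/prevLogTerm), and then resends everything from the
| point where the logs agree. A rough Go sketch (names made up,
| 0-indexed log for brevity):
|
|     package main
|
|     type entry struct {
|         term    int
|         command string
|     }
|
|     type leader struct {
|         log       []entry
|         nextIndex map[string]int // per follower: next index to send
|     }
|
|     type appendArgs struct {
|         prevIndex, prevTerm int
|         entries             []entry
|     }
|
|     // argsFor prepares the AppendEntries arguments for one follower.
|     func (l *leader) argsFor(f string) appendArgs {
|         next := l.nextIndex[f]
|         prev := next - 1
|         prevTerm := 0
|         if prev >= 0 {
|             prevTerm = l.log[prev].term
|         }
|         return appendArgs{
|             prevIndex: prev,
|             prevTerm:  prevTerm,
|             entries:   l.log[next:],
|         }
|     }
|
|     // onReply backs nextIndex up on rejection; a lagging follower
|     // keeps rejecting until prevIndex/prevTerm match an entry it
|     // has, then truncates any conflicting suffix and appends the rest.
|     func (l *leader) onReply(f string, success bool) {
|         if success {
|             l.nextIndex[f] = len(l.log)
|         } else if l.nextIndex[f] > 0 {
|             l.nextIndex[f]--
|         }
|     }
|
|     func main() {}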
| wg0 wrote:
| Off topic - folks in the know, besides Paxos/Raft what are some
| of the other most complex algorithms in computer science that
| are widely used or are a bedrock?
| mbivert wrote:
| I'd be curious to know as well; I'd bet on things pertaining to
| either compilers (e.g. optimization stuff), concurrency/OSes,
| graphics engines & image processing.
| withinboredom wrote:
| I can think of a deceptively simple one: recursive descent
| parsers. They are pretty straightforward to implement but hard
| to read/reason about if you don't already know the grammar.
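|
| A toy Go example of that shape, one function per grammar rule
| (which is exactly why the code is hard to follow without the
| grammar in hand):
|
|     package main
|
|     import (
|         "fmt"
|         "strconv"
|         "strings"
|     )
|
|     //   expr   := term   (('+' | '-') term)*
|     //   term   := factor (('*' | '/') factor)*
|     //   factor := number | '(' expr ')'
|     type parser struct {
|         toks []string
|         pos  int
|     }
|
|     func (p *parser) peek() string {
|         if p.pos < len(p.toks) {
|             return p.toks[p.pos]
|         }
|         return ""
|     }
|
|     func (p *parser) next() string { t := p.peek(); p.pos++; return t }
|
|     func (p *parser) expr() float64 {
|         v := p.term()
|         for p.peek() == "+" || p.peek() == "-" {
|             if p.next() == "+" {
|                 v += p.term()
|             } else {
|                 v -= p.term()
|             }
|         }
|         return v
|     }
|
|     func (p *parser) term() float64 {
|         v := p.factor()
|         for p.peek() == "*" || p.peek() == "/" {
|             if p.next() == "*" {
|                 v *= p.factor()
|             } else {
|                 v /= p.factor()
|             }
|         }
|         return v
|     }
|
|     func (p *parser) factor() float64 {
|         if p.peek() == "(" {
|             p.next()
|             v := p.expr()
|             p.next() // consume ")"
|             return v
|         }
|         n, _ := strconv.ParseFloat(p.next(), 64)
|         return n
|     }
|
|     func main() {
|         p := &parser{toks: strings.Fields("1 + 2 * ( 3 - 1 )")}
|         fmt.Println(p.expr()) // 5
|     }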
| eatonphil wrote:
| Here's a visualization of Paxos and MultiPaxos that seems to be
| based on Ben's work: https://visual.ofcoder.com/.
___________________________________________________________________
(page generated 2024-09-28 23:02 UTC)