[HN Gopher] RaptorCast: Designing a Messaging Layer
___________________________________________________________________
RaptorCast: Designing a Messaging Layer
Author : wwolffrec
Score : 32 points
Date : 2025-06-23 06:46 UTC (16 hours ago)
(HTM) web link (www.category.xyz)
(TXT) w3m dump (www.category.xyz)
| wwolffrec wrote:
| Monad uses RaptorCast to send out block proposals quickly and
| reliably to a global network of validators. At Category Labs,
| designing an effective messaging protocol to meet Monad's high
| performance requirements was challenging and educational. Read
| more about the design in the link.
| klabb3 wrote:
| > Assuming zero network latency and a bandwidth of 1 Gbps, this
| would still take around 16 seconds
|
| I'm not seeing how the new design affects the throughput needs,
| but I'll say this:
|
| Except for highly controlled environments (OS, NICs, etc.), you
| will run into perf issues with UDP-based protocols much sooner
| than with TCP, even if you're just pushing zeroes. Per-packet
| processing is much more difficult to optimize than stream
| processing.
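|
| A hypothetical back-of-the-envelope (Python; the 2 MB block size
| is from this thread, the 1400-byte payload and 256 KB write size
| are assumptions) showing why per-datagram syscall counts add up:
|
|     block = 2 * 1024 * 1024               # one 2 MB block proposal
|     udp_payload = 1400                    # MTU-safe datagram payload
|     tcp_write = 256 * 1024                # large write on a stream
|     udp_calls = -(-block // udp_payload)  # one sendto() per datagram
|     tcp_calls = -(-block // tcp_write)    # a handful of send() calls
|     print(udp_calls, tcp_calls)           # 1498 vs 8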
|
| If you only send sporadic messages without backpressure, and
| you're willing and able to handle out-of-order messages and
| retransmission logic, by all means, use UDP. For realtime
| multiplayer games, for example, it makes sense.
|
| For high throughput on diverse platforms and hardware, the story
| is very different. Yes, even with QUIC. I learnt this the hard
| way.
|
| All that said, I'm very curious what the results are. Is this
| design fully deployed, and if so, in what kind of environment
| and traffic patterns? Even better: benchmarks/stress tests would
| be fantastic.
| tklenze wrote:
| Thanks for your insights! Yes, real-life behavior is indeed
| interesting to look at, and for this purpose, two testnets are
| running right now (https://www.gmonads.com).
|
| RaptorCast uses erasure coding to break a block proposal into
| smaller pieces with plenty of redundancy to allow for
| omissions. This means that if you receive sufficiently many
| chunks, you can decode the block proposal (no matter which of
| the chunks you received). The redundancy factor can be tweaked,
| but it'll likely be >2x, to allow for networking issues and
| faulty/malicious nodes. Furthermore, the blockchain can make
| progress as long as >2/3 of the validators receive the block
| proposal and are honest. This means that, at least in theory,
| you should be able to tolerate a lot of packet loss.
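|
| RaptorCast itself uses Raptor codes (RaptorQ-style fountain
| codes); as a toy stand-in, here is a random linear fountain code
| over GF(2) in Python that illustrates the key property: any
| sufficiently many chunks, whichever ones they are, decode the
| original. All names and parameters here are made up for
| illustration.
|
|     import os, random
|
|     def encode(data, k, n):
|         # Split data into k source chunks; emit n coded chunks,
|         # each the XOR of a random nonzero subset of the sources.
|         size = -(-len(data) // k)
|         src = [data[i*size:(i+1)*size].ljust(size, b"\0")
|                for i in range(k)]
|         chunks = []
|         for _ in range(n):
|             mask = random.getrandbits(k) or 1
|             body = bytearray(size)
|             for i in range(k):
|                 if mask >> i & 1:
|                     for j in range(size):
|                         body[j] ^= src[i][j]
|             chunks.append((mask, bytes(body)))
|         return chunks
|
|     def decode(chunks, k, size):
|         # Gaussian elimination over GF(2): any k linearly
|         # independent chunks reconstruct the k source chunks.
|         pivots = {}
|         for mask, body in chunks:
|             body = bytearray(body)
|             while mask:
|                 p = (mask & -mask).bit_length() - 1
|                 if p not in pivots:
|                     pivots[p] = (mask, body)
|                     break
|                 pmask, pbody = pivots[p]
|                 mask ^= pmask
|                 for j in range(size):
|                     body[j] ^= pbody[j]
|             if len(pivots) == k:
|                 break
|         if len(pivots) < k:
|             return None              # need a few more chunks
|         src = [None] * k
|         for p in sorted(pivots, reverse=True):
|             mask, body = pivots[p]
|             for q in range(p + 1, k):
|                 if mask >> q & 1:
|                     for j in range(size):
|                         body[j] ^= src[q][j]
|             src[p] = bytes(body)
|         return b"".join(src)
|
|     data = os.urandom(40_000)        # stand-in for a block proposal
|     k, n = 20, 60                    # 3x redundancy, as above
|     chunks = encode(data, k, n)
|     random.shuffle(chunks)           # order/identity of chunks is
|     subset = chunks[:k + 10]         # irrelevant; a few extras cover
|                                      # occasional linear dependence
|     out = decode(subset, k, -(-len(data) // k))
|     if out is None:                  # rare; feed the rest instead
|         out = decode(chunks, k, -(-len(data) // k))
|     assert out[:len(data)] == data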
|
| Re throughput: Monad produces 2 blocks/s, each 2 MB in size. So
| even with a redundancy factor of 3x, each validator only has to
| send 12 MB per second.
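|
| As a quick check of that arithmetic (Python):
|
|     blocks_per_sec = 2
|     block_mb = 2                 # MB per block proposal
|     redundancy = 3               # erasure-coding overhead
|     print(blocks_per_sec * block_mb * redundancy)   # 12 MB/s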
|
| Re backpressure: not really an option for blockchains. If you
| have 100 peers and one of them is too slow, what are you going
| to do? If you apply backpressure and slow down consensus, you
| slow down the entire blockchain even though most peers are fast.
| There's a recent paper about this problem:
| https://arxiv.org/abs/2410.22080.
|
| What's important is that the amount of bandwidth required per
| validator remains constant in RaptorCast, no matter how many
| validators are part of the network. And you always need just
| one round trip to broadcast a block proposal, as opposed to
| gossip protocols, which may involve more steps and have higher
| latency.
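|
| A sketch of why per-node upload stays flat (Python; the two-hop
| chunk-relay shape is from the post, the exact split of chunks
| across first-hop validators is an assumption here):
|
|     block_mb, redundancy = 2.0, 3.0
|     coded_mb = block_mb * redundancy     # 6 MB of chunks per block
|     for v in (50, 100, 1000):            # validator count
|         share = coded_mb / v             # chunks one validator relays
|         relay_up = share * (v - 1)       # forwarded to all other peers
|         print(v, round(relay_up, 2))     # ~6 MB per block, any v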
| lxgr wrote:
| > The redundancy factor can be tweaked, but it'll likely be
| >2x
|
| If your packet loss is due to your traffic overwhelming a
| queue at some intermediate hop, sending more redundant packets
| would aggravate the problem instead of solving it.
|
| Are you running this on top of something providing congestion
| control?
| PhilipRoman wrote:
| It's a shame, because the datagram paradigm is so much more
| elegant. In real-world cases you end up having to emulate it by
| putting length-prefixed data in TCP streams, reducing TCP
| timeouts, constantly reconnecting sockets (with the latency
| penalty), etc.
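|
| The length-prefix emulation looks roughly like this (a minimal
| Python sketch; the names are illustrative):
|
|     import struct
|
|     def send_msg(sock, payload):
|         # 4-byte big-endian length header, then the payload
|         sock.sendall(struct.pack("!I", len(payload)) + payload)
|
|     def recv_exact(sock, n):
|         buf = b""
|         while len(buf) < n:
|             part = sock.recv(n - len(buf))
|             if not part:
|                 raise ConnectionError("peer closed mid-message")
|             buf += part
|         return buf
|
|     def recv_msg(sock):
|         (length,) = struct.unpack("!I", recv_exact(sock, 4))
|         return recv_exact(sock, length)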
|
| Really, the only thing that's missing from UDP is (optional)
| backpressure.
|
| A lot of software can handle out-of-order datagrams with no
| performance penalty (file uploads, etc.). TCP's strict ordering
| is especially annoying when you're operating in an environment
| with link aggregation, where the interface insists on limiting
| your bandwidth to a single link in order to preserve that
| ordering.
| simcop2387 wrote:
| This is one reason I'm still upset about the failure that
| SCTP ended up being. It really did try to create a new
| protocol to deal with exactly these issues, but poor support
| and protocol ossification basically made it a non-starter.
| I'd have loved for it to be a mandatory part of IPv6 so that
| it'd eventually get useful support, but I'm pretty sure that
| would have made IPv6 adoption even worse.
| lxgr wrote:
| As long as you're fine with UDP encapsulation, you can
| definitely use SCTP today! WebRTC data channels do, for
| example.
| Veserv wrote:
| Well, we have QUIC now, which layers over UDP and is
| functionally strictly superior to SCTP, as SCTP still
| suffered from head-of-line blocking due to its
| acknowledgement design.
| lxgr wrote:
| > the only thing that's missing from UDP is (optional)
| backpressure.
|
| The lack of congestion control seems significant too. Most
| message-oriented protocols layered on top of UDP end up
| adding that back at the application layer as a consequence.
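|
| For example, the simplest form that "adding it back" takes is
| open-loop pacing of sends (a Python sketch; real protocols like
| QUIC use feedback-driven control, which this does not attempt):
|
|     import time
|
|     class TokenBucket:
|         def __init__(self, rate, burst):
|             self.rate, self.burst = rate, burst   # bytes/s, bytes
|             self.tokens = burst
|             self.last = time.monotonic()
|
|         def wait_for(self, nbytes):
|             # Block until nbytes may be sent at the configured rate.
|             while True:
|                 now = time.monotonic()
|                 self.tokens = min(self.burst,
|                                   self.tokens
|                                   + (now - self.last) * self.rate)
|                 self.last = now
|                 if self.tokens >= nbytes:
|                     self.tokens -= nbytes
|                     return
|                 time.sleep((nbytes - self.tokens) / self.rate)
|
|     bucket = TokenBucket(rate=12e6, burst=64e3)   # cap at 12 MB/s
|     # bucket.wait_for(len(datagram)); sock.sendto(datagram, addr)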
| yangl1996 wrote:
| Looks like there is no mention in the blog post of the paper
| (poster) [1] in which the two-level broadcast idea was proposed.
|
| [1] https://dl.acm.org/doi/pdf/10.1145/3548606.3563494
| ethan_smith wrote:
| The cited paper is indeed foundational, introducing not just
| two-level broadcast but also optimizations for validator
| selection and network topology that RaptorCast appears to build
| upon.
| compyman wrote:
| It seems very similar to an earlier Paxos optimization called
| 'PigPaxos': https://dl.acm.org/doi/10.1145/3448016.3452834
___________________________________________________________________
(page generated 2025-06-23 23:01 UTC)