[HN Gopher] Fast UDP I/O for Firefox in Rust
___________________________________________________________________
Fast UDP I/O for Firefox in Rust
Author : Bender
Score : 291 points
Date : 2025-09-26 15:14 UTC (7 hours ago)
(HTM) web link (max-inden.de)
(TXT) w3m dump (max-inden.de)
| jcranmer wrote:
| > After many hours of back and forth with the reporter, luckily a
| Mozilla employee as well, I ended up buying the exact same
| laptop, same color, in a desperate attempt to reproduce the
| issue.
|
| Glad to know that networking still produces insanity trying to
| reproduce issues a la https://xkcd.com/2259/.
| 3form wrote:
| For that matter, the "The map download struggle, part 2
| (Technical)" section at https://www.factorio.com/blog/post/fff-176
| (at the end of the post) is a fun read.
| Analemma_ wrote:
| Factorio's dev blog is a great deal of fun. It's on pause at
| the moment after the release of 2.0, but if you go through
| the archives there's great stuff in there. A lot of it is
| about optimizations which only matter once you're building
| 10,000+ SPM gigafactories, which casual players will never
| even come close to, but since crazy excess is practically
| what defines hardcore Factorio players it's cool to see the
| devs putting in the work to make the experience shine for
| their most devoted fans.
| rkomorn wrote:
| This is how I find out there's a 2.0 Factorio? What am I
| doing with my life??
| dinosaurdynasty wrote:
| Not only that, there's also a DLC with 4 new planets.
| rkomorn wrote:
| Well there goes the rest of the year...
| Maken wrote:
| Be careful, some of these new planets can _spoil_ the
| fun.
| rkomorn wrote:
| Oh? Tell me more.
| gizmo686 wrote:
| That's what I asked _after_ downloading it.
| bobmcnamara wrote:
| Could be related to UDP checksum offload.
|
| 0x0000 is a special value for some NICs meaning please
| calculate for me.
|
| One NIC years ago would set 0xFFFF for a bad checksum. At first
| we thought this was horrifyingly broken. But really you can
| just fall back to software verification for the handful of
| legitimate and bad packets that arrive with that checksum.
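|
| As a sketch of what that software fallback involves (assuming
| UDP over IPv4 and the standard RFC 768/1071 one's-complement
| sum; the function names here are just for illustration):
|
|   /// One's-complement sum of 16-bit big-endian words, padding an
|   /// odd trailing byte with zero (RFC 1071).
|   fn ones_complement_sum(mut acc: u32, data: &[u8]) -> u32 {
|       let mut words = data.chunks_exact(2);
|       for w in &mut words {
|           acc += u32::from(u16::from_be_bytes([w[0], w[1]]));
|       }
|       if let [last] = words.remainder() {
|           acc += u32::from(u16::from_be_bytes([*last, 0]));
|       }
|       acc
|   }
|
|   /// Verify a UDP/IPv4 datagram (header + payload) in software.
|   /// Valid when pseudo-header + datagram folds to 0xFFFF.
|   fn udp_checksum_valid(src: [u8; 4], dst: [u8; 4], dgram: &[u8]) -> bool {
|       // RFC 768: an all-zero checksum field means "no checksum sent".
|       if dgram.len() >= 8 && dgram[6] == 0 && dgram[7] == 0 {
|           return true;
|       }
|       // Pseudo-header: src addr, dst addr, zero, protocol 17, UDP length.
|       let mut sum = ones_complement_sum(0, &src);
|       sum = ones_complement_sum(sum, &dst);
|       sum = ones_complement_sum(sum, &[0, 17]);
|       sum = ones_complement_sum(sum, &(dgram.len() as u16).to_be_bytes());
|       sum = ones_complement_sum(sum, dgram);
|       // Fold the carries back into 16 bits.
|       while sum > 0xFFFF {
|           sum = (sum & 0xFFFF) + (sum >> 16);
|       }
|       sum == 0xFFFF
|   }
|
| Since only the rare packets the NIC flags ever take this path,
| the extra cost stays negligible.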
| Joel_Mckay wrote:
| It is funnier if you've ever dealt with mystery packet runts,
| as most network appliances still do not handle them very
| cleanly.
|
| UDP/QUIC can DoS any system not based on a cloud deployment
| large enough to soak up the peak traffic. It is silly, but it
| pushes out any hosting operation that can't reach a
| disproportionate bandwidth asymmetry with the client traffic.
| i.e. fine for FAANG, but a death knell for most other
| small/medium organizations.
|
| This is why many LANs still drop most UDP traffic, and rate-
| limit the parts needed for normal traffic. Have a nice day =3
| znpy wrote:
| It's crazy that sendmmsg/recvmmsg are considered "modern"... i
| mean, they've been around for quite a while.
|
| I was expecting to see io_uring mentioned somewhere in the linux
| section of the article.
| Cloudef wrote:
| io_uring doesn't really have an equivalent[1]; it can't batch
| multiple UDP datagrams, the best it can do is batch multiple
| sendmsgs and recvmsgs. GSO/GRO is the way to go (rough sketch
| below).
| sendmmsg/recvmmsg are indeed very old, and some kernel devs
| wish they could sunset them :)
|
| 1: https://github.com/axboe/liburing/discussions/1346
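|
| For the curious, a minimal sketch of the GSO side on Linux,
| assuming the libc crate; SOL_UDP/UDP_SEGMENT are spelled out
| from the kernel headers rather than taken from a crate, and the
| segment size is whatever your UDP/QUIC payload size is:
|
|   // One oversized write is split by the kernel (or NIC) into
|   // seg_size-byte UDP datagrams, amortizing the per-syscall cost.
|   use std::net::UdpSocket;
|   use std::os::unix::io::AsRawFd;
|
|   const SOL_UDP: libc::c_int = 17;      // IPPROTO_UDP
|   const UDP_SEGMENT: libc::c_int = 103; // from <linux/udp.h>
|
|   fn send_gso(sock: &UdpSocket, buf: &[u8], seg_size: u16)
|       -> std::io::Result<usize> {
|       let seg = seg_size as libc::c_int;
|       let rc = unsafe {
|           libc::setsockopt(
|               sock.as_raw_fd(),
|               SOL_UDP,
|               UDP_SEGMENT,
|               &seg as *const _ as *const libc::c_void,
|               std::mem::size_of_val(&seg) as libc::socklen_t,
|           )
|       };
|       if rc != 0 {
|           return Err(std::io::Error::last_os_error());
|       }
|       // The socket must be connect()ed for plain send().
|       sock.send(buf)
|   }
|
| The per-message form (attaching UDP_SEGMENT as a cmsg on each
| sendmsg) is the more flexible variant, but the socket option is
| the shortest way to show the idea.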
| metadat wrote:
| The key takeaway is hidden in the middle:
|
| _> In extreme cases, on purely CPU bound benchmarks, we're
| seeing a jump from < 1Gbit/s to 4 Gbit/s. Looking at CPU
| flamegraphs, the majority of CPU time is now spent in I/O system
| calls and cryptography code._
|
| That is at least a 4x jump in throughput, which should translate
| to a proportionate reduction in CPU utilization for UDP network
| activity. That's pretty cool, especially for better power
| efficiency on portable clients (mobile and notebook).
|
| I found this presentation refreshing. Too often, transitions to
| "modern" stacks are treated as inherently good and don't come
| with data to back them up.
| a-dub wrote:
| i wonder if we'll ever see hardware accelerated cross-context
| message passing for user and system programs.
| wbl wrote:
| Shared ring buffers for IO exist in Linux; I don't think
| we'll ever see them extend to DMA for the NIC due to the
| security rearchitecture required. However, if the NIC is
| smart enough and the rules simple, maybe.
| a-dub wrote:
| sure, but what about some kind of generalized cross-context
| ipc primitive towards a zero copy messaging mechanism for
| high performance multiprocessing microkernels?
| jitl wrote:
| There are systems that move the NIC control to user space
| entirely. For example Snabb has an Intel 10g Ethernet
| controller driver that appears to use a ring buffer on DMA
| memory.
|
| https://github.com/snabbco/snabb/blob/master/src/apps/intel
| /...
| Cloudef wrote:
| Interesting, I was not aware of the GSO/GRO equivalents on
| Windows and macOS, though it's unfortunate that they seem buggy.
| superkuh wrote:
| Wow! Does this mean that Firefox can re-enable self-signed certs
| for its HTTP/3 stack, since it's using a custom implementation
| and not someone else's big QUIC lib and default build flags
| anymore? That'd be a huge win for human people and their typical
| LAN use cases. Even if the corporate use cases don't want it for
| 'security' reasons.
| jeroenhd wrote:
| I think self-signed certs should be possible in principle, but
| is there a reason to use HTTP/3 for LAN use cases? In low-
| latency situations, there's barely any advantage to using HTTP/3
| over HTTP/2, and even HTTP/1.1 is good enough for most use
| cases (and will outperform the other options in terms of pure
| throughput).
| ekr____ wrote:
| Certificate verification in Firefox happens at a layer way
| above HTTP and TLS (for those who care, it's in PSM), so which
| QUIC library is used is basically not relevant.
|
| The reason that Firefox -- and other major browsers -- make
| self-signed certs so difficult to use is that allowing users to
| override certificate checks weakens the security of HTTPS,
| which otherwise relies on certificates being verifiable against
| the trust anchor list. It's true that this makes certain cases
| harder, but the judgement of the browser community was that
| that wasn't worth the security tradeoff. In other words, it's a
| policy decision, not a technical one.
| rcxdude wrote:
| It's a pretty bad one, though. It massively undermines the
| security of connections to local devices for a slight
| improvement in security on the open internet. It's very
| frustrating how browser vendors don't even seem to consider
| it something worth solving, even if e.g. the way it is
| presented to the user is different. At the moment if you just
| use plain HTTP then things do mostly work (apart from some
| APIs which are somewhat arbitrarily locked to 'secure
| contexts' which means very little about the trustworthiness
| of the code that does or does not have access to those APIs),
| but if you try to use HTTPS then you get a million 'this is
| really insecure' warnings. There's no 'use HTTPS but treat
| it like HTTP' option.
| dochtman wrote:
| I'm pretty sure private PKIs are an option that is pretty
| straightforward to use.
|
| Security is still a lot better because the root is
| communicated out of band.
| ekr____ wrote:
| I don't think it's correct to say that browser vendors
| don't think it's worth solving. For instance, Martin
| Thomson from Mozilla has done some thinking about it:
| https://docs.google.com/document/u/0/d/170rFC91jqvpFrKIqG4K8....
|
| However, it's not an entirely trivial problem to get it
| right, especially because of how deeply the scheme is tied
| into the Web security model. Your example here is a good
| one of what I'm talking about:
|
| > At the moment if you just use plain HTTP then things do
| mostly work (apart from some APIs which are somewhat
| arbitrarily locked to 'secure contexts' which means very
| little about the trustworthiness of the code that does or
| does not have access to those APIs),
|
| You're right that being served over HTTPS doesn't make the
| site trustworthy, but what it _does_ do is provide
| integrity for the identity of the server. So, for instance,
| the user might look at the URL and decide that the server
| is trustworthy and can be allowed to use the camera or
| microphone. However, if you use HTTPS but without verifying
| the certificate, then an attacker might in the future
| substitute themselves and take advantage of that camera and
| microphone access. Another example is when the user enters
| their password.
|
| Rather than saying that browser vendors don't think this is
| worth solving in the abstract I would say that it's not
| very high on the priority list, especially because most of
| the ideas people have proposed don't work very well.
| ElectricalUnion wrote:
| Either you really are secure, or ideally you should not be
| able to even pretend you are secure. Allowing "pretend it's
| secure" downgrades the security in all contexts.
|
| IMHO they should gradually lock all dynamic code execution,
| such as dynamic CSS and JavaScript, behind an explicit toggle
| for insecure HTTP sites.
|
| > It massively undermines the security of connections to
| local devices
|
| No, you see the prompt, it _is_ insecure. If the network
| admin wants it secure, that means either an internal CA or a
| literally free cert from Let's Encrypt. As the network
| admin did not care, it's insecure.
|
| "but I have legacy garbage with hardcoded self-signed
| certs" then reverse proxy that legacy garbage with Caddy?
| wolrah wrote:
| You can still have self-signed certs, you just have to actually
| set up your own CA and import it as trusted in the relevant
| trust store so it can be verified.
|
| You can't just have some random router, printer, NAS, etc.
| generate its own cert out of thin air and tell the browser to
| ignore the fact that it can't be verified.
|
| IMO this is a good thing. The way browsers handle HTTPS on
| older protocols is a result of the number of legacy, badly
| configured systems out there that browser vendors
| don't want to break. Anywhere someone's supporting HTTP/3
| they're doing something new, so enforcing a "do it right or
| don't do it at all" policy is possible.
| Veserv wrote:
| While their improvements are real and necessary for actual high
| speed (100 Gb/s and up), 4 Gb/s is not fast. That is only 500
| MB/s. Something somewhere, likely not in their code, is terribly
| slow. I will explain.
|
| As the author cited, a kernel context switch is only on the order
| of 1 us (which seems too high for a system call anyways). You can
| reach 500 MB/s even if you still call sendmsg() on literally
| every packet, as long as you average ~500 bytes/packet, which is
| ~1/3 of the standard 1500-byte MTU. So if you average MTU-sized
| packets, you get 2 us of _processing_ in addition to a full
| system call to reach 4 Gb/s.
|
| The old number of 1 Gb/s could be reached with an average of ~125
| bytes/packet, ~1/12 of the MTU or ~11 us of processing.
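|
| To make the budget arithmetic concrete, a quick back-of-the-
| envelope sketch (the 1 us syscall figure is the one assumed
| above, not a measurement):
|
|   fn main() {
|       let syscall_us = 1.0; // assumed per-packet syscall cost
|       for (gbit_s, pkt_bytes) in [(4.0_f64, 1500.0_f64), (1.0, 1500.0)] {
|           let bytes_per_s = gbit_s * 1e9 / 8.0;      // Gb/s -> bytes/s
|           let pkts_per_s = bytes_per_s / pkt_bytes;  // packets/s
|           let budget_us = 1e6 / pkts_per_s;          // us per packet
|           println!(
|               "{} Gb/s @ {} B/pkt: {:.0} pkt/s, {:.1} us/pkt, {:.1} us left",
|               gbit_s, pkt_bytes, pkts_per_s, budget_us,
|               budget_us - syscall_us
|           );
|       }
|   }
|
| Which prints a ~2 us post-syscall budget at 4 Gb/s and ~11 us at
| 1 Gb/s for MTU-sized packets, matching the numbers above.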
|
| "But there are also memory copies in the network stack." A
| trivial 3 instruction memory copy will go ~10-20 GB/s, 80-160
| Gb/s. In 2 us you can drive 20-40 KB of copies. You are arguing
| the network stack does 40-80(!) copies to put a UDP packet, a
| thin veneer over a literal packet, into a packet. I have written
| commercial network drivers. Even without zero-copy, with direct
| access you can shovel UDP packets into the NIC buffers at
| basically memory copy speeds.
|
| "But encryption is slow." Not that slow. Here is some AES-128 GCM
| performance done what looks like over 5 years ago. [1] The Intel
| i5-6500, a midline processor from 8 years ago, averages 1729
| MB/s. It can do the encryption for a 500 byte packet in 300 ns,
| 1/6 of the remaining 2 us budget. Modern processors seem to be
| closer to 3-5 GB/s per core, or about 25-40 Gb/s, 6-10x the
| stated UDP throughput.
|
| [1] https://calomel.org/aesni_ssl_performance.html
| vlovich123 wrote:
| There is no indication of what class of CPU they're benchmarking
| on. Additionally, this is presumably including the overhead of
| managing the QUIC protocol as well, given they mention
| encryption, which isn't relevant for raw UDP. And QUIC is known
| to not have a good NIC-offload story for encryption at the
| moment, the way you can do kTLS offload for TCP streams.
| Veserv wrote:
| Encryption is unlikely to be relevant. As I pointed out,
| doing it on any modern desktop CPU with no offload gets you
| 25-40 Gb/s, 6-10x faster than the benchmarked throughput. It
| is not the bottleneck unless it is being done horribly wrong
| or they do not have access to AES instructions.
|
| "It is slow because it is being layered over QUIC." Then why
| did you layer over a bottleneck that slows you down by 25x.
| Second of all, they did not used to do that and they still
| only got 1 Gb/s previously which is abysmal.
|
| Third of all, you can achieve QUIC feature parity (minus
| encryption which will be your per-core bottleneck) at 50-100
| Gb/s per core, so even that is just a function of using a
| slow protocol.
|
| Finally, the CPU class used in benchmarking is largely irrelevant
| because I am discussing 20x per-core performance bottlenecks.
| You would need to be benchmarking on a desktop CPU from 25
| years ago to get that degree of single-core performance
| difference. We are talking iPhone 6 (a decade-old phone)
| territory for an efficient implementation to bottleneck on the
| processor at just 4 Gb/s.
|
| But again, it is probably not a problem with their code. It
| is likely something else stupid happening on the network
| stack or protocol side of which they are merely a client.
| raggi wrote:
| > which seems too high for a system call anyways
|
| spectre & meltdown.
|
| > you get 2 us of processing in addition to a full system call
| to reach 4 Gb/s
|
| TCP has route binding, UDP does not (connect(2) helps one side,
| but not both sides).
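|
| (For reference, connect(2) on a UDP socket pins the peer and
| lets the kernel cache the route for that side's sends; a rough
| sketch of the difference with std's UdpSocket, addresses made
| up:)
|
|   use std::net::UdpSocket;
|
|   fn main() -> std::io::Result<()> {
|       let sock = UdpSocket::bind("0.0.0.0:0")?;
|
|       // Unconnected: every send_to() names the destination and
|       // the kernel redoes the route lookup per call.
|       sock.send_to(b"ping", "192.0.2.10:4433")?;
|
|       // Connected: destination (and cached route) fixed once;
|       // later send()/recv() calls skip that work. The other
|       // side, bound to a wildcard address for many peers, gets
|       // no such help.
|       sock.connect("192.0.2.10:4433")?;
|       sock.send(b"ping")?;
|       Ok(())
|   }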
|
| > "But encryption is slow." Not that slow.
|
| Encryption _is slow_ for small PDUs, at least the common
| constructions we're currently using. Everyone's essentially
| been optimizing for and benchmarking TCP with large frames.
|
| If you hot-loop the state as the micro-benchmarks do, you can
| do better, but you still see a very visible cost of state setup
| that only starts to amortize decently well above 1024-byte
| payloads. Eradicate a bunch of cache efficiency by removing the
| tightness of the loop and this amortization boundary shifts
| quite far to the right, up into tens of kilobytes.
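|
| To illustrate that amortization with a toy model (the constants
| below are assumptions for the sketch, not measurements: a fixed
| per-record setup cost plus a per-byte cost):
|
|   fn main() {
|       let setup_ns = 200.0;   // assumed per-record state setup
|       let ns_per_byte = 0.25; // assumed ~4 GB/s steady-state core
|       for payload in [64.0_f64, 256.0, 1024.0, 4096.0, 16384.0] {
|           let total_ns = setup_ns + payload * ns_per_byte;
|           let gbit_s = payload * 8.0 / total_ns; // bits/ns == Gbit/s
|           println!("{payload:>6} B: {total_ns:>5.0} ns, {gbit_s:>4.1} Gbit/s");
|       }
|   }
|
| With numbers like these, effective throughput only approaches
| the steady-state rate well past the 1 KB mark, which is the
| shape of the effect described above.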
|
| ---
|
| All of the above, plus the additional framing overheads, come
| into play. Hell, even the OOB data blocks are quite expensive to
| actually validate; it's not a good API to fix this problem,
| it's just the API we have shoved over bsd sockets.
|
| And we haven't even gotten to buffer constraints and contention
| yet, but the default UDP buffer memory available on most
| systems is woefully inadequate for these use cases today. TCP
| buffers were scaled over time, but UDP buffers basically never
| were; they're still conservative values from the late 90s/00s
| really.
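|
| The usual workaround is bumping SO_RCVBUF/SO_SNDBUF per socket
| (the sketch below assumes the libc crate; the 4 MB figure is
| arbitrary, and Linux still clamps it to net.core.rmem_max /
| wmem_max unless those are raised too):
|
|   use std::net::UdpSocket;
|   use std::os::unix::io::AsRawFd;
|
|   fn set_buf(sock: &UdpSocket, opt: libc::c_int,
|              bytes: libc::c_int) -> std::io::Result<()> {
|       let rc = unsafe {
|           libc::setsockopt(
|               sock.as_raw_fd(),
|               libc::SOL_SOCKET,
|               opt,
|               &bytes as *const _ as *const libc::c_void,
|               std::mem::size_of_val(&bytes) as libc::socklen_t,
|           )
|       };
|       if rc != 0 {
|           return Err(std::io::Error::last_os_error());
|       }
|       Ok(())
|   }
|
|   fn main() -> std::io::Result<()> {
|       let sock = UdpSocket::bind("0.0.0.0:0")?;
|       set_buf(&sock, libc::SO_RCVBUF, 4 << 20)?; // vs ~208 KB default
|       set_buf(&sock, libc::SO_SNDBUF, 4 << 20)?;
|       Ok(())
|   }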
|
| The API we really need for this kind of UDP setup is one where
| you can do something like fork the fd, connect(2) it with a
| full route bind, and then fix the RSS/XPS challenges that come
| from this splitting. After that we need a submission queue API
| rather than another bsd sockets ioctl-style mess (uring, rio,
| etc). Sadly none of this is portable.
|
| On the crypto side there are KDF approaches which can remove a
| lot of the state cost involved; it's not popular, but some
| vendors are very taken with PSP for this reason - but PSP
| becoming better known or used was largely suppressed by its
| various rejections in the ietf and in linux. Vendors doing
| scale tests with it have clear numbers though: under high
| concurrency you can scale this much better than the common tls
| or tls-like constructions.
| Veserv wrote:
| I think you are just agreeing with me?
|
| You are basically saying: "It is slow because of all these
| system/protocol decisions that mismatch what you need to get
| high performance out of the primitives."
|
| Which is my point. They are leaving, by my estimation, 10-20x
| performance on the floor due to external factors. They might
| be "fast given that they are bottlenecked by low performance
| systems", which is good as their piece is not the bottleneck,
| but they are not objectively "fast" as the primitives can be
| configured to solve a substantially similar problem
| dramatically faster if integrated correctly.
| raggi wrote:
| > I think you are just agreeing with me?
|
| sure, i mean i have no goal of alignment or misalignment,
| i'm just trying to provide more insights into what's going
| on based on my observations of this from having also worked
| on this udp path.
|
| > Which is my point. They are leaving, by my estimation,
| 10-20x performance on the floor due to external factors.
| They might be "fast given that they are bottlenecked by low
| performance systems", which is good as their piece is not
| the bottleneck, but they are not objectively "fast" as the
| primitives can be configured to solve a substantially
| similar problem dramatically faster if integrated
| correctly.
|
| yes, though this basically means we're talking about
| throwing out chunks of the os, the crypto design, the
| protocol, and a whole lot of tuning at each layer.
|
| the only vendor in a good position to do this is apple
| (being the only vendor that owns every involved layer in a
| single product chain), and they're failing to do so as
| well.
|
| the alternative is a long old road, where folks make
| articles like this from time to time, we share our
| experiences and hope that someone is inspired enough
| reading it to be sniped into making incremental progress.
| it'd be truly fantastic if we sniped a group with the vigor
| and drive that the mptcp folks seem to have, as they've
| managed to do an unusually broad and deep push across a
| similar set of layered challenges (though still in
| progress).
| Arcuru wrote:
| > Instead of starting from scratch, we built on top of quinn-udp,
| the UDP I/O library of the Quinn project, a QUIC implementation
| in Rust. This sped up our development efforts significantly. Big
| thank you to the Quinn project.
|
| Awesome, so you sponsored them right?
|
| https://opencollective.com/quinn-rs
| kouteiheika wrote:
| > Awesome, so you sponsored them right?
|
| Why bother sponsoring any open source projects when they can
| throw a few extra million into their CEO's salary, while that
| CEO is running their flagship product (Firefox) into the
| ground?
| Avamander wrote:
| They contributed in other ways?
| dochtman wrote:
| When I asked about financial support, the Senior Principal
| Software Engineer from Mozilla I talked to said "Mozilla has no
| money".
|
| To be fair, we've gotten a great amount of code contributions
| from the Mozilla folks, so it's not like they haven't
| contributed anything.
|
| (I am one of the Quinn maintainers.)
| philipallstar wrote:
| I really liked this. All Mozilla content should be like this.
| Technical content written by literate engineers. No alegria.
| brycewray wrote:
| https://bugzilla.mozilla.org/show_bug.cgi?id=1979683
|
| Still seeing this in Firefox with Cloudflare-hosted sites on both
| macOS and Fedora.
___________________________________________________________________
(page generated 2025-09-26 23:00 UTC)