[HN Gopher] QUIC for the kernel
       ___________________________________________________________________
        
       QUIC for the kernel
        
       Author : Bogdanp
       Score  : 174 points
       Date   : 2025-07-31 15:57 UTC (7 hours ago)
        
 (HTM) web link (lwn.net)
 (TXT) w3m dump (lwn.net)
        
       | WASDx wrote:
       | I recall this article on QUIC disadvantages:
       | https://www.reddit.com/r/programming/comments/1g7vv66/quic_i...
       | 
        | Seems like this is a step in the right direction to resolve some
       | of those issues. I suppose nothing is preventing it from getting
       | hardware support in future network cards as well.
        
         | miohtama wrote:
          | QUIC does not work very well for use cases like machine-to-
          | machine traffic. However, most traffic on the Internet today
          | is from mobile phones to servers, and that is where QUIC and
          | HTTP/3 shine.
         | 
         | For other use cases we can keep using TCP.
        
           | thickice wrote:
            | Why doesn't QUIC work well for machine-to-machine traffic?
            | Is it due to the lack of offloads/optimizations (which TCP
            | has), and that machine-to-machine traffic tends to be high
            | volume/high rate?
        
             | yello_downunder wrote:
             | QUIC would work okay, but not really have many advantages
             | for machine-to-machine traffic. Machine-to-machine you tend
             | to have long-lived connections over a pretty good network.
             | In this situation TCP already works well and is currently
             | handled better in the kernel. Eventually QUIC will probably
              | be just as good as TCP in this use case, but we're not
             | there yet.
        
               | jabart wrote:
               | You still have latency, legacy window sizes, and packet
               | schedulers to deal with.
        
               | spwa4 wrote:
                | But that is the huge advantage of QUIC: it does NOT
                | totally outcompete TCP traffic on shared links (we
                | already have BitTorrent over UDP for that purpose). They
                | redesigned the protocol 5 times or so to achieve that.
        
             | m00x wrote:
             | It's explained in the reddit thread. Most of it is because
             | you have to handle a ton of what TCP does in userland.
        
             | extropy wrote:
              | The NAT firewalls do not like P2P UDP traffic. The majority
              | of routers lack the smarts to pass QUIC through correctly;
              | they essentially need to treat it the same as TCP.
        
               | beeflet wrote:
               | NAT is the devil. bring on the IPoc4lypse
        
               | hdgvhicv wrote:
                | NAT is massively useful for all sorts of reasons which
                | have nothing to do with IP limitations.
        
               | unethical_ban wrote:
               | Rather, NAT is a bandage for all sorts of reasons besides
               | IP exhaustion.
               | 
               | Example: Janky way to get return routing for traffic when
               | you don't control enterprise routes.
               | 
               | Source: FW engineer
        
               | beeflet wrote:
               | sounds great but it fucks up P2P in residential
               | connections, where it is mostly used due to ipv4 address
               | conservation. You can still have nat in IPv6 but
               | hopefully I won't have to deal with it
        
               | paulddraper wrote:
                | The NAT RFC talks purely about IP exhaustion.
                | 
                | What do you have in mind?
        
             | dan-robertson wrote:
             | I think basically there is currently a lot of overhead and,
             | when you control the network more and everything is more
             | reliable, you can make tcp work better.
        
             | exabrial wrote:
             | For starters, why encrypt something literally in the same
              | datacenter 6 feet away? It adds significant latency and
              | processing overhead.
        
               | 20k wrote:
               | Because the NSA actively intercepts that traffic. There's
                | a reason why encryption is non-optional.
        
               | Karrot_Kream wrote:
               | To me this seems outlandish (e.g. if you're part of PRISM
               | you know what's happening and you're forced to comply.)
               | But to think through this threat model, you're worried
               | that the NSA will tap intra-DC traffic but not that it
               | will try to install software or hardware on your hosts to
                | spy on traffic at the NIC level? I guess it would be harder
               | to intercept and untangle traffic at the NIC level than
               | intra-DC, but I'm not sure?
        
               | viraptor wrote:
               | > you're worried that the NSA will tap intra-DC traffic
               | but not that it will try to install software or hardware
               | on your hosts
               | 
               | It doesn't have to be one or the other. We've known for
                | over a decade that the traffic between DCs was tapped:
                | https://www.theguardian.com/technology/2013/oct/30/google-
                | re... Extending that to intra-DC wouldn't be surprising
                | at all.
               | 
               | Meanwhile backdoored chips and firmware attacks are a
               | constant worry and shouldn't be discounted regardless of
               | the first point.
        
               | adgjlsfhk1 wrote:
                | The difference between tapping intra-DC links and spying
                | on the computer itself is that on-host spying is much
                | more likely to get caught and much less easily able to
                | get data out. There's a pretty big difference between
                | software/hardware weaknesses that require specific
                | targeting to exploit and passively scooping everything
                | up and scanning it.
        
               | exabrial wrote:
               | Imaginary problems are the funnest to solve.
        
               | cherryteastain wrote:
               | If you are concerned about this, how do you think you
               | could protect against AWS etc allowing NSA to snoop on
               | you from the hypervisor level?
        
               | switchbak wrote:
               | Service meshes often encrypt traffic that may be running
               | on the same physical host. Your security policy may
               | simply require this.
        
               | lll-o-lll wrote:
               | To stop or slow down the attacker who is inside your
               | network and trying to move horizontally? Isn't this the
               | principle of defense in depth?
        
               | sleepydog wrote:
               | Encryption gets you data integrity "for free". If a bit
               | is flipped by faulty hardware, the packet won't decrypt.
               | TCP checksums are not good enough for catching corruption
               | in many cases.
        
               | mschuster91 wrote:
               | Because any random machine in the same datacenter and
               | network segment might be compromised and do stuff like
               | running ARP spoofing attacks. Cisco alone has had so many
               | vendor-provided backdoors cropping up that I wouldn't
               | trust _anything_ in a data center with Cisco gear.
        
       | dahfizz wrote:
       | > QUIC is meant to be fast, but the benchmark results included
       | with the patch series do not show the proposed in-kernel
       | implementation living up to that. A comparison of in-kernel QUIC
       | with in-kernel TLS shows the latter achieving nearly three times
       | the throughput in some tests. A comparison between QUIC with
       | encryption disabled and plain TCP is even worse, with TCP winning
       | by more than a factor of four in some cases.
       | 
       | Jesus, that's bad. Does anyone know if userspace QUIC
       | implementations are also this slow?
        
         | klabb3 wrote:
         | Yes, they are. Worse, I've seen them shrink down to nothing in
          | the face of congestion with TCP traffic. If QUIC is indeed the
          | future protocol, it's a good thing to move it into the kernel
          | IMO. It's just madness to provide these massive userspace impls
          | everywhere, on a _packet switched_ protocol no less, and
         | expect it to beat good old TCP. Wouldn't surprise me if we need
         | optimizations all the way down to the NIC layer, and maybe even
         | middleboxes. Oh and I haven't even mentioned the CPU cost of
         | UDP.
         | 
         | OTOH, TCP is like a quiet guy at the gym who always wears baggy
         | clothes but does 4 plates on the bench when nobody is looking.
         | Don't underestimate. I wasted months to learn that lesson.
        
           | vladvasiliu wrote:
           | Why is QUIC being pushed, then?
        
             | dan-robertson wrote:
             | The problem it is trying to solve is not overhead of the
             | Linux kernel on a big server in a datacenter
        
             | favflam wrote:
             | I know in the p2p space, peers have to send lots of small
              | pieces of data. QUIC stops a whole stream from blocking on
              | a single delayed packet.
        
             | toast0 wrote:
             | It has good properties compared to tcp-in-tcp (http/2),
             | especially when connected to clients without access to
             | modern congestion control on iffy networks. http/2 was
             | perhaps adopted too broadly; binary protocol is useful,
             | header compression is useful (but sometimes dangerous), but
             | tcp multiplexing is bad, unless you have very low loss ...
             | it's not ideal for phones with inconsistent networking.
        
             | fkarg wrote:
              | because it _does_ provide a number of benefits (potentially
              | fewer initial round-trips, more dynamic routing control by
              | using UDP instead of TCP, etc), and right now it's a
              | userspace software implementation being compared against a
              | hardware-accelerated option.
             | 
             | QUIC getting hardware acceleration should close this gap,
             | and keep all the benefits. But a kernel (software)
             | implementation is basically necessary before it can be
             | properly hardware-accelerated in future hardware (is my
             | current understanding)
        
               | 01HNNWZ0MV43FF wrote:
               | To clarify, the userspace implementation is not a
               | benefit, it's just that you can't have a brand new
               | protocol dropped into a trillion dollars of existing
               | hardware overnight, you have to do userspace first as PoC
               | 
               | It does save 2 round-trips during connection compared to
               | TLS-over-TCP, if Wikipedia's diagram is accurate:
               | https://en.wikipedia.org/wiki/QUIC#Characteristics That
               | is a decent latency win on every single connection, and
               | with 0-RTT you can go further, but 0-RTT is stateful and
               | hard to deploy and I expect it will see very little use.
        
             | klabb3 wrote:
             | From what I understand the "killer app" initially was
             | because of mobile spotty networks. TCP is interface (and
             | IP) specific, so if you switch from WiFi to LTE the conn
             | breaks (or worse, degrades/times out slowly). QUIC has a
             | logical conn id that continues to work even when a peer
             | changes the path. Thus, your YouTube ads will not buffer.
             | 
              | Second, you have the reduced RTT, multiple streams
             | (prevents HOL blocking), datagrams (realtime video on same
             | conn) and you can scale buffers (in userspace) to avoid BDP
             | limits imposed by kernel. However.. I think in practice
             | those haven't gotten as much visibility and traction, so
             | the original reason is still the main one from what I can
             | tell.
        
               | wahern wrote:
               | MPTCP provides interface mobility. It's seen widespread
               | deployment with the iPhone, so network support today is
               | much better than one would assume. Unlike QUIC, the
               | changes required by applications are minimal to none. And
               | it's backward compatible; an application can request
               | MPTCP, but if the other end doesn't support it,
               | everything still works.
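                | 
                | For a sense of how small the application-side
                | opt-in is, here is a minimal sketch on Linux
                | (assuming a kernel and headers new enough for
                | IPPROTO_MPTCP; the fallback logic is my own, not
                | from any particular project):
                | 
                |   #include <errno.h>
                |   #include <netinet/in.h>
                |   #include <sys/socket.h>
                | 
                |   #ifndef IPPROTO_MPTCP
                |   #define IPPROTO_MPTCP 262 /* older headers lack it */
                |   #endif
                | 
                |   /* Ask for MPTCP; fall back to plain TCP if the
                |    * running kernel doesn't support it. */
                |   int open_stream_socket(void)
                |   {
                |       int fd = socket(AF_INET, SOCK_STREAM,
                |                       IPPROTO_MPTCP);
                |       if (fd < 0 && (errno == EPROTONOSUPPORT ||
                |                      errno == EINVAL))
                |           fd = socket(AF_INET, SOCK_STREAM, 0);
                |       return fd; /* connect()/send()/recv() unchanged */
                |   }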
        
         | dan-robertson wrote:
         | I think the 'fast' claims are just different. QUIC is meant to
         | make things fast by:
         | 
         | - having a lower latency handshake
         | 
         | - avoiding some badly behaved 'middleware' boxes between users
         | and servers
         | 
          | - avoiding resetting connections when user IP addresses change
         | 
         | - avoiding head of line blocking / the increased cost of many
         | connections ramping up
         | 
         | - avoiding poor congestion control algorithms
         | 
         | - probably other things too
         | 
         | And those are all things about working better with the kind of
         | network situations you tend to see between users (often on
         | mobile devices) and servers. I don't think QUIC was meant to be
         | fast by reducing OS overhead on sending data, and one should
         | generally expect it to be slower for a long time until
         | operating systems become better optimised for this flow and
         | hardware supports offloading more of the work. If you are
         | Google then presumably you are willing to invest in specialised
         | network cards/drivers/software for that.
        
           | dahfizz wrote:
           | Yeah I totally get that it optimizes for different things.
           | But the trade offs seem way too severe. Does saving one round
           | trip on the handshake mean anything at all if you're only
           | getting _one fourth_ of the throughput?
        
             | dan-robertson wrote:
             | Are you getting one fourth of the throughput? Aren't you
             | going to be limited by:
             | 
             | - bandwidth of the network
             | 
             | - how fast the nic on the server is
             | 
             | - how fast the nic on your device is
             | 
             | - whether the server response fits in the amount of data
             | that can be sent given the client's initial receive window
             | or whether several round trips are required to scale the
             | window up such that the server can use the available
             | bandwidth
        
             | brokencode wrote:
             | Maybe it's a fourth as fast in ideal situations with a fast
             | LAN connection. Who knows what they meant by this.
             | 
             | It could still be faster in real world situations where the
             | client is a mobile device with a high latency, lossy
             | connection.
        
             | eptcyka wrote:
             | There are claims of 2x-3x operating costs on the server
             | side to deliver better UX for phone users.
        
             | yello_downunder wrote:
             | It depends on the use case. If your server is able to
             | handle 45k connections but 42k of them are stalled because
             | of mobile users with too much packet loss, QUIC could look
             | pretty attractive. QUIC is a solution to some of the
             | problematic aspects of TCP that couldn't be fixed without
             | breaking things.
        
               | drewg123 wrote:
               | The primary advantage of QUIC for things like congestion
               | control is that companies like Google are free to
               | innovate both sides of the protocol stack (server in
               | prod, client in chrome) simultaneously. I believe that
               | QUIC uses BBR for congestion control, and the major
               | advantage that QUIC has is being able to get a bit more
               | useful info from the client with respect to packet loss.
               | 
               | This could be achieved by encapsulating TCP in UDP and
               | running a custom TCP stack in userspace on the client.
               | That would allow protocol innovation without throwing
               | away 3 decades of optimizations in TCP that make it 4x as
               | efficient on the server side.
        
           | jeroenhd wrote:
           | > - avoiding some badly behaved 'middleware' boxes between
           | users and servers
           | 
           | Surely badly behaving middleboxes won't just ignore UDP
           | traffic? If anything, they'd get confused about udp/443 and
           | act up, forcing clients to fall back to normal TCP.
        
             | zamadatix wrote:
             | Your average middlebox will just NAT UDP (unless it's
             | outright blocked by security policy) and move on. It's TCP
             | where many middleboxes think they can "help" the congestion
             | signaling, latch more deeply into the session information,
             | or worse. Unencrypted protocols can have further
             | interference under either TCP or UDP beyond this note.
             | 
             | QUIC is basically about taking all of the information
             | middleboxes like to fuck with in TCP, putting it under the
             | encryption layer, and packaging it back up in a UDP packet
             | precisely so it's either just dropped or forwarded. In
             | practice this (i.e. QUIC either being just dropped or left
             | alone) has actually worked quite well.
        
         | rayiner wrote:
         | It's an interesting testament to how well designed TCP is.
        
           | adgjlsfhk1 wrote:
           | IMO, it's more a testament to how fast hardware designers can
           | make things with 30 years to tune.
        
         | eptcyka wrote:
          | QUIC performance requires careful use of batching. Using UDP
          | sockets naively, i.e. sending one QUIC packet per syscall,
          | will incur a lot of overhead - every time, the kernel has to
          | figure out which interface to use, queue the packet up on a
          | buffer, and all the rest. If one uses it like TCP, batching up
          | lots of data and enqueuing packets in one "call" helps a ton.
          | Similarly, the kernel wireguard implementation can be slower
          | than wireguard-go since it doesn't batch traffic. At the
          | speeds offered by modern hardware, we really need to use
          | vectored I/O to be efficient.
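          | 
          | As a rough sketch of that batching on Linux with plain UDP
          | sockets (sendmmsg(2) is the real syscall; the helper, the
          | batch size and the buffer layout are assumptions, not taken
          | from any particular QUIC stack):
          | 
          |   #define _GNU_SOURCE
          |   #include <string.h>
          |   #include <sys/socket.h>
          |   #include <sys/uio.h>
          | 
          |   /* Flush up to 64 already-built QUIC packets with a
          |    * single kernel crossing instead of one sendto() per
          |    * packet. 'fd' is a connected UDP socket and each iovec
          |    * in 'pkts' holds one datagram. */
          |   int flush_batch(int fd, struct iovec *pkts, unsigned n)
          |   {
          |       struct mmsghdr batch[64];
          | 
          |       if (n > 64)
          |           n = 64;
          |       for (unsigned i = 0; i < n; i++) {
          |           memset(&batch[i], 0, sizeof(batch[i]));
          |           batch[i].msg_hdr.msg_iov = &pkts[i];
          |           batch[i].msg_hdr.msg_iovlen = 1;
          |       }
          |       /* Returns how many packets were actually queued. */
          |       return sendmmsg(fd, batch, n, 0);
          |   }
          | 
          | (UDP GSO, the UDP_SEGMENT socket option, is the other common
          | trick: hand the kernel one large buffer and let it slice it
          | into MTU-sized packets.)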
        
         | Veserv wrote:
         | Yes. msquic is one of the best performing implementations and
         | only achieves ~7 Gbps [1]. The benchmarks for the Linux kernel
         | implementation only get ~3 Gbps to ~5 Gbps with encryption
         | disabled.
         | 
         | To be fair, the Linux kernel TCP implementation only gets ~4.5
          | Gbps at normal packet sizes and still only achieves ~24 Gbps
         | with large segmentation offload [2]. Both of which are
         | ridiculously slow. It is straightforward to achieve ~100
         | Gbps/core at normal packet sizes without segmentation offload
         | with the same features as QUIC with a properly designed
         | protocol and implementation.
         | 
         | [1] https://microsoft.github.io/msquic/
         | 
         | [2]
         | https://lwn.net/ml/all/cover.1751743914.git.lucien.xin@gmail...
        
         | 0x457 wrote:
         | I would expect that a protocol such as TCP performs much better
         | than QUIC in benchmarks. Now do a realistic benchmark over
         | roaming LTE connection and come back with the results.
         | 
         | Without seeing actual benchmark code, it's hard to tell if you
         | should even care about that specific result.
         | 
          | If your goal is to pipe lots of bytes from A to B over an
          | internal network or the public internet, there probably aren't
          | many things, if any, that can outperform TCP. Decades were
          | spent optimizing TCP for that. If HOL blocking isn't an issue
          | for you, then you can keep using HTTP over TCP.
        
       | jeffbee wrote:
       | This seems to be a categorical error, for reasons that are
       | contained in the article itself. The whole appeal of QUIC is
       | being immune to ossification, being free to change parameters of
       | the protocol without having to beg Linux maintainers to agree.
        
         | corbet wrote:
         | Ossification does not come about from the decisions of "Linux
         | maintainers". You need to look at the people who design, sell,
         | and deploy middleboxes for that.
        
           | jeffbee wrote:
           | I disagree. There is _plenty_ of ossification coming from
           | inside the house. Just some examples off the top of my head
           | are the stuck-in-1974 minimum RTO and ack delay time
           | parameters, and the unwillingness to land microsecond
           | timestamps.
        
             | otterley wrote:
             | Not a networking expert, but does TCP in IPv6 suffer the
             | same maladies?
        
               | pumplekin wrote:
               | Yes.
               | 
               | Layer4 TCP is pretty much just slapped on top of Layer3
               | IPv4 or IPv6 in exactly the same way for both of them.
               | 
               | Outside of some little nitpicky things like details on
               | how TCP MSS clamping works, it is basically the same.
        
               | ComputerGuru wrote:
               | ...which is basically how it's supposed to work (or how
               | we teach that it's supposed to work). (Not that you said
               | anything to the contrary!)
        
           | 0xbadcafebee wrote:
           | The "middleboxes" excuse for not improving (or replacing)
           | protocols in the past was horseshit. If a big incumbent
           | player in the networking world releases a new feature that
           | everyone wants (but nobody else has), everyone else
           | (including 'middlebox' vendors) will bend over backwards to
           | support it, because if you don't your competitors will and
           | then you lose business. It was never a technical or
           | logistical issue, it was an economic and supply-demand issue.
           | 
           | To prove it:
           | 
           | 1. Add a new OSI Layer 4 protocol called "QUIC" and give it a
           | new protocol number, and just for fun, change the UDP frame
           | header semantics so it can't be confused for UDP.
           | 
           | 2. Then release kernel updates to support the new protocol.
           | 
           | Nobody's going to use it, right? Because internet routers,
           | home wireless routers, servers, shared libraries, etc would
           | all need their TCP/IP stacks updated to support the new
           | protocol. If we can't ship it over a weekend, it takes too
           | long!
           | 
           | But wait. What if ChatGPT/Claude/Gemini/etc _only_ supported
           | communication over that protocol? You know what would happen:
           | every vendor in the world would backport firmware patches
           | overnight, bending over backwards to support it. Because they
           | can _smell_ the money.
        
         | toast0 wrote:
         | IMHO, you likely want the server side to be in the kernel, so
         | you can get to performance similar to in-kernel TCP, and
         | ossification is less of a big deal, because it's "easy" to
         | modify the kernel on the server side.
         | 
         | OTOH, you want to be in user land on the client, because
         | modifying the kernel on clients is hard. If you were Google,
         | maybe you could work towards a model where Android clients
         | could get their in-kernel protocol handling to be something
         | that could be updated regularly, but that doesn't seem to be
         | something Google is willing or able to do; Apple and Microsoft
         | can get priority kernel updates out to most of their users
         | quickly; Apple also can influence networks to support things
         | they want their clients to use (IPv6, MP-TCP). </rant>
         | 
         | If you were happy with congestion control on both sides of TCP,
         | and were willing to open multiple TCP connections like http/1,
         | instead of multiplexing requests on a single connection like
         | http/2, (and maybe transfer a non-pessimistic bandwidth
         | estimate between TCP connections to the same peer), QUIC still
         | gives you control over retransmission that TCP doesn't, but I
         | don't think that would be compelling enough by itself.
         | 
         | Yes, there's still ossification in middle boxes doing TCP
         | optimization. My information may be old, but I was under the
          | impression that nobody does that for IPv6, so the push for v6
          | is both a way to avoid NAT (especially CGNAT) and a way to
          | avoid optimizer boxes, as a benefit for both network providers
          | (less expense) and services (less frustration).
        
           | jeffbee wrote:
           | This is a perspective, but just one of many. The overwhelming
           | majority of IP flows are within data centers, not over
           | planet-scale networks between unrelated parties.
        
             | toast0 wrote:
             | I've never been convinced by an explanation of how QUIC
             | applies for flows in the data center.
             | 
             | Ossification doesn't apply (or it shouldn't, IMHO, the
             | point of Open Source software is that you can change it to
             | fit your needs... if you don't like what upstream is doing,
             | you _should_ be running a local fork that does what you
              | want... yeah, it's nicer if it's upstreamed, but try
             | running a local fork of Windows or MacOS); you can make
             | congestion control work for you when you control both
             | sides; enterprise switches and routers aren't messing with
             | tcp flows. If you're pushing enough traffic that this is an
             | issue, the cost of QUIC seems _way_ too high to justify,
             | even if it helps with some issues.
        
               | jeffbee wrote:
               | I don't see why this exception to the end-to-end
               | principle should exist. At the scale of single hosts
               | today, with hundreds of CPUs and hundreds of tenants in a
               | single system sharing a kernel, the kernel itself becomes
               | an unwanted middlebox.
        
             | jeroenhd wrote:
             | Unless you're using QUIC as some kind of datacenter-to-
             | datacenter protocol (basically as SCTP on steroids with
             | TLS), I don't think QUIC in the datacenter makes much sense
             | at all.
             | 
             | As very few server administrators bother turning on
             | features like MPTCP, QUIC has an advantage on mobile phones
             | with moderate to bad reception. That's not a huge issue for
             | me most of the time, but billions of people are using
             | mobile phones as their only access to the internet,
             | especially in developing countries that are practically
             | skipping widespread copper and fiber infrastructure and
             | moving directly to 5G instead. Any service those people are
             | using should probably consider implementing QUIC, and if
             | they use it, they'd benefit from an in-kernel server.
             | 
             | All the data center operators can stick to (MP)TCP, the
             | telco people can stick to SCTP, but the consumer facing
             | side of the internet would do well to keep QUIC as an
             | option.
        
               | mschuster91 wrote:
               | > That's not a huge issue for me most of the time, but
               | billions of people are using mobile phones as their only
               | access to the internet, especially in developing
               | countries that are practically skipping widespread copper
               | and fiber infrastructure and moving directly to 5G
               | instead.
               | 
               | For what it's worth: Romania, one of the piss poorest
               | countries of Europe, has a perfectly fine mobile phone
               | network, and even outback small villages have XGPON fiber
               | rollouts everywhere. Germany? As soon as you cross into
               | the country from Austria, your phone signal instantly
               | drops, barely any decent coverage outside of the cities.
               | And forget about PON, much less GPON or even XGPON.
               | 
               | Germany should be considered a developing country when it
               | comes to expectations around telecommunication.
        
           | ComputerGuru wrote:
            | One thing is that the choice of congestion control is sort
            | of cursed: it assumes your box/side is the one being
            | switched while the majority of the rest of the internet
            | continues with legacy limitations (aside from DCTCP, which
            | is designed for intra-datacenter usage). That is an
            | essential part of the question, given that the
            | resultant/emergent network behavior changes drastically
            | depending on whether or not all sides are using the same
            | algorithm. (Cubic is technically another sort-of-exception,
            | at least since it became the default Linux CC algorithm,
            | but even then you're still dealing with all sorts of
            | middleware with legacy and/or pathological stateful
            | behavior you can't control.)
        
         | Karrot_Kream wrote:
         | Do you think putting QUIC in the kernel will significantly
         | ossify QUIC? If so, how do you want to deal with the
         | performance penalty for the actual syscalls needed? Your
         | concern makes sense to me as the Linux kernel moves slower than
         | userspace software and middleboxes sometimes never update their
         | kernels.
        
         | GuB-42 wrote:
         | The _protocol_ itself is resistant to ossification, no matter
         | how it is implemented.
         | 
          | This is mostly achieved by using encryption, which is a reason
          | why it is such an important and mandatory part of the protocol.
         | The idea is to expose as little as possible of the protocol
         | between the endpoints, the rest is encrypted, so that
         | "middleboxes" can't look at the packet and do funny things
         | based on their own interpretation of the protocol stack.
         | 
          | Endpoints can still do whatever they want, and ossification can
          | still happen, but it helps against ossification at the
          | infrastructure level, which is the worst. Updating the Linux
          | kernel on your server is easier than changing the proprietary
          | hardware that makes up the network backbone.
         | 
         | The use of UDP instead of doing straight QUIC/IP is also an
         | anti-ossification technique, as your app can just use UDP and a
         | userland library regardless of the QUIC kernel implementation.
          | In theory you could do that with raw sockets too, but that's
          | much more problematic: because you don't have ports, you need
          | the entire interface for yourself, and often root access.
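          | 
          | A minimal illustration of that difference (ordinary socket
          | calls; the raw-socket protocol number below is made up):
          | 
          |   #include <netinet/in.h>
          |   #include <sys/socket.h>
          | 
          |   void open_transports(void)
          |   {
          |       /* A userland QUIC library only needs a normal UDP
          |        * socket: any unprivileged process can open one and
          |        * it gets its own port. */
          |       int udp_fd = socket(AF_INET, SOCK_DGRAM, 0);
          | 
          |       /* A hypothetical QUIC-directly-over-IP transport
          |        * would need a raw socket instead: no port
          |        * demultiplexing, and it fails with EPERM unless the
          |        * process has CAP_NET_RAW or runs as root. */
          |       int raw_fd = socket(AF_INET, SOCK_RAW, 200);
          | 
          |       (void)udp_fd;
          |       (void)raw_fd;
          |   }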
        
       | darksaints wrote:
       | For the love of god, can we please move to microkernel-based
        | operating systems already? We're adding a million lines of code
        | to the Linux kernel _every year_. That's so much attack surface
        | area. We're setting ourselves up for a Kessler syndrome of sorts
        | with every system that we add to the kernel.
        
         | mdavid626 wrote:
          | Most of that code is not loaded into the kernel; it is only
          | loaded when needed.
        
           | darksaints wrote:
           | True, but the last time I checked (several years ago), the
           | size of the portion of code that is not drivers or kernel
           | modules was still 7 million lines of code, and the average
           | system still has to load a few million more via kernel
           | modules and drivers. That is still a phenomenally large
           | attack surface.
           | 
           | The SeL4 kernel is 10k lines of code. OKL4 is 13k. QNX is
           | ~30k.
        
             | arp242 wrote:
             | Can I run Firefox or PostgreSQL with reasonable performance
             | on SeL4, OKL4, or QNX?
        
               | doubled112 wrote:
               | Reasonable performance includes GPU acceleration for both
               | rendering and decoding media, right?
        
               | 0x457 wrote:
               | yes
        
           | regularfry wrote:
            | You've still got a combinatorial complexity problem though,
           | because you never know what a specific user is going to load.
        
             | beeflet wrote:
             | Often you do know what a specific user is going to load
        
         | wosined wrote:
          | I might be wrong, but microkernels also need drivers, so the
          | attack surface would be the same, or not?
        
           | kaoD wrote:
            | You're not wrong, but monolithic kernel drivers run at a
            | privilege level that's even higher than root (ring 0), while
            | microkernels run them in userspace, so they're only as
            | dangerous as a normal program.
        
             | pessimizer wrote:
             | _" Just think of the power of ring-0, muhahaha! Think of
             | the speed and simplicity of ring-0-only and identity-
             | mapping. It can change tasks in half a microsecond because
             | it doesn't mess with page tables or privilege levels.
             | Inter-process communication is effortless because every
             | task can access every other task's memory.
             | 
             | "It's fun having access to everything."_
             | 
             | -- Terry A. Davis
        
               | beeflet wrote:
               | > Inter-process communication is effortless because every
               | task can access every other task's memory.
               | 
               | I think this would get messy quick in an OS designed by
               | more than one person
        
         | 01HNNWZ0MV43FF wrote:
         | Redox is a microkernel written in Rust
        
       | valorzard wrote:
       | Would this (eventually) include the unreliable datagram
       | extension?
        
         | wosined wrote:
         | Don't know if it could get faster than UDP if it is on top of
         | it.
        
           | valorzard wrote:
           | The use case for this would be running a multiplayer game
           | server over QUIC
        
           | 01HNNWZ0MV43FF wrote:
           | Other use cases include video / audio streaming, VPNs over
           | QUIC, and QUIC-over-QUIC (you never know)
        
       | wosined wrote:
       | The general web is slowed down by bloated websites. But I guess
       | this can make game latency lower.
        
         | fmbb wrote:
         | https://en.m.wikipedia.org/wiki/Jevons_paradox
         | 
         | The Jevons Paradox is applicable in a lot of contexts.
         | 
         | More efficient use of compute and communications resources will
         | lead to higher demand.
         | 
         | In games this is fine. We want more, prettier, smoother,
         | pixels.
         | 
         | In scientific computing this is fine. We need to know those
         | simulation results.
         | 
         | On the web this is not great. We don't want more ads, tracking,
         | JavaScript.
        
           | 01HNNWZ0MV43FF wrote:
            | No, the last 20 years of browser improvements have made my
           | static site incredibly fast!
           | 
           | I'm benefiting from WebP, JS JITs, Flexbox, zstd, Wasm, QUIC,
           | etc, etc
        
       | Ericson2314 wrote:
       | What will the socket API look like for multiple streams? I guess
       | it is implied it is the same as multiple connections, with
       | caching behind the scenes.
       | 
       | I would hope for something more explicit, where you get a
       | connection object and then open streams from it, but I guess that
       | is fine for now.
       | 
       | https://github.com/microsoft/msquic/discussions/4257 ah but look
       | at this --- unless this is an extension, the _server_ side can
       | also create new streams, once a connection is established. The
       | client creating new  "connections" (actually streams) cannot
       | abstract over this. Something fundamentally new _is_ needed.
       | 
        | My guess is recvmsg to get a new file descriptor for a new stream.
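        | 
        | Purely as a sketch of that guess (every constant and name below
        | is invented for illustration, not taken from the patch set or
        | the draft API), a peer-opened stream could surface as ancillary
        | data carrying a fresh fd, much like SCM_RIGHTS fd passing:
        | 
        |   #include <string.h>
        |   #include <sys/socket.h>
        |   #include <sys/uio.h>
        | 
        |   /* Hypothetical placeholders, not real kernel constants. */
        |   #define SOL_QUIC           288
        |   #define QUIC_NEW_STREAM_FD 1
        | 
        |   /* Wait for the peer to open a stream on conn_fd; return a
        |    * per-stream fd that is read/written like a TCP socket. */
        |   int accept_stream(int conn_fd)
        |   {
        |       char buf[2048], cbuf[CMSG_SPACE(sizeof(int))];
        |       struct iovec iov = { buf, sizeof(buf) };
        |       struct msghdr msg = { 0 };
        |       struct cmsghdr *c;
        |       int stream_fd;
        | 
        |       msg.msg_iov = &iov;
        |       msg.msg_iovlen = 1;
        |       msg.msg_control = cbuf;
        |       msg.msg_controllen = sizeof(cbuf);
        |       if (recvmsg(conn_fd, &msg, 0) < 0)
        |           return -1;
        |       for (c = CMSG_FIRSTHDR(&msg); c != NULL;
        |            c = CMSG_NXTHDR(&msg, c)) {
        |           if (c->cmsg_level == SOL_QUIC &&
        |               c->cmsg_type == QUIC_NEW_STREAM_FD) {
        |               memcpy(&stream_fd, CMSG_DATA(c),
        |                      sizeof(stream_fd));
        |               return stream_fd;
        |           }
        |       }
        |       return -1;
        |   }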
        
         | gte525u wrote:
          | I would look at the SCTP socket API; it supports multistreaming.
        
           | Ericson2314 wrote:
           | I checked that out and....yuck!
           | 
            | - Send specifies which stream by _ordinal number_? (Can't
           | have different parts of a concurrent app independently open
           | new streams)
           | 
           | - Receive doesn't specify which stream at all?!
        
           | wahern wrote:
           | API RFC is https://datatracker.ietf.org/doc/html/draft-lxin-
           | quic-socket...
        
             | Ericson2314 wrote:
             | Ah fuck, it still has a stream_id notion
             | 
             | How are socket APIs always such garbage....
        
       | Bender wrote:
       | I don't know about using it in the kernel but I would love to see
       | OpenSSH support QUIC so that I get some of the benefits of Mosh
       | [1] while still having all the features of OpenSSH including
        | SFTP, SOCKS, port forwarding, fewer state-table and keep-alive
       | issues, roaming support, etc... Could OpenSSH leverage the kernel
       | support?
       | 
       | [1] - https://mosh.org/
        
         | wmf wrote:
         | SSH would need a lot of work to replace its crypto and mux
         | layers with QUIC. It's probably worth starting from scratch to
         | create a QUIC login protocol. There are a bunch of different
         | approaches to this in various states of prototyping out there.
        
           | Bender wrote:
           | Fair points. I suppose Mosh would be the proper starting
           | point then. I'm just selfish and want the benefits of QUIC
           | without losing all the really useful features of OpenSSH.
        
         | bauruine wrote:
          | OpenSSH is an OpenBSD project, so I guess a Linux API isn't
          | that interesting, but I could be wrong ofc.
        
           | Bender wrote:
           | That's a good point. At least it would not be an entirely new
           | idea. [1] Curious what reactions he received.
           | 
           | [1] - https://papers.freebsd.org/2022/eurobsdcon/jones-
           | making_free...
        
       | kibwen wrote:
       | I'm confused, I thought the revolution of the past decade or so
       | was in moving network stacks to userspace for better performance.
        
         | michaelsshaw wrote:
         | The constant mode switching for hardware access is slow. TCP/IP
          | remains in the kernel for Windows and Linux.
        
         | wmf wrote:
         | Performance comes from dedicating core(s) to polling, not from
         | userspace.
        
         | shanemhansen wrote:
         | You are right but it's confusing because there are two
         | different approaches. I guess you could say both approaches
         | improve performance by eliminating context switches and system
         | calls.
         | 
         | 1. Kernel bypass combined with DMA and techniques like
         | dedicating a CPU to packet processing improve performance.
         | 
         | 2. What I think of as "removing userspace from the data plane"
         | improves performance for things like sendfile and ktls.
         | 
          | To your point, QUIC in the kernel seems to not have either
         | advantage.
        
           | FuriouslyAdrift wrote:
           | So... RDMA?
        
             | michaelsshaw wrote:
              | No, the first technique describes the basic way NICs
              | already operate (DMA), but with the zero-copy buffers
              | exposed directly to userspace. This is still handled by
              | the OS.
              | 
              | RDMA goes directly from bus to bus, bypassing all the
              | software.
        
         | zamalek wrote:
         | What is done for that is userspace gets the network data
         | directly without (I believe) involving syscalls. It's not
         | something you'd do for end-user software, only the likes of
         | MOFAANG need it.
         | 
         |  _In theory_ the likes of io_uring would bring these benefits
            | across the board, but we haven't seen that delivered (yet, I
         | remain optimistic).
        
           | phlip9 wrote:
           | I'm hoping we get there too with io_uring. It looks like the
            | last few kernel releases have made a lot of progress with
           | zero-copy TCP rx/tx, though NIC support is limited and you
           | need some finicky network iface setup to get the flow
           | steering working
           | 
           | https://docs.kernel.org/networking/iou-zcrx.html
        
         | 0xbadcafebee wrote:
         | Networking is much faster in the kernel. Even faster on an
         | ASIC.
         | 
         | Network stacks were moved to userspace because Google wanted to
         | replace TCP itself (and upgrade TLS), but it only cared about
         | the browser, so they just put the stack _in_ the browser, and
         | problem solved.
        
         | Karrot_Kream wrote:
         | You still need to offload your bytes to a NIC buffer. Either
         | you can do something like DMA where you get privileged space to
         | write your bytes to that the NIC reads from or you have to
         | cross the syscall barrier and have your kernel write the bytes
         | into the NIC's buffer. Crossing the syscall barrier adds a huge
         | performance penalty due to the switch in memory space and
         | privilege rings so userspace networking only makes sense if
         | you're not having to deal with the privilege changes or you
         | have DMA.
        
           | Veserv wrote:
            | That is only a problem if you do one or more syscalls per
            | packet, which is an utterly bone-headed design.
           | 
           | The copy itself is going at 200-400 Gbps so writing out a
           | standard 1,500 byte (12,000 bit) packet takes 30-60 ns (in
           | steady state with caches being prefetched). Of course you get
           | slaughtered if you stupidly do a syscall (~100 ns hardware
           | overhead) per packet since that is like 300% overhead. You
           | just batch like 32 packets so the write time is ~1,000-2,000
           | ns then your overhead goes from 300% to 10%.
           | 
           | At a 1 Gbps throughput, that is ~80,000 packets per second or
           | one packet per ~12.5 us. So, waiting for a 32 packet batch
            | only adds an additional 500 us to your end-to-end latency in
           | return for 4x efficiency (assuming that was your bottleneck;
           | which it is not for these implementations as they are nowhere
           | near the actual limits). If you go up to 10 Gbps, that is
           | only 50 us of added latency, and at 100 Gbps you are only
           | looking at 5 us of added latency for a literal 4x efficiency
           | improvement.
        
         | toast0 wrote:
         | Most QUIC stacks are built upon in-kernel UDP. You get
         | significant performance benefits if you can avoid your traffic
         | going through kernel and userspace and the context switches
         | involved.
         | 
         | You can work that angle by moving networking into user space...
         | setting up the NIC queues so that user space can access them
          | directly, without needing to context switch into the kernel.
         | 
         | Or you can work the angle by moving networking into kernel
         | space ... things like sendfile which let a tcp application
         | instruct the kernel to send a file to the peer without needing
         | to copy the content into userspace and then back into kernel
          | space and finally into the device memory. If you have in-kernel
         | TLS with sendfile then you can continue to skip copying to
         | userspace; if you have NIC based TLS, the kernel doesn't need
         | to read the data from the disk; if you have NIC based TLS and
         | the disk can DMA to the NIC buffers, the data doesn't need to
         | even hit main memory. Etc
         | 
         | But most QUIC stacks don't get benefit from either side of
         | that. They're reading and writing packets via syscalls, and
         | they're doing all the packetization in user space. No chance to
         | sendfile and skip a context switch and skip a copy. Batching io
         | via io_uring or similar helps with context switches, but
         | probably doesn't prevent copies.
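          | 
          | For reference, a rough sketch of the in-kernel TLS transmit
          | path plus sendfile on Linux (the key material would come from
          | a userspace handshake, e.g. a TLS library; error handling and
          | the key plumbing are elided, and the guarded defines are only
          | there for older userspace headers):
          | 
          |   #include <linux/tls.h>
          |   #include <netinet/in.h>
          |   #include <netinet/tcp.h>
          |   #include <sys/sendfile.h>
          |   #include <sys/socket.h>
          | 
          |   #ifndef TCP_ULP
          |   #define TCP_ULP 31
          |   #endif
          |   #ifndef SOL_TLS
          |   #define SOL_TLS 282
          |   #endif
          | 
          |   /* Hand the TLS record layer of an established connection
          |    * to the kernel, then let sendfile() stream the file
          |    * without the data being copied through userspace. */
          |   int send_file_ktls(int sock, int file_fd, size_t len,
          |       const struct tls12_crypto_info_aes_gcm_128 *tx_keys)
          |   {
          |       if (setsockopt(sock, IPPROTO_TCP, TCP_ULP, "tls",
          |                      sizeof("tls")) < 0)
          |           return -1;
          |       if (setsockopt(sock, SOL_TLS, TLS_TX, tx_keys,
          |                      sizeof(*tx_keys)) < 0)
          |           return -1;
          |       /* The kernel now builds encrypted TLS records itself;
          |        * with NIC TLS offload the same calls apply and the
          |        * crypto moves onto the card. */
          |       return sendfile(sock, file_fd, NULL, len) < 0 ? -1 : 0;
          |   }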
        
       | qwertox wrote:
       | I recently had to add `ssl_preread_server_name` to my NGINX
       | configuration in order to `proxy_pass` requests for certain
       | domains to another NGINX instance. In this setup, the first
       | instance simply forwards the raw TLS stream (with
       | `proxy_protocol` prepended), while the second instance handles
       | the actual TLS termination.
       | 
       | This approach works well when implementing a failover mechanism:
       | if the default path to a server goes down, you can update DNS A
       | records to point to a fallback machine running NGINX. That
       | fallback instance can then route requests for specific domains to
       | the original backend over an alternate path without needing to
       | replicate the full TLS configuration locally.
       | 
       | However, this method won't work with HTTP/3. Since HTTP/3 uses
       | QUIC over UDP and encrypts the SNI during the handshake,
       | `ssl_preread_server_name` can no longer be used to route based on
       | domain name.
       | 
       | What alternatives exist to support this kind of SNI-based routing
       | with HTTP/3? Is the recommended solution to continue using
       | HTTP/1.1 or HTTP/2 over TLS for setups requiring this behavior?
        
         | jcgl wrote:
         | Hm, that's a good question. I suppose the same would apply to
         | TCP+TLS with Encrypted Client Hello as well, right? Presumably
         | the answer would be the same/similar between the two.
        
       | jbritton wrote:
       | The article didn't discuss ACK. I have often wondered if it makes
       | sense for the protocol to not have ACKs, and to leave that up to
       | the application layer. I feel like the application layer has to
       | ensure this anyway, so I don't know how much benefit it is to
       | additionally support this at a lower layer.
        
       | xgpyc2qp wrote:
        | Looks good. QUIC is a real game changer for many. The Internet
        | should be a little faster with it. Probably we will not care
        | because of 5G, but it's still valuable. I'm wondering why there
        | are two separate handshakes; I was thinking that QUIC embeds
        | TLS, but it seems like I am wrong.
        
       | another_twist wrote:
        | I have a question - the bottleneck for TCP is said to be the
        | handshake. But that can be solved by reusing connections and/or
        | multiplexing. The current QUIC implementation is 3-4x slower than
        | the in-kernel TCP implementation, and the performance gap is
        | expected to close.
        | 
        | If speed is touted as the advantage of QUIC and it is in fact
        | slower, why bother with this protocol? The author of the PR
        | itself attributes some of the speed issues to the protocol
        | design. Are there other problems in TCP that need fixing?
        
         | morning-coffee wrote:
         | That's just one bottleneck. The other issue is head-of-line
         | blocking. When there is packet loss on a TCP connection,
         | nothing sent after that is delivered until the loss is
         | repaired.
        
           | anonymousiam wrote:
           | TCP windowing fixes the issue you are describing. Make the
           | window big and TCP will keep sending when there is a packet
           | loss. It will also retry and usually recover before the end
           | of the window is reached.
           | 
           | https://en.wikipedia.org/wiki/TCP_window_scale_option
        
             | quietbritishjim wrote:
             | The statement in the comment you're replying to is still
             | true. While waiting for those missed packets, the later
             | packets will not be dropped if you have a large window
             | size. But they won't be delivered either. They'll be cached
              | in the kernel, even though it may be that the application
             | could make use of them before the earlier blocked packet.
        
             | morning-coffee wrote:
             | They are unrelated. Larger windows help achieve higher
             | throughput over paths with high delay. You allude to
             | selective acknowledgements as a way to repair loss before
             | the window completely drains which is true, but my point is
             | that no data can be _delivered_ to the application until
             | the loss is repaired (and that repair takes at least a
             | round-trip time). (Then the follow-on effects from noticed
             | loss on the congestion controller can limit subsequent in-
             | flight data for a time, etc, etc.)
        
             | Twirrim wrote:
             | The queuing discipline used by default (pfifo_fast) is
             | barely more than 3 FIFO queues bundled together. The 3
             | queues allow for a barest minimum semblance of
             | prioritisation of traffic, where Queue 0 > 1 > 2, and you
             | can tweak some tcp parameters to have your traffic land in
             | certain queues. If there's something in queue 0 it must be
             | processed first _before_ anything in queue 1 gets touched
             | etc.
             | 
             | Those queues operate purely head-of-queue basis. If what is
             | at the top of the queue 0 is blocked in any way, the whole
             | queue behind it gets stuck, regardless of if it is talking
             | to the same destination, or a completely different one.
             | 
             | I've seen situations where a glitching network card caused
             | some serious knock on impacts across a whole cluster,
             | because the card would hang or packets would drop, and that
             | would end up blocking the qdisc on a completely healthy
             | host that was in the middle of talking to it, which would
             | have impacts on any other host that happened to be talking
             | to that _healthy_ host. A tiny glitch caused much wider
              | impacts than you'd expect.
             | 
             | The same kind of effect would happen from a VM that went
             | through live migration. The tiny, brief pause would cause a
             | spike of latency all over the place.
             | 
              | There are alternatives like fq_codel that can be used,
              | which can mitigate some of this, but you do have to pay a
              | small amount of processing overhead on every packet,
              | because now you have a queuing discipline that actually
              | needs to track some semblance of state.
        
           | another_twist wrote:
            | What's the packet loss rate on modern networks? Curious.
        
             | reliablereason wrote:
              | That depends on how much data you are pushing. If you are
              | pushing 200 Mb/s down a 100 Mb/s line you will get 50%
              | packet loss.
        
               | spwa4 wrote:
               | Well, yes, that's the idea behind TCP itself, but a
               | "normal" rate of packet loss is something along the lines
               | of 5/100k packets dropped on any given long-haul link.
                | Let's say a random packet passes about 8 such links, so a
                | "normal" rate of packet loss is 0.04% or so.
        
             | wmf wrote:
             | It can be high on cellular.
        
             | adgjlsfhk1 wrote:
             | ~80% when you step out of wifi range on your cell phone.
        
         | jauntywundrkind wrote:
         | The article discusses many of the reasons QUIC is currently
         | slower. Most of them seem to come down to "we haven't done any
         | optimization for this yet".
         | 
         | > _Long offers some potential reasons for this difference,
         | including the lack of segmentation offload support on the QUIC
         | side, an extra data copy in transmission path, and the
         | encryption required for the QUIC headers._
         | 
         | All of these three reasons seem potentially very addressable.
         | 
         | It's worth noting that the benchmark here is on pristine
         | network conditions, a drag race if you will. If you are on
         | mobile, your network will have a lot more variability, and
         | there TCP's design limits are going to become much more
         | apparent.
         | 
         | TCP itself often has protocols run on top of it, to do QUIC
         | like things. HTTP/2 is an example of this. So when you compare
         | QUIC and TCP, it's kind of like comparing how fast a car goes
         | with how fast an engine bolted to a frame with wheels on it
         | goes. QUIC goes significantly up the OSI network stack, is
          | layer 5+, whereas TCP+TLS is layer 3. That's less system
         | design.
         | 
         | QUIC also has wins for connecting faster, and especially for
         | reconnecting faster. It also has IP mobility: if you're on
         | mobile and your IP address changes (happens!) QUIC can keep the
         | session going without rebuilding it once the client sends the
         | next packet.
         | 
         | It's a fantastically well thought out & awesome advancement,
         | radically better in so many ways. The advantages of having
          | multiple non-blocking streams (as in SCTP) massively reduce
         | the scope that higher level protocol design has to take on. And
         | all that multi-streaming stuff being in the kernel means it's
         | deeply optimizable in a way TCP can never enjoy.
         | 
         | Time to stop driving the old rust bucket jalopy of TCP around
         | everywhere, crafting weird elaborate handmade shit atop it. We
         | need a somewhat better starting place for higher level
         | protocols and man oh man is QUIC alluring.
        
           | redleader55 wrote:
           | > QUIC goes significantly up the OSI network stack, is layer
           | 5+, where-as TCP+TLS is layer 3
           | 
            | IP is layer 3 - network (ensures packets are routed to the
            | correct host). TCP is layer 4 - transport (some people argue
            | that TCP has functions from layer 5 - e.g. establishing
            | sessions between apps), while TLS adds a few functions from
            | layer 6 (e.g. encryption), which QUIC also has.
        
           | w3ll_w3ll_w3ll wrote:
           | TCP is level 4 in the OSI model
        
         | frmdstryr wrote:
         | The "advantage" is tracking via the server provided connection
         | ID
         | https://www.cse.wustl.edu/~jain/cse570-21/ftp/quic/index.htm...
        
       | gethly wrote:
       | I've been hearing about QUIC for ages, yet it is still an obscure
       | tech and will likely end up like IPv6.
        
         | gfody wrote:
         | isn't it just http3 now?
        
         | tralarpa wrote:
         | Your browser is using it when you watch a video on youtube
         | (HTTP/3).
        
         | Jtsummers wrote:
         | QUIC is already out and in active use. Every major web browser
         | supports it, and it's not like IPv6. There's no fundamental
         | infrastructure change needed to support it since it's built on
         | top of UDP. The end points obviously have to support it, but
         | that's the same as any other protocol built on UDP or TCP (like
         | HTTP, SNMP, etc.).
        
         | rstuart4133 wrote:
         | > yet it is still an obscure tech and will likely end up like
         | IPv6.
         | 
         | Probably. According to Google, IPv6 has a measly 46% of
         | internet traffic now [0], and growing at about 5% per year.
         | QUIC is 40% of Chrome traffic, and is growing at 5% every two
         | years [1]. So yeah, their fates do look similar, which is to
         | say both are headed for world domination in a couple of
         | decades.
         | 
         | [0] https://dnsmadeeasy.com/resources/the-state-of-
         | ipv6-adoption...
         | 
         | [1] https://www.cellstream.com/2025/02/14/an-update-on-quic-
         | adop...
        
           | gethly wrote:
           | When you remove IoT, those numbers will look very
            | different.
        
       ___________________________________________________________________
       (page generated 2025-07-31 23:00 UTC)