[HN Gopher] QUIC is not quick enough over fast internet
       ___________________________________________________________________
        
       QUIC is not quick enough over fast internet
        
       Author : Shank
       Score  : 619 points
       Date   : 2024-09-09 02:34 UTC (20 hours ago)
        
 (HTM) web link (dl.acm.org)
 (TXT) w3m dump (dl.acm.org)
        
       | jacob019 wrote:
       | Maybe moving the connection protocol into userspace isn't such a
       | great plan.
        
         | foota wrote:
         | I don't have access to the article, but they're saying the
         | issue is due to client side ack processing. I suspect they're
         | testing at bandwidths far beyond what's normal for consumer
         | applications.
        
           | dartharva wrote:
           | It's available on arxiv and nope, they are testing mostly for
           | regular 4G/5G speeds.
           | 
           | https://arxiv.org/pdf/2310.09423
        
             | DannyBee wrote:
              | 4G tops out at 1Gbps only when one person is on the
              | network. 5G tops out at ~10Gbps (some 20Gbps, I guess)
              | only when one person is on the network.
              | 
              | They are testing at 1Gbps.
              | 
              | This is not regular 4G speed for sure, and it's a rare 5G
              | speed. Regular 5G speed in the US is 40-50Mbps, so 20x
              | slower than they are testing.
        
               | dartharva wrote:
               | Still won't be beyond normal consumer applications'
               | capacity, right?
        
               | DannyBee wrote:
               | correct
        
               | izend wrote:
                | What about 1Gbps fiber at home? It is becoming common in
                | Canada. I have 1Gbps up/down.
        
               | DannyBee wrote:
               | This would affect that.
               | 
                | As I said, I was only replying to the claim that this
               | affects things at 4g/5g cell phone speeds, which it
               | clearly does not, by their own data.
        
               | vrighter wrote:
               | Gigabit fiber internet is quite cheap and increasingly
               | available (I'm not from the US). I don't just use the
               | internet over a 4/5g connection. This definitely affects
               | more people than you think.
        
               | DannyBee wrote:
               | I think it affects lots of people.
               | 
               | I have 5gbps internet at home myself.
               | 
               | But that is not what i was replying to. I was replying to
               | the claim that this affects regular 4g/5g cell phone
               | speeds. The data is clear that it does not.
        
               | KaiserPro wrote:
                | HTTP/1.1 has been around for 28 years. At the time,
                | gigabit ethernet was _expensive_, and 9600 baud on
                | mobile was rare.
                | 
                | And yet HTTP/1.1 runs on gigabit networks pretty well.
        
             | yencabulator wrote:
             | Your 5G has 0.23ms ping to the average webserver?
        
           | spacebacon wrote:
           | See arXiv link in comments.
        
         | 01HNNWZ0MV43FF wrote:
         | Does QUIC mandate that, or is that just the stepping stone
         | until the chicken-and-egg problem is solved and we get kernel
         | support?
        
           | wmf wrote:
           | On mobile the plan is to never use kernel support so that
           | apps can have the latest QUIC on old kernels.
        
           | vlovich123 wrote:
            | As others in the thread have summarized it, the paper says
            | the issue is ACK offload. That has nothing to do with
            | whether the stack is in kernel space or user space. Indeed,
            | there's some concern about the in-kernel scenario: the
            | kernel is so slow moving that updates take much longer to
            | propagate to the applications needing them, whereas
            | userspace stacks can be updated as the endpoint
            | applications need them to be.
        
           | kmeisthax wrote:
           | No, but it depends on how QUIC works, how Ethernet hardware
           | works, and how much you actually want to offload to the NIC.
           | For example, QUIC has TLS encryption built-in, so anything
           | that's encrypted can't be offloaded. And I don't think most
           | people want to hand all their TLS keys to their NIC[0].
           | 
           | At the very least you probably would have to assign QUIC its
           | own transport, rather than using UDP as "we have raw sockets
           | at home". Problem is, only TCP and UDP reliably traverse the
           | Internet[1]. _Everything_ in the middle is sniffing traffic,
           | messing with options, etc. In fact, Google rejected an
           | alternate transport protocol called SCTP (which does all the
           | stream multiplexing over a single connection that QUIC does)
            | specifically because, among other things, SCTP's a transport
           | protocol and middleboxes choke on it.
           | 
           | [0] I am aware that "SSL accelerators" used to do exactly
           | this, but in modern times we have perfectly good crypto
           | accelerators right in our CPU cores.
           | 
           | [1] ICMP sometimes traverses the internet, it's how ping
           | works, but a lot of firewalls blackhole ICMP. Or at least
           | they did before IPv6 made it practically mandatory to forward
           | ICMP packets.
        
             | _flux wrote:
              | I don't think passing just the session keys to the NIC
              | would be so perilous, though.
        
             | justinphelps wrote:
              | SCTP had already solved the problem that QUIC proposes to
              | solve. Google of all companies has the influence to
              | properly implement and accommodate other L4 protocols.
              | QUIC seems like doubling down on a hack, and it breaks
              | the elegance of the OSI model.
        
               | tepmoc wrote:
                | SCTP still has some downsides it has to resolve:
                | https://http3-explained.haxx.se/en/why-quic/why-
                | tcpudp#why-n...
                | 
                | Plus, we need happy eyeballs for the transport if SCTP
                | runs over IP rather than encapsulated:
                | https://datatracker.ietf.org/doc/html/draft-grinnemo-
                | taps-he
                | 
                | But it's pretty much non-workable over IPv4, since most
                | end users are behind NAT and there is no known
                | implementation to work around that.
        
         | kccqzy wrote:
         | The flexibility and ease of changing a userspace protocol IMO
         | far outweighs anything else. If the performance problem
         | described in this article (which I don't have access to) is in
         | userspace QUIC code, it can be fixed and deployed very quickly.
          | If a similar performance issue were found in TCP, expect to
          | wait multiple years.
        
           | vrighter wrote:
           | Well, the problem is probably that it is in userspace in the
           | first place.
        
         | mrweasel wrote:
         | Maybe moving the entire application to the browser/cloud wasn't
         | the best idea for a large number of use cases?
         | 
         | Video streaming, sure, but we're already able to stream 4K
         | video over a 25Mbit line. With modern internet connections
         | being 200Mbit to 1Gbit, I don't see that we need the bandwidth
         | in private homes. Maybe for video conferencing in large
         | companies, but that also doesn't need to be 4K.
         | 
          | The underlying internet protocols are old, so there's no harm
          | in assessing if they've outlived their usefulness. However, we
          | should also consider whether web applications and "always
          | connected" are truly the best solution for our day to day
          | application needs.
        
           | kuschku wrote:
           | > With modern internet connections being 200Mbit to 1Gbit, I
           | don't see that we need the bandwidth in private homes
           | 
           | Private connections tend to be asymmetrical. In some cases,
           | e.g. old DOCSIS versions, that used to be due to technical
           | necessity.
           | 
            | Private connections tend to be unstable; the bandwidth
            | fluctuates quite a bit. Depending on the country, the
            | actually guaranteed bandwidth is somewhere between half of
            | what's on the sticker and nothing at all.
           | 
           | Private connections are usually used by families, with
           | multiple people using it at the same time. In recent years,
           | you might have 3+ family members in a video call at the same
           | time.
           | 
           | So if you're paying for a 1000/50 line (as is common with
           | DOCSIS deployments), what you're actually getting is usually
           | a 400/20 line that sometimes achieves more. And those 20Mbps
           | upload are now split between multiple people.
           | 
           | At the same time, you're absolutely right - Gigabit is enough
           | for most people. Download speeds are enough for quite a
           | while. We should instead be increasing upload speeds and
           | deploying FTTH and IPv6 everywhere to reduce the latency.
        
             | throwaway2037 wrote:
             | This is a great post. I often forget that home Internet
             | connections are frequently shared between many people.
             | 
             | This bit:                   > IPv6 everywhere to reduce the
             | latency
             | 
              | I am not an expert on IPv4 vs IPv6. Teach me: How will
              | migrating to IPv6 reduce latency? As I understand it, a
              | lot of home Internet connections are already effectively
              | IPv6 via carrier NAT. (Am I wrong? Or is that not
              | relevant to your point?)
        
               | kuschku wrote:
               | IPv4 routing is more complicated, especially with
               | multiple levels of NAT applied.
               | 
                | Google has measured about 20ms less latency to most
                | customers on IPv6 than on IPv4, according to their IPv6
                | report.
        
               | simoncion wrote:
                | > Google has measured about 20ms less latency to most
                | customers on IPv6 than on IPv4, according to their IPv6
                | report.
               | 
               | I've run that comparison across four ISPs and never seen
               | any significant difference in latency... not once in the
               | decades I've had "dual stack" service.
               | 
               | I imagine that Google is getting confounded by folks with
               | godawful middle/"security"ware that is too stupid to know
               | how to handle IPv6 traffic and just passes it through.
        
           | throwaway2037 wrote:
           | Overall, I like your post very much.                   > but
           | that also doesn't need to be 4K.
           | 
            | Here, I would say "need" is a strong term. Surely, you are
            | correct at the most basic level, but if the bandwidth exists,
            | then _some_ streaming platforms will use it. Deeper question:
            | Is there any practical use case for Internet connections
            | above 1Gbit? I struggle to think of any. Yes, I can
            | understand that people may wish to reduce latency, but I
            | don't think home users need any more bandwidth at this
            | point. I am astonished when I read about 10Gbit home
            | Internet access in Switzerland, Japan, and Korea.
           | 
            | Zero trolling: Can you help me to better understand your last
            | sentence?                   > However, we should also
            | consider whether web applications and "always connected"
            | are truly the best solution for our day to day application
            | needs.
           | 
           | I cannot tell if this is written with sarcasm. Let me ask
           | more directly: Do you think it is a good design for our
           | modern apps to always be connected or not? Honestly, I don't
           | have a strong opinion on the matter, but I am interested to
           | hear your opinion.
        
             | mrweasel wrote:
             | Generally speaking I think we should aim for offline first,
             | always. Obvious things like Teams or Slack requires an
             | internet connection to work, but assuming a working
             | internet connection shouldn't even be a requirement for a
             | web browser.
             | 
             | I think it is bad design to expect a working internet
             | connection, because in many places your can't expect
             | bandwidth be cheap, or the connection to be stable. That's
             | not to say that something like Google Docs (others seems to
             | like it, but everyone in my company thinks it's awful)
             | should be a thing, there's certainly value in the real time
             | collaboration features, but it should be able to function
             | without an internet connection.
             | 
             | Last week someone was complaining about the S3 (sleep)
             | feature on laptops, and one thing that came to my mind is
             | that despite these being portable, we somehow expect them
             | to be always connected to the internet. That just seems
             | like a somewhat broken mindset to me.
        
               | surajrmal wrote:
               | Note that in deeper sleep states you typically see more
               | aggressive limiting of what interrupts can take you out
               | of the sleep state. Turning off network card interrupts
               | is common.
        
         | simiones wrote:
         | The problem is that the biggest win by far with QUIC is merging
         | encryption and session negotiation into a single packet, and
         | the kernel teams have been adamant about not wanting to
         | maintain encryption libraries in kernel. So, QUIC or any other
         | protocol like it being in kernel is basically a non-starter.
        
         | suprjami wrote:
         | Looking forward to QUIC in the kernel. Linux already has kTLS.
        
       | JoshTriplett wrote:
       | Seems to be available on arXiv: https://arxiv.org/pdf/2310.09423
        
         | Tempest1981 wrote:
         | The page headings say "Conference'17, July 2017" -- why is
         | that?
         | 
         | Although the sidebar on page 1 shows "13 Oct 2023".
        
           | mrngm wrote:
           | It's likely the authors used an existing conference template
           | to fit in their paper's contents. Upon sending it to the
           | conference, the editors can easily fit the contents in their
           | prescribed format, and the authors know how many characters
           | they can fit in the page limit.
           | 
           | arXiv typically contains pre-prints of papers. These may not
           | have been peer-reviewed, and the contents may not reflect the
           | actual "published" paper that was accepted (and/or corrected
           | after peer review) to a conference or journal.
           | 
           | arXiv applies a watermark to the submitted PDF such that
           | different versions are distinguishable on download.
        
       | JoshTriplett wrote:
       | In the early days of QUIC, many people pointed out that the UDP
       | stack has had far far less optimization put into it than the TCP
       | stack. Sure enough, some of the issues identified here arise
       | because the UDP stack isn't doing things that it _could_ do but
       | that nobody has been motivated to make it do, such as UDP generic
       | receive offload. Papers like this are very likely to lead to
       | optimizations both obvious and subtle.
        
         | Animats wrote:
         | What is UDP offload going to _do_? UDP barely does anything but
         | queue and copy.
         | 
          | Linux scheduling, from packet received to a thread having
          | control, is not real-time, and if the CPUs are busy it may be
          | rather slow. That's probably part of the bottleneck.
         | 
         | The embarrassing thing is that QUIC, even in Google's own
         | benchmarks, only improved performance by about 10%. The added
         | complexity probably isn't worth the trouble. However, it gave
         | Google control of more of the stack, which may have been the
         | real motivation.
        
           | Vecr wrote:
           | I think one of the original drivers was the ability to
           | quickly tweak parameters, after Linux rejected what I think
           | was userspace adjustment of window sizing to be more
           | aggressive than the default.
           | 
           | The Linux maintainers didn't want to be responsible for
           | congestion collapse, but UDP lets you spray packets from
           | userspace, so Google went with that.
        
           | JoshTriplett wrote:
           | Among other things, GRO (receive offloading) means you can
           | get more data off of the network card in fewer operations.
           | 
           | Linux has receive packet steering, which can help with
           | getting packets from the network card to the right CPU and
           | the right userspace thread without moving from one CPU's
           | cache to another.
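            | 
            | As a sketch of what opting in looks like from userspace on
            | Linux 5.x+ (a sketch only; guards included since libc
            | headers vary, and error handling is elided):
            | 
            |   #include <netinet/udp.h>
            |   #include <string.h>
            |   #include <sys/socket.h>
            | 
            |   #ifndef SOL_UDP
            |   #define SOL_UDP 17
            |   #endif
            |   #ifndef UDP_GRO
            |   #define UDP_GRO 104
            |   #endif
            | 
            |   /* Ask the kernel to coalesce UDP datagrams, so one
            |    * recvmsg() can return many packets' worth of data. */
            |   int enable_udp_gro(int fd)
            |   {
            |       int on = 1;
            |       return setsockopt(fd, SOL_UDP, UDP_GRO,
            |                         &on, sizeof(on));
            |   }
            | 
            |   /* After recvmsg(), a UDP_GRO cmsg carries the segment
            |    * size needed to split the buffer back into individual
            |    * datagrams. Returns -1 if the packet wasn't coalesced. */
            |   int gro_segment_size(struct msghdr *msg)
            |   {
            |       struct cmsghdr *c;
            |       for (c = CMSG_FIRSTHDR(msg); c != NULL;
            |            c = CMSG_NXTHDR(msg, c))
            |           if (c->cmsg_level == SOL_UDP &&
            |               c->cmsg_type == UDP_GRO) {
            |               int seg;
            |               memcpy(&seg, CMSG_DATA(c), sizeof(seg));
            |               return seg;
            |           }
            |       return -1;
            |   }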
        
             | suprjami wrote:
             | RPS is just software RSS.
             | 
             | You mean Receive Flow Steering, and RFS can only control
             | RPS, so to do it in hardware you actually mean Accelerated
             | RFS (which requires a pretty fancy NIC these days).
             | 
              | Even ignoring the hardware requirement, unfortunately it's
              | not that simple. I find results vary wildly depending on
              | whether you put the process and softirq on the same CPU
              | core (sharing L1 and L2) or just on the same CPU socket
              | (sharing L3 but not constantly blowing out L1/L2).
             | 
             | Eric Dumazet said years ago at a Netdev.conf that L1 cache
             | sizes have really not kept up with reality. That matches my
             | experience.
             | 
             | QUIC doing so much in userspace adds another class of
             | application which has a so-far uncommon design pattern.
             | 
             | I don't think it's possible to say whether any QUIC
             | application benefits from RFS or not.
        
           | infogulch wrote:
           | In my head the main benefit of QUIC was always multipath, aka
           | the ability to switch interfaces on demand without losing the
           | connection. There's MPTCP but who knows how viable it is.
        
             | rocqua wrote:
             | Mptcp sees use in the Telco space, so they probably know.
        
             | modeless wrote:
             | Is that actually implemented and working in practice? My
             | connection still hangs whenever my wifi goes out of
             | range...
        
             | Sesse__ wrote:
             | Apple's Siri is using MPTCP, so it is presumably viable.
        
               | jshier wrote:
               | It requires explicit backend support, and Apple supports
               | it for many of their services, but I've never seen
               | another public API that does. Anyone have any examples?
        
               | mh- wrote:
                | Last I looked into this (many years ago), ELB/GLBs
                | didn't support it on AWS/GCP respectively. That
                | prevented us from further considering implementing it
                | at the time (mobile app -> AWS-hosted EC2 instances
                | behind an ELB).
               | 
               | Not sure if that's changed, but at the time it wasn't
               | worth having to consider rolling our own LBs.
               | 
               | To answer your original question, no, I haven't
               | (knowingly) seen it on any public APIs.
        
             | suprjami wrote:
             | I always thought the main benefit of QUIC was to encrypt
             | the important part of the transport header, so endpoints
             | control their own destiny, not some middle device.
             | 
             | If I had a dollar for every firewall vendor who thought
             | dropping TCP retransmissions or TCP Reset was a good
             | idea...
        
           | amluto wrote:
            | Last I looked (several months ago), Linux's UDP stack did
            | not seem well tuned in its memory management accounting.
           | 
           | For background, the mental model of what receiving network
           | data looks like in userspace is almost completely backwards
           | compared to how general-purpose kernel network receive
           | actually works. User code thinks it allocates a buffer (per-
           | socket or perhaps a fancier io_uring scheme), then receives
           | packets into that buffer, then processes them.
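            | 
            | (As a toy sketch of that mental model -- fd is assumed to
            | be an already-open UDP socket, and process_datagram is a
            | made-up placeholder:
            | 
            |   #include <sys/socket.h>
            | 
            |   void process_datagram(char *p, size_t n);  /* made up */
            | 
            |   void rx_loop(int fd)  /* fd: an open UDP socket */
            |   {
            |       static char buf[65536];  /* "allocate a buffer" */
            |       for (;;) {
            |           /* "receive packets into that buffer"... */
            |           ssize_t n = recv(fd, buf, sizeof(buf), 0);
            |           if (n <= 0)
            |               break;
            |           process_datagram(buf, (size_t)n); /* "...process" */
            |       }
            |   }
            | 
            | ...which is almost exactly backwards from what the kernel
            | actually does, as described next.)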
           | 
           | The kernel is the other way around. The kernel allocates
           | buffers and feeds pointers to those buffers to the NIC. The
           | NIC receives packets and DMAs them into the buffers, then
           | tells the kernel. But the NIC and the kernel have _absolutely
           | no concept_ of which socket those buffers belong to until
           | _after_ they are DMAed into the buffers. So the kernel cannot
            | possibly map received packets to the actual recipient's
           | memory. So instead, after identifying who owns a received
           | packet, the kernel retroactively charges the recipient for
           | the memory. This happens on a per-packet basis, it involves
           | per-socket _and_ cgroup accounting, and there is no support
           | for having a socket  "pre-allocate" this memory in advance of
           | receiving a packet. So the accounting is gnarly, involves
           | atomic operations, and seems quite unlikely to win any speed
           | awards. On a very cursory inspection, the TCP code seemed
           | better tuned, and it possibly also won by generally handling
           | more bytes per operation.
           | 
            | Keep in mind that the kernel _can't_ copy data to
           | application memory synchronously -- the application memory
           | might be paged out when a packet shows up. So instead the
           | whole charging dance above happens immediately when a packet
           | is received, and the data is copied later on.
           | 
           | For quite a long time, I've thought it would be nifty if
           | there was a NIC that kept received data in its own RAM and
           | then allowed it to be efficiently DMAed to application memory
           | when the application was ready for it. In essence, a lot of
           | the accounting and memory management logic could move out of
           | the kernel into the NIC. I'm not aware of anyone doing this.
        
             | fragmede wrote:
             | RDMA is common for high performance applications but it
             | doesn't work over the Internet.
        
               | Danieru wrote:
               | It's a good thing the NIC is connected over pcie then.
        
               | shaklee3 wrote:
               | You can do GPUdirect over the Internet without RDMA
               | though.
        
               | jpgvm wrote:
                | GPUDirect relies on the PeerDirect extensions for RDMA
                | and is thus an extension to the RDMA verbs, not a
                | separate and independent thing that works without RDMA.
        
               | shaklee3 wrote:
                | Again, you can do what I said. You may be using
                | different terminology, but you can do GPUDirect in DPDK
                | without RDMA.
        
               | throw0101c wrote:
               | > _RDMA is common for high performance applications but
                | it doesn't work over the Internet._
               | 
               | RoCEv2 is routable.
               | 
               | * https://en.wikipedia.org/wiki/RDMA_over_Converged_Ether
               | net
               | 
               | * https://docs.nvidia.com/networking/display/winofv550530
               | 00/ro...
               | 
               | Of course you're going to get horrible latency because of
               | speed-of-light limitations, so the definition of "work"
               | may be weak, but data should be able to be transmitted.
        
             | JoshTriplett wrote:
             | > For quite a long time, I've thought it would be nifty if
             | there was a NIC that kept received data in its own RAM and
             | then allowed it to be efficiently DMAed to application
             | memory when the application was ready for it.
             | 
             | I wonder if we could do a more advanced version of receive-
             | packet steering that sufficiently identifies packets as
             | _definitely_ for a given process and DMAs them directly to
             | that process 's pre-provided buffers for later
             | notification? In particular, can we offload enough
             | information to a smart NIC that it can identify where
             | something should be DMAed to?
        
               | amluto wrote:
               | I don't think the result would be compatible with the
               | socket or io_uring API, but maybe io_uring could be
               | extended a bit. Basically the kernel would
               | opportunistically program a "flow director" or similar
                | rule to send packets to a special rx queue, and that queue
               | would point to (pinned) application memory. Getting this
               | to be compatible with iptables/nftables would be a mess
               | or maybe entirely impossible.
               | 
               | I've never seen the accelerated steering stuff work well
               | in practice, sadly. The code is messy, the diagnostics
               | are basically nonexistent, and it's not clear to me that
               | many drivers support it well.
        
               | mgaunard wrote:
               | Most advanced NICs support flow steering, which makes the
               | NIC write to different buffers depending on the target
               | port.
               | 
               | In practice though, you only have a limited amount of
               | these buffers, and it causes complications if multiple
               | processes need to consume the same multicast.
        
               | eptcyka wrote:
               | Multicast may well be shitcanned to an expensive slow
               | path, given that multicast is rarely used for high
               | bandwidth scenarios, especially when multiple processes
               | need to receive the same packet.
        
               | lokar wrote:
               | The main real use of multicast I've seen is pretty high
               | packet rate. High frequency traders get multicast feeds
               | of tick data from the exchange.
        
               | mgaunard wrote:
               | multicast is precisely used for low-latency high-
               | throughput message buses.
        
               | eptcyka wrote:
               | With multiple processes listening for the data? I think
               | that's a market niche. In terms of billions of devices,
               | multicast is mostly used for zero-config service
               | discovery. I am not saying there isn't a market for high-
               | bandwidth multicast, I am stating that for the vast
                | majority of software deployments, multicast performance
               | is not an issue. For whatever deployments it is an issue,
               | they can specialize. And, as in the sibling comment
               | mentions, people who need breakneck speeds have already
               | proven that they can create a market for themselves.
        
               | mgaunard wrote:
               | That's not a market niche, that's the normal mode of
               | operation of a message bus.
               | 
               | The cloud doesn't implement multicast, but that doesn't
               | mean it doesn't get used by people that build non-
               | Internet networks and applications.
        
             | derefr wrote:
             | Presuming that this is a server that has One (public) Job,
             | couldn't you:
             | 
             | 1. dedicate a NIC to the application;
             | 
             | 2. and have the userland app open a packet socket against
             | the NIC, to drink from its firehose through MMIO against
             | the kernel's own NIC DMA buffer;
             | 
             | ...all without involving the kernel TCP/IP (or in this
             | case, UDP/IP) stack, and any of the accounting logic
             | squirreled away in there?
             | 
              | (You _can_ also throw in a BPF filter here, to drop
              | everything except UDP packets with the expected ip:port
              | -- but if you're already doing more packet validation at
              | the app level, you may as well just take the whole
              | firehose of packets and validate them for being targeted
              | at the app at the same time that they're validated for
              | their L7 structure.)
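              | 
              | A rough sketch of step 2, assuming Linux and CAP_NET_RAW;
              | the cBPF program itself (e.g. generated with 'tcpdump
              | -dd udp dst port 443') is elided:
              | 
              |   #include <arpa/inet.h>
              |   #include <linux/if_ether.h>   /* ETH_P_IP */
              |   #include <linux/if_packet.h>  /* sockaddr_ll */
              |   #include <net/if.h>
              |   #include <string.h>
              |   #include <sys/socket.h>
              |   #include <unistd.h>
              | 
              |   int open_firehose(const char *ifname)
              |   {
              |       int fd = socket(AF_PACKET, SOCK_RAW,
              |                       htons(ETH_P_IP));
              |       if (fd < 0)
              |           return -1;
              | 
              |       /* bind the firehose to one specific NIC */
              |       struct sockaddr_ll sll;
              |       memset(&sll, 0, sizeof(sll));
              |       sll.sll_family   = AF_PACKET;
              |       sll.sll_protocol = htons(ETH_P_IP);
              |       sll.sll_ifindex  = if_nametoindex(ifname);
              |       if (bind(fd, (struct sockaddr *)&sll,
              |                sizeof(sll)) < 0) {
              |           close(fd);
              |           return -1;
              |       }
              | 
              |       /* optionally attach the cBPF filter here:
              |        * setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER,
              |        *            &prog, sizeof(prog)); */
              |       return fd;
              |   }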
        
               | amluto wrote:
               | I think DPDK does something like this. The NIC is
               | programmed to aim the packets in question at a specific
               | hardware receive queue, and that queue is entirely owned
               | by a userspace program.
               | 
               | A lot of high end NICs support moderately complex receive
               | queue selection rules.
        
               | SSLy wrote:
               | > _1. dedicate a NIC to the application;_
               | 
                | you need to respond to ICMPs, which have a different
                | proto/header number than UDP or TCP.
        
             | veber-alex wrote:
             | Have you looked into NVIDIA VMA?
             | 
             | https://docs.nvidia.com/networking/display/vmav9860/introdu
             | c...
        
             | rkagerer wrote:
             | Why don't we eliminate the initial step of an app reserving
             | a buffer, keep each packet in its own buffer, and once the
             | socket it belongs to is identified hand a pointer and
             | ownership of that buffer back to the app? If buffers can be
             | of fixed (max) size, you could still allow the NIC to fill
             | a bunch of them in one go.
        
           | apitman wrote:
           | Ditching head of line blocking is potentially a big win, but
            | I really wish it hadn't come with so much complexity.
        
           | raggi wrote:
            | UDP offload implicitly gets you a few things today:
            | 
            | - 64 packets per syscall, which is enough data to amortize
            | the syscall overhead - a single packet is not.
           | 
           | - UDP offload optionally lets you defer checksum computation,
           | often offloading it to hardware.
           | 
           | - UDP offload lets you skip/reuse route lookups for
           | subsequent packets in a bundle.
           | 
           | What UDP offload is no good for though, is large scale
           | servers - the current APIs only work when the incoming packet
           | chains neatly organize into batches per peer socket. If you
           | have many thousands of active sockets you'll stop having full
           | bundles and the overhead starts sneaking back in. As I said
           | in another thread, we really need a replacement for the BSD
           | APIs here, they just don't scale for modern hardware
           | constraints and software needs - much too expensive per
           | packet.
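            | 
            | (For reference, a sketch of the batched-receive side using
            | Linux's recvmmsg(), with error handling and per-packet
            | cmsg parsing elided:
            | 
            |   #define _GNU_SOURCE
            |   #include <sys/socket.h>
            |   #include <sys/uio.h>
            | 
            |   #define BATCH 64
            |   #define MTU   1500
            | 
            |   static char bufs[BATCH][MTU];
            |   static struct iovec iov[BATCH];
            |   static struct mmsghdr msgs[BATCH];
            | 
            |   /* Drain up to 64 datagrams with a single syscall,
            |    * amortizing the per-syscall cost over the batch. */
            |   int drain(int fd)
            |   {
            |       for (int i = 0; i < BATCH; i++) {
            |           iov[i].iov_base = bufs[i];
            |           iov[i].iov_len  = MTU;
            |           msgs[i].msg_hdr.msg_iov    = &iov[i];
            |           msgs[i].msg_hdr.msg_iovlen = 1;
            |       }
            |       return recvmmsg(fd, msgs, BATCH, MSG_DONTWAIT,
            |                       NULL);
            |   }
            | 
            | ...and that per-socket batching is exactly why the win
            | evaporates once traffic spreads across thousands of
            | sockets.)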
        
           | 10000truths wrote:
            | Bulk throughput isn't on par with TLS mainly because NICs
            | with dedicated hardware for QUIC offload aren't commercially
            | available (yet). Latency is undoubtedly better - the 1-RTT
            | QUIC handshake substantially reduces time-to-first-byte
            | compared to TLS over TCP.
        
           | majke wrote:
           | > What is UDP offload going to do?
           | 
            | Handling ACK packets in kernelspace would be one thing -
            | helping, for example, RTT estimation. With a userspace
            | stack, ACKs are handled in the application and are subject
            | to the scheduler, suffering a lot on a loaded system.
        
             | morning-coffee wrote:
              | There are no ACKs inherent in the _UDP_ protocol, so "UDP
              | offload" is not where the savings are. There are ACKs in
              | the _QUIC_ protocol, and they are carried by UDP datagrams
              | which need to make their way up to userland to be
              | processed; this is the crux of the issue.
              | 
              | What is needed is for _QUIC offload_ to be invented and
              | supported by hardware so that most of the high-frequency,
              | tiny-packet processing happens there, just as it does
              | today for _TCP offload_. TCP large-send and large-receive
              | offload is what is responsible for all the CPU savings:
              | the application deals in 64KB or larger sends/receives,
              | and the segmentation and receive coalescing all happen in
              | hardware before an interrupt is even generated to involve
              | the kernel, let alone userland.
        
         | RachelF wrote:
         | Also bear in mind that many of today's network cards have
         | processors in them that handle much of the TCP/IP overhead.
        
           | kccqzy wrote:
           | That's mostly still for the data center. Which end-user
           | network cards that I can buy can do TCP offloading?
        
             | phil21 wrote:
              | Unless I'm missing something here, pretty much any Intel
              | NIC released in the past decade should support TCP
              | offload. I imagine the same is true for Broadcom and
              | other vendors as well, but I don't have something handy
              | to check.
        
             | JoshTriplett wrote:
             | Some wifi cards offload a surprising amount in order to do
             | wake-on-wireless, but that's not for performance.
        
             | throw0101c wrote:
             | > _Which end-user network cards that I can buy can do TCP
             | offloading?_
             | 
             | Intel's I210 controllers support offloading:
             | 
             | > _Other performance-enhancing features include IPv4 and
             | IPv6 checksum offload, TCP /UDP checksum offload, extended
             | Tx descriptors for more offload capabilities, up to 256 KB
             | TCP segmentation (TSO v2), header splitting, 40 KB packet
             | buffer size, and 9.5 KB Jumbo Frame support._
             | 
             | * https://cdrdv2-public.intel.com/327935/327935-Intel%20Eth
             | ern...
             | 
             | And cost US$ 22:
             | 
             | * https://www.amazon.com/dp/B0728289M7/
        
             | vel0city wrote:
             | Practically every on-board network adapter I've had for
             | over a decade has had TCP offload support. Even the network
             | adapter on my cheap $300 Walmart laptop has hardware TCP
             | offload support.
        
             | suprjami wrote:
             | All of them. You'd be hard pressed to buy a new NIC which
              | _doesn't_ have a raft of protocol offloads.
             | 
              | Even those garbage tg3 things from 1999 that OEMs are
              | still putting on board enterprise servers have some TCP
              | offload capability.
        
           | dilyevsky wrote:
            | Not just that, but TLS too. Starting with ConnectX-5, I
            | think you can push kTLS down to the NIC. Don't think
            | there's a QUIC equivalent for this.
        
         | nextaccountic wrote:
         | Do you mean that under the same workload, TCP will perform
         | better?
        
         | skywhopper wrote:
         | The whole reason QUIC even exists in user space is because its
         | developers were trying to hack a quick speed-up to HTTP rather
         | than actually do the work to improve the underlying networking
         | fundamentals. In this case the practicalities seem to have
         | caught them out.
         | 
         | If you want to build a better TCP, do it. But hacking one in on
         | top of UDP was a cheat that didn't pay off. Well, assuming
         | performance was even the actual goal.
        
           | osmarks wrote:
           | They couldn't have built it on anything but UDP because the
           | world is now filled with poorly designed firewall/NAT
           | middleboxes which will not route things other than TCP, UDP
           | and optimistically ICMP.
        
           | kbolino wrote:
           | It already exists, it's called SCTP. It doesn't work over the
           | Internet because there's too much crufty hardware in the
           | middle that will drop it instead of routing it. Also,
           | Microsoft refused to implement it in Windows and also banned
           | raw sockets so it's impossible to get support for it on that
           | platform without custom drivers that practically nobody will
           | install.
           | 
           | I don't know how familiar the developers of QUIC were with
           | SCTP in particular but they were definitely aware of the
           | problems that prevented a better TCP from existing. The only
           | practical solution is to build something on top of UDP, but
           | if even that option proves unworkable, then the only other
           | possibility left is to fragment the Internet.
        
             | suprjami wrote:
             | I like (some aspects of) SCTP too but it's not a solution
             | to this problem.
             | 
             | If you've followed Dave Taht's bufferbloat stuff, the
             | reason he lost faith in TCP is because middle devices have
             | access to the TCP header and can interfere with it.
             | 
             | If SCTP got popular, then middle devices would ruin SCTP in
             | the same way.
             | 
             | QUIC is the bufferbloat preferred solution because the
             | header is encrypted. It's not possible for a middle device
             | to interfere with QUIC. Endpoints, and only endpoints,
             | control their own traffic.
        
           | adgjlsfhk1 wrote:
            | Counterpoint: it is paying off, just taking a while. This
            | paper wasn't "QUIC is bad", it was "OSes need more
            | optimization for QUIC to be as fast as HTTPS".
        
             | guappa wrote:
             | The whole point of the project was for it to be faster
             | without touching the OS...
        
               | adgjlsfhk1 wrote:
                | I think this is slightly wrong. The goal was to be
                | faster without requiring OS/middleware support.
                | Optimizing the OSes that need high performance is much
                | easier, since that's a much smaller set of OSes
                | (basically just Linux/Mac/Windows).
        
           | IshKebab wrote:
           | Yeah they probably wanted a protocol that would actually work
           | on the wild internet with real firewalls and routers and
           | whatnot. The only option if you want that is building on top
           | of UDP or TCP and you obviously can't use TCP.
        
         | morning-coffee wrote:
         | The UDP optimizations are already there and have been pretty
         | much wrung out.
         | 
         | https://www.fastly.com/blog/measuring-quic-vs-tcp-computatio...
         | has good details and was done almost five years ago.
         | 
          | The solution isn't in more _UDP_ offload optimizations, as
          | there aren't any semantics in UDP that are expensive other
          | than the quantity and frequency of datagrams to be processed
          | in the context of the QUIC protocol that uses UDP as a
          | transport. QUIC's state machine needs to see every UDP
          | datagram carrying QUIC protocol messages in order to move
          | forward. Just like was done for TCP offload more than twenty
          | years ago, portions of QUIC state need to move to and be
          | maintained in hardware, to prevent the host from having to
          | see so many high-frequency tiny state update messages.
        
         | suprjami wrote:
         | Your first point is correct - papers ideally lead to innovation
         | and tangible software improvements.
         | 
         | I think a kernel implementation of QUIC is the next logical
         | step. A context switch to decrypt a packet header and send
         | control traffic is just dumb. That's the kernel's job.
         | 
         | Userspace network stacks have never been a good idea. QUIC is
         | no different.
         | 
         | (edit: Xin Long already has started a kernel implementation,
         | see elsewhere on this page)
        
       | mholt wrote:
       | I don't have access to the paper but based on the abstract and a
       | quick scan of the presentation, I can confirm that I have seen
       | results like this in Caddy, which enables HTTP/3 out of the box.
       | 
       | HTTP/3 implementations vary widely at the moment, and will likely
       | take another decade to optimize to homogeneity. But even then,
        | QUIC requires a _lot_ of state management that TCP doesn't have
       | to worry about (even in the kernel). There's a ton of processing
       | involved with every UDP packet, and small MTUs, still engrained
       | into many middle boxes and even end-user machines these days,
       | don't make it any better.
       | 
       | So, yeah, as I felt about QUIC ... oh, about 6 years ago or so...
       | HTTP/2 is actually really quite good enough for most use cases.
       | The far reaches of the world and those without fast connections
       | will benefit, but the majority of global transmissions will
       | likely be best served with HTTP/2.
       | 
        | Intuitively, I consider each HTTP major version an order-of-
        | magnitude increase in complexity. From 1 to 2 the main
        | complexities are binary framing (that's debatable, since it's
        | technically simpler from an encoding standpoint), compression,
        | and streams; then with HTTP/3 there's _so, so much_ it does to
        | make it work.
       | It _can_ be faster -- that's proven -- but only when networks are
       | slow.
       | 
       | TCP congestion control is its own worst enemy, but when networks
       | aren't congested (and with the right algorithm)... guess what.
        | It's fast! And in-order packet transmission (head-of-line
        | blocking) makes endpoint code so much simpler and faster. It's
        | no wonder TCP is faster these days when networks are fast.
       | 
        | I think servers should offer HTTP/3, but clients should be
        | choosy about when to use it, for the sake of their own
        | experience/performance.
        
         | altairprime wrote:
         | The performance gap is shown to be due to hardware offloading,
         | not due to congestion control, in the arxiv paper above.
        
           | vlovich123 wrote:
            | And because QUIC is encrypted at a fundamental level,
            | offload likely means needing to share keys with the network
            | card, which is a trust concern.
        
             | 10000truths wrote:
             | This is already how TLS offload is implemented for NICs
             | that support it. The handshake isn't offloaded, only the
             | data path. So essentially, the application performs the
             | handshake, then it calls setsockopt to convert the TCP
             | socket to a kTLS socket, then it passes the shared key, IV,
             | etc. to the kTLS socket, and the OS's network stack passes
             | those parameters to the NIC. From there, the NIC only
             | handles the bulk encryption/decryption and record
             | encapsulation/decapsulation. This approach keeps the
             | drivers' offload implementations simple, while still
             | allowing the application/OS to manage the session state.
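              | 
              | For the curious, that handoff looks roughly like this on
              | Linux -- a sketch for AES-128-GCM (sizes per
              | <linux/tls.h>), with error handling and the userspace
              | handshake itself elided:
              | 
              |   #include <linux/tls.h>
              |   #include <netinet/in.h>   /* IPPROTO_TCP */
              |   #include <string.h>
              |   #include <sys/socket.h>
              | 
              |   #ifndef TCP_ULP
              |   #define TCP_ULP 31
              |   #endif
              |   #ifndef SOL_TLS
              |   #define SOL_TLS 282
              |   #endif
              | 
              |   /* Convert an established TCP socket to kTLS for
              |    * transmit, using keys from the finished handshake. */
              |   int enable_ktls_tx(int fd,
              |                      const unsigned char key[16],
              |                      const unsigned char iv[8],
              |                      const unsigned char salt[4],
              |                      const unsigned char rec_seq[8])
              |   {
              |       struct tls12_crypto_info_aes_gcm_128 ci;
              |       memset(&ci, 0, sizeof(ci));
              |       ci.info.version     = TLS_1_2_VERSION;
              |       ci.info.cipher_type = TLS_CIPHER_AES_GCM_128;
              |       memcpy(ci.key,     key,     16);
              |       memcpy(ci.iv,      iv,       8);
              |       memcpy(ci.salt,    salt,     4);
              |       memcpy(ci.rec_seq, rec_seq,  8);
              | 
              |       if (setsockopt(fd, IPPROTO_TCP, TCP_ULP,
              |                      "tls", sizeof("tls")))
              |           return -1;
              |       return setsockopt(fd, SOL_TLS, TLS_TX,
              |                         &ci, sizeof(ci));
              |   }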
        
               | vlovich123 wrote:
                | Sure, similar mechanisms are available, but for TCP,
                | ACK offloading and TLS encryption/decryption offloading
                | are distinct features. With QUIC there's no separation,
                | which changes the threat model. Of course, the root
                | architectural problem is that this kind of stuff is
                | part of the NIC instead of an "encryption accelerator"
                | that can be requested to operate with a key ID on a RAM
                | region; then the kernel would only need to give the
                | keys to the SE (and potentially that's where they even
                | originate, instead of ever living anywhere else).
        
             | jstarks wrote:
             | Your NIC can already access arbitrary RAM via DMA. It can
             | read your keys already.
        
               | altairprime wrote:
               | That is often incorrect for Apple computers, whether
               | x64+T2 or aarch64: https://support.apple.com/fr-
               | tn/guide/security/seca4960c2b5/...
               | 
               | And it's often incorrect on x64 PCs when IOMMU access is
               | appropriately segmented. See also e.g. Thunderclap:
               | https://www.ndss-symposium.org/wp-
               | content/uploads/ndss2019_0...
               | 
               | It may still be true in some cases, but it shouldn't be
               | taken for granted that it's _always_ true.
        
               | yencabulator wrote:
               | Nope. https://en.wikipedia.org/wiki/Input%E2%80%93output_
               | memory_ma...
        
         | truetraveller wrote:
          | I'd say HTTP/1.1 is good enough for most people, especially
          | with persistent connections. HTTP/2 is an exponential leap in
          | complexity, and burdensome/error-prone for clients to
          | implement.
        
           | 01HNNWZ0MV43FF wrote:
           | Yeah I imagine 1 + 3 being popular. 1.1 is so simple to
           | implement and WebTransport / QUIC is basically a teeny VPN
           | connection.
        
           | apitman wrote:
           | The day they come for HTTP/1.1 is the day I die on a hill.
        
         | Sparkyte wrote:
         | Agreed on this.
        
         | geocar wrote:
         | I turned off HTTP2 and HTTP3 a few months ago.
         | 
         | I see a few million daily page views: Memory usage has been
         | down, latency has been down, network accounting (bandwidth) is
         | about the same. Revenue (ads) is up.
         | 
         | > It _can_ be faster -- that's proven -- but only when networks
         | are slow.
         | 
         | It can be faster in a situation that doesn't exist.
         | 
         | It sounds charitable to say something like "when networks are
         | slow" -- but because everyone has had a slow network
         | experience, they are going to think that QUIC would help them
        | out, but _real world slow network problems_ don't look like
         | the ones that QUIC solves.
         | 
         | In the real world, QUIC wastes memory and money and increases
         | latency on the _average case_. Maybe some Google engineers can
         | come up with a clever heuristic involving TCP options or the
         | RTT information to  "switch on QUIC selectively" but honestly I
         | wish they wouldn't bother, simply because I don't want to waste
         | my time benchmarking another half-baked google fart.
        
           | withinboredom wrote:
           | The thing is, very few people who use "your website" are on
           | slow, congested networks. The number of people who visit
           | google on a slow, congested network (airport wifi, phones at
           | conferences, etc) is way greater than that. This is a
           | protocol to solve a google problem, not a general problem or
           | even a general solution.
        
             | geocar wrote:
              | Since I buy ads on Google for my site, I would argue it's
              | representative of Google's traffic.
             | 
             | But nice theory.
        
               | withinboredom wrote:
               | It's not. Think about what you search for on your mobile,
               | while out or traveling, and what you search for on
               | desktop/wifi. They are vastly different. Your traffic is
               | not representative of the majority of searches.
        
               | geocar wrote:
               | I'm sure the majority of searches are for "google" and
               | "facebook" and you're right in a way: I'm not interested
               | in those users.
               | 
               | I'm only interested in searches that advertisers are
               | interested in, but this is also where Google gets _their_
               | revenue from, so we are aligned with which users we want
               | to prioritise, so I do not understand who you possibly
                | think QUIC is for if not Google's ad business?
        
               | withinboredom wrote:
                | That's literally what I said; the entire protocol is
                | engineered for Google, not for everyone else. 99.99% of
                | websites out there do not need it.
        
               | suprjami wrote:
               | No. Google traffic is the Google Search page, Gmail,
               | GSuite like Drive and Meet, and YouTube. You probably
               | aren't hosting those.
        
           | replete wrote:
           | It's strange to read this when you see articles like this[0]
           | and see Lighthouse ranking better with it switched on.
            | Nothing beats real world stats though. Could this be down
            | to server/client implementation of HTTP2, or would you say
            | it's a fundamental implication of the design of the
            | protocol?
           | 
            | Trying to make my sites load faster led me to experiment
            | with QUIC, and ultimately I didn't trust it enough to leave
            | it on given the increase in complexity.
           | 
           | [0]: https://kiwee.eu/blog/http-3-how-it-performs-compared-
           | to-htt...
        
             | withinboredom wrote:
             | > It's strange to read this when you see articles like
             | this[0] and see Lighthouse ranking better with it switched
             | on.
             | 
             | I mean, Lighthouse is maintained by Google (IIRC), and I
             | can believe they are going to give their own protocol bonus
             | points.
             | 
             | > Could this be down to server/client implementation of
             | HTTP2 or would you say its a fundamental implication of the
             | design of the protocol?
             | 
             | For stable internet connections, you'll see http2 beat
             | http3 around 95% of the time. It's the 95th+ percentile
             | that really benefits from http3 on a stable connection.
             | 
             | If you have unstable connections, then http3 will win,
             | hands down.
        
       | sbstp wrote:
       | Even HTTP/2 seems to have been rushed[1]. Chrome has removed
       | support for server push. Maybe more thought should be put into
       | these protocols instead of just rebranding whatever Google is
       | trying to impose on us.
       | 
       | [1] https://varnish-
       | cache.org/docs/trunk/phk/h2againagainagain.h...
        
         | surajrmal wrote:
          | It's okay to make mistakes; that's how you learn and improve.
          | Being conservative has drawbacks of its own. I'd argue we
          | need more parties involved earlier in the process rather
          | than just time.
        
           | zdragnar wrote:
           | It's a weird balancing act. On the other hand, waiting for
           | everyone to agree on everything means that the spec will take
           | a decade or two for everyone to come together, and then all
           | the additional time for everyone to actively support it.
           | 
            | AJAX is a decent example. Microsoft's Outlook Web Access
            | team implemented XMLHTTP as an ActiveX thing for IE 5, and
            | soon the rest of the vendors adopted it as a standard thing
            | in the form of XMLHttpRequest objects.
           | 
           | In fact, I suspect the list of things that exist in browsers
           | because one vendor thought it was a good idea and everyone
           | hopped on board is far, far longer than those designed by
            | committee. Oftentimes, the initially released version is
            | not exactly the same as what everyone standardized on, but
            | they all get to build on the real-world consequences of it.
           | 
           | I happen to like the TC39 process https://tc39.es/process-
           | document/ which requires two live implementations with use in
           | the wild for something to get into the final stage and become
           | an official part of the specification. It is obviously harder
           | for something like a network stack than a JavaScript engine
           | to get real world use and feedback, but it has helped to keep
           | a lot of the crazier vendor specific features at bay.
        
           | arcbyte wrote:
            | It's okay to make mistakes, but it's not okay to ignore the
           | broad consensus that HTTP2 was TERRIBLY designed and then
           | admit it 10 years later as if it was unknowable. We knew it
           | was bad.
        
         | est wrote:
          | I don't blame Google; all major version changes are very
          | brave, and I praise them for that. The problem is the lack of
          | non-Google protocols for competition.
        
         | KaiserPro wrote:
         | HTTP2 was a prototype that was designed by people who either
         | assumed that mobile internet would get better much quicker than
         | it did, or who didn't understand what packet loss did to
         | throughput.
         | 
          | I suspect part of the rush is that people at major companies
          | will get a promotion if they do "high impact" work out in the
          | open.
         | 
         | HTTP/2 "solves head of line blocking" which is doesn't. It
         | exchanged an HTTP SSL blocking issues with TCP on the real
         | internet issue. This was predicted at the time.
         | 
         | The other issue is that instead of keeping it a simple
         | protocol, the temptation to add complexity to aid a specific
         | use case gets too much. (It's human nature I don't blame them)
        
           | pornel wrote:
           | H/2 doesn't solve blocking at the TCP level, but it solved
           | another kind of blocking at the protocol level by adding
           | multiplexing.
           | 
           | H/1 pipelining was unusable, so H/1 had to wait for a
           | response before sending the next request, which added a ton
           | of latency, and made server-side processing serial and
           | latency-sensitive. The solution to this was to open a dozen
           | separate H/1 connections, but that multiplied setup cost, and
           | made congestion control worse across many connections.
        
             | KaiserPro wrote:
             | > it solved another kind of blocking on the protocol level
             | 
             | Indeed! And it works well on low-latency, low-packet-loss
             | networks. On high-packet-loss networks, it performs worse
             | than HTTP1.1. Moreover, it gets increasingly worse the
             | larger the page the request is serving.
             | 
             | We pointed this out at the time, but were told that we
             | didn't understand the web.
             | 
             | > H/1 pipelining was unusable,
             | 
             | Yup, but think how easy it would be to create http1.2 with
             | a better spec for pipelining. (But then why not make
             | changes to other bits as well, and soon we get HTTP2!) Of
             | course, pipelining only really works on a low packet loss
             | network, because otherwise you get head of line blocking.
             | 
             | > open a dozen separate H/1 connections, but that
             | multiplied setup cost
             | 
             | Indeed, that SSL upgrade is a pain in the arse. But
             | connections are cheap to keep open. So with persistent
             | connections and pooling it's possible to really nail down
             | the latency.
             | 
             | Personally, I think the biggest problem with HTTP is that
             | it's a file access protocol, a state interchange protocol
             | and an authentication system all at once. I would
             | tentatively suggest that we adopt websockets to do state
             | (with some extra features like optional schema sharing
             | {yes, I know that's a bit of an anathema}), make http4 a
             | proper file sharing protocol, and have a third system for
             | authentication token generation, sharing and validation.
             | 
             | However the real world says that'll never work. So
             | connection pooling over TCP with quick start TLS would be
             | my way forward.
        
               | kiitos wrote:
               | > Personally, I think the biggest problem with HTTP is
               | that its a file access protocol, a state interchange
               | protocol and an authentication system.
               | 
               | HTTP is a state interchange protocol. It's not any of the
               | other things you mention.
        
         | liveoneggs wrote:
         | These big beta tests of protocols they cook up for whatever
         | reason are both part of, and evidence of, Google's monopoly
         | position in the web stack.
        
           | surajrmal wrote:
           | This is a weak argument that simply caters to the ongoing HN
           | hivemind opinion. While Google made the initial proposal,
           | many other parties did participate in getting QUIC
           | standardized. The industry at large was in favor.
        
             | oefrha wrote:
             | IETF QUIC ended up substantially different from gQUIC.
             | People who say Google somehow single-handedly pushed things
             | through probably haven't read anything along the
             | standardization process, but of course everyone has to have
             | an opinion about all things Google.
        
       | botanical wrote:
       | > we identify the root cause to be high receiver-side processing
       | overhead
       | 
       | I find this to be the issue when it comes to Google, and I bet
       | it was known beforehand: pushing processing to the user. For
       | example, the AV1 video codec was deployed when no consumer had HW
       | decoding capabilities. It saved them on space at the expense of
       | increased CPU usage for the end-user.
       | 
       | I don't know what the motive was there; perhaps it lets them
       | claim to be carbon-neutral while billions of devices are busy
       | processing the data.
        
         | anfilt wrote:
         | Well, I will say: if you're running servers that get hit
         | billions of times per day, offloading processing to the
         | client when safe to do so starts to make sense financially.
         | Google does not have to pay for your CPU or storage usage,
         | etc.
         | 
         | Also, if said overhead is not too much, it's not that bad of
         | a thing.
        
         | kccqzy wrote:
         | This is indeed an issue but it's widespread and everyone does
         | it, including Google. Things like servers no longer generating
         | actual dynamic HTML, replaced with servers simply serving pure
         | data like JSON and expecting the client to render it into the
         | DOM. It's not just Google that doesn't care, but the majority
         | of web developers also don't care.
        
           | SquareWheel wrote:
           | There are clearly advantages to writing a web app as an SPA,
           | otherwise web devs wouldn't do it. The idea that web devs
           | "don't care" (about what exactly?) really doesn't make any
           | sense.
           | 
           | Moving interactions to JSON in many cases is just a better
           | experience. If you click a Like button on Facebook, which is
           | the better outcome: To see a little animation where the
           | button updates, or for the page to reload with a flash of
           | white, throw away the comment you were part-way through
           | writing, and then scroll you back to the top of the page?
           | 
           | There's a reason XMLHttpRequest took the world by storm. More
           | than that, jQuery is still used on more than 80% of websites
           | due in large part to its legacy of making this process easier
           | and cross-browser.
        
             | tock wrote:
             | I don't think Facebook is the best example given the sheer
             | number of loading skeletons I see on their page.
        
             | consteval wrote:
             | > To see a little animation where the button updates, or
             | for the page to reload with a flash of white, throw away
             | the comment you were part-way through writing, and then
             | scroll you back to the top of the page
             | 
             | I don't understand how web devs understand the concept of
             | loading and manipulating JSON to dynamically modify the
             | page's HTML, but they don't understand the concept of
             | loading and manipulating HTML to dynamically modify the
             | page's HTML.
             | 
             | It's the same thing, except now you don't have to do a
             | conversion from JSON->HTML.
             | 
             | There's no rule anywhere saying receiving HTML on the
             | client should force a full page reload and throw out the
             | currently running JavaScript.
             | 
             | > XMLHttpRequest
             | 
             | This could've easily been HTMLHttpRequest and it would've
             | been the same API, but probably better. Unfortunately,
             | during that time period Microsoft was obsessed with XML.
             | Like... obsessed obsessed.
        
             | kccqzy wrote:
             | Rendering JSON into HTML has nothing to do with
             | XMLHttpRequest.
             | 
             | Funny that you mention jQuery. When jQuery was hugely
             | popular, people used it to make XMLHttpRequests that
             | returned HTML which you then set as the innerHTML of some
             | element. Of course being jQuery, people used the shorthand
             | of `$("selector").html(...)` instead.
             | 
             | In the heyday of jQuery the JSON.parse API didn't exist.
        
         | danpalmer wrote:
         | > the AV1 video codec was deployed when no consumer had HW
         | decoding capabilities
         | 
         | This was a bug. An improved software decoder was deployed for
         | Android and for buggy reasons the YouTube app used it instead
         | of a hardware accelerated implementation. It was fixed.
         | 
         | Having worked in a similar space (compression formats for app
         | downloads) I can assure you that all factors are accounted for
         | with decisions like this, we were profiling device thermals for
         | different compression formats. Setting aside bugs, the teams
         | behind things like this are taking wide-reaching views of the
         | ecosystem when making these decisions, and at scale, client
         | concerns almost always outweigh server concerns.
        
           | watermelon0 wrote:
           | YouTube had the same issue with VP9 on laptops, where you had
           | to use an extension to force H264, to avoid quickly draining
           | the battery.
        
           | toastal wrote:
           | If only they would give us JXL on Android
        
       | Sparkyte wrote:
       | Maybe I'm the only person who thinks that trying to make existing
       | internet protocols faster is wasted energy. But who am I to say
       | anything.
        
         | cheema33 wrote:
         | > Maybe I'm the only person who thinks that trying to make
         | existing internet protocols faster is wasted energy. But who am
         | I to say anything.
         | 
         | If you have a valid argument to support your claim, why not
         | present it?
        
           | Sparkyte wrote:
           | They are already established standards, so when you create
           | optimizations you're building on functions that need to be
           | supported additionally on top of them. This leads to
           | incompatibility and sometimes to worse performance, as is
           | being experienced here with QUIC.
           | 
           | You can read more about such things from, The Evolution of
           | the Internet Congestion Control. https://groups.csail.mit.edu
           | /ana/Publications/The_Evolution_...
           | 
           | A good solution is to create a newer protocol when the
           | limits of an existing protocol are exceeded. No one thought
           | of needing HTTPS long ago, and now we have 443 for HTTP
           | security. If something needs to be faster but has already
           | hit an arbitrary limit kept for the sake of backward
           | compatibility, it would be better to introduce a new
           | protocol.
           | 
           | I dislike the idea that we're turning into another Reddit
           | where we are pointing fingers at people for updoots. If you
           | dislike my opinion, please present one of your own so it
           | can be challenged.
        
             | likis wrote:
             | You posted your opinion without any kind of accompanying
             | argument, and it was also quite unclear what you meant.
             | Whining about being a target and being downvoted is not
             | really going to help your case.
             | 
             | I initially understood your first post as: "Let's not try
             | to make the internet faster"
             | 
             | With this reply, you are clarifying your initial post that
             | was very unclear. Now I understand it as:
             | 
             | "Let's not try to make existing protocols faster, let's
             | make new protocols instead"
        
               | Sparkyte wrote:
               | More that if a protocol has met its limit and you are
               | at a dead end, it is better to build a new one from the
               | ground up. Making the internet faster is great but you
               | eventually hit a wall. You need to be creative and come
               | up with better solutions.
               | 
               | In fact, our modern network infrastructure still rests
               | on designs intended for limited network performance.
               | Our networks are fiber and 5g, roughly 170,000 times
               | faster and wider than at the initial inception of the
               | internet.
               | 
               | Time for a QUICv2
               | 
               | https://datatracker.ietf.org/doc/rfc9369/
               | 
               | But I don't think it addresses the disparity between it
               | and lightweight protocols as networks get faster.
        
             | paulgb wrote:
             | > A good solution is to create a newer protocol when the
             | limits of an existing protcol are exceeded.
             | 
             | It's not clear to me how this is different to what's
             | happening. Is your objection that they did it on top of UDP
             | instead of inventing a new transport layer?
        
               | Sparkyte wrote:
               | No, actually what I meant was that QUIC, being a
               | protocol on UDP, was intended to take advantage of the
               | speed of UDP to do things faster than some TCP-based
               | protocols did. While the merit is there, the
               | optimizations done on TCP itself have drastically
               | improved the performance of TCP-based protocols. UDP is
               | still exceptional, but it is like using a crowbar to
               | open a bottle. Not exactly the tool intended for the
               | purpose.
               | 
               | Creating a new protocol from scratch would be effort
               | better spent. A QUICv2 is on the way.
               | https://datatracker.ietf.org/doc/rfc9369/
               | 
               | I don't think it addresses the problems with QUICv1 in
               | terms of lightweight performance and bandwidth which the
               | post claims QUIC lacks.
        
               | Veserv wrote:
               | QUICv2 is not really a new standard. It explicitly exists
               | merely to intentionally rearrange some fields to prevent
               | standard hardcoding/ossification and exercise the version
               | negotiation logic of implementations. It says so right in
               | the abstract:
               | 
               | "Its purpose is to combat various ossification vectors
               | and exercise the version negotiation framework."
        
               | simiones wrote:
               | Creating a new transport protocol for use on the whole
               | Internet is a massive undertaking, not only in purely
               | technical terms, but much more difficult, in political
               | terms. Getting all of the world's sysadmins to allow your
               | new protocol is a massive, massive undertaking.
               | 
               | And if you have the new protocol available today, with
               | excellent implementations for Linux, Windows, BSD, MacOS,
               | Apple iOS, and for F5, Cisco, etc routers done, it will
               | still take an absolute minimum of 5-10 years until it
               | _starts_ becoming available on the wider Internet, and
               | that is if people are _desperate_ to adopt it. And the
               | vast majority of the world will not use it for the next
               | 20 years.
               | 
               | The time for updating hardware to allow and use new
               | protocols is going to be a massive hurdle to anything
               | like this. And the advantage to doing so over just using
               | UDP would have to be monumental to justify such an
               | effort.
               | 
               | The reality is that there will simply not be a new
               | transport protocol used on the wide internet in our
               | lifetimes. Trying to get one to happen is a pipe dream.
               | Any attempts at replacing TCP will just use UDP.
        
               | hnfong wrote:
               | While you're absolutely correct, I think it is
               | interesting to note that your argument could also have
               | applied to the HTTP protocol itself, given how widely
               | HTTP is used.
               | 
               | _However_, in reality, the people/forces pushing for
               | HTTP2 and QUIC are the same one(s?) who have a de facto
               | monopoly on browsers.
               | 
               | So, yes, it's a political issue, and they just
               | implemented their changes on a layer (or even... "app")
               | that they had the most control over.
               | 
               | On a purely "moral" perspective, political expediency
               | probably shouldn't be the reason why something is done,
               | but of course that's what actually happens in the real
               | world...
        
               | simiones wrote:
               | There are numerous non-HTTP protocols used successfully
               | on the Internet, as long as they run over TCP or UDP.
               | Policing content running on TCP port 443 to enforce that
               | it is HTTP/1.1 over TLS is actually extremely rare,
               | outside some very demanding corporate networks. If you
               | wanted to send your own new "HTTP/7" traffic today, with
               | some new encapsulation over TLS on port 443, and you
               | controlled the servers and the clients for this, I think
               | you would actually meet minimal issues.
               | 
               | The problems with SCTP, or any new transport-layer
               | protocol (or any even lower layer protocol), run much
               | deeper than deploying a new protocol on any higher layer.
        
         | foul wrote:
         | It's wasted energy when they aren't used at their full
         | capacity.
         | 
         | I think that GoogleHTTP has real-world uses for bad
         | connectivity or in datacenters where they can fine-tune their
         | data throughput (and buy crazy good NICs), but it seems that to
         | use it for replacing TCP (which seems to be confirmed as very
         | good when receiver and sender aren't controlled by the same
         | party) the world needs a hardware overhaul or something.
        
           | Sparkyte wrote:
           | Maybe. The problem is that we designed around a limited
           | bandwidth network at the initial inception of the internet
           | and have been building around that design for 50 years. We
           | need to change the paradigm to think about our wide
           | bandwidth networks.
        
         | suprjami wrote:
         | You aren't the only one. The bufferbloat community has given up
         | on TCP altogether.
        
       | apitman wrote:
       | Currently chewing my way laboriously through RFC9000. Definitely
       | concerned by how complex it is. The high-level ideas of QUIC
       | seem fairly straightforward, but the spec feels full of edge cases
       | you must account for. Maybe there's no other way, but it makes me
       | uncomfortable.
       | 
       | I don't mind too much as long as they never try to take HTTP/1.1
       | from me.
        
         | ironmagma wrote:
         | Considering they can't really even make IPv6 happen, that seems
         | like a likely scenario.
        
           | BartjeD wrote:
           | https://www.google.com/intl/en/ipv6/statistics.html
           | 
           | I think it's just your little corner of the woods that isn't
           | adopting it. Over here the trend is very clearly to move away
           | from IPv4, except for legacy reasons.
        
             | apitman wrote:
             | The important milestone is when it's safe to turn IPv4 off.
             | And that's not going to happen until every country has
             | fully adopted IPv6, which I don't think is ever going to
             | happen. For better or worse, NAT handles outgoing
             | connections and SNI routing handles incoming connections
             | for most use cases. Self-hosting is the most broken but IMO
             | that's better handled with tunneling anyway so you don't
             | expose your home IP.
        
               | jeroenhd wrote:
               | IPv4 doesn't need to be off. Hacks and workarounds like
               | DS-Lite can stay with us forever, just like hacks and
               | workarounds like NAT and ALGs will.
        
               | consp wrote:
               | DS-lite (aka CGNAT): now we don't need to give the
               | customers a proper IP address anymore. It should be
               | banned, as it limits IPv6 adoption, is getting more and
               | more use "for the customers' own good", and is annoying
               | as hell to work around.
        
             | AlienRobot wrote:
             | >I think it's just your little corner of the woods that
             | isn't adopting it.
             | 
             | The graph says adoption is under 50%.
             | 
             | Even the U.S. is at only 50%. Some countries are under 1%.
        
               | BartjeD wrote:
               | Parts of the EU: 74%
        
               | ktosobcy wrote:
               | And others are 10-15%...
        
             | alt227 wrote:
             | The majority of this traffic is mobile devices. Most use
             | ipv6 by default.
             | 
             | Uptake on desktops/laptops/servers is still extremely low
             | and will be for a long time to come.
        
               | sandos wrote:
               | Sweden is awful here, neither my home connection nor my
               | phone uses ipv6.
               | 
               | We were once very early with internet stuff, but now
               | we're lagging, it seems.
        
             | ktosobcy wrote:
             | Save for France/Germany (~75%) and then USA/Mexico/Brazil
             | (~50%), the rest of the world is not really adopting
             | it... Even in Europe, Spain has only ~10% and Poland ~17%
             | penetration, but yeah... let's be dismissive with "your
             | little corner"...
        
               | 71bw wrote:
               | >and Poland ~17% penetration
               | 
               | Almost exclusively due to Orange Polska -> belongs to
               | France Telecom -> go figure...
        
             | arp242 wrote:
             | Adoption is not even 50%, and the line goes up fairly
             | linearly, so ~95% will be around 2040 or so?
             | 
             | And if you click on the map view you will see "little
             | corner of the woods" is ... the entire continent of Africa,
             | huge countries like China and Indonesia.
        
             | mardifoufs wrote:
             | Why did adoption slow down after a sudden rise? I guess
             | some countries switched to ipv6 and since then, progress
             | has been slow? It's hard to infer from the graph, but my
             | guess would be India? They have a very nice adoption rate.
             | 
             | Sadly here in Canada I don't think any ISP even supports
             | IPv6 in any shape or form except for mobile. Videotron has
             | been talking about it for a decade (and they have a
             | completely outdated infrastructure now, only DOCSIS and a
             | very bad implementation of it too), and Bell has fiber but
             | does not provide any info on that either.
        
               | jtakkala wrote:
               | Rogers and Teksavvy support IPv6
        
               | mardifoufs wrote:
               | Ah, that's cool! It sucks that they are basically
               | non-existent in Quebec, at least for residential internet.
               | But I think they are pushing for a bigger foothold here
        
               | apitman wrote:
               | There's simply not enough demand. ISPs can solve their IP
               | problems with NAT. Web services can solve theirs with SNI
               | routing. The only people who really need IPv6 are
               | self-hosters.
        
         | jakeogh wrote:
         | I think keeping HTTP/1.1 is almost as important as not dropping
         | IPV4 (there are good reasons to not be able to tag
         | everything; it's harder to block a country than a user). For
         | similar reasons we should keep old protocols.
         | 
         | On a positive note, AFAICT 90%(??) of QUIC implementations
         | ignored the proposed spin bit:
         | https://news.ycombinator.com/item?id=20990754
        
       | jauntywundrkind wrote:
       | I wonder if these results reproduce on Windows. Is there any TCP
       | offload or GSO there? If not, maybe the results wouldn't vary?
        
         | v1ne wrote:
         | Oh, sure there is! https://learn.microsoft.com/en-us/windows-
         | hardware/drivers/n...
        
       | raggi wrote:
       | There are a number of concrete problems:
       | 
       | - syscall interfaces are a mess, the primitive APIs are too slow
       | for regular sized packets (~1500 bytes), the overhead is too
       | high. GSO helps but it's a horrible API, and it's been buggy even
       | lately due to complexity and poor code standards.
       | 
       | - the syscall costs got even higher with spectre mitigation - and
       | this story likely isn't over. We need a replacement for the BSD
       | sockets / POSIX APIs; they're terrible this decade. Yes, uring
       | is fancy, but there's a tutorial-level API middle ground
       | possible that should be safe and have 10x less overhead without
       | resorting to uring-level complexity.
       | 
       | - system udp buffers are far too small by default - they're much
       | much smaller than their tcp siblings, essentially no one but
       | experts have been using them, and experts just retune stuff.
       | 
       | - udp stack optimizations are possible (such as possible route
       | lookup reuse without connect(2)), gso demonstrates this, though
       | as noted above gso is highly fallible, quite expensive itself,
       | and the design is wholly unnecessarily intricate for what we
       | need, particularly as we want to do this safely from unprivileged
       | userspace.
       | 
       | - several optimizations currently available only work at low/mid-
       | scale, such as connect binding to (potentially) avoid route
       | lookups / GSO only being applicable on a socket without high
       | peer-competition (competing peers result in short offload chains
       | due to single-peer constraints, eroding the overhead wins).
       | 
       | Despite all this, you can implement GSO and get substantial
       | performance improvements, we (tailscale) have on Linux. There
       | will be a need at some point for platforms to increase platform
       | side buffer sizes for lower end systems, high load/concurrency,
       | bdp and so on, but buffers and congestion control are a highly
       | complex and sometimes quite sensitive topic - nonetheless, when
       | you have many applications doing this (presumed future state),
       | there will be a need.
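       | 
       | To make the GSO point concrete: a minimal sketch (illustrative
       | only, not production code; assumes Linux 4.18+ and headers
       | that expose UDP_SEGMENT, and the gso_send name is made up) of
       | handing the kernel one large buffer that it splits into
       | equal-sized datagrams, instead of paying one syscall per
       | packet:
       | 
       |   #include <sys/types.h>
       |   #include <sys/socket.h>
       |   #include <netinet/in.h>
       |   #include <netinet/udp.h>  /* SOL_UDP, UDP_SEGMENT */
       | 
       |   /* Send len bytes as a train of 1200-byte datagrams on a
       |      connect()ed UDP socket, in a single send() syscall. */
       |   ssize_t gso_send(int fd, const void *buf, size_t len)
       |   {
       |       int seg = 1200;  /* per-datagram payload size */
       |       setsockopt(fd, SOL_UDP, UDP_SEGMENT, &seg, sizeof(seg));
       |       return send(fd, buf, len, 0);
       |   }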
        
         | SomaticPirate wrote:
         | What is GSO?
        
           | thorncorona wrote:
           | presumably generic segmentation offloading
        
           | throwaway8481 wrote:
           | Generic Segmentation Offload
           | 
           | https://www.kernel.org/doc/html/latest/networking/segmentati.
           | ..
        
           | chaboud wrote:
           | Likely Generic Segmentation Offload (if memory serves), which
           | is a generalization of TCP segmentation offload.
           | 
           | Basically (hyper simple), the kernel can lump stuff together
           | when working with the network interface, which cuts down on
           | ultra slow hardware interactions.
        
             | raggi wrote:
             | it was originally for the hardware, but it's also valuable
             | on the software side, as the cost of syscalls is far too
             | high for packet-sized transactions
        
           | jesperwe wrote:
           | Generic Segmentation Offload
           | 
           | "GSO gains performance by enabling upper layer applications
           | to process a smaller number of large packets (e.g. MTU size
           | of 64KB), instead of processing higher numbers of small
           | packets (e.g. MTU size of 1500B), thus reducing per-packet
           | overhead."
        
             | underdeserver wrote:
             | This is more the result.
             | 
             | Generally today an Ethernet frame, which is the basic
             | atomic unit of information over the wire, is limited to
             | 1500 bytes (the MTU, or Maximum Transmission Unit).
             | 
             | If you want to send more - the IP layer allows for 64k
             | bytes per IP packet - you need to split the IP packet into
             | multiple (64k / 1500 plus some header overhead) frames.
             | This is called segmentation.
             | 
             | Before GSO, the kernel would do that, which takes
             | buffering and CPU time to assemble the frame headers. GSO
             | moves this to the Ethernet hardware, which is essentially
             | doing the same thing, only hardware accelerated and
             | without taking up a CPU core.
        
           | USiBqidmOOkAqRb wrote:
           | Shipping? Government services online? Piedmont airport?
           | Alcoholics anonymous? Obviously not.
           | 
           | _Please_ introduce your initialisms, if it's not
           | guaranteed that the first result in a search will be
           | correct.
        
             | mh- wrote:
             | _> first result in a search will be correct_
             | 
             | Searching for _GSO network_ gives you the correct answer
             | in the first result. I'd consider that condition met.
        
         | cookiengineer wrote:
         | Say what you want but I bet we'll see lots of eBPF modules
         | being loaded in the future for the very reason you're
         | describing. An eBPF QUIC module? Why not!
         | 
         | And that scares me, because there's not a single tool that has
         | this on its radar for malware detection/prevention.
        
           | raggi wrote:
           | we can consider ebpf "a solution" when there's even a remote
           | chance you'll be able to do it from an unentitled ios app.
           | somewhat hyperbole, but the point is, this problem is a
           | problem for userspace client applications, and bpf isn't a
           | particularly "good" solution for servers either: it's a
           | high cost of authorship for a problem that is easily
           | solvable with a better API to the network stack.
        
             | mgaunard wrote:
             | ebpf is linux technology, you will never be able to do it
             | from iOS.
        
               | dan-robertson wrote:
               | https://github.com/microsoft/ebpf-for-windows
        
         | JoshTriplett wrote:
         | > Yes, uring is fancy, but there's a tutorial level API middle
         | ground possible that should be safe and 10x less overhead
         | without resorting to uring level complexity.
         | 
         | I don't think io_uring is as complex as its reputation
         | suggests. I don't think we need a substantially simpler _low-
         | level_ API; I think we need more high-level APIs built on top
         | of io_uring. (That will also help with portability: we need
         | APIs that can be most efficiently implemented atop io_uring but
         | that work on non-Linux systems.)
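         | 
         | For a sense of the surface area, the happy path is roughly
         | this (a sketch using liburing, error handling omitted;
         | uring_send is an illustrative helper, not a library API):
         | 
         |   #include <liburing.h>
         | 
         |   /* Queue one send and wait for its completion. */
         |   int uring_send(int fd, const void *buf, size_t len)
         |   {
         |       struct io_uring ring;
         |       struct io_uring_cqe *cqe;
         |       io_uring_queue_init(8, &ring, 0);
         |       struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
         |       io_uring_prep_send(sqe, fd, buf, len, 0);
         |       io_uring_submit(&ring);          /* one syscall */
         |       io_uring_wait_cqe(&ring, &cqe);
         |       int res = cqe->res;              /* bytes or -errno */
         |       io_uring_cqe_seen(&ring, cqe);
         |       io_uring_queue_exit(&ring);
         |       return res;
         |   }
         | 
         | The hard parts are less this happy path and more buffer
         | lifetimes and completion ordering, which is exactly what
         | higher-level wrappers would hide.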
        
           | raggi wrote:
           | > I don't think io_uring is as complex as its reputation
           | suggests.
           | 
           | uring is extremely problematic to integrate into many common
           | application / language runtimes and it has been demonstrably
           | difficult to integrate into linux safely and correctly as
           | well, with a continual stream of bugs, security and policy
           | control issues.
           | 
           | in principle a shared memory queue is a reasonable basis for
           | improving the IO cost between applications and IO stacks such
           | as the network or filesystem stacks, but this isn't easy to
           | do well, cf. uring bugs and binder bugs.
        
             | JoshTriplett wrote:
             | > with a continual stream of bugs, security and policy
             | control issues
             | 
             | This has not been true for a long time. There was an early
             | design mistake that made it quite prone to these, but that
             | mistake has been fixed. Unfortunately, the reputational
             | damage will stick around for a while.
        
               | raggi wrote:
               | 13 CVEs so far this year afaik
        
               | bonzini wrote:
               | CVE numbers from the Linux CNA are bollocks.
        
               | JoshTriplett wrote:
               | This conversation would be a good one to point them to to
               | show that their policy is not just harmless point-
               | proving, but in fact does cause harm.
               | 
               | For context, to the best of my knowledge the current
               | approach of the Linux CNA is, in keeping with long-
               | standing Linux security policy of "every single fix
                | _might_ be a security fix", to assign CVEs regardless of
               | whether something has any security impact or not.
        
               | di4na wrote:
               | I would not call it harm. The use of uring in higher
               | level languages is definitely prone to errors, bugs and
               | security problems
        
               | JoshTriplett wrote:
               | See the context I added to that comment; this is not
               | about security issues, it's about the Linux CNA's absurd
               | approach to CVE assignment for things that aren't CVEs.
        
               | tialaramex wrote:
               | I don't agree that it's absurd. I would say it reflects a
               | proper understanding of their situation.
               | 
               | You've doubtless heard Tony Hoare's "There are two ways
               | to write code: write code so simple there are obviously
               | no bugs in it, or write code so complex that there are no
               | obvious bugs in it.". Linux is definitely in the latter
               | category, it's now such a sprawling system that
               | determining whether a bug "really" has security
               | implications is no longer a reasonable task compared to
               | just fixing the bug.
               | 
               | The other reason is that Linux is so widely used that
               | almost no assumption made to simplify that above task is
               | definitely correct.
        
               | JoshTriplett wrote:
               | That's fine, except that it is thus no longer meaningful
               | to compare CVE count.
        
               | hifromwork wrote:
               | I like CVEs, I think Linux's approach to CVEs is stupid,
               | but also it was never meaningful to compare CVE count.
               | But I guess it's hard to make people stop doing that, and
               | that's the reason Linux does the thing it does out of
               | spite.
        
               | kuschku wrote:
               | CVE assignment != security issue
               | 
               | CVE numbers are just a way to ensure everyone is talking
               | about the same bug. Not every security issue has a CVE,
               | not every CVE is a security issue.
               | 
               | Often, a regular bug turns out years later to have been a
               | security issue, or a security issue turns out to have no
               | security impact at all.
               | 
               | If you want a central authority to tell you what to
               | think, just use CVSS instead of the binary "does it have
               | a CVE" metric.
        
               | skywhopper wrote:
               | That's definitely not the understanding that literally
               | anyone outside the Linux team has for what a CVE is,
               | including the people who came up with them and run the
               | database. Overloading a well-established mechanism of
               | communicating security issues to just be a registry of
               | Linux bugs is an abuse of an important shared resource.
               | Sure "anything could be a security issue" but in
               | practice, most bugs aren't, and putting meaningless bugs
               | into the international security issue database is just a
               | waste of everyone's time and energy to make a very stupid
               | point.
        
               | kuschku wrote:
               | > including the people who came up with them
               | 
               | How do you figure that? The original definition of CVE is
               | exactly the same as how Linux approaches it.
               | 
               | Sure, in recent years security consultants have been
               | overloading CVE to mean something else, but that's
               | something to fix, not to keep.
        
               | jiripospisil wrote:
               | Can you post the original definition?
        
               | vel0city wrote:
               | Common Vulnerabilities and Exposures
        
               | jiripospisil wrote:
               | Right but I was hoping for a definition which supports
               | OP's claim that "CVE assignment != security issue".
        
               | kuschku wrote:
               | Then check out these definitions, from 2000, defined by
               | the CVE editorial board:
               | 
               | > The CVE list aspires to describe and name all publicly
               | known facts about computer systems that could allow
               | somebody to violate a reasonable security policy for that
               | system
               | 
               | As well as:
               | 
               | > Discussions on the Editorial Board mailing list and
               | during the CVE Review meetings indicate that there is no
               | definition for a "vulnerability" that is acceptable to
               | the entire community. At least two different definitions
               | of vulnerability have arisen and been discussed. There
               | appears to be a universally accepted, historically
               | grounded, "core" definition which deals primarily with
               | specific flaws that directly allow some compromise of the
               | system (a "universal" definition). A broader definition
               | includes problems that don't directly allow compromise,
               | but could be an important component of a successful
               | attack, and are a violation of some security policies (a
               | "contingent" definition).
               | 
               | > In accordance with the original stated requirements for
               | the CVE, the CVE should remain independent of multiple
               | perspectives. Since the definition of "vulnerability"
               | varies so widely depending on context and policy, the CVE
               | should avoid imposing an overly restrictive perspective
               | on the vulnerability definition itself.
               | 
               | Under this definition, any kernel bug that could lead to
               | user-space software acting differently is a CVE.
               | Similarly, all memory management bugs in the kernel
               | justify a CVE, as they could be used as part of an
               | exploit.
        
               | jiripospisil wrote:
               | > to violate a reasonable security policy for that system
               | 
               | > with specific flaws that directly allow some compromise
               | of the system
               | 
               | > important component of a successful attack, and are a
               | violation of some security policies
               | 
               | All of these are talking about security issues, not
               | "acting differently".
        
               | josefx wrote:
               | > All of these are talking about security issues, not
               | "acting differently".
               | 
               | Because no system has ever been taken down by code that
               | behaved differently from what it was expected to do?
               | Right? Like http desync attacks, sql escape bypasses,
               | ... . Absolutely no security issue is going to be caused
               | by a very minor and by itself very secure difference in
               | behavior.
        
               | kuschku wrote:
               | > important component of a successful attack, and are a
               | violation of some security policies
               | 
               | If the kernel returned random values from gettime, that'd
               | lead to tls certificate validation not being reliable
               | anymore. As a result, any bug in gettime is certainly
               | worthy of a CVE.
               | 
               | If the kernel shuffled filenames so they'd be returned
               | backwards, apparmor and selinux profiles would break.
               | As a result, that'd be worthy of a CVE.
               | 
               | If the kernel has a memory corruption, use after free,
               | use of uninitialized memory or refcounting issue, that's
               | obviously a violation of security best practices and can
               | be used as component in an exploit chain.
               | 
               | Can you now see how almost every kernel bug can and most
               | certainly will be turned into a security issue at some
               | point?
        
               | cryptonector wrote:
               | > that could allow somebody to violate a reasonable
               | security policy for that system
               | 
               | That's "security bug". Please stop saying it's not.
        
               | kuschku wrote:
               | As detailed in my sibling reply, by definition that
               | includes any bug in gettime (as that'd affect tls
               | certificate validation), any bug in a filesystem (as
               | that'd affect loading of selinux/apparmor profiles), any
               | bug in eBPF (as that'd affect network filtering), etc.
               | 
               | Additionally, any security bug in the kernel itself, so
               | any use after free, any refcounting bug, any use of
               | uninitialized memory.
               | 
               | Can you now see why pretty much every kernel bug fulfills
               | that definition?
        
               | simiones wrote:
               | This is completely false. The CVE website defines these
               | very clearly:
               | 
               | > The mission of the CVE(r) Program is to identify,
               | define, and catalog _publicly disclosed cybersecurity
               | vulnerabilities_ [emphasis mine].
               | 
               | In fact, CVE stands for "Common Vulnerabilities and
               | Exposures", again showing that CVE == security issue.
               | 
               | It's of course true that just because your _code_ has an
               | unpatched CVE doesn't automatically mean that your
               | _system_ is vulnerable - other mitigations can be in
               | place to protect it.
        
               | kuschku wrote:
               | That's the modern definition, which is rewriting history.
               | Let's look at the actual, original definition:
               | 
               | > The CVE list aspires to describe and name all publicly
               | known facts about computer systems that could allow
               | somebody to violate a reasonable security policy for that
               | system
               | 
               | There's also a decision from the editorial board on this,
               | which said:
               | 
               | > Discussions on the Editorial Board mailing list and
               | during the CVE Review meetings indicate that there is no
               | definition for a "vulnerability" that is acceptable to
               | the entire community. At least two different definitions
               | of vulnerability have arisen and been discussed. There
               | appears to be a universally accepted, historically
               | grounded, "core" definition which deals primarily with
               | specific flaws that directly allow some compromise of the
               | system (a "universal" definition). A broader definition
               | includes problems that don't directly allow compromise,
               | but could be an important component of a successful
               | attack, and are a violation of some security policies (a
               | "contingent" definition).
               | 
               | > In accordance with the original stated requirements for
               | the CVE, the CVE should remain independent of multiple
               | perspectives. Since the definition of "vulnerability"
               | varies so widely depending on context and policy, the CVE
               | should avoid imposing an overly restrictive perspective
               | on the vulnerability definition itself.
               | 
               | For more details, see https://web.archive.org/web/2000052
               | 6190637fw_/http://www.cve... and https://web.archive.org/
               | web/20020617142755/http://cve.mitre....
               | 
               | Under this definition, any kernel bug that could lead to
               | user-space software acting differently is a CVE.
               | Similarly, all memory management bugs in the kernel
               | justify a CVE, as they could be used as part of an
               | exploit.
        
               | simiones wrote:
               | Those two links say that CVEs can be one of two
               | categories: universal vulnerabilities or exposures. But
               | the examples of exposures are _not_, in any way, "any
               | bug in the kernel". They give specific examples of things
               | which _are known_ to make a system more vulnerable to
               | attack, even if not everyone would agree that they are a
               | problem.
               | 
               | So yes, any CVE is supposed to be a security problem, and
               | it has always been so. Maybe not for your specific system
               | or for your specific security posture, but for someone's.
               | 
               | Extending this to any bugfix is a serious
               | misunderstanding of what an "exposure" means, and it is a
               | serious difference from other CNAs. Linux CNA-assigned
               | CVEs just can't be taken as seriously as those from
               | normal CNAs.
        
               | immibis wrote:
               | As I understand it, they adopted this policy because the
               | other policy was also causing harm.
               | 
               | They are right, by the way. When CVEs were used for
               | things like Heartbleed they made sense - you could point
               | to Heartbleed's CVE number and query various information
               | systems about vulnerable systems. When every single
               | possible security fix gets one, AND automated systems are
               | checking that you've patched every single one or else you
               | fail the audit (even ones completely irrelevant to the
               | system, like RCE on an embedded device with no internet
               | access) the system is not doing anything useful - it's
               | deleting value from the world and must be repaired or
               | destroyed.
        
               | hifromwork wrote:
               | The problem here are the automated systems and braindead
               | auditors, not the CVE system itself.
        
               | immibis wrote:
               | Well, the CVE system itself is only about assigning
               | identifiers, and assigning identifiers unnecessarily
               | couldn't possibly hurt anyone who isn't misusing the
               | system, unless they're running out of identifiers.
        
               | raggi wrote:
               | this is a bit of a distraction, sure the leaks and some
               | of the deadlocks are fairly uninteresting, but the
               | toctou, overflows, uid race/confusion and so on are real
               | issues that shouldn't be dismissed as if they don't
               | exist.
        
             | jeffparsons wrote:
             | I find this surprising, given that my initial response to
             | reading the io_uring design was:
             | 
             | 1. This is pretty clean and straightforward.
             | 
             | 2. This is obviously what we need to decouple a bunch of
             | things without the previous downsides.
             | 
             | What has made it so hard to integrate it into common
             | language runtimes? Do you have examples of where there's
             | been an irreconcilable "impedance mismatch"?
        
               | raggi wrote:
               | https://github.com/tailscale/tailscale/pull/2370 was a
               | practical drive toward this; we will not proceed on
               | this path.
               | 
               | much more approachable, boats has written about
               | challenges integrating in rust:
               | https://without.boats/tags/io-uring/
               | 
               | in the most general form: you need a fairly "loose"
               | memory model to integrate the "best" (performance wise)
               | parts, and the "best" (ease of use/forward looking
               | safety) way to integrate requires C library linkage. This
               | is troublesome in most GC languages, and many managed
               | runtimes. There's also the issue that uring being non-
               | portable means that the things it suggests you must do
               | (such as say pinning a buffer pool and making APIs like
               | read not immediate caller allocates) requires a
               | substantially separate API for this platform than for
               | others, or at least substantial reworks over all the
               | existing POSIX modeled APIs - thus back to what I said
               | originally, we need a replacement for POSIX & BSD here,
               | broadly applied.
        
               | gpderetta wrote:
               | I can see how a zero-copy API would be hard to implement
               | in some languages, but you could still implement
               | something on top of io_uring with posix buffer copy
               | semantics, while using batching to decrease syscall
               | overhead.
               | 
               | Zero-copy APIs will necessarily be tricky to implement
               | and use, especially on memory safe languages.
        
               | gmokki wrote:
               | I think most GC languages support native/pinned memory
               | (at least Java and C# do) to support talking to the
               | kernel or native libraries. The APIs are even quite
               | nice.
        
               | neonsunset wrote:
               | Java's off-heap memory and memory segment API is quite
               | dreadful and on the slower side. C# otoh gives you easy
               | and cheap object pinning, malloc/free and stack-allocated
               | buffers.
        
               | asveikau wrote:
               | I read the oldest of those blog posts most closely.
               | 
               | Seems like the author points out two things:
               | 
               | 1. The lack of rust futures supporting manual
               | cancellation. That doesn't seem like an inevitable choice
               | by rust.
               | 
               | 2. Sharing buffers with kernel mode. This is probably a
               | bigger topic.
        
               | withoutboats3 wrote:
               | Rust's async model can support io-uring fine, it just has
               | to be a different API based on ownership instead of
               | references. (That's the conclusion of my posts you link
               | to.)
        
             | arghwhat wrote:
             | Two things:
             | 
             | One, uring is not extremely problematic to integrate, as it
             | can be chained into a conventional event loop if you want
             | to, or can even be fit into a conventionally blocking
             | design to get localized syscall benefits. That is, you do
             | not need to convert to a fully uring event loop design,
             | even if that would be superior - and it can usually be kept
             | entirely within a (slightly modified) event loop
             | abstraction. The reason it has not yet been implemented is
             | just priority - most stuff _isn't_ bottlenecked on IOPS.
             | 
             | Two, yes, you could have a middle-ground. I assume the
             | syscall overhead you call out is the need to send UDP
             | packets one at a time through sendmsg/sendto, rather than
             | doing one big write for several packets worth of data on
             | TCP. An API that allowed you to provide a chain of
             | messages, like sendmsg takes an iovec for data, is
             | possible. But it's also possible to do this already as a
             | tiny blocking wrapper around io_uring, saving you new
             | syscalls.
        
               | Veserv wrote:
               | The system call to send multiple UDP packets in a single
               | call has existed since Linux 3.0 over a decade ago[1]:
               | sendmmsg().
               | 
               | [1] https://man7.org/linux/man-pages/man2/sendmmsg.2.html
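               | 
               | Rough shape of the API (a sketch with no error
               | handling; send_batch is an illustrative helper and
               | assumes a connect()ed UDP socket, so msg_name can
               | stay NULL):
               | 
               |   #define _GNU_SOURCE
               |   #include <sys/socket.h>
               |   #include <sys/uio.h>
               | 
               |   /* Send up to 64 datagrams in one syscall; returns
               |      how many were actually sent. */
               |   int send_batch(int fd, struct iovec *iov, unsigned n)
               |   {
               |       struct mmsghdr msgs[64] = {0};
               |       if (n > 64) n = 64;
               |       for (unsigned i = 0; i < n; i++) {
               |           msgs[i].msg_hdr.msg_iov = &iov[i];
               |           msgs[i].msg_hdr.msg_iovlen = 1;
               |       }
               |       return sendmmsg(fd, msgs, n, 0);
               |   }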
        
               | arghwhat wrote:
               | Ah nice, in that case OP's point about syscall overhead
               | is entirely moot. :)
               | 
               | That should really be in the `SEE ALSO` of `man 3
               | sendmsg`...
        
               | evntdrvn wrote:
               | patches welcome :p
        
               | justincormack wrote:
               | At one point, if I remember correctly, it didn't
               | actually work: it still just sent one message at a time
               | and returned the length of the first piece of the
               | iovec. Hopefully it got fixed.
        
               | johnp_ wrote:
               | Looks like Mozilla is currently working on implementing
               | `sendmmsg` and `recvmmsg` use in neqo (Mozilla's QUIC
               | implementation) [1].
               | 
               | [1] https://github.com/mozilla/neqo/issues/1693
        
               | londons_explore wrote:
               | I think you need to look at a common use case and
               | consider how many syscalls you'd like it to take and how
               | many CPU cycles would be reasonable.
               | 
               | Let's take downloading a 1MB jpeg image over QUIC and
               | rendering it on the screen.
               | 
               | I would hope that can be done in about 100k CPU cycles
               | and 20 syscalls, considering that all the jpeg decoding
               | and rendering is going to be hardware accelerated. The
               | decryption is also hardware accelerated.
               | 
               | Unfortunately, no network API allows that right now. The
               | CPU needs to do a substantial amount of processing for
               | every individual packet, in both userspace and kernel
               | space, for receiving the packet and sending the ACK, and
               | there is no 'bulk decrypt' non-blocking API.
               | 
               | Even the data path is troublesome - there should be a way
               | for the data to go straight from the network card to the
               | GPU, with the CPU not even touching it, but we're far
               | from that.
        
               | arghwhat wrote:
               | There's a few issues here.
               | 
               | 1. A 1 MB file is at the very least 64 individually
               | encrypted TLS records (16 KB max size each) sent in
               | sequence, possibly more. So 64 separate decryptions is
               | the most batching you can do - records are sized this
               | way so that verification and decryption can stream in
               | parallel with the download, whereas one big block would
               | make you wait for the very last byte before any
               | processing could start.
               | 
               | 2. TLS is still userspace and decryption does not involve
               | the kernel, and thus no syscalls. The benefits of kernel
               | TLS largely focus on servers sending files straight from
               | disk, bypassing userspace for the entire data processing
               | path. This is not really relevant on the receive side
               | for something you are actively decoding.
               | 
               | 3. JPEG is, to my knowledge, rarely hardware offloaded on
               | desktop, so no syscalls there.
               | 
               | Now, the number of actual syscalls end up being dictated
               | by the speed of the sender, and the tunable receive
               | buffer size. The slower the sender, the _more_ kernel
               | roundtrips you end up with, which allows you to amortize
               | the processing over a longer period so everything is
               | ready when the last packet is. For a fast enough sender
               | with big enough receive buffers, this could be a single
               | kernel roundtrip.
        
               | miohtama wrote:
               | JPEG is not a particularly great example. However, most
               | video streams are partially hardware decoded. Usually
               | you still need to decode part of the stream, namely
               | entropy coding and metadata, on the CPU first.
        
               | immibis wrote:
               | This system call you're asking for already exists - it's
               | called sendmmsg. There is also recvmmsg.
        
           | lukeh wrote:
           | async/await io_uring wrappers for languages such as Swift
           | [1] and Rust [2][3] can improve usability considerably. I'm
           | not super familiar with the Rust wrappers, but I've been
           | using IORingSwift for socket, file, and serial I/O for some
           | time now.
           | 
           | [1] https://github.com/PADL/IORingSwift [2]
           | https://github.com/bytedance/monoio [3]
           | https://github.com/tokio-rs/tokio-uring
        
           | anarazel wrote:
           | FWIW, the biggest problem I've seen with efficiently using
           | io_uring for networking is that none of the popular TLS
           | libraries have a buffer ownership model that really is
           | suitable for asynchronous network IO.
           | 
           | What you'd want is the ability to control the buffer for the
           | "raw network side", so that asynchronous network IO can be
           | performed without having to copy between a raw network buffer
           | and buffers owned by the TLS library.
           | 
           | It also would really help if TLS libraries supported
           | processing multiple TLS records in a batched fashion. Doing
           | roundtrips between app <-> tls library <-> userspace network
           | buffer <-> kernel <-> HW for every 16kB isn't exactly
           | efficient.
        
         | quotemstr wrote:
         | > Yes, uring is fancy, but there's a tutorial level API middle
         | ground possible that should be safe and 10x less overhead
         | without resorting to uring level complexity.
         | 
         | And the kernel has no business providing this middle-layer API.
         | Why should it? Let people grab whatever they need from the
         | ecosystem. Networking should be like Vulkan: it should have a
         | high-performance, flexible API at the systems level with being
         | "easy to use" a non-goal --- and higher-level facilities on
         | top.
        
           | astrange wrote:
           | The kernel provides networking because it doesn't trust
           | userspace to do it. If you provided a low level networking
           | API you'd have to verify everything a client sends is not
           | malicious or pretending to be from another process. And for
           | the same reason, it'd only work for transmission, not
           | receiving.
           | 
           | That and nobody was able to get performant microkernels
           | working at the time, so we ended up with everything in the
           | monokernel.
           | 
           | If you do trust the client processes then it could be better
           | to just have them read/write IP packets though.
        
           | namibj wrote:
           | Also, it is really easy to implement the normal IO "syscall
           | wrappers" on top of io_uring instead, even exposing a simple
           | async/await variant that splits the "block on completion"
           | step (after which, just as with normal IO, the data buffer
           | has been copied into kernel space) from the rest of the
           | normal IO syscall, which allows pipelining & coalescing of
           | requests.
        
         | modeless wrote:
         | Seems to me that the real problem is the 1500 byte MTU that
         | hasn't increased in practice in over _40 years_.
        
           | asmor wrote:
           | That's on the list, right after we all migrate to IPv6.
        
           | p_l wrote:
           | For all practical purposes, the internet MTU is lower than
           | the ethernet default MTU.
           | 
           | Sometimes, for peace of mind, I end up clamping it to the v6
           | minimum (1280), just in case.
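           | 
           | (On a Linux router, one common way to approximate that for
           | forwarded TCP traffic is an MSS clamp, e.g.:
           | 
           |     iptables -t mangle -A FORWARD -p tcp \
           |         --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
           | 
           | or a fixed value via --set-mss. That only helps TCP, though;
           | UDP-based protocols like QUIC have to pick safe packet sizes
           | themselves.)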
        
           | j16sdiz wrote:
           | The real problem is that some so-called "sysadmins" drop all
           | ICMP, breaking path MTU discovery.
        
             | icedchai wrote:
             | The most secure network is one that doesn't pass any
             | traffic at all. ;)
        
           | throw0101c wrote:
           | > _Seems to me that the real problem is the 1500 byte MTU
           | that hasn 't increased in practice in over 40 years._
           | 
           | As per a sibling comment, 1500 is just for Ethernet (the
           | default, jumbo frames being able to go to (at least) 9000).
           | But the Internet is more than just Ethernet.
           | 
           | If you're on DSL, then RFC 2516 states that PPPoE's MTU is
           | 1492 (and you probably want an MSS of 1452). The PPP, L2TP,
           | and ATM AAL5 standards all have 16-bit length fields allowing
           | for packets up to 64k in length. GPON ONT MTU is 2000. The
           | default MTU for LTE is 1428. If you're on an HPC cluster,
           | there's a good chance you're using Infiniband, which goes to
           | 4096.
           | 
           | What size do you suggest everyone on the planet go to?
           | Who exactly is going to get everyone to switch to the new
           | value?
        
             | Hikikomori wrote:
             | The internet is mostly ethernet these days (ISP
             | core/edge); last-mile connections like DSL and cable
             | already handle a smaller MTU, so they should be fine with
             | a bigger one.
        
               | cesarb wrote:
               | > The internet is mostly ethernet these days (ISP
               | core/edge),
               | 
               | A lot of that ISP edge is CPEs with WiFi, which AFAIK
               | limits the MTU to 2304 bytes.
        
               | throw0101c wrote:
               | > _The internet is mostly ethernet these days_ [...]
               | 
               | Except for the bajillion mobile devices in people's
               | pockets/purses.
        
             | fallingsquirrel wrote:
             | > What are size do you suggest everyone on the planet go
             | to?
             | 
             | 65536
             | 
             | > Who exactly is going to get everyone to switch to the new
             | value?
             | 
             | The same people who got everyone to switch to IPv6. It's a
             | missed opportunity that these migrations weren't done at
             | the same time imho.
             | 
             | It'll take a few decades, sure, but that's how big
             | migrations go. What's the alternative? Making no
             | improvements at all, forever?
        
               | 0xbadcafebee wrote:
               | > got everyone to switch to IPv6
               | 
               | I have some bad news...
               | 
               | > What's the alternative? Making no improvements at all,
               | forever?
               | 
               | No, sadly. The alternative is what the entire tech world
               | has been doing for the past 15 years: shove
               | "improvements" inside whatever crap we already have
               | because nobody wants to replace the crap.
               | 
               | If IPv6 were made today, it would be tunneled inside an
               | HTTP connection. All the new apps would adopt it, the
               | legacy apps would be abandoned or have shims made, and
               | the whole thing would be inefficient and buggy, but
               | adopted. Since poking my head outside of the tech world
               | and into the wider world, it turns out this is how most
               | of the world works.
        
               | MerManMaid wrote:
               | >If IPv6 were made today, it would be tunneled inside an
               | HTTP connection. All the new apps would adopt it, the
               | legacy apps would be abandoned or have shims made, and
               | the whole thing would be inefficient and buggy, but
               | adopted. Since poking my head outside of the tech world
               | and into the wider world, it turns out this is how most
               | of the world works.
               | 
               | What you're suggesting here wouldn't work: wrapping all
               | the addressing information inside HTTP, which itself
               | relies on IP for delivery, would be the equivalent of
               | sealing all the addressing information for a letter
               | you'd like to send _inside_ the envelope.
        
               | throw0101c wrote:
               | > _If IPv6 were made today, it would be tunneled inside
               | an HTTP connection._
               | 
               | Given that one of the primary goals of IPv6 was increased
               | address space, how would putting IPv6 in an HTTP
               | connection riding over IPv4 solve that?
        
         | Diggsey wrote:
         | Historically there have been too many constraints on the Linux
         | syscall interface:
         | 
         | - Performance
         | 
         | - Stability
         | 
         | - Convenience
         | 
         | - Security
         | 
         | This differs from eg. Windows because on Windows the stable
         | interface to the OS is in user-space, not tied to the syscall
         | boundary. This has resulted in unfortunate compromises in the
         | design of various pieces of OS functionality.
         | 
         | Thankfully things like futex and io-uring have dropped the
         | "convenience" constraint from the syscall itself and moved it
         | into user-space. Convenience is still important, but it doesn't
         | need to be a constraint at the lowest level, and shouldn't
         | compromise the other ideals.
        
         | amluto wrote:
         | Hi, Tailscale person! If you want a fairly straightforward
         | improvement you could make: Tailscale, by default, uses
         | SOCK_RAW, and having any raw socket listening at all hurts
         | receive performance system-wide:
         | 
         | https://lore.kernel.org/all/CALCETrVJqj1JJmHJhMoZ3Fuj685Unf=...
         | 
         | It shouldn't be particularly hard to port over the optimization
         | that prevents this problem for SOCK_PACKET. I'll get to it
         | eventually (might be quite a while), but I only care about this
         | because of Tailscale, and I don't have a ton of bandwidth.
        
           | raggi wrote:
           | Very interesting, thank you. We'll take a look at this!
        
           | bradfitz wrote:
           | BTW, that code changed just recently:
           | 
           | https://github.com/tailscale/tailscale/commit/1c972bc7cbebfc.
           | ..
           | 
           | It's now an AF_PACKET/SOCK_DGRAM fd, as it was originally
           | meant to be.
        
         | cryptonector wrote:
         | Of these the hardest one to deal with is route lookup caching
         | and reuse w/o connect(2). Obviously the UDP connected TCB can
         | cache that, but if you don't want a "connected" socket fd...
         | then there's nowhere else to cache it except ancillary data, so
         | ancillary data it would have to be. But getting return-to-
         | sender ancillary data on every read (so as to be able to copy
         | it to any sends back to the same peer) adds overhead, so that's
         | not good.
         | 
         | A system call to get that ancillary data adds overhead that can
         | be amortized by having the application cache it, so that's
         | probably the right design, and if it could be combined with
         | sending (so a new flavor of sendto(2)) that would be even
         | better, and it all has to be uring-friendly.
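         | 
         | For reference, the ancillary-data read path being described
         | looks roughly like this sketch (IPv4 with IP_PKTINFO; it
         | assumes IP_PKTINFO was enabled via setsockopt() at socket
         | setup, and error handling is omitted):
         | 
         |     #define _GNU_SOURCE
         |     #include <netinet/in.h>
         |     #include <sys/socket.h>
         |     
         |     void recv_with_pktinfo(int fd) {
         |         char buf[2048];
         |         char cbuf[CMSG_SPACE(sizeof(struct in_pktinfo))];
         |         struct sockaddr_in peer;   /* return-to-sender addr */
         |         struct iovec iov = { buf, sizeof(buf) };
         |         struct msghdr mh = {0};
         |         mh.msg_name = &peer;
         |         mh.msg_namelen = sizeof(peer);
         |         mh.msg_iov = &iov;
         |         mh.msg_iovlen = 1;
         |         mh.msg_control = cbuf;   /* ancillary data lands here */
         |         mh.msg_controllen = sizeof(cbuf);
         |         recvmsg(fd, &mh, 0);
         |         for (struct cmsghdr *c = CMSG_FIRSTHDR(&mh); c;
         |              c = CMSG_NXTHDR(&mh, c)) {
         |             if (c->cmsg_level == IPPROTO_IP &&
         |                 c->cmsg_type == IP_PKTINFO) {
         |                 struct in_pktinfo *pi =
         |                     (struct in_pktinfo *)CMSG_DATA(c);
         |                 /* pi->ipi_addr: local address to reply from */
         |                 (void)pi;
         |             }
         |         }
         |     }
         | 
         | Doing that copy on every read is exactly the per-read
         | overhead described above.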
        
       | latentpot wrote:
       | QUIC is a standard problem across any number of clients who
       | choose Zscaler and similar content inspection tools. You can
       | block it at the policy level, but you also need to have it
       | disabled at the browser level - which sometimes magically turns
       | back on and leads to a flurry of tickets for 'slow internet',
       | 'Google search not working', etcetera.
        
         | watermelon0 wrote:
         | Wouldn't the issue in this case be with Zscaler, and not with
         | QUIC?
        
         | chgs wrote:
         | The problem here is choosing software like zscaler
        
           | mcosta wrote:
           | Zscaler is not chosen, it is imposed by the corporation
        
         | v1ne wrote:
         | Hmm, interesting. We also have policies imposed by the
         | Regulator(tm) that lead to us inspecting all web traffic. All
         | web traffic goes through a proxy that's configured in the web
         | browser. No proxy, no internet.
         | 
         | Out of curiosity: What's your use case to use ZScaler for this
         | inspection instead?
        
       | AlphaCharlie wrote:
       | Free PDF file of the research: https://arxiv.org/pdf/2310.09423
        
       | jiggawatts wrote:
       | I wonder if the trick might be to repurpose technology from
       | server hardware: partition the physical NIC into virtual PCI-e
       | devices with distinct addresses, and map to user-space processes
       | instead of virtual machines.
       | 
       | So in essence, each browser tab or even each listening UDP socket
       | could have a distinct IPv6 address dedicated to it, with packets
       | delivered into a ring buffer in user-mode. This is so similar to
       | what goes on with hypervisors now that existing hardware designs
       | might even be able to handle it already.
       | 
       | Just an idle thought...
        
         | KaiserPro wrote:
         | Or just have multiple TCP streams. Super simple, low cost, uses
         | all the optimisations we have already.
         | 
         | When the latency/packet drop is low, prune the connections
         | you get monster speed.
         | 
         | When the latency/loss is high, grow the number of concurrent
         | connections to overcome slow start.
         | 
         | Doesn't give you QUIC-like multipath though.
        
           | m_eiman wrote:
           | There's Multipath TCP.
        
             | KaiserPro wrote:
             | I mean there is, but from what I recall it's more a
             | link-aggregation thing, rather than a network-portable
             | system.
        
         | jeroenhd wrote:
         | I've often pondered if it was possible to assign every
         | application/tab/domain/origin a different IPv6 address to
         | exchange data with, to make tracking people just a tad harder,
         | but also to simplify per-process firewall rules. With the bare
         | minimum, a /64, you could easily host billions of addresses per
         | device without running out.
         | 
         | I think there may be a limit to how many IP addresses NICs (and
         | maybe drivers) can track at once, though.
         | 
         | What I don't really get is why QUIC had to be invented when
         | multi-stream protocols like SCTP already exist. SCTP brings the
         | reliability of TCP with the multi-stream system that makes QUIC
         | good for websites. Piping TLS over it is a bit of a pain (you
         | don't want a separate handshake per stream), but surely there
         | could be techniques to make it less painful (leveraging 0-RTT?
         | using session resumption with tickets from the first connected
         | stream?).
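         | 
         | (For concreteness, a minimal sketch of SCTP multi-streaming on
         | Linux with lksctp-tools - open_multistream is a made-up name,
         | and error handling is omitted:
         | 
         |     #include <netinet/in.h>
         |     #include <netinet/sctp.h>
         |     
         |     int open_multistream(void) {
         |         int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP);
         |         /* request 8 outbound streams in the INIT handshake */
         |         struct sctp_initmsg init = { .sinit_num_ostreams = 8 };
         |         setsockopt(fd, IPPROTO_SCTP, SCTP_INITMSG,
         |                    &init, sizeof(init));
         |         return fd;
         |     }
         | 
         | After connect(), sctp_sendmsg() takes a stream number, so
         | independent transfers avoid head-of-line blocking across
         | streams.)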
        
           | simiones wrote:
           | First and foremost, you can't use SCTP on the Internet, so
           | the whole idea is dead on arrival. The Internet only really
           | works for TCP and UDP over IP - anything else, you have a
           | loooooong tail of networks which will drop the traffic.
           | 
           | Secondly, the whole point of QUIC is to merge the TLS and
           | transport handshakes into a single packet, to reduce RTT.
           | This would mean you need to modify SCTP anyway to allow for
           | this use case, so even what little support exists for SCTP
           | in the wild would need to be upgraded.
           | 
           | Thirdly, there is no reason to think that SCTP is better
           | handled than UDP at the kernel's IP stack level. All of the
           | problems of memory optimizations are likely to be much worse
           | for SCTP than for UDP, as it's used far, far less.
        
             | astrange wrote:
             | Is there a service like test-ipv6 to see if SCTP works?
             | Obviously harder to run since you can't do it in a browser.
        
               | simiones wrote:
               | I doubt there is, because it's just not a very popular
               | thing to even try. Even WebRTC, which uses SCTP for non-
               | streaming data channels, uses it over DTLS over UDP.
        
             | jeroenhd wrote:
             | I don't see why you can't use SCTP over the internet.
             | HTTP2 has fallbacks for broken or generally shitty
             | middleboxes; I don't see why weird corporate networks
             | should hold back the rest of the world.
             | 
             | TLS already does 0-RTT so you don't need QUIC for that.
             | 
             | The problem with UDP is that many optimisations are simply
             | not possible. The "TCP but with blackjack and hookers"
             | approach QUIC took makes it very difficult to accelerate.
             | 
             | SCTP is Fine(tm) on Linux but it's basically unimplemented
             | on Windows. Acceleration beyond what these protocols can do
             | right now requires either specific kernel/hardware QUIC
             | parsing or kernel mode SCTP on Windows.
             | 
             | Getting Microsoft to actually implement SCTP would be a
             | lot cleaner than hacking yet another protocol on top of
             | UDP out of fear of the mighty shitty middleboxes.
        
               | simiones wrote:
               | WebRTC decided they liked SCTP, so... they run it over
               | UDP (well, over DTLS over UDP). And while HTTP/2 might
               | fail over to HTTP/1.1, what would an SCTP session fall
               | back to?
               | 
               | The problem is not that Windows doesn't have in-kernel
               | support for SCTP (there are several user-space libraries
               | already available, you wouldn't even need to convince MS
               | to do anything). The blocking issue is that many, many
               | routers on the Internet, especially but not exclusively
               | around all corporate networks, will drop any packet that
               | is neither TCP nor UDP over IP.
               | 
               | And if you think UDP is not optimized, I'd bet you'll
               | find that the SCTP situation is far, far worse.
               | 
               | And regarding 0-RTT, that only works for resumed
               | connections, and it is still actually 1 RTT (the TCP
               | connection establishment). New connections need 2-3
               | round trips with TLS over TCP (1 for TCP, plus 1 for TLS
               | 1.3 or 2 for TLS 1.2); with QUIC they need only 1 round
               | trip (even when using TLS 1.2 for encryption). And with
               | QUIC you can have true 0-RTT traffic, sending the
               | (encrypted) HTTP request data in the very first packet
               | you send to a host [that you communicated with
               | previously].
        
               | kbolino wrote:
               | How is userspace SCTP possible on Windows? Microsoft
               | doesn't implement it in WinSock and, back in the XP SP2
               | days, Microsoft disabled/hobbled raw sockets and has
               | never allowed them since. Absent a kernel-mode driver, or
               | Microsoft changing their stance (either on SCTP or raw
               | sockets), you cannot send pure SCTP from a modern Windows
               | box using only non-privileged application code.
        
               | simiones wrote:
               | Per these Microsoft docs [0], it seems that it should
               | still be possible to open a raw socket on Windows 11, as
               | long as you don't try to send TCP or UDP traffic through
               | it (and have the right permissions, presumably).
               | 
               | Of course, to open a raw socket you need privileged
               | access, just like you do on Linux, because a raw socket
               | allows you to see and respond to traffic from any other
               | application (or even system traffic). But in principle
               | you should be able to make a Service that handles SCTP
               | traffic for you, and a non-privileged application could
               | send its traffic to this service and receive data back.
               | 
               | I did find some user-space library that is purported to
               | support SCTP on Windows [1], but it may be quite old and
               | not supported. Not sure if there is any real interest in
               | something like this.
               | 
               | [0] https://learn.microsoft.com/en-
               | us/windows/win32/winsock/tcp-...
               | 
               | [1] https://www.sctp.de/sctp-download.html
        
               | kbolino wrote:
               | Interesting. I think the service approach would now be
               | viable since it can be paired with UNIX socket support,
               | which was added a couple of years ago (otherwise COM or
               | RPC would be necessary, making clients more complicated
               | and Windows-specific). But yeah, the lack of interest is
               | the bigger problem now.
        
             | tepmoc wrote:
             | SCTP works fine on the internet, as long as your egress is
             | coming from a public IP and you don't perform NAT. So in
             | the case of IPv6 it's a non-issue, unless you sit behind
             | middleboxes.
             | 
             | Probably the best approach would be something like Happy
             | Eyeballs, but for transport:
             | https://datatracker.ietf.org/doc/html/draft-grinnemo-
             | taps-he
        
               | simiones wrote:
               | How many corporate or residential firewalls are
               | configured to allow SCTP traffic through?
        
               | tepmoc wrote:
               | Residential - not many. Corporate, on the other hand, is
               | a different story, which is why Happy Eyeballs for
               | transport would still be needed for a gradual rollout
               | anyway.
        
       | wseqyrku wrote:
       | There's work in progress on kernel support:
       | https://github.com/lxin/quic
        
       | crashingintoyou wrote:
       | Don't have access to the published version but draft at
       | https://arxiv.org/pdf/2310.09423 mentions ping RTT at 0.23ms.
       | 
       | As someone frequently at 150ms+ latency for a lot of websites
       | (and semi-frequently 300ms+ for non-geo-distributed websites),
       | in practice at that latency QUIC is easily the best for
       | throughput, HTTP/1.1 with a decent number of parallel
       | connections is a not-that-distant second, and a remote third is
       | HTTP/2, due to head-of-line-blocking issues if/when a packet
       | goes missing.
        
       | sylware wrote:
       | To go faster, you need to simplify a lot.
        
         | bell-cot wrote:
         | To force a lucrative cycle of hardware upgrades, you need
         | software to do the opposite.
         | 
         | True story: Back in the early aughties, Intel was hosting
         | regular seminars for dealers and integrators selling either
         | Intel-made PC's, or white box ones. I attended one of those,
         | and the Intel rep openly claimed that Intel had challenged
         | Microsoft to produce software which could bring a GHz CPU to
         | its knees.
        
       | larsonnn wrote:
       | Site is blocking Apple's Private Relay :(
        
       | Banou wrote:
       | I think one of the reasons Google chose UDP is that it's already
       | a popular protocol, on which you can build reliable packets
       | while also having the base UDP unreliability on the side.
       | 
       | From my perspective, which is a web developer's, having QUIC
       | allowed the web standards to easily piggyback on top of it for
       | the WebTransport API, which is way better than the current HTTP
       | stack and WebRTC, which is a complete mess. It basically gives
       | the web both a TCP and a UDP implementation.
       | 
       | Knowing this, it makes more sense to me why Google chose this
       | approach, which some people seem to be criticizing.
        
         | simoncion wrote:
         | > I think one of the reasons Google chose UDP is that it's
         | already a popular protocol...
         | 
         | If you want your packets to reliably travel fairly unmolested
         | between you and an effectively-randomly-chosen-peer on The
         | Greater Internet, you have two transport protocol choices:
         | TCP/IP or UDP/IP.
         | 
         | If you don't want the connection management etc. that TCP/IP
         | does for you, then you have exactly one choice.
         | 
         | > ...which some people seem to be criticizing.
         | 
         | People are criticizing the fact that on LAN link speeds (and
         | fast (for the US) home internet speeds) QUIC is no better than
         | (and sometimes worse than) previous HTTP transport protocols,
         | despite the large amount of effort put into it.
         | 
         | It also seems that some folks are suggesting that Google could
         | have put that time and effort into improving Linux's packet-
         | handling code and (presumably) getting that into both Android
         | and mainline Linux.
        
       | dathinab wrote:
       | it says it isn't fast _enough_
       | 
       | but as far as I can tell it's fast _enough_, just not as fast as
       | it could be
       | 
       | mainly they seem to test situations related to bandwidth/latency
       | which aren't very realistic for the majority of users (because
       | most users don't have super fast high-bandwidth internet)
       | 
       | this doesn't mean QUIC can't be faster or that we shouldn't look
       | into reducing overhead, just that it's likely not as big a deal
       | as it might initially look
        
       | M2Ys4U wrote:
       | >The results show that QUIC and HTTP/2 exhibit similar
       | performance when the network bandwidth is relatively low (below
       | ~600 Mbps)
       | 
       | >Next, we investigate more realistic scenarios by conducting the
       | same file download experiments on major browsers: Chrome, Edge,
       | Firefox, and Opera. We observe that the performance gap is even
       | larger than that in the cURL and quic_client experiments: on
       | Chrome, QUIC begins to fall behind when the bandwidth exceeds
       | ~500 Mbps.
       | 
       | Okay, well, this isn't going to be a problem over the general
       | Internet, it's more of a problem in local networks.
       | 
       | For people that have high-speed connections, how often are you
       | getting >500Mbps from a single source?
        
         | sinuhe69 wrote:
         | Well, I have other issues with QUIC: when I access Facebook
         | with QUIC, the site often loads the first pages but then it
         | kind of hangs, forcing me to refresh the site, which is
         | annoying. I didn't know it was a problem with QUIC until I
         | turned it off. Since then, FB & Co. load at the same speed,
         | but don't show this annoying behavior anymore!
        
         | inetknght wrote:
         | > _For people that have high-speed connections, how often are
         | you getting >500Mbps from a single source?_
         | 
         | Often enough over HTTP/1.1 that discussions like this are
         | relevant to my concerns.
        
       | thelastparadise wrote:
       | Gotta be QUIC er than that, buddy!
        
       | throw0101c wrote:
       | Netflix has gotten TCP/TLS up to 800 Gbps (over many streams):
       | 
       | * https://news.ycombinator.com/item?id=32519881
       | 
       | * https://news.ycombinator.com/item?id=33449297
       | 
       | hitting 100 Gbps (20k-30k customers) using less than 100W:
       | 
       | * https://twitter.com/ocochardlabbe/status/1781848334145130661
       | 
       | * https://news.ycombinator.com/item?id=40630699#unv_40630785
        
       | ahmetozer wrote:
       | For mobile connectivity -> QUIC
       | For home internet (wifi & cable access) -> HTTP/2
       | For heavily loaded enterprise slow wifi networks -> QUIC
        
       | necessary wrote:
       | Does QUIC do better with packet loss compared to TCP? TCP
       | perceives packet loss as network congestion, so throughput over
       | high-bandwidth, high-packet-loss links suffers.
        
       | AtNightWeCode wrote:
       | For us, what QUIC solves is that mobile users who move around on
       | the subway and so on no longer get these huge latency spikes,
       | which was one of our biggest complaints.
        
       | 404mm wrote:
       | When looking at the tested browsers, I want to ask why this was
       | not tested on Safari (which is currently the second most used
       | browser by share).
        
       | exabrial wrote:
       | QUIC needs an unencrypted mode!
        
         | suprjami wrote:
         | Pretty sure Dave Taht would explode if anyone did this.
        
       | edwintorok wrote:
       | TCP has a lot of offloads that may not all be available for UDP.
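       | 
       | For example, comparing `ethtool -k <iface>` output on a typical
       | Linux box: tcp-segmentation-offload (TSO) and
       | generic-receive-offload (GRO) have been on by default for years,
       | while UDP segmentation offload (tx-udp-segmentation, which QUIC
       | stacks can use via UDP GSO) is newer and less widely supported
       | by NICs.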
        
       ___________________________________________________________________
       (page generated 2024-09-09 23:01 UTC)