[HN Gopher] TCP Fast Open? Not so fast (2021)
       ___________________________________________________________________
        
       TCP Fast Open? Not so fast (2021)
        
       Author : fanf2
       Score  : 32 points
       Date   : 2024-10-23 11:42 UTC (11 hours ago)
        
 (HTM) web link (blog.apnic.net)
 (TXT) w3m dump (blog.apnic.net)
        
       | dang wrote:
       | Discussed at the time:
       | 
       |  _TCP Fast Open? Not so fast_ -
       | https://news.ycombinator.com/item?id=27745422 - July 2021 (20
       | comments)
        
       | lemagedurage wrote:
       | It's one of many network protocol improvements that could never
        | be used effectively due to middleboxes. QUIC is specifically
       | designed to prevent ossification like this.
       | 
       | https://en.wikipedia.org/wiki/Protocol_ossification#Examples
        
         | toast0 wrote:
         | This kind of ossification can be reduced by pressure campaigns.
         | 
         | Apple is very good at these. Mobile carriers were forced into
         | configuring IPv6 and allowing MPTCP because Apple included that
         | as part of certification to sell iPhones. Convince Apple that
         | TCP Fast Open is important, and they'll make it work on mobile
         | carriers through their considerable pressure. Home networks,
         | not so much, so you've always got to have heuristics and
          | detection, which again Apple is very good at: they've had
          | effective and rapid fallback for bad path MTU on iPhones for a
          | lot longer than Android, even though the Android kernel has had
          | options for it since the beginning --- they were only enabled
          | recently. I'm very much not an Apple fan beyond the Apple II
          | era, but they do client-side networking very well.
         | 
          | Google probably can't exert this kind of pressure directly;
         | they don't have the carrier sales volume, IMHO. Maybe Samsung
         | could. Nokia could have before the fall. Google could put it
         | into a PageSpeed type tool though; they've got influence
         | through that kind of tooling. And they control the two ends of
         | lots of traffic, so they could test through changes in ChromeOS
         | and their servers.
        
           | mschuster91 wrote:
           | > Google probably can't exert this kind of pressure directly,
           | they don't have the carrier sales volume, IMHO.
           | 
           | Google can just push the responsibility to manufacturers with
           | the Play Store certification. That's a huuuuuge leverage they
           | have.
           | 
            | Sadly, they don't use it for much other than pushing
            | anti-root crap.
        
       | kev009 wrote:
       | I sponsored implementing the client side for FreeBSD when I
        | worked at a large service provider. The use case is cleaner when
        | you control both ends, such as a cache/proxy to origin or to a
        | peer cache, where you aren't subject to packet manglers and can
        | configure a shared cookie out of band.
       | 
       | The typical microservices misarchitecture would really benefit
       | from this kind of thing since the setup time of TCP is
       | substantial.
       | 
        | We did have some ambitions to expose TFO to customers in some
        | way, but unfortunately I left before it ever went into testing to
        | see how it could be commercialized.
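        | 
        | For what it's worth, on Linux the client side is just a flag on
        | sendto(). A rough, untested sketch (IPv4 only for brevity;
        | tfo_connect_send is a made-up helper, and it assumes the client
        | bit of net.ipv4.tcp_fastopen is set, which is the default on
        | current kernels):
        | 
        |     #include <stddef.h>
        |     #include <sys/socket.h>
        |     #include <netinet/in.h>
        |     #include <unistd.h>
        | 
        |     int tfo_connect_send(const struct sockaddr_in *srv,
        |                          const void *req, size_t len)
        |     {
        |         int fd = socket(AF_INET, SOCK_STREAM, 0);
        |         if (fd < 0)
        |             return -1;
        |         /* sendto() with MSG_FASTOPEN does the connect; the data
        |          * rides in the SYN when a cookie is cached, otherwise
        |          * the kernel falls back to a normal handshake and sends
        |          * it after that completes. */
        |         if (sendto(fd, req, len, MSG_FASTOPEN,
        |                    (const struct sockaddr *)srv,
        |                    sizeof(*srv)) < 0) {
        |             close(fd);
        |             return -1;
        |         }
        |         return fd;  /* read the response from fd as usual */
        |     }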
        
       | fweimer wrote:
       | One curiosity: TCP Fast Open requires working path MTU discovery
        | because MSS clamping no longer works as a hack to advertise a
        | lower MTU: the Fast Open cookie adds an arbitrary amount of data
        | which counts towards the packet length, but not towards the
        | segment length.
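        | 
        | (Back-of-the-envelope: clamp the MSS to 1452 for a 1492-byte
        | PPPoE path, and a SYN carrying a full 1452 bytes of data is
        | already 1492 bytes with bare IP and TCP headers; the cookie
        | option on top of that --- 10 bytes for a typical 8-byte cookie
        | --- no longer fits the path.)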
        
         | toast0 wrote:
         | As described in the RFC, TCP Fast Open is supposed to be used
         | from a client IP that connected to the server IP recently. The
         | client could be expected to have discovered the effective MTU
          | during that previous connection and limit its SYN data to that
         | size.
         | 
         | The server gets the (presumably clamped) MSS option with the
         | SYN+data, and could use that, or possibly include the previous
          | MSS in its SYN cookie and use the lesser of the two. Or, more
          | conservatively, the size of the client's SYN packet, or 576/1280.
         | 
          | MSS/MTU doesn't have to be the same in both directions, but
          | it's usually a pretty safe assumption that it is. I've seen
          | much better results with broken networks when the SYN+ACK
          | sends back min(server mss, client mss) rather than always
          | sending back client mss, and the impact on working networks is
          | very small. Sending back min(server mss, client mss - X), where
          | X is 8 (assume PPPoE) or 20 (assume IPIP tunnel), works even
          | better at establishing working connections, although with some
          | additional overhead where the client actually knows its MSS.
          | Some devices clamp MSS on outbound SYN but not on inbound
          | SYN+ACK. _sigh_
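          | 
          | In code, that SYN+ACK pick is just (a hypothetical helper, not
          | lifted from any real stack):
          | 
          |     #include <stdint.h>
          | 
          |     /* slack covers tunnel overhead the client's MSS may not
          |      * account for: 8 for PPPoE, 20 for IPIP, 0 to trust it. */
          |     static uint16_t synack_mss(uint16_t server_mss,
          |                                uint16_t client_mss,
          |                                uint16_t slack)
          |     {
          |         uint16_t c = client_mss > slack ? client_mss - slack
          |                                         : client_mss;
          |         return server_mss < c ? server_mss : c;
          |     }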
         | 
          | Personally, if I were to deploy something like TCP Fast Open,
          | I'd dispense with the cookie... allow clients to speculatively
          | include SYN data if they want. Cap packets to 576/1280 bytes to
          | be reasonable. Servers should consider recent results of Fast
          | Open and local capacity to decide whether they want to accept
          | it. Server responses should be limited to the same size as the
          | client sent, to avoid amplification. Publish some common
          | heuristics --- if clients sending fast open data end up
          | connected for 90% of the SYN+ACKs sent, go ahead and process
          | the fast open data, but when success falls below that, don't.
          | Have some limit on how many fast opens you want to leave open.
          | Server side done.
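          | 
          | A sketch of that server-side gate (hypothetical, thresholds as
          | above):
          | 
          |     #include <stdbool.h>
          |     #include <stdint.h>
          | 
          |     struct tfo_gate {
          |         uint32_t synacks_sent;   /* for data-bearing SYNs */
          |         uint32_t acks_received;  /* of those, completed */
          |         uint32_t pending;        /* opens not yet completed */
          |         uint32_t max_pending;    /* local capacity limit */
          |     };
          | 
          |     /* Process SYN data only while recent fast opens mostly
          |      * complete the handshake and capacity remains. */
          |     static bool tfo_accept_data(const struct tfo_gate *g)
          |     {
          |         if (g->pending >= g->max_pending)
          |             return false;
          |         if (g->synacks_sent < 100)  /* not enough signal yet */
          |             return true;
          |         return g->acks_received * 10 >= g->synacks_sent * 9;
          |     }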
         | 
          | Client side: try it, and if it doesn't work, try without. Every
          | time the retry without works, double a skip counter; every time
          | the attempt with it works, decrement the skip counter. Max out
          | at trying once every 1024 connections. You can even do happy
          | eyeballs stuff like sending out a plain SYN after a short time
          | and using whichever comes back first; if the fast open answer
          | does come back a bit after the plain SYN, you know fast open is
          | viable on this network / to that destination.
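          | 
          | And the client-side back-off, roughly (again hypothetical):
          | 
          |     #include <stdbool.h>
          |     #include <stdint.h>
          | 
          |     struct tfo_backoff {
          |         uint32_t skip;       /* connections to go without it */
          |         uint32_t remaining;  /* countdown to the next attempt */
          |     };
          | 
          |     static bool tfo_try_now(struct tfo_backoff *b)
          |     {
          |         if (b->remaining == 0)
          |             return true;
          |         b->remaining--;
          |         return false;
          |     }
          | 
          |     static void tfo_report(struct tfo_backoff *b, bool worked)
          |     {
          |         if (worked) {
          |             if (b->skip > 0)
          |                 b->skip--;   /* ease toward always trying */
          |         } else {
          |             b->skip = b->skip ? b->skip * 2 : 1;
          |             if (b->skip > 1024)
          |                 b->skip = 1024;  /* ~once per 1024 connections */
          |         }
          |         b->remaining = b->skip;
          |     }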
        
       | rwmj wrote:
       | It's a good article but it seemed to leave some questions
       | unanswered for me. (1) Why isn't TFO enabled by default? I guess
       | the answer involves bad middleboxes, but then how is the
       | client/server meant to know any better, and isn't the very
       | conservative fallback supposed to mitigate that? (2) What
       | _should_ the queue size be set to?
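        | 
        | For concreteness on (2): as I understand it, the queue in
        | question is the qlen passed to the TCP_FASTOPEN setsockopt on
        | the listener (on Linux), which bounds how many fast-open
        | connections may sit there before their three-way handshakes
        | complete. A hypothetical helper:
        | 
        |     #include <sys/socket.h>
        |     #include <netinet/in.h>
        |     #include <netinet/tcp.h>
        | 
        |     static int enable_tfo_listener(int listen_fd, int qlen)
        |     {
        |         if (setsockopt(listen_fd, IPPROTO_TCP, TCP_FASTOPEN,
        |                        &qlen, sizeof(qlen)) < 0)
        |             return -1;
        |         return listen(listen_fd, SOMAXCONN);
        |     }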
        
       ___________________________________________________________________
       (page generated 2024-10-23 23:01 UTC)