[HN Gopher] TCP Connection Repair
       ___________________________________________________________________
        
       TCP Connection Repair
        
       Author : mfiguiere
       Score  : 39 points
       Date   : 2022-10-26 19:38 UTC (3 hours ago)
        
 (HTM) web link (lwn.net)
 (TXT) w3m dump (lwn.net)
        
       | verisimilitudes wrote:
       | So, this makes a TCP connection serializable in an ad hoc
       | fashion.
       | 
       | > It is natural to want those connections to follow the container
       | to its new host, preferably without the remote end even noticing
       | that something has changed, but the Linux networking stack was
       | not written with this kind of move in mind.
       | 
       | Yes, it was written in the C language.
        
         | yjftsjthsd-h wrote:
         | Why would the programming language have anything to do with it?
        
           | verisimilitudes wrote:
           | Some languages provide serialization of most anything by
           | default, such as Lisp. Now, even in Lisp there are objects
           | which don't make sense to serialize, including a TCP
           | connection; however, the components thereof can be collected
           | and sent across the wire or wherever else in a standardized
           | way. The C language, in comparison, offers a few
           | serialization routines for non-structured types, and that's
           | about all.
           | 
           | So, my point is the ability to take running state, serialize
           | it, and reinstate it elsewhere is only impressive to those
           | who have misused computers for so long that they don't
           | understand this was something basic in 1970 at the latest.
        
       | yokaze wrote:
       | (2012)
        
       | dmw_ng wrote:
       | This can also be used to implement connection acceleration with
       | no kernel-side hackery involved, by having userspace maintain a
       | pool of half-open connections by spamming SYNs at a target, then
       | constructing a regular socket via TCP_REPAIR when a connection is
       | actually required. That allows omitting one roundtrip from a
       | typical TCP connection setup, which may be substantial in a
       | variety of scenarios.
       | 
       | The technique sounds messy, but it actually involves little work
       | when the target is Linux: a single half-open SYN is good for 63
       | seconds with the default sysctl settings, which seem to be used
       | by almost every Internet service you might want to reach
       | (including e.g. Google).
       | 
       | I was playing with this during an interview last year and
       | intended to write it up, but never got around to it. The
       | technique seems to work as intended. I made a little prototype
       | reverse proxy for it in Python, using a temporary listening
       | socket with a drop-all SO_ATTACH_FILTER to allocate a port number
       | and to prevent Linux on the initiator side from responding with a
       | RST to ACKs for a half-open connection it knows nothing about.
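       | 
       | The 63-second figure is consistent with how a Linux listener
       | retransmits SYN-ACKs for a half-open connection. A minimal sketch
       | of the arithmetic, assuming the documented defaults
       | (tcp_synack_retries = 5, an initial retransmission timeout of 1
       | second that doubles on every retry) and ignoring the kernel's
       | upper bound on the timeout; the function name is made up for
       | illustration:

```python
def half_open_lifetime(synack_retries: int = 5, initial_rto: float = 1.0) -> float:
    """Seconds a listener keeps a half-open connection before giving up.

    Assumes the timeout doubles after each SYN-ACK (re)transmission and
    that the entry is dropped when the timer following the last retry
    expires. The kernel's cap on the timeout is not modelled.
    """
    # One timer for the initial SYN-ACK plus one per retransmission.
    timers = [initial_rto * 2 ** i for i in range(synack_retries + 1)]
    return sum(timers)


if __name__ == "__main__":
    print(half_open_lifetime())
```

       | With those defaults the timers are 1 + 2 + 4 + 8 + 16 + 32
       | seconds; a server with a different tcp_synack_retries value
       | would keep the entry for a different length of time.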
        
       | londons_explore wrote:
       | This API is only used by one project to my knowledge...
       | 
       | CRIU is a project to save an application's or container's
       | complete running state to a file, and then restore it elsewhere.
       | 
       | It comes with _lots_ of caveats.
        
         | boucher wrote:
         | CRIU is a really impressive tool, which can do some really cool
         | things. It's pretty difficult to use, given the nature of what
         | it's doing and all the moving parts to orchestrate. I added
         | experimental support to Docker for migrating containers, but
         | for lots of reasons it never made it as a full-fledged feature.
        
           | xuhu wrote:
           | Since the ELF interpreter and glibc initialize differently
           | depending on CPU flags, migration between arbitrary hosts
           | probably can't be expected to always work.
        
         | touisteur wrote:
         | I use it a lot (without criu or libsoccr) for high-availability
         | shenanigans, to avoid the reconnection delay. Machine A is
         | 'main' and has established a connection to distant-machine 1. A
         | crashes (the hardware stops), and machine B takes over its MAC,
         | IP and all its established TCP connections. Nothing can be seen
         | from the distant-machine side (maybe a gratuitous ARP slips
         | out... to no avail). And yes, for <1 millisecond takeover this
         | is necessary, and I thank the criu project with all my
         | engineering heart for not going the 'just put a module in there
         | and be done with it' route but actually making sockets
         | checkpointable and restartable, and saving me from the pains of
         | a userland network stack.
         | 
         | The actual details are far more funny and interesting (we could
         | talk about checkpoint not being an atomic operation for the
         | kernel, how you need to do some magic with "plug" qdiscs, and
         | how, qdiscs being applicable on egress only, you'll look into
         | IFBs; I love Linux, it is so versatile and full of amazing
         | little features). Don't forget to hot-update conntrack too...
         | 
         | And since libsoccr is GPL you might need to do this yourself,
         | and you'll want to do it anyway, because it's interesting and
         | you'll learn so many things.
         | 
         | My only gripe is the checkpoint still being a bit slow, and
         | maybe if I keep annoying Jens Axboe on Twitter, soon it'll be
         | an io_uring chain <3.
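         | 
         | For readers who have not seen the socket side of this: a rough
         | sketch of the restore half of such a takeover, assuming the
         | sequence numbers were captured from the failing machine
         | beforehand. The option numbers are the ones in linux/tcp.h as
         | far as I know; restore_steps and restore are hypothetical
         | names, and queue contents, TCP options and window state are
         | left out entirely.

```python
import socket
import struct

# Values from linux/tcp.h; Python's socket module may not export them.
TCP_REPAIR = 19
TCP_REPAIR_QUEUE = 20
TCP_QUEUE_SEQ = 21
TCP_RECV_QUEUE = 1
TCP_SEND_QUEUE = 2


def _u32(value):
    """Pack a 32-bit unsigned value (sequence numbers can exceed 2**31)."""
    return struct.pack("=I", value)


def restore_steps(local, remote, snd_seq, rcv_seq):
    """Ordered (method, args) pairs that rebuild an established connection."""
    tcp = socket.IPPROTO_TCP
    return [
        ("setsockopt", (tcp, TCP_REPAIR, 1)),                    # enter repair mode
        ("setsockopt", (tcp, TCP_REPAIR_QUEUE, TCP_SEND_QUEUE)),
        ("setsockopt", (tcp, TCP_QUEUE_SEQ, _u32(snd_seq))),     # send-side sequence
        ("setsockopt", (tcp, TCP_REPAIR_QUEUE, TCP_RECV_QUEUE)),
        ("setsockopt", (tcp, TCP_QUEUE_SEQ, _u32(rcv_seq))),     # receive-side sequence
        ("bind", (local,)),                                      # old local address/port
        ("connect", (remote,)),                                  # no SYN in repair mode
        ("setsockopt", (tcp, TCP_REPAIR, 0)),                    # back to normal operation
    ]


def restore(local, remote, snd_seq, rcv_seq):
    """Apply the steps to a fresh socket. Needs CAP_NET_ADMIN on Linux."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    for method, args in restore_steps(local, remote, snd_seq, rcv_seq):
        getattr(sock, method)(*args)
    return sock
```

         | Keeping the plan as data makes the ordering easy to inspect
         | without privileges; calling restore() itself only works as
         | root (or with CAP_NET_ADMIN) on a Linux kernel built with
         | TCP_REPAIR support.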
        
           | remram wrote:
           | How do you know about the other machine's outgoing
           | connections in real-time?
        
           | dj_gitmo wrote:
           | If you don't mind me asking, what kind of workload requires
           | this kind of "high-availability shenanigans"? Sounds
           | fascinating.
        
       | mrlonglong wrote:
       | This article is dated 2012.
        
       | barbazoo wrote:
       | > Migrating a running container from one physical host to another
       | is a tricky job on a number of levels
       | 
       | I don't have much context here, what's the use case where one
       | would migrate a running container from one physical host to
       | another?
        
         | dilyevsky wrote:
         | Any application that has internal state that can't be easily
         | checkpointed/restored but has to be moved to a different node.
         | For example, an ffmpeg process that is running a long transcode
         | and you want to utilize cheap spot instances.
        
       | rektide wrote:
       | One of the biggest interests & excitements I feel over QUIC &
       | HTTP3 is the potential for something really different &
       | drastically better in this realm. Right out of the box, QUIC is
       | "connectionless", using cryptography to establish a session. I feel
       | like there's so much more possibility for a data-center to move
       | around who is serving a QUIC connection. I have a lot of research
       | to do, but ideally that connection can get routed stream by
       | stream, & individual servers can do some kind of Direct Server
       | Return (DSR), to individual streams. But I'm probably pie in the
       | sky with these over-flowing hopes.
       | 
       | Edit: oh here's a Meta talk on their QUIC CDN doing DSR[1].
       | 
       | The original "live migration of virtual machines"[2] paper blew
       | me away & reset my expectations for computing & the connectivity,
       | way back in 2005. They live migrated a Quake 3 server. :)
       | 
       | [1] https://engineering.fb.com/2022/07/06/networking-traffic/wat...
       | 
       | [2] https://lass.cs.umass.edu/~shenoy/courses/spring15/readings/...
        
         | touisteur wrote:
         | Multihoming is one of the key features of most protocols
         | invented after TCP (SCTP, QUIC, MPTCP), and for good reason: it
         | is so, so useful in many scenarios.
        
           | convolvatron wrote:
           | I know it's heresy to consider, but given the place where we
           | ended up, maybe host addresses make more sense than interface
           | addresses (ignoring the effect that would have on routing
           | table aggregation).
        
       | rwmj wrote:
       | Presumably packets which continue to arrive at the old host must
       | get forwarded to the new host, and returning packets must be
       | spoofed. Or does this also involve some upstream network
       | reconfiguration?
        
         | randombits0 wrote:
         | That's implied, either a tunnel or just on the same domain.
         | You'd have to pull some ARP shenanigans as well.
         | 
         | When you are elbow-deep into the state of the interface, those
         | other issues should be pretty trivial.
        
       ___________________________________________________________________
       (page generated 2022-10-26 23:00 UTC)