[HN Gopher] TCP Connection Repair
___________________________________________________________________
TCP Connection Repair
Author : mfiguiere
Score : 39 points
Date : 2022-10-26 19:38 UTC (3 hours ago)
(HTM) web link (lwn.net)
(TXT) w3m dump (lwn.net)
| verisimilitudes wrote:
| So, this is making a TCP connection serializable in an ad hoc
| fashion.
|
| > It is natural to want those connections to follow the container
| to its new host, preferably without the remote end even noticing
| that something has changed, but the Linux networking stack was
| not written with this kind of move in mind.
|
| Yes, it was written in the C language.
| yjftsjthsd-h wrote:
| Why would the programming language have anything to do with it?
| verisimilitudes wrote:
| Some languages provide serialization of most anything by
| default, such as Lisp. Now, even in Lisp there are objects
| which don't make sense to serialize, including a TCP
| connection; however, the components thereof can be collected
| and sent across the wire or wherever else in a standardized
| way. The C language, in comparison, offers a few
| serialization routines for non-structured types, and that's
| about all.
|
| So, my point is that the ability to take running state,
| serialize it, and reinstate it elsewhere is only impressive to
| those who have misused computers for so long that they don't
| understand this was something basic in 1970 at the latest.
| yokaze wrote:
| (2012)
| dmw_ng wrote:
| This can also be used to implement connection acceleration with
| no kernel-side hackery involved: userspace maintains a pool of
| half-open connections by spamming SYNs at a target, then
| constructs a regular socket via TCP_REPAIR when a connection is
| actually required. That allows omitting one round trip from a
| typical TCP connection setup, which may be substantial in a
| variety of scenarios.
|
| The technique sounds messy, but it actually involves little work
| when the target is Linux: a single half-open SYN is good for 63
| seconds with the default sysctl settings, which seem to be in
| use at almost every Internet service you might want to reach
| (including, e.g., Google).
|
| I was playing with this during an interview last year and
| intended to write it up, but never got around to it. The
| technique seems to work as intended: I made a little prototype
| reverse proxy for it in Python, using a temporary listening
| socket with a drop-all SO_ATTACH_FILTER to allocate a port
| number and to prevent Linux on the initiator side from
| responding with a RST to ACKs for a half-open connection it
| knows nothing about.
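| The restore side of that trick can be roughed out in Python. The
| constants are copied from linux/tcp.h since Python's socket
| module doesn't export them; the function name, parameters, and
| overall flow below are my assumptions, not the prototype's
| actual code, and running it for real requires CAP_NET_ADMIN.

```python
import socket

# Linux socket-option constants from linux/tcp.h (not exported by
# Python's socket module).
TCP_REPAIR = 19
TCP_REPAIR_QUEUE = 20
TCP_QUEUE_SEQ = 21
TCP_RECV_QUEUE, TCP_SEND_QUEUE = 1, 2

def adopt_half_open(dst, local_port, snd_seq, rcv_seq):
    """Turn a handshake completed in userspace into a kernel socket.
    snd_seq/rcv_seq are the next sequence numbers taken from the
    SYN / SYN-ACK exchange; requires CAP_NET_ADMIN."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_TCP, TCP_REPAIR, 1)  # enter repair mode
    s.bind(("0.0.0.0", local_port))
    # Seed the send and receive sequence numbers before "connecting".
    s.setsockopt(socket.IPPROTO_TCP, TCP_REPAIR_QUEUE, TCP_SEND_QUEUE)
    s.setsockopt(socket.IPPROTO_TCP, TCP_QUEUE_SEQ, snd_seq)
    s.setsockopt(socket.IPPROTO_TCP, TCP_REPAIR_QUEUE, TCP_RECV_QUEUE)
    s.setsockopt(socket.IPPROTO_TCP, TCP_QUEUE_SEQ, rcv_seq)
    s.connect(dst)  # no packets sent: the socket just becomes ESTABLISHED
    s.setsockopt(socket.IPPROTO_TCP, TCP_REPAIR, 0)  # back on the wire
    return s
```

| In repair mode, connect() only fills in the socket's addresses
| without emitting any packets, so the kernel simply adopts the
| handshake the SYN-spammer already performed.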
| londons_explore wrote:
| This API is only used by one project to my knowledge...
|
| CRIU is a project to save an application's or container's
| complete running state to a file, and then restore it elsewhere.
|
| It comes with _lots_ of caveats.
| boucher wrote:
| CRIU is a really impressive tool, which can do some really cool
| things. It's pretty difficult to use, given the nature of what
| it's doing and all the moving parts to orchestrate. I added
| experimental support to Docker for migrating containers, but
| for lots of reasons it never made it as a full-fledged feature.
| xuhu wrote:
| Since the ELF interpreter and glibc initialize differently
| depending on CPU flags, migration between arbitrary hosts
| probably can't be expected to always work.
| touisteur wrote:
| I use it a lot (without criu or libsoccr) for high-availability
| shenanigans, to avoid the reconnection delay. Machine A is
| 'main' and has established a connection to distant-machine 1. A
| crashes (the hardware stops), machine B takes over its MAC, IP
| and all its established TCP connections. Nothing can be seen
| from the distant-machine side (maybe a gratuitous ARP slips
| out... to no avail). And yes, for <1 millisecond takeover this
| is necessary and I thank the criu project with all my
| engineering heart for not going the 'just put a module in there
| and be done with it' but actually making sockets checkpointable
| and restartable, and saving me from the pains of a userland
| network stack.
|
| The actual details are far more funny and interesting (we could
| talk about checkpointing not being an atomic operation for the
| kernel, how you need to do some magic with "plug" qdiscs, and
| how, since qdiscs are applicable on egress only, you'll end up
| looking into IFBs; I love Linux, it is so versatile and full of
| amazing little features). Don't forget to hot-update conntrack
| too...
|
| And since libsoccr is GPL you might need to do this yourself,
| and you'll want to do it anyway, because it's interesting and
| you'll learn so many things.
|
| My only gripe is the checkpoint still being a bit slow; maybe
| if I keep annoying Jens Axboe on Twitter it'll soon be an
| io_uring chain <3.
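| The checkpoint half of that takeover can be sketched in a few
| lines of Python (constants from linux/tcp.h; the function name
| and the exact set of fields saved are my invention, and a real
| restore also needs window, MSS, and queued-data state, which is
| what libsoccr handles; CAP_NET_ADMIN required):

```python
import socket

# Linux socket-option constants from linux/tcp.h (not exported by
# Python's socket module).
TCP_REPAIR = 19
TCP_REPAIR_QUEUE = 20
TCP_QUEUE_SEQ = 21
TCP_RECV_QUEUE, TCP_SEND_QUEUE = 1, 2

def checkpoint_seqs(s: socket.socket) -> dict:
    """Silently detach an ESTABLISHED socket from the wire and read
    the minimal state needed to recreate it on another machine."""
    s.setsockopt(socket.IPPROTO_TCP, TCP_REPAIR, 1)  # freeze: no FIN/RST sent
    s.setsockopt(socket.IPPROTO_TCP, TCP_REPAIR_QUEUE, TCP_SEND_QUEUE)
    snd_seq = s.getsockopt(socket.IPPROTO_TCP, TCP_QUEUE_SEQ)
    s.setsockopt(socket.IPPROTO_TCP, TCP_REPAIR_QUEUE, TCP_RECV_QUEUE)
    rcv_seq = s.getsockopt(socket.IPPROTO_TCP, TCP_QUEUE_SEQ)
    return {"local": s.getsockname(), "peer": s.getpeername(),
            "snd_seq": snd_seq, "rcv_seq": rcv_seq}
```

| Closing a socket in repair mode is also silent (no FIN or RST),
| which is what keeps the distant machine from noticing anything.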
| remram wrote:
| How do you know about the other machine's outgoing
| connections in real-time?
| dj_gitmo wrote:
| If you don't mind me asking, what kind of workload requires
| this kind of "high-availability shenanigans". Sounds
| fascinating.
| mrlonglong wrote:
| This article is dated 2012.
| barbazoo wrote:
| > Migrating a running container from one physical host to another
| is a tricky job on a number of levels
|
| I don't have much context here, what's the use case where one
| would migrate a running container from one physical host to
| another?
| dilyevsky wrote:
| Any application that has internal state that can't be easily
| checkpointed/restored but has to be moved to a different node.
| For example, an ffmpeg process that is running a long transcode
| when you want to utilize cheap spot instances.
| rektide wrote:
| One of the biggest interests & excitements I feel over QUIC &
| HTTP3 is the potential for something really different &
| drastically better in this realm. Right out of the box, QUIC is
| "connectionless", using cryptography to establish a session. I
| feel like there's so much more possibility for a data center to
| move around who is serving a QUIC connection. I have a lot of
| research to do, but ideally that connection can get routed
| stream by stream, & individual servers can do some kind of
| Direct Server Return (DSR) for individual streams. But I'm
| probably pie-in-the-sky with these over-flowing hopes.
|
| Edit: oh here's a Meta talk on their QUIC CDN doing DSR[1].
|
| The original "live migration of virtual machines"[2] paper blew
| me away & reset my expectations for computing & connectivity,
| way back in 2005. They live-migrated a Quake 3 server. :)
|
| [1] https://engineering.fb.com/2022/07/06/networking-
| traffic/wat...
|
| [2]
| https://lass.cs.umass.edu/~shenoy/courses/spring15/readings/...
| touisteur wrote:
| Multihoming is one of the key features of most protocols
| invented after TCP (SCTP, QUIC, MPTCP), and for good reason: it
| is so useful in many scenarios.
| convolvatron wrote:
| I know it's heresy to consider.
|
| but given the place where we ended up, maybe host addresses
| make more sense than interface addresses (ignoring the effect
| that would have on routing table aggregation)
| rwmj wrote:
| Presumably packets which continue to arrive at the old host must
| get forwarded to the new host, and returning packets must be
| spoofed. Or does this also involve some upstream network
| reconfiguration?
| randombits0 wrote:
| That's implied: either a tunnel, or just on the same broadcast
| domain. You'd have to pull some ARP shenanigans as well.
|
| When you are elbow-deep into the state of the interface, those
| other issues should be pretty trivial.
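| The ARP part at least is simple. A rough sketch of building the
| gratuitous ARP frame the new host would broadcast after taking
| over the address (pure byte construction; the helper name is my
| own, and actually sending it still needs a raw AF_PACKET socket
| and root):

```python
import struct

def gratuitous_arp(mac: bytes, ip: bytes) -> bytes:
    """Build a gratuitous ARP request announcing that `ip` (4-byte
    IPv4 address) now lives at `mac` (6-byte hardware address)."""
    # Ethernet header: broadcast destination, our MAC, EtherType=ARP.
    eth = b"\xff" * 6 + mac + struct.pack("!H", 0x0806)
    # ARP header: Ethernet/IPv4, 6-byte HW / 4-byte proto addrs, opcode=request.
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += mac + ip          # sender hardware / protocol address
    arp += b"\x00" * 6 + ip  # target MAC unknown; target IP = our own
    return eth + arp
```

| Switches that see this frame update their forwarding tables, and
| neighbors refresh their ARP caches, so traffic for the taken-over
| IP starts flowing to the new machine.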
___________________________________________________________________
(page generated 2022-10-26 23:00 UTC)