[HN Gopher] Using the FreeBSD Rack TCP Stack
___________________________________________________________________
Using the FreeBSD Rack TCP Stack
Author : rodrigo975
Score : 116 points
Date : 2021-09-16 08:29 UTC (14 hours ago)
(HTM) web link (klarasystems.com)
(TXT) w3m dump (klarasystems.com)
| bogomipz wrote:
| I'm having trouble parsing the following passage.
|
| >"However, when the loss is at the end of a transmission, near
| the end of the connection or after a chunk of video has been
| sent, then the receiver won't receive more segments that would
| generate ACKs. When this sort of Tail loss occurs, a lengthy
| retransmission time out (RTO) must fire before the final segments
| of data can be sent."
|
| I believe this whole passage is just describing TCP fast
| retransmit vs. a retransmission timeout expiring. However, if
| the final TCP segment from the sender is lost, wouldn't the
| receiver also start sending duplicate ACKs? This sentence
| seems to indicate that duplicate ACKs would not be sent if the
| lost segment was the last one, in other words that no
| duplicate ACKs arrive from the receiver and so the RTO
| expires.
| allanjude wrote:
| I think the implication is: TCP kind of assumed you will either
| keep transmitting, or close the connection.
|
| In many video-streaming type workloads, the connection will go
| idle for seconds or even minutes at a time. If the loss is at
| the tail end of some activity, before a period of idle, the
| recovery takes a lot longer than it would if there were
| further activity on the connection.
| bogomipz wrote:
| Ah OK, thanks, that makes sense now. They did actually use the
| word "furthest" in the sentence previous to the one I quoted,
| which also makes sense in the context. Cheers.
| aidenn0 wrote:
| TIL that Linux has pluggable congestion control algorithms.
|
| Anyone know if there's one that can deal with severe buffer
| bloat? I have a connection where I control both ends, and I
| have seen ping times exceed 20s under load. Throughput is
| highly variable, so I can't just throttle the connection.
| zamadatix wrote:
| You sure that's not just badly configured priority-based
| queues or something else going on in-between? 20s is a hell of
| a lot of buffer unless this connection is via a dialup modem.
|
| That being said, it sounds like this connection is a good
| candidate for BBR anyway, so I'd give that a shot and see if
| anything changes.
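|
| For a quick per-socket test on Linux, selecting the algorithm
| looks roughly like this (a minimal sketch; assumes a kernel
| with the tcp_bbr module, i.e. 4.9 or newer):
|
|     #include <stdio.h>
|     #include <string.h>
|     #include <sys/socket.h>
|     #include <netinet/in.h>
|     #include <netinet/tcp.h>
|
|     int main(void) {
|         int s = socket(AF_INET, SOCK_STREAM, 0);
|         /* TCP_CONGESTION takes the algorithm name as a
|          * string; fails if "bbr" isn't available */
|         if (setsockopt(s, IPPROTO_TCP, TCP_CONGESTION,
|                        "bbr", strlen("bbr")) == -1)
|             perror("setsockopt(TCP_CONGESTION)");
|         return 0;
|     }
|
| Setting net.ipv4.tcp_congestion_control=bbr via sysctl does
| the same system-wide, without touching the application.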
| aidenn0 wrote:
| This is a G.hn link that usually sustains 50Mbps but
| occasionally drops to sub 1Mbps (so more like 1st gen DSL
| speeds than dialup speeds); it only drops that low for a
| fairly short period of time, but it takes a _long_ time to
| recover if the buffers are full.
| nix23 wrote:
| Hmm maybe make the buffers much smaller and try BBR instead
| of Cubic?
|
| https://blog.apnic.net/2020/01/10/when-to-use-and-not-use-bb...
| guenthert wrote:
| That's not a buffering issue (as others mentioned before, 20s
| would be quite a buffer). G.hn made the same mistake dial-up
| modems made a long time ago: implementing forward error
| correction. That's a great feature on a space probe leaving
| the solar system, but not for peers on a LAN talking TCP, as
| TCP performs its own error recovery (as is well known). If
| some link-layer hiccup occurs, the forward error correction
| at the link layer causes delays, potentially triggering
| re-transmissions at the TCP layer (check with `netstat -s`).
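|
| (e.g. something like `netstat -s | grep -i retrans` to watch
| the retransmission counters climb under load)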
| aidenn0 wrote:
| Does it really take 20s to perform FEC on a packet? I
| assumed it was retransmitting at the link-level.
|
| Either way, the packet is buffered in the sense that it is
| stored in a buffer on the switch; otherwise the packets would
| be dropped rather than eventually making it through.
| Aaronstotle wrote:
| Great write-up; I wasn't aware that FreeBSD could do this, and
| it does make me want to give FreeBSD another shot.
| drewg123 wrote:
| This stack was developed by my colleagues at Netflix (primarily
| Randall Stewart, known for SCTP). It serves the vast majority of
| our video and other CDN traffic.
| tiffanyh wrote:
| Since Netflix serves (relatively) few but extremely large
| files, would you recommend using this stack for a typical web
| server (serving lots of small files)?
| drewg123 wrote:
| Yes. We also use it for non-video small files on our CDN.
| [deleted]
| skissane wrote:
| Interesting, didn't know that FreeBSD has this feature.
|
| I know of one other operating system which has a somewhat
| similar feature, but not quite the same. z/OS supports running
| multiple TCP/IP stacks concurrently on the same OS instance
| [0].
|
| Whereas this is multiple TCP stacks, but still only one IP
| stack (or maybe one each for v4 and v6).
|
| [0] https://www.ibm.com/docs/en/zos/2.2.0?topic=overview-conside...
| crest wrote:
| But on VNET-enabled kernels (enabled in GENERIC since 13.0)
| you can have multiple _instances_ of the IP stack, to provide
| jails with their own IP stacks including a loopback,
| firewall(s) and IPsec.
| skissane wrote:
| So, with FreeBSD's multiple TCP stack feature, a single
| process can talk to multiple TCP stacks simultaneously, by
| calling setsockopt(TCP_FUNCTION_BLK) on each socket to select
| a different stack.
|
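| In FreeBSD terms, that looks roughly like this (a minimal
| sketch; assumes the "rack" stack has already been loaded,
| e.g. with kldload tcp_rack):
|
|     #include <stdio.h>
|     #include <string.h>
|     #include <sys/socket.h>
|     #include <netinet/in.h>
|     #include <netinet/tcp.h>
|
|     int main(void) {
|         int s = socket(AF_INET, SOCK_STREAM, 0);
|         /* name the TCP function block (stack) to use */
|         struct tcp_function_set tfs;
|         memset(&tfs, 0, sizeof(tfs));
|         strlcpy(tfs.function_set_name, "rack",
|                 sizeof(tfs.function_set_name));
|         if (setsockopt(s, IPPROTO_TCP, TCP_FUNCTION_BLK,
|                        &tfs, sizeof(tfs)) == -1)
|             perror("setsockopt(TCP_FUNCTION_BLK)");
|         return 0;
|     }
|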
| Similarly, in z/OS, a single process can have sockets
| belonging to multiple TCP/IP stacks. There is a system call
| (setibmopt) which can be used to choose a default stack, and
| thereafter all sockets created by that process will be bound
| to that stack only (the "stack affinity" is inherited over
| fork/exec; can also be set with _BPXK_SETIBMOPT_TRANSPORT
| environment variable). Alternatively, you can call
| ioctl(SIOCSETRTTD) on a socket to pick which TCP/IP stack to
| use for that particular socket. There is also a feature,
| CINET, where the OS chooses which TCP/IP stack to use for
| each socket automatically, based on the address the process
| binds it to. CINET asks each TCP/IP stack to provide a copy
| of its routing tables, and then uses those routing tables to
| "preroute" sockets to the appropriate stack.
|
| But I get the impression VNET doesn't allow a single process
| to use multiple IP stack instances simultaneously? If VNET is
| bound to jails, a single process can belong to only one jail.
|
| One reason why z/OS has this multiple TCP/IP stack support is
| that historically the TCP/IP stack was a third-party product,
| not a core part of the OS. So instead of IBM's stack, some
| people used third-party ones, such as CA TCPaccess (at one
| point resold by Cisco as IOS for S/390). One can even use both
| products on the same OS instance, primarily to help with
| piecemeal migrations from one to the other. Other operating
| systems with a history of supporting TCP/IP stacks from
| multiple vendors include OpenVMS and older versions of Windows
| (especially 3.x).
| hestefisk wrote:
| VNET has been around for quite a while though...
| crest wrote:
| And has been a quick way to panic() for most of them.
| nix23 wrote:
| No, not "most", but some of them. And it's declared stable
| in FreeBSD 13.
| stiray wrote:
| Thank you for this link, I wasn't aware of that... building a
| kernel just as I type :)
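|
| For anyone else following along, the rough recipe (a sketch
| for FreeBSD 13; option and sysctl names per the tcp_rack
| docs, so double-check against the article) is a kernel built
| with:
|
|     makeoptions WITH_EXTRA_TCP_STACKS=1
|     options     TCPHPTS
|
| and then loading and selecting the stack:
|
|     kldload tcp_rack
|     sysctl net.inet.tcp.functions_available
|     sysctl net.inet.tcp.functions_default=rack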
___________________________________________________________________
(page generated 2021-09-16 23:02 UTC)