[HN Gopher] Using the FreeBSD Rack TCP Stack
       ___________________________________________________________________
        
       Using the FreeBSD Rack TCP Stack
        
       Author : rodrigo975
       Score  : 116 points
       Date   : 2021-09-16 08:29 UTC (14 hours ago)
        
 (HTM) web link (klarasystems.com)
 (TXT) w3m dump (klarasystems.com)
        
       | bogomipz wrote:
       | I'm having trouble parsing the following passage.
       | 
       | >"However, when the loss is at the end of a transmission, near
       | the end of the connection or after a chunk of video has been
       | sent, then the receiver won't receive more segments that would
       | generate ACKs. When this sort of Tail loss occurs, a lengthy
       | retransmission time out (RTO) must fire before the final segments
       | of data can be sent."
       | 
        | I believe this whole passage is just describing TCP fast
        | retransmit vs. a retransmission timeout expiring. However, if
        | the final TCP segment from the sender is lost, wouldn't the
        | receiver also start sending duplicate ACKs? This sentence
        | seems to indicate duplicate ACKs would not be sent if the
        | last segment was the one that was lost. In other words, no
        | duplicate ACK ever arrives from the receiver, and so the RTO
        | expires.
        
         | allanjude wrote:
          | I think the implication is: TCP kind of assumes you will
          | either keep transmitting or close the connection.
          | 
          | In many video-streaming type workloads, the connection will
          | go idle for seconds or even minutes at a time. If the loss
          | is at the tail end of some activity, right before a period
          | of idle, recovery takes a lot longer than it would if there
          | were further activity on the connection. Duplicate ACKs are
          | only triggered by segments arriving after the lost one; if
          | nothing else is sent, the receiver has nothing to ACK, and
          | the sender can only fall back on the RTO.
        
           | bogomipz wrote:
            | Ah OK, thanks, that makes sense now. They did actually
            | use the word "furthest" in the sentence before the one I
            | quoted, which also makes sense in that context. Cheers.
        
       | aidenn0 wrote:
       | TIL that Linux has pluggable congestion control algorithms.
       | 
        | Anyone know if there's one that can deal with severe
        | bufferbloat? I have a connection where I control both ends
        | and I have seen ping times exceed 20s under load. Throughput
        | is highly variable so I can't just throttle the connection.
        
         | zamadatix wrote:
          | You sure that's not just badly configured priority-based
          | queues or something else going on in between? 20s is a hell
          | of a lot of buffer unless this connection is via a dialup
          | modem.
          | 
          | That being said, it sounds like this connection is a good
          | candidate for BBR anyway, so I'd give that a shot and see
          | if anything changes.
        
           | aidenn0 wrote:
           | This is a G.hn link that usually sustains 50Mbps but
           | occasionally drops to sub 1Mbps (so more like 1st gen DSL
           | speeds than dialup speeds); it only drops that low for a
           | fairly short period of time, but it takes a _long_ time to
           | recover if the buffers are full.
        
             | nix23 wrote:
             | Hmm maybe make the buffers much smaller and try BBR instead
             | of Cubic?
             | 
              | https://blog.apnic.net/2020/01/10/when-to-use-and-not-use-bb...
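              | 
              | On Linux the algorithm can even be picked per socket. A
              | minimal sketch of that (assumes a kernel where the
              | tcp_bbr module is available; error handling omitted):
              | 
              |   #include <netinet/in.h>
              |   #include <netinet/tcp.h>
              |   #include <string.h>
              |   #include <sys/socket.h>
              | 
              |   int main(void) {
              |       int s = socket(AF_INET, SOCK_STREAM, 0);
              |       /* Use BBR on this socket only; fails if the
              |          algorithm isn't available in the kernel. */
              |       setsockopt(s, IPPROTO_TCP, TCP_CONGESTION,
              |           "bbr", strlen("bbr"));
              |       return 0;
              |   }
              | 
              | The system-wide default is the
              | net.ipv4.tcp_congestion_control sysctl.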
        
             | guenthert wrote:
              | That's not a buffering issue (as others mentioned, 20s
              | would be quite a buffer). G.hn made the same mistake as
              | dial-up modems long ago: implementing forward error
              | correction. That's a great feature on a space probe
              | leaving the solar system, but not for peers on a LAN
              | talking TCP, as TCP performs its own error correction
              | (as is well known). If some link-layer hiccup occurs,
              | the link layer's forward error correction causes
              | delays, potentially triggering re-transmissions at the
              | TCP layer (check with `netstat -s`).
        
               | aidenn0 wrote:
                | Does it really take 20s to perform FEC on a packet? I
                | assumed it was retransmitting at the link level.
                | 
                | Either way, the packet is buffered in the sense that
                | it is stored in a buffer on the switch; otherwise the
                | packets would be dropped rather than eventually
                | making it through.
        
       | Aaronstotle wrote:
        | Great write-up; I wasn't aware that FreeBSD could do this,
        | and it makes me want to give FreeBSD another shot.
        
       | drewg123 wrote:
       | This stack was developed by my colleagues at Netflix (primarily
       | Randall Stewart, known for SCTP). It serves the vast majority of
       | our video and other CDN traffic.
        
         | tiffanyh wrote:
         | Since Netflix serves (relatively) few but extremely large
         | files, would you recommend using this stack for a typical web
         | server (serving lots of small files)?
        
           | drewg123 wrote:
           | Yes. We also use it for non-video small files on our CDN.
        
         | [deleted]
        
       | skissane wrote:
       | Interesting, didn't know that FreeBSD has this feature.
       | 
        | I know of one other operating system which has a somewhat
        | similar feature, but not quite the same: z/OS supports
        | running multiple TCP/IP stacks concurrently on the same OS
        | instance [0].
        | 
        | This FreeBSD feature, by contrast, is multiple TCP stacks but
        | still only one IP stack (or maybe one each for v4 and v6).
        | 
        | [0] https://www.ibm.com/docs/en/zos/2.2.0?topic=overview-conside...
        
         | crest wrote:
          | But on VNET-enabled kernels (enabled in GENERIC since 13.0)
          | you can have multiple _instances_ of the IP stack, to
          | provide jails with their own IP stacks including a
          | loopback, firewall(s), and IPsec.
        
           | skissane wrote:
            | So, with FreeBSD's multiple TCP stack feature, a single
            | process can talk to multiple TCP stacks simultaneously by
            | calling setsockopt(TCP_FUNCTION_BLK) on each socket to
            | select a different stack.
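            | 
            | A minimal sketch of that (assumes the tcp_rack module is
            | loaded; the available stacks are listed in the
            | net.inet.tcp.functions_available sysctl):
            | 
            |   #include <sys/socket.h>
            |   #include <netinet/in.h>
            |   #include <netinet/tcp.h>
            |   #include <string.h>
            | 
            |   int main(void) {
            |       int s = socket(AF_INET, SOCK_STREAM, 0);
            |       struct tcp_function_set tfs;
            |       memset(&tfs, 0, sizeof(tfs));
            |       strlcpy(tfs.function_set_name, "rack",
            |           sizeof(tfs.function_set_name));
            |       /* Switch this socket to the RACK stack. */
            |       setsockopt(s, IPPROTO_TCP, TCP_FUNCTION_BLK,
            |           &tfs, sizeof(tfs));
            |       return 0;
            |   }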
           | 
           | Similarly, in z/OS, a single process can have sockets
           | belonging to multiple TCP/IP stacks. There is a system call
           | (setibmopt) which can be used to choose a default stack, and
           | thereafter all sockets created by that process will be bound
           | to that stack only (the "stack affinity" is inherited over
            | fork/exec; it can also be set with the
            | _BPXK_SETIBMOPT_TRANSPORT environment variable).
            | Alternatively, you can call
           | ioctl(SIOCSETRTTD) on a socket to pick which TCP/IP stack to
           | use for that particular socket. There is also a feature,
           | CINET, where the OS chooses which TCP/IP stack to use for
           | each socket automatically, based on the address the process
           | binds it to. CINET asks each TCP/IP stack to provide a copy
           | of its routing tables, and then uses those routing tables to
           | "preroute" sockets to the appropriate stack.
           | 
           | But I get the impression VNET doesn't allow a single process
           | to use multiple IP stack instances simultaneously? If VNET is
           | bound to jails, a single process can belong to only one jail.
           | 
            | One reason why z/OS has this multiple TCP/IP stack
            | support is that, historically, the TCP/IP stack was a
            | third-party product, not a core part of the OS. So
            | instead of IBM's stack, some people used third-party
            | ones, such as CA TCPaccess (at one point resold by Cisco
            | as IOS for S/390). One can even use both products on the
            | same OS instance, primarily to help with piecemeal
            | migrations from one to the other. Other operating systems
            | with a history of supporting TCP/IP stacks from multiple
            | vendors include OpenVMS and older versions of Windows
            | (especially 3.x).
        
           | hestefisk wrote:
           | VNET has been around for quite a while though...
        
             | crest wrote:
             | And has been a quick way to panic() for most of them.
        
               | nix23 wrote:
               | No not "most" but some of them. And it's declared stable
               | in FreeBSD13
        
       | stiray wrote:
        | Thank you for this link, I wasn't aware of that... building
        | a kernel just as I type :)
        
       ___________________________________________________________________
       (page generated 2021-09-16 23:02 UTC)