[HN Gopher] A transport protocol's view of Starlink
       ___________________________________________________________________
        
       A transport protocol's view of Starlink
        
       Author : rolph
       Score  : 147 points
       Date   : 2024-11-30 23:12 UTC (5 days ago)
        
 (HTM) web link (blog.apnic.net)
 (TXT) w3m dump (blog.apnic.net)
        
       | NelsonMinar wrote:
        | I noticed a huge improvement just switching to stock BBR on my
        | Starlink connection as well. During a particularly congested
        | period I was bouncing between 5 and 12 Mbps via Starlink; with
        | BBR enabled I got a steady 12. The main problem is that you
        | need BBR on the server for this to work. As a client using
        | Starlink I don't have any control over what all the servers I
        | connect to are doing (other than the one server I was testing
        | with).
       | 
        | I like Huston's idea of a Starlink-tuned BBR. I wonder if it's
        | traffic shaping that SpaceX could apply themselves in their
        | ground station datacenters? That would involve messing with the
        | TCP stream though, which is maybe a bad idea.
       | 
        | The fact that Starlink has this 15-second switching built in is
        | pretty weird, but you can definitely see it in every continuous
        | latency measurement. Even weirder, it seems to be globally
        | synchronized: all the hundreds of thousands of dishes switch to
        | new satellites in the same millisecond, globally. Having a
        | customized BBR aware of that 15-second cycle is an interesting
        | idea.
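        | 
        | For reference, this is roughly what enabling BBR server-side
        | looks like: a minimal sketch, assuming a Linux box with the
        | tcp_bbr module available (Python 3.6+ exposes TCP_CONGESTION on
        | Linux):
        | 
        |     # system-wide default:
        |     #   sysctl -w net.ipv4.tcp_congestion_control=bbr
        |     # or per accepted connection, from a Python server:
        |     import socket
        | 
        |     srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        |     srv.bind(("0.0.0.0", 8080))
        |     srv.listen()
        |     conn, addr = srv.accept()
        |     conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION,
        |                     b"bbr")
        | 
        | Congestion control only governs what that socket sends, which
        | is why it has to be set on the server for downloads to benefit.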
        
         | btilly wrote:
         | If you use a VPN, wouldn't it suffice to just make your VPN
         | connection use BBR?
         | 
         | Ditto if you use an https proxy of some kind.
        
           | jofla_net wrote:
            | I would guess that would be beneficial, but again only if
            | you're using a TCP VPN, which is suboptimal for other
            | reasons (the problem is usually called TCP meltdown). If
            | that is all you have access to though, I'm sure it would
            | help.
        
           | Hikikomori wrote:
            | Proxy yes, VPN no. TCP-over-TCP VPN is bad, and a non-TCP
            | VPN would make no difference compared to no VPN at all.
        
           | Alex-Programs wrote:
           | https://github.com/apernet/hysteria has the option to use
           | https://github.com/apernet/tcp-brutal, a deliberately
           | unfair/selfish congestion control algorithm.
           | 
           | It's designed to mitigate certain methods of blocking-via-
           | throttling.
           | 
           | I looked into it for a report I wrote a while back, and I was
           | surprised to find that nobody has made something purpose-
           | built for greedy TCP congestion handling in order to improve
           | performance at the expense of others. If there is such a
           | thing, I couldn't find it. Perhaps I'm a little too cynical
           | in my expectations!
           | 
           | Maybe TCP-over-TCP is so bad that it's not worth it?
        
       | sgt101 wrote:
        | Fascinating that the throughput is about 250 Mbps. Presumably
        | that's over the area served by one satellite? I wonder how much
        | cache they put in each one... I vaguely remember a stat that
        | 90% of requests (in data terms) are served from a TB of cache
        | on the consumer internet; perhaps having the satellites gossip
        | about cache hits would help preserve uplink bandwidth as well.
        | Maybe downlink bandwidth is the constraint for this network
        | though, and caches just won't help.
        
         | echoangle wrote:
          | I would be surprised if there is a lot of cache, or even any,
          | on the satellites themselves. Fast, large storage that's
          | radiation hardened would be extremely expensive, and they
          | have a lot of satellites. The satellites are low enough that
          | general radiation isn't that bad, but every pass through the
          | South Atlantic Anomaly would risk damage if regular flash
          | storage were used.
        
           | Alex-Programs wrote:
           | Beyond that, how would the cache actually work?
           | 
           | Everything is HTTPS nowadays; you can't just MITM and stick a
           | caching proxy in front. You could put DNS on the sat I
           | suppose, but other than that you'd need to have a full
           | Netflix/Cloudflare node, and the sats are far too small and
           | numerous for that.
        
             | michaelt wrote:
             | I'm working on a new type of protocol so that when there's
             | a large-scale online event like the superbowl or a major
             | boxing match on a streaming platform,
             | 
             | satellite and cable internet providers will be able to send
             | a single copy of the video stream on the uplink, and using
             | a new technique we've named "casting the net broadly"
             | multiple viewers will be able to receive the same downlink
             | packets
             | 
             | which we believe will have excellent scalability, enabling
             | web-scale streaming video.
        
               | spockz wrote:
               | Isn't this what multicast does?
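                | 
                | For reference, a minimal IP multicast receiver sketch
                | in Python. The group and port are arbitrary examples,
                | and this only helps on networks that actually forward
                | multicast, which the public internet generally doesn't:
                | 
                |     import socket, struct
                | 
                |     GROUP, PORT = "239.1.2.3", 5004  # example values
                | 
                |     sock = socket.socket(socket.AF_INET,
                |                          socket.SOCK_DGRAM)
                |     sock.setsockopt(socket.SOL_SOCKET,
                |                     socket.SO_REUSEADDR, 1)
                |     sock.bind(("", PORT))
                |     # join the group so the kernel accepts its packets
                |     mreq = struct.pack("4s4s", socket.inet_aton(GROUP),
                |                        socket.inet_aton("0.0.0.0"))
                |     sock.setsockopt(socket.IPPROTO_IP,
                |                     socket.IP_ADD_MEMBERSHIP, mreq)
                |     while True:
                |         data, src = sock.recvfrom(2048)
                |         # every subscribed receiver sees the same packet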
        
         | vardump wrote:
         | I think a single Starlink v1 satellite has a maximum bandwidth
         | of about 20 Gbps. Newer versions might have a lot more.
        
       | kevincox wrote:
       | > the endpoints need to use large buffers to hold a copy of all
       | the unacknowledged data, as is required by the TCP protocol.
       | 
        | It makes me wonder if anyone has tried to break down the layers
        | to optimize this. In the fairly common case of serving a file
        | off of long-term storage you can just fetch the old data again
        | if needed (likely from the page cache anyway, but still better
        | than duplicating it), and some encryption algorithms are
        | seekable, so you can redo the encryption as well.
       | 
       | Right now the kernel doesn't really have a choice but to buffer
       | all unacknowledged data as the UNIX socket API has no provision
       | for requesting old data from the writer. But a smarter API could
       | put the application in charge of making that data available as
       | required and avoid the need for extra copies in common cases.
       | 
        | I know that Netflix did a lot of work on the FreeBSD kernel for
        | the file-to-socket path, eventually adding in-kernel TLS to
        | remove user space from the equation. But I don't know if they
        | went as far as removing the socket buffers.
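        | 
        | For the file-serving path specifically, sendfile(2) already
        | removes the user-space copy. A minimal sketch, assuming
        | Python's socket.sendfile(), which uses os.sendfile()/sendfile(2)
        | where available; the kernel still has to keep the
        | unacknowledged data referenced for retransmission, which is the
        | remaining piece:
        | 
        |     import socket
        | 
        |     def serve_file(conn: socket.socket, path: str) -> None:
        |         # file pages go from the page cache to the NIC without
        |         # a round trip through user space
        |         with open(path, "rb") as f:
        |             conn.sendfile(f)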
        
         | cyberax wrote:
         | > It makes me wonder if anyone has tried to break down the
         | layers to optimize this.
         | 
          | Yep. There were a bunch of proxy servers that optimized HTTP
          | for satellite service. I used Globax back in the day to speed
          | up a one-way satellite service:
          | https://web.archive.org/web/20040602203838/http://globax.inf...
          | 
          | Back then traffic was around 10 cents per megabyte in my
          | city, so satellite service was a good way to shave off those
          | costs.
        
         | lxgr wrote:
         | There aren't necessarily extra copies even when just using TCP,
         | thanks to sendfile(2) and similar mechanisms.
         | 
          | Buffer size isn't that much of an issue either, given the
          | relatively low latencies involved and that selective TCP
          | acknowledgements let you tell the sender fairly precisely
          | which segments are missing, so you'll need at most a few
          | round trips to identify them and eventually receive them.
         | 
          | Practically, you'll probably not see much loss anyway, for
          | better or worse: TCP historically interpreted packet loss as
          | congestion, rather than as non-congestion-induced loss on the
          | physical layer. This is why most lower-layer protocols with
          | higher error rates than Ethernet usually implement some sort
          | of lower-layer ARQ to present an "Ethernet-like" error/loss
          | rate to the upper layers, and this in turn has made
          | loss-tolerant TCP (such as BBR, as described in the article)
          | less of a research priority, I believe.
        
         | moandcompany wrote:
         | This is an "old" problem that has historically been addressed
         | through things like "Performance Enhancing Proxies (PEPs)" that
         | are defined in RFC 3135 and RFC 3449.
         | (https://en.wikipedia.org/wiki/Performance-enhancing_proxy)
         | 
          | In internet-style communications, such as routing IP traffic
          | over satellite links to low-earth-orbit or GEO satellites,
          | round-trip times are much longer: link latency is
          | substantially higher than in most terrestrial wired or
          | wireless applications, so the acknowledgements required by
          | TCP take much longer to complete. An inline PEP augments the
          | connection so that end-user/client devices can keep their
          | normal network settings while the PEP takes on the job of
          | running the sessions with larger TCP window sizes, improving
          | overall throughput.
          | 
          | The utility of a PEP, or PEP-acting device, goes up when you
          | imagine multiple devices, or a whole network of devices,
          | attached to a satellite communications terminal for
          | WAN/backhaul connections: the link's performance can be
          | managed at one point rather than on every downstream client
          | device.
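          | 
          | Conceptually, a split-TCP PEP is just a relay that terminates
          | the client's connection locally and opens its own,
          | differently tuned connection across the satellite leg. A
          | heavily simplified sketch (the port, the peer address and the
          | buffer size are made up; real PEPs also do things like ACK
          | spoofing and header compression):
          | 
          |     import socket, threading
          | 
          |     LISTEN_PORT = 3128               # hypothetical
          |     UPSTREAM = ("192.0.2.10", 3128)  # hypothetical far side
          | 
          |     def pump(src, dst):
          |         # copy bytes one way until EOF or error
          |         try:
          |             while (chunk := src.recv(65536)):
          |                 dst.sendall(chunk)
          |         except OSError:
          |             pass
          |         finally:
          |             dst.close()
          | 
          |     def handle(client):
          |         sat = socket.socket()
          |         # tune only the satellite-facing leg: a big send
          |         # buffer for the larger bandwidth-delay product
          |         sat.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF,
          |                        4 * 1024 * 1024)
          |         sat.connect(UPSTREAM)
          |         threading.Thread(target=pump, args=(client, sat),
          |                          daemon=True).start()
          |         pump(sat, client)
          | 
          |     srv = socket.socket()
          |     srv.bind(("", LISTEN_PORT))
          |     srv.listen()
          |     while True:
          |         conn, _ = srv.accept()
          |         threading.Thread(target=handle, args=(conn,),
          |                          daemon=True).start()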
        
           | lxgr wrote:
           | TIL that that's what they're called. Thank you!
           | 
           | Do you know if they became obsolete due to modern TCP stacks
           | handling LFNs better or for some other reason?
           | 
            | I could imagine them being quite useful for high-loss,
            | high-latency paths (i.e. in addition to LFNs combined with
            | poorly tuned TCP implementations), but most wireless
            | protocols I know (802.11, GPRS and beyond, etc.) just
            | implement an ARQ process that masks the loss at the packet
            | or a lower layer.
           | 
           | So maybe between that and LFN-aware TCPs, there wasn't much
           | left for them to improve to justify the additional
           | complexity?
        
             | moandcompany wrote:
             | It's been a long time since I've paid attention to that
             | world, but AFAIK, PEPs are still used and essential
             | equipment for internet-style communication (i.e. TCP over
             | IP) via GEO satellites.
             | 
              | It looks like, in this 2022 blog post evaluating latency
              | and throughput over Starlink, they concluded that PEPs
              | were not being used in the Starlink network (and are
              | probably unnecessary) due to the lower latency that comes
              | with LEO satellites. They also mention that PEPs are
              | (still) commonly employed by GEO-satcom-based operators.
             | 
             | https://blog.apnic.net/2022/11/28/fact-checking-starlinks-
             | pe...
        
               | lxgr wrote:
               | > they concluded that PEPs were not being used in the
               | Starlink network
               | 
               | That makes sense, given that they're probably most useful
               | for high-latency networks. But what I find quite
               | surprising is that Starlink does nothing about the 1-2%
               | packet loss, as described in TFA; I'd really have
               | expected them to fix that using an ARQ at a lower layer.
               | 
               | Then again, maybe that's a blessing - indiscriminate ARQs
               | like that would be terrible for time critical things like
               | A/V, which can usually tolerate packet loss much better
               | than ARQ-induced jitter.
               | 
               | Thinking about it, that actually strengthens the case for
               | PEPs: They could improve TCP performance (and maybe
               | things like QUIC?), while leaving non-stream oriented
               | things (like non-QUIC UDP) alone.
               | 
               | Maybe Starlink just expects BBR to eventually become the
               | dominant TCP congestion control algorithm and the problem
               | to solve itself that way?
        
       | trebligdivad wrote:
        | How different is this behaviour from using a mobile phone in a
        | car or train? Doesn't that also give you odd changes in
        | latency, and let you notice the handoffs between cells?
        
         | lxgr wrote:
         | The distances involved are orders of magnitude smaller, so you
         | don't get these effects nearly as much.
        
       | teleforce wrote:
        | Such a well-written article on satellite networking technology,
        | kudos to APNIC.
        | 
        | This makes me wonder whether TCP is really suitable or
        | optimized for satellite networks.
        | 
        | John Ousterhout (of Tcl/Tk fame) has recently proposed a new
        | transport protocol, Homa, as an alternative to TCP in data
        | centers [1]. Perhaps a new, more suitable transport protocol
        | for satellite or NTN links is also needed. That's the beauty of
        | the Internet: the transport protocol is expendable, but the
        | network protocol, IP, is not. The fact that IPv6 is still
        | fringe rather than mainstream, although it's arguably better
        | than IPv4, shows how hard the network layer is to change.
       | 
       | [1] Homa, a transport protocol to replace TCP for low-latency RPC
       | in data centers:
       | 
       | https://news.ycombinator.com/item?id=28204808
        
         | supriyo-biswas wrote:
          | Throwing out TCP for a message-oriented layer, as Homa does,
          | is not really required to address this need.
          | 
          | Perhaps what would be more useful in this context would be
          | for operating system vendors to perform an HTTP request to a
          | globally distributed endpoint, similar to captive portal
          | detection, and then use a more aggressive congestion control
          | algorithm on networks with good throughput but high latency.
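          | 
          | Roughly something like the sketch below. Everything in it is
          | hypothetical: the probe endpoint, the thresholds, and doing
          | it per application socket rather than in the OS (it's also
          | Linux-only, since it relies on TCP_CONGESTION):
          | 
          |     import socket, time
          | 
          |     PROBE = ("connectivity.example.net", 80)  # hypothetical
          | 
          |     def min_rtt(n=5):
          |         # crude RTT estimate from TCP connect times
          |         samples = []
          |         for _ in range(n):
          |             t0 = time.monotonic()
          |             with socket.create_connection(PROBE, timeout=3):
          |                 samples.append(time.monotonic() - t0)
          |         return min(samples)
          | 
          |     def pick_cc():
          |         # made-up threshold: prefer a model-based algorithm
          |         # on long paths
          |         return b"bbr" if min_rtt() > 0.040 else b"cubic"
          | 
          |     sock = socket.socket()
          |     sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION,
          |                     pick_cc())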
        
           | teleforce wrote:
            | I am not suggesting something like the Homa protocol, since
            | that caters to data-center traffic, but a new alternative
            | transport protocol catering to the rapid delay variations
            | and jitter that are unique to LEO networking.
        
       | supriyo-biswas wrote:
       | Previous discussion:
       | https://news.ycombinator.com/item?id=42284758
        
       | cagenut wrote:
        | I am soooo grateful for this post. After years and years of
        | words and one-off measurements, this is the first time I have
        | seen clear measurements of the key metrics, specifically for
        | someone considering the 'fast twitch shooter' (counterstrike et
        | al) use case. To sum:
        | 
        |   - the satellite handoff period is 15 seconds
        |   - you cycle through the three nearest satellites
        |   - on the farthest one, inside the 15-second stable connection
        |     window, latency is 40-55 ms
        |   - on the middle one latency is 30-45 ms
        |   - on the closest one latency is 20-35 ms
        |   - the moment of handoff between satellites is a 70-100 ms
        |     latency spike
        |   - more importantly, it is a near-guaranteed moment of ~10%
        |     packet loss
       | 
        | so my takeaway here is "it will mostly seem fine but you will
        | stutter-lag every 15 seconds". given that not every 15 seconds
        | will be an active moment of 'twitch off' shooting, the engine
        | will probably smooth over most of them without you noticing.
       | 
       | this could probably be used in subtle ways the same as laghacking
       | already is. like if you knew some packet loss was coming and you
       | knew how your engine handled it you could do things like time
       | jumps around corners so you appear to teleport. or if the engine
       | is very conservative then you could at least know "don't push
       | that corner right now, wait 2 seconds for the handoff, then go".
       | 
        | edit: side note, thinking about the zoom use case, this would
        | be kind of awful? imagine dropping a syllable every 15 seconds
        | in a conversation.
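        | 
        | if you want to see the cycle on your own link, here's a rough
        | sketch. the target host and the modulo-15 bucketing are just
        | assumptions; it bins TCP-connect RTT samples by their offset
        | within each 15-second window so the handoff spike stands out:
        | 
        |     import socket, time
        |     from collections import defaultdict
        | 
        |     TARGET = ("1.1.1.1", 443)    # any nearby, reliable host
        |     buckets = defaultdict(list)  # offset in 15 s -> samples
        | 
        |     for _ in range(300):         # roughly 5 minutes of samples
        |         t0 = time.monotonic()
        |         try:
        |             with socket.create_connection(TARGET, timeout=2):
        |                 rtt = (time.monotonic() - t0) * 1000
        |         except OSError:
        |             rtt = None           # count timeouts as loss
        |         buckets[int(time.time()) % 15].append(rtt)
        |         time.sleep(1)
        | 
        |     for offset in range(15):
        |         vals = [v for v in buckets[offset] if v is not None]
        |         loss = buckets[offset].count(None)
        |         avg = sum(vals) / len(vals) if vals else float("nan")
        |         print(f"+{offset:2d}s: avg {avg:5.1f} ms, "
        |               f"{loss} timeouts")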
        
       | koksik202 wrote:
       | Retransmission and packet loss not shown?
        
       ___________________________________________________________________
       (page generated 2024-12-06 23:02 UTC)