[HN Gopher] A transport protocol's view of Starlink
___________________________________________________________________
A transport protocol's view of Starlink
Author : rolph
Score : 147 points
Date : 2024-11-30 23:12 UTC (5 days ago)
(HTM) web link (blog.apnic.net)
(TXT) w3m dump (blog.apnic.net)
| NelsonMinar wrote:
| I noticed a huge improvement just switching to stock BBR for my
| Starlink connection as well. During a particularly congested time
| I was bouncing between 5 and 12 Mbps via Starlink. With BBR
| enabled I got a steady 12. The main problem is that you need BBR
| on the server for this to work; as a client using Starlink I
| don't have any control over what the servers I connect to are
| doing (other than my one server I was testing with).
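|
| For reference, a minimal per-socket sketch of what "BBR on the
| server" can look like on Linux (assuming the tcp_bbr module is
| available; the port number is just an illustrative placeholder):
|
|     import socket
|
|     srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
|     srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
|     # TCP_CONGESTION selects the congestion control algorithm for
|     # this socket without changing the system-wide default
|     # (net.ipv4.tcp_congestion_control); connections accepted from
|     # this listener are expected to inherit it.
|     srv.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION,
|                    b"bbr")
|     srv.bind(("0.0.0.0", 8080))
|     srv.listen()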
|
| I like Huston's idea of a Starlink-tuned BBR. I wonder if it's a
| kind of traffic shaping that SpaceX could apply themselves in
| their ground station datacenters? That'd involve messing with
| the TCP stream though, so maybe a bad idea.
|
| The fact that Starlink has this 15 second switching built in is
| pretty weird, but you can definitely see it in every continuous
| latency measure. Even weirder, it seems to be globally
| synchronized: all the hundreds of thousands of dishes are
| switching to new satellites at the same millisecond, globally.
| Having a customized BBR aware of that 15 second cycle is an
| interesting idea.
| btilly wrote:
| If you use a VPN, wouldn't it suffice to just make your VPN
| connection use BBR?
|
| Ditto if you use an https proxy of some kind.
| jofla_net wrote:
| I would guess that would be beneficial, but again only if
| you're using a TCP VPN, which is suboptimal for other reasons -
| I think it's called TCP meltdown. If that is all you have
| access to, though, I'm sure it would help.
| Hikikomori wrote:
| Proxy yes, VPN no. A TCP-over-TCP VPN is bad, and a non-TCP VPN
| would make no difference compared to no VPN.
| Alex-Programs wrote:
| https://github.com/apernet/hysteria has the option to use
| https://github.com/apernet/tcp-brutal, a deliberately
| unfair/selfish congestion control algorithm.
|
| It's designed to mitigate certain methods of blocking-via-
| throttling.
|
| I looked into it for a report I wrote a while back, and I was
| surprised to find that nobody has made something purpose-
| built for greedy TCP congestion handling in order to improve
| performance at the expense of others. If there is such a
| thing, I couldn't find it. Perhaps I'm a little too cynical
| in my expectations!
|
| Maybe TCP-over-TCP is so bad that it's not worth it?
| sgt101 wrote:
| Fascinating that the throughput is about 250 Mbps. Presumably
| that's over the area served by one satellite? I wonder how much
| cache they put in each one... I vaguely remember a stat that 90%
| of requests (in data terms) are served from a TB of cache on the
| consumer internet. Perhaps having the satellites gossip for
| cache hits would work to preserve uplink bandwidth as well.
| Maybe downlink bandwidth is the thing for this network though,
| and caches just won't work.
| echoangle wrote:
| I would be surprised if there is much cache, or even any, on the
| satellites themselves. Fast large storage that's radiation hardened
| would be extremely expensive, and they have a lot of
| satellites. The satellites are low enough that general
| radiation isn't that bad, but every pass through the South
| Atlantic Anomaly would risk damage if regular flash storage is
| used.
| Alex-Programs wrote:
| Beyond that, how would the cache actually work?
|
| Everything is HTTPS nowadays; you can't just MITM and stick a
| caching proxy in front. You could put DNS on the sat I
| suppose, but other than that you'd need to have a full
| Netflix/Cloudflare node, and the sats are far too small and
| numerous for that.
| michaelt wrote:
| I'm working on a new type of protocol so that when there's
| a large-scale online event like the superbowl or a major
| boxing match on a streaming platform,
|
| satellite and cable internet providers will be able to send
| a single copy of the video stream on the uplink, and using
| a new technique we've named "casting the net broadly"
| multiple viewers will be able to receive the same downlink
| packets
|
| which we believe will have excellent scalability, enabling
| web-scale streaming video.
| spockz wrote:
| Isn't this what multicast does?
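|
| For context, this is essentially IP multicast: many receivers
| subscribe to one group address and share the same downlink
| packets. A minimal receiver sketch in Python (the group address
| 239.0.0.1 and port 5004 are arbitrary example values, not from
| the thread):
|
|     import socket
|     import struct
|
|     GROUP, PORT = "239.0.0.1", 5004
|
|     sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
|     sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
|     sock.bind(("", PORT))
|     # Join the group: the kernel signals (via IGMP) that this
|     # host wants copies of packets addressed to GROUP.
|     mreq = struct.pack("4s4s", socket.inet_aton(GROUP),
|                        socket.inet_aton("0.0.0.0"))
|     sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
|                     mreq)
|
|     while True:
|         data, addr = sock.recvfrom(2048)
|         # each datagram is one copy of the shared stream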
| vardump wrote:
| I think a single Starlink v1 satellite has a maximum bandwidth
| of about 20 Gbps. Newer versions might have a lot more.
| kevincox wrote:
| > the endpoints need to use large buffers to hold a copy of all
| the unacknowledged data, as is required by the TCP protocol.
|
| It makes me wonder if anyone has tried to break down the layers
| to optimize this. In the fairly common case of serving a file off
| of long-term storage you can just fetch the old data again if
| needed (likely from the page cache anyway, but still better than
| duplicating it), and some encryption algorithms are seekable so
| you can redo the encryption as well.
|
| Right now the kernel doesn't really have a choice but to buffer
| all unacknowledged data as the UNIX socket API has no provision
| for requesting old data from the writer. But a smarter API could
| put the application in charge of making that data available as
| required and avoid the need for extra copies in common cases.
|
| I know that Netflix did lots of work with the FreeBSD kernel on
| file-to-socket transfers, eventually adding in-kernel TLS to
| remove user space from the equation. But I don't know if they
| went as far as removing the socket buffers.
| cyberax wrote:
| > It makes me wonder if anyone has tried to break down the
| layers to optimize this.
|
| Yep. There were a bunch of proxy servers that optimized HTTP for
| satellite service. I used Globax back in the day to speed up a
| one-way satellite service:
| https://web.archive.org/web/20040602203838/http://globax.inf...
|
| Back then traffic was around 10 cents per megabyte in my city,
| so satellite service was a good way to shave off these costs.
| lxgr wrote:
| There aren't necessarily extra copies even when just using TCP,
| thanks to sendfile(2) and similar mechanisms.
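|
| A minimal sketch of that zero-copy path in Python, assuming a
| Unix system with os.sendfile and an already-accepted TCP
| connection (the function name and path are placeholders, not
| from the comment):
|
|     import os
|     import socket
|
|     def serve_file(conn: socket.socket, path: str) -> None:
|         # Hand the file to the kernel: data flows from the page
|         # cache into the socket without an extra user-space copy.
|         # The kernel still keeps unacknowledged bytes in the
|         # socket buffer until the peer ACKs them.
|         with open(path, "rb") as f:
|             size = os.fstat(f.fileno()).st_size
|             offset = 0
|             while offset < size:
|                 sent = os.sendfile(conn.fileno(), f.fileno(),
|                                    offset, size - offset)
|                 if sent == 0:
|                     break
|                 offset += sent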
|
| Buffer size isn't that much of an issue either, given the
| relatively low latencies involved, and these days you can
| indicate exactly which parts are missing with selective TCP
| acknowledgements (SACK), so you'll need at most a few round
| trips to identify them to the sender and eventually receive
| them.
|
| Practically, you'll probably not see much loss anyway, for
| better or worse: TCP historically interpreted packet loss as
| congestion, instead of actual non-congestion-induced loss on
| the physical layer. This is why most lower-layer protocols with
| higher error rates than Ethernet usually implement some sort of
| lower-layer ARQ to present themselves as "Ethernet-like" to
| upper layers in terms of error/loss rate, and this in turn has
| made loss-tolerant TCP congestion control (such as BBR, as
| described in the article) less of a research priority, I believe.
| moandcompany wrote:
| This is an "old" problem that has historically been addressed
| through things like "Performance Enhancing Proxies (PEPs)" that
| are defined in RFC 3135 and RFC 3449.
| (https://en.wikipedia.org/wiki/Performance-enhancing_proxy)
|
| When routing internet-style (IP) traffic over satellite links to
| low-earth-orbit or GEO, round-trip times are substantially
| higher than in most terrestrial wired or wireless applications,
| so the acknowledgements required by TCP take much longer to
| complete. A PEP, for example, splits or augments the connection:
| end-user/client devices behind an inline PEP keep their normal
| network settings, while the PEP runs the satellite leg of the
| session with larger TCP window sizes, as a method for improving
| overall throughput.
|
| The utility of a PEP, or PEP-acting device, goes up when you
| imagine multiple devices, or a network of devices, attached to a
| satellite communications terminal for WAN/backhaul connections,
| as the link's performance can be managed at one point rather
| than on all downstream client devices.
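|
| A minimal sketch of the split-connection idea behind a PEP (RFC
| 3135) in Python: terminate the client's TCP locally and run a
| second connection across the high-RTT leg with large socket
| buffers so more data can stay in flight. The addresses, port,
| and the 4 MB buffer size are illustrative assumptions only:
|
|     import socket
|     import threading
|
|     LISTEN = ("0.0.0.0", 9000)
|     UPSTREAM = ("origin.example.net", 80)
|     BUF = 4 * 1024 * 1024
|
|     def pump(src, dst):
|         # Copy bytes one way, then propagate the half-close.
|         while (chunk := src.recv(65536)):
|             dst.sendall(chunk)
|         dst.shutdown(socket.SHUT_WR)
|
|     def handle(client):
|         sat = socket.create_connection(UPSTREAM)
|         for s in (client, sat):
|             s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, BUF)
|             s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, BUF)
|         threading.Thread(target=pump, args=(client, sat),
|                          daemon=True).start()
|         pump(sat, client)
|
|     srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
|     srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
|     srv.bind(LISTEN)
|     srv.listen()
|     while True:
|         conn, _ = srv.accept()
|         threading.Thread(target=handle, args=(conn,),
|                          daemon=True).start()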
| lxgr wrote:
| TIL that that's what they're called. Thank you!
|
| Do you know if they became obsolete due to modern TCP stacks
| handling LFNs better or for some other reason?
|
| I could imagine them being quite useful for high-loss, high-
| latency paths (i.e. in addition to LFNs combined with poorly
| tuned TCP implementations), but most wireless protocols I know
| (802.11, GPRS and beyond, etc.) just implement an ARQ process
| that masks the loss at the packet or lower layer.
|
| So maybe between that and LFN-aware TCPs, there wasn't much
| left for them to improve to justify the additional
| complexity?
| moandcompany wrote:
| It's been a long time since I've paid attention to that
| world, but AFAIK, PEPs are still used and essential
| equipment for internet-style communication (i.e. TCP over
| IP) via GEO satellites.
|
| It looks like in this 2022 blog post evaluating latency and
| throughput over Starlink, they concluded that PEPs were not
| being used in the Starlink network (and are probably
| unnecessary) due to the lower latency that comes from using LEO
| satellites. They also mention that PEPs are (still) commonly
| employed by GEO-satcom-based operators.
|
| https://blog.apnic.net/2022/11/28/fact-checking-starlinks-pe...
| lxgr wrote:
| > they concluded that PEPs were not being used in the
| Starlink network
|
| That makes sense, given that they're probably most useful
| for high-latency networks. But what I find quite
| surprising is that Starlink does nothing about the 1-2%
| packet loss, as described in TFA; I'd really have
| expected them to fix that using an ARQ at a lower layer.
|
| Then again, maybe that's a blessing - indiscriminate ARQs
| like that would be terrible for time critical things like
| A/V, which can usually tolerate packet loss much better
| than ARQ-induced jitter.
|
| Thinking about it, that actually strengthens the case for
| PEPs: They could improve TCP performance (and maybe
| things like QUIC?), while leaving non-stream oriented
| things (like non-QUIC UDP) alone.
|
| Maybe Starlink just expects BBR to eventually become the
| dominant TCP congestion control algorithm and the problem
| to solve itself that way?
| trebligdivad wrote:
| How different is this behaviour from using a mobile phone in a
| car or train? Doesn't that also give you odd changes in latency
| and noticeable handoffs between cells?
| lxgr wrote:
| The distances involved are orders of magnitude smaller, so you
| don't get these effects nearly as much.
| teleforce wrote:
| Such a well-written article on satellite networking technology,
| kudos to APNIC.
|
| This makes me wonder whether TCP is really suitable or optimized
| for satellite networks.
|
| John Ousterhout (of Tcl/Tk fame) has recently proposed a new
| Homa transport protocol as an alternative to TCP in data centers
| [1]. Perhaps a new, more suitable transport protocol for
| satellite or NTN networks is also needed. That's the beauty of
| the Internet: the transport protocol is expendable, but the
| network protocol (IP) is not. Witness the fact that IPv6 is
| still fringe rather than mainstream, although it's arguably
| better than IPv4.
|
| [1] Homa, a transport protocol to replace TCP for low-latency RPC
| in data centers:
|
| https://news.ycombinator.com/item?id=28204808
| supriyo-biswas wrote:
| Throwing out TCP for a message-oriented layer, as Homa does, is
| not really required to address this need.
|
| Perhaps what would be more useful in this context would be for
| operating system vendors to perform an HTTP request to a
| globally distributed endpoint, similar to captive portal
| detection, and then use a more aggressive congestion control
| algorithm on networks with good throughput but high latency.
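|
| A rough sketch of that idea in Python, assuming Linux; the probe
| URL, the 100 ms threshold, and the bbr-vs-cubic choice are
| illustrative assumptions rather than any real OS mechanism:
|
|     import socket
|     import time
|     import urllib.request
|
|     PROBE_URL = "http://connectivity-check.example.com/"
|
|     start = time.monotonic()
|     urllib.request.urlopen(PROBE_URL, timeout=5).read()
|     elapsed = time.monotonic() - start
|
|     # Crude long-fat-network heuristic: prefer a rate-based
|     # algorithm when the probe suggests a high-latency path.
|     algo = b"bbr" if elapsed > 0.100 else b"cubic"
|
|     s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
|     s.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, algo)
|     s.connect(("example.com", 80))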
| teleforce wrote:
| I am not suggesting something like the Homa protocol, since that
| caters to data center traffic, but a new transport protocol
| alternative designed for the rapid delay variations and jitter
| that are unique to LEO networking.
| supriyo-biswas wrote:
| Previous discussion:
| https://news.ycombinator.com/item?id=42284758
| cagenut wrote:
| I am soooo grateful for this post. After years and years of
| words and one-off measurements, this is the first time I have
| seen clear measurements of the key metrics, specifically for
| someone considering the 'fast twitch shooter' (counterstrike et
| al) use case. To sum:
|
| - the satellite handoff period is 15 seconds
| - you cycle through the three nearest satellites
| - on the farthest one, inside the 15-second stable connection
|   window, latency is 40-55 ms
| - on the middle one, latency is 30-45 ms
| - on the closest one, latency is 20-35 ms
| - the moment of handoff between satellites is a 70-100 ms
|   latency spike
| - more importantly, it is a near-guaranteed moment of ~10%
|   packet loss
|
| so my takeaway here is "it will mostly seem fine but you will
| stutter-lag every 15 seconds". given that not every 15 seconds
| will be an active moment of 'twitch off' shooting, the engine
| will probably smooth over most of them without you noticing.
|
| this could probably be used in subtle ways the same as laghacking
| already is. like if you knew some packet loss was coming and you
| knew how your engine handled it you could do things like time
| jumps around corners so you appear to teleport. or if the engine
| is very conservative then you could at least know "don't push
| that corner right now, wait 2 seconds for the handoff, then go".
|
| edit: side note, thinking about the zoom use case, and this would
| be kind of awful? imagine dropping a syllable every 15 seconds in
| a conversation.
| koksik202 wrote:
| Retransmission and packet loss not shown?
___________________________________________________________________
(page generated 2024-12-06 23:02 UTC)