[HN Gopher] Optimizing global message transit latency: a journey...
___________________________________________________________________
Optimizing global message transit latency: a journey through TCP
configuration
Author : amnonbc
Score : 34 points
Date : 2024-08-19 14:40 UTC (8 hours ago)
(HTM) web link (ably.com)
(TXT) w3m dump (ably.com)
| vrnvu wrote:
| I always enjoy reading posts about optimization like this one.
|
| Optimizing a running service is often underrated. Many engineers
| focus on scaling horizontally or vertically, adding more
| instances or using more powerful machines to solve problems. But
| there's a third option: service optimization, which offers
| significant benefits.
|
| Whether it's tuning TCP configurations or profiling to identify
| CPU and memory bottlenecks, optimizing the service itself can
| lead to better performance and cost savings. It's a smart
| approach that shouldn't be overlooked.
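|
| As a concrete (and purely illustrative) example of the TCP-tuning
| side, here is a minimal Go sketch that disables Nagle's algorithm
| and keeps a long-lived connection warm; the endpoint and the
| keep-alive period are made-up values, not anything from the
| article:
|
|     package main
|
|     import (
|         "log"
|         "net"
|         "time"
|     )
|
|     func main() {
|         // Hypothetical endpoint, purely for illustration.
|         conn, err := net.DialTimeout("tcp", "example.com:443",
|             5*time.Second)
|         if err != nil {
|             log.Fatal(err)
|         }
|         defer conn.Close()
|
|         tcpConn := conn.(*net.TCPConn)
|         // Send small writes immediately instead of letting
|         // Nagle's algorithm coalesce them (Go already enables
|         // TCP_NODELAY by default; being explicit documents the
|         // intent).
|         tcpConn.SetNoDelay(true)
|         // Keep the connection warm so the next message does not
|         // pay handshake latency again.
|         tcpConn.SetKeepAlive(true)
|         tcpConn.SetKeepAlivePeriod(30 * time.Second)
|     }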
| jeffbee wrote:
| Reupping my assertion that kernel network protocol stacks are
| painfully obsolete in a world of containers, namespaces, and
| virtual machines. Default parameters are trying to balance
| competing interests, none of which will be relevant to your use
| case. Userspace protocol stacks are more aligned with the end-to-
| end principle. QUIC is a convenient compromise that moves most of
| the complexity up into the application while still benefitting
| from the kernel UDP stack with relatively fewer knobs.
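|
| To make that boundary concrete: a userspace stack typically only
| asks the kernel for raw UDP datagrams and implements loss
| recovery, ordering and congestion control itself. A rough Go
| sketch of that split, using only the standard library (the port
| and buffer size are arbitrary assumptions):
|
|     package main
|
|     import (
|         "log"
|         "net"
|     )
|
|     func main() {
|         // The kernel's only job here is to deliver UDP datagrams.
|         conn, err := net.ListenUDP("udp", &net.UDPAddr{Port: 4433})
|         if err != nil {
|             log.Fatal(err)
|         }
|         defer conn.Close()
|
|         // One of the few remaining kernel knobs: the socket
|         // receive buffer (4 MiB is an arbitrary example value).
|         if err := conn.SetReadBuffer(4 << 20); err != nil {
|             log.Printf("SetReadBuffer: %v", err)
|         }
|
|         buf := make([]byte, 1500)
|         for {
|             n, from, err := conn.ReadFromUDP(buf)
|             if err != nil {
|                 log.Fatal(err)
|             }
|             // From here on, reliability, ordering and congestion
|             // control live in the application (e.g. a QUIC
|             // library), not in the kernel.
|             _, _ = n, from
|         }
|     }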
| electricshampo1 wrote:
| On prod servers I see a bunch of frontend stalls and code misses
| in the L2 for the kernel TCP stack; having each process
| statically embed its own network stack may make that worse
| (though a dynamically linked QUIC library in userspace, shared
| across multiple processes, partially addresses that, with other
| trade-offs).
|
| Of course, depending on the use case, the benefit from first-
| order network behavior improvements is almost certainly more
| important than the second-order cache-pollution effects of
| replicated/separate network stacks.
| jeffbee wrote:
| When using a userspace stack you can (and should!) optimize
| your program during and after linking to put hot code together
| on the same or nearby cache lines and pages. You cannot do
| this, or anything approximating it, across the boundary between
| an application and the Linux kernel: when Linux is built, the
| linker doesn't know which parts of its sprawling network stack
| are hot or cold.
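|
| One rough analogue of this in a Go service (other ecosystems use
| linker ordering files or BOLT) is profile-guided optimization:
| capture a CPU profile from a representative run, then rebuild
| with it so the toolchain can optimize the hot paths it sees in
| the profile. A minimal sketch, assuming Go 1.21+ and the
| standard runtime/pprof package; workload() is a stand-in for the
| real code:
|
|     package main
|
|     import (
|         "log"
|         "os"
|         "runtime/pprof"
|     )
|
|     // workload stands in for the real request-handling code.
|     func workload() {
|         sum := 0
|         for i := 0; i < 10_000_000; i++ {
|             sum += i
|         }
|         _ = sum
|     }
|
|     func main() {
|         // Capture a CPU profile from a representative run.
|         f, err := os.Create("cpu.pprof")
|         if err != nil {
|             log.Fatal(err)
|         }
|         defer f.Close()
|
|         if err := pprof.StartCPUProfile(f); err != nil {
|             log.Fatal(err)
|         }
|         defer pprof.StopCPUProfile()
|
|         workload()
|
|         // Then rebuild with the profile so the toolchain can
|         // optimize the hot paths:
|         //   go build -pgo=cpu.pprof ./...
|     }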
| tbarbugli wrote:
| Nice article, which matches my experience when it comes to
| optimizing for performance: Linux defaults are never good
| defaults, and you don't need to be anywhere near web scale
| before you get bitten by them.
|
| To give a few examples: on many distributions you get 1024 as
| the file descriptor limit, only 4 KB of shared memory (shmall),
| and Nagle's algorithm enabled by default.
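|
| On the file descriptor default specifically, one common
| workaround is for the process to raise its own soft limit up to
| the hard limit at startup (newer Go runtimes already do
| something like this automatically, but the idea generalizes). A
| rough, Linux-only Go sketch:
|
|     package main
|
|     import (
|         "log"
|         "syscall"
|     )
|
|     func main() {
|         var lim syscall.Rlimit
|         if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE,
|             &lim); err != nil {
|             log.Fatal(err)
|         }
|         log.Printf("fd limit: soft=%d hard=%d", lim.Cur, lim.Max)
|
|         // Raise the soft limit to the hard limit, which is the
|         // usual recommendation for servers stuck with the
|         // historical 1024 default.
|         lim.Cur = lim.Max
|         if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE,
|             &lim); err != nil {
|             log.Fatal(err)
|         }
|     }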
|
| A few other things we noticed at work (shameless plug for
| getstream.io) when it comes to tail latency for APIs / HTTP
| services:
|
| - TLS handshakes over TCP are annoyingly slow (too many round
| trips; see the sketch after this list)
|
| - Having edge nodes / POPs close to end-users greatly improves
| tail latency (and reduces latency related errors). This works
| incredibly well for simple relays (the "weak" link has lower
| latency)
|
| - QUIC is awesome
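|
| On the TLS round-trip point above, one common mitigation is
| session resumption, so that reconnects skip most of the
| handshake. A small illustrative sketch with Go's standard
| crypto/tls package (the host name and cache size are made up):
|
|     package main
|
|     import (
|         "crypto/tls"
|         "log"
|     )
|
|     func main() {
|         cfg := &tls.Config{
|             // Cache session tickets so reconnects can resume
|             // the TLS session instead of paying the full
|             // handshake again.
|             ClientSessionCache: tls.NewLRUClientSessionCache(128),
|             // TLS 1.3 already needs fewer round trips than 1.2.
|             MinVersion: tls.VersionTLS13,
|         }
|
|         // Hypothetical endpoint, purely for illustration.
|         conn, err := tls.Dial("tcp", "example.com:443", cfg)
|         if err != nil {
|             log.Fatal(err)
|         }
|         defer conn.Close()
|
|         state := conn.ConnectionState()
|         log.Printf("resumed session: %v", state.DidResume)
|     }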
| pocketarc wrote:
| That honestly just makes me think: Is there a distro where this
| isn't the case? Where the defaults are set up for performance
| in a modern server context, with the expectation that the
| system will be admin'd by someone technical who knows the
| tradeoffs? Heck, the decisions and trade-offs could all be
| documented.
|
| Is there a reason I'm missing why this wouldn't be worth
| jumping on?
| kevincox wrote:
| > 1024 as the file descriptor limit
|
| https://0pointer.net/blog/file-descriptor-limits.html is a good
| overview of the unfortunate reasons why this is the case and how
| it should be handled.
| divbzero wrote:
| Are there any Linux distributions (or packages) that apply the
| best network configurations by default?
| iscmt wrote:
| Wrote my Computer Networking final last Friday. Some parts of TCP
| felt dry to study, but it's incredible to see how those same
| concepts are being used for optimization.
___________________________________________________________________
(page generated 2024-08-19 23:00 UTC)