[HN Gopher] Optimizing global message transit latency: a journey...
       ___________________________________________________________________
        
       Optimizing global message transit latency: a journey through TCP
       configuration
        
       Author : amnonbc
       Score  : 34 points
       Date   : 2024-08-19 14:40 UTC (8 hours ago)
        
 (HTM) web link (ably.com)
 (TXT) w3m dump (ably.com)
        
       | vrnvu wrote:
       | I always enjoy reading posts about optimization like this one.
       | 
       | Optimizing a running service is often underrated. Many engineers
       | focus on scaling horizontally or vertically, adding more
       | instances or using more powerful machines to solve problems. But
       | there's a third option: service optimization, which offers
       | significant benefits.
       | 
       | Whether it's tuning TCP configurations or profiling to identify
       | CPU and memory bottlenecks, optimizing the service itself can
       | lead to better performance and cost savings. It's a smart
       | approach that shouldn't be overlooked.
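        | 
        | As a concrete example on the profiling side (assuming a Go
        | service; the listen address here is arbitrary), the standard
        | net/http/pprof endpoint is often all you need to get started:
        | 
        |     package main
        |     
        |     import (
        |         "log"
        |         "net/http"
        |         _ "net/http/pprof" // registers /debug/pprof/* handlers
        |     )
        |     
        |     func main() {
        |         // Expose the profiler on a loopback-only port, then e.g.
        |         //   go tool pprof http://localhost:6060/debug/pprof/profile
        |         // for a CPU profile, or .../debug/pprof/heap for memory.
        |         log.Println(http.ListenAndServe("localhost:6060", nil))
        |     }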
        
       | jeffbee wrote:
       | Reupping my assertion that kernel network protocol stacks are
       | painfully obsolete in a world of containers, namespaces, and
       | virtual machines. Default parameters are trying to balance
       | competing interests, none of which will be relevant to your use
       | case. Userspace protocol stacks are more aligned with the end-to-
       | end principle. QUIC is a convenient compromise that moves most of
       | the complexity up into the application while still benefitting
       | from the kernel UDP stack with relatively fewer knobs.
        
         | electricshampo1 wrote:
          | On prod servers I see a bunch of frontend stalls & code misses
          | in the L2 for the kernel TCP stack; having each process
          | statically embed its own network stack may make that worse
          | (though a dynamically linked userspace QUIC library, shared
          | across multiple processes, partially addresses that, with
          | other tradeoffs).
         | 
          | Of course, depending on the use case, the benefit from first-
          | order network behavior improvements is almost certainly more
          | important than the second-order cache pollution effects of
          | replicated/separate network stacks.
        
           | jeffbee wrote:
            | When using a userspace stack you can (and should!) optimize
            | your program during and after link to put hot code together
            | on the same or nearby cache lines and pages. You cannot do
            | this, or anything approximating it, between an application
            | and the Linux kernel. When Linux is built, the linker
            | doesn't know which parts of its sprawling network stack are
            | hot or cold.
        
       | tbarbugli wrote:
        | Nice article, which matches my experience when it comes to
        | optimizing for performance: Linux defaults are never good
        | defaults, and you don't need web scale or anything before you
        | get bitten by them.
       | 
        | To give a few examples: on many distributions you get a file-
        | descriptor limit of 1024, 4KB of shared memory (shmall), and
        | Nagle's algorithm enabled by default.
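        | 
        | For reference, TCP_NODELAY is a per-socket option, and Go's net
        | package already disables Nagle on new TCP connections by
        | default; a minimal sketch of setting it explicitly (the listen
        | address is made up):
        | 
        |     package main
        |     
        |     import (
        |         "log"
        |         "net"
        |     )
        |     
        |     func main() {
        |         ln, err := net.Listen("tcp", "127.0.0.1:9000") // made-up address
        |         if err != nil {
        |             log.Fatal(err)
        |         }
        |         defer ln.Close()
        |     
        |         conn, err := ln.Accept()
        |         if err != nil {
        |             log.Fatal(err)
        |         }
        |         defer conn.Close()
        |     
        |         // TCP_NODELAY: push small writes out immediately instead
        |         // of letting Nagle's algorithm coalesce them (pass false
        |         // to re-enable Nagle).
        |         if tcpConn, ok := conn.(*net.TCPConn); ok {
        |             if err := tcpConn.SetNoDelay(true); err != nil {
        |                 log.Printf("SetNoDelay: %v", err)
        |             }
        |         }
        |     }
        | 
        | Applications that never set the option get Nagle's batching
        | behavior by default.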
       | 
        | Another thing we noticed at work (shameless plug for
        | getstream.io) when it comes to tail latency for APIs / HTTP
        | services:
       | 
        | - TLS (for HTTPS) is annoyingly slow (too many handshake round
        | trips; see the sketch after this list)
        | 
        | - Having edge nodes / POPs close to end users greatly improves
        | tail latency (and reduces latency-related errors). This works
        | incredibly well for simple relays (the "weak" link has lower
        | latency)
       | 
       | - QUIC is awesome
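        | 
        | On the TLS point, a minimal Go sketch of the usual mitigations:
        | force TLS 1.3 (one round trip for a full handshake instead of
        | two under 1.2) and keep a client session cache so subsequent
        | connections can resume instead of redoing the full handshake.
        | The endpoint URL is made up:
        | 
        |     package main
        |     
        |     import (
        |         "crypto/tls"
        |         "fmt"
        |         "net/http"
        |     )
        |     
        |     func main() {
        |         transport := &http.Transport{
        |             TLSClientConfig: &tls.Config{
        |                 MinVersion:         tls.VersionTLS13,
        |                 ClientSessionCache: tls.NewLRUClientSessionCache(128),
        |             },
        |         }
        |         client := &http.Client{Transport: transport}
        |     
        |         // Hypothetical endpoint, purely for illustration.
        |         resp, err := client.Get("https://api.example.com/health")
        |         if err != nil {
        |             fmt.Println("request failed:", err)
        |             return
        |         }
        |         defer resp.Body.Close()
        |         fmt.Println("status:", resp.Status)
        |     }
        | 
        | QUIC goes further by folding the transport and TLS handshakes
        | together, which is presumably part of why it works so well here.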
        
         | pocketarc wrote:
         | That honestly just makes me think: Is there a distro where this
         | isn't the case? Where the defaults are set up for performance
         | in a modern server context, with the expectation that the
         | system will be admin'd by someone technical who knows the
          | tradeoffs? Heck, the decisions and tradeoffs could all be
          | spelled out in the docs.
         | 
         | Is there a reason I'm missing why this wouldn't be worth
         | jumping on?
        
         | kevincox wrote:
         | > 1024 as the file limits
         | 
         | https://0pointer.net/blog/file-descriptor-limits.html is a good
         | overview of the unfortunate reason why this is and how it
         | should be handled.
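          | 
          | If I remember right, the gist is: the soft limit stays at 1024
          | for select() compatibility, the hard limit is much higher, and
          | a process that needs many descriptors (and never calls
          | select()) should raise its own soft limit at startup. A
          | minimal Go sketch of that bump:
          | 
          |     package main
          |     
          |     import (
          |         "fmt"
          |         "syscall"
          |     )
          |     
          |     func main() {
          |         // Raise the soft RLIMIT_NOFILE to the hard limit for
          |         // this process only; safe as long as select() is never
          |         // used on high-numbered descriptors.
          |         var rl syscall.Rlimit
          |         if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
          |             fmt.Println("getrlimit:", err)
          |             return
          |         }
          |         rl.Cur = rl.Max
          |         if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
          |             fmt.Println("setrlimit:", err)
          |             return
          |         }
          |         fmt.Printf("RLIMIT_NOFILE soft=%d hard=%d\n", rl.Cur, rl.Max)
          |     }
          | 
          | (I believe recent Go runtimes already do this bump
          | automatically at startup.)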
        
       | divbzero wrote:
       | Are there any Linux distributions (or packages) that apply the
       | best network configurations by default?
        
       | iscmt wrote:
       | Wrote my Computer Networking final last Friday. Some parts of TCP
       | felt dry to study, but it's incredible to see how those same
       | concepts are being used for optimization.
        
       ___________________________________________________________________
       (page generated 2024-08-19 23:00 UTC)