[HN Gopher] The Surprising gRPC Client Bottleneck in Low-Latency...
___________________________________________________________________
The Surprising gRPC Client Bottleneck in Low-Latency Networks
Author : eivanov89
Score : 66 points
Date : 2025-07-23 13:23 UTC (9 hours ago)
(HTM) web link (blog.ydb.tech)
(TXT) w3m dump (blog.ydb.tech)
| yuliyp wrote:
| If you have a single TCP connection, all the data flows through
| that connection, ultimately serializing at least some of the
| processing. Given that the workers are just responding with OK,
| no matter how many CPU cores you throw at them, you're still
| bound by the throughput of the IO thread (or rather, by the
| slower of the client and server IO threads). If you want more
| than one IO thread to share the load, you need more than one TCP
| connection.
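|
| A minimal grpc-go sketch of that idea (the pool type and
| round-robin picker are illustrative, not from the article): open
| several independent client connections and spread requests across
| them so no single TCP connection carries all the traffic.
|
|     package pool
|
|     import (
|         "sync/atomic"
|
|         "google.golang.org/grpc"
|         "google.golang.org/grpc/credentials/insecure"
|     )
|
|     // connPool holds several independent ClientConns so requests
|     // are not all funneled through one TCP connection / IO thread.
|     type connPool struct {
|         conns []*grpc.ClientConn
|         next  uint64
|     }
|
|     func newConnPool(target string, n int) (*connPool, error) {
|         p := &connPool{}
|         for i := 0; i < n; i++ {
|             // Each ClientConn gets its own HTTP/2 connection and
|             // therefore its own TCP stream.
|             cc, err := grpc.NewClient(target,
|                 grpc.WithTransportCredentials(insecure.NewCredentials()))
|             if err != nil {
|                 return nil, err
|             }
|             p.conns = append(p.conns, cc)
|         }
|         return p, nil
|     }
|
|     // pick round-robins over the pool.
|     func (p *connPool) pick() *grpc.ClientConn {
|         i := atomic.AddUint64(&p.next, 1)
|         return p.conns[i%uint64(len(p.conns))]
|     }
|
| (In C++ or Java gRPC, channels created with identical arguments
| may share a subchannel, so forcing separate connections usually
| means giving each channel a distinct argument or a local
| subchannel pool.)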
| xtoilette wrote:
| classic case of head of line blocking!
| yuliyp wrote:
| I don't think this is head-of-line blocking. That is, it's not
| like a single slow request causes starvation of other requests.
| The IO thread for the connection is grabbing and dispatching
| data to workers as fast as it can. All the requests are
| uniform, so it's not like one request would be bigger/harder to
| handle for that thread.
| otterley wrote:
| > First, we checked the number of TCP connections using lsof
| -i TCP:2137 and found that only a single TCP connection was
| used regardless of in-flight count.
|
| It's head-of-line blocking. When requests are serialized, the
| queue will grow as long as the time to service a request is
| longer than the interval between arriving requests. Queue growth
| is especially wasteful when sufficient capacity exists to service
| requests in parallel but goes unused.
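|
| To put illustrative numbers on that (made up, not from the
| article): if requests arrive every 100 µs but the serialized path
| can only hand one off every 150 µs, the arrival rate is 10,000/s
| against a service rate of ~6,667/s, so the queue grows by roughly
| 3,300 requests per second even while worker cores sit idle.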
| lacop wrote:
| Somewhat related, I'm running into a gRPC latency issue in
| https://github.com/grpc/grpc-go/issues/8436
|
| If the request payload exceeds a certain size, the response
| latency goes from network RTT to double or triple that.
|
| Definitely something wrong with either TCP or HTTP/2 windowing, as
| it doesn't send the full request without first getting an ACK from
| the server. But neither the gRPC windowing config options nor the
| Linux tcp_wmem/rmem settings help. Sending a one-byte request
| every few hundred milliseconds fixes it by keeping the gRPC
| channel / TCP connection active. Nagle / slow start is disabled.
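|
| For reference, the grpc-go knobs in question look roughly like
| this (values are illustrative, not recommendations, and the issue
| above reports that the window options did not help):
|
|     package dialopts
|
|     import (
|         "time"
|
|         "google.golang.org/grpc"
|         "google.golang.org/grpc/keepalive"
|     )
|
|     // dialOptions returns illustrative window and keepalive
|     // settings for a grpc-go client.
|     func dialOptions() []grpc.DialOption {
|         return []grpc.DialOption{
|             // Per-stream and per-connection HTTP/2 flow-control
|             // windows.
|             grpc.WithInitialWindowSize(1 << 22),     // 4 MiB
|             grpc.WithInitialConnWindowSize(1 << 23), // 8 MiB
|             // Built-in HTTP/2 pings keep the connection active
|             // during idle periods; grpc-go clamps Time to a 10s
|             // minimum, so this cannot replace the sub-second
|             // dummy requests described above.
|             grpc.WithKeepaliveParams(keepalive.ClientParameters{
|                 Time:                10 * time.Second,
|                 Timeout:             time.Second,
|                 PermitWithoutStream: true,
|             }),
|         }
|     }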
| littlecranky67 wrote:
| Sounds like classic TCP congestion window scaling delay; your
| payload probably exceeds 10x initcwnd.
| lacop wrote:
| Doesn't initcwnd only apply as the initial value? I don't
| care that the first request on the gRPC channel is slow, but
| subsequent requests on the same channel reuse the TCP
| connection and should have a larger window size. This works as
| long as the channel is actively being used, but after a short
| period of inactivity (a few hundred ms, unsure exactly) something
| appears to revert back.
| littlecranky67 wrote:
| Yes, in the case of hot TCP connections congestion control
| should not be the issue.
| lacop wrote:
| Yeah, that was my understanding too, hence I filed the bug
| (actually a duplicate of an older bug that was closed because
| the poster didn't provide a reproduction).
|
| Still not sure if this is a Linux network configuration
| issue or a gRPC issue, but something is definitely broken if
| I can't send a ~1MB request and get a response within
| roughly network RTT + server processing time.
| eivanov89 wrote:
| That's indeed interesting, thank you for sharing.
| ltbarcly3 wrote:
| gRPC is a very badly implemented system. I have gotten 25%-30%+
| improvements in throughput just by monkeypatching the Google
| Cloud client libraries to force JSON API endpoint usage.
|
| At least try something else besides gRPC when building systems so
| you have a baseline performance understanding. gRPC OFTEN
| introduces performance breakdowns that go unnoticed.
| stock_toaster wrote:
| Have you done any comparisons with connect-rpc?
___________________________________________________________________
(page generated 2025-07-23 23:01 UTC)