[HN Gopher] How we tamed Node.js event loop lag: a deepdive
       ___________________________________________________________________
        
       How we tamed Node.js event loop lag: a deepdive
        
       Author : mifydev
       Score  : 24 points
       Date   : 2024-07-08 20:52 UTC (2 hours ago)
        
 (HTM) web link (trigger.dev)
 (TXT) w3m dump (trigger.dev)
        
       | dexwiz wrote:
       | Says event loop in the title, but the real culprit is a
       | non-paginated endpoint with a nested loop. Pagination or guard
       | rails are basic things for customer facing features. Any time you
       | design a service for X items, some will try it with 10x-1000x
       | items. Be ready for that.
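
A minimal sketch of the kind of guard rail described above, assuming an in-memory list and an index-based cursor. The names `MAX_PAGE_SIZE`, `Page`, and `paginate` are hypothetical and are not taken from the article:

```typescript
// Hypothetical cap; a real service would pick this from its own load testing.
const MAX_PAGE_SIZE = 100;

interface Page<T> {
  items: T[];
  nextCursor: number | null; // index to resume from, or null when exhausted
}

function paginate<T>(all: readonly T[], cursor = 0, requestedSize = 25): Page<T> {
  // Guard rail: clamp client-supplied values instead of trusting them.
  const size = Math.min(Math.max(1, Math.floor(requestedSize)), MAX_PAGE_SIZE);
  const start = Math.min(Math.max(0, Math.floor(cursor)), all.length);
  const end = Math.min(start + size, all.length);
  return {
    items: all.slice(start, end),
    nextCursor: end < all.length ? end : null,
  };
}
```

The point of the clamp is that a caller asking for 100,000 items still gets at most `MAX_PAGE_SIZE`, so the work done per request stays bounded no matter how large the customer's data grows.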
        
         | corytheboyd wrote:
         | And every single time a product person will challenge the need
         | for pagination, or any other limits that make your systems
         | actually scalable. Sigh.
        
       | ricardobeat wrote:
       | This is not about "taming lag" as suggested by the title, which
       | implies some form of failure on node's part.
       | 
       | They accidentally wrote synchronous O(n^2) code that hogged the
       | CPU, blocking the event loop, then fixed it. But that doesn't
       | sound as adventurous...
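
The article's actual code and fix are not shown in this thread, so the following is only an illustration of the general pattern, with hypothetical `Run` and `Task` shapes: a nested scan that is quadratic in the input, next to a single-pass version that produces the same grouping.

```typescript
// Hypothetical data shapes, for illustration only.
interface Task { id: string; runId: string }
interface Run { id: string }

// Quadratic: for every run, scan every task. With n runs and n tasks this is
// O(n^2) synchronous work, and nothing else runs until it finishes.
function groupQuadratic(runs: Run[], tasks: Task[]): Map<string, Task[]> {
  const out = new Map<string, Task[]>();
  for (const run of runs) {
    out.set(run.id, tasks.filter((t) => t.runId === run.id));
  }
  return out;
}

// Linear: create one bucket per run, then place each task in a single pass.
function groupLinear(runs: Run[], tasks: Task[]): Map<string, Task[]> {
  const out = new Map<string, Task[]>();
  for (const run of runs) out.set(run.id, []);
  for (const task of tasks) {
    out.get(task.runId)?.push(task); // tasks for unknown runs are skipped
  }
  return out;
}
```

Both functions return the same map; only the amount of CPU time spent holding the event loop differs as the input grows.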
       | 
       | Otherwise a solid example of using observability tools to debug a
       | live issue.
        
         | williamdclt wrote:
         | While I don't think the article is very advanced, it's really
         | not about the root cause. The O(n^2) code isn't the subject
         | (they don't even show the fix, as it's not really interesting).
         | 
         | It's about how to systematically detect and debug the problem.
          | In Node that's not a trivial thing to do. That has value.
        
           | moralestapia wrote:
           | >In Node that's not a trivial thing to do.
           | 
           | Depends on your code best practices, I've found it way easier
           | than other platforms I've used (C++, Python). Even without
           | explicit interrupts and such.
        
       | williamdclt wrote:
       | I'm a bit confused by the monitoring described. Event loop lag is
       | insidious because it doesn't affect only the slow part of your
       | app, it affects everything: one small part of a request takes
       | seconds, making every concurrent request take seconds. Generally,
       | I found that when the event loop is having lag issues, you can't
       | really trust much of your application monitoring (OTel spans are
       | very long, but it's actually just waiting for the event loop).
       | How then did they find the root causes of these lag issues?
       | 
       | As an aside, it's a bit weird to create a span to mark that
       | something happened; OTel events are made for that.
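
The "it affects everything" point can be seen with a small sketch. This is not the article's monitoring setup; `busyWait`, `measureTimerLag`, and `demo` are hypothetical helpers that only use timers to show how synchronous work in one place delays an unrelated callback:

```typescript
// Blocks the thread synchronously for roughly `ms` milliseconds,
// standing in for CPU-heavy work inside one request.
function busyWait(ms: number): void {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spin
  }
}

// Resolves with how late a timer fired relative to its requested delay.
function measureTimerLag(delayMs: number): Promise<number> {
  const scheduled = Date.now();
  return new Promise<number>((resolve) => {
    setTimeout(() => {
      resolve(Math.max(0, Date.now() - scheduled - delayMs));
    }, delayMs);
  });
}

async function demo(): Promise<number> {
  const lag = measureTimerLag(10); // an unrelated "request" waiting on a 10 ms timer
  busyWait(200);                   // another "request" hogs the CPU meanwhile
  return lag;                      // the timer could not fire until the loop was free
}
```

The measured lag belongs to the timer, not to the code that caused it, which is why spans recorded during such a stall look slow everywhere. For production use, Node ships `monitorEventLoopDelay` in `node:perf_hooks`, which samples this delay into a histogram instead of relying on hand-rolled timers.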
        
       | moonlion_eth wrote:
       | How we turned our amateur code into click bait
        
       | cpursley wrote:
       | tl;dr: should have used Erlang/Elixir as well as our customers.
        
         | mplewis wrote:
         | How does Erlang fix this?
        
       | sgarland wrote:
       | They put emoji into logs (console, but still). Node devs are
       | deeply unserious people, and you cannot convince me otherwise.
        
       ___________________________________________________________________
       (page generated 2024-07-08 23:00 UTC)