[HN Gopher] How we tamed Node.js event loop lag: a deepdive
___________________________________________________________________
How we tamed Node.js event loop lag: a deepdive
Author : mifydev
Score : 24 points
Date : 2024-07-08 20:52 UTC (2 hours ago)
(HTM) web link (trigger.dev)
(TXT) w3m dump (trigger.dev)
| dexwiz wrote:
| Says event loop in the title, but the real culprit is a
| non-paginated endpoint with a nested loop. Pagination and
| guardrails are basics for customer-facing features. Any time you
| design a service for X items, some users will try it with 10x to
| 1000x as many. Be ready for that.
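| A minimal sketch of such a guardrail, assuming a simple cursor/limit
| API. `paginate`, `DEFAULT_PAGE_SIZE`, and `MAX_PAGE_SIZE` are
| hypothetical names and values, not anything from the article. It
| slices an in-memory array for brevity. In a real service the limit
| would be pushed down into the database query instead.

```typescript
// Hypothetical guardrail values; tune them for your own service.
const DEFAULT_PAGE_SIZE = 50;
const MAX_PAGE_SIZE = 500;

interface Page<T> {
  items: T[];
  nextCursor: number | null; // index of the next item, or null when done
}

// Clamp the caller's requested page size so a single request can never
// make the server process an unbounded number of items.
function clampPageSize(requested?: number): number {
  if (requested === undefined || !Number.isFinite(requested) || requested < 1) {
    return DEFAULT_PAGE_SIZE;
  }
  return Math.min(Math.floor(requested), MAX_PAGE_SIZE);
}

// Return one bounded slice of `all`, starting at `cursor`.
function paginate<T>(all: readonly T[], cursor = 0, requested?: number): Page<T> {
  const size = clampPageSize(requested);
  const start = Math.max(0, Math.floor(cursor));
  const items = all.slice(start, start + size);
  const next = start + items.length;
  return { items, nextCursor: next < all.length ? next : null };
}
```

| With 1,200 rows, a request for 10,000 items comes back with 500
| items and `nextCursor` 500, so the client has to page through the
| rest.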
| corytheboyd wrote:
| And every single time a product person will challenge the need
| for pagination, or any other limits that make your systems
| actually scalable. Sigh.
| ricardobeat wrote:
| This is not about "taming lag" as suggested by the title, which
| implies some form of failure on Node's part.
|
| They accidentally wrote synchronous O(n^2) code that hogged the
| CPU, blocking the event loop, then fixed it. But that doesn't
| sound as adventurous...
|
| Otherwise a solid example of using observability tools to debug a
| live issue.
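| To illustrate that class of bug (the article doesn't show its fix,
| so `Run`, `Task`, and both functions here are hypothetical): a join
| done with a nested scan versus one done with a `Map` index. Both
| produce the same result, but only the first does O(n*m) work
| synchronously on the event loop thread.

```typescript
// Hypothetical record shapes, for illustration only.
interface Task { id: string; name: string }
interface Run { id: string; taskId: string }

// O(n * m): for every run, scan the whole task list. With thousands of
// each, this is millions of comparisons executed synchronously, and the
// event loop cannot service any other request until it finishes.
function attachTasksQuadratic(runs: Run[], tasks: Task[]) {
  return runs.map((run) => ({
    ...run,
    task: tasks.find((t) => t.id === run.taskId),
  }));
}

// O(n + m): build an index once, then do constant-time lookups.
function attachTasksLinear(runs: Run[], tasks: Task[]) {
  const byId = new Map<string, Task>();
  for (const t of tasks) byId.set(t.id, t);
  return runs.map((run) => ({ ...run, task: byId.get(run.taskId) }));
}
```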
| williamdclt wrote:
| While I don't think the article is very advanced, it's really
| not about the root cause. The O(n^2) code isn't the subject
| (they don't even show the fix, as it's not really interesting).
|
| It's about how to systematically detect and debug the problem.
| In Node that's not a trivial thing to do. That has value.
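| One built-in way to detect lag systematically is
| `monitorEventLoopDelay` from `node:perf_hooks`, which records event
| loop delay into a histogram (values in nanoseconds). This sketch
| reports per-window p50/p99/max in milliseconds. `startLagMonitor`,
| `LagSnapshot`, and the 10 ms resolution are assumptions for
| illustration, not necessarily what the article used.

```typescript
import { monitorEventLoopDelay } from "node:perf_hooks";

// Hypothetical snapshot shape, in milliseconds.
interface LagSnapshot { p50Ms: number; p99Ms: number; maxMs: number }

// Sample event loop delay every 10 ms and emit one snapshot per window.
// Returns a function that stops monitoring.
function startLagMonitor(intervalMs: number, onSnapshot: (s: LagSnapshot) => void): () => void {
  const h = monitorEventLoopDelay({ resolution: 10 });
  h.enable();
  const timer = setInterval(() => {
    // Histogram values are nanoseconds; convert to milliseconds.
    onSnapshot({
      p50Ms: h.percentile(50) / 1e6,
      p99Ms: h.percentile(99) / 1e6,
      maxMs: h.max / 1e6,
    });
    h.reset(); // each snapshot covers only the last window
  }, intervalMs);
  timer.unref(); // don't keep the process alive just for metrics
  return () => {
    clearInterval(timer);
    h.disable();
  };
}
```

| Alerting on p99 or max rather than the mean is usually more useful,
| since a single long synchronous block barely moves the mean.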
| moralestapia wrote:
| >In Node that's not a trivial thing to do.
|
| Depends on your coding practices. I've found it way easier than
| on other platforms I've used (C++, Python), even without explicit
| interrupts and such.
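| Since Node has no preemption, one common practice for CPU-heavy work
| that can't be avoided is to split it into chunks and yield between
| them. A minimal sketch, assuming `setImmediate` as the yield point.
| `mapInChunks` and the default chunk size of 1000 are hypothetical.
| Moving the work to a `worker_threads` Worker is the other common
| option.

```typescript
// Yield control back to the event loop so pending I/O and timers can run.
function yieldToEventLoop(): Promise<void> {
  return new Promise((resolve) => setImmediate(() => resolve()));
}

// Apply `fn` to every item, yielding after each chunk of `chunkSize` items.
// Total CPU work is unchanged; it is just split into short slices so no
// single slice holds the event loop for long.
async function mapInChunks<T, U>(
  items: readonly T[],
  fn: (item: T) => U,
  chunkSize = 1000,
): Promise<U[]> {
  const out: U[] = [];
  for (let i = 0; i < items.length; i += chunkSize) {
    const end = Math.min(i + chunkSize, items.length);
    for (let j = i; j < end; j++) out.push(fn(items[j]));
    await yieldToEventLoop();
  }
  return out;
}
```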
| williamdclt wrote:
| I'm a bit confused by the monitoring described. Event loop lag is
| insidious because it doesn't affect only the slow part of your
| app, it affects everything: one small part of a request takes
| seconds, making every concurrent request take seconds. Generally,
| I've found that when the event loop is lagging, you can't really
| trust much of your application monitoring (OTel spans look very
| long, but they're actually just waiting on the event loop).
| How, then, did you find the root causes of these lag issues?
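| A small demonstration of that effect (the helper names are
| hypothetical): a 10 ms timer scheduled just before a 100 ms
| synchronous block cannot fire until the block ends, so anything
| timed across that wait, including an OTel span, absorbs the whole
| block.

```typescript
import { performance } from "node:perf_hooks";

// Busy-wait for `ms` milliseconds, standing in for a hypothetical slow
// synchronous handler (e.g. an accidental O(n^2) loop).
function blockFor(ms: number): void {
  const end = performance.now() + ms;
  while (performance.now() < end) {
    // spin: nothing else on this thread can run meanwhile
  }
}

// Measure how late a timer fires relative to when it was due.
function measureTimerLateness(timerMs: number, blockMs: number): Promise<number> {
  return new Promise((resolve) => {
    const scheduledAt = performance.now();
    setTimeout(() => {
      resolve(performance.now() - scheduledAt - timerMs);
    }, timerMs);
    blockFor(blockMs); // an unrelated "request" hogs the thread
  });
}

measureTimerLateness(10, 100).then((lateMs) => {
  console.log(`timer fired ~${lateMs.toFixed(0)} ms late`);
});
```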
|
| As an aside, it's a bit weird to create a span just to mark that
| something happened; OTel span events are made for that.
| moonlion_eth wrote:
| How we turned our amateur code into clickbait
| cpursley wrote:
| tl;dr: should have used Erlang/Elixir as well as our customers.
| mplewis wrote:
| How does Erlang fix this?
| sgarland wrote:
| They put emoji into logs (console, but still). Node devs are
| deeply unserious people, and you cannot convince me otherwise.
___________________________________________________________________
(page generated 2024-07-08 23:00 UTC)