[HN Gopher] Detecting Node Event Loop Blockers
___________________________________________________________________
Detecting Node Event Loop Blockers
Author : Ben-G
Score : 71 points
Date : 2022-03-17 17:20 UTC (5 hours ago)
(HTM) web link (www.ashbyhq.com)
| lmarcos wrote:
| When working on a greenfield project, does Node.js provide any
| advantages over, let's say, Go? In medium to big size teams, I
| find it tricky to keep the event loop free of blockers. In this
| regard, Go's goroutine model makes it easier to not block the
| whole app due to silly mistakes.
| jerf wrote:
| This is ultimately what killed cooperative scheduling in the 90s
| and the early 2000s. It works at first, then it keeps working,
| and indeed, it keeps working for a long time. If you've got a
| problem below that threshold, you'll probably be fine. But as you
| scale up, you _will_ eventually hit the problem that you've got
| things that sometimes run longer than
| you thought, sometimes a _lot_ longer than you thought, and it
| locks the whole system up. You can maybe fix the first few, as
| the article does here, but the problem just gets larger and
| larger and eventually you run out of optimization juice if your
| system has to keep getting bigger.
|
| MacOS prior to X was cooperatively scheduled across the whole OS,
| and it was definitely breaking down and why Apple needed such a
| radical change. Likely you aren't cooperatively scheduling on this
| scale, which is why it continues to work with many programs.
|
| In some sense, the problem is that every line of code you write
| is essentially an event loop blocker. Sooner or later, as you
| roll the dice, one of them is going to end up taking longer than
| you thought. It's the same basic forces that make it so that
| nobody, no matter how experienced, really _knows_ how their code
| is going to run until they put it under a profiler. There's just
| too much stuff going on nowadays for anyone to keep track of it
| all and know in advance.
|
| But it is just one of those things you need to look at when first
| sitting down to decide what to implement your new system in. If
| you run the math and you need the n'th degree of performance and
| memory control, don't pick Python, for instance. If the profile
| of tasks you need to run is going to be very irregular and
| consume lots of the CPU and doesn't have easy ways to break it
| apart (or it would be too much developer effort to keep manually
| breaking these tasks apart), don't pick a cooperatively-scheduled
| language.
|
| Fortunately, it's a lot easier than it used to be to try to take
| these particular tasks out and move them to another language and
| another process, while not having to rewrite the entire code base
| to get those bits out.
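|
| As a rough illustration of "manually breaking these tasks apart",
| here is a minimal sketch that sums a large array in slices and
| yields to the event loop between slices. The helper name
| sumInChunks and its chunkSize parameter are made up for this
| example and do not come from the article.

```javascript
// Hypothetical helper (not from the article): sums `values` in slices,
// yielding to the event loop between slices so timers and I/O
// callbacks get a chance to run.
function sumInChunks(values, chunkSize = 10000) {
  return new Promise((resolve) => {
    let total = 0;
    let index = 0;

    function processChunk() {
      const end = Math.min(index + chunkSize, values.length);
      for (; index < end; index++) {
        total += values[index];
      }
      if (index < values.length) {
        // setImmediate schedules the next slice after pending I/O callbacks.
        setImmediate(processChunk);
      } else {
        resolve(total);
      }
    }

    processChunk();
  });
}
```

| Usage would look like sumInChunks(bigArray).then(console.log).
| The cost is exactly the developer effort described above: every
| long-running loop has to be restructured like this by hand.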
| paxys wrote:
| This is why Node.js is well suited for tiny microservices where
| a single person (or small group of people) can have context on
| every line of code running in the thread and can identify and
| fix such blockers. When the application grows and evolves
| beyond that, sometimes to the complexity of an operating
| system, it is going to be impossible to keep it performant for
| these reasons.
| 3pt14159 wrote:
| > n'th degree of performance and memory control, don't pick
| Python, for instance.
|
| Or be comfortable rolling the critical parts in another
| language like Cython, which I've done and got a 10000x
| performance speedup because the hot part of the code fit in the
| CPU cache and the rest was plain old Python. There are times to
| start with something like Go when you're absolutely sure you'll
| need it, but I rarely regret starting in Python. It's usually
| easy enough to shift a single endpoint to another language or
| to use extensions to call into something faster.
| dgellow wrote:
| You will also rarely regret starting with Go. It's not really
| a language you would use to have fine memory control and the
| best performance, so I'm not sure why you mention it as a
| response to the text you quoted. Go is more of a good
| balance between low memory usage and good performance, while
| being a very productive tool.
| egberts1 wrote:
| The only unproductive part is the endless juggling of Go
| modules' versioning.
| samhw wrote:
| This is true. I'm a Rust fanatic myself, but I grudgingly
| admit that I wouldn't start a company with Rust. I just
| wouldn't trust future engineers enough. Whereas with Go,
| I'd never write it in my spare time, and I hate the
| paternalism and limitedness, but those same qualities make
| it an excellent choice for a company language.
|
| If there's one thing I'd say about Go, it's that it's a
| language you can roll out across hundreds of junior
| engineers writing relatively sophisticated code and trust
| that you'll get a very respectable balance of (a) runtime
| performance, (b) developer productivity, and (c) safety
| (memory, thread[0], type, etc).
|
| [0] OK, not in the strict way that Rust offers. But the
| simplicity and verbosity make it easy to spot errors, the
| standard lib offers excellent primitives for concurrent
| programming, and `go test -race` sweeps up most of the
| rest.
| eyelidlessness wrote:
| > MacOS prior to X was cooperatively scheduled across the whole
| OS, and it was definitely breaking down and why Apple needed
| such a radical change.
|
| Preemptive multitasking was definitely one of the headline
| features of Mac OS X, but I think the other major one--
| protected memory--was the more important problem to solve with
| classic MacOS. Performance on classic MacOS was _better_ for
| many common workloads than OS X, for several versions of the
| latter. And yes, playing a video while minimizing the window
| was a cool demo, and that capability is table stakes for an OS
| now. But the crashing bomb app demo was far more compelling for
| most of us Mac greybeards.
|
| That said, cooperative concurrency in JS is a very different
| thing than in an OS. It's not a panacea of course, but the
| typical APIs available are pervasively concurrent or trivial to
| make concurrent. And where that's not enough (CPU bound
| workloads), all mainstream JS environments now support real
| threads. Granted that doesn't mean JS is an ideal solution for
| CPU bound workloads... but if JS is already a prerequisite,
| it's worth considering scaling out with threads before more
| expensive process spawning and IPC.
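|
| A minimal sketch of "scaling out with threads", using Node's
| built-in worker_threads module. The function name fibInWorker and
| the naive Fibonacci workload are placeholders chosen for this
| example; the worker body is passed as a string only to keep the
| sketch in one file.

```javascript
const { Worker } = require('worker_threads');

// Worker body as a string so the sketch fits in one file; real code
// would normally put this in its own module.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  // Deliberately naive recursive Fibonacci stands in for CPU-bound work.
  const fib = (n) => (n < 2 ? n : fib(n - 1) + fib(n - 2));
  parentPort.postMessage(fib(workerData));
`;

// Hypothetical helper: runs the CPU-bound work on a separate thread so
// the main event loop stays free to serve other callbacks.
function fibInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: n });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}
```

| Calling fibInWorker(30).then(console.log) keeps the main thread
| responsive while the worker computes. Starting a worker has its
| own cost, so a pool of long-lived workers is usually a better fit
| than one worker per request.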
| mfbx9da4 wrote:
| This is excellent and an answer to my deleted question on
| stackoverflow!
| https://stackoverflow.com/questions/69886637/measure-time-ta...
| hexsprite wrote:
| how come it was deleted?
| naugtur wrote:
| Hi, blocked-at author here.
|
| Get in touch over email if you want to explore further.
|
| Some quick comments:
|
| - I didn't notice the node version or you didn't state it. The
| impact from async hooks is _vastly_ different between node
| versions
|
| - I need to update the package a bit, I've got a known perf
| improvement to add AFAIR.
|
| - your implementation is likely to produce false positives. The
| trick to prevent (some) false positives is the most valuable part
| of the lib.
|
| Also, take a look at the event loop utilization metric introduced
| last year
| https://m.youtube.com/watch?v=WetXnEPraYM&list=PL0CdgOSSGlBa...
|
| And for more of my diagnostics experiments see the debugging-aid
| package https://www.npmjs.com/package/debugging-aid
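|
| For reference, the event loop utilization metric is exposed as
| performance.eventLoopUtilization() in Node's perf_hooks module. A
| minimal sketch of sampling it over a window follows; the helper
| names blockFor and sampleUtilization are made up for this example
| and are not part of blocked-at or the article.

```javascript
const { performance } = require('perf_hooks');

// Hypothetical helper: busy-waits to simulate synchronous work.
function blockFor(ms) {
  const end = performance.now() + ms;
  while (performance.now() < end) {
    // spin
  }
}

// Hypothetical helper: runs `work`, waits `windowMs`, then resolves with
// the fraction of that window the loop spent active (0 to 1).
function sampleUtilization(windowMs, work = () => {}) {
  return new Promise((resolve) => {
    // Sample inside a callback: before the event loop has started,
    // eventLoopUtilization() reports zeros.
    setImmediate(() => {
      const start = performance.eventLoopUtilization();
      work();
      setTimeout(() => {
        // Passing the earlier sample returns the delta since that sample.
        resolve(performance.eventLoopUtilization(start).utilization);
      }, windowMs);
    });
  });
}
```

| A value close to 1 means the loop was almost never idle during
| the window. Unlike an async-hooks based detector, this says that
| the loop is busy but not which callback is responsible.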
| abhikp wrote:
| Hey! We (I work at Ashby) are on the latest Node 16. We'll update
| the post to reflect that. Happy to chat over email (edit: found
| your email and will reach out)
| naugtur wrote:
| We could break blocked-at into two layers so that you could
| provide the data collection mechanism. You could report to
| datadog while using the false positive filter.
|
| Also, there's a bunch of tools you could use before you
| deploy blocked-at or async hooks at all.
|
| Note it's midnight here and I'll stop responding very soon :D
| des429 wrote:
| fyi to anyone seriously considering using these: neither is going
| to work as expected if something blocks the loop indefinitely. In
| other words, you won't know how long something blocked the loop
| until that thing has finished blocking. Timeouts for async code
| or limits on loop statements are still relevant.
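|
| A minimal sketch of both safeguards, with made-up helper names
| (withTimeout, runBounded). Note the assumption baked in: a
| promise timeout only fires while the event loop is running, so it
| guards against async work that never settles, while synchronous
| loops need an explicit iteration cap.

```javascript
// Hypothetical helper: rejects if `promise` has not settled within `ms`.
// It cannot interrupt synchronous code that is blocking the loop.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  // Clear the timer either way so it does not keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical helper: calls `step(i)` until it returns true, but gives
// up after `maxIterations` so a bad input cannot spin forever.
function runBounded(step, maxIterations) {
  for (let i = 0; i < maxIterations; i++) {
    if (step(i)) return i;
  }
  throw new Error(`no result after ${maxIterations} iterations`);
}
```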
| naugtur wrote:
| If something is blocking the event loop indefinitely, it's
| unlikely to reach production and even if it does, a simple
| crash report or perf inspection will reveal it. You don't need
| more tools than node-report for that case. A permanent event loop
| block is the simple case.
|
| Regular event loop blocking by synchronous processing is the
| middle ground.
|
| Performance issues with utilization of resources, too many
| promises or broken backpressure - that's where the fun begins.
|
| If you can run your software locally and simulate load, just
| use node clinic. It's the best looking one ;)
___________________________________________________________________
(page generated 2022-03-17 23:00 UTC)