[HN Gopher] Detecting Node Event Loop Blockers
       ___________________________________________________________________
        
       Detecting Node Event Loop Blockers
        
       Author : Ben-G
       Score  : 71 points
       Date   : 2022-03-17 17:20 UTC (5 hours ago)
        
 (HTM) web link (www.ashbyhq.com)
 (TXT) w3m dump (www.ashbyhq.com)
        
       | lmarcos wrote:
       | When working on a greenfield project, does Node.js provide any
       | advantages over, let's say, Go? In medium to large teams, I find
       | it tricky to keep the event loop free of blockers. In this
       | regard, Go's goroutine model makes it easier not to block the
       | whole app due to silly mistakes.
        
       | jerf wrote:
       | This is ultimately what killed cooperative scheduling in the 90s
       | and the early 2000s. It works at first, then it keeps working,
       | and indeed, it keeps working for a long time. If you've got a
       | problem below that threshold, you'll probably be fine. But as
       | you scale up, you _will_ eventually hit the problem that
       | you've got things that sometimes run longer than
       | you thought, sometimes a _lot_ longer than you thought, and it
       | locks the whole system up. You can maybe fix the first few, as
       | the article does here, but the problem just gets larger and
       | larger and eventually you run out of optimization juice if your
       | system has to keep getting bigger.
       | 
       | MacOS prior to X was cooperatively scheduled across the whole OS,
       | and it was definitely breaking down and why Apple needed such a
       | radical change. Likely you aren't cooperatively scheduling at this
       | scale, which is why it continues to work for many programs.
       | 
       | In some sense, the problem is that every line of code you write
       | is essentially an event loop blocker. Sooner or later, as you
       | roll the dice, one of them is going to end up taking longer than
       | you thought. It's the same basic forces that make it so that
       | nobody, no matter how experienced, really _knows_ how their code
       | is going to run until they put it under a profiler. There's just
       | too much stuff going on nowadays for anyone to keep track of it
       | all and know in advance.
       | 
       | But it is just one of those things you need to look at when first
       | sitting down to decide what to implement your new system in. If
       | you run the math and you need the n'th degree of performance and
       | memory control, don't pick Python, for instance. If the tasks
       | you need to run are going to be very irregular, consume lots of
       | CPU, and have no easy way to be broken apart (or it would be
       | too much developer effort to keep manually breaking them
       | apart), don't pick a cooperatively-scheduled language.
       | 
       | Fortunately, it's a lot easier than it used to be to try to take
       | these particular tasks out and move them to another language and
       | another process, while not having to rewrite the entire code base
       | to get those bits out.
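
A minimal sketch of what "manually breaking these tasks apart" can look like in Node.js, assuming the work is a list of independent items. The helper name `processInChunks` and the default chunk size are hypothetical and not taken from the article; `setImmediate` is the built-in Node.js timer function.

```javascript
// Hypothetical helper: process `items` in slices and yield to the event
// loop between slices, so one long job does not monopolize the thread.
function processInChunks(items, handleItem, chunkSize = 1000) {
  return new Promise((resolve, reject) => {
    let index = 0;

    function runChunk() {
      try {
        const end = Math.min(index + chunkSize, items.length);
        for (; index < end; index++) {
          handleItem(items[index], index);
        }
      } catch (err) {
        reject(err);
        return;
      }
      if (index < items.length) {
        // Schedule the next slice; other queued callbacks get a turn first.
        setImmediate(runChunk);
      } else {
        resolve();
      }
    }

    runChunk();
  });
}
```

Each slice is still a blocker for as long as it runs, so the chunk size has to be tuned against the slowest `handleItem` call, which is the maintenance burden the comment above describes.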
        
         | paxys wrote:
         | This is why Node.js is well suited for tiny microservices where
         | a single person (or small group of people) can have context on
         | every line of code running in the thread and can identify and
         | fix such blockers. When the application grows and evolves
         | beyond that, sometimes to the complexity of an operating
         | system, it is going to be impossible to keep it performant for
         | these reasons.
        
         | 3pt14159 wrote:
         | > n'th degree of performance and memory control, don't pick
         | Python, for instance.
         | 
         | Or be comfortable rolling the critical parts in another
         | language like Cython, which I've done and got a 10000x
         | performance speedup because the hot part of the code fit in the
         | CPU cache and the rest was plain old Python. There are times to
         | start with something like Go when you're absolutely sure you'll
         | need it, but I rarely regret starting in Python. It's usually
         | easy enough to shift a single endpoint to another language or
         | to use extensions to call into something faster.
        
           | dgellow wrote:
           | You will also rarely regret starting with Go. It's not really
           | a language you would use for fine memory control and the
           | best performance, so I'm not sure why you mention it as a
           | response to the text you quoted. Go is more of a good
           | balance between low memory usage and good performance, while
           | being a very productive tool.
        
             | egberts1 wrote:
             | The only unproductive part is the endless juggling of Go
             | modules' versioning.
        
             | samhw wrote:
             | This is true. I'm a Rust fanatic myself, but I grudgingly
             | admit that I wouldn't start a company with Rust. I just
             | wouldn't trust future engineers enough. Whereas with Go,
             | I'd never write it in my spare time, and I hate the
             | paternalism and limitedness, but those same qualities make
             | it an excellent company language.
             | 
             | If there's one thing I'd say about Go, it's that it's a
             | language you can roll out across hundreds of junior
             | engineers writing relatively sophisticated code and trust
             | that you'll get a very respectable balance of (a) runtime
             | performance, (b) developer productivity, and (c) safety
             | (memory, thread[0], type, etc).
             | 
             | [0] OK, not in the strict way that Rust offers. But the
             | simplicity and verbosity make it easy to spot errors, the
             | standard lib offers excellent primitives for concurrent
             | programming, and `go test -race` sweeps up most of the
             | rest.
        
         | eyelidlessness wrote:
         | > MacOS prior to X was cooperatively scheduled across the whole
         | OS, and it was definitely breaking down and why Apple needed
         | such a radical change.
         | 
         | Preemptive multitasking was definitely one of the headline
         | features of Mac OS X, but I think the other major one,
         | protected memory, was the more important problem to solve with
         | classic MacOS. Performance on classic MacOS was _better_ for
         | many common workloads than on OS X, for several versions of the
         | latter. And yes, playing a video while minimizing the window
         | was a cool demo, and that capability is table stakes for an OS
         | now. But the crashing bomb app demo was far more compelling for
         | most of us Mac greybeards.
         | 
         | That said, cooperative concurrency in JS is a very different
         | thing than in an OS. It's not a panacea of course, but the
         | typical APIs available are pervasively concurrent or trivial to
         | make concurrent. And where that's not enough (CPU bound
         | workloads), all mainstream JS environments now support real
         | threads. Granted that doesn't mean JS is an ideal solution for
         | CPU bound workloads... but if JS is already a prerequisite,
         | it's worth considering scaling out with threads before more
         | expensive process spawning and IPC.
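
A minimal sketch of scaling out with threads in Node.js, using the built-in `worker_threads` module. The function name `sumInWorker` and the summing workload are hypothetical stand-ins for a real CPU-bound task; the worker source is passed inline via the `eval: true` option only to keep the example in one file.

```javascript
const { Worker } = require('node:worker_threads');

// Hypothetical CPU-bound task: sum the integers 1..n on a worker thread,
// so the main thread's event loop stays free while the loop runs.
function sumInWorker(n) {
  return new Promise((resolve, reject) => {
    const workerSource = `
      const { parentPort, workerData } = require('node:worker_threads');
      let total = 0;
      for (let i = 1; i <= workerData.n; i++) total += i;
      parentPort.postMessage(total);
    `;
    const worker = new Worker(workerSource, { eval: true, workerData: { n } });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}
```

Usage would be along the lines of `sumInWorker(1e8).then((total) => console.log(total))`. Starting a worker has its own cost, so a real service would more likely keep a pool of workers than spawn one per request.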
        
       | mfbx9da4 wrote:
       | This is excellent and an answer to my deleted question on
       | stackoverflow!
       | https://stackoverflow.com/questions/69886637/measure-time-ta...
        
         | hexsprite wrote:
         | how come it was deleted?
        
       | naugtur wrote:
       | Hi, blocked-at author here.
       | 
       | Get in touch over email if you want to explore further.
       | 
       | Some quick comments:
       | 
       | - Either I missed the Node version or you didn't state it. The
       | impact from async hooks is _vastly_ different between Node
       | versions.
       | 
       | - I need to update the package a bit; I've got a known perf
       | improvement to add, AFAIR.
       | 
       | - Your implementation is likely to produce false positives. The
       | trick to prevent (some) false positives is the most valuable part
       | of the lib.
       | 
       | Also, take a look at the event loop utilization metric introduced
       | last year
       | https://m.youtube.com/watch?v=WetXnEPraYM&list=PL0CdgOSSGlBa...
       | 
       | And for more of my diagnostics experiments see debugging-aid
       | package https://www.npmjs.com/package/debugging-aid
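
A minimal sketch of reading the event loop utilization metric mentioned above, assuming it refers to `performance.eventLoopUtilization()` from Node's built-in `perf_hooks` module. The wrapper name `measureUtilization` is hypothetical.

```javascript
const { performance } = require('node:perf_hooks');

// Hypothetical helper: report event loop utilization (ELU) for the period
// during which the async function `fn` runs.
async function measureUtilization(fn) {
  const before = performance.eventLoopUtilization();
  await fn();
  // Passing an earlier reading returns the delta since that reading:
  // { idle, active, utilization }, where utilization is between 0 and 1.
  return performance.eventLoopUtilization(before);
}
```

A utilization close to 1 means the loop spent almost the whole interval running callbacks rather than waiting for events. It says that the loop is saturated, not which code saturated it, so it complements stack-capturing tools rather than replacing them.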
        
         | abhikp wrote:
         | Hey! We (I work at Ashby) are on the latest Node 16. We'll update
         | the post to reflect that. Happy to chat over email (edit: found
         | your email and will reach out)
        
           | naugtur wrote:
           | We could break blocked-at into two layers so that you could
           | provide the data collection mechanism. You could report to
           | Datadog while using the false positive filter.
           | 
           | Also, there's a bunch of tools you could use before you
           | deploy blocked-at or async hooks at all.
           | 
           | Note it's midnight here and I'll stop responding very soon :D
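
The comment does not name the tools it has in mind. As one possible example of a built-in option that needs no async hooks, here is a minimal sketch using `monitorEventLoopDelay` from Node's `perf_hooks` module; the helper name `sampleLoopDelay` and its defaults are hypothetical.

```javascript
const { monitorEventLoopDelay } = require('node:perf_hooks');

// Hypothetical helper: sample event loop delay for `durationMs`, then
// resolve with summary statistics in milliseconds.
function sampleLoopDelay(durationMs, resolutionMs = 10) {
  return new Promise((resolve) => {
    const histogram = monitorEventLoopDelay({ resolution: resolutionMs });
    histogram.enable();
    setTimeout(() => {
      histogram.disable();
      // The histogram reports nanoseconds; convert to milliseconds.
      resolve({
        meanMs: histogram.mean / 1e6,
        maxMs: histogram.max / 1e6,
        p99Ms: histogram.percentile(99) / 1e6,
      });
    }, durationMs);
  });
}
```

This reports how late the loop was, not which call site made it late, so it is better suited to deciding whether a heavier tool is worth deploying than to finding the blocker itself.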
        
       | des429 wrote:
       | FYI to anyone seriously considering using these: neither is going
       | to work as expected if something blocks the loop indefinitely. In
       | other words, you won't know how long something blocked the loop
       | until that thing has finished blocking. Timeouts for async code
       | or limits on loop statements are still relevant.
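
A minimal sketch of the two safeguards named above, a timeout for async code and a limit on loop iterations. Both helper names, `withTimeout` and `boundedLoop`, are hypothetical.

```javascript
// Hypothetical helper: reject if `promise` has not settled within `ms`.
// The timer can only fire while the event loop is free, so this guards
// against slow async work, not against a synchronous blocker.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical guard for loops whose iteration count depends on input.
// `step` signals completion by returning false; the function returns the
// index at which that happened, or throws once the limit is reached.
function boundedLoop(step, maxIterations) {
  for (let i = 0; i < maxIterations; i++) {
    if (step(i) === false) return i;
  }
  throw new Error(`loop exceeded ${maxIterations} iterations`);
}
```

The iteration limit is the one that covers the indefinite-block case, because it is checked inside the blocking code itself instead of relying on a timer that the blocked loop can never run.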
        
         | naugtur wrote:
         | If something is blocking the event loop indefinitely, it's
         | unlikely to reach production and even if it does, a simple
         | crash report or perf inspection will reveal it. You don't need
         | more tools than node-report for that case. A permanent event
         | loop block is the simple case.
         | 
         | Regular event loop blocking by synchronous processing is the
         | middle ground.
         | 
         | Performance issues with utilization of resources, too many
         | promises or broken backpressure - that's where the fun begins.
         | 
         | If you can run your software locally and simulate load, just
         | use node clinic. It's the best looking one ;)
        
       ___________________________________________________________________
       (page generated 2022-03-17 23:00 UTC)