[HN Gopher] Node.js, Pipes, and Disappearing Bytes
       ___________________________________________________________________
        
       Node.js, Pipes, and Disappearing Bytes
        
       Author : mooreds
       Score  : 49 points
       Date   : 2024-10-13 14:54 UTC (7 hours ago)
        
 (HTM) web link (sxlijin.github.io)
 (TXT) w3m dump (sxlijin.github.io)
        
       | moralestapia wrote:
       | ???
       | 
       | Wait, so what's the solution?
        
         | simonbw wrote:
          | My understanding is that it's "don't call process.exit() until
          | you have finished writing everything to process.stdout".
        
           | gcommer wrote:
           | Even better: never use process.exit(). Set process.exitCode
           | instead.
           | 
           | https://nodejs.org/api/process.html#processexitcode
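            | 
            | A minimal sketch of the difference (exit code 2 is an
            | arbitrary example):
            | 
            |     process.stdout.write('@'.repeat(128 * 1024));
            |     // process.exit(2) here could truncate the output above;
            |     // setting exitCode instead lets Node exit with code 2
            |     // on its own, after pending writes have flushed.
            |     process.exitCode = 2;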
        
             | o11c wrote:
             | Or just throw an error.
             | 
             | See also `process.on('exit', ...)`; you can conditionally
             | set process.exitCode in it (only if there was no other
             | failure) if you want.
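              | 
              | A minimal sketch of that pattern (hadFailure is a
              | hypothetical flag the program would maintain):
              | 
              |     process.on('exit', (code) => {
              |       // Only override the code if nothing else failed.
              |       if (code === 0 && hadFailure) {
              |         process.exitCode = 1;
              |       }
              |     });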
        
         | 3np wrote:
          | Use Stream.finished:
          | 
          |     $ node -e "const { Stream } = require('stream'); s = process.stdout; s.write('@'.repeat(128 * 1024)); Stream.finished(s, () => { process.exit(0); })" | wc -c
          |     131072
          | 
          | https://nodejs.org/api/stream.html#streamfinishedstream-opti...
        
         | ksr wrote:
          | I use fs.writeFileSync:
          | 
          |     $ node -e "fs.writeFileSync(1, Buffer.from('@'.repeat(128 * 1024))); process.exit(0);" | wc -c
          |     131072
        
           | 3np wrote:
            | It doesn't work for all streams, but for the stdout/stderr
            | case, where they can be treated like file descriptors, I
            | prefer this simpler approach over fiddling with the Streams
            | API.
           | 
           | (If blocking the process on writing synchronously to stdout
           | is indeed what you actually want...)
        
       | jitl wrote:
        | Here's how I solved this problem in Notion's internal command
        | line tools:
        | 
        |     function flushWritableStream(stream: NodeJS.WritableStream) {
        |       return new Promise(resolve => stream.write("", resolve)).catch(
        |         handleFlushError,
        |       )
        |     }
        | 
        |     /**
        |      * In NodeJS, process.stdout and process.stderr behave
        |      * inconsistently depending on the type of file they are
        |      * connected to.
        |      *
        |      * When connected to unix pipes, these streams are *async*, so
        |      * we need to wait for them to be flushed before we exit,
        |      * otherwise we get truncated results when using a Unix pipe.
        |      *
        |      * @see https://nodejs.org/api/process.html#process_a_note_on_process_i_o
        |      */
        |     export async function flushStdoutAndStderr() {
        |       await Promise.all([
        |         flushWritableStream(process.stdout),
        |         flushWritableStream(process.stderr),
        |       ])
        |     }
        | 
        |     /**
        |      * If `module` is the NodeJS entrypoint:
        |      * Wait for `main` to finish, then exit 0.
        |      * Note that this does not wait for the event loop to drain;
        |      * it is suited to commands that run to completion.
        |      *
        |      * For processes that must outlive `main`, see `startIfMain`.
        |      */
        |     if (require.main === module) {
        |       await main(argv)
        |       await flushStdoutAndStderr()
        |       setTimeout(() => process.exit(0))
        |     }
        
         | 3np wrote:
          | Hm, that does not solve the issue in TFA and I'm not sure it
          | consistently works like you intend:
          | 
          |     $ node -e "process.stdout.write('@'.repeat(128 * 1024)); process.stdout.write(''); process.exit(0);" | wc -c
          |     65536
          | 
          | You want to use `Stream.finished`:
          | 
          |     $ node -e "const { Stream } = require('stream'); s = process.stdout; s.write('@'.repeat(128 * 1024)); Stream.finished(s, () => { process.exit(0); })" | wc -c
          |     131072
          | 
          | https://nodejs.org/api/stream.html#streamfinishedstream-opti...
          | 
          | If this helps you, consider bothering your infra team to look
          | at opening access back up for Tor IPs (;
        
           | ecedeno wrote:
            | > node -e "process.stdout.write('@'.repeat(128 * 1024)); process.stdout.write(''); process.exit(0);" | wc -c
           | 
           | That's missing the part where it waits for the last write to
           | flush before exiting.
        
             | 3np wrote:
              | You might think that would be it?
              | 
              |     $ node -e "process.stdout.write('@'.repeat(128 * 1024)); process.stdout.write(''); setTimeout(()=>{ process.exit(0); }, 0);" | wc -c
              |     131072
              | 
              | Sure. But not so fast! You're still just racing and have no
              | guarantees. Increase the pressure and it snaps back:
              | 
              |     $ node -e "process.stdout.write('@'.repeat(1280 * 1024)); process.stdout.write(''); setTimeout(()=>{ process.exit(0); }, 0);" | wc -c
              |     65536
              | 
              | Meanwhile:
              | 
              |     $ node -e "const { Stream } = require('stream'); s = process.stdout; s.write('@'.repeat(1280 * 1024)); Stream.finished(s, () => { process.exit(0); })" | wc -c
              |     1310720
        
           | jitl wrote:
            | Your one-liner doesn't do the same thing as the code I
            | posted. You missed passing a callback to stream.write and
            | waiting for the callback before exiting the process.
           | 
            | Once that callback has resolved, the buffered data from
            | before that point is known to be flushed. I don't want to
            | wait for a "finished" event, because I want my program to
            | terminate once my main() function and a flush after main()
            | are complete. If I wait for finish, a background "thread"
            | can extend the process lifetime by continuing to log.
           | 
           | This is also why we don't set process.exitCode and wait for
           | the event loop to drain and Node to auto-exit. That might be
           | the "right way", but if someone leaves a lingering
           | setTimeout(..., 10_MINUTES) you'll have a bad time.
           | 
            | It's also clearer to reason about. Our rule: if you want the
            | process to wait for your async work to complete, then
            | `await` it.
        
             | 3np wrote:
             | > Once we await that callback to resolve, then the buffered
             | data from before that point is known to be flushed.
             | 
             | This is not true IME. Just more likely.
             | 
             | See my comment here:
             | https://news.ycombinator.com/item?id=41829905
             | 
             | > If I wait for finish, a background "thread" can extend
             | the process lifetime by continuing to log.
             | 
             | Can you not address this by calling .end() on the stream?
             | 
              | https://nodejs.org/api/stream.html#writableendchunk-encoding...
             | 
             | > Our rule: if you want the process to wait for your async
             | work to complete, then `await` it.
             | 
              | In general I very much agree. In this specific case,
              | though, the awaits are just buying you a bit more time to
              | race the buffer, which is why they appear to help. The
              | "flush" will not wait for completion before exiting the
              | process.
             | 
              | Maybe also worth keeping in mind that even if this were
              | working, stuff like this has historically changed in
              | breaking ways a few times between major Node releases. If
              | you're relying on unspecified behavior, you're obviously
              | in darker waters.
        
               | throwitaway1123 wrote:
               | > See my comment here:
               | https://news.ycombinator.com/item?id=41829905
               | 
               | Your comment isn't equivalent to the original code.
               | 
                | Your one-liner is doing this: `process.stdout.write('')`
               | 
               | Jitl's example is doing this: `new Promise(resolve =>
               | stream.write("", resolve))`
               | 
               | He's passing in a promise resolver as the callback to
               | stream.write (this is basically a `util.promisify`
               | version of the writable.write chunk callback). If the
               | data is written in order (which it should be), then I
               | don't see how the `stream.write` promise could resolve
               | before prior data is flushed. The documentation says:
               | "The writable.write() method writes some data to the
               | stream, and calls the supplied callback once the data has
               | been fully handled". [1]
               | 
               | [1]
                | https://nodejs.org/api/stream.html#writablewritechunk-encodi...
        
               | jitl wrote:
               | It's so easy to miss a tiny but important detail when
               | working with or discussing Node streams and come out the
               | other side bamboozled by bugs. This whole thread
               | exemplifies why I tell people to stay as far away from
               | streams as possible. Though in the case of stdout/stderr
               | we can't avoid it.
        
           | maple3142 wrote:
            | I think this is what @jitl means:
            | 
            |     node -e "process.stdout.write('@'.repeat(128 * 1024)); process.stdout.write('', () => process.exit(0));" | wc -c
            | 
            | It writes an empty string and uses its callback to detect
            | when it has been flushed, which means previous writes have
            | also been flushed.
        
       | hipadev23 wrote:
       | I'm confused. If process.stdout.write() returns false when the
       | pipe is full, do you not need to loop and call it again or
       | something analogous? Or does it continue operating on the write
       | in the background and that's why waiting for the .finished()
       | event works?
       | 
        | Is there a reason it doesn't use standard Node.js promise
        | semantics (await process.stdout.write)? So presumably the best
        | solution is util.promisify()?
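        | 
        | Something like this, maybe (a sketch, not necessarily the right
        | fix per the rest of this thread):
        | 
        |     const { promisify } = require('node:util');
        | 
        |     // Resolves once this chunk has been handled; Node keeps
        |     // draining its internal buffer in the background even
        |     // after write() would have returned false.
        |     const write = promisify(process.stdout.write.bind(process.stdout));
        | 
        |     async function main() {
        |       await write('@'.repeat(128 * 1024));
        |     }
        | 
        |     main().then(() => process.exit(0));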
        
       | benatkin wrote:
        | This is clickbait. The process exiting with unflushed output
        | doesn't mean disappearing bytes. The bytes were there, but the
        | program left without them.
        
         | richbell wrote:
         | Yes, that is the conclusion of the investigation. It does not
         | make this clickbait.
        
           | benatkin wrote:
            | I disagree, which is why I left my comment. The bytes didn't
            | disappear in a way that justifies the headline, IMHO,
            | because there was a very clear explanation of where they
            | went.
        
       | fovc wrote:
       | POSIX is weird, but NodeJS streams are designed to be misused
        
       | molsson wrote:
        | process.stdout._handle.setBlocking(true)
        | 
        | ...is a bit brutal but works. Draining the stream before exiting
        | also kind of works, but there are cases where drain will just
        | permanently block:
        | 
        |     async function drain(stream) {
        |       return new Promise((resolve) => stream.on('drain', resolve))
        |     }
        
         | yladiz wrote:
         | Which cases will it permanently block?
        
           | jitl wrote:
           | The writable stream will only emit 'drain' if the buffer
           | fills past the limit. In that case, a prior call to
           | `writable.write(...)` would return `false` indicating you
           | should wait for drain before calling write again. Even if
           | your code can't access the return value for the last
           | writable.write call, you can check `if
           | (writable.writableNeedDrain) { ...` to decide to wait for
           | drain.
           | 
            | This program will run forever; since we never write to
            | stdout, stdout never "drains":
            | 
            |     setInterval(() => console.error("stderr: ", new Date().toISOString()), 5_000)
            | 
            |     process.stdout.once("drain", () => {
            |       console.error("stderr: stdout.once.drain, exit")
            |       process.exit(0)
            |     })
        
         | jitl wrote:
          | When you want to await a single event from a Node
          | EventEmitter, please use `stream.once('drain', () => ...)` so
          | you don't leak your listener callback after the promise
          | resolves.
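          | 
          | A sketch of the drain helper with both fixes applied (once
          | plus the writableNeedDrain check from my other comment):
          | 
          |     function drain(stream) {
          |       // Resolve immediately if there's nothing buffered, so
          |       // this can never block forever waiting for a 'drain'
          |       // that will never come.
          |       if (!stream.writableNeedDrain) return Promise.resolve();
          |       return new Promise((resolve) => stream.once('drain', resolve));
          |     }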
        
       | arctek wrote:
        | fsync doesn't work here because unix pipes are in memory,
        | right? I've had luck elsewhere with Node.js WritableStreams
        | that refuse to flush their buffers before a process.exit(),
        | using fsync on the underlying file descriptors.
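        | 
        | A minimal sketch of that approach for a file-backed descriptor
        | (out.log is a made-up path; fsync has nothing to sync for a
        | pipe, which has no backing storage):
        | 
        |     const fs = require('node:fs');
        | 
        |     const fd = fs.openSync('out.log', 'w');
        |     fs.writeSync(fd, '@'.repeat(128 * 1024));
        |     fs.fsyncSync(fd); // force kernel file buffers to disk
        |     fs.closeSync(fd);
        |     process.exit(0);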
        
       ___________________________________________________________________
       (page generated 2024-10-13 22:00 UTC)