[HN Gopher] Node.js, Pipes, and Disappearing Bytes
___________________________________________________________________
Node.js, Pipes, and Disappearing Bytes
Author : mooreds
Score : 49 points
Date : 2024-10-13 14:54 UTC (7 hours ago)
(HTM) web link (sxlijin.github.io)
(TXT) w3m dump (sxlijin.github.io)
| moralestapia wrote:
| ???
|
| Wait, so what's the solution?
| simonbw wrote:
| My understanding is that it's "don't call system.exit() until
| you have finished writing everything to system.stdout".
| gcommer wrote:
| Even better: never use process.exit(). Set process.exitCode
| instead.
|
| https://nodejs.org/api/process.html#processexitcode
| o11c wrote:
| Or just throw an error.
|
| See also `process.on('exit', ...)`; you can conditionally
| set process.exitCode in it (only if there was no other
| failure) if you want.
| 3np wrote:
| Use Stream.finished. $ node -e "const {
| Stream } = require('stream'); s = process.stdout;
| s.write('@'.repeat(128 * 1024)); Stream.finished(s, () => {
| process.exit(0); })" | wc -c 131072
|
| https://nodejs.org/api/stream.html#streamfinishedstream-opti...
| ksr wrote:
| I use fs.writeFileSync: $ node -e
| "fs.writeFileSync(1, Buffer.from('@'.repeat(128 * 1024)));
| process.exit(0);" | wc -c 131072
| 3np wrote:
| It doesn't work for all streams but for the stdout/stderr
| case where they can be treated like file descriptors, I
| prefer this simpler approach over fiddling with the Streams
| API.
|
| (If blocking the process on writing synchronously to stdout
| is indeed what you actually want...)
| jitl wrote:
| Here's how I solved this problem in Notion's internal command
| line tools: function
| flushWritableStream(stream: NodeJS.WritableStream) {
| return new Promise(resolve => stream.write("", resolve)).catch(
| handleFlushError, ) } /**
| * In NodeJS, process.stdout and process.stderr behave
| inconsistently depending on the type * of file they are
| connected to. * * When connected to unix pipes,
| these streams are *async*, so we need to wait for them to be
| flushed * before we exit, otherwise we get truncated
| results when using a Unix pipe. * * @see
| https://nodejs.org/api/process.html#process_a_note_on_process_i_o
| */ export async function flushStdoutAndStderr() {
| await Promise.all([
| flushWritableStream(process.stdout),
| flushWritableStream(process.stderr), ]) }
| /** * If `module` is the NodeJS entrypoint: *
| * Wait for `main` to finish, then exit 0. * Note that
| this does not wait for the event loop to drain; * it is
| suited to commands that run to completion. * *
| For processes that must outlive `main`, see `startIfMain`.
| */ if (require.main === module) { await
| main(argv) await flushStdoutAndStderr()
| setTimeout(() => process.exit(0)) }
| 3np wrote:
| Hm, that does not solve the issue in TFA and I'm not sure it
| consistently works like you intend: $ node -e
| "process.stdout.write('@'.repeat(128 * 1024));
| process.stdout.write(''); process.exit(0);" | wc -c
| 65536
|
| You want to use `Stream.finished`: $ node -e
| "const { Stream } = require('stream'); s = process.stdout;
| s.write('@'.repeat(128 * 1024)); Stream.finished(s, () => {
| process.exit(0); })" | wc -c 131072
|
| https://nodejs.org/api/stream.html#streamfinishedstream-opti...
|
| If this helps you, consider bothering your infra team to look
| at opening access back up for tor IPs (;
| ecedeno wrote:
| > node -e "process.stdout.write('@'.repeat(128 * 1024));
| process.stdout.write(''); process.exit(0);" | wc -c
|
| That's missing the part where it waits for the last write to
| flush before exiting.
| 3np wrote:
| You might think that would be it? $ node
| -e "process.stdout.write('@'.repeat(128 * 1024));
| process.stdout.write(''); setTimeout(()=>{ process.exit(0);
| }, 0);" | wc -c 131072
|
| Sure. But not so fast! You're still just racing and have no
| guarantees. Increase the pressure and it snaps back:
| $ node -e "process.stdout.write('@'.repeat(1280 * 1024));
| process.stdout.write(''); setTimeout(()=>{ process.exit(0);
| }, 0);" | wc -c 65536
|
| Meanwhile: $ node -e "const { Stream } =
| require('stream'); s = process.stdout;
| s.write('@'.repeat(1280 * 1024)); Stream.finished(s, () =>
| { process.exit(0); })" | wc -c 1310720
| jitl wrote:
| Your one liner doesn't do the same thing as the code I
| posted. You missed passing a callback to stream.write and
| waiting for the callback before exiting the process.
|
| Once we await that callback to resolve, then the buffered
| data from before that point is known to be flushed. I don't
| want to wait for a "finished" event because I want my program
| to terminate once my main() function and a flush after main()
| is complete. If I wait for finish, a background "thread" can
| extend the process lifetime by continuing to log.
|
| This is also why we don't set process.exitCode and wait for
| the event loop to drain and Node to auto-exit. That might be
| the "right way", but if someone leaves a lingering
| setTimeout(..., 10_MINUTES) you'll have a bad time.
|
| It's also more clear to reason about. Our rule: if you want
| the process to wait for your async work to complete, then
| `await` it.
| 3np wrote:
| > Once we await that callback to resolve, then the buffered
| data from before that point is known to be flushed.
|
| This is not true IME. Just more likely.
|
| See my comment here:
| https://news.ycombinator.com/item?id=41829905
|
| > If I wait for finish, a background "thread" can extend
| the process lifetime by continuing to log.
|
| Can you not address this by calling .end() on the stream?
|
| https://nodejs.org/api/stream.html#writableendchunk-
| encoding...
|
| > Our rule: if you want the process to wait for your async
| work to complete, then `await` it.
|
| In general I very much agree. In this specific case,
| though, the awaits are just buying you a bit more time to
| race the buffer which is why they appear to help. The
| "flush" will not wait for completion before exiting the
| process.
|
| Maybe also worth keeping in mind that even if this would be
| working, stuff like this has historically changed
| breakingly a few times between major Node releases. If
| you're relying on unspecified behavior you're obviously in
| darker waters.
| throwitaway1123 wrote:
| > See my comment here:
| https://news.ycombinator.com/item?id=41829905
|
| Your comment isn't equivalent to the original code.
|
| Your one liner is doing this: `process.stdout.write('')`
|
| Jitl's example is doing this: `new Promise(resolve =>
| stream.write("", resolve))`
|
| He's passing in a promise resolver as the callback to
| stream.write (this is basically a `util.promisify`
| version of the writable.write chunk callback). If the
| data is written in order (which it should be), then I
| don't see how the `stream.write` promise could resolve
| before prior data is flushed. The documentation says:
| "The writable.write() method writes some data to the
| stream, and calls the supplied callback once the data has
| been fully handled". [1]
|
| [1]
| https://nodejs.org/api/stream.html#writablewritechunk-
| encodi...
| jitl wrote:
| It's so easy to miss a tiny but important detail when
| working with or discussing Node streams and come out the
| other side bamboozled by bugs. This whole thread
| exemplifies why I tell people to stay as far away from
| streams as possible. Though in the case of stdout/stderr
| we can't avoid it.
| maple3142 wrote:
| I think this is what @jitl means: node -e
| "process.stdout.write('@'.repeat(128 * 1024));
| process.stdout.write('',()=>process.exit(0)); " | wc -c
|
| It writes an empty string and use its callback to detect if
| it has been flushed, which means previous writes are also
| flushed.
| hipadev23 wrote:
| I'm confused. If process.stdout.write() returns false when the
| pipe is full, do you not need to loop and call it again or
| something analogous? Or does it continue operating on the write
| in the background and that's why waiting for the .finished()
| event works?
|
| Is there a reason it doesn't use standard nodejs promise
| semantics (await process.stdout.write)? So probably best solution
| is util.promisify()?
| benatkin wrote:
| This is clickbait. The process exiting without unflushed output
| doesn't mean disappearing bytes. The bytes were there but the
| program left without them.
| richbell wrote:
| Yes, that is the conclusion of the investigation. It does not
| make this clickbait.
| benatkin wrote:
| I disagree, which is why I left my comment. The bytes didn't
| disappear in a way that justifies the headline on HN IMHO
| because there was a very clear explanation of where they
| went.
| fovc wrote:
| POSIX is weird, but NodeJS streams are designed to be misused
| molsson wrote:
| process.stdout._handle.setBlocking(true)
|
| ...is a bit brutal but works. Draining the stream before exiting
| also kind of works but there are cases where drain will just
| permanently block.
|
| async function drain(stream) { return new Promise((resolve) =>
| stream.on('drain', resolve)) }
| yladiz wrote:
| Which cases will it permanently block?
| jitl wrote:
| The writable stream will only emit 'drain' if the buffer
| fills past the limit. In that case, a prior call to
| `writable.write(...)` would return `false` indicating you
| should wait for drain before calling write again. Even if
| your code can't access the return value for the last
| writable.write call, you can check `if
| (writable.writableNeedDrain) { ...` to decide to wait for
| drain.
|
| This program will run forever, since we never write to
| stdout, stdout never "drains":
| setInterval(() => console.error("stderr: ", new
| Date().toISOString()), 5_000)
| process.stdout.once("drain", () => {
| console.error("stderr: stdout.once.drain, exit")
| process.exit(0) })
| jitl wrote:
| When you want to await a single instance of a Node
| EventEmitter, please use `stream.once('drain', (err) => ...)`
| so you don't leak your listener callback after the promise
| resolves.
| arctek wrote:
| fsync doesn't work here right because unix pipes are in memory?
| I've had luck elsewhere with nodejs and WriteableStreams that
| refuse to flush their buffers before a process.exit() using fsync
| on the underlying file descriptors.
___________________________________________________________________
(page generated 2024-10-13 22:00 UTC)