[HN Gopher] Zig's New Writer
___________________________________________________________________
Zig's New Writer
Author : Bogdanp
Score : 100 points
Date : 2025-07-17 15:01 UTC (2 days ago)
(HTM) web link (www.openmymind.net)
| mishafb wrote:
| I agree on the last point of the lack of composition here.
|
| While it's true that writers need to be aware of buffering to
| make use of fancy syscalls, implementing that should be an
| option, but not a requirement.
|
| Naively this would mean implementing one of two APIs in an
| interface, which ruins the direct performance. So I see why the
| choice was made, but I still hope for something better.
|
| It's probably not possible with zig's current capabilities, but I
| would ideally like to see a solution that:
|
| - Allows implementations to know at comptime what the interface
| actually implements and optimize for that (is buffering
| supported? Can you get access to the buffer in place for zero
| copy?).
|
| - For the generic version (which is in the vtable), choose one of
| the methods and wrap it (at comptime).
|
| There's so many directions to take Zig into (more types? more
| metaprogramming? closer to metal?) so it's always interesting to
| see new developments!
| biggerben wrote:
| I wonder if making this change will improve design of buffering
| across IO implementers because buffering needs consideration
| upfront, rather than treatment as some feature bolted on the
| side?
|
| It's a good sacrifice if the redesign, whilst being more
| complicated, avoids an oversimplified abstraction that ends up
| restricting optimisation opportunities.
| messe wrote:
| > While it's true that writers need to be aware of buffering to
| make use of fancy syscalls, implementing that should be an
| option, but not a requirement.
|
| Buffering is implemented and handled in the vtable struct
| itself; the writers (implementations of the interface) themselves
| don't actually have to know or care about it other than passing
| through the user-provided buffer when initializing the vtable.
|
| If you don't want buffering, you can pass a zero-length buffer
| upon creation, and it'll get optimized out. This optimization
| doesn't require devirtualization because the buffering happens
| before any virtual function calls.
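|
| A rough Python sketch of that layering, not Zig's actual
| std.Io.Writer: the class, its fields, and the drain callback are
| invented for illustration. The interface object owns the buffer
| and only calls the implementation when the buffer cannot absorb a
| write, so a zero-length buffer reduces to a direct call.

```python
class Writer:
    """Hypothetical interface object: owns the buffer, delegates to `drain`."""

    def __init__(self, drain, buffer_size):
        self._drain = drain        # the only "virtual" call
        self._buf = bytearray()
        self._cap = buffer_size    # 0 means effectively unbuffered

    def write(self, data: bytes) -> None:
        if len(self._buf) + len(data) <= self._cap:
            self._buf += data      # fast path: no call into the implementation
            return
        self.flush()
        if len(data) <= self._cap:
            self._buf += data      # fits now that the buffer is empty
        else:
            self._drain(data)      # too large to buffer: pass straight through

    def flush(self) -> None:
        if self._buf:
            self._drain(bytes(self._buf))
            self._buf.clear()
```

| Here an implementation only supplies drain; whether writes are
| batched is decided entirely by the buffer size the caller picks.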
| amluto wrote:
| From my personal experience, buffered and unbuffered writers are
| different enough that I think it's a bit of a mistake to make
| them indistinguishable to the type system. An unbuffered writer
| sends the data out of process immediately. A buffered writer
| usually doesn't, so sleeping after a write (or just doing
| something else and not writing more for a while) will delay the
| write indefinitely. An unbuffered write does not do this.
|
| This means that plenty of algorithms are correct with unbuffered
| writers and are incorrect with buffered writers. I've been bitten
| by this and diagnosed bugs caused by this multiple times.
|
| Meanwhile an unbuffered writer has abysmal performance if you
| write a byte at a time.
|
| I'd rather see an interface (trait, abstract class, whatever the
| language calls it) for a generic writer, with appropriate
| warnings that you probably don't want to use it unless you take
| specific action to address its shortcomings, and subtypes for
| buffered and unbuffered writers.
|
| And there could be a conditional buffering wrapper that
| temporarily adds buffering to a generic writer and is zero-cost
| if applied to an already buffered writer. A language with
| enforced borrowing semantics like Rust could make this very hard
| to misuse. But even Python could do it decently well, e.g.:
|     w: MaybeBufferedByteWriter
|
|     with io.LocalBuffer(w) as bufwriter:
|         do stuff with bufwriter
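|
| One way such a wrapper might look with today's standard library.
| This is only a sketch: local_buffer and RecordingRaw are made-up
| names, and io.LocalBuffer does not exist in Python. It wraps an
| unbuffered writer in io.BufferedWriter for the duration of the
| block and hands an already buffered writer back untouched.

```python
import io
from contextlib import contextmanager


class RecordingRaw(io.RawIOBase):
    """Hypothetical unbuffered sink that records each write it receives."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def writable(self):
        return True

    def write(self, b):
        self.chunks.append(bytes(b))
        return len(b)


@contextmanager
def local_buffer(w, size=8192):
    """Yield a buffered view of `w`; flush and unwrap it when the block ends."""
    if isinstance(w, io.BufferedWriter):
        yield w                    # already buffered: no extra layer added
        return
    buffered = io.BufferedWriter(w, buffer_size=size)
    try:
        yield buffered
    finally:
        buffered.flush()           # push everything down to `w`
        buffered.detach()          # release `w` without closing it
```

| Inside the block, small writes are batched; when it exits, the
| data has reached the underlying writer, which stays open.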
| donatj wrote:
| I absolutely agree, and would like to add I feel like the
| ergonomics of the new interface are just very awkward and
| almost leaky.
|
| Buffered and unbuffered IO should just be entirely separate
| things, and separate interfaces. Then as you mention the
| standard library can provide an adapter in at least one
| direction, maybe both.
|
| This seems like a blunder to me.
| josephg wrote:
| > This means that plenty of algorithms are correct with
| unbuffered writers and are incorrect with buffered writers.
| I've been bitten by this and diagnosed bugs caused by this
| multiple times.
|
| But write() on POSIX is also a buffered API. Until your program
| calls fsync / fdatasync, linux isn't required to actually flush
| anything to the underlying storage medium. And even then, many
| consumer storage devices will lie and return from fsync
| immediately before data has actually been flushed.
|
| All the OSes that I know of will eagerly write data instead of
| waiting for fsync, but there's no guarantee the data will be
| persisted by the time your write() call returns. It usually
| isn't. If you're relying on write() to durably flush data to
| disk, you've probably got correctness / data corruption bugs
| lurking in your code that will show up if power goes out at the
| wrong time.
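|
| A minimal Python sketch of the difference, assuming a POSIX-like
| system; durable_write is a made-up helper name. os.write only
| hands the bytes to the kernel, and os.fsync is the separate step
| that asks for them to be pushed to the device (which, as noted
| above, some hardware may still acknowledge early).

```python
import os


def durable_write(path: str, data: bytes) -> None:
    """Write `data` to `path`, then ask the OS to flush it to storage."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            n = os.write(fd, view)   # may write fewer bytes than requested
            view = view[n:]
        os.fsync(fd)                 # without this, data may sit in the page cache
    finally:
        os.close(fd)
```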
| InfiniteRand wrote:
| This has bit me a few times when a Linux system crashes so
| there's no final call to fsync implicit or otherwise
| o11c wrote:
| I wouldn't call that "buffered", since `write` _is_
| guaranteed to appear immediately from the view of other
| processes that can see the same mount. It's only the disk
| that needs to be informed to really pick up (could we say
| "read"?) the changes.
| amluto wrote:
| I'm not talking about data loss if the host crashes. I'm
| talking about a much broader sense of correctness.
|
| Imagine an RPC server that uses persistent connections. If
| you reply using a buffered writer and forget to flush, then
| your tail latency blows up, possibly to infinity. It's very
| easy to imagine situations involving multiple threads or
| processes that simply don't work if buffers aren't flushed on
| time.
|
| Imagine a program that is intended to crash on error but
| writes a log message first. If it buffers and doesn't flush,
| then the log message will almost always get lost, whereas if
| the writer is unbuffered or is flushed, then the message will
| usually not get lost.
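|
| The log-then-crash case can be reproduced in a few lines of
| Python. This is a sketch with made-up names (CHILD, run_child): a
| child process writes a message to a piped stdout, which Python
| block-buffers by default, then exits abruptly via os._exit,
| which skips the usual flush at interpreter shutdown.

```python
import os
import subprocess
import sys

# Child program: log a message, optionally flush, then die without cleanup.
CHILD = r"""
import os, sys
sys.stdout.write("fatal: something broke\n")
if sys.argv[1] == "flush":
    sys.stdout.flush()
os._exit(1)
"""


def run_child(mode: str) -> str:
    """Run the child with its stdout piped and return whatever arrived."""
    # Drop PYTHONUNBUFFERED so the child keeps its default buffering.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONUNBUFFERED"}
    proc = subprocess.run(
        [sys.executable, "-c", CHILD, mode],
        capture_output=True,
        text=True,
        env=env,
    )
    return proc.stdout
```

| Under those assumptions, run_child("noflush") comes back empty
| while run_child("flush") returns the message.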
| 0xcafefood wrote:
| Great point. It's like the earlier days where remote procedure
| calls were intended to happen "transparently" but the fact that
| networking is involved in some procedure calls and not others
| makes them very different in key ways that should not be
| hidden.
| mmastrac wrote:
| It's an interesting choice, but every writer now needs to handle:
|
| 1) vectored i/o (array of arrays, lots of fun for cache lines)
|
| 2) buffering
|
| 3) a splat optimization for compression? (skipped over in this
| post, but mentioned in an earlier one)
|
| I'm skeptical here, but I guess we will see if adding this
| overhead on all I/O is a win. Devirtualization helps _sometimes_
| but when you've got larger systems it's entirely possible you've
| got sync and async I/O in the same optimization space and lose
| out on optimization opportunities.
|
| In practice, I/O stacks tend to consist of a lot of composition,
| and in many cases, leak a lot of abstractions. Buffering is one
| part, corking/backpressure is another (neither of which is
| handled here, but I might be mistaken). In some cases, you've got
| meaningful framing on streams that needs to be maintained (or
| decorated with metadata).
|
| If it works out, I suppose this will be a new I/O paradigm. In
| fairness, nobody has _really_ solved I/O yet, so maybe a brave
| new swing is what we need.
| wasmperson wrote:
| I'm not usually in the "defending C++" camp, but when I see this:
|
|     pub const File = struct {
|
|       pub fn writer(self: *File, buffer: []u8) Writer{
|         return .{
|           .file = self,
|           .interface = std.Io.Writer{
|             .buffer = buffer,
|             .vtable = .{.drain = Writer.drain},
|           }
|         };
|       }
|
|       pub const Writer = struct {
|         file: *File,
|         interface: std.Io.Writer,
|         // this has a bunch of other fields
|
|         fn drain(io_w: *Writer, data: []const []const u8, splat: usize) !usize {
|           const self: *Writer = @fieldParentPtr("interface", io_w);
|           // ....
|         }
|       }
|     }
|
| ...I can't help but think of this:
|
|     struct FileWriter: public Writer {
|       File *file;
|       // this has a bunch of other fields
|
|       FileWriter(File *self, span<char> buffer)
|         : Writer(buffer), file(self) {}
|
|       size_t drain(span<span<char const> const> data, size_t splat) override {
|         // ....
|       }
|     };
|
| Writing code to build a vtable and having it implicitly run at
| compile time is pretty neat, though!
| varyous wrote:
| The zig std library often builds vtables for structs in an
| effort to minimize runtime cost for the typical non-virtual
| cases. I feel it leads to creating a lot of boilerplate to
| effectively have virtual functions. Worse, you have to study
| the zig code in a case by case basis to determine how to even
| use this ad-hoc virtual function scheme. Surely zig can
| introduce virtual function support in a more ergonomic way than
| this, as it's so widely used in real life code and extensively
| in zig's own std library.
| bvrmn wrote:
| I consider built-in buffering as a huge win.
|
| * For example python gives you buffered by default streams. It's
| an amazing DX.
|
| * In case of Zig you as a developer should be explicit about
| buffer sizes.
|
| * You could opt-out to unbuffered any time.
|
| * Allows for optimization without leaky "composable" io stack.