[HN Gopher] Zig's New Writer
       ___________________________________________________________________
        
       Zig's New Writer
        
       Author : Bogdanp
       Score  : 100 points
       Date   : 2025-07-17 15:01 UTC (2 days ago)
        
 (HTM) web link (www.openmymind.net)
 (TXT) w3m dump (www.openmymind.net)
        
       | mishafb wrote:
       | I agree on the last point of the lack of composition here.
       | 
       | While it's true that writers need to be aware of buffering to
       | make use of fancy syscalls, implementing that should be an
       | option, but not a requirement.
       | 
       | Naively this would mean implementing one of two APIs in an
       | interface, which ruins the performance of direct calls. So I see
       | why the choice was made, but I still hope for something better.
       | 
       | It's probably not possible with Zig's current capabilities, but I
       | would ideally like to see a solution that:
       | 
       | - Allows implementations to know at comptime what the interface
       | actually implements and optimize for that (is buffering
       | supported? Can you access the buffer in place for zero-copy
       | writes?).
       | 
       | - For the generic version (which is in the vtable), choose one of
       | the methods and wrap it (at comptime).
       | 
       | There's so many directions to take Zig into (more types? more
       | metaprogramming? closer to metal?) so it's always interesting to
       | see new developments!
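
A rough Python sketch of the second bullet follows. It assumes a hypothetical protocol in which an implementation provides `write_vec` (vectored), `write_one` (single buffer), or both. `build_vtable` runs once when the writer is set up, as a runtime stand-in for Zig's comptime. It keeps whichever methods exist and synthesizes the missing one by wrapping the other. None of these names come from Zig's std.

```python
from typing import Dict, List


def build_vtable(impl: object) -> Dict[str, object]:
    """Build a generic vtable once (a runtime stand-in for comptime).

    Hypothetical protocol: `impl` may define write_vec(chunks) -> int and/or
    write_one(chunk) -> int. Both are assumed to write everything they are
    given and return the byte count.
    """
    has_vec = hasattr(impl, "write_vec")
    has_one = hasattr(impl, "write_one")
    if not (has_vec or has_one):
        raise TypeError("implementation must define write_vec or write_one")

    def vec_from_one(chunks: List[bytes]) -> int:
        # Wrap the single-buffer method: one call per chunk.
        return sum(impl.write_one(c) for c in chunks)

    def one_from_vec(chunk: bytes) -> int:
        # Wrap the vectored method with a one-element vector.
        return impl.write_vec([chunk])

    return {
        "write_vec": impl.write_vec if has_vec else vec_from_one,
        "write_one": impl.write_one if has_one else one_from_vec,
        # Exposed so callers can specialize on what is natively supported.
        "native_vectored": has_vec,
    }


class SingleBufferSink:
    """Example implementation that only knows how to write one buffer."""

    def __init__(self) -> None:
        self.out = bytearray()
        self.calls = 0

    def write_one(self, chunk: bytes) -> int:
        self.calls += 1
        self.out += chunk
        return len(chunk)
```

Callers that care about performance can branch on `native_vectored`. Generic callers simply use whichever entry they need.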
        
         | biggerben wrote:
         | I wonder if this change will improve the design of buffering
         | across I/O implementers, because buffering now needs to be
         | considered up front rather than treated as a feature bolted on
         | the side?
         |
         | It's a good sacrifice if the redesign, whilst more complicated,
         | avoids an oversimplified abstraction that ends up restricting
         | optimisation opportunities.
        
         | messe wrote:
         | > While it's true that writers need to be aware of buffering to
         | make use of fancy syscalls, implementing that should be an
         | option, but not a requirement.
         | 
         | Buffering is implemented and handled in the vtable struct
         | itself. The writers (implementations of the interface) don't
         | actually have to know or care about it, other than passing
         | through the user-provided buffer when initializing the vtable.
         | 
         | If you don't want buffering, you can pass a zero-length buffer
         | upon creation, and it'll get optimized out. This optimization
         | doesn't require devirtualization because the buffering happens
         | before any virtual function calls.
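
The zero-length-buffer point can be made concrete with a small Python model. This is not Zig's actual `std.Io.Writer`. The interface owns the buffer, and `write` copies into it without any indirect call when the data fits. Only the implementation's `drain` function is called through a pointer. With a zero-length buffer, every non-empty write falls through to `drain`. For simplicity the model assumes `drain` consumes everything it is given, whereas the real interface allows partial progress.

```python
from typing import Callable, List


class Writer:
    """Toy model of an interface that owns the buffer (not Zig's std.Io.Writer).

    Implementations supply only `drain`. Simplification: drain is assumed to
    consume everything it is given.
    """

    def __init__(self, buffer_size: int, drain: Callable[[List[bytes]], int]) -> None:
        self.buffer = bytearray(buffer_size)  # caller-chosen size; may be zero
        self.end = 0                          # number of bytes currently buffered
        self.drain = drain                    # the only indirect ("virtual") call

    def write(self, data: bytes) -> None:
        if self.end + len(data) <= len(self.buffer):
            # Fast path: plain copy into the buffer, no indirect call.
            self.buffer[self.end:self.end + len(data)] = data
            self.end += len(data)
            return
        # Slow path: pass buffered bytes (if any) plus the new data to drain.
        chunks = [bytes(self.buffer[:self.end])] if self.end else []
        chunks.append(data)
        self.drain(chunks)
        self.end = 0

    def flush(self) -> None:
        if self.end:
            self.drain([bytes(self.buffer[:self.end])])
            self.end = 0
```

With `Writer(0, drain)`, the fast-path check fails for every non-empty write, so each write becomes exactly one `drain` call. The buffering logic then costs one comparison but no extra indirection.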
        
       | amluto wrote:
       | From my personal experience, buffered and unbuffered writers are
       | different enough that I think it's a bit of a mistake to make
       | them indistinguishable to the type system. An unbuffered writer
       | sends the data out of process immediately. A buffered writer
       | usually doesn't, so sleeping after a write (or just doing
       | something else and not writing more for a while) will delay the
       | write indefinitely. An unbuffered write does not do this.
       | 
       | This means that plenty of algorithms are correct with unbuffered
       | writers and are incorrect with buffered writers. I've been bitten
       | by this and diagnosed bugs caused by this multiple times.
       | 
       | Meanwhile an unbuffered writer has abysmal performance if you
       | write a byte at a time.
       | 
       | I'd rather see an interface (trait, abstract class, whatever the
       | language calls it) for a generic writer, with appropriate
       | warnings that you probably don't want to use it unless you take
       | specific action to address its shortcomings, and subtypes for
       | buffered and unbuffered writers.
       | 
       | And there could be a conditional buffering wrapper that
       | temporarily adds buffering to a generic writer and is zero-cost
       | if applied to an already buffered writer. A language with
       | enforced borrowing semantics like Rust could make this very hard
       | to misuse. But even Python could do it decently well, e.g.:
       |
       |     w: MaybeBufferedByteWriter
       |     with io.LocalBuffer(w) as bufwriter:
       |         do stuff with bufwriter
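
Both the trade-off and the proposed wrapper can be sketched with Python's `io` module. That module already separates raw, unbuffered streams (`io.RawIOBase`) from buffered ones (`io.BufferedIOBase`). `CountingSink` is a hypothetical raw sink that counts `write` calls as a stand-in for syscalls. `local_buffer` is a hypothetical helper playing the role of the comment's `io.LocalBuffer`, which is not a real stdlib API.

Inside the `with` block, one-byte writes cause no sink calls at all. That is both the performance win and the "nothing has left the process yet" hazard. On exit, the buffer is flushed once and detached. An already-buffered stream is passed through untouched.

```python
import io
from contextlib import contextmanager
from typing import Iterator, Union


class CountingSink(io.RawIOBase):
    """Hypothetical raw sink; each write() stands in for one syscall."""

    def __init__(self) -> None:
        super().__init__()
        self.calls = 0
        self.data = bytearray()

    def writable(self) -> bool:
        return True

    def write(self, b) -> int:
        self.calls += 1
        self.data += bytes(b)
        return len(b)


@contextmanager
def local_buffer(
    w: Union[io.RawIOBase, io.BufferedIOBase], size: int = 8192
) -> Iterator[io.BufferedIOBase]:
    """Hypothetical conditional buffering wrapper (not a stdlib API)."""
    if isinstance(w, io.BufferedIOBase):
        # Already buffered: hand it back unchanged (the zero-cost case).
        yield w
        return
    buf = io.BufferedWriter(w, buffer_size=size)
    try:
        yield buf
    finally:
        buf.flush()   # push everything out before leaving the block
        buf.detach()  # release the raw stream without closing it
```

Because `detach()` returns the raw stream without closing it, the caller keeps ownership of `w` after the block ends.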
        
         | donatj wrote:
         | I absolutely agree, and would like to add I feel like the
         | ergonomics of the new interface are just very awkward and
         | almost leaky.
         | 
         | Buffered and unbuffered IO should just be entirely separate
         | things, with separate interfaces. Then, as you mention, the
         | standard library can provide an adapter in at least one
         | direction, maybe both.
         | 
         | This seems like a blunder to me.
        
         | josephg wrote:
         | > This means that plenty of algorithms are correct with
         | unbuffered writers and are incorrect with buffered writers.
         | I've been bitten by this and diagnosed bugs caused by this
         | multiple times.
         | 
         | But write() on POSIX is also a buffered API. Until your program
         | calls fsync / fdatasync, Linux isn't required to actually flush
         | anything to the underlying storage medium. And even then, many
         | consumer storage devices will lie and return from fsync
         | immediately before data has actually been flushed.
         | 
         | All the OSes that I know of will eagerly write data instead of
         | waiting for fsync, but there's no guarantee the data will be
         | persisted by the time your write() call returns. It usually
         | isn't. If you're relying on write() to durably flush data to
         | disk, you've probably got correctness / data corruption bugs
         | lurking in your code that will show up if power goes out at the
         | wrong time.
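
In Python terms, the pattern being described is `os.write` followed by `os.fsync`. The first call only hands bytes to the kernel; the second asks for them to reach stable storage, subject to the device honoring the flush. Below is a minimal sketch, with `durable_write` as an illustrative name.

```python
import os


def durable_write(path: str, data: bytes) -> None:
    """Write `data` to `path` and ask the OS to persist it before returning."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:  # os.write may perform a short write, so loop
            n = os.write(fd, view)
            view = view[n:]
        os.fsync(fd)  # only now is the data requested to reach the device
    finally:
        os.close(fd)
```

When the file is newly created, the Linux fsync(2) man page notes that an explicit `fsync` on the containing directory is also needed before the directory entry is guaranteed to be on disk.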
        
           | InfiniteRand wrote:
           | This has bitten me a few times when a Linux system crashed,
           | so there was no final call to fsync, implicit or otherwise.
        
           | o11c wrote:
           | I wouldn't call that "buffered", since `write` _is_
           | guaranteed to appear immediately from the view of other
           | processes that can see the same mount. It's only the disk
           | that needs to be informed to really pick up (could we say
           | "read"?) the changes.
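
This distinction can be observed from Python. Bytes written to a buffered file object sit in user space until `flush()` issues the underlying write(2). From then on, a separate reader sees them, with no `fsync` involved. The sketch opens a second file object in the same process as a stand-in for another process.

```python
import os
import tempfile


def read_back(path: str) -> bytes:
    # A second, independent open: stands in for another process reading the file.
    with open(path, "rb") as r:
        return r.read()


def visibility_demo() -> tuple:
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "log.txt")
        with open(path, "wb") as f:         # buffered by default
            f.write(b"hello")
            before_flush = read_back(path)  # still in the user-space buffer
            f.flush()                       # handed to the kernel; no fsync
            after_flush = read_back(path)
    return before_flush, after_flush
```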
        
           | amluto wrote:
           | I'm not talking about data loss if the host crashes. I'm
           | talking about a much broader sense of correctness.
           | 
           | Imagine an RPC server that uses persistent connections. If
           | you reply using a buffered writer and forget to flush, then
           | your tail latency blows up, possibly to infinity. It's very
           | easy to imagine situations involving multiple threads or
           | processes that simply don't work if buffers aren't flushed on
           | time.
           | 
           | Imagine a program that is intended to crash on error but
           | writes a log message first. If it buffers and doesn't flush,
           | then the log message will almost always get lost, whereas if
           | the writer is unbuffered or is flushed, then the message will
           | usually not get lost.
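
The crash-logging case is easy to reproduce in Python, whose `sys.stdout` is block-buffered when it is a pipe. The child process below prints a message and then exits via `os._exit`, which skips the normal flushing of stdio buffers. The message survives only if it was flushed first. `run_child` is an illustrative helper.

```python
import os
import subprocess
import sys

# Child program: print a "last words" log line, then die without cleanup.
CHILD = """
import os
print("fatal: shutting down", flush={flush})
os._exit(1)  # skips interpreter shutdown, so stdio buffers are not flushed
"""


def run_child(flush: bool) -> str:
    # Drop PYTHONUNBUFFERED so the child's stdout is block-buffered on a pipe.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONUNBUFFERED"}
    proc = subprocess.run(
        [sys.executable, "-c", CHILD.format(flush=flush)],
        stdout=subprocess.PIPE,
        env=env,
    )
    return proc.stdout.decode()
```

The same discipline applies to the RPC case: flush once a complete reply has been written, rather than relying on the buffer filling up.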
        
         | 0xcafefood wrote:
         | Great point. It's like the earlier days where remote procedure
         | calls were intended to happen "transparently" but the fact that
         | networking is involved in some procedure calls and not others
         | makes them very different in key ways that should not be
         | hidden.
        
       | mmastrac wrote:
       | It's an interesting choice, but every writer now needs to handle:
       | 
       | 1) vectored i/o (array of arrays, lots of fun for cache lines)
       | 
       | 2) buffering
       | 
       | 3) a splat optimization for compression? (skipped over in this
       | post, but mentioned in an earlier one)
       | 
       | I'm skeptical here, but I guess we will see if adding this
       | overhead on all I/O is a win. Devirtualization helps _sometimes_
       | but when you've got larger systems it's entirely possible you've
       | got sync and async I/O in the same optimization space and lose
       | out on optimization opportunities.
       | 
       | In practice, I/O stacks tend to consist of a lot of composition,
       | and in many cases, leak a lot of abstractions. Buffering is one
       | part, corking/backpressure is another (neither of which is
       | handled here, but I might be mistaken). In some cases, you've got
       | meaningful framing on streams that needs to be maintained (or
       | decorated with metadata).
       | 
       | If it works out, I suppose this will be a new I/O paradigm. In
       | fairness, nobody has _really_ solved I/O yet, so maybe a brave
       | new swing is what we need.
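
The splat parameter mentioned in point 3 can be modeled as a plain function. The semantics here are an assumption based on my reading of the Zig std docs: every slice in `data` is written once, except the last, which is written `splat` times (possibly zero). `expand_drain` is a hypothetical name, and it materializes the bytes only to show what a drain call means. A real writer would avoid building the repeated run in memory.

```python
from typing import List


def expand_drain(data: List[bytes], splat: int) -> bytes:
    """Model of a vectored drain call with a splat count (assumed semantics).

    All slices except the last are written once; the last slice is written
    `splat` times. Illustration only, not Zig's implementation.
    """
    if not data:
        raise ValueError("data must contain at least one slice")
    if splat < 0:
        raise ValueError("splat must be non-negative")
    *head, last = data
    return b"".join(head) + last * splat
```

This shape lets a compressor or a padding routine express a long run of one repeated pattern without allocating it. On Unix, Python exposes the underlying vectored syscall as `os.writev(fd, buffers)`.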
        
       | wasmperson wrote:
       | I'm not usually in the "defending C++" camp, but when I see this:
       |
       | pub const File = struct {
       |   pub fn writer(self: *File, buffer: []u8) Writer {
       |     return .{
       |       .file = self,
       |       .interface = std.Io.Writer{
       |         .buffer = buffer,
       |         .vtable = &.{ .drain = Writer.drain },
       |       },
       |     };
       |   }
       |
       |   pub const Writer = struct {
       |     file: *File,
       |     interface: std.Io.Writer,
       |     // this has a bunch of other fields
       |     fn drain(io_w: *std.Io.Writer, data: []const []const u8, splat: usize) std.Io.Writer.Error!usize {
       |       const self: *Writer = @fieldParentPtr("interface", io_w);
       |       // ....
       |     }
       |   };
       | };
       | 
       | ...I can't help but think of this:
       |
       | struct FileWriter: public Writer {
       |   File *file;
       |   // this has a bunch of other fields
       |
       |   FileWriter(File *self, span<char> buffer)
       |     : Writer(buffer), file(self) {}
       |
       |   size_t drain(span<span<char const> const> data, size_t splat) override {
       |     // ....
       |   }
       | };
       | 
       | Writing code to build a vtable and having it implicitly run at
       | compile time is pretty neat, though!
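
The two shapes being compared can be mimicked in Python, for illustration only. The first builds its table of function pointers by hand and recovers its owner from the interface value. Zig does that recovery with `@fieldParentPtr`, which Python cannot express, so a stored back-reference stands in for it. The second relies on the language's built-in virtual dispatch.

Both `drain` bodies assume the last slice is written `splat` times. Both also return the number of bytes written, which is a simplification of whatever the real interface reports.

```python
from abc import ABC, abstractmethod
from typing import Callable, List, NamedTuple


def splat_payload(data: List[bytes], splat: int) -> bytes:
    # Assumed drain semantics: the last slice is written `splat` times.
    return b"".join(data[:-1]) + data[-1] * splat


# Zig-like shape: the interface value carries a hand-built table of functions.
class VTable(NamedTuple):
    drain: Callable[["Interface", List[bytes], int], int]


class Interface:
    def __init__(self, vtable: VTable, owner: object) -> None:
        self.vtable = vtable
        # Zig computes the owner from the field's address with @fieldParentPtr;
        # Python cannot, so this back-reference stands in for it.
        self.owner = owner

    def drain(self, data: List[bytes], splat: int) -> int:
        return self.vtable.drain(self, data, splat)


class FileWriterZigStyle:
    def __init__(self) -> None:
        self.out = bytearray()
        self.interface = Interface(VTable(drain=FileWriterZigStyle._drain), self)

    @staticmethod
    def _drain(io_w: Interface, data: List[bytes], splat: int) -> int:
        fw = io_w.owner  # recover the containing writer from its interface field
        payload = splat_payload(data, splat)
        fw.out += payload
        return len(payload)


# C++-like shape: the language provides dispatch through an abstract base.
class Writer(ABC):
    @abstractmethod
    def drain(self, data: List[bytes], splat: int) -> int: ...


class FileWriterCppStyle(Writer):
    def __init__(self) -> None:
        self.out = bytearray()

    def drain(self, data: List[bytes], splat: int) -> int:
        payload = splat_payload(data, splat)
        self.out += payload
        return len(payload)
```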
        
         | varyous wrote:
         | The Zig std library often builds vtables for structs in an
         | effort to minimize runtime cost for the typical non-virtual
         | cases. I feel it leads to a lot of boilerplate just to
         | effectively have virtual functions. Worse, you have to study
         | the Zig code on a case-by-case basis to determine how to even
         | use this ad-hoc virtual function scheme. Surely Zig can
         | introduce virtual function support in a more ergonomic way than
         | this, as it's so widely used in real-life code and extensively
         | in Zig's own std library.
        
       | bvrmn wrote:
       | I consider built-in buffering as a huge win.
       | 
       | * For example, Python gives you buffered streams by default.
       | It's an amazing DX.
       |
       | * In Zig's case, you as a developer should be explicit about
       | buffer sizes.
       |
       | * You can opt out to unbuffered at any time.
       | 
       | * Allows for optimization without leaky "composable" io stack.
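
Python's built-in `open` illustrates the three choices listed above. Streams are buffered by default, `buffering=N` sets an explicit size, and `buffering=0` opts out, which is allowed in binary mode only. The sketch reports which stream class each variant produces.

```python
import os
import tempfile


def stream_types() -> dict:
    """Open the same file three ways and report which stream class Python returns."""
    variants = {
        "default": {},                       # buffered, size chosen by Python
        "sized": {"buffering": 64 * 1024},   # buffered, explicit 64 KiB buffer
        "unbuffered": {"buffering": 0},      # opt out: raw FileIO (binary mode only)
    }
    result = {}
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "out.bin")
        for label, kwargs in variants.items():
            with open(path, "wb", **kwargs) as f:
                result[label] = type(f).__name__
    return result
```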
        
       ___________________________________________________________________
       (page generated 2025-07-19 23:01 UTC)