[HN Gopher] Allocgate: Restructuring how allocators work in Zig
       ___________________________________________________________________
        
       Allocgate: Restructuring how allocators work in Zig
        
       Author : todsacerdoti
       Score  : 60 points
       Date   : 2021-12-15 20:21 UTC (2 hours ago)
        
 (HTM) web link (pithlessly.github.io)
 (TXT) w3m dump (pithlessly.github.io)
        
       | khiner wrote:
       | This was a fantastic, thorough explanation, thank you for putting
       | in the effort!
        
       | celeritascelery wrote:
       | I am not fully understanding how fat-pointers allow LLVM to
       | devirtualize the function calls. If the allocators are
       | polymorphic then a particular piece of code doesn't know which
       | vtable it will get at run time correct?
        
         | Spex_guy wrote:
         | It depends a lot, but in practice in Zig devirtualization is
         | effectively constant propagation. The compiler needs to see the
         | place where the vtable is created, and follow that to the place
         | where virtual functions are called, ensuring along the way that
         | nothing modifies the vtable. This is not possible for all uses
         | of interfaces, but it is possible for many of them, especially
         | ones where the interface is sort of "temporary" and you are
         | usually passing around the implementation. These are the cases
         | targeted by this change.
         | 
         | The difference in results has to do with pointer provenance
         | tracking and aliasing. With both approaches, the first call to
         | an interface function will almost definitely be devirtualized.
         | The problem is that that first call will also modify
         | implementation state. If the implementation function is not
         | inlined (which is common), this is tracked as a modification to
         | the memory region containing the implementation state. But with
         | the fieldParentPtr model, that's the same memory region
         | containing the vtable! So this breaks constant propagation on
         | the vtable and any later calls must always be fully virtual,
         | even if the optimizer can see the whole way from vtable
         | creation to virtual call.
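
To make the two layouts concrete, here is a rough C++ sketch (not Zig, and not the actual std.mem.Allocator code). All names here (EmbeddedAllocator, BumpEmbedded, FatAllocator, BumpFat, make_fat) are made up for illustration. It only shows where the function pointer lives relative to the mutable state; whether a given optimizer actually devirtualizes either form depends on the compiler and the surrounding code.

```cpp
#include <cstddef>

// Model 1: the interface is a field embedded in the implementation, and the
// implementation is recovered from the field's address (the fieldParentPtr idea).
struct EmbeddedAllocator {
    void* (*alloc_fn)(EmbeddedAllocator* self, std::size_t n);
};

struct BumpEmbedded {
    EmbeddedAllocator iface;  // function pointer shares an object with mutable state
    unsigned char buf[64];
    std::size_t used;
};

void* bump_embedded_alloc(EmbeddedAllocator* self, std::size_t n) {
    // Step back from the embedded field to its containing struct.
    auto* b = reinterpret_cast<BumpEmbedded*>(
        reinterpret_cast<unsigned char*>(self) - offsetof(BumpEmbedded, iface));
    if (n > sizeof(b->buf) - b->used) return nullptr;
    void* p = b->buf + b->used;
    b->used += n;  // a write into the same object that holds alloc_fn
    return p;
}

// Model 2: a fat pointer. The vtable is a separate constant; only the state mutates.
struct VTable {
    void* (*alloc)(void* ctx, std::size_t n);
};

struct FatAllocator {
    void* ctx;
    const VTable* vtable;
};

struct BumpFat {
    unsigned char buf[64];
    std::size_t used;
};

void* bump_fat_alloc(void* ctx, std::size_t n) {
    auto* b = static_cast<BumpFat*>(ctx);
    if (n > sizeof(b->buf) - b->used) return nullptr;
    void* p = b->buf + b->used;
    b->used += n;  // a write to BumpFat only; the vtable is untouched
    return p;
}

static const VTable bump_fat_vtable = {bump_fat_alloc};

FatAllocator make_fat(BumpFat* b) { return FatAllocator{b, &bump_fat_vtable}; }
```

In the first model, an opaque call that writes to BumpEmbedded may, as far as the optimizer can tell, also have changed alloc_fn. In the second, the state and the vtable are distinct objects, which is the property the comment above relies on.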
        
         | ayende wrote:
         | You aren't modifying the same object. When you have a fat
         | pointer, llvm can tell you are modifying the ptr, not the
         | vtable
         | 
         | You can then cache the function call
         | 
         | When your vtable is in the object you are mutating, it needs to
         | read each time
        
           | celeritascelery wrote:
           | Does LLVM do the inline caching itself? Or does it just
           | enable zig to do this optimization?
        
             | ayende wrote:
             | What happens is likely that the code gen can do the virtual
             | lookup once, instead of on each loop iteration
             | 
             | This is llvm, not zig
        
       | Shadonototra wrote:
       | why not just add an interface type instead of this giant mess?
        
         | kristoff_it wrote:
         | "allocgate" is actually a meme name that we purposely gave to
         | this API change. In reality it's not a big deal and that's the
         | superpower that a language v0 has: you can make breaking
         | changes.
         | 
         | As for the builtin interface type, why get locked into one
         | particular implementation when you can have all of them by
         | leaving the choice to the programmer.
        
           | Shadonototra wrote:
           | > As for the builtin interface type, why get locked into one
           | particular implementation when you can have all of them by
           | leaving the choice to the programmer.
           | 
           | that's a very good point!
        
           | williamstein wrote:
           | I agree that it's not a big deal. I am not a Zig developer,
           | but I've written a few thousand lines of Zig code in the last
           | few months that extensively uses allocators all over the
           | place, and it only took me a few minutes to update my code to
           | work with this API change. Also, having seriously played
           | around in Zig for a few months writing high performance pure
           | mathematics and number theory code for fun, I *really,
           | really* like it. Zig is a fantastic language for certain
           | application domains.
        
       | levzettelin wrote:
       | Does anyone know if RAII types will ever be a thing in Zig?
        
         | Spex_guy wrote:
         | It's unlikely. RAII comes with a surprising amount of
         | complexity. In order to have a reasonably complete language
         | that has RAII and value types, you _must_ also have:
         | 
         | - constructors
         | - destructors
         | - overloadable copy assignment operators
         | - placement new
         | - move semantics and rvalue references
         | 
         | These features come together or not at all. If you lose any of
         | them, the language becomes less complete. I think Rust and C++
         | are doing a fine job of exploring the design space of languages
         | that have this feature set, but it's too much complexity for
         | Zig.
        
           | steveklabnik wrote:
           | > In order to have a reasonably complete language that has
           | > RAII and value types, you must also have:
           | > - constructors
           | > - destructors
           | > - overloadable copy assignment operators
           | > - placement new
           | > - move semantics and rvalue references
           | 
           | Rust has RAII and value types, and does not have
           | constructors, overloadable copy assignment operators,
           | placement new, or rvalue references (though we do of course
           | have a very similar notion to rvalue/lvalue in general, but
           | that's not the same thing as "rvalue references" with
           | relation to all of this). While it has move semantics,
           | they're significantly different.
        
             | Spex_guy wrote:
             | I don't mean that these things need to manifest in exactly
             | the same way as they do in C++, but analogous features are
             | needed. You're right that rvalue references are not
             | necessary, but some form of move semantics are. When I say
             | constructors and destructors, I am really referring to
             | having a concept of object lifetimes as part of the
             | language. Zig does not have this, and is much simpler
             | because of it.
             | 
             | Edit: to clarify, the thing that makes a
             | constructor/destructor useful in this case _is_ the
             | property that it begins/ends an object lifetime according
             | to the language. This lifetime reasoning certainly has
             | benefits, like the ability to have const fields in C++ and
             | the ability to do static checking of lifetimes in Rust.
             | However it also comes with significant complexity, because
             | move semantics are needed throughout the language, and
             | begin/end lifetime tags are needed when implementing data
             | structures that use preallocated backing arrays.
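
As a rough illustration of the "begin/end lifetime tags" point, here is a minimal C++17 sketch of a fixed-capacity container over a preallocated byte array. FixedVec is a hypothetical name, not a standard type, and the sketch leaves out most of what a real container needs (copying, moving, exception safety).

```cpp
#include <cstddef>
#include <new>
#include <string>

// Hypothetical fixed-capacity vector: raw bytes are allocated up front, and
// element lifetimes are started and ended by hand.
template <typename T, std::size_t N>
class FixedVec {
public:
    FixedVec() = default;
    FixedVec(const FixedVec&) = delete;             // a bytewise copy would be wrong
    FixedVec& operator=(const FixedVec&) = delete;
    ~FixedVec() { while (size_ > 0) pop_back(); }

    bool push_back(const T& v) {
        if (size_ == N) return false;
        // Placement new: begins the lifetime of a T inside the raw storage.
        ::new (static_cast<void*>(storage_ + size_ * sizeof(T))) T(v);
        ++size_;
        return true;
    }

    // Precondition: size() > 0.
    void pop_back() {
        --size_;
        (*this)[size_].~T();  // explicit destructor call: ends the lifetime
    }

    T& operator[](std::size_t i) {
        return *std::launder(reinterpret_cast<T*>(storage_ + i * sizeof(T)));
    }

    std::size_t size() const { return size_; }

private:
    alignas(T) unsigned char storage_[N * sizeof(T)];
    std::size_t size_ = 0;
};
```

The storage exists for the whole life of the container, but each element only "exists" between its placement new and its destructor call, which is the bookkeeping the language has to expose once object lifetimes are part of its rules.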
        
               | steveklabnik wrote:
               | Hm, personally I consider "object lifetimes exist" to be
               | completely different than "constructors", which are a
               | hook into a specific point in some sort of object
               | lifetime cycle. Rust doesn't have the hook, so it doesn't
               | have the feature. Note that I didn't put destructors on
               | my list; the Drop trait does exist in Rust and is the
               | same general idea as destructors.
               | 
               | I guess that basically, to me at least, if you've
               | stretched the definitions of these features far enough to
               | include what Rust does, you don't really have a
               | meaningful definition any more.
        
           | tialaramex wrote:
           | Rust doesn't end up with an overloadable copy assignment
           | operator, it does something else which I would argue is
           | cleverer (although you can't just add it to an existing
           | language)
           | 
           | Because Rust knows the lifetime of everything in your program
           | (in Rust the lifetime of things is part of their type) the
           | effect of the assignment operator = is to dispose of whatever
           | was in the variable before, and _move_ the assigned item into
           | the variable.
           | 
           | Rust's Copy trait does _not_ alter the semantics of the
           | assignment operators - you can't overload that. You promise
           | that your type's in-memory representation is all that
           | matters, and then if you move _from_ a variable the value in
           | that variable is still live even though there was a copy
           | made; usually that value would be dead because it was moved
           | from.
           | 
           | Rust's Clone trait behaves a little like a C++ copy
           | constructor, except, it's an explicit trait, the only way to
           | get a clone of x is to x.clone() or various moral equivalents
           | e.g. Clone::clone(&x); so you're not getting one without
           | explicitly asking for it.
           | 
           | Rust doesn't formally have Constructors, or from another
           | perspective, any Rust code anywhere which wants to make a
           | Thing, is a "Constructor" for that Thing. (safe) Rust won't
           | let you do any of the shenanigans which is common in C++ like
           | having two separate pieces of code share responsibility for
           | initialising a data structure, in Rust when you make a Thing
           | you need to explicitly set all the values in the Thing at
           | once. If the easy way to write that involves temporaries, no
           | matter: the compiler does have an optimiser and knows how to
           | use it.
           | 
           | It is idiomatic in Rust to provide a function named new() in
           | the implementation of a structure which will make you one of
           | that structure if doing so makes sense, but that function and
           | its name aren't magic, it's just a convention. Rust's vector
           | type Vec has a new() function but it also has a
           | with_capacity(n) function, they're both "Constructors" in the
           | C++ sense if you want to think about it that way,
           | with_capacity() isn't calling new() to make the vector, that
           | would be crazy.
        
             | petertodd wrote:
             | Worth noting that even though Rust doesn't let you have two
             | separate pieces of code share responsibility for actually
             | initializing a data structure, in practice that doesn't
             | lead to much, if any, code duplication: a Foo::new() can
             | usually be written as a wrapper around a call to
             | Foo::new_with_options(x, y, z).
        
           | Kranar wrote:
           | Only C++ has all of what you mention. Plenty of languages
           | provide RAII without introducing all that complexity such as
           | Ada, Rust, Vale. D has RAII but unfortunately it too carries
           | much (but not all) of the complexity of C++.
        
         | asddubs wrote:
         | maybe since Zig still is making breaking changes, they can call
         | it IIRA
        
         | kristoff_it wrote:
         | `defer` and `errdefer` give you 90% of what destructors & co
         | give you for 0% of the complexity.
         | 
         | For now the closest thing to RAII is this proposal, but
         | there is no guarantee that it will be accepted.
         | 
         | https://github.com/ziglang/zig/issues/782
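
For readers who know C++ better than Zig, the behaviour of `defer` can be approximated with a small scope guard. This is only a C++17 sketch with made-up names (Defer, run_scope). It is itself built on a destructor, and it does not model `errdefer`, which runs only when the scope exits with an error.

```cpp
#include <string>
#include <utility>

// Hypothetical scope guard: runs the stored callable when the scope ends.
template <typename F>
class Defer {
public:
    explicit Defer(F f) : f_(std::move(f)) {}
    ~Defer() { f_(); }
    Defer(const Defer&) = delete;
    Defer& operator=(const Defer&) = delete;

private:
    F f_;
};

// Records the order in which the body and the deferred actions run.
std::string run_scope() {
    std::string log;
    {
        Defer close_it([&] { log += "close;"; });
        Defer free_it([&] { log += "free;"; });
        log += "work;";
    }  // guards fire here, in reverse order of declaration
    return log;
}
```

As with Zig's `defer`, the cleanup is written next to the acquisition and runs in reverse order at scope exit, so run_scope() yields "work;free;close;".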
        
           | initplus wrote:
           | I find in C++ at least reasoning about RAII is always
           | surprisingly complex. The second you have to write a custom
           | destructor you get into the weeds of reasoning about
           | copy/move/copy-assignment etc.
           | 
           | https://en.cppreference.com/w/cpp/language/rule_of_three
           | 
           | C++ RAII also rubs up painfully against handle based APIs
           | (looking at you Windows) in my experience. There is a lack of
           | standardized RAII wrappers for handle types like there is for
           | pointers. Yes you can pull in a custom handle RAII wrapper
           | but at that point it's simpler to just manage manually.
           | 
           | Defer on the other hand is simple. Anyone can understand it
           | in five minutes.
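
A small sketch of what "into the weeds" looks like for a handle wrapper in C++. The handle API here (open_handle, close_handle, g_open_handles) is invented for the example and stands in for an OS-style handle API; Handle is a hypothetical wrapper, not a standard type.

```cpp
#include <utility>

// Invented handle API: a counter tracks how many handles are currently open.
static int g_open_handles = 0;
int open_handle() { ++g_open_handles; return 42; }
void close_handle(int) { --g_open_handles; }

class Handle {
public:
    Handle() : h_(open_handle()) {}
    ~Handle() { reset(); }

    // Once there is a custom destructor, copying must be handled too.
    // Here it is forbidden, since two owners would close the handle twice.
    Handle(const Handle&) = delete;
    Handle& operator=(const Handle&) = delete;

    // Moves transfer ownership and leave the source empty (-1).
    Handle(Handle&& other) noexcept : h_(std::exchange(other.h_, -1)) {}
    Handle& operator=(Handle&& other) noexcept {
        if (this != &other) {
            reset();  // release what we currently own first
            h_ = std::exchange(other.h_, -1);
        }
        return *this;
    }

private:
    void reset() {
        if (h_ != -1) {
            close_handle(h_);
            h_ = -1;
        }
    }
    int h_;
};

// Returns the number of handles still open after a few moves and scope exit.
int live_handles_after_moves() {
    {
        Handle a;
        Handle b = std::move(a);  // a is now empty
        Handle c;
        c = std::move(b);         // closes c's own handle, then takes b's
    }
    return g_open_handles;
}
```

One destructor ends up pulling in four more special member functions, each with its own way to double-close or leak if written carelessly.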
        
             | jcelerier wrote:
             | > There is a lack of standardized RAII wrappers for handle
             | types like there is for pointers.
             | 
             | .. what's wrong with
             | 
             |     #include <cstdio>
             |     #include <memory>
             |     template<typename T, auto Free>
             |     using safe_handle_t =
             |         std::unique_ptr<T, decltype([] (auto p) { Free(p); })>;
             |     using file_handle = safe_handle_t<FILE, fclose>;
             |     void file_example() {
             |         file_handle f{fopen("foo", "r")};
             |     }
        
               | 10000truths wrote:
               | Now you have a double indirection. Worse, the handle
               | probably already references dynamically allocated memory.
               | So that's two dynamic memory allocations per "safe"
               | handle. Which might be perfectly acceptable for non-
               | critical programs, but it is certainly going to tank your
               | memory usage efficiency and thrash your CPU cache if
               | scaled up to millions of handles.
        
               | fbkr wrote:
               | There is no double indirection and allocation here, it
               | stores the FILE* directly, not as a pointer to a FILE*.
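
This can be checked with sizeof. The sketch below uses a named functor (FileCloser, a made-up name) instead of a lambda so that it builds before C++20. The C++ standard does not promise the size of std::unique_ptr, so the pointer-sized result is an observation about mainstream standard library implementations, not a guarantee.

```cpp
#include <cstdio>
#include <memory>

// Stateless deleter: has no data members, so it needs no storage of its own.
struct FileCloser {
    // unique_ptr only invokes the deleter for non-null pointers.
    void operator()(std::FILE* f) const noexcept { std::fclose(f); }
};

using FileHandle = std::unique_ptr<std::FILE, FileCloser>;

// For contrast: a function-pointer deleter has to be stored next to the FILE*.
using FnPtrHandle = std::unique_ptr<std::FILE, int (*)(std::FILE*)>;

constexpr bool handle_is_pointer_sized() {
    return sizeof(FileHandle) == sizeof(std::FILE*);
}

constexpr bool fn_ptr_deleter_is_larger() {
    return sizeof(FnPtrHandle) > sizeof(std::FILE*);
}

// Opens a file for reading; the handle closes it (if opened) on return.
bool open_and_close(const char* path) {
    FileHandle f{std::fopen(path, "r")};
    return static_cast<bool>(f);
}
```

The wrapper holds the FILE* itself and allocates nothing, so the only dynamic allocation is whatever fopen does internally.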
        
           | dnautics wrote:
           | I'm pinning my hopes that certain types can be marked as
           | "shared resources", either in-lang, or through a minimalistic
           | helper tool - and analyzed at a lower level (ZIR or AIR) for
           | lifetime analysis.
        
         | AndyKelley wrote:
         | When people talk about RAII in relation to Zig I think they
         | mean something slightly different than RAII, but then the
         | conversation starts to become about what is the definition of
         | RAII rather than whether the Zig language is lacking a certain
         | kind of useful abstraction.
         | 
         | Examples:
         | 
         | [1]: https://news.ycombinator.com/item?id=29506814
         | 
         | [2]:
         | https://gist.github.com/andrewrk/190170bc1441839644c3f15725a...
        
           | nikki93 wrote:
           | Not saying this means you actually need to add constructors
           | and destructors, but I think the main tricky bit is when you
           | have e.g. a growable / shrinkable array and want to call something
           | on it that removes elements and it should be calling
           | destructors on those. AFAICT there's no clear place to add a
           | `defer` that makes it happen at the right time at the lexical
           | site those elements are originally added. I think this is
           | where the main RAII complexity comes from vs. the lexical
           | scope local variable scenario which is definitely handled by
           | `defer`.
           | 
           | All that said, not an argument for having that, just wanted
           | to nuance it. I think not having it and keeping the language
           | focused on its priorities can be good, for example.
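
The container case can be seen in a few lines of C++. Tracked is a made-up type that just counts destructor calls; the point is that the destructor runs at the moment an element is removed, far from the scope where it was added.

```cpp
#include <vector>

// Hypothetical element type that counts how many times it has been destroyed.
struct Tracked {
    static int destroyed;
    ~Tracked() { ++destroyed; }
};
int Tracked::destroyed = 0;

// Returns how many destructors have run right after removing one element.
int destroyed_after_pop() {
    Tracked::destroyed = 0;
    std::vector<Tracked> v;
    v.reserve(3);      // avoid reallocation, which would also destroy elements
    v.emplace_back();  // constructed in place, no temporaries
    v.emplace_back();
    v.emplace_back();
    v.pop_back();      // the container ends this element's lifetime itself
    return Tracked::destroyed;
}
```

No `defer` at the site of the emplace_back calls could express "run cleanup when this element is popped"; in a language without destructors the container's remove operations need an explicit cleanup step or callback instead.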
        
       ___________________________________________________________________
       (page generated 2021-12-15 23:00 UTC)