https://pithlessly.github.io/allocgate.html

Allocgate is coming in Zig 0.9, and you will have to change your code

Date: 2021-12-15

Version 0.9 of the Zig programming language is planned to release in
about a week. One of the major changes coming in this release is a
restructuring of how allocators work, which has been termed Allocgate
. This is a breaking change to the Allocator API provided by the
standard library, so pretty much any code that uses allocators will
have to be updated in response. I hope to use this article to explain
the justification for this change and what steps you need to take to
adjust your code.

The TL;DR is:

  * Where you previously passed around *Allocator to functions that
    need to allocate, you now pass around Allocator directly.
    Allocator is now a struct, which is the size of two pointers.
  * When you construct an allocator (such as var gpa:
    std.heap.GeneralPurposeAllocator(.{});), you call gpa.allocator()
    to get the type-erased allocator implementation rather than
    writing &gpa.allocator.

Unless you are writing your own allocator, this should be all you
need to change.

Two ways to approach polymorphism

Zig's allocators rely on dynamic dispatch. The choice of allocation
and deallocation functions is not known until runtime, so if you want
to write code that is agnostic to the user's choice of allocator (and
you generally do when writing library code), you need to accept these
functions as a parameter in some way. Most languages use a structure
called a virtual function table (vtable) which stores the addresses
of each dynamically-known function. Zig's standard library provides a
variety of allocators (such as ArenaAllocator and
GeneralPurposeAllocator), each of which has some associated state
along with a set of functions (alloc, resize, and free) which operate
on that state to work as a memory allocator. The addresses of these
three functions are collected into an "allocator vtable."

Any polymorphic object's state needs to be passed along with its
vtable somehow. One way to do it is to represent an object as a pair
of pointers: one to the object's fields, and one to a vtable which
knows how to operate on those fields. I'll follow Rust's terminology
and call this (impl, vtable) pair a "fat pointer."

 Dynamic           Object
----------        ----------
| impl   | -----> | fields |
|--------|        | ...    |
| vtable | ---.   ----------
----------    |
              |    Vtable
              |   ----------
              `-> | alloc  |
                  | resize |
                  | free   |
                  ----------

Another way would be to store a separate copy of the vtable in every
object. By knowing its address, the vtable's functions would be able
to deduce the address of the struct containing them, an operation
sometimes called container_of in C and provided by Zig under the name
@fieldParentPtr. Indeed, this is such a common use of this builtin
that the technique is commonly described as just "the @fieldParentPtr
idiom."

                   Object
                  --------------
                  | fields     |
 Dynamic          | ...        |
----------        | ---------- |
| vtable | -------->| alloc  | |
----------        | | resize | |
                  | | free   | |
                  | ---------- |
                  | ...        |
                  --------------

(We could also imagine a third solution in which the object contains
a pointer to the vtable in its fields, and we pass around a pointer
to that pointer. This has the advantage of sharing vtables, while
still requiring only a single pointer to be passed around. However,
this also means we would need a double indirection to access the
vtable, which would be more expensive.)

Polymorphic allocators in Zig

Without stooping so low as to actually benchmark, what might the
respective advantages of these two approaches be?

  * Using fat pointers, we get to make fewer copies of the vtable.
    Every polymorphic value can point to the same one, which saves
    memory and allows dynamic objects to be smaller.
  * With the @fieldParentPtr approach, we're able to pass a dynamic
    object around as just a single pointer. This improves performance
    and saves memory for any struct that needs to hold a dynamic
    allocator (notably, most std containers).

In the case of allocators, it might seem like the @fieldParentPtr
approach would win out. After all, the average program passes
relatively many pointers to relatively few different allocators, so
making the latter large at the expense of keeping the former small is
a sensible trade-off.

However, as it turns out, this approach has a performance problem
with has to do with devirtualization. Virtual function calls are more
expensive than calls to known functions for a number of reasons, so
LLVM would like to rewrite them to static calls anywhere it can prove
that a function pointer always has a particular value. This becomes
more difficult when the function pointer lives inside a structure
like GeneralPurposeAllocator. Zig's structs don't even have
compile-time encapsulation, much less runtime encapsulation. There is
nothing stopping you from reaching into your GeneralPurposeAllocator
and changing the vtable functions to do something else. It would
definitely be against std's contract, but it wouldn't immediately
cause undefined behavior. This means any code which reads vtables out
of the embedded Allocator has to be defensive against the possibility
that the vtable was modified, making devirtualization impossible.

In contrast, when using fat pointers, the vtables are shared constant
objects, and a new fat pointer is constructed on every call to
gpa.allocator(). So while we incur the cost of passing around a
larger fat pointer, the allocation and deallocation functions
potentially become much faster to call.

Allocgate

As far as I can tell, the performance issues of the @fieldParentPtr
approach were first noticed about two months ago in the std.rand API,
which uses a similar approach to allow code to be polymorphic over
the caller's choice of RNG algorithm (perhaps the performance issues
were more apparent here because RNG algorithms are designed to have
high throughput, so the relative overhead of indirect calls is
larger). It was noted in that issue that any other code using this
approach would suffer the same problem.

With Allocgate, the std.mem.Allocator API was also changed from
embedded vtables to fat pointers. This is the reason behind the two
API changes I mentioned at the beginning of the article: Allocator
was changed from holding the vtable itself to being a fat pointer
struct, and you call gpa.allocator() instead of &gpa.allocator
because the vtable is no longer a field of the allocator and you
instead need to construct a fat pointer.

This also happens to address a specific failure mode I have run into
before, where instead of writing

var gpa = std.heap.GeneralPurposeAllocator(.{}){};
const alloc = &gpa.allocator;

the user writes:

var gpa = std.heap.GeneralPurposeAllocator(.{}){};
var alloc = gpa.allocator;

This copies the allocator out of its parent struct and then attempts
to call methods like alloc.alloc(). This will generally compile
correctly (since Zig will automatically take the address of an object
when calling a method that takes self by pointer), but causes
undefined behavior at runtime because alloc attempts to use
@fieldParentPtr despite no longer being contained in the struct it
expects to be in. Post-Allocgate, this code instead becomes:

var gpa = std.heap.GeneralPurposeAllocator(.{}){};
const alloc = gpa.allocator();

... which is pretty much impossible to silently get wrong.

What's next for vtable-based polymorphism?

Allocgate has been merged into Zig's master branch and should ship in
Zig 0.9. This involved an impressive amount of work, with much of the
standard library needing to be changed. It should hopefully yield
better performance for code that uses any of the standard library's
allocator abstractions. Zig's contributors have gone back and forth
on the specifics of the change, and looked at a number of benchmarks,
but I'm sure they would welcome additional data on this.

I think this is a great example of the kind of ecosystem-wide
improvements Zig can make as a relatively young language that has yet
to tie itself to any stability guarantees. It sucks for language
users who will have to change their code, of course, but hopefully
they knew what they were getting into. One of Zig's principles is
"avoid local maxima," which, at this point in the language's
evolution, means to be willing to move to a better solution even if
it means breaking backwards compatibility.

A number of topics are also being discussed that have the potential
to further improve the performance and ergonomics of runtime
polymorphism. For example, it could be that improvements to Zig or
LLVM's aliasing model will make it easier to optimize virtual
function calls (for example, by making it possible to mark a
particular field of a struct as immutable). Zig's developers have
also expressed some interest in having first-class support for
interfaces in the standard library or even the language itself, which
would make it easier for the user and reduce the burden of having to
modify every API in the standard library when changes like this are
made.

Credit for the change

I had no personal involvement in Allocgate. I am simply an outsider
who is interested in Zig and thought others might benefit from an
explanation of this change. Instead, I ought to credit:

  * Andrew Kelley (@andrewrk), creator of Zig and initial designer of
    the allocator API;
  * @ominitay, who first diagnosed the problem in std.rand and
    provided benchmarks;
  * Martin Wickham (@SpexGuy), who discovered the optimization
    problems with LLVM and the need to move away from
    @fieldParentPtr-based APIs;
  * Lee Cannon (@leecannon), who implemented the bulk of the change
    to Allocator, including updating the entire standard library;
  * and everyone else who contributed feedback, bug fixes, and
    benchmarks on the PR.