[HN Gopher] Tracing garbage collection for arenas
___________________________________________________________________
Tracing garbage collection for arenas
Author : sirwhinesalot
Score : 56 points
Date : 2024-06-24 11:31 UTC (2 days ago)
(HTM) web link (btmc.substack.com)
(TXT) w3m dump (btmc.substack.com)
| zozbot234 wrote:
| Interesting discussion. If you want to experiment with
| 'lightweight' traced GC for systems programming, you might like
| https://github.com/chc4/samsara (for Rust) or
| https://github.com/pebal/sgcl (for C++) both implementing a
| concurrent tracing algorithm which is a bit more involved than
| what's directly discussed in OP.
| sirwhinesalot wrote:
| Hi author here, I wasn't aware of those, I should probably link
| to some existing library implementations of GCs.
|
| My goal was really to make the simplest, dumbest thing possible
| that still works. It can't compete with more serious GCs like
| that, but hey, it's intentionally not meant to.
| titzer wrote:
| Virgil uses a simple stop-the-world semispace copying
| collector. I did that because it's better to start moving
| rather than nonmoving. I think the hotly contested questions
| in systems programming are all complete red herrings. It's
| not about GC versus non-GC, it's about gracefully handling
| unsafe code, integrating JITed code, interfacing with the
| lowest level APIs (e.g. kernel syscalls), implementing the
| runtime system, etc. Data formats like network packets, file
| formats, in-memory data structures, etc. That stuff is way
| more important than if your goddamn lists of widgets are
| heap, stack, or region allocated.
| diffxx wrote:
| I am confused by this comment. All the enumerated features
| are dependent directly or indirectly on the memory
| management approach that the system uses. If one chooses
| badly for the target workload, there are many things that
| will go wrong including excessive copying (which kills the
| performance of the system) and inscrutable crashes/logic
| errors as well as security vulnerabilities.
|
| Personally, I tend to think that if you nail the memory
| management technique all the other items will tend to work
| themselves out. And I think this is actually harder to get
| right _systemically_ than all of those other things.
| titzer wrote:
| > if you nail the memory management technique all the
| other items will tend to work themselves out.
|
| From my experience, implementing a language runtime for a
| GC'd language in a non-GC'd language will lead you down
| one of two well-trodded roads full of footguns and booby
| traps, which includes either conservative-on-the-stack
| scanning or a huge PITA handle system which has a bug
| tail measured in decades. So you want to implement a GC'd
| language in a GC'd language. But that's probably not what
| you meant.
|
| To the larger point, no, absolutely not. There are so
| many other considerations for systems programming that
| this whole "GC or not" debate is a waste of time. Read
| the comment again. System programming deals with
| constraints imposed by other hardware and software, as
| well as the need to look at the implementation guts of
| things, generate new code, etc. Solving the language's
| memory management problems is only tangentially related,
| meaning, if you do solve it, then you still have these
| problems left over to solve.
| zozbot234 wrote:
| > I am confused by this comment. All the enumerated
| features are dependent directly or indirectly on the
| memory management approach that the system uses.
|
| The key feature of low-level systems programming is that
| there's no such thing as "the" one memory-management
| approach. GC with arenas can be used in a "pluggable" way
| for the limited case of objects referencing one another
| in a fully general graph (including possible cycles)
| while preserving other, more lightweight approaches
| (including refcounting and simple RAII/static ownership
| tracking) for the bulk of cases where the full generality
| of GC isn't always required.
| sirwhinesalot wrote:
| Yup, 100% agreed. GC is a tool, we need to talk about
| _specific_ GCs and their costs as to whether or not they
| 're suitable for a particular task.
|
| For me the main issue has to do with FFI and needing to
| drag around a hefty runtime to make a library available to
| other languages.
|
| Not an issue with simpler GCs.
| zozbot234 wrote:
| FFI from GC'd to non-GC'd languages becomes workable if
| you can manually add FFI-referenced objects as roots (so
| that a collection cycle will not free them or anything
| they reference in turn) and ensure that the GC can call
| finalizers in order to ensure proper disposal (such as
| decreasing reference counts or freeing memory) whenever a
| GC object happens to control the lifecycle of non-GC'd
| resources.
| sirwhinesalot wrote:
| I meant the other direction actually, non-GC'd calling
| GC'd. FFI from GC'd to non-GC'd has its issues but is
| thankfully much better explored.
|
| With a minimal ref counting "GC" on GC'd side, you just
| need 2 extra "runtime" functions (incref, decref), which
| happen to fit like a glove with the scope-based resource
| management you have in Rust/C++/Swift/C (the latter with
| GCC extensions).
|
| With a tracing GC, my hope was to prove that if you make
| it as "minimal" as your usual ref counting
| implementation, then it's also just a few functions (one
| to init a GC heap, one to allocate on that heap, one to
| collect that heap), which can hardly be considered a
| "runtime".
|
| Your typical tracing GC is a large, powerful and complex
| beast that doesn't play nice with anything besides the
| language/vm it was designed for.
| neonsunset wrote:
| You pretty much described how FFI works in .NET :)
|
| When you are passing a pointer across FFI which points to
| an object interior, you "pin" such object with `fixed`
| statement which toggles a bit in the header of that
| object. Conveniently, it is also used by GC during the
| mark phase so your object is effectively pre-marked as
| live. Objects pinned in such way cannot be moved,
| ensuring the pointer to their interiors remains valid.
|
| It's not as simple implementation-wise - .NET's GC has
| multiple strategies to minimize the impact of pinned
| objects on GC throughput and heap size. Additionally, for
| objects that are expected to be long-lived and pinned
| throughout their entire lifetime, there's a separate
| Pinned Object Heap to further improve the
| interoperability. In practical terms, this is not used
| that often because most of the time you pass a struct or
| a pointer to a struct on the stack that needs no
| marshalling or pinning. In the rare case the pointer
| needs to point to a long-lived pinned array, these are
| allocated with `GC.AllocateArray<T>(length, isPinned)`.
|
| .NET has another interesting feature that is important to
| FFI: ByRefs which are special pointers that GC is aware
| of that can point to arbitrary memory. You can receive an
| int*, len from C and wrap it into a Span<int> (which
| would be ref int + length), and GC will be able to
| efficiently disambiguate that such pointer does not point
| to GC-owned heap and can be safely ignored while should
| it point to object memory, the memory range needs to be
| marked and byrefs pointing to it need to be updated if it
| is moved. That Span<int> can then be consumed by most
| standard library APIs that offer span overloads next to
| the array ones (there are more and more that accept just
| spans, since many types with contiguous memory can be
| treated as one).
|
| This works the other way too and there is a separate
| efficient mechanism for pinning byrefs when passing them
| as plain pointers to C/C++/Rust/etc.
| efitz wrote:
| Despite the title "tracing garbage collection for arenas", the
| article is not about analysis of sanitation practices at sports
| facilities.
| colechristensen wrote:
| I'd totally watch a 5-30 minute documentary about waste
| management in large venues. Bonus points if there were
| questionable infographics about beer and hotdogs.
| sirwhinesalot wrote:
| This got a healthy chuckle out of me, well done :)
| xcskier56 wrote:
| I was definitely convinced that the article was somehow going
| to pivot to this topic until it started getting into "arenas"
| forrestthewoods wrote:
| > Can we make "the simplest form" of tracing garbage collection
| work for systems programming?
|
| No. 24 bytes per allocation and having to perform a copy and
| point fixup is too expensive.
|
| The problem with GC is that it's fine until it's not. And when it
| becomes not fine you need to radically change your entire
| architecture. There is no incremental fix.
|
| That said, Unreal Engine does in fact use a mark-and-sweep GC for
| certain types of objects. Which could be something for the author
| to look into.
| sirwhinesalot wrote:
| Are 16/24 bytes that much worse than 8/16 (with weak refs) for
| ref counting?
|
| The copying can be too expensive yes, though hopefully the
| ability to use non-garbage collected arenas would mitigate that
| issue (just don't put huge live data in the default, GCed
| arena)
| forrestthewoods wrote:
| > Are 16/24 bytes that much worse than 8/16 (with weak refs)
| for ref counting?
|
| No, but that's not the right question. Most allocations in
| languages like C/C++/Rust aren't using ref counts. In fact
| excessive use of std::shared_ptr is a well known footgun in
| C++!
|
| > the ability to use non-garbage collected arenas would
| mitigate that issue
|
| I agree that not using GC is a way to mitigate GC issues! But
| if you're only able to performantly use GC for a limited
| number of allocations you also aren't getting much benefit
| from GC.
| sirwhinesalot wrote:
| Note that you can allocate as much temp data as you want to
| the GC heap without incurring any performance issues (in
| fact allocation will be faster than malloc), because
| copying collectors only ever care about live objects.
|
| So the question is, how much of an issue it is to "contort"
| the program to not allocate large, live objects to the GC
| heap, vs contorting it to have a tree-like structure for
| RAII-style memory management.
|
| Honestly, I have no idea, one of the reasons I really want
| to make a toy language to try it out and actually measure
| the runtime costs vs convenience etc.
|
| Note: For my particular C implementation, which is
| admittedly cobbled together with spit and ducktape, you can
| easily add support for individually managed allocations to
| the GC heap, since it already uses buckets anyway.
|
| If a bucket contains only 1 allocation, there's no need to
| copy, you can just keep that bucket somewhere in the linked
| list. This does require the programmer to explicitly
| request that kind of allocation though.
|
| For the "huge amount of small objects" case, GC and RAII/RC
| are complete opposites. If most of those small objects die,
| the GC will be a lot faster since it drops them by not
| copying them. But if most of the objects are live, then the
| GC will spend a brutal amount of time copying, while
| RAII/RC will barely do any work.
|
| Having memory management options is a good thing ;)
| amelius wrote:
| Manual garbage collection can also be expensive, in some cases
| even more expensive than automatic garbage collection.
| celeritascelery wrote:
| Interesting write up. I am curious about the usage of this
| framework. One thing in reference counting's favor is that it is
| hard to screw up. You just use the rc type and when it is no
| longer accessible it will magically get freed. That is how GC
| works in a high level language too, you almost never need to
| think about it.
|
| Let's look at the example from the docs:
| SWL_GCArena gc = swl_gc_arena_new(<starting-capacity>);
| int* foo = swl_gc_alloc(&gc, 5000, alignof(int), NULL);
| int* bar = swl_gc_alloc(&gc, 16000, alignof(int), NULL);
| int* baz = swl_gc_alloc(&gc, 3000, alignof(int), NULL);
| // foo and baz are roots SWL_GC_COLLECT(&gc,
| SWL_ROOT(foo), SWL_ROOT(baz)); // bar is gone, poof
|
| If you try to access bar at the end you get a use after free. I
| am pretty sure if you tried to access foo again it would trigger
| the same problem, because it is a pointer directly to int and not
| an indirection. What happens if you pass a non-arena pointer to
| SWL_ROOT? What happens if you have multiple arenas and you mix
| and match them?
|
| There are many ways you could trivially shoot yourself in the
| foot in large program. It is a lot less "idiot-proof" than simple
| reference counting. The fact that you create each piece of memory
| and then have to manually pass it back to the root macro on each
| collection is very similar to the "malloc/free" dance we want to
| try and avoid.
|
| The rooting issue is something I tried to tackle in my GC for
| rust[1]. Essentially making low level GC "idiot proof".
|
| [1] https://coredumped.dev/2022/04/11/implementing-a-safe-
| garbag...
| sirwhinesalot wrote:
| The idea is that this would be used in a language backend, so
| the compiler would track the roots for you. Note that in C ref
| counting is also rather error prone in that you can increment
| or decrement a reference the wrong number of times.
|
| But I agree this is not the best thing to do manually. The main
| advantage is that you only need to call collect in 1 place in
| your whole app. A lot easier to get right than having free all
| over the place.
|
| EDIT: placed your article on my reading list, thanks for
| sharing it!
| forrestthewoods wrote:
| > One thing in reference counting's favor is that it is hard to
| screw up.
|
| Well, circular references will cause your RC objects to never
| ever drop. So then you have to start using a bunch of Weak
| references. And it can very quickly become a confusing and hard
| to understand mess.
|
| The cost of a memory leak is significantly lower than the cost
| of a use after free. But disentangling the graph of refcounts
| can be super painful.
| jkcxn wrote:
| How do you move memory and have all the pointers update to the
| new position? I didn't understand that part.
| sirwhinesalot wrote:
| Hi, author here.
|
| It's handled through the actual GC implementation, Cheney's
| Algorithm to be specific, it's mentioned in the blog post but I
| didn't write how it actually works there.
|
| My git repo link at the end has an actual C implementation if
| you'd like to have a look.
|
| The TL;DR is that the algorithm starts by going through the
| roots and copying each of them to a new Arena. Each time it
| does it updates the root pointer to the new location and writes
| out a "forwarding pointer" in the old allocation such that if 2
| roots go to the same object, they can check if it's already
| been forwarded, and replace themselves with that pointer
| without copying.
|
| Once all the roots are copied, the algorithm goes iteratively
| through every allocation in the new Arena, digging through
| their pointers, doing the same copy/update/forward/dance onto
| the same Arena, until it reaches a fixpoint (no new copies
| happen). Then it frees the old Arena.
|
| It's a really simple algorithm, I strongly suggest having a
| look at the wikipedia page :)
| tekknolagi wrote:
| The pointers that the programmer controls live on the shadow
| stack and that's how they get updated. Another term for this is
| a "GC handle"
| moonchild wrote:
| i would do a variable-size header 4 or 8 bytes for most small
| objects (8 if you want to provide alignment, 4 if not or if you
| can squish your pointers). one bit identifies if the header is
| small or large; if small, next few bits give the size of the
| object in words, and the remainder are a bitmap telling which
| words are pointers. forwarding pointer stomps on the first word
| of the object
___________________________________________________________________
(page generated 2024-06-26 23:01 UTC)