[HN Gopher] Fundamentals of garbage collection (2023)
___________________________________________________________________
Fundamentals of garbage collection (2023)
Author : b-man
Score : 126 points
Date : 2025-07-09 00:03 UTC (3 days ago)
(HTM) web link (learn.microsoft.com)
(TXT) w3m dump (learn.microsoft.com)
| pjmlp wrote:
| On the context of .NET runtime, as missing from the title.
| BlimpSpike wrote:
| Kindof unrelated to the article, but I was recently wondering if
| it would be possible to detect and deny pointer cycles in a
| language in an efficient way, so that you could then use simple
| reference counting instead of full-blown garbage collection.
|
| It probably wouldn't be usable for a general-purpose programming
| language, but for a special-purpose scripting language I could
| see it making the language implementation easier.
| louthy wrote:
| On any one object you can just follow the references to see if
| you get back to the same object. Not super efficient as you'd
| have to do it for each reference as it is set.
|
| But if it was a simple scripting language and you needed that
| constraint, it's relativity easy to implement.
| Findecanor wrote:
| That would still be tracing. The problem is that if there is
| a cycle, the reference count would be too high, and you'd not
| detect that the object should be reclaimed.
| creata wrote:
| One solution is to forbid recursive data types - e.g., require
| every struct type to only reference types that have already
| been defined. I can't think of any languages that do this.
|
| Another solution is to make things immutable (like Erlang), or
| "as-if" immutable (like Koka), which guarantees that data can
| only point to things that have already been defined, preventing
| cycles.* Erlang uses this to simplify generational collection -
| because old data can't point to young data, it doesn't need a
| card table or anything like that.
|
| I think it's perfectly possible to have a general purpose
| language without cycles: you can just use integer indices into
| an array instead of pointers if you want cyclic data
| structures. This is common in Rust, when people want to avoid
| the overhead of reference counting, but don't want to use
| unsafe code.
|
| * A hidden assumption here is that the language is eagerly
| evaluated. There are languages like Haskell that have
| immutability and cyclic data structures.
| asplake wrote:
| Even with cyclic relationships between types, immutability
| makes cycles within instances difficult (without laziness
| anyway). A syntax tree would be a good example.
| creata wrote:
| Yes, either is sufficient, I think.
|
| Edit: I think the common idea with both solutions is that
| our objects have some weak order (the order in which their
| types were defined, and the time at which the object was
| created, respectively), and objects are only allowed to
| point to objects strictly less than them in this order.
| xscott wrote:
| Yes, and the nice thing about doing it with immutability is
| you can still have recursive types to build linked lists,
| trees, and/or dags. From there you can build hash-array-
| mapped-tries, finger-trees, and so on, giving you friendly
| dict/list or JSON style data structures.
| daxfohl wrote:
| Couldn't you make lazily evaluated code in erlang too, even
| if it's not lazy by default like Haskell? You'd just need
| function pointers, right? Or is that not enough?
| creata wrote:
| Haskell can have circular references because its laziness
| is implemented with _thunks,_ which have a mutable cell in
| which to store the computed value so that terms don 't get
| evaluated more than once. Here's a Haskell function that
| makes a circular linked list: -- circular
| linked list with one item repeat x = let xs = x:xs
| in xs
|
| Here's a rough equivalent in JavaScript that _doesn 't_ use
| thunks, just functions: function
| repeat(x) { function xs() {
| return [x, xs]; // [first, rest] }
| return xs; }
|
| The Haskell version has a cycle because, after evaluation,
| `repeat x` will be a circular linked list, but all the
| "lists" we create in the JavaScript code above are just the
| closure `xs`.
|
| For completeness, here's a JavaScript version that uses
| thunks: class Thunk {
| constructor(f) { this.f = f; } get() { if
| (this.f) { this.v = (this.f)(); delete this.f; } return
| this.v; } } function repeat(x) {
| let xs = new Thunk(() => { return [x, xs];
| // [first, rest] }); return xs;
| }
|
| If you try calling `x = repeat(1); x.get()`, you can see
| that we get a circular list.
| andreamonaco wrote:
| Hello, I'm writing an implementation of the Common Lisp
| language that uses an enhanced reference counting algorithm
| (that I've taken from literature) that detects and handles
| cycles. Performance seems okay, though I still haven't tried
| large programs.
|
| https://savannah.nongnu.org/p/alisp
| zozbot234 wrote:
| A somewhat different approach was recently proposed here:
| https://news.ycombinator.com/item?id=44319427 but it seems to
| have non-trivial overhead. (Still very much worthwhile, given
| the potential advantages of deterministic cycle collection.)
| The paper you reference is quite a bit older so it would of
| course be interesting to do a proper comparison.
| andreamonaco wrote:
| I'll look at that. About performance: people in practice
| have always favored GC, so I think there's a lot to be
| discovered in optimization of reference counting
| algorithms, including concurrent traversal (which is easier
| because each node has local info in the form of refcounts
| and flags) and maybe detection of problematic worse-case
| graphs
| Findecanor wrote:
| The talk for this paper came up on YouTube just the other
| day: https://www.youtube.com/watch?v=GwXjydSQjD8
| n_plus_1_acc wrote:
| Like Rust if it has no Rc?
| tonyedgecombe wrote:
| Rc is implemented in Rust so it would be possible to create
| an equivalent in your own code.
| Someone wrote:
| > I was recently wondering if it would be possible to detect
| and deny pointer cycles in a language in an efficient way
|
| In general, I think that cannot be done, but if one restricts
| what programs can do, solutions exist.
|
| A simple way to do it is by requiring all references "pointing
| out of" an object to be set the moment the object is created,
| and be immutable afterwards (that's what Lisp _cons_
| (https://en.wikipedia.org/wiki/Cons) does. Without _setf_ or
| similar, lisp code cannot create cycles)
|
| That disallows quite a ome code that modifies structures
| without introducing cycles, but still allows for quite some
| code to work.
|
| One could also store an 'age' field with each object and check,
| when a reference is updated in an object, that it points to an
| object that is older than the one being modified. That gives
| some more leeway, at the price of using more (a lot more, in
| code using small objects) memory.
|
| Another idea is to add a bit to each object "there are no
| cycles containing this object", and have the runtime clear that
| when it no longer can guarantee that (edit: unfortunately,
| maintaining that invariant can be very costly. Whenever code
| does _foo.field = bar_ , with both _foo_ and _bar_ known to be
| not part of a cycle, you still have to do a search through all
| objects reachable from _bar_ to check whether a cycle was
| created and, if so, clear that bit in all objects in the
| cycle(s). That makes this idea impractical)
|
| If, as I suspect happens in programming languages which are
| "mostly immutable", there are many objects for which that flag
| stays set, that can significantly speed up checking for the
| creation of cycles.
| jlouis wrote:
| You can make a programming language where cycles are
| impossible. Erlang is a prime example.
|
| Region inference is another strategy in this space. It can
| limit the need for full-blown garbage collection in many cases,
| but also comes with its own set of added trade-offs.
|
| Reference counting is just a different kind of garbage
| collection, really. It acts like a dual construction to a
| tracing GC in many cases. If you start optimizing both, you
| tend to converge to the same ideas over time. Refcounting isn't
| void of e.g. latency problems either: if I have a long linked
| list and snip the last pointer, then we have to collect all of
| that list. That's going to take O(n) time in the size of the
| list. For that reason, you'd have to delay collecting the large
| list right away, which means you are converging toward a
| tracing GC that can work simultaneously with the mutator. See
| e.g., Go's garbage collector.
| zozbot234 wrote:
| > latency problems either: if I have a long linked list and
| snip the last pointer, then we have to collect all of that
| list. That's going to take O(n) time in the size of the list.
| For that reason, you'd have to delay collecting the large
| list right away
|
| These latency issues are inherent to deterministic
| destruction, which is an often desirable feature otherwise;
| they have little to do with reference counting itself. In
| principle, they can be addressed by "parking" objects for
| which delayed disposal is non-problematic onto a separate,
| lower-priority task.
| battle-racket wrote:
| > It acts like a dual construction to a tracing GC in many
| cases
|
| yeah one of the most helpful realizations I've read is that
| tracing and ref counting are essentially two formulations of
| the same problem - one is finding objects that are alive (by
| tracing), and the other is finding things that are dead (i.e.
| their ref counts reach zero). and of course, every object is
| either dead or alive!
| ozim wrote:
| Question: does anyone run "Server GC" for the ASP.NET
| applications?
|
| There is bunch of people copy pasting documentation to SO
| "explaining" server GC. I am running bunch of .NET stuff in VMs
| and never set "Server GC" and never ran into issues with default
| but also not sure if it is worth testing out.
|
| I guess it does not matter much if you are running in containers
| but I am running on VMs in IIS.
| diggan wrote:
| When I play around with changing various GCs for Java (via
| Clojure), then I always setup benchmarks measuring what kind of
| thing I want to improve, run all GCs via that benchmark to
| chose which to use for that service/project and call it a day.
| There is a lot of theorizing and navel-gazing around GCs it
| seems to me, and in the end it is the results that count so
| setup some way to measure, find the differences then move on
| from there :)
| bob1029 wrote:
| Server GC is a tradeoff between latency and throughput. It
| makes a ton of sense for a web server where a small additional
| overhead of a few milliseconds on some responses won't matter.
|
| Workstation GC is what you want when latency is critical. This
| is what you'd use if you were developing a UI or game engine.
|
| I've seen workstation GC stay in the microsecond region when
| strategically executing GC.Collect at allocation batch
| boundaries.
| ozim wrote:
| Well great but you did not write out anything more than I
| could understand as a 15+ years developer of C#/.Net from
| whatever all those people in Stack Overflow wrote.
|
| Do you have anything like running a business line application
| for couple of years on Server GC to write about?
| Lariscus wrote:
| Server GC is the default garbage collector for Asp.net Core.
|
| >
| https://github.com/dotnet/AspNetCore.Docs/blob/main/aspnetco...
| ozim wrote:
| Great one to see, definetly vote up!
| Metalnem wrote:
| I recently re-read this article and can confirm that it's
| excellent--not just this specific page, but all the other
| sections under "Garbage Collection" as well.
|
| If you want to dive deeper into memory performance analysis in
| .NET, this is another must-read: https://github.com/Maoni0/mem-
| doc/blob/master/doc/.NETMemory...
|
| It was written by Maoni Stephens, the architect of .NET's garbage
| collection.
___________________________________________________________________
(page generated 2025-07-12 23:01 UTC)