[HN Gopher] Fundamentals of garbage collection (2023)
       ___________________________________________________________________
        
       Fundamentals of garbage collection (2023)
        
       Author : b-man
       Score  : 126 points
       Date   : 2025-07-09 00:03 UTC (3 days ago)
        
 (HTM) web link (learn.microsoft.com)
 (TXT) w3m dump (learn.microsoft.com)
        
       | pjmlp wrote:
       | On the context of .NET runtime, as missing from the title.
        
       | BlimpSpike wrote:
       | Kindof unrelated to the article, but I was recently wondering if
       | it would be possible to detect and deny pointer cycles in a
       | language in an efficient way, so that you could then use simple
       | reference counting instead of full-blown garbage collection.
       | 
       | It probably wouldn't be usable for a general-purpose programming
       | language, but for a special-purpose scripting language I could
       | see it making the language implementation easier.
        
         | louthy wrote:
         | On any one object you can just follow the references to see if
         | you get back to the same object. Not super efficient as you'd
         | have to do it for each reference as it is set.
         | 
         | But if it was a simple scripting language and you needed that
         | constraint, it's relativity easy to implement.
        
           | Findecanor wrote:
           | That would still be tracing. The problem is that if there is
           | a cycle, the reference count would be too high, and you'd not
           | detect that the object should be reclaimed.
        
         | creata wrote:
         | One solution is to forbid recursive data types - e.g., require
         | every struct type to only reference types that have already
         | been defined. I can't think of any languages that do this.
         | 
         | Another solution is to make things immutable (like Erlang), or
         | "as-if" immutable (like Koka), which guarantees that data can
         | only point to things that have already been defined, preventing
         | cycles.* Erlang uses this to simplify generational collection -
         | because old data can't point to young data, it doesn't need a
         | card table or anything like that.
         | 
         | I think it's perfectly possible to have a general purpose
         | language without cycles: you can just use integer indices into
         | an array instead of pointers if you want cyclic data
         | structures. This is common in Rust, when people want to avoid
         | the overhead of reference counting, but don't want to use
         | unsafe code.
         | 
         | * A hidden assumption here is that the language is eagerly
         | evaluated. There are languages like Haskell that have
         | immutability and cyclic data structures.
        
           | asplake wrote:
           | Even with cyclic relationships between types, immutability
           | makes cycles within instances difficult (without laziness
           | anyway). A syntax tree would be a good example.
        
             | creata wrote:
             | Yes, either is sufficient, I think.
             | 
             | Edit: I think the common idea with both solutions is that
             | our objects have some weak order (the order in which their
             | types were defined, and the time at which the object was
             | created, respectively), and objects are only allowed to
             | point to objects strictly less than them in this order.
        
             | xscott wrote:
             | Yes, and the nice thing about doing it with immutability is
             | you can still have recursive types to build linked lists,
             | trees, and/or dags. From there you can build hash-array-
             | mapped-tries, finger-trees, and so on, giving you friendly
             | dict/list or JSON style data structures.
        
           | daxfohl wrote:
           | Couldn't you make lazily evaluated code in erlang too, even
           | if it's not lazy by default like Haskell? You'd just need
           | function pointers, right? Or is that not enough?
        
             | creata wrote:
             | Haskell can have circular references because its laziness
             | is implemented with _thunks,_ which have a mutable cell in
             | which to store the computed value so that terms don 't get
             | evaluated more than once. Here's a Haskell function that
             | makes a circular linked list:                   -- circular
             | linked list with one item         repeat x = let xs = x:xs
             | in xs
             | 
             | Here's a rough equivalent in JavaScript that _doesn 't_ use
             | thunks, just functions:                   function
             | repeat(x) {             function xs() {
             | return [x, xs]; // [first, rest]             }
             | return xs;         }
             | 
             | The Haskell version has a cycle because, after evaluation,
             | `repeat x` will be a circular linked list, but all the
             | "lists" we create in the JavaScript code above are just the
             | closure `xs`.
             | 
             | For completeness, here's a JavaScript version that uses
             | thunks:                   class Thunk {
             | constructor(f) { this.f = f; }             get() { if
             | (this.f) { this.v = (this.f)(); delete this.f; } return
             | this.v; }         }              function repeat(x) {
             | let xs = new Thunk(() => {                 return [x, xs];
             | // [first, rest]             });             return xs;
             | }
             | 
             | If you try calling `x = repeat(1); x.get()`, you can see
             | that we get a circular list.
        
         | andreamonaco wrote:
         | Hello, I'm writing an implementation of the Common Lisp
         | language that uses an enhanced reference counting algorithm
         | (that I've taken from literature) that detects and handles
         | cycles. Performance seems okay, though I still haven't tried
         | large programs.
         | 
         | https://savannah.nongnu.org/p/alisp
        
           | zozbot234 wrote:
           | A somewhat different approach was recently proposed here:
           | https://news.ycombinator.com/item?id=44319427 but it seems to
           | have non-trivial overhead. (Still very much worthwhile, given
           | the potential advantages of deterministic cycle collection.)
           | The paper you reference is quite a bit older so it would of
           | course be interesting to do a proper comparison.
        
             | andreamonaco wrote:
             | I'll look at that. About performance: people in practice
             | have always favored GC, so I think there's a lot to be
             | discovered in optimization of reference counting
             | algorithms, including concurrent traversal (which is easier
             | because each node has local info in the form of refcounts
             | and flags) and maybe detection of problematic worse-case
             | graphs
        
             | Findecanor wrote:
             | The talk for this paper came up on YouTube just the other
             | day: https://www.youtube.com/watch?v=GwXjydSQjD8
        
         | n_plus_1_acc wrote:
         | Like Rust if it has no Rc?
        
           | tonyedgecombe wrote:
           | Rc is implemented in Rust so it would be possible to create
           | an equivalent in your own code.
        
         | Someone wrote:
         | > I was recently wondering if it would be possible to detect
         | and deny pointer cycles in a language in an efficient way
         | 
         | In general, I think that cannot be done, but if one restricts
         | what programs can do, solutions exist.
         | 
         | A simple way to do it is by requiring all references "pointing
         | out of" an object to be set the moment the object is created,
         | and be immutable afterwards (that's what Lisp _cons_
         | (https://en.wikipedia.org/wiki/Cons) does. Without _setf_ or
         | similar, lisp code cannot create cycles)
         | 
         | That disallows quite a ome code that modifies structures
         | without introducing cycles, but still allows for quite some
         | code to work.
         | 
         | One could also store an 'age' field with each object and check,
         | when a reference is updated in an object, that it points to an
         | object that is older than the one being modified. That gives
         | some more leeway, at the price of using more (a lot more, in
         | code using small objects) memory.
         | 
         | Another idea is to add a bit to each object "there are no
         | cycles containing this object", and have the runtime clear that
         | when it no longer can guarantee that (edit: unfortunately,
         | maintaining that invariant can be very costly. Whenever code
         | does _foo.field = bar_ , with both _foo_ and _bar_ known to be
         | not part of a cycle, you still have to do a search through all
         | objects reachable from _bar_ to check whether a cycle was
         | created and, if so, clear that bit in all objects in the
         | cycle(s). That makes this idea impractical)
         | 
         | If, as I suspect happens in programming languages which are
         | "mostly immutable", there are many objects for which that flag
         | stays set, that can significantly speed up checking for the
         | creation of cycles.
        
         | jlouis wrote:
         | You can make a programming language where cycles are
         | impossible. Erlang is a prime example.
         | 
         | Region inference is another strategy in this space. It can
         | limit the need for full-blown garbage collection in many cases,
         | but also comes with its own set of added trade-offs.
         | 
         | Reference counting is just a different kind of garbage
         | collection, really. It acts like a dual construction to a
         | tracing GC in many cases. If you start optimizing both, you
         | tend to converge to the same ideas over time. Refcounting isn't
         | void of e.g. latency problems either: if I have a long linked
         | list and snip the last pointer, then we have to collect all of
         | that list. That's going to take O(n) time in the size of the
         | list. For that reason, you'd have to delay collecting the large
         | list right away, which means you are converging toward a
         | tracing GC that can work simultaneously with the mutator. See
         | e.g., Go's garbage collector.
        
           | zozbot234 wrote:
           | > latency problems either: if I have a long linked list and
           | snip the last pointer, then we have to collect all of that
           | list. That's going to take O(n) time in the size of the list.
           | For that reason, you'd have to delay collecting the large
           | list right away
           | 
           | These latency issues are inherent to deterministic
           | destruction, which is an often desirable feature otherwise;
           | they have little to do with reference counting itself. In
           | principle, they can be addressed by "parking" objects for
           | which delayed disposal is non-problematic onto a separate,
           | lower-priority task.
        
           | battle-racket wrote:
           | > It acts like a dual construction to a tracing GC in many
           | cases
           | 
           | yeah one of the most helpful realizations I've read is that
           | tracing and ref counting are essentially two formulations of
           | the same problem - one is finding objects that are alive (by
           | tracing), and the other is finding things that are dead (i.e.
           | their ref counts reach zero). and of course, every object is
           | either dead or alive!
        
       | ozim wrote:
       | Question: does anyone run "Server GC" for the ASP.NET
       | applications?
       | 
       | There is bunch of people copy pasting documentation to SO
       | "explaining" server GC. I am running bunch of .NET stuff in VMs
       | and never set "Server GC" and never ran into issues with default
       | but also not sure if it is worth testing out.
       | 
       | I guess it does not matter much if you are running in containers
       | but I am running on VMs in IIS.
        
         | diggan wrote:
         | When I play around with changing various GCs for Java (via
         | Clojure), then I always setup benchmarks measuring what kind of
         | thing I want to improve, run all GCs via that benchmark to
         | chose which to use for that service/project and call it a day.
         | There is a lot of theorizing and navel-gazing around GCs it
         | seems to me, and in the end it is the results that count so
         | setup some way to measure, find the differences then move on
         | from there :)
        
         | bob1029 wrote:
         | Server GC is a tradeoff between latency and throughput. It
         | makes a ton of sense for a web server where a small additional
         | overhead of a few milliseconds on some responses won't matter.
         | 
         | Workstation GC is what you want when latency is critical. This
         | is what you'd use if you were developing a UI or game engine.
         | 
         | I've seen workstation GC stay in the microsecond region when
         | strategically executing GC.Collect at allocation batch
         | boundaries.
        
           | ozim wrote:
           | Well great but you did not write out anything more than I
           | could understand as a 15+ years developer of C#/.Net from
           | whatever all those people in Stack Overflow wrote.
           | 
           | Do you have anything like running a business line application
           | for couple of years on Server GC to write about?
        
         | Lariscus wrote:
         | Server GC is the default garbage collector for Asp.net Core.
         | 
         | >
         | https://github.com/dotnet/AspNetCore.Docs/blob/main/aspnetco...
        
           | ozim wrote:
           | Great one to see, definetly vote up!
        
       | Metalnem wrote:
       | I recently re-read this article and can confirm that it's
       | excellent--not just this specific page, but all the other
       | sections under "Garbage Collection" as well.
       | 
       | If you want to dive deeper into memory performance analysis in
       | .NET, this is another must-read: https://github.com/Maoni0/mem-
       | doc/blob/master/doc/.NETMemory...
       | 
       | It was written by Maoni Stephens, the architect of .NET's garbage
       | collection.
        
       ___________________________________________________________________
       (page generated 2025-07-12 23:01 UTC)