[HN Gopher] Preparing for the .NET 10 GC
       ___________________________________________________________________
        
       Preparing for the .NET 10 GC
        
       Author : benaadams
       Score  : 75 points
       Date   : 2025-09-24 10:37 UTC (12 hours ago)
        
 (HTM) web link (maoni0.medium.com)
 (TXT) w3m dump (maoni0.medium.com)
        
       | orphea wrote:
        | For those who, like me, were left wondering what DATAS is, here
        | is the link:
       | 
       | https://learn.microsoft.com/en-us/dotnet/standard/garbage-co...
        
         | gwbas1c wrote:
         | Yeah, I kept scrolling to the top to see if I overlooked
         | something.
         | 
         | Then I realized, "oh, it's hosted on Medium." (I generally find
         | Medium posts to be very low quality.) In this case, the author
         | implies that they are on the .Net team, so I'm continuing to
         | read.
         | 
         | (At least I hope the author actually is on the .Net team and
         | isn't blowing hot air, because it's a Medium post and not
         | something from an official MS blog.)
        
           | olidb wrote:
            | Maoni Stephens is indeed on the .net team and has been, as
            | far as I know, the lead architect of the .net garbage
            | collector for many years: https://github.com/Maoni0
            | https://devblogs.microsoft.com/dotnet/author/maoni/
           | 
           | Therefore she's probably the person with the most knowledge
           | about the .net GC but maybe not the best writer (I haven't
           | read the article yet).
        
             | moomin wrote:
             | The writing itself is fine, but she's assuming a LOT of
             | knowledge e.g. what a GC0 budget is and what increasing it
             | means.
        
               | lomase wrote:
               | I did not even know Server GC was a thing.
        
         | bob1029 wrote:
         | > Maximum throughput (measured in RPS) shows a 2-3% reduction,
         | but with a working set improvement of over 80%.
         | 
         | I have a hard time finding this approach compelling. The amount
         | of additional GC required in their example seems extreme to me.
        
       | bilekas wrote:
        | It's incredibly frustrating that the author doesn't actually say
        | "Garbage Collector (GC)". I'm aware, but something niggling in
        | the back of my head had me second-guessing.
        
         | nu11ptr wrote:
         | Even worse: they don't explain what the DATAS acronym means.
         | Seems like the author makes too many assumptions about the
         | knowledge base of their reader IMO.
        
           | stonemetal12 wrote:
           | I am guessing he doesn't expect linking from outside. The
           | blog post before this one starts: "In this blog post I'm
           | going to talk about our .NET 8.0 GC feature called DATAS
           | (Dynamic Adaptation To Application Sizes)."
        
           | Akronymus wrote:
           | Because everyone knows at least the formulas for quartz, of
           | course
           | 
           | https://xkcd.com/2501/
        
           | graycat wrote:
           | Basic rule in technical writing: For obscure terminology,
           | always define and explain that (e.g., maybe with a reference)
           | before using it.
        
       | gwbas1c wrote:
       | This post would carry a lot more authority if it was on an
       | official MS or .net blog; instead of Medium. (I typically
       | associate Medium with low-quality blog entries and don't read
       | them.)
        
         | justin66 wrote:
         | Or if the author used their real name.
        
           | giancarlostoro wrote:
           | Agree. It's not like a blogpost that is about grey hat
           | subjects or something.
        
           | artimaeis wrote:
           | For what it's worth, Maoni is the author's real name. Maoni0
           | is what they go by everywhere. You can find interviews and
           | plenty of their other content if you search around a bit.
        
             | justin66 wrote:
             | Using a handle instead of their full name on an article is
             | a choice. The first impression is not "knowledgeable
             | employee making post about company's product."
             | 
             | Posting from a Microsoft blog would to some extent fix
             | this, to the OP's point.
             | 
             | (I know - who cares. But first impressions are what they
             | are)
        
         | nu11ptr wrote:
         | I don't generally find them low quality, but I do wish people
         | wouldn't use it since I don't subscribe to it.
        
           | giancarlostoro wrote:
            | It's the Pinterest of blogs; it's really annoying.
        
         | giancarlostoro wrote:
          | More so if the author's profile picture wasn't what looks like
          | a memecat. I can't exactly share this around without feeling
          | like they'll judge it based on that alone.
        
           | pestkranker wrote:
           | Maoni0 is the mastermind behind the .NET GC. They won't judge
           | you, and if they do, that's their problem.
        
             | giancarlostoro wrote:
             | Sure, but how many people truly know this? I love knowing
             | about people who are key contributors to the industry. I
             | run into a lot of walls when trying to talk to 99% of my
             | coworkers about any of them.
        
           | bitwize wrote:
           | I think that might be her cat?
        
         | pjmlp wrote:
          | The author is one of the main GC architects on .NET, so those
          | of us in the know are aware of who she is.
         | 
         | Here is an interview with her,
         | 
         | https://www.youtube.com/watch?v=ujkSnko0JNQ
         | 
          | Having said this, I agree with you; the Aspire/MAUI architects
          | do the same. I really don't get why we have to search for these
          | kinds of blog posts on other platforms instead of DevBlogs.
        
           | deburo wrote:
           | They should probably cross-post in both.
        
           | lomase wrote:
            | Microsoft likes to just delete this kind of blog from its
            | site.
            | 
            | Many times I have found an interesting article, only for it
            | to be just not accessible.
            | 
            | Maybe that is why people stopped using it.
        
         | omnicognate wrote:
         | Microsoft have a terrible track record for moving and deleting
         | technical content, to the extent I think I'd rather their
         | developers host their articles almost anywhere else.
         | 
         | Maoni Stephens is the lead developer on the .NET garbage
         | collector. An "About" entry would probably help, but she has a
         | lot of name recognition in the .NET community and in the
         | article it's clear from the first sentence that she's talking
         | from the perspective of owning the GC.
        
       | gwbas1c wrote:
       | One anecdote from working with .Net for over 20 years: I've had a
       | few situations where someone (who isn't a programmer and/or
       | doesn't work with .Net) insists that the application has a memory
       | leak.
       | 
       | First, I explain that garbage collected applications don't
       | release memory immediately. Then I get sucked into a wild goose
       | chase looking for a memory leak that doesn't exist. Finally, I
       | point out that the behavior they see is normal, usually to some
       | grumbling.
       | 
       |  _From what I can tell, DATAS basically makes a .Net application
       | have a normal memory footprint._ Otherwise, .Net is quite a pig
       | when it comes to memory. https://github.com/GWBasic/soft_matrix,
       | implemented in Rust, generally has very low memory consumption.
       | An earlier version that I wrote in C# would consume gigabytes of
        | memory (and often run out of memory when run on Mono with the
        | Boehm garbage collector).
       | 
       | ---
       | 
       | > If startup perf is critical, DATAS is not for you
       | 
       | This is one of my big frustrations with .net, (although I tend to
       | look at how dependency injection is implemented as a bigger
       | culprit.)
       | 
       | It does make me wonder: How practical is it to just use
       | traditional reference counting and then periodically do a mark-
       | and-sweep? I know it's a very different approach than .net was
       | designed for. (Because they deliberately decided that
       | dereferencing an object should have no computational cost.) It's
       | more of a rhetorical question.
        
         | nu11ptr wrote:
         | > It does make me wonder: How practical is it to just use
         | traditional reference counting and then periodically do a mark-
         | and-sweep? I know it's a very different approach than .net was
         | designed for. (Because they deliberately decided that
         | dereferencing an object should have no computational cost.)
         | It's more of a rhetorical question.
         | 
         | This is what CPython does. The trade off is solidly worse
         | allocator performance, however. You also have the reference
         | counting overhead, which is not trivial unless it is deferred.
         | 
         | There is always a connection between the allocator and
         | collector. If you use a compacting collector (which I assumed
         | .NET does), you get bump pointer allocation, which is very
         | fast. However, if you use a non-compacting collector (mark-and-
         | sweep is non-compacting), you would then fallback to a normal
         | free list allocator (aka as "malloc") which has solidly higher
         | overhead. You can see the impact of this (and reference
         | counting) in any benchmark that builds a tree (and therefore is
         | highly contended on allocation). This is also why languages
         | that use free list allocation often have some sort of "arena"
         | library, so they can have high speed bump pointer allocation in
         | hot spots (and then free all that memory at once later on).
         | 
          | BTW, reference counting and malloc/free performance also
          | impact Rust, but given Rust's heavy reliance on the stack it
          | often doesn't matter much (i.e., Rust simply does fewer
          | allocations). For allocation-heavy code, many of us use
          | MiMalloc, one of the better malloc/free implementations.
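          | 
          | A toy sketch of the bump pointer idea (Python here, purely
          | illustrative; a real arena hands out raw memory, but the point
          | is that allocation is one bounds check plus one add):

```python
# Toy bump-pointer arena: one preallocated region plus an offset.
# alloc() just bumps the offset -- no free-list search, no per-object
# bookkeeping -- and reset() releases everything at once.
class Arena:
    def __init__(self, capacity):
        self.capacity = capacity
        self.offset = 0              # the "bump pointer"

    def alloc(self, size):
        if self.offset + size > self.capacity:
            raise MemoryError("arena exhausted")
        start = self.offset
        self.offset += size          # the entire allocation path
        return start                 # offset of the new slot

    def reset(self):
        self.offset = 0              # free the whole arena in O(1)


arena = Arena(1024)
first = arena.alloc(64)    # -> 0
second = arena.alloc(128)  # -> 64
arena.reset()              # both gone at once, no per-object frees
```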
        
           | gwbas1c wrote:
           | So basically you're trading lowering RAM consumption for
           | higher CPU consumption?
           | 
           | FWIW: When I look at Azure costs, RAM tends to cost more than
           | CPU. So the tradeoffs of using a "slower" memory manager
           | might be justified.
        
             | nu11ptr wrote:
             | It depends on workload. It is difficult to quantify the
             | trade offs without knowing that.
             | 
             | The problem is in languages like C#/Java almost everything
             | is an allocation, so I don't really think reference
              | counting would work well there. I suspect this is the
              | reason PyPy doesn't use reference counting; it is a big
              | slowdown for CPython. Reference counting really only works
             | well in languages with low allocations. Go mostly gets away
             | with a non-compacting mark-sweep collector because it has
             | low level control that allows many things to sit on the
             | stack (like Rust/C/C++, etc.).
        
               | adgjlsfhk1 wrote:
               | C# is a lot better than Java on this front since they
               | support stack allocated structs
        
           | whaleofatw2022 wrote:
           | Dotnet does both mark and sweep as well as compaction,
           | depends on what type of GC happens.
        
             | gwbas1c wrote:
             | In this case, we're discussing a case where mark-and-sweep
             | is used to collect cyclic references, and it's implied that
             | there are no generations. (Because otherwise, purely
             | relying on reference counting means that cyclic references
             | end up leaking unless things like weak references are
             | used.)
             | 
             | IE, the critical difference is that reference counting
             | frees memory immediately; albeit at a higher CPU cost and
             | needing to still perform a mark-and-sweep to clear out
             | cyclic references.
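              | 
              | CPython (mentioned upthread) is the textbook demo of this
              | split: refcounting reclaims acyclic garbage the instant a
              | count hits zero, while cycles wait for the periodic
              | collector. An illustrative sketch:

```python
import gc
import weakref

gc.disable()  # keep the cycle collector from running on its own

class Node:
    pass

# Acyclic: the refcount hits zero on `del`, freed immediately.
a = Node()
ref_a = weakref.ref(a)
del a
assert ref_a() is None          # reclaimed by refcounting alone

# Cyclic: the two counts never reach zero, so only the periodic
# mark-style pass (gc.collect) can reclaim the pair.
b, c = Node(), Node()
b.other, c.other = c, b
ref_b = weakref.ref(b)
del b, c
assert ref_b() is not None      # still alive, held only by the cycle
gc.collect()                    # the "clear out cyclic references" pass
assert ref_b() is None

gc.enable()
```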
        
         | SideburnsOfDoom wrote:
         | > First, I explain that garbage collected applications don't
         | release memory immediately. ... I point out that the behavior
         | they see is normal
         | 
          | yes, this is an easily overlooked point: using memory that
          | would otherwise go free is by design. It is often better to
          | use up cheap, unused memory than to spend expensive CPU doing
          | a GC. When memory is plentiful, as it often is, it is faster
          | to just not run a GC yet.
          | 
          | You're not in trouble unless you run short of memory and a
          | necessary GC does not free up enough. Only then can you call
          | it an issue.
        
         | kg wrote:
         | One of the main problems with refcounting is that unless your
         | compiler/JIT are able to safely, aggressively optimize out
         | reference increment/decrements, you can spend a ton of CPU time
         | pointlessly bumping a counter up and down every time you enter
         | a new function/method. This has been a problem for ObjC and
         | Swift applications in the past AFAIK, though both of those
         | compilers do a great job of optimizing that stuff out where
         | possible.
         | 
         | There are some other things that would probably be improvements
         | coming along with refcounting though - you might be able to get
         | rid of GC write barriers.
        
         | bob1029 wrote:
         | To be fair, there is an entire class of GC/memory problems that
         | aren't _technically_ a leak but manifest in effectively the
         | same way.
         | 
         | The most common one I see is LOH (Large Object Heap)
         | fragmentation. When objects are promoted to the LOH the runtime
          | doesn't bother with moving them around anymore. There is a way
         | to explicitly compact the LOH but it can be a non-starter for a
         | lot of applications.
         | 
         | https://learn.microsoft.com/en-us/dotnet/api/system.runtime....
         | 
          | I once exposed this as a button that a customer's IT
         | department could click whenever they received an alert on
         | memory utilization. The actual solution would have been to
         | refactor the entire product to not pass gigantic blobs around
         | all the time, but that wasn't in the cards for us.
        
         | WorldMaker wrote:
         | > From what I can tell, DATAS basically makes a .Net
         | application have a normal memory footprint.
         | 
         | In _Server environments_. DATAS is an upgrade to garbage
         | collection in  "Server mode". Server GC assumed it could be the
         | only thing running on a machine and could use as much memory as
         | it wanted and so would just easily over-allocate memory much
         | more than what it immediately needed. (As the article points
         | out, it would start at a large fixed amount of memory times the
         | number of CPU cores.)
         | 
         | (As opposed to "Workstation GC" which has always tried to
         | minimize memory consumption because it assumes it is running as
         | only one of many apps on an end user system.)
         | 
          | > (and often run out of memory when run on Mono with the
          | Boehm garbage collector.)
         | 
         | Not exactly a fair comparison between .NET's actual GC and
         | Mono's old simpler GC before the merger. (Today's .NET shares
         | the same GC on Windows and Linux [and macOS].)
         | 
         | > This is one of my big frustrations with .net, (although I
         | tend to look at how dependency injection is implemented as a
         | bigger culprit.)
         | 
         | Startup times have gotten a lot better in recent versions of
         | .NET, AOT compiling has much improved (especially compared to
         | the ancient ngen for anyone old enough to remember needing to
         | use that for startup optimization), and while I agree .NET has
         | seen a lot of terrible DI implementations the out-of-the-box
         | one in Microsoft.Extensions does a lot of things right now,
         | including avoiding a lot of Reflection in standard usage which
         | was the big thing slowing down older DI systems. (I've seen
         | people add Reflection based "helpers" back on top of the
         | Microsoft.Extensions DI, but at that point that is a user
         | problem more than a DI problem.)
         | 
         | > It does make me wonder: How practical is it to just use
         | traditional reference counting and then periodically do a mark-
         | and-sweep?
         | 
         | Technically the "mark" of "mark-and-sweep" can be implemented
         | as traditional reference counting (and some of the earliest
         | "mark-and-sweep" implementations did just that). It still only
         | solves half the problem, though. Also, the optimizations made
         | by modern "mark" systems come from that you don't need detailed
         | counts, you just need tools equivalent to Bloom filters (what's
         | the probability this is referenced at least once) and those can
         | be much faster/more efficient to compute and use a lot less
         | memory space than reference counters while doing that.
         | 
         | If your concern is total memory consumption, traditional
         | reference counting uses more space (if only just to store
         | counts), and by itself doesn't solve fragmentation (the "sweep"
         | part of "mark-and-sweep"). From a practical standpoint,
         | combining "traditional reference counting" and a "mark-and-
         | sweep" sounds to me like asking for a less efficient "mark-
         | twice-and-sweep" algorithm.
        
           | gwbas1c wrote:
           | See https://news.ycombinator.com/item?id=45360318 (if you
           | didn't read it already)
           | 
           | The important point:
           | 
           | > IE, the critical difference is that reference counting
           | frees memory immediately; albeit at a higher CPU cost and
           | needing to still perform a mark-and-sweep to clear out cyclic
           | references.
           | 
           | Regarding:
           | 
           | > If your concern is total memory consumption, traditional
           | reference counting uses more space (if only just to store
           | counts)
           | 
           | But it also frees memory immediately, meaning that many
           | processes will appear to use less memory (unless
           | fragmentation is an issue.)
           | 
           | Don't forget that GC often adds memory overhead too: IE, mark
           | and sweep sets a generation counter in each object that it
           | can reach, and then objects that weren't updated are
           | reclaimed.
        
             | WorldMaker wrote:
             | > But it also frees memory immediately, meaning that many
             | processes will appear to use less memory (unless
             | fragmentation is an issue.)
             | 
              | I think where we disagree is that I of course do assume
             | fragmentation is an issue, and also maybe what
             | "immediately" means in this case. The type of total memory
             | consumption that matters when you look in say Task Manager
             | is when entire pages of memory are returned to the OS, not
             | when individual objects are marked unused/free. In
             | practical concerns, fragmentation will always delay entire
             | pages returning to the OS. Reference counted languages
             | build a lot of tricks to avoid fragmentation sure, but then
             | if you are also trying to use a "mark-and-sweep" heap you
             | lost most of those optimizations in part because you are
             | then already assuming fragmentation is a problem to solve.
             | 
             | > Don't forget that GC often adds memory overhead too: IE,
             | mark and sweep sets a generation counter in each object
             | that it can reach
             | 
             | I did mention it, but also that GCs have advanced from
             | "include a generation counter in each object" to things
             | like generation bitmaps where that data is stored outside
             | of the objects themselves and then from there further
             | optimized into even more "compressed" forms ala Bloom
             | filters (they maybe don't track every object, but every
             | cluster of objects, or just objects crossing generation
             | boundaries, or they use hash buckets and probability
             | analysis, and many of these structures don't need to be
             | permanent but are transient only during specific types of
             | garbage collections; there has been a lot of work in the
             | space and many decades of efficiencies studied and built).
             | It's still overhead, but it is now a very different class
             | of overhead from reference counts.
        
       | jcmontx wrote:
       | But don't you take a hit in performance by running the GC more
       | often?
        
         | NetMageSCW wrote:
          | Not necessarily: with more (and therefore smaller) heaps,
          | each GC takes less time.
        
         | stonemetal12 wrote:
          | Maybe, maybe not. If GC is O(n^2), then running it twice at
          | n=5 is a much shorter run time than running it once at n=10.
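          | 
          | A back-of-the-envelope version of that argument (the
          | quadratic cost is a hypothetical model, not the real GC's
          | complexity):

```python
# Hypothetical cost model: one GC pass over n live objects costs n**2.
def pass_cost(n):
    return n ** 2

two_small_passes = pass_cost(5) + pass_cost(5)  # 25 + 25 = 50
one_big_pass = pass_cost(10)                    # 100
assert two_small_passes < one_big_pass          # more GCs, less total work
```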
        
       | daxfohl wrote:
       | Maybe I missed it, but is there a shadow mode to estimate the
       | memory and perf impact without actually enabling the feature? Or
       | better yet, a way to analyze existing dotnet 8 GC logs to
       | understand the approx impact?
        
       | graycat wrote:
       | For the author, some definitions:
       | 
       | GC? -- Maybe "Garbage Collection", i.e., have some memory (mainly
       | computer _main memory_ ) allocated, don't need it (just now or
       | forever), and want to _release_ it, i.e., no longer have it
       | allocated for its original purpose. By releasing can make it
       | available for other purposes, software _threads_ , programs,
       | virtual machines, etc.
       | 
       | DATAS? -- Not a spelling error or about any usual meaning for
       | data and instead is as in
       | 
       | https://learn.microsoft.com/en-us/dotnet/standard/garbage-co...
       | 
       | for "Dynamic adaptation to application sizes"
       | 
       | So, we're trying to take actions over time in response to some
       | inputs that are in some respects unpredictable.
       | 
       | Okay, what is the _objective_ , i.e., the reason, what we hope to
       | gain, or why bother?
       | 
       | And for the part that is somewhat _unpredictable_ over time, that
       | 's one or more _stochastic processes_ (or one _multidimensional_
       | stochastic process?).
       | 
       | So, in broad terms, we are interested in stochastic optimal
       | control. "Dynamic adaptation", is close and also close to one
       | method, _dynamic programming_ -- in an earlier thread at Hacker
       | News, gave a list of references. Confession, wrote my applied
       | math Ph.D. dissertation in that subject.
       | 
       | Hmm, how to proceed??? Maybe, (A) Know more about the context,
       | e.g., what the computer is doing, what's to be minimized or
       | maximized. (B) Collect some data on the way to knowing more about
       | the stochastic processes involved.
       | 
       | For me, how to get paid? If tried to make a living from applied
       | stochastic optimal control, would have died from starvation. Got
       | the Ph.D. JUST to be better prepared as an employee for such
       | problems and had to learn that NO one, not even one in the
       | galaxy, cares as much as one photon of ~1 Hz light.
       | 
       | So, am starting a business heavily in computing and applied math.
       | The code from Microsoft tools is all in .NET, ASP.NET, ADO.NET,
       | etc. Code runs fine. The .NET software, via the VB.NET _syntactic
       | sugar_ , is GREAT for writing the code.
       | 
       | So, MUST keep up on Microsoft tools, and here just did that.
       | Since .NET 10 is changing some versions of Windows, my reaction
       | is (i) add a lot of main memory until GC is nearly irrelevant,
       | (ii) in general, wait a few years to give Microsoft time to fix
       | problems, i.e., usually be a few years behind the latest
       | versions, i.e., to "Prepare for .NET 10", first wait a few years.
       | 
       | Experience: At one time, saw some server farms big on
       | reliability. One site had two of everything, one for the real
       | work and another to test the latest for bugs before being used
       | for real work. Another had their own electrical power, Diesel
       | generators ~30 feet high, a second site duplicating everything,
       | ~400 miles away, with every site with lots of redundancy. In such
       | contexts, working hard and taking risks trying to save money on
       | main memory seem unwise.
        
       | kg wrote:
       | Some translations for acronyms and terms from this post (sourced
       | from the glossary in dotnet/runtime along with source code
       | grepping):
       | 
       | GC: Garbage Collector
       | 
       | DATAS: Dynamic adaptation to application sizes
       | 
       | UOH: User Old Heap. I can't find an explanation for what this is.
       | 
       | LOH: Large Object Heap. This is where allocations over a size
       | threshold go in .NET.
       | 
       | POH: Pinned Object Heap. Pinning is used to stop an object in the
       | GC's memory from being moved around by the GC (for compaction).
       | 
       | ASP.net: Active Server Pages for .NET. This is a framework for
       | building web applications using .NET, a successor to the classic
       | ASP which was built on COM and scripting languages like
       | JScript/VBScript.
       | 
       | Workstation / Server GC: .NET has two major GC modes which have
       | different configurations for things like having per-cpu-core
       | segregated heaps, doing background or foreground GCs, etc. This
       | is designed to optimize for different workloads, like running a
       | webserver vs a graphical application.
       | 
       | Ephemeral GC / Ephemeral generation: To quote the docs:
       | 
       | > For small objects the heap is divided into 3 generations: gen0,
       | gen1 and gen2. For large objects there's one generation - gen3.
       | Gen0 and gen1 are referred to as ephemeral (objects lasting for a
       | short time) generations.
       | 
       | Essentially, generation 0 or gen0 is where brand new objects
       | live. If the GC sees that gen0 objects have survived when it does
       | a collection, it promotes them to gen1, and then they will
       | eventually get promoted to gen2. Most temporary objects live and
       | die in gen0.
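        | 
        | CPython's collector is generational in the same spirit and
        | exposes enough introspection to watch promotion happen (an
        | analogy only; .NET's generations differ under the hood, and
        | gc.get_objects(generation=...) needs Python 3.8+):

```python
import gc

gc.disable()   # make collections explicit
gc.collect()   # start from a clean slate

survivor = []  # freshly allocated tracked object: begins in gen0
assert any(o is survivor for o in gc.get_objects(generation=0))

gc.collect(0)  # an "ephemeral" gen0-only pass; survivors are promoted
assert not any(o is survivor for o in gc.get_objects(generation=0))
older = gc.get_objects(generation=1) + gc.get_objects(generation=2)
assert any(o is survivor for o in older)   # promoted out of gen0

gc.enable()
```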
       | 
       | Pause time: Most garbage collectors will need to pause the whole
       | application in order to run, though they may not need the
       | application to stay paused the whole time they are working. So
       | pause time and % pause time track how much time the application
       | spends paused for the GC to do its job; ideally these values are
       | low.
       | 
       | BCD: Quoting the post:
       | 
       | > 1) introduced a concept of "Budget Computed via DATAS (BCD)"
       | which is calculated based on the application size and gives us an
       | upper bound of the gen0 budget for that size, which can
       | approximate the generation size for gen0
       | 
       | Essentially, this is an estimate of how much space the ephemeral
       | generation (temporary objects plus some extras) is using.
       | 
       | TCP: Quoting the post again:
       | 
       | > 2) within this upper bound, we can further reduce memory if we
       | can still maintain reasonable performance. And we define this
       | "reasonable performance" with a target Throughput Cost Percentage
       | (TCP). This takes into consideration both GC pauses and how much
       | allocating threads have to wait.
        
         | moomin wrote:
         | Good guide. I _think_ UOH is "unpinned object heap", which is a
         | variant of the large object heap that allows compaction. So the
         | only things going into the LOH these days are both large and
         | pinned. But I'm not 100% on this.
        
           | orphea wrote:
           | The parent is correct, UOH is User Old Heap.
           | 
           | https://devblogs.microsoft.com/dotnet/internals-of-the-poh/
        
       ___________________________________________________________________
       (page generated 2025-09-24 23:01 UTC)