[HN Gopher] Preparing for the .NET 10 GC
___________________________________________________________________
Preparing for the .NET 10 GC
Author : benaadams
Score : 75 points
Date : 2025-09-24 10:37 UTC (12 hours ago)
(HTM) web link (maoni0.medium.com)
(TXT) w3m dump (maoni0.medium.com)
| orphea wrote:
| For those who, like me, were left wondering what DATAS is, here
| is the link:
|
| https://learn.microsoft.com/en-us/dotnet/standard/garbage-co...
| gwbas1c wrote:
| Yeah, I kept scrolling to the top to see if I overlooked
| something.
|
| Then I realized, "oh, it's hosted on Medium." (I generally find
| Medium posts to be very low quality.) In this case, the author
| implies that they are on the .Net team, so I'm continuing to
| read.
|
| (At least I hope the author actually is on the .Net team and
| isn't blowing hot air, because it's a Medium post and not
| something from an official MS blog.)
| olidb wrote:
| Maoni Stephens is indeed on the .net team and has, as far as
| I know, been the lead architect of the .net garbage collector
| for many years: https://github.com/Maoni0
| https://devblogs.microsoft.com/dotnet/author/maoni/
|
| Therefore she's probably the person with the most knowledge
| about the .net GC but maybe not the best writer (I haven't
| read the article yet).
| moomin wrote:
| The writing itself is fine, but she's assuming a LOT of
| knowledge, e.g. what a gen0 budget is and what increasing it
| means.
| lomase wrote:
| I did not even know Server GC was a thing.
| bob1029 wrote:
| > Maximum throughput (measured in RPS) shows a 2-3% reduction,
| but with a working set improvement of over 80%.
|
| I have a hard time finding this approach compelling. The amount
| of additional GC required in their example seems extreme to me.
| bilekas wrote:
| It's incredibly frustrating that the author doesn't actually
| write out "Garbage Collector (GC)". I'm aware of what it means,
| but something niggling in the back of my head had me
| second-guessing.
| nu11ptr wrote:
| Even worse: they don't explain what the DATAS acronym means.
| Seems like the author makes too many assumptions about the
| knowledge base of their reader IMO.
| stonemetal12 wrote:
| I am guessing she doesn't expect linking from outside. The
| blog post before this one starts: "In this blog post I'm
| going to talk about our .NET 8.0 GC feature called DATAS
| (Dynamic Adaptation To Application Sizes)."
| Akronymus wrote:
| Because everyone knows at least the formulas for quartz, of
| course
|
| https://xkcd.com/2501/
| graycat wrote:
| Basic rule in technical writing: For obscure terminology,
| always define and explain that (e.g., maybe with a reference)
| before using it.
| gwbas1c wrote:
| This post would carry a lot more authority if it were on an
| official MS or .net blog instead of Medium. (I typically
| associate Medium with low-quality blog entries and don't read
| them.)
| justin66 wrote:
| Or if the author used their real name.
| giancarlostoro wrote:
| Agree. It's not like a blogpost that is about grey hat
| subjects or something.
| artimaeis wrote:
| For what it's worth, Maoni is the author's real name. Maoni0
| is what they go by everywhere. You can find interviews and
| plenty of their other content if you search around a bit.
| justin66 wrote:
| Using a handle instead of their full name on an article is
| a choice. The first impression is not "knowledgeable
| employee making post about company's product."
|
| Posting from a Microsoft blog would to some extent fix
| this, to the OP's point.
|
| (I know - who cares. But first impressions are what they
| are)
| nu11ptr wrote:
| I don't generally find them low quality, but I do wish people
| wouldn't use it since I don't subscribe to it.
| giancarlostoro wrote:
| It's the Pinterest of blogs; it's really annoying.
| giancarlostoro wrote:
| More so if the author's profile picture wasn't what looks like
| a memecat. I can't exactly share this around without feeling
| like they'll judge it based on that alone.
| pestkranker wrote:
| Maoni0 is the mastermind behind the .NET GC. They won't judge
| you, and if they do, that's their problem.
| giancarlostoro wrote:
| Sure, but how many people truly know this? I love knowing
| about people who are key contributors to the industry. I
| run into a lot of walls when trying to talk to 99% of my
| coworkers about any of them.
| bitwize wrote:
| I think that might be her cat?
| pjmlp wrote:
| The author is one of the main GC architects on .NET, so those
| of us in the know are aware of who she is.
|
| Here is an interview with her,
|
| https://www.youtube.com/watch?v=ujkSnko0JNQ
|
| Having said this, I agree with you; the Aspire/MAUI architects
| do the same. I really don't get why we have to search for this
| kind of blog post on other platforms instead of DevBlogs.
| deburo wrote:
| They should probably cross-post in both.
| lomase wrote:
| Microsoft likes to just delete this kind of blog post from
| its site.
|
| Many times I have found an interesting article, only for it to
| be just not accessible.
|
| Maybe that is why people stopped using it.
| omnicognate wrote:
| Microsoft have a terrible track record for moving and deleting
| technical content, to the extent I think I'd rather their
| developers host their articles almost anywhere else.
|
| Maoni Stephens is the lead developer on the .NET garbage
| collector. An "About" entry would probably help, but she has a
| lot of name recognition in the .NET community and in the
| article it's clear from the first sentence that she's talking
| from the perspective of owning the GC.
| gwbas1c wrote:
| One anecdote from working with .Net for over 20 years: I've had a
| few situations where someone (who isn't a programmer and/or
| doesn't work with .Net) insists that the application has a memory
| leak.
|
| First, I explain that garbage collected applications don't
| release memory immediately. Then I get sucked into a wild goose
| chase looking for a memory leak that doesn't exist. Finally, I
| point out that the behavior they see is normal, usually to some
| grumbling.
|
| _From what I can tell, DATAS basically makes a .Net application
| have a normal memory footprint._ Otherwise, .Net is quite a pig
| when it comes to memory. https://github.com/GWBasic/soft_matrix,
| implemented in Rust, generally has very low memory consumption.
| An earlier version that I wrote in C# would consume gigabytes of
| memory (and often run out of memory when run on Mono with the
| Boehm garbage collector.)
|
| ---
|
| > If startup perf is critical, DATAS is not for you
|
| This is one of my big frustrations with .net, (although I tend to
| look at how dependency injection is implemented as a bigger
| culprit.)
|
| It does make me wonder: How practical is it to just use
| traditional reference counting and then periodically do a mark-
| and-sweep? I know it's a very different approach than .net was
| designed for. (Because they deliberately decided that
| dereferencing an object should have no computational cost.) It's
| more of a rhetorical question.
| nu11ptr wrote:
| > It does make me wonder: How practical is it to just use
| traditional reference counting and then periodically do a mark-
| and-sweep? I know it's a very different approach than .net was
| designed for. (Because they deliberately decided that
| dereferencing an object should have no computational cost.)
| It's more of a rhetorical question.
|
| This is what CPython does. The trade off is solidly worse
| allocator performance, however. You also have the reference
| counting overhead, which is not trivial unless it is deferred.
|
| There is always a connection between the allocator and
| collector. If you use a compacting collector (which I assume
| .NET does), you get bump-pointer allocation, which is very
| fast. However, if you use a non-compacting collector (mark-and-
| sweep is non-compacting), you then fall back to a normal free-
| list allocator (a.k.a. "malloc"), which has solidly higher
| overhead. You can see the impact of this (and reference
| counting) in any benchmark that builds a tree (and therefore is
| highly contended on allocation). This is also why languages
| that use free list allocation often have some sort of "arena"
| library, so they can have high speed bump pointer allocation in
| hot spots (and then free all that memory at once later on).
|
| BTW, reference counting and malloc/free performance also affect
| Rust, but given Rust's heavy reliance on the stack it often
| doesn't impact performance much (i.e., Rust simply does fewer
| allocations). For allocation-heavy code, many of us use
| mimalloc, one of the better malloc/free implementations.
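The bump-pointer-versus-free-list point above can be sketched concretely. Here is a minimal arena in Python (purely illustrative; the `Arena` class is hypothetical, and real allocators bump a raw pointer, not an index into a bytearray):

```python
class Arena:
    """Toy bump-pointer arena: each allocation is one pointer increment."""

    def __init__(self, size):
        self.buf = bytearray(size)
        self.offset = 0

    def alloc(self, n):
        if self.offset + n > len(self.buf):
            raise MemoryError("arena exhausted")
        start = self.offset
        self.offset += n                  # "bump" -- the entire cost
        return memoryview(self.buf)[start:start + n]

    def reset(self):
        self.offset = 0                   # free everything at once, O(1)


arena = Arena(1024)
a = arena.alloc(16)                       # adjacent allocations: no free
b = arena.alloc(32)                       # list, no per-block headers
arena.reset()                             # bulk free, as with arena libraries
```

A free-list allocator, by contrast, must search for a suitably sized hole on every call and track each block individually, which is where the "solidly higher overhead" comes from.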
| gwbas1c wrote:
| So basically you're trading higher CPU consumption for lower
| RAM consumption?
|
| FWIW: When I look at Azure costs, RAM tends to cost more than
| CPU. So the tradeoffs of using a "slower" memory manager
| might be justified.
| nu11ptr wrote:
| It depends on workload. It is difficult to quantify the
| trade offs without knowing that.
|
| The problem is that in languages like C#/Java almost everything
| is an allocation, so I don't really think reference counting
| would work well there. I suspect this is the reason PyPy
| doesn't use reference counting: it is a big slowdown for
| CPython. Reference counting really only works well in languages
| with few allocations. Go mostly gets away with a non-compacting
| mark-sweep collector because it has low-level control that
| allows many things to sit on the stack (like Rust/C/C++, etc.).
| adgjlsfhk1 wrote:
| C# is a lot better than Java on this front since it supports
| stack-allocated structs.
| whaleofatw2022 wrote:
| Dotnet does both mark-and-sweep and compaction, depending on
| what type of GC happens.
| gwbas1c wrote:
| In this case, we're discussing a case where mark-and-sweep
| is used to collect cyclic references, and it's implied that
| there are no generations. (Because otherwise, purely
| relying on reference counting means that cyclic references
| end up leaking unless things like weak references are
| used.)
|
| IE, the critical difference is that reference counting
| frees memory immediately; albeit at a higher CPU cost and
| needing to still perform a mark-and-sweep to clear out
| cyclic references.
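CPython happens to use exactly this hybrid (reference counting plus a periodic cycle collector), which makes the difference easy to demonstrate:

```python
import gc
import weakref

class Node:
    pass

gc.disable()                      # keep the demo deterministic

# Acyclic: refcounting frees the object the instant the last
# strong reference dies -- no collector run needed.
n = Node()
gone = weakref.ref(n)
del n
print(gone() is None)             # True: freed immediately

# Cyclic: refcounts never reach zero, so only the periodic
# mark-and-sweep-style cycle collector can reclaim the pair.
a, b = Node(), Node()
a.other, b.other = b, a
gone = weakref.ref(a)
del a, b
print(gone() is None)             # False: the cycle keeps both alive
gc.collect()
print(gone() is None)             # True: cycle collector reclaimed them
gc.enable()
```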
| SideburnsOfDoom wrote:
| > First, I explain that garbage collected applications don't
| release memory immediately. ... I point out that the behavior
| they see is normal
|
| yes, this is an easily overlooked point: using memory that
| would otherwise go free is by design. It is often better to use
| up cheap, unused memory than to spend expensive CPU doing a GC.
| When memory is plentiful, as it often is, it is faster to just
| not run a GC yet.
|
| You're not in trouble unless you run short of memory and a
| necessary GC does not free up enough. Only then can you call it
| an issue.
| kg wrote:
| One of the main problems with refcounting is that unless your
| compiler/JIT are able to safely, aggressively optimize out
| reference increment/decrements, you can spend a ton of CPU time
| pointlessly bumping a counter up and down every time you enter
| a new function/method. This has been a problem for ObjC and
| Swift applications in the past AFAIK, though both of those
| compilers do a great job of optimizing that stuff out where
| possible.
|
| There are some other things that would probably be improvements
| coming along with refcounting though - you might be able to get
| rid of GC write barriers.
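The per-call counter traffic is observable in CPython, which does not optimize it away for ordinary calls (an illustration of the general cost, not a statement about .NET, ObjC, or Swift):

```python
import sys

def probe(obj):
    # Binding obj as a parameter took a refcount increment;
    # returning from the call will drop it again. Every call pays.
    return sys.getrefcount(obj)

x = object()
outside = sys.getrefcount(x)
inside = probe(x)
print(inside > outside)   # the call itself added at least one reference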
| bob1029 wrote:
| To be fair, there is an entire class of GC/memory problems that
| aren't _technically_ a leak but manifest in effectively the
| same way.
|
| The most common one I see is LOH (Large Object Heap)
| fragmentation. When objects are promoted to the LOH, the
| runtime doesn't bother with moving them around anymore. There
| is a way
| to explicitly compact the LOH but it can be a non-starter for a
| lot of applications.
|
| https://learn.microsoft.com/en-us/dotnet/api/system.runtime....
|
| I've once exposed this as a button that a customer's IT
| department could click whenever they received an alert on
| memory utilization. The actual solution would have been to
| refactor the entire product to not pass gigantic blobs around
| all the time, but that wasn't in the cards for us.
| WorldMaker wrote:
| > From what I can tell, DATAS basically makes a .Net
| application have a normal memory footprint.
|
| In _Server environments_. DATAS is an upgrade to garbage
| collection in "Server mode". Server GC assumed it could be the
| only thing running on a machine and could use as much memory as
| it wanted and so would just easily over-allocate memory much
| more than what it immediately needed. (As the article points
| out, it would start at a large fixed amount of memory times the
| number of CPU cores.)
|
| (As opposed to "Workstation GC" which has always tried to
| minimize memory consumption because it assumes it is running as
| only one of many apps on an end user system.)
|
| > (and often run out of memory when run on Mono with the Boehm
| garbage collector.)
|
| Not exactly a fair comparison between .NET's actual GC and
| Mono's old simpler GC before the merger. (Today's .NET shares
| the same GC on Windows and Linux [and macOS].)
|
| > This is one of my big frustrations with .net, (although I
| tend to look at how dependency injection is implemented as a
| bigger culprit.)
|
| Startup times have gotten a lot better in recent versions of
| .NET, AOT compiling has much improved (especially compared to
| the ancient ngen for anyone old enough to remember needing to
| use that for startup optimization), and while I agree .NET has
| seen a lot of terrible DI implementations the out-of-the-box
| one in Microsoft.Extensions does a lot of things right now,
| including avoiding a lot of Reflection in standard usage which
| was the big thing slowing down older DI systems. (I've seen
| people add Reflection based "helpers" back on top of the
| Microsoft.Extensions DI, but at that point that is a user
| problem more than a DI problem.)
|
| > It does make me wonder: How practical is it to just use
| traditional reference counting and then periodically do a mark-
| and-sweep?
|
| Technically the "mark" of "mark-and-sweep" can be implemented
| as traditional reference counting (and some of the earliest
| "mark-and-sweep" implementations did just that). It still only
| solves half the problem, though. Also, the optimizations made
| by modern "mark" systems come from the fact that you don't need
| detailed counts; you just need tools equivalent to Bloom
| filters (what's the probability this is referenced at least
| once), and those can be much faster/more efficient to compute
| and use a lot less memory than reference counters.
|
| If your concern is total memory consumption, traditional
| reference counting uses more space (if only just to store
| counts), and by itself doesn't solve fragmentation (the "sweep"
| part of "mark-and-sweep"). From a practical standpoint,
| combining "traditional reference counting" and a "mark-and-
| sweep" sounds to me like asking for a less efficient "mark-
| twice-and-sweep" algorithm.
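A deliberately tiny mark phase with the marks stored outside the objects (a Python sketch of the idea; real collectors use side bitmaps over the heap layout, not dicts and sets):

```python
# Object graph: id -> list of ids it references. Objects 4 and 5
# form a cycle that nothing reachable points at.
heap = {0: [1, 2], 1: [3], 2: [], 3: [], 4: [5], 5: [4]}
roots = [0]

marked = set()              # stands in for an out-of-band mark bitmap
stack = list(roots)
while stack:
    obj = stack.pop()
    if obj not in marked:
        marked.add(obj)
        stack.extend(heap[obj])

garbage = set(heap) - marked
print(sorted(garbage))      # [4, 5]: the unreachable cycle is collected
```

Note that no per-object count is ever stored; reachability alone decides, which is also why cycles pose no problem for a tracing collector.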
| gwbas1c wrote:
| See https://news.ycombinator.com/item?id=45360318 (if you
| didn't read it already)
|
| The important point:
|
| > IE, the critical difference is that reference counting
| frees memory immediately; albeit at a higher CPU cost and
| needing to still perform a mark-and-sweep to clear out cyclic
| references.
|
| Regarding:
|
| > If your concern is total memory consumption, traditional
| reference counting uses more space (if only just to store
| counts)
|
| But it also frees memory immediately, meaning that many
| processes will appear to use less memory (unless
| fragmentation is an issue.)
|
| Don't forget that GC often adds memory overhead too: IE, mark
| and sweep sets a generation counter in each object that it
| can reach, and then objects that weren't updated are
| reclaimed.
| WorldMaker wrote:
| > But it also frees memory immediately, meaning that many
| processes will appear to use less memory (unless
| fragmentation is an issue.)
|
| I think where we disagree is that I of course do assume
| fragmentation is an issue, and also maybe what
| "immediately" means in this case. The type of total memory
| consumption that matters when you look in say Task Manager
| is when entire pages of memory are returned to the OS, not
| when individual objects are marked unused/free. In
| practical concerns, fragmentation will always delay entire
| pages returning to the OS. Reference counted languages
| build a lot of tricks to avoid fragmentation sure, but then
| if you are also trying to use a "mark-and-sweep" heap you
| lost most of those optimizations in part because you are
| then already assuming fragmentation is a problem to solve.
|
| > Don't forget that GC often adds memory overhead too: IE,
| mark and sweep sets a generation counter in each object
| that it can reach
|
| I did mention it, but also that GCs have advanced from
| "include a generation counter in each object" to things
| like generation bitmaps where that data is stored outside
| of the objects themselves and then from there further
| optimized into even more "compressed" forms ala Bloom
| filters (they maybe don't track every object, but every
| cluster of objects, or just objects crossing generation
| boundaries, or they use hash buckets and probability
| analysis, and many of these structures don't need to be
| permanent but are transient only during specific types of
| garbage collections; there has been a lot of work in the
| space and many decades of efficiencies studied and built).
| It's still overhead, but it is now a very different class
| of overhead from reference counts.
| jcmontx wrote:
| But don't you take a hit in performance by running the GC more
| often?
| NetMageSCW wrote:
| Not necessarily: if you have more (and therefore smaller)
| heaps, each GC takes less time.
| stonemetal12 wrote:
| Maybe, maybe not. If GC is O(n^2), then running it twice at
| n=5 takes much less total time than running it once at n=10.
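Under that hypothetical quadratic cost model the arithmetic is straightforward (a toy model from the comment above; real GC cost scales closer to the size of the live set):

```python
def gc_cost(n):
    # hypothetical O(n^2) cost model from the comment above
    return n * n

# Two small collections beat one big one under this model:
print(2 * gc_cost(5))   # 50
print(gc_cost(10))      # 100
```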
| daxfohl wrote:
| Maybe I missed it, but is there a shadow mode to estimate the
| memory and perf impact without actually enabling the feature? Or
| better yet, a way to analyze existing dotnet 8 GC logs to
| understand the approx impact?
| graycat wrote:
| For the author, some definitions:
|
| GC? -- Maybe "Garbage Collection", i.e., have some memory
| (mainly computer _main memory_) allocated, don't need it (just
| now or forever), and want to _release_ it, i.e., no longer have
| it allocated for its original purpose. Releasing can make it
| available for other purposes: software _threads_, programs,
| virtual machines, etc.
|
| DATAS? -- Not a spelling error, and not about any usual meaning
| of data; rather, it is as in
|
| https://learn.microsoft.com/en-us/dotnet/standard/garbage-co...
|
| for "Dynamic adaptation to application sizes"
|
| So, we're trying to take actions over time in response to some
| inputs that are in some respects unpredictable.
|
| Okay, what is the _objective_, i.e., the reason, what we hope
| to gain, or why bother?
|
| And for the part that is somewhat _unpredictable_ over time,
| that's one or more _stochastic processes_ (or one
| _multidimensional_ stochastic process?).
|
| So, in broad terms, we are interested in stochastic optimal
| control. "Dynamic adaptation" is close, and also close to one
| method, _dynamic programming_ -- in an earlier thread at Hacker
| News, I gave a list of references. Confession: I wrote my
| applied math Ph.D. dissertation in that subject.
|
| Hmm, how to proceed??? Maybe, (A) Know more about the context,
| e.g., what the computer is doing, what's to be minimized or
| maximized. (B) Collect some data on the way to knowing more about
| the stochastic processes involved.
|
| For me, how to get paid? If tried to make a living from applied
| stochastic optimal control, would have died from starvation. Got
| the Ph.D. JUST to be better prepared as an employee for such
| problems and had to learn that NO one, not even one in the
| galaxy, cares as much as one photon of ~1 Hz light.
|
| So, am starting a business heavily in computing and applied math.
| The code from Microsoft tools is all in .NET, ASP.NET, ADO.NET,
| etc. Code runs fine. The .NET software, via the VB.NET _syntactic
| sugar_, is GREAT for writing the code.
|
| So, MUST keep up on Microsoft tools, and here just did that.
| Since .NET 10 is changing some versions of Windows, my reaction
| is (i) add a lot of main memory until GC is nearly irrelevant,
| (ii) in general, wait a few years to give Microsoft time to fix
| problems, i.e., usually be a few years behind the latest
| versions, i.e., to "Prepare for .NET 10", first wait a few years.
|
| Experience: At one time, saw some server farms big on
| reliability. One site had two of everything, one for the real
| work and another to test the latest for bugs before being used
| for real work. Another had their own electrical power, Diesel
| generators ~30 feet high, a second site duplicating everything,
| ~400 miles away, with every site with lots of redundancy. In such
| contexts, working hard and taking risks trying to save money on
| main memory seem unwise.
| kg wrote:
| Some translations for acronyms and terms from this post (sourced
| from the glossary in dotnet/runtime along with source code
| grepping):
|
| GC: Garbage Collector
|
| DATAS: Dynamic adaptation to application sizes
|
| UOH: User Old Heap. I can't find an explanation for what this is.
|
| LOH: Large Object Heap. This is where allocations over a size
| threshold go in .NET.
|
| POH: Pinned Object Heap. Pinning is used to stop an object in the
| GC's memory from being moved around by the GC (for compaction).
|
| ASP.net: Active Server Pages for .NET. This is a framework for
| building web applications using .NET, a successor to the classic
| ASP which was built on COM and scripting languages like
| JScript/VBScript.
|
| Workstation / Server GC: .NET has two major GC modes which have
| different configurations for things like having per-cpu-core
| segregated heaps, doing background or foreground GCs, etc. This
| is designed to optimize for different workloads, like running a
| webserver vs a graphical application.
|
| Ephemeral GC / Ephemeral generation: To quote the docs:
|
| > For small objects the heap is divided into 3 generations: gen0,
| gen1 and gen2. For large objects there's one generation - gen3.
| Gen0 and gen1 are referred to as ephemeral (objects lasting for a
| short time) generations.
|
| Essentially, generation 0 or gen0 is where brand new objects
| live. If the GC sees that gen0 objects have survived when it does
| a collection, it promotes them to gen1, and then they will
| eventually get promoted to gen2. Most temporary objects live and
| die in gen0.
|
| Pause time: Most garbage collectors will need to pause the whole
| application in order to run, though they may not need the
| application to stay paused the whole time they are working. So
| pause time and % pause time track how much time the application
| spends paused for the GC to do its job; ideally these values are
| low.
|
| BCD: Quoting the post:
|
| > 1) introduced a concept of "Budget Computed via DATAS (BCD)"
| which is calculated based on the application size and gives us an
| upper bound of the gen0 budget for that size, which can
| approximate the generation size for gen0
|
| Essentially, this is an estimate of how much space the ephemeral
| generation (temporary objects plus some extras) is using.
|
| TCP: Quoting the post again:
|
| > 2) within this upper bound, we can further reduce memory if we
| can still maintain reasonable performance. And we define this
| "reasonable performance" with a target Throughput Cost Percentage
| (TCP). This takes into consideration both GC pauses and how much
| allocating threads have to wait.
| moomin wrote:
| Good guide. I _think_ UOH is "unpinned object heap", which is a
| variant of the large object heap that allows compaction. So the
| only things going into the LOH these days are both large and
| pinned. But I'm not 100% on this.
| orphea wrote:
| The parent is correct, UOH is User Old Heap.
|
| https://devblogs.microsoft.com/dotnet/internals-of-the-poh/
___________________________________________________________________
(page generated 2025-09-24 23:01 UTC)