[HN Gopher] Mark-Scavenge: Waiting for Trash to Take Itself Out
       ___________________________________________________________________
        
       Mark-Scavenge: Waiting for Trash to Take Itself Out
        
       Author : vips7L
       Score  : 154 points
       Date   : 2024-11-24 17:40 UTC (1 days ago)
        
 (HTM) web link (inside.java)
 (TXT) w3m dump (inside.java)
        
       | A4ET8a8uTh0 wrote:
       | What a time to be alive, I read it the opening fully expecting to
       | see an open source automated trashcan that takes itself to the
       | curb each Monday. I was disappointed to find out it is about an
       | actual garbage collection algorithm.
        
         | garblegarble wrote:
         | I know this is completely off-topic, but you might be
         | interested in this[1] YouTuber who did something not far off
         | that...
         | 
         | 1: https://www.youtube.com/watch?v=VhYEOG9LOIk
        
       | kibwen wrote:
       | _> Modern garbage collection assumes that the weak generational
       | hypothesis holds and that most objects die young_
       | 
       | Aside, I'm curious how first-class support for value types and
       | Go-style stack allocation via escape analysis changes the value
       | proposition of the generational hypothesis. If we hypothesize
       | that short-lived objects are local to a single function scope
       | (and thus eligible for stack allocation either explicitly via a
       | value type or heuristically via escape analysis) then it might
       | completely upend the generational hypothesis and make it so that
       | relatively more long-lived objects are getting heap-allocated.
       | Surely someone's done some studies on this?
        
         | DamonHD wrote:
         | AFAIK not nearly enough stuff gets caught by escape analysis -
         | and thus stack allocated - to make a difference.
        
           | kibwen wrote:
           | I'm open to believing that this is true, but some real
           | numbers would be nice. Surely it wouldn't be a hugely
           | invasive change to fork the Go compiler, change the stack
           | allocation check to `return false`, and then measure the
           | overhead of the garbage collector on real Go programs with
           | stack allocation both enabled and disabled.
        
             | DamonHD wrote:
             | I may have been answering past you - I am thinking of Java
             | running on the JDK here. And indeed I may be out of date
             | also.
        
               | cempaka wrote:
               | Yeah in Java land specifically I think the question would
               | become, "does the generational hypothesis still hold up
               | once we have Valhalla and a much larger share of short-
               | lived objects are stack allocated as value types?" but of
               | course it may be years until the ecosystem reaches that
               | point, if ever.
        
               | neonsunset wrote:
               | As shown by C#, it will generally continue to be relevant
               | since both primarily use JIT compilation with ability to
               | modify code at runtime which can violate inter-procedural
               | escape analysis assumptions leading to heap allocations
               | of the objects that are passed down to the callees (there
               | is work scheduled for .NET 10 to address this, at least
               | for AOT compilation where interproc analysis conclusions
               | will be impossible to violate).
               | 
               | You can craft a workload which violates the hypothesis by
               | only allocating objects that live for a long time but
               | both JVM and .NET GC implementations are still much
               | faster designs than Go's GC which prioritizes small
               | memory footprint and consistent latency on _low_
               | allocation traffic (though as of .NET 9, SRV GC puts much
               | more priority on this, making similar tradeoffs).
        
               | cempaka wrote:
               | > _ability to modify code at runtime_
               | 
               | Would Java's moves towards "integrity by default" mean
               | that this could be ruled out in more cases?
        
               | neonsunset wrote:
               | Reading through the JEP again it does not seem to be
               | related - it is about deprecating unsafe APIs that the
               | executed code itself uses. OpenJDK also has "partial
               | escape analysis" where the object that only conditionally
               | escapes can still be placed on the stack/scalar replaced.
               | 
               | I'm not privy to the exact APIs that OpenJDK exposes but
               | in .NET the main limitation around escape analysis that
               | spans multiple methods is the fact that CoreCLR has re-
               | JIT API which allows to perform a multitude of actions
               | like attaching a profiler or a debugger to a live
               | application and forcing the runtime to deoptimize a
               | particular method for debugging, or modifying the
               | implementation and re-JITting the result. Debug codegen
               | is very different especially around GC liveness tracking
               | and escape analysis that builds on top of it - it means
               | that even debug code would have to uphold stack-allocated
               | nature of such object in some way, complicating the
               | design significantly. In addition to that, debuggers and
               | other instrumentation may observe object references that
               | would have otherwise not escaped local scope.
               | 
               | This creates an unfortunate situation where the above
               | almost never happens in production, but ignoring it and
               | "just stack-allocating anyway" would lead to disastrous
               | breakage for all sorts of existing instrumentation.
               | Because Go does not have to deal with this constraint, it
               | can perform interproc escape analysis without risk -
               | whether a pointer escapes or not can be statically
               | proven. For NativeAOT, .NET could approach this problem
               | in the same way, but paraphrasing compiler team: "We
               | would like to avoid optimizations only available for one
               | of the target types be it JIT or AOT, and only supporting
               | AOT would not benefit the majority of the .NET users".
               | 
               | There is, however, recognition that more comprehensive
               | interproc analysis could be very beneficial, including
               | the EA which is why it is planned to work on it in .NET
               | 10:
               | 
               | - https://github.com/dotnet/runtime/issues/108931 IPA
               | framework
               | 
               | - https://github.com/dotnet/runtime/issues/104936 Stack
               | allocation enhancements
        
               | cempaka wrote:
               | Yeah there's a JEP around deprecating access to
               | sun.misc.Unsafe, but that's part of a larger effort
               | including Jigsaw to push the Java ecosystem in the
               | direction of modular builds, where more invariants are
               | assumed to hold (e.g. " 'final' fields are _actually_
               | final ") unless explicitly opted out for each module. I
               | would assume the lack of such guarantees in the status
               | quo wreaks a lot of havoc with EA.
               | 
               | Profiling and debugging would be separate considerations
               | -- I'm really not sure what limitations those impose on
               | the JVM JIT.
        
               | pjmlp wrote:
               | Integrity by default is what the OpenJDK folks are
               | pushing for so that any API that can break runtime
               | assumptions, has to be explicitly allowed, so that they
               | can actually make use of performance optimizations that
               | would otherwise be too risky if anyone at any time could
               | violate them.
        
               | DamonHD wrote:
               | Ahem --- JDK => JVM!
        
             | DarkNova6 wrote:
             | The reason escape analysis is not "good enough" is why we
             | have project Valhalla trying to bring Value Types into the
             | JVM.
             | 
             | I don't have numbers at hand, but I remember the JDK Expert
             | Group talking about this extensively in the past and why
             | they deferred bringing Value Types for such a long time.
             | They hoped complex enough EA can get rid of indirections
             | and heap allocations but it just wasn't powerful enough,
             | even with all advances throughout the years.
        
           | fweimer wrote:
           | Historically, Hotspot's escape analysis only resulted in
           | avoided heap allocations (via scalar replacement) if all uses
           | were inlined. I don't think this has changed.
        
           | eikenberry wrote:
           | Is there a language that makes this explicit, allocates the
           | variables on the stack via compiler enforced notation?
        
             | fanf2 wrote:
             | C, C++, Rust, Zig, ...
        
               | eikenberry wrote:
               | What's Zig's notation for it?
        
               | masklinn wrote:
               | Not doing anything, same as the other 3.
               | 
               | Heap allocation is what requires requesting memory from
               | an allocator.
        
               | eikenberry wrote:
               | Right.. been using GC languages to long. Everything
               | allocated to the stack unless it is specifically
               | allocated to the heap. I was stuck thinking about what
               | the keywords were. But I'm still curious.. are there any
               | GC languages that have a way to specify if something
               | should go on the stack or the heap?
        
               | masklinn wrote:
               | > are there any GC languages that have a way to specify
               | if something should go on the stack or the heap?
               | 
               | I think only in the sense that some GCd language have
               | value (stack) types as a separate hierarchy from heap
               | types? E.g. structs in C# or Swift are stack-allocated
               | (and value-identity, and copied) whereas classes are
               | heap-allocated.
               | 
               | Adding that for java is one of the goals of Project
               | Valhalla I believe.
        
             | neonsunset wrote:
             | C# (.NET in general) :)
             | 
             | Well, variables cannot be forced to stack specifically.
             | They are placed in the "local scope". And that would
             | usually be either stack or CPU registers - thinking in
             | stack only is a somewhat flawed mental model.
             | 
             | Both C# and F# complicate this by supporting closures,
             | iterator and async methods which capture variables placing
             | them in a state machine box / display class instead which
             | would be located on the heap, unless stack-allocated by
             | escape analysis (unlikely because these usually cross
             | method boundaries).
             | 
             | However, .NET has `ref structs` (or, in F#, [<Struct;
             | IsByRefLike>] types) which are subject to lifetime analysis
             | and can never be placed on the heap directly or otherwise.
        
           | masklinn wrote:
           | Go has much more significant stack allocation capabilities,
           | most notably it has no problem allocating entire structs on
           | the stack so doesn't need scalar replacement, which falls
           | over if you breathe on it
           | (https://pkolaczk.github.io/overhead-of-optional/).
           | 
           | According to https://github.com/golang/go/discussions/70257#d
           | iscussioncom...
           | 
           | > the weak generational hypothesis does not hold up well with
           | respect to heap-allocated memory in many real-world Go
           | programs (think 60-70% young object mortality vs. the 95%
           | typically expected)
        
         | DarkNova6 wrote:
         | But stack allocated objects are not part of the heap and
         | therefore not even part of Garbage Collection? And afaik stack
         | allocation is already done for objects which don't escape a
         | method.
        
           | masklinn wrote:
           | Yes, but that's the point: objects which don't escape are
           | pretty much all young objects. So by this process the stack
           | captures a significant fraction of the young generation, that
           | young generation never reaches the heap and this is never
           | under consideration by the GC.
           | 
           | Essentially the stack is a form of younggen. It is not as
           | complete (as there are things which must be heap allocated)
           | but because it _is_ , it reduces the benefits of a
           | generational GC... without having much impact on its costs
           | and complexity.
           | 
           | Depending on work load, that competition can be sufficient to
           | make a generational GC net negative.
        
             | DarkNova6 wrote:
             | Thanks for the answer. But is this actual behaviour for the
             | GCs of the JDK? I was certain that at the very least
             | Hotspot makes use of stack allocation as much as possible.
             | 
             | But perhaps the JDK GCs don't care so much about the stack
             | because that is already dealt by the JVM a step prior? In
             | any case, there will likely still be young objects
             | allocated in the heap and this new algorithm might prove
             | useful.
             | 
             | But you can tell I am far from an expert here.
        
               | masklinn wrote:
               | > Thanks for the answer. But is this actual behaviour for
               | the GCs of the JDK? I was certain that at the very least
               | Hotspot makes use of stack allocation as much as
               | possible.
               | 
               | Not really, java has some escape analysis but it's very
               | limited in its ability to stack allocate as it can't put
               | entire structures (objects) on the stack, it only works
               | if the compiler manages to scalar-replace the object
               | (https://shipilev.net/jvm/anatomy-quarks/18-scalar-
               | replacemen...) and that has somewhat restricted
               | applicability (https://pkolaczk.github.io/overhead-of-
               | optional/). The behaviour I'm talking about is mostly
               | that of Go, as it is much more capable of stack
               | allocating, and specifically has a non-generational GC
               | because in testing they found generational GCs had a very
               | variable impact depending on workload (rather than a
               | universally or near universally positive impact).
        
               | jerven wrote:
               | There was a microsoft prototype for more stack allocation
               | in OpenJDK (https://archive.fosdem.org/2020/schedule/even
               | t/reducing_gc_t...). I recall that being put on hold
               | because of how it would interact with project Loom fast
               | stack copying. But I don't know the current status.
               | 
               | GO has a non moving GC and I understand, that the cost of
               | introducing safe moving GC is considered high. If one has
               | a moving GC which the serious java one's are read/write
               | barriers are already required, especially if they are
               | concurrent like ZGC, C4 or Shenadoah. ZGc, C4 and
               | Shenadoah all started out as non generational GC
               | implementations, which gained them later, because in most
               | cases they do increase performance/reduce overhead.
               | 
               | Valhalla makes objects denser, and reduces overhead of
               | identity which is great. Reducing the difference in
               | memory layout between java objects and nested go structs.
               | 
               | Go with arena's reduce the GC de-allocation costs.
               | Something that the ZGC team is looking at in relation to
               | loom/virtual threads. (but I can't find the reference for
               | that right now)
        
             | pfdietz wrote:
             | I wonder if architectural support could be added to reduce
             | the cost of recording modification information.
        
         | masklinn wrote:
         | > it might completely upend the generational hypothesis
         | 
         | The generational hypothesis is about object lifetime, and that
         | doesn't change.
         | 
         | It does change the relevance of the generational hypothesis to
         | garbage collection.
         | 
         | > Surely someone's done some studies on this?
         | 
         | The go team has, and that's why go doesn't have a generational
         | GC. The complexity of adding generational support, especially
         | in a mutation-based language (so needing memory barriers and
         | the like) was found not to benefit when a significant fraction
         | of the newborn objects don't even reach the youngen. See
         | https://github.com/golang/go/discussions/70257#discussioncom...
         | from the current discussion of adding opt-in ad-hoc support for
         | memory regions.
        
         | uluyol wrote:
         | You might be interested in this talk:
         | https://go.dev/blog/ismmkeynote
        
         | ithkuil wrote:
         | If you flip the argument on its head you can frame it as: since
         | most objects die young it's very likely they will stay on the
         | stack and thus it makes sense to invest in an allocation-site
         | optimizer that will put the object on the heap only if static
         | escape analysis says it may escape the lexical scope.
        
         | pron wrote:
         | Stack allocation doesn't make a big difference. To see this,
         | consider the working set of a server application vs. the number
         | of threads. Assuming a stack size circa 1MB, thousands of
         | threads are needed to account for a significant portion of the
         | working set. Go and Java's user-mode threads certainly make it
         | more interesting, but not enough to dominate heap objects
         | (arena allocation, which is particularly nice in Zig) is a
         | different matter, and can have a bigger impact).
         | 
         | The main reason for Java's Project Valhalla is not stack
         | allocation, as it has too little impact, but arrays-of-structs
         | on the heap. These matter not so much for their memory-
         | management aspect, but the rate of cache-misses.
        
       | fweimer wrote:
       | How does this relate to Shenandoah's region selection logic?
       | Doesn't it have similar behavior?
        
       | m463 wrote:
       | This is fascinating.
       | 
       | The idea of mark/scavenge is pretty cool for page-based
       | allocation and deallocation.
       | 
       | Also real-time or maybe latency-intolerant systems, which
       | basically can't block for garbage collection. You could probably
       | keep scavenging and stay ahead of things without impacting things
       | in a black and white way.
        
         | AtlasBarfed wrote:
         | This may be naive, but gc has now been around since the java
         | days at mass scale (that is I'm calling java the first mass
         | market GC computing system, just go with it for now) for about
         | 30 years.
         | 
         | The big lie of GC is that you never need to worry about memory
         | management. The insidious lie is that you have no control when
         | you do need it
         | 
         | Rust went with a new extreme: no GC, but the compiler does a
         | memory verification and a LOT of obscure complex notation and
         | keywords
         | 
         | So ... I'm wondering if there is a happy medium, where you can
         | have keywords that strongly hint to a GC how to manage memory
         | (eg arena/permanent, thread group bound, local/young, etc) that
         | will be easier than rust.
         | 
         | Maybe we could have multiple heaps defined (a big problem with
         | GC heaps is the bigger they are the harder it is for the bigO
         | of the management algorithms to scale, so nice they aren't
         | linear bigO).
         | 
         | I'm thinking of something like Cassandra where tables and even
         | subsections of tables don't need their bloom filters, commit
         | logs, atc all intermingled in a single heap, to the great
         | detriment of heao performance.
         | 
         | Instead, provide annotations to partition the heaps or run
         | multiple heaps
         | 
         | Of course this is a massive complexity rabbit hole/slippery
         | slide.
        
           | jddj wrote:
           | Zig's allocators run along this line of thought
        
       | codetrotter wrote:
       | I was wondering if the .java TLD belonged to the Java island, or
       | to Oracle or to someone else.
       | 
       | https://icannwiki.org/.java
       | 
       | It's Oracle.
        
       ___________________________________________________________________
       (page generated 2024-11-25 23:01 UTC)