[HN Gopher] Garbage Collection for Systems Programmers (2023)
       ___________________________________________________________________
        
       Garbage Collection for Systems Programmers (2023)
        
       Author : ingve
       Score  : 246 points
       Date   : 2024-03-30 11:03 UTC (11 hours ago)
        
 (HTM) web link (bitbashing.io)
 (TXT) w3m dump (bitbashing.io)
        
       | spintin wrote:
        | Or you just use static atomic variables, no?
        | 
        | I mean it won't solve your race conditions, but it also won't
        | crash.
       | 
       | And it will be VERY fast because parallelism without cache
       | misses!
       | 
       | I wish there was a Java with C struct JNI access.
        
         | bugbuddy wrote:
         | This is false. Depending on the CPU Arch, updating a shared
         | static atomic variable may cause cache invalidation. How fast
          | depends on how hot and tight your critical sections are.
        
         | yvdriess wrote:
          | Saying this as a big hardware atomics enjoyer: Static atomic
          | variables are not fast, are not parallel, and cause the worst
          | kind of cache misses.
         | 
         | Atomics are also atomic only on that one single operation, such
         | as manipulating a single number in a struct or swapping out a
         | pointer from one version of an object to another. To atomically
         | expose a set of changes to a datastructure, you either use a
         | lock or swap the old for the new, which is the driving example
         | in the article.
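          | 
          | To make that concrete, a minimal C++ sketch of the swap-the-
          | pointer approach (illustrative Config type; deciding when the
          | old version may be freed is exactly the reclamation problem
          | the article discusses):
          | 
          |     #include <atomic>
          | 
          |     struct Config { int a; int b; };
          | 
          |     std::atomic<const Config*> g_config{new Config{0, 0}};
          | 
          |     void publish(int a, int b) {
          |         const Config* fresh = new Config{a, b};
          |         const Config* old =
          |             g_config.exchange(fresh, std::memory_order_acq_rel);
          |         // Readers that loaded `old` may still be using it, so
          |         // we cannot simply `delete old;` here.
          |         (void)old;
          |     }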
        
         | neonsunset wrote:
         | It's called C# (supporting plain C structs is just a small part
         | of its low-level features)
        
           | pjmlp wrote:
            | Or D, available as an official frontend language on the GCC
            | and LLVM projects :)
        
       | magicalhippo wrote:
       | A point that seems to get lost in many of the pro-GC articles,
       | this one included from what I can see, is that memory is just one
       | kind of resource.
       | 
       | Correct code, especially in systems programming, will need to
       | manage external resources as well, be it file handles, sockets or
       | whatnot. GC only solves the application memory part, thus doesn't
       | help at all for handling those external resources. In fact it can
        | make things much more complicated; just look at what it takes to
        | correctly implement a non-trivial IDisposable in .Net.
       | 
        | Other approaches like RAII or reference counting make it much
        | easier in my experience to handle both memory and external
        | resources in a unified way, thus making it easier to write
        | correct code and reason about it (see the sketch below).
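        | 
        | For illustration, a minimal C++ RAII sketch (hypothetical
        | UniqueFd wrapper, not from the article) that ties an external
        | resource to a scope exactly like memory:
        | 
        |     #include <fcntl.h>
        |     #include <unistd.h>
        | 
        |     class UniqueFd {
        |         int fd_ = -1;
        |     public:
        |         explicit UniqueFd(const char* path)
        |             : fd_(::open(path, O_RDONLY)) {}
        |         ~UniqueFd() { if (fd_ >= 0) ::close(fd_); } // runs on
        |                                                     // any scope
        |                                                     // exit
        |         UniqueFd(const UniqueFd&) = delete;
        |         UniqueFd& operator=(const UniqueFd&) = delete;
        |         int get() const { return fd_; }
        |     };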
       | 
       | That said, I'm not blatantly against GC's. It's a tool and it has
       | some pros and cons, like everything else. The "manual GC" RCU
       | approach mentioned in the article is interesting for certain
       | tasks.
        
         | zwieback wrote:
          | So true. Moving from C++ to mostly C#, I'm liking memory
          | management by GC but hating tracking file handles, sockets,
          | etc. RAII is something I really appreciated.
        
           | neonsunset wrote:
           | (SafeFileHandle is internally reference counted, but yes,
           | worst case when you forget to .Dispose it, it waits in the
           | finalizer queue for the GC to trigger the finalizer thread to
           | process the items in it)
        
           | pjmlp wrote:
            | Use a Roslyn analyzer that errors when a using is forgotten
            | on an IDisposable type, for example.
            | 
            | By the way, in modern .NET, using makes use of structural
            | typing: any class or struct with a Dispose() method can be
            | used with it, no need to implement the interface, and
            | Dispose can also be retrofitted via extension methods.
        
             | osigurdson wrote:
             | This is strange advice. Why not just implement the
             | IDisposable interface like normal C# code? Using extension
             | methods for this is strange as well. Doing this even
             | somewhat correctly would mean creating another class that
             | implements IDisposable which can only work with public
             | members of the original. In general best not to do weird
             | stuff for no reason imo.
        
         | pron wrote:
         | There is a big difference between memory and other resources,
         | and that is that memory -- like processing -- is fundamental
         | for any computation; not for nothing do most theoretical models
         | of computation assume unbounded memory. Very few languages
         | require manual allocation of processing -- even though that is
         | done in some software like OS kernels and hard realtime
         | applications -- and for similar reasons automatic memory
          | management is very useful for abstracting computation, i.e. not
          | leaking a subroutine's memory-management details to its
          | clients, just as this is rarely done for CPU use.
         | 
         | So while every non-trivial computation involves some non-
         | constant amount of processing and memory, I/O is usually done
         | at the edges of the system. Management of I/O is very
         | important, of course, but it's not as central to the notion of
         | computation (and therefore to abstracting computation) as
         | processing and memory.
        
         | samatman wrote:
          | > _GC only solves the application memory part, thus doesn't
          | help at all for handling those external resources._
         | 
         | This is far from entirely true. Most languages with a garbage
         | collector have finalizers, which will clean that sort of thing
         | up. Generally, and unlike memory allocation, one can call those
         | finalizers from within user code, so as not to have to rely on
         | the GC running.
         | 
         | The distinction between reference counting and garbage
         | collection is an artificial one. Reference counting is an
         | approach to garbage collection, one with different tradeoffs
          | from more concept-central algorithms for GC. I agree with you
          | that it's a more unified approach to resource management;
          | finalizers require user attention in a way which ordinary
          | allocation doesn't, so yes, system resources get treated
          | differently. I don't see that as a slam-dunk against them,
          | however.
        
           | o11c wrote:
           | One problem with finalizers, try-finally, try-with-resources,
           | Python-`with`, etc. is that they don't actually guarantee the
           | code will be called even in common cases.
           | 
           | In particular, any function that returns a (possibly newly-
           | constructed) open file handle has not yet informed the caller
           | that it's actually a thing to keep track of, unlike with
            | destructors which are active _immediately_ once the object's
           | lifetime begins, and only rely on `move` being relatively
           | atomic.
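            | 
            | A sketch of that window (open_file, may_throw and FdGuard
            | are hypothetical names for illustration; with full RAII,
            | open_file would return the guard itself and the window
            | disappears):
            | 
            |     int fd = open_file("data.txt"); // live, untracked
            |     may_throw();                    // leaks fd on throw
            |     FdGuard guard(fd);              // cleanup guaranteed
            |                                     // only from here on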
        
             | kaba0 wrote:
             | How is a try-with-resources not guaranteed? At least in
             | Java, it is guaranteed to run - plus, you can get your
             | closing/destructor logic wrong in RAII languages as well.
        
               | cesarb wrote:
               | > How is a try-with-resources not guaranteed? At least in
               | Java, it is guaranteed to run
               | 
               | There's a subtle detail you have to be careful with,
               | however: if you try to allocate memory or call a method
               | between allocating your resource and actually doing the
               | try-with-resources, you might get an OutOfMemoryError or
               | a StackOverflowError, and your resource will leak. That
                | is, if you do something like:
                | 
                |     try (MyHolder holder =
                |             new MyHolder(allocateResource())) {
                |         ...
                |     }
               | 
               | You can have a leak if the memory allocation in the "new
               | MyHolder()" fails, or if there's not enough space in the
               | stack to call the MyHolder constructor.
               | 
               | > plus, you can get your closing/destructor logic wrong
               | in RAII languages as well.
               | 
               | For instance, in C++ you can accidentally put your
               | resource in a temporary which is immediately destructed
               | at the end of the line, when you wanted it to last until
               | the end of the enclosing scope. Rust makes it harder for
               | this to happen, but it's still possible.
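                | 
                | A classic instance of that C++ pitfall, using
                | std::lock_guard for illustration:
                | 
                |     #include <mutex>
                | 
                |     std::mutex m;
                | 
                |     void f() {
                |         std::lock_guard<std::mutex>{m};   // temporary:
                |             // unlocked again at the end of this line!
                |         std::lock_guard<std::mutex> g{m}; // named: held
                |             // until the end of the scope
                |     }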
        
               | o11c wrote:
               | I mean, it's possible to write bad code in C++. But at
                | least it's possible to write good code too. C++ enforces a
                | careful, compiler-checked sequence that's hard to get
                | wrong (sketch after the list):
               | 
               | * if a given subobject (field or base class)'s
               | constructor runs, its destructor is guaranteed to run.
               | 
               | * in particular, in low-level ownership classes, you can
               | arrange for field initializers to be `noexcept`, so you
               | get to run their dtor, regardless of subsequent
               | manipulation of the fields - just be sure to not assume
               | invariants from the fully-constructed main object case.
               | In most classes, deferring _all_ ownership logic to the
               | fields is simpler - rule of zero beats rule of 5. And if
               | you _do_ add manual try-catch during construction it will
               | actually work.
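                | 
                | The first guarantee in action (minimal sketch,
                | illustrative types only):
                | 
                |     #include <cstdio>
                |     #include <stdexcept>
                | 
                |     struct Member {
                |         Member()  { std::puts("member ctor"); }
                |         ~Member() { std::puts("member dtor"); } // runs
                |     };
                | 
                |     struct Outer {
                |         Member m; // fully constructed before Outer's body
                |         Outer() { throw std::runtime_error("outer fails"); }
                |     };
                | 
                |     int main() {
                |         try { Outer o; } catch (...) { std::puts("caught"); }
                |         // prints: member ctor / member dtor / caught
                |     }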
               | 
               | Languages other than C++ generally fail in several ways:
               | 
               | * Allow exceptions to be thrown by the runtime for
               | reasons unrelated to the code being executed
               | 
               | * No support for subobjects, forcing all objects to be
               | allocated separately and thus the runtime not knowing it
               | needs to poke them since finalizers are only applied to
               | objects not pointers. In particular, Python's
               | `contextlib.ExitStack` can _narrow_ the window in which
               | case leaks are possible, but not eliminate it.
               | 
               | Rust does slightly better than C++ by enforcing trivial
               | moves, at the cost of making many useful programs
               | impossible to write.
        
               | worik wrote:
               | Ouch!
               | 
               | > Rust does slightly better than C++ by enforcing trivial
               | moves, at the cost of making many useful programs
               | impossible to write.
               | 
               | What "useful" programs fit a useful definition of "many"?
               | 
               | It makes self referential looped data structures hard to
               | write (double linked lists, trees with pointers to
               | parents).
               | 
               | No loss. Great improvement to have fewer of those
               | 
               | It makes arbitrary allocation hard. Enforces behaviors.
               | Makes a programmer think very hard.
               | 
               | "useful programs impossible to write. " No it does not.
               | It ,ages a lot of bad ideas hard to implement, makes
               | nothing "impossible "
        
             | pjmlp wrote:
              | You use the C and C++ approach to all of their design
              | flaws: reach for a static analysis tool that covers such
              | cases.
        
           | cesarb wrote:
           | > Most languages with a garbage collector have finalizers,
           | which will clean that sort of thing up.
           | 
           | Using finalizers for cleanup of non-memory resources is bad
           | because they will only be called when there's memory
           | pressure. If you have used all of your non-memory resource,
           | but there's still plenty of free memory, the allocation of
           | that resource will fail; if you instead force a garbage
           | collection at that point, not only will you cause a pause,
           | but also you will be collecting memory while there's still
           | plenty of it available.
        
           | zozbot234 wrote:
           | Finalizers will be non-deterministic when called as part of
           | GC. One advantage of reference counting is that it _is_
           | deterministic (and yes, this means that sometimes freeing the
            | last reference that's keeping a bunch of objects around will
           | lead to a spike of extra deallocations. Guess what, that's
           | what determinism is. It's implied by the need to free
           | resources promptly.)
        
           | PhilipRoman wrote:
           | Trusting the GC with freeing externally observable resources
           | will bring you some very painful bugs.
           | 
           | On that note, it's 2024 and we still cannot unmap a mmapped
           | file from Java in a timely manner. I really hope they do
           | something about it.
        
         | tialaramex wrote:
         | Yes, and the "Memory Safety" arguments apply to the other
         | resources too. For example Rust eventually grew I/O safety, so
         | your File Handles (in Unix, OwnedFd) or just straight up
         | Handles (in Windows, OwnedHandle) are owned objects, not just
         | integers like the number 4
         | 
         | At the surface this looks like it's just about avoiding dumb
         | mistakes like using arithmetic on handles or mistakenly using
         | reserved values as sentinels -- but the ownership model also
         | means anything tricky we're doing with handles has explicit
         | ownership, transparent to future maintainers.
        
         | pjmlp wrote:
         | Yet, a point that tends to get lost when criticising GC
         | languages, is that most of them have features for deterministic
         | resource management, that many keep failing to learn.
         | 
         | - Some do have RAII
         | 
         | - Some do offer keywords
         | 
         | - Some do arena like management, lambdas with implicit
         | management
         | 
         | - Some have a little help from the type system
         | 
         | - Some do a bit of everything listed above
         | 
         | Additionally, just like system developers have to rely on
         | static analysers, the static analysers for those languages can
         | also provide validation when something is forgotten, when the
         | type system alone isn't enough.
        
           | magicalhippo wrote:
           | My point though is that by moving memory management into
           | "don't care" territory while still, obviously, requiring
           | explicit handling of other resources, it's easier to forget
           | or miss when you need to explicitly handle something.
           | 
           | When instantiating objects in C# say I need to check the
           | documentation or the source code to see if it implements
            | IDisposable to know if I need to handle it. Let's say that for
           | a given class X in library Y, it does not. So I don't worry,
           | I just instantiate and don't do anything about the cleanup
           | because GC will handle it.
           | 
           | Later, the implementation of X changes and IDisposable is
           | added to it. I now have to change my code, and not doing so
           | could lead to serious production issues. Yet the compiler
           | happily compiles my code without any warning.
           | 
           | Sure some static analyzers might catch it, but they're not
           | perfect, and you need to run them. A stock Visual Studio 2022
           | will not complain about the above scenario for example.
           | 
           | In my experience it's much less error prone to unify memory
           | and external resource management. If instead I had been using
           | a different language which has a more uniform resource
           | handling, the above change in class X would likely not have
           | been a big deal.
           | 
           | Also, my code will already be written with resource handling
           | in mind. It can be non-trivial having to change a hierarchy
           | of classes just because a dependency deep down suddenly had
           | IDisposable added to it.
           | 
           | I guess what I'm trying to say is that I think just focusing
           | on memory management, that is to GC or not to GC, is having a
            | myopic view of things. I feel it's like arguing about what kind
           | pickle to have on your burger without considering the other
           | ingredients. Sure, the pickle is a crucial ingredient, but
           | there's a lot more to it.
        
         | worik wrote:
         | Yes. But...
         | 
         | > GC only solves the application memory part,
         | 
         | Except it does not. "Solve" that is.
         | 
         | It helps, yes it does
         | 
         | But as I have learnt the hard way (silly way too, to be
         | truthful) GC will not help unless you actually delete all
         | references to the unused memory
         | 
          | Perhaps GCs have developed magic moo-cow properties in the
          | twenty-five years since I made that discovery, but I think the
          | point remains.
         | 
         | GC is very helpful, but it does not stop resource leaks
        
       | keybored wrote:
       | The article motivates RCU and then does a u-turn and starts
       | making a general argument for general-purpose GC. Not quite a
        | trojan horse, but a bit whiplash-provoking.
        
         | o11c wrote:
         | I definitely wouldn't call the RCU thing a GC, since at no
         | point is the object garbage. It is in one of 3 states, and
         | changes between them as quickly as possible:
         | 
         | * active
         | 
         | * obsolete but alive for old readers
         | 
         | * deallocated
         | 
         | Note that depending on how you write your code, it may be
          | possible to _reuse_ an "obsolete-but-alive" object for a "new"
         | allocation safely, though I haven't analyzed performance for
         | this in full.
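          | 
          | The three states, sketched with kernel-style RCU primitives
          | (rcu_assign_pointer/synchronize_rcu as in liburcu or the Linux
          | kernel; Config and g_cfg are illustrative):
          | 
          |     Config* fresh = new Config{/* new values */};
          |     Config* old = g_cfg;              // "active"
          |     rcu_assign_pointer(g_cfg, fresh); // old: "obsolete but
          |                                       // alive for old readers"
          |     synchronize_rcu();                // wait out those readers
          |     delete old;                       // "deallocated"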
         | 
         | As usual for GC discussions, it is _very_ handwavy about when
          | you have to "fall back to `shared_ptr`/`Arc`", when in fact
         | avoiding refcounts (either because you can prove you already
         | have ownership (which does have implications for tail calls,
         | but you shouldn't use those anyway), or by avoiding indirection
         | entirely) is the whole point of serious refcount-based systems.
         | Doing nothing at all is obviously better than GC's "do
         | something, sometime".
        
       | bckr wrote:
       | Off topic but what is going on in the picture of the leaking
       | pipe? I don't think it's AI but there's a third arm in there that
       | I don't understand. There's at least two guys in the picture but
       | I can't assign one of the arms to either of them.
        
         | mindv0rtex wrote:
          | It's definitely AI-generated; that picture makes no sense.
        
         | deutschepost wrote:
         | It's three people. The guy on the right is leaning over another
         | one under the water. It's probably a picture from a flood
         | training simulator.
         | 
         | Edit:
         | https://commons.wikimedia.org/wiki/File:US_Navy_040308-N-000...
         | 
         | The picture was uploaded in 2009. Downvote if you want, but
         | there is no AI at work here.
        
       | pron wrote:
       | Except in the special case where all memory can be easily handled
       | in arenas, good tracing GCs have long ago surpassed manual memory
       | management in throughput, and more recently their latency impact
       | is more than acceptable for the vast majority of applications
       | (OpenJDK's ZGC has typical pause times measured in double/triple-
       | digit microseconds, and the worst case rarely exceeds 1ms for a
       | reasonable allocation rate -- the pauses are in the same ballpark
       | as OS-induced ones). The only real and significant tradeoff is in
       | memory footprint, and outside of specialty niches (where arenas
       | just work for everything and worst-case latency is in the low
       | microseconds range) that is the only high order question: is my
       | application running in a memory-constrained environment (or it's
       | really worth it to sacrifice other things to keep down RAM
       | consumption) or not?
        
         | znpy wrote:
         | > (OpenJDK's ZGC has typical pause times measured in
         | double/triple-digit microseconds, and the worst case rarely
         | exceeds 1ms for a reasonable allocation rate -- the pauses are
         | in the same ballpark as OS-induced ones)
         | 
         | We've been benchmarking ZGC and Shenandoah at work, and the
          | p100 pause time is usually below 500us (microseconds). ZGC
          | seems to be performing a bit better, as it seems to make
          | fewer pauses than Shenandoah (hence doing more work per
          | pause).
         | 
         | We still have to run tests-in-production, but so far it seems
         | that GC pauses are largely a solved issue when using ZGC (and
         | Generational ZGC since Java21).
        
           | Galanwe wrote:
           | > We've been benchmarking ZGC and Shenandoah at work, and the
           | p100 pause time is usually below 500us
           | 
           | > so far it seems that GC pauses are largely a solved issue
           | when using ZGC
           | 
           | What? 500us is abysmally slow.
           | 
           | Most projects I work on have a latency budget of less than
           | 10us, the average being 2us. That is the budget for the whole
            | wire in/wire out for a packet. Even for less latency-
            | sensitive workloads, 500us is a no-go for most networking
            | applications.
        
             | pron wrote:
             | We're talking about occasional hiccups, not an average-case
             | response-latency overhead. You can't get _worst-case_
             | latency of 2-10us with a non-realtime kernel. Even a page
             | fault could take longer than that.
        
               | Galanwe wrote:
               | > You can't get worst-case latency of 2-10us with a non-
               | realtime kernel. Even a page fault could take longer than
               | that.
               | 
               | You obviously can, and this has nothing to do with the
               | kernel being real-time or not.
               | 
                | There is no situation I can think of where a page fault
                | should occur on a properly set-up system running
                | production networking software, meaning no swap, huge
                | TLB, and proper memory management.
        
               | pron wrote:
               | If you can think of "no situation" where a server may
               | incur a page fault, forced preemption, or need to perform
               | any I/O to a service/database, then I hope you at least
               | recognise that your world in no way represents the state
               | of server software at large because _none_ of these
               | things is true for the vast majority of server software.
               | 
               | In a former life I worked on some safety-critical onboard
               | avionics software for an ultrasonic platform, and 2us was
               | around the upper-limit worst-case latency (i.e. you'll
               | kill someone if you miss that deadline); still, it's not
               | the kind of requirements the vast majority of software
               | finds itself under.
               | 
               | When working over the internet, some of the very best
               | services are at >10ms _ping_ latency anyway, where a
               | 500us hiccup is imperceptible.
        
               | Galanwe wrote:
               | > If you can think of "no situation" where a server may
               | incur a page fault, forced preemption, or need to perform
               | any I/O to a service/database, then I hope you at least
               | recognise that your world in no way represents the state
               | of server software at large
               | 
               | I won't deny that the majority of software out there is
               | not latency sensitive, but the context of this article is
                | specifically targeting software that is _not_
               | using garbage collection, arguing that it is undeservedly
               | overlooked. OP even adds that GC is a "solved problem"
               | because some GC implementation is 500us worst case
               | latency.
               | 
               | My point is that the article author, and OP, are
               | mistaken. Because if you are in the category of "I write
                | server-side software without GC" (e.g. C/C++), then
               | 500us is horribly wrong.
               | 
               | Your point being that 500us is fine for most software out
               | there is surely true, but not relevant, because if that
               | is the case, you're probably not using C/C++, thus this
               | article is not targeting you.
               | 
               | In _my world_ as you phrase it, traffic is unshaped. You
               | need to be able to support line rate, otherwise packets
               | are dropped, and hell breaks loose.
        
           | pron wrote:
            | FYI, ZGC doesn't perform _any_ collection work in the
            | stop-the-world pauses. They are only required to get all
           | mutator threads to atomically observe the increment of the
           | "GC epoch" for all threads. All actual work, both marking and
           | compaction, is done by GC threads running concurrently with
           | the mutator threads or by the mutator threads themselves as
           | they run. It is only when allocation rate is very, very high
           | that an allocating mutator thread will be paused
           | ("throttled") for a significant amount of time to allow the
           | GC to catch up by freeing up memory (and if you hit these
           | cases, then you might be better off using a throughput-
           | oriented collector).
        
             | hawk_ wrote:
             | Strangely in our workloads we have noticed generational ZGC
             | latencies are better than G1 at ~99th-99.9th percentile or
             | below but worse at percentiles above that. The allocation
             | rate is moderate.
        
               | kaba0 wrote:
               | Above that you might easily get measuring artifacts, like
               | the OS swapping out your process once, or so.
        
               | hawk_ wrote:
               | Yes - but it's quite consistent. If it was 'noise' it
               | wouldn't be.
        
             | znpy wrote:
             | > FYI, ZGC doesn't perform any collection work in the the
             | stop-the-world pauses.
             | 
             | Yeah i know, but on a certain level I don't care what it
             | does or does not do.
             | 
             | I care about my application not having latency spikes due
             | to stop the world pauses :)
        
         | flohofwoe wrote:
         | > Except in the special case ...
         | 
          | IME it's the other way around: per-object individual lifetimes
          | are a rare special case; in most real-world code there will be
          | many related objects with the same or very similar lifetimes. In
         | such code, tracking individual object lifetimes is overkill (in
         | the end, memory management is all about lifetimes, and fewer
         | individual lifetimes instead of many is always better because
         | it means less work, both in manual and automatic memory
         | management).
         | 
          | Not having to think about object lifetimes is just very
          | convenient; that's why GC languages were successful despite the
          | significant under-the-hood complexity of a good garbage
          | collector.
        
           | pron wrote:
           | That's not at all a rare special case in most server
           | applications.
           | 
           | One way to see that is to consider how much of a program's
            | working set is in threads' stacks vs the heap. If most of
           | the working set is in the heap, there's _usually_ some non-
           | trivial object lifetimes involved (i.e. cases where lifetimes
           | can be encapsulated and abstracted away from client code).
           | Yes, sometimes all of these can be taken care of by arenas
           | (and there are languages, like Zig, that strive to be very
           | arena-friendly), but that -- i.e. the case where all objects
           | are easily arena-able -- is not the more common scenario.
        
             | flohofwoe wrote:
             | Depends on the type of server I guess. I can easily imagine
             | a situation where all allocations of a request are handled
             | through a simple bump allocator, and once the request is
             | done, the bump allocator is reset instead of 'freeing' each
             | individual allocation.
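              | 
              | A minimal bump-allocator sketch of that idea (illustrative,
              | not production code):
              | 
              |     #include <cstddef>
              |     #include <vector>
              | 
              |     class BumpArena {
              |         std::vector<std::byte> buf_;
              |         std::size_t used_ = 0;
              |     public:
              |         explicit BumpArena(std::size_t cap) : buf_(cap) {}
              |         // Bump the offset; alignment must be a power of 2.
              |         void* alloc(std::size_t n, std::size_t a =
              |                         alignof(std::max_align_t)) {
              |             std::size_t p = (used_ + a - 1) & ~(a - 1);
              |             if (p + n > buf_.size()) return nullptr;
              |             used_ = p + n;
              |             return buf_.data() + p;
              |         }
              |         // End of request: "free" everything at once.
              |         void reset() { used_ = 0; }
              |     };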
        
               | cogman10 wrote:
               | This would only work for fairly trivial applications. The
               | moment you start adding things like http clients or
               | databases you have to start considering having connection
               | pools with lifetimes that don't strictly match a request
               | lifecycle.
               | 
               | Not saying such an application doesn't exist, it
               | certainly does.
        
               | ncruces wrote:
               | That "simple bump allocator" is basically allocating
               | everything on the thread's stack, as the GP mentioned.
               | 
               | It's what you put on the heap that has complex lifetimes.
               | Sometimes you can fix that with an arena. If you can't,
               | you probably can't figure out when the last reference to
               | it dies either.
        
               | jayd16 wrote:
               | How did we go from "this is the most common case" to "I
               | can imagine it?" Sure there are cases where a request can
               | be handled by the stack but the point is that the more
               | complex case is extremely common.
               | 
               | Any kind of background/asynchronous work spawned from a
               | request and your plans are shot.
        
           | cratermoon wrote:
           | In generational GCs there are two or more allocation regions.
           | New objects are put in the "younger" generation, which is
            | garbage collected separately from the other generations. This
           | sort of resolves the issue of tracking individual object
           | lifetimes by having all the short-lived objects subject to
           | rapid GC. This means most of the effort tracking lifetimes is
           | reserved for the fewer long-lived objects.
        
             | lll-o-lll wrote:
             | It's "medium term" objects that cause all the trouble.
             | Async operations. That's why .net invested heavily into
             | zero-alloc tasks, as the gc would kill scaling in async
             | heavy code.
        
           | zozbot234 wrote:
           | > IME it's the other way around, per-object individual
           | lifetimes is a rare special case
           | 
           | It depends on your application domain. But in most cases
           | where objects have "individual lifetimes" you can still use
           | reference counting, which has lower latency and memory
           | overhead than tracing GC and interacts well with manual
           | memory management. Tracing GC can then be "plugged in" for
           | very specific cases, preferably using a high performance
           | concurrent implementation much like
           | https://github.com/chc4/samsara (for Rust) or
           | https://github.com/pebal/sgcl (for C++).
        
             | pron wrote:
             | Reference counting does not have lower latency than a
             | tracing GC nor is it more generally predictable. In terms
             | of performance, it is usually worse than a modern tracing
             | GC by most performance metrics. Its benefits are, indeed,
             | lower memory footprint and that it can achieve reasonable
             | latency even using a crude implementation. In other words,
             | you'd reach for refcounting either if you mostly care about
             | footprint or if you don't have a lot of resources to invest
             | in implementing the GC.
             | 
             | Modern concurrent tracing has very little impact on
             | latency. First, compared to refcounting, the barriers are
             | rarely invoked. Second, they compact the heap concurrently
             | in the background, and like all mark-compact GCs, their
             | amortized cost drops to zero as the RAM increases (the work
             | is linear with the size of the working set, which is
             | roughly constant for a given program, but it needs to be
             | done less frequently the more RAM you have). That's why
             | high-performance languages that rely heavily on a GC --
             | Java, C#, Go -- prefer tracing.
        
               | lll-o-lll wrote:
               | > nor is it more generally predictable
               | 
               | Can you explain what you mean here? This does not match
               | my experience or intuition.
        
               | pron wrote:
               | In theory, refcounting and tracing can behave similarly
               | [1], but assuming we're speaking about their
               | implementations in the industry (rather elaborate tracing
               | GCs; rather crude refcounting GCs) then a refcounting GC
               | would do some work as soon the refcount for a particular
               | object drops to zero. When exactly that happens and how
               | much work there is to do (and sometimes, what the
                | fragmentation effect on the heap is) are local properties
               | that are not very predictable. In contrast, the amount of
               | work a tracing GC needs to do when compacting is directly
               | proportional to the program's working set and the
               | frequency in which it needs to do that work is
               | proportional to the allocation rate -- both of which are
               | fairly regular and predictable global properties for a
               | given program. For stop-the-world GCs there was then the
               | matter of the highly unpredictable exact timing of a
               | large STW pause, but modern concurrent GCs don't collect
               | anything in STW pauses anymore. They only have very short
               | (sub 1ms) and constant-time pauses and no surprise
               | throttling as long as the allocation rate is within an
               | acceptable range. So all in all, you pay a fairly fixed
               | tax on the CPU (and if you want to pay less, just add
               | more RAM) and virtually no impact on latency.
               | 
               | [1]: https://www.cs.cornell.edu/courses/cs6120/2019fa/blo
               | g/unifie...
        
               | lll-o-lll wrote:
               | Thanks for responding. My experience with tracing GC at
               | scale is exclusively in the realm of .Net, and RC
               | exclusively with C++ smart pointers. That matches your
               | "sophisticated vs crude" contrast.
               | 
               | The experience with .Net is that GC impact was difficult
               | to profile and correct, and also "lumpy", although that
               | may have been before GC tuning. GC would dominate
               | performance profiling in heavy async code, but these days
               | can be corrected by value tasks and other zero alloc
               | methods.
               | 
               | For C++ style ref counting, the impact was a continuous %
               | load and simple to profile (and therefore improve).
               | Although here, the ref counting needed to be stripped
               | from the hot paths.
               | 
               | The biggest issue I've hit between the two modes though,
               | is how they behave when hitting memory limits. Tracing GC
               | appears to have an order of magnitude perf hit when
               | memory becomes scarce, while ref counting does not suffer
               | in this way. This is enough for me to personally dislike
               | tracing GC, as that failure state is particularly
               | problematic.
        
               | neonsunset wrote:
                | When you hit memory limits, .NET's GC implementation would
                | perform much more frequent, invasive and aggressive
                | collections, including LOH compaction to reduce the memory
                | watermark, which leads to greater GC pauses, though this
                | is rarely seen in such a bad way on modern versions with
                | e.g. SRV GC.
               | 
                | The most scalable way to address this is usually to just
                | allocate less and use ValueTasks with pooling where
                | applicable (frequent asynchronous yields). I'm certain if
                | you built a .NET 8 based solution you would see user-
                | written code dominate the heap allocation profile, as most
                | hot internal paths of async utilize said state machine
                | box pooling+ValueTask<T>[0] and may be entirely
                | allocation-free.
               | 
               | [0] Example: https://github.com/dotnet/runtime/blob/cc7bf
               | 831f02cad241547e...
        
               | lll-o-lll wrote:
               | > When you hit memory limits, .NETs GC implementation
               | would perform much more frequent, invasive and aggressive
               | collections, including LOH compaction to reduce memory
               | watermark which leads to greater GC pauses, though this
               | is rarely seen in such a bad way on modern versions with
               | e.g. SRV GC.
               | 
               | The trouble with server GC mode is that then there is no
               | natural back pressure. If the processing is not CPU
               | bound, then memory allocation can grow unbounded. This is
               | not something that happens with RC as, again, the GC
               | performance hit is inlined with task processing. The
               | service may not be capable of as much throughput, but it
               | doesn't take out the entire server either.
               | 
                | > The most scalable way to address this is usually to
                | just allocate less and use ValueTasks with pooling where
                | applicable (frequent asynchronous yields). I'm certain if
                | you built a .NET 8 based solution you would see user-
                | written code dominate the heap allocation profile, as most
                | hot internal paths of async utilize said state machine
                | box pooling+ValueTask<T>[0] and may be entirely
                | allocation-free.
               | 
               | Absolutely; I think it's relatively simple to write
               | servers that scale using modern .net; the memory
               | allocation foot-guns when dealing with asynchronous code
               | are now well understood, and tooling is good. I am
               | compressing ~15 years of experiences in that previous
               | post.
               | 
               | It's probably the case that a tracing GC is the better
               | choice for most modern applications, excepting memory
               | constrained devices (like phones), and so long as you
               | design with memory in mind.
        
               | neonsunset wrote:
               | Ah, I see where you are coming from.
               | 
               | You are correct, sustained load heap size of SRV GC has
               | been a known pain point that had been particularly
               | exacerbated after beefy Windows Server hosts fell out of
               | fashion and got replaced by 512Mi Linux containers.
               | 
               | There has been work conducted on this each release
               | throughout Core 2.1, 3.1, and then 5, 6, 7 and 8 versions
               | to make it play nicer with more constrained memory limit
               | systems.
               | 
               | The two major features that address this are Regions[0]
               | (.NET 6/7) and DATAS[1] (.NET 8). The former is enabled
               | by default everywhere except macOS and the latter is
               | available opt-in either via env var
               | DOTNET_GCDynamicAdaptationMode=1 or msbuild poperty
               | GarbageCollectionAdaptationMode: 1 (see more in [1]).
               | 
               | The latter has shown to significantly reduce sustained
               | (or, especially, idling) heap size for some particularly
               | problematic workloads (but not all, sometimes you just
               | have a lot of live objects). I definitely recommend
               | giving it a try if this is something still relevant to
               | you.
               | 
               | TLDR of what DATAS does is dynamic heap count scaling and
               | much smarter heap up/downsizing depending on allocation
               | rate/frequency and anticipated throughput impact of
               | adjusting those.
               | 
               | [0] https://itnext.io/how-segments-and-regions-differ-in-
               | decommi... / https://devblogs.microsoft.com/dotnet/put-a-
               | dpad-on-that-gc/
               | 
               | [1] https://maoni0.medium.com/dynamically-adapting-to-
               | applicatio...
        
               | bluGill wrote:
                | When the reference count can only be zero or one,
                | reference counts have different performance than when it
                | can be more, and they are much easier to reason about.
                | This is most cases.
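                | 
                | In C++ terms, that's the difference between
                | std::unique_ptr and std::shared_ptr (sketch):
                | 
                |     #include <memory>
                |     #include <utility>
                | 
                |     auto a = std::make_unique<int>(1); // count is 0 or 1:
                |     auto b = std::move(a);             // plain ownership
                |                                        // move, no counter
                |     auto c = std::make_shared<int>(2); // arbitrary count:
                |     auto d = c;                        // atomic increment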
        
               | dmurray wrote:
               | This sounds intuitively true. So...what if GC languages
               | could introduce an optional annotation for an object to
               | say that only one reference to it can exist, and use that
               | as a hint to the GC?
               | 
               | I don't see how this could be implemented in a safe and
               | performant way - either you check for existing references
               | at runtime, or you risk some use-after-free bug. But
               | perhaps your project is already in a GC language and
               | you're happy with that, but just want to optimise GC for
               | this critical component. And we already have the concept
               | of "unsafe" code blocks in many languages.
               | 
               | Does anything like this exist? I Googled "smart pointers
               | in Java" but just got a bunch of mid-quality answers
               | where people explained that I'm stupid for even asking
               | this and they're unnecessary because Java manages its own
               | memory. But hasn't someone smarter thought of this?
        
               | rbehrends wrote:
               | What happens if a large std::unordered_map<std::string,
               | std::string> has its destructor called?
               | 
               | The maximum number of references is a red herring. While
                | having an RC capped at 1 allows you to elide the actual
               | reference count and makes pointer assignment cheaper, it
               | does not materially affect the primary source of latency
               | in a reference counting implementation, namely cascading
               | deletions.
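                | 
                | I.e. (sketch):
                | 
                |     #include <string>
                |     #include <unordered_map>
                | 
                |     void f() {
                |         std::unordered_map<std::string, std::string> m;
                |         // ... insert a few million entries ...
                |     } // the destructor frees every node and string
                |       // synchronously, right here, on this thread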
        
               | bluGill wrote:
                | Don't do that. Good data design and architecture are
                | always needed.
        
               | pron wrote:
               | And yet, the reason tracing GCs are chosen by virtually
               | all high-performance languages that heavily rely on GC is
               | that they've been found to be faster in practice for the
               | common workloads.
               | 
               | One of the reasons why your intuition is not so
               | straightforward is that a tracing GC needs to do no work
               | whatsoever when the number of references is zero. One of
                | the common ways to teach the basics of GC is by looking
                | at tracing and refcounting as duals:
               | refcounting needs to work to free objects, while tracing
                | works to keep them alive. If you think in terms of
               | what work needs to be done to promptly determine when an
               | object becomes garbage, then you're already not thinking
               | in terms of tracing, because tracing never actually needs
               | to learn about when an object becomes garbage (this isn't
               | actually true when they do reference processing, but
               | that's another story).
               | 
               | Or, if you want to think about it another way, in a
               | tracing collector there are already only two cases no
               | matter how many pointers there are to an object:
               | reachable or not, i.e. the same one and zero as in your
               | case, only there isn't even a need to ever set the
               | counter to zero.
               | 
               | However, in principle tracing and refcounting can be
               | quite similar (https://www.cs.cornell.edu/courses/cs6120/
               | 2019fa/blog/unifie...) in their behaviour, but in
               | practice _most_ refcounting GCs in industry use are
                | crude, and don't match the performance of tracing GCs in
               | common use, which are quite sophisticated.
        
         | Thaxll wrote:
         | Pauses are kind of solved but the CPU usage for the GC is still
         | pretty high.
         | 
         | You're still at the mercy of unpredictable tail latency and
         | other corner cases.
        
           | pron wrote:
           | That goes for manual memory management -- and certainly
           | languages with a reference-counting GC, like Rust -- as well.
           | The main difference is by far footprint overhead.
        
             | flohofwoe wrote:
             | I think the important thing to understand is that reference
             | counting isn't any better (and often worse) than "regular"
             | garbage collection.
             | 
             | The point of manual memory management is to come up with
             | problem-specific strategies to avoid or at least reduce
             | dynamic memory allocation, not to insert manual
             | release/free calls for individual objects ;)
        
               | pron wrote:
                | Reference counting _is_ regular garbage collection. The
               | two broad classes of GC algorithms are tracing and
               | refcounting, and while they can converge to similar
               | behaviour, _usually_ the former optimises for throughput
               | while the latter for memory footprint; latency is similar
               | these days.
        
               | flohofwoe wrote:
               | > Reference counting is regular garbage collection.
               | 
               | ...while I agree, for many C++ and Rust coders statements
               | like this are pure heresy ;)
        
               | pjmlp wrote:
               | They should read some CS literature, the kind that is
               | used to write the compilers they rely on. :)
        
               | cesarb wrote:
               | > ...while I agree, for many C++ and Rust coders
               | statements like this are pure heresy ;)
               | 
               | It's a matter of definitions. For many people, "garbage
               | collection" refers only to tracing GC, and reference
               | counting is a separate category. In my experience, that's
               | the common usage; insisting that "reference counting is
               | formally (in some paper from the last century) also
               | defined as a form of GC" will not magically change the
               | opinions "many C++ and Rust coders" have about tracing
               | GC. In fact, I'd say that insisting on this nomenclature
               | point only weakens the whole argument; tracing GC should
                | stand on its own merits, and not depend on some
               | nomenclature equivalence to be accepted (if quibbling
               | about nomenclature is your strongest argument, your
               | arguments are weak).
        
               | pron wrote:
               | There's no need to "change opinions". People who work on
               | GCs know that reference counting and tracing are the two
               | general GC strategies. The only people who don't think of
               | refcounting as a GC are people who simply aren't familiar
               | with GCs and how they work. If they also think
               | refcounting has lower latencies (let alone higher
               | throughput) than tracing, then they're also just wrong.
               | No one needs to "insist" on the GC nomenclature. You're
               | either familiar with it or you're not (and since most
               | people are not, they commonly make mistakes on the
               | subject). Also, given that tracing GCs are used by ~90%
                | of the market, they hardly require justification anymore;
               | they've won by a large margin over the application space
               | (which constitutes most of software). However, it's nice
               | to occasionally educate those unfamiliar with the subject
               | on GC algorithms and nomenclature.
        
               | ngrilly wrote:
                | > Also, given that tracing GCs are used by ~90% of the
                | market, they hardly require justification anymore;
               | they've won by a large margin over the application space
               | (which constitutes most of software).
               | 
               | Tracing GCs have clearly proven themselves and are
               | everywhere (JVM, CLR, Go, Dart, OCaml, etc.) but we can't
               | ignore that the Apple ecosystem (Swift) is using ARC.
               | That's a significant share of the "market". Python and
               | Ruby also use reference counting, but I don't think
               | anyone is considering them state-of-the-art GC.
        
               | zozbot234 wrote:
               | Except that obligate ARC ala Swift has even lower
               | throughput than obligate tracing GC. It's the worst
               | possible choice unless you _really_ care about low-
               | latency and deterministic freeing of resources (and even
               | then, using RAII for common tree-like allocation patterns
               | like Rust does will perform better).
        
               | pron wrote:
               | You're right, I should have said "languages where GC is
               | the primary means of managing heap memory are used by 90%
               | of the market" rather than focused on a specific
               | algorithm.
        
               | ngrilly wrote:
                | Yes, it is quite fascinating how GC replaced manual
               | memory management in most apps over the last 20~30 years.
        
               | dasyatidprime wrote:
               | I have to wonder whether some of this is semantic drift
               | over time or context. My recollection since undergrad (a
               | few decades ago) involves treating "garbage collection"
               | as referring to tracing garbage collection, and
               | "reference counting" as a separate mechanism. There _is_
               | still a term for the category including both, only that
               | term is not "garbage collection" but "automatic memory
               | management". But what I see nowadays is closer to what
               | you describe.
        
               | zozbot234 wrote:
               | Automatic memory management is more general than that, it
               | also includes stack allocation.
        
               | dasyatidprime wrote:
               | I agree; I meant "including" in the non-restrictive
               | sense, not "including only". Stack allocation is a
               | special case where the lifetimes are arranged in a
               | convenient way--see also escape analysis in languages
               | where stack allocation isn't explicitly supported at the
               | language level but can be added by the compiler.
        
               | PaulDavisThe1st wrote:
               | From TFA:
               | 
               | > Tools to automate the "actually freeing the memory"
               | part, like lifetimes in Rust and RAII in C++, don't solve
               | these problems. They absolutely aid correctness,
               | something else you should care deeply about, but they do
               | nothing to simplify all this machinery.
        
             | MForster wrote:
             | Rust by default doesn't do reference counting.
             | 
             | You can opt into reference counting with `std::rc::Rc`.
             | (You can even opt into mark-and-sweep GC using the `gc`
             | crate, but this isn't done much...).
        
             | mdavidn wrote:
             | Rust is more similar to C++, in that the compiler inserts
             | calls to free as variables exit scope. Runtime reference
             | counting is limited to those objects wrapped with Rc or
             | Arc.
             | 
             | I agree with pron's larger point. GC is fine for most
             | applications. It's just factually inaccurate to compare
             | Rust's memory management with languages like Python and
             | PHP.
        
             | cesarb wrote:
             | > and certainly languages with a reference-counting GC,
             | like Rust
             | 
             | It's a mistake to say that the Rust _language_ has
              | reference counting. There's a pair of reference-counting
             | wrapper types in its standard library (Rc and Arc), or you
             | can roll your own, but there's no special support for these
             | types in the language, and their use is optional. Most of
             | the time, you won't be using these reference-counting
             | types. Box (the generic heap-allocated or "boxed" type)
             | doesn't use reference counting. String (and its several
             | specialized variants) doesn't use it. Vec (the generic
             | heap-allocated array type) doesn't use it. HashMap,
             | HashSet, BTreeMap, BTreeSet, none of them use reference
             | counting. And so on. You can write a lot of Rust code
             | without using reference counting even once.
             | 
             | What the Rust language has is just C++-style RAII: when a
             | value goes out of scope, if it implements Drop, its
             | Drop::drop is called.
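             | 
             | (To make the distinction concrete, a minimal Rust sketch;
             | the names are illustrative. Drop runs deterministically at
             | scope exit, while Rc is an opt-in, per-object library
             | wrapper.)
             | 
             |   use std::rc::Rc;
             |   
             |   struct Noisy(&'static str);
             |   
             |   impl Drop for Noisy {
             |       fn drop(&mut self) {
             |           println!("dropping {}", self.0);
             |       }
             |   }
             |   
             |   fn main() {
             |       let _a = Noisy("a");            // plain RAII value
             |       let shared = Rc::new(Noisy("shared")); // opt-in RC
             |       let clone = Rc::clone(&shared); // refcount -> 2
             |       drop(clone);                    // refcount -> 1
             |   } // prints "dropping shared", then "dropping a"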
        
               | cogman10 wrote:
               | > It's a mistake to say that the Rust language has
               | reference counting.
               | 
               | Having these types in the standard library is the
               | language having those types.
               | 
               | Perhaps it's not integrated to the level that a language
               | like swift is. However, I think it's reasonable to say
               | the language supports Rc when the standard library
               | supports it. I'd say the same thing about C++ with
               | `shared_ptr`.
               | 
               | Otherwise you end up in weird pedantic notions about what
               | a language has or does not have. Does C have a heap?
               | Well, technically no since malloc and free are just
               | function calls in the standard library and you can write
               | valid C programs without calling those functions.
        
               | cesarb wrote:
               | > Having these types in the standard library is the
               | language having those types.
               | 
               | It depends on whether you consider the standard library
               | an indivisible part of the language or not. For Rust,
               | it's clearly not the case, since you have the #![no_std]
               | mode in which only a subset of the standard library is
               | available, and this subset does not include these
               | reference counted wrapper types (or any heap-allocated
               | type at all).
               | 
               | > Perhaps it's not integrated to the level that a
               | language like swift is. However, I think it's reasonable
               | to say the language supports Rc when the standard library
               | supports it. I'd say the same thing about C++ with
               | `shared_ptr`.
               | 
               | It's one thing to say a language "supports reference
               | counting", which only means you can use reference
               | counting with it, and another thing to say "[...]
               | languages with a reference-counting GC", which implies
               | that the language uses a GC for everything, and that GC
               | is a reference-counting GC.
               | 
               | > Does C have a heap? Well, technically no since malloc
               | and free are just function calls in the standard library
               | and you can write valid C programs without calling those
               | functions.
               | 
               | It's actually the same thing: C can run on either a
               | "hosted" environment or a "freestanding" environment, and
               | on the latter, most of the standard library is not
               | available, including malloc and free. So C does not
               | necessarily have a heap when running on a freestanding
               | environment.
        
               | arcticbull wrote:
               | It's not part of std exactly, it's part of alloc. It's
               | re-exported by std.
               | 
               | It would still be available in a #![no_std] environment
               | using `extern crate alloc`.
               | 
               | This crate generally abstracts over the concept of
               | allocation too, so relying on it doesn't require you to
               | also have an allocator; it just requires that someone at
               | some point specify one with #[global_allocator].
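               | 
               | (A minimal sketch of what that looks like; it assumes a
               | #[global_allocator] is registered elsewhere, and omits the
               | panic handler a complete no_std binary also needs.)
               | 
               |   #![no_std]
               |   extern crate alloc; // heap types without the rest of std
               |   
               |   use alloc::rc::Rc;   // Rc lives here; std re-exports it
               |   use alloc::vec::Vec;
               |   
               |   fn demo() -> usize {
               |       let mut v: Vec<u32> = Vec::new(); // allocates via the
               |       v.push(1);        // registered #[global_allocator]
               |       let shared = Rc::new(v);
               |       shared.len()
               |   }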
        
               | kaba0 wrote:
               | > and their use is optional
               | 
               | It is not, if you have objects with dynamic lifetimes,
               | and allocating them for the whole duration of the program
               | is not an option.
               | 
               | Sure, their use can be much rarer than in a managed
               | language that can only do automatic memory management,
               | but RC is objectively worse than tracing GC from most
               | perspectives, except that it needs no runtime support
               | and has a slightly lower memory overhead.
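               | 
               | (The standard concrete example of that gap: plain Rc
               | leaks reference cycles that a tracing GC would reclaim;
               | Rust's escape hatch is Weak. A minimal sketch:)
               | 
               |   use std::cell::RefCell;
               |   use std::rc::{Rc, Weak};
               |   
               |   struct Node {
               |       // a strong Rc back-link here would form a cycle
               |       // and leak; Weak breaks it, at the cost of having
               |       // to upgrade() before use
               |       parent: RefCell<Weak<Node>>,
               |   }
               |   
               |   fn main() {
               |       let p = Rc::new(Node {
               |           parent: RefCell::new(Weak::new()),
               |       });
               |       let c = Rc::new(Node {
               |           parent: RefCell::new(Rc::downgrade(&p)),
               |       });
               |       assert!(c.parent.borrow().upgrade().is_some());
               |   } // both nodes freed; with strong links both ways,
               |     // neither refcount would ever reach zero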
        
           | adgjlsfhk1 wrote:
            | The CPU usage of manual memory management is also pretty
            | high; it's just more evenly distributed throughout the
            | program, making it harder to observe.
        
         | hyperpape wrote:
         | Where are the measurements comparing throughput of tracing GCs
         | and manual memory management? I'm aware of how incredibly hard
         | this area is to measure, but it's a shame that the state of
         | mainstream discussion is "most people just assume GC implies
         | slow, but then again a handful of people say it's not."
        
           | pron wrote:
            | Given that the no-GC-by-default market is ~10% of the global
            | software market [1], with no signs of a shift in either
            | direction over the past couple of decades (which sounds
            | about right to me, considering the percentage of programs
            | that need to run in memory-constrained environments or must
            | have precise control over memory), it seems that the number
            | of those who may benefit significantly from a different
            | choice is small. So it doesn't look like anyone wants or
            | needs to be convinced of anything: "GC languages" already
            | command ~90% of the market and have little to gain from
            | such a small pool of potential converts, and the others
            | aren't trying to increase their market share, or succeeding
            | at it. So who cares, given the small stakes?
           | 
            | [1]: https://www.devjobsscanner.com/blog/top-8-most-demanded-prog...
        
       | teleforce wrote:
       | For promising modern and parallel GC techniques, please check
       | MPL (MaPLe) with its novel Automatic Management of Parallelism.
       | It won a distinguished paper award at POPL 2024 and the ACM
       | SIGPLAN dissertation award in 2023 by proposing these two main
       | things [1], [2]:
       | 
       | a) Provably efficient parallel garbage collection based on
       | disentanglement
       | 
       | b) Provably efficient automatic granularity control
       | 
       | [1] MaPLe (MPL):
       | 
       | https://github.com/MPLLang/mpl
       | 
       | [2] Automatic Parallelism Management:
       | 
       | https://dl.acm.org/doi/10.1145/3632880
        
         | bugbuddy wrote:
         | Nice links. Thanks for posting these.
        
         | pjmlp wrote:
         | Great material, thanks for sharing.
        
         | bool3max wrote:
         | What does "provably efficient" mean?
        
         | zogrodea wrote:
         | Standard ML and the community around it have been pretty
         | impressive as far as contributions to memory management
         | literature go.
         | 
         | There is of course the paper you linked, and there's also the
         | MLKit which was among the first users, and one of the pioneers,
         | of region-based memory management.
        
         | amelius wrote:
         | How does this compare against recent efforts in OCaml to
         | support multicore parallelism?
        
       | mgaunard wrote:
       | I do a lot of different types of systems programming.
       | 
       | The only times I actually use GC are to manage resources other
       | than memory.
        
       | celrod wrote:
       | The RCU use case is convincing, but my experience with GCs in
       | other situations has been poor. To me, this reads more like an
       | argument for bespoke memory management solutions being able to
       | yield the best performance (I agree!), which is a totally
       | different claim from the more general one that static lifetimes
       | outperform dynamic lifetimes (especially when a tracing step is
       | needed to determine liveness).
       | 
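       | (For reference, the RCU-ish pattern reads roughly like this in
       | Rust; a sketch assuming the third-party `arc-swap` crate.
       | Readers grab a cheap snapshot, a writer swaps in a fresh copy,
       | and the old version is freed once its last reader is done.)
       | 
       |   use arc_swap::ArcSwap;
       |   use std::sync::Arc;
       |   
       |   struct Config { threshold: u32 }
       |   
       |   fn main() {
       |       let cfg = ArcSwap::from_pointee(Config { threshold: 3 });
       |   
       |       let snap = cfg.load(); // reader: lock-free snapshot
       |       assert_eq!(snap.threshold, 3);
       |   
       |       // writer: build a new version, swap it in atomically
       |       cfg.store(Arc::new(Config { threshold: 5 }));
       |       // the old Config drops when the last snapshot is gone
       |   }
       | 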
       | > Lies people believe... Calling free() gives the memory back to
       | the OS.
       | 
       | I believe calling `free()` gives the memory back to the
       | allocator, which is much better than giving it to the OS;
       | syscalls are slow. Perhaps not immediately; mimalloc only makes
       | frees available to future `malloc`s periodically.
       | 
       | Trying a simple benchmark where I allocate and then immediately
       | `free` 800 bytes, 1 million times, and count the number of
       | unique pointers I get:
       | 
       |   glibc's malloc: 1
       |   jemalloc: 1
       |   mimalloc: 4
       |   Julia's garbage collector: 62767
       | 
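       | (A rough Rust rendering of the malloc/free half of that
       | benchmark, for anyone who wants to reproduce it against their
       | own allocator; the size and count mirror the numbers above.)
       | 
       |   use std::alloc::{alloc, dealloc, Layout};
       |   use std::collections::HashSet;
       |   
       |   fn main() {
       |       let layout = Layout::from_size_align(800, 8).unwrap();
       |       let mut unique = HashSet::new();
       |       for _ in 0..1_000_000 {
       |           unsafe {
       |               let p = alloc(layout);     // ask the allocator
       |               unique.insert(p as usize); // record the address
       |               dealloc(p, layout);        // immediately reusable
       |           }
       |       }
       |       println!("unique pointers: {}", unique.len());
       |   }
       | 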
       | 62767, at about 48 MiB, isn't that bad, but it still blows out my
       | computer's L3 cache. Using a GC basically guarantees every new
       | allocation is from RAM, rather than cache. This kills performance
       | of any heavily allocating code; we don't care only about how fast
       | memory management can work, but how quickly we can work with
       | what it gives us. I gave a benchmark in Julia showcasing this:
       | https://discourse.julialang.org/t/blog-post-rust-vs-julia-in...
       | 
       | Malloc/free gives you a chance at staying hot, if your actual
       | working memory is small enough.
       | 
       | Allocators like mimalloc are also designed (like the compacting
       | GC) to have successive allocations be close together. The 4
       | unique pointers I got from mimalloc were 896 bytes apart.
       | 
       | My opinions might be less sour if I had more experience with
       | compacting GCs, but I think GCs are just a vastly more
       | complicated solution to the problem of safe memory management
       | than something like Rust's borrow checker. Given that the
       | complexity is foisted on the compiler and runtime developers,
       | that's normally not so bad for users, and an acceptable tradeoff
       | when writing code that isn't performance sensitive. Similarly,
       | RAII with static lifetimes is also a reasonable tradeoff for code
       | not important enough for more bespoke approaches. The article's
       | example is evidently one of those deserving a more bespoke
       | solution.
        
         | chc4 wrote:
         | It's really not enough to just say that a GC gave you more
         | pointers = it has worse cache locality. Compacting GC almost
         | always has _better_ cache utilization than malloc, because heap
         | fragmentation over long-running programs will waste TLB cache
         | entries and slack space between objects. A bump allocator from
         | a compacting GC will give you a new pointer for each allocation
         | because free doesn't reclaim the memory... but those
         | allocations will be sequential, and if you are in the case
         | where you are churning through your heap and only ever touch
         | the most recent object, they will still always be in cache.
         | Benchmarking the knock-on effects of allocators and GCs is
         | insanely difficult, and I'm very skeptical of basically any
         | synthetic benchmark like this.
        
           | hedora wrote:
           | I think the fact that it is complicated to reason about is
           | precisely why systems developers don't trust GC's.
           | 
           | It's far easier to write a threadsafe bump (slab) allocator
           | than to locate and diagnose the code that someone wrote two
           | years ago and, as of last week, started causing the GC to
           | blow up the cache, contend on a lock, fragment the heap, add
           | unbounded pauses, burn an extra cpu, etc, etc.
           | 
           | (Though, at this point, most mallocs are so good that the
           | slab allocator loses anyway and there's no need to bother.)
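           | 
           | (For scale, a thread-safe bump allocator really is only a
           | handful of lines; a minimal sketch, with no per-object free:
           | you reclaim everything at once by dropping the arena, and
           | `align` must be a power of two.)
           | 
           |   use std::sync::atomic::{AtomicUsize, Ordering};
           |   
           |   struct Bump {
           |       arena: Box<[u8]>,  // fixed backing storage
           |       next: AtomicUsize, // offset of the next free byte
           |   }
           |   
           |   impl Bump {
           |       fn new(size: usize) -> Self {
           |           Bump { arena: vec![0u8; size].into_boxed_slice(),
           |                  next: AtomicUsize::new(0) }
           |       }
           |   
           |       // lock-free: align, bump, retry if another thread won
           |       fn alloc(&self, len: usize, align: usize)
           |           -> Option<*const u8>
           |       {
           |           let mut cur = self.next.load(Ordering::Relaxed);
           |           loop {
           |               let start = (cur + align - 1) & !(align - 1);
           |               let end = start.checked_add(len)?;
           |               if end > self.arena.len() { return None; }
           |               match self.next.compare_exchange_weak(
           |                   cur, end, Ordering::Relaxed,
           |                   Ordering::Relaxed) {
           |                   Ok(_) => return Some(
           |                       self.arena[start..].as_ptr()),
           |                   Err(seen) => cur = seen,
           |               }
           |           }
           |       }
           |   }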
        
           | convolvatron wrote:
           | Compaction really does help runtimes a lot, but I'm not sure
           | how much it really has to do with line-level locality. In
           | general we don't try to batch related objects together,
           | except in a coarse form by generation.
           | 
           | I think the measurable benefit comes from page-level savings,
           | both from reducing the number of trips to the kernel to get
           | zeroed pages and from reduced pressure on the TLB.
           | 
           | But I have definitely seen numbers like 20% on some workloads
           | for turning on compaction.
        
           | celrod wrote:
           | FWIW, that synthetic benchmark was reflective of some real
           | world code we were deploying. Using malloc/free for one
           | function led to something like a 2x performance improvement
           | of the whole program.
           | 
           | I think it's important to differentiate between malloc
           | implementations/algorithms, just like it's important to
           | differentiate between GCs. E.g., mimalloc "shards" size
           | classes into pages, with separate free lists per page. This
           | way, subsequent allocations are all from the same page.
           | Freeing is not eager; only when an entire page is freed, or
           | when a new allocation finds the page empty, does it hit a
           | periodic slow path to do deferred work.
           | https://www.microsoft.com/en-us/research/uploads/prod/2019/0...
           | 
           | Good malloc implementations can also employ techniques to
           | avoid fragmentation. It's unfortunate that the defaults are
           | bad.
           | 
           | But I confess, compacting GCs and profiling the effects of
           | heap fragmentation (especially over time in long running
           | programs) are both things I lack experience in.
           | Microbenchmarks are unlikely to capture that accurately.
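           | 
           | (In Rust, at least, trying a different allocator is a
           | two-line swap; a sketch assuming the third-party `mimalloc`
           | crate is listed in Cargo.toml.)
           | 
           |   // every Box/Vec/String allocation in the program now goes
           |   // through mimalloc instead of the platform default
           |   use mimalloc::MiMalloc;
           |   
           |   #[global_allocator]
           |   static GLOBAL: MiMalloc = MiMalloc;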
        
         | louthy wrote:
         | This makes no sense to me. In a generational GC gen-0 is more
         | likely than not to be cached -- and that's where the 'churn'
         | is. Outside of that, any longer lived allocations are by
         | definition not easy to control cache-wise.
         | 
         | Locality is one of the big wins for GCs, the only issue I'm
         | aware of is the 'stop the world' mark/sweep (yes, I know modern
         | GCs have a background thread -- but you still get
         | stop-the-world events afaik)
        
         | darby_eight wrote:
         | If cache usage is that major of a concern, arena allocation
         | works just as well as it does with manual memory allocation.
         | Thankfully there aren't too many areas where garbage collection
         | has to compete with such conveniently contrived examples.
        
         | MichaelMoser123 wrote:
         | > I believe calling `free()` gives the memory back to the
         | allocator, which is much better than giving it to the OS
         | 
         | Having to deal with memory fragmentation in long running
         | servers is no fun at all, especially internal fragmentation of
         | pages maintained by slab allocators.
         | 
         | This is not a very common problem, but it is a hard one to
         | deal with.
        
         | arcticbull wrote:
         | The post explains why this works in the RCU context, why it
         | sucks in general, and then just writes it off and ignores it.
         | 
         | > At this point, some folks fire back with non-arguments about
         | how this isn't "real" garbage collection. Like, uh, because you
         | manually mark the garbage!
         | 
         | Yeah. People's concerns are that the process of figuring out
         | what memory is no longer used is inefficient and non-
         | deterministic relative to simply telling the allocator when
         | you're done with a resource. I've never met someone who's been
         | concerned with deferring deallocation.
         | 
         | Sure, traversing the whole live set is rare and we've spent 30
         | years tweaking GC algorithms to make them better, and now
         | they're practically sentient. However this statement either
         | willfully or unintentionally writes off the thing people
         | actually have an issue with. If you run into GC issues in your
         | services you have to bring in a shaman to tweak things here and
         | there hoping it sends the angry spirits back to the shadow
         | realm.
         | 
         | If you're just marking the garbage and being notified when it's
         | no longer used, that entire process is gone.
         | 
         | Yes, it can be very fast to allocate memory in a GC. This
         | ignores the costs of marking and compaction, which need to be
         | amortized in for a fair comparison.
         | 
         | The other big issue people have with GC is that in general it
         | requires _significantly_ more memory than manual memory
         | management to achieve equivalent performance. And you have to
         | have a bunch of extra CPU power to throw at redundantly
         | checking if things are still referenced over and over. And you
         | have to be okay with a bunch of extra memory copies for
         | optimistic compaction.
         | 
         | Finally, the post criticizes manual memory management (Rust's
         | Arc/Rc) as being necessary when you have unclear lifetimes,
         | but ignores that you basically build the exact same
         | infrastructure in GC'd languages to close external resources,
         | since you can't rely on finalizers ever being called.
         | 
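         | (The contrast in Rust terms, as a minimal sketch: RAII closes
         | the resource deterministically, with no Dispose-style
         | machinery to build.)
         | 
         |   use std::fs::File;
         |   use std::io::Write;
         |   
         |   fn write_log(path: &str) -> std::io::Result<()> {
         |       let mut f = File::create(path)?;
         |       writeln!(f, "done")?;
         |       Ok(())
         |   } // `f` is closed here by Drop -- deterministically, with
         |     // no finalizer and no hand-rolled close-tracking
         | 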
         | Anyways, this has been beaten to death for the last 20-30
         | years, and this article doesn't seem to bring anything new to
         | the table besides waving off the legitimate concerns about GC
         | with memes, which is fine because memes are fun IMO.
         | 
         | The correct answer is exactly what you say - there is no
         | general correct answer. You use the tool appropriate to the job
         | to meet the design constraints of the system.
        
         | kazinator wrote:
         | free() gives back memory to your local POSIX. :)
        
         | osigurdson wrote:
         | >> My opinions might be less sour if I had more experience with
         | compacting GCs
         | 
         | I have quite a bit of experience with the GC in .NET. For
         | projects that deal with large data structures, the GC is
         | something that you are always thinking about, though its
         | behavior is conceptually transparent. I think I would
         | ultimately prefer a more explicit approach.
        
       | samatman wrote:
       | This skirts around the edge of an observation which I want to
       | dive into, which is that modern user OSes (anything which isn't a
       | specialized RTOS) have built-in garbage collection. We just don't
       | call it that: we just call it memory management. What do we call
       | languages with a built-in GC? Memory-managed languages!
       | 
       | You see this in a lot of older "top-to-bottom" C programs: they
       | allocate, they clean up system resources (using longjmp to one
       | label), but they don't bother with free. When the program exits
       | the OS gets all that memory back anyway, so why bother?
       | 
       | There's a missed opportunity here, to have an OS with a garbage
       | collector with less isolation from programs, one which handles
       | resources more like a language's runtime GC. It will probably
       | stay missed, because in the typical GCed language, the GC is
       | intricately built into practically every line of the runtime, so
       | it's not really practical to make a distribution of that language
       | for one OS which hands that control over to the operating system.
       | 
       | But it's a pity, because there's a lot of room to improve some of
       | the chronic problems we see from this artificial isolation of
       | program-level memory management from OS level.
        
         | flohofwoe wrote:
         | > You see this in a lot of older "top-to-bottom" C programs:
         | they allocate, they clean up system resources (using longjmp to
         | one label), but they don't bother with free. When the program
         | exits the OS gets all that memory back anyway, so why bother?
         | 
         | You don't need to bother with releasing other types of
         | resources either though (files, sockets, threads, ...), since
         | the operating system will take care of that on process exit
         | (unless you are on AmigaOS). The only reason to free memory is
         | to recycle that memory in other allocations in long-running
         | applications without grabbing new memory from the OS. For
         | one-shot command line tools it's usually not needed.
        
         | kaba0 wrote:
         | In Java, the Epsilon GC is just that.
        
         | ninkendo wrote:
         | The OS only knows it can free memory when your process exits
         | (same as file handles and other resources.) If your process is
         | designed to exit once it's done its job, you _can_ use the OS
         | as a garbage collector.
         | 
         | Having an operating system know when memory is unused _within
         | your running program_ is not something that has ever existed
         | though (except perhaps for some esoteric research OSes). I
         | wouldn't say we're missing an opportunity, because the thing
         | we're missing doesn't exist in any meaningful sense. On the
         | other hand, a programming methodology that uses very simple,
         | short-lived programs is a totally legitimate way to do
         | things... it's how CLI tools and the scripting languages that
         | script them work, it's how web servers used to work (with
         | CGI/etc), and it's a perfectly reasonable approach even today.
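         | 
         | (A sketch of that style in Rust: deliberately leak and let
         | process exit clean up. `config.txt` is just an illustrative
         | input.)
         | 
         |   fn main() {
         |       // Box::leak trades the free() for a &'static borrow;
         |       // the OS reclaims the pages when the process exits
         |       let config: &'static str = Box::leak(
         |           std::fs::read_to_string("config.txt")
         |               .expect("read failed")
         |               .into_boxed_str(),
         |       );
         |       println!("{} bytes of config", config.len());
         |   }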
        
       | netbioserror wrote:
       | I use Nim in production. It defaults to RC. The biggest benefit
       | of runtime automatic memory management that rarely gets
       | mentioned: You can easily eliminate almost all memory semantics
       | from a typical server-side program. My code is 99.9% business
       | logic, with the 0.1% being a couple procedures which interface to
       | a C library.
       | 
       | Hardcore manual memory people seem to have completely misaligned
       | priorities to real-world concerns. Maintainability, safety,
       | productivity, and iteration speed seem to be bottom-of-list
       | next to the egotistical holy grail of being able to say they
       | write their own memory routines. If they're not in an embedded
       | context, I'm
       | really unsure why anyone paying them to write code should care
       | what they have to say. They're not talking about increasing
       | robustness and reducing costs, so what the hell are they even
       | rambling about? I've had too many of these debates in person,
       | face-to-face, with people who can't match my ability to deliver.
       | 
       | The vast majority of us are not smarter than the compiler, and
       | are not smarter than automatic MM routines. Those who are write
       | compilers and GCs. They don't work on the same programs we all
       | write, just memory managed. It's the worst-kept secret that the
       | remaining contexts where manual management is used often have the
       | worst-maintained spaghetti codebases, leading to disasters,
       | whistleblown scandals, and hush-hush covered catastrophes waiting
       | in the wings while people get the hell out of dodge before they
       | blow. It's all duct tape and prayers. Having had to inspect and
       | translate procedures from a deliberately obfuscated spaghetti C
       | codebase, my position on this is going to be hard to budge.
       | Experience is an unbeatable Bayesian prior.
        
         | EatFlamingDeath wrote:
         | How's Nim in production? I really enjoyed working with the
         | language and I've been bitten too many times by Python's
         | package management. Do you think it's suitable for scripting
         | too?
        
           | netbioserror wrote:
           | I use it in a context it's extremely well suited for: As a
           | CLI executable invoked by other server-side scripts. It's a
           | program that does lots of mathematical and statistical data
           | processing, along with HTML report generation. Like I said, I
           | use the default RC and don't even think about memory. The
           | type system is very simple; there is only one nominal string
           | type, for example, instead of the 12 in Rust or C++, and it's
           | a fleshed-out string type that can be mutable or immutable. I
           | also take big advantage of its preference for functional-
           | style referential transparency with only local mutations. I
           | have only a single module that could be construed as "OO".
           | Oh, and the module system works exactly like you probably
           | intuit a module system should, so you probably already know
           | how to use it.
           | 
           | If you want to script with Nim, it's actually quite nice; it
           | has a "run" command which compiles and runs without
           | depositing artifacts in the run directory, and has an
           | incredibly robust standard library, comparable to that of
           | Python.
           | 
           | I have no experience making a live always-running Nim
           | application, so I can't speak to that. But in the context I
           | use it, it's incredible.
        
             | cb321 wrote:
             | > I have no experience making a live always-running Nim
             | application, so I can't speak to that. But in the context I
             | use it, it's incredible.
             | 
             | I have done this several ways quite successfully. Firstly,
             | instead of "%cpu" I tend to measure "10s..100s of parts per
             | billion" CPU from a ~100-line "cron library" that I use
             | instead of system cron daemons:
             | https://github.com/c-blake/cron -- seeing 40 ppb on one box
             | and 73 ppb on another at the moment.
             | 
             | Another example might be
             | https://github.com/c-blake/bu/blob/main/doc/dirq.md which
             | is a kind of ad hoc daemon to monitor however many
             | directories on Linux with inotify and then launch user
             | commands against dropped-in files. This can be a sort of
             | Plan 9 "plumb" style thing. E.g., one of the directories I
             | monitor is a browser download directory. So, one click to
             | save and then boom -- a potential cascade of activity.
             | (EDIT: This is clocking in at about 9572177ns/(48*3600s) =~
             | 55 parts per billion for me right now.)
             | 
             | As a final example, I got annoyed with the unwanted
             | complexity of modern syslog junk. So, I whipped up
             | https://github.com/c-blake/kslog which, just looking it up
             | for this post, according to /proc/PID/schedstat, has only
             | accumulated about 27357590/(24*3600+24*60) =~ 311 parts per
             | billion of 1 CPU on a 4-core system since boot about
             | 24h:24m ago. This in about 25% of the lines of Nim code
             | that busybox spends on less useful C code (to your point of
             | "almost all the code is the actual logic, not other junk").
             | 
             | Also, while it is not as full-featured as stdlib `string`,
             | there is a zero-copy `cligen/mslice.MSlice` type
             | (https://github.com/c-blake/cligen/blob/master/cligen/mslice....)
             | that does have basic features like splitting and number
             | parsing. There are probably others out in the Nimbleverse.
             | If view types ever migrate from experimental status to
             | relied-upon that might become less interesting.
             | 
             | Since you do a lot of statistics, you might also appreciate
             | the fmtUncertain* subsystem of
             | https://github.com/c-blake/cligen/blob/master/cligen/strUt.n...
             | which allows you to retain only a meaningful number of
             | digits, and also
             | https://github.com/c-blake/adix/blob/master/adix/mvstat.nim
             | which has efficient moving quantiles via a Fenwick Tree
             | logarithmic histogram.
        
           | netbioserror wrote:
           | I should also mention: Be careful analyzing forum or example
           | Nim code. A LOT of Nim programmers are coming from the C or
           | C++ side of things and trying to use it like C or C++. You
           | can write most Nim programs essentially like Python. Keep it
           | clean, keep it simple. Use iterators, use map(), use
           | universal function call syntax, use the full semantic power
           | of sequences (very comparable to Python lists).
        
         | zozbot234 wrote:
         | Most systems programming languages don't have fully manual
         | memory management these days -- they use RAII. Manual
         | deallocation is possible, but not used often.
        
         | CyberDildonics wrote:
         | _Hardcore manual memory people seem to have completely
         | misaligned priorities to real-world concerns. Maintainability,
         | safety, productivity, and iteration speed seem to be
         | bottom-of-list next to the egotistical holy grail of being
         | able to say they write their own memory routines._
         | 
         | I think you've gotten very worked up over a theoretical
         | bogeyman who doesn't exist. Most natively compiled programs
         | are written in C++, with Rust a distant second, and both put a
         | lot of effort into the selling point that you don't have to
         | manually manage memory the way you do in C, relying on
         | ownership instead. I've never seen anyone like the person
         | you're describing.
        
           | netbioserror wrote:
           | I have met people like that. Sorry.
        
       | atum47 wrote:
       | I once enabled garbage collection for this software I was writing
       | and it collected everything.
        
         | ervine wrote:
         | Sprayed spot remover on my dog. Next day he was gone.
        
       | mwkaufma wrote:
       | (1) The pivot from RCU to general-purpose tracing GCs is a
       | bait-and-switch.
       | 
       | (2) Manual memory management is more than just malloc/free calls
       | -- it's about layout (e.g. struct-of-arrays, inlining, implicit
       | offsets, packing, etc)
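       | 
       | (To illustrate the layout point, a minimal Rust sketch of
       | array-of-structs vs. struct-of-arrays; the names are
       | illustrative.)
       | 
       |   // array-of-structs: hot and cold fields interleaved; a pass
       |   // over `x` drags `y` and `mass` through the cache with it
       |   struct ParticleAoS { x: f32, y: f32, mass: f32 }
       |   
       |   // struct-of-arrays: each field contiguous; a pass over `x`
       |   // streams through one dense buffer
       |   struct ParticlesSoA {
       |       x: Vec<f32>,
       |       y: Vec<f32>,
       |       mass: Vec<f32>,
       |   }
       |   
       |   impl ParticlesSoA {
       |       fn advance_x(&mut self, dt: f32) {
       |           for x in &mut self.x {
       |               *x += dt;
       |           }
       |       }
       |   }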
        
         | kaba0 wrote:
         | I disagree that (2) is manual memory management. Contemporary
         | managed languages definitely lack control over memory layout
         | (though it's also far from perfect in low-level languages),
         | but there are ways to affect it.
        
           | cb321 wrote:
           | I agree with your disagreement. As a case in point, Nim has
           | packed bitfields but various choices in automatic memory
           | management. As a concrete example, a spell-check custom-data
           | store uses them here:
           | https://github.com/c-blake/suggest/blob/04e313f8f8d3adf4cb55...
           | (there the memory is in a memory-mapped file, and so that
           | code has to manage the space itself... so, maybe not the
           | perfect example).
           | 
           | But I also agree there tends to be a correlation in PLang
           | design between avoiding low-level memory layout control and
           | avoiding manual memory management. But it's just a tendency,
           | not fundamental.
        
       | PaulDavisThe1st wrote:
       | I quoted this in another comment here, but just to highlight one
       | of the best couple of sentences in TFA:
       | 
       | > Tools to automate the "actually freeing the memory" part, like
       | lifetimes in Rust and RAII in C++, don't solve these problems.
       | They absolutely aid correctness, something else you should care
       | deeply about, but they do nothing to simplify all this machinery.
        
       | bsdpufferfish wrote:
       | Why are they always trying to sell these things to audiences who
       | are not interested?
        
       | HippoBaro wrote:
       | For the kind of software I write there are two cases: (1) the hot
       | path for which I will always have custom allocators and avoid
       | allocations and (2) everything else.
       | 
       | For (1), GC or not, it makes no difference; I'll opt out. For
       | (2), GC is really convenient and correct.
        
         | leapis wrote:
         | Agreed- I come from a Java/C++ shop where we tried to tackle
         | this dichotomy with interop but it ended up causing more
         | problems than it solved. A lot of the work that Java has done
         | with modern garbage collectors is impressive, but even they
         | admit (indirectly, via Valhalla) that no/low-alloc code has
         | its place.
        
       | musicale wrote:
       | The tricky part is identifying which systems programmers can be
       | garbage collected and when.
        
         | GeorgeTirebiter wrote:
         | Soylent Green was set in 2022/2023
         | https://en.wikipedia.org/wiki/Soylent_Green?useskin=vector
        
       | worik wrote:
       | Not mentioned in this article: one thing that goes very well
       | with GC is async/await.
       | 
       | I am an async/await hater for personal, idiosyncratic style
       | reasons that I will not bore you with
       | 
       | I use it a lot in TypeScript/JavaScript. Have done so in Dart.
       | Works as it should
       | 
       | I have used it in Rust. IMO it is a disaster there. Shoehorning
       | the sort of memory management required to use async/await with
       | a multithreaded runtime is a hellscape
       | 
       | https://doc.rust-lang.org/std/pin/index.html
        
       ___________________________________________________________________
       (page generated 2024-03-30 23:00 UTC)