[HN Gopher] Garbage Collection for Systems Programmers (2023)
___________________________________________________________________
Garbage Collection for Systems Programmers (2023)
Author : ingve
Score : 246 points
Date : 2024-03-30 11:03 UTC (11 hours ago)
(HTM) web link (bitbashing.io)
(TXT) w3m dump (bitbashing.io)
| spintin wrote:
| Or you could just use static atomic variables, no?
|
| I mean, it won't solve your race conditions, but it also won't
| crash.
|
| And it will be VERY fast, because parallelism without cache
| misses!
|
| I wish there were a Java with C-struct JNI access.
| bugbuddy wrote:
| This is false. Depending on the CPU architecture, updating a
| shared static atomic variable may cause cache invalidation. How
| fast it is depends on how hot and tight your critical sections
| are.
| yvdriess wrote:
| Saying this as a big hardware atomics enjoyer: Static atomic
| variables are not fast, are not parallel and definitely are the
| worst kind of cache misses.
|
| Atomics are also atomic only on that one single operation, such
| as manipulating a single number in a struct or swapping out a
| pointer from one version of an object to another. To atomically
| expose a set of changes to a data structure, you either use a
| lock or swap the old for the new, which is the driving example
| in the article.
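|
| A minimal C++ sketch of that swap pattern (the Config type and
| its fields are made up for illustration):
|
|     #include <atomic>
|
|     struct Config { int timeout_ms; int retries; };
|
|     std::atomic<const Config*> g_config{new Config{500, 3}};
|
|     // Writer: build the new version off to the side, then
|     // publish it with one atomic pointer swap. Readers see
|     // either the old or the new Config, never a half-updated
|     // one.
|     void update_config(int timeout_ms, int retries) {
|         const Config* fresh = new Config{timeout_ms, retries};
|         const Config* stale =
|             g_config.exchange(fresh, std::memory_order_acq_rel);
|         // Freeing `stale` is only safe once no reader can still
|         // hold it -- which is exactly the reclamation problem
|         // the article's RCU discussion is about.
|         (void)stale;
|     }
|
|     int current_timeout() {
|         return g_config.load(std::memory_order_acquire)->timeout_ms;
|     }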
| neonsunset wrote:
| It's called C# (supporting plain C structs is just a small part
| of its low-level features)
| pjmlp wrote:
| Or D, available as an official frontend language in the GCC and
| LLVM projects :)
| magicalhippo wrote:
| A point that seems to get lost in many of the pro-GC articles,
| this one included from what I can see, is that memory is just one
| kind of resource.
|
| Correct code, especially in systems programming, will need to
| manage external resources as well, be it file handles, sockets or
| whatnot. GC only solves the application memory part, thus doesn't
| help at all for handling those external resources. In fact it can
| make it much more complicated; just look at what it takes to
| correctly implement a non-trivial IDisposable in .NET.
|
| Other approaches like RAII or reference counting make it much
| easier in my experience to handle both memory and external
| resources in a unified way, thus making it easier to write
| correct code and reason about it.
|
| That said, I'm not blatantly against GC's. It's a tool and it has
| some pros and cons, like everything else. The "manual GC" RCU
| approach mentioned in the article is interesting for certain
| tasks.
| zwieback wrote:
| So true. Moving from C++ to mostly C#, I'm liking memory
| management by GC but hating tracking file handles, sockets,
| etc. RAII is something I really appreciated.
| neonsunset wrote:
| (SafeFileHandle is internally reference counted, but yes,
| worst case when you forget to .Dispose it, it waits in the
| finalizer queue for the GC to trigger the finalizer thread to
| process the items in it)
| pjmlp wrote:
| Use a Roslyn analyzer that errors when a using is forgotten on
| an IDisposable type, for example.
|
| By the way, in modern .NET, using makes use of structural
| typing. Any class or struct with a Dispose() method can be
| used; there is no need to implement the interface, and
| Dispose() can also be retrofitted via extension methods.
| osigurdson wrote:
| This is strange advice. Why not just implement the
| IDisposable interface like normal C# code? Using extension
| methods for this is strange as well. Doing this even
| somewhat correctly would mean creating another class that
| implements IDisposable which can only work with public
| members of the original. In general best not to do weird
| stuff for no reason imo.
| pron wrote:
| There is a big difference between memory and other resources,
| and that is that memory -- like processing -- is fundamental
| for any computation; not for nothing do most theoretical models
| of computation assume unbounded memory. Very few languages
| require manual allocation of processing -- even though that is
| done in some software like OS kernels and hard realtime
| applications -- and for similar reasons automatic memory
| management is very useful for abstracting computation, i.e. a
| subroutine not leaking the details of its memory use to its
| clients, just as this is rarely done for CPU use.
|
| So while every non-trivial computation involves some non-
| constant amount of processing and memory, I/O is usually done
| at the edges of the system. Management of I/O is very
| important, of course, but it's not as central to the notion of
| computation (and therefore to abstracting computation) as
| processing and memory.
| samatman wrote:
| > _GC only solves the application memory part, thus doesn't
| help at all for handling those external resources._
|
| This is far from entirely true. Most languages with a garbage
| collector have finalizers, which will clean that sort of thing
| up. Generally, and unlike memory allocation, one can call those
| finalizers from within user code, so as not to have to rely on
| the GC running.
|
| The distinction between reference counting and garbage
| collection is an artificial one. Reference counting is an
| approach to garbage collection, one with different tradeoffs
| from more concept-central algorithms for GC. I agree with you
| that it's a more unified approach to resource management;
| finalizers require user attention in a way which ordinary
| allocation doesn't, so yes, system resources get treated
| differently. I don't see that as a slam-dunk against them,
| however.
| o11c wrote:
| One problem with finalizers, try-finally, try-with-resources,
| Python-`with`, etc. is that they don't actually guarantee the
| code will be called even in common cases.
|
| In particular, any function that returns a (possibly newly-
| constructed) open file handle has not yet informed the caller
| that it's actually a thing to keep track of, unlike with
| destructors, which are active _immediately_ once the object's
| lifetime begins, and only rely on `move` being relatively
| atomic.
| kaba0 wrote:
| How is a try-with-resources not guaranteed? At least in
| Java, it is guaranteed to run - plus, you can get your
| closing/destructor logic wrong in RAII languages as well.
| cesarb wrote:
| > How is a try-with-resources not guaranteed? At least in
| Java, it is guaranteed to run
|
| There's a subtle detail you have to be careful with,
| however: if you try to allocate memory or call a method
| between allocating your resource and actually doing the
| try-with-resources, you might get an OutOfMemoryError or
| a StackOverflowError, and your resource will leak. That
| is, if you do something like:
|
|     try (MyHolder holder = new MyHolder(allocateResource())) {
|         ...
|     }
|
| You can have a leak if the memory allocation in the "new
| MyHolder()" fails, or if there's not enough space in the
| stack to call the MyHolder constructor.
|
| > plus, you can get your closing/destructor logic wrong
| in RAII languages as well.
|
| For instance, in C++ you can accidentally put your
| resource in a temporary which is immediately destructed
| at the end of the line, when you wanted it to last until
| the end of the enclosing scope. Rust makes it harder for
| this to happen, but it's still possible.
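|
| A classic C++ instance of that pitfall, sketched with a lock
| guard (shared_counter and the function names are illustrative):
|
|     #include <mutex>
|
|     std::mutex mu;
|     int shared_counter = 0;
|
|     void buggy() {
|         // Unnamed temporary: the guard is destroyed -- and the
|         // mutex unlocked -- at the end of this statement.
|         std::lock_guard<std::mutex>{mu};
|         ++shared_counter;  // runs WITHOUT the lock held
|     }
|
|     void correct() {
|         // Named object: the guard lives to the end of the scope.
|         std::lock_guard<std::mutex> guard{mu};
|         ++shared_counter;
|     }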
| o11c wrote:
| I mean, it's possible to write bad code in C++. But at
| least it's possible to write good code too. C++ provides a
| careful sequence that's hard to get wrong and is easily
| compiler-enforced:
|
| * if a given subobject (field or base class)'s
| constructor runs, its destructor is guaranteed to run.
|
| * in particular, in low-level ownership classes, you can
| arrange for field initializers to be `noexcept`, so you
| get to run their dtor, regardless of subsequent
| manipulation of the fields - just be sure to not assume
| invariants from the fully-constructed main object case.
| In most classes, deferring _all_ ownership logic to the
| fields is simpler - rule of zero beats rule of 5. And if
| you _do_ add manual try-catch during construction it will
| actually work.
|
| Languages other than C++ generally fail in several ways:
|
| * Allow exceptions to be thrown by the runtime for
| reasons unrelated to the code being executed
|
| * No support for subobjects, forcing all objects to be
| allocated separately; the runtime then doesn't know it
| needs to poke them, since finalizers are only applied to
| objects, not pointers. In particular, Python's
| `contextlib.ExitStack` can _narrow_ the window in which
| leaks are possible, but not eliminate it.
|
| Rust does slightly better than C++ by enforcing trivial
| moves, at the cost of making many useful programs
| impossible to write.
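|
| A short sketch of that first guarantee (the names are made up):
| if a member's constructor has run, its destructor runs even
| when the enclosing constructor throws.
|
|     #include <cstdio>
|
|     struct Member {
|         Member() noexcept { std::puts("Member constructed"); }
|         ~Member()         { std::puts("Member destroyed"); }
|     };
|
|     struct Owner {
|         Member m;              // constructed before Owner's body
|         Owner() { throw 42; }  // Owner never finishes...
|         ~Owner() { std::puts("never printed"); }  // ...so ~Owner()
|     };                         // never runs; ~Member() still does.
|
|     int main() {
|         try {
|             Owner o;
|         } catch (int) {
|             std::puts("caught");  // after "Member destroyed"
|         }
|     }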
| worik wrote:
| Ouch!
|
| > Rust does slightly better than C++ by enforcing trivial
| moves, at the cost of making many useful programs
| impossible to write.
|
| What "useful" programs fit a useful definition of "many"?
|
| It makes self referential looped data structures hard to
| write (double linked lists, trees with pointers to
| parents).
|
| No loss. Great improvement to have fewer of those.
|
| It makes arbitrary allocation hard. Enforces behaviors.
| Makes a programmer think very hard.
|
| "useful programs impossible to write. " No it does not.
| It ,ages a lot of bad ideas hard to implement, makes
| nothing "impossible "
| pjmlp wrote:
| You use the C and C++ approach to all their design flaws:
| reach for a static analysis tool that covers such cases.
| cesarb wrote:
| > Most languages with a garbage collector have finalizers,
| which will clean that sort of thing up.
|
| Using finalizers for cleanup of non-memory resources is bad
| because they will only be called when there's memory
| pressure. If you have used all of your non-memory resource,
| but there's still plenty of free memory, the allocation of
| that resource will fail; if you instead force a garbage
| collection at that point, not only will you cause a pause,
| but also you will be collecting memory while there's still
| plenty of it available.
| zozbot234 wrote:
| Finalizers will be non-deterministic when called as part of
| GC. One advantage of reference counting is that it _is_
| deterministic. (And yes, this means that sometimes freeing
| the last reference that's keeping a bunch of objects around
| will lead to a spike of extra deallocations. Guess what,
| that's what determinism is. It's implied by the need to free
| resources promptly.)
| PhilipRoman wrote:
| Trusting the GC with freeing externally observable resources
| will bring you some very painful bugs.
|
| On that note, it's 2024 and we still cannot unmap a mmapped
| file from Java in a timely manner. I really hope they do
| something about it.
| tialaramex wrote:
| Yes, and the "Memory Safety" arguments apply to the other
| resources too. For example Rust eventually grew I/O safety, so
| your File Handles (in Unix, OwnedFd) or just straight up
| Handles (in Windows, OwnedHandle) are owned objects, not just
| integers like the number 4
|
| On the surface this looks like it's just about avoiding dumb
| mistakes like using arithmetic on handles or mistakenly using
| reserved values as sentinels -- but the ownership model also
| means anything tricky we're doing with handles has explicit
| ownership, transparent to future maintainers.
| pjmlp wrote:
| Yet a point that tends to get lost when criticising GC
| languages is that most of them have features for deterministic
| resource management, which many keep failing to learn:
|
| - Some do have RAII
|
| - Some do offer keywords
|
| - Some do arena like management, lambdas with implicit
| management
|
| - Some have a little help from the type system
|
| - Some do a bit of everything listed above
|
| Additionally, just like system developers have to rely on
| static analysers, the static analysers for those languages can
| also provide validation when something is forgotten, when the
| type system alone isn't enough.
| magicalhippo wrote:
| My point though is that by moving memory management into
| "don't care" territory while still, obviously, requiring
| explicit handling of other resources, it's easier to forget
| or miss when you need to explicitly handle something.
|
| When instantiating objects in C#, say, I need to check the
| documentation or the source code to see if a class implements
| IDisposable to know if I need to handle it. Let's say that for
| a given class X in library Y, it does not. So I don't worry,
| I just instantiate and don't do anything about the cleanup
| because GC will handle it.
|
| Later, the implementation of X changes and IDisposable is
| added to it. I now have to change my code, and not doing so
| could lead to serious production issues. Yet the compiler
| happily compiles my code without any warning.
|
| Sure some static analyzers might catch it, but they're not
| perfect, and you need to run them. A stock Visual Studio 2022
| will not complain about the above scenario for example.
|
| In my experience it's much less error prone to unify memory
| and external resource management. If instead I had been using
| a different language which has a more uniform resource
| handling, the above change in class X would likely not have
| been a big deal.
|
| Also, my code will already be written with resource handling
| in mind. It can be non-trivial having to change a hierarchy
| of classes just because a dependency deep down suddenly had
| IDisposable added to it.
|
| I guess what I'm trying to say is that just focusing on
| memory management, that is, to GC or not to GC, is taking a
| myopic view of things. I feel it's like arguing about what
| kind of pickle to have on your burger without considering
| the other ingredients. Sure, the pickle is a crucial
| ingredient, but there's a lot more to it.
| worik wrote:
| Yes. But...
|
| > GC only solves the application memory part,
|
| Except it does not. "Solve" that is.
|
| It helps, yes it does
|
| But as I have learnt the hard way (a silly way too, to be
| truthful), GC will not help unless you actually delete all
| references to the unused memory.
|
| Perhaps GCs have developed magic moo-cow properties in the
| twenty-five years since I made that discovery, but I think
| the point remains:
|
| GC is very helpful, but it does not stop resource leaks
| keybored wrote:
| The article motivates RCU and then does a U-turn into a
| general argument for general-purpose GC. Not quite a trojan
| horse, but a bit whiplash-provoking.
| o11c wrote:
| I definitely wouldn't call the RCU thing a GC, since at no
| point is the object garbage. It is in one of 3 states, and
| changes between them as quickly as possible:
|
| * active
|
| * obsolete but alive for old readers
|
| * deallocated
|
| Note that depending on how you write your code, it may be
| possible to _reuse_ an "obsolete-but-alive" object for a "new"
| allocation safely, though I haven't analyzed performance for
| this in full.
|
| As usual for GC discussions, it is _very_ handwavy about when
| you have to "fall back to `shared\\_ptr/Arc`", when in fact
| avoiding refcounts (either because you can prove you already
| have ownership (which does have implications for tail calls,
| but you shouldn't use those anyway), or by avoiding indirection
| entirely) is the whole point of serious refcount-based systems.
| Doing nothing at all is obviously better than GC's "do
| something, sometime".
| bckr wrote:
| Off topic but what is going on in the picture of the leaking
| pipe? I don't think it's AI but there's a third arm in there that
| I don't understand. There's at least two guys in the picture but
| I can't assign one of the arms to either of them.
| mindv0rtex wrote:
| It's definitely AI generated, that picture makes no sense.
| deutschepost wrote:
| It's three people. The guy on the right is leaning over another
| one under the water. It's probably a picture from a flood
| training simulator.
|
| Edit:
| https://commons.wikimedia.org/wiki/File:US_Navy_040308-N-000...
|
| The picture was uploaded in 2009. Downvote if you want, but
| there is no AI at work here.
| pron wrote:
| Except in the special case where all memory can be easily handled
| in arenas, good tracing GCs have long ago surpassed manual memory
| management in throughput, and more recently their latency impact
| is more than acceptable for the vast majority of applications
| (OpenJDK's ZGC has typical pause times measured in double/triple-
| digit microseconds, and the worst case rarely exceeds 1ms for a
| reasonable allocation rate -- the pauses are in the same ballpark
| as OS-induced ones). The only real and significant tradeoff is in
| memory footprint, and outside of specialty niches (where arenas
| just work for everything and worst-case latency is in the low
| microseconds range) that is the only high order question: is my
| application running in a memory-constrained environment (or it's
| really worth it to sacrifice other things to keep down RAM
| consumption) or not?
| znpy wrote:
| > (OpenJDK's ZGC has typical pause times measured in
| double/triple-digit microseconds, and the worst case rarely
| exceeds 1ms for a reasonable allocation rate -- the pauses are
| in the same ballpark as OS-induced ones)
|
| We've been benchmarking ZGC and Shenandoah at work, and the
| p100 pause time is usually below 500us (microseconds). ZGC
| seems to be performing a bit better, as it seems to be
| making fewer pauses than Shenandoah (hence doing more
| work per pause).
|
| We still have to run tests-in-production, but so far it seems
| that GC pauses are largely a solved issue when using ZGC (and
| Generational ZGC since Java 21).
| Galanwe wrote:
| > We've been benchmarking ZGC and Shenandoah at work, and the
| p100 pause time is usually below 500us
|
| > so far it seems that GC pauses are largely a solved issue
| when using ZGC
|
| What? 500us is abysmally slow.
|
| Most projects I work on have a latency budget of less than
| 10us, the average being 2us. That is the budget for the whole
| wire-in/wire-out for a packet. Even for less latency-
| sensitive workloads, 500us is a no-go for most networking
| applications.
| pron wrote:
| We're talking about occasional hiccups, not an average-case
| response-latency overhead. You can't get _worst-case_
| latency of 2-10us with a non-realtime kernel. Even a page
| fault could take longer than that.
| Galanwe wrote:
| > You can't get worst-case latency of 2-10us with a non-
| realtime kernel. Even a page fault could take longer than
| that.
|
| You obviously can, and this has nothing to do with the
| kernel being real-time or not.
|
| There is no situation I can think of where a page fault
| should occur on a properly setup system running a
| production networking software, meaning no swap, huge
| TLB, and proper memory management.
| pron wrote:
| If you can think of "no situation" where a server may
| incur a page fault, forced preemption, or need to perform
| any I/O to a service/database, then I hope you at least
| recognise that your world in no way represents the state
| of server software at large because _none_ of these
| things is true for the vast majority of server software.
|
| In a former life I worked on some safety-critical onboard
| avionics software for an ultrasonic platform, and 2us was
| around the upper-limit worst-case latency (i.e. you'll
| kill someone if you miss that deadline); still, it's not
| the kind of requirements the vast majority of software
| finds itself under.
|
| When working over the internet, some of the very best
| services are at >10ms _ping_ latency anyway, where a
| 500us hiccup is imperceptible.
| Galanwe wrote:
| > If you can think of "no situation" where a server may
| incur a page fault, forced preemption, or need to perform
| any I/O to a service/database, then I hope you at least
| recognise that your world in no way represents the state
| of server software at large
|
| I won't deny that the majority of software out there is
| not latency sensitive, but the context of this article is
| specifically targeting software that is _not_
| using garbage collection, arguing that GC is undeservedly
| overlooked. OP even adds that GC is a "solved problem"
| because some GC implementation has a 500us worst-case
| latency.
|
| My point is that the article author, and OP, are
| mistaken. Because if you are in the category of "I write
| server side software without GC" (e. g. C/C++), then
| 500us is horribly wrong.
|
| Your point being that 500us is fine for most software out
| there is surely true, but not relevant, because if that
| is the case, you're probably not using C/C++, thus this
| article is not targeting you.
|
| In _my world_ as you phrase it, traffic is unshaped. You
| need to be able to support line rate, otherwise packets
| are dropped, and hell breaks loose.
| pron wrote:
| FYI, ZGC doesn't perform _any_ collection work in the
| stop-the-world pauses. They are only required to get all
| mutator threads to atomically observe the increment of the
| "GC epoch" for all threads. All actual work, both marking and
| compaction, is done by GC threads running concurrently with
| the mutator threads or by the mutator threads themselves as
| they run. It is only when allocation rate is very, very high
| that an allocating mutator thread will be paused
| ("throttled") for a significant amount of time to allow the
| GC to catch up by freeing up memory (and if you hit these
| cases, then you might be better off using a throughput-
| oriented collector).
| hawk_ wrote:
| Strangely in our workloads we have noticed generational ZGC
| latencies are better than G1 at ~99th-99.9th percentile or
| below but worse at percentiles above that. The allocation
| rate is moderate.
| kaba0 wrote:
| Above that you might easily get measuring artifacts, like
| the OS swapping out your process once, or so.
| hawk_ wrote:
| Yes - but it's quite consistent. If it was 'noise' it
| wouldn't be.
| znpy wrote:
| > FYI, ZGC doesn't perform any collection work in the
| stop-the-world pauses.
|
| Yeah i know, but on a certain level I don't care what it
| does or does not do.
|
| I care about my application not having latency spikes due
| to stop the world pauses :)
| flohofwoe wrote:
| > Except in the special case ...
|
| IME it's the other way around: per-object individual lifetimes
| are a rare special case. In most real-world code there will be
| many related objects of the same or very similar lifetimes. In
| such code, tracking individual object lifetimes is overkill (in
| the end, memory management is all about lifetimes, and fewer
| individual lifetimes instead of many is always better because
| it means less work, both in manual and automatic memory
| management).
|
| Not having to think about object lifetimes is just very
| convenient, that's why GC languages were successful despite the
| significant under-the-hood complexity of a good garbage
| collector.
| pron wrote:
| That's not at all a rare special case in most server
| applications.
|
| One way to see that is to consider how much of a program's
| working set is in threads' stacks vs the heap. If most of
| the working set is in the heap, there's _usually_ some non-
| trivial object lifetimes involved (i.e. cases where lifetimes
| can be encapsulated and abstracted away from client code).
| Yes, sometimes all of these can be taken care of by arenas
| (and there are languages, like Zig, that strive to be very
| arena-friendly), but that -- i.e. the case where all objects
| are easily arena-able -- is not the more common scenario.
| flohofwoe wrote:
| Depends on the type of server I guess. I can easily imagine
| a situation where all allocations of a request are handled
| through a simple bump allocator, and once the request is
| done, the bump allocator is reset instead of 'freeing' each
| individual allocation.
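|
| A toy sketch of that per-request pattern (BumpArena and
| handle_request are hypothetical names):
|
|     #include <cstddef>
|     #include <vector>
|
|     class BumpArena {
|         std::vector<std::byte> buf_;
|         std::size_t off_ = 0;
|     public:
|         explicit BumpArena(std::size_t cap) : buf_(cap) {}
|
|         // Bump-allocate n bytes (align must be a power of two).
|         void* alloc(std::size_t n,
|                     std::size_t align = alignof(std::max_align_t)) {
|             std::size_t p = (off_ + align - 1) & ~(align - 1);
|             if (p + n > buf_.size()) return nullptr; // real code
|             off_ = p + n;                            // would chain
|             return buf_.data() + p;                  // more blocks
|         }
|
|         // End of request: "free" everything with one reset.
|         void reset() { off_ = 0; }
|     };
|
|     // BumpArena arena(1 << 20);
|     // for (;;) { handle_request(arena); arena.reset(); }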
| cogman10 wrote:
| This would only work for fairly trivial applications. The
| moment you start adding things like http clients or
| databases you have to start considering having connection
| pools with lifetimes that don't strictly match a request
| lifecycle.
|
| Not saying such an application doesn't exist, it
| certainly does.
| ncruces wrote:
| That "simple bump allocator" is basically allocating
| everything on the thread's stack, as the GP mentioned.
|
| It's what you put on the heap that has complex lifetimes.
| Sometimes you can fix that with an arena. If you can't,
| you probably can't figure out when the last reference to
| it dies either.
| jayd16 wrote:
| How did we go from "this is the most common case" to "I
| can imagine it?" Sure there are cases where a request can
| be handled by the stack but the point is that the more
| complex case is extremely common.
|
| Any kind of background/asynchronous work spawned from a
| request and your plans are shot.
| cratermoon wrote:
| In generational GCs there are two or more allocation regions.
| New objects are put in the "younger" generation, which is
| garbage collected separately from the other generations. This
| sort of resolves the issue of tracking individual object
| lifetimes by having all the short-lived objects subject to
| rapid GC. This means most of the effort tracking lifetimes is
| reserved for the fewer long-lived objects.
| lll-o-lll wrote:
| It's "medium term" objects that cause all the trouble.
| Async operations. That's why .net invested heavily into
| zero-alloc tasks, as the gc would kill scaling in async
| heavy code.
| zozbot234 wrote:
| > IME it's the other way around, per-object individual
| lifetimes is a rare special case
|
| It depends on your application domain. But in most cases
| where objects have "individual lifetimes" you can still use
| reference counting, which has lower latency and memory
| overhead than tracing GC and interacts well with manual
| memory management. Tracing GC can then be "plugged in" for
| very specific cases, preferably using a high performance
| concurrent implementation much like
| https://github.com/chc4/samsara (for Rust) or
| https://github.com/pebal/sgcl (for C++).
| pron wrote:
| Reference counting does not have lower latency than a
| tracing GC nor is it more generally predictable. In terms
| of performance, it is usually worse than a modern tracing
| GC by most performance metrics. Its benefits are, indeed,
| lower memory footprint and that it can achieve reasonable
| latency even using a crude implementation. In other words,
| you'd reach for refcounting either if you mostly care about
| footprint or if you don't have a lot of resources to invest
| in implementing the GC.
|
| Modern concurrent tracing has very little impact on
| latency. First, compared to refcounting, the barriers are
| rarely invoked. Second, they compact the heap concurrently
| in the background, and like all mark-compact GCs, their
| amortized cost drops to zero as the RAM increases (the work
| is linear with the size of the working set, which is
| roughly constant for a given program, but it needs to be
| done less frequently the more RAM you have). That's why
| high-performance languages that rely heavily on a GC --
| Java, C#, Go -- prefer tracing.
| lll-o-lll wrote:
| > nor is it more generally predictable
|
| Can you explain what you mean here? This does not match
| my experience or intuition.
| pron wrote:
| In theory, refcounting and tracing can behave similarly
| [1], but assuming we're speaking about their
| implementations in the industry (rather elaborate tracing
| GCs; rather crude refcounting GCs) then a refcounting GC
| would do some work as soon the refcount for a particular
| object drops to zero. When exactly that happens and how
| much work there is to do (and sometimes, what the
| fragmentation effect on heap is) are local properties
| that are not very predictable. In contrast, the amount of
| work a tracing GC needs to do when compacting is directly
| proportional to the program's working set and the
| frequency in which it needs to do that work is
| proportional to the allocation rate -- both of which are
| fairly regular and predictable global properties for a
| given program. For stop-the-world GCs there was then the
| matter of the highly unpredictable exact timing of a
| large STW pause, but modern concurrent GCs don't collect
| anything in STW pauses anymore. They only have very short
| (sub 1ms) and constant-time pauses and no surprise
| throttling as long as the allocation rate is within an
| acceptable range. So all in all, you pay a fairly fixed
| tax on the CPU (and if you want to pay less, just add
| more RAM) and virtually no impact on latency.
|
| [1]: https://www.cs.cornell.edu/courses/cs6120/2019fa/blo
| g/unifie...
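|
| That proportionality argument can be made concrete with a
| back-of-the-envelope model (the symbols c, L and H are my own,
| not from the thread): if each mark-compact cycle costs about
| c * L for a live set of L bytes, and the collector only needs
| to run once per H - L bytes allocated (the headroom in a heap
| of H bytes), then the amortized cost per allocated byte is
|
|     c * L / (H - L)  ->  0  as H grows with L fixed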
| lll-o-lll wrote:
| Thanks for responding. My experience with tracing GC at
| scale is exclusively in the realm of .Net, and RC
| exclusively with C++ smart pointers. That matches your
| "sophisticated vs crude" contrast.
|
| The experience with .Net is that GC impact was difficult
| to profile and correct, and also "lumpy", although that
| may have been before GC tuning. GC would dominate
| performance profiling in heavy async code, but these days
| can be corrected by value tasks and other zero alloc
| methods.
|
| For C++ style ref counting, the impact was a continuous %
| load and simple to profile (and therefore improve).
| Although here, the ref counting needed to be stripped
| from the hot paths.
|
| The biggest issue I've hit between the two modes though,
| is how they behave when hitting memory limits. Tracing GC
| appears to have an order of magnitude perf hit when
| memory becomes scarce, while ref counting does not suffer
| in this way. This is enough for me to personally dislike
| tracing GC, as that failure state is particularly
| problematic.
| neonsunset wrote:
| When you hit memory limits, .NETs GC implementation would
| perform much more frequent, invasive and aggressive
| collections, including LOH compaction to reduce memory
| watermark which leads to greater GC pauses, though this
| is rarely seen in such a bad way on modern versions with
| e.g. SRV GC.
|
| The most scalable way to address this is usually to just
| allocate less and use ValueTasks with pooling where
| applicable (frequent asynchronous yields). I'm certain if
| you built a .NET 8 based solution you would see user-
| written code dominate the heap allocation profile, as most
| hot internal paths of async utilize said state machine
| box pooling + ValueTask<T> [0] and may be entirely
| allocation-free.
|
| [0] Example: https://github.com/dotnet/runtime/blob/cc7bf
| 831f02cad241547e...
| lll-o-lll wrote:
| > When you hit memory limits, .NETs GC implementation
| would perform much more frequent, invasive and aggressive
| collections, including LOH compaction to reduce memory
| watermark which leads to greater GC pauses, though this
| is rarely seen in such a bad way on modern versions with
| e.g. SRV GC.
|
| The trouble with server GC mode is that then there is no
| natural back pressure. If the processing is not CPU
| bound, then memory allocation can grow unbounded. This is
| not something that happens with RC as, again, the GC
| performance hit is inlined with task processing. The
| service may not be capable of as much throughput, but it
| doesn't take out the entire server either.
|
| > The most scaling way to address this is usually to just
| allocate less and use valuetasks with pooling where
| applicable (frequent asynchronous yields), I'm certain if
| you built a .NET 8 based solution you would see user-
| written code dominate heap allocations profile, as most
| hot internal paths of async utilize said state machine
| box pooling+ValueTask<T>[0] and may be entirely
| allocation-free.
|
| Absolutely; I think it's relatively simple to write
| servers that scale using modern .net; the memory
| allocation foot-guns when dealing with asynchronous code
| are now well understood, and tooling is good. I am
| compressing ~15 years of experiences in that previous
| post.
|
| It's probably the case that a tracing GC is the better
| choice for most modern applications, excepting memory
| constrained devices (like phones), and so long as you
| design with memory in mind.
| neonsunset wrote:
| Ah, I see where you are coming from.
|
| You are correct, sustained load heap size of SRV GC has
| been a known pain point that had been particularly
| exacerbated after beefy Windows Server hosts fell out of
| fashion and got replaced by 512Mi Linux containers.
|
| There has been work conducted on this each release
| throughout Core 2.1, 3.1, and then 5, 6, 7 and 8 versions
| to make it play nicer with more constrained memory limit
| systems.
|
| The two major features that address this are Regions[0]
| (.NET 6/7) and DATAS[1] (.NET 8). The former is enabled
| by default everywhere except macOS and the latter is
| available opt-in either via env var
| DOTNET_GCDynamicAdaptationMode=1 or the MSBuild property
| GarbageCollectionAdaptationMode: 1 (see more in [1]).
|
| The latter has shown to significantly reduce sustained
| (or, especially, idling) heap size for some particularly
| problematic workloads (but not all, sometimes you just
| have a lot of live objects). I definitely recommend
| giving it a try if this is something still relevant to
| you.
|
| TLDR of what DATAS does is dynamic heap count scaling and
| much smarter heap up/downsizing depending on allocation
| rate/frequency and anticipated throughput impact of
| adjusting those.
|
| [0] https://itnext.io/how-segments-and-regions-differ-in-
| decommi... / https://devblogs.microsoft.com/dotnet/put-a-
| dpad-on-that-gc/
|
| [1] https://maoni0.medium.com/dynamically-adapting-to-
| applicatio...
| bluGill wrote:
| When the reference count can only be zero or one, reference
| counting has different performance than when it can be more,
| and it is much easier to reason about. This is most cases.
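|
| In C++ terms the distinction looks roughly like this (Widget is
| an illustrative type, not from the thread):
|
|     #include <memory>
|     #include <utility>
|
|     struct Widget { int x = 0; };
|
|     int main() {
|         // Count is at most one: unique_ptr keeps no counter at
|         // all. Ownership transfer is a compile-time move, and
|         // the destructor runs exactly when the single owner
|         // goes away.
|         std::unique_ptr<Widget> only = std::make_unique<Widget>();
|         std::unique_ptr<Widget> moved = std::move(only);
|
|         // The general case: shared_ptr allocates a control
|         // block, and every copy does an atomic incr/decr.
|         std::shared_ptr<Widget> a = std::make_shared<Widget>();
|         std::shared_ptr<Widget> b = a;  // count 1 -> 2
|     }   // b, a drop the count to 0; moved destroys its Widget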
| dmurray wrote:
| This sounds intuitively true. So...what if GC languages
| could introduce an optional annotation for an object to
| say that only one reference to it can exist, and use that
| as a hint to the GC?
|
| I don't see how this could be implemented in a safe and
| performant way - either you check for existing references
| at runtime, or you risk some use-after-free bug. But
| perhaps your project is already in a GC language and
| you're happy with that, but just want to optimise GC for
| this critical component. And we already have the concept
| of "unsafe" code blocks in many languages.
|
| Does anything like this exist? I Googled "smart pointers
| in Java" but just got a bunch of mid-quality answers
| where people explained that I'm stupid for even asking
| this and they're unnecessary because Java manages its own
| memory. But hasn't someone smarter thought of this?
| rbehrends wrote:
| What happens if a large std::unordered_map<std::string,
| std::string> has its destructor called?
|
| The maximum number of references is a red herring. While
| having an RC capped at 1 allows you to elide the actual
| reference count and makes pointer assignment cheaper, it
| does not materially affect the primary source of latency
| in a reference counting implementation, namely cascading
| deletions.
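|
| A sketch of that cascading-deletion cost (the sizes here are
| arbitrary):
|
|     #include <string>
|     #include <unordered_map>
|
|     void cascade() {
|         std::unordered_map<std::string, std::string> m;
|         for (int i = 0; i < 5'000'000; ++i)
|             m.emplace("key-" + std::to_string(i),
|                       std::string(64, 'x'));
|     }   // <- the destructor frees two heap strings plus a node
|         //    per entry: millions of deallocations at one program
|         //    point, paid deterministically right here.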
| bluGill wrote:
| Don't do that. Good data design and architecture are
| always needed.
| pron wrote:
| And yet, the reason tracing GCs are chosen by virtually
| all high-performance languages that heavily rely on GC is
| that they've been found to be faster in practice for the
| common workloads.
|
| One of the reasons why your intuition is not so
| straightforward is that a tracing GC needs to do no work
| whatsoever when the number of references is zero. One of
| the common ways to teach the basics of GC is by starting
| out as looking at tracing and refcounting as duals:
| refcounting needs to work to free objects, while tracing
| works to keep them alive. If you think in terms of
| what work needs to be done to promptly determine when an
| object becomes garbage, then you're already not thinking
| in terms of tracing, because tracing never actually needs
| to learn about when an object becomes garbage (this isn't
| actually true when they do reference processing, but
| that's another story).
|
| Or, if you want to think about it another way, in a
| tracing collector there are already only two cases no
| matter how many pointers there are to an object:
| reachable or not, i.e. the same one and zero as in your
| case, only there isn't even a need to ever set the
| counter to zero.
|
| However, in principle tracing and refcounting can be
| quite similar (https://www.cs.cornell.edu/courses/cs6120/
| 2019fa/blog/unifie...) in their behaviour, but in
| practice _most_ refcounting GCs in industry use are
| crude, and don't match the performance of tracing GCs in
| common use, which are quite sophisticated.
| Thaxll wrote:
| Pauses are kind of solved but the CPU usage for the GC is still
| pretty high.
|
| You're still at the mercy of unpredictable tail latency and
| other corner cases.
| pron wrote:
| That goes for manual memory management -- and certainly
| languages with a reference-counting GC, like Rust -- as well.
| The main difference is by far footprint overhead.
| flohofwoe wrote:
| I think the important thing to understand is that reference
| counting isn't any better (and often worse) than "regular"
| garbage collection.
|
| The point of manual memory management is to come up with
| problem-specific strategies to avoid or at least reduce
| dynamic memory allocation, not to insert manual
| release/free calls for individual objects ;)
| pron wrote:
| Reference counting _is_ regular garbage collection. The
| two broad classes of GC algorithms are tracing and
| refcounting, and while they can converge to similar
| behaviour, _usually_ the former optimises for throughput
| while the latter for memory footprint; latency is similar
| these days.
| flohofwoe wrote:
| > Reference counting is regular garbage collection.
|
| ...while I agree, for many C++ and Rust coders statements
| like this are pure heresy ;)
| pjmlp wrote:
| They should read some CS literature, the kind that is
| used to write the compilers they rely on. :)
| cesarb wrote:
| > ...while I agree, for many C++ and Rust coders
| statements like this are pure heresy ;)
|
| It's a matter of definitions. For many people, "garbage
| collection" refers only to tracing GC, and reference
| counting is a separate category. In my experience, that's
| the common usage; insisting that "reference counting is
| formally (in some paper from the last century) also
| defined as a form of GC" will not magically change the
| opinions "many C++ and Rust coders" have about tracing
| GC. In fact, I'd say that insisting on this nomenclature
| point only weakens the whole argument; tracing GC should
| stand on its own merits, and not on depend on some
| nomenclature equivalence to be accepted (if quibbling
| about nomenclature is your strongest argument, your
| arguments are weak).
| pron wrote:
| There's no need to "change opinions". People who work on
| GCs know that reference counting and tracing are the two
| general GC strategies. The only people who don't think of
| refcounting as a GC are people who simply aren't familiar
| with GCs and how they work. If they also think
| refcounting has lower latencies (let alone higher
| throughput) than tracing, then they're also just wrong.
| No one needs to "insist" on the GC nomenclature. You're
| either familiar with it or you're not (and since most
| people are not, they commonly make mistakes on the
| subject). Also, given that tracing GCs are used by ~90% of
| the market, they hardly require justification anymore;
| they've won by a large margin over the application space
| (which constitutes most of software). However, it's nice
| to occasionally educate those unfamiliar with the subject
| on GC algorithms and nomenclature.
| ngrilly wrote:
| > Also, given that tracing GCs are used by ~90% of the
| market, they hardly require justification anymore;
| they've won by a large margin over the application space
| (which constitutes most of software).
|
| Tracing GCs have clearly proven themselves and are
| everywhere (JVM, CLR, Go, Dart, OCaml, etc.) but we can't
| ignore that the Apple ecosystem (Swift) is using ARC.
| That's a significant share of the "market". Python and
| Ruby also use reference counting, but I don't think
| anyone is considering them state-of-the-art GC.
| zozbot234 wrote:
| Except that obligate ARC ala Swift has even lower
| throughput than obligate tracing GC. It's the worst
| possible choice unless you _really_ care about low-
| latency and deterministic freeing of resources (and even
| then, using RAII for common tree-like allocation patterns
| like Rust does will perform better).
| pron wrote:
| You're right, I should have said "languages where GC is
| the primary means of managing heap memory are used by 90%
| of the market" rather than focused on a specific
| algorithm.
| ngrilly wrote:
| Yes, this is quite fascinating how GC replaced manual
| memory management in most apps over the last 20~30 years.
| dasyatidprime wrote:
| I have to wonder whether some of this is semantic drift
| over time or context. My recollection since undergrad (a
| few decades ago) involves treating "garbage collection"
| as referring to tracing garbage collection, and
| "reference counting" as a separate mechanism. There _is_
| still a term for the category including both, only that
| term is not "garbage collection" but "automatic memory
| management". But what I see nowadays is closer to what
| you describe.
| zozbot234 wrote:
| Automatic memory management is more general than that, it
| also includes stack allocation.
| dasyatidprime wrote:
| I agree; I meant "including" in the non-restrictive
| sense, not "including only". Stack allocation is a
| special case where the lifetimes are arranged in a
| convenient way--see also escape analysis in languages
| where stack allocation isn't explicitly supported at the
| language level but can be added by the compiler.
| PaulDavisThe1st wrote:
| From TFA:
|
| > Tools to automate the "actually freeing the memory"
| part, like lifetimes in Rust and RAII in C++, don't solve
| these problems. They absolutely aid correctness,
| something else you should care deeply about, but they do
| nothing to simplify all this machinery.
| MForster wrote:
| Rust by default doesn't do reference counting.
|
| You can opt into reference counting with `std::rc::Rc`.
| (You can even opt into mark-and-sweep GC using the `gc`
| crate, but this isn't done much...).
| mdavidn wrote:
| Rust is more similar to C++, in that the compiler inserts
| calls to free as variables exit scope. Runtime reference
| counting is limited to those objects wrapped with Rc or
| Arc.
|
| I agree with pron's larger point. GC is fine for most
| applications. It's just factually inaccurate to compare
| Rust's memory management with languages like Python and
| PHP.
| cesarb wrote:
| > and certainly languages with a reference-counting GC,
| like Rust
|
| It's a mistake to say that the Rust _language_ has
| reference counting. There's a pair of reference-counting
| wrapper types in its standard library (Rc and Arc), or you
| can roll your own, but there's no special support for these
| types in the language, and their use is optional. Most of
| the time, you won't be using these reference-counting
| types. Box (the generic heap-allocated or "boxed" type)
| doesn't use reference counting. String (and its several
| specialized variants) doesn't use it. Vec (the generic
| heap-allocated array type) doesn't use it. HashMap,
| HashSet, BTreeMap, BTreeSet, none of them use reference
| counting. And so on. You can write a lot of Rust code
| without using reference counting even once.
|
| What the Rust language has is just C++-style RAII: when a
| value goes out of scope, if it implements Drop, its
| Drop::drop is called.
| cogman10 wrote:
| > It's a mistake to say that the Rust language has
| reference counting.
|
| Having these types in the standard library is the
| language having those types.
|
| Perhaps it's not integrated to the level that a language
| like swift is. However, I think it's reasonable to say
| the language supports Rc when the standard library
| supports it. I'd say the same thing about C++ with
| `shared_ptr`.
|
| Otherwise you end up in weird pedantic notions about what
| a language has or does not have. Does C have a heap?
| Well, technically no since malloc and free are just
| function calls in the standard library and you can write
| valid C programs without calling those functions.
| cesarb wrote:
| > Having these types in the standard library is the
| language having those types.
|
| It depends on whether you consider the standard library
| an indivisible part of the language or not. For Rust,
| it's clearly not the case, since you have the #![no_std]
| mode in which only a subset of the standard library is
| available, and this subset does not include these
| reference counted wrapper types (or any heap-allocated
| type at all).
|
| > Perhaps it's not integrated to the level that a
| language like swift is. However, I think it's reasonable
| to say the language supports Rc when the standard library
| supports it. I'd say the same thing about C++ with
| `shared_ptr`.
|
| It's one thing to say a language "supports reference
| counting", which only means you can use reference
| counting with it, and another thing to say "[...]
| languages with a reference-counting GC", which implies
| that the language uses a GC for everything, and that GC
| is a reference-counting GC.
|
| > Does C have a heap? Well, technically no since malloc
| and free are just function calls in the standard library
| and you can write valid C programs without calling those
| functions.
|
| It's actually the same thing: C can run on either a
| "hosted" environment or a "freestanding" environment, and
| on the latter, most of the standard library is not
| available, including malloc and free. So C does not
| necessarily have a heap when running on a freestanding
| environment.
| arcticbull wrote:
| It's not part of std exactly, it's part of alloc. It's
| re-exported by std.
|
| It would still be available in a #![no_std] environment
| using `extern crate alloc`.
|
| This crate generally abstracts over the concept of
| allocation too, so relying on it doesn't require you to
| also have an allocator - it just requires someone at some
| point specify one with #[global_allocator]
| kaba0 wrote:
| > and their use is optional
|
| It is not, if you have objects with dynamic lifetimes,
| and allocating them for the whole duration of the program
| is not an option.
|
| Sure, their use can be much less than a managed language
| that can only do automatic memory management, but RC is
| objectively worse from most perspectives than tracing
| GC, except that it doesn't need runtime support and has a
| slightly lower memory overhead.
| adgjlsfhk1 wrote:
| The CPU usage of manual memory management is also pretty
| high. It's just more evenly distributed throughout the
| program, making it harder to observe.
| hyperpape wrote:
| Where are the measurements comparing throughput of tracing GCs
| and manual memory management? I'm aware of how incredibly hard
| this area is to measure, but it's a shame that the state of
| mainstream discussion is "most people just assume GC implies
| slow, but then again a handful of people say it's not."
| pron wrote:
| Given that the no-GC-by-default market is ~10% of the global
| software market [1] with no signs of shift in either
| direction over the past couple of decades, which sounds about
| right to me (considering the percentage of programs that need
| to run in memory-constrained environment or must have precise
| control over memory), it seems that the number of those who
| may benefit significantly from a different choice is small
| and so it doesn't look like anyone wants or needs to be
| convinced of anything. "GC languages" already command ~90% of
| the market and have little to gain from such a small market
| of potential converts, and the others aren't trying or
| succeeding in increasing their market share, so who cares
| given the small stakes?
|
| [1]: https://www.devjobsscanner.com/blog/top-8-most-demanded-
| prog...
| teleforce wrote:
| For promising modern and parallel GC techniques, please check
| MPL (MaPLe) with its novel Automatic Management of Parallelism.
| It won a distinguished paper award at POPL 2024 and the ACM
| SIGPLAN dissertation award in 2023 by proposing these two main
| things [1],[2]:
|
| a) Provably efficient parallel garbage collection based on
| disentanglement
|
| b) Provably efficient automatic granularity control
|
| [1] MaPLe (MPL):
|
| https://github.com/MPLLang/mpl
|
| [2] Automatic Parallelism Management:
|
| https://dl.acm.org/doi/10.1145/3632880
| bugbuddy wrote:
| Nice links. Thanks for posting these.
| pjmlp wrote:
| Great material, thanks for sharing.
| bool3max wrote:
| What does "provably efficient" mean?
| zogrodea wrote:
| Standard ML and the community around it has been pretty
| impressive as far as contributions to memory management
| literature goes.
|
| There is of course the paper you linked, and there's also the
| MLKit which was among the first users, and one of the pioneers,
| of region-based memory management.
| amelius wrote:
| How does this compare against recent efforts in OCaml to
| support multicore parallelism?
| mgaunard wrote:
| I do a lot of different types of systems programming.
|
| The only times I actually use GC are to manage resources other
| than memory.
| celrod wrote:
| The RCU use case is convincing, but my experience with GCs in
| other situations has been poor. To me, this reads more like an
| argument for bespoke memory management solutions being able to
| yield the best performance (I agree!), which is a totally
| different case from the more general static lifetimes generally
| outperforming dynamic lifetimes (especially when a tracing step
| is needed to determine liveness).
|
| > Lies people believe... Calling free() gives the memory back to
| the OS.
|
| I believe calling `free()` gives the memory back to the
| allocator, which is much better than giving it to the OS;
| syscalls are slow. Perhaps not immediately; mimalloc only makes
| frees available to future `malloc`s periodically.
|
| Trying a simple benchmark where I allocate and then immediately
| `free` 800 bytes, 1 million times, and counting the number of
| unique pointers I get:
|
|     glibc's malloc:             1
|     jemalloc:                   1
|     mimalloc:                   4
|     Julia's garbage collector:  62767
|
| 62767, at about 48 MiB, isn't that bad, but it still blows out my
| computer's L3 cache. Using a GC basically guarantees every new
| allocation is from RAM, rather than cache. This kills performance
| of any heavily allocating code; we don't care only about how fast
| memory management can work, but how quickly we can work with
| what it gives us. I gave a benchmark in Julia showcasing this:
| https://discourse.julialang.org/t/blog-post-rust-vs-julia-in...
|
| Malloc/free gives you a chance at staying hot, if your actual
| working memory is small enough.
|
| Allocators like mimalloc are also designed (like the compacting
| GC) to have successive allocations be close together. The 4
| unique pointers I got from mimalloc were 896 bytes apart.
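|
| A rough C++ rendering of that experiment (the 800-byte size and
| iteration count mirror the numbers above):
|
|     #include <cstdio>
|     #include <cstdlib>
|     #include <unordered_set>
|
|     int main() {
|         std::unordered_set<void*> unique_addrs;
|         for (int i = 0; i < 1'000'000; ++i) {
|             void* p = std::malloc(800);   // allocate...
|             unique_addrs.insert(p);
|             std::free(p);                 // ...and free at once
|         }
|         // A reusing allocator reports a handful of cache-hot
|         // addresses; a bump/GC scheme keeps handing out new ones.
|         std::printf("unique addresses: %zu\n", unique_addrs.size());
|     }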
|
| My opinions might be less sour if I had more experience with
| compacting GCs, but I think GCs are just a vastly more
| complicated solution to the problem of safe memory management
| than something like Rust's borrow checker. Given that the
| complexity is foisted on the compiler and runtime developers,
| that's normally not so bad for users, and an acceptable tradeoff
| when writing code that isn't performance sensitive. Similarly,
| RAII with static lifetimes is also a reasonable tradeoff for code
| not important enough for more bespoke approaches. The article's
| example is evidently one of those deserving a more bespoke
| solution.
| chc4 wrote:
| It's really not enough to just say that a GC gave you more
| pointers = it has worse cache locality. Compacting GC almost
| always has _better_ cache utilization than malloc, because heap
| fragmentation over long-running programs will waste TLB cache
| entries and slack space between objects. A bump allocator from
| a compacting GC will give you a new pointer for each allocation
| because free doesn't reclaim the memory... but those
| allocations will be sequential, and if you were in the case
| where you are churning through your heap and only ever touch
| the most recent object they will always be in cache still.
| Benchmarking the knock-on effects of allocators and GCs is
| insanely difficult, and I'm very skeptical of basically any
| synthetic benchmark like this.
| hedora wrote:
| I think the fact that it is complicated to reason about is
| precisely why systems developers don't trust GC's.
|
| It's far easier to write a threadsafe bump (slab) allocator
| than to locate and diagnose the code that someone wrote two
| years ago and, as of last week, started causing the GC to
| blow up the cache, contend on a lock, fragment the heap, add
| unbounded pauses, burn an extra cpu, etc, etc.
|
| (Though, at this point, most mallocs are so good that the
| slab allocator loses anyway and there's no need to bother.)
| convolvatron wrote:
| Compaction really does help runtimes a lot, but I'm not sure
| how much it really has to do with line-level locality. In
| general we don't try to batch related objects together except
| in a coarse form by generation.
|
| I think the measurable benefit comes from page-level savings,
| both reducing the number of trips to the kernel to get zeroed
| pages, and from reduced pressure on the TLB.
|
| But I have definitely seen numbers like 20% on some workloads
| for turning on compaction.
| celrod wrote:
| FWIW, that synthetic benchmark was reflective of some real
| world code we were deploying. Using malloc/free for one
| function led to something like a 2x performance improvement
| of the whole program.
|
| I think it's important to differentiate between malloc
| implementations/algorithms, just like it's important to
| differentiate between GCs. E.g., mimalloc "shards" size
| classes into pages, with separate free lists per page. This
| way, subsequent allocations are all from the same page.
| Freeing does not free eagerly; only if the entire page is
| freed, or if we hit a new allocation and the page is empty,
| then it can hit a periodic slow path to do deferred work.
| https://www.microsoft.com/en-
| us/research/uploads/prod/2019/0...
|
| Good malloc implementations can also employ techniques to
| avoid fragmentation. It's unfortunate that the defaults are
| bad.
|
| But I confess, compacting GCs and profiling the effects of
| heap fragmentation (especially over time in long running
| programs) are both things I lack experience in.
| Microbenchmarks are unlikely to capture that accurately.
| louthy wrote:
| This makes no sense to me. In a generational GC gen-0 is more
| likely than not to be cached -- and that's where the 'churn'
| is. Outside of that, any longer lived allocations are by
| definition not easy to control cache-wise.
|
| Locality is one of the big wins for GCs, the only issue I'm
| aware of is the 'stop the world' mark/sweep (yes, I know modern
| GCs have a background thread -- but you still get stop-the-
| world events afaik)
| darby_eight wrote:
| If cache usage is that major of a concern, arena allocation
| works just as well as it does with manual memory allocation.
| Thankfully there aren't too many areas where garbage collection
| has to compete with such conveniently contrived examples.
| MichaelMoser123 wrote:
| > I believe calling `free()` gives the memory back to the
| allocator, which is much better than giving it to the OS
|
| Having to deal with memory fragmentation in long running
| servers is no fun at all, especially internal fragmentation of
| pages maintained by slab allocators.
|
| This is not a very common problem, but it is a hard one to
| deal with.
| arcticbull wrote:
| The post explains why this works in the RCU context, why it
| sucks in general, and then just writes it off and ignores it.
|
| > At this point, some folks fire back with non-arguments about
| how this isn't "real" garbage collection. Like, uh, because you
| manually mark the garbage!
|
| Yeah. People's concerns are that the process of figuring out
| what memory is no longer used is inefficient and non-
| deterministic relative to simply telling the allocator when
| you're done with a resource. I've never met someone who's been
| concerned with deferring deallocation.
|
| Sure, traversing the whole live set is rare, and we've spent 30
| years tweaking GC algorithms to make them better - now they're
| practically sentient. However, this statement either willfully
| or unintentionally writes off the thing people
| actually have an issue with. If you run into GC issues in your
| services you have to bring in a shaman to tweak things here and
| there hoping it sends the angry spirits back to the shadow
| realm.
|
| If you're just marking the garbage and being notified when it's
| no longer used, that entire process is gone.
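|
| (A toy illustration of the difference: you retire pointers
| yourself, and reclamation happens later without ever tracing
| the live set.)
|
|     #include <cstdlib>
|     #include <vector>
|
|     std::vector<void*> retired;  // garbage we marked ourselves
|
|     void retire(void* p) { retired.push_back(p); }
|
|     // Called at a point where no reader can still hold a retired
|     // pointer, e.g. between requests or after an RCU grace period.
|     void reclaim() {
|         for (void* p : retired) std::free(p);
|         retired.clear();
|     }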
|
| Yes, it can be very fast to allocate memory in a GC. But this
| ignores the costs of marking and compaction, which need to be
| amortized in for a fair comparison.
|
| The other big issue people have with GC is that in general it
| requires _significantly_ more memory than manual memory
| management to achieve equivalent performance. And you have to
| have a bunch of extra CPU power to throw at redundantly
| checking if things are still referenced over and over. And you
| have to be okay with a bunch of extra memory copies for
| optimistic compaction.
|
| Finally, the post criticizes manual memory management by noting
| that you need Rust's Arc/Rc when you have unclear lifetimes -
| but it ignores that you basically build the exact same
| infrastructure in GC'd languages to close external resources,
| since you can't rely on finalizers ever being called.
|
| Anyways, this has been beaten to death for the last 20-30
| years, and this article doesn't seem to bring anything new to
| the table besides waving off the legitimate concerns about GC
| with memes - which is fine, because memes are fun IMO.
|
| The correct answer is exactly what you say - there is no
| general correct answer. You use the tool appropriate to the job
| to meet the design constraints of the system.
| kazinator wrote:
| free() gives back memory to your local POSIX. :)
| osigurdson wrote:
| >> My opinions might be less sour if I had more experience with
| compacting GCs
|
| I have quite a bit of experience with the GC in .NET. For
| projects that deal with large data structures, the GC is
| something you are always thinking about, even though its
| behavior is conceptually transparent. I think I would
| ultimately prefer a more explicit approach.
| samatman wrote:
| This skirts around the edge of an observation which I want to
| dive into, which is that modern user OSes (anything which isn't
| a specialized RTOS) have built-in garbage collection. We just
| don't
| call it that: we just call it memory management. What do we call
| languages with a built in GC? Memory-managed languages!
|
| You see this in a lot of older "top-to-bottom" C programs: they
| allocate, they clean up system resources (using longjmp to one
| label), but they don't bother with free. When the program exits
| the OS gets all that memory back anyway, so why bother?
|
| There's a missed opportunity here, to have an OS with a garbage
| collector with less isolation from programs, one which handles
| resources more like a language's runtime GC. It will probably
| stay missed, because in the typical GCed language, the GC is
| intricately built into practically every line of the runtime, so
| it's not really practical to make a distribution of that language
| for one OS which hands that control over to the operating system.
|
| But it's a pity, because there's a lot of room to improve some of
| the chronic problems we see from this artificial isolation of
| program-level memory management from OS level.
| flohofwoe wrote:
| > You see this in a lot of older "top-to-bottom" C programs:
| they allocate, they clean up system resources (using longjmp to
| one label), but they don't bother with free. When the program
| exits the OS gets all that memory back anyway, so why bother?
|
| You don't need to bother with releasing other types of
| resources either though (files, sockets, threads, ...), since
| the operating system will take care of that on process exit
| (unless you are on AmigaOS). The only reason to free memory is
| to recycle that memory in other allocations in long running
| applications without grabbing new memory from the OS. For one-
| shot command line tools it's usually not needed.
| kaba0 wrote:
| In Java, the Epsilon GC is just that.
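|
| It's a no-op collector: the JVM allocates but never reclaims,
| and the OS takes everything back at exit. IIRC you enable it
| with experimental flags (JDK 11+), something like:
|
|     java -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC MyApp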
| ninkendo wrote:
| The OS only knows it can free memory when your process exits
| (same as file handles and other resources.) If your process is
| designed to exit once it's done its job, you _can_ use the OS
| as a garbage collector.
|
| Having an operating system know when memory is unused _within
| your running program_ is not something that has ever existed
| though (except for perhaps some esoteric research OSes). I
| wouldn't say we're missing an opportunity, because the thing
| we're missing doesn't exist in any meaningful sense. On the
| other hand, a programming methodology that uses very simple,
| short-lived programs is a totally legitimate way to do
| things... it's how CLI tools and the scripting languages that
| script them work, it's how web servers used to work (with
| CGI/etc), and it's a perfectly reasonable approach even today.
| netbioserror wrote:
| I use Nim in production. It defaults to RC. The biggest benefit
| of runtime automatic memory management that rarely gets
| mentioned: You can easily eliminate almost all memory semantics
| from a typical server-side program. My code is 99.9% business
| logic, with the 0.1% being a couple procedures which interface to
| a C library.
|
| Hardcore manual memory people seem to have completely misaligned
| priorities to real-world concerns. Maintainability, safety,
| productivity, and iteration speed seem to be bottom-of-list to
| the egotistic holy grail of being able to say they write their
| own memory routines. If they're not in an embedded context, I'm
| really unsure why anyone paying them to write code should care
| what they have to say. They're not talking about increasing
| robustness and reducing costs, so what the hell are they even
| rambling about? I've had too many of these debates face-to-face
| with people who can't match my ability to deliver.
|
| The vast majority of us are not smarter than the compiler, and
| are not smarter than automatic MM routines. Those who are
| smarter write the compilers and GCs. They don't work on the
| same programs we all write, just memory-managed. It's the
| worst-kept secret that the
| remaining contexts where manual management is used often have the
| worst-maintained spaghetti codebases, leading to disasters,
| whistleblown scandals, and hush-hush covered catastrophes waiting
| in the wings while people get the hell out of dodge before they
| blow. It's all duct tape and prayers. Having had to inspect and
| translate procedures from a deliberately obfuscated spaghetti C
| codebase, my position on this is going to be hard to budge.
| Experience is an unbeatable Bayesian prior.
| EatFlamingDeath wrote:
| How's Nim in production? I really enjoyed working with the
| language and I've been bitten too many times by Python's
| package management. Do you think it's suitable for scripting
| too?
| netbioserror wrote:
| I use it in a context it's extremely well suited for: As a
| CLI executable invoked by other server-side scripts. It's a
| program that does lots of mathematical and statistical data
| processing, along with HTML report generation. Like I said, I
| use the default RC and don't even think about memory. The
| type system is very simple; there is only one nominal string
| type, for example, instead of the 12 in Rust or C++, and it's
| a fleshed-out string type that can be mutable or immutable. I
| also take big advantage of its preference for functional-
| style referential transparency with only local mutations. I
| have only a single module that could be construed as "OO".
| Oh, and the module system works exactly like you probably
| intuit a module system should, so you probably already know
| how to use it.
|
| If you want to script with Nim, it's actually quite nice; it
| has a "run" command which compiles and runs without
| depositing artifacts in the run directory, and has an
| incredibly robust standard library, comparable to that of
| Python.
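| (If I remember right, that's just `nim r myscript.nim`; the
| build artifacts land in the nimcache directory instead of the
| working directory.)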
|
| I have no experience making a live always-running Nim
| application, so I can't speak to that. But in the context I
| use it, it's incredible.
| cb321 wrote:
| > I have no experience making a live always-running Nim
| application, so I can't speak to that. But in the context I
| use it, it's incredible.
|
| I have done this several ways quite successfully. Firstly,
| instead of "%cpu" I tend to measure "10s..100s of parts per
| billion" CPU from a like ~100 lines of code "cron library"
| that I use instead of system cron demons:
| https://github.com/c-blake/cron -- seeing 40 ppb on one box
| and 73 ppb on another at the moment.
|
| Another example might be
| https://github.com/c-blake/bu/blob/main/doc/dirq.md which
| is a kind of ad hoc daemon to monitor however many directories
| on Linux with inotify and then launch user-commands against
| dropped-in files. This can be a sort of
| Plan 9 "plumb" style thing. E.g., one of the directories I
| monitor is a browser download directory. So, one click to
| save and then boom - a potential cascade of activity.
| (EDIT: This is clocking in at about 9572177ns/(48*3600s) =~
| 55 parts per billion for me right now.)
|
| As a final example, I got annoyed with the unwanted
| complexity of modern syslog junk. So, I whipped up
| https://github.com/c-blake/kslog which just looking it up
| for this post, according to /proc/PID/schedstat has only
| accumulated about 27357590/(24*3600+24*60) =~ 311 parts per
| billion of 1 CPU on a 4-core system since boot about
| 24h:24m ago. This in about 25% of the lines of Nim code
| that busybox spends on less useful C code (to your point of
| "almost all the code is the actual logic, not other junk").
|
| Also, while it is not as full-featured as stdlib `string`,
| there is a zero-copy `cligen/mslice.MSlice` type
| (https://github.com/c-blake/cligen/blob/master/cligen/mslice....)
| that does have basic features like splitting and number
| parsing. There are probably others out in the Nimbleverse.
| If view types ever migrate from experimental status to being
| relied upon, that might become less interesting.
|
| Since you do a lot of statistics, you might also appreciate
| the fmtUncertain* subsystem of
| https://github.com/c-blake/cligen/blob/master/cligen/strUt.n...
| which allows you to retain only a meaningful number of digits,
| and also
| https://github.com/c-blake/adix/blob/master/adix/mvstat.nim
| which has efficient moving quantiles via a Fenwick Tree
| logarithmic histogram.
| netbioserror wrote:
| I should also mention: Be careful analyzing forum or example
| Nim code. A LOT of Nim programmers are coming from the C or
| C++ side of things and trying to use it like C or C++. You
| can write most Nim programs essentially like Python. Keep it
| clean, keep it simple. Use iterators, use map(), use
| universal function call syntax, use the full semantic power
| of sequences (very comparable to Python lists).
| zozbot234 wrote:
| Most systems programming languages don't have fully manual
| memory management these days - they use RAII. Manual
| deallocation is possible, but not used often.
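|
| i.e. the destructor is the deallocation, so there's no call
| site to forget. A minimal sketch (and note it works for files
| and sockets just as well as for memory):
|
|     #include <cstdio>
|
|     struct File {
|         std::FILE* f;
|         explicit File(const char* path) : f(std::fopen(path, "r")) {}
|         ~File() { if (f) std::fclose(f); }  // runs on every exit path
|         File(const File&) = delete;         // forbid copies so the
|         File& operator=(const File&) = delete;  // handle closes once
|     };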
| CyberDildonics wrote:
| _Hardcore manual memory people seem to have completely
| misaligned priorities to real-world concerns. Maintainability,
| safety, productivity, and iteration speed seem to be bottom-of-
| list to the egotistic holy grail of being able to say they
| write their own memory routines._
|
| I think you've gotten very worked up over a theoretical
| boogieman person that doesn't exist. Most natively compiled
| programs are written in C++ with Rust as a distant second and
| both put a lot of effort into the selling point that you don't
| have to manually manage memory the way you do with C, and can
| rely on ownership instead. I've never seen anyone like you're
| describing.
| netbioserror wrote:
| I have met people like that. Sorry.
| atum47 wrote:
| I once enabled garbage collection for this software I was writing
| and it collected everything.
| ervine wrote:
| Sprayed spot remover on my dog. Next day he was gone.
| mwkaufma wrote:
| (1) The pivot from RCU to general-purpose tracing GCs is a
| bait-and-switch.
|
| (2) Manual memory management is more than just malloc/free calls
| -- it's about layout (e.g. struct-of-arrays, inlining, implicit
| offsets, packing, etc.)
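|
| E.g., the struct-of-arrays point in miniature (a toy sketch --
| same data, very different cache behavior if you only ever scan
| one field):
|
|     #include <vector>
|
|     struct ParticleAoS { float x, y, z, mass; };
|     std::vector<ParticleAoS> aos;   // x,y,z,mass interleaved
|
|     struct ParticlesSoA {           // each field contiguous, so
|         std::vector<float> x, y, z, mass;  // summing `mass` touches
|     };                                     // 1/4 the cache lines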
| kaba0 wrote:
| I disagree with (2) being manual memory management. There is
| definitely a lack of control over memory layout in contemporary
| managed languages (though it's also far from perfect in
| low-level languages), but there are ways to affect it.
| cb321 wrote:
| I agree with your disagreement. As a case in point, Nim has
| packed bitfields but various choices in automatic memory
| management. As a concrete example, a spell-check custom-data
| store uses them here:
| https://github.com/c-blake/suggest/blob/04e313f8f8d3adf4cb55...
| (there the memory is in a memory-mapped file and so that code
| has to manage the space itself... so, maybe not the perfect
| example).
|
| But I also agree there tends to be a correlation in PLang
| design between avoiding low-level memory layout control and
| avoiding manual memory management. But it's just a tendency,
| not fundamental.
| PaulDavisThe1st wrote:
| I quoted this in another comment here, but just to highlight one
| of the best couple of sentences in TFA:
|
| > Tools to automate the "actually freeing the memory" part, like
| lifetimes in Rust and RAII in C++, don't solve these problems.
| They absolutely aid correctness, something else you should care
| deeply about, but they do nothing to simplify all this machinery.
| bsdpufferfish wrote:
| Why are they always trying to sell these things to audiences who
| are not interested?
| HippoBaro wrote:
| For the kind of software I write there are two cases: (1) the hot
| path for which I will always have custom allocators and avoid
| allocations and (2) everything else.
|
| For (1) GC or not it doesn't make a difference, I'll opt-out. For
| (2) GC is really convenient and correct.
| leapis wrote:
| Agreed - I come from a Java/C++ shop where we tried to tackle
| this dichotomy with interop but it ended up causing more
| problems than it solved. A lot of the work that Java has done
| with modern garbage collectors is impressive, but even they
| admit (indirectly, via Valhalla) that no/low-alloc code has
| its place.
| musicale wrote:
| The tricky part is identifying which systems programmers can be
| garbage collected and when.
| GeorgeTirebiter wrote:
| Soylent Green was set in 2022/2023
| https://en.wikipedia.org/wiki/Soylent_Green?useskin=vector
| worik wrote:
| Not mentioned in this article: one thing that goes very well
| with GC is async/await.
|
| I am an async/await hater for personal, idiosyncratic style
| reasons that I will not bore you with.
|
| I use it a lot in TypeScript/JavaScript. Have done so in Dart.
| Works as it should.
|
| I have used it in Rust. IMO it is a disaster there. Shoehorning
| the sort of memory management required to use async/await with
| a multi-threaded runtime is a hellscape:
|
| https://doc.rust-lang.org/std/pin/index.html
___________________________________________________________________
(page generated 2024-03-30 23:00 UTC)