[HN Gopher] Reference count, don't garbage collect
___________________________________________________________________
Reference count, don't garbage collect
Author : kcl
Score : 57 points
Date : 2022-07-29 13:29 UTC (9 hours ago)
(HTM) web link (kevinlawler.com)
(TXT) w3m dump (kevinlawler.com)
| assbuttbuttass wrote:
| In practice, I don't see any reference counting approaches that
| do cycle detection.
|
| I have an example from early in my career where I accidentally
| created a memory leak in Python from a cyclic reference between a
| closure and a function argument
|
| https://stackoverflow.com/questions/54726363/python-not-dele...
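| A minimal CPython reproduction of that kind of leak (the names
| here are illustrative, not taken from the linked question): the
| closure captures its argument, and the argument ends up holding
| the closure, so pure refcounting alone never frees either.

```python
import gc

class Box:
    pass

def register(handler_arg):
    # The closure captures handler_arg; attaching the closure back
    # onto the argument closes the cycle:
    # Box -> function -> closure cell -> Box.
    def on_event():
        return handler_arg
    handler_arg.callback = on_event

gc.disable()               # simulate a pure refcounting runtime
b = Box()
register(b)
del b                      # refcounts never reach zero: the cycle leaks
collected = gc.collect()   # CPython's cycle detector reclaims it
gc.enable()
print(collected)           # at least the Box, the function, and the cell
```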
| yyyk wrote:
| Reference counting _is_ garbage collection, just a different
| strategy - and all these strategies tend to converge on the same
| methods, eventually offering either a latency-optimized GC or a
| throughput-optimized GC.
|
| Swift is inferior here because it uses reference counting GC
| without much work towards mitigating its drawbacks like cycles
| (judging by some recent posts, some of its fans apparently aren't
| even aware RC has drawbacks), while more established GC languages
| had much more time to mitigate their GC drawbacks - e.g. Java's
| ZGC mitigates latency by being concurrent.
| habibur wrote:
| It's interesting how we have come full circle from "Reference
| counting is the worst of two worlds [manual and GC] and will
| always be slower" to now "Well, we all know it's actually
| faster." in like 10 years.
| pclmulqdq wrote:
| It's actually usually slower than both manual memory management
| and GC. It's only coming back now because people are finally
| learning how to make memory allocations large and rare.
|
| This blog post is an answer to: "Tell me you haven't learned
| about cache coherence without telling me you haven't learned
| about cache coherence."
| rwmj wrote:
| Hiring would certainly be a lot easier if more people were to
| make bold, completely wrong blog postings like these. I could
| immediately give my negative recommendation without the time
| and hassle of a phone interview.
| pclmulqdq wrote:
| I completely agree, but before we scare people away from
| blogging too much, I will say that my big problem with this
| post isn't the lack of knowledge, it's the willful
| ignorance and lack of humility. It's clear that the author
| doesn't really understand the position they are trying to
| argue against.
| [deleted]
| hinkley wrote:
| Has anyone done a good paper on how memory bank affinity for
| processors affects these costs?
| OskarS wrote:
| > Its actually usually slower than both manual memory
| management and GC
|
| [citation needed]
|
| You and the blog post are arguing opposite things, and
| neither of you have shown any evidence. I get that you're
| arguing that reference counted objects are bigger (to store
| the reference count) and/or might use double indirection
| (depending on implementation), which are both bad for caches.
| It's not a bad argument. But the counter-argument that the
| blog posts makes is persuasive as well: it's expensive
| running a GC that scans the heap looking for loose objects,
| and reference counting does not need to do that. GC is also
| "stop-the-world", as well as unpredictable and jittery in a way
| reference counting is not.
|
| My instinct is that reference counting is actually faster
| (which matches my personal experience), but really, this is
| not an argument you can solve by arguing in the abstract, you
| need actual data and benchmarks.
| marginalia_nu wrote:
| Eh, the GC you are describing sounds like something out of
| 1994. Modern concurrent algorithms do not need to stop
| execution. Concurrently collected code may paradoxically be
| slower in benchmarks when there isn't any memory churn, as it
| necessarily introduces barrier instructions at key points
| to ensure the GC has a consistent view of the memory
| (although memory consistency is also a concern for
| reference counting).
| adgjlsfhk1 wrote:
| Note that if you can afford stop the world pauses, GCs
| that use them are much more efficient. Parallel stop the
| world garbage collectors are by far the best for
| minimizing runtime, while concurrent GCs have higher
| overhead, but can guarantee no significant pauses.
| marginalia_nu wrote:
| Yeah, that's true, but it shouldn't be held against GC
| that it interrupts the execution, since it really doesn't
| have to.
| adgjlsfhk1 wrote:
| The blog post is largely incoherent for several reasons.
|
| 1. It recommends 8 byte counters. This implies making every
| object in your programming language 8 bytes bigger which is
| pretty much a non-starter. Almost everyone that actually
| uses reference counting uses more like 2-4 bits.
|
| 2. Reference counting can have arbitrarily long pauses as
| well. If you decrease the reference count of an object to
| zero, you have to decrease the reference count of all
| referenced objects, which can do significant amounts of
| work (specifically, it will do a lot of work in the cases
| where regular GC does almost no work).
|
| 3. The blog states that atomic writes are "basically free",
| but that ignores the fact that in multi-threaded code,
| atomic writes can actually be fairly expensive since each
| one requires communication between every thread (this is why
| Python still has a GIL).
|
| 4. Because no one uses 8 bytes for the count (since you
| don't want to double your memory consumption), you still
| need a GC anyway to deal with objects that get too many
| references.
| OskarS wrote:
| > Almost everyone that actually uses reference counting
| uses more like 2-4 bits.
|
| Here is an incomplete list of languages which use 8 bytes
| for their reference count on 64-bit computers:
|
| 1. Rust, in Rc<T> and Arc<T>
|
| 2. C++, in std::shared_ptr<T>
|
| 3. Objective-C, in NSObject's retainCount
|
| 4. Swift, because of Objective-C
|
| 5. Python, where reference count is Py_ssize_t
|
| These were literally the first languages i thought of,
| they all use 64-bit types. I would argue since reference
| counting is much rarer than GC, these make up the bulk of
| reference counting use in the real world. "Almost
| everyone", huh? It's a bit rich calling the blog post
| "largely incoherent" and then saying this.
|
| > Reference counting can have arbitrarily long pauses as
| well.
|
| _Deallocating_ can take arbitrarily long, but
| deallocation does not stop-the-world. It stops the
| current thread, which is a huge difference. Malloc can
| take arbitrarily long as well; it's not like it's wait-
| free.
|
| In addition, the GC pauses in regular programming
| languages are frequently orders of magnitude longer than
| deallocation. You would have to deallocate an enormous
| tree of objects at the root for this to be an issue. And
| GCs have to do that as well, in addition to their regular
| stop-the-world pauses. This argument is just irrelevant.
|
| > The blog states that atomic writes are "basically
| free", but that ignores the fact that in multi-threaded
| code, atomic writes can actually be fairly expensive
| since each one requires communication between every
| thread (This is why python still has a GIL)
|
| First off, this is not why Python has a GIL, but let's
| leave that aside.
|
| Atomic writes are more expensive than non-atomic ones,
| but they are not slow operations in the grand scheme of
| things. If you properly implement acquire-release
| semantics, they are not even that slow under high
| contention. Compared to a GC that literally STOPS ALL
| THREADS, it's nothing.
|
| > you still need a GC anyway to deal with objects that
| get too many references.
|
| This is just silly. In languages that have both reference
| counting and traditional garbage collection (e.g.
| Python), they do it to avoid reference cycles, not
| because objects get "too many references". That is a
| ridiculous statement.
|
| In fact! I just realized we do have data for which is
| more performant: this article describes how Instagram
| turned off GC for Python and just relied on reference
| counting. They gained a 10% increase in performance:
|
| https://instagram-engineering.com/dismissing-python-
| garbage-...
|
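| For reference, the mechanism Instagram used (and which motivated
| the addition of `gc.freeze()` in Python 3.7) looks roughly like
| this:

```python
import gc

# Turn off the cycle collector and rely on pure refcounting,
# as the Instagram post describes.
gc.disable()

# gc.freeze() moves everything currently tracked into a permanent
# generation the collector never scans -- typically done before
# fork() so copy-on-write pages aren't dirtied by GC bookkeeping.
gc.freeze()
frozen = gc.get_freeze_count()

# ... serve requests without cycle-collection pauses ...

gc.unfreeze()
gc.enable()
```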
| I think it's still an open question if reference counting
| is always faster than GC, but I do not believe you have
| the technical expertise to evaluate such a claim. Your
| comment is four paragraphs long and riddled with factual
| errors. If you want to be convincing, show data that is
| better than that Instagram case study.
| kaba0 wrote:
| > Deallocating can take arbitrarily long, but
| deallocation does not stop-the-world
|
| And modern GCs only have to stop the current thread to
| check for which thread-local objects are alive. The alive
| ones are moved to another generation, making the space
| reusable _for free_.
|
| And atomic writes need synchronization which is crazy
| expensive, I honestly don't get why you think it isn't.
|
| Also, just try writing rust/c++ code that relies entirely
| on RC vs Java in an object heavy workload - I really
| don't think it is an open question in any shape or form.
| OskarS wrote:
| > The alive ones are moved to another generation, making
| the space reusable for free.
|
| It's pretty hilarious to me that you first say "they have
| to move it to another generation" and then you say "it's
| free!" It's like saying "I paid for my dinner, and now I
| get to eat it for free!"
|
| Also: what do you think `free()` does? All modern memory
| allocators do this trick, keeping thread-local caches.
| This is not an advantage of GCs when reference counting
| does it as well.
|
| Almost all modern GCs are stop-the-world in at least some
| phases, and for good reason: stop-the-world GCs are
| higher performance. You pay in other ways for skipping
| that stop.
|
| > And atomic writes need synchronization which is crazy
| expensive, I honestly don't get why you think it isn't.
|
| Because I've actually benchmarked it:
| https://quick-bench.com/q/ISEetAHOohv-GaEuYR-7MajJgTc
|
| 18.5 nanoseconds fits under no reasonable definition of
| "crazy expensive", not when a regular increment clocks in
| at 5.9 nanoseconds. And there are extremely few situations
| where you increment a reference count more than, like, 5
| times. It's just not an issue.
|
| This is like cargo cult programming: you've been told
| these things and never tested them in the real world, and
| you have all these preconceived notions that just don't
| stand up to two minutes of scrutiny.
|
| > Also, just try writing rust/c++ code that relies
| entirely on RC vs Java in an object heavy workload - I
| really don't think it is an open question in any shape or
| form.
|
| Yes, _of course_ garbage collectors are easier to use
| than reference counting. Nobody has ever disputed this.
| That is the whole raison d'être of garbage collectors.
| This is not what the discussion is about, it's about
| performance.
|
| I'm done with this thread now, unless anybody can show me
| any actual _data_ to back up anything you say. It's
| really tiring. I started this by saying "I don't actually
| know", and everyone replying to me has been so darn
| certain of everything they say while being outright
| incorrect about most things, and without any actual data
| to back up the rest.
| hayley-patton wrote:
| You'd get very different results doing atomic operations
| on counters shared between multiple threads, with other
| stuff to do that causes your CPU to serialise and run
| less out-of-order. That single-threaded benchmark is
| data, but it doesn't appear awfully relevant to how naive
| RC barriers perform versus barriers for coalesced RC or
| tracing.
|
| Indeed the issue with measuring barriers is that
| measuring the barrier doesn't suffice; one wants to
| measure how the barrier affects the rest of execution.
| This entails coming up with programs that are much less
| trivial than repeatedly incrementing a counter.
| barsonme wrote:
| Is your benchmark running concurrently? Or single-
| threaded?
| kaba0 wrote:
| Moving to another generation can be done completely
| asynchronously on another thread that likely doesn't do
| any useful work on modern, highly parallel hardware.
| `free` doesn't do much, but `malloc` does -- with the
| method I am talking about (TLAB in the JVM), you get as
| fast allocations as it gets, it's nothing more than a
| NON-ATOMIC pointer bump. Meanwhile malloc has to find an
| empty space that can fit the object at hand.
|
| > > Also, just try writing rust/c++ code that relies
| entirely on RC vs Java in an object heavy workload - I
| really don't think it is an open question in any shape or
| form.
|
| > Yes, of course garbage collectors are easier to use
| than reference counting. Nobody has ever disputed this.
| That is the whole raison d'être of garbage collectors.
| This is not what the discussion is about, it's about
| performance.
|
| I am talking about performance exactly. Java's GC will
| smoke the hell out of C++'s shared pointers and Rust's
| (A)RC. No one said anything about productivity/ease of
| usage.
|
| And as mentioned by another commenter - your benchmark
| didn't take into account anything related to _parallel_
| execution, which would be the point.
| pclmulqdq wrote:
| The blog post about this is pretty much incoherent, and
| comes from a bad understanding of performance, atomic, GC
| algorithms, and reference counting.
|
| The ONLY reason to reference count is when you need GC-like
| behavior on a few objects, but do not want to impose a GC
| on all objects. It is a very valuable tool in performance
| code not because it is fast, but because it allows you to
| make other things fast. Suggesting that you should
| reference count the world is patently ridiculous. This is
| the reason for Rust's Arc and C++ shared_ptr, not that they
| are faster than a GC.
|
| The blog post completely brushes aside the costs of
| _atomic_ counter increments and decrements, calling them
| "just increments." The "atomic" is the key performance
| problem, not the increment.
|
| Reference counting also makes objects larger so small
| objects cannot fit inside a machine register.
|
| Modern GC algorithms are very efficient. They do not need
| to pause the world, and they do not need to be jittery.
| However, someone who doesn't understand how expensive
| atomic ops are probably wouldn't understand this either.
| kaba0 wrote:
| With all due respect, why do you believe that your
| experience is meaningful compared to people writing actual
| industrial-scale runtimes, when RC is the hello world of GC
| algorithms? It's not that they don't know about it, it's
| that they have studied the topic countless times and found
| significant differences between the two approaches, to the
| point that no language considered fast uses RC (or if they
| do, they do so because it is a useful escape hatch to
| manual memory management that doesn't need support from the
| runtime).
| flohofwoe wrote:
| Except that "refcounting is faster than a GC" is mostly a myth,
| both are equally bad if predictable performance matters.
| klodolph wrote:
| Looking at metrics from Go's garbage collector, what else do
| you want? GC pauses are damn low, and you'll see numbers in
| the sub-millisecond range.
|
| If I needed hard-realtime, I would avoid allocation entirely.
| dpryden wrote:
| This article is naive to the point of being flat-out wrong, since
| it makes extremely naive assumptions about how a garbage
| collector works. This is basically another C++-centric programmer
| saying that smart pointers work better than the Boehm GC -- which
| is completely true but also completely misleading.
|
| I'm not saying that GC is _always_ the best choice, but this
| article gets the most important argument wrong:
|
| > 1. Updating reference counts is quite expensive.
|
| > No, it isn't. It's an atomic increment, perhaps with overflow
| checks for small integer widths. This is about as minimal as you
| can get short of nothing at all.
|
| Yes, it is. Even an atomic increment is a _write_ to memory. That
| is not "about as minimal as you can get short of nothing at
| all".
|
| Additionally, every modern GC does generational collection, so
| for the vast majority of objects, _the GC literally does "nothing
| at all"_. No matter how little work it does, an RC solution
| do O(garbage) work, while a copying GC can do O(not garbage)
| work.
|
| Now, that's not to say that GC is automatically better. There are
| trade-offs here. It depends on the workload, the amount of
| garbage being created, and the ratio of read to write operations.
|
| The article says:
|
| > I've already stated I'm not going to do benchmarks. I am aware
| of two orgs who've already run extensive and far-reaching
| experiments on this: Apple, for use in their mobile phones, and
| the Python project.
|
| I can counterpoint that anecdata: Google extensively uses Java in
| high-performance systems, and _invented a new GC-only language_
| (Go) as a replacement for (their uses of) Python.
|
| The right answer is to do benchmarks. Or even better yet, don't
| worry about this and just write your code! Outside of a
| vanishingly small number of specialized use cases, by the time GC
| vs RC becomes relevant in any meaningful way to your performance,
| you've already succeeded, and now you're dealing with scaling
| effects.
| dfox wrote:
| One great advantage of garbage collection is that it removes the
| need for thread synchronization in cases where it is only needed
| to make sure that the object you are going to call free/decref
| on is not in use in another thread. A corollary is that GC is
| what you need for many lock-free data structures to be practical
| and not of only theoretical interest.
|
| It might seem that it is simply about pushing your
| synchronization problems onto the GC, but the synchronization
| issue that GC solves internally is different and usually more
| coarse-grained, so in the end you have significantly smaller
| synchronization overhead.
| byefruit wrote:
| This article provides very little evidence for its claims and
| seems to only have a superficial understanding of modern GCs.
|
| "Increments and decrements happen once and at a predictable time.
| The GC is running all the time and traversing the universe of GC
| objects. Probably with bad locality, polluting the cache, etc."
|
| This is only the case with a mark-sweep collector, usually most
| of your allocations die young in the nursery. With reference
| counting you pay the counting cost for everything.
|
| "In object-oriented languages, where you can literally have a
| pointer to something, you simply mark a reference as a weak
| reference if it might create a cycle."
|
| As someone who has tried to identify memory leaks in production
| where someone has forgotten to "simply" mark a reference in some
| deep object graph as weak, this is naive.
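| For what it's worth, the pattern the article is gesturing at
| looks like this in Python (a toy parent/child graph; the class
| is illustrative):

```python
import weakref

class Node:
    def __init__(self):
        self.children = []
        self._parent = None

    @property
    def parent(self):
        # dereference the weak ref; None once the parent is gone
        return self._parent() if self._parent is not None else None

    @parent.setter
    def parent(self, node):
        # child -> parent is the back edge that would close the
        # cycle, so it is stored weakly
        self._parent = weakref.ref(node)

root = Node()
child = Node()
root.children.append(child)
child.parent = root
assert child.parent is root

del root            # no cycle: the pair is freed by refcounting alone
assert child.parent is None
```

The catch the parent comment raises is real, though: nothing stops
you from forgetting the `weakref.ref` on one back edge deep in the
graph, and then the leak is back.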
|
| "With an 8-byte counter you will never overflow. So...you
| know...just expand up to 8-bytes as needed? Usually you can get
| by with a few bits."
|
| So now my "about as minimal as you can get short of nothing at
| all" check has an unpredictable branch in it?
|
| "If you must overflow, e.g., you cannot afford an 8-byte counter
| and you need to overflow a 4-byte counter with billions of
| references, if you can copy it, you create a shallow copy."
|
| I don't even know where to begin with this.
|
| "If GC is so good, why wouldn't Python just garbage collect
| everything, which they already did once and could trivially do,
| instead of going through the hassle of implementing reference
| counting for everything but the one case I mentioned?"
|
| This probably has more to do with finalising resources and
| deterministic destruction than anything else.
|
| --
|
| Anyone who is interested in actually studying this area would
| probably find
| https://courses.cs.washington.edu/courses/cse590p/05au/p50-b...
| interesting. Also https://gchandbook.org/
| knome wrote:
| >If GC is so good, why wouldn't Python just garbage collect
| everything, which they already did once and could trivially do
|
| I don't think python ever did pure mark-and-sweep (cpython, at
| least; I'm sure jython and other alternate implementations
| have).
|
| My understanding was that they did pure reference counting, and
| kludged on a sweep GC to do cycle breaking eventually, as
| manually breaking cycles in early versions of python was a pain
| point. A quick lookup seems to indicate python1 was pure
| reference counting, and they added the cycle breaking when they
| released python2.
| cogman10 wrote:
| That's my recollection as well (though I've not found
| evidence to support it). In fact, IIRC, as part of the python
| performance thing one of the topics was adding a proper
| generational GC into python for objects that don't refer to
| pinned memory.
| carry_bit wrote:
| You can optimize reference counting:
| https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.23...
|
| With allocating as dead, you're basically turning it into a
| tracing collector for the young generation.
| jerf wrote:
| From what I can see, the myth that needs to be debunked isn't
| that garbage collection is super fast and easy with no
| consequences, it's the myth that garbage collection always
| automatically means your program is going to be spending 80% of
| its time doing it and freezing for a second every five seconds
| the instant you use a language with garbage collection. I see far
| more "I'm writing a web server that's going to handle six
| requests every hour but I'm afraid the garbage collector is going
| to trash my performance" than people who believe it's magically
| free.
|
| It's just another engineering decision. On modern systems, and
| especially with any runtime that can do the majority of the GC
| threaded and on an otherwise-unused core, you need to have some
| pretty serious performance requirements for GC to ever get to
| being your biggest problem. You should almost always know when
| you're setting out to write such a system, and then, sure, think
| about the GC strategy and its costs. However for the vast bulk of
| programs the correct solution is to spend on the order of 10
| seconds thinking about it and realizing that the performance
| costs of _any_ memory management solution are trivial and
| irrelevant and the only issue in the conversation is what
| benefits you get from the various options and what the non-
| performance costs are.
|
| It is in some sense as big a mistake (proportional to the program
| size) to write every little program like it's a AAA game as it is
| to write a AAA game as if it's just some tiny little project, but
| by the sheer overwhelming preponderance of programming problems
| that are less complicated than AAA games, the former happens
| overwhelmingly more often than the latter.
|
| Edit: I can be specific. I just greased up one of my production
| systems with Go memstats. It periodically scans XML files via
| network requests and parses them with a parser that cross-links
| parents, siblings, and children using pointers and then runs a
| lot of XPath on them, so, it's kinda pessimal behavior for a GC.
| I tortured it far out of its normal CPU range by calling the
| "give me all your data" JSON dump 100 times. I've clicked
| around on the website it serves to put load on it a good 10x what
| it would normally see in an hour, minimum. In 15 minutes of this
| way-above-normal use, it has so far paused my program for 14.6
| milliseconds total. If you straight-up added 14.6 milliseconds of
| latency to every page it scanned, every processing operation, and
| every web page I loaded, I literally wouldn't be able to notice,
| and of course that's not what actually happened. Every second
| worrying about GC on this system would be wasted.
| nickbauman wrote:
| Yes; I have a friend who is part of a small team that wrote a
| very successful stock market trading gateway in Java. Turns out
| the JVM's GC can be tuned in very specific ways based on your
| needs. And there are ways to avoid having to do JVM GC in
| critical areas of the code as well.
| marginalia_nu wrote:
| There's several exchanges and clearing platforms running
| Java[1], although I'm not sure how many are still around
| after Nasdaq's hostile takeover.
|
| [1] https://www.marketswiki.com/wiki/TRADExpress
| PaulHoule wrote:
| Generally Java has made a huge investment in the garbage
| collector over a long period of time to address the problems
| that people have in some use cases. JDK 17 is much better
| than JDK 8. If you were writing a GC from scratch you are not
| going to do as well.
| hinkley wrote:
| To be fair, they definitely got into a rut in the JDK 6-7
| timeframe. I maintain it's no accident that memcache came
| into its own during this period. That was a major pain
| point, and going out-of-process shouldn't have been
| necessary.
| kaba0 wrote:
| The JVM GC's are absolutely insanely good. G1 can sustain
| loads with heap sizes well into TERAbytes.
| code_runner wrote:
| The biggest GC issues I've personally seen manifest were arrays
| of historical data that grew to tens of millions of entries and
| due to array storage in .net, the array was placed in the large
| object heap. Swapping to a linked list actually fixed the issue
| and the team lived to fight another day.
|
| Like a lot of premature optimization, it isn't a problem until
| it is... but solutions aren't unattainable.
| hinkley wrote:
| I still mostly remember the day a coworker convinced me that
| object pooling was dead because it tears the mature
| generation a new one over and over.
|
| It's nice when the runtime solves a problem you've had to
| solve yourself, but it also takes a bit of your fun away,
| even if your coworkers are relieved not to have to deal with
| 'clever' code anymore.
| agentultra wrote:
| A paper I quite enjoyed on automatic reference counting for pure,
| immutable functional programming:
| https://arxiv.org/abs/1908.05647
|
| It can be quite "fast."
| miloignis wrote:
| Indeed, and this line of research has been continuing to
| improve in follow on work on Perceus in Koka:
| https://xnning.github.io/papers/perceus.pdf and
| https://www.microsoft.com/en-us/research/uploads/prod/2021/1...
|
| Very cool stuff!
| shwestrick wrote:
| This debate has gone round and round for decades. There are no
| hard lines; this is about performance tradeoffs, and always will
| be.
|
| Perhaps the biggest misconception about reference counting is
| that people believe it avoids GC pauses. That's not true.
| Essentially, whereas tracing GC has pauses while tracing live
| data, reference counting has pauses while tracing garbage.
|
| Reference counting is really just another kind of GC. I'd highly
| recommend perusing this paper for more details: A Unifying Theory
| of Garbage Collection.
| https://web.eecs.umich.edu/~weimerw/2012-4610/reading/bacon-...
|
| One of the biggest issues with reference counting, from a
| performance perspective, is that it turns reads into writes: if
| you read a heap-object out of a data structure, you have to
| increment the object's reference count. Modern memory
| architectures have much higher read bandwidth than write
| bandwidth, so reference counting typically has much lower
| throughput than tracing GC does.
| goodpoint wrote:
| It's not always a tradeoff:
|
| Nim switched from GC to RC and it even increased performance.
| waterhouse wrote:
| > Essentially, whereas tracing GC has pauses while tracing live
| data, reference counting has pauses while tracing garbage.
|
| _This_ is pretty trivial to avoid. When your thread finds
| itself freeing a big chain of garbage objects, you can have it
| stop at any arbitrary point and resume normal work, and find
| some way to schedule the work of freeing the rest of the chain
| (e.g. on another thread). It's much more complex and expensive
| to do this for tracing live data, because then you need to
| manage the scenario where the user program is modifying the
| graph of live objects while the GC is tracing it, using a write
| or read barrier; whereas for garbage, by definition you know
| the user _can't_ touch the data, so a simple list of "objects
| to be freed" suffices.
|
| "Reads become writes" (indeed, they become atomic read-modify-
| writes when multiple threads might be refcounting
| simultaneously) is a problem, though.
| arcticbull wrote:
| RC may turn reads into writes, but of course, GC ends up having
| to go through literally every piece of memory ever from bottom
| to top once in a while.
|
| RC limits itself to modifying only relevant objects, whereas GC
| reads _all_ objects in a super cache-unfriendly way. Yes, an
| atomic read-modify-write is worse than a read, but it 's not
| worse than a linked-list traversal of all of memory all the
| time.
|
| And of course, not all kinds of object lend themselves to
| garbage collection - for instance, file descriptors, since you
| can't guarantee when or if they'll ever close. So you have to
| build your own reference counting system on top of your garbage
| collected system to handle these edge cases.
|
| There's trade-offs, yes, but the trade-off is simply that
| garbage collected languages refuse to provide the compiler and
| the runtime all the information they need to know in order to
| do their jobs - and a massive 30 year long effort kicked off to
| build a Rube Goldberg machine for closing that knowledge gap.
| klodolph wrote:
| > RC may turn reads into writes, but of course, GC ends up
| having to go through literally every piece of memory ever
| from time to time.
|
| Depends on the GC algorithm used. Various GC algorithms only
| trace reachable objects, not unreachable ones.
|
| Reference counting does the opposite, more or less. When you
| deallocate something, it's tracing unreachable objects.
|
| One of the problems with this is that reference counting
| touches all the memory right before you're done with it.
|
| > And of course, not all kinds of object lend themselves to
| garbage collection - for instance, file descriptors, since
| you can't guarantee when or if they'll ever close. So you
| have to build your own reference counting system on top of
| your garbage collected system.
|
| This is not a typical solution.
|
| Java threw finalizers into the mix and everyone overused them
| at first before they realized that finalizers suck. This is
| bad enough that, in response to "too many files open" in your
| Java program, you might invoke the GC. Other languages
| designed since then typically use some kind of scoped system
| for closing file descriptors. This includes C# and Go.
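| Python's equivalent is the context manager: the descriptor's
| lifetime is tied to a scope, not to collection. A trivial
| sketch:

```python
import tempfile

# Scoped cleanup: the descriptor is released when the block exits,
# no matter when (or whether) the object's memory is collected.
with tempfile.TemporaryFile() as f:
    f.write(b"hello")
    f.seek(0)
    data = f.read()

assert data == b"hello"
assert f.closed   # closed at block exit, independent of GC timing
```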
|
| Garbage collection does not need to be used to collect all
| objects.
|
| > There's trade-offs, yes, but the trade-off is simply that
| garbage collected languages refuse to provide the compiler
| and the runtime all the information they need to know in
| order to do their jobs - and a massive 30 year long rube
| goldberg machine was built around closing that gap.
|
| When I hear rhetoric like this, all I think is, "Oh, this
| person really hates GC, and thinks everyone else should hate
| GC."
|
| Embedded in this statement are usually some assumptions which
| should be challenged. For example, "memory should be freed as
| soon as it is no longer needed".
| arcticbull wrote:
| > When I hear rhetoric like this, all I think is, "Oh, this
| person really hates GC, and thinks everyone else should
| hate GC."
|
| Yeah, I think it's an inelegant, brute-force solution to a
| language problem - and that we continue to throw good money
| after bad improving it.
|
| We should be investing in removing the need for GC through
| smarter compilers and through languages that allow us to
| better express our intent - and our object graph. Instead,
| we continue to improve on a giant linked-list traversal
| through all of live memory to try and infer the object
| graph at runtime, without sufficient information to do it
| efficiently. The fundamental issue is that we don't have
| languages that express our goals in a way that allows
| memory management to be elided efficiently.
|
| That doesn't mean I have some particular affinity for
| reference counting, though. It has its own issues as you
| rightly point out. I prefer it for its determinism, nothing
| more.
| hinkley wrote:
| What is your read on the lack of discussion of escape analysis?
|
| My own read on this is that it blurs the line with deferred
| collection/counting, because you could either use it to
| complement deferral making it cheaper, or avoid deferral
| because you're getting enough of the benefits of deferral by
| proving objects dead instead of discovering that they are dead.
| kaba0 wrote:
| One great example would be a C++ program that runs fast, and
| then just spends 10s of seconds doing "nothing" while it
| deallocates shared pointers' huge object graphs at the end.
| They really are two sides of the same coin, with tracing GCs
| being actually correct (you need cycle detection for a correct
| RC implementation), and having much better throughput. It's not
| an accident that runtimes with thousands of dev hours are
| polishing their GC solutions instead of using a much more
| trivial RC.
| hinkley wrote:
| I don't know what the current state of the art is, but at one
| point the answer to GC in a realtime environment was to
| amortize free() across malloc(). Each allocation would clear
| up to 10 elements from queue of free-able memory locations.
| That gives a reasonably tight upper bound on worst case alloc
| time, and most workflows converge on a garbage-free heap. Big
| malloc after small free might still blow your deadlines, but
| big allocations after bootstrapping are generally frowned
| upon in realtime applications, so that's as much a social
| problem as a technical one.
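The amortization scheme described above can be sketched as a toy allocator (Python, all names illustrative, not any particular realtime allocator):

```python
class AmortizingPool:
    """Toy allocator: each alloc() drains up to BATCH entries from
    the pending-free queue, giving a tight bound on worst-case
    allocation time instead of an unbounded free() cascade."""
    BATCH = 10

    def __init__(self):
        self.pending = []              # objects awaiting release

    def free(self, obj):
        self.pending.append(obj)       # defer the actual release

    def alloc(self, size):
        # Pay down at most BATCH deferred frees per allocation.
        for _ in range(min(self.BATCH, len(self.pending))):
            self.pending.pop()
        return bytearray(size)

pool = AmortizingPool()
for _ in range(25):
    pool.free(object())
pool.alloc(64)                         # drains 10 of the 25
assert len(pool.pending) == 15
```

Most workflows then converge on an empty pending queue, as the comment says; a burst of frees with no following allocations is the case that needs a background sweep.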
| titzer wrote:
| I read most of the article and it's just a lot of the same tired
| arguments and an extremely simplified worldview of both GC and
| reference counting. I wish I had the author's address, because
| I'd like to mail them a copy of the Garbage Collection Handbook.
| They clearly have a very naive view of both garbage collection
| and reference counting. And there isn't a single dang measurement
| anywhere, so this can be completely dismissed IMHO.
| cogman10 wrote:
| Agreed. What I particularly disliked is how absent of nuance it
| is. RC is a form of GC and all GC algorithms make tradeoffs. RC
| trades throughput for latency. Compacting mark and sweep trade
| latency (and usually memory) for throughput.
|
| The rant at the end can be boiled down to "I use confirmation
| bias [1] to make my engineering decisions". The OP has already
| decided that "GC" is slow, so I'm sure every time a runtime
| with it misbehaves it's "Well, that darn GC, I knew it was
| bad!" and every time RC misbehaves it's likely "Oh, well you
| should have nulled out your link here to break the cycle
| dummy!"
|
| I really don't like such absolutist thinking in software dev.
| All of software dev is about making tradeoffs. RC and GC aren't
| superior or inferior to each other, they are just different and
| either (or both) could be valid depending on the circumstance.
|
| [1] https://en.wikipedia.org/wiki/Confirmation_bias
| titzer wrote:
| > absent of nuance
|
| Yes, this is a good point. It makes overly general claims.
|
| E.g. a GC proponent could claim "well, tracing collectors do
| _no_ work for dead objects, so they have _no_ overhead! "
| Which is a good point, but not the whole story. Tracing
| collectors may need to repeatedly traverse live objects.
| Sure. But then generational collectors only traverse modified
| live objects that point to new objects. True. And concurrent
| collectors can trace using spare CPU resources, incremental
| collectors can break marking work up into small pauses, on
| and on. There are zillions of engineering tradeoffs and the
| GC Handbook covers most of them really well.
| glouwbug wrote:
| Isn't garbage collection needed to solve circular reference
| counts?
| arcticbull wrote:
| Nope, you can just mark the back-reference as weak.
|
| GC is only required if you as a programmer (or programming
| language) do not provide sufficient information to the compiler
| or runtime to understand the object graph.
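A quick Python sketch of the weak back-reference idea (CPython is refcounted, so the parent is reclaimed the moment its last strong reference drops; class and attribute names are made up):

```python
import weakref

class Node:
    def __init__(self):
        self.children = []    # strong refs: parent owns children
        self.parent = None    # weak ref: the back-edge doesn't count

    def add_child(self, child):
        self.children.append(child)
        child.parent = weakref.ref(self)

root = Node()
leaf = Node()
root.add_child(leaf)
assert leaf.parent() is root   # dereference the weakref

del root                       # refcounting alone reclaims root
assert leaf.parent() is None   # the back-reference went dead
```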
| klodolph wrote:
| It's not always obvious to know which reference to mark as
| weak, and there's not necessarily a clear indication of which
| reference is a back-reference.
|
| You can find various algorithms in journals or whatnot
| written with the assumption that there's GC. Algorithms
| designed with this assumption may not have clear ownership
| for objects, and those objects may have cyclic references.
|
| It's easy to say, "objects should have clear ownership
| relationships" but that kind of maxim, like most maxims,
| doesn't really survive if you try to apply it 100% of the
| time. Ownership is a tool that is very often useful for
| managing object lifetimes--it's not _always_ the tool that
| you want.
| flohofwoe wrote:
| Such a claim really needs hard data to back it up. Reference
| counting can be _very_ expensive, especially if the refcount
| update is an atomic operation. It's hard to capture in profiling
| tools because the performance overhead is smeared all over the
| code base instead of centralized in a few hot spots, so most of
| the time you don't actually know how much performance you're
| losing because of refcounting overhead.
|
| The most performant approach is still manual memory management
| with specialized allocators tuned for specific situations, and
| then still only use memory allocation when actually needed.
| okennedy wrote:
| This. Exactly this.
|
| Garbage collection has a huge, and generally entirely
| unappreciated win when it comes to threaded code. As with most
| things, there are tradeoffs, but every reference counting
| implementation that I've used has turned any concurrent access
| to shared memory into a huge bottleneck.
| arcticbull wrote:
| > The most performant approach is still manual memory
| management with specialized allocators tuned for specific
| situations, and then still only use memory allocation when
| actually needed.
|
| RAII gets you a lot of the way there.
| cosmotic wrote:
| > This is about as minimal as you can get short of nothing at
| all.
|
| With GC, you _can_ do nothing at all. In a system with lots of
| garbage, you can do a GC by copying everything accessible from
| the GC root, then de-allocating all the garbage in a single free.
| [deleted]
| bitwize wrote:
| Boy, I can't wait for theangeryemacsshibe (posts here as hayley-
| patton) to tear into this one.
|
| But yeah, the correct way to handle resources (not just memory!)
| is with value semantics and RAII. Because then you know the
| object will be cleaned up as soon as it goes out of scope with
| zero additional effort on your part. In places where this is not
| appropriate, a simple reference counting scheme may be used, but
| the idea is to keep the number of rc'd objects small. Do not use
| cyclical data structures. Enforce a constraint: zero cycles. For
| data structures like arbitrary graphs, keep an array of vertices
| and an array of edges that reference vertices by index.
|
| If you use a language with GC, you're probably just contributing
| to global warming.
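The index-based graph representation mentioned above, sketched in Python (illustrative, not from the comment's codebase): vertices never hold object references to each other, so there is nothing for a cycle detector to do.

```python
class Graph:
    """Graph stored as index arrays: edges reference vertices by
    integer index, so the object graph itself stays acyclic."""

    def __init__(self):
        self.vertices = []          # vertex payloads
        self.edges = []             # (src_index, dst_index) pairs

    def add_vertex(self, data):
        self.vertices.append(data)
        return len(self.vertices) - 1

    def add_edge(self, src, dst):
        self.edges.append((src, dst))

    def neighbors(self, v):
        return [d for s, d in self.edges if s == v]

g = Graph()
a = g.add_vertex("a")
b = g.add_vertex("b")
g.add_edge(a, b)
g.add_edge(b, a)    # a "cycle", but only by index, not by reference
```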
| kaba0 wrote:
| Why not just write embedded programs with fixed size memory
| allocation then if we are that okay with restricting the
| programs we write?
| bitwize wrote:
| Because maybe we're not okay with restricting the programs we
| write _that much_.
| kaba0 wrote:
| Nor am I okay with restricting my programs so much that a
| tree-like-only allocation strategy would suffice; circular
| data structures are not evil.
| slavboj wrote:
| People have been managing garbage collection schedules for
| decades now. It's quite possible for many systems to have
| completely deterministic performance, with the
| allocation/deallocation performance made extremely fast, gc
| restricted to certain times or a known constant overhead, etc.
| Ironically, from a programming perspective it's incredibly easy
| in a language like Java to see exactly what allocates and bound
| those cases.
|
| Conversely, it's also possible for reference counting to have
| perverse performance cases over a truly arbitrary reference graph
| with frequent increments and decrements. You're not just doing
| atomic inc/dec, you're traversing an arbitrary number of pointers
| on every reference update, and it can be remarkably difficult to
| avoid de/allocations in something like Python where there's not
| really a builtin notion of a primitive non-object type.
|
| Generally speaking, memory de/allocation patterns are the issue,
| not the specific choice of reference counting vs gc.
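On the point about Python lacking unboxed primitives: even a plain integer is a heap object whose refcount churns on every binding, which is easy to observe (note `sys.getrefcount` itself holds one temporary reference to its argument):

```python
import sys

x = 10 ** 6                  # a plain int is a refcounted heap object
base = sys.getrefcount(x)    # includes getrefcount's own temp ref
y = x                        # one more binding bumps the count
assert sys.getrefcount(x) == base + 1
del y                        # ...and dropping it decrements again
assert sys.getrefcount(x) == base
```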
| mirekrusin wrote:
| What about - garbage collect by reference counting, like Python?
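That hybrid is indeed how CPython works: refcounting reclaims most objects immediately, and a backup cycle collector (the `gc` module) sweeps what refcounting alone can never free:

```python
import gc

class Pair:
    pass

a, b = Pair(), Pair()
a.other, b.other = b, a   # cycle: refcounts can never reach zero
del a, b                  # refcounting alone would leak these...

found = gc.collect()      # ...until the cycle detector runs
assert found >= 2         # both Pairs (plus their dicts) reclaimed
```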
| knome wrote:
| Reference counting can also have unpredictable hits if you
| release any large data structures. Whoever drops the last
| reference suddenly gets to sit through the entire deep set of
| items to release ( unless you can hand off the release cascade to
| a background thread ).
|
| I've never heard of a reference counting implementation that can
| handle memory compaction.
|
| Every time you update a reference count, which is every time you
| touch any object, you're going to have to write to that RAM,
| which means stealing it from any other threads using it on any
| other processors. If you share large trees of data between
| threads, traversing that tree in different threads will always
| end up with your threads constantly fighting with each other
| since there's no such thing as read only memory in reference
| counting.
|
| When releasing something like a huge list in reference counting,
| how does the release avoid blowing the stack with recursive
| releasing? My guess is this is just a "don't use a large list
| whose release may blow the stack with recursive releasing"
| situation.
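For what it's worth, the usual dodge is to flatten the cascade into a loop (CPython additionally has an internal "trashcan" guard for exactly this case); a sketch of the manual pattern:

```python
class Node:
    __slots__ = ("next",)
    def __init__(self, nxt=None):
        self.next = nxt

# A deep singly-linked list: a naive recursive teardown would
# recurse once per node.
head = None
for _ in range(50_000):
    head = Node(head)

# Iterative release: sever one link per step, so no single drop
# ever triggers more than O(1) further work on the call stack.
count = 0
while head is not None:
    nxt = head.next
    head.next = None      # detach before dropping the node
    head = nxt
    count += 1
assert count == 50_000
```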
| mamcx wrote:
| > hits if you release any large data structures.
|
| Well, that depends on how the RC is done. This is key to
| understand, because if you can control it, the RC becomes
| cheaper.
|
| You can see it done this way in
|
| http://sblom.github.io/openj-core/iojNoun.htm
|
| i.e. if instead of `[Rc(1), Rc(2)]` you do `Rc([1, 2])`, that
| works great.
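In Python terms, the difference is roughly one refcounted buffer of unboxed values versus N refcounted boxes (illustrative, using the stdlib `array` module):

```python
import array

boxed = [10**6 + i for i in range(1000)]  # 1000 heap ints, each
                                          # with its own refcount
flat = array.array("q", boxed)            # one refcounted object
                                          # over raw int64 storage
assert list(flat) == boxed
assert flat.itemsize == 8                 # unboxed: no per-element
                                          # object headers to count
```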
| kaba0 wrote:
| How is that not the exact same for tracing GC?
| mamcx wrote:
| Well, you don't need the GC. That is the point of this: Rc
| _could_ be piecemeal. That is something that could be
| exploited very well when writing an interpreter, for example
| (that is what array langs do under the hood).
| exabrial wrote:
| > I've already stated I'm not going to do benchmarks.
|
| Yikes
| jsnell wrote:
| > Basically, you attach the reference to the object graph once,
| and then free it when you're done with it.
|
| So reference counting works by the programmer knowing the
| lifetime of each object allowing them to only increment /
| decrement the refcount once, and trusting that the raw uncounted
| pointers they use elsewhere are always valid? There's another
| word we have for this: manual memory management. It's unsafe and
| unergonomic, and it's pretty telling that the author needs to
| resort to this pattern to make RC appear competitive. It's
| because actually
| doing reference counting safely is really expensive.
|
| > If GC is so good, why wouldn't Python just garbage collect
| everything, which they already did once and could trivially do,
| instead of going through the hassle of implementing reference
| counting for everything but the one case I mentioned?
|
| Because they've made reference counting a part of their C
| extension API and ABI. If they wanted to use a GC, they'd instead
| need a very different API, and then migrate all the modules to
| the new API. (I.e. a way for those native extension to
| register/unregister memory addresses containing pointers to
| Python objects for the GC to see.)
|
| Early on the deterministic deallocation given by reference
| counting would also have been treated by programmers as a
| language level feature, making it so that a migration would have
| broken working code. But I don't think that was ever actually
| guaranteed in the language spec, and anyway this was not carried
| over to various alternative Python implementations.
| melony wrote:
| You are in for a fun time when you need to make circular data
| structures with ARC.
___________________________________________________________________
(page generated 2022-07-29 23:00 UTC)