[HN Gopher] Boehm-Demers-Weiser conservative C/C++ Garbage Colle...
___________________________________________________________________
Boehm-Demers-Weiser conservative C/C++ Garbage Collector
Author : swatson741
Score : 111 points
Date : 2024-01-21 11:30 UTC (11 hours ago)
(HTM) web link (hboehm.info)
(TXT) w3m dump (hboehm.info)
| torginus wrote:
| Ah the spectre of the 80s, most famously used by the ancient fork
| of Mono that Unity uses, leading to GC-pause induced stutters
| that have plagued gaming for over a decade.
|
| I'd estimate that the bad rap GC languages get can be traced back
| to this piece of code.
| pyeri wrote:
| You don't need to go that far for GC's bad rep considering that
| excruciatingly slow Python can be found TODAY!
| DarkNova6 wrote:
| But you must understand, Python is safe and C is difficult.
| Therefore, to be fast we need Python (fast development that
| is because nobody _really_ cares about performance (or
| maintainability)).
|
| If only more than two languages would exist!
| h4ch1 wrote:
| > fast development that is because nobody _really_ cares
| about performance (or maintainability)
|
| This is either
|
| 1. missing a /s
|
| 2. delusion
|
| 3. bait
| tmtvl wrote:
| I believe it's 1, considering the statement 'If only more
| than two languages would exist!'. Granted, sarcasm
| doesn't transmit well in written text.
| amelius wrote:
| Python wasn't designed for speed.
| torginus wrote:
| Afaik (C)Python uses reference counting; its slowness
| comes from the interpreter itself.
| _flux wrote:
| It also uses a tracing GC to handle RC loops.
|
| I guess doing it this way allows mostly deterministic
| collection (which is then just an implementation feature,
| not language semantics), while the actual tracing GC
| doesn't need to be that efficient, as it presumably needs
| to run less frequently.
| kaba0 wrote:
| Ref counting is a GC algorithm, and the slowest at that
| (especially with multi-core sync, though that's not the
| case with Python), as a tradeoff for lower memory
| consumption.
| adev_ wrote:
| > You don't need to go that far for GC's bad rep considering
| that excruciatingly slow Python can be found TODAY!
|
| To be fair, the GC is probably the least of Python's
| speed problems.
|
| Many languages designed with a GC perform significantly
| better than Python in terms of raw performance [^1].
|
| The extremely dynamic nature of Python, combined with its
| reference semantics, is probably far more to blame here.
|
| [^1] https://benchmarksgame-
| team.pages.debian.net/benchmarksgame/...
| p_l wrote:
| Mono is nowhere close to the major use case of it; it has
| been successfully deployed in many projects.
|
| C++ crazies denigrating GC even when it visibly fixed their
| issues, or maybe especially when it did, was a phenomenon since
| 1990. Even when code with Boehm GC ran faster - and most
| importantly, didn't crash all the time.
| jstimpfle wrote:
| How did GC visibly fix their issues?
|
| Manually managed code doesn't crash all the time. It probably
| crashed way more in the 90s. There is an art to writing
| manually managed code that is maintainable and fast, sure,
| and it doesn't involve malloc() and free(). And then there is
| RAII, automating lots of the work.
|
| The C++ crazies are AFAICS still most everywhere where there
| is lots of temporary objects and serious requirements on
| throughput & latency. There are frameworks and languages that
| support garbage collection that get used in this space too,
| but I don't know how much can be inferred from that.
|
| Seems like C and C++ guys are constantly looking for ways to
| automate their garbage cleanup.
|
| And seems like GC guys are constantly looking for ways to fix
| their web-of-objects memory layout, and to prevent GC from
| ruining their performance. Plus, GC isn't deterministic, it
| mostly works for memory but otherwise you'll see _lots_ of
| manual cleanup code (not making use of the GC), and it always
| feels a bit tricky with the GC still being part of the game.
| Not that I take a huge issue with C-like cleanup code -- I
| think RAII can be tricky too because it's hard to "read" what
| happens, and I often prefer the manual cleanup code. But
| compared to GC, at least RAII is deterministic.
| pjmlp wrote:
| The deterministic myth only works out for basic types;
| see Herb Sutter's CppCon 2015 talk on the matter, and how
| performance workarounds turn out to be a poor man's GC.
| jstimpfle wrote:
| What? RAII is totally deterministic, not sure what you
| want to contend here. CppCon 2015: Herb Sutter "Writing
| Good C++14... By Default": Not sure where I'm going to
| find my answer here.
|
| Anyway, for complex state that must be cleaned up, maybe
| flushed before etc., you'll always find a lot of manual
| cleanup code, or otherwise find code with lots of random
| problems, often unreported. That is true both for GC
| languages as well as C++ and other languages with RAII or
| RAII-likes.
| pjmlp wrote:
| Unless you can static assert how long each destructor
| takes, it is not deterministic, even less so if ownership
| is given to background threads for cleanup, like in
| C++/WinRT.
|
| The talk is "Leak-Freedom in C++", 2016 actually,
| https://youtu.be/JfmTagWcqoE?si=MY1JnWyybh5K8I8-
| jstimpfle wrote:
| What I want to say here is the most basic thing. The
| order in which the destructors get called (also in the
| context of the other code) is deterministic. GC doesn't
| give you that. It should be obvious what I said.
| pjmlp wrote:
| Even that depends on whether constructors/destructors are
| elided when UB or LTO are part of the picture.
|
| An optimization that coroutines rely upon as means to
| remove the abstractions of their execution.
| jstimpfle wrote:
| UB? What? That's completely off-topic. Please stop that,
| it's annoying.
|
| I can't imagine what LTO should have to do with that
| either, but if there's any substance here please
| enlighten me.
|
| The order in which destructors get called is well defined
| (and generally useful insofar as it's the reverse order
| of construction). Can't you just accept a simple
| argument? I'm not even a huge fan of RAII, but it has its
| uses.
| nwlieb wrote:
| There's some ambiguity for argument destruction order for
| example: https://stackoverflow.com/a/36992250
|
| Similarly, the construction/destruction order for
| std::tuple elements is not well defined.
|
| Granted, that's implementation defined behavior, which is
| technically deterministic on a single compiler.
| jstimpfle wrote:
| This isn't really about constructors/destructors.
| Expressions like function calls with multiple arguments
| have always been "unsequenced" with respect to each
| other. In other words the order is left to decide for the
| compiler. It's always been like that, going back to C
| (and probably other languages). If you call _f(x++, x++)_
| what values get passed to _f_ is unspecified.
|
| I suppose the destruction of whatever the expressions
| constructed still happens in reverse order of
| construction.
|
| But either way I might not even care, I'm aware that at
| the level of a single statement the execution sequences
| aren't much defined, so I rarely put more than one
| mutating thing per expression, or otherwise I'm sure that
| I don't care about the order -- I could live with any
| sequence as well as totally parallel execution.
|
| Example: _buf[i++] = 5_ has 2 mutating sub-expressions,
| but I know it's not messing up anything. I don't care
| whether _i_ gets incremented before _5_ gets assigned or
| the other way around.
| josefx wrote:
| That is a one-and-a-half-hour-long talk; can you point to
| the timestamp where he talks about determinism?
|
| > and how performance workarounds turn out a poor man's
| GC.
|
| If you need performance then it doesn't matter if it is a
| "poor man's GC"; what matters is that it meets the
| space/time requirements, and most general-purpose GCs
| tend to care about neither.
| pjmlp wrote:
| The only deterministic thing is the call point. Herb
| points out that, just like calling malloc()/free(), it is
| not possible to deterministically assert the execution
| time, more so with complex data structures where there is
| not only a domino effect of destructors being called but
| also the possibility of causing stack overflows if the
| destructors are written incorrectly. He uses this to
| build an argument for presenting his deferred pointer
| class, which is basically a GC in disguise.
| jstimpfle wrote:
| That is an obvious thing, and totally beside the point.
| Also, GC probably helps only a little against this. Sure,
| actually calling the destructor can be delayed in order
| to even out latency. You can do the same with RAII.
|
| Sometimes you can't do that though because you have
| certain constraints on the execution order of whatever
| routines you need to run. This is completely independent
| of whether you use manual cleanup or GC or RAII, however
| with GC it is weird/hard/impossible to synchronize the GC
| with the non-cleanup code.
| einpoklum wrote:
| In modern C++, you no longer need to manually manage your
| memory (except when writing custom containers and low-level
| libraries). Memory is managed automagically, no garbage is
| generated and no collection is necessary.
|
| Of course, you can also do manual allocations if you really
| wanted to, but - with the changes in the language since
| C++11 - you just don't have the motivation to do it.
| mihaaly wrote:
| I assume that real-time computing should have used
| real-time-compatible techniques. But luckily, those
| exposed to fields other than games and game development
| might feel less aversion towards GC, probably even the
| other way around.
| pjmlp wrote:
| Another reason being that many people don't get GC
| theory, nor understand that just because a language has
| some form of automatic resource management, it doesn't
| mean the language doesn't offer other mechanisms.
| yxhuvud wrote:
| Another big reason is the lack of value types in languages like
| Java. Modern languages with a GC tend to have those, and where
| it makes sense to use them they help, a lot.
| amelius wrote:
| What are "value types"?
| greesil wrote:
| In Java, primitives. int, boolean, float. Allocated on the
| stack, or the heap along with the object it is a part of.
| jstimpfle wrote:
| If you nest an object within another object you often want
| to embed the child directly in its parent, sharing the
| memory and avoiding the indirection that you get if you
| nest with a reference as is the typical approach with GC
| languages. Also embedding makes it clear that it's a 1:1
| relation, i.e. non-optional, fixed identity etc.
|
| AFAICT few GC languages allow you to do this; basically
| you have to buy into indirection and end up with webs of
| objects.
|
| This is quite cache unfriendly, but the bigger issue for me
| is that it makes the code less clear and requires
| additional complexities since you can't just jump from a
| child to its parent by subtracting an offset from its
| address.
| ghusbands wrote:
| > makes the code less clear and requires additional
| complexities since you can't just jump from a child to
| its parent by subtracting an offset from its address.
|
| Wait, so you're saying that code that subtracts an offset
| from a child node to get to a parent node is clearer and
| less complex than alternatives?
| groestl wrote:
| > Wait, so you're saying that code that subtracts an
| offset from a child node to get to a parent node is
| clearer and less complex than alternatives?
|
| Parent made me laugh out loud on this.
| jstimpfle wrote:
| I hope you've stopped laughing and can read and
| understand my answer.
| groestl wrote:
| Please, I did not want to offend you personally. It just
| took me by surprise. I understand your points, even
| though I'm with your direct reply, all things considered.
| jstimpfle wrote:
| The address offset subtraction thing specifically is used
| quite a bit in the Linux kernel, for example. It allows
| you to combine abstract code with concrete data. For
| example, they define a red-black tree as a graph of just
| plain nodes. These nodes are totally abstract; they
| contain only node pointers but no useful data. The code
| is totally abstract, with no need to instantiate anything
| a dozen times with different offsets and such. You simply
| wrap the abstract nodes with whatever concrete useful
| data you fancy.
|
| The same approach works for lists and a variety of other
| data structures.
|
| You can get from the abstract link node to the containing
| struct with a macro: _CONTAINER_OF(ptr, containertype,
| membername)_. It's easy to use and hard to get wrong.
| jstimpfle wrote:
| Not having the address relationship requires additional
| complexities. To get from the child to the parent, you
| need to spend an additional reference field, and need to
| keep it consistent with the other reference. If you embed
| directly, there's nothing in the relationship of the
| objects' identities that could ever get inconsistent.
| It's a strong guarantee.
|
| How do you get from the child to the parent in, say Java?
| And how do you tell that a child is never swapped out?
|
| How do you copy an object that holds, say, 4 vec4, vec4
| being a struct that itself holds 4 floats? With value
| types, say in C, it's 1 memcpy and you're done.
| rbehrends wrote:
| This needs some qualifications.
|
| The above problem is about latency of stop the world collectors
| in a domain that requires extremely low latency. And if you
| think that stop the world collections are representative of
| garbage collection as a whole (i.e. the "bad rap"), this is
| just not being up to the state of the art. (It's just that
| generational/incremental collection is hard to impossible
| without having support for write barriers/read
| barriers/snapshotting on the compiler or hardware side, which
| makes that a practical no-go for a language that was never
| designed to have GC support.)
|
| But in terms of throughput, the BDW GC actually performs pretty
| well, competitive with or beating malloc()/free() libraries.
| This is plenty enough for a number of applications, especially
| when it comes to batch processing. In fact, even the stop the
| world latency is (combined with parallel marking) plenty good
| enough for a number of types of regular GUI applications, where
| you don't have the fairly extreme latency requirements of
| standard video games.
|
| It is also worth noting that manual deallocation (especially
| via RAII) isn't a panacea for latency, as that is just as prone
| to large pauses due to cascading deletes [1].
|
| [1] While in theory you can do that lazily, in this case you're
| losing timeliness of deletion and may actually have worse worst
| case latency than a modern GC that can stagger its work to have
| bounded pause times. The benefit of RAII is that you may be
| able to plan these deletions, but even that can be surprisingly
| difficult at times, requiring extra management to keep data
| structures alive longer than normal. [2]
|
| [2] Note that lazily deleting objects in RC to avoid pause
| times is not necessarily the answer; for one thing, you're
| introducing much of the complexity associated with incremental
| GC again (especially having to do X amount of deallocation work
| for each allocation), or risking considerable space overhead.
| https://dl.acm.org/doi/abs/10.1145/964001.964019 It also
| remains a difficult challenge when you're dealing with objects
| that aren't of bounded size (e.g. large arrays).
| cplusplusfellow wrote:
| For all of the reasons you note above, this is why I've
| almost exclusively dealt with arena-style allocations of
| fixed-object-sizes when doing ultra low latency work.
| iainctduncan wrote:
| Hi, I'm just starting a (long) PhD in computer music
| languages, and one of my hoped-for projects is improving
| the GC for the Scheme implementation I work in for soft
| realtime use. If you (or anyone else) wouldn't mind
| sharing suggestions on resources for learning about
| low-latency modern GCs, it would be most appreciated. :-)
| moonchild wrote:
| read the garbage collection handbook -
| https://gchandbook.org/
| dataangel wrote:
| I have been writing low latency code for a decade and never
| seen cascading deletes be a problem. GC on the other hand is
| a constant latency problem for people.
| p_l wrote:
| running a throughput-oriented GC instead of
| latency/realtime one is a common error of those people
| (also, throughput oriented ones are usually the default)
| rbehrends wrote:
| > I have been writing low latency code for a decade and
| never seen cascading deletes be a problem.
|
| I don't know what your domain is, but the domains I'm
| familiar with (especially hard real-time) deal with the
| underlying constraints by way of restricting how you write
| programs. A side effect of these restrictions is that you
| generally avoid using constructs that give rise to
| cascading deletes. But ultimately, having to deal with
| these restrictions is suboptimal.
|
| I remember a talk by Gil Tene (CTO and co-founder of Azul
| Systems) about how the problem with writing real-time Java
| was that it historically required you to write non-
| idiomatic Java code, which came at a serious cost to reuse
| and where he argued that one of the benefits of Azul's GC
| was that you could write (mostly) idiomatic Java and still
| have GC noise be less than OS noise.
|
| There is of course the actual problem that the stricter
| your latency requirements are, the fewer GCs actually meet
| them. Plenty of modern off the shelf GCs meet typical soft
| real time requirements, but the harder your real time
| requirements get, the fewer implementations actually meet
| them (and are increasingly more expensive).
|
| I'll also add that depending on how strict your latency
| requirements are, off the shelf allocators may not meet
| them, either, even for a single malloc() (though there
| _are_ allocators that offer hard upper bounds for
| allocation costs, such as TLSF).
| gumby wrote:
| > It's just that generational/incremental collection is hard
| to impossible without having support for write barriers/read
| barriers/snapshotting on the compiler or hardware side, which
| makes that a practical no-go for a language that was never
| designed to have GC support.
|
| It was demonstrated in the mid 80s that the MMU could be used
| to implement the write barrier (when I saw that I knew lisp
| machines were a dead end, though I kept using them for as
| long as was feasible). You can use this and other compiler
| support to implement it in a language not designed for GC
| support, no problem.
|
| The issue I think you meant is that you can't have a
| _transporting_ collector (which also compacts your working
| set, even allowing you to return memory to the system). A
| language like C++ doesn't anticipate an object's address ever
| changing, so that's a no go.
|
| I suppose you could write special shared_ptr / unique_ptr
| that supported address mutation but you'd need a compiler
| flag to check for an object's address being taken... but to
| some degree the destructor paradigm renders a lot of this
| moot.
| rbehrends wrote:
| > It was demonstrated in the mid 80s that the MMU could be
| used to implement the write barrier (when I saw that I knew
| lisp machines were a dead end, though I kept using them for
| as long as was feasible). You can use this and other
| compiler support to implement it in a language not designed
| for GC support, no problem.
|
| Yes, that's why I wrote about support "on the compiler or
| _hardware_ side" (i.e. MMUs). That said, while this might
| work in theory, the BDW GC does have support for it, and in
| practice it doesn't actually work well. Throughput suffers
| and pause times can actually get worse.
|
| By the way, another alternative way of having incremental
| collection without compiler support is to use fork() for
| snapshotting (there's an option in D for that), but that
| has its own issues, of course.
|
| > The issue I think you meant is that you can't have a
| transporting collector (which also compacts your working
| set, even allowing you to return memory to the system). A
| language like C++ doesn't anticipate an object's address
| ever changing, so that's a no go.
|
| No, I wasn't talking about a compacting collector. I very
| specifically meant generational and/or incremental
| collection.
| gumby wrote:
| > No, I wasn't talking about a compacting collector. I
| very specifically meant generational and/or incremental
| collection.
|
| FWIW they are orthogonal
| zozbot234 wrote:
| The compiler support you need in practice is quite limited.
| There is an implementation of incremental cycle collection in
| Rust: https://github.com/chc4/samsara It's made possible
| because Rust can tell apart read-only and read-write
| references (except for references to interior mutable
| objects, but these are known to the compiler and references
| to them can be treated as read-write). This avoids a global
| stop-the-world for the entire program.
|
| Cascading deletes are rare in practice, and if anything they
| are inherent to _deterministic_ deletion, which is often a
| desirable property. When they're possible, one can often use
| arena allocation to avoid the issue altogether since arenas
| are managed as a single allocation.
| rbehrends wrote:
| > The compiler support you need in practice is quite
| limited.
|
| This is true, but it's absent especially in C. Of course,
| as long as you can intercept writes and/or reads through
| pointers, you are good to go in principle. However, that
| solution may not be efficient.
|
| > Cascading deletes are rare in practice
|
| An array of objects (e.g. std::vector<std::string>) already
| has unbounded deallocation cost. People mostly think of
| tree-like structures here, but such structures only add the
| risk of stack blowout to the lack of an upper bound. But in
| principle, any container that does not have a size limit
| can give rise to unbounded pause times.
| torginus wrote:
| The problem isn't mark-and-sweep GCs; it's that the Boehm
| GC is the Ford Model T of garbage collectors.
| rbehrends wrote:
| Absolutely not. While it obviously should not be used for
| use cases that it isn't designed for, as a conservative
| mark-and-sweep GC it is extremely well engineered and has
| very good throughput.
| itsphilos wrote:
| My experience with Cpp isn't that extensive, but what is the use-
| case of a garbage collector in this language? I always had the
| impression that with the well-defined lifetimes of objects, you
| wouldn't really create garbage to begin with, but I guess there
| are some use-cases I don't yet know about?
| FartyMcFarter wrote:
| > what is the use-case of a garbage collector in this language?
|
| Same as other languages.
|
| > I always had the impression that with the well-defined
| lifetimes of objects, you wouldn't really create garbage to
| begin with
|
| There's no well defined lifetime of objects when it comes to
| dynamic allocation. For example, if you allocate
| something with the _new_ keyword, there are no language
| guarantees that this won't leak memory.
| pjmlp wrote:
| Targeting .NET, hence C++/CLI, integration with Blueprints,
| hence Unreal C++.
| mike_hearn wrote:
| It's pretty useful. Chrome uses one extensively for example
| (called Oilpan). So does Unreal Engine. GC in C++ is much more
| widely used than people realize.
|
| The problem is that big programs in the real world often don't
| have well defined lifetimes for everything. The idea that
| everything in your app can have its lifetime worked out in
| advance isn't true when you're modelling the world (a world),
| or lifetimes are controlled by other people (e.g. website
| authors).
|
| Generally what you see is that these apps start out trying to
| do manual memory management, decide it's too difficult to do
| reliably at scale, and switch to a conservatively GCd heap with
| a custom collector or Boehm.
|
| Note that Rust doesn't fix this problem. Rust just encodes the
| assumption of pre-known lifetimes much more strongly in the
| language. If you're not in such a domain then you have to fall
| back to refcounting, which is (a) slow and (b) easy to get
| wrong such that you leak anyway. Early versions of Rust used
| refcounting for everything and iirc they found anywhere between
| 10-15% of the resulting binaries was refcounting instructions!
| steveklabnik wrote:
| > Early versions of Rust used refcounting for everything and
| iirc they found anywhere between 10-15% of the resulting
| binaries was refcounting instructions!
|
| Do you happen to have a citation for this? I don't remember
| ever hearing about it, but it's possible this was before my
| time, as I started in the smart pointers era.
| mike_hearn wrote:
| I read it in an old HN comment by pcwalton. Algolia to the
| rescue! Actually it's even worse, he said it was 19% of the
| binary size:
|
| https://news.ycombinator.com/item?id=5691684
|
| _The Rust compiler does this. Even so, 19% of the binary
| size in rustc is adjusting reference counts.
|
| I am not exaggerating this. One-fifth of the code in the
| binary is sitting there wasted adjusting reference counts.
| This is much of the reason we're moving to tracing garbage
| collection._
|
| It's interesting how many strategies Rust tried before
| settling on linear types.
| steveklabnik wrote:
| Thank you!!!
| klodolph wrote:
| > It's interesting how many strategies Rust tried before
| settling on linear types.
|
| Rust doesn't actually have linear types. I'm not sure
| what Rust's types are called (affine?), but linear types
| are the "must be consumed" (can't leak) types, and Rust
| doesn't have any support for this.
|
| Rust's guarantee is that you MUST NOT use an object after
| dropping it. Linear types would add the additional
| requirement that you MUST drop the object.
| steveklabnik wrote:
| Rust's types are affine, yes. Sometimes some people don't
| draw the distinction between the two and call both
| "linear." But "affine" is more precise.
| pcwalton wrote:
| Fascinating, thank you!
|
| (I completely forgot I wrote that and also I'm mildly
| embarrassed at how I used to write back then.) :)
| adgjlsfhk1 wrote:
| it's worth noting that percent of instructions is a bad
| metric since modern CPUs have lots of extra compute, so
| adding simple integer instructions that aren't in the
| critical path will often not affect the wall time at all.
| mike_hearn wrote:
| The problem is that once threading gets involved they have
| to be interlocked adjustments and that's very slow due to
| all the cache coherency traffic. Refcounts are also
| branches that may or may not be well predicted. And you're
| filling up icache, which is valuable.
| vlovich123 wrote:
| You could have a hybrid refcount where you use plain
| integers when adjusting the ref count on the current
| thread and atomics to adjust the global count when you
| hit 0 (hybrid-rc is the crate you can use, but something
| Swift-like where the compiler does ARC for specific
| values when you opt in may not be the worst). Also, when
| the type isn't Send you don't need to do atomic
| refcounts, although the interaction with unsafe does get
| a little more complex.
|
| But yeah, at this point Rust's niche is as a competitor
| to C/C++, and in that world implicit refcounting doesn't
| have a lot of traction; people favor explicit GC and
| "manual" resource management (RAII mechanisms like drop
| and destructors are OK).
| bfgeek wrote:
| Oilpan can theoretically be used as a library as well - see:
| https://v8.dev/blog/oilpan-library
| https://v8.dev/blog/tags/cppgc
| https://chromium.googlesource.com/v8/v8.git/+/HEAD/include/c...
|
| Due to the nature of web engine workloads migrating objects
| to being GC'd isn't performance negative (as most people
| would expect). With care it can often end up performance
| positive.
|
| There are a few tricks that Oilpan can apply. Concurrent
| tracing helps a lot (e.g. instead of
| incrementing/decrementing refs, you can trace on a different
| thread), in addition when destructing objects, the
| destructors typically become trivial meaning the object can
| just be dropped from memory. Both these free up main thread
| time. (The tradeoff with concurrent tracing is that you need
| atomic barriers when assigning pointers which needs care).
|
| This is on top of the safety improvements you gain from
| being GC'd vs. smart pointers, etc.
|
| One major tradeoff is that UAF bugs become more difficult
| to fix, as you are just accessing objects which "should"
| be dead.
| vlovich123 wrote:
| > One major tradeoff that UAF bugs become more difficult to
| fix, as you are just accessing objects which "should" be
| dead.
|
| Are you referring to access through a raw pointer after
| ownership has been dropped and then garbage collection is
| non deterministic?
| bfgeek wrote:
| > Are you referring to access through a raw pointer after
| ownership has been dropped and then garbage collection is
| non deterministic?
|
| No - basically objects sometimes have some notion of when
| they are "destroyed", e.g. an Element detached from the
| DOM tree[1]. Other parts of the codebase might have
| references to these objects, and previously, accessing
| them after they were destroyed would be a UAF. Now it's
| just a bug. This is good! It's not a security bug
| anymore! However, it's much harder to determine what is
| happening, as it isn't a hard crash.
|
| [1] This isn't a real case, just an example.
| uluyol wrote:
| Not sure early versions of Rust are the best example of
| refcounting overhead. There are a bunch of tricks you can use
| to decrease that, and it usually doesn't make sense to invest
| too much time into that type of thing while there is so much
| flux in the language.
|
| Swift is probably a better baseline.
| EPWN3D wrote:
| Yeah I was thinking the same thing. "10 years ago the Rust
| compiler couldn't produce a binary without significant size
| coming from reference counts after spending minimal effort
| to try and optimize it" doesn't seem like an especially
| damning indictment of the overall strategy. Rust is a
| language which is sensitive to binary size, so they
| probably just saw a lower-effort, higher-reward way to get
| that size back and made the decision to abandon reference
| counts instead of sinking time into optimizing them.
|
| It was probably right for that language at that time, but I
| don't see it as being a generalizable decision.
|
| Swift and ObjC have plenty of optimizations for reference
| counting that go beyond "Elide unless there's an ownership
| change".
| pcwalton wrote:
| > Note that Rust doesn't fix this problem. Rust just encodes
| the assumption of pre-known lifetimes much more strongly in
| the language. If you're not in such a domain then you have to
| fall back to refcounting, which is (a) slow and (b) easy to
| get wrong such that you leak anyway. Early versions of Rust
| used refcounting for everything and iirc they found anywhere
| between 10-15% of the resulting binaries was refcounting
| instructions!
|
| Well, modern idiomatic Rust only uses Arc/Rc on the few
| objects where it's needed, so the overhead of reference count
| adjustment is so tiny as to never show up. You typically only
| see reference count traffic be expensive when either (a)
| _everything_ is reference counted, as in ancient Rust; or (b)
| on super-inefficient implementations of reference counting,
| as in COM where AddRef() and Release() are virtual calls.
| Jeaye wrote:
| I'm using C++ to build jank, a native Clojure dialect on LLVM
| with C++ interop. All of Clojure's runtime objects are
| dynamically allocated and the churn of reference counting is
| far too slow, compared to garbage collection. I had originally
| started with an intrusive_ptr and atomic count and the Boehm GC
| was about 2x faster for that benchmark (and at least as much
| for every later benchmark).
|
| Even outside of writing languages on top of C++, if you're
| using something like immer, for persistent immutable data
| structures in C++ (as jank does), it has memory policies for
| reference counting or garbage collection. This is because
| immutable data generally results in more garbage, even when
| transients are used for the most pathological cases. That
| garbage is the trade off for knowing your values will never be
| mutated in place. The huge win of that is complete thread
| safety for reading those values, as well as complete trust on
| reproducibility/purity and trivial memoization.
| habibur wrote:
| In C++ you don't need a GC. Use RAII. Objects only need a
| constructor, destructor, and copy constructor to get very
| efficient reference counting that's almost impossible to leak.
|
| In C you can construct your hackish tracing GC yourself. For root
| objects save the stack address at the beginning of main() by
| saving the address of a dummy stack variable.
| int dummy=0; void* stack_head=&dummy;
|
| and later, during GC, get the current stack address again and
| trace from the old one to the new. You also need to save the
| registers, which can be done semi-portably with the Unix setjmp()
| function, which spills the register values into a structure you
| can then trace. There's also the issue of split stacks to keep in
| mind. On the plus side, memory alignment ensures pointers are
| 8-byte aligned, so you only need to examine 8-byte-aligned words
| in between. It's a bit hackish regardless.
|
| I built a few apps using the above technique but now am more
| comfortable with old styled manual memory management. Valgrind
| will tell you exactly which line you allocated memory that
| leaked.
| pjmlp wrote:
| In theory, yes.
|
| In practice, if you care about performance, one ends up creating
| a pseudo-GC, as Herb Sutter showed in his CppCon talk about
| deferred_ptr, or as in the hazard pointers coming to C++26.
|
| Then there is the point RAII handles have to be written in
| exception safe way, also be movable if used as part of other
| data structures.
| mgaunard wrote:
| I wouldn't call hazard pointers garbage collection.
|
| Also, I don't think that's necessarily as critical as you
| suggest. I've been working with performance-critical software
| for many years and never saw a strong need for general-
| purpose concurrent reclamation techniques.
| rst wrote:
| The RAII/ref-counting technique often works great, but not
| always. Cyclic structures are a problem -- so, if you've got,
| say, a global tree structure with bidirectional parent/child
| links, you'll need some kind of special-casing to make ref-
| counting clean up an entire abandoned subtree. A full GC does
| take care of cases like this.
| TinkersW wrote:
| That is super rare and we have weak_ptr like objects for
| that.
| nemetroid wrote:
| Parallel reference counting can have extremely poor
| performance.
| shrubble wrote:
| Bjarne Stroustrup's view:
| https://www.stroustrup.com/bs_faq.html#garbage-collection
| nickpsecurity wrote:
| I think a mix of methods is best. We can use non-GC methods for
| whatever is easiest to prove safe. Python, for instance, uses
| reference counting by default, with a GC for the cases where
| that would be a problem. I also prefer when the GC lets me tell
| it to delete something immediately, which it then just checks
| incrementally. Performance-sensitive modules can be coded
| without GCs using tools like Meta's Infer, which work on large
| modules.
|
| For the GC component, there's two options I'd like to see more
| exploration of. One is a modern, concurrent GC to reduce the
| pauses people here are worried about. Immix [1] is an interesting
| design I remember.
|
| The other option is a runtime supporting multiple, app-specific
| GC's without much tuning. JX OS was the first variation of that I
| saw. I just stumbled on this other paper. So, we have a global,
| memory manager divided up into chunks managed differently across
| apps or components.
|
| [1] https://www.cs.cornell.edu/courses/cs6120/2019fa/blog/immix/
|
| [2] https://www4.cs.fau.de/Projects/JX/publications.html (It was
| in 'Beyond Address Spaces.')
|
| [3] https://dl.acm.org/doi/abs/10.1145/1029873.1029880
| truenorth wrote:
| Incremental alternatives to stop-the-world garbage collectors
| were well known over 30 years ago.
| mgaunard wrote:
| The problem with the Boehm GC has always been that it's
| conservative.
|
| If it was something built into the language then you could do
| much better, but then that language wouldn't be C++.
| NelsonMinar wrote:
| What I love about the Boehm collector is the hack that enables it
| to work in a language with no runtime typing like C. It literally
| _guesses_ what's a pointer. In C pointers are basically integers
| but they're unusual numbers, often quite large, and you know the
| entire set of valid numbers if you control malloc(). Surprisingly
| this worked even in 32 bit architectures.
| chubot wrote:
| I don't see that this is something to "love", and what you
| describe doesn't capture the full extent of the approximation.
|
| What scared me off of Boehm GC is that there are a bazillion
| config options -- e.g. it requires tuning and blacklisting to
| avoid memory leaks -- and probably even correctness issues,
| because register scanning isn't portable.
|
| The codebase is large and a bit scary, and there are seemingly
| lots of ways to use it poorly. People kept saying it's "drop
| in", but that seems like a myth.
|
| At least personally it doesn't seem like the kind of thing I
| want to debug.
|
| The opinions on Boehm generally seem very split -- a bunch of
| bad stories resulting in ripping it out of the codebase, and
| also people who think it's surprisingly good, and codebases
| that have used it for a very long time.
|
| I think the Nix evaluator uses it and is carrying around
| patches, and Guile Scheme uses it. Though I think there is some
| desire to move away from it in both cases. But again people
| have used it for a very long time, which isn't nothing!
|
| ---
|
| e.g. https://www.hboehm.info/gc/gcdescr.html
|
| _Recent versions of the collector use an approximate best fit
| algorithm by keeping free lists for several large block sizes.
| The actual implementation of GC_allochblk is significantly
| complicated by black-listing issues (see below)._
|
| ...
|
| _The collector implements black-listing of pages, as described
| in Boehm, ``Space Efficient Conservative Collection'', PLDI
| '93, also available here._
|
| _During the mark phase, the collector tracks ``near misses'',
| i.e. attempts to follow a ``pointer'' to just outside the
| garbage-collected heap, or to a currently unallocated page
| inside the heap. Pages that have been the targets of such near
| misses are likely to be the targets of misidentified
| ``pointers'' in the future. To minimize the future damage
| caused by such misidentifications they will be allocated only
| to small pointerfree objects._
|
| _The collector understands two different kinds of black-
| listing. A page may be black listed for interior pointer
| references (GC_add_to_black_list_stack), if it was the target
| of a near miss from a location that requires interior pointer
| recognition, e.g. the stack, or the heap if
| GC_all_interior_pointers is set. In this case, we also avoid
| allocating large blocks that include this page._
| NelsonMinar wrote:
| "love" in the sense of someone loving their snaggletooth
| mutt. What's remarkable to me is despite the obviously
| terrible idea at the core of it, it seems to work pretty well
| for some practical use. (IIRC it was at the core of GNU Java
| as well, and I could have sworn the earliest Sun JVMs used
| it.)
|
| I don't doubt it caused all sorts of problems too. It's just
| an interesting reminder that sometimes a really wacky hack
| can be pretty useful.
| m4nu3l wrote:
| I once wrote a non-conservative generational GC for C++ just as
| an exercise, with the constraint that I could only use standard
| C++ (it was in C++17).
|
| It worked based on a template type I called "gc_ptr<T>". So you
| could create one of these with the function template
| "make_gc(...)". This was the only way you could allocate a new
| garbage-collected object.
|
| "make_gc" would check if a type descriptor for the type was
| already initialized and if not it would allocate a new one. It
| would then set a thread_local with the descriptor and the address
| of the current object being created.
|
| Then it would call the object constructor. If the type being
| instantiated had gc_ptr members, these would be initialized by
| their constructor. The gc_ptr constructor would, in turn, check
| the descriptor of the parent object and add a record to it
| representing the member. The record would store the member offset
| calculated from its address and the one of the parent object.
|
| Garbage-collected objects would also use reference counting for
| gc_ptr(s) on the stack or inside objects that were not garbage-
| collected. It's easy to know if a gc_ptr is being constructed
| inside a garbage-collected object or not: If it is, then the code
| is also executing inside make_gc, so I just need a thread_local
| flag.
|
| For the mark and sweep, I would use an atomic flag to mark the GC
| as running. If the flag was set, then all gc_ptr would stop the
| thread (by waiting on a mutex) as soon as some code would try to
| swap/reassign them. This means that code that didn't touch them
| would be kept running during garbage collection.
|
| I would start marking objects from those allocated in the custom
| arena I used that had a reference count different from zero.
|
| I wanted to add containers that could allocate contiguous GC'ed
| objects by wrapping the standard containers but never finished
| that. GC'ed containers of gc_ptr(s) just worked fine.
|
| I just re-ran the benchmarks I wrote to compare it against the
| Boehm-Demers-Weiser GC. It is considerably faster at the mark-
| and-sweep (up to an order of magnitude faster with thousands of
| objects and a bit less than 100 MB of GC'ed memory in total).
|
| However, because of the atomic check, it was slower in swapping
| pointers by roughly an order of magnitude.
|
| I never used it in any project.
| frutiger wrote:
| Did you ever publish this library?
| rurban wrote:
| I have a library with an extremely slow free, around 2 minutes
| for large files, because of unnaturally scattered allocation
| patterns, but this old conservative GC didn't help at all. It was
| about 40% slower with libgc. mimalloc was a bit better. Best
| would be an arena allocator without free, or at least a properly
| fast GC, like mps https://github.com/Ravenbrook/mps, but this
| would be too much work.
| CyberDildonics wrote:
| If you have a program that takes 2 minutes to free allocations,
| that implies 100 million to 1 billion small allocations. An
| arena is not the answer, a normal data structure is (most
| likely a vector). I'm not sure an arena is ever really the
| answer, because it seems like a poor workaround for multiple
| small allocations, which shouldn't happen in the first place.
|
| Maybe that time is coming from something even more ridiculous
| like scanning an entire linked list to find each element, but
| if you can open up that library there are without a doubt
| terrible memory allocation techniques that should never happen
| going on.
| Jeaye wrote:
| I have been using the Boehm-Demers-Weiser GC for jank [1], the
| native Clojure dialect on LLVM with C++ interop. Since jank is a
| Clojure dialect, and Clojure is built primarily on persistent,
| immutable data structures, there are potentially a lot of
| references to objects, potentially cyclical, across many threads,
| and there's a lot of garbage being churned. I originally started
| with reference counting and RAII, using boost::intrusive_ptr and
| an atomic count. The GC was actually 2x faster, in the sequence
| benchmark I was trying to optimize.
|
| At this point, jank is generally several times faster than the
| equivalent Clojure JVM code. I'm sure there are better GCs out
| there, in terms of performance, and I have my eye on MMTK [2] for
| a future upgrade, but the fact remains: the Boehm GC is stupid
| simple to integrate and it's surprisingly fast. Compare
| it to MPS, MMTK, and others and both the documentation and the
| actual dev work required are worlds apart.
|
| For a project which needs a GC but doesn't need to pick the
| absolute fastest one first, it seems like the best option, based
| on my research and my experience.
|
| 1: https://jank-lang.org/
|
| 2: https://www.mmtk.io/code
| pharmakom wrote:
| Immutable structures cannot be cyclical. Maybe I'm missing
| something?
| nlitened wrote:
| In Clojure, the data structures may usually contain
| references to mutable cells, which in turn may form cycles.
| Jeaye wrote:
| Exactly as nlitened mentioned, Clojure has a few different
| reference types, which are boxes that hold pointers to
| immutable values. The values within those boxes can be
| changed transactionally. This allows for cyclical references.
| mtlmtlmtlmtl wrote:
| Just want to say, I'd never heard of jank before and it looks
| awesome. Great work!
|
| As more of a CL guy who's dabbled in Clojure but found it a bit
| too slow (but conversely I also miss Clojure's data
| structures and transactional state in CL; multithreaded
| programming really isn't CL's strong suit, for obvious
| historical reasons), this might reinvigorate me. I may sit down
| one day soon and churn out some AOC or something in Jank just
| to get a feel.
|
| Cheers :)
| Jeaye wrote:
| The shameless plugging pays off! One more HN reader
| converted. :D
|
| Check it out, join our Slack, and stay tuned. jank's still
| under heavy development right now, and not ready for AOC this
| year, but it's getting close. The blog is the best way to
| watch the progress: https://jank-lang.org/blog/
|
| Thanks for taking the time to comment!
| dang wrote:
| Related. Others?
|
| _A Garbage Collector for C and C++_ -
| https://news.ycombinator.com/item?id=9862135 - July 2015 (73
| comments)
| gregfjohnson wrote:
| This thread brings to mind an issue I've wondered about: For hard
| real-time systems, do people use specialized CPU's and hardware?
| Caches can introduce hard-to-predict timing issues. I'm sure
| there are other aspects of modern hardware that present similar
| problems; large pipelines? Speculative instruction execution?
| Does Intel Time-Coordinated Computing address some of these
| hardware issues? Are there embedded CPU's from TI, Intel, AMD,
| etc. that are specifically designed to support predictable and
| understandable "lockstep" hard real-time time behavior?
| deschutes wrote:
| It's an interesting question and one I'm unqualified to answer.
| From a brief survey I gather at least caching is enabled for
| some hard real time systems. The name of the game seems to be
| employing caches in ways that provably decrease worst case
| execution time. For example, ignoring associativity, if your
| task's worst case working set fits into a given cache level,
| you can reason that the task misses each line at most once.
| That's almost certainly a dramatic improvement in wcet over
| cache disabled.
|
| Worst case execution time with pipelining should be similarly
| measurable given a worst case execution (which seems in any
| case necessary to demonstrate a bound on the task, even in a
| lockstep system).
|
| See https://cs-
| people.bu.edu/rmancuso/files/papers/cache_survey_...
| HarHarVeryFunny wrote:
| I guess garbage collection has its uses, but they're very niche.
| Since C++11 (i.e. 2011: STL, unique_ptr, shared_ptr), any need
| for it in C++ has all but disappeared. For the most part you are
| using STL data structures, and if you need something custom then
| just use a class with a smart pointer member owner, and you get
| memory management for free (and a lot more performant than a
| garbage collector).
| Rochus wrote:
| There are a lot of cases where unique and shared pointers are
| not a good solution, and where a GC would help (e.g. when there
| no clear ownership hierarchy but a lot of mutual dependencies
| between objects). And not to forget that smart pointers come
| with a non-negligible cost.
| Animats wrote:
| > In our experience, the only examples we have found of a failure
| with the current collector, even in multi-threaded code, were
| contrived.
|
| I once found a real incompatibility. This collector assumes that a
| pointer to a block must point to somewhere in the block, or, in
| later versions, at least somewhere really close.
|
| The original Numerical Recipes in C was a conversion of Numerical
| Recipes in FORTRAN. FORTRAN arrays start at 1. So, NR in C had an
| array allocator which returned an array address computed such
| that subscripts starting at 1 would work.[1] If you allocated a
| 100x100 array of 4-byte f32 values, the array address would be
| offset downward by (100+1)*4 bytes. That would not look like a
| pointer to the array to the garbage collector.
|
| By the third edition, in 2007, the Numerical Recipes people had
| converted to C++, dropped FORTRAN array compatibility, and the
| problem went away.
|
| [1] http://s3.amazonaws.com/nrbook.com/book_C210.html page 19.
| einpoklum wrote:
| Related C++ Stackoverflow question/discussion:
|
| "Why Doesn't C++ have a garbage collector?"
| https://stackoverflow.com/q/147130/1593077
|
| and my answer: https://stackoverflow.com/a/48046118/1593077
|
| ... which develops Bjarne's classic quote:
|
| "I don't like garbage. I don't like littering. My ideal is to
| eliminate the need for a garbage collector by not producing any
| garbage. That is now possible."
___________________________________________________________________
(page generated 2024-01-21 23:00 UTC)