[HN Gopher] Garbage collected smart pointers in Rust via concurr...
___________________________________________________________________
Garbage collected smart pointers in Rust via concurrent cycle
collection
Author : maplant
Score : 94 points
Date : 2024-12-13 17:48 UTC (5 hours ago)
(HTM) web link (maplant.com)
(TXT) w3m dump (maplant.com)
| muizelaar wrote:
| An older non concurrent cycle collector is here:
| https://github.com/fitzgen/bacon-rajan-cc
| zozbot234 wrote:
| A slightly newer attempt which _is_ concurrent - in fact, it
| aims to implement the algorithm discussed in OP, though there
| may be some subtle differences in implementation:
| https://github.com/chc4/samsara
| iwontberude wrote:
| I thought one of major points of pride for Rust was eschewing
| garbage collectors. Why don't I understand?
| connicpu wrote:
| Microsoft also made a version of C++ that integrated with the
| .NET garbage collector. Just because it doesn't fit with how
| most people use the language doesn't mean someone won't find it
| useful. Not like this will ever become part of Rust's standard
| library anyways.
| zozbot234 wrote:
| Why shouldn't a high-performance concurrent GC implementation
| enter the standard library? I think it might be quite useful
| if it did, for the rare use case where possibly-cyclical
| graph structures make sense and Arc<> is not quite enough.
| umanwizard wrote:
| The Rust philosophy is to add as little as possible to the
| standard library, for many reasons, but particularly
| because otherwise you have to update your compiler to get
| library updates (which also means long release cycles).
|
| This is why there are various "de facto standard" crates
| for basic things like regexes or error handling that would
| be built-in in other languages.
| Yoric wrote:
| This feels unlikely. Rust is trying very, very hard to not
| have a runtime [1], by opposition to Go (which ships the
| runtime in every executable) or Java, .Net, JS, Python...
| (which require the runtime to be installed separately).
| That's why you need tokio (or some other executor) to
| execute async code, for instance.
|
| The benefit is that it makes Rust much easier to maintain
| and port to new architectures, and much more polyvalent,
| e.g. you can use Rust on true embedded, by opposition to
| most other modern languages, and presumably more future-
| proof. The drawback is that you, as a developer, need to
| pick a runtime.
|
| [1] In truth, there's still a runtime, but it's really
| tiny.
| zozbot234 wrote:
| You make a good point, but what's discussed in the OP
| seems to be less complex than a full async runtime, e.g.
| it does not really depend on low-level OS features. So it
| seems to be the kind of thing that might become part of
| the standard library at some point, much like the Rust
| HashMap. You would still be able to avoid it altogether
| by just not using Gc<> objects (you "don't pay for what
| you don't use") so it wouldn't be a true runtime.
| db48x wrote:
| As already stated, one of Rust's explicit goals is not to
| have a runtime of any kind. If your program happens to
| require a garbage collector, and you want to write it in
| Rust, then you can simply depend on a garbage collector
| provided by a crate. No need to have it built in when 90%
| of programs don't need it. And since you can just include
| one when you do need it, there can easily be multiple
| options to choose from. As engineering is all about
| managing tradeoffs, you want to be able to choose the
| crate that best matches your particular needs. A single
| option buried in the standard library and forced upon you
| may not be the best choice.
|
| That said, most programs should not be written in Rust.
| They should be written in a language that, like Rust, is
| memory safe, but it is best if they are written in a
| language which makes the developers more productive than
| Rust does. Languages with garbage collectors built in are
| at the top of that list. Common Lisp, Javascript, Python,
| Raku, etc; there are plenty of choices. These languages
| are ideal for exploration and productivity. Even Java or
| C#, if you can stomach it. Save Rust for later when you
| have to squeeze every drop of performance out of your
| program, you've already written it once so that you
| already know exactly what you need, and you can justify
| the expense of the rewrite using operational savings.
| Optimal performance will probably happen once you can
| drop the garbage collection entirely, and Rust truly
| shines when used that way.
| zozbot234 wrote:
| > it is best if they are written in a language which
| makes the developers more productive than Rust does.
|
| AIUI, recent assessments (including by Google) have found
| that Rust developers' productivity is competitive with GC
| languages such as Go.
|
| There's also plenty of reasons to think that a program
| that's first written in Rust will have fewer defects in
| the long run than the other languages you mentioned. Of
| course alternative languages may still have a role to
| play for quick-and-dirty, purely exploratory coding, but
| that's about it.
| db48x wrote:
| True, but it is my understanding that in those cases they
| were reimplementing existing services, where the precise
| requirements are already known. Exploration isn't for
| quick-and-dirty programs, but for finding the right
| solution in a wide space of possibilities where you don't
| even know what all the requirements are yet.
|
| For example, consider programming a game. If your only
| concern is putting triangles on the screen, then there
| are some well-known solutions that you can adopt that
| will work great and you can do that in any language you
| want. But gameplay mechanics are a different beast
| entirely; inventing (or discovering) something new and
| fun is exploratory. You'll benefit most from a dynamic
| language where you can change your code rapidly, and
| features like strict typing and lifetimes are not so
| helpful. For this a language like Common Lisp, with
| garbage collection, a REPL, gradual typing, just-in-time
| compilation, and support for multiple programming
| paradigms is ideal. Or Javascript, or whatever.
| throwup238 wrote:
| I think it's less about the runtime and more about
| nailing down and stabilizing the implementation details.
| GCs have a lot of subtle trade offs just like async
| runtimes (i.e. like the decision to require Send + Sync
| versus single threaded Futures) where there's no obvious
| sane default. There's too much variation in what the
| community needs from a GC compared to something like a
| HashMap where a general implementation works fine with
| options to change the hasher or allocator covering >90%
| of use cases.
| zozbot234 wrote:
| I think most async runtimes give you the option of
| launching a single-threaded executor when you explicitly
| request it.
|
| AIUI the most esoteric tradeoffs wrt. GC involve how to
| have data representations that can be efficiently traced,
| and whether you should allow e.g. moving data in memory
| as part of the collection phase. But I think you're
| overestimating somewhat the variation within the _Rust-
| focused_ community - given what most Rust code looks
| like, they 're going to be mostly interested in something
| straightforward yet efficient, that can interoperate with
| existing Rust code w/ relative ease while addressing the
| common issue of cyclical data references.
| tick_tock_tick wrote:
| > The benefit is that it makes Rust much easier to
| maintain and port to new architectures, and much more
| polyvalent, e.g. you can use Rust on true embedded
|
| "supposedly" Rust's architecture support for all but the
| most popular archs is horrendous.
| cogman10 wrote:
| Well, to be frank, the unpopular archs aren't really well
| supported by other languages.
|
| You may be thinking "What about C?" and the answer is
| that for those unpopular archs you are almost certainly
| looking at a hand patched version of GCC (3.2...
| probably) created by the vendor to support C.
|
| You generally won't see good support for unpopular archs
| mainlined in compilers.
| zozbot234 wrote:
| > "supposedly" Rust's architecture support for all but
| the most popular archs is horrendous.
|
| Unpopular architectures are a bit of a pain. But what if
| you compiled from Rust to WASM and then transpiled the
| WASM object code to C? Is this a viable approach?
| bbatha wrote:
| To add on to the siblings one of the major motivating use
| cases for rust is to interact with another GC, originally
| the JS runtime GC for the servo browser. An on-by-default
| one would interfere with that usecase heavily. What I could
| see and there have been some half-hearted efforts in this
| direction is to standardize on a `Trace` trait so that
| users who want to bring their own GC can do so safely and
| can allocate std lib objects into the GC heap without
| concern.
| zozbot234 wrote:
| I guess where I disagree is wrt. viewing something like
| the OP's implementation as something that's "on by
| default". To me it seems to be pretty much the opposite
| of that. You're right though that a standard Trace trait
| might be interesting.
| jerf wrote:
| I think it would be more accurate that Rust doesn't want to
| _need_ a garbage collector. Many more recent languages have
| chosen that as a default from day one. (And others haven 't,
| but it remains a popular option.)
|
| It has always been an option if you want it, though.
|
| Nice thing about it in Rust too is that you can get the benefit
| of the garbage collector, but only and exactly where you want
| it, rather than it being used for the entire language.
| nu11ptr wrote:
| Rust doesn't want to depend on a garbage collector. I'm not
| aware of anyone being against garbage collection on an opt-in
| basis. That said, the article represents a GC type specifically
| designed for the author's scheme variant (implemented in Rust).
| maplant wrote:
| This is in the context of implementing a smart pointer that can
| be used within a interpreter for Scheme
| rafaelferreira wrote:
| The OP is implementing a scheme interpreter in Rust.
| umanwizard wrote:
| Your question is reasonable; I don't think you should have been
| downvoted.
|
| That's indeed one of the major advantages of Rust but it's not
| the only reason people like the language. Just to give an
| example, you can write concurrent code without ever worrying
| that you accidentally failed to protect something with a mutex;
| whether some code is thread-safe or not is encoded in the type
| system and violating thread safety is a compiler error. This is
| a huge advantage that has little to do with lack of GC.
| throwaway81523 wrote:
| I wonder if there is an STM (software transactional memory)
| implementation that uses the borrow checker that way, similar
| to how Haskell's STM monad uses Haskell's type system to
| ensure nothing leaks.
| amelius wrote:
| Rust copied algebraic data types from ML style languages, yet
| programming in a ML style is pretty bad in Rust because there
| is no garbage collector (languages like Haskell would be
| completely unusable without garbage collector). This could fix
| that.
| bcrosby95 wrote:
| Considering not needing a garbage collector is a goal of
| Rust, if programming in an ML style without a garbage
| collector is bad then it sounds like Rust picked the wrong
| style.
| gf000 wrote:
| I would specify grandparent's claim a bit more - certain
| kind of code, frequently used in ML style without a GC is
| hard/impossible.
|
| E.g. heavily recursive data structures may not be a best
| fit (also not the best fit for the CPU).
|
| Rust has definitely grown out its own style, which is more
| of a "strongly typed FP with local-scoped imperative
| parts", elegantly combining ML's strengths with C/C++-style
| performant patterns.
| throwaway81523 wrote:
| Haskell is a pure FP language with what amounts to
| imperative EDSL's in the form of the ST, STM, and IO
| monads. Haskell has its own issues, but is that approach
| of an FP outer layer with embedded imperative sections
| such a bad approach? If you wanted to not have a GC, you
| could use just the imperative fragment.
| zozbot234 wrote:
| > If you wanted to not have a GC, you could use just the
| imperative fragment.
|
| I think it would be quite hard in practice to make
| Haskell not dependent on a GC. What Haskell is most
| obviously lacking at present is a more elegant story of
| how to interface "strict" and "lazy" parts of a single
| program. PLT researchers and logicians have been
| exploring these questions for some time, coming up with
| notions such as "polarity" and "focusing", views of some
| types as "naturally strict" (such as tagged unions),
| others as "naturally lazy" (such as functions) with still
| others coming in both strict and lazy varieties
| ("product" or record types), and elementary operators to
| shift between "strict" and "lazy" uses of any type - but
| the practical implications of these concepts in real-
| world programming language are still unclear. Haskell
| would make an ideal testing ground as such.
| amelius wrote:
| I think what they mean is that you can program without a
| GC in a GC'd language, but it is much harder to do it the
| other way around.
| zozbot234 wrote:
| I think it's just not feasible to totally avoid the GC in
| a language such as Haskell. It's inherently involved in
| anything that would require heap-allocated data in C/C++,
| and more besides. For example, Haskell does not have a
| borrow checker, so it needs the GC to ensure that
| closures' environments will stick around for as long as
| they're needed. (The Rust approach is the opposite by
| default - the compiler errors out whenever a closure will
| outlive the established lifetime of its referenced
| environment. You can always use Rc<> or Arc<> to have
| multiple owners that will all keep a program object alive
| as needed, but it's not the default.)
| hollerith wrote:
| I tried writing simple imperative programs in GHC using
| the IO and IORef types, and the programs ran about 1000
| times slower than analogous programs in C.
| dboreham wrote:
| Hmm. IANALL but aren't types orthogonal to allocation?
| estebank wrote:
| I personally believe that things like automatically calling
| `PureDeref::pure_deref` (or similar) on patterns (similar to
| the apply/unapply methods in Scala) would be a better way to
| improve that experience.
| bjoli wrote:
| Did they ever get siblings tail recursion? I remember using
| it, but since it only worked in release mode I could never
| rely on it.
|
| I found "loop" pretty horrible.
| bluGill wrote:
| When smart pointers first came to C++ Herb Sutter gave a take
| where he claimed that 80% (numbers from memory as I don't want
| to rewatch several hours of cppcon videos to find the real
| claim) of the time you should unique_ptr, 15% shared_ptr - but
| that last 5% you need a mark and sweep garbage collector. I'm
| not a rust guy (I've only done hello world), but my
| understanding is it only gives you unique pointers - I've done
| enough C++ to state that over the base decade unique_ptr has
| always been enough for everything, though sometimes I've needed
| some thought to make it work in weird situations, while
| shared_ptr would have just worked.
|
| There are known real world data structures that are sometimes
| useful (though I've never encountered a need to use them) where
| because you have circular references to things you have to use
| a mark and sweep garbage collector. If you happen to need such
| a thing in your Rust code than this will be useful.
| zozbot234 wrote:
| Rust does give you reference-counted shared pointers, both
| within a single thread (Rc<>) and across multiple threads
| (Arc<>). But reference counting does not properly clean up
| objects in the presence of reference cycles: that's when
| proper tracing GC may be required.
| Yoric wrote:
| To complete your post and the parent: most Rust
| applications don't need GC. Lifetimes, Box and Rc or Arc
| will generally amply cover all your needs.
|
| For a few applications, though (e.g. a JavaScript VM, or
| something that deals with complex graph structures with
| unpredictable lifetimes), a GC is a lifesaver. Even in
| these applications, most of the data structures won't need
| GC pointers. Rust has a few GCs at hand, and lets you opt-
| in to GC specific pointers. In theory, this means that the
| GC can be extremely fast, because the graphs it needs to
| walk will be much smaller than in languages with a general-
| purpose GC. The cost, of course, is that you, as a
| developer, need to pay attention to what needs a GC and
| what doesn't.
| wongarsu wrote:
| > The cost, of course, is that you, as a developer, need
| to pay attention to what needs a GC and what doesn't.
|
| But even there, lifetimes make your life easier. If
| lifetimes work out on their own you know the object will
| be dropped at some point (at the end of the lifetime).
| Only when you opt out of unique_ptr-like behavior you
| have to consider whether Rc/Arc will work (90% of the
| cases), will work with some special care by using some
| weak references, or whether you will require a GC
| gf000 wrote:
| Tracing GC has much better throughput than reference
| counting GCs (yeah, they both are GC algorithms),
| especially when compared against Arc<>, which would be
| quite expensive if many objects were making use of it, due
| to the synchronization overhead. (Note: I'm not saying that
| Rust would be slow or anything, as you can precisely go as
| low-level as you wish and circumvent the price of any form
| of GC. But if you were to have a lot of Arc objects around,
| e.g. you use them to represent stuff the user can also
| manipulate and thus have completely opaque lifetimes, then
| the overhead can become non-negligible)
|
| Tracing GCs can do most of the work completely
| independently of the user code, and in parallel. Also, only
| live objects are touched, which can be also beneficial for
| certain kinds of applications.
| zozbot234 wrote:
| > Note: I'm not saying that Rust would be slow or
| anything, as you can precisely go as low-level as you
| wish and circumvent the price of any form of GC.
|
| Rust can also circumvent some of the overhead of Arc<> as
| it can update the reference count whenever a true "owner"
| is added or removed, as opposed to just any reference to
| the object. But yes, you're right that relying on
| automated reference counting as the _default_ memory
| management strategy involves some overhead, and tracing
| GC may be preferable to that. This is a big issue wrt /
| Swift.
| loeg wrote:
| > I'm not a rust guy (I've only done hello world), but my
| understanding is it only gives you unique pointers
|
| Some very rough analogies: C++ : Rust
| unique_ptr : Box shared_ptr : Rc or Arc
| references : references
| zozbot234 wrote:
| Why does this need to rely on `static` variables in the
| implementation? Can we use QCell-like lifetime branding tricks to
| have multiple independent GC-reference graphs that are assured at
| compile time to always be disjoint, and might undergo collection
| by separate threads?
| maplant wrote:
| This relies on static because the mutation buffer could have
| types last for the lifetime of the program (imagine if the
| collector never runs). I am unfamiliar with what you are
| describing so I cannot answer your second question.
| zozbot234 wrote:
| "Types lasting for the lifetime of the program" is accounted
| for by 'static lifetime. I am referring to the use of
| _static_ and _static mut_ declarations in the code, which is
| not quite the same thing. The interesting question is whether
| it might be possible to have multiple "collection" threads
| within a single instance of the program, each working on
| independent graphs.
| maplant wrote:
| Sorry, I misread your question. I'm sure it's possible but
| this seemed like a good enough implementation for what I
| needed.
| bjoli wrote:
| Sorry for hijacking the thread, but: I was having a look at
| scheme-rs and read in the readme that you support syntax-
| case. This somewhat surprised me as the lower level macro
| facilities are the ugly underbelly of otherwise
| straightforward coding. More than once I have seen people
| skip syntax-case entirely and just go for er/ir-renaming
| macros instead.
|
| I found the syntax-rules implementation but didn't really see
| any part of syntax-case. Did you mean you support syntax-
| transformers in general (together with what looks like a very
| nice syntax-rules implementation), and not syntax-case in
| particular?
| maplant wrote:
| Ah, I meant syntax rules, but syntax case is supported -
| there's just no way to create syntax objects so it's
| practically not useful.
|
| My email is on my website, if you have any other questions
| feel free to shoot me an email.
| mmastrac wrote:
| Branded lifetimes are painful in practice as they infect
| _everything_ and often make it impossible to interop with other
| things.
|
| I don't think you'd find a GC library to be useful to most
| developers if it required that. For those who might need it, it
| can be useful but for everyone else you're just paying for
| unnecessary pain.
| thesave wrote:
| Since it seems on topic, I link a related project by an
| undergraduate at the University of Bologna
| https://saveriogiallorenzo.com/publications/sac2025a/sac2025...
|
| Essentially, it's a refinement of Bacon-Rajan's cycle collector
| (sequential) that does not require auxiliary heap memory and
| handles failures when tracing complex object graphs because it
| uses a breadth-first technique that fundamentally prevents stack
| overflow scenarios.
|
| What's particularly compelling is the Rust implementation, which
| weaves the type system and borrow checker into the algorithm's
| design. When dealing with garbage cycles, the algorithm doesn't
| just match current Rust GC alternatives, it outperforms them.
| lilyball wrote:
| Nitpick: In section 3.3, for the Vec impl, instead of finalizing
| a child and then forgetting it, the child should be moved into a
| ManuallyDrop first. That way if the finalization panics it
| doesn't try to drop the child again while unwinding.
___________________________________________________________________
(page generated 2024-12-13 23:00 UTC)