[HN Gopher] Garbage collected smart pointers in Rust via concurr...
       ___________________________________________________________________
        
       Garbage collected smart pointers in Rust via concurrent cycle
       collection
        
       Author : maplant
       Score  : 94 points
       Date   : 2024-12-13 17:48 UTC (5 hours ago)
        
 (HTM) web link (maplant.com)
 (TXT) w3m dump (maplant.com)
        
       | muizelaar wrote:
       | An older non concurrent cycle collector is here:
       | https://github.com/fitzgen/bacon-rajan-cc
        
         | zozbot234 wrote:
         | A slightly newer attempt which _is_ concurrent - in fact, it
         | aims to implement the algorithm discussed in OP, though there
         | may be some subtle differences in implementation:
         | https://github.com/chc4/samsara
        
       | iwontberude wrote:
       | I thought one of major points of pride for Rust was eschewing
       | garbage collectors. Why don't I understand?
        
         | connicpu wrote:
         | Microsoft also made a version of C++ that integrated with the
         | .NET garbage collector. Just because it doesn't fit with how
         | most people use the language doesn't mean someone won't find it
         | useful. Not like this will ever become part of Rust's standard
         | library anyways.
        
           | zozbot234 wrote:
           | Why shouldn't a high-performance concurrent GC implementation
           | enter the standard library? I think it might be quite useful
           | if it did, for the rare use case where possibly-cyclical
           | graph structures make sense and Arc<> is not quite enough.
        
             | umanwizard wrote:
             | The Rust philosophy is to add as little as possible to the
             | standard library, for many reasons, but particularly
             | because otherwise you have to update your compiler to get
             | library updates (which also means long release cycles).
             | 
             | This is why there are various "de facto standard" crates
             | for basic things like regexes or error handling that would
             | be built-in in other languages.
        
             | Yoric wrote:
             | This feels unlikely. Rust is trying very, very hard to not
             | have a runtime [1], by opposition to Go (which ships the
             | runtime in every executable) or Java, .Net, JS, Python...
             | (which require the runtime to be installed separately).
             | That's why you need tokio (or some other executor) to
             | execute async code, for instance.
             | 
             | The benefit is that it makes Rust much easier to maintain
             | and port to new architectures, and much more polyvalent,
             | e.g. you can use Rust on true embedded, by opposition to
             | most other modern languages, and presumably more future-
             | proof. The drawback is that you, as a developer, need to
             | pick a runtime.
             | 
             | [1] In truth, there's still a runtime, but it's really
             | tiny.
        
               | zozbot234 wrote:
               | You make a good point, but what's discussed in the OP
               | seems to be less complex than a full async runtime, e.g.
               | it does not really depend on low-level OS features. So it
               | seems to be the kind of thing that might become part of
               | the standard library at some point, much like the Rust
               | HashMap. You would still be able to avoid it altogether
               | by just not using Gc<> objects (you "don't pay for what
               | you don't use") so it wouldn't be a true runtime.
        
               | db48x wrote:
               | As already stated, one of Rust's explicit goals is not to
               | have a runtime of any kind. If your program happens to
               | require a garbage collector, and you want to write it in
               | Rust, then you can simply depend on a garbage collector
               | provided by a crate. No need to have it built in when 90%
               | of programs don't need it. And since you can just include
               | one when you do need it, there can easily be multiple
               | options to choose from. As engineering is all about
               | managing tradeoffs, you want to be able to choose the
               | crate that best matches your particular needs. A single
               | option buried in the standard library and forced upon you
               | may not be the best choice.
               | 
               | That said, most programs should not be written in Rust.
               | They should be written in a language that, like Rust, is
               | memory safe, but it is best if they are written in a
               | language which makes the developers more productive than
               | Rust does. Languages with garbage collectors built in are
               | at the top of that list. Common Lisp, Javascript, Python,
               | Raku, etc; there are plenty of choices. These languages
               | are ideal for exploration and productivity. Even Java or
               | C#, if you can stomach it. Save Rust for later when you
               | have to squeeze every drop of performance out of your
               | program, you've already written it once so that you
               | already know exactly what you need, and you can justify
               | the expense of the rewrite using operational savings.
               | Optimal performance will probably happen once you can
               | drop the garbage collection entirely, and Rust truly
               | shines when used that way.
        
               | zozbot234 wrote:
               | > it is best if they are written in a language which
               | makes the developers more productive than Rust does.
               | 
               | AIUI, recent assessments (including by Google) have found
               | that Rust developers' productivity is competitive with GC
               | languages such as Go.
               | 
               | There's also plenty of reasons to think that a program
               | that's first written in Rust will have fewer defects in
               | the long run than the other languages you mentioned. Of
               | course alternative languages may still have a role to
               | play for quick-and-dirty, purely exploratory coding, but
               | that's about it.
        
               | db48x wrote:
               | True, but it is my understanding that in those cases they
               | were reimplementing existing services, where the precise
               | requirements are already known. Exploration isn't for
               | quick-and-dirty programs, but for finding the right
               | solution in a wide space of possibilities where you don't
               | even know what all the requirements are yet.
               | 
               | For example, consider programming a game. If your only
               | concern is putting triangles on the screen, then there
               | are some well-known solutions that you can adopt that
               | will work great and you can do that in any language you
               | want. But gameplay mechanics are a different beast
               | entirely; inventing (or discovering) something new and
               | fun is exploratory. You'll benefit most from a dynamic
               | language where you can change your code rapidly, and
               | features like strict typing and lifetimes are not so
               | helpful. For this a language like Common Lisp, with
               | garbage collection, a REPL, gradual typing, just-in-time
               | compilation, and support for multiple programming
               | paradigms is ideal. Or Javascript, or whatever.
        
               | throwup238 wrote:
               | I think it's less about the runtime and more about
               | nailing down and stabilizing the implementation details.
               | GCs have a lot of subtle trade offs just like async
               | runtimes (i.e. like the decision to require Send + Sync
               | versus single threaded Futures) where there's no obvious
               | sane default. There's too much variation in what the
               | community needs from a GC compared to something like a
               | HashMap where a general implementation works fine with
               | options to change the hasher or allocator covering >90%
               | of use cases.
        
               | zozbot234 wrote:
               | I think most async runtimes give you the option of
               | launching a single-threaded executor when you explicitly
               | request it.
               | 
               | AIUI the most esoteric tradeoffs wrt. GC involve how to
               | have data representations that can be efficiently traced,
               | and whether you should allow e.g. moving data in memory
               | as part of the collection phase. But I think you're
               | overestimating somewhat the variation within the _Rust-
               | focused_ community - given what most Rust code looks
               | like, they 're going to be mostly interested in something
               | straightforward yet efficient, that can interoperate with
               | existing Rust code w/ relative ease while addressing the
               | common issue of cyclical data references.
        
               | tick_tock_tick wrote:
               | > The benefit is that it makes Rust much easier to
               | maintain and port to new architectures, and much more
               | polyvalent, e.g. you can use Rust on true embedded
               | 
               | "supposedly" Rust's architecture support for all but the
               | most popular archs is horrendous.
        
               | cogman10 wrote:
               | Well, to be frank, the unpopular archs aren't really well
               | supported by other languages.
               | 
               | You may be thinking "What about C?" and the answer is
               | that for those unpopular archs you are almost certainly
               | looking at a hand patched version of GCC (3.2...
               | probably) created by the vendor to support C.
               | 
               | You generally won't see good support for unpopular archs
               | mainlined in compilers.
        
               | zozbot234 wrote:
               | > "supposedly" Rust's architecture support for all but
               | the most popular archs is horrendous.
               | 
               | Unpopular architectures are a bit of a pain. But what if
               | you compiled from Rust to WASM and then transpiled the
               | WASM object code to C? Is this a viable approach?
        
             | bbatha wrote:
             | To add on to the siblings one of the major motivating use
             | cases for rust is to interact with another GC, originally
             | the JS runtime GC for the servo browser. An on-by-default
             | one would interfere with that usecase heavily. What I could
             | see and there have been some half-hearted efforts in this
             | direction is to standardize on a `Trace` trait so that
             | users who want to bring their own GC can do so safely and
             | can allocate std lib objects into the GC heap without
             | concern.
        
               | zozbot234 wrote:
               | I guess where I disagree is wrt. viewing something like
               | the OP's implementation as something that's "on by
               | default". To me it seems to be pretty much the opposite
               | of that. You're right though that a standard Trace trait
               | might be interesting.
        
         | jerf wrote:
         | I think it would be more accurate that Rust doesn't want to
         | _need_ a garbage collector. Many more recent languages have
         | chosen that as a default from day one. (And others haven 't,
         | but it remains a popular option.)
         | 
         | It has always been an option if you want it, though.
         | 
         | Nice thing about it in Rust too is that you can get the benefit
         | of the garbage collector, but only and exactly where you want
         | it, rather than it being used for the entire language.
        
         | nu11ptr wrote:
         | Rust doesn't want to depend on a garbage collector. I'm not
         | aware of anyone being against garbage collection on an opt-in
         | basis. That said, the article represents a GC type specifically
         | designed for the author's scheme variant (implemented in Rust).
        
         | maplant wrote:
         | This is in the context of implementing a smart pointer that can
         | be used within a interpreter for Scheme
        
         | rafaelferreira wrote:
         | The OP is implementing a scheme interpreter in Rust.
        
         | umanwizard wrote:
         | Your question is reasonable; I don't think you should have been
         | downvoted.
         | 
         | That's indeed one of the major advantages of Rust but it's not
         | the only reason people like the language. Just to give an
         | example, you can write concurrent code without ever worrying
         | that you accidentally failed to protect something with a mutex;
         | whether some code is thread-safe or not is encoded in the type
         | system and violating thread safety is a compiler error. This is
         | a huge advantage that has little to do with lack of GC.
        
           | throwaway81523 wrote:
           | I wonder if there is an STM (software transactional memory)
           | implementation that uses the borrow checker that way, similar
           | to how Haskell's STM monad uses Haskell's type system to
           | ensure nothing leaks.
        
         | amelius wrote:
         | Rust copied algebraic data types from ML style languages, yet
         | programming in a ML style is pretty bad in Rust because there
         | is no garbage collector (languages like Haskell would be
         | completely unusable without garbage collector). This could fix
         | that.
        
           | bcrosby95 wrote:
           | Considering not needing a garbage collector is a goal of
           | Rust, if programming in an ML style without a garbage
           | collector is bad then it sounds like Rust picked the wrong
           | style.
        
             | gf000 wrote:
             | I would specify grandparent's claim a bit more - certain
             | kind of code, frequently used in ML style without a GC is
             | hard/impossible.
             | 
             | E.g. heavily recursive data structures may not be a best
             | fit (also not the best fit for the CPU).
             | 
             | Rust has definitely grown out its own style, which is more
             | of a "strongly typed FP with local-scoped imperative
             | parts", elegantly combining ML's strengths with C/C++-style
             | performant patterns.
        
               | throwaway81523 wrote:
               | Haskell is a pure FP language with what amounts to
               | imperative EDSL's in the form of the ST, STM, and IO
               | monads. Haskell has its own issues, but is that approach
               | of an FP outer layer with embedded imperative sections
               | such a bad approach? If you wanted to not have a GC, you
               | could use just the imperative fragment.
        
               | zozbot234 wrote:
               | > If you wanted to not have a GC, you could use just the
               | imperative fragment.
               | 
               | I think it would be quite hard in practice to make
               | Haskell not dependent on a GC. What Haskell is most
               | obviously lacking at present is a more elegant story of
               | how to interface "strict" and "lazy" parts of a single
               | program. PLT researchers and logicians have been
               | exploring these questions for some time, coming up with
               | notions such as "polarity" and "focusing", views of some
               | types as "naturally strict" (such as tagged unions),
               | others as "naturally lazy" (such as functions) with still
               | others coming in both strict and lazy varieties
               | ("product" or record types), and elementary operators to
               | shift between "strict" and "lazy" uses of any type - but
               | the practical implications of these concepts in real-
               | world programming language are still unclear. Haskell
               | would make an ideal testing ground as such.
        
               | amelius wrote:
               | I think what they mean is that you can program without a
               | GC in a GC'd language, but it is much harder to do it the
               | other way around.
        
               | zozbot234 wrote:
               | I think it's just not feasible to totally avoid the GC in
               | a language such as Haskell. It's inherently involved in
               | anything that would require heap-allocated data in C/C++,
               | and more besides. For example, Haskell does not have a
               | borrow checker, so it needs the GC to ensure that
               | closures' environments will stick around for as long as
               | they're needed. (The Rust approach is the opposite by
               | default - the compiler errors out whenever a closure will
               | outlive the established lifetime of its referenced
               | environment. You can always use Rc<> or Arc<> to have
               | multiple owners that will all keep a program object alive
               | as needed, but it's not the default.)
        
               | hollerith wrote:
               | I tried writing simple imperative programs in GHC using
               | the IO and IORef types, and the programs ran about 1000
               | times slower than analogous programs in C.
        
           | dboreham wrote:
           | Hmm. IANALL but aren't types orthogonal to allocation?
        
           | estebank wrote:
           | I personally believe that things like automatically calling
           | `PureDeref::pure_deref` (or similar) on patterns (similar to
           | the apply/unapply methods in Scala) would be a better way to
           | improve that experience.
        
           | bjoli wrote:
           | Did they ever get siblings tail recursion? I remember using
           | it, but since it only worked in release mode I could never
           | rely on it.
           | 
           | I found "loop" pretty horrible.
        
         | bluGill wrote:
         | When smart pointers first came to C++ Herb Sutter gave a take
         | where he claimed that 80% (numbers from memory as I don't want
         | to rewatch several hours of cppcon videos to find the real
         | claim) of the time you should unique_ptr, 15% shared_ptr - but
         | that last 5% you need a mark and sweep garbage collector. I'm
         | not a rust guy (I've only done hello world), but my
         | understanding is it only gives you unique pointers - I've done
         | enough C++ to state that over the base decade unique_ptr has
         | always been enough for everything, though sometimes I've needed
         | some thought to make it work in weird situations, while
         | shared_ptr would have just worked.
         | 
         | There are known real world data structures that are sometimes
         | useful (though I've never encountered a need to use them) where
         | because you have circular references to things you have to use
         | a mark and sweep garbage collector. If you happen to need such
         | a thing in your Rust code than this will be useful.
        
           | zozbot234 wrote:
           | Rust does give you reference-counted shared pointers, both
           | within a single thread (Rc<>) and across multiple threads
           | (Arc<>). But reference counting does not properly clean up
           | objects in the presence of reference cycles: that's when
           | proper tracing GC may be required.
        
             | Yoric wrote:
             | To complete your post and the parent: most Rust
             | applications don't need GC. Lifetimes, Box and Rc or Arc
             | will generally amply cover all your needs.
             | 
             | For a few applications, though (e.g. a JavaScript VM, or
             | something that deals with complex graph structures with
             | unpredictable lifetimes), a GC is a lifesaver. Even in
             | these applications, most of the data structures won't need
             | GC pointers. Rust has a few GCs at hand, and lets you opt-
             | in to GC specific pointers. In theory, this means that the
             | GC can be extremely fast, because the graphs it needs to
             | walk will be much smaller than in languages with a general-
             | purpose GC. The cost, of course, is that you, as a
             | developer, need to pay attention to what needs a GC and
             | what doesn't.
        
               | wongarsu wrote:
               | > The cost, of course, is that you, as a developer, need
               | to pay attention to what needs a GC and what doesn't.
               | 
               | But even there, lifetimes make your life easier. If
               | lifetimes work out on their own you know the object will
               | be dropped at some point (at the end of the lifetime).
               | Only when you opt out of unique_ptr-like behavior you
               | have to consider whether Rc/Arc will work (90% of the
               | cases), will work with some special care by using some
               | weak references, or whether you will require a GC
        
             | gf000 wrote:
             | Tracing GC has much better throughput than reference
             | counting GCs (yeah, they both are GC algorithms),
             | especially when compared against Arc<>, which would be
             | quite expensive if many objects were making use of it, due
             | to the synchronization overhead. (Note: I'm not saying that
             | Rust would be slow or anything, as you can precisely go as
             | low-level as you wish and circumvent the price of any form
             | of GC. But if you were to have a lot of Arc objects around,
             | e.g. you use them to represent stuff the user can also
             | manipulate and thus have completely opaque lifetimes, then
             | the overhead can become non-negligible)
             | 
             | Tracing GCs can do most of the work completely
             | independently of the user code, and in parallel. Also, only
             | live objects are touched, which can be also beneficial for
             | certain kinds of applications.
        
               | zozbot234 wrote:
               | > Note: I'm not saying that Rust would be slow or
               | anything, as you can precisely go as low-level as you
               | wish and circumvent the price of any form of GC.
               | 
               | Rust can also circumvent some of the overhead of Arc<> as
               | it can update the reference count whenever a true "owner"
               | is added or removed, as opposed to just any reference to
               | the object. But yes, you're right that relying on
               | automated reference counting as the _default_ memory
               | management strategy involves some overhead, and tracing
               | GC may be preferable to that. This is a big issue wrt /
               | Swift.
        
           | loeg wrote:
           | > I'm not a rust guy (I've only done hello world), but my
           | understanding is it only gives you unique pointers
           | 
           | Some very rough analogies:                 C++        : Rust
           | unique_ptr : Box       shared_ptr : Rc or Arc
           | references : references
        
       | zozbot234 wrote:
       | Why does this need to rely on `static` variables in the
       | implementation? Can we use QCell-like lifetime branding tricks to
       | have multiple independent GC-reference graphs that are assured at
       | compile time to always be disjoint, and might undergo collection
       | by separate threads?
        
         | maplant wrote:
         | This relies on static because the mutation buffer could have
         | types last for the lifetime of the program (imagine if the
         | collector never runs). I am unfamiliar with what you are
         | describing so I cannot answer your second question.
        
           | zozbot234 wrote:
           | "Types lasting for the lifetime of the program" is accounted
           | for by 'static lifetime. I am referring to the use of
           | _static_ and _static mut_ declarations in the code, which is
           | not quite the same thing. The interesting question is whether
           | it might be possible to have multiple "collection" threads
           | within a single instance of the program, each working on
           | independent graphs.
        
             | maplant wrote:
             | Sorry, I misread your question. I'm sure it's possible but
             | this seemed like a good enough implementation for what I
             | needed.
        
           | bjoli wrote:
           | Sorry for hijacking the thread, but: I was having a look at
           | scheme-rs and read in the readme that you support syntax-
           | case. This somewhat surprised me as the lower level macro
           | facilities are the ugly underbelly of otherwise
           | straightforward coding. More than once I have seen people
           | skip syntax-case entirely and just go for er/ir-renaming
           | macros instead.
           | 
           | I found the syntax-rules implementation but didn't really see
           | any part of syntax-case. Did you mean you support syntax-
           | transformers in general (together with what looks like a very
           | nice syntax-rules implementation), and not syntax-case in
           | particular?
        
             | maplant wrote:
             | Ah, I meant syntax rules, but syntax case is supported -
             | there's just no way to create syntax objects so it's
             | practically not useful.
             | 
             | My email is on my website, if you have any other questions
             | feel free to shoot me an email.
        
         | mmastrac wrote:
         | Branded lifetimes are painful in practice as they infect
         | _everything_ and often make it impossible to interop with other
         | things.
         | 
         | I don't think you'd find a GC library to be useful to most
         | developers if it required that. For those who might need it, it
         | can be useful but for everyone else you're just paying for
         | unnecessary pain.
        
       | thesave wrote:
       | Since it seems on topic, I link a related project by an
       | undergraduate at the University of Bologna
       | https://saveriogiallorenzo.com/publications/sac2025a/sac2025...
       | 
       | Essentially, it's a refinement of Bacon-Rajan's cycle collector
       | (sequential) that does not require auxiliary heap memory and
       | handles failures when tracing complex object graphs because it
       | uses a breadth-first technique that fundamentally prevents stack
       | overflow scenarios.
       | 
       | What's particularly compelling is the Rust implementation, which
       | weaves the type system and borrow checker into the algorithm's
       | design. When dealing with garbage cycles, the algorithm doesn't
       | just match current Rust GC alternatives, it outperforms them.
        
       | lilyball wrote:
       | Nitpick: In section 3.3, for the Vec impl, instead of finalizing
       | a child and then forgetting it, the child should be moved into a
       | ManuallyDrop first. That way if the finalization panics it
       | doesn't try to drop the child again while unwinding.
        
       ___________________________________________________________________
       (page generated 2024-12-13 23:00 UTC)