[HN Gopher] I built a garbage collector for a language that does...
       ___________________________________________________________________
        
       I built a garbage collector for a language that doesn't need one
        
       Author : claytonwramsey
       Score  : 224 points
       Date   : 2023-08-14 13:48 UTC (9 hours ago)
        
 (HTM) web link (claytonwramsey.github.io)
 (TXT) w3m dump (claytonwramsey.github.io)
        
       | 3cats-in-a-coat wrote:
       | I think GC is a very cool set of algorithms that have their place
       | for collecting detached elements of a graph. But in a programming
       | language, the baseline memory model should not be a graph in the
       | first place.
        
         | kaba0 wrote:
         | What else should it be? Besides the memory model being an
         | implementation detail, that is not inherent to the problem,
         | most problems (mathematically for sure, but imo also
         | practically) require general graphs.
        
           | 3cats-in-a-coat wrote:
           | Most problems don't require a general graph. They require
           | subsets of a graph, because everything is a subset of a
           | graph. A list is a subset of a graph. A map is. A tree is. A
           | relational database is. A directed acyclic graph is. But we
           | can stop before it's just any "graph". We can stop at DAG,
           | like neural networks do, or even at tree, like programming
           | language syntax does ("abstract syntax tree", "expression
           | tree"). Code and data are two faces of the same thing.
           | 
           | You can also easily reproduce a graph when you need it, by
           | using simpler data with computation on top which builds the
           | graph "on the fly", without it having to be your baseline
           | memory model. You can for example easily describe a graph in
           | two SQL tables: edges and nodes. But SQL state itself is not
           | just any graph, and for a very good reason.
           | 
           | You probably know that the first databases were hierarchical
           | and graph. SQL won not because it gave more freedom, but
           | because it removed freedom in a highly specific way, which
           | enabled larger more reliable systems, contrary to naive
           | expectations. Too much freedom is freedom to shoot yourself
           | in the foot, whether you know better or not.
           | 
           | Today graph databases have enjoyed a renaissance, but despite
           | they have solid uses cases, they'll never be the baseline
           | type of database. They won't replace relations. People have
           | learned their lesson in databases. But in programming
           | languages, they haven't. It's one of those things where we
           | can't translate knowledge from one area to another, because
           | the words we use for the same effects are different and
           | therefore we can't make the connection.
           | 
           | I wrote more here:
           | https://news.ycombinator.com/item?id=37122934
        
             | kaba0 wrote:
             | > Most problems don't require a general graph. They require
             | subsets of a graph, because everything is a subset of a
             | graph. A list is a subset of a graph. A map is. A tree is.
             | A relational database is. A directed acyclic graph is. But
             | we can stop before it's just any "graph". We can stop at
             | DAG, like neural networks do, or even at tree, like
             | programming language syntax does ("abstract syntax tree",
             | "expression tree"). Code and data are two faces of the same
             | thing.
             | 
             | And if you have a single edge not fulfilling the stricter
             | structure's properties, you are left with a general graph.
             | A single one, hence my point.
             | 
             | > but because it removed freedom in a highly specific way,
             | which enabled larger more reliable systems, contrary to
             | naive expectations.
             | 
             | That I agree with, but I personally believe it is too
             | restrictive in case of memory models. Especially that the
             | price really is not high -- modern GCs have insanely good
             | throughputs with bounded latency, the only price being a
             | slightly larger memory footprint.
        
               | 3cats-in-a-coat wrote:
               | > And if you have a single edge not fulfilling the
               | stricter structure's properties, you are left with a
               | general graph.
               | 
               | And this is why it's not good to have a general graph as
               | a baseline model, because one mistake or temptation to
               | stray from a structure with given properties, and you
               | lose all the benefits of a constraint.
               | 
               | "This object will be passed deep into a call tree, but
               | never have mutable methods called on it from more than
               | one place in a process, but it'll be read from multiple
               | places". Good luck enforcing this in a general object
               | graph.
               | 
               | But if you eliminate mutable shared state, say like Rust
               | does (not that I approve the specifics of Rust exactly)
               | then you don't have to worry about it. Rust doesn't
               | implement just a general object graph where everyone can
               | have a reference to everything all the time. It has
               | constraints.
               | 
               | > That I agree with, but I personally believe it is too
               | restrictive in case of memory models. Especially that the
               | price really is not high -- modern GCs have insanely good
               | throughputs with bounded latency, the only price being a
               | slightly larger memory footprint.
               | 
               | However as I noted in the link, the GC is the smallest
               | price you pay.
               | 
               | Also it's not a slightly larger memory footprint, GC
               | languages typically take 2x the RAM.
        
               | kaba0 wrote:
               | I do believe that Rust's choice is correct for its
               | particular niche, but it is also a perfect example for
               | _how_ limiting that constraint is (just see any kind of
               | Rust help forum thingy), and not just for beginners, you
               | also get burned by it at the other end of the spectrum
               | (e.g. implementing lock-free algorithms, etc). Sure,
               | there are escape hatches and in most other topics I would
               | agree with you that having it constrained with escape
               | hatches is the correct solution (e.g. type systems with
               | casts), I still feel that the memory model is not the
               | correct place to enforce these. (For the Rustaceans, no,
               | I 'm not saying Rust is a bad language, I like it and it
               | is absolutely a gift to low-level programming. Not
               | necessarily a good fit for high level tasks though, for
               | me at least)
               | 
               | You mentioned SQL. The reason it can be so performant is
               | that it _doesn 't_ let the user over-specify their
               | constraints, letting the computer optimize its plans,
               | storage layout, etc. I'm sure one can physically write a
               | particular query faster than a DB could execute it, but
               | it is definitely not an easy task. So strangely enough,
               | both strong and weak constraints can give you good
               | performance. To get back at the exact topic at hand, let
               | me reference Rich Hickey's term: "Place-Oriented
               | Programming", referring to this notion that 'objects' are
               | _constrained_ to a given physical location, over being
               | what they should be: semantics. When you write a program
               | and you use a list, you don 't actually care about that
               | list being in a particular place, having to be moved when
               | more items are added to it, etc, those are all
               | implementation details. You care about its "list-ness"
               | only. I see GCs with arbitrary graphs as allowing for
               | this mental model. (Note, that an object _not_ having
               | identity, aka being a value type is a _semantic_
               | question, and that allows for plenty optimizations by a
               | decent runtime, so I 'm not telling that we shouldn't
               | care about performance).
               | 
               | > GC languages typically take 2x the RAM.
               | 
               | There is a niche where that is a huge price, but in 99%
               | of cases I believe we can afford that. Also, GCd
               | languages also have their own escape hatches when needed.
               | 
               | EDIT: Nonetheless, I hope I don't sound too
               | argumentative, I genuinely enjoy this discussion and our
               | differing view points - I'm engaging with curiosity, even
               | if the tone tells otherwise, I'm no native speaker.
        
         | claytonwramsey wrote:
         | That's an interesting idea. Can you clarify some more on what
         | the baseline memory model should be, if not a graph?
        
           | 3cats-in-a-coat wrote:
           | I don't want to impose what it should be, as I'm asking
           | myself this question every day.
           | 
           | But I can list a few thoughts. The actual memory model is
           | (seemingly) a flat address space, in which processes allocate
           | segments, and everything else arises from there. There's some
           | remapping behind the scenes, some segments are shared, some
           | not, but this is beside the point.
           | 
           | And a pattern I see often is arena allocation, where you
           | allocate a large segment and then "virtually" allocate
           | smaller segments inside, down to individual scalar variables.
           | This can happen several times with a lot of memory, but tends
           | to be shallow.
           | 
           | For this and many other reasons, I see mutable memory is best
           | structured as a tree, a shallow tree, almost relational, but
           | still permitting nesting, where every item has one owner, one
           | caller, and anything else is routed through that tree. Of
           | course once this semantic model is established, you can
           | reduce and optimize many calls into more direct calls. But
           | you get clear intercept points when you need to decorate,
           | block, remap and so on behaviors in the system. No "action
           | from distance" surprises.
           | 
           | Everything should be pass by copy. For larger structures, we
           | need copy-on-write optimization. Which means underneath the
           | mutable tree there is a Directed Acyclic Graph of immutable
           | segments supporting copy-on-write.
           | 
           | This is also not new, it's how forking works on Unix for
           | example. Every modern OS utilizes copy-on-write on the file
           | system, in memory, it's also utilized, if we think about it,
           | in network caching systems. But to the process it looks like
           | normal flat address space, where everyone owns their own copy
           | of the content.
           | 
           | For collecting unused segments in COW you'd naively need
           | refcounting. There are other approaches that borrow from GC
           | algorithms, in a much simpler and more performant way,
           | because, well, there are no cycles in a DAG. The best part is
           | that once you decide to use copy semantics, you have a
           | plethora of algorithms at your disposal to try and switch
           | between adaptively (including... just copying the thing if
           | it's small enough, which turns out is the fastest approach),
           | while maintaining the same exact semantics for the programmer
           | who doesn't have to care how it works under the hood.
           | 
           | Another benefit of copy semantics is that things are copied
           | between processes, or especially between machines in a
           | network, and now your language locally has the same semantics
           | as when it talks to the network, which removes barriers and
           | unifies interfaces. "Write once, run at any scale" if you
           | will.
           | 
           | Anyway, this is not comprehensive. And just my opinion. But
           | the problem with graphs is that it is the data structure with
           | least discipline, and most complications. And in most cases
           | those complications do not pay off. Not only they require
           | complex GC, but they result in heavily entangled code, tons
           | of shared mutable state, that gave the whole OOP space a bad
           | name (that it doesn't deserve). This was never a problem with
           | objects, because objects were never supposed to share mutable
           | state. It's a problem of graphs and handles, a system that
           | pretends everyone "has" a mutable piece of state... which it
           | doesn't have, as ownership must be exclusive to be reliable.
           | 
           | Discipline of dependency management (packages etc.) often
           | removes cycles and results in a tree or a DAG. This is not
           | coincidental. And there's no reason we can't replicate this
           | at the lower levels to individual objects.
           | 
           | Of course, we still need graphs. But not everywhere, and not
           | all the time.
        
       | SeniorMars wrote:
       | Honestly, this is a very cool project. I would suggest you
       | publish this as a formal paper to a pl journal.
        
       | MatthiasPortzel wrote:
       | I'm not a Rust programmer, but I've heard Rust has compile-time
       | AST macros. I wonder if it would be possible to implement a Rust
       | GC by defining a macro to tag garbage collected sections of code.
       | The macro expansion could then insert cycle-checks on moves as
       | described early in the post. I have no intuition for how
       | performance would compare to the trait-based approach that the
       | author went with.
        
         | chubot wrote:
         | IME a lot of people want some kind of help from the compiler
         | when writing a GC, but there is the fundamental static vs.
         | dynamic problem.
         | 
         | Heap integrity / reachability is a dynamic property of a
         | running program, and so all the static solutions basically end
         | up as half-measures. (I imagine this would include using
         | macros, though I don't know exactly what you mean),
         | 
         | FWIW this is my experience -
         | https://www.oilshell.org/blog/2023/01/garbage-collector.html
         | 
         | Take it with however many grains of salt, but I heard many
         | static solutions proposed, and they're all "wrong" for the
         | simple reasons in theory of computation.
         | 
         | Also, Rust's static memory management inherently clashes a bit
         | with dynamic memory management. That's also a fundamental
         | thing, and you can have a bazillion mechanisms to ameloriate it
         | for some cases (which may be valuable), but the problem will
         | still be there no matter what.
        
         | 1letterunixname wrote:
         | Yeap. 2 flavors: procedural and pattern matching. Even the
         | build can be scripted in Rust. And there is constant
         | evaluation. https://doc.rust-lang.org/reference/const_eval.html
         | 
         | Rust already has precise heap allocations with Box and Rc/Arc
         | without the need for GC. GC implies uncomputed heap allocation
         | liveness deferring work until later.
        
         | claytonwramsey wrote:
         | In Rust, macros work at the tokenization level. Making a macro
         | do that would require the macro to fully parse and analyze the
         | syntax tree - possible, but not very efficient. Additionally,
         | there's no guardrail to enforce that the user doesn't move a
         | `Gc` anyway.
        
       | loeg wrote:
       | > box_ref.tag.store(COLLECTING_TAG.load(Ordering::Release))
       | 
       | Is it just me or do release semantics not make sense for a load?
       | Release is for stores (I'm coming from the C/C++ atomics model).
       | Hm, the docs[1] say a Release load will panic:
       | 
       | > Panics if order is Release or AcqRel.
       | 
       | Maybe just a blog transcription typo for
       | tag.store(TAG.load(Relaxed), Release).
       | 
       | [1]: https://doc.rust-
       | lang.org/std/sync/atomic/struct.AtomicUsize...
        
         | claytonwramsey wrote:
         | This is correct, thank you. I'll fix it shortly.
        
       | bfeynman wrote:
       | Having some marine background the first sentence confused me, I
       | didn't realize people use the crab mascot for rust so much to use
       | evolutionary terms. But there is already a term for the
       | "opposite" of carcinization, it's just decarcinization. This
       | analogy can come full circle if the next variant of this language
       | uses the coconut crab or a hermit crab as its mascot.
        
         | gabereiser wrote:
         | You hear that crab-lang? :D Decarcinization is the correct
         | opposite of carcinization. Hermit crab would be the perfect
         | mascot in both function and metaphor.
        
       | nicechianti wrote:
       | [dead]
        
       | webkike wrote:
       | I've recently been looking at this space for using a Gc smart
       | pointer in a language I've been writing. It's extremely useful -
       | you can integrate builtins and external functions very cleanly
        
       | e_y_ wrote:
       | As I understand it, Rust originally had support for garbage
       | collected objects. It was removed because the community avoided
       | it as a general practice to ensure that libraries could work
       | without the optional GC runtime, due to its popularity as a
       | systems language.
       | 
       | https://web.archive.org/web/20130607161259/http://pcwalton.g...
        
         | 1letterunixname wrote:
         | Yeap. Vec, String, Box, format!, and println! all depend on
         | alloc for heap allocations. If an allocator is provided, alloc
         | can be used in no_std. For format!, there are alternatives such
         | as providing a fixed u8 buffer. It's quite amazing what can be
         | accomplished without alloc.
        
       | teleforce wrote:
       | Personally I think D language got it right when it makes GC as a
       | default but still allows for no GC whenever applicable thus
       | keeping most of the language relatively safe, easier to grok and
       | more Pythonic compared to Rust.
       | 
       | It will be very interesting to survey how many Rust
       | implementations nowadays do away with memory safety [1].
       | 
       | Not sure if the author of the article referred to this seminal
       | paper on Rust for GC for his GC implementation for Rust in Rust
       | [2].
       | 
       | [1] What is a safe programming language?
       | 
       | https://cs.stackexchange.com/questions/93798/what-is-a-safe-...
       | 
       | [2] Rust as a Language for High Performance GC Implementation:
       | 
       | https://users.cecs.anu.edu.au/~steveb/pubs/papers/rust-ismm-...
        
         | didip wrote:
         | I think D failed because it doesn't take a stand one way or the
         | other.
         | 
         | Overwhelming majority of mainstream programmers want important
         | choices to be made for them.
         | 
         | - People want language designers to decide: Has GC or no GC.
         | Choose one and be consistent with it. Then all the libraries
         | will be written on top of it.
         | 
         | - People want language designers to decide: Has async green
         | thread or not. Choose one and be consistent with it. Then all
         | the libraries will be written on top of it.
         | 
         | - People want language designers to decide an auto format
         | syntax style. Choose one and be consistent with it. Then all
         | the libraries will be written on top of it.
         | 
         | etc. etc.
        
           | quickthrower2 wrote:
           | C# doesn't do the latter two
        
         | OtomotO wrote:
         | Python is a very niche language in my country
        
           | kaba0 wrote:
           | Unless your country barely has any computers, I find it hard
           | to believe.
        
             | OtomotO wrote:
             | First world country, middle of Europe, Java everywhere. And
             | C#.
             | 
             | Saw Python last at university and just because I wanted to.
             | 
             | I know nobody that works with Python on a day by day basis
             | and I know many nerds like me :)
        
         | loeg wrote:
         | You would think that if D "got it right," it would have seen a
         | tenth the excitement / engagement Rust does. The proof is kind
         | of in the pudding?
        
           | speed_spread wrote:
           | Beyond technical merit, D never had a "carrier" application
           | like Rust had with Firefox (AFAIK).
        
             | baq wrote:
             | Rust had servo, not Firefox, but it doesn't matter that
             | much; Rust got legitimately popular for the 'safe non-GC
             | language' badge that it shares with precious little peers.
        
           | Jtsummers wrote:
           | > You would think that if D "got it right," it would have
           | seen a tenth the excitement / engagement Rust does.
           | 
           | Why? Excitement doesn't always follow correctness or even
           | "goodness" (whatever that means in context), even if we'd
           | like it to. Fashion and marketing matter more when it comes
           | to excitement.
        
             | loeg wrote:
             | I think they're very correlated!
        
               | cozzyd wrote:
               | counterpoint, PHP?
        
           | crabbone wrote:
           | It's very naive to think that language popularity correlates
           | with engineering merits...
           | 
           | I worked very little with D, and a lot more with Rust. I
           | cannot really speak to the qualities of D because, by and
           | large, I dealt with a very few instances of it, and only at
           | the interface level (mostly writing in another language).
           | But, in terms of fame and popularity -- Rust developers and
           | backers put a ton of effort into promoting the language.
           | There were times when every other week there would be an
           | article in some popular tech. media source from all kinds of
           | "big names" about how Rust is superior to something (usually
           | C++). There were massive efforts made to create a self-
           | sustained community of Rust-evangelists who'd go and spread
           | the word to the heathens...
           | 
           | I haven't been around to witness the birth of D and don't
           | remember if it had any kind of adoption campaign, but from
           | what I can tell, even if it had a campaign, it didn't
           | succeed.
           | 
           | Now, here's another example from the same category of
           | languages: Ada (esp. with Spark). I'm not very knowledgeable
           | about it, but from the little I know, there are a lot of good
           | reasons to prefer Spark over Rust. At least, on engineering
           | merits, they should be roughly equivalent. But, in terms of
           | popularity, Rust beats Spark by a lot.
        
             | loeg wrote:
             | Popularity is itself an engineering merit.
        
               | amno wrote:
               | Yeah, if you have $500M (half a billion of USD) to spend
               | on marketing, like Sun Microsystems poured into Java, you
               | must have some engineering merit behind you. Not
               | necessarily in the thing you market.
        
               | quickthrower2 wrote:
               | Social Engineering
        
         | surajrmal wrote:
         | D made the right tradeoffs for whom? There are many safe
         | languages with great ecosystems which have a GC. Realistically,
         | before rust, there wasn't really any for languages without a
         | GC. A major problem for folks who must not use a GC is that
         | large parts of the D ecosystem rely on the GC which makes it
         | not useful for the segments that cannot use said GC. I've not
         | actually used D professionally, so I could be off base, but
         | this is at least the perception.
         | 
         | Rust is facing a similar problem in terms of sync vs async,
         | which seems to limit options primarily for folks who want to
         | avoid async, but it doesn't seem like a major blocker for
         | adoption comparatively.
        
           | connicpu wrote:
           | I think one notable difference for Rust is that it's still
           | perfectly possible to call async code from sync code. It
           | might be slightly less efficient than if everything was
           | participating in async, but often that's fine when it's just
           | the one-off network request in your otherwise synchronous
           | CPU-heavy workload. Meanwhile, for people who need to have no
           | GC, you can't paper over an internal dependency that uses GC.
           | The whole stack has to be no-GC period. A similar comparison
           | in the Rust world is probably with the #![no_std] ecosystem.
           | Kernels/kernel modules and super low power microcontrollers
           | have the biggest restrictions on what libraries they can pull
           | in, since they can't use any dependencies that use the normal
           | standard library, only the core library. (Low power is
           | relative though, eg the ESP32 chips have nearly full std lib
           | support)
        
         | claytonwramsey wrote:
         | The big reason why this library exists is so that downstream
         | consumers can be entirely memory safe in their allocations -
         | it's the same way that the Rust standard library works. Library
         | writers can use unsafe code to provide safe abstractions and
         | APIs, so you end up with an unsafe core with a safe framework
         | on top of it. I'm just continuing that tradition.
         | 
         | I actually haven't read that paper! Thanks for sharing it. What
         | I'm doing is slightly different - it looks like they're
         | building a conservative GC with an unsafe API, much like one
         | might for C. My intent was closer toward building a GC which
         | can be used with any safe Rust code more or less
         | unconditionally.
        
         | [deleted]
        
         | programmer_dude wrote:
         | D was doomed from the start. It was designed as an incremental
         | improvement over C++. Initially it provided refuge to burnt out
         | C++ programmers like myself. But once I discovered Rust there
         | was no coming back.
         | 
         | Rust takes its fundamentals from more advanced and well thought
         | out functional languages. As a result it outclasses D and its
         | ilk.
        
       | ajani wrote:
       | > If carcinization happens when languages evolve to be more like
       | Rust, then what do you call it when Rust evolves to be more like
       | Java? Caffeination?
       | 
       | Surely oxidation is better than carcinization. And rust evolving
       | to Java would be mineralization? Or maybe germination?
        
         | OJFord wrote:
         | And it's the version I've actually heard used (multiple times)
         | before now - having looked up 'carcinisation', I assume that
         | comes via 'rustacean' (cf. crustacean), which makes it a bit
         | more contrived and then sort of implies the result is a Rust
         | _developer_ , not Rust itself?
         | 
         | Makes more sense to me that a language/framework/etc. would
         | 'oxidise' to Rust, and someone learning/getting hooked on Rust
         | would 'carcinise'.
         | 
         | But this is all silly and doesn't matter anyway, ha!
        
           | Tyr42 wrote:
           | The rust mascot Ferris is a crab. Like how go has a gopher.
        
             | munificent wrote:
             | And, of course, the name "Ferris" comes from being a
             | homophone of "ferrous".
        
         | jakear wrote:
         | Eh, carcinization is a better ontological fit. Many separate
         | things evolving into the same structure, versus the process of
         | losing electrons and/or structurally degrading.
        
       | senkora wrote:
       | Always remember: a garbage collector is a theorem prover that
       | proves that objects are unreachable, over and over, so that they
       | can be destroyed.
       | 
       | It is in general undecidable whether an object is unreachable,
       | but we can get good performance with heuristics.
        
         | KRAKRISMOTT wrote:
         | Why is it undecidable? Is it not just a graph
         | traversal/coloring problem?
        
           | zer8k wrote:
           | To put it simply it's _very_ difficult to determine what
           | objects (in the C term) are garbage. Hence why we have things
           | like reference counting. We need extra data to hint to the
           | collector what is, in fact, garbage.
        
             | bruce343434 wrote:
             | Okay, so you need to set up some scaffolding/bookkeep some
             | metadata. Still not impossible.
        
               | papruapap wrote:
               | No an expert, but afaik the problem is not the complexity
               | but the performance cost.
        
               | robertlagrant wrote:
               | Sure, but that's not "undecidable"?
        
               | hypertele-Xii wrote:
               | Technically the halting problem is a performance cost
               | issue, because you can always tell whether a program
               | stops by just running it. The performance cost is having
               | to actually run it and wait and see, rather than find
               | some quicker way to the answer.
        
               | myk9001 wrote:
               | I don't think it can be boiled down to a performance cost
               | issue. If a program is going to run forever, how do you
               | measure the forever has passed to conclude the program
               | doesn't halt?
        
               | kybernetikos wrote:
               | This is only true for the very small turing machines that
               | we know the busy beaver number for. In the general case,
               | for a system of even moderate size, you have no way of
               | knowing if you've waited long enough to know if the
               | program runs for ever or not.
        
             | tylerhou wrote:
             | If you define garbage in the conventional way -- "objects
             | that can never be reached from any GC roots" -- determining
             | what objects are garbage only takes linear time
             | (tracing)...
             | 
             | Reference counting helps the tracing phase out by pruning
             | objects that are clearly garbage early -- if a refcount
             | hits zero, that object can clearly be deallocated safely.
             | The converse is not true, which is why we need tracing.
        
           | fweimer wrote:
           | The case that arises quite often in practice is when a
           | reference is retained, but can only be used on an error path
           | (e.g., as part of an exception). The error condition may not
           | ever happen due to the way to code is written, but it's
           | unlikely the garbage collector can prove that except in the
           | simplest possible cases. Usually, it is acceptable to do a
           | conservative approximation of the actual lifetime, and blame
           | space leaks on the programmer.
           | 
           | Garbage collectors usually also make assumptions in the other
           | direction (a non-conservative approximation) when it comes to
           | finalizable resources (e.g., file descriptors). These uses do
           | not involve the reference that is visible to the collector.
           | This gives us Reference.reachabilityFence(), and again
           | mistakes are blamed on the programmer.
        
         | raincole wrote:
         | It's not. You're describing a static analyzer that inserts
         | free() automatically. Not GC. GC is pretty much the opposite.
         | 
         | Edit: Well, I didn't think it through. You're correct. A
         | "perfect GC" is indeed undecidable, and all the real world,
         | practical GC have "false positive" (a piece of memory is no
         | longer used by the code, but there is no way to know it). A
         | perfect GC without false positive is equal to halting problem.
         | 
         | Example:                 var obj = new Object();            //
         | === real code begin        ... 50 lines of real code that never
         | uses obj       if (something == true) return;       ... another
         | 50 lines of real code that never uses obj       // === real
         | code end            obj.doThings();
         | 
         | A perfect GC can release obj while the real code is running. A
         | real world, imperfect GC can't, since whether the real code
         | returns early or not is undecidable.
        
           | monocasa wrote:
           | You normally fix that not with the GC itself, but with the
           | compiler.
           | 
           | If new Object() truly doesn't have any effects visible other
           | than the valid reference, then it's creation can be moved to
           | after the if statement.
        
           | chubot wrote:
           | After writing a garbage collector, I can say that I've seen
           | the unreachable / "will never be used" distinction, and it's
           | more or less irrelevant trivia.
           | 
           | It doesn't help you write a garbage collector to think of it
           | that way. It doesn't provide insight into the problem. The
           | problems are elsewhere. Focus on reachability, dynamic
           | understanding of the heap, performance, etc.
        
             | nomel wrote:
             | > more or less irrelevant trivia
             | 
             | What are some cases? The only I can think of is if a
             | "remote" thing owned the memory. For example, if I sent a
             | pointer elsewhere, then reused it later. Like, maybe with a
             | device driver, or library.
        
           | lisper wrote:
           | > You're correct.
           | 
           | No, s/he's not. The original claim was:
           | 
           | > It is in general undecidable whether an object is
           | UNREACHABLE. [Emphasis added]
           | 
           | Being unreachable is not the same thing as being "no longer
           | used by the code". The latter is indeed undecidable, but the
           | former is not.
        
         | [deleted]
        
         | 3cats-in-a-coat wrote:
         | Empirically what you say can't be the case, because if it's
         | "undecidable" then one of these must be true:
         | 
         | 1. The GC frees objects that could be reachable (program gets
         | corrupted in spectacular ways).
         | 
         | 2. The GC never frees anything (as it can't decide).
         | 
         | The GC may not dispose of objects immediately, but when it
         | determines they're not reachable, they're not reachable,
         | period. There's nothing undecidable about it.
        
           | hashmash wrote:
           | It's undecidable when you consider the possibility that the
           | program won't ever attempt to access an object. A tracing
           | collector plays it safe and assumes such objects might still
           | be accessed at some point. In GC languages you'll often see
           | deliberate null assignments in order to allow the collector
           | to clean up objects sooner.
        
             | 3cats-in-a-coat wrote:
             | Reachable/accessible and reached/accessed are different
             | things, why are we talking about it as if it's the same
             | thing? One is a potential eventuality, the other is an
             | event.
             | 
             | GC are concerned with the potential eventuality, which is
             | 100% decidable, not with the latter, that was never under
             | discussion.
             | 
             | To say "reachability is undecidable" means you can't decide
             | whether you MAY or MAY NOT reach it. Which is wrong. You
             | can decide whether you MAY or MAY NOT reach it. And if you
             | MAY NOT... then that's 100% unreachable and safe to
             | collect.
        
         | munificent wrote:
         | It's easily decidable whether a given GC object _could_ be
         | reached. What isn 't decidable is whether a it _will_ be used
         | in the future.
         | 
         | The conservative approximation that GC's make is that they keep
         | all objects that _could_ be accessed in the future by the
         | program (reach- _able_ ) instead of only keeping the objects
         | that _will_ be used in the future.
         | 
         | A trivial example is:                   void main() {
         | var obj = new Object();                while (true) {
         | // Do other work...                  if (dayOfWeek == 9) {
         | print(obj);             }           }         }
         | 
         | Since there are only seven days in the week, that `print()`
         | statement will never be reached, and an optimal GC would free
         | obj. But the GC can't determine that. All it knows is that it
         | could possibly be reached, so the object stays in memory.
        
           | cozzyd wrote:
           | https://en.wikipedia.org/wiki/French_Republican_calendar
        
         | aatd86 wrote:
         | I'll join the others in trying to bring up a nuance.
         | 
         | The issue is not necessarily unreachability being undecidable.
         | The issue is that reachability depends on control flow. And
         | that implies being conservative which translates into having to
         | deal with false positives.
         | 
         | That static analysis in a nutshell: how to get precision and
         | soundness.
         | 
         | Probably a mere question of terminology.
        
         | convolvatron wrote:
         | I'm not sure this is right. the classic gc approach is dynamic,
         | not static, and it discovers the reachable set, not the
         | unreachable one. please elaborate?
        
           | Filligree wrote:
           | There's garbage collection algorithms that do one, the other,
           | or a combination. It depends mostly on what constraints you
           | have, though yes, high-performance GC mostly looks for the
           | reachable set.
        
       | tayistay wrote:
       | Is this essentially the "cyclic reference counting" algorithm
       | from the Garbage Collection Handbook [1], Chapter 5, Algorithm
       | 5.5?
       | 
       | [1] https://gchandbook.org
        
       | convolvatron wrote:
       | after spending the requisite time trudging through the lifetime
       | swamp (I couldn't couldn't imagine running my whole life on the
       | stack), and seeing all the nice things about rust, I couldn't
       | help but thinking at the end 'this would be a really nice
       | language if it were garbage collected'. of course you can't say
       | that.
       | 
       | this is nice
        
         | speed_spread wrote:
         | It's not taboo to say so, even within the Rust community. It's
         | something that will happen eventually. You might be interested
         | in reading https://without.boats/blog/revisiting-a-smaller-
         | rust/
        
         | echelon wrote:
         | This post is a horrible intro to Rust. It's a fantastic work,
         | but totally outside of what a normal everyday Rust program
         | looks like.
         | 
         | > requisite time trudging through the lifetime swamp
         | 
         | Rust really isn't like that. It gets a bad rap.
         | 
         | > I couldn't couldn't imagine running my whole life on the
         | stack
         | 
         | Most of the things you touch in Rust are heap allocated.
         | 
         | > this would be a really nice language if it were garbage
         | collected
         | 
         | It is a nice language, and all of your complaints go away once
         | you start using the language idiomatically.
        
         | masklinn wrote:
         | > of course you can't say that.
         | 
         | Oh you can absolutely say that.
         | 
         | It's just a dumb take. Because there are already languages
         | which have a solid type system, a functional bent, and a GC.
         | And you can probably bang out one yourself if you want.
         | 
         | If that was what Rust was, it probably wouldn't have existed in
         | the first place, Mozilla would not have been interested in it
         | in the second place, and no community would have gelled around
         | it in the third place. Hell, it was specifically dragged
         | further downstack by the people who coalesced around its
         | _potential_ in that space.
        
           | shpongled wrote:
           | It's not a dumb take at all. I have been using Rust for > 6
           | years now. A significant portion of my projects don't benefit
           | strongly from the borrow checker (the other ones massively
           | do). I use Rust primarily because it has a solid type system,
           | functional bent, and - most importantly - phenomenal tooling
           | (cargo, rust-analyzer), package ecosystem, and developer UX.
           | 
           | If OCaml/StandardML/Haskell/etc had anything close to the
           | ergonomics and convenience of cargo for testing, setting up
           | projects, adding deps, deploying them to any OS/platform, I
           | would probably write about 30-40% less Rust.
           | 
           | I'm not saying we should add GC to Rust - but I think there
           | is space for a Rust+GC language (or just improved ecosystem
           | and tooling for other functional langs)
        
             | pjmlp wrote:
             | Like opam and stackage?
        
               | shpongled wrote:
               | I'm familiar with both - but not really the same order of
               | magnitude. crates.io lists >120k packages. opam has
               | ~4500, and stackage ~3000. And the package ecosystem is
               | only part of it. Admittedly, I haven't written any OCaml
               | or Haskell in the last year or so, but my experience of
               | just getting a new library or executable up and running
               | isn't comparable to cargo either. But also hard to
               | separate my strong familiarity with Rust from the
               | comparison.
        
               | pjmlp wrote:
               | Quality matters more than quantity.
               | 
               | I rather have one package that does it right, than 20
               | doing a different set of 80% from what is needed.
        
               | shpongled wrote:
               | Who wouldn't. But let's not pretend that stackage and
               | crates.io are covering the same functionality space.
               | 
               | As just one example: there are no OCaml or Haskell
               | packages for natively reading or writing parquet files.
               | There are at least 2 such packages for Rust.
        
               | pjmlp wrote:
               | Yeah, yet it doesn't matter to me, I don't even know what
               | they are about.
               | 
               | Likewise, despite crates.io, Rust is missing lots of
               | stuff I miss from Java, C# and C++.
               | 
               | Plenty of stuff isn't there yet.
        
           | twic wrote:
           | It's not a dumb take. It's certainly true that Rust with GC
           | would not solve the problems Rust was intended to solve, and
           | would really not be Rust.
           | 
           | But that doesn't mean Rust with GC wouldn't be a nice
           | language! I don't know of any language that really fits that
           | description.
           | 
           | In fact, I'll pick just four things I like about Rust: a good
           | type system, a supportive, productive community with a focus
           | on doing real stuff, enough mindshare and buzz not to be
           | irrelevant, and being not totally insane.
           | 
           | The first criterion rules out all untyped languages, Go, and
           | the blue collar languages like Java and C#. The second rules
           | out all existing functional languages. The third rules out
           | whatever niche language you're about to suggest. The fourth
           | rules out C++.
           | 
           | There really isn't a mainstream managed language which has
           | managed to thread both the design and community needles the
           | way Rust has.
        
             | WorldMaker wrote:
             | C# probably has more of the "good type system" you are
             | looking for than not these days (and every compiler release
             | at this point seems to better it) as it keeps borrowing
             | clothes from F# and the ML family of languages. The biggest
             | missing tooth in the type system that most really want are
             | discriminated unions at this point, and you can build
             | things like them by hand or using common libraries
             | (including building them relatively easily in F# in a way
             | that is exportable to C#; it's not the most ergonomic
             | answer, but it is fine sometimes). It's also something that
             | seems increasingly likely to show up natively in a future
             | compiler release.
        
             | kaba0 wrote:
             | Scala seems to fit all of your requirements.
        
           | mhh__ wrote:
           | It's true that Rust would probably not exist today if it
           | didn't focus on lifetimes but I think a lot of Rust
           | programmers now either don't care or literally don't know
           | anything about memory management e.g. Every rust programmer
           | that I've met (my age at least) has taken to rust because
           | they like it as a language and are often trying to avoid
           | learning C++ (note that is completely different from looking
           | for a language _after_ having learnt C++).
           | 
           | Rust as a midpoint between Haskell and C, so really C with a
           | nicer type system, combined with momentum/meme status is more
           | important today than it's memory safety story. Memory safety
           | just isn't the buzzword that it was even in 2019 let alone
           | 2015.
           | 
           | > And you can probably bang out one yourself if you want.
           | 
           | I've written my own dataflow analysis code, aimed at lifetime
           | checking, too, but that's not really the point is it?
        
             | zer8k wrote:
             | Rust is nothing like C. It's not even a midway point
             | between C and Haskell.
             | 
             | Rust is C++ with stronger typing and a better memory
             | management model. Comparisons between Rust and C aren't
             | really accurate because Rust is the modern stainless steel
             | kitchen sink that C++ can't be. Without its lovely ADTs and
             | borrow checker it's just C++ with a better package
             | management story. Nothing wrong with that. It won't replace
             | C.
        
               | kaba0 wrote:
               | Targeting C for replacement wouldn't even have been a
               | great goal, C++ is the de facto performance queen in the
               | industry, follow its suit.
        
         | pjmlp wrote:
         | That language is called OCaml.
        
           | galangalalgol wrote:
           | OCaml doesn't have a borrow checker (yet). I'm sold on having
           | a borrow checker, not just for memory, but for io of any
           | sort. I'm not sure what adding GC to a borrow-checked
           | language allows for other than catching self referencing but
           | unreachable loops?
           | 
           | I definitely think a language that has room for abstractions
           | that aren't zero cost, while still having a borrow-checker
           | makes sense. Using rust to make that language also makes
           | sense. I think implementing a vm or interpreter entirely in
           | safe rust would effectively force borrow checking on the
           | resultant language wouldn't it?
           | 
           | Ruby seems like a good place to start the experiment. I'd
           | prefer it end up with something more like julia or python
           | just for my own use cases.
        
             | pjmlp wrote:
             | OP was asking for a language like Rust, using a GC instead,
             | duh!
             | 
             | As for having both, I rather take Swift memory ownership or
             | Linear Haskell approach, or even D/C#/Nim/Go.
             | 
             | The productivity of automatic memory management, with the
             | tooling to go low level C and C++ style, if and when it is
             | really needed.
        
               | galangalalgol wrote:
               | I would have considered the borrow checker and zero cost
               | abstractions to be rusts defining characteristics, so
               | ocaml isn't anything like rust except some syntax and
               | that they are both multi-paradigm. And then my point is
               | that as the author points out, you don't really need a GC
               | if you have a borrow checker. If you want a language with
               | few costly abstractions, but with a GC, go or D come to
               | mind I guess? But the borrow checker is such a big piece
               | of rust I don't really consider those similar languages
               | either.
               | 
               | If the idea is for a safe but more ergonomic language at
               | the cost of performance, I think you'd be focused on
               | making clone and copy implicit when a move doesn't work,
               | getting rid of explocit borrows/references, and perhaps
               | figuring out mutability without needing to annotate it.
               | The absolute type safety is probably the biggest
               | speedbump I hit in daily usage though, not the borrow-
               | checker. A strong but duck/dynamic type system might
               | solve that?
        
               | pjmlp wrote:
               | Most people when they say like Rust, they mean ML syntax,
               | as Rust is the very first experience with Hindley-Milner
               | type systems.
               | 
               | Borrow checker alone makes several algorithms and data
               | structures quite hard to implement.
               | 
               | Swift memory ownership model, Linear Haskell, Hylo (nee
               | Val), Chapel, Ada formal proofs, D's ownership model, are
               | all examples where language designers decided borrow
               | checker alone is too much to ask for.
        
               | galangalalgol wrote:
               | Swift's proposal with copy on write is the sort of thing
               | I was suggesting to make the borrow checker more
               | ergonomic. It is going to be optional though, which I
               | think takes some of the utility away. I want much
               | stronger safety guarantees in the overall infrastructure
               | than that would provide. Linear types in haskell would
               | provide that, but speaking of some data structures being
               | difficult, Haskell makes largely the same ones
               | problematic. Ada has a borrow checker of sorts, but it
               | makes unchecked_deallocate UB. As long as you don't free
               | memory it doesn't have UB. That isn't a bad trade really,
               | but it still isn't completely impossible to do bad things
               | with access types and they aren't that ergonomic either.
               | I don't know much about the others you mentioned. There
               | is definitely a space for a friendlier enforcement of
               | ownership that makes strong guarantees. But again, the
               | borrow checker isn't my main pain point in rust, its the
               | static strong typing with no duck typing for generics. It
               | makes for nice error messages, but it makes other stuff
               | hard.
        
               | pjmlp wrote:
               | No need to call unchecked_deallocate directly, unless one
               | isn't using controlled types, and even better SPARK
               | features.
               | 
               | No UB to worry about.
        
       | chc4 wrote:
       | Nice blog post! I also wrote a concurrent reference counted cycle
       | collector in Rust (https://github.com/chc4/samsara,
       | https://redvice.org/2023/samsara-garbage-collector/) though never
       | published it to crates.io. It's neat to see the different choices
       | that people made implementing similar goals, and dumpster works
       | pretty differently from how I did it. I hit the same problems wrt
       | concurrent mutation of the graph when trying to count in-degree
       | of nodes, or adding references during a collection - I didn't
       | even think of doing generational references and just have a
       | RwLock...
        
         | nextaccountic wrote:
         | Why not publish to crates.io? This seems useful
        
           | chc4 wrote:
           | Honest answer is mostly that I forgot; there was one more
           | small TODO that I needed to fix before I wanted to publish
           | it, and then I got busy and forgot about it.
        
         | claytonwramsey wrote:
         | That's really cool! Thanks for the kind words. I remember
         | peeking at it while hunting for collectors to benchmark
         | against.
        
       | zengid wrote:
       | This is really impressive work and I applaud the author for
       | building something that I'm sure lots of folks will find
       | interesting and possibly very useful, but I want to say that I
       | think there's a _reason_ reference semantics (using and sharing
       | heap allocations) are hard in Rust: It 's because references are
       | just plain hard to use safely. Rust tries to make the gnarly
       | things awkward, but what is the easy path?
       | 
       | If you look at the Hylo (formerly Val as of 3 days ago) language
       | [0] that folks like Dave Abrahams have been working on, they're
       | outright saying to just not use reference semantics if you can
       | avoid it. In this talk at CppCon about Value Semantics, Dave
       | argues that we should be decoupling object graphs from access to
       | objects [1]. That's right folks, using `usize` indexes into
       | collections, or adjacency lists, isn't just a hack to get around
       | the borrow checker as people like Jonathan Blow have famously
       | critiqued [2], it may just be _the right way of doing things_.
       | 
       | [0] https://www.hylo-lang.org
       | 
       | [1] https://youtu.be/QthAU-t3PQ4?t=2354
       | 
       | [2] https://www.youtube.com/watch?v=4t1K66dMhWk
        
         | IshKebab wrote:
         | > just a hack to get around the borrow checker as people like
         | Jonathan Blow have famously critiqued [2], it may just be the
         | right way of doing things.
         | 
         | I dunno, isn't a usize index into a collection essentially just
         | the same as a pointer? Ok it's a bit safer because you at least
         | know the type of the object you're accessing is correct, but
         | you can still get e.g. use after free bugs.
         | 
         | I think Jonathan Blow is right and wrong - it is a hack to get
         | around the borrow checker, but also that's totally fine. The
         | borrow checker still works for the other 95% of code you are
         | writing. Nobody ever claimed it was the perfect solution to
         | every problem.
        
         | claytonwramsey wrote:
         | I know of a lot of people trying to do "arena" GC in Rust using
         | integer indices. I think that's an excellent approach, but it
         | does mean you have to pass in the arena as a resource to every
         | single function which produces an allocation. This isn't even
         | necessarily a bad thing - the net result is a lot like Zig's
         | allocator API.
         | 
         | I know there's a nontrivial overhead to using index-based
         | references, especially since CPUs can't easily predict load
         | operations. This gives me an idea: create a CPU architecture
         | where pointers are actually (object, offset) tuples, enabling
         | better on-metal performance with index-based references. I've
         | neither the time, money, expertise, nor energy to implement
         | such a thing, but it would be really cool if it existed - maybe
         | CHERI is a close approximation, though I haven't looked very
         | closely at it.
        
           | gpderetta wrote:
           | > it does mean you have to pass in the arena as a resource to
           | every single function which produces an allocation
           | 
           | Implicit arguments! I'm more and more convinced that this
           | form of controlled dynamic scoping dynamic scoping should be
           | embraced by more languages.
           | 
           | > create a CPU architecture where pointers are actually
           | (object, offset) tuples
           | 
           | Segments might yet see a renaissance (and CHERI might be
           | indeed be a form of it).
        
             | norir wrote:
             | Yes, the obvious way to do this is have the compiler add a
             | context parameter to every function that allocates memory
             | or dereferences a pointer (semantically). To run the
             | program, the main function ala c initializes the context
             | and passes it in to the actual main program implementation.
             | When the program finishes, all the memory can just be
             | released in bulk in the wrapper main that initialized the
             | context.
        
               | sgtnoodle wrote:
               | How is that much different than just using malloc and
               | then letting the OS clean up when the process exits?
        
               | greiskul wrote:
               | By having implicit arguments, a framework can decide when
               | to override the implictness and be explicit to use a
               | different arena when needed.
               | 
               | Also, many process like servers are long lived, so the OS
               | clean up strategy does not work for them, and they are
               | exactly where you want some strategy for memory
               | allocation like arenas to prevent memory fragmentation.
        
               | nextaccountic wrote:
               | This is actually a serious proposal for the Rust
               | language, it's called contexts and capabilities
               | 
               | Here's a blog post from 2021
               | https://tmandry.gitlab.io/blog/posts/2021-12-21-context-
               | capa...
               | 
               | This isn't implemented yet, and the proposal never
               | matured into a proper RFC, but there's high interest for
               | some solution along those lines
        
               | nextaccountic wrote:
               | And just a note, this is called Reader monad in Haskell
               | 
               | In other words, receiving an implicit parameter is a kind
               | of effect. So another way to provide this feature is to
               | have a full blown effect system (which is much more
               | general and is probably overkill, but there's also some
               | interest for that)
        
             | mhink wrote:
             | > Implicit arguments! I'm more and more convinced that this
             | form of controlled dynamic scoping dynamic scoping should
             | be embraced by more languages.
             | 
             | This might be an odd comment, but this is sorta similar to
             | React's Context mechanism. For what it's worth, it makes a
             | lot of things easier but it can definitely also add a lot
             | of surface area for complexity. Although it's certainly
             | better than relying on globals all over the place.
        
       ___________________________________________________________________
       (page generated 2023-08-14 23:00 UTC)