[HN Gopher] I built a garbage collector for a language that does...
___________________________________________________________________
I built a garbage collector for a language that doesn't need one
Author : claytonwramsey
Score : 224 points
Date : 2023-08-14 13:48 UTC (9 hours ago)
(HTM) web link (claytonwramsey.github.io)
(TXT) w3m dump (claytonwramsey.github.io)
| 3cats-in-a-coat wrote:
| I think GC is a very cool set of algorithms that have their place
| for collecting detached elements of a graph. But in a programming
| language, the baseline memory model should not be a graph in the
| first place.
| kaba0 wrote:
| What else should it be? Besides the memory model being an
| implementation detail, that is not inherent to the problem,
| most problems (mathematically for sure, but imo also
| practically) require general graphs.
| 3cats-in-a-coat wrote:
| Most problems don't require a general graph. They require
| subsets of a graph, because everything is a subset of a
| graph. A list is a subset of a graph. A map is. A tree is. A
| relational database is. A directed acyclic graph is. But we
| can stop before it's just any "graph". We can stop at DAG,
| like neural networks do, or even at tree, like programming
| language syntax does ("abstract syntax tree", "expression
| tree"). Code and data are two faces of the same thing.
|
| You can also easily reproduce a graph when you need it, by
| using simpler data with computation on top which builds the
| graph "on the fly", without it having to be your baseline
| memory model. You can for example easily describe a graph in
| two SQL tables: edges and nodes. But SQL state itself is not
| just any graph, and for a very good reason.
|
| You probably know that the first databases were hierarchical
| and graph. SQL won not because it gave more freedom, but
| because it removed freedom in a highly specific way, which
| enabled larger more reliable systems, contrary to naive
| expectations. Too much freedom is freedom to shoot yourself
| in the foot, whether you know better or not.
|
| Today graph databases have enjoyed a renaissance, but despite
| they have solid uses cases, they'll never be the baseline
| type of database. They won't replace relations. People have
| learned their lesson in databases. But in programming
| languages, they haven't. It's one of those things where we
| can't translate knowledge from one area to another, because
| the words we use for the same effects are different and
| therefore we can't make the connection.
|
| I wrote more here:
| https://news.ycombinator.com/item?id=37122934
| kaba0 wrote:
| > Most problems don't require a general graph. They require
| subsets of a graph, because everything is a subset of a
| graph. A list is a subset of a graph. A map is. A tree is.
| A relational database is. A directed acyclic graph is. But
| we can stop before it's just any "graph". We can stop at
| DAG, like neural networks do, or even at tree, like
| programming language syntax does ("abstract syntax tree",
| "expression tree"). Code and data are two faces of the same
| thing.
|
| And if you have a single edge not fulfilling the stricter
| structure's properties, you are left with a general graph.
| A single one, hence my point.
|
| > but because it removed freedom in a highly specific way,
| which enabled larger more reliable systems, contrary to
| naive expectations.
|
| That I agree with, but I personally believe it is too
| restrictive in case of memory models. Especially that the
| price really is not high -- modern GCs have insanely good
| throughputs with bounded latency, the only price being a
| slightly larger memory footprint.
| 3cats-in-a-coat wrote:
| > And if you have a single edge not fulfilling the
| stricter structure's properties, you are left with a
| general graph.
|
| And this is why it's not good to have a general graph as
| a baseline model, because one mistake or temptation to
| stray from a structure with given properties, and you
| lose all the benefits of a constraint.
|
| "This object will be passed deep into a call tree, but
| never have mutable methods called on it from more than
| one place in a process, but it'll be read from multiple
| places". Good luck enforcing this in a general object
| graph.
|
| But if you eliminate mutable shared state, say like Rust
| does (not that I approve the specifics of Rust exactly)
| then you don't have to worry about it. Rust doesn't
| implement just a general object graph where everyone can
| have a reference to everything all the time. It has
| constraints.
|
| > That I agree with, but I personally believe it is too
| restrictive in case of memory models. Especially that the
| price really is not high -- modern GCs have insanely good
| throughputs with bounded latency, the only price being a
| slightly larger memory footprint.
|
| However as I noted in the link, the GC is the smallest
| price you pay.
|
| Also it's not a slightly larger memory footprint, GC
| languages typically take 2x the RAM.
| kaba0 wrote:
| I do believe that Rust's choice is correct for its
| particular niche, but it is also a perfect example for
| _how_ limiting that constraint is (just see any kind of
| Rust help forum thingy), and not just for beginners, you
| also get burned by it at the other end of the spectrum
| (e.g. implementing lock-free algorithms, etc). Sure,
| there are escape hatches and in most other topics I would
| agree with you that having it constrained with escape
| hatches is the correct solution (e.g. type systems with
| casts), I still feel that the memory model is not the
| correct place to enforce these. (For the Rustaceans, no,
| I 'm not saying Rust is a bad language, I like it and it
| is absolutely a gift to low-level programming. Not
| necessarily a good fit for high level tasks though, for
| me at least)
|
| You mentioned SQL. The reason it can be so performant is
| that it _doesn 't_ let the user over-specify their
| constraints, letting the computer optimize its plans,
| storage layout, etc. I'm sure one can physically write a
| particular query faster than a DB could execute it, but
| it is definitely not an easy task. So strangely enough,
| both strong and weak constraints can give you good
| performance. To get back at the exact topic at hand, let
| me reference Rich Hickey's term: "Place-Oriented
| Programming", referring to this notion that 'objects' are
| _constrained_ to a given physical location, over being
| what they should be: semantics. When you write a program
| and you use a list, you don 't actually care about that
| list being in a particular place, having to be moved when
| more items are added to it, etc, those are all
| implementation details. You care about its "list-ness"
| only. I see GCs with arbitrary graphs as allowing for
| this mental model. (Note, that an object _not_ having
| identity, aka being a value type is a _semantic_
| question, and that allows for plenty optimizations by a
| decent runtime, so I 'm not telling that we shouldn't
| care about performance).
|
| > GC languages typically take 2x the RAM.
|
| There is a niche where that is a huge price, but in 99%
| of cases I believe we can afford that. Also, GCd
| languages also have their own escape hatches when needed.
|
| EDIT: Nonetheless, I hope I don't sound too
| argumentative, I genuinely enjoy this discussion and our
| differing view points - I'm engaging with curiosity, even
| if the tone tells otherwise, I'm no native speaker.
| claytonwramsey wrote:
| That's an interesting idea. Can you clarify some more on what
| the baseline memory model should be, if not a graph?
| 3cats-in-a-coat wrote:
| I don't want to impose what it should be, as I'm asking
| myself this question every day.
|
| But I can list a few thoughts. The actual memory model is
| (seemingly) a flat address space, in which processes allocate
| segments, and everything else arises from there. There's some
| remapping behind the scenes, some segments are shared, some
| not, but this is beside the point.
|
| And a pattern I see often is arena allocation, where you
| allocate a large segment and then "virtually" allocate
| smaller segments inside, down to individual scalar variables.
| This can happen several times with a lot of memory, but tends
| to be shallow.
|
| For this and many other reasons, I see mutable memory is best
| structured as a tree, a shallow tree, almost relational, but
| still permitting nesting, where every item has one owner, one
| caller, and anything else is routed through that tree. Of
| course once this semantic model is established, you can
| reduce and optimize many calls into more direct calls. But
| you get clear intercept points when you need to decorate,
| block, remap and so on behaviors in the system. No "action
| from distance" surprises.
|
| Everything should be pass by copy. For larger structures, we
| need copy-on-write optimization. Which means underneath the
| mutable tree there is a Directed Acyclic Graph of immutable
| segments supporting copy-on-write.
|
| This is also not new, it's how forking works on Unix for
| example. Every modern OS utilizes copy-on-write on the file
| system, in memory, it's also utilized, if we think about it,
| in network caching systems. But to the process it looks like
| normal flat address space, where everyone owns their own copy
| of the content.
|
| For collecting unused segments in COW you'd naively need
| refcounting. There are other approaches that borrow from GC
| algorithms, in a much simpler and more performant way,
| because, well, there are no cycles in a DAG. The best part is
| that once you decide to use copy semantics, you have a
| plethora of algorithms at your disposal to try and switch
| between adaptively (including... just copying the thing if
| it's small enough, which turns out is the fastest approach),
| while maintaining the same exact semantics for the programmer
| who doesn't have to care how it works under the hood.
|
| Another benefit of copy semantics is that things are copied
| between processes, or especially between machines in a
| network, and now your language locally has the same semantics
| as when it talks to the network, which removes barriers and
| unifies interfaces. "Write once, run at any scale" if you
| will.
|
| Anyway, this is not comprehensive. And just my opinion. But
| the problem with graphs is that it is the data structure with
| least discipline, and most complications. And in most cases
| those complications do not pay off. Not only they require
| complex GC, but they result in heavily entangled code, tons
| of shared mutable state, that gave the whole OOP space a bad
| name (that it doesn't deserve). This was never a problem with
| objects, because objects were never supposed to share mutable
| state. It's a problem of graphs and handles, a system that
| pretends everyone "has" a mutable piece of state... which it
| doesn't have, as ownership must be exclusive to be reliable.
|
| Discipline of dependency management (packages etc.) often
| removes cycles and results in a tree or a DAG. This is not
| coincidental. And there's no reason we can't replicate this
| at the lower levels to individual objects.
|
| Of course, we still need graphs. But not everywhere, and not
| all the time.
| SeniorMars wrote:
| Honestly, this is a very cool project. I would suggest you
| publish this as a formal paper to a pl journal.
| MatthiasPortzel wrote:
| I'm not a Rust programmer, but I've heard Rust has compile-time
| AST macros. I wonder if it would be possible to implement a Rust
| GC by defining a macro to tag garbage collected sections of code.
| The macro expansion could then insert cycle-checks on moves as
| described early in the post. I have no intuition for how
| performance would compare to the trait-based approach that the
| author went with.
| chubot wrote:
| IME a lot of people want some kind of help from the compiler
| when writing a GC, but there is the fundamental static vs.
| dynamic problem.
|
| Heap integrity / reachability is a dynamic property of a
| running program, and so all the static solutions basically end
| up as half-measures. (I imagine this would include using
| macros, though I don't know exactly what you mean),
|
| FWIW this is my experience -
| https://www.oilshell.org/blog/2023/01/garbage-collector.html
|
| Take it with however many grains of salt, but I heard many
| static solutions proposed, and they're all "wrong" for the
| simple reasons in theory of computation.
|
| Also, Rust's static memory management inherently clashes a bit
| with dynamic memory management. That's also a fundamental
| thing, and you can have a bazillion mechanisms to ameloriate it
| for some cases (which may be valuable), but the problem will
| still be there no matter what.
| 1letterunixname wrote:
| Yeap. 2 flavors: procedural and pattern matching. Even the
| build can be scripted in Rust. And there is constant
| evaluation. https://doc.rust-lang.org/reference/const_eval.html
|
| Rust already has precise heap allocations with Box and Rc/Arc
| without the need for GC. GC implies uncomputed heap allocation
| liveness deferring work until later.
| claytonwramsey wrote:
| In Rust, macros work at the tokenization level. Making a macro
| do that would require the macro to fully parse and analyze the
| syntax tree - possible, but not very efficient. Additionally,
| there's no guardrail to enforce that the user doesn't move a
| `Gc` anyway.
| loeg wrote:
| > box_ref.tag.store(COLLECTING_TAG.load(Ordering::Release))
|
| Is it just me or do release semantics not make sense for a load?
| Release is for stores (I'm coming from the C/C++ atomics model).
| Hm, the docs[1] say a Release load will panic:
|
| > Panics if order is Release or AcqRel.
|
| Maybe just a blog transcription typo for
| tag.store(TAG.load(Relaxed), Release).
|
| [1]: https://doc.rust-
| lang.org/std/sync/atomic/struct.AtomicUsize...
| claytonwramsey wrote:
| This is correct, thank you. I'll fix it shortly.
| bfeynman wrote:
| Having some marine background the first sentence confused me, I
| didn't realize people use the crab mascot for rust so much to use
| evolutionary terms. But there is already a term for the
| "opposite" of carcinization, it's just decarcinization. This
| analogy can come full circle if the next variant of this language
| uses the coconut crab or a hermit crab as its mascot.
| gabereiser wrote:
| You hear that crab-lang? :D Decarcinization is the correct
| opposite of carcinization. Hermit crab would be the perfect
| mascot in both function and metaphor.
| nicechianti wrote:
| [dead]
| webkike wrote:
| I've recently been looking at this space for using a Gc smart
| pointer in a language I've been writing. It's extremely useful -
| you can integrate builtins and external functions very cleanly
| e_y_ wrote:
| As I understand it, Rust originally had support for garbage
| collected objects. It was removed because the community avoided
| it as a general practice to ensure that libraries could work
| without the optional GC runtime, due to its popularity as a
| systems language.
|
| https://web.archive.org/web/20130607161259/http://pcwalton.g...
| 1letterunixname wrote:
| Yeap. Vec, String, Box, format!, and println! all depend on
| alloc for heap allocations. If an allocator is provided, alloc
| can be used in no_std. For format!, there are alternatives such
| as providing a fixed u8 buffer. It's quite amazing what can be
| accomplished without alloc.
| teleforce wrote:
| Personally I think D language got it right when it makes GC as a
| default but still allows for no GC whenever applicable thus
| keeping most of the language relatively safe, easier to grok and
| more Pythonic compared to Rust.
|
| It will be very interesting to survey how many Rust
| implementations nowadays do away with memory safety [1].
|
| Not sure if the author of the article referred to this seminal
| paper on Rust for GC for his GC implementation for Rust in Rust
| [2].
|
| [1] What is a safe programming language?
|
| https://cs.stackexchange.com/questions/93798/what-is-a-safe-...
|
| [2] Rust as a Language for High Performance GC Implementation:
|
| https://users.cecs.anu.edu.au/~steveb/pubs/papers/rust-ismm-...
| didip wrote:
| I think D failed because it doesn't take a stand one way or the
| other.
|
| Overwhelming majority of mainstream programmers want important
| choices to be made for them.
|
| - People want language designers to decide: Has GC or no GC.
| Choose one and be consistent with it. Then all the libraries
| will be written on top of it.
|
| - People want language designers to decide: Has async green
| thread or not. Choose one and be consistent with it. Then all
| the libraries will be written on top of it.
|
| - People want language designers to decide an auto format
| syntax style. Choose one and be consistent with it. Then all
| the libraries will be written on top of it.
|
| etc. etc.
| quickthrower2 wrote:
| C# doesn't do the latter two
| OtomotO wrote:
| Python is a very niche language in my country
| kaba0 wrote:
| Unless your country barely has any computers, I find it hard
| to believe.
| OtomotO wrote:
| First world country, middle of Europe, Java everywhere. And
| C#.
|
| Saw Python last at university and just because I wanted to.
|
| I know nobody that works with Python on a day by day basis
| and I know many nerds like me :)
| loeg wrote:
| You would think that if D "got it right," it would have seen a
| tenth the excitement / engagement Rust does. The proof is kind
| of in the pudding?
| speed_spread wrote:
| Beyond technical merit, D never had a "carrier" application
| like Rust had with Firefox (AFAIK).
| baq wrote:
| Rust had servo, not Firefox, but it doesn't matter that
| much; Rust got legitimately popular for the 'safe non-GC
| language' badge that it shares with precious little peers.
| Jtsummers wrote:
| > You would think that if D "got it right," it would have
| seen a tenth the excitement / engagement Rust does.
|
| Why? Excitement doesn't always follow correctness or even
| "goodness" (whatever that means in context), even if we'd
| like it to. Fashion and marketing matter more when it comes
| to excitement.
| loeg wrote:
| I think they're very correlated!
| cozzyd wrote:
| counterpoint, PHP?
| crabbone wrote:
| It's very naive to think that language popularity correlates
| with engineering merits...
|
| I worked very little with D, and a lot more with Rust. I
| cannot really speak to the qualities of D because, by and
| large, I dealt with a very few instances of it, and only at
| the interface level (mostly writing in another language).
| But, in terms of fame and popularity -- Rust developers and
| backers put a ton of effort into promoting the language.
| There were times when every other week there would be an
| article in some popular tech. media source from all kinds of
| "big names" about how Rust is superior to something (usually
| C++). There were massive efforts made to create a self-
| sustained community of Rust-evangelists who'd go and spread
| the word to the heathens...
|
| I haven't been around to witness the birth of D and don't
| remember if it had any kind of adoption campaign, but from
| what I can tell, even if it had a campaign, it didn't
| succeed.
|
| Now, here's another example from the same category of
| languages: Ada (esp. with Spark). I'm not very knowledgeable
| about it, but from the little I know, there are a lot of good
| reasons to prefer Spark over Rust. At least, on engineering
| merits, they should be roughly equivalent. But, in terms of
| popularity, Rust beats Spark by a lot.
| loeg wrote:
| Popularity is itself an engineering merit.
| amno wrote:
| Yeah, if you have $500M (half a billion of USD) to spend
| on marketing, like Sun Microsystems poured into Java, you
| must have some engineering merit behind you. Not
| necessarily in the thing you market.
| quickthrower2 wrote:
| Social Engineering
| surajrmal wrote:
| D made the right tradeoffs for whom? There are many safe
| languages with great ecosystems which have a GC. Realistically,
| before rust, there wasn't really any for languages without a
| GC. A major problem for folks who must not use a GC is that
| large parts of the D ecosystem rely on the GC which makes it
| not useful for the segments that cannot use said GC. I've not
| actually used D professionally, so I could be off base, but
| this is at least the perception.
|
| Rust is facing a similar problem in terms of sync vs async,
| which seems to limit options primarily for folks who want to
| avoid async, but it doesn't seem like a major blocker for
| adoption comparatively.
| connicpu wrote:
| I think one notable difference for Rust is that it's still
| perfectly possible to call async code from sync code. It
| might be slightly less efficient than if everything was
| participating in async, but often that's fine when it's just
| the one-off network request in your otherwise synchronous
| CPU-heavy workload. Meanwhile, for people who need to have no
| GC, you can't paper over an internal dependency that uses GC.
| The whole stack has to be no-GC period. A similar comparison
| in the Rust world is probably with the #![no_std] ecosystem.
| Kernels/kernel modules and super low power microcontrollers
| have the biggest restrictions on what libraries they can pull
| in, since they can't use any dependencies that use the normal
| standard library, only the core library. (Low power is
| relative though, eg the ESP32 chips have nearly full std lib
| support)
| claytonwramsey wrote:
| The big reason why this library exists is so that downstream
| consumers can be entirely memory safe in their allocations -
| it's the same way that the Rust standard library works. Library
| writers can use unsafe code to provide safe abstractions and
| APIs, so you end up with an unsafe core with a safe framework
| on top of it. I'm just continuing that tradition.
|
| I actually haven't read that paper! Thanks for sharing it. What
| I'm doing is slightly different - it looks like they're
| building a conservative GC with an unsafe API, much like one
| might for C. My intent was closer toward building a GC which
| can be used with any safe Rust code more or less
| unconditionally.
| [deleted]
| programmer_dude wrote:
| D was doomed from the start. It was designed as an incremental
| improvement over C++. Initially it provided refuge to burnt out
| C++ programmers like myself. But once I discovered Rust there
| was no coming back.
|
| Rust takes its fundamentals from more advanced and well thought
| out functional languages. As a result it outclasses D and its
| ilk.
| ajani wrote:
| > If carcinization happens when languages evolve to be more like
| Rust, then what do you call it when Rust evolves to be more like
| Java? Caffeination?
|
| Surely oxidation is better than carcinization. And rust evolving
| to Java would be mineralization? Or maybe germination?
| OJFord wrote:
| And it's the version I've actually heard used (multiple times)
| before now - having looked up 'carcinisation', I assume that
| comes via 'rustacean' (cf. crustacean), which makes it a bit
| more contrived and then sort of implies the result is a Rust
| _developer_ , not Rust itself?
|
| Makes more sense to me that a language/framework/etc. would
| 'oxidise' to Rust, and someone learning/getting hooked on Rust
| would 'carcinise'.
|
| But this is all silly and doesn't matter anyway, ha!
| Tyr42 wrote:
| The rust mascot Ferris is a crab. Like how go has a gopher.
| munificent wrote:
| And, of course, the name "Ferris" comes from being a
| homophone of "ferrous".
| jakear wrote:
| Eh, carcinization is a better ontological fit. Many separate
| things evolving into the same structure, versus the process of
| losing electrons and/or structurally degrading.
| senkora wrote:
| Always remember: a garbage collector is a theorem prover that
| proves that objects are unreachable, over and over, so that they
| can be destroyed.
|
| It is in general undecidable whether an object is unreachable,
| but we can get good performance with heuristics.
| KRAKRISMOTT wrote:
| Why is it undecidable? Is it not just a graph
| traversal/coloring problem?
| zer8k wrote:
| To put it simply it's _very_ difficult to determine what
| objects (in the C term) are garbage. Hence why we have things
| like reference counting. We need extra data to hint to the
| collector what is, in fact, garbage.
| bruce343434 wrote:
| Okay, so you need to set up some scaffolding/bookkeep some
| metadata. Still not impossible.
| papruapap wrote:
| No an expert, but afaik the problem is not the complexity
| but the performance cost.
| robertlagrant wrote:
| Sure, but that's not "undecidable"?
| hypertele-Xii wrote:
| Technically the halting problem is a performance cost
| issue, because you can always tell whether a program
| stops by just running it. The performance cost is having
| to actually run it and wait and see, rather than find
| some quicker way to the answer.
| myk9001 wrote:
| I don't think it can be boiled down to a performance cost
| issue. If a program is going to run forever, how do you
| measure the forever has passed to conclude the program
| doesn't halt?
| kybernetikos wrote:
| This is only true for the very small turing machines that
| we know the busy beaver number for. In the general case,
| for a system of even moderate size, you have no way of
| knowing if you've waited long enough to know if the
| program runs for ever or not.
| tylerhou wrote:
| If you define garbage in the conventional way -- "objects
| that can never be reached from any GC roots" -- determining
| what objects are garbage only takes linear time
| (tracing)...
|
| Reference counting helps the tracing phase out by pruning
| objects that are clearly garbage early -- if a refcount
| hits zero, that object can clearly be deallocated safely.
| The converse is not true, which is why we need tracing.
| fweimer wrote:
| The case that arises quite often in practice is when a
| reference is retained, but can only be used on an error path
| (e.g., as part of an exception). The error condition may not
| ever happen due to the way to code is written, but it's
| unlikely the garbage collector can prove that except in the
| simplest possible cases. Usually, it is acceptable to do a
| conservative approximation of the actual lifetime, and blame
| space leaks on the programmer.
|
| Garbage collectors usually also make assumptions in the other
| direction (a non-conservative approximation) when it comes to
| finalizable resources (e.g., file descriptors). These uses do
| not involve the reference that is visible to the collector.
| This gives us Reference.reachabilityFence(), and again
| mistakes are blamed on the programmer.
| raincole wrote:
| It's not. You're describing a static analyzer that inserts
| free() automatically. Not GC. GC is pretty much the opposite.
|
| Edit: Well, I didn't think it through. You're correct. A
| "perfect GC" is indeed undecidable, and all the real world,
| practical GC have "false positive" (a piece of memory is no
| longer used by the code, but there is no way to know it). A
| perfect GC without false positive is equal to halting problem.
|
| Example: var obj = new Object(); //
| === real code begin ... 50 lines of real code that never
| uses obj if (something == true) return; ... another
| 50 lines of real code that never uses obj // === real
| code end obj.doThings();
|
| A perfect GC can release obj while the real code is running. A
| real world, imperfect GC can't, since whether the real code
| returns early or not is undecidable.
| monocasa wrote:
| You normally fix that not with the GC itself, but with the
| compiler.
|
| If new Object() truly doesn't have any effects visible other
| than the valid reference, then it's creation can be moved to
| after the if statement.
| chubot wrote:
| After writing a garbage collector, I can say that I've seen
| the unreachable / "will never be used" distinction, and it's
| more or less irrelevant trivia.
|
| It doesn't help you write a garbage collector to think of it
| that way. It doesn't provide insight into the problem. The
| problems are elsewhere. Focus on reachability, dynamic
| understanding of the heap, performance, etc.
| nomel wrote:
| > more or less irrelevant trivia
|
| What are some cases? The only I can think of is if a
| "remote" thing owned the memory. For example, if I sent a
| pointer elsewhere, then reused it later. Like, maybe with a
| device driver, or library.
| lisper wrote:
| > You're correct.
|
| No, s/he's not. The original claim was:
|
| > It is in general undecidable whether an object is
| UNREACHABLE. [Emphasis added]
|
| Being unreachable is not the same thing as being "no longer
| used by the code". The latter is indeed undecidable, but the
| former is not.
| [deleted]
| 3cats-in-a-coat wrote:
| Empirically what you say can't be the case, because if it's
| "undecidable" then one of these must be true:
|
| 1. The GC frees objects that could be reachable (program gets
| corrupted in spectacular ways).
|
| 2. The GC never frees anything (as it can't decide).
|
| The GC may not dispose of objects immediately, but when it
| determines they're not reachable, they're not reachable,
| period. There's nothing undecidable about it.
| hashmash wrote:
| It's undecidable when you consider the possibility that the
| program won't ever attempt to access an object. A tracing
| collector plays it safe and assumes such objects might still
| be accessed at some point. In GC languages you'll often see
| deliberate null assignments in order to allow the collector
| to clean up objects sooner.
| 3cats-in-a-coat wrote:
| Reachable/accessible and reached/accessed are different
| things, why are we talking about it as if it's the same
| thing? One is a potential eventuality, the other is an
| event.
|
| GC are concerned with the potential eventuality, which is
| 100% decidable, not with the latter, that was never under
| discussion.
|
| To say "reachability is undecidable" means you can't decide
| whether you MAY or MAY NOT reach it. Which is wrong. You
| can decide whether you MAY or MAY NOT reach it. And if you
| MAY NOT... then that's 100% unreachable and safe to
| collect.
| munificent wrote:
| It's easily decidable whether a given GC object _could_ be
| reached. What isn 't decidable is whether a it _will_ be used
| in the future.
|
| The conservative approximation that GC's make is that they keep
| all objects that _could_ be accessed in the future by the
| program (reach- _able_ ) instead of only keeping the objects
| that _will_ be used in the future.
|
| A trivial example is: void main() {
| var obj = new Object(); while (true) {
| // Do other work... if (dayOfWeek == 9) {
| print(obj); } } }
|
| Since there are only seven days in the week, that `print()`
| statement will never be reached, and an optimal GC would free
| obj. But the GC can't determine that. All it knows is that it
| could possibly be reached, so the object stays in memory.
| cozzyd wrote:
| https://en.wikipedia.org/wiki/French_Republican_calendar
| aatd86 wrote:
| I'll join the others in trying to bring up a nuance.
|
| The issue is not necessarily unreachability being undecidable.
| The issue is that reachability depends on control flow. And
| that implies being conservative which translates into having to
| deal with false positives.
|
| That static analysis in a nutshell: how to get precision and
| soundness.
|
| Probably a mere question of terminology.
| convolvatron wrote:
| I'm not sure this is right. the classic gc approach is dynamic,
| not static, and it discovers the reachable set, not the
| unreachable one. please elaborate?
| Filligree wrote:
| There's garbage collection algorithms that do one, the other,
| or a combination. It depends mostly on what constraints you
| have, though yes, high-performance GC mostly looks for the
| reachable set.
| tayistay wrote:
| Is this essentially the "cyclic reference counting" algorithm
| from the Garbage Collection Handbook [1], Chapter 5, Algorithm
| 5.5?
|
| [1] https://gchandbook.org
| convolvatron wrote:
| after spending the requisite time trudging through the lifetime
| swamp (I couldn't couldn't imagine running my whole life on the
| stack), and seeing all the nice things about rust, I couldn't
| help but thinking at the end 'this would be a really nice
| language if it were garbage collected'. of course you can't say
| that.
|
| this is nice
| speed_spread wrote:
| It's not taboo to say so, even within the Rust community. It's
| something that will happen eventually. You might be interested
| in reading https://without.boats/blog/revisiting-a-smaller-
| rust/
| echelon wrote:
| This post is a horrible intro to Rust. It's a fantastic work,
| but totally outside of what a normal everyday Rust program
| looks like.
|
| > requisite time trudging through the lifetime swamp
|
| Rust really isn't like that. It gets a bad rap.
|
| > I couldn't couldn't imagine running my whole life on the
| stack
|
| Most of the things you touch in Rust are heap allocated.
|
| > this would be a really nice language if it were garbage
| collected
|
| It is a nice language, and all of your complaints go away once
| you start using the language idiomatically.
| masklinn wrote:
| > of course you can't say that.
|
| Oh you can absolutely say that.
|
| It's just a dumb take. Because there are already languages
| which have a solid type system, a functional bent, and a GC.
| And you can probably bang out one yourself if you want.
|
| If that was what Rust was, it probably wouldn't have existed in
| the first place, Mozilla would not have been interested in it
| in the second place, and no community would have gelled around
| it in the third place. Hell, it was specifically dragged
| further downstack by the people who coalesced around its
| _potential_ in that space.
| shpongled wrote:
| It's not a dumb take at all. I have been using Rust for > 6
| years now. A significant portion of my projects don't benefit
| strongly from the borrow checker (the other ones massively
| do). I use Rust primarily because it has a solid type system,
| functional bent, and - most importantly - phenomenal tooling
| (cargo, rust-analyzer), package ecosystem, and developer UX.
|
| If OCaml/StandardML/Haskell/etc had anything close to the
| ergonomics and convenience of cargo for testing, setting up
| projects, adding deps, deploying them to any OS/platform, I
| would probably write about 30-40% less Rust.
|
| I'm not saying we should add GC to Rust - but I think there
| is space for a Rust+GC language (or just improved ecosystem
| and tooling for other functional langs)
| pjmlp wrote:
| Like opam and stackage?
| shpongled wrote:
| I'm familiar with both - but not really the same order of
| magnitude. crates.io lists >120k packages. opam has
| ~4500, and stackage ~3000. And the package ecosystem is
| only part of it. Admittedly, I haven't written any OCaml
| or Haskell in the last year or so, but my experience of
| just getting a new library or executable up and running
| isn't comparable to cargo either. But also hard to
| separate my strong familiarity with Rust from the
| comparison.
| pjmlp wrote:
| Quality matters more than quantity.
|
| I rather have one package that does it right, than 20
| doing a different set of 80% from what is needed.
| shpongled wrote:
| Who wouldn't. But let's not pretend that stackage and
| crates.io are covering the same functionality space.
|
| As just one example: there are no OCaml or Haskell
| packages for natively reading or writing parquet files.
| There are at least 2 such packages for Rust.
| pjmlp wrote:
| Yeah, yet it doesn't matter to me, I don't even know what
| they are about.
|
| Likewise, despite crates.io, Rust is missing lots of
| stuff I miss from Java, C# and C++.
|
| Plenty of stuff isn't there yet.
| twic wrote:
| It's not a dumb take. It's certainly true that Rust with GC
| would not solve the problems Rust was intended to solve, and
| would really not be Rust.
|
| But that doesn't mean Rust with GC wouldn't be a nice
| language! I don't know of any language that really fits that
| description.
|
| In fact, I'll pick just four things I like about Rust: a good
| type system, a supportive, productive community with a focus
| on doing real stuff, enough mindshare and buzz not to be
| irrelevant, and being not totally insane.
|
| The first criterion rules out all untyped languages, Go, and
| the blue collar languages like Java and C#. The second rules
| out all existing functional languages. The third rules out
| whatever niche language you're about to suggest. The fourth
| rules out C++.
|
| There really isn't a mainstream managed language which has
| managed to thread both the design and community needles the
| way Rust has.
| WorldMaker wrote:
| C# probably has more of the "good type system" you are
| looking for than not these days (and every compiler release
| at this point seems to better it) as it keeps borrowing
| clothes from F# and the ML family of languages. The biggest
| missing tooth in the type system that most really want are
| discriminated unions at this point, and you can build
| things like them by hand or using common libraries
| (including building them relatively easily in F# in a way
| that is exportable to C#; it's not the most ergonomic
| answer, but it is fine sometimes). It's also something that
| seems increasingly likely to show up natively in a future
| compiler release.
| kaba0 wrote:
| Scala seems to fit all of your requirements.
| mhh__ wrote:
| It's true that Rust would probably not exist today if it
| didn't focus on lifetimes but I think a lot of Rust
| programmers now either don't care or literally don't know
| anything about memory management e.g. Every rust programmer
| that I've met (my age at least) has taken to rust because
| they like it as a language and are often trying to avoid
| learning C++ (note that is completely different from looking
| for a language _after_ having learnt C++).
|
| Rust as a midpoint between Haskell and C, so really C with a
| nicer type system, combined with momentum/meme status is more
| important today than it's memory safety story. Memory safety
| just isn't the buzzword that it was even in 2019 let alone
| 2015.
|
| > And you can probably bang out one yourself if you want.
|
| I've written my own dataflow analysis code, aimed at lifetime
| checking, too, but that's not really the point is it?
| zer8k wrote:
| Rust is nothing like C. It's not even a midway point
| between C and Haskell.
|
| Rust is C++ with stronger typing and a better memory
| management model. Comparisons between Rust and C aren't
| really accurate because Rust is the modern stainless steel
| kitchen sink that C++ can't be. Without its lovely ADTs and
| borrow checker it's just C++ with a better package
| management story. Nothing wrong with that. It won't replace
| C.
| kaba0 wrote:
| Targeting C for replacement wouldn't even have been a
| great goal, C++ is the de facto performance queen in the
| industry, follow its suit.
| pjmlp wrote:
| That language is called OCaml.
| galangalalgol wrote:
| OCaml doesn't have a borrow checker (yet). I'm sold on having
| a borrow checker, not just for memory, but for io of any
| sort. I'm not sure what adding GC to a borrow-checked
| language allows for other than catching self referencing but
| unreachable loops?
|
| I definitely think a language that has room for abstractions
| that aren't zero cost, while still having a borrow-checker
| makes sense. Using rust to make that language also makes
| sense. I think implementing a vm or interpreter entirely in
| safe rust would effectively force borrow checking on the
| resultant language wouldn't it?
|
| Ruby seems like a good place to start the experiment. I'd
| prefer it end up with something more like julia or python
| just for my own use cases.
| pjmlp wrote:
| OP was asking for a language like Rust, using a GC instead,
| duh!
|
| As for having both, I rather take Swift memory ownership or
| Linear Haskell approach, or even D/C#/Nim/Go.
|
| The productivity of automatic memory management, with the
| tooling to go low level C and C++ style, if and when it is
| really needed.
| galangalalgol wrote:
| I would have considered the borrow checker and zero cost
| abstractions to be rusts defining characteristics, so
| ocaml isn't anything like rust except some syntax and
| that they are both multi-paradigm. And then my point is
| that as the author points out, you don't really need a GC
| if you have a borrow checker. If you want a language with
| few costly abstractions, but with a GC, go or D come to
| mind I guess? But the borrow checker is such a big piece
| of rust I don't really consider those similar languages
| either.
|
| If the idea is for a safe but more ergonomic language at
| the cost of performance, I think you'd be focused on
| making clone and copy implicit when a move doesn't work,
| getting rid of explocit borrows/references, and perhaps
| figuring out mutability without needing to annotate it.
| The absolute type safety is probably the biggest
| speedbump I hit in daily usage though, not the borrow-
| checker. A strong but duck/dynamic type system might
| solve that?
| pjmlp wrote:
| Most people when they say like Rust, they mean ML syntax,
| as Rust is the very first experience with Hindley-Milner
| type systems.
|
| Borrow checker alone makes several algorithms and data
| structures quite hard to implement.
|
| Swift memory ownership model, Linear Haskell, Hylo (nee
| Val), Chapel, Ada formal proofs, D's ownership model, are
| all examples where language designers decided borrow
| checker alone is too much to ask for.
| galangalalgol wrote:
| Swift's proposal with copy on write is the sort of thing
| I was suggesting to make the borrow checker more
| ergonomic. It is going to be optional though, which I
| think takes some of the utility away. I want much
| stronger safety guarantees in the overall infrastructure
| than that would provide. Linear types in haskell would
| provide that, but speaking of some data structures being
| difficult, Haskell makes largely the same ones
| problematic. Ada has a borrow checker of sorts, but it
| makes unchecked_deallocate UB. As long as you don't free
| memory it doesn't have UB. That isn't a bad trade really,
| but it still isn't completely impossible to do bad things
| with access types and they aren't that ergonomic either.
| I don't know much about the others you mentioned. There
| is definitely a space for a friendlier enforcement of
| ownership that makes strong guarantees. But again, the
| borrow checker isn't my main pain point in rust, its the
| static strong typing with no duck typing for generics. It
| makes for nice error messages, but it makes other stuff
| hard.
| pjmlp wrote:
| No need to call unchecked_deallocate directly, unless one
| isn't using controlled types, and even better SPARK
| features.
|
| No UB to worry about.
| chc4 wrote:
| Nice blog post! I also wrote a concurrent reference counted cycle
| collector in Rust (https://github.com/chc4/samsara,
| https://redvice.org/2023/samsara-garbage-collector/) though never
| published it to crates.io. It's neat to see the different choices
| that people made implementing similar goals, and dumpster works
| pretty differently from how I did it. I hit the same problems wrt
| concurrent mutation of the graph when trying to count in-degree
| of nodes, or adding references during a collection - I didn't
| even think of doing generational references and just have a
| RwLock...
| nextaccountic wrote:
| Why not publish to crates.io? This seems useful
| chc4 wrote:
| Honest answer is mostly that I forgot; there was one more
| small TODO that I needed to fix before I wanted to publish
| it, and then I got busy and forgot about it.
| claytonwramsey wrote:
| That's really cool! Thanks for the kind words. I remember
| peeking at it while hunting for collectors to benchmark
| against.
| zengid wrote:
| This is really impressive work and I applaud the author for
| building something that I'm sure lots of folks will find
| interesting and possibly very useful, but I want to say that I
| think there's a _reason_ reference semantics (using and sharing
| heap allocations) are hard in Rust: It 's because references are
| just plain hard to use safely. Rust tries to make the gnarly
| things awkward, but what is the easy path?
|
| If you look at the Hylo (formerly Val as of 3 days ago) language
| [0] that folks like Dave Abrahams have been working on, they're
| outright saying to just not use reference semantics if you can
| avoid it. In this talk at CppCon about Value Semantics, Dave
| argues that we should be decoupling object graphs from access to
| objects [1]. That's right folks, using `usize` indexes into
| collections, or adjacency lists, isn't just a hack to get around
| the borrow checker as people like Jonathan Blow have famously
| critiqued [2], it may just be _the right way of doing things_.
|
| [0] https://www.hylo-lang.org
|
| [1] https://youtu.be/QthAU-t3PQ4?t=2354
|
| [2] https://www.youtube.com/watch?v=4t1K66dMhWk
| IshKebab wrote:
| > just a hack to get around the borrow checker as people like
| Jonathan Blow have famously critiqued [2], it may just be the
| right way of doing things.
|
| I dunno, isn't a usize index into a collection essentially just
| the same as a pointer? Ok it's a bit safer because you at least
| know the type of the object you're accessing is correct, but
| you can still get e.g. use after free bugs.
|
| I think Jonathan Blow is right and wrong - it is a hack to get
| around the borrow checker, but also that's totally fine. The
| borrow checker still works for the other 95% of code you are
| writing. Nobody ever claimed it was the perfect solution to
| every problem.
| claytonwramsey wrote:
| I know of a lot of people trying to do "arena" GC in Rust using
| integer indices. I think that's an excellent approach, but it
| does mean you have to pass in the arena as a resource to every
| single function which produces an allocation. This isn't even
| necessarily a bad thing - the net result is a lot like Zig's
| allocator API.
|
| I know there's a nontrivial overhead to using index-based
| references, especially since CPUs can't easily predict load
| operations. This gives me an idea: create a CPU architecture
| where pointers are actually (object, offset) tuples, enabling
| better on-metal performance with index-based references. I've
| neither the time, money, expertise, nor energy to implement
| such a thing, but it would be really cool if it existed - maybe
| CHERI is a close approximation, though I haven't looked very
| closely at it.
| gpderetta wrote:
| > it does mean you have to pass in the arena as a resource to
| every single function which produces an allocation
|
| Implicit arguments! I'm more and more convinced that this
| form of controlled dynamic scoping dynamic scoping should be
| embraced by more languages.
|
| > create a CPU architecture where pointers are actually
| (object, offset) tuples
|
| Segments might yet see a renaissance (and CHERI might be
| indeed be a form of it).
| norir wrote:
| Yes, the obvious way to do this is have the compiler add a
| context parameter to every function that allocates memory
| or dereferences a pointer (semantically). To run the
| program, the main function ala c initializes the context
| and passes it in to the actual main program implementation.
| When the program finishes, all the memory can just be
| released in bulk in the wrapper main that initialized the
| context.
| sgtnoodle wrote:
| How is that much different than just using malloc and
| then letting the OS clean up when the process exits?
| greiskul wrote:
| By having implicit arguments, a framework can decide when
| to override the implictness and be explicit to use a
| different arena when needed.
|
| Also, many process like servers are long lived, so the OS
| clean up strategy does not work for them, and they are
| exactly where you want some strategy for memory
| allocation like arenas to prevent memory fragmentation.
| nextaccountic wrote:
| This is actually a serious proposal for the Rust
| language, it's called contexts and capabilities
|
| Here's a blog post from 2021
| https://tmandry.gitlab.io/blog/posts/2021-12-21-context-
| capa...
|
| This isn't implemented yet, and the proposal never
| matured into a proper RFC, but there's high interest for
| some solution along those lines
| nextaccountic wrote:
| And just a note, this is called Reader monad in Haskell
|
| In other words, receiving an implicit parameter is a kind
| of effect. So another way to provide this feature is to
| have a full blown effect system (which is much more
| general and is probably overkill, but there's also some
| interest for that)
| mhink wrote:
| > Implicit arguments! I'm more and more convinced that this
| form of controlled dynamic scoping dynamic scoping should
| be embraced by more languages.
|
| This might be an odd comment, but this is sorta similar to
| React's Context mechanism. For what it's worth, it makes a
| lot of things easier but it can definitely also add a lot
| of surface area for complexity. Although it's certainly
| better than relying on globals all over the place.
___________________________________________________________________
(page generated 2023-08-14 23:00 UTC)