[HN Gopher] Borrow Checking, RC, GC, and Eleven Other Memory Saf...
___________________________________________________________________
Borrow Checking, RC, GC, and Eleven Other Memory Safety Approaches
Author : PotatoPancakes
Score : 167 points
Date : 2024-12-16 21:25 UTC (4 days ago)
(HTM) web link (verdagon.dev)
(TXT) w3m dump (verdagon.dev)
| andrewstuart wrote:
| I'd love to see a language that kept everything as familiar as
| possible and implement memory safety as "the hard bit", instead
| of the Rust approach of cooking in multiple different new sub
| languages and concepts.
| brabel wrote:
| The author of the post is trying pretty much that with his
| language, Vale.
| phicoh wrote:
| As a long time C programmer I like Rust because it combines two
| things from C that are important to me (low runtime overhead,
| no runtime system required) with a focus on writing correct
| programs.
|
| Memory safety is just one aspect where the compiler can help
| making sure a program is correct. The more the compiler helps
| with static analysis, the less we need to rely on creating
| tests for edge cases.
| zamalek wrote:
| > Memory safety is just one aspect
|
| I feel as though not enough attention is given to how std is
| designed. For example: [u8], str, Path, and OsStr may be
| confusing at first, but when you understand why they are
| there any other approach feels icky. std guides you down a
| path of caring about things that really should matter (at
| least if you're only unwrapping provably safe values).
|
| Have you considered what happens if not-utf8 data winds up in
| an environment variable that you are writing to stdout? What
| if it contains malicious VT commands?
| shiomiru wrote:
| > Have you considered what happens if not-utf8 data winds
| up in an environment variable that you are writing to
| stdout? What if it contains malicious VT commands?
|
| Unless you're talking about terminal bugs in parsing
| invalid UTF-8 - and parsing invalid UTF-8 is easier than
| rendering _valid_ UTF-8 - VT commands are UTF-8 compatible.
| You just need to embed an ASCII escape character.
| adgjlsfhk1 wrote:
| yeah I think this is an area where rust (and python) just
| get it wrong. files, the Internet and input devices can
| all give you invalid Unicode. IMO it's better to have a
| primary string type that includes invalid Unicode since
| most algorithms will handle it correctly anyway, and the
| ones that won't can pretty clearly check and throw errors
| appropriately (especially since very few algorithms work
| correctly for all of Unicode in the first place)
| dwattttt wrote:
| There's a primary type that holds invalid utf8; it's
| [u8]. If you want a string from that, you try turn it
| into a string, and deal with the errors then.
|
| If an algorithm works for invalid Unicode, it should
| probably be an algorithm on bytes, not strings.
| ramon156 wrote:
| Tbf I would already consider a different language when it has
| al the nice syntax sugar and design choices of Rust. I like
| almost every choice they made, and I miss things like `if let
| Some` or unwrapping in other languages. It's just not the same
| chrismorgan wrote:
| As an experienced Rust developer, I have absolutely no idea
| what you mean by this. Could you write a little more about what
| you have in mind, and even what you mean by sub-languages and
| concepts in Rust?
| karmakurtisaani wrote:
| Isn't that just C/C++?
| frou_dh wrote:
| "Familiar" is subjective so it's not really something to hang
| your hat on.
| nicoburns wrote:
| Familiar to whom? I came from a JavaScript background, and
| Rust's syntax and "functional lite" style felt very familiar.
| pornel wrote:
| Safety is not an extra feature a'la carte. These concepts are
| all inter-connected:
|
| Safety requires unions to be safe, so unions have to become
| tagged enums. To have tagged enums usable, you have to have
| pattern matching, otherwise you'd get something awkward like
| C++ std::variant.
|
| Borrow checking works only on borrowed values (as the name
| suggests), so you will need something else for long-lived/non-
| lexical storage. To avoid GC or automatic refcounting, you'll
| want moves with exclusive ownership.
|
| Exclusive ownership lets you find all the places where a value
| is used for the last time, and you will want to prevent double-
| free and uninitialized memory, which is a job for RAII and
| destructors.
|
| Static analysis of manual malloc/realloc and casting to/from
| void* is difficult, slow, and in many cases provably
| impossible, so you'll want to have safely implemented standard
| collections, and for these you'll want generics.
|
| Not all bounds checks can be eliminated, so you'll want to have
| iterators to implement typical patterns without redundant
| bounds checks. Iterators need closures to be ergonomic.
|
| ...and so on.
|
| Every time you plug a safety hole, it needs a language feature
| to control it, and then it needs another language feature to
| make this control fast and ergonomic.
|
| If you start with "C but safe", and keep pulling that thread,
| nearly all of Rust will come out.
| AlotOfReading wrote:
| I've experienced that frustration myself several times and
| tried to do "rust but simpler". I just recently failed
| attempt #4 at not reinventing rust but worse. Attempt #2
| ended with Erlang but worse, which was a pleasant surprise.
| nahuel0x wrote:
| It's surprising to see an article with such a large encompassing
| of different techniques, hybrid techniques and design
| interactions with the type system, but is more surprising that a
| whole dimension of memory (un)management was left out: memory
| fragmentation
| Quekid5 wrote:
| It's probably because fragmentation isn't a safety issue. (In
| the sense of 'safety' being discussed here.)
| galangalalgol wrote:
| It doesn't create UB, but it is something safety critical
| software has to address.
| Quekid5 wrote:
| ... which is why I had that little bit at the end there.
| willvarfar wrote:
| Meta comment, but I really like the formatting of the blog post!
|
| It reminds me of the early days of the web, when text was king
| and content was king. I particularly like the sidenotes in the
| margins approach.
|
| (Hope the author sees this comment :) Hats off)
| brabel wrote:
| Yeah the author always uses this in his blog about his
| language, Vale (which is very unfortunately not being developed
| anymore, at least for now). The other posts are also worth a
| read: https://vale.dev/
| ivell wrote:
| He now works on Mojo, to bring linear types into Mojo.
| dgan wrote:
| I am sorry, I am maybe dumb but i can't see the 14 techniques
| been listed anywhere? Where do i even click?
| hawski wrote:
| You need to read the post.
|
| > Wait a minute, this list goes to 17, yet the intro only
| mentions 14! I actually did that because a couple might
| overlap and a couple of them are half-approaches, and that
| last one is just here for fun. Besides, as I learn more
| approaches and add them to the list, the title will get more
| and more out of date anyway.
| _kb wrote:
| Side notes are a great layout for most deeper reads.
|
| There's some great tooling for that via
| https://edwardtufte.github.io/tufte-css/ and https://tufte-
| latex.github.io/tufte-latex/.
| chrismorgan wrote:
| > _Cursed_
|
| With an acute accent, that should be roughly /,ke:r'seId/ "curse-
| ay-d". (Think "cafe" or "sashayed".)
|
| The stylised pronunciation being evoked is roughly /'ke:r,sed/,
| "curse-ed", and would be written with a grave accent: "cursed".
| rzzzt wrote:
| Italians will get close to the first pronunciation both ways, I
| think. The Zalgo line noise is an international way of
| signaling the level of curse in writing.
| bloppe wrote:
| Accents mean different things in different languages. I'm
| assuming the author was invoking Spanish, where the acute
| accent impacts syllable stress but not pronunciation.
| mgaunard wrote:
| The fact that re-using a slot for a different object of the same
| type is considered a memory safety technique is ridiculous.
| obl wrote:
| It is not ridiculous at all. Those things have pretty precise
| definitions and type segregation absolutely does remove a bunch
| of soundness issues related to type confusion.
|
| You can think of it as the rather classic "Vec of struct +
| numeric IDs" that is used a lot e.g. in Rust to represent
| complex graph-like structures.
|
| This combined with bound checking is absolutely memory safe. It
| has a bunch of correctness issue that can arise due to index
| confusion but those are not safety issues. When combined with
| some kind of generational counters those correctness issue also
| go away but are only caught at runtime not at compile time (and
| they incur a runtime cost).
|
| Rust's memory safety is about avoiding liveness issues (that
| become type confusions since all memory allocators will reuse
| memory for different types), nothing more, nothing less.
| jandrewrogers wrote:
| FWIW, arrays of structs + integer handles is the primary way
| objects are represented in performance-engineered C++.
| gpderetta wrote:
| > but those are not safety issues.
|
| there are not _memory_ safety issues. But they definitely can
| lead to security issues with some sort of confused deputy
| attack.
|
| For example a capability based system that relied on just
| this form of memory safety would be pointless.
|
| Of course this can be mitigated by adding version counters to
| objects and object selectors.
| adamrezich wrote:
| "Safety" is a very overloaded English word with strong
| connotations, and the popularization of it in the context of
| "memory safety" has been good in some ways, but has really
| poisoned the discourse in many others.
| AlotOfReading wrote:
| It absolutely is. Some embedded mallocs do this under the hood
| against a predefined linker table. Genuinely helps even with
| the restrictions it imposes.
| eddd-ddde wrote:
| Are you implying that it is unsafe? Or that it is unrelated to
| safety properties?
| mgaunard wrote:
| It just shows memory safety is a joke, simply replacing a
| class of bugs with another.
| IshKebab wrote:
| It isn't a joke. With memory safety bugs the value of an
| object can unexpectedly be _any bit pattern_. That breaks
| the assumptions of basically every language and leads to
| pretty much anything happening.
|
| If you have an array of objects of the same type and you
| just pick the wrong one, then the data still has to be a
| valid bit pattern. Yes it might still be a security bug,
| but it's much less likely because you aren't completely
| subverting the language.
|
| Surely you don't think all bugs are the same because they
| are all bugs?
| mgaunard wrote:
| It takes a lot more for a program to be correct than
| having valid bit patterns.
|
| To begin with, the whole point of classes is to maintain
| invariants. Guaranteeing that a location in memory
| matches the valid bit patterns of its members is far from
| sufficient.
| IshKebab wrote:
| > It takes a lot more for a program to be correct than
| having valid bit patterns.
|
| Obviously. I never said otherwise. What's your point?
| ptx wrote:
| Maybe we need to introduce the term "value safety" to
| complement "type safety".
|
| If a language is merely type-safe, then it might be OK to
| silently replace a value with a different one of the same type,
| sure, fine. Who cares if the program transmits the wrong
| message to the wrong recipient as long as it's definitely some
| message and some recipient?
|
| But a _value-safe_ language, I suggest, is one that doesn 't
| pull this kind of switcheroo.
| jcarrano wrote:
| Reminds me of a video I saw about "cloning" on Super Mario
| 64[1]. SM64 uses slots and one can mess around with slot
| deletions to get random objects.
|
| [1] https://www.youtube.com/watch?v=X2AhyDI58-I
| hawski wrote:
| That is very informational. Thank you.
|
| I am interested in Vale and it feels very promising, though
| because my interested in bootstrapping I don't like that it is
| written in Scala. I know, that is shallow, but that's a thing
| that limits my enthusiasm.
|
| If you are like me and don't like jumping around between notes
| and text and you prefer to read the notes anyway, here is a
| little snippet you can run in Web Inspector's Console:
| document.querySelectorAll(".slice-contents a[data-
| noteid]").forEach(e => {document.querySelectorAll('.slice-notes
| [data-noteid="' + e.attributes["data-noteid"].nodeValue + '"]
| p').forEach(p => {p.style.fontSize = 'smaller';
| e.parentNode.insertBefore(p, e)}); e.remove() })
|
| It will replace note links with notes themselves making them
| smaller, because they will not always fit smoothly.
| tialaramex wrote:
| The list gets very woolly by the end. CHERI exists (though not at
| volume), Cornucopia Reloaded is a research paper, "plus some
| techniques to prevent use-after-free on the stack" is entirely
| hand waving.
|
| It is really good as food for thought though.
| nemetroid wrote:
| No mention of RCU?
| gpderetta wrote:
| RCU, despite the name, is indeed a reclamation algorithm, but
| not a general one. I.e. you would use RCU (or some other
| deferred reclamation algorithm like hazard pointers) for
| specific data structures when you do not have generalized
| garbage collection.
|
| A generalized RCU is just a tracing GC.
| nemetroid wrote:
| The part where the article limits itself to general
| techniques eludes me.
| DanielHB wrote:
| I am not experienced with rust and borrow checkers, but my
| impression is that borrow checkers also statically ensures
| thread/async safety while most other memory safety systems don't.
| Is this accurate?
| PeterWhittaker wrote:
| The first part - that the Rust borrow checker and overall
| memory model ensures thread/async safety - is true. I cannot
| speak to the second part - that other systems don't have this
| assurance.
| tialaramex wrote:
| Just the borrowck isn't enough, you need the Send and Sync
| marker traits. Marker traits are something lots of languages
| _could_ do but they 'd be useless (or always unsafe) without
| a lot of other machinery Rust had already.
| kibwen wrote:
| And Sync might just be there for performance, I think you
| could get away with only Send if you didn't mind some
| additional copying?
| DanielHB wrote:
| > that other systems don't have this assurance
|
| My understanding is that most (all?) GC languages are memory
| safe, but do not ensure statically verifiable thread safety
| at all. Like Java, Go, C#, Python, etc.
| kibwen wrote:
| The borrow checker is only one component of the means by which
| Rust statically enforces thread safety. If you design a
| language that doesn't allow pointers to be shared across
| threads at all, then you wouldn't need a borrow checker.
| Likewise if you have an immutable-only language. What's
| interesting about Rust is that it actually supports this
| safely, which is still unbelievable sometimes (like being able
| to send references to the stack to other threads via
| std::thread::scoped).
| nemaar wrote:
| > If you design a language that doesn't allow pointers to be
| shared across threads at all, then you wouldn't need a borrow
| checker.
|
| Is that actually true? I'm pretty sure you need the borrow
| checker even for single threaded Rust to prevent use after
| frees.
| chlorion wrote:
| Theres even more than just UAFs that you have to worry
| about in a single threaded context, but yes you are
| correct.
|
| Here's a good post that talks about why shared mutability
| even in single threaded contexts is dangerous:
| https://manishearth.github.io/blog/2015/05/17/the-problem-
| wi...
| kibwen wrote:
| Let me clarify: "wouldn't need a borrow checker for the
| specific requirement of ensuring thread safety". Clearly
| the borrow checker is quite useful in single-threaded
| contexts on its own. :P The point is just that it's
| perfectly valid to have a language that doesn't have
| "reference semantics" at all.
| jerf wrote:
| You may have to define your terms more carefully and extend
| past "doesn't allow pointers", but it can be true. Look to
| Erlang; the key to Erlang's safety isn't actually its
| immutable values, but the fact that no messages can travel
| between the various "processes" (threads in conventional
| parlance, just not "OS threads") that carry any sort of
| reference or pointer is what makes it so Erlang is safe
| without any concept of borrow checking. It also
| semantically copies all values (there are some internal
| optimizations for certain types of values that make it so
| they are technically not copied, but at the language level,
| it's all copies) and each process is GC'd, so in terms of
| the Rust borrow checker I mention this _only_ in the
| context of cross-thread sharing safety.
|
| But in general, if threads can't communicate any sort of
| pointer or reference that allows direct, unmediated access
| to the same value that some other thread will see, there's
| no need for a "borrow checker" for thread safety.
|
| (Note that "but what if I have a thing that is just a token
| for a value that people can potentially read and write from
| anywhere?" is not an exception to this, because in this
| context, such access would not be _unmediated_. This access
| would still require messages to and from the "holding"
| process. This sort of safety won't stop you from basically
| deliberately re-embedding your own new sorts of races and
| unsafe accesses in at the higher level of your own code, it
| just won't be a data race in the same way that multiple
| threads reading and writing through the same pointer is a
| data race at the lower level. The main solution to this
| problem is "Doctor, it hurts when I do this." -> "Don't do
| that.")
| alexisread wrote:
| Previous discussion:
|
| https://news.ycombinator.com/item?id=40146615
|
| https://news.ycombinator.com/item?id=41974185
| dang wrote:
| Thanks! Macroexpanded:
|
| _Borrow Checking, RC, GC, and the Eleven (!) Other Memory
| Safety Approaches_ -
| https://news.ycombinator.com/item?id=41974185 - Oct 2024 (1
| comment)
|
| _Borrow Checking, RC, GC, and the Eleven () Other Memory
| Safety Approaches_ -
| https://news.ycombinator.com/item?id=40146615 - April 2024 (68
| comments)
| amelius wrote:
| I like many of the ideas of Rust, but I still think it is an
| unsuitable language for most projects.
|
| The problem is that it is very easy to write non-GC'd code in a
| GC'd language, but the other way around it is much much harder.
|
| Therefore, I think the fundamental choice of Rust to not support
| a GC is wrong.
| FridgeSeal wrote:
| It does have a GC.
|
| It just runs at compile time. Bonus feature, it helpfully
| prevents a number of common bugs too.
| amelius wrote:
| GC is short for automatic GC.
|
| If you have to do it yourself, then it does not "have" a GC.
| FridgeSeal wrote:
| Ssshhh you're ruining my silly and "hasn't gone over
| terribly well" joke.
| f1shy wrote:
| Yet another PoV: for some things with critical timing or so, GC
| might be a problem. But most of the time, it isn't. The
| performance/predictability topic could also be reviewed...
|
| I was talking with a colleague about that, he said "in C I know
| exactly where things are when" And I replied that under any OS
| with virtual memory, you have basically no clue where are
| things at any time, in the N levels of cache, and you cannot do
| accurate time predictions anyway... [1]
|
| I'm convinced today GC is the way to go for almost all. And I
| was until 5 years ago or so, totally opposed to that view.
|
| [1] https://news.ycombinator.com/item?id=42456310
| pjmlp wrote:
| Even with critical timing, real time GCs exist for decades
| now, PTC and Aicas are two surviving companies selling
| software tooling for embedded markets, including their own
| JVM implementations, with AOT compilers, bare metal
| deployments and real time GC.
|
| Many of their customers are factory processes and military
| deployments with weapons control, two scenarios where any
| kind of stall might produce deadly results.
| f1shy wrote:
| Thank you so much! Now I have a more powerful argument,
| when at work somebody says a motor ECU cannot use GC,
| because of time constraints! Of course I assume there might
| be a catch from the cost PoV, where Mil may be ok,
| automotive consumer goods may be a problem (today)
| pjmlp wrote:
| Here,
|
| https://www.ptc.com/en/products/developer-tools/perc
|
| https://www.aicas.com/wp/products-services/jamaicavm/
| dwattttt wrote:
| While I agree with the point, the word "surviving" doesn't
| do that point any favours.
| pjmlp wrote:
| Sadly in modern times many devs aren't willing to pay for
| their tools...
|
| Yet they surely appreciate the income they earn by using
| them.
| 3836293648 wrote:
| You've got this one wrong. Rust is designed for a specific use
| case. Most projects are not that use case. Therefore the choice
| to use Rust is wrong.
|
| If GC is an option and you want all the nice parts of Rust, use
| OCaml
| in-pursuit wrote:
| I think OPs general point, although maybe not what they
| stated is correct: it's easy to write GC'd code. It's "easy"
| to write code with manual memory management. It's "easy" to
| write RC code. But it's hard to write borrow checker code.
| And that will probably limit adoption, even though the goals
| of Rust are good.
| pjmlp wrote:
| If Go designers weren't so much anti-modern language design,
| in many scenarios where people are rushing to do RIIR, they
| would be better served with Go, if those modern features were
| part of the language.
|
| Having said that, there are still OCaml (as you noted),
| Haskell, .NET languages with Native AOT, JVM languages with
| GraalVM/OpenJ9, D, Nim, Swift, ....
|
| And if one wants to get fancy with type systems, Idris,
| Dafny, FStar,..
| amelius wrote:
| > If GC is an option and you want all the nice parts of Rust,
| use OCaml
|
| So are you saying it would be possible to use a hypothetical
| "non-GC-enabled" OCaml compiler that complains if GC'd code
| is invoked/generated, and it would be a similar experience as
| using Rust?
| johnnyjeans wrote:
| No, simply because GC isn't an optional part of ocaml's
| design. You can "disable" it only in the sense that you
| stop any frees from happening.
|
| In order to do away with Garbage collection you'd need to
| completely change the type system, and while you might get
| a result which ends up looking a lot like ocaml, it won't
| be ocaml any more than ocaml is sml.
|
| This _is_ what Rust started out as, by the way. In my
| opinion, it should have stayed that way. Post-Mozilla Rust
| has been nothing but a gigantic disappointment.
| zozbot234 wrote:
| Early versions of Rust "started out" _with_ GC: the
| language was broadly similar to Golang, though with a lot
| of influence from Ocaml and functional languages more
| broadly. The comprehensive use of borrowck to avoid GC
| altogether was only implemented around 2014 or so, hence
| shortly before the Rust "1.0" release in 2015.
| amelius wrote:
| > Most projects are not that use case. Therefore the choice
| to use Rust is wrong.
|
| Do you think that projects that have a large GUI component
| should be written in Rust?
|
| What if a project has both a "systems" and a GUI component to
| it?
| naasking wrote:
| More people need to read up on C#'s ref's:
|
| https://em-tg.github.io/csborrow/
|
| These kinda-sorta fall under borrow checking or regions, just
| without any annotations. Then again, Ada/Spark's strategy also
| technically falls under Tofte-Talpin regions:
|
| https://www.cs.cornell.edu/people/fluet/research/substruct-r...
| mastax wrote:
| Yeah C# is very well designed for gradually introducing low
| level concepts for performance.
| jimbob45 wrote:
| The reverse is arguably the one trait Rust is missing that is
| holding it back from mass adoption. You can write high-level
| C# code and seamlessly introduce low-level concepts into it
| as you see fit. However, although you can write low-level
| Rust code easily, introducing high-level concepts is very
| painful and, in many cases, impossible.
| akkad33 wrote:
| Also F#
| neonsunset wrote:
| As of now - F# needs more work w.r.t. support for [<Struct;
| IsByRefLike>] types and byref<'T>s. There's a reason ref
| lifetime analysis in Roslyn is quite a sophisticated part
| of its implementation even if most developers aren't aware
| of it.
|
| F# has other cool features like `inline` bindings and
| [<IlineIfLambda>] function attributes but I found it more
| difficult to work with for systems programming when
| attempting to stay within safe constructs.
|
| But if you don't need byrefs, ref structs and spans, then
| regular pointer-based or even array-based code is quite
| pleasant to write, for example https://benchmarksgame-
| team.pages.debian.net/benchmarksgame/...
|
| With that said - F# is an incredible language, but for
| systems programming you are likely to get better results in
| C#. I hope eventually someone contributes subsequent spec
| and compiler work to change this (and if you think you have
| time and could try it - please do! it's a nice small
| community).
| akkad33 wrote:
| Doesn't F# have byrefs, inrefs and outrefs. How are they
| different from C# refs? Asking as someone with surface
| level knowledge of both languages
| https://learn.microsoft.com/en-us/dotnet/fsharp/language-
| ref...
| bob1029 wrote:
| Here's an example where ref returns make a big impact:
|
| https://github.com/prasannavl/WinApi
|
| The performance of this interop layer is so close to native
| it's difficult to argue for doing things the much more painful
| way.
| Someone wrote:
| Isn't that a form of _"5. a mechanism where a pointer cannot
| point to an object that is more deeply scoped than itself"_?
|
| Also, since it's an error, I guess it must be different from
| _gcc_ and _clang_ (and, likely other C compilers) _-Wreturn-
| local-addr_ warnings
| (https://www.emmtrix.com/wiki/Clang:Flag/-Wreturn-local-addr),
| which can have false positives.
|
| What's the difference? Is the language stricter, disallowing
| some constructs that are valid, but hard to prove or is the
| error not always triggered for buggy code?
| naasking wrote:
| > Isn't that a form of "5. a mechanism where a pointer cannot
| point to an object that is more deeply scoped than itself"?
|
| That constraint is just a Tofte-Talpin region restriction,
| eg. lexically scoped regions.
| xmcqdpt2 wrote:
| Not a fan of the framing of the article. Firstly, there are
| millions of Mayans alive today,
|
| https://en.wikipedia.org/wiki/Maya_peoples
|
| and secondly, the reason why the pre-Colombian cultural texts and
| script are not in use today, even by the people who speak the 28
| Mayan languages currently in use, is because of genocide by
| Columbus and those that followed. The Catholic church destroyed
| every piece of Mayan script they could get their hands on.
|
| The article reads like the author is not aware of these basic
| facts of American geography and history.
| constantcrying wrote:
| I thought it was just annoying to read. Irrelevant analogies
| never helped me understand anything.
| 4ad wrote:
| > Interaction nets are a very fast way to manage purely immutable
| data without garbage collection or reference counting.[...] HVM
| starts with affine types (like move-only programming), but then
| adds an extremely efficient lazy .clone() primitive, so it can
| strategically clone objects instead of referencing them.
|
| This is wrong, Interaction nets (and combinators) can model any
| kind of computational systems, including ones that use mutation.
| In fact, ICs are not really about types at all, although they do
| come from a generalization of Girard's proofs nets, which came
| from work in linear logic.
|
| The interesting thing about ICs is that they are beta-optimal
| (any encoding of a computation will be done in the minimum number
| of steps required -- there is no _useless_ work being done), and
| maximum-parallel with only local synchonization (all reduction
| steps are local, and all work that _can_ be parallelized _will_
| be parallelized).
|
| Additionally ICs have the property that any encoding of a
| different computational system in ICs will preserve the
| asymptotic behavior of all programs written for the encoded
| computational system. In fact, ICs are the only computational
| system with this property.
|
| Interaction nets absolutely require garbage collection in the
| general sense. However, interaction combinators are linear and
| all garbage collection is explicit (but still exists). HVMs
| innovation is that by restricting the class of programs encoded
| in the ICs you can get very cheap lambda duplication and eschew
| the need for complex garbage collection while also reducing the
| overhead of implementing ICs on regular CPUs (no croissants or
| brackets, see Asperti[1] for what that means).
|
| Having a linear language with the above restriction allows for a
| very efficient implementation with a very simple GC, while
| maximizing the benefits of ICs. In principle _any_ language can
| be implemented on top of ICs, but to get most benefits you want a
| language with these properties. It 's not that HVM starts with
| affine types and an efficient lazy clone operation, it's that a
| linear language allows extremely efficient lazy cloning
| (including lambda cloning) to be implemented on top of ICs, and
| the result of that is HVM.
|
| > The HVM runtime implements this for Haskell.
|
| This is very wrong. HVM has nothing to do with Haskell. HVM3 is
| written in C[2], HVM2 has three implementations, one in C[3], one
| in Rust[4], and a CUDA[5] one. HVM1 was just a prototype and was
| written in Rust[6].
|
| HOC[7], the company behing HVM provides two languages that
| compile to HVM, Bend[8], and Kind[9]. Bend is a usual functional
| language, while Kind is a theorem prover based on self types.
|
| Haskell is not involved in any of these things except that the
| HVM compiler (not runtime) is written in Haskell, but that is
| irrelevant, before Haskell it used to be written in TypeScript
| and then in Agda (Twitter discussion, sorry, no reference). It's
| an implementation detail, it's not something the user sees.
|
| Please note that HVM adds some stuff on top of ICs that makes it
| not strictly beta-optimal, but nevertheless the stuff added is
| useful in practice and the practical downgrade from theoretical
| behaviour is minimal.
|
| [1] Andrea Asperti, The Optimal Implementation of Functional
| Programming Languages, ISBN-13: 978-0060815424
|
| [2]
| https://github.com/HigherOrderCO/HVM3/blob/main/src/HVML/Run...
|
| [3] https://github.com/HigherOrderCO/HVM/blob/main/src/hvm.c
|
| [4] https://github.com/HigherOrderCO/HVM/blob/main/src/hvm.rs
|
| [5] https://github.com/HigherOrderCO/HVM/blob/main/src/hvm.cu
|
| [6] https://github.com/HigherOrderCO/HVM1
|
| [7] https://higherorderco.com
|
| [8] https://github.com/HigherOrderCO/bend
|
| [9] https://github.com/HigherOrderCO/kind
| pizlonator wrote:
| The way you make garbage collection deterministic is not by doing
| regions but by making it concurrent. That's increasingly common,
| though fully concurrent GCs are not as common as "sorta
| concurrent" ones because there is a throughput hit to going fully
| concurrent (albeit probably a smaller one than if you partitioned
| your heap as the article suggests).
|
| Also, no point in calling it "tracing garbage collection". Its
| just "garbage collection". If you're garbage collecting, you're
| tracing.
| jakewins wrote:
| Do you have any recommended reading material on this?
|
| Intuitively it feels like making it concurrent should do the
| opposite of making GC deterministic! I'd love to read something
| showing that intuition is wrong
| pizlonator wrote:
| Garbage collection handbook
|
| https://gchandbook.org/
|
| If you want to see my latest concurrent GC, see
|
| https://github.com/pizlonator/llvm-project-
| deluge/blob/delug...
|
| https://github.com/pizlonator/llvm-project-
| deluge/blob/delug...
| roetlich wrote:
| > Also, no point in calling it "tracing garbage collection".
|
| You're against more explicit naming just for the sake of it? In
| the literature reference counting is also referred to as a type
| of garbage collection, and doesn't involve tracing. If you
| talking about a specific context you can probably drop the
| "tracing", but in a general article like this it would just be
| very confusing?
|
| This way, someone can google "tracing garbage collection", and
| will find the relevant wikipedia article:
| https://en.wikipedia.org/wiki/Tracing_garbage_collection
| pizlonator wrote:
| The literature does not always put the "tracing" in front of
| the "garbage collection".
|
| For example, nobody says that Objective-C is garbage
| collected just because it has ARC. Nobody says that C++ is
| garbage collected even though shared_ptr is widespread. And
| systems that do tracing GC just call it GC (see for example h
| ttps://www.oracle.com/webfolder/technetwork/tutorials/obe/j..
| .)
|
| To think clearly about the tradeoff between GC and RC it's
| important to acknowledge the semantic differences:
|
| - GC definitely collects dead cycles.
|
| - RC knows exactly when objects die, which allows for
| sensible destructor semantics and things like CoW.
|
| - it's possible to use RC as an optimization in a GC, but
| then you end up with GC semantics and you still have tracing
| (hence: if it's got tracing, it's a garbage collector).
|
| It's a recent fad to say that RC is a kind of GC, but I don't
| think it ever took off outside academia. Folks who write GCs
| call them GCs. Folks who do shared_ptr or ARC say that they
| don't use GC.
|
| And its good if this fad dies because saying that RC is a
| kind of GC causes folks to overlook the massive semantic
| elephant in the room: if you use a GC then you can't swap it
| for RC because you'd leak memory (RC would fail to delete
| cycles), and if you use RC and swap it for a GC then you'd
| leak resources (your destructors would no longer get called
| when you expect them to).
|
| On the other hand, it is possible to change the guts of an RC
| impl without anyone noticing. And it's possible to change the
| guts of a GC while preserving compatibility. So these are
| really two different worlds.
| roetlich wrote:
| > The literature does not always put the "tracing" in front
| of the "garbage collection".
|
| Not always, but often enough that an introductory article
| that presents an overview of different memory managment
| techniques should maybe use the longer name to avoid
| confusion.
|
| And I kinda agree with you, using the name "garbage
| collection" for RC doesn't really make sense, there is no
| metaphorical garbage truck driving around to collect your
| unused memory. :)
|
| What's your opinion on the term "RC with cycle detection"
| that some use for things like Python's GC?
|
| > And it's possible to change the guts of a GC while
| preserving compatibility.
|
| Moving to a conservative GC might also introduce memory
| leaks, if you're unlucky. But yes, "tracing" gc and rc
| obeviously behave very differently, and have very different
| performance considerations.
| pizlonator wrote:
| > Not always, but often enough that an introductory
| article that presents an overview of different memory
| managment techniques should maybe use the longer name to
| avoid confusion.
|
| Referring to garbage collection as tracing garbage
| collection creates more confusion and should be avoided.
|
| It confuses folks into thinking that there is some
| garbage collection that isn't tracing. There's no such
| thing.
|
| > What's your opinion on the term "RC with cycle
| detection" that some use for things like Python's GC?
|
| Depends on how you detect cycles. Python uses a garbage
| collector. Therefore I would say that python has a GC and
| is a GC'd language.
|
| > Moving to a conservative GC might also introduce memory
| leaks, if you're unlucky.
|
| Folks who adopt conservatism in production do so only if
| they have a story for avoiding those leaks. (That's what
| we did in JavaScriptCore.)
|
| > But yes, "tracing" gc and rc obeviously behave very
| differently, and have very different performance
| considerations.
|
| Just call it "GC" and everyone will know what you mean.
| No need to be a contrarian and put "tracing" in front.
|
| And it's not perf considerations if it's the difference
| between your program running at all and crashing. Failing
| to collect all cycles as RC does would cause a program
| written in a GC'd language to simply crash if it ran for
| more than just a short while. Failing to invoke
| destructors the way RC'd programs expect, which would
| happen if you tried to switch to GC, will cause
| observably different behavior in addition to possible
| performance issues.
| roetlich wrote:
| > And it's not perf considerations if it's the difference
| between your program running at all and crashing.
|
| Yeah, that's what I meant to include by "behave very
| differently". I don't think we disagree on anything
| technical here. The problem is if you are currently
| googling for garbage collection you will mostly get
| garbage results. Here's duckduckgo to avoid my search
| bubble:
| https://duckduckgo.com/?q=garbage+collection+programming
|
| The first result is the wikipedia article: https://en.wik
| ipedia.org/wiki/Garbage_collection_%28computer... It's
| pretty bad, under "Strategies" it lists three: Tracing,
| Reference Counting and Escape Analysis. I'm sure these
| three are similar things.
|
| The second result is this blog post, also listing
| rereference counting as gc:
| https://www.freecodecamp.org/news/a-guide-to-garbage-
| collect...
|
| And the third result looks okay. Searching for "tracing
| garbage collection" has better results. The text in
| question already uses "gc" most of the time, and has a
| footnote saying:
|
| > By "garbage collection", we're referring to tracing
| garbage collection.
|
| I think that's as clear as it gets, without going on rant
| about the names of things. You are clearly an expert in
| garbage collectors, but most people in the target
| audience of that article are not. The article compares
| the differences between rc and gc. If someone then goes
| and reads the wikipedia articles about either of those
| they will be very confused because wikipedia will tell
| them rc is gc. A "fad" like this can't be undone, once a
| usage of a word becomes this popular you can't undo it.
|
| Okay, sorry, this was too long, and we agree to like 99%
| anyway. Have a nice day! :)
| pizlonator wrote:
| That reminds me of the time I went on a first date and
| she asked, "so what do you do exactly?" and I said I work
| on "garbage collection". You should have seen her face!
| azornathogron wrote:
| Using concurrent GC has various advantages but in no way makes
| it _deterministic_.
| pizlonator wrote:
| True, not by itself.
|
| But concurrent GC is the basis for making deterministic GC,
| since it gives you the option of scheduling GC work whenever
| you like rather than pausing the world.
|
| Some concurrent GCs are also deterministic while others
| aren't. I've written both kinds.
| gpderetta wrote:
| you have some very non-standard definition of determinism.
| pizlonator wrote:
| How so?
| bluGill wrote:
| Why is garbage collection called memory safety? Garbage
| collection in whatever form is only memory safe if it doesn't
| free memory that will still be used. (which means if you actually
| get all your free calls correct C is memory safe - most long
| lived C code bases have been beat on enough that they get this
| right for even the obscure paths).
|
| Use after free is important, but in my experience not common and
| not too hard to track down when it happens (maybe I'm lucky? - we
| generally used a referenced counted GC for the cases where
| ownership is hard to track down in C++)
|
| I'm more worried about other issues of memory safety that are not
| addressed: write into someone else's buffer - which is generally
| caused by write off the end of your buffer.
| constantcrying wrote:
| >Why is garbage collection called memory safety? Garbage
| collection in whatever form is only memory safe if it doesn't
| free memory that will still be used.
|
| Yes. A garbage collector is only safe if it works correctly.
| What an irrelevant observation. Nothing can guarantee that
| something works correctly if it doesn't work correctly.
| bluGill wrote:
| Keep reading. That is all a garbage collector gives me. there
| are lots of other things that that are memory unsafe that
| garbage collectors don't give me.
| foota wrote:
| To answer your question, I'd say it's memory safe when it's a
| part of the runtime. At some point, you're relying on your
| runtime to be correct, so if it says it does garbage collection
| then you can rely on it, in the same way you rely on the
| allocator not to randomly trash your memory etc.,.
| bluGill wrote:
| You misunderstand. Sure that is a part of memory safe, but
| why is the much larger problem of running off the end of the
| buffer into something else not considered a larger part. In
| my experience the later is a worse problem (the blame for
| issues goes to someone else who's code is working perfectly
| correct and so they spend months trying to find a logic error
| before someone finally looks elsewhere - often the fix is
| just a random fix by those who are at fault and so the team
| will spend months more looking before closed as "doesn't
| happen anymore, no idea why". Memory leaks by contrast are
| hard to track down, but at least they leave obvious clues and
| so the blame doesn't go to the wrong person.
| the__alchemist wrote:
| After pondering, my single favorite capability of rust is this:
| fn modify(val: &mut u8) { // ... }
|
| No other language appears to have this seemingly trivial
| capability; their canonical alternatives are all, IMO, clumsier.
| In light of the article, is this due to Rust's memory model, or
| an unrelated language insight?
| constantcrying wrote:
| How is a mutable reference as an argument to a function in any
| way unique? C++ has mutable references as arguments to
| functions.
| the__alchemist wrote:
| You're right; you can do this in C++, although the patterns
| I've seen usually involve pointers.
|
| There is no equivalent in Python, Javascript, Typescript,
| Java, or Kotlin.
| constantcrying wrote:
| >I've seen usually involve pointers.
|
| If only C++ had a thing called "references", which
| conveniently also used the "&" symbol and are a way to do
| exactly this without pointers.
|
| https://en.cppreference.com/w/cpp/language/reference
| the__alchemist wrote:
| I stand corrected on C++. Can you think of any other
| languages? Of the ones I listed, I use all of, and am
| continuously frustrated by this limitation.
| constantcrying wrote:
| >Can you think of any other languages?
|
| Yes. E.g. every language where you can pass pointers can
| do this. Reference are a construct around pointers.
|
| >Of the ones I listed, I use all of, and am continuously
| frustrated by this limitation.
|
| Why? This is only a limitation for a few basic types. All
| the language you mention allow you to modify objects
| passed to a function.
|
| If it ever were any issue though you can always return
| the modified object and overwrite the original.
|
| a = modify(a)
|
| works even if you can only pass a value.
|
| In practice this should never be a limitation, certainly
| I have never encountered it as one.
| the__alchemist wrote:
| I challenge you to write the function signature I posted
| in Python, Javascript, or Kotlin; I'm confident you will
| not be able to, and the alternatives will be more
| provincial and unergonomic.
|
| Hint: Python and Javascript will use the type passed to
| determine mutability. Kotlin will let you pass a
| mutableStateFlow etc that can be used for, e.g., top-
| level data structure, but not an arbitrary variable.
| (e.g. a data structure field, or free variable).
|
| The most popular languages, outside of C and C++, cannot
| pass pointers. And, raw pointers are not ideal; they lose
| type safety, memory safety etc.
|
| There are always canonical patterns for a given language
| to write a function like I did. I find them to be
| universally more complicated and less explicit/versatile
| than what I posted; they require code to permeate beyond
| the function signature.
| constantcrying wrote:
| It works the moment you wrap the int in a class.
|
| class A: a = 0
|
| def modify (v): v.a = 2
|
| a1 = A()
|
| a1.a = 1
|
| print(a1.a)
|
| modify(a1)
|
| print(a1.a)
| the__alchemist wrote:
| You're correct re the class wrapper (HN nesting limit).
| Can you see why that solution is complicated, and not
| versatile? (hint: This limitation will cause ripples of
| complication throughout your data structures)
| constantcrying wrote:
| I explicitly excluded primitive types in my previous
| comment.
|
| I have never seem a codebase where this was an issue. The
| most important concept in programming is defining your
| data structures. Professional programs almost never
| operate on primitive data structures, but always on
| larger structures, like classes or structs.
|
| This is why this is only a problem for you. I don't want
| to be insulting, but you need to brush up on your data
| structures. The need to pass a modifiable data structure
| almost never arises, the idiom in your OP is extremely
| uncommon and would likely not pass any code review.
| the__alchemist wrote:
| You have missed the point: Functions should not need to
| be provincial to a specific data structure. Nor should
| data structures include field wrappers to subvert a
| language's sloppy mutation semantics.
|
| The point is that with mutable references, you can modify
| only the part of the struct you need to, using a general
| function. You can use this same function to modify a
| different part of the structure, or a different
| structure, or a free variable.
|
| Your inference that preferring mutable references to
| wrapper structs implies insufficient knowledge of data
| structures is puzzling and incorrect.
| constantcrying wrote:
| You don't need to wrap fields. You just pass the classes.
|
| Think about why nobody but you is inconvenienced by this.
| I am sorry, this is a _you_ problem, because you are
| violating common idioms of the language.
|
| There is basically never a reason not to pass your data
| structure to a function, which is why what you are doing
| is an incredibly niche functionality.
|
| Also functions _should_ be particular to data structure.
| That is why types exists.
| the__alchemist wrote:
| General functions are useful.
|
| It is not a _me_ problem, which is why Rust has fixed
| this. You are arguing that functions which mutate
| parameters generally are not useful, which isn 't true.
| You have implied that I have a lack of knowledge of data
| structures, my code wouldn't pass code review. Reflect:
| You are proposing it is preferable to have a simple
| function accept a whole data structure when it may only
| act on one or more of its fields, and may be used in
| other places. That is a semantic misdirection, and
| provincialization.
| renox wrote:
| Ada has inout parameters.
| diath wrote:
| Most low-level languages support this, references in C++,
| pointers in C, ref keyword in C# and Dlang, and so on.
| It's mostly higher-level languages that do not support
| such things because it's in the VM semantics that dictate
| how objects are passed (i.e. in Lua tables are passed by
| reference, non-table values are passed by copy).
| bitbasher wrote:
| I'm kinda torn. It seems there are only three approaches.
|
| 1. laissez-faire / manual memory management (c, c++, etc)
|
| In this approach, the programmer decides everything.
|
| 2. dictatorship / garbage collection (java, go, etc)
|
| In this approach, the runtime decides everything.
|
| 3. deterministic / lifetime memory management (rust, c with
| arenas, etc)
|
| In this approach, the problem determines everything.
| ryao wrote:
| There is another option, which is to use a sound static
| analyzer that can prove the absence of memory safety issues
| like astree, and fix things that cause it to complain until it
| stops complaining:
|
| https://www.absint.com/astree/index.htm
|
| For those who think static analyzers cannot do that, notice the
| word "sound". This is a different type of static analyzer than
| the more common ones that do not catch everything.
|
| Sadly, there is no open source option that works across a broad
| range of software. NASA's IKOS is open source for example, but
| it does not support multithreading and some other things that I
| do not recall offhand, which makes it unable to catch all
| memory safety bugs in software using the features that it does
| not support. For now, people who want to use sound static
| analyzers need to use closed source tools or restrict
| themselves to a subset of what C/C++ can do so they can use
| IKOS:
|
| https://github.com/NASA-SW-VnV/ikos
| voxl wrote:
| Even in their own list of features "Memory Safety" does not
| pop up, and none of the listed features indicate to me that
| they would entail memory safety. Academics aren't publishing
| droves of work on separation logic for a problem that can be
| solved by 1980s static analyzers.
| ryao wrote:
| They say that they can prove the absence of classes of bugs
| that make up memory safety:
|
| * out-of-bounds array indexing,
|
| * erroneous pointer manipulation and dereferencing (NULL,
| uninitialized and dangling pointers),
|
| * read access to uninitialized variables,
|
| NIST gave them a glowing review:
|
| https://nvlpubs.nist.gov/nistpubs/ir/2020/NIST.IR.8304.pdf
|
| It is possible to make sound static analyzers that can
| prove code to be free from memory safety bugs, but it is
| difficult and you need to implement checks that either
| complain or mathematically prove the absence of each class
| to do it.
|
| These tools have been used for years in the aviation and
| nuclear industries, but almost nobody outside of those
| industries knows anything about them. If others had broader
| awareness of the existence of these tools, we could get
| open source equivalents and memory safety issues would be a
| thing of the past in C/C++ code.
|
| Finally, your 1980s remark reveals enormous ignorance of
| what the field of formal methods has produced. Astree was
| not available in the 1980s. It took over 40 years of work
| to make and only became available about 20 years ago. C++
| support took much longer for it to add, with it only adding
| it around 6 years ago if my recollection of the public
| documentation is correct. Some other things in this space
| are the Polyspace Code Prover and Frama-C.
| westurner wrote:
| From https://news.ycombinator.com/item?id=33560227#33563857 :
|
| - Type safety > Memory management and type safety:
| https://en.wikipedia.org/wiki/Type_safety#Memory_management_...
|
| - Memory safety > Classification of memory safety errors:
| https://en.wikipedia.org/wiki/Memory_safety#Classification_o...
|
| - Template:Memory management
| https://en.wikipedia.org/wiki/Template:Memory_management
|
| - Category:Memory_management
| https://en.wikipedia.org/wiki/Category:Memory_management
| eklitzke wrote:
| I don't understand why they say that reference counting is
| "slow". Slow compared to what? Atomic increments/decrements to
| integers are one of the fastest operations you can do on modern
| x86 and ARM hardware, and except in pathological cases will
| pretty much always be faster than pointer chasing done in a
| traditional mark and sweep VMs.
|
| This isn't to say reference counting is without problems (there
| are plenty of them, inability to collect cyclical references
| being the most well known), but I don't normally think of it as a
| slow technique, particularly on modern CPUs.
| gpderetta wrote:
| Atomic reference counting per se is fairly slow compared to
| other simple operations [1]. But the biggest issue with
| reference counting is that it doesn't scale well in
| multithreaded programs: even pure readers have to write to
| shared memory locations. Also acquiring a new reference from a
| shared atomic pointer is complex and need something like hazard
| pointers or a lock.
|
| [1] an atomic inc on x86 is typically ~30 clock cycles, doesn't
| really pipeline well and will stall at the very least other
| load operations.
| immibis wrote:
| MMM++ is a variation of standard malloc/free. You can still UAF,
| but only to another object of the same type, which may or may not
| prevent an exploit.
|
| Something that's missing is full-on formal verification where you
| write unrestricted C code and then mathematically prove it
| doesn't have any bugs. Nobody does this because proving a C
| program is correct is harder than mining a bitcoin block by hand,
| but it's useful to anchor one end of the safety/freedom spectrum.
| Many other approaches (such as borrow checking) can also be
| viewed as variants of this where you restrict the allowed program
| constructs to ones that are easier to prove things about.
| lilyball wrote:
| I don't see any mention of epoch-based garbage collection (see
| crossbeam
| https://docs.rs/crossbeam/latest/crossbeam/epoch/index.html).
| Generational References sounds like a related concept but it's
| not the same. I'm also surprised nobody's mentioned that one
| lance corporal goat yet.
___________________________________________________________________
(page generated 2024-12-20 23:02 UTC)