[HN Gopher] The Inevitability of the Borrow Checker
___________________________________________________________________
The Inevitability of the Borrow Checker
Author : rbanffy
Score : 105 points
Date : 2025-02-06 20:54 UTC (1 days ago)
(HTM) web link (yorickpeterse.com)
(TXT) w3m dump (yorickpeterse.com)
| etyp wrote:
| I like how this is structured. When I read that inline types get
| copied-on-borrow I was pretty put off. Then since fields of
| inline types can't be assigned new values it seems a bit better,
| as long as you roughly know what's happening. Hopefully the
| diagnostics are good enough there. I like the detailed
| alternatives that weren't chosen.
|
| I appreciate being able to choose which side of the tradeoff
| (always-copy or heap allocated) you want to be on, but either way
| be assured it's safe. Not sure how I feel about it in practice
| without trying it, though :)
| YorickPeterse wrote:
| On the diagnostics side of things, the compiler produces these
| in two places:
|
| 1. If you try to define a `mut` field (= one you can normally
| assign a new value), it will produce an error at the definition
| site
|
| 2. If you try to do something like `some_inline_value.field =
| value` it also produces an error at the call/assignment site
|
| The actual messages could use some improvements, but that's not
| a super high priority for the time being as I'm more focusing
| on the language overall.
| agentultra wrote:
| A nicely written article! And an interesting project.
|
| Myself, I'd lean towards a sound (linear) type theory. If it's
| not too much trouble, insert the run-time checks in debug builds
| but use the type system to erase them for optimized builds. It
| might seem like the mountain is impossible to climb if you're not
| used to formalizing such systems but every mountain is surmounted
| one step at a time.
|
| It's hard to bolt-on correctness after the fact. In my
| experience, for critical pieces like this, it's better to get the
| specification right first before digging in too deep and writing
| code.
|
| Best of luck on the project either way you go. Memory safety is
| increasingly important!
| YorickPeterse wrote:
| Although linear typing is certainly interesting, I think linear
| typing on its own is not enough. That is, you basically end up
| with something like Austral where you use linear types _plus_
| some form of compile-time borrow checking, at which point we're
| back at borrow checking (did I mention it seems inevitable?) :)
| agentultra wrote:
| You did! And maybe it _is_ inevitable? I don 't know.
|
| It'd be interesting to see different theories evolve, for
| sure. Maybe something in separation logic will make it's way
| into mainstream type theory and into compilers at some point.
|
| What's cool is how much we're starting to see folks push the
| boundaries these days. :)
| Kinrany wrote:
| Wouldn't the borrow checking be simpler when built on top of
| linear types?
| YorickPeterse wrote:
| I don't think the choice of linear vs affine makes much of
| a difference, but I could be mistaken.
| sunshowers wrote:
| I think a real challenge with trying to work on specifications
| first is error handling -- you often find that a truly sound
| model is quite difficult to explain to users. So some
| prototyping and iteration becomes necessary in my experience.
|
| Like, rustc only recently gained the ability to explain that
| the borrow checker rejected a program because a lifetime
| parameter was invariant. And this isn't even an artificial bit
| of complexity -- if you have mutability you are required to
| think about variance. If you have a Cell<&'static str> you
| cannot just turn that into a Cell<&'a str>, the way you can
| turn a regular &'static str into a &'a str. (Java programmers
| might be familiar with similar issues around
| ArrayList<Object>.)
| pjc50 wrote:
| This article seems to use "borrow" to mean what I would normally
| understand to be the reference count of a refcount-gc system?
| Rather than a Rust borrow, which is part of the type system and
| not counted at runtime.
|
| In C# you can force a type to be stack allocated with "ref
| struct". https://learn.microsoft.com/en-
| us/dotnet/csharp/language-ref...
| YorickPeterse wrote:
| In Inko a borrow is sort of a hybrid in that it does increment
| a counter, but there's no shared ownership like there is in
| traditional reference counting (e.g. a borrow is never tasked
| with cleaning up data).
|
| Terminology is a bit annoying here, as e.g. "reference" can
| also be interpreted as "a pointer", though it depends on who
| you ask. I stuck with "borrow" because that just made the most
| sense to me :)
| thomasmg wrote:
| > borrow is never tasked with cleaning up data
|
| And the reason for this is that there is no branch needed at
| the end of the borrow, to check for refCount=0? (Or, is this
| at least one of the reasons?) I'm wondering about the
| performance impact for this... There's also the code size
| impact.
| YorickPeterse wrote:
| Not quite. A borrow doesn't do cleanup because it's, well,
| a borrow, i.e. a temporary reference to some owned value.
| It's the owned value that's tasked with disposing of any
| resources/memory when it goes out of scope, just as in
| Rust.
|
| The cost of borrowing (at least with heap types) is an
| increment upon creating the borrow, and a decrement upon
| disposing of the borrow. Over time I'd like to optimize
| those away, but that's not implemented at this time.
| harrison_clarke wrote:
| you can stack allocate with just `struct` in C#. if you put a
| struct in a local variable, it'll be on the stack
|
| `ref struct` allows that struct to contain refs, and disallows
| putting it on the heap (so, you can't have an array/list of ref
| structs, and a class or a normal struct can't have a ref struct
| field)
| neonsunset wrote:
| This is correct. Ref structs and refs are also subject to
| similar[0] to Rust's lifetime analysis to prevent them from
| being ever invalidated and having use-after-free (even if
| that free is essentially going out of scope or popping a
| stack frame).
|
| I assume pjc50 specifically refers to the fact that `ref
| struct` gives a strong guarantee while a regular struct can
| be placed wherever.
|
| [0]: https://em-tg.github.io/csborrow/
| harrison_clarke wrote:
| i think of it like having rust's rules, but you only get
| one (implicit) lifetime variable
|
| (that's probably wrong in some subtle ways)
| pjmlp wrote:
| Or _stackalloc_ instead of _new_.
| harrison_clarke wrote:
| something i'd like to see in a borrow checker (which i think
| causes problems in rust, because of the "leaking is safe" thing.
| which sounds like it could be a difficult hole to plug in
| language design):
|
| in rust, &mut means "this pointer points to an initialized value,
| and will point to an initalized value at the end of scope"
|
| i wish it also had &in, &out, and &tmp, for
| "initialized->uninitialized", "uninitialized->initialized", and
| "uninitialized->uninitialized"
| masklinn wrote:
| Their intra-function behaviour seems difficult to define for
| non trivial cases. For instance a &tmp would need to be write
| only until it's written to, then it can be read from as well,
| but it needs to be consumed before its scope ends,
| transitioning back to a write only reference. So you'd need a
| type system which can transition parameters through typestates
| (is that a subset of effects systems or unrelated?).
| wavemode wrote:
| What you're describing is exactly correct - you need a more
| robust "typestate" system (I call them "annotations"). Most
| languages have typestates - where you can, for example,
| declare a variable without immediately assigning it a value,
| but that variable remains write-only until it is assigned to.
|
| But these typestate systems aren't very elaborate usually.
| I've recently been doing research on the (hypothetical)
| design of a language which has typestates that can cross
| function boundaries - you can call a function that annotates
| that it uninitializes one of its arguments, and then that
| reference you passed in is now considered uninitialized in
| your local scope.
| ivanjermakov wrote:
| Author introduces inline type definition. Shouldn't allocation
| strategy be decided by the caller, not type definiton? Or there
| is a way to heap allocate value of inline type by wrapping it
| into some utility type, similar to Rust's Box?
| jayd16 wrote:
| Hmm, if it was by the caller then methods on the inline type
| can't know if members are copied or shared in assignment, I
| wouldn't think.
|
| Seems trivial to box the inline types like most languages do.
| dcrazy wrote:
| > Shouldn't allocation strategy be decided by the caller, not
| type definiton?
|
| This is how C++ is designed. Unfortunately, it precludes types
| from taking dependencies on their own address, which is
| critical for e.g. atomics. As far as I know, there is no way to
| actually force a C++ class to be heap allocated. I've tried.
|
| Newer languages like Swift give the type designer the ability
| to say "the address of this object in memory is important and
| must remain stable for the lifetime of this object." This
| decays to heap allocation.
| mtklein wrote:
| The idiom I am familiar with here is to make the constructor
| private and provide a public static factory method that
| allocates on the heap.
| badmintonbaseba wrote:
| > As far as I know, there is no way to actually force a C++
| class to be heap allocated.
|
| Make the destructor (and possibly the constructors) private,
| have public factory functions that hand out unique_ptr with a
| custom deleter, or define a public destroying operator delete
| (C++20) and use unique_ptr with its default deleter.
| spacechild1 wrote:
| > Unfortunately, it precludes types from taking dependencies
| on their own address,
|
| It does not! You just have to make it non-copyable and non-
| moveable. std::mutex and std::atomic do exactly this.
|
| > As far as I know, there is no way to actually force a C++
| class to be heap allocated. I've tried.
|
| As others have already pointed out: private constructor +
| factory method. However, this pattern is typically used for
| different reasons (implementation hiding or runtime
| polymorphism) and heap allocation is only a side effect.
|
| > This decays to heap allocation.
|
| No! The object might just as well live on the stack or have
| static storage duration.
| YorickPeterse wrote:
| > Or there is a way to heap allocate value of inline type by
| wrapping it into some utility type, similar to Rust's Box?
|
| There isn't. For this to work, functions for inline types have
| to be compiled such that one of these approaches is used:
|
| 1. They always take inline values by pointer, and we have to
| guarantee those pointers are never invalidated. This again
| means you need some sort of borrow checking scheme.
|
| 2. We compile two versions for each function: one that takes
| the data by value, and one by pointer, resulting in significant
| code bloat and compiler complexity.
|
| I think the split also better captures the intent: heap types
| are for cyclic and heavily mutated values, inline types are
| more for short-lived mostly immutable values.
| aidenn0 wrote:
| Why not a tracing GC instead?
| YorickPeterse wrote:
| Inko used to have a tracing GC (see
| https://yorickpeterse.com/articles/friendship-ended-with-the...
| for more details), but this was removed due to the various
| problems the presence of a tracing GC introduces.
| brokencode wrote:
| Inko is an interesting language, though the handling for panics
| sounds disturbing. Based on the documentation, panics abort the
| whole program and are not recoverable.
|
| Things like dividing by zero or accessing an index outside of the
| bounds of an array will panic.
|
| While these aren't incredibly common, I would never want my whole
| production web server to go down because some random,
| infrequently used endpoint had a bug in some edge case.
|
| The language seems heavily inspired by Erlang, which is very
| resilient to programmer errors in comparison.
|
| In Erlang, the philosophy is to let a lightweight process crash
| and restart it when there is a problem. This doesn't need to have
| any effect on other lightweight processes that are running.
| nixpulvis wrote:
| That's why you have wrappers that return Result enumerations.
| There is an overhead to check if an operation is valid, pay for
| it if you need it.
|
| This is exactly how Rust works, for example.
| demurgos wrote:
| Rust's panics are recoverable, so they're more like
| exceptions. This means that you may need to care about
| "unwind" safety and values in a potentially invalid state.
| It's the main reason for mutex poisoning and having to deal
| with it when locking.
|
| Overall, I'm not entirely sure if Rust would be better or not
| if panics were non-recoverable.
|
| PS: For completeness, there are flags to control the panic
| behavior; but you can't rely on them when writing a lib.
| hinkley wrote:
| My entirely made up origin story of borrow checked is that escape
| analysis (for GC) made huge progress in the 00's and borrow
| checkers are just the logical conclusion of having that shoulder
| to stand on.
|
| I don't know what inspired borrow checking but I am certain
| someone else would have thought it up presently if they hadn't.
| panstromek wrote:
| I think the Rust borrow checker is inspired by Cyclone. Not
| sure how that coincides with the GC development timeline
| mkehrt wrote:
| Exactly correct, I think. Cyclone allowed borrowing of
| "regions" (which were similar to rust lifetimes) in a very
| similar way. This was either based on or itself inspired
| other theoretical models of borrowing at around the same
| time; I'm not sure on the causality but I read all the
| literature at the time!
| pmarreck wrote:
| Never heard of the Inko language. Interesting.
___________________________________________________________________
(page generated 2025-02-07 23:00 UTC)