[HN Gopher] Rust: Box Is a Unique Type
       ___________________________________________________________________
        
       Rust: Box Is a Unique Type
        
       Author : thunderbong
       Score  : 80 points
       Date   : 2024-05-02 16:34 UTC (6 hours ago)
        
 (HTM) web link (blog.nilstrieb.dev)
 (TXT) w3m dump (blog.nilstrieb.dev)
        
       | __s wrote:
       | First, Box lacks some ergonomics: it's a pain to deal with when
       | matching.
       | 
       | Now, as for the article, I don't really follow their argument.
       | Box<T> owns its contents. Hence why it drops its contents
       | afterwards, unlike &T. If someone needs this aliasable box type,
       | they should define a different DropPtr. "It's just a pointer" can
       | also be said of `&T` & `&mut T` (where T: Sized)
        
         | remram wrote:
         | > "It's just a pointer" can also be said of `&T` & `&mut T`
         | 
         | But that's the point: you can assign your &T to a pointer and
         | know that it won't be invalidated by merely moving the &T. That
         | is not true for Box<T>.
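         | 
         | A minimal sketch of that distinction (function names are
         | illustrative, not from the article). A raw pointer taken from
         | a `&T` stays usable after the reference is copied elsewhere.
         | For the Box, the sketch only checks that the heap address
         | survives the move, then reads through a pointer re-derived
         | from the new owner instead of reusing the old one.

```rust
/// Reads through a raw pointer derived from a shared reference after the
/// reference itself has been copied elsewhere.
fn read_after_moving_reference() -> i32 {
    let value = 7;
    let r: &i32 = &value;
    let p: *const i32 = r; // raw pointer derived from the reference
    let moved = r; // `&T` is Copy; "moving" it leaves `p` usable
    // SAFETY: `value` is alive and only shared access exists.
    let through_ptr = unsafe { *p };
    assert_eq!(*moved, through_ptr);
    through_ptr
}

/// Moves a Box and reports whether the heap address stayed the same.
fn box_address_survives_move() -> bool {
    let b = Box::new(7);
    let before: *const i32 = &*b;
    let b2 = b; // moves the owning pointer, not the allocation
    let after: *const i32 = &*b2;
    // Comparing addresses is fine. Dereferencing `before` at this point is
    // the pattern this thread says the move can invalidate, so the read
    // goes through `after` instead.
    // SAFETY: `after` was derived from `b2`, which is still alive.
    let same_value = unsafe { *after } == 7;
    before == after && same_value
}

fn main() {
    assert_eq!(read_after_moving_reference(), 7);
    assert!(box_address_survives_move());
}
```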
        
           | eddd-ddde wrote:
           | Except it's not the same thing? You can copy a reference, but
           | you can't copy a box (only clone it, if T is Clone).
           | 
           | A box is a pointer AND the storage at the same time. To move
           | it means to move the storage as well, unlike a regular
           | reference.
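           | 
           | A small sketch of the copy/clone difference (helper names
           | are made up for illustration): copying a reference gives a
           | second pointer to the same value, while cloning a Box
           | allocates separate storage.

```rust
/// Copying a reference yields a second pointer to the same value.
fn copied_reference_aliases() -> bool {
    let x = 5;
    let r1 = &x;
    let r2 = r1; // bitwise copy; `r1` is still usable afterwards
    std::ptr::eq(r1, r2)
}

/// Cloning a Box allocates new storage and clones the contents into it.
fn cloned_box_aliases() -> bool {
    let b1 = Box::new(5);
    let b2 = b1.clone(); // requires `T: Clone`
    std::ptr::eq(&*b1, &*b2)
}

fn main() {
    assert!(copied_reference_aliases());
    assert!(!cloned_box_aliases());
}
```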
        
             | remram wrote:
             | > A box is a pointer AND the storage at the same time.
             | 
             | That's just the way you decide to see it. Most people
             | don't, in fact the actual documentation is quite different:
             | 
             | >> A pointer type that uniquely owns a heap allocation of
             | type T.
             | 
             | This explicitly describes it as a pointer, pointing to a
             | single allocation.
             | 
             | https://doc.rust-lang.org/nightly/std/boxed/struct.Box.html
        
               | orf wrote:
               | "A _pointer type_ that _owns_ a heap allocation" is a
               | more broad definition to me than specifically "a pointer
               | pointing to a single allocation"
        
         | spoiler wrote:
         | > First, Box lacks some ergonomics, it's a pain dealing with
         | when matching
         | 
         | Agreed, but this should become better once box_patterns is
         | stable
         | 
         | https://doc.rust-lang.org/beta/unstable-book/language-featur...
        
           | estebank wrote:
           | That feature is almost assuredly _not_ going to land on
           | stable, _but_ the more general "deref_patterns", which would
           | allow matching on boxed values as well as on `Vec`s and
           | `String`s, will. It is not anywhere close to finished, but I
           | am convinced it will land.
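           | 
           | For context, a sketch of what matching through a Box looks
           | like on stable today. The `Expr` type and function names are
           | hypothetical: a nested pattern cannot look through the Box,
           | so the match is split in two.

```rust
#[derive(Debug, PartialEq)]
enum Expr {
    Num(i64),
    Neg(Box<Expr>),
}

/// Evaluates by reference; `&Box<Expr>` coerces to `&Expr` on each call.
fn eval(e: &Expr) -> i64 {
    match e {
        Expr::Num(n) => *n,
        Expr::Neg(inner) => -eval(inner),
    }
}

/// Rewrites `--x` to `x`. On stable, a pattern cannot look through the
/// Box (no `Expr::Neg(Expr::Neg(inner))`), so the match is nested and the
/// Box is dereferenced by hand.
fn strip_double_neg(e: Expr) -> Expr {
    match e {
        Expr::Neg(outer) => match *outer {
            Expr::Neg(inner) => *inner,
            other => Expr::Neg(Box::new(other)),
        },
        other => other,
    }
}

fn main() {
    let double = Expr::Neg(Box::new(Expr::Neg(Box::new(Expr::Num(3)))));
    assert_eq!(eval(&double), 3);
    assert_eq!(strip_double_neg(double), Expr::Num(3));
}
```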
        
             | airstrike wrote:
             | Links with more info on the current state of deref patterns
             | 
             | https://github.com/rust-lang/rust/issues/87121
             | 
             | https://hackmd.io/4qDDMcvyQ-GDB089IPcHGg
        
             | spoiler wrote:
             | Oh I missed that one. Thanks for pointing it out!
             | 
             | Here's the proposal for anyone else who's curious
             | https://hackmd.io/4qDDMcvyQ-GDB089IPcHGg
        
       | ComputerGuru wrote:
       | I have to disagree with the entire premise of the article. It's
       | the fact that Box _isn't_ unique that gives it this behavior.
       | The author even says as much, but dismisses it because it gets in
       | the way of their point:
       | 
       | > While it can be argued that box is like a T, but on the heap,
       | and therefore moving it should invalidate pointers, since moving
       | T definitely has to invalidate pointers to it, this comparison
       | doesn't make sense to me. While Box<T> usually behaves like a T,
       | it's just a pointer.
       | 
       | The "it's just a pointer" argument is moot (and pointers have
       | nothing to do with the issue, it's a question of exclusive
       | ownership and nothing more). It's a high-level object (i.e. not a
       | pointer) to which rust's very clear aliasing and single name
       | rules apply. Thou shalt not have multiple live T that point to
       | the same memory address. QED.
       | 
       | It's the same reason this code is unsafe:
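       |     struct Foo;
       | 
       |     impl Foo {
       |         fn bar(&mut self) { /* ... */ }
       |         fn baz(&self) { /* ... */ }
       |     }
       | 
       |     fn get_foo() -> &'static mut Foo {
       |         static mut FOO_SINGLETON: Foo = Foo;
       |         // Nothing stops a caller from holding two of these at once.
       |         unsafe { &mut *std::ptr::addr_of_mut!(FOO_SINGLETON) }
       |     }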
       | 
       | Here we don't even have a pointer, only statically allocated data
       | stored in .TEXT. We don't even create a second top-level instance
       | of Foo, only an &mut reference to it. But it's unsafe because the
       | compiler can't know whether or not it has exclusive access.
       | 
       | (As a refresher, in terms of "strength" of ownership from the
       | most exclusively owned to the least so, it would go T -> &mut T
       | -> &T.)
        
         | Rusky wrote:
         | The problem with this, as the article argues in detail, is that
         | it leaves a hole in functionality that people need but offers
         | no clear benefit in return.
         | 
         | A movable owning pointer that exposes its pointer-ness in its
         | semantics is a useful tool when you're doing lower-level,
         | sometimes-unsafe stuff with memory layout. The author points
         | out that, because Box does not currently provide this
         | functionality, there are crates in the ecosystem that step in
         | to provide it instead. This middle ground between "raw pointers
         | for everything" and "aliased XOR mutable" is an important thing
         | to support.
         | 
         | This might be acceptable if there were some benefit to Box
         | behaving this way. Perhaps if it actually made a difference for
         | the optimizer? The benchmarks the author did seem to suggest
         | otherwise- many mutations wind up going through a reborrowed
         | `&mut T` anyway. Perhaps the conceptual model of "like a T but
         | on the heap" is enough of a benefit? But being more permissive
         | here doesn't change that model for safe code anyway.
         | 
         | The author dismissed this for much more concrete reasons than
         | "it gets in the way of their point."
        
         | remram wrote:
         | > It's a high-level object (i.e. not a pointer)
         | 
         | I don't know why so many people in this thread try to pretend
         | that there's no reason to see Box as a pointer, no one has ever
         | called it that, and every user drawing a parallel is confused.
         | The documentation for Box is literally (emph mine):
         | 
         | >> *A pointer type* that uniquely owns a heap allocation of
         | type T.
         | 
         | Until that documentation changes I find the article's point
         | quite valid.
        
           | CodeMage wrote:
           | > The documentation for Box is literally (emph mine):
           | 
           | > >> _A pointer type_ that uniquely owns a heap allocation of
           | type T.
           | 
           | > Until that documentation changes I find the article's point
           | quite valid.
           | 
           | I think there's some confusion here and that it's because the
           | concept of "pointer" is slightly overloaded.
           | 
           | One overload of the meaning is "first-class pointer types":
           | https://doc.rust-lang.org/reference/types/pointer.html
           | 
           | The other overload, the one used in the docs for Box, Rc, and
           | such, is basically "anything that implements Deref". People
           | who say "Box is not a pointer" are referring to the fact that
           | Box is not a first-class pointer type, i.e. it's neither a
           | reference nor a raw pointer.
        
           | ComputerGuru wrote:
           | I only meant "not a raw pointer" because rust supports read
           | and write operations on raw pointers with very different
           | aliasing semantics.
           | 
           | It is an _owned_ pointer, with emphasis on the "owned". You
           | can have as many raw pointers to the same memory location as
           | you like, you just can't have multiple native rust objects
           | pointing to that same memory alive at once, though. It's also
           | obvious because Box<T> implements Drop, so obviously it's not
           | just something you can pass to a function in lieu of a
           | pointer and if you do pass it to a function, you can no
           | longer make any assumptions about the lifetime or validity of
           | any pointers to the same data.
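           | 
           | A hedged sketch of the raw-pointer side of this (the
           | function name is illustrative): give up the Box with
           | `Box::into_raw`, alias the raw pointer freely, and rebuild
           | exactly one Box at the end so the allocation is freed once.

```rust
/// Mutates one heap allocation through two aliasing raw pointers, then
/// hands ownership back to a single Box so it is dropped exactly once.
fn bump_through_aliases() -> i32 {
    // `into_raw` gives up the Box; from here on only raw pointers exist.
    let raw: *mut i32 = Box::into_raw(Box::new(1));
    let a = raw;
    let b = raw;
    // SAFETY: `raw` came from `Box::into_raw`, no Box or reference to the
    // allocation is alive, and it is reconstituted exactly once below.
    unsafe {
        *a += 1;
        *b += 1;
        let result = *raw;
        drop(Box::from_raw(raw));
        result
    }
}

fn main() {
    assert_eq!(bump_through_aliases(), 3);
}
```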
        
       | mmastrac wrote:
       | I strongly disagree with this article. Perhaps in the days before
       | Miri it might have made sense, but it's pretty trivial right now
       | to discover UB in unsafe code with a simple `cargo +nightly miri
       | test` run.
       | 
       | It feels like the Rust team is a bit wary of introducing other
       | optimizations for fear of breaking unsafe code that has lurking
       | UB, but it's better to start working on fixing these problems
       | _now_ rather than get stuck in the present state of limbo. It's
       | only going to get harder to fix incorrect code (which we see an
       | example of in this particular post).
       | 
       | Honestly Miri is a superpower and it needs to be the priority of
       | the Rust team to stabilize it. There's nothing inherently wrong
       | with unsafe code: it's unsound code that's the problem, and we
       | have the tools to prevent this exact problem from the article.
        
         | IshKebab wrote:
         | Miri can't detect all UB, so I think it's still sensible to
         | reduce the chance of writing it in the first place.
        
           | mmastrac wrote:
            | I'm not aware of any UB that 1) can be caused by unique
            | pointer violations from Box and 2) is undetectable by Miri
            | (assuming good code coverage), but I might be wrong about
            | this.
        
             | IshKebab wrote:
             | Well for a start, if your tests don't cover that code then
             | it can't possibly detect it.
        
               | mmastrac wrote:
               | This is trivially solved by putting code coverage and
               | Miri into the same project, however.
        
               | IshKebab wrote:
                | It's trivial to _measure_ code coverage. It's definitely
               | not trivial to achieve 100% code coverage.
               | 
               | This is the sort of "just do things perfectly" nonsense
               | we get from C programmers. I'm surprised to see it from
               | Rust devs, given the whole ethos of Rust is that it
               | acknowledges that programmers are not perfect and helping
               | them avoid bugs as much as possible is a good thing.
        
               | remram wrote:
               | I agree in general, but if you have unsafe code you
               | should definitely make sure it is covered.
        
               | mmastrac wrote:
               | It's not nonsense. It's really not difficult to structure
               | code for 100% coverage of unsafe code if you're thinking
               | about it from the start.
               | 
               | You're also perfectly fine to write code that is free of
               | `unsafe`, freeing you from this onerous task. We're
               | pulling out Miri _because_ we're going outside the normal
               | guardrails.
               | 
               | You also don't _need_ to get 100% coverage of all your
               | unsafe code if you can be confident of the usage of
               | unsafe. The most complex unsafe code should almost
               | certainly be covered, but there are a lot of trivial uses
               | of unsafe that can be shown to be correct through
               | reasoning.
               | 
               | Where possible I prefer to split code into safe and
               | unsafe portions, and test the unsafe portions under Miri
               | with as much coverage as gives me confidence in the code.
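                | 
                | A minimal sketch of that split, with hypothetical
                | function names: the unsafe core is tiny, the safe
                | wrapper upholds its contract, and the test touches
                | both branches so a Miri run exercises the unsafe path.

```rust
/// Unsafe core: reads element `idx` without a bounds check.
///
/// # Safety
/// `ptr` must point to at least `idx + 1` initialized `u32` values.
unsafe fn read_at(ptr: *const u32, idx: usize) -> u32 {
    unsafe { *ptr.add(idx) }
}

/// Safe wrapper: upholds the contract of `read_at` before calling it.
pub fn get_or_zero(data: &[u32], idx: usize) -> u32 {
    if idx < data.len() {
        // SAFETY: `idx` was just checked against the slice length.
        unsafe { read_at(data.as_ptr(), idx) }
    } else {
        0
    }
}

#[cfg(test)]
mod tests {
    use super::get_or_zero;

    // Small enough to run under `cargo +nightly miri test`.
    #[test]
    fn covers_both_branches() {
        assert_eq!(get_or_zero(&[1, 2, 3], 1), 2);
        assert_eq!(get_or_zero(&[1, 2, 3], 3), 0);
    }
}

fn main() {
    assert_eq!(get_or_zero(&[10, 20], 0), 10);
}
```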
               | 
               | I've made UB mistakes before with unsafe, but since
                | adding Miri and code coverage, the number of mistakes
               | I've made has dropped dramatically. No programmer is
               | perfect, but one would be pretty foolish to ignore the
               | tools at one's disposal.
        
             | LegionMammal978 wrote:
             | If you have an object that's !Unpin, then Miri will not
             | apply uniqueness rules to anything containing it [0],
             | including boxes and &mut references. (In the example code,
             | replacing the PhantomPinned with a () will make Miri
             | complain again.) This is considered a temporary (if long-
             | lived) measure to allow async executors to manipulate
             | pinned futures without invalidating all their internal
             | borrows. Thus, it might be seen as undetected UB, in lieu
             | of a permanent solution.
             | 
              | [0] https://play.rust-lang.org/?version=stable&mode=debug&editio...
        
               | mmastrac wrote:
               | I forgot about this. I actually had to ignore a test in
               | Miri because of this exact issue.
               | 
                | https://github.com/denoland/deno_core/blob/98b09fa4f77db1131...
                | 
                | Anxiously waiting on
                | https://github.com/rust-lang/rfcs/pull/3467
        
         | zamalek wrote:
         | This is why I firmly argue (and pretend) that references are
         | not pointers. "References are pointers" results in the belief
         | that references will behave like C pointers and results in
         | things like this article.
         | 
         | At best they are a cousin of pointers.
         | 
         | I consider the fact that they are pointers an implementation
          | detail, just like Box is a value with 'static semantics.
        
           | saghm wrote:
           | > This is why I firmly argue (and pretend) that references
           | are not pointers. "References are pointers" results in the
           | belief that references will behave like C pointers and
           | results in things like this article.
           | 
           | This is my usual mental model as well. My thinking is that if
           | tomorrow a new Rust version came out that used some other
           | magical implementation of references that didn't use pointers
           | under the hood, my code should still be correct. Maybe
           | converting between references and raw pointers would be less
           | efficient, but the semantics of my code shouldn't change.
        
       | vlovich123 wrote:
       | The biggest annoying magic I found with respect to Box (and other
       | std containers like Rc) is that they're the only ones capable of
       | storing fat dyn pointers. You can't construct a hybrid_rc::Rc<dyn
       | Trait> like you can with Box/Rc.
       | 
       | It's annoying magic like that that bothers me.
       | 
       | Another example is async lifetimes - it's frequently hard to
       | properly express the lifetime of a borrow, resulting in a choice
       | of an unnecessary Box::pin, unsafe, or even both. Here's an
       | example I ran into recently; the author's challenges there are
       | similar to the ones I've run into in my own codebase [1].
       | 
       | Or how about bridging poll-based futures and async (eg if within
       | my poll interface I want to call an async method). It's weird how
       | there's a world of difference between the implicit future
       | generated by async and an explicit type implementing Future. I
       | understand the similarity to named function vs closure but I'm
       | finding the distinction to have far more annoying sharp edges
       | than I've experienced with closures.
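       | 
       | To illustrate the bridging problem, a rough sketch using only
       | std. The `LenPlusOne` type, `fetch_len`, and the toy `block_on`
       | are all made up for this example. Because the future returned
       | by an `async fn` has no nameable type on stable, the
       | hand-written Future stores it behind `Pin<Box<dyn Future>>`,
       | which is the extra Box::pin mentioned above.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// An ordinary async fn; its future type cannot be named on stable.
async fn fetch_len(s: &'static str) -> usize {
    s.len()
}

/// A hand-written future that needs to drive an async fn from `poll`.
struct LenPlusOne {
    inner: Pin<Box<dyn Future<Output = usize>>>,
}

impl LenPlusOne {
    fn new(s: &'static str) -> Self {
        // Boxing erases the anonymous future type so it can be a field.
        Self { inner: Box::pin(fetch_len(s)) }
    }
}

impl Future for LenPlusOne {
    type Output = usize;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<usize> {
        // `LenPlusOne` is Unpin, so the field can be reached via DerefMut.
        self.inner.as_mut().poll(cx).map(|n| n + 1)
    }
}

/// Waker that does nothing; enough for futures that never park.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Toy executor: polls in a loop until the future is ready.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) {
            return v;
        }
        std::thread::yield_now();
    }
}

fn main() {
    assert_eq!(block_on(LenPlusOne::new("hello")), 6);
}
```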
       | 
       | The tooling around non-trivial programs is also unfortunate -
       | working with an io_uring async runtime and Miri fails to start
       | (noted limitation). Valgrind deadlocks for some reason as well
       | which means that only asan's more limited techniques are usable.
       | 
       | My point is that soundness issues in unsafe code are important
       | but a niche topic compared to what I've experienced writing a
       | substantial program in Rust (~40k lines of code so far). It's
       | doable but I find myself still fighting with the language just a
       | bit too much.
       | 
       | Hopefully it's completely different teams responsible for these
       | kinds of work but, if not, I'd vote for stabilizing some of the
       | ergonomic magic that std has access to and improving the borrow
       | checker to recognize more definitely safe constructs so that
       | users don't need to do annoying hoop jumping. I know the std
       | magic I referenced is being worked on but as with all things rust
       | it's impossible to predict what actually gets stabilized and when
       | with the exception of marquee tentpole features they talk about
       | on the blog.
       | 
       | [1] https://github.com/someguynamedjosh/ouroboros/issues/112
        
         | LegionMammal978 wrote:
         | > The biggest annoying magic I found with respect to Box (and
         | other std containers like Rc) is that they're the only ones
         | capable of storing fat dyn pointers. You can't construct a
         | hybrid_rc::Rc<dyn Trait> like you can with Box/Rc.
         | 
         | It's perfectly possible to make a container capable of storing
         | trait objects: just define the type parameter as <T: ?Sized>.
         | The main issue is that unlike Box/Rc, the compiler won't give
         | you an automatic coercion from MyRc<Type> to MyRc<dyn Trait>,
         | so you have to write a method to explicitly perform that cast.
         | It just isn't common for many existing third-party containers
         | to support !Sized objects, since it takes tedious unsafe code
         | to manipulate them in memory.
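           | 
           | A sketch of such a container on stable, assuming a
           | hypothetical `MyBox` built on a raw pointer. The type
           | parameter is `?Sized`, and the conversion to a trait object
           | is an explicit method for one specific trait, since the
           | automatic coercion is not available to user types.

```rust
use std::fmt::Debug;
use std::ops::Deref;

/// Hypothetical owning pointer that can hold unsized values.
struct MyBox<T: ?Sized> {
    ptr: *mut T,
}

impl<T> MyBox<T> {
    fn new(value: T) -> Self {
        MyBox { ptr: Box::into_raw(Box::new(value)) }
    }
}

impl<T: Debug + 'static> MyBox<T> {
    /// Explicit cast to a trait object for one specific trait.
    fn as_debug(self) -> MyBox<dyn Debug> {
        // A raw-pointer `as` cast performs the unsizing step on stable.
        let ptr = self.ptr as *mut dyn Debug;
        // Ownership moves to the new MyBox; skip the old destructor.
        std::mem::forget(self);
        MyBox { ptr }
    }
}

impl<T: ?Sized> Deref for MyBox<T> {
    type Target = T;

    fn deref(&self) -> &T {
        // SAFETY: `ptr` came from `Box::into_raw` and is owned by `self`.
        unsafe { &*self.ptr }
    }
}

impl<T: ?Sized> Drop for MyBox<T> {
    fn drop(&mut self) {
        // SAFETY: `ptr` came from `Box::into_raw` and is freed only here.
        unsafe { drop(Box::from_raw(self.ptr)) }
    }
}

fn main() {
    let erased: MyBox<dyn Debug> = MyBox::new(42).as_debug();
    assert_eq!(format!("{:?}", &*erased), "42");
}
```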
        
           | vlovich123 wrote:
           | Sorry - that's exactly what I meant. The automatic conversion
           | from <T> to <dyn Trait>.
        
             | mjw1007 wrote:
             | Improving this is the subject of RFC #3621 [1], which
             | appeared today.
             | 
              | [1]: https://github.com/Darksonn/rfcs/blob/derive-smart-pointer/t...
        
         | gpm wrote:
         | > The biggest annoying magic I found with respect to Box (and
         | other std containers like Rc) is that they're the only ones
         | capable of storing fat dyn pointers. You can't construct a
         | hybrid_rc::Rc<dyn Trait> like you can with Box/Rc.
         | 
          | Anything can _store_ fat dyn pointers, they're just like any
          | other type in that regard.
          | 
          | Constructing them for a specific trait is easy and possible on
          | stable (e.g. adding an `as_debug(MyBox<T>) -> MyBox<dyn Debug>`
          | method).
          | 
          | Making it possible to construct them for _any_ trait is special
          | to the built-in pointers... on stable. On nightly with unstable
          | features it's possible (and easy) to make any smart pointer
          | type do this.
         | 
         | Code examples:
         | 
         | https://play.rust-lang.org/?version=nightly&mode=debug&editi...
        
           | vlovich123 wrote:
           | I should have taken more time on that and miswrote.
           | Converting MyBox<T> to MyBox<dyn Trait> for arbitrary traits
           | is only possible on nightly.
        
       | landr0id wrote:
       | >While we are many missing language features away from this being
       | the case, the noalias case is also magic descended upon box
       | itself, with no user code ever having access to it.
       | 
       | I'm not sure why the author thinks there's magic behind Box. Box
       | is not a special case of `noalias`. Run this snippet with Miri
       | and you'll see the same issue:
       | https://play.rust-lang.org/?version=stable&mode=debug&editio...
       | 
       | You don't see an assertion failure though because... _dun dun
       | dun_ it's UB.
       | 
       | `Box<T>` _does_ have an expectation that its inner pointer is not
       | aliased to another Box (even if used for readonly operations).
       | See:
       | https://github.com/rust-lang/miri/issues/1800#issuecomment-8...
        
         | panstromek wrote:
         | Well, they work on the compiler, so that's one reason I guess.
         | Also the fact that it's magic is no secret and this is not the
         | only way in which it is (the most important is probably the
         | DerefMove behaviour that's mentioned in the article, too).
         | There have been many discussions around this in the past.
        
       | andy_xor_andrew wrote:
       | One of the biggest struggles I have (and others have, judging
       | from Stack Overflow) is how to generically handle accepting types
       | of Box<T>, Rc<T>, T, Pin<T>, &T, &mut T, etc etc.
       | 
       | Of course you can write a function that is generic on <I, T:
       | AsRef<I>> or something, but the moment you introduce function-
       | coloring stuff like async, object safety, etc, things explode.
        
         | pylua wrote:
         | Is that by design? I am a novice at Rust, but that is sort of
         | how it feels. I could be missing something.
        
           | pornel wrote:
           | It requires supporting higher-kinded types, and Rust was
           | reluctant to add them (although it's slowly getting there
           | with higher kinded lifetimes and generic associated types).
        
         | unstruktured wrote:
         | Macros can help with this if you can narrow down the traits you
         | want to support.
         | 
         | https://doc.rust-lang.org/reference/macros.html
        
         | 3836293648 wrote:
         | You just write an &T version and let everyone coerce their
         | smart pointer into a reference on call?
         | 
         | Worst case you write your generic asref function and then
         | immediately delegate to the &T version
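         | 
         | Roughly, with a made-up `Rect` type: the `&T` version does the
         | work, smart pointers coerce at the call site, and the generic
         | AsRef wrapper is a one-line delegate. Note that a plain `Rect`
         | does not implement `AsRef<Rect>` unless you add that impl
         | yourself, which is one more reason to keep the `&T` version
         | as the primary entry point.

```rust
use std::rc::Rc;

struct Rect {
    w: u32,
    h: u32,
}

/// The "real" implementation takes a plain shared reference.
fn area(r: &Rect) -> u32 {
    r.w * r.h
}

/// Optional generic front end that immediately delegates to `area`.
fn area_generic<R: AsRef<Rect>>(r: R) -> u32 {
    area(r.as_ref())
}

fn main() {
    let plain = Rect { w: 2, h: 3 };
    let boxed = Box::new(Rect { w: 2, h: 3 });
    let shared = Rc::new(Rect { w: 2, h: 3 });

    // Deref coercion turns `&Box<Rect>` and `&Rc<Rect>` into `&Rect`.
    assert_eq!(area(&plain), 6);
    assert_eq!(area(&boxed), 6);
    assert_eq!(area(&shared), 6);

    // `Box<T>` and `Rc<T>` both implement `AsRef<T>`.
    assert_eq!(area_generic(boxed), 6);
    assert_eq!(area_generic(shared), 6);
}
```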
        
         | vishalontheline wrote:
         | Is there an ELI5 video / tutorial for all things box-variables
         | that you can recommend?
         | 
         | I understand pointers, I understand references, I understand
         | ownership and mutability. I feel lost with Box things. The
         | official documentation came across as cryptic to me and I had a
         | hard time getting over the syntax. Like, what is "T" and why
         | does it get passed into Box<> ... etc.
        
       | pornel wrote:
       | There's magic in the Box: ability to partially move content out
       | of it, where any other type with Drop couldn't handle it. You can
       | implement traits for Box<Foo> even when Othertype<Foo> wouldn't
       | be allowed.
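       | 
       | A small sketch of the partial-move behavior, using a
       | hypothetical `Pair` type: fields are moved out of the Box one
       | at a time, which a user-defined pointer type implementing
       | Deref and Drop would not be allowed to do.

```rust
struct Pair {
    left: String,
    right: String,
}

/// Moves both fields out of the Box one at a time.
fn split(boxed: Box<Pair>) -> (String, String) {
    let left = boxed.left; // partial move out of the Box
    let right = boxed.right; // second partial move; the Box is now empty
    (left, right)
}

fn main() {
    let pair = Box::new(Pair {
        left: String::from("a"),
        right: String::from("b"),
    });
    assert_eq!(split(pair), (String::from("a"), String::from("b")));
}
```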
       | 
       | But noalias is not very special for Rust. &mut and & have a bunch
       | of limitations too.
       | 
       | But there's no need to give up on them, because Rust has the
       | UnsafeCell wrapper type for doing crimes with pointers. It
       | selectively disables noalias, thread safety, etc. Instead of
       | weakening guarantees of Box in general for all types, just insert
       | UnsafeCell where you need to be clever with pointers.
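       | 
       | A minimal, single-threaded sketch with a made-up `Counter`
       | type. It shows the one case UnsafeCell is documented to
       | permit: mutation through aliased shared references. It makes
       | no claim about how UnsafeCell interacts with Box's uniqueness
       | when the Box itself is moved.

```rust
use std::cell::UnsafeCell;

/// Counter that can be bumped through a shared reference.
struct Counter {
    value: UnsafeCell<u32>,
}

impl Counter {
    fn new() -> Self {
        Counter { value: UnsafeCell::new(0) }
    }

    fn bump(&self) -> u32 {
        // SAFETY: `Counter` is not Sync because of the UnsafeCell, and no
        // reference to the inner value ever escapes this method.
        unsafe {
            let v = self.value.get();
            *v += 1;
            *v
        }
    }
}

/// Bumps one boxed counter through two aliasing shared references.
fn bump_via_two_handles() -> u32 {
    let boxed = Box::new(Counter::new());
    let a: &Counter = &boxed;
    let b: &Counter = &boxed;
    a.bump();
    b.bump()
}

fn main() {
    assert_eq!(bump_via_two_handles(), 2);
}
```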
        
       ___________________________________________________________________
       (page generated 2024-05-02 23:00 UTC)