[HN Gopher] Rust: Box Is a Unique Type
___________________________________________________________________
Rust: Box Is a Unique Type
Author : thunderbong
Score : 80 points
Date : 2024-05-02 16:34 UTC (6 hours ago)
(HTM) web link (blog.nilstrieb.dev)
(TXT) w3m dump (blog.nilstrieb.dev)
| __s wrote:
| First, Box lacks some ergonomics, it's a pain dealing with when
| matching
|
| Now, as for the article, I don't really follow their argument.
| Box<T> owns its contents. Hence why it drops its contents
| afterwards, unlike &T. If someone needs this aliasable box type,
| they should define a different DropPtr. "It's just a pointer" can
| also be said of `&T` & `&mut T` (where T: Sized)
| remram wrote:
| > "It's just a pointer" can also be said of `&T` & `&mut T`
|
| But that's the point: you can assign your &T to a pointer and
| know that it won't be invalidated by merely moving the &T. That
| is not true for Box<T>.
| eddd-ddde wrote:
| Except it's not the same thing? You can copy a reference, you
| can't copy a box (at best you can clone it, if T is Clone).
|
| A box is a pointer AND the storage at the same time. To move
| it means to move the storage as well, unlike a regular
| reference.
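|
| A minimal sketch of that distinction (the helper names are
| hypothetical, not from the article): a shared reference is `Copy`,
| while a `Box` can only be moved, or cloned when `T: Clone`. Note
| that moving the box hands over ownership but leaves the address of
| the heap allocation unchanged; only a clone gets new storage.

```rust
/// `&T` is `Copy`: both bindings remain usable after the copy.
fn read_twice(r: &i32) -> (i32, i32) {
    let a = r;
    let b = r; // `r` was copied, not moved
    (*a, *b)
}

/// Moving a `Box` transfers ownership; the heap allocation itself stays put.
fn move_box(b: Box<String>) -> Box<String> {
    let moved = b; // `b` is no longer usable past this point
    moved
}

/// Address of the heap allocation as a plain integer (never dereferenced).
fn heap_addr(b: &Box<String>) -> usize {
    &**b as *const String as usize
}
```

| Comparing `heap_addr` before and after `move_box` gives the same
| value, whereas `heap_addr` of a `.clone()` differs while both boxes
| are alive.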
| remram wrote:
| > A box is a pointer AND the storage at the same time.
|
| That's just the way you decide to see it. Most people
| don't, in fact the actual documentation is quite different:
|
| >> A pointer type that uniquely owns a heap allocation of
| type T.
|
| This explicitly describes it as a pointer, pointing to a
| single allocation.
|
| https://doc.rust-lang.org/nightly/std/boxed/struct.Box.html
| orf wrote:
| "A _pointer type_ that _owns_ a heap allocation" is a
| more broad definition to me than specifically "a pointer
| pointing to a single allocation"
| spoiler wrote:
| > First, Box lacks some ergonomics, it's a pain dealing with
| when matching
|
| Agreed, but this should become better once box_patterns is
| stable
|
| https://doc.rust-lang.org/beta/unstable-book/language-featur...
| estebank wrote:
| That feature is almost assuredly _not_ going to land on
| stable _but_ the more general "deref_patterns" which would
| allow matching on boxed values, as well as on `Vec`s and
| `String`s will. It is not anywhere close to finished, but I
| am convinced it will land.
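|
| For context, a rough sketch of what matching through a Box looks
| like on stable today. The `Expr` type and function names are made
| up for illustration. A nested pattern such as `Neg(Neg(_))` cannot
| see through the Box, so the inner value has to be dereferenced by
| hand:

```rust
/// Hypothetical expression tree with boxed children.
enum Expr {
    Num(i64),
    Neg(Box<Expr>),
    Add(Box<Expr>, Box<Expr>),
}

fn eval(e: &Expr) -> i64 {
    match e {
        Expr::Num(n) => *n,
        // `inner` is a `&Box<Expr>`; deref coercion turns it into `&Expr`.
        Expr::Neg(inner) => -eval(inner),
        Expr::Add(l, r) => eval(l) + eval(r),
    }
}

/// Detects `Neg(Neg(_))`. The pattern cannot be nested directly on stable,
/// so the second level is matched after an explicit `**inner` deref.
fn is_double_neg(e: &Expr) -> bool {
    match e {
        Expr::Neg(inner) => matches!(**inner, Expr::Neg(_)),
        _ => false,
    }
}
```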
| airstrike wrote:
| Links with more info on the current state of deref patterns
|
| https://github.com/rust-lang/rust/issues/87121
|
| https://hackmd.io/4qDDMcvyQ-GDB089IPcHGg
| spoiler wrote:
| Oh I missed that one. Thanks for pointing it out!
|
| Here's the proposal for anyone else who's curious
| https://hackmd.io/4qDDMcvyQ-GDB089IPcHGg
| ComputerGuru wrote:
| I have to disagree with the entire premise of the article. It's
| the fact that Box _isn't_ unique that gives it this behavior.
| The author even says as much, but dismisses it because it gets in
| the way of their point:
|
| > While it can be argued that box is like a T, but on the heap,
| and therefore moving it should invalidate pointers, since moving
| T definitely has to invalidate pointers to it, this comparison
| doesn't make sense to me. While Box<T> usually behaves like a T,
| it's just a pointer.
|
| The "it's just a pointer" argument is moot (and pointers have
| nothing to do with the issue, it's a question of exclusive
| ownership and nothing more). It's a high-level object (i.e. not a
| pointer) to which rust's very clear aliasing and single name
| rules apply. Thou shalt not have multiple live T that point to
| the same memory address. QED.
|
| It's the same reason this code is unsafe:
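|     struct Foo;
|     impl Foo {
|         fn bar(&mut self) { /* ... */ }
|         fn baz(&self) { /* ... */ }
|     }
|     fn get_foo() -> &'static mut Foo {
|         static mut FOO_SINGLETON: Foo = Foo;
|         // Unsafe: nothing stops a second caller from getting an aliasing &mut.
|         unsafe { &mut *std::ptr::addr_of_mut!(FOO_SINGLETON) }
|     }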
|
| Here we don't even have a pointer, only statically allocated data
| in the binary's data section. We don't even create a second top-level instance
| of Foo, only an &mut reference to it. But it's unsafe because the
| compiler can't know whether or not it has exclusive access.
|
| (As a refresher, in terms of "strength" of ownership from the
| most exclusively owned to the least so, it would go T -> &mut T
| -> &T.)
| Rusky wrote:
| The problem with this, as the article argues in detail, is that
| it leaves a hole in functionality that people need but offers
| no clear benefit in return.
|
| A movable owning pointer that exposes its pointer-ness in its
| semantics is a useful tool when you're doing lower-level,
| sometimes-unsafe stuff with memory layout. The author points
| out that, because Box does not currently provide this
| functionality, there are crates in the ecosystem that step in
| to provide it instead. This middle ground between "raw pointers
| for everything" and "aliased XOR mutable" is an important thing
| to support.
|
| This might be acceptable if there were some benefit to Box
| behaving this way. Perhaps if it actually made a difference for
| the optimizer? The benchmarks the author did seem to suggest
| otherwise- many mutations wind up going through a reborrowed
| `&mut T` anyway. Perhaps the conceptual model of "like a T but
| on the heap" is enough of a benefit? But being more permissive
| here doesn't change that model for safe code anyway.
|
| The author dismissed this for much more concrete reasons than
| "it gets in the way of their point."
| remram wrote:
| > It's a high-level object (i.e. not a pointer)
|
| I don't know why so many people in this thread try to pretend
| that there's no reason to see Box as a pointer, no one has ever
| called it that, and every user drawing a parallel is confused.
| The documentation for Box is literally (emph mine):
|
| >> *A pointer type* that uniquely owns a heap allocation of
| type T.
|
| Until that documentation changes I find the article's point
| quite valid.
| CodeMage wrote:
| > The documentation for Box is literally (emph mine):
|
| > >> _A pointer type_ that uniquely owns a heap allocation of
| type T.
|
| > Until that documentation changes I find the article's point
| quite valid.
|
| I think there's some confusion here and that it's because the
| concept of "pointer" is slightly overloaded.
|
| One overload of the meaning is "first-class pointer types":
| https://doc.rust-lang.org/reference/types/pointer.html
|
| The other overload, the one used in the docs for Box, Rc, and
| such, is basically "anything that implements Deref". People
| who say "Box is not a pointer" are referring to the fact that
| Box is not a first-class pointer type, i.e. it's neither a
| reference nor a raw pointer.
| ComputerGuru wrote:
| I only meant "not a raw pointer" because rust supports read
| and write operations on raw pointers with very different
| aliasing semantics.
|
| It is an _owned_ pointer, with emphasis on the "owned". You
| can have as many raw pointers to the same memory location as
| you like, you just can't have multiple native rust objects
| pointing to that same memory alive at once, though. It's also
| obvious because Box<T> implements Drop, so obviously it's not
| just something you can pass to a function in lieu of a
| pointer and if you do pass it to a function, you can no
| longer make any assumptions about the lifetime or validity of
| any pointers to the same data.
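|
| A small sketch of the Drop point, using a hypothetical `Tracked`
| type that records when it is dropped: once the Box is passed by
| value, the callee owns it and the contents are gone when the callee
| returns.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical payload that flips a shared flag when it is dropped.
struct Tracked {
    dropped: Rc<Cell<bool>>,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        self.dropped.set(true);
    }
}

/// Takes ownership of the box; the allocation is freed when this returns.
fn consume(_boxed: Box<Tracked>) {}
```

| After `consume(boxed)` returns, the flag reads true, so any raw
| pointer previously taken to the contents would be dangling.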
| mmastrac wrote:
| I strongly disagree with this article. Perhaps in the days before
| Miri it might have made sense, but it's pretty trivial right now
| to discover UB in unsafe code with a simple `cargo +nightly miri
| test` run.
|
| It feels like the Rust team is a bit wary of introducing other
| optimizations for fear of breaking unsafe code that has lurking
| UB, but it's better to start working on fixing these problems
| _now_ rather than get stuck in the present state of limbo. It's
| only going to get harder to fix incorrect code (which we see an
| example of in this particular post).
|
| Honestly Miri is a superpower and it needs to be the priority of
| the Rust team to stabilize it. There's nothing inherently wrong
| with unsafe code: it's unsound code that's the problem, and we
| have the tools to prevent this exact problem from the article.
| IshKebab wrote:
| Miri can't detect all UB, so I think it's still sensible to
| reduce the chance of writing it in the first place.
| mmastrac wrote:
| I'm not aware of any UB that can 1) be caused by unique
| pointer violations from Box and 2) are undetectable by Miri
| (assuming good code coverage), but I might be wrong about
| this.
| IshKebab wrote:
| Well for a start, if your tests don't cover that code then
| it can't possibly detect it.
| mmastrac wrote:
| This is trivially solved by putting code coverage and
| Miri into the same project, however.
| IshKebab wrote:
| It's trivial to _measure_ code coverage. It's definitely
| not trivial to achieve 100% code coverage.
|
| This is the sort of "just do things perfectly" nonsense
| we get from C programmers. I'm surprised to see it from
| Rust devs, given the whole ethos of Rust is that it
| acknowledges that programmers are not perfect and helping
| them avoid bugs as much as possible is a good thing.
| remram wrote:
| I agree in general, but if you have unsafe code you
| should definitely make sure it is covered.
| mmastrac wrote:
| It's not nonsense. It's really not difficult to structure
| code for 100% coverage of unsafe code if you're thinking
| about it from the start.
|
| You're also perfectly fine to write code that is free of
| `unsafe`, freeing you from this onerous task. We're
| pulling out Miri _because_ we're going outside the normal
| guardrails.
|
| You also don't _need_ to get 100% coverage of all your
| unsafe code if you can be confident of the usage of
| unsafe. The most complex unsafe code should almost
| certainly be covered, but there are a lot of trivial uses
| of unsafe that can be shown to be correct through
| reasoning.
|
| Where possible I prefer to split code into safe and
| unsafe portions, and test the unsafe portions under Miri
| with as much coverage as gives me confidence in the code.
|
| I've made UB mistakes before with unsafe, but since
| adding Miri and code coverage, the number of mistakes
| I've made has dropped dramatically. No programmer is
| perfect, but one would be pretty foolish to ignore the
| tools at one's disposal.
| LegionMammal978 wrote:
| If you have an object that's !Unpin, then Miri will not
| apply uniqueness rules to anything containing it [0],
| including boxes and &mut references. (In the example code,
| replacing the PhantomPinned with a () will make Miri
| complain again.) This is considered a temporary (if long-
| lived) measure to allow async executors to manipulate
| pinned futures without invalidating all their internal
| borrows. Thus, it might be seen as undetected UB, in lieu
| of a permanent solution.
|
| [0] https://play.rust-lang.org/?version=stable&mode=debug&editio...
| mmastrac wrote:
| I forgot about this. I actually had to ignore a test in
| Miri because of this exact issue.
|
| https://github.com/denoland/deno_core/blob/98b09fa4f77db1131...
|
| Anxiously waiting on https://github.com/rust-lang/rfcs/pull/3467
| zamalek wrote:
| This is why I firmly argue (and pretend) that references are
| not pointers. "References are pointers" results in the belief
| that references will behave like C pointers and results in
| things like this article.
|
| At best they are a cousin of pointers.
|
| I consider the fact that they are pointers an implementation
| detail, just like Box is a value with 'static semantics.
| saghm wrote:
| > This is why I firmly argue (and pretend) that references
| are not pointers. "References are pointers" results in the
| belief that references will behave like C pointers and
| results in things like this article.
|
| This is my usual mental model as well. My thinking is that if
| tomorrow a new Rust version came out that used some other
| magical implementation of references that didn't use pointers
| under the hood, my code should still be correct. Maybe
| converting between references and raw pointers would be less
| efficient, but the semantics of my code shouldn't change.
| vlovich123 wrote:
| The biggest annoying magic I found with respect to Box (and other
| std containers like Rc) is that they're the only ones capable of
| storing fat dyn pointers. You can't construct a hybrid_rc::Rc<dyn
| Trait> like you can with Box/Rc.
|
| It's annoying magic like that that bothers me.
|
| Another example is async lifetimes - it's frequently hard to
| properly express the lifetime of a borrow resulting in choices of
| an unnecessary Box::pin, unsafe or even both. Here's an example I
| ran into recently, and the author's challenges there are similar
| to the ones I've run into in my own codebase [1]
|
| Or how about bridging poll-based futures and async (eg if within
| my poll interface I want to call an async method). It's weird how
| there's a world of difference between the implicit future
| generated by async and an explicit type implementing Future. I
| understand the similarity to named function vs closure but I'm
| finding the distinction to have far more annoying sharp edges
| than I've experienced with closures.
|
| The tooling around non-trivial programs is also unfortunate -
| working with an io_uring async runtime and Miri fails to start
| (noted limitation). Valgrind deadlocks for some reason as well
| which means that only asan's more limited techniques are usable.
|
| My point is that soundness issues in unsafe code are an
| important but niche topic vs what I've experienced writing a
| substantial program in Rust (~40k lines of code so far). It's
| doable but I find myself still fighting with the language just a
| bit too much.
|
| Hopefully it's completely different teams responsible for these
| kinds of work but, if not, I'd vote for stabilizing some of the
| ergonomic magic that std has access to and improving the borrow
| checker to recognize more definitely safe constructs so that
| users don't need to do annoying hoop jumping. I know the std
| magic I referenced is being worked on but as with all things rust
| it's impossible to predict what actually gets stabilized and when
| with the exception of marquee tentpole features they talk about
| on the blog.
|
| [1] https://github.com/someguynamedjosh/ouroboros/issues/112
| LegionMammal978 wrote:
| > The biggest annoying magic I found with respect to Box (and
| other std containers like Rc) is that they're the only ones
| capable of storing fat dyn pointers. You can't construct a
| hybrid_rc::Rc<dyn Trait> like you can with Box/Rc.
|
| It's perfectly possible to make a container capable of storing
| trait objects: just define the type parameter as <T: ?Sized>.
| The main issue is that unlike Box/Rc, the compiler won't give
| you an automatic coercion from MyRc<Type> to MyRc<dyn Trait>,
| so you have to write a method to explicitly perform that cast.
| It just isn't common for many existing third-party containers
| to support !Sized objects, since it takes tedious unsafe code
| to manipulate them in memory.
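|
| A rough sketch of that explicit cast on stable. `MyBox` and
| `into_dyn_debug` are hypothetical names, and the sketch assumes the
| container simply wraps a `Box` internally, which sidesteps the
| tedious unsafe code a real third-party container would need:

```rust
use std::fmt::Debug;

/// Hypothetical smart pointer that accepts unsized contents.
struct MyBox<T: ?Sized> {
    inner: Box<T>, // assumption: reuse `Box` for storage to avoid unsafe code
}

impl<T> MyBox<T> {
    fn new(value: T) -> Self {
        MyBox { inner: Box::new(value) }
    }
}

impl<T: Debug + 'static> MyBox<T> {
    /// Explicit stand-in for the coercion the compiler performs for `Box`/`Rc`.
    fn into_dyn_debug(self) -> MyBox<dyn Debug> {
        let inner: Box<dyn Debug> = self.inner; // unsizing happens on the inner `Box`
        MyBox { inner }
    }
}

impl<T: ?Sized> MyBox<T> {
    fn get(&self) -> &T {
        &self.inner
    }
}
```

| One such method is needed per trait, which is the ergonomic gap
| compared with the built-in pointers.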
| vlovich123 wrote:
| Sorry - that's exactly what I meant. The automatic conversion
| from <T> to <dyn Trait>.
| mjw1007 wrote:
| Improving this is the subject of RFC #3621 [1], which
| appeared today.
|
| [1]: https://github.com/Darksonn/rfcs/blob/derive-smart-pointer/t...
| gpm wrote:
| > The biggest annoying magic I found with respect to Box (and
| other std containers like Rc) is that they're the only ones
| capable of storing fat dyn pointers. You can't construct a
| hybrid_rc::Rc<dyn Trait> like you can with Box/Rc.
|
| Anything can _store_ fat dyn pointers, they're just like any
| other type in that regard.
|
| Constructing them for a specific trait is easy and possible on
| stable (e.g. adding a `as_debug(MyBox<T>) -> MyBox<dyn Debug>`
| method).
|
| Making it possible to construct them for _any_ trait is special
| to the built in pointers... on stable. On nightly with unstable
| features it's possible (and easy) to make any smart pointer
| type do this.
|
| Code examples:
|
| https://play.rust-lang.org/?version=nightly&mode=debug&editi...
| vlovich123 wrote:
| I should have taken more time on that and miswrote.
| Converting MyBox<T> to MyBox<dyn Trait> for arbitrary traits
| is only possible on nightly.
| landr0id wrote:
| > While we are many missing language features away from this being
| the case, the noalias case is also magic descended upon box
| itself, with no user code ever having access to it.
|
| I'm not sure why the author thinks there's magic behind Box. Box
| is not a special case of `noalias`. Run this snippet with miri
| and you'll see the same issue:
| https://play.rust-lang.org/?version=stable&mode=debug&editio...
|
| You don't see an assertion failure though because... _dun dun
| dun_ it's UB.
|
| `Box<T>` _does_ have an expectation that its inner pointer is not
| aliased to another Box (even if used for readonly operations).
| See: https://github.com/rust-lang/miri/issues/1800#issuecomment-8...
| panstromek wrote:
| Well, they work on the compiler, so that's one reason I guess.
| Also the fact that it's magic is no secret and this is not the
| only way in which it is (the most important is probably the
| DerefMove behaviour that's mentioned in the article, too).
| There have been many discussions around this in the past.
| andy_xor_andrew wrote:
| One of the biggest struggles I have (and others have, judging
| from Stack Overflow) is how to generically handle accepting types
| of Box<T>, Rc<T>, T, Pin<T>, &T, &mut T, etc etc.
|
| Of course you can write a function that is generic on <I, T:
| AsRef<I>> or something, but the moment you introduce function-
| coloring stuff like async, object safety, etc, things explode.
| pylua wrote:
| Is that by design? I am a novice at rust, but that is sort of
| how it feels. I could be missing something.
| pornel wrote:
| It requires supporting higher-kinded types, and Rust was
| reluctant to add them (although it's slowly getting there
| with higher kinded lifetimes and generic associated types).
| unstruktured wrote:
| macros can help with this if you can narrow down the traits you
| want to support.
|
| https://doc.rust-lang.org/reference/macros.html
| 3836293648 wrote:
| You just write an &T version and let everyone coerce their
| smart pointer into a reference on call?
|
| Worst case you write your generic asref function and then
| immediately delegate to the &T version
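|
| A minimal sketch of that pattern, with hypothetical function names:
| the real work lives in a plain-reference version, and the generic
| `AsRef` wrapper does nothing but delegate to it.

```rust
/// The one "real" implementation takes a plain reference.
fn shout_ref(s: &str) -> String {
    s.to_uppercase()
}

/// Thin generic wrapper that immediately delegates to the `&str` version.
fn shout<S: AsRef<str>>(s: S) -> String {
    shout_ref(s.as_ref())
}
```

| Callers holding a `Box<String>` or `Rc<String>` can write
| `shout_ref(&boxed)` and let deref coercion produce the `&str`, while
| `shout("plain")` and `shout(String::from("owned"))` go through the
| wrapper. This does not address the async or object-safety cases
| raised above.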
| vishalontheline wrote:
| Is there an ELI5 video / tutorial for all things box-variables
| that you can recommend?
|
| I understand pointers, I understand references, I understand
| ownership and mutability. I feel lost with Box things. The
| official documentation came across as cryptic to me and I had a
| hard time getting over the syntax. Like, what is "T" and why
| does it get passed into Box<> ... etc.
| pornel wrote:
| There's magic in the Box: ability to partially move content out
| of it, where any other type with Drop couldn't handle it. You can
| implement traits for Box<Foo> even when Othertype<Foo> wouldn't
| be allowed.
|
| But noalias is not very special for Rust. &mut and & have a bunch
| of limitations too.
|
| But there's no need to give up on them, because Rust has the
| UnsafeCell wrapper type for doing crimes with pointers. It
| selectively disables noalias, thread safety, etc. Instead of
| weakening guarantees of Box in general for all types, just insert
| UnsafeCell where you need to be clever with pointers.
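|
| A small sketch of the shared-reference case, using a hypothetical
| `SharedCounter` type: wrapping the field in `UnsafeCell` is what
| makes mutation behind `&self` permissible at all. It only
| illustrates the opt-out for `&`; it makes no claim about `&mut` or
| `Box` uniqueness.

```rust
use std::cell::UnsafeCell;

/// A counter that can be bumped through shared references.
/// `UnsafeCell` also makes this type `!Sync`, so it stays single-threaded.
struct SharedCounter {
    value: UnsafeCell<u32>,
}

impl SharedCounter {
    fn new() -> Self {
        SharedCounter { value: UnsafeCell::new(0) }
    }

    fn bump(&self) {
        // SAFETY: the type is !Sync and no reference to the inner value
        // ever escapes, so this write cannot overlap another access.
        unsafe { *self.value.get() += 1 }
    }

    fn get(&self) -> u32 {
        // SAFETY: same reasoning as in `bump`.
        unsafe { *self.value.get() }
    }
}
```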
___________________________________________________________________
(page generated 2024-05-02 23:00 UTC)