[HN Gopher] A surprising enum size optimization in the Rust comp...
___________________________________________________________________
A surprising enum size optimization in the Rust compiler
Author : returningfory2
Score : 101 points
Date : 2025-04-07 22:30 UTC (3 days ago)
(HTM) web link (jpfennell.com)
(TXT) w3m dump (jpfennell.com)
| packetlost wrote:
| I don't think it's actually "flattening" the enums discriminant
| spaces (though maybe it is). My guess is this is just 32-bit
| alignment requirement (ie. bytes 1:4 are padding) + sub-byte
| discriminant sizes. The surprising thing here is that the
| ordering of Outer's variants doesn't match the ordering as
| defined, instead having variant D's discriminant be 0 (0 <<
| size_of::<Inner's discriminant>). size_of::<Inner> is actually 33
| bits and size_of::<Outer> is 34 bits and then you just apply
| alignment requirements to each field. Actual size_of calls will
| account for alignment and padding, but the compiler knows the
| truth.
|
| What's cool about this in general is nested match statements can
| be flattened into a jumplist (idk if rustc does this, but it's
| possible in theory).
| hinkley wrote:
| In a lot of languages space optimizing Optional Types without
| using a reserved enum value or pointer tags would lead to
| memory model problems with atomically writing two values at
| once which might be more easily solved in a borrow semantics
| world. I hope there is someone out there mining research papers
| for the implementation strategies that were abandoned as
| unworkable due to bookkeeping issues, which Rust has already
| paid.
|
| But in the case of Options they tend to be both write-once and
| short-lived, so that removes a lot of necessity. Options are
| going to stay mostly on the stack and in async callbacks,
| unless they go into caches.
|
| But for other data structures where multiples fields need a
| tag, I suspect Rust could use some bitfields for representing
| them. You'd need a fairly big win to make it worth implementing
| however.
| packetlost wrote:
| I'm honestly not exactly sure what you're talking about, but
| the fundamental limit for atomic writes is usually the
| register-size for a CPU which is usually 64 or 32 bits.
| Considering enum variants are often larger than a single
| register in size, I don't see how atomic writes would ever be
| sane expectation.
| hinkley wrote:
| Updating an aligned pointer is atomic. If you haven't
| tagged it by moving bits to a neighboring word.
| packetlost wrote:
| I see. Yeah, you would either have to add the tagging to
| the upper bits of the pointer itself or concede that
| updates to a tagged type is not atomic. I feel like the
| latter is fine in most every scenario I've encountered in
| Rust (thanks to borrow checker rules) but in other
| languages the same cannot be said.
| Georgelemental wrote:
| > In a lot of languages space optimizing Optional Types
| without using a reserved enum value or pointer tags would
| lead to memory model problems with atomically writing two
| values at once which might be more easily solved in a borrow
| semantics world.
|
| Yes, Rust suppresses the niche optimization for values
| wrapped in an `UnsafeCell` (which is how you signal to the
| compiler that "atomically writing two values at once" might
| happen). https://github.com/rust-lang/rust/pull/68491
| mmastrac wrote:
| This is a great way to see why invalid UTF-8 strings and unicode
| chars cause undefined behaviour in Rust. `char` is a special
| integer type, known to have a valid range which is a sub-range of
| its storage type. Outside of dataless enums, this is the only
| datatype with this behaviour (EDIT: I neglected
| NonZero<...>/NonZeroXXX and some other zero-niche types).
|
| If you manage to construct an invalid char from an invalid string
| or any other way, you can defeat the niche optimization code and
| accidentally create yourself an unsound transmute, which is game
| over for soundness.
| NoTeslaThrow wrote:
| > This is a great way to see why invalid UTF-8 strings and
| unicode chars cause undefined behaviour in Rust.
|
| What does "undefined behavior" mean without a spec? Wouldn't
| the behavior rustc produces today be de-facto defined behavior?
| It seems like the contention is violating some transmute
| constraint, but does this not result in reproducible runtime
| behavior? In what context are you framing "soundness"?
|
| EDIT: I'm honestly befuddled why anyone would downvote this. I
| certainly don't think this is detracting from the conversation
| at all--how can you understand the semantics of the above
| comment without understanding what the intended meaning of
| "undefined behavior" or "soundness" is?
| mmastrac wrote:
| > What does "undefined behavior" mean without a spec?
|
| While not as formalized as C/C++, Rust's "spec" exists in the
| reference, nomicon, RFCs and documentation. I believe that
| there is a desire for a spec, but enough resources exist that
| the community can continue without one with no major negative
| side-effects (unless you want to re-implement the compiler
| from scratch, I suppose).
|
| The compiler may exploit "lack of UB" for optimizations,
| e.g., using a known-invalid value as a niche, optimizing away
| safety checks, etc.
|
| > Wouldn't the behavior rustc produces today be de-facto
| defined behavior?
|
| Absolutely not. Bugs are fixed and the behaviour changes. Not
| often, but it happens.
|
| This post probably answers a lot of your reply as well:
| https://jacko.io/safety_and_soundness.html
| NoTeslaThrow wrote:
| EDIT:
|
| > While not as formalized as C/C++, Rust's "spec" exists in
| the reference, nomicon, RFCs and documentation. I believe
| that there is a desire for a spec, but enough resources
| exist that the community can continue without one with no
| major negative side-effects (unless you want to re-
| implement the compiler from scratch, I suppose).
|
| Thank you, I was unaware that this is a thing.
|
| > This post probably answers a lot of your reply as well:
| https://jacko.io/safety_and_soundness.html
|
| This appears to also rely on "undefined behavior" as a
| meaningful term.
| mmastrac wrote:
| > This appears to also rely on "undefined behavior" as a
| meaningful term.
|
| I assure you it is a meaningful term:
|
| https://llvm.org/docs/UndefinedBehavior.html
| NoTeslaThrow wrote:
| Ok, but in the context of the language at hand?
| Presumably the IR has distinct semantics from the
| language that generates the IR. Does UB just strictly
| resolve to LLVM UB? That's very reasonable!
| fc417fc802 wrote:
| No. UB is a term of art here.
|
| Consider a hypothetical non-LLVM full reimplementation of
| the compiler. If it optimizes and there are invalid
| assumptions then there is likely UB. LLVM isn't involved
| in that case though.
| NoTeslaThrow wrote:
| > If it optimizes and there are invalid assumptions then
| there is likely UB.
|
| It's the distinguishing from bugs that concerns me.
| fc417fc802 wrote:
| I don't follow. Isn't UB a subset of bugs or
| alternatively a follow on consequence that causes
| observable behavior to further deviate?
| NoTeslaThrow wrote:
| > Isn't UB a subset of bugs
|
| No, not at all. UB can still produce correct and expected
| results for the entire input domain.
| fc417fc802 wrote:
| If I have a bug that only triggers between 9 and 10 am
| EST on Mondays that is still a bug, no? Now extend that
| to "rand(1.0) < 0.01". Now extend that to a check using
| __TIME__ that goes off at compile time instead of runtime
| (some binaries are buggy, some aren't). Now extend that
| to UB.
| vlovich123 wrote:
| It is a bug - you've violated the contract between the
| language and the compiler.
|
| Just like segfault or logic bug, it's a class of bugs.
| Why is special though is that in most bugs you just hit
| an invalid state. In UB you can end up executing code
| that never existed or not executing code that does exist.
| Or any number of other things can happen because the
| compiler applies an optimization assuming a runtime state
| you promised it would never occur but did.
|
| It's slightly different from being a strict subset
| because UB is actually exploited to perform optimizations
| - UB is not allowed so the compiler is able to emit more
| efficient code is taught to exploit that and the language
| allows for it (eg the niche optimization the blog
| describes)
| vollbrecht wrote:
| You can find a general overview for the language at hand
| in "The rust reference"[1]. For a more formal document,
| you can have a look in to the ferroscene language
| specification list of undefined behaviour[2] section.
| From there you can jump to different section, and see
| legality rules, and undefined behavior sections for each.
|
| The ferroscene language spec was recently donated to the
| rust foundation.
|
| [1] https://doc.rust-lang.org/reference/behavior-
| considered-unde... [2]
| https://spec.ferrocene.dev/undefined-behavior.html
| newpavlov wrote:
| "Undefined behavior" means that the compiler can apply
| optimizations which assume that it does not happen, resulting
| in an incorrect code. A simpler example is
| `Option<NonZeroU8>`, the compiler assumes that `NonZeroU8`
| can never contain 0, thus it can use 0 as value for `None`.
| Now, if you take a reference to the inner `NonZeroU8` stored
| in `Some` and write 0 to it, you changed `Some` to `None`,
| while other optimizations may rely on the assumption that
| references to the content of `Some` can not flip the enum
| variant to `None`.
|
| You don't need a full language spec to declare something UB.
| And, arguably, from the compiler correctness perspective,
| there is no fundamental difference between walls of prose in
| the C/C++ "spec" and the "informal spec" currently used by
| Rust. (Well, there is the CompCert exception, but it's quite
| far from the mainstream compilers in many regards)
| NoTeslaThrow wrote:
| > resulting in an incorrect code.
|
| Incorrect with respect to an assumption furnished where?
| Your sibling comment mentions RFCs--is this behavior tied
| to some kind of documented expectation?
|
| > A simpler example is `Option<NonZeroU8>`, the compiler
| assumes that `NonZeroU8` can never contain 0, thus it can
| use 0 as value for `None`. Now, if you take a reference to
| the inner `NonZeroU8` stored in `Some` and write 0 to it,
| you changed `Some` to `None`, while other optimizations may
| rely on the assumption that references to the content of
| `Some` can not flip the enum variant to `None`.
|
| That seems to be the _intended_ behavior, unless I 'm
| reading incorrectly. Why else would you write a 0 to it?
| Also, does this not require using the `unsafe` keyword? So
| is tricking the compiler into producing the behavior you
| described not the expected and intended behavior?
| newpavlov wrote:
| >Incorrect with respect to an assumption furnished where?
|
| In the definition of the `NonZeroU8` type. Or in a more
| practical terms, in LLVM, when we generate LLVM IR we
| communicate this property to LLVM and it in turn uses it
| to apply optimizations to our code.
|
| >Also, does this not require using the `unsafe` keyword?
|
| Yes, it requires `unsafe` and the point is that writing 0
| to `NonZeroU8` is UB since it breaks the locality
| principle critical for correctness of optimizations.
| Applying just one incorrect (because of the broken
| assumption) optimization together with numerous other
| (correct) optimizations can easily lead to very
| surprising results, which are practically impossible to
| predict and debug. This is why it's considered such
| anathema to have UB in code, since having UB in one place
| may completely break code somewhere far away.
| fc417fc802 wrote:
| It's not intended in that the compiler may have optimized
| based on the assumption that you have now gone and
| violated via an unsafe block. Just as in C that would
| produce undefined behavior in that there's no telling
| what consequences the optimization might have. The lack
| of a formal specification isn't relevant.
| duckerude wrote:
| It means that anything strange that happens next isn't a
| language bug.
|
| Whether something is a bug or not is sometimes hard to pin
| down because there's no formal spec. Most of the time it's
| pretty clear though. Most software doesn't have a formal spec
| and manages to categorize bugs anyway.
| timerol wrote:
| > Outside of dataless enums, this is the only datatype with
| this behaviour.
|
| Note that there are non-zero integer types that can also be
| used in this way, like NonZeroU8 https://doc.rust-
| lang.org/std/num/type.NonZeroU8.html. The NULL pointer is also
| used as a niche, and you can create your own as well, as
| documented in
| https://www.0xatticus.com/posts/understanding_rust_niche/
| mmastrac wrote:
| Ack, yeah. I forgot about those despite having used them.
| That's a good point and I stand corrected. Edited post above.
| deathanatos wrote:
| I guess it depends on whether the sentence is only
| qualified as "integer" types, but bool is sort of the same
| way, no? A bool must be either 0 or 1 (false or true), or
| it's UB.
|
| (And I think for much the same reason, the niche
| optimization. Option<bool> is 1 B.)
|
| (And for the non-Rustaceans, the only way to get a bool to
| be not false or true, i.e., not 0 or 1, would be unsafe {}
| code. Or put differently, _not_ having a bool be "2" is an
| invariant unsafe code must not violate. (IIRC, at all
| times, even in unsafe code.))
| tialaramex wrote:
| Well, in practice you can't make your own non-enum types in
| stable Rust with this property, to unblock this Rust needs
| pattern types, (I wanted a path to do this a different way,
| but I was persuaded it's not a good idea). As that link
| explains the stdlib is doing this via a perma-unstable
| compiler use only attribute.
|
| But yes, there are the NonZero integers, and you can make
| your own NonBlah integer using the "XOR trick" for a
| relatively tiny performance overhead, as well as you can make
| enums which is how the current CompactString works.
|
| The link you gave mentions that Rust does this for other
| types, but in particular OwnedFd is often useful on Unix
| systems. Option<OwnedFd> has the same _implementation_ as a C
| file descriptor, but the same _ergonomics_ as a fancy high
| level data structure, that 's the sort of optimisation we're
| here for.
|
| Alas the Windows equivalent can't do this because different
| parts of Microsoft use all zeroes and -1 to mean different
| things, so both are potentially valid.
| hinkley wrote:
| I seem to recall someone posting a ridiculously fast utf-8
| validator here based on SIMD instructions. Nothing is free but
| some things can be dirt cheap.
| tlb wrote:
| If the compiler is using a niche, it should really check every
| assignment that it's not accidentally the niche. That's still
| faster than also writing the tag.
| jenadine wrote:
| It doesn't need to check the assignment because that type
| cannot be the niche by construction.
| pitaj wrote:
| This actually _is_ niche optimization. The outer enum is using
| the niches available in the tag of the inner enum for its own
| discriminant.
|
| The author seems to have a limited understanding of niche
| optimization.
| returningfory2 wrote:
| In the strictly technical sense I agree, however in the Rust
| community when people say "the niche optimization" they usually
| are referring only to the simplest case. For example in the
| book Rust For Rustaceans (written by a pretty serious Rust
| expert and aimed at intermediate-level Rust programmers) the
| nice optimization is described as "the Rust compiler using
| invalid bit patterns to represent enum variants that hold no
| data". Note the "hold no data" part - it doesn't incorporate
| the case in the blog.
|
| In any case, the definition of what is exactly niche
| optimization is besides the point. The point of the post is:
| the literature gives you the impression that there's one
| limited form of enum size optimization in the Rust compiler,
| but in fact there are other optimizations too. And knowing this
| is useful!
| lordnacho wrote:
| Does the niche optimization require the compiler to know things
| about the type? So that only certain specializations will work?
|
| Also, how do I get some code to do the memory layout vizualizer,
| perhaps one that is a bit more advanced and knows what pointers
| are?
| pornel wrote:
| Internally the compiler tracks what "niches" exist in types,
| like minimum and maximum valid values, and takes the unused
| values for enum tags.
|
| One thing it can't use is padding in structs, because
| references to individual fields must remain valid, and they
| don't guarantee that padding will be preserved.
| subarctic wrote:
| When was this implemented? I remember when people used to have to
| manually flatten nested enums in order to save space
| returningfory2 wrote:
| Many years ago I think, here: https://github.com/rust-
| lang/rust/issues/46213. (Found this via https://lobste.rs/s/w3j
| jb2/surprising_enum_size_optimization...)
| kazinator wrote:
| This seems like it could be summarized as tag merging. When the
| member of an enum is another enum, then the two can be
| effectively merged, rather than tested, so that there is one
| discriminant tag. The tag values have to be displaced/renumbered
| so that their ranges do not overlap.
| Joker_vD wrote:
| Yeah, there are all sorts of nifty little tricks about
| compressing the enums, Appel dedicates a huge chunk of section
| "4.1. Data representation" on it in his "Compiling with
| Continuations" book about the internals of the SML/NJ compiler
| for the Standard ML, and ultimately concludes that the returns
| are quickly getting really diminishing with those, and since
| datatypes can actually be abstract in ML (in pretty much the same
| way they can be in Rust), the applicability of those tricks is
| generally restricted to unexported, intramodular datatypes.
___________________________________________________________________
(page generated 2025-04-10 23:00 UTC)