[HN Gopher] A surprising enum size optimization in the Rust comp...
       ___________________________________________________________________
        
       A surprising enum size optimization in the Rust compiler
        
       Author : returningfory2
       Score  : 101 points
       Date   : 2025-04-07 22:30 UTC (3 days ago)
        
 (HTM) web link (jpfennell.com)
 (TXT) w3m dump (jpfennell.com)
        
       | packetlost wrote:
       | I don't think it's actually "flattening" the enums discriminant
       | spaces (though maybe it is). My guess is this is just 32-bit
       | alignment requirement (ie. bytes 1:4 are padding) + sub-byte
       | discriminant sizes. The surprising thing here is that the
       | ordering of Outer's variants doesn't match the ordering as
       | defined, instead having variant D's discriminant be 0 (0 <<
       | size_of::<Inner's discriminant>). size_of::<Inner> is actually 33
       | bits and size_of::<Outer> is 34 bits and then you just apply
       | alignment requirements to each field. Actual size_of calls will
       | account for alignment and padding, but the compiler knows the
       | truth.
       | 
       | What's cool about this in general is nested match statements can
       | be flattened into a jumplist (idk if rustc does this, but it's
       | possible in theory).
        
         | hinkley wrote:
         | In a lot of languages space optimizing Optional Types without
         | using a reserved enum value or pointer tags would lead to
         | memory model problems with atomically writing two values at
         | once which might be more easily solved in a borrow semantics
         | world. I hope there is someone out there mining research papers
         | for the implementation strategies that were abandoned as
         | unworkable due to bookkeeping issues, which Rust has already
         | paid.
         | 
         | But in the case of Options they tend to be both write-once and
         | short-lived, so that removes a lot of necessity. Options are
         | going to stay mostly on the stack and in async callbacks,
         | unless they go into caches.
         | 
         | But for other data structures where multiples fields need a
         | tag, I suspect Rust could use some bitfields for representing
         | them. You'd need a fairly big win to make it worth implementing
         | however.
        
           | packetlost wrote:
           | I'm honestly not exactly sure what you're talking about, but
           | the fundamental limit for atomic writes is usually the
           | register-size for a CPU which is usually 64 or 32 bits.
           | Considering enum variants are often larger than a single
           | register in size, I don't see how atomic writes would ever be
           | sane expectation.
        
             | hinkley wrote:
             | Updating an aligned pointer is atomic. If you haven't
             | tagged it by moving bits to a neighboring word.
        
               | packetlost wrote:
               | I see. Yeah, you would either have to add the tagging to
               | the upper bits of the pointer itself or concede that
               | updates to a tagged type is not atomic. I feel like the
               | latter is fine in most every scenario I've encountered in
               | Rust (thanks to borrow checker rules) but in other
               | languages the same cannot be said.
        
           | Georgelemental wrote:
           | > In a lot of languages space optimizing Optional Types
           | without using a reserved enum value or pointer tags would
           | lead to memory model problems with atomically writing two
           | values at once which might be more easily solved in a borrow
           | semantics world.
           | 
           | Yes, Rust suppresses the niche optimization for values
           | wrapped in an `UnsafeCell` (which is how you signal to the
           | compiler that "atomically writing two values at once" might
           | happen). https://github.com/rust-lang/rust/pull/68491
        
       | mmastrac wrote:
       | This is a great way to see why invalid UTF-8 strings and unicode
       | chars cause undefined behaviour in Rust. `char` is a special
       | integer type, known to have a valid range which is a sub-range of
       | its storage type. Outside of dataless enums, this is the only
       | datatype with this behaviour (EDIT: I neglected
       | NonZero<...>/NonZeroXXX and some other zero-niche types).
       | 
       | If you manage to construct an invalid char from an invalid string
       | or any other way, you can defeat the niche optimization code and
       | accidentally create yourself an unsound transmute, which is game
       | over for soundness.
        
         | NoTeslaThrow wrote:
         | > This is a great way to see why invalid UTF-8 strings and
         | unicode chars cause undefined behaviour in Rust.
         | 
         | What does "undefined behavior" mean without a spec? Wouldn't
         | the behavior rustc produces today be de-facto defined behavior?
         | It seems like the contention is violating some transmute
         | constraint, but does this not result in reproducible runtime
         | behavior? In what context are you framing "soundness"?
         | 
         | EDIT: I'm honestly befuddled why anyone would downvote this. I
         | certainly don't think this is detracting from the conversation
         | at all--how can you understand the semantics of the above
         | comment without understanding what the intended meaning of
         | "undefined behavior" or "soundness" is?
        
           | mmastrac wrote:
           | > What does "undefined behavior" mean without a spec?
           | 
           | While not as formalized as C/C++, Rust's "spec" exists in the
           | reference, nomicon, RFCs and documentation. I believe that
           | there is a desire for a spec, but enough resources exist that
           | the community can continue without one with no major negative
           | side-effects (unless you want to re-implement the compiler
           | from scratch, I suppose).
           | 
           | The compiler may exploit "lack of UB" for optimizations,
           | e.g., using a known-invalid value as a niche, optimizing away
           | safety checks, etc.
           | 
           | > Wouldn't the behavior rustc produces today be de-facto
           | defined behavior?
           | 
           | Absolutely not. Bugs are fixed and the behaviour changes. Not
           | often, but it happens.
           | 
           | This post probably answers a lot of your reply as well:
           | https://jacko.io/safety_and_soundness.html
        
             | NoTeslaThrow wrote:
             | EDIT:
             | 
             | > While not as formalized as C/C++, Rust's "spec" exists in
             | the reference, nomicon, RFCs and documentation. I believe
             | that there is a desire for a spec, but enough resources
             | exist that the community can continue without one with no
             | major negative side-effects (unless you want to re-
             | implement the compiler from scratch, I suppose).
             | 
             | Thank you, I was unaware that this is a thing.
             | 
             | > This post probably answers a lot of your reply as well:
             | https://jacko.io/safety_and_soundness.html
             | 
             | This appears to also rely on "undefined behavior" as a
             | meaningful term.
        
               | mmastrac wrote:
               | > This appears to also rely on "undefined behavior" as a
               | meaningful term.
               | 
               | I assure you it is a meaningful term:
               | 
               | https://llvm.org/docs/UndefinedBehavior.html
        
               | NoTeslaThrow wrote:
               | Ok, but in the context of the language at hand?
               | Presumably the IR has distinct semantics from the
               | language that generates the IR. Does UB just strictly
               | resolve to LLVM UB? That's very reasonable!
        
               | fc417fc802 wrote:
               | No. UB is a term of art here.
               | 
               | Consider a hypothetical non-LLVM full reimplementation of
               | the compiler. If it optimizes and there are invalid
               | assumptions then there is likely UB. LLVM isn't involved
               | in that case though.
        
               | NoTeslaThrow wrote:
               | > If it optimizes and there are invalid assumptions then
               | there is likely UB.
               | 
               | It's the distinguishing from bugs that concerns me.
        
               | fc417fc802 wrote:
               | I don't follow. Isn't UB a subset of bugs or
               | alternatively a follow on consequence that causes
               | observable behavior to further deviate?
        
               | NoTeslaThrow wrote:
               | > Isn't UB a subset of bugs
               | 
               | No, not at all. UB can still produce correct and expected
               | results for the entire input domain.
        
               | fc417fc802 wrote:
               | If I have a bug that only triggers between 9 and 10 am
               | EST on Mondays that is still a bug, no? Now extend that
               | to "rand(1.0) < 0.01". Now extend that to a check using
               | __TIME__ that goes off at compile time instead of runtime
               | (some binaries are buggy, some aren't). Now extend that
               | to UB.
        
               | vlovich123 wrote:
               | It is a bug - you've violated the contract between the
               | language and the compiler.
               | 
               | Just like segfault or logic bug, it's a class of bugs.
               | Why is special though is that in most bugs you just hit
               | an invalid state. In UB you can end up executing code
               | that never existed or not executing code that does exist.
               | Or any number of other things can happen because the
               | compiler applies an optimization assuming a runtime state
               | you promised it would never occur but did.
               | 
               | It's slightly different from being a strict subset
               | because UB is actually exploited to perform optimizations
               | - UB is not allowed so the compiler is able to emit more
               | efficient code is taught to exploit that and the language
               | allows for it (eg the niche optimization the blog
               | describes)
        
               | vollbrecht wrote:
               | You can find a general overview for the language at hand
               | in "The rust reference"[1]. For a more formal document,
               | you can have a look in to the ferroscene language
               | specification list of undefined behaviour[2] section.
               | From there you can jump to different section, and see
               | legality rules, and undefined behavior sections for each.
               | 
               | The ferroscene language spec was recently donated to the
               | rust foundation.
               | 
               | [1] https://doc.rust-lang.org/reference/behavior-
               | considered-unde... [2]
               | https://spec.ferrocene.dev/undefined-behavior.html
        
           | newpavlov wrote:
           | "Undefined behavior" means that the compiler can apply
           | optimizations which assume that it does not happen, resulting
           | in an incorrect code. A simpler example is
           | `Option<NonZeroU8>`, the compiler assumes that `NonZeroU8`
           | can never contain 0, thus it can use 0 as value for `None`.
           | Now, if you take a reference to the inner `NonZeroU8` stored
           | in `Some` and write 0 to it, you changed `Some` to `None`,
           | while other optimizations may rely on the assumption that
           | references to the content of `Some` can not flip the enum
           | variant to `None`.
           | 
           | You don't need a full language spec to declare something UB.
           | And, arguably, from the compiler correctness perspective,
           | there is no fundamental difference between walls of prose in
           | the C/C++ "spec" and the "informal spec" currently used by
           | Rust. (Well, there is the CompCert exception, but it's quite
           | far from the mainstream compilers in many regards)
        
             | NoTeslaThrow wrote:
             | > resulting in an incorrect code.
             | 
             | Incorrect with respect to an assumption furnished where?
             | Your sibling comment mentions RFCs--is this behavior tied
             | to some kind of documented expectation?
             | 
             | > A simpler example is `Option<NonZeroU8>`, the compiler
             | assumes that `NonZeroU8` can never contain 0, thus it can
             | use 0 as value for `None`. Now, if you take a reference to
             | the inner `NonZeroU8` stored in `Some` and write 0 to it,
             | you changed `Some` to `None`, while other optimizations may
             | rely on the assumption that references to the content of
             | `Some` can not flip the enum variant to `None`.
             | 
             | That seems to be the _intended_ behavior, unless I 'm
             | reading incorrectly. Why else would you write a 0 to it?
             | Also, does this not require using the `unsafe` keyword? So
             | is tricking the compiler into producing the behavior you
             | described not the expected and intended behavior?
        
               | newpavlov wrote:
               | >Incorrect with respect to an assumption furnished where?
               | 
               | In the definition of the `NonZeroU8` type. Or in a more
               | practical terms, in LLVM, when we generate LLVM IR we
               | communicate this property to LLVM and it in turn uses it
               | to apply optimizations to our code.
               | 
               | >Also, does this not require using the `unsafe` keyword?
               | 
               | Yes, it requires `unsafe` and the point is that writing 0
               | to `NonZeroU8` is UB since it breaks the locality
               | principle critical for correctness of optimizations.
               | Applying just one incorrect (because of the broken
               | assumption) optimization together with numerous other
               | (correct) optimizations can easily lead to very
               | surprising results, which are practically impossible to
               | predict and debug. This is why it's considered such
               | anathema to have UB in code, since having UB in one place
               | may completely break code somewhere far away.
        
               | fc417fc802 wrote:
               | It's not intended in that the compiler may have optimized
               | based on the assumption that you have now gone and
               | violated via an unsafe block. Just as in C that would
               | produce undefined behavior in that there's no telling
               | what consequences the optimization might have. The lack
               | of a formal specification isn't relevant.
        
           | duckerude wrote:
           | It means that anything strange that happens next isn't a
           | language bug.
           | 
           | Whether something is a bug or not is sometimes hard to pin
           | down because there's no formal spec. Most of the time it's
           | pretty clear though. Most software doesn't have a formal spec
           | and manages to categorize bugs anyway.
        
         | timerol wrote:
         | > Outside of dataless enums, this is the only datatype with
         | this behaviour.
         | 
         | Note that there are non-zero integer types that can also be
         | used in this way, like NonZeroU8 https://doc.rust-
         | lang.org/std/num/type.NonZeroU8.html. The NULL pointer is also
         | used as a niche, and you can create your own as well, as
         | documented in
         | https://www.0xatticus.com/posts/understanding_rust_niche/
        
           | mmastrac wrote:
           | Ack, yeah. I forgot about those despite having used them.
           | That's a good point and I stand corrected. Edited post above.
        
             | deathanatos wrote:
             | I guess it depends on whether the sentence is only
             | qualified as "integer" types, but bool is sort of the same
             | way, no? A bool must be either 0 or 1 (false or true), or
             | it's UB.
             | 
             | (And I think for much the same reason, the niche
             | optimization. Option<bool> is 1 B.)
             | 
             | (And for the non-Rustaceans, the only way to get a bool to
             | be not false or true, i.e., not 0 or 1, would be unsafe {}
             | code. Or put differently, _not_ having a bool be  "2" is an
             | invariant unsafe code must not violate. (IIRC, at all
             | times, even in unsafe code.))
        
           | tialaramex wrote:
           | Well, in practice you can't make your own non-enum types in
           | stable Rust with this property, to unblock this Rust needs
           | pattern types, (I wanted a path to do this a different way,
           | but I was persuaded it's not a good idea). As that link
           | explains the stdlib is doing this via a perma-unstable
           | compiler use only attribute.
           | 
           | But yes, there are the NonZero integers, and you can make
           | your own NonBlah integer using the "XOR trick" for a
           | relatively tiny performance overhead, as well as you can make
           | enums which is how the current CompactString works.
           | 
           | The link you gave mentions that Rust does this for other
           | types, but in particular OwnedFd is often useful on Unix
           | systems. Option<OwnedFd> has the same _implementation_ as a C
           | file descriptor, but the same _ergonomics_ as a fancy high
           | level data structure, that 's the sort of optimisation we're
           | here for.
           | 
           | Alas the Windows equivalent can't do this because different
           | parts of Microsoft use all zeroes and -1 to mean different
           | things, so both are potentially valid.
        
         | hinkley wrote:
         | I seem to recall someone posting a ridiculously fast utf-8
         | validator here based on SIMD instructions. Nothing is free but
         | some things can be dirt cheap.
        
         | tlb wrote:
         | If the compiler is using a niche, it should really check every
         | assignment that it's not accidentally the niche. That's still
         | faster than also writing the tag.
        
           | jenadine wrote:
           | It doesn't need to check the assignment because that type
           | cannot be the niche by construction.
        
       | pitaj wrote:
       | This actually _is_ niche optimization. The outer enum is using
       | the niches available in the tag of the inner enum for its own
       | discriminant.
       | 
       | The author seems to have a limited understanding of niche
       | optimization.
        
         | returningfory2 wrote:
         | In the strictly technical sense I agree, however in the Rust
         | community when people say "the niche optimization" they usually
         | are referring only to the simplest case. For example in the
         | book Rust For Rustaceans (written by a pretty serious Rust
         | expert and aimed at intermediate-level Rust programmers) the
         | nice optimization is described as "the Rust compiler using
         | invalid bit patterns to represent enum variants that hold no
         | data". Note the "hold no data" part - it doesn't incorporate
         | the case in the blog.
         | 
         | In any case, the definition of what is exactly niche
         | optimization is besides the point. The point of the post is:
         | the literature gives you the impression that there's one
         | limited form of enum size optimization in the Rust compiler,
         | but in fact there are other optimizations too. And knowing this
         | is useful!
        
       | lordnacho wrote:
       | Does the niche optimization require the compiler to know things
       | about the type? So that only certain specializations will work?
       | 
       | Also, how do I get some code to do the memory layout vizualizer,
       | perhaps one that is a bit more advanced and knows what pointers
       | are?
        
         | pornel wrote:
         | Internally the compiler tracks what "niches" exist in types,
         | like minimum and maximum valid values, and takes the unused
         | values for enum tags.
         | 
         | One thing it can't use is padding in structs, because
         | references to individual fields must remain valid, and they
         | don't guarantee that padding will be preserved.
        
       | subarctic wrote:
       | When was this implemented? I remember when people used to have to
       | manually flatten nested enums in order to save space
        
         | returningfory2 wrote:
         | Many years ago I think, here: https://github.com/rust-
         | lang/rust/issues/46213. (Found this via https://lobste.rs/s/w3j
         | jb2/surprising_enum_size_optimization...)
        
       | kazinator wrote:
       | This seems like it could be summarized as tag merging. When the
       | member of an enum is another enum, then the two can be
       | effectively merged, rather than tested, so that there is one
       | discriminant tag. The tag values have to be displaced/renumbered
       | so that their ranges do not overlap.
        
       | Joker_vD wrote:
       | Yeah, there are all sorts of nifty little tricks about
       | compressing the enums, Appel dedicates a huge chunk of section
       | "4.1. Data representation" on it in his "Compiling with
       | Continuations" book about the internals of the SML/NJ compiler
       | for the Standard ML, and ultimately concludes that the returns
       | are quickly getting really diminishing with those, and since
       | datatypes can actually be abstract in ML (in pretty much the same
       | way they can be in Rust), the applicability of those tricks is
       | generally restricted to unexported, intramodular datatypes.
        
       ___________________________________________________________________
       (page generated 2025-04-10 23:00 UTC)