[HN Gopher] Wild performance tricks
___________________________________________________________________
Wild performance tricks
Author : tbillington
Score : 65 points
Date : 2025-09-23 08:16 UTC (2 days ago)
(HTM) web link (davidlattimore.github.io)
(TXT) w3m dump (davidlattimore.github.io)
| Strilanc wrote:
| Every one of these "performance tricks" is describing how to
| convince rust's borrow checker that you're allowed to do a thing.
| It's more like "performance permission slips".
| oleganza wrote:
| You don't have to play this game - you can always write within
| unsafe { ... } like in plain old C or C++. But people do choose
| to play this game because it helps them to write code that is
| also correct, where "correct" has an old-school meaning of
| "actually doing what it is supposed to do and not doing what
| it's not supposed to".
| ManlyBread wrote:
| That just makes it seem like there's no point in using this
| language in the first place.
| maccard wrote:
| Don't let perfect be the enemy of good.
|
| Software is built on abstractions - if all your app code is
| written without unsafe and you have one low-level unsafe
| block to allow for something, you get the value of Rust for
| all your app logic and you know the actual bug is in the
| unsafe code.
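|
| A minimal sketch of that pattern, with get_unchecked chosen
| purely as an example of a single audited unsafe block behind a
| safe API:
|
|     fn first_byte(bytes: &[u8]) -> Option<u8> {
|         if bytes.is_empty() {
|             None
|         } else {
|             // SAFETY: the emptiness check above guarantees
|             // that index 0 is in bounds.
|             Some(unsafe { *bytes.get_unchecked(0) })
|         }
|     }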
| Ar-Curunir wrote:
| This is an issue that you would face in any language with
| strong typing. It only rears its head in Rust because Rust
| tries to give you both low-level control _and_ strong types.
|
| For example, in something like Go (which has a weaker type
| system than Rust), you wouldn't think twice about paying for
| the re-allocation in the buffer-reuse example.
|
| Of course, in something like C or C++ you could do these things
| via simple pointer casts, but then you run the risk of
| invoking undefined behaviour.
| jstimpfle wrote:
| In C I wouldn't use such a fluffy high-level approach in the
| first place. I wouldn't use contiguous unbounded vec-slices.
| And no, I wouldn't attempt trickery with overwriting input
| buffers. That's a bad, inflexible approach that will bite at
| the next refactor. Instead, I would first make sure there's a
| way to cheaply allocate fixed-size buffers (like 4 K buffers
| or whatever) and stream into those. Memory should be used in
| an allocate/write-once/release fashion whenever possible. This
| approach leads to straightforward, efficient architecture and
| bug-free code. It's also much better for
| concurrency/parallelism.
| kibwen wrote:
| _> In C I wouldn't use such a fluffy high-level approach
| in the first place._
|
| Sure, though that's because C has abstraction like Mars has
| a breathable atmosphere.
|
| _> This approach leads to straightforward, efficient
| architecture and bug-free code. It's also much better for
| concurrency/parallelism._
|
| This claim is wild considering that Rust code is more bug-
| free than C code while being just as efficient, while
| keeping in mind that Rust makes parallelism so much easier
| than C that it stops being funny and starts being tragic.
| dwattttt wrote:
| > straightforward, efficient architecture and bug-free code
|
| The grace with which C handles projects of high complexity
| disagrees.
|
| You get a simple implementation only by ignoring edge cases
| or improvements that increase complexity.
| jandrewrogers wrote:
| > in something like C or C++ you could do these things via
| simple pointer casts
|
| No you don't. You explicitly start a new object lifetime at
| the address, either of the same type or a different type.
| There are standard mechanisms for this.
|
| Developers who can't be bothered to do things correctly are
| why languages like Rust exist.
| Ar-Curunir wrote:
| And that is safer... how?
| jstimpfle wrote:
| Yup -- yet another article only solving language-level problems
| instead of teaching something about real constraints (i.e.
| hardware performance characteristics). Booooring. This kind of
| article is why I still haven't mustered the energy to get up to
| date with Rust. I'm still writing C (or C-in-C++) and having
| fun, most of the time feeling like I'm solving actual technical
| problems.
| the-smug-one wrote:
| The rayon thing is neat.
| kibwen wrote:
| ...Except that Rust is thread-safe, so expressing your
| algorithm in terms that the borrow checker accepts makes safe
| parallelism possible, as shown in the example using Rayon to
| trivially parallelize an operation. This is the whole point of
| Rust, and to say that C and C++ fail at thread-safety would be
| the understatement of the century.
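|
| For illustration, a tiny sketch of that kind of trivial
| parallelisation, assuming the rayon crate and made-up data:
|
|     use rayon::prelude::*;
|
|     fn total_len(names: &[String]) -> usize {
|         // Swapping iter() for par_iter() is all it takes; the
|         // borrow checker guarantees the closure only reads
|         // shared data, so Rayon can fan it out across threads
|         // without data races.
|         names.par_iter().map(|s| s.len()).sum()
|     }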
| Cheetahlee01 wrote:
| Just want to know some hacking tricks
| vlovich123 wrote:
| > Now that we have a Vec with no non-static lifetimes, we can
| safely move it to another thread.
|
| I liked most of the tricks, but this one seems pointless. This
| is no different from a transmute, as accessing the borrower
| requires an assume_init, which I believe is technically UB when
| called on an uninit. Unless the point is that you're going to be
| working with Owned but want to just transmute the Vec safely.
|
| Overall I like the into_iter/collect trick to avoid unsafe. It
| was also most of the article, just various ways to apply this
| trick in different scenarios. Very neat!
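|
| For reference, a rough sketch of the into_iter/collect trick as
| I understand it, assuming #[repr(transparent)] newtypes and that
| SymbolId wraps a u32:
|
|     use std::sync::atomic::AtomicU32;
|
|     #[repr(transparent)]
|     struct SymbolId(u32);
|
|     #[repr(transparent)]
|     struct AtomicSymbolId(AtomicU32);
|
|     fn into_atomic(v: Vec<SymbolId>) -> Vec<AtomicSymbolId> {
|         // Fully safe, element-wise conversion; the standard
|         // library's in-place collect specialisation can reuse
|         // the original heap allocation, and the per-element
|         // work optimises away when the layouts match.
|         v.into_iter()
|             .map(|s| AtomicSymbolId(AtomicU32::new(s.0)))
|             .collect()
|     }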
| ComputerGuru wrote:
| You misunderstood the purpose of that trick. The vector is not
| going to be accessed again, the idea is to move it to another
| thread so it can be dropped in parallel (never accessed).
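|
| As a hedged sketch of that idea (the function name is made up):
|
|     fn drop_in_background<T: Send + 'static>(v: Vec<T>) {
|         // The vector is never read again; it is handed to
|         // another thread purely so its (potentially expensive)
|         // deallocation happens off the hot path.
|         std::thread::spawn(move || drop(v));
|     }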
| quotemstr wrote:
| > Even if it were stable, it only works with slices of primitive
| types, so we'd have to lose our newtypes (SymbolId etc).
|
| That's weird. I'd expect it to work with _any_ type, primitive or
| not, newtype or not, with a sufficiently simple memory layout,
| the rough equivalent of what C++ calls a "standard-layout type"
| or (formerly) a "POD".
|
| I don't like magical stdlibs and I don't like user types being
| less powerful than built-in ones.
|
| Clever workaround doing a no-op transformation of the whole
| vector though! Very nearly zero-cost.
|
| > It would be possible to ensure that the proper Vec was restored
| for use-cases where that was important, however it would add
| extra complexity and might be enough to convince me that it'd be
| better to just use transmute.
|
| Great example of Rust being built such that you have to deal with
| error returns _and_ think about C++-style exception safety.
|
| > The optimisation in the Rust standard library that allows reuse
| of the heap allocation will only actually work if the size and
| alignment of T and U are the same
|
| Shouldn't it work when T and U are the same size and T has
| stricter alignment requirements than U but not exactly the same
| alignment? In this situation, any U would be properly aligned
| because T is even more aligned.
| aw1621107 wrote:
| > I'd expect it to work with _any_ type, primitive or not,
| newtype or not, with a sufficiently simple memory layout, the
| rough equivalent of what C++ calls a "standard-layout type" or
| (formerly) a "POD".
|
| This might be related in part to the fact that Rust chose to
| create specific AtomicU8/AtomicU16/etc. types instead of going
| for Atomic<T> like in C++. The reasoning for forgoing the
| latter is [0]:
|
| > However the consensus was that having unsupported atomic
| types either fail at monomorphization time or fall back to
| lock-based implementations was undesirable.
|
| That doesn't mean that one couldn't hypothetically try to write
| from_mut_slice<T> where T is a transparent newtype over one of
| the supported atomics, but I'm not sure whether that function
| signature is expressible at the moment. Maybe if/when safe
| transmutes land, since from_mut_slice is basically just doing a
| transmute?
|
| > Shouldn't it work when T and U are the same size and T has
| stricter alignment requirements than U but not exactly the same
| alignment? In this situation, any U would be properly aligned
| because T is even more aligned.
|
| I think this optimization does what you say? A quick skim of
| the source code [1] seems to show that the alignments don't
| have to exactly match:
|
|     //! # Layout constraints
|     //! <snip>
|     //! Alignments of `T` must be the same or larger than `U`.
|     //! Since alignments are always a power of two _larger_
|     //! implies _is a multiple of_.
|
| And later:
|
|     const fn in_place_collectible<DEST, SRC>(
|         step_merge: Option<NonZeroUsize>,
|         step_expand: Option<NonZeroUsize>,
|     ) -> bool {
|         if const { SRC::IS_ZST || DEST::IS_ZST
|             || mem::align_of::<SRC>() < mem::align_of::<DEST>() } {
|             return false;
|         }
|         // Other code that deals with non-alignment conditions
|     }
|
| [0]:
| https://github.com/Amanieu/rfcs/blob/more_atomic_types/text/...
|
| [1]: https://github.com/rust-
| lang/rust/blob/c58a5da7d48ff3887afe4...
| quotemstr wrote:
| > I think this optimization does what you say?
|
| Cool. Thanks for checking! I guess the article should be
| tweaked a bit --- it states that the alignment has to match
| exactly.
| 0x1ceb00da wrote:
| > Great example of Rust being built such that you have to deal
| with error returns and think about C++-style exception safety.
|
| Not really. Panics are supposed to be used in super exceptional
| situations, where the only course of action is to abort the
| whole unit of work you're doing and throw away all the
| resources. However, you do have to be careful in critical code
| because things like integer overflow can also raise a panic.
| quotemstr wrote:
| > be careful in critical code because things like integer
| overflow can also raise a panic
|
| So you can basically panic anywhere. I understand people have
| looked at no-panic markers (like C++ noexcept) but the
| proposals haven't gone anywhere. Consequently, you need to
| maintain the basic exception safety guarantee [1] at all
| times. In safe Rust, the compiler enforces this level of
| safety in most cases on its own, but there are situations in
| which you can temporarily violate program invariants and
| panic before being able to restore them. (A classic example
| is debiting from one bank account before crediting to
| another. If you panic in the middle, the money is lost.)
|
| If you want that bank code to be robust against panics, you
| need to use something like
| https://docs.rs/scopeguard/latest/scopeguard/
|
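| As a sketch of what that might look like (Account and transfer
| are hypothetical; guard and ScopeGuard::into_inner are the
| actual scopeguard API):
|
|     use scopeguard::guard;
|
|     struct Account { balance: i64 }
|
|     fn transfer(from: &mut Account, to: &mut Account, amount: i64) {
|         from.balance -= amount;
|         // If anything below panics, the guard's closure runs
|         // during unwinding and refunds the debit, restoring
|         // the invariant.
|         let from = guard(from, |from| from.balance += amount);
|         to.balance += amount; // could panic, e.g. debug overflow
|         // Success: defuse the guard so the refund never runs.
|         scopeguard::ScopeGuard::into_inner(from);
|     }
|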
| In unsafe Rust, you basically have the same burden of
| exception safety that C++ creates, except your job as an
| unsafe Rust programmer is harder than a C++ programmer's
| because Rust doesn't have a noexcept. Without noexcept, it's
| hard to reason about which calls can panic and which can't,
| so it's hard to make bulletproof cleanup paths.
|
| Most Rust programmers don't think much about panics, so I
| assume most Rust programs are full of latent bugs of this
| sort. That's why I usually recommend panic=abort.
|
| [1] https://en.wikipedia.org/wiki/Exception_safety#Classifica
| tio...
| ComputerGuru wrote:
| I don't like relying on (release-only) llvm optimizations for a
| number of reasons, but primarily a) they break between releases,
| more often than you'd think, b) they're part of the reason why
| debug builds of rust software are so much slower (at runtime)
| than release builds, c) they're much harder to verify (and very
| opaque).
|
| For non-performance-sensitive code, sure, go ahead and rely on
| the rust compiler to compile away the allocation of a whole new
| vector of a different type to convert from T to AtomicT, but
| where the performance matters, for my money I would go with the
| transmute 100% of the time (assuming the underlying type was
| decorated with #[repr(transparent)], though it would be nice if
| we could statically assert that). It'll perform better in debug
| mode, it's obvious what you are doing, it's guaranteed not to
| break in a minor rustc update, and it'll work with &mut [T]
| instead of an owned Vec<T> (which is a big one).
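|
| For what it's worth, that static assertion is expressible today;
| a hedged sketch of the pointer-cast variant, reusing the
| hypothetical SymbolId/AtomicSymbolId newtypes:
|
|     use std::mem::{align_of, size_of};
|     use std::sync::atomic::AtomicU32;
|
|     #[repr(transparent)]
|     struct SymbolId(u32);
|
|     #[repr(transparent)]
|     struct AtomicSymbolId(AtomicU32);
|
|     // Statically assert that the layouts really do match.
|     const _: () = assert!(
|         size_of::<SymbolId>() == size_of::<AtomicSymbolId>()
|             && align_of::<SymbolId>() == align_of::<AtomicSymbolId>()
|     );
|
|     fn as_atomic(s: &mut [SymbolId]) -> &mut [AtomicSymbolId] {
|         // SAFETY: both sides are #[repr(transparent)] wrappers
|         // over types with identical layout, as checked above.
|         unsafe {
|             &mut *(s as *mut [SymbolId] as *mut [AtomicSymbolId])
|         }
|     }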
| mjmas wrote:
| The particular optimisation for the non-copying Vec -> IntoIter
| -> Vec transform is actually hard-coded in the standard library as
| a special case of collecting an Iterator into a Vec. It doesn't
| rely on the backend for this.
|
| Though this optimisation is treated as an implementation detail
| [1].
|
| [1]: https://doc.rust-
| lang.org/stable/std/vec/struct.Vec.html#imp...
| 0x1ceb00da wrote:
| > It'd be reasonable to think that this will have a runtime cost,
| however it doesn't. The reason is that the Rust standard library
| has a nice optimisation in it that when we consume a Vec and
| collect the result into a new Vec, in many circumstances, the
| heap allocation of the original Vec can be reused. This applies
| in this case. But even with the heap allocation being
| reused, we're still looping over all the elements to transform
| them, right? Because the in-memory representation of an
| AtomicSymbolId is identical to that of a SymbolId, our loop
| becomes a no-op and is optimised away.
|
| Those optimisations that this code relies on are literally
| undefined behaviour. The compiler doesn't guarantee it's gonna
| apply those optimisations. So your code might suddenly become
| super slow and you'll have to go digging in to see why. Is this
| undefined behaviour better than just having an unsafe block? I'm
| not so sure. The unsafe code will be easier to read and you won't
| need any comments or a blog to explain why we're doing voodoo
| stuff because the logic of the code will explain its intentions.
| stouset wrote:
| You're misusing the term "undefined behavior". You can
| certainly say that these kinds of performance optimizations
| aren't guaranteed.
| steveklabnik wrote:
| > Those optimisations that this code relies on are literally
| undefined behaviour.
|
| You cannot get undefined behavior in Rust without an unsafe
| block.
|
| > The compiler doesn't guarantee it's gonna apply those
| optimisations.
|
| This is a different concept than UB.
|
| However, for the "heap allocation can be re-used", Rust does
| talk about this: https://doc.rust-
| lang.org/stable/std/vec/struct.Vec.html#imp...
|
| It cannot guarantee it for arbitrary iterators, but the
| map().collect() re-use is well known, and the machinery is
| there to do this, so while other implementations may not, rustc
| always will.
|
| Basically, it is implementation-defined behavior. (If it were
| C/C++ it would be 'unspecified behavior' because rustc does not
| document exactly when it does this, but this is a _very_ fine
| nitpick and not language Rust currently uses, though I'd argue
| it should.)
|
| > So your code might suddenly become super slow and you'll have
| to go digging in to see why.
|
| That's why wild has performance tests, to ensure that if a
| change breaks rustc's ability to optimize, it'll be noticed,
| and therefore fixed.
| 0x1ceb00da wrote:
| > That's why wild has performance tests, to ensure that if a
| change breaks rustc's ability to optimize, it'll be noticed,
| and therefore fixed.
|
| But benchmarks won't tell us which optimisation suddenly
| stopped working. This looks so similar to the argument
| against UB to me. Something breaks, but you don't know what,
| where, and why.
| steveklabnik wrote:
| It is true that it won't tell you, for sure. It's just that
| UB means something very specific when discussing language
| semantics.
| 0x1ceb00da wrote:
| I see. These optimisations might not be UB as understood
| in compiler lingo, but this is a kind of "undefined
| behaviour", as in anything could happen. And honestly the
| problems it might cause don't look that different from
| those caused by UB (from compiler lingo). Not to mention,
| using unsafe for writing optimised code will generate
| same-ish code in both debug and release mode, so DX will
| be better too.
| Ar-Curunir wrote:
| The optimization not getting applied doesn't mean that
| "anything could happen". Your code would just run slower.
| The result of this computation would still be correct and
| would match what you would expect to happen. This is the
| opposite of undefined behaviour, where the result is
| literally undefined, and, in particular, can be garbage.
| tuckerman wrote:
| As an example, parts of the C++ standard library (though
| none of the core language, I believe) are covered by
| complexity requirements, but implementations can still
| vary widely. E.g. std::sort needs to be linearithmic, but
| someone could still implement a very slow version without
| it being UB (even if it were quadratic or something, it
| still wouldn't be UB, just not standards-conforming).
|
| UB is really about the observable behavior of the
| abstract machine, which is limited to reads/writes to
| volatile data and I/O library calls [1].
|
| [1] http://open-std.org/jtc1/sc22/open/n2356/intro.html
|
| Edit: to clarify the example
| ashvardanian wrote:
| I'd strongly caution against many of those "performance tricks."
| Spawning an asynchronous task on a separate thread, often with a
| heap-allocated handle, solely to deallocate a local object is a
| dubious pattern -- especially given how typical allocators behave
| under the hood.
|
| I frequently encounter use-cases akin to the "Sharded Vec Writer"
| idea, and I agree it can be valuable. But if performance is a
| genuine requirement, the implementation needs to be very
| different. I once attempted to build a general-purpose trait for
| performing parallel in-place updates of a Vec<T>, and found it
| extremely difficult to express cleanly in Rust without
| degenerating into unsafe or brittle abstractions.
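|
| For concreteness, a minimal sketch of safe in-place parallel
| writes over disjoint shards, assuming rayon (the fill logic is
| made up):
|
|     use rayon::prelude::*;
|
|     fn fill_in_parallel(buf: &mut [u32], shard_len: usize) {
|         // par_chunks_mut hands each worker a disjoint
|         // &mut [u32], so the writes are in place and race-free
|         // without locks or unsafe.
|         buf.par_chunks_mut(shard_len)
|             .enumerate()
|             .for_each(|(i, shard)| {
|                 for (j, slot) in shard.iter_mut().enumerate() {
|                     *slot = (i * shard_len + j) as u32;
|                 }
|             });
|     }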
___________________________________________________________________
(page generated 2025-09-25 23:00 UTC)