[HN Gopher] The Rust Performance Book (2020)
___________________________________________________________________
The Rust Performance Book (2020)
Author : vinhnx
Score : 194 points
Date : 2025-11-19 09:08 UTC (5 days ago)
(HTM) web link (nnethercote.github.io)
(TXT) w3m dump (nnethercote.github.io)
| echelon wrote:
| This is a great resource!
|
| Some TILs:
|
| Hashing
|
| > The default hashing algorithm is not specified, but at the time
| of writing the default is an algorithm called SipHash 1-3. This
| algorithm is high quality--it provides high protection against
| collisions--but is relatively slow, particularly for short keys
| such as integers.
|
| > An attempt to switch from fxhash back to the default hasher
| resulted in slowdowns ranging from 4-84%!
|
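| For illustration, swapping in a faster hasher is nearly a
| drop-in change. A minimal sketch, assuming the rustc-hash
| crate (home of the fx hash mentioned above):
| 
|     use std::collections::HashMap;
|     use rustc_hash::FxHashMap; // assumes the rustc-hash crate
| 
|     fn main() {
|         // Default HashMap: SipHash 1-3, DoS-resistant but
|         // slower for small keys such as integers.
|         let mut slow: HashMap<u64, &str> = HashMap::new();
|         slow.insert(1, "one");
| 
|         // FxHashMap: same API, much faster non-cryptographic
|         // hash, no protection against crafted keys.
|         let mut fast: FxHashMap<u64, &str> = FxHashMap::default();
|         fast.insert(1, "one");
|     }
| 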
| I/O
|
| > Rust's print! and println! macros lock stdout on every call. If
| you have repeated calls to these macros it may be better to lock
| stdout manually.
|
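| A minimal sketch of the manual locking (std only):
| 
|     use std::io::{self, Write};
| 
|     fn main() -> io::Result<()> {
|         // Take the stdout lock once, instead of once per
|         // println! call.
|         let mut out = io::stdout().lock();
|         for i in 0..1000 {
|             writeln!(out, "line {i}")?;
|         }
|         Ok(())
|     }
| 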
| Build times
|
| > If you use dev builds but don't often use a debugger, consider
| disabling debuginfo. This can improve dev build times
| significantly, by as much as 20-40%.
|
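| For reference, disabling debuginfo for dev builds is a
| two-line profile override in Cargo.toml:
| 
|     [profile.dev]
|     debug = false
| 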
| Interesting std library alternatives
|
| > If you have many short vectors, you can use the SmallVec type
| from the smallvec crate. SmallVec<[T; N]> is a drop-in
| replacement for Vec that can store N elements within the SmallVec
| itself, and then switches to a heap allocation if the number of
| elements exceeds that.
|
| > If you have many short vectors and you precisely know their
| maximum length, ArrayVec from the arrayvec crate is a better
| choice than SmallVec. It does not require the fallback to heap
| allocation, which makes it a little faster.
|
| > The SmallString type from the smallstr crate is similar to the
| SmallVec type.
|
| I doubt I'll change my use of the standard types often, but
| this is good information to know for cases where it might be
| applicable.
|
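| For instance, the SmallVec spill behavior described above
| looks like this (a sketch assuming the smallvec 1.x API):
| 
|     use smallvec::SmallVec;
| 
|     fn main() {
|         // Up to 4 elements are stored inline; the 5th push
|         // forces a heap allocation.
|         let mut v: SmallVec<[u32; 4]> = SmallVec::new();
|         v.extend([1, 2, 3, 4]);
|         assert!(!v.spilled()); // still inline
|         v.push(5);
|         assert!(v.spilled()); // now on the heap
|     }
| 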
| Advice on enums
|
| > If an enum has an outsized variant, consider boxing one or more
| fields.
|
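| A sketch of what boxing the outsized variant buys you:
| 
|     use std::mem::size_of;
| 
|     // Every value is as large as the largest variant:
|     enum Unboxed {
|         Small(u8),
|         Big([u8; 1024]),
|     }
| 
|     // Boxing the big field shrinks the enum to roughly
|     // pointer size plus a tag:
|     enum Boxed {
|         Small(u8),
|         Big(Box<[u8; 1024]>),
|     }
| 
|     fn main() {
|         assert!(size_of::<Unboxed>() >= 1024);
|         assert!(size_of::<Boxed>() <= 16);
|     }
| 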
| I'm surprised I didn't see any advice about skipping proc macros
| or Serde for faster compile times.
| saghm wrote:
| Most of these compile time improvements seem to be more along
| the lines of drop-in changes that don't require larger
| refactors. Removing something like serde from a codebase that
| makes use of it generally is going to be a lot more work.
|
| If you're referring to serde being brought in by a dependency
| when you don't need it, most well-behaved crates should already
| have this be something you opt into by specifying the feature
| rather than something you need to go out of your way to enable.
| That said, I've had a theory for a while now that when Rust
| projects end up suffering from long compile times, the most
| significant cause is unneeded code from dependencies getting
| compiled, and that the poor ergonomics around Cargo features
| have basically encouraged the opposite of the good behavior I
| described above. I've still almost never seen this discussed
| outside of when I bring it up, so I wrote up my thoughts on it
| in a blog post a while back rather than try to restate my case
| every time, but I don't have much hope that anyone will take it
| seriously enough to either convince me I'm wrong or do anything
| about it: https://saghm.com/cargo-features-rust-compile-times/
| anonymousDan wrote:
| Interesting. It feels like once you have the features defined
| this is basically dead code elimination. To solve the
| transitive dependency issue could you not execute a dead code
| elimination pass once and cache the results?
| saghm wrote:
| Yes, I do think it resembles dead-code elimination at a
| high level. I don't think that doing it after the fact is
| particularly desirable though, even with the results
| cached. I went into more details in my response to a
| sibling comment, but I think there are actually quite a lot
| of reasons why someone in practice might still care about
| the experience when doing a completely clean build rather
| than an incremental one that can re-use the results from a
| previous one. A lot of it might be overfitting from my own
| personal experience, but I'm really not convinced that
| fixing this issue purely by adding additional steps to
| building that assume the earlier ones completed
| successfully will end up with a better experience in the
| long term; all it takes is one link in the chain to break
| in order to invalidate all of the later ones, and I'd feel
| much more confident in the toolchain if it were designed so
| that each link was strengthened as much as possible instead
| of extending the chain further to try to mitigate issues
| with the entire thing. Adding new links in the chain might
| improve the happy path, but it also increases the cost in
| the unhappy path if the chain breaks, and arguably adds
| more potential places where that can happen. I'm worried
| that focusing too much on the happy path has led to an
| experience where the risk of getting stuck on the unhappy
| path has gotten too high precisely because of how much
| easier it's been for us to keep adding links to the chain
| than to address the structural integrity of it.
| throwup238 wrote:
| _> That said, I've had a theory for a while now that when
| Rust projects end up suffering from long compile times, the
| most significant cause is unneeded code from dependencies
| getting compiled, and that the poor ergonomics around Cargo
| features have basically encouraged the opposite of the good
| behavior I described above._
|
| Are you talking about clean builds or incremental builds?
| Rust developers care far more about the latter, and
| dependencies should only be monomorphized not rebuilt.
| saghm wrote:
| Clean...ish? I might have atypical experience, but over the
| years I've run into quite a lot of cases where dependencies
| end up mattering more often than one might expect, and I
| think that the focus on incremental builds makes sense in
| theory but unfortunately misses a lot of common experiences
| in practice.
|
| For starters, any time I'm working on things on multiple
| branches at once (e.g. fixing a bug on a longer-term stable
| branch while also simultaneously prototyping a new feature
| on the development branch), the versions of the
| dependencies will potentially be different, and while it's
| certainly possible to try to keep around every possible
| past build in my `target` directory indefinitely (or cache
| it with `sccache` for future use), it's just not always
| something that's feasible. I'd also argue that the way most
| dependencies are specified in practice isn't by being
| pinned, but just by specifying the minimum version, and any
| new dependency being added (or even updating an existing
| one) can in practice cause a dependency to get bumped when
| Cargo resolves the lockfile again. (As an aside, I've also
| noticed a fairly common pattern where people seem to like
| to specify their dependencies as `x.y` rather than `x.y.z`,
| which based on the rules Cargo uses will potentially allow
| minor version updates during resolution rather than just
| patch versions, and I'm honestly not sure whether this is
| actually the intent in most cases or due to a
| misunderstanding about how Cargo resolves versions).
|
| There's also the argument that, given the six-week cadence
| of compiler releases (without an LTS version) and the
| yearly questions on the Rust language survey trying to
| learn whether people actually experience breakage when
| updating the compiler, there's an implied intent for people
| to update their stable compiler as much as possible; if
| people followed that, it ostensibly puts a maximum lifetime
| (pun intended) on clean builds. Pretty
| much no project I've worked on actually updates their Rust
| toolchain that often, which maybe is actually fine from the
| standpoint of the compiler team, but as someone who has
| been programming in the language for over a decade, I'm at
| least unsure of what they're actually striving for here, so
| if there is any chance they actually would like somewhat of
| a regular cadence, I think it might need both more clear
| communication and potentially a higher-level audit of what
| they'd actually need to achieve for this to be palatable to
| most developers.
|
| This is probably even less common, but in the past few
| years I've worked on multiple projects that use both docker
| for builds (due to needing to support other
| OS/architectures for deployment than are used for
| development) and require some amount of build.rs codegen
| for interop with other languages, and for reasons I've yet
| to fully understand (although I suspect might be due to
| weirdness with timestamps when copying files from the host
| to a container), this always seems quite brittle, with the
| generated files somehow getting out of sync with the actual
| code at least weekly. Many people on the teams I've worked
| on with this process seem to resort to just doing a fully
| clean build whenever this happens, which is mildly
| horrifying, but having looked into some of these issues,
| even a best-case partial clean of just the out of sync
| dependencies often ends up requiring a fairly large chunk
| of the build time to get repeated (especially due to the
| fact that you start losing parallelization in your builds
| once you have fewer dependencies left than cores on your
| machine).
|
| I'm not positive this is a factor, but my very naive intuition
| about static linking is that it would scale with the size
| of code being linked to. When you have a large number of
| dependencies, and each of them has a lot of dead code that
| only gets stripped out as a post-build step, I would expect
| that the linking would take longer. I'm honestly not sure
| about whether this actually matters in practice though,
| since it's unclear to me whether relinking after building
| only a few final dependencies requires linking all of the
| original dependencies or if it can start from a previous
| build and then patch the dependencies that were rebuilt.
| I've tried researching this, but either due to my lack of
| skill or a lack of resources available, I haven't been able
| to reach any conclusions about this.
|
| All of this is ignoring the elephant in the room though,
| which is that the above issues I've mentioned are all
| fairly straightforward concerns when you're running `cargo
| build` in the terminal. This isn't how most Rust developers
| I've worked with actually work, though, at least when
| actively making changes to the code; people tend to use
| `rust-analyzer` via a plugin in their editor of choice. I
| don't think in my years of Rust development I've met
| someone who is _totally_ satisfied with how well rust-
| analyzer handles edge-cases like the ones I describe above,
| and in practice, I've found that I need to completely
| restart `rust-analyzer` far more often than I'd like, and
| every time it triggers something that at least resembles a
| clean build from looking at the log outputs. It would be
| great to make rust-analyzer more reliable to reduce the
| need for this, or maybe improve it so that it can cache
| things more intelligently, but that's all optimizing for
| the happy path. I feel pretty strongly that this wouldn't
| be the best path forwards in the long term; I'd much rather
| the toolchain is designed in a way where each component
| should strive for the best possible experience independent
| of whether the other components happen to run into issues.
| For `rust-analyzer`, this would mean caching and reducing
| the need to restart, but for `cargo build`, this would mean
| striving to make even fully clean builds as painless as
| possible. The alternative, in my eyes, is essentially
| building a tower of cards that's beautiful while it remains
| intact but quickly collapses as soon as one piece shifts
| slightly out of place. I don't think this is the ethos
| that Rust embodies, and I'd find it disappointing if that's
| what everyone settles on, but at times it does feel like
| I'm a bit of an outlier in this regard.
| epage wrote:
| Pulling over my reddit comment
|
| > Providing a mechanism to manually disable individual
| default features when specifying a dependency
|
| We want this on the Cargo team; somebody just needs to do the
| design work, see
| https://github.com/rust-lang/cargo/issues/3126
|
| Note that there is a related problem of feature evolution.
| Can you take existing functionality and move it behind a
| feature? No, because that would be a breaking change for
| `default-features = false`. On the surface, it sounds like
| disabling default features works around that, and we could
| remove `default-features = false` in an Edition, but that is
| the dependent's edition and doesn't control the dependency's
| edition. We need something else to go with this.
|
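| For context, today's all-or-nothing mechanism in Cargo.toml
| looks like this (crate and feature names illustrative):
| 
|     [dependencies]
|     serde = { version = "1", default-features = false, features = ["derive"] }
| 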
| > Providing a less verbose way for libraries to expose the
| features of their direct dependencies to other packages that
| depend on them directly
|
| > Providing a way to disable features from transitive
| dependencies
|
| I'm tying these two together because I wonder how well the
| same solution would work for this: maybe not everything
| should be a feature but a
| ["global"](https://internals.rust-lang.org/t/pre-rfc-mutually-excusive-...).
| Our idea for
| solving mutually exclusive features is to provide a key-value
| pair that a package defines and sets a default on and then
| the final application can override it. Maybe we'll eventually
| allow toolkits / sdks to also override but that can run into
| unification issues and is a lower priority.
|
| Otherwise, I think we'd need to dig into exact use cases to
| make sure we have a good understanding of what kinds of
| features, in what situations, libraries need to re-export
| before figuring out what design is appropriate for making
| large scale feature management manageable.
|
| > "Zero-config" features that allow enabling/disabling code
| in a library without the author having to manually define it
|
| We have `hints.mostly-unused`
| (https://doc.rust-lang.org/cargo/reference/unstable.html#prof...),
| which defers parts of the compilation process for most of a
| package until we know whether it is needed. Currently, it is
| a mixed bag.
| Explicit features still provide noticeable benefits. It is
| mostly for packages like `windows-sys` and `aws-sdk`.
| saghm wrote:
| Thanks for the thoughtful response! I also had not been
| aware someone posted this to reddit, so I appreciate the
| heads-up on that as well.
|
| I was not aware of at least some of the potential work
| being considered on these things, so I'm definitely going
| to start reading more about each of the things you've
| linked over the next couple of days. Somehow I was under
| the false impression that there wasn't much appetite for
| the types of changes I've been hoping for, so all of this
| is awesome to hear!
| xnorswap wrote:
| > slow, particularly for short keys such as integers
|
| An interesting thing about the CLR is that the default hash for
| integers is: h(x) = x.
|
| Which, as well as being collision-free, also avoids the trap
| a slow default hash.
| Tuna-Fish wrote:
| But if you know the hash table implementation, it makes
| forcing collisions trivial for user-generated input, leading
| to easy denial of service attacks.
|
| The first requirement for safe hashtable implementations is
| a secret key, which makes it impossible for an external
| observer to know the hash value. (Or even to know the
| _relative_ hash value between any two inputs.)
| xnorswap wrote:
| Did you reply to the wrong comment? I'm not sure it follows
| from what I posted.
|
| You can't force a collision for the function h(x) = x; by
| definition it has no collisions. It's not just
| collision-resistant, it's actually collision-proof!
|
| hash(x) = x is of course not a cryptographic hash, but it
| also has the advantage that it's unlikely to be confused
| for one if anyone looks at the output.
| afishhh wrote:
| It's not about collisions of the hash function itself.
|
| Every hashtable implementation will put the hash value
| through some sort of modulo because you generally don't
| want to waste memory storing the key 5726591 at index
| 5726591 in an array.
|
| So if you know how the implementation works and the hash
| function is predictable, you can keep providing the
| program with values that will consistently go into the
| same bucket, resulting in linear lookups and insertions.
| kstrauser wrote:
| Nitpick: it's still going to collide when taking the
| modulo to decide what bucket to put it in. If you know
| that the hashmap has 1000 buckets, and you spam it with
| multiples of 1000, you could make it put every key into
| the same bucket and have O(1) lookups turn into O(n).
| That doesn't happen with real hash functions with random
| seeds.
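| 
| A toy sketch of that failure mode (bucket count and keys
| made up for illustration):
| 
|     // With an identity hash and 1000 buckets, multiples of
|     // 1000 all land in bucket 0, so lookups in that bucket
|     // degrade to a linear scan.
|     fn bucket(hash: u64, num_buckets: u64) -> u64 {
|         hash % num_buckets
|     }
| 
|     fn main() {
|         for key in [0u64, 1000, 2000, 3000] {
|             assert_eq!(bucket(key, 1000), 0);
|         }
|     }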
| xnorswap wrote:
| Oh right, got you.
| josefx wrote:
| > The first requirement for safe hashtable implementations
| is a secret key,
|
| Some languages use different approaches. The buckets in a
| Java HashMap turn into a sorted tree if they grow too
| large. Then there are trivial solutions like adding an
| input limit for untrusted users. Either way works, is
| secure and doesn't depend on a secret key.
| mrec wrote:
| Do you know how it then maps hashes to buckets? I'd expect
| the natural occurrence of integer values to be very non-
| uniform and heavily biased toward the small end.
| tialaramex wrote:
| The identity function. Several C++ implementations choose
| this too. It is very cheap but has obvious problems which may
| make you wish you'd paid more up front.
|
| If you want this in Rust you can use:
| https://crates.io/crates/integer-hasher -- being able to swap
| out the hasher is something C++ folks have kinda wanted to do
| for like 15-20 years but they have never got it over the
| line.
|
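| If you'd rather not take a dependency, a hand-rolled sketch
| of the same idea using only std (toy code: integer keys
| only, no DoS resistance):
| 
|     use std::collections::HashMap;
|     use std::hash::{BuildHasherDefault, Hasher};
| 
|     // h(x) = x. Hash for u64 calls write_u64, so write()
|     // never runs for integer keys.
|     #[derive(Default)]
|     struct IdentityHasher(u64);
| 
|     impl Hasher for IdentityHasher {
|         fn finish(&self) -> u64 { self.0 }
|         fn write(&mut self, _: &[u8]) {
|             unimplemented!("integer keys only")
|         }
|         fn write_u64(&mut self, n: u64) { self.0 = n; }
|     }
| 
|     type IntHashMap<V> =
|         HashMap<u64, V, BuildHasherDefault<IdentityHasher>>;
| 
|     fn main() {
|         let mut m: IntHashMap<&str> = IntHashMap::default();
|         m.insert(42, "answer");
|         assert_eq!(m[&42], "answer");
|     }
| 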
| I have some benchmarks I've been noodling with for a while,
| measuring different ways to do a hashtable. I call the one
| where we just do this operation but otherwise use the
| ordinary Rust Swiss tables - IntHashMap in this code. For
| some operations, and especially at small sizes, IntHashMap is
| significantly better.
|
| But, for other operations, and especially at large sizes,
| it's worse.
|
| For example, suppose we have 10k K->V pairs in our hash
| table. When we're looking for one of the ten thousand K
| values, we're much faster in IntHashMap. However, if it's
| not there, IntHashMap is slightly slower. Further, if we
| have the _first ten thousand numbers_ instead of ten
| thousand random numbers - like if we'd made a hash table of
| serial numbers for something - we're _ten times worse_ in
| IntHashMap, and that's because our hashing function, though
| fast, is very bad at its job.
| andrepd wrote:
| RE ArrayVec: I would recommend the heapless crate instead;
| better code and a nicer API.
| zipy124 wrote:
| The locking of stdout on every call is common amongst a lot of
| programming languages; it's a frequent issue when
| multi-threading code where every thread is allowed to print to
| the terminal.
| tialaramex wrote:
| On a 64-bit machine the String type, and likewise a C++
| std::string, is 24 bytes: 8 bytes for a pointer to the
| allocated memory on the heap, then twice that again for a
| size and capacity (or their pointer equivalents, depending
| on implementation).
|
| The 3rd party library type CompactString can fit up to 24 bytes
| of UTF-8 text internally, yet it is still the same size as
| String; and just like String, Option<CompactString> is the same
| size as CompactString. It does add complexity (and of course a
| library dependency if you care about that) but if you have lots
| of short strings this may be the best small string type for
| you.
|
| [The key is that UTF-8 encoding can only end with certain
| bytes; CompactString's documentation explains in more detail.]
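| 
| A quick sketch of what that buys you (assuming the
| compact_str crate):
| 
|     use compact_str::CompactString;
|     use std::mem::size_of;
| 
|     fn main() {
|         // Same 24-byte footprint as String on 64-bit targets...
|         assert_eq!(size_of::<CompactString>(), size_of::<String>());
|         // ...but short strings are stored inline, no heap
|         // allocation needed.
|         let s = CompactString::from("up to 24 bytes inline");
|         assert!(!s.is_heap_allocated());
|     }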
| epage wrote:
| For more string types, see
| https://github.com/rosetta-rs/string-rosetta-rs
| tialaramex wrote:
| Yes - this is definitely about performance tuning, which
| means you should be measuring, not just assuming whatever
| you're about to do is a good idea. CompactString absolutely
| might make the wrong trades for your needs, and the
| alternatives are very different.
| epage wrote:
| > Build times
|
| > I'm surprised I didn't see any advice about skipping proc
| macros or Serde for faster compile times.
|
| A more comprehensive document on build times is at
| https://corrode.dev/blog/tips-for-faster-rust-compile-times/
|
| We're integrating parts of it into Cargo's official
| documentation at
| https://doc.rust-lang.org/nightly/cargo/guide/build-performa...
|
| We're coordinating the work on that at
| https://github.com/rust-lang/cargo/issues/16119
|
| > > If you use dev builds but don't often use a debugger,
| consider disabling debuginfo. This can improve dev build times
| significantly, by as much as 20-40%.
|
| We're wondering if we should split iterative development from
| debugging to pull in these improvements (and maybe more). This
| is being explored at
| https://github.com/rust-lang/cargo/issues/15931
| echelon wrote:
| Any chance we could get macros that do physical-code-on-
| machine codegen? (Like Google protobuf?)
|
| I love Serde to death, but the compile times are getting
| absolutely absurd. It's painful to sit around for three
| minutes after a single code change.
|
| If I could use Serde with all of its features, but have it
| write out the source to disk so I can commit it to our repo,
| that seems like it would vastly improve compile times. During
| development, I'd love to be able to change our API definition
| and be able to rapidly test changes. The productivity loss
| with the current state of Serde and proc macros sucks.
|
| Any chance of something like that happening? I'd use it in a
| heartbeat. I would donate to fund this.
| epage wrote:
| I am very much a fan of doing codegen by generating code
| and saving it to disk, and I advocate for it in place of
| `build.rs` wherever possible.
| for doing that for proc-macros would look like. There is
| talk of proc-macro expansion caching so rustc can decide
| when it needs to re-run proc-macros. It requires more
| information from proc-macros and performance results have
| been a mixed bag.
|
| What I am most looking forward to help with this are:
|
| - reflection, which is being worked on; see
| https://rust-lang.github.io/rust-project-goals/2025h2/reflec...
|
| - declarative attribute and derive macros which have an
| initial implementation but more work around them is needed
| to make them viable
| johnofthesea wrote:
| > > If an enum has an outsized variant, consider boxing one or
| more fields.
|
| Clippy (at least the pedantic lint group) will suggest this
| if needed.
| hedora wrote:
| Is there something similar to this for rust wasm?
|
| I have some frontend use cases for rust that I just ended up
| rewriting in typescript because transferring and loading the wasm
| rust blob is more expensive than running the program.
|
| I imagine wasm-conscious optimizations would look a lot like
| targeting microcontrollers, but with weird escape hatches to
| high-level browser apis.
| vinhnx wrote:
| The RustWasm book has a section on profiling throughput and
| latency performance for Rust Wasm:
| https://rustwasm.github.io/book/reference/time-profiling.htm....
| Hope it helps.
| tracker1 wrote:
| Here's something I found: https://github.com/rustwasm/book
|
| Though it's archived and seems to be abandoned. And there
| doesn't seem to be a replacement for it...
|
| https://blog.rust-lang.org/inside-rust/2025/07/21/sunsetting...
| vinhnx wrote:
| Print version (full page):
| https://nnethercote.github.io/perf-book/print.html
| pawurb wrote:
| I got a ton of value from this book. It actually pushed me to
| dive deeper into profiling and eventually start building Rust
| perf tooling of my own: https://github.com/pawurb/hotpath-rs
| wiz21c wrote:
| I miss SIMD...
| misja111 wrote:
| There is this:
|
| > CPU Specific Instructions
|
| > If you do not care about the compatibility of your binary on
| older (or other types of) processors, you can tell the compiler
| to generate the newest (and potentially fastest) instructions
| specific to a certain CPU architecture, such as AVX SIMD
| instructions for x86-64 CPUs.
|
| > To request these instructions from the command line, use the
| -C target-cpu=native flag
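| 
| One common way to pass that flag through Cargo (shell
| sketch):
| 
|     RUSTFLAGS="-C target-cpu=native" cargo build --release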
___________________________________________________________________
(page generated 2025-11-24 23:01 UTC)