[HN Gopher] The Rust Performance Book (2020)
___________________________________________________________________
The Rust Performance Book (2020)
Author : vinhnx
Score : 194 points
Date : 2025-11-19 09:08 UTC (5 days ago)
(HTM) web link (nnethercote.github.io)
(TXT) w3m dump (nnethercote.github.io)
| echelon wrote:
| This is a great resource!
|
| Some TILs:
|
| Hashing
|
| > The default hashing algorithm is not specified, but at the time
| of writing the default is an algorithm called SipHash 1-3. This
| algorithm is high quality--it provides high protection against
| collisions--but is relatively slow, particularly for short keys
| such as integers.
|
| > An attempt to switch from fxhash back to the default hasher
| resulted in slowdowns ranging from 4-84%!
|
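| For illustration, swapping in a faster hasher is nearly a
| drop-in change. A minimal sketch, assuming the rustc-hash
| crate (home of the fx hash mentioned above):
| 
|     use std::collections::HashMap;
|     use rustc_hash::FxHashMap; // assumes the rustc-hash crate
| 
|     fn main() {
|         // Default HashMap: SipHash 1-3, DoS-resistant but
|         // slower for small keys such as integers.
|         let mut slow: HashMap<u64, &str> = HashMap::new();
|         slow.insert(1, "one");
| 
|         // FxHashMap: same API, much faster non-cryptographic
|         // hash, no protection against crafted keys.
|         let mut fast: FxHashMap<u64, &str> = FxHashMap::default();
|         fast.insert(1, "one");
|     }
| 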
| I/O
|
| > Rust's print! and println! macros lock stdout on every call. If
| you have repeated calls to these macros it may be better to lock
| stdout manually.
|
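| A minimal sketch of the manual locking (std only):
| 
|     use std::io::{self, Write};
| 
|     fn main() -> io::Result<()> {
|         // Take the stdout lock once, instead of once per
|         // println! call.
|         let mut out = io::stdout().lock();
|         for i in 0..1000 {
|             writeln!(out, "line {i}")?;
|         }
|         Ok(())
|     }
| 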
| Build times
|
| > If you use dev builds but don't often use a debugger, consider
| disabling debuginfo. This can improve dev build times
| significantly, by as much as 20-40%.
|
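| For reference, disabling debuginfo for dev builds is a
| two-line profile override in Cargo.toml:
| 
|     [profile.dev]
|     debug = false
| 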
| Interesting std library alternatives
|
| > If you have many short vectors, you can use the SmallVec type
| from the smallvec crate. SmallVec<[T; N]> is a drop-in
| replacement for Vec that can store N elements within the SmallVec
| itself, and then switches to a heap allocation if the number of
| elements exceeds that.
|
| > If you have many short vectors and you precisely know their
| maximum length, ArrayVec from the arrayvec crate is a better
| choice than SmallVec. It does not require the fallback to heap
| allocation, which makes it a little faster.
|
| > The SmallString type from the smallstr crate is similar to the
| SmallVec type.
|
| I doubt I'll change my use of the standard types often, but
| this is good information to know for cases where it might be
| applicable.
|
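| For instance, the SmallVec spill behavior described above
| looks like this (a sketch assuming the smallvec 1.x API):
| 
|     use smallvec::SmallVec;
| 
|     fn main() {
|         // Up to 4 elements are stored inline; the 5th push
|         // forces a heap allocation.
|         let mut v: SmallVec<[u32; 4]> = SmallVec::new();
|         v.extend([1, 2, 3, 4]);
|         assert!(!v.spilled()); // still inline
|         v.push(5);
|         assert!(v.spilled()); // now on the heap
|     }
| 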
| Advice on enums
|
| > If an enum has an outsized variant, consider boxing one or more
| fields.
|
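| A sketch of what boxing the outsized variant buys you:
| 
|     use std::mem::size_of;
| 
|     // Every value is as large as the largest variant:
|     enum Unboxed {
|         Small(u8),
|         Big([u8; 1024]),
|     }
| 
|     // Boxing the big field shrinks the enum to roughly
|     // pointer size plus a tag:
|     enum Boxed {
|         Small(u8),
|         Big(Box<[u8; 1024]>),
|     }
| 
|     fn main() {
|         assert!(size_of::<Unboxed>() >= 1024);
|         assert!(size_of::<Boxed>() <= 16);
|     }
| 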
| I'm surprised I didn't see any advice about skipping proc macros
| or Serde for faster compile times.
| saghm wrote:
| Most of these compile time improvements seem to be more along
| the lines of drop-in changes that don't require larger
| refactors. Removing something like serde from a codebase that
| makes use of it generally is going to be a lot more work.
|
| If you're referring to serde being brought in by a dependency
| when you don't need it, most well-behaved crates should already
| have this be something you opt into by specifying the feature
| rather than something you need to go out of your way to enable.
| That said, I've had a theory for a while now that when Rust
| projects end up suffering from long compile times, the most
| significant cause is unneeded code from dependencies getting
| compiled, and that the poor ergonomics around Cargo features
| have basically encouraged the opposite of the good behavior I
| described above. I've still almost never seen this discussed
| outside of when I bring it up, so I wrote up my thoughts on it
| in a blog post a while back rather than try to restate my case
| every time, but I don't have much hope that anyone will take it
| seriously enough to either convince me I'm wrong or do anything
| about it: https://saghm.com/cargo-features-rust-compile-times/
| anonymousDan wrote:
| Interesting. It feels like once you have the features defined
| this is basically dead code elimination. To solve the
| transitive dependency issue could you not execute a dead code
| elimination pass once and cache the results?
| saghm wrote:
| Yes, I do think it resembles dead-code elimination at a
| high level. I don't think that doing it after the fact is
| particularly desirable though, even with the results
| cached. I went into more details in my response to a
| sibling comment, but I think there are actually quite a lot
| of reasons why someone in practice might still care about
| the experience when doing a completely clean build rather
| than an incremental one that can re-use the results from a
| previous one. A lot of it might be overfitting from my own
| personal experience, but I'm really not convinced that
| fixing this issue purely by adding additional steps to
| building that assume the earlier ones completed
| successfully will end up with a better experience in the
| long term; all it takes is one link in the chain to break
| in order to invalidate all of the later ones, and I'd feel
| much more confident in the toolchain if it were designed so
| that each link was strengthened as much as possible instead
| of extending the chain further to try to mitigate issues
| with the entire thing. Adding new links in the chain might
| improve the happy path, but it also increases the cost in
| the unhappy path if the chain breaks, and arguably adds
| more potential places where that can happen. I'm worried
| that focusing too much on the happy path has led to an
| experience where the risk of getting stuck on the unhappy
| path has gotten too high precisely because of how much
| easier it's been for us to keep adding links to the chain
| than to address the structural integrity of it.
| throwup238 wrote:
| _> That said, I've had a theory for a while now that when
| Rust projects end up suffering from long compile times, the
| most significant cause is unneeded code from dependencies
| getting compiled, and that the poor ergonomics around Cargo
| features have basically encouraged the opposite of the good
| behavior I described above._
|
| Are you talking about clean builds or incremental builds?
| Rust developers care far more about the latter, and
| dependencies should only be monomorphized not rebuilt.
| saghm wrote:
| Clean...ish? I might have atypical experience, but over the
| years I've run into quite a lot of cases where dependencies
| end up mattering more often than one might expect, and I
| think that the focus on incremental builds makes sense in
| theory but unfortunately misses a lot of common experiences
| in practice.
|
| For starters, any time I'm working on things on multiple
| branches at once (e.g. fixing a bug on a longer-term stable
| branch while also simultaneously prototyping a new feature
| on the development branch), the versions of the
| dependencies will potentially be different, and while it's
| certainly possible to try to keep around every possible
| past build in my `target` directory indefinitely (or cache
| it with `sccache` for future use), it's just not always
| something that's feasible. I'd also argue that the way most
| dependencies are specified in practice isn't by being
| pinned, but just by specifying the minimum version, and any
| new dependency being added (or even updating an existing
| one) can in practice cause a dependency to get bumped when
| Cargo resolves the lockfile again. (As an aside, I've also
| noticed a fairly common pattern where people seem to like
| to specify their dependencies as `x.y` rather than `x.y.z`,
| which based on the rules Cargo uses will potentially allow
| minor version updates during resolution rather than just
| patch versions, and I'm honestly not sure whether this is
| actually the intent in most cases or due to a
| misunderstanding about how Cargo resolves versions).
|
| There's also the argument that, given the six-week cadence
| of compiler releases (without an LTS version) and the
| yearly questions on the Rust language survey trying to
| learn whether people actually experience breakage when
| updating the compiler, there's an implied intent for people
| to update their stable compiler as much as possible; if
| people followed that, it ostensibly puts a maximum lifetime
| (pun intended) on clean builds. Pretty
| much no project I've worked on actually updates their Rust
| toolchain that often, which maybe is actually fine from the
| standpoint of the compiler team, but as someone who has
| been programming in the language for over a decade, I'm at
| least unsure of what they're actually striving for here, so
| if there is any chance they actually would like somewhat of
| a regular cadence, I think it might need both more clear
| communication and potentially a higher-level audit of what
| they'd actually need to achieve for this to be palatable to
| most developers.
|
| This is probably even less common, but in the past few
| years I've worked on multiple projects that use both docker
| for builds (due to needing to support other
| OS/architectures for deployment than are used for
| development) and require some amount of build.rs codegen
| for interop with other languages, and for reasons I've yet
| to fully understand (although I suspect might be due to
| weirdness with timestamps when copying files from the host
| to a container), this always seems quite brittle, with the
| generated files somehow getting out of sync with the actual
| code at least weekly. Many people on the teams I've worked
| on with this process seem to resort to just doing a fully
| clean build whenever this happens, which is mildly
| horrifying, but having looked into some of these issues,
| even a best-case partial clean of just the out of sync
| dependencies often ends up requiring a fairly large chunk
| of the build time to get repeated (especially due to the
| fact that you start losing parallelization in your builds
| once you have fewer dependencies left than cores on your
| machine).
|
| I'm not positive this is a factor, but my very naive intuition
| about static linking is that it would scale with the size
| of code being linked to. When you have a large number of
| dependencies, and each of them has a lot of dead code that
| only gets stripped out as a post-build step, I would expect
| that the linking would take longer. I'm honestly not sure
| about whether this actually matters in practice though,
| since it's unclear to me whether relinking after building
| only a few final dependencies requires linking all of the
| original dependencies or if it can start from a previous
| build and then patch the dependencies that were rebuilt.
| I've tried researching this, but either due to my lack of
| skill or a lack of resources available, I haven't been able
| to reach any conclusions about this.
|
| All of this is ignoring the elephant in the room though,
| which is that the above issues I've mentioned are all
| fairly straightforward concerns when you're running `cargo
| build` in the terminal. This isn't how most Rust developers
| I've worked with actually work, though, at least when
| actively making changes to the code; people tend to use
| `rust-analyzer` via a plugin in their editor of choice. I
| don't think in my years of Rust development I've met
| someone who is _totally_ satisfied with how well rust-
| analyzer handles edge-cases like the ones I describe above,
| and in practice, I've found that I need to completely
| restart `rust-analyzer` far more often than I'd like, and
| every time it triggers something that at least resembles a
| clean build from looking at the log outputs. It would be
| great to make rust-analyzer more reliable to reduce the
| need for this, or maybe improve it so that it can cache
| things more intelligently, but that's all optimizing for
| the happy path. I feel pretty strongly that this wouldn't
| be the best path forwards in the long term; I'd much rather
| the toolchain is designed in a way where each component
| should strive for the best possible experience independent
| of whether the other components happen to run into issues.
| For `rust-analyzer`, this would mean caching and reducing
| the need to restart, but for `cargo build`, this would mean
| striving to make even fully clean builds as painless as
| possible. The alternative, in my eyes, is essentially
| building a tower of cards that's beautiful while it remains
| intact but quickly collapses as soon as one piece shifts
| slightly out of place. I don't think this is the ethos
| that Rust embodies, and I'd find it disappointing if that's
| what everyone settles on, but at times it does feel like
| I'm a bit of an outlier in this regard.
| epage wrote:
| Pulling over my reddit comment
|
| > Providing a mechanism to manually disable individual
| default features when specifying a dependency
|
| We want this on the Cargo team; somebody just needs to do the
| design work, see
| https://github.com/rust-lang/cargo/issues/3126
|
| Note that there is a related problem of feature evolution.
| Can you take existing functionality and move it behind a
| feature? No, because that would be a breaking change for
| `default-features = false`. On the surface, it sounds like
| disabling default features works around that, and we could
| remove `default-features = false` in an Edition, but that is
| the dependent's edition and doesn't control the dependency's
| edition. We need something else to go with this.
|
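| For context, today's all-or-nothing mechanism in Cargo.toml
| looks like this (crate and feature names illustrative):
| 
|     [dependencies]
|     serde = { version = "1", default-features = false, features = ["derive"] }
| 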
| > Providing a less verbose way for libraries to expose the
| features of their direct dependencies to other packages that
| depend on them directly
|
| > Providing a way to disable features from transitive
| dependencies
|
| I'm tying these two together because I wonder how well the
| same solution would work for this: maybe not everything
| should be a feature but a
| ["global"](https://internals.rust-lang.org/t/pre-rfc-mutually-excusive-...).
| Our idea for
| solving mutually exclusive features is to provide a key-value
| pair that a package defines and sets a default on and then
| the final application can override it. Maybe we'll eventually
| allow toolkits / sdks to also override but that can run into
| unification issues and is a lower priority.
|
| Otherwise, I think we'd need to dig into exact use cases to
| make sure we have a good understanding of what kinds of
| features, in what situations, libraries need to re-export
| before figuring out what design is appropriate for making
| large scale feature management manageable.
|
| > "Zero-config" features that allow enabling/disabling code
| in a library without the author having to manually define it
|
| We have `hints.mostly-unused`
| (https://doc.rust-lang.org/cargo/reference/unstable.html#prof...),
| which defers parts of the compilation process for most of a
| package until we know whether it is needed. Currently, it is
| a mixed bag.
| Explicit features still provide noticeable benefits. It is
| mostly for packages like `windows-sys` and `aws-sdk`.
| saghm wrote:
| Thanks for the thoughtful response! I also had not been
| aware someone posted this to reddit, so I appreciate the
| heads-up on that as well.
|
| I was not aware of at least some of the potential work
| being considered on these things, so I'm definitely going
| to start reading more about each of the things you've
| linked over the next couple of days. Somehow I was under
| the false impression that there wasn't much appetite for
| the types of changes I've been hoping for, so all of this
| is awesome to hear!
| xnorswap wrote:
| > slow, particularly for short keys such as integers
|
| An interesting thing about the CLR is that the default hash for
| integers is: h(x) = x.
|
| Which, as well as being collision-free, also avoids the trap
| a slow default hash.
| Tuna-Fish wrote:
| But if you know the hash table implementation, it makes
| forcing collisions trivial for user-generated input, leading
| to easy denial of service attacks.
|
| The first requirement for safe hashtable implementations is
| a secret key, which makes it impossible for an external
| observer to know the hash value. (Or even to know the
| _relative_ hash value between any two inputs.)
| xnorswap wrote:
| Did you reply to the wrong comment? I'm not sure it follows
| from what I posted.
|
| You can't force a collision for the function h(x) = x; by
| definition it has no collisions. It's not just
| collision-resistant, it's actually collision-proof!
|
| hash(x) = x is of course not a cryptographic hash, but it
| also has the advantage that it's unlikely to be confused
| for one if anyone looks at the output.
| afishhh wrote:
| It's not about collisions of the hash function itself.
|
| Every hashtable implementation will put the hash value
| through some sort of modulo because you generally don't
| want to waste memory storing the key 5726591 at index
| 5726591 in an array.
|
| So if you know how the implementation works and the hash
| function is predictable, you can keep providing the
| program with values that will consistently go into the
| same bucket, resulting in linear lookups and insertions.
| kstrauser wrote:
| Nitpick: it's still going to collide when taking the
| modulo to decide what bucket to put it in. If you know
| that the hashmap has 1000 buckets, and you spam it with
| multiples of 1000, you could make it put every key into
| the same bucket and have O(1) lookups turn into O(n).
| That doesn't happen with real hash functions with random
| seeds.
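| 
| A toy sketch of that failure mode (bucket count and keys
| made up for illustration):
| 
|     // With an identity hash and 1000 buckets, multiples of
|     // 1000 all land in bucket 0, so lookups in that bucket
|     // degrade to a linear scan.
|     fn bucket(hash: u64, num_buckets: u64) -> u64 {
|         hash % num_buckets
|     }
| 
|     fn main() {
|         for key in [0u64, 1000, 2000, 3000] {
|             assert_eq!(bucket(key, 1000), 0);
|         }
|     }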
| xnorswap wrote:
| Oh right, got you.
| josefx wrote:
| > The first requirement for safe hashtable implementations
| is a secret key,
|
| Some languages use different approaches. The buckets in a
| Java HashMap turn into a sorted tree if they grow too
| large. Then there are trivial solutions like adding an
| input limit for untrusted users. Either way works, is
| secure and doesn't depend on a secret key.
| mrec wrote:
| Do you know how it then maps hashes to buckets? I'd expect
| the natural occurrence of integer values to be very non-
| uniform and heavily biased toward the small end.
| tialaramex wrote:
| The identity function. Several C++ implementations choose
| this too. It is very cheap but has obvious problems which may
| make you wish you'd paid more up front.
|
| If you want this in Rust you can use:
| https://crates.io/crates/integer-hasher -- being able to swap
| out the hasher is something C++ folks have kinda wanted to do
| for like 15-20 years but they have never got it over the
| line.
|
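| If you'd rather not take a dependency, a hand-rolled sketch
| of the same idea using only std (toy code: integer keys
| only, no DoS resistance):
| 
|     use std::collections::HashMap;
|     use std::hash::{BuildHasherDefault, Hasher};
| 
|     // h(x) = x. Hash for u64 calls write_u64, so write()
|     // never runs for integer keys.
|     #[derive(Default)]
|     struct IdentityHasher(u64);
| 
|     impl Hasher for IdentityHasher {
|         fn finish(&self) -> u64 { self.0 }
|         fn write(&mut self, _: &[u8]) {
|             unimplemented!("integer keys only")
|         }
|         fn write_u64(&mut self, n: u64) { self.0 = n; }
|     }
| 
|     type IntHashMap<V> =
|         HashMap<u64, V, BuildHasherDefault<IdentityHasher>>;
| 
|     fn main() {
|         let mut m: IntHashMap<&str> = IntHashMap::default();
|         m.insert(42, "answer");
|         assert_eq!(m[&42], "answer");
|     }
| 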
| I have some benchmarks I've been noodling with for a while,
| measuring different ways to do a hashtable. I call the one
| where we just do this operation but otherwise use the
| ordinary Rust Swiss tables - IntHashMap in this code. For
| some operations, and especially at small sizes, IntHashMap is
| significantly better.
|
| But, for other operations, and especially at large sizes,
| it's worse.
|
| For example, suppose we have 10k K->V pairs in our hash
| table. When we're looking for one of the ten thousand K
| values, we're much faster in IntHashMap. However, if it's
| not there, IntHashMap is slightly slower. Further, if we
| have the _first ten thousand numbers_ instead of ten
| thousand random numbers - like if we'd made a hash table of
| serial numbers for something - we're _ten times worse_ in
| IntHashMap, and that's because our hashing function, though
| fast, is very bad at its job.
| andrepd wrote:
| RE ArrayVec: I would recommend the heapless crate instead;
| better code and a nicer API.
| zipy124 wrote:
| The locking of stdout on every call is common amongst a lot of
| programming languages; it's a frequent issue when
| multi-threading code where every thread is allowed to print to
| the terminal.
| tialaramex wrote:
| On a 64-bit machine the String type, and likewise a C++
| std::string, is 24 bytes: 8 bytes for a pointer to the
| allocated memory on the heap, then twice that again for a
| size and capacity (or their pointer equivalents, depending
| on implementation).
|
| The 3rd party library type CompactString can fit up to 24 bytes
| of UTF-8 text internally, yet it is still the same size as
| String; and just like String, Option<CompactString> is the same
| size as CompactString. It does add complexity (and of course a
| library dependency if you care about that) but if you have lots
| of short strings this may be the best small string type for
| you.
|
| [The key is that UTF-8 encoding can only end with certain
| bytes; CompactString's documentation explains in more detail.]
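| 
| A quick sketch of what that buys you (assuming the
| compact_str crate):
| 
|     use compact_str::CompactString;
|     use std::mem::size_of;
| 
|     fn main() {
|         // Same 24-byte footprint as String on 64-bit targets...
|         assert_eq!(size_of::<CompactString>(), size_of::<String>());
|         // ...but short strings are stored inline, no heap
|         // allocation needed.
|         let s = CompactString::from("up to 24 bytes inline");
|         assert!(!s.is_heap_allocated());
|     }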
| epage wrote:
| For more string types, see
| https://github.com/rosetta-rs/string-rosetta-rs
| tialaramex wrote:
| Yes - this is definitely about performance tuning, which
| means you should be measuring, not just assuming whatever
| you're about to do is a good idea. CompactString absolutely
| might make the wrong trades for your needs, and the
| alternatives are very different.
| epage wrote:
| > Build times
|
| > I'm surprised I didn't see any advice about skipping proc
| macros or Serde for faster compile times.
|
| A more comprehensive document on build times is at
| https://corrode.dev/blog/tips-for-faster-rust-compile-times/
|
| We're integrating parts of it into Cargo's official
| documentation at
| https://doc.rust-lang.org/nightly/cargo/guide/build-performa...
|
| We're coordinating the work on that at
| https://github.com/rust-lang/cargo/issues/16119
|
| > > If you use dev builds but don't often use a debugger,
| consider disabling debuginfo. This can improve dev build times
| significantly, by as much as 20-40%.
|
| We're wondering if we should split iterative development from
| debugging to pull in these improvements (and maybe more). This
| is being explored at
| https://github.com/rust-lang/cargo/issues/15931
| echelon wrote:
| Any chance we could get macros that do physical-code-on-
| machine codegen? (Like Google protobuf?)
|
| I love Serde to death, but the compile times are getting
| absolutely absurd. It's painful to sit around for three
| minutes after a single code change.
|
| If I could use Serde with all of its features, but have it
| write out the source to disk so I can commit it to our repo,
| that seems like it would vastly improve compile times. During
| development, I'd love to be able to change our API definition
| and be able to rapidly test changes. The productivity loss
| with the current state of Serde and proc macros sucks.
|
| Any chance of something like that happening? I'd use it in a
| heartbeat. I would donate to fund this.
| epage wrote:
| I am very much a fan of doing codegen by generating code
| and saving it to disk, and I advocate for it in place of
| `build.rs` wherever possible.
| for doing that for proc-macros would look like. There is
| talk of proc-macro expansion caching so rustc can decide
| when it needs to re-run proc-macros. It requires more
| information from proc-macros and performance results have
| been a mixed bag.
|
| What I am most looking forward to help with this are:
|
| - reflection, which is being worked on; see
| https://rust-lang.github.io/rust-project-goals/2025h2/reflec...
|
| - declarative attribute and derive macros which have an
| initial implementation but more work around them is needed
| to make them viable
| johnofthesea wrote:
| > > If an enum has an outsized variant, consider boxing one or
| more fields.
|
| Clippy (at least the pedantic lint group) will suggest this
| if needed.
| hedora wrote:
| Is there something similar to this for rust wasm?
|
| I have some frontend use cases for rust that I just ended up
| rewriting in typescript because transferring and loading the wasm
| rust blob is more expensive than running the program.
|
| I imagine wasm-conscious optimizations would look a lot like
| targeting microcontrollers, but with weird escape hatches to
| high-level browser apis.
| vinhnx wrote:
| The RustWasm book has a section on profiling throughput and
| latency performance for Rust Wasm:
| https://rustwasm.github.io/book/reference/time-profiling.htm....
| Hope it helps.
| tracker1 wrote:
| Here's something I found: https://github.com/rustwasm/book
|
| Though it's archived and seems to be abandoned. And there
| doesn't seem to be a replacement for it...
|
| https://blog.rust-lang.org/inside-rust/2025/07/21/sunsetting...
| vinhnx wrote:
| Print version (full page):
| https://nnethercote.github.io/perf-book/print.html
| pawurb wrote:
| I got a ton of value from this book. It actually pushed me to
| dive deeper into profiling and eventually start building Rust
| perf tooling of my own: https://github.com/pawurb/hotpath-rs
| wiz21c wrote:
| I miss SIMD...
| misja111 wrote:
| There is this:
|
| > CPU Specific Instructions
|
| > If you do not care about the compatibility of your binary on
| older (or other types of) processors, you can tell the compiler
| to generate the newest (and potentially fastest) instructions
| specific to a certain CPU architecture, such as AVX SIMD
| instructions for x86-64 CPUs.
|
| > To request these instructions from the command line, use the
| -C target-cpu=native flag
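| 
| One common way to pass that flag through Cargo (shell
| sketch):
| 
|     RUSTFLAGS="-C target-cpu=native" cargo build --release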
___________________________________________________________________
(page generated 2025-11-24 23:01 UTC)