[HN Gopher] How Rust 1.64 became faster on Windows
___________________________________________________________________
How Rust 1.64 became faster on Windows
Author : todsacerdoti
Score : 160 points
Date : 2022-10-23 13:43 UTC (9 hours ago)
(HTM) web link (tomaszs2.medium.com)
(TXT) w3m dump (tomaszs2.medium.com)
| mgaunard wrote:
| wongarsu wrote:
| Great to see how huge the benefit of profile-guided optimization
| is. I feel it's one of the more underappreciated techniques. Rust
| adding support for it on Windows, and showcasing what a big
| improvement it makes to the compiler, is pretty big (in
| addition to just having a faster compiler).
| mhh__ wrote:
| I did PGO builds for the D compiler; it was about a 10% to
| 30% win on some benchmarks.
|
| The subtle win is the space savings: no more Jackson Pollock
| inlining.
| londons_explore wrote:
| Unfortunately, many projects never benefit from PGO, because
| there is quite a lot of complexity involved in setting up a
| profiling workload, storing the profile somewhere, and using it
| for future builds.
|
| I'd like compiler writers to embed a 'default profile' into the
| compiler, which uses data from as much opensource code as they
| can find all over github etc.
|
| This default profile will improve the performance of lots of
| libraries that everyone uses, and will probably still help
| closed source code (since it will probably be written in a
| similar style to opensource code).
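The three steps the comment describes (set up a profiling workload, store the profile, feed it to future builds) can be sketched with rustc's LLVM-based PGO flags. This is a hedged sketch; the binary name, workload flag, and paths are illustrative assumptions, not from the thread:

```shell
# 1. Build an instrumented binary that records execution counts.
RUSTFLAGS="-Cprofile-generate=/tmp/pgo-data" cargo build --release

# 2. Run a representative workload to populate /tmp/pgo-data.
./target/release/myapp --typical-workload

# 3. Merge the raw profiles (llvm-profdata ships in the
#    llvm-tools rustup component).
llvm-profdata merge -o /tmp/pgo-data/merged.profdata /tmp/pgo-data

# 4. Rebuild, letting the optimizer consume the merged profile.
RUSTFLAGS="-Cprofile-use=/tmp/pgo-data/merged.profdata" cargo build --release
```

Each of these steps has to be automated and the profile stored somewhere, which is exactly the friction the comment points to.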
| hinkley wrote:
| JITs do PGO all the time. It's their bread and butter.
| pjmlp wrote:
| Nowadays they also save PGO across executions, so that they
| don't always start from zero.
|
| The most modern ones that is (Java, .NET, Android).
| andrewaylett wrote:
| The "default" profile "for PGO" is the compiler on its own --
| folk put a _lot_ of effort into making sure it will generally
| compile arbitrary code well. And a big part of that is lots
| of people running lots of open source code and measuring how
| well it performs.
|
| The difficulty with "as much open source code as they can
| find" is that we need to execute the code to make a profile.
| And unless we're running the code under real-world
| conditions, there's no guarantee that we'll generate a useful
| profile. So we need to be a little careful about which code
| we look at from a performance perspective. Even when we have
| a profile, it's a count of branches taken for the specific
| code that was compiled, and it's not normally applicable to
| either a different version of the compiler or any input
| that's not identical to the input used for profiling. With
| link-time optimisations, even a "common" profile for library
| code isn't necessarily going to be useful: which bits of a
| library we'll try to inline will vary according to the code
| that's calling it.
| pca006132 wrote:
| I think you can already build shared libraries with PGO,
| although this doesn't really work with header-only libraries
| for C++...
| darksaints wrote:
| I think a cool project to work on would be model-based, ML-
| generated profiles that take a set of parameters like:
|
| * application type (e.g. client, server, batch process,
|   parser, etc.)
| * target architecture, vendor, model, etc.
| * target resources like RAM, HD types, network interfaces,
|   etc.
|
| I would think you could get very close to an actual PGO
| level of performance with just a handful of parameters and a
| lot of data.
| branko_d wrote:
| Perhaps a better approach would be some sort of per-library
| profile?
| londons_explore wrote:
| If you can make it zero effort for developers, that's a
| good plan... But if it involves even a minor effort from
| the developer, then most developers probably won't bother.
|
| I'm imagining for example a 'profile server', which anyone
| can upload profiler data to, and that the compiler queries
| to get profile data for any given file it wants to compile.
| pjmlp wrote:
| That is the beauty of modern JITs with PGO feedback data: it
| can be saved across execution sessions, and over time the
| data will grow towards an optimal point.
| bruce343434 wrote:
| > I'd like compiler writers to embed a 'default profile' into
| the compiler, which uses data from as much opensource code as
| they can find all over github etc.
|
| What would be the point? The whole thing about PGO is that it
| measures which paths of _your_ code are "hot".
| tsavola wrote:
| Consider error handling paths.
| tialaramex wrote:
| Rust will already end up optimising out the error
| handling that _can't_ happen because Infallible is an
| Empty Type (it makes no sense to emit code for an Empty
| Type because no values of this type can exist, so during
| monomorphization this code evaporates)
|
| (e.g. trying to convert a 16-bit unsigned integer into a
| 32-bit signed integer can't fail, that always works so
| its error type is Infallible, whereas trying to convert a
| 32-bit _signed_ integer into an unsigned one clearly
| fails for some values; that's a
| core::num::TryFromIntError you need to handle)
|
| So we're left only with errors which _don't_ happen. But
| who says? On my workload maybe the profile image file
| doesn't exist 0% of the time since I'm actually making
| the image files, so of course they exist, but in _your_
| workload the user gets to specify the filename and so
| they type it wrong about 0.1% of the time, and in
| somebody else's workload the hostile adversary spews
| nonsense filename values like "../../../../../etc/passwd"
| to try to exploit bugs in some PHP code from 15 years
| ago, so they see almost 10% errors. What would we learn
| from a "general profile"? Nothing useful.
| hinkley wrote:
| Or a perennial favorite of mine:
|
| $ process Some Image Name.png
|
| Could not find file "Some"
|
| $ process "Some Image Name.png"
|
| Done.
| a1369209993 wrote:
| > Some Image Name.png
|
| ... Urg, _that_.
|
| If I ever implement a bespoke file system format, it is
| going to be encoding-level impossible to represent file
| names with spaces. Not FAT-style[0] "the spec says to
| replace that with a underscore" or something, but more
| "the on-disk character encoding does not contain any
| sequence of bits that represents space".
|
| 0: (non-ex-)FAT stores filenames in all caps, but the
| data on disk is ASCII, so you can just write lowercase
| letters in the physical directory entries. (I've seen at
| least one FAT implementation that actually uses that to
| 'support' lowercase filenames.)
| dhosek wrote:
| Meh, I remember the move from DOS 3.3 to ProDOS back in
| the Apple //e days and the loss of spaces in filenames
| was something that seemed like a regression to me.
| dhosek wrote:
| I'd rather see a ban on non-Unicode strings as file
| paths. ^&*# Windows.
| [deleted]
| hinkley wrote:
| And all this time I've been blaming Windows for bringing
| us white spaces in file names.
| londons_explore wrote:
| Lots of _your_ code is library code that everybody uses...
|
| And lots of your code has similar hot paths to everyone
| else's code. It turns out that `for x in pixels { }` is
| probably going to be a hot loop... But `for x in
| serial_ports { }` probably isn't a hot loop...
| tnh wrote:
| Agree this difficulty is the biggest obstacle to PGO's
| success. A language/ecosystem that works out how to integrate
| this as smoothly as testing would have a sizeable performance
| boost in practice.
|
| The default profile is a nice hack. We do this by default for
| C++ builds at [company], it works great. Teams that care can
| build a custom profile which performs better, but most don't.
|
| > I'd like compiler writers to embed a 'default profile' into
| the compiler, which uses data from as much opensource code as
| they can find all over github etc.
|
| Working out how to build, let alone profile, all that code
| is no joke. And the result will be large, and may not
| overlap much with the average program. As a sibling points
| out, maybe using ML to recognize patterns instead of concrete
| code would help?
|
| I'd settle for profiling of the standard library. In an
| ecosystem like Rust, per-crate default profiles that you
| could stitch together would be amazing.
| dijit wrote:
| What is the common consensus on benchmarking test suites in Rust?
|
| From what I understood, Criterion is the gold standard, but
| there is also a built-in benchmark suite, which is only
| supported on nightly.
|
| What's the difference?
| dochtman wrote:
| There's the bencher crate as well, which provides a similar API
| to nightly through macros that work on stable. On one of the
| projects I maintain we reverted from criterion to bencher
| because the criterion results sometimes made no sense.
| throwup wrote:
| Criterion is still the gold standard.
|
| Pros for Criterion over the stdlib:
| https://github.com/bheisler/criterion.rs#features
|
| Downsides of Criterion:
| https://bheisler.github.io/criterion.rs/book/user_guide/know...
| dijit wrote:
| Thanks throwup!
| superjan wrote:
| TLDR: Profile-guided optimization was not supported on
| Windows. That has now been enabled. So this only helps you
| if you are compiling on Windows and want to go the extra
| mile of running PGO builds.
| itamarst wrote:
| They're shipping PGO builds of the Rust compiler, so for faster
| compilation you don't have to do anything ("Windows builds now
| use profile-guided optimization, providing 10-20% improvements
| to compiler performance", per Rust's release notes:
| https://github.com/rust-lang/rust/blob/master/RELEASES.md).
| bogeholm wrote:
| After enabling PGO support, it was used to compile `rustc` itself.
|
| So compiling Rust code on Windows is faster with 1.64 than
| 1.63.
| mastax wrote:
| Tldr: PGO
|
| https://github.com/rust-lang/rust/pull/96978/
| [deleted]
| SimonV1235 wrote:
| hinoki wrote:
| I thought PGO instrumentation works on basic blocks, and the
| inlining, outlining, and register allocation optimisations are
| all done on LLVM's IR. So everything can happen in the backend.
|
| What sort of work is OS specific, or language specific?
|
| I've used PGO before, but I'm not familiar with the details.
| Someone wrote:
| Computing the profile wasn't possible on Windows
|
| FTA: _But there is one problem: PGO was up until now available
| only on Linux._
|
| I think they couldn't use a profile generated on Linux because
| of differences in ABI and standard library.
|
| I also think generating a profile is OS dependent because you
| want it to not have much of a performance impact.
| eventhorizonpl wrote:
| It's great to see every speed up :)
| [deleted]
| aliqot wrote:
| I think your comments are being suppressed, they're all removed
| after you post. You might check on that.
| nalllar wrote:
| https://news.ycombinator.com/newsfaq.html
|
| Look at the [dead] section
| aliqot wrote:
| A single flagged comment and we've declared someone
| irredeemable. Amazing.
|
| https://news.ycombinator.com/item?id=31738035
| xyzzy123 wrote:
| Look closer.
| sp332 wrote:
| I don't think that's right, because there are 6 non-dead
| comments after that one. But very new accounts are likely
| to be banned after a single flag, to avoid ban evasion.
| aliqot wrote:
| Those have been vouched for
| jeff-davis wrote:
| For databases, I'm reluctant to rely on PGO (profile-guided
| optimization) because the workloads are so varied. There's a risk
| of over-fitting to the profiled workloads at the expense of
| others.
|
| Though there may be a lot of opportunity with some database
| _subsystems_ that have a more consistent usage pattern.
|
| Edit: also, PGO is closely related to JIT techniques, which are
| based on current runtime information rather than profiles
| generated a long time ago on a workload that may or may not be
| representative.
| vlovich123 wrote:
| I think in practice enabling PGO will be a net gain even if
| it's suboptimal on some workloads (i.e. you should still see
| some performance gain across the board even if the specific
| workload isn't profiled). The reason is that it uses the
| profiles to make decisions in lieu of heuristics, which
| should be a win even for non-profiled workloads, because
| heuristics are essentially just general-case profiles (i.e.
| tuned to a bunch of OSS software out there). I'm unaware of
| any research showing PGO being worse than not doing it even
| if your profile isn't the workload (you'd probably have to
| specially construct such a situation, and it's unlikely to
| come up in practice).
|
| Have you actually seen otherwise?
| Filligree wrote:
| Throughput isn't everything, and improving OPS at the
| expense of tail latency can be a problem. Depends on your
| specific workload, but it isn't something I'd enable by
| default.
| summerlight wrote:
| PGO will still likely improve general performance even with
| biased workloads because in many cases we want compilers to
| focus on optimizing happy paths, but this is not always well
| executed even when it's pretty obvious to human eyes.
| sorz wrote:
| Is it possible that the profile is over-fitting to the benchmark
| tests?
| varajelle wrote:
| Yes that's likely. But the idea is that even if that's the
| case, it is still better than no PGO.
|
| Edit: I'd like to add that if the 10-20% mentioned is measured
| on the benchmark that was used to do the pgo, then that figure
| might indeed not be representative of the real gain.
| Tuna-Fish wrote:
| Their main benchmark test is compiling every publicly released
| crate on crates.io. This is also their main regression test.
|
| If you manage to overfit against that, it's still probably an
| amazing general purpose solution.
| tyingq wrote:
| https://archive.ph/5nvje
| unnouinceput wrote:
| xeonmc wrote:
| Oh, I just assumed that the article was written by a French
| person and shrugged.
| DeathArrow wrote:
| wizardman wrote:
| evilduck wrote:
| Why does this flame bait have to get squeezed in everywhere?
| Thaxll wrote:
| There is no viable alternative to Electron that is truly multi-
| platform.
| simplotek wrote:
| > There is no viable alternative to Electron that is truly
| multi-platform.
|
| Qt?
| pixl97 wrote:
| They did say viable.
| simplotek wrote:
| Since when are the likes of Electron more viable than Qt?
| pixl97 wrote:
| Since there are at least an order of magnitude more
| programmers that are going to write JS over Qt.
| fuzzy2 wrote:
| I'm all for native UIs, but if you _must_ go cross-platform,
| it's often not worth it. Why bother with the specifics of
| Windows, Linux, macOS, Android, and iOS? Just build one
| mediocre UI to rule them all.
|
| Electron does not have to be slow, just as web applications
| in general do not have to be slow, multi-megabyte
| monstrosities.
|
| I bet you would be surprised if you knew where even
| specialized, sort-of-embedded UI is moving. Hint: it's not
| native.
| pas wrote:
| Nowadays it's possible to opt for a best-of-both-worlds
| situation: https://tauri.app/ or
| https://github.com/sciter-sdk/rust-sciter
| Thaxll wrote:
| Electron uses chrome everywhere, Tauri uses the system
| webview which is very different and vastly inferior.
| zeta0134 wrote:
| This really depends on your definition of inferior; I'd
| consider the lower resource usage a good tradeoff for fewer
| bleeding edge features on most days
| geodel wrote:
| On the third hand are people who use VS Code to write Rust
| code and _claim_ tools based on Electron are the way to go.
| [deleted]
| simplotek wrote:
| > On the other hand there are guys who don't waste much time
| and pick Electron for their app, for a 1000% performance
| degradation.
|
| Do hypothetical 1000% performance gains matter if perceived
| performance is already within acceptable limits?
|
| Wasting time gold-plating solutions with no meaningful
| is a negative trait, not a positive one.
| pixl97 wrote:
| Individually or in bulk?
|
| For an individual, no.
|
| For the gigawatt of power you just wasted nationwide, yes.
| golergka wrote:
| Being able to use the same code on all three major desktop
| platforms as well as the web is worth it many times over.
___________________________________________________________________
(page generated 2022-10-23 23:01 UTC)