[HN Gopher] Hyperfine: A command-line benchmarking tool
___________________________________________________________________
Hyperfine: A command-line benchmarking tool
Author : hundredwatt
Score : 221 points
Date : 2024-11-18 21:47 UTC (1 days ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| mmastrac wrote:
| Hyperfine is a great tool but when I was using it at Deno to
| benchmark startup time there was a lot of weirdness around the
| operating system apparently caching inodes of executables.
|
| If you are looking at shaving sub-20 ms numbers, be aware you may
| need to pull tricks, on macOS especially, to get real numbers.
| JackYoustra wrote:
| I've found pretty good results with the System Trace template
| in xcode instruments. You can also stack instruments, for
| example combining the file inspector with a virtual memory
| inspector.
|
| I've run into some memory corruption with it sometimes, though,
| so be wary of that. Emerge tools has an alternative for iOS at
| least, maybe one day they'll port it to mac.
| art049 wrote:
| I never tried xcode instruments. Is the UX good for this kind
| of tool?
| sharkdp wrote:
| Caching is something that you almost always have to be aware of
| when benchmarking command line applications, even if the
| application itself has no caching behavior. Please see
| https://github.com/sharkdp/hyperfine?tab=readme-ov-file#warm...
| on how to run either warm-cache benchmarks or cold-cache
| benchmarks.
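|
| A minimal sketch of both modes from that section of the README
| ('grep -R TODO src' is just a placeholder workload; the
| drop_caches line is Linux-only and needs root):
|
|       # warm-cache benchmark: discard the first few runs
|       hyperfine --warmup 3 'grep -R TODO src'
|
|       # cold-cache benchmark: flush the page cache before each run
|       hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' \
|           'grep -R TODO src'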
| mmastrac wrote:
| I'm fully aware but it's not a problem that warmup runs fix.
| An executable freshly compiled will always benchmark
| differently than one that has "cooled off" on macos,
| regardless of warmup runs.
|
| I've tried to understand what the issue is (played with
| resigning executables etc) but it's literally something about
| the inode of the executable itself. Most likely part of the
| OSX security system.
| renewiltord wrote:
| Interesting. I've encountered this obviously on first run
| (because of the security checking it does on novel
| executables) but didn't realize this expired. Probably
| because I usually attribute it to a recompilation. Thanks.
| maccard wrote:
| Not being able to rely on numbers to 20ms is pretty poor.
| That's longer than a frame in a video game.
|
| Windows has microsecond precision counters (see
| QueryPerformanceCounter and friends)
| 7e wrote:
| What database product does the community commonly send benchmark
| results to? This tool is great, but I'd love to analyze results
| relationally.
| rmorey wrote:
| Something like Geekbench for CLI tools would be awesome
| forrestthewoods wrote:
| Hyperfine is hyper frustrating because it only works with really
| really fine microsecond level benchmarks. Once you get into the
| millisecond range it's worthless.
| anotherhue wrote:
| It spawns a new process each time right? I would think that
| would put a cap on how accurate it can get.
|
| For my purposes I use it all the time though, quick and easy
| sanity-check.
| forrestthewoods wrote:
| The issue is it runs a kajillion tests to try and be
| "statistical". But there's no good way to say "just run it
| for 5 seconds and give me the best answer you can". It's very
| much designed for nanosecond to low microsecond benchmarks.
| Trying to fight this is trying to smash a square peg through
| a round hole.
| gforce_de wrote:
| At least it gives some numbers and points in a direction:
|       $ hyperfine --warmup 3 './hello-world-bin-sh.sh' './hello-world-env-python3.py'
|       Benchmark 1: ./hello-world-bin-sh.sh
|         Time (mean ± σ):       1.3 ms ±   0.4 ms    [User: 1.0 ms, System: 0.5 ms]
|         ...
|       Benchmark 2: ./hello-world-env-python3.py
|         Time (mean ± σ):      43.1 ms ±   1.4 ms    [User: 33.6 ms, System: 8.4 ms]
|         ...
| PhilipRoman wrote:
| I disagree that it is designed for nano/micro benchmarks.
| If you want that level of detail, you need to stay within a
| single process, pinned to a core which is isolated from
| scheduler. At least I found it almost impossible to
| benchmark assembly routines with it.
| sharkdp wrote:
| > The issue is it runs a kajillion tests to try and be
| "statistical".
|
| If you see any reason for putting "statistical" in quotes,
| please let us know. hyperfine does not run a lot of tests,
| but it does try to find outliers in your measurements. This
| is really valuable in some cases. For example: we can
| detect when the first run of your program takes much longer
| than the rest of the runs. We can then show you a warning
| to let you know that you probably want to either use some
| warmup runs, or a "--prepare" command to clean (OS) caches
| if you want a cold-cache benchmark.
|
| > But there's no good way to say "just run it for 5 seconds
| and give me the best answer you can".
|
| What is the "best answer you can"?
|
| > It's very much designed for nanosecond to low microsecond
| benchmarks.
|
| Absolutely not. With hyperfine, you can not measure
| execution times in the "low microsecond" range, let alone
| nanosecond range. See also my other comment.
| oguz-ismail wrote:
| It spawns a new _shell_ for each run and subtracts the
| average shell startup time from final results. Too much noise
| PhilipRoman wrote:
| The shell can be disabled, leaving just fork+exec
| sharkdp wrote:
| Yes. If you don't make use of shell builtins/syntax, you
| can use hyperfine's `--shell=none`/`-N` option to disable
| the intermediate shell.
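|
| For illustration, the two modes side by side ('ls -l' is just a
| placeholder command):
|
|       # default: the command is run through an intermediate shell
|       hyperfine 'ls -l'
|
|       # -N / --shell=none: the command is spawned directly
|       hyperfine -N 'ls -l'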
| oguz-ismail wrote:
| You still need to quote the command though. `hyperfine -N
| ls "$dir"' won't work, you need `hyperfine -N "ls
| ${dir@Q}"' or something. It'd be better if you could
| specify commands like with `find -exec'.
| PhilipRoman wrote:
| Oh that sucks, I really hate when programs impose useless
| shell parsing instead of letting the user give an
| argument vector natively.
| sharkdp wrote:
| I don't think it's useless. You can use hyperfine to run
| multiple benchmarks at the same time, to get a comparison
| between multiple tools. So if you want it to work without
| quotes, you need to (1) come up with a way to separate
| commands and (2) come up with a way to distinguish
| hyperfine arguments from command arguments. It's doable,
| but it's also not a great UX if you have to write
| something like:
|
|       hyperfine -N -- ls "$dir" \; my_ls "$dir"
| oguz-ismail wrote:
| > not a great UX
|
| Looks fine to me. Obviously it's too late to undo that
| mistake, but a new flag to enable new behavior wouldn't
| hurt anyone.
| sharkdp wrote:
| That doesn't make a lot of sense. It's more like the opposite
| of what you are saying. The precision of hyperfine is typically
| in the single-digit millisecond range. Maybe just below 1 ms if
| you take special care to run the benchmark on a quiet system.
| Everything _below_ that (microsecond or nanosecond range) is
| something that you need to address with other forms of
| benchmarking.
|
| But for everything in the right range (milliseconds, seconds,
| minutes or above), hyperfine is well suited.
| forrestthewoods wrote:
| No it's not.
|
| Back in the day my goal for Advent of Code was to run all
| solutions in under 1 second total. Hyperfine would take like
| 30 minutes to benchmark a 1 second runtime.
|
| It was hyper frustrating. I could not find a good way to get
| Hyperfine to do what I wanted.
| sharkdp wrote:
| If that's the case, I would consider it a bug. Please feel
| free to report it. In general, hyperfine should not take
| longer than ~3 seconds, unless the command itself takes >
| 300 ms to run. In the latter case, we do a minimum
| of 10 runs by default. So if your program takes 3 min for a
| single iteration, it would take 30 min by default -- yes.
| But this can be controlled using the `-m`/`--min-runs`
| option. You can also specify the exact amount of runs using
| `-r`/`--runs`, if you prefer that.
|
| > I could not find a good way to get Hyperfine to do what I
| wanted
|
| This is all documented here:
| https://github.com/sharkdp/hyperfine/tree/master?tab=readme-...
| under "Basic benchmarks".
| The options to control the amount of runs are also listed
| in `hyperfine --help` and in the man page. Please let us
| know if you think we can improve the documentation /
| discovery of those options.
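|
| For example (./my-solver stands in for a program that takes
| about one second per run):
|
|       # exactly 3 runs instead of the default minimum of 10
|       hyperfine --runs 3 './my-solver'
|
|       # or just lower the minimum number of runs
|       hyperfine --min-runs 3 './my-solver'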
| fwip wrote:
| I've been using it for about four or five years, and never
| experienced this behavior.
|
| Current defaults: "By default, it will perform at least 10
| benchmarking runs and measure for at least 3 seconds." If
| your program takes 1s to run, it should take 10 seconds to
| benchmark.
|
| Is it possible that your program was waiting for input that
| never came? One "gotcha" is that it expects each argument
| to be a full program, so if you ran `hyperfine ./a.out
| input.txt`, it will first bench a.out with no args, then
| try to bench input.txt (which will fail). If a.out reads
| from stdin when no argument is given, then it would hang
| forever, and I can see why you'd give up after a half hour.
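|
| To make that gotcha concrete (a.out and input.txt as above):
|
|       # benchmarks TWO commands: ./a.out and input.txt
|       hyperfine ./a.out input.txt
|
|       # benchmarks ONE command with an argument
|       hyperfine './a.out input.txt'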
| sharkdp wrote:
| > Is it possible that your program was waiting for input
| that never came?
|
| We do close stdin to prevent this. So you can benchmark
| `cat`, for example, and it works just fine.
| fwip wrote:
| Oh, my bad! Thank you for the correction, and for all
| your work making hyperfine.
| usrme wrote:
| I've also had a good experience using the 'perf'[^1] tools for
| when I don't want to install 'hyperfine'. Shameless plug for a
| small blog post about it as I don't think it is that well known:
| https://usrme.xyz/tils/perf-is-more-robust-for-repeated-timi....
|
| ---
|
| [^1]: https://www.mankier.com/1/perf
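|
| The basic pattern with perf stat, for comparison (./my-prog is a
| placeholder):
|
|       # run the command 10 times and report averaged counters
|       # together with their run-to-run variance
|       perf stat -r 10 ./my-prog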
| vdm wrote:
| I too have scripted time(1) in a loop badly. perf stat is more
| likely to be already installed than hyperfine. Thank you for
| sharing!
| CathalMullan wrote:
| There's also 'poop', which is a nice middle-ground between
| 'hyperfine' and 'perf'. https://github.com/andrewrk/poop
| llimllib wrote:
| worth mentioning that it's linux-only
| mosselman wrote:
| Hyperfine is great. I use it sometimes for some quick web page
| benchmarks:
|
| https://abuisman.com/posts/developer-tools/quick-page-benchm...
|
| As mentioned here in the thread, when you want to go into the
| single-ms optimisations it is not the best approach, since there
| is a lot of overhead, especially the way I demonstrate here, but
| it works very well for some sanity checks.
| llimllib wrote:
| I find k6 a lot nicer for HTTP benching, and no slower to set
| up than hyperfine (which I love for CLI benching):
| https://k6.io/
| jiehong wrote:
| Could hyperfine running curl be an alternative?
| mosselman wrote:
| That is what I do in my blog post.
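|
| One way to do that kind of sanity check with hyperfine plus curl
| (example.com is a stand-in URL; network jitter will dominate the
| numbers):
|
|       hyperfine --warmup 3 'curl -s -o /dev/null https://example.com/'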
| Sesse__ wrote:
| > Hyperfine is great.
|
| Is it, though?
|
| What I would expect a system like this to have, at a minimum:
|
|   * Robust statistics with p-values (not just min/max,
|     compensation for multiple hypotheses, no Gaussian assumptions)
|   * Multiple stopping points depending on said statistics.
|   * Automatic isolation to the greatest extent possible (given
|     appropriate permissions)
|   * Interleaved execution, in case something external changes
|     mid-way.
|
| I don't see any of this in hyperfine. It just... runs things N
| times and then does a naive average/min/max? At that rate, one
| could just as well use a shell script and eyeball the results.
| bee_rider wrote:
| What do you suggest? Those sound like great features.
| Sesse__ wrote:
| I've only seen such things in internal tools so far,
| unfortunately, so if you see anything in public, please
| tell me :-) I'm just confused why everyone thinks
| hyperfine is so awesome, when it does not meet what I'd
| consider a fairly low bar for benchmarking tools? ("Best
| publicly available" != "great", in my book.)
| sharkdp wrote:
| > "Best publicly available" != "great"
|
| Of course. But it is free and open source. And everyone
| is invited to make it better.
| sharkdp wrote:
| > Robust statistics with p-values (not just min/max,
| compensation for multiple hypotheses, no Gaussian
| assumptions)
|
| This is not included in the core of hyperfine, but we do have
| scripts to compute "advanced" statistics, and to perform
| t-tests here:
| https://github.com/sharkdp/hyperfine/tree/master/scripts
|
| Please feel free to comment here if you think it should be
| included in hyperfine itself:
| https://github.com/sharkdp/hyperfine/issues/523
|
| > Automatic isolation to the greatest extent possible (given
| appropriate permissions)
|
| This sounds interesting. Please feel free to open a ticket if
| you have any ideas.
|
| > Interleaved execution, in case something external changes
| mid-way.
|
| Please see the discussion here:
| https://github.com/sharkdp/hyperfine/issues/21
|
| > It just... runs things N times and then does a naive
| average/min/max?
|
| While there is nothing wrong with computing average/min/max,
| this is not all hyperfine does. We also compute modified
| Z-scores to detect outliers. We use that to issue warnings,
| if we think the mean value is influenced by them. We also
| warn if the first run of a command took significantly longer
| than the rest of the runs and suggest counter-measures.
|
| Depending on the benchmark I do, I tend to look at either the
| `min` or the `mean`. If I need something more fine-grained, I
| export the results and use the scripts referenced above.
|
| > At that rate, one could just as well use a shell script and
| eyeball the results.
|
| Statistical analysis (which you can consider to be basic) is
| just one reason why I wrote hyperfine. The other reason is
| that I wanted to make benchmarking easy to use. I use warmup
| runs, preparation commands and parametrized benchmarks all
| the time. I also frequently use the Markdown export or the
| JSON export to generate graphs or histograms. This is my
| personal experience. If you are not interested in all of
| these features, you can obviously "just as well use a shell
| script".
| Sesse__ wrote:
| > This is not included in the core of hyperfine, but we do
| have scripts to compute "advanced" statistics, and to
| perform t-tests here:
| https://github.com/sharkdp/hyperfine/tree/master/scripts
|
| t-tests run afoul of the "no Gaussian assumptions", though.
| Distributions arising from benchmarking frequently have
| various forms of skew, which mess up t-tests and give
| artificially narrow confidence intervals.
|
| (I'll gladly give you credit for your outlier detection,
| though!)
|
| >> Automatic isolation to the greatest extent possible
| >> (given appropriate permissions)
|
| > This sounds interesting. Please feel free to open a
| > ticket if you have any ideas.
|
| Off the top of my head, some option that would:
|
|   * Bind to isolated CPUs, if booted with it (isolcpus=)
|   * Bind to a consistent set of cores/hyperthreads (the
|     scheduler frequently sabotages benchmarking, especially if
|     your cores have very different maximum frequencies)
|   * Warn if thermal throttling is detected during the run
|   * Warn if an inappropriate CPU governor is enabled
|   * Lock the program into RAM (probably hard to do without some
|     sort of help from the program)
|   * Enable realtime priority if available (e.g., if isolcpus= is
|     not enabled, or you're not on Linux)
|
| Of course, sometimes you would _want_ to benchmark some of
| these effects, and that's fine. But most people probably
| won't, and won't know that they exist. I may easily have
| forgotten some.
|
| On the flip side (making things more random as opposed to
| less), something that randomizes the initial stack pointer
| would be nice, as I've sometimes seen this go really,
| really wrong (renaming a binary from foo to foo_new made it
| run >1% slower!).
| sharkdp wrote:
| > On the flip side (making things more random as opposed
| to less), something that randomizes the initial stack
| pointer would be nice, as I've sometimes seen this go
| really, really wrong (renaming a binary from foo to
| foo_new made it run >1% slower!).
|
| This is something we do already. We set a
| `HYPERFINE_RANDOMIZED_ENVIRONMENT_OFFSET` environment
| variable with a random-length value:
| https://github.com/sharkdp/hyperfine/blob/87d77c861f1b6c761a...
| renewiltord wrote:
| Personally, I'm all about the UNIX philosophy of doing one
| thing and doing it well. All I want is the process to be
| invoked k times to do a thing with warmup etc. etc. If I want
| additional stats, it's easy to calculate. I just `--export-json`
| and then once it's in a dataframe I can do what I want
| with it.
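|
| The per-run timings end up under .results[].times in the JSON
| export, so pulling them out is a one-liner (jq assumed
| installed, 'sleep 0.3' as a dummy command):
|
|       hyperfine --export-json results.json 'sleep 0.3'
|       jq '.results[0].times' results.json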
| shawndavidson7 wrote:
| "Hyperfine seems like an incredibly useful tool for anyone
| working with command-line utilities. The ability to benchmark
| processes straightforwardly is vital for optimizing performance.
| I'm particularly impressed with how simple it is to use compared
| to other benchmarking tools. I'd love to see more examples of how
| Hyperfine can be integrated into different workflows, especially
| for large-scale applications.
|
| https://www.osplabs.com/
| edwardzcn wrote:
| Hyperfine is great! I remember I learned about it when comparing
| functions with/without tail recursion (not sure if it was from
| the Go reference or the Rust reference). It provides simple
| configuration for unit tests. But I have not tried it on a DBMS
| (e.g., like sysbench). Has anyone given it a try?
| smartmic wrote:
| A capable alternative based on "boring, old" technology is
| multitime [1]
|
| Back when I needed it, it could report peak memory usage, which
| hyperfine was not able to show. Maybe this has changed by now.
|
| [1] https://tratt.net/laurie/src/multitime/
| accelbred wrote:
| Hyperfine is a really useful tool.
|
| Weirdest thing I've used it for is comparing io throughput on
| various disks.
| ratrocket wrote:
| Perhaps interesting (for some) to note that hyperfine is from the
| same author as at least a few other "ne{w,xt} generation" command
| line tools (that could maybe be seen as part of "rewrite it in
| Rust", but I don't want to paint the author with a brush they
| disagree with!!): fd (find alternative;
| https://github.com/sharkdp/fd), bat ("supercharged version of the
| cat command"; https://github.com/sharkdp/bat), and hexyl (hex
| viewer; https://github.com/sharkdp/hexyl). (And certainly others
| I've missed!)
|
| Pointing this out because I myself appreciate comments that do
| this.
|
| For myself, `fd` is the one most incorporated into my own
| "toolbox" -- used it this morning prior to seeing this thread on
| hyperfine! So, thanks for all that, sharkdp if you're reading!
|
| Ok, end OT-ness.
| varenc wrote:
| ++ to `fd`
|
| It's absolutely my preferred `find` replacement. Its CLI
| interface just clicks for me and I can quickly express my
| desires. Quite unlike `find`. `fd` is one of the first packages
| I install on a new system.
| ratrocket wrote:
| The "funny" thing for me about `fd` is that the set of
| operations I use for `find` is very hard-wired into my
| muscle memory from using it for 20+ years, so when I reach
| for `fd` I often have to reference the man page! I'm getting
| a little better from more exposure, but it's just different
| enough from `find` to create a bit of an uncanny valley
| effect (I think that's the right use of the term...).
|
| Even with that I reach for `fd` for some of its quality-of-
| life features: respecting .gitignore, its speed, regex-
| ability. (Though not its choices with color; I am a pretty
| staunch "--color never" person, for better or worse!)
|
| Anyway, that actually points to another good thing about
| sharkdp's tools: they have good man pages!!
___________________________________________________________________
(page generated 2024-11-19 23:01 UTC)