[HN Gopher] Show HN: Rust test harness that measures energy cons...
       ___________________________________________________________________
        
       Show HN: Rust test harness that measures energy consumption
        
       Author : thijsr
       Score  : 104 points
       Date   : 2022-04-05 12:51 UTC (10 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | rictic wrote:
       | Many benchmarking systems face measurement issues that make it
       | difficult to produce solid results. Any given run might not be
       | running on the same hardware, the same OS, built with the same
       | compiler, running with the same runtime, with the same versions
       | of dependencies, with the same system load, at the same
       | temperature, and so on.
       | 
       | One robust solution is to instead do pairwise comparisons, many
       | times in a round robin fashion. The results aren't quite as nice
       | to plot, as you don't get a single consistent speed value, but
       | they are much more reliable and true, and you still get useful
       | information, like ">95% chance that this test is at least 20%
       | faster at this commit than at the previous one".
       | 
       | A project I contribute to uses this strategy:
       | https://github.com/Polymer/tachometer, but I'd love it if more
       | benchmarks took this approach.
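       | 
       | A minimal sketch of the pairwise idea, assuming wall-clock
       | timing via std::time::Instant. The names PairedSample,
       | measure_round_robin, relative_speedup and fraction_at_least are
       | hypothetical and are not tachometer's API. The fraction it
       | reports is a raw empirical share of rounds, not a proper
       | confidence interval.

```rust
use std::time::Instant;

/// One round of a round-robin run: both variants are timed back to back,
/// so they see approximately the same machine state (load, temperature, ...).
#[derive(Debug, Clone, Copy)]
pub struct PairedSample {
    pub baseline_ms: f64,
    pub candidate_ms: f64,
}

/// Times a single call of `f` in milliseconds.
fn time_ms(f: &mut dyn FnMut()) -> f64 {
    let start = Instant::now();
    f();
    start.elapsed().as_secs_f64() * 1000.0
}

/// Alternates baseline and candidate for `rounds` rounds and keeps the pairs.
pub fn measure_round_robin(
    rounds: usize,
    mut baseline: impl FnMut(),
    mut candidate: impl FnMut(),
) -> Vec<PairedSample> {
    (0..rounds)
        .map(|_| PairedSample {
            baseline_ms: time_ms(&mut baseline),
            candidate_ms: time_ms(&mut candidate),
        })
        .collect()
}

/// Relative speedup within one pair: 0.25 means the candidate took
/// 25% less time than the baseline in that round.
pub fn relative_speedup(sample: PairedSample) -> f64 {
    (sample.baseline_ms - sample.candidate_ms) / sample.baseline_ms
}

/// Share of rounds in which the candidate was at least `threshold` faster.
/// Returns 0.0 for an empty slice.
pub fn fraction_at_least(samples: &[PairedSample], threshold: f64) -> f64 {
    if samples.is_empty() {
        return 0.0;
    }
    let hits = samples
        .iter()
        .filter(|s| relative_speedup(**s) >= threshold)
        .count();
    hits as f64 / samples.len() as f64
}
```

       | Calling fraction_at_least(&samples, 0.20) on the collected
       | pairs gives the share of rounds where the candidate was at
       | least 20% faster. A real tool would put a statistical test on
       | top of the per-pair differences.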
        
       | ______-_-______ wrote:
       | Counting instructions is very accurate and roughly approximates
       | power usage. The CPU's self-reported power usage is comparatively
       | pretty noisy. Unit tests will probably be done running before you
       | can get meaningful data. I have to wonder if a test runner is the
       | best point of integration for this. It might make more sense to
       | expose it as a bench harness, like criterion.
       | 
       | EDIT: another benefit of a criterion-like approach is that you
       | wouldn't require nightly
        
         | thijsr wrote:
         | Yes, the CPU self-reported power usage is indeed fairly noisy.
         | We've tried to mitigate this by executing certain tests
         | multiple times in a row, and using the average power
         | consumption across these executions. However, this data is
         | still quite noisy and is influenced by external factors, like
         | the operating system, your hardware, power management settings,
         | and a lot more. We mention this and possible mitigating actions
         | in the README.
         | 
         | We chose a test harness because one of our goals was to make
         | it as easy as possible to run it on existing Rust projects. A
         | lot of projects define tests, but benchmarks are often not
         | present. But maybe a bench harness would be a
         | better and/or cleaner approach, will look into it!
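         | 
         | A rough sketch of the repeat-and-average mitigation,
         | assuming the Linux powercap sysfs files for Intel RAPL
         | package 0. The thread does not say how Coppers reads its
         | counters, so the paths and the helper names (read_counter,
         | energy_delta_uj, measure_repeated, mean_uj) are assumptions
         | for illustration. Reading energy_uj may need elevated
         | privileges on recent kernels.

```rust
use std::fs;
use std::io;

/// Assumed location of the cumulative package energy counter, in microjoules.
pub const ENERGY_PATH: &str = "/sys/class/powercap/intel-rapl:0/energy_uj";
/// Assumed location of the value at which that counter wraps around.
pub const MAX_RANGE_PATH: &str = "/sys/class/powercap/intel-rapl:0/max_energy_range_uj";

/// Reads a sysfs file that holds a single unsigned integer.
pub fn read_counter(path: &str) -> io::Result<u64> {
    let text = fs::read_to_string(path)?;
    text.trim()
        .parse::<u64>()
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

/// Energy between two readings, tolerating one counter wraparound.
/// The wraparound case is an approximation.
pub fn energy_delta_uj(before: u64, after: u64, max_range: u64) -> u64 {
    if after >= before {
        after - before
    } else {
        max_range - before + after
    }
}

/// Runs `work` `runs` times and records the energy delta of each run.
pub fn measure_repeated(runs: usize, mut work: impl FnMut()) -> io::Result<Vec<u64>> {
    let max_range = read_counter(MAX_RANGE_PATH)?;
    let mut deltas = Vec::with_capacity(runs);
    for _ in 0..runs {
        let before = read_counter(ENERGY_PATH)?;
        work();
        let after = read_counter(ENERGY_PATH)?;
        deltas.push(energy_delta_uj(before, after, max_range));
    }
    Ok(deltas)
}

/// Mean of the recorded deltas, or None if nothing was recorded.
pub fn mean_uj(samples: &[u64]) -> Option<f64> {
    if samples.is_empty() {
        None
    } else {
        Some(samples.iter().sum::<u64>() as f64 / samples.len() as f64)
    }
}
```

         | The counter covers the whole CPU package, so anything else
         | running on the machine lands in the same number. Averaging
         | reduces that noise but does not remove it.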
        
         | Shish2k wrote:
         | > Counting instructions is very accurate and roughly
         | approximates power usage
         | 
         | I've always assumed this to be true, but I see a lot of
         | benchmarking tools / libraries measuring wall-clock time or
         | iterations-per-second or something like that. I've never seen
         | a benchmark tool that counts CPU instructions. Am I being
         | blind, or is there some other reason that I'm not seeing
         | them? :S
        
           | ______-_-______ wrote:
           | At the end of the day most people care about wall clock time.
           | It's a real physical value that's easy to understand and easy
           | to compare between systems. Plus, if two functions execute
           | say, 1 billion instructions each, but one spends extra time
           | stalled waiting on IO or data fetches from RAM, you
           | definitely want to account for that in normal benchmarking.
           | 
           | Instruction counting is more of a specialized tool but I like
           | to use it whenever I can because it has low variance and
           | makes comparing changes a lot easier. Compare how bumpy these
           | graphs are for instruction count (first link) and wall clock
           | time (second link):
           | 
           | https://perf.rust-lang.org/
           | 
           | https://perf.rust-lang.org/?start=&end=&kind=raw&stat=wall-t...
        
           | mhh__ wrote:
           | Counting instructions properly is hard and also results in a
           | good amount of overhead if you don't use a bunch of tricks or
           | a kernel module.
           | 
           | You also can't really count instructions in the cloud.
        
             | wooosh wrote:
             | Counting (userspace) instructions is relatively easy
             | regardless of language with perf stat, though it does
             | require the kernel module. Generally speaking it should
             | just work if perf is installed through the package manager
             | for your distribution.
             | 
             | edit: valgrind's callgrind utility can also produce exact
             | instruction execution counts for a given block of code
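             | 
             | A sketch of driving perf stat from Rust, assuming perf
             | is on PATH and that its CSV mode (-x,) puts the counter
             | value in the first field and the event name in the
             | third. The helper names are hypothetical, and on some
             | CPUs the event may be reported under a different name
             | than the one requested.

```rust
use std::io;
use std::process::Command;

/// Parses one line of `perf stat -x,` output and returns the count
/// if the line belongs to `event`. Lines that are not counter rows,
/// or whose value is not a number (e.g. "<not supported>"), yield None.
pub fn parse_perf_csv_line(line: &str, event: &str) -> Option<u64> {
    let mut fields = line.split(',');
    let value = fields.next()?;
    let _unit = fields.next()?;
    let name = fields.next()?;
    if name == event {
        value.trim().parse().ok()
    } else {
        None
    }
}

/// Runs `program` under `perf stat` and returns its userspace
/// instruction count, if perf reported one.
pub fn count_user_instructions(program: &str, args: &[&str]) -> io::Result<Option<u64>> {
    let output = Command::new("perf")
        .args(&["stat", "-x", ",", "-e", "instructions:u", "--"])
        .arg(program)
        .args(args)
        .output()?;
    // perf stat prints its counters to stderr by default, mixed with
    // whatever the measured program itself wrote to stderr.
    let stderr = String::from_utf8_lossy(&output.stderr);
    Ok(stderr
        .lines()
        .find_map(|line| parse_perf_csv_line(line, "instructions:u")))
}
```

             | For example, count_user_instructions("ls", &["-l"])
             | would return Ok(Some(n)) when the counter is available
             | and Ok(None) when perf could not count the event.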
        
               | mhh__ wrote:
               | Callgrind can give you instruction counts yes. It doesn't
               | simulate any microarchitecture other than caches which
               | means it's only useful for comparing with itself.
               | 
               | Perf stat is very very high overhead. The perf API is
               | available and can be tuned a bit more nicely but it's
               | mostly a horrible mess. It uses bitfields too which makes
               | it somewhat hard to get to from other languages unless
               | you trust the shifts and masks.
        
           | wmf wrote:
           | Instructions correlate to energy but not to performance. If
           | you're benchmarking performance you should use wall clock
           | time.
        
           | wooosh wrote:
           | Counting instructions does not give information about time
           | spent in syscalls/doing IO, which limits its use to CPU-bound
           | software.
        
       | hd4 wrote:
       | This reminds me of an idea I wanted to submit to the systemd team
       | (or wherever it would be more appropriate) to have a Linux
       | service report on the current power usage of the OS and maybe
       | even translate it into currency-per-hour to show people how much
       | they were spending over time with the aim of reducing power
       | wastage. Seems like it would be more relevant than ever given the
       | global situation around energy these days.
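       | 
       | The currency-per-hour conversion is simple arithmetic. A small
       | sketch, assuming the average power draw and the electricity
       | price per kWh are already known (both function names are made
       | up for illustration):

```rust
/// Average power in watts from an energy reading over a time window.
pub fn average_watts(energy_joules: f64, seconds: f64) -> f64 {
    energy_joules / seconds
}

/// Cost of running at `average_watts` for one hour.
/// Watts / 1000 gives kW; one hour at that rate is the same number of kWh.
pub fn cost_per_hour(average_watts: f64, price_per_kwh: f64) -> f64 {
    average_watts / 1000.0 * price_per_kwh
}
```

       | For instance, a machine drawing 50 W at a price of 0.30 per
       | kWh costs about 0.015 per hour.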
        
         | yjftsjthsd-h wrote:
         | So... powertop? Possibly run as a daemon logging to the
         | journal, which I grant is somewhat different from how it works
         | now (ncurses tool or something you run and log to CSV).
        
       | thijsr wrote:
       | Hey, I wanted to share a project that we've been working on!
       | Coppers is a custom test harness for Rust that allows you to
       | measure the energy consumption of your test suite. A use case for
       | this could be to identify regressions in energy usage, or to do
       | more targeted energy optimizations. Our goal was to make it as
       | seamless as possible to integrate it with existing Rust projects.
       | To make that work, we had to rely on some unstable and internal
       | Rust compiler features that are only available in nightly. But
       | the current implementation seems to be able to measure the energy
       | consumption of almost every existing Rust crate we tested! (with
       | the exception of embedded and system-specific crates, but that is
       | a limitation we're looking into)
        
         | teitoklien wrote:
         | It's a pretty cool project :D Thank you for making it! I'll
         | definitely try it in my next project.
        
       | ducktective wrote:
       | Very nice!
       | 
       | Any ideas on how to measure energy consumption of programs in a
       | GNU/Linux OS? I know of `powertop` but it measures total power
       | consumption (its per-program table is inaccurate)
        
       ___________________________________________________________________
       (page generated 2022-04-05 23:01 UTC)