https://lemire.me/blog/2023/04/06/are-your-memory-bound-benchmarking-timings-normally-distributed/

Are your memory-bound benchmarking timings normally distributed?

When optimizing software, we routinely measure the time that a given function or task takes. The typical assumption is that the measurements follow a normal distribution, and that we should therefore report the average time. I believe that this hidden assumption (normality) is frequently violated in software benchmarks. I recently gave a talk on precise and accurate benchmarking (video, slides) at a "Benchmarking in the Data Center" workshop where I made this point in detail.

Why should it matter? If you have a normal distribution, you can mathematically increase the accuracy of the measure by taking more samples: your error should go down as the square root of the number of measures. If you have ever tried to increase the number of samples in the hope of getting a more accurate result, you may have been severely disappointed. That is because the timings often more closely resemble a log-normal distribution:

[Figure: a log-normal distribution]

A log-normal distribution is asymmetrical: the mean is relatively close to the minimum, and there is a long tail. As you take more and more samples, you may find more and more large values. You can often show that you do not have a normal distribution because you find 4-sigma, 5-sigma or even 13-sigma events: you measure values that are far above the average relative to your estimated standard deviation. In a normal distribution, it is all but impossible to land many standard deviations away from the mean; with a log-normal distribution, it happens much more easily. Of course, real data is messy. I am not claiming that your timings precisely follow a log-normal distribution. It is merely a model.
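To make the sigma argument concrete, here is a small simulation (my own illustration, not code from the talk; the distribution parameters are arbitrary). It draws many samples from a normal and from a log-normal distribution and reports how many standard deviations separate the maximum from the mean in each case; the log-normal maximum routinely lands far more sigmas above the mean.

```cpp
// Minimal sketch (illustration only, not code from the talk): compare how many
// standard deviations separate the maximum sample from the mean, for a
// normal and a log-normal distribution. Parameters are arbitrary.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

// (max - mean) / standard deviation for a set of samples
double max_sigmas(const std::vector<double> &v) {
  double mean = 0;
  for (double x : v) { mean += x; }
  mean /= v.size();
  double var = 0;
  for (double x : v) { var += (x - mean) * (x - mean); }
  double sd = std::sqrt(var / (v.size() - 1));
  return (*std::max_element(v.begin(), v.end()) - mean) / sd;
}

int main() {
  std::mt19937_64 rng(1234);
  std::normal_distribution<double> normal(1.0, 0.1);        // symmetric model
  std::lognormal_distribution<double> lognormal(0.0, 0.5);  // long right tail
  std::vector<double> a, b;
  for (size_t i = 0; i < 100000; i++) {
    a.push_back(normal(rng));
    b.push_back(lognormal(rng));
  }
  printf("normal    : max is %.1f sigmas above the mean\n", max_sigmas(a));
  printf("log-normal: max is %.1f sigmas above the mean\n", max_sigmas(b));
}
```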
Nevertheless, it suggests that reporting the average and the standard error is inadequate. I like to measure the minimum and the average, and then use the distance between the average and the minimum as an easy-to-measure error metric: if the average is far from the minimum, then your real-world performance could differ drastically from the minimum.

The minimum is easier to measure accurately than the average. In practice, I routinely achieve a precision of 1% or better: if I rerun the same benchmark another day on the same machine, I can be almost certain that the result won't vary by much more than 1%. The average is somewhat more difficult to nail down. You can verify this with a model: if you generate log-normally distributed samples, you will find it easier to determine the minimum than the average with high accuracy (a simulation sketch is given below, after the conclusion).

For my talk, I used a compute-bound routine: the performance of my function was not bound by the speed of the RAM, the network or the disk. I took data that was in CPU cache and I generated more data in CPU cache. For these types of problems, I find that timings often resemble a log-normal distribution.

What about other types of tasks? What about memory-bound tasks? I took a memory benchmark that I like to use. It consists of a large array spanning hundreds of megabytes, and the software must jump from location to location in it, unable to predict the next jump before completing the current read. It is often called a "pointer chasing" routine. I can interleave several pointer-chasing routines so that I have several loads in flight at any given time: I call the number of interleaved routines the number of lanes. The more lanes I have, the more "bandwidth limited" I become; the fewer lanes I have, the more "memory latency" bound I become. I am never compute bound in these tests: the number of instructions retired per cycle is always low. (A sketch of such a routine is given below, after the conclusion.)

For each fixed number of lanes, I run the benchmark 30 times on an Amazon Intel c6i node running Ubuntu 22. The elapsed time varies from over 3 seconds per run (for one lane) to about 1/6 of a second (for 30 lanes). I compute the mean, the minimum and the maximum, and I estimate the standard deviation. I then compute the bottom sigma as the gap (in number of standard deviations) between the minimum and the average, and the top sigma as the gap between the average and the maximum. If the distribution were normal, I should have roughly the same number of sigmas on either side (min and max), and I should not exceed 3 sigmas. I find that one side (the maximum) easily exceeds 3 sigmas, so it is not a normal distribution. It is also clearly not symmetrical.

[Figure: sigma gaps between the minimum/maximum and the mean, per number of lanes]

Yet my measures are relatively precise. Looking at the relative distance between the minimum and the mean, I get a tiny margin, often under 1%.

[Figure: relative distance between the minimum and the mean, per number of lanes]

Interestingly, the more lanes I have, the more accurate the results. Intuitively, this is not entirely surprising: more lanes break up the data dependency, so one slow step has less impact on the whole processing.

Thus, for a wide range of performance-related timings, you should not assume that you have a normal distribution without checking first! Computing the distance between the maximum and the mean divided by the standard deviation is a useful indicator. I personally find that a log-normal distribution is a better model for my timings.
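As a sanity check of the claim that the minimum is easier to pin down than the average, here is a small simulation sketch. It is my own illustration, not the author's model: I assume that each timing is a hard floor plus log-normally distributed overhead, with arbitrary parameters, and I compare the run-to-run variability of the minimum and of the mean over repeated 30-sample benchmarks.

```cpp
// Minimal sketch (illustration only, not the author's model): simulate repeated
// 30-sample benchmarks where each timing is a hard floor (1.0) plus
// log-normally distributed overhead (assumed, arbitrary parameters), and
// compare the run-to-run variability of the minimum versus the mean.
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

// coefficient of variation: standard deviation divided by mean
double cv(const std::vector<double> &v) {
  double mean = 0;
  for (double x : v) { mean += x; }
  mean /= v.size();
  double var = 0;
  for (double x : v) { var += (x - mean) * (x - mean); }
  return std::sqrt(var / (v.size() - 1)) / mean;
}

int main() {
  std::mt19937_64 rng(42);
  std::lognormal_distribution<double> overhead(0.0, 1.0); // long right tail
  const size_t samples_per_run = 30, runs = 1000;
  std::vector<double> mins, means;
  for (size_t r = 0; r < runs; r++) {
    double mn = 1e300, sum = 0;
    for (size_t i = 0; i < samples_per_run; i++) {
      double t = 1.0 + 0.1 * overhead(rng); // assumed model of one timing
      if (t < mn) { mn = t; }
      sum += t;
    }
    mins.push_back(mn);
    means.push_back(sum / samples_per_run);
  }
  printf("run-to-run variation of the minimum: %.2f%%\n", 100 * cv(mins));
  printf("run-to-run variation of the mean   : %.2f%%\n", 100 * cv(means));
}
```

Under this model, the minimum settles to within a fraction of a percent from one simulated run to the next, while the mean moves by several percent because of the long right tail.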
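The post describes the pointer-chasing benchmark only in prose. The sketch below is my reconstruction, not the author's code: it builds a single random cycle over a 256 MB array using Sattolo's algorithm, fixes the total number of loads, and splits them across a configurable number of independent lanes so that several cache misses can be in flight at once. Array size, load count and seed are arbitrary choices.

```cpp
// Minimal sketch (reconstruction, not the author's benchmark code): a
// pointer-chasing benchmark over a large array, with a configurable number
// of interleaved "lanes" so that several loads can be in flight at once.
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

int main() {
  const size_t n = size_t(1) << 26; // 64M entries of 4 bytes = 256 MB
  std::vector<uint32_t> next(n);
  std::iota(next.begin(), next.end(), 0);
  // Sattolo's algorithm: turn `next` into a single random cycle so that
  // every entry is visited and the next address is unpredictable.
  std::mt19937_64 rng(42);
  for (size_t i = n - 1; i > 0; i--) {
    size_t j = std::uniform_int_distribution<size_t>(0, i - 1)(rng);
    std::swap(next[i], next[j]);
  }
  const size_t total_loads = 30'000'000; // fixed work, split across lanes
  for (size_t lanes = 1; lanes <= 30; lanes++) {
    const size_t steps = total_loads / lanes;
    std::vector<uint32_t> cursor(lanes);
    // start the cursors at distinct, spread-out positions of the cycle
    for (size_t l = 0; l < lanes; l++) { cursor[l] = uint32_t(l * (n / lanes)); }
    auto start = std::chrono::steady_clock::now();
    for (size_t s = 0; s < steps; s++) {
      // the loads below are independent across lanes, so the processor
      // can issue several of them concurrently
      for (size_t l = 0; l < lanes; l++) { cursor[l] = next[cursor[l]]; }
    }
    auto stop = std::chrono::steady_clock::now();
    uint64_t sum = 0; // checksum prevents the loads from being optimized away
    for (size_t l = 0; l < lanes; l++) { sum += cursor[l]; }
    double seconds = std::chrono::duration<double>(stop - start).count();
    printf("%2zu lanes: %.3f s (checksum %llu)\n", lanes, seconds,
           (unsigned long long)sum);
  }
}
```

With one lane, every load depends on the previous one (latency bound); with many lanes, the processor can overlap the misses (bandwidth bound). Compile with optimizations (e.g. -O2) for meaningful timings.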
Published on April 6, 2023 by Daniel Lemire, a computer science professor at the University of Quebec (TELUQ).