https://lemire.me/blog/2024/02/19/measuring-energy-usage-regular-code-vs-simd-code/

Daniel Lemire's blog

Daniel Lemire is a computer science professor at the Data Science Laboratory of the Universite du Quebec (TELUQ) in Montreal. His research is focused on software performance and data engineering. He is a techno-optimist and a free-speech advocate.

Measuring energy usage: regular code vs. SIMD code

Modern processors have fancy instructions that can do many operations at once using wide registers: SIMD instructions. Intel and AMD processors have 512-bit registers and associated instructions under AVX-512. You expect these instructions to use more power (more energy per unit of time). However, they get the job done faster. Do you save energy overall?
You should expect so. Let us consider an example: I can simply sum all values in a large array.

```c
#include <stddef.h>

// Naive sum: one value at a time, accumulated in double precision.
float sum(float *data, size_t N) {
  double counter = 0;
  for (size_t i = 0; i < N; i++) {
    counter += data[i];
  }
  return counter;
}
```

If I leave it as is, the compiler might be tempted to optimize too much, but I can instruct it to avoid 'autovectorization' so that it does not do anything fancy.

I can write the equivalent function using AVX-512 intrinsic functions. The details do not matter too much; just trust me that it is expected to be faster for sufficiently long inputs.

```c
#include <immintrin.h>
#include <stddef.h>

// AVX-512 sum (needs AVX-512F and AVX-512DQ): each iteration loads 16 floats,
// widens them to two vectors of 8 doubles, and adds both to the accumulator.
float sum(float *data, size_t N) {
  __m512d counter = _mm512_setzero_pd();
  // Only process complete blocks of 16 floats so we never read past the end.
  for (size_t i = 0; i + 16 <= N; i += 16) {
    __m512 v = _mm512_loadu_ps(&data[i]);
    __m512d part1 = _mm512_cvtps_pd(_mm512_extractf32x8_ps(v, 0)); // floats 0..7
    __m512d part2 = _mm512_cvtps_pd(_mm512_extractf32x8_ps(v, 1)); // floats 8..15
    counter = _mm512_add_pd(counter, part1);
    counter = _mm512_add_pd(counter, part2);
  }
  double total = _mm512_reduce_add_pd(counter); // horizontal sum of the 8 lanes
  // Scalar tail: the last N % 16 values.
  for (size_t i = N / 16 * 16; i < N; i++) {
    total += data[i];
  }
  return total;
}
```

Under Linux, we can ask the kernel about power usage. You can query the power usage of individual components, but I query the overall power usage, which includes, among other things, the memory system. This works well with Intel processors, as long as you have privileged access to the system.

I wrote a little benchmark that runs both functions. On a 32-core Ice Lake processor, my results are as follows:

| Code | Energy per unit of time | Energy per value |
|------------|-------------|----------------|
| naive code | 0.055 μJ/s | 0.11 μJ/value |
| AVX-512 | 0.061 μJ/s | 0.032 μJ/value |

So the AVX-512 code uses about 3.5 times less energy overall, despite consuming about 10% more energy per unit of time.

My benchmark is naive and should only serve as an illustration. The general principle holds, however: if your tasks complete much faster, you are likely to use less energy overall, even if you use more power (more energy per unit of time).
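The post mentions instructing the compiler not to autovectorize the naive `sum`, without saying how. One possible way, which is an assumption and not necessarily what the benchmark does, is to turn vectorization off locally: GCC's `optimize` function attribute (equivalent to `-fno-tree-vectorize` for that one function; GCC's manual recommends this attribute mainly for debugging) or Clang's `#pragma clang loop` hint. Compiling the whole file with `-fno-tree-vectorize` (GCC) or `-fno-vectorize` (Clang) is the blunter, file-wide alternative. The name `sum_scalar` is hypothetical.

```c
#include <stddef.h>

/* Hypothetical variant of the naive sum with autovectorization disabled
   locally. GCC: optimize attribute, same as -fno-tree-vectorize for this
   function. Clang: loop pragma placed right before the loop. */
#if defined(__GNUC__) && !defined(__clang__)
__attribute__((optimize("no-tree-vectorize")))
#endif
float sum_scalar(const float *data, size_t N) {
  double counter = 0;
#if defined(__clang__)
#pragma clang loop vectorize(disable) interleave(disable)
#endif
  for (size_t i = 0; i < N; i++) {
    counter += data[i];
  }
  return (float)counter;
}
```

Guarding each mechanism with the compiler's own macro keeps the file compiling under both GCC and Clang; the other compiler simply sees the plain loop.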
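To make the AVX-512 loop easier to follow, here is a portable C model of its summation order. The helper `sum_lane_model` is hypothetical and not part of the benchmark: eight `double` variables stand in for the eight lanes of the `__m512d` accumulator, each block of 16 floats feeds lane `j` with elements `j` and `8 + j`, and leftovers go to a scalar tail. Because values are added in a different order than in the naive loop, the two functions can give slightly different results on arbitrary inputs. The real `_mm512_reduce_add_pd` may also combine the lanes in a different order than the simple loop used here.

```c
#include <stddef.h>

/* Hypothetical scalar model of the AVX-512 sum: lanes[0..7] mirror the
   8 double lanes of the vector accumulator. Useful for reasoning about
   summation order, not for speed. */
double sum_lane_model(const float *data, size_t N) {
  double lanes[8] = {0};
  size_t i = 0;
  /* Complete blocks of 16 floats only. */
  for (; i + 16 <= N; i += 16) {
    for (int j = 0; j < 8; j++) {
      lanes[j] += data[i + j];     /* like part1: floats 0..7 of the block */
      lanes[j] += data[i + 8 + j]; /* like part2: floats 8..15 of the block */
    }
  }
  /* Horizontal reduction of the 8 lanes. */
  double total = 0;
  for (int j = 0; j < 8; j++) {
    total += lanes[j];
  }
  /* Scalar tail: the remaining N % 16 values. */
  for (; i < N; i++) {
    total += data[i];
  }
  return total;
}
```

With small integer inputs every partial sum is exact, so the model and the naive loop agree; with general floating-point data, expect tiny differences.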
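The post does not say which kernel interface it queries. One common option on Intel machines, assumed here, is the Linux powercap framework, which exposes RAPL energy counters in microjoules through sysfs files such as `/sys/class/powercap/intel-rapl:0/energy_uj`, with the counter's wraparound point in `max_energy_range_uj`. The hypothetical helpers below read such a counter and compute the energy spent between two readings. Which zone corresponds to the "overall" figure used in the post, and whether it covers memory, varies by machine; on many systems these files are readable only by root, consistent with the privileged access mentioned above.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: read a RAPL energy counter (microjoules) from a
   powercap sysfs file, e.g. "/sys/class/powercap/intel-rapl:0/energy_uj"
   (path is an assumption). Returns 0 on success, -1 on failure. */
int read_energy_uj(const char *path, uint64_t *out) {
  FILE *f = fopen(path, "r");
  if (f == NULL) {
    return -1; /* missing file or insufficient privileges */
  }
  unsigned long long value;
  int ok = (fscanf(f, "%llu", &value) == 1);
  fclose(f);
  if (!ok) {
    return -1;
  }
  *out = (uint64_t)value;
  return 0;
}

/* Energy spent between two readings, assuming the counter wraps back to
   zero after max_range_uj (the value of max_energy_range_uj). */
uint64_t energy_delta_uj(uint64_t before, uint64_t after,
                         uint64_t max_range_uj) {
  if (after >= before) {
    return after - before;
  }
  return (max_range_uj - before) + after;
}
```

A benchmark in this style would read the counter before and after running a function many times, then divide the energy delta by the number of values processed (energy per value) and by the elapsed time (power).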
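As a quick check on the results table, recall that energy is power multiplied by time. Writing E for the energy per value, P for the energy per unit of time, and t for the time per value, the time per value is E / P. Working with ratios makes the units cancel; the implied speedup below is derived from the reported numbers, not measured separately.

$$
E = P\,t \quad\Longrightarrow\quad t = \frac{E}{P}
$$

$$
\frac{E_{\text{naive}}}{E_{\text{AVX-512}}} = \frac{0.11}{0.032} \approx 3.44,
\qquad
\frac{P_{\text{AVX-512}}}{P_{\text{naive}}} = \frac{0.061}{0.055} \approx 1.11
$$

$$
\frac{t_{\text{naive}}}{t_{\text{AVX-512}}}
= \frac{E_{\text{naive}}/P_{\text{naive}}}{E_{\text{AVX-512}}/P_{\text{AVX-512}}}
= \frac{E_{\text{naive}}}{E_{\text{AVX-512}}}\cdot\frac{P_{\text{AVX-512}}}{P_{\text{naive}}}
\approx 3.44 \times 1.11 \approx 3.8
$$

In other words, the AVX-512 code processes each value roughly 3.8 times faster, which more than offsets its roughly 11% higher power draw.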
Posted on February 19, 2024, by Daniel Lemire.