[HN Gopher] Measuring energy usage: regular code vs. SIMD code
___________________________________________________________________
Measuring energy usage: regular code vs. SIMD code
Author : ashvardanian
Score : 31 points
Date : 2024-02-19 21:41 UTC (1 hour ago)
(HTM) web link (lemire.me)
(TXT) w3m dump (lemire.me)
| jsheard wrote:
| It makes intuitive sense that going wide in SIMD uses less power
| than going wide over cores, because extra SIMD lanes take up
| _much_ less silicon area than adding another whole core. If they
| didn't, then we wouldn't bother with SIMD; we'd just make more
| cores and save ourselves the hassle of juggling two different
| types of parallelism.
|
| GPUs take this even further by going even wider, with each "core"
| typically executing 1024-bit SIMD operations (i.e. 32x FP32, 64x
| FP16, etc). CPUs have roughly settled at 128-bit (most ARM) or
| 256-bit (most x86), with a little bit of 512-bit (x86 with AVX512),
| which grew out of Intel's earlier aborted attempt at making a
| dedicated GPU.
| dist-epoch wrote:
| Not only that, but you need to decode the SIMD instruction only
| once, and not N times. Same with other stages of the pipeline.
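|
| (Illustrative sketch, not from the linked article, assuming x86
| with AVX2: the same element-wise add as a plain scalar loop and as
| a vector loop. The AVX2 version fetches, decodes and retires
| roughly one add instruction per 8 floats instead of one per float,
| which is the effect described above.)
|
|   #include <stddef.h>
|   #include <immintrin.h>
|
|   /* one add instruction decoded and executed per element */
|   void add_scalar(const float *a, const float *b, float *out,
|                   size_t n) {
|       for (size_t i = 0; i < n; i++)
|           out[i] = a[i] + b[i];
|   }
|
|   /* one 256-bit add instruction decoded per 8 elements */
|   void add_avx2(const float *a, const float *b, float *out,
|                 size_t n) {
|       size_t i = 0;
|       for (; i + 8 <= n; i += 8) {
|           __m256 va = _mm256_loadu_ps(a + i);
|           __m256 vb = _mm256_loadu_ps(b + i);
|           _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
|       }
|       for (; i < n; i++)   /* scalar tail */
|           out[i] = a[i] + b[i];
|   }
|
| (Compile with -mavx2; in practice a compiler will often auto-
| vectorize the scalar loop anyway.)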
| andy99 wrote:
| There's a talk by an Nvidia scientist I saw recently in which he
| talks about compute efficiency in terms of flops/joule and explicitly
| mentions vectorized instructions as one of the energy savers.
| I'll see if I can find it.
|
| Edit: found it (I think). I can't remember where in the
| presentation, but he does mention energy efficiency as a proxy for
| performance. https://m.youtube.com/watch?v=kLiwvnr4L80
|
| Edit 2: at about 26:40 he starts talking about energy use in the
| context of performance.
| 01HNNWZ0MV43FF wrote:
| I guess it also makes sense if you figure that dedicated
| hardware for things like AES, video codecs, etc. is similar to
| SIMD.
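|
| (A tiny sketch to make the connection concrete, assuming x86 with
| AES-NI: the dedicated AES instructions are exposed through the
| same 128-bit SIMD registers, one full AES round per instruction.)
|
|   #include <wmmintrin.h>
|
|   /* ShiftRows + SubBytes + MixColumns + AddRoundKey in one
|      instruction on a 16-byte block; compile with -maes */
|   __m128i aes_round(__m128i block, __m128i round_key) {
|       return _mm_aesenc_si128(block, round_key);
|   }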
___________________________________________________________________
(page generated 2024-02-19 23:00 UTC)