[HN Gopher] Measuring energy usage: regular code vs. SIMD code
       ___________________________________________________________________
        
       Measuring energy usage: regular code vs. SIMD code
        
       Author : ashvardanian
       Score  : 31 points
       Date   : 2024-02-19 21:41 UTC (1 hour ago)
        
 (HTM) web link (lemire.me)
 (TXT) w3m dump (lemire.me)
        
       | jsheard wrote:
       | It makes intuitive sense that going wide in SIMD uses less power
       | than going wide over cores, because extra SIMD lanes take up
       | _much_ less silicon area than adding another whole core. If they
       | didn't then we wouldn't bother with SIMD, we'd just make more
       | cores and save ourselves the hassle of juggling two different
       | types of parallelism.
       | 
       | GPUs take this even further by going even wider, with each "core"
       | typically executing 1024-bit SIMD operations (i.e. 32x FP32, 64x
       | FP16, etc). CPUs have roughly settled at 128-bit (most ARM) or
       | 256-bit (most x86) with a little bit of 512-bit (x86 with AVX512)
       | which grew out of Intel's earlier aborted attempt at making a
       | dedicated GPU.
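
The lane counts quoted in the comment above follow from dividing the register width by the element width. A minimal sketch of that arithmetic, assuming nothing beyond the bit widths mentioned in the comment (the helper name `lanes` and the labels in the table are illustrative, not from any library):

```python
# Lanes per SIMD operation = register width in bits // element width in bits.
# The widths below are the ones quoted in the comment; the labels are illustrative.

def lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements of the given size that fit in one SIMD register."""
    return register_bits // element_bits

REGISTER_BITS = {
    "128-bit (most ARM)": 128,
    "256-bit (most x86)": 256,
    "512-bit (x86 with AVX512)": 512,
    "1024-bit (typical GPU 'core')": 1024,
}

for label, bits in REGISTER_BITS.items():
    print(f"{label}: {lanes(bits, 32)}x FP32, {lanes(bits, 16)}x FP16")
```

This only counts lanes per operation; it says nothing about clock speed, throughput, or power, which depend on the specific hardware.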
        
         | dist-epoch wrote:
         | Not only that, but you need to decode the SIMD instruction only
         | once, and not N times. Same with other stages of the pipeline.
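
The decode-once point can be illustrated with a rough instruction count. This is a simplified sketch under the assumption that each arithmetic instruction is fetched and decoded once and handles `lanes` elements; it ignores loop overhead, loads and stores, micro-op splitting, and everything else a real pipeline does. The function name `instructions_needed` is hypothetical.

```python
import math

def instructions_needed(n_elements: int, lanes: int) -> int:
    """Arithmetic instructions needed to touch n_elements, `lanes` at a time.

    Simplified model: one instruction per group of `lanes` elements,
    with a final partial group still costing one full instruction.
    """
    return math.ceil(n_elements / lanes)

n = 1024  # number of FP32 elements to process (arbitrary example size)
for lanes in (1, 4, 8, 16):  # scalar, 128-bit, 256-bit, 512-bit with FP32
    print(f"{lanes:2d} lanes: {instructions_needed(n, lanes)} instructions")
```

Under this model the fetch and decode work shrinks by roughly the lane count, which is one reason the energy per element can drop.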
        
       | andy99 wrote:
       | There's a talk by an Nvidia scientist I just saw where he talks
       | about compute efficiency in terms of flops/joule and explicitly
       | mentions vectorized instructions as one of the energy savers.
       | I'll see if I can find it.
       | 
       | Edit: found it (I think). I can't remember where in the
       | presentation but he does mention energy efficiency as a proxy for
       | performance. https://m.youtube.com/watch?v=kLiwvnr4L80
       | 
       | Edit 2: about 26:40 he starts talking about energy use in the
       | context of performance.
        
         | 01HNNWZ0MV43FF wrote:
         | I guess it also makes sense if you figure that dedicated
         | hardware for things like AES, video codecs, etc. is similar to
         | SIMD.
        
       ___________________________________________________________________
       (page generated 2024-02-19 23:00 UTC)