[HN Gopher] Efficient and performance-portable vector software
       ___________________________________________________________________
        
       Efficient and performance-portable vector software
        
       Author : signa11
       Score  : 76 points
       Date   : 2023-01-10 09:28 UTC (1 days ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | nisa wrote:
       | That linked quicksort paper is pretty cool:
       | https://arxiv.org/abs/2205.05982 - for sorting floats/integers
       | the speedup against std::sort is around 10x.
       | 
       | Considering climate change, more focus should be put on making
       | fast and efficient software. Every watt not wasted on bad
       | algorithms/code is good.
        
         | jules wrote:
         | Great paper. Is that the fastest sort right now?
        
         | techie128 wrote:
         | If climate change is a focus, we collectively should stop
         | buying x86 chips and focus on RISC-V and ARM chips.
        
           | sakras wrote:
           | I'm not convinced that'll bring about any significant change.
           | Any power savings from switching from x86 to a RISC design
           | come from simplifying the instruction decoder, which seems to
           | be about 15-20% if we compare the Ampere Altra to a comparable
           | AMD chip. That's not an order of magnitude.
           | 
           | On the other hand, on the order of 80% of a chip's power is
           | spent on OOO execution. If you want the order of magnitude
           | improvement in power efficiency, you need to dump
           | superscalar/OOO in favor of smart compilers and VLIW. Cheap
           | DSPs have been doing it for years, but compilers aren't good
           | enough yet for general purpose processing.
        
             | janwas wrote:
             | Agree that OoO is the big cost. But we can also mitigate
             | that without VLIW: SIMD/vector reduces the instruction
             | count by ~5x, and energy by a similar factor.
             | 
             | And a portable API such as Highway also helps us move the
             | same code from x86 to Arm or RISC-V with just a recompile
             | :D
        
             | commandlinefan wrote:
             | > compilers aren't good enough yet
             | 
             | So we need to go back to coding in assembler to save the
             | planet? Sign me up!
        
           | ninepoints wrote:
           | If you actually look at x86 microcode much, you start to
           | realize that the backend is actually very RISC-like. To me,
           | the distinction between CISC and RISC is more of a frontend
           | instruction selection one.
        
         | taeric wrote:
         | Defining "not wasted" is doing a lot of work there. Yes, we
         | should make things as efficient as we can. And we shouldn't
         | reach for slow solutions, if we can avoid it. But... that is,
         | essentially, the famous Knuth quote. Right?
        
           | nisa wrote:
           | The full quote is:
           | 
           | > "Programmers waste enormous amounts of time thinking about,
           | or worrying about, the speed of noncritical parts of their
           | programs, and these attempts at efficiency actually have a
           | strong negative impact when debugging and maintenance are
           | considered. We should forget about small efficiencies, say
           | about 97% of the time: premature optimization is the root of
           | all evil. Yet we should not pass up our opportunities in that
           | critical 3%."
           | 
           | Also, this was written in 1974 and was in reference to things
           | like loop unrolling or hand-crafting assembly. That is quite
           | different from not knowing what you are doing and writing
           | O(n^2) algorithms for O(n) or O(log(n)) problems because the
           | problem is not understood, which is the issue we have at the
           | moment (not saying I'm above that, been there done that ;).
           | 
           | If Google can speed up std::sort tenfold using SIMD
           | instructions and likely other operations, we should add this
           | code everywhere imho.
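
The gap between an O(n^2) and an O(n) solution to the same problem can be illustrated with a small sketch. The task (detecting a duplicate in a list) and the function names `HasDuplicateQuadratic` and `HasDuplicateLinear` are hypothetical examples chosen for illustration; they do not come from the thread or the linked paper.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Quadratic: compares every pair, so roughly n*(n-1)/2 comparisons
// in the worst case (no duplicates present).
bool HasDuplicateQuadratic(const std::vector<int>& values) {
  for (std::size_t i = 0; i < values.size(); ++i) {
    for (std::size_t j = i + 1; j < values.size(); ++j) {
      if (values[i] == values[j]) return true;
    }
  }
  return false;
}

// Expected linear time: at most one hash-set insertion per element.
bool HasDuplicateLinear(const std::vector<int>& values) {
  std::unordered_set<int> seen;
  seen.reserve(values.size());
  for (int v : values) {
    // insert().second is false when v was already in the set.
    if (!seen.insert(v).second) return true;
  }
  return false;
}
```

Both functions return the same answer for any input; only the amount of work differs, and that difference grows with the input size rather than being a constant factor.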
        
           | gpderetta wrote:
           | You mean the widely misrepresented one?
        
             | taeric wrote:
             | I do, indeed, mean that one. :D
             | 
             | If it looks like I misrepresented it again, please correct
             | me!
        
       | xxpor wrote:
       | Is there anything similar to this in C?
        
         | janwas wrote:
         | SLEEF comes with an abstraction layer (mostly for math).
         | Instead of C++ function overloading, it uses type suffixes:
         | https://github.com/shibatch/sleef/blob/master/src/arch/helpe...
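
To make the contrast between type suffixes and function overloading concrete, here is a minimal sketch. The names `hyp_sqrt_f32`, `hyp_sqrt_f64` and `HypSqrt` are hypothetical and are not SLEEF or Highway functions; the sketch only shows the naming pattern, not either library's actual API.

```cpp
#include <cmath>

// C-style: functions with C linkage cannot be overloaded, so each
// element type gets its own name, distinguished by a suffix.
extern "C" {
inline float hyp_sqrt_f32(float x) { return std::sqrt(x); }
inline double hyp_sqrt_f64(double x) { return std::sqrt(x); }
}

// C++-style: one overloaded name; the compiler selects the version
// that matches the argument type.
inline float HypSqrt(float x) { return hyp_sqrt_f32(x); }
inline double HypSqrt(double x) { return hyp_sqrt_f64(x); }
```

With suffixes the caller spells out the type at every call site; with overloading the same call works for either type, which is what makes generic code easier to write in C++.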
        
         | owlbite wrote:
         | If you're on an Apple platform, their simd.h header works with
         | the clang vector attribute to provide a nice way to do very
         | similar stuff.
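
As a rough sketch of what a compiler vector attribute looks like, the following uses `vector_size`, which both GCC and Clang accept in C and C++. It does not use Apple's simd.h types, and the names `Float4`, `MulAdd4` and `SumLanes` are hypothetical.

```cpp
// vector_size is a compiler extension (GCC and Clang), not standard C or C++.
typedef float Float4 __attribute__((vector_size(16)));  // 4 x 32-bit float

// Element-wise a * b + c on all four lanes. The compiler selects the
// instructions for the target (for example SSE or NEON) or falls back
// to scalar code.
inline Float4 MulAdd4(Float4 a, Float4 b, Float4 c) { return a * b + c; }

// Sums the four lanes using subscript access.
inline float SumLanes(Float4 v) { return v[0] + v[1] + v[2] + v[3]; }
```

The same source builds for different architectures with only a recompile, but the vector width is fixed in the typedef, so this approach does not adapt to wider registers the way a length-agnostic API can.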
        
       | janwas wrote:
       | Author here, happy to discuss.
        
         | mgraczyk wrote:
         | Looks cool. Are you familiar with Halide? In terms of target
         | support or implementation complexity, how would you compare
         | using Highway vs using Halide for similar projects? (I work at
         | Google, use Halide)
        
         | josephg wrote:
         | This is beautiful work. Do you think it would be possible to
         | port this to Rust some day?
        
         | pistachiopro wrote:
         | Looks very interesting!
         | 
         | I've been having trouble getting good SIMD performance from
         | WASM. Are there any benchmarks posted anywhere comparing
         | Highway's performance on various native targets vs WASM?
        
       ___________________________________________________________________
       (page generated 2023-01-11 23:01 UTC)