[HN Gopher] Efficient and performance-portable vector software
___________________________________________________________________
Efficient and performance-portable vector software
Author : signa11
Score : 76 points
Date : 2023-01-10 09:28 UTC (1 day ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| nisa wrote:
| That linked quicksort paper is pretty cool:
| https://arxiv.org/abs/2205.05982 - for sorting floats/integers
| the speedup against std::sort is around 10x.
|
| Considering climate change more focus should be put on making
| fast and efficient software. Every watt not wasted on bad
| algorithms/code is good.
| jules wrote:
| Great paper. Is that the fastest sort right now?
| techie128 wrote:
| If climate change is a focus, we collectively should stop
| buying x86 chips and focus on RISC-V and ARM chips.
| sakras wrote:
| I'm not convinced that'll bring about any significant change.
| Any power savings from switching from x86 to a RISC ISA come
| from simplifying the instruction decoder, and they seem to be
| about 15-20% if we compare the Ampere Altra to a comparable
| AMD chip. That's not an order of magnitude.
|
| On the other hand, on the order of 80% of a chip's power is
| spent on OOO execution. If you want the order of magnitude
| improvement in power efficiency, you need to dump
| superscalar/OOO in favor of smart compilers and VLIW. Cheap
| DSPs have been doing it for years, but compilers aren't good
| enough yet for general purpose processing.
| janwas wrote:
| Agree that OoO is the big cost. But we can also mitigate
| that without VLIW: SIMD/vector reduces the instruction
| count by ~5x, and energy by a similar factor.
|
| And a portable API such as Highway also helps us move the
| same code from x86 to Arm or RISC-V with just a recompile
| :D
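|
| A minimal sketch of the instruction-count argument above. It
| does not use Highway's API: it assumes the GCC/Clang
| vector_size extension as a stand-in, and the names V4, kLanes,
| MulAddScalar and MulAddVector are hypothetical. It counts loop
| iterations only (4 lanes per step), not energy or the ~5x
| figure quoted above. The same source recompiles for x86, Arm
| or RISC-V because the compiler picks the instructions.

```cpp
#include <cstddef>
#include <cstring>

// Four floats handled as one unit. vector_size is a GCC/Clang extension,
// used here only as a stand-in; it is NOT Highway's API.
typedef float V4 __attribute__((vector_size(16)));

constexpr std::size_t kLanes = 4;  // floats per V4

// Scalar version: one multiply-add per loop iteration.
// Returns the number of loop iterations executed.
std::size_t MulAddScalar(const float* a, const float* b, float* out,
                         std::size_t n) {
  std::size_t iters = 0;
  for (std::size_t i = 0; i < n; ++i, ++iters) {
    out[i] = a[i] * b[i] + out[i];
  }
  return iters;
}

// Vector version: kLanes multiply-adds per iteration, then a scalar
// loop for the leftover elements. Returns the total iteration count.
std::size_t MulAddVector(const float* a, const float* b, float* out,
                         std::size_t n) {
  std::size_t iters = 0;
  std::size_t i = 0;
  for (; i + kLanes <= n; i += kLanes, ++iters) {
    V4 va, vb, vo;
    std::memcpy(&va, a + i, sizeof(va));  // unaligned-safe load
    std::memcpy(&vb, b + i, sizeof(vb));
    std::memcpy(&vo, out + i, sizeof(vo));
    vo = va * vb + vo;                     // element-wise on all lanes
    std::memcpy(out + i, &vo, sizeof(vo));  // unaligned-safe store
  }
  for (; i < n; ++i, ++iters) {  // remainder, fewer than kLanes elements
    out[i] = a[i] * b[i] + out[i];
  }
  return iters;
}
```

| For 10 elements the scalar loop runs 10 iterations, while the
| vector loop runs 2 vector steps plus 2 remainder steps. Wider
| vectors shrink the count further, which is where a portable
| layer that adapts to the target's vector width would help.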
| commandlinefan wrote:
| > compilers aren't good enough yet
|
| So we need to go back to coding in assembler to save the
| planet? Sign me up!
| ninepoints wrote:
| If you actually look at x86 microcode much, you start to
| realize that the backend is actually very RISC-like. The
| distinction between CISC and RISC to me is more of a frontend
| instruction selection one.
| taeric wrote:
| Defining "not wasted" is carrying a lot of work there. Yes, we
| should make things as efficient as we can. And we shouldn't
| reach for slow solutions, if we can avoid it. But... that is,
| essentially, the famous Knuth quote. Right?
| nisa wrote:
| The full quote is:
|
| > "Programmers waste enormous amounts of time thinking about,
| or worrying about, the speed of noncritical parts of their
| programs, and these attempts at efficiency actually have a
| strong negative impact when debugging and maintenance are
| considered. We should forget about small efficiencies, say
| about 97% of the time: premature optimization is the root of
| all evil. Yet we should not pass up our opportunities in that
| critical 3%."
|
| Also, this is from Knuth's 1974 paper "Structured Programming
| with go to Statements" and was in reference to things like
| loop unrolling or hand-crafting assembly. That is quite
| different from not knowing what you are doing and writing
| O(n^2) algorithms for O(n) or O(log(n)) problems because the
| problem was never understood, which is the problem we have at
| the moment (not saying I'm above that, been there done that ;).
|
| If Google can speed up std::sort tenfold using SIMD
| instructions and likely other operations, we should add this
| code everywhere imho.
| gpderetta wrote:
| You mean the widely misrepresented one?
| taeric wrote:
| I do, indeed, mean that one. :D
|
| If it looks like I misrepresented it again, please correct
| me!
| xxpor wrote:
| Is there anything similar to this in C?
| janwas wrote:
| SLEEF comes with an abstraction layer (mostly for math).
| Instead of C++ function overloading, it uses type suffixes:
| https://github.com/shibatch/sleef/blob/master/src/arch/helpe...
| owlbite wrote:
| If you're on an Apple platform, their simd.h header works with
| the clang vector attribute to provide a nice way to do very
| similar stuff.
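|
| To illustrate the two replies above, here is a small sketch of
| the type-suffix style on top of a compiler vector attribute.
| It assumes the GCC/Clang vector_size extension and a 32-bit
| int; it uses neither SLEEF's helpers nor Apple's simd.h, and
| every name in it (f32x4, i32x4, vadd_*, vsum_*) is
| hypothetical. It sticks to a C-compatible subset, so it should
| also build as C with those compilers.

```cpp
// C-compatible subset: no overloading, no templates, no headers needed.
// vector_size is a GCC/Clang extension; 16 bytes gives 4 lanes of a
// 32-bit element type (assumes int is 32 bits wide).
typedef float f32x4 __attribute__((vector_size(16)));
typedef int i32x4 __attribute__((vector_size(16)));

// Hypothetical names. C has no function overloading, so the element
// type and lane count are encoded in a suffix instead.
static inline f32x4 vadd_f32x4(f32x4 a, f32x4 b) { return a + b; }
static inline i32x4 vadd_i32x4(i32x4 a, i32x4 b) { return a + b; }

// Horizontal sum of all four lanes, one function per element type.
static inline float vsum_f32x4(f32x4 v) {
  return v[0] + v[1] + v[2] + v[3];
}
static inline int vsum_i32x4(i32x4 v) {
  return v[0] + v[1] + v[2] + v[3];
}
```

| A caller picks the function by suffix, e.g. vadd_f32x4(a, b)
| for floats and vadd_i32x4(c, d) for ints, where C++ overloading
| would let both calls share a single name.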
| janwas wrote:
| Author here, happy to discuss.
| mgraczyk wrote:
| Looks cool. Are you familiar with Halide? In terms of target
| support or implementation complexity, how would you compare
| using Highway vs using Halide for similar projects? (I work at
| Google, use Halide)
| jackmott wrote:
| [dead]
| josephg wrote:
| This is beautiful work. Do you think it would be possible to
| port this to Rust some day?
| pistachiopro wrote:
| Looks very interesting!
|
| I've been having trouble getting good SIMD performance from
| WASM. Are there any benchmarks posted anywhere comparing
| Highway's performance on various native targets vs WASM?
___________________________________________________________________
(page generated 2023-01-11 23:01 UTC)