[HN Gopher] Do low-level optimizations matter? Faster quicksort ...
___________________________________________________________________
Do low-level optimizations matter? Faster quicksort with cmov
(2020)
Author : fanf2
Score : 23 points
Date : 2024-08-21 20:42 UTC (2 hours ago)
(HTM) web link (cantrip.org)
(TXT) w3m dump (cantrip.org)
| mbroncano wrote:
| (2020)
| karmakaze wrote:
| I would expect eliminating branches in a busy inner loop to
| matter.
|
| The interesting part is how that was done:
|
| > A New Primitive, swap_if
|
| > How can we use this method in our sort? First, let us make a
| swap_if: inline bool swap_if(bool c, int& a, int&
| b) { int ta = a, mask = -c; // false -> 0, true ->
| 111..111 a = (b & mask) | (ta & ~mask); b = (ta &
| mask) | (b & ~mask); return c; }
|
| > In our partition function, then, we can transform
| if (*right <= pivot) { int tmp = *left; *left = *right,
| *right = tmp; ++left; }
|
| > into just left += swap_if(*right <= pivot,
| *left, *right);
| memset wrote:
| Fascinating! How does one learn to write C code that will make
| use of specific asm instructions?
|
| SIMDjson is another example that comes to mind. The conceit of C
| is that you do t have control over the underlying machine
| instructions without inlining it yourself. So how do people write
| C code with cpu optimizations like this?
| anonymoushn wrote:
| Use inline assembly or intrinsics. Read the code of stuff that
| does these things.
|
| Resources sufficient for implementing simdjson or similar
| vectorized parsers:
| https://www.intel.com/content/www/us/en/docs/intrinsics-guid...
| https://lemire.me/blog/ http://0x80.pl/articles/
| addaon wrote:
| There are three approaches:
|
| 1) Use intrinsics, if your platform provides intrinsics for the
| instructions you want to use.
|
| 2) Use inline assembly (which I suppose doesn't technically
| count as writing C code, but is very much part of the story).
|
| 3) Write C code carefully against a chosen compiler with chosen
| optimization flags, and inspect the assembly. Especially for
| small chunks of code, a tool like godbolt.org is indispensable.
| Basically, you're writing the assembly you want the compiler to
| generate (either explicitly, or just in your head), then
| writing C code that seems likely to generate that sequence of
| instructions, then tweaking the code until the output you want
| is generated. If this is a super important optimization, it's
| also reasonable to add a build step that inspects the generated
| assembly and fails the build if it doesn't match the desired
| pattern; but in that case, writing inline assembly is usually
| easier.
| vlovich123 wrote:
| Option 4 is sometimes compiler annotations like
| __builtin_expect_with_probability which you could use to
| assign a branch a value of 50% which should coerce it to pick
| cmov (same risk as 3 though in that you need to inspect
| compiler output).
| adelpozo wrote:
| I would say it is a moving target if the goal is to write C
| code that compiles to a seemingly magical cpu instruction. As
| other have pointed out, learning assembly and compiler explorer
| are useful things.
|
| As a way to show the wonderful complexities of compilers check
| the TOC of https://shop.elsevier.com/books/optimizing-
| compilers-for-mod...
| kimixa wrote:
| > Back in 2000, AMD included cmov in its 64-bit x86 ISA
| extensions. Then, Intel had to adopt them when Itanium flopped.
|
| Wasn't "cmov" one of the things added for the pentium pro? So it
| wasn't instruction compatible - hence the "i686" prefix to a lot
| of compiler triples?
___________________________________________________________________
(page generated 2024-08-21 23:00 UTC)