[HN Gopher] Bit twiddling with Arm Neon: beating SSE movemasks, ...
___________________________________________________________________
Bit twiddling with Arm Neon: beating SSE movemasks, counting bits
and more
Author : danlark
Score : 29 points
Date : 2022-08-29 19:56 UTC (3 hours ago)
(HTM) web link (community.arm.com)
(TXT) w3m dump (community.arm.com)
| zX41ZdbW wrote:
| It improves string comparison and sorting in ClickHouse by 15%:
| https://github.com/ClickHouse/ClickHouse/pull/38093
| alas44 wrote:
| Really interesting, thanks for sharing
|
| Also from the article: a 10-20% improvement (I guess in
| instructions per cycle) on some str* functions in glibc
| https://sourceware.org/git/?p=glibc.git;a=commit;h=3c9980698...
| olliej wrote:
| This is a really interesting article. I was expecting some
| obviously biased and/or marketing horror by virtue of it being on
| arm.com.
|
| It's actually an interesting breakdown of ways NEON differs from
| SSE, and how a "direct" translation may well be suboptimal.
| Their first example is really illustrative of this. SSE has an
| instruction that pulls the top (I think?) bit of each byte in a
| register and creates a 16-bit mask from those. You can do
| something similar in NEON, but the perf is apparently terrible.
| NEON does have an instruction that packs some bits from each
| byte into a 64-bit value, though, and you can go from that to the
| masking behaviour you were presumably trying for originally, but
| much faster.
|
| The other examples and case studies are similarly interesting.
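
The mask trick olliej describes can be sketched in C as below. This is
a minimal illustration, not the article's or any library's code: it
assumes the NEON instruction in question is shift-right-and-narrow
(`vshrn_n_u16` with a shift of 4), which turns a 16-byte comparison
result into a 64-bit value holding 4 bits per byte. The names
`match_nibble_mask`, `first_match` and `count_matches` are hypothetical,
the scalar branch exists only so the sketch builds on non-Arm machines,
and `__builtin_ctzll` / `__builtin_popcountll` assume GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>

/* Returns a 64-bit mask with one nibble per input byte:
 * nibble i is 0xF when p[i] == needle, otherwise 0x0. */
static uint64_t match_nibble_mask(const uint8_t *p, uint8_t needle) {
    assert(p != NULL);
    /* Each lane becomes 0xFF on a match, 0x00 otherwise. */
    uint8x16_t eq = vceqq_u8(vld1q_u8(p), vdupq_n_u8(needle));
    /* View as eight 16-bit lanes, shift each right by 4 and keep the
     * low 8 bits: 4 bits survive from each original byte. */
    uint8x8_t narrowed = vshrn_n_u16(vreinterpretq_u16_u8(eq), 4);
    return vget_lane_u64(vreinterpret_u64_u8(narrowed), 0);
}
#else
/* Scalar stand-in (assumption: used only where NEON is unavailable).
 * It builds the same nibble-per-byte mask. */
static uint64_t match_nibble_mask(const uint8_t *p, uint8_t needle) {
    assert(p != NULL);
    uint64_t mask = 0;
    for (int i = 0; i < 16; i++) {
        if (p[i] == needle) {
            mask |= (uint64_t)0xF << (4 * i);
        }
    }
    return mask;
}
#endif

/* Index of the first byte equal to needle in a 16-byte block, or -1. */
static int first_match(const uint8_t *p, uint8_t needle) {
    uint64_t mask = match_nibble_mask(p, needle);
    /* Each byte owns 4 mask bits, so divide the bit index by 4. */
    return mask ? (int)(__builtin_ctzll(mask) >> 2) : -1;
}

/* Number of bytes equal to needle in a 16-byte block. */
static int count_matches(const uint8_t *p, uint8_t needle) {
    /* Every matching byte contributes exactly 4 set bits. */
    return __builtin_popcountll(match_nibble_mask(p, needle)) / 4;
}
```

Both helpers read exactly 16 bytes from `p`. The mask is four times
wider than the one an SSE movemask produces, so bit positions are
divided by 4 to recover byte positions.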
___________________________________________________________________
(page generated 2022-08-29 23:00 UTC)