[HN Gopher] Using the most unhinged AVX-512 instruction to make ...
___________________________________________________________________
Using the most unhinged AVX-512 instruction to make fastest phrase
search algo
Author : cmcollier
Score : 65 points
Date : 2025-01-23 21:38 UTC (3 days ago)
(HTM) web link (gab-menezes.github.io)
(TXT) w3m dump (gab-menezes.github.io)
| nxobject wrote:
| Spoiler if you don't want to read through the (wonderful but many)
| paragraphs of exposition: the instruction is `vp2intersectq k,
| zmm, zmm`.
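For readers who have not met the instruction: as documented, `vp2intersectq` compares every 64-bit lane of one vector against every lane of the other and writes the result to a pair of mask registers. The following is a minimal scalar sketch of that behavior, not the article's code; the function name and the list-of-ints representation of a `zmm` register are assumptions made for illustration.

```python
def vp2intersectq(a, b):
    """Scalar model of vp2intersectq on two 8-lane vectors of 64-bit ints.

    Returns (k_lo, k_hi): bit i of k_lo is set if a[i] equals any lane of b,
    and bit j of k_hi is set if b[j] equals any lane of a.
    The name and the list representation are illustrative assumptions.
    """
    assert len(a) == len(b) == 8
    k_lo = 0
    k_hi = 0
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if x == y:
                k_lo |= 1 << i  # lane i of the first operand has a match
                k_hi |= 1 << j  # lane j of the second operand has a match
    return k_lo, k_hi


# Two sorted position lists, as one might see when intersecting postings.
a = [1, 3, 5, 7, 9, 11, 13, 15]
b = [2, 3, 4, 5, 6, 7, 8, 9]
k_lo, k_hi = vp2intersectq(a, b)
print(f"{k_lo:08b} {k_hi:08b}")
```

The hardware does all 64 comparisons in one instruction, which is why it is attractive for intersecting sorted lists.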
| bri3d wrote:
| And, as noted in the article, that's an instruction which only
| works on two desktop CPU architectures (Tiger Lake and Zen 5),
| including one where it's arguably slower than not using it
| (Tiger Lake).
|
| Meaning... this entire effort was for something that's faster
| on only a single kind of CPU (Zen 5).
|
| This article is honestly one of the best I've read in a long
| time. It's esoteric and the result is 99.5% pointless
| objectively, but in reality it's incredibly useful and a
| wonderful guide to low-level x86 optimization end to end. The
| sections on cache alignment and uiCA + analysis notes are a
| perfect illustration of "how it's done."
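Because support is this narrow, code relying on the instruction would normally sit behind a runtime check with a fallback path. A minimal sketch, assuming Linux, where the kernel lists CPU features on the `flags` line of `/proc/cpuinfo` and reports this one as `avx512_vp2intersect`; the helper names are hypothetical and other platforms would need a CPUID-based check instead.

```python
def has_cpu_flag(cpuinfo_text, flag):
    """Return True if `flag` appears on the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return flag in value.split()
    return False


def vp2intersect_available(path="/proc/cpuinfo"):
    """Best-effort check; returns False when the file cannot be read."""
    try:
        with open(path) as f:
            return has_cpu_flag(f.read(), "avx512_vp2intersect")
    except OSError:
        return False


if __name__ == "__main__":
    print("vp2intersect available:", vp2intersect_available())
```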
| iamnotagenius wrote:
| Imo the most "unhinged" CPUs for AVX-512 are early batches of
| Alder Lake, which is the only CPU family that has nearly full
| coverage of all existing AVX-512 subsets.
| fuhsnn wrote:
| Do they cover anything Sapphire Rapids Xeons don't? I thought
| they share the same arch (Golden Cove).
| iamnotagenius wrote:
| Yes, you are right; I meant "consumer grade cpu".
| suzumer wrote:
| According to this Wikipedia article [1], the only feature
| Sapphire Rapids doesn't support is VP2INTERSECT.
|
| [1]:https://en.wikipedia.org/wiki/Advanced_Vector_Extensions
| nextaccountic wrote:
| It seems that there are faster alternatives to it
|
| https://arxiv.org/abs/2112.06342
|
| https://www.reddit.com/r/asm/comments/110pld0/fasterthannati...
| hogwarts2025 wrote:
| Or Zen 5. :-p
| Namidairo wrote:
| It's a shame that Intel seemed to really not want people to use
| it, given they started disabling the ability to use it in
| future microcode, and fused it off in later parts.
| Aurornis wrote:
| > It's a shame that Intel seemed to really not want people to
| use it
|
| AVX-512 was never part of the specification for those CPUs.
| It was never advertised as a feature or selling point. You
| had to disable the E cores to enable AVX-512, assuming your
| motherboard even supported it.
|
| Alder Lake AVX-512 has reached mythical status, but I think
| the number of people angry about it is far higher than the
| number of people who ever could have taken advantage of it
| and benefitted from it. For general purpose workloads, having
| the E cores enabled (and therefore AVX-512 disabled) was
| faster. You had to have an extremely specific workload that
| didn't scale well with additional cores and also had hot
| loops that benefitted from AVX-512, which was not very
| common.
|
| So you're right: They never wanted people to use it. It
| wasn't advertised and wasn't usable without sacrificing all
| of the E cores and doing a lot of manual configuration work.
| I suspect they didn't want people using it because they never
| validated it. AVX-512 mode increased the voltages, which
| would impact things like failure rate and warranty returns.
| They probably meant to turn it off but forgot in the first
| versions.
| jonstewart wrote:
| The most unhinged AVX-512 instruction is GF2P8AFFINEQB.
| pclmulqdq wrote:
| What about GF2P8AFFINEINVQB?
| jonstewart wrote:
| potato, potato, tomato, tomato
| bri3d wrote:
| There's a pretty good list of weird off-label uses for the
| Galois Field instructions here:
| https://gist.github.com/animetosho/d3ca95da2131b5813e16b5bb1...
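To make the "off-label" idea concrete: `GF2P8AFFINEQB` multiplies each input byte, treated as a vector of 8 bits, by an 8x8 bit matrix and XORs in an immediate. The sketch below models a single byte lane in Python, following the per-bit definition in Intel's documentation (result bit i is the parity of matrix byte 7-i ANDed with the input, XORed with bit i of the immediate). The function name is an assumption, and the two matrix constants are the commonly cited identity and bit-reversal matrices.

```python
def parity(v):
    """Return 1 if v has an odd number of set bits, else 0."""
    return bin(v).count("1") & 1


def gf2p8affine_byte(matrix, x, imm8=0):
    """Model one byte lane of GF2P8AFFINEQB (hypothetical helper name).

    matrix: 64-bit integer holding the 8x8 bit matrix, byte 0 least significant.
    x:      input byte.
    imm8:   constant XORed into the result.
    """
    out = 0
    for i in range(8):
        row = (matrix >> (8 * (7 - i))) & 0xFF  # matrix byte 7-i drives bit i
        bit = parity(row & x) ^ ((imm8 >> i) & 1)
        out |= bit << i
    return out


IDENTITY = 0x0102040810204080     # leaves each byte unchanged
BIT_REVERSE = 0x8040201008040201  # reverses the bit order within each byte

print(hex(gf2p8affine_byte(IDENTITY, 0xA5)))
print(hex(gf2p8affine_byte(BIT_REVERSE, 0x01)))
```

Picking a different matrix gives a different per-byte bit permutation or linear transform, which is where the unusual uses come from.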
| mrandish wrote:
| From my 1980s 8-bit CPU perspective, the instruction is
| unhinged based solely on the number of letters. Compared to
| LDA, STA, RTS, that's not an assembler mnemonic, it's a novel.
| :-)
___________________________________________________________________
(page generated 2025-01-26 23:00 UTC)