[HN Gopher] Using the most unhinged AVX-512 instruction to make ...
       ___________________________________________________________________
        
       Using the most unhinged AVX-512 instruction to make fastest phrase
       search algo
        
       Author : cmcollier
       Score  : 65 points
       Date   : 2025-01-23 21:38 UTC (3 days ago)
        
 (HTM) web link (gab-menezes.github.io)
        
       | nxobject wrote:
       | Spoiler if you don't want to read through the (wonderful but many)
       | paragraphs of exposition: the instruction is `vp2intersectq k,
       | zmm, zmm`.
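
For readers unfamiliar with the instruction, here is a minimal scalar sketch of what `vp2intersectq` computes, as far as I understand Intel's description: it compares every 64-bit lane of one vector against every lane of the other and writes a pair of masks, one per input, marking the lanes that have a match. The function name `vp2intersectq_model` is hypothetical and this models only the semantics, not the performance or the article's actual code.

```python
def vp2intersectq_model(a, b):
    """Scalar model of an all-pairs equality test over two 8-lane vectors.

    Returns (mask_a, mask_b): bit i of mask_a is set if a[i] equals any
    element of b, and bit j of mask_b is set if b[j] equals any element of a.
    """
    assert len(a) == 8 and len(b) == 8, "a zmm register holds eight 64-bit lanes"
    mask_a = 0
    mask_b = 0
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if x == y:
                mask_a |= 1 << i
                mask_b |= 1 << j
    return mask_a, mask_b


# Two sorted lists of positions, eight per "register".
a = [1, 3, 5, 7, 9, 11, 13, 15]
b = [2, 3, 4, 7, 8, 15, 16, 17]
mask_a, mask_b = vp2intersectq_model(a, b)
common = [x for i, x in enumerate(a) if mask_a >> i & 1]
```

The masks are what make the instruction attractive for intersecting sorted lists: the matching lanes can then be compacted out of either input without a second comparison pass.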
        
         | bri3d wrote:
         | And, as noted in the article, that's an instruction which only
         | works on two desktop CPU architectures (Tiger Lake and Zen 5),
         | including one where it's arguably slower than not using it
         | (Tiger Lake).
         | 
         | Meaning... this entire effort was for something that's faster
         | on only a single kind of CPU (Zen 5).
         | 
         | This article is honestly one of the best I've read in a long
         | time. It's esoteric and the result is 99.5% pointless
         | objectively, but in reality it's incredibly useful and a
         | wonderful guide to low-level x86 optimization end to end. The
         | sections on cache alignment and uiCA + analysis notes are a
         | perfect illustration of "how it's done."
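
Because support is so narrow, code that uses the instruction generally needs a runtime check and a fallback path. Below is a rough sketch, assuming a Linux system where `/proc/cpuinfo` reports the feature under the flag name `avx512_vp2intersect` (that is the name I believe the kernel uses; treat it as an assumption). The helper names `has_vp2intersect` and `read_cpuinfo` are hypothetical.

```python
def has_vp2intersect(cpuinfo_text):
    """Return True if any 'flags' line lists avx512_vp2intersect."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only look at the 'flags' field, not e.g. 'model name'.
        if sep and key.strip() == "flags":
            if "avx512_vp2intersect" in value.split():
                return True
    return False


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the cpuinfo text, or return '' where the file is unavailable."""
    try:
        with open(path, encoding="utf-8") as f:
            return f.read()
    except OSError:
        return ""


if __name__ == "__main__":
    if has_vp2intersect(read_cpuinfo()):
        print("VP2INTERSECT reported; the specialized path could be used")
    else:
        print("VP2INTERSECT not reported; use a fallback intersection")
```

Whether the specialized path is actually faster still depends on the microarchitecture, as the comment above notes for Tiger Lake.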
        
       | iamnotagenius wrote:
       | Imo the most "unhinged" CPUs for AVX-512 are the early batches of
       | Alder Lake, which is the only CPU family that has nearly full
       | coverage of all existing AVX-512 subsets.
        
         | fuhsnn wrote:
         | Do they cover anything Sapphire Rapids Xeons don't? I thought
         | they share the same arch (Golden Cove).
        
           | iamnotagenius wrote:
           | Yes, you are right; I meant "consumer grade cpu".
        
           | suzumer wrote:
           | According to this [1] Wikipedia article, the only feature
           | Sapphire Rapids doesn't support is VP2INTERSECT.
           | 
           | [1]:https://en.wikipedia.org/wiki/Advanced_Vector_Extensions
        
             | nextaccountic wrote:
             | It seems that there are faster alternatives to it
             | 
             | https://arxiv.org/abs/2112.06342
             | 
             | https://www.reddit.com/r/asm/comments/110pld0/fasterthannat
             | i...
        
               | hogwarts2025 wrote:
               | Or Zen 5. :-p
        
         | Namidairo wrote:
         | It's a shame that Intel seemed to really not want people to use
         | it, given they started disabling the ability to use it in
         | subsequent microcode updates, and fused it off in later parts.
        
           | Aurornis wrote:
           | > It's a shame that Intel seemed to really not want people to
           | use it
           | 
           | AVX-512 was never part of the specification for those CPUs.
           | It was never advertised as a feature or selling point. You
           | had to disable the E cores to enable AVX-512, assuming your
           | motherboard even supported it.
           | 
           | Alder Lake AVX-512 has reached mythical status, but I think
           | the number of people angry about it is far higher than the
           | number of people who ever could have taken advantage of it
           | and benefitted from it. For general purpose workloads, having
           | the E cores enabled (and therefore AVX-512 disabled) was
           | faster. You had to have an extremely specific workload that
           | didn't scale well with additional cores and also had hot
           | loops that benefitted from AVX-512, which was not very
           | common.
           | 
           | So you're right: They never wanted people to use it. It
           | wasn't advertised and wasn't usable without sacrificing all
           | of the E cores and doing a lot of manual configuration work.
           | I suspect they didn't want people using it because they never
           | validated it. AVX-512 mode increased the voltages, which
           | would impact things like failure rate and warranty returns.
           | They probably meant to turn it off but forgot in the first
           | versions.
        
       | jonstewart wrote:
       | The most unhinged AVX-512 instruction is GF2P8AFFINEQB.
        
         | pclmulqdq wrote:
         | What about GF2P8AFFINEINVQB?
        
           | jonstewart wrote:
           | potato, potato, tomato, tomato
        
         | bri3d wrote:
         | There's a pretty good list of weird off-label uses for the
         | Galois Field instructions here:
         | https://gist.github.com/animetosho/d3ca95da2131b5813e16b5bb1...
        
         | mrandish wrote:
         | From my 1980s 8-bit CPU perspective, the instruction is
         | unhinged based solely on the number of letters. Compared to
         | LDA, STA, RTS, that's not an assembler mnemonic, it's a novel.
         | :-)
        
       ___________________________________________________________________
       (page generated 2025-01-26 23:00 UTC)