[HN Gopher] SIMD.info - Reference tool for C intrinsics of all m...
___________________________________________________________________
SIMD.info - Reference tool for C intrinsics of all major SIMD
engines
Author : pabs3
Score : 88 points
Date : 2025-07-08 01:44 UTC (21 hours ago)
(HTM) web link (simd.info)
(TXT) w3m dump (simd.info)
| mshockwave wrote:
| This is pretty useful! Any plans to add ARM SVE and the RISC-V V
| extension?
| pabs3 wrote:
| A response from the SIMD.info folks:
|
| Yeah, the plan is to get all SIMD engines there, RVV is the
| hardest though (20k intrinsics). Currently we're doing IBM Z,
| which should be done probably within the month? It still needs
| some work, and progress is slow because we're just using our
| own funds. Plan is IBM Z (currently worked on), Loongson
| LSX/LASX, MIPS MSA, ARM SVE/SVE2 and finally RVV 1.0. LSX/LASX
| and MSA are very easy. Ideally, I'd like to open source
| everything, but I can't just now, as I would just hand over all
| the data to big players like OpenAI. Once I manage to ensure
| adequate funding, we're going to open source the data
| (SIMD.info) and probably the model itself (SIMD.ai).
| CalChris wrote:
| Maybe std::simd could be worked into this.
| syockit wrote:
| While the search feature is nice, the reference itself still lacks
| some detail about what an instruction actually does. Take, for
| example, [1], and compare it with, say, [2] (with a diagram), [3]
| (ditto), or [4] (only pseudocode, but helpful nonetheless). Of
| course, all the alternatives mentioned only cater to x86, but it
| would still be great if this site followed the approach taken by
| the other three.
|
| [1]: https://simd.info/c_intrinsic/_mm256_permute_pd [2]:
| https://www.felixcloutier.com/x86/vpermilpd [3]:
| https://officedaytime.com/simd512e/simdimg/si.php?f=vpermilp...
| [4]: https://www.intel.com/content/www/us/en/docs/intrinsics-
| guid...
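|
| As a point of comparison, here is the kind of worked example the
| other references provide and that [1] currently lacks. This is my
| own minimal sketch (not taken from any of the sites above),
| assuming an AVX-capable compiler (e.g. built with -mavx):
|
|     #include <immintrin.h>
|     #include <stdio.h>
|
|     int main(void) {
|         /* elements low-to-high: [1, 2 | 3, 4] (two 128-bit lanes) */
|         __m256d a = _mm256_set_pd(4.0, 3.0, 2.0, 1.0);
|         /* imm8 selects within each 128-bit lane: bits 0/1 pick the
|          * low/high result of the low lane, bits 2/3 do the same for
|          * the high lane. 0x5 (0101b) swaps the pair in each lane. */
|         __m256d r = _mm256_permute_pd(a, 0x5);
|         double out[4];
|         _mm256_storeu_pd(out, r);
|         printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]);
|         /* prints: 2 1 4 3 */
|         return 0;
|     }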
| skavi wrote:
| https://dougallj.github.io/asil/ is like officedaytime but for
| SVE.
| camel-cdr wrote:
| https://github.com/dzaima/intrinsics-viewer is like Intel's
| Guide, but also covers Arm, RISC-V and wasm.
|
| RISC-V and wasm are hosted here:
| https://dzaima.github.io/intrinsics-viewer/
|
| You need to download it yourself if you want to use the
| others.
| vectorcamp wrote:
| Hi, I'm one of the SIMD.info team, thanks for your feedback.
|
| We would actually like to include more information, but our
| goal is to complement the official documentation, not replace
| it. We already provide links to Felix Cloutier's and Intel's
| sites, and the same for Arm and Power, where we can.
|
| The biggest problem is generating the diagrams; we're
| investigating a way to produce them in a common manner for all
| architectures, but this will take time.
| convery wrote:
| Neat idea, though the 'search' feature is a bit odd if you don't
| know which instruction you are looking for. E.g. searching for
| 'SHA' shows autocomplete entries for platforms not selected and
| then 0 results due to the filters (the SHA intrinsics haven't been
| added for SSE/AVX yet), but searching for 'hash' gets you 100
| results like '_mm256_castsi256_ph', which has nothing to do with
| the search.
| gMermigkis wrote:
| Thanks for your comment. We have noticed some strange behavior
| with the "search" feature; you are right to mention it, and we
| are currently trying to improve its performance. Regarding SHA,
| you don't get any results when filtering out NEON or VSX because
| the AVX512 SHA intrinsics haven't been added yet (under dev atm).
| When searching for "HASH", the first 3 results you get are
| correct (NEON); the other ones are, as mentioned before, bad
| behavior of the search component - it must have found some
| similarity.
| varispeed wrote:
| SIMD from MCUs would also be awesome!
| vectorcamp wrote:
| Do you mean Helium from Arm? Yes, that would be nice to include
| and relatively easy as it's mostly the same as Neon.
| Sesse__ wrote:
| I clicked the "go" button just to see the typical format, and it
| gave... zero results. Because the example is "e.g. integer vector
| addition" and it doesn't strip away the "e.g." part!
|
| Apart from that, I find the search results too sparse (they don't
| contain the prototype) and the result page too verbose (way too
| much fluff in the description, and way too much setup in the
| example; honestly, who cares about <stdio.h>[1]), so I'll
| probably stick to the existing x86/Arm references.
|
| [1] Also, the contrast is set so low that I literally cannot read
| all of the example.
| SloopJon wrote:
| I don't think it's that it's not stripping "e.g.", but that the
| search criteria are empty. The empty result set is prefaced by
| "Search results for:".
|
| I actually like that the example is a complete, standalone
| program that you can compile or send to Compiler Explorer.
| vectorcamp wrote:
| You make some good points. I represent Vectorcamp (creators of
| simd.info). It's still in beta because we know there are some
| limitations at the moment, but we are already using it in
| production for our own projects. Now to comment on your points:
|
| 1. Empty string -> zero results: obviously a bug, we'll add
| some default value.
|
| 2. The sparse results are because of VSX: VSX provides multiple
| prototypes per intrinsic, which we thought would increase the
| size of the results a bit too much. Including the prototypes in
| the results is not a problem, but on the other hand we don't
| want so much information that it becomes too hard for the
| developer to find the relevant intrinsic. We'll take another
| look at this.
|
| The description is actually rather bare; we intend to include a
| lot more information, like diagrams, pseudocode for the
| operation, etc.
|
| Examples are meant to be run as self-contained compilation
| units, in Compiler Explorer or locally, to demonstrate the
| intrinsic - hence the extra setup. This will not change.
|
| We also think that nothing will replace the official ISA
| references; we include links to those anyway.
|
| 3. Regarding the contrast, we're already working on a
| light/dark theme.
|
| Thank you for your comments.
| llm_nerd wrote:
| Neat tool.
|
| It is interesting how often SIMD stuff is discussed on here. Are
| people really directly dealing with SIMD calls a lot?
|
| I get the draw -- this sort of to-the-metal hyper-optimization is
| legitimately fun and intellectually rewarding -- but I suspect
| that in the overwhelming majority of cases simply using the
| appropriate library, ideally one that is cross-platform and
| utilizes what SIMD a given target hosts, is a far better choice
| than bothering with the esoterica of every platform and
| generation of SIMD offerings.
| vectorcamp wrote:
| I agree, it's always best to use something that already exists
| and is optimized for your platform, unless it doesn't exist or
| you need extra features that are not covered. In those cases
| you need to read large ISA manuals, use each vendor's intrinsic
| site or use our tool SIMD.info :)
| sophacles wrote:
| I kinda agree with the main point, but keep in mind those
| libraries with SIMD optimizations don't just appear out of
| nowhere... people write those. Also it's pretty common for
| someone to write software for an org that has 10^5 or more
| identical cores running in a datacenter (or datacenters)...
| some specialized optimization can easily be cost-effective in
| those situations. Then there's crazy distributed systems stuff,
| where a small latency reduction in the right place can have
| significant impact for an entire cluster. And on and on....
|
| Point being, while not everyone is in a position where this stuff is
| relevant (and not everyone who sometimes finds this stuff
| relevant can say it's relevant often), it's more widely
| applicable than you're suggesting.
| llm_nerd wrote:
| For sure there are obviously developers building those
| computation libraries like numpy, compilers, R, and so on.
| These people exist and are grinding out great code and
| abstractions for the rest of us to use, and many of them are
| regulars on HN. But these people are seldom the target of the
| "learn SIMD" content that appears on here regularly.
|
| If you are an average developer building a game, a corporate
| information or financial system, or even a neural network
| implementation, and you are touching SIMD code directly,
| you're probably approaching things in a less than optimal
| fashion; there are much better ways to utilize whatever
| features your hardware, or future hardware, may offer up.
| fancyfredbot wrote:
| The link to SIMD.AI is interesting. I didn't have a perfect
| experience trying to get Claude to convert scalar code to
| AVX512.
|
| Claude seems to enjoy storing 16 bit masks in 512 bit vectors but
| the compiler will find that easily.
|
| The biggest issue I encountered was that when converting nested
| if statements into mask operations, it would frequently forget
| to AND the inner and outer masks together.
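|
| A minimal sketch of what I mean (my own illustration, not Claude's
| output), assuming AVX-512F and an n that is a multiple of 16:
|
|     #include <immintrin.h>
|
|     void cond_add(const float *a, const float *b, float *out,
|                   int n, float limit) {
|         for (int i = 0; i < n; i += 16) {
|             __m512 va = _mm512_loadu_ps(a + i);
|             __m512 vb = _mm512_loadu_ps(b + i);
|             /* if (a[i] > 0.0f)   -> outer mask */
|             __mmask16 outer = _mm512_cmp_ps_mask(va,
|                 _mm512_setzero_ps(), _CMP_GT_OQ);
|             /* if (b[i] < limit)  -> inner mask */
|             __mmask16 inner = _mm512_cmp_ps_mask(vb,
|                 _mm512_set1_ps(limit), _CMP_LT_OQ);
|             /* both conditions must hold: this AND is the step
|              * that kept getting dropped */
|             __mmask16 both = outer & inner;
|             _mm512_mask_storeu_ps(out + i, both,
|                 _mm512_add_ps(va, vb));
|         }
|     }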
| vectorcamp wrote:
| Getting an LLM to translate code is very tricky; we haven't
| included AVX2 and AVX512 in our SIMD.ai yet because they require
| a lot more work. However, translating code between similarly
| sized vector engines became doable once we fine-tuned the LLM
| on our own data. We tested ChatGPT and Claude - and more - but
| none could do even the simplest translations between e.g.
| SSE4.2 and Neon or VSX. So trying something harder like AVX512
| felt like a bit of a stretch. But we're working on it.
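|
| By "simplest translations" I mean one-to-one mappings along these
| lines - an illustrative sketch of mine, not our actual test data,
| assuming a compiler that defines __SSE2__ / __ARM_NEON (an SSE2
| add, so also part of SSE4.2, and its NEON equivalent):
|
|     #if defined(__SSE2__)
|     #include <emmintrin.h>
|     __m128i add4(__m128i a, __m128i b) {
|         return _mm_add_epi32(a, b);      /* SSE2: paddd */
|     }
|     #elif defined(__ARM_NEON)
|     #include <arm_neon.h>
|     int32x4_t add4(int32x4_t a, int32x4_t b) {
|         return vaddq_s32(a, b);          /* NEON: add v.4s */
|     }
|     #endif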
| Const-me wrote:
| The ISA extension tags are mostly incorrect. According to that
| web site, all SSE2, SSE3, SSSE3, and SSE4.1 intrinsics are part
| of SSE 4.2, and all FMA3 intrinsics are part of AVX2. BTW there's
| one processor which supports AVX2 but lacks FMA3:
| https://en.wikipedia.org/wiki/List_of_VIA_Eden_microprocesso...
|
| The search is less than ideal. Search for FMA and it will find
| multiple pages of NEON intrinsics, but no AMD64 ones like
| _mm256_fmadd_pd.
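|
| This distinction matters in practice because the FMA intrinsics sit
| behind their own compiler flag and macro, separate from AVX2. A
| small sketch of my own, assuming GCC/Clang predefined macros and at
| least -mavx for the fallback path:
|
|     #include <immintrin.h>
|
|     __m256d muladd(__m256d a, __m256d b, __m256d c) {
|     #if defined(__FMA__)
|         return _mm256_fmadd_pd(a, b, c);              /* one rounding */
|     #else
|         return _mm256_add_pd(_mm256_mul_pd(a, b), c); /* two roundings */
|     #endif
|     }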
| vectorcamp wrote:
| Hi, thanks for your feedback; we are being "incorrect" on
| purpose. All intrinsics up to and including SSE4.2 are listed
| as part of SSE4.2. We have no intention of providing full
| granularity for any ISA extension, especially one that is 20
| years old. For the same reason, we list VSX as included in
| Power ISA 3.0, but not e.g. Altivec or Power7/Power8 VSX
| separately. If you need such granularity, you are better off
| visiting the Intel Intrinsics Guide or the ISA manuals. So x86
| is split into 3 groups: SSE4.2 (everything up to and including
| it), AVX2 (including AVX) and AVX512 (also including some, but
| not all, variants) - something like the x86_64-v1, x86_64-v2,
| etc. levels used by compilers. We will probably add finer
| granularity in the future, listing the exact extension in the
| description, but not as part of the categorization.
|
| The search is indeed less than ideal; we're working on
| replacing our search engine with a much more robust one that
| doesn't favour one architecture over another, especially for
| terms like these.
|
| In any case, thank you for your feedback. It's still in beta
| but it is already very useful for us, as we're actually using
| it for development on our own projects.
| vient wrote:
| Would also be nice to remove empty categories from tree view.
| For example, right now you can uncheck VSX and still see
| "Memory Operations - VSX Unaligned ..." full of empty tags.
| gMermigkis wrote:
| Thank you for this comment; it will be taken into
| consideration.
| ack_complete wrote:
| Note that this issue also affects NEON. Two examples are
| vmull_p64(), which requires the Crypto extension -- notably
| absent on RPi3/4 -- and vqrdmlah_s32(), which requires
| FEAT_RDM, not guaranteed until ARMv8.1. Unlike Intel, ARM
| doesn't do a very good job of surfacing this in their
| intrinsics guide.
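|
| In practice that means guarding those two intrinsics behind the
| ACLE feature macros. A small sketch of my own, assuming GCC/Clang
| and a reasonably recent arm_neon.h:
|
|     #include <arm_neon.h>
|
|     #if defined(__ARM_FEATURE_CRYPTO)
|     poly128_t clmul(poly64_t a, poly64_t b) {
|         return vmull_p64(a, b);          /* PMULL: needs Crypto ext */
|     }
|     #endif
|
|     #if defined(__ARM_FEATURE_QRDMX)
|     int32x2_t rdm(int32x2_t a, int32x2_t b, int32x2_t c) {
|         return vqrdmlah_s32(a, b, c);    /* SQRDMLAH: needs FEAT_RDM */
|     }
|     #endif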
___________________________________________________________________
(page generated 2025-07-08 23:01 UTC)