[HN Gopher] SIMD.info - Reference tool for C intrinsics of all m...
___________________________________________________________________
SIMD.info - Reference tool for C intrinsics of all major SIMD
engines
Author : pabs3
Score : 88 points
Date : 2025-07-08 01:44 UTC (21 hours ago)
(HTM) web link (simd.info)
(TXT) w3m dump (simd.info)
| mshockwave wrote:
| This is pretty useful! Any plans to add ARM SVE and the RISC-V V
| extension?
| pabs3 wrote:
| A response from the SIMD.info folks:
|
| Yeah, the plan is to get all SIMD engines there, RVV is the
| hardest though (20k intrinsics). Currently we're doing IBM Z,
| which should be done probably within the month? It still needs
| some work, and progress is slow because we're just using our
| own funds. Plan is IBM Z (currently worked on), Loongson
| LSX/LASX, MIPS MSA, ARM SVE/SVE2 and finally RVV 1.0. LSX/LASX
| and MSA are very easy. Ideally, I'd like to open source
| everything, but I can't just now, as I would just hand over all
| the data to big players like OpenAI. Once I manage to ensure
| adequate funding, we're going to open source the data
| (SIMD.info) and probably the model itself (SIMD.ai).
| CalChris wrote:
| Maybe std::simd could be worked into this.
| syockit wrote:
| While the search feature is nice, the reference itself still lacks
| some detail about what an instruction actually does. Take, for
| example, [1], and compare it with, say, [2] (with a diagram), [3]
| (ditto), or [4] (only pseudocode, but helpful nonetheless). Of
| course, all the alternatives mentioned only cater to x86, but it
| would still be great if this site followed the approach taken by
| the other three.
|
| [1]: https://simd.info/c_intrinsic/_mm256_permute_pd [2]:
| https://www.felixcloutier.com/x86/vpermilpd [3]:
| https://officedaytime.com/simd512e/simdimg/si.php?f=vpermilp...
| [4]: https://www.intel.com/content/www/us/en/docs/intrinsics-
| guid...
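|
| As a point of comparison, here is the kind of worked example the
| other references provide and that [1] currently lacks. This is my
| own minimal sketch (not taken from any of the sites above),
| assuming an AVX-capable compiler (e.g. built with -mavx):
|
|     #include <immintrin.h>
|     #include <stdio.h>
|
|     int main(void) {
|         /* elements low-to-high: [1, 2 | 3, 4] (two 128-bit lanes) */
|         __m256d a = _mm256_set_pd(4.0, 3.0, 2.0, 1.0);
|         /* imm8 selects within each 128-bit lane: bits 0/1 pick the
|          * low/high result of the low lane, bits 2/3 do the same for
|          * the high lane. 0x5 (0101b) swaps the pair in each lane. */
|         __m256d r = _mm256_permute_pd(a, 0x5);
|         double out[4];
|         _mm256_storeu_pd(out, r);
|         printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]);
|         /* prints: 2 1 4 3 */
|         return 0;
|     }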
| skavi wrote:
| https://dougallj.github.io/asil/ is like officedaytime but for
| SVE.
| camel-cdr wrote:
| https://github.com/dzaima/intrinsics-viewer is like Intel's
| Guide, but also covers Arm, RISC-V and wasm.
|
| RISC-V and wasm are hosted here:
| https://dzaima.github.io/intrinsics-viewer/
|
| You need to download it yourself if you want to use the
| others.
| vectorcamp wrote:
| Hi, I'm one of the SIMD.info team, thanks for your feedback.
|
| We would actually like to include more information, but our
| goal is to complement the official documentation, not replace
| it. We already provide links to Felix Cloutier's and Intel's
| sites, and the same for Arm and Power, where we can.
|
| The biggest problem is generating the diagrams; we're
| investigating a way to produce them in a common manner for all
| architectures, but this will take time.
| convery wrote:
| Neat idea, though the 'search' feature is a bit odd if you don't
| know which instruction you are looking for. E.g. searching for
| 'SHA' shows autocomplete entries for platforms not selected and
| then 0 results due to the filters (the SHA intrinsics haven't been
| added for SSE/AVX yet), but searching for 'hash' gets you 100
| results like '_mm256_castsi256_ph', which has nothing to do with
| the search.
| gMermigkis wrote:
| Thanks for your comment. We have noticed some strange behavior
| with the "search" feature; you are right to mention it, and we
| are currently trying to improve its performance. Regarding SHA,
| you don't get any results when filtering out NEON or VSX because
| the AVX512 SHA intrinsics haven't been added yet (under dev atm).
| When searching for "HASH", the first 3 results you get are
| correct (NEON); the other ones are, as mentioned before, bad
| behavior of the search component - it must have found some
| similarity.
| varispeed wrote:
| SIMD from MCUs would also be awesome!
| vectorcamp wrote:
| Do you mean Helium from Arm? Yes, that would be nice to include
| and relatively easy as it's mostly the same as Neon.
| Sesse__ wrote:
| I clicked the "go" button just to see the typical format, and it
| gave... zero results. Because the example is "e.g. integer vector
| addition" and it doesn't strip away the "e.g." part!
|
| Apart from that, I find the search results too sparse (they don't
| contain the prototype) and the result page too verbose (way too
| much fluff in the description, and way too much setup in the
| example; honestly, who cares about <stdio.h>[1]), so I'll
| probably stick to the existing x86/Arm references.
|
| [1] Also, the contrast is set so low that I literally cannot read
| all of the example.
| SloopJon wrote:
| I don't think it's that it's not stripping "e.g.", but that the
| search criteria are empty. The empty result set is prefaced by
| "Search results for:".
|
| I actually like that the example is a complete, standalone
| program that you can compile or send to Compiler Explorer.
| vectorcamp wrote:
| You make some good points. I represent Vectorcamp (creators of
| simd.info). It's still in beta because we know there are some
| limitations at the moment, but we are already using it in
| production for our own projects. Now to comment on your points:
|
| 1. Empty string -> zero results: obviously a bug, we'll add
| some default value.
|
| 2. The sparse results are because of VSX: VSX provides multiple
| prototypes per intrinsic, which we thought would increase the
| size of the results a bit too much. Including the prototypes in
| the results is not a problem, but on the other hand we don't
| want so much information that it becomes too hard for the
| developer to find the relevant intrinsic. We'll take another
| look at this.
|
| The description is actually rather bare; we intend to include a
| lot more information, like diagrams, pseudocode for the
| operation, etc.
|
| Examples are meant to be run as self-contained compilation
| units, in Compiler Explorer or locally, to demonstrate the
| intrinsic - hence the extra setup. This will not change.
|
| We also think that nothing will replace the official ISA
| references; we include links to those anyway.
|
| 3. Regarding the contrast, we're already working on a
| light/dark theme.
|
| Thank you for your comments.
| llm_nerd wrote:
| Neat tool.
|
| It is interesting how often SIMD stuff is discussed on here. Are
| people really directly dealing with SIMD calls a lot?
|
| I get the draw -- this sort of to-the-metal hyper-optimization is
| legitimately fun and intellectually rewarding -- but I suspect
| that in the overwhelming majority of cases simply using the
| appropriate library, ideally one that is cross-platform and
| utilizes what SIMD a given target hosts, is a far better choice
| than bothering with the esoterica of every platform and
| generation of SIMD offerings.
| vectorcamp wrote:
| I agree, it's always best to use something that already exists
| and is optimized for your platform, unless it doesn't exist or
| you need extra features that are not covered. In those cases
| you need to read large ISA manuals, use each vendor's intrinsic
| site or use our tool SIMD.info :)
| sophacles wrote:
| I kinda agree with the main point, but keep in mind those
| libraries with SIMD optimizations don't just appear out of
| nowhere... people write those. Also it's pretty common for
| someone to write software for an org that has 10^5 or more
| identical cores running in a datacenter (or datacenters)...
| some specialized optimization can easily be cost-effective in
| those situations. Then there's crazy distributed systems stuff,
| where a small latency reduction in the right place can have
| significant impact for an entire cluster. And on and on....
|
| Point being, while not everyone is in a position where this stuff is
| relevant (and not everyone who sometimes finds this stuff
| relevant can say it's relevant often), it's more widely
| applicable than you're suggesting.
| llm_nerd wrote:
| For sure there are obviously developers building those
| computation libraries like numpy, compilers, R, and so on.
| These people exist and are grinding out great code and
| abstractions for the rest of us to use, and many of them are
| regulars on HN. But these people are seldom the target of the
| "learn SIMD" content that appears on here regularly.
|
| If you are an average developer building a game, a corporate
| information or financial system, or even a neural network
| implementation, and you are touching SIMD code directly,
| you're probably approaching things in a less than optimal
| fashion; there are much better ways to utilize whatever
| features your hardware, or future hardware, may offer up.
| fancyfredbot wrote:
| The link to SIMD.AI is interesting. I didn't have a perfect
| experience trying to get Claude to convert scalar code to
| AVX512.
|
| Claude seems to enjoy storing 16 bit masks in 512 bit vectors but
| the compiler will find that easily.
|
| The biggest issue I encountered was that when converting nested
| if statements into mask operations, it would frequently forget
| to AND the inner and outer masks together.
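|
| A minimal sketch of what I mean (my own illustration, not Claude's
| output), assuming AVX-512F and an n that is a multiple of 16:
|
|     #include <immintrin.h>
|
|     void cond_add(const float *a, const float *b, float *out,
|                   int n, float limit) {
|         for (int i = 0; i < n; i += 16) {
|             __m512 va = _mm512_loadu_ps(a + i);
|             __m512 vb = _mm512_loadu_ps(b + i);
|             /* if (a[i] > 0.0f)   -> outer mask */
|             __mmask16 outer = _mm512_cmp_ps_mask(va,
|                 _mm512_setzero_ps(), _CMP_GT_OQ);
|             /* if (b[i] < limit)  -> inner mask */
|             __mmask16 inner = _mm512_cmp_ps_mask(vb,
|                 _mm512_set1_ps(limit), _CMP_LT_OQ);
|             /* both conditions must hold: this AND is the step
|              * that kept getting dropped */
|             __mmask16 both = outer & inner;
|             _mm512_mask_storeu_ps(out + i, both,
|                 _mm512_add_ps(va, vb));
|         }
|     }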
| vectorcamp wrote:
| Getting an LLM to translate code is very tricky; we haven't
| included AVX2 and AVX512 in our SIMD.ai yet because they require
| a lot more work. However, translating code between similarly
| sized vector engines became doable once we fine-tuned the LLM
| on our own data. We tested ChatGPT and Claude - and more - but
| none could do even the simplest translations between e.g.
| SSE4.2 and Neon or VSX. So trying something harder like AVX512
| felt like a bit of a stretch. But we're working on it.
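|
| By "simplest translations" I mean one-to-one mappings along these
| lines - an illustrative sketch of mine, not our actual test data,
| assuming a compiler that defines __SSE2__ / __ARM_NEON (an SSE2
| add, so also part of SSE4.2, and its NEON equivalent):
|
|     #if defined(__SSE2__)
|     #include <emmintrin.h>
|     __m128i add4(__m128i a, __m128i b) {
|         return _mm_add_epi32(a, b);      /* SSE2: paddd */
|     }
|     #elif defined(__ARM_NEON)
|     #include <arm_neon.h>
|     int32x4_t add4(int32x4_t a, int32x4_t b) {
|         return vaddq_s32(a, b);          /* NEON: add v.4s */
|     }
|     #endif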
| Const-me wrote:
| The ISA extension tags are mostly incorrect. According to that
| web site, all SSE2, SSE3, SSSE3, and SSE4.1 intrinsics are part
| of SSE 4.2, and all FMA3 intrinsics are part of AVX2. BTW there's
| one processor which supports AVX2 but lacks FMA3:
| https://en.wikipedia.org/wiki/List_of_VIA_Eden_microprocesso...
|
| The search is less than ideal. Search for FMA and it will find
| multiple pages of NEON intrinsics, but no AMD64 ones like
| _mm256_fmadd_pd.
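|
| This distinction matters in practice because the FMA intrinsics sit
| behind their own compiler flag and macro, separate from AVX2. A
| small sketch of my own, assuming GCC/Clang predefined macros and at
| least -mavx for the fallback path:
|
|     #include <immintrin.h>
|
|     __m256d muladd(__m256d a, __m256d b, __m256d c) {
|     #if defined(__FMA__)
|         return _mm256_fmadd_pd(a, b, c);              /* one rounding */
|     #else
|         return _mm256_add_pd(_mm256_mul_pd(a, b), c); /* two roundings */
|     #endif
|     }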
| vectorcamp wrote:
| Hi, thanks for your feedback; we are being "incorrect" on
| purpose. All intrinsics up to and including SSE4.2 are listed
| as part of SSE4.2. We have no intention of providing full
| granularity for any ISA extension, especially one that is 20
| years old. For the same reason, we list VSX as included in
| Power ISA 3.0, but not e.g. Altivec or Power7/Power8 VSX
| separately. If you need such granularity, you are better off
| visiting the Intel Intrinsics Guide or the ISA manuals. So x86
| is split into 3 groups: SSE4.2 (everything up to and including
| it), AVX2 (including AVX) and AVX512 (also including some, but
| not all, variants) - something like the x86_64-v1, x86_64-v2,
| etc. levels used by compilers. We will probably add finer
| granularity in the future, listing the exact extension in the
| description, but not as part of the categorization.
|
| The search is indeed less than ideal; we're working on
| replacing our search engine with a much more robust one that
| doesn't favour one architecture over another, especially for
| terms like these.
|
| In any case, thank you for your feedback. It's still in beta
| but it is already very useful for us, as we're actually using
| it for development on our own projects.
| vient wrote:
| Would also be nice to remove empty categories from tree view.
| For example, right now you can uncheck VSX and still see
| "Memory Operations - VSX Unaligned ..." full of empty tags.
| gMermigkis wrote:
| Thank you for this comment; it will be taken into
| consideration.
| ack_complete wrote:
| Note that this issue also affects NEON. Two examples are
| vmull_p64(), which requires the Crypto extension -- notably
| absent on RPi3/4 -- and vqrdmlah_s32(), which requires
| FEAT_RDM, not guaranteed until ARMv8.1. Unlike Intel, ARM
| doesn't do a very good job of surfacing this in their
| intrinsics guide.
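|
| In practice that means guarding those two intrinsics behind the
| ACLE feature macros. A small sketch of my own, assuming GCC/Clang
| and a reasonably recent arm_neon.h:
|
|     #include <arm_neon.h>
|
|     #if defined(__ARM_FEATURE_CRYPTO)
|     poly128_t clmul(poly64_t a, poly64_t b) {
|         return vmull_p64(a, b);          /* PMULL: needs Crypto ext */
|     }
|     #endif
|
|     #if defined(__ARM_FEATURE_QRDMX)
|     int32x2_t rdm(int32x2_t a, int32x2_t b, int32x2_t c) {
|         return vqrdmlah_s32(a, b, c);    /* SQRDMLAH: needs FEAT_RDM */
|     }
|     #endif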
___________________________________________________________________
(page generated 2025-07-08 23:01 UTC)