[HN Gopher] Expressive Vector Engine - SIMD in C++
       ___________________________________________________________________
        
       Expressive Vector Engine - SIMD in C++
        
       Author : klaussilveira
       Score  : 61 points
       Date   : 2025-01-05 17:11 UTC (3 days ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | vblanco wrote:
       | Interesting library, but I see it falls into what happens to
       | almost all SIMD libraries: they hardcode the vector target
       | completely, and you can't mix and match feature levels within a
       | build. The documentation recommends writing your kernels into
       | DLLs and dynamically loading them, which is a huge mess:
       | https://jfalcou.github.io/eve/multiarch.html
       | 
       | Meanwhile xsimd (https://github.com/xtensor-stack/xsimd) has the
       | feature level as a template parameter on its vector objects,
       | which lets you branch between SIMD levels at runtime as you wish.
       | I find it's a far better way of doing things if you actually want
       | to ship SIMD code to users.
        
         | spacechild1 wrote:
         | Thanks, that's an important caveat!
         | 
         | > Meanwhile xsimd (https://github.com/xtensor-stack/xsimd) has
         | the feature level as a template parameter on its vector objects
         | 
         | That's pretty cool because you can write function templates and
         | instantiate different versions that you can select at runtime.
        
           | vblanco wrote:
           | Yeah, that's the fun of it: you write your kernel/function so
           | that the SIMD level is a template parameter, and then you can
           | use simple branching like:
           | 
           |   if (supports<avx512>) { myAlgo<avx512>(); }
           |   else { myAlgo<avx>(); }
           | 
           | I've also used it for benchmarking, to see whether my code
           | scales well to different SIMD widths, and it's a huge help.
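Here is a library-free sketch of the pattern described above. It assumes two hypothetical tags, `sse_tag` and `avx_tag`, that only record a lane count. This is not xsimd's API: real architecture types also select the intrinsics, and `has_avx` stands in for whatever CPU-feature query the program actually uses. The kernel is written once as a template on the tag, and the caller picks an instantiation at runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical architecture tags: each one only records how many float lanes
// its registers would hold. A real SIMD library would also pick intrinsics.
struct sse_tag { static constexpr std::size_t lanes = 4; };
struct avx_tag { static constexpr std::size_t lanes = 8; };

// The kernel is written once, with the architecture as a template parameter.
// The "vector" is modelled as N independent accumulators so the code runs
// anywhere; it only mirrors the chunking a real SIMD version would do.
template <class Arch>
float sum(const std::vector<float>& v) {
    constexpr std::size_t N = Arch::lanes;
    float acc[N] = {};                          // one accumulator per lane
    std::size_t i = 0;
    for (; i + N <= v.size(); i += N)           // full-width chunks
        for (std::size_t l = 0; l < N; ++l) acc[l] += v[i + l];
    float total = 0.0f;
    for (std::size_t l = 0; l < N; ++l) total += acc[l];  // horizontal reduce
    for (; i < v.size(); ++i) total += v[i];               // scalar tail
    return total;
}

// Runtime branch between instantiations; `has_avx` is a stand-in for a
// real CPU-feature check.
float sum_dispatch(const std::vector<float>& v, bool has_avx) {
    return has_avx ? sum<avx_tag>(v) : sum<sse_tag>(v);
}
```

Because each instantiation is an ordinary function, the same template can also be called directly per tag, e.g. `sum<sse_tag>(v)` vs `sum<avx_tag>(v)`. That is what makes the benchmarking use mentioned above straightforward.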
        
             | dyaroshev wrote:
             | FYI: You don't want to do this. `supports<avx512>` is an
             | expensive check. You really want to put this check in a
             | static.
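One way to follow that advice is sketched below. It uses a hypothetical `probe_avx512()` to stand in for the expensive check; a real program would query the CPU, e.g. via cpuid. A function-local `static` is initialized once (thread-safely since C++11), so the probe runs a single time no matter how often `has_avx512()` is called.

```cpp
#include <cassert>

// Counts how often the (pretend) expensive probe runs, so the caching is
// observable. Not part of any real API.
int probe_calls = 0;

// Hypothetical stand-in for an expensive CPU-feature query.
bool probe_avx512() {
    ++probe_calls;
    return false;  // pretend this CPU lacks AVX-512
}

// The function-local static is initialised on first call only; later calls
// reuse the cached answer instead of probing again.
bool has_avx512() {
    static const bool cached = probe_avx512();
    return cached;
}
```

A common refinement is to cache the selected kernel's function pointer rather than the bool, so the hot path is a single indirect call.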
        
         | kookamamie wrote:
         | 100% agreed. This is the main reason ISPC is my go-to tool for
         | explicit vectorization.
        
         | janwas wrote:
         | +1, dynamic dispatch is important. Our Highway library has
         | extensive support for this.
         | 
         | Detailed intro by kfjahnke here:
         | https://github.com/kfjahnke/zimt/blob/multi_isa/examples/mul...
        
         | vlovich123 wrote:
         | Since you seem knowledgeable about this, what does this do
         | differently from other SIMD libraries like xsimd / highway? Is
         | it the addition of algorithms similar to the STD library that
         | are explicitly SIMD optimized?
        
         | dyaroshev wrote:
         | As for dynamic dispatch, our answer is: if you want multiple
         | versions of the same kernel compiled, compile multiple DLLs.
         | 
         | The big problem here is ODR violations. We really didn't want
         | to do the xsimd thing of forcing the user to pass an arch
         | everywhere.
         | 
         | Also, that kinda defeats the purpose of "SIMD portability": any
         | code that names avx2 can't work on an ARM platform.
         | 
         | eve just works everywhere.
         | 
         | Example: https://godbolt.org/z/bEGd7Tnb3
        
           | janwas wrote:
           | It is possible to avoid ODR violations :) We put the per-
           | target code into unique namespaces, and export a function
           | pointer to them.
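Below is a hand-rolled sketch of the namespace approach described above. It assumes a hypothetical `DEFINE_KERNELS` macro and `Target` enum; Highway has its own machinery for this, and none of these names come from it. The same kernel source is stamped into one namespace per target, and only function pointers are handed out. The sketch does not address the modules concern raised in this thread.

```cpp
#include <cassert>

// Stamp the same kernel source into one namespace per target. Each copy is a
// distinct entity (target_sse4::scale vs target_avx2::scale), so the copies
// cannot collide under the ODR. Here both live in one plain C++ file; in a
// real build each copy would be compiled with that target's flags.
#define DEFINE_KERNELS(NS)                       \
    namespace NS {                               \
    int scale(int x) { return x * 2; }           \
    }

DEFINE_KERNELS(target_sse4)
DEFINE_KERNELS(target_avx2)

// Hypothetical list of targets this program was built for.
enum class Target { sse4, avx2 };

// Only a function pointer leaves the per-target namespaces.
using ScaleFn = int (*)(int);

ScaleFn select_scale(Target t) {
    return t == Target::avx2 ? &target_avx2::scale : &target_sse4::scale;
}
```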
        
             | dyaroshev wrote:
             | You can do many things with macros and inline namespaces,
             | but I believe they run into problems once modules come into
             | play. Can you compile the same code twice, with different
             | flags, with modules?
        
       | nickpsecurity wrote:
       | I also found this looking for portable SIMD:
       | 
       | https://github.com/google/highway
        
       | shadowpho wrote:
       | Wait, what about AMD? They only claim support for Intel and ARM.
        
         | Sadiinso wrote:
         | AMD is x86.
        
         | dyaroshev wrote:
         | We support AMD pretty well. I tested on Zen 1 and, a bit, Zen 4.
        
       | Conscat wrote:
       | EVE is personally my favorite SIMD library in any programming
       | language. It's the only one I've tried that provides masked lane
       | operations in a declarative style, aside from SPMD languages like
       | CUDA or OpenMP. The [] syntax for that is admittedly pretty
       | exotic C++, but I think the usefulness of the feature is worth
       | it. I wish the documentation were better, though. When I first
       | started, I struggled to figure out how to simply make a 4-lane
       | float vector that I can pass into shaders, because almost all of
       | the examples are written for the "wide" native-SIMD size.
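To make "masked lane operation" concrete, here is a scalar model of the semantics on a fixed 4-lane float vector. It is not EVE code: `f32x4`, `mask4` and `masked_add` are hypothetical names. It also assumes the common convention that lanes with a false mask keep the value of the first operand.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// A fixed 4-lane float "vector" and a per-lane boolean mask (scalar model).
using f32x4 = std::array<float, 4>;
using mask4 = std::array<bool, 4>;

// Masked add: lanes whose mask is set get a + b; the other lanes keep a.
// A SIMD library would do this with a blend or a masked instruction
// instead of a loop.
f32x4 masked_add(const mask4& m, const f32x4& a, const f32x4& b) {
    f32x4 r{};
    for (std::size_t i = 0; i < r.size(); ++i)
        r[i] = m[i] ? a[i] + b[i] : a[i];
    return r;
}
```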
        
       | dyaroshev wrote:
       | Hi!
       | 
       | Thanks for your interest in the library.
       | 
       | Here is a godbolt example: https://godbolt.org/z/bEGd7Tnb3
       | Here is a bunch of simple examples:
       | https://github.com/jfalcou/eve/blob/fb093a0553d25bb8114f1396...
       | 
       | I personally think we have the following strengths:
       | 
       | * Algorithms. Writing SIMD loops is very hard, so we give you a
       | lot of ready-to-go loops (find, search, remove,
       | set_intersection, to name a few).
       | * zip and SOA support out of the box.
       | * High-quality codegen. I haven't seen other libraries care about
       | unrolling/aligning data accesses, yet these give you substantial
       | improvements.
       | * Supporting more than transform/reduce. For example, we have a
       | really decent compress implemented for SSE/AVX/NEON.
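As a reference point for the compress bullet, here is a scalar sketch of what compress computes: it keeps the elements whose mask is set and packs them to the front, in order. `compress` is a hypothetical helper, not EVE's interface. AVX-512 has a dedicated compress instruction, but on SSE/AVX2/NEON a vectorized version has to emulate it, typically with shuffle lookup tables. That is why a fast implementation on those targets is notable.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar reference for "compress" (stream compaction): copy in[i] to the
// output whenever keep[i] is true, preserving order. A SIMD version does this
// for a whole register per step; this loop only defines the expected result,
// e.g. for testing a vectorised implementation against it.
std::vector<int> compress(const std::vector<int>& in,
                          const std::vector<bool>& keep) {
    std::vector<int> out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size() && i < keep.size(); ++i)
        if (keep[i]) out.push_back(in[i]);
    return out;
}
```

A SIMD remove_if can be built on the same operation by compressing each chunk with the negated predicate.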
       | 
       | The following weaknesses:
       | 
       | * We don't support runtime-sized SVE/RVV (only fixed size). We
       | tried really hard, but unfortunately the C++ language just
       | refuses to play ball there. Here is a discussion about that:
       | https://stackoverflow.com/questions/73210512/arm-sve-wrappin...
       | 
       | If this is something you need, we recommend compiling a few
       | dynamic libraries with the correct fixed lengths. Google Highway
       | manages to pull it off, but the trade-off is a variadics
       | interface that I personally find very difficult.
       | 
       | * Runtime dispatch based on arch.
       | 
       | Again, we recommend DLLs for this. The problem here is ODR. I
       | believe there is a solution based on the preprocessor and
       | namespaces that I could use, but it breaks as soon as modules
       | become a thing. So, in the modules world, we don't have an
       | option. I'm open to suggestions.
       | 
       | * No MSVC support
       | 
       | MSVC's C++20 support is still not good enough, and each new
       | version breaks something that was already working. Sad times.
       | 
       | * Just tricky to get started.
       | 
       | I don't know what to do about that. I'm happy to just write
       | examples for people. If you want to try the library, please
       | create an issue/discussion or something; I'm happy to take some
       | time and try to solve your case.
       | 
       | We talked about the library at CppCon:
       | https://youtu.be/WZGNCPBMInI?si=buFteQB1e1vXRT5M
       | 
       | If you want to learn how SIMD algorithms work, here are a couple
       | of talks I gave: https://youtu.be/PHZRTv3erlA?si=b87DBYMDskvzYcq1
       | https://youtu.be/vGcH40rkLdA?si=WL2e5gYQ7pSie9bd
       | 
       | Feel free to ask any questions.
        
       ___________________________________________________________________
       (page generated 2025-01-08 23:01 UTC)