[HN Gopher] Gpu.cpp: A lightweight library for portable low-leve...
       ___________________________________________________________________
        
       Gpu.cpp: A lightweight library for portable low-level GPU
       computation
        
       Author : bovem
       Score  : 218 points
       Date   : 2024-07-13 06:12 UTC (1 days ago)
        
 (HTM) web link (www.answer.ai)
 (TXT) w3m dump (www.answer.ai)
        
       | pavlov wrote:
       | Lovely! I like how the API is in a single header file that you
       | can read through and understand in one sitting.
       | 
       | I've worked with OpenGL and Direct3D and Metal in the past, but
       | the pure compute side of GPUs is mostly foreign to me. Learning
       | CUDA always felt like a big time investment when I never had an
       | obvious need at hand.
       | 
       | So I'm definitely going to play with the library and try to get
       | up to speed. Thanks for publishing it.
        
         | austinvhuang wrote:
         | Thanks very much!
         | 
         | You're probably better prepared than you think. The funny thing
         | is after working on making compute workflows work with graphics
         | APIs like vulkan and webgpu, CUDA is so user friendly by
         | comparison :)
         | 
         | Feel free to say hi or ping us if you run into issues in the
         | discord channel https://discord.gg/Q9PWDckbnR
        
       | Arech wrote:
       | Very interesting... I wonder, how does code performance compare
       | to raw Vulkan?
        
         | austinvhuang wrote:
         | See https://news.ycombinator.com/item?id=40952182#40957959
         | 
         | It's early, but my current thinking is that since WGSL ->
         | SPIR-V is a fairly shallow mapping, you should be able to get
         | close, modulo extensions. Extensions can be important though;
         | in particular I'm tracking this closely:
         | 
         | https://github.com/gpuweb/gpuweb/issues/4195
         | 
         | One subgoal of gpu.cpp is to be able to have a canvas to
         | experiment and see how far we can push the limits.
        
       | hpen wrote:
       | Any performance metrics vs Vulkan, metal, etc?
        
         | mpreda wrote:
         | vs OpenCL, ROCm, CUDA?
        
           | zamadatix wrote:
           | Since this library ends up acting as a layer on top of the
           | listed specifications it'd be more applicable to see
           | benchmarks comparing the performance to building on top of
           | said specifications directly to get an idea of overhead. At
           | that point you could layer existing generic comparisons for
           | the specifications you listed (or anything else for that
           | matter) instead of needing them all to be redone specifically
           | with this in mind.
        
         | austinvhuang wrote:
         | The data that is out there is reasonably promising with WebGPU
         | already in use in some production ML inference engines. TVM of
         | course is way ahead of the curve as usual -
         | https://tvm.apache.org/2020/05/14/compiling-machine-learning...
         | though this post is quite old now.
         | 
         | It's still early days for pushing compute use cases to WebGPU
         | (OctoML being super early notwithstanding). There's a small
         | matmul in the examples directory but it only has the most basic
         | tiling optimizations. One of my goals for the next few weeks is
         | porting the transformer block kernels from llm.c - I think that
         | will flesh out the picture far better. If there's enough
         | interest, I'm happy to collaborate and could potentially do a
         | writeup.
         | 
         | There are always some tradeoffs that come with portability, but
         | part of my goal with gpu.cpp is to create a scaffold to
         | experiment and see how far we can push portable GPU
         | performance.
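         | 
         | To make "basic tiling" concrete, here is a minimal CPU-side
         | sketch of a tiled (blocked) matmul. It is an illustration
         | only, not the WGSL kernel from the examples directory; the
         | name `matmulTiled` and the default tile size are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Computes C = A * B for row-major matrices: A is M x K, B is K x N.
// The three outer loops walk over tiles; the three inner loops do the
// multiply-accumulate inside one tile so the working set stays small.
std::vector<float> matmulTiled(const std::vector<float>& a,
                               const std::vector<float>& b,
                               std::size_t M, std::size_t K, std::size_t N,
                               std::size_t tile = 16) {
  assert(tile > 0 && a.size() == M * K && b.size() == K * N);
  std::vector<float> c(M * N, 0.0f);
  for (std::size_t i0 = 0; i0 < M; i0 += tile) {
    for (std::size_t j0 = 0; j0 < N; j0 += tile) {
      for (std::size_t k0 = 0; k0 < K; k0 += tile) {
        // Clamp tile bounds so sizes need not be multiples of `tile`.
        const std::size_t iEnd = std::min(i0 + tile, M);
        const std::size_t jEnd = std::min(j0 + tile, N);
        const std::size_t kEnd = std::min(k0 + tile, K);
        for (std::size_t i = i0; i < iEnd; ++i) {
          for (std::size_t k = k0; k < kEnd; ++k) {
            const float aik = a[i * K + k];
            for (std::size_t j = j0; j < jEnd; ++j) {
              c[i * N + j] += aik * b[k * N + j];
            }
          }
        }
      }
    }
  }
  return c;
}
```

         | On a GPU the same idea usually maps a tile to a workgroup and
         | stages it in workgroup-shared memory, but that part is not
         | shown here.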
        
         | koolala wrote:
         | WebGPU is slower than WebGL2 on the GPU but faster on the CPU.
        
       | byefruit wrote:
       | This looks great. Is there an equivalent project in rust?
        
         | LegNeato wrote:
         | https://github.com/charles-r-earp/krnl, and more broadly
         | https://github.com/EmbarkStudios/rust-gpu.
        
           | byefruit wrote:
           | Thank you!
        
       | 01HNNWZ0MV43FF wrote:
       | > The only library dependency of gpu.cpp is a WebGPU
       | implementation.
       | 
       | Noo
        
         | sieste wrote:
         | What's the problem?
        
         | austinvhuang wrote:
         | I understand what you mean. We tried to make it as painless as
         | possible by providing a downloadable prebuilt shared library so
         | users don't need to know the pain of building dawn from
         | scratch. It's just a few seconds to download the first time and
         | after that you just link instantaneously.
         | 
         | For those that really do want to build end-to-end, there are
         | community efforts (which I've leaned on) that make dawn builds
         | much more palatable which I link at the bottom of the README.
         | 
         | We'll need to kick the tires to see if anyone reports ABI
         | issues (I had more testing to do before announcing the project
         | but this thread came early). I _really_ want the Google Dawn
         | team to ship a shared library though so we in the community
         | don't have to roll our own.
        
           | thrtythreeforty wrote:
           | I know you said elsewhere in this thread that you want to
           | focus on a single WebGPU runtime for the moment, but I just
           | want to plug how easy it is to build wgpu even as a submodule
           | of a C++ project. I had a demo integrated into my project in
           | less than an hour of tinkering with CMake.
        
             | austinvhuang wrote:
             | Yes wgpu is a much lighter build and has a lot going for
             | it.
             | 
             | The situation has gotten a lot better for both dawn and
             | wgpu integration in C++ with:
             | 
             | https://github.com/eliemichel/WebGPU-distribution/
             | 
             | Getting a shared library build was a revelation though,
             | credit to:
             | 
             | https://github.com/jspanchu/webgpu-dawn-binaries
             | 
             | because the FetchContent cache invalidations would still
             | periodically lead to recompiling which gets quite annoying.
             | When it's just a matter of linking you get few-second
             | builds consistently. The cost is we'll have a bit of
             | hardening to do around potential ABI bugs but it's
             | ultimately worth it.
             | 
             | We'll work towards wgpu support. There are some sharp edges
             | in the non-overlap w/ dawn which seem most pronounced with
             | the async handling (which is pretty critical), but I don't
             | think anything is a hard blocker.
        
       | uLogMicheal wrote:
       | This is awesome! Was looking at creating something similar,
       | inspired by the miniaudio approach. Will likely contribute a Dart
       | wrapper soon.
        
         | austinvhuang wrote:
         | Thanks! If there are binding projects, feel free to get in
         | touch so we can link them + trade notes.
        
       | almostgotcaught wrote:
       | TIL you can run the WebGPU runtime without a browser.
        
         | summarity wrote:
         | For me that's its most promising feature. At last a truly cross
         | platform compute library (not this, WebGPU itself). With two
         | complete and mature implementations no less (dawn and wgpu).
        
           | binary132 wrote:
           | I do not think of dawn or wgpu as complete and mature, has
           | something changed?
        
             | moffkalast wrote:
             | Yeah does Firefox support it yet in stable, or are they
             | still a solid year behind Chrome as usual?
        
               | rahkiin wrote:
               | WebGPU is interesting outside the browser: both dawn and
               | wgpu-rs can be used as a cross-platform native GPU layer.
               | That does not depend on Firefox having WebGPU support.
        
         | austinvhuang wrote:
         | You're not alone.
         | 
         | I've had hour-long conversations explaining the project,
         | talking about how webgpu can be used natively, how rust and zig
         | people are using webgpu as their main GPU API (with wgpu and
         | mach), and at the end there are still clarification questions
         | about differences from WebGL and WASM.
         | 
         | The phrase "native webgpu" might as well be a Stroop Effect
         | prank in technology branding.
        
       | 0xf00ff00f wrote:
       | This is cool, but they should have just used Vulkan. Dawn is a
       | massive dependency (and a PITA to build, in my experience) to get
       | what's basically a wrapper around Vulkan. Vulkan has a reputation
       | for being difficult to work with, but if you just want to use a
       | compute queue it's not that horrible. Also, since Vulkan uses
       | SPIR-V, the user would have more choices for shading languages.
       | Additionally, with RenderDoc you get source-level shader
       | debugging.
       | 
       | Shameless plug: in case anyone wants to see what doing just
       | compute with Vulkan looks like, I wrote a similar library to
       | compete on SHAllenge [0], which was posted here on HN a few days
       | ago. My library is here:
       | https://github.com/0xf00ff00f/vulkan-compute-playground/
       | 
       | [0] https://shallenge.quirino.net/
        
         | rahkiin wrote:
         | Your suggestion would not work on mac or ios
        
           | rice7th wrote:
           | Moltenvk is a great solution
        
         | austinvhuang wrote:
         | Vulkan is definitely a valid angle and I seriously considered
         | it as well. There's a few things that, in aggregate, led me to
         | explore a different direction:
         | 
         | First, there are already a few teams taking a stab at the
         | vulkan approach like kompute, so it's not like that's uncovered
         | territory. At the same time, when I first looked into this, the
         | khronos/apple drama + complaints about moltenvk didn't seem
         | encouraging, but I'd be happy to hear if the situation is a lot
         | better.
         | 
         | Second, even though it's not the initial focus, the possibility
         | of browser targets is interesting.
         | 
         | Finally, there's not much in the fairly minimalist gpu.cpp
         | design that couldn't be retargeted to a vulkan backend at some
         | point in the future if it becomes clear that (eg w/ the right
         | combination of vulkan-specific extensions) the performance
         | differential is sufficient to justify the higher implementation
         | complexity and the metal/vulkan tug of war issues are a thing
         | of the past.
         | 
         | Ultimately there's much less happening with webgpu and the
         | things that are happening tend to be in the ml inference infra
         | rather than libraries. It seemed to be a point in the design
         | space worth exploring.
         | 
         | Regarding Dawn - I've lived where you're coming from. Some
         | non-trivial amount of effort went into smoothing out the
         | friction.
         | First, if you look at the bottom of the repo README you'll see
         | others have done a lot to make building easier - fetchcontent
         | with Elie's repo worked on the first try, but w/ gpu.cpp users
         | shouldn't even have to deal with that if they don't want to.
         | The reason there's a small script that takes the few seconds to
         | fetch a prebuilt shared library on the first build is so that
         | you can avoid the dawn build by default. After that it should
         | be almost instantaneous to link and compile cycles should be a
         | second or two.
         | 
         | But as I mention elsewhere in these threads, if the Dawn team
         | shipped prebuilt shared libraries themselves, that would be an
         | even better solution (if anyone at Google is reading this)!
        
       | austinvhuang wrote:
       | Hi, author here! Agh I was intending for the project to fly under
       | the radar for a few more days before making the announcement and
       | blog post (please look/upvote that when you see it haha :)
       | 
       | But since this is starting I'm happy to chat. Nice to see the
       | interest here!
        
         | JackYoustra wrote:
         | Thoughts on this vs wgpu (and the associated projects)?
        
           | austinvhuang wrote:
           | wgpu is an implementation of the WebGPU API, so it's
           | basically an alternative to Dawn.
           | 
           | gpu.cpp is one level up - it's implemented using the WebGPU
           | API, not an implementation of the WebGPU API. In theory it
           | should work with both wgpu and dawn but in practice you find
           | there are enough differences that it takes some conditional
           | branching + testing to support both.
           | 
           | Having both wgpu and dawn support would be nice and I think
           | we'll get there in the coming months but for faster early
           | iteration I wanted to keep things simple for now. There's
           | implementation + maintenance + testing overhead that you
           | start to have to carry around so it isn't free.
        
       | jph00 wrote:
       | We just published an article introducing gpu.cpp, what it's for,
       | and how it works:
       | 
       | https://www.answer.ai/posts/2024-07-11--gpu-cpp.html
        
       | captaincrowbar wrote:
       | This looks useful but I'm worried about portability. Are there
       | any plans for native Windows support?
        
         | austinvhuang wrote:
         | Windows should work since WebGPU can target DirectX or Vulkan
         | and it should be possible to build in WSL.
         | 
         | However I was planning to announce next week after I've had a
         | chance to test with my Windows-using colleagues and this thread
         | came early, so it's possible we'll run into some hiccups.
         | 
         | Meet us on discord here if anyone needs help or just wants to
         | say hello - https://discord.gg/Q9PWDckbnR
        
           | kookamamie wrote:
           | I would say most people would not consider WSL to be
           | "Windows".
        
           | captaincrowbar wrote:
           | Put it this way: Can I build an executable using this, that I
           | could confidently give to a Windows user who has never heard
           | of WSL?
        
             | austinvhuang wrote:
             | Fair enough - I don't think there are any hard blockers to
             | doing this, but to get the same QoL we'll want to add a
             | dawn dll to the available prebuilt binaries and adjust the
             | download script.
             | 
             | Will look into this in the coming weeks (or if anyone is up
             | for contributing let us know).
        
       | apatheticonion wrote:
       | Oh nice! Would love to see a Rust crate wrapping bindings for
       | this
        
         | austinvhuang wrote:
         | Thanks!
         | 
         | If anyone adds bindings let us know so we can link it in the
         | readme.
        
       | kookamamie wrote:
       | Portable, as in Windows native is not supported?
        
       | coffeeaddict1 wrote:
       | Is this intended to integrate well in an existing WebGPU project?
        
         | austinvhuang wrote:
         | Part of the goal is not to get in the way if there are other
         | aspects of a project that talk to WebGPU directly. If you're
         | already using WebGPU the correspondence should be pretty
         | familiar if you look at the `gpu.h` source. We specifically
         | avoided extra layers of indirection so that you can mix in
         | direct calls against the WebGPU API when needed.
        
       | soci wrote:
       | I watched the video mentioned in the post [1], but now I'm more
       | confused than before...
       | 
       | What are the benefits, if any, of using gpu.cpp instead of just
       | webgpu.h (webgpu native) directly? Maybe each is tailored for
       | different use cases?
       | 
       | [1] https://youtu.be/qHrx41aOTUQ?si=CehJnYQWCg3XklHj
        
         | austinvhuang wrote:
         | The raw WebGPU API is geared towards infrastructure type of
         | usage, eg ML compilers, game engines, etc and is pretty verbose
         | for application and research use cases.
         | 
         | Under examples/, for pedagogical purposes + to help contributors
         | understand what happens with WebGPU under the hood, I actually
         | included an example of invoking the same GELU kernel as in the
         | hello world example without gpu.cpp. It looks like this, is
         | ~ 400+ LoC, and will also take several minutes to build Dawn:
         | 
         | https://github.com/AnswerDotAI/gpu.cpp/blob/main/examples/we...
         | 
         | A goal of gpu.cpp is to make the power of webgpu much less
         | painful to integrate into a project without having to jump
         | through as many hoops (+ also sets up the prebuilt shared
         | library so builds are instantaneous and painless instead of
         | reams of cmake hassles + 5-10 minutes of waiting for dawn to
         | build):
         | 
         | https://github.com/AnswerDotAI/gpu.cpp/blob/main/examples/he...
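         | 
         | For checking either version's output, a small CPU reference
         | for GELU can help. This is a sketch that assumes the common
         | tanh approximation of GELU; the kernel in the repo may differ
         | in detail, and `geluRef` / `geluForward` are hypothetical
         | names, not part of gpu.cpp.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// tanh approximation of GELU:
//   gelu(x) = 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
float geluRef(float x) {
  const float kSqrt2OverPi = 0.7978845608f;
  const float cubic = 0.044715f * x * x * x;
  return 0.5f * x * (1.0f + std::tanh(kSqrt2OverPi * (x + cubic)));
}

// Applies geluRef element-wise. `out` must already have the input's
// size, mirroring a kernel that writes into a preallocated buffer.
void geluForward(const std::vector<float>& in, std::vector<float>& out) {
  assert(in.size() == out.size());
  for (std::size_t i = 0; i < in.size(); ++i) {
    out[i] = geluRef(in[i]);
  }
}
```

         | Comparing GPU results against a reference like this within a
         | small tolerance is usually safer than expecting bit-exact
         | floats.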
        
       ___________________________________________________________________
       (page generated 2024-07-14 23:02 UTC)