[HN Gopher] Gpu.cpp: A lightweight library for portable low-leve...
___________________________________________________________________
Gpu.cpp: A lightweight library for portable low-level GPU
computation
Author : bovem
Score : 218 points
Date : 2024-07-13 06:12 UTC (1 days ago)
(HTM) web link (www.answer.ai)
(TXT) w3m dump (www.answer.ai)
| pavlov wrote:
| Lovely! I like how the API is in a single header file that you
| can read through and understand in one sitting.
|
| I've worked with OpenGL and Direct3D and Metal in the past, but
| the pure compute side of GPUs is mostly foreign to me. Learning
| CUDA always felt like a big time investment when I never had an
| obvious need at hand.
|
| So I'm definitely going to play with the library and try to get up to
| speed. Thanks for publishing it.
| austinvhuang wrote:
| Thanks very much!
|
| You're probably better prepared than you think. The funny thing
| is after working on making compute workflows work with graphics
| APIs like vulkan and webgpu, CUDA is so user friendly by
| comparison :)
|
| Feel free to say hi or ping us if you run into issues in the
| discord channel https://discord.gg/Q9PWDckbnR
| Arech wrote:
| Very interesting... I wonder, how does code performance compare
| to raw Vulkan?
| austinvhuang wrote:
| See https://news.ycombinator.com/item?id=40952182#40957959
|
| It's early, but my current thinking is that since WGSL -> SPIR-V is
| a fairly shallow mapping, you should be able to get close, modulo
| extensions.
| Extensions can be important though, in particular I'm tracking
| this closely:
|
| https://github.com/gpuweb/gpuweb/issues/4195
|
| One subgoal of gpu.cpp is to be able to have a canvas to
| experiment and see how far we can push the limits.
| hpen wrote:
| Any performance metrics vs Vulkan, metal, etc?
| mpreda wrote:
| vs OpenCL, ROCm, CUDA?
| zamadatix wrote:
| Since this library ends up acting as a layer on top of the
| listed specifications it'd be more applicable to see
| benchmarks comparing the performance to building on top of
| said specifications directly to get an idea of overhead. At
| that point you could layer existing generic comparisons for
| the specifications you listed (or anything else for that
| matter) instead of needing them all to be redone specifically
| with this in mind.
| austinvhuang wrote:
| The data that is out there is reasonably promising with WebGPU
| already in use in some production ML inference engines. TVM of
| course is way ahead of the curve as usual -
| https://tvm.apache.org/2020/05/14/compiling-machine-learning...
| though this post is quite old now.
|
| It's still early days for pushing compute use cases to WebGPU
| (OctoML being super early notwithstanding). There's a small
| matmul in the examples directory but it only has the most basic
| tiling optimizations. One of my goals the next few weeks is
| porting the transformer block kernels from llm.c - I think that
| will flesh out the picture far better. If there's enough
| interest, happy to collaborate + could potentially do a
| writeup.
|
| There are always some tradeoffs that come with portability, but
| part of my goal with gpu.cpp is to create a scaffold to
| experiment and see how far we can push portable GPU
| performance.
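|
| For illustration only (not from the thread or the gpu.cpp repo): a
| minimal CPU-side sketch of what "basic tiling" means for matmul. The
| function name `matmulTiled` and the default tile size are
| hypothetical. A GPU kernel applies the same blocking idea, with
| workgroup shared memory playing the role of the CPU cache.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of gpu.cpp.
// Computes C = A * B for row-major matrices: A is M x K, B is K x N,
// and the returned C is M x N.
// The iteration space is split into tile x tile blocks so that each
// block of A and B is reused while it is still resident in fast memory.
std::vector<float> matmulTiled(const std::vector<float>& A,
                               const std::vector<float>& B,
                               std::size_t M, std::size_t K, std::size_t N,
                               std::size_t tile = 16) {
  assert(A.size() == M * K);
  assert(B.size() == K * N);
  assert(tile > 0);

  std::vector<float> C(M * N, 0.0f);

  // Outer loops walk over tiles; std::min clamps the last partial tile.
  for (std::size_t i0 = 0; i0 < M; i0 += tile) {
    const std::size_t iEnd = std::min(i0 + tile, M);
    for (std::size_t k0 = 0; k0 < K; k0 += tile) {
      const std::size_t kEnd = std::min(k0 + tile, K);
      for (std::size_t j0 = 0; j0 < N; j0 += tile) {
        const std::size_t jEnd = std::min(j0 + tile, N);

        // Inner loops accumulate one tile's contribution into C.
        for (std::size_t i = i0; i < iEnd; ++i) {
          for (std::size_t k = k0; k < kEnd; ++k) {
            const float a = A[i * K + k];
            for (std::size_t j = j0; j < jEnd; ++j) {
              C[i * N + j] += a * B[k * N + j];
            }
          }
        }
      }
    }
  }
  return C;
}
```

| Calling `matmulTiled(A, B, 2, 3, 2, 2)` on a 2x3 and a 3x2 matrix
| exercises both full and partial tiles. The result matches an untiled
| triple loop; only the memory access order changes.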
| koolala wrote:
| WebGPU is slower than WebGL2 on the GPU but faster on the CPU.
| byefruit wrote:
| This looks great. Is there an equivalent project in rust?
| LegNeato wrote:
| https://github.com/charles-r-earp/krnl, and more broadly
| https://github.com/EmbarkStudios/rust-gpu.
| byefruit wrote:
| Thank you!
| 01HNNWZ0MV43FF wrote:
| > The only library dependency of gpu.cpp is a WebGPU
| implementation.
|
| Noo
| sieste wrote:
| What's the problem?
| austinvhuang wrote:
| I understand what you mean. We tried to make it as painless as
| possible by providing a downloadable prebuilt shared library so
| users don't need to know the pain of building dawn from
| scratch. It's just a few seconds to download the first time and
| after that you just link instantaneously
|
| For those that really do want to build end-to-end, there are
| community efforts (which I've leaned on) that make dawn builds
| much more palatable which I link at the bottom of the README.
|
| We'll need to kick the tires to see if anyone reports ABI
| issues (I had more testing to do before announcing the project
| but this thread came early). I _really_ want the Google Dawn
| team to ship a shared library though so we in the community
| don't have to roll our own.
| thrtythreeforty wrote:
| I know you said elsewhere in this thread that you want to
| focus on a single WebGPU runtime for the moment, but I just
| want to plug how easy it is to build wgpu even as a submodule
| of a C++ project. I had a demo integrated into my project in
| less than an hour of tinkering with CMake.
| austinvhuang wrote:
| Yes wgpu is a much lighter build and has a lot going for
| it.
|
| The situation has gotten a lot better for both dawn and
| wgpu integration in C++ with:
|
| https://github.com/eliemichel/WebGPU-distribution/
|
| Getting a shared library build was a revelation though,
| credit to:
|
| https://github.com/jspanchu/webgpu-dawn-binaries
|
| because the FetchContent cache invalidations would still
| periodically lead to recompiling which gets quite annoying.
| When it's just a matter of linking you get few-second
| builds consistently. The cost is we'll have a bit of
| hardening around potential ABI bugs but it's ultimately
| worth it.
|
| We'll work towards wgpu support. There's some sharp edges
| in the non-overlap w/ dawn which seem most pronounced with
| the async handling (which is pretty critical), but I don't
| think anything is a hard blocker.
| uLogMicheal wrote:
| This is awesome! Was looking at creating something similar, inspired by the
| miniaudio approach. Will likely contribute a dart wrapper soon.
| austinvhuang wrote:
| Thanks! If there are binding projects, feel free to get in
| touch so we can link them + trade notes.
| almostgotcaught wrote:
| TIL you can run the WebGPU runtime without a browser.
| summarity wrote:
| For me that's its most promising feature. At last a truly cross
| platform compute library (not this, WebGPU itself). With two
| complete and mature implementations no less (dawn and wgpu).
| binary132 wrote:
| I do not think of dawn or wgpu as complete and mature, has
| something changed?
| moffkalast wrote:
| Yeah does Firefox support it yet in stable, or are they
| still a solid year behind Chrome as usual?
| rahkiin wrote:
| WebGPU is interesting outside the browser: both dawn and
| wgpu-rs can be used as a cross-platform native GPU layer.
| That does not depend on firefox having webgpu support
| austinvhuang wrote:
| You're not alone.
|
| I've had hour-long conversations explaining the project, talking
| about how webgpu can be used natively, how rust and zig people
| are using webgpu as their main GPU API (with wgpu and mach), and at
| the end there are still clarification questions about differences
| from WebGL and WASM.
|
| The phrase "native webgpu" might as well be a Stroop Effect
| prank in technology branding.
| 0xf00ff00f wrote:
| This is cool, but they should have just used Vulkan. Dawn is a
| massive dependency (and a PITA to build, in my experience) to get
| what's basically a wrapper around Vulkan. Vulkan has a reputation
| for being difficult to work with, but if you just want to use a
| compute queue it's not that horrible. Also, since Vulkan uses
| SPIR-V, the user would have more choices for shading languages.
| Additionally, with RenderDoc you get source-level shader
| debugging.
|
| Shameless plug: in case anyone wants to see what doing just
| compute with Vulkan looks like, I wrote a similar library to
| compete on SHAllenge [0], which was posted here on HN a few days
| ago. My library is here:
| https://github.com/0xf00ff00f/vulkan-compute-playground/
|
| [0] https://shallenge.quirino.net/
| rahkiin wrote:
| Your suggestion would not work on mac or ios
| rice7th wrote:
| Moltenvk is a great solution
| austinvhuang wrote:
| Vulkan is definitely a valid angle and I seriously considered
| it as well. There's a few things that, in aggregate, led me to
| explore a different direction:
|
| First, there's already a few teams taking a stab at the vulkan
| approach like kompute, so it's not like that's uncovered
| territory. At the time I first looked into this, the
| khronos/apple drama + complaints about moltenvk didn't seem
| encouraging, but I'd be happy to hear if the situation is a lot
| better.
|
| Second, even though it's not the initial focus, the possibility
| of browser targets is interesting.
|
| Finally, there's not much in the fairly minimalist gpu.cpp
| design that couldn't be retargeted to a vulkan backend at some
| point in the future if it becomes clear that (eg w/ the right
| combination of vulkan-specific extensions) the performance
| differential is sufficient to justify the higher implementation
| complexity and the metal/vulkan tug of war issues are a thing
| of the past.
|
| Ultimately there's much less happening with webgpu and the
| things that are happening tend to be in the ml inference infra
| rather than libraries. It seemed to be a point in the design
| space worth exploring.
|
| Regarding Dawn - I've lived where you're coming from. Some
| non-trivial amount of effort went into smoothing out the friction.
| First, if you look at the bottom of the repo README you'll see
| others have done a lot to make building easier - fetchcontent
| with Elie's repo worked on the first try, but w/ gpu.cpp users
| shouldn't even have to deal with that if they don't want to.
| The reason there's a small script that takes the few seconds to
| fetch a prebuilt shared library on the first build is so that
| you can avoid the dawn build by default. After that it should
| be almost instantaneous to link and compile cycles should be a
| second or two.
|
| But as I mention elsewhere in these threads, if the Dawn team
| shipped prebuilt shared libraries themselves, that would be an
| even better solution (if anyone at Google is reading this)!
| austinvhuang wrote:
| Hi, author here! Agh I was intending for the project to fly under
| the radar for a few more days before making the announcement and
| blog post (please look/upvote that when you see it haha :)
|
| But since this is starting I'm happy to chat. Nice to see the
| interest here!
| JackYoustra wrote:
| Thoughts on this vs wgpu (and the associated projects)?
| austinvhuang wrote:
| wgpu is an implementation of the WebGPU API, so it's
| basically an alternative to Dawn.
|
| gpu.cpp is one level up - it's implemented using the WebGPU
| API, not an implementation of the WebGPU API. In theory it
| should work with both wgpu and dawn, but in practice you find
| there are enough differences that it takes some conditional
| branching + testing to support both.
|
| Having both wgpu and dawn support would be nice and I think
| we'll get there in the coming months but for faster early
| iteration I wanted to keep things simple for now. There's
| implementation + maintenance + testing overhead that you
| start to have to carry around so it isn't free.
| jph00 wrote:
| We just published an article introducing gpu.cpp, what it's for,
| and how it works:
|
| https://www.answer.ai/posts/2024-07-11--gpu-cpp.html
| captaincrowbar wrote:
| This looks useful but I'm worried about portability. Are there
| any plans for native Windows support?
| austinvhuang wrote:
| Windows should work since WebGPU can target DirectX or Vulkan
| and it should be possible to build in WSL.
|
| However I was planning to announce next week after I've had a
| chance to test with my Windows-using colleagues and this thread
| came early, so it's possible we'll run into some hiccups.
|
| Meet us on discord here if anyone needs help or just wants to
| say hello - https://discord.gg/Q9PWDckbnR
| kookamamie wrote:
| I would say most people would not consider WSL to be
| "Windows".
| captaincrowbar wrote:
| Put it this way: Can I build an executable using this, that I
| could confidently give to a Windows user who has never heard
| of WSL?
| austinvhuang wrote:
| Fair enough - I don't think there are any hard blockers to
| doing this, but to get the same QoL we'll want to add a
| dawn dll to the available prebuilt binaries and adjust the
| download script.
|
| Will look into this in the coming weeks (or if anyone is up
| for contributing let us know).
| apatheticonion wrote:
| Oh nice! Would love to see a Rust crate wrapping bindings for
| this
| austinvhuang wrote:
| Thanks!
|
| If anyone adds bindings let us know so we can link it in the
| readme.
| kookamamie wrote:
| Portable, as in Windows native is not supported?
| coffeeaddict1 wrote:
| Is this intended to integrate well in an existing WebGPU project?
| austinvhuang wrote:
| Part of the goal is not to get in the way if there's other
| aspects of a project that talk to WebGPU directly. If you're
| already using WebGPU the correspondence should be pretty
| familiar if you look at the `gpu.h` source. We specifically
| avoided extra layers of indirection so that you can mix in
| direct calls against the WebGPU API when needed.
| soci wrote:
| I watched the video mentioned in the post [1], but now I'm more
| confused than before...
|
| What are the benefits, if any, of using gpu.cpp instead of just
| webgpu.h (webgpu native) directly? Maybe each is tailored for
| different use cases?
|
| [1] https://youtu.be/qHrx41aOTUQ?si=CehJnYQWCg3XklHj
| austinvhuang wrote:
| The raw WebGPU API is geared towards infrastructure type of
| usage, eg ML compilers, game engines, etc and is pretty verbose
| for application and research use cases.
|
| Under examples/, for pedagogical purposes and to help contributors
| understand what happens with WebGPU under the hood, I actually
| included an example of invoking the same GELU kernel as in the
| hello world example without gpu.cpp. It is 400+ LoC and also
| takes several minutes to build Dawn:
|
| https://github.com/AnswerDotAI/gpu.cpp/blob/main/examples/we...
|
| A goal of gpu.cpp is to make the power of webgpu much less
| painful to integrate into a project without having to jump
| through as many hoops (+ also sets up the prebuilt shared
| library so builds are instantaneous and painless instead of
| reams of cmake hassles + 5-10 minutes of waiting for dawn to
| build):
|
| https://github.com/AnswerDotAI/gpu.cpp/blob/main/examples/he...
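|
| For readers unfamiliar with the GELU kernel mentioned above, here is
| a CPU reference sketch of the elementwise computation. It assumes the
| widely used tanh approximation; the thread does not say which variant
| the example uses, and `geluReference` is a hypothetical name, not a
| gpu.cpp function. A reference like this can be handy for checking GPU
| output against expected values.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference, not part of gpu.cpp.
// Applies the tanh approximation of GELU to every element:
//   gelu(x) = 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
std::vector<float> geluReference(const std::vector<float>& input) {
  const float kPi = 3.14159265358979f;
  const float kScale = std::sqrt(2.0f / kPi);

  std::vector<float> output(input.size());
  for (std::size_t i = 0; i < input.size(); ++i) {
    const float x = input[i];
    const float inner = kScale * (x + 0.044715f * x * x * x);
    output[i] = 0.5f * x * (1.0f + std::tanh(inner));
  }
  return output;
}
```

| Each output element depends only on the matching input element, which
| is why the same computation maps naturally onto one GPU thread per
| element.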
___________________________________________________________________
(page generated 2024-07-14 23:02 UTC)