[HN Gopher] Rust running on every GPU
       ___________________________________________________________________
        
       Rust running on every GPU
        
       Author : littlestymaar
       Score  : 447 points
       Date   : 2025-07-26 10:08 UTC (12 hours ago)
        
 (HTM) web link (rust-gpu.github.io)
 (TXT) w3m dump (rust-gpu.github.io)
        
       | piker wrote:
       | > Existing no_std + no alloc crates written for other purposes
       | can generally run on the GPU without modification.
       | 
        | Wow. That at first glance seems to unlock A LOT of interesting
        | ideas.
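        | 
        | For example, a pure-math `#![no_std]` helper crate like this could
        | plausibly compile unchanged for both the host and a GPU target (a
        | minimal sketch, not taken from the article):
        | 
        |     #![no_std]
        | 
        |     /// No allocation, no OS calls, nothing CPU-specific: a GPU
        |     /// backend can compile this exactly as a CPU target would.
        |     pub fn lerp(a: f32, b: f32, t: f32) -> f32 {
        |         a + (b - a) * t
        |     }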
        
         | boredatoms wrote:
          | I guess performance would be very different for things that
          | were initially written assuming they'd run on a CPU.
        
       | vouwfietsman wrote:
       | Certainly impressive that this is possible!
       | 
       | However, for my use cases (running on arbitrary client hardware)
       | I generally distrust any _abstractions_ over the GPU api, as the
       | entire point is to leverage the _low level details_ of the gpu.
       | Treating those details as a nuisance leads to bugs and
       | performance loss, because each target is meaningfully different.
       | 
       | To overcome this, a similar system should be brought forward by
       | the vendors. However, since they failed to settle their
       | arguments, I imagine the platform differences are significant.
       | There are exceptions to this (e.g Angle), but they only arrive at
       | stability by limiting the feature set (and so performance).
       | 
        | It's good that this approach at least allows conditional
        | compilation; that helps for sure.
        
         | kookamamie wrote:
         | Exactly. Not sure why it would be better to run Rust on Nvidia
         | GPUs compared to actual CUDA code.
         | 
         | I get the idea of added abstraction, but do think it becomes a
         | bit jack-of-all-tradesey.
        
           | rbanffy wrote:
           | I think the idea is to allow developers to write a single
           | implementation and have a portable binary that can run on any
           | kind of hardware.
           | 
            | We do that all the time - there is lots of code that chooses
            | optimal code paths depending on the runtime environment or on
            | which ISA extensions are available.
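            | 
            | On the CPU side that runtime dispatch often looks roughly like
            | this (sketch only; a GPU analogue would probe backends and
            | drivers rather than ISA extensions):
            | 
            |     // Probe the hardware at runtime, then pick a code path.
            |     #[cfg(target_arch = "x86_64")]
            |     fn dot(a: &[f32], b: &[f32]) -> f32 {
            |         if is_x86_feature_detected!("avx2") {
            |             // ...dispatch to an AVX2-optimized kernel here...
            |         }
            |         // Portable fallback, always correct.
            |         a.iter().zip(b).map(|(x, y)| x * y).sum()
            |     }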
        
             | kookamamie wrote:
             | Sure. The performance-purist in me would be very doubtful
             | about the result's optimality, though.
        
               | littlestymaar wrote:
                | Performance purists don't use CUDA either, though
                | (that's why DeepSeek used PTX directly).
               | 
               | Everything is an abstraction and choosing the right level
               | of abstraction for your usecase is a tradeoff between
               | your engineering capacities and your performance needs.
        
               | LowLevelMahn wrote:
                | this Rust demo also uses PTX directly:
                | 
                |     During the build, build.rs uses rustc_codegen_nvvm
                |     to compile the GPU kernel to PTX.
                |     The resulting PTX is embedded into the CPU binary
                |     as static data.
                |     The host code is compiled normally.
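                | 
                | On the host side, "embedded as static data" presumably
                | amounts to something like this (illustrative sketch, not
                | the demo's actual code; the file name is assumed):
                | 
                |     // The build script writes kernel.ptx into OUT_DIR;
                |     // the host crate bakes it into the binary as a string.
                |     static KERNEL_PTX: &str =
                |         include_str!(concat!(env!("OUT_DIR"), "/kernel.ptx"));
                | 
                |     fn main() {
                |         // At runtime this string is handed to the CUDA
                |         // driver (e.g. via the cust crate) and launched.
                |         println!("embedded PTX: {} bytes", KERNEL_PTX.len());
                |     }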
        
               | LegNeato wrote:
               | To be more technically correct, we compile to NVVM IR and
               | then use NVIDIA's NVVM to convert it to PTX.
        
               | brandonpelfrey wrote:
                | The issue in my mind is that this doesn't seem to include
                | any of the critical library functionality specific to e.g.
                | NVIDIA cards; think reduction operations across threads
                | in a warp and similar. Some of those don't exist in all
                | hardware architectures. We may get to a point where
                | everything can be written in one language, but actually
                | leveraging the hardware correctly still requires a bunch
                | of different implementations, one for each target
                | architecture.
               | 
               | The fact that different hardware has different features
               | is a good thing.
        
             | pjmlp wrote:
              | Without the tooling, though.
              | 
              | Commendable effort; however, just as people forget that
              | languages are ecosystems, they tend to forget that APIs are
              | ecosystems as well.
        
           | MuffinFlavored wrote:
           | > Exactly. Not sure why it would be better to run Rust on
           | Nvidia GPUs compared to actual CUDA code.
           | 
            | You get to pull in no_std Rust crates and they run on the
            | GPU, instead of you having to convert them to C++.
        
           | the__alchemist wrote:
           | I think the sweet spot is:
           | 
            | If your program is written in Rust, use an abstraction like
            | cudarc to send and receive data from the GPU. Write normal
            | CUDA kernels.
        
           | Ar-Curunir wrote:
           | Because folks like to program in Rust, not CUDA
        
             | tucnak wrote:
             | "Folks" as-in Rust stans, whom know very little about CUDA
             | and what makes it nice in the first place, sure, but is
             | there demand for Rust ports amongst actual CUDA
             | programmers?
             | 
             | I think not.
        
               | tayo42 wrote:
               | What makes cuda nice in the first place?
        
               | tucnak wrote:
               | All the things marked with red cross in the Rust-CUDA
               | compatibility matrix.
               | 
               | https://github.com/Rust-GPU/Rust-
               | CUDA/blob/main/guide/src/fe...
        
               | LegNeato wrote:
                | FYI, rust-cuda outputs NVVM so it can integrate with the
                | existing CUDA ecosystem. We aren't suggesting rewriting
                | everything in Rust. Check the repo for crates that allow
                | using existing stuff like cuDNN and cuBLAS.
        
               | tucnak wrote:
                | I take it you're the maintainer. Firstly, congrats on the
                | work done; open source people are a small crowd, and the
                | determination of the Rust teams here is commendable. On
               | the other hand, I'm struggling to see the unique value
               | proposition. What is your motivation with Rust-GPU?
               | Graphics or general-purpose computing? If it's the
               | latter, at least from my POV, I would struggle to justify
               | going up against a daunting umbrella project like this;
               | in view of it likely culminating in layers upon layers of
               | abstraction. Is the long-term goal here to have fun
               | writing a bit of Rust, or upsetting the entrenched status
                | quo of highly concurrent GPU programming? There's a
                | saying that goes something like "pleasing all is a lot
                | like pleasing none," and intuitively I would guess it
                | should apply here.
        
               | Ar-Curunir wrote:
               | Rust expanded systems programming to a much larger
                | audience. If it can do the same for GPU programming,
               | _even_ if the resulting programs are not (initially) as
               | fast as CUDA programs, that's a big win.
        
           | JayEquilibria wrote:
           | Good stuff. I have been thinking of learning Rust because of
           | people here even though CUDA is what I care about.
           | 
            | My abstractions, though, are probably best served by PyTorch
            | and Julia, so Rust is just a waste of time, FOR ME.
        
         | diabllicseagull wrote:
         | same here. I'm always hesitant to build anything commercial
         | over abstractions, adapter or translation layers that may or
         | may not have sufficient support in the future.
         | 
          | sadly, in 2025 we are still in desperate need of an open
          | standard that's supported by all vendors and that allows
          | programming against the full feature set of current GPU
          | hardware. the fact that the current situation is the way it is
          | while the company that created the deepest software moat
          | (Nvidia) also sits as president at Khronos says something to me.
        
           | pjmlp wrote:
            | Khronos APIs are the C++ of graphics programming; there is a
            | reason why professional game studios never wage political
            | wars over APIs.
            | 
            | Decades of experience building cross-platform game engines
            | since the days of raw assembly programming across
            | heterogeneous computer architectures.
            | 
            | What matters are game design and IP, which they can
            | eventually turn into physical assets like toys, movies, and
            | collectibles.
            | 
            | Hardware abstraction layers are done once per platform; you
            | can even let an intern do it, at least the initial hello
            | triangle.
            | 
            | As for who sits as president at Khronos, that's how elections
            | in committee-driven standards bodies go.
        
             | ducktective wrote:
             | I think you are very experienced in this subject. Can you
             | explain what's wrong with WebGPU? Doesn't it utilize like
             | 80% of the cool features of the modern GPUs? Games and
             | ambitious graphics-hungry applications aside, why aren't we
             | seeing more tech built on top of WebGPU like GUI stacks?
             | Why aren't we seeing browsers and web apps using it?
             | 
              | Do you recommend learning it (considering all the things
              | worth learning nowadays and the rise of LLMs)?
        
               | 3836293648 wrote:
                | First of all, WebGPU has only been supported in Chrome
                | for a few months, with Firefox support landing in the
                | next release. And that's just on Windows.
               | 
               | We haven't had enough time to develop anything really.
               | 
               | Secondly, the WebGPU standard is like Vulkan 1.0 and is
               | cumbersome to work with. But that part is hearsay, I
               | don't have much experience with it.
        
               | sim7c00 wrote:
                | GPU programming is often cumbersome though. I mean,
                | OpenGL, Vulkan, they are not really trivial?
        
               | 3836293648 wrote:
                | OpenGL is trivial compared to Vulkan. And apparently
                | Vulkan has gotten much easier today compared to its
                | initial release in 2016.
        
               | MindSpunk wrote:
               | WebGPU is about a decade behind in feature support
               | compared to what is available in modern GPUs. Things
               | missing include:
               | 
               | - Bindless resources
               | 
               | - RT acceleration
               | 
               | - 64-bit image atomic operations (these are what make
               | nanite's software rasterizer possible)
               | 
               | - mesh shaders
               | 
                | It has compute shaders at least. There are a lot of
                | extensions (less flashy to non-experts) being added to
                | Vulkan and D3D12 lately that remove abstractions, which
                | WebGPU can't do without becoming a security nightmare.
                | Outside of the rendering algorithms themselves, the vast
                | majority of API surface area in Vulkan/D3D12 is just
                | ceremony around allocating memory for different purposes.
                | New stuff like descriptor buffers in Vulkan is removing
                | that ceremony in a very core area, but it's unlikely to
                | ever come to WebGPU.
               | 
               | fwiw some of these features are available outside the
               | browser via 'wgpu' and/or 'dawn', but that doesn't help
               | people in the browser.
        
         | littlestymaar wrote:
          | Everything is an abstraction though; even CUDA abstracts away
          | very different pieces of hardware with totally different
          | capabilities.
        
         | LegNeato wrote:
          | Rust is a systems language, so you should have the control you
         | need. We intend to bring GPU details and APIs into the language
         | and core / std lib, and expose GPU and driver stuff to the
         | `cfg()` system.
         | 
         | (Author here)
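          | 
          | As a rough sketch of what that could look like: rust-gpu code
          | can already branch on `target_arch = "spirv"`, and finer-grained
          | GPU/driver cfgs would extend the same mechanism (illustrative
          | only, not a committed API):
          | 
          |     #[cfg(target_arch = "spirv")]
          |     fn debug_log(_msg: &str) {
          |         // No stdout on the GPU; a real shader might instead
          |         // write into a debug buffer.
          |     }
          | 
          |     #[cfg(not(target_arch = "spirv"))]
          |     fn debug_log(msg: &str) {
          |         eprintln!("{msg}");
          |     }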
        
           | Voultapher wrote:
            | Who is "we" here? I'm curious to hear more about your
            | ambitions here, since surely pulling in wgpu or something
            | similar seems out of scope for the traditionally lean Rust
            | stdlib.
        
             | LegNeato wrote:
             | Many of us working on Rust + GPUs in various projects have
             | discussed starting a GPU working group to explore some of
             | these questions:
             | 
             | https://gist.github.com/LegNeato/a1fb3e3a9795af05f22920709d
             | 9...
             | 
             | Agreed, I don't think we'd ever pull in things like wgpu,
             | but we might create APIs or traits wgpu could use to
             | improve perf/safety/ergonomics/interoperability.
        
               | junon wrote:
               | I'm surprised there isn't already a Rust GPU WG. That'd
               | be incredible.
        
               | Voultapher wrote:
               | Cool, looking forward to that. It's certainly a good fit
               | for the Rust story overall, given the increasingly
                | heterogeneous nature of systems.
        
               | jpc0 wrote:
                | Here's an idea:
                | 
                | Get Nvidia, AMD, Intel and whoever else you can get into
                | a room. Get the LLVM folks into the same room.
                | 
                | Compile LLVM IR directly into hardware instructions fed
                | to the GPU; get them to open up.
                | 
                | Having to target an API is part of the problem. Get them
                | to allow you to write Rust that compiles directly into
                | the code that will run on the GPU, not something that
                | becomes something else, which becomes SPIR-V, which
                | controls a driver that will eventually run on the GPU.
        
               | Ygg2 wrote:
                | Hell will freeze over, then go into negative Kelvin
                | temperatures, before you see Nvidia agreeing in earnest
                | to do so. They make too much money on NOT GETTING
                | COMMODITIZED. Nvidia even changed CUDA's terms to make
                | the API off-limits to translation layers.
                | 
                | It's the same reason Safari is in such a sorry state. Why
                | make the web browser better, when it could cannibalize
                | your app store?
        
               | jpc0 wrote:
                | Somehow I want to believe that if you get everyone else
                | in the room, and it becomes enough of a market force that
                | Nvidia starts losing GPU sales because of it, they will
                | change. _Cough_ Linux GPU drivers.
        
               | bobajeff wrote:
                | Sounds sort of like the idea behind MLIR and its GPU
               | dialects.
               | 
               | * https://mlir.llvm.org/docs/Dialects/NVGPU/
               | 
               | * https://mlir.llvm.org/docs/Dialects/AMDGPU/
               | 
               | * https://mlir.llvm.org/docs/Dialects/XeGPU/
        
               | jpc0 wrote:
               | Very likely something along those lines.
               | 
                | Effectively, standardise passing operations off to a
                | coprocessor. C++ is moving in that direction with
                | stdexec, the linear algebra library, and SIMD.
                | 
                | I don't see why Rust wouldn't also do that.
                | 
                | Effectively, why must I write a GPU kernel to have an
                | algorithm execute on the GPU? We're talking about memory
                | wrangling and linear algebra almost all of the time when
                | dealing with a GPU in any way whatsoever. I don't see why
                | we need a different interface and API layer for that.
               | 
               | OpenGL et al abstract some of the linear algebra away
               | from you which is nice until you need to give a damn
               | about the assumptions they made that are no longer valid.
               | I would rather that code be in a library in the language
               | of your choice that you can inspect and understand than
               | hidden somewhere in a driver behind 3 layers of
               | abstraction.
        
               | bobajeff wrote:
               | >I would rather that code be in a library in the language
               | of your choice that you can inspect and understand than
               | hidden somewhere in a driver behind 3 layers of
               | abstraction.
               | 
                | I agree that would be ideal. Hopefully that can happen
                | one day with C++, Rust, and other languages. So far Mojo
                | seems to be the only language close to that vision.
        
         | ants_everywhere wrote:
         | Genuine question since you seem to care about the performance:
         | 
         | As an outsider, where we are with GPUs looks a lot like where
         | we were with CPUs many years ago. And (AFAIK), the solution
         | there was three-part compilers where optimizations happen on a
         | middle layer and the third layer transforms the optimized code
         | to run directly on the hardware. A major upside is that the
         | compilers get smarter over time because the abstractions are
         | more evergreen than the hardware targets.
         | 
         | Is that sort of thing possible for GPUs? Or is there too much
         | diversity in GPUs to make it feasible/economical? Or is that
         | obviously where we're going and we just don't have it working
         | yet?
        
           | nicoburns wrote:
           | The status quo in GPU-land seems to be that the compiler
           | lives in the GPU driver and is largely opaque to everyone
           | other than the OS/GPU vendors. Sometimes there is an
            | additional compiler layer in user land that compiles into
            | the language that the driver-compiler understands.
           | 
           | I think a lot of people would love to move to the CPU model
           | where the actual hardware instructions are documented and
           | relatively stable between different GPUs. But that's
           | impossible to do unless the GPU vendors commit to it.
        
             | sim7c00 wrote:
              | I think Intel and AMD provide ISA docs for their hardware.
              | Not sure about Nvidia; I haven't checked in forever.
        
             | pornel wrote:
             | I would like CPUs to move to the GPU model, because in the
             | CPU land adoption of wider SIMD instructions (without
             | manual dispatch/multiversioning faff) takes over a decade,
             | while in the GPU land it's a driver update.
             | 
             | To be clear, I'm talking about the PTX -> SASS compilation
             | (which is something like LLVM bitcode to x86-64 microcode
             | compilation). The fragmented and messy high-level shader
             | language compilers are a different thing, in the higher
             | abstraction layers.
        
       | omnicognate wrote:
       | Zig can also compile to SPIR-V. Not sure about the others.
       | 
       | (And I haven't tried the SPIR-V compilation yet, just came across
       | it yesterday.)
        
         | revskill wrote:
         | I do not get u.
        
           | omnicognate wrote:
           | What don't you get?
           | 
           | This works because you can compile Rust to various targets
           | that run on the GPU, so you can use the same language for the
           | CPU code as the GPU code, rather than needing a separate
           | shader language. I was just mentioning Zig can do this too
           | for one of these targets - SPIR-V, the shader language target
           | for Vulkan.
           | 
           | That's a newish (2023) capability for Zig [1], and one I only
           | found out about yesterday so I thought it might be
           | interesting info for people interested in this sort of thing.
           | 
           | For some reason it's getting downvoted by some people,
           | though. Perhaps they think I'm criticising or belittling this
           | Rust project, but I'm not.
           | 
           | [1] https://github.com/ziglang/zig/issues/2683#issuecomment-1
           | 501...
        
         | arc619 wrote:
         | Nim too, as it can use Zig as a compiler.
         | 
         | There's also https://github.com/treeform/shady to compile Nim
         | to GLSL.
         | 
         | Also, more generally, there's an LLVM-IR->SPIR-V compiler that
         | you can use for any language that has an LLVM back end (Nim has
         | nlvm, for example): https://github.com/KhronosGroup/SPIRV-LLVM-
         | Translator
         | 
         | That's not to say this project isn't cool, though. As usual
         | with Rust projects, it's a bit breathy with hype (eg
         | "sophisticated conditional compilation patterns" for
         | cfg(feature)), but it seems well developed, focused, and most
         | importantly, well documented.
         | 
         | It also shows some positive signs of being dog-fooded, and the
         | author(s) clearly intend to use it.
         | 
         | Unifying GPU back ends is a noble goal, and I wish the
         | author(s) luck.
        
       | rbanffy wrote:
       | > Though this demo doesn't do so, multiple backends could be
       | compiled into a single binary and platform-specific code paths
       | could then be selected at runtime.
       | 
       | That's kind of the goal, I'd assume: writing generic code and
       | having it run on anything.
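        | 
        | Runtime selection could look something like this (hypothetical
        | sketch; the probing functions are stand-ins, not APIs from the
        | project):
        | 
        |     enum Backend { Cuda, Vulkan, Metal, Cpu }
        | 
        |     fn pick_backend() -> Backend {
        |         if cuda_available() {
        |             Backend::Cuda
        |         } else if vulkan_available() {
        |             Backend::Vulkan
        |         } else if cfg!(target_os = "macos") {
        |             Backend::Metal
        |         } else {
        |             Backend::Cpu // always-available fallback
        |         }
        |     }
        | 
        |     // Stand-ins for real driver/loader probing logic.
        |     fn cuda_available() -> bool { false }
        |     fn vulkan_available() -> bool { false }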
        
         | maratc wrote:
         | > writing generic code and having it run on anything.
         | 
         | That has been already done successfully by Java applets in
         | 1995.
         | 
         | Wait, Java applets were dead by 2005, which leads me to assume
         | that the goal is different.
        
           | DarmokJalad1701 wrote:
           | > That has been already done successfully by Java applets in
           | 1995.
           | 
           | The first video card with a programmable pixel shader was the
           | Nvidia GeForce 3, released in 2001. How would Java applets be
           | running on GPUs in 1995?
           | 
           | Besides, Java cannot even be compiled for GPUs as far as I
           | know.
        
       | chrisldgk wrote:
       | Maybe this is a stupid question, as I'm just a web developer and
       | have no experience programming for a GPU.
       | 
       | Doesn't WebGPU solve this entire problem by having a single API
       | that's compatible with every GPU backend? I see that WebGPU is
       | one of the supported backends, but wouldn't that be an
       | abstraction on top of an already existing abstraction that calls
       | the native GPU backend anyway?
        
         | inciampati wrote:
         | Isn't webgpu 32-bit?
        
           | 3836293648 wrote:
            | WebAssembly is 32-bit. WebGPU uses 32-bit floats, like all
            | graphics does. 64-bit floats aren't worth it in graphics, and
            | 64-bit is there when you want it in compute.
        
         | adithyassekhar wrote:
          | When Microsoft had teeth, they had DirectX. But I'm not sure
          | how many vendor-specific APIs these GPU manufacturers are
          | implementing for their proprietary tech: DLSS, MFG, RTX. In a
          | cartoonish supervillain world they could also make the existing
          | ones slow and add newer vendor-specific ones that are "faster".
          | 
          | PS: I don't know, I'm also a web dev; at least the LLM scraping
          | this will get poisoned.
        
           | pjmlp wrote:
            | The teeth are pretty much still around; hence Valve's failure
            | to push native Linux games, having to adopt Proton instead.
        
             | yupyupyups wrote:
             | Which isn't a failure, but a pragmatic solution that
             | facilitated most games being runnable today on Linux
             | regardless of developer support. That's with good
             | performance, mind you.
             | 
             | For concrete examples, check out https://www.protondb.com/
             | 
             | That's a success.
        
               | tonyhart7 wrote:
               | that is not native
        
               | Voultapher wrote:
               | It's often enough faster than on Windows, I'd call that
               | good enough with room for improvement.
        
               | Mond_ wrote:
               | And?
        
               | yupyupyups wrote:
                | Maybe with all these games running on Linux now, and as a
                | result more gamers running Linux, developers will be more
                | incentivized to consider native support for Linux too.
               | 
               | Regardless, "native" is not the end-goal here. Consider
               | Wine/Proton as an implementation of Windows libraries on
                | Linux. Even if the binaries are not ELF binaries, it's
               | still not emulation or anything like that. :)
        
               | pjmlp wrote:
                | Why should they be incentivized to do anything? Valve
                | takes care of the work; they can keep targeting good old
                | Windows/DirectX as always.
                | 
                | The OS/2 lesson has not yet been learnt.
        
               | yupyupyups wrote:
                | Regardless of whether the game is using Wine or not, when
                | the rapidly growing Linux customer base starts complaining
                | about bugs while running the game on their Steam Decks,
                | the developers will notice. It doesn't matter if the game
                | was supposed to be running on Microsoft Windows (tm) with
                | Bill Gates's blessing. If this is how a significant
                | number of customers want to run the game, the developers
                | should listen.
                | 
                | Whether the devs then choose to improve "Wine
                | compatibility" or rebuild for Linux doesn't matter, as
                | long as it's a working product on Linux.
        
               | pjmlp wrote:
               | Valve will notice, devs couldn't care less.
        
               | yupyupyups wrote:
               | I'll hold on to my optimism.
        
               | pjmlp wrote:
                | Your comment looks like when political parties lose an
                | election and then give a speech on how they achieved XYZ,
                | thus they actually won, somehow, something.
        
             | pornel wrote:
             | This didn't need Microsoft's teeth to fail. There isn't a
             | single "Linux" that game devs can build for. The kernel ABI
             | isn't sufficient to run games, and Linux doesn't have any
             | other stable ABI. The APIs are fragmented across distros,
             | and the ABIs get broken regularly.
             | 
             | The reality is that for applications with visuals better
             | than vt100, the Win32+DirectX ABI is more stable and
             | portable across _Linux distros_ than anything else that
             | Linux distros offer.
        
           | dontlaugh wrote:
           | Direct3D is still overwhelmingly the default on Windows,
           | particularly for Unreal/Unity games. And of course on the
           | Xbox.
           | 
           | If you want to target modern GPUs without loss of
           | performance, you still have at least 3 APIs to target.
        
         | pjmlp wrote:
         | If you only care about hardware designed up to 2015, as that is
         | its baseline for 1.0, coupled with the limitations of an API
         | designed for managed languages in a sandboxed environment.
        
         | nromiun wrote:
         | If it was that easy CUDA would not be the huge moat for Nvidia
         | it is now.
        
         | ducktective wrote:
          | I think WebGPU is like a minimum common API. The Zed editor for
          | Mac has targeted Metal directly.
          | 
          | Also, people have different opinions on what "common" should
          | mean. OpenGL vs Vulkan. Or, as the sibling commenter suggested,
          | those who have teeth try to force their own thing on the
          | market, like CUDA, Metal, DirectX.
        
           | pjmlp wrote:
            | Most game studios would rather go with middleware using
            | plugins, adopting the best API on each platform.
            | 
            | Khronos API advocates usually ignore that a similar effort is
            | required to deal with all the extension spaghetti and driver
            | issues anyway.
        
         | swiftcoder wrote:
         | A very large part of this project is built on the efforts of
         | the wgpu-rs WebGPU implementation.
         | 
         | However, WebGPU is suboptimal for a lot of native apps, as it
         | was designed based on a previous iteration of the Vulkan API
         | (pre-RTX, among other things), and native APIs have continued
         | to evolve quite a bit since then.
        
         | exDM69 wrote:
         | No, it does not. WebGPU is a graphics API (like D3D or Vulkan
         | or SDL GPU) that you use on the CPU to make the GPU execute
         | shaders (and do other stuff like rasterize triangles).
         | 
         | Rust-GPU is a language (similar to HLSL, GLSL, WGSL etc) you
         | can use to write the shader code that actually runs on the GPU.
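          | 
          | For a sense of what the shader side looks like, a rust-gpu
          | compute shader is written roughly like this (recalled from the
          | rust-gpu docs, so treat the details as approximate):
          | 
          |     use spirv_std::glam::UVec3;
          |     use spirv_std::spirv;
          | 
          |     #[spirv(compute(threads(64)))]
          |     pub fn double(
          |         #[spirv(global_invocation_id)] id: UVec3,
          |         #[spirv(storage_buffer, descriptor_set = 0, binding = 0)]
          |         data: &mut [f32],
          |     ) {
          |         // One logical thread handles one element.
          |         let i = id.x as usize;
          |         if i < data.len() {
          |             data[i] *= 2.0;
          |         }
          |     }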
        
           | nicoburns wrote:
            | This is a bit pedantic. WGSL is the shader language that
            | comes with the WebGPU specification, and is clearly what the
            | parent (who is unfamiliar with GPU programming) meant.
           | 
           | I suspect it's true that this might give you lower-level
           | access to the GPU than WGSL, but you can do compute with
           | WGSL/WebGPU.
        
             | omnicognate wrote:
             | Right, but that doesn't mean WGSL/WebGPU solves the
             | "problem", which is allowing you to use the _same_ language
             | in the GPU code (i.e. the shaders) as the CPU code. You
             | still have to use separate languages.
             | 
             | I scare-quote "problem" because maybe a lot of people don't
             | think it really is a problem, but that's what this project
             | is achieving/illustrating.
             | 
             | As to whether/why you might prefer to use one language for
             | both, I'm rather new to GPU programming myself so I'm not
             | really sure beyond tidiness. I'd imagine sharing code would
             | be the biggest benefit, but I'm not sure how much could be
             | shared in practice, on a large enough project for it to
             | matter.
        
       | hardwaresofton wrote:
       | This is amazing and there is already a pretty stacked list of
       | Rust GPU projects.
       | 
       | This seems to be at an even lower level of abstraction than
       | burn[0] which is lower than candle[1].
       | 
        | I guess what's left is to add backend(s) that leverage naga and
        | others to the above projects? Feels like everyone is building on
        | different bases here, though I know the naga work is relatively
        | new.
       | 
       | [EDIT] Just to note, burn is the one that focuses most on
       | platform support but it looks like the only backend that uses
       | naga is wgpu... So just use wgpu and it's fine?
       | 
       | Yeah basically wgpu/ash (vulkan, metal) or cuda
       | 
       | [EDIT2] Another crate closer to this effort:
       | 
       | https://github.com/tracel-ai/cubecl
       | 
       | [0]: https://github.com/tracel-ai/burn
       | 
       | [1]: https://github.com/huggingface/candle/
        
         | LegNeato wrote:
         | You can check out https://rust-gpu.github.io/ecosystem/ as
         | well, which mentions CubeCL.
        
       | Voultapher wrote:
       | Let's count abstraction layers:
       | 
       | 1. Domain specific Rust code
       | 
       | 2. Backend abstracting over the cust, ash and wgpu crates
       | 
       | 3. wgpu and co. abstracting over platforms, drivers and APIs
       | 
       | 4. Vulkan, OpenGL, DX12 and Metal abstracting over platforms and
       | drivers
       | 
       | 5. Drivers abstracting over vendor specific hardware (one could
       | argue there are more layers in here)
       | 
       | 6. Hardware
       | 
        | That's _a lot_ of hidden complexity; better hope one never needs
        | to look under the lid. It's also questionable how well
        | performance-relevant platform specifics survive all these layers.
        
         | thrtythreeforty wrote:
         | Realistically though, a user can only hope to operate at (3) or
         | maybe (4). So not as much of an add. (Abstraction layers do not
         | stop at 6, by the way, they keep going with firmware and
         | microarchitecture implementing what you think of as the
         | instruction set.)
        
           | ivanjermakov wrote:
           | Don't know about you, but I consider 3 levels of abstraction
           | a lot, especially when it comes to such black-boxy tech like
           | GPUs.
           | 
           | I suspect debugging this Rust code is impossible.
        
             | yjftsjthsd-h wrote:
             | You posted this comment in a browser on an operating system
             | running on at least one CPU using microcode. There are more
             | layers inside those (the OS alone contains a laundry list
             | of abstractions). Three levels of abstractions can be fine.
        
             | wiz21c wrote:
             | shader code is not exactly easy to debug for a start...
        
             | coolsunglasses wrote:
             | Debugging the Rust is the easy part. I write vanilla CUDA
             | code that integrates with Rust and that one is the hard
             | part. Abstracting over the GPU backend w/ more Rust isn't a
             | big deal, most of it's SPIR-V anyway. I'm planning to stick
             | with vanilla CUDA integrating with Rust via FFI for now but
             | I'm eyeing this project as it could give me some options
             | for a more maintainable and testable stack.
        
         | LegNeato wrote:
          | The demo is admittedly a Rube Goldberg machine, but that's
          | because this is the first time it has been possible. It will
          | get more integrated over time. And just like normal Rust code,
          | you can make it as abstract or concrete as you want. But at
          | least you have the tools to do so.
         | 
          | That's one of the nice things about the Rust ecosystem: you can
          | drill down and do what you want. There is std::arch, which is
          | platform-specific, there is asm support, you can do things like
          | replace the allocator and panic handler, etc. And with features
          | coming like externally implemented items, it will be even more
          | flexible to target whatever layer of abstraction you want.
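          | 
          | For example, a `#![no_std]` crate can supply its own panic
          | handler, which is exactly the kind of hook a GPU target needs
          | (minimal sketch):
          | 
          |     #![no_std]
          | 
          |     use core::panic::PanicInfo;
          | 
          |     #[panic_handler]
          |     fn on_panic(_info: &PanicInfo) -> ! {
          |         // Nowhere to print on a GPU target; just halt this thread.
          |         loop {}
          |     }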
        
           | 90s_dev wrote:
           | "It's only complex because it's new, it will get less complex
           | over time."
           | 
           | They said the same thing about browser tech. Still not
           | simpler under the hood.
        
             | luxuryballs wrote:
             | now _that_ is a relevant username
        
             | a99c43f2d565504 wrote:
             | As far as I understand, there was a similar mess with CPUs
             | some 50 years ago: All computers were different and there
             | was no such thing as portable code. Then problem solvers
             | came up with abstractions like the C programming language,
             | allowing developers to write more or less the same code for
             | different platforms. I suppose GPUs are slowly going
             | through a similar process now that they're useful in many
             | more domains than just graphics. I'm just spitballing.
        
               | Maken wrote:
               | And yet, we are still using handwritten assembly for hot
               | code paths. All these abstraction layers would need to be
               | porous enough to allow per-device specific code.
        
               | pizza234 wrote:
               | > And yet, we are still using handwritten assembly for
               | hot code paths
               | 
               | This is actually a win. It implies that abstractions have
               | a negligible (that is, existing but so small that can be
               | ignored) cost for anything other than small parts of the
               | codebase.
        
               | pjmlp wrote:
                | Computers had been enjoying high-level systems languages
                | for a decade before C.
        
               | Yoric wrote:
               | But it's true that you generally couldn't use the same
               | Lisp dialect on two different families of computers, for
               | instance.
        
               | pjmlp wrote:
               | Neither could you with C, POSIX exists for a reason.
        
               | dotancohen wrote:
               | > I suppose GPUs are slowly going through a similar
               | process now that they're useful in many more domains than
               | just graphics.
               | 
               | I've been waiting for the G in GPU to be replaced with
               | something else since the first CUDA releases. I honestly
               | think that once we rename this tech, more people will
               | learn to use it.
        
               | ecshafer wrote:
               | MPU - Matrix Processing Unit
               | 
               | LAPU - Linear Algebra Processing Unit
        
               | dotancohen wrote:
               | LAPU is terrific. It also means paw in Russian.
        
               | carlhjerpe wrote:
               | PPU - Parallel processing unit
        
               | jcranmer wrote:
               | The first portable programming language was, uh, Fortran.
               | Indeed, by the time the Unix developers are thinking
               | about porting to different platforms, there are already
               | open source Fortran libraries for math routines (the
               | antecedents of LAPACK). And not long afterwards, the
               | developers of those libraries are going to get together
               | and work out the necessary low-level kernel routines to
               | get good performance on the most powerful hardware of the
               | day--i.e., the BLAS interface that is still the
               | foundation of modern HPC software almost 50 years later.
               | 
               | (One of the problems of C is that people have effectively
               | erased pre-C programming languages from history.)
        
             | lukan wrote:
             | Who said that?
        
               | 90s_dev wrote:
               | They did.
        
               | Ygg2 wrote:
               | Who is they? Aka [citation needed] aka weasel word.
        
             | turnsout wrote:
             | Complexity is not inherently bad. Browsers are more or less
             | exactly as complex as they need to be in order to allow
             | users to browse the web with modern features while
             | remaining competitive with other browsers.
             | 
             | This is Tesler's Law [0] at work. If you want to fully
             | abstract away GPU compilation, it probably won't get
             | dramatically simpler than this project.
             | [0]: https://en.wikipedia.org/wiki/Law_of_conservation_of_c
             | omplexity
        
               | jpc0 wrote:
               | > Complexity is not inherently bad. Browsers are more or
               | less exactly as complex as they need to be in order to
               | allow users to browse the web with modern features while
               | remaining competitive with other browsers.
               | 
               | What a sad world we live in.
               | 
               | Your statement is technically true, the best kind of
               | true...
               | 
               | If work went into standardising a better API than the DOM
               | we might live in a world without hunger, where all our
               | dreams could become reality. But this is what we have, a
               | steaming pile of crap. But hey, at least it's a standard
               | steaming pile of crap that we can all rally around.
               | 
               | I hate it, but I hate it the least of all the options
               | presented.
        
             | Yoric wrote:
             | Who ever said that?
        
           | flohofwoe wrote:
           | > but that's because this was the first time it is possible
           | 
           | Using SPIRV as abstraction layer for GPU code across all 3D
           | APIs is hardly a new thing (via SPIRVCross, Naga or Tint),
           | and the LLVM SPIRV backend is also well established by now.
        
             | LegNeato wrote:
             | Those don't include CUDA and don't include the CPU host
             | side AFAIK.
             | 
             | SPIR-V isn't the main abstraction layer here, Rust is. This
             | is the first time it is possible for Rust host + device
             | across all these platforms and OSes and device apis.
             | 
              | You could make an argument that CubeCL enabled something
              | similar first, but it is more a DSL that looks like Rust
              | than the Rust language proper (but still cool).
        
               | socalgal2 wrote:
               | > This is the first time it is possible for Rust host +
               | device across all these platforms and OSes and device
               | apis.
               | 
               | I thought wgpu already did that. The new thing here is
               | you code shaders in rust, not WGSL like you do with wgpu
        
               | LegNeato wrote:
                | Correct. The new thing is those shaders/kernels also run
                | via CUDA and on the CPU unchanged. You could not do that
                | with only wgpu... there is no Rust shader input (the
                | thing that enables it is rust-gpu, which is used here),
                | and if you wrote your code in a shader lang it wouldn't
                | run on the CPU (as shader langs are made for GPUs only)
                | or via CUDA.
        
             | winocm wrote:
             | LLVM SPIR-V's backend is a bit... questionable when it
             | comes to code generation.
        
         | tombh wrote:
          | I think it's worth bearing in mind that all `rust-gpu` does is
          | compile to SPIR-V, which is Vulkan's IR. So in a sense layers
          | 2. and 3. are optional, or at least parallel layers rather than
          | cumulative.
         | 
         | And it's also worth remembering that all of Rust's tooling can
         | be used for building its shaders; `cargo`, `cargo test`, `cargo
         | clippy`, `rust-analyzer` (Rust's LSP server).
         | 
         | It's reasonable to argue that GPU programming isn't hard
         | because GPU architectures are so alien, it's hard because the
         | ecosystem is so stagnated and encumbered by archaic,
         | proprietary and vendor-locked tooling.
        
           | reactordev wrote:
           | Layers 2 and 3 are implementation specific and you can do it
           | however you wish. The point is that a rust program is running
           | on your GPU, whatever GPU. That's amazing!
        
         | dahart wrote:
         | Fair point, though layers 4-6 are always there, including for
         | shaders and CUDA code, and layers 1 and 3 are usually replaced
         | with a different layer, especially for anything cross-platform.
         | So this Rust project might be adding a layer of abstraction,
         | but probably only one-ish.
         | 
         | I work on layers 4-6 and I can confirm there's a lot of hidden
         | complexity in there. I'd say there are more than 3 layers there
         | too. :P
        
         | ajross wrote:
         | There is absolutely an xkcd 927 feel to this.
         | 
         | But that's not the fault of the new abstraction layers, it's
         | the fault of the GPU industry and its outrageous refusal to
         | coordinate on anything, at all, ever. Every generation of GPU
         | from every vendor has its own toolchain, its own ideas about
         | architecture, its own entirely hidden and undocumented set of
         | quirks, its own secret sauce interfaces available only in its
         | own incompatible development environment...
         | 
         | CPUs weren't like this. People figured out a basic model for
         | programming them back in the 60's and everyone agreed that open
         | docs and collabora-competing toolchains and environments were a
         | good thing. But GPUs never got the memo, and things are a huge
         | mess and remain so.
         | 
         | All the folks up here in the open source community can do is
         | add abstraction layers, which is why we have thirty seven
         | "shading languages" now.
        
           | yjftsjthsd-h wrote:
           | In fairness, the ability to restructure at will probably does
           | make it easier to improve things.
        
             | Ygg2 wrote:
             | Improve things for who?
        
               | bee_rider wrote:
               | Pretty sure they mean improve performance; number
               | crunching ability.
        
               | Ygg2 wrote:
               | In consumer GPU land, that's yet to be observed.
        
             | ajross wrote:
             | The fact that the upper parts of the stack are so
             | commoditized (i.e. CUDA and WGSL do not in fact represent
             | particularly different modes of computation, and of course
             | the linked article shows that you can drive everything
             | pretty well with scalar rust code) argues strongly against
             | that. Things aren't incompatible because of innovation,
             | they're incompatible because of expedience and paranoia.
        
           | jcranmer wrote:
           | CPUs, almost from the get-go, were intended to be programmed
           | by people other than the company who built the CPU, and thus
           | the need for a stable, persistent, well-defined ISA interface
           | was recognized very early on. But for pretty much every other
           | computer peripheral, the responsibility for the code running
           | on those embedded processors has been with the hardware
           | vendor, their responsibility ending at providing a system
           | library interface. With literal decades of experience in an
           | environment where they're freed from the burden of
           | maintaining stable low-level details, all of these
           | development groups have quite jealously guarded access to
           | that low level and actively resist any attempts to push the
           | interface layers lower.
           | 
           | As frustrating as it is, GPUs are actually the _most_ open of
           | the accelerator classes, since they 've been forced to accept
           | a layer like PTX or SPIR-V; trying to do that with other
           | kinds of accelerators is really pulling teeth.
        
         | rhaps0dy wrote:
          | Though if the Rust compiles to NVVM, it's exactly as bad as C++
          | CUDA, no?
        
         | flohofwoe wrote:
         | Tbf, Proton on Linux is about the same number of abstraction
          | layers, and that sometimes has better performance than Windows
          | games running on Windows.
        
         | ben-schaaf wrote:
         | That looks like the graphics stack of a modern game engine.
          | Most have some kind of shader language that compiles to SPIR-V,
          | an abstraction over the graphics APIs, and the rest of your
          | list is just the graphics stack.
        
         | dontlaugh wrote:
         | It's not all that much worse than a compiler and runtime
         | targeting multiple CPU architectures, with different calling
         | conventions, endianess, etc. and at the hardware level
         | different firmware and microcode.
        
         | kelnos wrote:
          | > _It's also questionable how well performance-relevant
          | platform specifics survive all these layers._
         | 
          | Fair point, but one of Rust's strengths is the many zero-cost
          | abstractions it provides. And the article talks about how the
          | code compiles to the GPU-specific machine code or IR.
          | Ultimately the efficiency and optimization abilities of that
          | compiler are going to determine how well your code runs, just
          | like in any other compilation process.
         | 
         | This project doesn't even add _that_ much. In  "traditional"
         | GPU code, you're still going to have:
         | 
         | 1. Domain specific GPU code in whatever high-level language
         | you've chosen to work in for the target you want to support.
         | (Or more than one, if you need it, which isn't fun.)
         | 
         | ...
         | 
         | 3. Compiler that compiles your GPU code into whatever machine
         | code or IR the GPU expects.
         | 
         | 4. Vulkan, OpenGL, DX12 and Metal...
         | 
         | 5. Drivers...
         | 
         | 6. Hardware...
         | 
         | So yes, there's an extra layer here. But I think many
         | developers will gladly take on that trade off for the ability
         | to target so many software and hardware combinations in one
         | codebase/binary. And hopefully as they polish the project,
         | debugging issues will become more straightforward.
        
       | Archit3ch wrote:
       | I write native audio apps, where every cycle matters. I also need
       | the full compute API instead of graphics shaders.
       | 
       | Is the "Rust -> WebGPU -> SPIR-V -> MSL -> Metal" pipeline robust
        | when it comes to performance? To me, it seems brittle and hard to
       | reason about all these translation stages. Ditto for "... ->
       | Vulkan -> MoltenVk -> ...".
       | 
       | Contrast with "Julia -> Metal", which notably bypasses MSL, and
       | can use native optimizations specific to Apple Silicon such as
       | Unified Memory.
       | 
       | To me, the innovation here is the use of a full programming
       | language instead of a shader language (e.g. Slang). Rust supports
       | newtype, traits, macros, and so on.
        
         | tucnak wrote:
         | I must agree that for numerical computation (and downstream
         | optimisation thereof) Julia is much better suited than
         | ostensibly "systems" language such as Rust. Moreover, the
         | compatibility matrix[1] for Rust-CUDA tells a story: there's
         | seemingly very little demand for CUDA programming in Rust, and
         | most parts that people love about CUDA are notably missing. If
          | there were demand, surely it would get more traction; alas, it
         | would appear that actual CUDA programmers have very little
         | appetite for it...
         | 
         | [1]: https://github.com/Rust-GPU/Rust-
         | CUDA/blob/main/guide/src/fe...
        
           | Ygg2 wrote:
           | It's not just that. See CUDA EULA at
           | https://docs.nvidia.com/cuda/eula/index.html
           | 
            | Section 1.2 Limitations:
            | 
            |     You may not reverse engineer, decompile or disassemble
            |     any portion of the output generated using SDK elements
            |     for the purpose of translating such output artifacts to
            |     **target a non-NVIDIA platform**.
           | 
           | Emphasis mine.
        
         | bigyabai wrote:
         | > Is the "Rust -> WebGPU -> SPIR-V -> MSL -> Metal" pipeline
          | robust when it comes to performance?
         | 
         | It's basically the same concept as Apple's Clang optimizations,
         | but for the GPU. SPIR-V is an IR just like the one in LLVM,
         | which can be used for system-specific optimization. In theory,
         | you can keep the one codebase to target any number of supported
         | raster GPUs.
         | 
         | The Julia -> Metal stack is comparatively not very portable,
         | which probably doesn't matter if you write Audio Unit plugins.
         | But I could definitely see how the bigger cross-platform devs
         | like u-he or Spectrasonics would value a more complex SPIR-V
         | based pipeline.
        
       | jdbohrman wrote:
       | Why though
        
       | gedw99 wrote:
        | I am overjoyed to see this.
       | 
       | They are doing a huge service for developers that just want to
       | build stuff and not get into the platform wars.
       | 
        | https://github.com/cogentcore/webgpu is a great example. I code
        | in Go and just need stuff to work on everything, and this gets
        | it done, so I can use the GPU on everything.
       | 
       | Thank you rust !!
        
       | ivanjermakov wrote:
       | Is it really "Rust" on GPU? Skimming through the code, it looks
       | like shader language within proc macro heavy Rust syntax.
       | 
       | I think GPU programming is different enough to require special
       | care. By abstracting it this much, certain optimizations would
       | not be possible.
        
         | dvtkrlbs wrote:
          | It is normal Rust code compiled to SPIR-V bytecode.
        
           | LegNeato wrote:
           | And it uses 3rd party deps from crates.io that are completely
           | GPU unaware.
        
       | max-privatevoid wrote:
       | It would be great if Rust people learned how to properly load GPU
       | libraries first.
        
         | zbentley wrote:
         | Say more?
        
           | max-privatevoid wrote:
           | Rust GPU libraries such as wgpu and ash rely on external
           | libraries such as vulkan-loader to load the actual ICDs, but
           | for some reason Rust people really love dlopening them
           | instead of linking to them normally. Then it's up to the
            | consumer to configure their linker flags correctly so that
            | RPATH gets set when needed, but because most people don't
           | know how to use their linker, they usually end up with dumb
           | hacks like these instead:
           | 
           | https://github.com/Rust-GPU/rust-
           | gpu/blob/87ea628070561f576a...
           | 
           | https://github.com/gfx-
           | rs/wgpu/blob/bf86ac3489614ed2b212ea2f...
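            | 
            | Linking the loader normally is only a couple of lines in a
            | build script, roughly like this (sketch; the exact flags
            | depend on the platform and on where libvulkan actually
            | lives):
            | 
            |     // build.rs
            |     fn main() {
            |         // Link against the system Vulkan loader at build time...
            |         println!("cargo:rustc-link-lib=dylib=vulkan");
            |         // ...and record where to find it at runtime via RPATH.
            |         println!("cargo:rustc-link-arg=-Wl,-rpath,/usr/lib");
            |     }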
        
             | LegNeato wrote:
             | Can you file a bug on rust-gpu? I'd love to look into it (I
             | am unfamiliar with this area).
        
               | max-privatevoid wrote:
               | Done. https://github.com/Rust-GPU/rust-gpu/issues/351
        
             | guipsp wrote:
             | Where's the dumb hack?
        
             | viraptor wrote:
              | Isn't that typically done with Vulkan because apps don't
              | know which library you'll want to use later? On a multi-GPU
              | system you may want to switch between (for example) the
              | Intel and NVIDIA implementations at runtime rather than
              | linking directly.
        
       | bobajeff wrote:
        | I applaud the attempt this project and the GPU Working Group are
        | making here. I can't overstate how much work any effort to make
        | the developer experience for heterogeneous compute (CUDA, ROCm,
        | SYCL, OpenCL), or even just GPUs (Vulkan, Metal, DirectX,
        | WebGPU), nicer, more cohesive, and less fragmented has ahead of
        | it.
        
       | slashdev wrote:
       | This is a little crude still, but the fact that this is even
       | possible is mind blowing. This has the potential, if progress
       | continues, to break the vendor-locked nightmare that is GPU
       | software and open up the space to real competition between
       | hardware vendors.
       | 
       | Imagine a world where machine learning models are written in Rust
       | and can run on both Nvidia and AMD.
       | 
       | To get max performance you likely have to break the abstraction
       | and write some vendor-specific code for each, but that's an
       | optimization problem. You still have a portable kernel that runs
       | cross platform.
        
         | bwfan123 wrote:
         | > Imagine a world where machine learning models are written in
         | Rust and can run on both Nvidia and AMD
         | 
         | Not likely in the next decade if ever. Unfortunately, the
         | entire ecosystems of jax and torch are python based. Imagine
         | retraining all those devs to use rust tooling.
        
         | willglynn wrote:
         | You might be interested in https://burn.dev, a Rust machine
         | learning framework. It has CUDA and ROCm backends among others.
        
           | slashdev wrote:
           | I am interested, thanks for sharing!
        
       | reactordev wrote:
       | This is amazing. With time, we should be able to write GPU
       | programs semantically identical to user land programs.
       | 
        | The implications of this for inference are going to be huge.
        
         | 5pl1n73r wrote:
          | CUDA programming consists of making a kernel parameterized by
          | its thread id, which is used to slightly alter its behavior
          | while it executes on many hundreds of GPU cores; it's very
          | different from general-purpose programming. Memory and
          | branching behave differently there. I'd say at best it will be
          | like traditional programs and libraries with multiple
          | incompatible event loops.
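          | 
          | The "parameterized by thread id" shape, sketched in plain Rust:
          | the same body runs once per id, supplied by the GPU scheduler
          | on device or by a loop in a CPU fallback (illustrative only):
          | 
          |     fn kernel_body(tid: usize, input: &[f32], output: &mut [f32]) {
          |         // Each logical thread touches exactly one element.
          |         if tid < input.len() {
          |             output[tid] = input[tid] * 2.0;
          |         }
          |     }
          | 
          |     fn cpu_fallback(input: &[f32], output: &mut [f32]) {
          |         for tid in 0..input.len() {
          |             kernel_body(tid, input, output);
          |         }
          |     }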
        
       | melodyogonna wrote:
       | Very interesting. I wonder about the model of storing the GPU IR
        | in the binary for a real-world project; it seems like that could
       | bloat the binary size a lot.
       | 
       | I also wonder about the performance of just compiling for a
       | target GPU AOT. These GPUs can be very different even if they
       | come from the same vendor. This seems like it would compile to
       | the lowest common denominator for each vendor, leaving
       | performance on the table. For example, Nvidia H-100s and Nvidia
       | Blackwell GPUs are different beasts, with specialised intrinsics
       | that are not shared, and to generate a PTX that would work on
       | both would require not using specialised features in one or both
       | of these GPUs.
       | 
       | Mojo solves these problems by JIT compiling GPU kernels at the
       | point where they're launched.
        
         | LegNeato wrote:
          | The underlying projects support loading at runtime, so you
          | could have as many AOT-compiled kernel variants as you want and
          | load the right one. Disk is cheap. You could even ship rustc if
          | you really wanted for "JIT" (lol), which is maybe not super
          | crazy as Mojo is LLVM-based anyway. There is of course the
          | warmup / stuttering problem, but that is a separate issue and
          | (sometimes) not an issue for compute, versus graphics where it
          | is a bigger issue. I have some thoughts on how to improve the
          | status quo with things unique to Rust, but it's too early to
          | know.
         | 
         | One of the issues with GPUs as a platform is runtime probing of
         | capabilities is... rudimentary to say the least. Rust has to
          | deal with similar stuff with CPUs+SIMD, FWIW. AOT vs JIT is not
          | a new problem domain and there are no silver bullets, only
          | tradeoffs. Mojo hasn't solved anything in particular; their
          | position in the solution space (JIT) has the same tradeoffs as
          | anyone else doing JIT.
        
           | melodyogonna wrote:
           | > The underlying projects support loading at runtime, so you
           | could have as many AOT compiled kernel variants and load the
           | one you want.
           | 
           | I'm not sure I understand. What underlying projects? The only
           | reference to loading at runtime I see on the post is loading
           | the AOT-compiled IR.
           | 
           | > There is of course the warmup / stuttering problem, but
           | that is a separate issue and (sometimes) not an issue for
           | compute vs graphics where it is a bigger issue.
           | 
            | It's worth noting that Mojo itself is not JIT
           | compiled; Mojo has a GPU infrastructure that can JIT compile
           | Mojo code at runtime [1].
           | 
           | > One of the issues with GPUs as a platform is runtime
           | probing of capabilities is... rudimentary to say the least.
           | 
           | Also not an issue in Mojo when you combine the selective JIT
           | compilation I mentioned with powerful compile-time
           | programming [2].
           | 
           | 1. https://docs.modular.com/mojo/manual/gpu/intro-
           | tutorial#4-co... - here the kernel "print_threads" will be
           | jit compiled
           | 
           | 2. https://docs.modular.com/mojo/manual/parameters/
        
       ___________________________________________________________________
       (page generated 2025-07-26 23:00 UTC)