[HN Gopher] Rust running on every GPU
___________________________________________________________________
Rust running on every GPU
Author : littlestymaar
Score : 447 points
Date : 2025-07-26 10:08 UTC (12 hours ago)
(HTM) web link (rust-gpu.github.io)
(TXT) w3m dump (rust-gpu.github.io)
| piker wrote:
| > Existing no_std + no alloc crates written for other purposes
| can generally run on the GPU without modification.
|
| Wow. That at first glance seems to unlock a lot of interesting
| ideas.
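|
| For a rough sense of what that looks like with rust-gpu (entry
| point attributes written from memory of the spirv-std examples,
| so the exact signatures may differ):
|
|     #![no_std]
|     use spirv_std::spirv;
|     use spirv_std::glam::UVec3;
|
|     // any pure no_std + no-alloc function, e.g. pulled from a
|     // third-party crate (a stand-in is written inline here):
|     pub fn smoothstep(e0: f32, e1: f32, x: f32) -> f32 {
|         let t = ((x - e0) / (e1 - e0)).clamp(0.0, 1.0);
|         t * t * (3.0 - 2.0 * t)
|     }
|
|     // ...called unchanged from a GPU compute entry point:
|     #[spirv(compute(threads(64)))]
|     pub fn main_cs(
|         #[spirv(global_invocation_id)] id: UVec3,
|         #[spirv(storage_buffer, descriptor_set = 0, binding = 0)]
|         data: &mut [f32],
|     ) {
|         let i = id.x as usize;
|         if i < data.len() {
|             data[i] = smoothstep(0.0, 1.0, data[i]);
|         }
|     }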
| boredatoms wrote:
| I guess performance would be very different if things were
| initially assumed to run on a CPU.
| vouwfietsman wrote:
| Certainly impressive that this is possible!
|
| However, for my use cases (running on arbitrary client hardware)
| I generally distrust any _abstractions_ over the GPU API, as the
| entire point is to leverage the _low level details_ of the GPU.
| Treating those details as a nuisance leads to bugs and
| performance loss, because each target is meaningfully different.
|
| To overcome this, a similar system should be brought forward by
| the vendors. However, since they failed to settle their
| arguments, I imagine the platform differences are significant.
| There are exceptions to this (e.g. ANGLE), but they only arrive
| at stability by limiting the feature set (and so performance).
|
| It's good that this approach at least allows conditional
| compilation; that helps for sure.
| kookamamie wrote:
| Exactly. Not sure why it would be better to run Rust on Nvidia
| GPUs compared to actual CUDA code.
|
| I get the idea of added abstraction, but do think it becomes a
| bit jack-of-all-tradesey.
| rbanffy wrote:
| I think the idea is to allow developers to write a single
| implementation and have a portable binary that can run on any
| kind of hardware.
|
| We do that all the time - there is lots of code that chooses
| optimal code paths depending on the runtime environment or which
| ISA extensions are available.
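|
| On the CPU side that pattern looks roughly like this (the AVX2
| path here is just a placeholder body):
|
|     fn sum(xs: &[f32]) -> f32 {
|         #[cfg(target_arch = "x86_64")]
|         {
|             // runtime probe, standard library macro
|             if std::is_x86_feature_detected!("avx2") {
|                 return sum_avx2(xs);
|             }
|         }
|         xs.iter().sum() // portable fallback
|     }
|
|     #[cfg(target_arch = "x86_64")]
|     fn sum_avx2(xs: &[f32]) -> f32 {
|         // a real version would use std::arch intrinsics here
|         xs.iter().sum()
|     }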
| kookamamie wrote:
| Sure. The performance-purist in me would be very doubtful
| about the result's optimality, though.
| littlestymaar wrote:
| The performance purists don't use CUDA either, though
| (that's why DeepSeek used PTX directly).
|
| Everything is an abstraction and choosing the right level
| of abstraction for your usecase is a tradeoff between
| your engineering capacities and your performance needs.
| LowLevelMahn wrote:
| this Rust demo also uses PTX directly:
| During the build, build.rs uses rustc_codegen_nvvm to
| compile the GPU kernel to PTX. The resulting PTX is
| embedded into the CPU binary as static data. The
| host code is compiled normally.
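|
| i.e. roughly the usual "embed the build artifact" pattern,
| something like this (paths and names illustrative, not the
| demo's actual code):
|
|     // build.rs has written OUT_DIR/kernels.ptx; embed it:
|     const KERNEL_PTX: &str =
|         include_str!(concat!(env!("OUT_DIR"), "/kernels.ptx"));
|
|     fn main() {
|         // hand KERNEL_PTX to the CUDA driver API at runtime to
|         // load the module and launch the kernel
|         println!("embedded {} bytes of PTX", KERNEL_PTX.len());
|     }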
| LegNeato wrote:
| To be more technically correct, we compile to NVVM IR and
| then use NVIDIA's NVVM to convert it to PTX.
| brandonpelfrey wrote:
| The issue in my mind is that this doesn't seem to include
| any of the critical library functionality specific to e.g.
| NVIDIA cards (think reduction operations across threads
| in a warp and similar). Some of those don't exist in all
| hardware architectures. We may get to a point where
| everything could be written in one language but actually
| leveraging the hardware correctly still requires a bunch
| of different implementations, one for each target
| architecture.
|
| The fact that different hardware has different features
| is a good thing.
| pjmlp wrote:
| Without the tooling though.
|
| Commendable effort, however just like people forget
| languages are ecosystems, they tend to forget APIs are
| ecosystems as well.
| MuffinFlavored wrote:
| > Exactly. Not sure why it would be better to run Rust on
| Nvidia GPUs compared to actual CUDA code.
|
| You get to pull no_std Rust crates and they go to GPU instead
| of having to convert them to C++
| the__alchemist wrote:
| I think the sweet spot is:
|
| If your program is written in rust, use an abstraction like
| Cudarc to send and receive data from the GPU. Write normal
| CUDA kernels.
| Ar-Curunir wrote:
| Because folks like to program in Rust, not CUDA
| tucnak wrote:
| "Folks" as-in Rust stans, whom know very little about CUDA
| and what makes it nice in the first place, sure, but is
| there demand for Rust ports amongst actual CUDA
| programmers?
|
| I think not.
| tayo42 wrote:
| What makes cuda nice in the first place?
| tucnak wrote:
| All the things marked with red cross in the Rust-CUDA
| compatibility matrix.
|
| https://github.com/Rust-GPU/Rust-
| CUDA/blob/main/guide/src/fe...
| LegNeato wrote:
| FYI, rust-cuda outputs nvvm so it can integrate with the
| existing cuda ecosystem. We aren't suggesting rewriting
| everything in Rust. Check the repo for crates that allow
| using existing stuff like cudnn and cuBLAS.
| tucnak wrote:
| I take it you're the maintainer. Firstly, congrats on the
| work done, for the open source people are a small crowd,
| and determination of Rust teams here is commendable. On
| the other hand, I'm struggling to see the unique value
| proposition. What is your motivation with Rust-GPU?
| Graphics or general-purpose computing? If it's the
| latter, at least from my POV, I would struggle to justify
| going up against a daunting umbrella project like this;
| in view of it likely culminating in layers upon layers of
| abstraction. Is the long-term goal here to have fun
| writing a bit of Rust, or upsetting the entrenched status
| quo of highly-concurrent GPU programming? There's a
| saying that goes along like "pleasing all is a lot like
| pleasing none," and intuitively I would guess it should
| apply here.
| Ar-Curunir wrote:
| Rust expanded systems programming to a much larger
| audience. If it can do the same for GPU programming,
| _even_ if the resulting programs are not (initially) as
| fast as CUDA programs, that's a big win.
| JayEquilibria wrote:
| Good stuff. I have been thinking of learning Rust because of
| people here even though CUDA is what I care about.
|
| My abstractions though are probably best served by PyTorch
| and Julia, so Rust is just a waste of time, FOR ME.
| diabllicseagull wrote:
| same here. I'm always hesitant to build anything commercial
| over abstractions, adapter or translation layers that may or
| may not have sufficient support in the future.
|
| Sadly, in 2025 we are still in desperate need for an open
| standard that's supported by all vendors and that allows
| programming for the full feature set of current GPU hardware.
| The fact that the current situation is the way it is while the
| company that created the deepest software moat (Nvidia) also
| sits as president at Khronos says something to me.
| pjmlp wrote:
| Khronos APIs are the C++ of graphics programming; there is a
| reason why professional game studios never do political wars
| on APIs.
|
| Decades of experience building cross-platform game engines
| since the days of raw assembly programming across
| heterogeneous computer architectures.
|
| What matters are game design and IP, which they can eventually
| turn into physical assets like toys, movies, and collection
| assets.
|
| Hardware abstraction layers are done once per platform; you can
| even let an intern do it, at least the initial hello
| triangle.
|
| As for who sits as president at Khronos, so go elections on
| committee-driven standards bodies.
| ducktective wrote:
| I think you are very experienced in this subject. Can you
| explain what's wrong with WebGPU? Doesn't it utilize like
| 80% of the cool features of the modern GPUs? Games and
| ambitious graphics-hungry applications aside, why aren't we
| seeing more tech built on top of WebGPU like GUI stacks?
| Why aren't we seeing browsers and web apps using it?
|
| Do you recommend learning it (considering all the things
| worth learning nowadays and the rise of LLMs)?
| 3836293648 wrote:
| First of all, WebGPU has only been supported in Chrome for
| a few months, and in Firefox only in the next release. And
| that's just Windows.
|
| We haven't had enough time to develop anything really.
|
| Secondly, the WebGPU standard is like Vulkan 1.0 and is
| cumbersome to work with. But that part is hearsay, I
| don't have much experience with it.
| sim7c00 wrote:
| GPU is often cumbersome though. I mean, OpenGL, Vulkan, they
| are not really trivial?
| 3836293648 wrote:
| OpenGL is trivial compared to Vulkan. And apparently
| Vulkan has gotten much easier today compared to its
| initial release in 2016.
| MindSpunk wrote:
| WebGPU is about a decade behind in feature support
| compared to what is available in modern GPUs. Things
| missing include:
|
| - Bindless resources
|
| - RT acceleration
|
| - 64-bit image atomic operations (these are what make
| nanite's software rasterizer possible)
|
| - mesh shaders
|
| It has compute shaders at least. There's a lot of less
| flashy (to non-experts) extensions being added to Vulkan
| and D3D12 lately that remove abstractions that WebGPU
| can't have without being a security nightmare. Outside of
| the rendering algorithms themselves, the vast majority of
| API surface area in Vulkan/D3D12 is just ceremony around
| allocating memory for different purposes. New stuff like
| descriptor buffers in Vulkan is removing that ceremony
| in a very core area, but it's unlikely to ever come to
| WebGPU.
|
| fwiw some of these features are available outside the
| browser via 'wgpu' and/or 'dawn', but that doesn't help
| people in the browser.
| littlestymaar wrote:
| Everything is an abstraction though, even CUDA abstracts away
| very different pieces of hardware with totally different
| capabilities.
| LegNeato wrote:
| Rust is a systems language, so you should have the control you
| need. We intend to bring GPU details and APIs into the language
| and core / std lib, and expose GPU and driver stuff to the
| `cfg()` system.
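|
| As a sketch of the direction (the GPU/driver cfgs don't exist
| yet, so treat this as illustrative), shared crates can already
| branch on the SPIR-V target today:
|
|     // compiles for both the host and the rust-gpu target
|     #[cfg(target_arch = "spirv")]
|     fn debug_log(_msg: &str) {
|         // no printing facilities on the GPU side
|     }
|
|     #[cfg(not(target_arch = "spirv"))]
|     fn debug_log(msg: &str) {
|         eprintln!("{msg}");
|     }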
|
| (Author here)
| Voultapher wrote:
| Who is "we" here? I'm curious to hear more about your ambitions
| here, since surely pulling in wgpu or something similar seems
| out of scope for the traditionally lean Rust stdlib.
| LegNeato wrote:
| Many of us working on Rust + GPUs in various projects have
| discussed starting a GPU working group to explore some of
| these questions:
|
| https://gist.github.com/LegNeato/a1fb3e3a9795af05f22920709d
| 9...
|
| Agreed, I don't think we'd ever pull in things like wgpu,
| but we might create APIs or traits wgpu could use to
| improve perf/safety/ergonomics/interoperability.
| junon wrote:
| I'm surprised there isn't already a Rust GPU WG. That'd
| be incredible.
| Voultapher wrote:
| Cool, looking forward to that. It's certainly a good fit
| for the Rust story overall, given the increasingly
| heterogeneous nature of systems.
| jpc0 wrote:
| Here's an idea:
|
| Get Nvidia, AMD, Intel and whoever else you can get into
| a room. Get the LLVM folks into the same room.
|
| Compile LLVM IR directly into hardware instructions fed
| to the GPU; get them to open up.
|
| Having to target an API is part of the problem. Get them
| to allow you to write Rust that directly compiles into
| the code that will run on the GPU, not something that
| becomes something else, that becomes SPIR-V, that controls
| a driver that will eventually run on the GPU.
| Ygg2 wrote:
| Hell will freeze over, then go into negative Kelvin
| temperatures, before you see Nvidia agreeing in earnest to
| do so. They make too much money on NOT GETTING
| COMMODITIZED. Nvidia even changed the CUDA EULA to block
| translation layers targeting non-NVIDIA platforms.
|
| It's the same reason Safari is in such a sorry state. Why
| make web browser better, when it could cannibalize your
| app store?
| jpc0 wrote:
| Somehow I want to believe that if you get everyone else in the
| room, and it becomes enough of a market force that Nvidia
| stops selling GPUs because of it, they will change.
| _Cough_ Linux GPU drivers.
| bobajeff wrote:
| Sounds sort of like the idea behind MLIR and its GPU
| dialects.
|
| * https://mlir.llvm.org/docs/Dialects/NVGPU/
|
| * https://mlir.llvm.org/docs/Dialects/AMDGPU/
|
| * https://mlir.llvm.org/docs/Dialects/XeGPU/
| jpc0 wrote:
| Very likely something along those lines.
|
| Effectively, standardise passing operations off to a
| coprocessor. C++ is moving in that direction with
| stdexec, the linear algebra library, and SIMD.
|
| I don't see why Rust wouldn't also do that.
|
| Effectively, why must I write a GPU kernel to have an
| algorithm execute on the GPU? We're talking about memory
| wrangling and linear algebra almost all of the time when
| dealing with a GPU in any way whatsoever. I don't see why
| we need a different interface and API layer for that.
|
| OpenGL et al abstract some of the linear algebra away
| from you which is nice until you need to give a damn
| about the assumptions they made that are no longer valid.
| I would rather that code be in a library in the language
| of your choice that you can inspect and understand than
| hidden somewhere in a driver behind 3 layers of
| abstraction.
| bobajeff wrote:
| >I would rather that code be in a library in the language
| of your choice that you can inspect and understand than
| hidden somewhere in a driver behind 3 layers of
| abstraction.
|
| I agree that that would be ideal. Hopefully, that can
| happen one day with c++, rust and other languages. So far
| Mojo seems to be the only language close to that vision.
| ants_everywhere wrote:
| Genuine question since you seem to care about the performance:
|
| As an outsider, where we are with GPUs looks a lot like where
| we were with CPUs many years ago. And (AFAIK), the solution
| there was three-part compilers where optimizations happen on a
| middle layer and the third layer transforms the optimized code
| to run directly on the hardware. A major upside is that the
| compilers get smarter over time because the abstractions are
| more evergreen than the hardware targets.
|
| Is that sort of thing possible for GPUs? Or is there too much
| diversity in GPUs to make it feasible/economical? Or is that
| obviously where we're going and we just don't have it working
| yet?
| nicoburns wrote:
| The status quo in GPU-land seems to be that the compiler
| lives in the GPU driver and is largely opaque to everyone
| other than the OS/GPU vendors. Sometimes there is an
| additional layer of compiler in user land that compiles into
| the language that the driver-compiler understands.
|
| I think a lot of people would love to move to the CPU model
| where the actual hardware instructions are documented and
| relatively stable between different GPUs. But that's
| impossible to do unless the GPU vendors commit to it.
| sim7c00 wrote:
| I think Intel and AMD provide ISA docs for their hardware. Not
| sure about Nvidia, I haven't checked in forever.
| pornel wrote:
| I would like CPUs to move to the GPU model, because in the
| CPU land adoption of wider SIMD instructions (without
| manual dispatch/multiversioning faff) takes over a decade,
| while in the GPU land it's a driver update.
|
| To be clear, I'm talking about the PTX -> SASS compilation
| (which is something like LLVM bitcode to x86-64 microcode
| compilation). The fragmented and messy high-level shader
| language compilers are a different thing, in the higher
| abstraction layers.
| omnicognate wrote:
| Zig can also compile to SPIR-V. Not sure about the others.
|
| (And I haven't tried the SPIR-V compilation yet, just came across
| it yesterday.)
| revskill wrote:
| I do not get you.
| omnicognate wrote:
| What don't you get?
|
| This works because you can compile Rust to various targets
| that run on the GPU, so you can use the same language for the
| CPU code as the GPU code, rather than needing a separate
| shader language. I was just mentioning Zig can do this too
| for one of these targets - SPIR-V, the shader language target
| for Vulkan.
|
| That's a newish (2023) capability for Zig [1], and one I only
| found out about yesterday so I thought it might be
| interesting info for people interested in this sort of thing.
|
| For some reason it's getting downvoted by some people,
| though. Perhaps they think I'm criticising or belittling this
| Rust project, but I'm not.
|
| [1] https://github.com/ziglang/zig/issues/2683#issuecomment-1
| 501...
| arc619 wrote:
| Nim too, as it can use Zig as a compiler.
|
| There's also https://github.com/treeform/shady to compile Nim
| to GLSL.
|
| Also, more generally, there's an LLVM-IR->SPIR-V compiler that
| you can use for any language that has an LLVM back end (Nim has
| nlvm, for example): https://github.com/KhronosGroup/SPIRV-LLVM-
| Translator
|
| That's not to say this project isn't cool, though. As usual
| with Rust projects, it's a bit breathy with hype (e.g.
| "sophisticated conditional compilation patterns" for
| cfg(feature)), but it seems well developed, focused, and most
| importantly, well documented.
|
| It also shows some positive signs of being dog-fooded, and the
| author(s) clearly intend to use it.
|
| Unifying GPU back ends is a noble goal, and I wish the
| author(s) luck.
| rbanffy wrote:
| > Though this demo doesn't do so, multiple backends could be
| compiled into a single binary and platform-specific code paths
| could then be selected at runtime.
|
| That's kind of the goal, I'd assume: writing generic code and
| having it run on anything.
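|
| The runtime selection part could be as simple as something like
| this (the enum and probe functions are hypothetical, not the
| demo's API):
|
|     enum Backend { Cuda, Vulkan, Cpu }
|
|     // stub probes; a real build would query drivers/loaders
|     fn cuda_available() -> bool { false }
|     fn vulkan_available() -> bool { false }
|
|     fn pick_backend() -> Backend {
|         if cuda_available() {
|             Backend::Cuda
|         } else if vulkan_available() {
|             Backend::Vulkan
|         } else {
|             Backend::Cpu // always-available fallback
|         }
|     }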
| maratc wrote:
| > writing generic code and having it run on anything.
|
| That has been already done successfully by Java applets in
| 1995.
|
| Wait, Java applets were dead by 2005, which leads me to assume
| that the goal is different.
| DarmokJalad1701 wrote:
| > That has been already done successfully by Java applets in
| 1995.
|
| The first video card with a programmable pixel shader was the
| Nvidia GeForce 3, released in 2001. How would Java applets be
| running on GPUs in 1995?
|
| Besides, Java cannot even be compiled for GPUs as far as I
| know.
| chrisldgk wrote:
| Maybe this is a stupid question, as I'm just a web developer and
| have no experience programming for a GPU.
|
| Doesn't WebGPU solve this entire problem by having a single API
| that's compatible with every GPU backend? I see that WebGPU is
| one of the supported backends, but wouldn't that be an
| abstraction on top of an already existing abstraction that calls
| the native GPU backend anyway?
| inciampati wrote:
| Isn't webgpu 32-bit?
| 3836293648 wrote:
| WebAssembly is 32-bit. WebGPU uses 32-bit floats like all
| graphics does. 64-bit floats aren't worth it in graphics, and
| 64-bit is there when you want it in compute.
| adithyassekhar wrote:
| When Microsoft had teeth, they had DirectX. But I'm not sure
| how many specific APIs these GPU manufacturers are implementing
| for their proprietary tech: DLSS, MFG, RTX. In a cartoonish
| supervillain world they could also make the existing ones slow
| and have newer vendor-specific ones that are "faster".
|
| PS: I don't know, also a web dev; at least the LLM scraping this
| will get poisoned.
| pjmlp wrote:
| The teeth are pretty much around, hence Valve's failure to
| push native Linux games, having to adopt Proton instead.
| yupyupyups wrote:
| Which isn't a failure, but a pragmatic solution that
| facilitated most games being runnable today on Linux
| regardless of developer support. That's with good
| performance, mind you.
|
| For concrete examples, check out https://www.protondb.com/
|
| That's a success.
| tonyhart7 wrote:
| that is not native
| Voultapher wrote:
| It's often enough faster than on Windows, I'd call that
| good enough with room for improvement.
| Mond_ wrote:
| And?
| yupyupyups wrote:
| Maybe, with all these games running on Linux now and as a
| result more gamers running Linux, developers will be more
| incentivized to consider native support for Linux too.
|
| Regardless, "native" is not the end goal here. Consider
| Wine/Proton as an implementation of the Windows libraries on
| Linux. Even if the binaries are not ELF binaries, it's
| still not emulation or anything like that. :)
| pjmlp wrote:
| Why should they be incentivized to do anything? Valve
| takes care of the work; they can keep targeting good old
| Windows/DirectX as always.
|
| The OS/2 lesson has not yet been learnt.
| yupyupyups wrote:
| Regardless of whether the game is using Wine or not, when the
| rapidly growing Linux customer base starts complaining
| about bugs while running the game on their Steam Decks,
| the developers will notice. It doesn't matter if the game
| was supposed to be running on Microsoft Windows (tm) with
| Bill Gates's blessing. If this is how a significant
| number of customers want to run the game, the developers
| should listen.
|
| Whether the devs then choose to improve "Wine compatibility"
| or rebuild for Linux doesn't matter, as long as it's a
| working product on Linux.
| pjmlp wrote:
| Valve will notice, devs couldn't care less.
| yupyupyups wrote:
| I'll hold on to my optimism.
| pjmlp wrote:
| Your comment looks like when political parties lose an
| election and then give a speech on how they achieved XYZ,
| thus they actually won, somehow, something.
| pornel wrote:
| This didn't need Microsoft's teeth to fail. There isn't a
| single "Linux" that game devs can build for. The kernel ABI
| isn't sufficient to run games, and Linux doesn't have any
| other stable ABI. The APIs are fragmented across distros,
| and the ABIs get broken regularly.
|
| The reality is that for applications with visuals better
| than vt100, the Win32+DirectX ABI is more stable and
| portable across _Linux distros_ than anything else that
| Linux distros offer.
| dontlaugh wrote:
| Direct3D is still overwhelmingly the default on Windows,
| particularly for Unreal/Unity games. And of course on the
| Xbox.
|
| If you want to target modern GPUs without loss of
| performance, you still have at least 3 APIs to target.
| pjmlp wrote:
| If you only care about hardware designed up to 2015, as that is
| its baseline for 1.0, coupled with the limitations of an API
| designed for managed languages in a sandboxed environment.
| nromiun wrote:
| If it were that easy, CUDA would not be the huge moat for Nvidia
| it is now.
| ducktective wrote:
| I think WebGPU is like a minimum common API. Zed editor for
| Mac has targeted Metal directly.
|
| Also, people have different opinions on what "common" should
| mean. OpenGL vs Vulkan. Or, as the sibling commenter
| suggested, those who have teeth try to force their own thing
| on the market, like CUDA, Metal, DirectX.
| pjmlp wrote:
| Most game studios would rather go with middleware using
| plugins, adopting the best API on each platform.
|
| Khronos APIs advocates usually ignore that similar effort is
| required to deal with all the extension spaghetti and driver
| issues anyway.
| swiftcoder wrote:
| A very large part of this project is built on the efforts of
| the wgpu-rs WebGPU implementation.
|
| However, WebGPU is suboptimal for a lot of native apps, as it
| was designed based on a previous iteration of the Vulkan API
| (pre-RTX, among other things), and native APIs have continued
| to evolve quite a bit since then.
| exDM69 wrote:
| No, it does not. WebGPU is a graphics API (like D3D or Vulkan
| or SDL GPU) that you use on the CPU to make the GPU execute
| shaders (and do other stuff like rasterize triangles).
|
| Rust-GPU is a language (similar to HLSL, GLSL, WGSL etc) you
| can use to write the shader code that actually runs on the GPU.
| nicoburns wrote:
| This is a bit pedantic. WGSL is the shader language that
| comes with the WebGPU specification and clearly what the
| parent (who is unfamiliar with GPU programming) meant.
|
| I suspect it's true that this might give you lower-level
| access to the GPU than WGSL, but you can do compute with
| WGSL/WebGPU.
| omnicognate wrote:
| Right, but that doesn't mean WGSL/WebGPU solves the
| "problem", which is allowing you to use the _same_ language
| in the GPU code (i.e. the shaders) as the CPU code. You
| still have to use separate languages.
|
| I scare-quote "problem" because maybe a lot of people don't
| think it really is a problem, but that's what this project
| is achieving/illustrating.
|
| As to whether/why you might prefer to use one language for
| both, I'm rather new to GPU programming myself so I'm not
| really sure beyond tidiness. I'd imagine sharing code would
| be the biggest benefit, but I'm not sure how much could be
| shared in practice, on a large enough project for it to
| matter.
| hardwaresofton wrote:
| This is amazing and there is already a pretty stacked list of
| Rust GPU projects.
|
| This seems to be at an even lower level of abstraction than
| burn[0] which is lower than candle[1].
|
| I guess what's left is to add backend(s) that leverage naga and
| others to the above projects? Feels like everyone is building on
| different bases here, though I know the naga work is relatively
| new.
|
| [EDIT] Just to note, burn is the one that focuses most on
| platform support but it looks like the only backend that uses
| naga is wgpu... So just use wgpu and it's fine?
|
| Yeah, basically wgpu/ash (Vulkan, Metal) or CUDA.
|
| [EDIT2] Another crate closer to this effort:
|
| https://github.com/tracel-ai/cubecl
|
| [0]: https://github.com/tracel-ai/burn
|
| [1]: https://github.com/huggingface/candle/
| LegNeato wrote:
| You can check out https://rust-gpu.github.io/ecosystem/ as
| well, which mentions CubeCL.
| Voultapher wrote:
| Let's count abstraction layers:
|
| 1. Domain specific Rust code
|
| 2. Backend abstracting over the cust, ash and wgpu crates
|
| 3. wgpu and co. abstracting over platforms, drivers and APIs
|
| 4. Vulkan, OpenGL, DX12 and Metal abstracting over platforms and
| drivers
|
| 5. Drivers abstracting over vendor specific hardware (one could
| argue there are more layers in here)
|
| 6. Hardware
|
| That's _a lot_ of hidden complexity; better hope one never needs
| to look under the lid. It's also questionable how well
| performance-relevant platform specifics survive all these layers.
| thrtythreeforty wrote:
| Realistically though, a user can only hope to operate at (3) or
| maybe (4). So not as much of an add. (Abstraction layers do not
| stop at 6, by the way, they keep going with firmware and
| microarchitecture implementing what you think of as the
| instruction set.)
| ivanjermakov wrote:
| Don't know about you, but I consider 3 levels of abstraction
| a lot, especially when it comes to such black-boxy tech like
| GPUs.
|
| I suspect debugging this Rust code is impossible.
| yjftsjthsd-h wrote:
| You posted this comment in a browser on an operating system
| running on at least one CPU using microcode. There are more
| layers inside those (the OS alone contains a laundry list
| of abstractions). Three levels of abstractions can be fine.
| wiz21c wrote:
| shader code is not exactly easy to debug for a start...
| coolsunglasses wrote:
| Debugging the Rust is the easy part. I write vanilla CUDA
| code that integrates with Rust and that one is the hard
| part. Abstracting over the GPU backend w/ more Rust isn't a
| big deal, most of it's SPIR-V anyway. I'm planning to stick
| with vanilla CUDA integrating with Rust via FFI for now but
| I'm eyeing this project as it could give me some options
| for a more maintainable and testable stack.
| LegNeato wrote:
| The demo is admittedly a Rube Goldberg machine, but that's
| because this is the first time it has been possible. It will get
| more integrated over time. And just like normal Rust code, you
| can make it as abstract or concrete as you want. But at least
| you have the tools to do so.
|
| That's one of the nice things about the Rust ecosystem: you can
| drill down and do what you want. There is std::arch, which is
| platform specific, there is asm support, you can do things like
| replace the allocator and panic handler, etc. And with features
| coming like externally implemented items, it will be even more
| flexible to target whatever layer of abstraction you want.
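|
| For example, the no_std hooks look like this (a minimal sketch
| of the mechanism, nothing GPU-specific about it):
|
|     #![no_std]
|
|     use core::panic::PanicInfo;
|
|     // a target/platform crate can supply its own behavior here
|     #[panic_handler]
|     fn panic(_info: &PanicInfo) -> ! {
|         loop {}
|     }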
| 90s_dev wrote:
| "It's only complex because it's new, it will get less complex
| over time."
|
| They said the same thing about browser tech. Still not
| simpler under the hood.
| luxuryballs wrote:
| now _that_ is a relevant username
| a99c43f2d565504 wrote:
| As far as I understand, there was a similar mess with CPUs
| some 50 years ago: All computers were different and there
| was no such thing as portable code. Then problem solvers
| came up with abstractions like the C programming language,
| allowing developers to write more or less the same code for
| different platforms. I suppose GPUs are slowly going
| through a similar process now that they're useful in many
| more domains than just graphics. I'm just spitballing.
| Maken wrote:
| And yet, we are still using handwritten assembly for hot
| code paths. All these abstraction layers would need to be
| porous enough to allow per-device specific code.
| pizza234 wrote:
| > And yet, we are still using handwritten assembly for
| hot code paths
|
| This is actually a win. It implies that abstractions have
| a negligible (that is, existing but so small that it can be
| ignored) cost for anything other than small parts of the
| codebase.
| pjmlp wrote:
| Computers had been enjoying high-level systems
| languages for a decade before C.
| Yoric wrote:
| But it's true that you generally couldn't use the same
| Lisp dialect on two different families of computers, for
| instance.
| pjmlp wrote:
| Neither could you with C; POSIX exists for a reason.
| dotancohen wrote:
| > I suppose GPUs are slowly going through a similar
| process now that they're useful in many more domains than
| just graphics.
|
| I've been waiting for the G in GPU to be replaced with
| something else since the first CUDA releases. I honestly
| think that once we rename this tech, more people will
| learn to use it.
| ecshafer wrote:
| MPU - Matrix Processing Unit
|
| LAPU - Linear Algebra Processing Unit
| dotancohen wrote:
| LAPU is terrific. It also means paw in Russian.
| carlhjerpe wrote:
| PPU - Parallel processing unit
| jcranmer wrote:
| The first portable programming language was, uh, Fortran.
| Indeed, by the time the Unix developers are thinking
| about porting to different platforms, there are already
| open source Fortran libraries for math routines (the
| antecedents of LAPACK). And not long afterwards, the
| developers of those libraries are going to get together
| and work out the necessary low-level kernel routines to
| get good performance on the most powerful hardware of the
| day--i.e., the BLAS interface that is still the
| foundation of modern HPC software almost 50 years later.
|
| (One of the problems of C is that people have effectively
| erased pre-C programming languages from history.)
| lukan wrote:
| Who said that?
| 90s_dev wrote:
| They did.
| Ygg2 wrote:
| Who is they? Aka [citation needed] aka weasel word.
| turnsout wrote:
| Complexity is not inherently bad. Browsers are more or less
| exactly as complex as they need to be in order to allow
| users to browse the web with modern features while
| remaining competitive with other browsers.
|
| This is Tesler's Law [0] at work. If you want to fully
| abstract away GPU compilation, it probably won't get
| dramatically simpler than this project.
| [0]: https://en.wikipedia.org/wiki/Law_of_conservation_of_complexity
| jpc0 wrote:
| > Complexity is not inherently bad. Browsers are more or
| less exactly as complex as they need to be in order to
| allow users to browse the web with modern features while
| remaining competitive with other browsers.
|
| What a sad world we live in.
|
| Your statement is technically true, the best kind of
| true...
|
| If work went into standardising a better API than the DOM
| we might live in a world without hunger, where all our
| dreams could become reality. But this is what we have, a
| steaming pile of crap. But hey, at least it's a standard
| steaming pile of crap that we can all rally around.
|
| I hate it, but I hate it the least of all the options
| presented.
| Yoric wrote:
| Who ever said that?
| flohofwoe wrote:
| > but that's because this was the first time it is possible
|
| Using SPIRV as abstraction layer for GPU code across all 3D
| APIs is hardly a new thing (via SPIRVCross, Naga or Tint),
| and the LLVM SPIRV backend is also well established by now.
| LegNeato wrote:
| Those don't include CUDA and don't include the CPU host
| side AFAIK.
|
| SPIR-V isn't the main abstraction layer here, Rust is. This
| is the first time it is possible for Rust host + device
| across all these platforms and OSes and device apis.
|
| You could make an argument that CubeCL enabled something
| similar first, but it is more a DSL that looks like Rust
| rather than the Rust language proper (but still cool).
| socalgal2 wrote:
| > This is the first time it is possible for Rust host +
| device across all these platforms and OSes and device
| apis.
|
| I thought wgpu already did that. The new thing here is
| you code shaders in rust, not WGSL like you do with wgpu
| LegNeato wrote:
| Correct. The new thing is that those shaders/kernels also run
| via CUDA and on the CPU unchanged. You could not do that with
| only wgpu: there is no Rust shader input (the thing that
| enables it is rust-gpu, which is used here), and if you
| wrote your code in a shader language it wouldn't run on the
| CPU (shader languages are made for the GPU only) or via CUDA.
| winocm wrote:
| LLVM SPIR-V's backend is a bit... questionable when it
| comes to code generation.
| tombh wrote:
| I think it's worth bearing in mind that all `rust-gpu` does is
| compile to SPIR-V, which is Vulkan's IR. So in a sense layers 2
| and 3 are optional, or at least parallel layers rather than
| cumulative.
|
| And it's also worth remembering that all of Rust's tooling can
| be used for building its shaders; `cargo`, `cargo test`, `cargo
| clippy`, `rust-analyzer` (Rust's LSP server).
|
| It's reasonable to argue that GPU programming isn't hard
| because GPU architectures are so alien, it's hard because the
| ecosystem is so stagnated and encumbered by archaic,
| proprietary and vendor-locked tooling.
| reactordev wrote:
| Layers 2 and 3 are implementation specific and you can do it
| however you wish. The point is that a rust program is running
| on your GPU, whatever GPU. That's amazing!
| dahart wrote:
| Fair point, though layers 4-6 are always there, including for
| shaders and CUDA code, and layers 1 and 3 are usually replaced
| with a different layer, especially for anything cross-platform.
| So this Rust project might be adding a layer of abstraction,
| but probably only one-ish.
|
| I work on layers 4-6 and I can confirm there's a lot of hidden
| complexity in there. I'd say there are more than 3 layers there
| too. :P
| ajross wrote:
| There is absolutely an xkcd 927 feel to this.
|
| But that's not the fault of the new abstraction layers, it's
| the fault of the GPU industry and its outrageous refusal to
| coordinate on anything, at all, ever. Every generation of GPU
| from every vendor has its own toolchain, its own ideas about
| architecture, its own entirely hidden and undocumented set of
| quirks, its own secret sauce interfaces available only in its
| own incompatible development environment...
|
| CPUs weren't like this. People figured out a basic model for
| programming them back in the 60's and everyone agreed that open
| docs and collabora-competing toolchains and environments were a
| good thing. But GPUs never got the memo, and things are a huge
| mess and remain so.
|
| All the folks up here in the open source community can do is
| add abstraction layers, which is why we have thirty seven
| "shading languages" now.
| yjftsjthsd-h wrote:
| In fairness, the ability to restructure at will probably does
| make it easier to improve things.
| Ygg2 wrote:
| Improve things for who?
| bee_rider wrote:
| Pretty sure they mean improve performance; number
| crunching ability.
| Ygg2 wrote:
| In consumer GPU land, that's yet to be observed.
| ajross wrote:
| The fact that the upper parts of the stack are so
| commoditized (i.e. CUDA and WGSL do not in fact represent
| particularly different modes of computation, and of course
| the linked article shows that you can drive everything
| pretty well with scalar rust code) argues strongly against
| that. Things aren't incompatible because of innovation,
| they're incompatible because of expedience and paranoia.
| jcranmer wrote:
| CPUs, almost from the get-go, were intended to be programmed
| by people other than the company who built the CPU, and thus
| the need for a stable, persistent, well-defined ISA interface
| was recognized very early on. But for pretty much every other
| computer peripheral, the responsibility for the code running
| on those embedded processors has been with the hardware
| vendor, their responsibility ending at providing a system
| library interface. With literal decades of experience in an
| environment where they're freed from the burden of
| maintaining stable low-level details, all of these
| development groups have quite jealously guarded access to
| that low level and actively resist any attempts to push the
| interface layers lower.
|
| As frustrating as it is, GPUs are actually the _most_ open of
| the accelerator classes, since they've been forced to accept
| a layer like PTX or SPIR-V; trying to do that with other
| kinds of accelerators is really pulling teeth.
| rhaps0dy wrote:
| Though if the Rust compiles to NVVM, it's exactly as bad as C++
| CUDA, no?
| flohofwoe wrote:
| Tbf, Proton on Linux is about the same number of abstraction
| layers, and that sometimes has better performance than Windows
| games running on Windows.
| ben-schaaf wrote:
| That looks like the graphics stack of a modern game engine.
| Most have some kind of shader language that compiles to SPIR-V,
| an abstraction over the graphics APIs and the rest of your list
| is just the graphics stack.
| dontlaugh wrote:
| It's not all that much worse than a compiler and runtime
| targeting multiple CPU architectures, with different calling
| conventions, endianess, etc. and at the hardware level
| different firmware and microcode.
| kelnos wrote:
| > _It 's also questionable how well performance relevant
| platform specifics survive all these layers._
|
| Fair point, but one of Rust's strengths is the many zero-cost
| abstractions it provides. And the article talks about how the
| code compiles to the GPU-specific machine code or IR.
| Ultimately the efficiency and optimization abilities of that
| compiler are going to determine how well your code runs, just
| like any other compilation process.
|
| This project doesn't even add _that_ much. In "traditional"
| GPU code, you're still going to have:
|
| 1. Domain specific GPU code in whatever high-level language
| you've chosen to work in for the target you want to support.
| (Or more than one, if you need it, which isn't fun.)
|
| ...
|
| 3. Compiler that compiles your GPU code into whatever machine
| code or IR the GPU expects.
|
| 4. Vulkan, OpenGL, DX12 and Metal...
|
| 5. Drivers...
|
| 6. Hardware...
|
| So yes, there's an extra layer here. But I think many
| developers will gladly take on that trade off for the ability
| to target so many software and hardware combinations in one
| codebase/binary. And hopefully as they polish the project,
| debugging issues will become more straightforward.
| Archit3ch wrote:
| I write native audio apps, where every cycle matters. I also need
| the full compute API instead of graphics shaders.
|
| Is the "Rust -> WebGPU -> SPIR-V -> MSL -> Metal" pipeline robust
| when it come to performance? To me, it seems brittle and hard to
| reason about all these translation stages. Ditto for "... ->
| Vulkan -> MoltenVK -> ...".
|
| Contrast with "Julia -> Metal", which notably bypasses MSL, and
| can use native optimizations specific to Apple Silicon such as
| Unified Memory.
|
| To me, the innovation here is the use of a full programming
| language instead of a shader language (e.g. Slang). Rust supports
| newtype, traits, macros, and so on.
| tucnak wrote:
| I must agree that for numerical computation (and downstream
| optimisation thereof) Julia is much better suited than an
| ostensibly "systems" language such as Rust. Moreover, the
| compatibility matrix[1] for Rust-CUDA tells a story: there's
| seemingly very little demand for CUDA programming in Rust, and
| most parts that people love about CUDA are notably missing. If
| there was demand, surely it would get more traction; alas, it
| would appear that actual CUDA programmers have very little
| appetite for it...
|
| [1]: https://github.com/Rust-GPU/Rust-
| CUDA/blob/main/guide/src/fe...
| Ygg2 wrote:
| It's not just that. See CUDA EULA at
| https://docs.nvidia.com/cuda/eula/index.html
|
| Section 1.2 Limitations: You may not
| reverse engineer, decompile or disassemble any portion of the
| output generated using SDK elements for the purpose of
| translating such output artifacts to **target a non-NVIDIA
| platform**.
|
| Emphasis mine.
| bigyabai wrote:
| > Is the "Rust -> WebGPU -> SPIR-V -> MSL -> Metal" pipeline
| robust when it come to performance?
|
| It's basically the same concept as Apple's Clang optimizations,
| but for the GPU. SPIR-V is an IR just like the one in LLVM,
| which can be used for system-specific optimization. In theory,
| you can keep the one codebase to target any number of supported
| raster GPUs.
|
| The Julia -> Metal stack is comparatively not very portable,
| which probably doesn't matter if you write Audio Unit plugins.
| But I could definitely see how the bigger cross-platform devs
| like u-he or Spectrasonics would value a more complex SPIR-V
| based pipeline.
| jdbohrman wrote:
| Why though
| gedw99 wrote:
| I am overjoyed to see this.
|
| They are doing a huge service for developers who just want to
| build stuff and not get into the platform wars.
|
| https://github.com/cogentcore/webgpu is a great example. I code
| in Go and just need stuff to work on everything, and this gets
| it done, so I can use the GPU on everything.
|
| Thank you, Rust!
| ivanjermakov wrote:
| Is it really "Rust" on GPU? Skimming through the code, it looks
| like shader language within proc macro heavy Rust syntax.
|
| I think GPU programming is different enough to require special
| care. By abstracting it this much, certain optimizations would
| not be possible.
| dvtkrlbs wrote:
| It is normal Rust code compiled to SPIR-V bytecode.
| LegNeato wrote:
| And it uses 3rd party deps from crates.io that are completely
| GPU unaware.
| max-privatevoid wrote:
| It would be great if Rust people learned how to properly load GPU
| libraries first.
| zbentley wrote:
| Say more?
| max-privatevoid wrote:
| Rust GPU libraries such as wgpu and ash rely on external
| libraries such as vulkan-loader to load the actual ICDs, but
| for some reason Rust people really love dlopening them
| instead of linking to them normally. Then it's up to the
| consumer to configure their linker flags correctly so RPATH
| gets set correctly when needed, but because most people don't
| know how to use their linker, they usually end up with dumb
| hacks like these instead:
|
| https://github.com/Rust-GPU/rust-
| gpu/blob/87ea628070561f576a...
|
| https://github.com/gfx-
| rs/wgpu/blob/bf86ac3489614ed2b212ea2f...
| LegNeato wrote:
| Can you file a bug on rust-gpu? I'd love to look into it (I
| am unfamiliar with this area).
| max-privatevoid wrote:
| Done. https://github.com/Rust-GPU/rust-gpu/issues/351
| guipsp wrote:
| Where's the dumb hack?
| viraptor wrote:
| Isn't it typically done with Vulkan, because apps don't
| know which library you wanted to use later? On a multi-gpu
| system you may want to switch (for example) Intel/NVIDIA
| implementation at runtime rather than linking directly.
| bobajeff wrote:
| I applaud the attempt this project and the GPU Working Group are
| making here. I can't overstate it: any effort to make the
| developer experience for heterogeneous compute (CUDA, ROCm, SYCL,
| OpenCL), or even just GPUs (Vulkan, Metal, DirectX, WebGPU),
| nicer, more cohesive, and less fragmented has a whole lot of work
| ahead of it.
| slashdev wrote:
| This is a little crude still, but the fact that this is even
| possible is mind blowing. This has the potential, if progress
| continues, to break the vendor-locked nightmare that is GPU
| software and open up the space to real competition between
| hardware vendors.
|
| Imagine a world where machine learning models are written in Rust
| and can run on both Nvidia and AMD.
|
| To get max performance you likely have to break the abstraction
| and write some vendor-specific code for each, but that's an
| optimization problem. You still have a portable kernel that runs
| cross platform.
| bwfan123 wrote:
| > Imagine a world where machine learning models are written in
| Rust and can run on both Nvidia and AMD
|
| Not likely in the next decade, if ever. Unfortunately, the
| entire ecosystems of JAX and Torch are Python-based. Imagine
| retraining all those devs to use Rust tooling.
| willglynn wrote:
| You might be interested in https://burn.dev, a Rust machine
| learning framework. It has CUDA and ROCm backends among others.
| slashdev wrote:
| I am interested, thanks for sharing!
| reactordev wrote:
| This is amazing. With time, we should be able to write GPU
| programs semantically identical to user land programs.
|
| The implications of this for inference is going to be huge.
| 5pl1n73r wrote:
| CUDA programming consists of making a kernel parameterized by
| its thread id, which is used to slightly alter its behavior
| while it executes on many hundreds of GPU cores; it's very
| different from general-purpose programming. Memory and
| branching behave differently there. I'd say at best, it will be
| like traditional programs and libraries with multiple
| incompatible event loops.
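|
| To illustrate the shape of that model in plain Rust (on a GPU
| the body runs once per thread; the loop below is only a CPU
| stand-in for the hardware scheduler):
|
|     // "kernel": behavior varies only through the thread id
|     fn saxpy_kernel(tid: usize, a: f32,
|                     x: &[f32], y: &mut [f32]) {
|         if tid < x.len() {
|             y[tid] += a * x[tid];
|         }
|     }
|
|     fn main() {
|         let x = vec![1.0_f32; 1024];
|         let mut y = vec![2.0_f32; 1024];
|         // the GPU would run these "threads" in parallel
|         for tid in 0..x.len() {
|             saxpy_kernel(tid, 3.0, &x, &mut y);
|         }
|     }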
| melodyogonna wrote:
| Very interesting. I wonder about the model of storing the GPU IR
| in binary for a real-world project; it seems like that could
| bloat the binary size a lot.
|
| I also wonder about the performance of just compiling for a
| target GPU AOT. These GPUs can be very different even if they
| come from the same vendor. This seems like it would compile to
| the lowest common denominator for each vendor, leaving
| performance on the table. For example, Nvidia H100s and Nvidia
| Blackwell GPUs are different beasts, with specialised intrinsics
| that are not shared, and to generate a PTX that would work on
| both would require not using specialised features in one or both
| of these GPUs.
|
| Mojo solves these problems by JIT compiling GPU kernels at the
| point where they're launched.
| LegNeato wrote:
| The underlying projects support loading at runtime, so you
| could have as many AOT compiled kernel variants and load the
| one you want. Disk is cheap. You could even ship rustc if you
| really wanted for "JIT" (lol), but maybe not super crazy as
| Mojo is LLVM-based anyway. There is of course the warmup /
| stuttering problem, but that is a separate issue and
| (sometimes) not an issue for compute vs graphics where it is a
| bigger issue. I have some thoughts on how to improve the status
| quo with things unique to Rust but too early to know.
|
| One of the issues with GPUs as a platform is runtime probing of
| capabilities is... rudimentary to say the least. Rust has to
| deal with similar stuff with CPUs+SIMD FWIW. AOT vs JIT is not
| a new problem domain and there are no silver bullets only
| tradeoffs. Mojo hasn't solved anything in particular, their
| position in the solution space (JIT) has the same tradeoffs as
| anyone else doing JIT.
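|
| Concretely, "many variants" just means embedding several AOT
| artifacts and picking one at runtime; a rough sketch (file
| names and the capability check are made up):
|
|     // one PTX blob per GPU generation, produced at build time
|     const PTX_SM80: &[u8] = include_bytes!("kernels_sm80.ptx");
|     const PTX_SM90: &[u8] = include_bytes!("kernels_sm90.ptx");
|
|     fn kernel_for_device(cc: u32) -> &'static [u8] {
|         // cc = compute capability reported by the driver;
|         // pick the closest variant we shipped
|         if cc >= 90 { PTX_SM90 } else { PTX_SM80 }
|     }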
| melodyogonna wrote:
| > The underlying projects support loading at runtime, so you
| could have as many AOT compiled kernel variants and load the
| one you want.
|
| I'm not sure I understand. What underlying projects? The only
| reference to loading at runtime I see on the post is loading
| the AOT-compiled IR.
|
| > There is of course the warmup / stuttering problem, but
| that is a separate issue and (sometimes) not an issue for
| compute vs graphics where it is a bigger issue.
|
| It is worth noting that Mojo itself is not JIT
| compiled; Mojo has a GPU infrastructure that can JIT compile
| Mojo code at runtime [1].
|
| > One of the issues with GPUs as a platform is runtime
| probing of capabilities is... rudimentary to say the least.
|
| Also not an issue in Mojo when you combine the selective JIT
| compilation I mentioned with powerful compile-time
| programming [2].
|
| 1. https://docs.modular.com/mojo/manual/gpu/intro-
| tutorial#4-co... - here the kernel "print_threads" will be
| jit compiled
|
| 2. https://docs.modular.com/mojo/manual/parameters/
___________________________________________________________________
(page generated 2025-07-26 23:00 UTC)