[HN Gopher] VUDA: A Vulkan Implementation of CUDA
___________________________________________________________________
VUDA: A Vulkan Implementation of CUDA
Author : tormeh
Score : 256 points
Date : 2023-07-01 13:00 UTC (10 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| netheril96 wrote:
| Just in case other people who have AMD GPU and run Windows have
| the same needs as I have, that is, to train or run machine
| learning models, please checkout torch-directml and tensorflow-
| directml.
| bornfreddy wrote:
| Does that work? I might be in the market for a new GPU if AMD
| had something that beats NVidia for ML (for a sane price)... I
| can't really justify buying an NVidia GPU; anything decent is
| too expensive.
| skocznymroczny wrote:
| How well does it work? Last time I tried DirectML it wasn't
| well supported and there was little software that supported
| it. The performance also didn't seem great. I am currently
| using a Linux install because with ROCm I can use popular
| tools like the Automatic1111 webui and oobabooga.
| HarHarVeryFunny wrote:
| I'm not sure this really makes any more sense than AMD chasing
| CUDA compatibility with ROCm/MIOpen/HIP. CUDA and DirectX seem
| too low level to be used as a compatibility API over widely
| divergent hardware (AMD vs NVidia) without giving up a lot of
| performance.
|
| cuDNN being higher level offers more opportunity for
| compatibility without losing performance (i.e. different
| implementations of kernels fine-tuned for optimal performance
| on AMD vs NVidia hardware), but the trouble is that so much of
| what frameworks like PyTorch do is based on custom kernels, not
| just cuDNN.
|
| It seems the best bet for AMD would be a rock solid low level
| API (not a moving target) and support of high level optimizing
| ML compilers to reduce the level of effort for the framework
| (PyTorch, TensorFlow, JAX ...) vendors to provide framework-
| level support on top of that. Ultimately they'd need to work
| very closely with the framework vendors to provide this
| support, since they are the ones who would be benefiting from
| it.
|
| It's odd how much of an afterthought ML support has seemed to
| be for AMD over the years... maybe the relative size of the
| consumer ML market vs graphics/gaming market didn't seem to
| make it worth their effort, but as NVidia has shown this is a
| path to gaining much more lucrative data center wins.
| gymbeaux wrote:
| Things like this pop up relatively often but they never pick up
| steam and I am still using Nvidia GPUs. I would imagine this is
| no different.
| wtcactus wrote:
| Seems dead. Last commit was February 2022.
| qwertox wrote:
| And that was only one line added. Most of the code is 3 to 5
| years old.
| xeonmc wrote:
| Huge missed opportunity to call it "Vuudoo"
| panzi wrote:
| There needs to be a 3rd implementation called SHUDA.
| jaimex2 wrote:
| I have no hope for AMD. They should have made compatibility tools
| yesterday.
| saboot wrote:
| That is quite interesting... so I should be able to run my
| CUDA-accelerated programs on AMD and Intel devices then,
| correct?
| josalhor wrote:
| As someone who has never programmed directly for a GPU, how does
| this compare to HIP? Can this be an efficient abstraction over
| Nvidia and AMD GPUs?
| westurner wrote:
| From https://news.ycombinator.com/item?id=34399633 :
|
| >>> _hipify-clang is a clang-based tool for translating CUDA
| sources into HIP sources. It translates CUDA source into an
| abstract syntax tree, which is traversed by transformation
| matchers. After applying all the matchers, the output HIP
| source is produced. [...]_
|
| (Edit) CUDA APIs supported by hipify-clang:
| https://rocm.docs.amd.com/projects/HIPIFY/en/latest/supporte...
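| As a rough illustration (my own sketch, not from the HIPIFY
| docs), the translation is mostly a mechanical renaming of the
| runtime API; the kernel and launch syntax carry over:
|
|     // CUDA input (what hipify-clang consumes):
|     #include <cuda_runtime.h>
|
|     __global__ void scale(float *x, float a, int n) {
|         int i = blockIdx.x * blockDim.x + threadIdx.x;
|         if (i < n) x[i] *= a;
|     }
|
|     void run(float *h_x, int n) {
|         float *d_x = nullptr;
|         cudaMalloc((void **)&d_x, n * sizeof(float));
|         cudaMemcpy(d_x, h_x, n * sizeof(float),
|                    cudaMemcpyHostToDevice);
|         scale<<<(n + 255) / 256, 256>>>(d_x, 2.0f, n);
|         cudaFree(d_x);
|     }
|
|     // In the HIP output, <hip/hip_runtime.h> replaces the CUDA
|     // header, and hipMalloc / hipMemcpy / hipMemcpyHostToDevice
|     // replace the cuda* calls; the __global__ kernel and the
|     // <<<...>>> launch come through unchanged.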
| nightowl_games wrote:
| How does this relate to the goals outlined by George Hotz to
| bring ML to AMD chips and break the Nvidia dominance?
|
| I'm not an expert here, but this approach seems powerful and
| important. But this system seems complex enough to doubt an
| individual's ability to build it. It seems like this would
| need a corporate sponsor to get off the ground. Perhaps AMD
| itself would be interested in paying engineers to iterate on
| this?
| jitl wrote:
| Hotz is talking about drivers too, not just user space
| libraries.
|
| > The software is terrible! There's kernel panics in the
| driver. You have to run a newer kernel than the Ubuntu default
| to make it remotely stable. I'm still not sure if the driver
| supports putting two cards in one machine, or if there's some
| poorly written global state. When I put the second card in and
| run an OpenCL program, half the time it kernel panics and you
| have to reboot.
|
| He also talks about user space stuff, but clearly he thinks
| the whole stack, above and below this kind of library, also
| needs a lot of work.
| cwillu wrote:
| Par for the course with new kernel things: it's unusual for
| something new in the kernel to be stable in the distro
| kernels unless they've devoted a great deal of effort to
| backporting things.
| arthur2e5 wrote:
| The way conservative distros define "stable" is part of the
| problem. For things less than 3 years old, going for stale
| versions often runs counter to "stable".
| iforgotpassword wrote:
| This is still so mind-boggling to me. AMD should be in a good
| financial position now that Zen has been such a success and
| their GPUs are catching up too. Why are their drivers still a
| clusterfuck across the board after all these years? Why not
| throw more manpower at the problem?
|
| I'm sure even if their GPUs were twice as fast as Nvidia's,
| everybody would still buy team green because it's better to
| have a card that works than a broken piece of garbage. We
| tried to get an MI50 to work reliably at work with KVM, but
| that thing was a complete dumpster fire. A colleague just
| bought a 7900XTX for gaming and spent days getting it to
| work. This included three Windows reinstalls. And that use
| case is gaming on Windows, which supposedly is the best
| supported case. It only gets worse from there. Compute on
| Linux? Lol.
|
| Now last time this topic came up, someone claimed that AMD is
| pretty much at their limits production wise, and there are a
| few unnamed large companies buying loads of their cards for
| compute and cloud gaming, and AMD basically has engineers
| dedicated to making sure things work exactly for their use
| case, so they don't have to really care about the rest.
| Sounds pretty wild, but not completely unrealistic...
| derstander wrote:
| > And that use case is gaming on Windows, which supposedly
| is the best supported case.
|
| I'm being a little tongue-in-cheek here, but the best
| supported case for AMD is gaming via console: AMD provides
| CPU/GPU for the current generation of both the XBox _and_
| PlayStation consoles.
|
| Which suggests to me that they shouldn't have too much
| problem supporting their hardware on Windows or Linux. But
| that's outside of my area of expertise. Maybe they need to
| spend too much engineer effort and time supporting the
| consoles at what's probably a pretty thin profit margin?
| Narishma wrote:
| The previous generation as well, and the one before for
| Xbox 360 and Wii, and the one before for Gamecube.
| roadbuster wrote:
| > Why not throw more manpower at the problem?
|
| Oh, of course.
| iforgotpassword wrote:
| No, really. We've worked with Intel, Nvidia and AMD...
| Well for the latter, at least tried. We're not a big
| fish, but response time and quality of responses were
| stellar with Intel and Nvidia. AMD took weeks and even
| when asking very precise questions with lots of technical
| background, there were a lot of "hmm, dunno, have to find
| someone who'd know" kind of answers, and it would often
| take one to two weeks for a single reply. And that's not
| even dev work, it's just tech support for your own damn
| stuff you're trying to sell.
|
| You can't seriously tell me that's not something they
| could fix.
| hedora wrote:
| I've had the opposite experience under Linux.
|
| My old Nvidia 570's drivers went into severe bitrot. Basic
| stuff like screensavers and desktops broke badly, and games
| were flaky. The card is still more than powerful enough for
| what I used it for.
|
| I switched to AMD, with open source drivers. I get Windows-
| level performance on AAA (and indie) games in Steam, and zero
| compatibility issues with the rest of the Linux ecosystem.
| amlib wrote:
| AMD GPU support in linux is a bit of a flip flop. Some
| generations seem to get a lot of love and work really
| damn well (sometimes with better and broader support than
| the windows drivers) like RDNA2 and most Polaris cards.
| Others such as Vega and specially now RDNA3 are a
| shitshow with a lot of things just broken.
| tyfon wrote:
| I ran a Vega 64 on Linux for 3-4 years; it was really nice.
| It also worked without bugs with Proton. The 3060 I have now
| gives me a lot of artifacts, like incorrect lighting, and
| even X crashes once in a while.
|
| I'm considering switching back to an AMD card due to
| this.
| gcoakes wrote:
| I used a Vega64 for years and just bought a 6000 series.
| Both work great on my machines. I'm typically running the
| bleeding edge kernels though, so that might explain it a
| bit. I would think Ubuntu is probably the most supported
| if you opt for the OEM kernel.
| aseipp wrote:
| It really depends on the card. I have an old RDNA
| workstation card in my server and the driver was a real
| crapshoot. I eventually started delaying updates to newer
| kernel releases (which would fix other bugs!) because
| there would be regressions of various kinds. Graphics
| under Linux is still a bit painful after all these years.
|
| At least Nvidia finally open sourced their driver too, I
| guess. And Intel is still open source. But it still sucks a
| bit, I think, unless you do your research.
| izacus wrote:
| Why are you talking about screensavers in a CUDA thread,
| though? You can't run compute on a fancy screensaver
| animation; you need the working CUDA driver that Nvidia
| provides for that.
| kbenson wrote:
| Likely because this subsection of the thread seems more
| focused on drivers in general and less on CUDA, if the
| other replies are anything to go by.
| fiddlerwoaroof wrote:
| Yeah, I've always bought AMD for this reason since some
| really bad experiences with Nvidia cards.
| varelse wrote:
| [dead]
| imtringued wrote:
| The only hope is rusticl, and it's happening in spite of AMD.
| roenxi wrote:
| Hmm. I found
|
| https://www.youtube.com/watch?v=Mr0rWJhv9jU
|
| and
|
| https://geohot.github.io/blog/jekyll/update/2023/06/07/a-div...
|
| I feel a lot better about my journey with AMD now; there seemed
| to be some major issues with their GPU drivers. Now I know it
| wasn't just me.
| DrNosferatu wrote:
| Sounds great!
|
| How far is it actually compatible right now?
|
| Are there any tests / benchmarks?
|
| Can this be used to run CUDA-accelerated LLMs?
| syntaxers wrote:
| > VUDA only takes kernels in SPIR-V format. VUDA does not
| provide any support for compiling CUDA C kernels directly to
| SPIR-V (yet). However, it does not know or care how the SPIR-V
| source was created - be it GLSL, HLSL, or OpenCL.
|
| So the answer is no, it can't be used with kernels that use
| cublas or cudnn, which excludes almost all ML use-cases.
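| For reference, the host side does look strikingly like CUDA
| runtime code - roughly like this (paraphrased from memory of
| the README, so the header name and exact launchKernel
| signature may be off; "add.spv" is a precompiled compute
| shader):
|
|     #include <vuda_runtime.hpp>
|
|     int main() {
|         vuda::setDevice(0);
|         const int N = 1024;
|         int a[N], b[N], c[N];
|         // ... fill a and b ...
|         int *dev_a, *dev_b, *dev_c;
|         vuda::malloc((void **)&dev_a, N * sizeof(int));
|         vuda::malloc((void **)&dev_b, N * sizeof(int));
|         vuda::malloc((void **)&dev_c, N * sizeof(int));
|         vuda::memcpy(dev_a, a, N * sizeof(int),
|                      vuda::memcpyHostToDevice);
|         vuda::memcpy(dev_b, b, N * sizeof(int),
|                      vuda::memcpyHostToDevice);
|         // "add.spv" was compiled to SPIR-V beforehand (from
|         // GLSL, HLSL, OpenCL C, ...); VUDA only loads it.
|         const int stream = 0, blocks = 128, threads = 128;
|         vuda::launchKernel("add.spv", "main", stream,
|                            blocks, threads,
|                            dev_a, dev_b, dev_c, N);
|         vuda::memcpy(c, dev_c, N * sizeof(int),
|                      vuda::memcpyDeviceToHost);
|         vuda::free(dev_a);
|         vuda::free(dev_b);
|         vuda::free(dev_c);
|     }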
| johndough wrote:
| The Vulkan spec does not enforce strict IEEE 754 floating point
| semantics, so perfect compatibility with CUDA is impossible.
|
| https://registry.khronos.org/vulkan/specs/1.3-khr-extensions...
|
| However, the deep learning field does not currently pay much
| attention to reproducibility, so this might not be a big
| issue.
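| For a concrete taste of what relaxed float semantics allow
| (a plain C++ sketch of my own, nothing CUDA- or Vulkan-
| specific): whether a*b+c is contracted into a single fused
| multiply-add changes the low bits of the result.
|
|     #include <cmath>
|     #include <cstdio>
|
|     // Build with -ffp-contract=off so the compiler does not
|     // fuse the two-step expression itself.
|     int main() {
|         double eps = 1.0 / (1 << 29);  // 2^-29
|         double a = 1.0 + eps, b = 1.0 - eps, c = -1.0;
|         // a*b rounds to exactly 1.0 in double, losing the
|         // eps*eps term that a single-rounding fma() keeps.
|         double two_steps = a * b + c;          // 0.0
|         double one_step = std::fma(a, b, c);   // -2^-58
|         std::printf("%.17g vs %.17g\n", two_steps, one_step);
|     }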
| my123 wrote:
| Memory management in Vulkan is _very_ restricted. Nothing
| remotely like UVM. For these reasons, CUDA on Vulkan will
| always stay a pet project at best, with no shot at usable
| quality whatsoever.
| empyrrhicist wrote:
| Doesn't look very active; does it still work?
| fancyfredbot wrote:
| It's not an implementation of CUDA, it's an implementation of the
| CUDA runtime API. The API is used to configure the card, allocate
| and copy memory, and run kernels. Importantly you cannot use this
| to write the actual kernels which run on the GPU!
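| To make the split concrete, here's a minimal sketch (mine, not
| from the repo; "add_one" is a made-up kernel): everything in
| main() below is the runtime API surface that VUDA reimplements,
| while the __global__ kernel is compiled by nvcc to PTX/SASS,
| and that part VUDA cannot give you.
|
|     #include <cuda_runtime.h>
|     #include <vector>
|
|     // NOT the runtime API: device code, compiled by nvcc.
|     __global__ void add_one(float *x, int n) {
|         int i = blockIdx.x * blockDim.x + threadIdx.x;
|         if (i < n) x[i] += 1.0f;
|     }
|
|     int main() {
|         int n = 1 << 20;
|         std::vector<float> h(n, 0.0f);
|         float *d = nullptr;
|         cudaSetDevice(0);                  // configure the card
|         cudaMalloc((void **)&d, n * sizeof(float)); // allocate
|         cudaMemcpy(d, h.data(), n * sizeof(float),  // copy
|                    cudaMemcpyHostToDevice);
|         add_one<<<(n + 255) / 256, 256>>>(d, n);    // run
|         cudaMemcpy(h.data(), d, n * sizeof(float),
|                    cudaMemcpyDeviceToHost);
|         cudaFree(d);
|         return 0;
|     }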
| pjmlp wrote:
| Additionally, any wannabe CUDA replacement needs to support
| PTX and polyglot development, or it is a non-starter for lots
| of workloads.
| RicoElectrico wrote:
| So what is this useful for, then?
| sheepscreek wrote:
| I was half hoping this meant running CUDA code on AMD GPUs.
| Thanks for clarifying.
| empyrrhicist wrote:
| I know AMD has a whole bunch of (related?) projects for GPU
| compute, but man - if they could just provide an interop
| layer that Just Works they'd get immediate access to so much
| more market share.
| collsni wrote:
| It is coming, from what I can tell.
| meragrin_ wrote:
| It's been coming for years now. It will probably be years
| before it really is here.
| sanxiyn wrote:
| Eh well, it is very close to just working. From "Training
| LLMs with AMD MI250 GPUs and MosaicML":
|
| > It all just works. No code changes were needed.
|
| https://www.mosaicml.com/blog/amd-mi250
| paulmd wrote:
| "Just works" in this context means executing the compiled
| CUDA or the PTX bytecode without recompiling. Nobody is
| ever going to utilize ROCm if it requires distributing as
| source and recompiling.
|
| To make it even more insulting, simply installing ROCm is
| itself a massive burden, even on ostensibly-supported hardware
| (as geohot discovered). And even "it works out of the box if
| you distribute source and compile it locally" ignores that
| whole massive "draw the rest of the owl" stage of getting
| ROCm installed and building properly in your environment.
| causality0 wrote:
| Don't forget AMD doesn't seem to even care about ROCm
| themselves. Six months in and RDNA3 cards still don't
| support it. Can you imagine if Nvidia launched RTX40-
| cards with no DLSS even though 30- cards already had it,
| and six months started boasting about how DLSS support
| was "coming this fall"?
| i80and wrote:
| I've been running PyTorch on my Radeon 7900 XT using
| ROCm. Is that not supposed to work?
| graphe wrote:
| No, it actually isn't supposed to work; it's not officially
| supported. https://sep5.readthedocs.io/en/latest/Installation_Guide/Ins...
| i80and wrote:
| Fascinating. And yet.
| wmf wrote:
| ROCm is for CDNA not RDNA. It has limited, best-effort
| RDNA support for a few cards.
| heyoni wrote:
| Then they too could call themselves an AI company!
| empyrrhicist wrote:
| I've been hoping for it for so long - I wonder if there's
| enough interest that someone could do a GoFundMe to hire at
| least one full-time dev lol.
| jdoerfert wrote:
| Shameless plug: https://www.osti.gov/servlets/purl/1892137
|
| TL;DR: If you provide even more functions through the
| overloaded headers, incl. "hidden ones", e.g.,
| `__cudaPushCallConfiguration`, you can use LLVM/Clang as a
| CUDA compiler and target AMD GPUs, the host, and soon GPUs of
| two other manufacturers.
| einpoklum wrote:
| 1. This implements the clunky C-ish API; there are also the
| modern C++ API wrappers, with automatic error checking, RAII
| resource control, etc.; see: https://github.com/eyalroz/cuda-api-
| wrappers (due disclosure: I'm the author)
|
| 2. Implementing the _runtime_ API is not the right choice; it's
| important to implement the _driver_ API, otherwise you can't
| isolate contexts, dynamically add newly-compiled JIT kernels via
| modules etc.
|
| 3. This is less than 3000 lines of code. Wrapping all of the core
| CUDA APIs (driver, runtime, NVTX, JIT compilation of CUDA-C++ and
| of PTX) took me > 14,000 LoC.
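| To illustrate point 2, here's a minimal driver-API sketch
| (simplified, error checks omitted; "my_kernel" is a
| placeholder name, and the PTX string would come from, e.g.,
| NVRTC) of what the runtime API can't express - an explicit
| context plus dynamic module loading:
|
|     #include <cuda.h>
|
|     void launch_from_ptx(const char *ptx, int n) {
|         CUdevice dev;
|         CUcontext ctx;
|         CUmodule mod;
|         CUfunction fn;
|         CUdeviceptr d_buf;
|
|         cuInit(0);
|         cuDeviceGet(&dev, 0);
|         cuCtxCreate(&ctx, 0, dev);    // isolated context; the
|                                       // runtime API hides this
|         cuModuleLoadData(&mod, ptx);  // JIT-load PTX at runtime
|         cuModuleGetFunction(&fn, mod, "my_kernel");
|         cuMemAlloc(&d_buf, n * sizeof(float));
|
|         void *args[] = { &d_buf, &n };
|         cuLaunchKernel(fn, (n + 255) / 256, 1, 1,  // grid
|                        256, 1, 1,                  // block
|                        0, nullptr, args, nullptr);
|         cuCtxSynchronize();
|
|         cuMemFree(d_buf);
|         cuModuleUnload(mod);
|         cuCtxDestroy(ctx);
|     }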
| rootw0rm wrote:
| nice project. this is why HN kicks ass
| einpoklum wrote:
| Thanks for the compliment :-)
|
| What I _really_ like to receive, though, is feedback from
| using the wrappers, ideas for changes/improvements, and of
| course messages volunteering to QA new versions before their
| release :-P
| AndrewKemendo wrote:
| This could be a big deal if we have an actual alternative to
| CUDA.
|
| I can't see NVIDIA letting this just exist.
| Ballas wrote:
| My opinion is that CUDA is not the moat keeping the others out
| - it's cuDNN (and cuBLAS), more specifically the level to
| which they are optimized.
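| As a sketch of what I mean (illustrative only): a framework
| never writes this GEMM itself, it hands the whole multiply to
| cuBLAS, and that is where years of per-architecture tuning
| live.
|
|     #include <cublas_v2.h>
|
|     // C = A * B for column-major float matrices, delegated
|     // entirely to the tuned library kernel.
|     void sgemm(cublasHandle_t h, int m, int n, int k,
|                const float *A, const float *B, float *C) {
|         const float alpha = 1.0f, beta = 0.0f;
|         cublasSgemm(h, CUBLAS_OP_N, CUBLAS_OP_N,
|                     m, n, k,
|                     &alpha, A, m, B, k,
|                     &beta, C, m);
|     }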
| roenxi wrote:
| I'm not even certain optimisation matters. I can crash my
| machine (AMD graphics) with a stock Debian install by letting
| something attempt BLAS on the GPU.
|
| The situation is starting to improve, though. I installed a
| bunch of libraries from https://repo.radeon.com/rocm/apt/5.4
| jammy main and the crashes got less frequent. I don't have a
| lot of faith in AMD to deliver reliable BLAS libraries at
| this point, but it could happen. The hardware is there, I
| just don't think they're prioritising supporting the right
| places in the distribution chain or supporting consumer-level
| graphics.
| metal_am wrote:
| How about something like MAGMA?
| [deleted]
| flykespice wrote:
| And what legal grounds would NVIDIA have to take this down?
| themoonisachees wrote:
| When you have Nvidia money you don't need grounds to sue; the
| lawyers will think of something and drag any open-source devs
| through years-long suits.
|
| The only saving grace would be Oracle v. Google, which
| established the precedent that an API isn't copyrightable.
| hedora wrote:
| APIs weren't copyrightable before Oracle v Google. There
| was plenty of precedent saying that. For example, before
| they were called Oracle, they built a clone of IBM SEQUEL.
|
| The main concern with Oracle v Google was that the court
| would ignore or misinterpret the existing precedent.
|
| A secondary concern was that a Google employee had formerly
| worked on Java at Sun (and/or Oracle) and copy-pasted some
| implementation source code from Oracle's code base to
| Google's. There was a real possibility the "APIs aren't
| copyrightable" precedent would stand, but the courts would
| rule that Google couldn't continue distributing Dalvik.
___________________________________________________________________
(page generated 2023-07-01 23:00 UTC)