[HN Gopher] VUDA: A Vulkan Implementation of CUDA
       ___________________________________________________________________
        
       VUDA: A Vulkan Implementation of CUDA
        
       Author : tormeh
       Score  : 256 points
       Date   : 2023-07-01 13:00 UTC (10 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | netheril96 wrote:
        | Just in case other people who have an AMD GPU and run Windows
        | have the same needs as I do, namely training or running
        | machine learning models: please check out torch-directml and
        | tensorflow-directml.
        
         | bornfreddy wrote:
          | Does that work? I might be in the market for a new GPU if
          | AMD had something that beats NVidia for ML (for a sane
          | price)... I can't really justify buying an NVidia GPU;
          | anything decent is too expensive.
        
         | skocznymroczny wrote:
          | How does it work? Last time I tried DirectML it wasn't well
          | supported and there was little software which supported it.
          | Also the performance seemed not too great. I am currently
          | using a Linux install because with ROCm I can use popular
          | tools like the Automatic1111 webui and oobabooga.
        
         | HarHarVeryFunny wrote:
         | I'm not sure this really makes any more sense than AMD chasing
          | CUDA compatibility with ROCm/MIOpen/HIP. CUDA and DirectX seem
         | too low level to be used as a compatibility API over widely
         | divergent hardware (AMD vs NVidia) without giving up a lot of
         | performance.
         | 
         | cuDNN being higher level offers more opportunity for
          | compatibility without losing performance (i.e. different
         | implementations of kernels fine-tuned for optimal performance
         | on AMD vs NVidia hardware), but the trouble is that so much of
         | what frameworks like PyTorch do is based on custom kernels, not
         | just cuDNN.
         | 
         | It seems the best bet for AMD would be a rock solid low level
         | API (not a moving target) and support of high level optimizing
         | ML compilers to reduce the level of effort for the framework
         | (PyTorch, TensorFlow, JAX ...) vendors to provide framework-
         | level support on top of that. Ultimately they'd need to work
         | very closely with the framework vendors to provide this
         | support, since they are the ones who would be benefiting from
         | it.
         | 
         | It's odd how much of an afterthought ML support has seemed to
         | be for AMD over the years... maybe the relative size of the
         | consumer ML market vs graphics/gaming market didn't seem to
         | make it worth their effort, but as NVidia has shown this is a
         | path to gaining much more lucrative data center wins.
        
       | gymbeaux wrote:
       | Things like this pop up relatively often but they never pick up
       | steam and I am still using Nvidia GPUs. I would imagine this is
       | no different.
        
       | wtcactus wrote:
       | Seems dead. Last commit was February 2022.
        
         | qwertox wrote:
          | And that was only one line added. Most of the code is 3 to
          | 5 years old.
        
       | xeonmc wrote:
       | Huge missed opportunity to call it "Vuudoo"
        
       | panzi wrote:
       | There needs to be a 3rd implementation called SHUDA.
        
       | jaimex2 wrote:
       | I have no hope for AMD. They should have made compatibility tools
       | yesterday.
        
       | saboot wrote:
        | That is quite interesting... so I should be able to run my
        | CUDA-accelerated programs on AMD and Intel devices then,
        | correct?
        
       | josalhor wrote:
       | As someone who has never programmed directly for a GPU, how does
       | this compare to HIP? Can this be an efficient abstraction over
       | Nvidia and AMD GPUs?
        
         | westurner wrote:
         | From https://news.ycombinator.com/item?id=34399633 :
         | 
         | >>> _hipify-clang is a clang-based tool for translating CUDA
         | sources into HIP sources. It translates CUDA source into an
         | abstract syntax tree, which is traversed by transformation
         | matchers. After applying all the matchers, the output HIP
         | source is produced. [...]_
         | 
         | (Edit) CUDA APIs supported by hipify-clang:
         | https://rocm.docs.amd.com/projects/HIPIFY/en/latest/supporte...
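          | 
          | As a toy illustration (mine, not from the HIPIFY docs) of
          | the kind of API renaming hipify performs - the real
          | hipify-clang translates via clang's AST, as quoted above,
          | not via string substitution - a hypothetical `toy_hipify`
          | sketch:

```python
# Toy sketch (editorial, not from the HIPIFY docs): a handful of
# well-known CUDA -> HIP runtime API renamings, applied here by plain
# string substitution. hipify-clang does this properly on clang's AST
# so that it can handle macros, kernel launch syntax, etc.
CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

def toy_hipify(source: str) -> str:
    """Rename CUDA runtime calls to their HIP equivalents."""
    for cuda_name, hip_name in CUDA_TO_HIP.items():
        source = source.replace(cuda_name, hip_name)
    return source

print(toy_hipify("cudaMalloc(&p, n); cudaDeviceSynchronize();"))
# -> hipMalloc(&p, n); hipDeviceSynchronize();
```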
        
       | nightowl_games wrote:
       | How does this relate to the goals outlined by George Hotz to
       | bring ML to AMD chips and break the Nvidia dominance?
       | 
        | I'm not an expert here but this approach seems powerful and
        | important. But the system seems complex enough that I doubt an
        | individual could build it alone. It seems like this would need
        | a corporate sponsor to get off the ground. Perhaps AMD itself
        | would be interested in paying engineers to iterate on this?
        
         | jitl wrote:
          | Hotz is talking about drivers too, not just user-space
          | libraries.
         | 
         | > The software is terrible! There's kernel panics in the
         | driver. You have to run a newer kernel than the Ubuntu default
         | to make it remotely stable. I'm still not sure if the driver
         | supports putting two cards in one machine, or if there's some
         | poorly written global state. When I put the second card in and
         | run an OpenCL program, half the time it kernel panics and you
         | have to reboot.
         | 
         | He also talks about user space stuff but clearly he thinks the
         | whole stack, above and below this kind of library also needs a
         | lot of work.
        
           | cwillu wrote:
            | Par for the course with new kernel things: it's unusual
            | for something new in the kernel to be stable in distro
            | kernels unless the distro has devoted a great deal of
            | effort to backporting.
        
             | arthur2e5 wrote:
             | The way conservative distros define "stable" is part of the
             | problem. For things less than 3 years old, going for stale
             | versions often runs counter to "stable".
        
           | iforgotpassword wrote:
            | This is still so mind-boggling to me. AMD should be in a
            | good financial position now that Zen has been such a
            | success and their GPUs are catching up too. Why are their
            | drivers still a clusterfuck across the board after all
            | these years? Why not throw more manpower at the problem?
           | 
           | I'm sure even if their GPUs were twice as fast as Nvidia's,
           | everybody would still buy team green because it's better to
           | have a card that works than a broken piece of garbage. We
           | tried to get an MI50 to work reliably at work with KVM, but
           | that thing was a complete dumpster fire. A colleague just
           | bought a 7900XTX for gaming and spent days getting it to
           | work. This included three Windows reinstalls. And that use
           | case is gaming on Windows, which supposedly is the best
           | supported case. It only gets worse from there. Compute on
           | Linux? Lol.
           | 
           | Now last time this topic came up, someone claimed that AMD is
           | pretty much at their limits production wise, and there are a
           | few unnamed large companies buying loads of their cards for
           | compute and cloud gaming, and AMD basically has engineers
           | dedicated to making sure things work exactly for their use
           | case, so they don't have to really care about the rest.
           | Sounds pretty wild, but not completely unrealistic...
        
             | derstander wrote:
             | > And that use case is gaming on Windows, which supposedly
             | is the best supported case.
             | 
             | I'm being a little tongue-in-cheek here, but the best
             | supported case for AMD is gaming via console: AMD provides
             | CPU/GPU for the current generation of both the XBox _and_
             | PlayStation consoles.
             | 
             | Which suggests to me that they shouldn't have too much
             | problem supporting their hardware on Windows or Linux. But
             | that's outside of my area of expertise. Maybe they need to
              | spend too much engineering effort and time supporting the
             | consoles at what's probably a pretty thin profit margin?
        
               | Narishma wrote:
                | The previous generation as well, and the one before
                | that for the Xbox 360 and Wii, and the one before
                | that for the GameCube.
        
             | roadbuster wrote:
             | > Why not throw more manpower at the problem?
             | 
             | Oh, of course.
        
               | iforgotpassword wrote:
               | No, really. We've worked with Intel, Nvidia and AMD...
               | Well for the latter, at least tried. We're not a big
               | fish, but response time and quality of responses were
               | stellar with Intel and Nvidia. AMD took weeks and even
               | when asking very precise questions with lots of technical
               | background, there was a lot of "hmm dunno have to find
               | someone who'd know" kind of answers, and it would often
               | take one to two weeks for a single reply. And that's not
               | even dev work, it's just tech support for your own damn
               | stuff you're trying to sell.
               | 
               | You can't seriously tell me that's not something they
               | could fix.
        
             | hedora wrote:
             | I've had the opposite experience under Linux.
             | 
             | My old nvidia 570's drivers went into severe bitrot. Basic
             | stuff like screensavers and desktops broke badly, and games
             | were flaky. The card is still more than powerful enough for
             | what I used it for.
             | 
             | I switched to AMD, with open source drivers. I get windows-
             | level performance on AAA (and indie) games in steam, and
             | zero compatibility issues with the rest of the Linux
             | ecosystem.
        
               | amlib wrote:
               | AMD GPU support in linux is a bit of a flip flop. Some
               | generations seem to get a lot of love and work really
               | damn well (sometimes with better and broader support than
               | the windows drivers) like RDNA2 and most Polaris cards.
                | Others such as Vega and especially now RDNA3 are a
               | shitshow with a lot of things just broken.
        
               | tyfon wrote:
               | I ran vega 64 in linux for 3-4 years, it was really nice.
                | It also worked without bugs with Proton; the 3060 I
                | have now gives me a lot of artifacts like incorrect
                | lighting, and X even crashes once in a while.
               | 
               | I'm considering switching back to an AMD card due to
               | this.
        
               | gcoakes wrote:
               | I used a Vega64 for years and just bought a 6000 series.
               | Both work great on my machines. I'm typically running the
               | bleeding edge kernels though, so that might explain it a
               | bit. I would think Ubuntu is probably the most supported
               | if you opt for the OEM kernel.
        
               | aseipp wrote:
               | It really depends on the card. I have an old RDNA
               | workstation card in my server and the driver was a real
               | crapshoot. I eventually started delaying updates to newer
               | kernel releases (which would fix other bugs!) because
               | there would be regressions of various kinds. Graphics
               | under Linux is still a bit painful after all these years.
               | 
                | At least Nvidia finally open-sourced their driver
                | too, I guess. And Intel's is still open source. But
                | it still sucks
               | a bit I think unless you do research.
        
               | izacus wrote:
               | Why are you talking about screensavers in a CUDA thread
               | though? You can't compute on a fancy screensaver
               | animation, you need a working CUDA driver that nVidia
               | provides for that.
        
               | kbenson wrote:
               | Likely because this subsection of the thread seems more
               | focused on drivers in general and less on CUDA, if the
               | other replies are anything to go by.
        
               | fiddlerwoaroof wrote:
               | Yeah, I've always bought AMD for this reason since some
               | really bad experiences with Nvidia cards.
        
           | varelse wrote:
           | [dead]
        
           | imtringued wrote:
            | The only hope is rusticl, and it happens in spite of AMD.
        
         | roenxi wrote:
         | Hmm. I found
         | 
         | https://www.youtube.com/watch?v=Mr0rWJhv9jU
         | 
         | and
         | 
         | https://geohot.github.io/blog/jekyll/update/2023/06/07/a-div...
         | 
         | I feel a lot better about my journey with AMD now; there seemed
         | to be some major issues with their GPU drivers. Now I know it
         | wasn't just me.
        
       | DrNosferatu wrote:
       | Sounds great!
       | 
       | How far is it actually compatible right now?
       | 
       | Are there any tests / benchmarks?
       | 
       | Can this be used to run CUDA-accelerated LLMs?
        
         | syntaxers wrote:
          | > VUDA only takes kernels in SPIR-V format. VUDA does not
         | provide any support for compiling CUDA C kernels directly to
         | SPIR-V (yet). However, it does not know or care how the SPIR-V
         | source was created - may it be GLSL, HLSL, OpenCL.
         | 
          | So the answer is no: it can't be used with kernels that use
          | cuBLAS or cuDNN, which excludes almost all ML use cases.
        
         | johndough wrote:
         | The Vulkan spec does not enforce strict IEEE 754 floating point
         | semantics, so perfect compatibility with CUDA is impossible.
         | 
         | https://registry.khronos.org/vulkan/specs/1.3-khr-extensions...
         | 
          | However, the deep learning field does not currently pay
          | much attention to reproducibility, so this might not be a
          | big issue.
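          | 
          | A minimal sketch (editorial; Python floats are IEEE 754
          | doubles) of why relaxed evaluation rules break bit-exact
          | results: floating point addition is not associative, so a
          | runtime that may reorder or regroup operations can change
          | the answer even though every individual operation is
          | correctly rounded.

```python
# Each addition below is correctly rounded IEEE 754 arithmetic, yet
# grouping the same three terms differently changes the result - the
# kind of divergence relaxed FP semantics permit.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c    # 0.6000000000000001
right = a + (b + c)   # 0.6
print(left == right)  # False
```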
        
           | my123 wrote:
            | Memory management in Vulkan is _very_ restricted - nothing
            | remotely like UVM. For these reasons, CUDA on Vulkan will
            | always stay a pet project at best, with no shot at usable
            | quality whatsoever.
        
       | empyrrhicist wrote:
       | Doesn't look very active, does it still work?
        
       | fancyfredbot wrote:
       | It's not an implementation of CUDA, it's an implementation of the
       | CUDA runtime API. The API is used to configure the card, allocate
       | and copy memory, and run kernels. Importantly you cannot use this
       | to write the actual kernels which run on the GPU!
        
         | pjmlp wrote:
         | Additionally, any wannabe CUDA replacement needs to support PTX
         | and polyglot development, or is a non starter for lots of
         | workloads.
        
         | RicoElectrico wrote:
         | So what is this useful for, then?
        
         | sheepscreek wrote:
         | I was half hoping this meant running CUDA code on AMD GPUs.
         | Thanks for clarifying.
        
           | empyrrhicist wrote:
           | I know AMD has a whole bunch of (related?) projects for GPU
           | compute, but man - if they could just provide an interop
           | layer that Just Works they'd get immediate access to so much
           | more market share.
        
             | collsni wrote:
              | It is coming, from what I can tell.
        
               | meragrin_ wrote:
                | It's been coming for years now. It will probably be
                | years before it really is here.
        
             | sanxiyn wrote:
             | Eh well, it is very close to just working. From "Training
             | LLMs with AMD MI250 GPUs and MosaicML":
             | 
             | > It all just works. No code changes were needed.
             | 
             | https://www.mosaicml.com/blog/amd-mi250
        
               | paulmd wrote:
               | "Just works" in this context means executing the compiled
               | CUDA or the PTX bytecode without recompiling. Nobody is
               | ever going to utilize ROCm if it requires distributing as
               | source and recompiling.
               | 
                | To make it even more insulting, even simply
                | installing ROCm itself is a massive burden, even on
                | an ostensibly-supported card (as geohot discovered).
                | And even "it works out of the box if you distribute
                | source and compile it locally" ignores that whole
                | massive "draw the rest of the owl" stage of getting
                | ROCm installed and building properly in your
                | environment.
        
               | causality0 wrote:
               | Don't forget AMD doesn't seem to even care about ROCm
               | themselves. Six months in and RDNA3 cards still don't
                | support it. Can you imagine if Nvidia launched RTX
                | 40-series cards with no DLSS even though 30-series
                | cards already had it, and six months later started
                | boasting about how DLSS support was "coming this
                | fall"?
        
               | i80and wrote:
               | I've been running PyTorch on my Radeon 7900 XT using
               | ROCm. Is that not supposed to work?
        
               | graphe wrote:
               | No, it actually isn't supposed to work, it's not
               | officially supported. https://sep5.readthedocs.io/en/late
               | st/Installation_Guide/Ins...
        
               | i80and wrote:
               | Fascinating. And yet.
        
               | wmf wrote:
               | ROCm is for CDNA not RDNA. It has limited, best-effort
               | RDNA support for a few cards.
        
             | heyoni wrote:
             | Then they too could call themselves an AI company!
        
               | empyrrhicist wrote:
               | I've been hoping for it for so long - I wonder if there's
               | enough interest that someone could do a GoFundMe to hire
               | at least one full time dev lol.
        
           | jdoerfert wrote:
           | Shameless plug: https://www.osti.gov/servlets/purl/1892137
           | 
           | TLDR; If you provide even more functions through the
           | overloaded headers, incl. "hidden ones", e.g.,
           | `__cudaPushCallConfiguration`, you can use LLVM/Clang as a
           | CUDA compiler and target AMD GPUs, the host, and soon GPUs of
           | two other manufacturers.
        
       | einpoklum wrote:
       | 1. This implements the clunky C-ish API; there's also the
       | Modern-C++ API wrappers, with automatic error checking, RAII
       | resource control etc.; see: https://github.com/eyalroz/cuda-api-
       | wrappers (due disclosure: I'm the author)
       | 
       | 2. Implementing the _runtime_ API is not the right choice; it's
       | important to implement the _driver_ API, otherwise you can't
       | isolate contexts, dynamically add newly-compiled JIT kernels via
       | modules etc.
       | 
       | 3. This is less than 3000 lines of code. Wrapping all of the core
       | CUDA APIs (driver, runtime, NVTX, JIT compilation of CUDA-C++ and
       | of PTX) took me > 14,000 LoC.
        
         | rootw0rm wrote:
         | nice project. this is why HN kicks ass
        
           | einpoklum wrote:
           | Thanks for the compliment :-)
           | 
           | What I _really_ like to receive, though, is feedback from
           | using the wrappers, ideas for changes/improvements, and of
           | course messages volunteering to QA new versions before their
           | release :-P
        
       | AndrewKemendo wrote:
       | This could be a big deal if we have an actual alternative to CUDA
       | 
       | I can't see NVIDIA letting this just exist
        
         | Ballas wrote:
          | My opinion is that CUDA is not the moat keeping the others
          | out - it's cuDNN (and cuBLAS), more specifically the level
          | to which they are optimized.
        
           | roenxi wrote:
           | I'm not even certain optimisation matters. I can crash my
           | machine (AMD graphics) with a stock Debian install by letting
           | something attempt BLAS on the GPU.
           | 
            | The situation is starting to improve though. I installed
            | a bunch of libraries from the apt source
            | "https://repo.radeon.com/rocm/apt/5.4 jammy main" and the
            | crashes got less frequent. I don't have a lot of faith in
            | AMD to deliver reliable BLAS libraries at this point, but
            | it could happen. The hardware is there; I just don't
            | think they're prioritising the right places in the
            | distribution chain or supporting consumer-level graphics.
        
             | metal_am wrote:
             | How about something like MAGMA?
        
           | [deleted]
        
         | flykespice wrote:
         | And on what legal ground would NVIDIA have to take this down?
        
           | themoonisachees wrote:
            | When you have Nvidia money you don't need grounds to sue;
            | the lawyers will think of something and drag any open-
            | source devs through years-long suits.
           | 
            | The only saving grace would be Oracle v. Google, which
            | established the precedent that an API isn't copyrightable.
        
             | hedora wrote:
              | APIs weren't copyrightable before Oracle v. Google;
              | there was plenty of precedent saying that. For example,
              | before they were called Oracle, they built a clone of
              | IBM's SEQUEL.
             | 
             | The main concern with Oracle v Google was that the court
             | would ignore or misinterpret the existing precedent.
             | 
              | A secondary concern was that a Google employee formerly
              | worked on Java at Sun (and/or Oracle) and had copy-
              | pasted some implementation source code from Oracle into
              | Google's codebases. There was a real possibility that
              | the "APIs aren't copyrightable" precedent would stand,
              | but that the courts would rule Google couldn't continue
              | distributing Dalvik.
        
       ___________________________________________________________________
       (page generated 2023-07-01 23:00 UTC)