[HN Gopher] AMD Publishes Open-Source Driver for GPU Virtualizat...
       ___________________________________________________________________
        
       AMD Publishes Open-Source Driver for GPU Virtualization, Radeon "In
       the Roadmap"
        
       Author : davidlt
       Score  : 175 points
       Date   : 2025-04-24 06:58 UTC (16 hours ago)
        
 (HTM) web link (www.phoronix.com)
 (TXT) w3m dump (www.phoronix.com)
        
       | janpmz wrote:
       | This article is almost unreadable for me. The ads change in size
       | and make the text jump. I'm adding it to NotebookLM now.
        
         | Mountain_Skies wrote:
         | The article is extremely light on details anyway. The most
         | important thing in it is the link to the repo at
         | https://github.com/amd/MxGPU-Virtualization
        
         | DistractionRect wrote:
          | Seems like using a sledgehammer to pound a nail. Why not
          | just use an ad blocker like uBlock Origin?
        
       | proxysna wrote:
       | That's pretty sick. Nice to see such things trickle down to
        | consumer GPUs.
        
       | seanhunter wrote:
       | It blows my mind how reliably AMD shoots itself in the foot. What
       | we want isn't that hard:
       | 
       | 1) Support your graphics cards on linux using kernel drivers that
       | you upstream. All of them. Not just a handful - all the ones you
       | sell from say 18 months ago till today.
       | 
        | 2) Make GPU acceleration actually work out of the box for
        | pytorch and tensorflow. Not some special fork or patched
        | version that you "maintain" on your website: the tip of the
        | main branch of both libraries should just compile out of the
        | box and give people GPU-accelerated ML.
       | 
        | This is table stakes, but it blows my mind that they keep
        | making press releases and promises like this, that things are
        | on the roadmap, without doing thing one and unfucking the
        | basic dev experience so people can actually use their GPUs
        | for real work.
       | 
        | How it actually is: 1) Some cards work with rocm, some cards
        | work with one of the other variations of BS libraries they
        | have come up with over the years. Some cards work with
        | amdgpu, but many only work with proprietary kernel drivers,
        | which means if you don't use precisely one of the
        | distributions and kernel versions that they maintain, you are
        | sool.
       | 
        | 2) Nothing whatsoever builds out of the box, and when you get
        | it to build, almost nothing runs GPU-accelerated. For me,
        | pytorch requires a special downgrade, a python downgrade and
        | a switch to a fork that AMD supposedly maintains. It doesn't
        | compile for me, and when I managed to beat it into a shape
        | where it compiled, it wouldn't run GPU-accelerated, even
        | though games use the GPU just fine. I have a GPU that is
        | supposedly current, so they are actively selling it, but can
        | I use it? Can I bollocks. Ollama won't talk to my GPU even
        | though it supposedly works with ROCm - it only works with
        | ROCm with some graphics cards. Tensorflow was a similar story
        | when I last tried it, although admittedly I didn't try as
        | hard as with pytorch.
       | 
       | Just make your shit work so that people can use it. It really
       | shouldn't be that hard. The dev experience with NVidia is a
       | million times better.
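The "works out of the box" bar being described above boils down to a check like this (a minimal sketch; on ROCm builds PyTorch exposes the GPU through the same torch.cuda API, so the same code should pass on either vendor's stack):

```python
# Minimal sanity check for GPU-accelerated PyTorch. ROCm builds expose
# the GPU through the torch.cuda API, so this is vendor-neutral.
import torch

print(torch.__version__)          # ROCm builds carry a "+rocm" suffix
print(torch.version.hip)          # HIP version string on ROCm, None on CUDA builds
print(torch.cuda.is_available())  # True only if a usable GPU was detected

if torch.cuda.is_available():
    x = torch.rand(1024, 1024, device="cuda")
    y = x @ x                     # runs on the GPU if the stack works
    print(y.shape)
```

If the last line never executes, the complaint above applies: the hardware is there, but the software stack can't see it.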
        
         | faust201 wrote:
          | IIRC there was only one AMD employee working to integrate
          | Linux-based things. Often the response was that things were
          | stuck in intellectual-property review, or with project
          | managers, etc. So even specs were not available.
        
         | logicchains wrote:
         | SemiAnalysis had a good article on this recently, basically the
          | reason AMD still sucks on the ML software side is that
          | their compensation for devs is significantly worse than at
          | competitors like NVidia, Google and OpenAI, so most of the
          | most competent devs go elsewhere.
        
         | bayindirh wrote:
          | AMD has two driver teams at this point: one for Linux/open
          | source, one for Catalyst/closed source, and they are not
          | allowed to interact.
          | 
          | That's because there are tons of IP and trade secrets
          | involved in driver development and optimization - sometimes
          | game-related, sometimes for patching a rogue application
          | which its developers can't or won't fix, etc.
          | 
          | GPU drivers ought to be easy, but in reality they are not.
          | The open source drivers are "vanilla" drivers without all
          | this case-dependent patching and optimization. They
          | actually work well out of the box for normal desktop
          | applications. I don't think there are any cards which do
          | (or will) not work with the open source kernel drivers, as
          | long as you use a sufficiently recent version.
         | 
         | ...and you mention ROCm.
         | 
          | I'm not sure what ROCm's intellectual-property
          | underpinnings are, but claiming lack of effort is a bit
          | unfair to AMD. Yes, software was never their strong suit,
          | but they're way better than they were 20 years ago. They
          | have a proper open source driver which works, and a whole
          | fleet of open source ROCm packages, which are rigorously
          | CI/CD-tested by their maintainers now.
         | 
          | Do not forget that some of the world's most powerful
          | supercomputers run on Instinct cards, and AMD is getting
          | tons of experience from these big players. If you think the
          | underpinnings of GPGPU libraries are easy, I can only say
          | that the reality is _very_ different. The simple things
          | people do with PyTorch and other very high level libraries
          | pull enormous tricks under the hood, and you're really
          | pushing the boundaries of the hardware, performance- and
          | capability-wise.
         | 
          | NVIDIA is not selling a tray full of switches and GPUs, and
          | requiring OEMs to integrate it as-is, for no reason. On the
          | other hand, the same NVIDIA acts very slowly to enable an
          | open source ecosystem.
         | 
         | So, yes, AMD is not in an ideal position right now, but calling
         | them incompetent doesn't help either.
         | 
          | P.S.: The company which fought for a completely open
          | source, HDMI 2.1-capable display driver is _AMD_, not
          | NVIDIA.
        
           | spockz wrote:
           | I accept that there are two teams for reasons that include
           | IP. However, Nvidia must have the same problem and they
           | appear not to be hamstrung by it. So what is the difference?
        
             | bayindirh wrote:
             | NVIDIA and AMD, from my experience, have completely
             | different cultures.
             | 
              | ATI started as a much more closed company. Then they
              | pivoted and started to open up their parts. They were
              | hamstrung by HDCP at one point, and decided to decouple
              | the HDCP block from the video accelerators at the
              | silicon level, to allow open source drivers to access
              | the video hardware without leaking or disabling HDCP
              | support. So they committed to opening what they have,
              | but when you have tons of legacy IP, things don't go
              | from 0 to 100 in a day. I want to remind you that
              | "game-dependent driver optimization" started pre-2000s.
              | That's how deeply rooted these codebases are.
             | 
              | NVIDIA took a different approach: indifference on the
              | surface, while their hardware became a bit more hostile
              | towards nouveau at every turn. Then they released some
              | specs to allow "hardware enablement" by nouveau, so
              | that the closed source drivers could be installed and
              | the card didn't blank-screen at boot.
             | 
              | Then, as they fought with kernel developers, with some
              | hard prodding by the kernel guys and some coercing by
              | RedHat, NVIDIA agreed to completely remove the "shim"
              | shenanigans, and moved the closed bits of the kernel
              | module into card firmware by revising the card
              | architecture. It's important to keep in mind that
              | NVIDIA's "open drivers" means an open, bare-bones
              | kernel module, a full-fledged and signed proprietary
              | firmware which can be used by closed source drivers
              | only, and a closed source GLX stack; for AMD it means
              | an open source kernel module, standard MESA libraries,
              | and a closed source firmware available to all drivers.
             | 
              | NVIDIA was in talks with the nouveau guys to allow them
              | to use the full-powered firmware with clock and power
              | management support, but I don't know where that went.
             | 
              | The CUDA environment is the same story. Yes, it works
              | very well, and it's a vast garden, but it's walled and
              | protected by an electrified fence and turrets. You're
              | all in, or all out.
        
               | markus_zhang wrote:
                | I wonder how much effort went into reverse
                | engineering Nvidia cards and drivers. Graphics card
                | drivers are a completely mythical beast to me, and I
                | guess they're among the most complicated drivers in
                | the hardware world.
        
             | mariusor wrote:
             | > Nvidia must have the same problem and they appear not to
             | be hamstrung by it
             | 
             | Probably because they don't have an open-source driver for
             | linux and they can focus on the proprietary one.
        
           | rwmj wrote:
           | A laundry list of excuses ... or a list of things to work on.
           | ("Why the hell do we have two driver teams?" - would be my #1
           | thing to fix if I was at AMD.)
        
             | bayindirh wrote:
              | I guess you don't understand how silicon and 3rd-party
              | IP licensing work. It took Intel a completely new GPU,
              | designed from scratch, to be able to open its drivers.
              | AMD did at least one revision of their silicon to
              | enable that kind of openness.
              | 
              | Yet the HDMI Forum said they can't implement an HDMI
              | 2.1-capable driver in the open, backed by some nasty
              | legal letters.
             | 
             | I have a couple of friends who wrote 3D engines from
             | scratch and debugged graphics drivers for their engines for
             | a living. It's a completely different jungle filled with
             | completely different beasts.
             | 
             | I think being able to call glxinfo on an AMD card running
             | with completely open drivers and being able to see
             | extensions from NVIDIA, AMD, SGI, IBM and others is a big
             | win already.
        
               | markus_zhang wrote:
                | Thanks. All of this sounds interesting.
                | 
                | What do those 3D engines look like? Custom-made AAA
                | engines at companies such as EA or Ubisoft?
        
               | gessha wrote:
               | But who holds these patents? And what are they about?
        
               | pja wrote:
               | DRM mostly.
        
             | jeroenhd wrote:
             | The "fix" would be to make games perform like shit on
             | Windows and disable HDR and other proprietary features, or
             | to abolish the open Linux drivers. You can't have both,
             | unless you do what Nvidia does and move all of the
             | proprietary stuff to the GPU firmware and write a minimal
             | driver to control that massive firmware blob. Which,
             | obviously, would require reengineering the GPU hardware,
             | which is expensive and of questionable value.
             | 
             | They can't open source their proprietary drivers even if
             | they wanted to because they don't own all of the IP and
             | their code is full of NDA'd trade secrets. AMD isn't paying
             | two different teams to do the same work because they like
             | wasting money.
        
               | wirybeige wrote:
                | AMD already has large firmware blobs. Both Intel and
                | Nvidia have the software side of GPUs figured out.
        
               | bayindirh wrote:
                | NVIDIA's blobs are different from the others'. They
                | do not want to give away how their GPU clocking and
                | enablement works. As a result, NVIDIA's blobs are
                | both signed and picky about "who" they communicate
                | with. You can't use NVIDIA's full-fledged firmware
                | with nouveau, for example.
               | 
               | On the other hand, the card enablement sequences are open
                | for AMD and Intel. AMD only protects the card's
                | thermal and fan configuration data, to prevent card
                | damage, AFAIK. You can clock the card and use its
                | power management features the way you like. With
                | NVIDIA, even those are out of reach.
               | 
                | AMD's open drivers work way better than NVIDIA's
                | closed ones, too. I had never seen an application
                | refuse to launch until I used NVIDIA's closed
                | drivers.
        
               | detaro wrote:
               | AMDs open driver runs games with HDR quite well on Linux,
               | so that specific thing is not preventing it.
        
           | onli wrote:
           | Fact of the matter is that I have a Radeon RX 6600, which I
            | can't use with ollama. First, there is no ROCm at all in
            | my distro's repository - it doesn't compile reliably and
            | needs too many resources. Then, when compiling it
            | manually, it turns out that ROCm doesn't even support the
            | card in the first place.
           | 
            | I'm aware that 8 GB of VRAM is not enough for most such
            | workloads. But no support at all? That's ridiculous. Let
            | me use the card and fall back to system memory for all I
            | care.
           | 
           | Nvidia, as much as I hate their usually awfully insufficient
           | linux support, has no such restrictions for any of their
           | modern cards, as far as I'm aware.
        
             | smerrill wrote:
             | You should be able to use ollama's Vulkan backend and in my
             | experience the speed will be the same. (I just spent a
             | bunch of time putting Linux on my 2025 ASUS ROG Flow Z13 to
             | use ROCm, only to see the exact same performance as
             | Vulkan.)
        
               | onli wrote:
                | That would mean switching to
                | https://github.com/whyvl/ollama-vulkan? I see no
                | backend selection in ollama, nor anything in the FAQ.
        
             | yjftsjthsd-h wrote:
              | > I'm aware that 8 GB of VRAM is not enough for most
              | > such workloads. But no support at all? That's
              | > ridiculous. Let me use the card and fall back to
              | > system memory for all I care.
              | 
              | > Nvidia, as much as I hate their usually awfully
              | > insufficient linux support, has no such restrictions
              | > for any of their modern cards, as far as I'm aware.
             | 
              | In fact, I _regularly_ run llamafile (and sometimes
              | ollama) on an nvidia dGPU in a laptop with 4GB of VRAM,
              | and it works fine (ish... I mostly do the thing where
              | some layers are on the GPU and some are on the CPU;
              | it's still faster than pure CPU, so whatever).
        
             | pja wrote:
             | My recent experience has been that the Vulkan support in
             | llama.cpp is pretty good. It may lag behind Cuda / Metal
             | for the bleeding edge models if they need a new operator.
             | 
              | Try it out! Benchmarks here:
              | https://github.com/ggml-org/llama.cpp/discussions/10879
             | 
             | (ollama doesn't support vulkan for some weird reason. I
             | guess they never pulled the code from llama.cpp)
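A sketch of that build path, for reference (the backend flag name is as of recent llama.cpp trees - older versions used -DLLAMA_VULKAN=ON - and the model path is just a placeholder):

```shell
# Build llama.cpp with the Vulkan backend instead of CUDA/ROCm.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j

# Check that a Vulkan-capable GPU is visible (vulkaninfo is part of
# vulkan-tools), then offload all layers to it (-ngl 99).
vulkaninfo --summary
./build/bin/llama-cli -m /path/to/model.gguf -ngl 99 -p "Hello"
```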
        
               | onli wrote:
               | Thanks, I might indeed give this a test!
        
             | JonChesterfield wrote:
             | I know it's in Debian (and thus Ubuntu), Arch, Gentoo.
             | Pretty sure RedHat, Suse, Nix have it. What distro are you
             | using?
             | 
             | ROCm is a train wreck to compile from source but can be
             | done with sufficient bloodymindedness.
             | 
             | The RX6600 is a gfx1032. I used a gfx1010 for ages with
             | this stuff. Seems likely it'll run for you if you ignore
             | the "supported cards" list, which really should be renamed
             | to something that antagonises people less.
        
               | onli wrote:
                | I'm using Void.
                | https://github.com/void-linux/void-packages/issues/26415
                | gives some insight, though it doesn't explain the
                | whole problem, if I remember correctly what the
                | maintainers wrote elsewhere.
               | 
               | > _ROCm is a train wreck to compile from source but can
               | be done with sufficient bloodymindedness._
               | 
                | Yeah, I did that myself. Not impossible, just a bit
                | annoying and time-consuming. The issue I ran into
                | then was exactly picking the GPU model (incredible
                | that this is even necessary) and not having the
                | gfx1032 available; see
                | https://github.com/void-linux/void-packages/issues/26415#iss...
                | for what I was following back then. I tried to edit
                | the configuration for the gfx1032 anyway, but that
                | did not succeed.
               | 
               | Side note: Already having to know which card corresponds
               | to which code is annoying, and completely unnecessary.
               | They could also just map the consumer facing name. But
               | that would be too easy I assume.
        
         | bavell wrote:
         | Sucks that you've had so much trouble... My experience with my
         | cheap 6750XT is that it just works OOTB on Arch with rocm,
         | llama.cpp, ollama, whisper, etc by setting an envvar.
        
           | A4ET8a8uTh0_v2 wrote:
           | Did you do any writeup/post on your experience with setting
            | it up? I think it would have some audience (apart from
            | me, that is).
        
             | dharmab wrote:
              | It never occurred to me because it was such an easy
              | process on Arch.
              | 
              | https://wiki.archlinux.org/title/Ollama - 3 commands to
              | install and run an LLM.
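For reference, the sequence from the wiki looks roughly like this (package and model names here are examples, not prescriptions):

```shell
# Install the ROCm-enabled ollama package from the Arch repos,
# start the server in the background, then pull and run a model.
sudo pacman -S ollama-rocm
ollama serve &
ollama run llama3.2
```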
        
           | rbjorklin wrote:
           | This has been my experience with a 6950XT as well using
           | Fedora.
        
         | pshirshov wrote:
         | > Support your graphics cards on linux using kernel drivers
         | that you upstream. All of them. Not just a handful - all the
         | ones you sell from say 18 months ago till today.
         | 
         | All the stuff works even if it's not officially supported. It's
         | not that hard to set a single environment variable
         | (HSA_OVERRIDE_GFX_VERSION).
         | 
         | Like literally, everything works, from Vega 56/64 to ryzen 99xx
         | iGPUs.
         | 
        | Also, try NixOS. Everything literally works with a single
        | config entry after the recent merge of ROCm 6.3. I
        | successfully run a zoo of various Radeons of different
        | generations.
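For anyone unfamiliar with the override: it tells the ROCm runtime to treat an unsupported chip as a nearby supported ISA. A sketch for an RDNA2 card such as the RX 6600 (gfx1032 masquerading as gfx1030; the exact value is card-dependent, so treat it as an example):

```shell
# Pretend the GPU is gfx1030 (the officially supported RDNA2 ISA)
# so the ROCm runtime loads kernels built for it.
export HSA_OVERRIDE_GFX_VERSION=10.3.0
ollama serve          # or any other ROCm-backed workload
```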
        
           | seanhunter wrote:
            | I'm using Nix and setting HSA_OVERRIDE_GFX_VERSION. It's
            | not working on my GPU (Radeon RX 7600).
        
             | JonChesterfield wrote:
              | This says that's a gfx1102:
              | https://www.techpowerup.com/gpu-specs/radeon-rx-7600.c4153
              | I'd expect that to work out of the box. Is your Linux
              | kernel version vaguely near one of the nominally
              | supported ones?
        
           | iforgotpassword wrote:
           | But why do you even have to do this fucking bullshit that you
           | randomly stumble upon, while googling error message after
           | error message, ending up in random github repos and issues?
           | 
            | And no, just because the three random cards you have work
            | doesn't mean "everything works". I tried an MI300A a few
            | months ago... I just wanted to test ollama, as this is
            | one of the hottest applications for GPU acceleration now,
            | so it will surely be well supported, right? First, the
            | gfx version listed for it in the ollama docs is wrong -
            | but OK, figured it out. Then I tried some random models
            | with it, and the only output it ever generated was
            | GGGGGGGGGGGGG. Apparently only fp16 models work, nothing
            | more quantized. So I picked one explicitly. Then it was
            | slower than running on the CPU in the same system.
           | 
           | Thanks but no thanks; this cost me two days when Nvidia just
           | works first try.
        
             | pshirshov wrote:
             | > But why do you even have to do this fucking bullshit
             | 
             | Because it's like 2-4 times cheaper than to go nvidia?..
             | 
             | > the three random cards you have
             | 
             | It's more than 3 random cards. I run 6900 XT, 7900 XTX,
             | W7900 Pro, VII, VII Pro, Vega 56, Vega 64, 6800 XT, 5700 XT
             | plus I've experimented with a 9950 iGPU, a 5xxx series iGPU
             | and the only thing which didn't work was 3400g iGPU.
             | 
             | > Apparently only fp16 models work
             | 
             | fp8 works for me
        
         | nimish wrote:
         | AMD has never understood that software is important. It's not
         | culturally baked into them.
        
           | latchkey wrote:
           | You aren't wrong, but the important question is: "If they
           | decided to change, could they actually do it?"
        
         | pjmlp wrote:
          | Meanwhile NVidia has embraced Python as a first-class
          | programming language on CUDA, with the new cuTile format as
          | a companion to PTX.
          | 
          | And given the tooling they are adding, full steam ahead,
          | for Python, I even wonder if Mojo will manage to get enough
          | mindshare - let alone whatever AMD and Intel are (not)
          | doing.
        
         | creata wrote:
         | It doesn't diminish most of your points, but getting PyTorch to
         | work on Arch Linux is as easy as installing the `python-
         | pytorch-opt-rocm` package. Similar with Ollama: `ollama-rocm`.
          | So if you just want to use PyTorch, and don't need the
          | _very_ latest version, I wouldn't say the dev experience
          | with Nvidia is much better.
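A sketch of that install path on Arch, plus a one-line verification (package name from the comment above; the check only prints True on a ROCm-supported or overridden GPU):

```shell
# Prebuilt PyTorch with the ROCm backend from the Arch repos -
# no manual ROCm toolchain install needed.
sudo pacman -S python-pytorch-opt-rocm
python -c "import torch; print(torch.cuda.is_available())"
```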
        
           | mindcrime wrote:
           | > Similar with Ollama: `ollama-rocm`.
           | 
           | Same experience here. Installing ROCm and Ollama on my box
           | were both dead simple and everything worked right out of the
           | box. Using an RX 7900 XTX card, FWIW.
        
         | jmward01 wrote:
         | Adding to the list above. Please, PLEASE, give me a single
         | place I can look to see what pytorch will actually support. I
         | would probably buy something if I could get a straight answer
         | on what will/won't actually work.
        
       | throwaway48476 wrote:
       | If AMD does deliver on client dGPU virtualization it would be
       | amazing.
        
         | transpute wrote:
          | Some old AMD workstation GPUs supported SR-IOV; that repo
          | was just archived.
         | 
         | https://open-iov.org/index.php/GPU_Support#AMD
         | 
         | https://github.com/GPUOpen-LibrariesAndSDKs/MxGPU-Virtualiza...
         | 
         | As "AI" use cases mature, NPU/AIE-ML virtualization will also
         | be needed.
        
           | mtillman wrote:
            | The Pro VII (a wonderfully specced card for my purposes)
            | had a bunch of awesome features - even remote access -
            | but then they pulled the marketing pages from their
            | website and stopped shipping the link bridge (non-MPX) to
            | water down the value of the card. I have no idea how AMD
            | works.
        
       | AbuAssar wrote:
       | related: (AMD 2.0 - New Sense of Urgency)
       | 
       | https://news.ycombinator.com/item?id=43780972
        
         | bryanlarsen wrote:
         | Wow, that's a very substantive article.
        
         | okucu wrote:
         | >Dylan Patel
         | 
         | Yeah, not gonna read that.
        
       | 404human wrote:
       | Anyone else find it funny that AMD releases new GPU features
       | while people still can't get basic ML stuff working? It's like
       | building a fancy garage before fixing the broken car.
        
         | Nullabillity wrote:
         | Graphics work just fine. Y'know, the G in the name.
        
           | Redoubts wrote:
           | Shame that's not where the money is
        
         | creata wrote:
         | Maybe ROCm is a bit of a mess, but AMD makes amazing _graphics_
         | cards.
         | 
         | And they open sourced their GPU driver, which is a massive plus
         | in my book.
         | 
         | If Radeon got virtualization support, it would make it the
         | perfect GPU for running video games (or other GPU applications)
         | in virtual machines - you wouldn't need to mess with PCIe
         | passthrough anymore.
        
       | latchkey wrote:
        | Hi, this is actually pretty important for my business. I've
        | been waiting for the driver to be released for a year now. We
        | got the binary a few months ago under NDA, but open sourcing
        | it is next level for us.
       | 
       | What I wrote about this on twitter:
       | https://x.com/HotAisle/status/1914549886185611627
        
       | latchkey wrote:
       | Previously: https://news.ycombinator.com/item?id=43759350
        
       ___________________________________________________________________
       (page generated 2025-04-24 23:02 UTC)