[HN Gopher] AMD Publishes Open-Source Driver for GPU Virtualizat...
___________________________________________________________________
AMD Publishes Open-Source Driver for GPU Virtualization, Radeon "In
the Roadmap"
Author : davidlt
Score : 175 points
Date : 2025-04-24 06:58 UTC (16 hours ago)
(HTM) web link (www.phoronix.com)
(TXT) w3m dump (www.phoronix.com)
| janpmz wrote:
| This article is almost unreadable for me. The ads change in size
| and make the text jump. I'm adding it to NotebookLM now.
| Mountain_Skies wrote:
| The article is extremely light on details anyway. The most
| important thing in it is the link to the repo at
| https://github.com/amd/MxGPU-Virtualization
| DistractionRect wrote:
| Seems like using a sledge hammer to pound a nail. Why not just
| use an ad blocker like ublock origin?
| proxysna wrote:
| That's pretty sick. Nice to see such things trickle down to
| consumer GPUs.
| seanhunter wrote:
| It blows my mind how reliably AMD shoots itself in the foot. What
| we want isn't that hard:
|
| 1) Support your graphics cards on linux using kernel drivers that
| you upstream. All of them. Not just a handful - all the ones you
| sell from say 18 months ago till today.
|
| 2) Make GPU acceleration actually work out of the box for pytorch
| and tensorflow. Not some special forked, patched version that you
| "maintain" on your website; the tip of the main branch for both
| of those libraries should just compile out of the box and give
| people GPU-accelerated ML.
|
| This is table stakes but it blows my mind that they keep making
| press releases and promises like this that things are on the
| roadmap without doing thing one and unfucking the basic dev
| experience so people can actually use their GPUs for real work.
|
| How it actually is: 1) Some cards work with ROCm, some cards work
| with one of the other variations of BS libraries they have come
| up with over the years. Some cards work with amdgpu, but many
| only work with proprietary kernel drivers, which means if you
| don't use precisely one of the distributions and kernel versions
| that they maintain, you are SOL.
|
| 2) Nothing whatsoever builds out of the box, and when you get it
| to build, almost nothing runs GPU-accelerated. For me, pytorch
| requires a special downgrade, a python downgrade, and a switch to
| a fork that AMD supposedly maintains, although it didn't compile
| for me, and when I managed to beat it into a shape where it
| compiled, it wouldn't run GPU-accelerated even though games use
| the GPU just fine. I have a GPU that is supposedly current, so
| they are actively selling it, but can I use it? Can I bollocks.
| Ollama won't talk to my GPU even though it supposedly works
| with ROCm - it only works with ROCm on some graphics cards.
| Tensorflow was a similar story when I last tried it, though
| admittedly I didn't try as hard as with pytorch.
|
| Just make your shit work so that people can use it. It really
| shouldn't be that hard. The dev experience with NVidia is a
| million times better.
| faust201 wrote:
| IIRC there was only one AMD employee working on integrating
| Linux-based things. Often the response was that things were
| stuck in intellectual-property review, or with project
| managers, etc. So even specs were not available.
| logicchains wrote:
| SemiAnalysis had a good article on this recently, basically the
| reason AMD still sucks on the ML software side is that their
| compensation for devs is significantly worse than competitors
| like NVidia, Google and OpenAI, so most of the most competent
| devs go elsewhere.
| bayindirh wrote:
| AMD has two driver teams at this point: one for Linux/open
| source, one for Catalyst/closed source, and they are not
| allowed to interact.
|
| Because, there are tons of IP and trade secrets involved in
| driver development and optimization. Sometimes game related,
| sometimes for patching a rogue application which developers
| can't or don't fix, etc. etc.
|
| GPU drivers ought to be easy, but in reality, they are not.
| The open source drivers are "vanilla" drivers without all these
| case-dependent patching and optimization. Actually, they really
| work well out of the box for normal desktop applications. I
| don't think there are any cards which do (or will) not work
| with the open source kernel drivers as long as you use a
| sufficiently recent version.
|
| ...and you mention ROCm.
|
| I'm not sure what ROCm's intellectual-property underpinnings
| are, but claiming lack of effort is a bit unfair to AMD. Yes,
| software was never their strong suit, but they're way better
| compared to 20 years ago. They have a proper open source
| driver which works, and a whole fleet of open source ROCm
| packages, which are now rigorously CI/CD-tested by their
| maintainers.
|
| Do not forget that some of the world's most powerful
| supercomputers run on Instinct cards, and AMD is getting tons
| of experience from these big players. If you think the
| underpinnings of GPGPU libraries are easy, I can only say that
| the reality is _very_ different. The simple things people do
| with PyTorch and other very high level libraries pull enormous
| tricks under the hood, and you're really pushing the
| boundaries of the hardware, performance- and capability-wise.
|
| NVIDIA isn't selling a tray full of switches and GPUs and
| requiring OEMs to integrate it as-is for no reason. On the
| other hand, the same NVIDIA acts very slowly to enable an open
| source ecosystem.
|
| So, yes, AMD is not in an ideal position right now, but calling
| them incompetent doesn't help either.
|
| P.S.: The company which fought for a completely open source
| HDMI 2.1-capable display driver is _AMD_, not NVIDIA.
| spockz wrote:
| I accept that there are two teams for reasons that include
| IP. However, Nvidia must have the same problem and they
| appear not to be hamstrung by it. So what is the difference?
| bayindirh wrote:
| NVIDIA and AMD, from my experience, have completely
| different cultures.
|
| ATI started as a much more closed company. Then they
| pivoted and started to open up their parts. They were
| hamstrung by HDCP at one point, and they decided to
| decouple the HDCP block from the video accelerators at the
| silicon level to allow open source drivers to access video
| hardware without leaking/disabling HDCP support. So they
| committed to opening what they have, but when you have tons
| of legacy IP, things don't go from 0 to 100 in a day. I'll
| remind you that "game-dependent driver optimization" started
| before the 2000s. That's how deeply rooted these codebases
| are.
|
| NVIDIA took a different approach: indifferent on the
| surface, while their hardware became a bit more hostile
| towards nouveau at every turn. Then they released some
| specs to allow "hardware enablement" by nouveau, so closed
| source drivers could be installed and the card didn't
| blank-screen at boot.
|
| Then, as they fought with kernel developers, with some hard
| prodding by the kernel guys and some coercing by Red Hat,
| NVIDIA agreed to completely remove the "shim" shenanigans,
| and moved the closed bits of the kernel module into card
| firmware by revising the card architecture. It's important
| to keep in mind that NVIDIA's "open drivers" mean "an open,
| bare-bones kernel module, a full-fledged and signed
| proprietary firmware which can be used by closed source
| drivers only, and a closed source GLX stack", whereas for
| AMD it means "an open source kernel module, standard Mesa
| libraries, and a closed source firmware available to all
| drivers".
|
| NVIDIA was in talks with the nouveau guys about letting
| them use the full-powered firmware with clock and power
| management support, but I don't know where that went.
|
| The CUDA environment is also the same. Yes it works very
| well, and it's a vast garden, but it's walled and protected
| by electrified fence and turrets. You're all in, or all
| out.
| markus_zhang wrote:
| I'm wondering how much effort went into RE Nvidia cards
| and drivers. Graphics card drivers are completely a
| mythical beast to me, and I guess it's one of the most
| complicated drivers in the hardware world.
| mariusor wrote:
| > Nvidia must have the same problem and they appear not to
| be hamstrung by it
|
| Probably because they don't have an open-source driver for
| linux and they can focus on the proprietary one.
| rwmj wrote:
| A laundry list of excuses ... or a list of things to work on.
| ("Why the hell do we have two driver teams?" - would be my #1
| thing to fix if I was at AMD.)
| bayindirh wrote:
| I guess you don't understand how silicon and 3rd-party IP
| work. It took Intel a completely new GPU, built from
| scratch, to be able to open its drivers. AMD did at least
| one revision to their silicon to enable that kind of
| openness.
|
| Yet the HDMI Forum said they can't implement an HDMI
| 2.1-capable driver in the open, backed by some nasty legal
| letters.
|
| I have a couple of friends who wrote 3D engines from
| scratch and debugged graphics drivers for their engines for
| a living. It's a completely different jungle filled with
| completely different beasts.
|
| I think being able to call glxinfo on an AMD card running
| with completely open drivers and being able to see
| extensions from NVIDIA, AMD, SGI, IBM and others is a big
| win already.
| markus_zhang wrote:
| Thanks. All these sound interesting.
|
| What do these 3D engines look like? Custom-made AAA
| engines at companies such as EA or Ubisoft?
| gessha wrote:
| But who holds these patents? And what are they about?
| pja wrote:
| DRM mostly.
| jeroenhd wrote:
| The "fix" would be to make games perform like shit on
| Windows and disable HDR and other proprietary features, or
| to abolish the open Linux drivers. You can't have both,
| unless you do what Nvidia does and move all of the
| proprietary stuff to the GPU firmware and write a minimal
| driver to control that massive firmware blob. Which,
| obviously, would require reengineering the GPU hardware,
| which is expensive and of questionable value.
|
| They can't open source their proprietary drivers even if
| they wanted to because they don't own all of the IP and
| their code is full of NDA'd trade secrets. AMD isn't paying
| two different teams to do the same work because they like
| wasting money.
| wirybeige wrote:
| AMD already has large firmware blobs. Both Intel and
| Nvidia have the software side of GPUs figured out.
| bayindirh wrote:
| NVIDIA's blobs are different when compared to others.
| They do not want to give away how their GPU clocking and
| enablement works. As a result, NVIDIA's blobs are both
| signed and picky about "who" they communicate with. You
| can't use NVIDIA's full-fledged firmware with nouveau,
| for example.
|
| On the other hand, the card enablement sequences are open
| for AMD and Intel. AMD only protects the card's thermal and
| fan configuration data, to prevent card damage, AFAIK.
| You can clock the card and use its power management
| features the way you like. For NVIDIA, even those are out
| of reach.
|
| AMD's open drivers work way better than NVIDIA's closed
| ones, too. I had never seen an application refuse to
| launch until I used NVIDIA's closed drivers.
| detaro wrote:
| AMDs open driver runs games with HDR quite well on Linux,
| so that specific thing is not preventing it.
| onli wrote:
| Fact of the matter is that I have a Radeon RX 6600, which I
| can't use with ollama. First, there is no ROCm at all in my
| distro's repository - it doesn't compile reliably and needs
| too many resources. Then, when compiling it manually, it
| turns out that ROCm doesn't even support the card in the
| first place.
|
| I'm aware that 8GB Vram are not enough for most such
| workloads. But no support at all? That's ridiculous. Let me
| use the card and fall back to system memory for all I care.
|
| Nvidia, as much as I hate their usually awfully insufficient
| linux support, has no such restrictions for any of their
| modern cards, as far as I'm aware.
| smerrill wrote:
| You should be able to use ollama's Vulkan backend and in my
| experience the speed will be the same. (I just spent a
| bunch of time putting Linux on my 2025 ASUS ROG Flow Z13 to
| use ROCm, only to see the exact same performance as
| Vulkan.)
| onli wrote:
| That would mean switching to
| https://github.com/whyvl/ollama-vulkan? I see no backend
| selection in ollama, nor anything in the faq.
| yjftsjthsd-h wrote:
| > I'm aware that 8GB Vram are not enough for most such
| workloads. But no support at all? That's ridiculous. Let me
| use the card and fall back to system memory for all I care.
|
| > Nvidia, as much as I hate their usually awfully
| insufficient linux support, has no such restrictions for
| any of their modern cards, as far as I'm aware.
|
| In fact, I _regularly_ run llamafile (and sometimes ollama)
| on an nvidia dGPU in a laptop, with 4GB of VRAM, and it
| works fine (ish... I mostly do the thing where some layers
| are on the GPU and some are on the CPU; it's still faster
| than pure CPU so whatever).
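For context, the partial offload described above is a standard llama.cpp feature: the `-ngl` flag sets how many model layers go to the GPU, with the rest staying on the CPU. A sketch (model path and layer count are illustrative, not from the comment):

```shell
# Offload 20 layers to the GPU, keep the remainder on the CPU.
# The flag behaves the same whether the backend is CUDA, ROCm, or Vulkan.
./llama-cli -m ./models/llama-3.2-3b.Q4_K_M.gguf -ngl 20 -p "Hello"
```

Setting `-ngl` higher than VRAM allows simply fails to allocate, so the usual approach is to raise it until the model no longer fits.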
| pja wrote:
| My recent experience has been that the Vulkan support in
| llama.cpp is pretty good. It may lag behind Cuda / Metal
| for the bleeding edge models if they need a new operator.
|
| Try it out! Benchmarks here:
| https://github.com/ggml-org/llama.cpp/discussions/10879
|
| (ollama doesn't support vulkan for some weird reason. I
| guess they never pulled the code from llama.cpp)
| onli wrote:
| Thanks, I might indeed give this a test!
| JonChesterfield wrote:
| I know it's in Debian (and thus Ubuntu), Arch, Gentoo.
| Pretty sure RedHat, Suse, Nix have it. What distro are you
| using?
|
| ROCm is a train wreck to compile from source but can be
| done with sufficient bloodymindedness.
|
| The RX6600 is a gfx1032. I used a gfx1010 for ages with
| this stuff. Seems likely it'll run for you if you ignore
| the "supported cards" list, which really should be renamed
| to something that antagonises people less.
| onli wrote:
| I'm using void. https://github.com/void-linux/void-
| packages/issues/26415 gives an insight, though it doesn't
| explain the whole problem, if I remember correctly what
| maintainers wrote elsewhere.
|
| > _ROCm is a train wreck to compile from source but can
| be done with sufficient bloodymindedness._
|
| Yeah, I did that myself. Not impossible, just a bit
| annoying and time-consuming. The issue I ran into then
| was exactly picking the GPU model (incredible that this
| is even necessary) and not having the gfx1032 available;
| see
| https://github.com/void-linux/void-packages/issues/26415#iss...
| for what I was following back then. I tried to edit the
| configuration for the gfx1032 anyway, but it did not
| succeed.
|
| Side note: Already having to know which card corresponds
| to which code is annoying, and completely unnecessary.
| They could also just map the consumer facing name. But
| that would be too easy I assume.
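A trivial sketch of the mapping the commenter is asking for. The name-to-gfx pairs below are only the ones mentioned in this thread plus a couple of commonly cited ones; this is a hypothetical helper, not an official AMD table:

```python
# Hypothetical helper: map consumer-facing Radeon names to the gfx
# codes that ROCm keys its support matrix on. Incomplete by design.
GFX_CODES = {
    "Radeon RX 6600": "gfx1032",     # stated in this thread
    "Radeon RX 6900 XT": "gfx1030",  # RDNA2 baseline
    "Radeon RX 7600": "gfx1102",     # stated in this thread
    "Radeon RX 7900 XTX": "gfx1100", # RDNA3 baseline
}

def gfx_code(marketing_name: str) -> str:
    """Look up the gfx code for a consumer card name, if known."""
    return GFX_CODES[marketing_name]
```

Something this small shipping with the ROCm tooling would already remove one of the lookup steps the commenter is complaining about.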
| bavell wrote:
| Sucks that you've had so much trouble... My experience with my
| cheap 6750XT is that it just works OOTB on Arch with rocm,
| llama.cpp, ollama, whisper, etc by setting an envvar.
| A4ET8a8uTh0_v2 wrote:
| Did you do any writeup/post on your experience with setting
| it up? I think it would have some audience ( apart from me
| that is ).
| dharmab wrote:
| It never occurred to me because it was such an easy process
| on Arch.
|
| https://wiki.archlinux.org/title/Ollama shows 3 commands to
| install and run an LLM.
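For reference, the three commands boil down to something like the following (package name is real, the service and model names are assumed from the Arch wiki's current examples; treat this as a sketch, not a guarantee for any particular card):

```shell
sudo pacman -S ollama-rocm    # ROCm-enabled ollama build
sudo systemctl start ollama   # start the ollama service
ollama run llama3.2           # pull a model and start chatting
```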
| rbjorklin wrote:
| This has been my experience with a 6950XT as well using
| Fedora.
| pshirshov wrote:
| > Support your graphics cards on linux using kernel drivers
| that you upstream. All of them. Not just a handful - all the
| ones you sell from say 18 months ago till today.
|
| All the stuff works even if it's not officially supported.
| It's not that hard to set a single environment variable
| (HSA_OVERRIDE_GFX_VERSION).
|
| Like literally everything works, from the Vega 56/64 to the
| Ryzen 99xx iGPUs.
|
| Also, try NixOS. Literally everything works with a single
| config entry after the recent merge of ROCm 6.3. I
| successfully run a zoo of various Radeons of different
| generations.
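The override mentioned here is a community workaround, not documented AMD behavior: it makes the ROCm runtime treat an unsupported card as the nearest officially supported member of its family. For an RDNA2 card such as the RX 6600 (gfx1032), that target is usually gfx1030:

```shell
# Community workaround: report the GPU to ROCm as gfx1030 (10.3.0).
export HSA_OVERRIDE_GFX_VERSION=10.3.0
# RDNA3 cards (e.g. RX 7600, gfx1102) typically use 11.0.0 instead.
```

Because the ISA within a family is close enough, the spoofed code paths usually work, but correctness is not guaranteed by AMD.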
| seanhunter wrote:
| I'm using Nix and setting HSA_OVERRIDE_GFX_VERSION. It's not
| working on my GPU (Radeon RX7600)
| JonChesterfield wrote:
| This https://www.techpowerup.com/gpu-specs/radeon-rx-7600.c4153
| says that's a gfx1102. I'd expect that to work out of the
| box. Is your Linux kernel version vaguely near one of the
| nominally supported ones?
| iforgotpassword wrote:
| But why do you even have to do this fucking bullshit that you
| randomly stumble upon, while googling error message after
| error message, ending up in random github repos and issues?
|
| And no, just because the three random cards you have work
| doesn't mean "everything works". Just tried an MI300A a few
| months ago... I just wanted to test ollama as this is one of
| the hottest applications for GPU acceleration now, it will
| surely be well supported, right? First, the gfx version
| listed for it in the ollama docs is wrong - but OK, figured
| it out. Then I tried some random models with it, and the
| only output it ever generated was GGGGGGGGGGGGG. Apparently
| only fp16 models work, nothing more quantized. So I picked
| one explicitly. Then it was slower than running on the CPU
| in the same system.
|
| Thanks but no thanks; this cost me two days when Nvidia just
| works first try.
| pshirshov wrote:
| > But why do you even have to do this fucking bullshit
|
| Because it's like 2-4 times cheaper than to go nvidia?..
|
| > the three random cards you have
|
| It's more than 3 random cards. I run 6900 XT, 7900 XTX,
| W7900 Pro, VII, VII Pro, Vega 56, Vega 64, 6800 XT, 5700 XT
| plus I've experimented with a 9950 iGPU, a 5xxx series iGPU
| and the only thing which didn't work was 3400g iGPU.
|
| > Apparently only fp16 models work
|
| fp8 works for me
| nimish wrote:
| AMD has never understood that software is important. It's not
| culturally baked into them.
| latchkey wrote:
| You aren't wrong, but the important question is: "If they
| decided to change, could they actually do it?"
| pjmlp wrote:
| Meanwhile NVidia has embraced Python as a first-class
| programming language on CUDA, with the new cuTile format as
| a companion to PTX.
|
| And given the tooling they are adding full steam ahead for
| Python, I even wonder if Mojo will manage to get enough
| mindshare, let alone what AMD and Intel are not doing.
| creata wrote:
| It doesn't diminish most of your points, but getting PyTorch
| to work on Arch Linux is as easy as installing the
| `python-pytorch-opt-rocm` package. Similar with Ollama:
| `ollama-rocm`. So if you just want to use PyTorch, and don't
| need the _very_ latest version, I wouldn't say the dev
| experience with Nvidia is much better.
| mindcrime wrote:
| > Similar with Ollama: `ollama-rocm`.
|
| Same experience here. Installing ROCm and Ollama on my box
| were both dead simple and everything worked right out of the
| box. Using an RX 7900 XTX card, FWIW.
| jmward01 wrote:
| Adding to the list above. Please, PLEASE, give me a single
| place I can look to see what pytorch will actually support. I
| would probably buy something if I could get a straight answer
| on what will/won't actually work.
| throwaway48476 wrote:
| If AMD does deliver on client dGPU virtualization it would be
| amazing.
| transpute wrote:
| Some old AMD workstation GPUs supported SR-IOV; that repo
| was just archived.
|
| https://open-iov.org/index.php/GPU_Support#AMD
|
| https://github.com/GPUOpen-LibrariesAndSDKs/MxGPU-Virtualiza...
|
| As "AI" use cases mature, NPU/AIE-ML virtualization will also
| be needed.
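For SR-IOV-capable cards, the enablement path is the standard Linux sysfs one: the desired number of virtual functions is written to `sriov_numvfs`. A sketch (the PCI address is illustrative; this requires an SR-IOV-capable GPU and a host virtualization driver such as the newly published one):

```shell
# List AMD GPUs (vendor ID 1002) with their PCI addresses.
lspci -d 1002: -nn

# Ask the host driver to create 2 virtual functions on the chosen GPU.
echo 2 | sudo tee /sys/bus/pci/devices/0000:03:00.0/sriov_numvfs
```

Each virtual function then shows up as its own PCI device that can be assigned to a VM, without whole-card passthrough.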
| mtillman wrote:
| The Pro VII (a wonderfully spec'd card for my purposes) had
| a bunch of awesome features, even remote access, but then
| they pulled the marketing pages from their website and
| stopped shipping the link bridge (non-MPX) to water down
| the value of the card. I have no idea how AMD works.
| AbuAssar wrote:
| related: (AMD 2.0 - New Sense of Urgency)
|
| https://news.ycombinator.com/item?id=43780972
| bryanlarsen wrote:
| Wow, that's a very substantive article.
| okucu wrote:
| >Dylan Patel
|
| Yeah, not gonna read that.
| 404human wrote:
| Anyone else find it funny that AMD releases new GPU features
| while people still can't get basic ML stuff working? It's like
| building a fancy garage before fixing the broken car.
| Nullabillity wrote:
| Graphics work just fine. Y'know, the G in the name.
| Redoubts wrote:
| Shame that's not where the money is
| creata wrote:
| Maybe ROCm is a bit of a mess, but AMD makes amazing _graphics_
| cards.
|
| And they open sourced their GPU driver, which is a massive plus
| in my book.
|
| If Radeon got virtualization support, it would make it the
| perfect GPU for running video games (or other GPU applications)
| in virtual machines - you wouldn't need to mess with PCIe
| passthrough anymore.
| latchkey wrote:
| Hi, this is actually pretty important for my business. I've
| been waiting for this driver to be released for a year now.
| We got the binary a few months ago under NDA, but
| open-sourcing it is next level for us.
|
| What I wrote about this on twitter:
| https://x.com/HotAisle/status/1914549886185611627
| latchkey wrote:
| Previously: https://news.ycombinator.com/item?id=43759350
___________________________________________________________________
(page generated 2025-04-24 23:02 UTC)