[HN Gopher] ROCm Device Support Wishlist
       ___________________________________________________________________
        
       ROCm Device Support Wishlist
        
       Author : pella
       Score  : 91 points
       Date   : 2025-01-20 19:31 UTC (3 hours ago)
        
 (HTM) web link (github.com)
        
       | superkuh wrote:
       | My wishlist for ROCm support is actually supporting the cards
       | they already released. But that's not going to happen.
       | 
       | By the time a (consumer) AMD device is supported by ROCm, it'll
       | only have a few years of ROCm support left before support is
       | removed. The lifespan of ROCm support for AMD cards is very
       | short. You end up having to use Vulkan, which is not optimized, of
       | course, and a bit slower. I once bought an AMD GPU 2 years after
       | release and 1 year after I bought it ROCm support was dropped.
        
         | slavik81 wrote:
         | FWIW, every ROCm library currently in the Debian 13 'main' and
         | Ubuntu 24.04 'universe' repository has been built for and
         | tested on every discrete consumer GPU architecture since Vega.
         | Not every package is available that way, but the ones that are
         | have been tested on and work on Vega 10, Vega 20, RDNA 1, 2 and
         | 3.
         | 
         | Note that these are not the packages distributed by AMD. They
         | are the packages in the OS repositories. Not all the ROCm
         | packages are there, but most of them are. The biggest downside
         | is that some of them are a little old and don't have all the
         | latest performance optimizations for RDNA 3.
         | 
         | Those operating systems will be around for the next decade, so
         | that should at least provide one option for users of older
         | hardware.
        
           | mappu wrote:
           | I can confirm this, Debian's ROCm distribution worked great
           | for me on some "unsupported" cards.
        
           | buildbot wrote:
           | Packages existing and the software actually working are very
           | different things. You can run rocm on unsupported GPUs like a
           | 780m, but as soon as you hit an issue you are out of luck.
           | And you'll hit an issue.
           | 
           | For example, my 780m gets 1-2 inferences from llama.cpp
           | before dropping off the bus due to a segfault in the driver.
           | It's a bad enough lockup that Linux can't cleanly shut down
           | and hangs until hard rebooted.
        
             | slavik81 wrote:
             | The 780m is an integrated GPU. I specified discrete GPUs
             | because that's what I have tested and can confirm will
             | work.
             | 
             | I have dozens of different AMD GPUs and I personally host
             | most of the Debian ROCm Team's continuous integration
             | servers. Over the past year, I have worked together with
             | other members of the Debian project to ensure that every
             | potentially affected ROCm library is tested on every
             | discrete consumer AMD GPU architecture since Vega whenever
             | a new version of a package is uploaded to Debian.
             | 
             | FWIW, Framework Computers donated a few laptops to Debian
             | last year, which I plan to use to enable the 780m too. I
             | just haven't had the time yet. Fedora has some patches that
             | add support for that architecture.
        
               | buildbot wrote:
               | Yes I am aware it's an integrated GPU. I guess supporting
               | your iGPU customers is less important.
               | 
               | I'll give my 6900 XT a try again and see if I can get it
               | to do more than two inferences as well. I recall it being
               | slightly more stable but still not usable compared to
               | CPU-only or Nvidia cards.
        
         | bb88 wrote:
         | They should have at a minimum 5 year support release cycle.
        
         | mikepurvis wrote:
         | As the underdog, AMD can't afford to have their efforts
         | perceived as half-assed or a hobby or whatever. They should be
         | moving heaven and earth to maximize their value proposition,
         | promising and delivering on _longer_ support horizons to
         | demonstrate the long term value of their ecosystem.
        
           | seanhunter wrote:
           | Honestly at this point half-assed support would be a
           | significant step up from their historical position. The one
           | thing they have pioneered is new tiers of fractional
           | assedness asymptotically approaching zero.
        
           | XorNot wrote:
           | I mean, at this point my next card is going to be an Nvidia.
           | It has been a total waste of time trying to use ROCm for
           | anything machine-learning based. No one uses it. No one can
           | use it. The card I have is somehow always not quite
           | supported.
        
             | llm_trw wrote:
             | We go from:
             | 
             | Support is coming in three months!
             | 
             | To
             | 
             | This card is ancient and will be no longer developed for.
             | Buy our brand new card released in three months!
             | 
             | Every damned time.
        
         | 7speter wrote:
         | I have an MI50 with 16GB of HBM that's collecting dust (it's
         | Vega-based, so it can play games, I guess) because I don't want
         | to bother setting up a system with Ubuntu 20.04, the last
         | Ubuntu version supported by the last ROCm release that
         | supported the MI50.
         | 
         | With situations like this, it's not hard to see why Nvidia
         | totally dominates the compute/AI market.
        
           | slavik81 wrote:
           | The MI50 may be considered deprecated in newer releases, but
           | it seems to work fine in my experience. I have a Radeon VII
           | in my workstation (which shares the same architecture) and I
           | host the MI60 test machine for the Debian AI Team. I haven't had
           | any trouble with them.
        
             | 7speter wrote:
             | I don't think the MI60 has reached deprecated status yet
             | (the last time I looked at prices for the MI50 and MI60, the
             | MI60 was something like 3x as expensive, and I think that's
             | because it's still officially supported), but I'll check
             | this all out. Thanks.
        
               | slavik81 wrote:
               | The MI60 is basically just a faster MI50 with more
               | memory. They were deprecated together. It's plausible
               | there could be small firmware or driver differences that
               | cause issues in one but not the other, but I think that's
               | unlikely.
        
           | FuriouslyAdrift wrote:
           | AMD did over $5 billion in GPU compute (Instinct line) last
           | year. Not nVidia numbers but also not bad. Customers love
           | that they can actually get Instinct systems rather than trying
           | to compete with the hyperscalers for limited supplies of
           | nVidia systems. Meta and Microsoft are the two biggest buyers
           | of AMD Instincts, though...
           | 
           | AMD Instinct is also more power efficient and has comparable
           | (if not better) performance for the same (or less) price.
        
             | 7speter wrote:
             | Meta and Microsoft buy hundreds of thousands of Nvidia
             | accelerators a year, and are a big reason why everyone else
             | has to compete for nvidia units.
        
         | FuriouslyAdrift wrote:
         | AMD has separate architectures for GPU compute (Instinct
         | https://www.amd.com/en/products/accelerators/instinct/mi300....)
         | and consumer video (Radeon).
         | 
         | AMD are merging the architectures (UDNA) like nVidia, but it's
         | not going to be before 2026.
         | (https://wccftech.com/amd-ryzen-zen-6-cpus-radeon-udna-gpus-u...)
        
           | 7speter wrote:
           | You can use ROCm on consumer Radeon as long as you pay more
           | than 400 dollars for one of their GPUs. Meanwhile, you can
           | run Stable Diffusion with the -lowvram flag on a 3050 6GB
           | that goes for 180 dollars.
        
         | nubinetwork wrote:
         | Seeing Radeon VII on the deprecation list is a little
         | saddening, unless they start putting out more 16gb+ GPUs that
         | aren't overly expensive...
        
       | ghostpepper wrote:
       | I can understand wanting to prioritize support for the cards
       | people want to use most, but they should still plan to write
       | software support for all the cards that have hardware support.
        
         | KeplerBoy wrote:
         | Imagine Nvidia not supporting CUDA on any of their cards.
         | Unthinkable.
        
           | latchkey wrote:
           | Nvidia takes a software first approach and AMD takes a
           | hardware first approach.
           | 
           | It is clear that AMD's approach isn't working and they need
           | to change their balance.
        
             | kouteiheika wrote:
             | Hardware first, but then their hardware isn't any better
             | than NVidia's, so I don't see how that's a valid excuse
             | here.
             | 
             | (Okay, maybe their super high end unobtanium-level GPUs are
             | better hardware-wise. Don't know, don't care about
             | enterprise-only hardware that is unbuyable by mere
             | mortals.)
        
               | latchkey wrote:
               | Some of it isn't unbuyable... it is just expensive.
               | https://www.ebay.com/itm/305850340813
               | 
               | But that's why my business exists...
               | https://news.ycombinator.com/item?id=42759191
        
       | latchkey wrote:
       | For context, the submitter of the issue is Anush Elangovan from
       | AMD who's recently been a lot more active on social after the
       | SemiAnalysis article, and taking the reins / responsibility of
       | moving AMD's software efforts forward.
       | 
       | However you want to dissect this specific issue, I'd generally
       | consider this a positive step and nice to see it hit the front
       | page.
       | 
       | https://www.reddit.com/r/ROCm/comments/1i5aatx/rocm_feedback...
       | 
       | https://www.reddit.com/user/powderluv/
        
         | KeplerBoy wrote:
         | Also known as the AMD representative who recently argued with
         | Hotz about supporting tinycorp.
        
           | latchkey wrote:
           | Is that a bad thing? Good for him to stand up to extortion.
        
             | KeplerBoy wrote:
             | Hard to say from my perspective.
             | 
             | I think AMD's offer was fair (full remote access to several
             | test machines), then again just giving tinycorp the boxes
             | on their terms with no strings attached as a kind of
             | research grant would have earned them some goodwill with
             | that corner of the community.
             | 
             | Either way both parties will continue making controversial
             | decisions.
        
               | latchkey wrote:
               | It isn't hard. We offered as well. Full BIOS access even.
               | 
               | Another neocloud, that is funded directly by AMD, also
               | offered to buy him boxes. He refused. It _had_ to come
               | from AMD. That's absurd and extortionist.
               | 
               | Long thread here:
               | https://x.com/HotAisle/status/1880467322848137295
        
               | dhruvdh wrote:
               | To add, AMD only makes _parts_ of an MI300X server.
               | 
               | It's like asking a tire manufacturer to give you a car
               | for free.
        
               | latchkey wrote:
               | Great analogy!
               | 
               | Just uploaded some pictures of how complex these machines
               | really are...
               | 
               | https://imgur.com/gallery/dell-xe9860-amd-mi300x-bGKyQKr
        
             | modeless wrote:
             | Offering software support in exchange for payment is
             | extortion?
        
               | latchkey wrote:
               | It is far more complex than that.
        
             | rikafurude21 wrote:
             | "I estimate having software on par with NVDA would raise
             | their market cap by 100B. Then you estimate what the chance
             | it that @__tinygrad__ can close that gap, say it's 0.1%,
             | probably a very low estimate when you see what we have done
             | so far, but still...
             | 
             | That's worth 100M. And they won't even send us 2 ~100k
             | boxes. In what world does that make sense, except in a
             | world where decisions are made based on pride instead of
             | ROI. Culture issue."
             | 
             | https://x.com/__tinygrad__/status/1879620242315317304
        
               | AshamedCaptain wrote:
               | I would really like to see a concrete, legit way to
               | materialize a "100M raise in market cap" into actual ROI
               | ...
        
               | rikafurude21 wrote:
               | When the market cap rises, price of shares goes up? Do
               | you know what a market cap is?
        
               | carlmr wrote:
               | Yes, but the company doesn't get more money from that.
               | The only way to get money out of it is by selling shares
               | at the new price.
               | 
               | However it would also raise future revenue, which should
               | be what's reflected by the market.
               | 
               | So it would still be something that's good for the
               | company, but not nearly 100B good.
        
               | rikafurude21 wrote:
               | You don't think AMD being competitive with Nvidia ($3.37
               | trillion market cap) would be "nearly 100B good"? Believe it
               | or not, the only reason that's not the case is the lack of
               | good, bug-free software. That's what tinygrad is doing.
        
               | latchkey wrote:
               | This is his opinion, nothing more, nothing less. He
               | currently has a partially implemented piece of software
               | that hasn't seen a release since November and isn't
               | performant at all.
               | 
               | Take the free offer, prove everyone wrong and then start
               | to tell us how great you are.
               | https://x.com/HotAisle/status/1880507210217750550
        
               | FeepingCreature wrote:
               | To be fair, having seen his software evolve, and having
               | seen ROCm evolve, I'm more optimistic for his software in
               | a year than yours.
               | 
               | He picked his _problem_ better.
        
         | clhodapp wrote:
         | Which SemiAnalysis article?
        
           | latchkey wrote:
           | https://semianalysis.com/2024/12/22/mi300x-vs-h100-vs-h200-b...
        
       | ac29 wrote:
       | AMD supports only a single Radeon GPU in Linux (RX 7900 in three
       | variants)?
       | 
       | Windows support is also bad, but supports significantly more than
       | one GPU.
        
         | llm_trw wrote:
         | Imagine Nvidia supported only the 4090, 4080 and 4070 for CUDA
         | at the consumer level, with the 3090 unsupported ever since
         | the 40xx series came out. This is what AMD is defending here.
        
         | Delk wrote:
         | I honestly can't figure out which Radeon GPUs are supposed to
         | be supported.
         | 
         | The GitHub discussion page in the title lists RX 6800 (and a
         | bunch of RX 7xxx GPUs) as supported, and some lower-end RX 6xxx
         | ones as supported for runtime. The same comment also links to a
         | page on the AMD website for a "compatibility matrix" [1].
         | 
         | That page only shows RX 7900 variants as supported on the
         | consumer Radeon tab. On the workstation side, Radeon Pro W6800
         | and some W7xxx cards are listed as supported. It also suggests
         | to see the "Use ROCm on Radeon GPU documentation" page [2] if
         | using ROCm on Radeon or Radeon Pro cards.
         | 
         | That link leads to a page for "compatibility matrices" --
         | again. If you click the link for Linux compatibility, you get a
         | page on "Linux support matrices by ROCm version" [3].
         | 
         | That "by ROCm version" page literally only has a subsection for
         | ROCm 6.2.3. It only lists RX 7900 and Pro W7xxx cards as
         | supported. No mention of W6800.
         | 
         | (The page does have an unintuitively placed "Version List" link
         | through which you can find docs for ROCm 5.7 [4]. Those older
         | docs are no more useful than the 6.2.3 ones.)
         | 
         | Is RX 6800 supported? Or W6800? Even the amd.com pages seem to
         | contradict each other on the latter.
         | 
         | Maybe the pages on the AMD site only list official production
         | support or something. In any case it's confusing as hell.
         | 
         | Nothing against the GitHub page author who at least seems to
         | try and be clear but the official documentation leaves a lot to
         | be desired.
         | 
         | [1] https://rocm.docs.amd.com/projects/install-on-linux/en/lates...
         | 
         | [2] https://rocm.docs.amd.com/projects/radeon/en/latest/docs/com...
         | 
         | [3] https://rocm.docs.amd.com/projects/radeon/en/latest/docs/com...
         | 
         | [4] https://rocm.docs.amd.com/projects/radeon/en/docs-5.7.0/docs...
        
       | wtcactus wrote:
       | I'm constantly baffled and amused by why AMD keeps failing so
       | badly at this.
       | 
       | Either the management at AMD is not smart enough to understand
       | that without the computing software side they will always be a
       | distant number 2 to NVIDIA, or the management at AMD considers it
       | hopeless to ever be able to create something as good as CUDA
       | because they don't have and can't hire smart enough people to
       | write the software.
       | 
       | Really, it's just baffling why they continue on this path to
       | irrelevance. Give it a few years and even Intel will get ahead of
       | them on the GPU side.
        
         | musicale wrote:
         | If I were Jensen, I would snap up all the GPU software experts
         | I possibly could, and put them to work improving the CUDA
         | ecosystem. I'd also spin up a big research group to further
         | fuel the CUDA pipeline for hardware, software, and application
         | areas.
         | 
         | Which is exactly what NVIDIA seems to be doing.
         | 
         | AMD's ROCm software group seems far behind, is probably
         | understaffed, and probably is paid a fraction of what NVIDIA
         | pays its CUDA software groups.
         | 
         | AMD also has to catch up with NVLink and Spectrum-X (and/or
         | InfiniBand).
         | 
         | AMD's main leverage point is its CPUs, and its raw GPU hardware
         | isn't bad, but there is a long way to go in terms of GPU
         | software ecosystem and interconnect.
        
       | maverwa wrote:
       | I figure that list is only what's officially supported, meaning
       | things not on that list may or may not work? For example, my
       | 6800 XT runs Stable Diffusion just fine on Linux with PyTorch
       | ROCm.
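
A minimal sketch for checking a PyTorch-on-ROCm setup like the one described above. It assumes a standard PyTorch install. ROCm builds of PyTorch expose AMD GPUs through the `torch.cuda` namespace and set `torch.version.hip`, which is `None` on CUDA or CPU-only builds. The `rocm_status` helper is hypothetical and exists only for illustration. It reports what the runtime sees, not whether a card is on AMD's official support list.

```python
# Hypothetical helper (illustration only): report whether this PyTorch install
# is a ROCm build and which GPUs it can see.


def rocm_status() -> dict:
    try:
        import torch
    except ImportError:
        # PyTorch is not installed at all.
        return {"torch": False, "hip": None, "gpu_available": False, "devices": []}

    # A version string on ROCm builds; None on CUDA or CPU-only builds.
    hip = getattr(torch.version, "hip", None)
    # ROCm builds reuse the torch.cuda API, so this also covers AMD GPUs.
    available = torch.cuda.is_available()
    devices = (
        [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]
        if available
        else []
    )
    return {"torch": True, "hip": hip, "gpu_available": available, "devices": devices}


if __name__ == "__main__":
    status = rocm_status()
    print(status)
    if status["hip"] is None:
        print("Not a ROCm build of PyTorch.")
```

If `hip` holds a version string but `devices` is empty, that suggests the ROCm build is installed but the card is not being picked up by the runtime.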
        
       ___________________________________________________________________
       (page generated 2025-01-20 23:00 UTC)