[HN Gopher] ROCm Device Support Wishlist
___________________________________________________________________
ROCm Device Support Wishlist
Author : pella
Score : 91 points
Date : 2025-01-20 19:31 UTC (3 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| superkuh wrote:
| My wishlist for ROCm support is actually supporting the cards
| they already released. But that's not going to happen.
|
| By the time a (consumer) AMD device is supported by ROCm it'll
| only have a few years of ROCm support left before support is
| removed. Lifespan of support for AMD cards with ROCm is very
| short. You end up having to use Vulkan which is not optimized, of
| course, and a bit slower. I once bought an AMD GPU 2 years after
| release and 1 year after I bought it ROCm support was dropped.
| slavik81 wrote:
| FWIW, every ROCm library currently in the Debian 13 'main' and
| Ubuntu 24.04 'universe' repository has been built for and
| tested on every discrete consumer GPU architecture since Vega.
| Not every package is available that way, but the ones that are
| have been tested on and work on Vega 10, Vega 20, RDNA 1, 2 and
| 3.
|
| Note that these are not the packages distributed by AMD. They
| are the packages in the OS repositories. Not all the ROCm
| packages are there, but most of them are. The biggest downside
| is that some of them are a little old and don't have all the
| latest performance optimizations for RDNA 3.
|
| Those operating systems will be around for the next decade, so
| that should at least provide one option for users of older
| hardware.
| mappu wrote:
| I can confirm this, Debian's ROCm distribution worked great
| for me on some "unsupported" cards.
| buildbot wrote:
| Packages existing and the software actually working are very
| different things. You can run ROCm on unsupported GPUs like the
| 780M, but as soon as you hit an issue you are out of luck.
| And you'll hit an issue.
|
| For example, my 780M gets 1-2 inferences from llama.cpp
| before dropping off the bus due to a segfault in the driver.
| It's a bad enough lockup that Linux can't shut down cleanly
| and hangs until it is hard rebooted.
| slavik81 wrote:
| The 780M is an integrated GPU. I specified discrete GPUs
| because that's what I have tested and can confirm will
| work.
|
| I have dozens of different AMD GPUs and I personally host
| most of the Debian ROCm Team's continuous integration
| servers. Over the past year, I have worked together with
| other members of the Debian project to ensure that every
| potentially affected ROCm library is tested on every
| discrete consumer AMD GPU architecture since Vega whenever
| a new version of a package is uploaded to Debian.
|
| FWIW, Framework Computers donated a few laptops to Debian
| last year, which I plan to use to enable the 780M too. I
| just haven't had the time yet. Fedora has some patches that
| add support for that architecture.
| buildbot wrote:
| Yes I am aware it's an integrated GPU. I guess supporting
| your iGPU customers is less important.
|
| I'll give my 6900 XT another try and see if I can get it
| to do more than two inferences as well. I recall it being
| slightly more stable, but still not usable compared to
| CPU-only or Nvidia cards.
| bb88 wrote:
| They should offer at least 5 years of support per release.
| mikepurvis wrote:
| As the underdog AMD can't afford to have their efforts
| perceived as half-assed or a hobby or whatever. They should be
| moving heaven and earth to maximize their value proposition,
| promising and delivering on _longer_ support horizons to
| demonstrate the long term value of their ecosystem.
| seanhunter wrote:
| Honestly at this point half-assed support would be a
| significant step up from their historical position. The one
| thing they have pioneered is new tiers of fractional
| assedness asymptotically approaching zero.
| XorNot wrote:
| I mean at this point my next card is going to be an Nvidia.
| It has been a total waste of time trying to use ROCm for
| anything machine-learning based. No one uses it. No one can
| use it. The card I have is somehow always not quite
| supported.
| llm_trw wrote:
| We go from:
|
| Support is coming in three months!
|
| To
|
| This card is ancient and will be no longer developed for.
| Buy our brand new card released in three months!
|
| Every damned time.
| 7speter wrote:
| I have an MI50 with 16GB of HBM that's collecting dust (it's
| Vega-based, so it can play games, I guess) because I don't want
| to bother setting up a system with Ubuntu 20.04, the last Ubuntu
| release that the last MI50-supporting version of ROCm works on.
|
| With situations like this, it's not hard to see why Nvidia
| totally dominates in the compute/ai market.
| slavik81 wrote:
| The MI50 may be considered deprecated in newer releases, but
| it seems to work fine in my experience. I have a Radeon VII
| in my workstation (which shares the same architecture) and I
| host the MI60 test machine for the Debian AI Team. I haven't had
| any trouble with them.
| 7speter wrote:
| I don't think the MI60 has reached deprecated status yet
| (the last time I looked at prices for the MI50 and MI60, the
| MI60 was something like 3x as expensive, and I think that's
| because it's still officially supported), but I'll check
| this all out. Thanks.
| slavik81 wrote:
| The MI60 is basically just a faster MI50 with more
| memory. They were deprecated together. It's plausible
| there could be small firmware or driver differences that
| cause issues in one but not the other, but I think that's
| unlikely.
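| ROCm libraries are built for specific LLVM AMDGPU ("gfx") targets rather
| than for product names, so cards that share a target tend to share a
| support status. The Radeon VII, MI50 and MI60 are all gfx906 parts. Below
| is a minimal Python sketch of that idea. The lookup table and helper names
| are hypothetical illustrations and cover only a few cards.

```python
# Hypothetical lookup table mapping a few product names to their LLVM AMDGPU
# ("gfx") targets. Not exhaustive; extend it from AMD's documentation.
GFX_TARGET = {
    "Radeon VII": "gfx906",
    "Instinct MI50": "gfx906",
    "Instinct MI60": "gfx906",
    "Radeon RX 6800": "gfx1030",
    "Radeon RX 6900 XT": "gfx1030",
    "Radeon RX 7900 XTX": "gfx1100",
    "Radeon 780M": "gfx1103",
}


def same_target(a: str, b: str) -> bool:
    """Return True if two products are built for the same gfx target."""
    return GFX_TARGET[a] == GFX_TARGET[b]


def cards_sharing_target(name: str) -> list[str]:
    """List every known product whose gfx target matches `name`'s."""
    target = GFX_TARGET[name]
    return sorted(card for card, t in GFX_TARGET.items() if t == target)


if __name__ == "__main__":
    print(same_target("Instinct MI50", "Instinct MI60"))  # True
    print(cards_sharing_target("Radeon VII"))
```

| Under this model, deprecating one gfx906 card effectively deprecates all of
| them, which matches the MI50 and MI60 being deprecated together.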
| FuriouslyAdrift wrote:
| AMD did over $5 billion in GPU compute (Instinct line) last
| year. Not nVidia numbers but also not bad. Customers love
| that they can actually get Instinct systems rather than trying
| to compete with the hyperscalers for limited supplies of
| nVidia systems. Meta and Microsoft are the two biggest buyers
| of AMD Instincts, though...
|
| AMD Instinct is also more power efficient and has comparable
| (if not better) performance for the same (or less) price.
| 7speter wrote:
| Meta and Microsoft buy hundreds of thousands of Nvidia
| accelerators a year, and are a big reason why everyone else
| has to compete for Nvidia units.
| FuriouslyAdrift wrote:
| AMD has separate architectures for GPU compute (Instinct
| https://www.amd.com/en/products/accelerators/instinct/mi300....) and
| consumer video (Radeon).
|
| AMD are merging the architectures (UDNA) like nVidia but it's
| not going to be before 2026.
| (https://wccftech.com/amd-ryzen-zen-6-cpus-radeon-udna-gpus-u...)
| 7speter wrote:
| You can use ROCm on consumer Radeon as long as you pay more
| than 400 dollars for one of their GPUs. Meanwhile, you can
| run Stable Diffusion with the -lowvram flag on a 6GB RTX 3050
| that goes for 180 dollars.
| nubinetwork wrote:
| Seeing Radeon VII on the deprecation list is a little
| saddening, unless they start putting out more 16GB+ GPUs that
| aren't overly expensive...
| ghostpepper wrote:
| I can understand wanting to prioritize support for the cards
| people want to use most, but they should still plan to write
| software support for all the cards that have hardware support.
| KeplerBoy wrote:
| Imagine Nvidia not supporting CUDA on any of their cards.
| Unthinkable.
| latchkey wrote:
| Nvidia takes a software first approach and AMD takes a
| hardware first approach.
|
| It is clear that AMD's approach isn't working and they need
| to change their balance.
| kouteiheika wrote:
| Hardware first, but then their hardware isn't any better
| than NVidia's, so I don't see how that's a valid excuse
| here.
|
| (Okay, maybe their super high end unobtanium-level GPUs are
| better hardware-wise. Don't know, don't care about
| enterprise-only hardware that is unbuyable by mere
| mortals.)
| latchkey wrote:
| Some of it isn't unbuyable... it is just expensive.
| https://www.ebay.com/itm/305850340813
|
| But that's why my business exists...
| https://news.ycombinator.com/item?id=42759191
| latchkey wrote:
| For context, the submitter of the issue is Anush Elangovan from
| AMD, who has recently been a lot more active on social media since
| the SemiAnalysis article and has taken the reins on moving AMD's
| software efforts forward.
|
| However you want to dissect this specific issue, I'd generally
| consider this a positive step and nice to see it hit the front
| page.
|
| https://www.reddit.com/r/ROCm/comments/1i5aatx/rocm_feedback...
|
| https://www.reddit.com/user/powderluv/
| KeplerBoy wrote:
| Also known as the AMD representative who recently argued with
| Hotz about supporting tinycorp.
| latchkey wrote:
| Is that a bad thing? Good for him to stand up to extortion.
| KeplerBoy wrote:
| Hard to say from my perspective.
|
| I think AMD's offer was fair (full remote access to several
| test machines), then again just giving tinycorp the boxes
| on their terms with no strings attached as a kind of
| research grant would have earned them some goodwill with
| that corner of the community.
|
| Either way both parties will continue making controversial
| decisions.
| latchkey wrote:
| It isn't hard. We offered as well. Full BIOS access even.
|
| Another neocloud, that is funded directly by AMD, also
| offered to buy him boxes. He refused. It _had_ to come
| from AMD. That's absurd and extortionist.
|
| Long thread here:
| https://x.com/HotAisle/status/1880467322848137295
| dhruvdh wrote:
| To add, AMD only makes _parts_ of an MI300X server.
|
| It's like asking a tire manufacturer to give you a car
| for free.
| latchkey wrote:
| Great analogy!
|
| Just uploaded some pictures of how complex these machines
| really are...
|
| https://imgur.com/gallery/dell-xe9860-amd-mi300x-bGKyQKr
| modeless wrote:
| Offering software support in exchange for payment is
| extortion?
| latchkey wrote:
| It is far more complex than that.
| rikafurude21 wrote:
| "I estimate having software on par with NVDA would raise
| their market cap by 100B. Then you estimate what the chance
| it that @__tinygrad__ can close that gap, say it's 0.1%,
| probably a very low estimate when you see what we have done
| so far, but still...
|
| That's worth 100M. And they won't even send us 2 ~100k
| boxes. In what world does that make sense, except in a
| world where decisions are made based on pride instead of
| ROI. Culture issue."
|
| https://x.com/__tinygrad__/status/1879620242315317304
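| The quoted argument is an expected-value estimate. The arithmetic below
| uses only the figures the quote itself states: a $100B market-cap gain,
| an assumed 0.1% chance of success, and two boxes at roughly $100k each.

```latex
\[
\mathbb{E}[\Delta] = p \cdot \Delta_{\text{cap}}
  = 0.001 \times \$100\,\text{B} = \$100\,\text{M},
\qquad
\text{cost} \approx 2 \times \$100\,\text{k} = \$200\,\text{k}.
\]
```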
| AshamedCaptain wrote:
| I would really like to see a concrete, legit way to
| materialize a "100M raise in market cap" into actual ROI
| ...
| rikafurude21 wrote:
| When the market cap rises, price of shares goes up? Do
| you know what a market cap is?
| carlmr wrote:
| Yes, but the company doesn't get more money from that.
| The only way to get money out of it is by selling shares
| at the new price.
|
| However it would also raise future revenue, which should
| be what's reflected by the market.
|
| So it would still be something that's good for the
| company, but not nearly 100B good.
| rikafurude21 wrote:
| You don't think AMD being competitive with Nvidia ($3.37
| trillion market cap) would be "nearly 100B good"? Believe it
| or not, the only thing standing in the way is good, bug-free
| software. That's what tinygrad is doing.
| latchkey wrote:
| This is his opinion, nothing more, nothing less. He
| currently has a partially implemented piece of software
| that hasn't seen a release since November and isn't
| performant at all.
|
| Take the free offer, prove everyone wrong and then start
| to tell us how great you are.
| https://x.com/HotAisle/status/1880507210217750550
| FeepingCreature wrote:
| To be fair, having seen his software evolve, and having
| seen ROCm evolve, I'm more optimistic for his software in
| a year than yours.
|
| He picked his _problem_ better.
| clhodapp wrote:
| Which SemiAnalysis article?
| latchkey wrote:
| https://semianalysis.com/2024/12/22/mi300x-vs-h100-vs-h200-b...
| ac29 wrote:
| AMD supports only a single Radeon GPU in Linux (RX 7900 in three
| variants)?
|
| Windows support is also bad, but supports significantly more than
| one GPU.
| llm_trw wrote:
| Imagine Nvidia supported only the 4090, 4080 and 4070 for CUDA
| at the consumer level. With the 3090 not being supported since
| the 40xx series came out. This is what amd is defending here.
| Delk wrote:
| I honestly can't figure out which Radeon GPUs are supposed to
| be supported.
|
| The GitHub discussion page in the title lists RX 6800 (and a
| bunch of RX 7xxx GPUs) as supported, and some lower-end RX 6xxx
| ones as supported for runtime. The same comment also links to a
| page on the AMD website for a "compatibility matrix" [1].
|
| That page only shows RX 7900 variants as supported on the
| consumer Radeon tab. On the workstation side, Radeon Pro W6800
| and some W7xxx cards are listed as supported. It also suggests
| to see the "Use ROCm on Radeon GPU documentation" page [2] if
| using ROCm on Radeon or Radeon Pro cards.
|
| That link leads to a page for "compatibility matrices" --
| again. If you click the link for Linux compatibility, you get a
| page on "Linux support matrices by ROCm version" [3].
|
| That "by ROCm version" page literally only has a subsection for
| ROCm 6.2.3. It only lists RX 7900 and Pro W7xxx cards as
| supported. No mention of W6800.
|
| (The page does have an unintuitively placed "Version List" link
| through which you can find docs for ROCm 5.7 [4]. Those older
| docs are no more useful than the 6.2.3 ones.)
|
| Is RX 6800 supported? Or W6800? Even the amd.com pages seem to
| contradict each other on the latter.
|
| Maybe the pages on the AMD site only list official production
| support or something. In any case it's confusing as hell.
|
| Nothing against the GitHub page author who at least seems to
| try and be clear but the official documentation leaves a lot to
| be desired.
|
| [1] https://rocm.docs.amd.com/projects/install-on-linux/en/lates...
|
| [2] https://rocm.docs.amd.com/projects/radeon/en/latest/docs/com...
|
| [3] https://rocm.docs.amd.com/projects/radeon/en/latest/docs/com...
|
| [4] https://rocm.docs.amd.com/projects/radeon/en/docs-5.7.0/docs...
| wtcactus wrote:
| I'm constantly baffled and amused by why AMD keeps majorly
| failing at this.
|
| Either the management at AMD is not smart enough to understand
| that without the computing software side they will always be a
| distant number 2 to NVIDIA, or the management at AMD considers it
| hopeless to ever be able to create something as good as CUDA
| because they don't have and can't hire smart enough people to
| write the software.
|
| Really, it's just baffling why they continue on this path to
| irrelevance. Give it a few years and even Intel will get ahead of
| them on the GPU side.
| musicale wrote:
| If I were Jensen, I would snap up all the GPU software experts
| I possibly could, and put them to work improving the CUDA
| ecosystem. I'd also spin up a big research group to further
| fuel the CUDA pipeline for hardware, software, and application
| areas.
|
| Which is exactly what NVIDIA seems to be doing.
|
| AMD's ROCm software group seems far behind, is probably
| understaffed, and probably is paid a fraction of what NVIDIA
| pays its CUDA software groups.
|
| AMD also has to catch up with NVLink and Spectrum-X (and/or
| InfiniBand.)
|
| AMD's main leverage point is its CPUs, and its raw GPU hardware
| isn't bad, but there is a long way to go in terms of GPU
| software ecosystem and interconnect.
| maverwa wrote:
| I figure that list is only what's officially supported, meaning
| things not on that list may or may not work? For example, my
| 6800 XT runs stable diffusion just fine on Linux with PyTorch
| ROCm.
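| Below is a minimal sketch for checking whether a PyTorch install is
| actually using ROCm. It assumes a ROCm build of PyTorch, where
| `torch.version.hip` is a version string (it is None on CUDA and CPU
| builds) and AMD GPUs are exposed through the `torch.cuda` API. The
| function name is hypothetical.

```python
import importlib.util


def describe_torch_backend() -> str:
    """Summarize which GPU backend (if any) the installed PyTorch uses."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed"

    import torch

    # torch.version.hip is a version string only on ROCm builds of PyTorch.
    hip = getattr(torch.version, "hip", None)
    if hip is None:
        return "PyTorch build without ROCm (HIP) support"
    # On ROCm builds, AMD GPUs are reported through the torch.cuda API.
    if not torch.cuda.is_available():
        return f"ROCm build (HIP {hip}), but no usable GPU was found"
    name = torch.cuda.get_device_name(0)
    return f"ROCm build (HIP {hip}) running on {name}"


if __name__ == "__main__":
    print(describe_torch_backend())
```

| A card missing from the official list can still report itself here. That
| only shows the runtime sees it, not that every library is supported on it.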
___________________________________________________________________
(page generated 2025-01-20 23:00 UTC)