[HN Gopher] What Is AMD ROCm?
___________________________________________________________________
What Is AMD ROCm?
Author : todsacerdoti
Score : 50 points
Date : 2021-11-25 15:13 UTC (7 hours ago)
(HTM) web link (threedots.ovh)
(TXT) w3m dump (threedots.ovh)
| ClumsyPilot wrote:
| Is this like the 3rd or 4th GPGPU programming framework to come
| out of AMD? They will never get any adoption at this rate.
| karmakaze wrote:
| It would help to indicate where the name 'ROCm' comes from.
| Wikipedia redirects to [0]. I still have no clue where the 'm'
| comes from (is it the last letter of platfor'm'?)
|
| [0]
| https://en.wikipedia.org/wiki/GPUOpen#Radeon_Open_Compute_(R...
| slavik81 wrote:
| It used to be "Radeon Open Compute platform" with the m coming
| from platform. It's now "Radeon Open Ecosystem" with the c and
| the m from ecosystem [1].
|
| The old expansion still shows up in a few places where it's
| difficult to remove.
|
| [1]: https://github.com/ROCmSoftwarePlatform/rocBLAS#rocblas
| karmakaze wrote:
| Ah, so (R)adeon (O)pen E(c)osyste(m), stylized ROCm. I can see
| why it's not often spelled out, thanks.
| blip54321 wrote:
| I tried ROCm. I bought a supported card (RX570/RX580 series).
| Within 12 months, AMD dropped support. Newer versions of ROCm
| didn't work with the card. Older versions didn't actually work
| either, since all other tooling assumed newer versions.
| Dependency hell. When things kinda started working in one
| context, where I could use old tooling (not the one I wanted to
| use ROCm in), CuPy was slower than the CPU, and then it randomly
| hard-crashed my computer. I read on a web page that the card can
| act either as a HIP device or as a graphics card, but not both
| at the same time. I have no idea if that's right, but if it is,
| it's dumb.
|
| AMD offered no support. The card maker said this wasn't covered
| by the warranty. I got burned over and over.
|
| I bought NVidia. It just worked.
|
| I'm working on a potentially major piece of infrastructure, and
| AMD is accumulating debt. If it had worked out of the gate, I
| imagine we would have kept support. Within 6 more months, we'll
| be NVidia-specific. AMD will be that much further in the hole
| for support.
|
| I'd love for ROCm to win, since I think open is critical here. On
| the other hand, I can't imagine it will. AMD would need to run
| this as a loss leader for a while, and engineer it to a level
| that makes it competitive with NVidia.
|
| A half-baked product like ROCm seems like a money hole for
| everyone involved. Customers get burned, and I can't imagine AMD
| comes out positive.
|
| In the meantime, NVidia is minting gold here.
| belval wrote:
| Yeah, Nvidia is so far ahead at this point that I wouldn't
| really risk it on anything else. The problem is that all that
| troubleshooting adds up fast, and the whole golden age of DL
| was built on CUDA. TensorFlow and PyTorch both "support" ROCm
| and HIP, but you run into weird issues very often. A lot of
| public repositories for recent architectures also come with
| their own CUDA kernels that you need to compile, so the vendor
| lock-in is very strong in my opinion.
|
| And even if AMD's offering were not an absolute dumpster fire,
| Google, Microsoft and Amazon all have their own accelerators
| that are maturing and will be more cost-effective in the long
| run.
| zmachinaz wrote:
| I have been using ROCm for 2+ years. The investment in this
| infrastructure was a big mistake. The biggest pain was the need
| to do a clean install on each new ROCm release. Clean here means
| manually finding and deleting all traces of the previous ROCm
| version, and recompiling apps like PyTorch. Good upgrades took
| hours, bad ones days. Finally, I settled on freezing the system
| and not touching it anymore until the cards are retired,
| hopefully soon.
| stuaxo wrote:
| Having to choose between Steam support and ROCm drivers is a
| pain; it discourages tinkering.
|
| Almost everyone on Linux will have broken their drivers at some
| point, and installing an alternate set is a big risk.
|
| It seems silly that you can't get OpenCL and HIP access without
| having to use this alternate stack.
| esistgut wrote:
| Shouldn't it work on top of the open source drivers?
| slavik81 wrote:
| The ROCm and AMDGPU PRO stacks were unified with ROCm 4.5 and
| AMDGPU 21.40. I would expect Steam to work. That was only a
| couple of weeks ago, but have you tried it out?
| vetinari wrote:
| Isn't 4.5 also the version that kicked Vega64 to the curb?
|
| So even if I wanted to, I can't. Honestly, I'm fed up with
| AMD's attitude towards compute.
| my123 wrote:
| > Isn't 4.5 also the version that kicked Vega64 to the
| curb?
|
| ROCm 4.5 is the _last_ version to support the Vega10 ASIC
| (MI25, Vega56, Vega64).
|
| https://github.com/RadeonOpenCompute/ROCm/#amd-instinct-
| mi25...
|
| The next ROCm release after 4.5 is due sometime in Q1 next
| year, so it's headed for planned death really soon.
|
| It is transitioning to _that_ comical AMD "enabled in the
| codebase but not tested and not supported" state, rotting
| slowly like Polaris support did.
| esistgut wrote:
| https://www.reddit.com/r/Amd/comments/r1gb05/radeon_6600xt_c...
| This shows some numbers on ROCm performance.
| VHRanger wrote:
| Isn't it something that has very little uptake because it's not
| supported on mainstream GPUs?
|
| I want to use it for compute on something like an RX 6800, and
| to my knowledge I can't.
| bubblethink wrote:
| AMD is much smaller in comparison, and their main focus with
| ROCm is to get PyTorch and TensorFlow to work with enterprise
| GPUs. Everything else is long tail in terms of scale.
| vardump wrote:
| Sadly this is why I (and many others) keep begrudgingly
| choosing Nvidia cards instead.
|
| Once the developers are familiar with CUDA, what are the
| chances you'd choose ROCm for deployment? Yeah, not great.
|
| Consumer cards' ROCm support is strategic.
| rjzzleep wrote:
| I was going to post exactly the same thing. And if you look at
| the GitHub issues of their project, you will see that very often
| what look like outsourced support teams comment on these issues
| with the standard "we will discuss this internally and get back
| to you" kind of response that enterprise Twitter support usually
| gives.
|
| Not really what you expect from quality engineering. At the end
| of the day, these kinds of companies don't understand the value
| of development and engineering clients as customers.
|
| It's unfortunate really.
|
| EDIT: here's an example:
|
| ROCmSupport commented on Feb 22:
|
| Hi @powderluv Thanks for reaching us. I can not comment on
| RDNA2 support right now. We are working on adding a few more
| new hardware into ROCm environment. Please stay tuned via our
| documentation. Thank you. @ROCmSupport
|
| ROCmSupport closed this on Feb 22
|
| https://github.com/RadeonOpenCompute/ROCm/issues/1390#issuec...
| dylan604 wrote:
| Sounds like a race among support staff to see how many tickets
| they can close to make themselves look good for the PM at the
| next review.
| nicolaslem wrote:
| On the other hand, AMD has a fraction of the engineering
| resources that Intel and Nvidia have. They need to make some
| choices, and looking back at the last few years, it seems that
| their choice to focus their efforts on hardware and gaming
| paid off.
| to11mtm wrote:
| ATI/AMD has always had finicky drivers and engineering
| decisions IMO.
|
| I guess on the plus side they at least have a more open
| driver than NVidia (AFAIK nouveau doesn't get any support
| from NVidia, while AMD at least tries to maintain its open
| source driver on some level).
|
| And yet, every time I've tried an ATI/AMD card, the driver
| experience, even on Windows, has been pretty off-putting, and
| while I suppose we are finally at a point where one is less
| likely to be impacted by their issues with 768p overscan on
| TVs, I wonder what zany quirk they'll come up with next.
| dagmx wrote:
| On the flip side, I think the fact that they haven't focused on
| a compelling compute story means that anyone doing anything
| other than pure gaming is better served by an Nvidia card.
| sorenjan wrote:
| There's also no Windows support, so you can't use it to make
| consumer programs, or on your gaming rig without dual booting.
| It's made for data centers with bespoke software, not really
| for distribution.
| dogma1138 wrote:
| It's worse, because it has no intermediate state, so there is no
| guarantee of forward compatibility (backward compatibility is
| also kinda broken). Shipping anything with HIP will be a pain.
|
| With CUDA, you simply target a specific CUDA version, and there
| is full forward and backward compatibility on any hardware
| that supports that version.
| slavik81 wrote:
| It's not officially supported, but I think it would work if you
| installed the official ROCm 4.5 packages. The RX 6800 is listed
| as gfx1030 [1], which has been shipping in most libraries since
| ROCm 4.2. I've heard there were a few bugs, but I've been using
| it for months without encountering any issues myself.
|
| (I work for AMD on ROCm. All opinions are my own.)
|
| [1]: https://llvm.org/docs/AMDGPUUsage.html#processors
| techdragon wrote:
| Can you please emphasise to your management chain how
| important less terrible support and developer relations via
| GitHub are? It's fine to close support questions, but basically
| anything in these repos gets closed as fast as possible, even
| feature requests and other things that should be left open.
|
| I doubt they have the funding to meaningfully impact the
| overall hardware and software support matrix, but it would help
| if they could just make the GitHub repo feel less like I'm back
| in my days working at a call centre, raising tickets to a
| second-level support team in a foreign country whose only
| business KPI was tickets closed per day.
| slavik81 wrote:
| I agree with you. The communication between AMD and the
| community has been less than ideal.
|
| I think it's worth noting, though, that it's not always as bad
| as the example in the sibling comment. The
| RadeonOpenCompute/ROCm repo catches a lot of questions
| about big features and the future direction of the project.
| Those are particularly difficult to answer as an engineer.
| As much as I'd like to, I can't make a product announcement
| in a GitHub issue.
|
| If you have a specific technical problem and you open an
| issue on the repo for the corresponding component, you'll
| probably have a better experience. Some teams are more
| responsive than others, but that will at least maximize
| your odds of successful resolution.
___________________________________________________________________
(page generated 2021-11-25 23:01 UTC)