[HN Gopher] What Is AMD ROCm?
       ___________________________________________________________________
        
       What Is AMD ROCm?
        
       Author : todsacerdoti
       Score  : 50 points
       Date   : 2021-11-25 15:13 UTC (7 hours ago)
        
 (HTM) web link (threedots.ovh)
 (TXT) w3m dump (threedots.ovh)
        
       | ClumsyPilot wrote:
       | Is this like the 3rd or 4th GPGPU programming framework to come
       | out of AMD? They will never get any adoption at this rate.
        
       | karmakaze wrote:
       | Add an indication as to where 'ROCm' comes from. Wikipedia
       | redirects to [0]. Still have no clue where the 'm' comes from (is
       | it the last letter of platfor'm'?)
       | 
       | [0]
       | https://en.wikipedia.org/wiki/GPUOpen#Radeon_Open_Compute_(R...
        
         | slavik81 wrote:
         | It used to be "Radeon Open Compute platform" with the m coming
         | from platform. It's now "Radeon Open Ecosystem" with the c and
         | the m from ecosystem [1].
         | 
         | The old expansion still shows up in a few places where it's
         | difficult to remove.
         | 
         | [1]: https://github.com/ROCmSoftwarePlatform/rocBLAS#rocblas
        
           | karmakaze wrote:
           | Ah so (R)adeon (O)pen E(c)osyste(m) stylized ROCm. I can see
           | why it's not often spelled out, thanks.
        
       | blip54321 wrote:
       | I tried ROCm. I bought a supported card (RX570/RX580 series).
       | Within 12 months, AMD dropped support. Newer versions of ROCm
       | didn't work with the card. Older versions didn't actually work
       | either, since all other tooling assumed newer versions.
       | Dependency hell. When things kinda started working in one
       | context, where I could use old tooling (not the one I wanted to
       | use ROCm in), CuPy was slower than the CPU, and then it hard
       | crashed my computer randomly. I read on a web page that the card
       | can act either as HIP or as a graphics card, but not both at the
       | same time. I have no idea if that's right, but if it is, it's dumb.
       | 
       | AMD offered no support. The card maker said this didn't fall under
       | warranty. I got burned over and over.
       | 
       | I bought NVidia. It just worked.
       | 
       | I'm working on a potentially major piece of infrastructure, and
       | AMD is accumulating debt. If it worked out-of-the-gate, I imagine
       | we would have kept support. Within 6 more months, we'll be
       | NVidia-specific. AMD will be that much further in the hole for
       | support.
       | 
       | I'd love for ROCm to win, since I think open is critical here. On
       | the other hand, I can't imagine it will. AMD would need to run
       | this as a loss leader for a while, and engineer this at a level
       | to get this competitive with NVidia.
       | 
       | A half-baked product like ROCm seems like a money hole for
       | everyone involved. Customers get burned, and I can't imagine AMD
       | comes out positive.
       | 
       | In the meantime, NVidia is minting gold here.
        
         | belval wrote:
         | Yeah Nvidia is so far ahead at this point I wouldn't really
         | risk it on anything else. The problem is that all that
         | troubleshooting adds up fast and the whole DL golden years were
         | built on CUDA. TensorFlow and PyTorch both "support" ROCm and
         | HIP but you run into weird issues very often. A lot of public
         | repositories for recent architectures also come with their own
         | CUDA kernels that you need to compile, so the vendor lock-in is
         | very strong in my opinion.
         | 
         | And even if AMD's offering were not an absolute dumpster fire,
         | Google, Microsoft and Amazon all have their own accelerators
         | that are maturing and will be more cost effective in the long
         | run.
        
       | zmachinaz wrote:
       | I have been using ROCm for 2y+. The investment in this
       | infrastructure was a big mistake. The biggest burner was the need
       | to do a clean install on each new ROCm release. Clean here means
       | manually finding and deleting all traces from the previous ROCm
       | version, and recompiling apps like PyTorch. Good upgrades
       | took hours, bad ones days. In the end I decided to freeze the
       | system and not touch it again until the cards are retired,
       | hopefully soon.
        
       | stuaxo wrote:
       | Having to choose between Steam support and ROCm drivers is a pain
       | - it stops tinkering.
       | 
       | Almost everyone on Linux will have experience of breaking their
       | drivers at some point, and installing an alternate set is a big
       | risk.
       | 
       | It seems silly that you can't get OpenCL and HIP access without
       | having to use this alternate stack.
        
         | esistgut wrote:
         | Shouldn't it work on top of the open source drivers?
        
         | slavik81 wrote:
         | The ROCm and AMDGPU PRO stacks were unified with ROCm 4.5 and
         | AMDGPU 21.40. I would expect Steam to work. That was just a
         | couple weeks ago, but have you tried it out?
        
           | vetinari wrote:
           | Isn't 4.5 also the version, that kicked Vega64 to the curb?
           | 
           | So even if I wanted to, I can't. Sincerely, I'm fed up with
           | AMD's attitude towards compute.
        
             | my123 wrote:
             | > Isn't 4.5 also the version, that kicked Vega64 to the
             | curb?
             | 
             | ROCm 4.5 is the _last_ version to support the Vega10 ASIC
             | (MI25, Vega56, Vega64).
             | 
             | https://github.com/RadeonOpenCompute/ROCm/#amd-instinct-mi25...
             | 
             | The next ROCm release after 4.5 is sometime in Q1 next
             | year. So its planned death is coming really soon.
             | 
             | It is transitioning to _that_ comical AMD "enabled in the
             | codebase but not tested and not supported" state, rotting
             | slowly like Polaris support did.
        
       | esistgut wrote:
       | https://www.reddit.com/r/Amd/comments/r1gb05/radeon_6600xt_c...
       | This shows some numbers on ROCm performance.
        
       | VHRanger wrote:
       | It's something that has very little uptake because it's not
       | supported on mainline GPUs?
       | 
       | I want to use it for compute on something like an RX 6800 and,
       | to my knowledge, can't.
        
         | bubblethink wrote:
         | AMD is much smaller in comparison, and their main focus with
         | ROCm is to get PyTorch and TensorFlow to work with enterprise
         | GPUs. Everything else is long tail in terms of scale.
        
           | vardump wrote:
           | Sadly this is why I (and many others) keep begrudgingly
           | choosing Nvidia cards instead.
           | 
           | Once the developers are familiar with CUDA, what are the
           | chances you'd choose ROCm for deployment? Yeah, not great.
           | 
           | Consumer cards' ROCm support is strategic.
        
         | rjzzleep wrote:
         | I was going to post exactly the same. And if you look at the
         | GitHub issues of their project you will see that very often it
         | looks like outsourced support teams comment on these issues with
         | the standard "we will discuss this internally and get back to
         | you" kind of response that enterprise Twitter support usually
         | gives.
         | 
         | Not really what you expect from quality engineering. At the end
         | of the day these kinds of companies don't understand the value
         | of development and engineering clients as customers.
         | 
         | It's unfortunate really.
         | 
         | EDIT: here's an example:
         | 
         |   ROCmSupport commented on Feb 22
         | 
         |   Hi @powderluv
         |   Thanks for reaching us. I can not comment on RDNA2 support
         |   right now. We are working on adding a few more new hardware
         |   into ROCm environment.
         |   Please stay tuned via our documentation.
         |   Thank you.
         | 
         |   @ROCmSupport ROCmSupport closed this on Feb 22
         | 
         | https://github.com/RadeonOpenCompute/ROCm/issues/1390#issuec...
        
           | dylan604 wrote:
           | Sounds like a race among support staff to see how many tickets
           | they can close to make themselves look good for the PM at the
           | next review.
        
           | nicolaslem wrote:
           | On the other hand AMD has a fraction of the engineering
           | resources Intel and Nvidia have. They need to make some
           | choices, and looking back at the last few years it seems that
           | their choice to focus their efforts on hardware and gaming
           | paid off.
        
             | to11mtm wrote:
             | ATI/AMD has always had finicky drivers and engineering
             | decisions IMO.
             | 
             | I guess on the plus side they at least have a more open
             | driver than NVidia (AFAIK nouveau doesn't get any support
             | from NVidia, whereas AMD at least tries to maintain their
             | open source driver on some level).
             | 
             | And yet, every time I've tried an ATI/AMD Card, the driver
             | experience even in Windows has been pretty off-putting, and
             | while I suppose we are finally at a point where one is less
             | likely to be impacted by their issues with 768p overscan on
             | TVs, I wonder what zany quirk they'll come up with next.
        
             | dagmx wrote:
             | On the flip side, I think the fact that they haven't focused
             | on a compelling compute story means that anyone doing
             | anything other than pure gaming is better served by an Nvidia
             | card.
        
         | sorenjan wrote:
         | There's also no Windows support, so you can't use it to make
         | consumer programs, or on your gaming rig without dual boot.
         | It's made for data centers with bespoke software, not really
         | for distribution.
        
         | dogma1138 wrote:
         | It's worse because it has no intermediate state, so there is no
         | guarantee of forward compatibility (backward compatibility is
         | also kinda broken). Shipping anything with HIP will be a pain.
         | 
         | With CUDA you simply target a specific CUDA version and there
         | is full forward and backwards compatibility on any hardware
         | that supports that version.
        
         | slavik81 wrote:
         | It's not officially supported, but I think it would work if you
         | installed the official ROCm 4.5 packages. The RX 6800 is listed
         | as gfx1030 [1], which has been shipping in most libraries since
         | ROCm 4.2. I've heard there were a few bugs, but I've been using
         | it for months without encountering any issues myself.
         | 
         | (I work for AMD on ROCm. All opinions are my own.)
         | 
         | [1]: https://llvm.org/docs/AMDGPUUsage.html#processors
        
           | techdragon wrote:
           | Can you please emphasise to your management chain how
           | important it is to have less terrible support and developer
           | relations via GitHub? They can close support questions, but
           | basically anything in these repos gets closed as fast as
           | possible, even feature requests and other things that should
           | be left open.
           | 
           | I doubt they have the funding to meaningfully impact the
           | overall hardware and software support matrix, but it would
           | help if they could just make the GitHub repo feel less like
           | I'm back in my days working at a call centre, raising tickets
           | to a second level support team in a foreign country whose only
           | business KPI was tickets closed per day.
        
             | slavik81 wrote:
             | I agree with you. The communication between AMD and the
             | community has been less than ideal.
             | 
             | I think it's worth noting, though, that it's not always as
             | bad as the example in the sibling comment. The
             | RadeonOpenCompute/ROCm repo catches a lot of questions
             | about big features and the future direction of the project.
             | Those are particularly difficult to answer as an engineer.
             | As much as I'd like to, I can't make a product announcement
             | in a GitHub issue.
             | 
             | If you have a specific technical problem and you open an
             | issue on the repo for the corresponding component, you'll
             | probably have a better experience. Some teams are more
             | responsive than others, but that will at least maximize
             | your odds of successful resolution.
        
       ___________________________________________________________________
       (page generated 2021-11-25 23:01 UTC)