[HN Gopher] ROCm is AMD's priority, executive says
___________________________________________________________________
ROCm is AMD's priority, executive says
Author : mindcrime
Score : 215 points
Date : 2023-09-26 17:54 UTC (5 hours ago)
(HTM) web link (www.eetimes.com)
(TXT) w3m dump (www.eetimes.com)
| halJordan wrote:
| The first step is admitting there's a problem. So... that's nice.
| ethbr1 wrote:
| Exactly. People might trust AMD if they continue to invest in
| this for the next 10 years.
|
| It's clear it wasn't a corporate priority. Convince people it
| is via sustained action and investment, and _eventually_ they
| might change their minds.
| clhodapp wrote:
| If they were serious, they would start something like drm/mesa
| but for compute and it would just work out of the box with a
| stock Linux kernel.
| HideousKojima wrote:
| Only 16 years after Nvidia released CUDA
| grubbs wrote:
| I remember chatting with some Nvidia rep at CES 2008. He showed
| me how CUDA could be used to accelerate video upscaling and
| encoding. I was 19 at the time and just a hobbyist. I thought
| that was the coolest thing in the world.
|
| (And yes I "snuck" in to CES using a fake business card to get
| my badge)
| gdiamos wrote:
| Back in the day, using CUDA was really hard. It got better as
| more people built on it and it got battle tested.
| hyperbovine wrote:
| It's still not exactly easy, and the API has not changed
| much since the aughts except to become richer and more
| complicated. But almost nobody writes raw CUDA anymore.
| It's abstracted away beneath many layers of libraries, e.g.
| Flax -> Jax -> lax -> XLA -> CUDA.
| Dah00n wrote:
| You remind me of those kinds of people who are part of
| "team green" or Apple fans. People who wish nothing more
| than to see "the others" fail. A win for their team is good,
| but a failure of the other team is the best thing ever and
| makes them feel all giddy inside.
| jacquesm wrote:
| What a useless comment. It is you who is stoking the fire; I
| would be more than happy with a bit more competition. The sad
| reality is that right now, if you want to focus on your job
| and not on the intermediary layers, NV is pretty much the
| only game in town. The "Team Green" BS came out of the gaming
| world, where people with zero qualifications faced off
| against other people with zero qualifications over whose HW
| was "the best", when "the best" meant: I can play games. But
| this is entirely different; it is about long and deep support
| of a complex hardware/software combo, where whole empires are
| built upon that support. Those are not decisions made lightly,
| and unfortunately AMD has done very poorly so far. This
| announcement is great, but the proof of the pudding will be in
| the eating, so let's see how many engineers they dedicate to
| delivering top-notch software.
| HideousKojima wrote:
| The hilarious thing is I'm actually an AMD fanboy, I've made
| a point to only get their GPUs (and CPUs) for the last decade
| or so. But I'm still annoyed and frustrated that it's taken
| them so long to get their act together on this.
| Havoc wrote:
| I've concluded they're just allergic to money.
|
| Even after it became very clear that this is going to be big,
| they're still slow off the blocks, as if they're not even
| trying.
|
| e.g. Why not make a list of the top 500 people in the AI field
| and send them cards, no strings attached, plus the best low-
| level documentation you can muster? Insignificant cost to AMD,
| but it could move the mindshare needle if even 20 of the 500
| experiment and make some noise about it in their circles.
|
| The Icewhale guys did exactly that, as best as I can tell. A
| 350k USD hardware Kickstarter, so really lean. Yet all the
| YouTubers even vaguely in their niche seem to have one of
| their boards. It's a good board, don't get me wrong, but there
| is no way that was organic. Some sharp marketeer made sure the
| right people have the gear to influence mindshare.
|
| https://www.youtube.com/results?search_query=zimaboard
| [deleted]
| treprinum wrote:
| I suspect it's because they don't want to pay for software
| engineers, as hardware engineers are much cheaper. I was
| contacted by their recruiter last year and it turned out the
| principal engineer salary was at the level of an entry-level
| FAANG salary, so I suspect they can't really source the best
| people.
| jjoonathan wrote:
| My suspicion is that the GPGPU hardware in shipped cards has
| known problems / severe limitations due to neglect of that side
| of the architecture for the last ~10 years. Shipping a bunch of
| cards only to burn the next generation of AMD compute fans as
| badly as they burned the last generation of AMD compute fans
| would _not_ be wise. It's painful to wait, but it may well be
| for the best.
| freeone3000 wrote:
| ROCm on Vega only works on certain motherboards because the
| card lacks a synchronization clock over the PCI bus. They
| added it on _some_ later cards. It's absurd how much is
| lacking and inconsistent.
| gdiamos wrote:
| Instinct has much better SW support today than Radeon, so you
| would need to send MI210s, etc.
|
| I think it's at the point where if you are comfortable with
| GEMM kernels, setting up SLURM, etc it is usable. But if you
| want to stay at the huggingface layer or higher, you will run
| into issues.
|
| Many AI researchers are higher level than that these days,
| but some of us are still willing to go lower level.
| spacecadet wrote:
| Yeah, this. I tried to do some computing with AMD server
| grade cards 2 years ago and found all of the APIs out of
| date and the documentation equally out of date... Went CUDA
| and didn't look back. Sad, cause I'm an AMD fanboy of old.
| tysam_and wrote:
| It seems like Hotz and co are able to move pretty well on it,
| so maybe there's some low-level stuff they're using (or maybe
| they're forced to, for a few reasons) w.r.t. the tinybox, but
| it is impressive how much they've been able to do so far, I
| think. :3 <3 :')))) :')
| simfree wrote:
| The Radeon MI series seems to perform fine if you follow
| their software stack's happy path. Same for using modified
| versions of ROCm on APUs; it's just that no one has been
| willing to invest in paying a few developers to work on
| broader hardware support full-time, so any bugs outside
| enterprise Linux distros on Radeon MI series cards do not get
| triaged.
| roenxi wrote:
| > e.g. Why not...
|
| A key part of progress is choosing the direction to progress
| in. Flashy knee-jerk moves like that sound good, but they
| aren't the fastest way to move forward. The first step (which
| I think they've taken) is for the executives to align on what
| the market wants. The second is to work out how to achieve it,
| the third to do it. Handing out freebies would probably help,
| but it'll take a sustained long-term strategy for AMD to make
| money.
|
| AMD's problem isn't low-level developer interest. The George
| Hotz video rant on AMD was enlightening - the interest is
| there and the official drivers just don't work. A few years
| ago I made an effort to get into reinforcement learning as a
| hobby and was blocked by AMD crashes. At the time I assumed
| I'd done something wrong. I still believe that, but I'm less
| certain now. It is possible that the reason AMD is doing so
| poorly is just that their BLAS code is buggy.
|
| People get very excited about CUDA and maybe everything there
| is necessary, but on AMD the problem seems to be that the card
| can't reliably multiply matrices together. I got some early
| nights using Stable Diffusion because everything worked great
| for an hour and then the kernel panicked. I didn't give AMD
| any feedback because I run an unsupported card and OS -
| effectively all cards and OSs are unsupported - but if that is
| widespread behaviour it would be a grave blocker.
|
| I think they are serious now, though. The ROCm documentation
| dropped a lot of infuriating corporate waffle recently, and
| that is a sign that good people are involved. Still going to
| wait and see before getting too hopeful that it works out
| well.
| jacquesm wrote:
| > Flashy knee-jerk moves like that sound good, but they
| aren't the fastest way to move forward.
|
| NVidia:
|
| - Games -> we're on it
|
| - Machine learning -> we're on it
|
| - Crypto -> we're on it
|
| - LLM / AI -> we're on it
|
| Compare the growth rate of NVidia vs AMD and you get the
| picture. Flashy knee-jerk moves are bad; identifying growth
| segments in your industry and running with them is
| _excellent_ strategy.
|
| People get excited about CUDA _because it works_, and AMD
| could have had a very large slice of that pie.
|
| > on AMD the problem seems to be that the card can't reliably
| multiply matrices together. I got some early nights using
| Stable Diffusion because everything worked great for an hour
| and then the kernel panicked. I didn't give AMD any feedback
| because I run an unsupported card and OS - effectively all
| cards and OSs are unsupported - but if that is widespread
| behaviour[sic] it would be a grave blocker.
|
| Exactly. And with NVIDIA you'd be working on your problem
| instead. And that's what makes the difference. AMD should do
| exactly what the OP wrote: gain mindshare by getting at least
| some researchers on board with their product, assuming they
| haven't burned their brand completely by now.
| seunosewa wrote:
| NVIDIA is focused on graphics cards. AMD also has the tough
| CPU market to worry about.
| jacquesm wrote:
| That's AMD's problem to solve; they made that choice.
|
| NV doesn't have to worry about resource allocation,
| branding, etc. AMD could copy that by spinning out its
| GPU division. Note that "graphics cards" is no longer a
| proper identifier either; they just happen to have
| display connectors on them (and not even all of them do).
| They're more like co-processors that you may also use to
| generate graphics. But I'm not even sure if that's the
| bulk of the applications.
| TheCleric wrote:
| Never half ass two things when you can whole ass one
| thing.
| gravypod wrote:
| If this turns around it will be amazing, but ROCm isn't the
| only issue. The entire driver stack is important. If they came
| out with virtualization support for their GPUs (even if
| everyone paid a 10% perf hit) they'd take over the cheap
| hosted GPU space, which is a huge market.
| mindcrime wrote:
| Getting proper (and official) ROCm support across their
| consumer GPU line will be big as well. Hobbyists aren't buying
| MI300s and their ilk. And surely AMD is better off if a would-
| be hobbyist (or low-budget academic/industrial researcher)
| chooses a Radeon card over something from NVIDIA!
|
| I'm about to buy a high-end Radeon card myself, gambling that
| AMD is serious about this and will get it right, and that it
| won't be a wasted purchase. So yeah, if I seem like an AMD fan-
| boy (I am, somewhat) at least I'm putting my money where my
| mouth is. :-)
|
| _AMD's software stacks for each class of product are separate:
| ROCm (short for Radeon Open Compute platform) targets its
| Instinct data center GPU lines (and, soon, its Radeon consumer
| GPUs),_
|
| They've been saying this for a while, and I'm encouraged by
| reports that people "out there" in the wild have actually
| gotten this to work with some cards, even in advance of the
| official support shipping. So here's hoping they are really
| serious about this point and make this real.
| jauntywundrkind wrote:
| Apologies for the snark, but maybe it's better that _so far_
| AMD has had terrible consumer card support. What little
| hardware they have targeted seems to be barely stable and to
| barely work for the very limited workloads that are
| supported. If regular consumers were told their GPUs would
| work for GPGPU, they might be rotten pissed when they found
| out what the real state of affairs is.
|
| But if AMD really wants a market impact - which is what this
| submission is about - getting good support across a decent
| range of consumer GPUs is absolutely required. They cannot
| win this ecosystem battle with only datacenter mindshare.
| auggierose wrote:
| Yeah, don't. Buy an Nvidia and get shit done.
| bryanlarsen wrote:
| Easier said than done, at least for H100.
| dotnet00 wrote:
| They're talking about consumer cards, which is the point.
| You can learn CUDA on any consumer Nvidia card and have
| it translate to the fancier gear; that's part of why
| Nvidia has so much mindshare.
|
| E.g. I can write my CUDA code on my 3090s, my boss can
| test it on his laptop's discrete GPU, and then after
| that we can take the time to bring it to our V100s and
| A100s and nothing really has to change.
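|
| To illustrate with a PyTorch-flavored sketch of the same
| write-once idea (not literal CUDA, and the model is made up):
|
|     import torch
|
|     # The same script runs unchanged on a 3090, a laptop
|     # GPU, or an A100; only the device found at runtime
|     # differs.
|     device = "cuda" if torch.cuda.is_available() else "cpu"
|     model = torch.nn.Linear(4096, 4096).to(device)
|     x = torch.randn(64, 4096, device=device)
|     print(model(x).shape, device)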
| iforgotpassword wrote:
| A bit harsh, but I agree in that I'll only believe it when
| I see it. I have been burned by empty promises from AMD
| before.
| capableweb wrote:
| For some people, it's not just about getting results or
| "get shit done" but about the journey and learning on the
| way there. Also, AMD's approach to openness tends to be a
| bit better than NVIDIA's, so there's that too. And since
| we're on _Hacker_ News after all, an AMD GPU for the hacker
| betting on the future seems pretty fitting.
| bravetraveler wrote:
| For someone using Linux, an AMD card may be even better
| suited for "getting things done".
|
| Wayland and many things _outside of GPGPU_ are much
| better; i.e. power control/gating/monitoring are all
| available over _sysfs_. You can over/underclock a fleet
| of systems with traditional config management.
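|
| As a taste (a minimal sketch; hwmon paths vary by kernel
| version and card, so treat these as typical, not guaranteed):
|
|     from pathlib import Path
|
|     def amdgpu_power_watts(card="card0"):
|         # amdgpu exposes sensors under the card's hwmon
|         # directory; power1_average reports microwatts.
|         hwmon = Path(f"/sys/class/drm/{card}/device/hwmon")
|         for node in hwmon.glob("hwmon*"):
|             sensor = node / "power1_average"
|             if sensor.exists():
|                 return int(sensor.read_text()) / 1_000_000
|         return None
|
|     print(amdgpu_power_watts())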
|
| GPGPU surely deserves some weight given the context of
| the thread, but let's not ignore the warts Nvidia shows
| elsewhere.
| mindcrime wrote:
| I get where you're coming from, and in fact I am planning
| to build an NVIDIA-based ML box as well. But I pointedly
| want to support AMD here for a variety of reasons,
| including an ideological bias towards Open Source
| Software and a historical affinity for AMD that dates back
| to the mid-'90s.
| Conscat wrote:
| AMD's debuggers and profilers let you disassemble
| kernel/shader machine code and introspect registers and
| instruction latency. That's something at least that Nvidia
| doesn't do with Nsight tools.
| jauntywundrkind wrote:
| Virtualization is such a key ability. I really, really lament
| that it's been tucked away in a couple of specific products
| (the last MxGPU is, what, half a decade old? More? Oh, I guess
| they finally spun off a new one, an RDNA2-based V620!).
|
| I keep close & cherish a small hope that for some use-cases we
| might get a soft virtualization-alike that just works. I don't
| know enough to say how likely this is to adequately work, but
| in automotive & some other places there are nested Waylands,
| designed to share hardware. You still need a shared OS layer, a
| shared kernel, and a compositor that manages all the
| subdesktops - this isn't full virtualization - but
| hypothetically you get something very similar to
| virtualized/VDI GPUs, if you can handle the constraints.
|
| This is really a huge, huge, huge shift that Wayland has
| potentially enabled, by actually using kernel resources like
| DMA-BUFs and whatnot, where apps can just allocate whatever &
| pass the compositor file handles to the bufs. Wayland is
| ground-up, unlike X's top-down. So it's just a matter of
| writing compositors smart enough to push what data from whom
| needs to get rendered and sent out where.
|
| I would love to know more about what hardware virtualization
| really buys, and about the limits of what VDI is possible in
| software. But my hope is that, before too long, there's good
| enough VDI infrastructure that it's basically moot whether a
| GPU has hardware support. There will be some use cases where,
| yes, every user needs to run their own kernel & OS, and that
| won't be supported (although virtio might work around even
| that quite effectively), but for 95% of use cases the more
| modern software stack might make this a non-issue. And at that
| point, these companies might stop having such expensive-ass
| product segmentation, charging 3x as much for a couple of
| hardware virtual devices, since in fact it costs them
| essentially nothing & the software virtualization is so
| competitive.
| 01100011 wrote:
| As far as I understand it, AMD basically has to do this because
| games are going to increasingly rely on LLMs & generative AI
| operating simultaneously with the graphics pipeline.
| imbusy111 wrote:
| It has nothing to do with games. The market outside of games
| for compute is much bigger at the moment with the AI hype, and
| AMD is positioned to take a good slice of it, if they get their
| software stack in order.
| alex21212 wrote:
| ROCm and AMD drive me nuts. The lack of support for consumer
| cards and the hassle of getting basic things in PyTorch to
| just work were too much.
|
| I was burned by support that never came for my 6800 XT.
| Recently went back to NVIDIA with a 4070 for PyTorch.
|
| I hope AMD gets their act together with ROCm, but I'm not
| going to buy an AMD GPU until they actually fix it rather than
| just vaguely promise to add support some day ...
| zucker42 wrote:
| Exactly. I recently started an NN side project. The process for
| setting up PyTorch was to run `pacman -S cuda` and `pip install
| torch`. I was using a GTX 1060. If it was a project with a
| bigger budget, I could have rented servers from AWS with all
| the software preinstalled in no time. I don't even know if it
| would have been possible for me to do it with AMD, even if I
| owned an AMD graphics card.
|
| People like me are small potatoes to AMD, but surely it's hard
| to make significant inroads when it's impossible for anyone to
| learn or do small projects on ROCm, and big projects can't
| rely on ROCm just working.
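|
| (For the record, there is nominally an equivalent path now:
| PyTorch publishes ROCm wheels on its own index, e.g. something
| like `pip install torch --index-url
| https://download.pytorch.org/whl/rocm5.6`, and a ROCm build
| reuses the torch.cuda API, so the check is just:
|
|     import torch
|
|     # On ROCm builds torch.version.hip is set and torch.cuda.*
|     # transparently targets the AMD GPU.
|     print(getattr(torch.version, "hip", None))
|     print(torch.cuda.is_available())
|
| Whether that works on any given consumer card is, of course,
| exactly the problem being discussed.)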
| jacquesm wrote:
| People like you are small potatoes until you have some
| measure of success and then suddenly you're burning up GPU
| hours by the truckload and whatever you're used to you will
| continue using.
| Tsiklon wrote:
| I think AMD need to do something BIG in the enterprise space.
| Nvidia seem to have the lion's share of the market, but Intel
| have been making good strides there with their DC GPUs.
|
| The software stack is the key here: it doesn't matter what
| paper capabilities your product has if the drivers aren't
| there and you can't use it.
|
| AMD have, on paper, done well on performance in recent
| generations of consumer cards, but their drivers universally
| seem to be the letdown in making the most of their
| architecture.
| therealmarv wrote:
| They have! At one of their keynotes this summer they announced
| a direct competitor to Nvidia's AI chips for enterprises: the
| MI300X.
|
| https://www.anandtech.com/show/18915/amd-expands-mi300-famil...
|
| The software stack is crucial of course, but if you buy this
| kind of chip (meaning you have a lot of money) you can
| probably also optimise your stack for it for some extra bucks,
| to avoid relying on Nvidia's supply.
| vegabook wrote:
| With all due respect this is an insult to those of us who have
| loyally purchased AMD for numerous years, trying our very best to
| do compute with days, nay weeks, of attempts.
|
| Now, 5 years too late, we get told it's suddenly their number
| one priority.
|
| Too late. Not only has all goodwill gone, but it's in deep
| negative territory. Even 50%-lower-performance stacks like
| Intel / Apple are much more appealing than AMD will ever be
| at this stage.
| capableweb wrote:
| "senior VP of the AI group at AMD", said at a "AI Hardware
| Summit" that "My area is AMDs No. 1 Priority".
|
| Tell me when the rest of the company aligns with you and has
| started to show any results in providing a good experience for
| people to do machine learning with AMD. As it stands right now,
| there is so much tooling missing, and the tooling that's there is
| severely lacking.
|
| But I have faith. They've reinvented themselves with CPUs,
| multiple times, so why not with GPUs, again?
| mindcrime wrote:
| _Tell me when the rest of the company aligns with you_
|
| More or less the same message has been promulgated[1][2] by no
| less than Lisa Su[3], FWIW.
|
| [1]: https://www.phoronix.com/news/Lisa-Su-ROCm-Commitment
|
| [2]: https://www.forbes.com/sites/iainmartin/2023/05/31/lisa-su-s...
|
| [3]: https://en.wikipedia.org/wiki/Lisa_Su
| no_wizard wrote:
| The inevitable fight here is between CUDA and ROCm, which may
| have 100s of AMD engineers working on it and related verticals
| at best (absent significant changes at the company), plus
| whatever contributions they can muster from the community.
|
| At last headcount check, I think CUDA had _thousands_ of
| engineers working on it and related verticals.
|
| I know there's a philosophy that says that, eventually, open
| source eats everything; however, there is so much catching up
| to do here that AMD will need to spend big and fast to get off
| the ground competitively.
| [deleted]
| martinald wrote:
| It's absolutely mindboggling to me that AMD is still struggling
| so badly on this.
|
| There is an absolutely enormous market for AMD GPUs for this, but
| they seem to be completely stuck on how to build a developer
| ecosystem.
|
| Why aren't AMD throwing as many developers as possible at
| submitting PRs adding ROCm support to open source LLM efforts,
| for example?
|
| It would give AMD real-world insight into the problems with
| their drivers and SDKs as well, which are incredibly numerous.
|
| People would be willing to overlook a huge amount of jank for
| cheap(er) cards with large VRAM configurations. I don't think
| they even need to be particularly fast, just have the VRAM
| needed, which I'm sure AMD could put specialist cards together
| for.
| hedgehog wrote:
| Historically they believed that "the community" would address
| broader ML software support. I think the idea was they could
| assign dedicated engineers to bigger customers, and together
| constraints as a company. Even in retrospect I'm not sure if
| that was a good call or not.
| Almondsetat wrote:
| I mean, they _would_ be right if all their cards, both
| consumer and enterprise, supported the same programming
| interface.
|
| You cannot trust the community to do the work for you but
| then only make the software available on $Xk cards.
| ryukoposting wrote:
| s/OpenCL/ROCm/g
| pixelpoet wrote:
| Oh man, this is exactly what I want to see on HN frontpage!
|
| I commented on another article about an AMD chip that had no
| OpenCL support that this made it dead in the water for me, and
| was downvoted; surely everyone understands how important CUDA
| is, and everyone should understand how important open
| standards are (e.g. FreeSync vs Nvidia's G-Sync), so I can't
| understand why more people don't share my zeal for OpenCL.
|
| I've shipped two commercial products based on it which still
| work perfectly today on all 3 desktop platforms, with GPUs
| from all vendors... what's not to love?
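|
| The whole pitch in miniature (a sketch using the pyopencl
| bindings; the same script runs on AMD, Nvidia, and Intel
| runtimes):
|
|     import numpy as np
|     import pyopencl as cl
|
|     a = np.random.rand(50_000).astype(np.float32)
|     b = np.random.rand(50_000).astype(np.float32)
|
|     ctx = cl.create_some_context()  # any vendor's device
|     queue = cl.CommandQueue(ctx)
|     mf = cl.mem_flags
|     a_g = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR,
|                     hostbuf=a)
|     b_g = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR,
|                     hostbuf=b)
|     out_g = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)
|
|     prg = cl.Program(ctx, """
|         __kernel void add(__global const float *a,
|                           __global const float *b,
|                           __global float *out) {
|             int i = get_global_id(0);
|             out[i] = a[i] + b[i];
|         }""").build()
|
|     prg.add(queue, a.shape, None, a_g, b_g, out_g)
|     out = np.empty_like(a)
|     cl.enqueue_copy(queue, out, out_g)
|     assert np.allclose(out, a + b)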
| tysam_and wrote:
| If they can make a 288 GB $4.4-6.8k prosumer, home-computer-
| friendly graphics card, I will be extremely happy. Might be a
| pipe dream (today at least, lol, and standard in like...what, 5
| years?), but if they can pull that off, then I think things
| would really change a lot.
|
| I don't care if it's slow, bottom-of-the-barrel GDDR6, or
| whatever; just being able to enter the high-end model
| finetuning & training regime for ML models on a budget
| _without_ dilly-dallying with multiple graphics cards (a
| monstrous pain-in-the-neck from a software, engineering, &
| experimentation perspective) would enable so much large-scale
| development work to happen.
|
| The compute is extremely important, and in most day-to-day
| usecases the memory bandwidth even more so, but boy oh boy
| would I love to enter the world offered by a large unified
| card architecture.
|
| (Basically, in my experience, parallelizing a model across
| multiple GPUs is like compiling code down to a binary --
| technically you can "edit" it, but it's like directly hex-
| editing strings in a binary blob: extremely limited. Hence why
| I try to stick with models that take only a few seconds
| (minutes at most) to train on highly-representative tasks,
| distill first principles, and then expand and exploit that to
| other modalities from there.)
| Conscat wrote:
| OpenCL isn't very useful now that we have Vulkan. Its biggest
| advantage is that there exist C++ compilers for its kernels.
| But AMD's OpenCL runtime inserts excessive memory barriers not
| required by the spec (they won't fix this due to Hyrum's Law)
| and Vulkan gives you more control over memory allocation and
| synchronization anyway. If we had better Vulkan shader
| compilers, OpenCL would serve basically no purpose, at least
| for AMD hardware.
| cpill wrote:
| AI libs could use it and we'd break the bonds of CUDA. Also,
| Rust might get an implementation, which would give it an
| opening to overtake C++.
| pjmlp wrote:
| No it wouldn't, not until it provides the same polyglot
| support and graphical tooling as CUDA.
|
| At least Intel is trying to move in that direction with
| oneAPI.
| raphlinus wrote:
| Yeah, that's a big if. In theory there's nothing preventing
| good compilation to Vulkan compute shaders; in practice
| people just aren't doing it, as CUDA actually works today.
|
| I also agree that Vulkan is more promising than OpenCL. With
| recent extensions, it has real pointers (buffer device
| address), cooperative matrix multiplication (also known as
| tensor cores or WMMA), scalar types other than 32 bits,
| proper barriers (including device-scoped ones, needed for
| single-pass scan), and other important features.
| 20k wrote:
| It's not that they're supporting buggy code; they just
| downgraded the quality of their implementation significantly.
| They made the compiler a lot worse when they switched to ROCm.
|
| https://github.com/RadeonOpenCompute/ROCm-OpenCL-Runtime/iss...
| is the tracking issue for it, filed a year ago, which appears
| to be wontfix largely because it's a lot of work.
|
| OpenCL still unfortunately supports quite a few things that
| Vulkan doesn't, which makes swapping away very difficult for
| some use cases.
| parl_match wrote:
| > I can't understand why more people don't share my zeal for
| OpenCL.
|
| When I last worked with it, it was difficult, unstable, and
| performed poorly. CUDA, on the other hand, has been nothing
| but good - well, Nvidia pricing aside ;)
|
| OpenCL might be a lot better now, but a lot of us remember
| when it was actively a bad choice.
| Vvector wrote:
| But is this just more BS from AMD?
|
| https://www.bit-tech.net/reviews/tech/cpus/amd-betting-every...
| AMD Betting Everything on OpenCL (2011)
| jjoonathan wrote:
| I'm pretty sure the NVDA pump finally convinced the AMD board
| / C-Suite to prioritize this, but it takes time to steer a
| big ship. I'm hopeful, but there are still bad incentives to
| jump the gun on announcements so I'll let others take the
| plunge first.
| kldx wrote:
| > I've shipped two commercial products based on it which still
| work perfectly today on all 3 desktop platforms, with GPUs
| from all vendors... what's not to love?
|
| In my experience, if commercial products involved any sort of
| hand-optimized, proprietary OpenCL, one would be shocked by the
| lack of documentation and zero consistency across AMD's GPUs.
| Intel has SPIRV and Nvidia has PTX and this works pretty well.
| But some AMD cards support SPIR or SPIRV, and some don't, and
| this support matrix keeps changing over time without a single
| source of truth.
|
| Throw in random segfaults inside AMD's OpenCL implementation
| and you have a fun day debugging!
|
| Dockerizing OpenCL on AMD is another nightmare I don't want to
| get into. For Intel it is literally just installing the
| compute runtime and mapping `/dev/dri` inside the container.
| On paper AMD has the same process, but in reality I had to run
| `LD_DEBUG=binding` so many times just to figure out why the
| AMD runtime breaks inside Docker.
|
| There may be great upsides to AMD's hardware in other domains,
| though.
| jjoonathan wrote:
| For a long time, AMD promoted OpenCL as viable without it
| actually being viable. This leaves scars and resentment. Mine
| come from about 10 years ago. They run deep.
|
| I'm glad to hear your experience was better, but I'm fresh out
| of trust. This time, I need to see major projects in my
| application areas working on AMD _before_ I buy, because AMD
| has taught me that "trust us" and "just around the corner" can
| mean "10 years later and it still hasn't happened." I'm pretty
| sure that this time _is_ different, but the green tax is dirt
| cheap compared to learning this lesson the hard way, so I'm
| letting others jump first this time.
| gdiamos wrote:
| Relevant, we deployed Lamini on hundreds of MI200 GPUs.
|
| Lisa tweet: https://x.com/LisaSu/status/1706707561809105331?s=20
|
| Lamini tweet:
| https://x.com/realSharonZhou/status/1706701693684154766?s=20
|
| Blog: https://www.lamini.ai/blog/lamini-amd-paving-the-road-to-gpu...
|
| Register:
| https://www.theregister.com/2023/09/26/amd_instinct_ai_lamin...
| CRN: https://www.crn.com/news/components-peripherals/llm-startup-...
|
| The hard part about using any AI chips other than NVIDIA's has
| been software. ROCm is finally at the point where it can train
| and deploy LLMs like Llama 2 in production.
|
| If you want to try this out, one big issue is that software
| support is hugely different on Instinct vs Radeon. I think AMD
| will fix this eventually, but today you need to use Instinct.
|
| We will post more information explaining how this works in the
| next few weeks.
|
| The middle section of the blog post above includes some details
| including GEMM/memcpy performance, and some of the software
| layers that we needed to write to run on AMD.
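|
| To give a flavor of the single-GPU PyTorch path on Instinct (a
| rough sketch, not our production stack; it is identical to the
| CUDA path because ROCm builds of PyTorch expose HIP through
| torch.cuda):
|
|     import torch
|     from transformers import (AutoModelForCausalLM,
|                               AutoTokenizer)
|
|     name = "meta-llama/Llama-2-7b-hf"
|     tok = AutoTokenizer.from_pretrained(name)
|     model = AutoModelForCausalLM.from_pretrained(
|         name, torch_dtype=torch.float16
|     ).to("cuda")  # "cuda" maps to the AMD GPU under ROCm
|
|     inputs = tok("Hello", return_tensors="pt").to("cuda")
|     out = model.generate(**inputs, max_new_tokens=32)
|     print(tok.decode(out[0]))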
| mardifoufs wrote:
| What's the cost benefit vs. Nvidia? Is it cheaper?
| light_hue_1 wrote:
| You simply cannot buy Nvidia GPUs at scale at the moment.
| We're getting quotes that are many months out, sometimes even
| a year+ out.
| gdiamos wrote:
| We kept hearing 52 weeks for new shipments.
| gdiamos wrote:
| Available in orders of up to 10,000 GPUs today - no shortage
|
| More than 10x cheaper than allocating machines on a tier 1
| cloud - AWS, Azure, GCP, Oracle, etc
|
| More memory - 128 GB HBM per GPU - means bigger models fit for
| training/inference without the nightmare of model parallelism
| over MPI/InfiniBand/etc.
|
| Longer term - finetuning optimizations
| mardifoufs wrote:
| Ah! The memory sounds interesting. How would that compare
| to similar Nvidia hardware w.r.t. cost, assuming the
| hardware was available?
|
| Does AMD provide something similar to nvlink, and even
| libraries like cudnn?
|
| Also, last I checked, none of the public clouds offered
| any of the latest-gen MI GPUs, so I wasn't aware that it
| had good availability! Azure had a preview but I'll look
| more into it now.
|
| Thank you for your answer btw!
| gdiamos wrote:
| Yeah getting around the no public cloud thing was really
| annoying. We had to build our own datacenter.
|
| On the plus side, it was drastically cheaper and now we
| can just slot in machines.
|
| I would prefer that a tier 1 cloud made MI GPUs available
| though. It would make it so much more accessible.
| gdiamos wrote:
| See the memory size comparison (GB) in this table:
| https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_proces...
| tbihl wrote:
| It blows my mind that A100 and H100 are each safely below
| 1000W power draw.
| gardnr wrote:
| The classic economic benefits of competition:
|
| * Drives down price
|
| * Enhances product features (I see them competing on VRAM
| first)
|
| * Helps to insulate buyers from supply issues
|
| Nvidia has kneecapped their consumer grade hardware to ensure
| the gaming market still has scraps to buy in spite of crypto
| mining and the AI gold rush. All AMD would have to do to eat
| into Nvidia marketshare is remove the hardware locks in low-
| end cards and ship one with 64GB+ of VRAM.
|
| This of course would only work if they have comparable/usable
| software support. Any improvements to ROCm will be a boon for
| any company that doesn't already have or can't afford huge
| farms of high-end Nvidia chips.
| jauntywundrkind wrote:
| > _If you want to try this out, one big issue is that software
| support is hugely different on Instinct vs Radeon. I think AMD
| will fix this eventually, but today you need to use Instinct._
|
| I'm really, really worried about AMD and whether they're going
| to care about anyone else. They might just care about
| Instinct, where margins are so high, and ignore consumer cards
| or add more friction and segmentation for consumer cards.
|
| Part of what made CUDA so successful was that the low hardware
| barrier to entry created such a popular offering. Everyone used
| it. I really hope AMD realizes that, and really hope AMD
| invests in consumer card software too. Just making it work on
| the high end doesn't seem enough to get the kind of mass-
| movement ecosystem success AMD really needs. I'm afraid they
| might go for a smaller win, try to compete only at the top.
| dotnet00 wrote:
| It's nice to hear that there are actual results to show, since
| AMD execs simply saying that ROCm is a priority isn't really
| convincing anymore given their track record on claims regarding
| support on the consumer side.
| viewtransform wrote:
| The difference this time is that the executive is from
| Xilinx. Xilinx has had an AI software development team for a
| while in the FPGA space.
|
| AMD has had poor management in the GPU computing space since
| Raja Koduri's time (he put the best engineering resources on
| VR during his tenure and ignored deep learning). Subsequent
| directors have not had a long term vision and left within a
| few years.
|
| Looks like Lisa Su has corrected this now - they seem to have
| moved AMD software engineers en masse to work under Xilinx
| management on AI. Remains to be seen if this new management
| hierarchy will have a better vision and customer focus.
| varelse wrote:
| [dead]
| tbruckner wrote:
| I would really hope you could get decent utilization on ops as
| fundamental as GEMM/memcpy on a single device. Translating that
| to MFU is a completely different story.
| gdiamos wrote:
| We get good utilization at scale as well. Typically 30-40% of
| peak at the full application level for training and
| inference.
|
| Perf isn't the biggest problem, though; many AI chips can do
| this or a bit better on benchmarks, if you invest the
| engineering time to tune the benchmark.
|
| The really hard part is getting a complete software stack
| running.
|
| It took us over 3 years because many of the layers just
| didn't exist, e.g. a scale-out LLM inference service that
| supports multiple requests with fine-grained batching across
| models distributed over multiple GPUs.
|
| On Instinct, ROCm gets you the ability to run most PyTorch
| models on one GPU, assuming you get the right drivers,
| compilers, framework builds, etc.
|
| That's a good start, but you need more to serve a real
| application.
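|
| (A toy model of the fine-grained batching piece, since that is
| the kind of layer that simply didn't exist; hypothetical
| names, nothing like our actual code:
|
|     import asyncio
|
|     async def batcher(queue, run_on_gpu, max_batch=8,
|                       window=0.01):
|         # Group whatever requests arrive within a short
|         # window and run them through the model as one
|         # batch, so concurrent users share a forward pass.
|         while True:
|             batch = [await queue.get()]
|             while len(batch) < max_batch:
|                 try:
|                     req = await asyncio.wait_for(
|                         queue.get(), window)
|                 except asyncio.TimeoutError:
|                     break
|                 batch.append(req)
|             run_on_gpu(batch)
|
| and then one of these loops per model replica, per GPU.)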
| mgaunard wrote:
| People have been using their GPGPUs for decades on a
| variety of scientific applications, and there are all kinds
| of hybrid and multi-device frameworks that exist (often
| supporting multiple backends).
|
| The difference is that it didn't get a lot of love as part
| of the overhyped Python LLM movement.
| gdiamos wrote:
| Completely agree, I'd love to see some of the innovations
| from HPC move over into their LLM stack.
|
| We are working on it, but it takes time.
|
| Contributions to foundational layers like rocBLAS,
| PyTorch, Slurm, Tensile, HuggingFace, etc. would help.
| dauertewigkeit wrote:
| With all this hype about CUDA, I have recently started looking
| into programming CUDA as a job as I love that kind of challenge,
| but to my dismay I found that these tasks are very niche. So it
| is not even that people are routinely writing new CUDA code. It's
| just that the current corpus is too big and comprehensive for
| alternatives to compete with.
| jacquesm wrote:
| That and a massive amount of experience already out there on
| how to optimize for that particular architecture. NVidia has
| done well for itself on the back of four sequential very good
| bets coupled with dedication unmatched by any other vendor,
| both on the hardware and on the software side. It also was one
| of the few times that I didn't care if I ran the vendor
| supplied closed source stuff because it seemed to work just
| fine and I never had the feeling they would suddenly drop
| support for my platform.
| coder543 wrote:
| Specialized skills can have a fairly small job market
| sometimes. I think a lot of CUDA code ends up being
| foundational as part of popular libraries, supporting tons of
| applications that never need to write a single line of CUDA
| themselves.
| ckastner wrote:
| The Debian ROCm Team [1] has made quite a bit of progress in
| getting the ROCm stack into the official Archive.
|
| Most components are already packaged, the next big target is
| adding support to the PyTorch package.
|
| Many of the packages are older versions; this is because getting
| broad coverage was prioritized. The other next big target that is
| currently being worked on is getting full ROCm 5.7 support.
|
| I fully expect Debian 13 (trixie) to come with full ROCm support
| out-of-the-box, and as a consequence, also derivatives to have
| support (Ubuntu above all). In fact, there will almost certainly
| be backports of ROCm 5.7 to Debian 12 (bookworm) within the next
| few months, so one will be able to just:
|
|     $ sudo apt-get install pytorch-rocm
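|
| A quick smoke test afterwards (a sketch, assuming the ROCm
| build of PyTorch):
|
|     import torch
|     print(torch.cuda.is_available())   # True if GPU usable
|     print(torch.cuda.get_device_name(0))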
|
| One current obstacle is infrastructure: the Debian build and CI
| infrastructures (both hardware and software) were not designed
| with GPUs in mind. This is also being worked on.
|
| Edit: forgot to say that the CI infra that the Team is setting up
| here tests all of these packages on consumer cards, too. So while
| there may not be _official_ support for most of these, upstream
| tests passing on the cards within the infra should be a good
| indication for _practical_ support.
|
| [1] https://salsa.debian.org/rocm-team/
| avcxz wrote:
| I'd also like to point out that ROCm has been packaged for Arch
| Linux since the beginning of 2023, with efforts going back to
| March 2020 [1].
|
| Currently on Arch Linux you can run the following successfully:
|
|     $ sudo pacman -S python-pytorch-rocm
|
| Arch Linux even has ROCm support in Blender.
|
| [1] https://github.com/rocm-arch
| mgaunard wrote:
| AMD has a history of providing sub-par software, and their
| strategy of (partially) opening up their specifications and
| having other people write the software for free didn't work
| either.
|
| Nvidia has huge software teams, and so does Intel.
| mindcrime wrote:
| I don't know if they'll ultimately succeed or not, but they at
| least seem to be putting genuine effort into this. ROCm
| releases are coming out at a relatively nice clip[1], including
| a new release just a week or two ago[2].
|
| [1]: https://github.com/RadeonOpenCompute/ROCm/releases
|
| [2]: https://www.phoronix.com/news/AMD-ROCm-5.7-Released
| Vvector wrote:
| Yeah, AMD is doing more with ROCm. But are they catching up
| to Nvidia, or just not falling behind as fast as before? Only
| time will tell.
| dagw wrote:
| Not only sub-par software, but sub-par software that they drop
| support for after a couple of years. People can work around the
| problems with sub-par software if they believe that it will
| benefit them long term. They will absolutely not put in the
| effort if they fear it will be completely useless in 2 years
| time.
| raphlinus wrote:
| ROCm makes me sad, as it reminds me of how much better GPUs could
| be than they are today.
|
| I've lately been exploring the idea of a "Good Parallel
| Computer," which combines most of the agility of a CPU with the
| efficient parallel throughput of a GPU. The central concept is
| that the decision to launch a workgroup is made by a programmable
| controller, rather than just being a cube of (x, y, z) or
| downstream of triangles. A particular workload it would likely
| excel at is sparse matrix multiplication, including multiple
| quantization levels like SpQR[1]. I'm hopeful that it could be an
| advance in execution model, but also a simplification, as I
| believe a lot of the complexity of the current GPU model is
| because of lots of workarounds for the weak execution model.
|
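| To make that concrete, here is a toy software model of the
| idea (purely illustrative, all names invented): workgroups
| launch when their input queues have data, instead of from a
| fixed (x, y, z) grid.
|
|     from collections import deque
|
|     class Controller:
|         # Toy "programmable launcher": fire a workgroup as
|         # soon as every one of its input queues is non-
|         # empty. A stage with two inputs is a join for free.
|         def __init__(self):
|             self.stages = []
|
|         def stage(self, inputs, kernel, output):
|             self.stages.append((inputs, kernel, output))
|
|         def run(self):
|             progress = True
|             while progress:
|                 progress = False
|                 for inputs, kernel, output in self.stages:
|                     while all(inputs):  # all queues ready
|                         args = [q.popleft() for q in inputs]
|                         output.append(kernel(*args))
|                         progress = True
|
|     # Usage: a two-input join stage.
|     a, b, out = deque([1, 2]), deque([10, 20]), deque()
|     c = Controller()
|     c.stage([a, b], lambda x, y: x + y, out)
|     c.run()
|     print(list(out))  # [11, 22]
|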
| I'm not optimistic about this being built any time soon, as it
| requires rethinking the software stack. But it's fun to think
| about. I might blog about it at some point, but I'm also
| interested in connecting with people who have been thinking along
| similar lines.
|
| [1]: https://arxiv.org/abs/2306.03078
| johncolanduoni wrote:
| How does this differ from CUDA's dynamic parallelism, which
| lets you launch kernels from within a kernel?
| raphlinus wrote:
| There are a lot of similarities, but the granularity is
| finer. The idea is that you make a decision to launch one
| workgroup (typically 1024 threads) when its input is
| available, which would typically be driven by queues, and
| potentially with joins as well - which is something the new
| work graph stuff can't quite do. Otherwise the idea of stages
| running in parallel, connected by queues, is similar. But I
| did an analysis of work graphs and came to the conclusion
| that they wouldn't help with the Vello (2d vector graphics)
| workload at all.
| JonChesterfield wrote:
| A workgroup/kernel can launch other ones without talking to the
| host. Like CUDA's dynamic parallelism, except with no nested
| lifetime restrictions. This is somewhat documented under the
| name HSA.
|
| It involves getting a pointer to an HSA queue and writing a
| dispatch packet to it. It's the same interface the host has for
| launching kernels - easier in some ways (you've got the kernel
| descriptor as a symbol, not as a name to dlsym) and harder in
| others (dynamic memory allocation is a pain).
| raphlinus wrote:
| Yeah, dynamic memory allocation from GPU space seems to be
| the real sticking point. I'll look into HSA queues; that
| looks very interesting, thanks.
___________________________________________________________________
(page generated 2023-09-26 23:00 UTC)