[HN Gopher] AMD may get across the CUDA moat
___________________________________________________________________
AMD may get across the CUDA moat
Author : danzheng
Score : 218 points
Date : 2023-10-06 17:35 UTC (5 hours ago)
(HTM) web link (www.hpcwire.com)
(TXT) w3m dump (www.hpcwire.com)
| pama wrote:
| There is only limited empirical evidence of AMD closing the gap
| that NVidia has created in science or ML software. Even when
| considering pytorch alone, the engineering effort to maintain
| specialized ROCm solutions alongside CUDA ones is not trivial
| (think flashattention, or any customization that optimizes your
| own model). If your GPUs only need to run a simple ML workflow
| nonstop for a few years, maybe there exist corner cases where
| the finances make sense. It is hard for AMD now to close the
| gap across the scientific/industrial software base of CUDA.
| NVidia feels like a software company for the hardware they
| produce; luckily they make their money from hardware and thus
| don't lock down the software libraries.
|
| (Edited "no" to limited empirical evidence after a fellow user
| mentioned El Capitan.)
| Certhas wrote:
| The fact that El Capitan is AMD says that at least for
| Science/HPC there definitely is evidence of a closing gap.
| pama wrote:
| Thanks. You are actually right that this new supercomputer
| might move the needle once it is in production mode. I will
| wait and see how it goes.
| fotcorn wrote:
| ROCm has HIP (1) which is a compatibility layer to run CUDA
| code on AMD GPUs. In theory, you only have to adjust #includes,
| and everything should just work, but as usual, reality is
| different.
|
| Newer backends for AI frameworks like OpenXLA and OpenAI Triton
| directly generate GPU-native code using MLIR and LLVM; they do
| not use CUDA apart from some glue code to actually load the
| code onto the GPU and get the data there. Both already support
| ROCm, but from what I've read the support is not yet as mature
| as it is for NVIDIA.
|
| 1: https://github.com/ROCm-Developer-Tools/HIP
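|
| To make the Triton point concrete, here is a minimal vector-add
| sketch (a hypothetical example of mine, assuming a triton
| install plus a torch build matching your GPU; not taken from
| either project's docs). The same Python source is lowered
| through MLIR/LLVM to native code for whichever vendor's card is
| present:
|
|     import torch
|     import triton
|     import triton.language as tl
|
|     @triton.jit
|     def add_kernel(x_ptr, y_ptr, out_ptr, n, BLOCK: tl.constexpr):
|         # Each program instance handles one BLOCK-sized slice.
|         offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
|         mask = offs < n  # guard the ragged tail
|         x = tl.load(x_ptr + offs, mask=mask)
|         y = tl.load(y_ptr + offs, mask=mask)
|         tl.store(out_ptr + offs, x + y, mask=mask)
|
|     x = torch.rand(4096, device="cuda")  # "cuda" covers ROCm too
|     y = torch.rand(4096, device="cuda")
|     out = torch.empty_like(x)
|     grid = (triton.cdiv(x.numel(), 1024),)
|     add_kernel[grid](x, y, out, x.numel(), BLOCK=1024)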
| binarymax wrote:
| And the question that remains for most once AMD catches up:
| will the duopoly bring prices down to a reasonable level for
| hobbyists or bootstrapped startups, or will AMD just gouge like
| NVidia?
| evanjrowley wrote:
| AMD prices will go up because of the newfound ability to gouge
| for AI/ML/GPGPU workloads. Nvidia's will likely go down, but I
| don't expect it will be by much. The market demand is high, so
| the equilibrium price will also be high. Supply isn't at
| pandemic / crypto-rush lows, but the supply of cards useful for
| CUDA/ROCm still is.
| klysm wrote:
| A simplistic economic take would suggest that competition
| results in lower prices, but with only two players in the
| market, who knows.
| sumtechguy wrote:
| It is oligopoly pricing.
|
| https://www.investopedia.com/terms/o/oligopoly.asp
|
| With that few competitors, pricing would not change much.
| ad404b8a372f2b9 wrote:
| Prices seemed to drop when AMD came out with CPUs
| competitive with Intel's.
| tibbydudeza wrote:
| Price difference between 13900K and AMD Ryzen 9 7950x is
| not big - the latest 7950X3D is about on par with the
| higher clocked 13900KS as well.
| redeeman wrote:
| because intel lowered their prices
| AnthonyMouse wrote:
| That's mostly when there isn't a lot of price elasticity of
| demand. If you're Comcast and Verizon, each customer wants
| one internet connection and you're not going to change the
| size of the market much by offering better prices.
|
| If you're AMD and NVIDIA and lowering the price would
| double the number of customers, you might very well want to
| do that, unless you're supply constrained -- which has been
| the issue because they're both bidding against everyone
| else for limited fab capacity. But that should be
| temporary.
|
| This is also a market with a network effect. If all your
| GPUs are $1000 and nobody can afford them then nobody is
| going to write code for them, and then who wants them? So
| the winning strategy is actually to make sure that there
| are kind of okay GPUs available for less than $300 and make
| sure lots of people have them, then sell very expensive
| ones that use the same architecture but are faster.
|
| That has been the traditional model, but the lack of
| production capacity meant that they've only been making the
| overpriced ones recently. Which isn't actually in their
| interests once the supply of fab capacity loosens up.
| binarymax wrote:
| My intuition is that if AMD had had a competing product
| earlier, it would have kept prices down. But since Nvidia has
| shown what the market will pay, AMD won't be able to resist
| overcharging. It will probably come down a little, but nowhere
| near the point of affordability.
|
| I sure hope I'm wrong.
| tyre wrote:
| AMD might have to charge less to break into customers that
| are already bought into Nvidia. There has to be a discount
| to cover the switching costs + still provide savings (or
| access).
| zirgs wrote:
| AMD will have to provide a REALLY steep discount to
| convince me to come back.
| wil421 wrote:
| Why would their investors allow anything else? I'm sure they
| see it as a huge loss, like Intel and mobile.
| quitit wrote:
| I think in this case the changes needed to make AMD useful will
| open the market to other players as well (e.g. Intel).
|
| PyTorch is already walking down this path, and while CUDA-based
| performance is significantly better, that is changing and is of
| course an area of continued focus.
|
| It's not that people don't like Nvidia, rather it's just that
| there is a lot of hardware out there that can technically
| perform competitively, but the work needs to be done to bring
| it into the circle.
| binarymax wrote:
| Last I checked, the H100 was about two gens more
| advanced for certain components (tensor cores, bfloats,
| cache, mem bandwidth) - but my research may have been wrong,
| as admittedly I'm not as familiar with AMD's GPU
| offerings.
| FuriouslyAdrift wrote:
| They are not behind...
| https://www.tomshardware.com/news/amd-expands-mi300-with-gpu...
|
| You can also actually buy them, as opposed to the nVidia
| offerings, which you are going to have to fight for.
| rdsubhas wrote:
| Demand will push AMD prices up by a couple hundred bucks and
| Nvidia cards down by a couple hundred bucks. A hobbyist
| customer will be neither better nor worse off.
| rafaelmn wrote:
| If the margins and demand are there, Intel will eventually
| show up.
| Havoc wrote:
| Is either in doubt?
| rafaelmn wrote:
| Wouldn't be surprised if a bunch of this investment is a hype
| bubble and a demand correction forces a price correction.
| Maybe not immediately, but at Intel's pace - they managed to
| miss the mining bubble entirely - I wouldn't be surprised if
| they release into a correction.
| wmf wrote:
| Intel already showed up three or four times but their
| software is as bad as AMD's used to be.
| ilc wrote:
| Thankfully, software can be fixed over time as AMD has
| shown. Lack of another competitor can't be fixed as easily.
| ris wrote:
| I don't understand the author's argument (if there is one) -
| pytorch has existed for ages, and AMD's Instinct MI* range has
| existed for years now. If these are the key ingredients, why
| has it not already happened?
| jiggawatts wrote:
| Can I buy an MI300 or even rent one in a cloud?
| pjmlp wrote:
| Unless they get their act together regarding CUDA polyglot
| tooling, I seriously doubt it.
| javchz wrote:
| CUDA is the only reason I have an Nvidia card, but if more
| projects start migrating to a more agnostic environment, I'll
| be really grateful.
|
| Running Nvidia on Linux isn't much fun. Fedora and Debian can
| be incredibly reliable systems, but when you add an Nvidia
| card, I feel like I'm back on Windows Vista, with kernel
| crashes from time to time.
| kombine wrote:
| I use a rolling distro (OpenSUSE Tumbleweed) and have had zero
| issues with my NVIDIA card despite it pulling the kernel and
| driver updates as they get released. The driver repo is
| maintained by NVIDIA itself, which is amazing.
| filterfiber wrote:
| Do you use wayland, multiple monitors, and/or play games or
| is it just for ML/AI?
| smoldesu wrote:
| I do all of those things with my 3070 and it works just
| fine. Most of them will depend on your DE's Wayland
| implementation.
|
| I'm not here to disparage anyone experiencing issues, but
| my experience on the NixOS rolling-release channel has also
| been pretty boring. There was a time when my old 1050 Ti
| struggled, but the modern upstream drivers feel just as
| smooth as my Intel system does.
| smoldesu wrote:
| Those problems might just be GNOME-related at this point. I've
| been daily-driving two different Nvidia cards for ~3 years now
| (1050 Ti then 3070 Ti) and Wayland has felt pretty stable for
| the past 12 months. The worst problem I had experienced in that
| time was Electron and Java apps drawing incorrectly in
| xWayland, but both of those are fixed upstream.
|
| I'm definitely not against better hardware support for AI, but
| I think your problems are more GNOME's fault than Nvidia's.
| KDE's Wayland session is almost flawless on Nvidia nowadays.
| arsome wrote:
| If GNOME can tank the kernel, it ain't GNOME's fault.
| kombine wrote:
| I really hope that with KDE 6 I can finally switch to
| Wayland!
| PH95VuimJjqBqy wrote:
| I see these complaints from time to time and I never
| understand them.
|
| I've literally been running nvidia on linux since the TNT2 days
| and have _never_ had this sort of issue. That's across many
| drivers and many cards over the many many years.
| ant6n wrote:
| Well tnt2 should be pretty well supported by now ;-)
| PH95VuimJjqBqy wrote:
| lmao, touche :)
| temp0826 wrote:
| I understand it, but I also haven't had any trouble since I
| figured out the right procedure for me on fedora (which
| probably took some time, but it's been so long that I can't
| remember). Whenever I read people having issues it sounds
| like they are using a package installed via dnf for the
| driver/etc. I've always had issues with dkms and the like and
| just install the latest .run from nvidia's website whenever I
| have a kernel update (I made a one-line script to call it
| with the silent option and flags for signing for secure boot
| so I don't really think about it). No issues in a very long
| time, even with the wackiness of prime/optimus offloading on
| my old laptop.
| PH95VuimJjqBqy wrote:
| actually, it's a good point because that's how I always
| install nvidia drivers as well. Never from the local
| package manager.
| einpoklum wrote:
| I have been using NVIDIA cards for compute capabilities only,
| both personally and at work, for nearly a decade. I've had
| dozens and dozens of different issues involving the hardware,
| the drivers, integration with the rest of the OS, version
| compatibilities, ensuring my desktop environment doesn't try
| to use the NVIDIA cards, etc. etc.
|
| Having said that - I (or rarely, other people) have almost
| always managed to work out those issues and get my systems to
| work. Not in all cases though.
| jjoonathan wrote:
| Same, but the Linux experience is a steep and bumpy function
| of hardware.
|
| My guess: something like laptop GPU switching failed badly in
| the nvidia binary, earning it a reputation.
| HideousKojima wrote:
| That was my experience, Nvidia Optimus (which is what
| allows dynamic switching between the integrated and
| dedicated GPU in laptops) was completely broken (as in a
| black screen, not just crashes or other issues) for several
| years, and Nvidia didn't care to do anything about it.
| PH95VuimJjqBqy wrote:
| I don't run laptops except when work requires it and that
| tends to be windows so that may explain the difference in
| experience.
| lhl wrote:
| Yeah, Optimus was a huge PITA. I remember fighting with
| workarounds like bumblebee and prime for years. Also
| Nvidia dragged their feet on Wayland support for a few
| years too (and simultaneously was seemingly intent on
| sabotaging Nouveau).
| distract8901 wrote:
| I tried bumblebee again recently, and it works shockingly
| well now. I have a thinkpad T530 from 2013 with an
| NVS5400m.
|
| There is some strange issue with some games where they
| don't get full performance from the dGPU, but more than
| the iGPU would give. I have to use optirun to get full
| performance.
|
| It also has problems when the computer wakes from sleep.
| For whatever reason, hardware video decoding doesn't work
| after entering standby. It makes Steam In-Home Streaming
| crash on the client, but flipping to software decoding
| usually works fine.
|
| The important part is that battery life is almost as good
| with bumblebee as it is with the dGPU turned off. No more
| fucking with Prime or rebooting into BIOS to turn the GPU
| back on.
| wubrr wrote:
| Yeah, nvidia linux support is meh, but still much better than
| amd.
| silisili wrote:
| In the closed source days of fglrx or whatever it's called
| I'd agree. Since they went open source, hard disagree. AMD
| graphics work in Linux about as well as Intel always has.
| phkahler wrote:
| >> Yeah, nvidia linux support is meh, but still much better
| than amd.
|
| Can not confirm. I used nvidia for years when it was the only
| option. Then used the nouveau driver on a well supported card
| because it worked well and eliminated hassle. Now I'm on AMD
| APU and it just works out of the box. YMMV of course. We do
| get reports of issues with AMD on specific driver versions,
| but I can't reproduce.
| bryanlarsen wrote:
| Not my experience. The open source AMD drivers are much more
| pleasant to deal with than the closed source Nvidia ones.
| acomjean wrote:
| As someone who was tasked with trying to get nvidia working
| on Ubuntu, I can say it's a pretty terrible experience.
|
| I have an nvidia laptop with Pop!_OS. That works well.
| Zambyte wrote:
| Is it better than AMD? I have had literally no graphics
| issues on my 6650 XT with swaywm using the built in kernel
| drivers.
| christkv wrote:
| I think the problems are the pro drivers and buggy ROCm,
| not the open source graphics drivers.
| treprinum wrote:
| I never had an issue with nVidia drivers on Linux in the
| past 5 years, but recently bought a laptop with a 4090 and
| an AMD CPU. Now I get random freezes, often right after I
| log in to Cinnamon, but I can't really tell if it's the
| nVidia driver for the 4090, the AMDGPU driver for the
| integrated RDNA, kernel 6.2, or a Cinnamon issue. The
| laptop just hangs and stops responding to the keyboard, so
| I can't log in to a console and dmesg it.
| SoftTalker wrote:
| The main issue with Nvidia on Linux AIUI is that they
| don't release the source code for their drivers.
| treprinum wrote:
| That might be a philosophical problem, but it never
| prevented me from training models on Linux. The half-
| baked, half-crashing AMD solutions just waste time I
| could spend on ML research instead.
| aseipp wrote:
| This week I upgraded the kernel on a 2017 workstation to
| 6.5.5, and when I rebooted there were no fewer than 7
| kernel faults with stack traces in my 'dmesg' from amdgpu.
| Just from booting up. This is a no-graphical-desktop system
| using a Radeon Pro W5500, which is 3.5 years old (I just
| had the card and needed something to plug in for it to
| POST.)
|
| I have come to accept that graphics card driver and
| hardware stability ultimately comes down to whether or not
| ghosts have decided to haunt you.
| HansHamster wrote:
| Guess I'm also doing something wrong. Never had any serious
| issues with either Nvidia or AMD on Linux (and only a few
| annoyances on RDNA2 shortly after release)...
| distract8901 wrote:
| My Arch system would occasionally boot to a black screen. When
| this happened, no amount of tinkering could get it back. I had
| to reinstall the whole OS.
|
| Turns out it was a conflict between nvidia drivers and my (10
| year old) Intel integrated GPU. But once I switched to an AMD
| card, everything works flawlessly.
|
| Ubuntu based systems barely worked at all. Incredibly unstable
| and would occasionally corrupt the output and barf colors and
| fragments of the desktop all over my screens.
|
| AMD on arch has been an absolute delight. It just. Works. It's
| more stable than nvidia on windows.
|
| For a lot of reasons-- but mainly Linux drivers-- I've totally
| sworn off nvidia cards. AMD just works better for me.
| fluxem wrote:
| I call it the 90% problem. If AMD works for 90% of my projects,
| I would still buy NVIDIA, which works for 100%, even though I'm
| paying a premium.
| hot_gril wrote:
| I'm lazy, so it's 99% for me. I don't even mess with AMD CPUs;
| I know they're not _exactly_ the same instruction set as Intel,
| and more importantly they work with a different (and less
| mainstream) set of mobos, so I don't want em. If AMD manages
| to pull more customers their way, that's great; it just means
| a lower Intel premium for me.
| Zetobal wrote:
| They are just too late even if they catch up. Until they make a
| leap like they did with Ryzen, nothing will happen.
| Havoc wrote:
| >They are just too late even if they catch up.
|
| Late certainly, too late I don't think so.
|
| If you can field a competitively priced consumer card that can
| run llama fast then you're already halfway there because then
| the ecosystem takes off. Especially since nvidia is being
| really stingy with their vram amounts.
|
| H100 & datacenter is a separate battle certainly, but on
| mindshare I think some deft moves from AMD will get them there
| quite fast once they pull their finger out their A and actually
| try sorting out the driver stack.
| dylan604 wrote:
| >If you can field a competitively priced consumer card
|
| if this unicorn were to show up, what's to say that all the
| non-consumers won't just snap up these equally performant
| yet lower-priced cards, recreating the supply-demand situation
| we're in now? the only difference would be a sudden supply of
| expensive Nvidia cards that nobody wants because of their
| price.
| AnthonyMouse wrote:
| The thing that causes it to be competitively priced is
| having enough production capacity to prevent that from
| happening.
|
| One way to do that may be to produce a card on an older
| process node (or the existing one when a new one comes out)
| that has a lot of VRAM. There is less demand for the older
| node so they can produce more of them and thereby sell them
| for a lower price without running out.
| Havoc wrote:
| >if this unicorn were to show up
|
| A unicorn like that showed up a couple hours ago. Someone
| posted a guide for getting llama to run on a 7900xtx
|
| https://old.reddit.com/r/LocalLLaMA/comments/170tghx/guide_i...
|
| It's still slow and janky but this really isn't that far
| away.
|
| I don't buy that AMD can't make this happen if they
| actually tried.
|
| Go on fiverr, get them to compile a list of the top 100 people
| in the DIY LLM space, send them all free 7900XTXs. Doesn't
| matter if half of it is wrong, just send it. Next take 1.2m
| USD, post a dozen 100k bounties against llama.cpp that are
| AMD specific - support & optimise the gear. Rinse and
| repeat with every other hobbyist LLM/stable diffusion
| project. A lot of these are zero-profit open source /
| passion / hobby projects. If 6-figure bounties show up
| it'll absolutely raise pulses. Next do all the big youtubers
| in the space - carefully on that one so that it doesn't
| come across as an attempted pay-off...but you want them to
| know that you want this space to grow and are willing to
| put your money where your mouth is.
|
| That'll cost AMD what, 2-3m? To move the needle on a multi-
| billion market? That's the cheapest marketing you've ever
| seen.
|
| As I said the datacenter & enterprise market is another
| beast entirely full of moats and strategy, but I don't see
| why a suitably motivated senior AMD exec can't tackle the
| enthusiast market single-handedly with a couple of emails,
| a cheque book, and a t-shirt with the nike slogan on it.
|
| >what's to say that all the non-consumers won't just snap
| up these equally performant yet lower-priced cards
|
| It doesn't matter. They're in the business of selling
| cards. To consumers, to datacenters, to your grandmother.
| For a profit-driven capitalist company the details don't
| matter as long as there is traction & volume. The above -
| opening up even the possibility of a new market - is gold
| from that perspective. And from a consumer perspective
| anything that breaks the nvidia cuda monopoly is a win.
| lhl wrote:
| llama.cpp, ExLlama, and MLC LLM have all had ROCm
| inferencing for months (here are a bunch of setup
| instructions I've written up, for Linux and Windows:
| https://llm-tracker.info/books/howto-guides/page/amd-gpus
| ) - but I don't think that's the problem (and it wouldn't
| drive lots of volume or have downstream impact in any
| case).
|
| The bigger problem is on the training/research side. Eg,
| there's no official support for AMD GPUs in bitsandbytes,
| and no support at all for FlashAttention/FA2 (nothing that
| 100K in hardware/grants to Dettmers' or Dao's labs wouldn't
| fix, I suspect).
|
| The real elephant, though, is AMD's continuing disconnect:
| the lack of support for consumer cards and home/academic
| devs in general has been disastrous (while Nvidia supports
| CUDA on basically every single GPU they've made since 2010)
| - just last week there was this mindblowing thread where it
| turns out an AMD employee is paying out of pocket for AMD
| GPUs to support build/CI for drivers on Debian. I mean,
| WTF, that's stupidity that's beyond embarrassing and gets
| into negligence territory IMO:
| https://news.ycombinator.com/item?id=37665784
| dylan604 wrote:
| >an AMD employee is paying out of pocket for AMD GPUs
|
| I hope he's at least getting an employee discount! I
| guess AMD is not a fan of the 20% concept either
| omneity wrote:
| I was able to use ROCm recently with PyTorch, and after pulling
| out some hair it worked quite well. The Radeon GPU I had on hand
| a bit old and underpowered (RDNA2) and it only supported matmul
| on fp64, but for the job I needed done I saw a 200x increase in
| it/s over CPU despite the need to cast everywhere, and that made
| me super happy.
|
| Best of all is that I simply set the device to
| `torch.device('cuda')` rather than OpenCL, which does wonders
| for compatibility and keeps the code simple.
|
| Protip: Use the official ROCm PyTorch base docker image [0].
| The AMD setup is finicky and dependent on specific versions of
| sdk/drivers/libraries, and it will be much harder to make it
| work if you try to install them separately.
|
| [0]:
| https://rocm.docs.amd.com/en/latest/how_to/pytorch_install/p...
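|
| For anyone curious, here is roughly what that looks like (a
| minimal sketch of mine, assuming a ROCm build of PyTorch; the
| fp64 cast mirrors the workaround described above, and whether
| you need it depends on your card):
|
|     import torch
|
|     # On a ROCm build, the HIP backend is exposed through the
|     # regular CUDA API, so no OpenCL-style code path is needed.
|     dev = torch.device("cuda" if torch.cuda.is_available() else "cpu")
|     print(torch.version.hip)  # a version string on ROCm builds,
|                               # None on CUDA builds
|
|     # Cast before the op and back afterwards when the card's
|     # native matmul support is limited.
|     a = torch.rand(512, 512, device=dev)
|     b = torch.rand(512, 512, device=dev)
|     c = (a.double() @ b.double()).float()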
| wyldfire wrote:
| > Best of all is that I simply set the device to
| `torch.device('cuda')` rather than OpenCL, which does wonders
| for compatibility
|
| Man oh man where did we go wrong that cuda is the more
| compatible option over OpenCL?
| KeplerBoy wrote:
| It must be a misnomer on PyTorch's side. Clearly it's neither
| CUDA nor OpenCL.
|
| AMD should just get its shit together. This is ridiculous.
| Not the name, but the fact that you can only do FP64 on a
| GPU. Everybody is moving to FP16 and AMD is stuck on doubles?
| omneity wrote:
| I believe the fp64 limitation came from the laptop-grade
| GPU I had rather than being inherent to AMD or ROCm.
|
| The API level I could target was at least two or three
| versions behind the latest they have to offer.
| KeplerBoy wrote:
| Might very well be true. I don't blame anyone for not
| diving deeper into figuring out why this stuff doesn't
| work.
|
| But this is one of the great strengths of CUDA: I can
| develop a kernel on my workstation, my boss can demo it
| on his laptop, and we can deploy it on Jetsons or the
| multi-gpu cluster with minimal changes, and I can be sure
| that everything runs everywhere.
| RockRobotRock wrote:
| Have you gotten it to work with Whisper by any chance?
| mikepurvis wrote:
| Sigh. It's great that these container images exist to give
| people an easy on-ramp, but they definitely don't work for
| every use case (especially once you're in embedded, where space
| matters and you might not be online to pull multi-GB updates
| from some registry).
|
| So it's important that vendors don't feel let off the hook to
| provide sane packaging just because there's an option to use a
| kitchen-sink container image they rebuild every day from
| source.
| fwsgonzo wrote:
| I feel the same way, especially about build systems. OpenSSL
| and v8 are among a large list of things that have horrid
| build systems. The only way to build them sanely is to use
| some rando's CMake fork; then it Just Works. Literally a two-
| liner in your build system to add them to your project with a
| sane CMake script.
| mikepurvis wrote:
| I was part of a Nix migration over the past two years, and
| literally one of the first things we checked was that there
| was already a community-maintained tensorflow+gpu package
| in nixpkgs, because without that the whole thing would have
| been a complete non-starter, and we sure as heck didn't
| have the resources or know-how to figure it out for
| ourselves as a small DevOps team just trying to do basic
| packaging.
| amelius wrote:
| > So it's important that vendors don't feel let off the hook
| to provide sane packaging just because there's an option to
| use a kitchen-sink container image they rebuild every day.
|
| Sadly if e.g. 95% of their users can use the container, then
| it could make economic sense to do it that way.
| xahrepap wrote:
| I know it's still different than what you're looking for, so
| you probably already know this, but many projects like this
| have the Dockerfile on github which shows exactly how they
| set up the image. For example:
|
| https://github.com/RadeonOpenCompute/ROCm-docker/blob/master...
|
| They also have some for Fedora. Looks like for this you need
| to install their repo:
|
|     curl -sL https://repo.radeon.com/rocm/rocm.gpg.key | apt-key add - \
|       && printf "deb [arch=amd64] https://repo.radeon.com/rocm/apt/$ROCM_VERSION/ jammy main" \
|          | tee /etc/apt/sources.list.d/rocm.list \
|       && printf "deb [arch=amd64] https://repo.radeon.com/amdgpu/$AMDGPU_VERSION/ubuntu jammy main" \
|          | tee /etc/apt/sources.list.d/amdgpu.list
|
| then install Python, a couple other dependencies (build-
| essential, etc) and then the package in question: rocm-dev
|
| So they are doing the packaging. There might even be
| documentation elsewhere for that type of setup.
| mikepurvis wrote:
| Oh yeah, I mean... having the source for the container
| build is kind of table stakes at this point. No one would
| accept a 10gb mystery meat blob as the basis of their
| production system. It's bad enough that we still accept
| binary-only drivers and proprietary libraries like
| TensorRT.
|
| I think my issue is more just with the _mindset_ that it's
| okay to have one narrow slice of supported versions of
| everything that are "known to work together" and those are
| what's in the container and anything outside of those and
| you're immediately pooched.
|
| This is not hypothetical btw, I've run into real problems
| around it with libraries like gproto, where tensorflow's
| bazel build pulls in an exact version that's different from
| the default one in nixpkgs, and now you get symbol
| conflicts when something tries to link to the tensorflow
| c++ API while linking to another component already using
| the default gproto. I know these problems are solvable
| with symbol visibility control and whatever, but that stuff
| is far from universal and hard to get right, especially if
| the person setting up the build rules for the library
| doesn't themselves use it in that type of heterogeneous
| environment (like, everyone at Google just links the same
| global proto version from the monorepo so it doesn't
| matter).
| mathisfun123 wrote:
| > especially once you're in embedded
|
| is this a real problem? exactly which embedded platform has a
| device that ROCm supports?
| mikepurvis wrote:
| Robotic perception is the one relevant to me. You want to
| do object recognition on an industrial x86 or Jetson-type
| machine, without having to use Ubuntu or whatever the one
| "blessed" underlay system is (either natively or implicitly
| because you pulled a container based on it).
| mathisfun123 wrote:
| >industrial x86 or Jetson-type machine
|
| that's not embedded dev. if you
|
| 1. use underpowered devices to perform sophisticated
| tasks
|
| 2. use code/tools that operate at extremely high levels
| of "abstraction"
|
| then don't be surprised when all the inherent complexity is
| tamed using just more layers of "abstraction". if that
| becomes a problem for your cost/power/space budget then
| reconsider choice 1 or choice 2.
| mikepurvis wrote:
| Not sure this is worth an argument over semantics, but
| modern "embedded" development is a lot bigger than just
| microcontrollers and wearables. IMO as soon as you're
| deploying a computer into any kind of "appliance", or
| you're offline for periods of time, or you're running on
| batteries or your primary network connection is
| wireless... then yeah, you're starting to hit the
| requirements associated with embedded and need to seek
| established solutions for them, including using distros
| which account for those requirements.
| mathisfun123 wrote:
| > IMO as soon as you're deploying a computer into any
| kind of "appliance", or you're offline for periods of
| time, or you're running on batteries or your primary
| network connection is wireless
|
| yes, and in those instances you do not reach for
| pytorch/tensorflow on top of ubuntu on top of x86 with a
| discrete gpu and 32gb of ram. instead you reach for C and a
| micro, or some arm soc that supports baremetal or at most an
| rtos. that's embedded dev.
|
| so i'll repeat myself: if you want to run extremely high-
| level code then don't be "surprised pikachu" when your
| underpowered platform, which you chose due to concrete,
| tight budgets, doesn't work out.
| IronWolve wrote:
| Yup, thank the hobbyists. PyTorch is enabling other hardware:
| Stable Diffusion works on M-series chips, Intel Arc, and AMD.
|
| Now what I'd like to see is real benchmarks for compute power.
| We might even get a few startups competing in this new area.
| mattnewton wrote:
| Re: startups, Geohot raised a few million for this already.
| https://tinygrad.org/
| nomel wrote:
| Obligatory Lex Fridman podcast, where he discusses it:
| https://youtu.be/dNrTrx42DGQ?t=2408
| IntelMiner wrote:
| Didn't he do what he always does? Rake in a ton of money,
| fart around, and then cash out exclaiming it's everyone
| else's fault?
|
| The way he stole Fail0verflow's work with the PS3 security
| leak, after failing to find a hypervisor exploit for months,
| absolutely soured any respect I had for him at the time.
| adastra22 wrote:
| Wow, TIL
| throwitawayfam wrote:
| Yep, did exactly that. IMO he threw a fit, even though AMD
| was working with him squashing bugs.
| https://github.com/RadeonOpenCompute/ROCm/issues/2198#issuec...
| [deleted]
| nomel wrote:
| To be fair, kernel crashes from running an AMD provided
| demo loop isn't something he should have to work with
| them on. That's borderline incompetence. His perspective
| was around integration into his product, where every AMD
| bug is a bug in his product. They deserve criticism, and
| responded accordingly (actual resources to get their shit
| together). It's not like GPU accelerated ML is some new
| thing.
| aeyes wrote:
| He's back on it after getting AMD's CEO to commit
| resources to this:
|
| https://twitter.com/realGeorgeHotz/status/166980346408248934...
|
| https://twitter.com/LisaSu/status/1669848494637735936
| kinematikk wrote:
| Do you have a source on the stealing part? A quick Google
| search didn't result in anything
| IntelMiner wrote:
| Marcan (of Asahi Linux fame) has talked about it _many_
| times before. But an abridged version
|
| Fail0verflow demoed how they were able to derive the
| private signing keys for the Sony Playstation 3 console
| at I believe CCC
|
| Geohot, after watching the livestream, raced into action to
| demo a "hello world!" jailbreak application and absolutely
| stole their thunder without giving any credit.
| mandevil wrote:
| It isn't the hobbyists who are making sure that PyTorch and
| other frameworks run well on these chips, but teams of
| engineers who work for NVIDIA, AMD, Intel, etc., doing this
| as their primary assigned jobs, in exchange for money from
| their employers, who pay those salaries because they want
| to sell chips into the enormous demand for running PyTorch
| faster.
|
| Hobbyist and open-source are definitely not synonyms.
| Eisenstein wrote:
| People don't usually get employed to make things with no
| demand, and people who work for companies with a budget line
| don't really care how much the nVidia tax is. You can thank
| hobbyists for creating a lot of demand for compatibility with
| other cards.
| kiratp wrote:
| There are so many billions of dollars being spent on this
| hardware that everyone other than Nvidia is doing
| everything they can to make competition happen.
|
| Eg: https://www.intel.com/content/www/us/en/developer/videos/opt...
|
| https://www.intel.com/content/www/us/en/developer/tools/onea...
|
| https://developer.apple.com/metal/tensorflow-plugin/
|
| Large scale opensource is, outside of a few exceptions,
| built by engineers paid to build it.
| jauntywundrkind wrote:
| Pytorch is just using Google's OpenXLA now, & OpenXLA is the
| actual cross platform thing, no? I'm not very well versed in
| this area, so pardon if mistaken.
| https://pytorch.org/blog/pytorch-2.0-xla-path-forward/
| fotcorn wrote:
| You can use OpenXLA, but it's not the default. The main use-
| case for OpenXLA is running PyTorch on Google TPUs. OpenXLA
| also supports GPUs, but I am not sure how many people use
| that. Afaik JAX uses OpenXLA as its backend to run on GPUs.
|
| If you use torch.compile() in PyTorch, you get TorchInductor
| and OpenAI's Triton by default.
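|
| A minimal sketch of that default path (assuming PyTorch >= 2.0
| with a GPU-enabled build; the tiny module is a made-up
| example):
|
|     import torch
|
|     class Tiny(torch.nn.Module):
|         def forward(self, x):
|             return torch.nn.functional.relu(x @ x)
|
|     model = Tiny().to("cuda")
|     # backend="inductor" is the default; TorchInductor emits
|     # Triton kernels for the GPU rather than CUDA-specific
|     # source.
|     compiled = torch.compile(model, backend="inductor")
|     out = compiled(torch.rand(256, 256, device="cuda"))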
| nabla9 wrote:
| > Crossing the CUDA moat for AMD GPUs may be as easy as using
| PyTorch.
|
| Nvidia has spent a huge amount of work making code run smoothly
| and fast. AMD has to work hard to catch up. ROCm code is
| slower, has more bugs, lacks features, and has compatibility
| issues between cards.
| latchkey wrote:
| Lisa has said that they are committed to improving ROCm,
| especially for AI workloads. Recent releases (5.6/5.7) prove
| that.
| einpoklum wrote:
| > Nvidia has spent a huge amount of work making code run
| smoothly and fast.
|
| Well, let's say "smoother" rather than "smoothly".
|
| > ROCm code is slower
|
| On physically-comparable hardware? Possible, but that's not an
| easy claim to make, certainly not as expansively as you have.
| References?
|
| > has more bugs
|
| Possible, but - NVIDIA keeps their bug database secret. I'm
| guessing you're concluding this from anecdotal experience?
| That's fair enough, but then - say so.
|
| > ROCm ... lacks features
|
| Likely - AMD has both spent less in that department (and
| had less to spend, I guess); plus, and no less importantly, it
| tried to go along with the OpenCL initiative, as specified by
| the Khronos consortium, while NVIDIA sort of "betrayed" the
| initiative by investing in its vendor-locked, incompatible
| ecosystem and letting their OpenCL support decay in some
| respects.
|
| > they have compatibility issues between cards.
|
| such as?
| whywhywhywhy wrote:
| Anyone who has to work in this ecosystem surely thinks this is
| a naive take.
| freedomben wrote:
| For someone who doesn't work in this ecosystem, can you
| elaborate? What's the real situation currently?
| superkuh wrote:
| >There is also a version of PyTorch that uses AMD ROCm, an open-
| source software stack for AMD GPU programming. Crossing the CUDA
| moat for AMD GPUs may be as easy as using PyTorch.
|
| Unfortunately, since the AMD firmware doesn't reliably do what
| it's supposed to, those ROCm calls often don't either. That's
| if your AMD card is even still supported by ROCm: the AMD RX
| 580 I bought in 2021 (the great GPU shortage) had its ROCm
| support dropped in 2022 (4 years of support total).
|
| The only reliable interface, in my experience, has been OpenCL.
| zucker42 wrote:
| Do you mean OpenCL using Rusticl or something else? And what DL
| framework, if any?
| superkuh wrote:
| I should clarify that I mean for human person uses, not
| commercial or institutional. But: CLBlast via llama.cpp for
| LLMs currently, or far in the past just pure OpenCL for
| things with AMD cards.
| htrp wrote:
| has opencl actually improved enough to be competitive?
| orangepurple wrote:
| I thought ONNX was supposed to be the ultimate common
| denominator for cross-platform machine learning model
| compatibility.
| [deleted]
| the__alchemist wrote:
| When coding with Vulkan, for graphics or compute (the latter
| is the relevant one here), you need CPU-side code (written in
| C++, Rust, etc.), data serialized as bytes, and shaders which
| run on the graphics card. This 3-step process creates
| friction, much in the same way backend/serialization/frontend
| does in web dev: duplication of work, type checking not
| crossing the bridge, the shader language being limited, etc.
|
| My understanding is that CUDA's main strength is avoiding
| this. Do you agree? Is that why it's such a big deal - i.e.,
| why this article was written, since you could always do
| compute shaders on AMD etc. using Vulkan?
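|
| To illustrate the convenience I mean from the Python side,
| here's a rough sketch (assuming CuPy; the CUDA C kernel is an
| inline string compiled at runtime by NVRTC, so there's no
| separate shader toolchain or bytecode-serialization stage,
| even if it isn't full single-source like CUDA C++):
|
|     import cupy as cp
|
|     # The kernel source lives next to the host code; CuPy
|     # compiles it on first launch.
|     add = cp.RawKernel(r'''
|     extern "C" __global__
|     void add(const float* x, const float* y, float* out, int n) {
|         int i = blockDim.x * blockIdx.x + threadIdx.x;
|         if (i < n) out[i] = x[i] + y[i];
|     }
|     ''', 'add')
|
|     n = 1 << 20
|     x = cp.random.rand(n, dtype=cp.float32)
|     y = cp.random.rand(n, dtype=cp.float32)
|     out = cp.empty_like(x)
|     add(((n + 255) // 256,), (256,), (x, y, out, cp.int32(n)))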
| atemerev wrote:
| Nope. PyTorch is not enough; you have to do some C++
| occasionally (as the code there can be optimized radically,
| as we see in llama.cpp and the like). ROCm is unusable
| compared to CUDA (4x more code for the same problem).
|
| I don't understand why everyone neglects good, usable and
| performant lower-level APIs. ROCm is fast and low-level, but
| much, much harder to use than CUDA, and the market seems to
| agree.
| alecco wrote:
| Regurgitated months-old content. Blogspam.
| einpoklum wrote:
| TL;DR:
|
| 1. Since PyTorch has grown very popular, and there's an AMD
| backend for that, one can switch GPU vendors when doing
| Generative AI work.
|
| 2. Like NVIDIA's Grace+Hopper CPU-GPU combo, AMD is/will be
| offering "Instinct MI300A", which improves performance over
| having the GPU across a PCIe bus from a regular CPU.
| bigcat12345678 wrote:
| CUDA is the foundation.
|
| NVIDIA's moat is the years of work built by the oss community,
| big corporations, and research institutes.
|
| They've spent all this time building for CUDA; a lot of
| implicit designs are derived from CUDA's characteristics.
|
| That will be the main challenge.
| mikepurvis wrote:
| It depends on the domain. Increasingly people's interfaces to
| this stuff are the higher level libraries like tensorflow,
| pytorch, numpy/cupy, and to a lesser degree accelerated
| processing libraries such as opencv, PCL, suitesparse, ceres-
| solver, and friends.
|
| If you can add hardware support to a major library _and_
| improve on the packaging and deployment front while also
| undercutting on price, that's the moat gone overnight. CUDA
| itself only matters in terms of lock-in if you're calling
| CUDA's own functions.
| bigcat12345678 wrote:
| what I meant is that all this stuff has 15 years of
| implicit accumulation of knowledge, tips, and even hacks
| built into the software
|
| No matter what you depend on, you'll hit a slew of major
| or minor obstacles and annoyances
|
| That, collectively, is the moat itself
|
| As you said, it's already clear that replacing cuda itself is
| not that daunting
| ddtaylor wrote:
| It's worth noting that AMD also has a ROCm port of Tensorflow.
| ginko wrote:
| When I try to install rocm-ml-sdk on Arch Linux, it tells me
| the total installed size would be about 18GB.
|
| What can possibly explain this much bloat for what should
| essentially be a library on top of a graphics driver, plus
| some tools (compiler, profiler, etc.)? A couple hundred MB I
| could understand if it came with graphical apps and demos,
| but not this...
| tomsmeding wrote:
| A regular TensorFlow installation, just the Python library,
| is a 184 MB wheel that unpacks to about 1.2 GB of stuff. I
| have no clue what mess goes in there, but it's a lot.
|
| Still, if you're right that this package seems to take 18 GB
| disk size, something weird is going on.
| slavik81 wrote:
| There are a lot of kernels that are specialized for
| particular sets of input parameters and tuned for improved
| performance on specific hardware, which makes the libraries
| a couple hundred megabytes per architecture. The ROCm
| libraries are huge because they are fat binaries containing
| native machine code for ~13 different GPU architectures.
| RcouF1uZ4gsC wrote:
| I am not so sure.
|
| Everyone knows that CUDA is a core competency of Nvidia and they
| have stuck to it for years and years refining it, fixing bugs,
| and making the experience smoother on Nvidia hardware.
|
| On the other hand, AMD has not had the same level of commitment.
| They used to sing the praises of OpenCL. And then there is ROCm.
| Tomorrow, it might be something else.
|
| Thus, Nvidia CUDA will get a lot more attention and tuning from
| even the portability layers because they know that their
| investment in it will reap dividends even years from now, whereas
| their investment in AMD might be obsolete in a few years.
|
| In addition, even if there is theoretical support, getting
| specific driver support and working around driver bugs is likely
| to be more of a pain with AMD.
| AnthonyMouse wrote:
| This is what people complain about, but at the same time there
| aren't enough cards, so the people with AMD cards want to use
| them. So they fix the bugs, or report them to AMD so they can
| fix them, and it gets better. Then more people use them and
| submit patches and bug reports, and it gets better.
|
| At some point the old complaints are no longer valid.
| pixelesque wrote:
| Does AMD have a solution for forward device compatibility
| (like PTX for NVidia)?
|
| Last time I looked into ROCm (two years ago?), you seemed to have
| to compile stuff explicitly for the architecture you were using,
| so if a new card came out, you couldn't use it without a
| recompile.
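|
| A quick way to check from Python what a given build targets (a
| sketch using PyTorch; the example outputs below are
| illustrative and vary by build):
|
|     import torch
|
|     # Offload targets baked into this torch build. On CUDA
|     # builds, 'sm_XX' entries are native binaries and a
|     # trailing 'compute_XX' entry is PTX the driver can JIT
|     # for newer cards; ROCm builds list 'gfx' architectures
|     # only, with no PTX-style forward-compat entry.
|     print(torch.cuda.get_arch_list())
|     # e.g. ['sm_50', ..., 'sm_90', 'compute_90'] on CUDA
|     # e.g. ['gfx900', 'gfx906', 'gfx908', 'gfx90a'] on ROCm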
| mnau wrote:
| Not natively, but AdaptiveCpp (previously hipSYCL, then
| OpenSYCL) has a single-source, single-compiler-pass mode,
| where they basically store LLVM IR as an intermediate
| representation.
|
| https://github.com/AdaptiveCpp/AdaptiveCpp/blob/develop/doc/...
|
| The performance penalty was within a few percent, at least
| according to the paper (figures 9 and 10):
| https://cdrdv2-public.intel.com/786536/Heidelberg_IWOCL__SYC...
| einpoklum wrote:
| I don't know what they do with ROCm, but with OpenCL, the
| answer is: Certainly. It's called SPIR:
|
| https://www.khronos.org/spir/
| ur-whale wrote:
| > AMD May Get Across the CUDA Moat
|
| I really wish they would, and properly, as in: fully open
| solution to match CUDA.
|
| CUDA is a cancer on the industry.
| hot_gril wrote:
| People complain about Nvidia being anticompetitive with CUDA, but
| I don't really see it. They saw a gap in the standards for on-GPU
| compute and put tons of effort into a proprietary alternative.
| They tied CUDA to their own hardware, which sorta makes technical
| sense given the optimizations involved, but it's their choice
| anyway. They still support the open standards, but many prefer
| CUDA and will pay the Nvidia premium for it because it's actually
| nicer. They also don't have CPU marketshare to tie things to.
|
| Good for them. We can hope the open side catches up either by
| improving their standards, or adding more layers like this
| article describes.
| zirgs wrote:
| CUDA was released in 2007 and its development started even
| earlier - possibly even in the 90s. Back then nobody else
| cared about GPU compute. OpenCL came out 2 years later.
| killerstorm wrote:
| Not true. People got interested in general-purpose GPU
| compute (GPGPU) in the early 2000s, when video cards with
| programmable shaders became available.
| https://en.wikipedia.org/wiki/General-purpose_computing_on_g...
|
| People made a programming language & a compiler/runtime for
| GPGPU in 2004: https://en.wikipedia.org/wiki/BrookGPU
| frnkng wrote:
| As a former ETH miner I learned the hard way that saving a few
| bucks on hardware may not be worth the operational issues.
|
| I had a miner running with Nvidia cards and a miner running
| with AMD cards. One of them had massive maintenance demands
| and the other did not. I will not state which brand was
| better imho.
|
| Currently I estimate that running miners and running gpu
| servers have similar operational requirements and, at scale,
| similar financial considerations.
|
| So, whatever is cheapest to operate in terms of time
| expenditure, hw cost, energy use,... will be used the most.
|
| P.s.: I ran the mining operation not to earn money but mainly
| out of curiosity. It was a small scale business powered by a
| pv system and an attached heat pump.
| latchkey wrote:
| I ran 150,000+ AMD cards for mining ETH. Once I fully automated
| all the vbios installs and individual card tuning, it ran
| beautifully. Took a lot of work to get there though!
|
| Fact is that every single GPU chip is a snowflake. No two
| operate the same.
| rottencupcakes wrote:
| Have you ever written about this enterprise? This sounds
| super unique and I would be very interested in hearing about
| how it was run and how it turned out.
| latchkey wrote:
| It was unique - not many people on the planet, that I know
| of, have run as many GPUs as I have, especially not without
| a giant company and large teams of people behind them. For
| the tech team, it was just me and one other guy. Everything
| _had_ to be automated because there was no way we could
| survive otherwise.
|
| I've put a bunch of comments here on HN about the stuff I
| can talk about.
|
| It no longer exists after PoS.
| freedomben wrote:
| what type of cards did you have? what did you do with
| them after PoS? How did you even buy so many cards?
| Sorry, like the other commenter I'm extremely curious
| latchkey wrote:
| Primarily 470, 480, 570, and 580. We also ran a very large
| cluster of PS5 APU chips.
|
| Got the chips directly from AMD. Since these are 4-5 year
| old chips, they were not going to ever be used. It is
| more ROI efficient with ETH mining to use older cards
| than newer ones.
|
| Had a couple of OEMs manufacture the cards specially for us
| with 8gb, heatsinks instead of fans (lower power usage),
| and no display ports (lower cost).
|
| They will be recycled as there isn't much use for them
| now.
|
| I'm also no longer with the company.
___________________________________________________________________
(page generated 2023-10-06 23:00 UTC)