[HN Gopher] AMD funded a drop-in CUDA implementation built on RO...
___________________________________________________________________
AMD funded a drop-in CUDA implementation built on ROCm: It's now
open-source
Author : mfiguiere
Score : 817 points
Date : 2024-02-12 14:00 UTC (8 hours ago)
(HTM) web link (www.phoronix.com)
(TXT) w3m dump (www.phoronix.com)
| hd4 wrote:
| https://github.com/vosen/ZLUDA - source
| MegaDeKay wrote:
| Latest commit message: "Nobody expects the Red Team"
| pella wrote:
| https://github.com/vosen/ZLUDA/tree/v3
| fariszr wrote:
| > after the CUDA back-end was around for years and after dropping
| OpenCL, Blender did add a Radeon HIP back-end... But the real
| kicker here is that using ZLUDA + CUDA back-end was slightly
| faster than the native Radeon HIP backend.
|
| This is absolutely crazy.
| toxik wrote:
| Is AMD just a puppet org to placate antitrust fears? Why are
| they like this?
| swozey wrote:
| Is this really a theory? If so my $8 AMD stock from, 2015? is
| currently worth $176 so they should make more shell companies
| they're doing great.
|
| I guess that might answer my "Why would AMD find that having
| a CUDA competitor isn't a business case unless they couldn't
| do it or the cards underperformed significantly."
| kllrnohj wrote:
| For some reason AMD's GPU division continues to be run,
| well, horribly. The CPU division is crushing it, but the
| GPU division is comically bad. During the great GPU
| shortage AMD had multiple opportunities to capture chunks
| of the market and secure market share, increasing the
| priority for developers to acknowledge and target AMD's
| GPUs. What did they do instead? Not a goddamn thing: they
| followed Nvidia's pricing and managed to sell jack shit
| (like, seriously, the RX 580 is still the first AMD card to
| show up on the Steam hardware survey).
|
| They're not building big enough dies at the top end to compete
| with Nvidia for the halo, and they're refusing to undercut at
| the low end, where Nvidia's reputation for absurd pricing is
| at an all-time high. AMD's GPU division is a clown show; it's
| impressively bad. Even though the hardware itself is _fine_,
| they just can't stop making terrible product launches, setting
| awful pricing strategies, or making brain-dead software
| choices like shipping a feature that triggered anti-cheat,
| getting their customers predictably banned and angering game
| devs in the process.
|
| And relevant to this discussion Nvidia's refusal to add
| VRAM to their lower end cards is a prime opportunity for
| AMD to go after the lower-end compute / AI interested crowd
| who will become the next generation software devs. What are
| they doing with this? Well, they're not making ROCm
| available to basically anyone, that's apparently the
| winning strategy. ROCm 6.0 only supports the 7900 XTX and
| the... Radeon VII. The weird one-off Vega 20 refresh. Of
| all the random cards to support, why the hell would you
| pick that one???
| swozey wrote:
| > The (AMD) CPU division is crushing it
|
| I worked at a bare-metal CDN with 60 PoPs, and a few years ago
| we _had_ to switch to AMD because of PCIe bandwidth over to
| our smartNICs, NVMe-oF, and that sort of thing. We'd long hit
| limits on Intel before the Epyc stuff came out, so we had more
| servers running than we wanted: we had to limit how much we
| did with one server to avoid hitting the limits and locking
| everything up.
|
| And we were _excited_, not a single apprehension. Epyc crushed
| the server market; everyone is using them. Well, it's going
| ARM now, but Epyc will still be around a while.
| wheybags wrote:
| Cannot understand why AMD would stop funding this. It seems like
| this should have a whole team allocated to it.
| otoburb wrote:
| They would always be at the mercy of NVIDIA's API. Without
| knowing the inner workings, perhaps a major concern with this
| approach is the need to implement on NVIDIA's schedule instead
| of AMD's which is a very reactive stance.
|
| This approach actually would make sense if AMD felt, like most
| of us perhaps, that the NVIDIA ecosystem is too entrenched, but
| perhaps they made the decision recently to discontinue funding
| because they (now?) feel otherwise.
| blagie wrote:
| They've been at the mercy of Intel's x86 APIs for a long time.
| Didn't kill them.
|
| What happens here is that the original vendor loses control
| of the API once there are multiple implementations. That's
| the best possible outcome for AMD.
|
| In either case, they have a limited window to be adopted, and
| that's more important. The abstraction layer here helps too.
| AMD code is !@#$%. If this were adopted, it would make it easier
| to fix things underneath. All that is a lot more important
| than a dream of disrupting CUDA.
| rubatuga wrote:
| x86 is not the same; the courts forced the release of the x86
| architecture to AMD during an antitrust lawsuit.
| anon291 wrote:
| You don't think the courts would force the opening of
| CUDA? Didn't a court already rule that API cannot be
| patented. I believe it was a Google case. As long as no
| implementation was stolen, the API itself is not able to
| be copyrighted.
|
| Here it is: https://arstechnica.com/tech-
| policy/2021/04/how-the-supreme-...
| Symmetry wrote:
| Regardless of the legal status of APIs, this Phoronix
| article is about AMD providing a replacement ABI and I
| wouldn't assume the legal issues are necessarily the
| same. But because this is a case where AMD is following a
| software target, there's the possibility, if AMD starts to
| succeed, that NVidia might change their ABI in ways that
| deliberately hurt AMD's compatibility efforts, something that
| would be much more difficult to do with APIs or hardware.
| That's, presumably, why AMD is going forward with their
| API emulation effort instead.
| anon291 wrote:
| If you read the article, it's about Google's re-
| implementation of the Java API and runtime. Thus, yes,
| Google was providing both API and ABI compatibility.
| Symmetry wrote:
| I read the article when it came out and re-skimmed it just
| now. My understanding at the time, and still, is that the
| legal case revolved around the API, and the exhibits entered
| into evidence I saw were all Java function names with their
| arguments and things of that sort. And I'm given to
| understand that the Dalvik Java implementation Google was
| using with Android was register-based rather than stack-based
| like standard Java, which sounds to me like it would make
| actual binary compatibility impossible.
| jcranmer wrote:
| > Didn't a court already rule that API cannot be
| patented. I believe it was a Google case. As long as no
| implementation was stolen, the API itself is not able to
| be copyrighted.
|
| That is... not accurate in the slightest.
|
| Oracle v Google was not about patentability. Software
| patentability is its own separate minefield, since anyone
| who looks at the general tenor of SCOTUS cases on the
| issue should be able to figure out that SCOTUS is at best
| highly skeptical of software patents, even if it hasn't
| made any direct ruling on the topic. (Mostly this is a
| matter of them being able to tell what they don't like
| but not what they do like, I think). But I've had a
| patent attorney straight-out tell me that in absence of
| better guidance, they're just pretending the most recent
| relevant ruling (which held that X-on-a-computer isn't
| patentable) doesn't exist. In any case, a patent on
| software APIs (as opposed to software as a whole) would
| very clearly fall under the "what are you on, this isn't
| patentable" category of patentability.
|
| The case was about the copyrightability of software APIs.
| Except if you read the decision itself, SCOTUS doesn't
| actually answer the question [1]. Instead, it focuses on
| whether or not Google's use of the Java APIs were fair
| use. Fair use is a dangerous thing to rely on for legal
| precedent, since there's no "automatic" fair use
| category, but instead a balancing test ostensibly of four
| factors but practically of one factor: does it hurt the
| original copyright owner's profits [2].
|
| There's an older decision which held that the "structure,
| sequence, and organization" of code is copyrightable
| independent of the larger work of software, which is
| generally interpreted as saying that software APIs are
| copyrightable. At the same time, however, it's widespread
| practice in the industry to assume that "clean room"
| development of an API doesn't violate any copyright. The
| SCOTUS decision in Google v Oracle was widely interpreted
| as endorsing this interpretation of the law.
|
| [1] There's a sentence or two that suggests to me there
| was previously a section on copyrightability that had
| been ripped out of the opinion.
|
| [2] See also the more recent Andy Warhol SCOTUS decision
| which, I kid you not, says that you have to analyze this
| to figure out whether or not a use is "transformative".
| Which kind of implicitly overturns Google v Oracle if you
| think about it, but is unlikely to in practice.
| monocasa wrote:
| To be fair, there were patent claims in Oracle vs. Google
| too. That's why the appeals went through the CAFC rather
| than the 9th circuit. Those claims were simply thrown out
| pretty early. Whether that says something more general or was
| simply a set of weak claims intended for venue shopping is a
| legitimate discussion to be had, though.
| hardware2win wrote:
| You think x86 would be changed in such a way that it'd break
| AMD?
|
| Because what else?
|
| If so, then I think that's crazy, because software is harder
| to change than hardware.
| tikkabhuna wrote:
| My understanding is that with AMD64 there's a circular
| dependency, where AMD needs Intel for x86 and Intel needs AMD
| for x86_64?
| monocasa wrote:
| That's true now, but AMD has been making x86 compatible
| CPUs since the original 8086.
| lambdaone wrote:
| More than that, a second implementation of CUDA acts as a
| disincentive for NVIDIA to make breaking changes to it,
| since it would reduce any incentive for software developers
| to follow those changes, as it reduces the value of their
| software by eliminating hardware choice for end-users (who
| in some cases, like large companies, are also the developers
| themselves).
|
| At the same time, open source projects can be pretty nimble
| in chasing things like changing APIs, potentially
| frustrating the effectiveness of API pivoting by NVIDIA in
| a second way.
| visarga wrote:
| > They would always be at the mercy of NVIDIA's API.
|
| They only need to support PyTorch, not CUDA.
| sam_goody wrote:
| I don't really follow this, but isn't it a bad sign for ROCm
| that, for example, ZLUDA + Blender 4's CUDA back-end delivers
| better performance than the native Radeon HIP back-end?
| fariszr wrote:
| It really shows how neglected their software stack is, or at
| least how neglected this implementation is.
| whizzter wrote:
| Could be that the CUDA backend has seen far more specialized
| optimization, whereas the seemingly fairly fresh HIP backend
| hasn't had as many developers looking at it. In the end, a few
| more control instructions on the CPU side to go through the
| ZLUDA wrapper will be insignificant compared to all the time
| spent inside better-optimized GPU kernels.
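| A rough way to sanity-check that intuition (a hypothetical
| sketch, not ZLUDA's actual code): time the host-side launch
| separately from the device-side execution. The kernel name and
| sizes below are made up for illustration.
|
|     // nvcc overhead_check.cu -o overhead_check
|     #include <chrono>
|     #include <cstdio>
|     #include <cuda_runtime.h>
|
|     __global__ void saxpy(int n, float a, const float* x, float* y) {
|         int i = blockIdx.x * blockDim.x + threadIdx.x;
|         if (i < n) y[i] = a * x[i] + y[i];
|     }
|
|     int main() {
|         const int n = 1 << 24;
|         float *x, *y;
|         cudaMalloc(&x, n * sizeof(float));
|         cudaMalloc(&y, n * sizeof(float));
|
|         // Host-side cost of issuing the launch -- the part a thin
|         // wrapper layer adds a few extra instructions to.
|         auto t0 = std::chrono::steady_clock::now();
|         saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
|         auto t1 = std::chrono::steady_clock::now();
|
|         // Time actually spent executing the kernel on the GPU.
|         cudaDeviceSynchronize();
|         auto t2 = std::chrono::steady_clock::now();
|
|         auto us = [](auto a, auto b) {
|             return std::chrono::duration_cast<
|                 std::chrono::microseconds>(b - a).count();
|         };
|         printf("launch: %lld us, kernel: %lld us\n",
|                (long long)us(t0, t1), (long long)us(t1, t2));
|
|         cudaFree(x);
|         cudaFree(y);
|         return 0;
|     }
|
| On typical hardware the first number is microseconds and the
| second is milliseconds, which is why a wrapper on the launch
| path tends to vanish in the noise.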
| KeplerBoy wrote:
| Surely this can be attributed to Blender's HIP code just being
| suboptimal because nobody really cares about it. By extension
| nobody cares about it because performance is suboptimal.
|
| It's AMD's job to break that circle.
| mdre wrote:
| I'd say it's even worse, since for rendering OptiX is like 30%
| faster than CUDA. But that leans on the RT cores. At this
| point AMD is waaay behind hardware-wise.
| btown wrote:
| Why would this not be AMD's top priority among priorities?
| Someone recently likened the situation to an Iron Age where
| NVIDIA owns all the iron. And this sounds like AMD knowing about
| a new source of ore and not even being willing to sink a single
| engineer's salary into exploration.
|
| My only guess is they have a parallel skunkworks working on the
| same thing, but in a way that they can keep it closed-source -
| that this was a hedge they think they no longer need, and they
| are missing the forest for the trees on the benefits of cross-
| pollination and open source ethos to their business.
| fariszr wrote:
| According to the article, AMD seems to have pulled the plug on
| this as they think it will hinder ROCm v6 adoption, which, btw,
| still only supports two consumer cards out of their entire
| lineup [1].
|
| 1. https://www.phoronix.com/news/AMD-ROCm-6.0-Released
| bhouston wrote:
| AMD should have the funds to push both of these initiatives
| at once. If the ROCm team has political reasons to kill the
| competition, it is because they are scared it will succeed.
| I've seen this happen in big companies.
|
| But management at AMD should be above petty team politics and
| fund both because at the company level they do not care which
| solution wins in the end.
| imtringued wrote:
| Why would they be worried about people using their product?
| Some CUDA wrapper on top of ROCm isn't going to get them
| fired. It doesn't get rid of ROCm's function as a GPGPU
| driver.
| zer00eyz wrote:
| If you're AMD, you don't want to be compatible until you have
| a compelling feature of your own.
|
| Good-enough CUDA + new feature X gives them leverage in the
| inevitable court battle(s) and patent-sharing agreement that
| everyone wants to see.
|
| AMD has already stuck its toe in the water: new CPUs with AI
| cores built in. If you can get an AM5 socket to run with 192
| gigs, that's a large (albeit slow) model you can run.
| kkielhofner wrote:
| With the most recent card being their one year old flagship
| ($1k) consumer GPU...
|
| Meanwhile CUDA supports anything with Nvidia stamped on it
| before it's even released. They'll even go as far as doing
| things like adding support for new GPUs/compute families to
| older CUDA versions (see Hopper/Ada and CUDA 11.8).
|
| You can go out and buy any Nvidia GPU the day of release,
| take it home, plug it in, and everything just works. This is
| what people expect.
|
| AMD seems to have no clue that this level of usability is
| what it will take to actually compete with Nvidia and it's a
| real shame - their hardware is great.
| KingOfCoders wrote:
| AMD thinks the reason Nvidia is ahead of them is bad
| marketing on their part, and good marketing (All is AI) by
| Nvidia. They don't see the difference in software stacks.
|
| For years I've wanted to get off the Nvidia train for AI, but
| I'm forced to buy another Nvidia card because AMD's stuff just
| doesn't work, while all the examples work with Nvidia cards as
| they should.
| fortran77 wrote:
| At the risk of sounding like Steve Ballmer, the reason I
| only use NVIDIA for GPGPU work (our company does a lot of
| it!) is the developer support. They have compilers,
| tools, documentation, and tech support for developers who
| want to do any type of GPGPU computing on their hardware
| that just isn't matched on any other platform.
| roenxi wrote:
| You've got to remember that AMD are behind on all aspects
| of this, including documenting their work in an easily
| digestible way.
|
| "Support" means that the card is actively tested and
| presumably has some sort of SLA-style push to fix bugs.
| As their stack matures, a bunch of cards that don't have
| official support will work well [0]. I have an unsupported
| card. There are horrible bugs. But the evidence I've seen
| is that the card will work better with time even though it
| is never going to be officially supported. I don't think
| any of my hardware is officially supported by the
| manufacturer, but the kernel drivers still work fine.
|
| > Meanwhile CUDA supports anything with Nvidia stamped on
| it before it's even released...
|
| A lot of older Nvidia cards don't support CUDA v9 [1]. It
| isn't like everything supports everything, particularly in
| the early part of building out capability. The impression
| I'm getting is that in practice the gap in strategy here is
| not as large as the current state makes it seem.
|
| [0] If anyone has bought an AMD card for their machine to
| multiply matrices they've been gambling on whether the
| capability is there. This comment is reasonable
| speculation, but I want to caveat the optimism by asserting
| that I'm not going to put money into AMD compute until
| there is some actual evidence on the table that GPU
| lockups are rare.
|
| [1] https://en.wikipedia.org/wiki/CUDA#GPUs_supported
| spookie wrote:
| To be fair, if anything, that table still shows you'll
| have compatibility with at least 3 major releases. Either
| way, I agree their strategy is getting results, it just
| takes time. I do prefer their open source commitment, I
| just hope they continue.
| paulmd wrote:
| All versions of CUDA support PTX, which is an intermediate
| bytecode/compiler representation that can be final-compiled
| by even CUDA 1.0.
|
| So the contract is: as long as your future program does not
| touch any intrinsics etc. that do not exist in CUDA 1.0, you
| can export the new program from CUDA 27.0 as PTX, and the
| 8800 GTX driver will read the PTX and let your GPU run it as
| CUDA 1.0 code... so it is quite literally just as they
| describe: forward and backward capability/support as long as
| you go through PTX in the middle.
|
| https://docs.nvidia.com/cuda/archive/10.1/parallel-
| thread-ex...
|
| https://en.wikipedia.org/wiki/Parallel_Thread_Execution
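| As a concrete (hypothetical) sketch of that contract: you ship
| the PTX text, and the driver JIT-compiles it for whatever GPU
| is actually installed. File and kernel names here are made up
| and error handling is omitted.
|
|     // The PTX would come from e.g. `nvcc --ptx add.cu -o add.ptx`.
|     // Build the loader with: nvcc load_ptx.cpp -o load_ptx -lcuda
|     #include <cuda.h>
|     #include <cstdio>
|     #include <vector>
|
|     int main() {
|         cuInit(0);
|         CUdevice dev;
|         cuDeviceGet(&dev, 0);
|         CUcontext ctx;
|         cuCtxCreate(&ctx, 0, dev);
|
|         // Read the PTX text from disk.
|         FILE* f = fopen("add.ptx", "rb");
|         fseek(f, 0, SEEK_END);
|         long sz = ftell(f);
|         fseek(f, 0, SEEK_SET);
|         std::vector<char> ptx(sz + 1);
|         fread(ptx.data(), 1, sz, f);
|         ptx[sz] = '\0';
|         fclose(f);
|
|         // The driver JIT-compiles the PTX for the installed GPU,
|         // which is where the forward portability comes from.
|         CUmodule mod;
|         cuModuleLoadData(&mod, ptx.data());
|         CUfunction fn;
|         // "add" assumes the kernel was declared extern "C",
|         // otherwise you'd look up the mangled name.
|         cuModuleGetFunction(&fn, mod, "add");
|
|         // ... allocate buffers and launch with cuLaunchKernel ...
|
|         cuModuleUnload(mod);
|         cuCtxDestroy(ctx);
|         return 0;
|     }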
| ColonelPhantom wrote:
| CUDA dropped Tesla (from 2006!) only as of 7.0, which
| seems to have released around 2015. Fermi support lasted
| from 2010 until 2017, giving it a solid 7 years still.
| Kepler support was dropped around 2020, and the first
| cards were released in 2012.
|
| As such Fermi seems to be the shortest supported
| architecture, and it was around for 7 years. GCN4
| (Polaris) was introduced in 2016, and seems to have been
| officially dropped around 2021, just 5 years in. While
| you could still get it working with various workarounds,
| I don't see the evidence of Nvidia being even remotely as
| hasty as AMD with removing support, even for early
| architectures like Tesla and Fermi.
| hedgehog wrote:
| On top of this some Kepler support (for K80s etc) is
| still maintained in CUDA 11 which was last updated late
| 2022, and libraries like PyTorch and TensorFlow still
| support CUDA 11.8 out of the box.
| Certhas wrote:
| The most recent "card" is their MI300 line.
|
| It's annoying as hell to you and me that they are not
| catering to the market of people who want to run stuff on
| their gaming cards.
|
| But it's not clear it's bad strategy to focus on executing
| in the high-end first. They have been very successful
| landing MI300s in the HPC space...
|
| Edit: I just looked it up: 25% of the GPU Compute in the
| current Top500 Supercomputers is AMD
|
| https://www.top500.org/statistics/list/
|
| Even though the list has plenty of V100 and A100s which
| came out (much) earlier. Don't have the data at hand, but I
| wouldn't be surprised if AMD got more of the Top500 new
| installations than nVidia in the last two years.
| latchkey wrote:
| I'm building a bare metal business around MI300x and top
| end Epyc CPUs. We will have them for rental soon. The
| goal is to build a public super computer that isn't just
| available to researchers in HPC.
| beebeepka wrote:
| Is it true the MI300 line is 3-4x cheaper for similar
| performance than whatever Nvidia is selling in the highest
| segment?
| latchkey wrote:
| I probably can't comment on that, but what I can comment
| on is this:
|
| H100s are hard to get. Nearly impossible. CoreWeave and others
| have scooped them all up for the foreseeable future. So, if
| you are looking at price as the only factor, it becomes
| somewhat irrelevant if you can't even buy them [0]. I don't
| really understand the focus on price because of this fact.
|
| Even if you do manage to score yourself some H100s, you also
| need to factor in the networking between nodes. InfiniBand
| (IB) is made by Mellanox, which is owned by NVIDIA. Lead
| times on that equipment are 50+ weeks. Again, price becomes
| irrelevant if you can't even network your boxes together.
|
| As someone building a business around MI300x (and future
| products), I don't care that much about price [!]. We
| know going in that this is a super capital intensive
| business and have secured the backing to support that. It
| is one of those things where "if you have to ask, you
| can't afford it."
|
| We buy cards by the chassis; it is one price. I actually don't
| know the exact prices of the cards (but I can infer them). It
| is a lot about who you know and what you're doing. You buy
| more chassis, you get better pricing. Azure is probably paying
| half of what I'm paying [1]. But I'd also say that from what
| I've seen so far, their chassis aren't nearly as nice as mine.
| I have dual 9754s, 2x bonded 400G, 3TB RAM, and 122TB NVMe...
| plus the 8x MI300x. These are top of the top. They have Intel
| and I don't know what else inside.
|
| [!] Before you harp on me, of course I care about
| price... but at the end of the day, it isn't what I'm
| focused on today as much as just being focused on
| investing all of the capex/opex that I can get my hands
| on, into building a sustainable business that provides as
| much value as possible to our customers.
|
| [0] https://www.tomshardware.com/news/tsmc-shortage-of-
| nvidias-a...
|
| [1] https://www.techradar.com/pro/instincts-are-
| massively-cheape...
| beebeepka wrote:
| Pretty sweet. I do envy you. For what it's worth, I would
| prefer AMD to charge as much as possible for these little
| beasts.
| latchkey wrote:
| They definitely aren't cheap.
| kkielhofner wrote:
| Indeed, but this is extremely short-sighted.
|
| You don't win an overall market by focusing on several
| hundred million dollar bespoke HPC builds where the
| platform (frankly) doesn't matter at all. I'm working on
| a project on an AMD platform on the list (won't say - for
| now), and needless to say you build to whatever is there,
| regardless of what it takes, and the operators/owners and
| vendor support teams pour in whatever resources are necessary
| to make it work.
|
| You win a market a generation at a time - supporting low
| end cards for tinkerers, the educational market, etc. AMD
| should focus on the low-end because that's where the next
| generation of AI devs, startups, innovation, etc is
| coming from and for now that's going to continue to be
| CUDA/Nvidia.
| voakbasda wrote:
| In the embedded space, Nvidia regularly drops support for
| older hardware. The last supported kernel for their Jetson
| TX2 was 4.9. Their newer Jetson Xavier line is stuck on
| 5.10.
|
| The hardware may be great, but their software ecosystem is
| utter crap. As long as they stay the unchallenged leader in
| hardware, I expect Nvidia will continue to produce crap
| software.
|
| I would push to switch our products in a heartbeat, if AMD
| actually gets their act together. If this alternative
| offers a path to evaluate our current application software
| stack on an AMD devkit, I would buy one tomorrow.
| kkielhofner wrote:
| In the embedded space, customers develop bespoke solutions to,
| well, embed them in products, where they (essentially) bake
| the firmware image and more or less freeze the entire software
| stack, less incremental updates. The next version of your
| product uses the next fresh Jetson and JetPack release.
| Repeat. Using the latest and greatest kernel is far from a top
| consideration in these applications...
|
| I was actually advising an HN user against using Jetson
| just the other day because it's such an extreme outlier
| when it comes to Nvidia and software support. Frankly
| Jetson makes no sense unless you really need the power
| efficiency and form-factor.
|
| Meanwhile, any seven year old >= Pascal card is fully
| supported in CUDA 12 and the most recent driver releases.
| That, combined with my initial data points and the ones other
| people have chimed in with on this thread, is far from
| "utter crap".
|
| Use the right tool for the job.
| streb-lo wrote:
| I have been using ROCm on my 7800 XT; it seems to be
| supported just fine.
| MrBuddyCasino wrote:
| AMD truly deserves its misfortune in the GPU market.
| incrudible wrote:
| That is really out of touch. ROCm is garbage as far as I am
| concerned. A drop-in replacement, especially one that seems
| to perform quite well, is really interesting, however.
| iforgotpassword wrote:
| Someone built the same thing a while ago for Intel GPUs, I
| think even for the old pre-Xe ones. With Arc/Xe on the
| horizon, people had the same question: why isn't Intel
| sponsoring this or even building their own? It was speculated
| that this might get them into legal hot water with Nvidia;
| Google v. Oracle was brought up, etc...
| my123 wrote:
| They financed the prior iteration of Zluda:
| https://github.com/vosen/ZLUDA?tab=readme-ov-file#faq
|
| but then stopped
| formerly_proven wrote:
| > [2021] After some deliberation, Intel decided that there
| is no business case for running CUDA applications on Intel
| GPUs.
|
| oof
| iforgotpassword wrote:
| That's an oof indeed. Are AMD and Intel really that
| delusional, i.e. "once we get our own version of CUDA right,
| everybody will just rewrite all their software to make
| use of it", or do they know something we mere mortals
| don't?
| garaetjjte wrote:
| Maybe their lawyers are afraid of another round of "are
| APIs copyrightable"?
| AtheistOfFail wrote:
| > After two years of development and some deliberation,
| AMD decided that there is no business case for running
| CUDA applications on AMD GPUs.
|
| Oof x2
| Cheer2171 wrote:
| Are you freaking kidding me!?!? Fire those MBAs
| immediately.
| geodel wrote:
| Well, the simplest reason would be money. Few companies are
| rolling in the kind of money Nvidia is, and AMD is not one of
| them. Cloud vendors would care a bit, but for them it is just
| business: if Nvidia costs a lot more, they in turn charge
| their customers a lot more while keeping their margins. I know
| some people still harbor the notion that _competition_ will
| lower the price, and it may, just not in the sense customers
| imagine.
| izacus wrote:
| Why do you think running after nVidia for this submarket is a
| good idea for them? The AMD GPU team isn't especially big and
| the development investment is massive. Moreover, they'd incur
| an opportunity cost on projects they're now dominating
| (all game consoles, for example).
|
| Do you expect them to be able to capitalize on the AI fad so
| much (and quickly enough!) that it's worth dropping the ball on
| projects they're now doing well in? Or perhaps continue
| investing into the part of the market where they're doing much
| better than nVidia?
| jandrese wrote:
| If the alternative is to ignore one of the biggest developing
| markets then yeah, maybe they should start trying to catch
| up. Unless you think GPU compute is a fad that's going to
| fizzle out?
| izacus wrote:
| One of the most important decisions a company can make is
| which markets they'll focus on and which they won't.
| This is even true for megacorps (see: Google and their
| parade of mess-ups). There's just not enough time to be in
| all markets all at once.
|
| So, again, it's not at all clear that AMD being in the
| compute GPU game is the automatic win for them in the
| future. There's plenty of companies that killed themselves
| trying to run after big profitable new fad markets (see:
| Nokia and Windows Phone, and many other cases).
|
| So let's examine that - does AMD actually have a good shot
| of taking a significant chunk of market that will offset
| them not investing in some other market?
| jandrese wrote:
| AMD is literally the only company on the market poised to
| exploit the explosion in demand for GPU compute after
| nVidia (sorry Intel). To not even really try to break in
| is insanity. nVidia didn't grow their market cap by 5x
| over the course of a year because people really got into
| 3D gaming. Even as an also-ran on the coattails of
| nVidia with a compatibility glue library, the market is
| clearly demanding more product.
| justinclift wrote:
| Isn't Intel's next gen GPU supposed to be pretty strong
| on compute?
|
| Read an article about it recently, but when trying to
| remember the details / find it again just now I'm not
| seeing it. :(
| jandrese wrote:
| Intel is trying, but all of their efforts thus far have
| been pretty sad and abortive. I don't think anybody is
| taking them seriously at this point.
| spookie wrote:
| Their OneAPI is really interesting!
| 7speter wrote:
| I'm not an expert like you would find here on HN, I am
| only really a tinkerer and learner, amateur at best, but
| I think Intel's compute is very promising on Alchemist.
| The A770 beats out the 4060ti 16gb in video rendering via
| Davinci Resolve and Adobe; has AV1 support in free
| Davinci Resolve while Lovelace only has AV1 support in
| studio. Then for AI, the A770 has had a good showing in
| stable diffsion against Nvidia's midrange Lovelace since
| the summer: https://www.tomshardware.com/news/stable-
| diffusion-for-intel...
|
| The big issue for Intel is pretty similar to that of AMD;
| everything is made for CUDA, and Intel has to either
| build their own solutions or convince people to build
| support for Intel. While I'm working on learning AI and
| plan to use an Nvidia card, the progress Intel has made in
| the last couple of years since introducing their first GPU
| to market has been pretty wild, and I think it should really
| give AMD pause.
| atq2119 wrote:
| They are breaking in, though. By all accounts, MI300s are
| being sold as fast as they can make them.
| thfuran wrote:
| Investing in what other market?
| yywwbbn wrote:
| > So, again, it's not at all clear that AMD being in the
| compute GPU game is the automatic win for them in the
| future. There's
|
| You're right about that but it seems that it's pretty
| clear that not being in the compute GPU game is an
| automatic loss for them (look at their recent revenue
| growth over the past quarter or two in each sector).
| imtringued wrote:
| Are you seriously telling me they shouldn't invest in one of
| their core markets? The necessary investments are probably
| insignificant. Let's say you need a budget of 10 million
| dollars (50 developers) to assemble a dev team to fix ROCm.
| How many 7900 XTXs to break even on revenue? Roughly 9000.
| How many did they sell? I'm too lazy to count, but
| Mindfactory, a German online shop, alone sold around 6k
| units.
| nindalf wrote:
| AMD is betting big on GPUs. They recently released the MI300,
| which has "2x transistors, 2.4x memory and 1.6x memory
| bandwidth more than the H100, the top-of-the-line artificial-
| intelligence chip made by Nvidia"
| (https://www.economist.com/business/2024/01/31/could-amd-
| brea...).
|
| They very much plan to compete in this space, and hope to
| ship $3.5B of these chips in the next year. Small compared to
| Nvidia's revenues of $59B (includes both consumer and data
| centre), but AMD hopes to match them. It's too big a market
| to ignore, and they have the hardware chops to match Nvidia.
| What they lack is software, and it's unclear if they'll ever
| figure that out.
| incrudible wrote:
| They are trying to compete in the segment of data center
| market where the shots are called by bean counters
| calculating FLOPS per dollar.
| BearOso wrote:
| A market where Nvidia chips are all bought out, so what's
| left?
| latchkey wrote:
| That's why I'm going to democratize that business and
| make it available to anyone who wants access. How does
| bare metal rentals of MI300x and top end Epyc CPUs sound?
| We take on the capex/opex/risk and give people what they
| want, which is access to HPC clusters.
| throwawaymaths wrote:
| IIRC (this could be old news) AMD GPUs are preferred in the
| supercomputer segment because they offer better FLOPS per
| unit of energy. However, without a CUDA-like you're missing
| out on the AI part of supercompute, which is an increasing
| proportion.
|
| The margins on supercompute-related sales are very high.
| Simplifying, but you can basically take a consumer chip,
| unlock a few things, add more memory capacity, relicense, and
| your margin goes up by a huge factor.
| Symmetry wrote:
| It's more that the resource balance in AMD's compute line
| of GPUs (the CDNA ones) has been more focused on the double
| precision operations that most supercomputer code makes
| heavy use of.
| throwawaymaths wrote:
| Thanks for clarifying! I had a feeling I had my story
| slightly wrong
| anonylizard wrote:
| They are preferred not because of any inherent superiority of
| AMD GPUs, but simply because they have to price lower and
| accept lower margins.
|
| Nvidia could always just halve their prices one day and
| wipe out every non-state-funded competitor. But Nvidia
| prefers to collect their extreme margins and funnel it into
| even more R&D in AI.
| hnlmorg wrote:
| GPU for compute has been a thing since the 00s. Regardless of
| whether AI is a fad (it isn't, but we can agree to disagree
| on this one) not investing more in GPU compute is a weird
| decision.
| FPGAhacker wrote:
| It was Microsoft's strategy for several decades (outsiders
| called it embrace, extend, extinguish, only partially in
| jest). It can work for some companies.
| currymj wrote:
| everyone buying GPUs for AI and scientific workloads wishes
| AMD was a viable option, and this has been true for almost a
| decade now.
|
| the hardware is already good enough, people would be happy to
| use it and accept that it's not quite as optimized for DL
| as Nvidia.
|
| people would even accept that the software is not as
| optimized as CUDA, I think, as long as it is correct and
| reasonably fast.
|
| the problem is just that every time I've tried it, it's been
| a pain in the ass to install and there are always weird bugs
| and crashes. I don't think it's hubris to say that they could
| fix these sorts of problems if they had the will.
| bonton89 wrote:
| AMD also has the problem that they make much better margins
| on their CPUs than on their GPUs and there are only so many
| TSMC wafers. So in a way making more GPUs is like burning up
| free money.
| carlossouza wrote:
| Because the supply for this market is constrained.
|
| It's a pure business decision based on simple math.
|
| If the estimated revenues from selling to the underserved
| market are higher than the cost of funding the project (they
| probably are, considering the obscene margins from NVIDIA),
| then it's a no-brainer.
| yywwbbn wrote:
| Because their current market valuation was massively inflated
| because of the AI/GPU boom and/or bubble?
|
| In a rational world their stock price would collapse if they
| don't focus on it and are unable to deliver anything
| competitive in the upcoming year or two
|
| > of the market where they're doing much better than nVidia?
|
| So the market that's hardly growing, that Nvidia is not
| competing in, and where Intel still has bigger market share
| and is catching up performance-wise? AMD's valuation is this
| high only because
| they are seen as the only company that could directly compete
| with Nvidia in the data center GPU market.
| jandrese wrote:
| AMD's management seems to be only vaguely aware that GPU
| compute is a thing. All of their efforts in the field feel like
| afterthoughts. Or maybe they are all just hardware guys who
| think of software as just a cost center.
| giovannibonetti wrote:
| Maybe they just can't lure in good software developers with
| the right skill set, either due to not paying them enough or
| not having a good work environment in comparison to the other
| places that could hire them.
| captainbland wrote:
| I did a cursory glance at Nvidia's and AMD's respective
| careers pages for software developers at one point - what
| struck me was they both have similarly high requirements
| for engineers in fields like GPU compute and AI but Nvidia
| hires much more widely, geographically speaking, than AMD.
|
| As a total outsider it seems to me that maybe one of AMD's
| big problems is they just aren't set up to take advantage
| of the global talent pool in the same way Nvidia is.
| newsclues wrote:
| They are aware, but it wasn't until recently that they had
| the resources to invest in the space. They had to build Zen
| and start making buckets of money first
| beebeepka wrote:
| Exactly. AMD stock was like 2 dollars just eight years ago.
| They didn't have any money and, amusingly, it was their GPU
| business that kept them going on life support.
|
| Their leadership seems quite a bit more competent than
| random forum commenters give them credit for. I guess what
| they need, marketing wise, is a few successful halo GPU
| launches. They haven't done that in a while. Lisa
| acknowledged this years ago. It's marketing 101. I guess
| these things are easier said than done.
| lostdog wrote:
| It feels like "Make the AI software work on our GPUs," is on
| some VP's OKRs, but isn't really being checked on for
| progress or quality.
| trynumber9 wrote:
| That doesn't explain CDNA. They focused on high-throughput
| FP64 which is not where the market went.
| hjabird wrote:
| The problem with effectively supporting CUDA is that it encourages
| CUDA adoption all the more strongly. Meanwhile, AMD will always
| be playing catch-up, forever having to patch issues, work
| around Nvidia/AMD differences, and accept the performance
| penalty that comes from having code optimised for another
| vendor's hardware. AMD needs to encourage developers to use
| their own ecosystem or an open standard.
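| For context on how thin that gap already is, here's a minimal
| sketch (my own illustration, not AMD's code): the CUDA runtime
| API and AMD's HIP mirror each other almost symbol for symbol,
| which is exactly what the hipify tools exploit.
|
|     // CUDA version; the HIP port is the same file with
|     // #include <hip/hip_runtime.h> and cuda* -> hip* renames
|     // (cudaMallocManaged -> hipMallocManaged, etc.).
|     #include <cstdio>
|     #include <cuda_runtime.h>
|
|     __global__ void vec_add(const float* a, const float* b,
|                             float* c, int n) {
|         int i = blockIdx.x * blockDim.x + threadIdx.x;
|         if (i < n) c[i] = a[i] + b[i];
|     }
|
|     int main() {
|         const int n = 1024;
|         float *a, *b, *c;
|         cudaMallocManaged(&a, n * sizeof(float));
|         cudaMallocManaged(&b, n * sizeof(float));
|         cudaMallocManaged(&c, n * sizeof(float));
|         for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }
|
|         // The triple-chevron launch syntax works in HIP as well.
|         vec_add<<<(n + 255) / 256, 256>>>(a, b, c, n);
|         cudaDeviceSynchronize();
|
|         printf("c[0] = %f\n", c[0]);  // expect 3.0
|         cudaFree(a);
|         cudaFree(b);
|         cudaFree(c);
|         return 0;
|     }
|
| Which is also why the catch-up problem is real: being this
| close to CUDA makes porting easy, but it means the API surface
| is effectively defined by Nvidia.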
| slashdev wrote:
| With Nvidia controlling 90%+ of the market, this is not a
| viable option. They'd better lean hard into CUDA support if
| they want to be relevant.
| cduzz wrote:
| A bit of story telling here:
|
| IBM and Microsoft made OS/2. The first version worked on
| 286s and was stable but useless.
|
| The second version worked only on 386s and was quite good,
| and even had wonderful windows 3.x compatibility. "Better
| windows than windows!"
|
| At that point Microsoft wanted out of the deal and they
| wanted to make their newer version of windows, NT, which
| they did.
|
| IBM now had a competitor to "new" windows and a very
| compatible version of "old" windows. Microsoft killed OS2
| by a variety of ways (including just letting IBM be IBM)
| but also by making it very difficult for last month's
| version of OS/2 to run next month's bunch of Windows
| programs.
|
| To bring this back to the point -- IBM vs Microsoft is akin
| to AMD vs Nvidia -- where nvidia has the standard that AMD
| is implementing, and so no matter what if you play in the
| backward compatibility realm you're always going to be
| playing catch-up and likely always in a position where
| winning is exceedingly hard.
|
| As WOPR once said, "A strange game. The only winning move is
| not to play."
| panick21_ wrote:
| IBM also made a whole bunch of strategic mistakes beyond
| that. Most importantly their hardware division didn't
| give a flying f about OS/2. Even when they had a 'better
| Windows' they did not actually use it themselves and
| didn't push it to other vendors.
|
| Windows NT wasn't really relevant in that competition until
| much later; only with XP did the NT line finally reach end
| consumers.
|
| > where nvidia has the standard that AMD is implementing,
| and so no matter what if you play in the backward
| compatibility realm you're always going to be playing
| catch-up
|
| That's not true. If AMD starts adding its own features
| and has its own advantages, that can flip.
|
| It only takes a single generation of hardware, or a
| single feature for things to flip.
|
| Look at Linux and Unix. It started out with Linux
| implementing Unix, and now the Unixes are trying to add
| compatibility with Linux.
|
| Is SGI still the driving force behind OpenGL/Vulkan? Did
| you think it was a bad idea for other companies to use
| OpenGL?
|
| AMD was successful against Intel with x86_64.
|
| There are lots of examples of a company making something
| popular and not being able to take full advantage of it in
| the long run.
| chuckadams wrote:
| Slapping a price tag of over $300 on OS/2 didn't do IBM
| any favors either.
| BizarroLand wrote:
| That's what happens when your primary business model is
| selling to the military. They had to pay what IBM charged
| them (within a small bit of reason) and it was incredibly
| difficult for them to pivot away from any path they chose
| in the 80's once they had chosen it.
|
| However, that same logic doesn't apply to consumers, and
| since they continued to fail to learn that lesson, IBM now
| doesn't even target the consumer market, given that they
| never learned how to be competitive and could only ever
| effectively function when they had a monopoly or at least
| vendor lock-in.
|
| https://en.wikipedia.org/wiki/Acquisition_of_the_IBM_PC_b
| usi...
| incrudible wrote:
| Windows before NT was crap, so users had an incentive to
| upgrade. If there had existed a Windows 7 alternative
| that was near fully compatible and FOSS, I would wager
| Microsoft would have lost to it with Windows 8 and even
| 10. The only reason to update for most people was
| Microsoft dropping support.
|
| For CUDA, it is not just AMD who would need to catch up.
| Developers also are not necessarily going to target the
| latest feature set immediately, especially if it only
| benefits (or requires) new hardware.
|
| I accept the final statement, but that also means AMD for
| compute is gonna be dead like OS/2. Their stack just will
| not reach critical mass.
| BizarroLand wrote:
| Today's Linux OSes would have competed incredibly strongly
| against Vista and probably would have gone blow for blow
| against 7.
|
| Proton, Wine, and all of the compatibility fixes and
| driver improvements that the community has made in the
| last 16 years have been amazing, and every day is another
| day where you can say that it has never been easier to
| switch away from Windows.
|
| However, Microsoft has definitely been drinking the IBM
| koolaid a little too long and has lost the mandate of
| heaven. I think in the next 7-10 years we will reach a
| point where there is nothing Windows can do that Linux
| cannot do better and easier without spying on you, and we
| may be 3-5 years from a "killer app" that is specifically
| built to be incompatible with Windows just as a big FU to
| them, possibly in the VR world, possibly in AR, and once
| that happens maybe, maybe, maybe it will finally actually
| be the year of the linux desktop.
| paulmd wrote:
| > However, Microsoft has definitely been drinking the IBM
| koolaid a little to long and has lost the mandate of
| heaven. I think in the next 7-10 years we will reach a
| point where there is nothing Windows can do that linux
| cannot do better and easier without spying on you
|
| that's a fascinating statement with the clear ascendancy
| of neural-assisted algorithms etc. Things like DLSS are
| the future - small models that just quietly optimize some
| part of a workload that was commonly considered
| impossible to the extent nobody even thinks about it
| anymore.
|
| my prediction is that in 10 years we are looking at the
| rise of tag+collection based filesystems and operating
| system paradigms. all of us generate a huge amount of
| "digital garbage" constantly, and you either sort it out
| into the important stuff, keep temporarily, and toss, or
| you accumulate a giant digital garbage pile. AI systems
| are gonna automate that process, it's gonna start on
| traditional tree-based systems but eventually you don't
| need the tree at all, AI is what's going to make that
| pivot to true tag/collection systems possible.
|
| Tags mostly haven't worked because of a bunch of
| individual issues which are pretty much solved by AI.
| Tags aren't specific enough: well, AI can give you good
| guesses at relevance. Tagging files and maintaining
| collections is a pain: well, the AI can generate tags and
| assign collections for you. Tags really require an
| ontology for "fuzzy" matching (search for "food" should
| return the tag "hot dog") - well, LLMs understand
| ontologies fine. Etc etc. And if you do it right, you can
| basically have the AI generate "inbox/outbox" for you,
| deduplicate files and handle versioning, etc, all
| relatively seamlessly.
|
| microsoft and macos are both clearly racing for this with
| the "AI os" concept. It's not just better relevance
| searches etc. And the "generate me a whole paragraph
| before you even know what I'm trying to type" stuff is
| not how it's going to work either. That stuff is like
| specular highlights in video games around 2007 or
| whatever - once you had the tool, for a few years
| everything was _wet_ until developers learned some
| restraint with it. But there are very very good
| applications that are going to come out in the 10 year
| window that are going to reduce operator cognitive load
| by a lot - that is the "AI OS" concept. What would the
| OS look like if you truly had the "computer is my
| secretary" idea? Not just dictating memorandums, but
| assistance in keeping your life in order and keeping you
| on-task.
|
| I simply cannot see linux being able to keep up with this
| change, in the same way the kernel can't just switch to
| rust - at some point you are too calcified to ever do the
| big-bang rewrite if there is not a BDFL telling you that
| it's got to happen.
|
| the downside of being "the bazaar" is that you are
| standards-driven and have to deal with corralling a
| million whiny nerds constantly complaining about "spying
| on me just like microsoft" and continuing to push in
| their own other directions (sysvinit/upstart/systemd
| factions, etc) and whatever else, _on top of all the
| other technical issues of doing a big-bang rewrite_.
| linux is too calcified to ever pivot away from being a
| tree-based OS and it's going to be another 2-3 decades
| before they catch up with "proper support for new file-
| organization paradigms" etc even in the smaller sense.
|
| that's really just the tip of the iceberg on the things
| AI is going to change, and linux is probably going to be
| left out of most of those _commercial applications_
| despite being where the research is done. It's just too
| much of a mess and too many nerdlingers pushing back to
| ever get anything done. Unix will be represented in this
| new paradigm but not Linux - the commercial operators who
| have the centralization and fortitude to build a
| cathedral will get there much quicker, and that looks
| like MacOS or Solaris not linux.
|
| Or at least, unless I see some big announcement from KDE
| or Gnome or Canonical/Red Hat about a big AI-OS
| rewrite... I assume that's pretty much where the center
| of gravity is going to stay for linux.
| BizarroLand wrote:
| Counterpoint: Most AI stuff is developed in an OS-agnostic
| language like Python or C, and then ported to
| Linux/OSX/Windows, so for AI it is less about the OS it
| runs on than the hardware, drivers, and/or connections
| that the OS supports.
|
| For the non-vendor-lock-in AIs (Copilot), casting as
| wide a net as possible to catch customers as easily as
| possible should by default mean that they would invest
| the small amount of money to build Linux integrations
| into their AI platforms.
|
| Plus, the googs has a pretty deep investment into the
| linux ecosystem and should have little issue pushing bard
| or gemini or whatever they'll call it next week before
| they kill it out into a linux compatible interface, and
| if they do that then the other big players will follow.
|
| And, don't overlook the next generation of VR headsets.
| People have gotten silly over the Apple headset, but
| Valve should be rolling out the Deckhard soon and others
| will start to compete in that space since Apple raised
| the price bar and should soon start rolling out hardware
| with more features and software to take advantage of it.
| incrudible wrote:
| "Neural assisted algorithms" are just algorithms with
| large lookup tables. Another magnitude of binary bloat,
| but that's nothing we haven't experienced before. There's
| no need to fundamentally change the OS paradigm for it.
| paulmd wrote:
| I think we're well past the "dlss is just FSR2 with
| lookup tables, you can ALWAYS replicate the outcomes of
| neural algorithms with deterministic ones" phase, imo.
|
| if that's the case you have billion-dollar opportunities
| waiting for you to prove it!
| pjmlp wrote:
| There is no competition when games only come to Linux by
| "emulating" Windows.
|
| The only thing it has going for it is being a free beer
| UNIX clone for headless environments, and even then, it
| isn't that relevant in cloud environments where
| containers and managed languages abstract everything they
| run on.
| BizarroLand wrote:
| Thanks to the Steam Deck, more and more games are being
| ported for Linux compatibility by default.
|
| Maybe some Microsoft-owned game makers will never make
| the shift, but if the majority of others do then that's
| the death knell.
| incrudible wrote:
| Are they _ported_ though? I would say thanks to the Steam
| Deck, Proton is at a point where native Linux ports are
| unnecessary. It's also a much more stable target to
| develop against than N+1 Linux distros.
| pjmlp wrote:
| Nah, everyone is relying on Proton; there are hardly any
| native GNU/Linux games being ported, not even Android/NDK
| ones, where SDL, OpenGL, Vulkan, C, and C++ are present and
| porting would be extremely easy.
| foobiekr wrote:
| IBM was also incompetent, and the OS/2 team in Boca had some
| exceptional engineers but was packed with mostly
| mediocre-to-bad ones, which is why so many things in OS/2
| were bad and why IBM got upset at Microsoft for contributing
| "negative work" to the project, because their lines-of-code
| contribution was negative (they were rewriting a lot of
| inefficient, bloated IBM code).
|
| A lot went wrong with OS/2. For CUDA, I think a better
| analogy is VHS. The standard, in the effective (not open)
| sense, is what it is. AMD sucks at software and views it
| as an expense rather than an advantage.
| AYBABTME wrote:
| You would think that by now AMD realizes that poor
| software is what left them behind in the dust, and would
| have changed that mindset.
| hyperman1 wrote:
| Most businesses understand the pain points of their
| suppliers very well, as they feel that pain and have
| organized themselves around it.
|
| They have a hard time understanding the pain points of
| their consumers, as they don't feel that pain, look
| through their own organisation-coloured glasses, and can't
| tell the real pain points from the whiny-customer ones.
|
| AMD probably thinks software ecosystems are the easy
| part, ready to be taken on whenever they feel like it by
| throwing a token amount at it. They've built a great
| engine, see the bodywork as beneath them, and don't
| understand why the lazy customer wants them to build the
| rest of the car too.
| neerajsi wrote:
| I'm not in the GPU programming realm, so this observation
| might be inaccurate:
|
| I think the case of CUDA vs an open standard is different
| from OS/2 vs Windows, because the customers of CUDA are
| programmers with access to source code, while the
| customers of OS/2 were end users trying to run apps
| written by others.
|
| If your shrink-wrapped software didn't run on OS/2, you'd
| have no choice but to go buy Windows. OTOH, if your AI
| model doesn't run on an AMD device and the issue is
| something minor, you can edit the shader code.
| bachmeier wrote:
| > The problem with effectively supporting CUDA is that
| encourages CUDA adoption all the more strongly.
|
| I'm curious about this. Sure some CUDA code has already been
| written. If something new comes along that provides better
| performance per dollar spent, why continue writing CUDA for
| new projects? I don't think the argument that "this is what
| we know how to write" works in this case. These aren't
| scripts you want someone to knock out quickly.
| Uehreka wrote:
| > If something new comes along that provides better
| performance per dollar spent
|
| They won't be able to do that, their hardware isn't fast
| enough.
|
| Nvidia is beating them at hardware performance, AND ALSO
| has an exclusive SDK (CUDA) that is used by almost all deep
| learning projects. If AMD can get their cards to run CUDA
| via ROCm, then they can begin to compete with Nvidia on
| price (though not performance). Then, and only then, if
| they can start actually producing cards with equivalent
| performance (also a big stretch) they can try for an
| Embrace Extend Extinguish play against CUDA.
| bachmeier wrote:
| > They won't be able to do that, their hardware isn't
| fast enough.
|
| Well, then I guess CUDA is not really the problem, so
| being able to run CUDA on AMD hardware wouldn't solve
| anything.
|
| > try for an Embrace Extend Extinguish play against CUDA
|
| They wouldn't need to go that route. They just need a way
| to run existing CUDA code on AMD hardware. Once that
| happens, their customers have the option to save money by
| writing ROCm or whatever AMD is working on at that time.
| foobiekr wrote:
| Intel has the same software issue as AMD but their
| hardware is genuinely competitive if a generation behind.
| Cost and power wise, Intel is there; software? No.
| Uehreka wrote:
| > Well, then I guess CUDA is not really the problem
|
| It is. All the things are the problem. AMD is behind on
| both hardware and software, for both gaming and compute
| workloads, and has been for many years. Their competitor
| has them beat in pretty much every vertical, and the
| lock-in from CUDA helps ensure that even if AMD can get
| their act together on the hardware side, existing compute
| workloads (there are oceans of existing workloads) won't
| run on their hardware, so it won't matter for
| professional or datacenter usage.
|
| To compete with Nvidia in those verticals, AMD has to fix
| all of it. Ideally they'd come out with something better
| than CUDA, but they have not shown an aptitude for being
| able to do something like that. That's why people keep
| telling them to just make a compatibility layer. It's a
| sad place to be, but that's the sad place where AMD is,
| and they have to play the hand they've been dealt.
| dotnet00 wrote:
| If something new comes along that provides better
| performance per dollar, but you have no confidence that
| it'll continue to be available in the future, it's far less
| appealing. There's also little point in being cheaper if it
| just doesn't have the raw performance to justify the effort
| in implementing in that language.
|
| CUDA currently has the better raw performance, better
| availability, and a long record indicating that the
| platform won't just disappear in a couple of years. You can
| use it on pretty much any NVIDIA GPU and it's properly
| supported. The same CUDA code that ran on a GTX680 can run
| on an RTX4090 with minimal changes if any (maybe even the
| same binary).
|
| In comparison, AMD has a very spotty record with their
| compute technologies: stuff gets released and becomes
| effectively abandonware, or after just a few years support
| gets dropped regardless of the hardware's popularity. For
| several generations they basically led people on with
| promises of full support on consumer hardware that either
| never arrived or arrived when the next generation of cards
| was already available, and despite the general popularity
| of the RX 580 and the popularity of the Radeon VII in
| compute applications, they dropped 'official' support. AMD
| treats its 'consumer' cards as third class citizens for
| compute support, but you aren't going to convince people to
| seriously look into your platform like that. Plus, it's a
| lot more appealing to have "GPU acceleration will allow us
| to take advantage of newer supercomputers, while also
| offering massive benefits to regular users" than just the
| former.
|
| This was ultimately what removed AMD as a consideration for
| us when we were deciding on which to focus on for GPU
| acceleration in our application. Many of us already had
| access to an NVIDIA GPU of any sort, which would make
| development easier, while the entire facility had one ROCm
| capable AMD GPU at the time, specifically so they could
| occasionally check in on its status.
| panick21_ wrote:
| That's not guaranteed at all. One could make the same
| argument about Linux vs Commercial Unix.
|
| If they put their stuff out as open source, including firmware,
| I think they will win out eventually.
|
| And it's also not guaranteed that Nvidia will always produce
| the superior hardware for that code.
| kgeist wrote:
| Intel embraced AMD64, ditching Itanium. Wasn't that a good
| decision that worked out well? Is it comparable?
| teucris wrote:
| In hindsight, yes, but just because a specific technology
| is leading an industry doesn't mean it's going to be the
| best option. It has to play out long enough for the market
| to indicate a preference. In this case, for better or
| worse, it looks like CUDA's the preference.
| diggan wrote:
| > It has to play out long enough for the market to
| indicate a preference
|
| By what measures hasn't that happened already? CUDA been
| around and constantly improving for more than 15 years,
| and there is no competitors in sight so far. It's
| basically the de facto standard in many ecosystems.
| teucris wrote:
| There haven't been any as successful, but there have been
| competitors. OpenCL and DirectX come to mind.
| cogman10 wrote:
| SYCL is the latest attempt that I'm aware of. It's still
| pretty active and may just work out, since it doesn't rely on
| the video card manufacturers to make it happen.
| zozbot234 wrote:
| SYCL is the quasi-successor to OpenCL, built on the same
| flavor of SPIR-V. Various efforts are trying to run it on
| top of Vulkan Compute (which tends to be broadly supported
| by modern GPUs), but it's non-trivial because the
| technologies are independently developed and there are
| some incompatibilities.
| kllrnohj wrote:
| Intel & AMD have a cross-license agreement covering
| everything x86 (and x86_64) thanks to lots and lots of
| lawsuits over their many years of competition.
|
| So while Intel had to bow to AMD's success and give up
| Itanium, they weren't then limited by that and could
| proceed to iterate on top of it.
|
| Meanwhile it'll be a cold day in hell before Nvidia
| licenses anything about CUDA to AMD, much less allows AMD
| to iterate on top of it.
| kevin_thibedeau wrote:
| The original cross licensing was government imposed
| because a second source was needed for the military.
| atq2119 wrote:
| Makes you wonder why DoE labs and similar facilities
| don't mandate open licensing of CUDA.
| krab wrote:
| Aren't APIs out of scope for copyright? In the case of
| CUDA, it seems they could copy most of it and then iterate
| on their own, keeping a compatible subset.
| throwoutway wrote:
| Is it? Apple Silicon exists, but Apple created a translation
| layer above it so the transition could be smoother.
| jack_pp wrote:
| Not really the same, in that Apple was absolutely required
| to do this for people to transition smoothly, and it wasn't
| competing against another company / platform. It just needed
| apps from its previous platform to work while people
| recompile apps for the current one, which they will.
| Jorropo wrote:
| This is extremely different, apple was targeting end
| consumers that just want their app to run. The performance
| between apple rosetta and native cpu were still multiple
| times different.
|
| People writing CUDA apps don't just want stuff to run,
| performance is an extremely important factor else they
| would target CPUs which are easier to program for.
|
| From their readme:
|
| > On Server GPUs, ZLUDA can compile CUDA GPU code to run in one
| of two modes:
|
| > Fast mode, which is faster, but can make exotic (but correct)
| GPU code hang.
|
| > Slow mode, which should make GPU code more stable, but can
| prevent some applications from running on ZLUDA.
| hamandcheese wrote:
| > The performance between apple rosetta and native cpu
| were still multiple times different.
|
| Rosetta 2 runs apps at 80-90% their native speed.
| Jorropo wrote:
| Indeed I got that wrong. Sadly minimal SIMD and hardware
| acceleration support.
| piva00 wrote:
| > The performance between apple rosetta and native cpu
| were still multiple times different.
|
| Not at all, the performance hit was in the low tens of
| percent. Before natively supporting Apple Silicon, most of
| the apps I use for music/video/photography didn't seem to
| have a performance impact at all, even more so when the M1
| machines were so much faster than the Intels.
| coldtea wrote:
| > _The problem with effectively supporting CUDA is that
| encourages CUDA adoption all the more strongly_
|
| Worked fine for MS with Excel supporting Lotus 123 and Word
| supporting WordPerfect's formats when those were dominant...
| Dork1234 wrote:
| Microsoft could do that because they had the Operating
| System monopoly to leverage and take out both Lotus 123 and
| WordPerfect. Without the monopoly of the operating system
| they wouldn't have been able to Embrace, Extend, Extinguish.
|
| https://en.wikipedia.org/wiki/Embrace,_extend,_and_extingui
| s...
| bell-cot wrote:
| But MS controlled the underlying OS. Letting them both
| throw money at the problem, and (by accounts at the time)
| frequently tweak the OS in ways that made life difficult
| for Lotus, WordPerfect, Ashton-Tate, etc.
| p_l wrote:
| Last I checked, Lotus did themselves in by not innovating,
| betting on the wrong horse (OS/2), and then not doing well
| on the pivot to Windows.
|
| Meanwhile Excel was gaining features and winning users
| with them even before Windows was in play.
| dadadad100 wrote:
| This is a key point. Before Windows we had all the DOS
| players - WordPerfect was king. Microsoft was more
| focused on the Mac. I've always assumed that Microsoft
| understood that a GUI was coming and trained a generation
| of developers on the main GUI of the day. Once Windows
| came out, the DOS-focused apps could not adapt in time.
| robocat wrote:
| > betting on the wrong horse (OS/2)
|
| Ahhhh, your hindsight is well developed. I would be
| interested to know the background on the reasons why
| Lotus made that bet. We can't know the counterfactual,
| but Lotus delivering on a platform owned by their deadly
| competitor Microsoft would seem to me to be a clearly
| worrisome idea to Lotus at the time. Turned out it was an
| existentially bad idea. Did Lotus fear Microsoft? "DOS
| ain't done till Lotus won't run" is a myth[1] for a
| reason. Edit: DRDOS errors[2] were one reason Lotus might
| fear Microsoft. We can just imagine a narrative of a
| different timeline where Lotus delivered on Windows but
| did some things differently to beat Excel. I agree, Lotus
| made other mistakes and Microsoft made some great
| decisions, but the point remains.
|
| We can also suspect that AMD faces a similar fork in the
| road now. Depending on Nvidia/CUDA may be a similar choice
| for AMD - fail if they do and fail if they don't.
|
| [1] http://www.proudlyserving.com/archives/2005/08/dos_ai
| nt_done...
|
| [2] https://www.theregister.com/1999/11/05/how_ms_played_
| the_inc...
| p_l wrote:
| I've seen rumours from self-claimed ex-Lotus employees
| that IBM made a deal with Lotus to prioritise OS/2
| andy_ppp wrote:
| When the alternative is failure I suppose you choose the
| least bad option. Nobody is betting the farm on ROCm!
| hjabird wrote:
| True. This is the big advantage of an open standard instead
| of jumping from one vendor's walled garden to another.
| more_corn wrote:
| They have already lost. The question is: do they want to come
| in second in the game to control the future of the world, or
| not play at all?
| bick_nyers wrote:
| The latest version of CUDA is 12.3, and version 12.2 came out
| 6 months prior. How many people are running an older version
| of CUDA right now on NVIDIA hardware for whatever particular
| reason?
|
| Even if AMD lagged support on CUDA versioning, I think it
| would be widely accepted if the performance per dollar at
| certain price points was better.
|
| Taking the whole market from NVIDIA is not really an option,
| it's better to attack certain price points and niches and
| then expand from there. The CUDA ship sailed a long time ago
| in my view.
| swozey wrote:
| I just went through this this weekend - if you're running
| on Windows and want to use DeepSpeed, you still have to use
| CUDA 12.1, because DeepSpeed 13.1 is the latest that works
| with 12.1. There's no DeepSpeed for Windows that works with
| 12.3.
|
| I tried to get it working this weekend but it was a huge
| PITA, so I switched to putting everything into WSL2, with
| Arch on there and PyTorch etc. in containers, so I could flip
| versions easily now that I know how SPECIFIC the versions
| are to one another.
|
| I'm still working on that part; halfway into it my WSL2
| completely broke and I had to reinstall Windows. I'm scared
| to mount the vhdx right now. ALL of my work and ALL of my
| documentation are inside the WSL2 Arch Linux and NOT on my
| Windows machine. I have EVERYTHING I need to quickly put
| another server up (dotfiles, configs) sitting in a chezmoi
| git repo ON THE VM - which I only committed once, right
| after init, like 5 mins into everything. THAT was a learning
| experience. Now I have no idea if I should follow the "best
| practice" of keeping projects in WSL or have WSL reach out
| to Windows, where there's a performance drop. The 9p
| networking stopped working and no matter what I reinstalled,
| reset, or removed (features, reset Windows, etc.), it
| wouldn't start. But at least I have that WSL2 .vhdx image
| that will hopefully mount and start. And probably break WSL2
| again. I even SPECIFICALLY took backups of the image as
| tarballs every hour in case I broke LINUX, not WSL.
|
| If anyone has already done SD containers in WSL2, let me
| know. I've tried to use WSL for dev work (I use OSX) like
| this 2-3 times in the last 4-5 years and I always run into
| some catastrophically broken thing that makes my WSL stop
| working. I hadn't used it in years so hoped it was super
| reliable by now. This is on 3 different desktops with
| completely different hardware, etc. I was terrified it
| would break this weekend and IT DID. At least I can be up
| in Windows in 20 minutes thanks to Chocolatey and chezmoi.
| Wiped out my entire gaming desktop.
|
| Sorry I'm venting now this was my entire weekend.
|
| This repo is from a DeepSpeed contributor (IIRC) and lists
| the reqs for DeepSpeed + Windows that mention the version
| matching:
|
| https://github.com/S95Sedan/Deepspeed-Windows
|
| > conda install pytorch==2.1.2 torchvision==0.16.2
| torchaudio==2.1.2 pytorch-cuda=12.1 -c pytorch -c nvidia
|
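| A quick sanity check that the wheel you end up with actually
| matches the toolkit you pinned (a minimal sketch using standard
| PyTorch attributes, nothing DeepSpeed-specific):
|
|     import torch
|
|     # CUDA version the installed wheel was built against, e.g. '12.1'
|     print(torch.version.cuda)
|     # True only if the driver/runtime combination is actually usable
|     print(torch.cuda.is_available())
|     if torch.cuda.is_available():
|         print(torch.cuda.get_device_name(0))
|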
| It may sound weird to do any of this in Windows, or maybe
| not, but if it does just remember that it's a lot of gamers
| like me with 4090s who just want to learn ML stuff as a
| hobby. I have absolutely no idea what I'm doing but thank
| god I know containers and linux like the back of my hand.
| bick_nyers wrote:
| Vent away! Sounds frustrating for sure.
|
| As much as I love Microsoft/Windows for the work they
| have put into WSL, I ended up just putting Kubuntu on my
| devices and use QEMU with GPU passthrough whenever I need
| Windows. Gaming perf is good. You need an iGPU or a cheap
| second GPU for Linux in order to hand off a 4090 etc. to
| Windows (unless maybe your motherboard happens to support
| headless boot but if it's a consumer board it doesn't).
| Dual boot with Windows always gave me trouble.
| katbyte wrote:
| I recently gave this a go as I'd not had a Windows
| desktop for a long time, have a beefy Proxmox server, and
| wanted to play some Windows-only games. It works shockingly
| well with an A4000 and 35m optical HDMI cables! However,
| I'm getting random audio crackling and popping and I've
| yet to figure out what's causing it.
|
| First I thought it was hardware related, then it happened in
| a Remote Desktop session too, leading me to think it's some
| weird audio driver thing.
|
| have you encountered anything like this at all?
| swozey wrote:
| What are you running for audio? pipewire+jack, pipewire,
| jack2, pulseaudio? I wonder if it's from latency.
| PulseAudio is the most common, but if you do any audio
| engineering or play guitar etc. with your machine, most of
| us use the JACK protocol for lower latency.
|
| https://linuxmusicians.com/viewtopic.php?t=25556
|
| Could be completely unrelated though, RDP sessions can
| definitely act up, get audio out of sync, etc. I try to
| never pass RDP audio through; it's not even enabled by
| default in the mstsc client IIRC, but that may just be a
| "probably server" thing.
| swozey wrote:
| Are you flipping from your main GPU to like a GT710 to do
| the gpu vfio mount? Or can you share the dgpu directly
| and not have to go headless now?
|
| I've done this on both a hackintosh and void linux. I was
| so excited to get the hackintosh working because I
| honestly hate day-to-day desktop Linux; it's my day job to
| work on and I just don't want to deal with it after work.
|
| Unfortunately both would break in significant ways and
| I'd have to trudge through and fix things. I had that
| void desktop backed up with Duplicacy (duplicati front
| end) and IIRC I tried to roll back after breaking qemu,
| it just dumps all your backup files into their dirs, and
| I think I broke it more.
|
| I think at that point I was back up in Windows in 30
| mins... with all of its intricacies, like BSODing 30% of the
| time when I either restart it or unplug a USB hub. But my
| MacBooks have a 30% chance of not waking up on Monday
| morning, when I haven't used them all weekend, without me
| having to grab them and open the screen.
| carlossouza wrote:
| Great comment.
|
| I bet there are at least two markets (or niches):
|
| 1. People who want the absolute best performance and the
| latest possible version and are willing to pay the premium
| for it;
|
| 2. People who want to trade performance for cost and accept
| working with not-the-latest versions.
|
| In fact, I bet the market for (2) is much larger than (1).
| bluedino wrote:
| > How many people are running an older version of CUDA
| right now on NVIDIA hardware for whatever particular
| reason?
|
| I would guess there are lots of people still running CUDA
| 11. Older clusters, etc. A lot of that software doesn't get
| updated very often.
| hjabird wrote:
| There are some great replies to my comment - my original
| comment was too reductive. However, I still think that
| entrenching CUDA as the de-facto language for heterogeneous
| computing is a mistake. We need an open ecosystem for AI and
| HPC, where vendors compete on producing the best hardware.
| ethbr1 wrote:
| The problem with open standards is that someone has to
| write them.
|
| And that someone usually isn't a manufacturer, lest the
| committee be accused of bias.
|
| Consequently, you get (a) outdated features that the SotA
| has already moved beyond, (b) designs that don't correspond
| to actual practice, and (c) abstractions that are overly
| generalized.
|
| There are some notable exceptions (e.g. IETF), but the
| general rule has been that open specs please no one,
| slowly.
|
| IMHO, FRAND and liberal cross-licensing produce better
| results.
| jchw wrote:
| Vulkan already has some standard compute functionality.
| Not sure if it's low level enough to be able to e.g.
| recompile and run CUDA kernels, but I think if people
| were looking for a vendor-neutral standard to build GPGPU
| compute features on top of, I mean, that seems to be the
| obvious modern choice.
| zozbot234 wrote:
| There is already a work-in-progress implementation of HIP
| on top of OpenCL https://github.com/CHIP-SPV/chipStar and
| the Mesa RustiCL folks are quite interested in getting
| that to run on top of Vulkan.
|
| (To be clear, HIP is about converting CUDA source code, not
| running CUDA-compiled binaries, but the ZLUDA project
| discussed in the OP heavily relies on it.)
| jvanderbot wrote:
| If you replace CUDA -> x86 and NVIDIA -> Intel, you'll see a
| familiar story which AMD has already proved it can work
| through.
|
| These were precisely the arguments for 'x86 will entrench
| Intel for all time', and we've seen AMD succeed at that game
| just fine.
| ianlevesque wrote:
| And indeed more than succeed, they invented x86_64.
| stcredzero wrote:
| _And indeed more than succeed, they invented x86_64._
|
| If AMD invented the analogous to x86_64 for CUDA, this
| would increase competition and progress in AI by some
| huge fraction.
| pjmlp wrote:
| Only works if Nvidia missteps and creates the Itanium
| version of CUDA.
| stcredzero wrote:
| You don't think someone would welcome the option to have
| more hardware buying options, even if the "Itanium
| version" didn't happen?
| sangnoir wrote:
| x86_64's win was helped by Intel's Itanium misstep. AMD
| can't bank on Nvidia making a mistake, and Nvidia seems
| content with incremental changes to CUDA, contrasted with
| Intel's 32-bit to 64-bit transition. It is highly
| unlikely that AMD can find and exploit a similar chink in
| the armor against CUDA.
| LamaOfRuin wrote:
| If they're content with incremental changes to CUDA then
| it doesn't cost much to keep updated compatibility and do
| it as quickly as any users actually adopt changes.
| samstave wrote:
| Transmeta was Intel's boogeyman in the '90s.
| ethbr1 wrote:
| > _These were precisely the arguments for 'x86 will
| entrench Intel for all time', and we've seen AMD succeed at
| that game just fine._
|
| ... after a couple decades of legal proceedings and a
| looming FTC monopoly case convinced Intel to throw in the
| towel, cross-license, and compete more fairly with AMD.
|
| https://jolt.law.harvard.edu/digest/intel-and-amd-
| settlement
|
| AMD didn't just magically do it on its own.
| clhodapp wrote:
| If that's the model, it sounds like the path would be to
| burn money to stay right behind NVIDIA and wait for them to
| become complacent and stumble technically, creating the
| opportunity to leapfrog them. Keeping up could be very
| expensive if they don't force something like the mutual
| licensing requirements around x86.
| mindcrime wrote:
| Yep. This is very similar to the "catch-22" that IBM wound up
| in with OS/2 and the Windows API. On the one hand, by
| supporting Windows software on OS/2, they gave OS/2 customers
| access to a ready base of available, popular software. But in
| doing so, they also reduced the incentive for ISV's to
| produce OS/2 native software that could take advantage of
| unique features of OS/2.
|
| It's a classic "between a rock and a hard place" scenario.
| Quite a conundrum.
| ianlevesque wrote:
| Thinking about the highly adjacent graphics APIs history,
| did anyone really 'win' the Direct3D, OpenGL, Metal, Vulkan
| war? Are we benefiting from the fragmentation?
|
| If the players in the space have naturally coalesced around
| one over the last decade, can we skip the thrashing and
| just go with it this time?
| tadfisher wrote:
| The game engines won. Folks aren't building Direct3D or
| Vulkan renderers; they're using Unity or Unreal or Godot
| and clicking "export" to target whatever API makes sense
| for the platform.
|
| WebGPU might be the thing that unifies the frontend API
| for folks writing cross-platform renderers, seeing as
| browsers will have to implement it on top of the platform
| APIs anyway.
| imtringued wrote:
| This feels like a massive punch in the gut. An open-source
| project, not ruined by AMD's internal mismanagement, gets shit
| done within two years and AMD goes "meh"?!? There are billions
| of dollars on the line! It's like AMD actively hates its
| customers.
|
| Now the only thing they need to do is make sure ROCm itself is
| stable.
| largbae wrote:
| It certainly seems ironic that the company that beat Intel at
| its own compatibility game with x86-64 would abandon
| compatibility with today's market leader.
| rob74 wrote:
| The situation is a bit different: AMD got its foot in the
| door with the x86 market because IBM back in the early 1980s
| forced Intel to license the technology so AMD could act as a
| second source of CPUs. In the GPU market, ATI (later bought
| by AMD) and nVidia emerged as the market leaders after the
| other 3D graphics pioneers (3Dfx) gave up - but their GPUs
| were never compatible in the first place, and if AMD tried to
| _make_ them compatible, nVidia could sue the hell out of
| them...
| alberth wrote:
| DirectX vs OpenGL.
|
| This brings back memories of late 90s / early 00s of Microsoft
| pushing hard their proprietary graphic libraries (DirectX) vs
| open standards (OpenGL).
|
| Fast forward 25-years and even today, Microsoft still dominates
| in PC gaming as a result.
|
| There's a bad track record of open standards for GPUs.
|
| Even Apple themselves gave up on OpenGL and has their own
| proprietary offering (Metal).
| incrudible wrote:
| To add to that, Linux gaming today is dominated by a wrapper
| implementing DirectX.
| Zardoz84 wrote:
| Vulkan running an emulation of DirectX and being faster
| Keyframe wrote:
| Let's not forget the Fahrenheit maneuver by Microsoft that
| left SGI stranded and OpenGL going nowhere.
| pjmlp wrote:
| Yeah, it never mattered to game consoles either way.
| okanat wrote:
| OpenGL was invented at SGI and it was closed source until it
| was given away. It is very popular in its niche i.e. CAD
| design because the original closed source SGI APIs were very
| successful.
|
| DirectX was targeted at gaming and was a much more limited,
| simpler API, which made programming games in it easier. It
| couldn't do everything that OpenGL can, which is why CAD
| programs didn't use it even on Windows. DirectX worked
| because it chose its market correctly and delivered what the
| customers wanted. Windows' exceptional backwards compatibility
| helped greatly as well. Many simple game engines still use the
| DX9 API to this day.
|
| It is not so much about having an open standard, but being
| able to provide extra functionality and performance. Unlike
| the CPU-dominated areas where executing the common baseline
| ISA is very competitive, in accelerated computing using every
| single bit of performance and having new and niche features
| matter. So providing exceptional hardware with good software
| is critical for the competition. Closed APIs have much
| quicker delivery times and they don't have to deal with
| multiple vendors.
|
| Nobody except Nvidia delivers good enough low level software
| and their hardware is exceptionally good. AMD's combination
| is neither. The hardware is slower and it is hard to program
| so they continuously lose the race.
| pjmlp wrote:
| Also to note, despite urban myths, OpenGL never mattered on
| game consoles, which people keep forgetting about when
| praising OpenGL "portability".
|
| Then there is the whole issue of extension spaghetti, and
| incompatibilities across OpenGL, OpenGL ES and WebGL; it's
| hardly possible to have 1:1 portable code everywhere, beyond
| toy examples.
| beebeepka wrote:
| I guess every recent not-xbox never mattered.
| pjmlp wrote:
| Like Nintendo, SEGA and Sony ones?
| owlbite wrote:
| Code portability isn't performance portability, a fact that was
| driven home back in the bad old OpenCL era. Code is going to
| have to be rewritten to be efficient on AMD architectures.
|
| At which point, why tie yourself to the competitor's language?
| It's probably much more effective to just write a well-optimized
| library that serves MLIR (or whatever API is popular) in order
| to run big ML jobs.
| modeless wrote:
| I've been critical of AMD's failure to compete in AI for over a
| decade now, but I can see why AMD wouldn't want to go the route
| of cloning CUDA and I'm surprised they even tried. They would
| be on a never ending treadmill of feature catchup and bug-for-
| bug compatibility, and wouldn't have the freedom to change the
| API to suit their hardware.
|
| The right path for AMD has always been to make their own API
| that runs on _all_ of their own hardware, just as CUDA does for
| Nvidia, and push support for that API into all the open source
| ML projects (but mostly PyTorch), while attacking Nvidia 's
| price discrimination by providing features they use to segment
| the market (e.g. virtualization, high VRAM) at lower price
| points.
|
| Perhaps one day AMD will realize this. It seems like they're
| slowly moving in the right direction now, and all it took for
| them to wake up was Nvidia's market cap skyrocketing to 4th in
| the world on the back of their AI efforts...
| matchagaucho wrote:
| But AMD was formed to shadow Intel's x86?
| modeless wrote:
| ISAs are smaller and less stateful and better documented
| and less buggy and most importantly they evolve much more
| slowly than software APIs. Much more feasible to clone.
| Especially back when AMD started.
| paulmd wrote:
| PTX is just an ISA too. Programming languages and ISA
| representations are effectively fungible; that's the
| lesson of Microsoft's CLR/Intermediate Language and Java
| too. A "machine" is a piece of hardware plus a language.
| modeless wrote:
| PTX is not a hardware ISA though, it's still software and
| can change more rapidly.
| paulmd wrote:
| Not without breaking the support contract? If you change
| the PTX format then CUDA 1.0 machines can no longer run it
| and it's no longer PTX.
|
| Again, you are missing the point. Java is both a language
| (Java source) and a machine (the JVM). The latter is also a
| hardware ISA - there are processors that implement Java
| bytecode as their ISA format. Yet most people who are
| running Java are not doing so on Java-machine hardware,
| even though they _are_ using the Java ISA in the process.
|
| https://en.wikipedia.org/wiki/Java_processor
|
| https://en.wikipedia.org/wiki/Bytecode#Execution
|
| any bytecode is an ISA, the bytecode spec defines the
| machine and you can physically build such a machine that
| executes bytecode directly. Or you can translate via an
| intermediate layer, like how Transmeta Crusoe processors
| executed x86 as bytecode on a VLIW processor (and how
| most modern x86 processors actually use RISC micro-ops
| inside).
|
| these are completely fungible concepts. They are not
| quite _the same thing_ but bytecode is clearly an ISA in
| itself. Any given processor can _choose_ to use a
| particular bytecode as either an ISA or translate it to
| its native representation, and this includes both PTX,
| Java, and x86 (among all other bytecodes). And you can do
| the same for any other ISA (x86 as bytecode
| representation, etc).
|
| furthermore, what most people think of as "ISAs" aren't
| necessarily so. For example RDNA2 is an ISA _family_ -
| different processors have different capabilities (for
| example 5500XT has mesh shader support while 5700XT does
| not) and the APUs use a still different ISA internally
| etc. GFX1101 is not the same ISA as GFX1103 and so on.
| These are properly _implementations_ not ISAs, or if you
| consider it to be an ISA then there is also a meta-ISA
| encompassing larger groups (which also applies to x86's
| numerous variations). But people casually throw it all
| into the "ISA" bucket and it leads to this imprecision.
|
| like many things in computing, it's all a matter of
| perspective/position. where is the boundary between "CMT
| core within a 2-thread module that shares a front-end"
| and "SMT thread within a core with an ALU pinned to one
| particular thread"? It's a matter of perspective. Where
| is the boundary of "software" vs "hardware" when
| virtually every "software" implementation uses fixed-
| function accelerator units and every fixed-function
| accelerator unit is running a control program that
| defines a flow of execution and has
| schedulers/scoreboards multiplexing the execution unit
| across arbitrary data flows? It's a matter of
| perspective.
| modeless wrote:
| You are missing the point. PTX is not designed as a
| vendor neutral abstraction like JVM/CLR bytecode.
| Furthermore CUDA is a lot more than PTX. There's a whole
| API there, plus applications ship machine code and rely
| on Nvidia libraries which can be prohibited from running
| on AMD by license and with DRM, so those large libraries
| would also become part of the API boundary that AMD would
| have to reimplement and support.
|
| Chasing CUDA compatibility is a fool's errand when the
| most important users of CUDA are open source. Just add
| explicit AMD support upstream and skip the never ending
| compatibility treadmill, and get better performance too.
| And once support is established and well used the
| community will pitch in to maintain it.
| atq2119 wrote:
| AMD was founded at almost the same time as Intel. X86
| didn't exist at the time.
|
| But yes, AMD was playing the "follow x86" game for a long
| time until they came up with x86-64, which evened the
| playing field in terms of architecture.
| chem83 wrote:
| To be fair to AMD, they've been trying to solve ML workload
| portability at more fundamental levels with the acquisition of
| Nod.ai and de-facto incorporation of Google's IREE compiler
| project + MLIR.
| whywhywhywhy wrote:
| > Why would this not be AMD's top priority among priorities?
|
| Same reason it wasn't when it was obvious Nvidia was taking
| over this space maybe 8 years ago now, when they let OpenCL
| die and then proceeded to do nothing until it was too late.
|
| Speaking to anyone working in general purpose GPU coding back
| then they all just said the same thing, OpenCL was a nightmare
| to work with and CUDA was easy and mature compared to it.
| The writing was on the wall about where things were heading
| the second you saw a photon-based renderer running on GPU vs
| CPU all the way back then. AMD has only themselves to blame,
| because Nvidia basically showed them the potential with CUDA.
| btown wrote:
| One would hope that they've learned since then - but it could
| very well be that they haven't!
| phero_cnstrcts wrote:
| Because the two CEOs are family? Like literally.
| CamperBob2 wrote:
| That didn't stop World War I...
| mdre wrote:
| Fun fact: ZLUDA means something like illusion/delusion/figment.
| Well played! (I see the main dev is from Poland.)
| Detrytus wrote:
| You should also mention that CUDA in Polish means "miracles"
| (plural).
| miduil wrote:
| Wow, this is great news. I really hope that the community will
| find ways to sustainably fund this project. Suddenly being able
| to run a lot of innovative CUDA-based projects on AMD GPUs is a
| big game-changer, especially because you don't have to deal
| with the poor state of Nvidia support on Linux.
| sam_goody wrote:
| Aside from the latest commit, there has been no activity for
| almost 3 years (latest code change on Feb 22, 2021).
|
| People are criticizing AMD for dropping this, but it makes sense
| to stop paying for development when the dev has stopped doing the
| work, no?
|
| And if he means that AMD stopped paying 3 years ago - well, that
| was before dinosaurs and ChatGPT, and a lot has changed since
| then.
|
| https://github.com/vosen/ZLUDA/commits/v3
| EspadaV9 wrote:
| Pretty sure this was developed in private, but because AMD
| cancelled the contract he has been allowed to open source the
| code, and this is the "throw it over the fence" code dump.
| rrrix1 wrote:
| This. 762 changed files with 252,017 additions and 39,027
| deletions.
|
| https://github.com/vosen/ZLUDA/commit/1b9ba2b2333746c5e2b05a.
| ..
| Ambroisie wrote:
| My thinking is that the dev _did_ work on it for X amount of
| time, but as part of their contract is not allowed to share the
| _actual_ history of the repo, thus the massive code dumped in
| their "Nobody expects the Red Team" commit?
| rswail wrote:
| Have a look at the latest commit and the level of change.
|
| Effectively the internal commits while he was working for AMD
| aren't in the repo, but the squashed commit contains all of the
| changes.
| michaellarabel wrote:
| As I wrote in the article, it was privately developed the past
| 2+ years while being contracted by AMD during that time... In a
| private GitHub repo. Now that he's able to make it public /
| open-source, he squashed all the changes into a clean new
| commit to make it public. The ZLUDA code from 3+ years ago was
| when he was experimenting with CUDA on Intel GPUs.
| SushiHippie wrote:
| The code prior to this was all for the Intel GPU ZLUDA, and
| then the latest commit is all the AMD ZLUDA code, hence why the
| commit talks about the red team.
| Zopieux wrote:
| If only this exact concern was addressed explicitly in the
| first FAQ at the bottom of the README...
|
| https://github.com/vosen/ZLUDA/tree/v3?tab=readme-ov-file#fa...
| Detrytus wrote:
| This is really interesting (from the project's README):
|
| > AMD decided that there is no business case for running CUDA
| applications on AMD GPUs.
|
| Is AMD leadership brain-damaged, or something?
| AndrewKemendo wrote:
| ROCm is not spelled out anywhere in their documentation and the
| best answers in search come from Github and not AMD official
| documents
|
| "Radeon Open Compute Platform"
|
| https://github.com/ROCm/ROCm/issues/1628
|
| And they wonder why they are losing. Branding absolutely matters.
| rtavares wrote:
| Later in the same thread:
|
| > ROCm is a brand name for ROCm(tm) open software platform (for
| software) or the ROCm(tm) open platform ecosystem (includes
| hardware like FPGAs or other CPU architectures).
|
| > Note, ROCm no longer functions as an acronym.
| ametrau wrote:
| >> Note, ROCm no longer functions as an acronym.
|
| That is really dumb. Like LLVM.
| marcus0x62 wrote:
| That, and it only runs on a handful of their GPUs.
| NekkoDroid wrote:
| If you are talking about the "supported" list of GPUs, those
| listed are only the ones they fully validate and QA test;
| others of the same gen are likely to work, but most likely with
| some bumps along the way. In one of the somewhat older Phoronix
| posts about ROCm, one of their engineers did say they are
| trying to expand the list of validated & QA'd cards, as well
| as distinguishing between "validated", "supported" and "non-
| functional".
| machomaster wrote:
| They can say whatever but the action is what matters, not
| wishes and promises. And the reality is that the list of
| supported GPUs has been unchanged since they first
| announced it a year ago.
| alwayslikethis wrote:
| I mean, I also had to look up what CUDA stands for.
| hasmanean wrote:
| Compute unified device architecture ?
| phh wrote:
| I have no idea what CUDA stands for, and I live just fine
| without knowing it.
| moffkalast wrote:
| Cleverly Undermining Disorganized AMD
| rvnx wrote:
| Countless Updates Developer Agony
| egorfine wrote:
| This is the right definition.
| hyperbovine wrote:
| Lost five hours of my life yesterday discovering the fact
| that "CUDA 12.3" != "CUDA 12.3 Update 2".
|
| (Yes, that's obvious, but not so obvious when your GPU
| applications submitted to a cluster start crashing randomly
| for no apparent reason.)
| smokel wrote:
| Compute Unified Device Architecture [1]
|
| [1] https://en.wikipedia.org/wiki/CUDA
| alfalfasprout wrote:
| Crap, updates destroyed (my) application
| sorenjan wrote:
| Funnily enough it doesn't work on their RDNA ("Radeon DNA")
| hardware (with some exceptions I think), but it's aimed at
| their CDNA (Compute DNA). If they would come up with a new name
| today it probably wouldn't include Radeon.
|
| AMD seems to be a firm believer in separating the consumer
| chips for gaming and the compute chips for everything else.
| This probably makes a lot of sense from a chip design and
| current business perspective, but I think it's shortsighted and
| a bad idea. GPUs are very competent compute devices, and
| basically wasting all that performance for "only" gaming is
| strange to me. AI and other compute is getting more and more
| important for things like image and video processing, language
| models, etc. Not only for regular consumers, but for
| enthusiasts and developers it makes a lot of sense to be able
| to use your 10 TFLOPS chip even when you're not gaming.
|
| While reading through the AMD CDNA whitepaper I saw this and
| got a good chuckle. "culmination of years of effort by AMD"
| indeed.
|
| > The computational resources offered by the AMD CDNA family
| are nothing short of astounding. However, the key to
| heterogeneous computing is a software stack and ecosystem that
| easily puts these abilities into the hands of software
| developers and customers. The AMD ROCm 4.0 software stack is
| the culmination of years of effort by AMD to provide an open,
| standards-based, low-friction ecosystem that enables
| productivity creating portable and efficient high-performance
| applications for both first- and third-party developers.
|
| https://www.amd.com/content/dam/amd/en/documents/instinct-bu...
| slavik81 wrote:
| ROCm works fine on the RDNA cards. On Ubuntu 23.10 and Debian
| Sid, the system packages for the ROCm math libraries have
| been built to run on every discrete Vega, RDNA 1, RDNA 2,
| CDNA 1, and CDNA 2 GPU. I've manually tested dozens of cards
| and every single one worked. There were just a handful of
| bugs in a couple of the libraries that could easily be fixed
| by a motivated individual. https://slerp.xyz/rocm/logs/full/
|
| The system package for HIP on Debian has been stuck on ROCm
| 5.2 / clang-15 for a while, but once I get it updated to ROCm
| 5.7 / clang-17, I expect that all discrete RDNA 3 GPUs will
| work.
| stonogo wrote:
| It doesn't matter to my lab whether it technically runs.
| According to https://rocm.docs.amd.com/projects/install-on-
| linux/en/lates... it only supports three commercially-
| available Radeon cards (and four available Radeon Pro) on
| Linux. Contrast this to CUDA, which supports literally
| every nVIDIA card in the building, including the crappy NVS
| series and weirdo laptop GPUs, and it basically becomes
| impossible to convince anyone to develop for ROCm.
| atq2119 wrote:
| My understanding is that there was some trademark silliness
| around "open compute", and AMD decided that instead of doing a
| full rebrand, they would stick to ROCm but pretend that it
| wasn't ever an acronym.
| michaellarabel wrote:
| Yeah it was due to the Open Compute Project AFAIK... Though
| for a little while AMD was telling me they really meant to
| call it "Radeon Open eCosystem" before then dropping that too
| with many still using the original name.
| slavik81 wrote:
| That is intentional. We had to change the name. ROCm is no
| longer an acronym.
| AndrewKemendo wrote:
| I assume you're on the team if you're saying "we"
|
| Can you say why you had to change the name?
| pjmlp wrote:
| So polyglot programming workflows via PTX targeting are equally
| supported?
| michalf6 wrote:
| Zluda roughly means "delusion" / "mirage" / "illusion" in Polish,
| given the author is called Andrzej Janik this may be a pun :)
| rvba wrote:
| Arguably one could also translate it as "something that will
| never happen".
|
| At the same time "cuda" could be translated as "wonders".
| eqvinox wrote:
| Keeping my hopes curtailed until I see proper benchmarks...
| hd4 wrote:
| The interest in this thread tells me there are a lot of people
| who are not cool with the CUDA monopoly.
| smoldesu wrote:
| Those people should have spoken up when their hardware
| manufacturers abandoned OpenCL. The industry set itself 5-10
| years behind by ignoring open GPGPU compute drivers while
| Nvidia slowly built their empire. Just look at how long it's
| taken to re-implement a _fraction_ of the CUDA feature set on a
| small handful of hardware.
|
| CUDA shouldn't exist. We should have hardware manufacturers
| _working together_ , using common APIs and standardizing
| instead of going for the throat. The further platforms drift
| apart, the more valuable Nvidia's vertical integration becomes.
| mnau wrote:
| Common API means being replaceable, fungible. There are no
| margins in that.
| smoldesu wrote:
| Correct. It's why the concept of 'proprietary UNIX' didn't
| survive long once program portability became an incentive.
| Avamander wrote:
| Is my impression wrong, that people understood the need for
| OCL only after CUDA had already cornered and strangled the
| market?
| smoldesu wrote:
| You're mostly right. CUDA was a "sleeper product" that
| existed early-on but didn't see serious demand until later.
| OpenCL was Khronos Group's hedged bet against the success
| of CUDA; it was assumed that they would invest in it more
| as demand for GPGPU increased. After 10 years though,
| OpenCL wasn't really positioned to compete and CUDA was
| more fully-featured than ever. Adding insult to injury, OS
| manufacturers like Microsoft and Apple started to avoid
| standardized GPU libraries in favor of more insular native
| APIs. By the time demand for CUDA materialized, OpenCL had
| already been left for dead by most of the involved parties.
| cashsterling wrote:
| I feel like AMD's senior executives all own a lot of nVIDIA
| stock.
| lambdaone wrote:
| It seems to me that AMD are crazy to stop funding this. CUDA-on-
| ROCm breaks NVIDIA's moat, and would also act as a disincentive
| for NVIDIA to make breaking changes to CUDA; what more could AMD
| want?
|
| When you're #1, you can go all-in on your own proprietary stack,
| knowing that network effects will drive your market share higher
| and higher for you for free.
|
| When you're #2, you need to follow de-facto standards and work on
| creating and following truly open ones, and try to compete on
| actual value, rather than rent-seeking. AMD of all companies
| should know this.
| RamRodification wrote:
| > and would also act as a disincentive for NVIDIA to make
| breaking changes to CUDA
|
| I don't know about that. You could kinda argue the opposite.
| "We improved CUDA. Oh it stopped working for you on AMD
| hardware? Too bad. Buy Nvidia next time"
| mnau wrote:
| Also known as OS/2: Redux strategy.
| freeone3000 wrote:
| Most CUDA applications do not target the newest CUDA version!
| Despite 12.1 being out, lots of code still targets 7 or 8 to
| support old NVIDIA cards. Similar support for AMD isn't
| unthinkable (but a rewrite to rocm would be).
| outside415 wrote:
| NVIDIA is about ecosystem plays, they have no interest in
| sabotage or anti competition plays. Leave that to apple and
| google and their dumb app stores and mobile OSs.
| 0x457 wrote:
| > NVIDIA is about ecosystem plays, they have no interest in
| sabotage or anti competition plays.
|
| Are we talking about the same NVIDIA? Nvidia's entire GPU
| strategy is: make a feature (or find an existing one) that
| performs better on their cards, then pay developers to use
| (and sometimes misuse) it extensively.
| saboot wrote:
| Yep, I develop several applications that use CUDA. I see
| AMD/Radeon powered computers for sale and want to buy one, but
| I am not going to risk not being able to run those applications
| or having to rewrite them.
|
| If they want me as a customer, and they have not created a
| viable alternative to CUDA, they need to pursue this.
| weebull wrote:
| Define "viable"?
| tester756 wrote:
| If you see:
|
| 1) billions of dollars at stake
|
| 2) one of the most successful leadership teams
|
| 3) during the hottest period of their business, where they've
| heard about Nvidia's moat probably thousands of times during
| the last 18 months...
|
| and you call some decision "crazy", then you probably do not
| have the same information that they do
|
| or they underperformed, who knows, but I bet on reason #1.
| 2OEH8eoCRo0 wrote:
| Question: Why aren't we using LLMs to translate programs to use
| ROCm?
|
| Isn't translation one of the strengths of LLMs?
| JonChesterfield wrote:
| You can translate CUDA to HIP using a regex. An LLM is rather
| overkill.
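|
| To illustrate - a toy sketch, not AMD's actual hipify-perl /
| hipify-clang tooling (which also handles headers, libraries and
| special cases), but the runtime API really is mostly a prefix
| rename:
|
|     import re, sys
|
|     # Crude core of a CUDA -> HIP source translation:
|     # cudaMalloc -> hipMalloc, cudaStream_t -> hipStream_t, etc.
|     RENAMES = [
|         (r'\bcuda([A-Z_]\w*)', r'hip\1'),
|         (r'#include\s*<cuda_runtime\.h>',
|          '#include <hip/hip_runtime.h>'),
|     ]
|
|     def hipify(source: str) -> str:
|         for pattern, replacement in RENAMES:
|             source = re.sub(pattern, replacement, source)
|         return source
|
|     if __name__ == "__main__":
|         print(hipify(sys.stdin.read()))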
| mogoh wrote:
| Why is CUDA so prevalent as opposed to its alternatives?
| smoldesu wrote:
| At first, it was because Nvidia had a wide variety of highly
| used cards that almost all support some form of CUDA. By-and-
| large, your gaming GPU could debug and run the same code that
| you'd scale up to a datacenter, which was a huge boon for
| researchers and niche industry applications.
|
| With that momentum, CUDA got incorporated into a _lot_ of high-
| performance computing applications. Few alternatives show up
| because there aren't many acceleration frameworks that are as
| large or complete as CUDA. Nvidia pushed forward by scaling
| down to robotics and edge-compute scale hardware, and now are
| scaling up with their DGX/Grace platforms.
|
| Today, Nvidia is prevalent because all attempts to subvert them
| have failed. Khronos Group tried to get the industry to rally
| around OpenCL as a widely-supported alternative, but too many
| stakeholders abandoned it before the initial crypto/AI booms
| kicked off the demand for GPGPU compute.
| JonChesterfield wrote:
| OpenCL was the alternative; it came along later and couldn't
| express a lot of programs that CUDA can. CUDA is legitimately
| better than OpenCL.
| enonimal wrote:
| From the ARCHITECTURE.md:
|
| > Those pointers point to undocumented functions forming CUDA
| Dark API. It's impossible to tell how many of them exist, but
| debugging experience suggests there are tens of function pointers
| across tens of tables. A typical application will use one or two
| most common. Due to their undocumented nature, they are
| exclusively used by the Runtime API and NVIDIA libraries (and by
| CUDA applications in turn). We don't have the names of those
| functions nor the names or types of the arguments. This makes
| implementing them time-consuming. Dark API functions are
| reverse-engineered and implemented by ZLUDA on a case-by-case
| basis once we observe an application making use of them.
| leeoniya wrote:
| fertile soil for Alyssa and Asahi Lina :)
|
| https://rosenzweig.io/
|
| https://vt.social/@lina
| smcl wrote:
| I know that Lina doesn't like a lot of the attention HN sends
| her way so it may be better if you don't link her socials
| here.
| bigdict wrote:
| Sounds ridiculous, why have a public presence on a social
| network then?
| cyanydeez wrote:
| oh. I think the emphasis is on hacker news.
|
| you know certain social media sites contain certain toxic
| conversants.
| bigdict wrote:
| > you know certain social media sites contain certain
| toxic conversants.
|
| That's just people...
| PoignardAzur wrote:
| Having an ARCHITECTURE.md file at all is extremely promising,
| but theirs seems pretty polished too!
| gdiamos wrote:
| These were a huge pain in the ass when I tried this 20 years
| ago on Ocelot.
|
| Eventually one of the NVIDIA engineers just asked me to join
| and I did. :-P
| Cu3PO42 wrote:
| I'm really rooting for AMD to break the CUDA monopoly. To this
| end, I genuinely don't know whether a translation layer is a good
| thing or not. On the upside it makes the hardware much more
| viable instantly and will boost adoption, on the downside you run
| the risk that devs will never support ROCm, because you can just
| use the translation layer.
|
| I think this is essentially the same situation as Proton+DXVK for
| Linux gaming. I think that that is a net positive for Linux, but
| I'm less sure about this. Getting good performance out of GPU
| compute requires much more tuning to the concrete architecture,
| which I'm afraid devs just won't do for AMD GPUs through this
| layer, always leaving them behind their Nvidia counterparts.
|
| However, AMD desperately needs to do something. Story time:
|
| On the weekend I wanted to play around with Stable Diffusion. Why
| pay for cloud compute, when I have a powerful GPU at home, I
| thought. Said GPU is a 7900 XTX, i.e. the most powerful consumer
| card from AMD at this time. Only very few AMD GPUs are supported
| by ROCm at this time, but mine is, thankfully.
|
| So, how hard could it possibly be to get Stable Diffusion running
| on
| my GPU? Hard. I don't think my problems were actually caused by
| AMD: I had ROCm installed and my card recognized by rocminfo in a
| matter of minutes. But the whole ML world is so focused on Nvidia
| that it took me ages to get a working installation of pytorch and
| friends. The InvokeAI installer, for example, asks if you want to
| use CUDA or ROCm, but then always installs the CUDA variant
| whatever you answer. Ultimately, I did get a model to load, but
| the software crashed my graphical session before generating a
| single image.
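|
| For anyone in the same spot, the check that finally tells you
| whether you got the ROCm build of PyTorch at all (a minimal
| sketch; torch.version.hip is only set on ROCm builds, which
| expose AMD GPUs through the regular torch.cuda API):
|
|     import torch
|
|     print(torch.__version__)   # ROCm wheels look like '2.x.y+rocmN.N'
|     print(torch.version.hip)   # set on ROCm builds, None on CUDA builds
|     print(torch.version.cuda)  # the reverse: None on ROCm builds
|     print(torch.cuda.is_available())
|     if torch.cuda.is_available():
|         # should name the AMD card, e.g. the 7900 XTX
|         print(torch.cuda.get_device_name(0))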
|
| The whole experience left me frustrated and wanting to buy an
| Nvidia GPU again...
| Certhas wrote:
| They are focusing on HPC first. Which seems reasonable if your
| software stack is lacking. Look for sophisticated customers
| that can help build an ecosystem.
|
| As I mentioned elsewhere, 25% of GPU compute on the Top 500
| Supercomputer list is AMD. This all on the back of a card that
| came out only three years ago. We are very rapidly moving
| towards a situation where there are many, many high-performance
| developers that will target ROCm.
| ametrau wrote:
| Is a top 500 super computer list a good way of measuring
| relevancy in the future?
| latchkey wrote:
| No, it isn't. What is a better measure is to look at
| businesses like what I'm building (and others), where we
| take on the capex/opex risk around top end AMD products and
| bring them to the masses through bare metal rentals.
| Previously, these sorts of cards were only available to the
| Top 500.
| whywhywhywhy wrote:
| > I'm really rooting for AMD to break the CUDA monopoly
|
| Personally I want Nvidia to break the x86-64 monopoly, with how
| amazing properly spec'd Nvidia cards are to work with I can
| only dream of a world where Nvidia is my CPU too.
| kuschkufan wrote:
| apt username
| smcleod wrote:
| That's already been done with ARM.
| weebull wrote:
| > Personally I want Nvidia to break the x86-64 monopoly
|
| The one supplied by two companies?
| Keyframe wrote:
| Maybe he meant homogeneity which Nvidia did try and tries
| with Arm.. but, on the other hand how wild would it be for
| Nvidia to enter x86-64 as well? It's probably never going
| to happen due to licensing if nothing else, lest we
| forget the nForce chipset ordeal with Intel legal.
| bntyhntr wrote:
| I would love to be able to have a native stable diffusion
| experience, my rx 580 takes 30s to generate a single image. But
| it does work after following
| https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki...
|
| I got this up and running on my windows machine in short order
| and I don't even know what stable diffusion is.
|
| But again, it would be nice to have first class support to
| locally participate in the fun.
| Cu3PO42 wrote:
| I have heard that DirectML was a somewhat easier story, but
| allegedly has worse performance (and obviously it's Windows
| only...). But I'm not entirely surprised that setup is
| somewhat easier on Windows, where bundling everything is an
| accepted approach.
|
| With AMD's official 15GB(!) Docker image, I was now able to
| get the A1111 UI running. With SD 1.5 and 30 sample
| iterations, generating an image takes under 2s. I'm still
| struggling to get InvokeAI running.
| westurner wrote:
| > _Proton+DXVK for Linux gaming_
|
| "Building the DirectX shader compiler better than Microsoft?"
| (2024) https://news.ycombinator.com/item?id=39324800
|
| E.g. llama.cpp already supports hipBLAS; is there an advantage
| to this ROCm CUDA-compatibility layer - ZLUDA on Radeon (and
| not yet Intel OneAPI) - instead or in addition?
| https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#hi...
| https://news.ycombinator.com/item?id=38588573
|
| What can't WebGPU abstract away from CUDA unportability?
| https://news.ycombinator.com/item?id=38527552
| nocombination wrote:
| As other folks have commented, CUDA not being an open standard
| is a large part of the problem. That and the developers who
| target CUDA directly when writing Stable Diffusion algorithms--
| they are forcing the monopoly. Even at the cost of not being
| able to squeeze every ounce out of the GPU, portability greatly
| improves software access when people target Vulkan et al.
| formerly_proven wrote:
| > I'm really rooting for AMD to break the CUDA monopoly. To
| this end, I genuinely don't know whether a translation layer is
| a good thing or not. On the upside it makes the hardware much
| more viable instantly and will boost adoption, on the downside
| you run the risk that devs will never support ROCm, because you
| can just use the translation layer.
|
| On the other hand:
|
| > The next major ROCm release (ROCm 6.0) will not be backward
| compatible with the ROCm 5 series. [source]
|
| Even worse, not even the driver is backwards-compatible:
|
| > There are some known limitations though like currently only
| targeting the ROCm 5.x API and not the newly-released ROCm 6.x
| releases.. In turn having to stick to ROCm 5.7 series as the
| latest means that using the ROCm DKMS modules don't build
| against the Linux 6.5 kernel now shipped by Ubuntu 22.04 LTS
| HWE stacks, for example. Hopefully there will be enough
| community support to see ZLUDA ported to ROCM 6 so at least it
| can be maintained with current software releases.
| nialv7 wrote:
| I am surprised that everybody seem to have forgotten the
| (in)famous Embrace, Extend and Extinguish strategy.
|
| It's time for Open Source to be on the extinguishing side for
| once.
| sophrocyne wrote:
| Hey there -
|
| I'm a maintainer (and CEO) of Invoke.
|
| It's something we're monitoring as well.
|
| ROCm has been challenging to work with - we're actively talking
| to AMD to keep apprised of ways we can mitigate some of the
| more troublesome experiences that users have with getting
| Invoke running on AMD (and hoping to expand official support to
| Windows AMD)
|
| The problem is that a lot of the solutions proposed involve
| significant/unsustainable dev effort (i.e., supporting an
| entirely different inference paradigm), rather than "drop in"
| for the existing Torch/diffusers pipelines.
|
| While I don't know enough about your setup to offer immediate
| solutions, if you join the Discord, I'm sure folks would be
| happy to try walking through some manual
| troubleshooting/experimentation to get you up and running -
| discord.gg/invoke-ai
| latchkey wrote:
| Invoke is awesome. Let me know if you guys want some MI300x
| to develop/test on. =) We've also got some good contacts at
| AMD if you need help there as well.
| CapsAdmin wrote:
| Hope this can benefit from the seemingly infinite enthusiasm from
| rust programmers
| sharts wrote:
| AMD fails to realize the software toolchain is what makes
| Nvidia great. AMD thinks the hardware is all that's needed.
| JonChesterfield wrote:
| Nvidia's toolchain is really not great. Applications are just
| written to step around the bugs.
|
| ROCm has different bugs, which the application workarounds tend
| to miss.
| bornfreddy wrote:
| Yes. This is what makes Nvidia's toolchain, if not great, at
| least ok. As a developer I can actually use their GPUs. And
| what I developed locally I can then run on Nvidia hardware in
| the cloud and pay by usage.
|
| AMD doesn't seem to understand that affordable entry-level
| hardware with good software support is key.
| JonChesterfield wrote:
| Ah yes, so that one does seem to be a stumbling block. ROCm
| is not remotely convinced that running on gaming cards is a
| particularly useful thing. HN is really sure that being
| able to develop code on ~free cards that you've got lying
| around anyway is an important gateway to running on amdgpu.
|
| The sad thing is people can absolutely run ROCm on gaming
| cards if they build from source. Weirdly GPU programmers
| seem determined to use proprietary binaries to run
| "supported" hardware, and thus stick with CUDA.
|
| I don't understand why AMD won't write the names of some
| graphics cards under "supported", even if they didn't test
| them as carefully as the MI series, and I don't understand
| why developers are so opposed to compiling their toolchains
| from source. For one thing it means you can't debug the
| toolchain effectively when it falls over, weird limitation
| to inflict on oneself.
|
| Strange world.
| CapsAdmin wrote:
| One thing I didn't see mentioned anywhere apart from the repos
| readme:
|
| > PyTorch received very little testing. ZLUDA's coverage of cuDNN
| APIs is very minimal (just enough to run ResNet-50) and
| realistically you won't get much running.
| yieldcrv wrote:
| Sam could get more chips for way less than $7 trillion if he
| helps fund and mature this
| JonChesterfield wrote:
| I'm pretty tired of the business model of raising capital from
| VCs to give to Nvidia.
| Keyframe wrote:
| This release is, however, the result of AMD stopping its
| funding. Per the FAQ: "After two years of development and some
| deliberation, AMD decided that there is no business case for
| running CUDA applications on AMD GPUs. One of the terms of my
| contract with AMD was that if AMD did not find it fit for
| further development, I could release it. Which brings us to
| today." - https://github.com/vosen/ZLUDA?tab=readme-ov-file#faq
|
| So, the same mistake Intel made before.
| VoxPelli wrote:
| Sounds like he had a good contract - it would be great to read
| more about that; hopefully more devs can include the same
| phrasing!
| nikanj wrote:
| This should be the top comment here; people are getting their
| hopes up for nothing.
| jacoblambda wrote:
| I mean it could also be that there was no business case for it
| as long as it remained closed source work.
|
| If the now clearly well-functioning implementation continues to
| perform as well as it does, the community may be able to keep
| it funded and functioning.
|
| And the other side of this is that, with renewed AMD
| interest/support for the ROCm/HIP project, it might be just good
| enough as a stopgap to push projects towards ROCm/HIP adoption
| (another blurb from the readme is included below).
|
| > I am a developer writing CUDA code, does this project help me
| port my code to ROCm/HIP?
|
| > Currently no, this project is strictly for end users. However
| this project could be used for a much more gradual porting from
| CUDA to HIP than anything else. You could start with an
| unmodified application running on ZLUDA, then have ZLUDA expose
| the underlying HIP objects (streams, modules, etc.), allowing
| to rewrite GPU kernels one at a time. Or you could have a mixed
| CUDA-HIP application where only the most performance sensitive
| GPU kernels are written in the native AMD language.
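|
| (A hand-written sketch of what that looks like, not from the
| readme; the file name, build line and kernel are illustrative.
| Porting one kernel at a time is mostly mechanical: cuda* host
| calls become hip*, and the kernel body is usually unchanged.)
|
|   // saxpy_hip.cpp - one trivial kernel ported from CUDA to HIP
|   // Build (assuming ROCm): hipcc saxpy_hip.cpp -o saxpy
|   #include <hip/hip_runtime.h>
|   #include <cstdio>
|   #include <vector>
|
|   // Kernel body is identical to the CUDA original.
|   __global__ void saxpy(int n, float a, const float* x,
|                         float* y) {
|       int i = blockIdx.x * blockDim.x + threadIdx.x;
|       if (i < n) y[i] = a * x[i] + y[i];
|   }
|
|   int main() {
|       const int n = 1 << 20;
|       const size_t nb = n * sizeof(float);
|       std::vector<float> hx(n, 1.0f), hy(n, 2.0f);
|
|       float *dx = nullptr, *dy = nullptr;
|       hipMalloc((void**)&dx, nb);        // was cudaMalloc
|       hipMalloc((void**)&dy, nb);
|       hipMemcpy(dx, hx.data(), nb, hipMemcpyHostToDevice);
|       hipMemcpy(dy, hy.data(), nb, hipMemcpyHostToDevice);
|
|       dim3 block(256), grid((n + 255) / 256);
|       saxpy<<<grid, block>>>(n, 2.0f, dx, dy);  // still <<<>>>
|       hipDeviceSynchronize();
|
|       hipMemcpy(hy.data(), dy, nb, hipMemcpyDeviceToHost);
|       printf("y[0] = %f (expected 4.0)\n", hy[0]);
|
|       hipFree(dx);
|       hipFree(dy);
|       return 0;
|   }
|
| (The CUDA original differs only in the includes and the cuda*
| spellings of those host calls, which is the whole point of the
| "one kernel at a time" approach described above.)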
| pk-protect-ai wrote:
| > After two years of development and some deliberation, AMD
| decided that there is no business case for running CUDA
| applications on AMD GPUs
|
| Who was responsible for this project at AMD, and why is he still
| not fired? How brain-dead does someone have to be to turn down
| that much market share?
| tgsovlerkhgsel wrote:
| How is this not priority #1 for them, with NVIDIA stock
| shooting to the moon because everyone does machine learning
| using CUDA-centric tools?
|
| If AMD could get 90% of the CUDA ML stuff to seamlessly run on
| AMD hardware, and could provide hardware at a competitive cost-
| per-performance (which I assume they probably could since
| NVIDIA must have an insane profit margin on their GPUs),
| wouldn't that be _the_ opportunity to eat NVIDIA's lunch?
| make3 wrote:
| It's a common misconception that deep learning stuff is built in
| CUDA. It's actually built on cuDNN kernels, which aren't written
| in CUDA but are largely GPU assembly written by hand by PhDs.
| I'm really not convinced that this project could be used for
| that. The ROCm kernels that are the analogue of cuDNN, though -
| yes.
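|
| (To make that concrete - my own sketch, not the project's code;
| the file name and build line are just examples. In a framework,
| a ReLU or a convolution ends up as a call into cuDNN's prebuilt
| kernels rather than CUDA C you wrote yourself, so a translation
| layer also has to cover the cuDNN API surface, not just CUDA.)
|
|   // relu_cudnn.cu - the GPU work happens inside the library
|   // Build (assuming CUDA + cuDNN): nvcc relu_cudnn.cu -lcudnn
|   #include <cudnn.h>
|   #include <cuda_runtime.h>
|   #include <cstdio>
|
|   int main() {
|       const int n = 1, c = 1, h = 2, w = 2;
|       float hx[4] = {-1.f, 2.f, -3.f, 4.f}, hy[4];
|
|       float *dx, *dy;
|       cudaMalloc((void**)&dx, sizeof(hx));
|       cudaMalloc((void**)&dy, sizeof(hy));
|       cudaMemcpy(dx, hx, sizeof(hx), cudaMemcpyHostToDevice);
|
|       cudnnHandle_t handle;
|       cudnnCreate(&handle);
|
|       cudnnTensorDescriptor_t desc;
|       cudnnCreateTensorDescriptor(&desc);
|       cudnnSetTensor4dDescriptor(desc, CUDNN_TENSOR_NCHW,
|           CUDNN_DATA_FLOAT, n, c, h, w);
|
|       cudnnActivationDescriptor_t act;
|       cudnnCreateActivationDescriptor(&act);
|       cudnnSetActivationDescriptor(act, CUDNN_ACTIVATION_RELU,
|           CUDNN_NOT_PROPAGATE_NAN, 0.0);
|
|       // No kernel of our own here: cuDNN launches its own.
|       const float alpha = 1.f, beta = 0.f;
|       cudnnActivationForward(handle, act, &alpha, desc, dx,
|           &beta, desc, dy);
|
|       cudaMemcpy(hy, dy, sizeof(hy), cudaMemcpyDeviceToHost);
|       printf("%g %g %g %g\n", hy[0], hy[1], hy[2], hy[3]);
|
|       cudnnDestroyActivationDescriptor(act);
|       cudnnDestroyTensorDescriptor(desc);
|       cudnnDestroy(handle);
|       cudaFree(dx);
|       cudaFree(dy);
|       return 0;
|   }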
| pheatherlite wrote:
| The only reason our lab bought $20k worth of Nvidia GPU cards
| rather than AMD was the CUDA industry standard (or what might as
| well be one). It's kind of mind-boggling how much business AMD
| must be losing over this.
| shmerl wrote:
| Anything that breaks CUDA lock-in is great! This reminds me of
| how DX/D3D lock-in was broken by dxvk and vkd3d-proton.
|
| _> It apparently came down to an AMD business decision to
| discontinue the effort_
|
| Bad decision if that's the case. Maybe someone can pick it up,
| since it's open now.
| swozey wrote:
| I may have missed it in the article, but this post would mean
| absolutely nothing to me except that last week I got into Stable
| Diffusion, so I'm crushing my 4090 with PyTorch, DeepSpeed, etc.
| and dealing with a lot of Nvidia CTK/SDK stuff. Well, I'm
| actually trying to do this on Windows with WSL2 and
| DeepSpeed/Torch/etc. in containers, and it's completely broken,
| so not crushing currently.
|
| I guess a while ago it was found that Nvidia was bypassing the
| kernel's GPL license check for its driver, and I read that
| kernel 6.6 was going to lock that driver out if they didn't fix
| it; from what I've read there has been no reply or anything done
| by Nvidia yet. Which I think I probably just can't find.
|
| Am I wrong about that part?
|
| We're on kernel 6.7.4 now and I'm still using the same drivers.
| Did it get pushed back, or did Nvidia fix it?
|
| Also, while trying to find answers myself I came across this
| 21-year-old post, which is pretty funny and very apt for the
| topic:
| https://linux-kernel.vger.kernel.narkive.com/eVHsVP1e/why-is...
|
| I'm seeing conflicting info all over the place so I'm not really
| sure what the status of this GPL nvidia driver block thing is.
| zoobab wrote:
| "For reasons unknown to me, AMD decided this year to discontinue
| funding the effort and not release it as any software product."
|
| Have managers at AMD never heard of AI?
| ultra_nick wrote:
| If anyone wants to work in this area, AMD currently has a lot of
| related job posts open.
| navbaker wrote:
| The other big need is for a straightforward library for dynamic
| allocation/sharing of GPUs. Bitfusion was a huge pain in the ass,
| but at least it was something. Now it's been discontinued, the
| last version doesn't support any recent versions of PyTorch, and
| there are only two(?) possible replacements in varying levels of
| readiness (Juice and RunAI). We're experimenting now with
| replacing our Bitfusion installs with a combination of Jupyter
| Enterprise Gateway and either MIGed GPUs or finding a way to get
| JEG to talk to a RunAI installation to allow quick allocation and
| deallocation of portions of GPUs for our researchers.
| irusensei wrote:
| I'll try it later with projects I had issues getting to work,
| like TortoiseTTS. I'm not expecting speeds comparable to Nvidia,
| but it should definitely be faster than pure CPU.
| Farfignoggen wrote:
| Phoronix article from earlier (1):
|
| "While AMD ships pre-built ROCm/HIP stacks for the major
| enterprise Linux distributions, if you are using not one of them
| or just want to be adventurous and compile your own stack for
| building HIP programs for running on AMD GPUs, one of the AMD
| Linux developers has written a how-to guide." (1)
|
| (1) "Building An AMD HIP Stack From Upstream Open-Source Code",
| written by Michael Larabel in Radeon on 9 February 2024 at 06:45
| AM EST.
| https://www.phoronix.com/news/Building-Upstream-HIP-Stack
| JonChesterfield wrote:
| Hahnle is one of our best, that'll be solid.
| http://nhaehnle.blogspot.com/2024/02/building-hip-
| environmen.... Looks pretty similar to how I build it.
|
| Side point, there's a driver in your linux kernel already
| that'll probably work. The driver that ships with rocm is a
| newer version of the same and might be worth building via dkms.
|
| Very strange that the ROCm GitHub doesn't have build scripts,
| but whatever - I've been trying to get people to publish those
| for almost five years now and it just doesn't seem to be
| feasible.
| rekado wrote:
| You can also install HIP/ROCm via Guix:
|
| https://hpc.guix.info/blog/2024/01/hip-and-rocm-come-to-guix...
|
| > AMD has just contributed 100+ Guix packages adding several
| versions of the whole HIP and ROCm stack
| fancyfredbot wrote:
| Wouldn't it be fun to make this work on Intel graphics as well?
| codedokode wrote:
| As I understand it, Vulkan allows running custom code on the
| GPU, including code to multiply matrices. Can one simply use
| Vulkan and ignore CUDA, PyTorch and ROCm?
___________________________________________________________________
(page generated 2024-02-12 23:00 UTC)