[HN Gopher] AMD funded a drop-in CUDA implementation built on RO...
       ___________________________________________________________________
        
       AMD funded a drop-in CUDA implementation built on ROCm: It's now
       open-source
        
       Author : mfiguiere
       Score  : 817 points
       Date   : 2024-02-12 14:00 UTC (8 hours ago)
        
 (HTM) web link (www.phoronix.com)
 (TXT) w3m dump (www.phoronix.com)
        
       | hd4 wrote:
       | https://github.com/vosen/ZLUDA - source
        
         | MegaDeKay wrote:
         | Latest commit message: "Nobody expects the Red Team"
        
         | pella wrote:
         | https://github.com/vosen/ZLUDA/tree/v3
        
       | fariszr wrote:
       | > after the CUDA back-end was around for years and after dropping
       | OpenCL, Blender did add a Radeon HIP back-end... But the real
       | kicker here is that using ZLUDA + CUDA back-end was slightly
       | faster than the native Radeon HIP backend.
       | 
       | This is absolutely crazy.
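        | 
        | To make "drop-in" concrete: ZLUDA sits behind the ordinary
        | CUDA runtime/driver APIs, so the idea is that everyday CUDA
        | code like the sketch below (a minimal illustration, not
        | taken from ZLUDA or Blender) would run unmodified on a
        | Radeon card, provided ZLUDA covers the calls it makes.
        | 
        |   #include <cstdio>
        |   #include <cuda_runtime.h>
        |   
        |   // Ordinary CUDA: nothing here is ZLUDA-specific. A
        |   // drop-in implementation swaps the library underneath,
        |   // not the code.
        |   __global__ void saxpy(int n, float a,
        |                         const float *x, float *y) {
        |       int i = blockIdx.x * blockDim.x + threadIdx.x;
        |       if (i < n) y[i] = a * x[i] + y[i];
        |   }
        |   
        |   int main() {
        |       const int n = 1 << 20;
        |       const size_t bytes = n * sizeof(float);
        |   
        |       float *hx = new float[n], *hy = new float[n];
        |       for (int i = 0; i < n; ++i) {
        |           hx[i] = 1.0f;
        |           hy[i] = 2.0f;
        |       }
        |   
        |       float *dx, *dy;
        |       cudaMalloc((void **)&dx, bytes);
        |       cudaMalloc((void **)&dy, bytes);
        |       cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
        |       cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);
        |   
        |       // 256 threads per block, enough blocks to cover n.
        |       saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, dx, dy);
        |       cudaMemcpy(hy, dy, bytes, cudaMemcpyDeviceToHost);
        |   
        |       printf("y[0] = %f (expected 4.0)\n", hy[0]);
        |   
        |       cudaFree(dx);
        |       cudaFree(dy);
        |       delete[] hx;
        |       delete[] hy;
        |       return 0;
        |   }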
        
         | toxik wrote:
         | Is AMD just a puppet org to placate antitrust fears? Why are
         | they like this?
        
           | swozey wrote:
            | Is this really a theory? If so, my $8 AMD stock from 2015 (or
            | so) is currently worth $176, so they should make more shell
            | companies; they're doing great.
            | 
            | I guess that might answer my question of "Why would AMD
            | decide that having a CUDA competitor isn't a business case,
            | unless they couldn't do it or the cards underperformed
            | significantly?"
        
             | kllrnohj wrote:
             | For some reason AMD's GPU division continues to be run,
             | well, horribly. The CPU division is crushing it, but the
             | GPU division is comically bad. During the great GPU
             | shortage AMD had multiple opportunities to capture chunks
             | of the market and secure market share, increasing the
             | priority for developers to acknowledge and target AMD's
             | GPUs. What did they do instead? Not a goddamn thing, they
             | followed Nvidia's pricing and managed to sell jack shit
             | (like seriously the RX 580 is still the first AMD card to
             | show up on the steam hardware survey).
             | 
              | They're not making big enough dies at the top end to
              | compete with nvidia for the halo, and they're refusing to
              | undercut at the low end where nvidia's reputation for
              | absurd pricing is at an all-time high. AMD's GPU division
              | is a clown show; it's impressively bad. Even though the
              | hardware itself is _fine_, they just can't stop making
              | terrible product launches, awful pricing strategies, or
              | just brain-dead software choices like shipping a feature
              | that triggered anti-cheat, getting their customers
              | predictably banned and angering game devs in the process.
             | 
             | And relevant to this discussion Nvidia's refusal to add
             | VRAM to their lower end cards is a prime opportunity for
             | AMD to go after the lower-end compute / AI interested crowd
             | who will become the next generation software devs. What are
             | they doing with this? Well, they're not making ROCm
             | available to basically anyone, that's apparently the
             | winning strategy. ROCm 6.0 only supports the 7900 XTX and
             | the... Radeon VII. The weird one-off Vega 20 refresh. Of
             | all the random cards to support, why the hell would you
             | pick that one???
        
               | swozey wrote:
               | > The (AMD) CPU division is crushing it
               | 
                | I worked at a bare-metal CDN with 60 PoPs, and a few
                | years ago we _had_ to switch to AMD because of PCIe
                | bandwidth over to our smartNICs, NVMe-oF, and that
                | sort of thing. We'd long since hit the limits on Intel
                | before the Epyc stuff came out, so we had to keep more
                | servers running than we wanted, because we had to
                | limit how much we did with any one server to avoid
                | hitting the limits and locking everything up.
                | 
                | And we were _excited_, not a single apprehension. Epyc
                | crushed the server market; everyone is using them.
                | Well, it's going ARM now, but Epyc will still be
                | around a while.
        
       | wheybags wrote:
       | Cannot understand why AMD would stop funding this. It seems like
       | this should have a whole team allocated to it.
        
         | otoburb wrote:
         | They would always be at the mercy of NVIDIA's API. Without
         | knowing the inner workings, perhaps a major concern with this
         | approach is the need to implement on NVIDIA's schedule instead
         | of AMD's which is a very reactive stance.
         | 
         | This approach actually would make sense if AMD felt, like most
         | of us perhaps, that the NVIDIA ecosystem is too entrenched, but
         | perhaps they made the decision recently to discontinue funding
         | because they (now?) feel otherwise.
        
           | blagie wrote:
            | They've been at the mercy of Intel's x86 APIs for a long
            | time. It didn't kill them.
           | 
           | What happens here is that the original vendor loses control
           | of the API once there are multiple implementations. That's
           | the best possible outcome for AMD.
           | 
           | In either case, they have a limited window to be adopted, and
           | that's more important. The abstraction layer here helps too.
            | AMD code is !@#$%. If this were adopted, it would make it
            | easier to fix things underneath. All that is a lot more
            | important
           | than a dream of disrupting CUDA.
        
             | rubatuga wrote:
              | x86 is not the same; the courts forced the release of the
              | x86 architecture to AMD during an antitrust lawsuit.
        
               | anon291 wrote:
               | You don't think the courts would force the opening of
               | CUDA? Didn't a court already rule that API cannot be
               | patented. I believe it was a Google case. As long as no
               | implementation was stolen, the API itself is not able to
               | be copyrighted.
               | 
               | Here it is: https://arstechnica.com/tech-
               | policy/2021/04/how-the-supreme-...
        
               | Symmetry wrote:
               | Regardless of the legal status of APIs, this Phoronix
               | article is about AMD providing a replacement ABI and I
               | wouldn't assume the legal issues are necessarily the
               | same. But because this is a case where AMD is following a
               | software target there's the possibility, if AMD starts to
               | succeed, that NVidia might change their ABI in ways that
                | deliberately hurt AMD's compatibility efforts, in ways
                | that would be much more difficult for APIs or hardware.
               | That's, presumably, why AMD is going forward with their
               | API emulation effort instead.
        
               | anon291 wrote:
               | If you read the article, it's about Google's re-
               | implementation of the Java API and runtime. Thus, yes,
               | Google was providing both API and ABI compatibility.
        
               | Symmetry wrote:
                | I read the article when it came out and re-skimmed it
               | just now. My understanding at the time and still was that
               | the legal case revolved around the API and the exhibits
               | entered into evidence I saw were all Java function names
               | with their arguments and things of that sort. And I'm
               | given to understand that the Dalvik Java implementation
                | Google was using with Android was register-based rather
                | than the stack-based standard Java, which sounds to
                | me like it would make actual binary compatibility
               | impossible.
        
               | jcranmer wrote:
               | > Didn't a court already rule that API cannot be
               | patented. I believe it was a Google case. As long as no
               | implementation was stolen, the API itself is not able to
               | be copyrighted.
               | 
               | That is... not accurate in the slightest.
               | 
               | Oracle v Google was not about patentability. Software
               | patentability is its own separate minefield, since anyone
               | who looks at the general tenor of SCOTUS cases on the
               | issue should be able to figure out that SCOTUS is at best
               | highly skeptical of software patents, even if it hasn't
               | made any direct ruling on the topic. (Mostly this is a
               | matter of them being able to tell what they don't like
               | but not what they do like, I think). But I've had a
               | patent attorney straight-out tell me that in absence of
               | better guidance, they're just pretending the most recent
               | relevant ruling (which held that X-on-a-computer isn't
               | patentable) doesn't exist. In any case, a patent on
               | software APIs (as opposed to software as a whole) would
               | very clearly fall under the "what are you on, this isn't
               | patentable" category of patentability.
               | 
               | The case was about the copyrightability of software APIs.
               | Except if you read the decision itself, SCOTUS doesn't
               | actually answer the question [1]. Instead, it focuses on
                | whether or not Google's use of the Java APIs was fair
               | use. Fair use is a dangerous thing to rely on for legal
               | precedent, since there's no "automatic" fair use
               | category, but instead a balancing test ostensibly of four
               | factors but practically of one factor: does it hurt the
               | original copyright owner's profits [2].
               | 
               | There's an older decision which held that the "structure,
               | sequence, and organization" of code is copyrightable
               | independent of the larger work of software, which is
               | generally interpreted as saying that software APIs are
               | copyrightable. At the same time, however, it's widespread
               | practice in the industry to assume that "clean room"
               | development of an API doesn't violate any copyright. The
               | SCOTUS decision in Google v Oracle was widely interpreted
               | as endorsing this interpretation of the law.
               | 
               | [1] There's a sentence or two that suggests to me there
               | was previously a section on copyrightability that had
               | been ripped out of the opinion.
               | 
               | [2] See also the more recent Andy Warhol SCOTUS decision
               | which, I kid you not, says that you have to analyze this
               | to figure out whether or not a use is "transformative".
               | Which kind of implicitly overturns Google v Oracle if you
               | think about it, but is unlikely to in practice.
        
               | monocasa wrote:
               | To be fair, there were patent claims in Oracle vs. Google
               | too. That's why the appeals went through the CAFC rather
               | than the 9th circuit. Those claims were simply thrown out
                | pretty early. Whether that says something more generally
                | or was simply a set of weak claims intended for venue
                | shopping is a legitimate discussion to be had, though.
        
               | hardware2win wrote:
                | You think x86 would be changed in such a way that it'd
                | break AMD?
                | 
                | Because what else?
                | 
                | If so, then I think that this is crazy, because software
                | is harder to change than hardware.
        
             | tikkabhuna wrote:
              | My understanding is that with AMD64 there's a circular
              | dependency where AMD needs Intel for x86 and Intel needs
              | AMD for x86_64?
        
               | monocasa wrote:
               | That's true now, but AMD has been making x86 compatible
               | CPUs since the original 8086.
        
             | lambdaone wrote:
              | More than that, a second implementation of CUDA acts as a
              | disincentive for NVIDIA to make breaking changes to it,
              | since it would reduce any incentive for software
              | developers to follow those changes: following them reduces
              | the value of their software by eliminating hardware choice
              | for end-users (which in some cases, like large companies,
              | are also the developers themselves).
             | 
             | At the same time, open source projects can be pretty nimble
             | in chasing things like changing APIs, potentially
             | frustrating the effectiveness of API pivoting by NVIDIA in
             | a second way.
        
           | visarga wrote:
           | > They would always be at the mercy of NVIDIA's API.
           | 
           | They only need to support PyTorch. Not CUDA
        
       | sam_goody wrote:
       | I don't really follow this, but isn't it a bad sign for ROCm
       | that, for example, ZLUDA + Blender 4's CUDA back-end delivers
       | better performance than the native Radeon HIP back-end?
        
         | fariszr wrote:
         | It really shows how neglected their software stack is, or at
         | least how neglected this implementation is.
        
         | whizzter wrote:
          | Could be that the CUDA backend has seen far more specialized
          | optimizations, whereas the seemingly fairly fresh HIP backend
          | hasn't had as many developers looking at it. In the end, a few
          | more control instructions on the CPU side to go through the
          | ZLUDA wrapper will be insignificant compared to all the time
          | spent inside better-optimized GPU kernels.
        
         | KeplerBoy wrote:
         | Surely this can be attributed to Blender's HIP code just being
         | suboptimal because nobody really cares about it. By extension
         | nobody cares about it because performance is suboptimal.
         | 
          | It's AMD's job to break that cycle.
        
         | mdre wrote:
          | I'd say it's even worse, since for rendering OptiX is like 30%
          | faster than CUDA. But that requires the tensor cores. At this
          | point AMD is waaay behind hardware-wise.
        
       | btown wrote:
       | Why would this not be AMD's top priority among priorities?
       | Someone recently likened the situation to an Iron Age where
       | NVIDIA owns all the iron. And this sounds like AMD knowing about
       | a new source of ore and not even being willing to sink a single
       | engineer's salary into exploration.
       | 
       | My only guess is they have a parallel skunkworks working on the
       | same thing, but in a way that they can keep it closed-source -
       | that this was a hedge they think they no longer need, and they
       | are missing the forest for the trees on the benefits of cross-
       | pollination and open source ethos to their business.
        
         | fariszr wrote:
          | According to the article, AMD seems to have pulled the plug on
          | this as they think it will hinder ROCm v6 adoption, which,
          | btw, still only supports two consumer cards out of their
          | entire lineup [1].
         | 
         | 1. https://www.phoronix.com/news/AMD-ROCm-6.0-Released
        
           | bhouston wrote:
           | AMD should have the funds to push both of these initiatives
           | at once. If the ROCM team has political reasons to kill the
           | competition, it is because they are scared it will succeed.
           | I've seen this happen in big companies.
           | 
           | But management at AMD should be above petty team politics and
           | fund both because at the company level they do not care which
           | solution wins in the end.
        
             | imtringued wrote:
             | Why would they be worried about people using their product?
             | Some CUDA wrapper on top of ROCM isn't going to get them
             | fired. It doesn't get rid of ROCM's function as a GPGPU
             | driver.
        
             | zer00eyz wrote:
              | If you're AMD, you don't want to be compatible till you
              | have a compelling feature of your own.
              | 
              | Good-enough CUDA + new feature X gives them leverage in
              | the inevitable court battle(s) and patent-sharing
              | agreement that everyone wants to see.
              | 
              | AMD's already stuck its toe in the water: new CPUs with
              | their AI cores built in. If you can get an AM5 socket to
              | run with 192 gigs, that's a large (albeit slow) model you
              | can run.
        
           | kkielhofner wrote:
           | With the most recent card being their one year old flagship
           | ($1k) consumer GPU...
           | 
           | Meanwhile CUDA supports anything with Nvidia stamped on it
           | before it's even released. They'll even go as far as doing
           | things like adding support for new GPUs/compute families to
           | older CUDA versions (see Hopper/Ada and CUDA 11.8).
           | 
           | You can go out and buy any Nvidia GPU the day of release,
           | take it home, plug it in, and everything just works. This is
           | what people expect.
           | 
           | AMD seems to have no clue that this level of usability is
           | what it will take to actually compete with Nvidia and it's a
           | real shame - their hardware is great.
        
             | KingOfCoders wrote:
             | AMD thinks the reason Nvidia is ahead of them is bad
             | marketing on their part, and good marketing (All is AI) by
             | Nvidia. They don't see the difference in software stacks.
             | 
              | For years I've wanted to get off the Nvidia train for AI,
              | but I'm forced to buy another Nvidia card b/c AMD stuff
              | just doesn't work, while all the examples work with Nvidia
              | cards as they should.
        
               | fortran77 wrote:
                | At the risk of sounding like Steve Ballmer, the reason I
               | only use NVIDIA for GPGPU work (our company does a lot of
               | it!) is the developer support. They have compilers,
               | tools, documentation, and tech support for developers who
               | want to do any type of GPGPU computing on their hardware
               | that just isn't matched on any other platform.
        
             | roenxi wrote:
              | You've got to remember that AMD are behind in all aspects
              | of this, including documenting their work in an easily
             | digestible way.
             | 
             | "Support" means that the card is actively tested and
              | presumably has some sort of SLA-style push to fix bugs.
             | As their stack matures, a bunch of cards that don't have
             | official support will work well [0]. I have an unsupported
             | card. There are horrible bugs. But the evidence I've seen
             | is that the card will work better with time even though it
             | is never going to be officially supported. I don't think
             | any of my hardware is officially supported by the
             | manufacturer, but the kernel drivers still work fine.
             | 
             | > Meanwhile CUDA supports anything with Nvidia stamped on
             | it before it's even released...
             | 
             | A lot of older Nvidia cards don't support CUDA v9 [1]. It
             | isn't like everything supports everything, particularly in
             | the early part of building out capability. The impression
             | I'm getting is that in practice the gap in strategy here is
             | not as large as the current state makes it seem.
             | 
             | [0] If anyone has bought an AMD card for their machine to
             | multiply matrices they've been gambling on whether the
             | capability is there. This comment is reasonable
             | speculation, but I want to caveat the optimism by asserting
             | that I'm not going to put money into AMD compute until
              | there is some actual evidence on the table that GPU
             | lockups are rare.
             | 
             | [1] https://en.wikipedia.org/wiki/CUDA#GPUs_supported
        
               | spookie wrote:
               | To be fair, if anything, that table still shows you'll
               | have compatibility with at least 3 major releases. Either
               | way, I agree their strategy is getting results, it just
               | takes time. I do prefer their open source commitment, I
               | just hope they continue.
        
               | paulmd wrote:
               | All versions of CUDA support PTX, which is an
               | intermediate bytecode/compiler representation that can be
               | finally-compiled by even CUDA 1.0.
               | 
                | So the contract is: as long as your future program does
                | not touch any intrinsics etc. that do not exist in CUDA
                | 1.0, you can export the new program from CUDA 27.0 as
                | PTX, and the 8800 GTX driver will read the PTX and let
                | your GPU run it as CUDA 1.0 code... so it is quite
                | literally just as they describe: unlimited forward and
                | backward capability/support, as long as you go through
                | PTX in the middle.
               | 
               | https://docs.nvidia.com/cuda/archive/10.1/parallel-
               | thread-ex...
               | 
               | https://en.wikipedia.org/wiki/Parallel_Thread_Execution
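                | 
                | As a concrete illustration of the PTX fallback (this
                | is standard nvcc usage, not something specific to
                | ZLUDA or the article): you ask the compiler to embed
                | both native machine code for one architecture and PTX
                | that newer drivers can JIT-compile for GPUs that
                | didn't exist when the binary was built.
                | 
                |   // Build, assuming CUDA 12.x and a hypothetical
                |   // Volta-era target:
                |   //
                |   //   nvcc -gencode arch=compute_70,code=sm_70 \
                |   //        -gencode arch=compute_70,code=compute_70 \
                |   //        hello.cu -o hello
                |   //
                |   // code=sm_70 embeds SASS (native machine code),
                |   // used directly on that GPU; code=compute_70
                |   // embeds PTX, which the driver JIT-compiles at
                |   // load time on newer GPUs, giving the forward
                |   // compatibility described above.
                |   #include <cstdio>
                |   #include <cuda_runtime.h>
                |   
                |   __global__ void hello() {
                |       printf("block %u, thread %u\n",
                |              blockIdx.x, threadIdx.x);
                |   }
                |   
                |   int main() {
                |       hello<<<1, 4>>>();       // 1 block, 4 threads
                |       cudaDeviceSynchronize(); // flush device printf
                |       return 0;
                |   }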
        
               | ColonelPhantom wrote:
               | CUDA dropped Tesla (from 2006!) only as of 7.0, which
               | seems to have released around 2015. Fermi support lasted
               | from 2010 until 2017, giving it a solid 7 years still.
               | Kepler support was dropped around 2020, and the first
               | cards were released in 2012.
               | 
               | As such Fermi seems to be the shortest supported
               | architecture, and it was around for 7 years. GCN4
               | (Polaris) was introduced in 2016, and seems to have been
               | officially dropped around 2021, just 5 years in. While
               | you could still get it working with various workarounds,
               | I don't see the evidence of Nvidia being even remotely as
               | hasty as AMD with removing support, even for early
               | architectures like Tesla and Fermi.
        
               | hedgehog wrote:
               | On top of this some Kepler support (for K80s etc) is
               | still maintained in CUDA 11 which was last updated late
               | 2022, and libraries like PyTorch and TensorFlow still
               | support CUDA 11.8 out of the box.
        
             | Certhas wrote:
             | The most recent "card" is their MI300 line.
             | 
             | It's annoying as hell to you and me that they are not
             | catering to the market of people who want to run stuff on
             | their gaming cards.
             | 
             | But it's not clear it's bad strategy to focus on executing
             | in the high-end first. They have been very successful
             | landing MI300s in the HPC space...
             | 
             | Edit: I just looked it up: 25% of the GPU Compute in the
             | current Top500 Supercomputers is AMD
             | 
             | https://www.top500.org/statistics/list/
             | 
             | Even though the list has plenty of V100 and A100s which
             | came out (much) earlier. Don't have the data at hand, but I
             | wouldn't be surprised if AMD got more of the Top500 new
             | installations than nVidia in the last two years.
        
               | latchkey wrote:
               | I'm building a bare metal business around MI300x and top
               | end Epyc CPUs. We will have them for rental soon. The
               | goal is to build a public super computer that isn't just
               | available to researchers in HPC.
        
               | beebeepka wrote:
                | Is it true the MI300 line is 3-4x cheaper for similar
                | performance than whatever Nvidia is selling in the
                | highest segment?
        
               | latchkey wrote:
               | I probably can't comment on that, but what I can comment
               | on is this:
               | 
               | H100's are hard to get. Nearly impossible. CoreWeave and
               | others have scooped them all up for the foreseeable
               | future. So, if you are looking at only price as the
               | factor, then it becomes somewhat irrelevant, if you can't
               | even buy them [0]. I don't really understand the focus on
               | price because of this fact.
               | 
               | Even if you do manage to score yourself some H100's. You
               | also need to factor in the networking between nodes. IB
               | (Infiniband) made by Mellanox, is owned by NVIDIA. Lead
               | times on that equipment are 50+ weeks. Again, price
               | becomes irrelevant if you can't even network your boxes
               | together.
               | 
               | As someone building a business around MI300x (and future
               | products), I don't care that much about price [!]. We
               | know going in that this is a super capital intensive
               | business and have secured the backing to support that. It
               | is one of those things where "if you have to ask, you
               | can't afford it."
               | 
               | We buy cards by the chassis, it is one price. I actually
               | don't know the exact prices of the cards (but I can infer
               | it). It is a lot about who you know and what you're
               | doing. You buy more chassis, you get better pricing.
               | Azure is probably paying half of what I'm paying [1]. But
               | I'd also say that from what I've seen so far, their
               | chassis aren't nearly as nice as mine. I have dual
               | 9754's, 2x bonded 400G, 3TB ram, and 122TB nvme... plus
               | the 8x MI300x. These are top of the top. They have Intel
               | and I don't know what else inside.
               | 
               | [!] Before you harp on me, of course I care about
               | price... but at the end of the day, it isn't what I'm
               | focused on today as much as just being focused on
               | investing all of the capex/opex that I can get my hands
               | on, into building a sustainable business that provides as
               | much value as possible to our customers.
               | 
               | [0] https://www.tomshardware.com/news/tsmc-shortage-of-
               | nvidias-a...
               | 
               | [1] https://www.techradar.com/pro/instincts-are-
               | massively-cheape...
        
               | beebeepka wrote:
               | Pretty sweet. I do envy you. For what it's worth, I would
               | prefer AMD to charge as much as possible for these little
               | beasts.
        
               | latchkey wrote:
               | They definitely aren't cheap.
        
               | kkielhofner wrote:
               | Indeed, but this is extremely short-sighted.
               | 
               | You don't win an overall market by focusing on several
               | hundred million dollar bespoke HPC builds where the
               | platform (frankly) doesn't matter at all. I'm working on
               | a project on an AMD platform on the list (won't say - for
               | now) and needless to say you build whatever you have to
                | on what's there, regardless of what it takes, and the
               | operators/owners and vendor support teams pour in
               | whatever resources are necessary to make it work.
               | 
               | You win a market a generation at a time - supporting low
               | end cards for tinkerers, the educational market, etc. AMD
               | should focus on the low-end because that's where the next
               | generation of AI devs, startups, innovation, etc is
               | coming from and for now that's going to continue to be
               | CUDA/Nvidia.
        
             | voakbasda wrote:
             | In the embedded space, Nvidia regularly drops support for
             | older hardware. The last supported kernel for their Jetson
             | TX2 was 4.9. Their newer Jetson Xavier line is stuck on
             | 5.10.
             | 
             | The hardware may be great, but their software ecosystem is
             | utter crap. As long as they stay the unchallenged leader in
             | hardware, I expect Nvidia will continue to produce crap
             | software.
             | 
             | I would push to switch our products in a heartbeat, if AMD
             | actually gets their act together. If this alternative
             | offers a path to evaluate our current application software
             | stack on an AMD devkit, I would buy one tomorrow.
        
               | kkielhofner wrote:
               | In the embedded space customers develop bespoke solutions
                | to, well, embed them in products where they (essentially)
               | bake the firmware image and more-or-less freeze the
               | entire software stack less incremental updates. The next
               | version of your product uses the next fresh Jetson and
               | Jetpack release. Repeat. Using the latest and greatest
               | kernel is far from a top consideration in these
               | applications...
               | 
               | I was actually advising an HN user against using Jetson
               | just the other day because it's such an extreme outlier
               | when it comes to Nvidia and software support. Frankly
               | Jetson makes no sense unless you really need the power
               | efficiency and form-factor.
               | 
               | Meanwhile, any seven year old >= Pascal card is fully
               | supported in CUDA 12 and the most recent driver releases.
               | That combined with my initial data points and others
               | people have chimed in with on this thread is far from
               | "utter crap".
               | 
               | Use the right tool for the job.
        
             | streb-lo wrote:
             | I have been using rocm on my 7800xt, it seems to be
             | supported just fine.
        
           | MrBuddyCasino wrote:
           | AMD truly deserves its misfortune in the GPU market.
        
           | incrudible wrote:
           | That is really out of touch. ROCm is garbage as far as I am
            | concerned. A drop-in replacement, especially one that seems
           | to perform quite well, is really interesting however.
        
         | iforgotpassword wrote:
          | Someone built the same thing a while ago for Intel GPUs, I
          | think even for the old pre-Xe ones. With Arc/Xe on the
          | horizon, people had the same question: why isn't Intel
          | sponsoring this or even building their own? It was speculated
          | that this might get them into legal hot water with Nvidia;
          | Google vs. Oracle was brought up, etc...
        
           | my123 wrote:
           | They financed the prior iteration of Zluda:
           | https://github.com/vosen/ZLUDA?tab=readme-ov-file#faq
           | 
           | but then stopped
        
             | formerly_proven wrote:
             | > [2021] After some deliberation, Intel decided that there
             | is no business case for running CUDA applications on Intel
             | GPUs.
             | 
             | oof
        
               | iforgotpassword wrote:
               | That's an oof indeed. Are AMD and Intel really that
                | delusional, i.e. "once we get our own version of CUDA
                | right everybody will just rewrite all their software to
                | make use of it", or do they know something we mere
                | mortals don't?
        
               | garaetjjte wrote:
               | Maybe their lawyers are afraid of another round of "are
               | APIs copyrightable"?
        
               | AtheistOfFail wrote:
               | > After two years of development and some deliberation,
               | AMD decided that there is no business case for running
               | CUDA applications on AMD GPUs.
               | 
               | Oof x2
        
               | Cheer2171 wrote:
               | Are you freaking kidding me!?!? Fire those MBAs
               | immediately.
        
         | geodel wrote:
          | Well, the simplest reason would be money. Few companies are
          | rolling in the kind of money Nvidia is, and AMD is not one of
          | them. Cloud vendors would care a bit; for them it is just
          | business: if Nvidia costs a lot more, they in turn charge
          | their customers a lot more while keeping their margins. I know
          | some people still harbor the notion that _competition_ will
          | lower the price, and it may, just not in the sense customers
          | imagine.
        
         | izacus wrote:
         | Why do you think running after nVidia for this submarket is a
         | good idea for them? The AMD GPU team isn't especially big and
         | the development investment is massive. Moreover, they'll have
         | the opportunity cost for projects they're now dominating in
         | (all game consoles for example).
         | 
         | Do you expect them to be able to capitalize on the AI fad so
         | much (and quickly enough!) that it's worth dropping the ball on
         | projects they're now doing well in? Or perhaps continue
         | investing into the part of the market where they're doing much
         | better than nVidia?
        
           | jandrese wrote:
            | If the alternative is to ignore one of the biggest developing
           | markets then yeah, maybe they should start trying to catch
           | up. Unless you think GPU compute is a fad that's going to
           | fizzle out?
        
             | izacus wrote:
              | One of the most important decisions a company can make is
              | to decide which markets they'll focus on and which they
              | won't. This is even true for megacorps (see: Google and
              | their parade of mess-ups). There's just not enough time to
              | be in
             | all markets all at once.
             | 
             | So, again, it's not at all clear that AMD being in the
             | compute GPU game is the automatic win for them in the
             | future. There's plenty of companies that killed themselves
             | trying to run after big profitable new fad markets (see:
             | Nokia and Windows Phone, and many other cases).
             | 
             | So let's examine that - does AMD actually have a good shot
             | of taking a significant chunk of market that will offset
             | them not investing in some other market?
        
               | jandrese wrote:
               | AMD is literally the only company on the market poised to
               | exploit the explosion in demand for GPU compute after
               | nVidia (sorry Intel). To not even really try to break in
               | is insanity. nVidia didn't grow their market cap by 5x
               | over the course of a year because people really got into
                | 3D gaming. Even as an also-ran on the coattails of
                | nVidia with a compatibility glue library, the market is
               | clearly demanding more product.
        
               | justinclift wrote:
               | Isn't Intel's next gen GPU supposed to be pretty strong
               | on compute?
               | 
               | Read an article about it recently, but when trying to
               | remember the details / find it again just now I'm not
               | seeing it. :(
        
               | jandrese wrote:
               | Intel is trying, but all of their efforts thus far have
               | been pretty sad and abortive. I don't think anybody is
               | taking them seriously at this point.
        
               | spookie wrote:
               | Their OneAPI is really interesting!
        
               | 7speter wrote:
               | I'm not an expert like you would find here on HN, I am
               | only really a tinkerer and learner, amateur at best, but
               | I think Intel's compute is very promising on Alchemist.
                | The A770 beats out the 4060 Ti 16GB in video rendering
                | via DaVinci Resolve and Adobe; it has AV1 support in the
                | free version of DaVinci Resolve, while Lovelace only has
                | AV1 support in Studio. Then for AI, the A770 has had a
                | good showing in Stable Diffusion against Nvidia's
                | midrange Lovelace since the summer:
                | https://www.tomshardware.com/news/stable-
               | diffusion-for-intel...
               | 
               | The big issue for Intel is pretty similar to that of AMD;
               | everything is made for CUDA, and Intel has to either
               | build their own solutions or convince people to build
                | support for Intel. While I'm working on learning AI and
                | plan to use an Nvidia card, the progress Intel has made
                | in the couple of years since introducing their first GPU
                | to market has been pretty wild, and I think it should
                | really give AMD pause.
        
               | atq2119 wrote:
               | They are breaking in, though. By all accounts, MI300s are
               | being sold as fast as they can make them.
        
               | thfuran wrote:
               | Investing in what other market?
        
               | yywwbbn wrote:
               | > So, again, it's not at all clear that AMD being in the
               | compute GPU game is the automatic win for them in the
               | future. There's
               | 
                | You're right about that, but it seems pretty clear that
                | not being in the compute GPU game is an automatic loss
                | for them (look at their recent revenue growth in the
                | past quarter or two, by sector).
        
               | imtringued wrote:
                | Are you seriously telling me they shouldn't invest in
                | one of their core markets? The necessary investments are
               | probably insignificant. Let's say you need a budget of 10
               | million dollars (50 developers) to assemble a dev team to
               | fix ROCM. How many 7900 XTX to break even on revenue?
               | Roughly 9000. How many did they sell? I'm too lazy to
                | count, but Mindfactory, a German online shop, alone sold
                | around 6k units.
        
           | nindalf wrote:
           | AMD is betting big on GPUs. They recently released the MI300,
           | which has "2x transistors, 2.4x memory and 1.6x memory
           | bandwidth more than the H100, the top-of-the-line artificial-
           | intelligence chip made by Nvidia"
           | (https://www.economist.com/business/2024/01/31/could-amd-
           | brea...).
           | 
           | They very much plan to compete in this space, and hope to
           | ship $3.5B of these chips in the next year. Small compared to
           | Nvidia's revenues of $59B (includes both consumer and data
           | centre), but AMD hopes to match them. It's too big a market
           | to ignore, and they have the hardware chops to match Nvidia.
           | What they lack is software, and it's unclear if they'll ever
           | figure that out.
        
             | incrudible wrote:
             | They are trying to compete in the segment of data center
             | market where the shots are called by bean counters
             | calculating FLOPS per dollar.
        
               | BearOso wrote:
               | A market where Nvidia chips are all bought out, so what's
               | left?
        
               | latchkey wrote:
               | That's why I'm going to democratize that business and
               | make it available to anyone who wants access. How does
               | bare metal rentals of MI300x and top end Epyc CPUs sound?
               | We take on the capex/opex/risk and give people what they
               | want, which is access to HPC clusters.
        
           | throwawaymaths wrote:
           | IIRC (this could be old news) AMD GPUs are preferred in the
           | supercomputer segment because they offer better flops/unit
            | energy. However, without a CUDA-like you're missing out on
            | the AI part of supercompute, which is an increasing
            | proportion.
           | 
           | The margins on supercompute-related sales are very high.
           | Simplifying, but you can basically take a consumer chip,
           | unlock a few things, add more memory capacity, relicense, and
           | your margin goes up by a huge factor.
        
             | Symmetry wrote:
             | It's more that the resource balance in AMD's compute line
             | of GPUs (the CDNA ones) has been more focused on the double
             | precision operations that most supercomputer code makes
             | heavy use of.
        
               | throwawaymaths wrote:
               | Thanks for clarifying! I had a feeling I had my story
               | slightly wrong
        
             | anonylizard wrote:
             | They are preferred not because of inherent superiority of
             | AMD GPUs. But simply because they have to price lower and
             | have lower margins.
             | 
              | Nvidia could always just halve their prices one day, and
             | wipe out every non-state-funded competitor. But Nvidia
             | prefers to collect their extreme margins and funnel it into
             | even more R&D in AI.
        
           | hnlmorg wrote:
           | GPU for compute has been a thing since the 00s. Regardless of
           | whether AI is a fad (it isn't, but we can agree to disagree
           | on this one) not investing more in GPU compute is a weird
           | decision.
        
           | FPGAhacker wrote:
           | It was Microsoft's strategy for several decades (outsiders
           | called it embrace, extend, extinguish, only partially in
           | jest). It can work for some companies.
        
           | currymj wrote:
           | everyone buying GPUs for AI and scientific workloads wishes
           | AMD was a viable option, and this has been true for almost a
           | decade now.
           | 
           | the hardware is already good enough, people would be happy to
            | use it and accept that it's not quite as optimized for DL
           | as Nvidia.
           | 
           | people would even accept that the software is not as
           | optimized as CUDA, I think, as long as it is correct and
           | reasonably fast.
           | 
            | the problem is just that every time I've tried it, it's been
           | a pain in the ass to install and there are always weird bugs
           | and crashes. I don't think it's hubris to say that they could
           | fix these sorts of problems if they had the will.
        
           | bonton89 wrote:
           | AMD also has the problem that they make much better margins
           | on their CPUs than on their GPUs and there are only so many
           | TSMC wafers. So in a way making more GPUs is like burning up
           | free money.
        
           | carlossouza wrote:
           | Because the supply for this market is constrained.
           | 
           | It's a pure business decision based on simple math.
           | 
           | If the estimated revenues from selling to the underserved
           | market are higher than the cost of funding the project (they
           | probably are, considering the obscene margins from NVIDIA),
           | then it's a no-brainer.
        
           | yywwbbn wrote:
           | Because their current market valuation was massively inflated
           | because of the AI/GPU boom and/or bubble?
           | 
            | In a rational world their stock price would collapse if they
            | don't focus on it and are unable to deliver anything
            | competitive in the upcoming year or two.
           | 
           | > of the market where they're doing much better than nVidia?
           | 
            | So the market that's hardly growing, that Nvidia is not
            | competing in, and where Intel still has bigger market share
            | and is catching up performance-wise? AMD's valuation is this
            | high only because they are seen as the only company that
            | could directly compete with Nvidia in the data center GPU
            | market.
        
         | jandrese wrote:
         | AMD's management seems to be only vaguely aware that GPU
         | compute is a thing. All of their efforts in the field feel like
         | afterthoughts. Or maybe they are all just hardware guys who
         | think of software as just a cost center.
        
           | giovannibonetti wrote:
           | Maybe they just can't lure in good software developers with
           | the right skill set, either due to not paying them enough or
           | not having a good work environment in comparison to the other
           | places that could hire them.
        
             | captainbland wrote:
             | I did a cursory glance at Nvidia's and AMD's respective
             | careers pages for software developers at one point - what
             | struck me was they both have similarly high requirements
             | for engineers in fields like GPU compute and AI but Nvidia
             | hires much more widely, geographically speaking, than AMD.
             | 
             | As a total outsider it seems to me that maybe one of AMD's
             | big problems is they just aren't set up to take advantage
             | of the global talent pool in the same way Nvidia is.
        
           | newsclues wrote:
           | They are aware, but it wasn't until recently that they had
           | the resources to invest in the space. They had to build Zen
           | and start making buckets of money first
        
             | beebeepka wrote:
             | Exactly. AMD stock was like 2 dollars just eight years ago.
             | They didn't have any money and, amusingly, it was their GPU
             | business that kept them going on life support.
             | 
             | Their leadership seems quite a bit more competent than
             | random forum commenters give them credit for. I guess what
             | they need, marketing wise, is a few successful halo GPU
             | launches. They haven't done that in a while. Lisa
             | acknowledged this years ago. It's marketing 101. I guess
             | these things are easier said than done.
        
           | lostdog wrote:
           | It feels like "Make the AI software work on our GPUs," is on
           | some VP's OKRs, but isn't really being checked on for
           | progress or quality.
        
           | trynumber9 wrote:
           | That doesn't explain CDNA. They focused on high-throughput
           | FP64 which is not where the market went.
        
         | hjabird wrote:
          | The problem with effectively supporting CUDA is that it
          | encourages CUDA adoption all the more strongly. Meanwhile, AMD
          | will always
         | be playing catch-up, forever having to patch issues, work
         | around Nvidia/AMD differences, and accept the performance
         | penalty that comes from having code optimised for another
         | vendor's hardware. AMD needs to encourage developers to use
         | their own ecosystem or an open standard.
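          | 
          | For context on what "their own ecosystem" looks like in
          | practice: HIP is designed to be nearly source-compatible
          | with CUDA, so a port is mostly mechanical renaming (AMD's
          | hipify tools automate it). A minimal sketch, assuming a
          | working ROCm/hipcc install:
          | 
          |   #include <cstdio>
          |   #include <hip/hip_runtime.h>
          |   
          |   // Same kernel body you would write in CUDA; only the
          |   // host API prefix changes (cudaMalloc -> hipMalloc,
          |   // cudaMemcpy -> hipMemcpy, and so on).
          |   __global__ void saxpy(int n, float a,
          |                         const float *x, float *y) {
          |       int i = blockIdx.x * blockDim.x + threadIdx.x;
          |       if (i < n) y[i] = a * x[i] + y[i];
          |   }
          |   
          |   int main() {
          |       const int n = 1 << 20;
          |       const size_t bytes = n * sizeof(float);
          |   
          |       float *hx = new float[n], *hy = new float[n];
          |       for (int i = 0; i < n; ++i) {
          |           hx[i] = 1.0f;
          |           hy[i] = 2.0f;
          |       }
          |   
          |       float *dx, *dy;
          |       hipMalloc((void **)&dx, bytes);
          |       hipMalloc((void **)&dy, bytes);
          |       hipMemcpy(dx, hx, bytes, hipMemcpyHostToDevice);
          |       hipMemcpy(dy, hy, bytes, hipMemcpyHostToDevice);
          |   
          |       // hipcc accepts the same <<<grid, block>>> launch
          |       // syntax as nvcc.
          |       saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, dx, dy);
          |       hipMemcpy(hy, dy, bytes, hipMemcpyDeviceToHost);
          |   
          |       printf("y[0] = %f (expected 4.0)\n", hy[0]);
          |       hipFree(dx);
          |       hipFree(dy);
          |       delete[] hx;
          |       delete[] hy;
          |       return 0;
          |   }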
        
           | slashdev wrote:
           | With Nvidia controlling 90%+ of the market, this is not a
           | viable option. They'd better lean hard into CUDA support if
           | they want to be relevant.
        
             | cduzz wrote:
             | A bit of story telling here:
             | 
             | IBM and Microsoft made OS/2. The first version worked on
             | 286s and was stable but useless.
             | 
             | The second version worked only on 386s and was quite good,
             | and even had wonderful windows 3.x compatibility. "Better
             | windows than windows!"
             | 
             | At that point Microsoft wanted out of the deal and they
             | wanted to make their newer version of windows, NT, which
             | they did.
             | 
             | IBM now had a competitor to "new" windows and a very
              | compatible version of "old" windows. Microsoft killed OS/2
              | in a variety of ways (including just letting IBM be IBM)
             | but also by making it very difficult for last month's
             | version of OS/2 to run next month's bunch of Windows
             | programs.
             | 
             | To bring this back to the point -- IBM vs Microsoft is akin
             | to AMD vs Nvidia -- where nvidia has the standard that AMD
             | is implementing, and so no matter what if you play in the
             | backward compatibility realm you're always going to be
             | playing catch-up and likely always in a position where
             | winning is exceedingly hard.
             | 
             | As WOPR once said "interesting game; the only way to win is
             | to not play."
        
               | panick21_ wrote:
               | IBM also made a whole bunch of strategic mistakes beyond
               | that. Most importantly their hardware division didn't
               | give a flying f about OS/2. Even when they had a 'better
               | Windows' they did not actually use it themselves and
               | didn't push it to other vendors.
               | 
                | Windows NT wasn't really relevant in that competition
                | until much later; only with XP did the NT line finally
                | reach end consumers.
               | 
               | > where nvidia has the standard that AMD is implementing,
               | and so no matter what if you play in the backward
               | compatibility realm you're always going to be playing
               | catch-up
               | 
                | That's not true. If AMD starts adding their own features
                | and has their own advantages, that can flip.
               | 
               | It only takes a single generation of hardware, or a
               | single feature for things to flip.
               | 
                | Look at Linux and Unix. It started out with Linux
                | implementing Unix, and now the Unixes are trying to add
                | compatibility with Linux.
               | 
                | Is SGI still the driving force behind OpenGL/Vulkan? Did
               | you think it was a bad idea for other companies to use
               | OpenGL?
               | 
               | AMD was successful against Intel with x86_64.
               | 
                | There are lots of examples of the company that makes
                | something popular not being able to take full advantage
                | of it in the long run.
        
               | chuckadams wrote:
               | Slapping a price tag of over $300 on OS/2 didn't do IBM
               | any favors either.
        
               | BizarroLand wrote:
               | That's what happens when your primary business model is
               | selling to the military. They had to pay what IBM charged
               | them (within a small bit of reason) and it was incredibly
               | difficult for them to pivot away from any path they chose
               | in the 80's once they had chosen it.
               | 
                | However, that same logic doesn't apply to consumers, and
                | since they continued to fail to learn that lesson, IBM
                | now doesn't even target the consumer market, given that
                | they never learned how to be competitive and could only
                | ever function effectively when they had a monopoly or at
                | least vendor lock-in.
               | 
               | https://en.wikipedia.org/wiki/Acquisition_of_the_IBM_PC_b
               | usi...
        
               | incrudible wrote:
               | Windows before NT was crap, so users had an incentive to
               | upgrade. If there had existed a Windows 7 alternative
               | that was near fully compatible and FOSS, I would wager
               | Microsoft would have lost to it with Windows 8 and even
               | 10. The only reason to update for most people was
               | Microsoft dropping support.
               | 
               | For CUDA, it is not just AMD who would need to catch up.
               | Developers also are not necessarily going to target the
               | latest feature set immediately, especially if it only
               | benefits (or requires) new hardware.
               | 
               | I accept the final statement, but that also means AMD for
               | compute is gonna be dead like OS/2. Their stack just will
               | not reach critical mass.
        
               | BizarroLand wrote:
                | Today's Linux OSes would have competed incredibly
                | strongly against Vista and probably would have gone blow
                | for blow against 7.
               | 
               | Proton, Wine, and all of the compatibility fixes and
                | driver improvements that the community has made in the
                | last 16 years have been amazing, and every day is another
               | day where you can say that it has never been easier to
               | switch away from Windows.
               | 
               | However, Microsoft has definitely been drinking the IBM
                | koolaid a little too long and has lost the mandate of
               | heaven. I think in the next 7-10 years we will reach a
               | point where there is nothing Windows can do that linux
               | cannot do better and easier without spying on you, and we
               | may be 3-5 years from a "killer app" that is specifically
               | built to be incompatible with Windows just as a big FU to
               | them, possibly in the VR world, possibly in AR, and once
               | that happens maybe, maybe, maybe it will finally actually
               | be the year of the linux desktop.
        
               | paulmd wrote:
               | > However, Microsoft has definitely been drinking the IBM
               | koolaid a little too long and has lost the mandate of
               | heaven. I think in the next 7-10 years we will reach a
               | point where there is nothing Windows can do that linux
               | cannot do better and easier without spying on you
               | 
               | that's a fascinating statement given the clear ascendancy
               | of neural-assisted algorithms etc. Things like DLSS are
               | the future - small models that just quietly optimize some
               | part of a workload once considered impossible, to the
               | point that nobody even thinks about it anymore.
               | 
               | my prediction is that in 10 years we are looking at the
               | rise of tag+collection based filesystems and operating
               | system paradigms. all of us generate a huge amount of
               | "digital garbage" constantly, and you either sort it into
               | the important stuff, the keep-temporarily stuff, and the
               | trash, or you accumulate a giant digital garbage pile. AI
               | systems are gonna automate that process; it's gonna start
               | on traditional tree-based systems, but eventually you
               | don't need the tree at all - AI is what's going to make
               | that pivot to true tag/collection systems possible.
               | 
               | Tags mostly haven't worked because of a bunch of
               | individual issues which are pretty much solved by AI.
               | Tags aren't specific enough: well, AI can give you good
               | guesses at relevance. Tagging files and maintaining
               | collections is a pain: well, the AI can generate tags and
               | assign collections for you. Tags really require an
               | ontology for "fuzzy" matching (search for "food" should
               | return the tag "hot dog") - well, LLMs understand
               | ontologies fine. Etc etc. And if you do it right, you can
               | basically have the AI generate "inbox/outbox" for you,
               | deduplicate files and handle versioning, etc, all
               | relatively seamlessly.
               | 
               | microsoft and macos are both clearly racing for this with
               | the "AI os" concept. It's not just better relevance
               | searches etc. And the "generate me a whole paragraph
               | before you even know what I'm trying to type" stuff is
               | not how it's going to work either. That stuff is like
               | specular highlights in video games around 2007 or
               | whatever - once you had the tool, for a few years
               | everything was _wet_ until developers learned some
               | restraint with it. But there are very very good
               | applications that are going to come out in the 10 year
               | window that are going to reduce operator cognitive load
               | by a lot - that is the "AI OS" concept. What would the
               | OS look like if you truly had the "computer is my
               | secretary" idea? Not just dictating memorandums, but
               | assistance in keeping your life in order and keeping you
               | on-task.
               | 
               | I simply cannot see linux being able to keep up with this
               | change, in the same way the kernel can't just switch to
               | rust - at some point you are too calcified to ever do the
               | big-bang rewrite if there is not a BDFL telling you that
               | it's got to happen.
               | 
               | the downside of being "the bazaar" is that you are
               | standards-driven and have to deal with corralling a
               | million whiny nerds constantly complaining about "spying
               | on me just like microsoft" and continuing to push in
               | their own other directions (sysvinit/upstart/systemd
               | factions, etc) and whatever else, _on top of all the
               | other technical issues of doing a big-bang rewrite_.
               | linux is too calcified to ever pivot away from being a
               | tree-based OS and it's going to be another 2-3 decades
               | before they catch up with "proper support for new file-
               | organization paradigms" etc even in the smaller sense.
               | 
               | that's really just the tip of the iceberg on the things
               | AI is going to change, and linux is probably going to be
               | left out of most of those _commercial applications_
               | despite being where the research is done. It's just too
               | much of a mess and too many nerdlingers pushing back to
               | ever get anything done. Unix will be represented in this
               | new paradigm but not Linux - the commercial operators who
               | have the centralization and fortitude to build a
               | cathedral will get there much quicker, and that looks
               | like MacOS or Solaris not linux.
               | 
               | Or at least, unless I see some big announcement from KDE
               | or Gnome or Canonical/Red Hat about a big AI-OS
               | rewrite... I assume that's pretty much where the center
               | of gravity is going to stay for linux.
        
               | BizarroLand wrote:
               | Counterpoint: Most AI stuff is developed in an OS-
               | agnostic language like Python or C, and then ported to
               | Linux/OSX/Windows, so for AI it is less about the OS it
               | runs on than about the hardware, drivers, and/or
               | connections that the OS supports.
               | 
               | For the non-vendor-lock-in AIs (Copilot), casting as wide
               | a net as possible to catch customers as easily as
               | possible should by default mean investing the small
               | amount of money it takes to build Linux integrations
               | into their AI platforms.
               | 
               | Plus, the googs has a pretty deep investment in the
               | Linux ecosystem, and should have little issue pushing
               | Bard or Gemini (or whatever they'll call it next week
               | before they kill it) out into a Linux-compatible
               | interface, and if they do that then the other big
               | players will follow.
               | 
               | And don't overlook the next generation of VR headsets.
               | People have gotten silly over the Apple headset, but
               | Valve should be rolling out the Deckard soon, and others
               | will start to compete in that space since Apple raised
               | the price bar; they should soon start rolling out
               | hardware with more features and software to take
               | advantage of it.
        
               | incrudible wrote:
               | "Neural assisted algorithms" are just algorithms with
               | large lookup tables. Another magnitude of binary bloat,
               | but that's nothing we haven't experienced before. There's
               | no need to fundamentally change the OS paradigm for it.
        
               | paulmd wrote:
               | I think we're well past the "dlss is just FSR2 with
               | lookup tables, you can ALWAYS replicate the outcomes of
               | neural algorithms with deterministic ones" phase, imo.
               | 
               | if that's the case you have billion-dollar opportunities
               | waiting for you to prove it!
        
               | pjmlp wrote:
               | There is no competition when games only come to Linux by
               | "emulating" Windows.
               | 
               | The only thing it has going for it is being a free-beer
               | UNIX clone for headless environments, and even then, it
               | isn't that relevant in cloud environments where
               | containers and managed languages abstract away
               | everything they run on.
        
               | BizarroLand wrote:
               | Thanks to the Steam Deck, more and more games are being
               | ported for Linux compatibility by default.
               | 
               | Maybe some Microsoft-owned game makers will never make
               | the shift, but if the majority of others do then that's
               | the death knell.
        
               | incrudible wrote:
               | Are they _ported_ though? I would say thanks to the Steam
               | Deck, Proton is at a point where native Linux ports are
               | unnecessary. It's also a much more stable target to
               | develop against than N+1 Linux distros.
        
               | pjmlp wrote:
               | Nah, everyone is relying on Proton; there are hardly any
               | native GNU/Linux games being ported, not even Android/NDK
               | ones, where SDL, OpenGL, Vulkan, C, and C++ are all
               | present and porting would be extremely easy.
        
               | foobiekr wrote:
               | IBM was also incompetent. The OS/2 team in Boca had some
               | exceptional engineers but was packed with mostly
               | mediocre-to-bad ones, which is why so many things in
               | OS/2 were bad, and why IBM got upset at Microsoft for
               | "contributing negative work" to the project: Microsoft's
               | lines-of-code contribution was negative (they were
               | rewriting a lot of inefficient, bloated IBM code).
               | 
               | A lot went wrong with OS/2. For CUDA, I think a better
               | analogy is VHS. The standard, in the de facto rather
               | than open sense, is what it is. AMD sucks at software
               | and views it as an expense rather than an advantage.
        
               | AYBABTME wrote:
               | You would think that by now AMD realizes that poor
               | software is what left them behind in the dust, and would
               | have changed that mindset.
        
               | hyperman1 wrote:
               | Most businesses understand the pain points of their
               | suppliers very well, as they feel that pain and have
               | organized themselves around it.
               | 
               | They have a hard time understanding the pain points of
               | their customers, as they don't feel that pain, look
               | through their own organisation-coloured glasses, and
               | can't tell the real pain points from the whiny-customer
               | ones.
               | 
               | AMD probably thinks software ecosystems are the easy
               | part, ready to be taken on whenever they feel like it by
               | throwing a token amount at it. They've built a great
               | engine, see the bodywork as beneath them, and don't
               | understand why the lazy customer wants them to build the
               | rest of the car too.
        
               | neerajsi wrote:
               | I'm not in the gpu programming realm, so this observation
               | might be inaccurate:
               | 
               | I think the case of cuda vs an open standard is different
               | from os2 vs Windows because the customers of cuda are
               | programmers with access to source code while the
               | customers of os2 were end users trying to run apps
               | written by others.
               | 
               | If your shrink-wrapped software didn't run on os2, you'd
               | have no choice but to go buy Windows. Otoh if your ai
               | model doesn't run on an AMD device and the issue is
               | something minor, you can edit the shader code.
        
           | bachmeier wrote:
           | > The problem with effectively supporting CUDA is that
           | encourages CUDA adoption all the more strongly.
           | 
           | I'm curious about this. Sure some CUDA code has already been
           | written. If something new comes along that provides better
           | performance per dollar spent, why continue writing CUDA for
           | new projects? I don't think the argument that "this is what
           | we know how to write" works in this case. These aren't
           | scripts you want someone to knock out quickly.
        
             | Uehreka wrote:
             | > If something new comes along that provides better
             | performance per dollar spent
             | 
             | They won't be able to do that, their hardware isn't fast
             | enough.
             | 
             | Nvidia is beating them at hardware performance, AND ALSO
             | has an exclusive SDK (CUDA) that is used by almost all deep
             | learning projects. If AMD can get their cards to run CUDA
             | via ROCm, then they can begin to compete with Nvidia on
             | price (though not performance). Then, and only then, if
             | they can start actually producing cards with equivalent
             | performance (also a big stretch) they can try for an
             | Embrace Extend Extinguish play against CUDA.
        
               | bachmeier wrote:
               | > They won't be able to do that, their hardware isn't
               | fast enough.
               | 
               | Well, then I guess CUDA is not really the problem, so
               | being able to run CUDA on AMD hardware wouldn't solve
               | anything.
               | 
               | > try for an Embrace Extend Extinguish play against CUDA
               | 
               | They wouldn't need to go that route. They just need a way
               | to run existing CUDA code on AMD hardware. Once that
               | happens, their customers have the option to save money by
               | writing ROCm or whatever AMD is working on at that time.
        
               | foobiekr wrote:
               | Intel has the same software issue as AMD but their
               | hardware is genuinely competitive if a generation behind.
               | Cost and power wise, Intel is there; software? No.
        
               | Uehreka wrote:
               | > Well, then I guess CUDA is not really the problem
               | 
               | It is. All the things are the problem. AMD is behind on
               | both hardware and software, for both gaming and compute
               | workloads, and has been for many years. Their competitor
               | has them beat in pretty much every vertical, and the
               | lock-in from CUDA helps ensure that even if AMD can get
               | their act together on the hardware side, existing compute
               | workloads (there are oceans of existing workloads) won't
               | run on their hardware, so it won't matter for
               | professional or datacenter usage.
               | 
               | To compete with Nvidia in those verticals, AMD has to fix
               | all of it. Ideally they'd come out with something better
               | than CUDA, but they have not shown an aptitude for being
               | able to do something like that. That's why people keep
               | telling them to just make a compatibility layer. It's a
               | sad place to be, but that's the sad place where AMD is,
               | and they have to play the hand they've been dealt.
        
             | dotnet00 wrote:
             | If something new comes along that provides better
             | performance per dollar, but you have no confidence that
             | it'll continue to be available in the future, it's far less
             | appealing. There's also little point in being cheaper if it
             | just doesn't have the raw performance to justify the effort
             | in implementing in that language.
             | 
             | CUDA currently has the better raw performance, better
             | availability, and a long record indicating that the
             | platform won't just disappear in a couple of years. You can
             | use it on pretty much any NVIDIA GPU and it's properly
             | supported. The same CUDA code that ran on a GTX680 can run
             | on an RTX4090 with minimal changes if any (maybe even the
             | same binary).
             | 
             | In comparison, AMD has a very spotty record with their
             | compute technologies, stuff gets released and becomes
             | effectively abandonware, or after just a few years support
             | gets dropped regardless of the hardware's popularity. For
             | several generations they basically led people on with
             | promises of full support on consumer hardware that either
             | never arrived or arrived when the next generation of cards
             | were already available, and despite the general popularity
             | of the rx580 and the popularity of the Radeon VII in
             | compute applications, they dropped 'official' support. AMD
             | treats its 'consumer' cards as third class citizens for
             | compute support, but you aren't going to convince people to
             | seriously look into your platform like that. Plus, it's a
             | lot more appealing to have "GPU acceleration will allow us
             | to take advantage of newer supercomputers, while also
             | offering massive benefits to regular users" than just the
             | former.
             | 
             | This was ultimately what removed AMD as a consideration for
             | us when we were deciding on which to focus on for GPU
             | acceleration in our application. Many of us already had
              | access to an NVIDIA GPU of some sort, which would make
             | development easier, while the entire facility had one ROCm
             | capable AMD GPU at the time, specifically so they could
             | occasionally check in on its status.
        
           | panick21_ wrote:
           | That's not guaranteed at all. One could make the same
           | argument about Linux vs commercial Unix.
           | 
           | If they put their stuff out as open source, including
           | firmware, I think they will win out eventually.
           | 
           | And it's also not guaranteed that Nvidia will always
           | produce the superior hardware for that code.
        
           | kgeist wrote:
           | Intel embraced AMD64, ditching Itanium. Wasn't that a good
           | decision that worked out well? Is it comparable?
        
             | teucris wrote:
             | In hindsight, yes, but just because a specific technology
             | is leading an industry doesn't mean it's going to be the
             | best option. It has to play out long enough for the market
             | to indicate a preference. In this case, for better or
             | worse, it looks like CUDA's the preference.
        
               | diggan wrote:
               | > It has to play out long enough for the market to
               | indicate a preference
               | 
               | By what measure hasn't that happened already? CUDA has
               | been around and constantly improving for more than 15
               | years, and there are no competitors in sight so far.
               | It's basically the de facto standard in many ecosystems.
        
               | teucris wrote:
               | There haven't been any as successful, but there have been
               | competitors. OpenCL, DirectX come to mind.
        
               | cogman10 wrote:
               | SYCL is the latest attempt that I'm aware of. It's still
               | pretty active and may just work, as it doesn't rely on
               | the video card manufacturers to make it happen.
        
               | zozbot234 wrote:
               | SYCL is the quasi-successor to OpenCL, built on the same
               | flavor of SPIR-V. Various efforts are trying to run it on
               | top of Vulkan Compute (which tends to be broadly
               | supported by modern GPUs), but it's non-trivial because
               | the technologies are independently developed and there
               | are some incompatibilities.
        
             | kllrnohj wrote:
             | Intel & AMD have a cross-license agreement covering
             | everything x86 (and x86_64) thanks to lots and lots of
             | lawsuits over their many years of competition.
             | 
             | So while Intel had to bow to AMD's success and give up
             | Itanium, they weren't then limited by that and could
             | proceed to iterate on top of it.
             | 
             | Meanwhile it'll be a cold day in hell before Nvidia
             | licenses anything about CUDA to AMD, much less allows AMD
             | to iterate on top of it.
        
               | kevin_thibedeau wrote:
               | The original cross licensing was government imposed
               | because a second source was needed for the military.
        
               | atq2119 wrote:
               | Makes you wonder why DoE labs and similar facilities
               | don't mandate open licensing of CUDA.
        
               | krab wrote:
               | Isn't an API out of scope for copyright? In the case of
               | CUDA, it seems they can copy most of it and then iterate
               | on their own, keeping a compatible subset.
        
           | throwoutway wrote:
           | Is it? Apple Silicon exists, but Apple created a translation
           | layer above it so the transition could be smoother.
        
             | jack_pp wrote:
             | Not really the same, in that Apple was absolutely required
             | to do this in order for people to transition smoothly, and
             | it wasn't competing against another company / platform; it
             | just needed apps from its previous platform to work while
             | people recompile apps for the current one, which they will.
        
             | Jorropo wrote:
             | This is extremely different, apple was targeting end
             | consumers that just want their app to run. The performance
             | between apple rosetta and native cpu were still multiple
             | times different.
             | 
             | People writing CUDA apps don't just want stuff to run,
             | performance is an extremely important factor else they
             | would target CPUs which are easier to program for.
             | 
             | From their readme:
             | 
             | > On Server GPUs, ZLUDA can compile CUDA GPU code to run
             | in one of two modes:
             | > Fast mode, which is faster, but can make exotic (but
             | correct) GPU code hang.
             | > Slow mode, which should make GPU code more stable, but
             | can prevent some applications from running on ZLUDA.
        
               | hamandcheese wrote:
               | > The performance between apple rosetta and native cpu
               | were still multiple times different.
               | 
               | Rosetta 2 runs apps at 80-90% their native speed.
        
               | Jorropo wrote:
               | Indeed I got that wrong. Sadly minimal SIMD and hardware
               | acceleration support.
        
               | piva00 wrote:
               | > The performance between apple rosetta and native cpu
               | were still multiple times different.
               | 
               | Not at all, the performance hit was in the low 10s of
               | percent; before natively supporting Apple Silicon, most
               | of the apps I use for music/video/photography didn't
               | seem to have a performance impact at all, even more so
               | since the M1 machines were so much faster than the
               | Intels.
        
           | coldtea wrote:
           | > _The problem with effectively supporting CUDA is that
           | encourages CUDA adoption all the more strongly_
           | 
           | Worked fine for MS with Excel supporting Lotus 123 and Word
           | supporting WordPerfect's formats when those were dominant...
        
             | Dork1234 wrote:
             | Microsoft could do that because they had the operating
             | system monopoly to leverage to take out both Lotus 123 and
             | WordPerfect. Without the monopoly of the operating system,
             | they wouldn't have been able to Embrace, Extend,
             | Extinguish.
             | 
             | https://en.wikipedia.org/wiki/Embrace,_extend,_and_extingui
             | s...
        
             | bell-cot wrote:
             | But MS controlled the underlying OS. Letting them both
             | throw money at the problem, and (by accounts at the time)
             | frequently tweak the OS in ways that made life difficult
             | for Lotus, WordPerfect, Ashton-Tate, etc.
        
               | p_l wrote:
               | Last I checked, Lotus did themselves in by not
               | innovating and by betting on the wrong horse (OS/2),
               | then not doing well on the pivot to Windows.
               | 
               | Meanwhile Excel was gaining features and winning users
               | with them even before Windows was in play.
        
               | dadadad100 wrote:
               | This is a key point. Before Windows we had all the DOS
               | players - WordPerfect was king. Microsoft was more
               | focused on the Mac. I've always assumed that Microsoft
               | understood that a GUI was coming and trained a
               | generation of developers on the main GUI of the day.
               | Once Windows came out, the DOS-focused apps could not
               | adapt in time.
        
               | robocat wrote:
               | > betting on the wrong horse (OS/2)
               | 
               | Ahhhh, your hindsight is well developed. I would be
               | interested to know the background on the reasons why
               | Lotus made that bet. We can't know the counterfactual,
               | but Lotus delivering on a platform owned by their deadly
               | competitor Microsoft would seem to me to be a clearly
               | worrisome idea to Lotus at the time. Turned out it was
               | an existentially bad idea. Did Lotus fear Microsoft?
               | "DOS ain't done till Lotus won't run" is a myth[1] for a
               | reason. Edit: DRDOS errors[2] were one reason Lotus
               | might fear Microsoft. We can imagine a narrative of a
               | different timeline where Lotus delivered on Windows but
               | did some things differently to beat Excel. I agree,
               | Lotus made other mistakes and Microsoft made some great
               | decisions, but the point remains.
               | 
               | We can also suspect that AMD have a similar choice now,
               | where they are forked. Depending on Nvidia/CUDA may be a
               | similar choice for AMD - fail if they do and fail if
               | they don't.
               | 
               | [1] http://www.proudlyserving.com/archives/2005/08/dos_ai
               | nt_done...
               | 
               | [2] https://www.theregister.com/1999/11/05/how_ms_played_
               | the_inc...
        
               | p_l wrote:
               | I've seen rumours from self-described ex-Lotus employees
               | that IBM made a deal with Lotus to prioritise OS/2.
        
           | andy_ppp wrote:
           | When the alternative is failure I suppose you choose the
           | least bad option. Nobody is betting the farm on ROCm!
        
             | hjabird wrote:
             | True. This is the big advantage of an open standard
             | instead of jumping from one vendor's walled garden to
             | another.
        
           | more_corn wrote:
           | They have already lost. The question is do they want to come
           | in second in the game to control the future of the world or
           | not play at all?
        
           | bick_nyers wrote:
           | The latest version of CUDA is 12.3, and version 12.2 came out
           | 6 months prior. How many people are running an older version
           | of CUDA right now on NVIDIA hardware for whatever particular
           | reason?
           | 
           | Even if AMD lagged support on CUDA versioning, I think it
           | would be widely accepted if the performance per dollar at
           | certain price points was better.
           | 
           | Taking the whole market from NVIDIA is not really an option,
           | it's better to attack certain price points and niches and
           | then expand from there. The CUDA ship sailed a long time ago
           | in my view.
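           | 
           | (Side note on what "running an older version" even means: a
           | minimal sketch, using the standard CUDA runtime calls, of
           | how an app can see the driver/runtime split for itself. The
           | version numbers in the comments are just examples:)
           | 
           |     #include <cstdio>
           |     #include <cuda_runtime.h>
           | 
           |     int main() {
           |         int driver = 0, runtime = 0;
           |         // Highest CUDA version the installed driver
           |         // supports, e.g. 12030 for CUDA 12.3.
           |         cudaDriverGetVersion(&driver);
           |         // Runtime version this binary was built against,
           |         // e.g. 12010 for CUDA 12.1.
           |         cudaRuntimeGetVersion(&runtime);
           |         printf("driver %d.%d, runtime %d.%d\n",
           |                driver / 1000, (driver % 1000) / 10,
           |                runtime / 1000, (runtime % 1000) / 10);
           |         return 0;
           |     }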
        
             | swozey wrote:
             | I just went through this this weekend - if you're running
             | in Windows and want to use DeepSpeed, you still have to
             | use CUDA 12.1, because DeepSpeed 13.1 is the latest that
             | works with 12.1. There's no DeepSpeed for Windows that
             | works with 12.3.
             | 
             | I tried to get it working this weekend but it was a huge
             | PITA, so I switched to putting everything into WSL2, then
             | Arch on there, with PyTorch etc. in containers, so I could
             | flip versions easily now that I know how SPECIFIC the
             | versions are to one another.
             | 
             | I'm still working on that part; halfway into it my WSL2
             | completely broke and I had to reinstall Windows. I'm
             | scared to mount the vhdx right now. ALL of my work and ALL
             | of my documentation is inside the WSL2 Arch Linux and NOT
             | on my Windows machine. EVERYTHING I need to quickly put
             | another server up (dotfiles, configs) is sitting in a
             | chezmoi git repo ON THE VM, which I only committed once,
             | right after init, like 5 mins into everything. THAT was a
             | learning experience. Now I have no idea if I should follow
             | the "best practice" of keeping projects in WSL or have WSL
             | reach out to Windows (there's a performance drop). The 9p
             | networking stopped working, and no matter what I
             | reinstalled, reset, removed features, reset Windows, etc.,
             | it wouldn't start. But at least I have that WSL2 .vhdx
             | image that will hopefully mount and start. And probably
             | break WSL2 again. I even SPECIFICALLY took backups of the
             | image as tarballs every hour in case I broke LINUX, not
             | WSL.
             | 
             | If anyone has done SD containers in WSL2 already, let me
             | know. I've tried to use WSL for dev work (I use OSX) like
             | this 2-3 times in the last 4-5 years, and I always run
             | into some catastrophically broken thing that makes my WSL
             | stop working. I hadn't used it in years, so I hoped it was
             | super reliable by now. This is on 3 different desktops
             | with completely different hardware, etc. I was terrified
             | it would break this weekend and IT DID. At least I can be
             | back up in Windows in 20 minutes thanks to Chocolatey and
             | chezmoi. Wiped out my entire gaming desktop.
             | 
             | Sorry I'm venting now this was my entire weekend.
             | 
             | This repo is from a DeepSpeed contributor (iirc) and lists
             | the requirements for DeepSpeed + Windows that spell out
             | the version matching:
             | 
             | https://github.com/S95Sedan/Deepspeed-Windows
             | 
             | > conda install pytorch==2.1.2 torchvision==0.16.2
             | torchaudio==2.1.2 pytorch-cuda=12.1 -c pytorch -c nvidia
             | 
             | It may sound weird to do any of this in Windows, or maybe
             | not, but if it does just remember that it's a lot of gamers
             | like me with 4090s who just want to learn ML stuff as a
             | hobby. I have absolutely no idea what I'm doing but thank
             | god I know containers and linux like the back of my hand.
        
               | bick_nyers wrote:
               | Vent away! Sounds frustrating for sure.
               | 
               | As much as I love Microsoft/Windows for the work they
               | have put into WSL, I ended up just putting Kubuntu on my
               | devices and use QEMU with GPU passthrough whenever I need
               | Windows. Gaming perf is good. You need an iGPU or a cheap
               | second GPU for Linux in order to hand off a 4090 etc. to
               | Windows (unless maybe your motherboard happens to support
               | headless boot but if it's a consumer board it doesn't).
               | Dual boot with Windows always gave me trouble.
        
               | katbyte wrote:
               | I recently gave this a go as I'd not had a windows
               | desktop for a long time, have a beefy Proxmox server and
               | wanted to play some windows only games - works shockingly
               | well with an a4000 and 35m optical hdmi cables! - however
               | I'm getting random audio crackling and popping and I've
               | yet to figure out what's causing it.
               | 
               | First I thought it was hardware related in a Remote
               | Desktop session leading me to think some weird audio
               | driver thing
               | 
               | have you encountered anything like this at all?
        
               | swozey wrote:
               | What are you running for audio? pipewire+jack, pipewire,
               | jack2, pulseaudio? I wonder if it's from latency.
               | Pulseaudio is the most common but if you do any audio
               | engineering or play guitar etc with your machine we all
               | use jack protocol for less latency.
               | 
               | https://linuxmusicians.com/viewtopic.php?t=25556
               | 
               | Could be completely unrelated though, RDP sessions can
               | definitely act up, get audio out of sync etc. I try to
               | never do pass through rdp audio, it's not even enabled by
               | default in the mstsc client IIRC but that may just be a
               | "probably server" thing.
        
               | swozey wrote:
               | Are you flipping from your main GPU to like a GT710 to do
               | the gpu vfio mount? Or can you share the dgpu directly
               | and not have to go headless now?
               | 
               | I've done this on both a hackintosh and void linux. I was
               | so excited to get the hackintosh working because I
               | honestly hate day desktop linux, it's my day job to work
               | on and I just don't want to deal with it after work.
               | 
               | Unfortunately both would break in significant ways and
               | I'd have to trudge through and fix things. I had that
               | void desktop backed up with Duplicacy (duplicati front
               | end) and IIRC I tried to roll back after breaking qemu,
               | it just dumps all your backup files into their dirs, and
               | I think I broke it more.
               | 
               | I think at that point I was back up in Windows in 30
               | mins.. and all of its intricacies like bsoding 30% of the
               | time that I either restart it or unplug a usb hub. But my
               | Macbooks have a 30% chance of not waking up on Monday
               | morning when I haven't used them all weekend without me
               | having to grab them and open the screen.
        
             | carlossouza wrote:
             | Great comment.
             | 
             | I bet there are at least two markets (or niches):
             | 
             | 1. People who want the absolute best performance and the
             | latest possible version and are willing to pay the premium
             | for it;
             | 
             | 2. People who want to trade performance for cost and
             | accept working with not-the-latest versions.
             | 
             | In fact, I bet the market for (2) is much larger than (1).
        
             | bluedino wrote:
             | > How many people are running an older version of CUDA
             | right now on NVIDIA hardware for whatever particular
             | reason?
             | 
             | I would guess there are lots of people still running CUDA
             | 11. Older clusters, etc. A lot of that software doesn't get
             | updated very often.
        
           | hjabird wrote:
           | There are some great replies to my comment - my original
           | comment was too reductive. However, I still think that
           | entrenching CUDA as the de-facto language for heterogeneous
           | computing is a mistake. We need an open ecosystem for AI and
           | HPC, where vendors compete on producing the best hardware.
        
             | ethbr1 wrote:
             | The problem with open standards is that someone has to
             | write them.
             | 
             | And that someone usually isn't a manufacturer, lest the
             | committee be accused of bias.
             | 
             | Consequently, you get features that are (a) outdated, with
             | SotA having already moved beyond them, (b) designed in a
             | way that doesn't correspond to actual practice, and (c)
             | overly generalized.
             | 
             | There are some notable exceptions (e.g. IETF), but the
             | general rule has been that open specs please no one,
             | slowly.
             | 
             | IMHO, FRAND and liberal cross-licensing produce better
             | results.
        
               | jchw wrote:
               | Vulkan already has some standard compute functionality.
               | Not sure if it's low level enough to be able to e.g.
               | recompile and run CUDA kernels, but I think if people
               | were looking for a vendor-neutral standard to build GPGPU
               | compute features on top of, I mean, that seems to be the
               | obvious modern choice.
        
               | zozbot234 wrote:
               | There is already a work-in-progress implementation of HIP
               | on top of OpenCL https://github.com/CHIP-SPV/chipStar and
               | the Mesa RustiCL folks are quite interested in getting
               | that to run on top of Vulkan.
               | 
               | (To be clear, HIP is about converting CUDA source code,
               | not running CUDA-compiled binaries, but the ZLUDA
               | project discussed in the OP heavily relies on it.)
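               | 
               | (As a rough illustration of what "converting CUDA
               | source" means in practice: ordinary CUDA device code
               | like the kernel below usually survives untouched, while
               | host API calls get renamed mechanically. This is a
               | hand-written sketch of the general hipify pattern, not
               | chipStar or ZLUDA output:)
               | 
               |     // Plain CUDA kernel; HIP builds the same source.
               |     __global__ void axpy(float a, const float* x,
               |                          float* y, int n) {
               |         int i = blockIdx.x * blockDim.x + threadIdx.x;
               |         if (i < n) y[i] += a * x[i];
               |     }
               | 
               |     // Host-side calls are what gets renamed, e.g.:
               |     //   cudaMalloc(&d_x, bytes)
               |     //     -> hipMalloc(&d_x, bytes)
               |     //   cudaMemcpy(d_x, x, bytes, cudaMemcpyHostToDevice)
               |     //     -> hipMemcpy(d_x, x, bytes, hipMemcpyHostToDevice)
               |     //   cudaDeviceSynchronize()
               |     //     -> hipDeviceSynchronize()
               |     //   axpy<<<blocks, 256>>>(a, d_x, d_y, n)
               |     //     (the launch syntax also compiles under hipcc)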
        
           | jvanderbot wrote:
           | If you replace CUDA -> x86 and NVIDIA -> Intel, you'll see a
           | familiar story which AMD has already proved it can work
           | through.
           | 
           | These were precisely the arguments for 'x86 will entrench
           | Intel for all time', and we've seen AMD succeed at that game
           | just fine.
        
             | ianlevesque wrote:
             | And indeed more than succeed, they invented x86_64.
        
               | stcredzero wrote:
               | _And indeed more than succeed, they invented x86_64._
               | 
               | If AMD invented the analogous to x86_64 for CUDA, this
               | would increase competition and progress in AI by some
               | huge fraction.
        
               | pjmlp wrote:
               | Only works if NVidia missteps and creates the Itanium
               | version of CUDA.
        
               | stcredzero wrote:
               | You don't think someone would welcome the option to have
               | more hardware buying options, even if the "Itanium
               | version" didn't happen?
        
               | sangnoir wrote:
               | x86_64's win was helped by Intel's Itanium misstep. AMD
               | can't bank on Nvidia making a mistake, and Nvidia seems
               | content with incremental changes to CUDA, contrasted with
               | Intel's 32-bit to 64-bit transition. It is highly
               | unlikely that AMD can find and exploit a similar chink
               | in CUDA's armor.
        
               | LamaOfRuin wrote:
               | If they're content with incremental changes to CUDA,
               | then it doesn't cost much to keep compatibility updated,
               | and to do so as quickly as users actually adopt the
               | changes.
        
             | samstave wrote:
             | Transmeta was Intel's boogeyman in the 90s.
        
             | ethbr1 wrote:
             | > _These were precisely the arguments for 'x86 will
             | entrench Intel for all time', and we've seen AMD succeed at
             | that game just fine._
             | 
             | ... after a couple decades of legal proceedings and a
             | looming FTC monopoly case convinced Intel to throw in the
             | towel, cross-license, and compete more fairly with AMD.
             | 
             | https://jolt.law.harvard.edu/digest/intel-and-amd-
             | settlement
             | 
             | AMD didn't just magically do it on its own.
        
             | clhodapp wrote:
             | If that's the model, it sounds like the path would be to
             | burn money to stay right behind NVIDIA and wait for them to
             | become complacent and stumble technically, creating the
             | opportunity to leapfrog them. Keeping up could be very
             | expensive if they don't force something like the mutual
             | licensing requirements around x86.
        
           | mindcrime wrote:
           | Yep. This is very similar to the "catch-22" that IBM wound up
           | in with OS/2 and the Windows API. On the one hand, by
           | supporting Windows software on OS/2, they gave OS/2 customers
           | access to a ready base of available, popular software. But in
           | doing so, they also reduced the incentive for ISV's to
           | produce OS/2 native software that could take advantage of
           | unique features of OS/2.
           | 
           | It's a classic "between a rock and a hard place" scenario.
           | Quite a conundrum.
        
             | ianlevesque wrote:
             | Thinking about the highly adjacent graphics APIs history,
             | did anyone really 'win' the Direct3D, OpenGL, Metal, Vulkan
             | war? Are we benefiting from the fragmentation?
             | 
             | If the players in the space have naturally coalesced around
             | one over the last decade, can we skip the thrashing and
             | just go with it this time?
        
               | tadfisher wrote:
               | The game engines won. Folks aren't building Direct3D or
               | Vulkan renderers; they're using Unity or Unreal or Godot
               | and clicking "export" to target whatever API makes sense
               | for the platform.
               | 
               | WebGPU might be the thing that unifies the frontend API
               | for folks writing cross-platform renderers, seeing as
               | browsers will have to implement it on top of the platform
               | APIs anyway.
        
         | imtringued wrote:
         | This feels like a massive punch in the gut. An open-source
         | project, not ruined by AMD's internal mismanagement, gets shit
         | done within two years and AMD goes "meh"?!? There are billions
         | of dollars on the line! It's like AMD actively hates its
         | customers.
         | 
         | Now the only thing they need to do is make sure ROCm itself is
         | stable.
        
         | largbae wrote:
         | It certainly seems ironic that the company that beat Intel at
         | its own compatibility game with x86-64 would abandon
         | compatibility with today's market leader.
        
           | rob74 wrote:
           | The situation is a bit different: AMD got its foot in the
           | door with the x86 market because IBM back in the early 1980s
           | forced Intel to license the technology so AMD could act as a
           | second source of CPUs. In the GPU market, ATI (later bought
           | by AMD) and nVidia emerged as the market leaders after the
           | other 3D graphics pioneers (3Dfx) gave up - but their GPUs
           | were never compatible in the first place, and if AMD tried to
           | _make_ them compatible, nVidia could sue the hell out of
           | them...
        
         | alberth wrote:
         | DirectX vs OpenGL.
         | 
         | This brings back memories of the late 90s / early 00s, with
         | Microsoft pushing their proprietary graphics libraries
         | (DirectX) hard vs the open standard (OpenGL).
         | 
         | Fast forward 25 years and even today, Microsoft still dominates
         | in PC gaming as a result.
         | 
         | There's a bad track record for open GPU standards.
         | 
         | Even Apple themselves gave up on OpenGL and has their own
         | proprietary offering (Metal).
        
           | incrudible wrote:
           | To add to that, Linux gaming today is dominated by a wrapper
           | implementing DirectX.
        
             | Zardoz84 wrote:
             | Vulkan running an emulation of DirectX and being faster
        
           | Keyframe wrote:
           | Let's not forget the Fahrenheit maneuver by Microsoft that
           | left SGI stranded and OpenGL stalled.
        
             | pjmlp wrote:
             | Yeah, it never mattered to game consoles either way.
        
           | okanat wrote:
           | OpenGL was invented at SGI and it was closed source until it
           | was given away. It is very popular in its niche i.e. CAD
           | design because the original closed source SGI APIs were very
           | successful.
           | 
           | DirectX was targeted at gaming and was a much more limited,
           | simpler API, which made programming games in it easier. It
           | couldn't do everything that OpenGL could, which is why CAD
           | programs didn't use it even on Windows. DirectX worked
           | because it chose its market correctly and delivered what the
           | customers wanted. Windows' exceptional backwards
           | compatibility helped greatly as well. Many simple game
           | engines still use the DX9 API to this day.
           | 
           | It is not so much about having an open standard, but being
           | able to provide extra functionality and performance. Unlike
           | the CPU-dominated areas where executing the common baseline
           | ISA is very competitive, in accelerated computing using every
           | single bit of performance and having new and niche features
           | matter. So providing exceptional hardware with good software
           | is critical for the competition. Closed APIs have much
           | quicker delivery times and they don't have to deal with
           | multiple vendors.
           | 
           | Nobody except Nvidia delivers good enough low level software
           | and their hardware is exceptionally good. AMD offers
           | neither: the hardware is slower and it is hard to program,
           | so they continuously lose the race.
        
           | pjmlp wrote:
           | Also to note, despite urban myths, OpenGL never mattered on
           | game consoles, which people keep forgetting when praising
           | OpenGL "portability".
           | 
           | Then there is the whole issue of extension spaghetti, and
           | incompatibilities across OpenGL, OpenGL ES and WebGL; it is
           | hardly possible to have portable code 1:1 everywhere, beyond
           | toy examples.
        
             | beebeepka wrote:
             | I guess every recent not-xbox never mattered.
        
               | pjmlp wrote:
               | Like Nintendo, SEGA and Sony ones?
        
         | owlbite wrote:
         | Code portability isn't performance portability, a fact that
         | was driven home back in the bad old OpenCL era. Code is going
         | to have to be rewritten to be efficient on AMD architectures.
         | 
         | At which point, why tie yourself to the competitor's language?
         | Probably much more effective to just write a well-optimized
         | library that serves the MLIR (or whatever is popular) API in
         | order to run big ML jobs.
        
         | modeless wrote:
         | I've been critical of AMD's failure to compete in AI for over a
         | decade now, but I can see why AMD wouldn't want to go the route
         | of cloning CUDA and I'm surprised they even tried. They would
         | be on a never ending treadmill of feature catchup and bug-for-
         | bug compatibility, and wouldn't have the freedom to change the
         | API to suit their hardware.
         | 
         | The right path for AMD has always been to make their own API
         | that runs on _all_ of their own hardware, just as CUDA does for
         | Nvidia, and push support for that API into all the open source
         | ML projects (but mostly PyTorch), while attacking Nvidia 's
         | price discrimination by providing features they use to segment
         | the market (e.g. virtualization, high VRAM) at lower price
         | points.
         | 
         | Perhaps one day AMD will realize this. It seems like they're
         | slowly moving in the right direction now, and all it took for
         | them to wake up was Nvidia's market cap skyrocketing to 4th in
         | the world on the back of their AI efforts...
        
           | matchagaucho wrote:
           | But AMD was formed to shadow Intel's x86?
        
             | modeless wrote:
             | ISAs are smaller and less stateful and better documented
             | and less buggy and most importantly they evolve much more
             | slowly than software APIs. Much more feasible to clone.
             | Especially back when AMD started.
        
               | paulmd wrote:
               | PTX is just an ISA too. Programming languages and ISA
               | representations are effectively fungible; that's the
               | lesson of Microsoft CLR/Intermediate Language and Java
               | too. A "machine" is hardware plus a language.
        
               | modeless wrote:
               | PTX is not a hardware ISA though, it's still software and
               | can change more rapidly.
        
               | paulmd wrote:
               | Not without breaking the support contract? If you change
               | the PTX format then CUDA 1.0 machines can no longer run
               | it and it's no longer PTX.
               | 
               | Again, you are missing the point. Java is both a language
               | (java source) and a machine (the JVM). The latter is a
               | hardware ISA - there are processors that implement Java
               | bytecode as their ISA format. Yet most people who are
               | running Java are not doing so on java-machine hardware,
               | even though they _are_ using the Java ISA in the process.
               | 
               | https://en.wikipedia.org/wiki/Java_processor
               | 
               | https://en.wikipedia.org/wiki/Bytecode#Execution
               | 
               | any bytecode is an ISA, the bytecode spec defines the
               | machine and you can physically build such a machine that
               | executes bytecode directly. Or you can translate via an
               | intermediate layer, like how Transmeta Crusoe processors
               | executed x86 as bytecode on a VLIW processor (and how
               | most modern x86 processors actually use RISC micro-ops
               | inside).
               | 
               | these are completely fungible concepts. They are not
               | quite _the same thing_ but bytecode is clearly an ISA in
               | itself. Any given processor can _choose_ to use a
               | particular bytecode as either an ISA or translate it to
               | its native representation, and this includes both PTX,
               | Java, and x86 (among all other bytecodes). And you can do
               | the same for any other ISA (x86 as bytecode
               | representation, etc).
               | 
               | furthermore, what most people think of as "ISAs" aren't
               | necessarily so. For example RDNA2 is an ISA _family_ -
               | different processors have different capabilities (for
               | example 5500XT has mesh shader support while 5700XT does
               | not) and the APUs use a still different ISA internally
               | etc. GFX1101 is not the same ISA as GFX1103 and so on.
               | These are properly _implementations_ not ISAs, or if you
               | consider it to be an ISA then there is also a meta-ISA
               | encompassing larger groups (which also applies to x86 's
               | numerous variations). But people casually throw it all
               | into the "ISA" bucket and it leads to this imprecision.
               | 
               | like many things in computing, it's all a matter of
               | perspective/position. where is the boundary between "CMT
               | core within a 2-thread module that shares a front-end"
               | and "SMT thread within a core with an ALU pinned to one
               | particular thread"? It's a matter of perspective. Where
               | is the boundary of "software" vs "hardware" when
               | virtually every "software" implementation uses fixed-
               | function accelerator units and every fixed-function
               | accelerator unit is running a control program that
               | defines a flow of execution and has
               | schedulers/scoreboards multiplexing the execution unit
               | across arbitrary data flows? It's a matter of
               | perspective.
        
               | modeless wrote:
               | You are missing the point. PTX is not designed as a
               | vendor neutral abstraction like JVM/CLR bytecode.
               | Furthermore CUDA is a lot more than PTX. There's a whole
               | API there, plus applications ship machine code and rely
               | on Nvidia libraries which can be prohibited from running
               | on AMD by license and with DRM, so those large libraries
               | would also become part of the API boundary that AMD would
               | have to reimplement and support.
               | 
               | Chasing CUDA compatibility is a fool's errand when the
               | most important users of CUDA are open source. Just add
               | explicit AMD support upstream and skip the never ending
               | compatibility treadmill, and get better performance too.
               | And once support is established and well used the
               | community will pitch in to maintain it.
        
             | atq2119 wrote:
             | AMD was founded at almost the same time as Intel. X86
             | didn't exist at the time.
             | 
             | But yes, AMD was playing the "follow x86" game for a long
             | time until they came up with x86-64, which evened the
             | playing field in terms of architecture.
        
         | chem83 wrote:
         | To be fair to AMD, they've been trying to solve ML workload
         | portability at more fundamental levels with the acquisition of
         | Nod.ai and de-facto incorporation of Google's IREE compiler
         | project + MLIR.
        
         | whywhywhywhy wrote:
         | > Why would this not be AMD's top priority among priorities?
         | 
          | Same reason it wasn't when it was obvious Nvidia was taking
          | over this space maybe 8 years ago, when they let OpenCL die
          | and then proceeded to do nothing until it was too late.
         | 
         | Speaking to anyone working in general purpose GPU coding back
         | then they all just said the same thing, OpenCL was a nightmare
         | to work with and CUDA was easy and mature compared to it.
          | The writing was on the wall about where things were heading the
          | second you saw a photon-based renderer running on GPU vs CPU all
          | the way back then; AMD has only themselves to blame because
          | Nvidia basically showed them the potential with CUDA.
        
           | btown wrote:
           | One would hope that they've learned since then - but it could
           | very well be that they haven't!
        
         | phero_cnstrcts wrote:
         | Because the two CEOs are family? Like literally.
        
           | CamperBob2 wrote:
           | That didn't stop World War I...
        
       | mdre wrote:
       | Fun fact: ZLUDA means something like illusion/delusion/figment.
       | Well played! (I see the main dev is from Poland.)
        
         | Detrytus wrote:
         | You should also mention that CUDA in Polish means "miracles"
         | (plural).
        
       | miduil wrote:
        | Wow, this is great news. I really hope the community will find
        | ways to sustainably fund this project. Suddenly being able to run
        | a lot of innovative CUDA-based projects on AMD GPUs is a big
        | game-changer, especially because you don't have to deal with the
        | poor state of Nvidia support on Linux.
        
       | sam_goody wrote:
       | Aside from the latest commit, there has been no activity for
       | almost 3 years (latest code change on Feb 22, 2021).
       | 
       | People are criticizing AMD for dropping this, but it makes sense
       | to stop paying for development when the dev has stopped doing the
       | work, no?
       | 
       | And if he means that AMD stopped paying 3 years ago - well, that
        | was before dinosaurs and ChatGPT, and a lot has changed since
       | then.
       | 
       | https://github.com/vosen/ZLUDA/commits/v3
        
         | EspadaV9 wrote:
         | Pretty sure this was developed in private, but because AMD
         | cancelled the contract he has been allowed to open source the
         | code, and this is the "throw it over the fence" code dump.
        
           | rrrix1 wrote:
            | This.
            | 
            | 762 changed files with 252,017 additions and 39,027 deletions.
           | 
           | https://github.com/vosen/ZLUDA/commit/1b9ba2b2333746c5e2b05a.
           | ..
        
         | Ambroisie wrote:
         | My thinking is that the dev _did_ work on it for X amount of
         | time, but as part of their contract is not allowed to share the
          | _actual_ history of the repo, hence the massive code dump in
          | the "Nobody expects the Red Team" commit?
        
         | rswail wrote:
         | Have a look at the latest commit and the level of change.
         | 
         | Effectively the internal commits while he was working for AMD
         | aren't in the repo, but the squashed commit contains all of the
         | changes.
        
         | michaellarabel wrote:
         | As I wrote in the article, it was privately developed the past
         | 2+ years while being contracted by AMD during that time... In a
         | private GitHub repo. Now that he's able to make it public /
         | open-source, he squashed all the changes into a clean new
         | commit to make it public. The ZLUDA code from 3+ years ago was
         | when he was experimenting with CUDA on Intel GPUs.
        
         | SushiHippie wrote:
          | The code prior to this was all for the Intel GPU ZLUDA, and
          | the latest commit is all the AMD ZLUDA code, hence why the
          | commit talks about the red team.
        
         | Zopieux wrote:
         | If only this exact concern was addressed explicitly in the
         | first FAQ at the bottom of the README...
         | 
         | https://github.com/vosen/ZLUDA/tree/v3?tab=readme-ov-file#fa...
        
         | Detrytus wrote:
         | This is really interesting (from the project's README):
         | 
         | > AMD decided that there is no business case for running CUDA
         | applications on AMD GPUs.
         | 
         | Is AMD leadership brain-damaged, or something?
        
       | AndrewKemendo wrote:
       | ROCm is not spelled out anywhere in their documentation and the
       | best answers in search come from Github and not AMD official
       | documents
       | 
       | "Radeon Open Compute Platform"
       | 
       | https://github.com/ROCm/ROCm/issues/1628
       | 
       | And they wonder why they are losing. Branding absolutely matters.
        
         | rtavares wrote:
         | Later in the same thread:
         | 
         | > ROCm is a brand name for ROCm(tm) open software platform (for
         | software) or the ROCm(tm) open platform ecosystem (includes
         | hardware like FPGAs or other CPU architectures).
         | 
         | > Note, ROCm no longer functions as an acronym.
        
           | ametrau wrote:
           | >> Note, ROCm no longer functions as an acronym.
           | 
           | That is really dumb. Like LLVM.
        
         | marcus0x62 wrote:
         | That, and it only runs on a handful of their GPUs.
        
           | NekkoDroid wrote:
           | If you are talking about the "supported" list of GPUs, those
           | listed are only the ones they fully validate and QA test,
           | other of same gen are likely to work, but most likely with
           | some bumps along the way. In one of the a bit older phoronix
           | posts about ROCm one of their engeneers did say they are
           | trying to expand the list of validated & QA'd cards, as well
           | as destinguishing between "validated", "supported" and "non-
           | functional"
        
             | machomaster wrote:
              | They can say whatever they want, but actions are what
              | matter, not wishes and promises. And the reality is that the
              | list of supported GPUs has been unchanged since they first
              | announced it a year ago.
        
         | alwayslikethis wrote:
         | I mean, I also had to look up what CUDA stands for.
        
           | hasmanean wrote:
           | Compute unified device architecture ?
        
         | phh wrote:
         | I have no idea what CUDA stands for, and I live just fine
         | without knowing it.
        
           | moffkalast wrote:
           | Cleverly Undermining Disorganized AMD
        
           | rvnx wrote:
           | Countless Updates Developer Agony
        
             | egorfine wrote:
             | This is the right definition.
        
             | hyperbovine wrote:
             | Lost five hours of my life yesterday discovering the fact
             | that "CUDA 12.3" != "CUDA 12.3 Update 2".
             | 
             | (Yes, that's obvious, but not so obvious when your GPU
             | applications submitted to a cluster start crashing randomly
             | for no apparent reason.)
        
           | smokel wrote:
           | Compute Unified Device Architecture [1]
           | 
           | [1] https://en.wikipedia.org/wiki/CUDA
        
           | alfalfasprout wrote:
           | Crap, updates destroyed (my) application
        
         | sorenjan wrote:
         | Funnily enough it doesn't work on their RDNA ("Radeon DNA")
         | hardware (with some exceptions I think), but it's aimed at
          | their CDNA (Compute DNA). If they were to come up with a new
          | name today, it probably wouldn't include Radeon.
         | 
         | AMD seems to be a firm believer in separating the consumer
         | chips for gaming and the compute chips for everything else.
         | This probably makes a lot of sense from a chip design and
         | current business perspective, but I think it's shortsighted and
         | a bad idea. GPUs are very competent compute devices, and
         | basically wasting all that performance for "only" gaming is
         | strange to me. AI and other compute is getting more and more
         | important for things like image and video processing, language
         | models, etc. Not only for regular consumers, but for
         | enthusiasts and developers it makes a lot of sense to be able
         | to use your 10 TFLOPS chip even when you're not gaming.
         | 
         | While reading through the AMD CDNA whitepaper I saw this and
         | got a good chuckle. "culmination of years of effort by AMD"
         | indeed.
         | 
         | > The computational resources offered by the AMD CDNA family
         | are nothing short of astounding. However, the key to
         | heterogeneous computing is a software stack and ecosystem that
         | easily puts these abilities into the hands of software
         | developers and customers. The AMD ROCm 4.0 software stack is
         | the culmination of years of effort by AMD to provide an open,
         | standards-based, low-friction ecosystem that enables
         | productivity creating portable and efficient high-performance
         | applications for both first- and third-party developers.
         | 
         | https://www.amd.com/content/dam/amd/en/documents/instinct-bu...
        
           | slavik81 wrote:
           | ROCm works fine on the RDNA cards. On Ubuntu 23.10 and Debian
           | Sid, the system packages for the ROCm math libraries have
           | been built to run on every discrete Vega, RDNA 1, RDNA 2,
           | CDNA 1, and CDNA 2 GPU. I've manually tested dozens of cards
           | and every single one worked. There were just a handful of
           | bugs in a couple of the libraries that could easily be fixed
           | by a motivated individual. https://slerp.xyz/rocm/logs/full/
           | 
           | The system package for HIP on Debian has been stuck on ROCm
           | 5.2 / clang-15 for a while, but once I get it updated to ROCm
           | 5.7 / clang-17, I expect that all discrete RDNA 3 GPUs will
           | work.
        
             | stonogo wrote:
             | It doesn't matter to my lab whether it technically runs.
             | According to https://rocm.docs.amd.com/projects/install-on-
             | linux/en/lates... it only supports three commercially-
             | available Radeon cards (and four available Radeon Pro) on
             | Linux. Contrast this to CUDA, which supports literally
             | every nVIDIA card in the building, including the crappy NVS
             | series and weirdo laptop GPUs, and it basically becomes
             | impossible to convince anyone to develop for ROCm.
        
         | atq2119 wrote:
         | My understanding is that there was some trademark silliness
         | around "open compute", and AMD decided that instead of doing a
         | full rebrand, they would stick to ROCm but pretend that it
         | wasn't ever an acronym.
        
           | michaellarabel wrote:
           | Yeah it was due to the Open Compute Project AFAIK... Though
           | for a little while AMD was telling me they really meant to
           | call it "Radeon Open eCosystem" before then dropping that too
           | with many still using the original name.
        
         | slavik81 wrote:
         | That is intentional. We had to change the name. ROCm is no
         | longer an acronym.
        
           | AndrewKemendo wrote:
           | I assume you're on the team if you're saying "we"
           | 
           | Can you say why you had to change the name?
        
       | pjmlp wrote:
        | So polyglot programming workflows via PTX targeting are equally
       | supported?
        
       | michalf6 wrote:
       | Zluda roughly means "delusion" / "mirage" / "illusion" in Polish,
       | given the author is called Andrzej Janik this may be a pun :)
        
         | rvba wrote:
         | Arguably one could also translate it as "something that will
         | never happen".
         | 
         | At the same time "cuda" could be translated as "wonders".
        
       | eqvinox wrote:
       | Keeping my hopes curtailed until I see proper benchmarks...
        
       | hd4 wrote:
       | The interest in this thread tells me there are a lot of people
       | who are not cool with the CUDA monopoly.
        
         | smoldesu wrote:
         | Those people should have spoken up when their hardware
         | manufacturers abandoned OpenCL. The industry set itself 5-10
         | years behind by ignoring open GPGPU compute drivers while
         | Nvidia slowly built their empire. Just look at how long it's
          | taken to re-implement a _fraction_ of the CUDA feature set on a
         | small handful of hardware.
         | 
         | CUDA shouldn't exist. We should have hardware manufacturers
         | _working together_ , using common APIs and standardizing
         | instead of going for the throat. The further platforms drift
         | apart, the more valuable Nvidia's vertical integration becomes.
        
           | mnau wrote:
           | Common API means being replaceable, fungible. There are no
           | margins in that.
        
             | smoldesu wrote:
             | Correct. It's why the concept of 'proprietary UNIX' didn't
             | survive long once program portability became an incentive.
        
           | Avamander wrote:
           | Is my impression wrong, that people understood the need for
           | OCL only after CUDA had already cornered and strangled the
           | market?
        
             | smoldesu wrote:
             | You're mostly right. CUDA was a "sleeper product" that
             | existed early-on but didn't see serious demand until later.
             | OpenCL was Khronos Group's hedged bet against the success
             | of CUDA; it was assumed that they would invest in it more
             | as demand for GPGPU increased. After 10 years though,
             | OpenCL wasn't really positioned to compete and CUDA was
             | more fully-featured than ever. Adding insult to injury, OS
             | manufacturers like Microsoft and Apple started to avoid
             | standardized GPU libraries in favor of more insular native
             | APIs. By the time demand for CUDA materialized, OpenCL had
             | already been left for dead by most of the involved parties.
        
       | cashsterling wrote:
       | I feel like AMD's senior executives all own a lot of nVIDIA
       | stock.
        
       | lambdaone wrote:
       | It seems to me that AMD are crazy to stop funding this. CUDA-on-
       | ROCm breaks NVIDIA's moat, and would also act as a disincentive
       | for NVIDIA to make breaking changes to CUDA; what more could AMD
       | want?
       | 
       | When you're #1, you can go all-in on your own proprietary stack,
       | knowing that network effects will drive your market share higher
       | and higher for you for free.
       | 
       | When you're #2, you need to follow de-facto standards and work on
       | creating and following truly open ones, and try to compete on
       | actual value, rather than rent-seeking. AMD of all companies
       | should know this.
        
         | RamRodification wrote:
         | > and would also act as a disincentive for NVIDIA to make
         | breaking changes to CUDA
         | 
         | I don't know about that. You could kinda argue the opposite.
         | "We improved CUDA. Oh it stopped working for you on AMD
         | hardware? Too bad. Buy Nvidia next time"
        
           | mnau wrote:
           | Also known as OS/2: Redux strategy.
        
           | freeone3000 wrote:
           | Most CUDA applications do not target the newest CUDA version!
           | Despite 12.1 being out, lots of code still targets 7 or 8 to
           | support old NVIDIA cards. Similar support for AMD isn't
           | unthinkable (but a rewrite to rocm would be).
        
           | outside415 wrote:
           | NVIDIA is about ecosystem plays, they have no interest in
           | sabotage or anti competition plays. Leave that to apple and
           | google and their dumb app stores and mobile OSs.
        
             | 0x457 wrote:
             | > NVIDIA is about ecosystem plays, they have no interest in
             | sabotage or anti competition plays.
             | 
              | Are we talking about the same NVIDIA? Nvidia's entire GPU
              | strategy is: make a feature (or find an existing one) that
              | performs better on their cards, then pay developers to use
              | (and sometimes misuse) it extensively.
        
         | saboot wrote:
         | Yep, I develop several applications that use CUDA. I see
         | AMD/Radeon powered computers for sale and want to buy one, but
         | I am not going to risk not being able to run those applications
         | or having to rewrite them.
         | 
         | If they want me as a customer, and they have not created a
         | viable alternative to CUDA, they need to pursue this.
        
           | weebull wrote:
           | Define "viable"?
        
         | tester756 wrote:
         | If you see:
         | 
          | 1) billions of dollars at stake,
          | 
          | 2) one of the most successful leadership teams, and
          | 
          | 3) the hottest period of their business, where they have heard
          | about Nvidia's moat probably thousands of times during the last
          | 18 months...
          | 
          | and you call some decision "crazy", then you probably do not
          | have the same information that they do.
          | 
          | Or they underperformed, who knows, but I bet on the first.
        
       | 2OEH8eoCRo0 wrote:
       | Question: Why aren't we using LLMs to translate programs to use
       | ROCm?
       | 
       | Isn't translation one of the strengths of LLMs?
        
         | JonChesterfield wrote:
          | You can translate CUDA to HIP using a regex. An LLM is rather
          | overkill.
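          | 
          | (A minimal sketch of how mechanical that mapping is, assuming a
          | working hipcc install; the renames noted in the comments are the
          | kind of thing hipify-perl does with regular expressions, and
          | swapping them back along with the header gives you the CUDA
          | version of the same program.)
          | 
          |     #include <hip/hip_runtime.h>  // CUDA version: <cuda_runtime.h>
          | 
          |     __global__ void scale(float* x, float a, int n) {
          |         // Kernel body is identical in CUDA and HIP.
          |         int i = blockIdx.x * blockDim.x + threadIdx.x;
          |         if (i < n) x[i] *= a;
          |     }
          | 
          |     int main() {
          |         const int n = 1 << 20;
          |         float* d_x = nullptr;
          |         hipMalloc(&d_x, n * sizeof(float));             // CUDA: cudaMalloc
          |         scale<<<(n + 255) / 256, 256>>>(d_x, 2.0f, n);  // launch syntax unchanged
          |         hipDeviceSynchronize();                         // CUDA: cudaDeviceSynchronize
          |         hipFree(d_x);                                   // CUDA: cudaFree
          |         return 0;
          |     }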
        
       | mogoh wrote:
        | Why is CUDA so prevalent as opposed to its alternatives?
        
         | smoldesu wrote:
         | At first, it was because Nvidia had a wide variety of highly
          | used cards that almost all support some form of CUDA. By and
          | large, your gaming GPU could debug and run the same code that
         | you'd scale up to a datacenter, which was a huge boon for
         | researchers and niche industry applications.
         | 
         | With that momentum, CUDA got incorporated into a _lot_ of high-
         | performance computing applications. Few alternatives show up
          | because there aren't many acceleration frameworks that are as
         | large or complete as CUDA. Nvidia pushed forward by scaling
         | down to robotics and edge-compute scale hardware, and now are
         | scaling up with their DGX/Grace platforms.
         | 
         | Today, Nvidia is prevalent because all attempts to subvert them
         | have failed. Khronos Group tried to get the industry to rally
         | around OpenCL as a widely-supported alternative, but too many
         | stakeholders abandoned it before the initial crypto/AI booms
         | kicked off the demand for GPGPU compute.
        
         | JonChesterfield wrote:
          | OpenCL was the alternative; it came along later and couldn't
          | express a lot of programs that CUDA can. CUDA is legitimately
          | better than OpenCL.
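          | 
          | (For flavor, a minimal sketch of a complete CUDA vector add,
          | assuming any reasonably recent CUDA toolkit; the comment near
          | the end lists the host-side plumbing the OpenCL 1.x equivalent
          | needs before the kernel even runs.)
          | 
          |     #include <cuda_runtime.h>
          | 
          |     __global__ void vadd(const float* a, const float* b, float* c, int n) {
          |         int i = blockIdx.x * blockDim.x + threadIdx.x;
          |         if (i < n) c[i] = a[i] + b[i];
          |     }
          | 
          |     int main() {
          |         const int n = 1024;
          |         float *a, *b, *c;
          |         cudaMallocManaged(&a, n * sizeof(float));  // unified memory keeps this short
          |         cudaMallocManaged(&b, n * sizeof(float));
          |         cudaMallocManaged(&c, n * sizeof(float));
          |         for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }
          |         vadd<<<(n + 255) / 256, 256>>>(a, b, c, n);
          |         cudaDeviceSynchronize();
          |         // The OpenCL 1.x version also needs clGetPlatformIDs,
          |         // clGetDeviceIDs, clCreateContext, clCreateCommandQueue,
          |         // clCreateProgramWithSource, clBuildProgram, clCreateKernel,
          |         // clSetKernelArg and clEnqueueNDRangeKernel before this point.
          |         cudaFree(a); cudaFree(b); cudaFree(c);
          |         return 0;
          |     }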
        
       | enonimal wrote:
       | From the ARCHITECTURE.md:
       | 
        | > Those pointers point to undocumented functions forming the CUDA
        | Dark API. It's impossible to tell how many of them exist, but
        | debugging experience suggests there are tens of function pointers
        | across tens of tables. A typical application will use one or two
        | of the most common. Due to their undocumented nature they are
        | exclusively used by the Runtime API and NVIDIA libraries (and by
        | CUDA applications in turn). We don't have the names of those
        | functions, nor the names or types of the arguments. This makes
        | implementing them time-consuming. Dark API functions are
        | reverse-engineered and implemented by ZLUDA on a case-by-case
        | basis once we observe an application making use of them.
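        | 
        | (A minimal sketch of the mechanism described above, assuming the
        | public CUDA Driver API; the zeroed UUID is purely a placeholder,
        | since the real table IDs are only learned by tracing
        | applications.)
        | 
        |     #include <cuda.h>
        |     #include <cstdio>
        | 
        |     int main() {
        |         cuInit(0);
        |         // Real tables are keyed by UUIDs NVIDIA never documents;
        |         // a zeroed UUID stands in for one here.
        |         CUuuid table_id = {};
        |         const void* table = nullptr;
        |         if (cuGetExportTable(&table, &table_id) == CUDA_SUCCESS) {
        |             // 'table' points at an array of undocumented function
        |             // pointers whose count, names and signatures have to be
        |             // inferred by watching how applications call into them.
        |             std::printf("got an export table at %p\n", table);
        |         } else {
        |             std::printf("no table for this UUID (expected here)\n");
        |         }
        |         return 0;
        |     }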
        
         | leeoniya wrote:
         | fertile soil for Alyssa and Asahi Lina :)
         | 
         | https://rosenzweig.io/
         | 
         | https://vt.social/@lina
        
           | smcl wrote:
           | I know that Lina doesn't like a lot of the attention HN sends
           | her way so it may be better if you don't link her socials
           | here.
        
             | bigdict wrote:
             | Sounds ridiculous, why have a public presence on a social
             | network then?
        
               | cyanydeez wrote:
               | oh. I think the emphasis is on hacker news.
               | 
               | you know certain social media sites contain certain toxic
               | conversants.
        
               | bigdict wrote:
               | > you know certain social media sites contain certain
               | toxic conversants.
               | 
               | That's just people...
        
         | PoignardAzur wrote:
         | Having an ARCHITECTURE.md file at all is extremely promising,
         | but theirs seems pretty polished too!
        
         | gdiamos wrote:
         | These were a huge pain in the ass when I tried this 20 years
         | ago on Ocelot.
         | 
         | Eventually one of the NVIDIA engineers just asked me to join
         | and I did. :-P
        
       | Cu3PO42 wrote:
       | I'm really rooting for AMD to break the CUDA monopoly. To this
       | end, I genuinely don't know whether a translation layer is a good
       | thing or not. On the upside it makes the hardware much more
       | viable instantly and will boost adoption, on the downside you run
       | the risk that devs will never support ROCm, because you can just
       | use the translation layer.
       | 
       | I think this is essentially the same situation as Proton+DXVK for
       | Linux gaming. I think that that is a net positive for Linux, but
       | I'm less sure about this. Getting good performance out of GPU
       | compute requires much more tuning to the concrete architecture,
       | which I'm afraid devs just won't do for AMD GPUs through this
       | layer, always leaving them behind their Nvidia counterparts.
       | 
       | However, AMD desperately needs to do something. Story time:
       | 
       | On the weekend I wanted to play around with Stable Diffusion. Why
       | pay for cloud compute, when I have a powerful GPU at home, I
       | thought. Said GPU is a 7900 XTX, i.e. the most powerful consumer
       | card from AMD at this time. Only very few AMD GPUs are supported
       | by ROCm at this time, but mine is, thankfully.
       | 
        | So, how hard could it possibly be to get Stable Diffusion running
        | on my GPU? Hard. I don't think my problems were actually caused by
       | AMD: I had ROCm installed and my card recognized by rocminfo in a
       | matter of minutes. But the whole ML world is so focused on Nvidia
       | that it took me ages to get a working installation of pytorch and
       | friends. The InvokeAI installer, for example, asks if you want to
       | use CUDA or ROCm, but then always installs the CUDA variant
       | whatever you answer. Ultimately, I did get a model to load, but
       | the software crashed my graphical session before generating a
       | single image.
       | 
       | The whole experience left me frustrated and wanting to buy an
       | Nvidia GPU again...
        
         | Certhas wrote:
         | They are focusing on HPC first. Which seems reasonable if your
         | software stack is lacking. Look for sophisticated customers
         | that can help build an ecosystem.
         | 
         | As I mentioned elsewhere, 25% of GPU compute on the Top 500
         | Supercomputer list is AMD. This all on the back of a card that
         | came out only three years ago. We are very rapidly moving
         | towards a situation where there are many, many high-performance
         | developers that will target ROCm.
        
           | ametrau wrote:
           | Is a top 500 super computer list a good way of measuring
           | relevancy in the future?
        
             | latchkey wrote:
             | No, it isn't. What is a better measure is to look at
             | businesses like what I'm building (and others), where we
             | take on the capex/opex risk around top end AMD products and
             | bring them to the masses through bare metal rentals.
             | Previously, these sorts of cards were only available to the
             | Top 500.
        
         | whywhywhywhy wrote:
         | > I'm really rooting for AMD to break the CUDA monopoly
         | 
         | Personally I want Nvidia to break the x86-64 monopoly, with how
         | amazing properly spec'd Nvidia cards are to work with I can
         | only dream of a world where Nvidia is my CPU too.
        
           | kuschkufan wrote:
           | apt username
        
           | smcleod wrote:
           | That's already been done with ARM.
        
           | weebull wrote:
           | > Personally I want Nvidia to break the x86-64 monopoly
           | 
           | The one supplied by two companies?
        
             | Keyframe wrote:
              | Maybe he meant homogeneity, which Nvidia did try and still
              | tries with Arm... but on the other hand, how wild would it
              | be for Nvidia to enter x86-64 as well? It's probably never
              | going to happen, due to licensing if nothing else; let's not
              | forget the nForce chipset ordeal with Intel legal.
        
         | bntyhntr wrote:
         | I would love to be able to have a native stable diffusion
         | experience, my rx 580 takes 30s to generate a single image. But
         | it does work after following
         | https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki...
         | 
         | I got this up and running on my windows machine in short order
         | and I don't even know what stable diffusion is.
         | 
         | But again, it would be nice to have first class support to
         | locally participate in the fun.
        
           | Cu3PO42 wrote:
           | I have heard that DirectML was a somewhat easier story, but
           | allegedly has worse performance (and obviously it's Windows
            | only...). But I'm not entirely surprised that setup is
           | somewhat easier on Windows, where bundling everything is an
           | accepted approach.
           | 
           | With AMD's official 15GB(!) Docker image, I was now able to
           | get the A1111 UI running. With SD 1.5 and 30 sample
           | iterations, generating an image takes under 2s. I'm still
           | struggling to get InvokeAI running.
        
         | westurner wrote:
         | > _Proton+DXVK for Linux gaming_
         | 
         | "Building the DirectX shader compiler better than Microsoft?"
         | (2024) https://news.ycombinator.com/item?id=39324800
         | 
         | E.g. llama.cpp already supports hipBLAS; is there an advantage
         | to this ROCm CUDA-compatibility layer - ZLUDA on Radeon (and
         | not yet Intel OneAPI) - instead or in addition?
         | https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#hi...
         | https://news.ycombinator.com/item?id=38588573
         | 
         | What can't WebGPU abstract away from CUDA unportability?
         | https://news.ycombinator.com/item?id=38527552
        
         | nocombination wrote:
         | As other folks have commented, CUDA not being an open standard
         | is a large part of the problem. That and the developers who
         | target CUDA directly when writing Stable Diffusion algorithms--
         | they are forcing the monopoly. Even at the cost of not being
         | able to squeeze every ounce out of the GPU, portability greatly
         | improves software access when people target Vulkan et al.
        
         | formerly_proven wrote:
         | > I'm really rooting for AMD to break the CUDA monopoly. To
         | this end, I genuinely don't know whether a translation layer is
         | a good thing or not. On the upside it makes the hardware much
         | more viable instantly and will boost adoption, on the downside
         | you run the risk that devs will never support ROCm, because you
         | can just use the translation layer.
         | 
         | On the other hand:
         | 
          | > The next major ROCm release (ROCm 6.0) will not be backward
          | compatible with the ROCm 5 series. [source]
         | 
         | Even worse, not even the driver is backwards-compatible:
         | 
         | > There are some known limitations though like currently only
         | targeting the ROCm 5.x API and not the newly-released ROCm 6.x
          | releases. In turn, having to stick to the ROCm 5.7 series as the
         | latest means that using the ROCm DKMS modules don't build
         | against the Linux 6.5 kernel now shipped by Ubuntu 22.04 LTS
         | HWE stacks, for example. Hopefully there will be enough
         | community support to see ZLUDA ported to ROCM 6 so at least it
         | can be maintained with current software releases.
        
         | nialv7 wrote:
         | I am surprised that everybody seem to have forgotten the
         | (in)famous Embrace, Extend and Extinguish strategy.
         | 
         | It's time for Open Source to be on the extinguishing side for
         | once.
        
         | sophrocyne wrote:
         | Hey there -
         | 
         | I'm a maintainer (and CEO) of Invoke.
         | 
         | It's something we're monitoring as well.
         | 
         | ROCm has been challenging to work with - we're actively talking
         | to AMD to keep apprised of ways we can mitigate some of the
         | more troublesome experiences that users have with getting
         | Invoke running on AMD (and hoping to expand official support to
         | Windows AMD)
         | 
         | The problem is that a lot of the solutions proposed involve
         | significant/unsustainable dev effort (i.e., supporting an
         | entirely different inference paradigm), rather than "drop in"
         | for the existing Torch/diffusers pipelines.
         | 
          | While I don't know enough about your setup to offer immediate
          | solutions, if you join the Discord, I'm sure folks would be
          | happy to walk through some manual
          | troubleshooting/experimentation to get you up and running -
          | discord.gg/invoke-ai
        
           | latchkey wrote:
           | Invoke is awesome. Let me know if you guys want some MI300x
           | to develop/test on. =) We've also got some good contacts at
           | AMD if you need help there as well.
        
       | CapsAdmin wrote:
        | Hope this can benefit from the seemingly infinite enthusiasm of
        | Rust programmers.
        
       | sharts wrote:
        | AMD fails to realize the software toolchain is what makes Nvidia
        | great. AMD thinks the hardware is all that's needed.
        
         | JonChesterfield wrote:
         | Nvidia's toolchain is really not great. Applications are just
         | written to step around the bugs.
         | 
         | ROCm has different bugs, which the application workarounds tend
         | to miss.
        
           | bornfreddy wrote:
           | Yes. This is what makes Nvidia's toolchain, if not great, at
           | least ok. As a developer I can actually use their GPUs. And
            | what I developed locally I can then run on Nvidia hardware in
           | the cloud and pay by usage.
           | 
           | AMD doesn't seem to understand that affordable entry-level
           | hardware with good software support is key.
        
             | JonChesterfield wrote:
             | Ah yes, so that one does seem to be a stumbling block. ROCm
             | is not remotely convinced that running on gaming cards is a
             | particularly useful thing. HN is really sure that being
             | able to develop code on ~free cards that you've got lying
             | around anyway is an important gateway to running on amdgpu.
             | 
             | The sad thing is people can absolutely run ROCm on gaming
             | cards if they build from source. Weirdly GPU programmers
             | seem determined to use proprietary binaries to run
             | "supported" hardware, and thus stick with CUDA.
             | 
             | I don't understand why AMD won't write the names of some
             | graphics cards under "supported", even if they didn't test
             | them as carefully as the MI series, and I don't understand
             | why developers are so opposed to compiling their toolchains
             | from source. For one thing it means you can't debug the
             | toolchain effectively when it falls over, weird limitation
             | to inflict on oneself.
             | 
             | Strange world.
        
       | CapsAdmin wrote:
       | One thing I didn't see mentioned anywhere apart from the repos
       | readme:
       | 
       | > PyTorch received very little testing. ZLUDA's coverage of cuDNN
       | APIs is very minimal (just enough to run ResNet-50) and
       | realistically you won't get much running.
        
       | yieldcrv wrote:
       | Sam could get more chips for way less than $7 trillion if he
       | helps fund and mature this
        
         | JonChesterfield wrote:
         | I'm pretty tired of the business model of raising capital from
         | VCs to give to Nvidia.
        
       | Keyframe wrote:
        | This release is, however, the result of AMD stopping its funding,
        | per "After two years of development and some deliberation, AMD
       | decided that there is no business case for running CUDA
       | applications on AMD GPUs. One of the terms of my contract with
       | AMD was that if AMD did not find it fit for further development,
       | I could release it. Which brings us to today." from
       | https://github.com/vosen/ZLUDA?tab=readme-ov-file#faq
       | 
       | so, same mistake intel made before.
        
         | VoxPelli wrote:
         | Sounds like he had a good contract, would be great to read more
         | about that, hopefully more devs could include the same
         | phrasing!
        
         | nikanj wrote:
         | This should be the top comment here, people are getting their
         | hopes up for nothing
        
         | jacoblambda wrote:
         | I mean it could also be that there was no business case for it
         | as long as it remained closed source work.
         | 
         | If the now very clearly well functioning implementation
         | continues to perform as well as it is, the community may be
         | able to keep it funded and functioning.
         | 
         | And the other side of this is that with renewed AMD
         | interest/support for the rocm/HIP project, it might be just
         | good enough as a stopgap step to push projects towards rocm/HIP
         | adoption. (included below is another blurb from the readme).
         | 
         | > I am a developer writing CUDA code, does this project help me
         | port my code to ROCm/HIP?
         | 
         | > Currently no, this project is strictly for end users. However
         | this project could be used for a much more gradual porting from
         | CUDA to HIP than anything else. You could start with an
         | unmodified application running on ZLUDA, then have ZLUDA expose
         | the underlying HIP objects (streams, modules, etc.), allowing
         | to rewrite GPU kernels one at a time. Or you could have a mixed
         | CUDA-HIP application where only the most performance sensitive
         | GPU kernels are written in the native AMD language.
        
         | pk-protect-ai wrote:
         | > After two years of development and some deliberation, AMD
         | decided that there is no business case for running CUDA
         | applications on AMD GPUs
         | 
          | Who was responsible for this project at AMD and why is he still
          | not fired???????? How brain-dead does someone have to be to
          | reject such a major market share??????
        
         | tgsovlerkhgsel wrote:
         | How is this not priority #1 for them, with NVIDIA stock
         | shooting to the moon because everyone does machine learning
         | using CUDA-centric tools?
         | 
         | If AMD could get 90% of the CUDA ML stuff to seamlessly run on
         | AMD hardware, and could provide hardware at a competitive cost-
         | per-performance (which I assume they probably could since
         | NVIDIA must have an insane profit margin on their GPUs),
          | wouldn't that be _the_ opportunity to eat NVIDIA's lunch?
        
           | make3 wrote:
            | It's a common misconception that deep learning stuff is built
            | in CUDA. It's actually built on cuDNN kernels that don't use
            | CUDA but are GPU assembly written by hand by PhDs. I'm really
            | not convinced this project could be used for that. The ROCm
            | kernels that are analogous to cuDNN, though, yes.
        
           | pheatherlite wrote:
            | The only reason our lab bought 20k worth of Nvidia GPU cards
            | rather than AMD was that CUDA is the industry standard (might
            | as well be). It's kind of mind-boggling how much business AMD
            | must be losing over this.
        
       | shmerl wrote:
        | Anything that breaks CUDA lock-in is great! This reminds me of
        | how DX/D3D lock-in was broken by dxvk and vkd3d-proton.
       | 
       |  _> It apparently came down to an AMD business decision to
       | discontinue the effort_
       | 
        | Bad decision if that's the case. Maybe someone can pick it up,
        | since it's open now.
        
       | swozey wrote:
       | I may have missed it in the article, but this post would mean
       | absolutely nothing to me except for the fact that last week I got
       | into stable diffusion so I'm crushing my 4090 with pytorch and
       | deepspeed, etc and dealing with a lot of nvidia ctk/sdk stuff.
       | Well, I'm actually trying to do this in windows w/ wsl2 and
       | deepmind/torch/etc in containers and it's completely broken so
       | not crushing currently.
       | 
        | I guess a while ago it was found that Nvidia was bypassing the
        | kernel's GPL license driver check, and I read that kernel 6.6 was
       | going to lock that driver out if they didn't fix it, and from
       | what I've read there was no reply or anything done by nvidia yet.
       | Which I think I probably just can't find.
       | 
       | Am I wrong about that part?
       | 
       | We're on kernel 6.7.4 now and I'm still using the same drivers.
       | Did it get pushed back, did nvidia fix it?
       | 
       | Also, while trying to find answers myself I came across this 21
       | year old post which is pretty funny and very apt for the topic
       | https://linux-kernel.vger.kernel.narkive.com/eVHsVP1e/why-is...
       | 
       | I'm seeing conflicting info all over the place so I'm not really
       | sure what the status of this GPL nvidia driver block thing is.
        
       | zoobab wrote:
       | "For reasons unknown to me, AMD decided this year to discontinue
       | funding the effort and not release it as any software product."
       | 
       | Managers at AMD never heard of AI?
        
       | ultra_nick wrote:
       | If anyone wants to work in this area, AMD currently has a lot of
       | related job posts open.
        
       | navbaker wrote:
       | The other big need is for a straightforward library for dynamic
       | allocation/sharing of GPUs. Bitfusion was a huge pain in the ass,
       | but at least it was something. Now it's been discontinued, the
       | last version doesn't support any recent versions of PyTorch, and
       | there's only two(?) possible replacements in varying levels of
       | readiness (Juice and RunAI). We're experimenting now with
       | replacing our Bitfusion installs with a combination of Jupyter
       | Enterprise Gateway and either MIGed GPUs or finding a way to get
       | JEG to talk to a RunAI installation to allow quick allocation and
       | deallocation of portions of GPUs for our researchers.
        
       | irusensei wrote:
        | I'll try it later with projects I had trouble getting to work,
        | like TortoiseTTS. I'm not expecting speeds comparable to Nvidia,
        | but definitely faster than pure CPU.
        
       | Farfignoggen wrote:
       | Phoronix Article from earlier(1):
       | 
       | "While AMD ships pre-built ROCm/HIP stacks for the major
        | enterprise Linux distributions, if you are not using one of them
       | or just want to be adventurous and compile your own stack for
       | building HIP programs for running on AMD GPUs, one of the AMD
       | Linux developers has written a how-to guide. "(1)
       | 
       | (1)
       | 
       | "Building An AMD HIP Stack From Upstream Open-Source Code
       | 
       | Written by Michael Larabel in Radeon on 9 February 2024 at 06:45
       | AM EST."
       | 
       | https://www.phoronix.com/news/Building-Upstream-HIP-Stack
        
         | JonChesterfield wrote:
         | Hahnle is one of our best, that'll be solid.
         | http://nhaehnle.blogspot.com/2024/02/building-hip-
         | environmen.... Looks pretty similar to how I build it.
         | 
         | Side point, there's a driver in your linux kernel already
         | that'll probably work. The driver that ships with rocm is a
         | newer version of the same and might be worth building via dkms.
         | 
         | Very strange that the rocm github doesn't have build scripts
         | but whatever, I've been trying to get people to publish those
         | for almost five years now and it just doesn't seem to be
         | feasible.
        
         | rekado wrote:
         | You can also install HIP/ROCm via Guix:
         | 
         | https://hpc.guix.info/blog/2024/01/hip-and-rocm-come-to-guix...
         | 
         | > AMD has just contributed 100+ Guix packages adding several
         | versions of the whole HIP and ROCm stack
        
       | fancyfredbot wrote:
        | Wouldn't it be fun to make this work on Intel graphics as well?
        
       | codedokode wrote:
        | As I understand it, Vulkan allows running custom code on the GPU,
        | including code to multiply matrices. Can one simply use Vulkan and
        | ignore CUDA, PyTorch and ROCm?
        
       ___________________________________________________________________
       (page generated 2024-02-12 23:00 UTC)