[HN Gopher] Testing AMD's Giant MI300X
___________________________________________________________________
Testing AMD's Giant MI300X
Author : klelatti
Score : 167 points
Date : 2024-06-25 15:36 UTC (7 hours ago)
(HTM) web link (chipsandcheese.com)
(TXT) w3m dump (chipsandcheese.com)
| pella wrote:
| from the summary:
|
| _" When it is all said and done, MI300X is a very impressive
| piece of hardware. However, the software side suffers from a
| chicken-and-egg dilemma. Developers are hesitant to invest in a
| platform with limited adoption, but the platform also depends on
| their support. Hopefully the software side of the equation gets
| into ship shape. Should that happen, AMD would be a serious
| competitor to NVIDIA."_
| firebaze wrote:
| OTOH, the performance advantage compared to the H100 is super-
| impressive according to tfa. Things could become interesting
| again in the GPU market.
| bearjaws wrote:
| IMO Nvidia is going to force companies to fix this. Nvidia has
| always made it clear it will increase prices and capture 90%
| of your profits when left free to do so. See any example from
| the GPU vendor space. There isn't infinite money to be spent
| per token, so it's not like the AI companies can just increase
| prices.
|
| That AMD can offer this product at a 40% discount and still
| make money tells you all you need to know.
| amelius wrote:
| I'm personally wondering when nVidia will open an AI
| AppStore, and every app that runs on nVidia hardware will
| have to be notarized first, and you'll have to pay 30% of
| your profits to nVidia.
|
| History has shown that this idea is not as crazy as it
| sounds.
| latchkey wrote:
| The news you've all been waiting for!
|
| We are thrilled to announce that Hot Aisle Inc. proudly
| volunteered our system for Chips and Cheese to use in their
| benchmarking and performance showcase. This collaboration has
| demonstrated the exceptional capabilities of our hardware and
| further highlighted our commitment to cutting-edge technology.
|
| Stay tuned for more exciting updates!
| JonChesterfield wrote:
| Thank you for loaning the box out! Has a lot more credibility
| than the vendor saying it runs fast
| latchkey wrote:
| Thanks Jon, that's exactly the idea. About $12k worth of free
| compute on a box that costs as much as a Ferrari.
|
| Funny that HN doesn't like my comment for some reason though.
| renewiltord wrote:
| It reads like the kind of chumbox PR you read at the bottom
| of a random website. Get a copywriter or something like
| writer.ai. I thought your comment was spam and nearly
| flagged it. It really is atrocious copy.
| jampekka wrote:
| I thought it was sarcastic.
| logicallee wrote:
| [retracted]
| latchkey wrote:
| This is the news that many people have been waiting for
| and we do have more exciting updates coming. There is
| another team on the system now doing testing. We have a
| list of 22 people currently waiting.
| logicallee wrote:
| okay, I've retracted my comments. Thanks for your
| generosity.
| latchkey wrote:
| Thanks, but I wouldn't call it generosity. We're helping
| AMD build a developer flywheel and that is very much to
| our benefit. The more developers using these chips, the
| more chips that are needed, the more we buy to rent out,
| the more our business grows.
|
| Previously, this stuff was only available to HPC
| applications. We're trying to get these into the hands of
| more developers. Our view is that this is a great way to
| foster the ecosystem.
|
| Our simple and competitive pricing reflects this as well.
| klelatti wrote:
| Do you think this comment will make Hot Aisle more or
| less likely to loan out their hardware in the future?
|
| Personally, I couldn't care less about the quality of
| copy. I do care about having access to similar hardware
| in the future.
| latchkey wrote:
| Heh, I didn't even think of that, but you make a good
| point. Don't worry though, we will keep the access
| coming. I hate to say it, but it literally is... stay
| tuned for more exciting updates.
| klelatti wrote:
| Thanks so much for doing that. There are loads of people
| here who really appreciate it. We will stay tuned!
| alecco wrote:
| Don't sweat it. Some people are trigger happy on downvoting
| things looking like self-promotion due to the sheer amount
| of spam everywhere. Your sponsorship (?) is the right way
| to promote your company. Thank you.
| elorant wrote:
| Even if the community provides support it could take years to
| reach the maturity of CUDA. So while it's good to have some
| competition, I doubt it will make any difference in the immediate
| future. Unless some of the big corporations in the market lean in
| heavily and support the framework.
| JonChesterfield wrote:
| Something like Triton from Microsoft/OpenAI as a CUDA bypass?
| Or pytorch/tensorflow targeting ROCm without user intervention
| (sketched below).
|
| Or there's OpenMP or HIP. In extremis, OpenCL.
|
| I think the language stack is fine at this point. The moat
| isn't in CUDA the tech. It's in code running reliably on
| Nvidia's stack, without things like stray pointers needing a
| machine reboot. Hard to know how far off robust ROCm is at this
| point.
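|
| A minimal sketch of that "without user intervention" path,
| assuming a ROCm build of PyTorch is installed and an AMD GPU is
| visible (ROCm builds reuse the CUDA device API, so the device
| string stays "cuda"):
|
|     import torch
|
|     # torch.version.hip is set on ROCm builds and None on CUDA
|     # builds, which is the easiest way to tell the backends apart.
|     print(torch.version.hip, torch.cuda.is_available())
|
|     # Existing CUDA-targeted code typically runs unchanged:
|     # "cuda" maps to the AMD GPU under ROCm.
|     x = torch.randn(4096, 4096, device="cuda")
|     print((x @ x).sum().item())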
| jeroenhd wrote:
| If, and that's a big if, AMD can get ROCm working well for this
| chip, I don't think this will be a big problem.
|
| ROCm can be spotty, especially on consumer cards, but for many
| models it does seem to work on their more expensive cards. It
| may be worth spending a few hours/days/weeks to work around
| the peculiarities of ROCm given the cost difference between AMD
| and Nvidia in this market segment.
|
| This all stands or falls with how well AMD can get ROCm to
| work. As this article states, it's nowhere near ready yet, but
| one or two updates can turn AMD's accelerators from "maybe in
| 5-10 years" to "we must consider this next time we order
| hardware".
|
| I also wonder if AMD is going to put any effort into ROCm (or a
| similar framework) as a response to Qualcomm and other ARM
| manufacturers creaming them on AI stuff. If these Copilot PCs
| take off, we may see AMD invest into their AI compatibility
| libraries because of interest from both sides.
| entropicdrifter wrote:
| If we're extremely lucky they might invest in SYCL and we'll
| see an Intel/AMD open-source teamup
| pbalcer wrote:
| > Qualcomm and other ARM manufacturers creaming them on AI
| stuff
|
| That's mostly on Microsoft's DirectML though. I'm not sure
| whether AMD's implementation is based on ROCm (doubt it).
| jjoonathan wrote:
| So I knew that AMD's compute stack was a buggy mess -- nobody
| starts out wanting to pay more for less and I had to learn
| the hard way how big of a gap there was between AMD's paper
| specs and their actual offerings -- and I also knew that
| Nvidia had a huge edge at the cutting edge of things, if you
| need gigashaders or execution reordering or whatever, but ML
| isn't any of that. The calculations are "just" matrix
| multiplication, or not far off.
|
| I would have thought AMD could have scrambled to fix their
| bugs, at least the matmul related ones, scrambled to shore up
| torch compatibility or whatever was needed for LLM training,
| and pushed something out the door that might not have been
| top-of-market but could at least have taken advantage of the
| opportunity provided by 80% margins from team green. I
| thought the green moat was maybe a year wide and tens of
| millions deep (enough for a team to test the bugs, a team to
| fix the bugs, time to ramp, and time to make it happen). But
| here we are, multiple years and trillions in market cap delta
| later, and AMD still seems to be completely non-viable. What
| happened? Did they go into denial about the bugs? Did they
| fix the bugs but the industry still doesn't trust them?
| JonChesterfield wrote:
| It's roughly that the AMD tech works reasonably well on HPC
| and less convincingly on "normal" hardware/systems. So a
| lot of AMD internal people think the stack is solid because
| it works well on their precisely configured dev machines
| and on the commercially supported clusters.
|
| Other people think it's buggy and useless because that's
| the experience on some other platforms.
|
| This state of affairs isn't great. It could be worse but it
| could certainly be much better.
| latchkey wrote:
| https://stratechery.com/2024/an-interview-with-amd-ceo-
| lisa-...
|
| "One of the things that you mentioned earlier on software,
| very, very clear on how do we make that transition super easy
| for developers, and one of the great things about our
| acquisition of Xilinx is we acquired a phenomenal team of
| 5,000 people that included a tremendous software talent that
| is right now working on making AMD AI as easy to use as
| possible."
| jjoonathan wrote:
| Oh no. Ohhhh nooooo. No, no, no!
|
| Xilinx dev tools are awful. They are the ones who had
| Windows XP as the only supported dev environment for a
| product with guaranteed shipments through 2030. I saw
| Xilinx defend this state of affairs for over a decade. My
| entire FPGA-programming career was born, lived, and died,
| long after XP became irrelevant but before Xilinx moved
| past it, although I think they finally gave in some time
| around 2022. Still, Windows XP through 2030, and if you
| think that's bad wait until you hear about the actual
| software. These are not role models of dev experience.
|
| In my, err, uncle? post I said that I was confused about
| where AMD was in the AI arms race. Now I know. They really
| are just this dysfunctional. Yikes.
| pbalcer wrote:
| Xilinx made triSYCL (https://github.com/triSYCL/triSYCL),
| so maybe there's some chance AMD invests in first-class
| support for SYCL (an open standard from Khronos). That'd
| be nice. But I don't have much hope.
| paulmd wrote:
| this is honestly a very enlightening interview because - as
| pointed out at the time - Lisa Su is basically _repeatedly_
| asked about software and every single time she blatantly
| dodges the question and tries to steer the conversation
| back to her comfort-zone on hardware.
| https://news.ycombinator.com/item?id=40703420
|
| > He tries to get a comment on the (in hindsight) not great
| design tradeoffs made by the Cell processor, which was hard
| to program for and so held back the PS3 at critical points
| in its lifecycle. It was a long time ago so there's been
| plenty of time to reflect on it, yet her only thought is
| "Perhaps one could say, if you look in hindsight,
| programmability is so important". That's it! In hindsight,
| programmability of your CPU is important! Then she
| immediately returns to hardware again, and saying how proud
| she was of the leaps in hardware made over the PS
| generations.
|
| > He asks her if she'd stayed at IBM and taken over there,
| would she have avoided Gerstner's mistake of ignoring the
| cloud? Her answer is "I don't know that I would've been on
| that path. I was a semiconductor person, I am a
| semiconductor person." - again, she seems to just reject on
| principle the idea that she would think about software,
| networking or systems architecture because she defines
| herself as an electronics person.
|
| > Later Thompson tries harder to ram the point home, asking
| her "Where is the software piece of this? You can't just be
| a hardware cowboy ... What is the reticence to software at
| AMD and how have you worked to change that?" and she just
| point-blank denies AMD has ever had a problem with
| software. Later she claims everything works out of the box
| with AMD and seems to imply that ROCm hardly matters
| because everyone is just programming against PyTorch
| anyway!
|
| > The final blow comes when he asks her about ChatGPT. A
| pivotal moment that catapults her competitor to absolute
| dominance, apparently catching AMD unaware. Thompson asks
| her what her response was. Was she surprised? Maybe she
| realized this was an all hands to deck moment? What did
| NVIDIA do right that you missed? Answer: no, we always knew
| and have always been good at AI. NVIDIA did nothing
| different to us.
|
| > The whole interview is just astonishing. Put under
| pressure to reflect on her market position, again and again
| Su retreats to outright denial and management waffle about
| "product arcs". It seems to be her go-to safe space. It's
| certainly possible she just decided to play it all as low
| key as possible and not say anything interesting to protect
| the share price, but if I was an analyst looking for signs
| of a quick turnaround in strategy there's no sign of that
| here.
|
| not expecting a heartfelt postmortem about how things got
| to be this bad, but you can very easily _make this question
| go away too_ , simply by acknowledging that it's a focus
| and you're working on driving change and blah blah. you
| really don't have to worry about crushing some analyst's
| mindshare on AMD's software stack because nobody is crazy
| enough to think that AMD's software isn't horrendously
| behind at the present moment.
|
| and frankly that's literally how she's governed as far as
| software too. ROCm is barely a concern. Support
| base/install base, obviously not a concern. DLSS
| competitiveness, obviously not a concern. Conventional
| gaming devrel: obviously not a concern. She wants to ship
| the hardware and be done with it, but that's not how
| products are built and released in 2020 anymore.
|
| NVIDIA is out here building integrated systems that you
| build your code on and away you go. They run NVIDIA-written
| CUDA libraries, NVIDIA drivers, on NVIDIA-built networks
| and stacks. AMD can't run the sample packages in ROCm
| stably (as geohot discovered) on a supported configuration
| of hardware/software, even after hours of debugging just to
| get it that far. AMD doesn't even think drivers/runtime is
| a thing they should have to write, let alone a software
| library for the ecosystem.
|
| "just a small family company (bigger than NVIDIA, until
| very recently) who can't possibly afford to hire developers
| for all the verticals they want to be in". But like, they
| spent $50b on a single acquisition, they spent $12b in
| stock buybacks over 2 years, they have money, just not for
| _this_.
| kd913 wrote:
| You do know that Microsoft, Oracle, Meta are all in on this
| right?
|
| Heck I think it is being used to run ChatGPT 3.5 and 4
| services.
| 0cf8612b2e1e wrote:
| On the other hand, AMD has had a decade of watching CUDA eat
| their lunch and done basically nothing to change the
| situation.
| bee_rider wrote:
| AMD tries to compete in hardware with Intel's CPUs and
| Nvidia's GPUs. They have to slack somewhere, and software
| seems to be where. It isn't any surprise that they can't
| keep up on every front, but it does mean they can freely
| bring in partners whose core competency is software and
| work with them without any caveats.
|
| Not sure why they haven't managed to execute on that yet,
| but the partners must be pretty motivated now, right? I'm
| sure they don't love doing business at Nvidia's leisure.
| bobsondugnut wrote:
| when was the last time AMD hardware was keeping up with
| NVIDIA? 2014?
| 0cf8612b2e1e wrote:
| Been a while since AMD had the top tier offering, but it
| has been trading blows in the middle tier segment the
| entire time. If you are just looking for a gamer card (ie
| not max AI performance), the AMD is typically cheaper and
| less power hungry than the equivalent Nvidia.
| bobsondugnut wrote:
| > the AMD is typically cheaper and less power hungry than
| the equivalent Nvidia
|
| cheaper is true, but less power hungry is absolutely not
| true, which is kind of my point.
| dralley wrote:
| It was true with RDNA 2. RDNA 3 regressed on this a bit,
| supposedly there was a hardware hiccup that prevented
| them from hitting frequency and voltage targets that they
| were hoping to reach.
|
| In any case they're only slightly behind, not crazy far
| behind like Intel is.
| aurareturn wrote:
| It's trading blows because AMD sells their cards at lower
| margins in the midrange and Nvidia lets them.
| bee_rider wrote:
| The MI300X sounds like it is competitive, haha
| bobsondugnut wrote:
| competitive with H100 for inference. a 2 year old product
| on just one half of the ML story. H200 (and potentially
| B100) is the appropriate comparison based on their
| production in volume.
| softfalcon wrote:
| I feel like people forget that AMD has huge contracts with
| Microsoft, Valve, Sony, etc to design consoles at scale. It's
| an invisible provider as most folks don't even realize their
| Xbox and their Playstation are both AMD.
|
| When you're providing chip designs at that scale, it makes a
| lot more sense to folks that companies would be willing to
| try a more affordable alternative to Nvidia hardware.
|
| My bet is that AMD figures out a serviceable solution for
| some (not all) workloads that isn't ground breaking, but
| affordable to the clients that want an alternative. That's
| usually how this goes for AMD in my experience.
| sangnoir wrote:
| If you read/listen to the Stratechery interview with Lisa
| Su, she spelled out being open to customizing AMD hardware
| to meet partners' needs. So if Microsoft needs more memory
| bandwidth and less compute, AMD will build something just
| for them based on what they have now. If Meta wants 10%
| less power consumption (and cooling) for a 5% hit in
| compute, AMD will hear them out too. We'll see if that
| hardware customization strategy works outside of consoles.
| Rinzler89 wrote:
| _> I feel like people forget that AMD has huge contracts
| with Microsoft, Valve, Sony, etc to design consoles at
| scale. _
|
| Nobody forgot that, it's just that those console chips are super
| low margins, which is why Intel and Nvidia stopped catering
| to that market after the Xbox/PS3 generations and only AMD
| took it up because they were broke and every penny mattered
| to them.
|
| Nvidia did a brief stint with the Shield/Switch because
| they were trying to get into the Android/ARM space and also
| kinda gave up due to the margins.
| adabyron wrote:
| I have read in a few places that Microsoft is using AMD for
| inference to run ChatGPT. If I recall they said the
| price/performance was better.
|
| I'm curious if that's just because they can't get enough
| Nvidia GPUs or if the price/performance is actually that much
| better.
| atq2119 wrote:
| Most likely it really is better overall.
|
| Think of it this way: AMD is pretty good at hardware, so
| there's no reason to think that the raw difference in terms
| of flops is significant in either direction. It may go in
| AMD's favor sometimes and Nvidia's other times.
|
| What AMD traditionally couldn't do was software, so those
| AMD GPUs are sold at a discount (compared to Nvidia),
| giving you better price/performance _if you can use them_.
|
| Surely Microsoft is operating GPUs at large enough scale
| that they can pay a few people to paper over the software
| deficiencies so that they _can_ use the AMD GPUs and still
| end up ahead in terms of overall price /performance.
| singhrac wrote:
| The problem is that we all have a lot of FUD (for good
| reasons). It's on AMD to solve that problem publicly. They
| need to make it easier to understand what is supported so far
| and what's not.
|
| For example, for bitsandbytes (a common dependency in the LLM world)
| there's a ROCm fork that the AMD maintainers are trying to
| merge in
| (https://github.com/TimDettmers/bitsandbytes/issues/107).
| Meanwhile an Intel employee merged a change that introduced a
| common device abstraction (presumably usable by AMD + Apple +
| Intel etc.).
|
| There's a lot of that right now - super popular package that is
| CUDA-only is navigating how to make it work correctly with any
| other accelerator. We just need more information on what is
| supported.
| JonChesterfield wrote:
| Fantastic to see.
|
| The MI300X does memory bandwidth better than anything else by a
| ridiculous margin, up and down the cache hierarchy.
|
| It did not score very well on global atomics.
|
| So yeah, that seems about right. If you manage to light up the
| hardware, lots and lots of number crunching for you.
| jsheard wrote:
| All eyes are of course on AI, but with 192GB of VRAM I wonder if
| this or something like it could be good enough for high end
| production rendering. Pixar and co still use CPU clusters for all
| of their final frame rendering, even though the task is
| ostensibly a better fit for GPUs, mainly because their memory
| demands have usually been so far ahead of what even the biggest
| GPUs could offer.
|
| Much like with AI, Nvidia has the software side of GPU production
| rendering locked down tight though so that's just as much of an
| uphill battle for AMD.
| Havoc wrote:
| I'd imagine ray tracing is a bit easier to parallelize over lots
| of older cards. The computations aren't as heavily linked and
| are more fault tolerant. So I doubt anyone is paying h100 style
| premiums
| bryanlarsen wrote:
| Pixar is paying a massive premium; they probably are using an
| order of magnitude or two more CPUs than they would if they
| could use GPUs. Using a hundred CPUs in place of a single
| H100 is a greater-than-h100 style premium.
| imjonse wrote:
| Would Pixar's existing software run on GPUs without much
| work?
| jsheard wrote:
| It does already, at least on Nvidia GPUs:
| https://rmanwiki.pixar.com/pages/viewpage.action?mobileBypas...
|
| They currently only use the GPU mode for quick iteration
| on relatively small slices of data though, and then
| switch back to CPU mode for the big renders.
| jsheard wrote:
| The computations are easily parallelized, sure, but the data
| feeding those computations isn't easily partitioned. Every
| parallel render node needs as much memory as a lone render
| node would, and GPUs typically have nowhere near enough for
| the highest of high end productions. Last I heard they were
| putting around 128GB to 256GB of RAM in their machines and
| that was a few years ago.
| PaulHoule wrote:
| One missed opportunity from the game streaming bubble would be
| a 20-or-so player game where one big machine draws everything
| for everybody and streams it.
| bob1029 wrote:
| Stuff like this is still of interest to me. There are some
| really compelling game ideas that only become possible once
| you look into modern HPC platforms and streaming.
| PaulHoule wrote:
| My son and I have wargamed it a bit. The trouble is that
| there is a huge box of tricks used in open world and other
| complex single player games for conserving RAM that competes
| with just having a huge amount of RAM, and it is not so
| clear the huge SMP machine with a huge GPU really comes out
| ahead in terms of creating a revolution in gaming.
|
| In the case of Stadia, however, failing to develop this was
| like a sports team not playing any home games. One way of
| thinking about the current crisis of the games industry and
| VR is that building 3-d worlds is too expensive and a major
| part of it is all the shoehorning tricks the industry
| depends on. Better hardware for games could be about
| lowering development cost as opposed to making fancier
| graphics, but that tends to be a non-starter with companies
| whose core competence is getting 1000 highly-paid
| developers to struggle with difficult-to-use tools; the
| idea that you could do the same with 10 ordinary developers
| is threatening to them.
| bob1029 wrote:
| I am thinking beyond the scale of any given machine and
| traditional game engine architectures.
|
| I am thinking of an entire datacenter purpose-built to
| host a single game world, with edge locations handling
| the last mile of client-side prediction, viewport
| rendering, streaming and batching of input events.
|
| We already have a lot of the conceptual architecture
| figured out in places like the NYSE and CBOE - Processing
| hundreds of millions of events in less than a second on a
| single CPU core against one synchronous view of _some_
| world. We can do this with insane reliability and
| precision day after day. Many of the technology
| requirements that emerge from the single instance WoW
| path approximate what we have already accomplished in
| other domains.
| rcxdude wrote:
| EVE online is more or less the closest to this so far, so
| it may be worth learning lessons from them (though I
| wouldn't suggest copying their approach: their stackless
| python behemoth codebase appears to contain many a
| horror). It's certainly a hard problem though, especially
| when you have a concentration of large numbers of players
| (which is inevitable when you create such a game world).
| ganzuul wrote:
| Curious what that is. Some kind of AR physics simulation?
|
| I have been thinking about whether the compute could go
| right in cellphone towers, but this would take it up a
| notch.
| ThrowawayTestr wrote:
| Stadia was supposed to allow for really big games distributed
| across a cluster. Too bad it died in the crib.
| Nexxxeh wrote:
| It would immediately prevent several classes of cheating. No
| more wallhacks or ESP.
|
| Ironically the main type that'd still exist would be the
| vision-based external AI-powered target-highlighting and
| aim/fire assist.
|
| The display is analysed and overlaid with helpful info (like
| enemies highlighted) and/or inputs are assisted (snap to
| visible enemies, and/or automatically pull trigger.)
| JackYoustra wrote:
| It's probably implemented way differently, but I worry about
| the driver suitability. Gaming benchmarks at least perform
| substantially worse on AI accelerators than even many
| generations old GPUs, I wonder if this extends to custom
| graphics code too.
| Arelius wrote:
| I work in this field, and I think so. This is actually the
| project I'm currently working on.
|
| I'm betting that with current hardware and some clever tricks,
| we can resolve full production frames at real-time rates.
| Pesthuf wrote:
| I feel like these huge graphics cards with insane amounts of RAM
| are the moat that AI companies have been hoping for.
|
| We can't possibly hope to run the kinds of models that run on
| 192GB of VRAM at home.
| dmbaggett wrote:
| For inference you could use a maxed-out Mac Ultra; the RAM is
| shared between the CPU and GPU.
| alecco wrote:
| For single user (batch_size = 1), sure. But that is quite
| expensive in $/tok.
| jsheard wrote:
| Apple will gladly sell you a GPU with 192GB of memory, but your
| wallet won't like it.
| kbenson wrote:
| Won't Nvidia, and Intel, and Qualcomm, and Falanx (who make
| the ARM Mali GPUs from what I can see), and Imagination
| Technologies (PowerVR) do the same? They each make a GPU, and
| if you pay them enough money I have a hard time believing
| they won't figure out how to slap enough RAM on a board for
| one of their existing products and make whatever changes
| are required.
| nextaccountic wrote:
| The US government is looking into heavily limiting the
| availability of high end GPUs from now on. And the biggest and
| most effective bottleneck for AI right now is VRAM.
|
| So maybe Apple is happy to sell huge GPUs like that, but the
| government will probably put them under export controls like
| the A100 and H100 already are.
| rbanffy wrote:
| Cue the PowerMac G4 TV ad.
|
| https://youtu.be/lb7EhYy-2RE
| rbanffy wrote:
| OTOH, it comes free with one of the finest Unix workstations
| ever made.
| Rinzler89 wrote:
| Which Unix workstation?
| coolspot wrote:
| They are referring to MacOS being included with expensive
| Mac hardware.
| tonetegeatinst wrote:
| On the contrary, I'd argue the opposite. GPU VRAM has gotten
| faster but the density isn't that good. 8GB used to be high end
| in the early 2000s, yet now 16GB can't even run games that
| well, especially if it's a studio that loves VRAM.
|
| Side note: as someone who has been into machine learning for
| over 10 years, let me tell ya us hobbyists and researchers
| hunger for compute and memory.
|
| VRAM isn't everything... I am well aware, but certain workflows
| really do benefit from heaps of VRAM, like VFX, CAD and CFD. I
| still hold on to the dream of upgradable GPUs, where I can
| upgrade the different components just like you do on a
| computer. Computer is slow? Then upgrade the RAM or storage, or
| get a faster chip that uses the same socket. GPUs could
| possibly see modularity with the processor, the VRAM, etc.
|
| Level1Techs has some great videos about how PCIe is the
| future, where we can connect systems together using raw PCIe
| lanes, which is similar to how Nvidia Blackwell servers
| communicate with other servers in the rack.
| immibis wrote:
| Wasn't that just because of Nvidia's market segmentation?
| phkahler wrote:
| >> We can't possibly hope to run the kinds of models that run
| on 192GB of VRAM at home.
|
| I'm looking to build a mini-ITX system with 256GB of RAM for my
| next build. DDR5 spec can support that in 2 modules, but nobody
| makes them yet. No need for a GPU, I'm looking to the AMD APUs
| which are getting into the 50TOPs range. But yes, RAM seems to
| be the limiting factor. I'm a little surprised the memory
| companies aren't pushing harder for consumers to have that
| capacity.
| auspiv wrote:
| 128GB DDR5 module -
| https://store.supermicro.com/us_en/supermicro-hynix-128gb-28...
|
| It is of course RDIMM, but you didn't specify what memory
| type you were looking at.
| tonetegeatinst wrote:
| I wonder if the human body could grow artificial kidneys so that
| I could just sell infinite kidneys and manage to afford a couple
| of these so I can do AI training on my own hardware.
| pca006132 wrote:
| why not infinite brains so you can have more computational
| power than these GPUs?
| Maken wrote:
| Apparently one of those costs around $15K. I don't know if you
| can buy a couple or if they only sell those in massive batches,
| but in any case, how many human kidneys do you need to sell to
| get $30K?
| Filligree wrote:
| So just out of curiosity, what does this thing cost?
| latchkey wrote:
| Pricing is strictly NDA. AMD does not give it out.
| sva_ wrote:
| The rumors say $20k. Nothing official though.
| pheatherlite wrote:
| Without first-class CUDA translation or cross compile, AMD is
| just throwing more transistors at the void
| chung8123 wrote:
| I agree they need to work on their software, but I also think
| that given the limited availability and massive expense of the
| H100, AMD could undercut Nvidia and build a developer ecosystem
| if they wanted to. I think they need to hit the consumer market
| pretty hard and get all the local llama people hacking up the
| software and drivers to make things work. A cheaper large-VRAM
| consumer card would go a long way toward getting a developer
| ecosystem behind them.
| epistasis wrote:
| Given the number of people who need the compute but are only
| accessing it via APIs like HuggingFace's transformers library,
| which supports these chips, I don't really think that CUDA
| support is absolutely essential.
|
| Most kernels are super quick to rewrite, and higher level
| abstractions like PyTorch and JAX make dealing with CUDA a
| pretty rare experience for most people making use of large
| clusters and small installs. And if you have the money to build
| a big cluster, you can probably also hire the engineers to port
| your framework to the right AMD library.
|
| The world has changed a lot!
|
| The bigger challenge is that if you are starting up, why in the
| world would you give yourself the additional challenge of going
| off the beaten path? It's not just CUDA but the whole
| infrastructure of clusters and networking that really gives
| NVIDIA an edge, in addition to knowing that they are going to
| stick around in the market, whereas AMD might leave it
| tomorrow.
| alkonaut wrote:
| Good. If there is even a slight suspicion that the best value is
| team red in 5 or 10 years, then CUDA will look a lot less
| attractive already today.
| w-m wrote:
| Impressions from last week's CVPR, a conference with 12k
| attendees on computer vision - Pretty much everyone is using
| NVIDIA GPUs, and pretty much everyone isn't happy with the
| prices, and would like some competition in the space:
|
| NVIDIA was there with 57 papers, a website dedicated to their
| research presented at the conference, a full day tutorial on
| accelerating deep learning, and ever present with shirts and
| backpacks in the corridors and at poster presentations.
|
| AMD had a booth at the expo part, where they were raffling off
| some GPUs. I went up to them to ask what framework I should look
| into, when writing kernels (ideally from Python) for GPGPU. They
| referred me to the "technical guy", who it turns out had a demo
| on inference on an LLM. Which he couldn't show me, as the laptop
| with the APU had crashed and wouldn't reboot. He didn't know
| about writing kernels, but told me there was a compiler guy who
| might be able to help, but he wasn't to be found at that moment,
| and I couldn't find him when returning to the booth later.
|
| I'm not at all happy with this situation. As long as AMD's
| investment into software and evangelism remains at ~$0, I don't
| see how any hardware they put out will make a difference. And
| you'll continue to hear people walking away from their booth,
| saying "oh when I win it I'm going to sell it to buy myself an
| NVIDIA GPU".
| cstejerean wrote:
| Completely agree. It's been 18 years since Nvidia released
| CUDA. AMD has had a long time to figure this out so I'm amazed
| at how they continue to fumble this.
| bryanlarsen wrote:
| 10 years ago they were basically broke and bet the farm on
| Zen. That bet paid off. I doubt a bet on a CUDA competitor
| would have paid off in time to save the company. They
| definitely didn't have
| the resources to split that bet.
| jsheard wrote:
| It's not like the specific push for AI on GPUs came out of
| nowhere either, Nvidia first shipped cuDNN in 2014.
| kimixa wrote:
| CUDA of 18 years ago is _very_ different to CUDA of today.
|
| Back then AMD/ATI were actually at the forefront on the GPGPU
| side - things like the early Brook language and CTM led
| pretty quickly into things like OpenCL. Lots of work went on
| using the Xbox 360 GPU in _real_ games for GPGPU tasks.
|
| But CUDA steadily improved iteratively, and AMD kinda just...
| stopped developing their equivalents? Considering that for a
| good part of that time they were near bankruptcy, it might
| not have been surprising though.
|
| But saying Nvidia solely kicked off everything with CUDA is
| rather ahistorical.
| yvdriess wrote:
| Yep! I used BrookGPU for my GPGPU master thesis, before
| CUDA was a thing. AMD lacked follow-through on the software
| side as you said, but a big factor was also NV handing out
| GPUs to researchers.
| userabchn wrote:
| > CUDA of 18 years ago is very different to CUDA of today.
|
| I've been writing CUDA since 2008 and it doesn't seem that
| different to me. They even still use some of the same
| graphics in the user guide.
| dagw wrote:
| _AMD kinda just... stopped developing their equivalents?_
|
| It wasn't so much that they stopped developing, rather that they
| kept throwing everything out and coming out with new and
| non backwards compatible replacements. I knew people
| working in the GPU Compute field back in those days who
| were trying to support both AMD/ATI and NVidia. While their
| CUDA code just worked from release to release and every new
| release of CUDA just got better and better, AMD kept coming
| up with new breaking APIs and forcing rewrite after rewrite
| until they just gave up and dropped AMD.
| dragontamer wrote:
| 10 years ago AMD was selling its own headquarters so that it
| could stave off bankruptcy for another few weeks
| (https://arstechnica.com/information-technology/2013/03/amd-s...).
|
| AMD's software investments began in earnest a few years
| ago, but AMD really did progress more than pretty much
| everyone else aside from NVidia IMO.
|
| AMD further made a few bad decisions where they "split the
| bet", relying upon Microsoft and others to push software
| forward. (I did like C++ Amp for what its worth). The
| underpinnings of C++Amp led to Boltzmann which led to ROCm,
| which then needed to be ported away from C++Amp and into
| CUDA-like Hip.
|
| So it's a bit of a misstep there for sure. But it's not like
| AMD has been dilly-dallying. And for what it's worth, I would
| have personally preferred C++ Amp (a C++11 standardized way
| to represent GPU functions as []-lambdas rather than CUDA-
| specific <<<extensions>>>). Obviously everyone else disagrees
| with me but there's some elegance to
| parallel_for_each([](param1, param2){magically a GPU function
| executing in parallel}), where the compiler figures out the
| details of how to get param1 and param2 from CPU RAM into GPU
| (or you use GPU-specific allocators to make param1/param2 in
| the GPU codespace already to bypass the automagic).
| jacoblambda wrote:
| > As long as AMDs investment into software and evangelism
| remains at ~$0
|
| Last time I checked they have been trying to hire a ton of
| software engineers to improve the applied stacks (CV, ML,
| DSP, compute, etc.) at the location near me.
|
| It seems like there's a big push to improve the stacks but
| given that less than 10 years ago they were practically at
| death's door it's not terribly surprising that their software
| is in the state it is. It's been getting better gradually but
| quality software doesn't just show up overnight, and especially
| so when things are as complex and arcane as they are in the GPU
| world.
| benreesman wrote:
| With margins that high?
|
| There is always financing, there are always people willing to
| go to the competitor at some wage, there is always a way if
| the leadership wants to.
|
| If it was just a straight up fab bottleneck? Yeah maybe you
| buy that for a year or two.
|
| "During Q1, Nvidia reported $5.6 billion in cost of goods
| sold (COGS). This resulted in a gross profit of $20.4
| billion, or a margin profile of 78.4%."
|
| That's called an "induced market failure".
| almostgotcaught wrote:
| > With margins that high? There is always financing, there
| are always people willing to go to the competitor at some
| wage, there is always a way if the leadership wants to.
|
| People love to pop off about stuff they don't really know
| anything about. Let me ask you: what financing do you imagine is
| available? Like literally what financing do you propose for
| a publicly traded company? Like do you realize they can't
| actually issue new shares without putting it to a
| shareholder vote? Should they issue bonds? No I know they
| should run an ICO!!!
|
| And then what margins exactly? Do you know what the margin
| is on MI300? No. Do you know whether they're currently
| selling at a loss to win marketshare? No.
|
| I would be the happiest boy if HN, in addition to policing
| jokes and memes, could police arrogance.
| JohnPrine wrote:
| Are you saying that companies lose the ability to secure
| financing once they go public?
| almostgotcaught wrote:
| Of course not - I mentioned 3 routes to securing further
| financing. Did you read about those 3 routes in my
| comment?
| qaq wrote:
| Well if Mojo and Modular Max Platform take off I guess there
| will be a path for AMD
| monkeydust wrote:
| As more a business person than engineer, help me understand why
| AMD are not getting this, what's the counter argument? Is CUDA
| just too far ahead, are they lacking the right people in senior
| leadership roles to see this through?
| cyanydeez wrote:
| CUDA is a software moat. If you want to use any GPU other
| than Nvidia, you need to double your engineering budget
| because there are no easy-to-bootstrap projects at any level.
| The hardware prices are meaningless if you need a $200k
| engineer, if they even exist, just to bootstrap a product.
| rbanffy wrote:
| Depending on your hardware budget, the engineering one can
| look like a rounding error.
| cyanydeez wrote:
| Sure, but then you're still on the side of NVIDIA because
| you have the budget.
| sangnoir wrote:
| Why give any additional money to Nvidia when you can
| announce more profits (or get more compute if you're a
| government agency) by hiring more engineers to enable AMD
| hardware for less than a few million per year? It's not
| like Microsoft loves the idea of handing over money to
| Nvidia if there is a cheaper alternative that can make
| $MSFT go up.
| noelwelsh wrote:
| Leadership lacking vision + being almost bankrupt until
| relatively recently.
| hedgehog wrote:
| As another commenter points out, their strategy appears to be
| to focus on HPC clients, where AMD can concentrate on providing
| after-sale software support around a relatively small number of
| customer requests. This gets them some sales while avoiding
| the level of organizational investment necessary to build a
| software platform that can support NVIDIA-style broad
| compatibility and good out-of-the-box experience.
| dagw wrote:
| CUDA is very far ahead. Not only technically, but in
| mindshare. Developers trust CUDA and know that investing in
| CUDA is a future proof investment. AMD has had so many API
| changes over the years, that no one trusts them any more. If
| you go all in on AMD, you might have to re-write all your
| code in 3-5 years. AMD can promise that this won't happen,
| but it's happened so many times already that no one really
| believes them.
|
| Another problem is simply that hiring (and keeping) top
| talent is really really hard. If you're smart enough to be a
| lead developer of AMDs core Machine Learning libraries, you
| can probably get hired at any number of other places, so why
| choose AMD.
|
| I think the leadership gets it and understand the importance,
| I just don't think they (or really anybody) knows how to come
| up with a good plan to turn things around quickly. They're
| going to have to commit to at least a 5 year plan and lose
| money each of those 5 years, and I'm not sure they can or
| even want to fight that battle.
| martinpw wrote:
| > Another problem is simply that hiring (and keeping) top
| talent is really really hard.
|
| Absolutely. And when your mandate for this top talent is
| going to be "go and build something that basically copies
| what those other guys have already built", it is even
| harder to attract them, when they can go any place they
| like and work on something new.
|
| > I think the leadership gets it and understand the
| importance, I just don't think they (or really anybody)
| knows how to come up with a good plan to turn things around
| quickly.
|
| Yes, it always puzzles me when people think nobody at AMD
| actually sees the problem. Of course they see it. Turning a
| large company is incredibly hard. Leadership can give
| direction, but there is so much baked in momentum, power
| structures, existing projects and interests, that it is
| really tough to change things.
| alecco wrote:
| > are they lacking the right people in senior leadership
| roles to see this through?
|
| Just like Intel, they have an outdated culture. IMHO they
| should start a software Skunk Works isolated from the company
| and have the software guys guide the hardware features. Not
| the other way around.
|
| I wouldn't bet money on either of them doing this. Hopefully
| some other smaller, modern, and flexible companies can try
| it.
| jwuphysics wrote:
| Have you looked into TinyCorp [0]/tinygrad [1], one of the
| latest endeavors by George Hotz? I've been pretty impressed by
| the performance. [2]
|
| [0] https://tinygrad.org/ [1]
| https://github.com/tinygrad/tinygrad [2]
| https://x.com/realGeorgeHotz/status/1800932122569343043?t=Y6...
| arghwhat wrote:
| He also shakes his fist at the software stack, but loudly
| enough that it gets AMD to react to it.
| anthonix1 wrote:
| I have not been impressed by the perf. Slower than PyTorch
| for LLMs, and PyTorch is actually stable on AMD (I've trained
| 7B/13B models).. so the stability issues seem to be more of a
| tinygrad problem and less of an AMD problem, despite George's
| ramblings [0][1]
|
| [0] https://github.com/tinygrad/tinygrad/issues/4301 [1]
| https://x.com/realAnthonix/status/1800993761696284676
| slavik81 wrote:
| MIVisionX is probably the library you want for computer vision.
| As for kernels, you would generally write HIP, which is very
| similar to CUDA. To my knowledge, there's no equivalent to cupy
| for writing kernels in Python.
|
| For what it's worth, your post has cemented my decision to
| submit a few conference talks. I've felt too busy writing code
| to go out and speak, but I really should make time.
| hyperbovine wrote:
| The equivalent to cupy is ... cupy:
|
| https://docs.cupy.dev/en/v13.2.0/install.html#using-cupy-on-...
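|
| As a minimal sketch of what writing a kernel from Python looks
| like (assuming the ROCm-enabled CuPy build described at that
| link is installed and an AMD GPU is visible; on ROCm the kernel
| is compiled through HIP rather than NVRTC):
|
|     import cupy as cp
|
|     # A simple elementwise kernel defined from Python. The same
|     # code runs on CUDA or ROCm builds of CuPy; only the backend
|     # compiler differs.
|     saxpy = cp.ElementwiseKernel(
|         'float32 a, float32 x, float32 y',  # inputs
|         'float32 z',                        # output
|         'z = a * x + y',                    # per-element body
|         'saxpy')
|
|     x = cp.arange(8, dtype=cp.float32)
|     y = cp.ones(8, dtype=cp.float32)
|     print(saxpy(2.0, x, y))  # [ 1.  3.  5.  7.  9. 11. 13. 15.]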
| slavik81 wrote:
| Oh cool! It appears that I've already packaged cupy's
| required dependencies for AMD GPU support in the Debian 13
| 'main' and Ubuntu 24.04 'universe' repos. I also extended
| the enabled architectures to cover all discrete AMD GPUs
| from Vega onwards (aside from MI300, ironically). It might
| be nice to get python3-cupy-rocm added to Debian 13 if this
| is a library that people find useful.
| sangnoir wrote:
| > I'm not at all happy with this situation. As long as AMDs
| investment into software and evangelism remains at ~$0, I don't
| see how any hardware they put out will make a difference.
|
| It appears AMD's initial strategy is courting the HPC crowd and
| hyperscalers: they have big budgets, lower support overhead and
| are willing and able to write code that papers over AMD's not-
| great software while appreciating lower-than-Nvidia TCO. I
| think this incremental strategy is sensible, considering
| where most of the money is.
|
| As a first mover, Nvidia had to start from the bottom up; CUDA
| used to run only/mostly on consumer GPUs - AMD is going top-
| down, starting with high-margin DC hardware, before trickling
| down to rack-level users, and eventually APUs later as revenue
| growth allows more re-investment.
| antupis wrote:
| That is the wrong move. Personally, I would start with the
| local LLM/llama folks who crave more memory and build up from
| there.
| sangnoir wrote:
| Seeing that they don't have a mature software stack, I
| think for now AMD would prefer one customer who brings in
| $10m revenue over 10'000 customers at $1000 a pop.
| landryraccoon wrote:
| They're making the wrong strategic play.
|
| They will fail if they go after the highest margin customers.
| Nvidia has every advantage and every motivation to keep those
| customers. They would need a trillion dollars in capital to
| have a chance imho.
|
| It would be like trying to go after Intel in the early 2000s
| by trying to target server cpus, or going after the desktop
| operating system market in the 90s against Microsoft. It's
| aiming at your competition where they are strongest and you
| are weakest.
|
| Their only chance to disrupt is to try to get some of the
| customers that Nvidia doesn't care about, like consumer level
| inference / academic or hobbyist models. Intel failed when
| they got beaten in a market they didn't care about, i.e.
| mobile / small power devices.
| acchow wrote:
| > They would need a trillion dollars in capital to have a
| chance imho.
|
| All AMD would really need is for Nvidia innovation to
| stall. Which, with many of their engineers coasting on $10M
| annual compensation, seems not too far fetched
| sangnoir wrote:
| AMD can go toe to toe with Nvidia on hardware innovation.
| What AMD has realised (correctly, IMO) is that all they
| need is for _hyperscalers_ to match/come close to Nvidia
| on software innovation on AMD hardware -
| Amazon/Meta/Microsoft engineers can get their foundation
| models running on MI300X well enough for their needs -
| CUDA is not much of a moat in that market segment where
| there are dedicated AI-infrastructure teams. If the price
| is right, they may shift some of those CapEx dollars from
| Nvidia to AMD. Few AI practitioners - and even fewer LLM
| consumers - care about the libraries underpinning
| torch/numpy/high-level-python-framework/$LLM-service, as
| long as it works.
| Certhas wrote:
| This is a common sentiment, no doubt also driven by the
| wish that AMD would cater to us.
|
| But I see no evidence that the strategy is wrong or
| failing. AMD is already powering a massive and rapidly
| growing share of Top 500 HPC:
|
| https://www.top500.org/statistics/treemaps/
|
| AMD compute growth isn't in places where people see it, and
| I think that gives a wrong impression. (Or it means people
| have missed the big shifts over the last two years.)
| frognumber wrote:
| I see a lot of evidence, in the form of a rising moat for
| NVidia.
| jampekka wrote:
| It would be interesting to see how much these
| "supercomputers" are actually used, and what parts of
| them are used.
|
| I use my university's "supercomputer" every now and then
| when I need lots of VRAM, and there are rarely many other
| users. E.g. I've never had to queue for a GPU even though
| I use only the top model, which probably should be the
| most utilized.
|
| Also, I'd guess there can be nvidia cards in the grid
| even if "the computer" is AMD.
|
| Of course it doesn't matter for AMD whether the compute
| is actually used or not as long as it's bought, but lots
| of theoretical AMD flops standing somewhere doesn't
| necessarily mean AMD is used much for compute.
| bryanlarsen wrote:
| The savings are an order of magnitude different. Switching
| from Intel to AMD in a data center might have saved
| millions if you were lucky. Switching from NVidia to AMD
| might save the big LLM vendors billions.
| toast0 wrote:
| I only observe this market from the sidelines... but
|
| They're able to get the high end customers, and this
| strategy works because they can sell the high end customers
| high end parts in volume without having to have a good
| software stack; at the high end, the customers are willing
| to put in the effort to make their code work on hardware
| that is better in dollars/watts/availability or whatever it
| is that's giving AMD inroads into the supercomputing
| market. They can't sell low end customers on GPU compute
| without having a stack that works, and somebody who has a
| small GPU compute workload may not be willing or able to
| adapt their software to make it work on an AMD card even if
| the AMD card would be a better choice if they could make it
| work.
| jiggawatts wrote:
| They're going to sell a billion dollars of GPUs to a
| handful of customers while NVIDIA sells a trillion
| dollars of their products to _everyone_.
|
| Every framework, library, demo, tool, and app is going to
| use CUDA forever and ever while some "account manager" at
| AMD takes a government procurement officer to lunch to
| sell one more supercomputer that year.
| make3 wrote:
| 99%+ of people aren't writing kernels man, this doesn't mean
| anything, this is just silly
| xadhominemx wrote:
| If you are looking for attention from an evangelist, I'm sorry
| but you are not the target customer for MI300. They are
| courting the Hyperscalers for heavy duty production inference
| workloads.
| spitfire wrote:
| I remember years ago one of the AMD APUs had the CPU and GPU on
| the same die, and could exchange ownership of CPU and GPU memory
| with just a pointer change or some other small accounting.
|
| Has this returned? Because for dual GPU/CPU workloads (AlphaZero,
| etc.) that would deliver effectively "infinite bandwidth"
| between GPU and CPU. Using an APU of course gets you huge amounts
| of slowish memory. But being able to fling things around with
| abandon would be an advantage, particularly for development.
| wmf wrote:
| I assume the MI300A APU also supports zero-copy. Because MI300X
| is a separate chip you necessarily have to copy data over PCIe
| to get it into the GPU.
| rbanffy wrote:
| One day someone will build a workstation around that chip.
| One day...
| JonChesterfield wrote:
| You don't need to change the pointer value. The GPU and the CPU
| have the same page table structures and both use the same
| pointer representation for "somewhere in common memory".
|
| On the GPU there are additional pointer types for different
| local memory, e.g. LDS is a uint16_t indexing from zero. But
| even there you can still have a single pointer to "somewhere"
| and when you store to it with a single flat addressing
| instruction the hardware sorts out whether it's pointing to
| somewhere in GPU stack or somewhere on the CPU.
|
| This works really well for tables of data. It's a bit of a
| nuisance for code as the function pointer is aimed at somewhere
| in memory and whether that's to some x86 or to some gcn depends
| on where you got the pointer from, and jumping to gcn code from
| within x86 means exactly what it sounds like.
| spitfire wrote:
| I'm not sure it was "pointers" but it was some very low cost
| way to change ownership of memory between the CPU and GPU.
|
| They had some fancy marketing name for it at the time. But it
| wasn't on all chips, it should have been. Even if it was dog
| slow between PCIe GPU and CPU the unified interface would
| have been the right way to go. Also, amenable to automated
| scheduling.
|
| The point still stands though: I want entirely unified GPU and
| CPU memory.
| JonChesterfield wrote:
| The unified address space with moving pages between CPU and
| GPU on page fault works on some discrete GPU systems but
| it's a bit of a performance risk compared to keeping the
| pages on the same device.
|
| Fundamentally if you've got separate blocks of memory tied
| together by pcie then it's either annoying copying data
| across or a potential performance problem doing it behind
| the scenes.
|
| A single block of memory that everything has direct access
| to is much better. It works very neatly on the APU systems.
| spitfire wrote:
| > Fundamentally if you've got separate blocks of memory
| tied together by pcie then it's either annoying copying
| data across or a potential performance problem doing it
| behind the scenes.
|
| Well, as I said that's amenable to automated planning.
|
| But what I really, really want is a nice APU with 512GB+
| of memory that both the CPU and GPU can access willy
| nilly.
| JonChesterfield wrote:
| Yep, that's what I want too. The future is now.
|
| The MI300A is an APU with 128gb on the package. They come
| in four socket systems, that's 512gb of cache coherent
| machine with 96 fast x64 cores and many GCN cores. Quite
| like a node from El Capitan.
|
| I'm delighted with the hardware and not very impressed
| with the GPU offloading languages for programming it. The
| GCN and x64 cores are very much equal peers on the
| machine, the asymmetry baked into the languages grates on
| me.
|
| (on non-apu systems, moving the data around in the
| background such that the latency is hidden is a nice idea
| and horrendously difficult to do for arbitrary workloads)
| kcb wrote:
| Probably thinking of this:
| https://en.m.wikipedia.org/wiki/Heterogeneous_System_Archite...
|
| > Even if it was dog slow between PCIe GPU and CPU the
| unified interface would have been the right way to go
|
| That is actually what happened. You can directly access
| pinned cpu memory over pcie on discrete gpus.
| omneity wrote:
| I'm surprised at the simplicity of the formula in the paragraph
| below. Could someone explain the relationship between model size,
| memory bandwidth and token/s as they calculated here?
|
| > Taking LLaMA 3 70B as an example, in float16 the weights are
| approximately 140GB, and the generation context adds another
| ~2GB. MI300X's theoretical maximum is 5.3TB/second, which gives
| us a hard upper limit of (5300 / 142) = ~37.2 tokens per second.
| wmf wrote:
| AFAIK generating a single token requires reading all the
| weights from RAM. So 5300 GB/s total memory bandwidth / 142 GB
| weights = ~37.2 tokens per second.
| latchkey wrote:
| From Cheese (they don't have a HN account, so I'm posting for
| them):
|
| Each weight is an FP16 float, which is 2 bytes worth of data,
| and you have 70B weights, so the total amount of data the
| weights take up is 140GB; then you have a couple of extra GBs
| for the context.
|
| Then to figure out the theoretical tokens per second you just
| divide the memory bandwidth, 5300GB/s in MI300X's case, by the
| amount of data that the weights and context take up, so
| 5300/142, which is about 37 tokens per second.
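|
| A minimal sketch of that arithmetic (same figures as above; the
| simplifying assumption is that every generated token streams
| all of the FP16 weights from HBM once):
|
|     params = 70e9           # LLaMA 3 70B parameters
|     bytes_per_param = 2     # float16
|     context_bytes = 2e9     # ~2GB of generation context
|     bandwidth = 5.3e12      # MI300X theoretical peak, bytes/s
|
|     bytes_per_token = params * bytes_per_param + context_bytes  # ~142GB
|     print(bandwidth / bytes_per_token)  # ~37.3 tokens/second upper bound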
| rbanffy wrote:
| 37 somethings per second doesn't sound fast at all. You need
| to remember it's 37 ridiculously difficult things per second.
| omneity wrote:
| So am I correct in understanding what they really mean is 37
| full forward passes per second?
|
| In which case, if the model weights are fitting in the VRAM
| and are already loaded, why does the bandwidth impact the
| rate of tok/s?
| fancyfredbot wrote:
| You have to get those weights from the RAM to the floating
| point unit. The bandwidth here is the rate at which you can
| do that.
|
| The weights are not really reused. Which means they are
| never in registers, or in L1/L2/L3 caches. They are always
| in VRAM and always need to be loaded back in again.
|
| However, if you are batching multiple separate inputs you
| can reuse each weight on each input, in which case you may
| not be entirely bandwidth bound and this analysis breaks
| down a bit. Basically you can't produce a single stream of
| tokens any faster than this rate, but you can produce more
| than one stream of tokens at this rate.
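|
| A rough sketch of that batching effect (assuming decoding stays
| purely bandwidth-bound, and ignoring per-sequence KV-cache reads
| and compute limits, which eventually cap the benefit):
|
|     weights_plus_ctx = 142e9   # bytes streamed per decode step (as above)
|     bandwidth = 5.3e12         # MI300X peak, bytes/s
|
|     def aggregate_tokens_per_s(batch_size):
|         steps_per_s = bandwidth / weights_plus_ctx  # weights read once per step
|         return batch_size * steps_per_s             # one token per sequence per step
|
|     print(aggregate_tokens_per_s(1))   # ~37, a single stream
|     print(aggregate_tokens_per_s(32))  # ~1194 in aggregate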
| immibis wrote:
| That would be higher with batching, right? (5300 / 144) * 2 =
| ~73.6 and so on.
| snaeker58 wrote:
| I hate the state of AMD's software for non-gamers. ROCm is a war
| crime (which has improved dramatically in the last two years and
| still sucks).
|
| But like many have said considering AMD was almost bankrupt their
| performance is impressive. This really speaks for their hardware
| division. If only they could get the software side of things
| fixed!
|
| Also I wonder if NVIDIA has an employee of the decade plaque for
| CUDA. Because CUDA is the best thing that could've happened to
| them.
| rbanffy wrote:
| Would be interesting to see a workstation based on the version
| with a couple x86 dies, the MI300A. Oddly enough, it'd need a
| discrete GPU.
| alecco wrote:
| > Taking LLaMA 3 70B as an example, in float16 the weights are
| approximately 140GB, and the generation context adds another
| ~2GB. MI300X's theoretical maximum is 5.3TB/second, which gives
| us a hard upper limit of (5300 / 142) = ~37.2 tokens per second.
|
| I think they mean 37.2 _forward passes_ per second. And at 4008
| tokens per second (from "LLaMA3-70B Inference" chart) it means
| they were using a batch size of ~138 (if using that math, but
| probably not correct). Right?
___________________________________________________________________
(page generated 2024-06-25 23:01 UTC)