[HN Gopher] Azure announces new AI optimized VM series featuring...
       ___________________________________________________________________
        
       Azure announces new AI optimized VM series featuring AMD's flagship
       MI300X GPU
        
       Author : latchkey
       Score  : 50 points
       Date   : 2023-11-15 19:11 UTC (3 hours ago)
        
 (HTM) web link (techcommunity.microsoft.com)
 (TXT) w3m dump (techcommunity.microsoft.com)
        
       | CharlesW wrote:
       | I could've sworn that it was once a given that people doing
       | serious AI work needed CUDA-capable GPUs. Is that no longer the
       | case?
        
         | qeternity wrote:
         | Necessity is the mother of invention.
        
         | kibibu wrote:
         | The article highlights that onnx, deepspeed, pytorch and
         | tensorflow all support these AMD processors
        
           | CharlesW wrote:
           | Thanks, so I think you're saying that there are two related
           | things happening that are eliminating NVIDIA's CUDA moat: (1)
           | Developers are mostly using libraries instead of writing "to
           | the metal" (CUDA), and (2) many popular libraries have added
           | support for AMD GPUs. Does that capture it?
        
             | latchkey wrote:
             | https://www.databricks.com/blog/training-llms-scale-amd-
             | mi25...
             | 
             | https://embeddedllm.com/blog/vllm_rocm/
        
             | brucethemoose2 wrote:
             | It's the opposite in my experience.
             | 
             | Pure PyTorch mostly works OK, but some libraries
             | implementing crazy optimized, hand written kernels and such
             | will have some trouble.
             | 
             | So (for instance) maybe you can run an LLM in a particular
             | PyTorch framework, but flash attention 2 doesn't support
             | your AMD card, so performance and memory use take a hit.
             | 
             | Or maybe the library works on an Intel XPU with like 5
             | changed lines in the entire library (rename "cuda" to
             | "xpu"), but no one bothered to add it, or maybe the dev
             | doesn't even want to support the PR.
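A minimal sketch of the "rename cuda to xpu" point above, assuming the library hardcodes the device string. The helper names `pick_device` and `probe_backends` are hypothetical and not part of PyTorch. `torch.cuda.is_available()` is a real PyTorch call (ROCm builds of PyTorch also expose AMD GPUs through the `torch.cuda` namespace); the `torch.xpu` lookup is guarded because that namespace may be missing depending on the build.

```python
def pick_device(available, preference=("cuda", "xpu", "cpu")):
    """Return the first preferred backend name reported as available.

    `available` maps backend names to booleans. Falls back to "cpu".
    (Hypothetical helper for illustration.)
    """
    for name in preference:
        if available.get(name, False):
            return name
    return "cpu"


def probe_backends():
    """Best-effort probe; reports only the CPU if PyTorch is not installed."""
    found = {"cpu": True}
    try:
        import torch
    except ImportError:
        return found
    found["cuda"] = torch.cuda.is_available()
    # Assumption: torch.xpu may not exist in this build, so look it up defensively.
    xpu = getattr(torch, "xpu", None)
    found["xpu"] = bool(
        xpu is not None and hasattr(xpu, "is_available") and xpu.is_available()
    )
    return found


if __name__ == "__main__":
    # Pass this string to `.to(device)` instead of a literal "cuda".
    device = pick_device(probe_backends())
    print(device)
```

With the device name chosen in one place, supporting another backend is a change to the preference tuple rather than a search-and-replace across the library.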
        
             | tails4e wrote:
             | AMD also have HIP, which is almost exactly CUDA but target
             | agnostic, so basically CUDA code will now run on AMD
             | hardware. So they have the low level via HIP, and have also
             | integrated into libraries like PyTorch. It's a great
             | big chip out of that moat.
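To illustrate how close the two APIs are, here is a toy sketch of the source-level renaming involved in porting CUDA runtime calls to HIP. The function `toy_hipify` and its small table are hypothetical and cover only a handful of names; this is not AMD's HIPIFY tooling, and real ports usually need more than renaming. The HIP names in the table are the actual counterparts of the listed CUDA runtime names.

```python
import re

# A few CUDA runtime names and their HIP counterparts (deliberately incomplete).
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpyDeviceToHost": "hipMemcpyDeviceToHost",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
    "cudaFree": "hipFree",
}

# Match whole identifiers only, trying longer names first.
_PATTERN = re.compile(
    "|".join(
        r"\b" + re.escape(name) + r"\b"
        for name in sorted(CUDA_TO_HIP, key=len, reverse=True)
    )
)


def toy_hipify(source):
    """Rename the known CUDA identifiers in `source` to their HIP names."""
    return _PATTERN.sub(lambda m: CUDA_TO_HIP[m.group(0)], source)


if __name__ == "__main__":
    cuda_src = (
        "#include <cuda_runtime.h>\n"
        "cudaMalloc(&d, n);\n"
        "cudaMemcpy(d, h, n, cudaMemcpyHostToDevice);\n"
        "cudaFree(d);\n"
    )
    print(toy_hipify(cuda_src))
```

Identifiers that merely contain a CUDA name (for example a wrapper called `my_cudaMallocWrapper`) are left alone, because the pattern only matches whole words.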
        
         | faeriechangling wrote:
         | There aren't enough CUDA-capable GPUs to buy. So people will
         | either learn how to use products like the MI300X or they won't
         | have AI.
        
           | epolanski wrote:
           | It's also a matter of price. Nvidia sells their top GPUs at
           | insane premiums. For Microsoft, which is such a huge player,
           | the major AI vendor, and owns a lot of OpenAI, there is no
           | point in cornering themselves into Nvidia dependence.
           | 
           | They benefit from competition, not from bending to one
           | vendor.
        
             | latchkey wrote:
             | > _Nvidia sells their top GPUs at insane premiums._
             | 
             | It is all sold out, for years. You can't sell something you
             | don't have.
             | 
             | https://www.theregister.com/2023/09/08/tsmc_ai_chip_crunch/
        
         | modeless wrote:
         | Now that AI made Nvidia a trillion-dollar company, AMD has
         | finally woken up and realized what they should have ten years
         | ago: that they need to invest in the software side of things
         | more. There is movement now, and you can actually do some AI
         | things on AMD hardware, but it will take a long time for them
         | to catch up to Nvidia.
        
           | latchkey wrote:
           | AMD beat Intel in the server CPU market.
        
           | noxa wrote:
           | challenge accepted ;)
           | 
           | (hello!)
        
           | beebeepka wrote:
           | You seem to have forgotten where AMD was 10 years ago. The
           | company was circling the drain, but they should have invested
           | heavily in the software stack for hardware they could barely
           | afford to design? Brilliant strategy. Now's the time to
           | invest because they finally have the resources. The AI train
           | is not going anywhere anytime soon.
        
         | treprinum wrote:
         | That's the joke. It still is. ROCm is nowhere near production-
         | ready and if MS thinks devs will want to waste their time on
         | random errors then good luck with that business. AMD cards are
         | also super expensive so not sure what their competitiveness is
         | supposed to be?
        
           | latchkey wrote:
           | Expensive is not entirely relevant. They are available (for
           | now).
           | 
           | https://twitter.com/sama/status/1724626002595471740
           | 
           | ROCm has also made a lot of advances in recent times.
           | 
           | https://www.databricks.com/blog/training-llms-scale-amd-
           | mi25...
        
             | kristianp wrote:
             | Thread about that tweet (or should I say xeet) from SamA:
             | https://news.ycombinator.com/item?id=38274427
        
           | epolanski wrote:
           | Microsoft can develop their own software. We are talking about
           | a company that places billions in orders to Nvidia; they can
           | afford to develop the tools, and obviously AMD will throw
           | millions and millions in support to court such a customer.
        
           | brucethemoose2 wrote:
           | The MI300 is a massive GPU compared to even the new H200.
           | 
           | It was designed as a combined CPU/GPU for supercomputers,
           | with shared memory. But then the AI craze hit, so AMD spun a
           | variant into a pure GPU AI accelerator real quick, which they
           | could actually pull off because the GPU silicon is modular.
           | 
           | ...So that's why it costs a fortune. It's really a jury-rigged
           | HPC product.
        
         | brucethemoose2 wrote:
         | AMD _is_ "CUDA compatible" in theory.
         | 
         | Intel is taking a slightly different approach, and is going for
         | "PyTorch compatible."
         | 
         | You will hear endless negative anecdotes about ROCm/OpenVINO,
         | but they both do seem to be getting better with each update.
        
       | epolanski wrote:
       | I'm surprised by the negativity.
       | 
       | Microsoft has the know-how in hardware and software, drivers,
       | APIs, firmware, the resources; it's a major AMD customer in cloud
       | and consumer devices.
       | 
       | Do people think Microsoft and AMD will watch Nvidia cannibalize
       | the market, with Microsoft writing cheques for whatever Nvidia
       | demands?
       | 
       | It's like people forget a major economic rule: when margins are
       | high you will attract competition.
        
         | partiallypro wrote:
         | Similar to the old Slashdot days, HN is still plagued with just
         | outright anti-Microsoft people under any circumstance, even in
         | cases where there is no justification.
        
           | jiggawatts wrote:
           | A more recent thing is just pretending Microsoft (and
           | especially Azure) doesn't even exist.
           | 
           | If pressed, sure, people will admit that these may not be
           | entirely imaginary entities, but if listing technologies or
           | platforms then "oops" they'll just forget.
           | 
           | The best example I saw was a "poster" of cloud big data
           | technologies. Started with Amazon S3, went through Google
           | Bigtable, and then in the corners had companies so small that
           | their own marketing page is the only search result. No
           | mention of Azure anywhere.
           | 
           | There were dozens of logos on that page representing
           | companies with annual revenues smaller than what one of my
           | customers spent on a single Azure Storage Account _by
           | accident_.
        
             | dboreham wrote:
             | Meanwhile they're all committing code to github and pulling
             | packages from npmjs.
        
               | justahuman74 wrote:
               | I've often wondered if Microsoft should rebrand the entire
               | company to GitHub
        
             | doublepg23 wrote:
             | If developers are going out of their way to forget about
             | Microsoft that seems like Microsoft's failing.
        
               | thereisnospork wrote:
               | Not like Microsoft has ever made actively hostile design
               | choices over and over again with its namesake product or
               | anything...
        
               | FirmwareBurner wrote:
               | _> If developers are going out of their way to forget
               | about Microsoft that seems like Microsoft's failing._
               | 
               | The world of developers as a profession is much bigger
               | than some angry activists on Twitter and HN frothing in
               | their echo-chambers.
               | 
               | HN also likes to forget that enterprise also exists. And
               | that IBM or SAP exist. It's just another echo chamber
               | where people prioritize emotions before facts.
        
         | doublepg23 wrote:
         | I've been burned by Microsoft and AMD.
         | 
         | The talking heads on CNBC would probably mention their synergy
         | of "software, drivers, APIs, firmware, the resources" but as a
         | user of their end results - once bitten, twice shy.
        
         | faeriechangling wrote:
         | Never mind cannibalising (?) the market.
         | 
         | There have already been several ChatGPT outages caused by a
         | lack of compute capacity. Azure literally cannot buy Nvidia
         | GPUs fast enough to satisfy customer demand. They are forced to
         | buy alternatives here. There's no real decision to make here.
        
       | ChrisArchitect wrote:
       | Confusing, as they also announced their own Cobalt AI chips
        
         | kristianp wrote:
         | No, that's about Microsoft's chips. This is about AMD's GPUs.
        
       | taspeotis wrote:
       | Imagine the hallucinations we're gonna see with AMD drivers in
       | the mix...
        
       ___________________________________________________________________
       (page generated 2023-11-15 23:01 UTC)