AMD Launches Instinct MI300X AI GPU Accelerator, Up To 60% Faster Than NVIDIA H100

Hassan Mujtaba * Dec 6, 2023 01:27 PM EST

AMD has announced the official launch of its flagship AI GPU accelerator, the MI300X, which offers up to 60% better performance than NVIDIA's H100 in select workloads.

AMD Finally Has The GPU To Tackle NVIDIA In The AI Segment, MI300X Up To 60% Faster Than H100

The AMD Instinct MI300 class of AI accelerators is another chiplet powerhouse that makes use of advanced packaging technologies from TSMC. Today, AMD not only announced the launch of these chips but also shared the first performance benchmarks of the MI300X, which look impressive. AMD first compared general specifications, where its CDNA 3 accelerator offers (versus NVIDIA's H100):

* 2.4X Higher Memory Capacity
* 1.6X Higher Memory Bandwidth
* 1.3X FP8 TFLOPS
* 1.3X FP16 TFLOPS
* Up To 20% Faster Vs H100 (Llama 2 70B) In 1v1 Comparison
* Up To 20% Faster Vs H100 (FlashAttention 2) In 1v1 Comparison
* Up To 40% Faster Vs H100 (Llama 2 70B) In 8v8 Server
* Up To 60% Faster Vs H100 (Bloom 176B) In 8v8 Server

In general LLM kernel TFLOPs, the MI300X offers up to 20% higher performance in FlashAttention-2 and Llama 2 70B. From a platform perspective, comparing an 8x MI300X solution to an 8x H100 solution, we see a much bigger 40% gain in Llama 2 70B and a 60% gain in Bloom 176B.
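As a quick sanity check, the headline memory ratios line up if the baseline is the 80 GB / 3.35 TB/s H100 SXM variant. AMD's slides don't name the exact variant (and the comparison list later in this article cites a 96 GB figure for H100), so that baseline is an assumption here:

```python
# Assumption: AMD's "2.4X capacity / 1.6X bandwidth" claims are measured
# against the 80 GB, 3.35 TB/s H100 SXM variant (not confirmed in the slides).
MI300X_HBM_GB, H100_HBM_GB = 192, 80
MI300X_BW_TBS, H100_BW_TBS = 5.3, 3.35

capacity_ratio = MI300X_HBM_GB / H100_HBM_GB   # 2.4x, matching AMD's claim
bandwidth_ratio = MI300X_BW_TBS / H100_BW_TBS  # ~1.58x, marketed as 1.6x

print(f"capacity: {capacity_ratio:.1f}x, bandwidth: {bandwidth_ratio:.2f}x")
```

With a different H100 SKU as the baseline, the ratios would shift, which is worth keeping in mind when reading vendor comparisons.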
AMD mentions that in training performance, the MI300X is on par with the competition (H100) and offers competitive price/performance, while shining in inferencing workloads. The driving force behind the latest MI300 accelerators is ROCm 6.0. The software stack has been updated to the latest version with powerful new features, including support for AI workloads such as generative AI and large language models.

The new software stack supports the latest compute formats such as FP16, BF16, and FP8 (including sparsity). The optimizations combine to offer up to a 2.6x speedup in vLLM through optimized inference libraries, a 1.4x speedup in HIP Graph through an optimized runtime, and a 1.3x Flash Attention speedup through optimized kernels. ROCm 6 is expected later this month alongside the MI300 AI accelerators. It will be interesting to see how ROCm 6 compares to the latest version of NVIDIA's CUDA stack, which is its real competition.

AMD Instinct MI300X - Challenging NVIDIA's AI Supremacy With CDNA 3 & Huge Memory

The AMD Instinct MI300X is the chip that will be highlighted the most since it is targeted at NVIDIA's Hopper and Intel's Gaudi accelerators within the AI segment. The chip has been designed solely on the CDNA 3 architecture, and there is a lot going on inside. It hosts a mix of 5nm and 6nm IPs that combine to deliver up to 153 billion transistors.

Starting with the design, the main interposer is laid out with a passive die which houses the interconnect layer using a 4th Gen Infinity Fabric solution. The interposer includes a total of 28 dies: eight HBM3 packages, 16 dummy dies between the HBM packages, and four active dies, each of which carries two compute dies.
Each GCD, based on the CDNA 3 GPU architecture, features a total of 40 compute units, which equals 2,560 cores. There are eight compute dies (GCDs) in total, giving us 320 compute units and 20,480 cores. For yields, AMD is scaling back a small portion of these cores, so the shipping part has 304 compute units (38 CUs per GPU chiplet) enabled for a total of 19,456 stream processors.

Memory is another area where you will see a huge upgrade, with the MI300X boasting 50% more HBM3 capacity than its predecessor, the MI250X (128 GB). To achieve a memory pool of 192 GB, AMD is equipping the MI300X with eight 12-Hi HBM3 stacks that incorporate 16 Gb ICs, which gives us 2 GB of capacity per IC, or 24 GB per stack. The memory offers up to 5.3 TB/s of bandwidth and 896 GB/s of Infinity Fabric bandwidth. For comparison, NVIDIA's upcoming H200 AI accelerator offers a 141 GB capacity, while Gaudi 3 from Intel will offer 144 GB. Large memory pools matter a lot in LLMs, which are mostly memory-bound, and AMD can show its AI prowess by leading in the memory department. For comparison:

* Instinct MI300X - 192 GB HBM3
* Gaudi 3 - 144 GB HBM3
* H200 - 141 GB HBM3e
* MI300A - 128 GB HBM3
* MI250X - 128 GB HBM2e
* H100 - 96 GB HBM3
* Gaudi 2 - 96 GB HBM2e

In terms of power consumption, the AMD Instinct MI300X is rated at 750W, a 50% increase over the 500W of the Instinct MI250X and 50W more than the NVIDIA H200.
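The die and memory arithmetic above can be reproduced directly; the figure of 64 stream processors per CU follows from the stated 40 CUs = 2,560 cores:

```python
# Reproducing the MI300X configuration arithmetic from the figures above.
CUS_PER_GCD = 40   # compute units per CDNA 3 compute die (GCD)
GCD_COUNT = 8      # eight GCDs on the package
SP_PER_CU = 64     # 40 CUs = 2,560 cores, so 64 stream processors per CU

full_cus = GCD_COUNT * CUS_PER_GCD      # 320 CUs on silicon
full_sps = full_cus * SP_PER_CU         # 20,480 stream processors
enabled_cus = GCD_COUNT * 38            # 38 CUs per die enabled for yield
enabled_sps = enabled_cus * SP_PER_CU   # 19,456 shipping stream processors

# HBM3: eight 12-Hi stacks of 16 Gb (2 GB) ICs -> 24 GB per stack, 192 GB total
hbm_capacity_gb = 8 * 12 * (16 // 8)

print(full_cus, full_sps, enabled_cus, enabled_sps, hbm_capacity_gb)
# 320 20480 304 19456 192
```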
One configuration showcased is the G593-ZX1/ZX2 series of servers from Gigabyte, which offer up to eight MI300X GPU accelerators and two AMD EPYC 9004 CPUs. These systems can be equipped with up to eight 3,000W power supplies, totaling 18,000W. AMD also showcased its own Instinct MI300X platform, which includes eight of these AI accelerator chips and offers some solid numbers over the NVIDIA HGX H100 platform. Some numbers shared by AMD include:

* 2.4X Higher HBM3 Memory (1.5 TB vs 640 GB)
* 1.3X More Compute FLOPS (10.4 PF vs 7.9 PF)
* Similar Bi-Directional Bandwidth (896 GB/s vs 900 GB/s)
* Similar Single-Node Ring Bandwidth (448 GB/s vs 450 GB/s)
* Similar Networking Capabilities (400 GbE vs 400 GbE)
* Similar PCIe Protocol (PCIe Gen 5 128 GB/s)

For now, AMD should know that its competitors are also going full steam ahead on the AI craze, with NVIDIA already teasing some huge figures for its 2024 Hopper H200 and Blackwell B100 GPUs, and Intel preparing its Gaudi 3 and Falcon Shores GPUs for launch in the coming years. Companies such as Oracle, Dell, Meta, and OpenAI have announced support for AMD's Instinct MI300 AI chips in their ecosystems. One thing is for sure: AI customers will gobble up almost anything they can get, and everyone is going to take advantage of that. But AMD has a very formidable solution, one that aims not just to be an alternative to NVIDIA but a leader in the AI segment.
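The platform-level memory figure is straight multiplication of the per-GPU numbers quoted above (AMD's 640 GB figure corresponding to eight 80 GB H100s):

```python
# Platform math for the 8-GPU comparison AMD quotes above.
GPUS_PER_NODE = 8
MI300X_HBM_GB = 192
HGX_H100_HBM_GB = 640  # AMD's quoted figure, i.e. eight 80 GB H100s

mi300x_platform_gb = GPUS_PER_NODE * MI300X_HBM_GB   # 1,536 GB ~= 1.5 TB
memory_ratio = mi300x_platform_gb / HGX_H100_HBM_GB  # 2.4x, as claimed

print(mi300x_platform_gb, memory_ratio)
# 1536 2.4
```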
AMD Radeon Instinct Accelerators

| Name | CPU Architecture | GPU Architecture | Process Node | GPU Chiplets | GPU Cores | GPU Clock Speed | FP16 Compute | FP32 Compute | FP64 Compute | VRAM | Memory Clock | Memory Bus | Memory Bandwidth | Form Factor | Cooling | TDP (Max) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AMD Instinct MI400 | Zen 5 (Exascale APU) | CDNA 4 | 4nm | TBD | TBD | TBD | TBD | TBD | TBD | TBD | TBD | TBD | TBD | TBD | TBD | TBD |
| AMD Instinct MI300 | Zen 4 (Exascale APU) | Aqua Vanjaram (CDNA 3) | 5nm+6nm | 8 (MCM) | Up To 19,456 | TBA | TBA | TBA | TBA | 192 GB HBM3 | 5.2 Gbps | 8192-bit | 5.3 TB/s | OAM | Passive Cooling | 750W |
| AMD Instinct MI250X | N/A | Aldebaran (CDNA 2) | 6nm | 2 (MCM), 1 per die | 14,080 | 1700 MHz | 383 TOPs | 95.7 TFLOPs | 47.9 TFLOPs | 128 GB HBM2e | 3.2 Gbps | 8192-bit | 3.2 TB/s | OAM | Passive Cooling | 560W |
| AMD Instinct MI250 | N/A | Aldebaran (CDNA 2) | 6nm | 2 (MCM), 1 per die | 13,312 | 1700 MHz | 362 TOPs | 90.5 TFLOPs | 45.3 TFLOPs | 128 GB HBM2e | 3.2 Gbps | 8192-bit | 3.2 TB/s | OAM | Passive Cooling | 500W |
| AMD Instinct MI210 | N/A | Aldebaran (CDNA 2) | 6nm | 2 (MCM), 1 per die | 6,656 | 1700 MHz | 181 TOPs | 45.3 TFLOPs | 22.6 TFLOPs | 64 GB HBM2e | 3.2 Gbps | 4096-bit | 1.6 TB/s | Dual Slot Card | Passive Cooling | 300W |
| AMD Instinct MI100 | N/A | Arcturus (CDNA 1) | 7nm FinFET | 1 (Monolithic) | 7,680 | 1500 MHz | 185 TFLOPs | 23.1 TFLOPs | 11.5 TFLOPs | 32 GB HBM2 | 1200 MHz | 4096-bit | 1.23 TB/s | Dual Slot, Full Length | Passive Cooling | 300W |
| AMD Radeon Instinct MI60 | N/A | Vega 20 | 7nm FinFET | 1 (Monolithic) | 4,096 | 1800 MHz | 29.5 TFLOPs | 14.7 TFLOPs | 7.4 TFLOPs | 32 GB HBM2 | 1000 MHz | 4096-bit | 1 TB/s | Dual Slot, Full Length | Passive Cooling | 300W |
| AMD Radeon Instinct MI50 | N/A | Vega 20 | 7nm FinFET | 1 (Monolithic) | 3,840 | 1725 MHz | 26.5 TFLOPs | 13.3 TFLOPs | 6.6 TFLOPs | 16 GB HBM2 | 1000 MHz | 4096-bit | 1 TB/s | Dual Slot, Full Length | Passive Cooling | 300W |
| AMD Radeon Instinct MI25 | N/A | Vega 10 | 14nm FinFET | 1 (Monolithic) | 4,096 | 1500 MHz | 24.6 TFLOPs | 12.3 TFLOPs | 768 GFLOPs | 16 GB HBM2 | 945 MHz | 2048-bit | 484 GB/s | Dual Slot, Full Length | Passive Cooling | 300W |
| AMD Radeon Instinct MI8 | N/A | Fiji XT | 28nm | 1 (Monolithic) | 4,096 | 1000 MHz | 8.2 TFLOPs | 8.2 TFLOPs | 512 GFLOPs | 4 GB HBM1 | 500 MHz | 4096-bit | 512 GB/s | Dual Slot, Half Length | Passive Cooling | 175W |
| AMD Radeon Instinct MI6 | N/A | Polaris 10 | 14nm FinFET | 1 (Monolithic) | 2,304 | 1237 MHz | 5.7 TFLOPs | 5.7 TFLOPs | 384 GFLOPs | 16 GB GDDR5 | 1750 MHz | 256-bit | 224 GB/s | Single Slot, Full Length | Passive Cooling | 150W |