AMD Launches Instinct MI300X AI GPU Accelerator, Up To 60% Faster Than NVIDIA H100

Hassan Mujtaba * Dec 6, 2023 01:27 PM EST

AMD has announced the official launch of its flagship AI GPU accelerator, the MI300X, which offers up to 60% better performance than NVIDIA's H100 in select workloads.

AMD Finally Has The GPU To Tackle NVIDIA In The AI Segment, MI300X Up To 60% Faster Than H100

The AMD Instinct MI300 class of AI accelerators is another chiplet powerhouse that makes use of advanced packaging technologies from TSMC. Today, AMD not only announced the launch of these chips but also shared the first performance benchmarks of the MI300X, which look impressive. AMD first compared general specifications, where its CDNA 3 accelerator offers (versus NVIDIA's H100):

* 2.4X Higher Memory Capacity
* 1.6X Higher Memory Bandwidth
* 1.3X FP8 TFLOPS
* 1.3X FP16 TFLOPS
* Up To 20% Faster Vs H100 (Llama 2 70B) In 1v1 Comparison
* Up To 20% Faster Vs H100 (FlashAttention 2) In 1v1 Comparison
* Up To 40% Faster Vs H100 (Llama 2 70B) In 8v8 Server
* Up To 60% Faster Vs H100 (Bloom 176B) In 8v8 Server

In general LLM kernel TFLOPs, the MI300X offers up to 20% higher performance in FlashAttention-2 and Llama 2 70B. From a platform perspective, comparing an 8x MI300X solution to an 8x H100 solution, we see a much bigger 40% gain in Llama 2 70B and a 60% gain in Bloom 176B.
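As a quick sanity check, the headline memory ratios line up if the baseline is the 80 GB / 3.35 TB/s H100 SXM variant. AMD's slides don't name the exact variant (and the comparison list later in this article cites a 96 GB figure for H100), so that baseline is an assumption here:

```python
# Assumption: AMD's "2.4X capacity / 1.6X bandwidth" claims are measured
# against the 80 GB, 3.35 TB/s H100 SXM variant (not confirmed in the slides).
MI300X_HBM_GB, H100_HBM_GB = 192, 80
MI300X_BW_TBS, H100_BW_TBS = 5.3, 3.35

capacity_ratio = MI300X_HBM_GB / H100_HBM_GB   # 2.4x, matching AMD's claim
bandwidth_ratio = MI300X_BW_TBS / H100_BW_TBS  # ~1.58x, marketed as 1.6x

print(f"capacity: {capacity_ratio:.1f}x, bandwidth: {bandwidth_ratio:.2f}x")
```

With a different H100 SKU as the baseline, the ratios would shift, which is worth keeping in mind when reading vendor comparisons.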
AMD mentions that in training performance, the MI300X is on par with the competition (H100) and offers competitive price/performance, while shining in inferencing workloads. The driving force behind the latest MI300 accelerators is ROCm 6.0. The software stack has been updated to the latest version with powerful new features, including support for AI workloads such as generative AI and large language models.

The new software stack supports the latest compute formats such as FP16, BF16, and FP8 (including sparsity). The optimizations combine to offer up to a 2.6x speedup in vLLM through optimized inference libraries, a 1.4x speedup in HIP Graph through an optimized runtime, and a 1.3x Flash Attention speedup through optimized kernels. ROCm 6 is expected later this month alongside the MI300 AI accelerators. It will be interesting to see how ROCm 6 compares to the latest version of NVIDIA's CUDA stack, which is its real competition.

AMD Instinct MI300X - Challenging NVIDIA's AI Supremacy With CDNA 3 & Huge Memory

The AMD Instinct MI300X is the chip that will be highlighted the most since it is targeted at NVIDIA's Hopper and Intel's Gaudi accelerators within the AI segment. The chip has been designed solely on the CDNA 3 architecture, and there is a lot going on inside. It hosts a mix of 5nm and 6nm IPs that combine to deliver up to 153 billion transistors.

Starting with the design, the main interposer is laid out with a passive die which houses the interconnect layer using a 4th Gen Infinity Fabric solution. The interposer includes a total of 28 dies: eight HBM3 packages, 16 dummy dies between the HBM packages, and four active dies, each of which carries two compute dies.
Each GCD, based on the CDNA 3 GPU architecture, features a total of 40 compute units, which equals 2,560 cores. There are eight compute dies (GCDs) in total, giving us 320 compute units and 20,480 cores. For yields, AMD is scaling back a small portion of these cores, so the shipping part has 304 compute units (38 CUs per GPU chiplet) enabled for a total of 19,456 stream processors.

Memory is another area where you will see a huge upgrade, with the MI300X boasting 50% more HBM3 capacity than its predecessor, the MI250X (128 GB). To achieve a memory pool of 192 GB, AMD is equipping the MI300X with eight 12-Hi HBM3 stacks that incorporate 16 Gb ICs, which gives us 2 GB of capacity per IC, or 24 GB per stack. The memory offers up to 5.3 TB/s of bandwidth and 896 GB/s of Infinity Fabric bandwidth. For comparison, NVIDIA's upcoming H200 AI accelerator offers a 141 GB capacity, while Gaudi 3 from Intel will offer 144 GB. Large memory pools matter a lot in LLMs, which are mostly memory-bound, and AMD can show its AI prowess by leading in the memory department. For comparison:

* Instinct MI300X - 192 GB HBM3
* Gaudi 3 - 144 GB HBM3
* H200 - 141 GB HBM3e
* MI300A - 128 GB HBM3
* MI250X - 128 GB HBM2e
* H100 - 96 GB HBM3
* Gaudi 2 - 96 GB HBM2e

In terms of power consumption, the AMD Instinct MI300X is rated at 750W, a 50% increase over the 500W of the Instinct MI250X and 50W more than the NVIDIA H200.
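The die and memory arithmetic above can be reproduced directly; the figure of 64 stream processors per CU follows from the stated 40 CUs = 2,560 cores:

```python
# Reproducing the MI300X configuration arithmetic from the figures above.
CUS_PER_GCD = 40   # compute units per CDNA 3 compute die (GCD)
GCD_COUNT = 8      # eight GCDs on the package
SP_PER_CU = 64     # 40 CUs = 2,560 cores, so 64 stream processors per CU

full_cus = GCD_COUNT * CUS_PER_GCD      # 320 CUs on silicon
full_sps = full_cus * SP_PER_CU         # 20,480 stream processors
enabled_cus = GCD_COUNT * 38            # 38 CUs per die enabled for yield
enabled_sps = enabled_cus * SP_PER_CU   # 19,456 shipping stream processors

# HBM3: eight 12-Hi stacks of 16 Gb (2 GB) ICs -> 24 GB per stack, 192 GB total
hbm_capacity_gb = 8 * 12 * (16 // 8)

print(full_cus, full_sps, enabled_cus, enabled_sps, hbm_capacity_gb)
# 320 20480 304 19456 192
```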
One configuration showcased is the G593-ZX1/ZX2 series of servers from Gigabyte, which offer up to eight MI300X GPU accelerators and two AMD EPYC 9004 CPUs. These systems can be equipped with up to eight 3,000W power supplies, totaling 18,000W. AMD also showcased its own Instinct MI300X platform, which includes eight of these AI accelerator chips and offers some solid numbers over the NVIDIA HGX H100 platform. Some numbers shared by AMD include:

* 2.4X Higher HBM3 Memory (1.5 TB vs 640 GB)
* 1.3X More Compute FLOPS (10.4 PF vs 7.9 PF)
* Similar Bi-Directional Bandwidth (896 GB/s vs 900 GB/s)
* Similar Single-Node Ring Bandwidth (448 GB/s vs 450 GB/s)
* Similar Networking Capabilities (400 GbE vs 400 GbE)
* Similar PCIe Protocol (PCIe Gen 5 128 GB/s)

For now, AMD should know that its competitors are also going full steam ahead on the AI craze, with NVIDIA already teasing some huge figures for its 2024 Hopper H200 and Blackwell B100 GPUs, and Intel preparing its Gaudi 3 and Falcon Shores GPUs for launch in the coming years. Companies such as Oracle, Dell, Meta, and OpenAI have announced support for AMD's Instinct MI300 AI chips in their ecosystems. One thing is for sure: AI customers will gobble up almost anything they can get, and everyone is going to take advantage of that. But AMD has a very formidable solution, one that aims not just to be an alternative to NVIDIA but a leader in the AI segment.
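The platform-level memory figure is straight multiplication of the per-GPU numbers quoted above (AMD's 640 GB figure corresponding to eight 80 GB H100s):

```python
# Platform math for the 8-GPU comparison AMD quotes above.
GPUS_PER_NODE = 8
MI300X_HBM_GB = 192
HGX_H100_HBM_GB = 640  # AMD's quoted figure, i.e. eight 80 GB H100s

mi300x_platform_gb = GPUS_PER_NODE * MI300X_HBM_GB   # 1,536 GB ~= 1.5 TB
memory_ratio = mi300x_platform_gb / HGX_H100_HBM_GB  # 2.4x, as claimed

print(mi300x_platform_gb, memory_ratio)
# 1536 2.4
```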
AMD Radeon Instinct Accelerators

| Name | CPU Architecture | GPU Architecture | Process Node | GPU Chiplets | GPU Cores | GPU Clock Speed | FP16 Compute | FP32 Compute | FP64 Compute | VRAM | Memory Clock | Memory Bus | Memory Bandwidth | Form Factor | Cooling | TDP (Max) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AMD Instinct MI400 | Zen 5 (Exascale APU) | CDNA 4 | 4nm | TBD | TBD | TBD | TBD | TBD | TBD | TBD | TBD | TBD | TBD | TBD | TBD | TBD |
| AMD Instinct MI300 | Zen 4 (Exascale APU) | Aqua Vanjaram (CDNA 3) | 5nm+6nm | 8 (MCM) | Up To 19,456 | TBA | TBA | TBA | TBA | 192 GB HBM3 | 5.2 Gbps | 8192-bit | 5.3 TB/s | OAM | Passive Cooling | 750W |
| AMD Instinct MI250X | N/A | Aldebaran (CDNA 2) | 6nm | 2 (MCM), 1 per die | 14,080 | 1700 MHz | 383 TOPs | 95.7 TFLOPs | 47.9 TFLOPs | 128 GB HBM2e | 3.2 Gbps | 8192-bit | 3.2 TB/s | OAM | Passive Cooling | 560W |
| AMD Instinct MI250 | N/A | Aldebaran (CDNA 2) | 6nm | 2 (MCM), 1 per die | 13,312 | 1700 MHz | 362 TOPs | 90.5 TFLOPs | 45.3 TFLOPs | 128 GB HBM2e | 3.2 Gbps | 8192-bit | 3.2 TB/s | OAM | Passive Cooling | 500W |
| AMD Instinct MI210 | N/A | Aldebaran (CDNA 2) | 6nm | 2 (MCM), 1 per die | 6,656 | 1700 MHz | 181 TOPs | 45.3 TFLOPs | 22.6 TFLOPs | 64 GB HBM2e | 3.2 Gbps | 4096-bit | 1.6 TB/s | Dual Slot Card | Passive Cooling | 300W |
| AMD Instinct MI100 | N/A | Arcturus (CDNA 1) | 7nm FinFET | 1 (Monolithic) | 7,680 | 1500 MHz | 185 TFLOPs | 23.1 TFLOPs | 11.5 TFLOPs | 32 GB HBM2 | 1200 MHz | 4096-bit | 1.23 TB/s | Dual Slot, Full Length | Passive Cooling | 300W |
| AMD Radeon Instinct MI60 | N/A | Vega 20 | 7nm FinFET | 1 (Monolithic) | 4,096 | 1800 MHz | 29.5 TFLOPs | 14.7 TFLOPs | 7.4 TFLOPs | 32 GB HBM2 | 1000 MHz | 4096-bit | 1 TB/s | Dual Slot, Full Length | Passive Cooling | 300W |
| AMD Radeon Instinct MI50 | N/A | Vega 20 | 7nm FinFET | 1 (Monolithic) | 3,840 | 1725 MHz | 26.5 TFLOPs | 13.3 TFLOPs | 6.6 TFLOPs | 16 GB HBM2 | 1000 MHz | 4096-bit | 1 TB/s | Dual Slot, Full Length | Passive Cooling | 300W |
| AMD Radeon Instinct MI25 | N/A | Vega 10 | 14nm FinFET | 1 (Monolithic) | 4,096 | 1500 MHz | 24.6 TFLOPs | 12.3 TFLOPs | 768 GFLOPs | 16 GB HBM2 | 945 MHz | 2048-bit | 484 GB/s | Dual Slot, Full Length | Passive Cooling | 300W |
| AMD Radeon Instinct MI8 | N/A | Fiji XT | 28nm | 1 (Monolithic) | 4,096 | 1000 MHz | 8.2 TFLOPs | 8.2 TFLOPs | 512 GFLOPs | 4 GB HBM1 | 500 MHz | 4096-bit | 512 GB/s | Dual Slot, Half Length | Passive Cooling | 175W |
| AMD Radeon Instinct MI6 | N/A | Polaris 10 | 14nm FinFET | 1 (Monolithic) | 2,304 | 1237 MHz | 5.7 TFLOPs | 5.7 TFLOPs | 384 GFLOPs | 16 GB GDDR5 | 1750 MHz | 256-bit | 224 GB/s | Single Slot, Full Length | Passive Cooling | 150W |