https://9to5mac.com/2024/12/18/apple-collaborates-with-nvidia-to-research-faster-llm-performance/

Apple collaborates with NVIDIA to research faster LLM performance

Chance Miller | Dec 18 2024 - 1:33 pm PT

In a blog post today, Apple engineers have shared new details on a collaboration with NVIDIA to implement faster text generation performance with large language models.

Apple published and open sourced its Recurrent Drafter (ReDrafter) technique earlier this year. It represents a new method for generating text with LLMs that is significantly faster and "achieves state of the art performance." It combines two techniques: beam search (to explore multiple possibilities) and dynamic tree attention (to efficiently handle choices).

While its research demonstrated strong results, Apple collaborated with NVIDIA to apply ReDrafter in production.
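
For readers who want a feel for why drafting speeds up generation, here is a minimal sketch of the general draft-and-verify idea behind speculative decoding. It is not ReDrafter itself: it leaves out the beam search and dynamic tree attention described above, and every name in it (`speculative_greedy_decode`, `toy_target`, `toy_draft`) is hypothetical. The two "models" are toy functions, assumed purely for illustration.

```python
from typing import Callable, List, Tuple

# A "model" here is just a function from a token context to the next token.
NextToken = Callable[[List[int]], int]


def plain_greedy_decode(target_next: NextToken, prompt: List[int],
                        max_new_tokens: int) -> List[int]:
    """Baseline: one call to the large model per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(target_next(tokens))
    return tokens


def speculative_greedy_decode(target_next: NextToken, draft_next: NextToken,
                              prompt: List[int], max_new_tokens: int,
                              draft_len: int = 4) -> Tuple[List[int], int]:
    """Draft several tokens cheaply, then let the large model verify them.

    Returns the generated tokens and the number of verification passes.
    """
    tokens = list(prompt)
    limit = len(prompt) + max_new_tokens
    verify_passes = 0
    while len(tokens) < limit:
        # 1. The small draft model proposes draft_len tokens in a row.
        draft: List[int] = []
        for _ in range(draft_len):
            draft.append(draft_next(tokens + draft))

        # 2. The large model checks the proposals. A real system would score
        #    all positions in one batched pass; this toy calls it per position
        #    and counts the whole loop as a single pass.
        verify_passes += 1
        accepted: List[int] = []
        for guess in draft:
            expected = target_next(tokens + accepted)
            accepted.append(expected)  # the large model's token is always kept
            if expected != guess:
                break  # first mismatch: discard the rest of the draft
        else:
            # Every guess matched, so the same pass yields one bonus token.
            accepted.append(target_next(tokens + accepted))
        tokens.extend(accepted)
    return tokens[:limit], verify_passes


def toy_target(context: List[int]) -> int:
    """Stand-in for the large model (an arbitrary deterministic rule)."""
    return (context[-1] * 3 + 1) % 7


def toy_draft(context: List[int]) -> int:
    """Stand-in for the draft model: agrees with the target most of the time."""
    token = toy_target(context)
    return (token + 1) % 7 if len(context) % 3 == 0 else token


if __name__ == "__main__":
    out, passes = speculative_greedy_decode(toy_target, toy_draft, [1], 20)
    print(out == plain_greedy_decode(toy_target, [1], 20), passes)
```

Because every kept token is the one the large model would have picked anyway, the output matches plain greedy decoding; the gain comes from needing fewer passes through the large model than tokens generated.
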
As part of this collaboration, ReDrafter was integrated into NVIDIA TensorRT-LLM, a tool that helps run LLMs faster on NVIDIA GPUs. Here are the results:

To enable the integration of ReDrafter, NVIDIA added new operators or exposed existing ones, which considerably improved TensorRT-LLM's capability to accommodate sophisticated models and decoding methods. ML developers using NVIDIA GPUs can now easily benefit from ReDrafter's accelerated token generation for their production LLM applications with TensorRT-LLM.

In benchmarking a tens-of-billions parameter production model on NVIDIA GPUs, using the NVIDIA TensorRT-LLM inference acceleration framework with ReDrafter, we have seen 2.7x speed-up in generated tokens per second for greedy decoding.

These benchmark results indicate this tech could significantly reduce latency users may experience, while also using fewer GPUs and consuming less power.

"LLMs are increasingly being used to power production applications, and improving inference efficiency can both impact computational costs and reduce latency for users," Apple's machine learning researchers conclude. "With ReDrafter's novel approach to speculative decoding integrated into the NVIDIA TensorRT-LLM framework, developers can now benefit from faster token generation on NVIDIA GPUs for their production LLM applications."

You can learn more about this work on Apple's website and in a blog post on NVIDIA's website:

* Apple: Accelerating LLM Inference on NVIDIA GPUs with ReDrafter
* NVIDIA: NVIDIA TensorRT-LLM Now Supports Recurrent Drafting for Optimizing LLM Inference