[HN Gopher] Every Flop Counts: Scaling a 300B LLM Without Premiu...
       ___________________________________________________________________
        
       Every Flop Counts: Scaling a 300B LLM Without Premium GPUs
        
       Author : bretpiatt
       Score  : 109 points
       Date   : 2025-03-24 12:48 UTC (4 days ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | osti wrote:
        | I think this is the one where they train an LLM without
        | NVIDIA GPUs.
        
         | cavisne wrote:
          | They talk about CUDA-level tracing in their framework. I
          | assume it's just consumer GPUs that Nvidia says aren't meant
          | to be used in datacenters.
        
       | flowerthoughts wrote:
       | They never mention what hardware they're on.
       | 
       | Table 1 is the closest thing. Device specs for six devices:
       | 120-989 TFLOPS and 64-96 GB RAM.
       | 
       | An RTX 5090 is about 105 TFLOPS.
       | 
       | https://www.techpowerup.com/gpu-specs/geforce-rtx-5090.c4216
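
        A quick sanity check of that spread (a minimal sketch; the
        120-989 TFLOPS range and the ~105 TFLOPS RTX 5090 figure are the
        ones quoted above, and the precision behind each number is not
        stated in the thread):

            # Ratio of the Table 1 device range to the quoted RTX 5090 figure.
            # All inputs come from the comment above; precisions may differ.
            rtx_5090_tflops = 105.0
            table1_min_tflops, table1_max_tflops = 120.0, 989.0

            print(f"low end:  {table1_min_tflops / rtx_5090_tflops:.1f}x an RTX 5090")
            print(f"high end: {table1_max_tflops / rtx_5090_tflops:.1f}x an RTX 5090")
            # -> roughly 1.1x at the low end, 9.4x at the high end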
        
       | rahen wrote:
       | I'm pretty surprised by the claimed memory usage for 300B
        | parameters (Table 1). If we compare similar models:
       | 
       | - Llama 3.1 with 405B parameters: 2 TB of memory (FP32), 500 GB
       | (FP8)
       | 
       | - DeepSeek R1 with 671B parameters: 1.3 TB (scaling linearly,
       | around 600 GB for 300B parameters)
       | 
       | Ling claims no more than 96 GB of memory, most likely for
       | inference. That's far more than a 20% reduction. Am I missing
       | something?
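
        For reference, the usual weights-only estimate behind those
        numbers is parameter count times bytes per parameter (a minimal
        sketch; the counts and precisions are the ones mentioned in the
        comment, and the comment's figures also include overhead beyond
        the raw weights):

            # Weights-only memory: parameters * bytes per parameter.
            # Ignores KV cache, activations, and optimizer state.
            def weights_gb(n_params: float, bytes_per_param: float) -> float:
                return n_params * bytes_per_param / 1e9

            for name, n, b in [
                ("Llama 3.1 405B, FP32",   405e9, 4.0),  # ~1.6 TB
                ("Llama 3.1 405B, FP8",    405e9, 1.0),  # ~405 GB
                ("DeepSeek R1 671B, BF16", 671e9, 2.0),  # ~1.3 TB
                ("300B, BF16",             300e9, 2.0),  # ~600 GB
            ]:
                print(f"{name}: ~{weights_gb(n, b):.0f} GB")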
        
         | fxtentacle wrote:
          | Some of these models still produce great results with
          | something as low as 2.7 bits per parameter.
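
        A worked version of that arithmetic (the 2.7 bits/parameter
        figure is from the comment; applying it to a 300B-parameter
        model is an assumption for illustration):

            # Low-bit quantization: weight memory scales with bits per parameter.
            def quantized_gb(n_params: float, bits_per_param: float) -> float:
                return n_params * bits_per_param / 8 / 1e9

            print(f"300B @ 2.7 bits: ~{quantized_gb(300e9, 2.7):.0f} GB")  # ~101 GB
            print(f"300B @ 4.0 bits: ~{quantized_gb(300e9, 4.0):.0f} GB")  # ~150 GB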
        
         | cavisne wrote:
          | I think they only claim that their "Ling-Lite" 17B model can
          | fit on a single 96 GB GPU; their 300B model needs 8 of them
          | (768 GB of HBM).
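
        A rough check of that capacity claim (the 8 x 96 GB figure is
        from the comment; BF16 weights and an even split across GPUs
        are assumptions for illustration):

            # Splitting 300B parameters of BF16 weights across 8 x 96 GB GPUs.
            n_params, bytes_per_param = 300e9, 2.0   # assuming BF16 weights
            n_gpus, gpu_mem_gb = 8, 96

            total_weights_gb = n_params * bytes_per_param / 1e9  # ~600 GB
            per_gpu_gb = total_weights_gb / n_gpus                # ~75 GB

            print(f"weights: ~{total_weights_gb:.0f} GB of {n_gpus * gpu_mem_gb} GB total HBM")
            print(f"per GPU: ~{per_gpu_gb:.0f} GB, leaving ~{gpu_mem_gb - per_gpu_gb:.0f} GB "
                  f"for KV cache and activations")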
        
       ___________________________________________________________________
       (page generated 2025-03-28 23:02 UTC)