[HN Gopher] How to scale your model: A systems view of LLMs on TPUs
       ___________________________________________________________________
        
       How to scale your model: A systems view of LLMs on TPUs
        
       Author : mattjjatgoogle
       Score  : 126 points
       Date   : 2025-02-04 18:56 UTC (4 hours ago)
        
 (HTM) web link (jax-ml.github.io)
 (TXT) w3m dump (jax-ml.github.io)
        
       | perfobotto wrote:
       | What an amazing write up! Thank you very much!
        
       | hassleblad23 wrote:
       | Great writeup. Congrats.
        
       | 3abiton wrote:
        | I am really looking forward to JAX taking over PyTorch/CUDA
        | over the next few years. The whole PTX kerfuffle with the
        | DeepSeek team shows the value of investing in lower-level
        | approaches to squeeze the most out of your hardware.
        
         | kadushka wrote:
          | Most PyTorch users don't even bother with the simplest
          | performance optimizations, and you are talking about PTX.
        
         | throwaway287391 wrote:
         | I like JAX but I'm not sure how an ML framework debate like
         | "JAX vs PyTorch" is relevant to DeepSeek/PTX. The JAX API is at
         | a similar level of abstraction to PyTorch [0]. Both are Python
         | libraries and sit a few layers of abstraction above PTX/CUDA
         | and their TPU equivalents.
         | 
          | [0] Although PyTorch arguably encompasses two levels: both a
          | pure functional library like the JAX API and a "neural
          | network" framework on top of it, whereas JAX doesn't have
          | the latter and leaves that to separate libraries like Flax.
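
          (A minimal sketch of the two levels described above, assuming
          current JAX and Flax (flax.linen) APIs, purely for
          illustration: the JAX core works on pure functions with
          explicit parameters, while the "neural network" layer comes
          from a separate library such as Flax.)

            import jax
            import jax.numpy as jnp
            import flax.linen as nn

            # Pure-functional JAX core: parameters are explicit arguments,
            # and transformations like grad/jit act on plain Python functions.
            def predict(params, x):
                w, b = params
                return x @ w + b

            def loss(params, x, y):
                return jnp.mean((predict(params, x) - y) ** 2)

            grad_fn = jax.jit(jax.grad(loss))

            # The "neural network framework" level lives in a separate
            # library (here Flax), layered on top of the functional core.
            class Linear(nn.Module):
                features: int

                @nn.compact
                def __call__(self, x):
                    return nn.Dense(self.features)(x)

            key = jax.random.PRNGKey(0)
            x, y = jnp.ones((8, 3)), jnp.ones((8, 1))
            params = (jax.random.normal(key, (3, 1)), jnp.zeros((1,)))
            grads = grad_fn(params, x, y)    # gradient of a plain function

            model = Linear(features=1)
            variables = model.init(key, x)   # params live outside the module
            out = model.apply(variables, x)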
        
         | saagarjha wrote:
          | You do understand that PTX is part of CUDA, right?
        
       | mattjjatgoogle wrote:
       | An author's tweet thread:
       | https://x.com/jacobaustin132/status/1886844716446007300
        
         | awongh wrote:
         | Here in the thread he says:
         | https://x.com/jacobaustin132/status/1886844724339675340 : `5
         | years ago, there were many ML architectures, but today, there
         | is (mostly) only one [transformers].`
         | 
         | To what degree is this actually true, and what else is on the
         | horizon that might become as popular as transformers?
        
       | lordswork wrote:
       | This has been my bible for performance work internally at Google.
       | Kind of surprised they released it publicly, but I guess they
       | removed all the Gemini-specific details.
        
       | whatever1 wrote:
       | How do they make these fancy animations?
        
         | alevskaya wrote:
          | Nothing fancy. I made these with some pretty simple
          | hand-written JavaScript scripts rendering to canvas: lots of
          | fiddly little boxes moving around are simpler to script than
          | to hand-animate. (If I were to do much more of this, I might
          | rewrite these in Blender, since it has much nicer authoring
          | tooling and export control.)
        
       | memhole wrote:
       | This is awesome! Can't wait to read it. I've been very curious
       | about why we don't hear more about LLMs on TPUs.
        
       | nicodjimenez wrote:
        | Shameless request for help: if you have experience with
        | seq2seq on TPU and want to do a cool project deploying a
        | world-class PyTorch image-parsing model to TPU (and doing it
        | quickly), please contact me immediately for a well-paid and
        | interesting job opportunity at nico [at] mathpix.com.
        
       | brap wrote:
        | Not strictly related, but does anyone know why JAX uses
        | tracing rather than AST inspection via reflection?
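
        (For context on what "tracing" means here, a small illustration,
        not an answer to the "why": jax.jit and friends call the Python
        function with abstract tracer values and record the primitives
        it applies, producing a jaxpr, rather than reading the
        function's source or AST. A minimal example:)

          import jax
          import jax.numpy as jnp

          def f(x):
              return jnp.sin(x) * 2.0 + x

          # make_jaxpr runs f once with tracer values standing in for x
          # and records the operations encountered, yielding JAX's IR.
          print(jax.make_jaxpr(f)(1.0))
          # Prints something along the lines of:
          #   { lambda ; a:f32[]. let
          #       b:f32[] = sin a
          #       c:f32[] = mul b 2.0
          #       d:f32[] = add c a
          #     in (d,) }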
        
       | eamag wrote:
       | Any way to convert this Jekyll site to a PDF?
        
       ___________________________________________________________________
       (page generated 2025-02-04 23:00 UTC)