[HN Gopher] How to scale your model: A systems view of LLMs on TPUs
___________________________________________________________________
How to scale your model: A systems view of LLMs on TPUs
Author : mattjjatgoogle
Score : 126 points
Date : 2025-02-04 18:56 UTC (4 hours ago)
(HTM) web link (jax-ml.github.io)
(TXT) w3m dump (jax-ml.github.io)
| perfobotto wrote:
| What an amazing write up! Thank you very much!
| hassleblad23 wrote:
| Great writeup. Congrats.
| 3abiton wrote:
| I am really looking forward to JAX taking over PyTorch/CUDA
| over the next few years. The whole PTX kerfuffle with the
| DeepSeek team shows the value of investing in lower-level
| approaches to squeeze the most out of your hardware.
| kadushka wrote:
| Most PyTorch users don't even bother with the simplest
| performance optimizations, and you are talking about PTX.
| throwaway287391 wrote:
| I like JAX but I'm not sure how an ML framework debate like
| "JAX vs PyTorch" is relevant to DeepSeek/PTX. The JAX API is at
| a similar level of abstraction to PyTorch [0]. Both are Python
| libraries and sit a few layers of abstraction above PTX/CUDA
| and their TPU equivalents.
|
| [0] Although PyTorch arguably encompasses two levels: a pure
| functional library similar to the JAX API, plus a "neural
| network" framework on top of it. JAX doesn't have the latter
| and leaves that to separate libraries like Flax.
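|
| To make the comparison concrete, here's a rough sketch (my own
| illustration, not from the article) of the same tiny layer
| written against the pure-functional JAX API and as a Flax
| module; `Dense` and friends are just the usual Flax linen API:
|
|   import jax, jax.numpy as jnp
|   from flax import linen as nn
|
|   # Pure-functional JAX: parameters are explicit arguments.
|   def dense(params, x):
|       w, b = params
|       return x @ w + b
|
|   key = jax.random.PRNGKey(0)
|   params = (jax.random.normal(key, (4, 2)), jnp.zeros(2))
|   y = jax.jit(dense)(params, jnp.ones((3, 4)))
|
|   # Flax: a separate library layers a module system on top.
|   class Model(nn.Module):
|       @nn.compact
|       def __call__(self, x):
|           return nn.Dense(features=2)(x)
|
|   variables = Model().init(key, jnp.ones((3, 4)))
|   y2 = Model().apply(variables, jnp.ones((3, 4)))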
| saagarjha wrote:
| You do understand that PTX is part of CUDA right?
| mattjjatgoogle wrote:
| An author's tweet thread:
| https://x.com/jacobaustin132/status/1886844716446007300
| awongh wrote:
| Here in the thread he says:
| https://x.com/jacobaustin132/status/1886844724339675340 : `5
| years ago, there were many ML architectures, but today, there
| is (mostly) only one [transformers].`
|
| To what degree is this actually true, and what else is on the
| horizon that might become as popular as transformers?
| lordswork wrote:
| This has been my bible for performance work internally at Google.
| Kind of surprised they released it publicly, but I guess they
| removed all the Gemini-specific details.
| whatever1 wrote:
| How do they make these fancy animations?
| alevskaya wrote:
| Nothing fancy. I made these with some pretty simple
| hand-written scripts in JavaScript rendering to canvas: lots of
| fiddly little boxes moving around are easier to script than to
| hand-animate. (If I were to do much more of this I might
| rewrite these in Blender, since it has much nicer authoring
| tooling and export control.)
| memhole wrote:
| This is awesome! Can't wait to read it. I've been very curious
| about why we don't hear more about LLMs on TPUs.
| nicodjimenez wrote:
| Shameless request for help: if anybody has experience with
| seq2seq on TPU, and you want to do a cool project to deploy a
| world-class PyTorch image parsing model to TPU (and do this
| quickly), please contact me immediately for a well-paid and
| interesting job opportunity at nico [at] mathpix.com.
| brap wrote:
| Not strictly related, but does anyone know why JAX uses tracing
| and not AST via reflection?
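|
| For context, by "tracing" I mean that JAX runs your Python
| function with abstract tracer values and records the ops it
| sees, rather than reading the function's source or AST. A
| rough sketch with the standard API:
|
|   import jax, jax.numpy as jnp
|
|   def f(x):
|       # Python control flow runs at trace time, so only the
|       # branch actually taken ends up in the recorded jaxpr.
|       if x.ndim == 1:
|           x = x[None, :]
|       return jnp.sum(x * 2.0)
|
|   # make_jaxpr calls f with tracer values and shows the IR
|   print(jax.make_jaxpr(f)(jnp.ones(3)))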
| eamag wrote:
| Any way to convert this Jekyll site to a PDF?
___________________________________________________________________
(page generated 2025-02-04 23:00 UTC)