[HN Gopher] Trillium TPU Is GA
       ___________________________________________________________________
        
       Trillium TPU Is GA
        
       Author : gok
       Score  : 97 points
       Date   : 2024-12-11 15:44 UTC (7 hours ago)
        
 (HTM) web link (cloud.google.com)
 (TXT) w3m dump (cloud.google.com)
        
       | xnx wrote:
       | > We used Trillium TPUs to train the new Gemini 2.0,
       | 
       | Wow. I knew custom Google silicon was used for inference, but I
       | didn't realize it was used for training too. Does this mean
       | Google is free of dependence on Nvidia GPUs? That would be a huge
       | advantage over AI competitors.
        
         | m3kw9 wrote:
         | Maybe only for their own models
        
           | walterbell wrote:
           | Now any Google customer can use Trillium for training any
           | model?
        
             | richards wrote:
             | [Google employee] Yes, you can use TPUs in Compute Engine
             | and GKE, among other places, for whatever you'd like. I
             | just checked and the v6 are available.
        
               | KaoruAoiShiho wrote:
                | Is there not going to be a v6p?
        
               | richards wrote:
               | Can't speculate on futures, but here's the current
               | version log ... https://cloud.google.com/tpu/docs/system-
               | architecture-tpu-vm...
        
           | xnx wrote:
           | Google trained Llama-2-70B on Trillium chips
        
             | monocasa wrote:
              | I thought Llama was trained by Meta.
        
             | DrBenCarson wrote:
             | > Google trained Llama
             | 
             | Source? This would make quite the splash in the market
        
               | xnx wrote:
               | It's in the article: "When training the Llama-2-70B
               | model, our tests demonstrate that Trillium achieves near-
               | linear scaling from a 4-slice Trillium-256 chip pod to a
               | 36-slice Trillium-256 chip pod at a 99% scaling
               | efficiency."
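                | 
                | Back-of-envelope reading of that number (my own
                | arithmetic, not from the article):
                | 
                |     scale_up = 36 / 4    # 9x more slices
                |     speedup = 0.99 * scale_up   # ~8.91x vs ideal 9x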
        
               | llm_nerd wrote:
               | I'm pretty sure they're doing fine-tune training, using
               | Llama because it is a widely known and available sample.
               | They used SDXL elsewhere for the same reason.
               | 
                | Llama 2 was released well over a year ago and was a
                | collaboration between Meta and Microsoft.
        
               | hhh wrote:
               | They can just train another one.
        
               | llm_nerd wrote:
               | Llama 2 end weights are public. The data used to train
               | it, or even the process used to train it, are not. Google
               | can't just train another Llama 2 from scratch.
               | 
               | They could train something similar, but it'd be super
               | weird if they called it Llama 2. They could call it
               | something like "Gemini", or if it's open weights,
               | "Gemma".
        
         | Permit wrote:
         | My understanding is that the Trillium TPU was primarily
         | targeted at inference (so it's surprising to see it was used to
         | train Gemini 2.0) but other generations of TPUs have targeted
         | training. For example the chip prior to this one is called TPU
         | v5p and was targeted toward training.
        
         | dekhn wrote:
          | Google's TPU silicon has been used for training for at least
          | 5 years, probably more (I think it's 10 years). They do not
          | depend on Nvidia GPUs for the majority of their projects. It
          | took TPUs a while to catch up on some details, like sparsity.
        
           | summerlight wrote:
            | This aligns with my knowledge. I don't know much about
            | LLMs, but TPUs have been used for training deep prediction
            | models in ads since at least 2018, though there were some
            | gaps filled by CPU/GPU for a while. Nowadays, TPU capacity
            | is probably more than CPU and GPU combined.
        
             | felarof wrote:
             | +1, almost all (if not all) Google training runs on TPU.
             | They don't use NVIDIA GPUs at all.
        
               | dekhn wrote:
               | at some point some researchers were begging for GPUs...
                | mainly for sparse work. I think that's why SparseCore was
               | added to TPU (https://cloud.google.com/tpu/docs/system-
               | architecture-tpu-vm...) in v4. I think at this point with
               | their tech turnaround time they can catch up as
               | competitors add new features and researchers want to use
               | them.
        
               | felarof wrote:
               | dumb question: wdym by sparse work? Is it embedding
               | lookups?
               | 
               | (TPUs have had BarnaCore for efficient embedding lookups
               | since TPU v3)
        
               | dekhn wrote:
               | Mostly embedding, but IIRC DeepMind RL made use of
               | sparsity- basically, huge matrices with only a few non-
               | zero elements.
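                | 
                | A toy sketch of what that looks like (my own example,
                | nothing to do with the actual TPU/SparseCore stack;
                | assumes JAX's jax.experimental.sparse):
                | 
                |     import jax.numpy as jnp
                |     from jax.experimental import sparse
                | 
                |     # Mostly zeros: keep only values + indices.
                |     dense = jnp.zeros((1000, 64)).at[3, 10].set(1.0)
                |     m = sparse.BCOO.fromdense(dense)
                |     v = jnp.ones(64)
                |     out = m @ v   # matvec using only stored entries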
               | 
               | BarnaCore existed and was used, but was tailored mostly
               | for embeddings. BTW, IIRC they were called that because
               | they were added "like a barnacle hanging off the side".
               | 
                | The evolution of TPU has been interesting to watch; I
                | came from the HPC and supercomputing space, and
                | watching Google stay mostly-CPU for the longest time
                | and then finally learn how to build "supercomputers"
                | over a decade-plus (gradually adding many features
                | that classical supercomputers have long had) was a
                | very interesting process. There were some very
                | expensive mistakes along the way. But now they've paid
                | down almost all the expensive up-front costs and can
                | ride on the margins, adding new bits and pieces while
                | increasing the clocks and capacities on a cadence.
        
               | amelius wrote:
               | Do they have the equivalent of CUDA, and what is it
               | called?
        
         | lern_too_spel wrote:
         | Since TPUv2, announced in 2017:
         | https://arstechnica.com/information-technology/2017/05/googl...
         | 
          | The hyperscalers are all working on this.
         | https://aws.amazon.com/ai/machine-learning/trainium/
        
         | drusepth wrote:
         | Why is that a huge advantage over AI competitors? Just not
         | having to fight for limited Nvidia supply?
        
           | aseipp wrote:
           | That is one factor, but another is total cost of ownership.
           | At large scales something that's 1/2 the overall speed but
           | 1/3rd the total cost is still a net win by a large margin.
            | This is one of the reasons why every major hyperscaler
            | is, to some extent, developing its own hardware, e.g.
            | Meta, which famously has an insane number of Nvidia GPUs.
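            | 
            | To make that concrete with the hypothetical numbers above
            | (my arithmetic, nothing official):
            | 
            |     speed = 0.5          # half the per-chip speed
            |     cost = 1.0 / 3.0     # a third of the total cost
            |     perf_per_dollar = speed / cost   # 1.5x the competition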
           | 
           | Of course this does not mean their models will necessarily be
           | proportionally better, nor does it mean Google won't buy GPUs
            | for other reasons (like providing them to customers on
            | Google Cloud).
        
           | xnx wrote:
           | Yes. And cheaper operating cost per TFLOP.
        
           | badlucklottery wrote:
           | Vertical integration.
           | 
           | Nvidia is making big bucks "selling shovels in a gold rush".
           | Google has made their own shovel factory and they can avoid
           | paying Nvidia's margins.
        
           | bufferoverflow wrote:
            | TPUs are cheaper and faster than GPUs. But they're custom
            | silicon, which means the barrier to entry is very, very
            | high.
        
             | felarof wrote:
             | > Which means barrier to entry is very very high.
             | 
             | +1 on this. The tooling to use TPUs still needs more work.
             | But we are betting on building this tooling and unlocking
             | these ASIC chips (https://github.com/felafax/felafax).
        
         | felarof wrote:
          | TPUs have been used for training for a long time.
          | 
          | (PS: we are a startup trying to make TPUs more accessible; if
          | you wanna fine-tune Llama 3 on TPU, check out
          | https://github.com/felafax/felafax)
        
       | randomcatuser wrote:
        | How good is Trillium/TPU compared to Nvidia? It seems the
        | stats are: TPU v6e achieves ~900 TFLOPS per chip (fp16), while
        | an Nvidia H100 achieves ~1800 TFLOPS per GPU (fp16)?
       | 
       | Would be neat if anyone has benchmarks!!
        
         | chessgecko wrote:
          | The 1800 on the H100s is with 2:4 structured sparsity; it's
          | half that without. Not sure if the TPU number is doing that
          | too, but I don't think 2:4 sparsity is used that heavily, so
          | I would probably compare without it.
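          | 
          | Rough numbers for a dense-vs-dense comparison (my arithmetic,
          | using the figures quoted above and assuming the TPU number is
          | already a dense figure):
          | 
          |     h100_sparse = 1800             # fp16 with 2:4 sparsity
          |     h100_dense = h100_sparse / 2   # ~900 TFLOPS dense
          |     tpu_v6e = 900                  # quoted per-chip figure
          |     # roughly comparable per chip on raw TFLOPS alone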
        
       | amelius wrote:
       | Are the Gemini models open?
        
         | stefan_ wrote:
         | Not even to Google customers most days, it seems.
        
         | jnwatson wrote:
         | Just the Gemma models are open.
        
       | blackeyeblitzar wrote:
       | So Google has Trillium, Amazon has Trainium, Apple is working on
       | a custom chip with Broadcom, etc. Nvidia's moat doesn't seem that
       | big.
       | 
       | Plus big tech companies have the data and customers and will
       | probably be the only surviving big AI training companies. I doubt
       | startups can survive this game - they can't afford the chips,
       | can't build their own, don't have existing products to leech data
       | off of, and don't have control over distribution channels like OS
        | or app stores.
        
         | talldayo wrote:
         | > Nvidia's moat doesn't seem that big.
         | 
         | Well, look at it this way. Nvidia played their cards so well
         | that their competitors had to invent entirely new product
         | categories to supplant their demand for Nvidia hardware. This
         | new hardware isn't even reprising the role of CUDA, just the
         | subset of tensor operations that are used for training and AI
         | inference. If demand for training and inference wanes, these
         | hardware investments will be almost entirely wasted.
         | 
          | Nvidia's core competencies - scaling hardware up and down,
          | providing good software interfaces, and selling direct to
          | consumer - are not really assailed at all. The big lesson
          | Nvidia is giving to the industry is that you _should_ invest
          | in complex GPU architectures and write the software to
          | support it. Currently the industry is trying its hardest to
          | reject that philosophy, and only time will tell if they're
          | correct.
        
           | felarof wrote:
           | > CUDA, just the subset of tensor operations that are used
           | for training and AI inference. If demand for training and
           | inference wanes
           | 
            | Interesting take, but why would demand for training and
            | inference wane? This seems like a very contrarian take.
        
             | talldayo wrote:
             | Maybe it won't - I say "time will tell" because we really
             | do not know how much LLMs will be demanded in 10 years.
             | Nvidia's stock skyrocketed because they were incidentally
             | prepared for an enormous increase in demand _the moment_ it
             | happened. Now that expectations are cooling down and Sam
             | Altman is signalling that AGI is a long ways off, the math
              | that justified designing NPU/TPU hardware in-house might
             | not add up anymore. Even if you believe in the tech itself,
             | the hype is cooling and the do-or-die moment is rapidly
             | approaching.
             | 
             | My overall point is that I think Nvidia played smartly from
             | the start. They could derive profit from any sufficiently
             | large niche their competitors were too afraid to exploit,
             | and general purpose GPU compute was the perfect investment.
             | With AMD, Apple and the rest of the industry focusing on
             | simpler GPUs, Nvidia was given an empty soapbox to market
             | CUDA with. The big question is whether demand for CUDA can
             | be supplanted with application-specific accelerators.
        
               | felarof wrote:
               | > The big question is whether demand for CUDA can be
               | supplanted with application-specific accelerators.
               | 
               | At least for AI workloads, Google's XLA compiler and the
               | JAX ML framework have reduced the need for something like
               | CUDA.
               | 
               | There are two main ways to train ML models today:
               | 
               | 1) Kernel-heavy approach: This is where frameworks like
               | PyTorch are used, and developers write custom kernels
               | (using Triton or CUDA) to speed up certain ops.
               | 
               | 2) Compiler-heavy approach: This uses tools like XLA,
               | which apply techniques like op fusion and compiler
               | optimizations to automatically generate fast, low-level
               | code.
               | 
               | NVIDIA's CUDA is a major strength in the first approach.
               | But if the second approach gains more traction, NVIDIA's
               | advantage might not be as important.
               | 
               | And I think the second approach has a strong chance of
               | succeeding, given that two massive companies--Google
               | (TPUs) and Amazon (Trainium)--are heavily investing in
               | it.
               | 
                | (PS: I'm also a bit biased towards approach 2; we
                | build Llama 3 fine-tuning on TPU:
                | https://github.com/felafax/felafax)
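                | 
                | For anyone unfamiliar, a minimal sketch of the
                | "compiler-heavy" style (my toy example; XLA picks the
                | fusion and the target hardware, no hand-written
                | kernels):
                | 
                |     import jax
                |     import jax.numpy as jnp
                | 
                |     @jax.jit   # XLA compiles and fuses the whole fn
                |     def layer(x, w, b):
                |         # matmul + bias + relu fused by the compiler
                |         return jax.nn.relu(x @ w + b)
                | 
                |     x = jnp.ones((8, 128))
                |     w = jnp.ones((128, 64))
                |     b = jnp.zeros(64)
                |     out = layer(x, w, b)   # same code on CPU/GPU/TPU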
        
               | 01100011 wrote:
               | It's weird to me that folks think NVDA is just sitting
               | there, waiting for everyone to take their lunch. Yes, I'm
               | totally sure NVDA is completely blind to competition and
               | has chosen to sit on their cash rather than develop
               | alternatives...</s>
        
           | tada131 wrote:
           | > If demand for training and inference wanes, these hardware
           | investments will be almost entirely wasted
           | 
            | Nvidia also needs to invent something then, as pumping
            | mining (or catering to gamers) again is not sexy. What's
            | next? Will we finally compute for drug development and
            | achieve results just as great as with chatbots?
        
             | talldayo wrote:
              | > Nvidia also needs to invent something then, as pumping
              | mining (or catering to gamers) again is not sexy.
             | 
              | They do! Their research page is well worth checking out;
              | they wrote a lot of the fundamental papers that people
              | cite for machine learning today:
             | https://research.nvidia.com/publications
             | 
             | > Will we finally compute for drug development and achieve
             | just as great results as with chatbots?
             | 
             | Maybe - but they're not really analogous problem spaces.
             | Fooling humans with text is easy - Markov chains have been
             | doing it for decades. Automating the discovery of drugs and
                | research of proteins is not quite so easy; rudimentary
             | attempts like Folding@Home went on for years without any
             | breakthrough discoveries. It's going to take a lot more
             | research before we get to ChatGPT levels of success. But
             | tools like CUDA certainly help with this by providing
             | flexible compute that's easy to scale.
        
               | dekhn wrote:
               | There was nothing rudimentary about Folding@Home (either
               | in the MD engine or the MSM clustering method), and my
                | paper on GPCRs that used Folding@Home regularly gets
                | cited by pharma (we helped establish the idea that
                | treating proteins as a single structure at the global
                | energy minimum was too simplistic a basis for drug
                | design). But F@H was never really a serious attempt at
                | drug discovery - it was intended to probe the underlying
               | physics of protein folding, which is tangentially
               | related.
               | 
               | In drug discovery, we'd love to be able to show that
                | virtual screening really worked - if you could do
                | docking against a protein to find good leads
                | affordably, and also ensure that the resulting leads
                | were likely to pass FDA review (i.e., effective and
                | non-toxic), that could
               | potentially greatly increase the rate of discovery.
        
         | llm_nerd wrote:
          | It seems this way, but we've been saying this for years and
          | years, and somehow Nvidia keeps making more and more money.
          | 
          | Isn't it telling that Google's release of an "AI" chip
          | doesn't include a single reference to Nvidia or its products?
          | They're releasing it for general availability, for people to
          | build solutions on, so it's pretty weird that there aren't
          | comparisons to H100s et al. All of their comparisons are to
          | their own prior generations, which you do if you're the
          | leader (e.g. Apple does it with their chips), but it's a
          | notable gap when you're a contender.
        
           | jeffbee wrote:
           | Google posted TPUv6 results for a few things on MLCommons in
           | August. You can compare them to H100 over there, at least for
           | inference on stable diffusion xl.
           | 
           | Suspiciously there is a filter for "TPU-trillium" in the
           | training results table, but no result using such an
           | accelerator. Maybe results were there and later redacted, or
           | have been embargoed.
        
         | mlboss wrote:
         | The biggest barrier for any Nvidia competitor is that hackers
         | can run the models on their desktop. You don't need a cloud
         | provider specific model to do stuff locally.
        
           | r3trohack3r wrote:
           | This. I suspect consumer brands focusing on consumer hardware
           | are going to make a bigger dent in this space than cloud
           | vendors.
           | 
           | The future of AI is local, not remote.
        
       | Hilift wrote:
       | "we constantly strive to enhance the performance and efficiency
       | of our Mamba and Jamba language models."
       | 
       | ... "The growing importance of multi-step reasoning at inference
       | time necessitates accelerators that can efficiently handle the
       | increased computational demands."
       | 
        | Unlike others, my main concern with AI is that any savings we
        | got from converting petroleum generating plants to wind/solar
        | were blasted away by AI power consumption months or even years
        | ago. Maybe Microsoft is on to something with the TMI revival.
        
         | beepbooptheory wrote:
         | This has been a constant thought for me as well. Like, the plan
          | from what I can tell is that we are going to start spinning
          | all this stuff up every single time someone searches something
          | on Google, or perhaps when someone would _otherwise_ search
         | something on there.
         | 
         | Just that alone feels like an absolutely massive load to bear!
          | But it's only a drop in the bucket compared to everything else
         | around this stuff.
         | 
         | But while I may be thirsty and hungry in the future, at least I
          | will (maybe) be able to know how many r's are in "strawberry".
        
         | r3trohack3r wrote:
          | Energy availability at this point appears (to me at least)
          | to be practically infinite, in the sense that it is
          | technically finite but not for any definition of the word
          | that is meaningful for Earth or humans at this stage.
         | 
         | I don't see our current energy production scaling to meet the
         | demands of AI. I see a lot of signals that most AI players feel
         | the same. From where I'm sitting, AI is already accelerating
         | energy generation to meet demand.
         | 
         | If your goal is to convert the planet to clean energy, AI seems
         | like one of the most effective engines for doing that. It's
         | going to drive the development of new technologies (like small
         | modular nuclear reactors) pushing down the cost of construction
         | and ownership. Strongly suspect that, in 50 years, the new
         | energy tech that AI drives development of will have rendered
         | most of our current energy infrastructure worthless.
         | 
         | We will abandon current forms of energy production not because
         | they were "bad" but because they were rendered inefficient.
        
       | peepeepoopoo95 wrote:
       | Can we please pop this insane Nvidia valuation bubble now?
        
       ___________________________________________________________________
       (page generated 2024-12-11 23:00 UTC)