[HN Gopher] Ironwood: The first Google TPU for the age of inference
___________________________________________________________________
Ironwood: The first Google TPU for the age of inference
Author : meetpateltech
Score : 324 points
Date : 2025-04-09 12:24 UTC (10 hours ago)
(HTM) web link (blog.google)
(TXT) w3m dump (blog.google)
| fancyfredbot wrote:
| It looks amazing but I wish we could stop playing silly games
| with benchmarks. Why compare fp8 performance in ironwood to
| architectures which don't support fp8 in hardware? Why leave out
| TPUv6 in the comparison?
|
| Why compare fp64 flops in the El Capitan supercomputer to fp8
| flops in the TPU pod when you know full well these are not
| comparable?
|
| [Edit: it turns out that El Capitan is actually faster when
| compared like for like, and the statement below underestimated
| how much slower fp64 is; my original comment, in italics below,
| is not accurate] ( _The TPU would still be faster even allowing
| for the fact that fp64 is ~8x harder than fp8. Is it worthwhile
| to misleadingly claim it's 24x faster instead of honestly saying
| it's 3x faster? Really?_)
|
| It comes across as a bit cheap. Using misleading statements is a
| tactic for snake oil salesmen. This isn't snake oil, so why
| lower yourself?
| shihab wrote:
| I went through the article and it seems you're right about the
| comparison with El Capitan. These performance figures are so
| bafflingly misleading.
|
| And so unnecessary, too: nobody shopping for an AI inference
| server cares at all about its relative performance vs. an fp64
| machine. This language seems designed solely to wow
| tech-illiterate C-suites.
| cheptsov wrote:
| I think it's not misleading, but rather very clear that there
| are problems. v7 is compared to v5e. Also, notice that it's not
| compared to competitors, and the price isn't mentioned.
| Finally, I think the much bigger issue with TPU is the software
| and developer experience. Without improvements there, there's
| close to zero chance that anyone besides a few companies will
| use TPU. It's barely viable if the trend continues.
| latchkey wrote:
| The reference to El Capitan is to a competitor.
| cheptsov wrote:
| Are you suggesting NVIDIA is not a competitor?
| latchkey wrote:
| You said: "notice that it's not compared to competitors"
|
| The article says: "When scaled to 9,216 chips per pod for
| a total of 42.5 Exaflops, Ironwood supports more than 24x
| the compute power of the world's largest supercomputer -
| El Capitan - which offers just 1.7 Exaflops per pod."
|
| It is literally compared to a competitor.
| cheptsov wrote:
| I believe my original sentence was accurate. I was
| expecting the article to provide an objective comparison
| between TPUs and their main competitors. If you're
| suggesting that El Capitan is the primary competitor, I'm
| not sure I agree, but I appreciate the perspective.
| Perhaps I was looking for other competitors, which is why
| I didn't really pay attention to El Capitan.
| latchkey wrote:
| Andrey, this is what I'm referring to:
| https://news.ycombinator.com/item?id=43632709
| cheptsov wrote:
| Yea, makes sense
| sebzim4500 wrote:
| >Without improvements there, there's close to zero chance
| that anyone besides a few companies will use TPU. It's barely
| viable if the trend continues.
|
| I wonder whether Google sees this as a problem. In a way it
| just means more AI compute capacity for Google.
| mupuff1234 wrote:
| > besides a few companies will use TPU. It's barely viable if
| the trend continues
|
| That doesn't matter much if those few companies are the biggest
| companies. Even with Nvidia, the majority of the revenue is
| generated by a handful of hyperscalers.
| imtringued wrote:
| Also, there is no such thing as an "El Capitan pod". The quoted
| number is for the entire supercomputer.
|
| My impression from this is that they are too scared to say that
| their TPU pod is equivalent to 60 GB200 NVL72 racks in terms of
| fp8 flops.
|
| I can only assume that they need way more than 60 racks and
| they want to hide this fact.
| jeffbee wrote:
| A max-spec v5p deployment, at least the biggest one they'll
| let you rent, occupies 140 racks, for reference.
| aaronax wrote:
| 8960 chips in those 140 racks. $4.20/hour/chip or
| ~$3,066/month/chip.
|
| So ~$38k per hour or ~$27 million per month.
|
| Get 55% off with a 3-year commitment.
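|
| A quick sanity check on those figures (a Python sketch, assuming
| ~730 hours/month and the on-demand rate above):
|
|     chips = 8960
|     hourly = 4.20                        # $/chip-hour, on demand
|     total_hourly = chips * hourly        # ~$37.6k/hour for the pod
|     total_monthly = total_hourly * 730   # ~$27.5M/month
|     committed = total_monthly * 0.45     # ~55% off, 3-year commit
|     print(total_hourly, total_monthly, committed)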
| charcircuit wrote:
| >Why compare fp8 performance in ironwood to architectures which
| don't support fp8 in hardware?
|
| Because end users want to use fp8. Why should architectural
| differences matter when the speed is what matters at the end of
| the day?
| bobim wrote:
| GP bikes are faster than dirt bikes, but not on dirt. The
| context has some influence here.
| zipy124 wrote:
| Because it is a public company that aims to maximise shareholder
| value and thus the value of its stock. Since value is largely a
| matter of perception, if you can convince people your product is
| better than it is, your stock valuation, at least in the short
| term, will be higher.
|
| Hence Tesla saying FSD and robo-taxis are 1 year away, the
| fusion companies saying fusion is closer than it is etc....
|
| Nvidia, AMD, Apple, and Intel have all been publishing
| misleading graphs for decades, and even under constant criticism
| they continue to.
| fancyfredbot wrote:
| I understand the value of perception.
|
| A big part of my issue here is that they've really messed up
| the misleading benchmarks.
|
| They've failed to compare to the most obvious alternative,
| which is Nvidia GPUs. They look like they've got something to
| hide, not like they're ahead.
|
| They've needlessly made their own current products look bad in
| comparison to this one, understating the long-standing advantage
| TPUs have given Google.
|
| Then they've gone and produced a misleading comparison to the
| wrong product (who cares about El Capitan? I can't rent
| that!). This is a waste of credibility. If you are going to
| go with misleading benchmarks then at least compare to
| something people care about.
| segmondy wrote:
| Why not? If we line up to race, you can't ask why we're
| comparing a V8 to a V6 turbo or an electric engine. It's a race;
| the drivetrain doesn't matter. Who gets to the finish line
| first?
|
| No one is shopping for GPUs by fp8, fp16, fp32, fp64. It's all
| about the cost/performance factor. 8 bits is as good as 32 bits,
| and great performance is even being pulled out of 4 bits...
| fancyfredbot wrote:
| This is like saying I'm faster because I ran (a mile) in 8
| minutes whereas it took you 15 minutes (to run two miles).
| fancyfredbot wrote:
| It's even worse than I thought. El Capitan has 43,808 MI300A
| APUs. According to AMD, each MI300A can do 3922TF of sparse FP8,
| for a total of 171EF of sparse FP8 performance, or ~86EF
| non-sparse.
|
| In other words, El Capitan is between 2 and 4 times as fast as
| one of these pods, yet they claim the pod is 24x faster than El
| Capitan.
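|
| The arithmetic, as a quick Python sketch (per-chip rates as
| quoted by AMD; "dense" here just means no sparsity):
|
|     mi300a_count = 43_808
|     sparse_fp8_tf = 3_922                 # TFLOPS per MI300A
|     sparse_ef = mi300a_count * sparse_fp8_tf / 1e6   # ~171.8 EF
|     dense_ef = sparse_ef / 2                         # ~85.9 EF
|     ironwood_pod_ef = 42.5
|     print(dense_ef / ironwood_pod_ef)    # ~2x, dense FP8
|     print(sparse_ef / ironwood_pod_ef)   # ~4x, with sparsity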
| adrian_b wrote:
| FP64 is more like 64 times harder than FP8.
|
| Actually the cost is even much higher, because the cost ratio
| is not much less than the square of the ratio between the sizes
| of the significands, which in this case is 52 bits / 4 bits =
| 13, and the square of 13 is 169.
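|
| A rough sketch of that scaling argument (Python; this models
| multiplier cost as roughly quadratic in significand width and
| ignores exponent handling entirely):
|
|     fp64_sig = 52   # stored significand bits, IEEE 754 double
|     fp8_sig = 4     # assumed 4-bit significand field for FP8
|     ratio = fp64_sig / fp8_sig
|     print(ratio, ratio ** 2)   # 13.0, 169.0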
| christkv wrote:
| Memory size and bandwidth go up a lot, right?
| dekhn wrote:
| Google shouldn't do that comparison. When I worked there I
| strongly urged the TPU leadership not to compare their systems
| to supercomputers - not only were the comparisons misleading,
| Google absolutely does not want supercomputer users to switch to
| TPUs. SC users are demanding and require huge support.
| lawlessone wrote:
| Can these be repurposed for other things? Encoding/decoding
| video? Graphics processing etc?
|
| edit: >It's a move from responsive AI models that provide real-
| time information for people to interpret, to models that provide
| the proactive generation of insights and interpretation. This is
| what we call the "age of inference" where AI agents will
| proactively retrieve and generate data to collaboratively deliver
| insights and answers, not just data.
|
| Maybe I will sound like a Luddite, but I'm not sure I want this.
|
| I'd rather AI/ML only do what I ask it to.
| vinkelhake wrote:
| Google already has custom ASICs for video transcoding. YouTube
| has been running those for many years now.
|
| https://streaminglearningcenter.com/encoding/asics-vs-softwa...
| lawlessone wrote:
| Thank you :)
| cavisne wrote:
| The JAX docs have a good explanation for how a TPU works
|
| https://docs.jax.dev/en/latest/pallas/tpu/details.html#what-...
|
| It's not really useful for other workloads (unless your workload
| looks like a bunch of matrix multiplications).
| no_wizard wrote:
| Some honest competition in the chip space in the machine learning
| race! Genuinely interested to see how this ends up playing out.
| Nvidia seemed 'untouchable' for so long in this space that it's
| nice to see things get shaken up.
|
| I know they aren't selling the TPU as boxed units, but still,
| even as hardware that backs GCP services and whatnot, it's
| interesting to see how it'll shake out!
| epolanski wrote:
| > Nvidia seemed 'untouchable' for so long in this space that
| it's nice to see things get shaken up.
|
| Did it?
|
| Both Mistral's LeChat (running on Cerebras) and Google's Gemini
| (running on TPUs) clearly showed ages ago that Nvidia has no
| advantage at all in inference.
|
| The hundreds of billions spent on hardware till now have focused
| on training, but inference is in the long run gonna get the
| lion's share of the work.
| wyager wrote:
| > but inference is in the long run gonna get the lion's share
| of the work.
|
| I'm not sure - might not the equilibrium state be that we are
| constantly fine-tuning models with the latest data (e.g.
| social media firehose)?
| nharada wrote:
| The first specifically designed for inference? Wasn't the
| original TPU inference only?
| jeffbee wrote:
| Yeah that made me chuckle, too. The original was indeed
| inference-only.
| dgacmu wrote:
| Yup. (Source: was at brain at the time.)
|
| Also holy cow that was 10 years ago already? Dang.
|
| Amusing bit: The first TPU design was based on fully connected
| networks; the advent of CNNs forced some design rethinking, and
| then the advent of RNNs (and then transformers) did it yet
| again.
|
| So maybe it's reasonable to say that this is the first TPU
| designed for inference in the world where you have both a
| matrix multiply unit and an embedding processor.
|
| (Also, the first gen was purely a co-processor, whereas the
| later generations included their own network fabric, a trait
| shared by this most recent one. So it's not totally crazy to
| think of the first one as a very different beast.)
| kleiba wrote:
| _> the advent of CNNs forced some design rethinking, and then
| the advent of RNNs (and then transformers) did it yet again._
|
| Certainly, RNNs are much older than TPUs?!
| woodson wrote:
| So are CNNs, but I guess their popularity heavily increased
| at that time, to the point where it made sense to optimize
| the hardware for them.
| hyhjtgh wrote:
| RNN was of course well known at the at time, but they
| werent putting out state of the art numbers at that time.
| miki123211 wrote:
| Wow, you guys needed a custom ASIC for inference _before CNNs
| were even invented_?
|
| What were the use cases like back then?
| refulgentis wrote:
| https://research.google/blog/the-google-brain-team-
| looking-b... is a good overview
|
| I wasn't on Brain, but got obsessed with the Kremlinology of ML
| internally at Google because I wanted to know why leadership was
| so gung ho on it.
|
| The general sense in the early days was that these things can
| learn anything, and they'll replace fundamental units of
| computing. This thought process is best exhibited externally
| by, e.g., https://research.google/pubs/the-case-
| for-learned-index-stru...
|
| It was also a different Google, the "3 different teams
| working on 3 different chips" bit reminds me of lore re:
| how many teams were working on Android wearables until
| upper management settled it.
|
| FWIW it's a very, very, different company now. Back then it
| was more entrepreneurial. A better version of Wave-era,
| where things launch themselves. An MBA would find this top-
| down company in 2025 even _better_; I find it less so - it's
| perfectly tuned to do what Apple or OpenAI did 6-12 months
| ago, but not to lead - almost certainly a better
| investment, but a worse version of an average workplace,
| because it hasn't developed antibodies against BSing.
| (disclaimer: worked on Android)
| huijzer wrote:
| According to a Google blog post from 2016 [1], use-cases
| were RankBrain to improve the relevancy of search results
| and Street View. They also used it for AlphaGo. And from what I
| remember from my MSc thesis, they were also probably starting to
| use it for Translate. I can't find any TPU reference in
| Attention Is All You Need or BERT: Pre-training of Deep
| Bidirectional Transformers for Language Understanding, but I was
| fine-tuning BERT on a TPU at the time, in Oct 2018 [2]. If I
| remember correctly, the BERT example repository showed how to
| fit a model with a TPU inside a Colab. So I would guess that the
| natural language research was mostly not on TPUs around
| 2016-2018, but then moved over to TPUs in production. I could be
| wrong though, and dgacmu probably knows more.
|
| [1]: https://cloud.google.com/blog/products/ai-machine-
| learning/g...
|
| [2]: https://github.com/rikhuijzer/improv/blob/master/runs/
| 2018-1...
| mmx1 wrote:
| Yes, IIRC (please correct me if I'm wrong), Translate did
| utilize Seastar (TPU v1), which was integer-only, so not easily
| useful for training.
| dekhn wrote:
| As an aside, Google used CPU-based machine learning (using
| enormous numbers of CPUs) for a long time before custom ASICs or
| TensorFlow even existed.
|
| The big ones were SmartASS (ads serving) and Sibyl (everything
| else serving). There was an internal debate over the value of
| GPUs, with a prominent engineer writing an influential doc that
| caused Google to continue with fat CPU nodes when it was clear
| that accelerators were a good alternative. This was around the
| time ImageNet blew up, and some eng were stuffing multiple GPUs
| in their dev boxes to demonstrate training speeds on tasks like
| voice recognition.
|
| Sibyl was a heavy user of embeddings before there was any
| real custom ASIC support for that and there was an add-on
| for TPUs called barnacore to give limited embedding support
| (embeddings are very useful for maximizing profit through
| ranking).
| theptip wrote:
| The phrasing is very precise here: it's the first TPU for _the
| age of inference_, a novel marketing term they have defined to
| refer to CoT and Deep Research.
| nehalem wrote:
| Not knowing much about special-purpose chips, I would like to
| understand whether chips like this would give Google a
| significant cost advantage over the likes of Anthropic or OpenAI
| when offering LLM services. Is similar technology available to
| Google's competitors?
| baby_souffle wrote:
| There are other ai/llm 'specific' chips out there, yes. But the
| thing about ASICs is that you need one for each *specific* task.
| Eventually we'll hit an equilibrium, but for now, the stuff that
| Cerebras is best at is not what TPUs are best at is not what
| GPUs are best at...
| monocasa wrote:
| I don't even know if eventually we'll hit an equilibrium.
|
| The end of Moore's law pretty much dictates specialization; it's
| just more apparent first in fields without as much ossification.
| avrionov wrote:
| NVIDIA operates at a ~70% margin right now. Not paying that
| premium and having an alternative to NVIDIA is beneficial. We
| just don't know by how much.
| kccqzy wrote:
| I might be misremembering here, but Google's own AI models
| (Gemini) don't use NVIDIA hardware in any way, training or
| inference. Google bought a large amount of NVIDIA hardware only
| for Google Cloud customers, not for itself.
| heymijo wrote:
| GPUs, very good for pretraining. Inefficient for inference.
|
| Why?
|
| For each new word a transformer generates, it has to move the
| entire set of model weights from memory to the compute units.
| For a 70 billion parameter model with 16-bit weights, that
| requires moving approximately 140 gigabytes of data to generate
| just a single word.
|
| GPUs have off-chip memory. That means a GPU has to push data
| across the chip-memory bridge for every single word it creates.
| This architectural choice is an advantage for graphics
| processing, where large amounts of data need to be stored but
| not necessarily accessed as rapidly for every single
| computation. It's a liability in inference, where quick and
| frequent data access is critical.
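|
| As a back-of-envelope sketch (Python; the bandwidth figure is an
| assumed example, not a quoted spec):
|
|     params = 70e9                # 70B-parameter model
|     bytes_per_param = 2          # 16-bit weights
|     weight_bytes = params * bytes_per_param   # ~140 GB per token
|     hbm_bw = 3.35e12             # assume ~3.35 TB/s HBM bandwidth
|     print(hbm_bw / weight_bytes, "tokens/sec ceiling at batch 1")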
|
| Listening to Andrew Feldman of Cerebras [0] is what helped me
| grok the differences. Caveat, he is a founder/CEO of a company
| that sells hardware for AI inference, so the guy is talking his
| book.
|
| [0]
| https://www.youtube.com/watch?v=MW9vwF7TUI8&list=PLnJFlI3aIN...
| hanska wrote:
| The Groq interview was good too. Seems that the thought
| process is that companies like Groq/Cerebras can run the
| inference, and companies like Nvidia can keep/focus on their
| highly lucrative pretraining business.
|
| https://www.youtube.com/watch?v=xBMRL_7msjY
| latchkey wrote:
| Cerebras (and Groq) have the problem of using too much die for
| compute and not enough for memory. Their method of scaling is to
| fan out the compute across more physical space. This takes more
| DC space, power, and cooling, which is a huge issue. Funny
| enough, when I talked to Cerebras at SC24, they told me their
| largest customers are for training, not inference. They just
| market it as an inference product, which is even more confusing
| to me.
|
| I wish I could say more about what AMD is doing in this
| space, but keep an eye on their MI4xx line.
| heymijo wrote:
| > _they told me their largest customers are for training,
| not inference_
|
| That is curious. Things are moving so quickly right now. I
| typed out a few speculative sentences then went ahead and
| asked an LLM.
|
| Looks like Cerebras is responding to the market and
| pivoting towards a perceived strength of their product
| combined with the growth in inference, especially with the
| advent of reasoning models.
| latchkey wrote:
| I wouldn't call it "pivoting" as much as "marketing".
| ein0p wrote:
| Several incorrect assumptions in this take. For one thing, 16
| bit is not necessary. For another, 140GB/token holds only if
| your batch size is 1 and your sequence length is 1 (no
| speculative decoding). Nobody runs LLMs like that on those GPUs
| - if you do it like that, compute utilization becomes
| ridiculously low. With a batch size greater than 1 and
| speculative decoding, the arithmetic intensity of the kernels is
| much higher, and having weights "off chip" is not that much of a
| concern.
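|
| A sketch of that effect (Python, assumed numbers): the weights
| are read once per decode step regardless of batch size, so the
| bytes moved per generated token fall roughly linearly with the
| batch:
|
|     weight_bytes = 70e9 * 2          # 70B params at 16 bits
|     for batch in (1, 8, 64):
|         gb_per_token = weight_bytes / batch / 1e9
|         print(batch, gb_per_token)   # 140.0, 17.5, ~2.2 GB/token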
| xnx wrote:
| Google has a significant advantage over other hyperscalers
| because Google's AI data centers are much more compute cost
| efficient (capex and opex).
| claytonjy wrote:
| Because of the TPUs, or due to other factors?
|
| What even is an AI data center? Are the GPU/TPU boxes in a
| different building than the others?
| xnx wrote:
| > Because of the TPUs, or due to other factors?
|
| Google does many pieces of the data center better. Google
| TPUs use 3D torus networking and are liquid cooled.
|
| > What even is an AI data center?
|
| Being newer, AI installations have more
| variations/innovation than traditional data centers.
| Google's competitors have not yet adopted all of Google's
| advances.
|
| > are the GPU/TPU boxes in a different building than the
| others?
|
| Not that I've read. They are definitely bringing on new
| data centers, but I don't know if they are initially
| designed for pure-AI workloads.
| nsteel wrote:
| Wouldn't a 3D torus network have horrible performance
| with 9,216 nodes? And really horrible latency? I'd have
| assumed traditional spine-leaf would do better. But I
| must be wrong as they're claiming their latency is great
| here. Of course, they provide zero actual evidence of
| that.
|
| And I'll echo: what even is an AI data center? Because we're
| still none the wiser.
| xnx wrote:
| > what even is an AI data center
|
| A data center that runs significant AI training or inference
| loads. Non-AI data centers are fairly commodity. Google's non-AI
| efficiency is not much better than Amazon's or anyone else's.
| Google is much more efficient at running AI workloads than
| anyone else.
| xadhominemx wrote:
| It's a data center with much higher power density. We're talking
| about 100 going to 1,000 kW/rack vs 20 kW/rack for a traditional
| data center, requiring much different cooling and power
| delivery.
| dekhn wrote:
| A 3d torus is a tradeoff in terms of wiring
| complexity/cost and performance. When node counts get
| high you can't really have a pair of wires between all
| pairs of nodes, so if you don't use a torus you usually
| need a stack of switches/routers aggregating traffic.
| Those mid-level and top-level switch/routers get very
| expensive (high bandwidth cross-section) and the routing
| can get a bit painful. 3d torus has far fewer cables, and
| the routing can be really simple ("hop vertically until
| you reach your row, then hop horizontally to reach your
| node"), and the wrap-around connections are nice.
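|
| A toy sketch of that dimension-ordered routing on a 3D torus
| (Python; the 24x24x16 shape is just one hypothetical way to
| arrange 9,216 chips, not the actual topology):
|
|     def torus_hops(src, dst, dims):
|         # Resolve one axis at a time, taking the wrap-around
|         # link whenever it is shorter than the direct path.
|         hops = 0
|         for s, d, n in zip(src, dst, dims):
|             delta = abs(s - d)
|             hops += min(delta, n - delta)
|         return hops
|
|     print(torus_hops((0, 0, 0), (12, 23, 8), (24, 24, 16)))
|     # 12 + 1 + 8 = 21 hops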
|
| That said, the torus approach was a gamble that most
| workloads would be nearest-neighbor, and allreduce needs
| extra work to optimize.
|
| An AI data center tends to have enormous power
| consumption and cooling capabilities, with less disk, and
| slightly different networking setups. But really it just
| means "this part of the warehouse has more ML chips than
| disks"
| summerlight wrote:
| Lots of other factors. I suspect this is one of the reasons why
| Google cannot offer the TPU hardware itself outside of their
| cloud service. A significant chunk of TPU efficiency can be
| attributed to external factors which customers cannot easily
| replicate.
| pkaye wrote:
| Anthropic is using Google TPUs. Also jointly working with
| Amazon on a data center using Amazon's custom AI chips. Also
| Google and Amazon are both investors in Anthropic.
|
| https://www.datacenterknowledge.com/data-center-chips/ai-sta...
|
| https://www.semafor.com/article/12/03/2024/amazon-announces-...
| cavisne wrote:
| Nvidia has ~60% margins on their datacenter chips, so TPUs have
| quite a bit of headroom to save Google money without being as
| good as Nvidia GPUs.
|
| No one else has access to anything similar; Amazon is just
| starting to scale their Trainium chip.
| buildbot wrote:
| Microsoft has the MAIA 100 as well. No comment on their
| scale/plans though.
| behnamoh wrote:
| The naming of these chips (GPUs, CPUs) is kinda badass: Ironwood,
| Blackwell, ThreadRipper, Epyc, etc.
| mikrl wrote:
| Scroll through wikichip sometime and try to figure out the
| Intel march names.
|
| I always confuse Blackwell with Bakewell (tart) and my CPU is
| on Coffee Lake and great... now I want coffee and cake
| qoez wrote:
| Post just to tease us since they barely sell TPUs
| throwaway48476 wrote:
| It's hard to be excited about hardware that will only exist in
| the cloud before shredding.
| p_j_w wrote:
| I think this article is for Wall Street, not Silicon Valley.
| noitpmeder wrote:
| What's their use case?
| fennokin wrote:
| As in for investor sentiment, not literally finance
| companies.
| amelius wrote:
| Gambling^H^H^H^H Making markets more "efficient".
| mycall wrote:
| Bad timing as I think Wall Street is preoccupied at the
| moment.
| asdfman123 wrote:
| Oh, believe me, they are very much paying attention to tech
| stocks right now.
| jeffbee wrote:
| Ogg no care multi-axis computer-numerical machine center. Ogg
| no space Ogg cave for nonsense. Ogg bang rock scrape hide.
| CursedSilicon wrote:
| Please don't make low-effort bait comments. This isn't Reddit
| crazygringo wrote:
| You can't get excited about lower prices for your cloud GPU
| workloads thanks to the competition it brings to Nvidia?
|
| This benefits everyone, even if you don't use Google Cloud,
| because of the competition it introduces.
| 01HNNWZ0MV43FF wrote:
| I like owning things
| sodality2 wrote:
| Cloud providers will buy fewer NVDA chips, and since they're
| related goods, prices will drop.
| xadhominemx wrote:
| You own any GB200s?
| baobabKoodaa wrote:
| You will own nothing and you will be happy.
| throwaway48476 wrote:
| It's only competitive with Nvidia if you believe Google won't
| kill this product like everything else.
| maxrmk wrote:
| I love to hate on Google, but I suspect this is strategic enough
| that they won't kill it.
|
| Like Graviton at AWS, it's as much a negotiation tool as it is a
| technical solution, letting them push harder with NVIDIA on
| pricing because they have a backup option.
| mmx1 wrote:
| Google has done stuff primarily for negotiation purposes
| (e.g. POWER9 chips) but TPU ain't one. It's not a backup
| option or presumed "inferior solution" to NVIDIA. Their
| entire ecosystem is TPU-first.
| joshuamorton wrote:
| Google's been doing custom ML accelerators for 10 years now, and
| (depending on how much you're willing to stretch the definition)
| has been doing them in consumer hardware for what will soon be
| five years (the Google Tensor chips in Pixel phones).
| justanotheratom wrote:
| Exactly. I wish Groq would start selling the cards they use
| internally.
| xadhominemx wrote:
| They would lose money on every sale
| foota wrote:
| Personally, I have a (non-functional) TPU sitting on my desk at
| home :-)
| fluidcruft wrote:
| This isn't anything anyone can purchase, is it? Who's the
| audience for this announcement?
| badlucklottery wrote:
| > Who's the audience for this announcement?
|
| Probably whales who can afford to rent one from Google Cloud.
| jeffbee wrote:
| People with $3 are whales now? TPU prices are similar to
| other cloud resources.
| dylan604 wrote:
| Does anyone do anything useful with a $3 spend, or is it $3
| X $manyManyHours?
| scarmig wrote:
| No one does anything useful with a $3 spend. That's not
| anything particular to TPUs, though.
| dylan604 wrote:
| That's my point. The touting of $3 is beyond misleading.
| fancyfredbot wrote:
| You can do real work for a few hundred dollars, which is hardly
| the exclusive domain of "whales"?
|
| The programmer who writes code to run on these likely costs at
| least 15x that amount an hour.
| MasterScrat wrote:
| An on-demand v5e-1 is $1.20/h; it's pretty accessible.
|
| The challenge is getting them to run efficiently, which
| typically involves learning JAX.
| llm_nerd wrote:
| The overwhelming majority of AI compute is used either by the
| few bigs in their own products, or by third parties that rent
| compute resources from those same bigs. Extremely few AI
| companies are buying their own GPU/TPU buildouts.
|
| Google says Ironwood will be available in the Google Cloud late
| this year, so it's relevant to just about anyone that rents AI
| compute, which is just about everyone in tech. Even if you have
| zero interest in this product, it will likely lead to downward
| pressure on pricing, mostly courtesy of the large memory
| allocations.
| fluidcruft wrote:
| It just seems like John Deere putting out a press release about
| a new spark plug that is only useful to John Deere
| and can maybe be used on rented John Deere harvesters when
| sharecropping on John Deere-owned fields using John Deere GMO
| crops. I just don't see what's appealing about any of it. Not
| only is it a walled garden, you can't even own anything and
| are completely dependent on the whims of John Deere to not
| bulldoze the entire field.
|
| It just seems like if you build on Tensor then sure, you can
| go home, but Google will keep your ball.
| aseipp wrote:
| The reality is that for large scale AI deployment there's
| only one criterion that matters: what is the total cost of
| ownership? If TPUs are 1/30th the total perf but 1/50th the
| total price, then they will be bought by customers.
| Basically that simple.
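|
| In those hypothetical numbers (a one-line Python sketch):
|
|     rel_perf, rel_price = 1 / 30, 1 / 50
|     print(rel_perf / rel_price)   # ~1.67x performance per dollar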
|
| Most places using AI hardware don't actually want to expend
| massive amounts of capital to procure it and then shove it
| into racks somewhere and then manage it over its total
| lifetime. Hyperscalers like Google are also far, far ahead
| in things like DC energy efficiency, and at really large
| scale those energy costs are huge and have to be factored
| into the TCO. In the long run, the dominant cost of this stuff
| is operational expenditure. Anyone running a physical AI cluster
| is going to have to consider this.
|
| The walled garden stuff doesn't matter, because places
| demanding large-scale AI deployments (and actually willing
| to spend money on it) do not really have the same
| priorities as HN homelabbers who want to install
| inefficient 5090s so they can run Ollama.
| fluidcruft wrote:
| At large scale, why shouldn't it matter whether you're beholden
| to Google's cloud only vs. having options to use AWS or Oracle
| or Azure etc.? There's maybe an argument to be made about the
| price and efficiency of Google's data centers, but Google's
| cloud is far from notably cheaper than the alternatives (to put
| it mildly), so that's a moot point: if there are any
| efficiencies to be had, Google's pocketing them itself. I just
| don't see why anyone should care about this chip except Google
| themselves. It would be a different story if we were talking
| about a chip that had the option of being available in
| non-Google data centers.
| xhkkffbf wrote:
| People who buy their stock.
| avrionov wrote:
| The audience is Google Cloud customers + investors
| _hark wrote:
| Can anyone comment on where efficiency gains come from these days
| at the arch level? I.e. not process-node improvements.
|
| Are there a few big things, many small things...? I'm curious
| what fruit are left hanging for fast SIMD matrix multiplication.
| yeahwhatever10 wrote:
| Specialization. I.e., specialized for inference.
| vessenes wrote:
| One big area the last two years has been algorithmic
| improvements feeding hardware improvements. Supercomputer folks
| use f64 for everything, or did. Most training was done at f32
| four years ago. As algo teams have shown fp8 can be used for
| training and inference, hardware has updated to accommodate,
| yielding big gains.
|
| NB: Hobbyist, take all with a grain of salt
| jmalicki wrote:
| Unlike a lot of supercomputer algorithms, where fp error
| accumulates as you go, gradient-descent-based algorithms don't
| need as much precision: any fp errors still show up at the next
| loss calculation and get corrected, which lets you make do with
| much lower precision.
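|
| A toy sketch of that point (Python with NumPy; the coarse
| rounding below is a crude stand-in for low-precision arithmetic,
| not any particular FP8 format):
|
|     import numpy as np
|
|     rng = np.random.default_rng(0)
|     w = rng.normal(size=4)
|
|     def quantize(g, step=0.25):
|         # Snap gradients to a coarse grid before applying them.
|         return np.round(g / step) * step
|
|     for _ in range(200):
|         grad = 2 * w                  # exact gradient of ||w||^2
|         w -= 0.05 * quantize(grad)
|
|     # Small: descent survives the rounding, stalling only once
|     # the true gradient drops below the quantization step.
|     print(np.linalg.norm(w))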
| muxamilian wrote:
| In-memory computing (analog or digital). Still doing SIMD
| matrix multiplication but using more efficient hardware:
| https://arxiv.org/html/2401.14428v1
| https://www.nature.com/articles/s41565-020-0655-z
| gautamcgoel wrote:
| This is very interesting, but not what the Ironwood TPU is
| doing. The blog post says that the TPU uses conventional HBM
| RAM.
| nsteel wrote:
| There's been some talk/rumour of next-gen HBMs having some
| compute capability on the base die. But again, that's not what
| they're doing here; this is regular HBM3/HBM3e.
|
| https://semiengineering.com/speeding-down-memory-lane-
| with-c...
| vessenes wrote:
| 7.2 TB/s of HBM bandwidth raised my eyebrows. But then I
| googled, and it looks like GB200 is 16 TB/s. In plebe land, 2
| TB/s is pretty awesome.
|
| These continue to be mostly for bragging rights and strategic
| safety, I think. I bet they are not on premium process nodes; if
| I worked at GOOG I'd probably think of these as competitive
| insurance vis-a-vis NVIDIA -- the total costs of the chip team,
| software, tape-outs, and increased data center energy use
| probably wipe out any savings from not buying NV, but you are
| 100% not beholden to Jensen.
| gigel82 wrote:
| I was hoping they'd launch a Coral kind of device that can run
| locally and cheaply, with updated specs.
|
| It would be awesome for things like homelabs (to run Frigate NVR,
| Immich ML tasks or the Home Assistant LLM).
| GrumpyNl wrote:
| Why doesn't Google offer the most advanced voice technology when
| they offer a playback version? It still sounds like the most
| basic text-to-speech.
| tuna74 wrote:
| How is the API story for these devices? Are the drivers
| mainlined in Linux? Is there a specific API you use to code for
| them? How does the instance you rent on Google Cloud look, and
| what software does it come with?
| cbarrick wrote:
| XLA (Accelerated Linear Algebra) [1] is likely the library that
| you'll want to use to code for these machines.
|
| TensorFlow, PyTorch, and JAX all support XLA on the backend.
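|
| A minimal sketch of what that looks like from JAX (the same
| function gets compiled by XLA for whatever backend is attached:
| CPU, GPU, or TPU):
|
|     import jax
|     import jax.numpy as jnp
|
|     @jax.jit                    # traced once, then compiled by XLA
|     def predict(w, x):
|         return jnp.tanh(x @ w)
|
|     x = jnp.ones((8, 128))
|     w = jnp.zeros((128, 16))
|     print(predict(w, x).shape)  # (8, 16)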
|
| [1]: https://openxla.org/
| g42gregory wrote:
| And ... where could we get one? If they won't sell it to anyone,
| then is this a self-congratulation story? Why do we even need to
| know about this? If it propagates to lower Gemini prices,
| fantastic. If not, then isn't it kind of irrelevant to the
| actual user experience?
| lordofgibbons wrote:
| You can rent it on GCP in a few months
| g42gregory wrote:
| Good point. At what prices per GB/TOPS? Better be lower than
| the existing TPUs ... That's what I care about.
| jstummbillig wrote:
| Well, with stocks and all, there is more that matters in the
| world than "actual user experience"
| DeathArrow wrote:
| Cool. But does it support CUDA?
| wg0 wrote:
| Can anyone buy them?
| ein0p wrote:
| God damn it, Google. Make a desktop version of these things.
| DisjointedHunt wrote:
| Cloud resources are trending towards consumer technology adoption
| numbers rather than being reserved mostly for Enterprise. This is
| the most exciting thing in decades!
|
| There is going to be a GPU/accelerator shortage for the
| foreseeable future to run the most advanced models; Gemini 2.5
| Pro is such a good example. It is probably the first model on
| which many developers I've considered skeptics of extended agent
| use have started to saturate free token thresholds.
|
| Grok is honestly the same, but the lack of an API is suggestive
| of the massive demand wall they face.
| attentive wrote:
| Does anyone know how this compares to AWS Inferentia chips?
| aranw wrote:
| I wonder if these chips might contribute towards advancements for
| the Coral TPU chips?
___________________________________________________________________
(page generated 2025-04-09 23:00 UTC)