[HN Gopher] TPU Deep Dive
       ___________________________________________________________________
        
       TPU Deep Dive
        
       Author : transpute
       Score  : 354 points
       Date   : 2025-06-22 02:51 UTC (20 hours ago)
        
 (HTM) web link (henryhmko.github.io)
 (TXT) w3m dump (henryhmko.github.io)
        
       | almostgotcaught wrote:
       | > In essence, caches allow hardware to be flexible and adapt to a
       | wide range of applications. This is a large reason why GPUs are
       | very flexible hardware (note: compared to TPUs).
       | 
        | this is correct but mis-stated: it's not the caches
        | themselves that cost energy but the MMUs that automatically
        | load/fetch/store to cache on "page faults". TPUs don't have
        | MMUs, and furthermore they are a push architecture (as
        | opposed to pull).
        
       | RossBencina wrote:
       | Can you suggest a good reference for understanding which
       | algorithms map well onto the regular grid systolic arrays used by
        | TPUs? The fine article says dense matmul and convolution are good,
       | but is there anything else? Eigendecomposition? SVD? matrix
       | exponential? Solving Ax = b or AX = B? Cholesky?
        
         | WithinReason wrote:
         | Anything that you can express as 128x128 (but ideally much
         | larger) dense matrix multiplication and nothing else
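          | 
          | For illustration, a minimal JAX sketch (the shapes and the
          | padding helper are my own toy example, not from the
          | article): zero-padding both operands so every dimension is
          | a multiple of 128 lets XLA tile the contraction cleanly
          | onto 128x128 MXU passes.
          | 
          |     import jax.numpy as jnp
          | 
          |     def pad_to_multiple(x, m=128):
          |         # Zero-pad each dim up to a multiple of m.
          |         return jnp.pad(x, [(0, (-d) % m) for d in x.shape])
          | 
          |     a = pad_to_multiple(jnp.ones((300, 500)))  # (384, 512)
          |     b = pad_to_multiple(jnp.ones((500, 200)))  # (512, 256)
          |     # Zero padding leaves the (300, 200) result intact.
          |     c = (a @ b)[:300, :200]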
        
         | musebox35 wrote:
         | I think https://jax-ml.github.io/scaling-book/ is one of the
         | best references to go through. It details how single device and
         | distributed computations map to TPU hardware features. The
         | emphasis is on mapping the transformer computations, both
         | forwards and backwards, so requires some familiarity with how
         | transformer networks are structured.
        
         | cdavid wrote:
         | SVD/eigendecomposition will often boil down to making many
         | matmul (e.g. when using Krylov-based methods, e.g. Arnoldi,
         | Krylov-schur, etc.), so I would expect TPU to work well there.
         | GMRES, one method to solve Ax = b is also based on Arnoldi
         | decomp.
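          | 
          | As a concrete illustration (my own sketch, not anything
          | from the article): subspace (block power) iteration
          | approximates the top-k eigenpairs of a symmetric matrix,
          | and its inner loop is exactly the dense matmul a systolic
          | array is built for.
          | 
          |     import jax
          |     import jax.numpy as jnp
          | 
          |     def subspace_iteration(A, k=128, iters=50, seed=0):
          |         # Top-k eigenpairs of symmetric A; hot loop is A @ Q.
          |         key = jax.random.PRNGKey(seed)
          |         Q = jax.random.normal(key, (A.shape[1], k))
          |         for _ in range(iters):
          |             Q, _ = jnp.linalg.qr(A @ Q)  # matmul-heavy step
          |         return jnp.diag(Q.T @ (A @ Q)), Q  # Rayleigh quotients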
        
       | serf wrote:
       | does that cooling channel have a NEMA stepper on it as a pump or
       | metering valve?[0]
       | 
       | If so, wild. That seems like overkill.
       | 
       | [0]: https://henryhmko.github.io/posts/tpu/images/tpu_tray.png
        
         | fellowmartian wrote:
         | definitely closed-loop, might even be a servo
        
       | frays wrote:
       | How can someone have this level of knowledge about TPUs without
       | working at Google?
        
         | musebox35 wrote:
         | From the acknowledgment at the end, I guess the author has
         | access to TPUs through https://sites.research.google/trc/about/
         | 
         | This is not the only way though. TPUs are available to
         | companies operating on GCP as an alternative to GPUs with a
         | different price/performance point. That is another way to get
         | hands-on experience with TPUs.
        
           | erwincoumans wrote:
           | A quick free way to access TPUs is through
           | https://colab.research.google.com, Runtime / Change Runtime
           | Type / v2-8 TPU
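            | 
            | Once the runtime is switched, a quick sanity check (a
            | minimal sketch; the exact device names vary by runtime):
            | 
            |     import jax
            |     # On a v2-8 runtime this should list 8 TPU cores.
            |     print(jax.devices())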
        
         | ipsum2 wrote:
          | Everything that's in the blog post is basically well known
         | already. Google publishes papers and gives talks about their
         | TPUs. Many details are lacking though, and require some
         | assumptions/best guesses. Jax and XLA are (partially) open
         | source and give clues about how TPUs work under the hood as
         | well.
         | 
         | https://arxiv.org/abs/2304.01433
         | 
         | https://jax-ml.github.io/scaling-book/
        
       | ariwilson wrote:
       | Cool article!
        
       | sgt101 wrote:
        | ELI5: how (specifically) do GPU and TPU optimisations affect
       | determinism in LLMs? Or is this all a myth?
        
         | barrkel wrote:
         | LLMs are generally deterministic. The token sampling step is
         | usually randomized to some degree because it gets better
         | results (creativity) and helps avoid loops, but you can turn
         | that off (temp zero for simple samplers).
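          | 
          | In JAX terms, a toy sketch of the two knobs (the logits
          | values are made up):
          | 
          |     import jax
          |     import jax.numpy as jnp
          | 
          |     logits = jnp.array([2.0, 1.0, 0.5])  # toy logits
          |     greedy = jnp.argmax(logits)   # "temp zero": argmax
          |     key = jax.random.PRNGKey(0)   # fixed seed
          |     # temperature 0.8, reproducible given the same key:
          |     tok = jax.random.categorical(key, logits / 0.8)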
        
           | perching_aix wrote:
           | + can also just pin the seed instead, right?
        
           | sgeisenh wrote:
           | This is an oversimplification. When distributed, the
           | nondeterministic order of additions during reductions can
           | produce nondeterministic results due to floating point error.
           | 
           | It's nitpicking for sure, but it causes real challenges for
           | reproducibility, especially during model training.
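            | 
            | A two-line demonstration (values chosen to make the
            | effect obvious in float32):
            | 
            |     import jax.numpy as jnp
            | 
            |     a, b, c = map(jnp.float32, (1e8, -1e8, 1.0))
            |     print((a + b) + c)  # 1.0
            |     print(a + (b + c))  # 0.0 -- order changed the answer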
        
           | Der_Einzige wrote:
           | This belief (LLMs are deterministic except for samplers) is
           | very wrong and will get you into hilariously large amounts of
           | trouble for assuming it's true.
           | 
           | Also greedy sampling considered harmful:
           | https://arxiv.org/abs/2506.09501
           | 
           | From the abstract:
           | 
           | "For instance, under bfloat16 precision with greedy decoding,
           | a reasoning model like DeepSeek-R1-Distill-Qwen-7B can
           | exhibit up to 9% variation in accuracy and 9,000 tokens
           | difference in response length due to differences in GPU
           | count, type, and evaluation batch size. We trace the root
           | cause of this variability to the non-associative nature of
           | floating-point arithmetic under limited numerical precision.
           | This work presents the first systematic investigation into
           | how numerical precision affects reproducibility in LLM
           | inference. Through carefully controlled experiments across
           | various hardware, software, and precision settings, we
           | quantify when and how model outputs diverge. Our analysis
           | reveals that floating-point precision--while critical for
           | reproducibility--is often neglected in evaluation practices."
        
             | sgt101 wrote:
             | Great reference - thanks.
        
             | marcinzm wrote:
             | Does this apply to TPUs or just GPUs?
        
         | jpgvm wrote:
         | They don't affect determinism of the results but different
         | architectures have different determinism guarantees with
         | respect to performance, as a result of scheduling and other
         | things.
         | 
          | TPUs share a similar lineage with the Groq LPU accelerators
          | (disclaimer: I work at Groq), which are actually fully
          | deterministic, which means not only do you get
          | deterministic output, you get it in a deterministic number
          | of cycles.
         | 
         | There is a trade off though, making the hardware deterministic
         | means you give up HW level scheduling and other sources of non-
         | determinism. This makes the architecture highly dependent on a
         | "sufficiently smart compiler". TPUs and processors like them
         | are generally considered VLIW and are all similarly dependent
         | on the compiler doing all the smart scheduling decisions
         | upfront to ensure good compute/IO overlap and eliminating
         | pipeline bubbles etc.
         | 
         | GPUs on the other hand have very sophisticated scheduling
         | systems on the chips themselves along with stuff like kernel
         | swapping etc that make them much more flexible, less dependent
         | on the compiler and generally easier to reach a fairly high
         | utilisation of the processor without too much work.
         | 
         | TLDR: TPUs MAY have deterministic cycle guarantees. GPUs (of
         | the current generation/architectures) cannot because they use
         | non-deterministic scheduling and memory access patterns. Both
         | still produce deterministic output for deterministic programs.
        
           | sgt101 wrote:
           | This is gold dust. Thank you for taking the time to share
           | your knowledge.
        
       | lanthissa wrote:
       | can someone help me understand how the following can be true:
       | 
        | 1. TPUs are a serious competitor to Nvidia chips.
       | 
       | 2. Chip makers with the best chips are valued at 1-3.5T.
       | 
       | 3. Google's market cap is 2T.
       | 
        | 4. It is correct for Google not to sell TPUs.
       | 
        | I have heard the whole "it's better to rent them" thing, but
        | if they're actually good, selling them is almost as good a
        | business as every other part of the company.
        
         | smokel wrote:
         | Selling them and supporting that in the field requires quite
         | some infrastructure you'd have to build. Why go through all
         | that trouble if you already make higher margins renting them
         | out?
         | 
         | Also, if they are so good, it's best to not level the playing
         | field by sharing that with your competitors.
         | 
         | Also "chip makers with the best chips" == Nvidia, there aren't
         | many others. And Alphabet does more than just produce TPUs.
        
         | dismalaf wrote:
         | Nvidia is selling a ton of chips on hype.
         | 
         | Google is saving a ton of money by making TPUs, which will pay
         | off in the future when AI is better monetized, but so far no
         | one is directly making a massive profit from foundation models.
         | It's a long term play.
         | 
         | Also, I'd argue Nvidia is massively overvalued.
        
           | CalChris wrote:
            | Hype is common in gold rushes, but then, they are selling
            | the chips. Are they overvalued? Maybe. Are they
            | profitable (something WeWork and Uber aren't)? Yes,
            | quite.
        
         | rwmj wrote:
         | Aren't Google's TPUs a bit like a research project with
         | practical applications as a nice side effect?
        
           | hackernudes wrote:
           | Why do you say that? They are on their seventh iteration of
           | hardware and even from the beginning (according to the
           | article) they were designed to serve Google AI needs.
           | 
           | My take is "sell access to TPUs on Google cloud" is the nice
           | side effect.
        
           | surajrmal wrote:
           | On what basis do you make that claim? It's incredibly
           | misleading and wrong.
        
           | silentsea90 wrote:
            | All of Google's ML runs on TPUs, tied to billions of
            | dollars in revenue.
           | You make it sound like TPUs are a Google X startup that's
           | going to get killed tomorrow.
        
         | mft_ wrote:
         | If they think they've got a competitive advantage vs. GPUs
         | which benefits one of their core products, it would make sense
         | to retain that competitive advantage for the long term, no?
        
           | Uehreka wrote:
           | No. If they sell the TPUs for "what they're worth", they get
           | to reap a portion of the benefit their competitors would get
           | from them. There's money they could be making that they
           | aren't.
           | 
           | Or rather, there would be if TPUs were that good in practice.
           | From the other comments it sounds like TPUs are difficult to
           | use for a lot of workloads, which probably leads to the real
           | explanation: No one wants to use them as much as Google does,
           | so selling them for a premium price as I mentioned above
           | won't get them many buyers.
        
         | radialstub wrote:
          | I believe Broadcom is also very involved in the making of
          | the TPUs and the networking infrastructure, and they are
          | valued at 1.2T currently. Maybe consider the combined value
          | of Broadcom and Google.
        
           | lftl wrote:
           | Wouldn't you also need to add TSMC to Nvidia's side in that
           | case?
        
             | radialstub wrote:
             | Broadcom is fabless. I think they aid in hardware design,
             | while google mostly does the software stack. Nvidia does
             | both hardware and software stack.
        
             | santaboom wrote:
              | Not sure what you mean. Who do you think fabs Broadcom
              | and Google chips?
        
               | lftl wrote:
                | Ah, I didn't realize Broadcom was fabless and only
                | helping in design.
        
         | Velorivox wrote:
         | Wall street undervalued Google even on day one (IPO). Bezos has
         | said that some of the times the stock had been doing the worst
         | were when the company was doing great.
         | 
         | So, to help you understand how they can be true: market cap is
         | governed by something other than what a business is worth.
         | 
         | As an aside, here's a fun article that embarrasses wall street.
         | [0]
         | 
         | [0] https://www.nbcnews.com/id/wbna15536386
        
         | jeffbee wrote:
         | Like other Google internal technologies, the amount of custom
         | junk you'd need to support to use a TPU is pretty extreme, and
         | the utility of the thing without the custom junk is
         | questionable. You might as well ask why they aren't marketing
         | their video compression cards.
        
         | michaelt wrote:
         | nvidia, who make AI chips with kinda good software support, and
         | who have sales reflecting that, is worth 3.5T
         | 
         | google, who make AI chips with barely-adequate software, is
         | worth 2.0T
         | 
         | AMD, who also make AI chips with barely-adequate software, is
         | worth 0.2T
         | 
         | Google made a few decisions with TPUs that might have made
         | business sense at the time, but with hindsight haven't helped
         | adoption. They closely bound TPUs with their 'TensorFlow 1'
         | framework (which was kinda hard to use) then they released
         | 'TensorFlow 2' which was incompatible enough it was just as
         | easy to switch to PyTorch, which has TPU support in theory but
         | not in practice.
         | 
         | They also decided TPUs would be Google Cloud only. Might make
         | sense, if they need water cooling or they have special power
         | requirements. But it turns out the sort of big corporations
         | that have multi-cloud setups and a workload where a 1.5x
         | improvement in performance-per-dollar is worth pursuing aren't
         | big open source contributors. And understandably, the academics
         | and enthusiasts who are giving their time away for free aren't
         | eager to pay Google for the privilege.
         | 
         | Perhaps Google's market cap already reflects the value of being
         | a second-place AI chipmaker?
        
           | que-encrypt wrote:
            | jax very much is working (and in my view better, aside
            | from the lack of community) software support, especially
            | if you use their images (which they do).
            | 
            | > Tensorflow
            | 
            | They have been using jax/flax/etc rather than tensorflow
            | for a while now. They don't really use pytorch from what
            | I see on the outside from their research works. For
            | instance, they released siglip/siglip2 with flax linen:
            | https://github.com/google-research/big_vision
           | 
           | TPUs very much have software support, hence why SSI etc use
           | TPUs.
           | 
            | P.S. Google gives their TPUs for free at:
           | https://sites.research.google/trc/about/, which I've used for
           | the past 6 months now
        
             | throwaway314155 wrote:
             | > They have been using jax/flax/etc rather than tensorflow
             | for a while now
             | 
             | Jax has a harsher learning curve than Pytorch in my
             | experience. Perhaps it's worth it (yay FP!) but it doesn't
             | help adoption.
             | 
             | > They don't really use pytorch from what I see on the
             | outside from their research works
             | 
             | Of course not, there is no outside world at Google - if
             | internal tooling exists for a problem their culture
             | effectively mandates using that before anything else, no
             | matter the difference in quality. This basically explains
             | the whole TF1/TF2 debacle which understandably left a poor
             | taste in people's mouths. In any case while they don't use
             | Pytorch, the rest of us very much do.
             | 
             | > P.S. Google gives their tpus for free at:
             | https://sites.research.google/trc/about/, which I've used
             | for the past 6 months now
             | 
             | Right and in order to use it effectively you basically have
             | to use Jax. Most researchers don't have the advantage of
             | free compute so they are effectively trying to buy
             | mindshare rather than winning on quality. This is fine, but
             | it's worth repeating as it biases the discussion heavily -
             | many proponents of Jax just so happen to be on TRC or have
              | been given credits for TPUs via some other mechanism.
        
         | roughly wrote:
         | Aside from the specifics of Nvidia vs Google, one thing to note
         | regarding company valuations is that not all parts of the
         | company are necessarily additive. As an example (read: a thing
         | I'm making up), consider something like Netflix vs Blockbuster
         | back in the early days - once Blockbuster started to also ship
         | DVDs, you'd think it'd obviously be worth more than Netflix,
         | because they've got the entire retail operation as well, but
         | that presumes the retail operation is actually a long-term
         | asset. If Blockbuster has a bunch of financial obligations
         | relating to the retail business (leases, long-term agreements
         | with shippers and suppliers, etc), it can very quickly wind up
         | that the retail business is a substantial drag on Blockbuster's
         | valuation, as opposed to something that makes it more valuable.
        
         | santaboom wrote:
          | Good questions; below I attempt to respond to each point,
          | then wrap up. TLDR: even if the TPU is good (and it is
          | good for Google), it wouldn't be "almost as good a
          | business as every other part of their company", because
          | the value add isn't FROM Google in the form of a good chip
          | design (the TPU). Instead, the value add is TO Google, in
          | the form of compute for Google's specific workloads that
          | is cheap and fast, built FROM relatively simple ASICs (TPU
          | chips) stitched together into massively complex systems
          | (TPU superpods).
         | 
          | If interested in further details:
         | 
          | 1) TPUs are a serious competitor to Nvidia chips for
          | Google's needs; per the article, they are not nearly as
          | flexible as a GPU (dependence on precompiled workloads,
          | high usage of PEs in the systolic array). Thus, for broad
          | ML market usage, they may not be competitive with Nvidia
          | GPUs/racks/clusters.
         | 
          | 2) Chip makers with the best chips are not valued at
          | 1-3.5T; per other comments to the OC, only Nvidia and
          | Broadcom are worth this much. These are not just "chip
          | makers", they are (the best) "system makers", driving
          | designs for the chips and interconnect required to go from
          | a diced piece of silicon to a data center consuming MWs.
          | This part is much harder; this is why Google (who design
          | the TPU) still has to work with Broadcom to integrate
          | their solution. Indeed every hyperscaler is designing
          | chips and software for their needs, but every hyperscaler
          | works with companies like Broadcom or Marvell to actually
          | create a complete, competitive system. Side note: Marvell
          | has deals with Amazon, Microsoft and Meta to mostly design
          | these systems, and they are worth "only" 66B. So, you
          | can't just design chips to be valuable, you have to design
          | systems. The complete systems have to be the best, wanted
          | by everyone (Nvidia, Broadcom), in order to be in the Ts;
          | otherwise you're in the Bs (Marvell).
         | 
          | 4) I see two problems with selling the TPU: customers and
          | margins. If you want to sell someone a product, it needs
          | to match their use; currently the use only matches
          | Google's needs, so who are the customers? Maybe you want
          | to capture hyperscalers / big AI labs, whose use case is
          | likely similar to Google's. If so, margins would have to
          | be thin, otherwise they would just work directly with
          | Broadcom/Marvell (and they all do). If Google wants
          | everyone currently on CUDA/Nvidia as a customer, then you
          | massively change the purpose of the TPU, and even of
          | Google.
         | 
          | To wrap up: as per the TLDR above, the value add runs TO
          | Google, not FROM it.
          | 
          | Sorry that got a bit long winded, hope it's helpful!
        
           | throwaway31131 wrote:
           | This also all assumes that there is excess foundry capacity
           | in the world for Google to expand into, which is not obvious.
           | One would need exceptionally good operations to compete here
           | and that has never been Google's forte.
           | 
           | https://www.tomshardware.com/tech-industry/artificial-
           | intell...
           | 
           | "Nvidia to consume 77% of wafers used for AI processors in
           | 2025: Report...AWS, AMD, and Google lose wafer share."
        
         | matt-p wrote:
          | AMD and even people like Huawei also make somewhat
          | acceptable chips, but using them is a bit of a nightmare.
          | Is it a similar thing here? TPUs are more difficult to use,
          | exist only inside Google Cloud, etc.?
        
         | epolanski wrote:
         | > can someone help me understand how the following can be true
         | 
         | You're conflating price with intrinsic value with market
         | analysis. All different things.
        
         | foota wrote:
         | 5. The efficient market hypothesis is true :-)
        
       | cdg007 wrote:
       | What will competitors say?
        
       | b0a04gl wrote:
        | TPUs' predictable latency under scale is the point. when you
        | control the compiler, the runtime, the interconnect and the
        | chip, you eliminate so much variance that you can actually
        | schedule jobs efficiently at data center scale. so the
        | obvious question: why haven't we seen anyone outside Google
        | replicate this full vertical stack yet? is it because the
        | hardware's hard, or because no one has nailed the
        | compiler/runtime contract at that scale?
        
         | kevindamm wrote:
         | you mean other than NVidia and AMD?
        
         | transpute wrote:
         | Groq, https://en.wikipedia.org/wiki/Groq &
         | https://news.ycombinator.com/item?id=44345738
        
       | Neywiny wrote:
        | What's not mentioned is a comparison vs FPGAs. You can have a
        | systolic, fully pipelined system for any data processing, not
        | just vectorized SIMD. Every primitive is able to work
        | independently of everything else. For example, if you have
        | 240 DSP slices (which is far from outrageous at low scale), a
        | perfect design could use those as 240 cores at 100%
        | throughput. No memory, caching, decoding, etc. overhead. A
        | rough sketch of the throughput that implies follows below.
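        | 
        | Back-of-envelope, assuming a 500 MHz fabric clock and one MAC
        | per DSP slice per cycle (both assumptions, pick your own):
        | 
        |     dsp_slices = 240
        |     clock_hz = 500e6
        |     macs_per_sec = dsp_slices * clock_hz  # 1.2e11 MAC/s
        |     flops = 2 * macs_per_sec              # ~0.24 TFLOP/s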
        
         | adrian_b wrote:
         | True, but FPGAs are suitable only for things that will not be
         | produced in great numbers, because their cost and power
         | consumption are many times higher than those of an ASIC.
         | 
         | For a company of the size of Google, the development costs for
         | a custom TPU are quickly recovered.
         | 
         | Comparing a Google TPU with an FPGA is like comparing an
         | injection-moulded part with a 3D-printed part.
         | 
         | Unfortunately, the difference in performance between FPGAs and
         | ASICs has greatly increased in recent years, because the FPGAs
          | have remained stuck on relatively ancient CMOS manufacturing
         | processes, which are much less efficient than the state-of-the-
         | art CMOS manufacturing processes.
        
           | Neywiny wrote:
           | When you can ASIC, yes, do an ASIC. But my point was that
           | there was a lot of GPU comparison. GPUs are also not ASICs
           | relative to AI.
        
             | QuadmasterXLII wrote:
             | They're close, they're basically matmul asics
        
               | Neywiny wrote:
                | Arguably so are the DSP-heavy FPGAs. And the unused
                | logic will have a minimal static power draw relative
                | to the unused but clocked graphics-only parts of the
                | GPU.
        
               | daxfohl wrote:
               | I have to imagine google considered this and decided
               | against it. I assume it's that all the high-perf matmul
               | stuff needs to be ASIC'd out to get max performance,
               | quality heat dissipation, etc. And for anything
               | reconfigurable, a CPU-based controller or logic chip is
               | sufficient and easier to maintain.
               | 
                | FPGAs kind of sit in this very niche middle ground. Yes
               | you can optimize your logic so that the FPGA does exactly
               | the thing that your use case needs, so your hardware maps
               | more precisely to your use case than a generic TPU or GPU
               | would. But what you gain in logic efficiency, you'll lose
               | several times over in raw throughput to a generic TPU or
               | GPU, at least for AI stuff which is almost all matrix
               | math.
               | 
               | Plus, getting that efficiency isn't easy; FPGAs have a
               | higher learning curve and a way slower dev cycle than
                | writing TPU or GPU apps, and take much longer to compile
               | and test than CUDA code, especially when they get dense
               | and you have to start working around gate timing
               | constraints and such. It's easy to get to a point where
               | even a tiny change can exceed some timing constraint and
               | you've got to rewrite a whole subsystem to get it to
               | synthesize again.
        
           | c-c-c-c-c wrote:
            | FPGAs are not expensive when ordered in bulk; the volume
            | prices you see on Mouser are way higher than the going
            | rates.
        
             | monocasa wrote:
             | The actual cost of the part (within reason) doesn't matter
             | all that much for a hyperscaler. The real cost is in the
             | perf/watt, which an FPGA is around an order of magnitude
             | worse for the same RTL.
        
           | cpgxiii wrote:
           | > True, but FPGAs are suitable only for things that will not
           | be produced in great numbers, because their cost and power
           | consumption are many times higher than those of an ASIC.
           | 
           | While common folk wisdom, this really isn't true. A
           | surprising number of products ship with FPGAs inside,
           | including ones designed to be "cheap". A great example of
           | this is that Blackmagic, a brand known for being a "cheap"
           | option in cinema/video gear, bases _everything_ on Xilinx
           | /AMD FPGAs (for some "software heavy" products they use the
           | Xilinx/AMD Zynq line, which combines hard ARM cores with an
           | FPGA). Pretty much every single machine vision camera on the
           | market uses an FPGA for image processing as well. These
           | aren't "one in every pocket" level products, but they are
           | widely produced.
           | 
           | > Unfortunately, the difference in performance between FPGAs
           | and ASICs has greatly increased in recent years, because the
            | FPGAs have remained stuck on relatively ancient CMOS
           | manufacturing processes
           | 
           | This isn't true either. At the high end, FPGAs are made on
           | whatever the best process available is. Particularly in the
           | top-end models that combine programmable fabric with hard
           | elements, it would be insane not to produce them on the best
           | process available. What _is_ the big hindrance with FPGAs is
           | that almost by definition the cell structures needed to
           | produce programability are inherently more complex and less
           | efficient than the dedicated circuits of an ASIC. That often
           | means a big hit to maximum clock rate, with resulting
           | consequences to any serial computation being performed.
        
             | santaboom wrote:
             | All very informative, I had some quibbles.
             | 
              | While it is true that cheap and expensive FPGAs exist,
              | an FPGA system to replace the TPU would not use a $0.50
              | or even $100 FPGA; it would use a Versal or UltraScale+
              | FPGA that costs thousands, compared to the (rough
              | guess) $100/die you might spend for the largest chip on
              | the most advanced process. Furthermore, the overhead of
              | an FPGA means every single one may support a few
              | million logic gates (maybe 2-5x if you use hardened
              | blocks), compared to billions of transistors on the
              | largest chips in the most advanced node --> cost per
              | chip to buy is much, much higher.
              | 
              | To the second point, afaik, leading-edge Versal FPGAs
              | are on 7 nm: not ancient, but also not the cutting edge
              | used for ASICs (N3).
        
             | adrian_b wrote:
             | By high end I assume that you mean something like some of
             | the AMD Versal series, which are made on a TSMC 7 nm
             | process, like the AMD Zen 2 CPUs from 6 years ago.
             | 
             | While TSMC 7 nm is much better than what most FPGAs use, it
             | is still ancient in comparison with what the current CPUs
             | and GPUs use.
             | 
             | Moreover, I have never seen such FPGAs sold for less than
             | thousands of $, i.e. they are much more expensive than GPUs
             | of similar throughput.
             | 
             | Perhaps they are heavily discounted for big companies, but
             | those are also the companies which could afford to design
             | an ASIC with better energy efficiency.
             | 
             | I always prefer to implement an FPGA solution over any
             | alternatives, but unfortunately much too often the FPGAs
             | with high enough performance have also high enough prices
             | to make them unaffordable.
             | 
              | The FPGAs with the highest performance that are still
              | reasonably priced are the AMD UltraScale+ families,
              | made with a TSMC 16 nm process, which is still good in
              | comparison with most FPGAs, but nevertheless it is a
              | decade-old manufacturing process.
        
       | cheptsov wrote:
        | It's so ridiculous to see TPUs being compared to NVIDIA GPUs.
        | IMO proprietary chips such as the TPU have no future due to
        | the monopoly on cloud services. There is no competition
        | across the cloud service providers: the only way to access
        | TPUs is through GCP. As a result, nobody wants to use them,
        | regardless of the technology. This is the biggest fault of
        | GCP. Further down the road, the gap between NVIDIA GPUs and
        | Google TPUs (call it "moat" or CUDA) is going to grow.
        | 
        | The opposite situation is with AMD, which is avoiding the
        | mistakes of Google.
        | 
        | My hope though is that AMD doesn't start to compete with
        | cloud service providers, e.g. by introducing their own cloud.
        
         | hiddencost wrote:
          | TPUs will thrive regardless of public adoption; Google's
          | internal demand for TPUs is such that they could consume
          | every TPU ever produced.
        
           | roughly wrote:
            | One thing worth noting here - TPUs are optimized for a fairly
           | constrained set of operations. Google's had good success with
           | them, but, like many of the other Google architectural
           | choices, this will constrain Google's technical choice space
           | in the future - if they've gone all in on TPUs, future Google
           | machine learning projects will be using the sets of
           | operations the TPUs excel at because that's what Google has a
           | lot of, not necessarily because that's the optimal choice.
           | This will have knock-on effects across the industry due to
           | Google's significant influence on industry practice and
           | technical direction.
        
       | trostaft wrote:
        | Excellent write-up, thank you. The benefits section was
        | illustrative.
        
       | mdaniel wrote:
       | Related: _OpenTPU: Open-Source Reimplementation of Google Tensor
       | Processing Unit (TPU)_ -
       | https://news.ycombinator.com/item?id=44111452 - May, 2025 (23
       | comments)
        
       | wkat4242 wrote:
        | Haha, I thought this was about 3D printing with thermoplastic
        | polyurethane. It's one of the harder materials to print and
        | it also took me some time to get my head around it.
        
       ___________________________________________________________________
       (page generated 2025-06-22 23:00 UTC)