[HN Gopher] Who uses Google TPUs for inference in production?
___________________________________________________________________
Who uses Google TPUs for inference in production?
I am really puzzled by TPUs. I've been reading everywhere that TPUs
are powerful and a great alternative to NVIDIA. I have been playing
with TPUs for a couple of months now, and to be honest I don't
understand how people can use them in production for inference:

- almost no resources online showing how to run modern generative
  models like Mistral, Yi 34B, etc. on TPUs
- poor compatibility between JAX and PyTorch
- very hard to understand the memory consumption of the TPU chips
  (no nvidia-smi equivalent; see the sketch below)
- rotating IP addresses on TPU VMs
- almost impossible to get my hands on a TPU v5

Is it just me? Or did I miss something? I totally understand that
TPUs can be useful for training, though.
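
The closest thing to an nvidia-smi readout appears to be JAX's
per-device memory_stats(); a minimal sketch, with the caveat that
the returned counters are runtime-dependent and not guaranteed to
be stable:

    # Print per-core TPU memory usage, assuming a recent JAX
    # release on a Cloud TPU VM. memory_stats() may return None on
    # some platforms, and counter names can vary between runtimes.
    import jax

    for device in jax.local_devices():
        stats = device.memory_stats()
        if stats:
            used = stats.get("bytes_in_use", 0)
            limit = stats.get("bytes_limit", 0)
            print(f"{device}: {used / 2**30:.2f} / "
                  f"{limit / 2**30:.2f} GiB in use")
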
Author : arthurdelerue
Score : 81 points
Date : 2024-03-11 16:22 UTC (6 hours ago)
| hiddencost wrote:
| Google is using them in prod. I think they're so hungry for chips
| internally that cloud isn't getting much support in selling them.
| amelius wrote:
| Or maybe they are just using Nvidia. Who knows ...
| eklitzke wrote:
| Lots of people know.
| nivekney wrote:
| _marks as solved_
| vineyardmike wrote:
| Beyond the fact that this is hardly a secret, there's lots of
| other signs.
|
| 1. They have bought far less from Nvidia than the other
| hyperscalers, and they literally can't vomit without saying
| "AI". They have to be running those models on something. They
| have purchased huge amounts of chips from fabs, and what else
| would those be?
|
| 2. They have said they use them. Should be pretty obvious
| here.
|
| 3. They maintain a whole software stack for TPUs, design the
| chips themselves, etc., yet they don't really try to sell
| the TPU. Why else would they do this?
| jeffbee wrote:
| They publicly announced using TPUs for inference as far back
| as 2016, but did not offer TPUs to Cloud customers until
| 2017. The development is clearly driven by internal use
| cases. One of the features they publicly disclosed as
| TPU-based was Smart Reply, which launched in 2015. So their
| internal use of TPUs for inference goes back nearly a decade.
| danjl wrote:
| I would guess that Google's Vertex AI managed solution uses
| TPUs. Google also uses them internally to train and run
| inference for all their research products.
| sciencesama wrote:
| 80 to 90% are consumed internally!! Only from v5 onward is
| it planned to be customer-focused!!
| michaelt wrote:
| While you _can_ use TPUs with Vertex AI, it's just virtual
| machines - you can have one with an Nvidia card if you like.
| VirusNewbie wrote:
| They're getting swallowed up by Anthropic and the other huge
| spenders:
|
| https://www.prnewswire.com/news-releases/google-announces-ex...
|
| "Partnership includes important new collaborations on AI safety
| standards, committing to the highest standards of AI security,
| and use of TPU v5e accelerators for AI inference "
| _b wrote:
| I think this is right, in part because I've been told exactly
| this by people who work for Google whose job is to sell me
| cloud stuff: they say they have so much internal demand that
| they aren't pushing TPUs for external use. Hence external
| pricing and support just aren't that great right now. But
| presumably when capacity catches up they'll start pushing
| TPUs again.
| natbobc wrote:
| Feels like a bad point in the curve to try and sell them. "Oh,
| our internal hype cycle is done... we'll put them on the
| market now that they're all worn out."
| rj45jackattack wrote:
| Sounds like ButterflyLabs.
| emadm wrote:
| https://pytorch.org/blog/high-performance-llama-2/
| htrp wrote:
| >Cheers,
|
| > The PyTorch/XLA Team at Google
|
| Meanwhile you have an issue from 5 years ago with 0 support
|
| https://github.com/pytorch/xla/issues/202
| htrp wrote:
| We've tried this before and almost always regretted the
| decision. I think the tech stack needs another 12-18 months
| to mature (it doesn't help that almost all work outside
| Google is being done in PyTorch).
| mike_d wrote:
| > I think the tech stack needs another 12-18 months to mature
|
| Google has been doing AI since before any other company even
| thought about it. They are on the 6th generation of TPU
| hardware.
|
| I don't think there is any maturity issue, just an availability
| issue because they are all being used internally.
| htrp wrote:
| 100% agree. If you have access to the TPU team internally,
| it's very easy to use in production.
|
| If you aren't internal, the documentation, support, and even
| just general bug fixing are impossible to get.
| ethbr1 wrote:
| (Has an expert team dedicated solely to optimizing for
| exotic hardware) = an option
|
| (Doesn't have a team like that) = stick to mass-use,
| commodity hardware
|
| That's generally been the trade-off since ~1970. And
| usually, the performance isn't worth the people-salaries.
|
| How many examples are there of successful hardware that isn't
| well documented and doesn't have drop-in 1:1 SDK coverage
| versus the more popular solution?
|
| It seems like a heavy lift to get even hardware that does
| have parity in those ways adopted, given you're fighting
| market inertia.
| danielcampos93 wrote:
| I feel like I have been hearing that since the v1 TPU. I
| think TPUs are the perfect solution for Google because they
| have teams whose job is to take a model and TPUify it.
| Elsewhere there is no such team, so it's no fun.
| kccqzy wrote:
| Apparently Midjourney uses it. GCP put out a press release a
| while ago: https://www.prnewswire.com/news-releases/midjourney-
| selects-...
| trsohmers wrote:
| The linked press release says they do training on TPU v4,
| while inference runs on GPUs. I have also heard this
| separately from people associated with Midjourney recently:
| they use TPUs solely for training.
| derdrdirk wrote:
| TPUs are tightly coupled to JAX and the XLA compiler. If your
| model is written in PyTorch, you can use a bridge to export
| it to StableHLO and then compile that for a TPU. In theory
| the XLA compiler should be more performant than PyTorch's
| Inductor.
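|
| A rough sketch of that export path, assuming torch_xla 2.x
| (module locations have shifted between releases, and MyModel
| is a hypothetical stand-in):
|
|   # Export a PyTorch model to StableHLO via torch.export and
|   # torch_xla. Assumes torch >= 2.1 with a matching torch_xla.
|   import torch
|   from torch.export import export
|   from torch_xla.stablehlo import exported_program_to_stablehlo
|
|   model = MyModel().eval()          # hypothetical nn.Module
|   sample_args = (torch.randn(1, 3, 224, 224),)
|   exported = export(model, sample_args)
|   shlo = exported_program_to_stablehlo(exported)
|   print(shlo.get_stablehlo_text())  # StableHLO as readable text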
| ooterness wrote:
| There's a cubesat using a Coral TPU for pose estimation.
|
| https://aerospace.org/article/aerospaces-slingshot-1-demonst...
| ChrisArchitect wrote:
| Ask HN:
| pogue wrote:
| I've seen people connecting these to Raspberry Pis to run
| local LLMs, but I'm not sure how effective it is. Check
| YouTube for some videos about it.
|
| Speaking of SBCs: before the Raspberry Pi, I was looking at
| the Orange Pi 5, which has a Rockchip RK3588S with an NPU
| (Neural Processing Unit). That was the first I had heard of
| such a thing, and I was curious what exactly it does.
| Unfortunately, there's very little support for the Orange Pi
| and not a large community around it, so I couldn't find any
| feedback on how well the NPU worked or what it did.
|
| http://www.orangepi.org/html/hardWare/computerAndMicrocontro...
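|
| For context, a minimal sketch of what inference on a Coral
| Edge TPU looks like with the pycoral library, assuming a
| model already compiled for the Edge TPU (the model and image
| paths are placeholders):
|
|   # Run a compiled .tflite classifier on a Coral Edge TPU.
|   from PIL import Image
|   from pycoral.adapters import classify, common
|   from pycoral.utils.edgetpu import make_interpreter
|
|   interpreter = make_interpreter("model_edgetpu.tflite")
|   interpreter.allocate_tensors()
|
|   image = Image.open("input.jpg").resize(
|       common.input_size(interpreter), Image.LANCZOS)
|   common.set_input(interpreter, image)
|   interpreter.invoke()
|   for c in classify.get_classes(interpreter, top_k=3):
|       print(c.id, c.score)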
| mrwilliamchang wrote:
| To see memory consumption on the TPU while running on GKE,
| you can look at the kubernetes.io/node/accelerator/memory_used
| metric:
|
| https://cloud.google.com/kubernetes-engine/docs/how-to/tpus#...
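|
| A sketch of pulling that metric through the Cloud Monitoring
| API with the google-cloud-monitoring client library (the
| project ID is a placeholder, and exact resource labels on the
| returned series may differ):
|
|   # List recent values of the TPU memory_used metric.
|   # Assumes google-cloud-monitoring >= 2.0.
|   import time
|   from google.cloud import monitoring_v3
|
|   client = monitoring_v3.MetricServiceClient()
|   now = int(time.time())
|   interval = monitoring_v3.TimeInterval(
|       {"start_time": {"seconds": now - 600},
|        "end_time": {"seconds": now}})
|   results = client.list_time_series(
|       name="projects/my-project",  # placeholder project ID
|       filter='metric.type = '
|              '"kubernetes.io/node/accelerator/memory_used"',
|       interval=interval,
|       view=monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
|   )
|   for series in results:
|       latest = series.points[0]  # points come newest-first
|       print(dict(series.resource.labels), latest.value.int64_value)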
___________________________________________________________________