[HN Gopher] 'I paid for the whole GPU, I am going to use the whole GPU'
___________________________________________________________________
'I paid for the whole GPU, I am going to use the whole GPU'
Author : mooreds
Score : 63 points
Date : 2025-05-07 21:04 UTC (1 hour ago)
(HTM) web link (modal.com)
(TXT) w3m dump (modal.com)
| cubefox wrote:
| For anyone thinking this is about video games:
|
| > We'll specifically focus on neural network inference workloads
| keybored wrote:
| It's hard to forget the neural network application these days.
| mooreds wrote:
| The subtitle (which is important but was too long for the HN
| submission) is "A high-level guide to GPU utilization".
| mwilcox wrote:
| Understandable
| Mockapapella wrote:
| This is a good article on the "fog of war" for GPU inference.
| Modal has been doing a great job of aggregating and disseminating
| info on how to think about high quality AI inference. Learned
| some fun stuff -- thanks for posting it.
|
| > the majority of organizations achieve less than 70% GPU
| Allocation Utilization when running at peak demand -- to say
| nothing of aggregate utilization. This is true even of
| sophisticated players, like the former Banana serverless GPU
| platform, which operated at an aggregate utilization of around
| 20%.
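|
| A minimal sketch of how one might watch the kernel-level side of
| this during an inference run, assuming an NVIDIA GPU and the
| nvidia-ml-py bindings (import name pynvml); it polls the same
| device-utilization counter that nvidia-smi reports, which is only
| one of the utilization notions the article distinguishes:
|
|   # Sketch: poll NVML device utilization once per second.
|   # Assumes an NVIDIA GPU and the nvidia-ml-py package (pynvml).
|   import time
|   import pynvml
|
|   pynvml.nvmlInit()
|   handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU
|
|   samples = []
|   for _ in range(60):  # roughly one minute of samples
|       util = pynvml.nvmlDeviceGetUtilizationRates(handle)
|       samples.append(util.gpu)  # % of time a kernel was running
|       time.sleep(1.0)
|
|   print(f"mean GPU utilization: {sum(samples) / len(samples):.1f}%")
|   pynvml.nvmlShutdown()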
|
| Saw this sort of thing at my last job. Was very frustrating
| pointing this out to people only for them to respond with
| ¯\\_(ツ)_/¯. I posted a much less tactful article (read: rant)
| than the one by Modal, but I think it still touches on a lot of
| the little things you need to consider when deploying AI models:
| https://thelisowe.substack.com/p/you-suck-at-deploying-ai-mo...
| charles_irl wrote:
| Nice article! I had to restrain myself from ranting on our blog
| :)
| awesome_dude wrote:
| I'm old enough to remember when people would be concerned if
| their CPU usage went to 100%
| bdangubic wrote:
| back in those days you weren't renting them :)
| XorNot wrote:
| Well, people are also pretty bad at logistical reasoning,
| though.
|
| From a capital expenditure perspective, you _are_ renting the
| CPU you bought in terms of opportunity cost.
|
| What people do have some sense of is that there's an ascribable
| value to keeping a capability in reserve, versus discovering
| you don't have it when you need it.
| twoodfin wrote:
| You'd worry about 100% CPU because even if the OS is
| successfully optimizing for throughput (which Linux is very good
| at), latency/p99 is certain to suffer as spare cycles
| disappear.
|
| That's not a concern with typical GPU workloads, which are
| batch/throughput-oriented.
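|
| A toy illustration of that point, using the textbook M/M/1
| mean-latency formula rather than anything measured here: mean
| time in system is W = 1 / (mu * (1 - rho)), so the last few
| percent of utilization cost far more latency than the first
| (the service rate below is a made-up number):
|
|   # Toy M/M/1 sketch: W = 1 / (mu * (1 - rho)) for service rate
|   # mu and utilization rho. The numbers are hypothetical.
|   mu = 1000.0  # requests per second one core can serve
|   for rho in (0.50, 0.90, 0.99):
|       w_ms = 1000.0 / (mu * (1.0 - rho))
|       print(f"utilization {rho:.0%}: mean latency {w_ms:.1f} ms")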
| calaphos wrote:
| There's still a throughput/latency tradeoff curve, at least
| for any sort of interactive model.
|
| It's one of the reasons inference providers sell batch
| discounts.
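|
| A rough sketch of that tradeoff under a deliberately simple cost
| model (the constants are invented): if a batch of B requests
| takes a + b*B seconds on the GPU, throughput rises with B while
| per-request latency rises too, which is part of why batch
| traffic can be sold at a discount:
|
|   # Made-up linear cost model: a batch of B requests takes
|   # (a + b * B) seconds on the GPU.
|   a, b = 0.020, 0.002  # hypothetical overhead / per-request cost
|   for batch in (1, 8, 32, 128):
|       t = a + b * batch
|       print(f"B={batch:4d}: {batch / t:7.1f} req/s, "
|             f"{t * 1000.0:6.1f} ms per request")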
| charles_irl wrote:
| Oh, I wrote this! Thanks for sharing it.
___________________________________________________________________
(page generated 2025-05-07 23:00 UTC)