[HN Gopher] Are GPUs Worth It for ML?
___________________________________________________________________
Are GPUs Worth It for ML?
Author : varunkmohan
Score : 81 points
Date : 2022-08-29 18:34 UTC (4 hours ago)
(HTM) web link (exafunction.com)
(TXT) w3m dump (exafunction.com)
| mpaepper wrote:
| This also very much depends on the inference use case / context.
| For example, I work in deep learning on digital pathology where
| images can be up to 100,000x100,000 pixels in size and inference
| needs GPUs as it's just way too slow otherwise.
| PeterStuer wrote:
| " It feels wasteful to have an expensive GPU sitting idle while
| we are executing the CPU portions of the ML workflow"
|
| What is expensive? Those 3090 Tis are looking very tasteful at
| current prices.
| ummonk wrote:
| What a clickbaity article. It's an interesting discussion of GPU
| multiplexing for ML inference merged with a sales pitch, but the
| clickbait title made the article feel like a bait and switch.
| This wasn't even an example of Betteridge's law, just a
| completely misleading headline.
| Eridrus wrote:
| Is everyone with relevant inference costs not doing this
| already?
|
| I am so confused that there seems to be a startup built around
| having a work queue that does batching...
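As a rough illustration of the kind of batching work queue being
discussed (a hypothetical plain-Python sketch, not any particular
product's implementation; a production version would add timeouts,
backpressure, and error handling): incoming requests are queued
individually, and a worker drains up to a fixed batch size before
each model call.

    # Minimal dynamic-batching work queue: requests are enqueued one at a
    # time, and a worker drains up to MAX_BATCH of them per model call.
    import queue
    import threading

    MAX_BATCH = 32
    MAX_WAIT_S = 0.005  # wait briefly for stragglers before running a partial batch

    work_q = queue.Queue()

    def run_model(batch):
        # Stand-in for the real (GPU) model call; returns one result per input.
        return [sum(x) for x in batch]

    def batching_worker():
        while True:
            batch = [work_q.get()]          # block until at least one request arrives
            try:
                while len(batch) < MAX_BATCH:
                    batch.append(work_q.get(timeout=MAX_WAIT_S))
            except queue.Empty:
                pass                        # run whatever was collected
            outputs = run_model([features for features, _ in batch])
            for (_, reply_q), out in zip(batch, outputs):
                reply_q.put(out)            # hand each caller its own result

    def infer(features):
        reply_q = queue.Queue(maxsize=1)
        work_q.put((features, reply_q))
        return reply_q.get()                # blocks until the batch containing it runs

    threading.Thread(target=batching_worker, daemon=True).start()
    print(infer([1.0, 2.0, 3.0]))           # -> 6.0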
| Kukumber wrote:
| An interesting question; it shows how insanely overpriced GPUs
| still are, especially in the cloud environment.
| pqn wrote:
| Disclaimer: I work at Exafunction
|
| I empathize a bit with the cloud providers as they have to
| upgrade their data centers every few years with new GPU
| instances and it's hard for them to anticipate demand.
|
| But if you can easily use every trick in the book (CPU version
| of the model, autoscaling to zero, model compilation, keeping
| inference in your own VPC, using spot instances, etc.) then
| it's usually still worth it.
| NavinF wrote:
| *only in the cloud environment
|
| Throw some 3090s in a rack and you'll break even in 3 months
| mistrial9 wrote:
| the HPC crowd are not able to add GPUs, that I know of..
| deepLearning group of algorithms do kick butt for lots of kinds
| of problems+data .. though I will advocate that dl is NOT the
| only game in town, despite what you often read here
| Frost1x wrote:
| In what context? HPC and certain code bases have been
| effectively leveraging heterogeneous CPU/GPU workloads for a
| variety of applications for quite a while. I know of some
| doing so in at least 2009, and I know plenty of prior art was
| already there by that point; it's just a specific time I
| happen to remember.
| fancyfredbot wrote:
| There are some pretty elegant solutions out there for the problem
| of having the right ratio of CPU to GPU. One of the nicer ones is
| rCUDA.
| https://scholar.google.com/citations?view_op=view_citation&h...
| varunkmohan wrote:
| rCUDA is super cool! One of the issues, though, is that a lot
| of the common model frameworks are not supported, and a new
| release has not come out in a while.
| fancyfredbot wrote:
| Fair point. It's not obvious from the website which model
| frameworks Exafunction supports, or when the last Exafunction
| release was.
| PeterisP wrote:
| For some reason they focus on inference, which is the
| computationally cheap part. If you're working on ML (as opposed
| to deploying someone else's ML) then almost all of your workload
| is training, not inference.
| cardine wrote:
| I have not found this to be true at all in my field (natural
| language generation).
|
| We have a 7 figure GPU setup that is running 24/7 at 100%
| utilization just to handle inference.
| fartcannon wrote:
| How do you train new models if your GPUs are being used for
| inference? I guess the training happens significantly less
| frequently?
|
| Forgive my ignorance.
| jacquesm wrote:
| Typically a different set of hardware for model training.
| cardine wrote:
| We have different servers for each. But the split is
| usually 80%/20% for inference/training. As our product
| grows in usage the 80% number is steadily increasing.
|
| That isn't because we aren't training that often - we are
| almost always training many new models. It is just that
| inference is so computationally expensive!
| dheera wrote:
| Also true of self-driving. You train a perception model for a
| week and then log millions of vehicle-hours on inference.
| MichaelBurge wrote:
| Think Google: Every time you search, some model somewhere gets
| invoked, and the aggregate inference cost would dwarf even very
| large training costs if you have billions of searches.
|
| Marketing blogspam like this is always targeting big (not
| Google, but big) companies, hoping to divert their big IT
| budgets to their coffers: "You have X million queries to your
| model every day. Imagine if we billed you per-request, but
| scaled the price so in aggregate it's slightly cheaper than
| your current spending."
|
| People who are training-constrained are early-stage (i.e. they
| correlate with not having money), and they would need to buy an
| entirely separate set of GPUs to support you (e.g. T4s are good
| for inference, but they need V100s for training). So they
| choose to ignore you entirely.
| Jensson wrote:
| If you are training models that are intended to be used in
| production at scale then training is dirt cheap compared to
| inference. There is a reason why Google focused on inference
| first with their TPUs, even though Google does a lot of ML
| training.
| dllthomas wrote:
| I think another part of the question is whether you're
| scaling on your own hardware or the customers' hardware.
| jldugger wrote:
| > If you're working on ML (as opposed to deploying someone
| else's ML) then almost all of your workload is training, not
| inference.
|
| Wouldn't that depend on the size of your customer base? Or at
| least, requests per second?
| karamanolev wrote:
| With more customers, the revenue and profit usually grow, the
| team becomes larger, wants to perform more experiments, spends
| more on training, and so on. Inference is just so
| computationally cheap compared to training.
|
| That's what I've seen in my experience, but I concede that
| there might be cases where the ML is a more-or-less solved
| problem serving a very large customer base, and inference is
| the bigger cost. I've rarely seen it happen, but other people
| are sharing scenarios where it happens frequently. So I guess
| it massively depends on the domain.
| mgraczyk wrote:
| This depends a lot on what you're doing. If you are ranking 1M
| qps in a recommender system, then training cost will be tiny
| compared to inference.
| lajamerr wrote:
| I wonder if there's room for model caching. Sacrifice some
| personalization for nearly identical results so you aren't
| hitting the model so often.
| mgraczyk wrote:
| Yeah we did lots of things like this at Instagram. Can be
| very brittle and dangerous though to share any caching
| amongst multiple users. If you work at Facebook you can
| search for some SEVs related to this lol
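A minimal sketch of the caching idea floated above (hypothetical
Python, illustrative names only): results are keyed on both the
user and the input, so nothing is ever shared across users, which
is exactly the failure mode warned about in the reply.

    # Per-user inference cache: trade memory for fewer model calls, without
    # ever returning one user's cached result to another user.
    import hashlib

    _cache = {}

    def run_model(features):
        return sum(features)                 # stand-in for the real model call

    def cached_infer(user_id, features):
        digest = hashlib.sha256(repr(features).encode()).hexdigest()
        key = (user_id, digest)              # user id in the key prevents cross-user sharing
        if key not in _cache:
            _cache[key] = run_model(features)
        return _cache[key]

    print(cached_infer("user-42", (1.0, 2.0)))   # computes, then caches
    print(cached_infer("user-42", (1.0, 2.0)))   # served from the cache

A real deployment would also bound the cache size and expire
entries whenever the underlying model is retrained.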
| varunkmohan wrote:
| Agreed that there are workloads where inference is not
| expensive, but it's really workload dependent. For applications
| that run inference over large amounts of data in the computer
| vision space, inference ends up being a dominant portion of the
| spend.
| PeterisP wrote:
| The way I see it, generally every new data point (on which
| the production model runs inference once) becomes part of the
| data set that is then used to train every next model, so the
| same data point is processed many more times in training, and
| training unavoidably takes more effort than inference.
|
| Perhaps I'm a bit biased towards all kinds of self-supervised
| or human-in-the-loop or semi-supervised models, but the
| notion of discarding large amounts of good domain-specific
| data that get processed _only_ for inference and not used for
| training afterward feels a bit foreign to me, because you
| usually can extract an advantage from it. But perhaps that's
| the difference between data-starved domains and
| overwhelming-data domains?
| varunkmohan wrote:
| Yup, exactly. It's a good point that for self-supervised
| workloads, the training set can become arbitrarily large.
| For a lot of other workloads in the vision space, most data
| needs to be labeled before it can be used for training.
| pdpi wrote:
| There's one piece of the puzzle you're missing: field-
| deployed devices.
|
| If I play chess on my computer, the games I play locally
| won't hit the Stockfish models. When I use the feature on
| my phone that allows me to copy text from a picture, it
| won't phone home with all the frames.
| version_five wrote:
| What you say re saving all data is the ideal. I'd add a
| couple of caveats. One is that in many fields you often get
| lots of redundant data that adds nothing to training (for
| example, with an image classifier looking for some rare
| class, you can be drowning in images of the majority class).
| Or you can just have lots of data that is unambiguously and
| correctly classified; some kind of active learning can tell
| you what is worth keeping.
|
| The other is that for various reasons the customer doesn't
| want to share their data (or at least have sharing built
| into the inference system), so even if you'd like to have
| everything they record, it's just not available. Obviously
| something to discourage, but it seems common.
| acchow wrote:
| Is your inference running on some daily jobs? That's not a ton
| of inference compared to running online for every live request
| (10k QPS?)
| sabotista wrote:
| It depends a lot on your problem, of course.
|
| Game-playing (e.g. AlphaGo) is computationally hard but the rules
| are immutable, target functions (e.g., heuristics) don't change
| much, and you can generate arbitrarily sized clean data sets
| (play more games). On these problems, ML-scaling approaches work
| very well. For business problems where the value of data decays
| rapidly, though, you probably don't need the power of a deep or
| complex neural net with millions of parameters, and expensive
| specialty hardware probably isn't worth it.
| scosman wrote:
| We did a big analysis of this a few years back. We ended up using
| a big spot-instance cluster of CPU machines for our inference
| cluster. Much more consistently available than spot GPUs, at
| greater scale, and at a better price per inference (at least at
| the time). Scaled well to many billions of inferences. Of
| course, compare cost per inference on your own models to make
| sure the logic applies.
| Article on how it worked: https://www.freecodecamp.org/news/ml-
| armada-running-tens-of-...
|
| Training was always GPUs (for speed), non-spot-instance (for
| reliability), and cloud based (for infinite parallelism).
| Training work tended to be chunky, never made sense to build
| servers in house that would be idle some of the time, and queued
| at other times.
| machinekob wrote:
| What cloud is even remotely worth it over buying 20x RTX 3090s,
| or even some Quadros, for training? Maybe if you have a very
| small team and small problems, but if you have CV/video tasks
| and a team of more than 3 (maybe even 2) people, in-house
| servers are always the better choice, as you'll earn your money
| back in 2-3 months of training versus a cloud solution, and
| maybe even faster if you wait for the RTX 4090.
|
| And if you are a solo dev it's an even easier choice, as you
| can reuse your rig for other stuff when you aren't training
| anything (for example gaming :D).
|
| The only exception is if you get a free 100k from AWS and then
| 100k from GCP; you can live off that for a year or even two if
| you stack both providers, but that's a special case and I'm not
| sure how easy it is to get 100k right now.
| beecafe wrote:
| You are years behind if you think you're training a model
| worth anything on consumer grade GPUs. Table stakes these
| days is 8x A100 pods, and lots of them. Luckily you can just
| get DGX pods so you don't have to build racks, but for many
| orgs just renting the pods is much cheaper.
| rockemsockem wrote:
| 300 billion parameters or GTFO, eh?
|
| There is tons of value to be had from smaller models. Even
| some state of the art results can be obtained on a
| relatively small set of commodity GPUs. Not everything is
| GPT-scale.
| cjbgkagh wrote:
| Years behind what? Table stakes for what? There is much
| more to ML than the latest transformer and diffusion
| models. While those get the attention, the amount of
| research outside that space dominates.
| mushufasa wrote:
| > You are years behind if you think you're training a model
| worth anything on consumer grade GPUs
|
| Ah yes, my code can't be useful to people unless it takes a
| long time to compile...
| skimo8 wrote:
| To be fair, I think ML workloads are quite a bit
| different than the days of compiling over lunch breaks.
|
| What the above post was probably trying to get at is that
| the ML specific hardware is far more efficient these days
| than consumer GPUs.
| machinekob wrote:
| Ahh yes, because there is only one way to do deep learning, and
| it is of course stacking models large enough to not be usable
| outside pods of GPUs. That is surely the way to go if you want
| to make money (from VCs, of course, because you won't have many
| users willing to pay enough for you to ever break even, as with
| OpenAI and the other big model providers; maybe you can get
| some money/sponsorship from a state or a university).
|
| The market for small, efficient models running locally on
| device is pretty big, maybe even the biggest that exists right
| now [iOS, Android and macOS are pretty easy to monetize with
| low-cost models that are useful]. I can assure you of that, and
| you can do it on even 4x RTX 3090 [it won't be fast but you'll
| get there :)]
| scosman wrote:
| As mentioned in the comment, ML training workloads tend to be
| super chunky (at least in my experience). Some days we want
| to train 50 models, some weeks we are evaluating and don't
| need any compute.
|
| I'd rather be able to spin up 200 GPUs in parallel when
| needed (yes, at a premium), but ramp to 0 when not. Data
| scientists waiting around are more expensive than GPUs.
| Replacing/maintaining servers is more work/money than you
| expect. And for us the training data was cloud native, so
| transfer/privacy/security is easier; nothing on prem, data
| scientists can design models without having access to raw
| data, etc.
| machinekob wrote:
| If you are a cloud-only company then it is certainly easier,
| but it still won't be cheaper, just more convenient. If the
| data science team is very big, the "best" solution without
| unlimited money is probably to run locally and go to the cloud
| [at a premium] when you don't have free resources for your
| teams (this was the case when I was working at a pretty big EU
| bank, but it wasn't "true" deep learning yet [about 4-5 years
| ago]).
| varunkmohan wrote:
| You have a good point. I think for small enough workloads
| self-managing instances on-prem is more cost-effective. There
| is a simplicity gain in being able to scale instances up and
| down in the cloud, but it may not make sense if you can
| self-manage without too much work.
| varunkmohan wrote:
| Disclaimer: I'm the Cofounder / CEO at Exafunction
|
| That's a great point. We'll be addressing this in an upcoming
| post as well.
|
| We've served workloads that run entirely on spot GPUs where it
| makes sense, since a small number of spot GPUs can stand in for
| a large amount of spot CPU capacity. The best of all worlds is
| if you can manage both spot and on-demand instances (with a
| preference towards spot instances). Also, for latency-sensitive
| workloads, running on spot instances or CPUs is sometimes not
| an option.
|
| I could definitely see cases where it makes sense to run on
| spot CPUs though.
| fortysixdegrees wrote:
| Disclaimer != Disclosure
|
| Probably one of HN's most common mistakes in comments.
| adgjlsfhk1 wrote:
| Perhaps, but I think "disclaimer" in this context is just an
| abbreviation, since the disclosure carries with it the implicit
| disclaimer of "so the things I'm saying are subconsciously
| influenced by the fact that they could potentially make me
| money".
| synergy20 wrote:
| I think TPU is the way to go for ML, be it training or inference.
|
| We're using GPUs (some contain a TPU block inside) due to
| 'historical reasons'. With a vector unit (x86 AVX, ARM SVE,
| RISC-V RVV) as part of the host CPU, either putting a TPU on a
| separate die of a chiplet or just putting it on a PCIe card
| will handle the heavy-lifting ML jobs fine. It should be much
| cheaper than the GPU model for ML nowadays, unless you are both
| a PC gamer and an ML engineer.
| andrewmutz wrote:
| At training time they sure are. The only thing more expensive
| than fancy GPUs is the ML engineers whose productivity they are
| improving.
| jacquesm wrote:
| This is an ad.
| triknomeister wrote:
| I thought this post would be about how ASICs are probably a
| better bet.
| rfrey wrote:
| Not related to the article, but how would one begin to become
| smart on optimizing GPU workloads? I've been charged with
| deploying an application that is a mixture of heuristic search
| and inference, that has been exclusively single-user to this
| point.
|
| I'm sure every little thing I've discovered (e.g. measuring
| CPU/GPU workloads, trying to multiplex access to the GPU, etc.)
| was probably covered in somebody's grad school notes 12 years
| ago, but I haven't found a source of info on the topic.
| pqn wrote:
| Let's just take the topic of measuring GPU usage. This alone is
| quite tricky -- tools like nvidia-smi will show full GPU
| utilization even if not all SMs are running. And also the
| workload may change behavior over time, if for instance inputs
| to transformers get longer over time. And then it gets even
| more complicated to measure when considering optimizations like
| dynamic batching. I think if you peek into some MLOps
| communities you can get a flavor of these nuances, but I'm not
| sure there are good exhaustive guides around right now.
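To make the measurement pitfall concrete, here is a small sketch
using the NVML Python bindings (assuming the nvidia-ml-py package
is installed). The reported figure is the fraction of time at
least one kernel was resident on the device, not the fraction of
SMs doing work, which is why it can read 100% on a mostly idle
chip.

    # Query the same utilization counter nvidia-smi reports, via NVML.
    import pynvml

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)          # first GPU
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)    # sampled over a short window
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    # util.gpu is "% of time one or more kernels was executing", not "% of SMs busy".
    print(f"kernel-active time: {util.gpu}%  memory controller: {util.memory}%")
    print(f"memory used: {mem.used / 2**20:.0f} MiB / {mem.total / 2**20:.0f} MiB")
    pynvml.nvmlShutdown()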
| einpoklum wrote:
| > And CPUs are so much cheaper
|
| Doesn't look like it. Consumer:
|
| AMD ThreadRipper 3970X: ~3000 USD on NewEgg
|
| https://www.newegg.com/amd-ryzen-threadripper-2990wx/p/N82E1...
|
| NVIDIA RTX 3080 Ti Founders' Edition: ~2000 USD
|
| https://www.newegg.com/nvidia-900-1g133-2518-000/p/1FT-0004-...
|
| For servers, a comparison is even more complicated and it
| wouldn't be fair to just give two numbers, but I still don't
| think GPUs are more expensive.
|
| ... besides, none of that may matter if your constraint is a
| power budget.
| 37ef_ced3 wrote:
| For small-scale transformer CPU inference you can use, e.g.,
| Fabrice Bellard's https://bellard.org/libnc/
|
| Similarly, for small-scale convolutional CPU inference, where
| you only need to do maybe 20 ResNet-50 inferences (batch size 1)
| per second per CPU (cloud CPUs cost $0.015 per hour), you can
| use inference engines designed for this purpose, e.g.,
| https://NN-512.com
|
| You can expect about 2x the performance of TensorFlow or PyTorch.
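For comparison on your own hardware, a rough baseline of the
workload described (batch-size-1 ResNet-50 on CPU) can be timed
with stock PyTorch. This is a generic sketch, not libnc or NN-512,
and it assumes a recent torch/torchvision install.

    # Time batch-size-1 ResNet-50 CPU inference to compare against the
    # ~20 inferences/sec/CPU figure quoted above.
    import time
    import torch
    import torchvision

    model = torchvision.models.resnet50(weights=None).eval()
    x = torch.randn(1, 3, 224, 224)

    with torch.inference_mode():
        for _ in range(5):                   # warm-up iterations
            model(x)
        n = 50
        start = time.perf_counter()
        for _ in range(n):
            model(x)
        elapsed = time.perf_counter() - start

    print(f"{n / elapsed:.1f} ResNet-50 inferences/sec on CPU")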
| tombert wrote:
| Is there a thing that Fabrice Bellard hasn't built? I had no
| idea that he was interested in something like machine learning,
| but I guess I shouldn't have been surprised because he has
| built every tool that I use.
| mistrial9 wrote:
| https://en.wikipedia.org/wiki/Fabrice_Bellard
___________________________________________________________________
(page generated 2022-08-29 23:00 UTC)