[HN Gopher] How to train large models on many GPUs?
___________________________________________________________________
How to train large models on many GPUs?
Author : picture
Score : 100 points
Date : 2021-09-26 02:07 UTC (1 day ago)
(HTM) web link (lilianweng.github.io)
(TXT) w3m dump (lilianweng.github.io)
| Voloskaya wrote:
| DeepSpeed [1] is an amazing tool for enabling the different
| kinds of parallelism and optimization on your model. I would
| definitely not recommend reimplementing everything yourself.
|
| Probably FairScale [2] too, though I've never tried it myself.
|
| [1]: https://github.com/microsoft/DeepSpeed
|
| [2]: https://github.com/facebookresearch/fairscale
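|
| For concreteness, this is roughly what handing a model to
| DeepSpeed looks like. A sketch only: the ds_config.json and the
| tiny placeholder model below are assumptions for illustration,
| not anything from this thread.
|
|   # usually launched with: deepspeed train.py
|   # ds_config.json defines batch size, optimizer, ZeRO stage,
|   # fp16 settings, etc.
|   import deepspeed
|   import torch
|
|   model = torch.nn.Linear(1024, 1024)  # placeholder model
|
|   # returns (engine, optimizer, dataloader, lr_scheduler)
|   engine, optimizer, _, _ = deepspeed.initialize(
|       model=model,
|       model_parameters=model.parameters(),
|       config="ds_config.json",
|   )
|
|   x = torch.randn(8, 1024, device=engine.device)
|   loss = engine(x).pow(2).mean()  # dummy objective
|   engine.backward(loss)  # handles loss scaling / ZeRO grads
|   engine.step()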
| sisjohn wrote:
| Any suggestions on what GPU to use to train large models?
| blackbear_ wrote:
| Totally depends on your budget. The DGX A100 [1] is quite good
| if you have a fat wallet.
|
| [1] https://www.nvidia.com/en-us/data-center/dgx-a100/
| atty wrote:
| Really depends on what you mean by large. If you mean truly
| large, you will need a cluster to train it in any reasonable
| amount of time. You'd probably want to look at servers built on
| the HGX platform (8 A100s per server); we use servers leased in
| bulk from traditional server providers (think Dell, HP, etc.).
| If you mean more like "as large as personally affordable", then
| you'd probably want to look at something like the RTX 3090: if
| you can get lucky and find one at MSRP, it has 24 GB of memory.
| Nvidia also has workstation cards with up to 48 GB, if I
| remember correctly, but if I were buying cards for myself, I
| would wait until I could get two 3090s somewhere close to MSRP
| instead of paying the markup on the workstation cards (unless
| you want more than two cards in one workstation, in which case
| you'd need the workstation cards).
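|
| As a rough rule of thumb when sizing "large" (this is the
| standard mixed-precision Adam accounting, e.g. in the ZeRO
| paper, not a number from this thread): the training state alone
| costs about 16 bytes per parameter, before activations.
|
|   # fp16 weights (2) + fp16 grads (2) + fp32 master copy (4)
|   # + fp32 Adam momentum (4) + fp32 Adam variance (4) = 16 B
|   def training_state_gib(n_params: float) -> float:
|       return n_params * 16 / 1024**3
|
|   print(training_state_gib(1.5e9))  # GPT-2 XL: ~22 GiB
|
| So a single 24 GB card already struggles to hold the training
| state of a ~1.5B-parameter model without tricks like ZeRO
| partitioning or offloading.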
| lvl100 wrote:
| 2 x 3090FE is the best bang for your buck.
| cinntaile wrote:
| Do you need watercooling to keep them from running too hot?
| maxwells-daemon wrote:
| I use 2x 3090s to train large language models, and mine don't
| thermal-throttle with air cooling even though they're right
| next to each other. ETH mining does generate too much heat,
| though.
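|
| One common way to drive a pair of 3090s like this is plain
| data parallelism. A minimal PyTorch DDP sketch, with a
| placeholder model and random data, launched via torchrun
| (PyTorch >= 1.10):
|
|   # launch: torchrun --nproc_per_node=2 train.py
|   import os
|   import torch
|   import torch.distributed as dist
|   from torch.nn.parallel import DistributedDataParallel as DDP
|
|   dist.init_process_group("nccl")
|   rank = int(os.environ["LOCAL_RANK"])
|   torch.cuda.set_device(rank)
|
|   model = DDP(torch.nn.Linear(1024, 1024).to(rank),
|               device_ids=[rank])
|   opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
|
|   for step in range(10):
|       x = torch.randn(32, 1024, device=rank)
|       loss = model(x).pow(2).mean()  # dummy objective
|       opt.zero_grad()
|       loss.backward()  # grads all-reduced across both GPUs
|       opt.step()
|
|   dist.destroy_process_group()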
| kkielhofner wrote:
| You can tweak the power limit settings for your
| application. In many cases you can drop the power
| consumption (and the heat generated) while still keeping
| over 90% of the performance, but this will depend on your
| actual use case [0].
|
| In my experience, for many models you can reduce the power
| limit even further than what was tested in that guide
| while barely impacting performance.
|
| [0] https://timdettmers.com/2020/09/07/which-gpu-for-deep-learni...
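|
| Concretely, the knob is the per-GPU power limit, settable
| with "sudo nvidia-smi -i 0 -pl 250" or via the NVML
| bindings. A sketch with pynvml; the 250 W target is just
| an illustrative value, not a tested recommendation:
|
|   import pynvml
|
|   pynvml.nvmlInit()
|   h = pynvml.nvmlDeviceGetHandleByIndex(0)
|   lo, hi = pynvml.nvmlDeviceGetPowerManagementLimitConstraints(h)
|   print(lo, hi)  # allowed range, in milliwatts
|   # needs root, same as nvidia-smi -pl
|   pynvml.nvmlDeviceSetPowerManagementLimit(h, 250_000)  # 250 W
|   pynvml.nvmlShutdown()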
| lvl100 wrote:
| For ML? Nope. I think overheating issues are mostly a
| mining thing. I run models and do 3D rendering quite a
| bit and have never run into problems.
___________________________________________________________________
(page generated 2021-09-27 23:03 UTC)