[HN Gopher] How to train large models on many GPUs?
       ___________________________________________________________________
        
       How to train large models on many GPUs?
        
       Author : picture
       Score  : 100 points
       Date   : 2021-09-26 02:07 UTC (1 day ago)
        
 (HTM) web link (lilianweng.github.io)
 (TXT) w3m dump (lilianweng.github.io)
        
       | Voloskaya wrote:
       | DeepSpeed [1] is an amazing tool for enabling the different
       | kinds of parallelism and optimizations on your model. I would
       | definitely not recommend reimplementing everything yourself.
       | 
       | Probably FairScale [2] too, but never tried it myself.
       | 
       | [1]: https://github.com/microsoft/DeepSpeed
       | 
       | [2]: https://github.com/facebookresearch/fairscale
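       | 
       | A minimal sketch of hooking a PyTorch model up to DeepSpeed's
       | ZeRO stage 2, in case it helps anyone get started. The toy
       | model, batch size, and learning rate are placeholder
       | assumptions, and it assumes a reasonably recent DeepSpeed
       | release:
       | 
       |   import torch
       |   import deepspeed
       |   
       |   # Hypothetical toy model; any torch.nn.Module works the same.
       |   model = torch.nn.Sequential(
       |       torch.nn.Linear(1024, 4096),
       |       torch.nn.ReLU(),
       |       torch.nn.Linear(4096, 1024),
       |   )
       |   
       |   # ZeRO stage 2 shards optimizer state and gradients across
       |   # ranks; fp16 halves the footprint of weights and grads.
       |   ds_config = {
       |       "train_batch_size": 32,
       |       "fp16": {"enabled": True},
       |       "zero_optimization": {"stage": 2},
       |       "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
       |   }
       |   
       |   # The returned engine wraps forward/backward/step and handles
       |   # loss scaling and gradient synchronization internally.
       |   model_engine, optimizer, _, _ = deepspeed.initialize(
       |       model=model,
       |       model_parameters=model.parameters(),
       |       config=ds_config,
       |   )
       |   
       |   # Micro-batch of 16 per GPU when launched across 2 GPUs
       |   # (16 * 2 ranks = the configured train_batch_size of 32).
       |   inputs = torch.randn(16, 1024).half().to(model_engine.device)
       |   targets = torch.randn(16, 1024).half().to(model_engine.device)
       |   loss = torch.nn.functional.mse_loss(model_engine(inputs),
       |                                       targets)
       |   model_engine.backward(loss)  # scaled fp16 backward
       |   model_engine.step()          # optimizer step + grad zeroing
       | 
       | Launch with the deepspeed CLI, e.g. "deepspeed --num_gpus=2
       | train.py", and ZeRO takes care of partitioning across the
       | GPUs.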
        
       | sisjohn wrote:
       | Any suggestions on what GPU to use to train large models?
        
         | blackbear_ wrote:
         | Totally depends on your budget. The DGX A100 [1] is quite
         | good if you have a fat wallet.
         | 
         | [1] https://www.nvidia.com/en-us/data-center/dgx-a100/
        
         | atty wrote:
         | Really depends on what you mean by large. If you mean truly
         | large, you will need a cluster to train it in any reasonable
         | amount of time. You'd probably want to look at servers built
         | on the HGX platform (8 A100s per server). We use servers
         | leased in bulk from traditional server providers (think
         | Dell, HP, etc.). If you mean more like "as large as
         | personally affordable", then you'd probably want to look at
         | something like the RTX 3090; if you can get lucky and find
         | it at MSRP, it has 24 GB of memory. Nvidia also has
         | workstation cards with up to 48 GB if I remember correctly,
         | but if I were buying cards for myself, I would wait until I
         | could get two 3090s somewhere close to MSRP instead of
         | paying the markup on the workstation cards (unless you want
         | more than two in a workstation, in which case you'd need to
         | go for those).
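         | 
         | For calibrating "as large as personally affordable": with
         | plain mixed-precision Adam you need roughly 16 bytes of GPU
         | memory per parameter (fp16 weights + fp16 grads + fp32
         | master weights + two fp32 Adam moments), before counting
         | activations. A back-of-envelope sketch; the 20% memory
         | reserve for overhead is my own assumption:
         | 
         |   # Per-parameter cost for mixed-precision Adam training:
         |   # 2 B fp16 weights + 2 B fp16 grads
         |   # + 4 B fp32 master weights + 8 B Adam moments = 16 B
         |   BYTES_PER_PARAM = 16
         |   
         |   def max_params(gpu_mem_gb, overhead=0.2):
         |       """Largest model that fits, ignoring activations."""
         |       usable = gpu_mem_gb * (1 - overhead) * 1e9
         |       return usable / BYTES_PER_PARAM
         |   
         |   print(f"RTX 3090 (24 GB): ~{max_params(24)/1e9:.1f}B "
         |         f"params")  # ~1.2B
         |   print(f"A100 (80 GB):     ~{max_params(80)/1e9:.1f}B "
         |         f"params")  # ~4.0B
         | 
         | So a single 24 GB card tops out around a billion parameters
         | for straightforward training; beyond that you need sharding
         | (ZeRO), offloading, or a cluster.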
        
         | lvl100 wrote:
         | 2 x 3090FE is the best bang for your buck.
        
           | cinntaile wrote:
           | Do you need watercooling to keep them from running too hot?
        
             | maxwells-daemon wrote:
             | I use 2x3090 to train large language models, and mine don't
             | thermal-throttle with air cooling even though they're right
             | next to each other. Eth mining does generate too much heat
             | though.
        
             | kkielhofner wrote:
             | You can tweak the power limit settings for your
             | application. In many cases you can drop the power
             | consumption (and the heat generated) while still
             | maintaining >90% of the performance, but this will
             | depend on your actual use case [0].
             | 
             | In my experience, for many models you can reduce the
             | power limit even further than what was tested in that
             | guide while barely impacting performance.
             | 
             | [0] https://timdettmers.com/2020/09/07/which-gpu-for-deep-
             | learni...
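             | 
             | For reference, a minimal sketch of capping the limit via
             | nvidia-smi from Python; the GPU index and the 250 W cap
             | are assumptions (a 3090's default limit is 350 W), and
             | setting the limit requires root:
             | 
             |   import subprocess
             |   
             |   GPU = "0"      # assumed GPU index
             |   WATTS = "250"  # assumed cap, down from 350 W default
             |   
             |   # Show current/default/max enforceable power limits.
             |   subprocess.run(
             |       ["nvidia-smi", "-q", "-d", "POWER", "-i", GPU],
             |       check=True)
             |   
             |   # Lower the software power cap (needs root).
             |   subprocess.run(
             |       ["sudo", "nvidia-smi", "-i", GPU, "-pl", WATTS],
             |       check=True)
             | 
             | The setting resets on reboot unless you reapply it
             | (e.g. from a startup service).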
        
             | lvl100 wrote:
             | For ML? Nope. I think overheating issues mostly come up
             | with mining. I run models and do 3D rendering quite a
             | bit and have never run into problems.
        
       ___________________________________________________________________
       (page generated 2021-09-27 23:03 UTC)