[HN Gopher] My Deep Learning Rig
___________________________________________________________________
My Deep Learning Rig
Author : jacquesm
Score : 20 points
Date : 2023-08-15 20:05 UTC (2 hours ago)
(HTM) web link (nonint.com)
(TXT) w3m dump (nonint.com)
| throwing_away wrote:
| > This is because without dropping serious $$$ on mellanox high-
| speed NICs and switches, inter-server communication bandwidth
| quickly becomes the bottleneck when training large models. I
| can't afford fancy enterprise grade hardware, so I get around it
| by keeping my compute all on the same machine. This goal drives
| many of the choices I made in building out my servers, as you
| will see.
|
| 10gbe is very cheap now, but I guess that's not enough?
| liuliu wrote:
| Yeah, you need 100GbE at a minimum; 10GbE is too little. PCIe
| bandwidth itself can be the bottleneck, and a PCIe 3.0 x16 slot
| already tops out around 16 GB/s, which is roughly the line rate
| of 100GbE.
|
| BTW, to echo the author: PSU capacity on U.S. 120V circuits is a
| major reason why I'm limited to 4 GPUs. Also, the 3090 still
| has NVLink support, so I wonder why the author hasn't brought
| that up. From what I've experienced, NVLink does help if you
| run data-parallel training.
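The bandwidth comparison in the comment above can be sanity-checked with nominal spec numbers; a quick sketch (the figures are standard nominal rates, not measurements from the thread):

```python
# Rough bandwidth comparison using nominal spec figures, before
# protocol overhead. PCIe 3.0 runs 8 GT/s per lane with 128b/130b
# encoding, so an x16 slot carries about 126 Gb/s of payload.
pcie3_x16_gbps = 8 * 16 * (128 / 130)
gbe_100_gbps = 100.0
gbe_10_gbps = 10.0

for name, gbps in [("PCIe 3.0 x16", pcie3_x16_gbps),
                   ("100GbE", gbe_100_gbps),
                   ("10GbE", gbe_10_gbps)]:
    print(f"{name:>12}: {gbps:6.1f} Gb/s  (~{gbps / 8:5.2f} GB/s)")
```

So 100GbE roughly saturates a PCIe 3.0 x16 slot, while 10GbE sits an order of magnitude below it.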
| jacquesm wrote:
| Couldn't you use a 240V dryer socket for that purpose? That
| should get you 7200 Watts on a 30A circuit.
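The circuit arithmetic here (and the 120V limitation mentioned above) can be sketched; the per-GPU draw and system overhead are assumptions, not figures from the thread:

```python
# Back-of-envelope GPU count per circuit. 350 W per GPU and 400 W
# of system overhead (CPU, fans, drives) are assumed values.
def usable_gpus(volts, amps, gpu_w=350, overhead_w=400):
    # NEC 80% rule: continuous loads should stay at 80% of rating.
    continuous_w = volts * amps * 0.8
    return int((continuous_w - overhead_w) // gpu_w)

print(usable_gpus(120, 15))  # a typical U.S. outlet
print(usable_gpus(240, 30))  # a 30A dryer circuit (7200 W nominal)
```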
| bradfox2 wrote:
| 100GbE Mellanox ConnectX cards are not actually that expensive,
| though.
| jacquesm wrote:
| You'd need a switch too, unless you're going point-to-point,
| but that will eat up PCIe slots that you'd probably rather
| use for GPUs.
| LTL_FTC wrote:
| If these server boards support thunderbolt AIC's, and I believe
| they might as my Threadripper Pro board does, daisy chaining
| them together could get you 40Gbps somewhat easily, if that is
| sufficient.
| hooloovoo_zoo wrote:
| Interesting, I wonder what the actual income from vast.ai looked
| like.
| jacquesm wrote:
| Likewise, based on the costs listed on their page I'd say no
| more than $0.80/hour or so, assuming a 50% gross margin for
| vast.ai.
|
| And that has to cover energy costs, so I assume the OP has a
| cheap source of power. Here in NL I could not do this
| profitably; even off solar power it would be more efficient to
| sell that power to the grid than to use it to drive a GPU rig.
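Under the assumptions in the comment above, a quick break-even sketch (the rig wattage is a guess for illustration, not a figure from the post):

```python
# Hypothetical break-even electricity price for renting out a rig.
gross_per_hour = 0.80  # $/h, the rough estimate from the comment above
rig_kw = 1.8           # assumed draw: 4 GPUs at ~350 W plus ~400 W overhead
# Above this price per kWh, renting the rig out loses money outright,
# before even counting hardware depreciation.
breakeven_per_kwh = gross_per_hour / rig_kw
print(f"break-even power price: ${breakeven_per_kwh:.2f}/kWh")
```

European residential rates can sit above that line, which matches the comment's conclusion for NL.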
| [deleted]
| doctorpangloss wrote:
| Without a fully connected NVLink network, the 3090s will be
| underutilized for models that distribute the layers across
| multiple GPUs.
|
| If AMD were better supported, it would be most economical to use
| 4x MI60s for 128GB using an Infinity Fabric bridge. However, in
| order to get to the end of such a journey, you would have to know
| something.
| jacquesm wrote:
| What kind of factor would that be?
___________________________________________________________________
(page generated 2023-08-15 23:00 UTC)