[HN Gopher] Upgrading Multi-GPU Interconnectivity with the Third...
___________________________________________________________________
Upgrading Multi-GPU Interconnectivity with the Third-Generation
Nvidia NVSwitch
Author : my123
Score : 24 points
Date : 2022-11-20 19:34 UTC (3 hours ago)
(HTM) web link (developer.nvidia.com)
(TXT) w3m dump (developer.nvidia.com)
| mmastrac wrote:
| Has anyone used one of the Nvlink switches? That must be a power
| and heat beast.
| wmf wrote:
| NVSwitch looks around 1/4th the size of a contemporary Ethernet
| switch, so it may be only ~100 W, which is pretty small relative
| to the GPUs it's connecting.
| solardev wrote:
| Are those the things that kept melting and blowing up computers
| with the 4090s?
| dotnet00 wrote:
| Your question is understandable given the hysteria about those
| connectors, but they're power connectors, and it seems there
| were only ~50 cases in total, mostly down to the connector not
| being fully inserted (with a handful due to manufacturing
| defects).
| mastax wrote:
| No. Some 4090s had issues with power connectors. The 4090
| does not support NVLink (though there is evidence that they
| were designed with the capability).
| marcyb5st wrote:
| They're probably using the same PCBs as their datacenter
| counterparts, where NVLink is extremely important for
| minimizing overhead.
| verall wrote:
| They're not using the same PCBs: the 4090 has a blow-through
| cooler and therefore a downsized PCB, while the DC cards have
| big fanless heatsinks designed to be cooled by chassis fans.
|
| The 3090 had NVLink, so it was probably considered for the 4090
| but eventually cut for some reason.
| buildbot wrote:
| Entertainingly, they do use Molex Micro-Fit connectors on SXM
| boards - the same ones as the 4090.
| anonymousDan wrote:
| Can anyone elaborate on the pros and cons of NVLink in
| comparison to e.g. RDMA over InfiniBand/RoCE, or PCIe?
| [deleted]
| wmf wrote:
| I think NVLink is a memory interconnect, more like NUMAlink,
| UPI, or Infinity Fabric. AFAIK GPUs can transparently access
| each other's memory over NVLink (probably at cache-line
| granularity), while RDMA protocols require explicitly copying
| memory (usually in larger chunks) before accessing it.
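|
| Roughly, in CUDA runtime terms (just a sketch, assuming two
| peer-capable GPUs, e.g. joined by NVLink; the kernel and buffer
| sizes are made up for illustration):
|
|   #include <cuda_runtime.h>
|   #include <stdio.h>
|
|   // Toy kernel: a single thread on GPU 0 sums a buffer that
|   // physically lives on GPU 1. With peer access enabled, the
|   // loads are serviced over the interconnect transparently.
|   __global__ void sum_remote(const float* buf, float* out,
|                              int n) {
|       float s = 0.0f;
|       for (int i = 0; i < n; ++i) s += buf[i];
|       *out = s;
|   }
|
|   int main() {
|       const int n = 1 << 20;
|
|       float* d_remote;                 // lives on GPU 1
|       cudaSetDevice(1);
|       cudaMalloc(&d_remote, n * sizeof(float));
|       cudaMemset(d_remote, 0, n * sizeof(float));
|
|       float* d_out;                    // lives on GPU 0
|       cudaSetDevice(0);
|       cudaMalloc(&d_out, sizeof(float));
|
|       int can_access = 0;
|       cudaDeviceCanAccessPeer(&can_access, 0, 1);
|       if (can_access) {
|           // "memory interconnect" style: map once, then simply
|           // dereference the remote pointer from a kernel.
|           cudaDeviceEnablePeerAccess(1, 0);
|           sum_remote<<<1, 1>>>(d_remote, d_out, n);
|       } else {
|           // "RDMA" style: stage an explicit bulk copy into
|           // local memory before touching the data.
|           float* d_local;
|           cudaMalloc(&d_local, n * sizeof(float));
|           cudaMemcpyPeer(d_local, 0, d_remote, 1,
|                          n * sizeof(float));
|           sum_remote<<<1, 1>>>(d_local, d_out, n);
|           cudaFree(d_local);
|       }
|       cudaDeviceSynchronize();
|
|       float result;
|       cudaMemcpy(&result, d_out, sizeof(float),
|                  cudaMemcpyDeviceToHost);
|       printf("sum = %f\n", result);
|       return 0;
|   }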
| woodson wrote:
| According to the article, it has more than three times the
| bandwidth of PCI-E Gen5 per lane.
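|
| (Back-of-the-envelope, assuming the signaling rates quoted
| downthread are right: an NVLink 4 lane runs at ~100 Gbps, a
| PCIe Gen5 lane signals at 32 GT/s, and 100 / 32 is roughly 3.1x
| per lane before protocol overhead.)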
| rektide wrote:
| RDNA3 has memory/cache chiplets providing 3.5Tbps. The 0.9Tbps of
| much further distance chip to chip conmevtivity here is highly
| impressive. Interesting to note that RDNA3 links are only 10Gbps,
| they just have extreme interposer design with hundreds upon
| hundreds of incredibly small wires (resulting in stupendously
| high bit-efficiency); not something a multi-GPU design can do,
| and hence Nvidia doing 100Gbps links.
|
| Given where Ethernet is (on the way to terabit Ethernet), I'm
| not surprised Nvidia is here, but it's still cool to see.
|
| The onboard SHARP processor on the NVSwitch is very cool to
| read about. Given how much ML involves crunching many
| matrices/tensors and then adding the results, being able to
| parallelize the crunching and have a central aggregator makes
| sense. In general I hope we see an emerging presence of CPUs &
| GPUs with extremely high fabric connectivity that can serve as
| DPUs like this, as central coordinators. Thinking of this less
| as a switch (which it certainly is too, with its vast bank of
| PHYs and massive crossbar) and more as an I/O-centric computer
| is, I think, deserved.
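|
| For concreteness, the thing SHARP accelerates is essentially
| the all-reduce you'd otherwise run across the GPUs themselves,
| e.g. via NCCL (rough sketch only; whether the reduction
| actually happens in the switch is up to the library and fabric,
| and the buffer sizes here are made up):
|
|   #include <nccl.h>
|   #include <cuda_runtime.h>
|   #include <stdlib.h>
|
|   // Each GPU holds a gradient shard; the all-reduce sums them
|   // so every GPU ends up with the same aggregated result.
|   // In-switch reduction moves that summation off the GPUs' SMs.
|   int main() {
|       int ndev = 0;
|       cudaGetDeviceCount(&ndev);
|
|       ncclComm_t* comms =
|           (ncclComm_t*)malloc(ndev * sizeof(ncclComm_t));
|       cudaStream_t* streams =
|           (cudaStream_t*)malloc(ndev * sizeof(cudaStream_t));
|       float** grads = (float**)malloc(ndev * sizeof(float*));
|       const size_t count = 1 << 24;    // 16M floats per GPU
|
|       for (int i = 0; i < ndev; ++i) {
|           cudaSetDevice(i);
|           cudaMalloc(&grads[i], count * sizeof(float));
|           cudaMemset(grads[i], 0, count * sizeof(float));
|           cudaStreamCreate(&streams[i]);
|       }
|       ncclCommInitAll(comms, ndev, NULL);  // one rank per GPU
|
|       // Sum the per-GPU buffers; the result lands in place on
|       // every GPU.
|       ncclGroupStart();
|       for (int i = 0; i < ndev; ++i)
|           ncclAllReduce(grads[i], grads[i], count, ncclFloat,
|                         ncclSum, comms[i], streams[i]);
|       ncclGroupEnd();
|
|       for (int i = 0; i < ndev; ++i) {
|           cudaSetDevice(i);
|           cudaStreamSynchronize(streams[i]);
|           ncclCommDestroy(comms[i]);
|       }
|       return 0;
|   }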
___________________________________________________________________
(page generated 2022-11-20 23:00 UTC)