[HN Gopher] A RoCE network for distributed AI training at scale
___________________________________________________________________
A RoCE network for distributed AI training at scale
Author : mikece
Score : 47 points
Date : 2024-08-05 16:13 UTC (6 hours ago)
(HTM) web link (engineering.fb.com)
(TXT) w3m dump (engineering.fb.com)
| jauntywundrkind wrote:
| From the paper, it seems like they are using RDMA to/from video
| cards, skipping the NIC.
|
| > *These transactions require GPU-to-RDMA NIC support for
| optimal performance*
|
| Remarkably, consumer computing has similarly found a reason to
| bypass sending data through the CPU: texture streaming.
| DirectStorage and Sony's Kraken purport to let the GPU read
| directly from the SSD. It's a storage application instead of a
| NIC one, but still built around PCIe DMA-P2P (at least
| DirectStorage is, I think).
|
| Table 2, network stats for 128 GPUs, is kind of interesting. Most
| collectives such as AllGather and AllReduce run with only 4 Queue
| Pairs. Not my area of expertise at all, but wow, that seems tiny!
| All this network, and basically everyone's talking to only a few
| peers? That's what it means, right?
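
One way to see why a collective can get by with few peers is a ring all-reduce, where each rank only ever exchanges data with its two ring neighbours. The sketch below just counts distinct peers per rank. It assumes a plain ring schedule; whether the paper's collectives use a ring, and how Queue Pairs map onto peers, are not stated in this thread, and `ring_allreduce_peer_counts` is a hypothetical helper name.

```python
def ring_allreduce_peer_counts(world_size):
    """Count distinct peers each rank talks to in a ring all-reduce.

    Assumption: a plain ring schedule with 2 * (N - 1) steps
    (reduce-scatter followed by all-gather), where every rank sends
    to its successor and receives from its predecessor at each step.
    """
    peers = {rank: set() for rank in range(world_size)}
    for _step in range(2 * (world_size - 1)):
        for rank in range(world_size):
            successor = (rank + 1) % world_size
            peers[rank].add(successor)   # rank sends to its successor
            peers[successor].add(rank)   # successor receives from rank
    return {rank: len(p) for rank, p in peers.items()}


counts = ring_allreduce_peer_counts(128)
print(max(counts.values()))
```

Under that assumption the per-rank peer count stays constant as the job grows, which is at least consistent with a small number of Queue Pairs per GPU.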
|
| The discussion at the end of the paper talked about Flowlets. The
| description makes me think a little bit of hash bucket chaining,
| where you try the first path, and if later a conflict arises or
| the path degrades, there's a fallback path already planned. Like
| there would be a fallback chained bucket in a hash table.
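
For comparison, flowlet switching as it is usually described keeps a flow on its current path while packets arrive back to back, and only picks a new path after an idle gap long enough that reordering is unlikely. The sketch below is a minimal illustration of that idea, not the paper's implementation. `FlowletSwitch`, `gap_threshold`, and the `path_load` list are hypothetical names chosen for the example.

```python
class FlowletSwitch:
    """Minimal flowlet-style path selector (illustrative only)."""

    def __init__(self, num_paths, gap_threshold):
        self.num_paths = num_paths
        # Idle time (seconds) after which a new flowlet begins.
        self.gap_threshold = gap_threshold
        # flow_id -> (time of last packet, path currently in use)
        self.state = {}

    def route(self, flow_id, now, path_load):
        """Return the path index for a packet of flow_id seen at time now.

        path_load is an assumed per-path load metric (lower is better).
        """
        entry = self.state.get(flow_id)
        if entry is not None and now - entry[0] <= self.gap_threshold:
            # Same flowlet: stay on the path to preserve packet order.
            path = entry[1]
        else:
            # New flowlet: free to move to the least-loaded path.
            path = min(range(self.num_paths), key=lambda p: path_load[p])
        self.state[flow_id] = (now, path)
        return path


switch = FlowletSwitch(num_paths=2, gap_threshold=0.001)
first = switch.route("flow-a", 0.0, [0, 5])      # new flow, picks a path
second = switch.route("flow-a", 0.0005, [9, 0])  # within the gap, stays put
third = switch.route("flow-a", 0.01, [9, 0])     # after a gap, may move
```

The difference from the chained-bucket analogy is that the alternate path is not planned up front; it is chosen at the moment a gap opens, using whatever load information is available then.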
| wmf wrote:
| The NIC is still there but they're skipping the data copy from
| system RAM to GPU RAM. https://developer.nvidia.com/gpudirect
| eslaught wrote:
| So they're re-inventing HPC networks in the data center.
|
| https://en.wikipedia.org/wiki/Fat_tree
|
| https://www.cs.umd.edu/class/spring2021/cmsc714/readings/Kim...
|
| I'm sure there are innovations here, but most of this has been
| standard in HPC for decades. (Fat trees since 1985, Dragonfly
| since 2008.) This is not new science, folks.
| wmf wrote:
| It's not new science, but tuning RoCE performance is new
| engineering.
___________________________________________________________________
(page generated 2024-08-05 23:00 UTC)