[HN Gopher] Matrix Core Programming on AMD GPUs
___________________________________________________________________
Matrix Core Programming on AMD GPUs
Author : skidrow
Score : 106 points
Date : 2025-10-04 21:22 UTC (1 day ago)
(HTM) web link (salykova.github.io)
(TXT) w3m dump (salykova.github.io)
| gleenn wrote:
| Glad to see more articles out on using AMD hardware acceleration,
| especially for matrix math. More diversity in this space is
| welcome.
| latchkey wrote:
| Many people have been asking them for this sort of content, and
| it is happening. Couldn't be more excited. Also note that it is
| AMD, but not AMD: it is being published in the open on an
| individual's GitHub.
| imtringued wrote:
| Whenever I see code like this, I start to think that GPUs are
| uniquely unsuited for matrix multiplication.
|
| You're pretending that each streaming multiprocessor can handle
| independent threads, when in reality you're feeding something
| that only exists once or twice per SM. It's like independently
| controlling one out of 32 cars on a 32-lane highway where the
| cars aren't allowed to switch lanes, with the controls of one
| car replicated to all the others, when in reality everyone is
| sitting on the same bus.
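|
| A minimal sketch of the "same bus" point, as a plain Python
| simulation rather than real GPU code. The 16-lane wave, the 4x4
| tile, and the one-element-per-lane register layout are all
| assumptions made for illustration; they are not the actual
| Matrix Core or Tensor Core register layouts. The thing to notice
| is that each lane's result depends on values held by other
| lanes, so the operation only makes sense for the wave as a whole.

```python
N = 4            # hypothetical tile size
LANES = N * N    # hypothetical wave size: one tile element per lane


def wave_mma(a_regs, b_regs, c_regs):
    """Simulate one wave-wide op D = A @ B + C.

    Lane i is assumed to own element (i // N, i % N) of each matrix.
    The inner loop reads A and B values owned by *other* lanes, which
    is why this cannot be expressed as an independent per-lane step.
    """
    d_regs = []
    for lane in range(LANES):
        row, col = divmod(lane, N)
        acc = c_regs[lane]
        for k in range(N):
            acc += a_regs[row * N + k] * b_regs[k * N + col]
        d_regs.append(acc)
    return d_regs


# Usage: multiplying by the identity returns B unchanged.
identity = [1 if i // N == i % N else 0 for i in range(LANES)]
b = list(range(LANES))
zeros = [0] * LANES
print(wave_mma(identity, b, zeros) == b)
```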
| MaxBarraclough wrote:
| I'm not sure I follow. Matrix multiplication isn't inherently
| 'branchy' in a way that we would expect to cause inefficient
| execution on SIMT (i.e. branch divergence).
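|
| A rough way to picture branch divergence, again as a Python
| sketch under simplifying assumptions: a 32-lane warp, and a
| lockstep model in which every branch direction taken by at
| least one lane costs one pass while the other lanes are masked
| off. The function name is hypothetical.

```python
WARP_SIZE = 32  # assumption: 32 lanes per warp


def lockstep_passes(lane_takes_branch):
    """Count passes needed for an if/else under the lockstep model.

    Each distinct branch direction taken by at least one lane costs
    one pass; lanes on the other path sit idle during that pass.
    """
    return len(set(lane_takes_branch))


# Matrix-multiply inner loop: every lane runs the same multiply-add,
# so there is no data-dependent branch and all lanes agree.
uniform = [True] * WARP_SIZE

# A data-dependent branch, here split on even/odd lane index.
divergent = [lane % 2 == 0 for lane in range(WARP_SIZE)]

print(lockstep_passes(uniform))
print(lockstep_passes(divergent))
```

| In this model the uniform case needs a single pass and the
| divergent case needs two, which is the inefficiency that a
| matrix multiply avoids.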
| touisteur wrote:
| I think the remark is more about how Tensor Cores (or Matrix
| Cores in AMD lingo) are distributed per SM (rather than sitting
| off to the side on an interconnect and being individually
| programmable), so on the same SM you have your classical warps
| (CUDA cores) AND the Tensor units, and switching between one
| and the other might be confusing.
|
| My mental model of SMs has always been "assume AVX512 is the
| default ISA" and "tensor cores are another layer alongside
| this" (kind of like AMX), and you have this heterogeneous
| "thing" to program. Don't know if it helps. The CUDA
| programming model hides a lot, and looking at PTX code in
| Nsight Compute is most enlightening.
___________________________________________________________________
(page generated 2025-10-05 23:01 UTC)