[HN Gopher] Demystifying GPU compute architectures
___________________________________________________________________
Demystifying GPU compute architectures
Author : rbanffy
Score : 14 points
Date : 2024-02-06 19:20 UTC (3 hours ago)
(HTM) web link (thechipletter.substack.com)
| einpoklum wrote:
| The main problem with GPU architectures is that the manufacturers
| hide their details from us. Hell, they don't even document the
| instruction sets! At least NVIDIA doesn't. And they often don't
| tell us what the hardware actually _does_, offering instead a
| mental model of how we can think about it.
|
| Specifically, those structural diagrams of functional units
| within SMs you see in the blog post? Those come from NVIDIA. And
| NVIDIA explicitly states that it does _not_ guarantee that this
| is what the hardware is actually like. The hardware works "as
| if" it were made up of these kinds of units. Even more
| specifically: it's not clear whether there even is such a thing
| as a "tensor core", or whether it's just some hack for doing
| lower-precision FP math somewhat faster.
|
| -----
|
| Anyway, if the architectures weren't mostly _hidden_, we could
| make them far less mysterious within a fairly short time.
| wolletd wrote:
| Just to be fair: as the post also states at the beginning,
| CPUs nowadays are far more complex than the interface they
| present to the outside.
|
| They don't work like a von Neumann machine; they act like one.
| Granted, we know a lot more about the inner workings of modern
| CPUs than GPUs, but a lot of real-life work still assumes that
| the CPU works "as if" it were a computer from the 70s, just
| really, really fast.
| dist-epoch wrote:
| GPU instruction sets change every year/generation.
|
| Your software would not run next year if you directly targeted
| the instruction set.
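A minimal sketch, assuming the CUDA runtime API, of how to check which instruction-set generation a machine's GPU belongs to. CUDA names each generation by its compute capability (`sm_XY`). Per NVIDIA's compatibility rules, native code (a cubin) built for compute capability X.y only runs on devices of capability X.z with z >= y, which is why builds meant to survive a new GPU generation also embed PTX for the driver to JIT-compile. The helper name `sm_arch` is hypothetical.

```cuda
#include <cuda_runtime.h>
#include <string>

// Hypothetical helper: returns the architecture name of a CUDA device,
// e.g. "sm_86" for compute capability 8.6, or "" if the query fails
// (no driver, no GPU, or an invalid device index).
std::string sm_arch(int device) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return "";
    // major/minor identify the native instruction-set generation.
    return "sm_" + std::to_string(prop.major) + std::to_string(prop.minor);
}
```

For example, compiling with `nvcc -gencode arch=compute_80,code=sm_80 -gencode arch=compute_80,code=compute_80` embeds native sm_80 code plus compute_80 PTX, so a GPU from a later generation can fall back to JIT-compiling the PTX.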
|
| NVIDIA does document their PTX instruction set (a level above
| what the hardware actually runs):
|
| https://docs.nvidia.com/cuda/parallel-thread-execution/index...
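To give a feel for the PTX level, here is a sketch (assuming nvcc and the CUDA runtime API) that uses CUDA's inline-assembly syntax to emit a single PTX instruction, `add.s32`, from a kernel. PTX is still a virtual ISA: ptxas or the driver lowers it to the GPU's native machine code (SASS). The names `add_ptx`, `add_kernel`, and `add_on_gpu` are hypothetical.

```cuda
#include <cuda_runtime.h>

// Adds two 32-bit ints with one hand-written PTX instruction. nvcc copies
// this text into the PTX it generates; it is lowered to SASS later.
__device__ int add_ptx(int a, int b) {
    int r;
    asm("add.s32 %0, %1, %2;" : "=r"(r) : "r"(a), "r"(b));
    return r;
}

__global__ void add_kernel(const int* a, const int* b, int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = add_ptx(a[i], b[i]);
}

// Hypothetical host helper: out[i] = a[i] + b[i] computed on the current
// device. Returns false if any CUDA runtime call fails.
bool add_on_gpu(const int* a, const int* b, int* out, int n) {
    if (n <= 0) return true;  // nothing to do; a 0-block launch is invalid
    const size_t bytes = static_cast<size_t>(n) * sizeof(int);
    int *da = nullptr, *db = nullptr, *dout = nullptr;
    bool ok = cudaMalloc(&da, bytes) == cudaSuccess &&
              cudaMalloc(&db, bytes) == cudaSuccess &&
              cudaMalloc(&dout, bytes) == cudaSuccess &&
              cudaMemcpy(da, a, bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy(db, b, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        const int threads = 256;
        add_kernel<<<(n + threads - 1) / threads, threads>>>(da, db, dout, n);
        // The device-to-host copy waits for the kernel to finish.
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(out, dout, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(da);  // cudaFree(nullptr) is a no-op
    cudaFree(db);
    cudaFree(dout);
    return ok;
}
```

Running `cuobjdump -sass` on the resulting binary shows the native instructions the PTX was lowered to, which is the layer the thread is asking NVIDIA to document.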
| Xeamek wrote:
| ...But the exact same is true of CPUs, and yet we have had
| public, stable instruction sets for decades now?
|
| Although, of course, CPU instructions are also just a
| front-end API that is implemented behind the scenes using
| microcode, which is probably much less stable.
|
| But the point is, if we could get one level 'closer' to the
| hardware on GPUs, just as we can on CPUs, it would end the
| big-business gatekeeping that exists around present-day GPU
| APIs and libraries.
| dist-epoch wrote:
| That's not true: you can run a 30-year-old binary on a modern
| CPU. The machine code you feed the CPU hasn't changed much.
|
| That's not true for GPUs: the machine code changes very
| frequently. You feed your "binary" (PTX, ...) to the
| driver, and the driver compiles it to the actual machine
| code of your actual GPU.
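The driver-side compilation described above can be seen directly with the CUDA driver API: `cuModuleLoadData` accepts null-terminated PTX text and compiles it for the GPU actually present. A minimal sketch, assuming a hand-written PTX module; the `.version 7.0` / `.target sm_50` directives and the names `store42` and `run_store42` are illustrative choices, not requirements. Link with `-lcuda`.

```cuda
#include <cuda.h>

// Hand-written PTX for a kernel that stores 42 through its pointer argument.
// The .version/.target values are illustrative; the driver must support them.
static const char* kPtx = R"(
.version 7.0
.target sm_50
.address_size 64

.visible .entry store42(.param .u64 out_ptr)
{
    .reg .b32 %r<2>;
    .reg .b64 %rd<3>;
    ld.param.u64 %rd1, [out_ptr];
    cvta.to.global.u64 %rd2, %rd1;
    mov.u32 %r1, 42;
    st.global.u32 [%rd2], %r1;
    ret;
}
)";

// Hypothetical helper: JIT-compiles kPtx for device 0, runs store42 once,
// and returns the value it wrote, or -1 if any driver API call fails.
int run_store42() {
    CUdevice dev;
    CUcontext ctx;
    if (cuInit(0) != CUDA_SUCCESS || cuDeviceGet(&dev, 0) != CUDA_SUCCESS) return -1;
    if (cuDevicePrimaryCtxRetain(&ctx, dev) != CUDA_SUCCESS) return -1;

    int result = -1;
    CUmodule mod;
    // cuModuleLoadData is where the driver compiles PTX to native machine code.
    if (cuCtxSetCurrent(ctx) == CUDA_SUCCESS &&
        cuModuleLoadData(&mod, kPtx) == CUDA_SUCCESS) {
        CUfunction fn;
        CUdeviceptr dout;
        if (cuModuleGetFunction(&fn, mod, "store42") == CUDA_SUCCESS &&
            cuMemAlloc(&dout, sizeof(int)) == CUDA_SUCCESS) {
            void* args[] = {&dout};
            int host = 0;
            // 1x1x1 grid, 1x1x1 block, no shared memory, default stream.
            // cuMemcpyDtoH is synchronous, so it waits for the kernel.
            if (cuLaunchKernel(fn, 1, 1, 1, 1, 1, 1, 0, nullptr, args, nullptr) == CUDA_SUCCESS &&
                cuMemcpyDtoH(&host, dout, sizeof(int)) == CUDA_SUCCESS) {
                result = host;
            }
            cuMemFree(dout);
        }
        cuModuleUnload(mod);
    }
    cuDevicePrimaryCtxRelease(dev);
    return result;
}
```

The same PTX string can be loaded unchanged on GPUs from different generations; only the native code the driver produces from it differs.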
___________________________________________________________________
(page generated 2024-02-06 23:00 UTC)