[HN Gopher] Demystifying GPU compute architectures
       ___________________________________________________________________
        
       Demystifying GPU compute architectures
        
       Author : rbanffy
       Score  : 14 points
       Date   : 2024-02-06 19:20 UTC (3 hours ago)
        
 (HTM) web link (thechipletter.substack.com)
 (TXT) w3m dump (thechipletter.substack.com)
        
       | einpoklum wrote:
       | The main problem with GPU architectures is that the manufacturers
       | hide their details from us. Hell, they don't even document the
       | instruction sets! At least NVIDIA doesn't. And they often don't
        | tell us what the hardware actually _does_; instead, they offer
        | a mental model of how we can think about it.
       | 
        | Specifically, those structural diagrams of functional units
        | within SMs you see in the blog post? Those come from NVIDIA.
        | And they explicitly state that they do _not_ guarantee that
        | this is what the hardware is actually like. The hardware works
        | "as-if" it were made up of these kinds of units. And even more
        | specifically - it's not clear whether there even is such a
        | thing as a "tensor core", or whether it's just some hack for
        | doing lower-precision FP math somewhat faster.
       | 
       | -----
       | 
       | Anyway, if the architectures weren't mostly _hidden_, we would
       | make them far less mysterious within a rather short period of
       | time.
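        | 
        | (For the record: the only way a programmer ever touches a
        | "tensor core" is through warp-level matrix-multiply-accumulate
        | intrinsics such as the CUDA WMMA API - a minimal sketch below,
        | kernel name arbitrary; whether dedicated silicon sits behind
        | it is exactly what this level does not reveal.)
        | 
        |     #include <cuda_fp16.h>
        |     #include <mma.h>
        |     using namespace nvcuda;
        | 
        |     // One warp (32 threads, sm_70 or newer) computes a
        |     // 16x16x16 half-precision matrix multiply-accumulate.
        |     // This is the documented interface; the hardware behind
        |     // it is only ever described "as-if".
        |     __global__ void mma16(const half *a, const half *b,
        |                           float *c) {
        |         wmma::fragment<wmma::matrix_a, 16, 16, 16, half,
        |                        wmma::row_major> fa;
        |         wmma::fragment<wmma::matrix_b, 16, 16, 16, half,
        |                        wmma::col_major> fb;
        |         wmma::fragment<wmma::accumulator, 16, 16, 16, float> fc;
        |         wmma::fill_fragment(fc, 0.0f);
        |         wmma::load_matrix_sync(fa, a, 16);
        |         wmma::load_matrix_sync(fb, b, 16);
        |         wmma::mma_sync(fc, fa, fb, fc);
        |         wmma::store_matrix_sync(c, fc, 16, wmma::mem_row_major);
        |     }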
        
         | wolletd wrote:
          | Just to be fair: as also stated at the beginning of the post,
          | CPUs are far more complex nowadays than they present
          | themselves to the outside.
         | 
          | They don't work like a von Neumann machine internally; they
          | just act like one. Granted, we know a lot more about the
          | inner workings of modern CPUs than GPUs, but a lot of
          | real-life work still assumes that the CPU works "as-if" it
          | were a computer from the 70s, just really, really fast.
        
         | dist-epoch wrote:
          | GPU instruction sets change with every year/generation.
         | 
          | Your software would not run on next year's hardware if you
          | directly targeted the instruction set.
         | 
         | NVIDIA does document their PTX instruction set (a level above
         | what the hardware actually runs):
         | 
         | https://docs.nvidia.com/cuda/parallel-thread-execution/index...
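          | 
          | A minimal sketch of that layering (the PTX shown is
          | approximate, not exact compiler output; kernel name is
          | arbitrary):
          | 
          |     // CUDA C source
          |     __global__ void add1(float *x) {
          |         int i = blockIdx.x * blockDim.x + threadIdx.x;
          |         x[i] += 1.0f;
          |     }
          | 
          |     // Roughly the PTX that "nvcc -ptx" emits for the body:
          |     //   mov.u32       %r1, %ctaid.x;
          |     //   mov.u32       %r2, %ntid.x;
          |     //   mov.u32       %r3, %tid.x;
          |     //   mad.lo.s32    %r4, %r1, %r2, %r3;
          |     //   mul.wide.s32  %rd3, %r4, 4;
          |     //   add.s64       %rd4, %rd2, %rd3;
          |     //   ld.global.f32 %f1, [%rd4];
          |     //   add.f32       %f2, %f1, 0f3F800000;  // += 1.0f
          |     //   st.global.f32 [%rd4], %f2;
          |     //
          |     // The per-generation machine code (SASS) sits one more
          |     // level below this; PTX is the stable, documented layer.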
        
           | Xeamek wrote:
           | ...But literally the exact same is the case for CPUs and yet
           | we do have public and constant instruction set for decades
           | now?
           | 
           | Altough ofcourse CPU's instruction are also just a frontend
           | api that behind the scenes is implemented using microcode,
           | which probably is much less stable.
           | 
           | But the point is, if we could move one level 'closer' on
           | gpus, just like we have it on cpus, it would stop the big
           | buisness gate-keeping that exsists when it comes to current
           | day GPU apis/libraries
        
             | dist-epoch wrote:
              | That's not true: you can run a 30-year-old binary on
              | modern CPUs. The machine code you feed to the CPU hasn't
              | changed much.
              | 
              | That's not true for GPUs, where the machine code changes
              | very frequently. You feed your "binary" (PTX, ...) to the
              | driver, and the driver compiles it to the actual machine
              | code of your actual GPU.
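              | 
              | A minimal sketch of that flow with the CUDA driver API
              | (error handling omitted, PTX string illustrative): the
              | same PTX text is handed to the driver, which JIT-compiles
              | it to the machine code of whatever GPU is installed.
              | 
              |     #include <cuda.h>
              | 
              |     // Trivial hand-written PTX kernel (illustrative).
              |     static const char *ptx =
              |         ".version 7.0\n"
              |         ".target sm_50\n"
              |         ".address_size 64\n"
              |         ".visible .entry kern() { ret; }\n";
              | 
              |     int main(void) {
              |         CUdevice dev; CUcontext ctx;
              |         CUmodule mod; CUfunction fn;
              |         cuInit(0);
              |         cuDeviceGet(&dev, 0);
              |         cuCtxCreate(&ctx, 0, dev);
              |         // The driver compiles the PTX here, targeting
              |         // the installed device's own (undocumented) ISA.
              |         cuModuleLoadData(&mod, ptx);
              |         cuModuleGetFunction(&fn, mod, "kern");
              |         cuLaunchKernel(fn, 1, 1, 1, 1, 1, 1, 0, 0,
              |                        NULL, NULL);
              |         cuCtxSynchronize();
              |         return 0;
              |     }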
        
       ___________________________________________________________________
       (page generated 2024-02-06 23:00 UTC)