[HN Gopher] Demystifying NPUs: Questions and Answers
       ___________________________________________________________________
        
       Demystifying NPUs: Questions and Answers
        
       Author : rbanffy
       Score  : 49 points
       Date   : 2024-06-10 10:37 UTC (1 day ago)
        
 (HTM) web link (thechipletter.substack.com)
 (TXT) w3m dump (thechipletter.substack.com)
        
       | patrickthebold wrote:
        | Skimmed about two-thirds of this. It's unclear to me whether
        | code needs to be specially written to run on the NPU.
       | 
        | Normally I think of something like CUDA to get code running on a
        | GPU. Can code targeting a CPU or GPU automatically be sped up by
        | running on an NPU? Or does code need to explicitly target the
        | NPU?
        
         | lawlessone wrote:
          | Agreed, I don't understand the difference either. Are the
          | calculations an NPU is capable of doing different from those a
          | GPU can do?
         | 
         | Are they not basically identical hardware?
        
           | klelatti wrote:
            | NPUs are basically specialized for matrix multiplication;
            | GPUs are built for more general parallel operations on
            | multiple data (Single Instruction, Multiple Thread), although
            | modern GPUs also contain matrix multiplication units.
            | 
            | There may be a degree of software compatibility at the
            | highest level, e.g. PyTorch, but the underlying software will
            | be very different.
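            | 
            | For instance, a minimal PyTorch sketch (device names are
            | illustrative; "mps" is Apple's GPU backend, and NPU support
            | usually arrives as a vendor-specific backend or delegate):
            | 
            |     import torch
            | 
            |     # Pick whichever accelerator this build of PyTorch
            |     # exposes; the model-level code is unchanged.
            |     if torch.cuda.is_available():
            |         device = torch.device("cuda")
            |     elif torch.backends.mps.is_available():
            |         device = torch.device("mps")
            |     else:
            |         device = torch.device("cpu")
            | 
            |     x = torch.randn(8, 512, device=device)
            |     w = torch.randn(512, 512, device=device)
            |     y = x @ w  # same high-level op on every backend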
        
           | adrian_b wrote:
            | An NPU can do only a very small subset of the operations
            | supported by a GPU.
            | 
            | An NPU does strictly only the operations required for ML
            | inference, which use low-precision data types, i.e. 16-bit or
            | 8-bit types.
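            | 
            | As a rough sketch of that low-precision flavour of compute
            | (PyTorch dynamic quantization, running on the CPU here, but
            | the int8 types match what NPUs execute natively):
            | 
            |     import torch
            |     import torch.nn as nn
            | 
            |     model = nn.Sequential(
            |         nn.Linear(256, 256), nn.ReLU())
            | 
            |     # Weights are stored as int8 and the matmuls run in
            |     # low precision, the operations NPUs are built around.
            |     qmodel = torch.ao.quantization.quantize_dynamic(
            |         model, {nn.Linear}, dtype=torch.qint8)
            | 
            |     y = qmodel(torch.randn(1, 256))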
        
           | foobiekr wrote:
           | Different optimization choices.
        
           | xcv123 wrote:
           | By the same reasoning, a CPU is no different to a GPU. They
           | can both do matrix calculations.
           | 
           | A GPU is optimised for 3D rendering (and is useful for
           | parallel computations in general). An NPU is optimised for
           | neural network inferencing. These algorithms both involve
           | matrix mathematics but they are not the same. The NPU
           | hardware design matches the deep neural network inferencing
           | algorithm. For example it has an "Activation Function" block
           | dedicated to computing the activation function between neural
           | network layers. It is optimised and specialised for one very
            | specific algorithm: inferencing. A GPU would beat an NPU at
            | training, and at any other parallel computation besides
            | inferencing.
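            | 
            | A toy numpy sketch of that fixed pipeline (a
            | multiply-accumulate array feeding a dedicated activation
            | block, layer by layer):
            | 
            |     import numpy as np
            | 
            |     def relu(x):
            |         return np.maximum(x, 0.0)
            | 
            |     # One NPU-style layer: MAC array, then the
            |     # hard-wired activation-function block.
            |     def npu_like_layer(x, w, b):
            |         return relu(x @ w + b)
            | 
            |     x = np.random.randn(1, 256).astype(np.float16)
            |     w = np.random.randn(256, 256).astype(np.float16)
            |     b = np.zeros(256, dtype=np.float16)
            |     y = npu_like_layer(x, w, b)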
        
             | TheDudeMan wrote:
             | High-end Nvidia GPUs are not optimized for 3D rendering;
              | they are optimized for machine-learning training and
              | inference.
        
         | eterps wrote:
          | I'm also wondering whether such an NPU can be targeted (in a
          | way that can be understood) from the assembly/machine-language
          | level, or whether it needs an opaque kitchen sink of libraries,
          | blobs and other abstractions.
        
           | klelatti wrote:
           | Opaque blobs I believe in almost all cases!
        
           | qludes wrote:
            | I believe it's the latter: each NPU vendor has its own
            | software stack. Take a look at Tomeu Vizoso's work:
           | 
           | https://blog.tomeuvizoso.net/search/label/npu
        
           | wmf wrote:
            | You can write assembly for NPUs, although the instruction set
            | may be quite janky compared to a CPU's. Once you've written
            | NPU code you need some libraries to load it, but that's not
            | particularly different from the fact that CPUs now need
            | massive firmware to boot.
            | 
            | Back in reality, that's not how any vendor intends their NPUs
            | to be used. They provide high-level libraries that implement
            | ONNX, CoreML, DirectML, or whatever, and they expect you to
            | just use those.
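            | 
            | For example, with ONNX Runtime a vendor NPU shows up as an
            | "execution provider". The model file and provider names below
            | are placeholders; what's actually available depends on the
            | onnxruntime build (QNNExecutionProvider targets Qualcomm
            | NPUs, with the CPU as the fallback):
            | 
            |     import numpy as np
            |     import onnxruntime as ort
            | 
            |     # Ask for the NPU provider, fall back to the CPU.
            |     sess = ort.InferenceSession(
            |         "model.onnx",
            |         providers=["QNNExecutionProvider",
            |                    "CPUExecutionProvider"])
            | 
            |     x = np.zeros((1, 3, 224, 224), np.float32)
            |     outputs = sess.run(
            |         None, {sess.get_inputs()[0].name: x})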
        
         | cherioo wrote:
          | My understanding is that the low-level APIs are different: CUDA
          | vs. whatever Android provides vs. whatever Apple provides.
          | However, higher abstractions like PyTorch may be able to target
          | different platforms with fewer code changes.
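          | 
          | The Apple path, sketched with coremltools (assuming the usual
          | trace-and-convert flow; Core ML then decides at runtime
          | whether the model runs on the CPU, GPU, or Neural Engine):
          | 
          |     import torch
          |     import coremltools as ct
          | 
          |     model = torch.nn.Linear(16, 4).eval()
          |     traced = torch.jit.trace(model, torch.randn(1, 16))
          | 
          |     # One conversion; scheduling across CPU / GPU / ANE
          |     # is handled by the Core ML runtime.
          |     mlmodel = ct.convert(
          |         traced,
          |         inputs=[ct.TensorType(shape=(1, 16))],
          |         convert_to="mlprogram")
          |     mlmodel.save("linear.mlpackage")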
        
       ___________________________________________________________________
       (page generated 2024-06-11 23:01 UTC)