[HN Gopher] Demystifying NPUs: Questions and Answers
___________________________________________________________________
Demystifying NPUs: Questions and Answers
Author : rbanffy
Score : 49 points
Date : 2024-06-10 10:37 UTC (1 days ago)
(HTM) web link (thechipletter.substack.com)
(TXT) w3m dump (thechipletter.substack.com)
| patrickthebold wrote:
| Skimmed about two-thirds of this. It's unclear to me whether
| code needs to be specially written to run on the NPU.
|
| Normally I think of something like CUDA to get code running on
| a GPU. Can code targeting a CPU or GPU automatically be sped up
| by running on an NPU? Or does code explicitly need to target
| the NPU?
| lawlessone wrote:
| I agree, I don't understand the difference either. Are the
| calculations an NPU is capable of doing different from those a
| GPU can do?
|
| Are they not basically identical hardware?
| klelatti wrote:
| NPUs are basically specialized for matrix multiplication;
| GPUs are built for more general parallel operations on
| multiple data (Single Instruction, Multiple Thread), although
| modern GPUs also contain matrix multiplication units.
|
| There may be a degree of software compatibility at the
| highest level - e.g. PyTorch - but the underlying software
| will be very different.
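|
| As a rough illustration of "compatibility at the PyTorch
| level": the model code below stays the same whatever runs it,
| and only the backend selection changes. "mps" is Apple's GPU
| backend, used here purely as an example of swapping backends;
| actual NPU paths usually go through a vendor export/compile
| step instead.
|
|     import torch
|     import torch.nn as nn
|
|     # Same model definition regardless of what executes it.
|     model = nn.Sequential(nn.Linear(256, 128), nn.ReLU(),
|                           nn.Linear(128, 10))
|     model.eval()
|
|     # The backend choice is the vendor-specific part; "mps" is
|     # just one example of a non-CPU backend.
|     device = "mps" if torch.backends.mps.is_available() else "cpu"
|     model = model.to(device)
|
|     x = torch.randn(1, 256, device=device)
|     with torch.no_grad():
|         y = model(x)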
| adrian_b wrote:
| An NPU can do only a very small subset of the operations
| supported by a GPU.
|
| An NPU does strictly only the operations required for ML
| inference, which use low-precision data types, i.e. 16-bit or
| 8-bit types.
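|
| To make the "low precision" point concrete, here is a toy
| sketch in plain NumPy (not any vendor's NPU API) of the kind
| of arithmetic meant: weights and activations quantized to
| int8, multiply-accumulate in a wider integer type, then
| rescaled back to float. The scales are chosen arbitrarily.
|
|     import numpy as np
|
|     def quantize(x, scale):
|         # Round to the nearest int8 step and clamp to its range.
|         return np.clip(np.round(x / scale), -128, 127).astype(np.int8)
|
|     rng = np.random.default_rng(0)
|     w_fp32 = rng.standard_normal((128, 256)).astype(np.float32)
|     x_fp32 = rng.standard_normal(256).astype(np.float32)
|
|     w_scale, x_scale = 0.05, 0.05   # per-tensor scales (arbitrary)
|     w_q = quantize(w_fp32, w_scale)
|     x_q = quantize(x_fp32, x_scale)
|
|     # Multiply-accumulate in int32, then dequantize to float.
|     acc = w_q.astype(np.int32) @ x_q.astype(np.int32)
|     y_approx = acc.astype(np.float32) * (w_scale * x_scale)
|
|     # Compare against the full-precision result.
|     print(np.max(np.abs(y_approx - w_fp32 @ x_fp32)))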
| foobiekr wrote:
| Different optimization choices.
| xcv123 wrote:
| By the same reasoning, a CPU is no different to a GPU. They
| can both do matrix calculations.
|
| A GPU is optimised for 3D rendering (and is useful for
| parallel computations in general). An NPU is optimised for
| neural network inferencing. These algorithms both involve
| matrix mathematics but they are not the same. The NPU
| hardware design matches the deep neural network inferencing
| algorithm. For example, it has an "Activation Function" block
| dedicated to computing the activation function between neural
| network layers. It is optimised and specialised for one very
| specific algorithm: inferencing. A GPU would beat an NPU for
| training and for any other parallel computation besides
| inferencing.
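|
| A minimal sketch (plain NumPy, purely illustrative) of the
| loop that structure mirrors: each layer is a matrix multiply,
| which is what the matrix unit handles, followed by an
| activation, which is what a dedicated activation block would
| apply.
|
|     import numpy as np
|
|     def relu(x):
|         return np.maximum(x, 0.0)
|
|     rng = np.random.default_rng(1)
|     # Two small layers: (weights, bias) pairs.
|     layers = [(rng.standard_normal((64, 32)), rng.standard_normal(32)),
|               (rng.standard_normal((32, 10)), rng.standard_normal(10))]
|
|     x = rng.standard_normal(64)
|     for w, b in layers:
|         # Matmul (matrix unit) then activation (activation block).
|         x = relu(x @ w + b)
|     print(x.shape)  # (10,)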
| TheDudeMan wrote:
| High-end Nvidia GPUs are not optimized for 3D rendering;
| they are optimized for machine learning and inference.
| eterps wrote:
| I'm also wondering whether such an NPU can be targeted (in a
| way that can be understood) from the assembly/machine language
| level, or whether it needs an opaque kitchen sink of
| libraries, blobs and other abstractions.
| klelatti wrote:
| Opaque blobs I believe in almost all cases!
| qludes wrote:
| I believe it's the latter: each NPU vendor has their own
| software stack. Take a look at Tomeu Vizoso's work:
|
| https://blog.tomeuvizoso.net/search/label/npu
| wmf wrote:
| You can write assembly for NPUs, although the instruction set
| may be quite janky compared to a CPU's. Once you've written
| NPU code you need some libraries to load it, but that's not
| particularly different from the fact that CPUs now need
| massive firmware to boot.
|
| Back in reality, that's not how any vendor intends their NPUs
| to be used. They provide high-level libraries that implement
| ONNX, CoreML, DirectML, or whatever, and they expect you to
| just use those.
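|
| For a feel of what "just use those" looks like, here is a
| hedged sketch with ONNX Runtime: hand the runtime a model and
| a preference-ordered list of execution providers and let it
| dispatch. Which providers exist (DirectML, CoreML, QNN, ...)
| depends on the platform and on how onnxruntime was built;
| "model.onnx" and the input shape are placeholders.
|
|     import numpy as np
|     import onnxruntime as ort
|
|     # Keep only the providers this build actually offers.
|     preferred = ["DmlExecutionProvider", "CoreMLExecutionProvider",
|                  "CPUExecutionProvider"]
|     providers = [p for p in preferred
|                  if p in ort.get_available_providers()]
|
|     # "model.onnx" is a placeholder path to an exported model.
|     session = ort.InferenceSession("model.onnx", providers=providers)
|
|     input_name = session.get_inputs()[0].name
|     x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # assumed shape
|     outputs = session.run(None, {input_name: x})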
| cherioo wrote:
| My understanding is that the low-level APIs are different:
| CUDA vs. whatever Android provides vs. whatever Apple
| provides. However, a higher abstraction like PyTorch may be
| able to target different platforms with fewer code changes.
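|
| The usual shape of that workflow, sketched below: define the
| model once at the framework level, then export it to a
| portable graph that each vendor's low-level toolchain can
| compile for its own hardware. torch.onnx.export is one
| concrete route; "model.onnx" is a placeholder file name.
|
|     import torch
|     import torch.nn as nn
|
|     # Define once at the framework level...
|     model = nn.Sequential(nn.Linear(32, 16), nn.ReLU(),
|                           nn.Linear(16, 4))
|     model.eval()
|
|     # ...then export a portable graph for vendor toolchains to
|     # consume ("model.onnx" is a placeholder).
|     example_input = torch.randn(1, 32)
|     torch.onnx.export(model, example_input, "model.onnx")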
___________________________________________________________________
(page generated 2024-06-11 23:01 UTC)