[HN Gopher] Experiments with Byte Matrix Multiplication
       ___________________________________________________________________
        
       Experiments with Byte Matrix Multiplication
        
       Author : serge-ss-paille
       Score  : 27 points
       Date   : 2025-01-10 15:36 UTC (7 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | gok wrote:
       | Curious how this compares with, say, the implementation of
       | gemm_s8s8s32 in Intel's MKL / OneAPI.
        
       | dkhudia wrote:
       | > It's quite common in machine learning operations to multiply a
       | matrix of unsigned byte by a matrix of signed byte. Don't ask me
       | why, but that's the case.
       | 
        | Overflow is the reason. Intel's vpmaddubsw takes uint8_t and
        | int8_t inputs and produces int16_t results. If both inputs were
        | unsigned, 255 * 255 = 65025 would be out of range for int16_t
        | (-32,768 to +32,767), which is likely why the instruction is
        | designed to take one uint8_t and one int8_t operand. With one
        | signed and one unsigned input, the extremes -128 * 255 and
        | 127 * 255 always fit in the int16_t range. Overflow (or rather
        | saturation, with this instruction) can still occur, because it
        | sums adjacent multiplications. See my comment in PyTorch:
       | https://github.com/pytorch/pytorch/blob/a37db5ae3978010e1bb7...
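        | 
        | A minimal scalar sketch of one 16-bit lane of vpmaddubsw (a
        | reference model for illustration, not the hardware itself):
        | each u8 * s8 product fits in int16_t on its own, but the sum of
        | two adjacent products can still saturate.
        | 
        |     #include <algorithm>
        |     #include <cstdint>
        |     #include <cstdio>
        | 
        |     // One 16-bit lane of vpmaddubsw: multiply an unsigned byte
        |     // by a signed byte, do the same for the adjacent pair, and
        |     // add the two products with signed 16-bit saturation.
        |     int16_t maddubs_lane(uint8_t a0, int8_t b0,
        |                          uint8_t a1, int8_t b1) {
        |         int32_t sum = int32_t(a0) * b0 + int32_t(a1) * b1;
        |         sum = std::min<int32_t>(
        |             std::max<int32_t>(sum, INT16_MIN), INT16_MAX);
        |         return int16_t(sum);
        |     }
        | 
        |     int main() {
        |         // A single product always fits: 255 * 127 = 32385.
        |         printf("%d\n", maddubs_lane(255, 127, 0, 0));
        |         // The adjacent-pair sum can saturate:
        |         // 64770 -> 32767, -65280 -> -32768.
        |         printf("%d\n", maddubs_lane(255, 127, 255, 127));
        |         printf("%d\n", maddubs_lane(255, -128, 255, -128));
        |     }
        | 
        | The real instruction applies this to every adjacent byte pair
        | in its vector registers (16 such lanes in the 256-bit form).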
        
         | atq2119 wrote:
         | This doesn't feel like a convincing argument. If you wanted to
         | multiply uint8 * uint8, you'd naturally use an unsigned
         | multiply with a uint16 result. That doesn't overflow either.
         | 
          | I believe a better argument is to appeal to the structure of
          | neural networks. Activation inputs to a matrix multiply come
          | out of a non-linear function, and ReLU is a popular choice
          | that makes the activations non-negative, so they can be
          | stored as unsigned. Weights then need to be signed so that
          | the matrix multiplication can have negative outputs; without
          | negative outputs, you would lose the non-linearity of ReLU.
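          | 
          | A toy sketch of that shape (an illustration, not the
          | article's kernel): unsigned post-ReLU activations, signed
          | weights, and a wider signed accumulator so the output can go
          | negative.
          | 
          |     #include <cstdint>
          |     #include <cstdio>
          | 
          |     // u8 x s8 inner product: activations are non-negative
          |     // (post-ReLU), weights are signed, accumulation happens
          |     // in a wider signed type so results can be negative.
          |     int32_t dot_u8s8(const uint8_t* act, const int8_t* w,
          |                      int n) {
          |         int32_t acc = 0;
          |         for (int i = 0; i < n; ++i)
          |             acc += int32_t(act[i]) * int32_t(w[i]);
          |         return acc;
          |     }
          | 
          |     int main() {
          |         uint8_t act[4] = {0, 12, 0, 200}; // ReLU output, >= 0
          |         int8_t  w[4]   = {-3, 5, 7, -2};  // signed weights
          |         // 0*-3 + 12*5 + 0*7 + 200*-2 = -340
          |         printf("%d\n", dot_u8s8(act, w, 4));
          |     }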
        
       ___________________________________________________________________
       (page generated 2025-01-10 23:01 UTC)