_______               __                   _______
       |   |   |.---.-..----.|  |--..-----..----. |    |  |.-----..--.--.--..-----.
       |       ||  _  ||  __||    < |  -__||   _| |       ||  -__||  |  |  ||__ --|
       |___|___||___._||____||__|__||_____||__|   |__|____||_____||________||_____|
                                                             on Gopher (unofficial)
 (HTM) Visit Hacker News on the Web
       
       
       COMMENT PAGE FOR:
 (DIR)   Show HN: GPULlama3.java Llama Compiled to PTX/OpenCL Now Integrated in Quarkus
       
       
        lostmsu wrote 11 hours 3 min ago:
        Does it support flash attention? Use tensor cores? Can I write custom
        kernels?
        
         UPD: I found no evidence that it supports tensor cores, so it's
         going to be many times slower than implementations that do.
       
          mikepapadim wrote 4 hours 27 min ago:
           Yes, when you use the PTX backend it supports Tensor Cores. It
           also has an implementation of flash attention. You can also
           write your own kernels; have a look here (a minimal sketch
           follows the links):
          
 (HTM)    [1]: https://github.com/beehive-lab/GPULlama3.java/blob/main/src/...
 (HTM)    [2]: https://github.com/beehive-lab/GPULlama3.java/blob/main/src/...
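        
           A custom kernel in TornadoVM is plain Java: a loop annotated
           with @Parallel, wired into a TaskGraph that moves data to the
           device and JIT-compiles the method to PTX or OpenCL at run
           time. A minimal vector-add sketch, assuming TornadoVM's
           current TaskGraph/FloatArray API (package and type names may
           differ between releases):
        
             // Sketch based on TornadoVM's documented API; verify
             // against the version you actually run.
             import uk.ac.manchester.tornado.api.ImmutableTaskGraph;
             import uk.ac.manchester.tornado.api.TaskGraph;
             import uk.ac.manchester.tornado.api.TornadoExecutionPlan;
             import uk.ac.manchester.tornado.api.annotations.Parallel;
             import uk.ac.manchester.tornado.api.enums.DataTransferMode;
             import uk.ac.manchester.tornado.api.types.arrays.FloatArray;
             
             public class VectorAdd {
             
                 // Plain Java kernel: TornadoVM parallelizes the
                 // @Parallel loop and JIT-compiles it for the GPU.
                 public static void add(FloatArray a, FloatArray b,
                         FloatArray c) {
                     for (@Parallel int i = 0; i < c.getSize(); i++) {
                         c.set(i, a.get(i) + b.get(i));
                     }
                 }
             
                 public static void main(String[] args) {
                     int n = 1024;
                     FloatArray a = new FloatArray(n);
                     FloatArray b = new FloatArray(n);
                     FloatArray c = new FloatArray(n);
                     a.init(1f);
                     b.init(2f);
             
                     // Task graph: copy inputs to the device once, run
                     // the kernel, copy the result back on every run.
                     TaskGraph tg = new TaskGraph("s0")
                         .transferToDevice(DataTransferMode.FIRST_EXECUTION, a, b)
                         .task("t0", VectorAdd::add, a, b, c)
                         .transferToHost(DataTransferMode.EVERY_EXECUTION, c);
             
                     ImmutableTaskGraph itg = tg.snapshot();
                     TornadoExecutionPlan plan = new TornadoExecutionPlan(itg);
                     plan.execute();
             
                     System.out.println("c[0] = " + c.get(0)); // 3.0
                 }
             }
        
           For hand-written kernels with explicit thread indexing,
           TornadoVM also exposes a lower-level KernelContext API.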
       
            lostmsu wrote 2 hours 47 min ago:
             The TornadoVM GitHub repo has no mention of tensor cores or
             WMMA instructions. The only reference to tensor cores is a
             2024 discussion, which states they are not used:
            
 (HTM)      [1]: https://github.com/beehive-lab/TornadoVM/discussions/393
       
        mikepapadim wrote 21 hours 0 min ago:
        
        
 (HTM)  [1]: https://github.com/beehive-lab/GPULlama3.java
       
       
 (DIR) <- back to front page