_______ __ _______
| | |.---.-..----.| |--..-----..----. | | |.-----..--.--.--..-----.
| || _ || __|| < | -__|| _| | || -__|| | | ||__ --|
|___|___||___._||____||__|__||_____||__| |__|____||_____||________||_____|
on Gopher (unofficial)
COMMENT PAGE FOR:
(DIR) Show HN: GPULlama3.java Llama Compiled to PTX/OpenCL Now Integrated in Quarkus
lostmsu wrote 11 hours 3 min ago:
Does it support flash attention? Use tensor cores? Can I write custom
kernels?
UPD. found no evidence that it supports tensor cores, so it's going to
be many times slower than implementations that do.
mikepapadim wrote 4 hours 27 min ago:
Yes, when you use the PTX backend it supports Tensor Cores. It also has
an implementation of flash attention. You can also write your own
kernels; have a look here:
(HTM) [1]: https://github.com/beehive-lab/GPULlama3.java/blob/main/src/...
(HTM) [2]: https://github.com/beehive-lab/GPULlama3.java/blob/main/src/...
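For context on what "write your own kernels" means here: TornadoVM kernels are ordinary Java methods whose data-parallel loops are marked for GPU compilation. The sketch below uses hypothetical names (not taken from the repository) and runs as plain Java so the shape of a kernel is visible; the TornadoVM-specific pieces (the `@Parallel` annotation and `TaskGraph` registration) are noted in comments rather than imported.

```java
// Sketch of the kernel style TornadoVM compiles to OpenCL/PTX.
// Assumption: names (KernelSketch, vectorAdd) are illustrative, not from
// GPULlama3.java. With TornadoVM on the classpath, the loop index would be
// annotated @Parallel and the method registered in a TaskGraph, e.g.:
//   new TaskGraph("s0").task("t0", KernelSketch::vectorAdd, a, b, c)
public class KernelSketch {

    // Element-wise vector add: the canonical data-parallel loop that a
    // TornadoVM backend would lower to a GPU kernel.
    static void vectorAdd(float[] a, float[] b, float[] c) {
        for (int i = 0; i < c.length; i++) { // TornadoVM: for (@Parallel int i = ...)
            c[i] = a[i] + b[i];
        }
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f};
        float[] b = {4f, 5f, 6f};
        float[] c = new float[3];
        vectorAdd(a, b, c);
        System.out.println(java.util.Arrays.toString(c)); // [5.0, 7.0, 9.0]
    }
}
```

The key design point is that the kernel is plain Java with no GPU-specific types, so the same method also runs (sequentially) on the JVM when no accelerator is present.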
lostmsu wrote 2 hours 47 min ago:
The TornadoVM GitHub has no mention of tensor cores or WMMA
instructions. The only mention of tensor cores is from 2024 and
states that they are not used:
(HTM) [1]: https://github.com/beehive-lab/TornadoVM/discussions/393
mikepapadim wrote 21 hours 0 min ago:
(HTM) [1]: https://github.com/beehive-lab/GPULlama3.java