[HN Gopher] ThunderKittens: Simple, fast, and adorable AI kernels
___________________________________________________________________
ThunderKittens: Simple, fast, and adorable AI kernels
Author : lnyan
Score : 65 points
Date : 2024-10-30 14:48 UTC (8 hours ago)
(HTM) web link (hazyresearch.stanford.edu)
(TXT) w3m dump (hazyresearch.stanford.edu)
| mynameismon wrote:
| How easy is it to run on older GPUs (think 1080Tis)? I ask
| because torch.compile refuses to support them, and that alone
| makes things much slower.
| almostgotcaught wrote:
| > torch.compile
|
| torch.compile is a pt2.0 feature and has nothing to do with
| handwritten cuda kernels
|
| > How easy is it to run on older GPUs
|
| this is a torch cpp extension
|
| https://github.com/HazyResearch/ThunderKittens/blob/8daffc9c...
|
| so you're going to have the same exact issue (whatever issue
| you're having)
| danielhanchen wrote:
| The other issue is Pascal cards don't have tensor cores, so
| they're much slower than those with them. You could try Unsloth
| for 2x faster llama fine-tuning - someone made P40s and P100s
| work. Although I would suggest upgrading to at least RTX 20x
| series.
| danielhanchen wrote:
| This is super cool! Especially matrix mult getting similar or
| better perf than cuBLAS! If anyone is interested in other kernels
| like swiglu, geglu, RMS layernorm, I coded some at
| https://github.com/unslothai/unsloth/tree/main/unsloth/kerne...
| pama wrote:
| I don't want to use the Platform Formerly Known as Twitter, but
| does anyone have a way to get the link to their livestream
| tomorrow?
| convexstrictly wrote:
| Simran Arora: "Join us for a _livestream this Thursday,
| Halloween/Diwali_, and join our channel on the _GPU Mode
| Discord server_ to hang out with us/get involved:"
|
| https://discord.com/login?redirect_to=%2Fchannels%2F11894982...
| convexstrictly wrote:
| CUDA + ThunderKittens 4.5 hour tutorial
|
| https://www.youtube.com/watch?v=xcpEl0cGCC4
| Archit3ch wrote:
| I hate to be that guy, but Metal support?
___________________________________________________________________
(page generated 2024-10-30 23:01 UTC)