[HN Gopher] ThunderKittens: Simple, fast, and adorable AI kernels
       ___________________________________________________________________
        
       ThunderKittens: Simple, fast, and adorable AI kernels
        
       Author : lnyan
       Score  : 65 points
       Date   : 2024-10-30 14:48 UTC (8 hours ago)
        
 (HTM) web link (hazyresearch.stanford.edu)
 (TXT) w3m dump (hazyresearch.stanford.edu)
        
       | mynameismon wrote:
       | How easy is it to run on older GPUs (think 1080 Tis)? I ask
       | because torch.compile refuses to support them, and that alone
       | makes things much slower.
        
         | almostgotcaught wrote:
         | > torch.compile
         | 
         | torch.compile is a pt2.0 feature and has nothing to do with
         | handwritten cuda kernels
         | 
         | > How easy is it to run on older GPUs
         | 
         | this is a torch cpp extension
         | 
         | https://github.com/HazyResearch/ThunderKittens/blob/8daffc9c...
         | 
         | so you're going to have the same exact issue (whatever issue
         | you're having)
        
         | danielhanchen wrote:
         | The other issue is Pascal cards don't have tensor cores, so
         | they're much slower than those with them. You could try Unsloth
         | for 2x faster llama fine-tuning - someone made P40s and P100s
         | work. Although I would suggest upgrading to at least RTX 20x
         | series.
        
       | danielhanchen wrote:
       | This is super cool! Especially matrix mult getting similar or
       | better perf than cuBLAS! If anyone is interested in other kernels
       | like swiglu, geglu, RMS layernorm, I coded some at
       | https://github.com/unslothai/unsloth/tree/main/unsloth/kerne...
        
       | pama wrote:
       | I don't want to use the Platform Formerly Known as Twitter, but
       | does anyone have a way to get the link to their livestream
       | tomorrow?
        
         | convexstrictly wrote:
         | Simran Arora: "Join us for a _livestream this Thursday,
         | Halloween/Diwali_, and join our channel on the _GPU Mode
         | Discord server_ to hang out with us/get involved:"
         | 
         | https://discord.com/login?redirect_to=%2Fchannels%2F11894982...
        
       | convexstrictly wrote:
       | CUDA + ThunderKittens 4.5 hour tutorial
       | 
       | https://www.youtube.com/watch?v=xcpEl0cGCC4
        
       | Archit3ch wrote:
       | I hate to be that guy, but Metal support?
        
       ___________________________________________________________________
       (page generated 2024-10-30 23:01 UTC)