[HN Gopher] FlashAttention-2, 2x faster than FlashAttention
___________________________________________________________________
FlashAttention-2, 2x faster than FlashAttention
Author : machdiamonds
Score : 58 points
Date : 2023-07-17 18:21 UTC (4 hours ago)
(HTM) web link (twitter.com)
(TXT) w3m dump (twitter.com)
| whimsicalism wrote:
| Does anyone have resources for a good way to get started with
| this sort of modern GPU systems work?
| luckyt wrote:
| I found it helpful to start with CUDA via numba, since it lets
| you write GPU kernels in python. Assuming you're like most ML
| engineers and you're more familiar with python than C++, this
| lets you learn CUDA concepts without also having to learn C++
| at the same time. There's also a set of GPU puzzles for
| beginners [1] for getting started with numba CUDA.
|
| [1] https://github.com/srush/GPU-Puzzles
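The appeal of the numba route is that a CUDA kernel is just a python function where each "thread" fills in one output element. Below is a plain-python emulation of that per-thread model (my own illustrative sketch, not code from the puzzles repo; the real numba version would decorate the kernel with @cuda.jit and launch it as add_kernel[blocks, threads](...)):

```python
# Plain-python emulation of the per-thread execution model that
# numba.cuda exposes: each "thread" computes one output element.

def add_kernel(tid, x, y, out):
    # Body of a CUDA-style kernel: tid plays the role of
    # cuda.grid(1), the thread's global index. The bounds check
    # is the standard guard for over-provisioned grids.
    if tid < len(out):
        out[tid] = x[tid] + y[tid]

def launch(kernel, n_threads, *args):
    # Emulate a 1-D grid launch by running every "thread" serially;
    # on a real GPU these bodies run concurrently.
    for tid in range(n_threads):
        kernel(tid, *args)

x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * 4
launch(add_kernel, 8, x, y, out)  # 8 threads for 4 elements: guard handles the excess
print(out)  # [11.0, 22.0, 33.0, 44.0]
```

Since every thread writes a distinct index, the serial loop and the parallel launch produce identical results, which is exactly the property the puzzles train you to reason about.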
| whimsicalism wrote:
| Thanks for the link! Sasha is actually my former professor -
| if this is anything like his past pytorch puzzles I'm sure
| I'll find it enjoyable.
| jahewson wrote:
| If you'd like a practical goal, you probably want to learn
| PyTorch and have a little background knowledge of the memory
| architecture of the GPUs. If you want to go deep, learn CUDA:
| https://docs.nvidia.com/cuda/cuda-c-programming-guide/index....
| whimsicalism wrote:
| Yes, I know pytorch well at this point and have basic memory
| architecture understanding. In the process of learning CUDA,
| but would love pointers for depth/intermediate things to
| explore.
| jahewson wrote:
| I found this talk helpful.
| https://on-demand.gputechconf.com/gtc/2017/presentation/s712...
|
| Have you tried the Visual Profiler yet?
| brrrrrm wrote:
| I'd start with the example of implementing the fastest
| reduction you possibly can. Pretty much all complexity in every
| kernel used in ML extends from this concept (reductions with
| addition).
|
| https://developer.download.nvidia.com/assets/cuda/files/redu...
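The core trick in those reduction slides is summing in a tree: pairs of elements a stride apart are added in parallel, halving the active work each step, so n elements reduce in O(log n) parallel steps. Here is an illustrative plain-python emulation of that access pattern (my own sketch, not the CUDA code from the slides):

```python
# Sketch of tree-style parallel reduction: at each step, elements
# `stride` apart are pairwise summed; on a GPU, every addition at a
# given stride would be done by a different thread in the same step.
def tree_reduce(values):
    data = list(values)
    n = len(data)
    stride = 1
    while stride < n:
        # All additions at this stride touch disjoint indices,
        # so they are independent and could run concurrently.
        for i in range(0, n, 2 * stride):
            if i + stride < n:
                data[i] += data[i + stride]
        stride *= 2
    return data[0]

print(tree_reduce(range(8)))  # 28, same as sum(range(8))
```

The real kernels in the slides then layer in shared memory, sequential addressing, and warp-level unrolling on top of this same pattern, which is why it's such a good first optimization exercise.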
| whimsicalism wrote:
| thank you for the suggestion - will take a look!
| ternaus wrote:
| I would be very grateful to see how one can leverage it not for
| LLMs but for Stable Diffusion models
| m00x wrote:
| Why couldn't it be applied to SD?
| m00x wrote:
| It looks like it's already a thing:
| https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob...
| lucidrains wrote:
| huge! thank you Tri!
| bufo wrote:
| Tri Dao and Tim Dettmers ftw
| [deleted]
___________________________________________________________________
(page generated 2023-07-17 23:01 UTC)