[HN Gopher] FlexGen: Running large language models on a single GPU
___________________________________________________________________
FlexGen: Running large language models on a single GPU
Author : behnamoh
Score : 126 points
Date : 2023-03-26 05:31 UTC (17 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| aliljet wrote:
| This is absolutely something I was running into with LLaMA. I'm
| curious whether this extends to that particular use case...
| stavros wrote:
| What's the currently best-performing LLM that one can run with
| this?
| zargon wrote:
| Previous discussion is here (266 comments):
| https://news.ycombinator.com/item?id=34869960
| kkielhofner wrote:
| What's really amazing about a lot of these recent projects is
| that they tend to provide benchmarks running on an Nvidia T4.
| They use these because they're relatively cheap from cloud
| providers and you can usually actually get them (as opposed to
| requesting and getting denied for an A100 or whatever).
|
| For those who aren't familiar with it, the T4 is a tiny power-
| and density-optimized GPU. I have its successor (the A2): total
| max TDP is 60 watts, single slot, slot-only power, and passively
| cooled.
|
| Depending on the workload, I observe it to be roughly 5-10x
| slower than a 3090, which means most people at home with a
| spare Nvidia gaming card (or whatever) will see results from
| these projects at a performance multiple of the benchmarks they
| provide.
|
| The one caveat is that the T4/A2 have 16GB of VRAM, which makes
| them more capable (albeit slower) than a "low end" desktop card
| like the 3070, which has only 8GB. But as HN readers know,
| there is incredible progress daily toward reducing the VRAM
| requirements of these models!
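|
| If you're curious how much usable VRAM your own card has, here
| is a minimal PyTorch sketch (assuming a CUDA build of PyTorch
| is installed):
|
|     import torch
|
|     # Report the name and total VRAM of the first visible GPU.
|     if torch.cuda.is_available():
|         props = torch.cuda.get_device_properties(0)
|         vram_gb = props.total_memory / 1024**3
|         print(f"{props.name}: {vram_gb:.1f} GB VRAM")
|     else:
|         print("No CUDA-capable GPU detected")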
| [deleted]
| lxe wrote:
| Best way to mess around with FlexGen and LLMs on local hardware
| in general is https://github.com/oobabooga/text-generation-webui
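|
| Under the hood the trick is weight offloading. This is not
| FlexGen's own API, but a minimal sketch of the same idea using
| Hugging Face Transformers with Accelerate's device_map="auto"
| (the model name is just an example; assumes transformers and
| accelerate are installed):
|
|     import torch
|     from transformers import AutoModelForCausalLM, AutoTokenizer
|
|     # device_map="auto" places as many layers as fit on the GPU
|     # and spills the rest to CPU RAM (and disk, if configured).
|     name = "facebook/opt-6.7b"  # example; pick what fits
|     tok = AutoTokenizer.from_pretrained(name)
|     model = AutoModelForCausalLM.from_pretrained(
|         name, device_map="auto", torch_dtype=torch.float16
|     )
|     inputs = tok("FlexGen lets you", return_tensors="pt")
|     inputs = inputs.to(model.device)
|     out = model.generate(**inputs, max_new_tokens=20)
|     print(tok.decode(out[0], skip_special_tokens=True))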
| boppo1 wrote:
| Can't get it to run on an AMD 6700 XT even though there are
| ROCm installation instructions. Tried to run LLaMA 7B but got
| hung up because bitsandbytes calls CUDA.
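|
| From what I can tell, ROCm builds of PyTorch expose a HIP
| version instead of a CUDA one, and bitsandbytes (as of early
| 2023) is CUDA-only, so 8-bit loading would have to be disabled.
| A quick check (a sketch, assuming PyTorch is installed):
|
|     import torch
|
|     # ROCm builds set torch.version.hip; CUDA builds leave it
|     # None.
|     is_rocm = getattr(torch.version, "hip", None) is not None
|     # bitsandbytes requires CUDA, so skip 8-bit weights on ROCm.
|     load_in_8bit = not is_rocm
|     print("ROCm build:", is_rocm, "->", "load_in_8bit =", load_in_8bit)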
| [deleted]
| [deleted]
| stuckinhell wrote:
| This is absolutely stunning work. Excited to try it out on my
| husband's homelab.
| ByThyGrace wrote:
| I'm just here waiting for the "LLM retard guide" to come out,
| as happened for Stable Diffusion last August.
| arthurcolle wrote:
| Link to stable diffusion ref you're referring to? I was able to
| run the model and everything, so pretty familiar, but just
| wondering if you're referring to a specific document! Haha
| neilv wrote:
| > _This project was made possible thanks to a collaboration with
| ... Yandex Research ..._
|
| I'm all for global cooperation and fellowship. Are sanctions
| going to be a barrier for this and related projects?
| blagie wrote:
| It depends on the work and the sanctions regime, but in
| general:
|
| - The best sanctions impact targeted industries (e.g. anything
| needed to build tanks, warplanes, etc.).
|
| - The worst sanctions impact communications and collaboration.
| Change comes from conversations. Media, non-military education,
| and non-military academic collaboration are usually bad targets
| for sanctions.
| lxe wrote:
| Bing says: I could not find any specific information on how
| these sanctions affect academic research and cooperation
| between the US and Russia. Some sources suggest that some
| academics have canceled conferences, joint projects, and
| funding with Russian institutions as a form of self-imposed
| sanctions, while others indicate that Russian students are
| still able to secure visas to study abroad. Therefore, the
| impact of the sanctions on academic research and cooperation
| may vary depending on the field, institution, and individual
| circumstances.
___________________________________________________________________
(page generated 2023-03-26 23:00 UTC)