[HN Gopher] Ask HN: Cheapest hardware to run Llama 2 70B
___________________________________________________________________
Ask HN: Cheapest hardware to run Llama 2 70B
I was wondering: if I were to buy the cheapest hardware (e.g. a PC)
to run Llama 2 70B for personal use at a reasonable speed, what
would that hardware be? Any experience or recommendations?
Author : danielEM
Score : 28 points
Date : 2023-08-09 20:28 UTC (2 hours ago)
| mromanuk wrote:
| one 4090 + 3090-Ti
|
| https://github.com/turboderp/exllama
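|
| For reference, here's roughly what that looks like with exllama's
| Python API (an untested sketch modeled on the repo's
| example_basic.py; the model path and the GPU-split values are
| placeholders):
|
|     # Run from inside a checkout of the exllama repo
|     import os, glob
|     from model import ExLlama, ExLlamaCache, ExLlamaConfig
|     from tokenizer import ExLlamaTokenizer
|     from generator import ExLlamaGenerator
|
|     # Directory holding a 4-bit GPTQ quant of Llama 2 70B
|     model_dir = "/models/llama-2-70b-gptq/"
|     config = ExLlamaConfig(os.path.join(model_dir, "config.json"))
|     config.model_path = glob.glob(
|         os.path.join(model_dir, "*.safetensors"))[0]
|     # Split layers across the two cards (GB per GPU, illustrative)
|     config.set_auto_map("23,23")
|
|     model = ExLlama(config)
|     tokenizer = ExLlamaTokenizer(
|         os.path.join(model_dir, "tokenizer.model"))
|     cache = ExLlamaCache(model)
|     generator = ExLlamaGenerator(model, tokenizer, cache)
|
|     print(generator.generate_simple(
|         "The cheapest way to run a 70B model at home is",
|         max_new_tokens=64))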
| amerine wrote:
| You would need at least an RTX A6000 for the 70B. You're looking
| at maybe $4k? Plus whatever you spend on the rest of the machine?
| Maybe $6k all-in?
| phas0ruk wrote:
| SageMaker
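|
| Roughly what a SageMaker deployment of Llama 2 70B looks like with
| the Hugging Face LLM (TGI) container. This is an untested sketch:
| the instance type, env values and token are placeholders, and you
| need access to the gated Llama 2 weights on Hugging Face.
|
|     import sagemaker
|     from sagemaker.huggingface import (
|         HuggingFaceModel,
|         get_huggingface_llm_image_uri,
|     )
|
|     role = sagemaker.get_execution_role()  # SageMaker execution role
|
|     model = HuggingFaceModel(
|         role=role,
|         image_uri=get_huggingface_llm_image_uri("huggingface"),
|         env={
|             "HF_MODEL_ID": "meta-llama/Llama-2-70b-chat-hf",
|             "HUGGING_FACE_HUB_TOKEN": "<token for the gated repo>",
|             "SM_NUM_GPUS": "8",        # shard across all GPUs
|             "MAX_INPUT_LENGTH": "2048",
|             "MAX_TOTAL_TOKENS": "4096",
|         },
|     )
|     predictor = model.deploy(
|         initial_instance_count=1,
|         # 8x A10G, 192 GB GPU RAM total; tight for fp16 70B, so a
|         # p4d instance or quantization may be the safer choice
|         instance_type="ml.g5.48xlarge",
|         container_startup_health_check_timeout=900,
|     )
|     print(predictor.predict({"inputs": "Hello, Llama!"}))
|
| Keep in mind the endpoint runs (and bills) around the clock until
| you delete it, which is where the hefty-bill warnings below come
| from.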
| jrflowers wrote:
| How much can I purchase one (1) SageMaker for?
| synaesthesisx wrote:
| Fair warning: tools like SageMaker are good for simple use cases,
| but SageMaker tends to abstract away a lot of functionality you
| might find yourself digging through the framework for. Not to
| mention, it's easy to rack up a hefty AWS bill.
| vasili111 wrote:
| Approximately how much would the price per hour be?
| astrodust wrote:
| Very yes.
| tuxpenguine wrote:
| I don't think it is the cheapest, but the tinybox is an option:
|
| https://tinygrad.org/
| spikedoanz wrote:
| If you have a lot of money (but not H100/A100 money), get 4090s,
| as they're currently the best bang for your buck on the CUDA side
| (according to George Hotz). If broke, get multiple second-hand
| 3090s. https://timdettmers.com/2023/01/30/which-gpu-for-deep-
| learni.... If unwilling to spend any money at all and just want
| to play around with Llama 2 70B, look into Petals:
| https://github.com/bigscience-workshop/petals
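|
| For reference, the Petals client side is only a few lines of
| Python (untested sketch based on the Petals README; the model name
| is the gated Llama 2 70B chat repo, so it assumes you've been
| granted access on Hugging Face):
|
|     from transformers import AutoTokenizer
|     from petals import AutoDistributedModelForCausalLM
|
|     model_name = "meta-llama/Llama-2-70b-chat-hf"
|     tokenizer = AutoTokenizer.from_pretrained(model_name)
|     # Transformer blocks run on volunteer servers in the public
|     # swarm; only the small input/output layers run locally.
|     model = AutoDistributedModelForCausalLM.from_pretrained(model_name)
|
|     inputs = tokenizer("A llama walks into a bar",
|                        return_tensors="pt")["input_ids"]
|     outputs = model.generate(inputs, max_new_tokens=32)
|     print(tokenizer.decode(outputs[0]))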
| gorbypark wrote:
| A 192 GB Mac Studio should be able to run an unquantized 70B, and
| I think it would cost less than a multi-GPU setup made up of
| Nvidia cards. I haven't actually done the math, though. If you
| factor in electricity costs over a certain time period, the Mac
| might come out even cheaper!
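|
| The back-of-the-envelope memory math, for what it's worth (rough
| numbers that ignore KV-cache and activation overhead):
|
|     # 70B parameters: 2 bytes each in fp16 vs. ~4.5 bits quantized
|     params = 70e9
|     fp16_gb = params * 2 / 1e9      # ~140 GB, fits in 192 GB unified memory
|     q4_gb = params * 4.5 / 8 / 1e9  # ~39 GB, fits on 2x 24 GB GPUs
|     print(f"fp16: ~{fp16_gb:.0f} GB, 4-bit: ~{q4_gb:.0f} GB")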
| oceanplexian wrote:
| A Mac Studio will "run" the model as a glorified chatbot, but
| it'll be unusable for anything interesting at 5-6 t/s. With a
| couple of high-end consumer GPUs you're going to get closer to
| 20 t/s. You'd also be able to realistically fine-tune models and
| run other interesting things besides an LLM.
| 1letterunixname wrote:
| If it's only for a short time, use a price calculator to decide
| whether it's worth renting GPUs from a cloud provider. You can get
| immediate, temporary access to far more computing power than you
| could ever hope to buy outright.
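|
| A trivial break-even calculation makes that concrete (all of the
| prices below are placeholders; plug in real quotes):
|
|     # Hypothetical numbers; adjust to actual hardware/cloud quotes
|     buy_cost = 3500.0         # e.g. two used 3090s plus the rest of the box, USD
|     power_cost_per_hr = 0.10  # electricity at the wall, USD/hour
|     rent_cost_per_hr = 2.00   # cloud GPU box able to host a 4-bit 70B, USD/hour
|
|     breakeven_hours = buy_cost / (rent_cost_per_hr - power_cost_per_hr)
|     print(f"Owning wins after ~{breakeven_hours:.0f} hours of use")
|     # ~1842 hours with these numbers, so renting is cheaper unless
|     # you keep the model busy for months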
| Ms-J wrote:
| The only info I can provide is the table I've seen at
| https://github.com/jmorganca/ollama, where it states one needs "32
| GB to run the 13B models." I would assume you may need a GPU for
| this.
|
| Related: could someone please point me in the right direction on
| how to run Wizard Vicuna Uncensored or Llama 2 13B locally on
| Linux? I've been searching for a guide and haven't found what I
| need as a beginner. In the GitHub repo I referenced, the download
| is Mac-only at the moment. I have an M1 MacBook Pro I can use,
| though it's running Debian.
|
| Thank you.
| hdjfkfbfbr wrote:
| Hmmm, I ran a Llama 2 GGML q4 model in 6 GB of RAM with llama.cpp
| on my laptop.
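|
| If you'd rather drive it from Python than the llama.cpp CLI, the
| llama-cpp-python bindings look roughly like this (the model path
| is a placeholder for whichever 4-bit GGML file you download, e.g.
| a Wizard-Vicuna or Llama 2 quant from TheBloke on Hugging Face):
|
|     from llama_cpp import Llama
|
|     # Any 4-bit GGML quant works; the filename is illustrative
|     llm = Llama(
|         model_path="./models/llama-2-7b-chat.ggmlv3.q4_0.bin",
|         n_ctx=2048,  # context window
|     )
|
|     out = llm(
|         "Q: What's the cheapest way to run a 70B model at home? A:",
|         max_tokens=128,
|         stop=["Q:"],
|     )
|     print(out["choices"][0]["text"])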
| Ms-J wrote:
| I very much appreciate your comment and will look into llama.cpp.
| Was it from here: https://github.com/ggerganov/llama.cpp ?
|
| Do you have a guide that you followed that you could link me to,
| or was it just from prior knowledge? Also, do you know if I could
| run Wizard Vicuna on it? That model isn't listed on the page
| above.
___________________________________________________________________
(page generated 2023-08-09 23:02 UTC)