[HN Gopher] Ask HN: Cheapest hardware to run Llama 2 70B
       ___________________________________________________________________
        
       Ask HN: Cheapest hardware to run Llama 2 70B
        
        I was wondering: if I were to buy the cheapest hardware (e.g. a
        PC) that can run Llama 2 70B at a reasonable speed for personal
        use, what would that hardware be? Any experience or
        recommendations?
        
       Author : danielEM
       Score  : 28 points
       Date   : 2023-08-09 20:28 UTC (2 hours ago)
        
       | mromanuk wrote:
        | One 4090 + one 3090 Ti
       | 
       | https://github.com/turboderp/exllama
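        | 
        | Rough VRAM math for why two 24 GB cards can work (a sketch in
        | Python; the 4-bit quantized weights and KV-cache allowance are
        | assumptions, not measurements):
        | 
        |   params = 70e9                # Llama 2 70B
        |   bits_per_weight = 4          # 4-bit GPTQ, as exllama loads
        |   weights_gb = params * bits_per_weight / 8 / 1e9  # ~35 GB
        |   kv_cache_gb = 2.5            # rough allowance for 2k context
        |   print(weights_gb + kv_cache_gb, "GB needed vs 24 + 24 GB")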
        
       | amerine wrote:
        | You would need at least an RTX A6000 for the 70B. You're
        | looking at maybe $4k? Plus whatever you spend on the rest of
        | the machine? Maybe $6k all-in?
        
       | phas0ruk wrote:
       | SageMaker
        
         | jrflowers wrote:
         | How much can I purchase one (1) SageMaker for?
        
         | synaesthesisx wrote:
          | Fair warning: tools like SageMaker are good for simple use
          | cases, but SageMaker tends to abstract away a lot of
          | functionality that you might find yourself digging through
          | the framework for. Not to mention, it's easy to rack up a
          | hefty AWS bill.
        
         | vasili111 wrote:
            | Approximately how much will the price per hour be?
        
           | astrodust wrote:
           | Very yes.
        
       | tuxpenguine wrote:
        | I don't think it is the cheapest, but the tinybox is an
        | option:
       | 
       | https://tinygrad.org/
        
       | spikedoanz wrote:
        | If you have a lot of money (but not H100/A100 money), get
        | 4090s, as they're currently the best bang for your buck on the
        | CUDA side (according to George Hotz). If you're broke, get
        | multiple second-hand 3090s.
        | https://timdettmers.com/2023/01/30/which-gpu-for-deep-
        | learni.... If you're unwilling to spend any money at all and
        | just want to play around with Llama 70B, look into Petals:
        | https://github.com/bigscience-workshop/petals
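        | 
        | A minimal Petals client looks roughly like this (a sketch
        | based on the project's README; the gated model name and swarm
        | availability are assumptions):
        | 
        |   from transformers import AutoTokenizer
        |   from petals import AutoDistributedModelForCausalLM
        | 
        |   name = "meta-llama/Llama-2-70b-chat-hf"
        |   tokenizer = AutoTokenizer.from_pretrained(name)
        |   # Layers are served by volunteers across the swarm, so only
        |   # a small slice of the model ever lives on your machine.
        |   model = AutoDistributedModelForCausalLM.from_pretrained(name)
        | 
        |   ids = tokenizer("A llama is", return_tensors="pt")["input_ids"]
        |   out = model.generate(ids, max_new_tokens=20)
        |   print(tokenizer.decode(out[0]))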
        
       | gorbypark wrote:
        | A 192 GB Mac Studio should be able to run an unquantized 70B,
        | and I think it would cost less than a multi-GPU setup built
        | from Nvidia cards. I haven't actually done the math, though.
        | If you factor in electricity costs over a certain time period,
        | the Mac might come out even cheaper!
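        | 
        | The arithmetic behind "should be able to" (plain fp16 math;
        | KV cache and OS overhead are deliberately left out):
        | 
        |   params = 70e9            # Llama 2 70B
        |   bytes_per_param = 2      # fp16, i.e. unquantized
        |   weights_gb = params * bytes_per_param / 1e9  # 140 GB
        |   print(weights_gb, "GB of weights vs 192 GB unified memory")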
        
         | oceanplexian wrote:
          | A Mac Studio will "run" the model as a glorified chat bot,
          | but at 5-6 t/s it'll be unusable for anything interesting.
          | With a couple of high-end consumer GPUs you're going to get
          | closer to 20 t/s. You'd also be able to realistically
          | fine-tune models and run other interesting things besides an
          | LLM.
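          | 
          | What those rates mean in wall-clock terms (plain
          | arithmetic; the 500-token reply is an arbitrary example):
          | 
          |   reply_tokens = 500
          |   for setup, tps in [("Mac Studio", 5.5), ("2x GPUs", 20.0)]:
          |       print(f"{setup}: {reply_tokens / tps:.0f} s per reply")
          | 
          | Roughly 91 seconds versus 25 seconds for the same answer.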
        
       | 1letterunixname wrote:
        | If it's only for a short time, use a price calculator to
        | decide whether it's worth renting GPUs from a cloud provider.
        | You can get immediate temporary access to far more computing
        | power than you could ever hope to buy outright.
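        | 
        | A break-even sketch you can adapt (all three numbers are
        | hypothetical placeholders, not quotes from any provider):
        | 
        |   gpu_box_cost = 6000.0   # hypothetical build cost, USD
        |   cloud_rate = 2.0        # hypothetical rental, USD/hour
        |   hours_per_week = 10     # expected personal usage
        | 
        |   break_even = gpu_box_cost / cloud_rate  # hours of rental
        |   print(f"Renting wins below {break_even:.0f} GPU-hours,")
        |   print(f"about {break_even / hours_per_week:.0f} weeks here.")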
        
       | Ms-J wrote:
        | The only info I can provide is the table I've seen at
        | https://github.com/jmorganca/ollama where it states one needs
        | "32 GB to run the 13B models." I would assume you may need a
        | GPU for this.
       | 
        | Related: could someone please point me in the right direction
        | on how to run Wizard Vicuna Uncensored or Llama 2 13B locally
        | on Linux? I've been searching for a guide and haven't found
        | anything suitable for a beginner like myself. In the GitHub
        | repo I referenced, the download is Mac-only at the moment. I
        | have an M1 MacBook Pro I can use, though it's running Debian.
       | 
       | Thank you.
        
         | hdjfkfbfbr wrote:
          | Hmmm, I ran a Llama 2 GGML q4 model in 6 GB of RAM with
          | llama.cpp on my laptop.
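          | 
          | For reference, the Python bindings make this a few lines (a
          | sketch using llama-cpp-python; the model filename is a
          | placeholder for whichever quantized GGML file you download):
          | 
          |   # pip install llama-cpp-python
          |   from llama_cpp import Llama
          | 
          |   # A q4 7B GGML file is ~4 GB on disk, which is how it
          |   # fits in 6 GB of RAM. The path is a placeholder.
          |   llm = Llama(model_path="./llama-2-7b.ggmlv3.q4_0.bin",
          |               n_ctx=2048)
          |   out = llm("Q: What is a llama? A:", max_tokens=64,
          |             stop=["Q:"])
          |   print(out["choices"][0]["text"])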
        
           | Ms-J wrote:
            | I very much do appreciate your comment and will look into
            | llama.cpp. Was it from here:
           | https://github.com/ggerganov/llama.cpp
           | 
            | Do you have a guide that you followed and could link me
            | to, or was it just from prior knowledge? Also, do you know
            | if I could run Wizard Vicuna on it? That model isn't
            | listed on the above page.
        
       ___________________________________________________________________
       (page generated 2023-08-09 23:02 UTC)