[HN Gopher] Serving AI from the Basement - 192GB of VRAM Setup
       ___________________________________________________________________
        
       Serving AI from the Basement - 192GB of VRAM Setup
        
       Author : XMasterrrr
       Score  : 206 points
       Date   : 2024-09-08 17:47 UTC (5 hours ago)
        
 (HTM) web link (ahmadosman.com)
 (TXT) w3m dump (ahmadosman.com)
        
       | XMasterrrr wrote:
       | Hey guys, this is something I have been intending to share here
       | for a while. This setup took me some time to plan and put
       | together, and then some more time to explore the software part of
       | things and the possibilities that came with it.
       | 
       | Part of the reason I built this was data privacy: I do not
       | want to hand over my private data to any company to further train
       | their closed weight models; and given the recent drop in output
       | quality on different platforms (ChatGPT, Claude, etc), I don't
       | regret spending the money on this setup.
       | 
       | I was also able to do a lot of cool things using this server by
       | leveraging tensor parallelism and batch inference, generating
       | synthetic data, and experimenting with finetuning models using my
       | private data. I am currently building a model from scratch,
       | mainly as a learning project, but I am also finding some cool
       | things while doing so and if I can get around ironing out the
       | kinks, I might release it and write a tutorial from my notes.
       | 
       | So I finally had the time this weekend to get my blog up and
       | running, and I am planning on following up this blog post with a
       | series of posts on my learnings and findings. I am also open to
       | topics and ideas to experiment with on this server and write
       | about, so feel free to shoot your shot if you have ideas you want
       | to experiment with and don't have the hardware. I am more than
       | willing to do that on your behalf and share the findings.
       | 
       | Please let me know if you have any questions, my PMs are open,
       | and you can also reach me on any of the socials I have posted on
       | my website.
        
         | nrp wrote:
         | How are you finding 2-bit/3-bit quantized Llama 405B? Is it
         | behaving better than 8-bit or 16-bit Llama 70B?
        
         | nullindividual wrote:
         | Do you run this 24/7?
         | 
         | What is your cost of electricity per kilowatt hour and what is
         | the cost of this setup per month?
        
           | michaelt wrote:
           | I have a much smaller setup than the author - a quarter the
           | GPUs and RAM - and I was surprised to find it draws 300W at
           | _idle_
        
             | nullindividual wrote:
             | The reason I asked is I used to run a dual X5650 server
             | with SSDs and it was about $50/month with the cheapest (or
             | very close to) rates in the US.
        
               | disiplus wrote:
               | We have more expensive gas than the USA, but I pay about 5
               | cents per kWh @220V.
               | 
               | I did not know how expensive it is in the USA, especially
               | California.
        
               | fuzzybear3965 wrote:
               | Yep. ~$.33/kWh in Southern California (SoCal Edison) and
               | going up all the time!
        
               | fragmede wrote:
               | $0.51/kWh during peak hours in San Francisco!
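
A rough sketch of the arithmetic behind the cost question above, using
only numbers quoted in this thread (300 W at idle, and rates of $0.05,
$0.33 and $0.51 per kWh). It assumes a constant draw and a flat rate
around the clock, which is a simplification: the $0.51 figure is a
peak-hour rate, and a rig under load draws far more than 300 W. The
function name is made up for illustration.

```python
def monthly_cost_usd(avg_watts: float, usd_per_kwh: float, days: int = 30) -> float:
    """Cost of running a constant electrical load for `days` days."""
    kwh = avg_watts / 1000 * 24 * days  # watts -> kWh over the period
    return kwh * usd_per_kwh


# Rates quoted by commenters; treating each as a flat 24h rate is an assumption.
RATES = {
    "5 cents/kWh": 0.05,
    "SoCal Edison": 0.33,
    "San Francisco peak": 0.51,
}

if __name__ == "__main__":
    idle_watts = 300  # idle draw reported for a rig a quarter of the size
    for label, rate in RATES.items():
        print(f"{label}: ${monthly_cost_usd(idle_watts, rate):.2f} per month")
```

At 300 W the monthly energy is 216 kWh, so the spread between the
cheapest and most expensive rate quoted here is roughly a factor of ten.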
        
           | trollbridge wrote:
           | This is a setup that might make more sense to run at full
           | power during winter months.
        
         | bravura wrote:
         | How loud is it? Was special electrical needed?
        
         | mattnewton wrote:
         | The main thing stopping me from going beyond 2x 4090's in my
         | home lab is power. Anything around ~2k watts on a single
         | circuit breaker is likely to flip it, and that's before you get
         | to the costs involved of drawing that much power for multiple
         | days of a training run. How did you navigate that in a
         | (presumably) residential setting?
        
           | abound wrote:
           | Not OP, but my current home had a dedicated 50A/240V circuit
           | because the previous owner did glass work and had a massive
           | electric kiln. I can't imagine it was cheap to install, but
           | I've used it for beefy, energy hungry servers in the past.
           | 
           | Which is all to say it's possible in a residential setting,
           | just probably expensive.
        
             | woleium wrote:
             | Yes, or something like a residential aircon heat pump will
             | need a 40A circuit too. Car charging usually has a 30A.
             | Electric oven is usually 40A. There's lots of stuff that
             | uses that sort of power residentially.
        
           | throwthrowuknow wrote:
           | Not speaking from direct experience building a rig like this
           | but the blog post mentions having 3 power supplies so the
           | most direct solution would be to put each on their own
           | dedicated circuit. As long as you have space in your
           | electrical box this is straightforward to do though I would
           | recommend having an electrician do the wiring if you aren't
           | experienced with that type of home electrical work.
        
             | gizmo686 wrote:
             | Even without space in the existing box, installing a
             | subpanel isn't that much more of a cost.
        
           | littlestymaar wrote:
           | Then just add a 32A circuit breaker to your electrical
           | installation, it's not a big deal really.
        
           | bluedino wrote:
           | Take your typical 'GPU node', which would be a
           | Dell/HP/SuperMicro with 4-8 NVIDIA H100s and a single
           | top-end AMD/Intel CPU. You would need 2-4 240V outlets
           | (30A).
           | 
           | In the real world you would plug them into a PDU such as:
           | https://www.apc.com/us/en/product/AP9571A/rack-pdu-
           | basic-1u-...
           | 
           | Each GPU will take around 700W and then you have the rest of
           | the system to power, so depending on CPU/RAM/storage...
           | 
           | And then you need to cool it!
        
           | tcdent wrote:
           | I can't believe a group of engineers are so afraid of
           | residential power.
           | 
           | It is not expensive, nor is it highly technical. It's not
           | like we're factoring in latency and crosstalk...
           | 
           | Read a quick howto, cruise into Home Depot and grab some
           | legos off the shelf. Far easier to figure out than executing
           | "hello world" without domain expertise.
        
             | 3eb7988a1663 wrote:
             | People can and do die from misuses of electricity. Not a
             | move-fast-and-break-things kind of domain.
        
               | varispeed wrote:
               | You only "break" once...
        
               | lbotos wrote:
               | I've been learning Japanese and a favorite of mine is:
               | 一体
               | 
               | Which is used as "what the heck" but its direct kanji
               | translation is _one body_.
               | 
               | https://jisho.org/word/%E4%B8%80%E4%BD%93
        
             | gizmo686 wrote:
             | A good engineer knows the difference between safe and
             | dangerous. Setting up an AI computer is safe. Maybe you
             | trip a circuit. Maybe you interfere with something else
             | running on your hobby computer. But nothing bad can really
             | happen.
             | 
             | Residential electrical is dangerous. Maybe you electrocute
             | yourself. Maybe you cause a fire 5 years down the line.
             | Maybe you cause a fire for the next owner because you
             | didn't know to protect the wire with a metal plate so they
             | drill into it.
             | 
             | Having said that, 2 4090s will run you around $5,000, not
             | counting any of the surrounding system. At that cost point,
             | hiring an electrician would not be that big of an expense,
             | relatively speaking.
             | 
             | Also, if you are at the point where you need to add a
             | circuit for power, you might need to seriously consider
             | cooling, which could potentially be another side quest.
        
               | tjoff wrote:
               | Add to that the fact that it is likely illegal to do
               | yourself, which of course has implications for insurance etc.
        
               | m-s-y wrote:
               | In the US, it's fully legal to perform
               | electric/plumbing/whatever work on your own home.
               | 
               | If you screw it up and need to file a claim, insurance
               | _can't_ deny the claim based solely on the fact that you
               | performed the work yourself, even if you're not a
               | certified electrician/plumber/whatever.
               | 
               | What you _don't_ want to do is have an unlicensed friend
               | work on your home, and vice versa. There are no legal
               | protections, and the insurance companies absolutely
               | _will_ go after you/your friend for damages.
               | 
               | Edit: sorry this applies to owned property, not if you're
               | renting
        
             | lolinder wrote:
             | > I can't believe a group of engineers are so afraid of
             | residential power. ... Read a quick howto, cruise into Home
             | Depot and grab some legos off the shelf. Far easier to
             | figure out than executing "hello world" without domain
             | expertise.
             | 
             | The instinct to not touch something that you don't yet
             | deeply understand is very much an engineer's instinct. Any
             | engineer worthy of the title has often spent weeks
             | carefully designing a system to take care of the hundreds
             | of edge cases that weren't apparent at a quick glance. Once
             | you've done that once (much less dozens of times) you have
             | a healthy respect for the complexity that usually lurks
             | below the surface, and you're loath to insert yourself
             | confidently into an unfamiliar domain that has a
             | whole engineering discipline dedicated to it. You
             | understand that those engineers are employed full time for
             | a reason.
             | 
             | The attitude you describe is one that's useful in a lot of
             | cases and may even be correct for this particular
             | application (though I'm personally leery of it), but if
             | confidently injecting yourself into territory you don't
             | know well is what being an "engineer" means to you, that's
             | a sad commentary on the state of software engineering
             | today.
        
               | tcdent wrote:
               | Sir, this is "Hacker News".
        
               | lolinder wrote:
               | So did you mean "I can't believe a group of hackers are
               | so afraid of residential power"?
        
             | fhdsgbbcaA wrote:
             | You're forgetting many people have landlords who aren't
             | exactly keen on tenants doing diy electrical work.
        
             | wpietri wrote:
             | Ah yes, the "move fast and burn your house down" school of
             | "engineering".
        
           | J_Shelby_J wrote:
           | I'm running two 3090s on a 700W PSU. You definitely can get
           | more than that out of a 2000W bus.
           | 
           | I wrote a blog on reducing the power limits of nvidia gpus.
           | Definitely try it out.
           | https://shelbyjenkins.github.io/blog/power-limit-nvidia-
           | linu...
        
             | smcnally wrote:
             | Thank you for this post. I'd read it in ~June and it helped
             | quite a bit with manual 'nvidia-smi' runs. I just recently
             | created the systemd service description and am still
             | delving into related power and performance possibilities.
        
           | sandos wrote:
           | This is funny as a European, since we have many, many groups
           | (circuits) where we regularly run 2kW, and then some. Really
           | no issue, but I guess lower voltage makes it a problem.
        
             | sixothree wrote:
             | Yup. We typically have 20 amp breakers in living portions
             | of the house and it's common practice for most devices to
             | top out at 1500 watts. But from your description, you would
             | still need three lines and three breakers. So. I'm not
             | understanding your point.
        
           | slavik81 wrote:
           | Not the OP, but I hired an electrician to put a 30A 240V
           | circuit with a NEMA L6-30R receptacle next to my electrical
           | panel. It was 600 CAD. You can probably get it done cheaper.
           | He had to disconnect another circuit and make a trip to the
           | hardware store because I told him to bring the wrong breaker.
        
           | orbital-decay wrote:
           | _> Anything around ~2k watts on a single circuit breaker is
           | likely to flip it_
           | 
           | I'm curious, how do you use e.g. a washing machine or an
           | electric kettle, if 2kW is enough to flip your breaker? You
           | should simply know your wiring limits. Breaker/wiring at my
           | home won't even notice this.
        
             | immibis wrote:
             | Americans do not have electric kettles and need special
             | circuits for electric clothes dryers.
        
               | lolinder wrote:
               | We have an electric kettle in the US and it runs just
               | fine drawing 1500W.
               | 
               | You're correct that the dryer is on a larger circuit,
               | though.
        
               | beAbU wrote:
               | > and it runs just fine drawing 1500W.
               | 
               | You think that this is "just fine" because you've never
               | experienced the glory that is a 3kW kettle!
        
               | blibble wrote:
               | I get bored and tend to wander off waiting for it to boil
               | at 3kW
               | 
               | 1.5kW must be absolute agony
        
               | lolinder wrote:
               | I mean... yes, I don't sit around waiting for the kettle
               | to boil. But if I fill it and start it first the water is
               | already boiling by the time I get everything out, so it's
               | not like any time is wasted as is.
        
               | jamesbfb wrote:
               | Huh, what?! Mega TIL moment for me as an Australian with
               | an electric kettle and dryer plugged into whatever power
               | socket I wish! Reminds me of this great Technology
               | Connections video:
               | https://youtu.be/jMmUoZh3Hq4?si=3vSMHmU2ClwNRtow
        
             | trillic wrote:
             | My kettle only pulls 1500W, as do most in the US. Our water
             | just takes longer to boil than in Europe. My washer/dryer
             | has its own 30A breaker, as do my oven and water heater.
             | My garbage disposal has its own 15A breaker.
             | 
             | Boiling 1 liter takes like 2 mins. Most Americans don't
             | have kettles because they don't drink tea.
        
           | teaearlgraycold wrote:
           | I've run 3x L40S on a 1650W PSU on a normal 120V 20A circuit.
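
The breaker sizes mentioned in this subthread can be turned into watts
with P = V x I. The sketch below does that for the circuits commenters
named. The 80% figure for sustained loads is a commonly cited rule of
thumb in North American practice and is an assumption here, not
something stated in the thread; check local code before relying on it.
The function name is made up for illustration.

```python
def circuit_watts(volts: float, amps: float, continuous_fraction: float = 0.8):
    """Return (peak watts, sustained watts) for a breaker of the given size.

    continuous_fraction is an assumed derating for loads that run for hours.
    """
    peak = volts * amps
    return peak, peak * continuous_fraction


# Circuits mentioned by commenters in this thread.
CIRCUITS = {
    "120V 15A": (120, 15),
    "120V 20A": (120, 20),
    "240V 30A": (240, 30),
    "240V 50A": (240, 50),
}

if __name__ == "__main__":
    for label, (volts, amps) in CIRCUITS.items():
        peak, sustained = circuit_watts(volts, amps)
        print(f"{label}: {peak:.0f} W peak, about {sustained:.0f} W sustained")
```

Under these assumptions a 1650W PSU fits on a 120V 20A circuit, while a
load near 2kW is already past the sustained figure for that circuit and
past the peak for a 15A one.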
        
         | pupdogg wrote:
         | Amazing setup. I have the capability to design, fabricate, and
         | powder coat sheet metal. I would love to collaborate on
         | designing and fabricating a cool enclosure for this setup. Let
         | me know if you're interested.
        
         | lossolo wrote:
         | Cool, it looks similar to my crypto mining rigs (8xGPU per
         | node) from around 7 years ago, but I used PCI-E risers and a
         | dual power supply.
        
       | freeqaz wrote:
       | How much do the NVLinks help in this case?
       | 
       | Do you have a rough estimate of how much this cost? I'm curious
       | since I just built my own 2x 3090 rig and I wondered about going
       | EPYC for the potential to have more cards (stuck with AM5 for
       | cheapness though).
       | 
       | All in all I spent about $3500 for everything. I'm guessing this
       | is closer to $12-15k? CPU is around $800 on eBay.
        
         | lvl155 wrote:
         | My reason for going Epyc was for PCIe lanes and cheaper
         | enterprise SSDs via U.3/U.2. With AM5, you tap out the lanes with
         | dual GPUs. Threadripper is preferable but Epyc is about 1/2 of
         | the price or even better if you go last gen.
        
           | Eisenstein wrote:
           | Why do you need such high cross card bandwidth for inference?
           | Are you hosting for a lot of users at once?
        
         | Tepix wrote:
         | I built this in early 2023 out of used parts and ended up with
         | a cost of 2300EUR for AM4/128GB/2x3090 @ PCIe 4.0x8 +nvLink
        
       | rvnx wrote:
       | You could just buy a Mac Studio for 6500 USD, have 192 GB of
       | unified RAM and have way less power consumption.
        
         | flemhans wrote:
         | Are people running llama 3.1 405B on them?
        
           | rspoerri wrote:
           | I'm running 70B models (usually in q4 .. q5_k_m, but q6 is
           | possible) on my 96GB MacBook Pro with M2 Max (12 CPU cores,
           | 38 GPU cores). This also leaves me with plenty of RAM for
           | other purposes.
           | 
           | I'm currently using reflection:70b_q4 which does a very good
           | job in my opinion. It generates with 5.5 tokens/s for the
           | response, which is just about my reading speed.
           | 
           | edit: I usually don't run larger models (q6) because of the
           | speed. I'd guess a 405B model would just be awfully slow.
        
             | throwthrowuknow wrote:
             | Not going to work for training from scratch which is what
             | the author is doing.
        
         | angoragoats wrote:
         | You could for sure, but the nVidia setup described in this
         | article would be many times faster at inference. So it's a
         | tradeoff between power consumption and performance.
         | 
         | Also, modern GPUs are surprisingly good at throttling their
         | power usage when not actively in use, just like CPUs. So while
         | you need 3kW+ worth of PSU for an 8x3090 setup, it's not going
         | to be using anywhere near 3kW of power on average, unless
         | you're literally using the LLM 24x7.
        
           | cranberryturkey wrote:
           | Can Reflection:70b work on them?
        
             | angoragoats wrote:
             | Maybe you meant to reply to a different comment? Work on
             | what?
             | 
             | Edit: I guess to directly answer your question, I don't see
             | why you couldn't run a 70b model at full quality on either
             | a M2 192GB machine or on an 8x 3090 setup.
        
             | christianqchung wrote:
             | Pretty sure it'll work where any 70b model would, but it's
             | probably not noticeably better than Llama 3.1 70b if the
             | reports I'm reading now are correct.[1]
             | 
             | [1] https://x.com/JJitsev/status/1832758733866222011
        
           | exyi wrote:
           | Even if you are running it constantly, the per token power
           | consumption is likely going to be in a similar range, not to
           | mention you'd need 10+ macs for the throughput.
        
           | robotnikman wrote:
           | I have a 3090 power capped at 65%, and I only notice a minimal
           | difference in performance.
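
A minimal sketch of what a 65% power cap looks like in practice. It only
builds the `nvidia-smi` commands (`-i` selects a GPU index, `-pl` sets
the power limit in watts) and prints them rather than running them. The
350 W stock limit for a 3090 and the eight-GPU count are assumptions for
illustration; actual limits vary by card model, and the helper name is
hypothetical.

```python
import shlex


def power_limit_commands(gpu_count: int, stock_watts: float, fraction: float):
    """Build one `nvidia-smi -pl` command per GPU, capped at a fraction of stock."""
    limit = int(stock_watts * fraction)  # whole watts, rounded down
    return [
        ["sudo", "nvidia-smi", "-i", str(index), "-pl", str(limit)]
        for index in range(gpu_count)
    ]


if __name__ == "__main__":
    # Assumed values: 8 GPUs, 350 W stock limit, 65% cap as in the comment above.
    for command in power_limit_commands(gpu_count=8, stock_watts=350, fraction=0.65):
        print(shlex.join(command))
```

The limit set this way does not survive a reboot, which is why the blog
post linked earlier in the thread wraps it in a systemd service.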
        
         | lvl155 wrote:
         | This is something people often say without even attempting to
         | do a major AI task. If Mac Studio were that great they'd be
         | sold out completely. It's not even cost efficient for
         | inference.
        
         | kcb wrote:
         | and have way less power
        
         | vunderba wrote:
         | I'm seeing this misunderstanding a lot recently. There are _TWO_
         | components to putting together a viable machine learning rig:
         | 
         | - Fitting models in memory
         | 
         | - Inference / Training speed
         | 
         | 8 x RTX 3090s will absolutely _CRUSH_ a single Mac Studio in
         | raw performance.
        
         | steve_adams_86 wrote:
         | I know it's a fraction of the size, but my 32GB studio gets
         | wrecked by these types of tasks. My experience is that they're
         | awesome computers in general, but not as good for AI as people
         | expect.
         | 
         | Running llama3.1 70B is brutal on this thing. Responses take
         | minutes. Someone running the same model on 32GB of GPU memory
         | seems to have far better results from what I've read.
        
           | irusensei wrote:
           | You are probably swapping. On M3 max with similar memory
           | bandwidth the output is around 4t/s which is normally on par
           | with most people's reading speed. Try different quants.
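
The "fitting models in memory" half of this discussion comes down to
simple arithmetic: weight size is roughly parameters times bits per
weight divided by 8. The sketch below applies that to the model sizes
and memory capacities mentioned in the thread. It counts weights only;
KV cache, activations and runtime overhead are ignored, and a Mac cannot
hand all of its unified memory to the GPU, so real headroom is smaller
than shown. Names are made up for illustration.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


# Memory capacities mentioned by commenters.
MEMORY_GB = {
    "32GB Mac Studio": 32,
    "2x 3090 (48GB)": 48,
    "96GB MacBook Pro": 96,
    "8x 3090 (192GB)": 192,
}

# (label, parameters in billions, bits per weight)
MODELS = [
    ("70B @ 4-bit", 70, 4),
    ("70B @ 8-bit", 70, 8),
    ("70B @ 16-bit", 70, 16),
    ("405B @ 2-bit", 405, 2),
    ("405B @ 3-bit", 405, 3),
]

if __name__ == "__main__":
    for label, params, bits in MODELS:
        need = weights_gb(params, bits)
        fits = [name for name, gb in MEMORY_GB.items() if need < gb]
        print(f"{label}: ~{need:.0f} GB of weights; fits in: {', '.join(fits) or 'none'}")
```

By this estimate a 4-bit 70B model needs about 35 GB for weights alone,
which is consistent with the suggestion that a 32GB machine is swapping.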
        
       | 3eb7988a1663 wrote:
       | What is the power draw under load/idle? Does it noticeably
       | increase the room temperature? Given the surroundings (aka the
       | huge pile of boxes behind the setup), curious if you could get
       | away with just a couple of box fans instead of the array of case
       | fans.
       | 
       | Are you intending to use the capacity all for yourself or rent it
       | out to others?
        
         | NavinF wrote:
         | Box fans are surprisingly power hungry. You'd be better off
         | using large 200mm PC fans. They're also a lot quieter
        
           | michaelt wrote:
           | If you care about noise, I also recommend not getting 8 GPUs
           | with 3 fans each :)
        
       | throwpoaster wrote:
       | Did you write this with the LLM running on the rig?
        
         | emptiestplace wrote:
         | Does this post actually seem LLM generated to you?
        
       | cranberryturkey wrote:
       | this is why we need an actual AI blockchain, so we can donate GPU
       | and earn rewards for the p2p api calls using the distributed
       | model.
        
         | walterbell wrote:
         | _> donate GPU .. earn rewards_
         | 
         | Is a blockchain needed to sell unused GPU capacity?
        
         | bschmidt1 wrote:
         | That's actually interesting. While crypto GPU mining is
         | "purposeless" or arbitrary, it would be way cooler if GPU
         | mining meant chunking through computing tasks in a free/open
         | queue (blockchain).
         | 
         | Eventually there could be some tipping point where networks are
         | fast enough and there are enough hosting participants it could
         | be like a worldwide/free computing platform - not just for AI
         | but for anything.
        
           | yunohn wrote:
           | This idea has been brought up tons of times by grifters
           | aiming to pivot from Crypto to AI. The reason that GPUs are
           | used for blockchains is to compute large numbers or proofs -
           | which are truly useless but still verifiable so they can be
           | distributed and rewarded. The free GPU compute idea misses
           | this crucial point, so the blockchain part is (still) useless
           | unless your aim is to waste GPU compute instead.
           | 
           | IRL all you need is a simple platform to pay and schedule
           | jobs on others' GPUs.
        
             | fragmede wrote:
             | folding@home predates Bitcoin by eight years. the concept
             | isn't inherent to grifters
        
               | yunohn wrote:
               | Folding at home does not use a blockchain, further
               | proving non-grifters don't need it. That was the point
               | being discussed, not distributed computing as a concept.
        
           | vunderba wrote:
           | I also think this idea has been explored a little bit at
           | least in terms of GPU distribution networks for AI (Petals and
           | Horde come to mind).
           | 
           | https://stablehorde.net
           | 
           | https://petals.dev
        
         | cloudking wrote:
         | Similar concept https://petals.dev/
        
         | kcb wrote:
         | Problem is once you have to scale to multiple GPUs the
         | interconnect becomes the primary bottleneck.
        
       | SmellTheGlove wrote:
       | I thought I was balling with my dual 3090 with nvlink. I haven't
       | quite figured out what to do with 48GB VRAM yet.
       | 
       | I hope this guy posts updates.
        
         | lxe wrote:
         | Run 70B LLM models of course
        
           | thelastparadise wrote:
           | Or train a cute little baby llama.
        
       | bogwog wrote:
       | Awesome! I've always wondered what something like this would look
       | like for a home lab.
       | 
       | I'm excited to see your benchmarks :)
        
       | InsomniacL wrote:
       | When you moved in to your house, did you think you would finish a
       | PC build with 192GB of VRAM before you would finish the plaster
       | boarding?
        
       | walterbell wrote:
       | An adjacent project for 8 GPUs could convert used 4K monitors
       | into a borderless mini-wall of pixels, for local video
       | composition with rendered and/or AI-generated backgrounds,
       | https://theasc.com/articles/the-mandalorian
       | 
       |  _> the heir to rear projection -- a dynamic, real-time, photo-
       | real background played back on a massive LED video wall and
       | ceiling, which not only provided the pixel-accurate
       | representation of exotic background content, but was also
       | rendered with correct camera positional data.. "We take objects
       | that the art department have created and we employ photogrammetry
       | on each item to get them into the game engine"_
        
       | renewiltord wrote:
       | I have a similar one with 4090s. Very cool. Yours is nicer than
       | mine where I've let the 4090s rattle around a bit.
       | 
       | I haven't had enough time to find a way to split inference which
       | is what I'm most interested in. Yours is also much better with
       | the 1600 W supply. I have a hodge podge.
        
       | modeless wrote:
       | I wonder how the cost compares to a Tinybox. $25k for 6x 4090 or
       | $15k for 6x 7900XTX. Of course that's the full package with power
       | supplies, CPU, storage, cooling, assembly, shipping, etc. And a
       | tested, known good hardware/software configuration which is
       | crucial with this kind of thing.
        
         | itomato wrote:
         | With a rental option coming, it's hard for me to imagine a more
         | profitable way to use a node like that.
        
         | Tepix wrote:
         | If you merely want CUDA and lots of VRAM there's no reason to
         | pick expensive 4090s over used 3090s
        
         | angoragoats wrote:
         | You can build a setup like in the OP for somewhere around $10k,
         | depending on several factors, the most important of which are
         | the price you source your GPUs at ($700 per 3090 is a
         | reasonable going rate) and what CPU you choose (high core
         | count, high frequency Epyc CPUs will cost more).
        
       | maaaaattttt wrote:
       | Looking forward to reading this series.
       | 
       | As a side note I'd love to find a chart/data on the cost
       | performance ratio of open source models. And possibly then a
       | $/ELO value (where $ is the cost to build and operate the machine
       | and ELO kind of a proxy value for the average performance of the
       | model)
        
       | wkat4242 wrote:
       | > And who knows, maybe someone will look back on my work and be
       | like "haha, remember when we thought 192GB of VRAM was a lot?"
       | 
       | I wonder if this will happen. It's already really hard to buy big
       | HDDs for my NAS because nobody buys external drives anymore. So
       | the pricing has gone up a lot for the prosumer.
       | 
       | I expect something similar to happen to AI. The big cloud parties
       | are all big leaders on LLMs and their goal is to keep us beholden
       | to their cloud service. Cheap home hardware with serious
       | capability is not something they're interested in. They want to
       | keep it out of our reach so we can pay them rent and they can
       | mine our data.
        
         | thelastparadise wrote:
         | > It's already really hard to buy big HDDs for my NAS
         | 
         | IME 20tb drives are easy to find.
         | 
         | I don't think the clouds have access to bigger drives or
         | anything.
         | 
         | Similarly, we can buy 8x A100s, they're just fundamentally
         | expensive whether you're a business or not.
         | 
         | There doesn't seem to be any "wall" up like there used to be
         | with proprietary hardware.
        
           | wkat4242 wrote:
           | They are easy to find but extremely expensive. I used to pay
           | below 200EUR for a 14TB Seagate 8 years ago. That's now above
           | 300. And the bigger ones are even more expensive.
           | 
           | For me these prices are prohibitive. Just like the A100s are
           | (though those are even more so of course).
           | 
           | The problem is the common consumer relying on the cloud so
           | these kinds of products become niches and lose volume. Also,
           | the cloud providers don't pay what we do for a GPU or HDD.
           | They buy them by the ten thousands and get deep discounts.
           | That's why the RRPs which we do pay are highly inflated.
        
             | Dylan16807 wrote:
             | Well if I look at Amazon I see a couple models of external
             | 14TB for $190, and a brand new Exos 16TB for $230. Not too
             | bad. Though personally I get much cheaper used drives and
             | put them in RAID for a NAS.
             | 
             | And they do have better sales.
        
         | gizmo686 wrote:
         | The cloud companies do not make the hardware, they buy it like
         | the rest of us. They are just going to be almost the entirety
         | of the market, so naturally the products will be built and priced
         | with that market in mind.
        
           | wkat4242 wrote:
           | Yes and they get deep discounts which we don't. Can be 40% or
           | more!
           | 
           | Of course the vendor can't make a profit with such discounts
           | so they inflate the RRP. But we do end up paying that.
        
         | Eisenstein wrote:
         | It isn't that cloud providers want to shut us out, it is that
         | nVidia wants to relegate AI capable cards to the high end
         | enterprise tier. So far in 2024 they have made $10.44b in
         | revenue from the gaming market, and over $47.5b in the
         | datacenter market, and I would bet that there is much less
         | profit in gaming. In order to keep the market segmented they
         | stopped putting NVLink on gaming cards and have capped VRAM at
         | 24GB for the highest end GPUs (3090 and 4090) and it doesn't
         | look much better for the upcoming 5090. I don't blame them,
         | they are a profit-maximizing corporation after all, but if
         | anything is to be done about making large AI models practical
         | for hobbyists, start with nVidia.
         | 
         | That said, I really don't think that the way forward for
         | hobbyists is maxing VRAM. Small models are becoming much more
         | capable and accelerators are a possibility, and there may not
         | be a need for a person to run a 70-billion-parameter model in
         | memory at all when there are MoEs like Mixtral and small
         | capable models like phi.
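         | 
         | A rough sketch of the arithmetic behind the 24GB cap and the
         | 70-billion-parameter figure above. It counts weights only; the
         | bytes-per-parameter table and the helper names are assumptions
         | for illustration, and real usage is higher once the KV cache
         | and runtime overhead are added.

```python
import math

# Weights-only estimate; ignores KV cache, activations and runtime overhead.
# Bytes per parameter for a few common precisions (illustrative assumption).
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}
CARD_VRAM_GB = 24  # 3090 / 4090 cap mentioned in the comment


def weights_gb(params_billion: float, precision: str) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[precision]


def cards_needed(params_billion: float, precision: str) -> int:
    """Minimum number of 24GB cards whose combined VRAM holds the weights."""
    return math.ceil(weights_gb(params_billion, precision) / CARD_VRAM_GB)


if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        size = weights_gb(70, precision)
        cards = cards_needed(70, precision)
        print(f"70B @ {precision}: ~{size:.0f} GB of weights, >= {cards} cards")
```

         | Under these assumptions a 70B model does not fit on a single
         | 24GB card at any of the listed precisions, which is the
         | pressure the comment describes.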
        
         | Saris wrote:
         | >It's already really hard to buy big HDDs for my NAS because
         | nobody buys external drives anymore. So the pricing has gone up
         | a lot for the prosumer.
         | 
         | I buy refurb/used enterprise drives for that reason, generally
         | around $12 per TB for the recent larger drives. And around $6
         | per TB for smaller drives. You just need an SAS interface but
         | that's not difficult or expensive.
         | 
         | E.g., 25TB for $320, or 12TB for $80.
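         | 
         | To compare the dollar figures quoted in this subthread on the
         | same footing, here is a small sketch that turns each one into
         | a price per TB. The labels and the helper name are made up
         | for illustration; the prices are the ones the commenters gave.

```python
def price_per_tb(price_usd: float, capacity_tb: float) -> float:
    """Price per terabyte for a quoted drive deal."""
    if capacity_tb <= 0:
        raise ValueError("capacity_tb must be positive")
    return price_usd / capacity_tb


# (label, price in USD, capacity in TB), as quoted in the thread.
QUOTES = [
    ("external 14TB, new", 190, 14),
    ("Exos 16TB, new", 230, 16),
    ("refurb/used, 25TB", 320, 25),
    ("refurb/used, 12TB", 80, 12),
]

if __name__ == "__main__":
    # Print the cheapest deals per TB first.
    for label, price, capacity in sorted(QUOTES, key=lambda q: q[1] / q[2]):
        print(f"{label:<20} ${price_per_tb(price, capacity):6.2f}/TB")
```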
        
       | flixf wrote:
       | Very interesting! How are the 8 GPUs connected to the
       | motherboard? Based on the article and the pictures, he doesn't
       | appear to be using PCIe risers.
       | 
       | I have a setup with 3 RTX 3090 GPUs and the PCIe risers are a
       | huge source of pain and system crashes.
        
         | plantain wrote:
         | Looks like SlimSAS.
        
         | lbotos wrote:
         | I had the same question. I was curious what retimers he was
         | using.
         | 
         | I've had my eye on these for a bit https://c-payne.com/
        
       | choilive wrote:
       | I have a similar setup in my basement! Although it's multiple
       | nodes, with a total of 16x3090s. I also needed to install a
       | 30A 240V circuit.
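       | 
       | A back-of-the-envelope sketch of why a dedicated circuit comes
       | up at this scale. The 350 W per-card figure (RTX 3090 reference
       | board power) and the 80% continuous-load derating are
       | assumptions, not numbers from the comment, and CPUs, fans and
       | PSU losses are ignored.

```python
def usable_watts(volts: float, amps: float, derate: float = 0.8) -> float:
    """Continuous power budget of a circuit after derating (assumed 80%)."""
    return volts * amps * derate


def gpu_watts(gpu_count: int, watts_per_gpu: float = 350.0) -> float:
    """Total GPU draw at the assumed per-card board power."""
    return gpu_count * watts_per_gpu


if __name__ == "__main__":
    budget = usable_watts(240, 30)  # 30A 240V circuit from the comment
    draw = gpu_watts(16)            # 16x 3090 from the comment
    print(f"budget ~{budget:.0f} W, GPUs ~{draw:.0f} W, "
          f"headroom ~{budget - draw:.0f} W")
```

       | With these assumptions the GPUs alone take up most of the
       | budget, so power limits per card or more than one circuit
       | would be reasonable things to consider.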
        
         | lvl155 wrote:
         | That last part is often overlooked. This is also why sometimes
         | it's just not worth going local, especially if you don't need
         | all that compute power beyond a few days.
        
       | Tepix wrote:
       | So, how do you connect the 8th card if you have 7 PCIe 4.0 x16
       | slots available?
        
         | manav wrote:
         | PCIe bifurcation - so splitting one of the x16 slots into two
         | x8 or similar.
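         | 
         | A minimal sketch of what that split means in numbers, assuming
         | the 7 PCIe 4.0 x16 slots from the parent comment and a single
         | slot bifurcated two ways. The function names are hypothetical;
         | the per-lane figure follows from PCIe 4.0's 16 GT/s signalling
         | with 128b/130b encoding.

```python
# PCIe 4.0: 16 GT/s per lane, 128b/130b encoding, 8 bits per byte.
# Roughly 1.97 GB/s per lane in each direction.
PCIE4_GB_PER_S_PER_LANE = 16 * (128 / 130) / 8


def link_widths(slots: int, lanes_per_slot: int,
                bifurcated_slots: int, ways: int = 2) -> list[int]:
    """Lane width of every device link after bifurcating some slots."""
    if bifurcated_slots > slots:
        raise ValueError("cannot bifurcate more slots than exist")
    if lanes_per_slot % ways != 0:
        raise ValueError("lanes must divide evenly between the links")
    split = [lanes_per_slot // ways] * (bifurcated_slots * ways)
    whole = [lanes_per_slot] * (slots - bifurcated_slots)
    return split + whole


def bandwidth_gb_per_s(width: int) -> float:
    """Theoretical one-direction bandwidth of a PCIe 4.0 link."""
    return width * PCIE4_GB_PER_S_PER_LANE


if __name__ == "__main__":
    widths = link_widths(slots=7, lanes_per_slot=16, bifurcated_slots=1)
    print(f"{len(widths)} links: {widths}")
    for w in sorted(set(widths)):
        print(f"x{w}: ~{bandwidth_gb_per_s(w):.1f} GB/s per direction")
```

         | Under these assumptions seven slots yield eight links, with
         | the two cards on the split slot getting half the theoretical
         | bandwidth of the others.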
        
       | LetsGetTechnicl wrote:
       | Just an eye-watering amount of compute, electricity and money
       | just to run LLMs... this is insane. Very cool though!
        
       | elorant wrote:
       | The motherboard has 7 PCIe slots and there are 8 GPUs. So where
       | does the spare one connect to? Is he using two GPUs in the same
       | slot, limiting the bandwidth?
        
         | ganoushoreilly wrote:
         | May be using an NVMe-to-PCIe adapter, common in the crypto
         | mining world.
        
       ___________________________________________________________________
       (page generated 2024-09-08 23:00 UTC)