[HN Gopher] Building an AI server on a budget
___________________________________________________________________
Building an AI server on a budget
Author : mful
Score : 58 points
Date : 2025-06-06 02:33 UTC (2 days ago)
(HTM) web link (www.informationga.in)
(TXT) w3m dump (www.informationga.in)
| vunderba wrote:
| The RTX market is particularly irritating right now; even second-
| hand 4090s are still going for MSRP, if you can find them at all.
|
| Most of the recommendations for this budget AI system are on
| point - the only thing I'd add is more RAM. 32GB is not a lot,
| particularly if you start to load larger models in formats such
| as GGUF and want to take advantage of system RAM to split the
| layers at the cost of inference speed. I'd recommend at least
| _2 x 32GB_ or even _4 x 32GB_ if you can swing it budget-wise.
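|
| For what it's worth, a minimal sketch of that layer split,
| assuming the llama-cpp-python bindings (the model path and layer
| count are placeholders to tune against your own VRAM):
|
|     from llama_cpp import Llama
|
|     # Offload only as many layers as fit in 12GB of VRAM; the
|     # rest stay in system RAM, at the cost of inference speed.
|     llm = Llama(
|         model_path="models/some-model-q4_k_m.gguf",  # placeholder
|         n_gpu_layers=28,  # tune until VRAM is nearly full
|         n_ctx=8192,       # context also consumes memory
|     )
|     out = llm("Q: Why buy more system RAM?\nA:", max_tokens=64)
|     print(out["choices"][0]["text"])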
|
| Author mentioned using Claude for recommendations, but another
| great resource for building machines is PC Part Picker. They'll
| even show warnings if you try pairing incompatible parts or try
| to use a PSU that won't supply the minimum recommended power.
|
| https://pcpartpicker.com
| uniposterz wrote:
| I had a similar setup for a local LLM, 32GB was not enough. I
| recommend going for 64GB.
| golly_ned wrote:
| Whenever I get to a section that was clearly autogenerated by an
| LLM I lose interest in the entire article. Suddenly the entire
| thing is suspect and I feel like I'm wasting my time, since I'm
| no longer encountering the mind of another person, just
| interacting with a system.
| bravesoul2 wrote:
| I didn't see anything like that here. Yeah they used bullets.
| golly_ned wrote:
| There's a section that lists the parts of a PC and explains
| what each part is.
| Nevermark wrote:
| > I used the AI-generated recommendations as a starting
| point, and refined the options with my own research.
|
| Referring to this section?
|
| I don't see a problem with that. This isn't an article
| about a design intended for 10,000 systems. Just one
| person's follow through on an interesting project. With
| disclosure of methodology.
| throwaway314155 wrote:
| Eh, yeah - the article starts off pretty specific but then gets
| into the weeds of stuff like how to put your PC together, which
| is far from novel information and certainly not on-topic in my
| opinion.
| 7speter wrote:
| I dunno everyone, but I think Intel has something big on their
| hands with their announced workstation GPUs. The B50 is a low-
| profile card that doesn't have a power supply hookup because it
| only uses something like 60 watts, and comes with 16GB of VRAM at
| an MSRP of 300 dollars.
|
| I imagine companies will have first dibs via the likes of
| agreements with suppliers like CDW, etc., but if Intel has enough
| of these Battlemage dies accumulated, it could also drastically
| change the local AI enthusiast/hobbyist landscape; for starters,
| this could drive down the price of workstation cards that are
| ideal for inference, at the very least. I'm cautiously excited.
|
| On the AMD front (really, a sort of open-compute front), Vulkan
| Kompute is picking up steam, and it would be really cool to have
| a standard that mostly(?) ships with Linux, with older ports
| available for FreeBSD, so that we can actually run free-as-in-
| freedom inference locally.
| Uehreka wrote:
| Love the attention to detail, I can tell this was a lot of work
| to put together and I hope it helps people new to PC building.
|
| I will note though, 12GB of VRAM and 32GB of system RAM is a
| ceiling you're going to hit pretty quickly if you're into messing
| with LLMs. There's basically no way to do a better job at the
| budget you're working with though.
|
| One thing I hear about a lot is people using things like RunPod
| to briefly get access to powerful GPUs/servers when they need
| one. If you spend $2/hr you can get access to an H100. If you
| have a budget of $1300 that could get you about 600 hours of
| compute time, which (unless you're doing training runs) should
| last you several months.
|
| In several months time the specs required to run good models will
| be different again in ways that are hard to predict, so this
| approach can help save on the heartbreak of buying an RTX 5090
| only to find that even that doesn't help much with LLM inference
| and we're all gonna need the cheaper-but-more-VRAM Intel Arc
| B60s.
| semi-extrinsic wrote:
| > save on the heartbreak of buying an RTX 5090 only to find
| that even that doesn't help much with LLM inference and we're
| all gonna need the cheaper-but-more-VRAM Intel Arc B60s
|
| When going for more VRAM, with an RTX 5090 currently sitting at
| $3000 for 32GB, I'm curious why people aren't trying to get the
| Dell C4140s. Those seem to go for $3000-$4000 for the whole
| server with 4x V100 16GB, so 64GB total VRAM.
|
| Maybe it's just because they produce heat and noise like a
| small turbojet.
| Jedd wrote:
| In January 2024 there was a similar post (
| https://news.ycombinator.com/item?id=38985152 ) wherein the
| author selected dual NVidia 4060 Ti's for an at-home-LLM-with-
| voice-control -- because they were the cheapest cost per GB of
| well-supported VRAM at the time.
|
| (They probably still are, or at least pretty close to it.)
|
| That informed my decision shortly after, when I built something
| similar - that video card model was widely panned by gamers (or
| more accurately, gamer 'influencers'), but it was an excellent
| choice if you wanted 16GB of VRAM with relatively low power draw
| (150W peak).
|
| TFA doesn't say where they are, or what currency they're using
| (which implies the hubris of a North American) - at which point
| that pricing for a second hand, smaller-capacity, higher-power-
| drawing 4070 just seems weird.
|
| Appreciate the 'on a budget' aspect; it just seems like an
| objectively worse path, as upgrades are going to require
| replacement rather than augmentation.
|
| As per other comments here, 32GB of RAM / 12GB of VRAM is going
| to be _really_ limiting. Yes - lower-parameter / smaller-quant
| models are becoming more capable, but at the same time we're
| seeing increasing interest in larger context for these at-home
| use cases, and that chews up memory real fast.
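|
| To put a number on "real fast", here's a rough KV-cache estimate
| in Python (the layer/head/dim values are illustrative, not any
| specific model):
|
|     # KV cache bytes = 2 (K and V) * layers * kv_heads * head_dim
|     #                  * bytes_per_value * context_length
|     layers, kv_heads, head_dim = 40, 8, 128  # hypothetical model
|     bytes_per_value = 2                      # fp16 cache
|     for ctx in (8_192, 32_768, 131_072):
|         gb = (2 * layers * kv_heads * head_dim
|               * bytes_per_value * ctx / 1e9)
|         print(f"{ctx:>7} tokens of context: ~{gb:.1f} GB KV cache")
|
| That is on top of the weights, and it has to come out of the same
| 12GB of VRAM (or spill into system RAM).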
| throwaway314155 wrote:
| > which implies the hubris of a North American
|
| No need for that.
| topato wrote:
| True, though
| topato wrote:
| He did soften the blow by saying North American, rather than
| the more apropos American
| T-A wrote:
| > TFA doesn't say where they are
|
| "the 1,440W limit on wall outlets in California" is a pretty
| good hint.
| rcarmo wrote:
| The trouble with these things is that "on a budget" doesn't
| deliver much when most interesting and truly useful models are
| creeping beyond the 16GB VRAM limit and/or require a lot of
| wattage. Even a Mac mini with enough RAM is starting to look like
| an expensive proposition, and the AMD Strix Halo APUs (the SKUs
| that matter, like the Framework Desktop at 128GB) are around $2K.
|
| As someone who built a period-equivalent rig (with a 12GB 3060
| and 128GB RAM) a few years ago, I am not overly optimistic that
| local models will keep being a cheap alternative (never mind the
| geopolitics). And yeah, there are very cheap ways to run
| inference, but they become pointless - I can run Qwen and Phi4
| locally on an ARM chip like the RK3588, but it is still dog slow.
| v5v3 wrote:
| I thought prevailing wisdom was that a used 3090 with its larger
| VRAM was the best budget GPU choice?
|
| And in general, if on a budget then why not buy used and not new?
| And more so as the author himself talks about the resale value
| for when he sells it on.
| olowe wrote:
| > I thought prevailing wisdom was that a used 3090 with its
| larger VRAM was the best budget GPU choice?
|
| The trick is that memory bandwidth - not just the amount of VRAM -
| is important for LLM inference. For example, the B50 specs list
| a memory bandwidth of 224 GB/s [1], whereas the Nvidia RTX 3090
| has over 900GB/s [2]. The 4070's bandwidth is "just" 500GB/s
| [3].
|
| More VRAM helps run larger models, but with lower bandwidth
| tokens may be generated so slowly that it's not really practical
| for day-to-day use or experimenting.
|
| [1]:
| https://www.intel.com/content/www/us/en/products/sku/242615/...
|
| [2]: https://www.techpowerup.com/gpu-specs/geforce-
| rtx-3090.c3622
|
| [3]: https://www.thefpsreview.com/gpu-family/nvidia-geforce-
| rtx-4...
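|
| As a rough upper bound: at batch size 1, generating each token has
| to stream essentially all of the model's weights out of VRAM, so
| tokens/s is roughly bandwidth divided by model size. A sketch
| using the figures above (real-world throughput will be lower):
|
|     # Ceiling: tokens/s ~= memory bandwidth / model size, since
|     # each token reads (almost) every weight once at batch 1.
|     model_gb = 8.0  # e.g. a ~13B model at 4-5 bit quantization
|     for name, bw_gb_s in [("Arc B50", 224), ("RTX 4070", 500),
|                           ("RTX 3090", 900)]:
|         print(f"{name}: ~{bw_gb_s / model_gb:.0f} tok/s ceiling")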
| lelanthran wrote:
| > The trick is that memory bandwidth - not just the amount of VRAM
| - is important for LLM inference.
|
| I'm not really knowledgeable about this space, so maybe I'm
| missing something:
|
| Why does the bus performance affect token generation? I would
| expect it to cause a slow startup when loading the model, but
| once the model is loaded, just how much bandwidth can the
| token generation possibly use?
|
| Token generation is completely on the card using the memory
| _on the card_ , without any bus IO at all, no?
|
| IOW, I'm trying to think of what IO the card is going to need
| for token generation, and I can't think of anything other than
| returning the tokens (which, even on a slow 100 MB/s transfer,
| is still going to be about 100x the rate at which tokens are
| being generated).
| retinaros wrote:
| yes it is
| politelemon wrote:
| If the author is reading this, I'll point out that the CUDA
| toolkit you find in the repositories is generally older. You can
| find the latest versions straight from Nvidia:
| https://developer.nvidia.com/cuda-downloads?target_os=Linux&...
|
| The caveat is that sometimes a library might be expecting an
| older version of CUDA.
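|
| A quick way to check which CUDA build your Python stack actually
| sees (assuming PyTorch is installed; other frameworks have
| similar checks):
|
|     import torch
|
|     # Version skew between the distro toolkit and the nvidia.com
|     # packages usually shows up right here.
|     print("CUDA available:", torch.cuda.is_available())
|     print("Built against CUDA:", torch.version.cuda)
|     if torch.cuda.is_available():
|         print("GPU:", torch.cuda.get_device_name(0))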
|
| The VRAM on the GPU does make a difference, so it would at some
| point be worth looking at another GPU or increasing your system
| RAM if you start running into limits.
|
| However, I wouldn't worry too much right away; it's more
| important to get started, get an understanding of how these local
| LLMs operate, and take advantage of the optimisations the
| community is making to make them more accessible. Not everyone
| has a 5090, and if local LLMs remain in the realm of high-end
| hardware, they're not worth the time.
| throwaway314155 wrote:
| The other main caveat is that installing from custom sources
| using apt is a massive pain in the ass.
| burnt-resistor wrote:
| Reminds me of https://cr.yp.to/hardware/build-20090123.html
|
| I'll be that guy(tm) that says if you're going to do any
| computing half-way reliably, only use ECC RAM. Silent bit flips
| suck.
| DogRunner wrote:
| I used a similar budget and built something like this:
|
| 7x RTX 3060 12GB, which results in 84GB of VRAM, and an AMD
| Ryzen 5 5500GT with 32GB of RAM.
|
| All in a 19-inch rack with a nice cooling solution and a beefy
| power supply.
|
| My costs? 1300 Euro, but yeah, I sourced my parts on ebay /
| second hand.
|
| (Added some 3d printed parts into the mix:
| https://www.printables.com/model/1142963-inter-tech-and-gene...
| https://www.printables.com/model/1142973-120mm-5mm-rised-noc...
| https://www.printables.com/model/1142962-cable-management-fu...
| if you think about building something similar)
|
| My power consumption is below 500 Watts at the wall when using
| LLMs, since I did some optimizations:
|
| * Worked on power optimizations; after many weeks of
| benchmarking, the sweet spot on the RTX 3060 12GB cards is a 105
| Watt limit (see the sketch after this list)
|
| * Created patches for Ollama (
| https://github.com/ollama/ollama/pull/10678) to group models onto
| exactly the GPUs their memory allocation needs instead of
| spreading them over all available GPUs (this also reduces the
| VRAM overhead)
|
| * ensured that ASPM is used on all relevant PCI components
| (Powertop is your friend)
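|
| The sketch mentioned above, using the nvidia-ml-py (pynvml)
| bindings - it needs root, and exact call names may differ between
| versions, so treat it as a starting point rather than gospel:
|
|     import pynvml
|
|     pynvml.nvmlInit()
|     for i in range(pynvml.nvmlDeviceGetCount()):
|         h = pynvml.nvmlDeviceGetHandleByIndex(i)
|         lo, hi = \
|             pynvml.nvmlDeviceGetPowerManagementLimitConstraints(h)
|         # NVML works in milliwatts; clamp 105W to the legal range
|         target = max(lo, min(hi, 105_000))
|         pynvml.nvmlDeviceSetPowerManagementLimit(h, target)
|         print(i, pynvml.nvmlDeviceGetName(h), target // 1000, "W")
|     pynvml.nvmlShutdown()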
|
| It's not all shiny:
|
| * I still use PCIe 3.0 x1 for most of the cards, which limits
| their capability, but everything I've found so far (PCIe Gen4 x4
| extenders and bifurcation/special PCIe routers) is just too
| expensive to be used on such low-powered cards
|
| * Due to the slow PCIe bandwidth, the performance drops
| significantly
|
| * Max VRAM per GPU is king. If you split up a model over several
| cards, the RAM allocation overhead is huge! (See the examples in
| my Ollama patch above.) I would rather use 3x 48GB instead of 7x
| 12GB.
|
| * Some RTX 3060 12GB cards idle at 11-15 Watts, which is
| unacceptable. Good BIOSes, like the one from Gigabyte (Windforce
| xxx), idle at 3 Watts, which is a huge difference when you use 7
| or more cards. These BIOSes can be patched, but this can be risky
|
| All in all, this server currently idles at 90-100 Watts, which is
| perfect as a central service for my tinkerings and my family's
| usage.
| incomingpain wrote:
| I've been dreaming on pcpartpicker.
|
| I think the Radeon RX 7900 XT - 20 GB has been the best bang for
| your buck. Enables running a 32B model fully on GPU?
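|
| Rough math on that, weights only (KV cache and runtime overhead
| come on top):
|
|     # Weight memory for a ~32B-parameter model at common quants,
|     # using approximate effective bits per weight.
|     params = 32e9
|     for name, bpw in [("~8-bit", 8.5), ("~5-bit", 5.5),
|                       ("~4-bit", 4.5)]:
|         gb = params * bpw / 8 / 1e9
|         print(f"{name}: ~{gb:.0f} GB of weights")
|
| So only the ~4-bit quants squeeze under 20 GB, and the KV cache
| still has to fit in whatever is left.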
|
| Looking at what other people have been doing lately, they aren't
| doing this.
|
| They are getting 64+ core CPUs and 512GB of RAM, keeping it on
| the CPU and enabling massive models. That setup lets you run
| DeepSeek 671B.
|
| It makes me wonder, how much better is 671B vs 32B?
| djhworld wrote:
| With system builds like this I always feel the VRAM is the
| limiting factor when it comes to what models you can run, and
| consumer-grade stuff tends to max out at 16GB or (sometimes)
| 24GB on the more expensive models.
|
| It does make me wonder whether we'll start to see more and more
| computers with a unified memory architecture (like the Mac) - I
| know Nvidia has the Digits thing, which has been renamed to
| something else
| JKCalhoun wrote:
| Go with a server GPU (Tesla), and 24 GB is not unusual. (And
| also about $300 used on eBay.)
| atentaten wrote:
| Enjoyed the article as I am interested in the same. I would like
| to have seen more about the specific use cases and how they
| performed on the rig.
| ww520 wrote:
| I use a 10-year-old laptop to run a local LLM. The time between
| prompts is 10-30 seconds. Not for speedy interactive usage.
| JKCalhoun wrote:
| Someone posted that they had used a "mining rig" [0] from
| AliExpress for less than $100. It even has RAM and a CPU. He
| picked up a 2000W (!) DELL server PS for cheap off eBay. The GPUs
| were NVIDIA TESLAs (M40 for example) since they often have a lot
| of RAM and are less expensive.
|
| I followed in those footsteps to create my own [1] (photo [2]).
|
| I picked up a 24GB M40 for around $300 off eBay. I 3D printed a
| "cowl" for the GPU that I found online and picked up two small
| fans from Amazon that go in the cowl. Attached, the cowl + fans
| keep the GPU cool. (These Tesla server GPUs have no fan since
| they're expected to live in one of those wind tunnels called a
| server rack.)
|
| I bought the same cheap DELL server PS that the original person
| had used and I also had to get a break-out board (and power-
| supply cables and adapters) for the GPU.
|
| Thanks to LLMs, I was able to successfully install Rocky Linux as
| well as CUDA and NVIDIA drivers. I SSH into it and run ollama
| commands.
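|
| For what it's worth, once Ollama is running you don't even need
| to SSH in for every prompt: it exposes an HTTP API on port 11434
| (you may need to set OLLAMA_HOST so it listens beyond localhost;
| the hostname and model name below are placeholders):
|
|     import json, urllib.request
|
|     # Ask the Ollama server on the rig for a completion remotely.
|     req = urllib.request.Request(
|         "http://tesla-box.local:11434/api/generate",  # placeholder
|         data=json.dumps({"model": "llama3", "prompt": "hello",
|                          "stream": False}).encode(),
|         headers={"Content-Type": "application/json"},
|     )
|     with urllib.request.urlopen(req) as resp:
|         print(json.load(resp)["response"])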
|
| My own hurdle at this point is: I have a 2nd 24 GB M40 TESLA but
| when installed on the motherboard, Linux will not boot. LLMs are
| helping me try to set up BIOS correctly or otherwise determine
| what the issue is. (We'll see.) I would love to get to 48 GB.
|
| [0] https://www.aliexpress.us/item/3256806580127486.html
|
| [1]
| https://bsky.app/profile/engineersneedart.com/post/3lmg4kiz4...
|
| [2]
| https://cdn.bsky.app/img/feed_fullsize/plain/did:plc:oxjqlam...
| iJohnDoe wrote:
| Details about the ML software or AI software?
| jacekm wrote:
| For $100 more you could get a used 3090 with twice as much VRAM.
| You could also get a 4060 Ti, which is cheaper than the 4070 and
| has 16 GB of VRAM (although it's less powerful too, so I guess it
| depends on the use case).
| msp26 wrote:
| > 12GB vram
|
| waste of effort, why would you go through the trouble of building
| + blogging for this?
| pshirshov wrote:
| A 3090 for ~1000 is a much more solid choice. Also, these old
| mining mobos play very well for multi-GPU Ollama.
| usercvapp wrote:
| I have a server at home that has been sitting idle for the last
| 2 years, with 2 TB of RAM and 4 CPUs.
|
| I am gonna push it this week and launch some LLM models to see
| how they perform!
|
| How efficient are they to run locally, in terms of the electric
| bill?
| T-A wrote:
| I would consider adding $400 for something like this instead:
|
| https://www.bosgamepc.com/products/bosgame-m5-ai-mini-deskto...
___________________________________________________________________
(page generated 2025-06-08 23:00 UTC)