[HN Gopher] Benchmark Framework Desktop Mainboard and 4-node cluster
___________________________________________________________________
Benchmark Framework Desktop Mainboard and 4-node cluster
Author : geerlingguy
Score : 121 points
Date : 2025-08-07 17:49 UTC (5 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| jeffbee wrote:
| I had been hoping that these would be a bit faster than the 9950X
| because of the different memory architecture, but it appears that
| due to the lower-power design point, the AI Max+ 395 loses across
| the board, by large margins. So I guess these really are niche
| products for ML users only, and people with generic workloads who
| want more than the 9950X offers are shopping for a Threadripper.
| dijit wrote:
| Sounds about right.
|
| I'm struggling to justify the cost of a Threadripper (let alone
| a Pro!) for a AAA game studio, though.
|
| I wonder who can justify these machines. High-frequency
| trading? Data science? Shouldn't that be done on servers?
| jeffbee wrote:
| Yeah, I don't get it either. To get marginally more resources
| than the 9950X you have to make a significant leap in price
| to a $1500+ CPU on a $1000 motherboard.
| kadoban wrote:
| Threadripper very rarely seems to make any sense. The only
| times it seems like you want it are for huge memory
| support/bandwidth and/or a huge number of PCIe slots. But
| it's not cheap or supported enough compared to EPYC to really
| make sense to me any time I've been speccing out a system
| along those lines.
| StrangeDoctor wrote:
| I bought a Threadripper Pro system out of desperation,
| trying to get secondhand PCIe 80GB A100s to run locally. The
| huge ReBAR allocations confused/crashed every Intel/AMD
| system I had access to.
|
| I think the Xeon systems should have worked and that it was
| actually a motherboard BIOS issue, but I had seen a photo
| of it running in a Threadripper and prayed I wasn't digging
| an even deeper hole.
| rtkwe wrote:
| It also seems like the tools aren't there to fully utilize
| them. Unless I misunderstood, he was running CPU-only for
| all the tests, so there's still iGPU and NPU performance
| that hasn't been utilized in these tests.
| geerlingguy wrote:
| No, only a couple of initial tests with Ollama used the CPU. I
| ran most tests on Vulkan / iGPU, and some on ROCm (read further
| down the thread).
|
| I found it difficult to install ROCm on Fedora 42, but after
| upgrading to Rawhide it was easy, so I re-tested everything
| with ROCm vs Vulkan.
|
| Ollama, for some silly reason, doesn't support Vulkan, even
| though I've used a fork many times to get full GPU
| acceleration with it on Pi, Ampere, and even this AMD
| system... (moral of the story: just stick with llama.cpp).
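|
| For reference, a typical llama.cpp build and benchmark run on
| the Vulkan backend looks something like this (the model file
| here is illustrative):
|
|   cmake -B build -DGGML_VULKAN=ON
|   cmake --build build --config Release
|   ./build/bin/llama-bench -m llama-3.1-8b-q4_k_m.gguf -ngl 99
|
| -ngl 99 offloads every layer to the iGPU, and llama-bench
| reports the pp512 (prompt processing) and tg128 (token
| generation) figures quoted in benchmarks like these.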
| edwinjones wrote:
| Sadly, the reason they give is subjectively terrible:
|
| https://x.com/ollama/status/1952783981000446029
|
| No experimental flag option, no "you can use the fork that
| works fine, but we don't have capacity to support it," just
| a hard "no, we think it's unreliable." I guess they just
| want you to drop them and use llama.cpp.
| geerlingguy wrote:
| Yeah, my conspiracy theory is that Nvidia is somehow
| influencing the decision. If you can use Vulkan with
| Ollama, it opens people up to using Intel/AMD/other iGPUs,
| and you might not be incentivized to buy an Nvidia GPU.
|
| ROCm support is not wonderful. It's certainly worse for
| an end user to deal with than Vulkan, which usually 'just
| works'.
| edwinjones wrote:
| I agree. AMD should just go all-in on Vulkan, I think. The
| ROCm compatibility list is terrible compared to Vulkan's:
| every modern device, and probably some ancient GPUs as
| well, can be made to work with Vulkan.
|
| Considering they created Mantle, you would think it would
| be the obvious move too.
| jcastro wrote:
| Hi Jeff, I'm a Linux ambassador for Framework, and I have
| one of these units. It'd be interesting if you would
| install RamaLama on Fedora and test that. I've been using
| it as a drop-in replacement for Ollama, and everything was
| GPU-accelerated out of the box. It pulls ROCm from a
| container and just figures it out, etc. Would love to see
| actual numbers, though.
|
| Great work on this!
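|
| A minimal sketch of what that test could look like (the model
| name is illustrative; RamaLama detects the GPU and pulls a
| matching runtime container):
|
|   pip install ramalama
|   ramalama run llama3.2      # chat, GPU-accelerated
|   ramalama bench llama3.2    # throughput numbers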
| mhitza wrote:
| I've run a comparison benchmark for the smaller models:
| https://gist.github.com/mhitza/f5a8eeb298feb239de10f9f60f841...
|
| I'm comparing it against the RTX 4000 SFF Ada (20GB), which is
| around $1.2k (if you believe the original price on the Nvidia
| website
| https://marketplace.nvidia.com/en-us/enterprise/laptops-work...)
| and which I have access to on a Hetzner GEX44.
|
| I'm going to ballpark it at 2.5-3x faster than the desktop,
| except for the tg128 test, where the difference is "minimal"
| (but I didn't do the math).
| yencabulator wrote:
| The whole point of these integrated memory designs is to go
| beyond that 20 GB of VRAM.
| reissbaker wrote:
| Thanks for the excellent writeup. I'm pleasantly surprised that
| ROCm worked as well as it did -- for the price, these aren't bad
| for LLM workloads and some moderate gaming. (Apple is probably
| still the king of affordable at-home inference, but for games...
| Macs are amazing these days, but Linux is so much better.)
| mulmen wrote:
| I switched to Fedora Sway as my daily driver nearly two years
| ago. A Windows title wasn't working on my brand new PC. I
| switched to Steam+Proton+Fedora and it worked immediately.
| Valve now offers a more stable and complete Windows API through
| Proton than Microsoft does through Windows itself.
| xemdetia wrote:
| I was about to be annoyed until you said you got preprod units. I
| guess I'll have to build on this when my desktop shows up.
| iamtheworstdev wrote:
| For those who are already in the field and doing these things:
| if I wanted to start running my own local LLM, should I find an
| Nvidia 5080 GPU for my current desktop, or is it worth trying
| one of these Framework AMD desktops?
| wmf wrote:
| If you think the future is small models (27B), get Nvidia; if
| you think larger models (70-120B) are worth it, then you need
| AMD or Apple.
| yencabulator wrote:
| I wonder how much MoE will disrupt this. qwen3:30b-a3b is pretty
| good even on pure CPU, yet a lot smarter than a 3B-parameter
| model. If the CPU-GPU bottleneck isn't too tight, a large model
| might be able to sustainably cache the currently active experts
| in GPU RAM.
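|
| The practical version of that today is the inverse: llama.cpp's
| tensor overrides can pin the large per-expert FFN weights to
| system RAM while attention and shared weights stay on the GPU.
| A sketch (model file illustrative; the regex matches GGUF
| expert tensor names):
|
|   ./build/bin/llama-cli -m qwen3-30b-a3b-q4_k_m.gguf -ngl 99 \
|       --override-tensor "\.ffn_.*_exps\.=CPU"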
| loudmax wrote:
| The short answer is that the best value is a used RTX 3090 (the
| long answer being, naturally, it depends). Most of the time,
| the bottleneck for running LLMs on consumer grade equipment is
| memory and memory bandwidth. A 3090 has 24GB of VRAM, while a
| 5080 only has 16GB of VRAM. For models that can fit inside 16GB
| of VRAM, the 5080 will certainly be faster than the 3090, but
| the 3090 can run models that simply won't fit on a 5080. You
| can offload part of the model onto the CPU and system RAM, but
| running a model on a desktop CPU is an enormous drag, even when
| only partially offloaded.
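|
| In llama.cpp terms, partial offload is just capping the GPU
| layer count; a sketch with an illustrative model and layer
| count:
|
|   ./build/bin/llama-cli -m llama-3.3-70b-q4_k_m.gguf -ngl 40
|
| -ngl sets how many layers stay in VRAM; the rest run on the
| CPU, which is where the slowdown comes from.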
|
| Obviously an RTX 5090 with 32GB of VRAM is even better, but
| they cost around $2000, if you can find one.
|
| What's interesting about this Strix Halo system is that it has
| 128GB of RAM that is accessible (or mostly accessible) to the
| CPU/GPU/APU. This means that you can run much larger models on
| this system than you possibly could on a 3090, or even a 5090.
| The performance tests tend to show that the Strix Halo's memory
| bandwidth is a significant bottleneck though. This system might
| be the most affordable way of running 100GB+ models, but it
| won't be fast.
| Havoc wrote:
| Jeff - check out the distributed-llama project... you should be
| able to distribute over the entire cluster.
| burnte wrote:
| He mentioned that in the video.
| yjftsjthsd-h wrote:
| https://github.com/b4rtaz/distributed-llama ?
| geerlingguy wrote:
| I've been testing Exo (seems dead), llama.cpp RPC (which has a
| lot of performance limitations), and distributed-llama (faster,
| but with some Vulkan quirks, and it only works with a few
| models).
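|
| For anyone curious, the llama.cpp RPC setup looks roughly like
| this (hostnames and model are illustrative; build llama.cpp
| with -DGGML_RPC=ON first):
|
|   # on each worker node:
|   ./build/bin/rpc-server --host 0.0.0.0 --port 50052
|
|   # on the head node:
|   ./build/bin/llama-cli -m model.gguf -ngl 99 \
|       --rpc node1:50052,node2:50052,node3:50052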
|
| See my AI cluster automation setup here:
| https://github.com/geerlingguy/beowulf-ai-cluster
|
| I was building that through the course of making this video,
| because it's insane how much manual labor people put into
| building home AI clusters :D
| jvanderbot wrote:
| So, TL;DR?
|
| I saw mixed results but comments suggest very good performance
| relative to other at-home setups. Can someone summarize?
| geerlingguy wrote:
| I put most of the top-line numbers and some graphs on my blog:
| https://www.jeffgeerling.com/blog/2025/i-clustered-four-fram...
| jvanderbot wrote:
| Great! As always, fantastic writeup.
___________________________________________________________________
(page generated 2025-08-07 23:00 UTC)