[HN Gopher] Benchmark Framework Desktop Mainboard and 4-node clu...
       ___________________________________________________________________
        
       Benchmark Framework Desktop Mainboard and 4-node cluster
        
       Author : geerlingguy
       Score  : 121 points
       Date   : 2025-08-07 17:49 UTC (5 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | jeffbee wrote:
       | I had been hoping that these would be a bit faster than the 9950X
       | because of the different memory architecture, but it appears that
        | due to the lower-power design point, the AI Max+ 395 loses across
        | the board, by large margins. So I guess these really are niche
        | products for ML users only, and people with generic workloads
        | who want more than the 9950X offers are shopping for a
       | Threadripper.
        
         | dijit wrote:
         | Sounds about right.
         | 
          | I'm struggling to justify the cost of a Threadripper (let alone
          | a Pro!) for an AAA game studio, though.
         | 
          | I wonder who can justify these machines. High-frequency
          | trading? Data science? Shouldn't that be done on servers?
        
           | jeffbee wrote:
           | Yeah I don't get it either. To get marginally more resources
           | than the 9950X you have to make a significant leap in price
           | to a $1500+ CPU on a $1000 motherboard.
        
           | kadoban wrote:
            | Threadripper very rarely seems to make any sense. The only
            | times it seems like you want it are for huge memory
            | support/bandwidth and/or a huge number of PCIe slots. But
            | it's not cheap or well-supported enough compared to EPYC to
            | really make sense to me any time I've been speccing out a
            | system along those lines.
        
             | StrangeDoctor wrote:
              | I bought a Threadripper Pro system out of desperation,
              | trying to get secondhand PCIe 80GB A100s to run locally. The
              | huge ReBAR allocations confused/crashed every Intel/AMD
              | system I had access to.
              | 
              | I think the Xeon systems should have worked and that it was
              | actually a motherboard BIOS issue, but I had seen a photo
              | of it running in a Threadripper and prayed I wasn't digging
             | an even deeper hole.
        
         | rtkwe wrote:
          | It also seems like the tools aren't there to fully utilize
          | them. Unless I misunderstood, he was running CPU-only for all
          | the tests, so there's still iGPU and NPU performance that
          | hasn't been utilized in these tests.
        
           | geerlingguy wrote:
            | No, only a couple of initial tests with Ollama used CPU. I ran
           | most tests on Vulkan / iGPU, and some on ROCm (read further
           | down the thread).
           | 
           | I found it difficult to install ROCm on Fedora 42 but after
           | upgrading to Rawhide it was easy, so I re-tested everything
           | with ROCm vs Vulkan.
           | 
           | Ollama, for some silly reason, doesn't support Vulkan even
           | though I've used a fork many times to get full GPU
           | acceleration with it on Pi, Ampere, and even this AMD
            | system... (moral of the story: just stick with llama.cpp).
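            | 
            | If anyone wants to reproduce the backend comparison, here's a
            | minimal sketch (not my exact harness; it assumes llama.cpp
            | built twice, once with Vulkan and once with ROCm/HIP, and the
            | binary/model paths below are placeholders for your own):
            | 
            |   # compare_backends.py - llama-bench on two llama.cpp builds
            |   import subprocess
            | 
            |   MODEL = "models/Llama-3.1-8B-Q4_K_M.gguf"  # placeholder
            |   BUILDS = {
            |       "vulkan": "./build-vulkan/bin/llama-bench",
            |       "rocm":   "./build-rocm/bin/llama-bench",
            |   }
            | 
            |   for name, binary in BUILDS.items():
            |       # pp512 / tg128 are llama-bench's standard prompt and
            |       # generation tests; -ngl 99 offloads all layers to GPU
            |       result = subprocess.run(
            |           [binary, "-m", MODEL, "-p", "512", "-n", "128",
            |            "-ngl", "99"],
            |           capture_output=True, text=True, check=True)
            |       print(f"=== {name} ===\n{result.stdout}")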
        
             | edwinjones wrote:
             | Sadly, the reason they give is subjectively terrible:
             | 
             | https://x.com/ollama/status/1952783981000446029
             | 
                | No experimental flag option, no "you can use the fork that
                | works fine but we don't have capacity to support this,"
                | just a hard "no, we think it's unreliable." I guess they just
             | want you to drop them and use llama.cpp.
        
               | geerlingguy wrote:
               | Yeah, my conspiracy theory is Nvidia is somehow
               | influencing the decision. If you can do Vulkan with
               | Ollama, it opens up people to using Intel/AMD/other iGPUs
               | and you might not be incentivized to buy an Nvidia GPU.
               | 
               | ROCm support is not wonderful. It's certainly worse for
               | an end user to deal with than Vulkan, which usually 'just
               | works'.
        
               | edwinjones wrote:
                | I agree. AMD should just go all in on Vulkan, I think. The
                | ROCm compatibility list is terrible compared to... every
                | modern device, and probably some ancient GPUs, that can be
                | made to work with Vulkan as well.
                | 
                | Considering they created Mantle, you would think it would
                | be the obvious move too.
        
             | jcastro wrote:
              | Hi Jeff, I'm a Linux ambassador for Framework and I have
              | one of these units. It'd be interesting if you would
              | install ramalama on Fedora and test that. I've been using
              | it as a drop-in replacement for Ollama, and everything was
              | GPU-accelerated out of the box. It pulls ROCm from a
              | container and just figures it out, etc. Would love to see
              | actual numbers though.
             | 
             | Great work on this!
        
       | mhitza wrote:
        | I've run a comparison benchmark for the smaller models:
       | https://gist.github.com/mhitza/f5a8eeb298feb239de10f9f60f841...
       | 
        | I compared it against the RTX 4000 SFF Ada (20GB), which I have
        | access to on a Hetzner GEX44 and which is around $1.2k (if you
        | believe the original price on the Nvidia website:
        | https://marketplace.nvidia.com/en-us/enterprise/laptops-work...).
        | 
        | I'd ballpark it at 2.5-3x faster than the desktop, except for the
        | tg128 test, where the difference is "minimal" (but I didn't do
        | the math).
        
         | yencabulator wrote:
         | The whole point of these integrated memory designs is to go
         | beyond that 20 GB VRAM.
        
       | reissbaker wrote:
       | Thanks for the excellent writeup. I'm pleasantly surprised that
       | ROCm worked as well as it did -- for the price these aren't bad
       | for LLM workloads and some moderate gaming. (Apple is probably
       | still the king of affordable at-home inference, but for games...
       | Amazing these days but Linux is so much better.)
        
         | mulmen wrote:
         | I switched to Fedora Sway as my daily driver nearly two years
         | ago. A Windows title wasn't working on my brand new PC. I
         | switched to Steam+Proton+Fedora and it worked immediately.
         | Valve now offers a more stable and complete Windows API through
         | Proton than Microsoft does through Windows itself.
        
       | xemdetia wrote:
       | I was about to be annoyed until you said you got preprod units. I
       | guess I'll have to build on this when my desktop shows up.
        
       | iamtheworstdev wrote:
        | For those who are already in the field and doing these things:
        | if I wanted to start running my own local LLM, should I find an
        | Nvidia 5080 GPU for my current desktop, or is it worth trying
        | one of these Framework AMD desktops?
        
         | wmf wrote:
          | If you think the future is small models (27B), get Nvidia; if
          | you think larger models (70-120B) are worth it, then you need
          | AMD or Apple.
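          | 
          | Rough back-of-the-envelope for why the cutoff lands where it
          | does (a sketch, assuming ~4.5 bits per weight for a Q4_K_M-ish
          | quant and ignoring KV cache / context overhead):
          | 
          |   # Approximate in-memory size of a ~4-bit quantized model
          |   BITS_PER_WEIGHT = 4.5  # assumption; varies by quant format
          | 
          |   for params_b in (27, 70, 120):
          |       gib = params_b * 1e9 * BITS_PER_WEIGHT / 8 / 2**30
          |       print(f"{params_b:>4}B params -> ~{gib:.0f} GiB of weights")
          | 
          |   #  27B -> ~14 GiB: fits a 24 GB card (tight on 16 GB)
          |   #  70B -> ~37 GiB: needs unified memory (Strix Halo, Apple)
          |   # 120B -> ~63 GiB: same, with room left over for context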
        
           | yencabulator wrote:
           | I wonder how much MoE will disrupt this. qwen3:30b-a3b is
           | pretty good even on pure CPU, but a lot smarter than a 3B
           | parameter model. If the CPU-GPU bottleneck isn't too tight, a
           | large model might be able to sustainably cache the currently
           | active experts in GPU RAM.
        
         | loudmax wrote:
         | The short answer is that the best value is a used RTX 3090 (the
         | long answer being, naturally, it depends). Most of the time,
         | the bottleneck for running LLMs on consumer grade equipment is
         | memory and memory bandwidth. A 3090 has 24GB of VRAM, while a
         | 5080 only has 16GB of VRAM. For models that can fit inside 16GB
         | of VRAM, the 5080 will certainly be faster than the 3090, but
         | the 3090 can run models that simply won't fit on a 5080. You
         | can offload part of the model onto the CPU and system RAM, but
         | running a model on a desktop CPU is an enormous drag, even when
         | only partially offloaded.
         | 
         | Obviously an RTX 5090 with 32GB of VRAM is even better, but
         | they cost around $2000, if you can find one.
         | 
         | What's interesting about this Strix Halo system is that it has
         | 128GB of RAM that is accessible (or mostly accessible) to the
         | CPU/GPU/APU. This means that you can run much larger models on
         | this system than you possibly could on a 3090, or even a 5090.
         | The performance tests tend to show that the Strix Halo's memory
         | bandwidth is a significant bottleneck though. This system might
         | be the most affordable way of running 100GB+ models, but it
         | won't be fast.
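          | 
          | To put "won't be fast" in rough numbers: token generation is
          | mostly bound by streaming the weights from memory, so a crude
          | ceiling is bandwidth divided by model size (a sketch; the
          | bandwidth figures are approximate peak specs, and real-world
          | throughput is lower):
          | 
          |   # Crude tok/s ceiling: each generated token reads roughly
          |   # the whole set of weights once, so tok/s <= bandwidth / size
          |   def ceiling(bandwidth_gb_s: float, model_gib: float) -> float:
          |       return bandwidth_gb_s / (model_gib * 1.074)  # GiB -> GB
          | 
          |   # ~936 GB/s for a 3090, ~256 GB/s for Strix Halo (assumed)
          |   print(ceiling(936, 13))   # ~67 tok/s: ~24B Q4 model on a 3090
          |   print(ceiling(256, 13))   # ~18 tok/s: same model on Strix Halo
          |   print(ceiling(256, 100))  # ~2.4 tok/s: a 100 GiB model that
          |                             # only fits in 128 GB unified memory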
        
       | Havoc wrote:
        | Jeff - check out the distributed-llama project... you should be
        | able to distribute over the entire cluster.
        
         | burnte wrote:
         | He mentioned that in the video.
        
         | yjftsjthsd-h wrote:
         | https://github.com/b4rtaz/distributed-llama ?
        
         | geerlingguy wrote:
         | I've been testing Exo (seems dead), llama.cpp RPC (has a lot of
         | performance limitations) and distributed-llama (faster but has
         | some Vulkan quirks and only works with a few models).
         | 
         | See my AI cluster automation setup here:
         | https://github.com/geerlingguy/beowulf-ai-cluster
         | 
         | I was building that through the course of making this video,
         | because it's insane how much manual labor people put into
         | building home AI clusters :D
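          | 
          | For reference, the llama.cpp RPC route looks roughly like this
          | (a naive sketch, not the playbooks in that repo; it assumes
          | llama.cpp was built with -DGGML_RPC=ON on every node, and the
          | hostnames and model path are placeholders):
          | 
          |   # rpc_cluster.py - start rpc-server on each worker over ssh,
          |   # then point llama-cli at all of them from the head node
          |   import subprocess
          | 
          |   WORKERS = ["fw2.local", "fw3.local", "fw4.local"]
          |   PORT = 50052
          |   BIN = "./llama.cpp/build/bin"
          | 
          |   for host in WORKERS:
          |       # naive: doesn't wait for the servers to come up
          |       subprocess.Popen(
          |           ["ssh", host, f"{BIN}/rpc-server -p {PORT}"])
          | 
          |   rpc = ",".join(f"{h}:{PORT}" for h in WORKERS)
          |   subprocess.run(
          |       [f"{BIN}/llama-cli",
          |        "-m", "models/Llama-3.3-70B-Q4_K_M.gguf",
          |        "--rpc", rpc, "-ngl", "99",
          |        "-p", "Hello", "-n", "128"], check=True)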
        
       | jvanderbot wrote:
       | So, TL;DR?
       | 
       | I saw mixed results but comments suggest very good performance
       | relative to other at-home setups. Can someone summarize?
        
         | geerlingguy wrote:
         | I put most of the top-line numbers and some graphs on my blog:
         | https://www.jeffgeerling.com/blog/2025/i-clustered-four-fram...
        
           | jvanderbot wrote:
            | Great! As always, fantastic writeup.
        
       ___________________________________________________________________
       (page generated 2025-08-07 23:00 UTC)