[HN Gopher] A Guide to Local Coding Models
___________________________________________________________________
A Guide to Local Coding Models
Author : mpweiher
Score : 85 points
Date : 2025-12-21 20:55 UTC (2 hours ago)
(HTM) web link (www.aiforswes.com)
(TXT) w3m dump (www.aiforswes.com)
| nzeid wrote:
| I appreciate the author's modesty but the flip-flopping was a
| little confusing. If I'm not mistaken, the conclusion is that by
| "self-hosting" you save money in all cases, but you cripple
| performance in scenarios where you need to squeeze out the kind
| of quality that requires hardware that's impractical to cobble
| together at home or within a laptop.
|
| I am still toying with the notion of assembling an LLM tower with
| a few old GPUs but I don't use LLMs enough at the moment to
| justify it.
| a_victorp wrote:
| If you ever do it, please make a guide! I've been toying with
| the same notion myself
| suprjami wrote:
| If you want to do it cheap, get a desktop motherboard with
| two PCIe slots and two GPUs.
|
| Cheap tier is dual 3060 12G. Runs 24B Q6 and 32B Q4 at 16
| tok/sec. The limitation is VRAM for large context. 1000 lines
| of code is ~20k tokens. 32k tokens is ~10G VRAM.
|
| Expensive tier is dual 3090 or 4090 or 5090. You'd be able to
| run 32B Q8 with large context, or a 70B Q6.
|
| For software, llama.cpp and llama-swap. GGUF models from
| HuggingFace. It just works.
|
| If you need more than that, you're into enterprise hardware
| with 4+ PCIe slots, which costs as much as a car and has the
| power consumption of a small country. You're better off just
| paying for Claude Code.
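|
| As a rough back-of-the-envelope check on that ~10G figure: the KV
| cache grows linearly with context. A minimal sketch in Python,
| where the architecture numbers (layers, KV heads, head dim) are
| assumptions for a typical 32B dense model rather than values from
| any specific checkpoint:
|
|     # Rough KV-cache size estimate; architecture numbers are
|     # assumed, not taken from a particular model card.
|     n_layers = 64        # transformer blocks
|     n_kv_heads = 8       # grouped-query attention KV heads
|     head_dim = 128       # dimension per attention head
|     bytes_per_elem = 2   # fp16 cache
|     ctx_tokens = 32_768  # desired context window
|
|     # 2x for keys and values
|     bytes_per_token = (2 * n_layers * n_kv_heads
|                        * head_dim * bytes_per_elem)
|     kv_gib = bytes_per_token * ctx_tokens / (1024 ** 3)
|     print(f"{bytes_per_token // 1024} KiB/token, "
|           f"{kv_gib:.1f} GiB at {ctx_tokens} tokens")
|
| That works out to roughly 8 GiB for the cache alone, which lines
| up with the ~10G figure once compute buffers are added on top.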
| satvikpendem wrote:
| Jeff Geerling has (not quite but sort of) guides:
| https://news.ycombinator.com/item?id=46338016
| cloudhead wrote:
| In my experience the latest models (Opus 4.5, GPT 5.2) are _just_
| starting to keep up with the problems I'm throwing at them, and I
| really wish they did a better job, so I think we're still 1-2
| years away from local models not wasting developer time outside
| of CRUD web apps.
| OptionOfT wrote:
| Eh, these things are trained on existing data. The further you
| are from that the worse the models get.
|
| I've noticed that I need to be a lot more specific in those
| cases, up to the point where being more specific is slowing me
| down, partially because I don't always know what the right
| thing is.
| simonw wrote:
| > I realized I looked at this more from the angle of a hobbyist
| paying for these coding tools. Someone doing little side projects
| --not someone in a production setting. I did this because I see a
| lot of people signing up for $100/mo or $200/mo coding
| subscriptions for personal projects when they likely don't need
| to.
|
| Are people really doing that?
|
| If that's you, know that you can get a LONG way on the $20/month
| plans from OpenAI and Anthropic. The OpenAI one in particular is
| a great deal, because Codex usage is charged at a much lower rate
| than Claude's.
|
| The time to cough up $100 or $200/month is when you've exhausted
| your $20/month quota and you are frustrated at getting cut off.
| At that point you should be able to make a responsible decision
| by yourself.
| hamdingers wrote:
| And as a hobbyist the time to sign up for the $20/month plan is
| after you've spent $20 on tokens at least a couple times.
|
| YMMV based on the kinds of side projects you do, but it's
| definitely been cheaper for me in the long run to pay by token,
| and the flexibility it offers is great.
| iOSThrowAway wrote:
| I spent $240 in one week through the API and realized the
| $20/month was a no-brainer.
| __mharrison__ wrote:
| I'm convinced the $20 gpt plus plan is the best plan right now.
| You can use Codex with gpt5.2. I've been very impressed with
| this.
|
| (I also have the same MBP the author has and have used Aider
| with Qwen locally.)
| baq wrote:
| bit the bullet this week and paid for a month of claude and a
| month of chatgpt plus. claude seems to have much lower token
| limits, both aggregate and rate-limited, and GPT-5.2 isn't a
| bad model at all. $20 for claude is not enough even for a
| hobby project (after one day!); openai looks like it might
| be.
| InsideOutSanta wrote:
| I feel like a lot of the criticism the GPT-5.x models
| receive only applies to specific use cases. I prefer these
| models over Anthropic's because they are less creative and
| less likely to take liberties when interpreting my prompts.
|
| Sonnet 4.5 is great for vibe coding. You can give it a
| relatively vague prompt and it will take the initiative to
| interpret it in a reasonable way. This is good for non-
| programmers who just want to give the model a vague idea
| and end up with a working, sensible product.
|
| But I usually do not want that; I do not want the model to
| take liberties and be creative. I want the model to do
| precisely what I tell it and nothing more. In my
| experience, the GPT-5.x models are a better fit for that way
| of working.
| wyre wrote:
| Me. Currently using Claude Max for personal coding projects.
| I've been on Claude's $20 plan and would run out of tokens. I
| don't want to give my money to OpenAI. So far these projects
| have not returned their value to me, but I am viewing it
| as an investment in learning best practices with these coding
| tools.
| satvikpendem wrote:
| > _If that's you, know that you can get a LONG way on the
| $20/month plans from OpenAI and Anthropic._
|
| > _The time to cough up $100 or $200/month is when you've
| exhausted your $20/month quota and you are frustrated at
| getting cut off. At that point you should be able to make a
| responsible decision by yourself._
|
| These are the same people, by and large. What I have seen is
| users who purely vibe code everything and run into the limits
| of the $20/m models and pay up for the more expensive ones.
| Essentially they're trading learning to code (and, in some
| cases, time; it's not always faster to vibe code than to do it
| yourself) for money.
| maddmann wrote:
| If this is the new way code is written then they are arguably
| learning how to code. The jury is still out, but I think
| you are being a bit dismissive.
| smcleod wrote:
| On a $20/mo plan doing any sort of agentic coding you'll hit
| the 5hr window limits in less than 20 minutes.
| andix wrote:
| It really depends. When building a lot of new features it
| happens quite fast. With some attention to context length I
| was often able to go for over an hour on the $20 Claude plan.
|
| If you're doing mostly smaller changes, you can go all day
| with the $20 Claude plan without hitting the limits.
| Especially if you need to thoroughly review the AI changes
| for correctness, instead of relying on automated tests.
| jwpapi wrote:
| Not everybody is broke.
| simonw wrote:
| This story talks about MLX and Ollama but doesn't mention LM
| Studio - https://lmstudio.ai/
|
| LM Studio can run both MLX and GGUF models but does so via an
| Ollama-style (but more full-featured) macOS GUI. They also have a
| very actively maintained model catalog at
| https://lmstudio.ai/models
| ZeroCool2u wrote:
| LMStudio is so much better than Ollama it's silly it's not more
| popular.
| thehamkercat wrote:
| LMStudio is not open source though, ollama is
|
| but people should use llama.cpp instead
| behnamoh wrote:
| > LMStudio is not open source though, ollama is
|
| and why should that affect usage? it's not like ollama
| users fork the repo before installing it.
| thehamkercat wrote:
| It was worth mentioning.
| smcleod wrote:
| I suspect Ollama is at least partly moving away from open
| source as they look to raise capital; when they released
| their replacement desktop app, they did so as closed source.
| You're absolutely right that people should be using
| llama.cpp - not only is it truly open source, it's also
| significantly faster, has better model support and many more
| features, is better maintained, and its development community
| is far more active.
| midius wrote:
| Makes me think it's a sponsored post.
| Cadwhisker wrote:
| LMStudio? No, it's the easiest way to run an LLM locally that
| I've seen to the point where I've stopped looking at other
| alternatives.
|
| It's cross-platform (Win/Mac/Linux), detects the most
| appropriate GPU in your system and tells you whether the
| model you want to download will run within its RAM
| footprint.
|
| It lets you set up a local server that you can access through
| API calls as if you were remotely connected to an online
| service.
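|
| A minimal sketch of what that looks like from code, using the
| standard OpenAI Python client pointed at the local server. The
| base URL and model name here are assumptions (LM Studio's server
| defaults to port 1234; check the app's server tab for yours):
|
|     from openai import OpenAI
|
|     # Point the client at the local server instead of api.openai.com
|     client = OpenAI(
|         base_url="http://localhost:1234/v1",
|         api_key="not-needed",  # local servers ignore the key
|     )
|
|     resp = client.chat.completions.create(
|         model="qwen2.5-coder-7b-instruct",  # whichever model you loaded
|         messages=[{"role": "user",
|                    "content": "Write a binary search in Python."}],
|     )
|     print(resp.choices[0].message.content)
|
| Because it speaks the same API shape, most tools that let you set
| a custom base URL will work against it unchanged.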
| vunderba wrote:
| FWIW, Ollama already does most of this:
|
| - Cross-platform
|
| - Sets up a local API server
|
| The tradeoff is a somewhat higher learning curve, since you
| need to manually browse the model library and choose the
| model/quantization that best fit your workflow and
| hardware. OTOH, it's also open-source unlike LMStudio which
| is proprietary.
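|
| For what the local API server looks like in practice, here's a
| minimal sketch against Ollama's default endpoint on port 11434.
| The model tag is an assumption; pull whatever you actually use
| first (e.g. ollama pull qwen2.5-coder:7b):
|
|     import requests
|
|     # Ollama's native generate endpoint; it also exposes an
|     # OpenAI-compatible API under /v1 on the same port.
|     resp = requests.post(
|         "http://localhost:11434/api/generate",
|         json={
|             "model": "qwen2.5-coder:7b",
|             "prompt": "Explain what a Q4_K_M quantization is.",
|             "stream": False,  # one JSON object instead of a stream
|         },
|         timeout=300,
|     )
|     print(resp.json()["response"])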
| randallsquared wrote:
| I assumed from the name that it only ran llama-derived
| models, rather than whatever is available at huggingface.
| Is that not the case?
| thehamkercat wrote:
| I think you should mention that LM Studio isn't open source.
|
| I mean, what's the point of using local models if you can't
| trust the app itself?
| satvikpendem wrote:
| Depends on what people use them for; not every user of local
| models is doing so for privacy, some just don't like paying
| for online models.
| thehamkercat wrote:
| Most LLM sites are now offering free plans, and they are
| usually better than what you can run locally, so I think
| people are running local models for privacy 99% of the time.
| behnamoh wrote:
| > I mean, what's the point of using local models if you can't
| trust the app itself?
|
| and you think ollama doesn't do telemetry/etc. just because
| it's open source?
| thehamkercat wrote:
| That's why I suggested using llama.cpp in my other comment.
| maranas wrote:
| Cline + RooCode and VSCode already work really well with local
| models like qwen3-coder or even the latest gpt-oss. It is not as
| plug-and-play as Claude, but it gets you to a point where you
| only have to do the last 5% of the work.
| NelsonMinar wrote:
| "This particular [80B] model is what I'm using with 128GB of
| RAM". The author then goes on to breezily suggest you try the 4B
| model instead if you only have 8GB of RAM. With no discussion of
| exactly what hit in quality you'll be taking by doing that.
| Workaccount2 wrote:
| I'm curious what the mental calculus was that a $5k laptop would
| competitively benchmark against SOTA models for the next 5
| years.
|
| Somewhat comically, the author seems to have made it about 2
| days. Out of 1,825. I think the real story is the folly of
| fixating your eyes on shiny new hardware and searching for
| justifications. I'm too ashamed to admit how many times I've done
| that dance...
|
| Local models are purely for fun, hobby, and extreme privacy
| paranoia. If you really want privacy beyond a ToS guarantee, just
| lease a server (I know they can still be spying on that, but it's
| a threshold.)
| ekjhgkejhgk wrote:
| I agree with everything you said, and yet I cannot help but
| respect a person who wants to do it himself. It reminds me of
| the hacker culture of the 80s and 90s.
| satvikpendem wrote:
| > _I'm curious what the mental calculus was that a $5k laptop
| would competitively benchmark against SOTA models for the next
| 5 years._
|
| Well, the hardware remains the same but local models get better
| and more efficient, so I don't think there is much difference
| between paying $5k for online models over 5 years vs getting a
| laptop (and well, you'll need a laptop anyway, so why not just
| get a good enough one to run local models in the first place?).
| smcleod wrote:
| My 2023 Macbook Pro (M2 Max) is coming up to 3 years old and I
| can run models locally that are arguably "better" than what was
| considered SOTA about 1.5 years ago. This is of course not an
| exact comparison but it's close enough to give some
| perspective.
| holyknight wrote:
| Your premise would've been right if memory prices hadn't
| skyrocketed by something like 400% in a couple of weeks.
| freeone3000 wrote:
| What are you doing with these models that you're going above the
| free tier on Copilot?
| satvikpendem wrote:
| Some just like privacy and working without internet. I, for
| example, travel regularly by train and like being able to use
| my laptop when there isn't always good WiFi.
| ardme wrote:
| Isn't the math better if you buy Nvidia stock with what you'd
| pay for all the hardware and then just pay $20 a month for Codex
| out of the annual returns?
| andix wrote:
| I wouldn't run local models on the development PC. Instead, run
| them on a box in another room or another location. Less fan
| noise, and it won't influence the performance of the PC you're
| working on.
|
| Latency is not an issue at all for LLMs, even a few hundred ms
| won't matter.
|
| Running them on the development machine doesn't make a lot of
| sense to me, except when working offline while traveling.
___________________________________________________________________
(page generated 2025-12-21 23:00 UTC)