[HN Gopher] A Guide to Local Coding Models
___________________________________________________________________
A Guide to Local Coding Models
Author : mpweiher
Score : 85 points
Date : 2025-12-21 20:55 UTC (2 hours ago)
(HTM) web link (www.aiforswes.com)
(TXT) w3m dump (www.aiforswes.com)
| nzeid wrote:
| I appreciate the author's modesty but the flip-flopping was a
| little confusing. If I'm not mistaken, the conclusion is that by
| "self-hosting" you save money in all cases, but you cripple
| performance in scenarios where you need to squeeze out the kind
| of quality that requires hardware that's impractical to cobble
| together at home or within a laptop.
|
| I am still toying with the notion of assembling an LLM tower with
| a few old GPUs but I don't use LLMs enough at the moment to
| justify it.
| a_victorp wrote:
| If you ever do it, please make a guide! I've been toying with
| the same notion myself
| suprjami wrote:
| If you want to do it cheap, get a desktop motherboard with
| two PCIe slots and two GPUs.
|
| Cheap tier is dual 3060 12G. Runs 24B Q6 and 32B Q4 at 16
| tok/sec. The limitation is VRAM for large context. 1000 lines
| of code is ~20k tokens. 32k tokens is ~10G VRAM.
|
| Expensive tier is dual 3090 or 4090 or 5090. You'd be able to
| run 32B Q8 with large context, or a 70B Q6.
|
| For software, llama.cpp and llama-swap. GGUF models from
| HuggingFace. It just works.
|
| If you need more than that, you're into enterprise hardware
| with 4+ PCIe slots, which costs as much as a car and has the
| power consumption of a small country. You're better off just
| paying for Claude Code.
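|
| As a rough back-of-the-envelope check on that ~10G figure: the KV
| cache grows linearly with context. A minimal sketch in Python,
| where the architecture numbers (layers, KV heads, head dim) are
| assumptions for a typical 32B dense model rather than values from
| any specific checkpoint:
|
|     # Rough KV-cache size estimate; architecture numbers are
|     # assumed, not taken from a particular model card.
|     n_layers = 64        # transformer blocks
|     n_kv_heads = 8       # grouped-query attention KV heads
|     head_dim = 128       # dimension per attention head
|     bytes_per_elem = 2   # fp16 cache
|     ctx_tokens = 32_768  # desired context window
|
|     # 2x for keys and values
|     bytes_per_token = (2 * n_layers * n_kv_heads
|                        * head_dim * bytes_per_elem)
|     kv_gib = bytes_per_token * ctx_tokens / (1024 ** 3)
|     print(f"{bytes_per_token // 1024} KiB/token, "
|           f"{kv_gib:.1f} GiB at {ctx_tokens} tokens")
|
| That works out to roughly 8 GiB for the cache alone, which lines
| up with the ~10G figure once compute buffers are added on top.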
| satvikpendem wrote:
| Jeff Geerling has (not quite but sort of) guides:
| https://news.ycombinator.com/item?id=46338016
| cloudhead wrote:
| In my experience the latest models (Opus 4.5, GPT 5.2) are _just_
| starting to keep up with the problems I'm throwing at them, and I
| really wish they did a better job, so I think we're still 1-2
| years away from local models not wasting developer time outside
| of CRUD web apps.
| OptionOfT wrote:
| Eh, these things are trained on existing data. The further you
| are from that the worse the models get.
|
| I've noticed that I need to be a lot more specific in those
| cases, up to the point where being more specific is slowing me
| down, partially because I don't always know what the right
| thing is.
| simonw wrote:
| > I realized I looked at this more from the angle of a hobbyist
| paying for these coding tools. Someone doing little side projects
| --not someone in a production setting. I did this because I see a
| lot of people signing up for $100/mo or $200/mo coding
| subscriptions for personal projects when they likely don't need
| to.
|
| Are people really doing that?
|
| If that's you, know that you can get a LONG way on the $20/month
| plans from OpenAI and Anthropic. The OpenAI one in particular is
| a great deal, because Codex usage is charged at a much lower rate
| than Claude's.
|
| The time to cough up $100 or $200/month is when you've exhausted
| your $20/month quota and you are frustrated at getting cut off.
| At that point you should be able to make a responsible decision
| by yourself.
| hamdingers wrote:
| And as a hobbyist the time to sign up for the $20/month plan is
| after you've spent $20 on tokens at least a couple times.
|
| YMMV based on the kinds of side projects you do, but it's
| definitely been cheaper for me in the long run to pay by token,
| and the flexibility it offers is great.
| iOSThrowAway wrote:
| I spent $240 in one week through the API and realized the
| $20/month was a no-brainer.
| __mharrison__ wrote:
| I'm convinced the $20 gpt plus plan is the best plan right now.
| You can use Codex with gpt5.2. I've been very impressed with
| this.
|
| (I also have the same MBP the author has and have used Aider
| with Qwen locally.)
| baq wrote:
| bit the bullet this week and paid for a month of claude and a
| month of chatgpt plus. claude seems to have much lower token
| limits, both aggregate and rate-limited, and GPT-5.2 isn't a
| bad model at all. $20 for claude is not enough even for a
| hobby project (after one day!); openai looks like it might
| be.
| InsideOutSanta wrote:
| I feel like a lot of the criticism the GPT-5.x models
| receive only applies to specific use cases. I prefer these
| models over Anthropic's because they are less creative and
| less likely to take liberties when interpreting my prompts.
|
| Sonnet 4.5 is great for vibe coding. You can give it a
| relatively vague prompt and it will take the initiative to
| interpret it in a reasonable way. This is good for non-
| programmers who just want to give the model a vague idea
| and end up with a working, sensible product.
|
| But I usually do not want that; I do not want the model to
| take liberties and be creative. I want the model to do
| precisely what I tell it and nothing more. In my
| experience, the GPT-5.x models are a better fit for that way
| of working.
| wyre wrote:
| Me. Currently using Claude Max for personal coding projects.
| I've been on Claude's $20 plan and would run out of tokens. I
| don't want to give my money to OpenAI. So far these projects
| have not returned their value to me, but I am viewing it
| as an investment in learning best practices with these coding
| tools.
| satvikpendem wrote:
| > _If that's you, know that you can get a LONG way on the
| $20/month plans from OpenAI and Anthropic._
|
| > _The time to cough up $100 or $200/month is when you've
| exhausted your $20/month quota and you are frustrated at
| getting cut off. At that point you should be able to make a
| responsible decision by yourself._
|
| These are the same people, by and large. What I have seen is
| users who purely vibe code everything and run into the limits
| of the $20/m models and pay up for the more expensive ones.
| Essentially they're trading learning to code (and, in some
| cases, time; it's not always faster to vibe code than to do it
| yourself) for money.
| maddmann wrote:
| If this is the new way code is written then they are arguably
| learning how to code. The jury is still out, but I think
| you are being a bit dismissive.
| smcleod wrote:
| On a $20/mo plan doing any sort of agentic coding you'll hit
| the 5hr window limits in less than 20 minutes.
| andix wrote:
| It really depends. When building a lot of new features it
| happens quite fast. With some attention to context length I
| was often able to go for over an hour on the $20 Claude plan.
|
| If you're doing mostly smaller changes, you can go all day
| with the $20 Claude plan without hitting the limits.
| Especially if you need to thoroughly review the AI changes
| for correctness, instead of relying on automated tests.
| jwpapi wrote:
| Not everybody is broke.
| simonw wrote:
| This story talks about MLX and Ollama but doesn't mention LM
| Studio - https://lmstudio.ai/
|
| LM Studio can run both MLX and GGUF models but does so via an
| Ollama-style (but more full-featured) macOS GUI. They also have a
| very actively maintained model catalog at
| https://lmstudio.ai/models
| ZeroCool2u wrote:
| LMStudio is so much better than Ollama it's silly it's not more
| popular.
| thehamkercat wrote:
| LMStudio is not open source though, ollama is
|
| but people should use llama.cpp instead
| behnamoh wrote:
| > LMStudio is not open source though, ollama is
|
| and why should that affect usage? it's not like ollama
| users fork the repo before installing it.
| thehamkercat wrote:
| It was worth mentioning.
| smcleod wrote:
| I suspect Ollama is at least partly moving away from open
| source as they look to raise capital; when they released
| their replacement desktop app, they did so as closed source.
| You're absolutely right that people should be using
| llama.cpp - not only is it truly open source, it's also
| significantly faster, has better model support and many more
| features, is better maintained, and its development community
| is far more active.
| midius wrote:
| Makes me think it's a sponsored post.
| Cadwhisker wrote:
| LMStudio? No, it's the easiest way to run an LLM locally that
| I've seen to the point where I've stopped looking at other
| alternatives.
|
| It's cross-platform (Win/Mac/Linux), detects the most
| appropriate GPU in your system and tells you whether the
| model you want to download will run within its RAM
| footprint.
|
| It lets you set up a local server that you can access through
| API calls as if you were remotely connected to an online
| service.
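|
| A minimal sketch of what that looks like from code, using the
| standard OpenAI Python client pointed at the local server. The
| base URL and model name here are assumptions (LM Studio's server
| defaults to port 1234; check the app's server tab for yours):
|
|     from openai import OpenAI
|
|     # Point the client at the local server instead of api.openai.com
|     client = OpenAI(
|         base_url="http://localhost:1234/v1",
|         api_key="not-needed",  # local servers ignore the key
|     )
|
|     resp = client.chat.completions.create(
|         model="qwen2.5-coder-7b-instruct",  # whichever model you loaded
|         messages=[{"role": "user",
|                    "content": "Write a binary search in Python."}],
|     )
|     print(resp.choices[0].message.content)
|
| Because it speaks the same API shape, most tools that let you set
| a custom base URL will work against it unchanged.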
| vunderba wrote:
| FWIW, Ollama already does most of this:
|
| - Cross-platform
|
| - Sets up a local API server
|
| The tradeoff is a somewhat higher learning curve, since you
| need to manually browse the model library and choose the
| model/quantization that best fit your workflow and
| hardware. OTOH, it's also open-source unlike LMStudio which
| is proprietary.
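|
| For what the local API server looks like in practice, here's a
| minimal sketch against Ollama's default endpoint on port 11434.
| The model tag is an assumption; pull whatever you actually use
| first (e.g. ollama pull qwen2.5-coder:7b):
|
|     import requests
|
|     # Ollama's native generate endpoint; it also exposes an
|     # OpenAI-compatible API under /v1 on the same port.
|     resp = requests.post(
|         "http://localhost:11434/api/generate",
|         json={
|             "model": "qwen2.5-coder:7b",
|             "prompt": "Explain what a Q4_K_M quantization is.",
|             "stream": False,  # one JSON object instead of a stream
|         },
|         timeout=300,
|     )
|     print(resp.json()["response"])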
| randallsquared wrote:
| I assumed from the name that it only ran llama-derived
| models, rather than whatever is available at huggingface.
| Is that not the case?
| thehamkercat wrote:
| I think you should mention that LM Studio isn't open source.
|
| I mean, what's the point of using local models if you can't
| trust the app itself?
| satvikpendem wrote:
| Depends on what people use them for; not every user of local
| models is doing so for privacy, some just don't like paying
| for online models.
| thehamkercat wrote:
| Most LLM sites are now offering free plans, and they are
| usually better than what you can run locally, so I think
| people are running local models for privacy 99% of the time.
| behnamoh wrote:
| > I mean, what's the point of using local models if you can't
| trust the app itself?
|
| and you think ollama doesn't do telemetry/etc. just because
| it's open source?
| thehamkercat wrote:
| That's why I suggested using llama.cpp in my other comment.
| maranas wrote:
| Cline + RooCode and VSCode already work really well with local
| models like qwen3-coder or even the latest gpt-oss. It is not as
| plug-and-play as Claude, but it gets you to a point where you
| only have to do the last 5% of the work.
| NelsonMinar wrote:
| "This particular [80B] model is what I'm using with 128GB of
| RAM". The author then goes on to breezily suggest you try the 4B
| model instead if you only have 8GB of RAM. With no discussion of
| exactly what hit in quality you'll be taking by doing that.
| Workaccount2 wrote:
| I'm curious what the mental calculus was that a $5k laptop would
| competitively benchmark against SOTA models for the next 5
| years.
|
| Somewhat comically, the author seems to have made it about 2
| days. Out of 1,825. I think the real story is the folly of
| fixating your eyes on shiny new hardware and searching for
| justifications. I'm too ashamed to admit how many times I've done
| that dance...
|
| Local models are purely for fun, hobby, and extreme privacy
| paranoia. If you really want privacy beyond a ToS guarantee, just
| lease a server (I know they can still be spying on that, but it's
| a threshold.)
| ekjhgkejhgk wrote:
| I agree with everything you said, and yet I cannot help but
| respect a person who wants to do it himself. It reminds me of
| the hacker culture of the 80s and 90s.
| satvikpendem wrote:
| > _I'm curious what the mental calculus was that a $5k laptop
| would competitively benchmark against SOTA models for the next
| 5 years._
|
| Well, the hardware remains the same but local models get better
| and more efficient, so I don't think there is much difference
| between paying $5k for online models over 5 years vs getting a
| laptop (and well, you'll need a laptop anyway, so why not just
| get a good enough one to run local models in the first place?).
| smcleod wrote:
| My 2023 Macbook Pro (M2 Max) is coming up to 3 years old and I
| can run models locally that are arguably "better" than what was
| considered SOTA about 1.5 years ago. This is of course not an
| exact comparison but it's close enough to give some
| perspective.
| holyknight wrote:
| Your premise would've been right if memory prices hadn't
| skyrocketed by something like 400% in a couple of weeks.
| freeone3000 wrote:
| What are you doing with these models that you're going above the
| free tier on Copilot?
| satvikpendem wrote:
| Some just like privacy and working without internet. I, for
| example, travel regularly by train and like being able to use
| my laptop when there isn't always good WiFi.
| ardme wrote:
| Isn't the math better if you buy Nvidia stock with what you'd
| pay for all the hardware and then just pay $20 a month for Codex
| out of the annual returns?
| andix wrote:
| I wouldn't run local models on the development PC. Instead, run
| them on a box in another room or another location. Less fan
| noise, and it won't influence the performance of the PC you're
| working on.
|
| Latency is not an issue at all for LLMs, even a few hundred ms
| won't matter.
|
| Running them on the development machine doesn't make a lot of
| sense to me, except when working offline while traveling.
___________________________________________________________________
(page generated 2025-12-21 23:00 UTC)