[HN Gopher] Coding with LLMs in the summer of 2025 - an update
___________________________________________________________________
Coding with LLMs in the summer of 2025 - an update
Author : antirez
Score : 365 points
Date : 2025-07-20 11:04 UTC (11 hours ago)
(HTM) web link (antirez.com)
(TXT) w3m dump (antirez.com)
| theodorewiles wrote:
| My question on all of the "can't work with big codebases" claims
| is: what would a codebase that was designed for an LLM look like?
| Composed of many, many small functions that can be composed
| together?
| antirez wrote:
| I believe it's the same as for humans: different files
| implementing different parts of the system with good interfaces
| and sensible boundaries.
| dkdcio wrote:
| this is a common pattern I see -- if your codebase is
| confusing for LLMs, it's probably confusing for people too
| physicles wrote:
| This fact is one of the most pleasant surprises I've had
| during this AI wave. Finally, a concrete reason to care
| about your docs and your code quality.
| afro88 wrote:
| Being well documented helps a lot too.
|
| You can use an LLM to help document a codebase, but it's
| still an arduous task because you do need to review and fix
| up the generated docs. It will make mistakes, sometimes
| glaring, sometimes subtle. And you want your documentation
| to provide accuracy rather than double down on, or even
| introduce, misunderstanding.
| Hasnep wrote:
| And my question to that is how would that be different from a
| codebase designed for humans?
| __MatrixMan__ wrote:
| I think it means finer toplevel granularity re: what's
| runnable/testable at a given moment. I've been exploring this
| for my own projects and although it's not a silver bullet, I
| think there's something to it.
|
| ----
|
| Several codebases I've known have provided a three-stage
| pipeline: unit tests, integration tests, and e2e tests. Each
| of these batches of tests depends on the creation of one of
| three environments, and the code being tested is what ends up
| in those environments. If you're interested in a particular
| failing test, you can use the associated environment and just
| iterate on the failing test.
|
| For humans with a bit of tribal knowledge about the project,
| humans who have already solved the get-my-dev-environment-
| set-up problem in a more or less uniform way, this works ok.
| Humans are better at retaining context over weeks and months,
| whereas you have to spin up a new session with an LLM every
| few hours or so. So we've created environments for ourselves
| that we ignore most of the time, but that are too complex to
| be bite sized for an agent that comes on the scene as a blank
| slate every few hours. There are too few steps from blank-
| slate to production, and each of them is too large.
|
| But if successively more complex environments can be built on
| each other in arbitrarily many steps, then we could achieve
| finer granularity. As a nix user, my mental model for this is
| function composition where the inputs and outputs are
| environments, but an analogous model would be layers in a
| a Dockerfile where you test each layer before building the
| one on top of it.
|
| Instead of maybe three steps, there are eight or ten. The
| goal would be to have both whatever code builds the
| environment, and whatever code tests it, paired up into bite-
| sized chunks so that a failure in the pipeline points you to
| a specific stage, which is more informative than "the unit tests
| are failing". Ideally test coverage and implementation
| complexity get distributed uniformly across those stages.
|
| Keeping the scope of the stages small maximizes the amount of
| your codebase that the LLM can ignore while it works. I have
| a flake output and nix devshell corresponding to each stage
| in the pipeline and I'm using pytest to mark tests based on
| which stage they should run in. So I run the agent from the
| devshell that corresponds with whichever stage is relevant at
| the moment, and I introduce it to _only_ the tests and code
| that are relevant to that stage (the assumption being that
| all previous stages are known to be in good shape). Most of
| the time, it doesn't need to know that it's working on stage 5
| of 9, so it "feels" like a smaller codebase than it actually
| is.
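|
| Concretely, the marker side is tiny. A minimal sketch (stage
| names and the test body are made up for illustration):
|
|     # conftest.py -- register one marker per pipeline stage
|     def pytest_configure(config):
|         for n in range(1, 10):
|             config.addinivalue_line(
|                 "markers", f"stage{n}: tests for pipeline stage {n}")
|
|     # test_stage5.py
|     import pytest
|
|     @pytest.mark.stage5
|     def test_stage5_invariant():
|         # stand-in for a real assertion about this stage's env
|         assert sorted([3, 1, 2]) == [1, 2, 3]
|
| Then the devshell for stage 5 runs only `pytest -m stage5`,
| and the agent never sees the rest.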
|
| If evidence emerges that I've engaged the LLM at the wrong
| stage, I abandon the session and start over at the right
| level (now 6 of 9 or somesuch).
| Keyframe wrote:
| like a microservice architecture? overall architecture to get
| the context and then dive into a micro one?
| exitb wrote:
| And on top of that - can you steer an LLM to create this kind
| of code? In my experience the models don't really have a
| "taste" for detecting complexity creep and reengineering for
| simplicity, in the same way an experienced human does.
| lubujackson wrote:
| I am vibe coding a complex app. You can certainly keep things
| clean, but the trick is to enforce a rigid structure. This
| does add a veneer of complexity, but simplifies "implement
| this new module" or "add this feature across all relevant
| files".
| victorbjorklund wrote:
| I found that it is beneficial to create more libraries. If,
| for example, I build a large integration with an API
| (basically a whole API client), in the past I would have kept
| it in the same repo, but now I make it a standalone library.
| qweiopqweiop wrote:
| This matches my take, but I'm curious if OP has used Claude Code.
| antirez wrote:
| Yep, when I use agents I go for Claude Code. For example, I
| lately needed to buy more Commodore 64s than is appropriate,
| and I let it code a Telegram bot advising me when popular
| sources would have interesting listings. It worked (after a
| few iterations), then I looked at the code base and wanted to
| puke, but who cares in this case? It worked, it was much
| faster, and I had nothing to learn in the process of doing it
| myself: I published a Telegram library for C in the past and
| know how it works and how to do scraping and so forth.
| Keyframe wrote:
| _For example, I lately needed to buy more Commodore 64s than
| is appropriate_
|
| Been there, done that!
|
| for those one-off small things, LLMs are rather cool.
| Especially Claude Code and Gemini CLI. I was given an archive
| of some really old movies recently, but the files bore titles
| in Croatian instead of the original (mostly English) ones. So
| I claude --dangerously-skip-permissions into the directory
| with the movies and, in a two-sentence prompt, asked it to
| rename the files into a given format (one I tend to use in my
| archive) and, for each title, to find the original name and
| year of release and use them in the file.. but, before
| committing the rename, to give me a list of before and after
| for approval. It took, what, a minute of writing a prompt.
|
| Now, for larger things, I'm still exploring a way, an angle:
| what to do and how. I've tried everything from yolo prompting
| to structured and uber-structured approaches, all the way to
| mimicking product/prd - architecture - project management /
| tasks - developer/agents.. so far, unless it's a rather
| simple project, I don't see it happening that way. The most
| luck I've had was with "some structure" as context and
| inputs, and then guiding prompting during sessions and
| reviewing stuff. Almost pair-programming.
| apwell23 wrote:
| > ## Provide large context
|
| I thought large contexts are not necessarily better and
| sometimes have the opposite effect?
| antirez wrote:
| LLMs performance will suffer from both insufficient context and
| context flooding. Balancing is an art.
| NitpickLawyer wrote:
| I found it depends very much on the task. For "architect"
| sessions you need as much context as you can reasonably gather.
| The more the merrier. At least gemini2.5 pro will gather the
| needed context from many files and it really does make a
| difference when you can give it a lot of it.
|
| On coding you need to aggressively prune it, and only give
| minimum adjacent context, or it'll start going on useless
| tangents. And if you get stuck just refresh and start from 0,
| changing what is included. It's often faster than "arguing"
| with the LLM in multi-step sessions.
|
| (the above is for existing codebases. for vibe-coding one-off
| scripts, just go with the vibes, sometimes it works
| surprisingly well from a quick 2-3 lines prompt)
| apwell23 wrote:
| > Coding activities should be performed mostly with: Claude Opus
| 4
|
| I've been going down to sonnet for coding over opus. maybe i am
| just writing dumb code
| stpedgwdgfhgdd wrote:
| That is also what Anthropic recommends. In edge cases use Opus.
|
| Opus is also way more expensive. (Don't forget to switch back
| to Sonnet in _all_ terminals)
| jtonl wrote:
| Most of the time Sonnet 4 just works, but you need to refine
| the context as much as you can.
| northern-lights wrote:
| In my experience as well, Sonnet 4 is much better than Opus.
| Opus is great at the start of a project, where you need to
| plan things, structure the project, and figure out how to
| execute, but it cannot beat Sonnet at actually executing. It
| is also a lot cheaper.
| cyral wrote:
| Opus is too expensive and I find it goes way off the rails
| often (just writing way way too much. Maybe that could be
| controlled with a better prompt on my end). Sonnet gives me
| more realistic code that isn't too overengineered.
| quantumHazer wrote:
| I'm going a little offtopic here, but I disagree with the OP's
| use of the term "PhD-level knowledge", although I have a huge
| amount of respect for antirez (besides which, we were born on
| the same island).
|
| This phrasing can be misleading and points to a broader
| misunderstanding about the nature of doctoral studies, one
| that has been influenced by the marketing and hype discourse
| surrounding AI labs.
|
| The assertion that there is a defined "PhD-level knowledge" is
| pretty useless. The primary purpose of a PhD is not simply to
| acquire a vast amount of pre-existing knowledge, but rather to
| learn how to conduct research.
| antirez wrote:
| Agree with that. Read it as expert-level knowledge, without
| all the other stuff LLMs can't do as well as humans. LLMs'
| way of expressing knowledge is kind of alien, as it is
| different, so indeed those are all poor simplifications. For
| instance, an LLM can't code as well as a top human coder, but
| it can write a non-trivial program from the first to the last
| character without iterating.
| spyckie2 wrote:
| Hey antirez,
|
| What sticks out to me is Gemini catching bugs before
| production release, was hoping you'd give a little more
| insight into that.
|
| The reason being that we expect AI to create bugs and we
| catch them, but if Gemini is spotting bugs by way of being a
| QA (not just by writing and passing tests), then that piques
| my interest.
| jacobr1 wrote:
| Our team has pretty aggressively started using LLMs for
| automated code review. It will look at our PRs and post
| comments. We keep adding more material for it to consider,
| from a summarized version of our API guidelines to general
| prompts like "You are an expert software engineer and QA
| professional, review this PR and point out any bugs or other
| areas of technical risk. Make concise suggestions for
| improvement where applicable." It catches a ton of stuff.
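|
| For a sense of the shape of it, a rough sketch (the model,
| prompt, and diff gathering here are illustrative, not our
| exact setup):
|
|     # pr_review.py -- draft an LLM review of the branch's diff
|     import subprocess
|
|     from openai import OpenAI
|
|     diff = subprocess.check_output(
|         ["git", "diff", "origin/main...HEAD"], text=True)
|
|     client = OpenAI()  # reads OPENAI_API_KEY from the env
|     resp = client.chat.completions.create(
|         model="gpt-4o",
|         messages=[
|             {"role": "system", "content": (
|                 "You are an expert software engineer and QA "
|                 "professional. Review this PR and point out any "
|                 "bugs or other areas of technical risk. Make "
|                 "concise suggestions for improvement.")},
|             {"role": "user", "content": diff},
|         ],
|     )
|     print(resp.choices[0].message.content)  # posted as a PR comment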
|
| Another thing we've started doing is having it look at build
| failures and write a report on suggested root causes before
| a human even looks at it - saves time.
|
| Or (and we haven't rolled this out automatically yet but
| are testing a prototype) having it triage alarms from our
| metrics, with access to the logs and codebase to
| investigate.
| kgwgk wrote:
| > The primary purpose of a PhD is not simply to acquire a vast
| amount of pre-existing knowledge, but rather to learn how to
| conduct research.
|
| It's not like once you have a PhD anyone cares about the
| subject, right? The only thing that matters is that you learnt
| to conduct research.
| quantumHazer wrote:
| I can't understand why once you have a PhD anyone should care
| more about the subject.
| ghm2180 wrote:
| > but rather to learn how to conduct research
|
| Further, I always assumed PhD-level knowledge meant coming
| up with the right questions. I would say it is at best a
| "lazy knowledge-rich worker": it won't explore hypotheses if
| you don't *ask it* to. A PhD would ask those questions of
| *themselves*. Let me give you a simple example:
|
| The other day Claude Code (Max Pro subscription) commented
| out a bunch of test assertions as part of a related but
| separate test suite it was coding. It did not care to explore
| why it was commenting them out -- a serious bug caused by a
| faulty assumption in the original plan. I had to ask it to
| change the plan, using the ultra-think, think-hard trick, to
| explore why it was failing, amend the plan, and fix it.
|
| The bug was that the ORM object had null values because it
| had been fetched earlier by another DB session that had since
| been closed, and was not refreshed after the commit.
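|
| In SQLAlchemy terms, the shape of the bug is roughly this
| (model and names made up to illustrate, not the actual code):
|
|     import sqlalchemy as sa
|     from sqlalchemy.orm import (DeclarativeBase, Mapped,
|                                 mapped_column, Session)
|
|     class Base(DeclarativeBase):
|         pass
|
|     class User(Base):
|         __tablename__ = "users"
|         id: Mapped[int] = mapped_column(primary_key=True)
|         name: Mapped[str]
|
|     engine = sa.create_engine("sqlite://")  # in-memory DB
|     Base.metadata.create_all(engine)
|     with Session(engine) as s:
|         s.add(User(id=1, name="old"))
|         s.commit()
|
|     # session A loads the object, then closes: `user` detaches
|     with Session(engine) as a:
|         user = a.get(User, 1)
|
|     # session B updates the row behind A's back
|     with Session(engine) as b:
|         b.get(User, 1).name = "new"
|         b.commit()
|
|     print(user.name)  # still "old": never refreshed post-commit
|
|     # the fix: re-fetch (or session.refresh) after the commit
|     with Session(engine) as c:
|         print(c.get(User, 1).name)  # "new"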
| chis wrote:
| If you understand that a PhD is about much more than just
| knowledge, it's still the case that having easy access to that
| knowledge is super valuable. At my last job we often had
| questions that would traditionally require a PhD-level person to
| answer, even if it wasn't at the limit of their research
| abilities. "What will happen to the interface of two materials
| if voltage is applied in one direction" type stuff, turns out
| to be really hard to answer but LLMs do a decent job.
| quantumHazer wrote:
| Have you checked the LLM's response experimentally?
|
| Anyway, I don't think these are "PhD-knowledge" questions,
| just job-related electrical engineering questions.
| Keyframe wrote:
| Unlike OP, from my still limited but intense month or so
| diving into this topic, I had better luck with Gemini 2.5 Pro
| and Opus 4 on a more abstract level, like architecture etc.,
| and then feeding the input to Sonnet for coding. I found 2.5
| Pro, and to a lesser degree Opus, hit or miss: a lot of
| instances of them circling around the issue and correcting
| themselves when coding (Gemini especially so), whereas Sonnet
| would cut to the chase, but needed an explicit take on it to
| be efficient.
| khaledh wrote:
| This is my experience too. I usually use Gemini 2.5 Pro through
| AI Studio for big design ideas that need to be validated and
| refined. Then take the refined requirements to Claude Code
| which does an excellent job most of the time in coding them
| properly. Recently I tried Gemini CLI, and it's not even close
| to Claude Code's sharp coding skills. It often makes syntax
| mistakes, and gets stuck trying to get itself out of a rut; its
| output is so verbose (and fast) that it's hard to follow what
| it's trying to do. Claude Code has a much better debugging
| capability.
|
| Another contender in the "big idea" reasoning camp: DeepSeek
| R1. It's much slower, but most of the time it can analyze
| problems and get to the correct solution in one shot.
| antirez wrote:
| Totally possible. In general I believe that while more powerful
| in their best outputs, Sonnet/Opus 4 are in other ways
| (alignment / consistency) a regression on Sonnet 3.5v2 (often
| called Sonnet 3.6), as Sonnet 3.7 was. Also models are
| _complex_ objects, and sometimes in a given domain a given
| model that on paper is weaker will work better. And, on top of
| that: interactive use vs agent requires different reinforcement
| learning training that sometimes may not be towards an aligned
| target... So also using the model in one way or the other may
| change how good it is.
| jpdus wrote:
| This is also confirmed by internal Cline statistics, where
| Opus and Gemini 2.5 Pro both perform worse than Sonnet 4 in
| real-world scenarios:
|
| https://x.com/pashmerepat/status/1946392456456732758/photo/1
| bgwalter wrote:
| Translation: His company will launch "AI" products in order to
| get funding or better compete with Valkey.
|
| I find it very sad that people who have been really productive
| without "AI" now go out of their way to find small anecdotal
| evidence for "AI".
| brokencode wrote:
| I find it even more sad when people come out of the woodwork on
| every LLM post to tell us that our positive experiences using
| LLMs are imagined and we just haven't realized how bad they are
| yet.
| on_the_train wrote:
| If LLMs were actually useful, there would be no need to
| scream it everywhere. On the contrary: it would be a guarded
| secret.
| neuronexmachina wrote:
| In my experience, devs generally aren't secretive about
| tools they find useful.
| fellowniusmonk wrote:
| People are insane; you can artificially pine for the
| simpler, better times made up in your mind when you could
| give Oracle all your money.
|
| But I would stake my very life on the fact that the
| movement by developers we call open-source is the single
| greatest community and ethos humanity has ever created.
|
| Of course it inherits from enlightenment and other
| thinking, it doesn't exist in a vacuum, but it is an
| extension of the ideologies that came before it.
|
| I challenge anyone to come up with any single modern
| subculture that has tangibly generated more, that touches
| more lives, moves more weight, travels farther, affects
| humanity more every single day from the moment they wake
| up, than the open source software community (in the
| catholic sense obviously).
|
| Both in moral goodness and in measurable improvement in
| standard of living and understanding of the universe.
|
| Some people's memories are very short indeed; all who
| pine, pine for who they imagined they were, and are
| consumed by a memetic desire of their imagined selves.
| badlibrarian wrote:
| > open-source is the single greatest community and ethos
| humanity has ever created
|
| good lord.
| hobs wrote:
| I think many devs are guarding their secrets, but the last
| few decades have shown us that an open foundation can net
| huge benefits for everyone (and then you can put your
| secret sauce in the last mile.)
| logsr wrote:
| posting a plain text description of your experience on a
| personal blog isn't exactly screaming. in the noise of the
| modern internet this would be read by nobody if it wasn't
| coming from one of the most well known open source software
| creators of all time.
|
| people who believe in open source don't believe that
| knowledge should be secret. i have released a lot of open
| source myself, but i wouldn't consider myself a "true
| believer." even so, i strongly believe that all information
| about AI must be as open as possible, and i devote a fair
| amount of time to reverse engineering various proprietary
| AI implementations so that i can publish the details of how
| they work.
|
| why? a couple of reasons:
|
| 1) software development is my profession, and i am not
| going to let anybody steal it from me, so preventing any
| entity from establishing a monopoly on IP in the space is
| important to me personally.
|
| 2) AI has some very serious geopolitical implications. this
| technology is more dangerous than the atomic bomb. allowing
| any one country to gain a monopoly on this technology would
| be extremely destabilizing to the existing global order,
| and must be prevented at all costs.
|
| LLMs are very powerful, they will get more powerful, and we
| have not even scratched the surface yet in terms of fully
| utilizing them in applications. staying at the cutting edge
| of this technology, and making sure that the knowledge
| remains free, and is shared as widely as possible, is a
| natural evolution for people who share the open source
| ethos.
| bgwalter wrote:
| If consumer "AI", and that includes programming tools,
| had real geopolitical implications it would be
| classified.
|
| The "race against China" is a marketing trick to convince
| senators to pour billions into "AI". Here is who is
| financing the whole bubble to a large extent:
|
| https://time.com/7280058/data-centers-tax-breaks-ai/
| alwillis wrote:
| _If LLMs were actually useful, there would be no need to
| scream it everywhere. On the contrary: it would be a
| guarded secret._
|
| LLMs _are_ useful--but there's no way such an innovation
| should be a "guarded secret" even at this early stage.
|
| It's like saying spreadsheets should have remained a secret
| when they amplified what people could do when they became
| mainstream.
| victorbjorklund wrote:
| If Internet was actually useful there would be no need to
| scream it everywhere. Guess that means the internet is
| totally useless?
| brokencode wrote:
| So ironic that you post this on Hacker News, where there
| are regularly articles and blog posts about lessons from
| the industry, both good and bad, that would be helpful to
| competitors. This industry isn't exactly Coke guarding its
| secret recipe.
| halfmatthalfcat wrote:
| Could it not be that those positive experiences are just
| shining a light that the practices before using an LLM were
| inefficient? It's more a reflection on the pontificator than
| anything.
| jstanley wrote:
| Tautologically so! That doesn't show that LLMs are useless,
| it perfectly shows how they are useful.
| johnfn wrote:
| Sure, but even then the perspective makes no sense. The
| common argument against AI at this point (e.g. OP) is that
| the only reason people use it is because they are
| intentionally trying to prop up high valuations - they seem
| unable to understand that other people have a different
| experience than they do. You'd think they'd see that just
| because there are some cases where it doesn't work, it
| doesn't necessarily mean that 100% of it is a sham.
| just up to individual taste, but that doesn't mean everyone
| who doesn't share your taste is wrong.
|
| Consider cilantro. I'm happy to admit there are people out
| there who don't like cilantro. But it's like the people who
| don't like cilantro are inventing increasingly absurd
| conspiracy theories ("Redis is going to add AI features to
| get a higher valuation") to support their viewpoint, rather
| than the much simpler "some people like a thing I don't
| like".
| bgwalter wrote:
| "Redis for AI is our integrated package of features and
| services designed to get your GenAI apps into production
| faster with the fastest vector database."
| cratermoon wrote:
| We don't just tell you they were imagined, we can provide
| receipts.
|
| https://metr.org/blog/2025-07-10-early-2025-ai-
| experienced-o...
| nojito wrote:
| Cursor is an old way of using LLMs.
|
| Not to mention that in the study, less than half had ever
| used it before the study.
| roywiggins wrote:
| The AI tooling churn is so fast that by the time a study
| comes out people will be able to say "well they were
| using an older tool" _no matter what tool that the study
| used_.
| cratermoon wrote:
| It's the eternal future. "AI will soon be able to...".
|
| There's an entire class of investment scammers that
| string along their marks, claiming that the big payoff is
| just around corner while they fleece the victim with the
| death of a thousand cuts.
| nojito wrote:
| Not really. Chatting with an LLM was cutting edge for 3
| years; it's only within the last 8-10 months, with Claude
| Code and Gemini CLI, that we have the next big change in how
| we interact with LLMs.
| roywiggins wrote:
| Claude Code was released in _May_.
| nojito wrote:
| Yup. But they are improvements over what cursor was
| releasing over the last year or so.
| roywiggins wrote:
| If there are paradigm-shattering improvements every six
| months, every single study that is ever released will be
| "behind" or "use an older tool." In six months when a
| study comes out using Claude Code, people dissatisfied
| with it will be able to point to the newest hotness, ad
| infinitum.
| camdenreslink wrote:
| How are Claude Code and Gemini CLI any different from
| using Cursor in agent mode? It's basically the same exact
| thing.
| steveklabnik wrote:
| I can't speak to how they're technically different, but
| in practice, Cursor was basically useless for me, and
| Claude Code works well. Even with Cursor using Claude's
| models.
| steveklabnik wrote:
| > We do not provide evidence that:
|
| > AI systems do not currently speed up many or most
| software developers
|
| > We do not claim that our developers or repositories
| represent a majority or plurality of software development
| work
| brokencode wrote:
| Certainly an interesting result, but remember that a single
| paper doesn't prove anything. This will no doubt be
| something studied very extensively and change over time as
| tools develop.
|
| Personally, I find the current tools don't work great for
| large existing codebases and complex tasks. But I've found
| they can help me quickly make small scripts to save me
| time.
|
| I know, it's not the most glamorous application, but it's
| what I find useful today. And I have confidence the tools
| will continue to improve. They hardly even existed a few
| years ago.
| skippyboxedhero wrote:
| Some people got into coding to code, rather than build
| things.
|
| If the AI is doing the coding then that is a threat to some
| people. I am not sure why, LLMs can be good and you can enjoy
| coding...those things are unrelated. The logic seems to be
| that if LLMs are good then coding is less fun, lol.
| Cheer2171 wrote:
| Software jobs pay more than artist jobs because coding
| builds things. You can still be a code artist on your own
| time. Nobody is stopping you from writing in assembler.
| skippyboxedhero wrote:
| ¯\_(ツ)_/¯ people didn't stop playing chess because
| computers were better at it than them
| mumbisChungo wrote:
| And chess players stream as their primary income, because
| there's no money in Chess unless you're exactly the best
| player in the world (and even then the money is coming
| from sponsors/partners, not from chess itself).
| antirez wrote:
| Did you read my post? I hope you didn't.
|
| This post has nothing to do with Redis and is even a follow up
| to a post I wrote before rejoining the company.
| syntheticnature wrote:
| This is HN. We don't read posts here.
| kgwgk wrote:
| Amen. I have to confess that I made an exception here
| though. This may be the first submission I read before
| going into the comments in years.
| babuloseo wrote:
| Please send your thoughts and prayers to Gemini 2.5 Pro
| hopefully they can recover and get well soon enough, I
| hope Google lets them out of the hospital soon and
| discharges them, the last 3 weeks have been hell for me
| without them there.
| babuloseo wrote:
| OP, as a free user of Gemini 2.5 Pro via AI Studio: my friend
| has been hit by the equivalent of a car crash for
| approximately 3 weeks now. I hope they can recover soon, it
| is not easy for them.
| dcre wrote:
| "Always be part of the loop by moving code by hand from your
| terminal to the LLM web interface: this guarantees that you
| follow every process. You are still the coder, but augmented."
|
| I agree with this, but this is why I use a CLI. You can pipe
| files instead of copying and pasting.
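|
| A trivial helper makes the point; something like this (the
| `llm` CLI in the usage comment is just one example consumer):
|
|     # ctx.py -- dump files as one prompt-ready blob on stdout
|     # usage: python ctx.py src/*.py | llm "review this code"
|     import pathlib
|     import sys
|
|     for name in sys.argv[1:]:
|         path = pathlib.Path(name)
|         print(f"--- {path} ---")   # label each file for the model
|         print(path.read_text())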
| lmeyerov wrote:
| Yeah it is also a bit of a shibboleth: vibes coding, when I'm
| productive for the 80% case with Claude code, is about the LLM
| cranking for 10-20min. I'm instructing & automating the LLM on
| how to do its own context management, vs artisanally making
| every little decision.
|
| Ex: Implementing a spec, responding to my review comments,
| adding wider unit tests, running a role play for usability
| testing, etc. The main time we do what he describes, manually
| copying into a web IDE, is occasionally for a better one-shot
| use of a model, like only at the beginning of some plan
| generation, or debugging from a bunch of context we have
| gathered manually. Like we
| recently solved some nasty GPU code race this way, using a
| careful mix of logs and distributed code. Most of our job is
| using Boring Tools to write Boring Code, even if the topic/area
| is neato: you do not want your codebase to work like an
| adventure for everything, so we invest in making it look
| boring.
|
| I agree with what the other commenter said: I manage context
| as part of the skill, but by making the AI do it. Doing that
| by hand is
| like slowly handcoding assembly. Instead, I'm telling Claude
| Code to do it. Ex: Download and crawl some new dependency I'm
| using for some tricky topic, or read in my prompt template
| markdown for some task, or generate and self-maintain some
| plan.md with high-level rules on context I defined. This is the
| 80% case.
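|
| The plan.md itself is nothing fancy; roughly this shape
| (contents illustrative, not our actual file):
|
|     ## Context rules
|     - Only read files under the module this task touches
|     - Restate the spec in 3 bullets before coding
|     - After each change: run the tests, paste failures here
|     - Update this plan whenever an assumption changes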
|
| Maybe one of the disconnects is task latency vs throughput as
| trade-offs in human attention. If I need the LLM to get to the
| right answer faster, so the task is done faster, I have to lean
| in more. But my time is valuable and I have a lot to do. I'd
| rather spend 50% less of my time per task, even if the task
| takes 4x longer, with the LLM spinning longer. In that saved
| human time, I can be working on another task: I typically have
| 2-3 terminals running Claude, so I only check in every 5-15min.
| airstrike wrote:
| Your strategy only works for some domains.
| lmeyerov wrote:
| Totally
|
| We do this ~daily for:
|
| * Multitier webapps
|
| * DevOps infrastructure: docker, aws, ci systems, shell
| scripts, ...
|
| * Analytics & data processing
|
| * AI investigations (logs, SIEMs, ..) <--- what we sell!
|
| * GPU kernels
|
| * Compilers
|
| * Docs
|
| * Test amplification
|
| * Spec writing
|
| I think ~half the code happening by professional software
| engineers fits into these, or other vibes friendly domains.
| The stuff antirez does with databases seems close to what
| we do with compilers, GPU kernels, and infra.
|
| We are still not happy with production-grade frontend side
| of coding, though by being strong on API-first design and
| keeping logic vs UI separated, most of our UI code is
| friendly to headless.
| indigodaddy wrote:
| Since I've heard Gemini-cli is not yet up to snuff, has anyone
| tried opencode+gemini? I've heard that with opencode you can
| log in with a Google account (have NOT confirmed this; if anyone
| has any experience, pls advise) so not sure if that would get
| extra mileage from Gemini's limits vs using a Gemini api key?
| dakiol wrote:
| > Gemini 2.5 PRO | Claude Opus 4
|
| Whether it's vibe coding, agentic coding, or copy pasting from
| the web interface to your editor, it's still sad to see the
| normalization of private (i.e., paid) LLM models. I like the
| progress that LLMs introduce and I see them as a powerful tool,
| but I cannot understand how programmers (whether complete
| nobodies or popular figures) don't mind adding a strong dependency
| on a third party in order to keep programming. Programming used
| to be (and still is, to a large extent) an activity that can be
| done with open and free tools. I am afraid that in a few years,
| that will no longer be possible (as in most programmers will be
| so tied to a paid LLM, that not using them would be like not
| using an IDE or vim nowadays), since everyone is using private
| LLMs. The excuse "but you earn six figures, what's $200/month to
| you?" doesn't really capture the issue here.
| azan_ wrote:
| Paid models are just much, much better.
| dakiol wrote:
| Of course they are. I wouldn't expect otherwise :)
|
| But the price we're paying (and I don't mean money) is very
| high, imho. We all talk about how good engineers write code
| that depends on high-level abstractions instead of low-level
| details, allowing us to replace third party dependencies
| easily and test our apps more effectively, keeping the core
| of our domain "pure". Well, isn't it time we started doing
| the same with LLMs? I'm not talking about MCP, but rather an
| open source tool that can plug into either free and open
| source LLMs or private ones. That would at least allow us to
| switch to a free and opensource version if the companies
| behind the private LLMs go rogue. I'm afraid tho that
| wouldn't be enough, but it's a starting point.
|
| To give an example: what would you think if you had to pay
| for every single Linux process in your machine? Or for every
| Git commit you make? Or for every debugging session you
| perform?
| azan_ wrote:
| > I'm not talking about MCP, but rather an open source tool
| that can plug into either free and open source LLMs or
| private ones. That would at least allow us to switch to a
| free and opensource version if the companies behind the
| private LLMs go rogue. I'm afraid tho that wouldn't be
| enough, but it's a starting point.
|
| There are open source tools that do exactly that already.
| dakiol wrote:
| Ah, well that's nice. But every single post I read doesn't
| mention them? So, I assume they are not popular for some
| reason. Again, my main point here is the normalization
| of using private LLMs. I don't see anyone talking about
| it; we are all just handing over a huge part of what it
| means to build software to a couple of enterprises whose
| goal is, of course, to maximize profit. So, yeah, perhaps
| I'm overthinking, I don't know; I just don't like that
| these companies are now so ingrained in the act of building
| software (just like AWS is so ingrained in the act of
| running software).
| rahimnathwani wrote:
| But every single post I read doesn't mention them?
|
| Why would they?
|
| Does every single post about a Jetbrains feature mention
| that you can easily switch from Jetbrains to an open
| source editor like VS Code or vim?
| Arainach wrote:
| >every single post I read doesn't mention them
|
| Because the models are so much worse that people aren't
| using them.
|
| Philosophical battles don't pay the bills and for most of
| us they aren't fun.
|
| There have been periods of my life where I stubbornly
| persisted using something inferior for various reasons -
| maybe I was passionate about it, maybe I wanted it to
| exist and was willing to spend my time debugging and
| offer feedback - but there a finite number of hours in my
| life and often I'd much rather pay for something that
| works well than throw my heart, soul, time, and blood
| pressure at something that will only give me pain.
| airstrike wrote:
| None of that applies here since we could all easily switch
| to open models at a moment's notice with limited costs. In
| fact, we switch between proprietary models every few
| months.
|
| It just so happens that closed models are better _today_.
| ghm2180 wrote:
| > I'm not talking about MCP, but rather an open source tool
| that can plug into either free and open source LLMs or
| private ones.
|
| Has someone computed/estimated the at-cost $$$ value of
| utilizing these models at full tilt: several messages per
| minute and at least 500,000-token context windows? What we
| need is a wikipedia like effort to support something truly
| open and continually improving in its quality.
| simonw wrote:
| > I'm not talking about MCP, but rather an open source tool
| that can plug into either free and open source LLMs or
| private ones.
|
| I have been building that for a couple of years now:
| https://llm.datasette.io
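|
| The Python layer looks roughly like this (the model name is
| just an example; plugins provide others, including local
| ones):
|
|     import llm
|
|     model = llm.get_model("gpt-4o-mini")
|     response = model.prompt("Explain this traceback: ...")
|     print(response.text())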
| vunderba wrote:
| > an open source tool that can plug into either free and
| open source LLMs or private ones
|
| Fortunately there are many of these that can integrate with
| offline LLMs through systems like LiteLLM/Ollama/etc. Off
| the top of my head, I'd look into Continue, Cline and
| Aider.
|
| https://github.com/continuedev/continue
|
| https://github.com/cline/cline
|
| https://github.com/Aider-AI/aider
| moron4hire wrote:
| They are a little better. Sometimes that little bit is an
| activation-energy level of difference. But overall, I don't
| see a huge amount of difference in quality between the open
| and closed models. Most of the time, it just takes a little
| more effort to get as good of results out of the open models
| as the closed ones.
| belter wrote:
| The issue is somebody will have to debug and fix what those LLM
| Leeches made up. I guess then companies will have to hire some
| 10x Prompters?
| muglug wrote:
| > Programming used to be (and still is, to a large extent) an
| activity that can be done with open and free tools.
|
| Yet JetBrains has been a business longer than some of my
| colleagues have been alive, and Microsoft's Visual
| Basic/C++/Studio made writing software for Windows much easier,
| and did not come cheap.
| dakiol wrote:
| I see a big difference: I do use Jetbrains IDEs (they are
| nice), but I can switch to vim (or vscode) any time if I need
| to (e.g., let's say JetBrains increases their price to a point
| that doesn't make sense, or perhaps they introduce a
| pervasive feature that cannot be disabled). The problem with
| paid LLMs is that one cannot easily switch to open-source
| ones (because they are not as good as the paid ones). So,
| it's a dependency that cannot be avoided, and that's imho
| something that shouldn't be overlooked.
| wepple wrote:
| Anyone can switch from Claude to llama?
| dakiol wrote:
| I don't think so. Let's do a silly experiment: antirez,
| could you ditch Gemini 2.5 PRO and Claude Opus 4, and
| instead use llama? Like never again go back to
| Gemini/Claude. I don't think he can (I don't think he
| would want to). And this is not on antirez, this is on
| everyone who's paying for LLMs at the moment: they are
| paying for them because they are so damn good compared to
| the open source ones... so there's no incentive to
| switch. But again, that's like climate change:
| there's no incentive to pollute less (well, perhaps to
| save us, but money is more important).
| eevmanu wrote:
| Open-weight and open-source LLMs are improving as well.
| While there will likely always be a gap between closed,
| proprietary models and open models, at the current pace the
| capabilities of open models could match today's closed
| models within months.
| throwaway8879p wrote:
| People who understand the importance of this choice but
| still opt for closed source software are the worst of the
| worst.
|
| You won't be able to switch to a meaningful vim if you
| channel your support to closed source software, not for
| long.
|
| Best to put money where the mouth is.
| dakiol wrote:
| I don't contribute to vim specifically, but I do contribute
| to other open source projects. So, I do like to keep this
| balance between making open source tools better over time
| and using paid alternatives. I don't think that's
| possible tho with LLMs at the moment (and I don't think it
| will be possible in the future, but ofc I could be
| wrong).
| muglug wrote:
| Until you've run a big open-source project you won't
| quite understand how much time and energy it can eat up.
| All that effort won't feed your family.
| rolisz wrote:
| I was a hardcore vim user 10 years ago, but now I just use
| PyCharm to work. I'm paid to solve problems, not to futz
| around with vim configs.
|
| Can you make vim work roughly the same way? Probably you
| can get pretty close. But how many hours do I have to sink
| into the config? A lot. And suddenly the PyCharm license is
| cheap.
|
| And it's exactly the same thing with LLMs. You want hand
| crafted beautiful code, untainted by AI? You can still do
| that. But I'm paid to solve problems. I can solve them
| faster/solve more of them? I get more money.
| skydhash wrote:
| > _I was a hardcore vim user 10 years ago, but now I just
| use PyCharm to work. I 'm paid to solve problems, not to
| futz around with vim configs._
|
| The reason I don't like those arguments is that they
| merge two orthogonal stuff: Solving problems and
| optimizing your tooling. You can optimize PyCharm just as
| much you can fiddle with Vim's config. And people are
| solving problems with Vim just as you do with an
| IDE. It's just a matter of preference.
|
| In my day job, I have two IDEs, VSCode, and Emacs open. I
| prefer Emacs to edit and git usage, but there's a few
| things that only the IDEs can do (as in I don't bother
| setting emacs to do the same), and VSCode is there
| because people get dizzy with the way I switch buffers in
| Emacs.
| LeafItAlone wrote:
| >The reason I don't like those arguments is that they
| merge two orthogonal stuff: Solving problems and
| optimizing your tooling. You can optimize PyCharm just as
| much you can fiddle with Vim's config.
|
| But you're ignoring that the "optimizing tooling" is for
| the goal of making it easier for you. It's spending time
| now to decrease time spent in the long term.
|
| I spent over a decade with Emacs as my sole editor and
| have since spent over a decade with PyCharm. Day 1 of
| PyCharm already had practically everything that it took a
| decade to get working for Emacs, and more. It was pre-
| optimized for me, so I was able to spend more time
| working on my code. Did I need to spend time optimizing
| Emacs? No. But doing so added intellisense and the
| ability to jump around the codebase very quickly. You
| _can_ spend just as much time optimizing Emacs, but I
| didn't _have_ to in order to get the same result. Nor have
| I spent that much time optimizing it since, even for more
| functionality.
| skydhash wrote:
| Emacs is not a Python-oriented IDE, so the comparison is
| moot from the beginning. Some people like what Emacs
| offers and mesh it with external tools. Some prefer a
| more complete package. Nothing wrong with either,
| especially if you have the time and the knowledge to
| iterate quickly on the first.
|
| What you may need is something others can do without. So
| what's best is always subjective.
| freedomben wrote:
| Indeed. I've been a vim user for almost two decades, and
| it's been a long, long time since I've had to spend time
| solving problems/optimizing my tooling. Yes it was a big
| up-front investment, but it's paid off immensely. I don't
| think I'm anything special, so please don't misunderstand
| this as a brag, but I routinely have people enjoy
| "watching" me use vim because I can fly around the
| codebase with lightning speed; often I have already
| followed code paths through several files before VS Code
| is even loaded and ready to work on my coworker's
| machine. The only problem is, for whatever reason, if I
| know somebody is watching, I sometimes get stage fright
| and forget how to use vim for a few seconds at a time
| :-D
| layer8 wrote:
| > because they are not as good as the paid ones
|
| The alternative is to restrict yourself to "not as good"
| ones already now.
| jacobr1 wrote:
| Seems the same to me. If you are using the llm as a tool to
| build your product (rather than a dependency within it for
| functionality) you can easily switch to a different model,
| or IDE/Agentic-coder in the same way you can switch between
| vim and emacs. It might be a `worse` experience for you or
| have fewer features, but you aren't locked in, other than in
| the sense of your preference for productivity. In fact it
| seems likely to me that the tools I'll be using a year
| from now are going to be different than today - and almost
| certainly a different model will be leading. For example
| google surprised everyone with the quality of 2.5.
| ta12653421 wrote:
| Ah, there have been community editions of the most important
| tools for 10+ years, and I doubt e.g. MS will close the
| VS.NET Community Version in the future.
| bgwalter wrote:
| Yes, and what is worse is that the same mega-corporations who
| have been ostensibly promoting equity until 2025 are now
| pushing for a gated development environment that costs the same
| as a monthly rent in some countries or more than a monthly
| salary in others.
|
| That problem does not even include lock-in, surveillance, IP
| theft and all other things that come with SaaS.
| righthand wrote:
| It's weird that programmers will champion paying for LLMs but
| not ad-free web search.
| haiku2077 wrote:
| I pay for search and have convinced several of my
| collaborators to do so as well
| righthand wrote:
| I think the dev population mostly uses free search, just
| based on the fact no one has told me to "Kagi it" yet.
| haiku2077 wrote:
| When I need a facial tissue I ask for a Kleenex even if
| the box says Puffs. Because who says "pass me the Puffs"?
| martsa1 wrote:
| I've been curious about that phenomenon; why not just ask
| "pass me a tissue?"
| haiku2077 wrote:
| Cuz all the adults around me called it a Kleenex when I
| was growing up, and I've internalized that the word for
| that kind of tissue is Kleenex
| righthand wrote:
| Good old American brand loyalty.
| righthand wrote:
| I say "tissue" and "web search" so you're talking to the
| wrong guy with that. Even though growing up everyone
| around me has said Kleenex and Google.
| positron26 wrote:
| Ad-free search doesn't by itself produce a unique product.
| It's just a product that doesn't have noise, noise that
| people with attention spans and focus don't experience at
| all.
|
| Local models are not quite there yet. For now, use the evil
| bad tools to prepare for the good free tools when they do get
| there. It's a self-correcting form of technical debt that we
| will never have to pay down.
| righthand wrote:
| "To prepare for the good free tools"
|
| Why do I have to prepare? Once the good free tools are
| available, it should just work no?
| conradkay wrote:
| They have adblock
| glitchc wrote:
| I'm certain these are advertorials masquerading as personal
| opinions. These people are being paid to promote the product,
| either through outright cash, credits on their platform or just
| swag.
| tptacek wrote:
| So, just so I have this straight, you think antirez is being
| paid by Google to hype Gemini.
| Herring wrote:
| A lot of people are really bad at change. See: immigration.
| Short of giving everyone jazz improv lessons at school,
| there's nothing to be done.
|
| To be fair, change is not always good. We still haven't
| fixed fitness/obesity issues caused (partly) by the
| invention of the car, 150 years later. I think there's a
| decent chance LLMs will have the same effect on the brain.
| simonw wrote:
| I recommend readjusting your advertorial-detecting radar.
| antirez isn't taking kickbacks from anyone.
|
| I added a "disclosures" section to my own site recently, in
| case you're interested:
| https://simonwillison.net/about/#disclosures
| amirhirsch wrote:
| It started out as an innocent kv cache before the redis
| industrial complex became 5% of the GDP
| floucky wrote:
| Why do you see this as a strong dependency? The beauty of it is
| that you can change the model whenever you want. You can even
| just code yourself! This isn't some no-code stuff.
| jstummbillig wrote:
| > Programming used to be (and still is, to a large extent) an
| activity that can be done with open and free tools.
|
| Since when? It starts with computers, the main tool and its
| architecture not being free and goes from there. Major
| compilers used to not be free. Major IDEs used to not be free.
| For most things there were decent and (sometimes) superior free
| alternatives. The same is true for LLMs.
|
| > The excuse "but you earn six figures, what' $200/month to
| you?" doesn't really capture the issue here.
|
| That "excuse" could exactly capture the issue. It does not,
| because you chose to make it a weirder issue. Just as before:
| You will be free to either not use LLMs, or use open-source
| LLMs, or use paid LLMs. Just as before in the many categories
| that pertain to programming. It all comes at a cost that you
| might be willing to pay, and that somebody else is free to
| not care that much about.
| randallsquared wrote:
| > _Major compilers used to not be free. Major IDEs used to
| not be free._
|
| There were and are a lot of non-free ones, but since the
| 1990s, GCC and interpreted languages and Linux and Emacs and
| Eclipse and a bunch of kinda-IDEs were all free, and now VS
| Code is one of the highest marketshare IDEs, and those are
| all free. Also, the most used and learned programming
| language is JS, which doesn't need compilers in the first
| place.
| jstummbillig wrote:
| There are free options and there continue to be non-free
| options. The same is true for LLMs.
| vorador wrote:
| When's the last time you paid for a compiler?
| jstummbillig wrote:
| The original point was that there is some inherent
| tradition in programming being free, with a direct
| critique wrt LLMs, which apparently breaks that
| tradition.
|
| And my point is that's simply not the case. Different
| products have always been not free, and continue to be
| not free. Recent example would be something like Unity,
| that is not entirely free, but has competitors, which are
| entirely free and open source. JetBrains is something
| someone else brought up.
|
| Again: You have local LLMs and I have every expectation
| they will improve. What exactly are we complaining about?
| That people continue to build products that are not free
| and, gasp, other people will pay for them, as they always
| have?
| bluefirebrand wrote:
| > Major compilers used to not be free
|
| There's never been anything stopping you from building your
| own
|
| Soon there will be. The knowledge of how to do so will be
| locked behind LLMs, and other sources of knowledge will be
| rarer and harder to find as a result of everything switching
| to LLM use
| jstummbillig wrote:
| For the past decades knowledge was "locked" behind search
| engines. Could you have rolled your own search engine
| indexing the web, to unlock that knowledge? Yes, in the
| same theoretical way that you can roll your own LLM.
| bluefirebrand wrote:
| There was never anything stopping you from finding other
| avenues than Search Engines to get people to find your
| website. You could find a url on a board at a cafe and
| still find a website without a search engine. More local
| sure, but knowledge had ways to spread in the real world
| when it needed to
|
| How are LLMs equivalent? People posting their prompts on
| bulletin boards at cafes?
| retsibsi wrote:
| But what is (or will be) stopping you from finding
| avenues other than LLMs? You say other sources of
| knowledge will be rarer. But they will still exist, and I
| don't see why they will become _less_ accessible than
| non-search-engine-indexed content is now.
| cafp12 wrote:
| IMO it's not unlike all the other "dev" tools we use. There
| are tons of free and open tools that usually lag a bit behind
| the paid versions. People pay for JetBrains, for macOS, and
| even to search the web (google ads).
|
| You have very powerful open-weight models, but they are not
| the cutting edge. Even those you can't really run locally, so
| you'd have to pay a 3rd party to run them.
|
| Also the competition is awesome to see, these companies are all
| trying hard to get customers and build the best model and
| driving prices down, and giving you options. No one company has
| all of the power; it's great to see capitalism working.
| mirekrusin wrote:
| You don't pay for macOS, you pay for apple device, operating
| system is free.
| cafp12 wrote:
| Thanks captain missing the point
| kgwgk wrote:
| You do pay for the operating system. And for future
| upgrades to the operating system. Revenue recognition is a
| complex and evolving issue.
| antirez wrote:
| It's not that bad: K2 and DeepSeek R1 are at the level of
| frontier models of one year ago (K2 may be even better: I have
| enough experience only with V3/R1). We will see more coming
| since LLMs are incredibly costly to train but very simple in
| their essence (it's as if their fundamental mechanic were built
| into the physical nature of the computation itself), so the
| barrier to entry is large but not insurmountable.
| positron26 wrote:
| If the models are getting cheaper, better, and freer even when
| we use paid ones, then right now is the time to develop
| techniques, user interfaces, and workflows that become the
| inspirations and foundations of a future world of small, local,
| and phenomenally powerful models that have online learning,
| that can formalize their reasoning, that can bake deduction into
| their own weights and code.
| kelvinjps10 wrote:
| Doesn't that already happen, with some people being unable to
| code without Google or similar?
| ozgung wrote:
| > The excuse "but you earn six figures, what' $200/month to
| you?" doesn't really capture the issue here.
|
| Just like every other subscription model, including the one in
| the Black Mirror episode, Common People. The value is too good
| to be true for the price at the beginning. But you become their
| prisoner in the long run, with increasing prices and degrading
| quality.
| lencastre wrote:
| Can you expand on your argument?
| jordanbeiber wrote:
| The argument is perhaps "enshittification", and that
| becoming reliant on a specific provider or even set of
| providers for "important thing" will become problematic
| over time.
| x______________ wrote:
| Not op but a something from a few days ago that might be
| interesting for you: 259. Anthropic
| tightens usage limits for Claude Code without telling users
| (techcrunch.com) 395 points by mfiguiere 2 days ago |
| hide | 249 comments
|
| https://news.ycombinator.com/item?id=44598254
| nico wrote:
| Currently in the front page of HN:
| https://news.ycombinator.com/item?id=44622953
|
| It isn't specific to software/subscriptions but there are
| plenty of examples of quality degradation in the comments
| signa11 wrote:
| enshittification/vendor-lockin/stickiness/... take your
| pick
| nicce wrote:
| There is a reason why companies throw billions into AI and
| still are not profitable. They must be the first ones to
| hook users in for the long run and make the service a
| necessary part of users' lives. And then increase the price.
| mleo wrote:
| Or expect that the price of delivering the service becomes
| cheaper. Or both.
| majormajor wrote:
| I don't think it's subscriptions so much as consumer
| startup pricing strategies:
|
| Netflix/Hulu were "losing money on streaming"-level cheap.
|
| Uber was "losing money on rides"-level cheap.
|
| WeWork was "losing money on real-estate" level cheap.
|
| Until someone releases wildly profitable LLM company
| financials it's reasonable to expect prices to go up in the
| future.
|
| Course, advances in compute are much more reasonable to
| expect than advances in cheap media production, taxi driver
| availability, or office space. So there's a possibility it
| could be different. But that might require capabilities to
| hit a hard plateau so that the compute can keep up. And
| that might make it hard to justify the valuations some of
| these companies have... which could also lead to price
| hikes.
|
| But I'm not as worried as others. None of these have lock-
| in. If the prices go up, I'm happy to cancel or stop using
| it.
|
| For a current student or new grad who has _only_ ever used
| the LLM tools, this could be a rougher transition...
|
| Another thing that would change the calculation is if it
| becomes impossible to maintain large production-level
| systems competitively without these tools. That's
| presumably one of the things the companies are betting on.
| We'll see if they get there. At that point many of us
| probably have far bigger things to worry about.
| bee_rider wrote:
| It isn't even _that_ unreasonable for the AI companies to
| not be profitable at the moment (they are probably
| betting they can decrease costs before they run out of
| money, and want to offer people something like what the
| final experience will be). But it's totally bizarre that
| people are comparing the cost of running locally to the
| current investor-subsidized remote costs.
|
| Eventually, these things should get closer. Eventually
| the hosted solutions have to make money. Then we'll see
| if the costs of securing everything and paying some tech
| company CEO's wage are higher than the benefits of
| centrally locating the inference machines. I expect local
| running will win, but the future is a mystery.
| klabb3 wrote:
| > But I'm not as worried as others. None of these have
| lock-in.
|
| They will. And when they do it will hit hard, especially
| if you're not just a consumer but relying on it for work.
|
| One vector is personalization. Your LLM gets to know you
| and your history. They will not release that to a
| different company.
|
| Another is integrations. Perhaps you're using LLMs for
| assistance, but only Gemini has access to your calendar.
|
| Cloud used to be "rent a server". You could do it
| anywhere, but AWS was good & cheap. Now how hard is it to
| migrate? Can you even afford the egress? How easy is it
| to combine offerings from different cloud providers?
| webappguy wrote:
| I personally can't wait for programming to 'die'. It has stolen
| a decade of my life minimum. Like veterinarians being trained
| to help pets ultimately finding out a huge portion of the job
| is killing them. I was not sufficiently informed that I'd spend
| a decade arguing languages, dealing with thousands of other
| developers with diverging opinions, legacy code, poorly if at
| all maintained libraries, tools, frameworks, etc. If you have
| been in the game at least a decade, please don't @ me. Adios to
| programming as it was (happily welcoming a new DIFFERENT
| reality whatever that means). Nostalgia is for life, not
| staring at a screen 8hrs a day
| bluefirebrand wrote:
| Feel free to change careers and get lost, no one is forcing
| you to be a programmer.
|
| If you feel it is stealing your life, then please feel free
| to reclaim your life at any time.
|
| Leave the programming to those of us who actually want to do
| it. We don't want you to be a part of it either
| oblio wrote:
| Don't be rude.
| __loam wrote:
| He's being honest, not rude
| oblio wrote:
| Honesty doesn't look like this:
|
| > [...] get lost [...]
|
| > [..] We don't want you to be a part of it either. [...]
|
| He's being rude.
|
| Honesty would be, something like:
|
| > I (and probably many others) like programming a lot.
| Even if you're frustrated with it, I think a great deal
| of people will be sad if somehow programming disappeared
| completely. It might be best for you if you just found a
| job that you love more, instead.
|
| Also the original comment makes a point that's SUPER
| valid and anyone working as a professional programmer for
| 10+ years can't really deny:
|
| > poorly if at all maintained libraries, tools,
| frameworks
|
| Most commercial code just sucks due to misaligned
| incentives. Open Source is better, but not always, as a
| lot of Open Source code is just commercial code opened up
| after the fact.
| theshackleford wrote:
| > Honesty doesn't look like this
|
| Sure it does. Reads incredibly honestly to me.
| lbrito wrote:
| Maybe it's just not for you.
|
| I've been programming professionally since 2012 and still
| love it. To me the sweet spot must've been the early-to-mid
| 2000s, with good enough search engines and ample
| documentation online.
| llbbdd wrote:
| You got some arguably rude replies to this but you're right.
| I've been doing this a long time and the stuff you listed is
| never the fun part despite some insistence on HN that it
| somehow is. I love programming as a platonic ideal but those
| moments are fleeting amid the crap you described, and I
| can't wait for it to go.
| hsuduebc2 wrote:
| Did you expect computer programming not to involve this much
| time at a computer screen? Most modern jobs especially in
| tech do. If it's no longer fulfilling, it might be worth
| exploring a different role or field instead of waiting for
| the entire profession to change.
|
| I understand your frustration but the problem is mostly
| people. Not the particular skill itself.
| vunderba wrote:
| _> It has stolen a decade of my life minimum._
|
| Feels like this is a byproduct of a poor work-life balance
| more than an intrinsic issue with programming itself. I also
| can't really relate since I've always enjoyed discussing
| challenging problems with colleagues.
|
| I'm assuming by "die" you mean some future where autonomous
| agentic models handle all the work. In this world, where you
| can delete your entire programming staff and have a single PM
| who tells the models what features to implement next, where
| do you imagine you fit in?
|
| I just hope for your sake that you have a fallback set of
| viable skills to survive in this theoretical future.
| midasz wrote:
| Damn dude. I'm just having fun most of the time. The field is
| so insanely broad that if you've got an ounce of affinity
| there's a corner that would fit you snugly AND you'd make a
| decent living. Take a look around.
| jacooper wrote:
| Kimi k2 exists now.
| kelvinjps10 wrote:
| LLMs are basically free? Yes, you're rate-limited, but I've
| only just started paying for them now; before, I'd bounce
| around between the providers, still for free.
| jacobr1 wrote:
| The most cutting-edge models aren't usually free, at least at
| first.
| ipaddr wrote:
| They are good enough for 90% of people and 90% of the
| cases I would trust an LLM with.
|
| What advantages are people getting on these new models?
| simonw wrote:
| The models I can run locally aren't as good yet, and are _way_
| more expensive to operate.
|
| Once it becomes economical to run a Claude 4 class model
| locally you'll see a lot more people doing that.
|
| The closest you can get right now might be Kimi K2 on a pair of
| 512GB Mac Studios, at a cost of about $20,000.
| QRY wrote:
| Have you considered the Framework Desktop setup they
| mentioned in their announcement blog post[0]? Just marketing
| fluff, or is there any merit to it?
|
| > The top-end Ryzen AI Max+ 395 configuration with 128GB of
| memory starts at just $1999 USD. This is excellent for
| gaming, but it is a truly wild value proposition for AI
| workloads. Local AI inference has been heavily restricted to
| date by the limited memory capacity and high prices of
| consumer and workstation graphics cards. With Framework
| Desktop, you can run giant, capable models like Llama 3.3 70B
| Q6 at real-time conversational speed right on your desk. With
| USB4 and 5Gbit Ethernet networking, you can connect multiple
| systems or Mainboards to run even larger models like the full
| DeepSeek R1 671B.
|
| I'm futzing around with setups, but adding up the specs would
| give 384GB of VRAM and 512GB total memory, at a cost of about
| $10,000-$12,000. This is all highly dubious napkin math, and
| I hope to see more experimentation in this space.
|
| There's of course the moving target of cloud costs and
| performance, so analysing break-even time is even more
| precarious. So if this sort of setup would work, its cost-
| effectiveness is a mystery to me.
|
| [0] https://frame.work/be/en/blog/introducing-the-framework-
| desk...
| cheeze wrote:
| I love Framework but it's still not enough IMO. My time is
| the most valuable thing, and a subscription to
| $paid_llm_of_choice is _cheap_ relative to my time spent
| working.
|
| In my experience, something like Llama 3.3 works really well for
| smaller tasks. For "I'm lazy and want to provide minimal
| prompting for you to build a tool similar to what is in
| this software package already", paid LLMs are king.
|
| If anything, I think the best approach for free LLMs would
| be to run using rented GPU capacity. I feel bad knowing
| that I have a 4070ti super that sits idle for 95% of the
| time. I'd rather share an a1000 with a bunch of folks and
| have that run at close to max utilization.
| generic92034 wrote:
| > and a subscription to $paid_llm_of_choice is _cheap_
| relative to my time spent working.
|
| In the mid to long term the question is whether the
| subscription covers the costs of the LLM provider.
| Current costs might not be stable for long.
| lhl wrote:
| Strix Halo does not run a 70B Q6 dense model at real-time
| conversational speed - it has a real-world memory bandwidth
| (MBW) of about 210 GB/s. A 40GB Q4 will clock just over 5
| tok/s. A Q6 would be slower.
|
| It _will_ run some big MoEs at a decent speed (eg, Llama 4
| Scout 109B-A17B Q4 at almost 20 tok/s). The other issue is
| its prefill - only about 200 tok/s, due to very
| under-optimized RDNA3 GEMMs. From my testing, you usually
| have to trade off pp (prompt processing) for tg (token
| generation).
|
| If you are willing to spend $10K for hardware, I'd say you
| are much better off w/ EPYC and 12-24 channels of DDR5, and
| a couple of fast GPUs for shared experts and TFLOPS. But,
| unless you are doing all-night batch processing, that $10K
| is probably better spent on paying per token or even
| renting GPUs (especially when you take into account power).
|
| Of course, there may be other reasons you'd want to
| inference locally (privacy, etc).
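|
| (For the curious, the napkin math behind those numbers is
| just memory bandwidth divided by model size, since decode is
| bandwidth-bound - every generated token has to stream all of
| the weights once. A rough sketch in Python, using the
| figures above:
|
|     bandwidth_gb_s = 210  # measured real-world memory bandwidth
|     model_size_gb = 40    # ~70B dense model at Q4
|     # decode streams every weight once per generated token
|     print(bandwidth_gb_s / model_size_gb, "tok/s")  # 5.25
|
| The same division is why MoE helps: only the active experts'
| weights are read per token.)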
| moffkalast wrote:
| Yeah, it's only really viable for chat use cases; coding
| is the most demanding in terms of generation speed. To
| keep the workflow usable it needs to spit out corrections
| in seconds, not minutes.
|
| I use local LLMs as much as possible myself, but coding
| is the only use case where I still entirely defer to
| Claude, GPT, etc. because you need both max speed and
| bleeding edge model intelligence for anything close to
| acceptable results. When Qwen-3-Coder lands, having it
| on RunPod might be a viable low-end alternative, but
| likely still a major waste of time when you actually need
| to get something done properly.
| zackify wrote:
| The memory bandwidth is crap and you'll never run anything
| close to Claude on that, unfortunately. They should have
| shipped something 8x faster, with at least 2 TB/s of
| bandwidth.
| smcleod wrote:
| The framework desktop isn't really that compelling for work
| with LLMs; its memory bandwidth is very low compared to
| GPUs and Apple Silicon Max/Ultra chips - you'd really
| notice how slow LLMs are on it to the point of frustration.
| Even a 2023 Macbook Pro with a M2 Max chip has twice the
| usable bandwidth.
| zer00eyz wrote:
| > Once it becomes economical to run a Claude 4 class model
| locally you'll see a lot more people doing that.
|
| Historically these sorts of things happened because of Moore's
| law. Moore's law is dead. For a while we have scaled on the
| back of "more cores", and process shrink. It looks like we
| hit the wall again.
|
| We seem to be near the limit of scaling (physics); we're not
| seeing a lot in clock (some, but not enough), and IPC is flat.
| We are also having power (density) and cooling (air won't cut
| it any more) issues.
|
| The requirements to run something like Claude 4 locally
| aren't going to reach household consumers any time soon.
| Simply put, the very top end of consumer PCs looks like
| 10-year-old server hardware, and very few people are running
| that because there isn't a need.
|
| The only way we're going to see better models locally is if
| there is work (research, engineering) put into it. To be
| blunt, that isn't really happening, because Fb/MS/Google are
| scaling in the only way they know how. Throw money at it to
| capture and dominate the market, lock out the innovators from
| your API and then milk the consumer however you can. Smaller,
| and local is antithetical to this business model.
|
| Hoping for the innovation that gives you a moat, that makes
| you the next IBM isn't the best way to run a business.
|
| Based on how often Google cancels projects, based on how
| often the things Zuck swears are "next" face-plant (the
| metaverse), one should not have a lot of hope for AI.
| esafak wrote:
| Model efficiency is outpacing Moore's law. That's what
| DeepSeek V3 was about. It's just that we're simultaneously
| finding ways to increase model capacity, and that's growing
| even faster...
| zer00eyz wrote:
| > Model efficiency is outpacing Moore's law.
|
| Moore's law is dead, and has been for a long time. There is
| nothing to outpace.
|
| > That's what DeepSeek V3 was about.
|
| This would be a foundational shift! What problem in
| complexity theory was solved that the rest of computing
| missed out on?
|
| Don't get me wrong, MoE is very interesting, but breaking
| up one large model into independent chunks isn't a
| foundational breakthrough; it's basic architecture. It's
| 1960s time-sharing and Unix basics. It's decomposition of
| your application basics.
|
| All that having been said, there is a ton of room for
| these sorts of basic, blood and guts engineering ideas to
| make systems more "portable" and "usable". But a shift in
| thinking to small, targeted and focused will have to
| happen. That's antithetical to the everything-in-one-basket,
| throw-more-compute-at-it-and-we-magically-get-AGI approach.
| That clearly isn't the direction the industry is going... it
| won't give anyone a moat, or market dominance.
| moron4hire wrote:
| I agree with you that Moore's Law being dead means we
| can't expect much more from current, silicon-based GPU
| compute. Any improvement from hardware alone is going to
| have to come from completely new compute technology, of
| which I don't think there is anything mature enough to
| expect any results in the next 10 years.
|
| Right now, hardware wise, we need more RAM in GPUs than
| we really need compute. But it's a breakpoint issue: you
| need enough RAM to hold the model. RAM short of the model's
| size doesn't improve things much, and RAM beyond it is
| largely dead weight.
|
| I don't think larger models are going to show any major
| inference improvements. They hit the long tail of
| diminishing returns re: model training vs quality of
| output at least 2 years ago.
|
| I think the best anyone can hope for in optimizing
| current LLM technology is to improve the performance of
| inference engines, and there at most I can imagine only
| about a 5x improvement. That would be a really long tail
| of performance optimizations that would take at least a
| decade to achieve. In the 1 to 2 year timeline, I think
| the best that could be hoped for is a 2x improvement. But
| I think we may have already seen much of the low hanging
| optimization fruit already picked, and are starting to
| turn the curve into that long tail of incremental
| improvements.
|
| I think everyone betting on LLMs improving the
| performance of junior to mid level devs and that leading
| to a Renaissance of software development speed is wildly
| over optimistic as to the total contribution to
| productivity those developers already represent. Most of
| the most important features are banged out by harried,
| highly skilled senior developers. Most everyone else is
| cleaning up around the edges of that. Even a 2 or 3x
| improvement of the bottom 10% of contributions is only
| going to grow the pie just so much. And I think these
| tools are basically useless to skilled senior devs. All
| this "boilerplate" code folks keep cheering the AI is
| writing for them is just not that big of a deal. 15
| minutes of savings once a month.
|
| But I see how this technology works and what people are
| asking it to do (which in my company is basically "all
| the hard work that you already weren't doing, so how are
| you going to even instruct an LLM to do it if you don't
| really know how to do it?") and there is such a huge gap
| between the two that I think it's going to take at least
| a 100x improvement to get there.
|
| I can't see AI being all that much of an improvement on
| productivity. It still gives wrong results too many
| times. The work needed to make it give good results is
| the same sort of work we should have been doing already
| to be able to leverage classical ML systems with more
| predictable performance and output. We're going to spend
| trillions as an industry trying to chase AI that will
| only end up being an exercise in making sure documents
| are stored in a coherent, searchable way. At which point,
| why not do just that and avoid having to pressure the
| energy industry to firing up a bunch of old coal plants
| to meet demand?
| viraptor wrote:
| > What problem in complexity theory was solved
|
| None. We're still in the "if you spend enough effort you
| can make things less bad" era of LLMs. It will be a while
| before we even find out what are the theoretical limits
| in that area. Everyone's still running on roughly the
| same architecture after all - big corps haven't even
| touched recursive LLMs yet!
| mleo wrote:
| Why wouldn't 3rd party hardware vendors continue to work on
| reducing costs of running models locally? If there is a
| market opportunity for someone to make money, it will be
| filled. Even if the cloud vendors don't develop the
| hardware, someone will. Apple has a vested interest in making
| hardware to run better models locally, for example.
| zer00eyz wrote:
| > Why wouldn't 3rd party hardware vendors continue to
| work on reducing costs of running models locally?
|
| Everyone wants this to happen, and they are all trying, but...
|
| EUV, which has gotten us down to 3nm and below, is HARD.
| Reduction in chip size has led to increases in density
| and lower costs. But now yields are DOWN and the design
| concessions to make the processes work are hurting costs
| and performance. There are a lot of hopes and prayers in
| the 1.8 nodes but things look grim.
|
| Power is a massive problem for everyone. It is a MASSIVE
| problem IN the data center and it is a problem for GPUs
| at home. Considering that "locally" means a phone for
| most people, it's an even bigger problem. With all this
| power comes cooling issues. The industry is starting to
| look at all sorts of interesting ways to move heat away
| from cores... ones that don't involve air.
|
| Design has hit a wall as well. If you look at NVIDIA's
| latest offering's IPC (that's Instructions Per Clock
| cycle), you will find it is flat. The only gains between
| the latest generation and the previous one have come from
| small frequency upticks. Those gains came from using
| "more power!!!", and that's a problem because...
|
| Memory is a problem. There is a reason that the chips for
| GPU's are soldered on to the boards next to the
| processors. There is a reason that laptops have them
| soldered on too. CAMM tries to fix some of this but the
| results are, to say the least, disappointing thus far.
|
| All of this has been hitting cpu's slowly, but we have
| also had the luxury of "more cores" to throw at things.
| If you go back 10-15 years a top end server is about the
| same as a top end desktop today (core count, single core
| perf). Because of all of the above issues I don't think
| you are going to get 700+ core consumer desktops in a
| decade (current high end for server CPU)... because of
| power, costs etc.
|
| Unless we see some foundational breakthrough in hardware
| (it could happen), you won't see the normal generational
| lift in performance that you have had in the past (and I
| would argue that we already haven't been seeing that).
| Someone is going to have to make MAJOR investments in the
| software side, and there is NO MOAT by doing so. Simply
| put it's a bad investment... and if we can't lower the
| cost of compute (and it looks like we can't) it's going to
| be hard for small players to get in and innovate.
|
| It's likely you're seeing a very real wall.
| smallerize wrote:
| I don't have to actually run it locally to remove lock-in.
| Several cloud providers offer full DeepSeek R1 or Kimi K2 for
| $2-3/million output tokens.
| ketzo wrote:
| In what ways is that better for you than using eg Claude?
| Aren't you then just "locked in" to having a cloud provider
| which offers those models cheaply?
| viraptor wrote:
| Any provider can run Kimi (including yourself if you
| would get enough use out of it), but only one can run
| Claude.
| oblio wrote:
| The thing is, code is quite compact. Why do LLMs need to
| train on content bigger than the size of the textual internet
| to be effective?
|
| Total newb here.
| airspresso wrote:
| Many reasons, one being that LLMs are essentially
| compressing the training data to unbelievably small data
| volumes (the weights). When doing so, they can only afford
| to keep the general principles and semantic meaning of the
| training data. Bigger models can memorize more than smaller
| ones of course, but are still heavily storage limited.
| Through this process they become really good at semantic
| understanding of code and language in general. It takes a
| certain scale of training data to achieve that.
| oblio wrote:
| Yeah, I just asked Gemini and apparently some older
| estimates put a relatively filtered dataset of Github
| source code at around 21TB in 2018, and some more recent
| estimates could put it in the low hundreds of TB.
|
| Considering as you said, that LLMs are doing a form of
| compression, and assuming generously that you add extra
| compression on top, yeah, now I understand a bit more.
| Even if you focus on non-similar code to get the most
| coverage, I wouldn't be shocked if a modern,
| representative source code training set from GitHub
| weighed 1TB, which obviously is a lot more than consumer
| grade hardware can bear.
|
| I guess we need to ramp up RAM production a bunch more
| :-(
|
| Speaking of which, what's the next bottleneck apart from
| storing the damned things? Training needs a ton of
| resources, but that part can be pooled; even for OSS
| models, it "just" needs to be done "once", and then the
| entire community can use the result. So I guess inference
| is the scaling cost; what's the most used resource there?
| RAM bandwidth?
| Fervicus wrote:
| > I cannot understand how programmers don't mind adding a
| strong dependency on a third party in order to keep programming
|
| And how they don't mind freely opening up their codebase to
| these bigtech companies.
| LeafItAlone wrote:
| You mean the same companies they are hosting their VCS in and
| providing the infrastructure they deploy their codebases to?
| All in support of their CRUD application that is in a space
| with 15 identical competitors? My codebase is not my secret
| sauce.
| Fervicus wrote:
| Sure, the codebase itself isn't special. But it's the
| principle and ethics of it all. These companies trained
| their models unethically without consequence, and now
| people are eating up their artificially inflated hype and
| are lining up to give them money and their data on a silver
| platter.
| geoka9 wrote:
| > > I cannot understand how programmers don't mind adding a
| > > strong dependency on a third party in order to keep
| > > programming
| >
| > And how they don't mind freely opening up their codebase
| > to these bigtech companies.
|
| And how they don't mind opening up their development machines
| to agents driven by a black-box program that is run in the
| cloud by a vendor that itself doesn't completely understand
| how it works.
| fragmede wrote:
| > Programming used to be (and still is, to a large extent) an
| activity that can be done with open and free tools.
|
| Not without a lot of hard thankless work by people like RMS to
| write said tools. Programming for a long while was the purview
| of the Microsoft Visual Studio family, which cost hundreds,
| if not thousands, of dollars. There existed other options,
| some of which were free, but, as is the case today with LLMs
| you can run
| at home, they were often worse.
|
| This is why making software developer tools is such a tough
| market and why debugging remains basically in the dark ages
| (though there are the occasional bright lights like rr). Good
| quality tools are expensive, for doctors and mechanics, why do
| we as software developers expect ours to be free, libre and
| gratis?
| TacticalCoder wrote:
| I rely on these but there's zero loyalty. The moment something
| better is there, like when Gemini 2.5 Pro showed up, I
| immediately switch.
|
| That's why I drink the whole tools kool-aid. From TFA:
|
| > In this historical moment, LLMs are good amplifiers and bad
| one-man-band workers.
|
| That's how I use them: write a function here, explain an error
| message there. I'm still in control.
|
| I don't depend on LLMs: they just amplify.
|
| I can pull the plug immediately and I'm still able to code, as
| I was two years ago.
|
| Shall DeepSeek release a free SOTA model? I'll then use that
| model locally.
|
| It's not because I use LLMs that I have a strong dependency on
| them.
|
| Using JetBrains' IntelliJ IDEA back when many here were
| still kids (and, yup, it was lightyears better than NetBeans
| and Eclipse) didn't give me a strong dependency on JetBrains
| tools, either.
|
| I'm back to Emacs and life is good: JetBrains IDEs didn't make
| me forget how to code, just as LLMs won't.
|
| They're just throwaway tools and are to be regarded as such.
| throw-number9 wrote:
| > Programming used to be (and still is, to a large extent) an
| activity that can be done with open and free tools. I am afraid
| that in a few years, that will no longer be possible .. The
| excuse "but you earn six figures, what' $200/month to you?"
| doesn't really capture the issue here.
|
| Yeah, coding (and to a lesser extent IT in general) at one
| point was a real meritocracy, where skill mattered more than
| expensive/unnecessary academic pedigree. Not perfect of course,
| but real nevertheless. And coders were the first engineers who
| really said "I won't be renting a suit for an interview, I
| think an old t-shirt is fine" and we normalized that. Part of
| this was just uncompromisingly practical.. like you can either
| do the work or not, and fuck the rest of that noise. But there
| was also a pretty punk aspect to this for many people in the
| industry.. some recognition that needing to have money to make
| money was a bullshit relic of closeted classism.
|
| But we're fast approaching a time where both the old metrics
| _(how much quality code are you writing, how fast, and what's
| your personal open source portfolio like?)_ and the new metrics
| _(are you writing a blog post every week about your experience
| with the new models, is your personal computer fast enough to
| even try to run crappy local models?)_ are going to favor
| those with plenty of money to experiment.
|
| It's not hard to see how this will make inequality worse and
| disadvantage junior devs, or just talented people that didn't
| plan other life-events around purchasing API credits/GPUs. A
| pay-to-play kind of world was ugly enough in politics and
| business so it sucks a lot to see it creeping into engineering
| disciplines but it seems inevitable. If paying for tokens/GPU
| ever allows you to purchase work or promotion by proxy, we're
| right back to this type of thing
| https://en.wikipedia.org/wiki/Purchase_of_commissions_in_the...
| LeafItAlone wrote:
| >I am afraid that in a few years, that will no longer be
| possible (as in most programmers will be so tied to a paid LLM
|
| As of now, I'm seeing no lock-in for any LLM. With tools like
| Aider, Cursor, etc., you can switch on a whim. And with Aider, I
| do.
|
| That's what I currently don't get in terms of investment.
| Companies (in many instances, VCs) are spending billions of
| dollars and tomorrow someone else eats their lunch. They are
| going to need to determine that method of lock-in at some
| point, but I don't see it happening with the way I use the
| tools.
| jerrygenser wrote:
| They can lock in by subsidizing the price if you use their
| tool, while making the default price higher for wrappers.
| This can draw people from the wrapper that can support
| multiple models to the specific CLI that supports the
| proprietary model.
| rapind wrote:
| Not an issue and I'll tell you why.
|
| If the gains plateau, then there's really no need to make
| productivity sacrifices here for the societal good, because
| there's so much competition, and various levels of open models
| that aren't far behind, that there will be no reason to stick
| with a hostile and expensive service unless its tooling stays
| leaps ahead of the competition.
|
| If the gains don't plateau, well then we're obsolete anyways,
| and will need to pivot to... something?
|
| So I sympathize, but pragmatically I don't think there's much
| point in stressing it. I also suspect the plateau is coming and
| that the stock of these big players is massively overvalued.
| 20k wrote:
| +1, I use exclusively free tools for this exact reason. I've
| been using the same tools for 15 years now (GCC + IDE), and
| they work great
|
| There is a 0% chance that I'm going to subscribe to being able
| to program, because it's actively a terrible idea. You have to
| be _very_ naive to think that any of these companies are still
| going to be around and supporting your tools in 10-20 years'
| time, so if you get proficient with them you're absolutely
| screwed.
|
| I've seen people say that AI agents are great because instead
| of using git directly, they can ask their AI agent to do it.
| Which would be fine if it was a free tool, but you're
| subscribing to the ability to even start and maintain projects
|
| A lot of people are about to learn an extremely blunt lesson
| about capitalism
| moron4hire wrote:
| A lot of people's problems with Git would go away if they
| just took a weekend and "read the docs." It's shocking how
| resistant most people are to the idea of studying to improve
| their craft.
|
| I've been spending time with my team, just a few hours a
| week, on training them on foundational things, vs every other
| team in the company just plodding along, trying to do things
| the same way they always have, which already wasn't working.
| It's gotten to where my small team of 4 is getting called in
| to clean up after these much larger teams fail to deliver.
| I'm pretty proud of my little junior devs.
| jonas21 wrote:
| How is it a strong dependency? If Claude were to disappear
| tomorrow, you could just switch to Gemini. If all proprietary
| LLMs were to disappear tomorrow (I don't know how that would
| happen, but let's suppose for the sake of argument), then you
| switch to free LLMs, or even just go back to doing everything
| by hand. There's very little barrier to switching models if you
| have to.
| overgard wrote:
| I think it's an unlikely future.
|
| What I think is more likely is people will realize that every
| line of code written is, to an extent, a liability, and
| generating massive amounts of sloppy insecure poorly performing
| code is a massive liability.
|
| That's not to say that AI's will go away, obviously, but I
| think when the hype dies down and people get more accustomed to
| what these things can and can't do well we'll have a more
| nuanced view of where these things should be applied.
|
| I suppose what's still not obvious to me is what happens if the
| investment money dries up. OpenAI and Anthropic, as far as I
| know, aren't anywhere near profitable and they require record
| breaking amounts of capital to come in just to sustain what
| they have. If what we currently see is the limit of what LLM's
| and other generative techniques can do, then I can't see that
| capital seeing a good return on its investment. If that's the
| case, I wonder if when the bubble bursts these things become
| massively more expensive to use, or get taken out of products
| entirely. (I won't be sad to see all the invasive Copilot
| buttons disappear..)
| kossae wrote:
| The point on investment is apt. Even if they achieve twice as
| much as they're able to today (some doubts amongst experts
| here), we've seen what happens when the VC funding dries up.
| It's time to pay the piper. The prices rise to Enterprise-
| plan amounts, and companies start making much more real ROI
| decisions on these tools past the hype bubble. Will be
| interesting to see how that angle plays out. I'm no denier
| nor booster, but in a capitalist society these things
| inevitably balance out.
| paulddraper wrote:
| And you thought Visual Studio was free?
|
| Or Windows?
|
| Or an Apple Developer License?
|
| There are free/free-ish, options, but there have always been
| paid tools.
| sneak wrote:
| Strong dependency? I can still code without LLMs, just an order
| of magnitude slower.
|
| There is no dependency at all.
| cheschire wrote:
| I find agentic coding to be best when using one branch per
| conversation. Even if that conversation is only a single bugfix,
| branch it. Then do 2 or 3 iterations of that same conversation
| across multiple branches and choose the best result of the 3 and
| destroy the other two.
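|
| A minimal sketch of that flow, scripted in Python (branch
| names and the three-attempt count are arbitrary; each branch
| hosts one conversation):
|
|     import subprocess
|
|     def git(*args):
|         subprocess.run(["git", *args], check=True)
|
|     for i in (1, 2, 3):
|         git("switch", "-c", f"bugfix-attempt-{i}", "main")
|         # ...run one agent conversation on this branch,
|         # letting it commit its result...
|     git("switch", "main")
|     git("merge", "bugfix-attempt-2")  # keep the best one
|     git("branch", "-D", "bugfix-attempt-1", "bugfix-attempt-3")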
| nlh wrote:
| Can anyone recommend a workflow / tools that accomplish a
| _slightly_ more augmented version of antirez' workflow &
| suggestions minus the copy-pasting?
|
| I am on board to agree that pure LLM + pure original full code as
| context is the best path at the moment, but I'd love to be able
| to use some shortcuts like quickly applying changes, checkpoints,
| etc.
|
| My persistent (and not unfounded?) worry is that all the major
| tools & plugins (Cursor, Cline/Roo) all play games with their own
| sub-prompts and context "efficiency".
|
| What's the purest solution?
| cheeseface wrote:
| Claude Code has worked well for me. It is easy to point it to
| the relevant parts of the codebase and see what it decides to
| read on its own, so you can provide missing pieces of code
| when necessary.
| afro88 wrote:
| This is almost the opposite of what OP is asking, and what
| the post from antirez describes.
| afro88 wrote:
| You can actually just put Cursor in manual mode and it's the
| same thing. You 100% manage the context and there's no agentic
| loop.
|
| If your codebase fits in the context window, you can also just
| turn on "MAX" mode and it puts it all in the context for you.
| airstrike wrote:
| I think all conversations about coding with LLMs, vibe coding,
| etc. need to note the domain and choice of programming language.
|
| IMHO those two variables are 10x (maybe 100x) more explanatory
| than any vibe coding setup one can concoct.
|
| Anyone who is befuddled by how the other person {loves, hates}
| using LLMs to code should ask what kind of problem they are
| working on and then try to tackle the same problem with AI to get
| a better sense for their perspective.
|
| Until then, every one of these threads will have dozens of
| messages saying variations of "you're just not using it right"
| and "I tried and it sucks", which at this point are just noise,
| not signal.
| cratermoon wrote:
| They should also share their prompts and discuss exactly _how
| much_ effort went into checking the output and re-prompting to
| get the desired result. The post hints at how much work it
| takes for the human, "If you are able to describe problems in
| a clear way and, if you are able to accept the back and forth
| needed in order to work with LLMs ... you need to provide
| extensive information to the LLM: papers, big parts of the
| target code base ... And a brain dump of all your understanding
| of what should be done. Such braindump must contain especially
| the following:" and more.
|
| After all the effort getting to the point where the generated
| code is acceptable, one has to wonder, why not just write it
| yourself? The time spent typing is trivial next to the cognitive
| effort involved in describing the problem, and describing the
| problem in a rigorous way is the essence of programming.
| tines wrote:
| I would assume the argument is that you only need to provide
| the braindump and extensive information one time (or at
| least, collect it once, if not upload once) and then you can
| take your bed of ease as the LLM uses that for many tasks.
| skydhash wrote:
| The thing is, no one writes that much code, at least no one
| who cares about code reuse. Most of the time is spent
| collecting information (especially communicating with
| stakeholders), and verifying that the code you wrote didn't
| break anything.
| UncleEntity wrote:
| > After all the effort getting to the point where the
| generated code is acceptable, one has to wonder, why not just
| write it yourself?
|
| You know, I would often ask myself that very question...
|
| Then I discovered the stupid robots are good at designing a
| project, you ask them to produce a design document, argue
| over it with them for a while, make revisions and changes,
| explore new ideas, then, finally, ask them to produce the
| code. It's like being able to interact with the yaks you're
| trying to shave, what's not to love about that?
| datastoat wrote:
| > They should also share their prompts
|
| Here's a recent ShowHN post (a map view for OneDrive photos),
| which documents all the LLM prompting that went into it:
|
| https://news.ycombinator.com/item?id=44584335
| skippyboxedhero wrote:
| Have used Claude's GitHub action quite a bit now (10-20 issue
| implementations, a bit more PR reviews), and it is hit and miss,
| so I agree with the enhanced-coding approach rather than just
| letting it run loose.
|
| When the change is a very small, self-contained feature/refactor,
| it can mostly work alone; if you have tests that cover the feature
| then it is relatively safe (and you can do other stuff because it
| is running in an action, which is a big plus...write the issue
| and you are done, sometimes I have had Claude write the issue
| too).
|
| When it gets to a more medium size, it will often produce
| something that will appear to work but actually doesn't. Maybe I
| don't have test coverage and it is my fault but it will do this
| the majority of the time. I have tried writing the issue myself,
| adding more info to claude.md, and letting Claude write the issue
| so it is in a language it understands, but nothing works, and it is
| quite frustrating because you spend time on the review and then
| see something wrong.
|
| And anything bigger, unsurprisingly, it doesn't do well.
|
| PR reviews are good for small/medium tasks too. The bar is lower
| here, though; much of it is useless, but it does catch things I
| have missed.
|
| So, imo, still quite a way from being able to do things
| independently. For small tasks, I just get Claude to write the
| issue, and wait for the PR...that is great. For medium (which is
| most tasks), I don't need to do much actual coding, just
| directing Claude...but that means my productivity is still way
| up.
|
| I did try Gemini but I found that when you let it off the leash
| and accept all edits, it would go wild. We have Copilot at work
| reviewing PRs, and it isn't so great. Maybe Gemini is better on
| large codebases where, I assume, Claude will struggle.
| fumeux_fume wrote:
| I currently use LLMs as a glorified Stack Overflow. If I want to
| start integrating an LLM like Gemini 2.5 PRO into my IDE (I use
| Visual Studio Code), what's the best way to do this? I don't want
| to use a platform like Cursor or Claude Code which takes me away
| from my IDE.
| hedgehog wrote:
| GitHub Copilot is pretty easy to try within VS Code
| fumeux_fume wrote:
| I want to use Gemini 2.5 PRO. I was an early tester of
| Copilot and it was awful.
| kgwgk wrote:
| https://docs.github.com/en/copilot/reference/ai-
| models/suppo...
| fumeux_fume wrote:
| Thank you! When I was testing out Copilot I was stuck
| with whatever default LLM was being used. Didn't realize
| you could switch it out for a non-MS/OpenAI model.
| hedgehog wrote:
| In my testing Sonnet 4 is far better than any of the
| Google or OpenAI models.
| haiku2077 wrote:
| Copilot has 2.5 Pro in the settings on github.com, along
| with Claude 4.
| physicles wrote:
| Cursor is an IDE. You can use its powerful (but occasionally
| wrong) autocomplete, and start asking it to do small coding
| tasks using the Ctrl+L side window.
| fumeux_fume wrote:
| I don't want to leave my IDE
| Philpax wrote:
| Worth noting that Cursor is a VS Code fork and you can copy
| all of your settings over to it. Not saying that you have
| to, of course, but that it's perhaps not as different as
| you might be imagining.
| anthomtb wrote:
| Does running a Claude Code command in VSCode's integrated
| terminal count as leaving your IDE?
|
| (We may have differing definitions of "leaving" ones IDE).
| cyral wrote:
| I don't either but unfortunately Cursor is better than all
| the other plugins for IDEs like JetBrains. I just tab over
| to cursor and prompt it, then edit the code in my IDE of
| choice.
| brainless wrote:
| Lovely post @antirez. I like the idea that LLMs should be
| directly accessing my codebase and there should be no agents in
| between. Basically no software that filters what the LLM sees.
|
| That said, are there tools that make going through a codebase
| easier for LLMs? I guess tools like Claude Code simply grep
| through the codebase and find out what Claude needs. Is that good
| enough or are there tools which keep a much more thorough view of
| the codebase?
| DSingularity wrote:
| Sorry if I missed it in the article -- what's your setup? Do you
| use a CLI tool like aider or are you using an IDE like cursor?
| quantumHazer wrote:
| He uses vim and copy-pastes code from web interfaces because he
| wants to maintain control and understanding of the code. You
| can find proof of this setup on his YouTube channel
| [https://www.youtube.com/@antirez]
| antirez wrote:
| Thanks. Also, depending on the coding rig you use, models may
| not match the performance of what is served via the web. Or may
| not be as cheap. For instance, the $20 Gemini 2.5 Pro account
| is very hard to saturate with queries.
| antirez wrote:
| Terminal with vim in one side, the official web interface of
| the model in the other side. The pbcopy utility to pass stuff
| in the clipboard. I believe models should be used in their
| native interface as when there are other layers sometimes the
| model served is not exactly the same, other times it misbehaves
| because of RAG and in general no exact control of the context
| window.
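|
| (On macOS that copy step is scriptable; a small sketch, with
| the file path as an example only:
|
|     import pathlib, subprocess
|
|     src = pathlib.Path("src/t_string.c").read_text()
|     subprocess.run("pbcopy", input=src.encode(), check=True)
|
| pbcopy reads stdin and fills the clipboard, ready to paste
| into the model's web interface.)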
| js2 wrote:
| This seems like a lot of work depending upon the use case.
| e.g. the other day I had a bunch of JSON files with contact
| info. I needed to update them with more recent contact info
| on an internal Confluence page. I exported the Confluence
| page to a PDF, then dropped it into the same directory as the
| JSON files. I told Claude Code to read the PDF and use it to
| update the JSON files.
|
| It tried a few ways to read the PDF before coming up with
| installing PyPDF2, using that to parse the PDF, then updated
| all the JSON files. It took about 5 minutes to do this, but
| it ended up 100% correct, updating 7 different fields across
| two dozen JSON files.
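|
| The script it landed on was essentially this shape (a
| reconstruction, not the actual code; the page format and
| field names here are hypothetical):
|
|     import json, pathlib, re
|     from PyPDF2 import PdfReader
|
|     # extract the text of the exported Confluence page
|     text = "\n".join(page.extract_text() or ""
|                      for page in PdfReader("contacts.pdf").pages)
|
|     # hypothetical line format: "Name: Jane Doe Email: jane@x.com"
|     contacts = dict(re.findall(r"Name: (.+?)\s+Email: (\S+)", text))
|
|     for path in pathlib.Path(".").glob("*.json"):
|         data = json.loads(path.read_text())
|         if data.get("name") in contacts:
|             data["email"] = contacts[data["name"]]
|             path.write_text(json.dumps(data, indent=2))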
|
| (The reason for the PDF export was to get past the Confluence
| page being behind Okta authentication. In retrospect, I
| probably should've saved the HTML and/or let Claude Code
| figure out how to grab the page itself.)
|
| How would I have done that with Gemini using just the web
| interface?
| speedgoose wrote:
| Thanks for writing this article.
|
| I used a similar setup until a few weeks ago, but coding agents
| became good enough recently.
|
| I don't find context management and copy pasting fun, I will let
| GitHub Copilot Insiders or Claude Code do it. I'm still very much
| in the loop while doing vibe coding.
|
| Of course it depends on the code base, and Redis may not benefit
| much from coding agents.
|
| But I don't think one should reject vibe coding at this stage, it
| can be useful when you know what the LLMs are doing.
| cushychicken wrote:
| I'm super curious to see the reactions in the comments.
|
| antirez is a big fuggin deal on HN.
|
| I'm sort of curious if the AI doubting set will show up in force
| or not.
| babuloseo wrote:
| OP, I think Gemini 2.5 Pro is in the hospital and has been
| recovering for the last 2 weeks. Let's all wish our good friend a
| good recovery and hope they can get back to their normal selves.
| 1024core wrote:
| I have found that if I ask the LLM to first _describe_ to me what
| it wants to do without writing any code, then the subsequent code
| generated has much higher quality. I will ask for a detailed
| description of the things it wants to do, give it some feedback
| and after a couple of iterations, tell it to go ahead and
| implement it.
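|
| As a sketch, the two-phase exchange looks something like
| this (ask() is a stand-in for whatever chat interface you
| use; the endpoint in the prompt is made up):
|
|     def ask(prompt: str) -> str:
|         # stand-in: paste the prompt into your chat UI,
|         # then paste the model's reply back in
|         print(prompt)
|         return input("model> ")
|
|     plan = ask("Describe, without writing any code, how you "
|                "would add rate limiting to the /login endpoint. "
|                "List every file you would touch and why.")
|     # ...iterate on the plan a couple of times, then:
|     code = ask("Implement exactly this plan, nothing more:\n" + plan)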
| lysecret wrote:
| IMO Claude Code was a huge step up. We have a large and well-
| structured Python code base revolving mostly around a large and
| complicated adapter pattern. Claude is almost fully capable of
| implementing a new adapter if given the right prompt/resources.
| krupan wrote:
| What is the overall feedback loop with LLMs writing code? Do they
| learn as they go like we do? Do they just learn from reading code
| on GitHub? If the latter, what happens as less and less code gets
| written by human experts? Do the LLMs then stagnate in their
| progress and start to degrade? Kind of like making analog copies
| of analog copies of analog copies?
| Herring wrote:
| Code and math are similar to chess/go, where verification is
| (reasonably) easy so you can generate your own high-quality
| training data. It's not super straightforward, but you should
| still expect more progress in coming years.
| cesarb wrote:
| > Code and math are similar to chess/go, where verification
| is (reasonably) easy
|
| Verification for code would be a formal proof, and these are
| hard; with a few exceptions like seL4, most code does not
| have any formal proof. Games like chess and go are much
| easier to verify. Math is in the middle; it also needs formal
| proofs, but much of math consists of doing those formal
| proofs in the first place, and even then there are still unproven
| conjectures.
| Herring wrote:
| Verification for code is just running it. Maybe
| "verification" was the wrong word. The model just needs a
| sense of code X leads to outcome Y for a large number of
| (high-quality) XY pairs, to learn how to navigate the space
| better, same as with games.
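|
| A sketch of what "verification is just running it" means for
| data generation (Python; the candidate program here is
| trivial on purpose):
|
|     import subprocess, sys, tempfile
|
|     def outcome(code: str) -> str:
|         # run a candidate program; its result is the "Y"
|         # paired with the code "X"
|         with tempfile.NamedTemporaryFile("w", suffix=".py") as f:
|             f.write(code)
|             f.flush()
|             r = subprocess.run([sys.executable, f.name],
|                                capture_output=True, text=True)
|         return ("ok: " + r.stdout if r.returncode == 0
|                 else "error: " + r.stderr)
|
|     pair = ("print(2 + 2)", outcome("print(2 + 2)"))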
| steveklabnik wrote:
| > Do they learn as they go like we do?
|
| It's complicated. You have to understand that when you ask an
| LLM something, you have the model itself, which is kind of like
| a function: put something in, get something out. However, you
| also pass an argument to that function: the context.
|
| So, in a literal sense, no, they do not learn as they go, in
| the sense that the model, that function, is unchanged by what
| you send it. But the context can be modified. So, in some
| sense, an LLM in a agentic loop that goes and reads some code
| from GitHub can include that information in the context it uses
| in the future, so it will "learn" within the session.
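|
| In code terms, the mental model is roughly this (a sketch;
| the model body is a stub standing in for frozen weights):
|
|     def model(context: list[str]) -> str:
|         # frozen: the same context always maps to the same
|         # output distribution; nothing here ever updates
|         return f"reply to {len(context)} messages"
|
|     context = ["system: you are a helpful coding assistant"]
|     for msg in ["read foo.c", "now fix the bug"]:
|         context.append("user: " + msg)
|         context.append(model(context))  # "learning" lives
|                                         # only in the context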
|
| > If the latter, what happens as less and less code gets
| written by human experts?
|
| So, this is still a possible problem, because future trainings
| of future LLMs will end up being trained on code written by
| LLMs. If this is a problem or not is yet to be seen, I don't
| have a good handle on the debates in this area, personally.
| iandanforth wrote:
| The most interesting and divergent part of this post is this bit:
|
| "Don't use agents or things like editor with integrated coding
| agents."
|
| He argues that the copy/paste back and forth with the web UI is
| essential for maintaining control and providing the correct
| context.
| tomwphillips wrote:
| I'm surprised IDE integration is written off. I've been pleased
| with Junie's agent mode in IntelliJ. Works well.
| delduca wrote:
| I have a good example of how sometimes AI/LLM can write very very
| inefficient code: https://nullonerror.org/2025/07/12/ai-will-
| replace-programme...
| dawnerd wrote:
| Similarly, AI is really bad at code golf. You'd think it would
| be great, able to know all the little secrets but nope. It
| needs verbose code.
| verbify wrote:
| I wonder what fine tuning an LLM on code golf examples would
| produce.
| lettergram wrote:
| Contrary to this post, I think the AI agents, particularly the
| online interface of OpenAI's Codex to be a massive help.
|
| One example, I had a PR up that was being reviewed by a
| colleague. I was driving home from vacation when I saw the 3-4
| comments come in. I read them when we stopped for gas, went to
| OpenAI / Codex on my phone, dictated what I needed, and had it
| open a PR against my branch. Then I got back on the road. My
| colleague saw the PR, agreed, and merged it in.
|
| I think of it as having a ton of interns; the AI is about the
| same quality. It can help to have them, but they often get stuck,
| need guidance, etc. If you treat the AI like an intern and
| explain what you need it can often produce good results; just be
| prepared to fall back to coding quickly.
| entropyneur wrote:
| Interesting. This is quite contrary to my experience. Using LLMs
| for things ouside my expertise produces crappy results which I
| can only identify as such months later when my expertise expands.
| Meanwhile delegating the boring parts that I know too well to
| agents proved to be a huge productivity boost.
| sitkack wrote:
| I find it serendipitous that Antirez is into LLM based coding,
| because the attention to detail in Redis means all the LLMs have
| trained extensively on the Redis codebase.
|
| Something that was meant for humans has now been consumed by AI,
| and he is being repaid for that openness in a way. It comes full
| circle. Consistency, clarity and openness win again.
| ok123456 wrote:
| One way to utilize these CLI coding agents that I like is to have
| them run static analysis tools in a loop, along with whatever
| test suite you have set up, systematically improving crusty code
| beyond the fixes that the static analysis tools offer.
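|
| A minimal sketch of that loop (tool names are examples;
| the claude -p call assumes Claude Code's non-interactive
| print mode):
|
|     import subprocess
|
|     def clean() -> bool:
|         lint = subprocess.run(["ruff", "check", "."])
|         tests = subprocess.run(["pytest", "-q"])
|         return lint.returncode == 0 and tests.returncode == 0
|
|     while not clean():
|         # let the agent see the failures and patch the code,
|         # then re-run the checks
|         subprocess.run(["claude", "-p",
|                         "Fix the remaining lint and test failures."])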
| wg0 wrote:
| I don't understand.
|
| Is the author suggesting manually pasting Redis C files into the
| Gemini Pro chat window on the web?
| thefourthchime wrote:
| I was mostly nodding my head until he got to this part.
|
| The fundamental requirement for the LLM to be used is: don't
| use agents or things like editor with integrated coding agents.
|
| So right, is he like actually copying and pasting stuff into a
| chat window? I did this before Copilot, but with Cursor I
| would never think of doing that. He never mentioned Cursor or
| Claude Code so I wonder if he's even experienced it.
| libraryofbabel wrote:
| Right, this didn't make much sense to me either. Who'd still
| recommend copy-and-paste-into-chat coding these days with
| Claude Code and similar agents available? I wonder if he's
| thinking of agents / IDEs like Windsurf, Copilot, Cursor etc.,
| where there is more complexity between you and the frontier LLM and
| various tricks to minimize token use. Claude Code, Gemini CLI
| etc aren't like that and will just read in whole files into
| the context so that the LLM can see everything, which I think
| achieves what he wants but with all the additional magic of
| agents like edits, running tests, etc. as well.
| torginus wrote:
| I'm a little baby when it comes to Claude Code and agentic
| AI, that said I was a heavy user of Cursor since it came out,
| and before agents came out, I had to manually select which
| files would be included in the prompt of my query.
|
| Now Cursor's agent mode does this for me, but it can be a hit
| or miss.
| jwpapi wrote:
| Thank you very much, this is exactly my experience. I sometimes
| let it vibe code frontend features that are easy to test in an
| already typed code base (add a field to this form), but most of
| the time it's my sparring partner to review my code and evaluate
| all options. While it often recommends bollocks or has logical
| flaws, it helps me do the obvious thing and not miss a
| solution! Sometimes we have fancy play syndrome and want to code
| the complicated thing because of a fundamental lack we have. LLMs
| have done a great job of reducing those flaws of mine.
|
| But just because I've not been lazy...
___________________________________________________________________
(page generated 2025-07-20 23:00 UTC)