[HN Gopher] The current state of LLM-driven development
___________________________________________________________________
The current state of LLM-driven development
Author : Signez
Score : 78 points
Date : 2025-08-09 16:17 UTC (6 hours ago)
(HTM) web link (blog.tolki.dev)
(TXT) w3m dump (blog.tolki.dev)
| randfish wrote:
| Deeply curious to know if this is an outlier opinion, a
| mainstream but pessimistic one, or the general consensus. My
| LinkedIn feed and personal network certainly suggest that it's
| an outlier, but I wonder if the people around me are overly
| optimistic or out of sync with what the HN community is
| experiencing more broadly.
| Terretta wrote:
| Which part of the opinion?
|
| I tend to _strongly_ agree with the "unpopular opinion" about
| the IDEs mentioned versus CLI (specifically, aider.chat and
| Claude Code).
|
| Assuming (this is key) you have mastery of the language and
| framework you're using, working with the CLI tool under
| 25-year-old XP practices is an incredible accelerant.
|
| Caveats:
|
| - You absolutely must bring taste and critical thinking, as the
| LLM has neither.
|
| - You absolutely must bring systems thinking, as it cannot keep
| deep weirdness "in mind". By this I mean the second and third
| order things that "gotcha" about how things ought to work but
| don't.
|
| - Finally, you should package up everything new about your
| language or frameworks since a few months or a year before the
| knowledge cutoff date, and include a condensed synthesis in
| your context (e.g., Swift 6 and 6.1 versus the 5.10 and 2024
| WWDC announcements, which are all GPT-5 knows).
|
| For this last one I find it useful to (a) use OpenAI's "Deep
| Research" to first whitepaper the gaps, then another pass to
| turn that into a Markdown context prompt, and finally bring
| that over to your LLM tooling to include as needed when doing a
| spec or in architect mode. Similarly, (b) use repomap tools on
| dependencies if creating new code that leverages those
| dependencies, and have that in context for that work.
|
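| For (b), a repomap can be as simple as condensing a dependency's
| public surface with Python's ast module (a toy sketch, not any
| particular tool's implementation; the module below is made up):

```python
import ast
import textwrap

def repo_map(source: str, module_name: str) -> str:
    """Condense a module into a prompt-sized summary: top-level
    classes and functions with their signatures."""
    tree = ast.parse(source)
    lines = [f"# {module_name}"]
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)
            lines.append(f"def {node.name}({args})")
        elif isinstance(node, ast.ClassDef):
            lines.append(f"class {node.name}:")
            for item in node.body:
                if isinstance(item, ast.FunctionDef):
                    args = ", ".join(a.arg for a in item.args.args)
                    lines.append(f"    def {item.name}({args})")
    return "\n".join(lines)

# Condense a (hypothetical) dependency module for the context window.
sample = textwrap.dedent("""
    class Client:
        def __init__(self, base_url):
            ...
        def get(self, path):
            ...
    def helper(x, y):
        ...
""")
print(repo_map(sample, "mylib"))
```

| Run that over each dependency you touch and the output is small
| enough to sit in context alongside the spec.
|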
| I'm confused why these two obvious steps aren't built into
| leading agentic tools, but maybe handling the LLM as a naive
| and outdated "Rain Man" type doesn't figure into mental models
| at most KoolAid-drinking "AI" startups, or maybe vibecoders
| don't care, so it's just not a priority.
|
| Either way, context-based development beats Leeroy Jenkins.
| throwdbaaway wrote:
| > use repomap tools on dependencies if creating new code that
| leverages those dependencies, and have that in context for
| that work.
|
| It seems to me that currently there are 2 schools of thought:
|
| 1. Use repomap and/or LSP to help the models navigate the
| code base
|
| 2. Let the models figure things out with grep
|
| Personally, I am 100% a grep guy, and my editor doesn't even
| have LSP enabled. So, it is very interesting to see how many
| of these agentic tools do exactly the same thing.
|
| And Claude Code /init is a great feature that basically
| writes down the current mental model after the initial round
| of grep.
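| The grep school needs almost nothing: the navigation step
| boils down to roughly this (a hypothetical Python sketch of
| what the agent's search does, rather than shelling out to
| grep itself):

```python
import re
from pathlib import Path

def grep(root: str, pattern: str, glob: str = "*.py") -> list[str]:
    """Return file:line:text hits for a regex across a repo,
    the way an agent's 'just grep' step would."""
    rx = re.compile(pattern)
    hits = []
    for path in sorted(Path(root).rglob(glob)):
        for lineno, line in enumerate(path.read_text().splitlines(), 1):
            if rx.search(line):
                hits.append(f"{path}:{lineno}:{line.strip()}")
    return hits
```

| The model reads the hits, opens the interesting files, and
| builds its own mental model, which /init can then write down.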
| WD-42 wrote:
| I think it's pretty common among people whose job it is to
| provide working, production software.
|
| If you go by MBA types on LinkedIn that aren't really
| developers or haven't been in a long time, now they can vibe
| out some react components or a python script so it's a
| revolution.
| danielbln wrote:
| Hi, my job is building working production software (these
| days heavily LLM assisted). The author of the article doesn't
| know what they're talking about.
| MobiusHorizons wrote:
| My impression has been that in corporate settings (and I would
| include LinkedIn in that) AI optimism is basically used as
| virtue signaling, making it very hard to distinguish people who
| are actually excited about the tech from people wanting to be
| accepted.
|
| My personal experience has been that AI has trouble keeping the
| scope of the change small and targeted. I have only been using
| Gemini 2.5 pro though, as we don't have access to other models
| at my work. My friend tells me he uses Claude for coding and
| Gemini for documentation.
| procaryote wrote:
| LinkedIn posts seem like an awful source. The people I see
| posting for themselves there are either pre-successful or just
| very fond of personal branding.
| ebiester wrote:
| I disagree from almost the first sentence:
|
| > Learning how to use LLMs in a coding workflow is trivial. There
| is no learning curve. You can safely ignore them if they don't
| fit your workflows at the moment.
|
| Learning how to use LLMs in a coding workflow is trivial to
| start, but you quickly get a bad taste if you don't learn
| how to adapt both your workflow and its workflow. It is easy
| to get a trivially good result and then be disappointed in
| the follow-up. It is easy to start on something it's not good
| at and think it's worthless.
|
| The pure dismissal of cursor, for example, means that the author
| didn't learn how to work with it. Now, it's certainly limited and
| some people just prefer Claude code. I'm not saying that's
| unfair. However, it requires a process adaptation.
| mkozlows wrote:
| "There's no learning curve" just means this guy didn't get very
| far up, which is definitely backed up by thinking that Copilot
| and other tools are all basically the same.
| leptons wrote:
| Basically, they are the same, they are all LLMs. They all
| have similar limitations. They all produce "hallucinations".
| They can also sometimes be useful. And they are all way
| overhyped.
| trenchpilgrim wrote:
| The number of misconceptions in this comment is quite
| profound.
|
| Copilot isn't an LLM, for a start. You _combine_ it with a
| selection of LLMs. And it absolutely has severe limitations
| compared to something like Claude Code in how it can
| interact with the programming environment.
|
| "Hallucinations" are far less of a problem with software
| that grounds the AI to the truth in your compiler,
| diagnostics, static analysis, a running copy of your
| project, running your tests, executing dev tools in your
| shell, etc.
| leptons wrote:
| >Copilot isn't an LLM, for a start
|
| You're being overly pedantic here and moving goalposts.
| Copilot (for coding) without an LLM is pretty useless.
|
| I stand by my assertion that these tools are all
| _basically the same fundamental tech_ - LLMs.
| rustybolt wrote:
| > "There's no learning curve" just means this guy didn't get
| very far up
|
| Not everyone with a different opinion is dumber than you.
| SadErn wrote:
| This is all just ignorance. We've all worked with LLMs and
| know that creating an effective workflow is not trivial and
| it varies based on the tool.
| jaynetics wrote:
| I'm not a native speaker, but to me that quote doesn't
| necessarily imply an inability of OP to get up the curve.
| Maybe they just mean that the curve can look flat at the
| start?
| scrollaway wrote:
| No, it's sometimes just extremely easy to recognize people
| who have no idea what they're talking about when they make
| certain claims.
|
| Just like I can recognize a clueless frontend developer
| when they say "React is basically just a newer jquery".
| Recognizing clueless engineers when they talk about AI can
| be pretty easy.
|
| It's a sector that is both old and new: AI has been around
| forever, but even people who worked in the sector years ago
| are taken aback by what is suddenly possible, the workflows
| that are happening... hell, I've even seen cases where it's
| the very people who have been following GenAI forever that
| have a bias towards believing it's incapable of what it can
| do.
|
| For context, I lead an AI R&D lab in Europe
| (https://ingram.tech/). I've seen some shit.
| tptacek wrote:
| _Learning how to use LLMs in a coding workflow is trivial. There
| is no learning curve. You can safely ignore them if they don't
| fit your workflows at the moment._
|
| I have never heard _anybody_ successfully using LLMs say this
| before. Most of what I've learned from talking to people about
| their workflows is counterintuitive and subtle.
|
| It's a really weird way to open up an article concluding that
| LLMs make one a worse programmer: "I definitely know how to use
| this tool optimally, and I conclude the tool sucks". Ok then.
| Also: the piano is a terrible, awful instrument; what a racket it
| makes.
| edfletcher_t137 wrote:
| The first two points directly contradict each other, too.
| Learning a tool should have the outcome that one is productive
| with it. If getting to "productive" is non-trivial, then
| learning the tool is non-trivial.
| prerok wrote:
| I agree with your assessment of this statement. I had to
| reread it a few times to actually understand it.
|
| He is actually recommending Copilot for price/performance
| reasons and his closing statement is "Don't fall for the hype,
| but also, they are genuinely powerful tools sometimes."
|
| So, it just seems like he never really tried to engineer
| better prompts that these more advanced models can use.
| bgwalter wrote:
| Pianists' results are well known to be proportional to their
| talent/effort. In open source hardly anyone is even using LLMs
| and the ones that do have barely any output, In many cases less
| output than they had _before_ using LLMs.
|
| The blogging output on the other hand ...
| troupo wrote:
| > I have never heard anybody successfully using LLMs say this
| before. Most of what I've learned from talking to people about
| their workflows is counterintuitive and subtle.
|
| Because for all our posturing about being skeptical and
| data-driven, we all believe in magic.
|
| Those "counterintuitive non-trivial workflows"? They work about
| as well as just prompting "implement X" with no rules,
| agents.md, careful lists etc.
|
| Because 1) literally no one actually measures whether magical
| incantations work, and 2) it's impossible to make such
| measurements due to non-determinism.
| roxolotl wrote:
| On top of this, a lot of "learning to work with LLMs" is
| breaking down tasks into small pieces with clear instructions
| and acceptance criteria. That's just part of working
| efficiently, but maybe people don't want to be bothered to
| do it.
| SkyPuncher wrote:
| > Learning how to use LLMs in a coding workflow is trivial.
| There is no learning curve. You can safely ignore them if they
| don't fit your workflows at the moment.
|
| That's a wild statement. I'm now extremely productive with LLMs
| in my core codebases, but it took a lot of practice to get it
| right and repeatable. There's a lot of little contextual
| details you need to learn how to control so the LLM makes the
| right choices.
|
| Whenever I start working in a new code base, it takes a
| non-trivial amount of time to ramp back up to full LLM
| productivity.
| majormajor wrote:
| > That's a wild statement. I'm now extremely productive with
| LLMs in my core codebases, but it took a lot of practice to
| get it right and repeatable. There's a lot of little
| contextual details you need to learn how to control so the
| LLM makes the right choices.
|
| > Whenever I start working in a new code base, it takes a
| non-trivial amount of time to ramp back up to full LLM
| productivity.
|
| Do you find that these details translate between models? It
| sounds like they don't translate across codebases for you?
|
| I have mostly moved away from this sort of fine-tuning
| approach because of experience a while ago with OpenAI's
| ChatGPT 3.5 and 4. Extra work on my end that was necessary
| with the older model wasn't needed with the new one, and it
| sometimes counterintuitively caused worse performance by
| pointing the model at the way I'd do it vs the way it might
| have the best luck with. ESPECIALLY for the sycophantic
| models, which will heavily index on "if you suggested that
| this thing might be related, I'll figure out some way to
| make sure it is!"
|
| So more recently I generally stick to the "we'll handle a lot
| of the prompt nitty-gritty for you" IDE or CLI agent stuff,
| but I find they still fall apart with large complex codebases,
| and also that the tricks don't translate across codebases.
| dezmou wrote:
| OP did miss the VS Code extension for Claude Code. It is
| still terminal based, but:
|
| - it shows you the diff of the incoming changes in VS Code
| (like git)
|
| - it knows the line you selected in the editor for context
| sudhirb wrote:
| I have a biased opinion since I work for a background agent
| startup currently - but there are more (and better!) out there
| than Jules and Copilot that might address some of the author's
| issues.
| troupo wrote:
| And those mythical better tools that you didn't even
| bother to mention are?
| simonw wrote:
| _Learning how to use LLMs in a coding workflow is trivial. There
| is no learning curve. [...]_
|
| _LLMs will always suck at writing code that has not been
| written millions of times before. As soon as you venture
| slightly offroad, they falter._
|
| That right there is your learning curve! Getting LLMs to write
| code that's not heavily represented in their training data takes
| experience and skill and isn't obvious to learn.
| weeksie wrote:
| Yet another developer who is too full of themselves to admit that
| they have no idea how to use LLMs for development. There's an
| arrogance that can set in when you get to be more senior and
| unless you're capable of force feeding yourself a bit of humility
| you'll end up missing big, important changes in your field.
|
| It becomes farcical when not only are you missing the big thing
| but you're also proud of your ignorance and this guy is both.
| spenrose wrote:
| So many articles should prepend "My experience with ..." to their
| title. Here is OP's first sentence: "I spent the past ~4 weeks
| trying out all the new and fancy AI tools for software
| development." Dude, you have had some experiences and they are
| worth writing up and sharing. But your experiences are not a
| stand-in for "the current state." This point applies to a
| significant fraction of HN articles, to the point that I wish the
| headlines were flagged "blog".
| mettamage wrote:
| Clickbait gets more reach. It's an unfortunate thing. I
| remember Veritasium in a video even saying something along the
| lines of him feeling forced to do clickbaity YouTube because it
| works so well.
|
| The reach is big enough to not care about our feelings. I wish
| it wasn't this way.
| dash2 wrote:
| They missed OpenAI Codex, maybe deliberately? It's less
| LLM-development and more vibe-coding, or maybe "being a PHB
| of robots". I'm enjoying it for my side project this week.
| kodisha wrote:
| LLM-driven coding can yield awesome results, but you will be
| typing _a lot_ and, as the article states, it requires an
| already well-structured codebase.
|
| I recently started a fresh project, and until I got to the
| desired structure I only used AI to ask questions or for
| suggestions. I organized and wrote most of the code.
|
| Once it started to get into the shape that felt semi-permanent to
| me, I started a lot of queries like:
|
| ```
|
| - Look at existing service X at folder services/x
|
| - see how I deploy the service using k8s/services/x
|
| - see how the docker file for service X looks like at
| services/x/Dockerfile
|
| - now, I started service Y that does [this and that]
|
| - create all that is needed for service Y to be skaffolded and
| deployed, follow the same pattern as service X
|
| ```
|
| And it would go, read the existing stuff for X, then generate
| all of the deployment/monitoring/readme/docker/k8s/helm/
| skaffold for Y.
|
| With zero to no mistakes. Both Claude and Gemini are more
| than capable of such a task. I had both of them generate
| 10-15 files with no errors, with code that could be deployed
| right after (of course, the service will just answer and not
| do much more than that).
|
| Then, I will take over again for a bit, do some business logic
| specific to Y, then again leverage AI to fill in missing bits,
| review, suggest stuff etc.
|
| It might look slow, but it actually cuts the most boring and
| most error-prone steps when developing a medium-to-large
| k8s-backed project.
| philipwhiuk wrote:
| There's an IntelliJ extension for GitHub Copilot.
|
| It's not perfect but it's okay.
| SadErn wrote:
| It's all about the Kilo Code extension.
| yogthos wrote:
| Personally, I've had a pretty positive experience with the coding
| assistants, but I had to spend some time to develop intuition for
| the types of tasks they're likely to do well. I would not say
| that this was trivial to do.
|
| Like if you need to crap out a UI based on a JSON payload, make a
| service call, add a server endpoint, LLMs will typically do this
| correctly in one shot. These are common operations that are
| easily extrapolated from their training data. Where they tend to
| fail are tasks like business logic which have specific
| requirements that aren't easily generalized.
|
| I've also found that writing the scaffolding for the code
| yourself really helps focus the agent. I'll typically add stubs
| for the functions I want, and create overall code structure, then
| have the agent fill the blanks. I've found this is a really
| effective approach for preventing the agent from going off into
| the weeds.
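|
| Concretely, the scaffolding-first approach looks something
| like this (a hypothetical example; only the body marked
| below would be left for the agent to fill in):

```python
from dataclasses import dataclass

@dataclass
class LineItem:
    name: str
    price: float
    qty: int

@dataclass
class Order:
    items: list

# Signature, docstring, and types written by hand; the agent
# only fills in the body, so it can't wander off structurally.
def order_total(order: Order, tax_rate: float) -> float:
    """Sum the line items and apply tax."""
    subtotal = sum(i.price * i.qty for i in order.items)  # agent-filled
    return round(subtotal * (1 + tax_rate), 2)            # agent-filled
```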
|
| I also find that if it doesn't get things right on the first
| shot, the chances are it's not going to fix the underlying
| problems. It tends to just add kludges on top to address the
| problems you tell it about. If it didn't get it mostly right at
| the start, then it's better to just do it yourself.
|
| All that said, I find enjoyment is an important aspect as well
| and shouldn't be dismissed. If you're less productive, but you
| enjoy the process more, then I see that as a net positive. If all
| LLMs accomplish is to make development more fun, that's a good
| thing.
|
| I also find that there's use for both terminal based tools and
| IDEs. The terminal REPL is great for initially sketching things
| out, but IDE based tooling makes it much easier to apply
| selective changes exactly where you want.
|
| As a side note, got curious and asked GLM-4.5 to make a token
| field widget with React, and it did it in one shot.
|
| It's also strange not to mention DeepSeek and GLM as options
| given that they cost orders of magnitude less per token than
| Claude or Gemini.
| bachmeier wrote:
| > By being particularly bad at anything outside of the most
| popular languages and frameworks, LLMs force you to pick a very
| mainstream stack if you want to be efficient.
|
| I haven't found that to be true with my most recent usage of AI.
| I do a lot of programming in D, which is not popular like Python
| or Javascript, but Copilot knows it well enough to help me with
| things like templates, metaprogramming, and interoperating
| with GCC-produced DLLs on Windows. This is true in spite of
| the lack
| of a big pile of training data for these tasks. Importantly, it
| gets just enough things wrong when I ask it to write code for me
| that I have to understand everything well enough to debug it.
| Vektorceraptor wrote:
| I agree. I had a similar experience.
|
| https://speculumx.at/pages/read_post.html?post=59
| singularity2001 wrote:
| "LLMs won't magically make you deliver production-ready code"
|
| Either I'm extremely lucky, or I was lucky to find the guy
| who said it must all be test-driven and guided by the usual
| principles of DRY etc. Claude Code works absolutely
| fantastically nine out of 10 times, and when it doesn't, we
| just roll back the three hours of nonsense it did and
| postpone the feature or give it extra guidance.
| simonw wrote:
| I'm beginning to suspect robust automated tests may be one of
| the strongest indicators of whether you're going to have a
| good time with LLM coding agents or not.
|
| If there's a test suite for the thing to run it's SO much less
| likely to break other features when it's working. Plus it can
| read the tests and use them to get a good idea about how
| everything is supposed to work already.
|
| Telling Claude to write the test first, then execute it and
| watch it fail, then write the implementation has been giving me
| really great results.
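|
| That loop is just classic TDD with the roles swapped: the
| test pins down the behavior before any implementation exists
| (a hypothetical example; any test runner works):

```python
import re

# Step 1: written first and run, so the agent watches it fail.
def test_slugify():
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  spaced  out  ") == "spaced-out"

# Step 2: only after the failure does the agent write this.
def slugify(text: str) -> str:
    """Lowercase, collapse runs of non-alphanumerics to dashes."""
    text = re.sub(r"[^a-z0-9]+", "-", text.lower())
    return text.strip("-")

test_slugify()  # now green; the suite also guards later changes
```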
| stephc_int13 wrote:
| I have not tried every IDE, CLI, or model, only a few, mostly
| Claude and Qwen.
|
| I work mostly in C/C++.
|
| The most valuable improvement of using this kind of tools, for
| me, is to easily find help when I have to work on boring/tedious
| tasks or when I want to have a Socratic conversation about a
| design idea with a not-so-smart but extremely knowledgeable
| colleague.
|
| But for anything requiring a brain, it is almost useless.
| infoseek12 wrote:
| There are kind of a lot of errors in this piece. For instance,
| the problem the author had with Gemini CLI running out of tokens
| in ten minutes is what happens when you don't set up (a free) API
| key in your environment.
___________________________________________________________________
(page generated 2025-08-09 23:00 UTC)