[HN Gopher] The current state of LLM-driven development
       ___________________________________________________________________
        
       The current state of LLM-driven development
        
       Author : Signez
       Score  : 78 points
       Date   : 2025-08-09 16:17 UTC (6 hours ago)
        
 (HTM) web link (blog.tolki.dev)
 (TXT) w3m dump (blog.tolki.dev)
        
       | randfish wrote:
       | Deeply curious to know if this is an outlier opinion, a
       | mainstream but pessimistic one, or the general consensus. My
        | LinkedIn feed and personal network certainly suggest that it's
        | an outlier, but I wonder if the people around me are overly
        | optimistic or out of sync with what the HN community is
        | experiencing more broadly.
        
         | Terretta wrote:
         | Which part of the opinion?
         | 
          | I tend to _strongly_ agree with the "unpopular opinion" about
          | the IDEs mentioned versus the CLI tools (specifically,
          | aider.chat and Claude Code).
         | 
          | Assuming (this is key) you have mastery of the language and
          | framework you're using, working with the CLI tool under
          | 25-year-old XP practices is an incredible accelerant.
         | 
         | Caveats:
         | 
         | - You absolutely must bring taste and critical thinking, as the
         | LLM has neither.
         | 
          | - You absolutely must bring systems thinking, as it cannot keep
          | deep weirdness "in mind". By this I mean the second- and third-
          | order gotchas: the ways things ought to work but don't.
         | 
          | - Finally, you should package up everything new about your
          | language or frameworks from a few months to a year before the
          | knowledge cutoff date onward, and include a condensed synthesis
          | in your context (e.g., Swift 6 and 6.1, versus the 5.10 and
          | 2024 WWDC announcements that are all GPT-5 knows).
         | 
          | For this last one I find it useful to (a) use OpenAI's "Deep
          | Research" to first whitepaper the gaps, then do another pass to
          | turn that into a Markdown context prompt, and finally bring
          | that over to your LLM tooling to include as needed when doing a
          | spec or in architect mode. Similarly, (b) use repomap tools on
          | dependencies if creating new code that leverages those
          | dependencies, and have that in context for that work.
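          | 
          | As a rough sketch of what I mean by a repomap (a toy; real
          | tools like aider's repo map are smarter, and the package path
          | here is a placeholder), condensing a dependency into its
          | public signatures so the model sees the real API rather than
          | a hallucinated one:
          | 
          | ```
          | import ast
          | import pathlib
          | 
          | def repo_map(pkg_dir: str) -> str:
          |     """Condense a package into public signatures that can
          |     be pasted into LLM context."""
          |     out = []
          |     for path in sorted(pathlib.Path(pkg_dir).rglob("*.py")):
          |         try:
          |             tree = ast.parse(path.read_text(encoding="utf-8"))
          |         except SyntaxError:
          |             continue
          |         for node in ast.walk(tree):
          |             if isinstance(node, ast.ClassDef):
          |                 out.append(f"{path.name}: class {node.name}")
          |             elif isinstance(node, (ast.FunctionDef,
          |                                    ast.AsyncFunctionDef)):
          |                 if not node.name.startswith("_"):
          |                     args = ast.unparse(node.args)
          |                     out.append(
          |                         f"{path.name}: def {node.name}({args})")
          |     return "\n".join(out)
          | 
          | # Paste the output into the Markdown context doc from (a).
          | print(repo_map("vendor/somepackage"))
          | ```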
         | 
          | I'm confused why these two obvious steps aren't built into
          | leading agentic tools, but maybe handling the LLM as a naive
          | and outdated "Rain Man" type doesn't figure into mental models
          | at most Kool-Aid-drinking "AI" startups, or maybe vibecoders
          | don't care, so it's just not a priority.
          | 
          | Either way, context-based development beats Leeroy Jenkins.
        
           | throwdbaaway wrote:
           | > use repomap tools on dependencies if creating new code that
           | leverages those dependencies, and have that in context for
           | that work.
           | 
           | It seems to me that currently there are 2 schools of thought:
           | 
           | 1. Use repomap and/or LSP to help the models navigate the
           | code base
           | 
           | 2. Let the models figure things out with grep
           | 
           | Personally, I am 100% a grep guy, and my editor doesn't even
           | have LSP enabled. So, it is very interesting to see how many
           | of these agentic tools do exactly the same thing.
           | 
           | And Claude Code /init is a great feature that basically
           | writes down the current mental model after the initial round
           | of grep.
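            | 
            | To illustrate the grep school (a toy sketch; no agent runs
            | literally this, and it assumes ripgrep is installed), the
            | whole "navigation layer" is just a couple of searches whose
            | output gets dumped into context:
            | 
            | ```
            | import subprocess
            | 
            | def find_symbol(symbol: str, repo: str = ".") -> str:
            |     """Roughly what the grep school amounts to: likely
            |     definitions plus call sites, dumped into context."""
            |     defs = subprocess.run(
            |         ["rg", "-n", rf"(def|class|fn|func)\s+{symbol}", repo],
            |         capture_output=True, text=True).stdout
            |     uses = subprocess.run(
            |         ["rg", "-n", "-w", symbol, repo],
            |         capture_output=True, text=True).stdout
            |     return f"definitions:\n{defs}\ncall sites:\n{uses}"
            | 
            | # The symbol is made up; the model reads the output and
            | # writes down its mental model (e.g., in CLAUDE.md).
            | print(find_symbol("parse_config"))
            | ```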
        
         | WD-42 wrote:
         | I think it's pretty common among people whose job it is to
         | provide working, production software.
         | 
            | If you go by the MBA types on LinkedIn who aren't really
            | developers, or haven't been in a long time, then yes: now
            | they can vibe out some React components or a Python script,
            | so it's a revolution.
        
           | danielbln wrote:
           | Hi, my job is building working production software (these
           | days heavily LLM assisted). The author of the article doesn't
           | know what they're talking about.
        
         | MobiusHorizons wrote:
         | My impression has been that in corporate settings (and I would
         | include LinkedIn in that) AI optimism is basically used as
         | virtue signaling, making it very hard to distinguish people who
         | are actually excited about the tech from people wanting to be
         | accepted.
         | 
          | My personal experience has been that AI has trouble keeping the
          | scope of the change small and targeted. I have only been using
          | Gemini 2.5 Pro though, as we don't have access to other models
          | at my work. My friend tells me he uses Claude for coding and
          | Gemini for documentation.
        
         | procaryote wrote:
          | LinkedIn posts seem like an awful source. The people I see
          | posting for themselves there are either pre-successful or just
          | very fond of personal branding.
        
       | ebiester wrote:
       | I disagree from almost the first sentence:
       | 
       | > Learning how to use LLMs in a coding workflow is trivial. There
       | is no learning curve. You can safely ignore them if they don't
       | fit your workflows at the moment.
       | 
        | Learning how to use LLMs in a coding workflow is trivial to
        | start, but you quickly get a bad taste if you don't learn how
        | to adapt both your workflow and its workflow. It is easy to get
        | a trivially good result and then be disappointed in the
        | follow-up. It is easy to start on something it's not good at
        | and think it's worthless.
       | 
        | The outright dismissal of Cursor, for example, means the author
        | didn't learn how to work with it. Now, it's certainly limited,
        | and some people just prefer Claude Code. I'm not saying that's
        | unfair. However, it requires a process adaptation.
        
         | mkozlows wrote:
         | "There's no learning curve" just means this guy didn't get very
         | far up, which is definitely backed up by thinking that Copilot
         | and other tools are all basically the same.
        
           | leptons wrote:
            | Basically, they are the same: they are all LLMs. They all
            | have similar limitations. They all produce "hallucinations".
            | They can also sometimes be useful. And they are all way
            | overhyped.
        
             | trenchpilgrim wrote:
              | The number of misconceptions in this comment is quite
              | profound.
              | 
              | Copilot isn't an LLM, for a start. You _combine_ it with a
              | selection of LLMs. And it absolutely has severe limitations
              | compared to something like Claude Code in how it can
              | interact with the programming environment.
              | 
              | "Hallucinations" are far less of a problem with software
              | that grounds the AI in the truth from your compiler,
              | diagnostics, static analysis, a running copy of your
              | project, running your tests, executing dev tools in your
              | shell, etc.
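              | 
              | That grounding is just a loop (a hypothetical sketch;
              | llm_propose and "make check" are placeholders, but tools
              | like Claude Code do essentially this with your real build
              | and test commands):
              | 
              | ```
              | import subprocess
              | 
              | def grounded_edit(llm_propose, max_rounds=5) -> bool:
              |     """Apply a model edit, run the real checks, and
              |     feed failures back until everything passes."""
              |     feedback = ""
              |     for _ in range(max_rounds):
              |         llm_propose(feedback)  # model edits files on disk
              |         result = subprocess.run(
              |             ["make", "check"],  # compiler, tests, linters
              |             capture_output=True, text=True)
              |         if result.returncode == 0:
              |             return True  # grounded in truth, not vibes
              |         feedback = result.stdout + result.stderr
              |     return False
              | ```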
        
               | leptons wrote:
               | >Copilot isn't an LLM, for a start
               | 
               | You're being overly pedantic here and moving goalposts.
               | Copilot (for coding) without an LLM is pretty useless.
               | 
               | I stand by my assertion that these tools are all
               | _basically the same fundamental tech_ - LLMs.
        
           | rustybolt wrote:
           | > "There's no learning curve" just means this guy didn't get
           | very far up
           | 
           | Not everyone with a different opinion is dumber than you.
        
             | SadErn wrote:
              | This is all just ignorance. We've all worked with LLMs and
              | know that creating an effective workflow is not trivial,
              | and that it varies based on the tool.
        
             | jaynetics wrote:
             | I'm not a native speaker, but to me that quote doesn't
             | necessarily imply an inability of OP to get up the curve.
             | Maybe they just mean that the curve can look flat at the
             | start?
        
             | scrollaway wrote:
             | No, it's sometimes just extremely easy to recognize people
             | who have no idea what they're talking about when they make
             | certain claims.
             | 
             | Just like I can recognize a clueless frontend developer
             | when they say "React is basically just a newer jquery".
             | Recognizing clueless engineers when they talk about AI can
             | be pretty easy.
             | 
             | It's a sector that is both old and new: AI has been around
             | forever, but even people who worked in the sector years ago
             | are taken aback by what is suddenly possible, the workflows
             | that are happening... hell, I've even seen cases where it's
             | the very people who have been following GenAI forever that
             | have a bias towards believing it's incapable of what it can
             | do.
             | 
             | For context, I lead an AI R&D lab in Europe
             | (https://ingram.tech/). I've seen some shit.
        
       | tptacek wrote:
       | _Learning how to use LLMs in a coding workflow is trivial. There
       | is no learning curve. You can safely ignore them if they don't
       | fit your workflows at the moment._
       | 
        | I have never heard _anybody_ successfully using LLMs say this
        | before. Most of what I've learned from talking to people about
        | their workflows is counterintuitive and subtle.
       | 
       | It's a really weird way to open up an article concluding that
       | LLMs make one a worse programmer: "I definitely know how to use
       | this tool optimally, and I conclude the tool sucks". Ok then.
       | Also: the piano is a terrible, awful instrument; what a racket it
       | makes.
        
         | edfletcher_t137 wrote:
         | The first two points directly contradict each other, too.
         | Learning a tool should have the outcome that one is productive
         | with it. If getting to "productive" is non-trivial, then
         | learning the tool is non-trivial.
        
         | prerok wrote:
         | I agree with your assessment about this statement. I actually
         | had to reread it a few times to actually understand it.
         | 
         | He is actually recommending Copilot for price/performance
         | reasons and his closing statement is "Don't fall for the hype,
         | but also, they are genuinely powerful tools sometimes."
         | 
          | So, it just seems like he never really tried to engineer
          | better prompts that these more advanced models can use.
        
         | bgwalter wrote:
          | Pianists' results are well known to be proportional to their
          | talent/effort. In open source, hardly anyone is even using
          | LLMs, and the ones that do have barely any output, in many
          | cases less output than they had _before_ using LLMs.
          | 
          | The blogging output, on the other hand ...
        
         | troupo wrote:
         | > I have never heard anybody successfully using LLMs say this
         | before. Most of what I've learned from talking to people about
         | their workflows is counterintuitive and subtle.
         | 
          | Because for all our posturing about being skeptical and data-
          | driven, we all believe in magic.
          | 
          | Those "counterintuitive non-trivial workflows"? They work about
          | as well as just prompting "implement X" with no rules,
          | agents.md, careful lists, etc.
          | 
          | Because 1) literally no one actually measures whether the
          | magical incantations work, and 2) it's impossible to make such
          | measurements due to non-determinism.
        
           | roxolotl wrote:
            | On top of this, a lot of "learning to work with LLMs" is
            | breaking down tasks into small pieces with clear
            | instructions and acceptance criteria. That's just part of
            | working efficiently, but maybe people don't want to be
            | bothered to do it.
        
         | SkyPuncher wrote:
         | > Learning how to use LLMs in a coding workflow is trivial.
         | There is no learning curve. You can safely ignore them if they
         | don't fit your workflows at the moment.
         | 
         | That's a wild statement. I'm now extremely productive with LLMs
         | in my core codebases, but it took a lot of practice to get it
         | right and repeatable. There's a lot of little contextual
         | details you need to learn how to control so the LLM makes the
         | right choices.
         | 
          | Whenever I start working in a new code base, it takes a non-
          | trivial amount of time to ramp back up to full LLM
          | productivity.
        
           | majormajor wrote:
           | > That's a wild statement. I'm now extremely productive with
           | LLMs in my core codebases, but it took a lot of practice to
           | get it right and repeatable. There's a lot of little
           | contextual details you need to learn how to control so the
           | LLM makes the right choices.
           | 
            | > Whenever I start working in a new code base, it takes a
            | non-trivial amount of time to ramp back up to full LLM
            | productivity.
           | 
            | Do you find that these details translate between models?
            | Sounds like they don't translate across codebases for you?
           | 
            | I have mostly moved away from this sort of fine-tuning
            | approach because of experience a while ago with OpenAI's
            | ChatGPT 3.5 and 4. Extra work on my end that was necessary
            | with the older model wasn't needed with the new one, and
            | sometimes counterintuitively caused worse performance by
            | pointing the model at the way I'd do it vs the way it might
            | have the best luck with. ESPECIALLY for the sycophantic
            | models, which will heavily index on "if you suggested that
            | this thing might be related, I'll figure out some way to
            | make sure it is!"
            | 
            | So more recently I generally stick to the "we'll handle a
            | lot of the prompt nitty-gritty for you" IDE or CLI agent
            | stuff, but I find they still fall apart with large, complex
            | codebases, and also that the tricks don't translate across
            | codebases.
        
       | dezmou wrote:
        | OP did miss the VS Code extension for Claude Code. It is still
        | terminal-based, but:
        | 
        | - it shows you the diff of incoming changes in VS Code (like
        | git)
        | 
        | - it knows the line you selected in the editor, for context
        
       | sudhirb wrote:
       | I have a biased opinion since I work for a background agent
       | startup currently - but there are more (and better!) out there
       | than Jules and Copilot that might address some of the author's
       | issues.
        
         | troupo wrote:
          | And those mythical better tools that you didn't even bother to
          | mention are?
        
       | simonw wrote:
       | _Learning how to use LLMs in a coding workflow is trivial. There
       | is no learning curve. [...]_
       | 
        |  _LLMs will always suck at writing code that has not been
        | written millions of times before. As soon as you venture
        | slightly offroad, they falter._
       | 
       | That right there is your learning curve! Getting LLMs to write
       | code that's not heavily represented in their training data takes
       | experience and skill and isn't obvious to learn.
        
       | weeksie wrote:
        | Yet another developer who is too full of themselves to admit that
        | they have no idea how to use LLMs for development. There's an
        | arrogance that can set in when you get to be more senior, and
        | unless you're capable of force-feeding yourself a bit of
        | humility, you'll end up missing big, important changes in your
        | field.
       | 
        | It becomes farcical when you're not only missing the big thing
        | but also proud of your ignorance, and this guy is both.
        
       | spenrose wrote:
       | So many articles should prepend "My experience with ..." to their
       | title. Here is OP's first sentence: "I spent the past ~4 weeks
       | trying out all the new and fancy AI tools for software
       | development." Dude, you have had some experiences and they are
       | worth writing up and sharing. But your experiences are not a
       | stand-in for "the current state." This point applies to a
       | significant fraction of HN articles, to the point that I wish the
       | headlines were flagged "blog".
        
         | mettamage wrote:
          | Clickbait gets more reach. It's an unfortunate thing. I
          | remember Veritasium even saying in a video something along the
          | lines of feeling forced to make clickbaity YouTube videos
          | because it works so well.
         | 
         | The reach is big enough to not care about our feelings. I wish
         | it wasn't this way.
        
       | dash2 wrote:
        | They missed OpenAI Codex, maybe deliberately? It's less LLM-
        | driven development and more vibe-coding, or maybe "being a PHB
        | of robots". I'm enjoying it for my side project this week.
        
       | kodisha wrote:
        | LLM-driven coding can yield awesome results, but you will be
        | typing _a lot_ and, as the article states, it requires an
        | already well-structured codebase.
        | 
        | I recently started a fresh project, and until I got to the
        | desired structure I only used AI to ask questions or get
        | suggestions. I organized and wrote most of the code myself.
        | 
        | Once it started to settle into a shape that felt semi-permanent,
        | I started a lot of queries like:
       | 
       | ```
       | 
       | - Look at existing service X at folder services/x
       | 
       | - see how I deploy the service using k8s/services/x
       | 
        | - see what the Dockerfile for service X looks like at
        | services/x/Dockerfile
       | 
       | - now, I started service Y that does [this and that]
       | 
        | - create all that is needed for service Y to be scaffolded and
        | deployed, follow the same pattern as service X
       | 
       | ```
       | 
        | And it would go read the existing stuff for X, then generate all
        | of the deployment/monitoring/readme/docker/k8s/helm/skaffold for
        | Y.
        | 
        | With zero to no mistakes. Both Claude and Gemini are more than
        | capable of such a task. I had both of them generate 10-15 files
        | with no errors, with code that could be deployed right away (of
        | course the service will just answer and not do much more than
        | that).
        | 
        | Then I take over again for a bit, do some business logic
        | specific to Y, then again leverage AI to fill in missing bits,
        | review, suggest stuff, etc.
        | 
        | It might look slow, but it actually cuts out the most boring and
        | most error-prone steps when developing a medium to large
        | k8s-backed project.
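        | 
        | For reference, the layout the prompt assumes looks roughly like
        | this (the service X paths are the real ones from the prompt;
        | everything under Y is what gets generated):
        | 
        | ```
        | services/
        |   x/
        |     Dockerfile
        |   y/            <- generated, mirroring x
        | k8s/
        |   services/
        |     x/
        |     y/          <- generated, mirroring x
        | ```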
        
       | philipwhiuk wrote:
        | There's an IntelliJ extension for GitHub Copilot.
       | 
       | It's not perfect but it's okay.
        
       | SadErn wrote:
       | It's all about the Kilo Code extension.
        
       | yogthos wrote:
       | Personally, I've had a pretty positive experience with the coding
       | assistants, but I had to spend some time to develop intuition for
       | the types of tasks they're likely to do well. I would not say
       | that this was trivial to do.
       | 
        | Like if you need to crap out a UI based on a JSON payload, make
        | a service call, or add a server endpoint, LLMs will typically do
        | this correctly in one shot. These are common operations that are
        | easily extrapolated from their training data. Where they tend to
        | fail is on tasks like business logic, which has specific
        | requirements that aren't easily generalized.
       | 
        | I've also found that writing the scaffolding for the code
        | yourself really helps focus the agent. I'll typically add stubs
        | for the functions I want and create the overall code structure,
        | then have the agent fill in the blanks. I've found this is a
        | really effective approach for preventing the agent from going
        | off into the weeds.
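        | 
        | Concretely, the scaffolding looks something like this (a made-up
        | example, but it's the shape of it): I write the types and the
        | stubs, and the agent fills in the bodies:
        | 
        | ```
        | from dataclasses import dataclass
        | 
        | @dataclass
        | class Invoice:
        |     customer_id: str
        |     amount_cents: int
        |     days_overdue: int
        | 
        | def load_invoices(path: str) -> list[Invoice]:
        |     """Parse the CSV export into Invoice records."""
        |     raise NotImplementedError  # agent fills this in
        | 
        | def apply_late_fees(invoices: list[Invoice]) -> list[Invoice]:
        |     """Add a 5% fee to invoices more than 30 days overdue."""
        |     raise NotImplementedError  # agent fills this in
        | ```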
       | 
       | I also find that if it doesn't get things right on the first
       | shot, the chances are it's not going to fix the underlying
       | problems. It tends to just add kludges on top to address the
       | problems you tell it about. If it didn't get it mostly right at
       | the start, then it's better to just do it yourself.
       | 
       | All that said, I find enjoyment is an important aspect as well
       | and shouldn't be dismissed. If you're less productive, but you
       | enjoy the process more, then I see that as a net positive. If all
       | LLMs accomplish is to make development more fun, that's a good
       | thing.
       | 
        | I also find that there's use for both terminal-based tools and
        | IDEs. The terminal REPL is great for initially sketching things
        | out, but IDE-based tooling makes it much easier to apply
        | selective changes exactly where you want.
       | 
        | As a side note, I got curious and asked GLM-4.5 to make a token
        | field widget with React, and it did it in one shot.
       | 
       | It's also strange not to mention DeepSeek and GLM as options
       | given that they cost orders of magnitude less per token than
       | Claude or Gemini.
        
       | bachmeier wrote:
       | > By being particularly bad at anything outside of the most
       | popular languages and frameworks, LLMs force you to pick a very
       | mainstream stack if you want to be efficient.
       | 
       | I haven't found that to be true with my most recent usage of AI.
       | I do a lot of programming in D, which is not popular like Python
       | or Javascript, but Copilot knows it well enough to help me with
       | things like templates, metaprogramming, and interoperating with
        | GCC-produced DLLs on Windows. This is true in spite of the lack
       | of a big pile of training data for these tasks. Importantly, it
       | gets just enough things wrong when I ask it to write code for me
       | that I have to understand everything well enough to debug it.
        
       | Vektorceraptor wrote:
       | I agree. I had a similar experience.
       | 
       | https://speculumx.at/pages/read_post.html?post=59
        
       | singularity2001 wrote:
       | "LLMs won't magically make you deliver production-ready code"
       | 
        | Either I'm extremely lucky, or I was lucky to find the guy who
        | said it must all be test-driven and guided by the usual
        | principles of DRY, etc. Claude Code works absolutely
        | fantastically nine out of ten times, and when it doesn't, we
        | just roll back the three hours of nonsense it did, postpone the
        | feature, or give it extra guidance.
        
         | simonw wrote:
          | I'm beginning to suspect robust automated tests may be one of
          | the strongest indicators of whether you're going to have a
          | good time with LLM coding agents or not.
          | 
          | If there's a test suite for the thing to run, it's SO much
          | less likely to break other features while it's working. Plus
          | it can read the tests and use them to get a good idea of how
          | everything is supposed to work already.
         | 
         | Telling Claude to write the test first, then execute it and
         | watch it fail, then write the implementation has been giving me
         | really great results.
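          | 
          | In other words: write something like this first, run it, watch
          | it fail, and only then ask for the implementation (slugify()
          | here is a made-up example, but this is the shape of it):
          | 
          | ```
          | # test_slugify.py -- written, run, and seen failing BEFORE
          | # the implementation exists
          | from slugs import slugify
          | 
          | def test_basic():
          |     assert slugify("Hello, World!") == "hello-world"
          | 
          | def test_collapses_whitespace():
          |     assert slugify("a   b") == "a-b"
          | 
          | def test_empty():
          |     assert slugify("") == ""
          | ```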
        
       | stephc_int13 wrote:
        | I have not tried every IDE/CLI or model, only a few, mostly
        | Claude and Qwen.
        | 
        | I work mostly in C/C++.
        | 
        | The most valuable thing about these tools, for me, is being able
        | to easily find help when I have to work on boring/tedious tasks,
        | or when I want to have a Socratic conversation about a design
        | idea with a not-so-smart but extremely knowledgeable colleague.
        | 
        | But for anything requiring a brain, it is almost useless.
        
       | infoseek12 wrote:
        | There are quite a lot of errors in this piece. For instance, the
        | problem the author had with Gemini CLI running out of tokens in
        | ten minutes is what happens when you don't set up a (free) API
        | key in your environment.
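        | 
        | If I remember right, the fix is just this before launching (the
        | key comes from AI Studio; GEMINI_API_KEY is the variable the
        | Gemini CLI documents):
        | 
        | ```
        | export GEMINI_API_KEY="your-key-here"
        | gemini
        | ```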
        
       ___________________________________________________________________
       (page generated 2025-08-09 23:00 UTC)