[HN Gopher] Why agents are bad pair programmers
       ___________________________________________________________________
        
       Why agents are bad pair programmers
        
       Author : sh_tomer
       Score  : 261 points
       Date   : 2025-06-09 23:36 UTC (23 hours ago)
        
 (HTM) web link (justin.searls.co)
 (TXT) w3m dump (justin.searls.co)
        
       | bluefirebrand wrote:
       | Pair programming is also not suitable for all cases
       | 
       | Maybe not for many cases
       | 
       | I mentioned this elsewhere but I find it absolutely impossible to
       | get into a good programming flow anymore while the LLM constantly
       | interrupts me with suggested autocompletes that I have to stop,
       | read, review, and accept/reject
       | 
       | It's been miserable trying to incorporate this into my workflow
        
         | amazingamazing wrote:
          | Code regularly yourself, and use AI to get unblocked when
          | you're stuck or to review your code for mistakes.
         | 
         | Or have the ai write the entire first draft for some piece and
         | then you give it a once over, correcting it either manually or
         | with prompts.
        
         | meesles wrote:
         | Second this. My solution is to have a 'non-AI' IDE and then a
         | Cursor/VS Code to switch between. Deep work cannot be achieved
         | by chatting with the coding bots, sorry.
        
           | morkalork wrote:
           | Thirded. It was just completely distracting and I had to turn
           | it off. I use AI but not after every keystroke, jeez.
        
             | latentsea wrote:
             | But but but... "we are an AI-first company".
             | 
             | Yeah, nah. Fourthed!
        
               | mdp2021 wrote:
               | > _AI-first company_
               | 
               | Does anybody introduce itself like that?
               | 
               | It's like when your date sends subtle signals, like
               | kicking sleeping tramps in the street and snorting the
               | flour over bread at the restaurant.
               | 
                | (The shocking thing is that the expression would even
                | make sense when taken properly, "we have organized our
                | workflows through AI-intelligent systems", while at
                | this time it easily means the opposite.)
        
               | neilv wrote:
               | > > _AI-first company_
               | 
               | > _Does anybody introduce itself like that?_
               | 
               | Yes, I've started getting job posts sent to me that say
               | that.
               | 
               | Declaring one's company "AI-first" right now is a great
               | time-saver: I know instantly that I can disregard that
               | company.
        
           | gen220 wrote:
           | You should try aider! This is my workflow, essentially
        
           | soulofmischief wrote:
           | > Deep work cannot be achieved by chatting with the coding
           | bots, sorry.
           | 
           | ...by you. Meanwhile, plenty of us have found a way to
           | enhance our productivity during deep work. No need for the
           | patronization.
        
             | bluefirebrand wrote:
             | I don't believe you experience deep work the same way I do
             | then
             | 
             | In my mind you cannot do deep work while being interrupted
             | constantly, and LLM agents are constant interruptions
        
               | soulofmischief wrote:
               | You can do better than a No true Scotsman fallacy. The
               | fact is that not everyone works the same way you do, or
               | interacts the same way with agents. They are not constant
               | interruptions if you use them correctly.
               | 
               | Essentially, this is a skill issue and you're at the
               | first peak of the Dunning-Kruger curve, sooner ready to
               | dismiss those with more experience in this area as being
               | less experienced, instead of keeping an open mind and
               | attempting to learn from those who contradict your
               | beliefs.
               | 
               | You could have asked for tips since I said I've found a
               | way to work deeply with them, but instead chose to assume
               | that you knew better. This kind of attitude will stunt
               | your ability to adopt these programs in the same way that
               | many people were dismissive about personal computers or
               | the internet and got left behind.
        
               | girvo wrote:
               | It's quite amusing to see you complain about
               | patronisation, and then see you turn about and do it
               | yourself one comment later.
        
               | soulofmischief wrote:
               | I'm open to hearing how being honest with them about
               | their negative approach is patronizing them.
        
               | jcranmer wrote:
               | Calling someone "on the first peak of the Dunning-Kruger
               | curve" is patronizing them.
        
               | soulofmischief wrote:
               | How would you have handled it?
        
               | kbelder wrote:
               | Civilly?
        
               | Timwi wrote:
               | Here is how I might have handled it differently:
               | 
               | Instead of
               | 
               | > Meanwhile, plenty of us have found a way to enhance our
               | productivity during deep work. No need for the
               | patronization.
               | 
               | you could have written
               | 
               | > Personally, I found doing X does enhance my
               | productivity during deep work.
               | 
               | Why it's better: 1) cuts out the confrontation ("you're
               | being patronizing!"), 2) offers the information directly
               | instead of merely implying that you've found it, and 3)
               | speaks for yourself and avoids the generalization about
               | "plenty of people", which could be taken as a veiled
               | insult ("you must be living as a hermit or something").
               | 
               | Next:
               | 
               | > You can do better than a No true Scotsman fallacy.
               | 
               | Even if the comment were a No True Scotsman, I would not
               | have made that fact the central thesis of this paragraph.
                | Instead, I might have explained the error in the
                | argument. Advantages: 1) you can come out clean in the
               | case that you might be wrong about the fallacy, and 2)
               | the commenter might appreciate the insight.
               | 
               | Reason you're wrong in this case: The commenter referred
               | entirely to their own experience and made no "true
               | programmer" assertions.
               | 
               | Next:
               | 
               | > Essentially, this is a skill issue [...] Dunning-Kruger
               | curve [...] chose to assume that you knew better. [...]
               | 
               | I would have left out these entire two paragraphs. As
               | best as I can tell, they contain only personal attacks.
               | As a result, the reader comes away feeling like your only
               | purpose here is to put others down. Instead, when you
               | wrote
               | 
               | > You could have asked for tips
               | 
               | I personally would have just written out the tips.
               | Advantage: the reader may find it useful in the best
               | case, and even if not, at least appreciate your
               | contribution.
        
               | owebmaster wrote:
               | That's real patronizing. His answers were fine, unless
               | you think he is totally wrong.
        
               | Timwi wrote:
               | As an observer to this conversation, I can't help but
               | notice that both have a good point here.
               | 
               | Soulofmischief's main point is that meesles made an
               | inappropriate generalization. Meesles said that something
               | was impossible to do, and soulofmischief pointed out that
               | you can't really infer that it's impossible for everyone
                | just because _you_ couldn't find a way. This is a
               | perfectly valid point, but it wasn't helped by
               | soulofmischief calling the generalization "patronizing".
               | 
               | Bluefirebrand pushed back on that by merely stating that
               | their experience and intuition match those of meesles,
               | but soulofmischief then interpreted that as implying
               | they're not a real programmer and called it a No True
               | Scotsman fallacy.
               | 
               | It went downhill from there with soulofmischief trying to
               | reiterate their point but only doing so in terms of
               | insults such as the Dunning-Kruger line.
        
               | girvo wrote:
               | Oh 100%. I deliberately passed no judgement on the
               | _actual_ main points, as my experience is quite literally
               | in between both of theirs.
               | 
               | I find agent mode incredibly distracting and it _does_
               | get in the way of very deep focus for implementation for
               | myself for the work I do... but not always. It has
               | serious value for some tasks!
        
               | soulofmischief wrote:
               | I only took issue with ", sorry." The rest of it I was
               | fine with. I definitely didn't need to match their energy
               | so much though, I should have toned it down. Also, the No
               | true Scotsman was about deep work, not being a
               | programmer, but otherwise yeah. I didn't mean to be
               | insulting but I could have done better :)
        
               | antihipocrat wrote:
                | Would be informative if both sides shared what the problem
                | domain is when describing their experiences.
               | 
               | It's possible that the domain or the complexity of the
               | problems are the deciding factor for success with AI
               | supported programming. Statements like 'you'll be left
               | behind' or 'it's a skill issue' are as helpful as 'It
               | fails miserably'
        
               | JumpCrisscross wrote:
               | For what it's worth, the deepest-thinking and most
               | profound programmers I have met--hell, thinkers in
               | general--have a peculiar tendency to favour pen and
               | paper. Perhaps because once their work is recognised,
               | they are generally working with a team that can amplify
               | them without needing to interrupt their thought flow.
        
               | soulofmischief wrote:
               | Ha, I would count myself among those if my handwriting
               | wasn't so terrible and I didn't have bad arthritis since
               | my youth. I still reach for pen and paper on the go or
               | when I need to draw something out, but I've gotten more
               | productive using an outliner on my laptop, specifically
               | Logseq.
               | 
               | I think there's still room for thought augmentation via
               | LLMs here. Years back when I used Obsidian, I created
               | probably the first or second copilot-for-Obsidian plugin
               | and I found it very helpful, even though GPT-3 was
               | generally pretty awful. I still find myself in deep flow,
               | thinking in abstract, working alongside my agent to solve
               | deep problems in less time than I otherwise would.
        
               | hnthrowaway121 wrote:
               | Analysis in the last 5-10 years has shown the Dunning-
               | Kruger effect may not really exist. So it's a poor basis
               | on which to be judgmental and condescending.
        
               | soulofmischief wrote:
               | > judgmental and condescending
               | 
               | pushing back against judgement and condescension is not
               | judgemental and condescending.
               | 
               | > may not really exist
               | 
               | I'm open to reading over any resources you would like to
               | provide, maybe it's "real", maybe it isn't, but I have
               | personally both experienced and witnessed the effect in
               | myself, other individuals and groups. It's a good
               | heuristic for certain scenarios, even if it isn't
                | necessarily generalizable.
        
               | Timwi wrote:
               | I would invite you to re-read some of the comments you
               | perceived as judgement and condescension and keep an open
               | mind. You might find that you took them as judgement and
               | condescension unfairly.
               | 
               | Meanwhile, you have absolutely been judgemental and
               | condescending yourself. If you really keep the open mind
               | that you profess, you'll take a moment to reflect on this
               | and not dismiss it out of hand. It does not do you any
               | favors to blissfully assume everyone is wrong about you
               | and obliviously continue to be judgmental and
               | condescending.
        
               | Timwi wrote:
               | > You could have asked for tips since I said I've found a
               | way to work deeply with them
               | 
               | How do you work deeply with them? Looking for some tips.
        
               | 8note wrote:
                | if you're using a computer at all, you're doing it wrong.
                | deep work can only be done from the forest with no
                | internet reception, pencil and paper
        
               | thallada wrote:
               | Everyone knows real programmers only need to use a
               | butterfly.
        
               | cmrdporcupine wrote:
               | If you've opened your eyes, it's not deep work.
               | 
                | Deep work happens in a sensory deprivation tank. And you
               | have to memorize everything you thought through, and
               | write it down (with quill pen) after you emerge.
               | 
               | Anything else isn't really deep. Sorry, you posers.
        
               | ashdksnndck wrote:
               | This sounds like an issue with the specific UI setup you
               | are using. I have mine configured so it only starts doing
               | stuff if I ask it to. It never interrupts me.
        
               | icedchai wrote:
               | We're getting constantly interrupted with Slack messages,
               | Zoom meetings, emails, Slack messages about checking said
               | emails, etc. At least an LLM isn't constantly pinging you
               | for updates (yet?) - you can get back to it whenever.
        
           | NicoSchwandner wrote:
           | I do this as well and it works quite well for me like that!
           | 
           | Additionally, when working on microservices and on issues
           | that don't seem too straightforward, I use o3 and copy the
           | whole code of the repo into the prompt and refine a plan
           | there and then paste it as a prompt into cursor. Handy if you
            | don't have MAX mode but do have a company-sponsored ChatGPT.
        
             | vasusen wrote:
             | I do this too by pasting only the relevant context files
             | into O3 or Claude 4. We have an internal tool that just
             | lets us select folders/files and spit out one giant
             | markdown.
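              | 
              | The core of it is tiny. As a rough sketch (ours is internal,
              | so the names and details below are made up):
              | 
              |     #!/usr/bin/env python3
              |     """Rough sketch: concatenate chosen files/folders into
              |     one markdown blob to paste into an LLM as context."""
              |     import sys
              |     from pathlib import Path
              | 
              |     def bundle(paths):
              |         chunks = []
              |         for p in map(Path, paths):
              |             files = sorted(p.rglob("*")) if p.is_dir() else [p]
              |             for f in files:
              |                 if f.is_file():
              |                     # one fenced section per file, so the LLM
              |                     # can see file boundaries
              |                     text = f.read_text(errors="ignore")
              |                     chunks.append(f"## {f}\n```\n{text}\n```")
              |         return "\n\n".join(chunks)
              | 
              |     if __name__ == "__main__":
              |         # e.g. python bundle.py src/ README.md
              |         print(bundle(sys.argv[1:]))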
        
           | thunspa wrote:
           | I just cmd + shift + p -> disable Cursor tab -> enter
           | 
           | Sure, you could just add a shortcut too.
           | 
           | After a while, it turns into a habit.
        
           | SV_BubbleTime wrote:
           | This is kind of intentionally the flow with Claude code as
           | I've experienced it.
           | 
           | I'm in VSCode doing my thing, and it's in a terminal window
           | that occasionally needs or wants my attention. I can go off
           | and be AI-Free for as long as I like.
        
         | flessner wrote:
          | I recently got a new laptop and had to set up my IDE again.
         | 
         | After a couple hours of coding something felt "weird" - turns
         | out I forgot to login to GitHub Copilot and I was working
         | without it the entire time. I felt a lot more proactive and
         | confident as I wasn't waiting on the autocomplete.
         | 
         | Also, Cursor was exceptional at interrupting any kind of "flow"
         | - who even wants their next cursor position predicted?
         | 
         | I'll probably keep Copilot disabled for now and stick to the
         | agent-style tools like aider for boilerplate or redundant
         | tasks.
        
           | johnfn wrote:
           | > who even wants their next cursor position predicted
           | 
           | I'm fascinated by how different workflows are. This single
           | feature has saved me a staggering amount of time.
        
           | ipaddr wrote:
            | The pure LLM workflow is strange, and boring. I still write
            | most of my own code and use LLMs when I'm too lazy to write
            | the next piece.
            | 
            | If I give it to an LLM, most of my time is spent debugging and
            | reprompting. I hate fixing someone else's bugs.
            | 
            | Plus I like the feeling of the coding flow... wind at my back,
            | each keystroke putting us one step closer.
            | 
            | The apps I made with LLMs I never want to go back to, but the
            | apps I built by hand piece by piece, getting a chemical
            | reaction each time a problem was solved, are the ones I think
            | positively about and want to return to.
            | 
            | I always did math on paper or in my head and never used a
            | calculator. It's a skill I have never forgotten, and I worry
            | how many programmers won't be able to code without LLMs in
            | the future.
        
           | baq wrote:
           | > Also, Cursor was exceptional at interrupting any kind of
           | "flow" - who even wants their next cursor position predicted?
           | 
           | Me, I use this all the time. It's actually predictable and
           | saves lots of time when doing similar edits in a large file.
           | It's about as powerful as multi-line regex search and
           | replace, except you don't have to write the regex.
        
           | barrenko wrote:
            | Same, for me it's agents or nothing, no in-between.
        
         | jumploops wrote:
         | I'm a Vim user and couldn't agree more.
         | 
         | Didn't like any of the AI-IDEs, but loved using LLMs for
         | spinning up one off solutions (copy/paste).
         | 
         | Not to be a fan boy, but Claude Code is my new LLM workflow.
         | It's tough trying to get it to do everything, but works really
         | well with a targeted task on an existing code base.
         | 
         | Perfect harmony of a traditional code editor (Vim) with an LLM-
         | enhanced workflow in my experience.
        
         | brianpan wrote:
         | AI "auto-complete" or "code suggestions" is the worst,
          | especially if you are in a strongly-typed language because it's
         | 80% correct and competing with an IDE that can be 100% correct.
         | 
         | AI agents are much better for me because 1) they don't
         | constantly interrupt your train of thought and 2) they can run
         | compile, run tests, etc. to discover they are incorrect and fix
         | it before handing the code back to you.
        
         | dinosaurdynasty wrote:
         | I love the autocomplete, honestly use it more than any other AI
         | feature.
         | 
         | But I'm forced to write in Go which has a lot of boilerplate
         | (and no, some kind of code library or whatever would not
         | help... it's just easier to type at that point).
         | 
         | It's great because it helps with stuff that's too much of a
         | hassle to talk to the AI for (just quicker to type).
         | 
         | I also read very fast so one line suggestions are just instant
         | anyway (like non AI autocomplete), and longer ones I can see if
         | it's close enough to what I was going to type anyway. And
         | eventually it gets to the point where you just kinda know what
         | it's going to do.
         | 
         | Not an amazing boost, but it does let me be lazy writing log
         | messages and for loops and such. I think you do need to read it
         | much faster than you can write it to be helpful though.
        
         | CraigJPerry wrote:
         | Zed has a "subtle" mode, hopefully that feature can become
         | table stakes in all AI editor integrations
        
         | rsynnott wrote:
         | I've always seen it as primarily an _education_ tool; the
         | purpose of pair programming isn't that two people pair
         | programming are more productive than two people working
          | individually; they're generally not. So pair programming with a
         | magic robot seems rather futile; it's not going to learn
         | anything.
        
           | searls wrote:
           | LLMs in their current incarnation will not, but there's
           | nothing inherently preventing them from learning. Contexts
           | are getting large enough that having a sidecar database
           | living with each project or individual as a sort of corpus of
           | "shit I learned pairing with Justin" is already completely
           | achievable, if only a product company wanted to do that.
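            | 
            | Mechanically it's little more than an append-only file the
            | agent reads at session start and writes to whenever it gets
            | corrected. A sketch of the idea (everything below is invented,
            | not a shipping product):
            | 
            |     # project-local "lessons learned" store for a pairing agent
            |     import json, time
            |     from pathlib import Path
            | 
            |     LESSONS = Path(".pair_lessons.jsonl")  # hypothetical sidecar file
            | 
            |     def remember(lesson: str) -> None:
            |         # called whenever the human corrects the agent
            |         with LESSONS.open("a") as f:
            |             f.write(json.dumps({"t": time.time(), "lesson": lesson}) + "\n")
            | 
            |     def recall() -> str:
            |         # prepended to the agent's context at session start
            |         if not LESSONS.exists():
            |             return ""
            |         notes = [json.loads(line)["lesson"] for line in LESSONS.open()]
            |         return "Lessons from pairing on this repo:\n- " + "\n- ".join(notes)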
        
             | GrantMoyer wrote:
             | Claude Plays Pokemon is kind of an interesting case study
             | for this. This sort of knowledgebase is implemented, but
             | even state of the art models struggle to use it
             | effectively. They seem to fixate on small snippets from the
             | knowledge base without any ability to consider greater
             | context.
        
         | hk1337 wrote:
         | > Pair programming is also not suitable for all cases
         | 
         | I think this is true but pair programming can work for most
         | circumstances.
         | 
          | The times where it doesn't work are usually because one or both
         | parties are not all-in with the process. Either someone is
         | skeptical about pair programming and thinks it never works or
         | they're trying to enforce a strict interpretation of pair
         | programming.
        
           | bluefirebrand wrote:
           | It doesn't work when someone already has a solution in mind
           | and all they need to do is type it into the editor
           | 
           | I've been doing this a while. This is most of my work
        
       | ramesh31 wrote:
       | >LLM agents make bad pairs because they code faster than humans
       | think
       | 
       | This is why I strongly dislike all of the terminal based tools
       | and PR based stuff. If you're left to read through a completed
       | chunk of code it is just overwhelming and your cycle time is too
       | slow. The key to productivity is using an IDE based tool that
       | shows you every line of code as it is being written, so you're
       | reading it and understanding where it's going in real time.
       | Augmentation, not automation, is the path forward. Think of it
       | like the difference between walking and having a manual
       | transmission car to drive, _not_ the difference between having a
       | car and having a self driving car.
        
         | bluefirebrand wrote:
         | If I have a 20 line function in my mind and the LLM injects 20
         | lines for me to accept or reject, I have two problems
         | 
         | First I have to review the 20 lines the LLM has produced
         | 
         | Second, if I reject those lines, it has probably shoved the
         | function I had in mind out of my head
         | 
         | It's enormously disruptive to my progress
        
           | ramesh31 wrote:
           | The hard truth here is in accepting that the 20 lines in your
           | head were probably wrong, or suboptimal, and letting go of
           | that urge. Think in interfaces, not implementations.
           | Successive rendering, not one-shot.
        
             | shortstuffsushi wrote:
             | This is just fundamentally not the case most of the time.
             | LLMs guess where you're going, but so often what they
             | produce is a "similar looking" non sequitur relative to the
             | lines above it. It guesses, and sometimes that guess is
             | good, but as often, or more, it's not.
             | 
             | The suggestion "think in interfaces" is fine; if you spell
             | out enough context in comments, the LLM may be able to
             | guess more accurately, but in spelling out that much
             | context for it, you've likely already done the mental
             | exercise of the implementation.
             | 
             | Also baffled by "wrong or suboptimal," I don't think I've
             | ever seen an LLM come up with a _better_ solution.
        
             | bluefirebrand wrote:
             | > The hard truth here is in accepting that the 20 lines in
             | your head were probably wrong, or suboptimal, and letting
             | go of that urge.
             | 
             | Maybe, but the dogshit that Cursor generates is
              | _definitely_ wrong, so frankly if it's gonna be my name on
              | the PR then I want it to be _my_ wrong code, not hide behind
             | some automated tool
             | 
             | > Think in interfaces, not implementations
             | 
             | In my experience you likely won't know if you've designed
             | the right interface until you successfully implement the
             | solution. Trying to design the perfect interface upfront is
             | almost guaranteed to take longer than just building the
             | thing
        
             | datameta wrote:
             | I agree with the last two sentences but simultaneously
              | think that starting to de facto believe you cannot have an
             | equal or better solution compared to the AI is the start of
             | atrophy of those skills.
        
             | xedrac wrote:
             | Maybe it's the domain I work in, or the languages I use,
             | but the 20 lines the LLM comes up with is almost certainly
             | wrong.
        
             | simoncion wrote:
             | > ...and letting go of that urge.
             | 
             | What urge? The urge to understand what the software you're
             | about to build upon is doing? If so, uh... no. No thanks.
             | 
             | I've seen some proponents of these code-generation machines
             | say things like "You don't check the output of your
             | optimizing compiler, so why check the output of
             | Claude/Devon/whatever?". The problem with this analogy is
             | that the output from mainstream optimizing compilers is
             | very nearly always correct. It may be notably _worse_ than
              | hand-generated output, but it's nearly never _wrong_. Not
              | even the most rabid proponent will claim the same of today's
              | output from these code-generation machines.
             | 
             | So, when these machines emit code, I will inevitably have
             | to switch from "designing and implementing my software
             | system" mode into "reading and understanding someone else's
              | code" mode. Some folks may actually be able to do this
             | context-shuffling quickly and easily. I am not one of those
             | people. The results from those studies from a while back
             | that found that folks take something like a quarter-hour to
             | really get back into the groove when interrupted while
             | doing a technical task _suggest_ that not that many folks
             | are able to do this.
             | 
             | > Think in interfaces...
             | 
             | Like has been said already, you don't tend to get the right
             | interface until you've attempted to use it with a bunch of
             | client code. "Take a good, educated stab at it and refine
             | it as the client implementations reveal problems in your
             | design." is the way you're going to go for all but the most
             | well-known problems. (And if your problem is _that_ well-
             | known, why are you writing more than a handful of lines
             | solving that problem again? Why haven 't you bundled up the
             | solution to that problem in a library already?)
             | 
             | > Successive rendering, not one-shot.
             | 
             | Yes, like nearly all problem-solving, most programming is
             | and always has been an iterative process. One rarely gets
             | things right on the first try.
        
       | ChrisMarshallNY wrote:
       | I use an LLM as a reference (on-demand), and don't use agents
       | (yet). I was never into pair programming, anyway, so it isn't a
       | familiar workflow for me.
       | 
       | I will admit that it encourages "laziness," on my part, but I'm
       | OK with that (remember when they said that calculators would do
       | that? They were right).
       | 
       | For example, I am working on a SwiftUI project (an Apple Watch
       | app), and forgot how to do a fairly basic thing. I could have
       | looked it up, in a few minutes, but it was easier to just spin up
       | ChatGPT, and ask it how to do it. I had the answer in a few
       | seconds. Looking up SwiftUI stuff is a misery. The documentation
       | is ... _a work in progress_ ...
        
         | petesergeant wrote:
         | > I use an LLM as a reference (on-demand), and don't use agents
         | (yet)
         | 
         | This was me until about three weeks ago. Then, during a week of
         | holiday, I decided I didn't want to get left behind and tried a
         | few side-projects using agents -- specifically I've been using
         | Roo. Now I use agents when appropriate, which I'd guess is
         | about 50% of the work I'm doing.
        
           | cpursley wrote:
           | Roo looks interesting. How does it compare with Cursor and
           | Windsurf?
        
             | shio_desu wrote:
             | It burns tokens if you BYOK but you can hook into GH
             | Copilot LLMs directly
             | 
              | I really like the orchestrator and architect personas as is,
              | out of the box. I prefer it over Cursor / Windsurf for a
              | few reasons:
              | 
              | - no indexing (double edged sword)
              | - the orchestrator I find much more useful than Windsurf
              |   cascades
              | - tool usage is fantastic
             | 
             | The no indexing is a double edged sword, it does need to
             | read files constantly, contributing to token burn. However,
             | you don't have to worry about indexed data being on a 3rd
             | party server (cursor), and also since it has to crawl to
             | understand the codebase for it to implement, to me it seems
             | like it is more capable of trickier code implementations,
             | as long as you utilize context properly.
             | 
             | For more complex tasks, I usually either spend 20-30
             | minutes writing a prompt to give it what I'm looking to
             | implement, or write up a document detailing the approach
             | I'd like to take and iterate with the architect agent.
             | 
              | Afterwards, I hand it off to the orchestrator, which manages
              | and creates subtasks, each providing targeted
              | implementation steps / tasks with a fresh context window.
             | 
             | If you have a GH Copilot license already, give it a shot. I
             | personally think it's a good balance between control as an
             | architect and not having to tie my time down for
             | implementations, since really a lot of the work in coding
             | is figuring out the implementation plan anyways, and the
             | coding can be busy work, to me personally anyways. I prefer
             | it over the others as I feel Windsurf/Cursor encourages
             | YOLO too much.
        
       | khendron wrote:
       | When I first tried an LLM agent, I was hoping for an interactive,
       | 2-way, pair collaboration. Instead, what I got was a pairing
       | partner who wanted to do everything themselves. I couldn't even
       | tweak the code they had written, because it would mess up their
       | context.
       | 
       | I want a pairing partner where I can write a little, they write a
       | little, I write a little, they write a little. You know, an
       | actual collaboration.
        
         | psadri wrote:
         | I usually add "discuss first. Don't modify code yet". Then we
         | do some back and forth. And finally, "apply".
        
           | dragonfax wrote:
           | Claude Code has "plan mode" for this now. It enforces this
            | behavior. But it's still poorly documented.
        
             | psadri wrote:
             | They should add a "cmd-enter" for ask, and "enter" to go.
             | 
             | Separately, if I were at cursor (or any other company for
             | that matter), I'd have the AI scouring HN comments for "I
             | wish x did y" suggestions.
        
               | falcor84 wrote:
               | I've been thinking about this a lot recently - having AI
               | automate product manager user research. My thread of
               | thought goes something like this:
               | 
               | 0. AI can scour the web for user comments/complaints
               | about our product and automatically synthesize those into
               | insights.
               | 
               | 1. AI research can be integrated directly into our
               | product, allowing the user to complain to it just-in-
               | time, whereby the AI would ask for clarification, analyze
               | the user needs, and autonomously create/update an idea
               | ticket on behalf of the user.
               | 
               | 2. An AI integrated into the product could actually
               | change the product UI/UX on its own in some cases,
               | perform ad-hoc user research, asking the user "would it
               | be better if things were like this?" and also measuring
               | objective usability metrics (e.g. task completion time),
               | and then use that validated insight to automatically
               | spawn a PR for an A/B experiment.
               | 
               | 3. Wait a minute - if the AI can change the interface on
               | its own - do we even need to have a single interface for
               | everyone? Perhaps future software would only expose an
               | API and a collection of customizable UI widgets (perhaps
               | coupled with official example interfaces), which each
               | user's "user agent AI" would then continuously adapt to
               | that user's needs?
        
               | darkwater wrote:
               | > 3. Wait a minute - if the AI can change the interface
               | on its own - do we even need to have a single interface
               | for everyone? Perhaps future software would only expose
               | an API and a collection of customizable UI widgets
               | (perhaps coupled with official example interfaces), which
               | each user's "user agent AI" would then continuously adapt
               | to that user's needs?
               | 
               | Nice, in theory. In practice it will be "Use our Premium
               | Agent at 24.99$/month to get all the best features, or
               | use the Basic Agent at 9.99$ that will be less effective,
               | less customizable and inject ads".
        
               | falcor84 wrote:
               | Well, at the end of the day, capitalism is about
               | competition, and I would hope for a future where that
               | "User Agent AI" is a local model fully controlled by the
               | user, and the competition is about which APIs you access
               | through them - so maybe "24.99$/month to get all the best
               | features", but (unless you relinquish control to MS or
               | Google), users wouldn't be shown any ads unless they
               | choose to receive them.
               | 
               | We're seeing something similar in VS Code and its zoo of
               | forks - we're choosing which API/subscriptions to access
               | (e.g. GitLens Pro, or Copilot, or Cursor/Windsurf/Trae
               | etc.), but because the client itself is open source,
               | there aren't any ads.
        
             | mdemare wrote:
             | Claude Code denies that it has a plan mode...
        
           | carpo wrote:
           | Same. I use /ask in Aider so I can read what it's planning,
           | ask follow-up questions, get it to change things, then after
           | a few iterations I can type "Make it so" while sitting back
           | to sip on my Earl Grey.
        
             | tomkwong wrote:
             | I had done something slightly different. I would ask LLM to
             | prepare a design doc, not code, and iterate on that doc
             | before I ask them to start coding. That seems to have
             | worked a little better as it's less likely to go rogue.
        
           | lodovic wrote:
            | I try to be super careful: I type the prompt I want to execute
            | in a text file, ask the agent to validate and improve on it,
            | and ask it to add an implementation plan. I even let another
            | agent review the final plan. But even then, occasionally it
            | still starts implementing halfway through a refinement.
        
         | icedchai wrote:
         | Have you tried recently? This hasn't been my experience. I
         | modify the code it's written, then ask it to reread the file.
         | It generally responds "I see you changed file and [something.]"
         | Or when it makes a change, I tell it I need to run some tests.
         | I provide feedback, explain the problem, and it iterates. This
         | is with Zed and Claude Sonnet.
        
           | dkersten wrote:
           | I do notice though that if I edit what it wrote before
           | accepting it, and then it sees it (either because I didn't
           | wait for it to finish or because I send it another message),
           | it will overwrite my changes with what it had before my
           | changes every single time, without fail.
           | 
           | (Zed with Claude 4)
        
             | jagged-chisel wrote:
             | Gemini has insisted on remembering an earlier version of a
             | file even after its own edits.
             | 
             | "We removed that, remember?"
             | 
             | "Yes! I see now ..."
             | 
             | Sometimes it circles back to that same detail that no
             | longer exists.
        
             | icedchai wrote:
             | Interesting. I always wait for it to finish with my
             | workflow.
        
               | dkersten wrote:
               | It does it even if I wait for it to finish, but don't
               | accept. Eg:
               | 
               | Starting code: a quick brown fox
               | 
               | prompt 1: "Capitalize the words"
               | 
               | AI: A Quick Brown Fox
               | 
               | I don't accept or reject, but change it to "A Quick Red
               | Fox"
               | 
               | prompt 2: "Change it to dog"
               | 
               | AI: A Quick Brown Dog
        
               | icedchai wrote:
               | Do you tell it to reread the file? Seems like the updates
               | aren't in the context.
        
               | dkersten wrote:
               | Hmm, perhaps not. I'll have to experiment more.
        
         | mock-possum wrote:
         | In all honesty - have you tried doing what you would do with a
         | paired programmer - that is, talk to them about it?
         | Communicate? I've never had trouble getting cursor or copilot
         | to chat with me about solutions first before making changes,
         | and usually they'll notice if I make my own changes and say
         | "oh, I see you already added XYZ, I'll go ahead and move on to
         | the next part."
        
           | lomase wrote:
            | > _I've never had trouble getting cursor or copilot to chat
            | with me about solutions first before making changes_
            | 
            | Never had any trouble... and then they lived together happily
            | ever after.
        
         | tobyhinloopen wrote:
         | You can totally do that. Just tell it to.
         | 
         | If you want an LLM to do something, you have to explain it.
         | Keep a few prompt docs around to load every conversation.
        
         | haneul wrote:
         | Hmm you can tweak fine these days without messing up context.
         | But, I run in "ask mode" only, with opus in claude code and o3
         | max in cursor. I specifically avoid agent mode because, like in
         | the post, I feel like I gain less over time.
         | 
         | I infrequently tab complete. I type out 80-90% of what is
          | suggested, with some modifications. It does help that I can
          | maintain 170 wpm indefinitely on the low-medium end.
         | 
         | Keeping up with the output isn't much an issue at the moment
         | given the limited typing speed of opus and o3 max. Having
         | gained more familiarity with the workflow, the reading feels
         | easier. Felt too fast at first for sure.
         | 
         | My hot take is that if GitHub copilot is your window into llms,
         | you're getting the motel experience.
        
           | catlifeonmars wrote:
           | > My hot take is that if GitHub copilot is your window into
           | llms, you're getting the motel experience.
           | 
           | I've long suspected this; I lean heavily on tab completion
           | from copilot to speed up my coding. Unsurprisingly, it fails
           | to read my mind a large portion of the time.
           | 
           | Thing is, mind reading tab completion is what I actually want
           | in my tooling. It is easier for me to communicate via code
           | rather than prose, and I find the experience of pausing and
           | using natural language to be jarring and distracting.
           | 
           | Writing the code feels like a much more direct form of
           | communicating my intent (in this case to the
           | compiler/interpreter). Maybe I'm just weird; and to be honest
           | I'm afraid to give up my "code first" communication style for
           | programming.
           | 
           | Edit: I think the reason why I find the conversational
           | approach so difficult is that I tend to think as I code. I
           | have fairly strong ADHD and coding gives me appropriate
           | amount of stimulation to do design work.
        
             | maleldil wrote:
             | Take a look at aider's watch mode. It seems like a bridge
             | for code completion with more powerful models than Copilot.
             | 
             | https://aider.chat/docs/usage/watch.html
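              | 
              | The short version, if I remember the docs right: run aider
              | with --watch-files, then leave a one-line comment ending in
              | "AI!" anywhere in your code and it picks the instruction up
              | in place, e.g.:
              | 
              |     import requests
              | 
              |     # add retries with exponential backoff here AI!
              |     def fetch(url):
              |         return requests.get(url, timeout=10)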
        
               | catlifeonmars wrote:
               | Thank you! I will check it out
        
         | Macha wrote:
         | My approach has generally been to accept, refactor and reprompt
         | if I need to tweak things.
         | 
         | Of course this does artificially inflate the "accept rate"
         | which the AI companies use to claim that it's writing good
         | code, rather than being a "sigh, I'll fix this myself" moment.
        
           | searls wrote:
           | I do this too and it drives me nuts. It's very obvious to me
           | (and perhaps anyone without an incentive to maximize the
           | accept rate) that the diff view really struggles. If you
           | leave a large diff, copilot and cursor will both get confused
           | and start duplicating chunks, or they'll fail to see the new
           | (or the old) but if you accept it, it always works.
        
             | jeffrallen wrote:
             | Aider solves this by turn-taking. Each modification is a
             | commit. If you hate it, you can undo it (type /undo, it
             | does the git reset --hard for you). If you can live with
             | the code but want to start tweaking it, do so, then /commit
             | (it makes the commit message for you by reading the diffs
              | you made). Working in turns, by commits, Aider can see what
             | you changed and keep up with you. I usually squash the
             | commits at the end, because the wandering way of correcting
             | the AI is not really useful history.
        
         | artursapek wrote:
         | I do this all the time with Claude Code. I'll accept its
         | changes, make adjustments, then tell it what I did and point to
         | the files or tell it to look at the diff.
         | 
         | Pair programming requires communicating both ways. A human
         | would also lose context if you silently changed their stuff.
        
         | pbhjpbhj wrote:
         | I've asked for hints/snippets to give ideas and then
         | implemented what I wanted myself (not commercially). Worked OK
         | for me.
        
       | palisade wrote:
       | LLM agents don't know how to shut up and always think they're
       | right about everything. They also lack the ability to be brief.
       | Sometimes things can be solved with a single character or line,
       | but no they write a full page. And, they write paragraphs of
       | comments for even the most minuscule of changes.
       | 
       | They talk at you, are overbearing and arrogant.
        
         | energy123 wrote:
         | I expect a lot of the things people don't like ("output too
         | long, too many comments in code") are side effects of making
         | the LLM good in other areas.
         | 
         | Long output correlates with less laziness when writing code,
         | and higher performance on benchmarks due to the monotone
         | relationship between number of output tokens and scores.
         | Comment spam correlates with better performance because it's
         | locally-specific reasoning it can attend on when writing the
         | next line of code, leading to reduced errors.
        
           | tobyhinloopen wrote:
           | Just add to the prompt not to include comments and to talk
           | less.
           | 
           | I have a prompt document that includes a complete summary of
           | the Clean Code book, which includes the rules about comments.
           | 
           | You do have to remind it occasionally.
        
             | energy123 wrote:
             | You can, but I would expect code correctness to be reduced,
             | you're removing one mechanism the model uses to dump local
             | reasoning immediately prior to where it's needed.
        
               | tobyhinloopen wrote:
               | With that logic, I should ask the AI to _increase_ the
               | amount of comments. I highly doubt the comments it
               | generates are useful, they're usually very superficial.
        
               | danans wrote:
               | Perhaps not useful to you, but they are the only way the
               | LLM has to know what it is doing.
               | 
               | It has to reason about the problem in its output, since
               | its output comprises almost the entirety of its
               | "awareness". Unlike you, the LLM doesn't "know" anything,
               | even superficial things.
               | 
               | In some sense it's like us when we are working on a
               | problem with lots of novel parts. We usually have to
               | write down notes to refer to in the process of solving
               | the problem, except for the LLM the problem is always a
               | novel problem.
        
               | tobyhinloopen wrote:
               | I usually use huge context/prompt documents (10-100K
               | tokens) before doing anything, I suppose that helps.
               | 
               | I'll experiment with comments, I can always delete them
               | later. My strategy is to have self-documenting code (and
               | my prompts include a how-to on self-documenting code)
        
               | energy123 wrote:
               | But that information is scattered. It's helpful for the
               | LLM to cluster and isolate local reasoning that it can
               | then "forget" about when it moves on to the next thing.
               | Attending to nearby recent tokens is easy for it, looking
               | up relevant information as needle in a haystack every
               | single time is more error prone. I'm not saying asking it
               | to remove comments will lead to a catastrophic drop off
               | in performance, maybe something like a few percent or
               | even less. Just that it's not useless for pure
               | benchmaxxing.
        
             | aerhardt wrote:
             | I have added it in the guidelines doc for Junie and that
             | won't stop it. It can't help itself - it needs to write a
             | comment every three lines, no matter the language it's
             | writing in.
        
               | tobyhinloopen wrote:
               | Hah the need to add comments is pretty resilient, that's
               | true.
        
         | dawnerd wrote:
         | I was trying out sonnet 4 yesterday and it spent 15 minutes
          | changing, testing, changing, etc., just to get one config item
         | changed. It ended up changing 40 files for no reason. Also kept
         | trying to open a debugger that didn't exist and load a webpage
         | that requires auth.
         | 
         | They're far from perfect that's for sure.
        
           | SV_BubbleTime wrote:
            | I don't think anyone is seriously claiming perfection. The
            | thing is, all of AI is moving 5 times faster than any
            | disruptive tech before it.
           | 
            | We went from proofreading single emails to researching
           | agentic coding in a year.
           | 
           | It should have been five.
        
       | __MatrixMan__ wrote:
        | I've been considering a... protocol? for improving this. Consider
        | this repo:
        | 
        |     foo.py
        |     bar.py
        |     bar.py.vibes.md
       | 
       | This would indicate that foo.py is human-written (or at least
       | thoroughly reviewed by a human), while bar.py is LLM written with
       | a lower bar of human scrutiny.
       | 
       | bar.py.vibes.md would contain whatever human-written guidance
       | describes how bar should look. It could be an empty file, or a
       | few paragraphs, or it it could contain function signatures and
       | partially defined data types.
       | 
       | If an LLM wants to add a new file, it gets a vibes.md with
       | whatever prompt motivated the addition.
       | 
        | Maybe some files keep their associated *.vibes.md forever, ready
       | to be totally rewritten as the LLM sees fit. Maybe others stick
       | around only until the next release, after which the associated
       | code is reviewed and the vibes files are removed (or somehow
       | deactivated, I could imagine it being useful for them to still be
       | present).
       | 
       | What do people think, do we need handcuffs of this kind for our
       | pair programming friends the LLMs?
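        | 
        | For illustration, a checker for the convention could be a few
        | lines of Python (layout and wording are just my sketch):
        | 
        |     #!/usr/bin/env python3
        |     """List files that are still 'vibes-tracked' (LLM-written,
        |     pending full human review) via their sibling *.vibes.md."""
        |     from pathlib import Path
        | 
        |     for vibes in sorted(Path(".").rglob("*.vibes.md")):
        |         source = vibes.with_name(vibes.name.removesuffix(".vibes.md"))
        |         if source.exists():
        |             print(f"{source}: awaiting human review")
        |         else:
        |             print(f"{vibes}: orphaned (no matching source file)")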
        
         | almosthere wrote:
         | I think coding will eventually go away in favor of models with
         | metadata built around them.
         | 
         | How many times did you have a mutation operation where you had
         | to hand code the insert of 3 or 4 entities and make sure they
          | all come back successfully, or back out properly (and perhaps
         | this is without a transaction, perhaps over multiple
         | databases).
         | 
          | - Make sure the required fields are present
          | - Grab the created/inserted ID
          | - Rinse, repeat
         | 
         | Or if you're mutating a list, writing code that inserts a new
         | element, but you don't know which one is new. And you end up,
         | again, hand coding loops and checking what you remember to
         | check.
         | 
         | What about when you need to do an auth check.
         | 
         | And the hand coder may fail to remember one little thing
         | somewhere.
         | 
         | With LLM code, you can just describe that function and it will
         | remember to do all the things.
         | 
         | An LLM with a model + metadata - we won't really need to think
         | of it as editing User.java or User.py anymore. Instead
         | User.yaml - and the LLM will just consume that, and build out
         | ALL of your required biz-logic, and be done with it. It could
         | create a fully authenticating/authorizing REST API + GraphQL
         | API with sane defaults - and consistent notions throughout.
         | 
         | And moving into UIs- we can have the same thing. The UI can be
         | described in an organized way. What fields are required for
         | user registration. What fields are optional according to the
         | backend. It's hard to visualize this future, but I think it's a
         | no-code future. It's models of requirements instead.
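          | 
          | To make that concrete, the kind of spec I'm imagining is closer
          | to data than code. The shape below is entirely invented, just to
          | illustrate:
          | 
          |     # hypothetical "model + metadata" a generator/LLM could expand
          |     # into CRUD endpoints, auth checks, and UI forms
          |     USER_SPEC = {
          |         "model": "User",
          |         "fields": {
          |             "email": {"type": "str", "required": True, "unique": True},
          |             "display_name": {"type": "str", "required": False},
          |             "password": {"type": "secret", "required": True,
          |                          "write_only": True},
          |         },
          |         "auth": {
          |             "create": "anonymous",
          |             "read": "self_or_admin",
          |             "update": "self",
          |             "delete": "admin",
          |         },
          |         "expose": ["rest", "graphql"],
          |     }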
        
           | cess11 wrote:
           | I don't understand all of what you wrote but a lot of it is
           | very old news and usually done with deterministic tooling you
           | don't have to wait for, some of which you should have built
           | or configured yourself to get it tailored to the type of work
           | you do.
           | 
           | And some of it we've had under the RAD umbrella, basically
           | using configuration files and tools to generate those that
           | are used to generate large portions of systems.
        
           | __MatrixMan__ wrote:
           | What do you suppose that metadata is going to look like if
           | not partially complete code where the LLM fills in the gaps?
        
           | thunspa wrote:
           | In writing the code that is supposed to implement my idea, I
           | find that my idea has many flaws.
           | 
           | Sending that idea to an LLM (in absence of AGI) seems like a
           | great way to find out about the flaws too late.
           | 
           | Otherwise, specifying an application in such detail as to
           | obtain the same effect is essentially coding, just in natural
           | language, which is less precise.
        
           | zahlman wrote:
           | > I think coding will eventually go away in favor of models
           | with metadata built around them.
           | 
           | You can pry my understanding of, and desire to use,
           | traditional programming languages from my cold dead neurons.
           | The entire point of computer systems is that they
           | automatically and unerringly follow precise, explicit
           | instructions.
        
       | ninetyninenine wrote:
       | >LLM agents make bad pairs because they code faster than humans
       | think.
       | 
       | Easily solved. Use less compute. Use slower hardware. Or put in
       | the prompt to pause at certain intervals.
        
       | UltraSane wrote:
       | It is rather soul crushing how fast LLMs spit out decent code.
        
         | Bob_LaBLahh wrote:
         | In my experience, LLMs are idiot savant coders--but currently
         | more idiot than savant. Claude 3.7 (via cursor and roo) can
         | comment code well, create a starter project 10x faster than I
         | could, and they spit out common crud apps pretty well.
         | 
         | However I've come to the conclusion that LLMs are terrible at
         | decision making. I would much rather have an intern architect
         | my code than let AI do it. It's just too unreliable. It seems
         | like 3 out of 4 decisions that it makes are fine. But that 4th
         | decision is usually asinine.
         | 
         | That said, I now consider LLMs a mandatory addition to my
         | toolkit because they have improved my developer efficiency so
         | much. I really am a fan. But without a seasoned dev to write
         | detailed instructions, break down the project into manageable
         | chunks, make all of the key design decisions, and review every
         | line of code that it writes, today's AI will only add a
         | mountain of technical debt to your project.
         | 
         | I guess I'm trying to say: don't worry because the robots
         | cannot replace us yet. We're still in the middle of the hype
         | cycle.
         | 
         | But what do I know? I'm just an average meat coder.
        
           | UltraSane wrote:
           | LLMs currently can generate a few thousand lines of coherent
           | code but they cannot write a cohesive large scale code base.
           | 
           | But LLMs are very good at writing SQL and Cypher queries that
           | I would spend hours or days figuring out how to write.
        
             | Bob_LaBLahh wrote:
             | Agreed.
             | 
             | I find it interesting that LLMs seem pretty good at
             | spitting out SQL that works well enough. But on the other
             | hand LLMs seem pretty awful at working with CSS. I wonder
             | if this is due to a difference in the amount of training
             | data available for SQL vs CSS, or because CSS is a
             | finicky pain in the ass when compared to SQL.
        
               | UltraSane wrote:
               | There should be an insane amount of CSS on the web but CSS
               | output is primarily visual so I think that makes it hard
               | for a text only model to generate.
        
               | fragmede wrote:
               | interesting. I've been having a great time telling the
               | LLMs to generate CSS for me so I don't have to fight with
               | tailwind
        
       | Onewildgamer wrote:
       | Finally someone said it: they're overconfident in their approach,
       | don't consult us on the details of the implementation, and are
       | trained to create mock APIs that don't follow structure, leading
       | to a lot of rework. LLM actions should be measured and
       | collaborative, asking for details when they're not present. It is
       | impossible to give every single detail in the initial prompt, and
       | a follow-up prompt derails the train of thought and design of the
       | application.
       | 
       | I don't know if I'm using it right, I'd love to know more if
       | that's the case. In a way the LLM should improve on being
       | iterative, take feedback, maybe it's a hard problem to add/update
       | the context. I don't know about that either, but I'd love to
       | learn more.
        
         | NitpickLawyer wrote:
         | Most stacks now support some form of "plan" workflows. You'd
         | want to first do this, and see if it improves your experience.
         | 
         | One workflow that works well for me, even with small local
         | models, is to start a plan session with something like: "based
         | on @file, and @docs and @examples, I'd like to _ in @path with
         | the following requirements @module_requirements.md. Let's talk
         | through this and make sure we have all the info before starting
         | to code it."
         | 
         | Then go back and forth, make sure everything is mentioned, and
         | when satisfied either put it into a .md file (so you can retry
         | the coding flow later) or just say "ok do it", and go grab a
         | cup of coffee or something.
         | 
         | You can also make this into a workflow with .rules files or .md
         | files, have a snippets thing from your IDE drop this whenever
         | you start a new task, and so on. The idea with all the
         | advancements in LLMs is that they need lots of context if you
         | want them to be anything other than what they were trained on.
         | And you need to try different flows and see what works on your
         | specific codebase. Something that works for projectA might not
         | work for projectB ootb.
        
         | csomar wrote:
         | Also giving them more details seems to confuse them. There is
         | probably a way around this, though. They are pretty good at
         | finding a tiny sliver of information in the ocean. What I
         | hate is that the industry is all geared toward the same model
         | (chat bot). Imagine if we never invented the keyboard, mouse,
         | GUI, touch screen, etc...
        
           | searls wrote:
           | Yes, this is exactly why the "planning" approach never seems
           | to work for me. Like every ounce of planning I do with the
           | LLM it becomes a pound stupider at implementation time
        
       | tobyhinloopen wrote:
       | This guy needs a custom prompt. I keep a prompt doc around that
       | is constantly updated based on my preferences and corrections.
       | 
       | Not a few sentences but many many lines of examples and
       | documentation
        
         | searls wrote:
         | Gist an example of what you mean? My experience with very large
         | prompts and exacting custom instructions has been a drastically
         | eroded "intelligence".
        
           | tobyhinloopen wrote:
           | Many of my prompts include _somewhat_ sensitive details
           | because they're tailor-made for each project. This is a more
           | generic prompt I've been using for my code generation tool:
           | 
           | https://gist.github.com/tobyhinloopen/e567d551c9f30390b23a0a.
           | ..
           | 
           | More about this prompt:
           | 
           | https://bonaroo.nl/2025/05/20/enforced-ai-test-driven-
           | develo...
           | 
           | Lately, I've been letting the agent write the prompt by
           | ordering it to "update the prompt document with my expressed
           | preferences and code conventions", manually reviewing the
           | doc. Literally while writing this comment, I'm waiting for
           | the agent to do:
           | 
           | > note any findings about this project and my expressed
           | preferences and write them to a new prompt document in doc,
           | named 20250610-<summary>.md
           | 
           | I keep a folder of prompt documents because there's so many
           | of them (over 30 as of writing this comment, for a single
           | project). I have more generic ones and more specific ones,
           | and I usually either tell the agent to find relevant prompt
           | documents or tell the agent to read the relevant ones.
           | 
           | Usually over 100K tokens is spent on reading the prompts &
           | documentation before performing any task.
           | 
           | Here's a snippet of the prompt doc it just generated:
           | 
           | https://gist.github.com/tobyhinloopen/c059067037a6edb19065cd.
           | ..
           | 
           | I'm experimenting a lot with prompts, I have yet to learn
           | what works and what doesn't, but one thing is sure: A good
           | prompt makes a huge difference. It's the difference between
           | constantly babysitting and instructing the agent and telling
           | it to do something and waiting for it to complete.
           | 
           | I had many MRs merged with little to no post-prompt guidance.
           | Just fire and forget, commit, read the results, manually test
           | it, and submit as MR. While the code is usually somewhere
           | between "acceptable" and "obviously AI written", it usually
           | works just fine.
        
             | cadr wrote:
             | I guess it is tool-dependent, but do you pass in that
             | enormous prompt on each request?
        
               | tobyhinloopen wrote:
               | Yes, I inject multiple documents like that before every
               | session. The documents I inject are relevant to the
               | upcoming task.
               | 
               | The one I shared is a variant of the "Base" document, I
               | have specific documents per use case. If I know I'm
               | adding features (controller actions), I inject a prompt
               | containing documentation how to add routes, controllers,
               | controller actions, views, etc and how to format views,
               | what helpers are commonly used.
               | 
               | If I'm working on APIs, I have API specific prompts. If
               | I'm working on syncs with specific external services, I
               | have prompts containing the details about these services.
               | 
               | Basically I consider every session a conversation with a
               | new employee. I give them a single task and include all
               | the relevant documentation and guidelines and wish them
               | good luck.
               | 
               | Sometimes it takes a while, but I generally have a second
               | issue to work on, in parallel. So while one agent is
               | fixing one issue, I prepare the other agent to work on
               | the second. Very occasionally I have 3 sessions running
               | at the same time.
               | 
               | I barely write code anymore. I think I've not written a
               | single line of code in the last few work days. Almost
               | everything I submit is written by AI, and every time I
               | have to guide the LLM and I expect the mistake to be
               | repeated, I expand the relevant prompt document.
               | 
               | Last few days I also had the LLM update the prompt
               | documents for me since they're getting pretty big.
               | 
               | I do thoroughly review the code. The generated code is
               | different from how I would write it, sometimes worse but
               | sometimes better.
               | 
               | I also let it write tests, obviously, and I have a few
               | paragraphs to write happy flow tests and "bad flow"
               | tests.
               | 
               | I feel like I'm just scratching the surface of the
               | possibilities. I'm writing my own tools to further
               | automate the process, including being able to generate
               | code directly on production and have different versions
               | of modules running based on the current user, so I can
               | test new versions and deploy them instantly to a select
               | group of users. This is just a wild fantasy I have and
               | I'm sure I will find out why it's a terrible idea, but it
               | doesn't stop me from trying.
        
               | cadr wrote:
               | Thanks!
               | 
               | Sorry to belabor the question, when you say "before every
               | session", how many "things" do you do in a session? You
               | say you give them a single task, but do you end up
               | chatting back and forth with the agent in that session? I
               | guess I'm unsure how far back the "context" goes in a
               | conversation and if would drift from your directives if
               | the conversation went back and forth too much.
        
       | atemerev wrote:
       | Aider does everything right. Stop using Cursor or any other
       | agentic environments. Try Aider, it works exactly as suggested
       | here.
        
         | worldsayshi wrote:
         | I've been wanting to use Aider but I want to use Copilot as a
         | provider (to stay compliant with the wishes of my employer).
         | Haven't gone down that road yet because Aider copilot support
         | seems a bit tentative. I see they have some docs about it up
         | now though: https://aider.chat/docs/llms/github.html
         | 
         | "The easiest path is to sign in to Copilot from any JetBrains
         | IDE"
         | 
         | Somebody must've made a standalone login script by now right? I
         | wonder if `gh auth login` can be used to get a token?
        
         | darkstarsys wrote:
         | I prefer Claude Code (the `claude` cmd line version, with
         | Sonnet 4) because it's more like an actual pair-programming
         | session. It uses my claude acct rather than costing extra per
         | token. It also hooks into all my MCP tools (shell (restricted),
         | filesystem, ripgrep, test runners, etc. etc.) which makes it
         | pretty amazing.
         | 
         | After turning off its annoying auto-commit-for-everything
         | behavior, aider does work OK but it's harder to really get it
         | to understand what I want during planning. Its new
         | `--watch-files` thing is pretty darn cool though.
        
       | travisgriggs wrote:
       | > Allow users to pause the agent to ask a clarifying question or
       | push back on its direction without derailing the entire activity
       | or train of thought
       | 
       | I _think_ I've seen Zed/Claude do something like this. A couple
       | of times, I've hit return, then seen from the direction it starts
       | going that I missed a clarifying statement, typed it in fast, and
       | it corrected.
        
       | Pandabob wrote:
       | I basically jump away from Cursor to ChatGPT when I need to think
       | thoroughly on something like an architecture decision or an edge
       | case etc. Then when I've used ChatGPT to come up with an
       | implementation plan, I jump back to Cursor and have Claude do the
       | actual coding. O3 and ChatGPT's search functionality are just
       | better (at least for myself) currently for "type 2" thinking
       | tasks.
        
       | rhizome31 wrote:
       | As a developer who doesn't use AI for coding, except for the
       | occasional non-project specific question to a chat bot, I am
       | wondering if you use it for client projects or only for your own
       | projects. If you do use it for client projects, do you have some
       | kind of agreement that you're going to share their code with a
       | third-party? I'm asking because most clients will make you sign a
       | contract saying that you shouldn't disclose any information about
       | the project to a third-party. I even once had a client who
       | explicitly stated that AI should not be used. Do you find clients
       | willing to make an exception for AI coding agents?
        
         | internet_points wrote:
         | I don't share anything with openai/anthropic that I wouldn't
         | feel comfortable pasting into a web search prompt.
        
           | rhizome31 wrote:
           | So no AI autocomplete I suppose?
           | 
           | I assume AI autocomplete may send any part of your code base
           | or even all of it to a third-party.
        
         | cess11 wrote:
         | No, I don't. This goes for internal projects as well, we're not
         | going to share code unless paid to do so.
         | 
         | We commonly work with personal information so it would also
         | introduce rather harsh legal risks if usian corporations could
         | reach it.
        
         | Macha wrote:
         | I basically only use it in the workplace, and largely because
         | of one of those AI mandates.
         | 
         | I don't think it actually saves me enough time (or for many
         | tasks, any time) so I wouldn't pay for it for my own projects,
         | and also for my own projects, the enjoyability is a big factor,
         | and I enjoy doing more than prompting.
        
           | rhizome31 wrote:
           | Thank you for the reply. What do you mean by "AI mandates"?
           | Does it mean your company has an explicit policy allowing
           | sharing code with AI services?
        
             | Macha wrote:
             | Sadly, I mean my current employer is doing the whole
             | "tracking to see AI usage rates" and basically checking in
             | performance reviews if people are using as much AI as the
             | AI sales people told the CEO people need to use.
             | 
             | We're a SaaS company so we own all our code.
        
               | rhizome31 wrote:
               | Wow, really?! I had no idea that such policies existed.
               | Quite astonishing I have to say.
        
               | Macha wrote:
               | Klarna, Shopify and either Google or Meta made a lot of
               | press promoting policies like that and also the AI
               | companies themselves are selling this kind of approach in
               | the "how to make the best use of our tools" advice they
               | give to execs.
        
               | aerhardt wrote:
               | I just puked in my mouth a little. Sorry to hear you're
               | being subjected to that.
        
               | suzzer99 wrote:
               | That's hideous.
        
         | dancek wrote:
         | I have a client that actively asks me to use AI more and more.
         | They expect to get better quality code faster, ie. to reduce
         | costs. (That's not my experience but that's beside the point).
        
       | syllogism wrote:
       | LLM agents are very hard to talk about because they're not any
       | one thing. Your action-space in what you say and what approach
       | you take varies enormously and we have very little body of common
       | knowledge about what other people are doing and how they're doing
       | it. Then the agent changes underneath you or you tweak your
       | prompt and it's different again.
       | 
       | In my last few sessions I saw the efficacy of Claude Code plummet
       | on the problem I was working on. I have no idea whether it was
       | just the particular task, a modelling change, or changes I made
       | to the prompt. But suddenly it was glazing every message ("you're
       | absolutely right"), confidently telling me up is down (saying
       | things like "tests now pass" when they completely didn't), it
       | even cheerfully suggested "rm db.sqlite", which would have set me
       | back a fair bit if I said yes.
       | 
       | The fact that the LLM agent can churn out a lot of stuff quickly
       | greatly increases 'skill expression' though. The sharper your
       | insight about the task, the more you can direct it to do
       | something specific.
       | 
       | For instance, most debugging is basically a binary search across
       | the set of processes being conducted. However, the tricky thing
       | is that the optimal search procedure is going to be weighted by
       | the probability of the problem occurring at the different steps,
       | and the expense of conducting different probes.
       | 
       | A common trap when debugging is to take an overly greedy
       | approach. Due to the availability heuristic, our hypotheses about
       | the problem are often too specific. And the more specific the
       | hypothesis, the easier it is to think of a probe that would
       | eliminate it. If you keep doing this you're basically playing
       | Guess Who by asking "Is it Paul? Is it Anne?" etc, instead of "Is
       | the person a man? Does the person have facial hair? etc"
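       | 
       | As a toy sketch of that weighting idea (my own illustration, not
       | from the article): pick the probe that best splits the remaining
       | probability mass per unit cost.
       | 
       |     def pick_probe(steps, probes):
       |         """steps:  {step: prior prob. the bug is there}
       |         probes: {probe: (cost, steps it would implicate)}"""
       |         total = sum(steps.values()) or 1.0
       |         best, best_score = None, 0.0
       |         for name, (cost, implicated) in probes.items():
       |             p = sum(steps[s] for s in implicated) / total
       |             # Most informative when it splits the mass ~50/50;
       |             # dividing by cost makes cheap probes preferable.
       |             score = min(p, 1 - p) / cost
       |             if score > best_score:
       |                 best, best_score = name, score
       |         return best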
       | 
       | I find LLM agents extremely helpful at forming efficient probes
       | of parts of the stack I'm less fluent in. If I need to know
       | whether the service is able to contact the database, asking the
       | LLM agent to write out the necessary cloud commands is much
       | faster than getting that from the docs. It's also much faster at
       | writing specific tests than I would be. This means I can much
       | more neutrally think about how to bisect the space, which makes
       | debugging time more uniform, which in itself is a significant net
       | win.
       | 
       | I also find LLM agents to be good at the 'eat your vegetables'
       | stuff -- the things I know I should do but would economise on to
       | save time. Populate the tests with more cases, write more tests
       | in general, write more docs as I go, add more output to the
       | scripts, etc.
        
       | throwawayffffas wrote:
       | In my experience the problem is not they are too fast, they are
       | too slow.
       | 
       | Honestly, their speed is just the right amount to make them bad.
       | If they were faster, I could focus on following the code they are
       | writing. But they take so much time for every edit that I tune
       | out. On the other hand if they were slower, I could do other work
       | while they are working, but they are done every 50 seconds to a
       | few minutes which means I can't focus on other tasks.
       | 
       | If they did smaller faster changes it would probably be better.
       | 
       | Ideally though I would prefer them to be more autonomous, and the
       | collaboration mode to be more like going over merge requests than
       | pair programming. I ideally would like to have them take a task
       | and go away for a few hours or even like 30 minutes.
       | 
       | The current loop, provide a task, wait 1 to 3 minutes, see a
       | bunch of changes, provide guidance, repeat is the worst case
       | scenario in my view.
        
         | searls wrote:
         | Yeah, I could completely see this. Reminds me of the "Slow
         | Internet vs No Internet" oatmeal comic
        
         | dsmurrell wrote:
         | > that I tune out
         | 
         | You need a 30L fishtank for your desk. Great for tuning out.
        
       | shultays wrote:
       | A week or so ago I needed to convince chatgpt that the following
       | code will indeed initialize the x values in the struct:
       | 
       |     struct MyStruct
       |     {
       |       int x = 5;
       |     };
       |     ...
       |     MyStruct myStructs[100];
       | 
       | It was insisting very passionately that you need
       | MyStruct myStructs[100] = {}; instead.
       | 
       | I even showed msvc assembly output and pointed to the place where
       | it is looping & assigning all x values and then it started
       | hallucinating about msvc not conforming to the standards. Then I did
       | it for gcc and it said the same. It was surreal how strongly it
       | believed it was correct.
        
         | pegasus wrote:
         | LLMs don't have beliefs, so "convincing" them of this or that
         | is a waste of your time. The way to handle such cases is to
         | start anew with a clean context and just add your insight to
         | the prompt so that it lands on the right track from the
         | beginning. Remember these models are ultimately just next-token
         | predictors and anthropomorphizing them will invariably lead to
         | suboptimal interactions.
        
         | johnisgood wrote:
         | That is not even valid C code, so you would have to seriously
         | convince me, too.
         | 
         | What makes it invalid is "= 5", and lack of "struct" before
         | "MyStruct" (could have used typedef).
        
           | shultays wrote:
           | It is a c++ code.
        
             | johnisgood wrote:
             | I think the point flew by a few.
        
       | searls wrote:
       | Whenever I land on the front page, I check the comments and brace
       | for HN coming and telling me how stupid I am and lighting me
       | aflame in front of my peers.
       | 
       | But sometimes if I manage to nail the right headline, nobody
       | reads my post and just has their own discussion, and I am spared.
        
         | Upvoter33 wrote:
         | I liked your post, a bit of a "how to enjoy pair programming
         | with an AI". Useful, so thank you!
        
         | thoughtpalette wrote:
         | Hilarious but realistic take. I've noticed a similar trend with
         | other posts. I'm a fan of the discourse either way tbh.
        
       | Traster wrote:
       | I think this has put into words a reason why I bounced off using
       | AI this way, when I need something done I often have a rough idea
       | of _how_ I want it done, and _how_ AI does it often doesn't
       | match what I want, but because it's gone off and written 2,000
       | lines of code it's suddenly more work for me to go through and
       | say "Ok, so first off, strip all these comments out, you're
       | doubling the file with trivial explanations of simple code. I
       | don't want X to be abstracted this way, I want that...." etc. And
       | then when I give it feedback 2,000 lines of code suddenly switch
       | to 700 lines of completely different code and I can't keep up.
       | And I don't want my codebase full of disjoint scripts that I
       | don't really understand and all have weirdly different approaches
       | to the problem. I want an AI that I have similar opinions to,
       | which is obviously tough. It's like working with someone on their
       | first day.
       | 
       | I don't know if it's giving the tools less self-confidence per
       | se, but I think it's exposing more the design process. Like
       | ideally you want your designer to go "Ok, I'm thinking of this
       | approach, i'll probably have these sorts of functions or classes,
       | this state will be owned here" and we can approve that first,
       | rather than going straight from prompt -> implementation.
        
         | searls wrote:
         | Yeah, and then you just wind up feeling exhausted AND
         | unsatisfied with where you wind up. You are exactly who I
         | posted this for.
         | 
         | 100% of my positive experiences with agent coding are when I
         | don't have reason to care about the intrinsic qualities of the
         | code (one-off scripts or genuine leaf node functions that can't
         | impact anything else).
        
         | artursapek wrote:
         | You need to tune it.
        
         | theshrike79 wrote:
         | The trick is to have rules specific to the project and your
         | programming style & preferences.
         | 
         | Think of the AI like an outsourced consultant who will never
         | say no. It'll always do everything it can to solve the problem
         | you've given it. If it doesn't know how, it'll write a thousand
         | lines when a single additional dependency and 2 lines of code
         | would've done it.
        
         | Spooky23 wrote:
         | Personally, I've gone from working with the AI to write code to
         | working with it to develop specifications. It's also useful for
         | troubleshooting issues.
         | 
         | I'm no longer a developer by trade and it's more a hobby or
         | specific problem solving scenario now. But I find using it to
         | identify gaps in my thinking and edit English is ultimately
         | better than getting random code -- I love editing English text,
         | but find editing code without consistent style a drag for my
         | purposes.
        
         | ozim wrote:
         | At the start of the prompt, before the project requirements, I
         | copy-paste a paragraph about the code I want.
         | 
         | No emojis, no comments, no console log statements, no read me
         | file, no error handling. Act as a senior developer working with
         | other experienced developers.
         | 
         | Otherwise it happily generates a bunch of trash that is
         | unreadable. The generated error handling will most of the time
         | just hide errors instead of actually dealing with them.
        
           | ta12653421 wrote:
           | initial prompting like this has a huge impact, yes.
           | 
           | also: I clean the chat and start over sometimes, because
           | results may differ.
        
         | iamkoch wrote:
         | This absolutely captures my experience.
         | 
         | My successful AI written projects are those where I care solely
         | on the output and have little to no knowledge about the subject
         | matter.
         | 
         | When I try to walk an agent through creating anything about
         | which I have a deeply held opinion of what good looks like, I
         | end up frustrated and abandoning the project.
         | 
         | I've enjoyed using roo code's architect function to document an
         | agreed upon approach, then been delighted and frustrated in
         | equal measure by the implementation of code mode.
         | 
         | One revelation is to always start new tasks and avoid continuing
         | large conversations, because I would typically tackle any
         | problem myself in smaller steps with verifiable outputs,
         | whereas I tend to pose the entire problem space to the agent
         | which it invariably fails at.
         | 
         | I've settled on spending time finding what works for me.
         | Earlier today I took 30 minutes to add functionality to an app
         | that would've taken me days to write. And what's more I only
         | put 30 minutes into the diary for it, because I knew what I
         | wanted and didn't care how it got there.
         | 
         | This leads me to conclude that using AI to write code that
         | a(nother) human is one day to interact with is a no-go, for all
         | the reasons listed.
        
           | rlewkov wrote:
           | > "This leads me to conclude that using AI to write code that
           | a(nother) human is one day to interact with is a no-go, for
           | all the reasons listed." So, if one's goal is to develop code
           | that is easily maintainable by others, do you think that AI
           | writing code gets in the way of that goal?
        
         | bgro wrote:
         | Your ai is generating 2000 line code chunks? Are you prompting
         | it to create the entire Skyrim game for SNES? Then after taking
         | long lunch, getting mad when you press run and you find out it
         | made fallout with only melee weapons in a ps1 style?
        
         | cmrdporcupine wrote:
         | I don't have this experience with Claude, frankly. I do have to
         | correct it, but I try to give very specific prompts with very
         | specific instructions. It does well with _highly_ commented
         | code.
         | 
         | Now, I have the best luck in my personal project codebase,
         | which I know extremely intimately so can be very surgical with.
         | 
         | Work, which has far less comments and is full of very high
         | level abstractions that I don't know as well.. it struggles
         | with. We both do.
         | 
         | It's a fine pair programmer when one of the pairs knows the
         | codebase extremely well. It's a bad companion elsewhere.
        
         | jstummbillig wrote:
         | People are thinking too much of humans and LOCs as something
         | valuable or worth their consideration when working with AI
         | (because usually those LOCs would have required human effort).
         | This is simply not the case when doing AI coding, and you need
         | to adjust how you work because of that and play to the
         | strengths of this setup, if you want to get something out of it
         | and not frustrate yourself.
         | 
         | Here is how to do this: Have it generate something. That first
         | 2000 lines of not so great first attempt code, don't even think
         | about understanding all of that, or, worse, about correcting
         | it.
         | 
         | Review it loosely. You are not dealing with a human! There is
         | absolutely no need to be thorough or nice. You are not hurting
         | any feelings. Go for 80/20 (or the best ratio you think you can
         | get).
         | 
         | Then, think:
         | 
         | - Anything you forgot to inform the AI about? Update your
         | initial prompt
         | 
         | - Anything the AI simply does not do well or to your liking?
         | Write general instructions (all of the IDEs have some way of
         | doing that) that are very explicit about what you don't want to
         | see again, and what you want to see instead.
         | 
         | Then _revert everything the ai did_, and have it go again from
         | the start. You should approach something that's better.
        
           | Sharlin wrote:
           | This approach is essentially the PR workflow preferred by the
           | author. Why let an LLM make huge changes to your working copy
           | just for you to revert them next, instead of just writing
           | patches to be asynchronously reviewed? What you propose is no
           | way of doing _pair programming_ in particular, and seems to
           | support the author's argument.
        
             | jstummbillig wrote:
             | 1. There is not a mention of "pair programming" in the
             | comment I was addressing. As often happens, the discussion
             | evolves.
             | 
             | 2. The point is, that you are training the AI through this
             | process. You can do pair programming afterwards (or not).
             | Aim to instruct it to give you ballpark answers first, and
             | take it from there.
        
           | afavour wrote:
           | > LOCs as something valuable or worth their consideration
           | when working with AI (because usually that LOCs would have
           | required human effort)
           | 
           | At this point AI generated code absolutely requires review by
           | a human so LOC is still an important metric.
        
         | 34679 wrote:
         | Just like with human engineers, you need to start with a
         | planning session. This involves a back and forth discussion to
         | hammer out the details before writing any code. I start off as
         | vague as possible to see if the LLM recommends anything I
         | hadn't thought of, then get more detailed as I go. When I'm
         | satisfied, I have it create 2 documents, initialprompt.txt and
         | TODO.md. The initial prompt file includes a summary of the
         | project along with instructions to read the to do file and mark
         | each step as complete after finishing it.
         | 
         | This ensures the LLM has a complete understanding of the
         | overall goals, along with a step by step list of tasks to get
         | there. It also allows me to quickly get the LLM back up to
         | speed when I need to start a new conversation due to context
         | limits.
        
           | globnomulous wrote:
           | In essence, I need to schedule a meeting with the LLM and
           | 'hammer out a game plan.' Gotta make sure we're 'in sync' and
           | everybody's 'on the same page.'
           | 
           | Meeting-based programming. No wonder management loves it and
           | thinks it should be the future.
        
             | woah wrote:
             | LLMs are stealing the jobs of developers who go off half-
             | cocked and spend three days writing 2000 lines of code
             | implementing the wrong feature instead of attending a 30
             | minute meeting
        
               | devmor wrote:
               | _and_ the jobs of developers that want to schedule
               | another breakout session to discuss the pros and cons of
               | a 2-line change.
        
               | ge96 wrote:
               | Yeah... I'm gonna need to circle back on that
        
               | hnthrow90348765 wrote:
               | That's dumb, of course, but sometimes people really just
               | do the bare minimum to describe what they want and they
               | can only think clearly once there's something in front of
               | them. The 2000 lines there should be considered a POC,
               | even at 2000 lines.
        
             | dowager_dan99 wrote:
             | my manager has been experimenting with having AI first write
             | the specs as architecture decision records (ADRs), then
             | explain how it would implement them, then slowly actually
             | implement them, with lots of breaks, review and
             | approval/feedback. He says it's been far superior to typical
             | agent coding but not perfect.
        
             | ben_w wrote:
             | Meetings are how managers keep everyone else aligned with
             | their goals.
        
           | apwell23 wrote:
           | > This ensures the LLM has a complete understanding of the
           | overall goals
           | 
           | Forget about overall goal. I have this simple instruction
           | that I send on every request
           | 
           | "stop after every failing unit test and discuss
           | implementation with me before writing source code "
           | 
           | but it only does that about 7 times out of 10. Other times it
           | just proceeds with implementation anyways.
        
             | avandekleut wrote:
             | I've found similar behaviour with stopping at linting
             | errors. I wonder if my instructions are conflicting with
             | the agent system prompt.
        
               | kenfox wrote:
               | System prompts themselves have many contradictions. I
               | remember hearing an Anthropic engineer (possibly Lex
               | Fridman's interview with Amanda Askell) talking about
               | using exaggerated language like "NEVER" just to steer
               | Claude to rarely do something.
        
               | apwell23 wrote:
               | that doesn't work (at least not anymore)
        
             | jyounker wrote:
             | So it behaves just like a person.
        
         | slfnflctd wrote:
         | > "Ok, I'm thinking of this approach, i'll probably have these
         | sorts of functions or classes, this state will be owned here"
         | 
         | This is the gist of what I've always wanted from a programming
         | mentor, instructor, or tutor.
         | 
         | It can be surprisingly hard to find. Knowing that current LLMs
         | still struggle with it perhaps helps explain why.
        
         | tristor wrote:
         | > It's like working with someone on their first day.
         | 
         | This matches my experience exactly, but worse than working with
         | a human on their first day, day 100 for an AI is still like
         | working with them on their first day. Humans have effectively
         | infinite context windows over a long enough period of time; AIs'
         | context windows are so constrained that it's not worthwhile to
         | invest the effort to 'teach' it like you would a junior
         | engineer.
        
           | SirHumphrey wrote:
           | It's not really that humans have infinite context windows,
           | it's more that the context windows are a very poor
           | substitute for long term memory.
           | 
           | Memory, even in a text-heavy field like programming, is not
           | only text-based, so it's often hard to describe, for example,
           | an appropriate amount of error checking in prompt.md.
           | person with anterograde amnesia a book of everything they
           | know - no matter how well indexed or how searchable will not
           | fix the lack of long term memory.
        
         | bakkoting wrote:
         | Anthropic's guide to using Claude Code [1] is worth reading.
         | 
         | Specifically, their recommended workflow is "first ask it to
         | read the code, then ask it to make a plan to implement your
         | change, then tell it to execute". That sounds like the workflow
         | you're asking for - you can read its plan and make adjustments
         | before it writes a single line of code.
         | 
         | One of the weird things about using agents is that if they're
         | doing things in a way you don't like, including things like
         | writing code without first running the design by you, you can
         | simply ask them to do things a different way.
         | 
         | [1] https://www.anthropic.com/engineering/claude-code-best-
         | pract...
        
           | woah wrote:
           | > you can simply ask them to do things a different way
           | 
           | Instead of writing a blog post about how they didn't guess
           | how you wanted things done?
        
           | ta12653421 wrote:
           | good one!
           | 
           | I'm wondering how some can complain about ClaudeAI:
           | 
           | - it's actually enlightening
           | - it saves a lot of time
           | - by intuition, I did what's written in this blog from the
           |   beginning on
           | 
           | YES:
           | 
           | - sometimes the solution is rubbish because I can see that
           |   it's "randomly" connecting/integrating stuff
           | - ...but: in about 95% of the cases the output is exactly
           |   what I asked for
        
         | zild3d wrote:
         | > I want an AI that I have similar opinions to, which is
         | obviously tough. It's like working with someone on their first
         | day.
         | 
         | Most of what you're describing does apply to humans on the
         | first day, and ais on their first day. If you aren't capturing
         | these preferences somewhere and giving it to either a human or
         | the ai, then why would they somehow know your preferences? For
         | ai, the standard that's forming is you create some markdown
         | file(s) with these so they only need to be explained once, and
         | auto provided as context.
        
         | Horde wrote:
         | There is an old, related issue: why one should avoid using
         | equality relations in AI used for creating a proof. It
         | will go back and forth and fill the log with trivial statements
         | before it comes to the right proof path. This might end up
         | being the majority of the proof and could be an unbounded part.
         | Then somebody has to read that and spend a good deal of time
         | deciding what's trivial and what isn't. Whereas if you remove
         | equality, you have something that isn't very natural.
        
         | chermi wrote:
         | I prompt it to come up with 3 potential implementation plans,
         | choose which one it thinks is best, and explain its plan to me
         | before it does anything. I also ask it to enumerate which
         | files/functions it will modify in its chosen plan. Then you can
         | give feedback on what it thinks and have it come up with a new
         | plan if you don't like it. Every bit of context and constraints
         | you give it helps. Having a design doc + a little description
         | of your design/code "philosophy" helps. This is easy to do in
         | cursor with rules, I'm guessing other tools have a similar
         | feature. Also, if there's a somewhat similar feature
         | implemented already or if you have a particular style, tell it
         | to reference example files/code snippets.
        
         | nixpulvis wrote:
         | I honestly don't expect to use AI tools extensively for code
         | generation until we figure out how to have the models learn and
         | become accustomed to me aside from clever context prompting. I
         | want my own models derived from the baseline.
         | 
         | That said, I also value not becoming too dependent on any
         | service which isn't free and efficient. Relying on a CNC
         | machine when you never learned how to whittle strips me of a
         | sense of security and power I'm not comfortable with.
        
         | schwartzworld wrote:
         | > but because it's gone off and written a 2,000 lines of code
         | 
         | That's a you problem, not an AI problem. You have to give it
         | small tasks broken down the same way you would break them down.
        
       | motbus3 wrote:
       | I have mixed feelings about this situation. I have committed
       | myself to learning how to use it as effectively as possible and
       | to utilising it extensively for at least one month. Through my
       | company, I have access to multiple products, so I am trying them
       | all.
       | 
       | I can say that I am more productive in terms of the number of
       | lines written. However, I cannot claim to be more productive
       | overall.
       | 
       | For every task it completes, it often performs some inexplicable
       | actions that undo or disrupt other elements, sometimes unrelated
       | ones. The tests it generates initially appear impressive, but
       | upon examining other metrics, such as coverage, it becomes
       | evident that its performance is lacking. The amount of time
       | required to guide it to the correct outcome makes it feel as
       | though I am taking many steps backwards before making any
       | significant progress forward--and not in a beneficial way. On one
       | occasion, it added 50,000 unnecessary import lines into a module
       | that it should not have been altering.
       | 
       | On another occasion, one of the agents completely dismantled the
       | object-oriented programming hierarchy, opting instead to use
       | if/else statements throughout, despite the rules I had set.
       | 
       | The issue is that you can never be certain of its performance.
       | Sometimes, for the same task, it operates flawlessly, while at
       | other times, it either breaks everything or behaves
       | unpredictably.
       | 
       | I have tried various techniques to specify what needs to be done
       | and how to accomplish it, yet often, for similar tasks, its
       | behaviour varies so significantly between runs that I find myself
       | needing to review every change it makes each time. Frustratingly,
       | even if the code is nearly correct and you request an update to
       | just one part, it may still behave erratically.
       | 
       | My experience thus far suggests that it is quite effective for
       | small support tools, but when dealing with a medium-sized
       | codebase, one cannot expect it to function reliably every time.
        
       | azhenley wrote:
       | Writing out hundreds of lines of code is not what I meant by
       | proactive tools...
       | 
       | Where are the proactive coding tools?
       | https://austinhenley.com/blog/proactiveai.html
        
       | bsenftner wrote:
       | The collaborative style of AI use struck me as the obvious
       | correct use of AI, just as the more popular "AI writing code"
       | style struck me as horribly incorrect, and an indication of the
       | software industry yet again going off on a fool's tangent, as the
       | larger industry so often does.
       | 
       | I never have AI write code. I ask it to criticize code I've
       | written, and I use it to strategize about large code
       | organization. As a strategy consultant, with careful LLM context
       | construction, one can create amazingly effective guides that teach
       | one new information very successfully. That is me using my mind
       | to understand and then do, never giving any AI responsibilities
       | beyond advice. AI is an idiot savant, and must be treated as
       | such.
        
       | pradeepodela wrote:
       | The major problem I see with current LLM-based code generation is
       | their overconfidence beyond a certain point. I've experienced
       | agents losing track of what they are developing; a single line
       | change can literally mess up my entire codebase, making debugging
       | a nightmare.
       | 
       | I believe we need more structured, policy-driven models that
       | exhibit a bit of self-doubt, prompting them to revert to us for
       | clarification. Furthermore, there should be certain industry
       | standards in place. Another significant issue is testing and
       | handling edge cases. No matter what, today's AI consistently
       | fails when dealing with these scenarios, and security remains a
       | concern. What are some problems you have noticed?
        
       | SkyBelow wrote:
       | >Continue to practice pair-programming with your editor, but
       | throttle down from the semi-autonomous "Agent" mode to the turn-
       | based "Edit" or "Ask" modes.
       | 
       | This can be done while staying in agent mode. I never used edit
       | mode and only use ask mode when my question has nothing to do
       | with the project I have open. Any other time, I tell it to either
       | make no changes at all as I'm only asking a question to research
       | something, or to limit changes to a much smaller scope and style.
       | It doesn't work perfectly, but it works well enough that it is
       | worth the tradeoff given the extra capabilities agent mode seems
       | to provide (this likely depends upon the specific AI/LLM system
       | you are using, so given another tool I might not arrive at the
       | same conclusion).
        
       | jamil7 wrote:
       | I think one huge issue with pairing for non programmers or junior
       | programmers is that the LLM never pushes back on whatever you
       | throw at it. Like it can't deconstruct and examine what the actual
       | problem is and suggest a more robust or simpler alternative.
        
       | sltr wrote:
       | > to ask a clarifying question
       | 
       | I think one shortfall of LLMs is their reluctance to ask
       | clarifying questions. From my own post [1]:
       | 
       | > LLMs are poor communicators, so you have to make up the
       | difference. Unlike a talented direct report, LLMs don't yet seem
       | generally able to ask the question-behind-the-question or to
       | infer a larger context behind a prompt, or even ask for
       | clarification.
       | 
       | [1] https://www.slater.dev/dev-skills-for-the-llm-era/
        
       | cmrdporcupine wrote:
       | Article has some good points. They move fast, and they can easily
       | run off the rails if you don't watch carefully.
       | 
       | And I've found that it's just as mentally exhausting programming
       | alongside one as it is doing it yourself.
       | 
       | The chief advantage I've found of working alongside Claude is its
       | automation of tedious (to me) tasks.
        
       | woah wrote:
       | > give up on editor-based agentic pairing in favor of
       | asynchronous workflows like GitHub's new Coding Agent, whose work
       | you can also review via pull request
       | 
       | Why not just review the agent's work before making a git commit?
        
       | natch wrote:
       | tldr: author is a bad prompter.
       | 
       | Good prompting takes real and ongoing work, thought, foresight,
       | attention, and fastidious communication.
        
       | skeptrune wrote:
       | Title of the blog is negative, but the contents seem fairly
       | positive? If a few UX improvements are the only blocker to the
       | author finding LLMs to be useful pair programmers then we are in
       | a good spot.
        
       | epolanski wrote:
       | I'm conflicted, you can slow down and take all the time you need
       | to understand and ask to clarify further.
        
       | pjmlp wrote:
       | Even humans are bad pair programmers; I always try to steer away
       | from projects or companies that have drunk the whole XP kool
       | aid.
        
         | BeetleB wrote:
         | Indeed, this submission is also a good article on why pair
         | programming often fails _with humans_.
         | 
         | It's not that AIs are bad. It's that pair programming is
         | (often) not effective. In the majority of cases, one side
         | dominates the other.
         | 
         | In my experience, a modified pair programming system works
         | better where the two of us discuss the problem to be solved,
         | then each of us goes off for a few hours independently coming
         | up with ideas, and doing experiments. We then get together,
         | discuss our findings, and finalize a plan of attack. And _then_
         | pair programming helps as we 're both at the same level and on
         | the same page. But even then, having to watch someone else's
         | screen (or have people interrupt you while you're typing) is a
         | pain.
        
       | jyounker wrote:
       | I think the author's dislike for pair programming says a great
       | deal more about the author than it does about pair programming or
       | LLMs.
       | 
       | If you're pair programming and you're not driving, then it's your
       | job to ask the driver to slow down so you can understand what
       | they're doing. You may have to ask them to explain it to you. You
       | may have to explain it back to them. This back-and-forth is what
       | makes pairing work. If you don't do this, then of course you'll
       | get lost.
       | 
       | The author seems to take the same passive position with an LLM,
       | and the results are similar.
        
       ___________________________________________________________________
       (page generated 2025-06-10 23:01 UTC)