[HN Gopher] Embracing the parallel coding agent lifestyle
___________________________________________________________________
Embracing the parallel coding agent lifestyle
Author : jbredeche
Score : 68 points
Date : 2025-10-06 10:40 UTC (3 days ago)
(HTM) web link (simonwillison.net)
(TXT) w3m dump (simonwillison.net)
| dhorthy wrote:
| > If I tell them exactly how to build something the work needed
| to review the resulting changes is a whole lot less taxing.
|
 | Totally matches my experience - the act of planning the work,
 | defining what you want and what you don't, ordering the steps,
 | and declaring the verification workflows - whether I write it or
 | another engineer does - makes the review step so much easier
 | from a cognitive-load perspective.
| simonw wrote:
| Related: Jesse Vincent just published this
| https://blog.fsck.com/2025/10/05/how-im-using-coding-agents-... -
| it's a really good description of a much more advanced multi-
| agent workflow than what I've been doing.
| chrisweekly wrote:
| Thanks! Your post is great, and Jesse's is too. Bookmarked
| both.
| brandonb wrote:
| Both this and Jesse's articles are great. Thanks for posting!
| CBLT wrote:
 | Git worktrees are global mutable state; all containers on your
 | laptop are contending on the same git database. This has a couple
 | of rough edges, but you can work around them.
|
| I prefer instead to make shallow checkouts for my LXC containers,
| then my main repo can just pull from those. This works just like
| you expect, without weird worktree issues. The container here is
| actually providing a security boundary. With a worktree, you need
| to mount the main repo's .git directory; a malicious process
| could easily install a git hook to escape.
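 |
 | Roughly, the flow is something like this (a sketch with made-up
 | paths; note that git ignores --depth for plain local paths, so
 | the file:// form is needed):
 |
 |     # shallow clone into the container's filesystem
 |     git clone --depth 1 file:///home/me/repo \
 |         /var/lib/lxc/agent1/rootfs/work/repo
 |
 |     # ... the agent commits to a branch inside the container ...
 |
 |     # pull the result back into the main repo
 |     cd /home/me/repo
 |     git fetch /var/lib/lxc/agent1/rootfs/work/repo agent-branch
 |     git merge FETCH_HEAD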
| chrisweekly wrote:
| good point
| threecheese wrote:
| Cool. Operationally, are you using some host-resident non-
| shallow repo as your point of centralization for the
| containers, or are you using a central network-hosted repo
| (like github)?
|
| If the former, how are you getting the shallow clones to the
| container/mount, before you start the containerized agent? And
| when the agent is done, are you then adding its updated shallow
| clones as remotes to that "central" local repository clone and
| then fetching/merging?
|
| If the latter, I guess you are just shallow-cloning into each
| container from the network remote and then pushing completed
| branches back up that way.
| Areibman wrote:
| >AI-generated code needs to be reviewed, which means the natural
| bottleneck on all of this is how fast I can review the results
|
| I also fire off tons of parallel agents, and review is hands down
| the biggest bottleneck.
|
 | I built an OSS code review tool designed for reviewing parallel
 | PRs; it's way faster than looking at PRs on GitHub:
| https://github.com/areibman/bottleneck
| ridruejo wrote:
| This is a great article and not sure why it got so few upvotes.
| It captures the way we have been working for a while and why we
| developed and open sourced Rover
| (https://endor.dev/blog/introducing-rover), our internal tool for
| managing coding agents. It automates a lot of what Simon
| describes like setting up git worktrees, giving each agent its
| own containerized environment and allowing mixing and matching
| agents from different vendors (ie Claude and Codex) for different
| tasks
| ridruejo wrote:
| Previous submission here with some comments already:
| https://news.ycombinator.com/item?id=45481585
| dang wrote:
| Thanks! Looks like that post didn't get any frontpage time.
| Since the current thread is on the frontpage, maybe we'll merge
| those comments hither.
|
 | (Warning: this involves adjusting timestamps a la
 | https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...,
 | which is sometimes confusing...)
| ColinEberhardt wrote:
| Thanks Simon - you asked us to share patterns that work.
| Coincidentally I just finished writing up this post:
|
| https://blog.scottlogic.com/2025/10/06/delegating-grunt-work...
|
| Using AI Agents to implement UI automation tests - a task that I
| have always found time-consuming and generally frustrating!
| xnx wrote:
| We are at a weird moment where the latency of the response is
| slow enough that we're anthropomorphizing AI code assistants into
| employees. We don't talk about image generation this way. With
 | images, it's batching up a few jobs and reviewing the results
| later. We don't say "I spun up a bunch of AI artists."
| radarsat1 wrote:
| Are there any semi autonomous agentic systems for image
| generation? I feel like mostly it's still a one shot deal but
| maybe there's an idea there.
|
| I guess Adobe is working on it. Maybe Figma too.
| xnx wrote:
| That's part of my point. You don't need to conceptualize
| something as an "agent" that goes off and does work on its
| own when the latency is less than 2 seconds.
| laterium wrote:
| As a follow-up, how would this workflow feel if the LLM
 | generation were instantaneous or cost nothing? What would the
| new bottleneck be? Running the tests? Network speed? The human
| reviewer?
| simonw wrote:
| You can get a glimpse of that by trying one of the wildly
| performant LLM providers - most notably Cerebras and Groq, or
| the Gemini Diffusion preview.
|
| I have videos showing Cerebras:
| https://simonwillison.net/2024/Oct/31/cerebras-coder/ and
| Gemini Diffusion:
| https://simonwillison.net/2025/May/21/gemini-diffusion/
| grim_io wrote:
| I'm not convinced there is any hope for a productive, long-term,
| burnout-free parallel agent workflow.
|
| Not while they need even the slightest amount of
| supervision/review.
| unshavedyak wrote:
 | Yea, I find success with LLMs overall, but the quality of the
 | work is proportional to how much oversight there is.
| joshvm wrote:
| My suspicion is that it's because the feedback loop is so fast.
| Imagine if you were tasked with supervising 2 co-workers who
| gave you 50-100 line diffs to review every minute. The uncanny
| valley is that the code is rarely good enough to accept
| blindly, but the response is quick enough that it _feels_ like
 | progress. And perhaps a human impulse to respond to the agent?
 | And a 10-person team? In reality those 10 people would review
 | each other's PRs, and in a good organisation you trust each
 | other to gatekeep what gets checked in. The answer sounds like
 | managing agents, but none of the models are good enough to
 | reliably say what's slop and what's not.
| grim_io wrote:
 | I don't like to compare LLMs to people.
 |
 | There is a real return on investment in co-workers over time,
 | as they get better (most of the time).
|
| Now, I don't mind engaging in a bit of Sisyphean endeavor
| using an LLM, but remember that the gods were kind enough to
| give him just one boulder, not 10 juggling balls.
| joshvm wrote:
 | It's less about a direct comparison to people and more about
 | what a similar scenario would be in a normal development team
| (and why we don't put one person solely in charge of
| review).
|
| This is an advantage of async systems like Jules/Copilot,
| where you can send off a request and get on with something
 | else. I also wonder whether the response time of CLI agents
 | is short enough that you end up staring at the loading bar,
 | because context switching between replies is even more
 | expensive.
| ragnese wrote:
 | Yes. The first time I heard/read someone describe this idea
 | of managing parallel agents, my very first thought was that
 | it's only a thing because LLM coding tools are still slow
 | enough that you can't really achieve a good flow state with
 | the current state of the art. On the flip side, this kind of
 | workflow is only sustainable if the agents _stay_ somewhat
 | slow. If the agents were instead blocking on your attention,
 | it seems like it would feel very hectic, and I could see
 | myself getting burned out pretty quickly from spending my
 | whole work time doing a round-robin to iterate each agent
 | forward.
 |
 | I say that having not tried this workflow at all, so what do
 | I know? I mostly only use Claude Code to bounce questions off
 | of and to review my work, because I still haven't had much
 | luck getting it to actually write code that is complete and
 | how I like it.
| simonw wrote:
| The thing that's working really well for me is parallel
| research tasks.
|
| I can pay full attention to the change I'm making right now,
| while having a couple of coding agents churning in the
| background answering questions like:
|
| "How can I resolve all of the warnings in this test run?"
|
| Or
|
| "Which files do I need to change when working on issue #325?"
|
| I also really like the "Send out a scout" pattern described in
| https://sketch.dev/blog/seven-prompting-habits - send an agent
| to implement a complex feature with _no intention_ of actually
| using their code - but instead aiming to learn from which files
| and tests they updated, since that forms a useful early map for
| the actual work.
| aantix wrote:
| Along these lines, how does everyone visually organize the
| multiple terminal tabs open for these numerous agents in various
| states?
|
| I wish there were a way to search across all open tabs.
|
 | I've started color-coding my Claude Code tabs, all red, which
 | helps me find them visually. I do this with a preexec hook in
 | my ~/.zshrc.
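 |
 | The hook looks roughly like this (a minimal sketch; the printf
 | lines are iTerm2's proprietary tab-color escape sequences):
 |
 |     # in ~/.zshrc: tint the tab red whenever a claude command runs
 |     preexec() {
 |       if [[ $1 == claude* ]]; then
 |         printf '\033]6;1;bg;red;brightness;204\a'
 |         printf '\033]6;1;bg;green;brightness;0\a'
 |         printf '\033]6;1;bg;blue;brightness;0\a'
 |       fi
 |     }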
|
| But wondering if anyone else has any better tricks for organizing
| all of these agent tabs?
|
| I'm using iTerm2 on macOS.
| Multiplayer wrote:
 | This may not be your cup of tea, but after hating on it for
 | years I'm using Stage Manager on macOS for the first time, for
 | exactly this. I've got 3 monitors and they all have 4-6 Stage
 | Manager windows. I bundle the terminals, web browser and
 | whatever into these windows, easily switching from project to
 | project.
| sounds231 wrote:
 | Aerospace (a tiling window manager for macOS). Bit of a
 | learning curve, but it's amazing.
| simonw wrote:
| There are quite a few products designed to help manage multiple
| agents at once. I'm trying out Conductor right now -
| https://conductor.build/ - it's a pretty slick macOS desktop app
| that handles running Claude Code within a GUI and uses Git
| worktrees to manage separate checkouts at the same time.
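 |
 | (The underlying mechanism is plain git - something like this
 | gives each agent its own checkout sharing one object store:
 |
 |     git worktree add ../myrepo-task-a -b task-a
 |     git worktree add ../myrepo-task-b -b task-b
 |
 | and `git worktree remove` cleans up when a task is done.)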
| jbentley1 wrote:
| Check out Crystal, similar but open source
| https://github.com/stravu/crystal
| jbentley1 wrote:
| I do this every day! In fact, I built a new type of IDE around
| this (https://github.com/stravu/crystal) and I can never go back.
| dirck-norman wrote:
| I've seen minimal gains trying to adopt agents into my workflow
| beyond tests and explanations. It tends to be distracting.
|
| It's so interesting that engineers will criticize context
| switching, only to adopt it into their technical workflows
| because it's pitched as a technical solution rather than
| originating from business needs.
| torvald wrote:
| Has anyone encountered any good YouTube channels that explore and
| showcase these workflows in a productive and educational manner?
| pduggishetti wrote:
| So many marketing and spam comments on this post, it is insane
| cuttothechase wrote:
 | The fact that we now have to write cookbooks about cookbooks
 | kind of masks the reality that there is something that could be
 | genuinely wrong about this entire paradigm.
 |
 | Why are even experts unsure about what's the right way to do
 | something, or even whether it's possible to do something at all,
 | for anything non-trivial? Why so much hesitancy, if this is the
 | panacea? If we are so sure, then why not use the AI itself to
 | come up with a proven paradigm?
| MrDarcy wrote:
| This is like any other new technology. We're figuring it out.
| cuttothechase wrote:
 | Mostly agree, but with one big exception. The real issue seems
 | to be that the figuring-out part is happening a bit too late.
 | A bit like burning a few hundred billion dollars [0] first and
 | asking questions later!?
|
 | [0] - https://hai.stanford.edu/ai-index/2025-ai-index-report/econo...
| baq wrote:
| The bets are placed because if this tech really keeps
| scaling for the next few years, only the ones who bet today
| will be left standing.
|
| If the tech stops scaling, whatever we have today is still
| useful and in some domains revolutionary.
| cuttothechase wrote:
 | Is it fair to categorize it as a pyramid-like scheme, but
 | with a twist at the top where there are a few (more than
 | one) genuine wins and winners?
| jonas21 wrote:
| No, it's more like a winner take all market, where a few
| winners will capture most of the value, and those who sit
| on the sidelines until everything is figured out are left
| fighting over the scraps.
| hx8 wrote:
 | I share the same skepticism, but I have more patience for
 | watching an emerging technology advance, and more forgiveness
 | as experts come to a consensus while communicating openly.
| nkmnz wrote:
| Radioactivity was discovered before nuclear engineering
| existed. We had phenomena first and only later the math,
| tooling, and guardrails. LLMs are in that phase. They are
| powerful stochastic compressors with weak theory. No stable
| abstractions yet. Objectives shift, data drifts, evals leak,
| and context windows make behavior path dependent. That is why
| experts hedge.
|
| "Cookbooks about cookbooks" are what a field does while it
| searches for invariants. Until we get reliable primitives and
| specs, we trade in patterns and anti-patterns. Asking the AI to
| "prove the paradigm" assumes it can generate guarantees it does
| not possess. It can explore the design space and surface
| candidates. It cannot grant correctness without an external
| oracle.
|
| So treat vibe-engineering like heuristic optimization. Tight
| loops. Narrow scopes. Strong evals. Log everything. When we
| find the invariants, the cookbooks shrink and the compilers
| arrive.
| johnh-hn wrote:
| It reminds me of a quote from Designing Data-Intensive
| Applications by Martin Kleppmann. It goes something like, "For
| distributed systems, we're trying to create a reliable system
| out of a set of unreliable components." In a similar fashion,
| we're trying to get reliable results from an unreliable process
| (i.e. prompting LLMs to do what we ask).
|
| The difficulties of working with distributed systems are well
| known but it took a lot of research to get there. The uncertain
| part is whether research will help overcome the issues of using
| LLMs, or whether we're really just gambling (in the literal
| sense) at scale.
| torginus wrote:
| LLMs are literal gambling - you get them to work right once and
| they are magical - then you end up chasing that high by
| tweaking the model and instructions the rest of the time.
| handfuloflight wrote:
 | I actually found that in my case it's just inertia in not
 | wanting to break through cognitive plateaus. The AI helped
 | you with a breakthrough, hence the magic, but you also did
 | something right in constructing the context of the
 | conversation with the AI; i.e. you did thought and
 | biomechanical[1] work. Now the dazzle of the AI's output
 | makes you forget the work you still need to do, and the next
 | time you prompt you get lazy, or you want much more for much
 | less.
 |
 | [1] (moving your eyes and hands, hearing with your ears, etc.)
| afarah1 wrote:
 | Any setups without Claude Code? I use the Copilot agent heavily
 | in VS Code, and from time to time I have independent grunt work
 | that could be parallelized across two or three agents, but I
 | haven't seen a decent setup for that with Copilot or some other
 | VS Code extension that I could use my Copilot subscription with.
| wilsonnb3 wrote:
 | GitHub Copilot has a CLI now; I think it is in beta.
 |
 | It also supports background agents that you can kick off on the
 | GitHub website; they run on VMs.
| SeanAnderson wrote:
| https://raw.githubusercontent.com/obra/dotfiles/6e088092406c...
| contains the following entry:
|
| "- If you're uncomfortable pushing back out loud, just say
| "Strange things are afoot at the Circle K". I'll know what you
| mean"
|
 | Most of the rules seem rational. This one really stands out as
| abnormal. Anyone have any idea why the engineer would have felt
| compelled to add this rule?
|
 | This is from
 | https://blog.fsck.com/2025/10/05/how-im-using-coding-agents-...,
 | mentioned in another comment.
| lcnPylGDnU4H9OF wrote:
| Naively, I assume it's a way of getting around sycophancy.
 | There are many lines that seem to be doing that without
| explicitly saying "don't be a sycophant" (I mean, you can only
| do that so much).
|
 | The LLM would be uncomfortable pushing back because that's not
 | being a sycophant, so instead it says something that is...
 | let's say unlikely to be generated except in that context, so
 | the user can still be cautioned against a bad idea.
| SeanAnderson wrote:
| Is it your impression that this rules statement would be
| effective? Or is it more just a tell-tale sign of an
| exasperated developer?
| lcnPylGDnU4H9OF wrote:
| Assuming that's why it was added, I wouldn't be confident
| saying how likely it is to be effective. Especially with
| there being so many other statements with seemingly the
| same intent, I think it suggests desperation more, but it
| may still be effective. If it said the phrase just once and
| that sparked a conversation around an actual problem, then
| it was probably worth adding.
|
| For what it's worth, I am very new to prompting LLMs but,
| in my experience, these concepts of "uncomfortable" and
| "pushing back" seem to be things LLMs generate text about
| so I think they understand sentiment fairly well. They can
| generally tell that they are "uncomfortable" about their
| desire to "push back" so it's not implausible that one
| would output that sentence in that scenario.
|
| Actually, I've been wondering a bit about the "out loud"
| part, which I think is referring to <think></think> text
| (or similar) that "reasoning" models generate to help
| increase the likelihood of accurate generation in the
| answer that follows. That wouldn't be "out loud" and it
| might include text like "I should push back but I should
| also be a total pushover" or whatever. It could be that
| reasoning models in particular run into this issue (in
| their experience).
| OtherShrezzing wrote:
| To get around the sycophantic behaviour I prompt the model to
|
| > when discussing implementations, always talk as though
| you're my manager at a Wall Street investment bank in the
| 1980s. Praise me modestly when I've done something well.
| Berate me mercilessly when I've done something poorly.
|
| The models will fairly rigidly write from the perspective of
 | any personality archetype you tell them to. Other personas
| worth trying out include Jafar interacting with Iago, or the
| drill sergeant from Full Metal Jacket.
|
| It's important to pick a persona you'll find funny, rather
| than insulting, because it's a miserable experience being
| told by a half dozen graphics cards that you're an imbecile.
| simonw wrote:
| I tried "give me feedback on this blog post like you're a
| cynical Hacker News commenter" one time and Claude roasted
| me so hard I decided never to try that again!
| simonw wrote:
| That doesn't surprise me too much coming from Jesse. See also
| his attempt to give Claude a "feelings journal"
| https://blog.fsck.com/2025/05/28/dear-diary-the-user-asked-m...
| threecheese wrote:
| If you _really_ want your mind blown, see what Jesse is doing
| (successfully, which I almost can't believe) with Graphviz .dot
| notation and Claude.md:
|
| https://blog.fsck.com/2025/09/29/using-graphviz-for-claudemd...
| blibble wrote:
| this is just 21st century voodoo
| aymenfurter wrote:
| Async agents are great. They let you trigger work with almost no
| effort, and if the results miss the mark you can just discard
 | them. They also run in the background, making it feel like
 | delegating tasks to teammates who quietly get things done in
 | parallel.
| WhyOhWhyQ wrote:
 | I can't seem to get myself to focus when one of these things is
 | running. I transition into low-effort mode. Because of this I've
 | decided to keep my good hours of the day LLM-free, and then in
 | my crappy hours I'll have one of these running.
| munk-a wrote:
 | I'm very happy to see the article covering the high labor costs
 | of reviewing code. This may just be my neurodivergent self, but
 | I find code in the specific style I write much easier to verify
 | quickly. There are habits and customs (very functional-leaning)
 | I follow when approaching specific tasks, so when I see a
 | function in my usual style I can handwave it with "let me just
 | double-check later that I wrote that in the normal manner" and
 | keep reviewing the top-level piece of logic, rather than diving
 | into sub-calls to check for errant side effects or other
 | sneakiness I need to be on the lookout for in peer reviews.
 |
 | When working with peers I'll pick up on those habits and others
 | and slowly gain a similar level of trust. With agents, though,
 | the styles and approaches have been quite unpredictable and
 | varied. That's probably fair, given that different units of
 | logic may be easier to express in different forms, but it
 | breaks my review habits: normally I keep the developer in mind,
 | watch for the specific faulty patterns I know they tend to fall
 | into, and build up trust around their strengths. When reviewing
 | agent-generated code I can trust nothing and have to verify
 | every assumption, and that introduces a massive overhead.
 |
 | My case may sound a bit extreme, but I've observed similar
 | habits in others when it comes to reviewing a new coworker's
 | code. The first few reviews of a new colleague should always be
 | done with the utmost care, to ensure proper usage of internal
 | tooling and adherence to style, and as a fallback in case the
 | interview was misleading. Over time you build up trust and can
 | focus more on the known complications of the particular task,
 | or the areas of logic they tend to struggle with, while
 | trusting their common code more. With agent-generated code,
 | every review feels like interacting with a brand-new coworker,
 | and I need to be vigilant about the sneaky stuff.
| extr wrote:
 | IMO, I was an early adopter of this pattern, and at this point
 | I've mostly given it up (except in cases where the task is
 | embarrassingly parallel, e.g. adding some bog-standard logging
 | to 6 different folders). It's more than just that reviewing is
 | high cognitive overhead: you become biased by seeing the AI's
 | solutions, and it becomes harder to catch fundamental problems
 | you would have noticed immediately inline.
|
| My process now is:
|
| - Verbally dictate what I'm trying to accomplish with MacWhisper
| + Parakeet v3 + GPT-5-Mini for cleanup. This is usually 40-50
| lines of text.
|
| - Instruct the agent to explore for a bit and come up with a very
| concise plan matching my goal. This does NOT mean create a spec
| for the work. Simply come up with an approach we can describe in
| < 2 paragraphs. I will propose alternatives and make it defend
| the approach.
|
| - Authorize the agent to start coding. I turn all edit
| permissions off and manually approve each change. Often, I find
| myself correcting it with feedback like "Hmmm, we already have a
| structure for that [over here] why don't we use that?". Or "If
| this fails we have bigger problems, no need for exception
| handling here."
|
 | - At the end, I have it review the PR with a slash command
 | (sketch below) to catch basic errors I might have missed or
 | that only pop up now that it's "complete".
|
| - I instruct it to commit + create a PR using the same tone of
| voice I used for giving feedback.
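 |
 | For reference, a custom slash command in Claude Code is just a
 | Markdown prompt file under .claude/commands/. A hypothetical
 | review command (name and contents illustrative, not my actual
 | one) might look like:
 |
 |     # .claude/commands/review-pr.md
 |     Review the diff between this branch and main. Look for
 |     basic errors: typos, dead code, unused imports, missing
 |     error handling at I/O boundaries, and anything inconsistent
 |     with the surrounding style. List the issues; don't fix
 |     anything yet.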
|
| I've found I get MUCH better work product out of this - with the
| benefit that I'm truly "done". I saw all the lines of code as
| they were written, I know what went into it. I can (mostly)
| defend decisions. Also - while I have extensive rules set up in
| my CLAUDE/AGENTS folders, I don't need to rely on them.
 | Correcting via dictation is quick and easy, and you only need
 | to explicitly mention something once for it to avoid those
 | traps for the rest of the session.
|
| I also make heavy use of conversation rollback. If I need to go
| off on a little exploration/research, I rollback to before that
| point to continue the "main thread".
|
 | I find that Claude is really the best at this workflow. Codex is
 | great, don't get me wrong, but probably 85% of my coding tasks
 | don't involve tricky logic or long-range dependencies. It's
 | more important for the model to quickly grok my intent and act
 | fast/course-correct based on my feedback. I absolutely still use
 | Codex/GPT-5-Pro - I will have Sonnet 4.5 dump a description of
 | the issue, paste it to Codex, have it work and get an answer,
 | and then roll back Sonnet 4.5 to simply give it the answer
 | directly as if from nowhere.
___________________________________________________________________
(page generated 2025-10-09 23:00 UTC)