[HN Gopher] Embracing the parallel coding agent lifestyle
___________________________________________________________________
Embracing the parallel coding agent lifestyle
Author : jbredeche
Score : 68 points
Date : 2025-10-06 10:40 UTC (3 days ago)
(HTM) web link (simonwillison.net)
(TXT) w3m dump (simonwillison.net)
| dhorthy wrote:
| > If I tell them exactly how to build something the work needed
| to review the resulting changes is a whole lot less taxing.
|
 | Totally matches my experience - the act of planning the work,
 | defining what you want and what you don't, ordering the steps,
 | and declaring the verification workflows - whether I write it or
 | another engineer does - makes the review step so much easier
 | from a cognitive-load perspective.
| simonw wrote:
| Related: Jesse Vincent just published this
| https://blog.fsck.com/2025/10/05/how-im-using-coding-agents-... -
| it's a really good description of a much more advanced multi-
| agent workflow than what I've been doing.
| chrisweekly wrote:
| Thanks! Your post is great, and Jesse's is too. Bookmarked
| both.
| brandonb wrote:
| Both this and Jesse's articles are great. Thanks for posting!
| CBLT wrote:
 | Git worktrees are global mutable state; all containers on your
 | laptop are contending on the same git database. This has a couple
 | of rough edges, but you can work around them.
|
| I prefer instead to make shallow checkouts for my LXC containers,
| then my main repo can just pull from those. This works just like
| you expect, without weird worktree issues. The container here is
| actually providing a security boundary. With a worktree, you need
| to mount the main repo's .git directory; a malicious process
| could easily install a git hook to escape.
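 |
 | Roughly, the flow is something like this (a sketch with made-up
 | paths; note that git ignores --depth for plain local paths, so
 | the file:// form is needed):
 |
 |     # shallow clone into the container's filesystem
 |     git clone --depth 1 file:///home/me/repo \
 |         /var/lib/lxc/agent1/rootfs/work/repo
 |
 |     # ... the agent commits to a branch inside the container ...
 |
 |     # pull the result back into the main repo
 |     cd /home/me/repo
 |     git fetch /var/lib/lxc/agent1/rootfs/work/repo agent-branch
 |     git merge FETCH_HEAD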
| chrisweekly wrote:
| good point
| threecheese wrote:
| Cool. Operationally, are you using some host-resident non-
| shallow repo as your point of centralization for the
| containers, or are you using a central network-hosted repo
| (like github)?
|
| If the former, how are you getting the shallow clones to the
| container/mount, before you start the containerized agent? And
| when the agent is done, are you then adding its updated shallow
| clones as remotes to that "central" local repository clone and
| then fetching/merging?
|
| If the latter, I guess you are just shallow-cloning into each
| container from the network remote and then pushing completed
| branches back up that way.
| Areibman wrote:
| >AI-generated code needs to be reviewed, which means the natural
| bottleneck on all of this is how fast I can review the results
|
| I also fire off tons of parallel agents, and review is hands down
| the biggest bottleneck.
|
 | I built an OSS code review tool designed for reviewing parallel
 | PRs; it's way faster than looking at PRs on GitHub:
| https://github.com/areibman/bottleneck
| ridruejo wrote:
| This is a great article and not sure why it got so few upvotes.
| It captures the way we have been working for a while and why we
| developed and open sourced Rover
| (https://endor.dev/blog/introducing-rover), our internal tool for
| managing coding agents. It automates a lot of what Simon
| describes like setting up git worktrees, giving each agent its
| own containerized environment and allowing mixing and matching
| agents from different vendors (ie Claude and Codex) for different
| tasks
| ridruejo wrote:
| Previous submission here with some comments already:
| https://news.ycombinator.com/item?id=45481585
| dang wrote:
| Thanks! Looks like that post didn't get any frontpage time.
| Since the current thread is on the frontpage, maybe we'll merge
| those comments hither.
|
 | (Warning: this involves adjusting timestamps a la
 | https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...,
 | which is sometimes confusing...)
| ColinEberhardt wrote:
| Thanks Simon - you asked us to share patterns that work.
| Coincidentally I just finished writing up this post:
|
| https://blog.scottlogic.com/2025/10/06/delegating-grunt-work...
|
| Using AI Agents to implement UI automation tests - a task that I
| have always found time-consuming and generally frustrating!
| xnx wrote:
| We are at a weird moment where the latency of the response is
| slow enough that we're anthropomorphizing AI code assistants into
| employees. We don't talk about image generation this way. With
 | images, it's batching up a few jobs and reviewing the results
| later. We don't say "I spun up a bunch of AI artists."
| radarsat1 wrote:
| Are there any semi autonomous agentic systems for image
| generation? I feel like mostly it's still a one shot deal but
| maybe there's an idea there.
|
| I guess Adobe is working on it. Maybe Figma too.
| xnx wrote:
| That's part of my point. You don't need to conceptualize
| something as an "agent" that goes off and does work on its
| own when the latency is less than 2 seconds.
| laterium wrote:
| As a follow-up, how would this workflow feel if the LLM
 | generation were instantaneous or cost nothing? What would the
| new bottleneck be? Running the tests? Network speed? The human
| reviewer?
| simonw wrote:
| You can get a glimpse of that by trying one of the wildly
| performant LLM providers - most notably Cerebras and Groq, or
| the Gemini Diffusion preview.
|
| I have videos showing Cerebras:
| https://simonwillison.net/2024/Oct/31/cerebras-coder/ and
| Gemini Diffusion:
| https://simonwillison.net/2025/May/21/gemini-diffusion/
| grim_io wrote:
| I'm not convinced there is any hope for a productive, long-term,
| burnout-free parallel agent workflow.
|
| Not while they need even the slightest amount of
| supervision/review.
| unshavedyak wrote:
 | Yea, I find success with LLMs overall, but the quality of the
 | work is proportional to how much oversight there is.
| joshvm wrote:
| My suspicion is that it's because the feedback loop is so fast.
| Imagine if you were tasked with supervising 2 co-workers who
| gave you 50-100 line diffs to review every minute. The uncanny
| valley is that the code is rarely good enough to accept
| blindly, but the response is quick enough that it _feels_ like
 | progress. And perhaps a human impulse to respond to the agent?
 | And a 10-person team? In reality those 10 people would review
 | each other's PRs, and in a good organisation you trust each
 | other to gatekeep what gets checked in. The answer sounds like
 | managing agents, but none of the models are good enough to
 | reliably say what's slop and what's not.
| grim_io wrote:
 | I don't like to compare LLMs to people.
 |
 | There is a real return on investment in co-workers over time,
 | as they get better (most of the time).
|
| Now, I don't mind engaging in a bit of Sisyphean endeavor
| using an LLM, but remember that the gods were kind enough to
| give him just one boulder, not 10 juggling balls.
| joshvm wrote:
 | It's less about a direct comparison to people and more about
 | what a similar scenario would be in a normal development team
| (and why we don't put one person solely in charge of
| review).
|
| This is an advantage of async systems like Jules/Copilot,
| where you can send off a request and get on with something
 | else. I also wonder whether the response time of CLI agents
 | is short enough that you end up staring at the loading bar,
 | because context switching between replies is even more
 | expensive.
| ragnese wrote:
 | Yes. The first time I heard/read someone describe this idea
 | of managing parallel agents, my very first thought was that
 | it's only a thing because LLM coding tools are still slow
 | enough that you can't really achieve a good flow state with
 | the current state of the art. On the flip side, this kind of
 | workflow is only sustainable if the agents _stay_ somewhat
 | slow. If the agents were instead blocking on your attention,
 | it seems like it would feel very hectic, and I could see
 | myself getting burned out pretty quickly from spending my
 | whole work time doing a round-robin to iterate each agent
 | forward.
 |
 | I say that having not tried this workflow at all, so what do
 | I know? I mostly only use Claude Code to bounce questions off
 | of and to review my work, because I still haven't had much
 | luck getting it to actually write code that is complete and
 | how I like it.
| simonw wrote:
| The thing that's working really well for me is parallel
| research tasks.
|
| I can pay full attention to the change I'm making right now,
| while having a couple of coding agents churning in the
| background answering questions like:
|
| "How can I resolve all of the warnings in this test run?"
|
| Or
|
| "Which files do I need to change when working on issue #325?"
|
| I also really like the "Send out a scout" pattern described in
| https://sketch.dev/blog/seven-prompting-habits - send an agent
| to implement a complex feature with _no intention_ of actually
| using their code - but instead aiming to learn from which files
| and tests they updated, since that forms a useful early map for
| the actual work.
| aantix wrote:
| Along these lines, how does everyone visually organize the
| multiple terminal tabs open for these numerous agents in various
| states?
|
| I wish there were a way to search across all open tabs.
|
 | I've started color-coding my Claude Code tabs, all red, which
 | helps me find them visually. I do this with a preexec hook in
 | my ~/.zshrc.
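 |
 | The hook looks roughly like this (a minimal sketch; the printf
 | lines are iTerm2's proprietary tab-color escape sequences):
 |
 |     # in ~/.zshrc: tint the tab red whenever a claude command runs
 |     preexec() {
 |       if [[ $1 == claude* ]]; then
 |         printf '\033]6;1;bg;red;brightness;204\a'
 |         printf '\033]6;1;bg;green;brightness;0\a'
 |         printf '\033]6;1;bg;blue;brightness;0\a'
 |       fi
 |     }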
|
| But wondering if anyone else has any better tricks for organizing
| all of these agent tabs?
|
| I'm using iTerm2 on macOS.
| Multiplayer wrote:
 | This may not be your cup of tea, but after hating on it for
 | years I'm using Stage Manager on macOS for the first time, for
 | exactly this. I've got 3 monitors and they all have 4-6 Stage
 | Manager windows. I bundle the terminals, web browser and
 | whatever into these windows, easily switching from project to
 | project.
| sounds231 wrote:
 | Aerospace (a tiling window manager for macOS). Bit of a
 | learning curve, but it's amazing.
| simonw wrote:
| There are quite a few products designed to help manage multiple
| agents at once. I'm trying out Conductor right now -
| https://conductor.build/ - it's a pretty slick macOS desktop app
| that handles running Claude Code within a GUI and uses Git
| worktrees to manage separate checkouts at the same time.
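 |
 | (The underlying mechanism is plain git - something like this
 | gives each agent its own checkout sharing one object store:
 |
 |     git worktree add ../myrepo-task-a -b task-a
 |     git worktree add ../myrepo-task-b -b task-b
 |
 | and `git worktree remove` cleans up when a task is done.)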
| jbentley1 wrote:
| Check out Crystal, similar but open source
| https://github.com/stravu/crystal
| jbentley1 wrote:
| I do this every day! In fact, I built a new type of IDE around
| this (https://github.com/stravu/crystal) and I can never go back.
| dirck-norman wrote:
| I've seen minimal gains trying to adopt agents into my workflow
| beyond tests and explanations. It tends to be distracting.
|
| It's so interesting that engineers will criticize context
| switching, only to adopt it into their technical workflows
| because it's pitched as a technical solution rather than
| originating from business needs.
| torvald wrote:
| Has anyone encountered any good YouTube channels that explore and
| showcase these workflows in a productive and educational manner?
| pduggishetti wrote:
| So many marketing and spam comments on this post, it is insane
| cuttothechase wrote:
 | The fact that we now have to write cookbooks about cookbooks
 | kind of masks the reality that there is something that could be
 | genuinely wrong about this entire paradigm.
 |
 | Why are even experts unsure about what's the right way to do
 | something, or even whether it's possible to do something at all,
 | for anything non-trivial? Why so much hesitancy, if this is the
 | panacea? If we are so sure, then why not use the AI itself to
 | come up with a proven paradigm?
| MrDarcy wrote:
| This is like any other new technology. We're figuring it out.
| cuttothechase wrote:
 | Mostly agree, but with one big exception. The real issue seems
 | to be that the figuring-out part is happening a bit too late.
 | A bit like burning a few hundred billion dollars [0] first and
 | asking questions later!?
|
 | [0] - https://hai.stanford.edu/ai-index/2025-ai-index-report/econo...
| baq wrote:
| The bets are placed because if this tech really keeps
| scaling for the next few years, only the ones who bet today
| will be left standing.
|
| If the tech stops scaling, whatever we have today is still
| useful and in some domains revolutionary.
| cuttothechase wrote:
 | Is it fair to categorize it as a pyramid-like scheme, but
 | with a twist at the top where there are a few (more than
 | one) genuine wins and winners?
| jonas21 wrote:
| No, it's more like a winner take all market, where a few
| winners will capture most of the value, and those who sit
| on the sidelines until everything is figured out are left
| fighting over the scraps.
| hx8 wrote:
 | I share the same skepticism, but I have more patience for
 | watching an emerging technology advance, and more forgiveness
 | as experts come to a consensus while communicating openly.
| nkmnz wrote:
| Radioactivity was discovered before nuclear engineering
| existed. We had phenomena first and only later the math,
| tooling, and guardrails. LLMs are in that phase. They are
| powerful stochastic compressors with weak theory. No stable
| abstractions yet. Objectives shift, data drifts, evals leak,
| and context windows make behavior path dependent. That is why
| experts hedge.
|
| "Cookbooks about cookbooks" are what a field does while it
| searches for invariants. Until we get reliable primitives and
| specs, we trade in patterns and anti-patterns. Asking the AI to
| "prove the paradigm" assumes it can generate guarantees it does
| not possess. It can explore the design space and surface
| candidates. It cannot grant correctness without an external
| oracle.
|
| So treat vibe-engineering like heuristic optimization. Tight
| loops. Narrow scopes. Strong evals. Log everything. When we
| find the invariants, the cookbooks shrink and the compilers
| arrive.
| johnh-hn wrote:
| It reminds me of a quote from Designing Data-Intensive
| Applications by Martin Kleppmann. It goes something like, "For
| distributed systems, we're trying to create a reliable system
| out of a set of unreliable components." In a similar fashion,
| we're trying to get reliable results from an unreliable process
| (i.e. prompting LLMs to do what we ask).
|
| The difficulties of working with distributed systems are well
| known but it took a lot of research to get there. The uncertain
| part is whether research will help overcome the issues of using
| LLMs, or whether we're really just gambling (in the literal
| sense) at scale.
| torginus wrote:
| LLMs are literal gambling - you get them to work right once and
| they are magical - then you end up chasing that high by
| tweaking the model and instructions the rest of the time.
| handfuloflight wrote:
 | I actually found that in my case it's just inertia in not
 | wanting to break through cognitive plateaus. The AI helped
 | you with a breakthrough, hence the magic, but you also did
 | something right in constructing the context of the
 | conversation with the AI; i.e. you did thought and
 | biomechanical[1] work. Now the dazzle of the AI's output
 | makes you forget the work you still need to do, and the next
 | time you prompt you get lazy, or you want much more for much
 | less.
 |
 | [1] (moving your eyes and hands, hearing with your ears, etc.)
| afarah1 wrote:
 | Any setups without Claude Code? I use the Copilot agent heavily
 | in VS Code, and from time to time I have independent grunt work
 | that could be parallelized across two or three agents, but I
 | haven't seen a decent setup for that with Copilot or some other
 | VS Code extension that I could use my Copilot subscription with.
| wilsonnb3 wrote:
 | GitHub Copilot has a CLI now; I think it is in beta.
 |
 | It also supports background agents that you can kick off on the
 | GitHub website; they run on VMs.
| SeanAnderson wrote:
| https://raw.githubusercontent.com/obra/dotfiles/6e088092406c...
| contains the following entry:
|
| "- If you're uncomfortable pushing back out loud, just say
| "Strange things are afoot at the Circle K". I'll know what you
| mean"
|
 | Most of the rules seem rational. This one really stands out as
| abnormal. Anyone have any idea why the engineer would have felt
| compelled to add this rule?
|
 | This is from
 | https://blog.fsck.com/2025/10/05/how-im-using-coding-agents-...,
 | mentioned in another comment.
| lcnPylGDnU4H9OF wrote:
| Naively, I assume it's a way of getting around sycophancy.
 | There are many lines that seem to be doing that without
| explicitly saying "don't be a sycophant" (I mean, you can only
| do that so much).
|
 | The LLM would be uncomfortable pushing back because that's not
 | being a sycophant, so instead it says something that is...
 | let's say unlikely to be generated except in that context, so
 | the user can still be cautioned against a bad idea.
| SeanAnderson wrote:
| Is it your impression that this rules statement would be
| effective? Or is it more just a tell-tale sign of an
| exasperated developer?
| lcnPylGDnU4H9OF wrote:
| Assuming that's why it was added, I wouldn't be confident
| saying how likely it is to be effective. Especially with
| there being so many other statements with seemingly the
| same intent, I think it suggests desperation more, but it
| may still be effective. If it said the phrase just once and
| that sparked a conversation around an actual problem, then
| it was probably worth adding.
|
| For what it's worth, I am very new to prompting LLMs but,
| in my experience, these concepts of "uncomfortable" and
| "pushing back" seem to be things LLMs generate text about
| so I think they understand sentiment fairly well. They can
| generally tell that they are "uncomfortable" about their
| desire to "push back" so it's not implausible that one
| would output that sentence in that scenario.
|
| Actually, I've been wondering a bit about the "out loud"
| part, which I think is referring to <think></think> text
| (or similar) that "reasoning" models generate to help
| increase the likelihood of accurate generation in the
| answer that follows. That wouldn't be "out loud" and it
| might include text like "I should push back but I should
| also be a total pushover" or whatever. It could be that
| reasoning models in particular run into this issue (in
| their experience).
| OtherShrezzing wrote:
| To get around the sycophantic behaviour I prompt the model to
|
| > when discussing implementations, always talk as though
| you're my manager at a Wall Street investment bank in the
| 1980s. Praise me modestly when I've done something well.
| Berate me mercilessly when I've done something poorly.
|
| The models will fairly rigidly write from the perspective of
 | any personality archetype you tell them to. Other personas
| worth trying out include Jafar interacting with Iago, or the
| drill sergeant from Full Metal Jacket.
|
| It's important to pick a persona you'll find funny, rather
| than insulting, because it's a miserable experience being
| told by a half dozen graphics cards that you're an imbecile.
| simonw wrote:
| I tried "give me feedback on this blog post like you're a
| cynical Hacker News commenter" one time and Claude roasted
| me so hard I decided never to try that again!
| simonw wrote:
| That doesn't surprise me too much coming from Jesse. See also
| his attempt to give Claude a "feelings journal"
| https://blog.fsck.com/2025/05/28/dear-diary-the-user-asked-m...
| threecheese wrote:
| If you _really_ want your mind blown, see what Jesse is doing
| (successfully, which I almost can't believe) with Graphviz .dot
| notation and Claude.md:
|
| https://blog.fsck.com/2025/09/29/using-graphviz-for-claudemd...
| blibble wrote:
| this is just 21st century voodoo
| aymenfurter wrote:
| Async agents are great. They let you trigger work with almost no
| effort, and if the results miss the mark you can just discard
 | them. They also run in the background, making it feel like
 | delegating tasks to teammates who quietly get things done in
 | parallel.
| WhyOhWhyQ wrote:
 | I can't seem to get myself to focus when one of these things is
 | running. I transition into low-effort mode. Because of this I've
 | decided to keep my good hours of the day LLM-free, and then in
 | my crappy hours I'll have one of these running.
| munk-a wrote:
 | I'm very happy to see the article covering the high labor costs
 | of reviewing code. This may just be my neurodivergent self, but
 | I find code in the specific style I write much easier to verify
 | quickly. There are habits and customs (very functional-leaning)
 | I follow when approaching specific tasks, so when I see a
 | function in my usual style I can handwave it with "let me just
 | double-check later that I wrote that in the normal manner" and
 | keep reviewing the top-level piece of logic, rather than diving
 | into sub-calls to check for errant side effects or other
 | sneakiness I need to be on the lookout for in peer reviews.
 |
 | When working with peers I'll pick up on those habits and others
 | and slowly gain a similar level of trust. With agents, though,
 | the styles and approaches have been quite unpredictable and
 | varied. That's probably fair, given that different units of
 | logic may be easier to express in different forms, but it
 | breaks my review habits: normally I keep the developer in mind,
 | watch for the specific faulty patterns I know they tend to fall
 | into, and build up trust around their strengths. When reviewing
 | agent-generated code I can trust nothing and have to verify
 | every assumption, and that introduces a massive overhead.
 |
 | My case may sound a bit extreme, but I've observed similar
 | habits in others when it comes to reviewing a new coworker's
 | code. The first few reviews of a new colleague should always be
 | done with the utmost care, to ensure proper usage of internal
 | tooling and adherence to style, and as a fallback in case the
 | interview was misleading. Over time you build up trust and can
 | focus more on the known complications of the particular task,
 | or the areas of logic they tend to struggle with, while
 | trusting their common code more. With agent-generated code,
 | every review feels like interacting with a brand-new coworker,
 | and I need to be vigilant about the sneaky stuff.
| extr wrote:
 | IMO, I was an early adopter of this pattern, and at this point
 | I've mostly given it up (except in cases where the task is
 | embarrassingly parallel, e.g. adding some bog-standard logging
 | to 6 different folders). It's more than just that reviewing is
 | high cognitive overhead: you become biased by seeing the AI's
 | solutions, and it becomes harder to catch fundamental problems
 | you would have noticed immediately inline.
|
| My process now is:
|
| - Verbally dictate what I'm trying to accomplish with MacWhisper
| + Parakeet v3 + GPT-5-Mini for cleanup. This is usually 40-50
| lines of text.
|
| - Instruct the agent to explore for a bit and come up with a very
| concise plan matching my goal. This does NOT mean create a spec
| for the work. Simply come up with an approach we can describe in
| < 2 paragraphs. I will propose alternatives and make it defend
| the approach.
|
| - Authorize the agent to start coding. I turn all edit
| permissions off and manually approve each change. Often, I find
| myself correcting it with feedback like "Hmmm, we already have a
| structure for that [over here] why don't we use that?". Or "If
| this fails we have bigger problems, no need for exception
| handling here."
|
 | - At the end, I have it review the PR with a slash command
 | (sketch below) to catch basic errors I might have missed or
 | that only pop up now that it's "complete".
|
| - I instruct it to commit + create a PR using the same tone of
| voice I used for giving feedback.
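 |
 | For reference, a custom slash command in Claude Code is just a
 | Markdown prompt file under .claude/commands/. A hypothetical
 | review command (name and contents illustrative, not my actual
 | one) might look like:
 |
 |     # .claude/commands/review-pr.md
 |     Review the diff between this branch and main. Look for
 |     basic errors: typos, dead code, unused imports, missing
 |     error handling at I/O boundaries, and anything inconsistent
 |     with the surrounding style. List the issues; don't fix
 |     anything yet.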
|
| I've found I get MUCH better work product out of this - with the
| benefit that I'm truly "done". I saw all the lines of code as
| they were written, I know what went into it. I can (mostly)
| defend decisions. Also - while I have extensive rules set up in
| my CLAUDE/AGENTS folders, I don't need to rely on them.
 | Correcting via dictation is quick and easy, and you only need
 | to explicitly mention something once for it to avoid those
 | traps for the rest of the session.
|
| I also make heavy use of conversation rollback. If I need to go
| off on a little exploration/research, I rollback to before that
| point to continue the "main thread".
|
 | I find that Claude is really the best at this workflow. Codex is
 | great, don't get me wrong, but probably 85% of my coding tasks
 | don't involve tricky logic or long-range dependencies. It's
 | more important for the model to quickly grok my intent and act
 | fast/course-correct based on my feedback. I absolutely still use
 | Codex/GPT-5-Pro - I will have Sonnet 4.5 dump a description of
 | the issue, paste it to Codex, have it work and get an answer,
 | and then roll back Sonnet 4.5 to simply give it the answer
 | directly as if from nowhere.
___________________________________________________________________
(page generated 2025-10-09 23:00 UTC)