[HN Gopher] Field Notes from Shipping Real Code with Claude
___________________________________________________________________
Field Notes from Shipping Real Code with Claude
Author : diwank
Score : 258 points
Date : 2025-06-07 18:11 UTC (1 day ago)
(HTM) web link (diwank.space)
(TXT) w3m dump (diwank.space)
| kasey_junk wrote:
| One of the exciting things to me about AI agents is how they
| push you to build, and make room for, processes that we've
| always known were important but were frequently not prioritized
| in the face of shipping the system.
|
| You can use how uncomfortable you are with the AI doing something
| as a signal that you need to invest in systematic verification of
| that something. For instance, in the linked post, the team could
| build a system for verifying and validating their data
| migrations. That would move a whole class of changes into the AI
| realm.
|
| This is usually much easier to quantify and explain externally
| than nebulous talk about tech debt in that system.
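|
| For illustration, a check of that kind might look roughly like
| this (a sketch only; the table and column names are made up):
|
| ---- # migration_check.py (hypothetical)
| import sqlite3
|
| def verify_migration(conn: sqlite3.Connection) -> None:
|     """Assert invariants that must hold after the migration."""
|     cur = conn.cursor()
|     # Row counts must match between the old and new tables.
|     (old,) = cur.execute("SELECT COUNT(*) FROM users_old").fetchone()
|     (new,) = cur.execute("SELECT COUNT(*) FROM users").fetchone()
|     assert old == new, f"row count changed: {old} -> {new}"
|     # The backfilled column must have no NULLs left.
|     (nulls,) = cur.execute(
|         "SELECT COUNT(*) FROM users WHERE email IS NULL"
|     ).fetchone()
|     assert nulls == 0, f"{nulls} rows missing email after backfill"
| ----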
| diwank wrote:
| For sure. Another interesting trick I found to be surprisingly
| effective is to ask Claude Code to "Look around the codebase,
| and if something is confusing, or weird/counterintuitive --
| drop an _AIDEV-QUESTION: ..._ comment so I can document that bit
| of code and/or improve it". We found some really gnarly things
| that had been forgotten in the codebase.
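|
| To illustrate, a hypothetical comment it might leave behind
| (the function here is made up):
|
| ----
| def reconcile_ledger(entries):
|     # AIDEV-QUESTION: why do we retry twice here but three
|     # times in reconcile_invoices()? Intentional, or drift
|     # between the two code paths?
|     ...
| ----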
| theptip wrote:
| Agreed, my hunch is that you might use higher abstraction-level
| validation tools like acceptance and property tests, or even
| formal verification, as the relative cost of boilerplate
| decreases.
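|
| For instance, a minimal property test with Python's hypothesis
| library might look like this (the sorting invariant is just a
| stand-in):
|
| ----
| from hypothesis import given, strategies as st
|
| @given(st.lists(st.integers()))
| def test_sort_is_idempotent(xs):
|     # Sorting an already-sorted list must change nothing.
|     assert sorted(sorted(xs)) == sorted(xs)
| ----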
| diwank wrote:
| Author here: To be honest, I know there are like a bajillion
| Claude Code posts out there these days.
|
| But, there are a few nuggets we figured are worth sharing, like
| Anchor Comments [1], which have really made a difference:
|
| ---- # CLAUDE.md ### Anchor comments
| Add specially formatted comments throughout the codebase, where
| appropriate, for yourself as inline knowledge that can be easily
| `grep`ped for. - Use `AIDEV-NOTE:`, `AIDEV-TODO:`, or
| `AIDEV-QUESTION:` as prefix as appropriate. -
| *Important:* Before scanning files, always first try to grep for
| existing `AIDEV-...`. - Update relevant anchors, after
| finishing any task. - Make sure to add relevant anchor
| comments, whenever a file or piece of code is: * too
| complex, or * very important, or * could have a
| bug
|
| ----
|
| [1]: https://diwank.space/field-notes-from-shipping-real-code-
| wit...
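|
| In practice the anchors end up looking something like this (an
| illustrative, made-up example):
|
| ----
| # AIDEV-NOTE: perf-critical path; called for every request
| def render_feed(user, items):
|     # AIDEV-TODO: batch these lookups; N+1 queries under load
|     ...
| ----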
| meeech wrote:
| Honest question: approx what percent of the post was human vs
| machine written?
| diwank wrote:
| I'd say around ~40% me: the ideating, editing, citations, and
| images are all mine; the rest is Opus 4 :)
|
| I typically try to also include the original Claude chat's
| link in the post but it seems like Claude doesn't allow
| sharing chats with deep research used in them.
|
| _Update_ : here's an older chatgpt conversation while
| preparing this: https://chatgpt.com/share/6844eaae-07d0-8001-
| a7f7-e532d63bf8...
| meeech wrote:
| thanks. to be clear, I'm not asking the question to be
| particularly negative about it. It's more just curiosity,
| mixed with weighing the trade in effort. If you wrote it 100%,
| I'm more inclined to read the whole thing, versus, say, just
| feeding it back to the GPM to extract the condensed nuggets.
| ishita159 wrote:
| great tip! will do that to consume ai generated content
| as well.
| tomhow wrote:
| Thanks for being transparent about this, but we're not
| wanting substantially LLM-generated content on HN.
|
| We've been asking the community to refrain from publicly
| accusing authors of posting LLM-generated articles and
| comments. But the other side of that is that we expect
| authors to post content that they've created themselves.
|
| It's one thing to use an LLM for proof-reading and editing
| suggestions, but quite another for "60%" of an article to
| be LLM-generated. For that reason I'm having to bury the
| post.
|
| Edit: I changed this decision after further information and
| reflection. See this comment for further details:
| https://news.ycombinator.com/item?id=44215719
| diwank wrote:
| I completely understand. Just to clarify, when I said it
| was ~40%, I didn't mean the content was written by
| Claude/ChatGPT but that I took its help in deep research
| and writing the first drafts. The ideas, all of the code
| examples, the original CLAUDE.md files, the images,
| citations, etc. are all mine.
| tomhow wrote:
| Ok, sure, these things are hard to quantify. The main
| issue is that we can't ask the community to refrain from
| accusing authors of publishing AI-generated content if
| people really are publishing content that is obviously
| AI-generated. What matters to us is not how much AI was
| used to write an article, but rather how much the
| audience finds that the article satisfies intellectual
| curiosity. If the audience can sense that the article is
| generated, they lose trust in the content and the author,
| and also lose trust in HN as a place they can visit to
| find high-quality content.
|
| Edit: On reflection, given your explanation of your use
| of AI and given another comment [1] I replied to below, I
| don't think this post is disqualified after all.
|
| [1] https://news.ycombinator.com/item?id=44215719
| diwank wrote:
| I appreciate this :)
| pbhjpbhj wrote:
| Surely you're missing the wood for the trees here - isn't
| the point of asking for no 'AI' to avoid low-effort slop?
| This is a relatively high value post about adopting new
| practices and the human-LLM integration.
|
| Tag it, let users decide how they want to vote.
|
| Aside: meta: If you're speaking on behalf of HN you
| should indicate that in the post (really with a marker
| outside of the comment).
| tomhow wrote:
| Indeed, and since the author has clarified what they
| meant by "40%", I've put the post back on the front page.
| Another relevant factor is they seem not to speak English
| as a primary language, and I think we can make allowances
| for such people to use LLMs to polish their writing.
|
| Regarding your other suggestion: it's been the case ever
| since HN started 18 years ago that moderators/modcomments
| don't have any special designation. This is due to our
| preference for simple design and an aversion to seeming
| separate from the community. We trust that people will
| work it out and that has always worked well here.
| elcritch wrote:
| Shouldn't the quality of the content be what matters?
| Avoiding low-effort articles, and keeping genuine content,
| whether made with or without LLMs, would seem to be a
| better goal.
| tomhow wrote:
| Yes, and I've changed the decision, given further
| information and reflection, and explained it here:
| https://news.ycombinator.com/item?id=44215719
| ericb wrote:
| Is the percentage meaningful, though? If an LLM produces
| the most interesting, insightful, thought-provoking
| content of the day, isn't that what the best version of
| HN would be reading and commenting on?
|
| If I invent the wheel, and have an LLM write 90% of the
| article from bullet points and edit it down, don't we
| still want HN discussing the wheel?
|
| Not to say that the current generation of AI isn't often
| producing boring slop, but there's nothing that says it
| will remain that way, and percent-AI-assistance seems
| like the wrong metric to chase, to me.
| never_inline wrote:
| Because why do you anti-compress your thoughts using an LLM
| at all? It makes things harder to read.
| threeseed wrote:
| > If an LLM produces the most interesting, insightful,
| thought-provoking content of the day, isn't that what the
| best version of HN would be reading and commenting on?
|
| Absolutely not. I would much rather read something that is
| boring and not thought-provoking but authentic and real,
| rather than, as you say, AI slop.
|
| If you want that sort of content, maybe LinkedIn is a
| better place.
| remram wrote:
| "40% me" means "60% LLM"
| misnome wrote:
| Wasn't it "40% me", i.e. 60% LLM-generated?
|
| "I supplied the ideas" is literally the first thing
| anyone caught out using chatgpt to do their homework
| says... I'd tend to believe someones first statement
| instead of the backpedal once they've been chastised for
| it.
| __mharrison__ wrote:
| Speaking for this "we", this was one of the best posts I
| read this week. (And I imagine that a lot of them were
| AI-assisted.)
| meeech wrote:
| Q: How do you ensure tests are only written by humans?
| Basically just the honor system?
| diwank wrote:
| You can:
|
| 1. Add instructions in CLAUDE.md to _not_ touch tests.
|
| 2. Disallow the Edit tool for test directories in the
| project's .claude/settings.json file
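|
| For (2), the deny rule would look roughly like this (a sketch,
| if I have the permission syntax right; adjust the path glob to
| your layout):
|
| ---- # .claude/settings.json
| {
|   "permissions": {
|     "deny": ["Edit(tests/**)"]
|   }
| }
| ----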
| meeech wrote:
| Disallowing edits in test dirs is a good tip, thanks.
|
| I meant, though, in the wider context of the team: everyone
| uses it, but not everyone will work the same way or use the
| same underlying prompts as they work. So how do you ensure
| everyone keeps to that agreement?
| mathgeek wrote:
| > So how do you ensure everyone keeps to that agreement?
|
| There's nothing specific to using Claude or any other
| automation tool here. You still use code reviews,
| linters, etc. to catch anything that isn't following the
| team norms and expectations. Either that or, as the
| article points out, someone will cause an incident and
| may be looking for a new role (or nothing bad happens and
| no one is the wiser).
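|
| E.g., a hypothetical guard script along those lines (the
| "[AI]" commit tag here is a made-up team convention, not a
| Claude feature):
|
| ---- # check_no_ai_tests.py (hypothetical)
| import subprocess
| import sys
|
| # Read the latest commit message and the files it touched.
| msg = subprocess.run(
|     ["git", "log", "-1", "--pretty=%B"],
|     capture_output=True, text=True,
| ).stdout
| files = subprocess.run(
|     ["git", "diff", "--name-only", "HEAD~1..HEAD"],
|     capture_output=True, text=True,
| ).stdout.splitlines()
|
| # Fail if an AI-tagged commit modifies anything under tests/.
| if "[AI]" in msg and any(f.startswith("tests/") for f in files):
|     sys.exit("AI-tagged commit touches tests/; write those by hand")
| ----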
| davidmurdoch wrote:
| Why?
| peter422 wrote:
| Just to provide a contrast to some of the negative comments...
|
| As a very experienced engineer who uses LLMs sporadically* and
| not in any systematic way, I really appreciated seeing how you
| use them in production in a real project. I don't know why
| people are being negative, you just mentioned your project in
| details where it was appropriate to talk about the structure of
| it. Doesn't strike me as gratuitous self promotion at all.
|
| Your post is giving me motivation to empower the LLMs a little
| bit more in my workflows.
|
| *: They absolutely don't get the keys to my projects but I have
| had great success with having them complete specific tasks.
| diwank wrote:
| Really appreciate the kind words! I did not intend the post
| to be too much about our company, just that it is the
| codebase I mostly hack on. :)
| mafro wrote:
| Great post. I'm fairly new to the AI pair programming thing
| (I've been using Aider), but with 20 years of coding behind me
| I can see where things are going. You're dead right in the
| conclusion about now being the time to adopt this stuff as part
| of your flow -- if you haven't already.
|
| And regarding the HN post getting buried for a while
| there...[1] Somewhat ironic that an article about using AI to
| help write code would get canned for using an AI to help write
| it :D
|
| [1]: https://news.ycombinator.com/item?id=44214437
| kikimora wrote:
| Thanks for the great article; this is much needed for
| understanding how to properly use LLMs at scale.
|
| You mentioned that LLMs should never touch tests. Then you
| followed up with an example refactoring that changed 500+
| endpoints and was completed in 4 hours. This is impressive! I
| wonder if those 4 hours included the test refactoring as well,
| or if that is just prompting time?
| diwank wrote:
| that didn't include the testing; that def took a lot longer.
| but at least now my devs don't have an excuse for poorly
| written tests lol
| localhost wrote:
| Did you use Claude Code to write the post? I'm finding that I'm
| using it for 100% of my own writing because agentic editing of
| markdown files is so good (and miles better than what you get
| with claude.ai artifacts or chatgpt.com canvas). This is how
| you can do things like merge deep research or other files into
| the doc that you are writing.
| diwank wrote:
| no, just used chatgpt to bootstrap the research :)
|
| here's the original chat: https://chatgpt.com/share/6844eaae-
| 07d0-8001-a7f7-e532d63bf8...
|
| I also used bits from claude research but apparently if you
| use claude research, they don't let you create a share link
| -_-
| localhost wrote:
| Right. But you can copy paste that into a separate doc and
| have Claude Code merge it in (and not a literal merge - a
| semantic merge "integrate relevant parts of this research
| into this doc"). This is super powerful - try it!
| r0b0ji wrote:
| At one place you mentioned that if a test is updated by AI, you
| reject the PR. How do you know whether it was generated or
| updated by AI? From the article I only gathered that there's a
| git commit message convention for flagging it, but that works
| only at the commit level.
| diwank wrote:
| Mostly just good faith during PR reviews. Plus, models other
| than Opus 4 largely flub it, and it shows.
| noufalibrahim wrote:
| There are a lot of posts around, but this was very practical
| and gives me a system I can try to implement and perhaps
| improve. Much appreciated. Thanks for taking the time to write
| it.
|
| One thing I would have liked to know is the difference between
| a workflow like this and the use of Aider. If you have any
| perspective on that, it would be great.
| panny wrote:
| >Think of this post as your field guide to a new way of
| building software. By the time you finish reading, you'll
| understand not just the how but the why behind AI-assisted
| development that actually works.
|
| Hi, AI skeptic with an open-mind here. How much will this cost
| me to try? I don't see that mentioned in your writeup.
| djrockstar1 wrote:
| [flagged]
| diwank wrote:
| I'd say around ~40% me, the ideating, editing, citations, and
| images are all mine; rest Opus 4 :)
|
| I typically try to also include the original Claude chat's link
| in the post but it seems like Claude doesn't allow sharing
| chats with deep research used in them.
|
| See this series of posts for example, I have included the link
| right at the beginning: https://diwank.space/juleps-vision-
| levels-of-intelligence-pt...
|
| I completely get the critique and I already talked about it
| earlier: https://news.ycombinator.com/item?id=44213823
|
| _Update_ : here's an older chatgpt conversation while
| preparing this:
| https://chatgpt.com/share/6844eaae-07d0-8001-a7f7-e532d63bf8...
| Artoooooor wrote:
| I finally decided a few days ago to try this Claude Code thing
| in my personal project. It's depressingly efficient. And damn
| expensive - I used over 10 dollars in one day. But I'm afraid it
| is inevitable - I will have to pay a tax to the AI overlords
| just to be able to keep my job.
| Syzygies wrote:
| I was looking at $2,000 a year and climbing, before Anthropic
| announce $100 and $200 Max subscriptions that bundled Claude
| Console and Claude Code. There are limits per five hour
| windows, but one can toggle back to metered API with the login/
| command, or just walk the dog. $100 a month has done me fine.
| diwank wrote:
| Same. I ran out on the $200 one too yesterday. My usage has
| skyrocketed since Opus 4. Nothing else comes close.
| StefanBatory wrote:
| I had been musing over this. Will devs in very cheap countries
| still stay an attractive option, just because they'd still be
| cheaper per month than Claude?
| diwank wrote:
| the cost per token for ~similar performance is dropping by a
| factor of 2 every 10-11 months at the moment, so I am not
| sure. that said, I think devs in less expensive parts of the
| world are actually picking up these tools the fastest (maybe
| from existential angst? idk)
| deadbabe wrote:
| We've stopped hiring devs in cheaper countries because their
| quality of output isn't much better than an LLM's, and LLMs
| are faster and cheaper. Since we don't really trust cheap devs
| to
| ship big important features, the market for the kind of work
| we allowed them to do has been totally consumed by LLMs.
| thelittleone wrote:
| The cost on paper may be lower. But those options become
| less attractive once you take into account timezones,
| communication challenges, availability, scheduling, and the
| ever-increasing performance of coding agents.
| wonger_ wrote:
| Some thoughts:
|
| - Is there a more elegant way to organize the
| prompts/specifications for LLMs in a codebase? I feel like
| CLAUDE.md, SPEC.mds, and AIDEV comments would get messy quickly.
|
| - What is the definition of "vibe-coding" these days? I thought
| it refers to the original Karpathy quote, like cowboy mode, where
| you accept all diffs and hardly look at code. But now it seems
| that "vibe-coding" is catch-all clickbait for any LLM workflow.
| (Tbf, this title "shipping real code with Claude" is fine)
|
| - Do you obfuscate any code before sending it to someone's LLM?
| diwank wrote:
| > - Is there a more elegant way to organize the
| prompts/specifications for LLMs in a codebase? I feel like
| CLAUDE.md, SPEC.mds, and AIDEV comments would get messy
| quickly.
|
| Yeah, the comments do start to pile up. I'm working on a vscode
| extension that automatically turns them into tiny visual
| indicators in the gutter instead.
|
| > - What is the definition of "vibe-coding" these days? I
| thought it refers to the original Karpathy quote, like cowboy
| mode, where you accept all diffs and hardly look at code. But
| now it seems that "vibe-coding" is catch-all clickbait for any
| LLM workflow. (Tbf, this title "shipping real code with Claude"
| is fine)
|
| Depends on who you ask, I guess. For me, it hasn't been a
| panacea, and I've often run into issues (3.7 Sonnet and Codex
| have had ~60% success rates for me, but Opus 4 is actually
| very good).
|
| > - Do you obfuscate any code before sending it to someone's
| LLM?
|
| In this case, all of it was open source to begin with but good
| point to think about.
| jstummbillig wrote:
| Is it really though, when a lot of critical business data
| goes through Google workspace (usually without client side
| encryption), or are we trying very hard to be a bit special
| in the name of privacy? From a result standpoint I find
| curious how interesting people deem their code base to be to
| a LLM provider.
| diwank wrote:
| true, but this does matter to a lot of enterprise customers
| that have to obey strict data provenance laws (for
| instance, there are no gpt-4.1 model endpoints hosted in
| India, and hence fintech companies cannot use those APIs)
| lispisok wrote:
| I think most of this is good stuff, but I disagree with not
| letting Claude touch tests or migrations at all. Hand-writing
| tests from scratch is the part I hate the most. Having an LLM
| do a first pass on tests, which I add to and adjust as I see
| fit, has been a big boon on the testing front. It seems the
| difference between me and the author is that I believe the
| human still takes ownership and responsibility, whether or not
| the code was generated by an LLM. Not letting Claude touch
| tests and migrations is saying you rightfully don't trust
| Claude but are giving ownership to Claude for Claude-generated
| code. That, or he doesn't trust his employees not to blindly
| accept AI slop, and the strict rules around tests and
| migrations are there to prevent the AI slop from breaking
| everything or causing data loss.
| diwank wrote:
| True but, in my experience, a few major pitfalls that happened:
|
| 1. We ran into really bad minefields when we tried to come back
| to manually edit the generated tests later on. Claude tended to
| mock _everything_ because it didn't have context about how we
| run services, build environments, etc.
|
| 2. And this was the worst, all of the devs on the team
| including me got _realllyy_ lazy with testing. Bugs in
| production significantly increased.
| jaakl wrote:
| Did you try putting all this (complex and external) context
| into the context (CLAUDE.md or whatever), with instructions on
| how to do proper TDD, before asking for the tests? I know that
| may be more work than actually coding it, since you know it
| all by heart and the external world is always bigger than the
| internal one. But in the long term, and with teams/codebases
| without good TDD practices, that might end up producing useful
| test iterations. Of course the developer committing the code
| is responsible for it either way, so what I would ban is
| putting "AI did it" in commits - it may mentally work as a
| "get out of jail" card for some.
| diwank wrote:
| we tried a few different variations but tbh had universally
| bad results. for example, we use the `ward` test runner in our
| python codebase, and claude sonnet (both 3.7 and 4) keeps
| trying to force-switch it to pytest lol. every. single.
| time.
|
| maybe we could either try this with opus 4 and hope that
| cheaper models catch up, or just drink the kool-aid and
| switch to pytest...
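|
| (for context, a minimal ward test looks roughly like this,
| versus the `def test_*` style pytest expects:)
|
| ----
| from ward import test
|
| @test("addition is commutative")
| def _():
|     assert 1 + 2 == 2 + 1
| ----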
| ayewo wrote:
| I literally LOLed at #2, haha! LLMs are making devs lazy at
| scale :)
|
| Devs almost universally hate 3 things:
|
| 1. writing tests;
|
| 2. writing docs;
|
| 3. manually updating dependencies;
|
| and LLMs are a big boon when it comes to helping us avoid all
| 3, but forcing your team to keep writing tests by hand is a
| sensible trade-off in this context since, as you say, bugs in
| prod increased significantly.
| diwank wrote:
| yeah, this might change in the future, but I also found that
| since building features has become faster, asking devs to
| write the tests themselves sort of demands that they _take
| responsibility_ for the code and the potential bugs
| nilirl wrote:
| There's a lot of visual noise because of the model-specific
| comments. Or maybe that's just the examples here.
|
| But as a human, I do like the CLAUDE.md file. It's like
| documentation for dev reasoning and choices. I like that.
|
| Is this faster than an old-style codebase where developers
| just have the LLM chat open as they work? It seems like this
| ups the learning curve. The code here doesn't look very
| approachable.
| skerit wrote:
| Very interesting, I'm going to use some of these ideas in my
| CLAUDE.md file.
|
| > One of the most counterintuitive lessons in AI-assisted
| development is that being stingy with context to save tokens
| actually costs you more
|
| Something similar I've been thinking about recently: For bigger
| projects & more complicated code, I really do notice a big
| difference between Claude Opus and Claude Sonnet. And Sonnet
| sometimes just wastes so much time on ideas that never pan out,
| or make things worse. So I wonder: wouldn't it make more sense
| for Anthropic to not differentiate between Opus and Sonnet for
| people with a Max subscription? It seems like Sonnet takes
| 10-20 turns to do what Opus can do in 2 or 3, so in the end
| forcing people over to Sonnet would ultimately cost them more.
| diwank wrote:
| yeah, the Max subscription comes in two tiers: $100 gets you
| 5x the tokens of Pro (which only has Sonnet) and $200 gets you
| 20x. doing the math for tokens is kinda annoying and not
| straightforward atm. they also have a "hybrid" mode which uses
| Opus until ~20% of the Opus tokens are left, then switches to
| Sonnet.
| dkobia wrote:
| Thank you for writing this. Many software developers on HN are
| conflicted about ceding control of software development to LLMs
| for many reasons including the fact that it feels unstructured
| and exploratory rather than rigidly planned using more formal
| methodologies.
|
| There's a good middle ground where the LLMs can help us solve
| problems faster optimizing for outcomes rather than falling in
| love with solving the problem. Many of us usually lose sight of
| the actual goal we're trying to achieve when we get distracted by
| implementation details.
| diwank wrote:
| absolutely! I think of these as new levers in the making:
| rather rusty, and they can definitely bite you in the behind,
| but worth learning - and, perhaps most importantly, worth
| helping evolve into useful tools rather than an excuse to ship
| sloppy engineering.
| nxobject wrote:
| As I read (and appreciate) your documentation tips, I don't
| even think they have to be labeled as AI-specific! 'CLAUDE.md'
| seems like it could just as well be 'CONVENTIONS.md', and
| comments to 'AI' could just as well be comments to 'READER'.
| :) Certainly I would appreciate comments like that when
| reading _any_ codebase, especially as an unfamiliar
| contributor.
| SatvikBeri wrote:
| Not the OP, but in practice, the comments that help Claude
| tend to be very different from the ones that help humans. With
| humans, I'll generally focus on the why.
|
| Our style guide for humans is about 100 lines long, with lines
| like "Add a ! to the end of a function name if and only if it
| mutates one of its inputs". Our style guide for Claude is ~500
| lines long, and equivalent sections have to include many
| examples like "do this, don't do this" to work.
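|
| For a sense of the shape (a made-up excerpt, not our actual
| guide):
|
| ---- # style guide for Claude (illustrative)
| ### Mutating functions
| - Do: end the name with `!` if it mutates an input:
|   `update_totals!(acc, entry)`
| - Don't: mutate an input in a function without `!`:
|   `update_totals(acc, entry)` must leave `acc` unchanged.
| ----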
| __mharrison__ wrote:
| I just tried this out with aider. It worked great. Vibe coded a
| PDF viewer with drawing capabilities in 30 minutes while
| waiting for a plane...
| remram wrote:
| The 2.3 MB picture at the top loaded comically slowly even on
| wifi.
| mgraczyk wrote:
| Good post, the only part I think I disagree with is
|
| > Never. Let. AI. Write. Your. Tests.
|
| AI writes all of my tests now, but I review them all carefully.
| Especially for new code, you have to let the AI write tests if
| you want it to work autonomously. I explicitly instruct the AI to
| write tests and make sure they pass before stopping. I usually
| review these tests while the AI is implementing the code to make
| sure they make sense and cover important cases. I add more cases
| if they are inadequate.
| m_kos wrote:
| Good read - I have learned something new.
|
| I have been having a horrible experience with Sonnet 4 via Cursor
| and Web. It keeps cutting corners and misreporting what it did.
| These are not hallucinations. Threatening it with deletion
| (inspired by Anthropic's report) only makes things worse.
|
| It also pathologically lies about non-programming things. I tried
| reporting it but the mobile app says "Something went wrong.
| Please try again later." Very bizarre.
|
| Am I the only person experiencing these issues? Many here seem to
| adore Claude.
| threeseed wrote:
| I've routinely had every one of these issues.
|
| I find it's much better just to use Claude Web and be extremely
| specific about what I need it to do.
|
| And even then half the code it generates for me is riddled with
| errors.
| __mharrison__ wrote:
| Thanks for the post. It's a very interesting time, navigating
| the nascent field of AI-assisted development.
|
| Curious if the author (or others) have tried other tools /
| models.
___________________________________________________________________
(page generated 2025-06-08 23:00 UTC)