[HN Gopher] Writing a good Claude.md
___________________________________________________________________
Writing a good Claude.md
Author : objcts
Score : 696 points
Date : 2025-11-30 17:56 UTC (1 days ago)
(HTM) web link (www.humanlayer.dev)
(TXT) w3m dump (www.humanlayer.dev)
| eric-burel wrote:
| "You can investigate this yourself by putting a logging proxy
| between the claude code CLI and the Anthropic API using
| ANTHROPIC_BASE_URL" I'd be eager to read a tutorial about that I
| never know which tool to favour for doing that when you're not a
| system or network expert.
| fishmicrowaver wrote:
| Have you considered just asking claude? I'd wager you'd get up
| and running in <10 minutes.
| dhorthy wrote:
| agree - i've had claude one-shot this for me at least 10
| times at this point cause i'm too lazy to lug whatever code
| around. literally made a new one this morning
| eric-burel wrote:
| AI is good for discovery but not validation, I wanted
| experienced human feedback here
| 0xblacklight wrote:
| Hi, post author here
|
| We used Cloudflare's AI Gateway, which is pretty simple. Set one
| up, get the proxy URL, and set it through the env var; very
| plug-and-play.
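|
| The setup is roughly this (the gateway URL shape is from memory;
| your dashboard gives you the exact one):
|
|     export ANTHROPIC_BASE_URL="https://gateway.ai.cloudflare.com/v1/<account-id>/<gateway>/anthropic"
|     claude   # requests now flow through the gateway and show up in its logs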
| eric-burel wrote:
| Smart, thanks for the tip
| Havoc wrote:
| Just install mitmproxy. Takes like 5 mins to figure out. 2 with
| Claude.
|
| On phone, else I'd post exact commands.
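|
| The gist, from memory (check the mitmproxy docs for the exact
| flags; a reverse proxy in front of the API plus the env var
| mentioned upthread does the work):
|
|     pip install mitmproxy
|     mitmproxy --mode reverse:https://api.anthropic.com --listen-port 8080
|     export ANTHROPIC_BASE_URL="http://127.0.0.1:8080"
|     claude   # requests now appear in the mitmproxy console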
| jasonjmcghee wrote:
| Interesting selection of models for the "instruction count vs.
| accuracy" plot. Curious when that was done and why they chose
| those models. How well does ChatGPT 5/5.1 (and codex/mini/nano
| variants), Gemini 3, Claude Haiku/Sonnet/Opus 4.5, recent Grok
| models, Kimi K2 Thinking, etc. (this generation of models) do?
| alansaber wrote:
| Guessing they included some smaller models just to show how
| they drop accuracy at smaller context sizes
| jasonjmcghee wrote:
| Sure - I was more commenting that they are all > 6 months
| old, which sounds silly, but things have been changing fast,
| and instruction following is definitely an area that has been
| developing a lot recently. I would be surprised if accuracy
| drops off that hard still.
| 0xblacklight wrote:
| I imagine it's highly correlated with parameter count, but
| the research is a few months old and frontier model
| architecture is pretty opaque, so it's hard to draw too many
| conclusions about newer models that aren't in the study
| besides what I wrote in the post
| vladsh wrote:
| What is a good Claude.md?
| testdelacc1 wrote:
| Claude.md - A markdown file you add to your code repository to
| explain how things work to Claude.
|
| A good Claude.md - I don't know, presumably the article
| explains.
| andersco wrote:
| I have found enabling the codebase itself to be the "Claude.md"
| to be most effective. In other words, set up effective automated
| checks for linting, type checking, unit tests etc and tell Claude
| to always run these before completing a task. If the agent keeps
| doing something you don't like, then a linting update or an
| additional test is often more effective than trying to tinker
| with the Claude.md file. Also, ensure docs on the codebase are up
| to date and tell Claude to read relevant parts when working on a
| task and of course update the docs for each new task. YMMV but
| this has worked for me.
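|
| As a rough sketch, the "always run these" part is just a short
| block in the instructions file (the npm scripts here are
| placeholders for whatever your project actually uses):
|
|     ## Before finishing any task
|     - Run: npm run lint && npm run typecheck && npm test
|     - Fix any failures before reporting the task as done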
| Aeolun wrote:
| > Also, ensure docs on the codebase are up to date and tell
| Claude to read relevant parts when working on a task
|
| Yeah, if you do this every time it works fine. If you add what
| you tell it every time to CLAUDE.md, it also works fine, but
| you don't have to tell it any more ;)
| Havoc wrote:
| > Claude.md
|
| It's case-sensitive btw: CLAUDE.md. Might explain your mixed
| results with it.
| prettyblocks wrote:
| The advice here seems to assume a single .md file with
| instructions for the whole project, but the AGENTS.md methodology
| as supported by agents like github copilot is to break out more
| specific AGENTS.md files in the subdirectories in your code base.
| I wonder if and how the tips shared change assuming a flow with a
| bunch of focused AGENTS.md files throughout the code.
| 0xblacklight wrote:
| Hi, post author here :)
|
| I didn't dive into that because in a lot of cases it's not
| necessary and I wanted to keep the post short, but for large
| monorepos it's a good idea
| btbuildem wrote:
| It seems overall a good set of guidelines. I appreciate some of
| the observations being backed up by data.
|
| What I find most interesting is how a hierarchical / recursive
| context construct begins to emerge. The author's note of a "root"
| claude.md as well as the opening comments on LLMs being stateless
| ring to me like a bell. I think soon we will start seeing
| stateful LLMs, via clever manipulation of scope and context.
| Something akin to memory, as we humans perceive it.
| _pdp_ wrote:
| There is a far easier way to do this, one that is perfectly
| aligned with how these tools work.
|
| It is called documenting your code!
|
| Just write what this file is supposed to do in a clear, concise
| way. It acts as a prompt, it provides much needed context
| specific to the file and it is used only when necessary.
|
| Another tip is to add README.md files where possible and where it
| helps. What is this folder for? Nobody knows! Write a README.md
| file. It is not rocket science.
|
| What people often forget about LLMs is that they are largely
| trained on public information which means that nothing new needs
| to be invented.
|
| You don't have to "prompt it just the right way".
|
| What you have to do is use the same good old best practices.
| dhorthy wrote:
| For the record I do think the AI community tries to
| unnecessarily reinvent the wheel on crap all the time.
|
| sure, readme.md is a great place to put content. But there are
| things I'd put in a readme that I'd never put in a claude.md if
| we want to squeeze the most out of these models.
|
| Further, claude/agents.md have special quality-of-life
| mechanics with the coding agent harnesses like e.g. `injecting
| this file into the context window whenever an agent touches
| this directory, no matter whether the model wants to read it or
| not`
|
| > What people often forget about LLMs is that they are largely
| trained on public information which means that nothing new
| needs to be invented.
|
| I don't think this is relevant at all - when you're working
| with coding agents, the more you can finesse and manage every
| token that goes into your model and how it's presented, the
| better results you can get. And the public data that goes into
| the models is near useless if you're working in a complex
| codebase, compared to the results you can get if you invest
| time into how context is collected and presented to your agent.
| theshrike79 wrote:
| > For the record I do think the AI community tries to
| unnecessarily reinvent the wheel on crap all the time.
|
| On Reddit's LLM subreddits people are rediscovering the very
| basics of software project management as some massive
| insights daily, or at the very least weekly.
|
| Who would've guessed that proper planning, accessible and
| up-to-date documentation, and splitting tasks into manageable,
| testable chunks produce good code? Amazing!
|
| Then they write a massive blog post or even some MCP
| monstrosity for it and post it everywhere as a new discovery
| =)
| dkubb wrote:
| I can totally understand where you are coming from with
| this comment. It does feel a bit frustrating that people
| are rediscovering things that were written in books
| 30/40/50 years ago.
|
| However, I think this is awesome for the industry. People
| are rediscovering basic things, but if they didn't know
| about the existing literature this is a perfect opportunity
| to refer them to it. And if they were aware, but maybe not
| practicing it, this is a great time for the ideas to be
| reinforced.
|
| A lot of people, myself included, never really understood
| which practices were important or not until we were forced
| to work on a system that was most definitely not written
| with any good practices in mind.
|
| My current view of agentic coding is that it's forcing an
| entire generation of devs to learn software project
| management _or_ drown under the mountain of debt an LLM
| can produce. Previously it took much longer to feel the
| weight of bad decisions in a project but an LLM allows you
| to speed-run this process in a few weeks or months.
| bastawhiz wrote:
| This is missing the point. If I want to instruct Claude to
| never write a database query that doesn't hit a preexisting
| index, where exactly am I supposed to document that? You can
| either choose:
|
| 1. A centralized location, like a README (congrats, you've just
| invented CLAUDE.md)
|
| 2. You add a docs folder (congrats, you've just done exactly
| what the author suggests under Progressive Disclosure)
|
| Moreover, you can't just do it all in a README, for the exact
| reasons that the author lays out under "CLAUDE.md file length &
| applicability".
|
| CLAUDE.md simply isn't about telling Claude what all the parts
| of your code are and how they work. You're right, that's what
| documenting your code is for. But even if you have READMEs
| everywhere, Claude has no idea where to put code when it starts
| a new task. If it has to read all your documentation every time
| it starts a new task, you're needlessly burning tokens. The
| whole point is to give Claude important information up front
| _so it doesn't have to_ read all your docs and fill up its
| context window searching for the right information on every
| task.
|
| Think of it this way: incredibly well documented code has
| everything a new engineer needs to get started on a task, yes.
| But this engineer has amnesia and forgets everything it's
| learned after every task. Do you want them to have to reonboard
| from scratch every time? No! You structure your docs in a way
| so they don't have to start from scratch every time. This is an
| accommodation: humans don't need this, for the most part,
| because we don't reonboard to the same codebase over and over.
| And so yes, you do need to go above and beyond the "same old
| good best practices".
| _pdp_ wrote:
| You put a warning where it is most likely to be seen by a
| human coder.
|
| Besides, no amount of prompting will prevent this situation.
|
| If it is a concern then you put a linter or unit tests to
| prevent it altogether, or make a wrapper around the tricky
| function with some warning in its doc strings.
|
| I don't see how this is any different from how you typically
| approach making your code more resilient to accidental
| mistakes.
| mvkel wrote:
| Documenting for AI exactly like you would document for a
| human is ignoring how these tools work
| anonzzzies wrote:
| But they are right, claude routinely ignores stuff from
| CLAUDE.md, even with warning bells etc. You need a linter
| preventing things. Like drizzle sql` templates: it just
| loves them.
| CuriouslyC wrote:
| You can make affordances for agent abilities without
| deviating from what humans find to be good documentation.
| Use hyperlinks, organize information, document in layers,
| use examples, be concise. It's not either/or unless
| you're being lazy.
| notachatbot123 wrote:
| Sounds like we should call them tools, not AI!
| theshrike79 wrote:
| Agentic AI is LLMs using tools in a loop to achieve a
| goal.
|
| Needs a better term than "AI", I agree, but it's 99%
| marketing; the tech will stay the same.
| bastawhiz wrote:
| > no amount of prompting will prevent this situation.
|
| Again, missing the point. If you don't prompt for it and
| you document it in a place where the tool won't look first,
| the tool simply won't do it. "No amount of prompting"
| couldn't be more wrong, it works for me and all my
| coworkers.
|
| > If it is a concern then you put a linter or unit tests to
| prevent it altogether
|
| Sure, and then it'll always do things its own way, run the
| tests, and have to correct itself. Needlessly burning
| tokens. But if you want to pay for it to waste its time and
| yours, go for it.
|
| > I don't see how this is any different from how you
| typically approach making your code more resilient to
| accidental mistakes.
|
| It's not about avoiding mistakes! It's about having it
| follow the norms of your codebase.
|
| - My codebase at work is slowly transitioning from Mocha to
| Jest. I can't write a linter to ban new mocha tests, and it
| would be a pain to keep a list of legacy mocha test suites.
| The solution is to simply have a bullet point in the
| CLAUDE.md file that says "don't write new Mocha test
| suites, only write new test suites in Jest". A more robust
| solution isn't necessary and doesn't avoid mistakes, it
| avoids the extra step of telling the LLM to rewrite the
| tests.
|
| - We have a bunch of terraform modules for convenience when
| defining new S3 buckets. No amount of documenting the
| modules will have Claude magically know they exist. You
| tell it that there are convenience modules and to consider
| using them.
|
| - Our ORM has findOne that returns one record or null. We
| have a convenience function getOne that returns a record or
| throws a NotFoundError to return a 404 error. There's no
| way to exhaustively detect with a linter that you used
| findOne and checked the result for null and threw a
| NotFoundError. And the hassle of maybe catching some
| instances isn't necessary, because avoiding it is just one
| line in CLAUDE.md.
|
| It's really not that hard.
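|
| For illustration, all three of the above end up as nothing more
| than a few bullets in CLAUDE.md (wording adapted from the
| examples above):
|
|     - Don't write new Mocha test suites; write new test suites in Jest.
|     - Prefer the convenience terraform modules when defining new S3 buckets.
|     - Use getOne (throws NotFoundError for a 404) instead of findOne plus
|       a manual null check when a missing record should be an error.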
| girvo wrote:
| > There's no way to exhaustively detect with a linter
| that you used findOne and checked the result for null and
| threw a NotFoundError
|
| Yes there is? Though this is usually better served with a
| type checker, it's still totally feasible with a linter
| too if that's your bag
|
| > because avoiding it is just one line in CLAUDE.md.
|
| Except no, it isn't, because these tools still ignore
| that line sometimes so I _still_ have to check for it
| myself.
| gitgud wrote:
| > _1. A centralized location, like a README (congrats, you
| 've just invented CLAUDE.md)_
|
| README files are not a new concept, and have been used in
| software for like 5 decades now, whereas CLAUDE.md files were
| invented 12 months ago...
| callc wrote:
| This CLAUDE.md dance feels like herding cats. Except we're
| herding a really good autocorrect encyclopedic parrot. Sans
| intelligence
|
| Relating / personifying LLM to an engineer doesn't work out
|
| Maybe the best mental model currently is just "good way to
| automate trivial text modifications" and "encyclopedic
| ramblings"
| saturatedfat wrote:
| unfair characterization.
|
| think about how this thing is interacting with your
| codebase. it can read one file at a time. sections of
| files.
|
| in this UX, is it ergonomic to go hunting for patterns and
| conventions? if u have to linearly process every single
| thing u look at every time you do something, how are you
| supposed to have "peripheral vision"? if you have amnesia,
| how do you continue to do good work in a codebase given
| you're a skilled engineer?
|
| it is different from you. that is OK. it doesn't mean it's
| stupid. it means it needs different accommodations to
| perform as well as you do. accommodations IRL exist for a
| reason, different people work differently and have
| different strengths and weaknesses. just like humans, you
| get the most out of them if you meet and work with them
| from where they're at.
| victorbuilds wrote:
| Learned this the hard way. Asked Claude Code to run a
| database migration. It deleted my production database
| instead, then immediately apologised and started panicking
| trying to restore it.
|
| Thankfully Azure keeps deleted SQL databases recoverable, so
| I got it back in under an hour. But yeah - no amount of
| CLAUDE.md instructions would have prevented that. It no
| longer gets prod credentials.
| theshrike79 wrote:
| 1. Create a tool that can check if a query hits a preexisting
| index
|
| 2. Either force Claude to use it (hooks) or suggest it
| (CLAUDE.md)
|
| 3. Profit!
|
| As for "where stuff is", for anything more complex I have a
| tree-style graph in CLAUDE.md that shows the rough categories
| of where stuff is. Like the handler for letterboxd is in
| cmd/handlerletterboxd/ and internal modules are in internal/
|
| Now it doesn't need to go in blind but can narrow down
| searches when I tell it to "add director and writer to the
| letterboxd handler output".
| johnfn wrote:
| So how exactly does one "write what this file is supposed to do
| in a clear concise way" in a way that is quickly comprehensible
| to AI? The gist of the article is that when your audience
| changes from "human" to "AI" the manner in which you write
| documentation changes. The article is fairly high quality, and
| presents excellent evidence that simply "documenting your code"
| won't get you as far as the guidelines it provides.
|
| Your comment comes off as if you're dispensing common-sense
| advice, but I don't think it actually applies here.
| 0xblacklight wrote:
| I think you're missing that CLAUDE.md is deterministically
| injected into the model's context window
|
| This means that instead of behaving like a file the LLM reads,
| it effectively lets you customize the model's prompt
|
| I also didn't write that you have to "prompt it just the right
| way", I think you're missing the point entirely
| datacynic wrote:
| Writing documentation for LLMs is strangely pleasing because
| you have very linear returns for every bit of effort you spend
| on improving its quality and the feedback loop is very tight.
| When writing for humans, especially internal documentation,
| I've found that these returns are quickly diminishing or even
| negative as it's difficult to know if people even read it or if
| they didn't understand it or if it was incomplete.
| avereveard wrote:
| Well, no. You run pretty fast into the context limit (or attention
| limit for long-context models). And the model understands pretty
| well what code does without documentation.
|
| There's also a question of processes: how to format code, what
| style of catching to use, and how to run the tests, which humans
| keep in the back of their heads after reading it once or twice,
| but which need a constant reminder for an LLM, whose knowledge
| lifespan is session-limited.
| uncletaco wrote:
| I'm pretty sure Claude would not work well in my code base if
| I hadn't meticulously added docstrings, type hints, and
| module level documentation. Even if you're stubbing out code
| for later implementation, it helps to go ahead and document
| it so that a code assistant will get a hint of what to do
| next.
| candiddevmike wrote:
| None of this should be necessary if these tools did what they say
| on the tin, and most of this advice will probably age like milk.
|
| Write readmes for humans, not LLMs. That's where the ball is
| going.
| 0xblacklight wrote:
| Hi, post author here :)
|
| Yes README.md should still be written for humans and isn't
| going away anytime soon.
|
| CLAUDE.md is a convention used by claude code, and AGENTS.md is
| used by other coding agents. Both are intended to be
| supplemental to the README and are deterministically injected
| into the agent's context.
|
| It's a configuration point for the harness, it's not intended
| to replace the README.
|
| Some of the advice in here will undoubtedly age poorly as
| harnesses change and models improve, but some of the generic
| principles will stay the same - e.g. that you shouldn't use an
| LLM to do a linter &formatter's job, or that LLMs are stateless
| and need to be onboarded into the codebase, and having some
| deterministically-injected instructions to achieve that is
| useful instead of relying on the agent to non-deterministically
| derive all that info by reading config and package files
|
| The post isn't really intended to be super forward-looking as
| much as "here's how to use this coding agent harness
| configuration point as best as we know how to right now"
| teiferer wrote:
| > you shouldn't use an LLM to do a linter & formatter's job,
|
| Why is that good advice? If that thing is eventually supposed
| to do the most tricky coding tasks, and already a year ago
| could have won a medal at the informatics olympics, then why
| wouldn't it eventually be able to tell if I'm using 2 or 4
| spaces and format my code accordingly? Either it's going to
| change the world, then this is a trivial task, or it's all
| vaporware, then what are we even discussing..
|
| > or that LLMs are stateless and need to be onboarded into
| the codebase
|
| What? Why would that be a reasonable assumption/prediction
| for even near term agent capabilities? Providing it with some
| kind of local memory to dump its learned-so-far state of the
| world shouldn't be too hard. Isn't it supposed to already be
| treated like a junior dev? All junior devs I'm working with
| remember what I told them 2 weeks ago. Surely a coding agent
| can eventually support that too.
|
| This whole CLAUDE.md thing seems like a temporary kludge until
| such basic features are sorted out, and I'm seriously
| surprised how much time folks are spending to make that early
| broken state less painful to work with. All that precious
| knowledge y'all are building will be worthless a year or two
| from now.
| cruffle_duffle wrote:
| The stateless nature of Claude code is what annoys me so
| much. Like it has to spend so much time doing repetitious
| bootstraps. And how much it "picks up and propagates"
| random shit it finds in some document it wrote. It will
| echo back something it wrote that "stood out" and I'll
| forget where it got that and ask "find where you found that
| info so we can remove it." And it will do so but somehow
| mysteriously pick it up again and it will be because of
| some git commit message or something. It's like a tune
| stuck in its head or something only it's sticky for LLMs
| not humans.
|
| And that describes the issues I had with "automatic
| memories" features things like ChatGPT had. Turns out it is
| an awful judge of things to remember. Like it would make
| memories like "cruffle is trying to make pepper soup with
| chicken stock"! Which it would then parrot back to me at
| some point 4 months later and I'd be like "WTF I figured it
| out". The "# remember this" is much more powerful because
| know how sticky this stuff gets and id rather have it over
| index on my own forceful memories than random shit it
| decided.
|
| I dunno. All I'm saying is you are right. The future is in
| having these things do a better job of remembering. And I
| don't know if LLMs are the right tool for that. Keyword
| search isn't either though. And vector search might not be
| either--I think it suffers from the same kinds of "catchy
| tune attack" an LLM might.
|
| Somebody will figure it out somehow.
| lijok wrote:
| > All junior devs I'm working with remember what I told
| them 2 weeks ago
|
| That's why they're junior
| alwillis wrote:
| > Then why wouldn't it eventually be able to tell if I'm
| using 2 or 4 spaces and format my code accordingly?
|
| It's not that an agent doesn't know if you're using 2 or 4
| spaces in your code; it comes down to:
|
| - there are many ways to ensure your code is formatted
| correctly; that's what .editorconfig [1] is for (minimal
| example after this list)
|
| - in a halfway serious project, incorrectly formatted code
| shouldn't reach the LLM in the first place
|
| - tokens are relatively cheap but they're not free on a
| paid plan; why spend tokens on something linters and
| formatters can do deterministically and for free?
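|
| For instance, something this small (standard .editorconfig
| syntax) already settles the 2-vs-4-spaces question
| deterministically:
|
|     # .editorconfig
|     root = true
|
|     [*]
|     indent_style = space
|     indent_size = 4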
|
| If you wanted Claude Code to handle linting automatically,
| you're better off taking that out of CLAUDE.md and creating
| a Skill [2].
|
| > What? Why would that be a reasonable
| assumption/prediction for even near-term agent
| capabilities? Providing it with some kind of local memory
| to dump its learned-so-far state of the world shouldn't be
| too hard. Isn't it supposed to already be treated like a
| junior dev? All junior devs I'm working with remember what
| I told them 2 weeks ago. Surely a coding agent can
| eventually support that too.
|
| It wasn't mentioned in the article, but Claude Code, for
| example, does save each chat session by default. You can
| come back to a project and type `claude --resume` and
| you'll get a list of past Claude Code sessions that you can
| pick up from where you left off.
|
| [1]: https://editorconfig.org
|
| [2]: https://code.claude.com/docs/en/skills
| Zerot wrote:
| > Why is that good advice? If that thing is eventually
| supposed to do the most tricky coding tasks, and already a
| year ago could have won a medal at the informatics
| olympics, then why wouldn't it eventually be able to tell
| if I'm using 2 or 4 spaces and format my code accordingly?
| Either it's going to change the world, then this is a
| trivial task, or it's all vaporware, then what are we even
| discussing..
|
| This is the exact reason for the advice: The LLM already is
| able to follow coding conventions by just looking at the
| surrounding code which was already included in the context.
| So by adding your coding conventions to the claude.md, you
| are just using more context for no gain.
|
| And another reason to not use an agent for
| linting/formatting(i.e. prompting to "format this code for
| me") is that dedicated linters/formatters are faster and
| only take maybe a single cent of electricity to run whereas
| using an LLM to do that job will cost multiple dollars if
| not more.
| rootusrootus wrote:
| Ha, I just tell Claude to write it. My results have been
| generally fine, but I only use Claude on a simple codebase that
| is well documented already. Maybe I will hand-edit it to see if I
| can see any improvements.
| serial_dev wrote:
| I'm sure I'm just working like a caveman, but I simply highlight
| the relevant code, add it to the chat, and talk to these tools as
| if they were my colleagues and I'm getting pretty good results.
|
| Between about 12 and 6 months ago this was not the case (with or
| without .md files); I was getting mainly subpar results, so I'm
| assuming that the models have improved a lot.
|
| Basically, I found that they don't make that much of a difference;
| the model is either good enough or not...
|
| I know (or at least I suppose) that these markdown files could
| bring some marginal improvements, but at this point, I don't
| really care.
|
| I assume this is an unpopular take because I see so many people
| treat these files as if they were black magic or a silver bullet
| that 100x their already 1000x productivity.
| vanviegen wrote:
| > I simply highlight the relevant code, add it to the chat, and
| talk to these tools
|
| Different use case. I assume the discussion is about having the
| agent implement whole features or research and fix bugs without
| much guidance.
| 0xblacklight wrote:
| Yep it is opinionated for how to get coding agents to solve
| hard problems in complex brownfield codebases which is what
| we are focused on at humanlayer :)
| rmnclmnt wrote:
| Matches my experience also. Bothered only once to set up a
| proper CLAUDE.md file, and now never do it. Simply referring to
| the context properly for surgical recommendations and edit
| works relatively well.
|
| It feels a lot like bikeshedding to me, maybe I'm wrong
| wredcoll wrote:
| How about a list of existing database tables/columns so you
| don't need to repeat it each time?
| anonzzzies wrote:
| Claude code figures that out at startup every time. Never had
| issues with it.
| theshrike79 wrote:
| You can save some precious context by having it somewhere
| without it having to figure it out from scratch every time.
| HDThoreaun wrote:
| Do you not use a model file for your orm?
| wredcoll wrote:
| ORMs are generally a bad idea, so.. hopefully not?
| girvo wrote:
| Even without the explicit magic ORMs, with data mapper
| style query builders like Kysely and similar, I still
| find I need to marshall selected rows into objects to,
| yknow, do things with them in a lot of cases.
|
| Perhaps a function of GraphQL though.
| wredcoll wrote:
| Sure, but that's not the same thing. For example, whether
| or not you have to redeclare your entire database schema
| in a custom ORM language in a different repo.
| mattmanser wrote:
| This isn't the 00s any more.
| girvo wrote:
| I gave it a tool to execute to get that info if required, but
| it mostly doesn't need to due to Kysely migration files and
| the database type definition being enough.
| jwpapi wrote:
| === myExperience
| gonzalohm wrote:
| Probably a lot of people here disagree with this feeling. But my
| take is that if setting up all the AI infrastructure and
| onboarding to my code is going to take this amount of effort,
| then I might as well code the damn thing myself, which is what I'm
| getting paid to do (and enjoy doing anyway)
| vanviegen wrote:
| Perhaps. But keep in mind that the setup work is typically
| mostly delegated to LLMs as well.
| fragmede wrote:
| Whether it's setting up AI infrastructure or configuring
| Emacs/vim/VSCode, the important distinction to make is if the
| cost has to be paid continually, or if it's a one
| time/intermittent cost. If I had to configure my shell/git
| aliases every time I booted my computer, I wouldn't use them,
| but seeing as how they're saved in config files, they're pretty
| heavily customized by this point.
|
| Don't use AI if you don't want to, but "it takes too much
| effort to set up" is an excuse printf debuggers use to avoid
| setting up a debugger. Which is a whole other debate though.
| bird0861 wrote:
| I fully agree with this POV but for one detail; there is a
| problem with sunsetting frontier models. As we begin to adopt
| these tools and build workflows with them, they become pieces
| of our toolkit. We depend on them. We take them for granted
| even. And then the model either changes (new checkpoints,
| maybe alignment gets fiddled with) and all of the sudden
| prompts no longer yield the same results we expected from
| them after working on them for quite some time. I think the
| term for this is "prompt instability". I felt this with
| Gemini 3 (and some people had less pronounced but similar
| experience with Sonnet releases after 3.7) which for certain
| tasks that 2.5Pro excelled at..it's just unusable now. I was
| already a local model advocate before this but now I'm a
| local model zealot. I've stopped using Gemini 3 over this.
| Last night I used Qwen3 VL on my 4090 and although it was not
| perfect (sycophancy, overuse of certain cliches...nothing I
| can't get rid of later with some custom promptsets and a few
| hours in Heretic) it did a decent enough job of helping me
| work through my blindspots in the UI/UX for a project that I
| got what I needed.
|
| If we have to perform tuning on our prompts ("skills",
| agents.md/claude.md, all of the stuff a coding assistant
| packs context with) every model release then I see new model
| releases becoming a liability more than a boon.
| kissgyorgy wrote:
| I strongly disagree with the author not using /init. It takes a
| minute to run and Claude provides surprisingly good results.
| alwillis wrote:
| /init has evolved since the early days; it's more concise than
| it used to be.
| 0xblacklight wrote:
| If you find it works for you, then that's great! This post is
| mostly from our learnings from getting it to solve hard
| problems in complex brownfield codebases where auto
| generation is almost never sufficient.
| nvarsj wrote:
| It really doesn't take that much effort. Like any tool, people
| can over-optimise on the setup rather than just use it.
| nichochar wrote:
| The effort described in the article is maybe a couple hours of
| work.
|
| I understand the "enjoy doing anyway" part and it resonates,
| but not using AI is simply less productive.
| TheRoque wrote:
| > but not using AI is simply less productive
|
| Some studies show the opposite for experienced devs. And they
| also show that developers are delusional about said
| productivity gains:
| https://metr.org/blog/2025-07-10-early-2025-ai-
| experienced-o...
|
| If you have a counter-study (for experienced devs, not
| juniors), I'd be curious to see. My experience also has been
| that using AI as part of your main way to produce code, is
| not faster when you factor in everything.
| ares623 wrote:
| Curious why there hasn't been a rebuttal study to that one
| yet (or if there is I haven't seen it come up). There must
| be near infinite funding available to debunk that study
| right?
| bird0861 wrote:
| That study is garbo and I suspect you didn't even read the
| abstract. Am I right?
| gravypod wrote:
| I've heard this mentioned a few times. Here is a
| summarized version of the abstract:
|
| > ... We conduct a randomized controlled trial (RCT) ... AI
| > tools ... affect the productivity of experienced open-source
| > developers. 16 developers with moderate AI experience complete
| > 246 tasks in mature projects on which they have an average of 5
| > years of prior experience. Each task is randomly assigned to
| > allow or disallow usage of early-2025 AI tools. ... developers
| > primarily use Cursor Pro ... and Claude 3.5/3.7 Sonnet. Before
| > starting tasks, developers forecast that allowing AI will
| > reduce completion time by 24%. After completing the study,
| > developers estimate that allowing AI reduced completion time by
| > 20%. Surprisingly, we find that allowing AI actually increases
| > completion time by 19%--AI tooling slowed developers down. This
| > slowdown also contradicts predictions from experts in economics
| > (39% shorter) and ML (38% shorter). To understand this result,
| > we collect and evaluate evidence for 21 properties of our
| > setting that a priori could contribute to the observed slowdown
| > effect--for example, the size and quality standards of
| > projects, or prior developer experience with AI tooling.
| > Although the influence of experimental artifacts cannot be
| > entirely ruled out, the robustness of the slowdown effect
| > across our analyses suggests it is unlikely to primarily be a
| > function of our experimental design.
|
| So what we can gather:
|
| 1. 16 people were randomly given tasks to do
|
| 2. They knew the codebase they worked on pretty well
|
| 3. They said AI would help them work 24% faster (before
| starting tasks)
|
| 4. They said AI made them ~20% faster (after completion
| of tasks)
|
| 5. ML Experts claim that they think programmers will be
| ~38% faster
|
| 6. Economists say ~39% faster.
|
| 7. We measured that people were actually 19% slower
|
| This seems to be done on Cursor, with big models, on
| codebases people know. There are definitely problems with
| industry-wide statements like this but I feel like the
| biggest area AI tools help me is if I'm working on
| something I know nothing about. For example: I am really
| bad at web development so CSS / HTML is easier to edit
| through prompts. I don't have trouble believing that I
| would be slower trying to make an edit to code that I
| already know how to make.
|
| Maybe they would see the speedups by allowing the
| engineer to select when to use the AI assistance and when
| not to.
| saturatedfat wrote:
| it doesn't control for skill using models/experience using
| models. this looks VERY different at hour 1000 and hour
| 5000 than hour 100.
| brumar wrote:
| Lazy of me not to check whether I remember correctly, but
| the dev that got productivity gains was a regular user of
| Cursor.
| svachalek wrote:
| Minutes really, despite what the article says you can get 90%
| of the way there by telling Claude how you want the project
| documentation structured and just let it do it. Up to you if
| you really want to tune the last 10% manually, I don't. I
| have been using basically the same system and when I tell
| Claude to update docs it doesn't revert to one big Claude.md,
| it maintains it in a structure like this.
| globular-toast wrote:
| It's a couple of hours right now, then another couple of
| hours "correcting" the AI when it still goes wrong, another
| couple of hours tweaking the file again, another couple of
| hours to update when the model changes, another couple of
| hours when someone writes a new blog post with another method
| etc.
|
| There's a huge difference between investing time into a
| deterministic tool like a text editor or programming language
| and a moving target like "AI".
|
| The difference between programming in Notepad in a language
| you don't know and using "AI" will be huge. But the
| difference between being fluent in a language and having a
| powerful editor/IDE? Minimal at best. I actually think
| productivity is worse because it tricks you into wasting time
| via the "just one more roll" (ie. gambling) mentality. Not to
| mention you're not building that fluency or toolkit for
| yourself, making you barely more valuable than the "AI"
| itself.
| fragmede wrote:
| You say that as if tech hasn't always been a moving target
| anyway. The skills I spent months learning a specific
| language and IDE became obsolete with the next job and the
| next paradigm shift. That's been one of the few consistent
| themes throughout my career. Hours here and there, spread
| across months and years, just learning whatever was new.
| Sometimes, like with Linux, it really paid off. Other
| times, like PHP, it did, and then fizzled out.
|
| --
|
| The other thing is, this need for determinism bewilders me.
| I mean, I get where it comes from, we want nice,
| predictable reliable machines. But how deterministic does
| it need to be? If today, it decides to generate code and
| the variable is called fileName, and tomorrow it's
| filePath, as long as it's passing tests, what do I care
| that it's not totally deterministic and the names of the
| variables it generates are different? as long as it's
| consistent with existing code, and it passes tests, what's
| the importance of it being deterministic to a computer
| science level of rigor? It reminds me about the travelling
| salesman problem, or the knapsack problem. Both NP hard,
| but users don't care about that. They just want the
| computer to tell them something good enough for them to go
| on about their day. So if a customer comes up to you and
| offers you a pile of money to solve either one of those
| problems, do I laugh in their face, knowing damn well I
| won't be the one to prove that NP = P, or do I explain to
| them the situation, and build them software that will do
| the best it can, with however much compute resources
| they're willing to pay for?
| Havoc wrote:
| A lot of the style stuff you can write once and reuse. I
| started splitting mine into overall and project specific files
| for this reason
|
| Universal has stuff I always want (use uv instead of pip etc)
| while the other describes the tech choices for this project
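|
| Concretely, something like this (Claude Code also reads a global
| ~/.claude/CLAUDE.md; the project-specific contents here are just
| made-up examples):
|
|     # ~/.claude/CLAUDE.md  (universal)
|     - Use uv instead of pip for Python projects.
|
|     # <repo>/CLAUDE.md  (project-specific)
|     - Backend is Go + Postgres; migrations live in db/migrations/.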
| ctoth wrote:
| I've gotten quite a bit of utility out of my current setup[0]:
|
| Some explicit things I found helpful: Have the agent address you
| as something specific! This way you know if the agent is paying
| attention to your detailed instructions.
|
| Rationality, as in the stuff practiced on early Less Wrong, gives
| a great language for constraining the agent, and since it's read
| The Sequences and everything else you can include pointers and
| the more you do the more it will nudge it into that mode of
| thought.
|
| The explicit "This is what I'm doing, this is what I expect"
| pattern has been hugely useful for both me monitoring it/coming
| back to see what it did, and it itself. It makes it more likely
| to recover when it goes down a bad path.
|
| The system reminder this article mentions is definitely there but
| I have not noticed it messing much with adherence. I wish there
| were some sort of power user mode to turn it off though!
|
| Also, this is probably too long! But I have been experimenting
| and iterating for a while, and this is what is working best
| currently. Not that I've been able to hold any other part
| constant -- Opus 4.5 really is remarkable.
|
| [0]:
| https://gist.github.com/ctoth/d8e629209ff1d9748185b9830fa4e7...
| johnfn wrote:
| I was expecting the traditional AI-written slop about AI, but
| this is actually really good. In particular, the "As instruction
| count increases, instruction-following quality decreases
| uniformly" section and associated graph is truly fantastic! To my
| mind, the ability to follow long lists of rules is one of the
| most obvious ways that virtually all AI models fail today. That's
| why I think that graph is so useful -- I've never seen someone go
| and systematically measure it before!
|
| I would love to see it extended to show Codex, which to my mind
| is by far the best at rule-following. (I'd also be curious to see
| how Gemini 3 performs.)
| 0xblacklight wrote:
| I looked when I wrote the post but the paper hasn't been
| revisited with newer models :/
| boredtofears wrote:
| It would be nice to see an actual example of what a good
| claude.md that implements all of these recommendations looks
| like.
| huqedato wrote:
| Looking for a similar GEMINI.md
| 0xblacklight wrote:
| It might support AGENTS.md, you could check the site and see if
| it's there
| vunderba wrote:
| From the article:
|
| _> We recommend keeping task-specific instructions in separate
| markdown files with self-descriptive names somewhere in your
| project. Then, in your CLAUDE.md file, you can include a list of
| these files with a brief description of each, and instruct Claude
| to decide which (if any) are relevant and to read them before it
| starts working._
|
| I've been doing this since the early days of agentic coding
| though I've always personally referred to it as the _Table-of-
| Contents approach_ to keep the context window relatively
| streamlined. Here's a snippet of my CLAUDE.md file that
| demonstrates this approach:
|
|     # Documentation References
|     - When adding CSS, refer to: docs/ADDING_CSS.md
|     - When adding assets, refer to: docs/ADDING_ASSETS.md
|     - When working with user data, refer to: docs/STORAGE_MANAGER.md
|
| Full CLAUDE.md file for reference:
|
| https://gist.github.com/scpedicini/179626cfb022452bb39eff10b...
| sothatsit wrote:
| I have also done this, but my results are very hit or miss.
| Claude rarely actually reads the other documentation files I
| point it to.
| dhorthy wrote:
| I think the key here is "if X then Y syntax" - this seems to
| be quite effective at piercing through the "probably ignore
| this" system message by highlighting WHEN a given instruction
| is "highly relevant"
| throwaway314155 wrote:
| What?
| xpe wrote:
| It helps when questions intended to resolve ambiguity are
| not themselves hopelessly ambiguous.
|
| See also: "Help me help you" -
| https://en.wikipedia.org/wiki/Jerry_Maguire
| Sammi wrote:
| Yeah I don't trust any agent to follow document references
| consistently. I just manually add the relevant files to
| context every single time.
|
| Though I know some people who have built an mcp that does
| exactly this: https://www.usable.dev/
|
| It's basically a chat-bot frontend to your markdown files,
| with both rag and graph db indexes.
| wry_discontent wrote:
| That makes sense given that it's trained on real world
| developers.
| dimitri-vs wrote:
| Correct me if I'm wrong but I think the new "skills" are
| exactly this, but better.
| vunderba wrote:
| Yeah I think "Skills" are just a more codified folder based
| approach to this TOC system. The main reason I haven't
| migrated yet is that the TOC approach lends itself better to
| the more generic AGENTS.md style - allowing me to swap over
| to alternative LLMs (such as Gemini) relatively easily.
| stpedgwdgfhgdd wrote:
| Indeed, the article links to the skill documentation which
| says:
|
| Skills are modular capabilities that extend Claude's
| functionality through organized folders containing
| instructions, scripts, and resources.
|
| And
|
| Extend Claude's capabilities for your specific workflows
|
| E.g. building your project is definitely a workflow.
|
| It also makes sense to put as much as you can into a skill, as
| this is an optimized mechanism for claude code to retrieve
| relevant information based on the skill's frontmatter.
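|
| For reference, a skill is basically a folder with a SKILL.md
| whose frontmatter tells Claude when to pull it in; a sketch (the
| name and body here are made up):
|
|     ---
|     name: build-and-test
|     description: How to build and test this project. Use when asked
|       to build, run tests, or cut a release.
|     ---
|     Run `make build` and `make test` from the repo root.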
| Zarathruster wrote:
| I've done this too. The nice side-benefit of this approach is
| that it also serves as good documentation for other humans
| (including your future self) when trying to wrap their heads
| around what was done and why. In general I find it helpful to
| write docs that help both humans and agents to understand the
| structure and purpose of my codebase.
| tietjens wrote:
| I think this could work really well for infrastructure/ops style
| work where the LLM will not be able to grasp the full context of
| say the network from just a few files that you have open.
|
| But as others are saying this is just basic documentation that
| should be done anyway.
| acedTrex wrote:
| "Here's how to use the slop machine better" is such a ridiculous
| pretense for a blog or article. You simply write a sentence and
| it approximates it. That is hardly worth any literature being
| written as it is so self obvious.
| 0xblacklight wrote:
| This is an excellent point - LLMs are autoregressive next-token
| predictors, and output token quality is a function of input
| token quality
|
| Consider that if the only code you get out of the
| autoregressive token prediction machine is slop, that this
| indicates more about the quality of your code than the quality
| of the autoregressive token prediction machine
| acedTrex wrote:
| > that this indicates more about the quality of your code
|
| Considering that the "input" to these models is essentially
| all public code in existence, the direct context input is a
| drop in the bucket.
| johnsmith1840 wrote:
| I don't get the point. Point it at your relevant files, ask it to
| review, discuss the update, refine its understanding, and then
| tell it to go.
|
| I have found that more context, comments, and info damage quality
| on hard problems.
|
| I actually for a long time now have two views for my code.
|
| 1. The raw code with no empty space or comments.
| 2. Code with comments.
|
| I never give the second to my LLM. The more context you give, the
| lower its upper end of quality becomes. This is just a habit
| I've picked up using LLMs every day, hours a day, since GPT-3.5;
| it allows me to reach farther into extreme complexity.
|
| I suppose I don't know what most people are using LLMs for, but
| the higher the complexity your work entails, the less noise you
| should inject into it. It's tempting to add massive amounts of
| context, but I've routinely found that fails on the higher levels
| of coding complexity and uniqueness. It was more apparent in
| earlier models; newer ones will handle tons of context, you just
| won't be able to get those upper ends of quality.
|
| The compute-to-information ratio is all that matters. Compute is
| capped.
| ra wrote:
| This is exactly right. Attention is all you need. It's all
| about attention. Attention is finite.
|
| The more data you load into context, the more you dilute
| attention.
| throwuxiytayq wrote:
| people who criticize LLMs for merely regurgitating
| statistically related token sequences have very clearly never
| read a single HN comment
| nightski wrote:
| IMO within the documentation .md files the information density
| should be very high. Higher than trying to shove the entire
| codebase into context that is for sure.
| johnsmith1840 wrote:
| You definitely don't just push the entire code base. Previous
| models required you to be meticulous about your input. A
| function here a class there.
|
| Even now if I am working on REALLY hard problems I will still
| manually copy and paste code sections out for discussion and
| algorithm designs. Depends on complexity.
|
| This is why I still believe OpenAI o1-pro was the best model
| I've ever seen. The amount of compute you could throw at a
| problem was absurd.
| senshan wrote:
| > I never give the second to my LLM.
|
| How do you practically achieve this? Honest question. Thanks
| johnsmith1840 wrote:
| Custom scripts.
|
| 1. Turn off
| 2. Code
| 3. Turn on
| 4. Commit
|
| I also delete all LLM comments; they 100% poison your
| codebase.
| senshan wrote:
| >> 1. The raw code with no empty space or comments. 2. Code
| with comments
|
| > 1. Turn off 2. Code 3. Turn on 4. Commit
|
| What does it mean "turn off" / "turn on"?
|
| Do you have a script to strip comments?
|
| Okay, after the comments were stripped, does this become
| the common base for 3-way merge?
|
| After modification of the code stripped of the comments, do
| you apply 3-way merge to reconcile the changes and the
| comments?
|
| This seems a lot of work. What is the benefit? I mean
| demonstrable benefit.
|
| How does it compare to instructing through AGENTS.md to
| ignore all comments?
| johnsmith1840 wrote:
| Telling an AI to ignore comments != no comments; that's
| pretty fundamental to getting my point.
| senshan wrote:
| >> 1. The raw code with no empty space or comments. 2.
| Code with comments
|
| > 1. Turn off 2. Code 3. Turn on 4. Commit
|
| So can you describe your "turn off" / "turn on" process
| in practical terms?
|
| Asking simply because saying "Custom scripts" is similar
| to saying "magic".
| Mtinie wrote:
| > 1. The raw code with no empty space or comments. 2. Code with
| comments
|
| I like the sound of this but what technique do you use to
| maintain consistency across both views? Do you have a post-
| modification script which will strip comments and extraneous
| empty space after code has been modified?
| wormpilled wrote:
| Curious if that is the case, how you would put comments back
| too? Seems like a mess.
| Mtinie wrote:
| As I think more on how this could work, I'd treat the fully
| commented code as the source of truth (SOT).
|
| 1. SOT through a processor to strip comments and extra
| spaces. Publish to feature branch.
|
| 2. Point Claude at feature branch. Prompt for whatever
| changes you need. This runs against the minimalist feature
| branch. These changes will be committed with comments and
| readable spacing for the new code.
|
| 3. Verify code changes meet expectations.
|
| 4. Diff the changes from minimal version, and merge only
| that code into SOT.
|
| Repeat.
| johnsmith1840 wrote:
| Just test it, maybe you won't get a boost.
|
| 1. Run into a problem you and AI can't solve.
| 2. Drop all comments.
| 3. Restart the debug/design session.
| 4. Solve it and save results.
| 5. Revert code to have comments and put the update in.
|
| If that still doesn't work: Step 2.5 drop all unrelated
| code from context
| johnsmith1840 wrote:
| Custom scripts and basic merge logic, but manual work still
| happens around modifications. Forces me to update stale comments
| around changes anyhow.
|
| I first "discovered" it because I repeatedly found LLM
| comments poisoned my code base over time and limited its
| upper end of ability.
|
| Easy to try: just drop comments around a problem and see the
| difference. I was previously doing that and then manually
| updating the original.
| Aurornis wrote:
| > I have found that more context comments and info damage
| quality on hard problems.
|
| There can be diminishing returns, but every time I've used
| Claude Code for a real project I've found myself repeating
| certain things over and over again and interrupting tool usage
| until I put it in the Claude notes file.
|
| You shouldn't try to put everything in there all the time, but
| putting key info in there has been very high ROI for me.
|
| Disclaimer: I'm a casual user, not a hardcore vibe coder.
| Claude seems much more capable when you follow the happy path
| of common projects, but gets constantly turned around when you
| try to use new frameworks and tools and such.
| lostdog wrote:
| Agreed, I don't love the CLAUDE.md that gets autogenerated.
| It's too wordy for me to understand and for the model to
| follow consistently.
|
| I like to write my CLAUDE.md directly, with just a couple
| paragraphs describing the codebase at a high level, and then
| I add details as I see the model making mistakes.
| MarkMarine wrote:
| Setting hooks has been super helpful for me, you can reject
| certain uses of tools (don't touch my tests for this session)
| with just simple scripting code.
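|
| For anyone curious about the shape of it: hooks live in
| .claude/settings.json, and a PreToolUse hook whose command exits
| with code 2 blocks the tool call (per the docs, from memory; the
| matcher and script path here are made up):
|
|     {
|       "hooks": {
|         "PreToolUse": [
|           {
|             "matcher": "Edit|Write",
|             "hooks": [
|               { "type": "command", "command": "./scripts/protect-tests.sh" }
|             ]
|           }
|         ]
|       }
|     }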
| brianwawok wrote:
| Git lint hook has been key. No matter how many times I told
| it, it lints randomly. Sometimes not at all. Sometimes
| before running tests (but not after fixing test failures).
| schrodinger wrote:
| Genuinely curious -- how did you isolate the effect of
| comments/context on model performance from all the other
| variables that change between sessions (prompt phrasing, model
| variance, etc)? In other words, how did you validate the
| hypothesis that "turning off the comments" (assuming you mean
| stripping them temporarily...) resulted in an objectively
| superior experience?
|
| What did your comparison process look like? It feels
| intuitively accurate and validates my anecdotal impression but
| I'd love to hear the rigor behind your conclusions!
| johnsmith1840 wrote:
| I was already in the habit of copy-pasting relevant code
| sections to maximize reasoning performance and squeeze more out
| of earlier, weaker models on stubborn problems. (Still do
| this on really nasty ones)
|
| It's also easy to notice LLMs create garbage comments that
| get worse over time. I started deleting all comments manually
| alongside manual snippet selection to get max performance.
|
| Then started just routinely deleting all comments pre big
| problem solving session. Was doing it enough to build some
| automation.
|
| Maybe high quality human comments improve ability? Hard to
| test in a hybrid code base.
| saturatedfat wrote:
| could u share some more intuition as to why you started
| believing that? are there ANY comments that are useful?
| stpedgwdgfhgdd wrote:
| The comments are what makes the model understand your code much
| better.
|
| See it as a human, the comments are there to speed up
| understanding of the code.
| xpe wrote:
| > I have found that more context comments and info damage
| quality on hard problems.
|
| I'm skeptical this is a valid generalization over what was
| directly observed. [1] We would learn more if they wrote a more
| detailed account of their observations. [2]
|
| I'd like to draw a parallel to another area of study possibly
| unfamiliar to many of us. Anthropology faced similar issues
| until Geertz's 1970s reform emphasized "thick description" [3]
| meaning detailed contextual observations instead of thin
| generalization.
|
| [1]: I would not draw this generalization. I've found that
| adding guidelines (on the order of 10k tokens) to my CLAUDE.md
| has been beneficial across all my conversations. At the same
| time, I have not constructed anything close to _study_ of
| variations of my approach. And the underlying models are a
| moving target. I will admit that some of my guidelines were
| added to address issues I saw over a year ago and may be
| nothing more than vestigial appendages nowadays. This is why
| I'm reluctant to generalize.
|
| [2]: What kind of "hard problems"? What is meant by "more"
| exactly? (Going from 250 to 500 tokens? 1000 to 2000? 2500 to
| 5000? &c) How much overlap exists between the CLAUDE.md content
| items? How much ambiguity? How much contradiction?
|
| [3]: https://en.wikipedia.org/wiki/Thick_description
| malshe wrote:
| I have been using Claude.md to stuff way too many instructions so
| this article was an eye opener. Btw, any tips for Claude.md when
| one uses subagents?
| 0xcb0 wrote:
| Here is my take on writing a good claude.md. I had very good
| results with my 3-file approach. It has also been inspired by
| the great blog posts that Human Layer is publishing from time to
| time https://github.com/marcuspuchalla/claude-project-management
| mmaunder wrote:
| That paper the article references is old at this point. No GPT
| 5.1, no Gemini 3, which both were game changers. I'd love to see
| their instruction following graphs.
| 0xblacklight wrote:
| Same!
| grishka wrote:
| Oh yeah I added a CLAUDE.md to my project the other day:
| https://github.com/grishka/Smithereen/blob/master/CLAUDE.md
|
| Is it a good one?
| lijok wrote:
| I copy/pasted it into my codebase to see if it's any good and
| now Claude is refusing to do any work? I asked Copilot to
| investigate why Claude is not working but it too is not
| working. Do you know what happened?
| wizzledonker wrote:
| Definitely a good one - probably one of the best CLAUDE.md
| files you can put in any repository if you care about your
| project at all.
| max-privatevoid wrote:
| The only good Claude.md is a deleted Claude.md.
| rvz wrote:
| This is the only correct answer.
| VimEscapeArtist wrote:
| What's the actual completion rate for Advent of Code? I'd bet the
| majority of participants drop off before day 25, even among those
| aiming to complete it.
|
| Is this intentional? Is AoC designed as an elite challenge, or is
| the journey more important than finishing?
| philipwhiuk wrote:
| Wrong article.
|
| I rarely get past 18 or so. The stats for last year are here:
| https://adventofcode.com/2024/stats
| DR_MING wrote:
| I already forgot CLAUDE.md, I generate and update it by AI, I
| prefer to keep design, tasks, docs folder instead. It is always
| better to ask it to read a some spec docs and read the real code
| first before doing anything.
| nico wrote:
| > Claude often ignores CLAUDE.md
|
| > The more information you have in the file that's not
| universally applicable to the tasks you have it working on, the
| more likely it is that Claude will ignore your instructions in
| the file
|
| Claude.md files can get pretty long, and many times Claude Code
| just stops following a lot of the directions specified in the
| file.
|
| A friend of mine tells Claude to always address him as "Mr
| Tinkleberry". He says he can tell Claude has stopped paying
| attention to the instructions in Claude.md when Claude stops
| calling him "Mr Tinkleberry" consistently.
| stingraycharles wrote:
| That's hilarious and a great way to test this.
|
| What I'm surprised about is that OP didn't mention having
| multiple CLAUDE.md files in each directory, specifically
| describing the current context / files in there. Eg if you have
| some database layer and want to document some critical things
| about that, put it in "src/persistence/CLAUDE.md" instead of
| the main one.
|
| Claude pulls in those files automatically whenever it tries to
| read a file in that directory.
|
| I find that to be a very effective technique to leverage
| CLAUDE.md files and be able to put a lot of content in them,
| but still keep them focused and avoid context bloat.
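| As an illustration (the path and rules here are hypothetical,
| not from a real project), such a file can be as small as:
|
| ```markdown
| <!-- src/persistence/CLAUDE.md -->
| # Persistence layer
| - All queries go through the repository classes; never call
|   the ORM directly from request handlers.
| - Schema changes always get a new migration file; never edit
|   an already-applied migration.
| - Run the persistence test suite after changing anything in
|   this directory.
| ```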
| sroussey wrote:
| Ummm... sounds like that directory should have a readme. And
| Claude should read readme files.
| stingraycharles wrote:
| READMEs are written for people, CLAUDE.mds are written for
| coding assistants. I don't write "CRITICAL (PRIORITY 0):"
| in READMEs.
|
| The benefit of CLAUDE.md files is that they're pulled in
| automatically, eg if Claude wants to read
| "tests/foo_test.py" it will automatically pull in
| "tests/CLAUDE.md" (if it exists).
| adastra22 wrote:
| Is this documented anywhere? This is the first I have
| ever heard of it.
| grumbelbart wrote:
| Here: https://www.anthropic.com/engineering/claude-code-
| best-pract...
|
| claude.md seems to be important enough to be their very
| first point in that document.
| ffsm8 wrote:
| Naw man, it's the first point because in April Claude
| Code didn't really have anything else that somewhat
| worked.
|
| I tried to use that effectively; I even started a new
| greenfield project just to make sure I was testing it under
| ideal circumstances - and while it somewhat worked, it
| was always super lackluster, and it was way more effective
| to explicitly add the context manually via prepared md files
| you just reference in the prompt.
|
| I'd tell anyone to go for skills first before littering
| your project with these config files everywhere
| llbeansandrice wrote:
| If AI is supposed to deliver on this magical no-lift ease
| of use task flexibility that everyone likes to talk about
| I think it should be able to work with a README instead
| of clogging up ALL of my directories with yet another
| fucking config file.
|
| Also this isn't portable to other potential AI tools. Do
| I need 3+ md files in every directory?
| stingraycharles wrote:
| It's not delivering on magical stuff. Getting real
| productivity improvements out of this requires
| engineering and planning and it needs to be approached as
| such.
|
| One of the big mistakes I think is that all these tools
| are over-promising on the "magic" part of it.
|
| It's not. You need to really learn how to use all these
| tools effectively. This is not done in days or even weeks;
| it takes months, in the same way becoming proficient
| in Emacs or vim or a programming language does.
|
| Once you've done that, though, it can absolutely enhance
| productivity. Not 10x, but definitely in the area of 2x.
| Especially for projects / domains you're uncomfortable
| with.
|
| And of course the most important thing is that you need
| to enjoy all this stuff as well, which I happen to do. I
| can totally understand the resistance as it's a shitload
| of stuff you need to learn, and it may not even be
| relevant anymore next year.
| giancarlostoro wrote:
| Yeah, I feel like on average I still spend a similar
| amount of time developing but drastically less time
| fixing obscure bugs, because once it codes the feature
| and I describe the bugs, it fixes them; the rest of my
| time is spent testing and reviewing code.
| tsimionescu wrote:
| While I believe you're probably right that getting any
| productivity gains from these tools requires an
| investment, I think calling the process "engineering" is
| really stretching the meaning of the word. It's really
| closer to ritual magic than any solid engineering
| practices at this point. People have guesses and
| practices that may or may not actually work for them
| (since measuring productivity increases is difficult if
| not impossible), and they teach others their magic
| formulas for controlling the demon.
| gtaylor wrote:
| Most countries don't have a notion of a formally licensed
| software engineer, anyway. Arguing what is and is not
| engineering is not useful.
| llbeansandrice wrote:
| I think it's relevant when people keep using terms like
| "prompt engineering" to try and beef up this charade of
| md files that don't even seem to work consistently.
|
| This is a far far cry from even writing yaml for
| Github/Gitlab CICD pipelines. Folks keep trying to say
| "engineering" when every AI thread like this seems to
| push me more towards "snake oil" as an appropriate term.
| tsimionescu wrote:
| Most countries don't have a notion of a formally licensed
| physicist either. That doesn't make it right to call
| astrology physics. And all of the practices around using
| LLM agents for coding are a lot closer to astrology than
| they are to astronomy.
|
| I was replying to someone who claimed that getting real
| productivity gains from this tool requires engineering
| and needs to be approached as such. It also compared
| learning to use LLM agents to learning to code in emacs
| or vim, or learning a programming language - things which
| are nothing like learning to control an inherently
| stochastic tool that can't even be understood using any
| of our regular scientific methods.
| fpauser wrote:
| >> [..] and it may not even be relevant anymore next
| year.
|
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| ^
| nineteen999 wrote:
| Learning how to equip a local LLM with tools it can
| interact with to extend its capabilities has been a
| lot of fun for me and is a great educational experience
| for anyone who is interested. Just another tool for the
| toolchest.
| llbeansandrice wrote:
| My issue is not with learning. This "tool" has an
| incredibly shallow learning curve. My issue is that I'm
| having to make way for these "tools" that everyone says
| vastly increases productivity but seems to just churn out
| tech-debt as quickly as it can write it.
|
| It's a large leap to "requires engineering and planning"
| when no one even in this thread can seem to agree on the
| behavior of any of these "tools". Some comments tell
| anecdotes of not getting the agents to listen until the
| context of the whole world is laid out in these md files.
| Others say the only way is to keep the context tight and
| focused, going so far as to have written _yet more tools_
| to remove and re-add code comments so they don't "poison"
| the context.
|
| I am slightly straw-manning, but the tone in this thread
| has already shifted from a few months ago where these
| "tools" were going to immediately give huge productivity
| gains but now you're telling me they need 1) their own
| special files everywhere (again, this isn't even agreed
| on) and 2) "engineering and planning...not done in days
| or weeks even".
|
| The entire economy is propped up on this tech right now
| and no one can even agree on whether it's effective or
| how to use it properly? Not to mention the untold damage
| it is doing to learning outcomes.
| whywhywhywhy wrote:
| > Do I need 3+ md files in every directory?
|
| Don't worry, as of about 6 weeks ago when they changed
| the system prompt, Claude will make sure every folder has
| way more than 3 .md files, seeing as it often writes 2 or
| more per task, so if you don't clean them up...
| solumunus wrote:
| Strange. I haven't experienced this a single time and I
| use it almost all day every day.
| beardedwizard wrote:
| That is strange because it's been going on since sonnet
| 4.5 release.
| mewpmewp2 wrote:
| Is your logic that unless something is perfect it should
| not be used even though it is delivering massive
| productivity gains?
| swiftcoder wrote:
| > it is delivering massive productivity gains
|
| [citation needed]
|
| Every article I can find about this is citing the
| valuation of the S&P500 as evidence of the productivity
| gains, and that feels very circular
| brigandish wrote:
| I often can't tell the difference between my Readme and
| Claude files to the point that I cannibalise the Claude
| file for the Readme.
|
| It's the difference between instructions for a user and
| instructions for a developer, but in coding projects
| that's not much different.
| pmarreck wrote:
| > "CRITICAL (PRIORITY 0):"
|
| There's no need for this level of performative
| ridiculousness with AGENTS.md (Codex) directives, FYI.
| grayhatter wrote:
| > A friend of mine tells Claude to always address him as "Mr
| Tinkleberry", he says he can tell Claude is not paying
| attention to the instructions on Claude.md, when Claude stops
| calling him "Mr Tinkleberry" consistently
|
| this is a totally normal thing that everyone does, that no one
| should view as a signal of a psychotic break from reality...
|
| is your friend in the room with us right now?
|
| I doubt I'll ever understand the lengths AI enjoyers will go
| through just to avoid any amount of independent thought...
| crystal_revenge wrote:
| I suspect you're misjudging the friend here. This sounds more
| like the famous "no brown m&ms" clause in the Van Halen
| performance contract. As ridiculous as the request is, it
| being followed provides strong evidence that the rest (and
| more meaningful) of the requests are.
|
| Sounds like the friend understands quite well how LLMs
| actually work and has found a clever way to be signaled when
| it's starting to go off the rails.
| davnicwil wrote:
| It's also a common tactic for filtering inbound email.
|
| Mention that people may optionally include some word like
| 'orange' in the subject line to tell you they've come via
| some place like your blog or whatever it may be, and have
| read at least carefully enough to notice this.
|
| Of course ironically that trick's probably trivially broken
| now because of use of LLMs in spam. But the point stands,
| it's an old trick.
| kiernan wrote:
| Could try asking for a seahorse emoji in addition...
| mosselman wrote:
| Apart from the fact that not even every human would read
| this and add it to the subject, this would still work.
|
| I doubt there is any spam machine out there the quickly
| tries to find peoples personal blog before sending them
| viagra mail.
|
| If you are being targeted personally, then of course all
| bets are off, but that would've been the case with or
| without the subject-line-trick
| grayhatter wrote:
| > I suspect you're misjudging the friend here. This sounds
| more like the famous "no brown m&ms" clause in the Van
| Halen performance contract. As ridiculous as the request
| is, it being followed provides strong evidence that the
| rest (and more meaningful) of the requests are.
|
| I'd argue it's more that you've bought so much into the
| idea that this is reasonable that you're also willing to go
| to extreme lengths to retcon and pretend like this is
| sane.
|
| Imagine two different worlds, one where the tools that
| engineers use, have a clear, and reasonable way to detect
| and determine if the generative subsystem is still on the
| rails provided by the controller.
|
| And another world where the interface is completely devoid
| of any sort of basic introspection interface, and because
| it's a problematic mess, all the way down, everyone invents
| some asinine way that they believe provides some sort of
| signal as to whether or not the random noise generator has
| gone off the rails.
|
| > Sounds like the friend understands quite well how LLMs
| actually work and has found a clever way to be signaled
| when it's starting to go off the rails.
|
| My point is that while it's a cute hack, if you step back
| and compare it objectively to what good engineering would
| look like, it's wild that so many people are just willing to
| accept this interface as "functional" because it means they
| don't have to do the thinking that's required to produce the
| output the AI emits via the specific randomness
| function used.
|
| Imagine these two worlds actually do exist; and instead of
| using the real interface that provides a clear bool answer
| to "the generative system has gone off the rails" they
| *want* to be called Mr Tinkerberry
|
| Which world do you think this example lives in? You could
| convince me, Mr Tinkleberry is a cute example of the
| latter, obviously... but it'd take effort to convince me
| that this reality is half reasonable or that's it's
| reasonable that people who would want to call themselves
| engineers should feel proud to be a part of this one.
|
| Before you try to strawman my argument, this isn't a
| gatekeeping argument. It's only a critical take on the
| interface options we have to understand something that
| might as well be magic, because that serves the snakeoil
| sales much better.
|
| > > Is the magic token machine working?
|
| > Fuck I have no idea dude, ask it to call you a funny
| name, if it forgets the funny name it's probably broken,
| and you need to reset it
|
| Yes, I enjoy working with these people and living in this
| world.
| gyomu wrote:
| It is kind of wild that not that long ago the general
| sentiment in software engineering (at least as observed
| on boards like this one) seemed to be about valuing
| systems that were understandable, introspectable, with
| tight feedback loops, within which we could compose
| layers of abstractions in meaningful and predictable ways
| (see for example the hugely popular - at the time - works
| of Chris Granger, Bret Victor, etc).
|
| And now we've made a complete 180 and people are getting
| excited about proprietary black boxes and "vibe
| engineering" where you have to pretend like the computer
| is some amnesic schizophrenic being that you have to
| coerce into maybe doing your work for you, but you're
| never really sure whether it's working or not because who
| wants to read 8000 line code diffs every time you ask
| them to change something. And never mind if your feedback
| loops are multiple minutes long because you're waiting on
| some agent to execute some complex network+GPU bound
| workflow.
| adastra22 wrote:
| You don't think people are trying very hard to understand
| LLMs? We recognize the value of interpretability. It is
| just not an easy task.
|
| It's not the first time in human history that our ability
| to create things has exceeded our capacity to understand.
| gyomu wrote:
| Your comment would be more useful if you could point us
| to some concrete tooling that's been built out in the
| last ~3 years that LLM assisted coding has been around to
| improve interpretability.
| adastra22 wrote:
| That would be the exact opposite of my claim: it is a
| very hard problem.
| grayhatter wrote:
| > You don't think people are trying very hard to
| understand LLMs? We recognize the value of
| interpretability. It is just not an easy task.
|
| I think you're arguing against a tangential position to
| both me and the person this directly replies to. It can
| be hard to use and understand something, but if you have
| a magic box and you can't tell whether it's working, it
| doesn't belong anywhere near the systems that other
| humans use. The people that use the code you're about to
| commit to whatever repo you're generating code for, all
| deserve better than to be part of your unethical science
| experiment.
|
| > It's not the first time in human history that our
| ability to create things has exceeded our capacity to
| understand.
|
| I don't agree this is a correct interpretation of the
| current state of generative transformer based AI. But
| even if you wanted to try to convince me; my point would
| still be, this belongs in a research lab, not anywhere
| near prod. And that wouldn't be a controversial idea in
| the industry.
| nineteen999 wrote:
| > It doesn't belong anywhere near the systems that other
| humans use
|
| Really for those of us who _actually_ work in critical
| systems (emergency services in my case) - of course we're
| not going to start patching the core applications with
| vibe code.
|
| But yeah, that frankenstein reporting script that half a
| dozen amateur hackers made a mess of over 20 years
| instead of refactoring and redesigning? That's prime
| fodder for this stuff. NOBODY wants to clean that stuff
| up by hand.
| grayhatter wrote:
| > Really for those of us who actually work in critical
| systems (emergency services in my case) - of course we're
| not going to start patching the core applications with
| vibe code.
|
| I used to believe that no one would seriously consider
| this too... but I don't believe that this is a safe
| assumption anymore. You might be the exception, but there
| are many more people who don't consider the implications
| of turning over said intellectual control.
|
| > But yeah, that frankenstein reporting script that half
| a dozen amateur hackers made a mess of over 20 years
| instead of refactoring and redesigning? That's prime
| fodder for this stuff. NOBODY wants to clean that stuff
| up by hand.
|
| It's horrible, no one currently understands it, so let
| the AI do it, so that still, no one will understand it,
| but at least this one bug will be harder to trigger.
|
| I don't agree that harder to trigger bugs are better than
| easy to trigger bugs. And from my view, the argument that
| "it's currently broken now, and hard to fix!" isn't
| exactly one I find compelling for leaving it that
| way.
| adastra22 wrote:
| We used the steam engine for 100 years before we had a
| firm understanding of why it worked. We still don't
| understand how ice skating works. We don't have a
| physical understanding of semi-fluid flow in grain silos,
| but we've been using them since prehistory.
|
| I could go on and on. The world around you is full of not
| well understood technology, as well as non deterministic
| processes. We know how to engineer around that.
| grayhatter wrote:
| > We used the steam engine for 100 years before we had a
| firm understanding of why it worked. We still don't
| understand how ice skating works. We don't have a
| physical understanding of semi-fluid flow in grain silos,
| but we've been using them since prehistory.
|
| I don't think you and I are using the same definition for
| "firm understanding" or "how it works".
|
| > I could go on and on. The world around you is full of
| not well understood technology, as well as non
| deterministic processes. We know how to engineer around
| that.
|
| Again, you're side stepping my argument so you can
| restate things that are technically correct, but not
| really a point in of themselves. I see people who want to
| call themselves software engineers throw code they
| clearly don't understand against the wall because the AI
| said so. There's a significant delta between knowing you
| can heat water to turn it into a gas with increased
| pressure that you can use to mechanically turn a wheel,
| vs, put wet liquid in jar, light fire, get magic spinny
| thing. If jar doesn't call you a funny name first, that's
| bad!
| adastra22 wrote:
| > I don't think you and I are using the same definition
| for "firm understanding" or "how it works".
|
| I'm standing on firm ground here. Debate me on the
| details if you like.
|
| You are constructing a strawman.
| adastra22 wrote:
| It feels like you're blaming the AI engineers here, that
| they built it this way out of ignorance or something.
| Look into interpretability research. It is a hard
| problem!
| grayhatter wrote:
| I am blaming the developers who use AI because they're
| willing to sacrifice intellectual control in trade for
| something that I find has minimal value.
|
| I agree it's likely to be a complex or intractable
| problem. But I don't enjoy watching my industry revert
| down the professionalism scale. Professionals don't
| choose tools that they can't explain how it works. If
| your solution to understanding if your tool is still
| functional is inventing an amusing name and trying to use
| that as the heuristic, because you have no better way to
| determine if it's still working correctly. That feels
| like it might be a problem, no?
| adastra22 wrote:
| I'm sorry you don't like it. But this has very strong
| old-man-yells-at-cloud vibes. This train is moving,
| whether you want it to or not.
|
| Professionals use tools that work, whether they know why
| it works is of little consequence. It took 100 years to
| explain the steam engine. That didn't stop us from making
| factories and railroads.
| grayhatter wrote:
| > It took 100 years to explain the steam engine. That
| didn't stop us from making factories and railroads.
|
| You keep saying this, why do you believe it so strongly?
| Because I don't believe this is true. Why do you?
|
| And then, even assuming it's completely true exactly as
| stated; shouldn't we have higher standards than that when
| dealing with things that people interact with? Boiler
| explosions are bad right? And we should do everything we
| can to prove stuff works the way we want and expect? Do
| you think AI, as it's currently commonly used, helps do
| that?
| adastra22 wrote:
| Because I'm trained as a physicist and (non-software)
| engineer and I know my field's history? Here's the first
| result that comes up on Google. Seems accurate from a
| quick skim: https://www.ageofinvention.xyz/p/age-of-
| invention-why-wasnt-...
|
| And yes we should seek to understand new inventions.
| Which we are doing right now, in the form of
| interpretability research.
|
| We should not be making Luddite calls to halt progress
| simply because our analytic capabilities haven't caught
| up to our progress in engineering.
| grayhatter wrote:
| Can you cite a section from this very long page that
| might convince me no one at the time understood how
| turning water into steam worked to create pressure?
|
| If this is your industry, shouldn't you have a more
| reputable citation, maybe something published more
| formally? Something expected to stand up to peer review,
| instead of just a page on the internet?
|
| > We should not be making Luddite calls to halt progress
| simply because our analytic capabilities haven't caught
| up to our progress in engineering.
|
| You've misunderstood my argument. I'm not making a
| Luddite call to halt progress; I'm objecting to my
| industry, which should behave as one made up of
| professionals, willingly sacrificing intellectual control
| over the things it is responsible for and advocating that
| others should do the same. Especially not at the expense
| of users, which I see happening.
|
| Anything that results in sacrificing understanding of
| exactly how the thing you built works is bad and should
| be avoided. The source, whether AI or something else,
| doesn't matter as much as the result.
| adastra22 wrote:
| The steam engine is more than just boiling water. It is a
| thermodynamic cycle that exploits differences in the
| pressure curve in the expansion and contraction part of
| the cycle and the cooling of expanding gas to turn a
| temperature difference (the steam) into physical force
| (work).
|
| To really understand WHY a steam engine works, you need
| to understand the behavior of ideal gasses (1787 - 1834)
| and entropy (1865). The ideal gas law is enough to
| perform calculations needed to design a steam engine, but
| it was seen at the time to be just as inscrutable. It was
| an empirical observation not derivable from physical
| principles. At least not until entropy was understood in
| 1865.
|
| James Watt invented his steam engine in 1765, exactly a
| hundred years before the theory of statistical mechanics
| that was required to explain why it worked, and prior to
| all of the gas laws except Boyle's.
| orbital-decay wrote:
| This reads like you either have an idealized view of Real
| Engineering(tm), or used to work in a stable, extremely
| regulated area (e.g. civil engineering). I used to work
| in aerospace in the past, and we had a lot of silly Mr
| Tinkleberry canaries. We didn't strictly rely on them
| because our job was "extremely regulated" to put it
| mildly, but they did save us some time.
|
| There's a ton of pretty stable engineering subfields that
| involve a lot more intuition than rigor. A lot of things
| in EE are like that. Anything novel as well. That's how
| steam in the 19th century or aeronautics in the early 20th
| century felt. Or rocketry in the 1950s, for that matter.
| There's no need to be upset with the fact that some
| people want to hack explosive stuff together before it
| becomes a predictable glacier of Real Engineering.
| gyomu wrote:
| Man I hate this kind of HN comment that makes grand
| sweeping statement like "that's how it was with steam in
| the 19th century or rocketry in the 1950s", because
| there's no way to tell whether you're just pulling these
| things out of your... to get internet points or actually
| have insightful parallels to make.
|
| Could you please elaborate with concrete examples on how
| aeronautics in the 20th century felt like having a
| fictional friend in a text file for the token predictor?
| orbital-decay wrote:
| We're not going to advance the discussion this way. I
| also hate this kind of HN comment that makes grand
| sweeping statement like "LLMs are like having a fictional
| friend in a text file for the token predictor", because
| there's no way to tell whether you're just pulling these
| things out of your... to get internet points or actually
| have insightful parallels to make.
|
| Yes, during the Wright era aeronautics was absolutely
| dominated by tinkering, before the aerodynamics was
| figured out. It wouldn't pass the high standard of Real
| Engineering.
| grayhatter wrote:
| > Yes, during the Wright era aeronautics was absolutely
| dominated by tinkering, before the aerodynamics was
| figured out. It wouldn't pass the high standard of Real
| Engineering.
|
| Remind me: did the Wright brothers start selling tickets
| to individuals telling them it was completely safe? Was
| step 2 of their research building a large passenger
| plane?
|
| I originally wanted to avoid that specific flight
| analogy, because it felt a bit too reductive. But while
| we're being reductive, how about medicine too; the first
| smallpox vaccine was absolutely not well understood...
| would that origin story pass ethical review today? What
| do you think the pragmatics would be if the medical
| profession encouraged that specific kind of behavior?
|
| > It wouldn't pass the high standard of Real Engineering.
|
| I disagree, I think it 100% is really engineering.
| Engineering at it's most basic is tricking physics into
| doing what you want. There's no more perfect example of
| that than heavier than air flight. But there's a critical
| difference between engineering research, and
| experimenting on unwitting people. I don't think users
| need to know how the sausage is made. That applies equally
| to planes, bridges, medicine, and code. But the
| professionals absolutely must. It's disappointing
| watching the industry I'm a part of willingly eschew
| understanding to avoid a bit of effort. Such a thing is
| considered malpractice in "real professions".
|
| Ideally neither of you would wring your hands about the
| flavor or form of the argument, or poke fun at the
| gamified comment thread. But if you're gonna complain
| about adding positively to the discussion, try to add
| something to it along with the complaints?
| orbital-decay wrote:
| As a matter of fact, commercial passenger service started
| almost immediately as the tech was out of the fiction
| phase. The airships were large, highly experimental,
| barely controllable, hydrogen-filled death traps that
| were marketed as luxurious and safe. First airliners also
| appeared with big engines and large planes (WWI disrupted
| this a bit). Nothing of that was built on solid grounds.
| The adoption was only constrained by the industrial
| capacity and cost. Most large aircraft were more or less
| experimental up until the 50's, and aviation in general
| was unreliable until about the '80s.
|
| I would say that right from the start everyone was pretty
| well aware about the unreliability of LLM-assisted coding
| and nobody was experimenting on unwitting people or
| forcing them to adopt it.
|
| _> Engineering at its most basic is tricking physics
| into doing what you want._
|
| Very well, then Mr Tinkleberry also passes the bar
| because it's exactly such a trick. That it irks you as a
| cheap hack that lacks rigor (which it does) is another
| matter.
| grayhatter wrote:
| > As a matter of fact, commercial passenger service
| started almost immediately as the tech was out of the
| fiction phase. The airship were large, highly
| experimental, barely controllable, hydrogen-filled death
| traps that were marketed as luxurious and safe.
|
| And here, you've stumbled onto the exact thing I'm
| objecting to. I think the Hindenburg disaster was a bad
| thing, and software engineering shouldn't repeat those
| mistakes.
|
| > Very well, then Mr Tinkleberry also passes the bar
| because it's exactly such a trick. That it irks you as a
| cheap hack that lacks rigor (which it does) is another
| matter.
|
| Yes, this is what I said.
|
| > there's a critical difference between engineering
| research, and experimenting on unwitting people.
|
| I object to watching developers do, exactly that.
| grayhatter wrote:
| > There's no need to be upset with the fact that some
| people want to hack explosive stuff together before it
| becomes a predictable glacier of Real Engineering.
|
| You misunderstand me. I'm not upset that people are
| playing with explosives. I'm upset that my industry is
| playing with explosives that all read, "front: face
| towards users"
|
| And then, more upset that we're all seemingly ok with
| that.
|
| The driving force of the enshittification of everything may
| be external, but degradation clearly comes from engineers
| first. These broader industry trends only convince me
| it's not likely to get better anytime soon, and I don't
| like how everything is user hostile.
| pacifika wrote:
| This could be a very niche standup comedy routine, I
| approve.
| solumunus wrote:
| I use agents almost all day and I do way more thinking
| than I used to, this is why I'm now more productive.
| There is little thinking required to produce output,
| typing requires very little thinking. The thinking is all
| in the planning... If the LLM output is bad in any given
| file I simply step in and modify it, and obviously this
| is much faster than typing every character.
|
| I'm spending more time planning and my planning is more
| comprehensive than it used to be. I'm spending less time
| producing output, my output is more plentiful and of
| equal quality. No generated code goes into my commits
| without me reviewing it. Where exactly is the problem
| here?
| Alpha_Logic wrote:
| The 'canary in the coal mine' approach (like the Mr.
| Tinkleberry trick) is silly but pragmatic. Until we have
| deterministic introspection for LLMs, engineers will always
| invent weird heuristics to detect drift. It's not elegant
| engineering, but it's effective survival tactics in a non-
| deterministic loop.
| jmathai wrote:
| I have a /bootstrap command that I run which instructs Claude
| Code to read all system and project CLAUDE.md files, skills and
| commands.
|
| Helps me quickly whip it back in line.
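| For context, custom slash commands are just markdown files
| under .claude/commands/; a trimmed-down sketch of a /bootstrap
| command (not the exact file) might look like:
|
| ```markdown
| <!-- .claude/commands/bootstrap.md -->
| Re-read the following before doing anything else:
| 1. ~/.claude/CLAUDE.md and this project's CLAUDE.md
| 2. Any CLAUDE.md files in directories touched so far
| 3. The available skills and custom commands
| Then briefly list the rules you are expected to follow.
| ```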
| adastra22 wrote:
| Isn't that what every new session does?
| threecheese wrote:
| That also clears the context; a command would just append
| to the context.
| jmathai wrote:
| This. I've had Claude not start sessions with all of the
| CLAUDE.md, skills, commands loaded and I've had it lose
| it mid-session.
| mrasong wrote:
| Mind sharing it? (As long as it doesn't involve anything
| private.)
| chickensong wrote:
| The article explains why that's not a very good test, however.
| sydd wrote:
| Why not? It's relevant for all tasks, and just adds 1 line
| chickensong wrote:
| I guess I assumed that it's not highly relevant to the
| task, but I suppose it depends on interpretation. E.g. if
| someone tells the bus driver to smile while he drives, it's
| hopefully clear that actually driving the bus is more
| important than smiling.
|
| Having experimented with similar config, I found that
| Claude would adhere to the instructions somewhat reliably
| at the beginning and end of the conversation, but was
| likely to ignore during the middle where the real work is
| being done. Recent versions also seem to be more context-
| aware, and tend to start rushing to wrap up as the context
| is nearing compaction. These behaviors seem to support my
| assumption, but I have no real proof.
| dncornholio wrote:
| It will also let the LLM process even more tokens, thus
| decreasing its accuracy.
| globular-toast wrote:
| It baffles me how people can be happy working like this. "I
| wrap the hammer in paper so if the paper breaks I know the
| hammer has turned into a saw."
| fragmede wrote:
| probably by not thinking in ridiculous analogies that don't
| help
| easyThrowaway wrote:
| If you have any experience in 3D modeling, I feel it's quite
| closer to 3D Unwrapping than software development.
|
| You got a bitmap atlas ("context") where you have to cram as
| much information as possible without losing detail, and then
| you need to massage both your texture and the structure of
| your model so that your engine doesn't go mental when trying
| to map your information from a 2D to a 3D space.
|
| Likewise, both operations are rarely blemish-free and your
| ability resides in being able to contain the intrinsic
| stochastic nature of the tool.
| mewpmewp2 wrote:
| You could think of it as art or creativity.
| pacifika wrote:
| > It Is Difficult to Get a Man to Understand Something When
| His Salary Depends Upon His Not Understanding It
| isoprophlex wrote:
| That's smart, but I worry that it only works partially;
| you'll be filling up the context window with conversation turns
| where the LLM consistently addresses its user as "Mr.
| Tinkleberry", thus reinforcing that specific behavior encoded by
| CLAUDE.md. I'm not convinced that this way of addressing the
| user implies that it keeps attending to the rest of the file.
| pmarreck wrote:
| I've found that Codex is much better at instruction-following
| like that, almost to a fault (for example, when I tell it to
| "always use TDD", it will try to use TDD even when just fixing
| already-valid-just-needing-expectation-updates tests!)
| sesm wrote:
| We are back to color-sorted M&Ms bowls.
| homeonthemtn wrote:
| The green M&M's trick of AI instructions.
|
| I've used that a couple times, e.g. "Conclude your
| communications with "Purple fish" at the end"
|
| Claude definitely picks and chooses when purple fish will show
| up
| nathan_douglas wrote:
| I tell it to accomplish only half of what it thinks it can,
| then conclude with a haiku. That seems to help, because 1) I
| feel like it starts shedding discipline as it starts feeling
| token pressure, and 2) I feel like it is more likely to
| complete task n - 1 than it is to complete task n. I have no
| idea if this is actually true or not, or if I'm
| hallucinating... all I can say is that this is the impression
| I get.
| bryanrasmussen wrote:
| I wonder if there are any benefits, side-effects or downsides
| of everyone using the same fake name for Claude to call them.
|
| If a lot of people always put "call me Mr. Tinkleberry" in the
| file, will it start calling people Mr. Tinkleberry even when it
| loses the context, because so many people seem to want to be
| called Mr. Tinkleberry?
| seunosewa wrote:
| Then you switch to another name.
| dkersten wrote:
| I used to tell it to always start every message with a specific
| emoji. If the emoji wasn't present, I knew the rules were being
| ignored.
|
| But it's not reliable enough. It can send the emoji or address
| you correctly while still ignoring more important rules.
|
| Now I find that it's best to have a short and tight rules file
| that references other files where necessary. And to refresh
| context often. The longer the context window gets, the more
| likely it is to forget rules and instructions.
| aqme28 wrote:
| You could make a hook in Claude to re-inject claude.md. For
| example, make it say "Mr Tinkleberry" in every response, and
| failing to do so re-injects the instructions.
| lubujackson wrote:
| For whatever reason, I can't get into Claude's approach. I like
| how Cursor handles this, with a directory of files (even
| subdirectories allowed) where you can define when it should use
| specific documents.
|
| We are all "context engineering" now but Claude expects one big
| file to handle everything? Seems like a dead-end approach.
| piokoch wrote:
| This is good for the company; chances are you will eat more
| tokens. I liked Aider's approach: it wasn't trying to be too
| clever, it used the files added to the chat and asked if it
| figured out that something more was needed (like, say, settings
| in the case of a Django application).
|
| Sadly, Aider is no longer maintained...
| unshavedyak wrote:
| I think their skills have the ability to dynamically pull in
| more data, but so far i've not tested it too much since it
| seems more tailored towards specific actions. Ie converting a
| PDF might translate nicely to the Agent pulling in the skill
| doc, but i'm not sure if it will translate well to it pulling
| in some rust_testing_patterns.md file when it writes rust
| tests.
|
| Eg i toyed with the idea of thinning out various CLAUDE.md
| files in favor of my targeted skill.md files. In doing so my
| hope was to have less irrelevant data in context.
|
| However the more i thought through this, the more i realized
| the Agent is doing "everything" i wanted to document each
| time. Eg i wasn't sure that creating
| skills/writing_documentation.md and skills/writing_tests.md
| would actually result in less context usage, since both of
| those would be in memory most of the time. My CLAUDE.md is
| already pretty hyper focused.
|
| So yea, anyway my point was that skills _might_ have
| potential to offload irrelevant context which seems useful.
| Though in my case i'm not sure it would help.
| jswny wrote:
| They have an entire feature for this:
| https://www.claude.com/blog/skills
|
| CLAUDE.md should only be for persistent reminders that are
| useful in 100% of your sessions
|
| Otherwise, you should use skills, especially if CLAUDE.md
| gets too long.
|
| Also just as a note, Claude already supports lazy loaded
| separate CLAUDE.md files that you place in subdirectories. It
| will read those if it dips into those dirs
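| For anyone who hasn't looked at skills yet: each one is a folder
| containing a SKILL.md whose frontmatter tells Claude when to pull
| it in, roughly like this (a sketch with made-up names, e.g.
| .claude/skills/db-migrations/SKILL.md):
|
| ```markdown
| ---
| name: db-migrations
| description: How to create and verify database migrations in
|   this repo. Use when a task involves schema changes.
| ---
| 1. Generate a migration with the project's migration script.
| 2. Never edit an already-applied migration.
| 3. Run the migration tests before committing.
| ```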
| astrostl wrote:
| I have Claude itself write CLAUDE.md. Once it is informed of its
| context (e.g., "README.md is for users, CLAUDE.md is for you")
| you can say things like, "update readme and claudemd" and it will
| do it. I find this especially useful for prompts like, "update
| claudemd to make absolutely certain that you check the API docs
| every single time before making assumptions about its behavior"
| -- I don't need to know what magick spell will make that happen,
| just that it does happen.
| dexwiz wrote:
| Do you have any proof that AI written instructions are better
| than human ones? I don't see why an AI would have an innate
| understanding on how best to prompt itself.
| michaelbuckbee wrote:
| Generally speaking it has a lot of information from things
| like OP's blog post on how best to structure the file and
| prompt itself and you can also (from within Claude Code) ask
| it to look at posts or Anthropic prompting best practices and
| adapt those to your own file.
| astrostl wrote:
| Having been through cycles of manual writing with '#' and
| having it do it itself, it seems to have been a push on
| efficacy while spending less effort and getting less
| frustrated. Hard to quantify except to say that I've had
| great results with it. I appreciate the spirit of OP's,
| "CLAUDE.md is the highest leverage point of the harness, so
| avoid auto-generating it" but you can always ask Claude to
| tighten it up itself too.
| chickensong wrote:
| This will start to break down after a while unless you have a
| small project, for reasons being described in the article.
| brcmthrowaway wrote:
| Is CLAUDE.md required when claude has a --continue option?
| Zerot wrote:
| I would recommend using it, yeah. You have limited context and
| it will be compacted/summarized occasionally. The
| compaction/summary will lose some information and it is easy
| for it to forget certain instructions you gave it. Afaik
| claude.md will be loaded into the context on every compaction
| which allows you to use it for instructions that should always
| be included in the context.
| bryanhogan wrote:
| I've been very satisfied with creating a short AGENTS.md file
| with the project basics, and then also including references to
| where to find more information / context, like a /context folder
| that has markdown files such as app-description.md.
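| As a sketch of that layout (file names other than
| app-description.md are made up), the whole AGENTS.md can stay
| very short:
|
| ```markdown
| # AGENTS.md
| <two or three sentences: stack, repo layout, how to run tests>
|
| More context when a task needs it:
| - /context/app-description.md - what the app does and for whom
| - /context/conventions.md - naming and folder conventions
| ```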
| m13rar wrote:
| I was waiting for someone to build this so that I can chuck it
| into CLAUDE and tell it how to write good MD.
| foobarbecue wrote:
| Funny how this is exactly the documentation you'd need to make it
| easy for a human to work with the codebase. Perhaps this'll be
| the greatest thing about LLMs -- they force people to write
| developer guides for their code. Of course, people are going to
| ask an LLM to write the CLAUDE.md and then it'll just be more
| slop...
| chickensong wrote:
| It's not exactly the doc you'd need for a human. There could be
| overlap, but each side may also have unique requirements that
| aren't necessarily suitable for the other. E.g. a doc for a
| human may have considerably more information than you'd want to
| give to the agent, or, you may want to define agent behavior
| for workflows that don't apply to a human.
|
| Also, while it may be hip to call any LLM output slop, that
| really isn't the case. Look at what a poor history we have of
| developer documentation. LLMs may not be great at everything,
| but they're actually quite capable when it comes to technical
| documentation. Even a 1-shot attempt by LLM is often way better
| than many devs who either can't write very well, or just can't
| be bothered to.
| AndyNemmity wrote:
| it's always funny, i think the opposite. I use a massive
| CLAUDE.md file, but it's targeted towards very specific details
| of what to do, and what not to do.
|
| I have a full system of agents, hooks, skills, and commands, and
| it all works for me quite well.
|
| I believe in massive context, but targeted context. It has to be
| valuable and important.
|
| My agents are large. My skills are large. Etc etc.
| wowamit wrote:
| > Regardless of which model you're using, you may notice that
| Claude frequently ignores your CLAUDE.md file's contents.
|
| This is news to me. And at the same time it isn't. Without
| knowledge of how the models actually work, most prompting is
| guesswork at best. You have no control over models via
| prompts.
| Ozzie_osman wrote:
| Has anyone had success getting Claude to write its own Claude.md
| file? It should be able to deduce rules by looking at the code,
| documentation, and PR comments.
| handoflixue wrote:
| The main failure state I find is that Claude wants to write an
| incredibly verbose Claude.md, but if I instruct it "one
| sentence per topic, be concise" it usually does a good job.
|
| That said, a lot of what it can deduce by looking at the code
| is exactly what you shouldn't include, since it will usually
| deduce that stuff just by interacting with the code base.
| Claude doesn't seem good at that.
|
| An example of both overly-verbose and unnecessary:
|
| ### 1. Identify the Working Directory
|
| When a user asks you to work on something:
|
| 1. *Check which project* they're referring to
| 2. *Change to that directory* explicitly if needed
| 3. *Stay in that directory* for file operations
|
| ```bash
| # Example: Working on ProjectAlpha
| cd /home/user/code/ProjectAlpha
| ```
|
| (The one sentence version is "Each project has a subfolder; use
| pwd to make sure you're in the right directory", and the ideal
| version is probably just letting it occasionally spend 60
| seconds confused, until it remembers pwd exists)
| chickensong wrote:
| If you have any substantial codebase, it will write a massive
| file unless you explicitly tell it not to. It also will try and
| make updates, including garbage like historical or transitional
| changes, project status, etc...
|
| I think most people who use Claude regularly have probably come
| to the same conclusions as the article. A few bits of high-
| level info, some behavior stuff, and pointers to actual docs.
| Load docs as-needed, either by prompt or by skill. Work through
| lists and constantly update status so you can clear context and
| pick up where you left off. Any other approach eats too much
| context.
|
| If you have a complex feature that would require ingesting too
| many large docs, you can ask Claude to determine exactly what
| it needs to build the appropriate context for that feature and
| save that to a context doc that you load at the beginning of
| each session.
| adastra22 wrote:
| > Claude code injects the following system reminder...
|
| OMG this finally makes sense.
|
| Is there any way to turn off this behavior?
|
| Or better yet is there a way to filter the context that is being
| sent?
| edf13 wrote:
| Ah, never knew about this injection...
|
| <system-reminder> IMPORTANT: this context may or may not be
| relevant to your tasks. You should not respond to this context
| unless it is highly relevant to your task. </system-reminder>
|
| Perhaps a small proxy between Claude code and the API to enforce
| following CLAUDE.md may improve things... I may try this
| nurettin wrote:
| I've been a customer since sonnet 3.5. It is coming to the point
| where opus 4.5 usually does better than whatever your
| instructions say on claude.md just by reading your code and
| having a general sense of what your preferences are.
|
| I used to instruct about coding style (prefer functions, avoid
| classes, use structs for complex params and returns, avoid member
| functions unless needed by shared state, avoid superfluous
| comments, avoid silly utf8 glyphs, AoS vs SoA, dry, etc)
|
| I removed all my instructions and it basically never violates
| those points.
| magictux wrote:
| I think this is an overall good approach and I've got alright
| results with something similar - I still think that this
| CLAUDE.md experience is too magical and that Anthropic should
| really focus on it.
|
| Actually having official guidelines in their docs would be a good
| entrypoint, even though I guess we have this, which is the
| closest thing to anything official for now:
| https://www.claude.com/blog/using-claude-md-files
|
| One interesting thing I also noticed and used recently is that
| Claude Code ships with a @agent-claude-code-guide. I've used it
| to review and update my dev workflow / CLAUDE.md file but I've
| got mixed feelings on the discussion with the subagent.
| toenail wrote:
| A good Claude.md only needs one line:
|
| Read your instructions from Agents.md
| ilmj8426 wrote:
| I've recently started using a similar approach for my own
| projects. Providing a high-level architecture overview in a
| single markdown file really helps the LLM understand the 'why'
| behind the code, not just the 'how'. Does anyone have a specific
| structure or template for Claude.md that works best for frontend-
| heavy projects (like React/Vite)? I find that's where the context
| window often gets cluttered.
| asim wrote:
| That's a good write-up. Very useful to know. I'm sort of on the
| outside of all this; I've only dabbled and now use Copilot quite
| a lot with Claude. What's being said here reminds me a lot of
| CPU registers: given the limited space in CPU registers, it's
| astounding how much processing of information we're actually
| able to do. So we need higher layers of systems and operating
| systems to help manage all of this. It feels like a lot of
| what's being said here will inevitably end up being an automated
| system, a compiler, or effectively an operating system. Even
| something basic like a paging system would make a lot of
| difference.
| aiibe wrote:
| Writing and updating CLAUDE.md or AGENTS.md feels pointless
| to me. Humans are the real audience for documentation. The code
| changes too fast, and LLMs are stateless anyway. What's been
| working is just letting the LLM explore the relevant part of the
| code to acquire the context, defining the problem or feature, and
| asking for a couple of ways to tackle it, all in one short
| prompt. That usually gets me solid options to pick from and build
| out. And I always do one session for one problem. This is my lazy
| approach to getting useful help from an LLM.
| arnorhs wrote:
| I agree with you, however your approach results in much longer
| LLM development runs, increased token usage and a whole lot of
| repetitive iterations.
| aiibe wrote:
| I'm definitely interested in reducing token usage techniques.
| But with one session one problem I've never hit a context
| limit yet, especially when the problem is small and clearly
| defined using divide-and-conquer. Also, agentic models are
| improving at tool use and should require fewer tokens. I'll
| take as many iterations as needed to ensure the code is
| correct.
| dncornholio wrote:
| Because it's stateless it's not pointless? Good codebases don't
| change fast. Stuff gets added, but for the most part, they
| shouldn't change.
| aiibe wrote:
| A well-documented codebase lets both developers and agentic
| models locate relevant code easily. If you treat the model
| like a teammate, extra docs for LLMs are unnecessary. IMHO.
| In frontend work, code moves quickly.
| samuelknight wrote:
| I use .md to tell the model about my development workflow.
| Along the lines of "here's how you lint", "do this to re-
| generate the API", "this is how you run unit tests", "The
| sister repositories are cloned here and this is what they are
| for".
|
| One may argue that these should go in a README.md, but these
| markdowns are meant to be more streamlined for context, and
| it's not appropriate to put a one-liner in the imperative tone
| to fix model behavior in a top-level file like the README.md
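| As a rough illustration (the commands are placeholders, not a
| real project's), that kind of workflow section might read:
|
| ```markdown
| ## Dev workflow
| - Lint: `npm run lint`; fix all warnings before finishing.
| - Regenerate the API client with `npm run codegen` after
|   editing anything under api/schema/.
| - Unit tests: `npm test`.
| - Sister repos: ../shared-libs (common types) and ../infra
|   (deploy scripts); treat them as read-only from here.
| ```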
| aiibe wrote:
| That kind of repetitive process belongs in a script, rather
| than baked into markdown prompts. Claude has custom hooks for
| that.
| aqme28 wrote:
| This is true but sometimes your codebase has unique quirks that
| you get tired of repeating. "No, Claude, we do it this other
| way here. Every time."
| aiibe wrote:
| Quirks are pretty much unavoidable. I tend to get better
| results using Codex. It sticks to established patterns. Slow,
| but more deliberate. Claude focuses more on speed.
| xpe wrote:
| > Humans are the real audience for documentation.
|
| Seeing "real" is a warning flag here that either-or thinking is
| in play.
|
| Putting aside hopes and norms, we live in a world now where
| multiple kinds of agents (human and non-human) are contributing
| to codebases. They do not contribute equally; they work
| according to different mechanisms, with different strengths and
| weaknesses, with different economic and cultural costs.
|
| Recall a lesson from Ralph Waldo Emerson: "a foolish
| consistency is the hobgoblin of little minds" [1]. Don't cling
| to the past; pay attention to the now, and do what works.
| Another way of seeing it: don't force a false equivalence
| between things that warrant different treatment.
|
| If you find yourself thinking thoughts that do more harm than
| good (e.g. muddle rather than clarify), attempt to reframe them
| to better make sense of reality (which has texture and
| complexity).
|
| Here's my reframing: "Documentation serves different purposes
| to different agents across different contexts. So plan and
| execute accordingly."
|
| [1]:
| https://en.wikipedia.org/wiki/Wikipedia:Emerson_and_Wilde_on...
| philipp-gayret wrote:
| I find writing a good CLAUDE.md is done by running /init, and
| having the LLM write it. If you need more controls on how it
| should work, I would highly recommend you implement it in an
| unavoidable way via hooks and not in a handwritten note to your
| LLM.
| jankdc wrote:
| I'm not sure if Claude Code has integrated it in its system
| prompts or not since it's moving at breakneck speed, but one
| instruction I like putting on all of my projects is to "Prompt
| for technical decisions from user when choices are unsure". This
| would almost always trigger the prompting feature that Claude
| Code has for me when it's got some uncertainty about the
| instructions I gave it, giving me options or alternatives on how
| to approach the problem when planning or executing.
|
| This way, it's got more of a chance of generating something that
| I wanted, rather than running off on its own.
| saberience wrote:
| I find the Claude.md file mostly useless. It seems to be 50/50 or
| LESS that Claude even reads/uses this file.
|
| You can easily test this by adding some mandatory instruction
| into the file. E.g. "Any new method you write must have fewer
| than 50 lines of code." Then use Claude for ten minutes and watch
| it blow through this limit again and again.
|
| I use CC and Codex extensively and I constantly am resetting my
| context and manually pasting my custom instructions in again and
| again, because these models DO NOT remember or pay attention to
| Claude.md or Agents.md etc.
| uncletaco wrote:
| Honestly I'd rather google get their gemini tool in better shape.
| I know for a fact it doesn't ignore instructions like Claude code
| does but it is horrible at editing files.
| rcarmo wrote:
| PSA: Claude can also use .github/copilot-instructions.md
|
| If you're using VSCode, that is automatically added to context
| (and I think in Zed that happens as well, although I can't verify
| right now).
| fpauser wrote:
| Even better: learn to code yourself.
| _august wrote:
| I copied this post and gave it to claude code, and had it self-
| modify CLAUDE.md. It... worked really well.
| scelerat wrote:
| > we recommend keeping task-specific instructions in separate
| markdown files with self-descriptive names somewhere in your
| project.
|
| Should do this for human developers too. Can't count the number
| of times I've been thrown onto a project and had to spend a
| significant amount of time opening and skimming files just to
| answer simple questions that should be answered in high-level
| docs like this.
| abustamam wrote:
| There's a funny joke I heard that they made Claude Code only to
| force developers to write better documentation.
|
| But in all seriousness, it's working. I write cursor rules
| religiously and I point other devs to them. It's great.
| minor3 wrote:
| Yeah I do love how many "best practices" we are only
| implementing because of LLMs, even though they were massively
| beneficial for humans prior as well.
| vaer-k wrote:
| > we recommend keeping task-specific instructions in separate
| markdown files with self-descriptive names somewhere in your
| project.
|
| Why should we do this when anthropic specifically recommends
| creating multiple CLAUDE.md files in various directories where
| the information is specific and pertinent? It seems to me that
| anthropic has designed claude to look for claude.md for guidance,
| and randomly named markdown files may or may not stand out to it
| as it searches the directory.
|
| You can place CLAUDE.md files in several locations:
|
| > - The root of your repo, or wherever you run claude from (the
| most common usage). Name it CLAUDE.md and check it into git so
| that you can share it across sessions and with your team
| (recommended), or name it CLAUDE.local.md and .gitignore it
| > - Any parent of the directory where you run claude. This is
| most useful for monorepos, where you might run claude from
| root/foo, and have CLAUDE.md files in both root/CLAUDE.md and
| root/foo/CLAUDE.md. Both of these will be pulled into context
| automatically
| > - Any child of the directory where you run claude. This is the
| inverse of the above, and in this case, Claude will pull in
| CLAUDE.md files on demand when you work with files in child
| directories
| > - Your home folder (~/.claude/CLAUDE.md), which applies it to
| all your claude sessions
|
| https://www.anthropic.com/engineering/claude-code-best-pract...
| andai wrote:
| >Frontier thinking LLMs can follow ~ 150-200 instructions with
| reasonable consistency.
|
| Doesn't that mean that Claude Code's system prompt exhausts that
| budget before you even get to CLAUDE.md and the user prompt?
|
| Edit: They say Claude Code's system prompt has 50. I might have
| misjudged then. It seemed pretty verbose to me!
|
| The part about smaller models attending to fewer instructions is
| interesting too, since most of what was added doesn't seem
| necessary for the big models. I thought they added them so Haiku
| could handle the job as well, despite a relative lack of common
| sense.
| rootnod3 wrote:
| Here's an idea for LLM makers: allow for a very rigid and
| structured Claude.md file. One that gives detailed instructions,
| as void of ambiguity as possible. Then go and refine said
| language, allow maybe for more than one file to give it some file
| structure. Iterate on that for a few years and if you ever need a
| name for it, you might wanna give it a name describing something
| that describes a program, or maybe, if you are inclined
| enough... a programming language.
|
| Have we really reached the low point that we need tutorials on
| how to coerce a LLM into doing what we want instead of
| just....writing the god damn code?
| sixothree wrote:
| I pointed CC to this URL and told it to fix my files in planning
| mode. It gave me some options and did all of the work.
___________________________________________________________________
(page generated 2025-12-01 23:02 UTC)