[HN Gopher] Compressed Agents.md > Agent Skills
___________________________________________________________________
Compressed Agents.md > Agent Skills
Author : maximedupre
Score : 63 points
Date : 2026-01-29 13:08 UTC (9 hours ago)
(HTM) web link (vercel.com)
(TXT) w3m dump (vercel.com)
| ares623 wrote:
| 2 months later: "Anthropic introduces 'Claude Instincts'"
| EnPissant wrote:
| This is confusing.
|
| TFA says they added an index to Agents.md that told the agent
| where to find all documentation and that was a big improvement.
|
| The part I don't understand is that this is exactly how I thought
| skills work. The short descriptions are given to the model up-
| front and then it can request the full documentation as it wants.
| With skills this is called "Progressive disclosure".
|
| Maybe they used more effective short descriptions in the
| AGENTS.md than they did in their skills?
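The "progressive disclosure" idea described in this comment can be sketched roughly as follows. This is a minimal illustration, not how any particular agent implements skills: the `Skill` class, `build_upfront_prompt`, `load_skill`, and the sample skill names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str  # short text shown to the model up front
    body: str         # full documentation, loaded only on request


def build_upfront_prompt(skills: list[Skill]) -> str:
    """Return the index the model sees before any skill is loaded."""
    return "\n".join(f"- {s.name}: {s.description}" for s in skills)


def load_skill(skills: list[Skill], name: str) -> str:
    """Return the full body of one skill, as if the model had requested it."""
    for s in skills:
        if s.name == name:
            return s.body
    raise KeyError(name)


# Hypothetical skills; the bodies stand in for full documentation.
skills = [
    Skill("routing", "How to define routes", "FULL ROUTING DOCS"),
    Skill("caching", "How to cache responses", "FULL CACHING DOCS"),
]
index = build_upfront_prompt(skills)
print(index)
```

Only `index` is in context by default. The full body arrives only if the model decides to ask for it, and that decision is the step the thread is debating.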
| NitpickLawyer wrote:
| The reported tables also don't match the screenshots. And their
| baselines and tests are too close to tell apart (judging by the
| screenshots, not the tables): 29/33 baseline, 31/33 skills, 32/33
| skills + use-skill prompt, 33/33 AGENTS.md.
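One way to check the "too close to tell" point is to put an interval around each score. The sketch below uses the Wilson score interval, which is my choice of method and not something the article reports. It treats each figure quoted in the comment as a single run of 33 independent cases, which is also an assumption.

```python
import math


def wilson_interval(passed: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (about 95% for z=1.96)."""
    p = passed / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Scores as quoted in the comment, each out of 33 cases.
results = {
    "baseline": 29,
    "skills": 31,
    "skills + use-skill prompt": 32,
    "AGENTS.md": 33,
}
intervals = {name: wilson_interval(passed, 33) for name, passed in results.items()}
for name, (low, high) in intervals.items():
    print(f"{name}: {results[name]}/33, interval {low:.2f} to {high:.2f}")
```

Under these assumptions the baseline interval overlaps the AGENTS.md interval, which is consistent with the comment: with 33 cases, a gap of a few passes is hard to separate from noise.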
| sally_glance wrote:
| I also thought this is how skills work, but in practice I
| experienced similar issues. The agents I'm using (Gemini CLI,
| Opencode, Claude) all seem to have trouble activating skills on
| their own unless explicitly prompted. Yeah, probably this will
| be fixed over the next couple of generations but right now
| dumping the documentation index right into the agent prompt or
| AGENTS.md works much better for me. Maybe it's similar to
| structured output or tool calls which also only started working
| well after providers specifically trained their models for
| them.
| tottenhm wrote:
| > In 56% of eval cases, the skill was never invoked. The agent
| had access to the documentation but didn't use it.
|
| The agent passes the Turing test...
| pietz wrote:
| Isn't it obvious that an agent will do better if it internalizes
| the knowledge on something instead of having the option to
| request it?
|
| Skills are new. Models haven't been trained on them yet. Give it
| 2 months.
| WA wrote:
| Not so obvious, because the model still needs to look up the
| required doc. The article glosses over this detail a little bit
| unfortunately. The model needs to decide when to use a skill,
| but doesn't it also need to decide when to look up
| documentation instead of relying on pretraining data?
| sothatsit wrote:
| I believe the skills would contain the documentation. It
| would have been nice for them to give more information on the
| granularity of the skills they created though.
| velcrovan wrote:
| Removing the skill does remove a level of indirection.
|
| It's a difference of "choose whether or not to make use of a
| skill that would THEN attempt to find what you need in the
| docs" vs. "here's a list of everything in the docs that you
| might need."
| rao-v wrote:
| In a month or three we'll have the sensible approach, which is
| smaller cheaper fast models optimized for looking at a query and
| identifying which skills / context to provide in full to the main
| model.
|
| It's really silly to waste big model tokens on throat-clearing
| steps.
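The routing idea in this comment could look something like the sketch below. Word overlap stands in for the small, fast model, so this is only a placeholder for the real selection step. The functions `route` and `build_context` and the sample skills are hypothetical.

```python
def route(query: str, skills: dict[str, str], top_k: int = 1) -> list[str]:
    """Pick the skills whose descriptions share the most words with the query.

    Word overlap is only a placeholder for the small, fast model the
    comment imagines; it is not how any real tool selects skills.
    """
    query_words = set(query.lower().split())
    scored = []
    for name, description in skills.items():
        overlap = len(query_words & set(description.lower().split()))
        scored.append((overlap, name))
    # Highest overlap first; ties broken by name for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for overlap, name in scored[:top_k] if overlap > 0]


def build_context(query: str, skills: dict[str, str], docs: dict[str, str]) -> str:
    """Prepend the full docs of the selected skills to the query."""
    chosen = route(query, skills)
    parts = [docs[name] for name in chosen]
    parts.append(query)
    return "\n\n".join(parts)


# Hypothetical skill descriptions and documentation bodies.
skills = {
    "routing": "define routes and dynamic route segments",
    "caching": "cache responses and revalidate data",
}
docs = {"routing": "ROUTING DOCS", "caching": "CACHING DOCS"}
context = build_context("how do I cache data", skills, docs)
print(context)
```

The main model never has to decide whether to load a skill. It simply receives whatever the router selected, placed in full ahead of the query.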
| Calavar wrote:
| I thought most of the major AI programming tools were already
| doing this. Isn't this what subagents are in Claude code?
| MillionOClock wrote:
| I don't know about Claude Code but in GitHub Copilot as far
| as I can tell the subagents are just always the same model as
| the main one you are using. They also need to be started
| manually by the main agent in many cases, whereas maybe the
| parent comment was referring to calling them more
| deterministically?
| jryan49 wrote:
| Something that I always wonder with each blog post comparing
| different types of prompt engineering is: did they run it once or
| multiple times? LLMs are not consistent for the same task. I
| imagine they realize this of course, but I never get enough
| details of the testing methodology.
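A sketch of what "multiple times" could mean in practice: run the whole suite repeatedly and report the spread, not just one score. The `flaky_agent` function is a made-up stand-in for a real agent call, and its 90% pass probability is an arbitrary assumption.

```python
import random
import statistics
from typing import Callable


def repeated_pass_rates(
    run_case: Callable[[int, random.Random], bool],
    n_cases: int,
    n_runs: int,
    seed: int = 0,
) -> list[float]:
    """Run the whole suite n_runs times and return the pass rate of each run."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_runs):
        passed = sum(run_case(case, rng) for case in range(n_cases))
        rates.append(passed / n_cases)
    return rates


def flaky_agent(case: int, rng: random.Random) -> bool:
    """Stand-in for a real agent call: passes each case 90% of the time."""
    return rng.random() < 0.9


rates = repeated_pass_rates(flaky_agent, n_cases=33, n_runs=20)
print(f"mean {statistics.mean(rates):.2f}, stdev {statistics.stdev(rates):.2f}")
print(f"min {min(rates):.2f}, max {max(rates):.2f}")
```

Reporting the minimum and maximum alongside the mean shows how much a single run of 33 cases can move on its own, before any prompt change is involved.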
| only-one1701 wrote:
| This drives me absolutely crazy. Non-falsifiable and non-
| deterministic results. All of this stuff is (at best) anecdotes
| and vibes being presented as science and engineering.
| bluGill wrote:
| That is my experience. Sometimes the LLM gives good results,
| sometimes it does something stupid. You tell it what to do,
| and like a stubborn 5-year-old it ignores you - even after it
| tries it and fails it will do what you tell it for a while
| and then go back to the thing that doesn't work.
| sothatsit wrote:
| This seems like an issue that will be fixed in newer model
| releases that are better trained to use skills.
| thom wrote:
| You need the model to interpret documentation as policy you care
| about (in which case it will pay attention) rather than as
| something it can look up if it doesn't know something (which it
| will never admit). It helps to really internalise the personality
| of LLMs as wildly overconfident but utterly obsequious.
| smcleod wrote:
| Sounds like they've been using skills incorrectly if they're
| finding their agents don't invoke the skills. I have Claude Code
| agents calling my skills frequently, almost every session. You
| need to make sure your skill descriptions are well defined and
| describe when to use them and that your tasks / goals clearly set
| out requirements that align with the available skills.
| velcrovan wrote:
| I think if you read it, their agents did invoke the skills and
| they did find ways to increase the agents' use of skills quite
| a bit. But the new approach works 100% of the time as opposed
| to 79% of the time, which is a big deal. Skills might be
| working OK for you at that 79% level and for your particular
| codebase/tool set, that doesn't negate anything they've written
| here.
| jgbuddy wrote:
| Am I missing something here?
|
| Obviously directly including context in something like a system
| prompt will put it in context 100% of the time. You could just as
| easily take all of an agent's skills, feed it to the agent (in a
| system prompt, or similar) and it will follow the instructions
| more reliably.
|
| However, at a certain point you have to use skills, because
| including them in the context every time is wasteful, or not
| possible. This is the same reason Anthropic is doing advanced
| tool use (ref:
| https://www.anthropic.com/engineering/advanced-tool-use): there's
| not enough context to straight up include everything.
|
| It's all a context / price trade-off. Obviously, if you have the
| context budget, just include what you can directly (in this case,
| compressing into an AGENTS.md).
| orlandohohmeier wrote:
| I've been using symlinked agent files for about a year as a
| hacky workaround, from before skills became a thing, to load
| additional "context" for different tasks, and it might actually
| address the issue you're talking about. Honestly, it's worked so well
| for me that I haven't really felt the need to change it.
| observationist wrote:
| This is one of the reasons the RLM methodology works so well.
| You have access to as much information as you want in the
| overall environment, but only the things relevant to the task
| at hand get put into context for the current task, and it shows
| up there 100% of the time, as opposed to lossy "memory"
| compaction and summarization techniques, or probabilistic agent
| skills implementations.
|
| Having an agent manage its own context ends up being
| extraordinarily useful, on par with the leap from non-reasoning
| to reasoning chats. There are still issues with memory and
| integration, and other LLM weaknesses, but agents are probably
| going to get extremely useful this year.
| thorum wrote:
| The article presents AGENTS.md as something distinct from Skills,
| but it is actually a simplified instance of the same concept.
| Their AGENTS.md approach tells the AI where to find instructions
| for performing a task. That's a Skill.
|
| I expect the benefit is from better Skill design, specifically,
| minimizing the number of steps and decisions between the AI's
| starting state and the correct information. Fewer transitions ->
| fewer chances for error to compound.
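The compounding argument can be made concrete with a little arithmetic. The sketch assumes every step succeeds independently with the same probability, and the 0.9 figure is invented for illustration, not measured.

```python
def chain_success(p_step: float, n_steps: int) -> float:
    """Probability that n independent steps all succeed."""
    return p_step ** n_steps


# Illustrative numbers only: a 90% per-step success rate is an assumption.
one_step = chain_success(0.9, 1)     # read the index, open the doc
three_steps = chain_success(0.9, 3)  # notice skill, invoke it, find the doc
print(f"1 step: {one_step:.3f}, 3 steps: {three_steps:.3f}")
```

With these made-up numbers, three chained decisions succeed about 73% of the time against 90% for one, which is the shape of the effect the comment describes.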
| CjHuber wrote:
| That feels like a stupid article. Well, of course if you have one
| single thing you want to optimize, putting it into AGENTS.md is
| better. But the advantage of skills is exactly that you don't
| cram them all into the AGENTS file. Let's say you had 3 different
| elaborate things you want the agent to do. Good luck putting them
| all in your AGENTS.md and later hoping that the agent remembers
| any of it. After all, the key advantage of skills is that they
| get loaded at the end of the context when needed.
| sheepscreek wrote:
| It seems their tests rely on Claude alone. It's not safe to
| assume that Codex or Gemini will behave the same way as Claude. I
| use all three and each has its own idiosyncrasies.
| newzino wrote:
| The compressed agents.md approach is interesting, but the
| comparison misses a key variable: what happens when the agent
| needs to do something outside the scope of its instructions?
|
| With explicit skills, you can add new capabilities modularly -
| drop in a new skill file and the agent can use it. With a
| compressed blob, every extension requires regenerating the entire
| instruction set, which creates a versioning problem.
|
| The real question is about failure modes. A skill-based system
| fails gracefully when a skill is missing - the agent knows it
| can't do X. A compressed system might hallucinate capabilities it
| doesn't actually have because the boundary between "things I can
| do" and "things I can't" is implicit in the training rather than
| explicit in the architecture.
|
| Both approaches optimize for different things. Compressed
| optimizes for coherent behavior within a narrow scope. Skills
| optimize for extensibility and explicit capability boundaries.
| The right choice depends on whether you're building a specialist
| or a platform.
| delduca wrote:
| Ah nice... vercel is vibecoded
| BenoitEssiambre wrote:
| Wouldn't this have been more readable with a \n newline instead
| of a pipe operator as a separator? This wouldn't have made the
| prompt longer.
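The length claim is easy to check at the character level. The index string below is a hypothetical example, not the one from the article, and the check says nothing about token counts, which depend on the tokenizer.

```python
# Hypothetical pipe-separated documentation index.
index = "routing:app/routing.md|caching:app/caching.md|forms:app/forms.md"

# Swap each pipe for a newline: one character replaced by one character.
with_newlines = index.replace("|", "\n")

same_length = len(with_newlines) == len(index)
print(with_newlines)
print("same character count:", same_length)
```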
___________________________________________________________________
(page generated 2026-01-29 23:00 UTC)