hngopher.com

       [HN Gopher] Compressed Agents.md > Agent Skills
       ___________________________________________________________________
        
       Compressed Agents.md > Agent Skills
        
       Author : maximedupre
       Score  : 63 points
       Date   : 2026-01-29 13:08 UTC (9 hours ago)
        
 (HTM) web link (vercel.com)
 (TXT) w3m dump (vercel.com)
        
       | ares623 wrote:
       | 2 months later: "Anthropic introduces 'Claude Instincts'"
        
       | EnPissant wrote:
       | This is confusing.
       | 
       | TFA says they added an index to Agents.md that told the agent
       | where to find all documentation and that was a big improvement.
       | 
       | The part I don't understand is that this is exactly how I thought
       | skills work. The short descriptions are given to the model up-
       | front and then it can request the full documentation as it wants.
       | With skills this is called "Progressive disclosure".
       | 
       | Maybe they used more effective short descriptions in the
       | AGENTS.md than they did in their skills?
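
       The "progressive disclosure" flow described above can be sketched
       roughly as follows. This is a minimal illustration, not how any
       particular agent implements skills: the Skill class, the two
       example skills, and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str
    description: str  # short text, always shown to the model
    body: str         # full documentation, loaded only on request


# Hypothetical skills, invented for illustration.
SKILLS = {
    "routing": Skill("routing", "How to define routes and layouts.", "FULL ROUTING DOCS ..."),
    "caching": Skill("caching", "How to cache and revalidate data.", "FULL CACHING DOCS ..."),
}


def build_system_prompt(skills: dict[str, Skill]) -> str:
    """Up-front context: one short line per skill, never the full body."""
    lines = [f"- {s.name}: {s.description}" for s in skills.values()]
    return "Available skills (request one by name to read it):\n" + "\n".join(lines)


def load_skill(skills: dict[str, Skill], name: str) -> str:
    """Second step: the full body enters context only if the model asks."""
    return skills[name].body
```

       The model only sees the bodies it asks for, so everything depends
       on whether it decides to call load_skill at all.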
        
         | NitpickLawyer wrote:
         | The reported tables also don't match the screenshots. And their
         | baselines and tests are too close to tell (judging by the
         | screenshots not tables). 29/33 baseline, 31/33 skills, 32/33
         | skills + use skill prompt, 33/33 agent.md
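
         One way to check whether pass counts like these can be told
         apart is a Fisher exact test on passes versus failures. The
         sketch below assumes each figure comes from a single run over
         33 independent eval cases, which the thread does not confirm;
         the function name is made up for this example.

```python
from math import comb


def fisher_exact_two_sided(pass_a: int, n_a: int, pass_b: int, n_b: int) -> float:
    """Two-sided Fisher exact p-value for two pass/fail counts."""
    fail_a = n_a - pass_a
    fail_b = n_b - pass_b
    total_fail = fail_a + fail_b
    total = n_a + n_b

    def prob(k: int) -> float:
        # Hypergeometric probability of k failures landing in group A.
        return comb(n_a, k) * comb(n_b, total_fail - k) / comb(total, total_fail)

    observed = prob(fail_a)
    lowest = max(0, total_fail - n_b)
    highest = min(n_a, total_fail)
    # Sum every table that is at least as unlikely as the observed one.
    return sum(
        prob(k)
        for k in range(lowest, highest + 1)
        if prob(k) <= observed * (1 + 1e-9)
    )


p_baseline_vs_agents_md = fisher_exact_two_sided(29, 33, 33, 33)
p_skills_vs_agents_md = fisher_exact_two_sided(31, 33, 33, 33)
```

         Under those assumptions, 29/33 versus 33/33 gives a two-sided
         p-value of roughly 0.11, so at this sample size even the
         largest gap in the screenshots would not clear the usual 0.05
         threshold.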
        
         | sally_glance wrote:
         | I also thought this is how skills work, but in practice I
         | experienced similar issues. The agents I'm using (Gemini CLI,
         | Opencode, Claude) all seem to have trouble activating skills on
         | their own unless explicitly prompted. Yeah, probably this will
         | be fixed over the next couple of generations but right now
         | dumping the documentation index right into the agent prompt or
         | AGENTS.md works much better for me. Maybe it's similar to
         | structured output or tool calls which also only started working
         | well after providers specifically trained their models for
         | them.
        
       | tottenhm wrote:
       | > In 56% of eval cases, the skill was never invoked. The agent
       | had access to the documentation but didn't use it.
       | 
       | The agent passes the Turing test...
        
       | pietz wrote:
       | Isn't it obvious that an agent will do better if it internalizes
       | the knowledge on something instead of having the option to
       | request it?
       | 
       | Skills are new. Models haven't been trained on them yet. Give it
       | 2 months.
        
         | WA wrote:
         | Not so obvious, because the model still needs to look up the
         | required doc. The article glosses over this detail a little bit
         | unfortunately. The model needs to decide when to use a skill,
         | but doesn't it also need to decide when to look up
         | documentation instead of relying on pretraining data?
        
           | sothatsit wrote:
           | I believe the skills would contain the documentation. It
           | would have been nice for them to give more information on the
           | granularity of the skills they created though.
        
           | velcrovan wrote:
           | Removing the skill does remove a level of indirection.
           | 
           | It's a difference of "choose whether or not to make use of a
           | skill that would THEN attempt to find what you need in the
           | docs" vs. "here's a list of everything in the docs that you
           | might need."
        
       | rao-v wrote:
       | In a month or three we'll have the sensible approach, which is
       | smaller cheaper fast models optimized for looking at a query and
       | identifying which skills / context to provide in full to the main
       | model.
       | 
       | It's really silly to waste big model tokens on throat-clearing
       | steps.
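
       The routing step proposed above might look something like this.
       Word overlap is used here purely as a stand-in for the small,
       cheap model; the route function and the skill descriptions are
       hypothetical.

```python
def route(query: str, skills: dict[str, str], top_k: int = 1) -> list[str]:
    """Return names of the skills whose descriptions best match the query.

    Stand-in scoring: count of shared lowercase words. A real router
    would presumably be a small model, not word overlap.
    """
    query_words = set(query.lower().split())
    scored = []
    for name, description in skills.items():
        overlap = len(query_words & set(description.lower().split()))
        if overlap > 0:
            scored.append((overlap, name))
    # Highest overlap first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]


# Hypothetical skill descriptions.
SKILLS = {
    "routing": "define routes layouts and navigation",
    "caching": "cache and revalidate fetched data",
}

selected = route("how do I revalidate cached data", SKILLS)
```

       Only the selected skills would then be passed in full to the main
       model, which never has to decide whether to invoke them.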
        
         | Calavar wrote:
         | I thought most of the major AI programming tools were already
         | doing this. Isn't this what subagents are in Claude Code?
        
           | MillionOClock wrote:
           | I don't know about Claude Code but in GitHub Copilot as far
           | as I can tell the subagents are just always the same model as
           | the main one you are using. They also need to be started
           | manually by the main agent in many cases, whereas maybe the
           | parent comment was referring to calling them more
           | deterministically?
        
       | jryan49 wrote:
       | Something that I always wonder with each blog post comparing
       | different types of prompt engineering is did they run it once, or
       | multiple times? LLMs are not consistent for the same task. I
       | imagine they realize this of course, but I never get enough
       | details of the testing methodology.
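
       Reporting several runs instead of one could be as simple as the
       sketch below. The flaky_agent function is a hypothetical stand-in
       that passes about 90% of the time; a real harness would call the
       actual agent, and the 33-case suite size is borrowed from the
       figures quoted elsewhere in the thread.

```python
import random
import statistics


def flaky_agent(case_id: int, rng: random.Random) -> bool:
    """Hypothetical stand-in for a real agent run: passes ~90% of the time."""
    return rng.random() < 0.9


def pass_rates(num_cases: int, trials: int, seed: int = 0) -> list[float]:
    """Run the whole eval suite `trials` times and return each pass rate."""
    rng = random.Random(seed)
    rates = []
    for _ in range(trials):
        passed = sum(flaky_agent(case, rng) for case in range(num_cases))
        rates.append(passed / num_cases)
    return rates


rates = pass_rates(num_cases=33, trials=10)
# Mean, sample standard deviation, worst run, best run.
summary = (statistics.mean(rates), statistics.stdev(rates), min(rates), max(rates))
```

       Publishing the spread alongside the mean makes it possible to
       judge whether a one- or two-case difference is within run-to-run
       noise.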
        
         | only-one1701 wrote:
         | This drives me absolutely crazy. Non-falsifiable and non-
         | deterministic results. All of this stuff is (at best) anecdotes
         | and vibes being presented as science and engineering.
        
           | bluGill wrote:
           | That is my experience. Sometimes the LLM gives good results,
           | sometimes it does something stupid. You tell it what to do,
           | and like a stubborn 5 year old it ignores you. Even after it
           | tries it and fails, it will do what you tell it for a while
           | and then go back to the thing that doesn't work.
        
       | sothatsit wrote:
       | This seems like an issue that will be fixed in newer model
       | releases that are better trained to use skills.
        
       | thom wrote:
       | You need the model to interpret documentation as policy you care
       | about (in which case it will pay attention) rather than as
       | something it can look up if it doesn't know something (which it
       | will never admit). It helps to really internalise the personality
       | of LLMs as wildly overconfident but utterly obsequious.
        
       | smcleod wrote:
       | Sounds like they've been using skills incorrectly if they're
       | finding their agents don't invoke the skills. I have Claude Code
       | agents calling my skills frequently, almost every session. You
       | need to make sure your skill descriptions are well defined and
       | describe when to use them and that your tasks / goals clearly set
       | out requirements that align with the available skills.
        
         | velcrovan wrote:
         | I think if you read it, their agents did invoke the skills and
         | they did find ways to increase the agents' use of skills quite
         | a bit. But the new approach works 100% of the time as opposed
         | to 79% of the time, which is a big deal. Skills might be
         | working OK for you at that 79% level and for your particular
         | codebase/tool set, but that doesn't negate anything they've
         | written here.
        
       | jgbuddy wrote:
       | Am I missing something here?
       | 
       | Obviously directly including context in something like a system
       | prompt will put it in context 100% of the time. You could just as
       | easily take all of an agent's skills, feed it to the agent (in a
       | system prompt, or similar) and it will follow the instructions
       | more reliably.
       | 
       | However, at a certain point you have to use skills, because
       | including them in the context every time is wasteful, or not
       | possible. This is the same reason Anthropic is doing advanced
       | tool use (ref:
       | https://www.anthropic.com/engineering/advanced-tool-use):
       | there's not enough context to straight up include everything.
       | 
       | It's all a context / price trade-off. Obviously, if you have the
       | context budget, just include what you can directly (in this case,
       | compressing into an AGENTS.md).
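
       The trade-off described above can be written as a simple budget
       check: inline everything when it fits, otherwise fall back to an
       index. This is a sketch under loud assumptions: tokens are
       approximated by whitespace-separated words, and build_context and
       the doc paths are hypothetical.

```python
def build_context(docs: dict[str, str], budget_tokens: int) -> str:
    """Inline every doc if it fits the budget, else fall back to an index.

    Token counts are approximated by whitespace-separated words, which is
    an assumption; real tokenizers count differently.
    """
    inline = "\n\n".join(f"## {path}\n{text}" for path, text in docs.items())
    if len(inline.split()) <= budget_tokens:
        return inline
    # Too large: list only the paths so the agent can open files on demand.
    return "Docs index (read a file when needed):\n" + "\n".join(docs)


DOCS = {  # hypothetical paths and contents
    "docs/routing.md": "word " * 50,
    "docs/caching.md": "word " * 50,
}

fits = build_context(DOCS, budget_tokens=200)
too_big = build_context(DOCS, budget_tokens=100)
```

       With the larger budget the full text is always in context; with
       the smaller one only the paths are, and the agent has to choose
       to read them.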
        
         | orlandohohmeier wrote:
         | I've been using symlinked agent files for about a year, since
         | before skills became a thing, as a hacky workaround to load
         | additional "context" for different tasks, and it might
         | actually address the issue you're talking about. Honestly,
         | it's worked so well for me that I haven't really felt the
         | need to change it.
        
         | observationist wrote:
         | This is one of the reasons the RLM methodology works so well.
         | You have access to as much information as you want in the
         | overall environment, but only the things relevant to the task
         | at hand get put into context for the current task, and it shows
         | up there 100% of the time, as opposed to lossy "memory"
         | compaction and summarization techniques, or probabilistic agent
         | skills implementations.
         | 
         | Having an agent manage its own context ends up being
         | extraordinarily useful, on par with the leap from non-reasoning
         | to reasoning chats. There are still issues with memory and
         | integration, and other LLM weaknesses, but agents are probably
         | going to get extremely useful this year.
        
       | thorum wrote:
       | The article presents AGENTS.md as something distinct from Skills,
       | but it is actually a simplified instance of the same concept.
       | Their AGENTS.md approach tells the AI where to find instructions
       | for performing a task. That's a Skill.
       | 
       | I expect the benefit is from better Skill design, specifically,
       | minimizing the number of steps and decisions between the AI's
       | starting state and the correct information. Fewer transitions ->
       | fewer chances for error to compound.
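
       The compounding argument can be made concrete with a little
       arithmetic. The sketch assumes each step succeeds independently
       with the same probability; the 0.9 figure is illustrative only
       and does not come from the article.

```python
def chain_success(per_step: float, steps: int) -> float:
    """Probability that every step in a chain succeeds, assuming independence."""
    return per_step ** steps


one_hop = chain_success(0.9, 1)     # e.g. read the index entry directly
three_hops = chain_success(0.9, 3)  # e.g. notice skill, invoke it, find the doc
```

       At 0.9 per step, one hop succeeds 90% of the time and three hops
       about 73% of the time, so removing a decision point helps even
       when each individual step is fairly reliable.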
        
       | CjHuber wrote:
       | That feels like a stupid article. Well, of course if you have one
       | single thing you want to optimize, putting it into AGENTS.md is
       | better. But the advantage of skills is exactly that you don't
       | cram them all into the AGENTS file. Let's say you had 3 different
       | elaborate things you want the agent to do. Good luck putting them
       | all in your AGENTS.md and later hoping that the agent remembers
       | any of it. After all, the key advantage of skills is that they
       | get loaded to the end of the context when needed.
        
       | sheepscreek wrote:
       | It seems their tests rely on Claude alone. It's not safe to
       | assume that Codex or Gemini will behave the same way as Claude. I
       | use all three and each has its own idiosyncrasies.
        
       | newzino wrote:
       | The compressed agents.md approach is interesting, but the
       | comparison misses a key variable: what happens when the agent
       | needs to do something outside the scope of its instructions?
       | 
       | With explicit skills, you can add new capabilities modularly -
       | drop in a new skill file and the agent can use it. With a
       | compressed blob, every extension requires regenerating the entire
       | instruction set, which creates a versioning problem.
       | 
       | The real question is about failure modes. A skill-based system
       | fails gracefully when a skill is missing - the agent knows it
       | can't do X. A compressed system might hallucinate capabilities it
       | doesn't actually have because the boundary between "things I can
       | do" and "things I can't" is implicit in the training rather than
       | explicit in the architecture.
       | 
       | Both approaches optimize for different things. Compressed
       | optimizes for coherent behavior within a narrow scope. Skills
       | optimize for extensibility and explicit capability boundaries.
       | The right choice depends on whether you're building a specialist
       | or a platform.
        
       | delduca wrote:
       | Ah nice... vercel is vibecoded
        
       | BenoitEssiambre wrote:
       | Wouldn't this have been more readable with a \n newline instead
       | of a pipe operator as a separator? This wouldn't have made the
       | prompt longer.
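
       The length claim is easy to check at the character level, since
       both separators are a single character. The index entries below
       are hypothetical and are not the format used in the article.

```python
# Hypothetical index entries.
entries = ["routing:docs/routing.md", "caching:docs/caching.md", "api:docs/api.md"]

piped = "|".join(entries)
newlined = "\n".join(entries)

# Both separators are one character, so the joined strings match in length.
same_length = len(piped) == len(newlined)
```

       Character counts are identical, but token counts depend on the
       tokenizer and may differ slightly between the two forms.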
        
       ___________________________________________________________________
       (page generated 2026-01-29 23:00 UTC)