[HN Gopher] Claude Skills
       ___________________________________________________________________
        
       Claude Skills
        
       https://www.anthropic.com/engineering/equipping-agents-for-t...
        
       Author : meetpateltech
       Score  : 397 points
       Date   : 2025-10-16 16:05 UTC (6 hours ago)
        
 (HTM) web link (www.anthropic.com)
 (TXT) w3m dump (www.anthropic.com)
        
       | j45 wrote:
        | I wonder if Claude Skills will help return Claude to the level
        | of performance it had a few months ago.
        
       | bicx wrote:
       | Interesting. For Claude Code, this seems to have generous overlap
       | with existing practice of having markdown "guides" listed for
       | access in the CLAUDE.md. Maybe skills can simply make managing
       | such guides more organized and declarative.
        
         | kfarr wrote:
         | Yeah my first thought was, oh it sounds like a bunch of
         | CLAUDE.md's under the surface :P
        
         | crancher wrote:
         | It's interesting (to me) visualizing all of these techniques as
         | efforts to replicate A* pathfinding through the model's vector
         | space "maze" to find the desired outcome. The potential to "one
         | shot" any request is plausible with the right context.
        
           | candiddevmike wrote:
           | > The potential to "one shot" any request is plausible with
           | the right context.
           | 
           | You too can win a jackpot by spinning the wheel just like
           | these other anecdotal winners. Pay no attention to your
           | dwindling credits every time you do though.
        
             | NitpickLawyer wrote:
             | On the other hand, our industry has always chased the "one
             | baby in one month out of 9 mothers" paradigm. While you
             | couldn't do that with humans, it's likely you'll soon (tm)
             | be able to do it with agents.
        
         | j45 wrote:
         | If so, it would be a better way than encapsulating
         | functionality in markdown.
         | 
          | I have been using claude code to create some and organize them
          | but they can have diminishing returns.
        
         | guluarte wrote:
          | it may also suggest that a solution for context rot isn't
          | coming in the foreseeable future
        
       | phildougherty wrote:
       | getting hard to keep up with skills, plugins, marketplaces,
       | connectors, add-ons, yada yada
        
         | prng2021 wrote:
         | Yep. Now I need an AI to help me use AI
        
           | consumer451 wrote:
           | I mean, that is a very common thing that I do.
        
             | wartywhoa23 wrote:
             | That's why the key word for all the AI horror stories that
             | have been emerging lately is "recursion".
        
               | consumer451 wrote:
               | Does that imply no human in the loop? If so, that's not
               | what I meant, or do. Whoever is doing that at this point:
               | bless your heart :)
        
               | mikkupikku wrote:
               | "Recursion" is a word that shows up a lot in the rants of
               | people in AI psychosis (believe they turned the chatbot
               | into god, or believe the chatbot revealed themselves to
               | be god.)
        
           | andoando wrote:
           | Train AI to setup/train AI on doing tasks. Bam
        
           | josefresco wrote:
            | Joking aside, I ask Claude how to use Claude... all the
            | time! Sometimes I ask ChatGPT about Claude. It actually
           | doesn't work well because they don't imbue these AI tools
           | with any special knowledge about how they work, they seem to
           | rely on public documentation which usually lags behind the
           | breakneck pace of these feature-releases.
        
         | gordonhart wrote:
         | Agree -- it's a big downside as a user to have more and more of
         | these provider-specific features. More to learn, more to
         | configure, more to get locked into.
         | 
         | Of course this is why the model providers keep shipping new
         | ones; without them their product is a commodity.
        
         | hansonkd wrote:
          | That's the start of the singularity. The changes will keep
          | accelerating and fewer and fewer people will be able to keep
          | up until only the AIs themselves know how to use them.
        
           | matthewaveryusa wrote:
           | Nah, we'll create AI to manage the AI....oh
        
           | skybrian wrote:
            | People thought the same in the '90s. The argument that
           | technology accelerates and "software eats the world" doesn't
           | depend on AI.
           | 
           | It's not exactly wrong, but it leaves out a lot of
           | intermediate steps.
        
             | xpe wrote:
             | Yes and as we rely on AI to help us choose our tools... the
             | phenomena feels very different, don't you think? Human
             | thinking, writing, talking, etc is becoming less important
             | in this feedback loop seems to me.
        
           | xpe wrote:
            | abstractions all the way down:
            | 
            |     abstraction
            |       abstraction
            |         abstraction
            |           abstraction
            |             ...
        
           | AaronAPU wrote:
           | I don't think these are things to keep up with. Those would
           | be actual fundamental advances in the transformer
           | architecture and core elements around it.
           | 
           | This stuff is like front end devs building fad add-ons which
           | call into those core elements and falsely market themselves
           | as fundamental advancements.
        
         | marcusestes wrote:
         | Agreed, but I think it's actually simple.
         | 
          | Plugins include:
          | 
          |   * Commands
          |   * MCPs
          |   * Subagents
          |   * Now, Skills
         | 
         | Marketplaces aggregate plugins.
        
           | input_sh wrote:
           | It's so simple you didn't even name all of them properly.
        
         | xpe wrote:
         | If I were to say "Claude Skills can be seen as a particular
         | productization of a system prompt" would I be wrong?
         | 
         | From a technical perspective, it seems like unnecessary
         | complexity in a way. Of course I recognize there are lot of
         | product decisions that seem to layer on 'unnecessary'
         | abstractions but still have utility.
         | 
         | In terms of connecting with customers, it seems sensible, under
         | the assumption that Anthropic is triaging customer feedback
          | well _and_ leading to where they want to go (even if they
          | don't know it yet).
         | 
         |  _Update_ : a sibling comment just wrote something quite
         | similar: "All these things are designed to create lock in for
         | companies. They don't really fundamentally add to the
         | functionality of LLMs." I think I agree.
        
         | tempusalaria wrote:
         | All these things are designed to create lock in for companies.
         | They don't really fundamentally add to the functionality of
          | LLMs. Devs should focus on working directly with model
          | generation APIs and not using all the decoration.
        
           | tqwhite wrote:
           | Me? I love some lock in. Give me the coolest stuff and I'll
           | be your customer forever. I do not care about trying to be my
           | own AI company. I'd feel the same about OpenAI if they got me
           | first... but they didn't. I am team Anthropic.
        
         | dominicq wrote:
         | Features will be added until morale improves
        
         | hansmayer wrote:
         | Well, have some understanding: the good folks need to produce
          | _something_, since their main product is not delivering the
          | much yearned-for era of joblessness yet. It's not for you, it's
          | signalling to their investors - see, we're not burning your cash
         | paying a bunch of PhDs to tweak the model weights without
         | visible results. We are actually building products. With a huge
         | and willing A/B testing base.
        
         | hiq wrote:
         | IMHO, don't, don't keep up. Just like "best practices in prompt
         | engineering", these are just temporary workaround for current
         | limitations, and they're bound to disappear quickly. Unless you
         | really need the extra performance right now, just wait until
         | models get you this performance out of the box instead of
         | investing into learning something that'll be obsolete in
         | months.
        
           | spprashant wrote:
           | I agree with this take. Models and the tooling around them
            | are both in flux. I'd rather not spend time learning
            | something in detail only for these companies to then pull
            | the plug chasing the next big thing.
        
           | lukev wrote:
           | I agree with your conclusion not to sweat all these features
           | too much, but only because they're not hard at all to
           | understand on demand once you realize that they all boil down
           | to a small handful of ways to manipulate model context.
           | 
            | But context engineering is very much not going anywhere as a
           | discipline. Bigger and better models will _by no means_ make
           | it obsolete. In fact, raw model capability is pretty clearly
           | leveling off into the top of an S-curve, and most real-world
           | performance gains over the last year have been precisely
           | _because_ of innovations on how to better leverage context.
        
           | vdfs wrote:
            | IMO, these are just marketing or new ways of using function
            | calling; under the hood they all get re-written as tools the
           | model can call
        
         | adidoit wrote:
         | All of it is ultimately managing the context for a model. Just
         | different methods
        
       | BoredPositron wrote:
        | It is a bit ironic that the better the models get, the more
        | user input they seem to need.
        
         | quintu5 wrote:
         | More like they can better react to user input within their
         | context window. With older models, the value of that additional
         | user input would have been much more limited.
        
       | nozzlegear wrote:
       | It superficially reminds me of the old "Alexa Skills" thing (I'm
       | not even sure if Alexa still has "Skills"). It might just be the
       | name making that connection for me.
        
         | j45 wrote:
         | Seems to be a bit more than that.
        
         | phildougherty wrote:
         | Alexa skills are 3rd party add-ons/plugins. Want to control
          | your hue lights? Add the Philips Hue skill. I think claude
         | skills in an alexa world would be like having to seed alexa
         | with a bunch of context for it to remember how to turn my
         | lights on and off or it will randomly attempt a bunch of
         | incorrect ways of doing it until it gets lucky.
        
         | candiddevmike wrote:
         | And how many of those Alexa Skills are still being updated...
         | 
          | This is where waiting for this stuff to stabilize/standardize,
         | and then writing a "skill" based on an actual RFC or standard
         | protocol makes more sense, IMO. I've been burned too many times
         | building vendor-locked chatbot extensions.
        
           | nozzlegear wrote:
           | > And how many of those Alexa Skills are still being
           | updated...
           | 
           | Not mine! I made a few when they first opened it up to devs,
           | but I was trying to use Azure Logic Apps (something like
           | that?) at the time which was supremely slow and finicky with
           | F#, and an exercise in frustration.
        
       | joilence wrote:
        | If I understand correctly, it looks like a `skill` is an
        | instructed usage / pattern of tools, so it saves the LLM
        | agent's effort at trial & error when using tools? And it's
        | basically just a prompt.
        
       | sshine wrote:
       | I love how the promise of free labor motivates everyone to become
       | API first, document their practices, and plan ahead in writing
       | before coding.
        
         | ebiester wrote:
         | It helps that you can have the "free" labor document the
         | processes and build the plan.
        
         | skybrian wrote:
         | Cheaper, not free. Also, no training to learn a new skill.
         | 
         | Building a new one that works well is a project, but then it
         | will scale up as much as you like.
         | 
         | This is bringing some of the advantages of software development
         | to office tasks, but you give up some things like reliable,
         | deterministic results.
        
           | sshine wrote:
           | There is an acquisition cost of researching and developing
           | the LLM, but the running cost should not be classified as a
            | wage, hence the cost of labor is zero.
        
             | maigret wrote:
             | It's still opex for finance
        
             | skybrian wrote:
             | Don't call it "free labor" at all then? Regardless, running
             | an LLM is usually not free.
        
       | _pdp_ wrote:
       | At first I wasn't sure what this is. Upon further inspection
       | skills are effectively a bunch of markdown files and scripts that
       | get unzipped at the right time and used as context. The scripts
       | are executed to get deterministic output.
       | 
       | The idea is interesting and something I shall consider for our
       | platform as well.
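        | 
        | As a rough sketch (hypothetical file names, inferred from the
        | docs rather than copied from them), a skill appears to be a
        | folder along these lines:
        | 
        |     my-skill/
        |       SKILL.md        <- YAML frontmatter (name, description)
        |                          plus the markdown instructions
        |       reference.md    <- extra detail pulled in only when needed
        |       scripts/
        |         helper.py     <- run for deterministic output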
        
       | nperez wrote:
       | Seems like a more organized way to do the equivalent of a folder
       | full of md files + instructing the LLM to ls that folder and read
       | the ones it needs
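        | 
        | The manual equivalent is roughly a note like this in CLAUDE.md
        | (paths made up):
        | 
        |     ## Guides
        |     Task-specific guides live in docs/guides/. Run
        |     `ls docs/guides` and read only the files relevant to the
        |     task at hand before starting work.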
        
         | j45 wrote:
          | If so it would be most welcome, since LLMs don't always
          | follow a folder full of MD files with the same depth and
          | consistency.
        
           | RamtinJ95 wrote:
           | what makes it more likely that claude would read these .md
           | files then?
        
             | phildougherty wrote:
             | trained to
        
             | j45 wrote:
              | Skills are hopefully loaded through a deterministic
              | process that is guaranteed to occur, instead of a
              | non-deterministic one that can only ever be counted on
              | to happen most of the time (the way it is now).
        
       | meetpateltech wrote:
       | Detailed engineering blog:
       | 
       | "Equipping agents for the real world with Agent Skills"
       | https://www.anthropic.com/engineering/equipping-agents-for-t...
        
         | dang wrote:
         | Thanks, we'll put that link in the toptext as well
        
       | jampa wrote:
       | I think this is great. A problem with huge codebases is that
       | CLAUDE.md files become bloated with niche workflows like CI and
       | E2E testing. Combined with MCPs, this pollutes the context window
       | and eventually degrades performance.
       | 
       | You get the best of both worlds if you can select tokens by
       | problem rather than by folder.
       | 
       | The key question is how effective this will be with tool calling.
        
       | crancher wrote:
       | Seems like the exact same thing, from front page a few days ago:
       | https://github.com/obra/superpowers/tree/main
        
       | Flux159 wrote:
       | I wonder how this works with mcpb (renamed from dxt Desktop
       | extensions): https://github.com/anthropics/mcpb
       | 
       | Specifically, it looks like skills are a different structure than
       | mcp, but overlap in what they provide? Skills seem to be just
        | markdown files & scripts (instead of prompts & tool calls
       | defined in MCP?).
       | 
       | Question I have is why would I use one over the other?
        
         | rahimnathwani wrote:
          | One difference I see is that with tool calls the LLM doesn't
          | see the actual code. It delegates the task to the tool. With
         | scripts in an agent, I _think_ the agent can see the code being
         | run and can decide to run something different. I may be wrong
         | about this. The documentation says that assets aren't read into
         | context. It doesn't say the same about scripts, which is what
         | makes me think the LLM can read them.
        
       | irtemed88 wrote:
       | Can someone explain the differences between this and Agents in
       | Claude Code? Logically they seem similar. From my perspective it
       | seems like Skills are more well-defined in their behavior and
       | function?
        
         | j45 wrote:
         | Skills might be used by Agents.
         | 
         | Skills can merge together like lego.
         | 
         | Agents might be more separated.
        
         | rahimnathwani wrote:
         | Subagents have their own context. Skills do not.
        
       | ryancnelson wrote:
       | The uptake on Claude-skills seems to have a lot of momentum
       | already! I was fascinated on Tuesday by "Superpowers" ,
       | https://blog.fsck.com/2025/10/09/superpowers/ ... and then
        | packaged up all the tool-building I've been working on for a
        | while into somewhat tidy skills that I can delegate agents to:
        | 
        | http://github.com/ryancnelson/deli-gator
        | 
        | I'd love any feedback.
        
         | skinnymuch wrote:
         | Delegation is super cool. I can sometimes end up having too
          | much Linear issue context coming in. E.g., frequently I want
          | just a Linear issue's description and last comment retrieved,
          | but the Linear MCP grabs all comments, which pollutes the
          | context and fills it up too much.
        
       | mousetree wrote:
       | I'm perplexed why they would use such a silly example in their
       | demo video (rotating an image of a dog upside down and cropping).
       | Surely they can find more compelling examples of where these
       | skills could be used?
        
         | alansaber wrote:
         | Dog photo >> informing the consumer
        
         | Mouvelie wrote:
         | You'd think so, eh ?
         | https://en.wikipedia.org/wiki/The_purpose_of_a_system_is_wha...
        
         | antiloper wrote:
         | The developer page uses a better example, a PDF processing
         | skill: https://github.com/anthropics/skills/tree/main/document-
         | skil...
         | 
         | I've been emulating this in claude code by manually @tagging
         | markdown files containing guides for common tasks in our
         | repository. Nice to see that this step is now automatic as
         | well.
        
         | mritchie712 wrote:
         | this is the best example I found
         | 
         | https://github.com/anthropics/skills/blob/main/document-skil...
         | 
         | I was dealing with 2 issues this morning getting Claude to
         | produce a .xlsx that are covered in the doc above
        
       | bgwalter wrote:
       | "Skills are repeatable and customizable instructions that Claude
       | can follow in any chat."
       | 
       | We used to call that a programming language. Here, they are
       | presumably repeatable instructions how to generate stolen code or
       | stolen procedures so users have to think even less or not at all.
        
       | azraellzanella wrote:
       | "Keep in mind, this feature gives Claude access to execute code.
       | While powerful, it means being mindful about which skills you use
       | --stick to trusted sources to keep your data safe."
       | 
       | Yes, this can only end well.
        
       | m3kw9 wrote:
        | I feel like this is making things more complicated than they
        | need to be. LLMs should do this automatically behind the
        | scenes; you shouldn't even see it.
        
       | Imnimo wrote:
       | I feel like a danger with this sort of thing is that the
       | capability of the system to use the right skill is limited by the
       | little blurb you give about what the skill is for. Contrast with
       | the way a human learns skills - as we gain experience with a
       | skill, we get better at understanding when it's the right tool
       | for the job. But Claude is always starting from ground zero and
       | skimming your descriptions.
        
         | j45 wrote:
          | LLMs are probability-based calculations, so they will always
          | skim to some degree, always guess to some degree, and often
          | pick the best choice available to them even though it might
          | not be the best overall.
          | 
          | For folks to whom this seems elusive, it's worth learning how
          | the internals actually work; it helps a great deal in how to
          | structure things in general, and then over time, as the parent
          | comment said, specifically for individual cases.
        
         | zobzu wrote:
          | IMO this is a context window issue. Humans are pretty good at
          | memorizing super broad context without great accuracy.
         | Sometimes our "recall" function doesn't even work right ("How
         | do you say 'blah' in German again?"), so the more you
         | specialize (say, 10k hours / mastery), the better you are at
         | recalling a specific set of "skills", but perhaps not other
         | skills.
         | 
          | On the other hand, LLMs have a programmatic context with
         | consistent storage and the ability to have perfect recall, they
         | just don't always generate the expected output in practice as
         | the cost to go through ALL context is prohibitive in terms of
         | power and time.
         | 
         | Skills.. or really just context insertion is simply a way to
         | prioritize their output generation manually. LLM "thinking
         | mode" is the same, for what it's worth - it really is just
         | reprioritizing context - so not "starting from scratch" per se.
         | 
         | When you start thinking about it that way, it makes sense - and
         | it helps using these tools more effectively too.
        
           | dwaltrip wrote:
           | There are ways to compensate for lack of "continual
           | learning", but recognizing that underlying missing piece is
           | important.
        
           | ryancnelson wrote:
           | I commented here already about deli-gator (
           | https://github.com/ryancnelson/deli-gator ) , but your
           | summary nailed what I didn't mention here before: Context.
           | 
            | I'd been re-teaching Claude to craft REST API calls with
            | curl every morning for months before I realized that skills
            | would let me delegate that to cheaper models, re-using
            | cached-token queries, and save my context window for my
            | actual problem-space CONTEXT.
        
             | dingnuts wrote:
             | >I'd been re-teaching Claude to craft Rest-api calls with
             | curl every morning for months
             | 
             | what the fuck, there is absolutely no way this was cheaper
             | or more productive than just learning to use curl and
             | writing curl calls yourself. Curl isn't even hard! And if
             | you learn to use it, you get WAY better at working with
             | HTTP!
             | 
             | You're kneecapping yourself to expend more effort than it
             | would take to just write the calls, helping to train a bot
             | to do the job you should be doing
        
               | jmtulloss wrote:
               | My interpretation of the parent comment was that they
               | were loading specific curl calls into context so that
               | Claude could properly exercise the endpoints after making
               | changes.
        
               | F7F7F7 wrote:
               | He's likely talking about Claude's hook system that
               | Anthropic created to provide better control over context.
        
               | ryancnelson wrote:
                | _I_ know how to use curl. (I was a contributor before
                | git existed) ... watching Claude iterate to re-learn
                | whether to try application/x-form-urlencoded or GET
                | /?foo wastes SO MUCH time and fills your context with
                | "how to curl" that you re-send over and over again until
                | your context compacts.
               | 
               | You are bad at reading comprehension. My comment meant I
               | can tell Claude "update jira with that test outcome in a
               | comment" and, Claude can eventually figure that out with
               | just a Key and curl, but that's way too low level.
               | 
               | What I linked to literally explains that, with code and a
               | blog post.
        
           | mbesto wrote:
           | > IMO this is a context window issue.
           | 
           | Not really. It's a consequential issue. No matter how big or
           | small the context window is, LLMs simply do not have the
           | concept of goals and consequences. Thus, it's difficult for
           | them to acquire dynamic and evolving "skills" like humans do.
        
         | seunosewa wrote:
         | The blurbs can be improved if they aren't effective. You can
         | also invoke skills directly.
         | 
         | The description is equivalent to your short term memory.
         | 
         | The skill is like your long term memory which is retrieved if
         | needed.
         | 
         | These should both be considered as part of the AI agent. Not
         | external things.
        
         | blackoil wrote:
         | Most of the experience is general information not specific to
         | project/discussion. LLM starts with all that knowledge. Next it
         | needs a memory and lookup system for project specific
         | information. Lookup in humans is amazingly fast, but even with
         | a slow lookup, LLMs can refer to it in near real-time.
        
         | andruby wrote:
         | Would this requirement to start from ground zero in current
         | LLMs be an artefact of the requirement to have a "multi-tenant"
         | infrastructure?
         | 
         | Of course OpenAI and Anthropic want to be able to reuse the
         | same servers/memory for multiple users, otherwise it would be
         | too expensive.
         | 
         | Could we have "personal" single-tenant setups? Where the LLM
         | incorporates every previous conversation?
        
         | mbesto wrote:
         | > Contrast with the way a human learns skills - as we gain
         | experience with a skill, we get better at understanding when
         | it's the right tool for the job.
         | 
         | Which is precisely why Richard Sutton doesn't think LLMs will
         | evolve to AGI[0]. LLMs are based on mimicry, not experience, so
         | it's more likely (according to Sutton) that AGI will be based
         | on some form of RL (reinforcement learning) and not neural
         | networks (LLMs).
         | 
         | More specifically, LLMs don't have goals and consequences of
         | actions, which is the foundation for intelligence. So, to your
         | point, the idea of a "skill" is more akin to a reference
         | manual, than it is a skill building exercise that can be
         | applied to developing an instrument, task, solution, etc.
         | 
         | [0] https://www.youtube.com/watch?v=21EYKqUsPfg
        
           | buildbot wrote:
           | The industry has been doing RL on many kinds of neural
           | networks, including LLMs, for quite some time. Is this person
            | saying we do RL on some kind of non-neural-network design?
            | Why is that more likely to bring AGI than an LLM?
           | 
           | > More specifically, LLMs don't have goals and consequences
           | of actions, which is the foundation for intelligence.
           | 
           | Citation?
        
             | jfarina wrote:
             | Why are you asking them to cite something for that
             | statement? Are you questioning whether it's the foundation
             | for intelligence or whether LLMS understand goals and
             | consequences?
        
               | buildbot wrote:
               | Yes, I'm questioning if that's the foundation of
               | intelligence. Says who?
        
               | mbesto wrote:
               | Richard Sutton. He won a Turing Award. Why ask your
               | question above when you can just watch the YouTube link I
               | posted?
        
             | anomaloustho wrote:
             | Looks like they added the link. But I think it's doing RL
             | in realtime vs pre-trained as an LLM is.
             | 
             | And I associate that part to AGI being able to do cutting
              | edge research and explore new ideas like humans can. When
              | that seems to "happen" with LLMs, it's been more debatable
              | (e.g. there was an existing paper that the LLM was able to
              | tap into).
             | 
             | I guess another example would be to get an AGI doing RL in
             | realtime to get really good at a video game with completely
             | different mechanics in the same way a human could. Today,
             | that wouldn't really happen unless it was able to pre-train
             | on something similar.
        
               | ibejoeb wrote:
               | I don't think any of the commercial models are doing RL
                | at the consumer end. The R is just accepting or
                | rejecting the action, right?
        
           | hbarka wrote:
           | For humans, it's not uncommon to have a clever realization by
            | way of serendipity. How do you skill AI to have serendipity?
        
           | mediaman wrote:
           | It's a false dichotomy. LLMs are already being trained with
           | RL to have goal directedness.
           | 
           | He is right that non-RL'd LLMs are just mimicry, but the
           | field already moved beyond that.
        
             | leptons wrote:
             | I can't wait to try to convince an LLM/RL/whatever-it-is
             | that what it "thinks" is right is actually wrong.
        
             | dingnuts wrote:
             | Explain something to me that I've long wondered: how does
             | Reinforcement Learning work if you cannot measure your
             | distance from the goal? In other words, how can RL be used
             | for literally anything qualitative?
        
               | kmacdough wrote:
                | This is one of the known hardest parts of RL. The short
               | answer is human feedback.
               | 
               | But this is easier said than done. Current models require
               | vastly more learning events than humans, making direct
                | supervision infeasible. One strategy is to train models
               | on human supervisors, so they can bear the bulk of the
               | supervision. This is tricky, but has proven more
               | effective than direct supervision.
               | 
               | But, in my experience, AIs don't specifically struggle
               | with the "qualitative" side of things per-se. In fact,
               | they're great at things like word choice, color theory,
               | etc. Rather, they struggle to understand continuity,
               | consequence and to combine disparate sources of input.
               | They also suck at differentiating fact from fabrication.
                | To speculate wildly, it feels like it's missing the
                | RL of living in the "real world". In order to eat, sleep
                | and breathe, you must operate within the bounds of physics
               | and society and live forever with the consequences of an
               | ever-growing history of choices.
        
               | mbesto wrote:
               | This 100%.
               | 
                | While we might agree that language is foundational to
                | what it is to be human, it's myopic to think it's the only
                | thing. LLMs are based on training sets of language
               | (period).
        
             | anomaloustho wrote:
             | I wrote elsewhere but I'm more interpreting this
             | distinction as "RL in real-time" vs "RL beforehand".
        
               | munchler wrote:
               | I agree with this description, but I'm not sure we really
               | want our AI agents evolving in real time as they gain
               | experience. Having a static model that is thoroughly
               | tested before deployment seems much safer.
        
               | mbesto wrote:
               | > Having a static model that is thoroughly tested before
               | deployment seems much safer.
               | 
                | While that might be true, it fundamentally means it's
                | never going to replicate humans or provide
                | superintelligence.
        
             | baxtr wrote:
             | So it's on-the-fly adaptive mimicry?
        
             | OtherShrezzing wrote:
             | In the interview transcript, he seems aware that the field
             | is doing RL, and he makes a compelling argument that
             | bootstrapping isn't as scalable as a purely RL trained AI
             | would be.
        
             | mbesto wrote:
             | > LLMs are already being trained with RL to have goal
             | directedness.
             | 
             | That might be true, but we're talking about the
             | fundamentals of the concept. His argument is that you're
             | never going to reach AGI/super intelligence on an evolution
             | of the current concepts (mimicry) even through fine tuning
             | and adaptions - it'll like be different (and likely based
             | on some RL technique). At least we have NO history to
             | suggest this will be case (hence his argument for "the
             | bitter lesson").
        
             | samrus wrote:
            | The LLMs don't have RL baked into them. They need that at
             | the token prediction level to be able to do the sort of
             | things humans can do
        
           | vonneumannstan wrote:
           | This is an uninformed take. Much of the improvement in
           | performance of LLM based models has been through RLHF and
           | other RL techniques.
        
             | mbesto wrote:
             | > This is an uninformed take.
             | 
             | You may disagree with this take but its not uninformed.
             | Many LLMs use self-supervised pretraining followed by RL-
             | based fine-tuning but that's essentially it - it's fine
             | tuning.
        
           | skurilyak wrote:
           | Besides a "reference manual", Claude Skills is analogous to a
           | "toolkit with an instruction manual" in that it includes both
           | instructions (manuals) and executable functions (tools/code)
        
         | ChadMoran wrote:
         | This is the crux of knowledge/tool enrichment in LLMs. The idea
         | that we can have knowledge bases and LLMs will know WHEN to use
         | them is a bit of a pipe dream right now.
        
           | fragmede wrote:
           | Can you be more specific? The simple case seems to be solved,
           | eg if I have an mcp for foo enabled and then ask about a list
           | of foo, Claude will go and call the list function on foo.
        
             | corytheboyd wrote:
             | > [...] and then ask about a list of foo
             | 
             | Not OP, but this is the part that I take issue with. I want
             | to forget what tools are there and have the LLM figure out
             | on its own which tool to use. Having to remember to add
             | special words to encourage it to use specific tools
             | (required a lot of the time, especially with esoteric
             | tools) is annoying. I'm not saying this renders the whole
             | thing "useless" because it's good to have some idea of what
             | you're doing to guide the LLM anyway, but I wish it could
             | do better here.
        
             | ChadMoran wrote:
             | It doesn't reliably do it. You need to inject context into
             | the prompt to instruct the LLM to use tools/kb/etc. It
              | isn't deterministic whether or when it will follow through.
        
       | fridder wrote:
        | All of these random features are just pushing me further towards
       | model agnostic tools like goose
        
         | xpe wrote:
         | Thanks for sharing goose.
         | 
         | This phase of LLM product development feels a bit like the
         | Tower of Babel days with Cloud services before wrapper tools
         | became popular and more standardization happened.
        
         | cesarvarela wrote:
         | I wonder how much this affects the model's performance. I
         | imagine Anthropic trains its models to use a generic set of
         | tools, but they can also lean on their specific tool
         | definitions to save the agent from having to guess which tool
         | for what.
        
       | asdev wrote:
        | I wonder how reliably Claude will actually follow a Skill.
        | I've had trouble getting LLMs to follow specific
       | workflows 100% consistently without skipping or missing steps.
        
       | rob wrote:
       | Subagents, plugins, skills, hooks, mcp servers, output styles,
       | memory, extended thinking... seems like a bunch of stuff you can
       | configure in Claude Code that overlap in a lot of areas. Wish
       | they could figure out a way to simplify things.
        
         | singularity2001 wrote:
          | Also the post does not contain a single word about how it
          | relates to the very similar agents in claude code. Capabilities,
         | connectors, tasks, apps, custom-gpts, ... the space needs some
         | serious consolidation and standardization!
         | 
         | I noticed the general tendency for overlap also when trying to
         | update claude since 3+ methods conflicted with each other
         | (brew, curl, npm, bun, vscode).
         | 
         | Might this be the handwriting of AI? ;)
        
           | kordlessagain wrote:
           | The post is simply "here's a folder with crap in it I may or
           | may not use".
        
         | CuriouslyC wrote:
         | My agent has handlebars system prompts that you can pass
         | variables at orchestration time. You can cascade imports and
         | such, it's really quite powerful; a few variables can result in
          | a radically different system prompt.
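          | 
          | Roughly this kind of thing (simplified sketch, variable names
          | made up):
          | 
          |     You are a {{role}} agent working in {{repo_name}}.
          |     {{#if use_devcontainer}}
          |     All builds and tests run inside the dev container.
          |     {{/if}}
          |     {{> coding_conventions}}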
        
       | _greim_ wrote:
       | > Developers can also easily create, view, and upgrade skill
       | versions through the Claude Console.
       | 
       | For coding in particular, it would be super-nice if they could
       | just live in a standard location in the repo.
        
         | GregorStocks wrote:
         | Looks like they do:
         | 
         | > You can also manually install skills by adding them to
         | ~/.claude/skills.
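          | 
          | So presumably something like this (illustrative layout, skill
          | name made up):
          | 
          |     ~/.claude/skills/
          |       pdf-tools/
          |         SKILL.md
          |         scripts/
          |           extract_fields.py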
        
       | deeviant wrote:
       | Basically just rules/workflows from cursor/windsurf, but with a
       | UI.
        
       | pixelpoet wrote:
       | Aside: I really love Anthropic's design language, so beautiful
       | and functional.
        
         | maigret wrote:
         | Yes and fantastically executed, consistently through all their
         | products and website - desktop, command line, third parties and
         | more.
        
         | lukev wrote:
         | I agree 100%, except for the logo, which persistently looks
         | like something they... probably did not intend.
        
           | nozzlegear wrote:
           | I always thought of it as an ink blot. Until now.
        
           | micromacrofoot wrote:
           | a helpful reminder that these things often speak from their
           | asses
        
       | jasonthorsness wrote:
       | When the skill is used locally in Claude Code does it still run
       | in a virtual machine? Like some sort of isolation container with
       | the target directory mounted?
        
       | xpe wrote:
       | Better when blastin' Skills by Gang Starr (headphones recommended
       | if at work):
       | 
       | https://www.youtube.com/watch?v=Lgmy9qlZElc
        
       | 999900000999 wrote:
       | Can I just tell it to read the entire Godot source repo as a
        | skill?
       | 
       | Or is there some type of file limit here. Maybe the context
       | windows just aren't there yet, but it would be really awesome if
       | coding agents would stop trying to make up functions.
        
         | s900mhz wrote:
          | Download the Godot docs and tell the skill to use them. It
         | won't be able to fit the entire docs in the context but that's
         | not the point. Depending on the task it will search for what it
         | needs
        
       | dearilos wrote:
       | We're trying to solve a similar problem at wispbit - this is an
       | interesting way to do it!
        
       | CuriouslyC wrote:
       | Anything the model chooses to use is going to waste context and
       | get utilized poorly. Also, the more skills you have, the worse
       | they're going to be. It's subagents v2.
       | 
       | Just use slash commands, they work a lot better.
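        | 
        | For anyone unfamiliar: a custom slash command is just a
        | markdown file under .claude/commands/. A hypothetical
        | .claude/commands/fix-lint.md might contain:
        | 
        |     Run the project's linter, then fix every reported issue
        |     without changing behavior. Extra scope: $ARGUMENTS
        | 
        | and you invoke it as /fix-lint.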
        
       | just-working wrote:
       | I simply do not care about anything AI now. I have a severe
       | revulsion to it. I miss the before times.
        
       | sega_sai wrote:
       | There seems to be a lot of overlap of this with MCP tools. Also
       | presumably if there are a lot of skills, they will be too big for
       | the context and one would need some way to find the right one. It
       | is unclear how well this approach will scale.
        
         | rahimnathwani wrote:
         | Anthropic talks about 'progressive disclosure'.
         | 
         | If you have a large number of skills, you could group them into
         | a smaller number of skills each with subskills. That way not
         | all the (sub)skill descriptions need to be loaded into context.
         | 
         | For example, instead of having a 'PDF editing' skill, you can
         | have a 'file editing' skill that, when loaded into context,
         | tells the LLM what type of files it can operate on. And then
         | the LLM can ask for the info about how to do stuff with PDF
         | files.
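          | 
          | Sketched out (made-up names), that might look like:
          | 
          |     file-editing/
          |       SKILL.md   <- short description: "edits PDF, DOCX and
          |                     XLSX files; see the per-format guides"
          |       pdf.md
          |       docx.md
          |       xlsx.md
          | 
          | so only the short description sits in context until, say, a
          | PDF task actually comes up.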
        
       | guluarte wrote:
       | great! another set of files the models will completely ignore
       | like CLAUDE.md
        
       | simonw wrote:
       | I accidentally leaked the existence of these last Friday, glad
       | they officially exist now!
       | https://simonwillison.net/2025/Oct/10/claude-skills/
        
         | buildbot wrote:
         | "So I fired up a fresh Claude instance (fun fact: Code
         | Interpreter also works in the Claude iOS app now, which it
         | didn't when they first launched) and prompted:
         | 
         | Create a zip file of everything in your /mnt/skills folder"
         | 
         | It's a fun, terrifying world that this kind of "hack" to
         | exfiltrate data is possible! I hope it does not have full
         | filesystem/bin access, lol. Can it SSH?...
        
           | antiloper wrote:
           | What's the hack? Instead of typing `zip -r mnt.zip /mnt` into
           | bash, you type `Create a zip file of /mnt` in claude code.
           | It's the same thing running as the same user.
        
             | tgtweak wrote:
              | Skills run remotely in the LLM environment, not locally on
             | your system running claude - worth noting.
        
         | skylurk wrote:
         | Woah, Jesse's blog has really come alive lately. Thanks for
         | highlighting this post.
        
       | sva_ wrote:
       | All this AI, and yet it can't render properly on mobile.
        
       | mikkupikku wrote:
       | I'd love a Skill for effective use of subagents in Claude Code.
       | I'm still struggling with that.
        
       | arjie wrote:
       | It's pretty neat that they're adding these things. In my
       | projects, I have a `bin/claude` subdirectory where I ask it to
       | put scripts etc. that it builds. In the claude.md I then note
       | that it should look there for tools. It does a pretty good job of
       | this. To be honest, the thing I most need are context-management
       | helpers like "start a claude with this set of MCPs, then that
       | set, and so on". Instead right now I have separate subdirectories
       | that I then treat as projects (which are supported as profiles in
       | Claude) which I then launch a `claude` from. The advantage of the
       | `bin/claude` in each of these things is that it functions as a
       | longer-cycle learning thing. My Claude instantly knows how to
       | analyze certain BigQuery datasets and where to find the
       | credentials file and so on.
       | 
       | Filesystem as profile manager is not something I thought I'd be
       | doing, but here we are.
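        | 
        | Concretely, the note in the claude.md is just something like
        | (paths and names made up):
        | 
        |     ## Local tooling
        |     Reusable scripts you have already written live in
        |     bin/claude/. Check there before writing a new one.
        |     BigQuery credentials are in .secrets/bq.json.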
        
         | tomComb wrote:
         | > the thing I most need are context-management helpers like
         | "start a claude with this set of MCPs, then that set, and so
         | on".
         | 
         | Isn't that sub agents?
        
           | arjie wrote:
           | Ah, in my case, I want to just talk to a video-editing
           | Claude, and then a sys-admin Claude, and so on. I don't want
           | to go through a main Claude who will instantiate these guys.
           | I want to talk to the particular Claudes myself. But if sub-
           | agents work for this, then maybe I just haven't been using
           | them well.
        
       | iyn wrote:
       | Does anyone know how skills relate to subagents? Seems that
       | subagents have more capabilities (e.g. can access the internet)
        | but it seems there's a lot of overlap.
       | 
        | I asked Claude, and this is how it answered:
        | 
        |     Skills = Instructions + resources for the current Claude
        |     instance (shared context)
        | 
        |     Subagents = Separate AI instances with isolated contexts
        |     that can work in parallel (different context windows)
        | 
        |     Skills make Claude better at specific tasks. Subagents are
        |     like having multiple specialized Claudes working
        |     simultaneously on different aspects of a problem.
       | 
        | I imagine we can probably compose them, e.g. invoke subagents
        | (to keep a separate context) which could use some skills and,
        | in the end, summarize the findings/provide output without
        | "polluting" the main context window.
        
         | lukev wrote:
         | How this reads to me is that a skill is "just" a bundle of
         | prompts, scripts, and files that can be read into context as a
         | unit.
         | 
         | Having a sub-agent "execute" a skill makes a lot of sense from
          | a context-management perspective, but I think the way to think
         | about it is that a sub-agent is an "execution-level" construct,
         | whereas a skill is a "data-level" construct.
        
           | throwup238 wrote:
           | Skills can also contain scripts that can be executed in a VM.
           | The Anthropic engineering blog mentions that you can specify
           | in the markdown instructions whether the script should be
           | executed or read into context. One of their examples is a
           | script to extract properties from a PDF file.
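            | 
            | As an illustration (not Anthropic's actual script), a
            | bundled helper along those lines might look like this,
            | assuming the pypdf library is available in the sandbox:
            | 
            |     # extract_pdf_props.py - hypothetical skill script
            |     import sys
            |     from pypdf import PdfReader
            | 
            |     reader = PdfReader(sys.argv[1])
            |     print("pages:", len(reader.pages))
            |     if reader.metadata:
            |         print("title:", reader.metadata.title)
            |         print("author:", reader.metadata.author)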
        
       | jstummbillig wrote:
       | ELI5: How is a skill different from a tool?
        
       | notepad0x90 wrote:
       | Just me or is anthropic doing a lot better of a job at marketing
       | than openai and google?
        
         | reed1234 wrote:
         | It's much more focused on devs I feel like. Less fluff
        
       | lquist wrote:
       | lol how is this not optimized for mobile
        
       | emadabdulrahim wrote:
       | So skills are basically preset system prompts, assuming different
       | roles etc? Or is there more to it.
       | 
       | I'm a little confused.
        
         | imiric wrote:
         | Right, that's my interpretation as well.
         | 
         | "AI" companies have reached the end of the road when it comes
         | to throwing more data and compute at the problem. The only way
         | now for charts to go up and to the right is to deliver value-
         | added services.
         | 
         | And, to be fair, there's a potentially long and profitable road
         | by doing good engineering work that was needed anyways.
         | 
         | But it should be obvious to anyone within this bubble that this
         | is not the road to "superintelligence" or "AGI". I hope that
         | the hype and false advertising stops soon, so that we can focus
         | on practical applications of this technology, which are
         | numerous.
        
         | JyB wrote:
         | I'm super confused as well. This seems like exactly that, just
          | some default prompt injections to choose from. I guess I kinda
         | understand them in the context of their claude chat UI product.
         | 
          | But I don't understand why it's a thing in Claude Code tho when
         | we already have Claude.md? Could also just point to any .md
         | file in the prompt as preamble but not even needed.
         | https://www.anthropic.com/engineering/claude-code-best-pract...
         | 
          | That concept is also already perfectly spec'd in the MCP
          | standard, right? (Although not super used, I think?)
         | https://modelcontextprotocol.io/specification/2025-06-18/ser...
        
           | chickensong wrote:
           | Claude.md gets read every time and eats context, while it
           | sounds like the skills are read as-needed, saving context.
        
         | pollinations wrote:
          | Plus executable code snippets. I think their actual source code
          | doesn't use context. But it feels like packaged function calling.
        
       | mercurialsolo wrote:
        | Sub agents, mcp, skills - I wonder how they are supposed to
        | interact with each other?
       | 
        | Feels like a fair bit of overlap here. It's ok to proceed in a
        | direction where you are upgrading the spec and enabling claude
        | with additional capabilities. But one can pretty much use any of
       | these approaches and end up with the same capability for an
       | agent.
       | 
        | Right now it feels like a UX upgrade from MCP: instead of
        | needing a JSON spec, you can use a markdown file in a folder
        | and provide multi-modal inputs.
        
         | JyB wrote:
         | Claude Skills just seem to be the same as MCP prompts:
         | https://modelcontextprotocol.io/specification/2025-06-18/ser...
         | 
         | I don't really see why they had to create a different concept.
         | Maybe makes sense "marketing-wise" for their chat UI, but in
         | Claude Code? Especially when CLAUDE.md is a thing?
        
           | datadrivenangel wrote:
           | Yeah how is this different from MCP prompts?
        
           | pizza wrote:
            | Narrowly focused semantics/affordances (for both LLMs and
            | users/future package managers/communities), ease of
            | redistribution, and context management:
           | 
           | - skills are plain files that are injected contextually
            | whereas prompts would come with the overhead of live, running
           | code that has to be installed just right into your particular
           | env, to provide a whole mcp server. Tbh prompts also seem to
           | be more about literal prompting, too
           | 
            | - you could have a thousand skills folders for different
            | pieces of software etc., but good luck having more than a
            | few mcp servers loaded into context without them clobbering
            | the context
        
           | jjfoooo4 wrote:
            | I see this as a lower-overhead replacement for MCP. Rather
            | than managing a bunch of MCPs, use the directory structure
            | to your advantage and leverage the OS's capability to execute.
        
             | JyB wrote:
             | I think you are right.
        
             | ebonnafoux wrote:
             | For me the concept of MCP was to have a client/server
             | relation. For skills everything will be local.
        
           | pattobrien wrote:
            | MCP Prompts are meant to be _user triggered_, whereas I
           | believe a Skill is meant to be an LLM-triggered, use-case
           | centric set of instructions for a specific task.
            | 
            |   - MCP Prompt: "Please solve GitHub Issue #{issue_id}"
            |   - Skills:
            |     - React Component Development (React best practices,
            |       accessible tools)
            |     - REST API Endpoint Development
            |     - Code Review
            | 
            | This will probably result in:
            | 
            |   - Single "CLAUDE.md" instructions are broken out into
            |     discoverable instructions that the LLM will dynamically
            |     utilize based on the user's prompt
            |   - rather than having direct access to Tools, Claude will
            |     always need to go through Skill instructions first
            |     (making context tighter since it can't use Tools without
            |     understanding *how* to use them to achieve a certain
            |     goal)
            |   - Clients will be able to add infinite MCP servers /
            |     tools, since the Tools themselves will no longer all be
            |     added to the context window
           | 
           | It's basically a way to decouple User prompts from direct raw
           | Tool access, which actually makes a ton of sense when you
           | think of it.
        
       | fny wrote:
       | I fear the conceptual churn we're going to endure in the coming
       | years will rival frontend dev.
       | 
       | Across ChatGPT and Claude we now have tools, functions, skills,
       | agents, subagents, commands, and apps, and there's a
       | metastasizing complex of vibe frameworks feeding on this mess.
        
         | LPisGood wrote:
         | Metastasizing is such an excellent way to describe this
         | phenomenon. They grow on top of each other.
        
         | hkt wrote:
         | The same thing will happen: skilled people will do one thing
         | well. I've zero interest in anything but Claude code in a dev
         | container and, while mindful of the lethal trifecta, will give
          | Claude as much access to a local dev environment and its
         | associated tooling as I would give to a junior developer.
        
         | mathattack wrote:
         | There's so much white space - this is the cost of a brand new
         | technology. Similar issues with figuring out what cloud tools
         | to use, or what python libraries are most relevant.
         | 
         | This is also why not everyone is an early adopter. There are
         | mental costs involved in staying on top of everything.
        
           | benterix wrote:
           | > This is also why not everyone is an early adopter.
           | 
           | Usually, there are relatively few adopters of a new
           | technology.
           | 
            | But with LLMs, it's quite the opposite: there was a huge
            | number of early adopters. Some got extremely excited and now
            | run hundreds of agents all the time, some got burned and went
            | back to the good old ways of doing things, whereas the
            | majority is just using LLMs from time to time for various
            | tasks, bigger or smaller.
        
             | a4isms wrote:
             | I follow your reasoning. If we just look at businesses, and
             | we include every business that pays money for AI and one or
              | more employees use AI to do their jobs, then we're in
             | the Early Majority phase, not the Innovator or Early
             | Adopter phases.
             | 
             | https://en.wikipedia.org/wiki/Technology_adoption_life_cycl
             | e
        
             | mathattack wrote:
             | There's early adoption from individuals. Much less from
             | enterprises. (They're buying site licenses, but not re-
             | engineering their company processes)
        
         | kbar13 wrote:
         | i'm letting the smarter folks figure all this out and just
         | picking the tools i like every now and then. i like just using
         | claude code with vscode and still doing some things manually
        
           | efields wrote:
           | same same
        
         | esafak wrote:
         | On the other hand, this complexity represents a new niche that,
         | for a while at least, will present job and business
         | opportunities.
        
         | Trias11 wrote:
         | Right.
         | 
          | I focus on building projects delivering some specific business
          | value and pick the tools that get me there.
          | 
          | There is zero value in spending cycles engaging with new-tool
          | hype.
        
         | dalmo3 wrote:
         | For Cursor: cursorrules, mdc rules, user rules, team rules.
        
         | catgary wrote:
         | These companies are also biased towards solutions that will
         | more-or-less trap you in a heavily agent-based workflow.
         | 
         | I'm surprised/disappointed that I haven't seen any papers out
         | of the programming languages community about how to integrate
         | agentic coding with compilers/type system features/etc. They
         | really need to step up, otherwise there's going to be a lot of
         | unnecessary CO2 produced by tools like this.
        
         | awb wrote:
         | Hopefully there's a similar "don't make me think" mantra that
         | comes to AI product design.
         | 
         | I like the trend where the agent decides what models, tooling
         | and thought process to use. That seems to me far more powerful
          | than asking users to create solutions for each discrete problem
         | space.
        
           | kingkongjaffa wrote:
           | Where I've seen it be really transformative is giving it
           | additive tools that are multiplicative in utility. So like
           | giving an LLM 5 primitive tools for a specific domain and the
           | agent figuring out how to use them together and chain them
           | and run some tools multiple times etc.
        
         | iLoveOncall wrote:
         | Except in reality it's ALL marketing terms for 2 things:
         | additional prompt sections, and APIs.
        
           | james_marks wrote:
           | I more or less agree, but it's surprising what naming a
           | concept does for the average user.
           | 
           | You see a text file and understand that it can be anything,
           | but end users can't/won't make the jump. They need to see the
           | words Note, Reminder, Email, etc.
        
         | butlike wrote:
         | Just wait until I can pull in just the concepts I want with
         | "GPT Package Manager." I can simply call `gptpm add skills` and
         | the LLM package manager will add the Skills package to my GPT.
         | What could go wrong?
        
         | libraryofbabel wrote:
         | You forgot mcp-everything!
         | 
         | Yes, it's a mess, and there will be a lot of churn, you're not
         | wrong, but there are foundational concepts underneath it all
         | that you can learn and then it's easy to fit insert-new-feature
         | into your mental model. (Or you can just ignore the new
         | features, and roll your own tools. Some people here do that
         | with a lot of success.)
         | 
         | The foundational mental model to get the hang of is really
         | just:
         | 
         | * An LLM
         | 
         | * ...called in a loop
         | 
         | * ...maintaining a history of stuff it's done in the session
         | (the "context")
         | 
         | * ...with access to tool calls to do things. Like, read files,
         | write files, call bash, etc.
         | 
         | Some people call this "the agentic loop." Call it what you
         | want, you can write it in 100 lines of Python. I encourage
         | every programmer I talk to who is remotely curious about LLMs
         | to try that. It is a lightbulb moment.
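          | 
          | If it helps, here's a rough sketch of that loop (assuming the
          | Anthropic Python SDK, a placeholder model id, and a single
          | made-up run_bash tool; there's no sandboxing or approval step,
          | so treat it as a toy):
          | 
          |     import subprocess
          |     import anthropic
          | 
          |     client = anthropic.Anthropic()  # needs ANTHROPIC_API_KEY
          |     MODEL = "claude-sonnet-4-20250514"  # placeholder id
          | 
          |     tools = [{
          |         "name": "run_bash",
          |         "description": "Run a shell command.",
          |         "input_schema": {
          |             "type": "object",
          |             "properties": {"command": {"type": "string"}},
          |             "required": ["command"],
          |         },
          |     }]
          | 
          |     # the "context": every prompt, reply and tool result
          |     messages = [{"role": "user", "content": input("task> ")}]
          | 
          |     while True:
          |         resp = client.messages.create(
          |             model=MODEL, max_tokens=4096,
          |             tools=tools, messages=messages)
          |         messages.append(
          |             {"role": "assistant", "content": resp.content})
          |         if resp.stop_reason != "tool_use":
          |             break
          |         results = []
          |         for block in resp.content:
          |             if block.type != "tool_use":
          |                 continue
          |             # run whatever command the model asked for
          |             out = subprocess.run(
          |                 block.input["command"], shell=True,
          |                 capture_output=True, text=True)
          |             results.append({
          |                 "type": "tool_result",
          |                 "tool_use_id": block.id,
          |                 "content": out.stdout + out.stderr})
          |         messages.append({"role": "user", "content": results})
          | 
          |     for block in resp.content:
          |         if block.type == "text":
          |             print(block.text)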
         | 
         | Once you've written your own basic agent, if a new tool comes
         | along, you can easily demystify it by thinking about how you'd
         | implement it yourself. For example, Claude Skills are really
         | just:
         | 
         | 1) Skills are just a bunch of files with instructions for the
         | LLM in them.
         | 
         | 2) Search for the available "skills" on startup and put all the
         | short descriptions into the context so the LLM knows about
         | them.
         | 
         | 3) Also tell the LLM how to "use" a skill. Claude just uses the
         | `bash` tool for that.
         | 
         | 4) When Claude wants to use a skill, it uses the "call bash"
         | tool to read in the skill files, then does the thing described
         | in them.
         | 
         | and that's more or less it, glossing over a lot of things that
         | are important but not foundational like ensuring granular tool
         | permissions, etc.
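          | 
          | In code, steps 2 and 3 amount to something like this (a
          | sketch; I'm assuming each skill lives in its own folder with a
          | SKILL.md whose frontmatter carries a name and description, and
          | the ~/.claude/skills path is just illustrative):
          | 
          |     from pathlib import Path
          | 
          |     SKILLS_DIR = Path.home() / ".claude" / "skills"
          | 
          |     def skill_summaries() -> str:
          |         # naive frontmatter parse; real code would use YAML
          |         lines = []
          |         for md in sorted(SKILLS_DIR.glob("*/SKILL.md")):
          |             meta = {}
          |             for raw in md.read_text().splitlines():
          |                 if raw.strip() == "---" and meta:
          |                     break  # end of frontmatter
          |                 key, sep, val = raw.partition(":")
          |                 if sep:
          |                     meta[key.strip()] = val.strip()
          |             lines.append(
          |                 f"- {meta.get('name', md.parent.name)}: "
          |                 f"{meta.get('description', '')} "
          |                 f"(read {md} for the full instructions)")
          |         return "\n".join(lines)
          | 
          |     SYSTEM_PROMPT = (
          |         "You have these skills. When one looks relevant, "
          |         "use the bash tool to read its SKILL.md (and any "
          |         "files it references), then follow it:\n"
          |         + skill_summaries())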
        
           | Der_Einzige wrote:
           | Tool use is only good with structured/constrained generation
        
             | libraryofbabel wrote:
             | You'll need to expand on what you mean, I'm afraid.
        
               | AStrangeMorrow wrote:
                | I think, from my experience, what they mean is that tool
                | use is only as good as your model's ability to stick to a
                | given answer template/grammar. For example, if it does
                | tool calling using a JSON format, it needs to stick to
                | that format, not hallucinate extra fields, and use the
                | existing fields properly. This has worked for a few years
                | and LLMs are getting better and better, but the more
                | tools you have, and the more parameters your callable
                | functions can take, the higher the risk of errors. There
                | are also systems that constrain the inference itself, for
                | example the outlines package, by changing the way tokens
                | are sampled (this way you can force a model to stick to a
                | template/grammar, but that can also degrade results in
                | some other ways).
        
               | libraryofbabel wrote:
               | I see, thanks for channeling the GP! Yeah, like you say,
               | I just don't think getting the tool call template right
               | is really a problem anymore, at least with the big-labs
               | SotA models that most of us use for coding agents. Claude
               | Sonnet, Gemini, GPT-5 and friends have been heavily
               | heavily RL-ed into being really good at tool calls, and
               | it's all built into the providers' apis now so you never
               | even see the magic where the tool call is parsed out of
               | the raw response. To be honest, when I first read about
               | tools calls with LLMs I thought, "that'll never work
               | reliably, it'll mess up the syntax sometimes." But in
               | practice, it does work. (Or, to be more precise, if the
               | LLM ever does mess up the grammar, you never know because
               | it's able to seamlessly retry and correct without it ever
               | being visible at the user-facing api layer.) Claude Code
               | plugged into Sonnet (or even Haiku) might do hundreds of
               | tool calls in an hour of work without missing a beat. One
               | of the many surprises of the last few years.
        
           | dlivingston wrote:
           | > Call it what you want, you can write it in 100 lines of
           | Python. I encourage every programmer I talk to who is
           | remotely curious about LLMs to try that. It is a lightbulb
           | moment.
           | 
           | Definitely want to try this out. Any resources / etc. on
           | getting started?
        
             | libraryofbabel wrote:
             | This is the classic blog post, by Thorsten Ball, from way
             | back in the AI Stone Age (April this year):
             | https://ampcode.com/how-to-build-an-agent
             | 
             | It uses Go, which is more verbose than Python would be, so
             | he takes 300 lines to do it. Also, his edit_file tool could
             | be a lot simpler (I just make my minimal agent "edit" files
             | by overwriting the entire existing file).
             | 
             | I keep meaning to write a similar blog post with Python, as
             | I think it makes it even clearer how simple the stripped-
             | down essence of a coding agent can be. There is magic, but
             | it all lives in the LLM, not the agent software.
        
               | judahmeek wrote:
               | > I keep meaning to write a similar blog post with
               | Python...
               | 
               | Just have your agent do it.
        
               | libraryofbabel wrote:
               | I could, but I'm actually rather snobbish about my
               | writing and don't believe in having LLMs write first
               | drafts (for proofreading and editing, they're great).
               | 
               | (I am not snobbish about my code. If it works and is
               | solid and maintainable I don't care if I wrote it or not.
               | Some people seem to feel a sense of loss when an LLM
               | writes code for them, because of The Craft or whatever.
               | That's not me; I don't have my identity wrapped up in my
               | code. Maybe I did when I was more junior, but I've been
               | in this game long enough to just let it go.)
        
           | ibejoeb wrote:
            | Pretty true, and definitely a good exercise. But if we're
            | going to actually use these things in practice, you need more.
           | Things like prompt caching, capabilities/constraints, etc.
           | It's pretty dangerous to let an agent go hog wild in an
           | unprotected environment.
        
             | libraryofbabel wrote:
             | Oh sure! And if I was talking someone through building a
             | barebones agent, I'd definitely tag on a warning along the
             | lines of "but don't actually use this without XYZ!" That
             | said, you can add prompt caching by just setting a couple
             | of parameters in the api calls to the LLM. I agree
             | constraints is a much more complex topic, although even in
             | my 100-line example I am able to fit in a user approval
             | step before file write or bash actions.
        
               | apsurd wrote:
               | when you say prompt caching, does it mean cache the thing
               | you send to the llm or the thing you get back?
               | 
               | sounds like prompt is what you send, and caching is
               | important here because what you send is derived from
               | previous responses from llm calls earlier?
               | 
               | sorry to sound dense, I struggle to understand where and
               | how in the mental model the non-determinism of a response
               | is dealt with. is it just that it's all cached?
        
               | libraryofbabel wrote:
               | Not dense to ask questions! There are two separate
               | concepts in play:
               | 
               | 1) Maintaining the state of the "conversation" history
               | with the LLM. LLMs are stateless, so you have to store
               | the entire series of interactions on the client side in
               | your agent (every user prompt, every LLM response, every
               | tool call, every tool call result). You then send the
               | entire previous conversation history to the LLM every
               | time you call it, so it can "see" what has already
               | happened. In a basic agent, it's essentially just a big
               | list of strings, and you pass it into the LLM api on
               | every LLM call.
               | 
               | 2) "Prompt caching", which is a clever optimization in
               | the LLM infrastructure to take advantage of the fact that
               | most LLM interactions involve processing a lot of
               | unchanging past conversation history, plus a little bit
               | of new text at the end. Understanding it requires
               | understanding the internals of LLM transformer
               | architecture, but the essence of it is that you can save
               | a lot of GPU compute time by caching previous result
               | states that then become intermediate states for the next
               | LLM call. You cache on the entire history: the base
               | prompt, the user's messages, the LLM's responses, the
               | LLM's tool calls, everything. As a user of an LLM api,
               | you don't have to worry about how any of it works under
                | the hood, you just have to enable it. The reason to turn
                | it on is that it dramatically reduces response time and
                | cost.
               | 
               | Hope that clarifies!
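                | 
                | A rough sketch of how the two fit together (assuming
                | the Anthropic Python SDK; the model id and the long
                | system prompt here are placeholders, and cache_control
                | is the bit that opts in to caching):
                | 
                |     import anthropic
                | 
                |     client = anthropic.Anthropic()
                |     SYSTEM = "...many KB of instructions..."
                |     history = []  # concept 1: client-side state
                | 
                |     def ask(text: str) -> str:
                |         history.append(
                |             {"role": "user", "content": text})
                |         resp = client.messages.create(
                |             model="claude-sonnet-4-20250514",
                |             max_tokens=1024,
                |             # concept 2: mark the unchanging prefix
                |             # as cacheable so the provider reuses it
                |             system=[{
                |                 "type": "text",
                |                 "text": SYSTEM,
                |                 "cache_control":
                |                     {"type": "ephemeral"}}],
                |             messages=history)  # full history
                |         history.append({"role": "assistant",
                |                         "content": resp.content})
                |         return resp.content[0].text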
        
         | __loam wrote:
         | Langchain was the original sin of thin framework bullshit
        
         | kelvinjps10 wrote:
          | I found that the way Claude now handles tools on my system
          | simplifies stuff, with its CLI usage. I find the Claude Skills
          | model better than MCP.
        
         | lukev wrote:
          | The cool part is that none of this is actually that big
          | or difficult. You can master it on demand, or build your own
         | substitutes if necessary.
         | 
         | Yeah, if you chase buzzword compliance and try to learn all
         | these things outside of a particular use case you're going to
         | burn out and have a bad time. So... don't?
        
         | siva7 wrote:
         | It feels like every week these companies release some new
         | product that feels very similar to what they released a week
          | before. Can even the employees at Anthropic tell what the
          | difference is?
        
           | amelius wrote:
            | These products are all cannibalizing each other, so it's a bad
            | strategy.
        
         | zmmmmm wrote:
         | Yep, the ecosystem is well on its way to collapsing under its
         | own weight.
         | 
         | You have to remember, every system or platform has a total
         | complexity budget that effectively sits at the limit of what a
         | broad spectrum of people can effectively incorporate into their
         | day to day working memory. How it gets spent is absolutely
         | crucial. When a platform vendor adds a new piece of complexity,
         | it comes from the same budget that could have been devoted to
         | things built on the platform. But unlike things built on the
         | platform, it's there whether developers like it and use it or
         | not. It's common these days that providers binge on ecosystem
         | complexity because they think it's building differentiation,
         | when in fact it's building huge barriers to the exact audience
         | they need to attract to scale up their customer base, and
         | subtracting from the value of what can actually be built _on_
         | their platform.
         | 
         | Here you have a highly overlapping duplicative concept that's
         | taking a solid chunk of new complexity budget but not really
         | adding a lot of new capability in return. I am sure the people
         | who designed it think they are reducing complexity by adding a
         | "simple" new feature that does what people would otherwise have
         | to learn themselves. It's far more likely they are at break
         | even for how many people they deter vs attract from using their
         | platform by doing this.
        
       | josefresco wrote:
        | I just tested the canvas-design skill and the results were
        | pretty awful.
       | 
       | This is the skill description:
       | 
       | Create beautiful visual art in .png and .pdf documents using
       | design philosophy. You should use this skill when the user asks
       | to create a poster, piece of art, design, or other static piece.
       | Create original visual designs, never copying existing artists'
       | work to avoid copyright violations.
       | 
       | What it created was an abstract art museum-esque poster with
       | random shapes and no discernable message. It may have been trying
       | to design a playing card but just failed miserably which is my
       | experience with most AI image generators.
       | 
        | It certainly spent a lot of time and effort to create the
        | poster. It asked initial questions, developed a plan, did
        | research, created tooling - which seems like a waste of "tokens"
        | given how simple and lame the resulting image turned out.
       | 
       | Also after testing I still don't know how to "use" one of these
       | skills in an actual chat.
        
         | taejavu wrote:
         | If you want to generate images, use Midjourney or whatever.
         | It's almost like you've deliberately missed the point of the
         | feature.
        
       | jedisct1 wrote:
       | Too many options, this is getting very confusing.
       | 
       | Roo Code just has "modes", and honestly, this is more than
       | enough.
        
       | rohan_ wrote:
       | Cursor launched this a while ago with "Cursor Rules"
        
       | radley wrote:
       | It will be interesting to see how this is structured. I was
       | already doing something similar with Claude Projects &
       | Instructions, MCP, and Obsidian. I'm hoping that Skills can
       | cascade (from general to specific) and/or be combined between
       | projects.
        
       | datadrivenangel wrote:
       | So sort of like MCP prompt templates except not prompt templates?
        
       | laurentiurad wrote:
       | AGI nowhere near
        
         | skylurk wrote:
         | I know I'm replying to a shitpost. But I had a realisation, and
         | I'm probably not the only one.
         | 
         | If you can manage to keep structuring slightly intelligent
          | tools so that they compound, it seems like AGI is achievable.
         | 
         | That's why the thing everyone is after right now is new ways to
         | make those slight intelligences keep compounding.
         | 
         | Just like repeated multiplication of 1.001 grows indefinitely.
        
           | gigatree wrote:
           | But how often can you repeat the multiplication when the
           | repetitions are unsustainable?
        
             | skylurk wrote:
             | Yeah, sometimes it feels like we're just layering
             | unintelligent things, with compounding unintelligence...
             | 
             | But starting earlier this year, I've started to see
             | glimpses of what seems like intelligence (to me) in the
             | tools, so who knows.
        
           | Lionga wrote:
           | I know I'm replying to a shitpost. Well enough said.
        
       | robwwilliams wrote:
       | Could be helpful. I often edit scientific papers and grant
        | applications. Orienting Claude at the front end of each project
        | works, but an "Editing Skill" set could be more general and make
        | interactions with Claude more clued in to goals instead of
        | starting stateless.
        
       | mercurialsolo wrote:
        | One sharp contrast I do see between OpenAI and Anthropic is how
        | the product extensions are built around their flagship products.
        | 
        | OpenAI ships extensions for ChatGPT that plug into the consumer
        | experience. Anthropic ships extensions (made for builders) into
        | Claude Code that feel more DX-oriented.
        
       | corytheboyd wrote:
       | I'll give it a fair go, but how is it not going to have the same
       | problem of _maybe_ using MCP tools? The same problem of trying to
       | add to your prompt "only answer if you are 100% correct"? A skill
       | just sounds like more markdown that is fed into context, but with
       | a cool name that sounds impressive, and some indexing of the
       | defined skills on start (same as MCP tools?)
        
       | butlike wrote:
       | Great, so now I can script the IDE...err, I mean LLM. I can't
       | help but feel like we've been here before, and the magic is
       | wearing thin.
        
       | gloosx wrote:
        | wow, this news post layout doesn't fit the screen on mobile...
        | Couldn't these 10x programmers vibecode a proper mobile version?
        
       | thorio wrote:
        | How about using some of those skills to make that page mobile
        | ready...
        
       | I_am_tiberius wrote:
       | Every release of these companies makes me angry because I know
       | they take advantage of all the people who release content to the
       | public. They just consume and take the profit. In addition to
       | that Anthropic has shown that they don't care about our privacy
       | AT ALL.
        
       | mercurialsolo wrote:
        | The way this is headed, I also see a burgeoning class of tools
        | emerging: MCP servers, skill managers, sub-agent builders. Feels
        | like the patterns and protocols need a clearer explanation of how
        | they synthesize into a practical dev (extension) toolkit that is
        | useful across multiple surfaces, e.g. chat vs coding vs media gen.
        
       | actinium226 wrote:
       | It's an interesting idea (among many) to try to address the
       | problem of LLMs getting off task, but I notice that there's no
       | evaluation in the blog post. Like, ok cool, you've added
       | "skills," but is there any evidence that they're useful or are we
       | just grasping at straws here?
        
       | titzer wrote:
       | While not generally a bad idea, I find it amusing that they are
       | reinventing shared libraries where the code format is...English.
       | So the obvious next step is "precompiling" skills to a form that
       | is better for Claude internally.
       | 
       | ...which would be great if the (likely binary) format of that was
       | used internally, but something tells me an architectural screwup
       | will lead to leaking the binaries and we'll have a dependency on
       | a dumb inscrutable binary format to carry forward...
        
       | tgtweak wrote:
        | Eventually (and not even that far off), LLMs will be able to churn
        | out their own "skills" using their sandboxed code environments -
        | and possibly recycle them through context on a per-user basis.
       | 
       | While I like the flexibility of deploying your own skills to
       | claude for use org-wide, this really feels like what MCP should
       | be for that use case, or what built-in analysis sandbox should
       | be.
       | 
       | We haven't even gone mainstream with MCP and there are already 10
       | stand-ins doing roughly the same thing with a different twist.
       | 
       | I would have honestly preferred they called this embedded MCP
       | instead of 'skills'.
        
       | _pdp_ wrote:
       | I predict there will be some sort of package manager opensource
       | project soon. Download skills from some 3rd-party website and run
       | inside Claude. Risks of supply chain issue will be obvious but
       | nobody will care - at least not in the short term.
        
       | nextworddev wrote:
       | What is this, tools for Claude web app?
        
       | XCSme wrote:
       | Isn't this just RAG?
        
       | jrh3 wrote:
       | The tools I build for Claude Code keep reducing back to just
       | using Claude Code and watching Anthropic add what I need. This is
       | my tool for brownfield projects with Claude Code. I added skills
       | based on https://blog.fsck.com/2025/10/09/superpowers/
       | 
       | https://github.com/RossH3/context-tree - Helps Claude and humans
       | understand complex brownfield codebases through maintained
       | context trees.
        
       | simonw wrote:
       | Just published this about skills: "Claude Skills are awesome,
       | maybe a bigger deal than MCP"
       | 
       | https://simonwillison.net/2025/Oct/16/claude-skills/
        
         | pants2 wrote:
         | Skills are cool, but to me it's more of a design pattern /
         | prompt engineering trick than something in need of a hard spec.
         | You can even implement it in an MCP - I've been doing it for a
         | while: "Before doing anything, search the skills MCP and read
         | any relevant guides."
        
           | manbash wrote:
            | I agree with you, but I also want to ask if I understand
            | this correctly: there was a paradigm in which we were aiming
           | for Small Language Models to perform specific types of tasks,
           | orchestrated by the LLM. That is what I perceived the MCP
           | architecture came to standardize.
           | 
           | But here, it seems more like a diamond shape of information
           | flow: the LLM processes the big task, then prompts are
           | customized (not via LLM) with reference to the Skills, and
           | then the customized prompt is fed yet again to the LLM.
           | 
           | Is that the case?
        
         | kingkongjaffa wrote:
         | when do you need to make a skill vs a project?
        
           | simonw wrote:
           | In Claude and ChatGPT a project is really just a custom
           | system prompt and an optional bunch of files. Those files are
           | both searchable via tools and get made available in the Code
           | Interpreter container.
           | 
           | I see skills as something you might use inside of a project.
           | You could have a project called "data analyst" with a bunch
           | of skills for different aspects of that task - how to run a
           | regression, how to export data from MySQL, etc.
           | 
           | They're effectively custom instructions that are unlimited in
           | size and that don't cause performance problems by clogging up
           | the context - since the whole point of skills is they're only
           | read into the context when the LLM needs them.
        
         | timcobb wrote:
         | then submit it, you don't need to post here about it
        
           | hu3 wrote:
            | i found it useful and constructive to post it here also.
           | 
           | no reason not to.
        
         | hu3 wrote:
         | Do you reckon Skills overlap with AGENTS.md?
         | 
          | VSCode recently introduced support for nested AGENTS.md files
          | which, albeit less formal, might overlap:
         | 
         | https://code.visualstudio.com/updates/v1_105#_support-for-ne...
        
       | outlore wrote:
       | I'm struggling to see how this is different from prepackaged
       | prompts. Simon's article talks about skill metadata being used by
       | the model to look up the full prompt as a way to save on context
       | usage. That is analogous to the model calling --help when it
       | needs to use a CLI tool without needing to load up the full man
       | pages ahead of time.
       | 
       | But couldn't an MCP server expose a "help" tool?
        
         | throwup238 wrote:
         | That's pretty much all it is. If you look at the docs it even
         | uses a bash script to read the skill markdown files into the
         | context.
         | 
         | I think the big difference is that now you can include scripts
         | in these skills that can be executed as part of the skill, in a
         | VM on their servers.
        
         | GoatInGrey wrote:
          | It's the fact that a collection of files is tied to a specific
         | task or action. Prompts are only injected context, whereas
         | files can be more selectively loaded into context.
         | 
         | What they're trying to do here is translate MCP servers to
         | something more broadly useable by the population. They cannot
         | differentiate themselves with model training anymore, so they
         | have been focusing more and more on tooling development to grow
         | revenue.
        
       | kingkongjaffa wrote:
       | What's the difference in use case between a claude-skill and
       | making a task specific claude project?
        
       | kristo wrote:
       | How is this different from commands? They're automatically
       | invoked? How does claude decide when to use a skill? How specific
       | do I need to write my skill?
        
       | stego-tech wrote:
       | I'm kind of in stitches over this. Claude's "skills" are
       | dependent upon developers writing competent documentation _and_
       | keeping it up to date...which most seemingly can't even do for
       | actual code they write, nevermind a brute-force black box like an
       | LLM.
       | 
       | For those few who do write competent documentation _and_ have
       | well-organized file systems _and_ the risk tolerance to allow
       | LLMs to run roughshod over data, sure, there's some potential
        | here. Though if you're already that far in, you'd likely be
        | better off farming that grunt work out to a junior as a learning
        | exercise than to an LLM, especially since you'll have to clean up
        | the output anyhow.
       | 
       | With the limited context windows of LLMs, you can never truly get
       | this sort of concept to "stick" like you can with a human, and if
       | you're training an agent for this specific task anyway, you're
       | effectively locking yourself to that specific LLM in perpetuity
       | rather than a replaceable or promotable worker.
       | 
       | Just...it makes me giggle, how _optimistic_ they are that stars
       | would align at scale like that in an organization.
        
       | yodsanklai wrote:
       | I'd like to fast forward to a time where these tools are stable
       | and mature so we can focus on coding again
        
       ___________________________________________________________________
       (page generated 2025-10-16 23:00 UTC)