[HN Gopher] The Bitter Lesson of LLM Extensions
       ___________________________________________________________________
        
       The Bitter Lesson of LLM Extensions
        
       Author : sawyerjhood
       Score  : 64 points
       Date   : 2025-11-24 18:32 UTC (4 hours ago)
        
 (HTM) web link (www.sawyerhood.com)
 (TXT) w3m dump (www.sawyerhood.com)
        
       | dsign wrote:
        | I don't know; even ChatGPT 5.1 hallucinates APIs that don't
        | exist, though it's a step forward in that it also hallucinates
        | the non-existence of APIs that do exist.
       | 
       | But I reckon that every time that humans have been able to
       | improve their information processing in any way, the world has
       | changed. Even if all we get is to have an LLM be right more times
       | than it is wrong, the world will change again.
        
       | vessenes wrote:
       | > "If I could short MCP, I would"
       | 
       | I mean, MCP is hard to work with. But there's a very large set of
       | things that we want a hardened interface to out there - if not
       | MCP, it will be something very like it. In particular, MCP was
       | probably overly complicated at the design phase to deal with the
       | realities of streaming text / tokens back and forth live. That
       | is, it chose not to abstract these realities in exchange for some
       | nice features, and we got a lot of implementation complexity
       | early.
       | 
       | To quote the Systems Bible, any working complex system is only
       | the result of the growth of a working simple system -- MCP seems
        | to me to be _right_ on the edge of what you'd define as a
       | "working simple system" -- but to the extent it's all torn down
       | for something simpler, that thing will inevitably evolve to allow
       | API specifications, API calls, and streaming interaction modes.
       | 
       | Anyway, I'm "neutral" on MCP, which is to say I don't love it.
       | But I don't have a better system in mind, and crucially, because
       | these models still need fine-tuning to deal properly with agent
       | setups, I think it's likely here to stay.
        
         | zby wrote:
          | MCP is another middleware story - and middleware always fails
          | (hat tip Benedict Evans).
        
         | robot-wrangler wrote:
          | I always see the hard/complex criticism but find it confusing:
          | what is the perceived difficulty with MCP at the implementation
          | level? (I do understand the criticism about exhausting tokens
          | with tool descriptions and such, but that's a different
          | challenge.)
          | 
          | The implementation doesn't seem like it could be _more_ simple:
          | just JSON-RPC and API stuff. For example, the MCP hello-world
          | with Python and FastMCP is practically 1-to-1 with an http/web-
          | flavored hello-world in Flask.
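
For context, the JSON-RPC 2.0 shape that MCP tool calls ride on can be
sketched in plain Python. This is a toy dispatcher with a hypothetical
`greet` tool, not the real FastMCP API - just the "JSON-RPC and API
stuff" the comment refers to:

```python
import json

# Toy registry, standing in for what a framework decorator would build.
TOOLS = {}

def tool(fn):
    """Register a function as a callable tool by name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def greet(name: str) -> str:
    return f"Hello, {name}!"

def handle(raw: str) -> str:
    """Dispatch one JSON-RPC 2.0 request string to a registered tool."""
    req = json.loads(raw)
    fn = TOOLS.get(req["method"])
    if fn is None:
        resp = {"jsonrpc": "2.0", "id": req.get("id"),
                "error": {"code": -32601, "message": "Method not found"}}
    else:
        resp = {"jsonrpc": "2.0", "id": req.get("id"),
                "result": fn(**req.get("params", {}))}
    return json.dumps(resp)
```

A real MCP server layers session initialization, capability negotiation,
and transport (stdio or HTTP streaming) on top of this request/response
core, which is where the complexity discussed below comes in.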
        
           | vessenes wrote:
           | There is a LOT under the surface. custom routes,
           | bidirectional streaming choices (it started as a "local
           | first" protocol). Implementing an endpoint from scratch is
           | not easy, and the spec documentation moves very quickly, and
           | generally doesn't have simple-to-digest updates for
           | implementation.
           | 
           | I haven't looked in a few months, so my information might be
           | a bit out of date, but at the time - if you wanted to use a
           | python server from the modelcontextprotocol GitHub, fine. If
           | you wanted to, say, build a proxy server in rust or golang,
           | you were looking at a set of half-implemented server
           | implementations targeting two-versions-old MCP specs while
            | clients like Claude obscure even which endpoints they use for
           | discovery.
           | 
           | It's an immature spec, moderately complicated, and moving
           | really quickly with only a few major 'subscribers' to the
           | server side; I found it challenging to work with.
        
       | mkagenius wrote:
       | > Skills are the actualization of the dream that was set out by
       | ChatGPT Plugins .. But I have a hypothesis that it might actually
       | work now because the models are actually smart enough for it to
       | work.
       | 
        | and earlier Simon Willison argued[1] that Skills are an even
        | bigger deal than MCP.
        | 
        | But I don't see as much hype for Skills as there was for MCP -
        | it seems people have MCP "inertia" and no time to shift to
        | Skills.
       | 
       | 1. https://simonwillison.net/2025/Oct/16/claude-skills/
        
         | zby wrote:
          | I still don't get what is special about the skills directory -
          | since forever I've instructed Claude Code to "please read X and
          | do Y" - how are skills different from that?
        
           | mkagenius wrote:
           | The difference is that the code in the directory (and the
           | markdown) are hardcoded and known to work beforehand.
        
             | munk-a wrote:
             | But we are still reliant on the LLM correctly interpreting
             | the choice to pick the right skill. So "known to work"
             | should be understood in the very limited context of "this
             | sub-function will do what it was designed to do reliably"
              | rather than "if the user asks to use this sub-function it
              | will do what it was designed to do reliably".
             | 
              | Skills feel like a non-feature to me. It feels more
              | valuable to connect a user to the actual tool and let them
              | familiarize themselves with it (so they don't need the LLM
              | to find it in the future) rather than having the tool
              | embedded in the LLM platform. I will carve out a very big
              | exception for accessibility here - I love my home device
              | being an egg timer. It's a wonderful egg timer (when it
              | doesn't randomly play music), and while I could buy an egg
              | timer, having a hands-free one is actually quite valuable
              | to me while cooking. So I believe there is real value in
              | making these features accessible through the LLM in media
              | where the feature would normally be difficult to use.
        
           | bavell wrote:
           | Not really special, just officially supported and I'm
           | guessing how best to use it baked in via RL. Claude already
           | knows how skills work vs learning your own home-rolled
           | solution.
        
           | simonw wrote:
           | They're not. They are just a formalization of that pattern,
           | with a very tiny extra feature where the model harness scans
           | that folder on startup and loads some YAML metadata into the
           | system prompt so it knows which ones to read later on.
        
         | CuriouslyC wrote:
         | Skills do something you could already do with folder level
         | readme files and hyperlinks inside source, but in a vendor-
         | locked-in way. Not a fan.
        
           | mkagenius wrote:
            | It's definitely not vendor-locked. For instance, I have made
            | it work with Gemini via Open-Skills[1].
            | 
            | It is, after all, a collection of instructions and code that
            | any other LLM can read and understand, then execute (via a
            | tool call / MCP call).
            | 
            | 1. Open-Skills: https://github.com/BandarLabs/open-skills
        
           | pluralmonad wrote:
           | They are just text files though. I'm sensitive to vendor
           | lock-in and do not perceive a standard folder structure and
           | bare text files to be that.
        
             | bavell wrote:
             | Yeah, the reason I like Skills better than MCP is
             | specifically because skills are just plain text.
        
         | robot-wrangler wrote:
         | Skills are like the "end-user" version of MCP at best, where
         | MCP is for people building systems. Any other point of view
         | raises a lot of questions.
         | 
          | Aren't Skills really just a collection of tagged MCP prompts,
          | config resources, and tools, except with more lock-in since
          | only Claude can use them? About that "agent virtual
          | environment" that runs the scripts: how is it customized, and
          | can it just be a container? Aren't you going to need to
          | ship/bundle dependencies for the tools/libraries those skills
          | require/reference? And at that point, why are we avoiding MCP-
          | style docker/npx/uvx again?
         | 
          | It also jumps out that Skills are supposed to be "composable",
          | yet AFAIK it's still the case that skills may not explicitly
          | reference other skills. That's a huge limiting factor IMHO
          | compared to MCP servers, which can just use boring inheritance
          | and composition with, you know, programming languages, or
          | composition/grouping with namespacing and such at the server
          | layer. It's unclear how we're going to extend skills, require
          | skills, use remote skills, "deploy" reusable skills, etc., and
          | answering all these questions gets us most of the way back to
          | MCP!
         | 
          | That said, skills do seem like a potentially useful alternate
          | "view" on the same data/code that MCP is covering. If it really
          | catches on, maybe we'll see skill-to-MCP converters for serious
          | users that want to be able to do the normal stuff (like scaling
          | out, testing in isolation, or doing things without being
          | completely attached to the Claude engine forever). Until
          | there's interoperability, though, I personally can't see
          | getting interested.
        
         | fzysingularity wrote:
         | I definitely see the value and versatility of Claude Skills
         | (over what MCP is today), but I find the sandboxed execution to
         | be painfully inefficient.
         | 
          | Even if we expect the LLM to fully resolve the task, it'll
          | heavily rely on I/O and print statements sprinkled across the
          | execution trace to get the job done.
        
           | mkagenius wrote:
           | > but I find the sandboxed execution to be painfully
           | inefficient
           | 
            | The sandbox is not mandatory here. You can execute the
            | skills on your host machine too (with some fiddling), but
            | it's good practice, and probably for the better, to get into
            | the habit of executing code in an isolated environment for
            | security purposes.
        
             | munk-a wrote:
              | The better practice, if it isn't a one-off, is being
              | introduced to the tool (perhaps by an LLM) and then just
              | running the tool yourself with structured inputs when
              | appropriate. The 2015-era novice coding habit of copying a
              | blob of twenty shell scripts off of Stack Overflow and
              | blindly running them in your terminal (while not good, for
              | obvious reasons) was still better than the same thing
              | happening without you being able to watch and potentially
              | learn what those commands were.
        
               | fzysingularity wrote:
                | I do think that if agents can successfully resolve
                | these tasks in a code execution environment, they can
                | likely come up with better parametrized solutions with
                | structured I/O - assuming these are workflows we want
                | to run over and over again.
        
         | sawyerjhood wrote:
          | I agree with you. I don't see people hyping them, and I think
          | a big part of this is that we have hit a sort of LLM fatigue
          | point right now. Also, Skills require that your agent can
          | execute arbitrary code, which is a bigger buy-in cost if your
          | app doesn't have that already.
        
       | j2kun wrote:
        | I don't see how "they improved the models" is related to the
        | bitter lesson. You are still injecting human-level expertise
        | (whether by prompts or a structured API) to compensate for the
        | model's failures. A "bitter lesson" would be that the model can
        | do better without any injection but with more compute than it
        | could with human interference.
        
         | idle_zealot wrote:
          | > A "bitter lesson" would be that the model can do better
          | without any injection but with more compute than it could
          | with human interference.
         | 
         | This is what I expected the post to be about before clicking.
        
       | zby wrote:
        | I believe that what we need is to treat prompts as stochastic
        | programs and use a special shell for calling them. Claude Code,
        | Codex, and other coding agents are like that - by now everybody
        | understands that they are not just coding assistants; they are
        | a general shell that can use an LLM to execute specs. I would
        | like to have this extracted from IDE tools - this is what I am
        | working on in llm-do.
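
The "prompts as stochastic programs" idea can be sketched as a tiny
shell primitive. All names here are hypothetical, and a deterministic
stub stands in for the model so the sketch runs offline; llm-do's actual
interface is not shown:

```python
def run_prompt_program(prompt_text: str, stdin: str, model) -> str:
    """Treat a prompt like a shell script: the prompt is the program,
    stdin is its input, and the model is the (stochastic) interpreter.

    `model` is any callable from string to string; in a real shell it
    would wrap an LLM API call.
    """
    return model(f"{prompt_text}\n\n--- INPUT ---\n{stdin}")

def echo_model(full_prompt: str) -> str:
    """Deterministic stand-in model: uppercase the last input line."""
    return full_prompt.splitlines()[-1].upper()

# Usage: pipe input through a "prompt program", shell-style.
result = run_prompt_program("Shout the input back.", "hello world",
                            echo_model)
```

The design point is the separation: prompts become reusable, versioned
program files, and the shell decides which model "interpreter" executes
them.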
        
       ___________________________________________________________________
       (page generated 2025-11-24 23:00 UTC)