[HN Gopher] The Bitter Lesson of LLM Extensions
___________________________________________________________________
The Bitter Lesson of LLM Extensions
Author : sawyerjhood
Score : 64 points
Date : 2025-11-24 18:32 UTC (4 hours ago)
(HTM) web link (www.sawyerhood.com)
(TXT) w3m dump (www.sawyerhood.com)
| dsign wrote:
| I don't know, even ChatGPT 5.1 hallucinates APIs that don't
| exist, though it's a step forward in that it also hallucinates
| the non-existence of APIs that do exist.
|
| But I reckon that every time that humans have been able to
| improve their information processing in any way, the world has
| changed. Even if all we get is to have an LLM be right more times
| than it is wrong, the world will change again.
| vessenes wrote:
| > "If I could short MCP, I would"
|
| I mean, MCP is hard to work with. But there's a very large set of
| things that we want a hardened interface to out there - if not
| MCP, it will be something very like it. In particular, MCP was
| probably overly complicated at the design phase to deal with the
| realities of streaming text / tokens back and forth live. That
| is, it chose not to abstract these realities in exchange for some
| nice features, and we got a lot of implementation complexity
| early.
|
| To quote the Systems Bible, any working complex system is only
| the result of the growth of a working simple system -- MCP seems
| to me to be _right_ on the edge of what you'd define as a
| "working simple system" -- but to the extent it's all torn down
| for something simpler, that thing will inevitably evolve to allow
| API specifications, API calls, and streaming interaction modes.
|
| Anyway, I'm "neutral" on MCP, which is to say I don't love it.
| But I don't have a better system in mind, and crucially, because
| these models still need fine-tuning to deal properly with agent
| setups, I think it's likely here to stay.
| zby wrote:
| MCP is another middleware story - and middleware always fails
| (hat tip Benedict Evans).
| robot-wrangler wrote:
| I always see the hard/complex criticism but find it confusing..
| what is the perceived difficulty with MCP at the implementation
| level? (I do understand the criticism about exhausting tokens
| with tool-descriptions and stuff, but that's a different
| challenge)
|
| Doesn't seem like the implementation could be _more_ simple.
| Just JSON-RPC and API stuff. For example, the MCP hello-world
| with Python and FastMCP is practically 1-to-1 with an HTTP/web-
| flavored hello-world in Flask.
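|
| (For reference - a minimal sketch, assuming the fastmcp Python
| package; the tool is just an illustrative example and the
| decorator details vary a little between versions:)
|
|     from fastmcp import FastMCP
|
|     # Create the server, much like instantiating a Flask app
|     mcp = FastMCP("hello-world")
|
|     @mcp.tool()
|     def greet(name: str) -> str:
|         """Illustrative tool: return a greeting for a name."""
|         return f"Hello, {name}!"
|
|     if __name__ == "__main__":
|         mcp.run()  # defaults to the stdio transport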
| vessenes wrote:
| There is a LOT under the surface: custom routes,
| bidirectional streaming choices (it started as a "local-
| first" protocol). Implementing an endpoint from scratch is
| not easy, the spec documentation moves very quickly, and it
| generally doesn't ship simple-to-digest updates for
| implementers.
|
| I haven't looked in a few months, so my information might be
| a bit out of date, but at the time - if you wanted to use a
| python server from the modelcontextprotocol GitHub, fine. If
| you wanted to, say, build a proxy server in rust or golang,
| you were looking at a set of half-implemented server
| implementations targeting two-versions-old MCP specs while
| clients like claude obscure even which endpoints they use for
| discovery.
|
| It's an immature spec, moderately complicated, and moving
| really quickly with only a few major 'subscribers' to the
| server side; I found it challenging to work with.
| mkagenius wrote:
| > Skills are the actualization of the dream that was set out by
| ChatGPT Plugins .. But I have a hypothesis that it might actually
| work now because the models are actually smart enough for it to
| work.
|
| and earlier Simon Willison argued[1] that Skills are an even
| bigger deal than MCP.
|
| But I do not see as much hype for Skills as there was for MCP -
| it seems people are stuck in MCP "inertia" and have not had the
| time to shift to Skills.
|
| 1. https://simonwillison.net/2025/Oct/16/claude-skills/
| zby wrote:
| I still don't get what is special about the skills directory -
| since forever I have instructed Claude Code to "please read X
| and do Y" - how are skills different from that?
| mkagenius wrote:
| The difference is that the code in the directory (and the
| markdown) is hardcoded and known to work beforehand.
| munk-a wrote:
| But we are still reliant on the LLM correctly interpreting
| the choice to pick the right skill. So "known to work"
| should be understood in the very limited context of "this
| sub-function will do what it was designed to do reliably"
| rather than "if the user asks to use this sub-function it
| will do what it was designed to do reliably".
|
| Skills feel like a non-feature to me. It feels more
| valuable to connect a user to the actual tool and let them
| familiarize themselves with it (and not need the LLM to
| find it in the future) rather than having the tool embedded
| in the LLM platform. I will carve out a very big exception
| for accessibility here - I love my home device being an egg
| timer - it's a wonderful egg timer (when it doesn't
| randomly play music) and I could buy an egg timer, but
| having a hands-free egg timer is actually quite valuable to
| me while cooking. So I believe there is real value in
| making these features accessible through the LLM in media
| where the feature would normally be difficult to use.
| bavell wrote:
| Not really special, just officially supported and, I'm
| guessing, with how best to use them baked in via RL. Claude
| already knows how skills work, versus having to learn your
| own home-rolled solution.
| simonw wrote:
| They're not. They are just a formalization of that pattern,
| with a very tiny extra feature where the model harness scans
| that folder on startup and loads some YAML metadata into the
| system prompt so it knows which ones to read later on.
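|
| (As an illustration only - not Claude Code's actual
| implementation - that scan could be sketched in a few lines
| of Python, assuming each skill folder holds a SKILL.md whose
| YAML frontmatter carries "name" and "description" fields:)
|
|     from pathlib import Path
|     import yaml
|
|     def collect_skill_metadata(skills_dir: str) -> str:
|         """Hypothetical helper: build a system-prompt snippet
|         listing the skills found under skills_dir."""
|         lines = []
|         for skill_md in Path(skills_dir).glob("*/SKILL.md"):
|             text = skill_md.read_text()
|             # Frontmatter sits between the first two "---"
|             _, frontmatter, _ = text.split("---", 2)
|             meta = yaml.safe_load(frontmatter)
|             lines.append(f"- {meta['name']}: {meta['description']}"
|                          f" (read {skill_md} when relevant)")
|         return "Available skills:\n" + "\n".join(lines)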
| CuriouslyC wrote:
| Skills do something you could already do with folder-level
| README files and hyperlinks inside source, but in a vendor-
| locked-in way. Not a fan.
| mkagenius wrote:
| It's definitely not vendor-locked. For instance, I have made
| it work with Gemini via Open-Skills[1].
|
| It is, after all, a collection of instructions and code that
| any other LLM can read and understand and then execute via
| code execution (a tool call / MCP call).
|
| 1. Open-Skills: https://github.com/BandarLabs/open-skills
| pluralmonad wrote:
| They are just text files though. I'm sensitive to vendor
| lock-in and do not perceive a standard folder structure and
| bare text files to be that.
| bavell wrote:
| Yeah, the reason I like Skills better than MCP is
| specifically because skills are just plain text.
| robot-wrangler wrote:
| Skills are like the "end-user" version of MCP at best, where
| MCP is for people building systems. Any other point of view
| raises a lot of questions.
|
| Aren't skills really just a collection of tagged MCP prompts,
| config resources, and tools, except with more lock-in since
| only Claude can use it? About that "agent virtual environment"
| that runs the scripts.. how is it customized, and.. can it just
| be a container? Aren't you going to need to ship/bundle
| dependencies for the tools/libraries those skills
| require/reference, and at that point why are we avoiding MCP-
| style docker/npx/uvx again?
|
| Other things that jump out are that skills are supposed to be
| "composable", yet afaik it's still the case that skills may not
| explicitly reference other skills. That's a huge limiting
| factor IMHO compared to MCP servers that can just use boring
| inheritance and composition with, you know, programming
| languages, or composition/grouping with namespacing and such
| at the server
| layer. It's unclear how we're going to extend skills, require
| skills, use remote skills, "deploy" reusable skills etc etc,
| and answering all these questions gets us most of the way back
| to MCP!
|
| That said, skills do seem like a potentially useful alternate
| "view" on the same data/code that MCP is covering. If it really
| catches on, maybe we'll see skill-to-MCP converters for serious
| users that want to be able do the normal stuff (like scaling
| out, testing in isolation, doing stuff without being completely
| attached to the claude engine forever). Until there's
| interoperability I personally can't see getting interested
| though
| fzysingularity wrote:
| I definitely see the value and versatility of Claude Skills
| (over what MCP is today), but I find the sandboxed execution to
| be painfully inefficient.
|
| Even if we expect the LLMs to fully resolve the task, they'll
| heavily rely on I/O and print statements sprinkled across the
| execution trace to get the job done.
| mkagenius wrote:
| > but I find the sandboxed execution to be painfully
| inefficient
|
| A sandbox is not mandatory here. You can execute the skills on
| your host machine too (with some fiddling), but it's good
| practice, and probably for the better, to get into the habit
| of executing code in an isolated environment for security
| purposes.
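|
| (A minimal sketch of that habit, assuming Docker is installed;
| the skill path, script name, and image are made up purely for
| illustration:)
|
|     import os
|     import subprocess
|
|     # Hypothetical skill directory and script
|     skill_dir = os.path.abspath("skills/pdf")
|
|     # Run the skill's script in a throwaway container with no
|     # network access, mounting the skill directory read-only
|     subprocess.run(
|         ["docker", "run", "--rm", "--network", "none",
|          "-v", f"{skill_dir}:/skill:ro",
|          "python:3.12-slim",
|          "python", "/skill/fill_form.py"],
|         check=True,
|     )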
| munk-a wrote:
| The better practice is, if it isn't a one-off, being
| introduced to the tool (perhaps by an LLM) and then just
| running the tool yourself with structured inputs when it is
| appropriate. I think the 2015-era novice coding habit of
| copying a blob of twenty shell scripts off of Stack
| Overflow and blindly running them in your terminal (while
| also not good for obvious reasons) was still better than the
| same thing essentially happening without you being able to
| watch and potentially learn what those commands were.
| fzysingularity wrote:
| I do think that if the agents can successfully resolve
| these tasks in a code execution environment, they can
| likely come up with better parametrized solutions with
| structured I/O - assuming these are workflows we want to
| run over and over again.
| sawyerjhood wrote:
| I agree with you. I don't see people hyping them, and I think a
| big part of this is that we have sort of hit an LLM fatigue
| point right now. Also, Skills require that your agent can
| execute arbitrary code, which is a bigger buy-in cost if your
| app doesn't support this already.
| j2kun wrote:
| I don't see how "they improved the models" is related to the
| bitter lesson. You are still injecting human-level expertise
| (whether it is by prompts or a structured API) to compensate for
| the model's failures. A "bitter lesson" would be that the model
| can do better without any injection, but more compute power, than
| it could with human interference.
| idle_zealot wrote:
| > A "bitter lesson" would be that the model can do better
| without any injection, but more compute power, than it could
| with human interference.
|
| This is what I expected the post to be about before clicking.
| zby wrote:
| I believe that what we need is to treat prompts as stochastic
| programs and use a special shell for calling them. Claude Code
| and Codex and the other coding agents are like that - by now
| everybody understands that they are not just coding
| assistants; they are a general shell that can use an LLM to
| execute specs. I would like to have this extracted from the
| IDE tools - this is what I am working on in llm-do.
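|
| (Not llm-do itself, but the general "prompt as program" shape
| can be sketched like this - a hypothetical wrapper that treats
| a spec file as the executable and an LLM call as the
| interpreter, here via the OpenAI Python client:)
|
|     import sys
|     from openai import OpenAI
|
|     def run_spec(spec_path: str, argument: str) -> str:
|         """Hypothetical: execute a prompt file like a
|         stochastic program, with one string argument."""
|         spec = open(spec_path).read()
|         client = OpenAI()
|         response = client.chat.completions.create(
|             model="gpt-4o",
|             messages=[
|                 {"role": "system", "content": spec},
|                 {"role": "user", "content": argument},
|             ],
|         )
|         return response.choices[0].message.content
|
|     if __name__ == "__main__":
|         # e.g. python run_spec.py summarize.md "notes.txt"
|         print(run_spec(sys.argv[1], sys.argv[2]))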
___________________________________________________________________
(page generated 2025-11-24 23:00 UTC)