[HN Gopher] Tools: Code Is All You Need
___________________________________________________________________
Tools: Code Is All You Need
Author : Bogdanp
Score : 252 points
Date : 2025-07-03 10:51 UTC (12 hours ago)
(HTM) web link (lucumr.pocoo.org)
(TXT) w3m dump (lucumr.pocoo.org)
| mritchie712 wrote:
| > try completing a GitHub task with the GitHub MCP, then repeat
| it with the gh CLI tool. You'll almost certainly find the latter
| uses context far more efficiently and you get to your intended
| results quicker.
|
| This is spot on. I have a "devops" folder with a CLAUDE.md with
| bash commands for common tasks (e.g. find prod / staging logs
| with this integration ID).
|
| When I complete a novel task (e.g. count all the rows that were
| synced from stripe to duckdb) I tell Claude to update CLAUDE.md
| with the example. The next time I ask a similar question, Claude
| one-shots it.
|
| This is the first few lines of the CLAUDE.md:
|
| This file provides guidance to Claude Code (claude.ai/code)
| when working with code in this repository.
|
| ## Purpose
|
| This devops folder is dedicated to Google Cloud Platform
| (GCP) operations, focusing on:
| - Google Cloud Composer (Airflow) DAG management and
|   monitoring
| - Google Cloud Logging queries and analysis
| - Kubernetes cluster management (GKE)
| - Cloud Run service debugging
|
| ## Common DevOps Commands
|
| ### Google Cloud Composer
|
| ```bash
| # View Composer environment details
| gcloud composer environments describe meltano \
|     --location us-central1 --project definite-some-id
|
| # List DAGs in the environment
| gcloud composer environments storage dags list \
|     --environment meltano --location us-central1 \
|     --project definite-some-id
|
| # View DAG runs
| gcloud composer environments run meltano \
|     --location us-central1 dags list
|
| # Check Airflow logs
| gcloud logging read \
|     'resource.type="cloud_composer_environment" AND resource.labels.environment_name="meltano"' \
|     --project definite-some-id --limit 50
| ```
| lsaferite wrote:
| Just as a related aside, you could literally make that bottom
| section into a super simple stdio MCP Server and attach that to
| Claude Code. Each of your operations could be a tool and have a
| well-defined schema for parameters. Then you are giving the LLM
| a more structured and defined way to access your custom
| commands. I'm pretty positive there are even pre-made MCP
| Servers that are designed for just this activity.
|
| Edit: First result when looking for such an MCP Server:
| https://github.com/inercia/MCPShell
| gbrindisi wrote:
| wouldn't this defeat the point? Claude Code already has
| access to the terminal; adding specific instructions in
| the context is enough
| lsaferite wrote:
| No. You are giving textual instructions to Claude in the
| hopes that it correctly generates a shell command for you
| vs giving it a tool definition with a clearly defined
| schema for parameters and your MCP Server is, presumably,
| enforcing adherence to those parameters BEFORE it hits your
| shell. You would be helping Claude in this case as you're
| giving a clearer set of constraints on operation.
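The parameter-enforcement idea above can be sketched in a few lines. This is an illustrative stdio-style tool wrapper, not the MCPShell API; the tool name, parameter spec, and `run_tool` helper are all made up for the example. The point is that parameters are validated against a declared spec before anything reaches the shell:

```python
import subprocess

# Hypothetical tool registry: each tool declares its parameters
# (a JSON-Schema-like spec, simplified to name -> type here) and
# how to build the vetted argv from validated parameters.
TOOLS = {
    "composer_describe": {
        "params": {"location": str, "project": str},
        "argv": lambda p: ["gcloud", "composer", "environments",
                           "describe", "meltano",
                           "--location", p["location"],
                           "--project", p["project"]],
    },
}

def run_tool(name, params, dry_run=False):
    """Validate params against the tool's spec BEFORE touching the shell."""
    spec = TOOLS[name]
    expected = spec["params"]
    unknown = set(params) - set(expected)
    missing = set(expected) - set(params)
    if unknown or missing:
        raise ValueError(f"bad params: unknown={unknown}, missing={missing}")
    for key, typ in expected.items():
        if not isinstance(params[key], typ):
            raise TypeError(f"{key} must be {typ.__name__}")
    argv = spec["argv"](params)
    if dry_run:
        return argv  # return the vetted command instead of executing it
    return subprocess.run(argv, capture_output=True, text=True).stdout
```

An LLM that emits a malformed or extraneous parameter gets a structured error back instead of a shell invocation, which is the constraint being argued for here.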
| fassssst wrote:
| Either way it is text instructions used to call a
| function (via a JSON object for MCP or a shell command
| for scripts). What works better depends on how the model
| you're using was post trained and where in the prompt
| that info gets injected.
| wrs wrote:
| Well, with MCP you're giving textual instructions to
| Claude in hopes that it correctly generates a tool call
| for you. It's not like tool calls have access to some
| secret deterministic mode of the LLM; it's still just
| text.
|
| To an LLM there's not much difference between the list of
| sample commands above and the list of tool commands it
| would get from an MCP server. JSON and GNU-style args are
| very similar in structure. And presumably the command is
| enforcing constraints even better than the MCP server
| would.
| lsaferite wrote:
| Not strictly true. The LLM provider should be running
| constrained token selection based on the JSON schema of
| the tool call. That alone makes a massive difference, as
| you're already discarding invalid tokens during the
| completion at a low level. Now, if they had a BNF grammar
| for each CLI tool and enforced token selection based on
| that, you'd be much better off than with unconstrained
| token selection.
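The effect of constrained decoding can be shown with a toy mask. This is a deliberately simplified sketch, nothing like a provider's real implementation: the "grammar" is just an enumerated set of valid outputs, and candidate tokens that can no longer lead to any valid completion are discarded before sampling:

```python
def mask_tokens(prefix, candidates, allowed):
    """Keep only candidate tokens that can still extend `prefix`
    toward some string the (toy) grammar allows."""
    keep = []
    for tok in candidates:
        nxt = prefix + tok
        # A token survives only if some allowed string begins with
        # the extended prefix, i.e. a valid completion still exists.
        if any(s.startswith(nxt) for s in allowed):
            keep.append(tok)
    return keep

# Toy "grammar": the only two tool calls the schema permits.
ALLOWED = ['{"tool": "dags_list"}', '{"tool": "logs_read"}']
```

With `prefix = '{"tool": "'`, a candidate like `rm -rf` is masked out at selection time; only tokens consistent with a schema-valid call remain. A BNF grammar generalizes the same idea beyond an enumerated set.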
| jayd16 wrote:
| I feel like I'm taking crazy pills sometimes. You have a file
| with a set of snippets and you prefer to ask the AI to
| hopefully run them instead of just running them yourself?
| lreeves wrote:
| The commands aren't the special sauce, it's the analytical
| capabilities of the LLM to view the outputs of all those
| commands and correlate data or whatever. You could accomplish
| the same by prefilling a gigantic context window with all the
| logs but when the commands are presented ahead of time the
| LLM can "decide" which one to run based on what it needs to
| do.
| light_hue_1 wrote:
| Yes. I'm not the poster but I do something similar.
|
| Because now the capabilities of the model grow over time. And
| I can ask questions that involve a handful of those snippets.
| When we get to something new that requires some doing, it
| becomes another snippet.
|
| I can offload everything I used to know about an API and
| never have to think about it again.
| mritchie712 wrote:
| the snippets are examples. You can ask hundreds of variations
| of similar, but different, complex questions and the LLM can
| adjust the example for that need.
|
| I don't have a snippet for, "find all 500's for the meltano
| service for duckdb syntax errors", but it'd easily nail that
| given the existing examples.
| dingnuts wrote:
| but if I know enough about the service to write examples,
| most of the time I will know the command I want, which is
| less typing, faster, costs less, and doesn't waste a ton of
| electricity.
|
| In the other cases I see what the computer outputs, LEARN,
| and then the functionality of finding what I need just
| isn't useful next time. Next time I just type the command.
|
| I don't get it.
| loudmax wrote:
| LLMs are really good at processing vague descriptions of
| problems and offering a solution that's reasonably close
| to the mark. They can be a great guide for unfamiliar
| tools.
|
| For example, I have a pretty good grasp of regular
| expressions because I'm an old Perl programmer, but I
| find processing json using `jq` utterly baffling. LLMs
| are great at coming up with useful examples, and
| sometimes they'll even get it perfect the first time.
| I've learned more about properly using `jq` with the help
| of LLMs than I ever did on my own. Same goes for
| `ffmpeg`.
|
| LLMs are not a substitute for learning. When used
| properly, they're an enhancement to learning.
|
| Likewise, never mind the idiot CEOs of failing companies
| looking forward to laying off half their workforce and
| replacing them with AI. When properly used, AI is a tool
| to help people become more productive, not replace human
| understanding.
| qazxcvbnmlp wrote:
| You don't ask the AI to run the commands. You say "build
| and test this feature" and then the AI correctly iterates
| back and forth between the build and test commands until
| the thing works.
| chriswarbo wrote:
| I use a similar file, but just for myself (I've never used an
| LLM "agent"). I live in Emacs, but this is the only thing I use
| org-mode for: it lets me fold/unfold the sections, and I can
| press C-c C-c over any of the code snippets to execute it. Some
| of them are shell code, some of them are Emacs Lisp code which
| generates shell code, etc.
| stpedgwdgfhgdd wrote:
| I do something similar, but the problem is that claude.md keeps
| on growing.
|
| To tackle this, I converted a custom prompt into an
| application, but there is an interesting trade-off. The
| application is deterministic. It cannot deal with unknown
| situations. In contrast to CC, which is way slower, but can try
| alternative ways of dealing with an unknown situation.
|
| I ended up with adding an instruction to the custom command to
| run the application and fix the application code (TDD) if there
| is a problem. Self healing software... who ever thought
| e12e wrote:
| You're letting the LLM execute privileged API calls against
| your production/test/staging environment, just hoping it won't
| corrupt something, like truncate logs, files, databases etc?
|
| Or are you asking it to provide example commands that you can
| sanity check?
|
| I'd be curious to see some more concrete examples.
| mindwok wrote:
| More appropriately: the terminal is all you need.
|
| I have used MCP daily for a few months. I'm now down to a single
| MCP server: terminal (iTerm2). I have OpenAPI specs on hand if I
| ever need to provide them, but honestly shell commands and curl
| get you pretty damn far.
| jasonthorsness wrote:
| I never knew how far it was possible to go in bash shell with
| the built-in tools until I saw the LLMs use them.
| zahlman wrote:
| Possibly because most people who could mentor you, would give
| up and switch to their preference of {Perl, Python, Ruby,
| PHP, ...} far earlier.
|
| (Check out Dave Eddy, though. https://github.com/bahamas10 ;
| also occasionally streams on YouTube and then creates short
| educational video content there:
| https://www.youtube.com/@yousuckatprogramming )
| pclowes wrote:
| Directionally I think this is right. Most LLM usage at scale
| tends to be filling the gaps between two hardened interfaces. The
| reliability comes not from the LLM inference and generation but
| the interfaces themselves only allowing certain configuration to
| work with them.
|
| LLM output is often coerced back into something more
| deterministic such as types, or DB primary keys. The value of the
| LLM is determined by how well your existing code and tools model
| the data, logic, and actions of your domain.
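That coercion step might look like the following. A hedged sketch only: the `LogQuery` shape and field names are invented for illustration, but it shows the pattern of forcing free-form LLM output back into a typed value that the rest of the system can trust:

```python
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class LogQuery:
    service: str
    status: int       # a real int, not whatever string the LLM emitted
    limit: int = 50

def coerce(llm_output: str) -> LogQuery:
    """Force LLM text back into a deterministic, typed structure."""
    raw = json.loads(llm_output)          # fails loudly on non-JSON
    query = LogQuery(service=str(raw["service"]),
                     status=int(raw["status"]),
                     limit=int(raw.get("limit", 50)))
    # Domain rule enforced by code, not by the model's output:
    if not (100 <= query.status <= 599):
        raise ValueError(f"not an HTTP status: {query.status}")
    return query
```

Everything downstream of `coerce` is deterministic; the LLM's contribution is confined to filling the gap between two hardened interfaces, as described above.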
|
| In some ways I view LLMs today a bit like 3D printers, both in
| terms of hype and in terms of utility. They excel at quickly
| connecting parts similar to rapid prototyping with 3d printing
| parts. For reliability and scale you want either the LLM or an
| engineer to replace the printed/inferred connector with something
| durable and deterministic (metal/code) that is cheap and fast to
| run at scale.
|
| Additionally, there was a minute during the 3D-printer
| Gartner hype cycle when there were notions that we would
| all just print substantial amounts of consumer goods, when
| the reality is that the high-utility use cases are much
| narrower. There is a corollary
| here to LLM usage. While LLMs are extremely useful we cannot rely
| on LLMs to generate or infer our entire operational reality or
| even engage meaningfully with it without some sort of pre-
| existing digital modeling as an anchor.
| abdulhaq wrote:
| this is a really good take
| foobarbecue wrote:
| Hype cycle for drones and VR was similar -- at the peak, you
| have people claiming drones will take over package delivery and
| everyone will spend their day in VR. Reality is that the
| applicability is more narrow.
| jangxx wrote:
| I mean both of these things are actually happening (drone
| deliveries and people spending a lot of time in VR), just at
| a much much smaller scale than it was hyped up to be.
| giovannibonetti wrote:
| Drones and VR require significant upfront hardware
| investment, which curbs adoption. On the other hand,
| adopting LLM-as-a-service has none of these costs, so no
| wonder so many companies are getting involved with it so
| quickly.
| nativeit wrote:
| Right, but abstract costs are still costs to _someone_ ,
| so how far does that go before mass adoption turns into a
| mass liability for whomever is ultimately on the hook? It
| seems like there is this extremely risky wager that
| everyone is playing--that LLM's will find their "killer
| app" before the real costs of maintaining them becomes
| too much to bear. I don't think these kinds of bets often
| pay off. The opposite actually, I think every truly
| revolutionary technological advance in the contemporary
| timeframe has arisen out of its very obvious killer
| app(s), they were in a sense inevitable. Speculative tech
| --the blockchain being one of the more salient and
| frequently tapped examples--tends to work in pretty clear
| bubbles, in my estimation. I've not yet been convinced
| this one is any different, aside from the absurd scale at
| which it has been cynically sold as the biggest thing
| since Gutenberg, but while that makes it somewhat
| distinct, it's still a rather poor argument against it
| being a bubble.
| pxc wrote:
| A parallel outcome for LLMs sounds realistic to me.
| deadbabe wrote:
| If it's not happening at the scale it was pitched, then
| it's not happening.
| jangxx wrote:
| This makes no sense, just because something didn't become
| as big as the hypemen said it would doesn't make the
| inventions or users of those inventions disappear.
| deadbabe wrote:
| For something to be considered "happening" you can't just
| have a handful of localized examples. It has to be
| happening at a large noticeable scale that even people
| unfamiliar with the tech are noticing. Then you can say
| it's "happening". Otherwise, it's just smaller groups of
| people doing stuff.
| falcor84 wrote:
| Considering what we've been seeing in the Russia-Ukraine
| and Iran-Israel wars, drones are definitely happening at
| scale. For better or for worse, I expect worldwide
| production of drones to greatly expand over the coming
| years.
| soulofmischief wrote:
| That's the claim for AR, not VR, and you're just noticing how
| research and development cycles play out, you can draw
| comparisons to literally any technology cycle.
| 65 wrote:
| That is in fact the claim for VR. Remember the Metaverse?
| Oculus headsets are VR headsets. The Apple Vision Pro is a
| VR headset.
| mumbisChungo wrote:
| The metaverse is and was a guess at how the children of
| today might interact as they age into active market
| participants. Like all these other examples, speculative
| mania preceded genuine demand and it remains to be seen
| whether it plays out over the coming 10-15 years.
| sizzle wrote:
| Ahh yes let's get the next generation addicted to literal
| screens strapped to their eyeballs for maximum
| monetization, humanity be damned. Glad it's a failing
| bet. Now sex bots might be onto something...
| mumbisChungo wrote:
| It may or may not be a failing bet. Maybe smartphones are
| the ultimate form of human-data interface and we'll
| simply never do better.
| jrm4 wrote:
| I'll take your argument a bit further. The thing is --
| "human-data" interfaces are not particularly important.
| Human-Human ones are. This is probably why it's going to
| be difficult, if not impossible, to beat the smartphone;
| VR or whatever doesn't fundamentally "bring people closer
| together" in a way the smartphone nearly absolutely did.
| mumbisChungo wrote:
| VR may not, but social interaction with AR might be more
| palatable and better UX than social interaction while
| constantly looking down at at a computer we still call a
| "phone" for some reason.
| outworlder wrote:
| > The Apple Vision Pro is a VR headset.
|
| For some use cases it is indeed used for VR. But it has
| AR applications and all the necessary hardware and
| software.
| threatofrain wrote:
| Good drones are very Chinese atm, as is casual consumer drone
| delivery. Americans might be more than a decade away even
| with concerted bipartisan war-like effort to boost domestic
| drone competency.
|
| The reality is Chinese.
| sarchertech wrote:
| Aren't people building DIY drones that are close to and in
| some cases superior to off the shelf Chinese drones?
| threatofrain wrote:
| Off the shelf Chinese drones is somewhat vague, we can
| just say DJI. Their full drone and dock system for the
| previous generation goes for around $20k. DJI iterates on
| this space on a yearly cadence and have just come out
| with the Dock 3.
|
| 54 minute flight time (47 min hover) for fully unmanned
| operations.
|
| If you're talking about fpv racing where tiny drones fly
| around 140+ mph, then yeah DJI isn't in that space.
| sarchertech wrote:
| That hardly seems like it would take the US 10 years to
| replicate on a war footing aside from the price.
|
| I mean if we're talking dollar to dollar comparison, the
| US will likely never be able to produce something as
| cheaply as China (unless China drastically increases
| their average standard of living).
| tonyarkles wrote:
| There's a really weird phenomenon too with drones. I've
| used Chinese (non-drone) software for work a bunch in the
| past and it's been almost universally awful. On the drone
| side, especially DJI, they've flipped this script
| completely. Every non-DJI drone I've flown has had
| miserable UX in comparison to DJI. Mission Planner (open
| source, as seen in the Ukraine attack videos) is super
| powerful but also looks like ass and functions similarly.
| QGC is a bit better, especially the vendor-customized
| versions (BSD licensed) but the vendors almost always
| neuter great features that are otherwise available in the
| open source version and at the same time modify things so
| that you can't talk to the aircraft using the OSS
| version. The commercial offerings I've used are no
| better.
|
| Sure, we need to be working on being able to build the
| hardware components in North America, and I've seen a
| bunch of people jump on that in the last year. But wow is
| the software ever bad and I haven't really seen anyone
| working to improve that.
| ivape wrote:
| You checked out drone warfare? It's all the rage in every
| conflict at the moment. The hype around drones is not fake,
| and I'd compare it more to autonomous cars because regulation
| is the only reason you don't see a million private drones
| flying around.
| dazed_confused wrote:
| Yes, to an extent, but I would say that is an extension of
| artillery and long-range fire capabilities.
| jmj wrote:
| As is well known, AI is whatever hasn't been done yet.
| golergka wrote:
| People claimed that we would spend most of our day on the
| internet in the mid-90s, and then the dotcom bubble burst.
| And then people claimed that by 2015 robo-taxis would be
| around all the major cities of the planet.
|
| You can be right but too early. There was a hype wave for
| drones and VR (more than one for the latter one), but I
| wouldn't be so sure that it's peak of their real world usage
| yet.
| skeeter2020 wrote:
| >> You can be right but too early.
|
| Unless opportunity cost is zero this is a variation on
| being wrong.
| whiplash451 wrote:
| Interesting take but too bearish on LLMs in my opinion.
|
| LLMs have already found large-scale usage (deep research,
| translation) which makes them more ubiquitous today than 3D
| printers ever will or could have been.
| dingnuts wrote:
| And yet you didn't provide a single reference link! Every
| case of LLM usage that I've seen claimed about those things
| has been largely a lie -- guess you won't take the
| opportunity to be the first to present a real example. Just
| another rumor.
| whiplash451 wrote:
| My reference is the daily usage of chatgpt around me
| (outside of tech circles).
|
| I don't want to sound like a hard-core LLM believer. I get
| your point and it's fair.
|
| I just wanted to point out that the current usage of
| chatgpt is a lot broader than that of 3D printers even at
| the peak hype of it.
| dingnuts wrote:
| Outside of tech circles it looks like NFTs: people
| following hype using tech they don't understand which
| will be popular until the downsides we're aware of that
| they are ignorant to have consequences, and then the
| market will reflect the shift in opinion.
| whiplash451 wrote:
| I see it differently: people are switching to chatgpt
| like they switched to google back in 2005 (from whatever
| alternative existed back then)
|
| And I mean random people, not tech circles
|
| It's very different from NFTs in that respect
| basch wrote:
| No way.
|
| Everybody under a certain age is using ChatGPT, where
| they were once using search and friendship/expertise.
| It's the number 1 app in the App Store. Copilot use in
| the enterprise is so seamless, you just talk to
| PowerPoint or Outlook and it formulates what you were
| supposed to make or write.
|
| It's not a fad, it is a paradigm change.
|
| People don't need to understand how it works for it to
| work.
| lotsoweiners wrote:
| > It's the number 1 app in the App Store.
|
| When I checked the iOS App Store just now, something
| called Love Island USA is the #1 free app. Kinda makes
| you think....
| dingnuts wrote:
| I know it's popular; that doesn't mean it's not a fad.
| Consequences take time. It's easy to use but once you get
| burned in a serious way by the bot that's still wrong 20%
| of the time, you'll become more reluctant to put your
| coin in the slot machine.
|
| Maybe if the AI companies start offering refunds for
| wrong answers, then the price per token might not be such
| a scam.
| jrm4 wrote:
| Not even remotely in the same universe; the difference is
| ChatGPT is actually having an impact, people are
| incorporating it day-to-day in a way that NFTs never
| stood much of a chance.
| retsibsi wrote:
| Even if the most bearish predictions turn out to be
| correct, the comparison of LLMs to NFTs is a galaxy-
| spanning stretch.
|
| NFTs are about as close to literally useless as it gets,
| and that was always obvious; 99% of the serious attention
| paid to them came from hustlers and speculators.
|
| LLMs, for all their limitations, are already good at some
| things and useful in some ways. Even in the areas where
| they are (so far) too unreliable for serious use, they're
| not pure hype and bullshit; they're doing things that
| would have seemed like magic 10 years ago.
| benreesman wrote:
| What we call an LLM today (by which almost everyone means
| an autoregressive language model from the Generative
| Pretrained Transformer family tree, and BERTs are still
| doing important work, believe that) is actually an
| offshoot of neural machine translation.
|
| This isn't (intentionally at least) mere HN pedantry: they
| really do act like translation tools in a bunch of observable
| ways.
|
| And while they have recently crossed the threshold into
| "yeah, I'm always going to have a gptel buffer open now"
| territory at the extreme high end, their utility outside of
| the really specific, totally non-generalizing code-lookup
| gizmo use case remains a claim unsupported by robust
| profits.
|
| There is a hole in the ground into which something between
| 100 billion and a trillion dollars has gone, and so far it
| has about 20B in revenue (not profit) coming out annually.
|
| AI is going to be big (it was big ten years ago).
|
| LLMs? Look more and more like the Metaverse every day as
| concerns the economics.
| rapind wrote:
| > There is a hole in the ground where something between 100
| billion and a trillion dollars in the ground that so far
| has about 20B in revenue (not profit) going into it
| annually.
|
| This is a concern for me. I'm using claude-code daily and
| find it very useful, but I'm expecting the price to
| continue getting jacked up. I do want to support Anthropic,
| but they might eventually need to cross a price threshold
| where I bail. We'll see.
|
| I expect at some point the more open models and tools will
| catch up when the expensive models like ChatGPT plateau
| (assuming they do plateau). Then we'll find out if these
| valuations measure up to reality.
|
| Note to the Hypelords: It's not perfect. I need to read
| every change and intervene often enough. "Vibe coding" is
| nonsense as expected. It is definitely good though.
| benreesman wrote:
| Vibe coding is nonsense, and it's really kind of
| uncomfortable to realize that a bunch of people you had
| tons of respect for are either ignorant or
| dishonest/bought enough to say otherwise. There's a cold
| wind blowing and the bunker-building crowd, well let's
| just say I won't shed a tear.
|
| You don't stock antibiotics and bullets in a survival
| compound because you think that's going to keep out a
| paperclip optimizer gone awry. You do that in the forlorn
| hope that when the guillotines come out that you'll be
| able to ride it out until the Nouveau Regime is in a
| negotiating mood. But they never are.
| juped wrote:
| I'm just taking advantage and burning VCs' money on
| useful but not world-changing tools while I still can.
| We'll come out of it with consumer-level okay tools even
| if they don't reach the levels of Claude today, though.
| strgcmc wrote:
| As a thought-exercise -- assume models continue to
| improve, whereas "using claude-code daily" is something
| you choose to do because it's useful, but is not yet at
| the level of "absolute necessity, can't imagine work
| without it". What if it does become, that level of
| absolute necessity?
|
| - Is your demand inelastic at that point, if having
| claude-code becomes effectively required, to sustain your
| livelihood? Does pricing continue to increase, until it's
| 1%/5%/20%/50% of your salary (because hey, what's the
| alternative? if you don't pay, then you won't keep up
| with other engineers and will just lose your job
| completely)?
|
| - But if tools like claude-code become such a necessity,
| wouldn't enterprises be the ones paying? Maybe, but maybe
| like health-insurance in America (a uniquely dystopian
| thing), your employer may pay some portion of the
| premiums, but they'll also pass some costs to you as the
| employee... Tech salaries have been cushy for a while
| now, but we might be entering a "K-shaped" inflection
| point --> if you are an OpenAI elite researcher, then you
| might get a $100M+ offer from Meta; but if you are an
| average dev doing average enterprise CRUD, maybe your
| wages will be suppressed because the small cabal of LLM
| providers can raise prices and your company HAS to pay,
| which means you HAVE to bear the cost (or else what? you
| can quit and look for another job, but who's hiring?)
|
| This is a pessimistic take of course (and vastly
| oversimplified / too cynical). A more positive outcome
| might be, that increasing quality of AI/LLM options leads
| to a democratization of talent, or a blossoming of "solo
| unicorns"... personally I have toyed with calling this,
| something like a "techno-Amish utopia", in the sense that
| Amish people believe in self-sufficiency and are not
| wholly-resistant to technology (it's actually quite
| clever, what sorts of technology they allow for
| themselves or not), so what if we could take that
| further?
|
| If there was a version of that Amish-mentality of
| loosely-federated self-sufficient communities (they have
| newsletters! they travel to each other! but they largely
| feed themselves, build their own tools, fix their own
| fences, etc.!), where engineers + their chosen LLM
| partner could launch companies from home, manage their
| home automation / security tech, run a high-tech small
| farm, live off-grid from cheap solar, use excess
| electricity to Bitcoin mine if they choose to, etc....
| maybe there is actually a libertarian world that can
| arise, where we are no longer as dependent on large
| institutions to marshal resources, deploy capital, scale
| production, etc., if some of those things are more in-
| reach for regular people in smaller communities, assisted
| by AI. This of course assumes that, the cabal of LLM
| model creators can be broken, that you don't need to pay
| for Claude if the cheaper open-source-ish Llama-like
| alternative is good enough
| rapind wrote:
| Well my business doesn't rely on AI as a competitive
| advantage, at least not yet anyways. So as it stands, if
| claude got 100x as effective, but cost 100x more, I'm not
| sure I could justify the cost because my market might
| just not be large enough. Which means I can either ditch
| it (for an alternative if one exists) or expand into
| other markets... which is appealing but a huge change
| from what I'm currently doing.
|
| As usual, the answer is "it depends". I guarantee though
| that I'll at least start looking at alternatives when
| there's a huge price hike.
|
| Also I suspect that a 100x improvement (if even possible)
| wouldn't just cost 100 times as much, but probably
| 100,000+ times as much. I also suspect than an
| improvement of 100x will be hyped as an improvement of
| 1,000x at least :)
|
| Regardless, AI is really looking like a commodity to me.
| While I'm thankful for all the investment that got us
| here, I doubt anyone investing this late in the game at
| these inflated numbers are going to see a long term
| return (other than ponzi selling).
| sebzim4500 wrote:
| >LLMs? Look more and more like the Metaverse every day as
| concerns the economics.
|
| ChatGPT has 800M+ weekly active users how is that
| comparable to the Metaverse in any way?
| benreesman wrote:
| I said as concerns the economics. It's clearly more
| popular than the Oculus or whatever, but it's still a
| money bonfire and shows no signs of changing on that
| front.
| threetonesun wrote:
| LLMs as we know them via ChatGPT were a way to disrupt
| the search monopoly Google had for so many years. And my
| guess is the reason Google was in no rush to jump into
| that market was because they knew the economics of it
| sucked.
| benreesman wrote:
| Right, and inb4 ads on ChatGPT to stop the bleeding.
| That's the default outcome at this point: quantize it
| down gradually to the point where it can be ad supported.
|
| You can just see the scene from the Sorkin film where
| Fidji is saying to Altman: "Its time to monetize the
| site."
|
| "We don't even know what it is yet, we know that it is
| cool."
| datameta wrote:
| Without trying to take away from your assertion, I think it
| is worthwhile to mention that part of this phenomenon is the
| unavoidable matter of meatspace being expensive and dataspace
| being intangibly present everywhere.
| deadbabe wrote:
| large scale usage in niche domains is still small scale
| overall.
| kibwen wrote:
| No, 3D printers are the backbone of modern physical
| prototyping. They're far more important to today's global
| economy than LLMs are, even if you don't have the vantage
| point to see it from your sector. That might change in the
| future, but snapping your fingers to wink LLMs out of
| existence would change essentially nothing about how the
| world works today; it would be a non-traumatic non-event.
| There just hasn't been time to integrate them into any
| essential processes.
| whiplash451 wrote:
| > snapping your fingers to wink LLMs out of existence would
| change essentially nothing about how the world works today
|
| One could have said the same thing about Google in 2006
| kibwen wrote:
| No, not even close. By 2006 all sorts of load-bearing
| infrastructure was relying on Google (e.g. Gmail). Today
| LLMs are still on the edge of important systems, rather
| than underlying those systems.
| johnsmith1840 wrote:
| Things like BERT are load-bearing structures in data
| science pipelines.
|
| I assume there are massive number of LLM analysis
| pipelines out there.
|
| I suppose it depends if you consider non determinist
| DS/ML pipelines "loadbearing" or not. Most are not using
| LLMs though.
|
| 3D parts regularly are used beyond prototyping though as
| tooling for a small company can be higher than just metal
| 3D parts. So I do somewhat agree but the loss of
| productivity in software prototyping would be a massive
| hit if LLMs vanished.
| nativeit wrote:
| [citation needed]
| skeeter2020 wrote:
| The author is not bearish on LLMs at all; this post is about
| using LLMs and code vs. LLMs with autonomous tools via MCP.
| An example from your set would be translation. The author
| says you'll get better results if you do something like ask
| an LLM to translate documents, review the proposed approach,
| ask it to review it's work and maybe ask another LLM to
| validate the results than if you say "you've got 10K
| documents in English, and these tools - I speak French"
| hk1337 wrote:
| > Directionally I think this is right.
|
| We have a term at work we use called, "directionally accurate",
| when it's not entirely accurate but headed in the right
| direction.
| graerg wrote:
| > This is a significant advantage that an MCP (Multi-Component
| Pipeline) typically cannot offer
|
| Oh god please no, we must stop this initialism. We've gone too
| far.
| the_mitsuhiko wrote:
| It's the wrong acronym. I wrote this blog post on the bike and
| used an LLM to fix up the dictation that I did. While I did
| edit it heavily and rewrote a lot of things, I did not end up
| noticing that my LLM expanded MCP incorrectly. It's Model
| Context Protocol.
| apgwoz wrote:
| And you shipped it to production. Just like real agentic
| coding! Nice!
| the_mitsuhiko wrote:
| Which I don't feel great about because I do not like to use
| LLMs for writing blog posts. I just really wanted to
| explore if I can write a blog post on my bike commute :)
| bitwize wrote:
| We're all in line to get de-rezzed by the MCP, one way or
| another.
| baal80spam wrote:
| Isn't it a bit like saying: a saw is all you need (for carpenters)?
|
| I mean, you _probably_ could make most furniture with only a saw,
| but why?
| nativeit wrote:
| In this analogy, do you have to design, construct, and learn
| from first principles to operate literally every other tool
| you'd like to use in addition to the saw?
| jasonthorsness wrote:
| Tools are constraints and time/token savers. Code is expensive in
| terms of tokens and harder to constrain in environments that
| can't be fully locked down because, for example, the task needs
| network access. You need code AND tools.
| blahgeek wrote:
| > Code is expensive in terms of tokens and harder to constrain
| in environments
|
| It's also true for human. But then we invented functions /
| libraries / modules
| webdevver wrote:
| what does "MCP" stand for?
| dangus wrote:
| I was about to say the same thing.
|
| It's bad writing practice to do this, even if you are assuming
| your followers are following you.
|
| Especially for a site like Twitter that has a login wall.
| jasonlotito wrote:
| I'm confused. It links to the definition in the first
| sentence, and I'm not sure what you mean by Twitter in this
| context.
| empath75 wrote:
| If you don't know what it is and can't be bothered to google
| it, then you probably aren't the audience for this.
| komali2 wrote:
| Microsoft Certified Professional, a very common certification.
|
| Oh wait... hm ;) perhaps the writing nerds had it right when
| they recommend always writing the full acronym out the first
| time it's used in an article, no matter how common one presumes
| it to be
| apgwoz wrote:
| Mashup Context Protocol, of course! There was a post the other
| day comparing MCP tools to the mashups of web 2.0. It's a much
| better acronym expansion.
| aidenn0 wrote:
| I didn't know either, but in the very first sentence, the
| author provides the expansion and a link to the Wikipedia page
| for it.
| vidarh wrote:
| I frankly use tools mostly as an auth layer for things where raw
| access is too big a footgun without a permissions step. So I give
| the agent the choice of asking for permission to do things via
| the shell, or going nuts without user-interaction via a tool that
| enforces reasonable limitations.
|
| Otherwise you can e.g just give it a folder of preapproved
| scripts to run and explain usage in a prompt.
| empath75 wrote:
| The problem with this is that you have to give your LLM basically
| unbounded access to everything you have access to, which is a
| recipe for pain.
| the_mitsuhiko wrote:
| Not necessarily. I have a small little POC agentic tool on my
| side which is fully sandboxed, and it's inherently "non prompt
| injectable" by the data that it processes, since it only ever
| passes that data through generated code.
|
| Disclaimer: it does not work well enough. But I think it shows
| great promise.
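| A minimal sketch of that pattern (toy guard, illustrative only, not
| the actual tool): the model is shown only a schema description and
| writes code against it; the code, never the model, touches the rows.

```python
import ast

def run_generated(code: str, data: list[dict]) -> object:
    """Run model-written code over untrusted rows. The model only ever sees
    a schema description, never the rows themselves, so prompt-injection
    payloads inside `data` can't reach the prompt. (The AST check below is
    a toy guard; a real sandbox needs far more than this.)"""
    tree = ast.parse(code)
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            raise ValueError("imports not allowed in generated code")
    # Expose only the data and a tiny allowlist of builtins.
    scope = {"data": data,
             "__builtins__": {"len": len, "sum": sum, "min": min, "max": max}}
    exec(compile(tree, "<generated>", "exec"), scope)
    return scope.get("result")

# e.g. the model, told only that rows have an "amount" field, emits:
generated = "result = sum(row['amount'] for row in data)"
```

| The worst an injected row can do here is confuse the generated code;
| it cannot steer the model.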
| elyase wrote:
| This is similar to the tool call (fixed code & dynamic params) vs
| code generation (dynamic code & dynamic params) discussion: tools
| offer constraints and save tokens; code gives you flexibility. Some
| papers suggest that generating code is often superior, and this
| will likely become even more true as language models improve.
|
| [1] https://huggingface.co/papers/2402.01030
|
| [2] https://huggingface.co/papers/2401.00812
|
| [3] https://huggingface.co/papers/2411.01747
|
| I am working on a model that goes a step beyond and even makes
| the distinction between thinking and code execution unnecessary
| (it is all computation in the end), unfortunately no link to
| share yet
| simonw wrote:
| Something I've realized about LLM tool use: if you can reduce a
| problem to something that can be solved by an LLM in a sandbox
| using tools in a loop, you can brute force that problem.
|
| The job then becomes identifying those problems and figuring out
| how to configure a sandbox for them, what tools to provide and
| how to define the success criteria for the model.
|
| That still takes significant skill and experience, but it's at a
| higher level than chewing through that problem using trial and
| error by hand.
|
| My assembly Mandelbrot experiment was the thing that made this
| click for me: https://simonwillison.net/2025/Jul/2/mandelbrot-
| in-x86-assem...
| rasengan wrote:
| Makes sense.
|
| I treat an LLM the same way I'd treat myself as it relates to
| context and goals when working with code.
|
| "If I need to do __________ what do I need to know/see?"
|
| I find that traditional tools, as per the OP, have become ever
| more powerful and useful in the age of LLMs (especially grep).
|
| Furthermore, LLMs are quite good at working with shell tools
| and functionalities (heredoc, grep, sed, etc.).
| dist-epoch wrote:
| I've been using a VM for a sandbox, just to make sure it won't
| delete my files if it goes insane.
|
| With some host data directories mounted read only inside the
| VM.
|
| This creates some friction though. Feels like a tool which runs
| the AI agent in a VM, but then copies its output to the host
| machine after some checks, would help, so that it would feel
| like you are running it natively on the host.
| jitl wrote:
| This is very easy to do with Docker. Not sure it you want the
| vm layer as an extra security boundary, but even so you can
| just specify the VM's docker api endpoint to spawn processes
| and copy files in/out from shell scripts.
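| A sketch of the mount layout this implies (the image name and the
| `agent` command are placeholders for whatever CLI agent you run
| inside the container):

```python
def agent_container_cmd(image: str, checkout: str, refdata: str) -> list[str]:
    """Build a `docker run` invocation where host reference data is mounted
    read-only and only a fresh checkout is writable. Pass the returned list
    to subprocess.run() to launch."""
    return ["docker", "run", "--rm",
            "--network", "none",          # optionally cut off network egress
            "-v", f"{refdata}:/ref:ro",   # host data: look, don't touch
            "-v", f"{checkout}:/work:rw", # disposable working copy
            "-w", "/work",
            image, "agent"]
```

| If the agent goes insane, the blast radius is the checkout, which you
| can diff and then copy back (or discard).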
| simonw wrote:
| Have you tried giving the model a fresh checkout in a read-
| write volume?
| dist-epoch wrote:
| Hmm, excellent idea, somehow I assumed that it would be
| able to do damage in a writable volume, but it wouldn't be
| able to exit it, it would be self-contained to that
| directory.
| nico wrote:
| > LLM in a sandbox using tools in a loop, you can brute force
| that problem
|
| Does this require using big models through their APIs and
| spending a lot of tokens?
|
| Or can this be done either with local models (probably very
| slow), or with subscriptions like Claude Code with Pro (without
| hitting the rate/usage limits)?
|
| I saw the Mandelbrot experiment, it was very cool, but still a
| rather small project, not really comparable to a
| complex/bigger/older code base for a platform used in
| production
| simonw wrote:
| The local models aren't quite good enough for this yet in my
| experience - the big hosted models (o3, Gemini 2.5, Claude 4)
| only just crossed the capability threshold for this to start
| working well.
|
| I think it's possible we'll see a local model that can do
| this well within the next few months though - it needs good
| tool calling, not an encyclopedic knowledge of the world.
| Might be possible to fit that in a model that runs locally.
| pxc wrote:
| There's a fine-tune of Qwen3 4B called "Jan Nano" that I
| started playing with yesterday, which is basically just
| fine-tuned to be more inclined to look things up via web
| searches than to answer them "off the dome". It's not good-
| good, but it does seem to have a much lower effective
| hallucination rate than other models of its size.
|
| It seems like maybe similar approaches could be used for
| coding tasks, especially with tool calls for reading man
| pages, info pages, running `tldr`, specifically consulting
| Stack Overflow, etc. Some of the recent small MoE models
| from Chinese companies are significantly smarter than
| models like Qwen 4B, but run about as quickly, so maybe on
| systems with high RAM or high unified memory, even with
| middling GPUs, they could be genuinely useful for coding if
| they are made to avoid doing anything without tool use.
| never_inline wrote:
| Wasn't there a tool calling benchmark by docker guys which
| concluded qwen models are nearly as good as GPT? What is
| your experience about it?
|
| Personally I am convinced JSON is a bad format for LLMs and
| code orchestration in python-ish DSL is the future. But
| local models are pretty bad at code gen too.
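| As a toy illustration of the difference (all names made up): a JSON
| plan costs one model round-trip per step, while code orchestration
| lets the model emit one program that chains the same steps locally.

```python
# The JSON style: each step is a separate tool dispatch the client interprets.
json_style = [
    {"tool": "search", "args": {"q": "rust"}},
    {"tool": "top", "args": {"n": 1}},
]

def run_plan(plan: list[dict], tools: dict) -> object:
    """Tiny interpreter for the JSON style: thread the previous step's
    result into the next tool call."""
    state = None
    for step in plan:
        state = tools[step["tool"]](state, **step["args"])
    return state

# The code style would instead be a single generated expression, e.g.
#   tools["top"](tools["search"](None, q="rust"), n=1)
```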
| nico wrote:
| > it needs good tool calling, not an encyclopedic knowledge
| of the world
|
| I wonder if there are any groups/companies out there
| building something like this
|
| Would love to have models that only know 1 or 2 languages
| (eg. python + js), but are great at them and at tool
| calling. Definitely don't need my coding agent to know all
| of Wikipedia and translating between 10 different languages
| johnsmith1840 wrote:
| Given 2 datasets:
|
| 1. A special code dataset
| 2. A bunch of "unrelated" books
|
| My understanding is that the model trained on just the
| first will never beat the model trained on both.
| Bloomberg model is my favorite example of this.
|
| If you can squirrel away special data, then that special
| data plus everything else will beat any other model.
| But that's basically what OpenAI, Google, and Anthropic
| are all currently doing.
| e12e wrote:
| I wonder if common lisp with repl and debugger could
| provide a better tool than your example with nasm wrapped
| via apt in Docker...
|
| Essentially just giving LLMs more state of the art systems
| made for incremental development?
|
| Ed: looks like that sort of exists:
| https://github.com/bhauman/clojure-mcp
|
| (Would also be interesting if one could have a few LLMs
| working together on red/green TDD approach - have an
| orchestrator that parse requirements, and dispatch a red
| goblin to write a failing test; a green goblin that writes
| code until the test pass; and then some kind of hobgoblin
| to refactor code, keeping test(s) green - working with the
| orchestrator to "accept" a given feature as done and move
| on to the next...
|
| With any luck the resulting code _might_ be a bit more
| transparent (stricter form) than other LLM code)?
| chamomeal wrote:
| That's super cool, I'm glad you shared this!
|
| I've been thinking about using LLMs for brute forcing problems
| too.
|
| Like LLMs kinda suck at typescript generics. They're
| surprisingly bad at them. Probably because it's easy to write
| generics that _look_ correct, but are then screwy in many
| scenarios. Which is also why generics are hard for humans.
|
| If you could have any LLM actually use TSC, it could run tests,
| make sure things are inferring correctly, etc. it could just
| keep trying until it works. I'm not sure this is a way to
| produce understandable or maintainable generics, but it would
| be pretty neat.
|
| Also, while typing this I realized that Cursor can see
| TypeScript errors. All I need are some utility testing types,
| and I could have cursor write the tests and then brute force
| the problem!
|
| If I ever actually do this I'll update this comment lol
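| A sketch of that brute-force loop (hedged: `Expect`/`Equal`-style
| assertion types from the type-challenges repo are one way to write
| the "tests"; the runner is injectable so nothing below depends on
| Node actually being installed):

```python
import pathlib
import subprocess
import tempfile

def typechecks(candidate: str, type_tests: str, runner=None) -> bool:
    """Append assertion types to a candidate generic and see whether tsc
    accepts the combined file. An agent can loop: generate -> typecheck ->
    feed the errors back -> regenerate, until this returns True."""
    source = candidate + "\n" + type_tests
    if runner is None:
        def runner(src: str) -> bool:
            with tempfile.TemporaryDirectory() as d:
                f = pathlib.Path(d, "check.ts")
                f.write_text(src)
                proc = subprocess.run(
                    ["npx", "tsc", "--noEmit", "--strict", str(f)],
                    capture_output=True, text=True)
                return proc.returncode == 0
    return runner(source)
```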
| vunderba wrote:
| _> The job then becomes identifying those problems and figuring
| out how to configure a sandbox for them, what tools to provide,
| and how to define the success criteria for the model._
|
| Your test case seems like a quintessential example where you're
| missing that last step.
|
| Since it is unlikely that you understand the math behind
| fractals or x86 assembly (apologies if I'm wrong on this), your
| only means for verifying the accuracy of your solution is a
| superficial visual inspection, e.g. "Does it look like the
| Mandelbrot series?"
|
| Ideally, your evaluation criteria would be expressed as a
| continuous function, but at the very least, it should take the
| form of a sufficiently diverse quantifiable set of discrete
| inputs and their expected outputs.
| simonw wrote:
| That's exactly why I like using Mandelbrot as a demo: it's
| perfect for "superficial visual inspection".
|
| With a bunch more work I could likely have got a vision LLM
| to do that visual inspection for me in the assembly example,
| but having a human in the loop for that was much more
| productive.
| shepherdjerred wrote:
| Are fractals or x86 assembly representative of most dev work?
| nartho wrote:
| I think it's irrelevant. The point they are trying to make
| is that anytime you ask an LLM for something that's outside of
| your area of expertise, you have very little to no way to
| ensure it is correct.
| diggan wrote:
| > anytime you ask an LLM for something that's outside of
| your area of expertise, you have very little to no way to
| ensure it is correct.
|
| I regularly use LLMs to code specific functions I don't
| necessarily understand the internals of. Most of the time
| I do that, it's something math-heavy for a game. Just
| like any function, I put it under automated and manual
| tests. Still, I review and try to gain some intuition
| about what is happening, but it is still very far outside my
| area of expertise, yet I can be sure it works as I expect
| it to.
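| That workflow can be made concrete: test invariants you do
| understand, even when you can't follow the math. A sketch, using a
| hypothetical model-written rotation helper of the kind described:

```python
import math
import random

# Suppose the model produced this 2-D rotation helper for a game:
def rotate(x: float, y: float, angle: float) -> tuple[float, float]:
    c, s = math.cos(angle), math.sin(angle)
    return (x * c - y * s, x * s + y * c)

def check_rotation(n: int = 1000) -> bool:
    """Without following the trig, check properties any rotation must have:
    it preserves vector length, and rotating by zero is the identity."""
    rng = random.Random(0)  # seeded, so the check is reproducible
    for _ in range(n):
        x, y, a = (rng.uniform(-10, 10) for _ in range(3))
        rx, ry = rotate(x, y, a)
        assert math.isclose(math.hypot(rx, ry), math.hypot(x, y),
                            rel_tol=1e-9, abs_tol=1e-12)
    assert rotate(3.0, 4.0, 0.0) == (3.0, 4.0)
    return True
```

| The intuition-building the comment mentions happens while choosing
| which invariants to assert.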
| chrisweekly wrote:
| Giving LLMs the right context -- eg in the form of predefined
| "cognitive tools", as explored with a ton of rigor here^1 --
| seems like the way forward, at least to this casual observer.
|
| 1. https://github.com/davidkimai/Context-
| Engineering/blob/main/...
|
| (the repo is a WIP book, I've only scratched the surface but it
| seems pretty brilliant to me)
| skeeter2020 wrote:
| One of my biggest, ongoing challenges has been to get the LLM
| to use the tool(s) that are appropriate for the job. It feels
| like teaching your kids to, say, do laundry, when you want to
| just tell them to step aside and let you do it.
| FrustratedMonky wrote:
| I wonder if having 2 LLM's communicate will eventually be more
| like humans talking. With all the same problems.
| CuriouslyC wrote:
| I already have agents managing different repositories ask each
| other questions and make requests. It works pretty well for the
| most part.
| victorbjorklund wrote:
| I think the GitHub CLI example isn't entirely fair to MCP. Yes,
| GitHub's CLI is extensively documented online, so of course LLMs
| will excel at generating code for well-known tools. But MCP
| shines in different scenarios.
|
| Consider internal company tools or niche APIs with minimal online
| documentation. Sure, you could dump all the documentation into
| context for code generation, but that often requires more context
| than interacting with an MCP tool. More importantly, generated
| code for unfamiliar APIs is prone to errors so you'd need robust
| testing and retry mechanisms built into the process.
|
| With MCP, if the tools are properly designed and receive correct
| inputs, they work reliably. The LLM doesn't need to figure out
| API intricacies, authentication flows, or handle edge cases -
| that's already handled by the MCP server.
|
| So I agree MCP for GitHub is probably overkill but there are many
| legitimate use cases where pre-built MCP tools make more sense
| than asking an LLM to reverse-engineer poorly documented or
| proprietary systems from scratch.
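| A toy version of what "properly designed" buys you on the server
| side (hypothetical handler; real MCP servers validate arguments
| against a JSON Schema rather than Python types):

```python
def call_tool(schema: dict, handler, **kwargs):
    """Validate the model-supplied arguments before the handler runs, so
    malformed calls fail loudly instead of producing silent garbage."""
    for name, typ in schema.items():
        if name not in kwargs:
            raise ValueError(f"missing argument: {name}")
        if not isinstance(kwargs[name], typ):
            raise TypeError(f"{name} must be {typ.__name__}")
    return handler(**kwargs)
```

| This is the reliability the comment is pointing at: the LLM never has
| to learn the API's edge cases, because bad inputs bounce at the door.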
| light_hue_1 wrote:
| That's "handled" by the MCP server in the sense that it provides
| a simplified view of the world; it doesn't do authentication,
| etc.
|
| If that's what you wanted, you could have designed your poorly
| documented internal API differently to begin with. There's zero
| advantage to MCP in the scenario you describe, aside from
| convincing people that their original API is too hard to use.
| the_mitsuhiko wrote:
| > Sure, you could dump all the documentation into context for
| code generation, but that often requires more context than
| interacting with an MCP tool.
|
| MCP works exactly that way: you dump documentation into the
| context. That's how the LLM knows how to call your tool. Even
| for custom stuff I noticed that giving the LLM things to work
| with that it knows (eg: python, javascript, bash) beats it
| using MCP tool calling, and in some ways it wastes less
| context.
|
| YMMV, but I found the limit of tools available to be <15 with
| sonnet4. That's a super low amount. Basically the official
| playwright MCP alone is enough to fully exhaust your available
| tool space.
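| Back-of-the-envelope for why tool definitions crowd the context
| (~4 characters per token is a rough heuristic, and the schemas
| below are made up but sized like real ones):

```python
import json

def tool_context_cost(tools: list[dict]) -> int:
    """Rough token estimate for the tool definitions that get injected
    into every single request's context, used or not."""
    return len(json.dumps(tools)) // 4

# 25 fake tools, roughly the shape and size of a browser-automation server's.
playwright_like = [
    {"name": f"browser_tool_{i}",
     "description": "x" * 300,
     "inputSchema": {"type": "object",
                     "properties": {"selector": {"type": "string"}}}}
    for i in range(25)
]
```

| At thousands of tokens of standing overhead, a couple of servers can
| eat a meaningful slice of the window before the task even starts.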
| JyB wrote:
| I've never used that many. Does LLM performance
| collapse/degrade significantly because of too much initial
| context? It seems like updates to MCP implementations could
| easily solve that: only inject relevant servers for the given
| task based on the initial user prompt.
| the_mitsuhiko wrote:
| > I've never used that many.
|
| The playwright MCP alone introduces 25 tools into the
| context :(
| forrestthewoods wrote:
| Unpopular Opinion: I hate Bash. Hate it. And hate the ecosystem
| of Unix CLIs that are from the 80s and have the most obtuse,
| inscrutable APIs ever designed. Also this ecosystem doesn't work
| on Windows -- which, as a game dev, is my primary environment.
| And no, WSL does not count.
|
| I don't think the world needs yet another shell scripting
| language. They're all pretty mediocre at best. But maybe this is
| an opportunity to do something interesting.
|
| The Python environment is a clusterfuck, which UV is rapidly
| bringing into something somewhat sane. Python isn't the ultimate
| language.
| But I'd definitely be more interested in "replace yourself with a
| UV Python script" over "replace yourself with a shell script".
| Would be nice to see use this as an opportunity to do better than
| Bash.
|
| I realize this is unpopular. But unpopular doesn't mean wrong.
| lsaferite wrote:
| Python CAN be a "shell script" in this case though...
|
| Tool composition over stdio will get you very very far. That's
| what an interface "from the 80s" does for you 45 years later.
| That same stdio composability is easily piped into/through any
| number of cli tools written in any number of languages,
| compiled and interpreted.
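| The 45-year-old contract in its entirety: read lines on stdin,
| write lines on stdout. A filter in that shape (the field names are
| invented) drops into any pipeline next to grep and jq:

```python
import json
import sys

def filter_errors(lines):
    """Keep only JSON log lines whose severity field is ERROR."""
    return [ln for ln in lines if json.loads(ln).get("severity") == "ERROR"]

if __name__ == "__main__":
    # e.g.  gcloud logging read ... --format=json | python errors_only.py | wc -l
    sys.stdout.writelines(filter_errors(sys.stdin))
```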
| forrestthewoods wrote:
| Composing via stdio is so bloody terrible. Layers and layers
| of bullshit string parsing and encoding and decoding. Soooo
| many bugs. And utterly undebuggable. A truly miserable
| experience.
| zahlman wrote:
| And now you also understand many of the limitations LLMs
| have.
| osigurdson wrote:
| Nobody likes coding in bash but everyone does it (a little)
| because it is everywhere.
| forrestthewoods wrote:
| > because it is everywhere
|
| Except for the fact that actually it is not everywhere.
| nativeit wrote:
| I see your point, but bear with me here--it kind of is.
|
| I suppose if one wanted to be pedantically literal, then
| you are indeed correct. In every other meaningful
| consideration, the parent comment is. Maybe not Bash
| specifically, but #!/bin/sh is broadly available on nearly
| every connected device on the planet, in some capacity.
| From the perspective of how we could automate nearly
| anything, you'd be hard-pressed to find something more
| universal than a shell script.
| forrestthewoods wrote:
| > you'd be hard-pressed to find something more universal
| than a shell script.
|
| 99.9% of my 20-year career has been spent on Windows. So
| bash scripts are entirely worthless and dead to me.
| osigurdson wrote:
| If you use git on Windows, bash is normally available.
| Agree that this isn't widely used though.
| forrestthewoods wrote:
| Yeah, I've never seen anyone rely on users to use Git Bash
| to run shell scripts.
|
| Amusingly although I certainly use GitHub for hobby
| projects I've never actually used it for work. And have
| never come across a Windows project that mandated its
| use. Well, maybe one or two over the years.
| nativeit wrote:
| What do you suppose the proportion is of computers
| actively running Windows in the world right now, versus
| those running some kind of *nix/BSD-based OS? This
| includes everything a person or machine could reasonably
| interface with, and that's Turing complete (in other
| words, a traffic light is limited to its own fixed logic,
| so it doesn't count; but most contemporary wifi routers
| contain general-purpose memory and processors, many even
| run some kind of *nix kernel, so they very much do
| count).
|
| That's my case for Bash being more or less everywhere,
| but I think this debate is entirely semantic. Literally
| just talking about different things.
|
| EDIT: escaped *
| forrestthewoods wrote:
| I think if someone were, for example, to release an open
| source C++ library and it only compiles for Linux or only
| comes with Bash scripts then I would not consider that
| library to be crossplatform nor would I consider it to
| run everywhere.
|
| I don't think it's "just semantics". I think it's a
| meaningful distinction.
|
| Game dev is a perhaps a small niche of computer
| programming. I mean these days the majority of
| programming is webdev JavaScript, blech. But game dev is
| also _overwhelmingly_ Windows based. So I dispute any
| claim that Unix is "everywhere". And I'm regularly
| annoyed by people who falsely pretend it is.
| hollerith wrote:
| Me, too. Also, _Unix_ as a whole is overrated. One reason it
| won was an agreement mediated by a Federal judge presiding over
| an anti-trust trial that AT&T would not enter the computer
| market while IBM would not enter the telecommunications market,
| so Unix was distributed at zero cost rather than sold.
|
| Want to get me talking reverentially about the pioneers of our
| industry? Talk to me about Doug Engelbart, Xerox PARC and the
| Macintosh team at Apple. There was some brilliant work!
| nativeit wrote:
| > Also, Unix as a whole is overrated. One reason it won was
| an agreement mediated by a Federal judge presiding over an
| anti-trust trial that AT&T would not enter the computer
| market while IBM would not enter the telecommunications
| market, so Unix was distributed at zero cost rather than
| sold.
|
| What did Unix win?
| hollerith wrote:
| Mind share of the basic design. Unix's design decisions are
| important parts of MacOS and Linux.
|
| Multics would be an example of a more innovative OS than
| Unix, but its influence on the OSes we use today has been a
| lot less.
| nativeit wrote:
| I suppose the deeper question I'd have would be, how
| would its no-cost distribution prevent better
| alternatives from being developed/promoted/adopted along
| the way? I guess I don't follow your line of logic. To be
| fair, I'm not experienced enough with either OS
| development nor any notable alternatives to Unix to
| agree/disagree with your conclusions. My intuition wants
| to disagree, only because I like Linux, and even sort of
| like Bash scripts--but I have _nothing_ but my own
| subjective preferences to base that position on, and I'm
| actually quite open to being better-informed into
| submission. ;-)
|
| I'm a pretty old hat with Debian at this point, so I've
| got plenty of opinions for its contemporary
| implementations, but I always sort of assumed most of the
| fundamental architectural/systems choices had more or
| less been settled as the "best choices" via the usual
| natural selection, along with the OSS community's abiding
| love for reasoned debate. I can generally understand the
| issues folks have with some of these defaults, but my
| favorite aspect of OS's like Debian are that they
| generally defer to the sysadmin's desires for all things
| where we're likely to have strong opinions. It's "default
| position" of providing no default positions. Certainly
| now that there are containers and orchestration like Nix,
| the layer that is Unix is even less visible, and
| infrastructure-as-code mean a lot of developers can just
| kind of forget about the OS layer altogether, at least
| beyond the OS('s) they choose for their own daily
| driver(s).
|
| Getting this back to the OG point--I can understand why
| people don't like the Bash scripting language. But it
| seems trivial these days to get to a point where one
| could use Python, Lua, Forth, et al to automate and
| control any system running a *nix/BSD OS, and *nix OS's
| do several key things rather well (in my opinion), such
| as service bootstrapping, lifecycle management,
| networking/comms, and maintaining a small footprint.
|
| For whatever it's worth, one could start with nothing but
| a Debian ISO and some preseed files, and get to a point
| where they could orchestrate/launch anything they could
| imagine using their own language/application of choice,
| without ever having touched a shell prompt or written a
| line of Bash. Not for nothing, that's almost
| certainly how many Linux-based customized distributions
| (and even full-blown custom/bespoke OS's) are created,
| but it doesn't have to be so complicated if one just
| wants to get to where Python scripts are able to run (for
| example).
| hollerith wrote:
| Most OSes no longer have any users or squeak by with less
| than 1000 users on their best day ever: Plan 9, OS/2,
| Beos, AmigaOS, Symbian, PalmOS, the OS for the Apple II,
| CP/M, VMS, TOPS-10, Multics, Compatible Time-Sharing
| System, Burroughs Master Control Program, Univac's Exec
| 8, Dartmouth Time-Sharing System, etc.
|
| Some of the events that helped Unix survive longer than
| most are the decision of DARPA (in 1979 or the early
| 1980s IIRC) to fund the addition of a TCP/IP networking
| stack to Unix and the decision in 1983 of Richard
| Stallman to copy the Unix design for his GNU project. The
| reason DARPA and Stallman settled on Unix was that they
| knew about it and were somewhat familiar with it because
| it was given away for free (mostly to universities and
| research labs). Success tends to beget success in
| "spaces" with strong "network externalities" such as the
| OS space.
|
| >Getting this back to the OG point
|
| I agree that it is easy to avoid writing shell scripts.
| The problem is that other people write them, e.g., as the
| recommended way to install some package I want. The
| recommended way to install a Rust toolchain for example
| is to run a shell script (rustup). I trust the Rust
| maintainers not to intentionally put an attack in the
| script, but I don't trust them not to have inadvertently
| included a vulnerability in the script that some third
| party might be able to exploit (particularly since it is
| quite difficult to write an attack-resistant shell
| script).
| hollerith wrote:
| OK, consider the browser market: are there any browsers
| that cost money? If so, I've not heard of it. From the
| beginning, Netscape Corporation, Microsoft, Opera and
| Apple gave away their browsers for free. That is because
| by the early 1990s it was well understood (at least by
| Silicon Valley execs) that what is important is grabbing
| mind share, and charging any amount of money would
| severely curtail the ability to do that.
|
| In the 1970s when Unix started being distributed outside
| of Bell Labs, tech company execs did not yet understand
| that. The owners of Unix adopted a superior strategy to
| ensure survival of Unix _by accident_ (namely, by being
| sued -- IIRC in the 1950s -- by the US Justice Department
| on anti-trust grounds).
| zahlman wrote:
| > Python environment is a clusterfuck. Which UV is rapidly
| bringing into something somewhat sane.
|
| Uv is able to do what it does mainly because of a) being a
| greenfield project b) in an environment of new standards that
| the community has been working on since the first days that
| people complained about said clusterfuck.
|
| But that's assuming you actually need to set up an environment.
| People really underestimate what can be done easily with just
| the standard library. And when they do grab the most popular
| dependencies, they end up exercising a tiny fraction of that
| code.
|
| > But I'd definitely be more interested in "replace yourself
| with a UV Python script" over "replace yourself with a shell
| script".
|
| There is no such thing as "a UV Python script". Uv doesn't
| create a new language. It doesn't even have a monopoly on what
| I _guess_ you 're referring to, i.e. the system it uses for
| specifying dependencies inline in a script. That comes from an
| ecosystem-wide standard, https://peps.python.org/pep-0723/.
| Pipx also implements creating environments for such code and
| running it, as do Hatch and PDM; and other tools offer
| appropriate support - e.g. editors may be able to syntax-
| highlight the declaration etc.
|
| Regardless, what you describe is not at all opposed to what the
| author has in mind here. The term "shell script" is often used
| quite loosely.
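| For reference, the PEP 723 header looks like the string below, and
| extracting it is mechanical (the regex here is a simplified sketch;
| the PEP specifies a stricter one):

```python
import re

SCRIPT = '''\
# /// script
# requires-python = ">=3.12"
# dependencies = ["rich", "httpx"]
# ///
print("hello")
'''

def read_inline_metadata(source: str):
    """Return the TOML between `# /// script` and `# ///`, with the comment
    prefix stripped, or None if the script has no inline metadata block."""
    m = re.search(r"^# /// script\n((?:^#.*\n)+?)^# ///$", source, re.M)
    if not m:
        return None
    return "".join(line[2:] if line.startswith("# ") else line[1:]
                   for line in m.group(1).splitlines(keepends=True))
```

| This is the block that uv, pipx, Hatch, and PDM all read to build an
| environment for a single-file script.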
| forrestthewoods wrote:
| Ok?
| recursivedoubts wrote:
| I would like to see MCP integrate the notion of hypermedia
| controls.
|
| Seems like that would be a potential way to get self-organizing
| integrations.
| vasusen wrote:
| I think the Playwright MCP is a really good example of the
| overall problem that the author brings up.
|
| However, I couldn't really understand if he's saying that the
| Playwright MCP is good to use for your own app, or whether he
| means that, for your own app, you should just tell the LLM
| directly to export Playwright code.
| shelajev wrote:
| it's the latter: "you can actually start telling it to write a
| Playwright Python script instead and run that".
|
| and while running the code might be faster, it's unclear whether
| that approach scales well. Sending an MCP tool command to click
| the button that says "X" is something a small local LLM can
| do. Writing complex code after parsing a significant amount of
| HTML (for the correct selectors, for example) probably needs a
| managed model.
| jumploops wrote:
| We're playing an endless cat and mouse game of capabilities
| between old and new right now.
|
| Claude Code shows that the models can excel at using "old"
| programmatic interfaces (CLIs) to do Real Work(tm).
|
| MCP is a way to dynamically provide "new" programmatic interfaces
| to the models.
|
| At some point this will start to converge, or at least appear to
| do so, as the majority of tools a model needs will be in its pre-
| training set.
|
| Then we'll argue about MPPP (model pre-training protocol
| pipeline), and how to reduce knowledge pollution of all the LLM-
| generated tools we're passing to the model.
|
| Eventually we'll publish the Merrium-Webster Model Tool
| Dictionary (MWMTD), surfacing all of the approved tools hidden in
| the pre-training set.
|
| Then the kids will come up with Model Context Slang (MCS), in an
| attempt to use the models to dynamically choose unapproved tools,
| for much fun and enjoyment.
|
| Ad infinitum.
| JyB wrote:
| > It demands too much context.
|
| This is solved trivially by having default initial prompts. All
| major tools like Claude Code or Gemini CLI have ways to set them
| up.
|
| > You pass all your tools to an LLM and ask it to filter it down
| based on the task at hand. So far, there hasn't been much better
| approaches proposed.
|
| Why is a "better" approach needed if modern LLMs can properly
| figure it out? It's not like LLMs don't keep getting better with
| larger and larger context lengths. I've never had a problem with
| an LLM struggling to use the appropriate MCP function on its own.
|
| > But you run into three problems: cost, speed, and general
| reliability
|
| - cost: They keep getting cheaper and cheaper. It's ridiculously
| inexpensive for what those tools provide.
|
| - speed: That seems extremely short-sighted. No one is sitting
| idle watching Claude Code in their terminal; that would defeat
| the purpose. You can have more than one instance working on
| unrelated topics, so no matter how long a task takes, the time
| spent is pure bonus. You don't have to stay in the loop when
| asking for well-defined tasks.
|
| - reliability: Seems very prompt-correlated at the moment. I
| guess the main issue is that some people don't know what to ask.
|
| Having LLMs able to complete tedious tasks involving so many
| external tools at once is simply amazing, thanks to MCP.
| Anecdotal but just today it did a task flawlessly involving:
| Notion pages, Linear Ticket, git, GitHub PR, GitHub CI logs.
| Being in the loop was just submitting one review on the PR. All
| the while I was busy doing something else. And for what, ~$1?
| the_mitsuhiko wrote:
| > This is solved trivially by having default initial prompts.
| All major tools like Claude Code or Gemini CLI have ways to
| set them up.
|
| That only makes it worse. The MCP tools available all add to
| the initial context. The more tools, the more of the context is
| populated by MCP tool definitions.
| JyB wrote:
| Do you mean that some tools (MCP clients) pass all functions
| of all configured MCP servers in the initial prompt?
|
| If that's the case: I understand the knee-jerk reaction but
| if it works? Also, what theoretically prevents altering the
| prompt-chaining logic in these tools to only expose a
| condensed list of MCP servers, not their whole capabilities,
| and only inject details based on LLM outputs? It doesn't seem
| like an insurmountable problem.
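| What JyB describes could look like a two-stage exposure: the
| initial prompt carries only one-line server summaries, and a
| server's full tool schemas are injected only once the model asks
| for that server. A minimal sketch (all server and tool names
| here are hypothetical, not any real MCP client's design):

```python
# Two-stage tool exposure sketch: keep the initial context small by
# advertising servers with one-line summaries, and expand a server's
# full tool schemas only on demand. Names are made up for illustration.
SERVERS = {
    "github": {
        "summary": "Issues, PRs, and CI logs on GitHub",
        "tools": {
            "create_pr": {
                "description": "Open a pull request",
                "params": {"title": "str", "branch": "str"},
            },
        },
    },
    "linear": {
        "summary": "Linear ticket management",
        "tools": {
            "create_ticket": {
                "description": "Create a ticket",
                "params": {"title": "str"},
            },
        },
    },
}

def condensed_index() -> str:
    """The only tool text in the initial prompt: one line per server."""
    return "\n".join(f"{name}: {s['summary']}" for name, s in SERVERS.items())

def expand(server: str) -> dict:
    """Injected into context only after the model selects a server."""
    return SERVERS[server]["tools"]
```

| The condensed index costs a handful of tokens no matter how many
| tools each server exposes; the full schemas are only paid for
| when they are actually needed.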
| the_mitsuhiko wrote:
| > Do you mean that some tools (MCP clients) pass all
| functions of all configured MCP servers in the initial
| prompt?
|
| Not just some, all. That's just how MCP works.
|
| > If that's the case: I understand the knee-jerk reaction
| but if it works?
|
| I would not be writing about this if it worked well. The
| data indicates that it works significantly worse than not
| using MCP, because of context rot and low tool utilization.
| JyB wrote:
| I guess I don't see the technical limitation. Seems like a
| protocol update issue.
| dingnuts wrote:
| > cost: They keep getting cheaper and cheaper
|
| no they don't[0], the cost is just still hidden from you but
| the freebies will end just like MoviePass and cheap Ubers
|
| https://bsky.app/profile/edzitron.com/post/3lsw4vatg3k2b
|
| "Cursor released a $200-a-month subscription then made their
| $20-a-month subscription worse (worse output, slower) - yet it
| seems even on Max they're rate limiting people!"
|
| https://bsky.app/profile/edzitron.com/post/3lsw3zwgw4c2h
| fkyoureadthedoc wrote:
| The cost will stay hidden from me because my job will pay it,
| just like the cost of my laptop, o365 license, and every
| other tool I use at work.
| nativeit wrote:
| Until they use your salary to pay for another dozen
| licenses.
| JyB wrote:
| Fair. I'm using Claude Code, which is pay-as-you-go. The
| market will probably do its thing. (The company pays anyway,
| obviously.)
| antirez wrote:
| I have the feeling it's not really MCP specifically vs. other
| ways; it is pretty simple: at the current state of AI, having
| a human in the loop is _much_ better. LLMs are great at
| certain tasks but they often get trapped in local minima. If
| you do the back and forth via the web interface of an LLM (ask
| it to write a program, look at it, provide hints to improve
| it, test it, and so on), you get much better results, and you
| don't end up with a 10k-line mess of code that could have been
| 400 lines of clear code. That's the current state of affairs,
| but of course many will try very hard to replace programmers,
| which is currently _not_ possible. What is possible is to
| accelerate the work of a programmer several times over (but
| they must be good both at programming and at LLM usage), or to
| take a smart person with relatively low skill in some
| technology and, thanks to an LLM, make them productive in that
| field without the long training otherwise needed. And many
| other things. But "agentic coding" right now does not work
| well. This will change, but right now the real gain is to use
| the LLM as a colleague.
|
| It is not MCP: it is autonomous agents that don't get feedback
| from smart humans.
| rapind wrote:
| So I run my own business (product), I code everything, and I
| use claude-code. I also wear all the other hats and so I'd be
| happy to let Claude handle all of the coding if / when it can.
| I can confirm we're certainly not there yet.
|
| It's definitely useful, but you have to read everything. I'm
| working in a type-safe functional compiled language too. I'd be
| scared to try this flow in a less "correctness enforced"
| language.
|
| That being said, I do find that it works well. It's not living
| up to the hype, but most of that hype was obvious nonsense. It
| continues to surprise me with its grasp of concepts and is
| definitely saving me some time, and more importantly making
| some larger tasks more approachable, since I can split my
| time better.
| galdre wrote:
| My absolute favorite use of MCP so far is Bruce Hauman's clojure-
| mcp. In short, it gives the LLM (a) a bash tool, (b) a persistent
| Clojure REPL, and (c) structural editing tools.
|
| The effect is that it's far more efficient at editing Clojure
| code than any purely string-diff-based approach, and if you write
| a good test suite it can rapidly iterate back and forth just
| editing files, reloading them, and then re-running the test suite
| at the REPL -- just like I would. It's pretty incredible to
| watch.
| chamomeal wrote:
| I was just going to comment about clojure-mcp!! It's far and
| away the coolest use of mcp I've seen so far.
|
| It can straight up debug your code, eval individual
| expressions, document return types of functions. It's amazing.
|
| It actually makes me think that languages with strong REPLs
| are a better fit for LLMs than those without. Seeing
| clojure-mcp do its thing is the most impressive AI feat I've
| seen since I saw GPT-3 in action for the first time.
| e12e wrote:
| https://github.com/bhauman/clojure-mcp
| manaskarekar wrote:
| Off topic: That font/layout/contrast on the page is very pleasing
| and inviting.
| khalic wrote:
| Honestly, I'm getting tired of these sweeping statements about
| what developers are supposed to be, how it's "the right way to
| use AI". We are in uncharted territories that are changing by the
| day. Maybe we have to drop the self-assurance and opinionated
| viewpoints and tackle this like a scientific problem.
| pizzathyme wrote:
| 100% agreed - he mentions 3 barriers to using MCP over code:
| "cost, speed, and general reliability". But all 3 of these
| could change by 10-100x within a few years, if not months. Just
| recently, OpenAI dropped the price of using o3 by 80%.
|
| This is not an environment where you can establish a durable
| manifesto
| luckystarr wrote:
| I always dreamed of a tool which would know the intent, semantic
| and constraints of all inputs and outputs of any piece of code
| and thus could combine these code pieces automatically. It was
| always a fuzzy idea in my head, but this piece now made it a bit
| more clear. While LLMs could generate those adapters between
| distinct pieces automatically, it's an expensive (latency, tokens)
| process. Having a system with which not only to type the
| variables, but also to type the types (intents, semantic meaning,
| etc.) would be helpful but likely not sufficient. There has been
| so much work on ontologies, semantic networks, logical inference,
| etc. but all of it is spread all over the place. I'd like to have
| something like this integrated into a programming language and
| see what it feels like.
| tristanz wrote:
| You can combine MCPs within composable LLM generated code if you
| put in a little work. At Continual (https://continual.ai), we
| have many workflows that require bulk actions, e.g. iterating
| over all issues, files, customers, etc. We inject MCP tools into
| a sandboxed code interpreter and have the agent generate both
| direct MCP tool calls and composable scripts that leverage MCP
| tools depending on the task complexity. After a bunch of work it
| actually works quite well. We are also experimenting with
| continual learning via a Voyager like approach where the LLM can
| save tool scripts for future use, allowing lifelong learning for
| repeated workflows.
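| The Voyager-like loop can be sketched with a tiny on-disk skill
| library: scripts the agent writes get saved with a description,
| and the name-to-description index is offered back in future
| prompts. (The file layout and function names here are
| assumptions for illustration, not Continual's actual design.)

```python
import json
import pathlib
import tempfile

# Voyager-style skill library sketch: persist agent-written scripts,
# expose a small index for prompts, and load full bodies on demand.
LIBRARY = pathlib.Path(tempfile.gettempdir()) / "tool_scripts"

def save_script(name: str, description: str, code: str) -> None:
    """Persist a script the agent wrote so later runs can reuse it."""
    LIBRARY.mkdir(exist_ok=True)
    path = LIBRARY / f"{name}.json"
    path.write_text(json.dumps({"description": description, "code": code}))

def load_index() -> dict[str, str]:
    """Name -> description map, small enough to inject into a prompt."""
    return {
        p.stem: json.loads(p.read_text())["description"]
        for p in LIBRARY.glob("*.json")
    }

def load_script(name: str) -> str:
    """Fetch the full script body only once the agent picks it."""
    return json.loads((LIBRARY / f"{name}.json").read_text())["code"]
```

| Only the index goes into context on every run; the full script
| bodies stay on disk until one is actually selected.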
| JyB wrote:
| That autocompounding aspect of constantly refining initial
| prompts with more and more knowledge is so interesting. Gut
| feeling says it's something that will be "standardized" in some
| way, exactly like what MCP did.
| tristanz wrote:
| Yes, I think you could get quite far with a few tools like
| memory/todo list + code interpreter + script save/load. You
| could probably get a lot farther though if you RLVRed this
| similar to how o3 uses web search so effectively during its
| thinking process.
| wrs wrote:
| MCP is literally the same as giving an LLM a set of man page
| summaries and a very limited shell over HTTP. It's just in a
| different syntax (JSON instead of man macros and CLI args).
|
| It would be better for MCP to deliver function definitions and
| let the LLM write little scripts in a simple language.
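| A rough sketch of that idea: hand the model plain signatures
| instead of JSON schemas, and run the little script it writes in
| a namespace that contains only the tools. (The tool functions
| below are stand-ins, and `exec` here stands in for a real
| sandbox; a production system needs proper isolation.)

```python
import inspect

# Stand-in tool functions; a real setup would wrap actual MCP calls.
def list_issues(repo: str) -> list[str]:
    return [f"{repo}#1: flaky test", f"{repo}#2: typo in docs"]

def close_issue(issue: str) -> str:
    return f"closed {issue}"

TOOLS = {f.__name__: f for f in (list_issues, close_issue)}

def signatures() -> str:
    """What the model sees: man-page-style one-liners, not JSON."""
    return "\n".join(f"{name}{inspect.signature(fn)}"
                     for name, fn in TOOLS.items())

def run_script(script: str) -> dict:
    """Execute a model-written script with only the tools in scope.
    Note: exec() is NOT a sandbox; this only illustrates the flow."""
    scope = dict(TOOLS)
    exec(script, scope)
    return scope
```

| Composition then comes for free: the model can write
| results = [close_issue(i) for i in list_issues("demo")]
| in a single response instead of round-tripping each call.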
| CharlieDigital wrote:
| > So maybe we need to look at ways to find a better abstraction
| for what MCP is great at, and code generation. For that we
| might need to build better sandboxes and maybe start looking at
| how we can expose APIs in ways that allow an agent to do some
| sort of fan out / fan in for inference. Effectively we want to do
| as much in generated code as we can, but then use the magic of
| LLMs after bulk code execution to judge what we did.
|
| Por que no los dos? (Why not both?) I ended up writing an OSS
| MCP server that securely executes LLM-generated JavaScript
| using a C# JS
| interpreter (Jint) and handing it a `fetch` analogue as well as
| `jsonpath-plus`. Also gave it a built-in secrets manager.
|
| Give it an objective and the LLM writes its own code and uses the
| tool iteratively to accomplish the task (as long as you can
| interact with it via a REST API).
|
| For well known APIs, it does a fine job generating REST API
| calls.
|
| _You can pretty much do anything with this._
|
| https://github.com/CharlieDigital/runjs
| pamelafox wrote:
| Regarding the Playwright example: I had the same experience this
| week attempting to build an agent first by using the Playwright
| MCP server, realizing it was slow, token-inefficient, and flaky,
| and rewriting with direct Playwright calls.
|
| MCP servers might be fun to get an idea for what's possible, and
| good for one-off mashups, but API calls are generally more
| efficient and stable, when you know what you want.
|
| Here's the agent I ended up writing:
| https://github.com/pamelafox/personal-linkedin-agent
|
| Demo: https://www.youtube.com/live/ue8D7Hi4nGs
| arkmm wrote:
| This is cool! Also have found the Playwright MCP implementation
| to be overkill and think of it more as a reference to an
| opinionated subset of the Playwright API.
|
| LinkedIn is notorious for making it hard to build automations
| on top of it. Did you run into any roadblocks when building
| your personal LinkedIn agent?
| zahlman wrote:
| ... Ah, reading these as well as more carefully reading TFA,
| I understand now that there is an MCP based on Playwright,
| and that Playwright itself is not considered an example of
| something that accidentally is an MCP despite having been
| released all the way back in January 2020.
|
| ... But now I still feel far away from understanding what MCP
| really _is_. As in:
|
| * What specifically do I have to implement in order to create
| one?
|
| * Now that the concept exists, what are the implications as
| the author of, say, a traditional REST API?
|
| * Now that the concept exists, what new problems exist to
| solve?
| pramodbiligiri wrote:
| Wouldn't the sweet spot for MCP be where the LLM is able to do
| most of the heavy lifting on its own (outputting some kind of
| structured or unstructured output), but needs a bit of
| external/dynamic data that it can't do without? The list of MCP
| servers/tools it can use should nail that external lookup in a
| (mostly) deterministic way.
|
| This would work best if a human is the end consumer of this
| output, or if it will receive manual vetting eventually. I'm
| not sure
| I'd leave such a system running unsupervised in production ("the
| Automation at Scale" part mentioned by the OP).
| ramoz wrote:
| You don't solve the problem of being able to rely on the agent
| to call the MCP.
|
| Hooks into the agent's execution lifecycle seem more reliable
| for deterministic behavior and supervision.
| pramodbiligiri wrote:
| I agree. In any large backend software running on a server,
| it's the LLM invocation which would be a call out to an
| external system, and with proper validation around the
| results. At which point, calling an "MCP Server" is also just
| your backend software invoking one more library/service based
| on inspecting some part of the response from the LLM.
|
| This doesn't take away from the utility of MCP when it comes
| to Claude Desktop and the like!
| briandw wrote:
| Anyone else switch their LLM subscription every month? I'm back
| on ChatGPT for O3 use, but expect that Grok4 will be next.
| jrm4 wrote:
| Yup, I can't help but think that a lot of the bad thinking comes
| from trying to avoid the following fact: LLMs are only good where
| your output does not need to be precise and/or verifiably
| "perfect," which is kind of the opposite of how code has worked,
| or has tried to work, in the past.
|
| Right now I got it for: DRAFTS of prose things -- and the only
| real killer in my opinion, autotagging thousands of old
| bookmarks. But again, that's just to have cool stuff to go back
| and peruse, not something that _must be correct_.
| never_inline wrote:
| The problem I see with MCP is very simple: it uses JSON as the
| format, and that's nowhere near as expressive as a programming
| language.
|
| Consider a Python function signature:
|
|     list_containers(show_stopped: bool = False,
|                     name_pattern: Optional[str] = None,
|                     sort: Literal["size", "name", "started_at"] = "name")
|
| It doesn't even need docs.
|
| Now convert this to a JSON schema, which is already a 4x larger
| input.
|
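| Spelled out, that signature becomes something like the following
| tool definition (hand-written here to illustrate the size
| difference; the exact field names vary by client, this follows
| the usual JSON Schema conventions):

```python
# The list_containers signature from above, hand-expanded into the
# kind of JSON tool definition MCP-style tool calling requires.
# Shown as a Python dict; it is several times the tokens of the
# one-line signature it encodes.
list_containers_tool = {
    "name": "list_containers",
    "inputSchema": {
        "type": "object",
        "properties": {
            "show_stopped": {"type": "boolean", "default": False},
            "name_pattern": {"type": ["string", "null"], "default": None},
            "sort": {
                "type": "string",
                "enum": ["size", "name", "started_at"],
                "default": "name",
            },
        },
        "required": [],
    },
}
```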
| And when generating output, the LLM will generate almost 2x more
| tokens too, because JSON. Easier to get confused.
|
| And consider that the flow of calling python functions and using
| their output to call other tools etc... is seen 1000x more times
| in their fine tuning data, whereas JSON tool calling flows are
| rare and practically only exist in instruction tuning phase. Then
| I am sure instruction tuning also contains even more complex code
| examples where model has to execute complex logic.
|
| Then there's the whole issue of composition. To my knowledge
| there's no way an LLM can do this in one response:
|
|     vehicle = call_func_1()
|     if vehicle.type == "car":
|         details = lookup_car(vehicle.reg_no)
|     elif vehicle.type == "motorcycle":
|         details = lookup_motorcycle(vehicle.reg_no)
|
| How is JSON tool calling going to solve this?
| chrisweekly wrote:
| Great point.
|
| But "the" problem with MCP? IMVHO (Very humble, non-expert) the
| half-baked or missing security aspects are more fundamental.
| I'd love to hear updates about that from ppl who know what
| they're talking about.
| 8note wrote:
| The reason to use the LLM is that you don't know ahead of time
| that the vehicle type is only a car or motorcycle; the LLM
| will also figure out a way to detail bicycles and boats and
| airplanes, and to consider both left and right shoes
| separately.
|
| The LLM can't just be given this function, because it's
| specialized to just the two options.
|
| You could have it do a feedback loop of rewriting the Python
| script after running it, but what's the savings at that point?
| You're wasting tokens talking about cars in Python when you
| already know it's a ski, and the LLM could ask directly for
| the ski details without writing a script to do it in between.
| prairieroadent wrote:
| Makes sense, and if realized, then Deno is in an excellent
| position to be one of the leading, if not the main, sandbox
| runtimes for agents.
| keybored wrote:
| tl;dr of one of today's AI posts: all you need is code generation
|
| It's 2025 and this is the epitome of progress.
|
| On the positive side code generation can be solid if you also
| have/can generate easy-to-read validation or tests for the
| generated code. I mean that _you_ can read, of course.
| LudwigNagasena wrote:
| I hit the same roadblock with MCP. If you work with data, the
| LLM becomes a very expensive pipe with an added risk of
| hallucinations. It's better to simply connect it to a Python
| environment enriched with integrations you need.
| SatvikBeri wrote:
| I use Julia at work, which benefits from long-running sessions,
| because it compiles functions the first time they run. So I wrote
| a very simple MCP that lets Claude Code send code to a persistent
| Julia kernel using Jupyter.
|
| It had a much bigger impact than I expected - not only does test
| code run much faster (and not time out), but Claude seems to be
| much more willing to just run functions from our codebase rather
| than do a bunch of bespoke bash stuff to try and make something
| work. It's anecdotal, but CCUsage says my token usage has dropped
| nearly 50% since I wrote the server.
|
| Of course, it didn't have to be MCP - I could have used some
| other method to get Claude to run code from my codebase more
| frequently. The broader point is that it's much easier to just
| add a useful function to my codebase than it is to write
| something bespoke for Claude.
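| The persistent-session effect is easy to show in miniature:
| state set up by one tool call survives into the next, instead
| of being rebuilt (or, in Julia's case, recompiled) every time.
| A toy stand-in using an in-process Python interpreter rather
| than a real Jupyter kernel:

```python
import code

# One long-lived interpreter stands in for the persistent Julia
# kernel: each "tool call" sends source to the same session, so
# earlier definitions and state remain available.
session = code.InteractiveInterpreter()

def run(src: str) -> None:
    """Simulate one tool call against the persistent session."""
    session.runsource(src)

run("counter = 0")    # first call sets up state
run("counter += 1")   # later calls still see that state
run("counter += 1")
```

| With Jupyter the same idea would route `run` through a kernel
| client instead, but the payoff is identical: no cold start per
| call.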
| macleginn wrote:
| "Claude seems to be much more willing to just run functions
| from our codebase rather than do a bunch of bespoke bash stuff
| to try and make something work" -- simply because it knows that
| there is a kernel it can send code to?
| alganet wrote:
| It's finally happening. The acceleration of the full AI
| disillusionment:
|
| - LLMs will do everything.
|
| - Shit, they won't. I'll do some traditional programming to put
| it on a leash.
|
| - More traditional programming.
|
| - Wait, this traditional programming thing is quite good.
|
| - I barely use LLMs now.
|
| - Man, that LLM stuff was a bad trip.
|
| See you all on the other side!
___________________________________________________________________
(page generated 2025-07-03 23:01 UTC)