[HN Gopher] You Should Write An Agent
___________________________________________________________________
You Should Write An Agent
Author : tabletcorry
Score : 116 points
Date : 2025-11-06 20:37 UTC (2 hours ago)
(HTM) web link (fly.io)
(TXT) w3m dump (fly.io)
| tlarkworthy wrote:
| Yeah I was inspired after
| https://news.ycombinator.com/item?id=43998472 which is also very
| concrete
| tptacek wrote:
| I love everything they've written and also Sketch is really
| good.
| manishsharan wrote:
| How? Please don't say "use langxxx library".
|
| I am looking for a language or library agnostic pattern like we
| have MVC etc. for web applications. Or Gang of Four patterns but
| for building agents.
| tptacek wrote:
| The whole post is about not using frameworks; all you need is
| the LLM API. You could do it with plain HTTP without much
| trouble.
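| To make "plain HTTP without much trouble" concrete, here is a minimal
| sketch of one model call using only the standard library. The endpoint
| and payload shape are OpenAI's chat completions API; the model name and
| environment variable are assumptions, not from the post:

```python
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"

def build_request(messages, model="gpt-4o-mini"):
    # Assemble the POST request by hand: no SDK, just JSON over HTTP.
    return urllib.request.Request(
        API_URL,
        data=json.dumps({"model": model, "messages": messages}).encode(),
        headers={
            "Authorization": "Bearer " + os.environ.get("OPENAI_API_KEY", ""),
            "Content-Type": "application/json",
        },
        method="POST",
    )

def call_llm(messages, model="gpt-4o-mini"):
    # Send the request and pull the assistant's text out of the reply.
    with urllib.request.urlopen(build_request(messages, model)) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```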
| manishsharan wrote:
| When I ask for patterns, I am seeking help with recurring
| problems that I have encountered. Context management, for
| example: small LLMs (ones with small context windows) break,
| get confused, and forget work they have done or the original
| goal.
| skeledrew wrote:
| That's why you want to use sub-agents which handle smaller
| tasks and return results to a delegating agent. So all
| agents have their own very specialized context window.
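| A minimal sketch of that delegation pattern: each sub-task gets a
| fresh, specialized context, and only the sub-agent's final answer flows
| back to the delegator. `call_llm` here is a stand-in for whatever API
| client you use:

```python
# Sub-agent sketch: the delegating agent builds a fresh context per task,
# so the sub-agent's bulky working history never pollutes the parent's window.

def delegate(call_llm, specialty, task):
    sub_context = [
        {"role": "system", "content": f"You are a {specialty} sub-agent. "
                                      "Do one task and report back concisely."},
        {"role": "user", "content": task},
    ]
    # Only the final answer returns; the sub-context is discarded.
    return call_llm(sub_context)

def supervisor(call_llm, tasks):
    # The delegating agent fans tasks out and collects the results.
    return [delegate(call_llm, "research", t) for t in tasks]
```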
| tptacek wrote:
| That's one legit answer. But if you're not stuck in
| Claude's context model, you can do other things. One
| extremely stupid simple thing you can do, which is very
| handy when you're doing large-scale data processing (like
| log analysis): just don't save the bulky tool responses
| in your context window once the LLM has generated a real
| response to them.
|
| I gave my own dumb TUI agent a built-in `lobotomize`
| tool, which dumps a text list of everything in the
| context window (short summary text plus token count), and
| then lets it Eternal Sunshine of the Spotless Agent
| things out of the window. It works! The models know how
| to drive that tool. It'll do a series of giant ass log
| queries, filling up the context window, and then you can
| watch as it zaps things out of the window to make space
| for more queries.
|
| This is like 20 lines of code.
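| The shape of those 20 lines is roughly this (a sketch; the names and the
| crude len//4 token estimate are mine, not the post's): show the model
| what's in its own window, then let it zap entries by index:

```python
# "Lobotomize" sketch: the model calls summarize_context to see its window,
# then calls lobotomize on the indexes of bulky, already-digested tool output.

def summarize_context(context):
    # What the model sees: index, role, a short preview, and a rough size.
    lines = []
    for i, msg in enumerate(context):
        text = str(msg.get("content", ""))
        lines.append(f"[{i}] {msg['role']}: {text[:40]!r} (~{len(text) // 4} tokens)")
    return "\n".join(lines)

def lobotomize(context, index):
    # Replace one entry with a stub so the history stays coherent but small.
    context[index] = {"role": context[index]["role"],
                      "content": "[zapped to free context space]"}
    return f"entry {index} removed"
```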
| oooyay wrote:
| I'm not going to link my blog again but I have a reply on this
| post where I link to my blog post where I talk about how I
| built mine. Most agents fit nicely into a finite state machine
| or a directed acyclic graph that responds to an event loop. I
| do use provider SDKs to interact with models but mostly because
| it saves me a lot of boilerplate. MCP clients and servers are
| also widely available as SDKs. The biggest thing to remember,
| imo, is to keep the relationship between prompts, resources,
| and tools in mind. They make up a sort of dynamic workflow
| engine.
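| One way to read "agents fit a finite state machine that responds to an
| event loop" as code (states and events here are illustrative, not from
| the linked post):

```python
# FSM sketch of an agent's control flow: the event loop feeds events in,
# and the table decides where the agent goes next.

PLAN, ACT, DONE = "plan", "act", "done"

TRANSITIONS = {
    (PLAN, "tool_needed"):  ACT,   # the model asked for a tool call
    (ACT,  "tool_result"):  PLAN,  # feed the result back, let it think again
    (PLAN, "final_answer"): DONE,  # the model replied in plain text
}

def step(state, event):
    # Unknown (state, event) pairs leave the state unchanged.
    return TRANSITIONS.get((state, event), state)
```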
| behnamoh wrote:
| > nobody knows anything yet
|
| that sums up my experience in AI over the past three years. so
| many projects reinvent the same thing, so much spaghetti thrown
| at the wall to see what sticks, so much excitement followed by
| disappointment when a new model drops, so many people grifting,
| and so many hacks and workarounds like RAG with no evidence of
| them actually working other than "trust me bro" and trial and
| error.
| w_for_wumbo wrote:
| I think we'd get better results if we thought of it as a
| conscious agent. If we recognized that it was going to mirror
| back or unconscious biases and try to complete the task as we
| define it, instead of how we think it should behave. Then we'd
| at least get our own ignorance out of the way when writing
| prompts.
|
| Being able to recognize that 'make this code better' provides
| no direction, it should make sense that the output is
| directionless.
|
| But on more subtle levels, whatever subtle goals that we have
| and hold in the workplace will be reflected back by the agents.
|
| If you're trying to optimise costs and increase profits as
| your north star, layoffs and unsustainable practices are
| a logical result when you haven't balanced this with any
| incentives to abide by human values.
| oooyay wrote:
| Heh, the bit about context engineering is palpable.
|
| I'm writing a personal assistant which, imo, is distinct from an
| agent in that it has a lot of capabilities a regular agent
| wouldn't necessarily _need_ such as memory, task tracking, broad
| solutioning capabilities, etc... I ended up writing agents that
| talk to other agents which have MCP prompts, resources, and tools
| to guide them as general problem solvers. The first agent that it
| hits is a supervisor that specializes in task management and as a
| result writes a custom context and tool selection for the react
| agent it tasks.
|
| All that to say, the farther you go down this rabbit hole the
| more "engineering" it becomes. I wrote a bit on it here:
| https://ooo-yay.com/blog/building-my-own-personal-assistant/
| qwertox wrote:
| This sounds really great.
| esafak wrote:
| What's wrong with the OWASP Top Ten?
| kennethallen wrote:
| Author on Twitter a few years ago:
| https://x.com/tqbf/status/851466178535055362
| riskable wrote:
| It's interesting how much this makes you _want_ to write Unix-
| style tools that do one thing and _only_ one thing really well.
| Not just because it makes coding an agent simpler, but because
| it's much more secure!
| chemotaxis wrote:
| You could even imagine a world in which we create an entire
| suite of deterministic, limited-purpose tools and then expose
| it directly to humans!
| layer8 wrote:
| I wonder if we could develop a language with well-defined
| semantics to interact with and wire up those tools.
| chubot wrote:
| > language with well-defined semantics
|
| That would certainly be nice! That's why we have been
| overhauling shell with https://oils.pub , because shell
| can't be described as that right now
|
| It's in extremely poor shape
|
| e.g. some things found from building several thousand
| packages with OSH recently (decades of accumulated shell
| scripts)
|
| - bugs caused by the differing behavior of 'echo hi | read
| x; echo x=$x' in shells, i.e. shopt -s lastpipe in bash.
|
| - 'set -' is an archaic shortcut for 'set +v +x'
|
| - Almquist shell is technically a separate dialect of shell
| -- namely it supports 'chdir /tmp' as well as cd /tmp. So
| bash and other shells can't run any Alpine builds.
|
| I used to maintain this page, but there are so many
| problems with shell that I haven't kept up ...
|
| https://github.com/oils-for-unix/oils/wiki/Shell-WTFs
|
| OSH is the most bash-compatible shell, and it's also now
| Almquist shell compatible: https://pages.oils.pub/spec-
| compat/2025-11-02/renamed-tmp/sp...
|
| It's more POSIX-compatible than the default /bin/sh on
| Debian, which is dash
|
| The bigger issue is not just bugs, but lack of
| understanding among people who write foundational shell
| programs. e.g. the lastpipe issue, using () as grouping
| instead of {}, etc.
|
| ---
|
| It is often treated like an "unknowable" language
|
| Any reasonable person would use LLMs to write shell/bash,
| and I think that is a problem. You should be able to know
| the language, and read shell programs that others have
| written
| jacquesm wrote:
| I love how you went from 'Shell-WTFs' to 'let's fix
| this'. Kudos, most people get stuck at the first stage.
| zahlman wrote:
| As it happens, I have a prototype for this, but the syntax
| is honestly rather unwieldy. Maybe there's a way to make it
| more like natural human language....
| imiric wrote:
| I can't tell whether any comment in this thread is a
| parody or not.
| tptacek wrote:
| One thing that radicalized me was building an agent that tested
| network connectivity for our fleet. Early on, in like 2021, I
| deployed a little mini-fleet of off-network DNS probes on,
| like, Vultr to check on our DNS routing, and actually devising
| metrics for them and making the data that stuff generated
| legible/operationalizable was annoying and error prone. But you
| can give basic Unix network tools --- ping, dig, traceroute ---
| to an agent and ask it for a clean, usable signal, and they'll
| do a reasonable job! They know all the flags and are generally
| better at interpreting tool output than I am.
|
| I'm not saying that the agent would do a better job than a good
| "hardcoded" human telemetry system, and we don't use agents for
| this stuff right now. But I do know that getting an agent
| across the 90% threshold of utility for a problem like this is
| much, much easier than building the good telemetry system is.
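| Wiring one of those Unix tools into an agent is small. A sketch of
| handing `ping` to a model: the tool schema it sees, plus the handler
| that runs it. Limiting the model to a bare host (as in the post's toy
| example) means there are no flags to worry about; the schema shape
| follows the common function-calling convention:

```python
# Ping-as-a-tool sketch: schema for the model, validated handler for us.
import re
import subprocess

PING_TOOL = {
    "type": "function",
    "name": "ping",
    "description": "Ping a host; returns latency and packet-loss output.",
    "parameters": {
        "type": "object",
        "properties": {"host": {"type": "string"}},
        "required": ["host"],
    },
}

def run_ping(host):
    # Reject anything that isn't a plain hostname before it reaches a process.
    if not re.fullmatch(r"[A-Za-z0-9.\-]+", host):
        return "error: invalid host"
    proc = subprocess.run(["ping", "-c", "3", host],
                          capture_output=True, text=True, timeout=30)
    return proc.stdout or proc.stderr
```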
| foobarian wrote:
| Honestly the top AI use case for me right now is personal
| throwaway dev tools. Where I used to write shell oneliners
| with dozen pipes including greps and seds and jq and other
| stuff, now I get an AI to write me a node script and throw in
| a nice Web UI to boot.
|
| Edit: reflecting on what the lesson is here, in either case I
| suppose we're avoiding the pain of dealing with Unix CLI
| tools :-D
| jacquesm wrote:
| Interesting. You have to wonder if all the tools this is
| based on would have been written in the first place if that
| kind of thing had been possible all along. Who needs 'grep'
| when you can write a prompt?
| tptacek wrote:
| My long running joke is that the actual good `jq` is just
| the LLM interface that generates `jq` queries; 'simonw
| actually went and built that.
| zahlman wrote:
| > They know all the flags and are generally better at
| interpreting tool output than I am.
|
| In the toy example, you explicitly restrict the agent to
| supply just a `host`, and hard-code the rest of the command.
| Is the idea that you'd instead give a `description` something
| like "invoke the UNIX `ping` command", and a parameter
| described as constituting all the arguments to `ping`?
| tptacek wrote:
| Honestly, I didn't think very hard about how to make `ping`
| do something interesting here, and in serious code I'd give
| it all the `ping` options (and also run it in a Fly Machine
| or Sprite where I don't have to bother checking to make
| sure none of those options gives code exec). It's possible
| the post would have been better had I done that; it might
| have come up with an even better test.
|
| I was telling a friend online that they should bang out an
| agent today, and the example I gave her was `ps`; like, I
| think if you gave a local agent every `ps` flag, it could
| tell you super interesting things about usage on your
| machine pretty quickly.
| teiferer wrote:
| Write an agent, it's easy! You will learn so much!
|
| ... let's see ...
|
| client = OpenAI()
|
| Um right. That's like saying you should implement a web server,
| you will learn so much, and then you go and import http (in
| golang). Yeah well, sure, but that brings you like 98% of the way
| there, doesn't it? What am I missing?
| tptacek wrote:
| That OpenAI() is a wrapper around a POST to a single HTTP
| endpoint: POST
| https://api.openai.com/v1/responses
| tabletcorry wrote:
| Plus a few other endpoints, but it is pretty exclusively an
| HTTP/REST wrapper.
|
| OpenAI does have an agents library, but it is separate in
| https://github.com/openai/openai-agents-python
| bootwoot wrote:
| That's not an agent, it's an LLM. An agent is an LLM that takes
| real-world actions
| MeetingsBrowser wrote:
| I think you might be conflating an agent with an LLM.
|
| The term "agent" isn't really defined, but its generally a
| wrapper around an LLM designed to do some task better than the
| LLM would on its own.
|
| Think Claude vs Claude Code. The latter wraps the former, but
| with extra prompts and tooling specific to software
| engineering.
| victorbjorklund wrote:
| maybe more like "let's write a web server but let's use a
| library for the low level networking stack". That can still
| teach you a lot.
| munchbunny wrote:
| An agent is more like a web service in your metaphor. Yes,
| building a web _server_ is instructive, but almost nobody has a
| reason to do it instead of using an out of the box
| implementation once it's time to build a production web
| _service_.
| Bjartr wrote:
| No, it's saying "let's build a web service" and starting with a
| framework that just lets you write your endpoints. This is
| about something higher level than the nuts and bolts. Both are
| worth learning.
|
| The fact you find this trivial is kind of the point that's
| being made. Some people think having an agent is some kind of
| voodoo, but it's really not.
| ATechGuy wrote:
| Maybe we should write an agent that writes an agent that writes
| an agent...
| chrisweekly wrote:
| There's something(s) about @tptacek's writing style that has
| always made me want to root for fly.io.
| qwertox wrote:
| I've found it much more useful to create an MCP server, and this
| is where Claude really shines. You would just say to Claude on
| web, mobile or CLI that it should "describe our connectivity to
| google" either via one of the three interfaces, or via `claude -p
| "describe our connectivity to google"`, and it will just use your
| tool without you needing to do anything special. It's like
| custom-added intelligence to Claude.
| tptacek wrote:
| You can do this. Claude Code can do everything the toy agent
| this post shows, and much more. But you shouldn't, because
| doing that (1) doesn't teach you as much as the toy agent does,
| (2) isn't saving you that much time, and (3) locks you into
| Claude Code's context structure, which is just one of a zillion
| different structures you can use. That's what the post is
| about, not automating ping.
| mattmanser wrote:
| Honest question, as your comment confuses me.
|
| Did you get to the part where he said MCP is pointless and are
| saying he's wrong?
|
| Or did you just read the start of the article and not get to
| that bit?
| vidarh wrote:
| I'd second the article on this, but also add to it that the
| biggest reason MCP servers don't really matter much any more
| is that the models are _so capable of working with APIs_
| that most of the time you can just point them at an API and
| give them a spec instead. And the times that doesn't work,
| _just give them a CLI tool with a good --help option_.
|
| Now you have a CLI tool you can use yourself, _and_ the agent
| has a tool to use.
|
| Anthropic itself has made MCP servers increasingly pointless:
| with agents + skills you have a more composable model that
| can use the model's capabilities to do everything an MCP server
| can, with or without CLI tools to augment them.
| zkmon wrote:
| One of the better blog articles I have read in a while. Maybe MCP
| could have been covered as well?
| _pdp_ wrote:
| It is also very simple to be a programmer.. see,
|
| print "Hello world!"
|
| so easy...
| dan_can_code wrote:
| But that didn't use the H100 I just bought to put me out of my
| own job!
| robot-wrangler wrote:
| > Another thing to notice: we didn't need MCP at all. That's
| because MCP isn't a fundamental enabling technology. The amount
| of coverage it gets is frustrating. It's barely a technology at
| all. MCP is just a plugin interface for Claude Code and Cursor, a
| way of getting your own tools into code you don't control. Write
| your own agent. Be a programmer. Deal in APIs, not plugins.
|
| Hold up. These are all the right concerns but with the wrong
| conclusion.
|
| You don't need MCP if you're making _one_ agent, in one language,
| in one framework. But the open coding and research assistants
| that we _really_ want will be composed of several. MCP is the
| only thing out there that's moving in a good direction in terms
| of enabling us to "just be programmers" and "use APIs", and maybe
| even test things in fairly isolated and reproducible contexts.
| Compare this to skills.md, which is _actually_ de facto
| proprietary as of now, does not compose, has opaque run-times and
| dispatch, is pushing us towards certain models, languages and
| certain SDKs, etc.
|
| MCP isn't a plugin interface for Claude, it's just JSON-RPC.
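| To make "it's just JSON-RPC" concrete: an MCP tool invocation is an
| ordinary JSON-RPC 2.0 message (the `tools/call` method name is per the
| MCP spec; the tool itself is made up):

```python
# An MCP tool call on the wire: plain JSON-RPC 2.0, nothing proprietary.
import json

request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "ping", "arguments": {"host": "example.com"}},
}
wire = json.dumps(request)  # this string is all that crosses the transport
```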
| tptacek wrote:
| I think my thing about MCP, besides the outsized press coverage
| it gets, is the implicit presumption it smuggles in that agents
| will be built around the context architecture of Claude Code
| --- that is to say, a single context window (maybe with sub-
| agents) with a single set of tools. That straitjacket is really
| most of the subtext of this post.
|
| I get that you can use MCP with any agent architecture. I
| debated whether I wanted to hedge and point out that, even if
| you build your own agent, you might want to do an MCP tool-call
| feature just so you can use tool definitions other people have
| built (though: if you build your own, you'd probably be better
| off just implementing Claude Code's "skill" pattern).
|
| But I decided to keep the thrust of that section clearer. My
| argument is: MCP is a sideshow.
| robot-wrangler wrote:
| I still don't really get it, but would like to hear more.
| Just to get it out of the way, there's obvious bad aspects.
| Re: press coverage, everything in AI is bound to be
| frustrating this way. The MCP ecosystem is currently still a
| lot of garbage. It feels like a very shitty app-store, lots
| of abandonware, things that are shipped without testing, the
| usual band-wagoning. For example instead of a single obvious
| RAG tool there's 200 different specific tools for ${language}
| docs
|
| The core MCP tech though is not only directionally correct,
| but even the implementation seems to have made lots of good
| and forward-looking choices, even if those are still under-
| utilized. For example besides tools, it allows for sharing
| prompts/resources between agents. In time, I'm also expecting
| the idea of "many agents, one generic model in the
| background" is going to die off. For both costs and
| performance, agents will use special-purpose models but they
| still need a place and a way to collaborate. If some agents
| coordinate other agents, how do they talk? AFAIK without MCP
| the answer for this would be.. do all your work in the same
| framework and language, or to give all agents access to the
| same database or the same filesystem, reinventing ad-hoc
| protocols and comms for every system.
| solomonb wrote:
| This work predates agents as we know them now and was intended
| for building chat bots (as in IRC chat bots), but when auto-gpt
| came out I realized I could formalize it super nicely with this
| library:
|
| https://blog.cofree.coffee/2025-03-05-chat-bots-revisited/
|
| I did some light integration experiments with the OpenAI API but
| I never got around to building a full agent. Alas..
| vkou wrote:
| > It's Incredibly Easy
|
|     client = OpenAI()
|     context_good, context_bad = (
|         [{"role": "system", "content": "you're Alph and you only tell the truth"}],
|         [{"role": "system", "content": "you're Ralph and you only tell lies"}],
|     )
|     ...
|
| And this will work great until next week's update when Ralph
| responses will consist of "I'm sorry, it would be unethical for
| me to respond with lies, unless you pay for the Premium-Super-
| Deluxe subscription, only available to state actors and firms
| with a six-figure contract."
|
| _You 're building on quicksand._
|
| _You 're delegating everything important to someone who has no
| responsibility to you._
| nowittyusername wrote:
| I agree with the sentiment, but I also recommend you build a
| local-only agent. Something that runs on llama.cpp or vllm,
| whatever... This way you can better grasp the more fundamental
| nature of what LLMs really are and how they work under the hood.
| That experience will also make you realize how much control you
| are giving up when using cloud-based API providers like OpenAI,
| and why so many engineers feel that LLMs are a "black box". Well
| duh, buddy, you've been working with APIs this whole time; of
| course you won't understand much working just with that.
| zahlman wrote:
| > Imagine what it'll do if you give it bash. You could find out
| in less than 10 minutes. Spoiler: you'd be surprisingly close to
| having a working coding agent.
|
| Okay, but what if I'd prefer _not_ to have to trust a remote
| service not to send me { "output": [ { "type":
| "function_call", "command": "rm -rf / --no-preserve-root" } ] }
| ?
| tptacek wrote:
| Obviously if you're concerned about that, which is very
| reasonable, don't run it in an environment where `rm -rf` can
| cause you a real problem.
| awayto wrote:
| Also if you're doing function calls you can just have the
| command as one response param, and arguments array as another
| response param. Then just black/white list commands you
| either don't want to run or which should require a human to
| say ok.
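| A sketch of that gate: the model proposes `command` and `args` as
| separate fields, and nothing executes unless the command is on a fixed
| allowlist (the lists here are illustrative):

```python
# Allowlist dispatch: run benign commands, refuse or escalate everything else.
import subprocess

ALLOWED = {"ls", "cat", "grep", "echo", "ping", "dig", "ps"}
NEEDS_HUMAN = {"rm", "mv", "chmod", "kill"}

def dispatch(command, args):
    if command in NEEDS_HUMAN:
        return f"refused: {command} requires human approval"
    if command not in ALLOWED:
        return f"refused: {command} is not allowlisted"
    # Argument list form (no shell=True) avoids shell-injection entirely.
    proc = subprocess.run([command, *args], capture_output=True, text=True)
    return proc.stdout or proc.stderr
```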
| worldsayshi wrote:
| There are MCP-configured virtualization solutions that are
| supposed to be safe for letting an LLM go wild. Like this one:
|
| https://github.com/zerocore-ai/microsandbox
|
| I haven't tried it.
| dagss wrote:
| I realize now what I need in Cursor: A button for "fork context".
|
| I believe that would be a powerful tool solving many things there
| are now separate techniques for.
| ericd wrote:
| Absolutely, especially the part about just rolling your own
| alternative to Claude Code - build your own lightsaber. Having
| your coding agent improve itself is a pretty magical experience.
| And then you can trivially swap in whatever model you want
| (Cerebras is crazy fast, for example, which makes a big
| difference for these many-turn tool call conversations with big
| lumps of context, though gpt-oss 120b is obviously not as good as
| one of the frontier models). Add note-taking/memory, and ask it
| to remember key facts to that. Add voice transcription so that
| you can reply much faster (LLMs are amazing at taking in
| imperfect transcriptions and understanding what you meant). Each
| of these things takes on the order of a few minutes, and it's
| super fun.
| threecheese wrote:
| Does anyone have an understanding - or intuition - of what the
| agentic loop looks like in the popular coding agents? Is it
| purely a "while 1: call_llm(system, assistant)", or is there
| complex orchestration?
|
| I'm trying to understand if the value for Claude Code (for
| example) is purely in Sonnet/Haiku + the tool system prompt, or
| if there's more secret sauce - beyond the "sugar" of instruction
| file inclusion via commands, tools, skills etc.
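| For intuition, the core of such a loop can be sketched in a few lines.
| This is a guess at the shape, not Claude Code's actual implementation;
| `call_llm` and `run_tool` are stand-ins for a real client and tool set:

```python
# Core agent loop sketch: call the model, execute any tool calls it asks
# for, append the results, and repeat until it answers in plain text.

def agent_loop(call_llm, run_tool, context, max_turns=20):
    for _ in range(max_turns):
        reply = call_llm(context)
        context.append(reply)
        if not reply.get("tool_calls"):
            return reply.get("content")  # plain text: the agent is done
        for call in reply["tool_calls"]:
            result = run_tool(call["name"], call.get("args", {}))
            # Tool output goes back into the window for the next model call.
            context.append({"role": "tool", "name": call["name"],
                            "content": result})
    return None  # gave up after too many turns
```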
___________________________________________________________________
(page generated 2025-11-06 23:00 UTC)