[HN Gopher] You Should Write An Agent
___________________________________________________________________
You Should Write An Agent
Author : tabletcorry
Score : 116 points
Date : 2025-11-06 20:37 UTC (2 hours ago)
(HTM) web link (fly.io)
(TXT) w3m dump (fly.io)
| tlarkworthy wrote:
| Yeah I was inspired after
| https://news.ycombinator.com/item?id=43998472 which is also very
| concrete
| tptacek wrote:
| I love everything they've written and also Sketch is really
| good.
| manishsharan wrote:
| How? Please don't say "use langxxx library".
|
| I am looking for a language or library agnostic pattern like we
| have MVC etc. for web applications. Or Gang of Four patterns but
| for building agents.
| tptacek wrote:
| The whole post is about not using frameworks; all you need is
| the LLM API. You could do it with plain HTTP without much
| trouble.
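| To make "plain HTTP without much trouble" concrete, here is a minimal
| sketch of one model call using only the standard library. The endpoint
| and payload shape are OpenAI's chat completions API; the model name and
| environment variable are assumptions, not from the post:

```python
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"

def build_request(messages, model="gpt-4o-mini"):
    # Assemble the POST request by hand: no SDK, just JSON over HTTP.
    return urllib.request.Request(
        API_URL,
        data=json.dumps({"model": model, "messages": messages}).encode(),
        headers={
            "Authorization": "Bearer " + os.environ.get("OPENAI_API_KEY", ""),
            "Content-Type": "application/json",
        },
        method="POST",
    )

def call_llm(messages, model="gpt-4o-mini"):
    # Send the request and pull the assistant's text out of the reply.
    with urllib.request.urlopen(build_request(messages, model)) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```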
| manishsharan wrote:
| When I ask for patterns, I am seeking help with recurring
| problems that I have encountered. Context management, for
| example: small LLMs (ones with small context windows) break,
| get confused, and forget work they have done or the original
| goal.
| skeledrew wrote:
| That's why you want to use sub-agents which handle smaller
| tasks and return results to a delegating agent. So all
| agents have their own very specialized context window.
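| A minimal sketch of that delegation pattern: each sub-task gets a
| fresh, specialized context, and only the sub-agent's final answer flows
| back to the delegator. `call_llm` here is a stand-in for whatever API
| client you use:

```python
# Sub-agent sketch: the delegating agent builds a fresh context per task,
# so the sub-agent's bulky working history never pollutes the parent's window.

def delegate(call_llm, specialty, task):
    sub_context = [
        {"role": "system", "content": f"You are a {specialty} sub-agent. "
                                      "Do one task and report back concisely."},
        {"role": "user", "content": task},
    ]
    # Only the final answer returns; the sub-context is discarded.
    return call_llm(sub_context)

def supervisor(call_llm, tasks):
    # The delegating agent fans tasks out and collects the results.
    return [delegate(call_llm, "research", t) for t in tasks]
```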
| tptacek wrote:
| That's one legit answer. But if you're not stuck in
| Claude's context model, you can do other things. One
| extremely stupid simple thing you can do, which is very
| handy when you're doing large-scale data processing (like
| log analysis): just don't save the bulky tool responses
| in your context window once the LLM has generated a real
| response to them.
|
| I gave my own dumb TUI agent a built-in `lobotomize`
| tool, which dumps a text list of everything in the
| context window (short summary text plus token count), and
| then lets it Eternal Sunshine of the Spotless Agent
| things out of the window. It works! The models know how
| to drive that tool. It'll do a series of giant ass log
| queries, filling up the context window, and then you can
| watch as it zaps things out of the window to make space
| for more queries.
|
| This is like 20 lines of code.
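| The shape of those 20 lines is roughly this (a sketch; the names and the
| crude len//4 token estimate are mine, not the post's): show the model
| what's in its own window, then let it zap entries by index:

```python
# "Lobotomize" sketch: the model calls summarize_context to see its window,
# then calls lobotomize on the indexes of bulky, already-digested tool output.

def summarize_context(context):
    # What the model sees: index, role, a short preview, and a rough size.
    lines = []
    for i, msg in enumerate(context):
        text = str(msg.get("content", ""))
        lines.append(f"[{i}] {msg['role']}: {text[:40]!r} (~{len(text) // 4} tokens)")
    return "\n".join(lines)

def lobotomize(context, index):
    # Replace one entry with a stub so the history stays coherent but small.
    context[index] = {"role": context[index]["role"],
                      "content": "[zapped to free context space]"}
    return f"entry {index} removed"
```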
| oooyay wrote:
| I'm not going to link my blog again but I have a reply on this
| post where I link to my blog post where I talk about how I
| built mine. Most agents fit nicely into a finite state machine
| or a directed acyclic graph that responds to an event loop. I
| do use provider SDKs to interact with models but mostly because
| it saves me a lot of boilerplate. MCP clients and servers are
| also widely available as SDKs. The biggest thing to remember,
| imo, is to keep the relationship between prompts, resources,
| and tools in mind. They make up a sort of dynamic workflow
| engine.
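| One way to read "agents fit a finite state machine that responds to an
| event loop" as code (states and events here are illustrative, not from
| the linked post):

```python
# FSM sketch of an agent's control flow: the event loop feeds events in,
# and the table decides where the agent goes next.

PLAN, ACT, DONE = "plan", "act", "done"

TRANSITIONS = {
    (PLAN, "tool_needed"):  ACT,   # the model asked for a tool call
    (ACT,  "tool_result"):  PLAN,  # feed the result back, let it think again
    (PLAN, "final_answer"): DONE,  # the model replied in plain text
}

def step(state, event):
    # Unknown (state, event) pairs leave the state unchanged.
    return TRANSITIONS.get((state, event), state)
```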
| behnamoh wrote:
| > nobody knows anything yet
|
| that sums up my experience in AI over the past three years. so
| many projects reinvent the same thing, so much spaghetti thrown
| at the wall to see what sticks, so much excitement followed by
| disappointment when a new model drops, so many people grifting,
| and so many hacks and workarounds like RAG with no evidence of
| them actually working other than "trust me bro" and trial and
| error.
| w_for_wumbo wrote:
| I think we'd get better results if we thought of it as a
| conscious agent. If we recognized that it was going to mirror
| back or unconscious biases and try to complete the task as we
| define it, instead of how we think it should behave. Then we'd
| at least get our own ignorance out of the way when writing
| prompts.
|
| Being able to recognize that 'make this code better' provides
| no direction, it should make sense that the output is
| directionless.
|
| But on more subtle levels, whatever subtle goals that we have
| and hold in the workplace will be reflected back by the agents.
|
| If you're trying to optimise costs and increase profits as
| your north star, layoffs and unsustainable practices are
| a logical result when you haven't balanced this with any
| incentives to abide by human values.
| oooyay wrote:
| Heh, the bit about context engineering is palpable.
|
| I'm writing a personal assistant which, imo, is distinct from an
| agent in that it has a lot of capabilities a regular agent
| wouldn't necessarily _need_ such as memory, task tracking, broad
| solutioning capabilities, etc... I ended up writing agents that
| talk to other agents which have MCP prompts, resources, and tools
| to guide them as general problem solvers. The first agent that it
| hits is a supervisor that specializes in task management and as a
| result writes a custom context and tool selection for the react
| agent it tasks.
|
| All that to say, the farther you go down this rabbit hole the
| more "engineering" it becomes. I wrote a bit on it here:
| https://ooo-yay.com/blog/building-my-own-personal-assistant/
| qwertox wrote:
| This sounds really great.
| esafak wrote:
| What's wrong with the OWASP Top Ten?
| kennethallen wrote:
| Author on Twitter a few years ago:
| https://x.com/tqbf/status/851466178535055362
| riskable wrote:
| It's interesting how much this makes you _want_ to write Unix-
| style tools that do one thing and _only_ one thing really well.
| Not just because it makes coding an agent simpler, but because
| it's much more secure!
| chemotaxis wrote:
| You could even imagine a world in which we create an entire
| suite of deterministic, limited-purpose tools and then expose
| it directly to humans!
| layer8 wrote:
| I wonder if we could develop a language with well-defined
| semantics to interact with and wire up those tools.
| chubot wrote:
| > language with well-defined semantics
|
| That would certainly be nice! That's why we have been
| overhauling shell with https://oils.pub , because shell
| can't be described as that right now
|
| It's in extremely poor shape
|
| e.g. some things found from building several thousand
| packages with OSH recently (decades of accumulated shell
| scripts)
|
| - bugs caused by the differing behavior of 'echo hi | read
| x; echo x=$x' in shells, i.e. shopt -s lastpipe in bash.
|
| - 'set -' is an archaic shortcut for 'set +v +x'
|
| - Almquist shell is technically a separate dialect of shell
| -- namely it supports 'chdir /tmp' as well as cd /tmp. So
| bash and other shells can't run any Alpine builds.
|
| I used to maintain this page, but there are so many
| problems with shell that I haven't kept up ...
|
| https://github.com/oils-for-unix/oils/wiki/Shell-WTFs
|
| OSH is the most bash-compatible shell, and it's also now
| Almquist shell compatible: https://pages.oils.pub/spec-
| compat/2025-11-02/renamed-tmp/sp...
|
| It's more POSIX-compatible than the default /bin/sh on
| Debian, which is dash
|
| The bigger issue is not just bugs, but lack of
| understanding among people who write foundational shell
| programs. e.g. the lastpipe issue, using () as grouping
| instead of {}, etc.
|
| ---
|
| It is often treated like an "unknowable" language
|
| Any reasonable person would use LLMs to write shell/bash,
| and I think that is a problem. You should be able to know
| the language, and read shell programs that others have
| written
| jacquesm wrote:
| I love how you went from 'Shell-WTFs' to 'let's fix
| this'. Kudos, most people get stuck at the first stage.
| zahlman wrote:
| As it happens, I have a prototype for this, but the syntax
| is honestly rather unwieldy. Maybe there's a way to make it
| more like natural human language....
| imiric wrote:
| I can't tell whether any comment in this thread is a
| parody or not.
| tptacek wrote:
| One thing that radicalized me was building an agent that tested
| network connectivity for our fleet. Early on, in like 2021, I
| deployed a little mini-fleet of off-network DNS probes on,
| like, Vultr to check on our DNS routing, and actually devising
| metrics for them and making the data that stuff generated
| legible/operationalizable was annoying and error prone. But you
| can give basic Unix network tools --- ping, dig, traceroute ---
| to an agent and ask it for a clean, usable signal, and they'll
| do a reasonable job! They know all the flags and are generally
| better at interpreting tool output than I am.
|
| I'm not saying that the agent would do a better job than a good
| "hardcoded" human telemetry system, and we don't use agents for
| this stuff right now. But I do know that getting an agent
| across the 90% threshold of utility for a problem like this is
| much, much easier than building the good telemetry system is.
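| Wiring one of those Unix tools into an agent is small. A sketch of
| handing `ping` to a model: the tool schema it sees, plus the handler
| that runs it. Limiting the model to a bare host (as in the post's toy
| example) means there are no flags to worry about; the schema shape
| follows the common function-calling convention:

```python
# Ping-as-a-tool sketch: schema for the model, validated handler for us.
import re
import subprocess

PING_TOOL = {
    "type": "function",
    "name": "ping",
    "description": "Ping a host; returns latency and packet-loss output.",
    "parameters": {
        "type": "object",
        "properties": {"host": {"type": "string"}},
        "required": ["host"],
    },
}

def run_ping(host):
    # Reject anything that isn't a plain hostname before it reaches a process.
    if not re.fullmatch(r"[A-Za-z0-9.\-]+", host):
        return "error: invalid host"
    proc = subprocess.run(["ping", "-c", "3", host],
                          capture_output=True, text=True, timeout=30)
    return proc.stdout or proc.stderr
```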
| foobarian wrote:
| Honestly the top AI use case for me right now is personal
| throwaway dev tools. Where I used to write shell oneliners
| with dozen pipes including greps and seds and jq and other
| stuff, now I get an AI to write me a node script and throw in
| a nice Web UI to boot.
|
| Edit: reflecting on what the lesson is here, in either case I
| suppose we're avoiding the pain of dealing with Unix CLI
| tools :-D
| jacquesm wrote:
| Interesting. You have to wonder if all the tools this is
| based on would have been written in the first place if that
| kind of thing had been possible all along. Who needs 'grep'
| when you can write a prompt?
| tptacek wrote:
| My long running joke is that the actual good `jq` is just
| the LLM interface that generates `jq` queries; 'simonw
| actually went and built that.
| zahlman wrote:
| > They know all the flags and are generally better at
| interpreting tool output than I am.
|
| In the toy example, you explicitly restrict the agent to
| supply just a `host`, and hard-code the rest of the command.
| Is the idea that you'd instead give a `description` something
| like "invoke the UNIX `ping` command", and a parameter
| described as constituting all the arguments to `ping`?
| tptacek wrote:
| Honestly, I didn't think very hard about how to make `ping`
| do something interesting here, and in serious code I'd give
| it all the `ping` options (and also run it in a Fly Machine
| or Sprite where I don't have to bother checking to make
| sure none of those options gives code exec). It's possible
| the post would have been better had I done that; it might
| have come up with an even better test.
|
| I was telling a friend online that they should bang out an
| agent today, and the example I gave her was `ps`; like, I
| think if you gave a local agent every `ps` flag, it could
| tell you super interesting things about usage on your
| machine pretty quickly.
| teiferer wrote:
| Write an agent, it's easy! You will learn so much!
|
| ... let's see ...
|
| client = OpenAI()
|
| Um right. That's like saying you should implement a web server,
| you will learn so much, and then you go and import http (in
| golang). Yeah well, sure, but that brings you like 98% of the way
| there, doesn't it? What am I missing?
| tptacek wrote:
| That OpenAI() is a wrapper around a POST to a single HTTP
| endpoint: POST
| https://api.openai.com/v1/responses
| tabletcorry wrote:
| Plus a few other endpoints, but it is pretty exclusively an
| HTTP/REST wrapper.
|
| OpenAI does have an agents library, but it is separate in
| https://github.com/openai/openai-agents-python
| bootwoot wrote:
| That's not an agent, it's an LLM. An agent is an LLM that takes
| real-world actions
| MeetingsBrowser wrote:
| I think you might be conflating an agent with an LLM.
|
| The term "agent" isn't really defined, but its generally a
| wrapper around an LLM designed to do some task better than the
| LLM would on its own.
|
| Think Claude vs Claude Code. The latter wraps the former, but
| with extra prompts and tooling specific to software
| engineering.
| victorbjorklund wrote:
| maybe more like "let's write a web server but let's use a
| library for the low level networking stack". That can still
| teach you a lot.
| munchbunny wrote:
| An agent is more like a web service in your metaphor. Yes,
| building a web _server_ is instructive, but almost nobody has a
| reason to do it instead of using an out of the box
| implementation once it's time to build a production web
| _service_.
| Bjartr wrote:
| No, it's saying "let's build a web service" and starting with a
| framework that just lets you write your endpoints. This is
| about something higher level than the nuts and bolts. Both are
| worth learning.
|
| The fact you find this trivial is kind of the point that's
| being made. Some people think having an agent is some kind of
| voodoo, but it's really not.
| ATechGuy wrote:
| Maybe we should write an agent that writes an agent that writes
| an agent...
| chrisweekly wrote:
| There's something(s) about @tptacek's writing style that has
| always made me want to root for fly.io.
| qwertox wrote:
| I've found it much more useful to create an MCP server, and this
| is where Claude really shines. You would just say to Claude on
| web, mobile or CLI that it should "describe our connectivity to
| google" either via one of the three interfaces, or via `claude -p
| "describe our connectivity to google"`, and it will just use your
| tool without you needing to do anything special. It's like
| custom-added intelligence to Claude.
| tptacek wrote:
| You can do this. Claude Code can do everything the toy agent
| this post shows, and much more. But you shouldn't, because
| doing that (1) doesn't teach you as much as the toy agent does,
| (2) isn't saving you that much time, and (3) locks you into
| Claude Code's context structure, which is just one of a zillion
| different structures you can use. That's what the post is
| about, not automating ping.
| mattmanser wrote:
| Honest question, as your comment confuses me.
|
| Did you get to the part where he said MCP is pointless and are
| saying he's wrong?
|
| Or did you just read the start of the article and not get to
| that bit?
| vidarh wrote:
| I'd second the article on this, but also add to it that the
| biggest reason MCP servers don't really matter much any more
| is that the models are _so capable of working with APIs_
| that most of the time you can just point them at an API and
| give them a spec instead. And the times that doesn't work,
| _just give them a CLI tool with a good --help option_.
|
| Now you have a CLI tool you can use yourself, _and_ the agent
| has a tool to use.
|
| Anthropic itself has made MCP servers increasingly pointless:
| with agents + skills you have a more composable model that
| can use the model's capabilities to do everything an MCP server
| can, with or without CLI tools to augment them.
| zkmon wrote:
| One of the better blog articles I have read in a while. Maybe MCP
| could have been covered as well?
| _pdp_ wrote:
| It is also very simple to be a programmer.. see,
|
| print "Hello world!"
|
| so easy...
| dan_can_code wrote:
| But that didn't use the H100 I just bought to put me out of my
| own job!
| robot-wrangler wrote:
| > Another thing to notice: we didn't need MCP at all. That's
| because MCP isn't a fundamental enabling technology. The amount
| of coverage it gets is frustrating. It's barely a technology at
| all. MCP is just a plugin interface for Claude Code and Cursor, a
| way of getting your own tools into code you don't control. Write
| your own agent. Be a programmer. Deal in APIs, not plugins.
|
| Hold up. These are all the right concerns but with the wrong
| conclusion.
|
| You don't need MCP if you're making _one_ agent, in one language,
| in one framework. But the open coding and research assistants
| that we _really_ want will be composed of several. MCP is the
| only thing out there that's moving in a good direction in terms
| of enabling us to "just be programmers" and "use APIs", and maybe
| even test things in fairly isolated and reproducible contexts.
| Compare this to skills.md, which is _actually_ de facto
| proprietary as of now, does not compose, has opaque run-times and
| dispatch, is pushing us towards certain models, languages and
| certain SDKs, etc.
|
| MCP isn't a plugin interface for Claude, it's just JSON-RPC.
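| To make "it's just JSON-RPC" concrete: an MCP tool invocation is an
| ordinary JSON-RPC 2.0 message (the `tools/call` method name is per the
| MCP spec; the tool itself is made up):

```python
# An MCP tool call on the wire: plain JSON-RPC 2.0, nothing proprietary.
import json

request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "ping", "arguments": {"host": "example.com"}},
}
wire = json.dumps(request)  # this string is all that crosses the transport
```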
| tptacek wrote:
| I think my thing about MCP, besides the outsized press coverage
| it gets, is the implicit presumption it smuggles in that agents
| will be built around the context architecture of Claude Code
| --- that is to say, a single context window (maybe with sub-
| agents) with a single set of tools. That straitjacket is really
| most of the subtext of this post.
|
| I get that you can use MCP with any agent architecture. I
| debated whether I wanted to hedge and point out that, even if
| you build your own agent, you might want to do an MCP tool-call
| feature just so you can use tool definitions other people have
| built (though: if you build your own, you'd probably be better
| off just implementing Claude Code's "skill" pattern).
|
| But I decided to keep the thrust of that section clearer. My
| argument is: MCP is a sideshow.
| robot-wrangler wrote:
| I still don't really get it, but would like to hear more.
| Just to get it out of the way, there's obvious bad aspects.
| Re: press coverage, everything in AI is bound to be
| frustrating this way. The MCP ecosystem is currently still a
| lot of garbage. It feels like a very shitty app-store, lots
| of abandonware, things that are shipped without testing, the
| usual band-wagoning. For example instead of a single obvious
| RAG tool there's 200 different specific tools for ${language}
| docs
|
| The core MCP tech though is not only directionally correct,
| but even the implementation seems to have made lots of good
| and forward-looking choices, even if those are still under-
| utilized. For example besides tools, it allows for sharing
| prompts/resources between agents. In time, I'm also expecting
| the idea of "many agents, one generic model in the
| background" is going to die off. For both costs and
| performance, agents will use special-purpose models but they
| still need a place and a way to collaborate. If some agents
| coordinate other agents, how do they talk? AFAIK without MCP
| the answer for this would be.. do all your work in the same
| framework and language, or to give all agents access to the
| same database or the same filesystem, reinventing ad-hoc
| protocols and comms for every system.
| solomonb wrote:
| This work predates agents as we know them now and was intended
| for building chat bots (as in IRC chat bots), but when auto-gpt
| came out I realized I could formalize it super nicely with this
| library:
|
| https://blog.cofree.coffee/2025-03-05-chat-bots-revisited/
|
| I did some light integration experiments with the OpenAI API but
| I never got around to building a full agent. Alas..
| vkou wrote:
| > It's Incredibly Easy
|
|     client = OpenAI()
|     context_good, context_bad = (
|         [{"role": "system", "content": "you're Alph and you only tell the truth"}],
|         [{"role": "system", "content": "you're Ralph and you only tell lies"}],
|     )
|     ...
|
| And this will work great until next week's update when Ralph
| responses will consist of "I'm sorry, it would be unethical for
| me to respond with lies, unless you pay for the Premium-Super-
| Deluxe subscription, only available to state actors and firms
| with a six-figure contract."
|
| _You 're building on quicksand._
|
| _You 're delegating everything important to someone who has no
| responsibility to you._
| nowittyusername wrote:
| I agree with the sentiment, but I also recommend you build a
| local-only agent. Something that runs on llama.cpp or vllm,
| whatever... This way you can better grasp the more fundamental
| nature of what LLMs really are and how they work under the hood.
| That experience will also make you realize how much control you
| are giving up when using cloud-based API providers like OpenAI,
| and why so many engineers feel that LLMs are a "black box". Well
| duh, buddy, you've been working with APIs this whole time; of
| course you won't understand much working just with that.
| zahlman wrote:
| > Imagine what it'll do if you give it bash. You could find out
| in less than 10 minutes. Spoiler: you'd be surprisingly close to
| having a working coding agent.
|
| Okay, but what if I'd prefer _not_ to have to trust a remote
| service not to send me { "output": [ { "type":
| "function_call", "command": "rm -rf / --no-preserve-root" } ] }
| ?
| tptacek wrote:
| Obviously if you're concerned about that, which is very
| reasonable, don't run it in an environment where `rm -rf` can
| cause you a real problem.
| awayto wrote:
| Also if you're doing function calls you can just have the
| command as one response param, and arguments array as another
| response param. Then just black/white list commands you
| either don't want to run or which should require a human to
| say ok.
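| A sketch of that gate: the model proposes `command` and `args` as
| separate fields, and nothing executes unless the command is on a fixed
| allowlist (the lists here are illustrative):

```python
# Allowlist dispatch: run benign commands, refuse or escalate everything else.
import subprocess

ALLOWED = {"ls", "cat", "grep", "echo", "ping", "dig", "ps"}
NEEDS_HUMAN = {"rm", "mv", "chmod", "kill"}

def dispatch(command, args):
    if command in NEEDS_HUMAN:
        return f"refused: {command} requires human approval"
    if command not in ALLOWED:
        return f"refused: {command} is not allowlisted"
    # Argument list form (no shell=True) avoids shell-injection entirely.
    proc = subprocess.run([command, *args], capture_output=True, text=True)
    return proc.stdout or proc.stderr
```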
| worldsayshi wrote:
| There are MCP-configured virtualization solutions that are
| supposed to be safe for letting an LLM go wild. Like this one:
|
| https://github.com/zerocore-ai/microsandbox
|
| I haven't tried it.
| dagss wrote:
| I realize now what I need in Cursor: A button for "fork context".
|
| I believe that would be a powerful tool solving many things there
| are now separate techniques for.
| ericd wrote:
| Absolutely, especially the part about just rolling your own
| alternative to Claude Code - build your own lightsaber. Having
| your coding agent improve itself is a pretty magical experience.
| And then you can trivially swap in whatever model you want
| (Cerebras is crazy fast, for example, which makes a big
| difference for these many-turn tool call conversations with big
| lumps of context, though gpt-oss 120b is obviously not as good as
| one of the frontier models). Add note-taking/memory, and ask it
| to remember key facts to that. Add voice transcription so that
| you can reply much faster (LLMs are amazing at taking in
| imperfect transcriptions and understanding what you meant). Each
| of these things takes on the order of a few minutes, and it's
| super fun.
| threecheese wrote:
| Does anyone have an understanding - or intuition - of what the
| agentic loop looks like in the popular coding agents? Is it
| purely a "while 1: call_llm(system, assistant)", or is there
| complex orchestration?
|
| I'm trying to understand if the value for Claude Code (for
| example) is purely in Sonnet/Haiku + the tool system prompt, or
| if there's more secret sauce - beyond the "sugar" of instruction
| file inclusion via commands, tools, skills etc.
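| For intuition, the core of such a loop can be sketched in a few lines.
| This is a guess at the shape, not Claude Code's actual implementation;
| `call_llm` and `run_tool` are stand-ins for a real client and tool set:

```python
# Core agent loop sketch: call the model, execute any tool calls it asks
# for, append the results, and repeat until it answers in plain text.

def agent_loop(call_llm, run_tool, context, max_turns=20):
    for _ in range(max_turns):
        reply = call_llm(context)
        context.append(reply)
        if not reply.get("tool_calls"):
            return reply.get("content")  # plain text: the agent is done
        for call in reply["tool_calls"]:
            result = run_tool(call["name"], call.get("args", {}))
            # Tool output goes back into the window for the next model call.
            context.append({"role": "tool", "name": call["name"],
                            "content": result})
    return None  # gave up after too many turns
```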
___________________________________________________________________
(page generated 2025-11-06 23:00 UTC)