[HN Gopher] 12-factor Agents: Patterns of reliable LLM applications
       ___________________________________________________________________
        
       12-factor Agents: Patterns of reliable LLM applications
        
       I've been building AI agents for a while. After trying every
       framework out there and talking to many founders building with AI,
       I've noticed something interesting: most "AI Agents" that make it
       to production aren't actually that agentic. The best ones are
       mostly just well-engineered software with LLMs sprinkled in at key
       points.  So I set out to document what I've learned about building
       production-grade AI systems:
       https://github.com/humanlayer/12-factor-agents. It's a set of
       principles for building LLM-powered software that's reliable enough
       to put in the hands of production customers.  In the spirit of
       Heroku's 12 Factor Apps (https://12factor.net/), these principles
       focus on the engineering practices that make LLM applications more
       reliable, scalable, and maintainable. Even as models get
       exponentially more powerful, these core techniques will remain
       valuable.  I've seen many SaaS builders try to pivot towards AI by
       building greenfield new projects on agent frameworks, only to find
       that they couldn't get things past the 70-80% reliability bar with
       out-of-the-box tools. The ones that did succeed tended to take
       small, modular concepts from agent building, and incorporate them
       into their existing product, rather than starting from scratch.
       The full guide goes into detail on each principle with examples and
       patterns to follow. I've seen these practices work well in
       production systems handling real user traffic.  I'm sharing this as
       a starting point--the field is moving quickly so these principles
       will evolve. I welcome your feedback and contributions to help
       figure out what "production grade" means for AI systems!
        
       Author : dhorthy
       Score  : 421 points
       Date   : 2025-04-15 22:38 UTC (2 days ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | mertleee wrote:
       | What are your favorite open source "frameworks" for agents?
        
         | dhorthy wrote:
         | i have seen a ton of good ones, and they all have ups and
         | downs. I think rather than focusing on frameworks though, I'm
         | trying to dig into what goes into them, and what's the tradeoff
         | if you try to build most of it yourself instead
         | 
         | but since you asked, to name a few
         | 
          | - ts: mastra, gensx, vercel ai, many others!
          | 
          | - python: crew, langgraph, many others!
        
       | pancsta wrote:
        | Very informative wiki, thank you, I will definitely use it. So
        | I've made my own "AI Agents framework" [0] based on the actor
        | model, state machines and aspect-oriented programming (released
        | just yesterday, no HN post yet) and I really like points 5 and 8:
        | 
        |     5. Unify execution state and business state
        |     8. Own your control flow
       | 
        | That is exactly what SecAI does, as it's a graph control flow
        | library at its core (multigraph instead of DAG) and LLM calls
        | are embedded into the graph's nodes. The flow is reinforced with
        | negotiation, cancellation and stateful relations, which make it
        | more "organic". Another thing often missed by other frameworks
        | is dedicated devtools (dbg, repl, svg) - programming for
        | failure, inspecting every step in detail, automatic data
        | exporters (metrics, traces, logs, sql), and dead-simple
        | integrations (bash). I've released the first tech demo [1] which
        | showcases all the devtools using a reference implementation of
        | deepresearch (ported from AtomicAgents). You may especially like
        | the Send/Stop button, which is nothing else than "Factor 6.
        | Launch/Pause/Resume with simple APIs". Oh, and it's network
        | transparent, so it can scale.
       | 
       | Feel free to reach out.
       | 
       | [0] https://github.com/pancsta/secai
       | 
       | [1] https://youtu.be/0VJzO1S-gV0
        
         | dhorthy wrote:
         | i like the terminal UI and otel integrations - what tasks are
         | you using this for today?
        
         | serverlessmania wrote:
         | "Another thing often missed by other frameworks are dedicated
         | devtools"
         | 
         | From my experience, PydanticAI really nailed it with Logfire--
         | debugging[0] agents was significantly easier and more effective
         | compared to the other frameworks and libraries I tested.
         | 
         | [0] https://ai.pydantic.dev/logfire/#pydantic-logfire
        
       | DebtDeflation wrote:
       | > most "AI Agents" that make it to production aren't actually
       | that agentic. The best ones are mostly just well-engineered
       | software with LLMs sprinkled in at key points
       | 
       | I've been saying that forever, and I think that anyone who
       | actually implements AI in an enterprise context has come to the
       | same conclusion. Using the Anthropic vernacular, AI "workflows"
       | are the solution 90% of the time and AI "agents" maybe 10%. But
       | everyone wants the shiny new object on their CV and the LLM
       | vendors want to bias the market in that direction because running
       | LLMs in a loop drives token consumption through the roof.
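        | 
        | In code terms the distinction is roughly the following (a hedged
        | sketch with placeholder helpers like call_llm and run_tool, not
        | Anthropic's definitions verbatim):
        | 
        |     # workflow: a fixed, code-owned sequence of LLM calls
        |     summary = call_llm(SUMMARIZE_PROMPT, doc)
        |     label = call_llm(CLASSIFY_PROMPT, summary)
        | 
        |     # agent: the LLM chooses the next step inside a loop
        |     history = [task]
        |     for _ in range(MAX_STEPS):
        |         step = call_llm(PICK_NEXT_STEP_PROMPT, history)
        |         if step["action"] == "done":
        |             break
        |         history.append(run_tool(step))
        | 
        | The second shape is what runs the token bill up.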
        
         | film42 wrote:
         | Everyone wants to go the agent route until the agent messes up
         | once after working 99 times in a row. "Why did it make a silly
         | mistake?" We don't know. "Well, let's put a few more guard
         | rails around it." Sounds good... back to "workflows."
        
       | deadbabe wrote:
       | With all this AI-agent bullshit out there these days, the most
       | useful AI-agent I still use in daily life is the humble floor
       | vacuum/mopping robot.
        
       | daxfohl wrote:
       | Another one: plan for cost at scale.
       | 
        | These things aren't cheap at scale, so whenever something might
        | be handled by a deterministic component, try that first. Not
        | only does that save on hallucinations and latency, it could make
        | a huge difference to your bottom line.
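        | 
        | A minimal sketch of that "deterministic first, LLM as fallback"
        | shape (call_llm is a hypothetical helper, not something from the
        | guide):
        | 
        |     import re
        | 
        |     def extract_order_id(text: str) -> str | None:
        |         # Cheap, deterministic path first: a regex covers the
        |         # common "order #12345" shape with zero tokens spent.
        |         match = re.search(r"order\s*#?(\d{4,})", text, re.I)
        |         if match:
        |             return match.group(1)
        |         # Only fall back to the model for the messy long tail.
        |         return call_llm(
        |             "Extract the order id, or say none: " + text
        |         )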
        
         | dhorthy wrote:
         | Yeah definitely. I think the pattern I see people using most is
         | "start with slow, expensive, but low dev effort, and then
          | refine over time as you find speed/quality/cost bottlenecks
         | worth investing in"
        
       | daxfohl wrote:
       | This old obscure blog post about framework patterns has resonated
       | with me throughout my career and I think it applies here too.
       | LLMs are best used as "libraries" rather than "frameworks", for
       | all the reasons described in the article and more, especially now
       | while everything is in such flux. "Frameworks" are sexier and
       | easier to sell though, and lead to lock-in and add-on services,
       | so that's what gets promoted.
       | 
       | https://tomasp.net/blog/2015/library-frameworks/
        
         | saadatq wrote:
         | This is so good...
         | 
         | "... you can find frameworks not just in software, but also in
         | ordinary life. If you buy package holidays, you're buying a
         | framework - they transport you to some place, put you in a
         | hotel, feed you and your activities have to fit into the shape
         | provided by the framework (say, go into the pool and swim
         | there). If you travel independently, you are composing
         | libraries. You have to book your flights, find your
         | accommodation and arrange your program (all using different
         | libraries). It is more work, but you are in control - and you
         | can arrange things exactly the way you need."
        
           | daxfohl wrote:
           | My favorite blog post / presentation is Sandi Metz "The Wrong
           | Abstraction", but this one is up there. Definitely punches
           | above its weight for a small obscure post.
        
       | daxfohl wrote:
       | Also, "Don't lay off half your engineering department and try to
       | replace with LLMs"
        
       | mgdev wrote:
       | These are great. I had my own list of takeaways [0] after doing
       | this for a couple years, though I wouldn't go so far as calling
       | mine factors.
       | 
        | Like you, the biggest one I _didn't_ include but would now is to
        | own the lowest level planning loop. It's fine to have some
        | dynamic planning, but you should own an OODA loop (observe,
        | orient, decide, act) and have heuristics for determining if
        | you're converging on a solution (e.g. scoring), or else breaking
        | out (e.g. max loops).
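        | 
        | A rough sketch of what owning that loop can look like (observe,
        | decide, act, score and escalate_to_human are hypothetical
        | placeholders, not any framework's API):
        | 
        |     MAX_LOOPS = 10
        |     GOOD_ENOUGH = 0.9
        | 
        |     def run_agent(task):
        |         state = observe(task)                # observe
        |         for _ in range(MAX_LOOPS):           # breakout heuristic
        |             action = decide(task, state)     # orient + decide
        |             state = act(action, state)       # act
        |             if score(task, state) >= GOOD_ENOUGH:
        |                 return state                 # converged
        |         return escalate_to_human(task, state)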
       | 
       | I would also potentially bake in a workflow engine. Then, have
       | your model build a workflow specification that runs on that
       | engine (where workflow steps may call back to the model) instead
       | of trying to keep an implicit workflow valid/progressing through
       | multiple turns in the model.
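        | 
        | A hedged sketch of that shape (the spec format and run_step
        | helper are made up for illustration, not any particular
        | engine):
        | 
        |     # the model is asked to emit a declarative spec like this
        |     spec = {"steps": [
        |         {"id": "fetch", "kind": "tool", "name": "search"},
        |         {"id": "draft", "kind": "llm", "needs": ["fetch"]},
        |     ]}
        | 
        |     def run(spec):
        |         results = {}
        |         for step in spec["steps"]:  # engine owns ordering
        |             deps = [results[d] for d in step.get("needs", [])]
        |             results[step["id"]] = run_step(step, deps)
        |         return results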
       | 
       | [0]: https://mg.dev/lessons-learned-building-ai-agents/
        
         | dhorthy wrote:
         | this guide is great, i liked the "chat interfaces are dumb"
         | take - totally agree. AI-based UIs have a very long way to go
        
       | nickenbank wrote:
        | I totally agree with this. Most, if not all, frameworks for
        | building agents are a waste of time
        
         | dhorthy wrote:
         | this guy gets it
        
       | hellovai wrote:
       | really cool to see BAML on here :) 100% align on so much of what
        | you've said here. it's really about treating LLMs as functions.
        
         | dhorthy wrote:
         | excellent work on BAML and love it as a building block for
         | agents
        
       | hhimanshu wrote:
        | I am wondering how libraries like DSPy [0] fit into your factor
        | 2 [1].
        | 
        | As I was reading, I saw a mention of BAML:
        | 
        | > (the above example uses BAML to generate the prompt ...
       | 
        | Personally, in my experience hand-writing prompts for extracting
        | structured information from unstructured data has never been
        | easy. With DSPy, my experience has been quite good so far.
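        | 
        | For reference, this is roughly what that looks like in DSPy's
        | class-based signature style (exact API details vary by version,
        | and the fields here are made up):
        | 
        |     import dspy
        | 
        |     dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))
        | 
        |     class ExtractContact(dspy.Signature):
        |         """Extract contact details from unstructured text."""
        |         text: str = dspy.InputField()
        |         name: str = dspy.OutputField()
        |         email: str = dspy.OutputField()
        | 
        |     extract = dspy.Predict(ExtractContact)
        |     result = extract(text="Ping Jane at jane@example.com")
        |     print(result.name, result.email)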
       | 
        | As you have used the raw prompt from BAML, what do you think of
        | using the raw prompts from DSPy [2]?
       | 
       | [0] https://dspy.ai/
       | 
       | [1] https://github.com/humanlayer/12-factor-
       | agents/blob/main/con...
       | 
       | [2] https://dspy.ai/tutorials/observability/#using-
       | inspect_histo...
        
         | dhorthy wrote:
         | interesting - I think I have to side with the Boundary (YC W23)
         | folks on this one - if you want bleeding edge performance, you
         | need to be able to open the box and hack on the insides.
         | 
         | I don't agree fully with this article
         | https://www.chrismdp.com/beyond-prompting/ but the comparison
          | of punchcards -> assembly -> c -> higher langs is quite useful
         | here
         | 
         | I just don't know when we'll get the right abstraction - i
         | don't think langchain or dspy are the "C programming language"
         | of AI yet (they could get there!).
         | 
         | For now I'll stick to my "close to the metal" workbench where I
         | can inspect tokens, reorder special tokens like
         | system/user/JSON, and dynamically keep up with the
         | idiosyncrasies of new models without being locked up waiting
         | for library support.
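          | 
          | Concretely (a toy sketch assuming the OpenAI Python SDK;
          | SYSTEM_PROMPT, events and serialize_thread are placeholders),
          | owning the prompt means holding the raw message array
          | yourself:
          | 
          |     from openai import OpenAI
          | 
          |     client = OpenAI()
          |     messages = [
          |         {"role": "system", "content": SYSTEM_PROMPT},
          |         {"role": "user", "content": serialize_thread(events)},
          |     ]
          |     response = client.chat.completions.create(
          |         model="gpt-4o", messages=messages
          |     )
          | 
          | so reordering, trimming, or reformatting context is a one-line
          | change instead of a framework patch.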
        
           | chrismdp wrote:
           | It's always true that you need to drop down a level of
           | abstraction in order to extract the ultimate performance. (eg
           | I wrote a decent-sized game + engine entirely in C about 10
           | years ago and played with SIMD vectors to optimise the render
           | loop)
           | 
           | However, I think the vast majority of use cases will not
           | require this level of control, and we will abandon prompts
           | once the tools improve.
           | 
           | Langchain and DSPY are also not there for me either - I think
           | the whole idea of prompting + evals needs a rethink.
           | 
           | (full disclaimer: I'm working on such a tool right now!)
        
             | dhorthy wrote:
             | i'd be interested to check it out
             | 
             | here's a take, I adapted this from someone on the
             | notebookLM team on swyx's podcast
             | 
             | > the only way to build really impressive experiences in
             | AI, is to find something right at the edge of the model's
             | capability, and to get it right consistently.
             | 
             | So in order to build something very good / better than the
             | rest, you will always benefit from being able to bring in
             | every optimization you can.
        
               | chrismdp wrote:
               | I think the building blocks of the most impressive
               | experiences will come from choosing the exact right point
               | to involve an LLM, the orchestration of the component
               | pieces, and the user experience.
               | 
               | That's certainly what I found in games. The games which
               | felt magic to play were never the ones with the best hand
               | rolled engine.
               | 
               | The tools aren't there yet to ignore prompts, and you'll
               | always need to drop down to raw prompting sometimes. I'm
               | looking forward to a future where wrangling prompts is
               | only needed for 1% of my system.
        
               | dhorthy wrote:
                | yeah. the issue is when you're baked into a tool
                | stack/framework where you _can't_ go customize in that 1%
                | of cases. A lot of tools _try_ to get the right
                | abstractions where you can "customize everything you
                | would want to" but they miss the mark in some cases
        
               | chrismdp wrote:
               | 100%. You can't and shouldn't wrap every interaction. We
               | need a new approach.
        
       | wfn wrote:
       | This could not have come at a better time for me, thank you!
       | 
       | I've been tinkering with an idea for an audiovisual sandbox[1]
       | (like vvvv[2] but much simpler of course, barebones).
       | 
        | The idea is to have a way to insert LM (or some simple locally
        | run neural net) "nodes" which are given specific tasks and whose
        | _output_ is expected to be very constrained. Hence your example:
        | 
        |     question -> answer: float
        | 
        | is very attractive here. Of course, some questions in my case
        | would be quite abstract, but anyway. Multistage pipelines are
        | also very interesting.
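        | 
        | A minimal sketch of what such a constrained node could look
        | like (call_llm is a hypothetical helper; validation is kept
        | naive on purpose):
        | 
        |     def float_node(question: str) -> float:
        |         # Constrain the model to one parseable value and
        |         # validate it in plain code before it enters the graph.
        |         raw = call_llm(
        |             "Answer with a single float and nothing else.\n"
        |             + question
        |         )
        |         try:
        |             return float(raw.strip())
        |         except ValueError:
        |             raise ValueError(f"bad node output: {raw!r}")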
       | 
       | [1]: loose set of bulletpoints brainstorming the idea if curious,
       | not organised: https://kfs.mkj.lt/#audiovisllm (click to expand
       | description)
       | 
       | [2]: https://vvvv.org/
        
         | dhorthy wrote:
          | Typed outputs from an LLM are a game changer!
        
       | mettamage wrote:
        | I've noticed some of these factors myself as well. I'd love to
        | build more AI applications like this. Currently I'm a data
        | analyst, and my employer doesn't fully appreciate that I can
        | build stuff like this, as it is not a technology-oriented
        | company.
       | 
       | I'd love to work on stuff like this full-time. If anyone is
       | interested in a chat, my email is on my profile (US/EU).
        
         | dhorthy wrote:
         | cool thing about open source is you can work on whatever you
         | want, and it's the best way to meet people who do similar work
         | for their day job as well
        
       | darepublic wrote:
        | I didn't really read this extensively, but to me I would want to
        | use as much deterministic code as possible and leverage the LLM
        | as little as possible. That to me is a better portent of
        | predictable results and lower operational costs, and is a signal
        | that nobody could just quickly reproduce the same app. I would
        | tend to roll my own tools and not use out-of-the-box buzzword
        | glue to integrate my LLM with other systems. And if these
        | conditions aren't met or aren't necessary, I'd figure someone
        | else could just vibe code the same solution in no time anyway.
        | Keep control I say! Die on the hill of control! That's not to
        | say I'm not impressed by LLMs.. quite the opposite
        
         | dhorthy wrote:
          | control is good, and determinism is good - while the primary
          | goal is to convince people "don't give up too much control" -
          | there is a secondary goal, which is: THESE are the places
          | where it makes sense to give up some control
        
       | AbhishekParmar wrote:
       | would feel blessed if someone dropped something similar but for
       | image generation agents. Been trying to build consistent
       | image/video generation agents and god are they unreliable
        
       ___________________________________________________________________
       (page generated 2025-04-17 23:02 UTC)