[HN Gopher] A practical guide to building agents [pdf]
___________________________________________________________________
A practical guide to building agents [pdf]
Author : tosh
Score : 104 points
Date : 2025-06-04 15:21 UTC (7 hours ago)
(HTM) web link (cdn.openai.com)
(TXT) w3m dump (cdn.openai.com)
| ednite wrote:
| Thanks for sharing this. I'm actually starting to explore
| integrating an agent into one of my SaaS solutions, based on a
| client request.
|
| To be honest, my experience with agents is still pretty limited,
| so I'd really appreciate any advice, especially around best
| practices or a roadmap for implementation. The goal is to build
| something that can learn and reflect the company's culture,
| answer situational questions like "what to do in this case,"
| assist with document error checking, and generally serve as a
| helpful internal assistant.
|
| All of this came from the client's desire to have a tool that
| aligns with their internal knowledge and workflows.
|
| Is something like this feasible in terms of quality and
| reliability? And beyond hallucinations, are there major security
| concerns or roadblocks I should be thinking about?
| eric-burel wrote:
| The complexity of an agent can range from relatively simple to
| whatever level of complexity you want, so your project sounds
| doable, but you'll have to run some exploration to get proper
| answers. Regarding reliability, quality, and security: learning
| how to observe an agent system is as important as learning how
| to implement one. An agent/LLM-based solution is proven to work
| only if you observe that it actually works; experiments, tests,
| and monitoring are not optional the way they arguably are in,
| e.g., web development. As for security concerns, you'd want to
| take a look at the OWASP Top 10 for LLMs:
| https://owasp.org/www-project-top-10-for-large-language-mode...
| LLMs/agents indeed have their own new set of vulnerabilities.
| ednite wrote:
| That's sound advice, really appreciate the link. Regarding
| your point about continuous monitoring, that's actually the
| first thing I mentioned to the client.
|
| It's still highly experimental and needs to be observed,
| corrected, and tweaked constantly, kind of like teaching a
| child, where feedback and reinforcement are key.
|
| I may share my experience with the HN community down the
| line. Thanks again!
| abelanger wrote:
| I'm a big fan of https://github.com/humanlayer/12-factor-agents
| because I think it gets at the heart of engineering these
| systems for usage in your app rather than a completely
| unconstrained demo or MCP-based solution.
|
| In particular you can reduce most concerns around security and
| reliability when you treat your LLM call as a library method
| with structured output (Factor 4) and own your own control flow
| (Factor 8). There should never be a case where your agent is
| calling a tool with unconstrained input.
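A minimal sketch of those two factors, assuming a hypothetical `lookup_invoice` tool and a stubbed model reply in place of a real LLM call: the reply is parsed as structured data (Factor 4) and validated against an allow-list before anything runs, so the calling code owns the control flow (Factor 8).

```python
import json
from dataclasses import dataclass

# Hypothetical tool names for illustration; in a real app these map
# to actual functions you have audited.
ALLOWED_TOOLS = {"lookup_invoice", "send_summary"}

@dataclass
class ToolCall:
    tool: str
    args: dict

def parse_tool_call(raw: str) -> ToolCall:
    """Treat the LLM reply as structured data, never as free text."""
    data = json.loads(raw)
    call = ToolCall(tool=data["tool"], args=data.get("args", {}))
    # Our code, not the model, decides what is allowed to run.
    if call.tool not in ALLOWED_TOOLS:
        raise ValueError(f"unrecognized tool: {call.tool}")
    return call

# Stand-in for a model response; a real call to your provider goes here.
reply = '{"tool": "lookup_invoice", "args": {"invoice_id": "INV-42"}}'
call = parse_tool_call(reply)
print(call.tool, call.args)
```

The key point is that the model only ever produces a constrained request; executing it stays in ordinary application code.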
| ednite wrote:
| I guess I've got some reading and research ahead of me. I
| definitely support the idea of treating LLM calls more like
| structured library functions rather than letting them run
| wild.
|
| Definitely bookmarking this for reference. Appreciate you
| sharing it.
| ursaguild wrote:
| Ingesting documents and using natural language to search your
| org docs with an internal assistant sounds more like a good use
| case for RAG[1]. Agents are best when you need to autonomously
| plan and execute a series of actions[2]. You can combine the
| two but knowing when depends on the use case.
|
| I really like the OpenAI approach and how they outlined the
| thought process of when and how to use agents.
|
| [1] https://www.willowtreeapps.com/craft/retrieval-augmented-
| gen...
|
| [2] https://www.willowtreeapps.com/craft/building-ai-agents-
| with...
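A toy illustration of the retrieval step in RAG: rank your org's documents against the user's question and stuff the top matches into the prompt as context. Word-overlap scoring stands in for real embeddings here, and the sample documents are invented for the sketch.

```python
import re

def tokens(text: str) -> set:
    """Lowercased word set; a real system would use embeddings instead."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def score(query: str, doc: str) -> float:
    q, d = tokens(query), tokens(doc)
    return len(q & d) / (len(q) or 1)

def retrieve(query: str, docs: list, k: int = 2) -> list:
    """Return the k documents most similar to the query."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

docs = [
    "Vacation policy: employees accrue 1.5 days per month.",
    "Expense policy: submit receipts within 30 days.",
    "Onboarding checklist for new hires.",
]

context = retrieve("what is the vacation policy", docs, k=1)
prompt = "Answer using only this context:\n" + "\n".join(context)
print(prompt)
```

The LLM then answers from the retrieved context rather than from its frozen training data, which is why ingesting new documents is enough to keep answers current.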
| ednite wrote:
| Interesting, and thanks for the explanations.
|
| In this case, the agent would also need to learn from new
| events, like project lessons learned, for example.
|
| Just curious: can a RAG[1] system actually learn from new
| situations over time in this kind of setup, or is it purely
| pulling from what's already there?
| mousetree wrote:
| You can ingest new documents and data into the RAG system
| as needed.
| ursaguild wrote:
| Especially with a client, consider the word choices around
| "learning". When using LLMs, agents, or RAG, the system
| isn't learning (yet) but making a decision based on the
| context you provide. Most models are a fixed snapshot. If
| you provide up-to-date information, it will be able to give
| you an output based on that.
|
| "Learning" happens when initially training the LLM, or
| arguably when fine-tuning. Neither is needed for your use
| case as presented.
| ednite wrote:
| Thanks for the clarification, really appreciate it. It
| helps frame things more precisely.
|
| In my case, there will be a large amount of initial data
| fed into the system as context. But the client also
| expects the agent to act more like a smart assistant or
| teacher, one that can respond to new, evolving scenarios.
|
| Without getting into too much detail, imagine I feed the
| system an instruction like: "Box A and Box B should fit
| into Box 1 with at least 1" clearance." Later, a user
| gives the agent Box A, Box B, and now adds Box D and E,
| and asks it to fit everything into Box 1, which is too
| small. The expected behavior would be that the agent
| infers that an additional Box 2 is needed to accommodate
| everything.
|
| So I understand this isn't "learning" in the training
| sense, but rather pattern recognition and contextual
| reasoning based on prior examples and constraints.
|
| Basically, I should be saying "contextual reasoning"
| instead of "learning."
|
| Does that framing make sense?
| trevinhofmann wrote:
| Others have given some decent advice based on your comment, but
| would you be interested in a ~30 minute (video) call to dive a
| bit deeper so I can give more tailored suggestions?
| _pdp_ wrote:
| IMHO this guide should have been called "a theoretical guide for
| building agents". In practice, you cannot build agents like that
| if you want them to do useful things.
|
| Also, the examples provided are not only impractical but
| potentially bad practice. Why do you need a manager pattern to
| control a bunch of language-translation agents when most models
| will do fine on their own, especially for Latin-based
| languages? In practice, a single LLM will not only be more
| cost-effective but also better for the overall user experience.
|
| Also, prompting is the real unsung hero that barely gets a
| mention. In practice you cannot get away with just a couple of
| lines describing the problem/solution at a high level. Prompts
| are complex and very much an art form because, frankly, let's
| be honest, there is no science whatsoever behind them - just
| intuition. But in practice they have an enormous effect on
| overall agent performance.
|
| This guide is not really aimed at educating developers on how
| to build agents, but at business executives and decision-makers
| who need a high-level understanding without getting into the
| practical implementation details. It glosses over the technical
| challenges and complexity that developers actually face when
| building useful agent systems in production environments.
| 3abiton wrote:
| Do you have any good practical guide in mind?
| ramesh31 wrote:
| Tools are the only thing that matters, and are what you should
| focus on, not "agents" as a separate concept. Locking yourself
| into any particular agent framework is silly; they are nothing
| but LLM-calling while-loops connected to JSON/XML parsers.
| Tools define and shape the entirety of an agent's capability
| to do useful things, and through MCP can be trivially shared
| with virtually any agentic process.
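That "while-loop connected to a JSON parser" shape can be sketched in a few lines. The `fake_llm` stub and `get_time` tool below are placeholders for a real model call and real tools; the loop structure is the part any framework ultimately reduces to.

```python
import json

def get_time(_args: dict) -> str:
    """Example tool; a real one might query an API or database."""
    return "12:00"

TOOLS = {"get_time": get_time}

def fake_llm(messages: list) -> str:
    """Stand-in for a model call: first asks for a tool, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return json.dumps({"tool": "get_time", "args": {}})
    return json.dumps({"final": "It is 12:00."})

def run_agent(user_msg: str, max_steps: int = 5) -> str:
    """The whole 'agent': loop, parse JSON, dispatch tools, stop on final."""
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(max_steps):
        out = json.loads(fake_llm(messages))
        if "final" in out:
            return out["final"]
        result = TOOLS[out["tool"]](out["args"])
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("step limit reached")

print(run_agent("What time is it?"))
```

Swapping `fake_llm` for an actual provider call and `TOOLS` for real functions is all that separates this sketch from a working agent, which is the commenter's point: the tools carry the capability.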
| gavmor wrote:
| Yes, I wonder if it ever makes sense to partition an agent's
| toolkit across multiple "agents"--besides horizontal scaling.
| Why should one process have access to APIs that another
| doesn't? Authorization and secrets, maybe, but functionality?
___________________________________________________________________
(page generated 2025-06-04 23:00 UTC)