[HN Gopher] Building Effective "Agents"
___________________________________________________________________
Building Effective "Agents"
Author : jascha_eng
Score : 73 points
Date : 2024-12-20 12:29 UTC (10 hours ago)
(HTM) web link (www.anthropic.com)
(TXT) w3m dump (www.anthropic.com)
| jascha_eng wrote:
| I put "agents" in quotes because Anthropic actually talks more
| about what they call "workflows". And imo this is where the real
| value of LLMs currently lies: workflow automation.
|
| They also say that using LangChain and other frameworks is mostly
| unnecessary and does more harm than good. They instead argue for
| using some simple patterns directly at the API level. Not
| dissimilar to the old-school Gang of Four software engineering
| patterns.
|
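| Something in that spirit (a minimal sketch of the prompt-chaining
| workflow pattern against the raw Anthropic Python SDK; the model
| string and prompts are placeholders):
|
|   import anthropic
|
|   client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY
|
|   def ask(prompt: str) -> str:
|       """One plain API call, no framework."""
|       msg = client.messages.create(
|           model="claude-3-5-sonnet-20241022",
|           max_tokens=1024,
|           messages=[{"role": "user", "content": prompt}],
|       )
|       return msg.content[0].text
|
|   # Prompt chaining: each step's output feeds the next step.
|   outline = ask("Outline a short post on API rate limiting.")
|   draft = ask(f"Expand this outline into a draft:\n\n{outline}")
|   final = ask(f"Fix any errors in this draft:\n\n{draft}")
|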
| Really like this post as guidance for how to actually build
| useful tools with LLMs. Keep it simple, stupid.
| curious_cat_163 wrote:
| Indeed. Very clarifying.
|
| I would just posit that they do make a distinction between
| workflows and agents.
| Philpax wrote:
| Aren't you editorialising by doing so?
| jascha_eng wrote:
| I guess a little. I really liked the read though; it put into
| words what I couldn't, and I was curious if others felt the
| same.
|
| However, the post was posted here yesterday and didn't really
| get a lot of traction. I thought this was partially because of
| the term "agentic", which the community seems a bit fatigued
| by. So I put it in quotes to highlight that Anthropic
| themselves deem it a little vague, and hopefully to spark more
| interest. I don't think it messes with their message too much?
|
| Honestly it didn't matter anyway; without the second-chance
| pool this post would have been lost again (so thanks Daniel!)
| curious_cat_163 wrote:
| > Agents can be used for open-ended problems where it's difficult
| or impossible to predict the required number of steps, and where
| you can't hardcode a fixed path. The LLM will potentially operate
| for many turns, and you must have some level of trust in its
| decision-making. Agents' autonomy makes them ideal for scaling
| tasks in trusted environments.
|
| The questions then become:
|
| 1. When can you (i.e. a person who wants to build systems with
| them) trust them to make decisions on their own?
|
| 2. What type of trusted environments are we talking about?
| (Sandboxing?)
|
| So, that all requires more thought -- perhaps by some folks who
| hang out at this site. :)
|
| I suspect that someone will come up with a "real-world"
| application at a non-tech-first enterprise company and let us
| know.
| timdellinger wrote:
| My personal view is that the roadmap to AGI requires an LLM
| acting as a prefrontal cortex: something designed to think about
| thinking.
|
| It would decide what circumstances call for double-checking facts
| for accuracy, which would hopefully catch hallucinations. It
| would write its own acceptance criteria for its answers, etc.
|
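| A hypothetical sketch of that kind of self-check (llm here
| stands in for any chat-completion call; the prompts are made
| up):
|
|   from typing import Callable
|
|   def answer_with_self_check(question: str,
|                              llm: Callable[[str], str]) -> str:
|       # The "prefrontal" pass writes acceptance criteria first.
|       criteria = llm("List 3 criteria a correct answer to this "
|                      f"question must satisfy:\n{question}")
|       answer = llm(question)
|       verdict = llm(f"Question: {question}\nAnswer: {answer}\n"
|                     f"Criteria:\n{criteria}\n"
|                     "Reply PASS or FAIL with a reason.")
|       if verdict.strip().startswith("PASS"):
|           return answer
|       # One revision attempt against the self-written criteria.
|       return llm("Revise the answer to meet the criteria.\n"
|                  f"Question: {question}\nAnswer: {answer}\n"
|                  f"Criteria:\n{criteria}")
|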
| It's not clear to me how to train each of the sub-models
| required, or how big (or small!) they need to be, or what
| architecture works best. But I think that complex architectures
| are going to win out over the "just scale up with more data and
| more compute" approach.
| zby wrote:
| IMHO, with a simple loop LLMs are already capable of some meta
| thinking, even without any new internal architectures. Where it
| still fails for me is that LLMs cannot catch their own mistakes,
| even obvious ones. With GPT-3.5 I had a persistent problem with
| the question "Who is older, Annie Morton or Terry Richardson?".
| I was giving it Wikipedia articles and it would correctly find
| the birth dates of the most popular people with those names, but
| then instead of comparing ages it compared birth years. And once
| it did that, it was impossible for it to spot the error.
|
| Now with 4o-mini I have a similar, if less obvious, problem.
|
| Just writing this down convinced me that there are some ideas to
| try here: taking a 'report' of the thought process out of
| context and judging it there, changing the temperature, or maybe
| even cross-checking with a different model.
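|
| Roughly like this (judge stands in for any second model call,
| ideally a different model than the one that did the work):
|
|   from typing import Callable
|
|   def cross_check(question: str, report: str,
|                   judge: Callable[[str], str]) -> str:
|       # The judge sees only a report of the steps, not the full
|       # chat history, so it is less anchored to the first answer.
|       return judge(
|           "You are reviewing another model's work.\n"
|           f"Task: {question}\n"
|           f"Steps taken:\n{report}\n"
|           "Flag any step that answers a subtly different "
|           "question (e.g. comparing birth years when asked "
|           "who is older)."
|       )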
| delichon wrote:
| Here's Lloyd Watts describing his work on this.
|
| https://www.youtube.com/watch?v=b2Hp0Jk9d4I&list=LL&index=5
| serjester wrote:
| Couldn't agree more with this - too many people rush to build
| autonomous agents when their problem could easily be defined as
| a DAG workflow. Agents increase the degrees of freedom in your
| system exponentially, making it much more challenging to
| evaluate systematically.
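|
| For illustration, a tiny fixed-DAG runner (hypothetical sketch;
| each node is a deterministic function or a single LLM call, and
| the edges are hardcoded, which is what keeps evaluation
| tractable):
|
|   from graphlib import TopologicalSorter
|
|   def run_dag(nodes: dict, deps: dict, inputs: dict) -> dict:
|       # deps maps each node to its predecessors, e.g.
|       # {"draft": {"outline"}, "review": {"draft"}}
|       results = dict(inputs)
|       for name in TopologicalSorter(deps).static_order():
|           if name in nodes:  # pure inputs have no node function
|               args = {d: results[d] for d in deps.get(name, ())}
|               results[name] = nodes[name](**args)
|       return results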
| tonyhb wrote:
| It looks like agents are less about DAG workflows or fully
| autonomous "networks of agents", and more like a stateful
| network:
|
| * A "network of agents" is a system of agents and tools
|
| * that runs and builds up state (both "memory" and actual state
| via tool use),
|
| * which is then inspected when routing, as a kind of "state
| machine".
|
| * Routing should specify which agent (or agents, in parallel) to
| run next, based on that state.
|
| * Routing can also use other agents (routing agents) to figure
| out what to do next, instead of code.
|
| We're codifying this with durable workflows in a prototype
| library -- AgentKit: https://github.com/inngest/agent-kit/ (docs:
| https://agentkit.inngest.com/overview).
|
| It took less than a day to get a network of agents to correctly
| fix swebench-lite examples. It's super early, but very fun. One
| of the cool things is that this uses Inngest under the hood, so
| you get all of the classic durable execution/step
| function/tracing/o11y for free, but it's just regular code that
| you write.
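|
| Roughly this shape, as a sketch (not AgentKit's actual API; all
| names are illustrative):
|
|   from dataclasses import dataclass, field
|
|   @dataclass
|   class State:
|       memory: list = field(default_factory=list)   # "memory"
|       results: dict = field(default_factory=dict)  # tool state
|       done: bool = False
|
|   def run_network(agents: dict, route, state: State, start: str):
|       # route() inspects state and picks the next agent; it can
|       # be plain code or itself backed by a routing agent.
|       name = start
|       while name is not None and not state.done:
|           agents[name](state)  # each agent mutates the state
|           name = route(state)
|       return state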
| brotchie wrote:
| Have been building agents for the past 2 years; my tl;dr is:
|
| _Agents are Interfaces, Not Implementations_
|
| The current zeitgeist seems to think of agents as passthrough
| agents: e.g. a lite wrapper around a core that's almost 100% an
| LLM.
|
| The most effective agents I've seen, and have built, are largely
| traditional software engineering with a sprinkling of LLM calls
| for "LLM-hard" problems. LLM-hard problems are problems that can
| ONLY be solved by application of an LLM (creative writing, text
| synthesis, intelligent decision making). Leave all the problems
| that are amenable to decades of software engineering best
| practice to good old deterministic code.
|
| I've been calling systems like this _"Transitional Software
| Design"_. That is, they're mostly a traditional software
| application under the hood (deterministic, well-structured code,
| separation of concerns) with judicious use of LLMs where
| required.
|
| Ultimately, users care about what the agent does, not how it does
| it.
|
| The biggest differentiator I've seen between agents that work
| and get adoption, and those that are eternally in a demo phase,
| is the cardinality of the state space the agent is operating in.
| Too many folks try to "boil the ocean" and implement a
| general-purpose capability: e.g. generating Python code to do
| something, or synthesizing SQL from natural language.
|
| The projects I've seen that work really focus on reducing the
| state space of agent decision making down to the smallest
| possible set that delivers user value.
|
| e.g. Rather than generating arbitrary SQL, work out a set of ~20
| SQL templates that are hyper-specific to the business problem
| you're solving. Parameterize them with the options for select,
| filter, group by, order by, and the subset of aggregate
| operations that are relevant. Then let the agent choose the
| right template + parameters from a relatively small finite set
| of options.
|
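| Concretely, something like this (a sketch; the table, template,
| and field names are made up):
|
|   TEMPLATES = {
|       "revenue_by_region": (
|           "SELECT region, {agg}(amount) AS value FROM orders "
|           "WHERE order_date >= :start AND order_date < :end "
|           "GROUP BY region ORDER BY value {direction} "
|           "LIMIT :row_limit"
|       ),
|       # ... ~20 hyper-specific templates for the domain
|   }
|   ALLOWED = {"agg": {"SUM", "AVG", "COUNT"},
|              "direction": {"ASC", "DESC"}}
|
|   def render(choice: dict) -> tuple[str, dict]:
|       # `choice` is the agent's JSON output; validate everything.
|       sql = TEMPLATES[choice["template"]]  # KeyError = reject
|       assert choice["agg"] in ALLOWED["agg"]
|       assert choice["direction"] in ALLOWED["direction"]
|       sql = sql.format(agg=choice["agg"],
|                        direction=choice["direction"])
|       params = {"start": choice["start"], "end": choice["end"],
|                 "row_limit": min(int(choice["row_limit"]), 1000)}
|       return sql, params  # params stay bound, never interpolated
|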
| ^^^ the delta in agent quality between "boiling the ocean" vs
| "agent's free choice over a small state space" is night and day.
| It lets you deploy early, deliver value, and start getting user
| feedback.
|
| Building Transitional Software Systems:
|
| 1. Deeply understand the domain and CUJs,
|
| 2. Segment out the system into "problems that traditional
| software is good at solving" and "LLM-hard problems",
|
| 3. For the LLM-hard problems, work out the smallest possible
| state space of decision making,
|
| 4. Build the system, and get users using it,
|
| 5. Gradually expand the state space as feedback flows in from
| users.
| samdjstephens wrote:
| There'll always be an advantage, for sure, for those who
| understand the problem they're solving.
|
| The balance of traditional software components and LLM-driven
| components in a system is an interesting topic - I wonder how
| the capabilities of future generations of foundation models
| will change that.
| simonw wrote:
| This is by far the most practical piece of writing I've seen on
| the subject of "agents" - it includes actionable definitions,
| then splits most of the value out into "workflows" and describes
| those in depth with example applications.
|
| There's also a cookbook with useful code examples:
| https://github.com/anthropics/anthropic-cookbook/tree/main/p...
|
| Blogged about this here:
| https://simonwillison.net/2024/Dec/20/building-effective-age...
| 3abiton wrote:
| I'm glad they are publishing their cookbook recipes on GitHub
| too. OpenAI used to be more active there.
| ramesh31 wrote:
| Key to understanding the power of agentic workflows is tool
| usage. You don't have to write logic anymore; you simply give an
| agent the tools it needs to accomplish a task and ask it to do
| so. Models like the latest Sonnet have gotten so advanced now
| that coding abilities are reaching superhuman levels. All the
| hallucinations and "jitter" of models from 1-2 years ago have
| gone away. They can be reasoned about now, and you can build
| reliable systems with them.
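|
| For example (a minimal sketch with the Anthropic Python SDK;
| the tool name and schema are made up):
|
|   import anthropic
|
|   client = anthropic.Anthropic()
|   tools = [{
|       "name": "lookup_reference",
|       "description": "Fetch reference material by topic.",
|       "input_schema": {
|           "type": "object",
|           "properties": {"topic": {"type": "string"}},
|           "required": ["topic"],
|       },
|   }]
|   response = client.messages.create(
|       model="claude-3-5-sonnet-20241022",
|       max_tokens=1024,
|       tools=tools,
|       messages=[{"role": "user",
|                  "content": "Summarize our rate limit docs."}],
|   )
|   # If the model chose to call the tool, run it and send the
|   # output back in a tool_result block on the next turn.
|   for block in response.content:
|       if block.type == "tool_use":
|           print(block.name, block.input)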
| minimaxir wrote:
| > you simply give an agent the tools
|
| That isn't simple. There is a lot of nuance in tool definition.
| ramesh31 wrote:
| >That isn't simple. There is a lot of nuance in tool
| definition.
|
| Simple relative to the equivalent in traditional coding
| techniques. If my system needs to know about some reference
| material, I no longer need to build out endpoints and handle
| all of that, or modify any kind of controller code. I just
| ask the agent to include the information, and it intuitively
| knows how to do that with a DB tool in its context.
| websku wrote:
| Indeed, we've seen this approach as well. All these "frameworks"
| become too complicated in real business cases.
___________________________________________________________________
(page generated 2024-12-20 23:00 UTC)