[HN Gopher] Building Effective "Agents"
       ___________________________________________________________________
        
       Building Effective "Agents"
        
       Author : jascha_eng
       Score  : 73 points
       Date   : 2024-12-20 12:29 UTC (10 hours ago)
        
 (HTM) web link (www.anthropic.com)
 (TXT) w3m dump (www.anthropic.com)
        
       | jascha_eng wrote:
        | I put "agents" in quotes because Anthropic actually talks more
        | about what they call "workflows". And IMO this is where the
        | real value of LLMs currently lies: workflow automation.
       | 
        | They also say that using LangChain and other frameworks is
        | mostly unnecessary and does more harm than good. They instead
        | argue for using some simple patterns directly at the API
        | level. Not dissimilar to the old-school Gang of Four software
        | engineering patterns.
       | 
        | Really like this post as guidance for how to actually build
       | useful tools with LLMs. Keep it simple, stupid.
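        | 
        | For example, the "routing" pattern from the post is just two
        | direct API calls, no framework required. A rough sketch with
        | the Anthropic Python SDK (model name, labels, and prompts are
        | illustrative, untested):
        | 
        |   import anthropic
        | 
        |   client = anthropic.Anthropic()  # uses ANTHROPIC_API_KEY
        | 
        |   def ask(prompt: str) -> str:
        |       r = client.messages.create(
        |           model="claude-3-5-sonnet-20241022",
        |           max_tokens=1024,
        |           messages=[{"role": "user", "content": prompt}],
        |       )
        |       return r.content[0].text
        | 
        |   # Route: classify first, then dispatch to a specialized
        |   # prompt for that category.
        |   HANDLERS = {
        |       "refund": "You handle refund requests. Reply to: {q}",
        |       "bug": "You triage bug reports. Reply to: {q}",
        |       "other": "You answer general questions. Reply to: {q}",
        |   }
        | 
        |   def route(q: str) -> str:
        |       label = ask("Classify this request as one of: refund, "
        |                   "bug, other. Reply with one word.\n\n" + q)
        |       label = label.strip().lower()
        |       template = HANDLERS.get(label, HANDLERS["other"])
        |       return ask(template.format(q=q))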
        
         | curious_cat_163 wrote:
         | Indeed. Very clarifying.
         | 
          | I would just posit that they do make a distinction between
          | workflows and agents.
        
         | Philpax wrote:
         | Aren't you editorialising by doing so?
        
           | jascha_eng wrote:
           | I guess a little. I really liked the read though, it put in
           | words what I couldn't and I was curious if others felt the
           | same.
           | 
            | However, the post was posted here yesterday and didn't get
            | much traction. I thought this was partially because of the
            | term "agentic", which the community seems a bit fatigued
            | by. So I put it in quotes to highlight that Anthropic
            | themselves deem it a little vague, and to hopefully spark
            | more interest. I don't think it messes with their message
            | too much?
           | 
            | Honestly it didn't matter anyway; without the second-chance
            | pool this post would have been lost again (so thanks
            | Daniel!)
        
       | curious_cat_163 wrote:
       | > Agents can be used for open-ended problems where it's difficult
       | or impossible to predict the required number of steps, and where
       | you can't hardcode a fixed path. The LLM will potentially operate
       | for many turns, and you must have some level of trust in its
       | decision-making. Agents' autonomy makes them ideal for scaling
       | tasks in trusted environments.
       | 
       | The questions then become:
       | 
       | 1. When can you (i.e. a person who wants to build systems with
       | them) trust them to make decisions on their own?
       | 
       | 2. What type of trusted environments are we talking about?
       | (Sandboxing?)
       | 
       | So, that all requires more thought -- perhaps by some folks who
       | hang out at this site. :)
       | 
       | I suspect that someone will come up with a "real-world"
       | application at a non-tech-first enterprise company and let us
       | know.
        
       | timdellinger wrote:
       | My personal view is that the roadmap to AGI requires an LLM
       | acting as a prefrontal cortex: something designed to think about
       | thinking.
       | 
       | It would decide what circumstances call for double-checking facts
       | for accuracy, which would hopefully catch hallucinations. It
       | would write its own acceptance criteria for its answers, etc.
       | 
       | It's not clear to me how to train each of the sub-models
       | required, or how big (or small!) they need to be, or what
       | architecture works best. But I think that complex architectures
       | are going to win out over the "just scale up with more data and
       | more compute" approach.
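        | 
        | As a crude stand-in for that today: one call writes acceptance
        | criteria, another answers, a third grades the answer against
        | the criteria. A toy sketch (model name and prompts are made
        | up, untested):
        | 
        |   import anthropic
        | 
        |   client = anthropic.Anthropic()
        | 
        |   def ask(prompt):
        |       r = client.messages.create(
        |           model="claude-3-5-sonnet-20241022",
        |           max_tokens=1024,
        |           messages=[{"role": "user", "content": prompt}],
        |       )
        |       return r.content[0].text
        | 
        |   q = "Summarize the causes of the 2008 financial crisis."
        |   criteria = ask("Write acceptance criteria for a correct, "
        |                  "well-sourced answer to: " + q)
        |   answer = ask(q)
        |   verdict = ask("Grade the answer against the criteria. "
        |                 "Start with PASS or FAIL.\nCriteria:\n" +
        |                 criteria + "\nAnswer:\n" + answer)
        |   if verdict.startswith("FAIL"):
        |       answer = ask("Revise the answer to satisfy the "
        |                    "critique.\nAnswer:\n" + answer +
        |                    "\nCritique:\n" + verdict)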
        
         | zby wrote:
          | IMHO with a simple loop LLMs are already capable of some meta
          | thinking, even without any new internal architectures. Where
          | they still fail, for me, is that LLMs cannot catch their own
          | mistakes, even obvious ones. With GPT-3.5 I had a persistent
          | problem with the question "Who is older, Annie Morton or
          | Terry Richardson?". I was giving it Wikipedia, and it would
          | correctly find the birth dates of the most popular people
          | with those names - but then, instead of comparing ages, it
          | would compare birth years. And once it did that, it was
          | impossible for it to spot the error.
          | 
          | Now with 4o-mini I have a similar, if less obvious, problem.
          | 
          | Just writing this down convinced me that there are some ideas
          | to try here: taking a "report" of the thought process out of
          | context and judging it there, or changing the temperature, or
          | maybe even cross-checking with a different model.
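          | 
          | A sketch of the cross-check idea - answer with one model,
          | then hand a "report" of the reasoning, stripped of all other
          | context, to a second model to judge (model names
          | illustrative, untested):
          | 
          |   import anthropic
          | 
          |   client = anthropic.Anthropic()
          | 
          |   def ask(model, prompt):
          |       r = client.messages.create(
          |           model=model,
          |           max_tokens=512,
          |           messages=[{"role": "user", "content": prompt}],
          |       )
          |       return r.content[0].text
          | 
          |   q = "Who is older, Annie Morton or Terry Richardson?"
          |   report = ask("claude-3-5-haiku-20241022",
          |                q + "\nShow your reasoning step by step.")
          |   # The judge sees only the report, so it isn't anchored
          |   # on the same conversational context as the worker.
          |   verdict = ask("claude-3-5-sonnet-20241022",
          |                 "Below is someone's reasoning. Point out "
          |                 "any logic errors, e.g. comparing birth "
          |                 "years instead of full dates.\n\n" + report)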
        
         | delichon wrote:
         | Here's Lloyd Watts describing his work on this.
         | 
         | https://www.youtube.com/watch?v=b2Hp0Jk9d4I&list=LL&index=5
        
       | serjester wrote:
       | Couldn't agree more with this - too many people rush to build
       | autonomous agents when their problem could easily be defined as a
        | DAG workflow. Agents massively increase the degrees of freedom
        | in your system, making it much more challenging to evaluate
        | systematically.
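        | 
        | For a lot of problems the whole "agent" can be an explicit
        | DAG where only the node bodies call an LLM. A minimal sketch
        | (model name and prompts illustrative, untested):
        | 
        |   import anthropic
        |   from graphlib import TopologicalSorter
        | 
        |   client = anthropic.Anthropic()
        | 
        |   def llm(prompt):
        |       r = client.messages.create(
        |           model="claude-3-5-sonnet-20241022",
        |           max_tokens=1024,
        |           messages=[{"role": "user", "content": prompt}],
        |       )
        |       return r.content[0].text
        | 
        |   # Nodes read/write a shared dict; edges are explicit, so
        |   # every path through the system is enumerable and testable.
        |   def summarize(ctx):
        |       ctx["summary"] = llm("Summarize:\n" + ctx["doc"])
        | 
        |   def extract(ctx):
        |       ctx["facts"] = llm("List key facts:\n" + ctx["doc"])
        | 
        |   def report(ctx):
        |       ctx["report"] = llm("Write a report from:\n" +
        |                           ctx["summary"] + "\n" +
        |                           ctx["facts"])
        | 
        |   STEPS = {"summarize": summarize, "extract": extract,
        |            "report": report}
        |   DEPS = {"summarize": set(), "extract": set(),
        |           "report": {"summarize", "extract"}}
        | 
        |   ctx = {"doc": open("input.txt").read()}
        |   for name in TopologicalSorter(DEPS).static_order():
        |       STEPS[name](ctx)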
        
       | tonyhb wrote:
        | It looks like agents are less about DAG workflows or fully
        | autonomous "networks of agents", and more about a stateful
        | network:
       | 
       | * A "network of agents" is a system of agents and tools
       | 
       | * That run and build up state (both "memory" and actual state via
       | tool use)
       | 
       | * Which is then inspected when routing as a kind of "state
       | machine".
       | 
       | * Routing should specify which agent (or agents, in parallel) to
       | run next, via that state.
       | 
       | * Routing can also use other agents (routing agents) to figure
       | out what to do next, instead of code.
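        | 
        | A hand-rolled sketch of that loop (NOT AgentKit's actual API
        | - every name below is made up):
        | 
        |   def plan_agent(state):   # placeholder for an LLM call
        |       state["plan"] = "fix the failing test"
        | 
        |   def code_agent(state):   # placeholder for LLM + tool use
        |       state["patch"] = "...diff..."
        | 
        |   def route(state):
        |       # Inspect accumulated state like a state machine.
        |       # This could itself be an LLM ("routing agent").
        |       if "plan" not in state:
        |           return "plan"
        |       if "patch" not in state:
        |           return "code"
        |       return None  # done
        | 
        |   agents = {"plan": plan_agent, "code": code_agent}
        |   state = {}
        |   while (step := route(state)) is not None:
        |       agents[step](state)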
       | 
       | We're codifying this with durable workflows in a prototypical
       | library -- AgentKit: https://github.com/inngest/agent-kit/ (docs:
       | https://agentkit.inngest.com/overview).
       | 
       | It took less than a day to get a network of agents to correctly
       | fix swebench-lite examples. It's super early, but very fun. One
       | of the cool things is that this uses Inngest under the hood, so
       | you get all of the classic durable execution/step
       | function/tracing/o11y for free, but it's just regular code that
       | you write.
        
       | brotchie wrote:
        | Have been building agents for the past 2 years; my tl;dr is:
       | 
       |  _Agents are Interfaces, Not Implementations_
       | 
        | The current zeitgeist seems to think of agents as passthrough
        | agents: e.g. a lite wrapper around a core that's almost 100%
        | an LLM.
       | 
       | The most effective agents I've seen, and have built, are largely
       | traditional software engineering with a sprinkling of LLM calls
       | for "LLM hard" problems. LLM hard problems are problems that can
       | ONLY be solved by application of an LLM (creative writing, text
       | synthesis, intelligent decision making). Leave all the problems
       | that are amenable to decades of software engineering best
       | practice to good old deterministic code.
       | 
        | I've been calling systems like this _"Transitional Software
        | Design"_. That is, they're mostly a traditional software
        | application under the hood (deterministic, well-structured
        | code, separation of concerns) with judicious use of LLMs where
        | required.
       | 
       | Ultimately, users care about what the agent does, not how it does
       | it.
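        | 
        | A concrete (invented) illustration of that split -
        | deterministic code everywhere, one narrow LLM call for the
        | single LLM-hard step:
        | 
        |   import re
        |   import anthropic
        | 
        |   client = anthropic.Anthropic()
        | 
        |   def summarize_ticket(raw: str) -> dict:
        |       # Traditional software: parsing and validation.
        |       m = re.search(r"TICKET-\d+", raw)
        |       if m is None:
        |           raise ValueError("no ticket id found")
        |       # LLM-hard: free-text synthesis. One narrow call.
        |       r = client.messages.create(
        |           model="claude-3-5-sonnet-20241022",
        |           max_tokens=256,
        |           messages=[{"role": "user", "content":
        |                      "Summarize in one sentence:\n" + raw}],
        |       )
        |       return {"id": m.group(),
        |               "summary": r.content[0].text}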
       | 
       | The biggest differentiator I've seen between agents that work and
       | get adoption, and those that are eternally in a demo phase, is
       | related to the cardinality of the state space the agent is
        | operating in. Too many folks try to "boil the ocean" and
        | implement a general-purpose capability: e.g. generate Python
        | code to do something, or synthesize SQL from natural language.
       | 
       | The projects I've seen that work really focus on reducing the
       | state space of agent decision making down to the smallest
       | possible set that delivers user value.
       | 
       | e.g. Rather than generating arbitrary SQL, work out a set of ~20
       | SQL templates that are hyper-specific to the business problem
       | you're solving. Parameterize them with the options for select,
       | filter, group by, order by, and the subset of aggregate
        | operations that are relevant. Then let the agent choose the
        | right
       | template + parameters from a relatively small finite set of
       | options.
       | 
        | ^^^ The delta in agent quality between "boiling the ocean" and
        | "free choice over a small state space" is night and day.
       | It lets you deploy early, deliver value, and start getting user
       | feedback.
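        | 
        | A sketch of the template-selection idea (templates, schema,
        | and prompts all invented; the JSON parsing is naive):
        | 
        |   import json
        |   import anthropic
        | 
        |   client = anthropic.Anthropic()
        | 
        |   TEMPLATES = {
        |       "revenue_by_region":
        |           "SELECT region, SUM(amount) FROM orders "
        |           "WHERE order_date >= :start GROUP BY region",
        |       "top_customers":
        |           "SELECT customer_id, SUM(amount) AS total "
        |           "FROM orders GROUP BY customer_id "
        |           "ORDER BY total DESC LIMIT :n",
        |   }
        | 
        |   def pick(question: str) -> dict:
        |       r = client.messages.create(
        |           model="claude-3-5-sonnet-20241022",
        |           max_tokens=256,
        |           messages=[{"role": "user", "content":
        |               "Pick one template and its parameters for "
        |               "this question. Reply as JSON like "
        |               '{"template": ..., "params": {...}}.\n'
        |               "Templates: " + ", ".join(TEMPLATES) + "\n"
        |               "Question: " + question}],
        |       )
        |       return json.loads(r.content[0].text)
        | 
        |   choice = pick("Who are our 5 biggest customers?")
        |   sql = TEMPLATES[choice["template"]]
        |   # Bind choice["params"] via your DB driver, never by
        |   # string interpolation.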
       | 
        | Building Transitional Software Systems:
        | 
        |   1. Deeply understand the domain and CUJs,
        |   2. Segment out the system into "problems that traditional
        |      software is good at solving" and "LLM-hard problems",
        |   3. For the LLM-hard problems, work out the smallest
        |      possible state space of decision making,
        |   4. Build the system, and get users using it,
        |   5. Gradually expand the state space as feedback flows in
        |      from users.
        
         | samdjstephens wrote:
          | There'll always be an advantage for those who understand the
          | problem they're solving, for sure.
         | 
          | The balance between traditional software components and LLM-
          | driven components in a system is an interesting topic - I
          | wonder how the capabilities of future generations of
          | foundation models will change it?
        
       | simonw wrote:
       | This is by far the most practical piece of writing I've seen on
       | the subject of "agents" - it includes actionable definitions,
       | then splits most of the value out into "workflows" and describes
       | those in depth with example applications.
       | 
       | There's also a cookbook with useful code examples:
       | https://github.com/anthropics/anthropic-cookbook/tree/main/p...
       | 
       | Blogged about this here:
       | https://simonwillison.net/2024/Dec/20/building-effective-age...
        
         | 3abiton wrote:
          | I'm glad they are publishing their cookbook recipes on GitHub
          | too. OpenAI used to be more active there.
        
       | ramesh31 wrote:
        | Key to understanding the power of agentic workflows is tool
        | usage. You don't have to write logic anymore; you simply give
        | an agent the tools it needs to accomplish a task and ask it to
        | do so. Models like the latest Sonnet have gotten so advanced
        | that coding abilities are reaching superhuman levels. The
        | hallucinations and "jitter" of models from 1-2 years ago have
        | largely gone away. They can be reasoned about now, and you can
        | build reliable systems with them.
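        | 
        | "Giving the agent a tool" is mostly writing a good schema and
        | description. A minimal example with the Anthropic tools API
        | (the lookup_order tool itself is invented):
        | 
        |   import anthropic
        | 
        |   client = anthropic.Anthropic()
        | 
        |   tools = [{
        |       "name": "lookup_order",
        |       "description": "Fetch an order record by order id. "
        |                      "Use when the user asks about a "
        |                      "specific order's status or contents.",
        |       "input_schema": {
        |           "type": "object",
        |           "properties": {"order_id": {"type": "string"}},
        |           "required": ["order_id"],
        |       },
        |   }]
        | 
        |   r = client.messages.create(
        |       model="claude-3-5-sonnet-20241022",
        |       max_tokens=1024,
        |       tools=tools,
        |       messages=[{"role": "user",
        |                  "content": "Where is order A-123?"}],
        |   )
        |   # If r.stop_reason == "tool_use", run the tool and send a
        |   # tool_result block back; the model decides when to call.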
        
         | minimaxir wrote:
         | > you simply give an agent the tools
         | 
         | That isn't simple. There is a lot of nuance in tool definition.
        
           | ramesh31 wrote:
           | >That isn't simple. There is a lot of nuance in tool
           | definition.
           | 
           | Simple relative to the equivalent in traditional coding
           | techniques. If my system needs to know about some reference
           | material, I no longer need to build out endpoints and handle
           | all of that, or modify any kind of controller code. I just
           | ask the agent to include the information, and it intuitively
            | knows how to do that with a DB tool in its context.
        
       | websku wrote:
        | Indeed, we've seen this approach as well. All these
        | "frameworks" become too complicated in real business cases.
        
       ___________________________________________________________________
       (page generated 2024-12-20 23:00 UTC)