[HN Gopher] Building Effective "Agents"
       ___________________________________________________________________
        
       Building Effective "Agents"
        
       Author : jascha_eng
       Score  : 510 points
        Date   : 2024-12-20 12:29 UTC (1 day ago)
        
 (HTM) web link (www.anthropic.com)
 (TXT) w3m dump (www.anthropic.com)
        
       | jascha_eng wrote:
        | I put "agents" in quotes because Anthropic actually talks more
        | about what they call "workflows". And imo this is where the real
        | value of LLMs currently lies: workflow automation.
       | 
        | They also say that using LangChain and other frameworks is mostly
        | unnecessary and does more harm than good. They instead argue for
        | some simple patterns, directly at the API level. Not dissimilar
        | to the old-school Gang of Four software engineering patterns.
       | 
       | Really like this post as a guidance for how to actually build
       | useful tools with LLMs. Keep it simple, stupid.
        
         | curious_cat_163 wrote:
         | Indeed. Very clarifying.
         | 
          | I would just posit that they do make a distinction between
          | workflows and agents.
        
         | Philpax wrote:
         | Aren't you editorialising by doing so?
        
           | jascha_eng wrote:
           | I guess a little. I really liked the read though, it put in
           | words what I couldn't and I was curious if others felt the
           | same.
           | 
            | However, the post was posted here yesterday and didn't really
            | get a lot of traction. I thought this was partially because
            | of the term "agentic", which the community seems a bit
            | fatigued by. So I put it in quotes to highlight that
            | Anthropic themselves deem it a little vague, and hopefully to
            | spark more interest. I don't think it messes with their
            | message too much?
           | 
            | Honestly it didn't matter anyway; without the second-chance
            | pool this post would have been lost again (so thanks,
            | Daniel!)
        
         | rybosome wrote:
         | I felt deeply vindicated by their assessment of these
         | frameworks, in particular LangChain.
         | 
         | I've built and/or worked on a few different LLM-based
         | workflows, and LangChain definitely makes things worse in my
         | opinion.
         | 
          | What it boils down to is that we are still coming to understand
          | the right development patterns for agents and agentic
          | workflows. LangChain made choices about how to abstract things
          | that are not general or universal enough to be useful.
        
           | wahnfrieden wrote:
           | The article does not mention the LangChain framework.
           | LangGraph is a different framework, have you tried it?
        
         | ankit219 wrote:
          | Deploying in production, the current agentic systems do not
          | really work well. Workflow automation does. The reason is very
          | native to LLMs, but also incredibly basic. Every agentic system
          | starts with a planning and reasoning module, where an LLM
          | evaluates the task it is given and plans how to accomplish it,
          | before moving on to the next steps.
          | 
          | When an agent is given a task, it inevitably comes up with
          | different plans on different tries due to the inherent nature
          | of LLMs. Most companies want this step to be predictable, so
          | they end up removing it from the system and doing it manually,
          | thus turning it into workflow automation vs an agentic system.
          | I think this is what people actually mean when they say they
          | want to deploy agents in production. LLMs are great at
          | automation, not great at problem solving. Examples I have seen:
          | customer support (you want predictability), lead mining,
          | marketing copy generation, code flows and architecture, product
          | spec generation, etc.
          | 
          | The next leap for AI systems is going to be whether they can
          | solve challenging problems at companies - being the experts vs
          | doing the tasks they are assigned. Those should really be
          | called agents, not the current ones.
        
         | Kydlaw wrote:
          | In fact they do mention LangGraph (the agent framework from the
          | LangChain company). Imo LangGraph is a much more thoughtful and
          | better-built piece of software than the LangChain framework.
          | 
          | As I said, they already mention LangGraph in the article, so
          | Anthropic's conclusions still hold (i.e. KISS).
          | 
          | But this thread is going in the wrong direction when talking
          | about LangChain.
        
           | jascha_eng wrote:
           | I'm lumping them all in the same category tbh. They say to
           | just use the model libraries directly or a thin abstraction
           | layer (like litellm maybe?) if you want to keep flexibility
           | to change models easily.
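            | 
            | A minimal sketch of that thin layer using litellm (model
            | names illustrative; litellm mirrors the OpenAI response
            | shape, so swapping providers is a one-string change):
            | 
            |     # pip install litellm
            |     from litellm import completion
            | 
            |     MODEL = "anthropic/claude-3-5-sonnet-20241022"
            | 
            |     def ask(prompt, model=MODEL):
            |         resp = completion(
            |             model=model,
            |             messages=[{"role": "user", "content": prompt}],
            |         )
            |         return resp.choices[0].message.content
            | 
            |     print(ask("Summarize this ticket: ..."))
            |     print(ask("Summarize this ticket: ...", model="gpt-4o"))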
        
       | curious_cat_163 wrote:
       | > Agents can be used for open-ended problems where it's difficult
       | or impossible to predict the required number of steps, and where
       | you can't hardcode a fixed path. The LLM will potentially operate
       | for many turns, and you must have some level of trust in its
       | decision-making. Agents' autonomy makes them ideal for scaling
       | tasks in trusted environments.
       | 
       | The questions then become:
       | 
       | 1. When can you (i.e. a person who wants to build systems with
       | them) trust them to make decisions on their own?
       | 
       | 2. What type of trusted environments are we talking about?
       | (Sandboxing?)
       | 
       | So, that all requires more thought -- perhaps by some folks who
       | hang out at this site. :)
       | 
       | I suspect that someone will come up with a "real-world"
       | application at a non-tech-first enterprise company and let us
       | know.
        
         | ripped_britches wrote:
         | Just take any example and think how a human would break it down
         | with decision trees.
         | 
         | You are building an AI system to respond to your email.
         | 
         | The first agent decides whether the new email should be
         | responded to, yes or no.
         | 
         | If no, it can send it to another LLM call that decides to
         | archive it or leave it in the inbox for the human.
         | 
          | If yes, it sends it to a classifier that decides what type of
          | response is required.
         | 
          | Maybe some work emails require only something brief, like a
          | "congrats!" to all those new-feature-launch emails you get
          | internally.
         | 
         | Or others that are inbound sales emails that need to go out to
         | another system that fetches product related knowledge to craft
         | a response with the right context. Followed by a checker call
         | that makes sure the response follows brand guidelines.
         | 
          | The point is that all of these steps are completely
          | hypothetical, but you can imagine how providing some loose set
          | of instructions, function calls, and procedural limits can
          | easily classify things and minimize the error rate.
         | 
         | You can do this for any workflow by creatively combining
         | different function calls, recursion, procedural limits, etc.
         | And if you build multiple different decision trees/workflows,
         | you can A/B test those and use LLM-as-a-judge to score the
         | performance. Especially if you're working on a task with lots
         | of example outputs.
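          | 
          | A rough sketch of that tree in code (llm_yes_no, llm_classify,
          | llm_draft, fetch_product_docs and escalate_to_human are
          | hypothetical single-purpose LLM calls and helpers):
          | 
          |     def handle_email(email):
          |         # Step 1: respond at all? One narrow yes/no call.
          |         if not llm_yes_no("Should this get a reply?", email):
          |             return ("archive" if llm_yes_no(
          |                 "Safe to archive?", email) else "leave")
          |         # Step 2: classify the type of response required.
          |         kind = llm_classify(email, ["brief_ack", "sales"])
          |         if kind == "brief_ack":
          |             return llm_draft("Write a one-line congrats.",
          |                              email)
          |         # Step 3: fetch context, draft, then check the draft.
          |         draft = llm_draft("Draft a sales reply.", email,
          |                           context=fetch_product_docs(email))
          |         if llm_yes_no("Follows brand guidelines?", draft):
          |             return draft
          |         return escalate_to_human(email)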
         | 
          | As for trusted environments: assume every single LLM call has
          | been hijacked, don't trust its input/output, and you'll be
          | good. I put mine in their own Cloudflare Workers where they
          | can't do any damage beyond giving an odd response to the user.
        
           | skydhash wrote:
           | > _The first agent decides whether the new email should be
           | responded to, yes or no._
           | 
            | How would you trust that the agent is following the criteria,
            | and how would you be sure the criteria are specific enough?
            | Say someone you just met told you they were going to send you
            | something via email, but the agent misinterprets the message
            | due to missing context and responds in a generic manner,
            | leading to misunderstanding.
           | 
            | > _assume every single LLM call has been hijacked, don't
            | trust its input/output, and you'll be good._
           | 
           | Which is not new. But with formal languages, you have a more
           | precise definition of what acceptable inputs are (the whole
           | point of formalism is precise definitions). With LLM
           | workflows, the whole environment should be assumed to be
           | public information. And you should probably add a fine point
           | that the output does not engage you in anything.
        
             | TobTobXX wrote:
              | > How would you trust that the agent is following the
              | criteria, and how would you be sure the criteria are
              | specific enough?
             | 
              | How do you know whether a spam filter heuristic works as
              | intended?
             | 
             | You test it. Hard. On the thousands of emails in your
             | archive, on edge-cases you prepare manually, and on the
              | incoming mail. If it doesn't work for some cases, write
              | tests that cover them, adjust the prompt, and run the test
              | suite.
             | 
             | It won't ever work in 100% of all cases, but neither do
             | spam filters and we still use them.
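              | 
              | e.g. a tiny regression harness over an archived corpus
              | (classify_email is the hypothetical call under test):
              | 
              |     import json
              | 
              |     # cases.jsonl: {"email": "...", "expected": "reply"}
              |     def test_classifier(path="cases.jsonl", floor=0.95):
              |         cases = [json.loads(l) for l in open(path)]
              |         hits = sum(classify_email(c["email"]) ==
              |                    c["expected"] for c in cases)
              |         score = hits / len(cases)
              |         # Fail the suite if a prompt tweak regresses.
              |         assert score >= floor, f"accuracy {score:.2%}"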
        
       | timdellinger wrote:
       | My personal view is that the roadmap to AGI requires an LLM
       | acting as a prefrontal cortex: something designed to think about
       | thinking.
       | 
       | It would decide what circumstances call for double-checking facts
       | for accuracy, which would hopefully catch hallucinations. It
       | would write its own acceptance criteria for its answers, etc.
       | 
       | It's not clear to me how to train each of the sub-models
       | required, or how big (or small!) they need to be, or what
       | architecture works best. But I think that complex architectures
       | are going to win out over the "just scale up with more data and
       | more compute" approach.
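        | 
        | One way to prototype the "write its own acceptance criteria"
        | idea with a single model today (llm() is a hypothetical
        | one-completion helper):
        | 
        |     def answer_with_self_check(question, max_tries=3):
        |         # A second pass acts as a crude prefrontal cortex:
        |         # state acceptance criteria, then grade each draft.
        |         criteria = llm("List acceptance criteria for a "
        |                        "correct answer to: " + question)
        |         for _ in range(max_tries):
        |             draft = llm(question)
        |             verdict = llm(f"Criteria:\n{criteria}\n\n"
        |                           f"Answer:\n{draft}\n\n"
        |                           "Pass or fail? One word.")
        |             if verdict.strip().lower().startswith("pass"):
        |                 return draft
        |         return draft  # best effort after max_tries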
        
         | zby wrote:
          | IMHO, with a simple loop, LLMs are already capable of some meta
          | thinking, even without any new internal architectures. Where it
          | still fails, for me, is that LLMs cannot catch their own
          | mistakes, even obvious ones. With GPT-3.5 I had a persistent
          | problem with the following question: "Who is older, Annie
          | Morton or Terry Richardson?". I was giving it Wikipedia and it
          | was correctly finding the birth dates of the most popular
          | people with those names - but then, instead of comparing ages,
          | it was comparing birth years. And once it did that, it was
          | impossible for it to spot the error.
          | 
          | Now with 4o-mini I have a similar, even if less obvious,
          | problem.
          | 
          | Just writing this down convinced me that there are some ideas
          | to try here - taking a 'report' of the thought process out of
          | context and judging it there, or changing the temperature, or
          | maybe even cross-checking with a different model?
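          | 
          | e.g. a sketch of that last idea - judge the report out of
          | context, optionally with a second model (llm() hypothetical):
          | 
          |     def cross_check(report):
          |         # Strip the original conversation so the judge
          |         # can't anchor on the first model's framing.
          |         prompt = ("A junior researcher produced this "
          |                   "reasoning. Point out any logical error, "
          |                   "or reply OK.\n\n" + report)
          |         verdict = llm(model="gpt-4o", prompt=prompt,
          |                       temperature=0.0)
          |         return verdict.strip() == "OK"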
        
           | tomrod wrote:
            | Brains are split internally, with each half having its own
            | monologue. One happens to have command.
        
             | furyofantares wrote:
             | I don't think there's reason to believe both halves have a
             | monologue, is there? Experience, yes, but doesn't only one
             | half do language?
        
               | ggm wrote:
                | So if, like me, you have an interior dialogue: which half
                | is speaking and which is listening, or is it the same
                | one? I do not ascribe the speaker or listener to a lobe,
                | but whatever the language and comprehension centre(s)
                | is(are), it can do both at the same time.
        
               | furyofantares wrote:
                | Same half. My understanding is that in split-brain
                | patients, one half has an extremely limited ability to
                | parse language and no ability to produce it.
        
               | Filligree wrote:
               | Neither of my halves need a monologue, thanks.
        
           | zby wrote:
            | Ah yeah - actually I tested taking it out of context. This
            | is the thing that surprised me - I thought it was about
            | writing itself into a corner - but even in a completely
            | different context the LLM consistently makes an obvious
            | mistake. Here is the example: https://chatgpt.com/share/67667
            | 827-dd88-8008-952b-242a40c2ac...
            | 
            | Janet Waldo was playing Corliss Archer on radio - and the
            | quote the LLM found in Wikipedia confirmed it. But the
            | question was about the film - and the LLM cannot spot the
            | gap in its reasoning - even when I try to warn it by saying
            | the report came from a junior researcher.
        
         | neom wrote:
          | After I read Attention Is All You Need, my first thought was:
         | "Orchestration is all you need". When 4o came out I published
         | this: https://b.h4x.zip/agi/
        
         | naasking wrote:
         | Interesting, because I almost think of it the opposite way.
          | LLMs are like System 1 thinking: fast, intuitive, weighing
          | what seems most probable given what you know/have
          | experienced/have been trained on. System 2 thinking is
          | different: more careful, slower, logical, deductive, more like
          | symbolic reasoning. And then some metasystem ties these two
          | together and makes them work cohesively.
        
         | mikebelanger wrote:
         | > But I think that complex architectures are going to win out
         | over the "just scale up with more data and more compute"
         | approach.
         | 
          | I'm not sure about AGI, but for specialized jobs/tasks (i.e.
          | a marketing agent that's familiar with your products and knows
          | how to write copy for them), that approach will win over "just
          | add more compute/data" mass-market LLMs. This article does
          | encourage us to keep that architecture simple, which is
          | refreshing to hear. Kind of the AI version of the rule of least
          | power.
          | 
          | Admittedly, I have a degree in Cognitive Science, which tended
          | to focus on good ol' fashioned AI, so I have my biases.
        
       | serjester wrote:
       | Couldn't agree more with this - too many people rush to build
       | autonomous agents when their problem could easily be defined as a
       | DAG workflow. Agents increase the degrees of freedom in your
        | system exponentially, making it so much more challenging to
        | evaluate systematically.
        
       | tonyhb wrote:
        | It looks like agents are less about DAG workflows and fully
        | autonomous "networks of agents", and more of a stateful network:
       | 
       | * A "network of agents" is a system of agents and tools
       | 
       | * That run and build up state (both "memory" and actual state via
       | tool use)
       | 
       | * Which is then inspected when routing as a kind of "state
       | machine".
       | 
       | * Routing should specify which agent (or agents, in parallel) to
       | run next, via that state.
       | 
       | * Routing can also use other agents (routing agents) to figure
       | out what to do next, instead of code.
       | 
       | We're codifying this with durable workflows in a prototypical
       | library -- AgentKit: https://github.com/inngest/agent-kit/ (docs:
       | https://agentkit.inngest.com/overview).
       | 
       | It took less than a day to get a network of agents to correctly
       | fix swebench-lite examples. It's super early, but very fun. One
       | of the cool things is that this uses Inngest under the hood, so
       | you get all of the classic durable execution/step
       | function/tracing/o11y for free, but it's just regular code that
       | you write.
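        | 
        | In pseudocode, the routing-over-state loop above looks roughly
        | like this (plan_agent and route are hypothetical):
        | 
        |     def run_network(task):
        |         state = {"task": task, "history": []}
        |         agent = plan_agent  # entry point
        |         while agent is not None:
        |             result = agent(state)  # may call tools, add state
        |             state["history"].append(result)
        |             # Routing inspects accumulated state like a state
        |             # machine; it can be plain code or a routing agent.
        |             agent = route(state)   # next agent or None
        |         return state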
        
       | brotchie wrote:
        | Have been building agents for the past 2 years; my tl;dr is:
        | 
        |  _Agents are Interfaces, Not Implementations_
        | 
        | The current zeitgeist seems to think of agents as passthrough
        | agents: e.g. a light wrapper around a core that's almost 100% an
        | LLM.
       | 
       | The most effective agents I've seen, and have built, are largely
       | traditional software engineering with a sprinkling of LLM calls
       | for "LLM hard" problems. LLM hard problems are problems that can
       | ONLY be solved by application of an LLM (creative writing, text
       | synthesis, intelligent decision making). Leave all the problems
       | that are amenable to decades of software engineering best
       | practice to good old deterministic code.
       | 
        | I've been calling systems like this _"Transitional Software
        | Design"_. That is, they're mostly a traditional software
        | application under the hood (deterministic, well-structured code,
        | separation of concerns) with judicious use of LLMs where
        | required.
       | 
       | Ultimately, users care about what the agent does, not how it does
       | it.
       | 
       | The biggest differentiator I've seen between agents that work and
       | get adoption, and those that are eternally in a demo phase, is
       | related to the cardinality of the state space the agent is
        | operating in. Too many folks try to "boil the ocean" by
        | implementing a general-purpose capability: e.g. generate Python
        | code to do something, or synthesize SQL from natural language.
       | 
       | The projects I've seen that work really focus on reducing the
       | state space of agent decision making down to the smallest
       | possible set that delivers user value.
       | 
       | e.g. Rather than generating arbitrary SQL, work out a set of ~20
       | SQL templates that are hyper-specific to the business problem
       | you're solving. Parameterize them with the options for select,
       | filter, group by, order by, and the subset of aggregate
        | operations that are relevant. Then let the agent choose the right
       | template + parameters from a relatively small finite set of
       | options.
       | 
       | ^^^ the delta in agent quality between "boiling the ocean" vs
       | "agent's free choice over a small state space" is night and day.
       | It lets you deploy early, deliver value, and start getting user
       | feedback.
       | 
        | Building Transitional Software Systems:
        | 
        |   1. Deeply understand the domain and CUJs,
        |   2. Segment out the system into "problems that traditional
        |      software is good at solving" and "LLM-hard problems",
        |   3. For the LLM-hard problems, work out the smallest possible
        |      state space of decision making,
        |   4. Build the system, and get users using it,
        |   5. Gradually expand the state space as feedback flows in from
        |      users.
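        | 
        | A sketch of the SQL-template approach above - the model fills in
        | a constrained choice instead of writing raw SQL (llm_choose and
        | the schema are illustrative):
        | 
        |     TEMPLATES = {
        |         "revenue_by_region":
        |             "SELECT region, SUM(amount) FROM sales "
        |             "WHERE sale_date >= %(since)s "
        |             "GROUP BY region ORDER BY 2 {dir}",
        |         # ... ~20 hyper-specific templates
        |     }
        | 
        |     def run_query(question, db):
        |         # The agent picks from a small, finite set of options.
        |         choice = llm_choose(
        |             question, options=list(TEMPLATES),
        |             params={"since": "date", "dir": ["ASC", "DESC"]})
        |         sql = TEMPLATES[choice["template"]].format(
        |             dir=choice["dir"])
        |         return db.execute(sql, {"since": choice["since"]})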
        
         | samdjstephens wrote:
          | There'll always be an advantage for those who understand the
          | problem they're solving, for sure.
         | 
          | The balance of traditional software components and LLM-driven
          | components in a system is an interesting topic - I wonder how
          | the capabilities of future generations of foundation models
          | will change it?
        
           | brotchie wrote:
            | I'm certain the end state is "one model to rule them all",
            | hence the "transitional".
           | 
           | Just that the pragmatic approach, today, given current LLM
           | capabilities, is to minimize the surface area / state space
           | that the LLM is actuating. And then gradually expand that
           | until the whole system is just a passthrough. But starting
           | with a passthrough kinda doesn't lead to great products in
           | December 2024.
        
         | CharlieDigital wrote:
         | Same experience.
         | 
         | The smaller and more focused the context, the higher the
         | consistency of output, and the lower the chance of jank.
         | 
         | Fundamentally no different than giving instructions to a junior
         | dev. Be more specific -- point them to the right docs, distill
         | the requirements, identify the relevant areas of the source --
         | to get good output.
         | 
          | My last attempt at a workflow of agents was at the 3.5 to 4
          | transition, and OpenAI wasn't good enough at that point to
          | produce consistently good output, and was slow to boot.
         | 
         | My team has taken the stance that getting consistently good
         | output from LLMs is really an ETL exercise: acquire, aggregate,
         | and transform the minimum relevant data for the output to reach
          | the desired level of quality and depth, and let the LLM do its
          | thing.
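          | 
          | In practice that stance looks something like this (helpers
          | hypothetical):
          | 
          |     def summarize_ticket(ticket_id):
          |         # Acquire: deterministic code gathers candidates.
          |         docs = (fetch_ticket(ticket_id)
          |                 + fetch_related_docs(ticket_id))
          |         # Aggregate/transform: keep the minimum relevant data.
          |         context = rank_and_trim(docs, max_tokens=2000)
          |         # Only then let the LLM do its thing on a tight input.
          |         return llm("Write a status summary.\n\n" + context)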
        
         | handfuloflight wrote:
         | When trying to do everything, they end up doing nothing.
        
         | shinryuu wrote:
          | Do you have a public example of a good agentic system? I would
          | like to experience it.
        
         | throw83288 wrote:
          | Unrelated, but since you seem to have experience here: how
          | would you recommend getting into the bleeding edge of
          | LLMs/agents? Traditional SWE is obviously on its way out, but I
          | can't even tell where to start with this new tech, and I
          | struggle to find ways to apply it to an actual project.
        
       | simonw wrote:
       | This is by far the most practical piece of writing I've seen on
       | the subject of "agents" - it includes actionable definitions,
       | then splits most of the value out into "workflows" and describes
       | those in depth with example applications.
       | 
       | There's also a cookbook with useful code examples:
       | https://github.com/anthropics/anthropic-cookbook/tree/main/p...
       | 
       | Blogged about this here:
       | https://simonwillison.net/2024/Dec/20/building-effective-age...
        
         | 3abiton wrote:
          | I'm glad they are publishing their cookbook recipes on GitHub
          | too. OpenAI used to be more active there.
        
         | NeutralForest wrote:
          | Thanks for all the write-ups on LLMs. You're on top of the
          | news, and following your blog makes it way easier to keep up
          | with what's happening and with the existing implementations.
        
         | th0ma5 wrote:
          | How do you protect against compounding errors?
        
           | tlarkworthy wrote:
            | Read the article: close the feedback loop with something
            | verifiable (e.g. tests).
        
         | Animats wrote:
         | Yes, they have actionable definitions, but they are defining
         | something quite different than the normal definition of an
         | "agent". An agent is a party who acts for another. Often this
         | comes from an employer-employee relationship.
         | 
         | This matters mostly when things go wrong. Who's responsible?
         | The airline whose AI agent gave out wrong info about airline
         | policies found, in court, that their "intelligent agent" was
         | considered an agent in legal terms. Which meant the airline was
         | stuck paying for their mistake.
         | 
          | Anthropic's definition: "Some customers define agents as fully
          | autonomous systems that operate independently over extended
          | periods, using various tools to accomplish complex tasks."
         | 
         | That's an autonomous system, not an agent. Autonomy is about
         | how much something can do without outside help. Agency is about
         | who's doing what for whom, and for whose benefit and with what
         | authority. Those are independent concepts.
        
           | solidasparagus wrote:
            | That's only one of many definitions for the word "agent"
            | outside the context of AI. Another is something that produces
            | effects on the world. Another is something that has agency.
           | 
           | Sort of interesting that we've coalesced on this term that
           | has many definitions, sometimes conflicting, but where many
           | of the definitions vaguely fit into what an "AI Agent" could
           | be for a given person.
           | 
           | But in the context of AI, Agent as Anthropic defines it is an
           | appropriate word because it is a thing that has agency.
        
             | Animats wrote:
             | > But in the context of AI, Agent as Anthropic defines it
             | is an appropriate word because it is a thing that has
             | agency.
             | 
             | That seems circular.
        
               | Nevermark wrote:
                | It would only be circular if agency were only defined as
                | "the property of being an agent". That circle of
                | reasoning isn't being proposed as a formal definition by
                | anyone.
               | 
               | Perhaps you mean tautological. In which case, an agent
               | having agency would be an informal tautology. A
               | relationship so basic to the subject matter that it
               | essentially must be true. Which would be the strongest
               | possible type of argument.
        
           | simonw wrote:
           | Where did you get the idea that your definition there is the
           | "normal" definition of agent, especially in the context of
           | AI?
           | 
           | I ask because you seem _very_ confident in it - and my
           | biggest frustration about the term  "agent" is that so many
           | people are confident that their personal definition is
           | clearly the one everyone else should be using.
        
             | PhilippGille wrote:
             | Didn't he mention it was the court's definition?
             | 
              | But I'm not sure that's true. The court didn't define
              | anything; on the contrary, they only said that (in
              | simplified terms) the chatbot was part of the website, and
              | it's reasonable to expect the info on their website to be
              | accurate.
             | 
             | The closest I could find to the chatbot being considered an
             | agent in legal terms (an entity like an employee) is this:
             | 
             | > Air Canada argues it cannot be held liable for
             | information provided by one of its agents, servants, or
             | representatives - including a chatbot.
             | 
             | Source: https://www.canlii.org/en/bc/bccrt/doc/2024/2024bcc
             | rt149/202...
        
             | JonChesterfield wrote:
             | Defining "agent" as "thing with agency" seems legitimate to
             | me, what with them being the same word.
        
               | simonw wrote:
               | That logic doesn't work for me, because many words have
               | multiple meanings. "Agency" can also be a noun that means
               | an organization that you hire - like a design agency. Or
               | it can mean the CIA.
               | 
               | I'm not saying it's not a valid definition of the term,
               | I'm pushing back on the idea that it's THE single correct
               | definition of the term.
        
           | pvg wrote:
           | AI people have been using a much broader definition of
            | 'agent' for ages, though. One from Russell and Norvig's 90s
           | textbook:
           | 
           |  _" Anything that can be viewed as perceiving its environment
           | through sensors and acting upon that environment through
           | actuators"_
           | 
           | https://en.wikipedia.org/wiki/Intelligent_agent#As_a_definit.
           | ..
        
           | jeffreygoesto wrote:
           | And "autonomous" is "having one's own laws".
           | 
           | https://www.etymonline.com/word/autonomous
        
         | dmezzetti wrote:
         | If you're looking for a lightweight open-source framework
         | designed to handle the patterns mentioned in this article:
         | https://github.com/neuml/txtai
         | 
         | Disclaimer: I'm the author of the framework.
        
         | adeptima wrote:
          | 100% agree. I did some research on workflows and durable
          | execution engines in the context of agents and RAG. I put some
          | links in a comment on the article below.
        
       | ramesh31 wrote:
        | Key to understanding the power of agentic workflows is tool
        | usage. You don't have to write logic anymore; you simply give an
        | agent the tools it needs to accomplish a task and ask it to do
        | so. Models like the latest Sonnet have gotten so advanced now
        | that coding abilities are reaching superhuman levels. All the
        | hallucinations and "jitter" of models from 1-2 years ago have
        | gone away. They can be reasoned about now, and you can build
        | reliable systems with them.
        
         | minimaxir wrote:
         | > you simply give an agent the tools
         | 
         | That isn't simple. There is a lot of nuance in tool definition.
        
           | ramesh31 wrote:
           | >That isn't simple. There is a lot of nuance in tool
           | definition.
           | 
            | Simple relative to the equivalent in traditional coding
            | techniques. If my system needs to know about some reference
            | material, I no longer need to build out endpoints and handle
            | all of that, or modify any kind of controller code. I just
            | ask the agent to include the information, and it intuitively
            | knows how to do that with a DB tool in its context.
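            | 
            | For reference, a tool definition in Anthropic's API is just
            | a JSON schema plus a description - most of the nuance lives
            | in those strings (this DB tool is illustrative):
            | 
            |     import anthropic
            | 
            |     client = anthropic.Anthropic()
            |     db_tool = {
            |         "name": "query_reference_db",
            |         "description": "Look up reference material by "
            |                        "topic. Use before answering.",
            |         "input_schema": {
            |             "type": "object",
            |             "properties": {"topic": {"type": "string"}},
            |             "required": ["topic"],
            |         },
            |     }
            |     msg = client.messages.create(
            |         model="claude-3-5-sonnet-20241022",
            |         max_tokens=1024,
            |         tools=[db_tool],
            |         messages=[{"role": "user", "content":
            |                    "What do our docs say about retries?"}],
            |     )
            |     # msg.content may now hold a tool_use block for the
            |     # caller to execute and feed back.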
        
           | ripped_britches wrote:
            | Depends on what you're building. A general assistant is
            | going to have a lot of nuance. A well-defined agent like a
            | tutor only has so many tools to call upon.
        
       | websku wrote:
        | Indeed, we've seen this approach as well. All these "frameworks"
        | become too complicated in real business cases.
        
       | OutOfHere wrote:
        | Anthropic keeps advertising its MCP (Model Context Protocol),
        | but to the extent that it doesn't support other LLMs, e.g. GPT,
        | it couldn't possibly gain adoption. I have yet to see any
        | example of MCP that can be extended to use an arbitrary LLM.
        
       | thoughtlede wrote:
       | When thinking about AI agents, there is still conflation between
       | how to decide the next step to take vs what information is needed
       | to decide the next step.
       | 
        | If runtime information is insufficient, we can use AI/ML models
        | to fill in that information. But deciding the next step could be
        | done ahead of time, assuming complete information.
       | 
        | Most AI agent examples short-circuit these two steps. When faced
        | with unstructured or insufficient information, the program asks
        | the LLM/AI model to decide the next step. Instead, we could ask
        | the LLM/AI model to structure/predict the necessary information
        | and use pre-defined rules to drive the process.
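        | 
        | i.e. something like this (llm_extract and the schema are
        | hypothetical):
        | 
        |     def next_step(ticket):
        |         # Ask the model only to structure the information...
        |         facts = llm_extract(ticket, schema={
        |             "severity": "low|med|high",
        |             "customer_tier": "free|paid",
        |         })
        |         # ...then deterministic rules pick the step, not the
        |         # LLM.
        |         if (facts["severity"] == "high"
        |                 and facts["customer_tier"] == "paid"):
        |             return "page_oncall"
        |         return "queue_for_triage"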
       | 
       | This approach will translate most [1] "Agent" examples into
       | "Workflow" examples. The quotes here are meant to imply
       | Anthropic's definition of these terms.
       | 
        | [1] I said "most" because there might be continuous-world systems
        | (such as real-world simulacra) that require a very large number
        | of rules, making it impractical to define each of them. I believe
        | those systems are an exception, not the rule.
        
       | elpalek wrote:
        | The Claude API lacks structured output, and without uniformity
        | in output it's not useful as an agent. I've had agent systems
        | break down suddenly due to degradation in output, where the
        | previously suggested JSON output hacks (from the official
        | cookbook) stopped working.
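        | 
        | One workaround that has held up better for me than prompt-side
        | JSON hacks: force a tool call whose input schema is the output
        | you want (schema and document_text illustrative):
        | 
        |     import anthropic
        | 
        |     client = anthropic.Anthropic()
        |     resp = client.messages.create(
        |         model="claude-3-5-sonnet-20241022",
        |         max_tokens=1024,
        |         tools=[{
        |             "name": "record_verdict",
        |             "description": "Record the structured result.",
        |             "input_schema": {
        |                 "type": "object",
        |                 "properties": {
        |                     "label": {"type": "string"},
        |                     "confidence": {"type": "number"},
        |                 },
        |                 "required": ["label", "confidence"],
        |             },
        |         }],
        |         # Force the model to emit schema-conforming arguments.
        |         tool_choice={"type": "tool", "name": "record_verdict"},
        |         messages=[{"role": "user", "content": document_text}],
        |     )
        |     structured = resp.content[0].input  # dict per the schema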
        
       | jsemrau wrote:
        | Agents are still an ill-defined concept in AI. While this article
        | offers a lot on orchestration, memory (mentioned only once in the
        | post) and governance are not really covered. The latter is
        | important for increasing reliability - something Ilya Sutskever
        | mentioned as important, since agents can be less deterministic in
        | their responses. Interestingly, "agency", i.e. the ability of the
        | agent to make its own decisions, is not mentioned once.
       | 
       | I work on CAAs and document my journey on my substack
       | (https://jdsmerau.substack.com)
        
         | handfuloflight wrote:
         | That URL says Not Found.
        
           | vacuity wrote:
           | Seems to be https://jdsemrau.substack.com/, also in their
           | bio.
        
             | jsemrau wrote:
             | Thanks. I was rushing out to the gym.
        
       | qianli_cs wrote:
        | Good article. I think it could emphasize supporting human
        | interactions in agentic workflows a bit more. While composing
        | workflows isn't new, involving a human in the loop introduces
        | huge complexity, especially for long-running, async processes.
        | Waiting for human input (which could take days), managing
        | retries, and avoiding errors like duplicate refunds or missed
        | updates all require careful orchestration.
       | 
        | I think this is where durable execution shines. By ensuring every
        | step in an async processing workflow is fault-tolerant and
        | durable, interruptions won't lose progress. For example, in a
        | refund workflow, a durable system can resume exactly where it
        | left off - no duplicate refunds, no lost state.
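        | 
        | A toy version of that refund example, with an idempotency key
        | per step so a resumed run can't refund twice (store,
        | wait_for_human and issue_refund are hypothetical):
        | 
        |     def durable_step(store, key, fn):
        |         # Replay-safe: if the step already ran, return the
        |         # recorded result instead of redoing the side effect.
        |         cached = store.get(key)
        |         if cached is not None:
        |             return cached
        |         result = fn()
        |         store.put(key, result)  # checkpoint before moving on
        |         return result
        | 
        |     def refund_workflow(store, order_id):
        |         approval = durable_step(
        |             store, order_id + ":approval",
        |             lambda: wait_for_human(order_id))  # may take days
        |         if approval == "approved":
        |             durable_step(
        |                 store, order_id + ":refund",
        |                 lambda: issue_refund(order_id))  # exactly once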
        
       | mediumsmart wrote:
        | I have always voted for _Unix-style "do one thing well" black
        | boxes_ as the plumbing in the ruling agent.
        | 
        | Divide and conquer, me hearties.
        
       | zer00eyz wrote:
        | Does anyone have solid examples of a real agent deployed in
        | production?
        
       | mi_lk wrote:
        | Tangent, but does anyone know what software is used to draw
        | those workflow diagrams?
        
         | huydotnet wrote:
          | They probably use their own designers. jk. The arrows look a
          | lot like Figma/FigJam.
        
       | melvinmelih wrote:
        | While I agree with the premise of keeping it simple (especially
        | about avoiding opaque and overcomplicated frameworks like
        | LangChain/LangGraph!), I do believe there's a lot more to
        | building agentic systems than this article covers.
       | 
       | I recently wrote[1] about the 4 main components of autonomous AI
       | agents (Profile, Memory, Planning & Action) and all of that can
       | still be accomplished with simple LLM calls, but there's simply a
       | lot more to think about than simple workflow orchestration if you
       | are thinking of building production-ready autonomous agentic
       | systems.
       | 
       | [1] https://melvintercan.com/p/anatomy-of-an-autonomous-ai-agent
        
       | debois wrote:
       | Note how much the principles here resemble general programming
       | principles: keep complexity down, avoid frameworks if you can,
       | avoid unnecessary layers, make debugging easy, document, and
       | test.
       | 
       | It's as if AI took over the writing-the-program part of software
       | engineering, but sort of left all the rest.
        
       | zby wrote:
       | My wish list for LLM APIs to make them more useful for 'agentic'
       | workflows:
       | 
        | Finer-grained control over the tools the LLM is supposed to use:
        | 'tool_choice' should allow giving a list of tools to choose
        | from. The point is that the list of all available tools is
        | needed to interpret past tool calls - so you cannot also use it
        | to limit the LLM's choice at a particular step. See also:
        | https://zzbbyy.substack.com/p/two-roles-of-tool-schemas
        | 
        | Control over how many tool calls can go into one request. For
        | stateful tools, multiple tool calls in one request lead to
        | confusion.
        | 
        | By the way - is anyone working with stateful tools? They often
        | seem very natural, and you would think that the LLM should
        | encounter lots of stateful interactions during training and be
        | skilled at using them. But there aren't many examples, and the
        | libraries are not really geared towards that.
        
       | zaran wrote:
        | This was an excellent writeup - I felt a bit surprised at how
        | much of it they considered "workflows" instead of agents, but I
        | think it's good to start narrowing down the terminology.
        | 
        | I think these days the main value of the LLM "agent" frameworks
        | is being able to trivially switch between model providers,
        | though even that breaks down when you start to use more esoteric
        | features that may not be implemented in cleanly overlapping
        | ways.
        
       | jmj wrote:
       | I'm part of a team that is currently #1 at the SWEBench-lite
       | benchmark. Interesting times!
        
       | adeptima wrote:
        | The whole agent thing can easily blow up in complexity.
        | 
        | Here are some challenges I personally faced recently:
        | 
        | - Durable Execution Paradigm: You may need the system to operate
        | in a "durable execution" fashion like Temporal, Hatchet, Inngest,
        | and Windmill. Your processes need to run for months, and to be
        | upgraded and restarted. Links below.
        | 
        | - FSM vs. DAG: Sometimes, a Finite State Machine (FSM) is more
        | appropriate than a Directed Acyclic Graph (DAG) for my use cases.
        | FSMs support cyclic behavior, allowing for repeated states or
        | loops (e.g., in marketing sequences). FSM done right is hard. If
        | you need an FSM, you can't use most tools without "magic"
        | hacking.
        | 
        | - Observability and Tracing: It takes time to put everything
        | together nicely in Grafana (Alloy, Tempo, Loki, Prometheus) or
        | whatever you prefer. Switching attention between multiple
        | systems is not an option, due to limited attention span and
        | "skills" issues. Most "out of the box" functionality in new
        | agent frameworks quickly becomes a liability.
        | 
        | - Token/Inference Economy: Token consumption, and identifying
        | edge cases with poor token management, is a challenge, similar
        | to Ethereum's gas consumption issues. Building a billing system
        | based on actual consumption on top of Stripe was a challenge.
        | This is even 10x harder ... at least for me ;)
        | 
        | - Context Switching: Managing context switching is akin to
        | handling concurrency and scheduling with async/await paradigms,
        | which can become complex. Simple prompts are OK, but once you
        | start juggling documents or screenshots or screen reading, it's
        | another game.
        | 
        | What I like about all of the above: it's nothing new - the
        | design patterns and architectures have all been known for a
        | while.
        | 
        | It's just hard to see that through the AI/ML buzzword storm ...
        | but once you start looking at source code, the fog clears.
       | 
       | Durable Execution / Workflow Engines
       | 
       | - Temporal https://github.com/temporalio -
       | https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...
       | 
       | - Hatchet https://news.ycombinator.com/item?id=39643136
       | 
       | - Inngest https://news.ycombinator.com/item?id=36403014
       | 
       | - Windmill https://news.ycombinator.com/item?id=35920082
       | 
       | Any comments and links on the above challenges and solutions are
       | greatly appreciated!
        
         | ldjkfkdsjnv wrote:
          | Which do you think is the best workflow engine to use here?
          | I've chosen Temporal. Engineering management and their
          | background at AWS mean the platform is rock solid.
        
       | still_w1 wrote:
        | Slightly off topic, but does anyone have a suggestion for a tool
        | to make visualizations of the different architectures like the
        | ones in this post?
        
         | asd33313131 wrote:
         | https://www.mermaidchart.com/
        
       ___________________________________________________________________
       (page generated 2024-12-21 18:00 UTC)