[HN Gopher] Building Effective "Agents"
___________________________________________________________________
Building Effective "Agents"
Author : jascha_eng
Score : 510 points
Date : 2024-12-20 12:29 UTC (1 day ago)
(HTM) web link (www.anthropic.com)
(TXT) w3m dump (www.anthropic.com)
| jascha_eng wrote:
| I put "agents" in quotes because Anthropic actually talks more
| about what they call "workflows". And imo this is where the real
| value of LLMs currently lies: workflow automation.
|
| They also say that using LangChain and other frameworks is mostly
| unnecessary and does more harm than good. They instead argue for
| using a few simple patterns directly at the API level. Not
| dissimilar to the old-school Gang of Four software engineering
| patterns.
|
| Really like this post as a guidance for how to actually build
| useful tools with LLMs. Keep it simple, stupid.
| curious_cat_163 wrote:
| Indeed. Very clarifying.
|
| I would just posit that they do make a distinction between
| workflows and agents
| Philpax wrote:
| Aren't you editorialising by doing so?
| jascha_eng wrote:
| I guess a little. I really liked the read though, it put in
| words what I couldn't and I was curious if others felt the
| same.
|
| However the post was posted here yesterday and didn't really
| have a lot of traction. I thought this was partially because
| of the term agentic, which the community seems a bit fatigued
| by. So I put it in quotes to highlight that Anthropic
| themselves deem it a little vague, and hopefully to spark more
| interest. I don't think it messes with their message too
| much?
|
| Honestly it didn't matter anyway; without the second-chance
| pool this post would have been lost again (so thanks
| Daniel!)
| rybosome wrote:
| I felt deeply vindicated by their assessment of these
| frameworks, in particular LangChain.
|
| I've built and/or worked on a few different LLM-based
| workflows, and LangChain definitely makes things worse in my
| opinion.
|
| What it boils down to is that we are still working out the right
| patterns for developing agents and agentic workflows. LangChain
| made abstraction choices that are not general or universal
| enough to be useful.
| wahnfrieden wrote:
| The article does not mention the LangChain framework.
| LangGraph is a different framework, have you tried it?
| ankit219 wrote:
| Deployed in production, current agentic systems do not really
| work well. Workflow automation does. The reason is very native
| to LLMs, but also incredibly basic. Every agentic system starts
| with a planning and reasoning module, where an LLM evaluates the
| given task and plans how to accomplish it before moving on to
| the next steps.
|
| When an agent is given a task, it inevitably comes up with
| different plans on different tries due to the inherent nature of
| LLMs. Most companies want this step to be predictable, so they
| end up removing it from the system and doing it manually, thus
| turning it into workflow automation rather than an agentic
| system. I think this is what people actually mean when they want
| to deploy agents in production. LLMs are great at automation,
| not great at problem solving. Examples I have seen: customer
| support (you want predictability), lead mining, marketing copy
| generation, code flows and architecture, product spec
| generation, etc.
|
| The next leap for AI systems is going to be whether they can
| solve challenging problems at companies - being the experts
| rather than just doing the tasks they are assigned. Those should
| really be called agents, not the current ones.
| Kydlaw wrote:
| In fact they do mention LangGraph (the agent framework from the
| LangChain company). Imo LangGraph is a much more thoughtful and
| better-built piece of software than the LangChain framework.
|
| As I said, they already mention LangGraph in the article, so
| Anthropic's conclusions still hold (i.e. KISS).
|
| But this thread is going in the wrong direction by talking
| about LangChain.
| jascha_eng wrote:
| I'm lumping them all in the same category tbh. They say to
| just use the model libraries directly or a thin abstraction
| layer (like litellm maybe?) if you want to keep flexibility
| to change models easily.
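|
| For reference, going "directly on the API level" really is only
| a few lines with the anthropic Python SDK (a rough sketch; the
| model name and prompt are just placeholders):
|
|     import anthropic
|
|     # The SDK reads ANTHROPIC_API_KEY from the environment.
|     client = anthropic.Anthropic()
|
|     reply = client.messages.create(
|         model="claude-3-5-sonnet-20241022",   # placeholder
|         max_tokens=1024,
|         system="You are a terse assistant.",
|         messages=[{"role": "user",
|                    "content": "Classify this ticket: ..."}],
|     )
|     print(reply.content[0].text)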
| curious_cat_163 wrote:
| > Agents can be used for open-ended problems where it's difficult
| or impossible to predict the required number of steps, and where
| you can't hardcode a fixed path. The LLM will potentially operate
| for many turns, and you must have some level of trust in its
| decision-making. Agents' autonomy makes them ideal for scaling
| tasks in trusted environments.
|
| The questions then become:
|
| 1. When can you (i.e. a person who wants to build systems with
| them) trust them to make decisions on their own?
|
| 2. What type of trusted environments are we talking about?
| (Sandboxing?)
|
| So, that all requires more thought -- perhaps by some folks who
| hang out at this site. :)
|
| I suspect that someone will come up with a "real-world"
| application at a non-tech-first enterprise company and let us
| know.
| ripped_britches wrote:
| Just take any example and think how a human would break it down
| with decision trees.
|
| You are building an AI system to respond to your email.
|
| The first agent decides whether the new email should be
| responded to, yes or no.
|
| If no, it can send it to another LLM call that decides to
| archive it or leave it in the inbox for the human.
|
| If yes, it sends it to a classifier that decides what type of
| response is required.
|
| Maybe there are some work emails that only require something
| brief, like "congrats!" to all those internal feature-launch
| announcements you get.
|
| Or others that are inbound sales emails and need to go out to
| another system that fetches product-related knowledge to craft
| a response with the right context, followed by a checker call
| that makes sure the response follows brand guidelines.
|
| The point is that all of these steps are completely
| hypothetical, but you can imagine how loosely providing some
| set of instructions, function calls, and procedural limits can
| easily classify things and minimize the error rate.
|
| You can do this for any workflow by creatively combining
| different function calls, recursion, procedural limits, etc.
| And if you build multiple different decision trees/workflows,
| you can A/B test those and use LLM-as-a-judge to score the
| performance. Especially if you're working on a task with lots
| of example outputs.
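|
| Sketched in code, the tree above is just nested calls around a
| single llm() helper (every prompt here is made up, and the
| helper is a stub you would wire to whatever API you use):
|
|     def llm(prompt: str) -> str:
|         # Stand-in for one model call; wire it to your API.
|         raise NotImplementedError
|
|     def triage(email: str) -> str:
|         needs_reply = llm("Does this email need a response? "
|                           "Answer yes or no.\n\n" + email)
|         if needs_reply.strip().lower() != "yes":
|             keep = llm("Answer 'archive' or 'keep' for this "
|                        "email.\n\n" + email).strip().lower()
|             return "archive" if keep == "archive" else "keep"
|         kind = llm("Classify this email as internal, sales or "
|                    "other.\n\n" + email).strip().lower()
|         if kind == "internal":
|             return llm("Write a brief friendly reply.\n\n" + email)
|         if kind == "sales":
|             draft = llm("Draft a reply using the product "
|                         "knowledge base.\n\n" + email)
|             ok = llm("Does this draft follow brand guidelines? "
|                      "Answer yes or no.\n\n" + draft)
|             if ok.strip().lower() == "yes":
|                 return draft
|         return "escalate to human"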
|
| As for trusted environments: assume every single LLM call has
| been hijacked, don't trust its input/output, and you'll be good.
| I put mine in their own Cloudflare Workers where they can't do
| any damage beyond giving an odd response to the user.
| skydhash wrote:
| > _The first agent decides whether the new email should be
| responded to, yes or no._
|
| How would you trust that the agent is following the criteria,
| and how can you be sure the criteria are specific enough? Say
| someone you just met told you they were going to send you
| something via email, but the agent misinterprets it due to
| missing context and responds in a generic manner, leading to
| a misunderstanding.
|
| > _assume every single LLM call has been hijacked and don't
| trust its input/output and you'll be good._
|
| Which is not new. But with formal languages, you have a more
| precise definition of what acceptable inputs are (the whole
| point of formalism is precise definitions). With LLM
| workflows, the whole environment should be assumed to be
| public information. And you should probably add the fine print
| that the output does not commit you to anything.
| TobTobXX wrote:
| > How would you trust that the agent is following the
| criteria, and how sure that the criteria is specific
| enough?
|
| How do you know whether a spam filter heuristic works as
| intended?
|
| You test it. Hard. On the thousands of emails in your
| archive, on edge cases you prepare manually, and on the
| incoming mail. If it doesn't work for some cases, write tests
| that cover them, adjust the prompt, and run the test
| suite.
|
| It won't ever work in 100% of all cases, but neither do
| spam filters and we still use them.
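|
| i.e. treat the prompt like any other heuristic and
| regression-test it (a rough sketch; classify_email and the
| labelled archive sample are whatever you happen to have):
|
|     def evaluate(classify, labelled):
|         # labelled: list of (email_text, expected_label) pairs
|         failures = []
|         for text, want in labelled:
|             got = classify(text)
|             if got != want:
|                 failures.append((text[:60], want, got))
|         return 1 - len(failures) / len(labelled), failures
|
|     # Re-run after every prompt tweak; fail CI if accuracy drops.
|     # accuracy, failures = evaluate(classify_email, archive_sample)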
| timdellinger wrote:
| My personal view is that the roadmap to AGI requires an LLM
| acting as a prefrontal cortex: something designed to think about
| thinking.
|
| It would decide what circumstances call for double-checking facts
| for accuracy, which would hopefully catch hallucinations. It
| would write its own acceptance criteria for its answers, etc.
|
| It's not clear to me how to train each of the sub-models
| required, or how big (or small!) they need to be, or what
| architecture works best. But I think that complex architectures
| are going to win out over the "just scale up with more data and
| more compute" approach.
| zby wrote:
| IMHO with a simple loop LLMs are already capable of some meta
| thinking, even without any new internal architectures. Where it
| still fails for me is that LLMs cannot catch their own
| mistakes, even obvious ones. For example, with GPT 3.5 I had a
| persistent problem with the question: "Who is older, Annie
| Morton or Terry Richardson?". I was giving it Wikipedia and it
| was correctly finding the birth dates of the most popular
| people with those names - but then, instead of comparing ages,
| it was comparing birth years. And once it did that, it was
| impossible for it to spot the error.
|
| Now with 4o-mini I have a similar, if less obvious, problem.
|
| Just writing this down convinced me that there are some ideas
| to try here - taking a 'report' of the thought process out of
| context and judging it there, or changing the temperature or
| even maybe doing cross-checking with a different model?
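|
| Roughly what I have in mind, as a sketch (llm_a and llm_b stand
| in for calls to two different models):
|
|     # llm_a / llm_b: thin wrappers around two different models.
|     def answer_with_crosscheck(question: str) -> str:
|         draft = llm_a("Answer step by step:\n" + question)
|         # Judge the reasoning in a fresh context, with a second
|         # model, framed as someone else's report.
|         critique = llm_b(
|             "A junior researcher wrote the report below. List any "
|             "logical gaps, or say 'no issues'.\n\nQuestion: "
|             + question + "\n\nReport:\n" + draft)
|         if "no issues" in critique.lower():
|             return draft
|         return llm_a("Revise your answer.\nQuestion: " + question
|                      + "\nPrevious answer:\n" + draft
|                      + "\nCritique:\n" + critique)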
| tomrod wrote:
| Brains are split internally, with each half having its own
| monologue. One happens to have command.
| furyofantares wrote:
| I don't think there's reason to believe both halves have a
| monologue, is there? Experience, yes, but doesn't only one
| half do language?
| ggm wrote:
| So if, like me, you have an interior dialogue, which half is
| speaking and which is listening, or is it the same one? I do
| not ascribe the speaker or listener to a lobe, but whatever
| the language and comprehension centre(s) is (are), it can do
| both at the same time.
| furyofantares wrote:
| Same half. My understanding is that in split-brain
| patients, it looks like one half has extremely limited
| ability to parse language and no ability to produce it.
| Filligree wrote:
| Neither of my halves need a monologue, thanks.
| zby wrote:
| Ah yeah - actually I tested taking it out of context. This
| is the thing that surprised me - I thought it was about
| 'writing itself into a corner' - but even in a completely
| different context the LLM consistently makes an obvious
| mistake. Here is the example:
| https://chatgpt.com/share/67667827-dd88-8008-952b-242a40c2ac...
|
| Janet Waldo was playing Corliss Archer on radio - and the
| quote the LLM found in Wikipedia confirmed it. But the
| question was about the film - and the LLM cannot spot the gap
| in its reasoning - even if I try to warn it by telling it the
| report came from a junior researcher.
| neom wrote:
| After I read "Attention Is All You Need", my first thought was:
| "Orchestration is all you need". When 4o came out I published
| this: https://b.h4x.zip/agi/
| naasking wrote:
| Interesting, because I almost think of it the opposite way.
| LLMs are like System 1 thinking: fast, intuitive, based on what
| you consider most probable given what you know / have
| experienced / have been trained on. System 2 thinking is
| different: more careful, slower, logical, deductive, more like
| symbolic reasoning. And then some metasystem to tie these two
| together and make them work cohesively.
| mikebelanger wrote:
| > But I think that complex architectures are going to win out
| over the "just scale up with more data and more compute"
| approach.
|
| I'm not sure about AGI, but for specialized jobs/tasks,
| something like a marketing agent that's familiar with your
| products and knows how to write copy for them will win over
| "just add more compute/data" mass-market LLMs. This article
| does encourage us to keep that architecture simple, which is
| refreshing to hear. Kind of the AI version of the rule of
| least power.
|
| Admittedly, I have a degree in Cognitive Science, which tended
| to focus on good ol' fashioned AI, so I have my biases.
| serjester wrote:
| Couldn't agree more with this - too many people rush to build
| autonomous agents when their problem could easily be defined as
| a DAG workflow. Agents increase the degrees of freedom in your
| system exponentially, making it so much more challenging to
| evaluate systematically.
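|
| For contrast, the DAG/workflow version of a task is a fixed
| chain of calls with deterministic gates between steps (a
| sketch; the steps and the llm() helper are made up):
|
|     def run_pipeline(doc: str) -> str:
|         outline = llm("Write a bullet outline for:\n" + doc)
|         # Deterministic gate between steps: cheap to test,
|         # no model judgment involved.
|         if len(outline.splitlines()) < 3:
|             raise ValueError("outline too thin; stop the workflow")
|         draft = llm("Expand this outline into prose:\n" + outline)
|         return llm("Copy-edit for tone and length:\n" + draft)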
| tonyhb wrote:
| It looks like agents are less about DAG workflows or fully
| autonomous "networks of agents", and more about a stateful
| network:
|
| * A "network of agents" is a system of agents and tools
|
| * That run and build up state (both "memory" and actual state via
| tool use)
|
| * Which is then inspected when routing as a kind of "state
| machine".
|
| * Routing should specify which agent (or agents, in parallel) to
| run next, via that state.
|
| * Routing can also use other agents (routing agents) to figure
| out what to do next, instead of code.
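|
| In plain Python (not AgentKit), that routing loop looks roughly
| like the sketch below; plan_agent, edit_agent, tests_pass and
| the state fields are all made up:
|
|     def run_network(task: str) -> dict:
|         state = {"task": task, "plan": None, "patch": None,
|                  "done": False}
|         agent = "planner"
|         while agent is not None:
|             if agent == "planner":
|                 state["plan"] = plan_agent(state)   # LLM + tools
|                 agent = "editor"
|             elif agent == "editor":
|                 state["patch"] = edit_agent(state)
|                 agent = "tester"
|             else:  # tester: route by inspecting state
|                 state["done"] = tests_pass(state["patch"])
|                 agent = None if state["done"] else "editor"
|         return state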
|
| We're codifying this with durable workflows in a prototypical
| library -- AgentKit: https://github.com/inngest/agent-kit/ (docs:
| https://agentkit.inngest.com/overview).
|
| It took less than a day to get a network of agents to correctly
| fix swebench-lite examples. It's super early, but very fun. One
| of the cool things is that this uses Inngest under the hood, so
| you get all of the classic durable execution/step
| function/tracing/o11y for free, but it's just regular code that
| you write.
| brotchie wrote:
| Have been building agents for past 2 years, my tl;dr is that:
|
| _Agents are Interfaces, Not Implementations_
|
| The current zeitgeist seems to think of agents as passthrough
| agents: e.g. a thin wrapper around a core that's almost 100% an
| LLM.
|
| The most effective agents I've seen, and have built, are largely
| traditional software engineering with a sprinkling of LLM calls
| for "LLM hard" problems. LLM hard problems are problems that can
| ONLY be solved by application of an LLM (creative writing, text
| synthesis, intelligent decision making). Leave all the problems
| that are amenable to decades of software engineering best
| practice to good old deterministic code.
|
| I've been calling systems like this _"Transitional Software
| Design"_. That is, they're mostly a traditional software
| application under the hood (deterministic, well-structured
| code, separation of concerns) with judicious use of LLMs where
| required.
|
| Ultimately, users care about what the agent does, not how it does
| it.
|
| The biggest differentiator I've seen between agents that work and
| get adoption, and those that are eternally in a demo phase, is
| related to the cardinality of the state space the agent is
| operating in. Too many folks try to "boil the ocean" and
| implement a general-purpose capability: e.g. generating Python
| code to do something, or synthesizing SQL from natural language.
|
| The projects I've seen that work really focus on reducing the
| state space of agent decision making down to the smallest
| possible set that delivers user value.
|
| e.g. Rather than generating arbitrary SQL, work out a set of ~20
| SQL templates that are hyper-specific to the business problem
| you're solving. Parameterize them with the options for select,
| filter, group by, order by, and the subset of aggregate
| operations that are relevant. Then let the agent choose the
| right template + parameters from a relatively small, finite set
| of options.
|
| ^^^ the delta in agent quality between "boiling the ocean" vs
| "agent's free choice over a small state space" is night and day.
| It lets you deploy early, deliver value, and start getting user
| feedback.
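|
| Concretely, the small-state-space version looks something like
| this sketch (the templates are illustrative, and llm_json /
| run_query are hypothetical helpers):
|
|     TEMPLATES = {
|         "revenue_by_region":
|             "SELECT region, SUM(amount) FROM orders "
|             "WHERE order_date >= :start "
|             "GROUP BY region ORDER BY 2 DESC",
|         "top_customers":
|             "SELECT customer_id, SUM(amount) FROM orders "
|             "WHERE order_date >= :start "
|             "GROUP BY customer_id ORDER BY 2 DESC LIMIT :limit",
|     }
|
|     def answer(question: str):
|         # The model only picks a template name and fills named
|         # parameters; it never writes raw SQL.
|         choice = llm_json(
|             "Question: " + question + "\n"
|             "Pick one template from " + str(list(TEMPLATES)) +
|             ' and return JSON {"template": ..., "params": {...}}')
|         return run_query(TEMPLATES[choice["template"]],
|                          choice["params"])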
|
| Building Transitional Software Systems:
|
| 1. Deeply understand the domain and CUJs.
|
| 2. Segment the system into "problems that traditional software
| is good at solving" and "LLM-hard problems".
|
| 3. For the LLM-hard problems, work out the smallest possible
| state space of decision making.
|
| 4. Build the system, and get users using it.
|
| 5. Gradually expand the state space as feedback flows in from
| users.
| samdjstephens wrote:
| There'll always be an advantage for those who understand the
| problem they're solving, for sure.
|
| The balance of traditional software components and LLM-driven
| components in a system is an interesting topic - I wonder how
| the capabilities of future generations of foundation models
| will change that?
| brotchie wrote:
| I'm certain the end state is "one model to rule them all",
| hence the "transitional."
|
| Just that the pragmatic approach, today, given current LLM
| capabilities, is to minimize the surface area / state space
| that the LLM is actuating. And then gradually expand that
| until the whole system is just a passthrough. But starting
| with a passthrough kinda doesn't lead to great products in
| December 2024.
| CharlieDigital wrote:
| Same experience.
|
| The smaller and more focused the context, the higher the
| consistency of output, and the lower the chance of jank.
|
| Fundamentally no different than giving instructions to a junior
| dev. Be more specific -- point them to the right docs, distill
| the requirements, identify the relevant areas of the source --
| to get good output.
|
| My last attempt at a workflow of agents was at the 3.5 to 4
| transition and OpenAI wasn't good enough at that point to
| produce consistently good output and was slow to boot.
|
| My team has taken the stance that getting consistently good
| output from LLMs is really an ETL exercise: acquire, aggregate,
| and transform the minimum relevant data for the output to reach
| the desired level of quality and depth, and let the LLM do its
| thing.
| handfuloflight wrote:
| When trying to do everything, they end up doing nothing.
| shinryuu wrote:
| Do you have a public example of a good agentic system? I would
| like to experience it.
| throw83288 wrote:
| Unrelated, but since you seem to have experience here, how
| would you recommend getting into the bleeding edge of
| LLMs/agents? Traditional SWE is obviously on its way out, but I
| can't even tell where to start with this new tech, and I
| struggle to find ways to apply it to an actual project.
| simonw wrote:
| This is by far the most practical piece of writing I've seen on
| the subject of "agents" - it includes actionable definitions,
| then splits most of the value out into "workflows" and describes
| those in depth with example applications.
|
| There's also a cookbook with useful code examples:
| https://github.com/anthropics/anthropic-cookbook/tree/main/p...
|
| Blogged about this here:
| https://simonwillison.net/2024/Dec/20/building-effective-age...
| 3abiton wrote:
| I'm glad they are publishing their cookbook recipes on GitHub
| too. OpenAI used to be more active there.
| NeutralForest wrote:
| Thanks for all the write-ups on LLMs, you're on top of the news
| and it makes it way easier to follow what's happening and the
| existing implementations by following your blog instead.
| th0ma5 wrote:
| How do you protect from compounding errors?
| tlarkworthy wrote:
| read the article, close the feedback loop with something
| verifiable (e.g. tests)
| Animats wrote:
| Yes, they have actionable definitions, but they are defining
| something quite different than the normal definition of an
| "agent". An agent is a party who acts for another. Often this
| comes from an employer-employee relationship.
|
| This matters mostly when things go wrong. Who's responsible?
| The airline whose AI agent gave out wrong info about airline
| policies found, in court, that their "intelligent agent" was
| considered an agent in legal terms. Which meant the airline was
| stuck paying for their mistake.
|
| Anthropic's definition: Some customers define agents as fully
| autonomous systems that operate independently over extended
| periods, using various tools to accomplish complex tasks.
|
| That's an autonomous system, not an agent. Autonomy is about
| how much something can do without outside help. Agency is about
| who's doing what for whom, and for whose benefit and with what
| authority. Those are independent concepts.
| solidasparagus wrote:
| That's only one of many definitions of the word "agent" outside
| the context of AI. Another is something that produces effects
| on the world. Another is something that has agency.
|
| Sort of interesting that we've coalesced on this term that
| has many definitions, sometimes conflicting, but where many
| of the definitions vaguely fit into what an "AI Agent" could
| be for a given person.
|
| But in the context of AI, Agent as Anthropic defines it is an
| appropriate word because it is a thing that has agency.
| Animats wrote:
| > But in the context of AI, Agent as Anthropic defines it
| is an appropriate word because it is a thing that has
| agency.
|
| That seems circular.
| Nevermark wrote:
| It would only be circular if agency were defined only as
| "the property of being an agent". That circle of
| reasoning isn't being proposed as the formal definition
| by anyone.
|
| Perhaps you mean tautological. In which case, an agent
| having agency would be an informal tautology. A
| relationship so basic to the subject matter that it
| essentially must be true. Which would be the strongest
| possible type of argument.
| simonw wrote:
| Where did you get the idea that your definition there is the
| "normal" definition of agent, especially in the context of
| AI?
|
| I ask because you seem _very_ confident in it - and my
| biggest frustration about the term "agent" is that so many
| people are confident that their personal definition is
| clearly the one everyone else should be using.
| PhilippGille wrote:
| Didn't he mention it was the court's definition?
|
| But I'm not sure that's true. The court didn't define
| anything; on the contrary, they only said that (in simplified
| terms) the chatbot was part of the website and it's
| reasonable to expect the info on their website to be
| accurate.
|
| The closest I could find to the chatbot being considered an
| agent in legal terms (an entity like an employee) is this:
|
| > Air Canada argues it cannot be held liable for
| information provided by one of its agents, servants, or
| representatives - including a chatbot.
|
| Source: https://www.canlii.org/en/bc/bccrt/doc/2024/2024bcc
| rt149/202...
| JonChesterfield wrote:
| Defining "agent" as "thing with agency" seems legitimate to
| me, what with them being the same word.
| simonw wrote:
| That logic doesn't work for me, because many words have
| multiple meanings. "Agency" can also be a noun that means
| an organization that you hire - like a design agency. Or
| it can mean the CIA.
|
| I'm not saying it's not a valid definition of the term,
| I'm pushing back on the idea that it's THE single correct
| definition of the term.
| pvg wrote:
| AI people have been using a much broader definition of
| 'agent' for ages, though. One from Russell and Norvig's 90s
| textbook:
|
| _" Anything that can be viewed as perceiving its environment
| through sensors and acting upon that environment through
| actuators"_
|
| https://en.wikipedia.org/wiki/Intelligent_agent#As_a_definit.
| ..
| jeffreygoesto wrote:
| And "autonomous" is "having one's own laws".
|
| https://www.etymonline.com/word/autonomous
| dmezzetti wrote:
| If you're looking for a lightweight open-source framework
| designed to handle the patterns mentioned in this article:
| https://github.com/neuml/txtai
|
| Disclaimer: I'm the author of the framework.
| adeptima wrote:
| 100% agree. I did some research on workflows and durable
| execution engines in the context of agents and RAGs. I put
| some links in a comment below.
| ramesh31 wrote:
| Key to understanding the power of agentic workflows is tool
| usage. You don't have to write logic anymore; you simply give an
| agent the tools it needs to accomplish a task and ask it to do
| so. Models like the latest Sonnet have gotten so advanced now
| that coding abilities are reaching superhuman levels. All the
| hallucinations and "jitter" of models from 1-2 years ago have
| gone away. They can be reasoned about now, and you can build
| reliable systems with them.
| minimaxir wrote:
| > you simply give an agent the tools
|
| That isn't simple. There is a lot of nuance in tool definition.
| ramesh31 wrote:
| >That isn't simple. There is a lot of nuance in tool
| definition.
|
| Simple relative to the equivalent in traditional coding
| techniques. If my system needs to know about some reference
| material, I no longer need to build out endpoints and handle
| all of that, or modify any kind of controller code. I just
| ask the agent to include the information, and it intuitively
| knows how to do that with a DB tool in its context.
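|
| For reference, a tool definition in the Anthropic messages API
| is roughly just a name, a description and a JSON schema (the
| query_docs tool below is made up):
|
|     import anthropic
|
|     client = anthropic.Anthropic()
|     tools = [{
|         "name": "query_docs",
|         "description": "Search the product reference docs and "
|                        "return the most relevant passages.",
|         "input_schema": {
|             "type": "object",
|             "properties": {"query": {"type": "string"}},
|             "required": ["query"],
|         },
|     }]
|     response = client.messages.create(
|         model="claude-3-5-sonnet-20241022",   # placeholder
|         max_tokens=1024,
|         tools=tools,
|         messages=[{"role": "user",
|                    "content": "How do I rotate an API key?"}],
|     )
|     # If response.stop_reason == "tool_use", run the tool and
|     # send its output back as a "tool_result" block next turn.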
| ripped_britches wrote:
| Depends on what you're building. A general assistant is going
| to have a lot of nuance. A well defined agent like a tutor
| only has so many tools to call upon.
| websku wrote:
| Indeed, we've seen this approach as well. All these
| "frameworks" become too complicated in real business cases.
| OutOfHere wrote:
| Anthropic keeps advertising its MCP (Model Context Protocol), but
| to the extent it doesn't support other LLMs, e.g. GPT, it
| couldn't possibly gain adoption. I have yet to see any example of
| MCP that can be extended to use a random LLM.
| thoughtlede wrote:
| When thinking about AI agents, there is still conflation between
| how to decide the next step to take vs what information is needed
| to decide the next step.
|
| If runtime information is insufficient, we can use AI/ML models
| to fill in that information. But deciding the next step could
| be done ahead of time, assuming complete information.
|
| Most AI agent examples short circuit these two steps. When faced
| with unstructured or insufficient information, the program asks
| the LLM/AI model to decide the next step. Instead, we could ask
| the LLM/AI model to structure/predict necessary information and
| use pre-defined rules to drive the process.
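|
| A sketch of that split, where extract_fields is a hypothetical
| LLM call that only fills a schema and the rules are ordinary
| code:
|
|     def next_step(ticket: str) -> str:
|         # The LLM only structures the information into a schema,
|         # e.g. {"intent": "refund", "order_id": "A1", "amount": 40}
|         fields = extract_fields(ticket)
|         # Pre-defined rules decide what happens next.
|         if fields["intent"] == "refund" and fields["amount"] <= 50:
|             return "auto_refund"
|         if fields["intent"] == "refund":
|             return "manager_approval"
|         return "route_to_support_queue"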
|
| This approach will translate most [1] "Agent" examples into
| "Workflow" examples. The quotes here are meant to imply
| Anthropic's definition of these terms.
|
| [1] I said "most" because there might be continuous world systems
| (such as real world simulacrum) that will require a very large
| number of rules and is probably impractical to define each of
| them. I believe those systems are an exception, not a rule.
| elpalek wrote:
| The Claude API lacks structured output; without uniformity in
| output, it's not useful for agents. I've had agent systems
| break down suddenly due to degradation in output, which caused
| the previously suggested JSON output hacks (from the official
| cookbook) to stop working.
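|
| For reference, the usual shape of those hacks is prefilling the
| assistant turn so the reply continues a JSON object, plus
| validate-and-retry (a sketch against the Messages API; the
| model name is a placeholder):
|
|     import json
|     import anthropic
|
|     client = anthropic.Anthropic()
|
|     def get_json(prompt: str, retries: int = 2) -> dict:
|         for _ in range(retries + 1):
|             msg = client.messages.create(
|                 model="claude-3-5-sonnet-20241022",  # placeholder
|                 max_tokens=1024,
|                 messages=[
|                     {"role": "user", "content": prompt},
|                     # Prefill so the reply continues a JSON object.
|                     {"role": "assistant", "content": "{"},
|                 ],
|             )
|             try:
|                 return json.loads("{" + msg.content[0].text)
|             except json.JSONDecodeError:
|                 continue
|         raise ValueError("model never returned valid JSON")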
| jsemrau wrote:
| Agents are still a misaligned concept in AI. While this article
| offers a lot on orchestration, memory (mentioned only once in
| the post) and governance are not really covered. The latter is
| important for increasing reliability - something Ilya Sutskever
| mentioned as important, since agents can be less deterministic
| in their responses. Interestingly, "agency", i.e. the ability
| of the agent to make its own decisions, is not mentioned once.
|
| I work on CAAs and document my journey on my substack
| (https://jdsmerau.substack.com)
| handfuloflight wrote:
| That URL says Not Found.
| vacuity wrote:
| Seems to be https://jdsemrau.substack.com/, also in their
| bio.
| jsemrau wrote:
| Thanks. I was rushing out to the gym.
| qianli_cs wrote:
| Good article. I think it could put a bit more emphasis on
| supporting human interactions in agentic workflows. While
| composing workflows isn't new, involving a human in the loop
| introduces huge complexity, especially for long-running, async
| processes. Waiting for human input (which could take days),
| managing retries, and avoiding errors like duplicate refunds or
| missed updates require careful orchestration.
|
| I think this is where durable execution shines. By ensuring every
| step in an async processing workflow is fault-tolerant and
| durable, even interruptions won't lose progress. For example, in
| a refund workflow, a durable system can resume exactly where it
| left off--no duplicate refunds, no lost state.
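|
| e.g. a refund step only stays safe across retries if it is
| idempotent; a sketch, where store and issue_refund are
| placeholders for a durable KV store and the payment call:
|
|     def refund_step(workflow_id: str, order_id: str, amount: float):
|         key = f"refund:{workflow_id}:{order_id}"
|         done = store.get(key)        # durable store survives
|         if done:                     # worker crashes/restarts
|             return done              # already refunded earlier
|         receipt = issue_refund(order_id, amount)
|         store.put(key, receipt)      # record before reporting
|         return receipt               # success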
| mediumsmart wrote:
| I have always voted for _Unix-style, do-one-thing-well black
| boxes_ as the plumbing in the ruling agent.
|
| Divide and conquer, me hearties.
| zer00eyz wrote:
| Does anyone have a solid example of a real agent deployed in
| production?
| mi_lk wrote:
| Tangent but anyone know what software is used to draw those
| workflow diagrams?
| huydotnet wrote:
| They probably use their own designers. jk. The arrows look a
| lot like Figma/FigJam.
| melvinmelih wrote:
| While I agree with the premise of keeping it simple (especially
| when it comes to using opaque and overcomplicated frameworks like
| LangChain/LangGraph!) I do believe there's a lot more to building
| agentic systems than this article covers.
|
| I recently wrote[1] about the 4 main components of autonomous AI
| agents (Profile, Memory, Planning & Action) and all of that can
| still be accomplished with simple LLM calls, but there's simply a
| lot more to think about than simple workflow orchestration if you
| are thinking of building production-ready autonomous agentic
| systems.
|
| [1] https://melvintercan.com/p/anatomy-of-an-autonomous-ai-agent
| debois wrote:
| Note how much the principles here resemble general programming
| principles: keep complexity down, avoid frameworks if you can,
| avoid unnecessary layers, make debugging easy, document, and
| test.
|
| It's as if AI took over the writing-the-program part of software
| engineering, but sort of left all the rest.
| zby wrote:
| My wish list for LLM APIs to make them more useful for 'agentic'
| workflows:
|
| Finer-grained control over the tools the LLM is supposed to
| use. 'tool_choice' should allow giving a list of tools to
| choose from. The point is that the list of all available tools
| is needed to interpret past tool calls - so you cannot also use
| it to limit the LLM's choice at a particular step. See also:
| https://zzbbyy.substack.com/p/two-roles-of-tool-schemas
|
| Control over how many tool calls can go in one request. For
| stateful tools, multiple tool calls in one request lead to
| confusion.
|
| By the way - is anyone working with stateful tools? They often
| seem very natural, and you would think that the LLM would
| encounter lots of stateful interactions during training and be
| skilled at using them. But there aren't many examples, and the
| libraries are not really geared towards that.
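|
| One workaround until APIs support this: keep the full tool list
| in the request (so past calls still parse), state the
| restriction in the prompt, and validate the pick locally with a
| retry. A rough sketch, with call_llm and refusal_message as
| made-up helpers:
|
|     ALL_TOOLS = ["search", "read_file", "write_file", "send_email"]
|
|     def step(messages, allowed):
|         # The full schema list still goes to the API so earlier
|         # tool calls in the history stay interpretable; the
|         # restriction lives in the prompt plus a local check.
|         hint = "For this step only use: " + ", ".join(allowed)
|         for _ in range(3):
|             call = call_llm(messages, tools=ALL_TOOLS, system=hint)
|             if call.tool_name in allowed:
|                 return call
|             messages.append(refusal_message(call.tool_name, allowed))
|         raise RuntimeError("model kept picking a disallowed tool")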
| zaran wrote:
| This was an excellent write-up - I was a bit surprised at how
| much they considered "workflows" instead of agents, but I think
| it's good to start narrowing down the terminology.
|
| I think these days the main value of the LLM "agent" frameworks
| is being able to trivially switch between model providers, though
| even that breaks down when you start to use more esoteric
| features that may not be implemented in cleanly overlapping ways
| jmj wrote:
| I'm part of a team that is currently #1 at the SWEBench-lite
| benchmark. Interesting times!
| adeptima wrote:
| The whole agent thing can easily blow up in complexity.
|
| Here are some challenges I personally faced recently:
|
| - Durable Execution Paradigm: You may need the system to operate
| in a "durable execution" fashion, as Temporal, Hatchet, Inngest,
| and Windmill do. Your processes need to run for months, be
| upgraded, and be restarted. Links below.
|
| - FSM vs. DAG: Sometimes, a Finite State Machine (FSM) is more
| appropriate than a Directed Acyclic Graph (DAG) for my use cases.
| FSMs support cyclic behavior, allowing for repeated states or
| loops (e.g., in marketing sequences). An FSM done right is
| hard, and if you need one, you can't use most tools without
| "magic" hacking.
|
| - Observability and Tracing - it takes time to get everything
| looking nice in Grafana (Alloy, Tempo, Loki, Prometheus) or
| whatever you prefer. Switching attention between multiple
| systems is not an option due to limited attention span and
| "skills" issues. Most "out of the box" functionality in new
| agent frameworks quickly becomes a liability.
|
| - Token/Inference Economy - token consumption and identifying
| edge cases with poor token management are a challenge, similar
| to Ethereum's gas consumption issues. Building a billing system
| based on actual consumption on top of Stripe was a challenge.
| This is even 10x harder ... at least for me ;)
|
| - Context Switching - managing context switching is akin to
| handling concurrency and scheduling with async/await paradigms,
| which can become complex. Simple prompts are OK, but once you
| start juggling documents, screenshots, or screen reading, it's
| another game.
|
| What I like about all of the above is that it's nothing new -
| the design patterns and architectures have been known for a
| while.
|
| It's just hard to see that through the AI/ML buzzword storm ...
| but once you start looking at the source code ... the fog of
| war in your mind clears.
|
| Durable Execution / Workflow Engines
|
| - Temporal https://github.com/temporalio -
| https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...
|
| - Hatchet https://news.ycombinator.com/item?id=39643136
|
| - Inngest https://news.ycombinator.com/item?id=36403014
|
| - Windmill https://news.ycombinator.com/item?id=35920082
|
| Any comments and links on the above challenges and solutions are
| greatly appreciated!
| ldjkfkdsjnv wrote:
| Which do you think is the best workflow engine to use here?
| I've chosen Temporal. The engineering management and their
| background at AWS mean the platform is rock solid.
| still_w1 wrote:
| Slightly off topic, but does anyone have a suggestion for a tool
| to make the visualizations of the different architectures like in
| this post?
| asd33313131 wrote:
| https://www.mermaidchart.com/
___________________________________________________________________
(page generated 2024-12-21 18:00 UTC)