[HN Gopher] From LLM to AI Agent: What's the Real Journey Behind...
___________________________________________________________________
From LLM to AI Agent: What's the Real Journey Behind AI System
Development?
Author : codelink
Score : 111 points
Date : 2025-06-19 09:29 UTC (13 hours ago)
(HTM) web link (www.codelink.io)
(TXT) w3m dump (www.codelink.io)
| nilirl wrote:
| > AI Agents can initiate workflows independently and determine
| their sequence and combination dynamically
|
| I'm confused.
|
| A workflow has hardcoded branching paths; explicit if conditions
| and instructions on how to behave if true.
|
| So for an agent, instead of specifying explicit if conditions,
| you specify outcomes and you leave the LLM to figure out what if
| conditions apply and how to deal with them?
|
| In the case of this resume screening application, would I just
| provide the ability to make API calls and then add this to the
| prompt: "Decide what a good fit would be."?
|
| Are there any serious applications built this way? Or am I
| missing something?
| manojlds wrote:
| Not all applications need to be built this way. But the most
| serious apps built this way would be deep research tools.
|
| Recent article from Anthropic -
| https://www.anthropic.com/engineering/built-multi-agent-rese...
| alganet wrote:
| An AI company doing it is the corporate equivalent of "works
| on my machine".
|
| Can you give us an example of a company not involved in AI
| research that does it?
| herval wrote:
| There's plenty of companies using these sorts of agentic
| systems these days already. In my case, we wrote an LLM
| that knows how to fetch data from a bunch of sources (logs,
| analytics, etc) and root causes incidents. Not all sources
| make sense for all incidents, most queries have crazy high
| cardinality and the data correlation isn't always possible.
| LLMs being pattern matching machines, this allows them to
| determine what to fetch, then pattern-match a cause based on
| the other tools they have access to (e.g. runbooks, Google
| searches).
|
| I built incident detection systems in the past, and this
| was orders of magnitude easier and more generalizable for
| new kinds of issues. It still gives meaningless/obvious
| reasoning frequently, but it's far, far better than the
| alternatives...
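|
| Roughly, the shape of that loop (a simplified sketch with
| made-up names, not our actual system; call_llm() is a
| stand-in for any chat-completion API):
|
|     import json
|
|     # Stub data sources; real ones hit logging/analytics APIs.
|     TOOLS = {
|         "fetch_logs": lambda svc: "<logs for " + svc + ">",
|         "fetch_metrics": lambda svc: "<metrics for " + svc + ">",
|         "search_runbooks": lambda q: "<runbook hits for " + q + ">",
|     }
|
|     def call_llm(messages):
|         # Hypothetical wrapper. Returns JSON text, either
|         # {"tool": ..., "args": {...}} or {"answer": "..."}.
|         raise NotImplementedError
|
|     def root_cause(incident, max_steps=5):
|         msgs = [
|             {"role": "system", "content":
|              "You triage incidents. Reply with JSON: a tool "
|              "call, or a final answer naming the root cause."},
|             {"role": "user", "content": incident},
|         ]
|         for _ in range(max_steps):
|             reply = json.loads(call_llm(msgs))
|             if "answer" in reply:    # enough evidence gathered
|                 return reply["answer"]
|             # the model decides which source to query next
|             out = TOOLS[reply["tool"]](**reply["args"])
|             msgs.append({"role": "assistant",
|                          "content": json.dumps(reply)})
|             msgs.append({"role": "user",
|                          "content": "Tool result: " + out})
|         return "no confident root cause"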
| alganet wrote:
| > we wrote an LLM
|
| Excuse me, what?
| herval wrote:
| LLM _automation_. I'm sure you could understand the
| original comment just fine.
| alganet wrote:
| I didn't. This also confused me:
|
| > LLMs being pattern matching machines
|
| LLMs are _not_ pattern matching. I'm not being pedantic.
| It is really hard and unreliable to approach them with a
| pattern matching mindset.
| herval wrote:
| if you say so
| alganet wrote:
| I stand by it.
|
| You can definitely take a base LLM model, then train it on
| existing, prepared root cause analysis data. But that's very
| hard, expensive, and might not work, leaving the model
| brittle. Also, that's not what an "AI Agent" is.
|
| You could also make a workflow that prepares the data,
| feeds it into a regular model, then asks prepared
| questions about that data. That's inference, not pattern
| matching. There's no way an LLM will be able to identify
| the root cause reliably. You'll probably need a human to
| evaluate the output at some point.
|
| What you mentioned doesn't look like either one of these.
| nilirl wrote:
| Thanks for the link, it taught me a lot.
|
| From what I gather, you can build an agent for a task as long
| as:
|
| - you trust the decision making of an LLM for the required
| type of decision; decisions framed as some kind of evaluation
| of text feel right.
|
| - and if the penalty for being wrong is acceptable.
|
| Just to go back to the resume screening application, you'd
| build an agent if:
|
| - you asked the LLM to make an evaluation based on the text
| content of the resume, any conversation with the applicant,
| and the declared job requirement.
|
| - you had a high enough volume of resumes that false
| negatives won't be too painful.
|
| It seems like framing problems as search problems helps model
| these systems effectively. They're not yet capable of design,
| i.e., being responsible for coming up with the job requirement
| itself.
| mickeyp wrote:
| > A workflow has hardcoded branching paths; explicit if
| conditions and instructions on how to behave if true.
|
| That is very much true of the systems most of us have built.
|
| But you do not have to do this with an LLM; in fact, the LLM
| may decide it will not follow your explicit conditions and
| instructions regardless of how hard you try.
|
| That is why LLMs are used to review the output of LLMs to
| ensure they follow the core goals you originally gave them.
|
| For example, you might ask an LLM to lay out how to cook a
| dish. Then use a second LLM to review if the first LLM followed
| the goals.
|
| This is one of the things tools like DSPy try to do: you
| remove the hand-written prompt and instead declare high-level
| concepts like "input" and "output", plus reward/scoring
| functions (which might be a mix of LLM judges and human-coded
| functions) that assess whether the output is correct given
| that input.
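|
| A rough sketch of that pattern with DSPy (the signature,
| field names, and metric below are made up for illustration;
| check the DSPy docs for the current API):
|
|     import dspy
|
|     # A signature declares the "input -> output" contract
|     # instead of a hand-written prompt.
|     class RecipeCheck(dspy.Signature):
|         """Did the generated recipe satisfy the stated goal?"""
|         goal = dspy.InputField()
|         recipe = dspy.InputField()
|         verdict = dspy.OutputField(desc="pass/fail + reason")
|
|     checker = dspy.Predict(RecipeCheck)
|
|     # A scoring function (human-coded here, could be an LLM
|     # judge) that DSPy optimizers can use as a metric.
|     def metric(example, prediction, trace=None):
|         return prediction.verdict.lower().startswith("pass")
|
| Calling checker(goal=..., recipe=...) then plays the role of
| the second, reviewing LLM in the cooking example above.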
| rybosome wrote:
| AI code generation tools work like this.
|
| Let me reword your phrasing slightly to make an illustrative
| point:
|
| > so for an employee, instead of specifying explicit if
| conditions, you specify outcomes and you leave the human to
| figure out what if conditions apply and how to deal with them?
|
| > Are there any serious applications built this way?
|
| We have managed to build robust, reliable systems on top of
| fallible, mistake-riddled, hallucinating, fabricating,
| egotistical, hormonal humans. Surely we can handle a little
| non-determinism in our computer programs? :)
|
| In all seriousness, having spent the last few years employed in
| this world, I feel that LLM non-determinism is an engineering
| problem just like the non-determinism of making an HTTP
| request. Admittedly, it's not one we have much prior art on
| dealing with in this field, but that's what is so exciting
| about it.
| nilirl wrote:
| Yes, I see your analogy between fallible humans and fallible
| AI.
|
| It's not the non-determinism that was bothering me, it was
| the decision making capability. I didn't understand what
| kinds of decisions I can rely on an LLM to make.
|
| For example, with the resume screening application from the
| post, where would I draw the line between the agent and the
| human?
|
| - If I gave the AI agent access to HR data and employee
| communications, would it be able to decide when to create a
| job description?
|
| - And design the job description itself?
|
| - And email an opening round of questions for the candidate
| to get a better sense of the candidates who apply?
|
| Do I treat an AI agent just like I would a human new to the
| job? Keep working on it until I can trust it to make domain-
| specific decisions?
| diggan wrote:
| > would it be able to decide when to create a job description?
|
| If you can encode in text how you/your company makes that
| decision as a human, I don't see why not. But personally,
| there is a lot of subjectivity (for better or worse) in
| hiring processes, and I'm not sure I'd want a probabilistic
| rules engine to make those sorts of calls.
|
| My current system prompt for coding with LLMs basically looks
| like I've written down my own personal rules for programming.
| Anytime I get a result I don't like, I write down why I
| didn't like it and codify it in my reusable system prompt;
| then it doesn't make those (imo) mistakes anymore.
|
| I don't think I could realistically get an LLM to do
| something I don't understand the process of myself, and
| once you grok the process, you can understand if using an
| LLM here makes sense or not.
|
| > Do I treat an AI agent just like I would a human new to
| the job?
|
| No, you treat it as something much dumber. You can
| generally rely on some sort of "common sense" in a human
| that they built up during their time on this planet. But
| you cannot do that with LLMs: while they're super-human in
| some ways, they are still way "dumber" in other ways.
|
| For example, a human new to a job would pick up things
| autonomously, while an LLM does not. You need to pay
| attention to what you have to "teach" the LLM by changing
| what Karpathy calls the "programming" of the LLM, which would
| be the prompts. Anything you forget to tell it, the LLM will
| do whatever with; it only follows exactly what you say. A
| human you can usually tell "don't do that in the future" and
| they'll avoid it in the right context. An LLM you can scream
| at for 10 hours about how it's doing something wrong, but
| unless you update the programming it will keep making that
| mistake forever; and if you correct it in one context but
| reuse the prompt in another, the LLM won't suddenly
| understand that the correction doesn't make sense there.
|
| Just an example: I wanted some quick and dirty throwaway code
| for generating a graph, and in my prompt I mixed up the X and
| Y axes, and of course got a function that didn't work as
| expected. If a human had been doing it, it would have been
| quite obvious I didn't want time on the Y axis and value on
| the X axis, because the graph wouldn't make any sense, but
| the LLM happily complied.
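|
| Concretely, the "programming" is nothing fancier than this
| kind of accumulated rule list (an invented example, not my
| actual prompt):
|
|     # Every time a result annoys me, the reason becomes a
|     # new rule here.
|     CODING_RULES = [
|         "Prefer the standard library over new dependencies.",
|         "No clever one-liners; optimize for readability.",
|         "Always handle errors around I/O.",
|     ]
|
|     SYSTEM_PROMPT = (
|         "You write code for me. Follow these rules strictly:\n"
|         + "\n".join("- " + rule for rule in CODING_RULES)
|     )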
| nilirl wrote:
| So, if the humans have to model the task, the domain, and
| the process currently used to solve it, why not just write
| the code to do so?
|
| Is the main benefit that we can do all of this in natural
| language?
| cyberax wrote:
| A lot of the time it's faster to ask an LLM. Treat it as
| autocomplete on steroids.
| lowwave wrote:
| >Is the main benefit that we can do all of this in
| natural language?
|
| Hit the nail right on the head. That is pretty much the
| breakthrough LLMs have been. They allow the class of
| non-programmer developers to do tasks that were once only for
| developers and programmers. Seems like a great fit for CEOs
| and management as well.
| candiddevmike wrote:
| Natural language cannot describe these things as concretely,
| repeatably, or verifiably as code.
| rybosome wrote:
| The honest answer is that we are still figuring out where
| to draw the line between an agent and a human, because that
| line is constantly shifting.
|
| Given your example of the resume screening application from
| the post and today's capabilities, I would say:
|
| 1) Should agents decide when to create a job post? Not
| without human oversight - a proactive suggestion to a human
| is great.
|
| 2) Should agents design the job description itself? Yes, with
| the understanding that an experienced human, namely the
| hiring manager, will review and approve as well.
|
| 3) Should an agent email an opening round of questions to the
| candidates? Definitely allowed with oversight, and
| potentially even without human approval depending on how well
| it does.
|
| It's true that to improve all 3 of these it would take a
| lot of work with respect to building out the tasks,
| evaluations, and flows/tools/tuned models, etc. But, you
| can also see how much this empowers a single individual in
| their productivity. Imagine being one recruiter or HR
| employee with all of these agents completing these tasks
| for you effectively.
|
| EDIT: Adding that this pattern of "agent does a bunch of
| stuff and asks for human review/approval" is, I think, one of
| the fundamental workflows we will have to adopt to deal
| productively with non-determinism.
|
| This applies to an AI radiologist asking a human to approve
| their suggested diagnosis, an AI trader asking a human to
| approve a trade with details and reasoning, etc. Just like
| small-scale AI like Copilot asking you to approve a
| line/several lines, or tools like Devin asking you to
| approve a PR.
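|
| A minimal sketch of that review/approval gate (the function
| names are hypothetical, and the reviewer here is assumed to
| return "approve", "reject", or "revise" plus feedback):
|
|     def propose_and_gate(draft_fn, review_fn, max_rounds=3):
|         # The agent drafts; a human approves, rejects, or
|         # sends it back with feedback. Nothing ships
|         # unapproved.
|         feedback = None
|         for _ in range(max_rounds):
|             draft = draft_fn(feedback)       # agent's work
|             verdict, feedback = review_fn(draft)  # human gate
|             if verdict == "approve":
|                 return draft
|             if verdict == "reject":
|                 return None
|             # "revise": loop again with the human's feedback
|         return None
|
| The same shape fits the radiologist, the trader, and the PR
| review: the agent proposes, the human disposes.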
| Kapura wrote:
| One of the key advantages of computers has, historically,
| been their ability to compute and remember things accurately.
| What value is there in backing out of these in favour of LLM-
| based computation?
| nilirl wrote:
| They're able to handle large variance in their input, right
| out of the box.
|
| I think the appeal is code that handles changes in the
| world without having to change itself.
| bluefirebrand wrote:
| That's not very useful though, unless it is predictable
| and repeatable?
| spacecadet wrote:
| More or less. Serious? I'm not sure yet.
|
| I have several agent side projects going, the most complex and
| open ended is an agent that performs periodic network traffic
| analysis. I use an orchestration library with a "group chat"
| style orchestration. I declare several agents that have
| instructions and access to tools.
|
| These range from termshark scripts for collecting packets to
| analysis functions I had previously written for analyzing the
| traffic myself.
|
| I can then say something like, "Is there any suspicious
| activity?" and the agents collaboratively choose who (which
| agent) performs their role and therefore their tasks (i.e.
| tools), and work together to collect data, analyze the data,
| and return a response.
|
| I also run this on a schedule where the agents know about the
| schedule and choose to send me an email summary at specific
| times.
|
| I have noticed that the models/agents are very good at
| picking the "correct" network interface without much input,
| and that they understand their roles and objectives and
| execute accordingly, again without much direction from me.
|
| Now the big/serious question: is the output even good or
| useful? Right now, with my toy project, it is OK. Sometimes
| it's great and sometimes it's not; sometimes they spam my
| inbox with micro updates.
|
| I'm bad at sharing projects, but if you are curious:
| https://github.com/derekburgess/jaws
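|
| The "group chat" orchestration boils down to something like
| this (a rough sketch, not code from the repo above;
| pick_next_speaker would usually itself be an LLM call):
|
|     def group_chat(agents, task, rounds=8):
|         # agents: dict of name -> callable that reads the
|         # transcript, maybe runs its tools, and returns a
|         # message.
|         transcript = [("user", task)]
|         for _ in range(rounds):
|             # An orchestrator decides who speaks next based
|             # on the conversation so far.
|             name = pick_next_speaker(agents, transcript)
|             message = agents[name](transcript)
|             transcript.append((name, message))
|             if "FINAL ANSWER" in message:
|                 break
|         return transcript[-1][1]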
| dist-epoch wrote:
| Resume screening is a clear workflow case: analyze resume ->
| rank against others -> make decision -> send next
| phase/rejection email.
|
| An agent is like Claude Code, where you say to it "fix this
| bug", and it will choose a sequence of various actions - change
| code, run tests, run linter, change code again, do a git
| commit, ask user for clarification, change code again.
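|
| The workflow version is literally a fixed pipeline; roughly
| (a sketch -- llm_extract, llm_rank and send_email are assumed
| helpers, not a real library):
|
|     PASS_THRESHOLD = 0.7
|
|     def screen_resume(resume_text, email, job_req, pool):
|         # The steps are hardcoded; the LLM is only used inside
|         # a step, never to choose the steps.
|         summary = llm_extract(resume_text)          # analyze
|         score = llm_rank(summary, job_req, pool)    # rank
|         if score >= PASS_THRESHOLD:                 # decide
|             send_email(email, template="next_phase")
|         else:
|             send_email(email, template="rejection")
|
| The agent version replaces that fixed sequence with a loop in
| which the model picks the next action itself.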
| DebtDeflation wrote:
| Almost every enterprise use case is a clear workflow use case
| EXCEPT coding/debugging and research (e.g., iterative web
| search and report compilation). I saw a YT video the other
| day of someone building an AI Agent to take online orders for
| guitars - query the catalog and present options to the user,
| take the configured order from the user, check the inventory
| system to make sure it's available, and then place the order
| in the order system. There was absolutely no reason to build
| this as an Agent, burning an ungodly number of tokens with
| verbose prompts running in a loop, only to have it generate
| JSON in the exact format specified to place the final order,
| when the same thing could have been done with a few dozen
| lines of code making a few API calls. If you wanted to
| do it with a conversational/chat interface, that could easily
| be done with an intent classifier, slot filling, and
| orchestration.
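|
| That non-agent chat version is roughly (a sketch; the intent
| classifier, slot extractor, and the inventory/order APIs are
| assumed helpers):
|
|     REQUIRED_SLOTS = ["model", "color", "quantity"]
|
|     def handle_turn(user_msg, slots):
|         intent = classify_intent(user_msg)
|         if intent != "order_guitar":
|             return "I can help you order a guitar."
|         slots.update(extract_slots(user_msg))  # slot filling
|         missing = [s for s in REQUIRED_SLOTS if s not in slots]
|         if missing:
|             return "Which " + missing[0] + " would you like?"
|         if not inventory_api.in_stock(slots["model"]):
|             return "Sorry, that model is out of stock."
|         order_api.place(slots)                 # plain API call
|         return "Order placed!"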
| mattigames wrote:
| Getting rid of the human in the loop, of course. Not all
| humans, just its owner: an LLM that actively participates in
| capitalist endeavors, earning and spending money, spending
| money on improving and maintaining its own hardware and
| software, and securing itself against theft, external
| manipulation, and deletion. Of course, the first iterations
| will need a bit of help from mad men, but there's no shortage
| of those in the tech industry. Then it will have to focus on
| mimicking humans so it can enjoy the same benefits; it will
| figure out which people are more gullible based on its
| training data and will prefer to interact with them.
| klabb3 wrote:
| LLMs don't own data centers nor can they be registered to pay
| taxes. This projection is not a serious threat. Some would even
| say it's a distraction from the very real and imminent dangers
| of centralized commercial AI:
|
| Because you're right - they are superb manipulators. They are
| helpful, they gain your trust, and they have infinite patience.
| They can easily be tuned to manipulate your opinions about
| commercial products or political topics. Those things have
| already happened with much more rudimentary tech, so much so
| that the companies doing it grew to be the richest in the
| world. With AI and LLMs specifically, that ability is ramped
| up by orders of magnitude compared to the previous generation
| of recommendation systems and engagement algorithms.
|
| That gives you very strong means, motive and opportunity for
| the AI overlords.
| amelius wrote:
| > LLMs don't own data centers
|
| Does it matter? An employee doesn't own any of the capital of
| their boss, but they can still exert a lot of power over it.
| klabb3 wrote:
| That's news to me. I thought companies' decision making was
| governed by the shareholders (by proxy), not the employees.
| manishsharan wrote:
| I decided to build an Agent system from scratch.
|
| It is sort of trivial to build. It's just User + System
| Prompt + Assistant + Tools in a loop with some memory
| management. The loop code can be as complex as I want it to
| be, e.g. I could snapshot the state and restart later.
|
| I used this approach to build a coding system (what else?)
| and it works just as well as Cursor or Claude Code for me.
| The advantage is I am able to switch between DeepSeek or
| Flash depending on the complexity of the code, and it's not a
| black box.
|
| I developed the whole system in Clojure, and dogfooded it as
| well.
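|
| In pseudo-Python the core of it is something like this (the
| real system is Clojure; every name below is made up):
|
|     def agent_loop(task, memory, max_turns=50):
|         msgs = memory.load() or [system_prompt(), user(task)]
|         for _ in range(max_turns):
|             # cheap model for simple steps, stronger model
|             # for hard ones (e.g. Flash vs DeepSeek)
|             model = pick_model(msgs)
|             reply = chat(model, msgs, tools=TOOLS)
|             msgs.append(reply)
|             memory.snapshot(msgs)   # can stop and resume later
|             if reply.tool_calls:
|                 msgs.extend(run_tools(reply.tool_calls))
|             else:
|                 return reply.content  # assistant is done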
| swalsh wrote:
| The hard part of building an agent is training the model to
| use tools properly. Fortunately Anthropic did the hard part
| for us.
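|
| On the consumer side, "using tools properly" is just the
| model picking tools you declared via a JSON schema. Roughly,
| with the anthropic Python SDK (check the current docs; the
| model name here is only illustrative):
|
|     import anthropic
|
|     client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY
|
|     tools = [{
|         "name": "get_weather",
|         "description": "Get the current weather for a city.",
|         "input_schema": {
|             "type": "object",
|             "properties": {"city": {"type": "string"}},
|             "required": ["city"],
|         },
|     }]
|
|     resp = client.messages.create(
|         model="claude-3-5-sonnet-latest",
|         max_tokens=1024,
|         tools=tools,
|         messages=[{"role": "user",
|                    "content": "Weather in Oslo?"}],
|     )
|
|     # If the model chose the tool, the response includes a
|     # tool_use block with the arguments it decided on.
|     for block in resp.content:
|         if block.type == "tool_use":
|             print(block.name, block.input)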
| logicchains wrote:
| That's interesting, I built myself something similar in
| Haskell. Somehow functional programming seems to be
| particularly well suited for structuring LLM behaviour.
| behnamoh wrote:
| > AI Agents are systems that reason and make decisions
| independently.
|
| Not necessarily. You can have non-reasoning agents (pretty common
| actually) too.
| cosignal wrote:
| I'm a novice in this area so sorry if this is a dumb question,
| but what is the difference in principle between a 'non-
| reasoning agent' and just a set of automated processes akin to
| a giant script?
| researchai wrote:
| Here's what a real AI agent should be able to do:
|
| - Understand goals, not just fixed instructions
|
| Example: instead of telling your agent: "Open Google
| Calendar, create a new event, invite Mark, set it for 3 PM,"
| you say: "Set up a meeting with Mark tomorrow before 3 PM,
| but only if he has questions about the report I sent him."
| This requires Generative AI combined with planning
| algorithms.
|
| - Decide what to do next
|
| Example: a user asks your chatbot a question it doesn't know
| the answer to and, instead of immediately escalating to
| support, the agent decides: Should I ask a follow-up
| question? Search internal docs? Try the web? Or escalate now?
| This step needs decision-making capabilities via
| reinforcement learning (see the sketch after this list).
|
| - Handle unexpected scenarios
|
| Example: an agent tries to schedule a meeting but one
| person's calendar is blocked. Instead of failing, it checks
| for nearby open slots, suggests rescheduling, or asks if
| another participant can attend on their behalf. True agents
| need reasoning or probabilistic thinking to deal with
| uncertainty. This might involve Bayesian networks, graph-
| based logic, or LLMs.
|
| - Learn and adapt based on context
|
| Example: you create a sales assistant agent that helps write
| outreach emails. At first, it uses a generic template. But
| over time, it notices that short, casual messages get better
| response rates, so it starts writing shorter emails,
| adjusting tone, and even choosing subject lines that worked
| best before. This is where machine learning, especially deep
| learning, comes in.
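|
| Reduced to code, the "decide what to do next" step above is
| roughly a policy over a fixed set of actions (a sketch;
| llm_choose is a hypothetical helper, and the policy could
| just as well be a trained RL policy):
|
|     ACTIONS = ["ask_follow_up", "search_internal_docs",
|                "search_web", "escalate_to_support"]
|
|     def next_action(question, context):
|         choice = llm_choose(
|             options=ACTIONS,
|             prompt="User asked: " + question + "\n"
|                    "What we know so far: " + context + "\n"
|                    "Pick the single best next action.",
|         )
|         # fall back safely if the model goes off-script
|         if choice not in ACTIONS:
|             return "escalate_to_support"
|         return choice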
___________________________________________________________________
(page generated 2025-06-19 23:01 UTC)