[HN Gopher] From LLM to AI Agent: What's the Real Journey Behind...
___________________________________________________________________
From LLM to AI Agent: What's the Real Journey Behind AI System
Development?
Author : codelink
Score : 111 points
Date : 2025-06-19 09:29 UTC (13 hours ago)
(HTM) web link (www.codelink.io)
(TXT) w3m dump (www.codelink.io)
| nilirl wrote:
| > AI Agents can initiate workflows independently and determine
| their sequence and combination dynamically
|
| I'm confused.
|
| A workflow has hardcoded branching paths; explicit if conditions
| and instructions on how to behave if true.
|
| So for an agent, instead of specifying explicit if conditions,
| you specify outcomes and you leave the LLM to figure out what if
| conditions apply and how to deal with them?
|
| In the case of this resume screening application, would I just
| provide the ability to make API calls and then add this to the
| prompt: "Decide what a good fit would be."?
|
| Are there any serious applications built this way? Or am I
| missing something?
| manojlds wrote:
| Not all applications need to be built this way. But the most
| serious apps built this way would be deep research tools.
|
| Recent article from Anthropic -
| https://www.anthropic.com/engineering/built-multi-agent-rese...
| alganet wrote:
| An AI company doing it is the corporate equivalent of "works
| on my machine".
|
| Can you give us an example of a company not involved in AI
| research that does it?
| herval wrote:
| There's plenty of companies using these sorts of agentic
| systems these days already. In my case, we wrote an LLM
| that knows how to fetch data from a bunch of sources (logs,
| analytics, etc) and root causes incidents. Not all sources
| make sense for all incidents, most queries have crazy high
| cardinality and the data correlation isn't always possible.
| LLMs being pattern matching machines, this allows them to
| determine what to fetch, then pattern-match a cause based on
| the other tools they have access to (e.g. runbooks, Google
| searches).
|
| I built incident detection systems in the past, and this
| was orders of magnitude easier and more generalizable for
| new kinds of issues. It still gives meaningless/obvious
| reasoning frequently, but it's far, far better than the
| alternatives...
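|
| Roughly, the shape of that loop (a simplified sketch with
| made-up names, not our actual system; call_llm() is a
| stand-in for any chat-completion API):
|
|     import json
|
|     # Stub data sources; real ones hit logging/analytics APIs.
|     TOOLS = {
|         "fetch_logs": lambda svc: "<logs for " + svc + ">",
|         "fetch_metrics": lambda svc: "<metrics for " + svc + ">",
|         "search_runbooks": lambda q: "<runbook hits for " + q + ">",
|     }
|
|     def call_llm(messages):
|         # Hypothetical wrapper. Returns JSON text, either
|         # {"tool": ..., "args": {...}} or {"answer": "..."}.
|         raise NotImplementedError
|
|     def root_cause(incident, max_steps=5):
|         msgs = [
|             {"role": "system", "content":
|              "You triage incidents. Reply with JSON: a tool "
|              "call, or a final answer naming the root cause."},
|             {"role": "user", "content": incident},
|         ]
|         for _ in range(max_steps):
|             reply = json.loads(call_llm(msgs))
|             if "answer" in reply:    # enough evidence gathered
|                 return reply["answer"]
|             # the model decides which source to query next
|             out = TOOLS[reply["tool"]](**reply["args"])
|             msgs.append({"role": "assistant",
|                          "content": json.dumps(reply)})
|             msgs.append({"role": "user",
|                          "content": "Tool result: " + out})
|         return "no confident root cause"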
| alganet wrote:
| > we wrote an LLM
|
| Excuse me, what?
| herval wrote:
| LLM _automation_. I'm sure you could understand the
| original comment just fine.
| alganet wrote:
| I didn't. This also confused me:
|
| > LLMs being pattern matching machines
|
| LLMs are _not_ pattern matching. I'm not being pedantic.
| It is really hard and unreliable to approach them with a
| pattern matching mindset.
| herval wrote:
| if you say so
| alganet wrote:
| I stand by it.
|
| You can definitely take a base LLM model, then train it on
| existing, prepared root cause analysis data. But that's very
| hard, expensive, and might not work, leaving the model
| brittle. Also, that's not what an "AI Agent" is.
|
| You could also make a workflow that prepares the data,
| feeds it into a regular model, then asks prepared
| questions about that data. That's inference, not pattern
| matching. There's no way an LLM will be able to identify
| the root cause reliably. You'll probably need a human to
| evaluate the output at some point.
|
| What you mentioned doesn't look like either one of these.
| nilirl wrote:
| Thanks for the link, it taught me a lot.
|
| From what I gather, you can build an agent for a task as long
| as:
|
| - you trust the decision making of an LLM for the required
| type of decision; decisions framed as some kind of evaluation
| of text feel right.
|
| - and if the penalty for being wrong is acceptable.
|
| Just to go back to the resume screening application, you'd
| build an agent if:
|
| - you asked the LLM to make an evaluation based on the text
| content of the resume, any conversation with the applicant,
| and the declared job requirement.
|
| - you had a high enough volume of resumes that false
| negatives won't be too painful.
|
| It seems like framing problems as search problems helps model
| these systems effectively. They're not yet capable of design,
| i.e., being responsible for coming up with the job requirement
| itself.
| mickeyp wrote:
| > A workflow has hardcoded branching paths; explicit if
| conditions and instructions on how to behave if true.
|
| That is very much true of the systems most of us have built.
|
| But you do not have to do this with an LLM; in fact, the LLM
| may decide it will not follow your explicit conditions and
| instructions regardless of how hard you try.
|
| That is why LLMs are used to review the output of LLMs to
| ensure they follow the core goals you originally gave them.
|
| For example, you might ask an LLM to lay out how to cook a
| dish. Then use a second LLM to review if the first LLM followed
| the goals.
|
| This is one of the things tools like DSPy try to do: you
| remove the hand-written prompt and instead declare high-level
| concepts like "input" and "output", plus reward/scoring
| functions (which might be a mix of LLM judges and human-coded
| functions) that assess whether the output is correct given
| that input.
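|
| A rough sketch of that pattern with DSPy (the signature,
| field names, and metric below are made up for illustration;
| check the DSPy docs for the current API):
|
|     import dspy
|
|     # A signature declares the "input -> output" contract
|     # instead of a hand-written prompt.
|     class RecipeCheck(dspy.Signature):
|         """Did the generated recipe satisfy the stated goal?"""
|         goal = dspy.InputField()
|         recipe = dspy.InputField()
|         verdict = dspy.OutputField(desc="pass/fail + reason")
|
|     checker = dspy.Predict(RecipeCheck)
|
|     # A scoring function (human-coded here, could be an LLM
|     # judge) that DSPy optimizers can use as a metric.
|     def metric(example, prediction, trace=None):
|         return prediction.verdict.lower().startswith("pass")
|
| Calling checker(goal=..., recipe=...) then plays the role of
| the second, reviewing LLM in the cooking example above.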
| rybosome wrote:
| AI code generation tools work like this.
|
| Let me reword your phrasing slightly to make an illustrative
| point:
|
| > so for an employee, instead of specifying explicit if
| conditions, you specify outcomes and you leave the human to
| figure out what if conditions apply and how to deal with them?
|
| > Are there any serious applications built this way?
|
| We have managed to build robust, reliable systems on top of
| fallible, mistake-riddled, hallucinating, fabricating,
| egotistical, hormonal humans. Surely we can handle a little
| non-determinism in our computer programs? :)
|
| In all seriousness, having spent the last few years employed in
| this world, I feel that LLM non-determinism is an engineering
| problem just like the non-determinism of making an HTTP
| request. Admittedly, it's not one we have much prior art on
| dealing with in this field, but that's what is so exciting
| about it.
| nilirl wrote:
| Yes, I see your analogy between fallible humans and fallible
| AI.
|
| It's not the non-determinism that was bothering me, it was
| the decision making capability. I didn't understand what
| kinds of decisions I can rely on an LLM to make.
|
| For example, with the resume screening application from the
| post, where would I draw the line between the agent and the
| human?
|
| - If I gave the AI agent access to HR data and employee
| communications, would it be able to decide when to create a
| job description?
|
| - And design the job description itself?
|
| - And email an opening round of questions for the candidate
| to get a better sense of the candidates who apply?
|
| Do I treat an AI agent just like I would a human new to the
| job? Keep working on it until I can trust it to make domain-
| specific decisions?
| diggan wrote:
| > would it be able to decide when to create a job description?
|
| If you can encode in text how you/your company makes that
| decision as a human, I don't see why not. But personally,
| there is a lot of subjectivity (for better or worse) in
| hiring processes, and I'm not sure I'd want a probabilistic
| rules engine to make those sorts of calls.
|
| My current system prompt for coding with LLMs basically looks
| like I've written down my own personal rules for programming.
| Anytime I get a result I don't like, I write down why I
| didn't like it and codify it in my reusable system prompt;
| then it doesn't make those (imo) mistakes anymore.
|
| I don't think I could realistically get an LLM to do
| something I don't understand the process of myself, and
| once you grok the process, you can understand if using an
| LLM here makes sense or not.
|
| > Do I treat an AI agent just like I would a human new to
| the job?
|
| No, you treat it as something much dumber. You can
| generally rely on some sort of "common sense" in a human
| that they built up during their time on this planet. But
| you cannot do that with LLMs: while they're super-human in
| some ways, they are still way "dumber" in other ways.
|
| For example, a human new to a job would pick up things
| autonomously, while an LLM does not. You need to pay
| attention to what you have to "teach" the LLM by changing
| what Karpathy calls the "programming" of the LLM, which would
| be the prompts. Anything you forget to tell it, the LLM will
| do whatever with; it only follows exactly what you say. A
| human you can usually tell "don't do that in the future" and
| they'll avoid it in the right context. An LLM you can scream
| at for 10 hours about how it's doing something wrong, but
| unless you update the programming it will keep making that
| mistake forever; and if you correct it in one context but
| reuse the prompt in another, the LLM won't suddenly
| understand that the correction doesn't make sense there.
|
| Just an example: I wanted some quick and dirty throwaway code
| for generating a graph, and in my prompt I mixed up the X and
| Y axes, and of course got a function that didn't work as
| expected. If a human had been doing it, it would have been
| quite obvious I didn't want time on the Y axis and value on
| the X axis, because the graph wouldn't make any sense, but
| the LLM happily complied.
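|
| Concretely, the "programming" is nothing fancier than this
| kind of accumulated rule list (an invented example, not my
| actual prompt):
|
|     # Every time a result annoys me, the reason becomes a
|     # new rule here.
|     CODING_RULES = [
|         "Prefer the standard library over new dependencies.",
|         "No clever one-liners; optimize for readability.",
|         "Always handle errors around I/O.",
|     ]
|
|     SYSTEM_PROMPT = (
|         "You write code for me. Follow these rules strictly:\n"
|         + "\n".join("- " + rule for rule in CODING_RULES)
|     )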
| nilirl wrote:
| So, if the humans have to model the task, the domain, and
| the process currently used to solve it, why not just write
| the code to do so?
|
| Is the main benefit that we can do all of this in natural
| language?
| cyberax wrote:
| A lot of the time it's faster to ask an LLM. Treat it as
| autocomplete on steroids.
| lowwave wrote:
| >Is the main benefit that we can do all of this in
| natural language?
|
| Hit the nail right on the head. That is pretty much the
| breakthrough LLMs have been. They allow the class of
| non-programmer developers to do tasks that were once only for
| developers and programmers. Seems like a great fit for CEOs
| and management as well.
| candiddevmike wrote:
| Natural language cannot describe these things as concretely,
| repeatably, or verifiably as code.
| rybosome wrote:
| The honest answer is that we are still figuring out where
| to draw the line between an agent and a human, because that
| line is constantly shifting.
|
| Given your example of the resume screening application from
| the post and today's capabilities, I would say:
|
| 1) Should agents decide when to create a job post? Not
| without human oversight - a proactive suggestion to a human
| is great.
|
| 2) Should agents design the job description itself? Yes, with
| the understanding that an experienced human, namely the
| hiring manager, will review and approve as well.
|
| 3) Should an agent email an opening round of questions to the
| candidates? Definitely allowed with oversight, and
| potentially even without human approval depending on how well
| it does.
|
| It's true that to improve all 3 of these it would take a
| lot of work with respect to building out the tasks,
| evaluations, and flows/tools/tuned models, etc. But, you
| can also see how much this empowers a single individual in
| their productivity. Imagine being one recruiter or HR
| employee with all of these agents completing these tasks
| for you effectively.
|
| EDIT: Adding that this pattern of "agent does a bunch of
| stuff and asks for human review/approval" is, I think, one of
| the fundamental workflows we will have to adopt to deal
| productively with non-determinism.
|
| This applies to an AI radiologist asking a human to approve
| their suggested diagnosis, an AI trader asking a human to
| approve a trade with details and reasoning, etc. Just like
| small-scale AI like Copilot asking you to approve a
| line/several lines, or tools like Devin asking you to
| approve a PR.
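|
| A minimal sketch of that review/approval gate (the function
| names are hypothetical, and the reviewer here is assumed to
| return "approve", "reject", or "revise" plus feedback):
|
|     def propose_and_gate(draft_fn, review_fn, max_rounds=3):
|         # The agent drafts; a human approves, rejects, or
|         # sends it back with feedback. Nothing ships
|         # unapproved.
|         feedback = None
|         for _ in range(max_rounds):
|             draft = draft_fn(feedback)       # agent's work
|             verdict, feedback = review_fn(draft)  # human gate
|             if verdict == "approve":
|                 return draft
|             if verdict == "reject":
|                 return None
|             # "revise": loop again with the human's feedback
|         return None
|
| The same shape fits the radiologist, the trader, and the PR
| review: the agent proposes, the human disposes.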
| Kapura wrote:
| One of the key advantages of computers has, historically,
| been their ability to compute and remember things accurately.
| What value is there in backing out of these in favour of LLM-
| based computation?
| nilirl wrote:
| They're able to handle large variance in their input, right
| out of the box.
|
| I think the appeal is code that handles changes in the
| world without having to change itself.
| bluefirebrand wrote:
| That's not very useful though, unless it is predictable
| and repeatable?
| spacecadet wrote:
| More or less. Serious? I'm not sure yet.
|
| I have several agent side projects going, the most complex and
| open ended is an agent that performs periodic network traffic
| analysis. I use an orchestration library with a "group chat"
| style orchestration. I declare several agents that have
| instructions and access to tools.
|
| These range from termshark scripts for collecting packets to
| analysis functions I had previously written for analyzing the
| traffic myself.
|
| I can then say something like, "Is there any suspicious
| activity?" and the agents collaboratively choose who (which
| agent) performs their role and therefore their tasks (i.e.
| tools), and work together to collect data, analyze the data,
| and return a response.
|
| I also run this on a schedule where the agents know about the
| schedule and choose to send me an email summary at specific
| times.
|
| I have noticed that the models/agents are very good at
| picking the "correct" network interface without much input,
| and that they understand their roles and objectives and
| execute accordingly, again without much direction from me.
|
| Now the big/serious question: is the output even good or
| useful? Right now, with my toy project, it is OK. Sometimes
| it's great and sometimes it's not; sometimes they spam my
| inbox with micro updates.
|
| I'm bad at sharing projects, but if you are curious:
| https://github.com/derekburgess/jaws
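|
| The "group chat" orchestration boils down to something like
| this (a rough sketch, not code from the repo above;
| pick_next_speaker would usually itself be an LLM call):
|
|     def group_chat(agents, task, rounds=8):
|         # agents: dict of name -> callable that reads the
|         # transcript, maybe runs its tools, and returns a
|         # message.
|         transcript = [("user", task)]
|         for _ in range(rounds):
|             # An orchestrator decides who speaks next based
|             # on the conversation so far.
|             name = pick_next_speaker(agents, transcript)
|             message = agents[name](transcript)
|             transcript.append((name, message))
|             if "FINAL ANSWER" in message:
|                 break
|         return transcript[-1][1]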
| dist-epoch wrote:
| Resume screening is a clear workflow case: analyze resume ->
| rank against others -> make decision -> send next
| phase/rejection email.
|
| An agent is like Claude Code, where you say to it "fix this
| bug", and it will choose a sequence of various actions - change
| code, run tests, run linter, change code again, do a git
| commit, ask user for clarification, change code again.
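|
| The workflow version is literally a fixed pipeline; roughly
| (a sketch -- llm_extract, llm_rank and send_email are assumed
| helpers, not a real library):
|
|     PASS_THRESHOLD = 0.7
|
|     def screen_resume(resume_text, email, job_req, pool):
|         # The steps are hardcoded; the LLM is only used inside
|         # a step, never to choose the steps.
|         summary = llm_extract(resume_text)          # analyze
|         score = llm_rank(summary, job_req, pool)    # rank
|         if score >= PASS_THRESHOLD:                 # decide
|             send_email(email, template="next_phase")
|         else:
|             send_email(email, template="rejection")
|
| The agent version replaces that fixed sequence with a loop in
| which the model picks the next action itself.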
| DebtDeflation wrote:
| Almost every enterprise use case is a clear workflow use case
| EXCEPT coding/debugging and research (e.g., iterative web
| search and report compilation). I saw a YT video the other
| day of someone building an AI Agent to take online orders for
| guitars - query the catalog and present options to the user,
| take the configured order from the user, check the inventory
| system to make sure it's available, and then place the order
| in the order system. There was absolutely no reason to build
| this as an Agent, burning an ungodly number of tokens with
| verbose prompts running in a loop, only to have it generate
| JSON in the exact format specified to place the final order,
| when the same thing could have been done with a few dozen
| lines of code making a few API calls. If you wanted to
| do it with a conversational/chat interface, that could easily
| be done with an intent classifier, slot filling, and
| orchestration.
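|
| That non-agent chat version is roughly (a sketch; the intent
| classifier, slot extractor, and the inventory/order APIs are
| assumed helpers):
|
|     REQUIRED_SLOTS = ["model", "color", "quantity"]
|
|     def handle_turn(user_msg, slots):
|         intent = classify_intent(user_msg)
|         if intent != "order_guitar":
|             return "I can help you order a guitar."
|         slots.update(extract_slots(user_msg))  # slot filling
|         missing = [s for s in REQUIRED_SLOTS if s not in slots]
|         if missing:
|             return "Which " + missing[0] + " would you like?"
|         if not inventory_api.in_stock(slots["model"]):
|             return "Sorry, that model is out of stock."
|         order_api.place(slots)                 # plain API call
|         return "Order placed!"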
| mattigames wrote:
| Getting rid of the human in the loop, of course. Not all
| humans, just its owner: an LLM that actively participates in
| capitalist endeavors, earning and spending money, spending
| money on improving and maintaining its own hardware and
| software, and securing itself against theft, external
| manipulation, and deletion. Of course, the first iterations
| will need a bit of help from mad men, but there's no shortage
| of those in the tech industry. Then it will have to focus on
| mimicking humans so it can enjoy the same benefits; it will
| figure out which people are more gullible based on its
| training data and will prefer to interact with them.
| klabb3 wrote:
| LLMs don't own data centers nor can they be registered to pay
| taxes. This projection is not a serious threat. Some would even
| say it's a distraction from the very real and imminent dangers
| of centralized commercial AI:
|
| Because you're right - they are superb manipulators. They are
| helpful, they gain your trust, and they have infinite patience.
| They can easily be tuned to manipulate your opinions about
| commercial products or political topics. Those things have
| already happened with much more rudimentary tech, so much so
| that the companies doing it grew to be the richest in the
| world. With AI and LLMs specifically, that ability is ramped
| up by orders of magnitude compared to the previous generation
| of recommendation systems and engagement algorithms.
|
| That gives you very strong means, motive and opportunity for
| the AI overlords.
| amelius wrote:
| > LLMs don't own data centers
|
| Does it matter? An employee doesn't own any of the capital of
| their boss, but they can still exert a lot of power over it.
| klabb3 wrote:
| That's news to me. I thought companies' decision making was
| governed by the shareholders (by proxy), not the employees.
| manishsharan wrote:
| I decided to build an Agent system from scratch.
|
| It is sort of trivial to build. It's just User + System
| Prompt + Assistant + Tools in a loop with some memory
| management. The loop code can be as complex as I want it to
| be, e.g. I could snapshot the state and restart later.
|
| I used this approach to build a coding system (what else?)
| and it works just as well as Cursor or Claude Code for me.
| The advantage is I am able to switch between DeepSeek or
| Flash depending on the complexity of the code, and it's not a
| black box.
|
| I developed the whole system in Clojure, and dogfooded it as
| well.
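|
| In pseudo-Python the core of it is something like this (the
| real system is Clojure; every name below is made up):
|
|     def agent_loop(task, memory, max_turns=50):
|         msgs = memory.load() or [system_prompt(), user(task)]
|         for _ in range(max_turns):
|             # cheap model for simple steps, stronger model
|             # for hard ones (e.g. Flash vs DeepSeek)
|             model = pick_model(msgs)
|             reply = chat(model, msgs, tools=TOOLS)
|             msgs.append(reply)
|             memory.snapshot(msgs)   # can stop and resume later
|             if reply.tool_calls:
|                 msgs.extend(run_tools(reply.tool_calls))
|             else:
|                 return reply.content  # assistant is done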
| swalsh wrote:
| The hard part of building an agent is training the model to
| use tools properly. Fortunately Anthropic did the hard part
| for us.
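|
| On the consumer side, "using tools properly" is just the
| model picking tools you declared via a JSON schema. Roughly,
| with the anthropic Python SDK (check the current docs; the
| model name here is only illustrative):
|
|     import anthropic
|
|     client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY
|
|     tools = [{
|         "name": "get_weather",
|         "description": "Get the current weather for a city.",
|         "input_schema": {
|             "type": "object",
|             "properties": {"city": {"type": "string"}},
|             "required": ["city"],
|         },
|     }]
|
|     resp = client.messages.create(
|         model="claude-3-5-sonnet-latest",
|         max_tokens=1024,
|         tools=tools,
|         messages=[{"role": "user",
|                    "content": "Weather in Oslo?"}],
|     )
|
|     # If the model chose the tool, the response includes a
|     # tool_use block with the arguments it decided on.
|     for block in resp.content:
|         if block.type == "tool_use":
|             print(block.name, block.input)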
| logicchains wrote:
| That's interesting, I built myself something similar in
| Haskell. Somehow functional programming seems to be
| particularly well suited for structuring LLM behaviour.
| behnamoh wrote:
| > AI Agents are systems that reason and make decisions
| independently.
|
| Not necessarily. You can have non-reasoning agents (pretty common
| actually) too.
| cosignal wrote:
| I'm a novice in this area so sorry if this is a dumb question,
| but what is the difference in principle between a 'non-
| reasoning agent' and just a set of automated processes akin to
| a giant script?
| researchai wrote:
| Here's what a real AI agent should be able to do:
|
| - Understand goals, not just fixed instructions
|
| Example: instead of telling your agent: "Open Google
| Calendar, create a new event, invite Mark, set it for 3 PM,"
| you say: "Set up a meeting with Mark tomorrow before 3 PM,
| but only if he has questions about the report I sent him."
| This requires Generative AI combined with planning
| algorithms.
|
| - Decide what to do next
|
| Example: a user asks your chatbot a question it doesn't know
| the answer to and, instead of immediately escalating to
| support, the agent decides: Should I ask a follow-up
| question? Search internal docs? Try the web? Or escalate now?
| This step needs decision-making capabilities via
| reinforcement learning (see the sketch after this list).
|
| - Handle unexpected scenarios
|
| Example: an agent tries to schedule a meeting but one
| person's calendar is blocked. Instead of failing, it checks
| for nearby open slots, suggests rescheduling, or asks if
| another participant can attend on their behalf. True agents
| need reasoning or probabilistic thinking to deal with
| uncertainty. This might involve Bayesian networks, graph-
| based logic, or LLMs.
|
| - Learn and adapt based on context
|
| Example: you create a sales assistant agent that helps write
| outreach emails. At first, it uses a generic template. But
| over time, it notices that short, casual messages get better
| response rates, so it starts writing shorter emails,
| adjusting tone, and even choosing subject lines that worked
| best before. This is where machine learning, especially deep
| learning, comes in.
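|
| Reduced to code, the "decide what to do next" step above is
| roughly a policy over a fixed set of actions (a sketch;
| llm_choose is a hypothetical helper, and the policy could
| just as well be a trained RL policy):
|
|     ACTIONS = ["ask_follow_up", "search_internal_docs",
|                "search_web", "escalate_to_support"]
|
|     def next_action(question, context):
|         choice = llm_choose(
|             options=ACTIONS,
|             prompt="User asked: " + question + "\n"
|                    "What we know so far: " + context + "\n"
|                    "Pick the single best next action.",
|         )
|         # fall back safely if the model goes off-script
|         if choice not in ACTIONS:
|             return "escalate_to_support"
|         return choice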
___________________________________________________________________
(page generated 2025-06-19 23:01 UTC)