[HN Gopher] New tools for building agents
___________________________________________________________________
New tools for building agents
Author : meetpateltech
Score : 219 points
Date : 2025-03-11 17:04 UTC (5 hours ago)
(HTM) web link (openai.com)
(TXT) w3m dump (openai.com)
| nnurmanov wrote:
| Does anyone know if there's any difference between typing the
| question with typos vs. typing it correctly?
| davidbarker wrote:
| In theory there shouldn't be -- LLMs are pretty robust to typos
| and usually infer the intended meaning regardless.
| swyx wrote:
| swyx here. we got some preview and time with the API/DX team to
| ask FAQs about all the new APIs.
|
| https://latent.space/p/openai-agents-platform
|
| main fun part - since responses are stored for free by default
| now, how can we abuse the Responses API as a database :)
|
| other fun qtns that a HN crew might enjoy:
|
| - hparams for websearch - depth/breadth of search for making your
| own DIY Deep Research
|
| - now that OAI is offering RAG/reranking out of the box as part
| of the Responses API, when should you build your own RAG? (i
| basically think somebody needs to benchmark the RAG capabilities
| of the Files API now, because the community impression has not
| really updated from back when Assistants API was first launched)
|
| - whats the diff between Agents SDK and OAI Swarm? (basically
| types, tracing, pluggable LLMs)
|
| - will the `search-preview` and `computer-use-preview` finetunes
| be merged into GPT5?
| ggnore7452 wrote:
| appreciate the question on hparams for websearch!
|
| one of the main reasons i build these ai search tools from
| scratch is that i can fully control the depth and breadth (and
| also customize the loader for whatever data/sites). and
| currently the web search isn't very transparent about which
| sites they only have snippets for rather than full text.
|
| having computer use + websearch is definitely something very
| powerful (openai's deep research essentially)
| mritchie712 wrote:
| for anyone that likes the Agents SDK, but doesn't want their
| framework attached to OpenAI, we're really liking
| PydanticAI[0].
|
| 0 - https://ai.pydantic.dev/
| fullstackwife wrote:
| Openai SDK docs:
|
| > Notably, our SDK is compatible with any model providers
| that support the OpenAI Chat Completions API format.
|
| so you can use with everything, not only OpenAI?
| DrBenCarson wrote:
| Yes
| swyx wrote:
| yea they mention this on the pod
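Concretely, "compatible with the Chat Completions API format" means any endpoint that accepts the same request shape. A stdlib-only sketch of that wire format (the URL and model name below are hypothetical, not a real provider):

```python
import json
import urllib.request

# The Chat Completions wire format: any compatible provider accepts
# this payload shape at its /v1/chat/completions endpoint.
payload = {
    "model": "some-model",  # hypothetical model name
    "messages": [{"role": "user", "content": "Hello"}],
}
req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",  # hypothetical endpoint
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json",
             "Authorization": "Bearer YOUR_KEY"},
)
# urllib.request.urlopen(req) would send it; omitted so this runs offline.
```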
| darkteflon wrote:
| There's also HF's smolagents[1].
|
| 1 - https://github.com/huggingface/smolagents
| suttontom wrote:
| What is a "qtns"?
| oofbaroomf wrote:
| Questions.
| baxtr wrote:
| A bit off topic but the post comes in handy: can we settle the
| debate about what an agent really is? It seems like everyone has
| their own definition.
|
| Ok I'll start: an agent is a computer program that utilizes LLMs
| heutiger for decision making.
| codydkdc wrote:
| an agent is software that does something on behalf of someone
| (aka software)
|
| I personally strongly prefer the term "bots" for what most of
| these frameworks call "agents"
| handfuloflight wrote:
| Stick to the agentic nomenclature if you want at least an order
| of magnitude increase in valuation.
| 3stripe wrote:
| First rule of writing definitions: use everyday English.
| baxtr wrote:
| True! Meant heuristic
| knowaveragejoe wrote:
| I think Anthropic's definition makes the most sense.
|
| - Workflows are systems where LLMs and tools are orchestrated
| through predefined code paths. (imo this is what most people
| are referring to as "agents")
|
| - Agents, on the other hand, are systems where LLMs dynamically
| direct their own processes and tool usage, maintaining control
| over how they accomplish tasks.
|
| https://www.anthropic.com/engineering/building-effective-age...
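That distinction can be sketched as code (a toy illustration with hypothetical names, not Anthropic's implementation): a workflow hard-codes the path, while an agent lets the model choose each step.

```python
def workflow(task, llm, tools):
    # Workflow: LLM and tools orchestrated through a predefined code
    # path -- always search, then summarize.
    docs = tools["search"](task)
    return llm(f"Summarize: {docs}")

def agent(task, llm, tools, max_turns=5):
    # Agent: the model dynamically directs its own tool usage each turn,
    # replying e.g. "search: <query>" or "done: <answer>".
    history = [task]
    for _ in range(max_turns):
        action = llm("\n".join(history))
        name, _, arg = action.partition(": ")
        if name == "done":
            return arg
        history.append(tools[name](arg))
    return history[-1]
```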
| kodablah wrote:
| The problem with this definition is that modern workflow
| systems are not limited to predefined code paths; they too
| dynamically direct their own processes and tool usage.
| rglover wrote:
| Agents are just regular LLM chat bots that are prompted to
| parse user input into instructions about what functions to call
| in your back-end, with what data, etc. Basically it's a way to
| take random user input and turn it into pseudo-logic you can
| write code against.
|
| As an example, I can provide a system prompt that mentions a
| function like get_weather() being available to call. Then, I
| can pass whatever my user's prompt text is and the LLM will
| determine what code I need to call on the back-end.
|
| So if a user types "What is the weather in Nashville?" the LLM
| would infer that the user is asking about weather and reply to
| me with a string like "call function get_weather with location
| Nashville" or if you prompted it, some JSON like {
| function_to_call: 'get_weather', location: 'Nashville' }. From
| there, I'd just call that function with any of the data I asked
| the LLM to provide.
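The dispatch pattern described above can be sketched in a few lines (the function name and JSON shape come from the comment's hypothetical example, not any real API):

```python
import json

# Hypothetical backend function the LLM is told about in the system prompt.
def get_weather(location: str) -> str:
    return f"72F and sunny in {location}"

# Registry mapping function names the LLM may emit to real callables.
FUNCTIONS = {"get_weather": get_weather}

def dispatch(llm_reply: str) -> str:
    """Parse the LLM's JSON reply and call the named backend function."""
    call = json.loads(llm_reply)
    fn = FUNCTIONS[call["function_to_call"]]
    # Every other key in the JSON becomes a keyword argument.
    kwargs = {k: v for k, v in call.items() if k != "function_to_call"}
    return fn(**kwargs)

# Simulated LLM reply for: "What is the weather in Nashville?"
reply = '{"function_to_call": "get_weather", "location": "Nashville"}'
print(dispatch(reply))  # 72F and sunny in Nashville
```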
| kylecazar wrote:
| Even more off topic, does "heutiger" mean something in English
| that I'm unaware of? Google tells me it's just German for
| 'today' or 'current'.
| baxtr wrote:
| Never heard that word either!
| zellyn wrote:
| Notably not mentioned: Model Context Protocol
| https://www.anthropic.com/news/model-context-protocol
| nilslice wrote:
| not implementing doesn't mean it's not supported
| https://github.com/dylibso/mcpx-openai-node (this is for
| mcp.run tool calling with OpenAI models, not generic)
|
| but yes, it's the strongest anti-developer move to not directly
| support MCP. not surprised given OpenAI generally. but would be
| a very nice addition!
| benatkin wrote:
| DeepSeek doesn't seem to support it either FWIW. Maybe MCP is
| just an Anthropic thing.
| esafak wrote:
| How do they compare?
| cowpig wrote:
| MCP is a protocol, and Anthropic has provided SDKs for
| implementing that protocol. In practice, I find the MCP
| protocol to be pretty great, but it leaves basically
| everything _except_ the model parts out. I.e. MCP really only
| addresses how "agentic" systems interact with one another,
| nothing else.
|
| This SDK is trying to provide a bunch of code for
| implementing specific agent codebases. There are a bunch of
| open source ones already, so this is OpenAI throwing their
| hat in the ring.
|
| IMO this OpenAI release is kind of ecosystem-hostile in that
| they are directly competing with their users, in the same way
| that the GPT apps were.
| esafak wrote:
| Thank you. Which open source ones are best?
| knowaveragejoe wrote:
| You can (somewhat) bridge between them:
|
| https://github.com/SecretiveShell/MCP-Bridge
| dgellow wrote:
| Do you have experience with MCP? If yes, what do you think of
| it?
| thenameless7741 wrote:
| it's mentioned in the main thread:
| https://nitter.net/athyuttamre/status/1899511569274347908
|
| > [Q] Does the Agents SDK support MCP connections? So can we
| easily give certain agents tools via MCP client server
| connections?
|
| > [A] You're able to define any tools you want, so you could
| implement MCP tools via function calling
| rvz wrote:
| They did not announce the price(s) in the presentation. Likely
| because they know it is going to be very expensive:
|
| * Web Search [0]: $30 and $25 per 1K queries for GPT-4o search
| and 4o-mini search.
|
| * File Search [1]: $2.50 per 1K queries, and file storage at
| $0.10/GB/day (first 1GB is free).
|
| * Computer use tool (computer-use-preview model) [2]: $3 per 1M
| input tokens and $12 per 1M output tokens.
|
| [0] https://platform.openai.com/docs/pricing#web-search
|
| [1] https://platform.openai.com/docs/pricing#built-in-tools
|
| [2] https://platform.openai.com/docs/pricing#latest-models
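At those rates costs add up quickly. A back-of-envelope sketch using the rates listed above (the volumes are made-up examples):

```python
# Listed rates: $30/1K GPT-4o search queries, $2.50/1K file search
# queries, $0.10/GB/day storage with the first 1 GB free.
WEB_SEARCH_PER_QUERY = 30 / 1000
FILE_SEARCH_PER_QUERY = 2.50 / 1000
STORAGE_PER_GB_DAY = 0.10

def monthly_cost(web_queries, file_queries, storage_gb, days=30):
    storage = max(storage_gb - 1, 0) * STORAGE_PER_GB_DAY * days
    return (web_queries * WEB_SEARCH_PER_QUERY
            + file_queries * FILE_SEARCH_PER_QUERY
            + storage)

print(monthly_cost(10_000, 10_000, 5))  # about 337.0 dollars
```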
| yard2010 wrote:
| So they're basically pivoting from selling text by the ounce to
| selling web searches and cloud storage? I like it, it's a bold
| move. When the slow people at Google finally catch up it might
| be too late for Google?
| KoolKat23 wrote:
| Google AI Studio's "Grounding" (basically web search) is priced
| similarly. (Very expensive for either, although Google gives
| you your first 1500 queries free).
|
| It seems completely upside down: they always said traditional
| search was cheaper/less intensive. I guess a lot of tokens
| must go into the actual LLM searching and retrieving.
| Areibman wrote:
| Nice to finally see one of the labs throwing weight behind a much
| needed simple abstraction. It's clear they learned from the
| incumbents (langchain et al)-- don't sell complexity.
|
| Also very nice of them to include extensible tracing. The
| AgentOps integration is a nice touch for getting behind the
| scenes to understand how handoffs and tool calls are triggered.
| esafak wrote:
| Extensible how?
| swyx wrote:
| why agentops specifically? doesnt the oai first party one also
| do it?
| bloomingkales wrote:
| Langchain felt like a framework that was designed to allow
| people to sell it on their resumes. So many ideas, it would
| easily take up one full line of a resume. I think it's super
| important not to let frameworks like that become incumbent
| right now, especially when everyone is in an exploration state.
| serjester wrote:
| This is one of the few agent abstractions I've seen that actually
| seems intuitive. Props to the OpenAI team, seems like it'll kill
| a lot of bad startups.
| sdcoffey wrote:
| Steve here from the OpenAI team; this means a lot! We really
| hope you enjoy building on it.
| ilaksh wrote:
| The Agents SDK they linked to comes up 404.
|
| BTW I have something somewhat similar to some of this like
| Responses and File Search in MindRoot by using the task API:
| https://github.com/runvnc/mindroot/blob/main/api.md
|
| Which could be combined with the query_kb tool from the mr_kb
| plugin (in my mr_kb repo) which is actually probably better than
| File Search because it allows searching multiple KBs.
|
| Anyway, if anyone wants to help with my program, create a plugin
| on PR, or anything, feel free to connect on GitHub, email or
| Discord/Telegram (runvnc).
| yablak wrote:
| Loads fine for me. Maybe because I'm logged in?
| IncreasePosts wrote:
| That should be a 403 then. Tsk tsk open ai
| 29ebJCyy wrote:
| Technically it should be a 401. Tsk tsk IncreasePosts.
| __float wrote:
| It's common (see: S3, private GitHub repos) to return 404
| instead of unauthorized to avoid even leaking existence
| of a resource at URL.
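The pattern __float describes can be sketched as a handler (hypothetical code, not S3's or GitHub's actual implementation):

```python
# Return 404 for both "missing" and "forbidden" so an unauthorized
# caller can't probe which private resources exist.
def get_resource(resource_id, user, store, acl):
    exists = resource_id in store
    allowed = (user, resource_id) in acl
    if not exists or not allowed:
        return 404  # identical answer either way; no existence leak
    return 200, store[resource_id]
```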
| anorak27 wrote:
| I have built myself a much simpler and more powerful version of
| the responses API, and it works with all LLM providers.
|
| https://github.com/Anilturaga/aiide
| nextworddev wrote:
| This may be bad for Langflow, Langsmith, etc
| nowittyusername wrote:
| How does this compare to MCP? Does anyone have any thoughts on
| the matter?
| mentalgear wrote:
| Well, I'll just wait 2-3 days until a (better) open-source
| alternative is released. :D
| jumploops wrote:
| > "we plan to formally announce the deprecation of the Assistants
| API with a target sunset date in mid-2026."
|
| The new Responses API is a step in the right direction,
| especially with the built-in "handoff" functionality.
|
| For agentic use cases, the new API still feels a bit limited, as
| there's a lack of formal "guardrails"/state machine logic built
| in.
|
| > "Our goal is to give developers a seamless platform experience
| for building agents"
|
| It will be interesting to see how they move towards this
| platform, my guess is that we'll see a graph-based control flow
| in the coming months.
|
| Now there are countless open-source solutions for this, but most
| of them fall short and/or add unnecessary obfuscation/complexity.
|
| We've been able to build our agentic flows using a combination of
| tool calling and JSON responses, but there's still a missing
| higher order component that no one seems to have cracked yet.
| hodanli wrote:
| I wonder why they phased out Pydantic in structured output for
| the Responses API.
| sdcoffey wrote:
| Hey there! This is Steve from the OpenAI team; I worked on
| the Responses API. We have not removed this! It should still
| work just like before! Here's an example:
|
| https://github.com/openai/openai-python/blob/main/examples/r...
| lunarcave wrote:
| (Shameless plug) I worked on something for anyone else wanting
| to get structured outputs from LLMs in a model agnostic way
| (Including Open AI models): https://github.com/inferablehq/l1m
| phren0logy wrote:
| I'm a bit surprised at the approach to RAG. It will be great to
| see how well it handles complex PDFs. The max size is _far_
| larger than the Anthropic API permits (though that's obviously
| very different - no RAG).
|
| The chunking strategy is... pretty basic, but I guess we'll see
| if it works well enough for enough people.
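For reference, a basic fixed-size chunking strategy of the kind being discussed looks roughly like this (the sizes are illustrative, not OpenAI's actual defaults):

```python
def chunk(text: str, size: int = 800, overlap: int = 200):
    # Slide a fixed-size window with overlap so context isn't cut
    # mid-thought at chunk boundaries.
    step = size - overlap
    return [text[i:i + size]
            for i in range(0, max(len(text) - overlap, 1), step)]
```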
| cosbgn wrote:
| We handle over 1M requests per month using the Assistant API on
| https://rispose.com which apparently will get deprecated
| mid-2026. Should we move to the new API?
| jstummbillig wrote:
| Eventually, yes. They addressed the Assistants API near the end
| of the video: they say there will be a transition path once
| they've built all Assistant features into the new API, plus
| ample time to take action.
| nknj wrote:
| there's no rush to do this - in the coming weeks, we will add
| support for:
|
| - assistant-like and thread-like objects to the responses api
|
| - async responses
|
| - code interpreter in responses
|
| once we do this, we'll share a migration guide that allows you
| to move over without any loss of features or data. we'll also
| give you a full 12 months to do your migration. feel free to
| reach out at nikunj[at]openai.com if you have any questions
| about any of this, and thank you so much for building on the
| assistants api beta! I think you'll really like responses api
| too!
| marko-k wrote:
| If Responses is replacing Assistants, is there a quickstart
| template available--similar to the one you had for
| Assistants?
|
| https://github.com/openai/openai-assistants-quickstart
| dmayle wrote:
| Is it just me, or is what OpenAI is really lacking a billing
| API/platform?
|
| As an engineer, I have to manage the cost/service ratio manually,
| making sure I charge enough to handle my traffic, while
| enforcing/managing/policing the usage.
|
| Additionally, there are customers who already pay for OpenAI, so
| the value add for them is less, since they are paying twice for
| the underlying capabilities.
|
| If OpenAI had a billing API/platform a la App Store/Play Store,
| I could have multiple price points matched to OpenAI usage
| limits (and maybe configurable profit margins).
|
| For customers that don't have an existing relationship with me,
| OpenAI could support a Netflix/YouTube-style profit-sharing
| system, where OpenAI customers can try out and use products
| integrated with the billing platform/API, and my products would
| receive payment in accordance with customer usage...
| mrcwinn wrote:
| One, if you charge above API costs, you should never police
| usage (so long as you're transparent with customers). Why would
| you need to cap usage if you're pricing correctly? (Rate limits
| aside)
|
| Two, yes, many people will pay $20/mo for ChatGPT and then also
| pay for a product that under the hood uses OpenAI API. If
| you're worried about your product's value not being
| differentiated from ChatGPT, I'd say you have a product problem
| moreso than OpenAI has a billing model problem.
| bloomingkales wrote:
| We need a subreddit on how everyone is managing token pricing.
| falcor84 wrote:
| I'm impressed by the advances in Computer Use mentioned here and
| this got me wondering - is this already mature enough to be
| utilized for usability testing? Would I be right to assume that
| in general, a UI that is more difficult for AI to navigate is
| likely to also be relatively difficult for humans, and that it's
| a signal that it should be simplified/improved in some way?
| m3t4man wrote:
| Why would you assume that? The modality of engagement is
| drastically different between the way an LLM engages with a UI
| and the way a human does.
| daviding wrote:
| It would have been nice if the Completions use of the internal
| web-search tool wasn't always mandatory and could be set to
| 'auto'. It would save a lot of reworking just to move to the
| new Responses API format for that one use case.
| theuppermiddle wrote:
| Does the SDK allow executing generated Python code in some sort
| of sandbox? If not, are there any open-source libraries that do
| this? I would ideally like the state of the executed code,
| including return values, to be available for the entire chat
| session, like IPython, so that subsequent LLM-generated code
| can use it.
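The IPython-like persistence part of that can be sketched with a shared namespace (note: plain exec is NOT a sandbox; real isolation needs a container or VM):

```python
namespace = {}

def run(code: str):
    # Each snippet executes against the same namespace, so later
    # LLM-generated code can reuse earlier results, IPython-style.
    exec(code, namespace)

run("x = 2 + 3")
run("y = x * 10")
print(namespace["y"])  # 50
```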
| sci_prog wrote:
| Yeah, OpenInterpreter does this (you are not limited to OpenAI
| only): https://github.com/OpenInterpreter/open-interpreter
|
| I wrote a wrapper around it that works in a web browser (you'll
| need an OpenAI API key):
| https://github.com/uhsealevelcenter/IDEA
| nekitamo wrote:
| Does the new Agents SDK support streaming audio and Realtime
| models?
| simonw wrote:
| There's a really good thread on Twitter from the designer of the
| new APIs going into the background behind many of the design
| decisions:
| https://twitter.com/athyuttamre/status/1899541471532867821
|
| Here's the alternative link for people who aren't signed in to
| Twitter:
| https://nitter.net/athyuttamre/status/1899541471532867821
| bradyriddle wrote:
| The nitter link is appreciated!
| cowpig wrote:
| Feels like OpenAI really wants to compete with its own
| ecosystem. I guess they are doing this to try to position
| themselves as the standard web index that everyone uses, the
| standard RAG service, etc.
|
| But they could just make great services and live in the infra
| layer instead of trying to squeeze everyone out at the
| application layer. Seems unnecessarily ecosystem-hostile.
| andrethegiant wrote:
| $25 per thousand searches seems excessive
| simonw wrote:
| If you want to get an idea for the changes, here's a giant commit
| where they updated ALL of the Python library examples in one go
| from the old chat completions API to the new responses API:
| https://github.com/openai/openai-python/commit/2954945ecc185...
___________________________________________________________________
(page generated 2025-03-11 23:00 UTC)