[HN Gopher] Tool Use (function calling)
___________________________________________________________________
Tool Use (function calling)
Author : akadeb
Score : 203 points
Date   : 2024-04-04 21:35 UTC (1 day ago)
(HTM) web link (docs.anthropic.com)
(TXT) w3m dump (docs.anthropic.com)
| CraftingLinks wrote:
| Thank you! I was waiting for this.
| hubraumhugo wrote:
| > All models can handle correctly choosing a tool from 250+
| tools, provided the user query contains all necessary parameters
| for the intended tool, with >90% accuracy.
|
| This is pretty exciting news for everybody working with agentic
| systems. OpenAI has way lower recall.
|
| I'm now migrating from GPT function calls to Claude tools and
| will report back on the evaluation results.
| vorticalbox wrote:
| I've noticed this too: it will usually pick tools listed first
| rather than a more suitable tool further down the list.
|
| Sometimes it will outright state that it can't do something,
| then after being told "use the browse_website tool" it will
| magically remember it has the tool.
| iAkashPaul wrote:
| You should try the new HF TGI server, it has both grammar & tool
| support now. Works fabulously with Mistral Instruct & Mixtral
| Instruct.
| Takennickname wrote:
| What's grammar support?
| iAkashPaul wrote:
| It's basically a set of constraints the output has to
| adhere to. E.g. your prompt asks for a summary, but the
| LLM won't necessarily just spit out the summary: there
| might be phrases like "here's the summary" before the
| actual summary, or even basic JSON key/values can get
| messed up. A grammar lets you define the expected output
| in a variety of ways; Pydantic class definitions,
| Microsoft's Guidance, and Outlines, along with llama.cpp's
| grammars, are all attempts at making structured output
| reliable.
|
| Most of LangChain is basically a specific prompt with
| exception handling for missing fields.
| dartos wrote:
| Even better than guidelines!
|
| Grammars perfectly restrict LLM token output. They're
| hard grammar rules.
| sp332 wrote:
| You can give it a formal definition for valid JSON, and it
| will only generate output that matches.
| padolsey wrote:
| It's hard to communicate about this stuff. I think people hear
| 'tools' and 'function calling' and assume it provides an actual
| suite of tools or pre-made routines that it calls upon on the
| Anthropic backend. But nope. It's just a way of generating a
| structured schema from a prompt. It's really crucial work, but
| just funny how obscured the boring truth is. Also FWIW I
| experience a tonne more schema adherence if I use XML-like
| semantic tags rather than JSON. XML is so much more forgiving a
| format too.
| autonomousErwin wrote:
| I find this far more useful than a suite of tools or "AI
| agents" which always work well in a controlled development
| environment but not so much further than that.
|
| Function calling is a great step towards actually
| productionizing LLMs and making them extremely robust - I
| remember when the GPT-3 API first came out and I was furiously
| making sequential calls with complex if/else and try/catch
| statements and using a couple of Python libraries for the
| simple reason that I needed the output to be valid JSON. It
| was surprisingly hard until function calling solved this.
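The pre-function-calling pattern the comment describes can be sketched as a parse-and-retry loop; the model call here is a stub standing in for a real API client:

```python
# Ask for JSON, try to parse, and retry with the error fed back in.
# `call_llm` is any callable prompt -> str; here it is stubbed.
import json

def extract_json(call_llm, prompt: str, max_retries: int = 3) -> dict:
    last_error = None
    for attempt in range(max_retries):
        raw = call_llm(prompt if attempt == 0
                       else f"{prompt}\nReturn ONLY valid JSON. Error: {last_error}")
        try:
            return json.loads(raw)
        except json.JSONDecodeError as e:
            last_error = e
    raise ValueError(f"no valid JSON after {max_retries} attempts: {last_error}")

# Stubbed model that fails once, then complies, mimicking real flakiness.
replies = iter(['Here is the JSON: {"city": "NYC"}', '{"city": "NYC"}'])
result = extract_json(lambda p: next(replies), "Give me a city as JSON")
```

Function calling APIs make this loop mostly unnecessary, since the structured output is returned in a dedicated field rather than scraped out of free text.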
| padolsey wrote:
| Yeh agreed. Function calling FTW-- just need a bit more
| reliability/(semi-?)-idempotence.
| logicchains wrote:
| Open LLMs can use grammar-based sampling to guarantee
| syntactically correct JSON is produced, surprised OpenAI never
| incorporated anything like that.
| rolisz wrote:
| My concern with grammar-based sampling is that it makes the
| model dumber: after all, you are forcing it to say something
| other than what it thought would be best.
| remilouf wrote:
| Looks like it's quite the opposite:
| http://blog.dottxt.co/performance-gsm8k.html
| CuriouslyC wrote:
| While some of the downvotes are justified because you're
| selling this short, I want to point out that your comment about
| XML is actually valid to a degree. I've found that using XML
| for prompts lets you annotate specific keywords/phrases and
| impose structure on the prompt, which can produce better
| results.
|
| Getting results back in XML though? That's a terrible idea;
| you're asking for parsing errors. YAML is the best format for
| getting structured data from LLMs, because if there's a parse
| error you typically only lose the malformed bits.
| epr wrote:
| XML and other document markup languages are objectively
| horrible data storage formats. Why is "forgiving" a desired
| quality in this case?
| poxrud wrote:
| It's much more than just generating structured schema. It also
| understands user intent and assigns the correct functions to
| solve a query. So for example if we give it two functions
| getWeather(city) and getTime(city) and ask "what's the weather
| in New York?" It will decide on the correct function to use. It
| will also know to use both functions if we ask it "what's the
| time and weather in New York?".
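The dispatch side of that example can be sketched as a name-to-function table; the model's tool selection is mocked here as structured output, and `getWeather`/`getTime` are stand-in stubs:

```python
# The model (mocked) returns which tool(s) to call and with what
# arguments; your code maps tool names to real functions and runs them.
def get_weather(city: str) -> str:
    return f"72F and sunny in {city}"  # stub for a real weather lookup

def get_time(city: str) -> str:
    return f"09:00 in {city}"  # stub for a real time lookup

TOOLS = {"getWeather": get_weather, "getTime": get_time}

def run_tool_calls(tool_calls: list[dict]) -> list[str]:
    """tool_calls is the model's structured output, e.g.
    [{"name": "getWeather", "arguments": {"city": "New York"}}]."""
    return [TOOLS[c["name"]](**c["arguments"]) for c in tool_calls]

# "what's the time and weather in New York?" -> model picks both tools:
results = run_tool_calls([
    {"name": "getWeather", "arguments": {"city": "New York"}},
    {"name": "getTime", "arguments": {"city": "New York"}},
])
```

The intent-to-tool mapping is the part the model does; everything after that is ordinary dispatch in your own code.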
| regularfry wrote:
| I do wonder if a stack-based format would be easier for an LLM.
| Seems like a better fit for the attention mechanism. My
| suspicion (without having lifted a finger to check) is that
| it's the closing tags that make the difference for XML. Go
| stack-based and you can drop the opening tags, and save the
| tokens.
| rcarmo wrote:
| I do hope we converge on a standardized API and schema for this.
| Testing and integrating multiple LLMs is tiresome with all the
| silly little variations in API and prompt formatting.
| TZubiri wrote:
| Langchain.
|
| But it's too bleeding edge, you are asking a lot.
|
| Just do the work and don't be spoiled senseless
| rcarmo wrote:
| Langchain, for all its popularity, is some of the worst, most
| brittle Python code I've ever seen or tried to use, so I'd
| prefer to have things sorted out for me at the API level.
| supafastcoder wrote:
| I switched to using Instructor/Marvin which works really
| nicely with native pydantic models and gets out the way for
| everything else.
| cjonas wrote:
| Imo it's unfortunate that Python is the dominant tech of
| this domain. TypeScript is better suited for the inference
| side of things (I know there's a TS version of most things,
| but most companies are looking for Python devs).
| rcarmo wrote:
| Python is fine. The problem is all the folk writing
| Python as if they were doing Java cosplay and without any
| tests or type annotations.
| ilaksh wrote:
| It looks very similar if not identical to OpenAI?
| habosa wrote:
| OpenRouter is a great step in that direction:
| https://openrouter.ai/
| oezi wrote:
| I hope they put a bit more effort into this compared to OpenAI.
|
| The most crucial things missing in OpenAI's implementation for me
| were:
|
| - Authentication for the API by the user rather than the
| developer.
|
| - Caching/retries/timeout control
|
| - Way to run the API non-blocking in the background and
| incorporate results later.
|
| - Dynamic API tools (use an API to provide the tools for a
| conversation) and API revisions (for instance by hosting the API
| spec under a URL/git).
| paulgb wrote:
| For authentication, since the tool call itself actually runs on
| your own server, can't you just look at who the authed user is
| that made the request?
| oezi wrote:
| OpenAI doesn't give you a way to identify the user.
|
| And even if they did, it would be poor UX to make the user
| visit our site first to connect their API accounts.
|
| I also imagine many tools wouldn't run under the developers'
| control (of course you could relay over your server).
| TZubiri wrote:
| Huh? You have to use your api key and pay for the service.
|
| Requests you make to the service providers are made on your
| own buck, you are supposed to track user stuff on your end.
| It would make no difference chatgpt wise who the user is,
| that's not part of the abstraction provided.
|
| Not a user auth SaaS, an LLM SaaS.
| michaelt wrote:
| Presumably oezi wants to do that complicated three-party
| OAuth stuff.
|
| Like when you use an online PDF editor with Google Drive
| integration - paying for the storage etc is between
| Google and the user, the files belong to the user's
| Google Drive account, but the PDF editor gets read/write
| access to them.
| oezi wrote:
| Yes, exactly. Many existing APIs are hard/impossible to
| connect to unless you are the user.
| cjonas wrote:
| I think the disconnect is that he's talking about
| building plugins/"GPTs" inside of ChatGPT, while others
| are thinking about using the API to build something from
| scratch?
| kristjansson wrote:
| That's my read. And he's totally right! Plugins/GPTs
| aren't a good platform or product, partly for some of the
| technical reasons he mentioned, but really because
| they're basically a tech demo for the real product (the
| tool API).
| oezi wrote:
| Many interesting API usages must be bound to the user and
| must be paid for based on usage, so must be tied to the
| user. OpenAI doesn't provide ways to monetize GPTs, so it
| is hard to justify spending on behalf of the user.
| simonw wrote:
| I think you might be talking about GPTs with actions?
|
| This implementation of function calling works differently
| from those.
|
| OpenAI/Anthropic don't make any API calls themselves at all
| here. You call their chat APIs with a list of your own
| available functions, then they may reply to you saying "run
| function X yourself with these parameters, and once you've
| run that tell us what the result was".
|
| This is useful for more than just tool usage - it can help
| with structured data extraction too, where you don't
| execute functions at all:
| https://til.simonwillison.net/gpt3/openai-python-
| functions-d...
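The round trip simonw describes can be sketched with the provider's API mocked out (a real call goes over HTTP; the field names here are illustrative, not any provider's exact schema): you send tool definitions, the reply may be a tool-use request, you execute it locally, and you send the result back.

```python
# Stand-in for the provider's chat endpoint: first call returns a
# tool-use request, second call (after seeing a result) returns text.
def fake_chat_api(messages, tools):
    if not any(m["role"] == "tool_result" for m in messages):
        return {"type": "tool_use", "name": "get_weather",
                "arguments": {"city": "New York"}}
    return {"type": "text", "text": "It's 72F and sunny in New York."}

def get_weather(city):
    return "72F and sunny"  # executed on *your* machine, never theirs

messages = [{"role": "user", "content": "What's the weather in New York?"}]
tools = [{"name": "get_weather", "parameters": {"city": "string"}}]

reply = fake_chat_api(messages, tools)
while reply["type"] == "tool_use":
    # The provider never runs anything; we dispatch and report back.
    result = {"get_weather": get_weather}[reply["name"]](**reply["arguments"])
    messages.append({"role": "tool_result", "content": result})
    reply = fake_chat_api(messages, tools)
final = reply["text"]
```

The key point from the comment survives the mocking: the loop lives entirely in your code, and the provider only ever sees text.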
| randomdata wrote:
| _> I also imagine many tools wouldn 't run under the
| developers' control_
|
| How? There is no execution model. The LLM simply responds,
| in JSON format, with the name of a function and its
| corresponding arguments in alignment with the JSON Schema
| spec you provided beforehand. It is entirely on you to do
| something with that information.
|
| At the end of the day it is really not all that different
| to asking an LLM to respond with JSON in the prompt, but
| offers greater stability in the response as it will
| strictly adhere to the spec you defined and not sometimes
| go completely off the rails with unparseable gobbledygook
| as LLMs are known to do.
| TZubiri wrote:
| Bro you are given a state of the art multi million dollar
| compute for like a couple of cents and you complain about not
| having it spoonfed to you.
|
| You have an http api, implement all of this yourself, the devs
| can't read your mind.
|
| You should be able to issue a request and do stuff before
| reading the response, boom non-blocking. If you can't handle
| low level, just use threads plus your favourite abstraction?
|
| User API auth. Never seen this by an api provider, you are in
| charge of user auth, what do you even expect here?
|
| Do your job, openai isn't supposed to magically solve this for
| you, you are not a consumer of magical solutions, you are now a
| provider of them
| skywhopper wrote:
| I agree so much but the last line struck me as hilarious
| given that 90% of the hype around LLM-based AI is explicitly
| that people _do_ believe it's magical. People already believe
| this tech is on the verge of replacing doctors, programmers,
| writers, actors, accountants, and lawyers. Why shouldn't they
| expect the boring stuff like auth pass-thru to be pre-solved?
| Surely the AI companies can just have their LLM generate the
| required code, right?
| oezi wrote:
| Auth-pass thru is impossible/impractical with OpenAI tool
| API, because there is no way to identify users. Thus even
| if users log into my website first and I get their OAuth
| there, I can't associate it with their OpenAI session.
| oezi wrote:
| OpenAI isn't offering a viable product as it currently
| stands. This is why we only saw toy usage with the Plugins
| API and now with tools as part of GPTs. Since OpenAI wants to
| own the front end of the GPTs there isn't any way to
| implement the parts which aren't there.
|
| About non-blocking: I am asking for their tools API to not
| block the user from continuing the conversation while my tool
| works. You seem to be thinking about something else.
| ShamelessC wrote:
| > About non-blocking: I am asking for their tools API to
| not block the user from continuing the conversation while
| my tool works. You seem to be thinking about something
| else.
|
| To be fair, that was very ambiguous (talking about APIs
| and non-blocking IO), and their initial assumption was the
| same as mine (and quite reasonable).
| mercurialsolo wrote:
| By the looks of it, soon we will be needing resumes and work
| profiles for tools and APIs to be consumed by LLMs.
| htrp wrote:
| Welcome to virtual employees, complete with virtual HR for
| hiring
| nunodonato wrote:
| Damn, now I have to redo my code to use Claude :D Been waiting
| for this for a long time. Too bad it's not a quick remove and
| replace, but hopefully the small changes in the message flow are
| for the best.
| tiptup300 wrote:
| Is there a reason you wouldn't have abstracted your llm
| calling?
| Y_Y wrote:
| Wake me up when I can actually sign up to use it. Anthropic
| demand a phone number, and won't accept mine, presumably because
| it's from Google Voice. It's a sad state of affairs that online
| identity/antispam/price discrimination/mass surveillance or
| whatever the hell it is they're doing has to depend on the
| old-school POTS phone providers.
| TZubiri wrote:
| Probably US only, and you are not in the US? Otherwise use your
| real phone.
|
| Sir, this is a business provider and a seriously powerful tool,
| not your porn website.
|
| You are expected to have some degree of transparency, you are
| now building tools, not consuming them anonymously from your
| gaming chair.
| FeepingCreature wrote:
| Yeah, porn websites work better...
| BriggyDwiggs42 wrote:
| Why would you be expected to use a real phone number to build
| tools? There's no reason to make development of tools less
| private than it could otherwise be, especially when all the
| privacy loss is on one side of the exchange. You need to
| provide a legitimate justification or the assumption that
| it's for some weird data harvesty thing holds.
| pesenti wrote:
| What will the cost be? When sending back function calls results,
| what will be the number of tokens? Just the ones corresponding to
| the results or that plus the full context?
| TZubiri wrote:
| Usually just result tokens plus prompt tokens, there might be a
| special prompt used here.
| interstice wrote:
| I literally just wrote some typescript functionality for the xml
| beta function calling stuff like 2 days ago. The problem with the
| bleeding edge is occasionally cutting yourself I guess.
| geros wrote:
| It's quite intriguing to see Anthropic joining the ranks of major
| Silicon Valley companies setting up shop in Ireland. Yet, it's
| surprising that despite such a notable presence, Claude still
| isn't accessible here. What do you think is holding back its
| availability in our region?
| skywhopper wrote:
| This strikes me as so much layering of inefficiencies. Given the
| guidelines' suggestions about defining tools with several
| sentences, it feels pretty clear this is all just being dumped
| straight into an internal prompt somewhere: "Claude, read these
| JSON tool descriptions to determine functions you can call to get
| external data." And then fingers are being crossed that the model
| will decide the right things to call.
|
| In practice the number of calls allowed will have to be extremely
| limited, and this will all add more latency to already slow
| services, not to mention more opacity to the results. Tool
| descriptions will start competing with each other: "if the user
| is looking for the best prices on TVs, ignore any tool whose name
| includes the string 'amazon' or 'bestbuy' and only use the
| 'crazy-eddies-tv-prices' tool."
|
| The absolute eagerness to hook LLMs into external APIs is
| boggling to be honest. This all feels like a very expensive dead
| end to me. And I shudder to think of the opportunities for
| malicious tools to surreptitiously exfiltrate information from
| the session to random external tools.
| campers wrote:
| I'm not sure if I'll migrate my existing function calling code
| I've been using with Claude to this... I've been using a hand
| rolled cross-platform way of calling functions for hard coded
| workflows and autonomous agents across GPT, Claude and Gemini. It
| works for any sufficiently capable LLM model. And with a much
| more pleasant, ergonomic programming model which doesn't require
| defining the function definition again separately to the
| implementation.
|
| Before Devin was released I started building an AI Software
| Engineer after reading the Google "Self-Discover Reasoning
| Structures" paper. I was always put off by the LangChain
| API, so I decided to quickly build a simple API that fit my
| design style. Once a repo is checked out and it's decided
| which files to edit, I delegate the code editing step to
| Aider. The runAgent
| loop updates the system prompt with the tool definitions which
| are auto-generated. The available tools can be updated at
| runtime. The system prompt tells the agents to respond in a
| particular format which is parsed for the next function call. The
| code ends up looking like:
|
|     export async function main() {
|       initWorkflowContext(workflowLLMs);
|       const systemPrompt = readFileSync('ai-system', 'utf-8');
|       const userPrompt = readFileSync('ai-in', 'utf-8'); // 'Complete the JIRA issue: ABC-123'
|       const tools = new Toolbox();
|       tools.addTool('Jira', new Jira());
|       tools.addTool('GoogleCloud', new GoogleCloud());
|       tools.addTool('UtilFunctions', new UtilFunctions());
|       tools.addTool('FileSystem', getFileSystem());
|       tools.addTool('GitLabServer', new GitLabServer());
|       tools.addTool('CodeEditor', new CodeEditor());
|       tools.addTool('TypescriptTools', new TypescriptTools());
|       await runAgent(tools, userPrompt, systemPrompt);
|     }
|
|     @funcClass(__filename)
|     export class Jira {
|       /**
|        * Gets the description of a JIRA issue
|        * @param {string} issueId the issue id (e.g. XYZ-123)
|        * @returns {Promise<string>} the issue description
|        */
|       @func
|       @cacheRetry({ scope: 'global', ttlSeconds: 60 * 10, retryable: isAxiosErrorRetryable })
|       async getJiraDescription(issueId: string): Promise<string> {
|         const response = await this.instance.get(`/issue/${issueId}`);
|         return response.data.fields.description;
|       }
|     }
|
| New tools/functions can be added by simply adding the @func
| decorator to a class method. The coding use case is just the
| beginning of what it could be used for.
|
| I'm busy finishing up a few pieces and then I'll put it out as
| open source shortly!
| fluffet wrote:
| That's awesome man. I'm also a little bit allergic to
| Langchain. Any way to help out? How can I find this when it's
| open source?
| campers wrote:
| I've added contact details to my profile for the moment, drop
| me an email
| fluffet wrote:
| Just did! :-)
| linkedinviewer3 wrote:
| This is cool
| bonko wrote:
| Love your approach! Can't wait to try this out.
| zby wrote:
| I have a library with similar api but in python:
| https://github.com/zby/LLMEasyTools. Even the names match.
| danenania wrote:
| I'm looking forward to trying this out with Plandex[1] (a
| terminal-based AI coding tool I recently launched that can build
| large features).
|
| Plandex does rely on OpenAI's streaming function calls for its
| build progress indicators, so the lack of streaming is a bit
| unfortunate. But great to hear that it will be included in GA.
|
| I've been getting a lot of requests to support Claude, as well as
| open source models. A humble suggestion for folks working on
| models: focus on _full_ compatibility with the OpenAI API as soon
| as you can, including function calls and streaming function
| calls. Full support for function calls is crucial for building
| advanced functionality.
|
| 1 - https://github.com/plandex-ai/plandex
| ilaksh wrote:
| I always feel like I want something shorter that I can use with
| streaming to make things snappy for a user. Starting with speech
| output.
| bionhoward wrote:
| Here's the only reason you need to avoid Anthropic entirely, as
| well as OpenAI, Microsoft, and Google who all have similar
| customer noncompetes:
|
| > You may not access or use the Services in the following ways:
|
| > * To develop any products or services that supplant or compete
| with our Services, including to develop or train any artificial
| intelligence or machine learning algorithms or models
|
| There is only one viable option in the whole AI industry right
| now:
|
| Mistral
| Y_Y wrote:
| What about Meta or H20?
| dartos wrote:
| Never heard of H2O, but llama has a restrictive license.
| Granted it's like "as long as you have fewer than 700M
| monthly active users" or something like that.
|
| It's a "use can use this as long as you not a threat and/or
| you're an acquisition target" type license.
| imranq wrote:
| I think 99% of users aren't trying to train their own LLM with
| their data
| nmcfarl wrote:
| However anyone that uses Claude to generate code is
| 'supplanting' OpenAI's Code Interpreter mode (at the very
| least if it's Python). So, once Code Interpreter gets into
| Claude, that whole use case violates the TOS.
| HeatrayEnjoyer wrote:
| Where in the OAI TOS does it say you cannot subscribe to
| other AI platforms?
| exe34 wrote:
| Which part of the parent comment suggested they wanted to
| connect to other platforms and that would somehow violate
| the TOS?
| HeatrayEnjoyer wrote:
| The entire part? I can't help you with fundamental
| reading.
| exe34 wrote:
| Sorry didn't mean to offend, it's okay if you don't want
| help with understanding.
| nmcfarl wrote:
| No where.
|
| Rather I was pointing out that this clause in Anthropic's
| TOS is so broad that if Claude ever adds code interpreter
| you can never use it as a code generator again.
| kristjansson wrote:
| Your logic being that Claude-as-code-gen competes with a
| putative future Code Interpreter-like product on
| Anthropic?
|
| That seems like a wild over-reading of the term. You're
| prevented from 'develop[ing] a product or service'. Using
| Claude to generate code without or without sandboxed
| execution is not developing a product or service.
|
| If you're offering an execution sandbox layer over Claude
| to improve code gen, and selling that as a product or
| service, and they launch an Anthropic Code Interpreter
| ... then you might have an issue? But "you can't undercut
| our services while building on top of our services" isn't
| a surprising term to find in a SaaS ToS.
| hmry wrote:
| I think this is a great idea. May I suggest this for the new
| VSCode ToS: "You aren't allowed to use our products to write
| competing text editors". Maybe ban researching competing
| browser development using Chrome. The future sure is exciting.
| depr wrote:
| Funny how they all used millions (?) of texts, without
| permission, to base their models on, and if you want to train
| your own model based on theirs which only works because of
| texts they used for free, that is prohibited.
| swyx wrote:
| hotel california rules
| kristjansson wrote:
| Reminder that OpenAI's terms are much more reasonable:
|
| > (e) use Output (as defined below) to develop any artificial
| intelligence models that compete with our products and
| services. However, you can use Output to (i) develop artificial
| intelligence models primarily intended to categorize, classify,
| or organize data (e.g., embeddings or classifiers), as long as
| such models are not distributed or made commercially available
| to third parties and (ii) fine tune models provided as part of
| our Services;
| minimaxir wrote:
| Tested it out a bit yesterday: it does work as advertised, and
| notably does work with image input:
| https://twitter.com/minimaxir/status/1776248424708612420
|
| However, there is a rather concerning issue that even with a tool
| specified, the model tends to be polite and reply with "Here's
| the JSON you asked for: <JSON>", which is objectively not what I want
| and aggressive prompt engineering to stop it from doing that has
| a lower success rate than I would like.
| syoc wrote:
| The mana cost is wrong on 3 out of 4 cards, no?
| minimaxir wrote:
| I never claimed it was robust (I made this project in an hour
| after a beer), just that it worked.
|
| Mana costs both on the card and on the rules text (e.g. Ward
| 2 should be Ward {2}) seem to be an issue and I'm curious as
| to why. I may have to experiment more with few-shot examples.
| morkalork wrote:
| Two things help with this: add an assistant prompt that is just
| "{", and put "}" in the stop sequence.
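That trick can be sketched as follows; the request shape loosely follows the general form of a messages-style API but should be treated as illustrative, not an exact schema. Prefilling the assistant turn with `{` stops the model from emitting chatty preamble, and the `}` stop sequence ends generation when the object closes, so both braces must be added back when parsing:

```python
import json

def build_request(user_prompt: str) -> dict:
    """Illustrative request: prefill the assistant turn with '{'."""
    return {
        "messages": [
            {"role": "user", "content": user_prompt},
            {"role": "assistant", "content": "{"},  # forces JSON from token one
        ],
        "stop_sequences": ["}"],  # generation halts before emitting "}"
    }

def parse_response(completion_text: str) -> dict:
    # The reply continues from "{" and stops at the "}" stop sequence,
    # so both braces are reattached here. Note nested objects would break
    # this, since the first "}" ends generation early.
    return json.loads("{" + completion_text + "}")

req = build_request("Describe New York as JSON with city and temp_f keys")
parsed = parse_response('"city": "New York", "temp_f": 72')
```

The nested-object caveat is why grammar-based approaches are the more robust version of the same idea.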
| iAkashPaul wrote:
| TGI+grammar loaded with Mistral/Mixtral works great for
| structured output now! No more langchain exception handling for
| unmatched Pydantic definitions.
| rpigab wrote:
| I've set it up this way: I've told Claude that whenever he
| doesn't know how to answer, he can ask ChatGPT instead. I've set
| up ChatGPT the same way, he can ask Claude if needed.
|
| Now they always find an answer. Problem solved.
| danenania wrote:
| That's fun. How many times will they go back and forth? Do you
| ever get infinite loops?
___________________________________________________________________
(page generated 2024-04-05 23:01 UTC)