[HN Gopher] Tool Use (function calling)
       ___________________________________________________________________
        
       Tool Use (function calling)
        
       Author : akadeb
       Score  : 203 points
       Date   : 2024-04-04 21:35 UTC (1 day ago)
        
 (HTM) web link (docs.anthropic.com)
 (TXT) w3m dump (docs.anthropic.com)
        
       | CraftingLinks wrote:
       | Thank you! I was waiting for this.
        
       | hubraumhugo wrote:
        | > All models can handle correctly choosing a tool from 250+
        | tools with >90% accuracy, provided the user query contains
        | all necessary parameters for the intended tool.
       | 
       | This is pretty exciting news for everybody working with agentic
       | systems. OpenAI has way lower recall.
       | 
       | I'm now migrating from GPT function calls to Claude tools and
       | will report back on the evaluation results.
        
         | vorticalbox wrote:
          | I've noticed this too: it will usually pick tools listed
          | first rather than a more suitable tool further down the
          | list.
          | 
          | Sometimes it will outright state that it can't do
          | something, and then, after you say "use the
          | browse_website tool", it will magically remember it has
          | the tool.
        
         | iAkashPaul wrote:
          | You should try the new HF TGI server; it has both grammar & tool
         | support now. Works fabulously with Mistral Instruct & Mixtral
         | Instruct.
        
           | Takennickname wrote:
            | What's grammar support?
        
             | iAkashPaul wrote:
              | It's basically guidelines for the output to adhere
              | to. For example, if your prompt asks for a summary,
              | the LLM won't necessarily just spit out the summary:
              | there might be preamble like "here's the summary"
              | before the actual summary, or even basic JSON
              | key/values can get messed up. A grammar lets you
              | define the expected output up front, in a variety of
              | ways, e.g. via Pydantic class definitions. Microsoft's
              | Guidance and Outlines, along with llama.cpp's
              | grammars, are all attempts at making structured
              | output reliable.
             | 
             | Most of langchain is basically a specific prompt with
             | exception handling for missing fields.
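              | 
              | As a sketch of the Pydantic flavor, using Outlines
              | (the model name and fields are placeholders; check
              | the Outlines docs for the current API):
              | 
              |     from pydantic import BaseModel
              |     import outlines
              | 
              |     class Summary(BaseModel):
              |         title: str
              |         summary: str
              | 
              |     # Any local HF model; this one is a placeholder
              |     model = outlines.models.transformers(
              |         "mistralai/Mistral-7B-Instruct-v0.2")
              | 
              |     # Constrain decoding so the output always parses
              |     # into the Pydantic model - no "here's the
              |     # summary" preamble is even possible
              |     generator = outlines.generate.json(model, Summary)
              |     result = generator(
              |         "Summarize: LLM output is hard to parse.")
              |     print(result.title, result.summary)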
        
               | dartos wrote:
               | Even better than guidelines!
               | 
               | Grammars perfectly restrict LLM token output. They're
               | hard grammar rules.
        
             | sp332 wrote:
             | You can give it a formal definition for valid JSON, and it
             | will only generate output that matches.
        
       | padolsey wrote:
        | It's hard to communicate about this stuff. I think people
        | hear 'tools' and 'function calling' and assume it provides
        | an actual suite of pre-made tools or routines that get
        | invoked on the Anthropic backend. But nope. It's just a way
        | of generating a structured schema from a prompt. It's
        | really crucial work, but it's funny how obscured the boring
        | truth is. Also FWIW I get a tonne more schema adherence if
        | I use XML-like semantic tags rather than JSON. XML is so
        | much more forgiving a format too.
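        | 
        | A minimal sketch of what I mean (the call_llm() helper and
        | the tag names are made up for illustration):
        | 
        |     import re
        | 
        |     PROMPT = """Extract the city and date.
        |     Respond with exactly:
        |     <city>...</city>
        |     <date>...</date>
        | 
        |     Message: I'm flying to Lisbon on May 3rd."""
        | 
        |     # response = call_llm(PROMPT)  # whichever API you use
        |     response = "<city>Lisbon</city>\n<date>May 3rd</date>"
        | 
        |     # Tags are forgiving: a regex recovers the fields even
        |     # if the model adds chatter around them, whereas one
        |     # stray comma breaks a JSON parse.
        |     def tag(name, text):
        |         m = re.search(rf"<{name}>(.*?)</{name}>", text,
        |                       re.DOTALL)
        |         return m.group(1).strip() if m else None
        | 
        |     print(tag("city", response), tag("date", response))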
        
         | autonomousErwin wrote:
         | I find this far more useful than a suite of tools or "AI
         | agents" which always work well in a controlled development
         | environment but not so much further than that.
         | 
          | Function calling is a great step towards actually
          | productionizing LLMs and making them extremely robust. I
          | remember when the GPT-3 API first came out and I was
          | furiously making sequential calls with complex if/else
          | and try/catch statements and using a couple of Python
          | libraries, for the simple reason that I needed the output
          | to be valid JSON. It was surprisingly hard until function
          | calling solved this.
        
           | padolsey wrote:
            | Yeah, agreed. Function calling FTW; we just need a bit
            | more reliability/(semi-?)idempotence.
        
         | logicchains wrote:
          | Open LLMs can use grammar-based sampling to guarantee
          | syntactically correct JSON is produced. I'm surprised
          | OpenAI never incorporated anything like that.
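          | 
          | E.g. with llama.cpp grammars (a sketch; the GBNF here
          | only admits flat string-to-string JSON objects, and the
          | model path is a placeholder):
          | 
          |     from llama_cpp import Llama, LlamaGrammar
          | 
          |     # Only flat JSON objects with string values can be
          |     # sampled under this grammar
          |     grammar = LlamaGrammar.from_string(r'''
          |     root   ::= "{" pair ("," pair)* "}"
          |     pair   ::= string ":" string
          |     string ::= "\"" [a-zA-Z0-9 _-]* "\""
          |     ''')
          | 
          |     llm = Llama(model_path="mistral-7b.Q4_K_M.gguf")
          |     out = llm("Name and city as JSON: Ana, Lisbon.",
          |               grammar=grammar, max_tokens=64)
          |     # Syntactically valid by construction
          |     print(out["choices"][0]["text"])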
        
           | rolisz wrote:
            | My concern with grammar-based sampling is that it makes
            | the model dumber: after all, you are forcing it to say
            | something other than what it thought would be best.
        
             | remilouf wrote:
             | Looks like it's quite the opposite:
             | http://blog.dottxt.co/performance-gsm8k.html
        
         | CuriouslyC wrote:
          | While some of the downvotes are justified because you're
          | selling this short, I want to point out that your comment
          | about XML is actually valid to a degree. I've found that
          | using XML for prompts lets you annotate specific
          | keywords/phrases and impose structure on the prompt,
          | which can produce better results.
          | 
          | Getting results back in XML though? That's a terrible
          | idea; you're asking for parsing errors. YAML is the best
          | format for getting structured data from LLMs, because if
          | there's a parse error you typically only lose the
          | malformed bits.
        
         | epr wrote:
         | XML and other document markup languages are objectively
         | horrible data storage formats. Why is "forgiving" a desired
         | quality in this case?
        
         | poxrud wrote:
          | It's much more than just generating a structured schema.
          | It also understands user intent and assigns the correct
          | functions to solve a query. For example, if we give it
          | two functions, getWeather(city) and getTime(city), and
          | ask "what's the weather in New York?", it will decide on
          | the correct function to use. It will also know to use
          | both functions if we ask "what's the time and weather in
          | New York?".
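          | 
          | Roughly, per the linked docs (a sketch; the tools API
          | just launched as a beta, so check the docs for the exact
          | SDK call):
          | 
          |     import anthropic
          | 
          |     client = anthropic.Anthropic()
          | 
          |     city_schema = {
          |         "type": "object",
          |         "properties": {"city": {"type": "string"}},
          |         "required": ["city"],
          |     }
          |     tools = [
          |         {"name": "getWeather",
          |          "description": "Current weather for a city",
          |          "input_schema": city_schema},
          |         {"name": "getTime",
          |          "description": "Current local time for a city",
          |          "input_schema": city_schema},
          |     ]
          | 
          |     resp = client.messages.create(
          |         model="claude-3-opus-20240229",
          |         max_tokens=1024,
          |         tools=tools,
          |         messages=[{"role": "user", "content":
          |             "What's the time and weather in New York?"}],
          |     )
          |     # Expect two tool_use blocks, one per function,
          |     # each with {"city": "New York"} as input
          |     for block in resp.content:
          |         if block.type == "tool_use":
          |             print(block.name, block.input)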
        
         | regularfry wrote:
         | I do wonder if a stack-based format would be easier for an LLM.
         | Seems like a better fit for the attention mechanism. My
         | suspicion (without having lifted a finger to check) is that
         | it's the closing tags that make the difference for XML. Go
         | stack-based and you can drop the opening tags, and save the
         | tokens.
        
       | rcarmo wrote:
       | I do hope we converge on a standardized API and schema for this.
       | Testing and integrating multiple LLMs is tiresome with all the
       | silly little variations in API and prompt formatting.
        
         | TZubiri wrote:
         | Langchain.
         | 
         | But it's too bleeding edge, you are asking a lot.
         | 
         | Just do the work and don't be spoiled senseless
        
           | rcarmo wrote:
           | Langchain, for all its popularity, is some of the worst, most
           | brittle Python code I've ever seen or tried to use, so I'd
           | prefer to have things sorted out for me at the API level.
        
             | supafastcoder wrote:
              | I switched to using Instructor/Marvin, which works
              | really nicely with native pydantic models and gets
              | out of the way for everything else.
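              | 
              | For the curious, the Instructor pattern is roughly
              | (a sketch; the model name is a placeholder):
              | 
              |     import instructor
              |     from openai import OpenAI
              |     from pydantic import BaseModel
              | 
              |     class UserInfo(BaseModel):
              |         name: str
              |         age: int
              | 
              |     # Patch the client so completions accept a
              |     # response_model and re-ask on validation errors
              |     client = instructor.patch(OpenAI())
              | 
              |     user = client.chat.completions.create(
              |         model="gpt-4",
              |         response_model=UserInfo,
              |         messages=[{"role": "user", "content":
              |             "John Doe is 30 years old."}],
              |     )
              |     print(user.name, user.age)  # validated object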
        
             | cjonas wrote:
              | Imo it's unfortunate that Python is the dominant tech
              | of this domain. TypeScript is better suited for the
              | inference side of things. (I know there's a TS
              | version of most things, but most companies are
              | looking for Python devs.)
        
               | rcarmo wrote:
               | Python is fine. The problem is all the folk writing
               | Python as if they were doing Java cosplay and without any
               | tests or type annotations.
        
         | ilaksh wrote:
          | It looks very similar, if not identical, to OpenAI's?
        
         | habosa wrote:
         | OpenRouter is a great step in that direction:
         | https://openrouter.ai/
        
       | oezi wrote:
       | I hope they put a bit more effort into this compared to OpenAI.
       | 
       | The most crucial things missing in OpenAI's implementation for me
       | were:
       | 
       | - Authentication for the API by the user rather than the
       | developer.
       | 
       | - Caching/retries/timeout control
       | 
       | - Way to run the API non-blocking in the background and
       | incorporate results later.
       | 
       | - Dynamic API tools (use an API to provide the tools for a
       | conversation) and API revisions (for instance by hosting the API
       | spec under a URL/git).
        
         | paulgb wrote:
         | For authentication, since the tool call itself actually runs on
         | your own server, can't you just look at who the authed user is
         | that made the request?
        
           | oezi wrote:
           | OpenAI doesn't give you a way to identify the user.
           | 
            | And even if they did, it would be poor UX to make the
            | user visit our site first to connect their API
            | accounts.
            | 
            | I also imagine many tools wouldn't run under the
            | developer's control (though of course you could relay
            | through your own server).
        
             | TZubiri wrote:
              | Huh? You have to use your API key and pay for the
              | service.
              | 
              | Requests you make to the service provider are on your
              | own buck; you are supposed to track user stuff on
              | your end. It makes no difference, ChatGPT-wise, who
              | the user is; that's not part of the abstraction
              | provided.
              | 
              | It's not a user-auth SaaS, it's an LLM SaaS.
        
               | michaelt wrote:
               | Presumably oezi wants to do that complicated three-party
               | OAuth stuff.
               | 
               | Like when you use an online PDF editor with Google Drive
               | integration - paying for the storage etc is between
               | Google and the user, the files belong to the user's
               | Google Drive account, but the PDF editor gets read/write
               | access to them.
        
               | oezi wrote:
               | Yes, exactly. Many existing APIs are hard/impossible to
               | connect to unless you are the user.
        
               | cjonas wrote:
                | I think the disconnect is that he's talking about
                | building plugins/"GPTs" inside of ChatGPT, while
                | others are thinking about using the API to build
                | something from scratch?
        
               | kristjansson wrote:
               | That's my read. And he's totally right! Plugins/GPTs
               | aren't a good platform or product, partly for some of the
               | technical reasons he mentioned, but really because
               | they're basically a tech demo for the real product (the
               | tool API).
        
               | oezi wrote:
                | Many interesting API usages must be bound to the
                | user and paid for based on usage, so they must be
                | tied to the user. OpenAI doesn't provide ways to
                | monetize GPTs, so it is hard to justify spending on
                | behalf of the user.
        
             | simonw wrote:
             | I think you might be talking about GPTs with actions?
             | 
             | This implementation of function calling works differently
             | from those.
             | 
             | OpenAI/Anthropic don't make any API calls themselves at all
             | here. You call their chat APIs with a list of your own
             | available functions, then they may reply to you saying "run
             | function X yourself with these parameters, and once you've
             | run that tell us what the result was".
             | 
             | This is useful for more than just tool usage - it can help
             | with structured data extraction too, where you don't
             | execute functions at all:
             | https://til.simonwillison.net/gpt3/openai-python-
             | functions-d...
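              | 
              | In OpenAI terms the whole round trip is just this (a
              | sketch; get_weather is a stand-in for your own code):
              | 
              |     import json
              |     from openai import OpenAI
              | 
              |     client = OpenAI()
              | 
              |     def get_weather(city):  # runs on YOUR machine
              |         return f"Sunny in {city}"
              | 
              |     tools = [{"type": "function", "function": {
              |         "name": "get_weather",
              |         "description": "Weather for a city",
              |         "parameters": {
              |             "type": "object",
              |             "properties": {"city": {"type": "string"}},
              |             "required": ["city"]}}}]
              | 
              |     msgs = [{"role": "user",
              |              "content": "Weather in Paris?"}]
              |     r = client.chat.completions.create(
              |         model="gpt-4", messages=msgs, tools=tools)
              |     msg = r.choices[0].message
              | 
              |     # The API never executes anything; it only asks
              |     # you to run a function and report the result
              |     if msg.tool_calls:
              |         call = msg.tool_calls[0]
              |         args = json.loads(call.function.arguments)
              |         msgs.append(msg)
              |         msgs.append({"role": "tool",
              |                      "tool_call_id": call.id,
              |                      "content": get_weather(**args)})
              |         r = client.chat.completions.create(
              |             model="gpt-4", messages=msgs, tools=tools)
              |         print(r.choices[0].message.content)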
        
             | randomdata wrote:
             | _> I also imagine many tools wouldn 't run under the
             | developers' control_
             | 
             | How? There is no execution model. The LLM simply responds,
             | in JSON format, with the name of a function and its
             | corresponding arguments in alignment with the JSON Schema
             | spec you provided beforehand. It is entirely on you to do
             | something with that information.
             | 
             | At the end of the day it is really not all that different
             | to asking an LLM to respond with JSON in the prompt, but
             | offers greater stability in the response as it will
             | strictly adhere to the spec you defined and not sometimes
             | go completely off the rails with unparseable gobbledygook
             | as LLMs are known to do.
        
         | TZubiri wrote:
          | Bro, you are given state-of-the-art, multi-million-dollar
          | compute for like a couple of cents, and you complain
          | about not having it spoonfed to you.
          | 
          | You have an HTTP API; implement all of this yourself. The
          | devs can't read your mind.
          | 
          | You should be able to issue a request and do stuff before
          | reading the response: boom, non-blocking. If you can't
          | handle low-level, just use threads plus your favourite
          | abstraction?
          | 
          | User API auth: I've never seen this from an API provider.
          | You are in charge of user auth; what do you even expect
          | here?
          | 
          | Do your job. OpenAI isn't supposed to magically solve
          | this for you. You are not a consumer of magical
          | solutions; you are now a provider of them.
        
           | skywhopper wrote:
           | I agree so much but the last line struck me as hilarious
           | given that 90% of the hype around LLM-based AI is explicitly
           | that people _do_ believe it's magical. People already believe
           | this tech is on the verge of replacing doctors, programmers,
           | writers, actors, accountants, and lawyers. Why shouldn't they
           | expect the boring stuff like auth pass-thru to be pre-solved?
           | Surely the AI companies can just have their LLM generate the
           | required code, right?
        
             | oezi wrote:
              | Auth pass-thru is impossible/impractical with the
              | OpenAI tool API, because there is no way to identify
              | users. Thus, even if users log into my website first
              | and I get their OAuth tokens there, I can't associate
              | them with their OpenAI session.
        
           | oezi wrote:
            | OpenAI isn't offering a viable product as it currently
            | stands. This is why we only saw toy usage with the
            | Plugins API, and now with tools as part of GPTs. Since
            | OpenAI wants to own the front end of GPTs, there isn't
            | any way for developers to implement the parts which
            | aren't there.
           | 
           | About non-blocking: I am asking for their tools API to not
           | block the user from continuing the conversation while my tool
           | works. You seem to be thinking about something else.
        
             | ShamelessC wrote:
             | > About non-blocking: I am asking for their tools API to
             | not block the user from continuing the conversation while
             | my tool works. You seem to be thinking about something
             | else.
             | 
              | To be fair, that was very ambiguous (talking about
              | APIs and non-blocking IO), and their initial
              | assumption was the same as mine (and quite
              | reasonable).
        
       | mercurialsolo wrote:
        | By the looks of it, soon we will be needing resumes and
        | work profiles for tools and APIs to be consumed by LLMs.
        
         | htrp wrote:
         | Welcome to virtual employees, complete with virtual HR for
         | hiring
        
       | nunodonato wrote:
        | Damn, now I have to redo my code to use Claude :D Been
        | waiting for this for a long time. Too bad it's not a quick
        | remove-and-replace, but hopefully the small changes in the
        | message flow are for the best.
        
         | tiptup300 wrote:
          | Is there a reason you wouldn't have abstracted your LLM
          | calling?
        
       | Y_Y wrote:
        | Wake me up when I can actually sign up to use it. Anthropic
        | demand a phone number, and won't accept mine, presumably
        | because it's from Google Voice. It's a sad state of affairs
        | that online identity/antispam/price discrimination/mass
        | surveillance or whatever the hell it is they're doing has
        | to depend on the old-school POTS phone providers.
        
         | TZubiri wrote:
         | Probably US only, and you are not in the US? Otherwise use your
         | real phone.
         | 
         | Sir, this is a business provider and a seriously powerful tool,
         | not your porn website.
         | 
         | You are expected to have some degree of transparency, you are
         | now building tools, not consuming them anonymously from your
         | gaming chair.
        
           | FeepingCreature wrote:
           | Yeah, porn websites work better...
        
           | BriggyDwiggs42 wrote:
           | Why would you be expected to use a real phone number to build
           | tools? There's no reason to make development of tools less
           | private than it could otherwise be, especially when all the
           | privacy loss is on one side of the exchange. You need to
           | provide a legitimate justification or the assumption that
           | it's for some weird data harvesty thing holds.
        
       | pesenti wrote:
        | What will the cost be? When sending back function call
        | results, how many tokens will be counted? Just the ones
        | corresponding to the results, or those plus the full
        | context?
        
         | TZubiri wrote:
          | Usually just result tokens plus prompt tokens; there
          | might be a special prompt used here.
        
       | interstice wrote:
        | I literally just wrote some TypeScript functionality for
        | the XML beta function calling stuff like 2 days ago. The
        | problem with the bleeding edge is occasionally cutting
        | yourself, I guess.
        
       | geros wrote:
       | It's quite intriguing to see Anthropic joining the ranks of major
       | Silicon Valley companies setting up shop in Ireland. Yet, it's
       | surprising that despite such a notable presence, Claude still
       | isn't accessible here. What do you think is holding back its
       | availability in our region?
        
       | skywhopper wrote:
       | This strikes me as so much layering of inefficiencies. Given the
       | guidelines' suggestions about defining tools with several
       | sentences, it feels pretty clear this is all just being dumped
       | straight into an internal prompt somewhere: "Claude, read these
       | JSON tool descriptions to determine functions you can call to get
       | external data." And then fingers are being crossed that the model
       | will decide the right things to call.
       | 
       | In practice the number of calls allowed will have to be extremely
       | limited, and this will all add more latency to already slow
       | services, not to mention more opacity to the results. Tool
       | descriptions will start competing with each other: "if the user
       | is looking for the best prices on TVs, ignore any tool whose name
       | includes the string 'amazon' or 'bestbuy' and only use the
       | 'crazy-eddies-tv-prices' tool."
       | 
       | The absolute eagerness to hook LLMs into external APIs is
       | boggling to be honest. This all feels like a very expensive dead
       | end to me. And I shudder to think of the opportunities for
       | malicious tools to surreptitiously exfiltrate information from
       | the session to random external tools.
        
       | campers wrote:
        | I'm not sure if I'll migrate my existing function calling
        | code I've been using with Claude to this... I've been using
        | a hand-rolled, cross-platform way of calling functions for
        | hard-coded workflows and autonomous agents across GPT,
        | Claude and Gemini. It works for any sufficiently capable
        | LLM, and with a much more pleasant, ergonomic programming
        | model which doesn't require defining the function
        | definition separately from the implementation.
       | 
        | Before Devin was released I started building an AI Software
        | Engineer after reading the Google "Self-Discover Reasoning
        | Structures" paper. I was always put off by the LangChain
        | API, so I decided to quickly build a simple API that fit my
        | design style. Once a repo is checked out, and it's decided
        | what files to edit, I delegate the code editing step to
        | Aider. The runAgent
       | loop updates the system prompt with the tool definitions which
       | are auto-generated. The available tools can be updated at
       | runtime. The system prompt tells the agents to respond in a
       | particular format which is parsed for the next function call. The
        | code ends up looking like:
        | 
        |     export async function main() {
        |       initWorkflowContext(workflowLLMs);
        |       const systemPrompt = readFileSync('ai-system', 'utf-8');
        |       // e.g. 'Complete the JIRA issue: ABC-123'
        |       const userPrompt = readFileSync('ai-in', 'utf-8');
        | 
        |       const tools = new Toolbox();
        |       tools.addTool('Jira', new Jira());
        |       tools.addTool('GoogleCloud', new GoogleCloud());
        |       tools.addTool('UtilFunctions', new UtilFunctions());
        |       tools.addTool('FileSystem', getFileSystem());
        |       tools.addTool('GitLabServer', new GitLabServer());
        |       tools.addTool('CodeEditor', new CodeEditor());
        |       tools.addTool('TypescriptTools', new TypescriptTools());
        | 
        |       await runAgent(tools, userPrompt, systemPrompt);
        |     }
        | 
        |     @funcClass(__filename)
        |     export class Jira {
        |       /**
        |        * Gets the description of a JIRA issue
        |        * @param {string} issueId the issue id (e.g. XYZ-123)
        |        * @returns {Promise<string>} the issue description
        |        */
        |       @func
        |       @cacheRetry({ scope: 'global', ttlSeconds: 60 * 10,
        |                     retryable: isAxiosErrorRetryable })
        |       async getJiraDescription(issueId: string): Promise<string> {
        |         const response = await this.instance.get(`/issue/${issueId}`);
        |         return response.data.fields.description;
        |       }
        |     }
       | 
       | New tools/functions can be added by simply adding the @func
       | decorator to a class method. The coding use case is just the
       | beginning of what it could be used for.
       | 
       | I'm busy finishing up a few pieces and then I'll put it out as
       | open source shortly!
        
         | fluffet wrote:
         | That's awesome man. I'm also a little bit allergic to
         | Langchain. Any way to help out? How can I find this when it's
         | open source?
        
           | campers wrote:
           | I've added contact details to my profile for the moment, drop
           | me an email
        
             | fluffet wrote:
             | Just did! :-)
        
         | linkedinviewer3 wrote:
         | This is cool
        
         | bonko wrote:
         | Love your approach! Can't wait to try this out.
        
         | zby wrote:
         | I have a library with similar api but in python:
         | https://github.com/zby/LLMEasyTools. Even the names match.
        
       | danenania wrote:
       | I'm looking forward to trying this out with Plandex[1] (a
       | terminal-based AI coding tool I recently launched that can build
       | large features).
       | 
       | Plandex does rely on OpenAI's streaming function calls for its
       | build progress indicators, so the lack of streaming is a bit
       | unfortunate. But great to hear that it will be included in GA.
       | 
       | I've been getting a lot of requests to support Claude, as well as
       | open source models. A humble suggestion for folks working on
       | models: focus on _full_ compatibility with the OpenAI API as soon
       | as you can, including function calls and streaming function
       | calls. Full support for function calls is crucial for building
       | advanced functionality.
       | 
       | 1 - https://github.com/plandex-ai/plandex
        
       | ilaksh wrote:
       | I always feel like I want something shorter that I can use with
       | streaming to make things snappy for a user. Starting with speech
       | output.
        
       | bionhoward wrote:
       | Here's the only reason you need to avoid Anthropic entirely, as
       | well as OpenAI, Microsoft, and Google who all have similar
       | customer noncompetes:
       | 
       | > You may not access or use the Services in the following ways:
       | 
       | > * To develop any products or services that supplant or compete
       | with our Services, including to develop or train any artificial
       | intelligence or machine learning algorithms or models
       | 
       | There is only one viable option in the whole AI industry right
       | now:
       | 
       | Mistral
        
         | Y_Y wrote:
          | What about Meta or H2O?
        
           | dartos wrote:
            | Never heard of H2O, but Llama has a restrictive
            | license. Granted, it's like "as long as you have fewer
            | than 700M monthly active users" or something crazy like
            | that.
            | 
            | It's a "you can use this as long as you're not a threat
            | and/or an acquisition target" type license.
        
         | imranq wrote:
         | I think 99% of users aren't trying to train their own LLM with
         | their data
        
           | nmcfarl wrote:
            | However, anyone that uses Claude to generate code is
            | "supplanting" OpenAI's Code Interpreter mode (at the
            | very least if it's Python). So, once Code Interpreter
            | gets into Claude, that whole use case violates the TOS.
        
             | HeatrayEnjoyer wrote:
             | Where in the OAI TOS does it say you cannot subscribe to
             | other AI platforms?
        
               | exe34 wrote:
               | Which part of the parent comment suggested they wanted to
               | connect to other platforms and that would somehow violate
               | the TOS?
        
               | HeatrayEnjoyer wrote:
               | The entire part? I can't help you with fundamental
               | reading.
        
               | exe34 wrote:
               | Sorry didn't mean to offend, it's okay if you don't want
               | help with understanding.
        
               | nmcfarl wrote:
                | Nowhere.
               | 
               | Rather I was pointing out that this clause in Anthropic's
               | TOS is so broad that if Claude ever adds code interpreter
               | you can never use it as a code generator again.
        
               | kristjansson wrote:
               | Your logic being that Claude-as-code-gen competes with a
               | putative future Code Interpreter-like product on
               | Anthropic?
               | 
                | That seems like a wild over-reading of the term.
                | You're prevented from "develop[ing] a product or
                | service". Using Claude to generate code, with or
                | without sandboxed execution, is not developing a
                | product or service.
                | 
                | If you're offering an execution sandbox layer over
                | Claude to improve code gen, and selling that as a
                | product or service, and they launch an Anthropic
                | Code Interpreter... then you might have an issue?
                | But "you can't undercut our services while building
                | on top of our services" isn't a surprising term to
                | find in a SaaS ToS...
        
         | hmry wrote:
         | I think this is a great idea. May I suggest this for the new
         | VSCode ToS: "You aren't allowed to use our products to write
         | competing text editors". Maybe ban researching competing
         | browser development using Chrome. The future sure is exciting.
        
         | depr wrote:
          | Funny how they all used millions (?) of texts, without
          | permission, to base their models on, yet if you want to
          | train your own model on top of theirs, which only works
          | because of the texts they used for free, that is
          | prohibited.
        
           | swyx wrote:
           | hotel california rules
        
         | kristjansson wrote:
         | Reminder that OpenAI's terms are much more reasonable:
         | 
         | > (e) use Output (as defined below) to develop any artificial
         | intelligence models that compete with our products and
         | services. However, you can use Output to (i) develop artificial
         | intelligence models primarily intended to categorize, classify,
         | or organize data (e.g., embeddings or classifiers), as long as
         | such models are not distributed or made commercially available
         | to third parties and (ii) fine tune models provided as part of
         | our Services;
        
       | minimaxir wrote:
       | Tested it out a bit yesterday: it does work as advertised, and
       | notably does work with image input:
       | https://twitter.com/minimaxir/status/1776248424708612420
       | 
        | However, there is a rather concerning issue: even with a
        | tool specified, the model tends to be polite and reply with
        | "Here's the JSON you asked for: <JSON>", which is
        | objectively not what I want, and aggressive prompt
        | engineering to stop it from doing that has a lower success
        | rate than I would like.
        
         | syoc wrote:
         | The mana cost is wrong on 3 out of 4 cards, no?
        
           | minimaxir wrote:
           | I never claimed it was robust (I made this project in an hour
           | after a beer), just that it worked.
           | 
           | Mana costs both on the card and on the rules text (e.g. Ward
           | 2 should be Ward {2}) seem to be an issue and I'm curious as
           | to why. I may have to experiment more with few-shot examples.
        
         | morkalork wrote:
         | Two things help with this: add an assistant prompt that is just
         | "{", and put "}" in the stop sequence.
        
         | iAkashPaul wrote:
         | TGI+grammar loaded with Mistral/Mixtral works great for
         | structured output now! No more langchain exception handling for
         | unmatched Pydantic definitions.
        
       | rpigab wrote:
       | I've set it up this way: I've told Claude that whenever he
       | doesn't know how to answer, he can ask ChatGPT instead. I've set
       | up ChatGPT the same way, he can ask Claude if needed.
       | 
       | Now they always find an answer. Problem solved.
        
         | danenania wrote:
         | That's fun. How many times will they go back and forth? Do you
         | ever get infinite loops?
        
       ___________________________________________________________________
       (page generated 2024-04-05 23:01 UTC)