[HN Gopher] Octopus v2: On-device language model for super agent
___________________________________________________________________
Octopus v2: On-device language model for super agent
Author : lawrencechen
Score : 78 points
Date : 2024-04-03 05:12 UTC (17 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| wanderingmind wrote:
| I'm going to start commenting on arXiv paper links with the same
| request.
|
| 1. Show me the data
|
| 2. Show me the code
|
| 3. Show me the model
|
| If we can't play with it and modify it easily, it doesn't belong
| on HN.
| mirekrusin wrote:
| https://huggingface.co/NexaAIDev/Octopus-v2
| smcleod wrote:
| Yeah, I've got to agree with this. Having a link to the paper is
| useful, but not that interesting without a demo and the source
| code. It doesn't help that arXiv has a pretty horrible interface
| for anyone other than people writing papers.
| CGamesPlay wrote:
| So, I guess it's a LoRA for function calls. Makes sense that this
| would work well, and it bodes well for creating really cheap
| request routers in more advanced cloud-based situations.
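|
| A minimal sketch of what such a router could look like, assuming
| a tiny local model that emits one route token per request (all
| names here are illustrative, not from the paper):
|
|     # Hypothetical request router: a small on-device model picks
|     # one route token; anything unrecognized escalates to a big
|     # cloud model.
|     from typing import Callable
|
|     ROUTES = {
|         "<route_weather>": "weather-service",
|         "<route_calendar>": "calendar-service",
|     }
|
|     def route(query: str,
|               small_model: Callable[[str], str],
|               big_model: Callable[[str], str]) -> str:
|         token = small_model(query)      # cheap single-token call
|         service = ROUTES.get(token)
|         if service is not None:
|             return f"dispatch to {service}"
|         return big_model(query)         # expensive fallback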
| iandanforth wrote:
| It's not. They do train one version with LoRA but also train
| three variants without.
| gardnr wrote:
| > To mitigate such errors, we propose designating functions as
| unique functional tokens.
|
| I just skimmed the paper, but this seems to be the crux of it.
| They map each function to a single token and can then fine-tune
| models to use the token instead of the function name. This
| increases the accuracy of smaller LLMs and reduces the total
| number of tokens required for prompts and generations, which is
| where their speed gains come from.
|
| The paper is worth a look just to see Figure 2.
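|
| A minimal sketch of that mapping with Hugging Face transformers
| (the paper names its tokens along the lines of <nexa_0>; the
| details below are an approximation, not the authors' code):
|
|     # Each function becomes one new vocabulary entry, so the
|     # model emits e.g. "<nexa_0>" instead of spelling out a long
|     # function name token by token.
|     from transformers import AutoModelForCausalLM, AutoTokenizer
|
|     tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
|     model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")
|
|     function_tokens = [f"<nexa_{i}>" for i in range(20)]
|     tokenizer.add_special_tokens(
|         {"additional_special_tokens": function_tokens})
|     model.resize_token_embeddings(len(tokenizer))  # rows to train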
| alwa wrote:
| Figure 2 is incredible.
|
| I have only passing familiarity with the norms in this kind of
| work, but the accuracy rates of all models on this benchmark
| suite seem suspiciously (and uniformly) high. Is choosing the
| right intent among "20 vehicle functions" or "20 Android APIs"
| consistent with an ordinary level of ambition in this kind of
| research these days?
| jerpint wrote:
| That's pretty clever: encoding atomic concepts as single tokens.
| iandanforth wrote:
| They might even get higher accuracies with a dedicated
| classification layer. By using the existing vocabulary, they are
| spreading the probability mass across a _much_ larger space. If
| they stuck to N options, where N is the total number of functions
| available to the model, I suspect they could get to 100%
| accuracy.
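|
| A sketch of that restriction as a decoding-time mask, assuming
| you already know the vocabulary ids of the functional tokens:
|
|     # Restrict the choice to the N functional tokens by masking
|     # every other logit before taking the argmax.
|     import torch
|
|     def pick_function(logits: torch.Tensor,
|                       function_token_ids: list[int]) -> int:
|         mask = torch.full_like(logits, float("-inf"))
|         mask[function_token_ids] = 0.0
|         return int(torch.argmax(logits + mask))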
|
| It's also not clear whether there is sufficient ambiguity in the
| test data for this to be a generalizable model. The difficulty
| with "intent recognition" (which they don't mention, but that is
| what this problem is called for agents like Siri) is that human-
| generated inputs vary widely and are often badly formed. If they
| haven't done extensive evaluation with human users, and/or
| they've constrained the functions to be quite distinct, then they
| aren't yet tackling a hard problem; they've just got a complex
| setting.
| vessenes wrote:
| Short summary of the paper:
|
| Take Gemma-2B. Take your API. Use ChatGPT-3.5 to generate 1,000
| "correct" API function-call responses by placing only your API's
| calls in the pre-prompt and then prompting it. I imagine they use
| ChatGPT to create the request language as well. Then generate
| 1,000 "incorrect" API call responses by filling the pre-prompt
| with functions not from your API.
|
| Finetune.
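|
| The generation step presumably looks something like this (a
| sketch with the OpenAI client; the prompt wording is mine, not
| the paper's):
|
|     # Ask a big model to invent a user request plus the matching
|     # call for one of your API's functions.
|     from openai import OpenAI
|
|     client = OpenAI()
|
|     def make_example(function_signature: str) -> str:
|         prompt = (f"Given this API function:\n{function_signature}\n"
|                   "Write a realistic user request and the exact "
|                   "function call that satisfies it.")
|         resp = client.chat.completions.create(
|             model="gpt-3.5-turbo",
|             messages=[{"role": "user", "content": prompt}])
|         return resp.choices[0].message.content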
|
| Note that they use "functional tokens" in training: they convert
| each function to a particular, previously unused token and refer
| to it that way. They claim this speeds up inference (I'm sure it
| does). They don't make any claims as to whether or not it changes
| their accuracy (I bet it does). It definitely makes the system
| more fragile / harder to train for large and very large APIs.
|
| Outcome: a highly capable _single API_ function-call LLM. They
| say you could do it with as few as 100 training inputs if you
| really wanted.
|
| I think this is interesting, but not world-shattering. I could
| imagine building a nice little service company on it, basically
| just "send us a git repo and you'll get a helpful function-call
| API for this version of your code, which you can hook up to an
| API endpoint / chatbot".
|
| Limitations are going to be largely around Gemma-2B's skills -- a
| 2B model isn't super sophisticated. And you can see they specify
| "<30 tokens" for the prompt. But I imagine this could be trained
| quickly enough that it could be part of a release CI process.
| There are a number of libraries I use for which I would like to
| have access to such a model.
|
| I'd be interested in something that has general knowledge of a
| large set of packages for a language, and could pull in /
| finetune / MoE little models for specific repositories I'm coding
| on. Right now I would either rely on a very large model and hope
| its knowledge cutoff is right (Claude/GPT-4), or use a lot of a
| large context window. There might be some Goldilocks version in
| the middle here which would be helpful in a larger codebase but
| faster and more accurate than the cloud monopoly providers.
| saltsaman wrote:
| I can see people training LoRAs this way, which would allow for
| multiple API function calls.
| vessenes wrote:
| Yeah, good idea! I'm not sure if you would be successful mixing
| LoRA + functional tokens. If you could, that would be great. Then
| you could ship very light LoRA packs with repositories.
|
| Their LoRA training was, _I think_, against their finetuned
| model, not Gemma-2B directly. But it seems worth playing with --
| could be super useful.
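|
| A sketch of what a per-repository LoRA pack might look like with
| the peft library (model choice and hyperparameters are guesses,
| not from the paper):
|
|     # Train a small adapter on top of the function-token base and
|     # ship only the adapter alongside the repo.
|     from peft import LoraConfig, get_peft_model
|     from transformers import AutoModelForCausalLM
|
|     base = AutoModelForCausalLM.from_pretrained(
|         "NexaAIDev/Octopus-v2")
|     config = LoraConfig(r=8, lora_alpha=16,
|                         target_modules=["q_proj", "v_proj"])
|     model = get_peft_model(base, config)
|     # ... train on the repo's function-call examples ...
|     model.save_pretrained("my-repo-lora")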
| turnsout wrote:
| This is the frontier: tiny, specialized models like this and
| ReALM [0], coupled to the application logic and able to run
| on-device.
|
| Eventually devices will be powerful enough to run more general
| purpose models locally, but for high-frequency user tasks with a
| low tolerance for error, smaller specialized models may always
| win.
|
| [0]: https://arxiv.org/abs/2403.20329
| mikece wrote:
| "What is better than one recipe for Octopus?"
|
| I can't be the only person who heard that line in their head
| instantly when reading that headline.
___________________________________________________________________
(page generated 2024-04-03 23:02 UTC)