[HN Gopher] Stanford Alpaca, and the acceleration of on-device L...
___________________________________________________________________
Stanford Alpaca, and the acceleration of on-device LLM development
Author : Kye
Score : 95 points
Date : 2023-03-13 19:54 UTC (3 hours ago)
(HTM) web link (simonwillison.net)
(TXT) w3m dump (simonwillison.net)
| swyx wrote:
| I feel like there has to be another shoe to drop here, this seems
| almost too good to be true.
|
| > Alpaca shows that you can apply fine-tuning with a feasible
| sized set of examples (52,000) and cost ($600) such that even the
| smallest of the LLaMA models--the 7B one, which can compress down
| to a 4GB file with 4-bit quantization--provides results that
| compare well to cutting edge text-davinci-003 in initial human
| evaluation.
|
| this is the most exciting thing. the cost of fine-tuning is
| rapidly coming down, which means everyone will be able to train
| their own models for their use cases.
|
| Looking for the contrarians on HN: what is being left unsaid here
| that people like myself and Simon might be getting too optimistic
| about? what are the known downsides that people in academia
| already know about?
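The "4GB file with 4-bit quantization" figure quoted above can be sanity-checked with back-of-envelope arithmetic (a sketch; real quantized files carry per-block scale factors and metadata, so they come out somewhat larger than the raw weight bits):

```python
# Rough size estimate for LLaMA-7B quantized to 4 bits per weight.
params = 7_000_000_000      # approximate parameter count of the 7B model
bits_per_weight = 4         # 4-bit quantization
size_bytes = params * bits_per_weight / 8

print(f"{size_bytes / 1e9:.1f} GB")  # ~3.5 GB of raw weights; overhead
                                     # brings the file closer to 4 GB
```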
| atleastoptimal wrote:
| "initial human evaluation" is codeword for "cherrypicked
| prompts given to people who don't know how to trick an LLM"
| sebzim4500 wrote:
| I don't understand why that is a bad thing. If your goal is
| to make an AI assistant, then you should be optimizing for
| giving answers that real users find useful, not trying to
| impress other AI researchers.
| ChubbyGlasses wrote:
| i always found this to be a strange pov to have on LLMs. imo,
| it's not humans tricking/gaming the ai, but rather chatgpt
| has tricked you into believing it's smarter than it actually
| is. (in human terms, chatgpt is just more articulate than
| llama)
|
| it's a subtle distinction, but i think it shapes and reflects
| how you view ai as a tool for humans or as a replacement.
| dougmwne wrote:
| First catch is that someone needed to spend the enormous up
| front cost to train the base model, then release it under a
| flexible enough license for your use case.
|
| The second catch is that you would get much higher quality out
| of the 65b model, but would need to lay out a few thousand for
| the hardware.
|
| The third catch is that you need the fine-tuning data, but that
| seems easier than ever to create out of more capable LLMs.
| blueblimp wrote:
| It seems still unclear how much quality loss there is compared
| to the best models. What's really needed is systematic
| evaluation of the output quality, but that's tricky and
| relatively expensive (compared to automated benchmarks), so I
| understand why it hasn't happened yet.
|
| Edit: I just tried it with a single task of my own (that I've
| successfully used with ChatGPT and Bing) and it flubbed it
| horribly, so this model at least is noticeably inferior to the
| SOTA, which is not surprising given how small it is.
| yunyu wrote:
| I assume you haven't tried Alpaca (which hasn't been
| released), only Llama. See the instruction fine tuning
| section in the article.
| karmasimida wrote:
| It currently only supports a single instruction/response
| format, right? Multi-turn conversations will be more
| challenging to handle.
|
| I am optimistic about 30B or 65B catching up with OpenAI, but
| 7B is unlikely to have the same quality.
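For reference, the single-turn format being discussed is Alpaca's instruction template, roughly the following (the wording here is a sketch; check the Stanford Alpaca repository for the exact text it fine-tunes on):

```python
# Approximate single instruction/response prompt format used by Alpaca.
TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)

def build_prompt(instruction: str) -> str:
    # One instruction in, one response out -- there is no slot here
    # for earlier turns, which is why multi-turn is harder to bolt on.
    return TEMPLATE.format(instruction=instruction)

print(build_prompt("Name three primary colors."))
```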
| Kye wrote:
| The big failure mode is they can hallucinate nonsense that
| isn't obviously nonsense. You have to check any facts against
| expert sources. At that point, you could just email an expert
| who can use their own LLM to whip up an answer and check the
| facts themselves.
| simonw wrote:
| That's a big problem if you're using a language model as a
| search engine. The trick is to learn how to use them for the
| things that they're good for outside of that.
| typest wrote:
| ^^ this. For instance, LLMs are really good at turning
| natural language into SQL. And if you know SQL, you can
| read it and make sure it looks good. But, it's much faster
| and easier than writing SQL by hand.
| flir wrote:
| But that's still "You have to check any facts against
| expert sources"! You just have the advantage of being
| your own personal expert.
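The check-it-yourself workflow described above can be sketched in a few lines. The prompt wording is hypothetical (standing in for whatever model call you use), but the SQLite compile check is one cheap, concrete way to act as your own expert before trusting generated SQL:

```python
import sqlite3

SCHEMA = "CREATE TABLE users (id INTEGER, name TEXT, age INTEGER);"

def nl_to_sql_prompt(question: str, schema: str) -> str:
    # Illustrative prompt for a hypothetical LLM call; exact wording
    # is an assumption, not any particular model's required format.
    return (f"Given this schema:\n{schema}\n"
            f"Write one SQL query answering: {question}\nSQL:")

def compiles(sql: str, schema: str) -> bool:
    # Ask SQLite to compile the query against an empty copy of the
    # schema -- catches syntax errors and unknown tables/columns
    # before you run (or trust) the model's output.
    conn = sqlite3.connect(":memory:")
    conn.executescript(schema)
    try:
        conn.execute(f"EXPLAIN {sql}")
        return True
    except sqlite3.Error:
        return False

print(compiles("SELECT name FROM users WHERE age > 30", SCHEMA))  # True
print(compiles("SELEKT name FROM users", SCHEMA))                 # False
```

Compiling is of course only a syntax-level check; whether the query actually answers the question still needs a human who knows SQL, which is exactly the point being made.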
| porcc wrote:
| We saw this happen with Stable Diffusion, so it's not
| surprising to see it happening here. There is a lot of interest
| in taking these models that are within striking distance (a
| single order of magnitude) of running inference and training on
| consumer-level hardware, and as such a lot of energy is going
| into the optimizations that can get us there.
|
| Generally speaking, research is not usually done with consumer
| usage in mind, so what this is (as Dreambooth etc. were for
| Stable Diffusion) is the gap between researcher software and
| accessible software being bridged.
| smoldesu wrote:
| > what is being left unsaid here that people like myself and
| Simon might be getting too optimistic about?
|
| The past week has felt like a wake-up call to enthusiasts.
| Running models locally has been available for a while (even
| small, fairly coherent ones), and the majority of
| "improvements" recently have come from implementing the leaked
| LLaMa model.
|
| The results from 7B are an improvement on what we had a year
| ago, but not by much. We're learning that there's room to
| optimize these models, but _also_ that size matters. ChatGPT
| and 7B are both great at bullshitting, but you can feel the
| difference in model size during regular conversation. Adding
| insult to injury, it will almost always be faster to query an
| API for AI results than it will be to run it locally.
|
| Analysis: Things are moving at a clip right now, but people
| expecting competitive LLMs running locally on their smartphone
| will be disappointed for quite a while. As the technology
| improves, it's also safe to assume that we'll find ways to
| scale model intelligence with greater resources, and the status
| quo will look much different than it does today.
| atleastoptimal wrote:
| > API for AI results than it will be to run it locally.
|
| True, and remotely called APIs will always be the mover for the
| AI craze. Only niche hobbyists will be running them locally.
|
| There is no company on the planet that would benefit from
| providing people local means to run LLMs. As a result only
| hacks and leaks will be how individuals can manage to run
| LLMs outside of heavily monitored remote API calls.
| flangola7 wrote:
| Who said anything about companies? Companies don't benefit
| by giving people free access to buildings full of books and
| knowledge, yet here they are.
| atleastoptimal wrote:
| In the last 50 years of AI research has any academic
| institution ever provided open source easy to use tools
| like the stuff big companies have put out in the past 5
| years?
| simonw wrote:
| Stable Diffusion came from an academic research lab.
| niemandhier wrote:
| Companies like Facebook can harm their competitors by
| releasing models.
|
| Facebook is not a major player in the LLM field; OpenAI's
| technological advantage is too large. BUT they can reduce the
| expected gains of their competition by providing less powerful
| alternatives for free.
| mikek wrote:
| Apple comes to mind.
| BryantD wrote:
| Agreed. Apple will run models remotely if necessary, but
| from a PR perspective they align with their stated
| intentions when they can run locally.
| flir wrote:
| At a guess: assuming quality scales with size, the model in the
| data centre is always going to outcompete the model on the
| device. So in any situation where you've got bandwidth
| >4800bps, why would you choose the model on the device?
| CuriouslyC wrote:
| Your own fine tuning, no restrictions on output,
| privacy/security, and if you have a reason to produce a lot
| of output it'll be cheaper. Use ChatGPT if you only want to
| use it occasionally, you don't care about privacy/security in
| this context, the output restrictions don't bother you and
| having the best possible language model is the most important
| thing to you.
| warning26 wrote:
| Really neat!
|
| _> Second, the instruction data is based on OpenAI's
| text-davinci-003, whose terms of use prohibit developing models
| that compete with OpenAI._
|
| Wow, that seems really sketchy on the part of OpenAI. Even
| considering their overall lack of openness, this clause feels
| particularly egregious.
| kir-gadjello wrote:
| Charitably speaking the researchers had little time to execute
| this, so they just ended up using the well known OpenAI API.
| Still, it would be very useful if someone used LLaMA-65B
| instead of text-davinci-003 here.
|
| Someone should ask the researchers, either via email or via
| github pull request, it shouldn't even be that hard to do.
| flangola7 wrote:
| That has to run afoul of competition/antitrust laws and be
| unenforceable. Imagine if Ford tried to tell people they can't
| use its pickups to carry tools around on a new Honda plant
| construction site.
| [deleted]
| macintux wrote:
| Active discussion on Alpaca:
| https://news.ycombinator.com/item?id=35136624
|
| Also: https://news.ycombinator.com/item?id=35139450
| dang wrote:
| Thanks! Macroexpanded:
|
| _Alpaca: A strong open-source instruction-following model_ -
| https://news.ycombinator.com/item?id=35136624
|
| Also recent and related:
|
| _Large language models are having their Stable Diffusion
| moment_ - https://news.ycombinator.com/item?id=35111646 - March
| 2023 (355 comments)
___________________________________________________________________
(page generated 2023-03-13 23:00 UTC)