[HN Gopher] LLaMA-Pro-8B
___________________________________________________________________
LLaMA-Pro-8B
Author : tosh
Score : 75 points
Date : 2024-01-06 19:12 UTC (3 hours ago)
(HTM) web link (huggingface.co)
(TXT) w3m dump (huggingface.co)
| dweekly wrote:
| Paper (which is more useful than the linked HF model) is at
| https://arxiv.org/abs/2401.02415
|
| There is also an -Instruct variant of this model published on HF.
|
| They are claiming the best balance of performance on language and
| coding tasks in an open source LLM of this size.
| Jackson__ wrote:
| Interesting, this sounds very similar to this other recent
| paper as well: https://arxiv.org/abs/2401.02412
|
| Especially with keeping the original model weights the same.
| miven wrote:
| What do they consider to be an "LLM of this size"?
|
| While this technique of scaling up an existing pre-trained
| model via fine-tuning is really impressive, it feels a bit
| unfair to compare what's essentially now an 8.3B model to
| mostly standard 7B ones, especially considering how important
| scale is in predicting LLM performance.
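|
| For context on how that expansion works: the paper's "block
| expansion" interleaves new, identity-initialized decoder blocks
| with the frozen LLaMA2-7B blocks and trains only the new ones on
| the code/math corpus. A minimal PyTorch-style sketch of my reading
| of it - not the authors' code, and the module names assume the HF
| transformers LLaMA implementation:
|
|     import copy
|     import torch.nn as nn
|
|     def expand_blocks(layers, num_new):
|         """Interleave zero-initialized copies of existing decoder
|         blocks so the expanded stack computes the same function as
|         the original model at initialization."""
|         stride = max(1, len(layers) // num_new)
|         expanded = []
|         for i, block in enumerate(layers):
|             block.requires_grad_(False)  # original weights stay frozen
|             expanded.append(block)
|             added = len(expanded) - (i + 1)
|             if (i + 1) % stride == 0 and added < num_new:
|                 new_block = copy.deepcopy(block)
|                 # Zero the projections that feed the residual stream,
|                 # so the new block initially adds nothing (identity).
|                 nn.init.zeros_(new_block.self_attn.o_proj.weight)
|                 nn.init.zeros_(new_block.mlp.down_proj.weight)
|                 new_block.requires_grad_(True)  # only these get trained
|                 expanded.append(new_block)
|         return nn.ModuleList(expanded)
|
| So the 8.3B figure is just the original 7B plus the added blocks.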
| WhackyIdeas wrote:
| I can't keep up with the millions of LLM variants coming out each
| day. So far it's Mixtral / Mistral that have caught my attention.
| crq-yml wrote:
| I've just avoided engaging with the LLM "race" and stick with
| free ChatGPT 3.5 when I want to tinker. The technology is not
| on my critical path right now, so I'll probably engage when it
| settles into more defined products.
| KronisLV wrote:
| > I've just avoided engaging with the LLM "race" and stick
| with free ChatGPT 3.5 when I want to tinker.
|
| In my experience, Phind also seems fairly nice and is free
| for now, at least when the Cloudflare protection in front of
| it doesn't fuss over me using a VPN: https://www.phind.com
|
| Aside from that, I just paid for GitHub Copilot
| (https://github.com/features/copilot), though JetBrains is
| also coming out with their own product
| (https://www.jetbrains.com/ai/).
|
| It's nice that there are options that are basically "plug and
| play", both free and paid, if you don't feel like tinkering
| with the open source models, or don't have the hardware to run
| them with good performance (in my case, I only have an RX 580
| GPU).
| aaomidi wrote:
| The JetBrains one feels so much worse in my experience.
| minimaxir wrote:
| That's perfectly fine.
|
| One of my biggest pet peeves with the LLM benchmark culture is
| that in _practice_, marginal increases don't matter and are
| largely subjective. But using a model that is still strong (like
| ChatGPT) is a good starting point.
| reexpressionist wrote:
| The alternative approach is to start with a small[er] model,
| but derive reliable uncertainty estimates, only moving to a
| larger model if necessary (i.e., if the probability of the
| predictions is lower than needed for the task).
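|
| Roughly, the routing is just a cascade - a toy sketch, where the
| confidence field stands in for whatever calibrated uncertainty
| estimate you trust (which is of course the hard part), and the
| threshold is task-specific:
|
|     from dataclasses import dataclass
|     from typing import Callable
|
|     @dataclass
|     class Generation:
|         text: str
|         confidence: float  # calibrated score, not a raw softmax prob
|
|     def cascade(prompt: str,
|                 small: Callable[[str], Generation],
|                 large: Callable[[str], Generation],
|                 threshold: float = 0.9) -> Generation:
|         """Try the cheap model first; escalate to the larger model
|         only when the small model's own uncertainty estimate says
|         its answer is not reliable enough for the task."""
|         out = small(prompt)
|         if out.confidence >= threshold:
|             return out
|         return large(prompt)  # fall back to the bigger, costlier model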
|
| And I agree that the leaderboards don't currently reflect the
| quantities of interest typically needed in practice.
| minimaxir wrote:
| > derive reliable uncertainty estimates
|
| That is very, very hard to do in an objective manner, as
| the current LLM benchmark gaming demonstrates.
|
| Sure, you can deploy a smaller model to production to get
| real-world user data and feedback, but a) deploying a
| suboptimal model can give a bad first impression and b) the
| _quality_ is still subjective and requires other metrics to
| be analyzed. Looking at prediction probabilities only
| really helps if you have a single correct output token,
| which isn't what LLM benchmarks test for.
| infecto wrote:
| That, and most of the merged or fine-tuned models focus on
| overfitting to specific rankings.
| Casteil wrote:
| Same here. Official Mixtral & Mistral (the new v0.2) seem to
| have the best & most dependable output vs basically all their
| predecessors/derivatives of equal or smaller parameter sizes.
| Mixtral is on a whole other level with its baked-in 'chain of
| thought' reasoning.
|
| The fine-tunes (e.g. dolphin-mixtral, dolphin2.2-mistral, etc.)
| may be good at coding or whatever else they specialize in, but
| they seem to sacrifice quality in other areas, resulting in
| 'hallucinating' where Mixtral/Mistral wouldn't - perhaps as a
| result of the 'score chasing' on benchmarks that has already
| been mentioned.
|
| Whether this matters depends on your own specific needs & use
| cases.
| LanzVonL wrote:
| Anybody else dislike the word huggingface? Also, what is
| Hugging Face, and why not just put these models up on
| BitTorrent? The whole thing weirds me out.
| ilrwbwrkhv wrote:
| They started off building chatbot apps for teenagers. Hugging
| Face is this emoji -> https://emojipedia.org/hugging-face -
| apparently teens are into this stuff.
|
| Then they pivoted into the whole AI thing like many other
| companies and are burning money now. The usual pattern: burn
| money until you get bought, or close the company with the
| owners getting rich off secondaries and their "lessons".
|
| edit: here is an old article
| https://techcrunch.com/2017/03/09/hugging-face-wants-to-beco...
| m0zzie wrote:
| > then they pivoted into the whole ai thing like many other
| companies
|
| I have nothing to do with the company, nor do I really use
| their platform, but I felt a need to point something out:
| your statement is a pretty egregious trivialisation of
| huggingface and their contributions to AI. They were
| releasing NLP papers in early 2018, and in the same year
| open-sourced one of the most impactful ML libraries in the
| industry's history. The original chatbot app was dropped very
| early on and their full focus became the research, tooling,
| and platform.
| minimaxir wrote:
| > then they pivoted into the whole ai thing
|
| Hugging Face first gained popularity as they provided the
| first PyTorch-friendly implementations of BERT and GPT-2,
| which later became the transformers library.
|
| They didn't pivot to AI, they were there since the start.
| yreg wrote:
| For some models you have to accept an EULA before you can
| download them. Hence no BitTorrent.
| MCUmaster wrote:
| Those aren't legal in my country.
| yreg wrote:
| And how exactly should that motivate the authors to put the
| models on bittorrent?
| MandieD wrote:
| Yes, because the first thing I think of is the Facehugger from
| Alien.
| orbital-decay wrote:
| For some reason I always assumed it was a deliberate
| pun/reference on their part, but it turns out it isn't; they
| named it after the emoji.
| raziel2701 wrote:
| My mind always goes to the facehuggers from Alien.
| zora_goron wrote:
| Context from the model page:
|
| "Developed by Tencent's ARC Lab, LLaMA-Pro is an 8.3 billion
| parameter model. It's an expansion of LLaMA2-7B, further trained
| on code and math corpora totaling 80 billion tokens."
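|
| If you want to try it, it should load like any other LLaMA-family
| checkpoint with transformers - a quick sketch; the repo id below
| is my assumption from the model page, so double-check it there:
|
|     from transformers import AutoModelForCausalLM, AutoTokenizer
|
|     repo = "TencentARC/LLaMA-Pro-8B"  # assumed repo id
|     tokenizer = AutoTokenizer.from_pretrained(repo)
|     model = AutoModelForCausalLM.from_pretrained(repo,
|                                                  device_map="auto")
|
|     prompt = "def fibonacci(n):"
|     inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
|     out = model.generate(**inputs, max_new_tokens=64)
|     print(tokenizer.decode(out[0], skip_special_tokens=True))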
___________________________________________________________________
(page generated 2024-01-06 23:01 UTC)