[HN Gopher] LLaMA-Pro-8B
       ___________________________________________________________________
        
       LLaMA-Pro-8B
        
       Author : tosh
       Score  : 75 points
       Date   : 2024-01-06 19:12 UTC (3 hours ago)
        
 (HTM) web link (huggingface.co)
 (TXT) w3m dump (huggingface.co)
        
       | dweekly wrote:
       | Paper (which is more useful than the linked HF model) is at
       | https://arxiv.org/abs/2401.02415
       | 
       | There is also an -Instruct variant of this model published on HF.
       | 
        | They are claiming the best balance of performance on language
        | and coding tasks among open-source LLMs of this size.
        
         | Jackson__ wrote:
          | Interesting, this sounds very similar to another recent
          | paper: https://arxiv.org/abs/2401.02412
         | 
         | Especially with keeping the original model weights the same.
        
         | miven wrote:
         | What do they consider to be an "LLM of this size"?
         | 
         | While this technique of scaling up an existing pre-trained
         | model via fine-tuning is really impressive, it feels a bit
         | unfair to compare what's essentially now an 8.3B model to
         | mostly standard 7B ones, especially considering how important
         | scale is in predicting LLM performance.
        
       | WhackyIdeas wrote:
        | I can't keep up with the millions of LLM variants coming out
        | each day. So far it's Mixtral / Mistral that have caught my
        | attention.
        
         | crq-yml wrote:
         | I've just avoided engaging with the LLM "race" and stick with
         | free ChatGPT 3.5 when I want to tinker. The technology is not
         | on my critical path right now, so I'll probably engage when it
         | settles into more defined products.
        
           | KronisLV wrote:
           | > I've just avoided engaging with the LLM "race" and stick
           | with free ChatGPT 3.5 when I want to tinker.
           | 
           | In my experience, Phind also seems fairly nice and is free
           | for now, at least when the Cloudflare protection in front of
           | it doesn't fuss over me using a VPN: https://www.phind.com
           | 
            | Aside from that, I just paid for GitHub Copilot
            | (https://github.com/features/copilot), though JetBrains is
            | also coming out with their own product
            | (https://www.jetbrains.com/ai/).
           | 
           | It's nice that there are options that are basically "plug and
           | play", both free and paid ones, if you don't feel like
           | tinkering with the open source models, or don't have the
           | hardware to run them with good performance (such as me only
           | having an RX 580 GPU).
        
             | aaomidi wrote:
              | The JetBrains one feels so much worse in my experience.
        
         | minimaxir wrote:
         | That's perfectly fine.
         | 
          | One of my biggest pet peeves with the LLM benchmark culture
          | is that in _practice_, marginal increases don't matter and
          | are overall subjective. But using a strong model (as ChatGPT
          | still is) is a good starting point.
        
           | reexpressionist wrote:
           | The alternative approach is to start with a small[er] model,
           | but derive reliable uncertainty estimates, only moving to a
           | larger model if necessary (i.e., if the probability of the
           | predictions is lower than needed for the task).
           | 
           | And I agree that the leaderboards don't currently reflect the
           | quantities of interest typically needed in practice.
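The cascade described in the comment above can be sketched as follows. This is a toy illustration only: `small_model`, `large_model`, and all probability values are hypothetical stand-ins invented for this sketch, not any real API.

```python
# Toy model cascade: answer with a small model unless its confidence
# (here, the maximum class probability) falls below a threshold, in
# which case fall back to a larger model. Both "models" are
# hypothetical stubs returning made-up probability distributions.

def small_model(prompt):
    return [0.45, 0.40, 0.15]  # uncertain: top probability is only 0.45

def large_model(prompt):
    return [0.90, 0.07, 0.03]  # confident

def cascade(prompt, threshold=0.8):
    probs = small_model(prompt)
    used = "small"
    if max(probs) < threshold:  # simple uncertainty estimate
        probs = large_model(prompt)
        used = "large"
    return probs.index(max(probs)), used

# With the default threshold the small model is too uncertain, so the
# large model answers; with a looser threshold the small model suffices.
print(cascade("example prompt"))       # -> (0, 'large')
print(cascade("example prompt", 0.3))  # -> (0, 'small')
```

As the reply below notes, `max(probs)` is only a meaningful uncertainty signal when a single output token (or class) can be scored, which is rarely the case for free-form generation.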
        
             | minimaxir wrote:
             | > derive reliable uncertainty estimates
             | 
             | That is very, very hard to do in an objective manner, as
             | the current LLM benchmark gaming demonstrates.
             | 
             | Sure, you can deploy a smaller model to production to get
             | real-world user data and feedback, but a) deploying a
             | suboptimal model can give a bad first impression and b) the
             | _quality_ is still subjective and requires other metrics to
              | be analyzed. Looking at prediction probabilities only
              | really helps if you have a single correct output token,
              | which isn't what LLM benchmarks test for.
        
         | infecto wrote:
          | That and most of the merged or fine-tuned models focus on
          | overfitting to specific rankings.
        
         | Casteil wrote:
         | Same here. Official Mixtral & Mistral (the new v0.2) seem to
         | have the best & most dependable output vs basically all their
         | predecessors/derivatives of equal or smaller parameter sizes.
         | Mixtral is on a whole other level with its baked-in 'chain of
         | thought' reasoning.
         | 
          | The fine-tunes (e.g. dolphin-mixtral, dolphin2.2-mistral,
          | etc.) may be good at coding or whatever else they specialize
          | in, but they seem to sacrifice in other areas, resulting in
          | 'hallucinating' where Mixtral/Mistral wouldn't - perhaps as a
          | result of 'score chasing' on benchmarks, as has been
          | mentioned already.
         | 
         | Depends on your own specific needs & use cases whether this is
         | important or not.
        
       | LanzVonL wrote:
        | Anybody else dislike the word huggingface? Also, what is
        | huggingface? Why not just put these models up on BitTorrent?
        | The whole thing weirds me out.
        
         | ilrwbwrkhv wrote:
         | they started off building chatbot apps for teenagers. hugging
         | face is this emoji -> https://emojipedia.org/hugging-face
         | apparently teens are into this stuff.
         | 
         | then they pivoted into the whole ai thing like many other
         | companies and are burning money now. the usual burn money till
          | you get bought or you close the company, with the owners
          | getting rich off secondaries and your "lessons".
         | 
         | edit: here is an old article
         | https://techcrunch.com/2017/03/09/hugging-face-wants-to-beco...
        
           | m0zzie wrote:
           | > then they pivoted into the whole ai thing like many other
           | companies
           | 
           | I have nothing to do with the company, nor do I really use
           | their platform, but I felt a need to point something out:
           | your statement is a pretty egregious trivialisation of
           | huggingface and their contributions to AI. They were
           | releasing NLP papers in early 2018, and in the same year
           | open-sourced one of the most impactful ML libraries in the
           | industry's history. The original chatbot app was dropped very
           | early on and their full focus became the research, tooling,
           | and platform.
        
           | minimaxir wrote:
           | > then they pivoted into the whole ai thing
           | 
           | Hugging Face first gained popularity as they provided the
           | first PyTorch-friendly implementations of BERT and GPT-2,
           | which later became the transformers library.
           | 
           | They didn't pivot to AI, they were there since the start.
        
         | yreg wrote:
          | For some models you have to accept a EULA before you can
          | download them. Hence no BitTorrent.
        
           | MCUmaster wrote:
           | Those aren't legal in my country.
        
             | yreg wrote:
             | And how exactly should that motivate the authors to put the
             | models on bittorrent?
        
         | MandieD wrote:
         | Yes, because the first thing I think of is the Facehugger from
         | Alien.
        
           | orbital-decay wrote:
           | For some reason I always assumed it's a deliberate
           | pun/reference on their part, but turns out it isn't, they
           | named it after the emoji.
        
         | raziel2701 wrote:
         | My mind always thinks facehuggers from Alien.
        
       | zora_goron wrote:
       | Context from the model page:
       | 
       | "Developed by Tencent's ARC Lab, LLaMA-Pro is an 8.3 billion
       | parameter model. It's an expansion of LLaMA2-7B, further trained
       | on code and math corpora totaling 80 billion tokens."
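The linked paper describes this expansion as adding extra transformer blocks whose output projections start at zero, so each new block begins as an identity map and the expanded network initially reproduces the frozen LLaMA2-7B. A toy sketch of that identity-at-init property, reduced to a single residual MLP block in plain Python (all shapes and weight values invented for illustration, and assuming zero-initialization of the added block's output projection as described above):

```python
# Toy illustration of identity-initialized block expansion: a residual
# block whose final projection starts at zero computes out = x + 0 = x,
# so inserting it leaves the pretrained model's function unchanged at
# the start of fine-tuning. Shapes and weights are invented.

def matvec(W, x):
    # Dense matrix-vector product over plain Python lists.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, a) for a in v]

def expanded_block(x, W_in, W_out):
    # Residual MLP block: x + W_out @ relu(W_in @ x)
    h = relu(matvec(W_in, x))
    return [xi + yi for xi, yi in zip(x, matvec(W_out, h))]

dim = 3
W_in = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.4], [-0.1, 0.2, 0.6]]  # arbitrary
W_out = [[0.0] * dim for _ in range(dim)]                      # zero-init

x = [1.0, -2.0, 0.5]
assert expanded_block(x, W_in, W_out) == x  # identity at initialization
```

Only `W_in`/`W_out` of the added blocks are trained on the new code and math corpora, which is why the original weights can stay frozen.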
        
       ___________________________________________________________________
       (page generated 2024-01-06 23:01 UTC)