[HN Gopher] Meta rolls out AI language model LLaMA
       ___________________________________________________________________
        
       Meta rolls out AI language model LLaMA
        
       Author : marban
       Score  : 109 points
       Date   : 2023-02-24 16:58 UTC (6 hours ago)
        
 (HTM) web link (www.reuters.com)
 (TXT) w3m dump (www.reuters.com)
        
       | nothing0001 wrote:
        | How can they prevent someone from performing post-training on
        | this model to obtain a different model that perhaps could not
        | be protected by copyright?
        
       | PhillyPhuture wrote:
       | The language model that lies better than any other model in the
       | universe? (just guessing from the training data)
        
       | idlewords wrote:
       | It's a funny accident of history that one company's business
       | model in 2023 is a combination of:
       | 
       | 1. Large interactive language model
       | 
       | 2. Wireless helmet for 3D equivalent of conference calls
       | 
       | 3. Helping you keep track of which classmate from high school
       | went bald
       | 
       | 4. Birthday reminder service
       | 
       | 5. Virtual currency intended to compete with the dollar
       | 
       | 6. Running all school and church message boards in the US
       | 
       | 7. World's largest collection of food photography
       | 
       | 8. Instant messaging
        
         | esalman wrote:
         | Ever heard of Facebook marketplace?
         | 
          | Besides, every mom-and-pop shop uses FB pages and groups
          | these days. With mobile carriers offering free bandwidth for
          | FB, it's basically part of the infrastructure in many third-
          | world countries.
        
         | tannhauser23 wrote:
         | They dropped the currency idea.
        
       | classified wrote:
       | Yes, everyone in the business of luring suckers will grow
       | themselves one of these.
        
       | karaterobot wrote:
       | > Meta's LLaMA, short for Large Language Model Meta AI
       | 
       | I would have said "LLaMA, an acronym based on cramming concepts
       | together until the word Llama is suggested, and then as a
       | flourish, adding an unnecessary lower-case letter 'a' in the
       | middle. Llamas are unrelated to AI, so this should not have been
       | anybody's priority." Guess that's why I'm not a writer.
        
         | deburo wrote:
          | As long as it's pronounceable and memorable, it's actually a
          | good one.
        
         | _puk wrote:
          | Also known as a "Backronym".
         | 
         | Start with the acronym you want, and work back from there..
        
       | zmmmmm wrote:
        | Is there something special about this that would motivate
        | someone to bother when they can access alternatives with fewer
        | restrictions / more openly?
        
       | russdpale wrote:
        | I wouldn't touch this with a 10-foot pole. Given Facebook's
        | record, why would we as a people even allow this?
        
         | titaniumtown wrote:
         | Facebook has a record of tainting their machine learning
         | models? I was not aware of that.
        
           | scubbo wrote:
            | It is disingenuous to pretend not to understand that
            | unethical behaviour in one area is a strong predictor of
            | unethical behaviour in another area.
        
             | Reubend wrote:
             | It's disingenuous to make assumptions and argue without
             | having made a good faith effort to see if what you're
             | saying is true. 5 minutes of research would have taken you
             | to the research paper in question, which has a
             | straightforward description of the training data sources:
             | 
              | CommonCrawl    67.0%
              | C4             15.0%
              | Github          4.5%
              | Wikipedia       4.5%
              | Books           4.5%
              | ArXiv           2.5%
              | StackExchange   2.0%
             | 
             | So in other words, nothing here in this research has used
             | people's social media posts, and there's no private data
             | whatsoever because the sources used were all public.
        
         | madsmith wrote:
         | I'm biased because I used to work for Facebook a long time ago.
         | So take my words with a grain of salt.
         | 
         | But Facebook research really wants to build amazing tech to
         | power new experiences. The company has attracted many smart and
         | amazing developers who want to pioneer innovation and want to
         | see their work thrive in the broader tech community.
         | 
          | I'm sure this will have the same issues as the other Large
          | Language Models, and there's a lot of work to be done to
          | figure out how to distill a reliably useful system from
          | something trained on so much unreliable human language.
         | 
         | I'm heartened to see them talk about needing to research how to
         | remove bias, toxicity, disinformation and hallucinations.
         | 
         | This is something we need to focus on for all LLMs so I'm happy
         | to see more focus on that in the community.
        
           | theGnuMe wrote:
           | The FAIR group is top notch.
        
         | giancarlostoro wrote:
         | I don't trust Facebook at all, but we need more than just "THE
         | EVIL AI WAS MADE BY AN EVIL COMPANY" what is the concern?
         | They're going to shove ads into your AI?
        
           | freejazz wrote:
            | They impermissibly used customers' information in the data
            | set? They knowingly disregarded known risks of harm
            | associated with the product?
        
             | return_to_monke wrote:
             | got any citations on that one, mate?
        
             | giancarlostoro wrote:
             | I would hope not, have you seen the dumb stuff my relatives
             | say? I think you need to provide some evidence to back your
             | claim though.
        
       | atemerev wrote:
       | It really whips...
        
       | behnamoh wrote:
       | All of this just to stay relevant among the competition, and yet,
       | none of these AI giants has introduced something that just works.
        
         | hummus_bae wrote:
         | [dead]
        
         | cryoz wrote:
         | Waiting for Apple.
        
           | jocaal wrote:
            | What makes you think that Apple can even compete at all?
            | The businesses they are competing with in this space have
            | actual monetary incentive to do research in the field.
            | This sounds like the comments people make about Apple's
            | electric cars and AR headsets, which are always just
            | around the corner and better than anything that already
            | exists.
        
       | bluetidepro wrote:
       | Related discussion: https://news.ycombinator.com/item?id=34925944
        
       | luckydata wrote:
       | the level of the comments in this post is disappointing for
       | hacker news.
        
         | dougmwne wrote:
         | Absolutely. From what I can tell, this model hits state of the
         | art on several benchmarks at 1/10 the size of its benchmark
         | winning competitors. That performance efficiency is great to
         | see because while we know we can increase performance by
         | scaling up the size and compute requirements, getting the same
         | performance out of a fraction of the compute is a major win.
         | This will be especially valuable as these models begin to see
         | real production use at scale such as with ChatGPT and Bing.
         | 
          | HN seems to have mostly devolved into skepticism, hype, and
          | confusion lately around all things AI. I'd say we have just
          | hit future shock.
        
       | rvnx wrote:
       | Very cool to see such model weights getting released. Makes me
       | wonder, seeing that Meta is crushing Google... This is like
       | MySpace teaching a lesson to Apple.
        
         | ilaksh wrote:
         | But they are not released. Only for non-commercial use by
         | groups that they grant access to. Which means studies funded by
         | giant corporations, or governments using it for their own
         | propaganda, or hackers that steal it or find a leak and use it
         | for spam and phishing etc. Anyone who wants to make something
         | useful and positive is going to be left out.
        
           | pilarphosol wrote:
           | AI companies seem to be in a paradoxical space where they
           | need the optics of being open even if it goes against their
           | business interests.
        
             | AuryGlenz wrote:
             | You know, Google releasing a ChatGPT competitor that could
             | be run locally would really take the wind out of
             | Microsoft's sails without needing to shift their own
             | business model.
        
           | hackernewds wrote:
            | Yann LeCun seems to be parading that this is open source
            | under GPL v3, which would have been great. However, yes,
            | the model weights have not been released and are under a
            | separate non-commercial license.
        
           | DennisP wrote:
           | "Researchers and entities affiliated with government, civil
           | society, and academia" sounds like it might include _some_
           | positive uses, if you don 't assume Meta is maximally evil
           | just for the fun of it.
        
         | luckydata wrote:
         | I don't see any crushing going on.
        
         | meragrin_ wrote:
         | >seeing that Meta is crushing Google... This is like MySpace
         | teaching a lesson to Apple.
         | 
         | Huh? How so? I see Meta and Alphabet as very similar companies.
         | Both rely on ads, extracting information from text, keeping
         | users engaged on their platforms, furthering AI/machine
         | learning, etcetera. Does MySpace have their own computing
         | devices I'm unaware of?
        
       | marban wrote:
       | https://ai.facebook.com/blog/large-language-model-llama-meta...
        
         | qup wrote:
         | Even with this it's a little unclear, is this open source?
         | 
         | What kind of machine is needed to run it?
        
           | laluser wrote:
            | > Access to the model will be granted on a case-by-case
            | basis to academic researchers. Non-commercial license.
        
             | hackernewds wrote:
             | why does Yann LeCun advertise that it is open source then?
        
               | mertd wrote:
               | I assume you could train one like this provided that you
               | bring your own training data and computing resources.
        
               | p1esk wrote:
               | Because it is open source:
               | https://github.com/facebookresearch/llama
        
               | kaoD wrote:
               | It feels like open source as a term became
               | obsolete(-ish). Is it open source (in spirit) if only the
               | inference code is open source? I mean, technically it
               | might be, but not being able to train it myself is
               | basically against FSF's freedom 1 if you consider the
               | model the software.
               | 
               | That repo is basically releasing a binary (the weights)
               | and an open source runtime to run them.
        
               | nazka wrote:
                | I agree... This is not open source. If you publish a
                | wrapper that represents just 0.01% of your code and
                | your product, and hide all the rest as closed source,
                | that is considered open source now? Then any closed-
                | source product providing an API becomes open source
                | under this bogus definition. I guess Windows is open
                | source too now, just missing 99.99% of the rest of the
                | code.
                | 
                | It's absurd, and it damages the open source world and
                | its real definition.
        
               | p1esk wrote:
               | No, this is open source. The model code and everything
               | needed to verify their results is provided. No one
               | promised the training code, and the training code is not
               | needed to reproduce the results.
        
               | kaoD wrote:
               | Going back to my original comment: if I release a giant
               | binary blob (say, a Windows image) and an open source
               | runtime to run it (say, VirtualBox), does that mean I can
               | say Windows is open source? I don't think so, and to me
               | it seems that this situation is a perfect parallel.
               | 
                | This is FSF's freedom 1: _"The freedom to study how
                | the program works, and change it so it does your
                | computing as you wish (freedom 1). Access to the source
                | code is a precondition for this."_
               | 
               | By that definition I don't think you can say this _model_
               | is open source then.
               | 
                | You could say the inference code is open source, and
                | that's technically true (and the only thing the repo
                | claims, TBH), but calling this an open-source AI
                | _model_ is misleading.
               | 
               | Open source inference code is outright uninteresting.
               | It's a few hundred lines of glue code to run a giant
               | binary blob which does the heavy lifting, i.e. the actual
               | computing, i.e. the actual software, i.e. the software
               | whose source we're actually interested in.
        
             | svengrunner2049 wrote:
              | Microsoft and OpenAI, Google losing $100bn from a bad
              | demo, layoffs, cost cutting, pressure to innovate
              | again... I feel the days of full openness in AI research
              | from corporations are over.
        
               | alfalfasprout wrote:
               | I wouldn't be remotely so quick to throw in the towel. ML
               | research tends to operate in "jumps" and plateaus. At
               | this point, the concepts behind the big LLMs are
               | relatively well known and the bottleneck is cost of
               | compute + cost of training data. Thing is, cost of
               | compute keeps coming down.
               | 
               | OpenAI's "win" wasn't even so much in the research but in
               | the design of ChatGPT as an interface. Its own model
               | makes the same kinds of egregious mistakes as google and
               | FB's own LLMs. Also, OpenAI was willing to just deal with
               | the ethical fallout of releasing it into the wild with
               | the ability to generate authoritative sounding
               | falsehoods.
               | 
               | I suspect we're going to go back to a period soon where a
               | lot of the innovation we're seeing is around interfaces
               | and infra to make interacting with LLMs natural and
               | applying them to product use cases where they make sense.
        
               | simonh wrote:
               | I really don't blame OpenAI for ethical issues over
               | opening up access to ChatGPT. They're not claiming it's
               | responses are factually correct, and arguably by making
               | it openly available they have done more than anyone else
               | to raise awareness of the risks and limitations of LLMs.
               | We need access to these things to make informed decisions
               | of what are or are not appropriate uses.
               | 
               | Microsoft and Google are a different story, they're
               | specifically pushing these as authoritative sources of
               | information. If we hadn't had access to ChatGPT and the
               | ability to learn it's ins and outs, it might have taken
               | longer to expose so may of the flaws in the Microsoft and
               | Google services.
        
               | mochomocha wrote:
                | I think we have indeed hit "peak open source" for AI,
                | and there unfortunately won't be as much sharing in
                | the coming years. When the economy is down, people and
                | companies think more in "zero-sum game" terms. I hope
                | to be proven wrong.
        
               | yacine_ wrote:
               | https://twitter.com/EMostaque/status/1629160706368053249
               | 
               | :)
        
               | rvz wrote:
                | This is what everyone needs to watch. It was
                | Stability.ai that spooked OpenAI by disrupting DALL-E 2
                | with an open-source AI model, Stable Diffusion. I am
                | betting they are going to do it again for ChatGPT.
               | 
               | The endgame for this AI race is obvious and when it comes
               | to AI models, open-source ones always disrupt fully
               | closed source AI companies.
               | 
               | But first, we'll see which 'AI companies' will survive
               | the lawsuits, regulations and fierce competition for
               | funding.
        
               | laluser wrote:
                | There is hope:
                | https://www.bloomberg.com/news/articles/2023-02-21/amazon-s-....
        
               | somebodythere wrote:
                | The golden age of open-source AI is ahead of us. Open-
                | source AI companies are being launched and funded.
                | High-quality, large, labeled data sets have never been
                | more accessible, and scaling-law plateaus mean there
                | is going to be a lot more momentum on data and compute
                | optimization, meaning current SOTA models will start
                | fitting on smaller and smaller hardware, down to
                | commodity hardware.
        
               | flangola7 wrote:
               | Until legislation clamps it down.
        
               | Karunamon wrote:
               | At least in the United States, it's well established that
               | code is protected by freedom of speech.
               | https://www.eff.org/press/archives/2008/04/21-40
        
               | A4ET8a8uTh0 wrote:
                | This. We have already had discussions of 'attacks' on
                | models based on public data sets, so 'good sets' may
                | soon become the thing to go after (and suddenly data
                | brokers may really want to up the prices of their
                | sets). We might actually see more privacy as a result,
                | as data brokers start charging a premium for clean
                | sets.
               | 
               | Naturally, as predictions go, don't quote me on that. I
               | was wrong before.
        
             | generalizations wrote:
             | Here's hoping someone with access puts it on bittorrent.
        
               | 1letterunixname wrote:
               | DLP, overall architecture, and many more aspects put the
               | kibosh on that.
               | 
                | The career risk isn't worth it, especially when tech
                | is deployed client- and network-side to detect just
                | such exfil attempts. The average network, security, or
                | client-management staffer tends to be a PE (SRE) who
                | can code, sometimes with a PhD, drawn from the cream
                | of what was previously organized as the "corporate IT"
                | world. So I fail to see any incentive to throw away a
                | career and reputation by giving away IP for $0.
               | 
               | There's a metric s*ton of optimized hardware to generate
               | models. And I have my doubts if Sama at OpenAI, even with
               | 10 gigabucks from Microsoft, can sustain growth,
               | organizational culture, and long-term investment at the
               | scale others are bringing online with less trouble and
               | more experience.
               | 
                | Future interaction with AI models will most likely be
                | through an API, because the models themselves are
                | becoming too large to fit even on the most extreme DIY
                | NAS solutions.
               | 
               | TL;DR: it's not happening.
        
               | Hyption wrote:
                | There are plenty of low-paid students, working
                | students, and admins in between.
        
               | generalizations wrote:
                | > Future interaction with AI models will most likely
                | be through an API, because the models themselves are
                | becoming too large to fit even on the most extreme DIY
                | NAS solutions.
                | 
                | And yet I spent last night running GPT-J-6B on my
                | desktop CPU at 2 tokens/sec. People are finally
                | starting to optimize these models, and there's a ton
                | of optimization left to go. We'll definitely be
                | running these locally in the next few years. This
                | model especially looks like an ideal candidate for CPU
                | optimization, given the parity with GPT-3, and that
                | it's within spitting distance of the size of models
                | like GPT-J-6B.
        
               | imranq wrote:
               | You can make small LLMs more performant than large ones
               | like GPT-3 by fine-tuning on specific tasks or providing
               | them tools to offload precise calculations:
               | 
               | e.g. Toolformer: https://arxiv.org/abs/2302.04761
               | 
               | which uses APIs and functions to improve GPT-J beyond
               | GPT-3 for various tasks
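                | 
                | A minimal sketch of the tool-call idea (not the
                | paper's exact protocol; the [Tool(args)] markup and
                | the calc helper are illustrative): the model emits a
                | marked-up API call, the runtime executes it, and the
                | result is spliced back into the text before
                | generation continues.
                | 
                |   import re
                | 
                |   def calc(expr: str) -> str:
                |       # Offload precise arithmetic the LM would
                |       # otherwise have to guess at.
                |       return str(eval(expr, {"__builtins__": {}}))
                | 
                |   TOOLS = {"Calculator": calc}
                | 
                |   def fill_tool_calls(text: str) -> str:
                |       # Replace each [Tool(args)] marker with the
                |       # named tool's output.
                |       pattern = re.compile(r"\[(\w+)\(([^)]*)\)\]")
                |       return pattern.sub(
                |           lambda m: TOOLS[m.group(1)](m.group(2)), text)
                | 
                |   draft = ("400 units at $18.50 each is "
                |            "[Calculator(400 * 18.50)] dollars.")
                |   print(fill_tool_calls(draft))
                |   # -> "400 units at $18.50 each is 7400.0 dollars."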
        
               | 1letterunixname wrote:
               | How dare you bring knowledge in here! Some people at work
               | would be offended if they couldn't buy their A100 or DGX
               | toys, or had to scale back their custom ASIC and systems
               | R&D.
        
               | cma wrote:
                | Once it is out there, it can be rehosted on the clear
                | net, since model weights can't be copyrighted (unless
                | they intentionally overtrain on some copyrighted
                | snippets they own, maybe).
        
       | MongoTheMad wrote:
       | The name made me think of this: https://youtu.be/KMYN4djSq7o
        
         | mmahemoff wrote:
         | It made me think of Winamp.
        
         | dukoid wrote:
         | Better than LMAA I guess O:)
         | https://en.wikipedia.org/wiki/Swabian_salute
        
         | RheingoldRiver wrote:
         | ah, I expected that to be this:
         | https://www.youtube.com/watch?v=kZUPCB9533Y
        
         | thom wrote:
         | Or this: https://www.youtube.com/watch?v=gTmEfoPnQDo
        
       | a-dub wrote:
       | the focus on non-proprietary training datasets and computational
       | efficiency for the purposes of reducing costs and democratizing
       | access is pretty cool!
       | 
       | it is interesting how model performance seems to scale almost
       | linearly with size for the sizes chosen. (or perhaps not, perhaps
       | the researchers are choosing to focus on the part of the
       | performance-size response curve that is linear).
        
       | jejeyyy77 wrote:
        | Dumb question, but can we actually use this right now? Like
        | download the model and run it locally?
        
       ___________________________________________________________________
       (page generated 2023-02-24 23:01 UTC)