[HN Gopher] Meta rolls out AI language model LLaMA
___________________________________________________________________
Meta rolls out AI language model LLaMA
Author : marban
Score : 109 points
Date : 2023-02-24 16:58 UTC (6 hours ago)
(HTM) web link (www.reuters.com)
(TXT) w3m dump (www.reuters.com)
| nothing0001 wrote:
| How can they prevent someone from doing post-training on this
| model to obtain a different model that perhaps could not be
| protected by copyright?
| PhillyPhuture wrote:
| The language model that lies better than any other model in the
| universe? (just guessing from the training data)
| idlewords wrote:
| It's a funny accident of history that one company's business
| model in 2023 is a combination of:
|
| 1. Large interactive language model
|
| 2. Wireless helmet for 3D equivalent of conference calls
|
| 3. Helping you keep track of which classmate from high school
| went bald
|
| 4. Birthday reminder service
|
| 5. Virtual currency intended to compete with the dollar
|
| 6. Running all school and church message boards in the US
|
| 7. World's largest collection of food photography
|
| 8. Instant messaging
| esalman wrote:
| Ever heard of Facebook marketplace?
|
| Besides, every mom-and-pop shop uses FB pages and groups these
| days. With mobile carriers offering free bandwidth for FB, it's
| basically part of the infrastructure in many third-world
| countries.
| tannhauser23 wrote:
| They dropped the currency idea.
| classified wrote:
| Yes, everyone in the business of luring suckers will grow
| themselves one of these.
| karaterobot wrote:
| > Meta's LLaMA, short for Large Language Model Meta AI
|
| I would have said "LLaMA, an acronym based on cramming concepts
| together until the word Llama is suggested, and then as a
| flourish, adding an unnecessary lower-case letter 'a' in the
| middle. Llamas are unrelated to AI, so this should not have been
| anybody's priority." Guess that's why I'm not a writer.
| deburo wrote:
| As long as it's pronounceable and memorable, it's actually a
| good one.
| _puk wrote:
| Also known as a "Backronym".
|
| Start with the acronym you want, and work back from there..
| zmmmmm wrote:
| Is there something special about this that would motivate someone
| to bother when they can access alternatives with fewer
| restrictions / more openly?
| russdpale wrote:
| I wouldn't touch this with a 10-foot pole. Given Facebook's
| record, why would we as a people even allow this?
| titaniumtown wrote:
| Facebook has a record of tainting their machine learning
| models? I was not aware of that.
| scubbo wrote:
| It is disingenuous to pretend not to understand that
| unethical behaviour in one area is a strong predictor of
| unethical behaviour in another area.
| Reubend wrote:
| It's disingenuous to make assumptions and argue without
| having made a good faith effort to see if what you're
| saying is true. 5 minutes of research would have taken you
| to the research paper in question, which has a
| straightforward description of the training data sources:
|
| CommonCrawl    67.0%
| C4             15.0%
| Github          4.5%
| Wikipedia       4.5%
| Books           4.5%
| ArXiv           2.5%
| StackExchange   2.0%
|
| So in other words, nothing here in this research has used
| people's social media posts, and there's no private data
| whatsoever because the sources used were all public.
| madsmith wrote:
| I'm biased because I used to work for Facebook a long time ago.
| So take my words with a grain of salt.
|
| But Facebook research really wants to build amazing tech to
| power new experiences. The company has attracted many smart and
| amazing developers who want to pioneer innovation and want to
| see their work thrive in the broader tech community.
|
| I'm sure this will have the same issues as the other Large
| Language Models, and there's a lot of work to be done to figure
| out how to distill a reliably useful system from something
| trained on so much unreliable human language.
|
| I'm heartened to see them talk about needing to research how to
| remove bias, toxicity, disinformation and hallucinations.
|
| This is something we need to focus on for all LLMs so I'm happy
| to see more focus on that in the community.
| theGnuMe wrote:
| The FAIR group is top notch.
| giancarlostoro wrote:
| I don't trust Facebook at all, but we need more than just "THE
| EVIL AI WAS MADE BY AN EVIL COMPANY". What is the concern?
| They're going to shove ads into your AI?
| freejazz wrote:
| That they impermissibly used customers' information in the data
| set? That they knowingly disregarded known risks of harm
| associated with the product?
| return_to_monke wrote:
| got any citations on that one, mate?
| giancarlostoro wrote:
| I would hope not, have you seen the dumb stuff my relatives
| say? I think you need to provide some evidence to back your
| claim though.
| atemerev wrote:
| It really whips...
| behnamoh wrote:
| All of this just to stay relevant among the competition, and yet,
| none of these AI giants has introduced something that just works.
| hummus_bae wrote:
| [dead]
| cryoz wrote:
| Waiting for Apple.
| jocaal wrote:
| What makes you think that Apple can even compete at all? The
| businesses they are competing with in this space have an actual
| monetary incentive to do research in the field. This sounds
| like the comments people make about Apple's electric cars and
| AR headsets, which are always just around the corner and better
| than anything that already exists.
| bluetidepro wrote:
| Related discussion: https://news.ycombinator.com/item?id=34925944
| luckydata wrote:
| The level of the comments in this post is disappointing for
| Hacker News.
| dougmwne wrote:
| Absolutely. From what I can tell, this model hits state of the
| art on several benchmarks at 1/10 the size of its benchmark
| winning competitors. That performance efficiency is great to
| see because while we know we can increase performance by
| scaling up the size and compute requirements, getting the same
| performance out of a fraction of the compute is a major win.
| This will be especially valuable as these models begin to see
| real production use at scale such as with ChatGPT and Bing.
|
| HN seems to have mostly devolved into skepticism, hype and
| confusion lately around all things AI. I'd say we have just hit
| future shock.
| rvnx wrote:
| Very cool to see such model weights getting released. Makes me
| wonder, seeing that Meta is crushing Google... This is like
| MySpace teaching a lesson to Apple.
| ilaksh wrote:
| But they are not released. Only for non-commercial use by
| groups that they grant access to. Which means studies funded by
| giant corporations, or governments using it for their own
| propaganda, or hackers that steal it or find a leak and use it
| for spam and phishing etc. Anyone who wants to make something
| useful and positive is going to be left out.
| pilarphosol wrote:
| AI companies seem to be in a paradoxical space where they
| need the optics of being open even if it goes against their
| business interests.
| AuryGlenz wrote:
| You know, Google releasing a ChatGPT competitor that could
| be run locally would really take the wind out of
| Microsoft's sails without needing to shift their own
| business model.
| hackernewds wrote:
| Yann LeCun seems to be touting this as open source under
| GPLv3, which would have been great. However, yes, the model
| weights have not been released and are under a separate
| non-commercial license.
| DennisP wrote:
| "Researchers and entities affiliated with government, civil
| society, and academia" sounds like it might include _some_
| positive uses, if you don't assume Meta is maximally evil
| just for the fun of it.
| luckydata wrote:
| I don't see any crushing going on.
| meragrin_ wrote:
| >seeing that Meta is crushing Google... This is like MySpace
| teaching a lesson to Apple.
|
| Huh? How so? I see Meta and Alphabet as very similar companies.
| Both rely on ads, extracting information from text, keeping
| users engaged on their platforms, furthering AI/machine
| learning, etcetera. Does MySpace have their own computing
| devices I'm unaware of?
| marban wrote:
| https://ai.facebook.com/blog/large-language-model-llama-meta...
| qup wrote:
| Even with this it's a little unclear: is this open source?
|
| What kind of machine is needed to run it?
| laluser wrote:
| > Access to the model will be granted on a case-by-case basis
| to academic researchers. Non-commercial license.
| hackernewds wrote:
| why does Yann LeCun advertise that it is open source then?
| mertd wrote:
| I assume you could train one like this provided that you
| bring your own training data and computing resources.
| p1esk wrote:
| Because it is open source:
| https://github.com/facebookresearch/llama
| kaoD wrote:
| It feels like open source as a term became
| obsolete(-ish). Is it open source (in spirit) if only the
| inference code is open source? I mean, technically it
| might be, but not being able to train it myself is
| basically against FSF's freedom 1 if you consider the
| model the software.
|
| That repo is basically releasing a binary (the weights)
| and an open source runtime to run them.
| nazka wrote:
| I agree... This is not open source. If you publish a
| wrapper that represents just 0.01% of your code and your
| product, and hide all the rest as closed source, is that
| considered open source now? Then any closed-source product
| providing an API becomes open source under this bogus
| definition. I guess Windows is open source too now; it's
| just missing the other 99.99% of the code.
|
| It's absurd and it damages the open source world and its
| real definition.
| p1esk wrote:
| No, this is open source. The model code and everything
| needed to verify their results is provided. No one
| promised the training code, and the training code is not
| needed to reproduce the results.
| kaoD wrote:
| Going back to my original comment: if I release a giant
| binary blob (say, a Windows image) and an open source
| runtime to run it (say, VirtualBox), does that mean I can
| say Windows is open source? I don't think so, and to me
| it seems that this situation is a perfect parallel.
|
| This is FSF's freedom 1: _" The freedom to study how the
| program works, and change it so it does your computing as
| you wish (freedom 1). Access to the source code is a
| precondition for this."_
|
| By that definition I don't think you can say this _model_
| is open source then.
|
| You could say the inference code is open source, and
| that's technically true (and the only thing the repo
| claims, TBH), but calling this an open source AI _model_
| is misleading.
|
| Open source inference code is outright uninteresting.
| It's a few hundred lines of glue code to run a giant
| binary blob which does the heavy lifting, i.e. the actual
| computing, i.e. the actual software, i.e. the software
| whose source we're actually interested in.
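|
| To make that concrete: a minimal sketch of what the open
| part boils down to (paraphrasing the repo's example.py
| from memory, so names and details may be off):
|
|     import json, torch
|     from llama import ModelArgs, Transformer, Tokenizer, LLaMA
|
|     # The open part: architecture and glue code, freely
|     # inspectable and modifiable.
|     tokenizer = Tokenizer(model_path="tokenizer.model")
|     params = json.load(open("params.json"))
|     model_args = ModelArgs(**params)
|     model_args.vocab_size = tokenizer.n_words
|     model = Transformer(model_args)
|
|     # The closed part: an opaque multi-gigabyte weights file
|     # you must be granted access to. Without it, the code
|     # above computes nothing of interest.
|     checkpoint = torch.load("consolidated.00.pth",
|                             map_location="cpu")
|     model.load_state_dict(checkpoint, strict=False)
|
|     generator = LLaMA(model, tokenizer)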
| svengrunner2049 wrote:
| Microsoft and OpenAI, Google losing $100bn from a bad demo,
| layoffs, cost cutting pressure to innovate again... I feel
| the days of full openness in AI research from corporations
| are over.
| alfalfasprout wrote:
| I wouldn't be remotely so quick to throw in the towel. ML
| research tends to operate in "jumps" and plateaus. At
| this point, the concepts behind the big LLMs are
| relatively well known and the bottleneck is cost of
| compute + cost of training data. Thing is, cost of
| compute keeps coming down.
|
| OpenAI's "win" wasn't even so much in the research but in
| the design of ChatGPT as an interface. Its own model
| makes the same kinds of egregious mistakes as Google's and
| FB's own LLMs. Also, OpenAI was willing to just deal with
| the ethical fallout of releasing it into the wild with
| the ability to generate authoritative sounding
| falsehoods.
|
| I suspect we're going to go back to a period soon where a
| lot of the innovation we're seeing is around interfaces
| and infra to make interacting with LLMs natural and
| applying them to product use cases where they make sense.
| simonh wrote:
| I really don't blame OpenAI for ethical issues over
| opening up access to ChatGPT. They're not claiming its
| responses are factually correct, and arguably by making
| it openly available they have done more than anyone else
| to raise awareness of the risks and limitations of LLMs.
| We need access to these things to make informed decisions
| of what are or are not appropriate uses.
|
| Microsoft and Google are a different story, they're
| specifically pushing these as authoritative sources of
| information. If we hadn't had access to ChatGPT and the
| ability to learn its ins and outs, it might have taken
| longer to expose so many of the flaws in the Microsoft and
| Google services.
| mochomocha wrote:
| I think we indeed hit "peak open-source" for AI and there
| unfortunately won't be as much sharing in the coming
| years. When the economy is down, people and companies
| think more in zero-sum terms. I hope to be proven wrong.
| yacine_ wrote:
| https://twitter.com/EMostaque/status/1629160706368053249
|
| :)
| rvz wrote:
| This is what everyone needs to watch. It was Stability.ai
| that spooked OpenAI by disrupting DALL-E 2 with an open-
| source AI model, Stable Diffusion. I am betting they are
| going to do it again for ChatGPT.
|
| The endgame for this AI race is obvious and when it comes
| to AI models, open-source ones always disrupt fully
| closed source AI companies.
|
| But first, we'll see which 'AI companies' will survive
| the lawsuits, regulations and fierce competition for
| funding.
| laluser wrote:
| There is hope: https://www.bloomberg.com/news/articles/2023-02-21/amazon-s-...
| somebodythere wrote:
| The golden age of open-source AI is ahead of us. Open-
| source AI companies are being launched and funded. High
| quality, large, labeled data sets have never been more
| accessible, and scaling law plateaus mean there is going
| to be a lot more momentum on data- and compute-
| optimization, meaning current SOTA models will start
| fitting on smaller and smaller hardware, down to
| commodity hardware.
| flangola7 wrote:
| Until legislation clamps it down.
| Karunamon wrote:
| At least in the United States, it's well established that
| code is protected by freedom of speech.
| https://www.eff.org/press/archives/2008/04/21-40
| A4ET8a8uTh0 wrote:
| This. We already had discussions of 'attacks' on models
| based on public data sets, so 'good sets' may soon become
| the thing to go after (and suddenly data brokers may
| really want to up the prices of their sets). We might
| actually see more privacy as a result, as the data brokers
| will start charging a premium for clean sets.
|
| Naturally, as predictions go, don't quote me on that. I
| was wrong before.
| generalizations wrote:
| Here's hoping someone with access puts it on bittorrent.
| 1letterunixname wrote:
| DLP, overall architecture, and many more aspects put the
| kibosh on that.
|
| The career risk isn't worth it especially when tech is
| deployed client- and network-side to detect just such
| exfil attempts. The average network, security, and
| client management staff tend to be PEs (SREs) who can
| code; some have PhDs, and they are the cream of what was
| previously organized as the "corporate IT" world. So I fail
| to see any incentive to throw away their career and
| reputation by giving away IP for $0.
|
| There's a metric s*ton of optimized hardware to generate
| models. And I have my doubts if Sama at OpenAI, even with
| 10 gigabucks from Microsoft, can sustain growth,
| organizational culture, and long-term investment at the
| scale others are bringing online with less trouble and
| more experience.
|
| The future interaction with AI models will most likely be
| through an API because the models themselves are becoming
| too large to fit even on the most extreme DIY NAS
| solutions.
|
| TL;DR: it's not happening.
| Hyption wrote:
| There are plenty of low-paid students, working students,
| and admins in between.
| generalizations wrote:
| > The future interaction with AI models will most likely
| be through an API because the models themselves are
| becoming too large to fit even on the most extreme DIY
| NAS solutions.
|
| And yet I spent last night running GPT-J-6B on my desktop
| CPU at 2 tokens / sec. People are finally starting to
| optimize these models, and there's a ton of optimization
| to go. We'll definitely be running these locally in the
| next few years. This model especially looks like an ideal
| candidate for CPU optimization, given the parity with
| GPT3, and that it's within spitting distance of the size
| of models like GPT-J-6B.
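|
| For reference, local CPU inference with GPT-J-6B needs
| little more than this (a minimal sketch using the Hugging
| Face transformers library; flags are illustrative, and
| expect roughly 25 GB of RAM in float32):
|
|     import torch
|     from transformers import AutoModelForCausalLM, AutoTokenizer
|
|     tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
|     model = AutoModelForCausalLM.from_pretrained(
|         "EleutherAI/gpt-j-6B",
|         torch_dtype=torch.float32,  # CPU kernels generally want fp32
|         low_cpu_mem_usage=True,     # avoid a 2x memory spike on load
|     )
|
|     inputs = tokenizer("The llama is", return_tensors="pt")
|     out = model.generate(**inputs, max_new_tokens=50, do_sample=True)
|     print(tokenizer.decode(out[0], skip_special_tokens=True))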
| imranq wrote:
| You can make small LLMs more performant than large ones
| like GPT-3 by fine-tuning on specific tasks or providing
| them tools to offload precise calculations:
|
| e.g. Toolformer: https://arxiv.org/abs/2302.04761
|
| which uses APIs and functions to improve GPT-J beyond
| GPT-3 for various tasks
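|
| The core trick is simple enough to sketch (a toy
| illustration of the idea, not the paper's actual training
| procedure): the model learns to emit an inline call such
| as [Calculator(12*19)], and a wrapper executes it and
| splices the result back into the text:
|
|     import re
|
|     def run_tools(text):
|         """Replace [Calculator(expr)] markers with their results."""
|         def evaluate(match):
|             # Toy arithmetic-only eval; a real system needs a
|             # safe expression parser.
|             return str(eval(match.group(1), {"__builtins__": {}}))
|         return re.sub(r"\[Calculator\(([^)]*)\)\]", evaluate, text)
|
|     generated = "12 packs of 19 cards is [Calculator(12*19)] cards."
|     print(run_tools(generated))
|     # -> "12 packs of 19 cards is 228 cards."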
| 1letterunixname wrote:
| How dare you bring knowledge in here! Some people at work
| would be offended if they couldn't buy their A100 or DGX
| toys, or had to scale back their custom ASIC and systems
| R&D.
| cma wrote:
| Once it is out there it can be rehosted on the clear net
| since model weights can't be copyrighted (unless they
| intentionally overtrain on some copyrighted snippets
| they own maybe).
| MongoTheMad wrote:
| The name made me think of this: https://youtu.be/KMYN4djSq7o
| mmahemoff wrote:
| It made me think of Winamp.
| dukoid wrote:
| Better than LMAA I guess O:)
| https://en.wikipedia.org/wiki/Swabian_salute
| RheingoldRiver wrote:
| ah, I expected that to be this:
| https://www.youtube.com/watch?v=kZUPCB9533Y
| thom wrote:
| Or this: https://www.youtube.com/watch?v=gTmEfoPnQDo
| a-dub wrote:
| the focus on non-proprietary training datasets and computational
| efficiency for the purposes of reducing costs and democratizing
| access is pretty cool!
|
| it is interesting how model performance seems to scale almost
| linearly with size for the sizes chosen. (or perhaps not, perhaps
| the researchers are choosing to focus on the part of the
| performance-size response curve that is linear).
| jejeyyy77 wrote:
| dumb question - but - can we actually use this right now? Like
| download the model and run it locally?
___________________________________________________________________
(page generated 2023-02-24 23:01 UTC)