[HN Gopher] LLaMA: A foundational, 65B-parameter large language ...
___________________________________________________________________
LLaMA: A foundational, 65B-parameter large language model
Author : mfiguiere
Score : 296 points
Date : 2023-02-24 16:08 UTC (6 hours ago)
(HTM) web link (ai.facebook.com)
(TXT) w3m dump (ai.facebook.com)
| freezed88 wrote:
| (creator of gpt index / llamaindex here
| https://github.com/jerryjliu/gpt_index)
|
| Funny that we had just rebranded our tool from GPT Index to
| LlamaIndex about a week ago to avoid potential trademark issues
| with OpenAI, and turns out Meta has similar ideas around
| LLM+llama puns :). Must mean the name is good though!
|
| Also very excited to try plugging the LLaMA model into
| LlamaIndex; will report the results.
| haolez wrote:
| gpt-index is awesome! What should I study to understand the
| inner workings a bit better?
| mark_l_watson wrote:
| I look forward to your comparison results.
|
| BTW, I have been heavily experimenting with both your
| LlamaIndex and also LangChain which you use in LlamaIndex. I am
| also writing a new book centered around both projects. Great
| stuff!!
| modeless wrote:
| How possible is it to run these models on a gaming GPU?
| coolspot wrote:
| Yes, if your gaming rig is an NVIDIA DGX-1 workstation.
|
| https://youtu.be/5TRr2oWeSw0
| iandanforth wrote:
| "we are publicly releasing LLaMA"
|
| "Access to the model will be granted on a case-by-case basis to
| academic researchers"
|
| They keep saying the word 'release' but I don't think they know
| what that word means. There are perfectly good words in the
| English language to describe this situation without abusing
| "release." They "will begin to grant access to a select few".
| Nothing about that releases the model, or their control, which
| they are not doing and shouldn't imply.
| modeless wrote:
| Someone needs to start a pirate bay for torrents of ML models.
| Call it ClosedAI, since in AI land these words mean the
| opposite of what they say.
| ImprobableTruth wrote:
| Unfortunately non-commercial and only available to academics upon
| request.
| elorant wrote:
| Until someone leaks it.
| brucethemoose2 wrote:
| I give it a week.
| flangola7 wrote:
| That's probably the best and safest way to do it
| yieldcrv wrote:
| darn, guess we'll have to wait until next week when some other
| surprise organization releases something extremely surprising
| and useful in this space
| weystrom wrote:
| I'd love to hear what Facebook defines as "open" and
| "democratic".
|
| I understand the unwillingness to attach your brand to whatever
| porn inevitably comes out of making it available for everybody,
| but maybe don't use such big words then.
| dentalperson wrote:
| How does token->embedding lookup work with 1.4T BPE tokens? Since
| there are more tokens than the 65B parameters it must be doing
| some sort of interesting thing based on the merge operations. Is
| it different from what other GPT models with ~100k tokens are
| doing?
|
| At inference, how many of those tokens are used? (They mention
| most tokens are used only once during training, so they must be
| very long sequences.)
| bitRAKE wrote:
| The 1.4T tokens are what the model was trained on, not the
| vocabulary size of the embedding table.
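|
| A minimal sketch of the distinction, assuming the ~32k BPE vocab
| and the 4096-dim embedding width of the 7B model (both taken
| from the paper; adjust if they differ):
|
|   # The embedding table is sized by the vocabulary, not by the
|   # number of training tokens.
|   vocab_size = 32_000       # unique token ids (assumption)
|   d_model = 4_096           # 7B model's embedding width (assumption)
|   training_tokens = 1.4e12  # tokens seen during training
|
|   embedding_params = vocab_size * d_model
|   print(f"embedding table: {embedding_params / 1e6:.0f}M params")
|   print(f"training tokens: {training_tokens / 1e12:.1f}T (token"
|         f" ids reused many times, never stored in the model)")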
| dentalperson wrote:
| Ah, that makes more sense, thank you. Since this was
| mentioned in the tokenizer section and the number of unique
| tokens wasn't mentioned, I misunderstood.
| zxienin wrote:
| Will it get 1M users using it by next week?
|
| "Our 65B model is better performant than ChatGPT's 175B" is
| missing the point openai made in Nov'22.
|
| That aside, this looks nice incremental research progress.
| apgwoz wrote:
| LLaMA is short for "Large Language Model Meta AI" -- I don't get
| how, but I guess the "fakeronym" was a brandable, recognizable
| word, and close enough that they went with it?
| muglug wrote:
| I was surprised to see this citation in the linked paper
|
| > Alan M Turing. 2009. Computing machinery and intelligence
|
| Did he invent a time machine too?
| harveywi wrote:
| It is undecidable to determine whether a manuscript has
| terminated, or whether it will continue being revised forever.
| igravious wrote:
| Absolutely a pet peeve of mine.
|
| What's wrong with (orig. pub. 1950, rev. ed. 2009) or something
| like that?
|
| With Zotero, the ref manager I use, you need to know the
| special magic incantation to store the original date of
| publication (the publication date refers to the edition you're
| citing). It just looks stupid (I think) to see Nietzsche F.
| 1999, Untimely Meditations (or whatever), and I'd also like to
| sort texts by original date of publication from oldest to
| newest because that's interesting. You have to put a special
| magic code in the _extra_ field, and then your tool-chain has
| to preserve it and transmogrify it correctly in the doc where
| you're making use of the citation.
| Mizza wrote:
| Looks like it was republished with additional commentary:
| https://link.springer.com/chapter/10.1007/978-1-4020-6710-5_...
| thedudeabides5 wrote:
| LLMs are already getting commoditized, and this is good
| swyx wrote:
| > To maintain integrity and prevent misuse, we are releasing our
| model under a noncommercial license focused on research use
| cases.
|
| If, say, I wanted to replicate this paper for commercial use,
| what would it take and how do I get started? Would FB have a
| basis for objection?
| minimaxir wrote:
| A metric-ton of GPU compute, for starters.
| chandureddyvari wrote:
| 82,432 A100-80GB GPU-hours for training their 7B (smallest)
| model, according to their paper.
| riku_iki wrote:
| 115 GPUs could train it within a month.
| causalmodels wrote:
| As a simplifying (and aggressive) assumption, I'm going to
| assume you know how to stand up and manage a distributed
| training cluster.
|
| You would need to replicate the preprocessing steps.
| Replicating these steps is going to be tricky, as they are not
| described in detail. Then you would need to implement the model
| using xformers [1], which is going to save you a lot of compute
| spend. You will need to manually implement the backward pass to
| reduce recomputation of expensive activations.
|
| The model was trained using 2048 A100 GPUs with 80 GB of VRAM
| each. A single 8x A100 machine from Lambda Cloud costs $12.00/hr
| [2]. The team from Meta used 256 such machines, giving a per-day
| cost of $73,728. It takes 21 days to train this model. The
| upfront lower-bound cost estimate of doing this is [(12.00 *
| 24) * 21 * 256 = ] $1,548,288, assuming everything goes
| smoothly and your model doesn't bite it during training. You
| may be able to negotiate bulk pricing for these types of
| workloads.
|
| That dollar value is just for the compute resources alone.
| Given the compute costs required you will probably also want a
| team composed of ML Ops engineers to monitor the training
| cluster and research scientists to help you with the
| preprocessing and model pipelines.
|
| [1] https://github.com/facebookresearch/xformers [2]
| https://lambdalabs.com/service/gpu-cloud
|
| edit: these costs are for the 65B parameter model.
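|
| For anyone who wants to tweak the assumptions, the same
| arithmetic as a tiny script (rates and durations are the ones
| quoted above, not official figures):
|
|   # Back-of-the-envelope training cost from the figures above:
|   # 256 8x-A100 nodes at Lambda's on-demand rate for 21 days.
|   price_per_node_hour = 12.00   # USD per 8x A100-80GB node
|   nodes = 2048 // 8             # 256 machines
|   days = 21
|
|   cost = price_per_node_hour * 24 * days * nodes
|   print(f"lower-bound compute cost: ${cost:,.0f}")  # $1,548,288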
| swyx wrote:
| thank you! love getting a peek into infra math. hopefully
| these costs will come down someday, or my resources will rise
| enough to train these in-house. can foresee foundation model
| training being a core competency of many tech platform
| companies in future.
| lovelearning wrote:
| I hope they make at least the smallest one available for
| everyone. I find it ironic that they want to prevent misuse, but
| think allowing access to those affiliated with government
| organizations will achieve that.
| [deleted]
| bioemerl wrote:
| It's not really open source which is something of a massive
| bummer, but still cool to see Facebook of all companies being the
| relative good guy.
| tevon wrote:
| How big is one of these models once it is trained and tuned?
| Could it be run on a single reasonably-large EC2? Or does it need
| any special architecture?
| rsrsrs86 wrote:
| Funny how the release has a note that the models clearly have
| ethical problems that must be addressed, and still the company
| chooses to publish them.
| 323 wrote:
| The old "not my problem how people use the gun, I'm just a
| manufacturer" defense.
| UncleEntity wrote:
| Ethical problems that they don't purge it of Wrong Think?
|
| Why people care that a dodgy AI can be made to say politically
| incorrect things is beyond me. Don't use its outputs for
| anything important and everything will be alright.
| rvz wrote:
| > To maintain integrity and prevent misuse, we are releasing our
| model under a noncommercial license focused on research use
| cases. Access to the model will be granted on a case-by-case
| basis to academic researchers; those affiliated with
| organizations in government, civil society, and academia; and
| industry research laboratories around the world. People
| interested in applying for access can find the link to the
| application in our research paper.
|
| The closest you are going to get to the source is here:
| https://github.com/facebookresearch/llama
|
| It is still unclear if you are even going to get access to the
| entire model as open source. Even if you did, you can't use it
| for your commercial product anyway.
| swatcoder wrote:
| > Even if you did, you can't use it for your commercial product
| anyway.
|
| And of course, the irony is that it's not commercial products
| that endanger "integrity" or enable "misuse".
|
| The looming misuse of LLMs is cheap content flooding by
| spammers, black hats, and propaganda bots -- and those people
| don't care about licenses, and will inevitably defeat any
| watermarks meant to prevent or track leaks.
| ninjin wrote:
| Facebook continues to appropriate the word "open" [1], which is
| sad really. There are plenty of good words they and others
| could use instead.
|
| [1]: https://news.ycombinator.com/item?id=32079558
| jerpint wrote:
| I just signed some form with a license to maybe, possibly, get
| access to the model... not very open indeed
| _a_a_a_ wrote:
| Newbie asks: what is a 'parameter' here?
| sp332 wrote:
| One neuron weight. A single element of a matrix or tensor.
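|
| Concretely, counting them for a toy PyTorch model (nothing to do
| with LLaMA's actual architecture; LLaMA-65B just has ~65 billion
| such numbers):
|
|   # A "parameter" is one learned number: an entry in a weight
|   # matrix or bias vector of the network.
|   import torch.nn as nn
|
|   model = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(),
|                         nn.Linear(2048, 512))
|   n_params = sum(p.numel() for p in model.parameters())
|   print(n_params)  # 512*2048 + 2048 + 2048*512 + 512 = 2,099,712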
| simonw wrote:
| The most interesting snippet in the paper I think is this:
|
| > For instance, LLaMA-13B outperforms GPT-3 on most benchmarks,
| despite being 10x smaller. We believe that this model will help
| democratize the access and study of LLMs, since it can be run on
| a single GPU.
| m3kw9 wrote:
| Does this show how inefficient GPT-3 is, or how easily their
| model can be disrupted? The way they will need to keep their
| business is to innovate faster with a usable commercial product.
| stevenhuang wrote:
| How about alignment/ability to answer prompt queries and chain
| of thought reasoning capabilities?
|
| Without the fine tuning RLHF phase to make it like instructgpt
| I'm assuming it won't be as good as ChatGPT, is that right?
|
| How hard would it be to fine tune the 65B model on commodity
| hardware?
|
| Found answer here:
|
| > Out-of-scope use cases LLaMA is a base, or foundational,
| model. As such, it should not be used on downstream
| applications without further risk evaluation and mitigation. In
| particular, our model has not been trained with human feedback,
| and can thus generate toxic or offensive content, incorrect
| information or generally unhelpful answers.
|
| https://github.com/facebookresearch/llama/blob/main/MODEL_CA...
| zozbot234 wrote:
| Yes, but fine-tuning with RL is not expected to be hard. You're
| essentially limited by how much human feedback is available,
| so it's very different from training the foundational model
| on random bulk data.
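|
| A minimal sketch of the data-limited step (reward-model training
| on pairwise human preferences, as in InstructGPT); the toy
| linear scorer here stands in for a real LM head:
|
|   # Pairwise ranking loss -log(sigmoid(r_chosen - r_rejected)).
|   import torch
|   import torch.nn.functional as F
|
|   reward_head = torch.nn.Linear(768, 1)   # toy scorer
|   opt = torch.optim.Adam(reward_head.parameters(), lr=1e-4)
|
|   chosen = torch.randn(16, 768)    # features of preferred answers
|   rejected = torch.randn(16, 768)  # features of rejected answers
|
|   r_chosen = reward_head(chosen).squeeze(-1)
|   r_rejected = reward_head(rejected).squeeze(-1)
|   loss = -F.logsigmoid(r_chosen - r_rejected).mean()
|   loss.backward()
|   opt.step()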
| simonw wrote:
| At this point I fully expect that someone will release a
| usable RLHF-fine-tuned language model that can run on
| consumer hardware, based on the methodology used for LLaMA
| (and other similar papers e.g.
| https://github.com/FMInference/FlexGen ), at some point in
| the next 6-24 months.
| monkeydust wrote:
| What are the functional steps that need to happen to get
| to that point?
| stevenhuang wrote:
| Of course, but even so the level of scale, clean data, and
| human supervision needed may be significant. It's reported
| that OpenAI used an army of humans to generate question-answer
| prompts and rate the model's output.
|
| They kept the details closely guarded and only hinted at how
| they did RLHF and transitioned the architecture to
| self-supervised learning.
| pps wrote:
| I'm stupid in this area and have only a very broad
| understanding of "AI". But isn't this already happening
| in this project? https://github.com/LAION-AI/Open-
| Assistant
| adeon wrote:
| Has anyone had success being approved to download weights even if
| you are just a hobbyist with a big GPU? (I'm here about to
| begrudgingly fill out the Google Form) Asking in general, not
| just for this particular model.
|
| I've been following these language models and so far went
| through approval once for an English/Chinese GLM-130B model; it
| took only ~30 minutes to get approved even though I am a
| complete nobody.
| amoss wrote:
| Given the limited release of access under a license, is there
| any decision on whether a model like this would be protected by
| copyright if it leaked?
|
| The output of an algorithm is not a human-authored creative
| work, which, given the recent decision in the book case, would
| seem to weigh against it.
| rafaelero wrote:
| Wow. That's nuts. An open-source model as competitive as the
| 540B PaLM? Sign me up. Wish they made it easy to access,
| though. Not sure I would be able to get my hands on it solely
| as an interested individual.
| cypress66 wrote:
| There's nothing even close to open source here.
| milesward wrote:
| Right, but does it whip the LLaMA's ass?
| layer8 wrote:
| You mean, will their competitors whip the LLaMA's ass. ;)
| blablablub wrote:
| So what hardware do we need to run this model?
| CuriouslyC wrote:
| 7 billion can run on 16+ GB GPUs at fp16; 14 billion can be run
| on 16+ GB if quantized to int8. 14B at fp16 and 30B at int8
| will require one of the 48 GB cards (less, but hardware mostly
| goes 24 -> 48).
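|
| A rough rule of thumb behind numbers like these, sketched in
| Python using LLaMA's published sizes (the 2 bytes/weight at
| fp16, 1 byte at int8, and 20% overhead factor are assumptions):
|
|   # Rough VRAM estimate for inference: bytes per weight times
|   # parameter count, plus ~20% for activations / KV cache.
|   def vram_gb(params_billion, bytes_per_weight, overhead=1.2):
|       return params_billion * bytes_per_weight * overhead
|
|   for n in (7, 13, 33, 65):
|       print(f"{n}B: fp16 ~{vram_gb(n, 2):.0f} GB, "
|             f"int8 ~{vram_gb(n, 1):.0f} GB")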
| brucethemoose2 wrote:
| Requirements could be reduced with something like DeepSpeed
| or ColossalAI (or even just simple hacks to move bits to RAM
| more aggressively)
| blablablub wrote:
| thanks
| qup wrote:
| What kind of machine does it take to run the model?
| nafizh wrote:
| >'We release all our models to the research community'
|
| Where "release" means you fill out a form and wait indefinitely.
| Also no commercial use, which rules out 95% of users; it
| certainly doesn't democratize LLMs.
| version_five wrote:
| I think these "open" models are the lamest release strategy. I
| get the idea of being closed a la OpenAI to protect a business
| model, and I understand open sharing in the community
|
| "We'll let you see it if we approve you" is just being a self-
| important pompous middle man.
| mliker wrote:
| > doesn't democratize LLMs
|
| open source always leads to more democratization in the long
| run
| circuit10 wrote:
| This isn't open source, at least by the usual definition that
| you can freely use it for any purpose and redistribute it.
| bioemerl wrote:
| Yeah. I was hoping to be able to use the model in KoboldAI,
| but no dice.
| minimaxir wrote:
| This blog post is terrible at listing the improvements offered
| by the model; the abstract is better:
|
| > We introduce LLaMA, a collection of foundation language models
| ranging from 7B to 65B parameters. We train our models on
| trillions of tokens, and show that it is possible to train state-
| of-the-art models using publicly available datasets exclusively,
| without resorting to proprietary and inaccessible datasets. In
| particular, LLaMA-13B outperforms GPT-3 (175B) on most
| benchmarks, and LLaMA-65B is competitive with the best models,
| Chinchilla-70B and PaLM-540B. We release all our models to the
| research community.
| mjburgess wrote:
| > We release all our models to the research community.
|
| This is yet more evidence for the "AI isn't a competitive
| advantage" thesis. The state of the art is a public resource, so
| competing _with_ AI offers no "moat".
| lr1970 wrote:
| > We release all our models to the research community.
|
| And from the FB blogpost [0] "Request Form Thank you for your
| interest in Meta AI's LLaMA (Large Language Model Meta AI)
| models. To request access to the models, please fill out this
| form, and we'll review and let you know if your use case is
| approved. The information you provide below will be used
| solely to assess eligibility to access these models."
|
| So much for "releasing" the model for research community.
|
| [0] https://ai.facebook.com/blog/large-language-model-llama-
| meta...
| jstx1 wrote:
| The research is open, but you still need
|
| 1) to have a good idea for a product that people want
|
| 2) lots of talent and resources to actually build and run
| everything
| ilaksh wrote:
| Actually no, it's not available for commercial purposes or
| for anyone they do not approve.
| mjburgess wrote:
| And then facebook will release a version which beats it ?
| :)
| minimaxir wrote:
| Business orgs have been finetuning open-source models like
| these on their own internal data to create a moat since BERT
| in 2018.
| Der_Einzige wrote:
| Glad to see you still on HN! You've done amazing work in
| this domain!
|
| I'd argue that this goes further back to the word2vec/glove
| days too. I was working for a company in 2018 who leveraged
| my skills for fine-tuning word2vec/fasttext even before
| BERT/attention is all you need paper.
| cardine wrote:
| And yet there are still no publicly available models that
| could actually compete with ChatGPT.
|
| I'm not even talking about RLHF (although data like that is
| also a huge moat) - just simple things like larger context
| sizes.
|
| There are still plenty of AI advantages to be had if you go
| just a little bit outside of what is currently possible with
| off the shelf models.
| layer8 wrote:
| They are only releasing it on a case-by-case basis and with a
| non-commercial license.
| fwlr wrote:
| In terms of medieval warfare, what Facebook is doing here
| looks like filling the moat with rocks and dirt. OpenAI is
| worth billions, Microsoft is spending billions to retrofit
| most of their big offerings with AI, Google is doubtless also
| spending billions to integrate AI in their products. Their
| moat in all cases is "a blob of X billion weights trained on
| Y trillion tokens". Facebook here is spending mere _millions_
| to make and release competitive models, effectively filling
| in the moat by giving it to everyone.
| mgraczyk wrote:
| My guess is that the primary strategic motivation here is
| recruiting and maintaining a competent org of people who
| know how to build large models. I don't think that the
| actual release is as calculated as all that. It's more
| about proving to themselves and the world that they can
| train big models.
| skybrian wrote:
| Not everyone. You have to apply for access.
| twoodfin wrote:
| Isn't this a classic "commoditize your complements" play?
| Facebook's value is in their social graph and users'
| attachment to it via various apps. TikTok and others are
| not trying to replicate that; they're creating similar
| volumes of user attachment via models. You can easily
| imagine LLMs being applied by competitors in a comparable
| way. If models become commodities, then Facebook continues
| to hold the one advantage nobody else has.
| [deleted]
| recuter wrote:
| I don't see such grand stratagems as being a likely
| explanation. It seems more likely that a bunch of dorks are
| running around unsupervised, trying their best to make
| lemonade with whatever budgets they are given, as nobody can
| realistically manage AI researchers. Or at least, it seems
| that much like in political governance, events are suddenly
| outpacing the ability of corporate to react.
|
| In the case of OpenAI it is a "nudge humanity into a more
| wholesome direction" because a lot of them went on an acid
| fueled toxic affective altruism bender. And in this case it
| is "release all the things while Zuck is still obsessed
| with VR --for science!".
|
| I like this other group better. But it is disturbing that
| it can stop at any moment. Probably why they are doing it,
| while they still can.
| vineyardmike wrote:
| I think there is definitely a component of competition
| here.
|
| Facebook has no real way to monetize this (e.g. they won't
| release an API a la OpenAI and they don't own a search
| engine). Since they can't monetize it... why not provide
| a bit of kindling to lower the barrier for everyone else
| to compete with your competitors. This strategy is called
| "commoditize your complement".
|
| If Facebook makes it easier to develop a Google
| alternative, especially by doing something that doesn't
| hurt them, then they just weakened a competitor. See
| Facebook releasing datasets for mapping. Think of the
| panic ChatGPT caused Google. It only cost a few million
| to train but it's probably costing Google more than that
| already.
| naasking wrote:
| Facebook maybe can't make money from it, but they
| arguably could _save_ money from it, for instance in
| automating some fact checking and moderation activities
| that they currently spend quite a bit of money on.
| webmaven wrote:
| _> Facebook has no real way to monetize this (e.g. they
| won't release an API a la OpenAI and they don't own a
| search engine). Since they can't monetize it... why not
| provide a bit of kindling to lower the barrier for
| everyone else to compete with your competitors. This
| strategy is called "commoditize your complement"._
|
| Your analysis is good, but that isn't what "commoditize
| your complement" means.
|
| Strictly speaking, a search engine isn't a complement to
| FB's revenue streams. Relatively little of FB's revenue can
| be attributed to search engine traffic leading to FB's
| walled garden where they can show the user ads.
|
| Complements are generally a required product or service
| that _enables_ your core, but that isn't revenue-generating
| for you. Examples for FB are data centers (so
| they participate in the Open Compute Project[0]), and
| mobile operating systems (which Google already has made a
| commodity, for their own reasons, with Android).
|
| What FB is doing here is commoditizing their competitors'
| _core_ offering (or rather, a promising future one). That's
| just the tactic though; there are several strategies this
| can enable, from undermining the barriers
| to entry into their competitors' search markets, to
| actually fragmenting that market by encouraging a
| diversity of specialized chat interfaces over one big
| chat model. You can see hints of both in this
| announcement.
|
| Final note: FB is also protecting itself from relying on
| a competitor as a supplier should chat become a preferred
| user interface for the content on social networks. It
| hasn't, but if it ever did, this would count as
| "commoditizing their complement". Even then I would
| actually expect FB to switch to a mostly proprietary
| approach in that circumstance (so not much openness on
| having LLMs operate on social graphs and the like),
| keeping open only the foundation they rely on, which
| undermines and prevents gatekeeping by their advertising
| competitors.
|
| [0] https://www.opencompute.org/
| invig wrote:
| I don't think that's the best interpretation of
| complement. Think of a complement as "something whose
| existence benefits your users' experience". I think that
| search qualifies as something that enhances the Facebook
| user's experience (e.g. my cousin mentioned a thing they
| like doing today; how do I find out more about that
| thing?).
| titzer wrote:
| They almost certainly spent at least a few million
| dollars on this research project. Hard to say when the
| decision was made to open source this (from the outset or
| after it started showing results), but the decision was
| conscious and calculated. Nothing this high-profile is
| going to escape the strategic decision making processes
| of upper management.
| visarga wrote:
| > It seems more likely that a bunch of dorks are running
| around unsupervised trying their best to make lemonade
| with whatever budgets they are given
|
| Probably not, Zuck is announcing it.
| recuter wrote:
| Today we're releasing a new state-of-the-art AI large
| language model called LLaMA designed to help researchers
| advance their work. LLMs have shown a lot of promise in
| generating text, having conversations, summarizing
| written material, and more complicated tasks like solving
| math theorems or predicting protein structures. Meta is
| committed to this open model of research and we'll make
| our new model available to the AI research community.
|
| I don't know what that means or if he even wrote/read it
| tbh. I hope it literally just means Meta is actually
| committed to this open model of research (for now).
|
| Maybe he is being a Machiavellian moat filler; if so, I
| stand corrected. I think/hope that they don't really have
| a plan to counter OpenAI yet, because I am afraid this
| attitude won't last once they do, and this stuff has
| recently started moving quickly.
| ahati wrote:
| It is to please the shareholders. Now shareholders can
| know that Meta can compete with GPT-3.
| ttul wrote:
| 100%. This kind of announcement is for the street.
| Investors need to be reassured that Meta is keeping up
| with the cool AI stuff over at Google and Microsoft. A
| release like this will cause a flurry of downstream news
| coverage. I'm sure they are hoping that the model will be
| picked up by researchers doing interesting things who
| might further generate good coverage.
|
| And, yes, it fills the moat.
| mliker wrote:
| this is a pretty biased and uninformed opinion. Pretty
| condescending to call it a "bunch of dorks...running
| around unsupervised."
| recuter wrote:
| I don't know what my bias is supposed to be but I called
| them dorks affectionately for one. The other replies are
| literally arguing that they are _very extremely_
| supervised, whereas I am speculating they are just eager
| to share their work for the right reasons and the eye of
| Sauron has yet to turn upon them.
|
| Inside knowledge I never claimed. Anything else I can
| help you with today? :)
| rcpt wrote:
| > I called them dorks affectionately
|
| Never in my life have I seen that word used
| affectionately
| vidarh wrote:
| I have seen that many times, as well as seeing people use
| it about themselves.
| sebastiennight wrote:
| I think Noah Kagan's blog is literally called "OK Dork",
| and I always assumed the title was self-deprecating (in a
| fun/positive way) rather than negative.
| tantalor wrote:
| Check out
| https://news.ycombinator.com/newsguidelines.html
|
| Not sure how calling out comment as "biased",
| "uninformed" or "condescending" is helping.
| fwlr wrote:
| The researchers and engineers and other assorted dorks
| who built it weren't thinking about moats, for sure, I
| agree with you there. But I guarantee you that the
| metaphor of medieval warfare was on the minds of the
| executives and legal team deciding whether to let the
| team release their work in this way.
| Veen wrote:
| Commoditize your complement.
|
| https://gwern.net/complement
| PoignardAzur wrote:
| Which kind of suggests Microsoft made a really bad move
| antagonizing the open-source community with GitHub Copilot.
|
| They got a few years of lead time in the "AI codes for you"
| market, but in exchange permanently soured a significant
| fraction of their potential userbase who will turn to open-
| source alternatives soon anyway.
|
| I wonder if they'd have been better served focusing on
| selling Azure usage and releasing Copilot as an open-source
| product.
| freeqaz wrote:
| How did Microsoft sour developers with Copilot? I know
| dozens of people that pay for it (including myself) and I
| feel like it is widely regarded as a "no brainer" for the
| price that it's offered at.
|
| Please help me understand!
| sandkoan wrote:
| Presumably because they trained Copilot on billions of
| lines of often-licensed code (without permission), code
| that Copilot has a tendency to regurgitate verbatim,
| without said license.
| ldhough wrote:
| For a specific example some variation of "fast inverse
| square root" will usually get you the exact GPL licensed
| code from Quake III, comments included.
| theRealMe wrote:
| Do you mean the same code that has its own Wikipedia page
| where the exact code is written, comments included, and
| has probably been copy-pasted into hundreds of other
| projects?
|
| https://en.m.wikipedia.org/wiki/Fast_inverse_square_root
| robertlagrant wrote:
| The point is that it's a paid product that was trained on
| open source code. What you're saying agrees with that,
| but your triumphant tone seems to be implying the
| opposite. Which did you mean?
| alar44 wrote:
| [dead]
| theRealMe wrote:
| They didn't. There is a small group of people that are
| always looking for the latest reason to be outraged and
| to point at any of the big tech companies and go "aha!
| They are evil!" Copilot's ai was trained on GitHub
| projects and so these people are taking their turns
| clutching their pearls inside of their little bubble.
|
| I'd bet that more than 95% of devs haven't even heard of
| this "controversy" and even if they did, wouldn't care.
| layer8 wrote:
| Controversy about unlicensed use of source code as
| training data and lack of attributions in the generated
| output.
| Mizza wrote:
| The company that tried to kill Linux in the 90s, owned by
| the world's most famously rich man, is now stealing my
| code and selling it back to me? Yeah, fuck that.
| visarga wrote:
| It's not selling you back your code. It's different code,
| adapted to a different task; your own code is forever
| free for you, you don't need anyone to give it to you.
|
| Given the cost of running these models, and the utmost
| dedication needed to train them, I think it is worth it.
| GPUs cost money, electricity costs money. They can't
| serve the world for free and offer good latency.
| mbb70 wrote:
| I mean, that's like saying an author steals the open
| source alphabet and charges you for reading their
| ordering of letters, as if the ordering of letters isn't
| where all the value is.
| robertlagrant wrote:
| These models are trained on sequences of words, not told
| the letters and left to get on with it.
| KarlKemp wrote:
| It continues to amaze how people are incapable of
| following even the most trivial cases of abstract
| reasoning.
| Vvector wrote:
| Microsoft makes dozens of "really bad moves" every year.
| This is nothing.
| miketery wrote:
| Can this be downloaded like stable diffusion? How big are these
| models? How long does it take to run a "query" against them on a
| decent gaming PC?
| bioemerl wrote:
| This can't, but there's some software, KoboldAI, which lets you
| download and run other LLMs.
| flatiron wrote:
| You have to contact Facebook for a copy, so it's not as open as
| Stable Diffusion.
| brucethemoose2 wrote:
| No one seems to be running it locally at the moment.
| monkeydust wrote:
| What, no implementation example?
| naveen99 wrote:
| Why not a 1-billion-parameter model that can run on a single
| GPU?
|
| Worse-is-better will come to LLMs.
| carlsborg wrote:
| "The code is licensed under the GPLv3, which permits commercial
| use." Yann on Twitter
| return_to_monke wrote:
| The code. They didn't release a model, though, so unless you
| have a lot of money or get granted access to it, it's not for
| you.
| saurabh20n wrote:
| Quick notes from first glance at paper
| https://research.facebook.com/publications/llama-open-and-ef...:
|
| * All variants were trained on 1T - 1.4T tokens, which is good
| relative to their sizes by the Chinchilla metric. Code is
| 4.5% of the training data (similar to others). [Table 2]
|
| * They note the GPU-hours as 82,432 (7B model) to 1,022,362 (65B
| model). [Table 15] GPU-hour rates will vary, but let's give a
| range of $1 to $4. The 7B model would have cost ~$82-329k and
| the 65B something in the range of ~$1-4M (a quick script for
| these figures follows these notes). They also note their total
| time spent for all models: "we used 2048 A100-80GB for a period
| of approximately 5 months" [sec 6, pg 10]
|
| * The 65B model's performance is broadly comparable to
| PaLM-540B. No small feat, but it could also indicate the
| benefits of good model-vs-token size ratios [Tables 3,4,5,6].
| Their conjecture for underperforming on MMLU (multitask
| language understanding) compared to PaLM-540B and
| Chinchilla-70B is a smaller fraction of books and academic
| data in training.
|
| * Math and code tasks: on math tasks they are substantially
| worse than Minerva (comparing their 65B to Minerva-62B; they
| lose hands down against Minerva-540B) [Table 7]. On code tasks
| they are broadly competitive with PaLM-540B (HumanEval and
| MBPP evals) [Table 8]
|
| * Surprising that instruction fine-tuning takes up such a small
| part of the paper (sec 4, pg. 7)
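|
| As promised above, a quick sanity check on the cost range and
| the Chinchilla ratio (GPU-hours from Table 15; the $1-$4/hr
| range and the ~20-tokens-per-parameter heuristic are
| assumptions):
|
|   # Cost range from reported GPU-hours, plus the Chinchilla
|   # rule of thumb of roughly 20 training tokens per parameter.
|   for name, gpu_hours, params_b, tokens_t in [
|       ("7B", 82_432, 7, 1.0),
|       ("65B", 1_022_362, 65, 1.4),
|   ]:
|       lo, hi = gpu_hours * 1, gpu_hours * 4
|       chinchilla_t = params_b * 20 / 1000   # optimal tokens, in T
|       print(f"{name}: ${lo:,.0f}-${hi:,.0f}; Chinchilla-optimal"
|             f" ~{chinchilla_t:.2f}T vs {tokens_t}T trained")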
| zozbot234 wrote:
| https://github.com/facebookresearch/llama/blob/main/MODEL_CA...
| (linked in OP) has basic information about this model.
| make3 wrote:
| do they do instruction fine-tuning
| sandGorgon wrote:
| >* The 65B model's performance is broadly comparable to
| PaLM-540B. No small feat, but it could also indicate the
| benefits of good model-vs-token size ratios [Tables 3,4,5,6].
| Their conjecture for underperforming on MMLU (multitask
| language understanding) compared to PaLM-540B and Chinchilla-70B
| is a smaller fraction of books and academic data in training.*
|
| What do you mean by this? The OpenAI papers talk roughly about
| model performance scaling with parameters. Does this show the
| opposite?
| vishal0123 wrote:
| The scaling law is for training till convergence. Both PaLM and
| this model have been undertrained. See the training loss plot
| in the paper.
| machinekob wrote:
| I hate when people don't include an approximation of the
| training done before the final hyperparameters are found, as
| that's usually the most costly part of the whole process.
|
| It's just "yes, we trained it for so long" etc., but they never
| speak about the tens or even hundreds of runs before they
| finalize the model parameters and architecture -.-
| 323 wrote:
| > _we used 2048 A100-80GB for a period of approximately 5
| months_
|
| Do we know how much total energy a human consumes from birth to
| 20 yo? Something like 2000 calories integrated over 20 years.
| How does it compare to the GPUs above?
|
| Wolfram Alpha:
|
| - human - 17 MW/h ((2000 calories per day) over 20 years in
| MWh)
|
| - GPUs - 3000 MW/h ((2048 * 400) W over 5 months in MWh)
|
| We still have the edge.
|
| LOL, I'm being downvoted, I wonder why. Some don't like the
| question.
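|
| For anyone checking the arithmetic, the same estimate in a few
| lines (1 kcal = 4184 J, 1 MWh = 3.6e9 J; the ~400 W per-GPU
| draw is an assumption):
|
|   # Back-of-the-envelope energy comparison, figures as above.
|   human_j = 2000 * 4184 * 365 * 20          # 2000 kcal/day, 20 yrs
|   gpu_j = 2048 * 400 * 3600 * 24 * 30 * 5   # 2048 GPUs, 5 months
|   print(f"human: {human_j / 3.6e9:.0f} MWh")   # ~17 MWh
|   print(f"GPUs:  {gpu_j / 3.6e9:.0f} MWh")     # ~2950 MWh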
| isoprophlex wrote:
| You mean MWh maybe, not MW/h? (which is what, J/s^2 in SI...
| "Power rate".)
| 323 wrote:
| Right, I used the correct MWh in Wolfram, but for some
| reason wrote MW/h, I think it was written like that a long
| time ago on electricity bills.
| robbiep wrote:
| It's because your human math for power output is so far off
| it's hard to know where to start to point you in the right
| direction
| 323 wrote:
| Please do tell. Or better, provide your estimate. I just
| took raw calorie intake, no
| heating/transportation/lighting/computer usage/...
| zhynn wrote:
| You have to include our evolutionary history too. A
| considerable amount of our sophisticated behavior doesn't
| require specific training, as it is encoded in our genetic
| and epigenetic systems. We aren't starting from zero.
| melling wrote:
| Every human requires the same energy, 20+ years, and
| training.
|
| The trained computer model can be duplicated and used,
| requiring much less energy.
|
| None of this matters to me, though.
|
| The goal is to build better models. We can worry about the
| efficiency later.
| Dylan16807 wrote:
| > We still have the edge.
|
| Depends on what you're doing. A human is much smarter than
| one of these models, but the model has approximate knowledge
| of orders of magnitude more things. And the energy costs per
| word of output are a lot closer.
| SethTro wrote:
| (1,022,362 + 82,432) GPU-hours / 2048 GPUs / 5 months ~= 15%
| uptime.
|
| That's only 0.08 nines of availability!
|
| I remember in one of their old guidebooks a lot of struggle to
| keep their 64-machine (512-GPU) cluster running; this was
| probably 4x the machines and 4x the number of cluster dropouts.
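|
| The same utilization estimate spelled out (only the 7B and 65B
| runs are counted, as above, so the true figure is somewhat
| higher; the "nines" come out around 0.07-0.08 depending on
| rounding):
|
|   # Utilization implied by the reported GPU-hours vs. running
|   # the full 2048-GPU fleet for ~5 months.
|   import math
|   gpu_hours = 1_022_362 + 82_432
|   fleet_hours = 2048 * 24 * 30 * 5
|   uptime = gpu_hours / fleet_hours
|   print(f"{uptime:.0%} utilization, "
|         f"{-math.log10(1 - uptime):.2f} 'nines'")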
___________________________________________________________________
(page generated 2023-02-24 23:01 UTC)