[HN Gopher] LLaMA: A foundational, 65B-parameter large language ...
       ___________________________________________________________________
        
       LLaMA: A foundational, 65B-parameter large language model
        
       Author : mfiguiere
       Score  : 296 points
       Date   : 2023-02-24 16:08 UTC (6 hours ago)
        
 (HTM) web link (ai.facebook.com)
 (TXT) w3m dump (ai.facebook.com)
        
       | freezed88 wrote:
       | (creator of gpt index / llamaindex here
       | https://github.com/jerryjliu/gpt_index)
       | 
       | Funny that we had just rebranded our tool from GPT Index to
       | LlamaIndex about a week ago to avoid potential trademark issues
       | with OpenAI, and turns out Meta has similar ideas around
       | LLM+llama puns :). Must mean the name is good though!
       | 
        | Also very excited to try plugging the LLaMA model into
        | LlamaIndex; will report the results.
        
         | haolez wrote:
         | gpt-index is awesome! What should I study to understand the
         | inner workings a bit better?
        
         | mark_l_watson wrote:
         | I look forward to your comparison results.
         | 
         | BTW, I have been heavily experimenting with both your
         | LlamaIndex and also LangChain which you use in LlamaIndex. I am
         | also writing a new book centered around both projects. Great
         | stuff!!
        
       | modeless wrote:
       | How possible is it to run these models on a gaming GPU?
        
         | coolspot wrote:
          | Yes, if your gaming rig is an NVidia DGX-1 workstation.
         | 
         | https://youtu.be/5TRr2oWeSw0
        
       | iandanforth wrote:
       | "we are publicly releasing LLaMA"
       | 
       | "Access to the model will be granted on a case-by-case basis to
       | academic researchers"
       | 
       | They keep saying the word 'release' but I don't think they know
       | what that word means. There are perfectly good words in the
       | English language to describe this situation without abusing
        | "release." They "will begin to grant access to a select
        | few." Nothing about that releases the model, or their
        | control over it, and they shouldn't imply otherwise.
        
         | modeless wrote:
         | Someone needs to start a pirate bay for torrents of ML models.
         | Call it ClosedAI, since in AI land these words mean the
         | opposite of what they say.
        
       | ImprobableTruth wrote:
       | Unfortunately non-commercial and only available to academics upon
       | request.
        
         | elorant wrote:
         | Until someone leaks it.
        
           | brucethemoose2 wrote:
           | I give it a week.
        
         | flangola7 wrote:
         | That's probably the best and safest way to do it
        
         | yieldcrv wrote:
         | darn, guess we'll have to wait until next week when some other
         | surprise organization releases something extremely surprising
         | and useful in this space
        
       | weystrom wrote:
       | I'd love to hear what Facebook defines as "open" and
       | "democratic".
       | 
       | I understand the unwillingness to attach your brand to whatever
       | porn that inevitably comes out of making it available for
       | everybody, but maybe don't use such big words then.
        
       | dentalperson wrote:
       | How does token->embedding lookup work with 1.4T BPE tokens? Since
       | there are more tokens than the 65B parameters it must be doing
       | some sort of interesting thing based on the merge operations. Is
       | it different from what other GPT models with ~100k tokens are
       | doing?
       | 
        | At inference, how many of those tokens are used? (They
        | mention most tokens are used only once during training, so
        | they must be very long sequences.)
        
         | bitRAKE wrote:
         | The 1.4T tokens are what the model was trained on, and not the
         | token range of the embedding.
        
           | dentalperson wrote:
           | Ah, that makes more sense, thank you. Since this was
           | mentioned in the tokenizer section and the number of unique
           | tokens wasn't mentioned I misunderstood.
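The distinction can be sketched in a few lines. Sizes below are assumptions for illustration: the paper describes a SentencePiece BPE tokenizer, and LLaMA-class vocabularies are on the order of 32k entries, with the 65B model's hidden size around 8192.

```python
# The embedding table scales with the vocabulary, not with the number
# of training tokens (illustrative sizes, per the assumptions above).
vocab_size = 32_000      # distinct token ids in the tokenizer
d_model = 8192           # assumed hidden size of the 65B model
embedding_params = vocab_size * d_model
print(f"embedding table: {embedding_params:,} params "
      f"({embedding_params / 65e9:.4%} of 65B)")

# The 1.4T figure counts tokens *seen during training*, i.e. how many
# lookups into this fixed table were performed, not its size.
training_tokens = 1.4e12
print(f"training tokens per vocab entry: {training_tokens / vocab_size:,.0f}")
```

So the embedding table is a tiny fraction of the parameter count, and inference only ever indexes into those ~32k rows.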
        
       | zxienin wrote:
       | Will it get 1M users using it by next week?
       | 
        | "Our 65B model performs better than ChatGPT's 175B" misses
        | the point OpenAI made in Nov '22.
        | 
        | That aside, this looks like nice incremental research
        | progress.
        
       | apgwoz wrote:
       | LLaMA is short for "Large Language Model Meta AI" -- I don't get
       | how, but I guess the "fakeronym" was a brandable, recognizable
       | word, and close enough that they went with it?
        
       | muglug wrote:
       | I was surprised to see this citation in the linked paper
       | 
       | > Alan M Turing. 2009. Computing machinery and intelligence
       | 
       | Did he invent a time machine too?
        
         | harveywi wrote:
         | It is undecidable to determine whether a manuscript has
         | terminated, or whether it will continue being revised forever.
        
         | igravious wrote:
         | Absolutely a pet peeve of mine.
         | 
          | What's wrong with (orig. pub. 1950, rev. ed. 2009) or
          | something like that?
          | 
          | With Zotero, the ref manager I use, you need to know the
          | special magic incantation to store the original date of
          | publication (the publishing date refers to the edition
          | you're citing). It just looks stupid (I think) to see
          | Nietzsche F. 1999, Untimely Meditations (or whatever), and
          | also I'd like to sort texts by original date of
          | publication from oldest to newest, because that's
          | interesting. You have to put a special magic code in the
          | _extra_ field, and then your tool-chain has to preserve it
          | and transmogrify it correctly in the doc where you're
          | making use of the citation.
        
         | Mizza wrote:
         | Looks like it was republished with additional commentary:
         | https://link.springer.com/chapter/10.1007/978-1-4020-6710-5_...
        
       | thedudeabides5 wrote:
       | LLMs are already getting commoditized, and this is good
        
       | swyx wrote:
       | > To maintain integrity and prevent misuse, we are releasing our
       | model under a noncommercial license focused on research use
       | cases.
       | 
       | if say i wanted to replicate this paper for commercial use, what
       | would it take and how do i get started? would FB have a basis for
       | objection?
        
         | minimaxir wrote:
         | A metric-ton of GPU compute, for starters.
        
         | chandureddyvari wrote:
          | 82,342 A100-80GB GPU hours for training their 7B
          | (smallest) model, according to their paper.
        
           | riku_iki wrote:
            | 115 GPUs could train it within a month.
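A quick sanity check on that claim, using the ~82,342 A100-hour figure from the parent comment (assumes a 30-day month, round-the-clock utilization, and perfect scaling across GPUs):

```python
# 7B training budget spread over 115 GPUs (figures from the thread).
gpu_hours = 82_342          # A100-80GB hours for the 7B model
n_gpus = 115
hours_per_month = 24 * 30   # 720
months = gpu_hours / (n_gpus * hours_per_month)
print(f"~{months:.2f} months on {n_gpus} GPUs")  # just under one month
```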
        
         | causalmodels wrote:
          | I'm going to assume you know how to stand up and manage a
          | distributed training cluster. Note this is an aggressive,
          | simplifying assumption.
          | 
          | You would need to replicate the preprocessing steps.
          | Replicating these steps is going to be tricky, as they are
          | not described in detail. Then you would need to implement
          | the model using xformers [1]. Using xformers is going to
          | save you a lot of compute spend. You will need to manually
          | implement the backwards pass to reduce recomputation of
          | expensive activations.
         | 
         | The model was trained using 2048 A100 GPUs with 80GBs of VRAM.
         | A single 8 A100 GPU machine from Lambda Cloud costs $12.00/hr
         | [2]. The team from meta used 256 such machines giving you a per
          | day cost of $73,728. It takes 21 days to train this model.
          | The upfront lower-bound cost estimate of doing this is
          | [(12.00 * 24) * 21 * 256 = ] $1,548,288, assuming
          | everything goes smoothly and your model doesn't bite it
          | during training. You may be able to negotiate bulk pricing
          | for these types of workloads.
         | 
         | That dollar value is just for the compute resources alone.
         | Given the compute costs required you will probably also want a
         | team composed of ML Ops engineers to monitor the training
         | cluster and research scientists to help you with the
         | preprocessing and model pipelines.
         | 
         | [1] https://github.com/facebookresearch/xformers [2]
         | https://lambdalabs.com/service/gpu-cloud
         | 
         | edit: these costs are for the 65B parameter model.
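The arithmetic in the comment above, as a runnable sketch (the price, machine count, and duration are the comment's figures, not vendor quotes):

```python
# Lower-bound compute cost for one 65B training run:
# 256 machines of 8xA100-80GB at $12.00/machine-hour for 21 days.
price_per_machine_hr = 12.00
machines = 2048 // 8        # 2048 GPUs, 8 per machine -> 256
days = 21

per_day = price_per_machine_hr * 24 * machines
total = per_day * days
print(f"per-day: ${per_day:,.0f}  total: ${total:,.0f}")
```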
        
           | swyx wrote:
           | thank you! love getting a peek into infra math. hopefully
           | these will come down someday but also my resources rise up to
           | be able to train these inhouse. can foresee foundation model
           | training being a core competency of many tech platform
           | companies in future.
        
       | lovelearning wrote:
       | I hope they make at least the smallest one available for
       | everyone. I find it ironic that they want to prevent misuse, but
       | think allowing access to those affiliated with government
       | organizations will achieve that.
        
         | [deleted]
        
       | bioemerl wrote:
       | It's not really open source which is something of a massive
       | bummer, but still cool to see Facebook of all companies being the
       | relative good guy.
        
       | tevon wrote:
       | How big is one of these models once it is trained and tuned?
       | Could it be run on a single reasonably-large EC2? Or does it need
       | any special architecture?
        
       | rsrsrs86 wrote:
        | Funny how the release has a note that the models clearly
        | have ethical problems that must be addressed, and still the
        | company chooses to publish it.
        
         | 323 wrote:
         | The old "not my problem how people use the gun, I'm just a
         | manufacturer" defense.
        
         | UncleEntity wrote:
         | Ethical problems that they don't purge it of Wrong Think?
         | 
         | Why people care that a dodgy AI can be made to say politically
         | incorrect things is beyond me. Don't use its outputs for
         | anything important and everything will be alright.
        
       | rvz wrote:
       | > To maintain integrity and prevent misuse, we are releasing our
       | model under a noncommercial license focused on research use
       | cases. Access to the model will be granted on a case-by-case
       | basis to academic researchers; those affiliated with
       | organizations in government, civil society, and academia; and
       | industry research laboratories around the world. People
       | interested in applying for access can find the link to the
       | application in our research paper.
       | 
       | The closest you are going to get to the source is here:
       | https://github.com/facebookresearch/llama
       | 
       | It is still unclear if you are even going to get access to the
       | entire model as open source. Even if you did, you can't use it
       | for your commercial product anyway.
        
         | swatcoder wrote:
         | > Even if you did, you can't use it for your commercial product
         | anyway.
         | 
          | And of course, the irony is that it's not commercial
          | products that endanger "integrity" or enable "misuse".
         | 
         | The looming misuse of LLM's is cheap content flooding by
         | spammers, black hats, and propaganda bots -- and those people
         | don't care about licenses, and will inevitably defeat any
         | watermarks meant to prevent or track leaks.
        
         | ninjin wrote:
         | Facebook continues to appropriate the word "open" [1], which is
         | sad really. There are plenty of good words they and others
         | could use instead.
         | 
         | [1]: https://news.ycombinator.com/item?id=32079558
        
           | jerpint wrote:
           | I just signed some form with license to maybe possibly get
           | access to the model... not very open indeed
        
       | _a_a_a_ wrote:
       | Newbie asks: what is a 'parameter' here
        
         | sp332 wrote:
         | One neuron weight. A single element of a matrix or tensor.
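Concretely: every entry of every weight matrix and bias vector is one learned number, so a single fully connected layer already accounts for millions of parameters (the layer shape below is made up for illustration).

```python
# One linear layer y = W @ x + b, with an illustrative 4096 -> 4096 shape.
d_in, d_out = 4096, 4096
n_params = d_in * d_out + d_out   # weight matrix entries + bias entries
print(f"{n_params:,} parameters in this one layer")
# A 65B-parameter model is ~65 billion such learned numbers in total.
```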
        
       | simonw wrote:
       | The most interesting snippet in the paper I think is this:
       | 
        | > For instance, LLaMA-13B outperforms GPT-3 on most benchmarks,
       | despite being 10x smaller. We believe that this model will help
       | democratize the access and study of LLMs, since it can be run on
       | a single GPU.
        
         | m3kw9 wrote:
          | Does this show how inefficient GPT-3 is, or how easily
          | their model can be disrupted? The way they will need to
          | keep business is to innovate faster with a usable
          | commercial product.
        
         | stevenhuang wrote:
         | How about alignment/ability to answer prompt queries and chain
         | of thought reasoning capabilities?
         | 
         | Without the fine tuning RLHF phase to make it like instructgpt
         | I'm assuming it won't be as good as ChatGPT, is that right?
         | 
         | How hard would it be to fine tune the 65B model on commodity
         | hardware?
         | 
         | Found answer here:
         | 
         | > Out-of-scope use cases LLaMA is a base, or foundational,
         | model. As such, it should not be used on downstream
         | applications without further risk evaluation and mitigation. In
         | particular, our model has not been trained with human feedback,
         | and can thus generate toxic or offensive content, incorrect
         | information or generally unhelpful answers.
         | 
         | https://github.com/facebookresearch/llama/blob/main/MODEL_CA...
        
           | zozbot234 wrote:
           | Yes but fine tuning for RL is not expected to be hard. You're
           | essentially limited by how much human feedback is available,
           | so it's very different from training the foundational model
           | on random bulk data.
        
             | simonw wrote:
             | At this point I fully expect that someone will release a
             | usable RLHF-fine-tuned language model that can run on
             | consumer hardware, based on the methodology used for LLaMA
             | (and other similar papers e.g.
             | https://github.com/FMInference/FlexGen ), at some point in
             | the next 6-24 months.
        
               | monkeydust wrote:
               | What are the functional steps that need to happen to get
               | to that point ?
        
             | stevenhuang wrote:
             | Of course, but even still the level of scale, clean data,
             | and human supervision needed may be significant. It's
             | reported OpenAI used an army of humans to generate question
             | answer prompts and rate the model output.
             | 
             | They kept the details closely guarded and only hinted at
             | how they did RLHF and transitioned the architecture to self
             | supervised learning.
        
               | pps wrote:
               | I'm stupid in this area and have only a very broad
                | understanding of "AI". But isn't this already
                | happening in this project?
                | https://github.com/LAION-AI/Open-Assistant
        
       | adeon wrote:
       | Has anyone had success being approved to download weights even if
       | you are just a hobbyist with a big GPU? (I'm here about to
       | begrudgingly fill out the Google Form) Asking in general, not
       | just for this particular model.
       | 
       | I've been following these language models and so far went through
        | approval once for an English/Chinese GLM-130B model and it took
       | only ~30 minutes to get approved even though I am a complete
       | nobody.
        
       | amoss wrote:
       | Given the limited release of access under a license - is there
       | any decision on whether a model like this would be protected by
       | copyright if it leaked?
       | 
       | The output of an algorithm is not a human authored creative work,
       | which given the recent decision in the book case would seem to
       | weigh against it.
        
       | rafaelero wrote:
        | Wow. That's nuts. An open-source model as competitive as
        | PaLM-540B? Sign me up. Wish they made it easy to access it,
        | though.
       | Not sure if I would be able to put my hands on it solely as an
       | interested individual.
        
         | cypress66 wrote:
         | There's nothing even close to open source here.
        
       | milesward wrote:
       | Right, but does it whip the LLaMA's ass?
        
         | layer8 wrote:
         | You mean, will their competitors whip the LLaMA's ass. ;)
        
       | blablablub wrote:
       | So what hardware do we need to run this model?
        
         | CuriouslyC wrote:
            | 7 billion can run on 16+ GB GPUs as fp16; 14 billion can
            | be run on 16+ GB if quantized to int8. 14G @ fp16 and
            | 30G at int8 will require one of the 48 GB cards (less,
            | but hardware mostly goes 24 -> 48).
        
           | brucethemoose2 wrote:
           | Requirements could be reduced with something like DeepSpeed
           | or ColossalAI (or even just simple hacks to move bits to RAM
           | more aggressively)
        
           | blablablub wrote:
           | thanks
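A rough way to arrive at figures like those in this subthread, assuming the weights dominate memory (this ignores activations, the KV cache, and framework overhead, so real requirements are somewhat higher):

```python
# Approximate weight-only memory for the four LLaMA sizes, at fp16
# (2 bytes/param) and int8 (1 byte/param).
def weight_gib(params_billion: float, bytes_per_param: int) -> float:
    return params_billion * 1e9 * bytes_per_param / 2**30

for b in (7, 13, 33, 65):
    print(f"{b}B: fp16 ~{weight_gib(b, 2):.0f} GiB, "
          f"int8 ~{weight_gib(b, 1):.0f} GiB")
```

On this estimate the 7B model at fp16 lands around 13 GiB, which is why a 16 GB consumer card is plausible for it.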
        
       | qup wrote:
       | What kind of machine does it take to run the model?
        
       | nafizh wrote:
       | >'We release all our models to the research community'
       | 
       | Where release means you fill out a form and wait indefinitely.
       | Also no use for commercial purposes - which means 95% of users
       | are out - certainly doesn't democratize LLMs.
        
         | version_five wrote:
          | I think these "open" models are the lamest release
          | strategy. I get the idea of being closed a la OpenAI to
          | protect a business model, and I understand open sharing in
          | the community.
         | 
         | "We'll let you see it if we approve you" is just being a self-
         | important pompous middle man.
        
         | mliker wrote:
         | > doesn't democratize LLMs
         | 
         | open source always leads to more democratization in the long
         | run
        
           | circuit10 wrote:
           | This isn't open source, at least by the usual definition that
           | you can freely use it for any purpose and redistribute it
        
             | bioemerl wrote:
             | Yeah. I was hoping to be able to use the model in koboldai,
             | but no dice.
        
       | minimaxir wrote:
       | This blog post is terrible at listing the improvements offered by
       | the model, the abstract is better:
       | 
       | > We introduce LLaMA, a collection of foundation language models
       | ranging from 7B to 65B parameters. We train our models on
        | trillions of tokens, and show that it is possible to train
        | state-of-the-art models using publicly available datasets
        | exclusively,
       | without resorting to proprietary and inaccessible datasets. In
       | particular, LLaMA-13B outperforms GPT-3 (175B) on most
       | benchmarks, and LLaMA-65B is competitive with the best models,
        | Chinchilla-70B and PaLM-540B. We release all our models to the
       | research community.
        
         | mjburgess wrote:
         | > We release all our models to the research community.
         | 
         | This is yet more evidence for the "AI isn't a competitive
         | advantage" thesis. State-of-the-Art is a public resource, so
          | competing _with_ AI offers no "moat".
        
           | lr1970 wrote:
           | > We release all our models to the research community.
           | 
           | And from the FB blogpost [0] "Request Form Thank you for your
           | interest in Meta AI's LLaMA (Large Language Model Meta AI)
           | models. To request access to the models, please fill out this
           | form, and we'll review and let you know if your use case is
           | approved. The information you provide below will be used
           | solely to assess eligibility to access these models."
           | 
           | So much for "releasing" the model for research community.
           | 
           | [0] https://ai.facebook.com/blog/large-language-model-llama-
           | meta...
        
           | jstx1 wrote:
           | The research is open, you still need
           | 
           | 1) to have a good idea for a product that people want
           | 
           | 2) lots of talent and resources to actually build and run
           | everything
        
             | ilaksh wrote:
             | Actually no it's not available for commercial purposes or
             | for anyone they do not approve.
        
             | mjburgess wrote:
             | And then facebook will release a version which beats it ?
             | :)
        
           | minimaxir wrote:
           | Business orgs have been finetuning open-source models like
           | these on their own internal data to create a moat since BERT
           | in 2018.
        
             | Der_Einzige wrote:
             | Glad to see you still on HN! You've done amazing work in
             | this domain!
             | 
             | I'd argue that this goes further back to the word2vec/glove
             | days too. I was working for a company in 2018 who leveraged
             | my skills for fine-tuning word2vec/fasttext even before
             | BERT/attention is all you need paper.
        
           | cardine wrote:
           | And yet there are still no publicly available models that
           | could actually compete with ChatGPT.
           | 
           | I'm not even talking about RLHF (although data like that is
           | also a huge moat) - just simple things like larger context
           | sizes.
           | 
           | There are still plenty of AI advantages to be had if you go
           | just a little bit outside of what is currently possible with
           | off the shelf models.
        
           | layer8 wrote:
           | They are only releasing it on a case-by-case basis and with a
           | non-commercial license.
        
           | fwlr wrote:
           | In terms of medieval warfare, what Facebook is doing here
           | looks like filling the moat with rocks and dirt. OpenAI is
           | worth billions, Microsoft is spending billions to retrofit
           | most of their big offerings with AI, Google is doubtless also
           | spending billions to integrate AI in their products. Their
           | moat in all cases is "a blob of X billion weights trained on
           | Y trillion tokens". Facebook here is spending mere _millions_
           | to make and release competitive models, effectively filling
           | in the moat by giving it to everyone.
        
             | mgraczyk wrote:
             | My guess is that the primary strategic motivation here is
             | recruiting and maintaining a competent org of people who
             | know how to build large models. I don't think that the
             | actual release is as calculated as all that. It's more
             | about proving to themselves and the world that they can
             | train big models.
        
             | skybrian wrote:
             | Not everyone. You have to apply for access.
        
             | twoodfin wrote:
             | Isn't this a classic "commoditize your complements" play?
             | Facebook's value is in their social graph and users'
             | attachment to it via various apps. TikTok and others are
             | not trying to replicate that, they're creating similar
             | volumes of user attachment via models. You can easily
             | imagine LLM's being applied by competitors in a comparable
             | way. If models become commodities, then Facebook continues
             | to hold the one advantage nobody else has.
        
             | [deleted]
        
             | recuter wrote:
             | I don't see such grand stratagems as being a likely
             | explanation. It seems more likely that a bunch of dorks are
             | running around unsupervised trying their best to make
              | lemonade with whatever budgets they are given, as
              | nobody can realistically manage AI researchers. Or at
              | least, it seems that, much like in political
              | governance, events are suddenly outpacing the ability
              | of corporate to react.
             | 
             | In the case of OpenAI it is a "nudge humanity into a more
                | wholesome direction" because a lot of them went on
                | an acid-fueled toxic effective altruism bender. And
                | in this case it
             | is "release all the things while Zuck is still obsessed
             | with VR --for science!".
             | 
             | I like this other group better. But it is disturbing that
             | it can stop at any moment. Probably why they are doing it,
             | while they still can.
        
               | vineyardmike wrote:
               | I think there is definitely a component of competition
               | here.
               | 
               | Facebook has no real way to monetize this (eg they won't
               | release an api a la OpenAI and they don't own a search
               | engine). Since they can't monetize it... why not provide
               | a bit of kindling to lower the barrier for everyone else
               | to compete with your competitors. This strategy is called
                | "commoditize your complement".
               | 
               | If Facebook makes it easier to develop a google
               | alternative, especially by doing something that doesn't
               | hurt them, then they just weakened a competitor. See
               | Facebook releasing datasets for mapping. Think of the
                | panic ChatGPT caused Google. It only cost a few million
               | to train but it's probably costing Google more than that
               | already.
        
               | naasking wrote:
               | Facebook maybe can't make money from it, but they
               | arguably could _save_ money from it, for instance in
               | automating some fact checking and moderation activities
               | that they currently spend quite a bit of money on.
        
               | webmaven wrote:
               | _> Facebook has no real way to monetize this (eg they
               | won't release an api a la OpenAI and they don't own a
               | search engine). Since they can't monetize it... why not
               | provide a bit of kindling to lower the barrier for
               | everyone else to compete with your competitors. This
                | strategy is called "commoditize your complement"._
               | 
               | Your analysis is good, but that isn't what "commoditize
               | your complement" means.
               | 
               | Strictly speaking, a search engine isn't a complement for
               | FBs revenue streams. Relatively little of FBs revenue can
               | be attributed to search engine traffic leading to FBs
               | walled garden where they can show the user ads.
               | 
               | Complements are generally a required product or service
                | that _enables_ your core, but that isn't revenue
               | generating for you. Examples for FB are data centers (so
               | they participate in the Open Compute Project[0]), and
               | mobile operating systems (which Google already has made a
               | commodity, for their own reasons, with Android).
               | 
               | What FB is doing here is commoditizing their competitors'
               | _core_ offering (or rather, a rather promising future
                | one). That's just the tactic though, there are several
               | strategies this can enable, from undermining the barriers
               | to entry into their competitors' search markets, to
               | actually fragmenting that market by encouraging a
               | diversity of specialized chat interfaces over one big
               | chat model. You can see hints of both in this
               | announcement.
               | 
               | Final note: FB is also protecting itself from relying on
               | a competitor as a supplier should chat become a preferred
               | user interface for the content on social networks, which
               | it hasn't, but if it ever did this would count as
               | "commoditizing their complement", though I would actually
               | expect FB to switch to a mostly proprietary approach in
               | that circumstance (so not much openness on having LLMs
               | operate on social graphs and the like), just keeping the
               | foundation they rely on, and which undermines and
               | prevents gatekeeping by their advertising competitors,
               | open.
               | 
               | [0] https://www.opencompute.org/
        
               | invig wrote:
                | I don't think that's the best interpretation of
                | complement. Think of a complement as "is your user's
                | experience benefited by the existence of the other
                | thing". I think that Search qualifies as something
                | that enhances the Facebook user's experience (e.g.
                | my cousin mentioned a thing they like doing today;
                | how do I find out more about that thing?)
        
               | titzer wrote:
               | They almost certainly spent at least a few million
               | dollars on this research project. Hard to say when the
               | decision was made to open source this (from the outset or
               | after it started showing results), but the decision was
               | conscious and calculated. Nothing this high-profile is
               | going to escape the strategic decision making processes
               | of upper management.
        
               | visarga wrote:
               | > It seems more likely that a bunch of dorks are running
               | around unsupervised trying their best to make lemonade
               | with whatever budgets they are given
               | 
               | Probably not, Zuck is announcing it.
        
               | recuter wrote:
               | Today we're releasing a new state-of-the-art AI large
               | language model called LLaMA designed to help researchers
               | advance their work. LLMs have shown a lot of promise in
               | generating text, having conversations, summarizing
               | written material, and more complicated tasks like solving
               | math theorems or predicting protein structures. Meta is
               | committed to this open model of research and we'll make
               | our new model available to the AI research community.
               | 
               | I don't know what that means or if he even wrote/read it
               | tbh. I hope it literally just means Meta is actually
               | committed to this open model of research (for now).
               | 
               | Maybe he is being a Machiavellian moat filler, I stand
               | corrected. I think/hope that they don't really have a
               | plan to counter OpenAI yet, because I am afraid this
               | attitude won't last once they do, and this stuff has
               | recently started moving quickly.
        
               | ahati wrote:
               | It is to please the shareholders. Now shareholders can
               | know that Meta can compete with GPT-3.
        
               | ttul wrote:
               | 100%. This kind of announcement is for the street.
               | Investors need to be reassured that Meta is keeping up
               | with the cool AI stuff over at Google and Microsoft. A
               | release like this will cause a flurry of downstream news
               | coverage. I'm sure they are hoping that the model will be
               | picked up by researchers doing interesting things who
               | might further generate good coverage.
               | 
               | And, yes, it fills the moat.
        
               | mliker wrote:
               | This is a pretty biased and uninformed opinion. It's
               | pretty condescending to call it a "bunch of
               | dorks...running around unsupervised."
        
               | recuter wrote:
               | I don't know what my bias is supposed to be but I called
               | them dorks affectionately for one. The other replies are
               | literally arguing that they are _very extremely_
               | supervised, whereas I am speculating they are just eager
               | to share their work for the right reasons and the eye of
               | Sauron has yet to turn upon them.
               | 
               | Inside knowledge I never claimed. Anything else I can
               | help you with today? :)
        
               | rcpt wrote:
               | > I called them dorks affectionately
               | 
               | Never in my life have I seen that word used
               | affectionately
        
               | vidarh wrote:
               | I have seen that many times, as well as seeing people use
               | it about themselves.
        
               | sebastiennight wrote:
               | I think Noah Kagan's blog is literally called "OK Dork",
               | and I always assumed the title was self-deprecating (in a
               | fun/positive way) rather than negative.
        
               | tantalor wrote:
               | Check out
               | https://news.ycombinator.com/newsguidelines.html
               | 
               | Not sure how calling out a comment as "biased",
               | "uninformed" or "condescending" is helping.
        
               | fwlr wrote:
               | The researchers and engineers and other assorted dorks
               | who built it weren't thinking about moats, for sure, I
               | agree with you there. But I guarantee you that the
               | metaphor of medieval warfare was on the minds of the
               | executives and legal team deciding whether to let the
               | team release their work in this way.
        
             | Veen wrote:
             | Commoditize your complement.
             | 
             | https://gwern.net/complement
        
             | PoignardAzur wrote:
             | Which kind of suggests Microsoft made a really bad move
              | antagonizing the open-source community with GitHub
              | Copilot.
             | 
             | They got a few years of lead time in the "AI codes for you"
             | market, but in exchange permanently soured a significant
             | fraction of their potential userbase who will turn to open-
             | source alternatives soon anyway.
             | 
             | I wonder if they'd have been better served focusing on
             | selling Azure usage and released Copilot as an open-source
             | product.
        
               | freeqaz wrote:
               | How did Microsoft sour developers with Copilot? I know
               | dozens of people that pay for it (including myself) and I
               | feel like it is widely regarded as a "no brainer" for the
               | price that it's offered at.
               | 
               | Please help me understand!
        
               | sandkoan wrote:
                | Presumably, because they trained Copilot on billions
                | of lines of often-licensed code (without permission),
                | which Copilot has a tendency to regurgitate verbatim,
                | without said license.
        
               | ldhough wrote:
               | For a specific example some variation of "fast inverse
               | square root" will usually get you the exact GPL licensed
               | code from Quake III, comments included.
        
               | theRealMe wrote:
               | Do you mean the same code that has its own Wikipedia page
               | where the exact code is written, comments included, and
               | has probably been copy pasted into 100's of other
               | projects?
               | 
               | https://en.m.wikipedia.org/wiki/Fast_inverse_square_root
        
               | robertlagrant wrote:
                | The point is that it charges money despite having been
                | trained on open source code. What you're saying agrees
                | with that,
               | but your triumphant tone seems to be implying the
               | opposite. Which did you mean?
        
               | alar44 wrote:
               | [dead]
        
               | theRealMe wrote:
               | They didn't. There is a small group of people that are
               | always looking for the latest reason to be outraged and
               | to point at any of the big tech companies and go "aha!
               | They are evil!" Copilot's ai was trained on GitHub
               | projects and so these people are taking their turns
               | clutching their pearls inside of their little bubble.
               | 
               | I'd bet that more than 95% of devs haven't even heard of
               | this "controversy" and even if they did, wouldn't care.
        
               | layer8 wrote:
               | Controversy about unlicensed use of source code as
               | training data and lack of attributions in the generated
               | output.
        
               | Mizza wrote:
               | The company that tried to kill Linux in the 90s, owned by
               | the world's most famously rich man, is now stealing my
               | code and selling it back to me? Yeah, fuck that.
        
               | visarga wrote:
               | It's not selling you back your code. It's different code,
               | adapted to a different task; your own code is forever
               | free for you, you don't need anyone to give it to you.
               | 
               | Given the cost of running these models, and the utmost
               | dedication needed to train them, I think it is worth it.
               | GPUs cost money, electricity costs money. They can't
               | serve the world for free and offer good latency.
        
               | mbb70 wrote:
               | I mean, that's like saying an author steals the open
               | source alphabet and charges you for reading their
               | ordering of letters, as if the ordering of letters isn't
               | where all the value is.
        
               | robertlagrant wrote:
               | These models are trained on sequences of words, not told
               | the letters and left to get on with it.
        
               | KarlKemp wrote:
               | It continues to amaze how people are incapable of
               | following even the most trivial cases of abstract
               | reasoning.
        
               | Vvector wrote:
               | Microsoft makes dozens of "really bad moves" every year.
               | This is nothing.
        
       | miketery wrote:
       | Can this be downloaded like stable diffusion? How big are these
       | models? How long does it take to run a "query" against them on a
       | decent gaming PC?
        
         | bioemerl wrote:
          | This can't, but there's some software, KoboldAI, which lets
          | you download and run other LLMs.
        
         | flatiron wrote:
          | You have to contact Facebook for a copy, so it's not as open
          | as Stable Diffusion.
        
         | brucethemoose2 wrote:
         | No one seems to be running it locally at the moment.
        
       | monkeydust wrote:
        | What, no implementation example?
        
       | naveen99 wrote:
        | Why not a 1-billion-parameter model that can run on a single
        | GPU?
        | 
        | Worse-is-better will come to LLMs.
        
       | carlsborg wrote:
       | "The code is licensed under the GPLv3, which permits commercial
       | use." Yann on Twitter
        
         | return_to_monke wrote:
          | The code. They didn't release the model though, so unless
          | you have a lot of money or get granted access to it, it's
          | not for you.
        
       | saurabh20n wrote:
       | Quick notes from first glance at paper
       | https://research.facebook.com/publications/llama-open-and-ef...:
       | 
        | * All variants were trained on 1T-1.4T tokens, which is good
        | relative to their sizes by the Chinchilla metric. Code is 4.5%
        | of the training data (similar to others). [Table 2]
       | 
       | * They note the GPU hours as 82,432 (7B model) to 1,022,362 (65B
       | model). [Table 15] GPU hour rates will vary, but let's give a
       | range of $1 to $4. The 7B model would have cost ~$82-329k and the
       | 65B something in the range of ~$1-4M. They also note their total
       | time spent for all models: "we used 2048 A100-80GB for a period
       | of approximately 5 months" [sec 6, pg 10]
       | 
        | * 65B model's performance is broadly comparable to PALM-540B.
        | No small feat, and it could also indicate the benefits of a
        | good model-size-to-token-count ratio [Tables 3,4,5,6]. Their
        | conjecture for underperforming on MMLU (multitask language
        | understanding) compared to PALM-540B and Chinchilla-70B is a
        | smaller fraction of books and academic papers in the training
        | data.
       | 
        | * Math and code tasks: on math tasks they are substantially
        | worse than Minerva (comparing their 65B to Minerva 62B; they
        | fail hands down against Minerva 540B) [Table 7]. On code tasks
        | they are broadly competitive with PALM-540B (HumanEval and
        | MBPP evals). [Table 8]
       | 
        | * Surprising that instruction fine-tuning takes up such a
        | small part of the paper (sec 4, pg. 7).
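The cost arithmetic in those notes can be sketched as follows. This is a back-of-the-envelope check, not a figure from the paper: the $1-$4 per A100-hour range is the commenter's assumed cloud-rate spread.

```python
# Back-of-the-envelope training-cost estimate from the GPU-hour figures
# quoted above (Table 15 of the paper). The $1-$4 per A100-hour rates
# are assumed, not stated in the paper.
GPU_HOURS = {"7B": 82_432, "65B": 1_022_362}
RATE_LOW, RATE_HIGH = 1.0, 4.0  # assumed USD per A100-hour

for model, hours in GPU_HOURS.items():
    # e.g. 7B: roughly $82k at $1/h, roughly $330k at $4/h
    print(f"{model}: ${hours * RATE_LOW:,.0f} to ${hours * RATE_HIGH:,.0f}")
```

This reproduces the ~$82-329k (7B) and ~$1-4M (65B) ranges given above.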
        
         | zozbot234 wrote:
         | https://github.com/facebookresearch/llama/blob/main/MODEL_CA...
         | (linked in OP) has basic information about this model.
        
         | make3 wrote:
          | Do they do instruction fine-tuning?
        
         | sandGorgon wrote:
         | >* 65B model's performance is broadly comparable to PALM-540B.
         | Not a small feat, but also could indicate the benefits of good
         | model-vs-token size ratios [Tables 3,4,5,6]. Their conjecture
         | for underperforming on MMLU (multitask language understanding)
         | compared to PALM-540B and Chinchilla-70B is smaller fraction of
         | books and academic training data.*
         | 
          | What do you mean by this? The OpenAI papers talk roughly
          | about model performance scaling with parameter count. Does
          | this show otherwise?
        
           | vishal0123 wrote:
           | Scaling law is for training till convergence. Both PALM and
           | this model have been undertrained. See the training loss plot
           | in the paper.
        
         | machinekob wrote:
          | I hate when people don't include an estimate of the training
          | runs done before the final hyperparameters are found, as
          | that's the most costly part of the whole process most of the
          | time.
          | 
          | It's just "yes, we trained it for so long", etc., but they
          | never talk about the tens or even hundreds of runs before
          | they finalized the model parameters and architecture -.-
        
         | 323 wrote:
         | > _we used 2048 A100-80GB for a period of approximately 5
         | months_
         | 
         | Do we know how much total energy a human consumes from birth to
         | 20 yo? Something like 2000 calories integrated over 20 years.
         | How does it compare to the GPUs above?
         | 
         | Wolfram Alpha:
         | 
         | - human - 17 MW/h ((2000 calories per day) over 20 years in
         | MWh)
         | 
         | - GPUs - 3000 MW/h ((2048 * 400) W over 5 months in MWh)
         | 
         | We still have the edge.
         | 
          | LOL, I'm being downvoted, I wonder why. Some don't like the
         | question.
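The comparison above can be checked mechanically (the correct unit is MWh; the 400 W average draw per GPU is the commenter's assumption, below the A100's peak TDP):

```python
# Rough energy comparison: a human's food intake over 20 years vs. the
# 2048-GPU training cluster. 400 W per A100 is an assumed average draw.
KCAL_TO_J = 4184          # joules per food calorie (kcal)
J_PER_MWH = 3.6e9         # joules per megawatt-hour

human_mwh = 2000 * KCAL_TO_J * 365 * 20 / J_PER_MWH   # 2000 kcal/day, 20 years
gpu_mwh = 2048 * 400 * (5 * 30 * 24) / 1e6            # watts * hours -> MWh

print(f"human: {human_mwh:.0f} MWh")   # ~17 MWh
print(f"GPUs:  {gpu_mwh:.0f} MWh")     # ~2950 MWh
```

So the comment's numbers hold up to two significant figures: roughly 17 MWh vs. roughly 3000 MWh.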
        
           | isoprophlex wrote:
           | You mean MWh maybe, not MW/h? (which is what, J/s^2 in SI...
           | "Power rate".)
        
             | 323 wrote:
             | Right, I used the correct MWh in Wolfram, but for some
             | reason wrote MW/h, I think it was written like that a long
             | time ago on electricity bills.
        
           | robbiep wrote:
           | It's because your human math for power output is so far off
           | it's hard to know where to start to point you in the right
           | direction
        
             | 323 wrote:
              | Please do tell. Or better, provide your own estimate. I
             | took raw calorie intake, no
             | heating/transportation/lighting/computer usage/....
        
           | zhynn wrote:
           | You have to include our evolutionary history too. A
           | considerable amount of our sophisticated behavior doesn't
           | require specific training, as it is encoded in our genetic
           | and epigenetic systems. We aren't starting from zero.
        
           | melling wrote:
           | Every human requires the same energy, 20+ years, and
           | training.
           | 
           | The trained computer model can be duplicated and used,
           | requiring much less energy.
           | 
           | None of this matters to me, though.
           | 
           | The goal is to build better models. We can worry about the
           | efficiency later.
        
           | Dylan16807 wrote:
           | > We still have the edge.
           | 
           | Depends on what you're doing. A human is much smarter than
           | one of these models, but the model has approximate knowledge
           | of orders of magnitude more things. And the energy costs per
           | word of output are a lot closer.
        
         | SethTro wrote:
          | (1,022,362 + 82,432) GPU-hours / 2048 GPUs / 5 months ~= 15%
          | uptime.
         | 
         | That's only 0.08 nines of availability!
         | 
          | I remember one of their old guidebooks describing a lot of
          | struggle to keep their 64-machine (512-GPU) cluster running;
          | this was probably 4x the machines and 4x the number of
          | cluster dropouts.
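The utilization estimate can be reproduced directly. Note it sums only the quoted 7B and 65B GPU-hour figures; the 13B and 33B variants trained in the same window aren't counted, so actual utilization would be higher:

```python
# Cluster utilization implied by the quoted GPU-hour figures: 7B + 65B
# runs (Table 15) over a 2048-GPU, ~5-month reservation. The 13B and
# 33B runs aren't included here, so this is a lower bound.
total_gpu_hours = 82_432 + 1_022_362
cluster_hours = 2048 * 5 * 30 * 24     # 2048 GPUs * ~3600 hours

utilization = total_gpu_hours / cluster_hours
print(f"{utilization:.0%}")            # ~15%
```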
        
       ___________________________________________________________________
       (page generated 2023-02-24 23:01 UTC)