[HN Gopher] GPT Neo: open-source GPT model, with pretrained 1.3B...
       ___________________________________________________________________
        
       GPT Neo: open-source GPT model, with pretrained 1.3B & 2.7B weight
       models
        
       Author : pizza
       Score  : 533 points
       Date   : 2021-03-21 21:01 UTC (1 day ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | ve55 wrote:
       | This is a nice release, but the title is a bit misleading as the
       | released sizes (1.3B and 2.7B parameters) do not yet compare to
       | the size of GPT-3 (175B), but rather GPT-2 (1.5B) instead
       | (although future releases may have significantly more!).
       | 
       | Edit: title improved, thank you!
        
         | nl wrote:
         | Yeah. They say they are doing a 10B release soon[1].
         | 
         | I suspect they have run into training issues since they are
         | moving to a new repo[2]
         | 
         | [1]
         | https://twitter.com/arankomatsuzaki/status/13737326468119674...
         | 
         | [2] https://github.com/EleutherAI/gpt-neox/
        
           | chillee wrote:
           | It's more about hardware - these models were trained on TPUs,
           | while GPT-NeoX is being trained on GPUs graciously provided
           | by Coreweave.
        
             | orra wrote:
             | Any idea what the required GPU time would cost (if not
             | donated)? Is GPT-3 just a commodity soon?
        
               | minimaxir wrote:
               | With training improvements such as DeepSpeed, the GPU
               | costs will likely be substantially lower than what was
               | available at the time OpenAI trained GPT-3. Still not
               | free, though.
               | 
               | The hard part with GPT-3 is it's big enough to make it
               | difficult to actually _deploy_.
        
               | stellaathena wrote:
               | Our current estimate is that it requires between 2000 and
               | 4000 V100 months.
        
               | Voloskaya wrote:
                | ~$4M per full training run, give or take.
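                | A rough back-of-the-envelope connecting the V100-month
                | estimate upthread to dollars (the ~$1.80/hour V100
                | rate below is an illustrative assumption, not a quoted
                | price):

```python
# Back-of-the-envelope: convert V100-months to dollars.
# The hourly rate is an assumed cloud price, for illustration only.
HOURS_PER_MONTH = 730        # average hours in a month
RATE_PER_V100_HOUR = 1.80    # assumed $/hour for one V100

def training_cost(v100_months, rate=RATE_PER_V100_HOUR):
    """Estimated dollar cost for a given number of V100-months."""
    return v100_months * HOURS_PER_MONTH * rate

low, high = training_cost(2000), training_cost(4000)
print(f"${low / 1e6:.1f}M - ${high / 1e6:.1f}M")  # $2.6M - $5.3M
```

                | which brackets the ~$4M figure quoted elsewhere in
                | this thread.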
        
               | teruakohatu wrote:
               | The number thrown around for gpt-3 is $4.6 million, but I
               | am not sure where that figure originates.
        
               | minimaxir wrote:
               | It was a number tossed around by a GPU hosting provider,
               | based on their own costs:
               | https://lambdalabs.com/blog/demystifying-gpt-3/
               | 
               | The reality is that GPT-3 was likely "free" to train on
               | Azure, as Microsoft has provided a lot of resources to
               | OpenAI.
        
         | pizza wrote:
         | Fixed title to reflect that, thanks
        
           | ve55 wrote:
           | I would perhaps change 'GPT-3' to just say 'GPT' instead, as
           | a more salient fix.
        
             | stellaathena wrote:
             | GPT-3 isn't a single model. It's a model architecture that
             | is very closely followed by GPT-Neo. The 2.7B model is the
             | exact same size as something OpenAI sells under the label
             | "GPT-3"
        
               | ve55 wrote:
               | My line of thinking was that for the average HN reader,
               | who has probably read 'GPT-3' perhaps 500 times by now
               | (every instance of which was referencing OpenAI's
               | infamous 175B model), it may be confusing for them to see
               | this with the same label, when the release is not
               | comparable as far as parameters/performance (yet). But as
               | yourself and another commenter noted, it is still the
               | GPT-3 architecture (or hopefully isomorphic to it), so I
               | appreciate your correction as well.
        
               | stellaathena wrote:
               | That's fair. I also later learned that the title didn't
               | explicitly mention model size at first, and I would have
               | probably raised similar complaints had I seen that.
        
               | Dylan16807 wrote:
               | Is GPT-2's architecture any different?
        
               | stellaathena wrote:
               | Not hugely, but yes. I tend to think of GPT as a style of
               | architecture with consistent themes and major features,
               | but varying minor features and implementation details.
               | Off the top of my head, I believe the most important
               | difference is that GPT-3 alternates global and local
               | attention while GPT-2 is all global attention.
               | 
               | The two published GPT-Neo models follow GPT-3's lead but
               | the repo lets the user pick whether to use global or
               | local attention layers.
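                | To make that concrete, here is a toy sketch (my own
                | illustration, not code from either model) of causal
                | attention masks, with a hypothetical window size of 2
                | for the local layers:

```python
# Causal attention masks: token i may attend to token j iff mask[i][j].
def global_mask(n):
    """Each token attends to every earlier token and itself."""
    return [[j <= i for j in range(n)] for i in range(n)]

def local_mask(n, window):
    """Each token attends only to the last `window` tokens."""
    return [[i - window < j <= i for j in range(n)] for i in range(n)]

# For 4 tokens, global attention allows 10 (i, j) pairs while a window
# of 2 allows only 7, which is what makes local layers cheaper as the
# sequence length grows.
print(sum(map(sum, global_mask(4))), sum(map(sum, local_mask(4, 2))))  # 10 7
```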
        
             | nl wrote:
             | This is incorrect. It's the GPT-3 model architecture and
             | optimisations, and uses training techniques similar to
             | GPT-3.
        
               | ve55 wrote:
               | Thank you, I've rephrased a few things to improve the
               | wording with respect to this.
        
       | sillysaurusx wrote:
       | Please test their models before you take it at face value.
       | 
       | Eleuther has a history of claiming to replicate projects when
       | they haven't. For example, they shipped a DALL-E repo a few days
       | after OpenAI announced it
       | (https://twitter.com/theshawwn/status/1348017515897659392) which
       | was broken, and they've walked back their GPT-3 replication
       | claims to replicating 1.5B due to the fact that their
       | architecture doesn't scale.
       | 
       | As far as I can tell, they're generating a large amount of hype
       | with grandiose claims that they can't deliver on.
       | 
       | All I care about is whether you like their models and actually
       | use them in practice. If you do, please let me know and I'll pipe
       | down. But so far, I haven't heard of anyone who uses anything
       | they've produced, and that worries me. Has anyone?
       | 
       | One specific claim they made:
       | https://twitter.com/BlancheMinerva/status/134727697554780980...
       | 
       | "DALL-E is quite straight forward and already coded. We just need
       | data to train it."
       | 
       | No, DALL-E is neither straightforward nor was it successfully
       | coded, especially back on January 7th.
       | 
       | Anyway, carry on. I really don't like speaking badly of AI
       | projects, and I hope that they succeed. The model release today
       | is a good step forward, assuming it works. But it might be better
       | to have the expectation of "the models don't work" until proven
       | otherwise.
       | 
       | I'd also like to point out that there are some capable people
       | doing work at Eleuther. Sid in particular is one of the best TPU
       | hackers in the scene. I just wish they would scale down their
       | claims, release more models, and not claim that they've done X
       | until actually doing X. For example, the readme says they have
       | "the ability to scale up to full GPT3 sizes (and possibly more!),
       | using the mesh-tensorflow library," which they don't.
        
         | [deleted]
        
         | 6gvONxR4sf7o wrote:
         | People are always claiming to release replicated models by
         | replicating the architecture (or main parts of it) but not
         | testing whether it produces the same level of results. It's
         | maddening, especially when the level of results is so directly
         | measurable (just measure what the paper did, not that it's
         | easy, just concrete).
        
         | cookiengineer wrote:
         | What I find interesting about their marketing(?) is that they
         | identified a market niche that they want to position themselves
         | in.
         | 
         | Enterprise customers that have no idea about the technical
         | details will just hear about OpenAI's success in this fancy new
         | model and assume that Eleuther can deliver.
         | 
         | I mean, most use cases for "big data" projects that are tiny in
         | comparison with Alphabet's datasets will just work with GPT2
         | fine, probably.
         | 
          | And for enterprise customers who hear those claims, seeing
          | some code and maybe a demo is enough to start the
          | consultancy process.
         | 
          | In my opinion that's a policy problem that OpenAI introduced
          | by not requiring absolute reproducibility of the code, the
          | model, the training procedure, and the dataset upon release.
         | 
          | Stakes are pretty high in the AI industry, and OpenAI
          | actively influences it. In the beginning my dream was that
          | they would be a source of verification, audits and "proof"
          | that models are legit... yet lately I have the feeling that
          | they just buzz around like everyone else.
         | 
         | To this date I haven't seen anyone replicate any of the DNC
         | results, for example.
         | 
         | Anyways, just my two cents on this one.
        
           | leogao wrote:
           | To date, EleutherAI as an "organization" (read: basically a
           | Discord server) has not really attempted any kind of
           | marketing. It has no PR dept, just individuals tweeting about
           | the work that Eleuther does.
        
             | ma2rten wrote:
             | Also I'm pretty sure there is no consultancy process.
        
               | [deleted]
        
         | loxias wrote:
         | Geez, that's really harsh.
         | 
         | I don't think any single thing you've claimed is factually
         | wrong, and I don't speak for Eleuther nor am I attempting to
         | justify their claims.
         | 
         | But.
         | 
         | As I understand it (mostly from lurking on their discord and
         | reading publicly available materials) this is a group of
         | volunteer academic types trying to replicate something great
         | and awesome, with the only goal of giving it to the world. You
         | could cut them some slack.
         | 
         | I can't speak for you, but as a "for free, weekend project"
         | what they've done certainly makes me feel I need to up my game.
        
           | OgAstorga wrote:
           | This has nothing to do with the good work, awesome
           | intentions, nor the fact that they have no financial
           | incentives behind this.
           | 
           | Claiming something that is not true is in itself wrong.
        
             | loxias wrote:
             | > Claiming something that is not true is in itself wrong.
             | 
             | I 100% agree with this.
             | 
             | I also think that one catches more flies with honey than
             | vinegar, and the criticism in the parent comment, while
             | possibly valid, could be phrased more encouragingly and
             | less combatively. It's easy to criticize, it's hard to
             | create, and it's even harder to release.
        
             | wiz21c wrote:
             | > Claiming something that is not true is in itself wrong.
             | 
              | yup, in any project, and especially one done for the
              | community, where the only thing you get is satisfaction
              | and fame, success is super tied to communication. Good,
              | honest communication is what builds trust.
        
             | regisg wrote:
             | "Claiming something that is not true is in itself wrong."
             | Does this apply to butterflies disguising themselves as
             | dead leaves?
        
             | [deleted]
        
             | stellaathena wrote:
             | I am sorry that I was misinformed about the state of our
             | DALL-E replication when I made that tweet. It was not
             | malicious - I was reporting what I had been told by someone
             | else.
             | 
              | Yes, I was wrong. That said, I had hoped that maybe
              | after two and a half months Shawn would stop holding it
              | over my head.
        
         | nullc wrote:
         | Wouldn't it be nice if OpenAI were like .. actually open? :P
        
           | brunoluiz wrote:
            | Considering the last decade's social consequences of easy
            | access to APIs and data, I am quite happy that these
            | initiatives are cautious about opening up software which
            | can have a huge impact on society.
        
             | hesdeadjim wrote:
             | Unfortunately, the cat is out of the bag. Their methods are
             | documented and the results exciting, so to a bad actor
             | (especially state-sponsored) it's completely justified to
             | spend millions attempting to replicate their results from
             | what is publicly available.
        
           | ryanackley wrote:
            | Not just that. To even get access to their API, you need
            | to apply. That, I'm afraid, is the future of AI without
            | projects like this: elites controlling AI and deciding who
            | is "worthy" to use it.
           | 
           | I'm sure they have the best of intentions but "worthiness" is
           | subjective.
        
             | TremendousJudge wrote:
             | Depends on who you mean by "they". If you mean the
             | researchers, then sure, they probably actually believe
             | whatever's written in their ethics statement.
             | 
             | Now, the actual owners? I don't believe it for a second
        
           | You-Are-Right wrote:
           | Freedom is Slavery.
        
         | natch wrote:
         | Good perspective but I would like to hear the response from the
         | developers before concluding too much.
         | 
          | This is not meant as a goad to you, but as more info for
          | everyone: my understanding is that it's an open-source,
          | community-of-like-minded-people type of project (as opposed
          | to a bigco) and it actively solicits contributions (by which
          | I mean code and data), so anyone seeing room for improvement
          | is welcome to step in, from what I can tell.
         | 
         | I did find your comment helpful and informative; just adding
         | another angle here.
        
           | stellaathena wrote:
           | It's literally a couple people hanging out in a discord
           | channel and doing this as a way to procrastinate their jobs.
        
             | mirekrusin wrote:
             | Peak of laziness - build ai to do your job so you have more
             | time building ai.
        
         | ImprobableTruth wrote:
         | Disclaimer: I know absolutely nothing about machine learning.
         | 
         | Isn't GPT-3 the architecture? Are they doing something
         | different or why would it not scale?
        
           | nmfisher wrote:
           | GPT-3 is the name for the architecture, but there are a few
           | different versions/sizes. The OpenAI version that impressed
           | us all was ~170B parameters, this is far smaller.
           | 
           | To go from 2.7B to 170B parameters will need more than just a
           | few config tweaks. There's a whole bunch of hacks and tricks
           | needed to coax a model to train at that scale, the Eleuther
           | version is almost guaranteed to fail out-of-the-box.
        
             | minimaxir wrote:
             | It's worth noting that the GPT-3 paper did train models
             | with more sane sizes (e.g. 1.5B) as a point of comparison.
             | I am surprised/annoyed they never released them though.
        
               | stellaathena wrote:
               | It's because OpenAI sells them for profit. The "Ada"
               | model is the same size as the larger of these two
               | EleutherAI models.
        
               | minimaxir wrote:
               | Huh, I was wondering what the size of the non-davinci
               | models were; guess that make sense.
               | 
               | It's still telling that a "small" GPT-3 model can risk
               | cannibalizing a larger model.
        
               | stellaathena wrote:
               | Ada is 2.7B, Babbage is 6.7B, Curie is 13.0B, and DaVinci
               | is 175B. The new one they announced last month is in the
               | 20-50B range I think, not totally sure though.
        
           | nl wrote:
           | Almost _all_ the challenges with GPT-sized models are
           | engineering and training challenges, not architectural.
           | 
           | How _do_ you train a model too big to fit in a single GPU? It
           | 's doable, but not simple. How do you update weights across
           | your cluster? etc etc
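            | As a toy illustration of that cluster-update problem (my
            | own sketch, nothing from the GPT-Neo codebase): in
            | synchronous data parallelism each worker computes a
            | gradient on its own shard, and the gradients are averaged
            | before a single shared update, standing in for an
            | all-reduce across GPUs:

```python
# Toy synchronous data-parallel step for a 1-parameter model y = w * x
# with squared-error loss. Each "worker" holds a shard of the batch.
def local_gradient(w, shard):
    """dL/dw for L = sum((w*x - y)^2) over this worker's shard."""
    return sum(2 * (w * x - y) * x for x, y in shard)

def all_reduce_mean(grads):
    """Stand-in for the cross-GPU all-reduce collective."""
    return sum(grads) / len(grads)

w = 0.0
shards = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]  # 2 workers
for _ in range(100):
    grads = [local_gradient(w, s) for s in shards]  # computed in parallel
    w -= 0.01 * all_reduce_mean(grads)              # synchronized update
print(round(w, 3))  # converges to 2.0, the true slope of the data
```

            | Doing this efficiently (and fitting the weights themselves
            | across devices) is the hard engineering part at GPT-3
            | scale.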
        
           | sendtown_expwy wrote:
           | I would guess that an average FAANG ML engineer could code up
           | and successfully execute a forward/backward pass on a GPT-1
           | or GPT-2 model with a day of effort or less. (GPT-3 a little
           | harder, but not significantly). But is that model actually
           | going to perform well? Most likely no. Model performance
           | varies significantly due to subtle details in data processing
           | implementations, seemingly insignificant details in code, and
           | even from different numerical methods of calculating the same
           | semantics.
           | 
           | If you don't believe me, consider that many ML researchers
           | track their commits (or exact code versions) extremely
           | carefully, because oftentimes they will make some change (or
           | changes) they think are inconsequential and later find that
           | actually, their model broke. If they made too many changes,
           | whoops, guess you have to binary search over the diff to see
           | what happened since your last "good run".
           | 
           | If the people who spent months (if not years) tuning a model
           | can't tell whether it will work from the code, how could
           | anyone else? Most ML researchers will not bother with most
           | code that doesn't give proof of results (in terms of a model
           | that can actually be evaluated) because it is just so
           | unlikely that it will actually work well. Now, it might
           | "work" in the sense that it converges and does something when
           | you prompt it with examples. But will this GPT-3
           | reimplementation actually outperform say, the 10x smaller T5
           | checkpoint that was released by Google, or the other smaller
           | language models others have released? If it doesn't, it's
            | hard to argue that it's very useful at all.
           | 
           | I think that's the spirit of why the original commenter said
           | what they did, but I still do applaud the efforts of this
           | team (and hope that their implementation is, in fact, highly
           | performant!)
        
           | m00x wrote:
           | It's the model, not the architecture, but you could say the
           | model contains the architecture.
        
       | f430 wrote:
       | Can somebody explain to this beginner how to use this? Where can
       | I load this code and start running it? How can I train it on a
       | dataset and what do I need to prepare?
       | 
       | Lots of language here I don't understand like what is he
       | referring to when he says 1.5B or 1T weights?
       | 
       | What resources/videos can I watch in order to start tinkering
       | with this?
        
         | vertis wrote:
         | The repository readme actually includes a link to a notebook[1]
         | that helps getting started on Google colab. It's as good a
         | place to start as any:
         | 
         | [1]:
         | https://colab.research.google.com/github/EleutherAI/GPTNeo/b...
        
           | f430 wrote:
           | thanks, I never used this before. Do I have to add a credit
           | card? How much will it cost to run this?
        
             | zora_goron wrote:
             | Colab is free to use -- you can click Runtime - Run All to
             | run the cells in the notebook free-of-charge. (You may need
             | to be logged in to a Google Account to run it.)
        
               | f430 wrote:
               | very cool! side question but is there a complete guide to
               | learn PyTorch with Colab?
               | 
               | I tried to learn ML a few years ago but gave up because I
                | couldn't install CUDA on my machine for some reason.
                | The landscape seems to have changed dramatically since.
               | 
               | I am interested in transformers in particular completing
               | incomplete images like what
               | https://openai.com/blog/image-gpt/ does, is there a
               | project that implements that and can let me start
               | training?
               | 
               | I'm excited but I just get overwhelmed as to where I need
               | to focus my attention on.
               | 
               | My goal is to utilize something like image-gpt but for a
               | more narrow domain (ex. only dealing with cats), how can
               | I build my knowledge and skills towards that goal?
               | 
                | Much thanks for your answers, I'm really looking
                | forward to learning this stuff.
        
               | p1esk wrote:
               | Your questions are easily googleable, but if you insist
               | start at pytorch.org
        
               | f430 wrote:
               | I'm sure they are
        
             | [deleted]
        
       | leogao wrote:
        | Could the title of this post be changed to emphasize that the
       | model sizes released were 1.3B and 2.7B? Something like
       | "EleutherAI releases 1.3B and 2.7B parameter GPT-like language
       | models". The current title implies that a full sized GPT-3 model
       | is currently available, which is not the case.
       | 
       | edit: the title has been changed, seems good enough
        
       | flemhans wrote:
       | So I would want to include a big corpus like GPT-3 or this
       | newfangled "Neo" thing but still have it trained to respond to
       | our own customers based on 200k email passages.
       | 
       | How to create a hybrid?
        
         | minimaxir wrote:
         | You fine-tune an existing pretrained model on your proprietary
         | dataset.
        
         | jalammar wrote:
         | I wouldn't trust any model to generate text for customers yet.
         | Not even the largest GPT3. There are no guarantees on what they
         | will output and could be damaging to your business.
         | 
          | You're better off either:
          | 
          | 1. Defining common "intents" that a lot of customer queries
          | are categorized into, and having a model map the incoming
          | message to the appropriate canned response. Look at Rasa for
          | an example of this.
          | 
          | 2. If you insist on generating the text, have it be a
          | recommendation to a human agent who either chooses to send
          | it or writes their own response.
        
         | stellaathena wrote:
         | 200k emails is not enough to train a model from scratch. If you
         | check out the google colab file in the GPT-Neo repository, it
         | talks about how to fine-tune the model on data which is what
         | you want to do
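          | For intuition, fine-tuning is just continued training from
          | already-learned weights, usually at a lower learning rate. A
          | deliberately tiny sketch with a one-parameter model (my own
          | illustration, not an actual language model):

```python
# Toy pretrain/fine-tune loop for a 1-parameter model y = w * x.
def train(w, data, lr, steps):
    """Plain gradient descent on mean squared error, starting from w."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

pretrain_data = [(1.0, 3.0), (2.0, 6.0)]   # "general" data, slope 3
finetune_data = [(1.0, 2.5), (2.0, 5.0)]   # "your emails", slope 2.5

w = train(0.0, pretrain_data, lr=0.05, steps=200)  # pretraining
w = train(w, finetune_data, lr=0.01, steps=200)    # fine-tune, lower LR
print(round(w, 2))  # close to 2.5: adapted to the new data
```

          | With GPT-Neo the idea is the same, just with billions of
          | parameters and your 200k email passages as the second
          | dataset.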
        
       | choxi wrote:
       | Is there anything a non-AI researcher can do to help support this
       | project? Is there a way to donate money? Or could a software
       | engineer help with testing, tooling, or other kinds of
       | infrastructure?
       | 
       | I was really excited about OpenAI's original plan and still
       | believe that an open source solution is the best way to prevent
       | the potential negative impacts AI might have on society. I can
       | sort of appreciate why OpenAI went the route of going private and
       | trying to monetize their work instead, it might prevent people
       | from using their work nefariously and will probably provide them
       | with way more capital to continue their efforts. But, I trust
       | humanity as a collective more than any particular group of people
       | in the long run. I'm sure there are many others like me who would
       | be eager to help out if they could.
       | 
       | Edit: EleutherAI has a whole page on their site about how others
       | can contribute: https://www.eleuther.ai/get-involved/. I didn't
       | see anything about accepting donations though, if anyone involved
       | with the project was interested in setting up a crowdfunding
       | account somewhere I'd be eager to donate.
        
         | zmix wrote:
         | You may indirectly support the project by supporting the host,
         | that hosts their data, https://the-eye.eu
         | 
          | Right on the front they write:
          | 
          | > Hey there fans! We are currently looking for help funding
          | > large storage upgrades, if you want to help us serve more
          | > data see our donation options (crypto, etc). Thanks for
          | > reading, happy downloading!
        
           | stellaathena wrote:
           | The Eye has been a phenomenal partner and enables a lot of
           | what we do. In addition to providing terabytes of storage for
           | free, they also help us out with CPU from time to time.
        
         | punnerud wrote:
         | Indirectly they say you can donate money, in the form of
         | computation that can be rented: "As an independent
         | organization, we are dependent upon donations for our computing
         | costs. We are always interested in speaking with people who can
         | donate compute times."
        
       | godmode2019 wrote:
       | Following
        
       | dylanbyte wrote:
       | Curious to see what parameter size of gpt3 this will end up being
       | equivalent to. Obviously we won't know until they evaluate their
       | models.
        
         | Voloskaya wrote:
         | It's trained using the same architecture, and with a very
         | similar dataset, so it should be very close.
        
           | dylanbyte wrote:
           | My experience is that replicating papers is actually
           | nontrivial. For example someone announced they had replicated
            | gpt2 some time back, but when evals were run it turned out
            | to be the equivalent of a much smaller model.
        
             | [deleted]
        
       | vincentmarle wrote:
       | Does anyone know if there's a hosted version of this kind of GPT
       | model somewhere? All I want to do is just call a GPT-2 API and
       | get a response back, I'm not interested in setting up the entire
       | infrastructure by myself.
        
         | jalammar wrote:
         | Hugging Face has that service
         | 
         | https://huggingface.co/pricing
        
       | ignoranceprior wrote:
       | Are there any output samples? Would like to get a sense of how
       | close this is to GPT-2 vs. GPT-3 levels of coherency.
        
         | aeroscripts wrote:
         | I've been having pretty good luck generating all sorts of
         | stuff. There's a large chunk of text generated with the default
         | unicorns prompt here:
         | https://cdn.discordapp.com/attachments/823393531305525288/82...
         | 
         | It seems really good at generating code too
         | 
          | prompt:
          | 
          |     class ConvBlock(nn.Module):
          |         def __init__(self, channels, stride=1):
          |             super(ConvBlock, self).__init__()
          |             self.convolution = nn.Conv2d(channels//stride,
          |                 channels, 3, stride=stride, padding=1)
          |             self.normalization = nn.BatchNorm2d(channels)
          |             #self.activation = nn.LeakyReLU()
          | 
          |         def forward(self, x):
          | 
          | output:
          | 
          |             x =
          |             #  x = self.convolution(x)
          |             #  x = self.normalization(x)
          |             #  x = self.activation(x)
          |             x = self.convolution(x)
          |             x = self.normalization(x)
          |             x = self.activation(x)
          |             return x
          | 
          |     class Resblock(nn.Module):
          |         def __init__(self, stride=1):
          |             super(Resblock, self).__init__()
          |             self.res = nn.Sequential(
          |                 nn.Conv2d(9, 9, 3, stride=stride, padding=1),
          |                 nn.BatchNorm2d(9),
          |                 nn.LeakyReLU(0.2)
          |             )
          |             self.output = nn.MaxPool2d(2)
          | 
          |         def forward(self, x):
          |             return self.res(x)
         | 
         | Also, some "tongue twisters":
         | 
         | Big ugly stick.
         | 
         | Bowl full of chicken peas.
         | 
         | New York is universally inadvisable.
         | 
         | Long-eared seagull.
         | 
         | This morning I have a tumor in me.
         | 
         | A dog bark.
         | 
         | Invisible dog hides.
         | 
         | A crow blind.
         | 
         | A cat can't see if it had a billion eyes.
         | 
         | Bumblebee.
         | 
         | Sheep herder, sheep herder.
         | 
         | A fawns abducts.
         | 
         | Two black birds are trapped.
         | 
         | Bottle on her finger.
         | 
         | Elephant sees another elephant.
         | 
         | Bull.
         | 
         | Mice in a box in the library.
         | 
         | A church swelter.
         | 
         | The door of a hotel opens.
         | 
         | Bosnian honey melons.
         | 
         | Grapes in excess.
         | 
         | Cat is on the loose.
         | 
         | Soil is shoveled into a glass jar
        
       | loxias wrote:
       | I'd love to know the minimum hardware requirements to run
       | something like this locally.
        
       | krick wrote:
       | Are there some truly objective benchmarks to compare this to GPT
       | 2/3?
        
         | ipsum2 wrote:
         | yes, see GLUE or superGLUE benchmarks. It assumes the answers
         | have not been scraped and included in the dataset though.
        
         | clircle wrote:
         | I think this is an important problem. With logistic regression
         | or deep learning, at least one can compare (out of sample)
         | calibration curves or discrimination measures. With a language
         | model, what can we do?
        
           | bigpumpkin wrote:
           | perplexity score against a corpus such as wikipedia?
           | Basically how well the model predicts the next word.
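            | Concretely, perplexity is the exponential of the average
            | negative log-probability the model assigns to each actual
            | next token (a sketch; the probabilities below are made
            | up):

```python
import math

def perplexity(token_probs):
    """exp of the mean negative log-probability of the observed tokens."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# A model that always guesses uniformly over a 4-word vocabulary is
# "as confused as a 4-way choice": perplexity ~= 4.0.
print(perplexity([0.25, 0.25, 0.25, 0.25]))

# A model that puts more mass on the right words scores lower (better).
print(perplexity([0.9, 0.8, 0.5, 0.7]))
```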
        
             | rkimb wrote:
             | This is a good start, but given the breadth of applications
             | this would hardly give us enough to compare, as the goal of
             | these models isn't to simply recite Wikipedia articles.
             | What about language translation? Content summarization?
             | Code generation? Turing test performance?
        
             | stellaathena wrote:
             | Both models were trained on Wikipedia, so that's a
             | particularly bad choice. But yes, in practice this is what
             | people tend to do. Take results with a very large grain of
              | salt though, as the domain of the prompts you feed it
              | makes a huge difference.
        
       | PufPufPuf wrote:
       | Did anyone manage to successfully run inference in the provided
       | Google Colab (https://colab.research.google.com/github/EleutherAI
       | /GPTNeo/b...)? I can run training, but can't manage to make the
       | inference (even from a pre-trained model) work.
        
         | MasterScrat wrote:
         | Same here. I managed to make it "work" in the sense that it
         | wouldn't crash during inference, but then it generated
         | gibberish. Has anyone managed to make it work reliably?
        
           | aeroscripts wrote:
           | The problem in my case was "train_steps" in the model json
           | file. Default is 0. The notebook sets it to 401000 which
           | works.
        
         | stellaathena wrote:
         | Hi! Thanks for trying it out. There was a bug that should now
         | be fixed. When I run the example unicorn prompt I get the
          | following. Don't hesitate to open an issue if you're still
          | having trouble.
         | 
         | "In a shocking finding, scientists discovered a herd of
         | unicorns living in a remote, previously unexplored valley, in
         | the Andes Mountains. Even more surprising to the researchers
         | was the fact that the unicorns spoke perfect English.
         | 
         | Bebek Uranzoglu, another member of the research team from the
         | University of British Columbia, was working on a project the
         | Latino-Canadian rodeo competition equipos to document a rare
         | and remarkable ecosystem in the Andes Mountains.
         | 
         | His curiosity was piqued when he spotted an adolescent herd of
         | about 10 unicorns foraging in a forest near the valley of the
         | Jumbo Flu Group. The unicorns -- whose numbers once swelled to
         | 46,000 -- were perched on the forest floor and watched the
         | researchers work.
         | 
         | Urizoglu grew excited when he spotted another group that seemed
         | to be thriving in an area below the herd. The team hoped the
         | apparent population growth would indicate a human presence.
         | 
         | But when a team of researchers set up a camera trap, they were
         | surprised to find the unicorns in the first place, and in a
         | forest near a lake -- in fact the forest was almost entirely
         | made up of the animals. Despite their own magical presence, the
         | team could not see the herd was populated by humans.
         | 
         | "The whole place almost smelled like animals," says Bebek. "We
         | were never able to find human footprints at any of the points
         | we stood at. The trees were so large, you wouldn't have been
         | able to walk 40 meters through them. We assumed that the truth
         | of the matter was, 'Well the deer didn't like this forest at
         | all.'"
        
       | joshhart wrote:
       | I think we need more funding outside of large tech companies and
       | OpenAI for these kinds of things. I wonder if there is a way to
       | crowdsource donations to rent the hardware to train big versions
       | of these things in an open manner.
        
       | FeepingCreature wrote:
       | Is there something like chattingtransformer (
        | https://pypi.org/project/chattingtransformer/ ) for gpt-neo? I.e.
       | a trivial way to get text completion on a sample with sane
       | defaults from the commandline.
       | 
       | edit: Oh, I see the "generating text" section. Any way to run it
       | on CPU, even if it takes an hour?
        
         | victor9000 wrote:
         | Stella mentioned elsewhere in this thread that HuggingFace is
         | adding support for the Eleuther model, so text generation
         | should become trivial once this work is complete.
        
       | graiz wrote:
       | A great start for a truly open approach. It's ironic that OpenAI
       | isn't particularly open about its tech.
        
         | victor9000 wrote:
         | It was disappointing to see just how quickly ClosedAI changed
         | its tune once they produced something of value.
        
           | catillac wrote:
           | Is ClosedAI some counterpart to OpenAI with a clever name?
        
         | qPM9l3XJrF wrote:
         | Many people (correctly, in my view) criticized OpenAI for the
         | name, saying that openness should be evaluated on a case by
         | case basis. Glad they listened to critics instead of trying to
         | maintain consistency for its own sake.
        
       | scottisbrave84 wrote:
       | Amen
        
       | FL33TW00D wrote:
       | Whilst obviously BERT is not the same as GPT-3 in architecture,
        | Amazon's recent paper discussing architecture optimizations for
       | BERT seems pretty relevant here
       | (https://arxiv.org/pdf/2010.10499.pdf) given the chance to
        | improve upon GPT-3's architecture (because it surely isn't the
       | best we can get). Have the Eleuther.ai team been exploring this?
        
       | minimaxir wrote:
       | Per Twitter, there will be more info about model performance
       | tomorrow:
       | https://twitter.com/arankomatsuzaki/status/13737326454445793...
        
         | [deleted]
        
         | stellaathena wrote:
         | We've added a table with some evaluation scores to the GitHub
         | repo, and you can see a comparison between our scores, GPT-2,
         | and GPT-3 here:
         | https://twitter.com/BlancheMinerva/status/137399189661642752...
         | 
         | tl;dr we are doing pretty much exactly as well as we expected
         | on LAMBADA and WikiText. Results on more sophisticated tasks
         | will take some time, but HuggingFace is currently working on
         | implementing our model in the transformers library and when
         | they do so we can easily run a lot of analyses very quickly.
         | 
         | We actually built an evaluation suite that integrates with HF,
         | but interfacing with the MTF code that GPT-Neo was written in
         | was too much of a pain in the ass because Mesh TensorFlow is
         | the worst. https://github.com/EleutherAI/lm-evaluation-harness
        
       | monkeydust wrote:
        | If I wanted to build a support Q&A system using texts from
        | support logs, training docs, transcribed videos, etc.
        | (basically as much text about my product as I can get), would
        | this model be a good start?
        
         | malka wrote:
          | You should look at Hugging Face: https://huggingface.co/
          | 
          | More precisely:
          | https://huggingface.co/transformers/task_summary.html#extrac...
        
         | FL33TW00D wrote:
         | Depending on how much content you've got, this blog post from
         | HuggingFace might be interesting:
         | https://yjernite.github.io/lfqa.html
        
         | ggnore7452 wrote:
          | Also, for a quick and simple Q&A system, Haystack
          | (https://github.com/deepset-ai/haystack, essentially dense
          | vector similarity on Elasticsearch) looks pretty promising
          | and supports the whole pipeline.
        
       | The_rationalist wrote:
       | Does it leverage deepspeed/zero 3?
        
         | minimaxir wrote:
         | That's PyTorch only; the current models are TensorFlow.
        
           | The_rationalist wrote:
           | Oh that's unfortunate, can't the models be exported to
            | PyTorch through e.g. ONNX?
        
             | machinevision wrote:
              | Even small models can be a headache to export if they
              | use anything custom. I can't even imagine something the
              | size of GPT-3.
        
               | Voloskaya wrote:
               | Larger models aren't really more complicated than smaller
                | ones though. GPT-2 is already supported, and I believe
                | the only difference with GPT-3 is sparse attention.
        
               | minimaxir wrote:
               | The GPT-2 models included with Transformers can export to
               | ONNX fine w/ a helper included with the Python
               | onnxruntime.
        
             | minimaxir wrote:
             | To run in PyTorch the model architecture must be ported.
             | 
             | ONNX is slightly different; you could export the model to
             | run in the onnxruntime but that has tradeoffs.
        
             | stellaathena wrote:
             | There's a PyTorch + DeepSpeed repository here:
             | https://github.com/EleutherAI/gpt-neox
        
         | MasterScrat wrote:
         | GPT-NeoX, which is a model from the same group but using GPUs
         | instead of TPUs, uses techniques from DeepSpeed:
         | 
         | https://github.com/EleutherAI/gpt-neox/
        
       | [deleted]
        
       | Bostonian wrote:
       | Is something with billions of parameters actually a "model"? I
       | guess the answer is yes if the data set is even larger than that?
        
       | pjfin123 wrote:
       | GPT Paper:
       | 
       | "Specifically, we train GPT-3, an autoregressive language model
       | with 175 billion parameters"
       | 
       | README:
       | 
       | "1T or bust my dudes [...] An implementation of model & data
       | parallel GPT2 & GPT3 -like models, with the ability to scale up
       | to full GPT3 sizes (and possibly more!)"
       | 
        | It seems the largest model they released is 2.7 billion
        | parameters, or ~1.5% the size of GPT-3. The most interesting part
       | about GPT-3 was its size and it seems this is only "GPT-3-like"
       | in architecture.
       | 
       | I also have a translation library with ~100 million (0.001 GPT-3)
       | parameters:
       | 
       | https://github.com/argosopentech/argos-translate
        
         | f430 wrote:
          | What does he mean when he says 1T or bust? Is he referring
          | to 1 trillion parameters? Are you saying that GPT-3 has 2.7
          | trillion parameters? Does it mean that to get to GPT-3 level
          | it needs a 100x larger dataset?
        
           | pjfin123 wrote:
            | I interpreted it as aspiring to a trillion parameters,
            | but I'm not sure.
        
           | jon_tau wrote:
            | The saying comes from a slide by Noam Shazeer (see:
            | https://www.youtube.com/watch?v=HgGyWS40g-g&ab_channel=Tenso...).
            | It just means the current goal should be to have models with 1
           | trillion parameters.
        
           | Voloskaya wrote:
            | GPT-3 has 175 billion parameters, so they need to scale
            | up by ~65x. They already have a comparable amount of data
            | to what was used by OpenAI, so it's mostly about scaling
            | the number of GPUs.
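[Ed.: for reference, the arithmetic behind that scale factor, which comes out to roughly 65x:]

```python
# Ratio of full-size GPT-3 (175B parameters) to the largest
# released GPT-Neo model (2.7B parameters).
scale = 175e9 / 2.7e9
print(f"GPT-3 is ~{scale:.0f}x larger")
```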
        
             | f430 wrote:
              | I see, so does that mean this GPT Neo is 64x less
              | powerful?
        
               | Voloskaya wrote:
                | Accuracy and number of parameters don't scale
                | linearly together, and it varies widely depending on
                | exactly what you are measuring accuracy on. But a very
                | approximate rule of thumb would be that accuracy
                | scales with the log of the parameter count (for the
                | same architecture).
        
         | stellaathena wrote:
         | GPT-3 is a model architecture, not a model. While the largest
         | GPT-3 model is 175B, that very paper has a table that includes
         | "GPT-3 XL" (1.3B) and "GPT-3 2.7B" as models in the GPT-3
         | architecture. The 2.7B model is the same size as Ada, a model
         | that OpenAI currently sells API access to under the moniker
         | "GPT-3"
        
           | Dylan16807 wrote:
            | None of the other models are even close to the big one,
            | and the paper also suggests calling the big one "GPT-3",
            | as people very often do in practice. So the term is often
            | ambiguous, but saying it _only_ means the architecture
            | isn't right either.
        
       ___________________________________________________________________
       (page generated 2021-03-22 23:03 UTC)