[HN Gopher] GPT Neo: open-source GPT model, with pretrained 1.3B...
___________________________________________________________________
GPT Neo: open-source GPT model, with pretrained 1.3B & 2.7B weight
models
Author : pizza
Score : 533 points
Date : 2021-03-21 21:01 UTC (1 day ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| ve55 wrote:
| This is a nice release, but the title is a bit misleading, as the
| released sizes (1.3B and 2.7B parameters) do not yet compare to
| GPT-3 (175B); they are closer to GPT-2 (1.5B) instead (although
| future releases may have significantly more!).
|
| Edit: title improved, thank you!
| nl wrote:
| Yeah. They say they are doing a 10B release soon[1].
|
| I suspect they have run into training issues, since they are
| moving to a new repo[2].
|
| [1]
| https://twitter.com/arankomatsuzaki/status/13737326468119674...
|
| [2] https://github.com/EleutherAI/gpt-neox/
| chillee wrote:
| It's more about hardware - these models were trained on TPUs,
| while GPT-NeoX is being trained on GPUs graciously provided
| by Coreweave.
| orra wrote:
| Any idea what the required GPU time would cost (if not
| donated)? Is GPT-3 just a commodity soon?
| minimaxir wrote:
| With training improvements such as DeepSpeed, the GPU
| costs will likely be substantially lower than what was
| available at the time OpenAI trained GPT-3. Still not
| free, though.
|
| The hard part with GPT-3 is it's big enough to make it
| difficult to actually _deploy_.
| stellaathena wrote:
| Our current estimate is that it requires between 2000 and
| 4000 V100 months.
| Voloskaya wrote:
| ~4M$ per full training give or take.
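| That lines up with the V100-month estimate above. A rough
| back-of-envelope sketch (assuming on-demand cloud pricing of
| roughly $1.50-2.50 per V100-hour, which varies a lot by
| provider):
|
|     # rough cost sketch: V100-months * hours/month * $/hour
|     hours_per_month = 730
|     for v100_months in (2000, 4000):
|         for usd_per_hour in (1.5, 2.5):
|             cost = v100_months * hours_per_month * usd_per_hour
|             print(f"{v100_months} V100-months @ ${usd_per_hour}/hr "
|                   f"~= ${cost / 1e6:.1f}M")
|     # lands somewhere in the roughly $2M-7M range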
| teruakohatu wrote:
| The number thrown around for GPT-3 is $4.6 million, but I am
| not sure where that figure originates.
| minimaxir wrote:
| It was a number tossed around by a GPU hosting provider,
| based on their own costs:
| https://lambdalabs.com/blog/demystifying-gpt-3/
|
| The reality is that GPT-3 was likely "free" to train on
| Azure, as Microsoft has provided a lot of resources to
| OpenAI.
| pizza wrote:
| Fixed title to reflect that, thanks
| ve55 wrote:
| I would perhaps change 'GPT-3' to just say 'GPT' instead, as
| a more salient fix.
| stellaathena wrote:
| GPT-3 isn't a single model. It's a model architecture that
| is very closely followed by GPT-Neo. The 2.7B model is the
| exact same size as something OpenAI sells under the label
| "GPT-3"
| ve55 wrote:
| My line of thinking was that for the average HN reader,
| who has probably read 'GPT-3' perhaps 500 times by now
| (every instance of which was referencing OpenAI's
| infamous 175B model), it may be confusing for them to see
| this with the same label, when the release is not
| comparable as far as parameters/performance (yet). But as
| yourself and another commenter noted, it is still the
| GPT-3 architecture (or hopefully isomorphic to it), so I
| appreciate your correction as well.
| stellaathena wrote:
| That's fair. I also later learned that the title didn't
| explicitly mention model size at first, and I would have
| probably raised similar complaints had I seen that.
| Dylan16807 wrote:
| Is GPT-2's architecture any different?
| stellaathena wrote:
| Not hugely, but yes. I tend to think of GPT as a style of
| architecture with consistent themes and major features,
| but varying minor features and implementation details.
| Off the top of my head, I believe the most important
| difference is that GPT-3 alternates global and local
| attention while GPT-2 is all global attention.
|
| The two published GPT-Neo models follow GPT-3's lead but
| the repo lets the user pick whether to use global or
| local attention layers.
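| As a toy sketch of what that alternating pattern means (this is
| not the actual GPT-Neo config format, just an illustration):
|
|     n_layers = 24
|     # even layers attend globally, odd layers use a local window
|     attention_pattern = ["global" if i % 2 == 0 else "local"
|                          for i in range(n_layers)]
|     print(attention_pattern[:4])  # ['global', 'local', 'global', 'local']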
| nl wrote:
| This is incorrect. It's the GPT-3 model architecture and
| optimisations, and uses training techniques similar to
| GPT-3.
| ve55 wrote:
| Thank you, I've rephrased a few things to improve the
| wording with respect to this.
| sillysaurusx wrote:
| Please test their models before you take them at face value.
|
| Eleuther has a history of claiming to replicate projects when
| they haven't. For example, they shipped a DALL-E repo a few days
| after OpenAI announced it
| (https://twitter.com/theshawwn/status/1348017515897659392) which
| was broken, and they've walked back their GPT-3 replication
| claims to replicating 1.5B due to the fact that their
| architecture doesn't scale.
|
| As far as I can tell, they're generating a large amount of hype
| with grandiose claims that they can't deliver on.
|
| All I care about is whether you like their models and actually
| use them in practice. If you do, please let me know and I'll pipe
| down. But so far, I haven't heard of anyone who uses anything
| they've produced, and that worries me. Has anyone?
|
| One specific claim they made:
| https://twitter.com/BlancheMinerva/status/134727697554780980...
|
| "DALL-E is quite straight forward and already coded. We just need
| data to train it."
|
| No, DALL-E is neither straightforward nor was it successfully
| coded, especially back on January 7th.
|
| Anyway, carry on. I really don't like speaking badly of AI
| projects, and I hope that they succeed. The model release today
| is a good step forward, assuming it works. But it might be better
| to have the expectation of "the models don't work" until proven
| otherwise.
|
| I'd also like to point out that there are some capable people
| doing work at Eleuther. Sid in particular is one of the best TPU
| hackers in the scene. I just wish they would scale down their
| claims, release more models, and not claim that they've done X
| until actually doing X. For example, the readme says they have
| "the ability to scale up to full GPT3 sizes (and possibly more!),
| using the mesh-tensorflow library," which they don't.
| [deleted]
| 6gvONxR4sf7o wrote:
| People are always claiming to release replicated models by
| replicating the architecture (or main parts of it) but not
| testing whether it produces the same level of results. It's
| maddening, especially when the level of results is so directly
| measurable (just measure what the paper did, not that it's
| easy, just concrete).
| cookiengineer wrote:
| What I find interesting about their marketing(?) is that they
| identified a market niche that they want to position themselves
| in.
|
| Enterprise customers that have no idea about the technical
| details will just hear about OpenAI's success in this fancy new
| model and assume that Eleuther can deliver.
|
| I mean, most use cases for "big data" projects that are tiny in
| comparison with Alphabet's datasets will probably work just fine
| with GPT-2.
|
| And for enterprise customers who hear those claims, seeing some
| code, maybe some demo, is enough for them to start the
| consultancy process.
|
| In my opinion that's a policy problem that OpenAI introduced by
| not requiring absolute reproducibility of both the code and the
| model, and of both the training procedure and the dataset, upon
| release.
|
| Stakes are pretty high in the AI industry, and OpenAI actively
| influences it. In the beginning, my dream was that they would be
| a source of verification, audits and "proof" that models are
| legit... yet lately I have the feeling that they just buzz
| around like everyone else.
|
| To this date I haven't seen anyone replicate any of the DNC
| results, for example.
|
| Anyways, just my two cents on this one.
| leogao wrote:
| To date, EleutherAI as an "organization" (read: basically a
| Discord server) has not really attempted any kind of
| marketing. It has no PR dept, just individuals tweeting about
| the work that Eleuther does.
| ma2rten wrote:
| Also I'm pretty sure there is no consultancy process.
| [deleted]
| loxias wrote:
| Geez, that's really harsh.
|
| I don't think any single thing you've claimed is factually
| wrong, and I don't speak for Eleuther nor am I attempting to
| justify their claims.
|
| But.
|
| As I understand it (mostly from lurking on their discord and
| reading publicly available materials) this is a group of
| volunteer academic types trying to replicate something great
| and awesome, with the only goal of giving it to the world. You
| could cut them some slack.
|
| I can't speak for you, but as a "for free, weekend project"
| what they've done certainly makes me feel I need to up my game.
| OgAstorga wrote:
| This has nothing to do with the good work, awesome
| intentions, nor the fact that they have no financial
| incentives behind this.
|
| Claiming something that is not true is in itself wrong.
| loxias wrote:
| > Claiming something that is not true is in itself wrong.
|
| I 100% agree with this.
|
| I also think that one catches more flies with honey than
| vinegar, and the criticism in the parent comment, while
| possibly valid, could be phrased more encouragingly and
| less combatively. It's easy to criticize, it's hard to
| create, and it's even harder to release.
| wiz21c wrote:
| > Claiming something that is not true is in itself wrong.
|
| yup, in any project, and especially one done for the community,
| where the only things you get are satisfaction and fame, success
| is super tied to communication. Good, honest communication is
| what builds trust.
| regisg wrote:
| "Claiming something that is not true is in itself wrong."
| Does this apply to butterflies disguising themselves as
| dead leaves?
| [deleted]
| stellaathena wrote:
| I am sorry that I was misinformed about the state of our
| DALL-E replication when I made that tweet. It was not
| malicious - I was reporting what I had been told by someone
| else.
|
| Yes, I was wrong. That said, I had hoped that maybe after two
| and a half months Shawn would stop holding it over my head.
| nullc wrote:
| Wouldn't it be nice if OpenAI were like .. actually open? :P
| brunoluiz wrote:
| Considering the social consequences of the last decade due to
| easy access to APIs and data, I am quite happy that these
| initiatives are cautious about opening up software which can
| have a huge impact on society.
| hesdeadjim wrote:
| Unfortunately, the cat is out of the bag. Their methods are
| documented and the results exciting, so to a bad actor
| (especially state-sponsored) it's completely justified to
| spend millions attempting to replicate their results from
| what is publicly available.
| ryanackley wrote:
| Not just that. To even get access to their API, you need to
| apply. That is the future of AI, I'm afraid, without projects
| like this: elites controlling AI and deciding who is "worthy"
| to use it.
|
| I'm sure they have the best of intentions, but "worthiness" is
| subjective.
| TremendousJudge wrote:
| Depends on who you mean by "they". If you mean the
| researchers, then sure, they probably actually believe
| whatever's written in their ethics statement.
|
| Now, the actual owners? I don't believe it for a second.
| You-Are-Right wrote:
| Freedom is Slavery.
| natch wrote:
| Good perspective but I would like to hear the response from the
| developers before concluding too much.
|
| This is not meant as a goad to you, but more just as more info
| for everyone: my understanding is that it is an open-source,
| community-of-like-minded-people type of project (as opposed to
| a bigco) and that it actively solicits contributions (by which
| I mean code and data), so anyone seeing room for improvement is
| welcome to step in, from what I can tell.
|
| I did find your comment helpful and informative; just adding
| another angle here.
| stellaathena wrote:
| It's literally a couple of people hanging out in a Discord
| channel and doing this as a way of procrastinating on their jobs.
| mirekrusin wrote:
| Peak of laziness: build AI to do your job so you have more
| time to build AI.
| ImprobableTruth wrote:
| Disclaimer: I know absolutely nothing about machine learning.
|
| Isn't GPT-3 the architecture? Are they doing something
| different or why would it not scale?
| nmfisher wrote:
| GPT-3 is the name for the architecture, but there are a few
| different versions/sizes. The OpenAI version that impressed
| us all was ~175B parameters; this is far smaller.
|
| To go from 2.7B to 175B parameters will need more than just a
| few config tweaks. There's a whole bunch of hacks and tricks
| needed to coax a model to train at that scale; the Eleuther
| version is almost guaranteed to fail out-of-the-box.
| minimaxir wrote:
| It's worth noting that the GPT-3 paper did train models
| with more sane sizes (e.g. 1.5B) as a point of comparison.
| I am surprised/annoyed they never released them though.
| stellaathena wrote:
| It's because OpenAI sells them for profit. The "Ada"
| model is the same size as the larger of these two
| EleutherAI models.
| minimaxir wrote:
| Huh, I was wondering what the sizes of the non-davinci
| models were; guess that makes sense.
|
| It's still telling that a "small" GPT-3 model can risk
| cannibalizing a larger model.
| stellaathena wrote:
| Ada is 2.7B, Babbage is 6.7B, Curie is 13.0B, and DaVinci
| is 175B. The new one they announced last month is in the
| 20-50B range I think, not totally sure though.
| nl wrote:
| Almost _all_ the challenges with GPT-sized models are
| engineering and training challenges, not architectural.
|
| How _do_ you train a model too big to fit in a single GPU?
| It's doable, but not simple. How do you update weights across
| your cluster? etc etc
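| A minimal sketch of the naive model-parallel idea in PyTorch
| (assuming two CUDA devices are available; real setups use
| mesh-tensorflow, DeepSpeed, Megatron-style tensor parallelism,
| etc., not hand-placed layers like this):
|
|     import torch.nn as nn
|
|     class TwoDeviceNet(nn.Module):
|         """Split layers across two GPUs so the model need not fit on one."""
|         def __init__(self, d=4096):
|             super().__init__()
|             self.part1 = nn.Linear(d, d).to("cuda:0")
|             self.part2 = nn.Linear(d, d).to("cuda:1")
|
|         def forward(self, x):
|             x = self.part1(x.to("cuda:0"))
|             # activations must be shipped between devices every step
|             return self.part2(x.to("cuda:1"))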
| sendtown_expwy wrote:
| I would guess that an average FAANG ML engineer could code up
| and successfully execute a forward/backward pass on a GPT-1
| or GPT-2 model with a day of effort or less. (GPT-3 a little
| harder, but not significantly). But is that model actually
| going to perform well? Most likely no. Model performance
| varies significantly due to subtle details in data processing
| implementations, seemingly insignificant details in code, and
| even from different numerical methods of calculating the same
| semantics.
|
| If you don't believe me, consider that many ML researchers
| track their commits (or exact code versions) extremely
| carefully, because oftentimes they will make some change (or
| changes) they think are inconsequential and later find that
| actually, their model broke. If they made too many changes,
| whoops, guess you have to binary search over the diff to see
| what happened since your last "good run".
|
| If the people who spent months (if not years) tuning a model
| can't tell whether it will work from the code, how could
| anyone else? Most ML researchers will not bother with most
| code that doesn't give proof of results (in terms of a model
| that can actually be evaluated) because it is just so
| unlikely that it will actually work well. Now, it might
| "work" in the sense that it converges and does something when
| you prompt it with examples. But will this GPT-3
| reimplementation actually outperform say, the 10x smaller T5
| checkpoint that was released by Google, or the other smaller
| language models others have released? If it doesn't, it's
| hard to argue that it's very useful at all.
|
| I think that's the spirit of why the original commenter said
| what they did, but I still do applaud the efforts of this
| team (and hope that their implementation is, in fact, highly
| performant!)
| m00x wrote:
| It's the model, not the architecture, but you could say the
| model contains the architecture.
| f430 wrote:
| Can somebody explain to this beginner how to use this? Where can
| I load this code and start running it? How can I train it on a
| dataset, and what do I need to prepare?
|
| There's a lot of language here I don't understand. For example,
| what is he referring to when he says 1.5B or 1T weights?
|
| What resources/videos can I watch in order to start tinkering
| with this?
| vertis wrote:
| The repository readme actually includes a link to a notebook[1]
| that helps getting started on Google colab. It's as good a
| place to start as any:
|
| [1]:
| https://colab.research.google.com/github/EleutherAI/GPTNeo/b...
| f430 wrote:
| Thanks, I've never used this before. Do I have to add a credit
| card? How much will it cost to run this?
| zora_goron wrote:
| Colab is free to use -- you can click Runtime - Run All to
| run the cells in the notebook free-of-charge. (You may need
| to be logged in to a Google Account to run it.)
| f430 wrote:
| Very cool! Side question, but is there a complete guide to
| learning PyTorch with Colab?
|
| I tried to learn ML a few years ago but gave up because I
| couldn't install CUDA on my machine for some reason. The
| landscape seems to have changed dramatically.
|
| I am interested in transformers, in particular in completing
| incomplete images like what
| https://openai.com/blog/image-gpt/ does. Is there a
| project that implements that and would let me start
| training?
|
| I'm excited, but I just get overwhelmed about where I need
| to focus my attention.
|
| My goal is to use something like image-gpt but for a
| narrower domain (e.g. only dealing with cats). How can
| I build my knowledge and skills towards that goal?
|
| Many thanks for your answers; I'm really looking forward to
| learning this stuff.
| p1esk wrote:
| Your questions are easily googleable, but if you insist,
| start at pytorch.org
| f430 wrote:
| I'm sure they are
| [deleted]
| leogao wrote:
| Could the title of this post be changed to emphasize that the
| model sizes released were 1.3B and 2.7B? Something like
| "EleutherAI releases 1.3B and 2.7B parameter GPT-like language
| models". The current title implies that a full-sized GPT-3 model
| is currently available, which is not the case.
|
| edit: the title has been changed, seems good enough
| flemhans wrote:
| So I would want to use a model trained on a big corpus, like
| GPT-3 or this newfangled "Neo" thing, but still have it trained
| to respond to our own customers based on 200k email passages.
|
| How do I create a hybrid?
| minimaxir wrote:
| You fine-tune an existing pretrained model on your proprietary
| dataset.
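| With the Hugging Face transformers library that looks roughly
| like the sketch below (GPT-2 as a stand-in, since GPT-Neo isn't
| in transformers yet, and "emails.txt" is a hypothetical
| plain-text file of your passages):
|
|     from transformers import (GPT2LMHeadModel, GPT2TokenizerFast,
|                               TextDataset,
|                               DataCollatorForLanguageModeling,
|                               Trainer, TrainingArguments)
|
|     tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
|     model = GPT2LMHeadModel.from_pretrained("gpt2")
|
|     # causal LM fine-tuning on your own text
|     dataset = TextDataset(tokenizer=tokenizer,
|                           file_path="emails.txt", block_size=128)
|     collator = DataCollatorForLanguageModeling(tokenizer=tokenizer,
|                                                mlm=False)
|     args = TrainingArguments(output_dir="gpt2-emails",
|                              num_train_epochs=1,
|                              per_device_train_batch_size=2)
|     Trainer(model=model, args=args, data_collator=collator,
|             train_dataset=dataset).train()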
| jalammar wrote:
| I wouldn't trust any model to generate text for customers yet.
| Not even the largest GPT-3. There are no guarantees on what they
| will output, and it could be damaging to your business.
|
| You're better off either:
|
| 1. Defining common "intents" that a lot of customer queries are
| categorized into, and having a model map the incoming message to
| the appropriate canned response. Look at Rasa for an example of
| this.
|
| 2. If you insist on generating the text, having it be a
| recommendation to a human agent who either chooses to send it
| or writes their own response.
| stellaathena wrote:
| 200k emails is not enough to train a model from scratch. If you
| check out the Google Colab file in the GPT-Neo repository, it
| talks about how to fine-tune the model on data, which is what
| you want to do.
| choxi wrote:
| Is there anything a non-AI researcher can do to help support this
| project? Is there a way to donate money? Or could a software
| engineer help with testing, tooling, or other kinds of
| infrastructure?
|
| I was really excited about OpenAI's original plan and still
| believe that an open source solution is the best way to prevent
| the potential negative impacts AI might have on society. I can
| sort of appreciate why OpenAI went the route of going private and
| trying to monetize their work instead, it might prevent people
| from using their work nefariously and will probably provide them
| with way more capital to continue their efforts. But, I trust
| humanity as a collective more than any particular group of people
| in the long run. I'm sure there are many others like me who would
| be eager to help out if they could.
|
| Edit: EleutherAI has a whole page on their site about how others
| can contribute: https://www.eleuther.ai/get-involved/. I didn't
| see anything about accepting donations though, if anyone involved
| with the project was interested in setting up a crowdfunding
| account somewhere I'd be eager to donate.
| zmix wrote:
| You may indirectly support the project by supporting the host,
| that hosts their data, https://the-eye.eu
|
| Right on the front they write: > Hey there
| fans! We are currently looking for help funding large storage
| upgrades, > if you want to help us serve more data see
| our donation options (crypto, etc) > Thanks for
| reading, happy downloading!
| stellaathena wrote:
| The Eye has been a phenomenal partner and enables a lot of
| what we do. In addition to providing terabytes of storage for
| free, they also help us out with CPU from time to time.
| punnerud wrote:
| Indirectly they say you can donate money, in the form of
| computation that can be rented: "As an independent
| organization, we are dependent upon donations for our computing
| costs. We are always interested in speaking with people who can
| donate compute times."
| godmode2019 wrote:
| Following
| dylanbyte wrote:
| Curious to see what parameter size of GPT-3 this will end up
| being equivalent to. Obviously we won't know until they evaluate
| their models.
| Voloskaya wrote:
| It's trained using the same architecture, and with a very
| similar dataset, so it should be very close.
| dylanbyte wrote:
| My experience is that replicating papers is actually
| nontrivial. For example, someone announced they had replicated
| GPT-2 some time back, but when evals were run it turned out
| to be the equivalent of a much smaller model.
| [deleted]
| vincentmarle wrote:
| Does anyone know if there's a hosted version of this kind of GPT
| model somewhere? All I want to do is just call a GPT-2 API and
| get a response back, I'm not interested in setting up the entire
| infrastructure by myself.
| jalammar wrote:
| Hugging Face has that service
|
| https://huggingface.co/pricing
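| Their hosted Inference API is a plain HTTP endpoint; roughly
| this (a sketch: you need an API token from your account, and
| the exact response shape can differ by model):
|
|     import requests
|
|     API_URL = "https://api-inference.huggingface.co/models/gpt2"
|     headers = {"Authorization": "Bearer YOUR_HF_API_TOKEN"}
|
|     resp = requests.post(API_URL, headers=headers,
|                          json={"inputs": "The meaning of life is"})
|     print(resp.json())  # typically a list with "generated_text"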
| ignoranceprior wrote:
| Are there any output samples? Would like to get a sense of how
| close this is to GPT-2 vs. GPT-3 levels of coherency.
| aeroscripts wrote:
| I've been having pretty good luck generating all sorts of
| stuff. There's a large chunk of text generated with the default
| unicorns prompt here:
| https://cdn.discordapp.com/attachments/823393531305525288/82...
|
| It seems really good at generating code too
|
| prompt:
|
|     class ConvBlock(nn.Module):
|         def __init__(self, channels, stride=1):
|             super(ConvBlock, self).__init__()
|             self.convolution = nn.Conv2d(channels//stride, channels, 3,
|                 stride=stride, padding=1)
|             self.normalization = nn.BatchNorm2d(channels)
|             #self.activation = nn.LeakyReLU()
|         def forward(self, x):
|
| output:
|
|             x = # x = self.convolution(x)
|             # x = self.normalization(x)
|             # x = self.activation(x)
|             x = self.convolution(x)
|             x = self.normalization(x)
|             x = self.activation(x)
|             return x
|
|     class Resblock(nn.Module):
|         def __init__(self, stride=1):
|             super(Resblock, self).__init__()
|             self.res = nn.Sequential(
|                 nn.Conv2d(9, 9, 3, stride=stride, padding=1),
|                 nn.BatchNorm2d(9),
|                 nn.LeakyReLU(0.2)
|             )
|             self.output = nn.MaxPool2d(2)
|         def forward(self, x):
|             return self.res(x)
|
| Also, some "tongue twisters":
|
| Big ugly stick.
|
| Bowl full of chicken peas.
|
| New York is universally inadvisable.
|
| Long-eared seagull.
|
| This morning I have a tumor in me.
|
| A dog bark.
|
| Invisible dog hides.
|
| A crow blind.
|
| A cat can't see if it had a billion eyes.
|
| Bumblebee.
|
| Sheep herder, sheep herder.
|
| A fawns abducts.
|
| Two black birds are trapped.
|
| Bottle on her finger.
|
| Elephant sees another elephant.
|
| Bull.
|
| Mice in a box in the library.
|
| A church swelter.
|
| The door of a hotel opens.
|
| Bosnian honey melons.
|
| Grapes in excess.
|
| Cat is on the loose.
|
| Soil is shoveled into a glass jar
| loxias wrote:
| I'd love to know the minimum hardware requirements to run
| something like this locally.
| krick wrote:
| Are there some truly objective benchmarks to compare this to
| GPT-2/3?
| ipsum2 wrote:
| Yes, see the GLUE or SuperGLUE benchmarks. This assumes the
| answers have not been scraped and included in the training
| dataset, though.
| clircle wrote:
| I think this is an important problem. With logistic regression
| or deep learning, at least one can compare (out of sample)
| calibration curves or discrimination measures. With a language
| model, what can we do?
| bigpumpkin wrote:
| Perplexity score against a corpus such as Wikipedia? Basically,
| how well the model predicts the next word.
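| A rough sketch of how that is computed with a causal LM
| (perplexity is just exp of the average per-token cross-entropy
| loss; GPT-2 used here as a stand-in model):
|
|     import torch
|     from transformers import GPT2LMHeadModel, GPT2TokenizerFast
|
|     tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
|     model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
|
|     text = "Some held-out evaluation text goes here."
|     enc = tokenizer(text, return_tensors="pt")
|     with torch.no_grad():
|         # passing labels makes the model return the LM loss
|         loss = model(**enc, labels=enc["input_ids"]).loss
|     print("perplexity:", torch.exp(loss).item())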
| rkimb wrote:
| This is a good start, but given the breadth of applications
| this would hardly give us enough to compare, as the goal of
| these models isn't to simply recite Wikipedia articles.
| What about language translation? Content summarization?
| Code generation? Turing test performance?
| stellaathena wrote:
| Both models were trained on Wikipedia, so that's a
| particularly bad choice. But yes, in practice this is what
| people tend to do. Take results with a very large grain of
| salt though, as the domain of the prompts you feed it makes
| a huge difference.
| PufPufPuf wrote:
| Did anyone manage to successfully run inference in the provided
| Google Colab (https://colab.research.google.com/github/EleutherAI
| /GPTNeo/b...)? I can run training, but can't manage to make the
| inference (even from a pre-trained model) work.
| MasterScrat wrote:
| Same here. I managed to make it "work" in the sense that it
| wouldn't crash during inference, but then it generated
| gibberish. Has anyone managed to make it work reliably?
| aeroscripts wrote:
| The problem in my case was "train_steps" in the model JSON
| file. The default is 0. The notebook sets it to 401000, which
| works.
| stellaathena wrote:
| Hi! Thanks for trying it out. There was a bug that should now
| be fixed. When I run the example unicorn prompt I get the
| following. Don't hesitate to open an issue if you're still having
| trouble.
|
| "In a shocking finding, scientists discovered a herd of
| unicorns living in a remote, previously unexplored valley, in
| the Andes Mountains. Even more surprising to the researchers
| was the fact that the unicorns spoke perfect English.
|
| Bebek Uranzoglu, another member of the research team from the
| University of British Columbia, was working on a project the
| Latino-Canadian rodeo competition equipos to document a rare
| and remarkable ecosystem in the Andes Mountains.
|
| His curiosity was piqued when he spotted an adolescent herd of
| about 10 unicorns foraging in a forest near the valley of the
| Jumbo Flu Group. The unicorns -- whose numbers once swelled to
| 46,000 -- were perched on the forest floor and watched the
| researchers work.
|
| Urizoglu grew excited when he spotted another group that seemed
| to be thriving in an area below the herd. The team hoped the
| apparent population growth would indicate a human presence.
|
| But when a team of researchers set up a camera trap, they were
| surprised to find the unicorns in the first place, and in a
| forest near a lake -- in fact the forest was almost entirely
| made up of the animals. Despite their own magical presence, the
| team could not see the herd was populated by humans.
|
| "The whole place almost smelled like animals," says Bebek. "We
| were never able to find human footprints at any of the points
| we stood at. The trees were so large, you wouldn't have been
| able to walk 40 meters through them. We assumed that the truth
| of the matter was, 'Well the deer didn't like this forest at
| all.'"
| joshhart wrote:
| I think we need more funding outside of large tech companies and
| OpenAI for these kinds of things. I wonder if there is a way to
| crowdsource donations to rent the hardware to train big versions
| of these things in an open manner.
| FeepingCreature wrote:
| Is there something like chattingtransformer (
| https://pypi.org/project/chattingtransformer/ ) for GPT-Neo? I.e.
| a trivial way to get text completion on a sample with sane
| defaults from the command line.
|
| edit: Oh, I see the "generating text" section. Is there any way
| to run it on CPU, even if it takes an hour?
| victor9000 wrote:
| Stella mentioned elsewhere in this thread that HuggingFace is
| adding support for the Eleuther model, so text generation
| should become trivial once this work is complete.
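| Once that lands it should be about this much code (a sketch;
| the model identifier "EleutherAI/gpt-neo-2.7B" is a guess at
| what the hub name will be, and device=-1 forces CPU, which is
| slow but works):
|
|     from transformers import pipeline
|
|     generator = pipeline("text-generation",
|                          model="EleutherAI/gpt-neo-2.7B",
|                          device=-1)  # -1 = CPU
|     out = generator("The meaning of life is",
|                     max_length=50, do_sample=True)
|     print(out[0]["generated_text"])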
| graiz wrote:
| A great start for a truly open approach. It's ironic that OpenAI
| isn't particularly open about its tech.
| victor9000 wrote:
| It was disappointing to see just how quickly ClosedAI changed
| its tune once they produced something of value.
| catillac wrote:
| Is ClosedAI some counterpart to OpenAI with a clever name?
| qPM9l3XJrF wrote:
| Many people (correctly, in my view) criticized OpenAI for the
| name, saying that openness should be evaluated on a case by
| case basis. Glad they listened to critics instead of trying to
| maintain consistency for its own sake.
| scottisbrave84 wrote:
| Amen
| FL33TW00D wrote:
| Whilst obviously BERT is not the same as GPT-3 in architecture,
| Amazon's recent paper discussing architecture optimizations for
| BERT seems pretty relevant here
| (https://arxiv.org/pdf/2010.10499.pdf), given the chance to
| improve upon GPT-3's architecture (because it surely isn't the
| best we can get). Have the Eleuther.ai team been exploring this?
| minimaxir wrote:
| Per Twitter, there will be more info about model performance
| tomorrow:
| https://twitter.com/arankomatsuzaki/status/13737326454445793...
| [deleted]
| stellaathena wrote:
| We've added a table with some evaluation scores to the GitHub
| repo, and you can see a comparison between our scores, GPT-2,
| and GPT-3 here:
| https://twitter.com/BlancheMinerva/status/137399189661642752...
|
| tl;dr we are doing pretty much exactly as well as we expected
| on LAMBADA and WikiText. Results on more sophisticated tasks
| will take some time, but HuggingFace is currently working on
| implementing our model in the transformers library and when
| they do so we can easily run a lot of analyses very quickly.
|
| We actually built an evaluation suite that integrates with HF,
| but interfacing with the MTF code that GPT-Neo was written in
| was too much of a pain in the ass because Mesh TensorFlow is
| the worst. https://github.com/EleutherAI/lm-evaluation-harness
| monkeydust wrote:
| If I wanted to build a support Q&A system using texts from
| support logs, training docs, transcribed videos, etc.
| (basically as much text about my product as I can get), would
| this model be a good start?
| malka wrote:
| You should look at Hugging Face: https://huggingface.co/
|
| More precisely:
| https://huggingface.co/transformers/task_summary.html#extrac...
| FL33TW00D wrote:
| Depending on how much content you've got, this blog post from
| HuggingFace might be interesting:
| https://yjernite.github.io/lfqa.html
| ggnore7452 wrote:
| Also, for a quick and simple Q&A system, Haystack
| https://github.com/deepset-ai/haystack (essentially dense
| vector similarity on Elasticsearch) looks pretty promising and
| supports the whole pipeline.
| The_rationalist wrote:
| Does it leverage deepspeed/zero 3?
| minimaxir wrote:
| That's PyTorch only; the current models are TensorFlow.
| The_rationalist wrote:
| Oh, that's unfortunate; can't the models be exported to
| PyTorch through e.g. ONNX?
| machinevision wrote:
| Even small models can be a headache to export if they use
| anything custom. I can't even imagine something the size of
| GPT-3.
| Voloskaya wrote:
| Larger models aren't really more complicated than smaller
| ones though. GPT-2 is already supported; I believe the only
| difference with GPT-3 is sparse attention.
| minimaxir wrote:
| The GPT-2 models included with Transformers can export to
| ONNX fine w/ a helper included with the Python
| onnxruntime.
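| One route is the converter bundled with transformers itself
| (the onnxruntime package also ships GPT-2-specific tooling);
| a sketch, with paths and opset chosen arbitrarily:
|
|     from pathlib import Path
|     from transformers.convert_graph_to_onnx import convert
|     import onnxruntime as ort
|
|     # export a pretrained GPT-2 checkpoint to an ONNX graph
|     convert(framework="pt", model="gpt2",
|             output=Path("onnx/gpt2.onnx"), opset=12)
|
|     # load it back with onnxruntime for inference
|     session = ort.InferenceSession("onnx/gpt2.onnx")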
| minimaxir wrote:
| To run in PyTorch the model architecture must be ported.
|
| ONNX is slightly different; you could export the model to
| run in the onnxruntime but that has tradeoffs.
| stellaathena wrote:
| There's a PyTorch + DeepSpeed repository here:
| https://github.com/EleutherAI/gpt-neox
| MasterScrat wrote:
| GPT-NeoX, which is a model from the same group but using GPUs
| instead of TPUs, uses techniques from DeepSpeed:
|
| https://github.com/EleutherAI/gpt-neox/
| [deleted]
| Bostonian wrote:
| Is something with billions of parameters actually a "model"? I
| guess the answer is yes if the data set is even larger than that?
| pjfin123 wrote:
| GPT Paper:
|
| "Specifically, we train GPT-3, an autoregressive language model
| with 175 billion parameters"
|
| README:
|
| "1T or bust my dudes [...] An implementation of model & data
| parallel GPT2 & GPT3 -like models, with the ability to scale up
| to full GPT3 sizes (and possibly more!)"
|
| It seems the largest model they released is 2.7 billion
| parameters, or roughly 1.5% of the size of GPT-3. The most
| interesting part about GPT-3 was its size, and it seems this is
| only "GPT-3-like" in architecture.
|
| I also have a translation library with ~100 million parameters
| (about 0.001x GPT-3):
|
| https://github.com/argosopentech/argos-translate
| f430 wrote:
| What does he mean when he says "1T or bust"? Is he referring to
| 1 trillion parameters? Are you saying that GPT-3 has 2.7 trillion
| parameters? Does it mean that to get to GPT-3's level it needs
| 100x more data?
| pjfin123 wrote:
| I interpreted it as aspiring to a trillion parameters, but I'm
| not sure.
| jon_tau wrote:
| The saying comes from a slide by Noam Shazeer (see:
| https://www.youtube.com/watch?v=HgGyWS40g-g&ab_channel=Tenso...).
| It just means the current goal should be to have models with 1
| trillion parameters.
| Voloskaya wrote:
| GPT-3 has 175 billion parameters, so they need to scale by
| about 64x. They already have a comparable amount of data to
| what OpenAI used, so it's mostly about scaling up the number
| of GPUs.
| f430 wrote:
| I see, so does that mean this GPT-Neo is 64x less powerful?
| Voloskaya wrote:
| Accuracy and number of parameters don't scale linearly
| together. It varies widely depending on exactly what you are
| measuring accuracy on, etc. But a very approximate rule of
| thumb would be to say that accuracy scales with the log of the
| parameter count (for the same architecture).
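| For a concrete illustration, the language-model scaling-laws
| paper (Kaplan et al. 2020) fits test loss, not accuracy, as a
| power law in parameter count, so treat this as a rough analogy
| only (constants below are from that paper):
|
|     # L(N) ~ (N_c / N) ** alpha, alpha ~= 0.076, N_c ~= 8.8e13
|     alpha, n_c = 0.076, 8.8e13
|     loss = lambda n: (n_c / n) ** alpha
|     for n in (2.7e9, 13e9, 175e9):
|         print(f"{n / 1e9:>6.1f}B params -> predicted loss {loss(n):.2f}")
|     # ~65x more parameters buys a modest but real drop in loss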
| stellaathena wrote:
| GPT-3 is a model architecture, not a model. While the largest
| GPT-3 model is 175B, that very paper has a table that includes
| "GPT-3 XL" (1.3B) and "GPT-3 2.7B" as models in the GPT-3
| architecture. The 2.7B model is the same size as Ada, a model
| that OpenAI currently sells API access to under the moniker
| "GPT-3"
| Dylan16807 wrote:
| None of the other models are even close to the big one, and
| the paper also suggests calling the big one "GPT-3". And
| people do that very often in practice. So it's often
| ambiguous, but saying the term _only_ means the architecture
| isn't right either.
___________________________________________________________________
(page generated 2021-03-22 23:03 UTC)