[HN Gopher] GPT Neo: open-source GPT model, with pretrained 1.3B...
___________________________________________________________________
GPT Neo: open-source GPT model, with pretrained 1.3B & 2.7B weight
models
Author : pizza
Score : 148 points
Date   : 2021-03-21 21:01 UTC (1 hour ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| ve55 wrote:
| This is a nice release, but the title is a bit misleading: the
| released sizes (1.3B and 2.7B parameters) don't yet approach
| GPT-3 (175B); they're closer to GPT-2 (1.5B) (although future
| releases may be significantly larger!).
|
| Edit: title improved, thank you!
| nl wrote:
| Yeah. They say they are doing a 10B release soon[1].
|
| I suspect they have run into training issues, since they are
| moving to a new repo[2].
|
| [1]
| https://twitter.com/arankomatsuzaki/status/13737326468119674...
|
| [2] https://github.com/EleutherAI/gpt-neox/
| chillee wrote:
| It's more about hardware: these models were trained on TPUs,
| while GPT-NeoX is being trained on GPUs graciously provided by
| CoreWeave.
| orra wrote:
| Any idea what the required GPU time would cost (if it weren't
| donated)? Will GPT-3 soon be just a commodity?
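|
| For a rough sense of scale, here's a back-of-envelope sketch in
| Python. The FLOP count is the training-compute figure reported
| in the GPT-3 paper; the throughput and pricing numbers are my
| assumptions, not figures from this thread:
|
|     # rough GPT-3 training-cost estimate; every input is an estimate
|     total_flops = 3.14e23     # training compute reported for GPT-3
|     v100_flops = 14e12        # assumed sustained FLOPS per V100
|     usd_per_gpu_hour = 1.00   # assumed cloud GPU pricing
|
|     gpu_hours = total_flops / v100_flops / 3600
|     print(f"~{gpu_hours:,.0f} V100-hours")           # ~6 million
|     print(f"~${gpu_hours * usd_per_gpu_hour:,.0f}")  # single-digit $M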
| pizza wrote:
| Fixed title to reflect that, thanks
| ve55 wrote:
| I would perhaps change 'GPT-3' to just say 'GPT' instead, as
| a more salient fix.
| f430 wrote:
| Can somebody explain to this beginner how to use this? Where can
| I load this code and start running it? How can I train it on a
| dataset and what do I need to prepare?
|
| There's a lot of language here I don't understand. For example,
| what is he referring to when he says 1.5B or 1T weights?
|
| What resources/videos can I watch in order to start tinkering
| with this?
| vertis wrote:
| The repository readme actually includes a link to a notebook[1]
| that helps you get started on Google Colab. It's as good a
| place to start as any:
|
| [1]:
| https://colab.research.google.com/github/EleutherAI/GPTNeo/b...
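|
| If you'd rather run it locally, here's a minimal sketch via the
| Hugging Face transformers pipeline (assuming the checkpoints get
| published there under an id like EleutherAI/gpt-neo-1.3B, which
| is my guess, not something from the readme):
|
|     # minimal text-generation sketch; the model id is an assumption
|     from transformers import pipeline
|
|     generator = pipeline("text-generation",
|                          model="EleutherAI/gpt-neo-1.3B")
|     out = generator("EleutherAI's GPT-Neo is", max_length=40)
|     print(out[0]["generated_text"])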
| f430 wrote:
| Thanks, I've never used this before. Do I have to add a credit
| card? How much will it cost to run this?
| zora_goron wrote:
| Colab is free to use -- you can click Runtime > Run all to run
| the cells in the notebook free of charge. (You may need to be
| logged in to a Google Account to run it.)
| [deleted]
| leogao wrote:
| Could the title of this post be changed to emphasize that the
| model sizes released were 1.3B and 2.7B? Something like
| "EleutherAI releases 1.3B and 2.7B parameter GPT-like language
| models". The current title implies that a full sized GPT-3 model
| is currently available, which is not the case.
|
| edit: the title has been changed, seems good enough
| choxi wrote:
| Is there anything a non-AI researcher can do to help support this
| project? Is there a way to donate money? Or could a software
| engineer help with testing, tooling, or other kinds of
| infrastructure?
|
| I was really excited about OpenAI's original plan and still
| believe that an open source solution is the best way to prevent
| the potential negative impacts AI might have on society. I can
| sort of appreciate why OpenAI went the route of going private and
| trying to monetize their work instead: it might prevent people
| from using their work nefariously and will probably provide them
| with way more capital to continue their efforts. But I trust
| humanity as a collective more than any particular group of people
| in the long run. I'm sure there are many others like me who would
| be eager to help out if they could.
|
| Edit: EleutherAI has a whole page on their site about how others
| can contribute: https://www.eleuther.ai/get-involved/. I didn't
| see anything about accepting donations, though; if anyone
| involved with the project were interested in setting up a
| crowdfunding account somewhere, I'd be eager to donate.
| dylanbyte wrote:
| Curious to see what parameter size of GPT-3 this will end up being
| equivalent to. Obviously we won't know until they evaluate their
| models.
| joshhart wrote:
| I think we need more funding outside of large tech companies and
| OpenAI for these kinds of things. I wonder if there is a way to
| crowdsource donations to rent the hardware to train big versions
| of these models in an open manner.
| graiz wrote:
| A great start for a truly open approach. It's ironic that OpenAI
| isn't particularly open about its tech.
| FL33TW00D wrote:
| Whilst obviously BERT is not the same as GPT-3 in architecture,
| Amazon's recent paper discussing architecture optimizations for
| BERT seems pretty relevant here
| (https://arxiv.org/pdf/2010.10499.pdf), given the chance to
| improve upon GPT-3's architecture (because it surely isn't the
| best we can get). Have the Eleuther.ai team been exploring this?
| minimaxir wrote:
| Per Twitter, there will be more info about model performance
| tomorrow:
| https://twitter.com/arankomatsuzaki/status/13737326454445793...
| [deleted]
| monkeydust wrote:
| If I wanted to build a support Q&A system using texts from
| support logs, training docs, transcribed videos, etc. (basically
| as much text about my product as I can get), would this model be
| a good start?
| malka wrote:
| You should look at Hugging Face: https://huggingface.co/
|
| More precisely:
| https://huggingface.co/transformers/task_summary.html#extrac...
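|
| For example, the extractive QA pipeline (a minimal sketch; the
| question and the support-text file here are placeholders):
|
|     # extractive question answering over your own support text
|     from transformers import pipeline
|
|     qa = pipeline("question-answering")  # pulls a default SQuAD-tuned model
|     answer = qa(question="How do I reset my password?",
|                 context=open("support_docs.txt").read())  # placeholder file
|     print(answer["answer"], answer["score"])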
| The_rationalist wrote:
| Does it leverage DeepSpeed/ZeRO 3?
| minimaxir wrote:
| That's PyTorch only; the current models are TensorFlow.
| The_rationalist wrote:
| Oh, that's unfortunate. Can't the models be exported to PyTorch
| through e.g. ONNX?
| machinevision wrote:
| Even small models can be a headache to export if they use
| anything custom. I can't even imagine something the size of
| GPT-3.
| minimaxir wrote:
| The GPT-2 models included with Transformers can export to
| ONNX fine w/ a helper included with the Python
| onnxruntime.
| minimaxir wrote:
| To run in PyTorch, the model architecture must be ported.
|
| ONNX is slightly different; you could export the model to run in
| onnxruntime, but that has tradeoffs.
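|
| For reference, a sketch of the export path using the converter
| that ships with transformers (the output path and opset here are
| illustrative; onnxruntime also bundles its own GPT-2 converter,
| which I believe is runnable as
| python -m onnxruntime.transformers.convert_to_onnx):
|
|     # export a GPT-2 checkpoint to ONNX; output path is illustrative
|     from pathlib import Path
|     from transformers.convert_graph_to_onnx import convert
|
|     convert(framework="pt",   # export from the PyTorch weights
|             model="gpt2",
|             output=Path("onnx/gpt2.onnx"),
|             opset=12)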
| [deleted]
| pjfin123 wrote:
| GPT-3 paper:
|
| "Specifically, we train GPT-3, an autoregressive language model
| with 175 billion parameters"
|
| README:
|
| "1T or bust my dudes [...] An implementation of model & data
| parallel GPT2 & GPT3 -like models, with the ability to scale up
| to full GPT3 sizes (and possibly more!)"
|
| It seems the largest model they released is 2.7 billion
| parameters, or ~0.015x the size of GPT-3. The most interesting
| part of GPT-3 was its size, and it seems this is only
| "GPT-3-like" in architecture.
|
| I also have a translation library with ~100 million parameters
| (~0.0006x GPT-3):
|
| https://github.com/argosopentech/argos-translate
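|
| Quick arithmetic behind those ratios:
|
|     # parameter-count ratios quoted above
|     gpt3, neo, argos = 175e9, 2.7e9, 100e6
|     print(neo / gpt3)    # ~0.015
|     print(argos / gpt3)  # ~0.0006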
| f430 wrote:
| What does he mean when he says 1T or bust? Is he referring to 1
| trillion parameters? Are you saying that GPT-3 has 2.7 trillion
| parameters? Does that mean that to get to GPT-3's level it needs
| a 100x larger dataset?
| pjfin123 wrote:
| I interpreted it as aspiring to a trillion parameters, but I'm
| not sure.
| jon_tau wrote:
| The saying comes from a slide by Noam Shazeer (see:
| https://www.youtube.com/watch?v=HgGyWS40g-g&ab_channel=Tenso...).
| It just means the current goal should be to have models with 1
| trillion parameters.
___________________________________________________________________
(page generated 2021-03-21 23:00 UTC)