[HN Gopher] GPT Neo: open-source GPT model, with pretrained 1.3B...
       ___________________________________________________________________
        
       GPT Neo: open-source GPT model, with pretrained 1.3B & 2.7B weight
       models
        
       Author : pizza
       Score  : 148 points
       Date   : 2021-03-21 21:01 UTC (1 hour ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | ve55 wrote:
        | This is a nice release, but the title is a bit misleading: the
        | released sizes (1.3B and 2.7B parameters) do not yet compare to
        | the size of GPT-3 (175B), but rather to that of GPT-2 (1.5B)
        | (although future releases may have significantly more!).
       | 
       | Edit: title improved, thank you!
        
         | nl wrote:
         | Yeah. They say they are doing a 10B release soon[1].
         | 
          | I suspect they have run into training issues, since they are
          | moving to a new repo[2].
         | 
         | [1]
         | https://twitter.com/arankomatsuzaki/status/13737326468119674...
         | 
         | [2] https://github.com/EleutherAI/gpt-neox/
        
           | chillee wrote:
           | It's more about hardware - these models were trained on TPUs,
            | while GPT-NeoX is being trained on GPUs graciously provided
            | by CoreWeave.
        
             | orra wrote:
              | Any idea what the required GPU time would cost (if not
              | donated)? Is GPT-3 about to become a commodity?
        
         | pizza wrote:
         | Fixed title to reflect that, thanks
        
           | ve55 wrote:
            | I would perhaps change 'GPT-3' to just say 'GPT' instead, as
            | a more accurate fix.
        
       | f430 wrote:
       | Can somebody explain to this beginner how to use this? Where can
       | I load this code and start running it? How can I train it on a
       | dataset and what do I need to prepare?
       | 
        | There's a lot of language here I don't understand. For example,
        | what is he referring to when he says 1.5B or 1T weights?
       | 
       | What resources/videos can I watch in order to start tinkering
       | with this?
        
         | vertis wrote:
          | The repository readme actually includes a link to a
          | notebook[1] that helps with getting started on Google Colab.
          | It's as good a place to start as any:
         | 
         | [1]:
         | https://colab.research.google.com/github/EleutherAI/GPTNeo/b...
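          | 
          | For a quick taste of generation, here's a minimal sketch; it
          | assumes the weights are available through the Hugging Face hub
          | under the id "EleutherAI/gpt-neo-1.3B", so treat that name as
          | a placeholder:
          | 
          |     # pip install transformers
          |     from transformers import pipeline
          | 
          |     # hub id assumed; check EleutherAI's page on the model hub
          |     generator = pipeline("text-generation",
          |                          model="EleutherAI/gpt-neo-1.3B")
          |     out = generator("EleutherAI is", max_length=30,
          |                     do_sample=True)
          |     print(out[0]["generated_text"])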
        
           | f430 wrote:
            | Thanks, I've never used this before. Do I have to add a
            | credit card? How much will it cost to run this?
        
             | zora_goron wrote:
             | Colab is free to use -- you can click Runtime - Run All to
             | run the cells in the notebook free-of-charge. (You may need
             | to be logged in to a Google Account to run it.)
        
             | [deleted]
        
       | leogao wrote:
        | Could the title of this post be changed to emphasize that the
       | model sizes released were 1.3B and 2.7B? Something like
       | "EleutherAI releases 1.3B and 2.7B parameter GPT-like language
       | models". The current title implies that a full sized GPT-3 model
       | is currently available, which is not the case.
       | 
       | edit: the title has been changed, seems good enough
        
       | choxi wrote:
       | Is there anything a non-AI researcher can do to help support this
       | project? Is there a way to donate money? Or could a software
       | engineer help with testing, tooling, or other kinds of
       | infrastructure?
       | 
       | I was really excited about OpenAI's original plan and still
       | believe that an open source solution is the best way to prevent
       | the potential negative impacts AI might have on society. I can
       | sort of appreciate why OpenAI went the route of going private and
        | trying to monetize their work instead: it might prevent people
        | from using their work nefariously and will probably provide them
        | with way more capital to continue their efforts. But I trust
       | humanity as a collective more than any particular group of people
       | in the long run. I'm sure there are many others like me who would
       | be eager to help out if they could.
       | 
       | Edit: EleutherAI has a whole page on their site about how others
       | can contribute: https://www.eleuther.ai/get-involved/. I didn't
       | see anything about accepting donations though, if anyone involved
       | with the project was interested in setting up a crowdfunding
       | account somewhere I'd be eager to donate.
        
       | dylanbyte wrote:
        | Curious to see which GPT-3 parameter size this will end up
        | being equivalent to. Obviously we won't know until they
        | evaluate their models.
        
       | joshhart wrote:
       | I think we need more funding outside of large tech companies and
       | OpenAI for these kinds of things. I wonder if there is a way to
       | crowdsource donations to rent the hardware to train big versions
        | of these models in an open manner.
        
       | graiz wrote:
       | A great start for a truly open approach. It's ironic that OpenAI
       | isn't particularly open about its tech.
        
       | FL33TW00D wrote:
        | Whilst obviously BERT is not the same as GPT-3 in architecture,
        | Amazon's recent paper discussing architecture optimizations for
        | BERT seems pretty relevant here
        | (https://arxiv.org/pdf/2010.10499.pdf), given the chance to
        | improve upon GPT-3's architecture (because it surely isn't the
        | best we can get). Have the Eleuther.ai team been exploring this?
        
       | minimaxir wrote:
       | Per Twitter, there will be more info about model performance
       | tomorrow:
       | https://twitter.com/arankomatsuzaki/status/13737326454445793...
        
         | [deleted]
        
       | monkeydust wrote:
        | If I wanted to build a support Q&A system using texts from
        | support logs, training docs, transcribed videos, etc.
        | (basically as much text about my product as I can get), would
        | this model be a good start?
        
         | malka wrote:
          | You should look at Hugging Face: https://huggingface.co/
          | 
          | More specifically:
          | https://huggingface.co/transformers/task_summary.html#extrac...
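          | 
          | A minimal extractive-QA sketch using their pipeline API (the
          | question and the file name are placeholders for your own
          | data):
          | 
          |     from transformers import pipeline
          | 
          |     # downloads a default extractive-QA model on first use
          |     qa = pipeline("question-answering")
          | 
          |     # hypothetical file: your concatenated support logs,
          |     # training docs, transcripts, etc.
          |     context = open("support_docs.txt").read()
          | 
          |     result = qa(question="How do I reset my password?",
          |                 context=context)
          |     print(result["answer"], result["score"])
          | 
          | Note that extractive QA only pulls spans out of the context
          | you pass in, so for a large corpus you'd likely pair it with a
          | retrieval step that selects the relevant documents first.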
        
       | The_rationalist wrote:
        | Does it leverage DeepSpeed/ZeRO-3?
        
         | minimaxir wrote:
         | That's PyTorch only; the current models are TensorFlow.
        
           | The_rationalist wrote:
            | Oh, that's unfortunate. Can't the models be exported to
            | PyTorch through e.g. ONNX?
        
             | machinevision wrote:
              | Even small models can be a headache to export if they use
              | anything custom. I can't even imagine something the size
              | of GPT-3.
        
               | minimaxir wrote:
               | The GPT-2 models included with Transformers can export to
               | ONNX fine w/ a helper included with the Python
               | onnxruntime.
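                | 
                | Something along these lines (the exact flags are
                | assumed; check the helper's --help for your
                | onnxruntime version):
                | 
                |     python -m onnxruntime.transformers.convert_to_onnx \
                |         -m gpt2 --output gpt2.onnx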
        
             | minimaxir wrote:
             | To run in PyTorch the model architecture must be ported.
             | 
              | ONNX is slightly different; you could export the model to
              | run in onnxruntime, but that has tradeoffs.
        
       | [deleted]
        
       | pjfin123 wrote:
        | GPT-3 paper:
       | 
       | "Specifically, we train GPT-3, an autoregressive language model
       | with 175 billion parameters"
       | 
       | README:
       | 
       | "1T or bust my dudes [...] An implementation of model & data
       | parallel GPT2 & GPT3 -like models, with the ability to scale up
       | to full GPT3 sizes (and possibly more!)"
       | 
        | It seems the largest model they released is 2.7 billion
        | parameters, or ~1.5% the size of GPT-3. The most interesting
        | part about GPT-3 was its size, and it seems this is only
        | "GPT-3-like" in architecture.
        | 
        | I also have a translation library with ~100 million parameters
        | (~0.0006 of GPT-3):
       | 
       | https://github.com/argosopentech/argos-translate
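        | 
        | Back-of-envelope on those ratios (sizes as quoted above):
        | 
        |     released = 2.7e9   # largest GPT-Neo checkpoint
        |     gpt3     = 175e9   # GPT-3, per the paper
        |     argos    = 100e6   # argos-translate, roughly
        | 
        |     print(released / gpt3)  # ~0.015
        |     print(argos / gpt3)     # ~0.0006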
        
         | f430 wrote:
          | What does he mean when he says 1T or bust? Is he referring to
          | 1 trillion parameters? Are you saying that GPT-3 has 2.7
          | trillion parameters? Does it mean that to get to GPT-3's
          | level it needs a 100x bigger dataset?
        
           | pjfin123 wrote:
            | I interpreted it as aspiring to a trillion parameters, but
            | I'm not sure.
        
           | jon_tau wrote:
            | The saying comes from a slide by Noam Shazeer (see:
            | https://www.youtube.com/watch?v=HgGyWS40g-g&ab_channel=Tenso...).
            | It just means the current goal should be to have models
            | with 1 trillion parameters.
        
       ___________________________________________________________________
       (page generated 2021-03-21 23:00 UTC)