# Announcing GPT-NeoX-20B

*Announcing GPT-NeoX-20B, a 20 billion parameter model trained in collaboration with CoreWeave.*

February 2, 2022 · Connor Leahy

GPT-NeoX-20B will be publicly downloadable from The Eye on the 9th of February. In the meantime, you can already try out the model using CoreWeave's and Anlatan's new inference service, GooseAI!

---

After a year-long odyssey through months of chip-shortage-induced shipping delays, technical trials and tribulations, and aggressively boring debugging, we are happy to finally announce EleutherAI's latest open-source language model: GPT-NeoX-20B, a 20 billion parameter model trained with our GPT-NeoX framework on GPUs generously provided by our friends at CoreWeave.

GPT-NeoX-20B is, to our knowledge, the largest publicly accessible pretrained general-purpose autoregressive language model, and we expect it to perform well on many tasks. We hope that the increased accessibility of models of this size will aid research into the safe use of AI systems, and we encourage anyone interested in working in this direction to reach out to us.

As a thank-you to our generous compute donors, we are delaying the public downloadable release of the model by seven days. On February 9, 2022, the full model weights will be downloadable for free under a permissive Apache 2.0 license from The Eye, and a #20b channel will be set up in our Discord for discussion of the model. (A brief sketch of loading the released weights appears at the end of this post.)

Please note that, much like our other language models and codebases, GPT-NeoX and GPT-NeoX-20B are very much research artifacts, and we do not recommend deploying either in a production setting without careful consideration. In particular, we strongly encourage anyone looking to use GPT-NeoX-20B to read the paper and the datasheet on our training data. There are still bugs to be ironed out and many inefficiencies that could be addressed, but hey, we do this in our free time, give us a break lol

---

| Task | Category | Babbage | Curie | GPT-J-6B | FairSeq-13B | GPT-NeoX-20B | DaVinci |
| --- | --- | --- | --- | --- | --- | --- | --- |
| LAMBADA | Sentence Completion | 62.49% | 69.51% | 68.29% | 70.95% | 71.98% | 75.16% |
| ANLI R1 | Natural Language Inference | 32.40% | 32.80% | 32.40% | 34.00% | 33.50% | 36.30% |
| ANLI R2 | Natural Language Inference | 30.90% | 33.50% | 34.00% | 33.00% | 34.40% | 37.00% |
| ANLI R3 | Natural Language Inference | 33.75% | 35.50% | 35.50% | 34.75% | 35.75% | 36.83% |
| WSC | Coreference Resolution | 40.38% | 54.81% | 36.53% | 57.69% | 53.61% | 63.46% |
| Winogrande | Coreference Resolution | 59.51% | 64.56% | 64.01% | 67.40% | 65.27% | 69.93% |
| HellaSwag | Sentence Completion | 54.54% | 49.54% | 49.54% | 55.44% | 49.04% | 59.18% |
| Total | | 39.40% | 42.57% | 40.28% | 44.67% | 43.31% | 48.40% |

Accuracy on standard language modeling tasks.

| Subject Group | Babbage | Curie | GPT-J-6B | FairSeq-13B | GPT-NeoX-20B | DaVinci |
| --- | --- | --- | --- | --- | --- | --- |
| Humanities | 27.01% | 26.48% | 28.07% | 27.27% | 28.70% | 32.30% |
| Social Science | 27.94% | 29.24% | 28.73% | 27.94% | 31.63% | 35.87% |
| STEM | 25.83% | 24.25% | 25.71% | 24.63% | 26.27% | 28.60% |
| Other | 26.86% | 28.84% | 27.95% | 27.33% | 29.83% | 36.85% |
| Total | 26.78% | 26.90% | 27.38% | 26.53% | 28.77% | 32.86% |

Accuracy of factual knowledge by subject group, as measured by the HendrycksTest evaluation.
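---

For those planning to experiment with the weights once they are released, here is a minimal sketch of what loading and sampling from the model could look like with the Hugging Face transformers library. The model identifier `EleutherAI/gpt-neox-20b` is our assumption about where the weights might be mirrored, not a confirmed location; a local path to the checkpoint downloaded from The Eye could be substituted, assuming a transformers-compatible format.

```python
# Minimal sketch, not an official loading recipe. Assumes the released
# weights end up in a Hugging Face transformers-compatible format;
# "EleutherAI/gpt-neox-20b" is a hypothetical identifier, and a local
# path to the downloaded checkpoint can be used in its place.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "EleutherAI/gpt-neox-20b"  # hypothetical; or a local checkpoint path

tokenizer = AutoTokenizer.from_pretrained(MODEL)
# 20B parameters come to roughly 40 GB in fp16, so this needs a large
# GPU (or several) rather than a typical desktop card.
model = AutoModelForCausalLM.from_pretrained(MODEL)

prompt = "EleutherAI is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=True,
    temperature=0.8,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Until the weights go public on February 9, GooseAI's hosted inference service remains the quickest way to query the model without provisioning hardware of your own.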