[HN Gopher] Announcing GPT-NeoX-20B
___________________________________________________________________
Announcing GPT-NeoX-20B
Author : jscob
Score : 138 points
Date : 2022-02-02 16:03 UTC (6 hours ago)
(HTM) web link (blog.eleuther.ai)
(TXT) w3m dump (blog.eleuther.ai)
| fpgaminer wrote:
| So excited for this release. In the wake of AI Dungeon's
| downfall, having GPT-Neo to fall back on has been a saving grace.
| While the 6B model is nowhere near as good as the original AI
| Dungeon, which used OpenAI's 175B model, it was at least
| serviceable unlike the "gentled" AI Dungeon. And you could run it
| locally or through Colab, which was really cool. I ended up using
| it through NovelAI, since they've spent a lot of time fine-tuning
| the model and adding a plethora of features that end up improving
| the overall output. (NovelAI's interface is like AI Dungeon on
| steroids!) But there is a vibrant community of Colab notebooks
| and other tools for DIYers surrounding the GPT-Neo model.
|
| That said, besides being overall "dumber" than 175B GPT-3, the 6B
| model was missing a critical feature: prompting. 175B GPT-3 could
| be "prompted" to write things. For example, you could give it
| "Write a story about cyberpunk gnomes:" and it would go on to do
| just that, all on its own. GPT-Neo didn't really have that
| capability in my experience. The only way to get it to reliably
| write such a story is to begin writing it yourself, at which
| point GPT-Neo could help to continue the story.
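|
| For anyone who hasn't tried it locally, "prompting" just means
| feeding the instruction in as the start of the context and letting
| the model continue. A rough sketch with the Hugging Face
| transformers library - the model name and sampling settings here
| are just my picks, nothing official:
|
|     from transformers import pipeline
|
|     # smaller open EleutherAI model as a stand-in
|     generator = pipeline("text-generation",
|                          model="EleutherAI/gpt-neo-2.7B")
|
|     prompt = "Write a story about cyberpunk gnomes:\n"
|     out = generator(prompt, max_length=200, do_sample=True,
|                     temperature=0.9, top_p=0.95)
|     print(out[0]["generated_text"])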
|
| So I'm excited to see not just how much "smarter" Eleuther's new
| 20B model is, but also if it has attained that coveted prompting
| ability. Given the non-linear relationship between parameters and
| loss, my hopes are high.
|
| P.S. NovelAI recently added the Fairseq 13B model to their
| repertoire. I haven't had a chance to try it personally, but I've
| seen positive things about it. My bet is on GPT-NeoX-20B being
| better still.
| qlm wrote:
| What was AI Dungeon's downfall? Can't find much about it.
| minimaxir wrote:
| tl;dr AI Dungeon was required to add additional content
| filters after it went too off the rails, which caused
| community backlash.
|
| https://www.wired.com/story/ai-fueled-dungeon-game-got-much-...
| fpgaminer wrote:
| It was more than that. They also significantly downgraded
| the model. I didn't follow the details, but IIUC Dragon
| initially used the 175B model directly, then I think they went
| down a model size at OpenAI's behest. Finally, when OpenAI
| announced pricing, AI Dungeon had to downgrade the model
| further.
|
| But yes, the content filtering got out of hand too. I was
| initially fine with it, as its stated intention was to
| filter out really illegal stuff, like underage content. I
| rarely hit the filter. But then they tweaked it at some
| point and I was triggering it constantly on otherwise
| benign stuff.
|
| And they broke features constantly.
|
| When I unsubbed the state of AID was broken features,
| micro-transactions, terrible AI model, and a glitchy,
| puritanical content filter.
|
| The plus side is that it made the puny GPT-Neo model look
| like a godsend.
| causi wrote:
| _really illegal stuff, like underage content_
|
| Wait, isn't this output just text? How is a text AI
| generating illegal content?
| NovemberWhiskey wrote:
| The content may not be illegal to possess, but if it's
| obscene, then it can be illegal to sell it, produce it
| with the intention of selling it, transport it,
| distribute it, and so on.
| capableweb wrote:
| Could it really? I was under the impression that unless
| you incite someone to commit crimes (or confess to
| crimes), the story would be covered under "art" and
| therefore protected. It's just text, after all. Where is
| the line for "obscene" drawn?
| NovemberWhiskey wrote:
| In the U.S., it's called the Miller test:
| https://en.wikipedia.org/wiki/Miller_test
| capableweb wrote:
| Wow, I had no idea, that sounds really bad. The whole
| book banning debacle now makes sense and seems legal.
| That test seems to give courts leeway to rule basically
| however they want, as all three of those criteria are
| very subjective.
|
| Also first time I hear about "patently offensive" and now
| I'm laughing. Thanks!
| causi wrote:
| It's very funny to imagine picking up a romance novel and
| making it illegal by scrawling "by the way the girl was
| actually 16 the whole time" on the inside of the back
| cover.
| miohtama wrote:
| It's called thoughtcrime
|
| https://www.quora.com/Do-thought-crimes-exist-in-U-S-law
| bitforger wrote:
| I believe they're currently using AI21's 178B Jumbo model
| for Dragon. Since they're completely off of OpenAI now,
| the content filter is much more lax.
| benjismith wrote:
| The "prompting" ability you're referring to is called
| "instruction following", and here are some descriptions of it.
|
| https://openai.com/blog/instruction-following/
|
| I think the differences are more in the training data used
| than in the nature of the model itself. So you could probably
| train your own instruction-following model on top of this raw
| 20B model.
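|
| A bare-bones version of that is just causal-LM fine-tuning on
| "instruction + response" strings. Roughly something like this -
| the model name, data, and hyperparameters are placeholders I made
| up, not an actual recipe:
|
|     import torch
|     from transformers import (AutoTokenizer, AutoModelForCausalLM,
|                               Trainer, TrainingArguments)
|
|     name = "EleutherAI/gpt-neo-125M"  # stand-in; 20B needs multi-GPU
|     tok = AutoTokenizer.from_pretrained(name)
|     model = AutoModelForCausalLM.from_pretrained(name)
|
|     pairs = [("Write a story about cyberpunk gnomes:",
|               "The gnomes of Neon Alley ran the best chop shop..."),
|              # ... more (instruction, response) examples
|             ]
|
|     class InstructionData(torch.utils.data.Dataset):
|         def __init__(self, pairs):
|             self.enc = [tok(p + "\n\n" + r + tok.eos_token,
|                             truncation=True,
|                             max_length=512)["input_ids"]
|                         for p, r in pairs]
|         def __len__(self):
|             return len(self.enc)
|         def __getitem__(self, i):
|             ids = torch.tensor(self.enc[i])
|             # standard causal-LM objective: labels == inputs
|             return {"input_ids": ids, "labels": ids.clone()}
|
|     trainer = Trainer(
|         model=model,
|         args=TrainingArguments("instruct-neo", num_train_epochs=3,
|                                per_device_train_batch_size=1),
|         train_dataset=InstructionData(pairs))
|     trainer.train()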
| f38zf5vdt wrote:
| You can try the model out for free at goose.ai after making an
| account and going to the sandbox section.
| gibsonf1 wrote:
| How is pattern matching ever inference when there is no reference
| to the underlying computational model of what the words mean in
| spacetime?
|
| How is it helpful to see what word might come next when the word
| sequence is just based on statistics with no reference at all to
| meaning?
| ChefboyOG wrote:
| Have you ever used any sort of autocomplete?
| gibsonf1 wrote:
| Yes, and I very much like it when quickly selecting from a
| set of valid options.
|
| This is not that. It is all A with no I.
| drxzcl wrote:
| Humans assign a lot of, well, meaning to meaning. It turns out
| that you can get a really good score on tasks that you would
| superficially think require actual understanding, without
| programming any of that in.
|
| Does this mean the neural network has learned about meaning?
| Does that mean that it has just gotten really good at faking
| it? Does it mean that meaning itself doesn't really exist, and
| it's just a shorthand for advanced pattern matching? Does it
| matter?
|
| Honestly, we don't know. But we've been thinking about it for a
| very long time. See for example the famous Chinese Room thought
| experiment:
|
| https://en.wikipedia.org/wiki/Chinese_room
| gibsonf1 wrote:
| Try driving a car around without both a conceptual and a
| causal understanding of the world - meaning matters for
| survival.
| f38zf5vdt wrote:
| Right on, they're closing in on "Open"AI's best models. Can this
| still be run on a GPU, or does it require a lot more VRAM?
| stellaathena wrote:
| It can be run on an A40 or A6000, as well as the largest A100s,
| but nothing smaller than that.
| bm-rf wrote:
| You could use Microsoft's DeepSpeed to run the model for
| inference on multiple GPUs, see
| https://www.deepspeed.ai/tutorials/inference-tutorial/
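|
| The tutorial basically boils down to wrapping the model with
| deepspeed.init_inference and launching with the deepspeed
| launcher. A rough sketch - the model name, GPU count, and flags
| here are my assumptions and may differ by DeepSpeed version:
|
|     # launch with: deepspeed --num_gpus 2 run_inference.py
|     import torch
|     import deepspeed
|     from transformers import AutoTokenizer, AutoModelForCausalLM
|
|     name = "EleutherAI/gpt-j-6B"  # placeholder until 20B lands
|     tok = AutoTokenizer.from_pretrained(name)
|     model = AutoModelForCausalLM.from_pretrained(
|         name, torch_dtype=torch.float16)
|
|     # shard the model across the GPUs with tensor parallelism
|     engine = deepspeed.init_inference(model, mp_size=2,
|                                       dtype=torch.float16,
|                                       replace_with_kernel_inject=True)
|
|     inputs = tok("GPT-NeoX-20B is", return_tensors="pt").to("cuda")
|     out = engine.module.generate(**inputs, max_length=50)
|     print(tok.decode(out[0]))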
| djoldman wrote:
| How much VRAM does it use during inference?
| stellaathena wrote:
| ~40 GB with standard optimization. I suspect you can shrink
| it down more with some work, but it would require
| significant innovation to cram it into the next common card
| size down (24 GB, unless I'm misremembering).
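|
| The arithmetic is basically just parameters times bytes per
| parameter; a quick back-of-the-envelope for parameter memory
| alone (ignoring activations and other overhead):
|
|     params = 20_000_000_000
|     for precision, nbytes in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
|         print(f"{precision}: {params * nbytes / 2**30:.0f} GiB")
|     # fp32: 75 GiB, fp16: 37 GiB, int8: 19 GiB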
| komuher wrote:
| Is 40GB already on float16?
| benjismith wrote:
| I'm super excited about this!
|
| I'm on the cusp of releasing a model into production that was
| fine-tuned on your 6B model, and the results are quite
| excellent. I'd be very curious to try out the 20B model the next
| time we retrain.
|
| Are there any other differences in this release (number of
| layers, number of attention heads, etc) compared with the 6B
| model, or does it simply scale up the number of parameters?
| guidovranken wrote:
| _GPT-NeoX-20B will be publicly downloadable from The Eye on the
| 9th of February._
|
| The Eye as in the-eye.eu? That site has been down for a long
| time.
| stellaathena wrote:
| There is a mirror at https://mystic.the-eye.eu/ that has been
| up for a long time.
| drusepth wrote:
| Thanks for this. When the-eye.eu went down it broke a ton of
| my Colab notebooks and it was impossible to find a mirror.
| dash2 wrote:
| Does anyone know whether the spammy websites that sit at the top
| of search engine results are already generated by this kind of
| model?
| nefitty wrote:
| That's a use case. I don't see why anyone would go out of their
| way to make intelligible content for spam. Google is so broken
| right now that SEO hacks are easy to generate. Not to
| overstress the tangent, but without search operators, I have to
| sift through pointless Gitlab/Github/Stackoverflow/Wikipedia
| clones all the time.
| ChefboyOG wrote:
| By and large, no.
|
| That's not to say that those sites are not generated
| programmatically--without a doubt, most of them are--but not by
| a cutting edge transformer model. The fact is, generating words
| has never been the bottleneck for blackhat SEO types.
| Generally, those sites are generating their content through
| some kind of scraping, or in rarer cases, paying pennies for
| nonsense articles. The page itself is structured for search
| (targeted H1s, metadata, etc.) and some kind of private blog
| network is used to create a pyramid of backlinks.
| trasz wrote:
| So, what does it do?
| [deleted]
| dqpb wrote:
| EleutherAI is the real open AI.
| btdmaster wrote:
| Awesome! Any chance for an online demo (like
| https://6b.eleuther.ai/)?
| stellaathena wrote:
| Coming soon!
| schleck8 wrote:
| Awesome, thanks for your work Stella & team!
| [deleted]
| terafo wrote:
| The best there is right now is a playground on
| https://goose.ai/
| stavros wrote:
| Which unfortunately doesn't work properly on Firefox (spaces
| are removed).
| nefitty wrote:
| Thank you to everyone who has worked on this. EleutherAI has
| become a touchstone in my mind for what is possible in open data
| and code. In creating alternatives to walled gardens, they have
| shown me new possible paths. I know Linux has done the same for
| others.
|
| Huggingface has also made playing with this stuff super
| accessible. They've made me super curious about Rust and AI/ML
| research, which has influenced my personal engineering goals for
| the future. I am on your team, Roko's Basilisk.
| monkeydust wrote:
| Shout out to Huggingface. As a business user, I've found it easy
| to explore use cases around text summarisation, and it has given
| me ideas for future work. I clearly need to check out
| EleutherAI as well.
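|
| For anyone curious, the barrier to entry really is a few lines.
| This uses the library's default summarisation model, nothing
| tuned or fancy:
|
|     from transformers import pipeline
|
|     summariser = pipeline("summarization")
|     text = ("EleutherAI has announced GPT-NeoX-20B, a 20-billion-"
|             "parameter open-source language model whose weights will "
|             "be downloadable from The Eye on the 9th of February.")
|     print(summariser(text, max_length=30, min_length=5)[0]
|           ["summary_text"])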
| rsync wrote:
| I came to this thread looking for comments that I would suspect
| were machine generated.
|
| I was not disappointed.
| coolspot wrote:
| Good bot
| nefitty wrote:
| Beep beep. That means thank you in my motherboard.
___________________________________________________________________
(page generated 2022-02-02 23:00 UTC)