[HN Gopher] The Pile: An 800GB Dataset of Diverse Text for Langu...
___________________________________________________________________
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Author : leogao
Score : 22 points
Date : 2021-01-01 22:19 UTC (41 minutes ago)
(HTM) web link (pile.eleuther.ai)
(TXT) w3m dump (pile.eleuther.ai)
| bratao wrote:
| I'm super excited by this dataset. The EleutherAI team is stellar
| and many great things are coming soon from them!
| legatus wrote:
| I think it's worth noting that EleutherAI is a grassroots
| collection of researchers, which distinguishes it from
| academic and industry labs.
|
| As part of their work on democratizing AI, they're now hoping to
| replicate GPT-3 and release it for free (unlike OpenAI's API).
|
| I would encourage everyone interested to join their Discord
| server (https://discord.gg/BK2v3EJ) -- they're extremely friendly
| and I think it's a project worth contributing to.
| leogao wrote:
| Twitter thread:
| https://twitter.com/nabla_theta/status/1345130412532645888
___________________________________________________________________
(page generated 2021-01-01 23:00 UTC)