[HN Gopher] The Pile: An 800GB Dataset of Diverse Text for Langu...
       ___________________________________________________________________
        
       The Pile: An 800GB Dataset of Diverse Text for Language Modeling
        
       Author : leogao
       Score  : 22 points
       Date   : 2021-01-01 22:19 UTC (41 minutes ago)
        
 (HTM) web link (pile.eleuther.ai)
 (TXT) w3m dump (pile.eleuther.ai)
        
       | bratao wrote:
       | I'm super excited by this dataset. The EleutherAI team is stellar
       | and many great things are coming soon from them!
        
       | legatus wrote:
       | I think it's worth noting that EleutherAI is a grassroots
       | collection of researchers, which distinguishes it from
       | academia/industry labs.
       | 
       | As part of their work on democratizing AI, they're now hoping to
       | replicate GPT-3 and release it for free (unlike OpenAI's API).
       | 
       | I would encourage everyone interested to join their Discord
       | server (https://discord.gg/BK2v3EJ) -- they're extremely friendly
       | and I think it's a project worth contributing to.
        
       | leogao wrote:
       | Twitter thread:
       | https://twitter.com/nabla_theta/status/1345130412532645888
        
       ___________________________________________________________________
       (page generated 2021-01-01 23:00 UTC)