[HN Gopher] Harvard Is Releasing a Free AI Training Dataset
       ___________________________________________________________________
        
       Author : ilamont
       Score  : 24 points
       Date   : 2024-12-18 20:05 UTC
        
 (HTM) web link (www.wired.com)
        
       | nadis wrote:
       | "Around five times the size of the notorious Books3 dataset that
       | was used to train AI models like Meta's Llama, the Institutional
       | Data Initiative's database spans genres, decades, and languages,
       | with classics from Shakespeare, Charles Dickens, and Dante
       | included alongside obscure Czech math textbooks and Welsh pocket
       | dictionaries. Greg Leppert, executive director of the
       | Institutional Data Initiative, says the project is an attempt to
       | 'level the playing field' by giving the general public, including
       | small players in the AI industry and individual researchers,
       | access to the sort of highly refined and curated content
       | repositories that normally only established tech giants have the
       | resources to assemble."
       | 
       | ^ this is pretty cool and interesting. The collaboration they're
       | doing with the Boston Public Library to make articles similarly
       | accessible also sounds pretty exciting.
        
       | morgango wrote:
       | https://archive.is/xhJvc
        
       ___________________________________________________________________
       (page generated 2024-12-18 23:01 UTC)