[HN Gopher] Harvard Is Releasing a Free AI Training Dataset
___________________________________________________________________
Harvard Is Releasing a Free AI Training Dataset
Author : ilamont
Score : 24 points
Date : 2024-12-18 20:05 UTC (2 hours ago)
(HTM) web link (www.wired.com)
(TXT) w3m dump (www.wired.com)
| nadis wrote:
| "Around five times the size of the notorious Books3 dataset that
| was used to train AI models like Meta's Llama, the Institutional
| Data Initiative's database spans genres, decades, and languages,
| with classics from Shakespeare, Charles Dickens, and Dante
| included alongside obscure Czech math textbooks and Welsh pocket
| dictionaries. Greg Leppert, executive director of the
| Institutional Data Initiative, says the project is an attempt to
| 'level the playing field' by giving the general public, including
| small players in the AI industry and individual researchers,
| access to the sort of highly-refined and curated content
| repositories that normally only established tech giants have the
| resources to assemble."
|
| ^ this is pretty cool and interesting. The collaboration they're
| doing with Boston Public Library to make articles similarly
| accessible also sounds pretty exciting.
| morgango wrote:
| https://archive.is/xhJvc
___________________________________________________________________
(page generated 2024-12-18 23:01 UTC)