[HN Gopher] Is there a SETI-like project to train LLM on libgen,...
___________________________________________________________________
Is there a SETI-like project to train LLM on libgen, scihub and the
likes?
I would like to contribute my computing resources.
Author : sam_lowry_
Score : 22 points
Date : 2023-12-27 17:59 UTC (5 hours ago)
| sbierwagen wrote:
| By "SETI" I assume you mean the SETI@Home distributed computing
| project.
|
| There's a two-way market where you can rent out your GPU here:
| https://vast.ai/
| If you want to _donate_ your GPU, you could set your price to
| $0.00, I guess.
|
| Yandex and Hugging Face trained a couple of nets using distributed
| GPUs. Their lessons-learned paper is here:
| https://arxiv.org/abs/2106.10207
| Takeaway: using consumer GPUs on the other end of consumer
| internet connections means you end up heavily I/O-limited. A
| 50 Mbit/s upload on a cable modem doesn't compare well with
| 50000 Mbit/s NVSwitch links in a proper GPU cluster. As far as I
| can tell, they are no longer doing distributed training.
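|
| To put rough numbers on the 50 vs. 50000 megabit comparison above,
| here is a minimal back-of-the-envelope sketch. The model size
| (1 billion parameters) and the 2-byte fp16 gradients are
| assumptions picked for illustration, not figures from the paper,
| and the calculation ignores latency, compression and overlap with
| compute.

```python
def transfer_seconds(num_params: int, bytes_per_param: int, link_mbit_per_s: float) -> float:
    """Seconds needed to send one full copy of the gradients over a link."""
    total_bits = num_params * bytes_per_param * 8
    return total_bits / (link_mbit_per_s * 1_000_000)


# Assumed (hypothetical) model: 1e9 parameters, fp16 gradients.
NUM_PARAMS = 1_000_000_000
BYTES_PER_PARAM = 2

# Link speeds quoted in the comment above, in Mbit/s.
CABLE_UPLOAD_MBIT = 50
CLUSTER_LINK_MBIT = 50_000

cable = transfer_seconds(NUM_PARAMS, BYTES_PER_PARAM, CABLE_UPLOAD_MBIT)
cluster = transfer_seconds(NUM_PARAMS, BYTES_PER_PARAM, CLUSTER_LINK_MBIT)

print(f"cable modem: {cable:.2f} s per gradient exchange")
print(f"cluster link: {cluster:.2f} s per gradient exchange")
print(f"slowdown: {cable / cluster:.0f}x")
```

| Under these assumptions a single uncompressed gradient exchange
| takes minutes on the cable modem and well under a second on the
| cluster link, which is the I/O bottleneck in a nutshell.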
|
| And it's widely assumed the big nets are trained on copyrighted
| material obtained through libgen and scihub. So if your question
| is "is there an open source net that's trained on pirated books",
| the answer is "Yes, and it's called Llama/Mistral/Falcon, etc."
| gverri wrote:
| I believe most modern LLMs already use them.
| fancymcpoopoo wrote:
| it's much more thrilling to walk into a bookstore and steal a
| book the old fashioned way
| throwaway81523 wrote:
| There are a lot of scanned books on the Internet Archive, though
| the OCR quality is pretty bad, and now there are the possible
| copyright issues of training on any still-in-copyright works.
|
| I'd expect any SETI-like project on libgen/sci-hub would have to
| be somewhat hush-hush. It wouldn't surprise me if it's already
| being done by somebody evil, who will use the results for their
| own nefarious purposes and not release them.
___________________________________________________________________
(page generated 2023-12-27 23:01 UTC)