[HN Gopher] Is there a SETI-like project to train LLM on libgen,...
       ___________________________________________________________________
        
       Is there a SETI-like project to train LLM on libgen, scihub and the
       likes?
        
       I would like to contribute my computing resources.
        
       Author : sam_lowry_
       Score  : 22 points
       Date   : 2023-12-27 17:59 UTC (5 hours ago)
        
       | sbierwagen wrote:
       | By "SETI" I assume you mean the SETI@Home distributed computing
       | project.
       | 
       | There's a two-way market, https://vast.ai/, where you can rent
       | out your GPU. If you want to _donate_ your GPU, you could set
       | your price to $0.00, I guess.
       | 
       | Yandex and Hugging Face trained a couple of nets using
       | distributed GPUs. Their lessons-learned paper is here:
       | https://arxiv.org/abs/2106.10207
       | Takeaway: using consumer GPUs on the other end of consumer
       | internet connections means you end up very I/O limited. A
       | 50 megabit upload on a cable modem doesn't compare well with
       | 50000 megabit NVSwitch links in a proper GPU cluster. As far as
       | I can tell, they are no longer doing distributed training.
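       | 
       | To put rough numbers on that gap, here is a back-of-the-envelope
       | sketch. It assumes a hypothetical 1-billion-parameter model
       | exchanging full fp16 gradients (2 bytes each) once per step with
       | no compression; those figures are illustrative assumptions, not
       | from the paper. Only the two link speeds come from the comment
       | above.
       | 

```python
def transfer_seconds(num_params: int, bytes_per_param: int,
                     link_mbit_per_s: float) -> float:
    """Seconds to push one full set of gradients over a link."""
    payload_bits = num_params * bytes_per_param * 8
    return payload_bits / (link_mbit_per_s * 1_000_000)


# Assumptions for illustration only (not from the thread or the paper).
NUM_PARAMS = 1_000_000_000   # hypothetical 1B-parameter model
BYTES_PER_PARAM = 2          # fp16 gradients, no compression

# Link speeds quoted in the comment, in megabits per second.
CABLE_MBIT = 50
CLUSTER_MBIT = 50_000

cable = transfer_seconds(NUM_PARAMS, BYTES_PER_PARAM, CABLE_MBIT)
cluster = transfer_seconds(NUM_PARAMS, BYTES_PER_PARAM, CLUSTER_MBIT)

print(f"cable modem : {cable:.0f} s per gradient exchange")
print(f"cluster link: {cluster:.2f} s per gradient exchange")
print(f"slowdown    : about {cable / cluster:.0f}x")
```

       | 
       | Under those assumptions the cable modem needs about 320 seconds
       | per exchange, versus about 0.32 seconds on the cluster link.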
       | 
       | And it's widely assumed the big nets are trained on copyrighted
       | material through libgen and scihub. So if your question is "is
       | there an open source net that's trained on pirated books", the
       | answer is "Yes, and it's called Llama/Mistral/Falcon, etc."
        
       | gverri wrote:
       | I believe most modern LLMs already use them.
        
       | fancymcpoopoo wrote:
       | it's much more thrilling to walk into a bookstore and steal a
       | book the old-fashioned way
        
       | throwaway81523 wrote:
       | There are a lot of scanned books on the Internet Archive, though
       | the OCR quality is pretty bad, and now there are the possible
       | copyright issues of training on any still-in-copyright works.
       | 
       | I'd expect any SETI-like project on libgen/sci-hub would have to
       | be somewhat hush-hush. It wouldn't surprise me if it's already
       | being done by somebody evil, who will use the results for their
       | own nefarious purposes and not release them.
        
       ___________________________________________________________________
       (page generated 2023-12-27 23:01 UTC)