hngopher.com

       [HN Gopher] Authors Seek Meta's Torrent Client Logs and Seeding ...
       ___________________________________________________________________
        
       Authors Seek Meta's Torrent Client Logs and Seeding Data in AI
       Piracy Probe
        
       Author : miki123211
       Score  : 38 points
       Date   : 2025-01-20 20:38 UTC (2 hours ago)
        
 (HTM) web link (torrentfreak.com)
 (TXT) w3m dump (torrentfreak.com)
        
       | hnburnsy wrote:
       | Wonder if Meta is running a one-way Usenet host. Much better than
       | torrents.
        
         | LtdJorge wrote:
         | The first rule of Usenet is: you do not talk about Usenet
        
           | geor9e wrote:
           | if it was meant to be kept secret it probably shouldn't have
           | been put on the AOL home portal in 1994
        
           | spokaneplumb wrote:
           | People breaking the first rule wasn't enough for me to crack
           | into the scene. The weird two-paid-services thing required to
           | use it effectively--a search service of some kind, and your
           | actual content provider--and the jankiness of the software
           | and sites involved were enough to get me to give up, after
           | spending some money but making no meaningful progress toward
           | pirating anything.
           | 
           | I started my piracy journey on Napster. I've done all the
           | other biggies. I've done off-the-beaten-path stuff like IRC
           | piracy channels. Private trackers. I have a soft spot for
           | Windowmaker and was dumb enough to run Gentoo so long that I
           | got kinda good at the "scary" deep parts of Linux sysadmin. I
           | can deal with fiddliness and allegedly-ugly UI.
           | 
           | Usenet piracy defeated me.
        
       | heroprotagonist wrote:
       | What's the lesson, hire contractors?
        
         | kevingadd wrote:
         | It's possible their friends in government will make this all go
         | away if they ask nicely enough.
        
           | pixelpoet wrote:
           | Would $1m suffice?
           | https://www.bbc.com/news/articles/c8j9e1x9z2xo
        
             | edoceo wrote:
             | That's the ante; gotta place the next wager.
        
               | plagiarist wrote:
               | The best ROI for the money is probably purchasing a
               | SCOTUS justice.
        
           | moshegramovsky wrote:
           | Yeah I had a Facebook account until today.
           | 
           | This whole copyright thing reminds me of when Mark
           | Zuckerberg was mad that someone posted photos of the interior
           | of his house or something.
        
       | FireBeyond wrote:
       | Try to use any of the big players training models and see how
       | quickly they remember how much they value copyright.
        
         | WhatsName wrote:
         | You mean OpenAI's infamous "you shall not train on the output of
         | our model" clause?
        
       | rockemsockem wrote:
       | It seemed obvious to me for a long time before modern LLM
       | training that any sort of training of machine intelligence would
       | have to rely on pirated content. There's just no other viable
       | alternative for efficiently acquiring large quantities of text
       | data. Buying millions of ebooks online would take a lot of
       | effort; downloading data from publishers isn't a thing that can
       | be done efficiently (assuming tech companies negotiated and threw
       | money at them). The only efficient way to access large volumes of
       | media is piracy. The media ecosystem doesn't allow anything else.
        
         | the-rc wrote:
         | Google has scans from Google Books, as well as all the ebooks
         | it sells on the Play Store.
        
           | lemoncookiechip wrote:
           | Wouldn't that still be piracy? They own the rights of
           | distribution, but do they (or Amazon) have the rights to use
           | said books for LLM training? And what rights would those even
           | be?
        
         | diggan wrote:
         | > There's just no other viable alternative for efficiently
         | acquiring large quantities of text data. [...] take a lot of
         | effort [...] isn't a thing that can be done efficiently [...]
         | only efficient way to access large volumes of media is piracy
         | 
         | Hypothetical: If the only way we could build AGI would be to
         | somehow read everyone's brain at least once, would it be worth
         | just ignoring everyone's wish regarding privacy one time to
         | suck up this data and have AGI moving forward?
        
           | nosbo wrote:
           | no
        
           | BriggyDwiggs42 wrote:
           | Could this AGI cure cancer, and would it be in the hands of
           | the public? Then sure, otherwise nah.
        
         | IncreasePosts wrote:
         | Why would machine intelligence need an entire humanity's worth
         | of data to be machine intelligence? It seems like only a
         | training method that is really poor would need that much data.
        
         | aithrowawaycomm wrote:
         | I find it highly implausible that Meta doesn't have the
         | resources to obtain these legally. They could have reached out
         | to a publisher and asked to purchase ebooks in bulk - and if that
         | publisher says no, tough shit. The media ecosystem doesn't
         | exist for Big Tech to extract value from it!
         | 
         | "It would take a lot of effort to do it legally" is a pathetic
         | excuse for a company of Meta's size.
        
       ___________________________________________________________________
       (page generated 2025-01-20 23:01 UTC)