[HN Gopher] Authors Seek Meta's Torrent Client Logs and Seeding ...
___________________________________________________________________
Authors Seek Meta's Torrent Client Logs and Seeding Data in AI
Piracy Probe
Author : miki123211
Score : 38 points
Date : 2025-01-20 20:38 UTC (2 hours ago)
(HTM) web link (torrentfreak.com)
(TXT) w3m dump (torrentfreak.com)
| hnburnsy wrote:
| Wonder if Meta is running a one way Usenet host. Much better than
| torrents.
| LtdJorge wrote:
| The first rule of Usenet is: you do not talk about Usenet
| geor9e wrote:
| If it was meant to be kept secret, it probably shouldn't have
| been put on the AOL home portal in 1994
| spokaneplumb wrote:
| People breaking the first rule wasn't enough for me to crack
| into the scene. The weird two-paid-services thing required to
| use it effectively--a search service of some kind, and your
| actual content provider--and the jankiness of the software
| and sites involved were enough to get me to give up, after
| spending some money but making no meaningful progress toward
| pirating anything.
|
| I started my piracy journey on Napster. I've done all the
| other biggies. I've done off-the-beaten-path stuff like IRC
| piracy channels. Private trackers. I have a soft spot for
| Windowmaker and was dumb enough to run Gentoo so long that I
| got kinda good at the "scary" deep parts of Linux sysadmin. I
| can deal with fiddliness and allegedly-ugly UI.
|
| Usenet piracy defeated me.
| heroprotagonist wrote:
| What's the lesson, hire contractors?
| kevingadd wrote:
| It's possible their friends in government will make this all go
| away if they ask nicely enough.
| pixelpoet wrote:
| Would $1m suffice?
| https://www.bbc.com/news/articles/c8j9e1x9z2xo
| edoceo wrote:
| That's the ante; gotta place the next wager.
| plagiarist wrote:
| The best ROI for the money is probably purchasing a
| SCOTUS justice.
| moshegramovsky wrote:
| Yeah I had a Facebook account until today.
|
| This whole copyright thing reminds me of when Mark
| Zuckerberg was mad that someone posted photos of the interior
| of his house or something.
| FireBeyond wrote:
| Try to use any of the big players' models for training and see how
| quickly they remember how much they value copyright.
| WhatsName wrote:
| You mean OpenAI's infamous "you shall not train on the output of
| our model" clause?
| rockemsockem wrote:
| It seemed obvious to me for a long time before modern LLM
| training that any sort of training of machine intelligence would
| have to rely on pirated content. There's just no other viable
| alternative for efficiently acquiring large quantities of text
| data. Buying millions of ebooks online would take a lot of
| effort, downloading data from publishers isn't a thing that can
| be done efficiently (assuming tech companies negotiated and threw
| money at them), the only efficient way to access large volumes of
| media is piracy. The media ecosystem doesn't allow anything else.
| the-rc wrote:
| Google has scans from Google Books, as well as all the ebooks
| it sells on the Play Store.
| lemoncookiechip wrote:
| Wouldn't that still be piracy? They own the distribution rights,
| but do they (or Amazon) have the rights to use
| said books for LLM training? And what rights would those even
| be?
| diggan wrote:
| > There's just no other viable alternative for efficiently
| acquiring large quantities of text data. [...] take a lot of
| effort [...] isn't a thing that can be done efficiently [...]
| only efficient way to access large volumes of media is piracy
|
| Hypothetical: If the only way we could build AGI would be to
| somehow read everyone's brain at least once, would it be worth
| just ignoring everyone's wishes regarding privacy one time to
| suck up this data and have AGI moving forward?
| nosbo wrote:
| no
| BriggyDwiggs42 wrote:
| Could this AGI cure cancer, and would it be in the hands of
| the public? Then sure, otherwise nah.
| IncreasePosts wrote:
| Why would machine intelligence need an entire humanity's worth
| of data to be machine intelligence? It seems like only a
| training method that is really poor would need that much data.
| aithrowawaycomm wrote:
| I find it highly implausible that Meta doesn't have the
| resources to obtain these legally. They could have reached out
| to a publisher and asked to purchase ebooks in bulk, and if that
| publisher says no, tough shit. The media ecosystem doesn't
| exist for Big Tech to extract value from it!
|
| "It would take a lot of effort to do it legally" is a pathetic
| excuse for a company of Meta's size.
___________________________________________________________________
(page generated 2025-01-20 23:01 UTC)