[HN Gopher] Anthropic cut up millions of used books, and downloa...
       ___________________________________________________________________
        
       Anthropic cut up millions of used books, and downloaded 7M pirated
       ones - judge
        
       Author : pyman
       Score  : 365 points
       Date   : 2025-07-07 09:20 UTC (13 hours ago)
        
 (HTM) web link (www.businessinsider.com)
        
       | pyman wrote:
       | Anthropic's cofounder, Ben Mann, downloaded millions of copies of
       | books from Library Genesis in 2021, fully aware that the material
       | was pirated.
       | 
       | Stealing is stealing. Let's stop with the double standards.
        
         | damnesian wrote:
         | oh well, the product has a cute name and will make someone a
         | billionaire, let's just give it the green light. who cares
         | about copyright in the age of AI?
        
         | originalvichy wrote:
         | At least most pirates just consume for personal use. Profiting
         | from piracy is a whole other level beyond just pirating a book.
        
           | pyman wrote:
           | Someone on Twitter said: "Oh well, P2P mp3 downloads,
           | although illegal, made contributions to the music industry"
           | 
           | That's not what's happening here. People weren't downloading
           | music illegally and reselling it on Claude.ai. And while P2P
           | networks led to some great tech, there's no solid proof they
           | actually improved the music industry.
        
             | drcursor wrote:
             | Let's not forget Spotify ;)
             | 
             | https://gizmodo.com/early-spotify-was-built-on-pirated-
             | mp3-f...
        
               | pyman wrote:
               | Those claims were never proved.
        
             | Imustaskforhelp wrote:
             | I really feel as if Youtube is the best sort of convenience
             | for music videos where most people watch ads whereas some
             | people can use an ad blocker.
             | 
             | I use an adblocker and tbh I think so many people on HN are
             | okay with ad blocking and not piracy when basically both
             | just block the end user from earning money.
             | 
             | I kind of believe that if you really like a piece of
             | software, or really like something, just ask the authors
             | what their favourite charity is and donate there, or join
             | their Patreon as a direct way to support them.
        
               | Workaccount2 wrote:
               | If you are someone who can think clearly, it's extremely
               | obvious that the conversation around copyright, LLMs,
               | piracy, and ad-blocking is
               | 
               | "What serves me personally the best for any given
               | situation" for 95% of people.
        
               | timeon wrote:
               | I think the critique of this case is not about piracy in
               | itself but how these companies are treated by courts vs.
               | how individuals are treated.
        
           | mnky9800n wrote:
           | I feel like profit was always a central motive of pirates. At
           | least from the historical documents known as, "The Pirates of
           | the Caribbean".
        
           | KoolKat23 wrote:
           | This isn't really profiting from piracy. They don't make
           | money off the raw input data. It's no different to consuming
           | for personal use.
           | 
           | They make money off the model weights, which is fair use (as
           | confirmed by recent case law).
        
             | j_w wrote:
             | This is absurd. Remove all of the content from the training
             | data that was pirated and what is the quality of the end
             | product now?
        
               | pyman wrote:
               | With Claude, people are paying Anthropic to access
               | answers that are generated from pirated books, without
               | the authors' permission, credit, or compensation.
        
               | KoolKat23 wrote:
               | There is no copyright on knowledge.
               | 
               | If it outputs parts of the book verbatim then that's a
               | different story.
        
               | pyman wrote:
               | Let's not change the focus of the debate.
               | 
               | Pirating 7 million books, remixing their content, and
               | using that to power Claude.ai is like counterfeiting 7
               | million branded products and selling them on your
               | personal website. The original creators don't get credit
               | or payment, and someone's profiting off their work.
               | 
               | All this happens while authors, many of them teachers,
               | are left scratching their heads with four kids to feed.
        
               | KoolKat23 wrote:
               | That may be the case, but you'd have to have laws
               | changed.
        
               | SirMaster wrote:
               | >If it outputs parts of the book verbatim then that's a
               | different story.
               | 
               | But it does...
        
               | KoolKat23 wrote:
               | That's the law.
               | 
               | Please keep in mind, copyright is intended as a
               | compromise between benefit to society and to the
               | individual.
               | 
               | A thought experiment: what about students pirating
               | textbooks and applying that knowledge later on in their work?
        
               | j_w wrote:
               | When you say that's the law, as far as I'm aware a single
               | ruling by a lower court has been issued which upholds
               | that application. Hardly settled case law.
        
               | KoolKat23 wrote:
               | True, until then best to act as if it is the case.
               | 
               | In my opinion, it will be upheld.
               | 
               | Looking at what is stored and the manner in which it is
               | stored, it makes sense that it's fair use.
        
               | j_w wrote:
               | We're talking about a summary judgement issued that has
               | not yet been appealed. That doesn't make it "settled."
               | 
               | If "what is stored and the manner in which it is stored"
               | is intended to mean model weights, I'm not sure what
               | the argument is. The four fair use factors in no way
               | mention a storage medium for data, lossless or lossy.
               | 
               | (1) the purpose and character of the use, including
               | whether such use is of a commercial nature or is for
               | nonprofit educational purposes; (2) the nature of the
               | copyrighted work; (3) the amount and substantiality of
               | the portion used in relation to the copyrighted work as a
               | whole; and (4) the effect of the use upon the potential
               | market for or value of the copyrighted work.
               | 
               | In my opinion, this will likely see a supreme court
               | ruling by the end of the decade.
        
               | KoolKat23 wrote:
               | The use is to train an AI model.
               | 
               | A trillion parameter SOTA model is not substantially
               | comprised of the one copyrighted piece. (If it was a
               | Harry Potter model trained only on Harry Potter books
               | this would be a different story).
               | 
               | Embeddings are not copy paste.
               | 
               | The last point about market impact would be where they
               | make their argument but it's tenuous. It's not the
               | primary use of AI models and built-in prompts try to
               | avoid this, so it shouldn't be commonplace unless you're
               | jailbreaking the model, which most folk aren't.
        
           | mrcwinn wrote:
           | > At least most pirates just consume for personal use.
           | 
           | Easy for the pirate to say. Artists might argue their intent
           | was to trade compensation for one's personal enjoyment of the
           | work.
        
             | Workaccount2 wrote:
             | The gut punch of being a photographer selling your work on
             | display, someone walks by and lines up their phone to take
             | a perfect picture of your photograph, and then exclaims to
             | you "Your work is beautiful! I can't wait to print this out
             | and put it on my wall!"
        
             | jobs_throwaway wrote:
             | All the evidence shows that piracy is good for artists'
             | business. You make a good work, people are exposed to it
             | through piracy, and they end up buying more of your stuff
             | than they would otherwise. But keep crying about the
             | artist's plight
        
               | SketchySeaBeast wrote:
               | The way you've presented this, the evidence is just
               | "common sense", which isn't much evidence at all.
        
         | x3n0ph3n3 wrote:
         | Copyright infringement is not stealing.
        
           | 1oooqooq wrote:
           | actually, the only time it's a (ethical) crime is when a
           | corporation does it at scale for profit.
        
           | pyman wrote:
           | Pirating a book and selling it on claude.ai is stealing, both
           | legally and morally.
        
             | zb3 wrote:
             | Who got robbed? Just because I'd pay for AI it doesn't mean
             | I'd buy these books.
        
               | pyman wrote:
               | You should ask the teachers who spent years writing those
               | books.
        
               | azangru wrote:
               | You keep saying the word "teachers"; but that word does
               | not appear in the text of the article. Why focus on the
               | teachers in particular?
               | 
               | Also, there are various incentives for teachers to
               | publish books. Money is just one of them (I wonder how
               | much revenue books bring to the teachers). Prestige and
               | academic recognition is another. There are probably
               | others still. How realistic is the depiction of a
               | deprived teacher whose livelihood depended on the books
               | he published once every several years?
        
               | zb3 wrote:
               | I did not ask them to write those books, and I wouldn't
               | buy those.
        
             | BlackFly wrote:
             | Making a copy differs from taking an existing object in all
             | aspects: literally, technically, legally and ethically.
             | Piracy is making a copy you have no legal right to.
             | Stealing is taking a physical object that you have no legal
             | right to. While the "no legal right to" seems the same
             | superficially, in practice the laws differ quite a bit
             | because the literal, technical and ethical aspects differ.
        
             | TiredOfLife wrote:
             | They are not selling it on claude.ai. If you can prove that
             | they are you will be rich.
        
             | thedevilslawyer wrote:
             | Where can I download Harry Potter on claude.ai pls?
        
               | slater wrote:
               | Why would you want to download a shitty book?
        
           | seydor wrote:
           | property infringement isn't either?
        
             | eviks wrote:
             | If you infringe by destroying property, then yes, it's not
             | stealing
        
           | impossiblefork wrote:
           | It's very similar to theft of service.
           | 
           | There's so many texts, and they're so sparse that if I could
           | copyright a work and never publish it, the restriction would
           | be irrelevant. The probability that you would accidentally
           | come upon something close enough that copyright was relevant
           | is almost infinitesimal.
           | 
           | Because of this copyright is an incredibly weak restriction,
           | and that it is as weak as it is shows clearly that any use of
           | a copyrighted work is due to the convenience that it is
           | available.
           | 
           | That is, it's about making use of the work somebody else has
           | done, not about that restricting you somehow.
           | 
           | Therefore copyright is much more legitimate than ordinary
           | property. Ordinary property, especially ownership of land,
           | can actually limit other people. But since copyright is so
           | sparse, infringing on it is like going to a world with
           | near-infinite space and picking the precise place where
           | somebody has planted a field and deciding to harvest from
           | that particular field.
           | 
           | Consequently I think copyright infringement might actually be
           | worse than stealing.
        
             | jpalawaga wrote:
             | you've created a very obvious category mistake in your
             | final summary by confusing intellectual property--which can
             | be copied at no penalty to an owner (except nebulous
             | 'alternate universe' theories)--with actual property, and a
             | farmer and his land, with a crop that cannot be enjoyed
             | twice.
             | 
             | you're saying copying a book is worse than robbing a farmer
             | of his food and/or livelihood, which cannot be replaced or
             | duplicated. Meanwhile, someone who copies a book does not
             | deprive the author of selling the book again (or of the
             | tasty proceeds from a harvest).
             | 
             | I can't say I agree, for obvious reasons.
        
               | impossiblefork wrote:
               | With this special infinite-land-land though, what's
               | special about the farmer's land is that he's expended
               | energy to make it that way, just as the author has
               | expended energy to find his text.
               | 
               | Just as the farmer obtains his livelihood from the
               | investment-of-energy-to-raise-crops-to-energy cycle the
               | author has his livelihood by the investment-of-energy-to-
               | finding-a-useful-work-to-energy cycle.
               | 
               | So he is in fact robbed in a very similar way.
        
               | jpalawaga wrote:
               | You're saying that a copy of a digital thing is the same
               | as the "only" of a physical thing. But that's not true.
               | You can't sell grain twice, but you can sell a movie many
               | times (especially when you account for format changes,
               | remasterings, platform locks, licensing for special
               | usecases like remixing, broadcasts, etc).
               | 
               | You'd have to steal the author's ownership of the
               | intellectual property in order for the comparison to be
               | valid, just as you stole ownership of his crop.
               | 
               | Separately, there is a reason why theft and copyright
               | infringement are two distinct concepts in law.
        
               | impossiblefork wrote:
               | The difference here though is that the copyright holder
               | sustains himself by the sales of his particular chosen
               | text, so it doesn't matter that the text can be
               | reproduced infinitely.
        
             | CaptainFever wrote:
             | > Consequently I think copyright infringement might
             | actually be worse than stealing.
             | 
             | I remember when piracy wasn't theft, and information wanted
             | to be free.
        
               | impossiblefork wrote:
               | So do I, then I found this reasoning I presented in my
               | comment and realised that piracy was actually quite bad.
               | 
               | Ordinary property is much worse than copyright, which is
               | both time limited and not necessarily obtained through
               | work, and which is much more limited in availability than
               | the number of sequences.
               | 
               | When someone owns land, that's actually a place you
               | stumble upon and can't enter, whereas you're not going to
               | ever stumble upon the story of even 'Nasse hittar en
               | stol' (swedish 'Nasse finds a chair') a very short book
               | for very small children.
        
         | Der_Einzige wrote:
         | Information wants to be free.
        
           | troyvit wrote:
           | Then why does Claude cost money?
        
         | dathinab wrote:
         | stealing with the intent to gain an unfair market advantage, so
         | that you can effectively kill any company that acts ethically
         | and legally, in a way which is very likely going to hurt many
         | authors through the products you create, is far worse than just
         | stealing for personal use
         | 
         | that isn't "just" stealing, it's organized crime
        
         | 1970-01-01 wrote:
         | Let's get actual definitions of 'theft' before we leap into
         | double standards.
        
         | NoMoreNicksLeft wrote:
         | >Stealing is stealing.
         | 
         | Yes, but copying isn't stealing, because the person you "take"
         | from still has their copy.
         | 
         | If you're allowed to call copying _stealing_, then I should be
         | allowed to call hysterical copyright rabblerousing _rape_. Quit
         | being a rapist, pyman.
        
         | kube-system wrote:
         | > Stealing is stealing. Let's stop with the double standards.
         | 
         | I get the sentiment, but that statement as is, is absurdly
         | reductive. Details matter. Even if someone takes merchandise
         | from a store without paying, their sentence will vary depending
         | on the details.
        
       | neo__ wrote:
       | Hopefully they were all good books at least.
        
         | pyman wrote:
         | they pirated the best ones, according to the authors
        
       | pyman wrote:
       | These are the people shaping the future of AI? What happened to
       | all the ethical values they love to preach about?
       | 
       | We've held China accountable for counterfeiting products for
       | decades and regulated their exports. So why should Anthropic be
       | allowed to export their products and services after engaging in
       | the same illegal activity?
        
         | lofaszvanitt wrote:
         | This is the underlying caste system coming to life right before
         | your eyes :D.
        
           | stephenitis wrote:
           | I think caste system is the wrong analogy here.
           | 
           | Comment is more about the pseudo ethical high ground
        
             | MangoToupe wrote:
             | Companies being above the law does create a stratified
             | system in this country for those who can benefit from said
             | companies and those who cannot. Call it what you like.
        
         | seydor wrote:
         | break things and move fast
        
         | benjiro wrote:
         | One rule for you, one rule for me ...
         | 
         | You never noticed the hypocritical behavior all over society?
         | 
         | * Oh, you drunk drive: big fine, lots of trouble.
         | * Oh, you drunk drive and are a senator, cop, mayor, ...
         |   Well, let's look the other way.
         | 
         | * You have anger management issues and slam somebody to the
         |   ground. Jail time.
         | * You as a cop have anger management issues and slam somebody
         |   to the ground. Well, paid time off while we investigate and
         |   maybe a reprimand. Qualified immunity, boy!
         | 
         | * You commit tax fraud for 10k: felony record, maybe jail time.
         | * You as an exec of a company commit tax fraud for 100 million.
         |   After 10 years of lawyering around, maybe you get something,
         |   maybe ... oh, here is a fine of 5 million.
         | 
         | I am sorry, but the idea of everybody being equal under the law
         | has always been an illusion.
         | 
         | We are holding China accountable for counterfeiting products
         | because it hurts OUR companies and their income. But when it's
         | "us vs us", well, then it becomes a bit more messy and in
         | general, those with the biggest backing (as in $$$, economic
         | value, and lawyers) tend to win.
         | 
         | Wait, if somebody steals my book, I can sue that person in
         | court and get a payout (lawyers will cost me more, but that is
         | not the point). If some AI company steals my book, well, the
         | chance you win is close to 1%, simply because lots of well-paid
         | lawyers will make your winning hard to impossible.
         | 
         | Our society has always been based upon power, wealth and
         | influence. The more you have of it, the more you get away with
         | (or get reduced penalties for) things that get others fined or
         | jailed.
        
         | ffsm8 wrote:
         | > We've held China accountable for counterfeiting products for
         | decades and regulated their exports
         | 
         | We have? Are we from different multi-verses?
         | 
         | The one I've lived in to date has not done anything against
         | Chinese counterfeits beyond occasionally seizing counterfeit
         | goods during import. But that's merely occasionally enforcing
         | local counterfeit law, a far cry from punishing the entity
         | producing it.
         | 
         | As a matter of fact, the companies started outsourcing
         | _everything_ to China, making further IP theft and quasi-copies
         | even easier
        
           | Workaccount2 wrote:
           | I was gonna say, the enforcement is so weak that it's not
           | even really worth it to pursue consumer hardware here in the
           | US. Make a product that is a hit, patent it, and still 1 month
           | later IYTUOP will be selling an identical copy for 1/3rd the
           | price on Amazon.
        
             | delfinom wrote:
             | Patent enforcement requires the patent holder to go after
             | violators. The sad thing is, there are grounds to sue
             | Amazon for facilitating it; just nobody has had the money
             | to do it. And no big company ever will because of the
             | threat of being locked out of AWS.
             | 
             | It's quite the mafia operation over at Amazon.
        
         | wmf wrote:
         | The unethical ones didn't buy any books.
        
         | bmitc wrote:
         | Silicon Valley has always been the antithesis of ethics. Its
         | foundations are much more right wing and libertarian, along the
         | extremist lines.
        
         | carlosjobim wrote:
         | Why is it unethical of them to use the information in all these
         | books? They are clearly not reselling the books in any way,
         | shape, or form. The information itself in a book can never be
         | copyrighted. You can also publish and sell material where you
         | quote other books within it.
        
       | aaron695 wrote:
       | Good, this is what Aaron Swartz was fighting for.
       | 
       | Against companies like Elsevier locking up the world's knowledge.
       | 
       | Authors are no different to scientists, many had government
       | funding at one point, and it's the publishing companies that got
       | most of the sales.
       | 
       | You can disagree and think Aaron Swartz was evil, but you can't
       | have both.
       | 
       | You can take what Anthropic have shown you is possible and do this
       | yourself now.
       | 
       | isohunt: freedom of information
        
       | ramon156 wrote:
       | Pirate and pay the fine is probably a hell of a lot cheaper than
       | individually buying all these books. I'm not saying this is
       | justified, but what would you have done in their situation?
       | 
       | Saying "they have the money" is not an argument. It's about the
       | amount of effort that is needed to individually buy, scan,
       | process millions of pages. If that's done for you, why re-do it
       | all?
        
         | TimorousBestie wrote:
         | 150K per work is the maximum fine for willful infringement
         | (which this is).
         | 
         | 105B+ is more than Anthropic is worth on paper.
         | 
         | Of course they're not going to be charged to the fullest extent
         | of the law, they're not a teenager running Napster in the early
         | 2000s.
        
           | voxic11 wrote:
           | Even if they don't qualify for willful infringement damages
           | (let's say they have a good faith belief their infringement
           | was covered by fair use) the standard statutory damages for
           | copyright infringement are $750-$30,000 per work.
        
           | eikenberry wrote:
           | Plus they did it with a profit motive which would entail
           | criminal proceedings.
        
           | dragonwriter wrote:
           | > 150K per work is the maximum fine for willful infringement
           | 
           | No, it's not.
           | 
           | It's the maximum _statutory damages_ for willful
           | infringement, which this _has not_ been adjudicated to be. It
           | is not a fine; it's an alternative basis of recovery to
           | actual damages + infringer's profits attributable to the
           | infringement.
           | 
           | Of course, there's also a very wide range of statutory
           | damages, the minimum (if it is not "innocent" infringement)
           | is $750/work.
           | 
           | > 105B+ is more than Anthropic is worth on paper.
           | 
           | The actual amount of 7 million works times $150,000/work is
           | $1.05 trillion, not $105 billion.
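
The arithmetic in the comment above can be checked with a short sketch. The 7 million count and the per-work amounts ($750, $30,000, $150,000) are the figures quoted in this thread; the function and constant names are illustrative, and the sketch assumes, purely for illustration, that every work receives the same award, which no court has decided.

```python
# Figures quoted in the thread; names here are illustrative only.
WORKS = 7_000_000
PER_WORK_USD = {
    "minimum": 750,
    "ordinary maximum": 30_000,
    "willful maximum": 150_000,
}


def total_statutory_damages(works: int, per_work_usd: int) -> int:
    """Total if every work received the same per-work award (an assumption)."""
    return works * per_work_usd


def format_usd(amount: int) -> str:
    """Render a dollar amount in billions or trillions for readability."""
    if amount >= 10**12:
        return f"${amount / 10**12:.2f} trillion"
    if amount >= 10**9:
        return f"${amount / 10**9:.2f} billion"
    return f"${amount:,}"


if __name__ == "__main__":
    for label, per_work in PER_WORK_USD.items():
        total = total_statutory_damages(WORKS, per_work)
        print(f"{label}: {format_usd(total)}")
```

Under that uniform-award assumption, only the top of the willful range reaches the trillion-dollar figure; the other two rates stay in the billions.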
        
             | TimorousBestie wrote:
             | > It's the maximum statutory damages for willful
             | infringement, which this has not been adjudicated to be. It
             | is not a fine; it's an alternative basis of recovery to
             | actual damages + infringer's profits attributable to the
             | infringement.
             | 
             | Yeah, you're probably right, I'm not a lawyer. The point is
             | that it doesn't matter what number the law says they should
             | pay, Anthropic can afford real lawyers and will therefore
             | only pay a pittance, if anything.
             | 
             | I'm old enough to remember what the feds did to Aaron
             | Swartz, and I don't see what Anthropic did that was so
             | different, ethically speaking.
        
         | pyman wrote:
         | The problem with this thinking is that hundreds of thousands of
         | teachers who spent years writing great, useful books and
         | sharing knowledge and wisdom probably won't sue a billion
         | dollar company for stealing their work. What they'll likely do
         | is stop writing altogether.
         | 
         | I'm against Anthropic stealing teachers' work and discouraging
         | them from ever writing again. Some teachers are already saying
         | this (though probably not in California).
        
           | lofaszvanitt wrote:
           | They won't be needed anymore, once singularity is reached.
           | This might be their thought process. This also exemplifies
           | that the loathed caste system found in India is indeed in
           | place in western societies.
           | 
           | There is no equality, and seemingly there are worker bees who
           | can be exploited, and there are privileged ones, and of
           | course there are the queens.
        
             | pyman wrote:
             | :D
             | 
             | Note: My definition of singularity isn't the one they use
             | in San Francisco. It's the moment founders who stole the
             | life's work of thousands of teachers finally go to prison,
             | and their datacentres get seized.
        
               | lofaszvanitt wrote:
               | You can bet that this is never gonna happen...
        
               | covercash wrote:
               | When the rich and powerful face zero consequences for
               | breaking laws and ignoring the social contracts that keep
               | our society functioning, you wind up with extreme
               | overcorrections. See Luigi.
        
               | achierius wrote:
               | How extreme is that, really? Not to justify murder: that
               | is clearly bad. But "killing one man" is evidently
               | something we, as a society, consider an "acceptable side-
               | effect" when a corporation does it -- hell, you can kill
               | thousands and get away scot-free if you're big enough.
               | 
               | Luigi was peanuts in comparison.
               | 
               | "THERE were two "Reigns of Terror," if we would but
               | remember it and consider it; the one wrought murder in
               | hot passion, the other in heartless cold blood; the one
               | lasted mere months, the other had lasted a thousand
               | years; the one inflicted death upon ten thousand persons,
               | the other upon a hundred millions; but our shudders are
               | all for the "horrors" of the minor Terror, the momentary
               | Terror, so to speak; whereas, what is the horror of swift
               | death by the axe, compared with lifelong death from
               | hunger, cold, insult, cruelty, and heart-break? What is
               | swift death by lightning compared with death by slow fire
               | at the stake? A city cemetery could contain the coffins
               | filled by that brief Terror which we have all been so
               | diligently taught to shiver at and mourn over; but all
               | France could hardly contain the coffins filled by that
               | older and real Terror--that unspeakably bitter and awful
               | Terror which none of us has been taught to see in its
               | vastness or pity as it deserves."
               | 
               | - Mark Twain
        
             | SketchySeaBeast wrote:
             | > They won't be needed anymore, once singularity is
             | reached.
             | 
             | And it just so happens that that belief says they can burn
             | whatever they want down because something in the future
             | might happen that absolves them of those crimes.
        
           | CuriouslyC wrote:
           | If you care so little about writing that AI puts you off it,
           | TBH you're probably not a great writer anyhow.
           | 
           | Writers that have an authentic human voice and help people
           | think about things in a new way will be fine for a while yet.
        
             | 4b11b4 wrote:
             | Yeah, people will still want to write. They might need new
             | ways to monetize it... that being said, even if people
             | still want to write they may not consider it a viable path.
             | Again, have to consider other monetization.
        
           | glimshe wrote:
           | That will be sad, although there will still be plenty of
           | great people who will write books anyway.
           | 
           | When it comes to a lot of these teachers, I'll say, copyright
           | works hand in hand with college and school course book
           | mandates. I've seen _plenty_ of teachers making crazy money
           | off students' backs due to these mandates.
           | 
           | A lot of the content taught in undergrad and school hasn't
           | changed in decades or even centuries. I think we have all the
           | books we'll ever need in certain subjects already, but
           | copyright keeps enriching people who write new versions of
           | these.
        
           | NoMoreNicksLeft wrote:
           | Stealing? In what way?
           | 
           | Training a generative model on a book is the mechanical
           | equivalent of having a human read the book and learn from it.
           | Is it stealing if a person reads the book and learns from it?
        
             | blocko wrote:
             | Depends on how closely that person can reproduce the
             | original work without license or attribution
        
               | lcnPylGDnU4H9OF wrote:
               | It actually depends on whether or not they reproduce it
               | and especially what they do with the copy after making
               | it.
        
           | js8 wrote:
           | > The problem with this thinking is that hundreds of
           | thousands of teachers who spent years writing great, useful
           | books and sharing knowledge and wisdom probably won't sue a
           | billion dollar company for stealing their work. What they'll
           | likely do is stop writing altogether.
           | 
           | I think this is a fantasy. My father cowrote a Springer book
           | about physics. For the effort, he got like $400 and 6 author
           | copies.
           | 
           | Now, you might say he got a bad deal (or the book was bad),
           | but I don't think hundreds of thousands of authors do
           | significantly better. The reality is, people overwhelmingly
           | write because they want to, not because of money.
        
         | glimshe wrote:
         | Isn't "pirating" a felony with jail time, though? That's what I
         | remember from the FBI warning I had to see at the beginning of
         | every DVD I bought (but not "pirated" ones).
        
           | voxic11 wrote:
           | Yes criminal copyright infringement (willful copyright
           | infringement done for commercial gain or at a large scale) is
           | a felony.
        
         | kevingadd wrote:
         | Google did it the legal way with Google Books, didn't they?
        
         | maeln wrote:
         | If you wanted to be legit with 0 chance of going to court, you
         | would contact publishers and ask to pay for a license to
         | access their catalog for training, and negotiate from there.
         | 
         | This is what every company using media is doing (think
         | Spotify, Netflix, but also journals, ad agencies, ...). I don't
         | know why people on HN are giving a pass to AI companies for
         | this kind of behavior.
        
           | ohashi wrote:
           | Because they are mostly software developers who think it's
           | different because it impacts them.
        
           | CaptainFever wrote:
           | > I don't know why people on HN are giving a pass to AI
           | companies for this kind of behavior.
           | 
           | As mentioned in The Fucking Article, there's a legal
           | difference between training an AI which largely doesn't
           | repeat things verbatim (a la Anthropic) and redistributing
           | media as a whole (a la Spotify, Netflix, journals, ad agencies).
        
         | suyjuris wrote:
         | Just downloading them is of course cheaper, but it is worth
         | pointing out that, as the article states, they did also buy
         | legitimate copies of millions of books. (This includes all the
         | books involved in the lawsuit.) Based on the judgement itself,
         | Anthropic appears to train only on the books legitimately
         | acquired. Used books are quite cheap, after all, and can be
         | bought in bulk.
        
           | asadotzler wrote:
           | Buying a book is not a license to re-sell that content for
           | your own profit. I can't buy a copy of your book, make a
           | million Xeroxes of it and sell those. The license you get when
           | you buy a book is for a single use, not a license to do
           | whatever you want with the contents of that book.
        
             | thedevilslawyer wrote:
             | What are you on about - the judge has literally said this
             | was not reselling, and is transformative and fair use.
        
             | suyjuris wrote:
             | Yes, of course! In this case, the judge identified three
             | separate instances of copying: (1) downloading books
             | without authorisation to add to their internal library, (2)
             | scanning legitimately purchased books to add to their
             | internal library, and (3) taking data from their internal
             | library for the purposes of training LLMs. The purchasing
             | part is only relevant for (2) -- there the judge ruled that
             | this is fair use. This makes a lot of sense to me, since no
             | additional copies were created (they destroyed the physical
             | books after scanning), so this is just a single use, as you
             | say. The judge also ruled that (3) is fair use, but for a
             | different reason. (They declined to decide whether (1) is
             | fair use at this point, deferring to a later trial.)
        
         | darkoob12 wrote:
         | This is not about paying for a single copy. It would still be
         | wrong even if they have bought every single one of those books.
         | It is a form of plagiarism. The model will use someone else's
         | idea without proper attribution.
        
           | jeroenhd wrote:
           | Legally speaking, we don't know that yet. Early signs are
           | pointing at judges allowing this kind of crap because it's
           | almost impossible for most authors to point out what part of
           | the generated slop was originally theirs.
        
         | tmaly wrote:
         | At minimum they should have to buy the book they are deriving
         | weights from.
        
           | SirMaster wrote:
           | But should the purchase be like a personal license? Or like a
           | commercial license that costs way more?
           | 
           | Because for example if you buy a movie on disc, that's a
           | personal license and you can watch it yourself at home. But
           | you can't, like, play it at a large public venue that sells
           | tickets to watch it. You need a different and more expensive
           | license to make money off the usage of the content in a
           | larger capacity like that.
        
         | bmitc wrote:
         | > I'm not saying this is justified, but what would you have
         | done in their situation?
         | 
         | Individuals would have their lives ruined either from massive
         | fines or jail time.
        
         | blibble wrote:
         | > Pirate and pay the fine is probably hell of a lot cheaper
         | than individually buying all these books.
         | 
         | $500,000 per infringement...
        
           | jandrese wrote:
           | And the crazy thing is that might be cheaper when you
           | consider the alternative is to have your lawyers negotiate
           | with the lawyers for the publishing companies for the right
           | to use the works as training data. Not only is it many many
           | billable hours just to draw up the contract, but you can be
           | sure that many companies would either not play ball or set
           | extremely high rates. Finally, if the publishing companies
           | did bring a suit against Anthropic they might be asked to
           | prove each case of infringement, basically to show that a
           | specific work was used in training, which might be difficult
           | since you can't reverse a model to get the inputs. When
           | you're a billion dollar company it's much easier to get the
           | courts to take your side. This isn't like the music companies
           | suing teenagers who had a Kazaa account.
        
       | tliltocatl wrote:
       | If the AI movement manages to undermine Imaginary Property, it
       | will redeem its externalities threefold.
        
         | 57473m3n7Fur7h3 wrote:
         | I don't think that's gonna happen. I think they will manage to
         | get themselves out of trouble for it, while the rest of us will
         | still face serious problems if we are caught torrenting even
         | one singular little book.
        
           | tliltocatl wrote:
           | Even so, would be hard to prove that this particular little
           | book wasn't generated by Claude (oopsie, it happens to be a
           | verbatim copy of a copyrighted work, that happens sometimes,
           | those pesky LLMs).
        
             | pyman wrote:
             | You just need to audit their system. Shouldn't take more
             | than a couple of hours.
        
           | 2OEH8eoCRo0 wrote:
           | The Ocean Full of Bowling Balls
        
           | CaptainFever wrote:
           | It's already quite widespread and likely legal for average
           | people to train AI models on copyrighted material, in the
           | open weight AI communities like SD and LocalLLaMa.
           | 
           | Please, please differentiate between pirating books (which
           | Anthropic is liable for, and is still illegal) and training
           | on copyrighted material (which was found to be legal, for
           | both corporations and average people).
        
         | ttoinou wrote:
         | It would be great, but I think some are worried that new AI
         | BigTech will find a way to continue enforcing IP on the rest of
         | society while it won't exist for them
        
           | Imustaskforhelp wrote:
           | I think that we are worried because I think that's exactly
           | what's going to happen/ is happening.
        
         | karel-3d wrote:
         | That would render GPL and friends redundant too... copyleft
         | depends on copyright.
        
           | CaptainFever wrote:
           | Copyleft nullifies copyright. Abolishing copyright and adding
           | right to repair laws (mandatory source files) would give the
           | same effect as everyone using copylefted licenses.
        
         | bayindirh wrote:
         | What are your feelings about how the small fish are stripped
         | of their art, and their years of work become just a prompt?
         | Mainly comic artists and small musicians who are doing things
         | they like and putting them out for people, but not for much
         | money?
        
           | tliltocatl wrote:
           | "But think about the children". The copyright system is doing
           | too much damage to culture and society. Yes, it does provide
           | a pond for some small fish, but the overall damage outweighs
           | this. Likewise, the fact that the first estate provided
           | sustenance for arts and crafts to flourish doesn't make the
           | ancien regime any less screwed up.
        
             | bayindirh wrote:
             | I think I worded my question wrong. I was asking not about
             | how AI affects the financials of these smaller artists, but
             | about their wellbeing in general.
             | 
             | There are many small artists who do this not for money, but
             | for fun, and have their renowned styles. Even their styles
             | are ripped off by these generative AI companies and turned
             | into a slot machine to earn money for themselves. These
             | artists didn't consent to that, and this affects their
             | (mental) well-being.
             | 
             | With that context in mind, what do you think about these
             | people, who are not in this for money, being stripped of
             | their years of achievement and having their hard work
             | exploited for money by generative AI companies?
             | 
             | It's not about IP (with whatever expansion you prefer) or
             | laws, but ethics in general.
             | 
             | Substitute comics for any medium. Code, music, painting,
             | illustration, literature, short movies, etc.
        
               | CamperBob2 wrote:
               | (Shrug) If you want things to stay the same, both art and
               | technology are bad career choices.
        
               | bayindirh wrote:
               | (Huh) What if you are in the field to advance it, and
               | somebody steals your work and claims it as their own?
               | 
               | e.g.: https://news.ycombinator.com/item?id=44460552
        
               | CamperBob2 wrote:
               | Bummer
        
               | tliltocatl wrote:
               | I see your point, "AI art" sucks in general and this is
               | ethically sketchy as hell, but AFAIK style copying has
               | never been covered by copyright in the first place. Yeah,
               | it sucks to be alienated from your works. That's one of
               | the externalities I mentioned in the original comment. But
               | there is simply no remedy there. That's how the reality
               | is.
        
               | bayindirh wrote:
               | Thanks for your answer, and for taking the time to write
               | it!
               | 
               | Yes, style copying is generally considered legal, but as
               | another commenter posted in a related thread "scale
               | matters".
               | 
               | Maybe this will be reconsidered in the near future, as the
               | scale is on a very different level with Generative AI.
               | While there can be no technological solution to this
               | (since it's a social problem to begin with), maybe public
               | opinion about this issue will evolve over time.
               | 
               | To be crystal clear: I'm not against the tech. I'm
               | against abusing and exploiting people for solely monetary
               | profit.
        
               | frozenseven wrote:
               | (1) You can't copyright an art style. That's not a thing.
               | 
               | (2) Once you make something publicly available, anyone
               | can learn from it. No consent necessary.
               | 
               | (3) Being upset does not grant you special privileges
               | under the law.
               | 
               | (4) If you don't like the idea of paying for AI art, free
               | software is both plentiful and competitive with just
               | about anything proprietary.
        
           | protocolture wrote:
           | >Mainly comic artists and small musicians who are doing
           | things they like and putting out for people, but not for much
           | money?
           | 
           | The number of these artists I have seen receiving some bogus
           | DMCA takedown notice for fan art is crazy.
           | 
           | I saw a bloke give away some of his STLs because he received
           | a takedown request from Games Workshop and didn't have the
           | funds to fight it.
           | 
           | It's not that I want small artists to lose, it's that I want
           | them to gain access to every bloody copyright and trademark
           | so they are more free to create.
           | 
           | Shit, Conde Nast managed to pull something like 400 pulps off
           | the market so they didn't interfere with their newly launched
           | James Patterson collaborations.
        
         | pxc wrote:
         | It's true that intellectual property is a flawed and harmful
         | mechanism for supporting creative work, and it needs to change,
         | but I don't think ensuring a positive outcome is as simple as
         | this. Whether or not such a power struggle between corporate
         | interests benefits the public rather than just some companies
         | will be largely accidental.
         | 
         | I do support intellectual property reform that would be
         | considered radical by some, as I imagine you do. But my highest
         | hopes for this situation are more modest: if AI companies are
         | told that their data must be in the public domain to train
         | against, we will finally have a powerful faction among
         | capitalists with a strong incentive to push back against the
         | copyright monopolists when it comes to the continuous renewal
         | of copyright terms.
         | 
         | If the "path of least resistance" for companies like Google,
         | Microsoft, and Meta becomes enlarging the public domain, we
         | might finally begin to address the stagnation of the public
         | domain, and that could be a good thing.
         | 
         | But I think even such a modest hope as that one is unlikely to
         | be realized. :-\
        
         | Der_Einzige wrote:
         | Yup.
         | 
         | My response to this whole thread is just "good"
         | 
         | Aaron Swartz is a saint and a martyr.
        
         | LtWorf wrote:
         | It will undermine it only for the rich owner of AI companies,
         | not for everyone.
        
       | Lionga wrote:
       | Based on the fact people went to jail for downloading some music
       | or movies, this guy will face a lifetime in prison for 7 million
       | books that he then used for commercial profit, right?
       | 
       | Right, guys? We don't have rules for thee but not for me in the
       | land of the free?
        
       | 1oooqooq wrote:
       | Aaron Swartz rolling
        
         | pyman wrote:
         | He downloaded millions of academic articles and the government
         | charged him with multiple felonies.
         | 
         | The difference is, Aaron Swartz wasn't planning to build
         | massive datacenters with expensive Nvidia servers all over the
         | world.
        
           | mikewarot wrote:
           | >the government charged him with multiple felonies.
           | 
           | This was the result of a cruel and zealous overreach by the
           | prosecutor to try to advance her political career. It should
           | never have gone that far.
           | 
           | The failure of MIT to rally in support of Aaron will never be
           | forgiven.
        
             | pyman wrote:
             | I agree
        
           | omnimus wrote:
           | It's even worse considering all he downloaded was in the
           | public domain, so it was much less problematic as far as
           | copyright goes.
           | 
           | The lesson is simple. If you want to break a law, make sure it
           | is very profitable, because then you can find investors and
           | get away with it. If you play Robin Hood you will be met with
           | a hammer.
        
       | dandanua wrote:
       | Meta did the same, and probably other big companies too. People
       | who praise AGI are very short-sighted. It will ruin the world with
       | our current morals and ethics. It's like a nuclear weapon in the
       | hands of barbarians (shit, we have that too, actually).
        
       | booleandilemma wrote:
       | So if I'm working on an LLM can I just steal millions of
       | copyrighted books? Is that how our farcical justice system works?
        
         | famahar wrote:
         | Make sure you have a few billion dollars ready so you can pay a
         | few million on the lawsuits. A volcano getting a cup of water
         | poured into it.
        
       | marapuru wrote:
       | Apparently it's a common business practice. Spotify (even though
       | I can't find any proof) seems to have built their software and
       | business on pirated music. There is some more in this article
       | [0].
       | 
       | [0] https://torrentfreak.com/spotifys-beta-used-pirate-mp3-files...
       | 
       | Funky quote:
       | 
       | > Rumors that early versions of Spotify used 'pirate' MP3s have
       | been floating around the Internet for years. People who had
       | access to the service in the beginning later reported downloading
       | tracks that contained 'Scene' labeling, tags, and formats, which
       | are the tell-tale signs that content hadn't been obtained
       | officially.
        
         | motbus3 wrote:
         | They had a second company (whose name I don't remember)
         | that allowed users to back up and share their music. When they
         | were exposed they buried that as deep as they could
        
           | pyman wrote:
           | No. There's no credible evidence Spotify had any secret
           | second company that allowed users to back up and share music
           | without authorisation
        
         | pyman wrote:
         | It was the opposite. Their mission was to combat music piracy
         | by offering a better, legal alternative.
         | 
         | Daniel Ek said: "my mission is to make music accessible and
         | legal to everyone, while ensuring artists and rights holders
         | got paid"
         | 
         | Also, the Swedish government has zero tolerance for piracy.
        
           | pyman wrote:
           | I know this might come as a shock to those living in San
           | Francisco, but things are different in other parts of the
           | world, like Uruguay, Sweden and the rest of Europe. From what
           | I've read, the European committee actually cares about
           | enforcing the law.
        
           | eviks wrote:
           | Mission is just words, they can _mean_ the opposite of deeds,
           | but they can't _be_ the opposite, they live in different
           | realms.
        
         | KoolKat23 wrote:
         | There's plenty of startups gone legitimate.
         | 
         | Society underestimates the chasm that exists between an idea
         | and raising sufficient capital to act on those ideas.
         | 
         | Plenty of people have ideas.
         | 
         | We only really see those that successfully cross it.
         | 
         | Small things: EULA breaches, consumer licenses being used
         | commercially, for example.
        
           | pyman wrote:
           | There's no credible evidence Spotify built their company and
           | business on pirated music.
           | 
           | This is a narrative that gets passed around in certain
           | circles to justify stealing content.
        
             | YPPH wrote:
             | "Stealing" isn't an apt term here. Stealing a thing
             | permanently deprives the owner of the thing. What you're
             | describing is copyright infringement, not stealing.
             | 
             | In this context, stealing is often used as a pejorative
             | term to make piracy sound worse than it is. Except for mass
             | distribution, piracy is often regarded as a civil wrong,
             | and not a crime.
        
               | KoolKat23 wrote:
               | Best/most succinct explanation I've seen to date.
        
               | bumby wrote:
               | I think you make a good point, but there is some irony in
               | pointing out the distinction between colloquial and legal
               | use of the term "stealing" while also misusing the term
               | "piracy" to describe legal matters.
               | 
               | It would be more clear if you stick to either legal or
               | colloquial variants, instead of switching back and forth.
               | (Tbf, the judge in this case also used the term "piracy"
               | colloquially).
        
               | seadan83 wrote:
               | Then it is not possible to 'steal' an idea? Afaik 'to
               | steal' is simply to take without permission. If the thing
               | is abstract, then you might not have deprived the
               | original owner of that thing. If the thing is a physical
               | object, then the implication is you now have physical
               | possession (in which case your definition seemingly
               | holds)
               | 
               | _edit/addendum_: considering this a bit more - the
               | extent to which the original party is deprived of the
               | stolen thing is pertinent for awarding damages. For
               | example, imagine a small entity stealing from a large
               | one, like a small creator stealing the Dungeons & Dragons
               | rules. That doesn't deprive Hasbro of DnD, but it is
               | still theft (we're assuming a verbatim copy here lifted
               | directly from DnD books)
               | 
               | The example that I was pondering was shows in Russia
               | that were almost literally "the sampsons." Did that stop
               | The Simpsons from airing in the US, its primary market?
               | No, but it was still theft, something was taken without
               | permission.
        
             | lmm wrote:
             | > There's no credible evidence Spotify built their company
             | and business on pirated music.
             | 
             | That's a statement carefully crafted to be impossible to
             | disprove. Of course they shipped pirated music (I've seen
             | the files). Of course anyone paying attention knew. Nothing
             | in the music industry was "clean" in those days. But, sure,
             | no credible evidence because any evidence anyone shows you
             | you'll decide is not credible. It's not in anyone's
             | interests to say anything and none of it matters.
        
           | hinterlands wrote:
           | The problem is that these "small things" are not necessarily
           | small if you're an individual.
           | 
           | If you're an individual pirating software or media, then from
           | the rights owners' perspective, the most rational thing to do
           | is to make an example of you. It doesn't happen every day, but
           | it _does_ happen and it can destroy lives.
           | 
           | If you're a corporation doing the same, the calculation is
           | different. If you're small but growing, future revenues are
           | worth more than the money that can be extracted out of you
           | right now, so you might get a legal nastygram _with an offer
           | of a reasonable payment to bring you into compliance_. And if
           | you're already big enough to be scary, litigation might be
           | just too expensive to the other side even if you answer the
           | letter with "lol, get lost".
           | 
           | Even in the worst case - if Anthropic loses and the company
           | is fined or even shuttered (unlikely) - the people who
           | participated in it are not going to be personally liable and
           | they've in all likelihood already profited immensely.
        
             | KoolKat23 wrote:
             | I agree, that was the point I was trying to make. It seems
             | small but until the business is up and running at
             | sufficient scale, the costs can be insurmountable.
             | 
             | And the system set up by society doesn't truly account for
             | this or care.
        
           | dathinab wrote:
           | but it's not some small things
           | 
           | but systematic, widespread big things and often many of them,
           | giving US giants an unfair competitive advantage
           | 
           | and don't think if you are an EU company you can do the same
           | in the US, nope nope
           | 
           | but naturally the US insists that US companies can do that in
           | the EU and complains every time a US company is fined for not
           | complying with EU law
        
           | Barrin92 wrote:
           | >Society underestimates the chasm that exists between an idea
           | and raising sufficient capital to act on those ideas.
           | 
           | The AI sector, famously known for its inability to raise
           | funding. Anthropic has in the last four years raised 17
           | billion dollars
        
             | KoolKat23 wrote:
             | Only once chatgpt 3.5 was released...
             | 
             | Other industries do not have it this easy.
        
           | jowea wrote:
           | Uber
        
         | pjc50 wrote:
         | "recording obtained unofficially" and "doesn't have rights to
         | the recording" are separate things. So they could well have got
         | a license to stream a publisher's music but that didn't come
         | with an actual copy of some/all of the music.
        
         | techjamie wrote:
         | Crunchyroll was originally an anime piracy site that went legit
         | and started actually licensing content later. They started in
         | mid-2006, got VC funding in 2008, then made their first
         | licensing deal in 2009.
         | 
         | https://www.forbes.com/2009/08/04/online-anime-video-technol...
         | 
         | https://venturebeat.com/business/crunchyroll-for-pirated-ani...
        
           | haiku2077 wrote:
           | Good Old Games started out with the founders selling pirated
           | games on disc at local markets.
        
             | techjamie wrote:
             | Pirated games translated to Polish if possible, because
             | game devs weren't catering to the market with translations,
             | and Poland didn't respect foreign copyright.
        
           | Cyph0n wrote:
           | Yep, they were huge too - virtually anyone who watched free
           | anime back then would have known about them.
           | 
           | My theory is that once they saw how much traffic they were
           | getting, they realized how big of a market (subbed/dubbed)
           | anime was.
        
           | Shank wrote:
           | And now Crunchyroll is owned by (through a lot of companies,
           | like Aniplex of America, Aniplex, A1 Pictures) Sony, who
           | produces a large amount of anime!
        
         | dathinab wrote:
         | Not just Spotify: pretty much any (most?) current tech giant
         | was built by
         | 
         | - riding a wave of change
         | 
         | - not caring too much about legal constraints (or, as they
         | would say now, "disrupting" the market, which very often means
         | doing illegal things that bring them far more money than any
         | penalties they will ever face for them)
         | 
         | - not caring too much about ethics either
         | 
         | - and in recent years (starting with Amazon) a lot of
         | technically illegal financing: undercutting competitors'
         | prices long term using money from elsewhere (e.g. investors)
         | is an unfair competitive advantage that is (theoretically)
         | clearly not allowed by anti-monopoly laws. And before that
         | there were often other monopoly issues (e.g. see Wintel).
         | 
         | So yes, systematically not complying with the law to gain an
         | unfair competitive advantage, knowing that many of the laws
         | are in the larger picture toothless when applied to huge
         | companies, is the bread-and-butter work of US tech giants.
        
           | benced wrote:
           | As you point out, they mostly did this before they were large
           | companies (where the public choice questions are less
           | problematic). Seems like the breaking of these laws was good
           | for everybody.
        
             | FirmwareBurner wrote:
             | _> Seems like the breaking of these laws was good for
             | everybody._
             | 
             | Are all music creators better off now than before Spotify?
        
               | megaman821 wrote:
               | The music pie is bigger now but it is split between more
               | people. Spotify brings in the most revenue for musicians
               | as a whole.
        
               | oblio wrote:
               | Is that why the biggest source of income for musicians
               | these days is live shows? Streaming basically killed
               | recording income for 99.9999% of musicians.
        
               | dathinab wrote:
               | Yeah, but like another post said it killed a lot of other
               | income streams.
               | 
               | And Spotify is a bad example, as it ran into another
               | pseudo-monopoly with very unreasonable/unhealthy power
               | (the few large music labels holding the rights to the
               | majority of mainstream music).
               | 
               | They pretty much forced very bad terms onto Spotify which
               | is to some degree why Spotify is pushing podcasts, as
               | they can't be long term profitable with Music (raising
               | prices doesn't help if the issue is a percent cut which
               | rises too :/ )
        
             | dathinab wrote:
             | They were already big when they systematically broke these
             | laws.
             | 
             | Breaking these laws is what lifted them from big to super
             | market-dominant, to the point where they have monopoly-like
             | power.
             | 
             | That is _never_ good for everyone, or even good for the
             | majority long term.
             | 
             | What is good for everyone (but a few rich people and
             | sometimes the US government) is proper fair competition. It
             | drives down prices and allows people to vote with their
             | money. It is a cornerstone of the American dream, it
             | pushes innovation, and it makes sure a country isn't left
             | behind. Monopoly-like companies, on the other hand, tend to
             | have exactly the opposite effect: higher prices (long
             | term), corruption, stagnating innovation, and a completely
             | shattered American dream, which sounds pretty bad for the
             | majority of Americans.
        
         | Workaccount2 wrote:
         | The common meme that megacorps are shamelessly criminalistic
         | organizations that get away with doing anything they can to
         | maximize profits, while true in some regard, _totally pales in
         | comparison to the illegal things small businesses and start-ups
         | do_.
        
         | reaperducer wrote:
         | _Apparently it's a common business practice._
         | 
         | It's not a common business practice. That's why it's considered
         | newsworthy.
         | 
         | People on the internet have forgotten that the news doesn't
         | report everyday, normal, common things, or it would be nothing
         | but a listing of people mowing their lawns or applying for
         | business loans. The reason something is in the news is because
         | it is unusual or remarkable.
         | 
         | "I saw it online, so it must happen all the time" is a dopey
         | lack of logic that infects society.
        
           | marapuru wrote:
           | You are right on that. I'll edit my post to reflect that.
           | 
           | Edit: Apologies, I can't edit it anymore.
        
         | lysace wrote:
         | You are missing the point. Spotify had permission from the
         | copyright holders and/or their national proxies to use those
         | songs in a limited beta in Sweden. They didn't have access to
         | clean audio data directly from the record companies, so in many
         | cases they used pirated rips instead.
         | 
         | What you really should be asking is whether they infringed on
         | the copyrights of the rippers. /s
        
         | pembrook wrote:
         | It wasn't just the content being pirated, but the early Spotify
         | UI was actually a 1:1 copy of Limewire.
        
         | NoMoreNicksLeft wrote:
         | This isn't as meaningful as it sounds. Nintendo was apparently
         | using scene roms for one of the official emulators on Wii (I
         | think?). Spotify might have received legally-obtained mp3s from
         | the record companies that were originally pulled from Napster
         | or whatever, because the people who work for record companies
         | are lazy hypocrites.
        
         | cmiles74 wrote:
         | Google Music originally let people upload their own digital
         | music files. The argument at the time was that whether or not
         | the files were legally obtained was not Google's problem. I
         | believe Amazon had a similar service.
         | 
         | https://www.computerworld.com/article/1447323/google-reporte...
        
       | motbus3 wrote:
       | It is shocking how courts have been ruling in favor of AI
       | companies despite the obvious problem of allowing automatic
       | plagiarism
        
         | jobs_throwaway wrote:
         | Information wants to be free
        
           | NoOn3 wrote:
           | Then why do they sell their services instead of putting the
           | model in open source?
        
         | kristofferR wrote:
         | Not really, plagiarism is not a legal concept.
        
       | Kim_Bruning wrote:
       | actual title:
       | 
       | "Anthropic cut up millions of used books to train Claude -- and
       | downloaded over 7 million pirated ones too, a judge said."
       | 
       | A not-so-subtle difference.
       | 
       | That said, in a sane world, they shouldn't have needed to cut up
       | all those used books yet again when there's obviously already an
       | existing file that does all the work.
        
         | kube-system wrote:
         | The importance of acquiring the physical book was the transfer
         | of compensation to the author.
        
           | Kim_Bruning wrote:
           | You're not wrong, but that's one heck of a way to do it. It
           | involves the destruction of 7 million books, which ... I
           | really don't quite see the "promotion of Progress of Science
           | and useful Arts" in that.
        
         | CaptainFever wrote:
         | Yeah, I'm not sure if people realize that the whole reason they
         | _had_ to cut up the books was because they wanted to comply
         | with copyright law. Artificial scarcity.
        
       | greenavocado wrote:
       | Should have listened to those NordVPN ads on YouTube
        
       | sidewndr46 wrote:
       | So using the standard industry metrics for calculating the
       | financial impact of piracy, this would equate to something like
       | trillions of damages to the book publishing industry?
        
       | 2OEH8eoCRo0 wrote:
       | I've begun to wonder if this is why some large torrent sites
       | haven't been taken down. They are essentially able to crowdsource
       | all the work. There are some users who spend ungodly amounts of
       | time and money on these sites that I suspect are rich industry
       | benefactors.
        
       | neonate wrote:
       | https://archive.md/YLyPg
        
       | bgwalter wrote:
       | Here is how individuals are treated for massive copyright
       | infringement:
       | 
       | https://investors.autodesk.com/news-releases/news-release-de...
        
         | piker wrote:
         | I thought you'd go with this:
         | https://en.wikipedia.org/wiki/United_States_v._Swartz
        
           | dialup_sounds wrote:
           | Swartz wasn't charged with copyright infringement.
        
             | natch wrote:
             | *technically
        
               | kube-system wrote:
               | If you're discussing law, _an entirely different law in a
               | different title of US code_ is more than a technicality.
        
               | piker wrote:
               | No, the parent was referring to how someone "was
               | treated", and it would have been perfectly valid to
               | reference that case to make the same point.
               | 
               | What you're saying is like calling Al Capone a tax cheat.
               | Nonsense.
               | 
               | They went after Aaron over copyright.
        
               | dialup_sounds wrote:
               | Unlike much of the post hoc hagiography around Swartz,
               | it's literally true.
        
             | arandomhuman wrote:
             | No but he coincidentally passed away after he was accused
             | of it.
        
               | kube-system wrote:
               | No, the CFAA was the law that had him facing 35 years in
               | prison and $1m+ fines. It wasn't a copyright case.
        
               | tzs wrote:
               | He wasn't facing anywhere near that. When the DOJ charges
               | someone with a set of charges they like to say in the
               | press release that the person is facing N years, where
               | they get N by simply adding up the maximums for each
               | charge that it is possible for a hypothetical defendant
               | that has all the possible sentence enhancing factors to
               | get. They also ignore that some charges group for
               | sentencing--your sentence for the group is the maximum
               | sentence for the individual charges in the group.
               | 
               | Here's an article explaining in more detail [1].
               | 
               | Most experts say that if Swartz had gone to trial and the
               | prosecution had proved everything they alleged and the
               | judge had decided to make an example of Swartz and
               | sentence harshly it would have been around 7 years.
               | 
               | Swartz's own attorney said that if they had gone to trial
               | and lost he thought it was unlikely that Swartz would get
               | any jail time.
               | 
               | Swartz also had at least two plea bargain offers
               | available. One was for a guilty plea and 4 months. The
               | other was for a guilty plea and the prosecutors would ask
               | for 6 months but Swartz could ask the judge for less or
               | for probation instead and the judge would pick.
               | 
               | [1] https://www.popehat.com/2013/02/05/crime-whale-sushi-
               | sentenc...
        
               | kube-system wrote:
               | Yes, I meant "up to" that amount, which is implied when
               | many people say "facing" before a trial happens. But it's
               | not really relevant to my point, which was that it wasn't
               | a copyright case.
        
         | chourobin wrote:
         | copyright is not the same as piracy
        
           | asadotzler wrote:
           | piracy isn't a thing, except on the high seas. what you're
           | thinking about is copyright violation.
        
             | downrightmike wrote:
             | Yup, piracy sounds better than copyright violation.
             | 
             | "Piracy" is mostly a rhetorical term in the context of
             | copyright. Legally, it's still called infringement or
             | unauthorized copying. But industries and lobbying groups
             | (e.g., RIAA, MPAA) have favored "piracy" for its emotional
             | weight.
        
               | collingreen wrote:
               | Emotional weight or because it's intentionally
               | misleading.
        
               | admissionsguy wrote:
               | Does piracy have negative connotations? I thought
               | everyone thought pirates were cool
        
               | accrual wrote:
               | Everyone but the person(s) affected by the pirates, I
               | suppose.
        
           | achierius wrote:
           | Can you explain why? What makes them categorically different
           | or at the very least why is "piracy" quantitatively worse
           | than 'just' copyright violation?
        
             | arrosenberg wrote:
             | Piracy is theft - you have taken something and deprived the
             | original owner of it.
             | 
             | Copyright infringement is unauthorized reproduction - you
             | have made a copy of something, but you have not deprived
             | the original owner of it. At most, you denied them revenue
             | although generally less than the offended party claims,
             | since not all instances of copying would have otherwise
             | resulted in a sale.
        
               | fuzzfactor wrote:
               | I have about the same concept of piracy these days.
               | 
               | Real piracy always involves booty.
               | 
               | Naturally booty is wealth that has been hoarded.
               | 
               | Has nothing to do with wealth that may or may not come in
               | the future, regardless of whether any losses due to
               | piracy have taken place already or not.
        
               | ddingus wrote:
               | Yes, and the struggle with this back in the day was the
               | *IAA and related organizations wanted to equate
               | infringement with theft.
               | 
               | And to be clear, we have the word infringement
               | precisely because it is not theft.
               | 
               | In addition to the deprived revenue, piracy also improves
               | on the general relevance the author has or may have in
               | the public sphere. Essentially, one of the side effects
               | of piracy is basically advertising.
               | 
               | Doctorow was one of the early ones to bring this aspect
               | of it up.
        
             | charcircuit wrote:
             | Saying that piracy isn't copyright violation is an RMS
             | talking point. It's not worth trying to ask why because the
             | answer will be RMS said so and will not be backed by the
             | common usage of the word.
        
               | buzzerbetrayed wrote:
               | You legitimately have it completely backwards. The word
               | "piracy" was coopted to put a more severe spin on
               | copyright violation. As a result, it became "the common
               | usage of the word". But that was by design. And it's
               | worth pushing back on.
        
               | carlhjerpe wrote:
               | Sweden has a political party called "The Pirate
               | Party"(1), and "The Pirate Bay" is Swedish so I think a
               | couple of Swedes memeing before it was cool has a
               | significant impact on making the name stick but also
               | taking the seriousness out of it.
               | 
               | 1: https://piratpartiet.se/en/
        
               | charcircuit wrote:
               | I don't have it backwards. Language evolved, and piracy
               | got a new definition. It's even in the dictionary. Trying
               | to redefine words like this is futile and avoiding
               | certain words or replacing them with others is a quirk
               | that RMS has.
        
               | lcnPylGDnU4H9OF wrote:
               | > RMS
               | 
               | Referring to this? (Wikipedia's disambiguation page
               | doesn't seem to have a more likely article.)
               | 
               | https://en.wikipedia.org/wiki/Richard_Stallman#Copyright_
               | red...
        
               | charcircuit wrote:
               | Yes, quoting the following section:
               | Stallman places great importance on the words and labels
               | people use to talk about the world, including the
               | relationship between software and freedom. He asks people
               | to say free software and GNU/Linux, and to avoid the
               | terms intellectual property and piracy (in relation to
               | copying not approved by the publisher). One of his
               | criteria for giving an interview to a journalist is that
               | the journalist agrees to use his terminology throughout
               | the article.
        
               | lcnPylGDnU4H9OF wrote:
               | That seems rather agreeable, though. Stallman is
               | essentially saying that words are meaningful and
               | speakers/writers should be thoughtful about the meaning
               | of the words they use. In that context, refusing to use
               | terms like "intellectual property" and "piracy" because
               | of their meaning and the effect their use has on culture,
               | and especially insisting that journalists who interview
               | you use the same language, seems to be a means of
               | controlling the interpreted meaning of one's expressions.
               | 
               | (As an aside, it seems pointless to decry it as a
               | "talking point". The reason it was brought up is
               | presumably because the author agrees with it and thinks
               | it's relevant. It's also entirely possible that the
               | author, like me, made this argument without being aware
               | that it was popularized by Richard Stallman. If it makes
               | sense then you can hear the argument without hearing the
               | person and still find it agreeable.)
               | 
               | "Piracy" is used to refer to copyright violation to make
               | it sound scary and dangerous to people who don't know
               | better or otherwise don't think about it too hard. Just
               | imagine if they called it "banditry" instead; now tell me
               | that pirates are not bandits with boats. They may as well
               | have called it banditry and it's worth correcting that.
               | (I also think it's worth ridiculing but that doesn't
               | appear to be Stallman's primary point.) It's not banditry
               | (how _ridiculous_ would it be to call it that?), it's
               | copyright infringement.
               | 
               | Edit:
               | 
               | Reading my comment again in the context of other things
               | you wrote, I suspect the argument will not pass muster
               | because you do not seem to see piracy's change in meaning
               | as manufactured by PR work purchased by media industry
               | leaders. I'm not really trying to convince you that it's
               | true but it may be worth considering that it is the
               | fundamental disagreement you seem to have with others on
               | Stallman's point; again, not saying you're wrong, just
               | that's where the disagreement is.
        
               | charcircuit wrote:
               | My point is that the 2 commenters are working off of
               | different definitions. One is using the common
               | definitions of words in English and the other is trying
               | to advocate for their ideologically rooted definitions by
               | trying to correct people who use the normal English
               | definitions. 99% of the time how this will play out is
               | the ideologue will preach about their values instead of
               | acknowledging that they are purposefully using different
               | definitions.
               | 
               | In short the post is bait.
        
               | lcnPylGDnU4H9OF wrote:
               | > In short the post is bait.
               | 
               | This is an uncharitable interpretation. The ostensible
               | point of the comment, or at least a stronger and still-
               | reasonable interpretation, is that they are trying to
               | point out that this specific word choice confuses
               | concepts, which it does. Richard Stallman and the
               | commenter in question are absolutely correct to point
               | that out. You actually seem to be agreeing with Stallman,
               | at least in the abstract.
               | 
               | It should be acknowledged how/why the meaning of the
               | word changed. As I said, that seems to have been
               | manufactured, which suggests, at least to me, that their
               | (and Richard Stallman's) point is essentially the same as
               | yours. That is to say, the US media industry started
               | paying PR firms to use "piracy" as meaning something
               | other than its normal definition until that became the
               | common definition.
               | 
               | They should not purposely use a different definition like
               | that. That is Stallman's point, and why he refuses to say
               | "piracy" instead of "copyright infringement"; ocean
               | banditry is not copyright infringement and it is
               | confusing -- intentionally so -- to say that it is.
        
             | abeppu wrote:
             | Maybe the most memorable version of the response is the
             | "Copying is not Theft" song.
             | https://www.youtube.com/watch?v=IeTybKL1pM4
        
             | NoMoreNicksLeft wrote:
             | Asked unironically: "What's worse, hijacking ships at sea
             | and holding their crews hostage for ransom on threat of
             | death, or downloading a song off the internet?" ...
        
         | nh23423fefe wrote:
         | What point are you making? 20 years ago, someone sold pirated
         | copies of software (where's the transformation here?) and that's
         | the same as using books in a training set? The judge already
         | said reading isn't infringement.
         | 
         | This is reaching at best.
        
           | amlib wrote:
           | Aren't you comparing the wrong things? First example is about
           | the output/outcome, what is the equivalent for LLMs? Also,
           | not all "pirated" things are sold, most are in fact
           | distributed for free.
           | 
           | "Pirates" also transform the works they distribute. They
           | crack it, translate it, compress it to decrease download
           | times, remove unnecessary things, make it easier to download
           | by splitting it in chunks (essential with dial-up, less so
           | nowadays), change distribution formats, offer it through
           | different channels, bundle extra software and media that they
           | themselves might have coded like trainers, installers, sick
           | chiptunes and so on. Why is the "transformation" done by a
           | big corpo more legal in your views?
        
         | JimDabell wrote:
         | > illegally copying and selling pirated software
         | 
         | This is very different to what Anthropic did. Nobody was buying
         | copies of books from Anthropic instead of the copyright holder.
        
           | rvnx wrote:
           | At the very least, they should have purchased the originals
           | once
        
             | arandomhuman wrote:
             | Yeah, people have gone to jail for a few copies of content.
             | Taking that large of a corpus and getting off without
             | penalty would be a farce of the justice system.
        
               | rockemsockem wrote:
               | Bad decisions should not be repeated in the name of fair
               | application.
        
               | impossiblefork wrote:
               | They actually should, because generally an equal playing
               | field is more important than correct law.
               | 
               | As an extreme example, consider murder. Obviously it
               | should be illegal, but if it's legal for one group and
               | not for another, the group for which it's illegal will
               | probably be wiped out, having lost the ability to avenge
               | deaths in the group.
               | 
               | It's much more important that laws are applied
               | impartially and equally than that they are even a tiny
               | bit reasonable.
        
               | haneefmubarak wrote:
               | I think GP's point is that you should always seek to
               | apply the law correctly, hopefully setting precedent for
               | its correct application for everyone in the future.
        
           | armada651 wrote:
           | I wouldn't be so sure about that statement, no one has ruled
           | on the output of Anthropic's AI yet. If their AI spits out
           | the original copy of the book then it is practically the same
           | as buying a book from them instead of the copyright holder.
           | 
           | We've only dealt with the fairly straight-forward legal
           | questions so far. This legal battle is still far from being
           | settled.
        
             | KoolKat23 wrote:
             | It is extremely likely this will be declared fair use in
             | the end.
             | 
             | There's already one decision on a competitor.
             | 
             | It makes sense, if you think of how the model works.
        
             | cmiles74 wrote:
             | It's very unlikely that Claude will verbatim reproduce an
             | entire book from its training corpus. If that's the bar,
             | they are pretty safe in my opinion.
        
         | farceSpherule wrote:
         | Peterson was copying and selling pirated software.
         | 
         | Come up with a better comparison.
        
           | organsnyder wrote:
           | Anthropic is selling a service that incorporates these
           | pirated works.
        
             | adolph wrote:
             | That a service incorporating the authors' works exists is
             | not at issue. The plaintiffs' claims are, as summarized by
             | Alsup:
             | 
             |     First, Authors argue that using works to train Claude's
             |     underlying LLMs was like using works to train any
             |     person to read and write, so Authors should be able to
             |     exclude Anthropic from this use (Opp. 16).
             | 
             |     Second, to that last point, Authors further argue that
             |     the training was intended to memorize their works'
             |     creative elements -- not just their works'
             |     non-protectable ones (Opp. 17).
             | 
             |     Third, Authors next argue that computers nonetheless
             |     should not be allowed to do what people do.
             | 
             | https://media.npr.org/assets/artslife/arts/2025/order.pdf
        
               | codedokode wrote:
               | Computers cannot learn and are not subject to laws. What
               | happens is that a human takes a copyrighted work, makes an
               | unauthorized digital copy, and loads it into a computer
               | without authorization from the copyright owner.
        
               | KoolKat23 wrote:
               | And they are not selling this or distributing this.
               | 
               | The model is very different.
        
               | cmiles74 wrote:
               | I have to disagree, without all the copyrighted input
               | data there would be no output data for these companies to
               | sell. This output data _is_ the product and they are
               | distributing it for dollars.
        
               | KoolKat23 wrote:
               | Copyright is concerned with the actual physical copy.
               | The model isn't this. The end user would have to
               | carefully prompt the model's algorithm to output a
               | copyright-infringing piece.
               | 
               | This argument is more along the lines of blaming
               | Microsoft Word for someone typing characters into the
               | word processor's algorithm and outputting a copy of an
               | existing book. (Yes, it is a lot easier, but the
               | rationale is the same.) In my mind the end user prompting
               | the model would be the one potentially infringing.
        
               | cmiles74 wrote:
               | FWIW, I don't think there is a prompt that would reliably
               | produce, verbatim, a copyrighted work.
               | 
               | I do think that a big part of the reason Anthropic
               | downloaded millions of books from pirate torrents was
               | because they _needed_ that input data in order to
               | generate the output, their product.
               | 
               | I don't know what that is, but, IMHO, not sharing those
               | dollars with the creators of the content is clearly
               | wrong.
        
               | adolph wrote:
               | It can't be "unauthorized" if no authorization was
               | needed.
        
               | xdennis wrote:
               | > That a service incorporating the authors' works exists
               | is not at issue.
               | 
               | It's not an issue because it's not currently illegal
               | because nobody could have foreseen this years ago.
               | 
               | But it is profiting off of the unpaid work of millions.
               | And there's very little chance of change because it's so
               | hard to pass new protection laws when you're not Disney.
        
               | adolph wrote:
               | Marx wrote _The tradition of all dead generations weighs
               | like an Alp on the brains of the living._ and that would
               | be true if one were obligated to pay the full freight of
               | one's antecedents. The more positive truth is that the
               | brains of the living reach new heights from that Alp and
               | build ever new heights for those who come afterwards.
        
               | CaptainFever wrote:
               | Let's not expand copyright law.
        
               | TeMPOraL wrote:
               | It's not an issue because it's not what this case was
               | about, as the linked document explicitly states. The
               | Authors did not contest the legality of the model's
               | outputs, only the inputs used in training.
        
               | megaman821 wrote:
               | Correct, the New York Times and Disney are suing for the
               | output side. I am going to hazard a guess that you won't
               | be able to circumvent copyright and trademark just
               | because you are using AI. Where that line is has yet to
               | be determined though.
        
               | TeMPOraL wrote:
               | Right, but where that line will be drawn will have major
               | impact on the near-term future of those models. If the
               | user is liable for distributing infringing output that
               | came from AI, that's not a problem for the field (and
               | IMHO a reasonable approach) - but if they succeed in
               | making the model vendors liable for the _possibility_ of
               | users generating infringing output, it'll shake things
               | up pretty seriously.
        
               | lawlessone wrote:
               | > underlying LLMs was like using works to train any
               | person to read and write
               | 
               | I don't think humans learn via backprop or in
               | rounds/batches, our learning is more "online".
               | 
               | If I input text into an LLM it doesn't learn from that
               | unless the creators consciously include that data in the
               | next round of teaching their model.
               | 
               | Humans also don't require samples of every text in
               | history to learn to read and write well.
               | 
               | Hunter S Thompson didn't need to ingest the Harry Potter
               | books to write.
        
               | TeMPOraL wrote:
               | The first paragraph sounds absurd, so I looked into the
               | PDF, and here's the full version I found:
               | 
               | > _First, Authors argue that using works to train
               | Claude's underlying LLMs was like using works to train
               | any person to read and write, so Authors should be able
               | to exclude Anthropic from this use (Opp. 16). But Authors
               | cannot rightly exclude anyone from using their works for
               | training or learning as such. Everyone reads texts, too,
               | then writes new texts. They may need to pay for getting
               | their hands on a text in the first instance. But to make
               | anyone pay specifically for the use of a book each time
               | they read it, each time they recall it from memory, each
               | time they later draw upon it when writing new things in
               | new ways would be unthinkable. For centuries, we have
               | read and re-read books. We have admired, memorized, and
               | internalized their sweeping themes, their substantive
               | points, and their stylistic solutions to recurring
               | writing problems._
               | 
               | Couldn't have put it better myself (though $deity knows I
               | tried many times on HN). Glad to see Judge Alsup
               | continues to be the voice of common sense in legal
               | matters around technology.
        
               | cmiles74 wrote:
               | For everyone arguing that there's no harm in
               | anthropomorphizing an LLM, witness this rationalization.
               | They talk about training and learning as if this is
               | somehow comparable to human activities. The idea that LLM
               | training is comparable to a person learning seems way out
               | there to me.
               | 
               | "We have admired, memorized, and internalized their
               | sweeping themes, their substantive points, and their
               | stylistic solutions to recurring writing problems."
               | 
               | Claude is not doing any of these things. There is no
               | admiration, no internalizing of sweeping themes. There's
               | a network encoding data.
               | 
               | We're talking about a machine that accepts content and
               | then produces more content. It's not a person, it's owned
               | by a corporation that earns money on literally every word
               | this machine produces. If it didn't have this large
               | corpus of input data (copyrighted works) it could not
               | produce the output data for which people are willing to
               | pay money. This all happens at a scale no individual
               | could achieve because, as we know, it is a machine.
        
               | ben_w wrote:
               | There may be no admiration, but there definitely is an
               | internalising of sweeping themes, and all the other
               | things in your quotation, which anyone can fetch by
               | asking it for the themes/substantive points/stylistic
               | solutions of one of the books it has (for lack of a
               | better verb) read.
               | 
               | That the mechanism performing these things is a network
               | encoding data is... well, that description, at that level
               | of abstraction, is a similarity with the way a human does
               | it, not even a difference.
               | 
               | My network is a 3D mess made of pointy bi-lipid bags
               | exchanging protons across gaps moderated by the presence
               | of neurochemicals, rather than flat sheets of silicon
               | exchanging electrons across tuned energy band-gaps
               | moderated by other electrons, but it's still a network.
               | 
               | > We're talking about a machine that accepts content and
               | then produces more content. It's not a person, it's owned
               | by a corporation that earns money on literally every word
               | this machine produces. If it didn't have this large
               | corpus of input data (copyrighted works) it could not
               | produce the output data for which people are willing to
               | pay money. This all happens at a scale no individual
               | could achieve because, as we know, it is a machine.
               | 
               | My brain is a machine that accepts content in the form of
               | job offers and JIRA tickets (amongst other things), and
               | then produces more content in the form of pull requests
               | (amongst other things). For the sake specifically of this
               | question, do the other things make a difference? While I
               | count as a person and am not owned by any corporation,
               | when I work for one, they do earn money on the words this
               | biological machine produces. (And given all the models
               | which are free to use, the LLMs definitely don't earn
               | money on "literally" every word those models produce). If
               | I didn't have the large corpus of input data -- and there
               | absolutely was copyright on a lot of the school textbooks
               | and the TV broadcast educational content of the 80s and
               | 90s when I was at school, and the Java programming
               | language that formed the backbone of my university degree
               | -- I could not produce the output data for which people
               | are willing to pay money.
               | 
               | Should corporations who hire me be required to pay Oracle
               | every time I remember and use a solution that I learned
               | from a Java course, even when I'm not writing Java?
               | 
               | That the LLMs do this at a scale no individual could
               | achieve because it is a machine, means it's got the
               | potential to wipe me out economically. The economic threat
               | of automation has been a real issue at least since the
               | Luddites if not earlier, and I don't know how the dice
               | will fall this time around, so even though I have one
               | layer of backup plan, I am well aware it may not work,
               | and if it doesn't then government action will have to
               | happen because a lot of other people will be in trouble
               | before trouble gets to me (and recent history shows that
               | this doesn't mean "there won't be trouble").
               | 
               | Copyright law is one example of government action. So is
               | mandatory education. So is UBI, but so too is feudalism.
               | 
               | Good luck to us all.
        
               | losvedir wrote:
               | > _Glad to see Judge Alsup continues to be the voice of
               | common sense in legal matters around technology_
               | 
               | Yep, that name's a blast from the past! He was the judge
               | on the big Google/Oracle case about Android and Java
               | years ago, IIRC. I think he even learned to write some
               | Java so he could better understand the case.
        
         | Aurornis wrote:
         | > Here is how individuals are treated for massive copyright
         | infringement:
         | 
         | When I clicked the link, I got an article about a _business_
         | that was selling millions of dollars of pirated software.
         | 
         | This guy made millions of dollars in profit by selling pirated
         | software. This wasn't a case of transformative works, nor of an
         | individual doing something for themselves. He was plainly
         | stealing and reselling something.
        
         | ysofunny wrote:
         | before breaking the law, set up a corporation to absorb the
         | liability!
         | 
         | in other words, provided you have enough spare capital to spin
         | up a corporation, you can break the law!!!!
        
         | stocksinsmocks wrote:
         | Anthropic isn't selling copies of the material to its users
         | though. I would think you couldn't lock someone up for reading
         | a book and summarizing or reciting portions of the contents.
         | 
         | Seven years for thumbing your nose at Autodesk when armed
         | robbery would get you less time says some interesting things
         | about the state of legal practice.
        
           | wmeredith wrote:
           | > summarizing or reciting portions of the contents
           | 
           | This absolutely falls under copyright law as I understand it
           | (not a lawyer). E.g. the disclaimer that rolls before every
           | NFL broadcast. The notice states that the broadcast is
           | copyrighted and any unauthorized use, including pictures,
           | descriptions, or accounts of the game, is prohibited. There
           | is wiggle room for fair use by news organizations, critics,
           | artists, etc.
        
             | steveklabnik wrote:
             | I can say "you cannot read this comment for any purpose"
             | but that doesn't supersede the law.
        
           | zahma wrote:
           | Except they aren't merely reading and reciting content, are
           | they? That's a rather disingenuous argument to make. All
           | these AI companies are high on billions in investment and
           | think they can run roughshod over all rules in the sprint
           | towards monetizing their services.
           | 
           | Make no mistake, they're seeking to exploit the contents of
           | that material for profits that are orders of magnitude larger
           | than what any shady pirated-material reseller would make. The
           | world looks the other way because these companies are
           | "visionary" and "transformational."
           | 
           | Maybe they are, and maybe they should even have a right to
           | these buried works, but what gives them the right to rip up
           | the rule book and (in all likelihood) suffer no repercussions
           | in an act tantamount to grand theft?
           | 
           | There's certainly an argument to be had about whether this
           | form of research and training is a moral good and beneficial
           | to society. My first impression is that the companies are too
           | opaque in how they use and retain these files, albeit for
           | some legitimate reasons, but nevertheless the archival
           | achievements are hidden from the public, so all that's left
           | is profit for the company on the backs of all these other
           | authors.
        
           | burnt-resistor wrote:
           | I'm wondering though how the law will construe AI able to
           | make a believable sequel to Moby Dick after digesting Herman
           | Melville's works. (Or replace Melville with a modern writer.)
        
       | dathinab wrote:
       | as far as I understand while training on books is clearly not
       | fair use (as the result will likely hurt the livelihood of
       | authors, especially not "best of the best" authors).
       | 
       | as long as you buy the book it still should be legal, that is if
       | you actually buy the book and not a "read only" eBook
       | 
       | but the 7_000_000 pirated books are a huge issue, and one that
       | we have a lot of reason to believe isn't specific to
       | Anthropic
        
         | asadotzler wrote:
         | Buying a copy of a book does not give you license to take the
         | exact content of that book, repackage it as a web service, and
         | sell it to millions of others. That's called theft.
        
       | russell_h wrote:
       | The title is clearly meant to generate outrage, but what is wrong
       | with cutting up a book that you own?
        
       | nickpsecurity wrote:
       | Buying, scanning, and discarding was in my proposal to train
       | under copyright restrictions.
       | 
       | You are often allowed to make a digital copy of a physical work
       | you bought. There are tons of used, physical works that would be
       | good for training LLMs. They'd also be good for training OCR
       | which could do many things, including improve book scanning for
       | training.
       | 
       | This could be reduced to a single act of book destruction per
       | copyrighted work or made unnecessary if copyright law allowed us
       | to share others' works digitally with their _licensed customers_.
       | Ex: people who own a physical copy or a license to one.
       | Obviously, the implementation could get complex but we wouldn't
       | have to destroy books very often.
        
       | NHQ wrote:
       | The farce of treating a corporation as an individual precludes
       | common sense legal procedure to investigate people who are
       | responsible for criminal action taken by the company. It's
       | obviously premeditated and in all ways an illicit act knowingly
       | perpetrated by persons. The only discourse should be about
       | upending this penthouse legalism.
        
         | NHQ wrote:
         | The irony is that actually litigating copyright law would lead
         | to the repeal of said copyright law. And so in all cases of
         | backwaters laws that are used to "protect interests" of
         | "corporations" yet criminalize petty individual cases.
         | 
         | This of course cannot be allowed to happen, so the legal
         | system is just a limbo, a bar which regular individuals must
         | strain to pass under but that corporations regularly overstep.
        
       | outside1234 wrote:
       | So if you incorporate you can do whatever you want without
       | criminal charges?
        
       | trinsic2 wrote:
       | I'm not seeing how this is fair use in either case.
       | 
       | Someone correct me if I am wrong but aren't these works being
       | digitized and transformed in a way to make a profit off of the
       | information that is included in these works?
       | 
       | It would be one thing for an individual to make personal use of one
       | or more books, but you got to have some special blindness not to
       | see that a for-profit company's use of this information to
       | improve a for-profit model is clearly going against what
       | copyright stands for.
        
         | jimbob21 wrote:
         | They clearly were being digitized, but I think it's a more
         | philosophical discussion that we're only banging our heads
         | against for the first time to say whether or not it is fair
         | use.
         | 
         | Simply, if the models can _think_ then it is no different than
         | a person reading many books and building something new from
         | their learnings. Digitization is just memory. If the models
         | cannot _think_ then it is meaningless digital regurgitation and
         | plagiarism, not to mention breach of copyright.
         | 
         | The quotes "consistent with copyright's purpose in enabling
         | creativity and fostering scientific progress." and "Like any
         | reader aspiring to be a writer" say, from what I can tell, that
         | the judge has legally ruled the model can think as a human
         | does, and therefore has the legal protections afforded to
         | "creatives."
        
           | palmotea wrote:
           | > Simply, if the models can think then it is no different
           | than a person reading many books and building something new
           | from their learnings.
           | 
           | No, that's fallacious. Using anthropomorphic words to
           | describe a machine does not give it the same kinds of rights
           | and affordances we give real people.
        
             | jimbob21 wrote:
             | Actually, it does, at least for this case. The judge just
             | said so.
        
               | NoOn3 wrote:
               | People have rights, machines don't. Otherwise, maybe give
               | machines the right to vote, for example?...
        
               | kube-system wrote:
               | This case is more like:
               | 
               | If a human uses a voting machine, they still have a right
               | to vote.
               | 
               | Machines don't have rights. The human using the machine
               | does.
        
               | protocolture wrote:
               | If I can use my brain to learn, I as a human can use my
               | computer to learn.
               | 
               | It's like taking notes, or Google image search caching
               | thumbnails. Honestly we don't even need the learning
               | metaphor to see this is obviously not an infringement.
        
             | pavon wrote:
             | The judge did use some language that analogized the
             | training with human learning. I don't read it as basing the
             | legal judgement on anthropomorphizing the LLM though, but
             | rather as reasoning that if it would be legal for a human
             | to do the same thing, then it is legal for a human to use a
             | computer to do so.
             | 
             | > First, Authors argue that using works to train Claude's
             | underlying LLMs was like using works to train any person to
             | read and write, so Authors should be able to exclude
             | Anthropic from this use (Opp. 16). But Authors cannot
             | rightly exclude anyone from using their works for training
             | or learning as such. Everyone reads texts, too, then writes
             | new texts. They may need to pay for getting their hands on
             | a text in the first instance. But to make anyone pay
             | specifically for the use of a book each time they read it,
             | each time they recall it from memory, each time they later
             | draw upon it when writing new things in new ways would be
             | unthinkable. For centuries, we have read and re-read books.
             | We have admired, memorized, and internalized their sweeping
             | themes, their substantive points, and their stylistic
             | solutions to recurring writing problems.
             | 
             | > ...
             | 
             | > In short, the purpose and character of using copyrighted
             | works to train LLMs to generate new text was
             | quintessentially transformative. Like any reader aspiring
             | to be a writer, Anthropic's LLMs trained upon works not to
             | race ahead and replicate or supplant them -- but to turn a
             | hard corner and create something different. If this
             | training process reasonably required making copies within
             | the LLM or otherwise, those copies were engaged in a
             | transformative use.
             | 
             | [1] https://authorsguild.org/app/uploads/2025/06/gov.uscour
             | ts.ca...
        
         | wrs wrote:
         | Copyright is not on "information". It's on the tangible
         | expression (i.e., the actual words). "Transformative use" is a
         | _defense_ in copyright infringement.
        
         | kristofferR wrote:
         | What do you think fair use is? The whole point of the fair use
         | clauses is that if you transform copyrighted works enough you
         | don't have to pay the original copyright holder.
        
           | kube-system wrote:
           | Fair use is not, at its core, about transformation. It's
           | about many types of uses that do not interfere with the
           | reasons for the rights we ascribe to authors. Fair use
           | doesn't require transformation.
        
         | skybrian wrote:
         | Copyright is largely about _distributing copies._ It's not
         | about making something vaguely similar or about referencing
         | copyrighted work to make something vaguely similar.
         | 
         | Although, there's an exception for fictional characters:
         | 
         | https://en.m.wikipedia.org/wiki/Copyright_protection_for_fic...
        
         | pavon wrote:
         | There is another case where companies slurped up all of the
         | internet and profited off the information, that makes a good
         | comparison - search engines.
         | 
         | Judges consider four factors when examining fair use[1]. For
         | search engines,
         | 
         | 1) The use is transformative, as a tool to find content is very
         | different purpose than the content itself.
         | 
         | 2) Nature of the original work runs the full gamut, so search
         | engines don't get points for only consuming factual data, but
         | it was all publicly viewable by anyone as opposed to books
         | which require payment.
         | 
         | 3) Search engines store significant portions of the work in
         | the index, but it only redistributes small portions.
         | 
         | 4) Search engines, as originally devised, don't compete with the
         | original; in fact they can improve the potential market of the
         | original by helping more people find them. This has changed
         | over time though, and search engines are increasingly competing
         | with the content they index, and intentionally trying to show
         | the information that people want on the search page itself.
         | 
         | So traditional search which was transformative, only
         | republished small amounts of the originals, and didn't compete
         | with the originals fell firmly on the side of fair use.
         | 
         | Google News and Books on the other hand weren't so clear cut,
         | as they were showing larger portions of the works and were
         | competing with the originals. They had to make changes to those
         | products as a result of lawsuits.
         | 
         | So now let's look at LLMs:
         | 
         | 1) LLMs are absolutely transformative. Generating new text at
         | a user's request is a very different purpose and character from
         | the original works.
         | 
         | 2) Again runs the full gamut (setting aside the clear copyright
         | infringement downloading of illegally distributed books which
         | is a separate issue)
         | 
         | 3) For training purposes, LLMs don't typically preserve entire
         | works, so the model is in a better place legally than a search
         | index, which has precedent that storing entire works privately
         | can be fair use depending on the other factors. For inference,
         | even though they are less likely to reproduce the originals in
         | their outputs than search engines, there are failure cases
         | where an LLM over-trained on a work, and a significant amount
         | of the original can be reproduced.
         | 
         | 4) LLMs have tons of uses some of which complement the original
         | works and some of which compete directly with them. Because of
         | this, it is likely that whether LLMs are fair use will depend
         | on how they are being used - eg ignore the LLM altogether and
         | consider solely the output and whether it would be infringing
         | if a human created it.
         | 
         | This case was solely about whether training on books is fair
         | use, and did not consider any uses of the LLM. Because LLMs are
         | a very transformative use, and because they don't store the
         | originals verbatim, it weighs strongly as being fair use.
         | 
         | I think the real problems that LLMs face will be in factors 3
         | and 4, which is very much context specific. The judge himself
         | said that the plaintiffs are free to file additional lawsuits
         | if they believe the LLM outputs duplicate the original works.
         | 
         | [1] https://fairuse.stanford.edu/overview/fair-use/four-
         | factors/
        
         | NoMoreNicksLeft wrote:
         | Digitizing the books is the equivalent of a blind person doing
         | something to the book to make it readable to them... the
         | software can't read analog pages.
         | 
         | Learning from the book is, well, learning from the book. Yes,
         | they intended to make money off of that learning... but then I
         | guess a medical student reading medical textbooks intends to
         | profit off of what they learn from them. Guess that's not fair
         | use either (well, it's really just _use_, as in the intended
         | use for all books since they were first invented).
         | 
         | Once a person has to believe that copyright has any moral
         | weight at all, I guess all rational thought becomes impossible
         | for them. Somehow, they're not capable of entertaining the idea
         | that copyright policy was only ever supposed to be this
         | pragmatic thing to incentivize creative works... and that
         | whatever little value it has disappears entirely once the
         | policy is twisted to consolidate control.
        
         | kenmacd wrote:
         | > to make a profit off of the information that is included in
         | these works?
         | 
         | Isn't that what a lot of companies are doing, just through
         | employees? I read a lot of books, and took a lot of courses,
         | and now a company is profiting off that information.
        
         | protocolture wrote:
         | >clearly going against what copyright stands for.
         | 
         | Copyright isn't a digital moat. It's largely an agreement that
         | the work is available to the public, but the creator has a
         | limited amount of time to exploit it at market.
         | 
         | If you sell an AI model, or access to an AI model, there's
         | usually around 0% of the training data redistributed with the
         | model. You can't decompile it and find the book. As you aren't
         | redistributing the original work, copyright is barely relevant.
         | 
         | Imagine suggesting that because you own the design of a hammer,
         | that all works created with the hammer belong to you and can't
         | be sold?
         | 
         | That someone came up with a new method of using books as a tool
         | to create a different work, does not entitle the original book
         | author to a cut of the pie.
        
       | ruffrey wrote:
       | Two of the top AI companies flouted ethics with regard to
       | training data. In OpenAI's case, the whistleblower probably got
       | whacked for exposing it.
       | 
       | Can anyone make a compelling argument that any of these AI
       | companies have the public's best interest in mind
       | (alignment/superalignment)?
        
       | k__ wrote:
       | So, how should we as a society handle this?
       | 
       | Ensure the models are open source, so everyone can use them, as
       | everyone's data is in there?
       | 
       | Close those companies and force them to delete the models, as
       | they used copyrighted material?
        
       | carlosjobim wrote:
       | If ingesting books into an AI makes Anthropic criminals, then
       | Google et al are also criminals alike for making search indexes
       | of the Internet. Anything published online is equally
       | copyrighted.
        
         | kristofferR wrote:
         | Yeah, we can all agree that ingesting books is fair use and
         | transformative, but you gotta own what you ingest, you can't
         | just pirate it.
         | 
         | I can read 100 books and write a book based on the inspiration
         | I got from the 100 books without any issue. However, if I
         | pirate the 100 books I've still committed copyright
         | infringement despite my new book being fully legal/fair use.
        
           | carlosjobim wrote:
           | I disagree that it has anything to do with copyright. It is
           | at most theft. If I steal a bunch of books from the library,
           | I haven't committed any breach of copyright.
        
         | riskable wrote:
         | Exactly! If Anthropic is guilty of copyright infringement for
         | the mere act of downloading copyrighted books then so is
         | Google, Microsoft (Bing), DuckDuckGo, etc. Every search engine
         | that exists downloads pirated material every day. They'd all be
         | guilty.
         | 
         | Not only that but all of _us_ are guilty too because I'm
         | positive we've all clicked on search results that contained
         | copyrighted content that was copied without permission. You may
         | not have even known it was such.
         | 
         | Remember: _Intent_ is irrelevant when it comes to copyright
         | infringement! It's not that kind of law.
         | 
         | Intent can _guide_ a judge when they determine damages but
         | that's about it.
        
       | 1970-01-01 wrote:
       | The buried lede here is Anthropic will need to attempt to explain
       | to a judge that it is impossible to de-train 7M books from their
       | models.
        
         | nickpsecurity wrote:
         | I'm hoping they fail, to incentivize using legal, open, and/or
         | licensed data. Then, they might have to attempt to train a
         | Claude-class model on legal data. Then, I'll have a great,
         | legal model to use. :)
        
         | protocolture wrote:
         | Or they could be forced to settle a price for access to the
         | books.
        
       | dehrmann wrote:
       | The important parts:
       | 
       | > Alsup ruled that Anthropic's use of copyrighted books to train
       | its AI models was "exceedingly transformative" and qualified as
       | fair use
       | 
       | > "All Anthropic did was replace the print copies it had
       | purchased for its central library with more convenient space-
       | saving and searchable digital copies for its central library --
       | without adding new copies, creating new works, or redistributing
       | existing copies"
       | 
       | It was always somewhat obvious that pirating a library would be
       | copyright infringement. The interesting findings here are that
       | scanning and digitizing a library for internal use is OK, and
       | using it to train models is fair use.
        
         | jpalawaga wrote:
         | I don't think that's new. google set precedent for that more
         | than a decade ago. you're allowed to transform a book to
         | digital.
        
         | 6gvONxR4sf7o wrote:
         | You skipped quotes about the other important side:
         | 
         | > But Alsup drew a firm line when it came to piracy.
         | 
         | > "Anthropic had no entitlement to use pirated copies for its
         | central library," Alsup wrote. "Creating a permanent, general-
         | purpose library was not itself a fair use excusing Anthropic's
         | piracy."
         | 
         | That is, he ruled that
         | 
         | - buying, physically cutting up, physically digitizing books,
         | and using them for training is fair use
         | 
         | - pirating the books for their digital library is not fair use.
        
           | throwawayffffas wrote:
           | So all they have to do is go and buy a copy of each book they
           | pirated. They will have ceased and desisted.
        
             | superfrank wrote:
             | I'm trying to find the quote, but I'm pretty sure the judge
             | specifically said that going and buying the book after the
             | fact won't absolve them of liability. He said that for the
             | books they pirated they broke the law and should stand
             | trial for that and they cannot go back and un-break it by
             | buying a copy now.
             | 
             | Found it: https://www.nbcnews.com/tech/tech-news/federal-
             | judge-rules-c...
             | 
             | > "That Anthropic later bought a copy of a book it earlier
             | stole off the internet will not absolve it of liability for
             | the theft," [Judge] Alsup wrote, "but it may affect the
             | extent of statutory damages."
        
               | zoklet-enjoyer wrote:
               | Did they really steal if they didn't deprive anyone of
               | their copy? I don't think copying is theft.
        
               | badlibrarian wrote:
               | "Tell it to the Judge..."
        
               | kjkjadksj wrote:
               | You may not think it is but the law does.
        
               | buildbot wrote:
               | The law says it's copyright infringement, not theft.
        
               | axus wrote:
               | Agreed, the judge should avoid slang or even commonly
               | accepted synonyms in an official ruling. The charge is
               | not for theft.
               | 
               | Substitute infringement for theft.
        
               | hadlock wrote:
               | It's copyright infringement, which is not theft, they're
               | legally distinct in the eyes of the law. This is partly
               | why the "you wouldn't download a car" copyright ads were
               | so widely mocked.
        
               | __MatrixMan__ wrote:
               | Fun fact, they didn't have the rights to use the font
               | they used for those commercials:
               | https://news.ycombinator.com/item?id=43775926
        
               | gghffguhvc wrote:
               | Or the music. It was originally made as a one off for a
               | film festival. Movie industry defended the lawsuit over
               | the music.
        
               | fortran77 wrote:
               | It's fine that you think that way. But this is a
               | discussion of the laws of the United States of America and
               | ruling by American courts, not a discussion of your own
               | legal theories.
        
               | hnlmorg wrote:
               | The GP isn't talking about some edge case legal dilemma
               | that requires a lawyer or judge to comment. It's already
               | widely documented that copyright infringement is legally
               | distinct from theft.
        
               | freejazz wrote:
               | They also argued that they in no way could ever actually
               | license all the materials they ingested
        
               | dmd wrote:
               | I love this argument so much. "But judge, there's no way
               | I could ever afford to buy those jewels, so stealing them
               | must be OK."
        
               | AnthonyMouse wrote:
               | The argument is more along the lines of, negotiating with
               | millions of individuals each over a single copy of a work
               | would cause the transaction costs to exceed the payments,
               | and that kind of efficiency loss is the sort of thing
               | fair use exists to prevent. It's not socially beneficial
               | for the law to require you to create $2 in deadweight
               | loss in order to transfer $1, and the cost to the author
               | of not selling a single additional copy is not the thing
               | they were really objecting to.
        
               | exe34 wrote:
               | That's right, so I can't individually discuss terms with
               | each and every media creator, so from now on, I can just
               | pirate everything.
        
               | AnthonyMouse wrote:
               | Needing a copy of one book you're going to spend a week
               | reading has a lot less overhead than needing a copy of
               | every book that you're going to process with a computer
               | in bulk.
        
               | recursive wrote:
               | I like to glance at the cover art. I can do ten per
               | second when I really get into my flow state. Sometimes I
               | read them also, but that's incidental.
        
               | AnthonyMouse wrote:
               | If you go to the book store and glance at all the cover
               | art without buying any of them, do you expect to be sued
               | for this?
        
               | freejazz wrote:
               | If you do that and reproduce the covers or the protected
               | elements thereof, you should absolutely expect to be
               | sued.
        
               | AnthonyMouse wrote:
               | So for example, if the bookstore has a nice 4k
               | surveillance camera and you have access to it because you
               | work there, sitting at home and using it to look at the
               | cover art on all the books on display is something you'd
               | expect to be sued over?
        
               | Aeolun wrote:
               | This is literally why a lot of people pirate content,
               | yes. It's pretty much always the only way to obtain the
               | content, even if you are otherwise fine with paying for
               | it.
        
               | freejazz wrote:
               | > and that kind of efficiency loss is the sort of thing
               | fair use exists to prevent.
               | 
               | No it's not. And you ever heard of a publishing house?
               | They don't need to negotiate with every single author
               | individually. That's preposterous.
        
               | AnthonyMouse wrote:
               | It kind of is though?
               | 
               | It's not the _only_ reason fair use exists, but it's the
               | thing that allows e.g. search engines to exist, and that
               | seems pretty important.
               | 
               | > And you ever heard of a publishing house? They don't
               | need to negotiate with every single author individually.
               | That's preposterous.
               | 
               | There are thousands of publishing houses and millions of
               | self-published authors on top of that. Many books are
               | also out of print or have unclear rights ownership.
        
               | freejazz wrote:
               | >It kind of is though?
               | 
               | No, it kinda isn't. Show me anything that supports this
               | idea beyond your own immediate conjecture right now.
               | 
               | >It's not the only reason fair use exists, but it's the
               | thing that allows e.g. search engines to exist, and that
               | seems pretty important.
               | 
               | No, that's the transformative element of what a search
               | engine provides. Search engines are not legal because
               | they can't contact each licensor, they are legal because
               | they are considered hugely transformative features.
               | 
               | >There are thousands of publishing houses and millions of
               | self-published authors on top of that. Many books are
               | also out of print or have unclear rights ownership.
               | 
               | Okay, and? How many customers does Microsoft bill on a
               | monthly basis?
        
               | AnthonyMouse wrote:
               | > Show me anything that supports this idea beyond your
               | own immediate conjecture right now
               | 
               | It's inherent in the nature of the test. The most
               | important fair use factor is the effect on the market for
               | the work, so if the use would be uneconomical without
               | fair use then the effect on the market is negligible
               | because the alternative would be that the use doesn't
               | happen rather than that the author gets paid for it.
               | 
               | > No, that's the transformative element of what a search
               | engine provides. Search engines are not legal because
               | they can't contact each licensor, they are legal because
               | they are considered hugely transformative features.
               | 
               | To make a search engine you have to do two things. One is
               | to download a copy of the whole internet, the other is to
               | create a search index. I'm talking about the first one,
               | you're talking about the second one.
               | 
               | > Okay, and? How many customers does Microsoft bill on a
               | monthly basis?
               | 
               | Microsoft does this with an automated system. There is no
               | single automated system where you can get every book ever
               | written, and separately interfacing with all of the many
               | systems needed in order to do it is the source of the
               | overhead.
        
               | freejazz wrote:
               | >It's inherent in the nature of the test. The most
               | important fair use factor is the effect on the market for
               | the work, so if the use would be uneconomical without
               | fair use then the effect on the market is negligible
               | because the alternative would be that the use doesn't
               | happen rather than that the author gets paid for it.
               | 
               | No, that's not the most important factor. The
               | transformative factor is the most important. Effect on
               | market for the work doesn't even support your argument
               | anyway. Your argument is about the cost of making the end
               | product, which is totally distinct from the market
               | effects on the copyright holder when the infringer makes
               | and releases the infringing product.
               | 
               | >To make a search engine you have to do two things. One
               | is to download a copy of the whole internet, the other is
               | to create a search index. I'm talking about the first
               | one, you're talking about the second one.
               | 
               | So? That doesn't make you right. Go read the opinions,
               | dude. This isn't something that's actually up for debate.
               | Search engines are fair uses because of their
               | transformative effect, not because they are really
               | expensive otherwise. Your argument doesn't even make
               | sense. By that logic, anything that's expensive becomes a
               | fair use. It's facially ridiculous. Them being expensive
               | is neither sufficient nor necessary for them to be a fair
               | use. Their transformative nature is both sufficient and
               | necessary to be found a fair use. Full stop.
               | 
               | >Microsoft does this with an automated system. There is
               | no single automated system where you can get every book
               | ever written, and separately interfacing with all of the
               | many systems needed in order to do it is the source of
               | the overhead.
               | 
               | Okay, and? They don't need to get every single book ever
               | written. The libraries they pirated do not consist of
               | "every single book ever written". It's hard to take this
               | argument in good faith because you're being so
               | ridiculous.
        
               | AnthonyMouse wrote:
               | > No, that's not the most important factor. The
               | transformative factor is the most important.
               | 
               | It's a four factor test because all of the factors are
               | relevant, but if the use has negligible effect on the
               | market for the work then it's pretty hard to get anywhere
               | with the others. For example, for cases like classroom
               | use, even making verbatim copies of the entire work is
               | often still fair use. Buying a separate copy for each
               | student to use for only a few minutes would make that use
               | uneconomical.
               | 
               | > Effect on market for the work doesn't even support your
               | argument anyway. Your argument is about the cost of
               | making the end product, which is totally distinct from
               | the market effects on the copyright holder when the
               | infringer makes and releases the infringing product.
               | 
               | We're talking about the temporary copies they make during
               | training. Those aren't being distributed to anyone else.
               | 
               | > So? That doesn't make you right.
               | 
               | Making a copy of everything on the internet is a
               | prerequisite to making a search engine. It's something
               | you have to do as a step to making the index, which is
               | the transformative step. Are you suggesting that doing
               | the first step is illegal or what do you propose
               | justifies it?
               | 
               | > By that logic, anything that's expensive becomes a fair
               | use. It's facially ridiculous.
               | 
               | Anything with unreasonably high transaction costs. Why is
               | that ridiculous? It doesn't exempt any of the normal
               | stuff like an individual person buying an individual
               | book.
               | 
               | > They don't need to get every single book ever written.
               | 
               | They need to get as many books as possible, with the
               | platonic ideal being every book. Whether or not the ideal
               | is feasible in practice, the question is whether it's
               | socially beneficial to impose a situation with
               | excessively high transaction costs in order to require
               | something with only trivial benefit to authors
               | (potentially selling one extra copy).
        
               | throwawayffffas wrote:
               | I don't even think their argument is about the money, I
               | think it's more like we couldn't possibly find all these
               | works in any other practical way.
        
               | irthomasthomas wrote:
               | Is copyright in America different to Britain? There, it
               | is legal to download books you don't own. Only
               | distribution is a crime, which most torrenters break by
               | seeding.
        
               | rahimnathwani wrote:
               | What do you mean by 'it is legal'?
               | 
               | Do you mean:
               | 
               | A) It's not a criminal offence?
               | 
               | B) The copyright owner cannot file a civil suit for
               | damages?
               | 
               | C) Something else?
        
               | irthomasthomas wrote:
               | > Only distribution is a crime
        
               | throwawayffffas wrote:
               | Only distribution with the intent to make money is a
               | crime. If you are doing it for free you are not
               | criminally liable. Unless I am missing something.
        
               | rahimnathwani wrote:
               | What relevance does that have to the present case? The
               | judge, in this civil matter, said there would be a trial.
               | He didn't say anything about it being a criminal trial.
               | The strings 'crim' and 'felon' do not appear in the
               | ruling.
               | 
               | > We will have a trial on the pirated copies used to create
               | Anthropic's central library and the resulting damages, actual
               | or statutory (including for willfulness).
        
               | Aeolun wrote:
               | There can always be a trial, even if nothing was done to
               | warrant it.
               | 
               | I think the distinction between civil and criminal trials
               | is smaller in my home country. The fact that there is a
               | trial at all implies that someone committed a 'crime'.
        
               | throwawayffffas wrote:
               | I think it's very similar in both countries, but you have
               | got it wrong. Downloading a book without permission is
               | copyright infringement in both countries, regardless of
               | whether you distribute it.
               | 
               | In the UK it's a criminal offense if you distribute a
               | copyrighted work with the intent to make gain or with the
               | expectation that the owner will make a loss.
               | 
               | Gain and loss are only financial in this context.
               | 
               | Meaning that in both countries the copyright owner can
               | sue you for copyright infringement.
        
             | dragonwriter wrote:
             | > So all they have to do is go and buy a copy of each book
             | they pirated.
             | 
             | No, that doesn't undo the infringement. At most, that would
             | mitigate actual damages, but actual damages aren't likely
             | to be important, given that statutory damages are an
             | alternative and are likely to dwarf actual damages. (It may
             | also figure into how the court assigns statutory damages
             | within the very large range available for those, but that
             | range does not go down to $0.)
             | 
             | > They will have ceased and desisted.
             | 
             | "Cease and desist" is just to stop incurring _additional_
             | liability. (A potential plaintiff may accept that as
             | sufficient to _not sue_ if a request is made and the
             | potential defendant complies, because litigation is
             | uncertain and expensive. But "cease and desist" doesn't
             | undo wrongs and neutralize liability when they've already
             | been sued over.)
        
               | rockemsockem wrote:
               | > So all they have to do is go and buy a copy of each
               | book they pirated.
               | 
               | For anyone else who wants to do the same thing though
               | this is likely all they need to do.
               | 
               | Cutting up and scanning books is hard work and actually
               | doing the same thing digitally to ebooks isn't labor free
               | either, especially when they have to be downloaded from
               | random sites and cleaned from different formats.
               | Torrenting a bunch of epubs and paying for individual
               | books is probably cheaper
        
             | tzs wrote:
             | Generally you don't want laws to work that way. You want to
             | set the penalties so that they discourage violating the
             | law.
             | 
             | Setting the penalty to what it would have cost to obey the
             | law in the first place does the opposite.
        
               | AnthonyMouse wrote:
               | That's for criminal laws where prosecutorial discretion
               | can then (in principle) be used in borderline cases to
               | prevent unjust outcomes.
               | 
               | If you give people a claim for damages which is an order
               | of magnitude larger than their actual damages, it
               | encourages litigiousness and becomes a vector for
               | shakedowns because the excessive cost of losing pressures
               | innocent defendants to settle even if there was a 90%
               | chance they would have won.
               | 
               | Meanwhile both parties have the incentive to settle in
               | civil cases when it's obvious who is going to win,
               | because a settlement to pay the damages is cheaper than
               | the cost of going to court and then having to pay the
               | same damages anyway. Which also provides a deterrent to
               | doing it to begin with, because even having to pay
               | lawyers to negotiate a settlement is a cost you don't
               | want to pay when it's clear that what you're doing is
               | going to have that result.
               | 
               | And when the result _isn't_ clear, penalizing the
               | defendant in a case of first impression isn't just
               | either, _because_ it wasn't clear and punitive measures
               | should be reserved for instances of unambiguous
               | wrongdoing.
        
               | badlibrarian wrote:
               | Statutory damages were written into the first federal
               | copyright law in 1790, and earlier in state law
               | (specified in Pounds because the dollar hadn't been
               | invented yet).
        
               | AnthonyMouse wrote:
               | The first federal copyright law in 1790:
               | 
               | https://copyright.gov/about/1790-copyright-act.html
               | 
               | Specified in dollars because dollars _had_ been invented
               | (in 1789), but in the amount of one half of one dollar,
               | i.e. $0.50. That's 1790 dollars, of course, so a little
               | under $20 today. (There was basically no inflation for
               | the first 100+ years of that because the US dollar was
               | still backed by precious metals then; a dollar was worth
               | slightly _more_ in 1900 than in 1790.)
               | 
               | That seems more like an attempt to codify some amount of
               | plausible actual damages so people aren't arguing
               | endlessly about valuations, rather than an attempt to
               | impose punitive damages. Most notably because -- unlike
               | the current method -- it scales with the number of sheets
               | reproduced.
        
               | badlibrarian wrote:
               | My fault for the hanging clause: nearly a dozen state
               | laws preceded it and used pounds. Mostly because they
               | were based on the British law and also because the war
               | made a mess of the currency situation.
               | 
               | Statutory damages were added to reduce the burden on
               | plaintiffs. Which encourages people to stay in line. How
               | well this worked out and what it means when some company
               | nobody heard of 4 years ago downloads a billion
               | copyrighted pages and raises $3.5 billion against a $60
               | billion valuation...
               | 
               | Well suddenly $20/page still sounds about right.
        
               | AnthonyMouse wrote:
               | The <$20/page was the same for maps and charts, i.e.
               | things that typically have a single page in the entire
               | work, and came from a time when printing was done a page
               | at a time, i.e. you'd lay out a page and print as many
               | copies of that page as you'd expect to make copies of the
               | entire book, then hide them somewhere else while you
               | print the next page. It was basically a proxy for the
               | number of copies of the work they caught you trying to
               | make, not an attempt to turn a single copy of a 1000 page
               | book into a 1000x multiplier on liability. Notice that
               | otherwise you're letting the infringer choose the amount
               | of the damages, because a larger page size or tighter
               | layout would fit more words per page and therefore have
               | fewer pages per book. (How many "pages" is an HTML
               | document with infinite scroll?)
               | 
               | > Statutory damages were added to reduce the burden on
               | plaintiffs. Which encourages people to stay in line.
               | 
               | It encourages people to not spend a lot of resources
               | speculating about damages. That doesn't mean you need the
               | amount to be punitive rather than compensatory.
        
               | badlibrarian wrote:
               | Agree that a photo of a celebrity and a film containing
               | that celebrity shouldn't have the same number. But a
               | large punitive number in the context of willful
               | infringement seems right to me. And in practice it's all
               | negotiated down anyway, as evidenced by Internet
               | Archive's fourth 30-day stay of its pending $600+ million
               | lawsuit.
        
               | AnthonyMouse wrote:
               | "In practice it's negotiated down anyway" is precisely
               | the issue. If they bring a questionable case against you
               | and you think there's a significant chance you could win,
               | but then there's a small chance you get bankrupted, there
               | is unreasonable pressure for you to settle even if the
               | plaintiffs are in the wrong.
        
               | badlibrarian wrote:
               | I'm not sure what a "questionable case" for willful
               | copyright infringement might look like. Or an example
               | where someone was clearly in the right and got screwed.
               | It isn't the debtor's prison era.
               | 
               | Four factor test seems to be working, even in this case.
               | Don't love it (it goes against my values and what I need
               | to do in my job) but I get it.
               | 
               | Edit: we've triggered HN's patience for this discussion
               | and it's now blocking replies. You do seem a bit long on
               | Google and short on practical experience here. How else
               | would you propose these types of disagreements get
               | sorted? ("Anyone can be sued for anything"
               | notwithstanding.)
               | 
               | There are explicitly no punitive damages in US Copyright
               | law. And the "willful" provision in practice means
               | demonstrating ongoing disregard, after being informed.
               | It's a long walk to the end of that plank.
        
               | AnthonyMouse wrote:
               | > I'm not sure what a "questionable case" for willful
               | copyright infringement might look like.
               | 
               | You did anything which it's not clear whether it's fair
               | use or not. Willfulness is whether you knew you were
               | doing it, not whether you knew whether it was fair use,
               | which in many cases _nobody_ knows until a court decides
               | it, hence the problem.
               | 
               | You have to do it in order to get into court and find out
               | if you're allowed to do it (a ridiculous prerequisite to
               | begin with), and then if it goes against you, you have to
               | pay punitive damages?
        
           | jonas21 wrote:
           | As they mentioned, the piracy part is obvious. It's the fair
           | use part that will set an important precedent for being able
           | to train on copyrighted works as long as you have legally
           | acquired a copy.
        
             | wood_spirit wrote:
             | Cue physical books being licensed not sold in the future
             | with restricted agreements ...
        
               | pier25 wrote:
               | Also music, videos, photos, etc.
        
               | mormegil wrote:
               | See first-sale doctrine
               | <https://en.wikipedia.org/wiki/First-sale_doctrine>
        
           | jasonlotito wrote:
           | From my understanding:
           | 
           | > pirating the books for their digital library is not fair
           | use.
           | 
           | "Pirating" is a fuzzy word and has no real meaning.
           | Specifically, I think this is the crux:
           | 
           | > without adding new copies, creating new works, or
           | redistributing existing copies
           | 
           | Essentially: downloading is fine, sharing/uploading is
           | not. Which makes sense. The assertion here is that Anthropic
           | (from this line) did not distribute the files they
           | downloaded.
        
             | codedokode wrote:
             | Downloading and using pirated software in a company is fine
             | then as long as it is not shared outside? If what you
             | describe is legal it makes no sense to pay for software.
        
               | jasonlotito wrote:
               | > Downloading a document is fine as long as it is not
               | shared outside?
               | 
               | I've fixed your question so that it accurately represents
               | what I said and doesn't put words in my mouth.
               | 
               | If I click on a link and download a document, is that
               | illegal?
               | 
               | I do not know if the person has the right to distribute
               | it or not. IANAL, but when people were getting sued by
               | the RIAA years back, it was never about downloading, but
               | also distribution.
               | 
               | As I said, IANAL, but feel free to correct me, but my
               | understanding is that downloading a document from the
               | internet is not illegal.
        
               | CaptainFever wrote:
               | > it was never about downloading, but also distribution.
               | 
               | Did you mean to write "but about distribution" here?
        
               | jasonlotito wrote:
               | Yes, thank you for catching that. Unfortunately, I cannot
               | edit it now.
        
               | pyrale wrote:
               | sci-hub suddenly becomes legal if all researchers adhere
               | to one big company, apparently.
               | 
               | After all, illegally downloading research papers in order
               | to write new ones is highly transformative.
        
             | AlotOfReading wrote:
             | The legal context here is that "format shifting" has not
             | previously been held to be sufficient for fair use on its
             | own, and downloading for personal use has also been
             | considered infringing. Just look at the numerous media
             | industry lawsuits against individuals that only mention
             | downloading, not sharing, for examples.
             | 
             | It's a bit surprising that you can suddenly download
             | copyrighted materials for personal use and it's kosher
             | as long as you don't share them with others.
        
               | jasonlotito wrote:
               | > the numerous media industry lawsuits against
               | individuals that only mention downloading,
               | 
               | I never saw any of these. All the cases I saw were
               | related to people using torrents or other P2P software
               | (which aren't just downloading). These might exist, but I
               | haven't seen them.
               | 
               | > It's a bit surprising that you can suddenly download
               | copyrighted materials for personal use and it's kosher as
               | long as you don't share them with others.
               | 
               | Every click on a link is a risk of downloading
               | copyrighted material you don't have the rights to.
               | 
               | Searching the internet, it appears that it's a civil
               | infraction, but it's also confused with the notion that
               | "piracy" is illegal, a term that's used for many
               | different purposes. I see "It is illegal to download any
               | music or movies that are copyrighted." under legal
               | advice, which I know as a statement is not true.
               | 
               | Hence my confusion.
               | 
               | I should note: I'm not arguing from the perspective of
               | whether it's morally or ethically right. Only that even
               | in the context of this thread, things are phrased that
               | aren't clear.
        
               | AlotOfReading wrote:
               | I just checked the first individual suit I could find, which
               | was BMG v. Gonzalez. She used P2P, but the case was
               | specifically about her _downloading_, not
               | redistributing.
        
             | eikenberry wrote:
             | Given that downloading requires you to copy the data to
             | download it, I'd think it would fall under "adding new
             | copies".
        
               | jasonlotito wrote:
               | > All Anthropic did was replace the print copies it had
               | purchased ... with more convenient space-saving and
               | searchable digital copies for its central library --
               | without adding new copies..."
               | 
               | That suggests otherwise.
        
           | pier25 wrote:
           | > _buying, physically cutting up, physically digitizing
           | books, and using them for training is fair use_
           | 
           | So Suno would only really need to buy the physical albums and
           | rip them to be able to generate music at an industrial scale?
        
             | theteapot wrote:
             | Yes.
        
               | pier25 wrote:
               | Actually it remains to be seen.
               | 
               | If you read the ruling, training was considered fair use
               | in part because Claude is not a book generation tool.
               | Hence it was deemed transformative. Definitely not what
               | Suno and Udio are doing.
        
             | ohdeargodno wrote:
             | Only if the physical albums don't have copy protection,
             | otherwise you're circumventing it and that's illegal. Or
             | is it, against the right to private copy? If anything, AI
             | at least shows that all of the existing copyright laws are
             | utter bullshit made to make Disney happy.
             | 
             | Do keep in mind though: this is only for the wealthy.
             | They're still going to send the Pinkertons at your house if
             | you dare copy a Blu-ray.
        
               | zerocrates wrote:
               | With some minor exceptions, CDs don't have copy
               | protection.
        
               | FateOfNations wrote:
               | Minor exception:
               | https://en.wikipedia.org/wiki/Sony_BMG_copy_protection_rootk...
        
               | nilamo wrote:
               | > They're still going to send the Pinkertons at your
               | house if you dare copy a Blu-ray.
               | 
               | Hey woah now, that's a Hasbro play, not a Disney one.
        
               | kbelder wrote:
               | No, because they can just play the album for the AI to
               | learn. AI training can be set up to exploit the analog
               | hole. Same with images/movies
        
             | itronitron wrote:
             | If it's fair use to train a model, that doesn't necessarily
             | imply that the model can be legally used to generate
             | anything.
        
               | pier25 wrote:
               | I've been reading a bit more about this. The training
               | might not be considered fair use if it's not considered
               | transformative.
               | 
               | Claude has been considered transformative given it's not
               | really meant to generate books but Suno or Midjourney are
               | absolutely in another category.
        
               | make3 wrote:
               | this is funny and potentially accurate
        
               | protocolture wrote:
               | Well there was that legal company who trained an LLM on
               | their opposition's legal documents and then generated
               | their own. I don't think inputs or outputs were ruled
               | legal in that regard.
               | 
               | But as long as the model isn't outputting infringing works
               | there's not really any issue there either.
        
             | jbverschoor wrote:
             | Same how it works in the Netherlands.
        
             | conradev wrote:
             | Yes! Training and generation are fair use. You are free to
             | train and generate whatever you want in your basement for
             | whatever purpose you see fit. Build a music collection, go
             | ham.
             | 
             | If the output from said model uses the voice of another
             | person, for example, we already have a legal framework in
             | place for determining if it is infringing on their rights,
             | independent of AI.
             | 
             | Courts have heard cases of individual artists copying
             | melodies, because melodies themselves are copyrightable:
             | https://www.hypebot.com/hypebot/2020/02/every-possible-melod...
             | 
             | Copyright law is a lot more nuanced than anyone seems to
             | have the attention span for.
        
               | pier25 wrote:
               | > _Yes!_
               | 
               | But Suno is definitely not training models in their
               | basement for fun.
               | 
               | They are a private company selling music, using music
               | made by humans to train their models, to replace human
               | musicians and artists.
               | 
               | We'll see what the courts say but that doesn't sound like
               | fair use.
        
               | conradev wrote:
               | My understanding is that Suno does _not sell music_, but
               | instead makes a tool for musicians to generate music and
               | sells access to this _tool_.
               | 
               | The law doesn't distinguish between basement and cloud -
               | it's a service. You can sell access to the service
               | without selling songs to consumers.
        
             | burnt-resistor wrote:
             | So not only did they pirate works but they destroyed
             | possibly collectible physical copies too. Kafkaesque.
        
               | bigyabai wrote:
               | Google set the precedent for this with an _even less
               | transformative_ use case:
               | https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,....
        
           | AnthonyMouse wrote:
           | > That is, he ruled that
           | 
           | > - buying, physically cutting up, physically digitizing
           | books, and using them for training is fair use
           | 
           | > - pirating the books for their digital library is not fair
           | use.
           | 
           | That seems inconsistent with one another. If it's fair use,
           | how is it piracy?
           | 
           | It also seems pragmatically trash. It doesn't do the authors
           | any good for the AI company to buy _one_ copy of their book
           | (and a used one at that), but it _does_ make it much harder
           | for smaller companies to compete with megacorps for AI stuff,
           | so it's basically the stupidest of the plausible outcomes.
        
             | MrJohz wrote:
             | These are two separate actions that Anthropic did:
             | 
             | * They downloaded a massive online library of pirated books
             | that someone else was distributing illegally. This was not
             | fair use.
             | 
             | * They then digitised a bunch of books that they physically
             | owned copies of. This was fair use.
             | 
             | This part of the ruling is pretty much existing law. If you
             | have a physical book (or own a digital copy of a book), you
             | can largely do what you like with it within the confines of
             | your own home, including digitising it. But you are not
             | allowed to distribute those digital copies to others, nor
             | are you allowed to download other people's digital copies
             | that you don't own the rights to.
             | 
             | The interesting part of this ruling is that once Anthropic
             | had a legal digital copy of the books, they could use it
             | for training their AI models and then release the AI
             | models. According to the judge, this counts as fair use
             | (assuming the digital copies were legally sourced).
        
               | AnthonyMouse wrote:
               | > This part of the ruling is pretty much existing law. If
               | you have a physical book (or own a digital copy of a
               | book), you can largely do what you like with it within
               | the confines of your own home, including digitising it.
               | But you are not allowed to distribute those digital
               | copies to others, nor are you allowed to download other
               | people's digital copies that you don't own the rights to.
               | 
               | Can you point me to the US Supreme Court case where this
               | is existing law?
               | 
               | It's pretty clear that if you have a physical copy of a
               | book, you can lend it to someone. It also seems pretty
               | reasonable that the person borrowing it could make fair
               | use of it, e.g. if you borrow a book from the library to
               | write a book review and then quote an excerpt from it. So
               | the only thing that's left is, what if you do the same
               | thing over the internet?
               | 
               | Shouldn't we be able to distinguish this from the case
               | where someone is distributing multiple copies of a work
               | without authorization and the recipients are each making
               | and keeping permanent copies of it?
        
               | MrJohz wrote:
               | I cannot point to the case, because my entire knowledge
               | about the legality of this stuff comes from vaguely
               | following the articles about this case. But feel free to
               | read the judgement in this case where it will be spelled
               | out in much more detail.
               | 
               | Also, I don't quite understand how your example is
               | relevant to the case. If you give a book to a friend,
               | they are now the owner of that book and can do what they
               | like with it. If you photocopy that book and give them
               | the photocopy, they are not the owner of the book and you
               | have reproduced it without permission. The same is, I
               | believe, true of digital copies - this is how ebook
               | libraries work.
               | 
               | In this case, Anthropic were the legal owners of the
               | physical books, and so could do what they wanted with
               | them. They were not the legal owners of the digital
               | books, which means they can get prosecuted for copyright
               | infringement.
        
               | AnthonyMouse wrote:
               | > If you give a book to a friend, they are now the owner
               | of that book and can do what they like with it.
               | 
               | We're talking about lending rather than ownership
               | transfers, though of course you could regard lending as a
               | sort of ownership transfer with an agreement to transfer
               | it back later.
               | 
               | > If you photocopy that book and give them the photocopy,
               | they are not the owner of the book and you have
               | reproduced it without permission.
               | 
               | But then the question is whether the copy is fair use,
               | not who the owner of the original copy was, right? For
               | example, you can make a fair use photocopy of a page from
               | a library book.
               | 
               | > They were not the legal owners of the digital books,
               | which means they can get prosecuted for copyright
               | infringement.
               | 
               | Even if the copy they make falls under fair use and the
               | person who does own that copy of the book has no
               | objection to their doing this?
        
               | op00to wrote:
               | It is "established" law because the Copyright Act itself
               | and a string of unanimous or near-unanimous appellate
               | decisions (google ReDigi on digital transfers and Sony
               | and the first-sale for personal use and physical lending)
               | uniformly apply the same principles, leaving no circuit
               | split and no conflicting precedent for the Supreme Court
               | to resolve. In the U.S. system statutory text interpreted
               | consistently by the Courts of Appeals becomes binding law
               | nationwide unless and until the Supreme Court or Congress
               | says otherwise.
        
               | AnthonyMouse wrote:
               | Sony v. Universal _is_ a Supreme Court case, but that's
               | the one where they say that sort of thing is fair use
               | rather than that it isn't. ReDigi isn't a Supreme Court
               | case, and it seems rather inconsistent with the Sony case
               | which is. To claim uniformity you'd then need all the
               | other circuit courts coming to the same conclusion rather
               | than just not having had any relevant cases there yet,
               | but is that the case?
        
               | cusaitech wrote:
               | The judge said they can train; however, I believe the judge
               | did not make any ruling regarding model outputs.
        
           | icelancer wrote:
           | > You skipped quotes about the other important side:
           | 
           | He said:
           | 
           | > It was always somewhat obvious that pirating a library
           | would be copyright infringement.
           | 
           | ??
        
         | alok-g wrote:
         | AFAIK, Judge Vince Chhabria has countered that Fair Use
         | argument in a later order involving Meta.
         | 
         | https://www.courtlistener.com/docket/67569326/598/kadrey-v-m...
         | 
         | Note: I am not a lawyer.
        
         | franczesko wrote:
         | Is the fruit of the poisonous tree rule applicable here?
        
         | sershe wrote:
         | I'm not sure how I feel about what Anthropic did on merit as a
         | matter of scale, but from a legalistic standpoint how is it
         | different from using the book to train the meat model in my
         | head? I could even learn bits by heart and quote them in
         | context.
        
         | MaxPock wrote:
         | How times change. They wanted to lock up Aaron Swartz for
         | life for essentially doing the same thing Anthropic is doing.
        
       | guywithahat wrote:
       | If you own a book, it should be legal for your computer to take a
       | picture of it. I honestly feel bad for some of these AI companies
       | because the rules around copyright are changing just to target
       | them. I don't owe copyright to every book I read because I may
       | subconsciously incorporate their ideas into my future work.
        
         | raincole wrote:
         | Are we reading the same article? The article explicitly states
         | that it's okay to cut up and scan the books you own to train a
         | model from them.
         | 
         | > I honestly feel bad for some of these AI companies because
         | the rules around copyright are changing just to target them
         | 
         | The ruling would be a huge win for AI companies if held. It's
         | really weird that you reached the opposite conclusion.
        
         | rapind wrote:
         | Everything is different at scale. I'm not giving a specific
         | opinion on copyright here, but it just doesn't make sense when
         | we try to apply individual rights and rules to systems of
         | massive scale.
         | 
         | I really think we need to understand this as a society and also
         | realize that moneyed interests will downplay this as much as
         | possible. A lot of the problems we're having today are due to
         | insufficient regulation differentiating between individuals and
         | systems at scale.
        
         | organsnyder wrote:
         | The difference here is that an LLM is a mechanical process. It
         | may not be deterministic (at least, in a way that my brain
         | understands determinism), but it's still a machine.
         | 
         | What you're proposing is considering LLMs to be equal to humans
         | when considering how original works are created. You could make
         | the argument that LLM training data is no different from a
         | human "training" themself over a lifetime of consuming content,
         | but that's a philosophical argument that is at odds with our
         | current legal understanding of copyright law.
        
           | kevinpet wrote:
           | That's not a philosophical argument at odds with our current
           | understanding of copyright law. That's exactly what this
           | judge found copyright law currently is and it's quoted in the
           | article being discussed.
        
             | organsnyder wrote:
             | Thanks for pointing that out. Obviously I hadn't read the
             | whole article. That is an interesting determination the
             | judge made:
             | 
             | > Alsup ruled that Anthropic's use of copyrighted books to
             | train its AI models was "exceedingly transformative" and
             | qualified as fair use, a legal doctrine that allows certain
             | uses of copyrighted works without the copyright owner's
             | permission.
        
               | JoeAltmaier wrote:
               | There are still questions: is an AI a 'user' in the
               | copyright sense?
               | 
               | Or even, is an individual operating within the law as
               | fair use, the same as a voracious all-consuming AI
               | training bot consuming everything the same in spirit?
               | 
               | Consider a single person in a National Park, allowed to
               | pick and eat berries, compared to bringing a combine
               | harvester to take it all.
        
         | zerotolerance wrote:
         | "Judge says training Claude on books was fair use, but piracy
         | wasn't."
        
         | atomicnumber3 wrote:
         | The core problem here is that copyright already doesn't
         | actually follow any consistent logical reasoning. "Information
         | wants to be free" and so on. So our own evaluation of whether
         | anything is fair use or copyrighted or infringement thereof is
         | always going to be exclusively dictated by whatever a judge's
         | personal take on the pile of logical contradictions is.
         | Remember, nominally, the sole purpose of copyright is not
         | rooted in any notions of fairness or profitability or anything.
         | It's specifically to incentivize innovation.
         | 
         | So what is the right interpretation of the law with regards to
         | how AI is using it? What better incentivizes innovation? Do we
         | let AI companies scan everything because AI is innovative? Or
         | do we think letting AI vacuum up creative works to then
         | stochastically regurgitate tiny (or not so tiny) slices of them
         | at a time will hurt innovation elsewhere?
         | 
         | But obviously the real answer here is money. Copyright is
         | powerful because monied interests want it to be. Now that
         | copyright stands in the way of monied interests for perhaps the
         | first time, we will see how dedicated we actually were to
         | whatever justifications we've been seeing for DRM and copyright
         | for the last several decades.
        
         | Bjorkbat wrote:
         | Something missed in arguments such as these is that in
         | measuring fair use there's a consideration of impact on the
         | potential market for a rightsholder's present and future works.
         | In other words, can it be proven that what you are doing is
         | meaningfully depriving the author of future income.
         | 
         | Now, in theory, you learning from an author's works and
         | competing with them in the same market could meaningfully
         | deprive them of income, but it's a very difficult argument to
         | prove.
         | 
         | On the other hand, with AI companies it's an easier argument to
         | make. If Anthropic trained on _all_ of your books (which is
         | somewhat likely if you're a fairly popular author) and you saw
         | a substantial loss of income after the release of one of their
         | better models (presumably because people are just using the LLM
         | to write their own stories rather than buy your stuff), then
         | it's a little bit easier to connect the dots. A company used
         | your works to build a machine that competes with you, which
         | arguably violates the fair use principle.
         | 
         | Gets to the very principle of copyright, which is that you
         | shouldn't have to compete against "yourself" because someone
         | copied you.
        
           | parliament32 wrote:
           | > a consideration of impact on the potential market for a
           | rightsholder's present and future works
           | 
           | This is one of those mental gymnastics exercises that makes
           | copyright law so obtuse and effectively unenforceable.
           | 
           | As an alternative, imagine a scriptwriter buys a textbook on
           | orbital mechanics, while writing Gravity (2013). A large
           | number of people watch the finished film, and learn something
           | about orbital mechanics, therefore not needing the textbook
           | anymore, causing a loss of revenue for the textbook author.
           | Should the author be entitled to a percentage of Gravity's
           | profit?
           | 
           | We'd be better off abolishing everything related to copyright
           | and IP law altogether. These laws might've made sense back
           | in the days of the printing press but they're just
           | nonsensical nowadays.
        
             | Bjorkbat wrote:
             | Personally I think a more effective analogy would be if
             | someone used a textbook and created an online course /
             | curriculum effective enough that colleges stop recommending
             | the purchase of said textbook. It's honestly pretty
             | difficult to imagine a movie having a meaningful impact on
             | the sale of textbooks since they're required for high
             | school / college courses.
             | 
             | So here's the thing, I don't think a textbook author going
             | against a purveyor of online courseware has much of a
             | chance, nor do I think it _should_ have much of a chance,
             | because it probably lacks meaningful proof that their works
             | made a contribution to the creation of the courseware.
             | Would I feel differently if the textbook author could prove
             | in court that a substantial amount of their material
             | contributed to the creation of the courseware, and when I
             | say "prove" I mean they had receipts to prove it? I think
             | that's where things get murky. If you can actually prove
             | that your works made a meaningful contribution to the thing
             | that you're competing against, then maybe you have a point.
             | The tricky part is defining meaningful. An individual
             | author doesn't make a meaningful contribution to the
             | training of an LLM, but a large number of popular and/or
             | prolific numbers can.
             | 
             | You bring up a good point, interpretation of fair use is
             | difficult, but at the end of the day I really don't think
             | we should abolish copyright and IP altogether. I think it's
             | a good thing that creative professionals have some security
             | in knowing that they have legal protections against having
             | to "compete against themselves"
        
               | TeMPOraL wrote:
               | > _An individual author doesn't make a meaningful
               | contribution to the training of an LLM, but a large
               | number of popular and/or prolific numbers can._
               | 
               | That's a point I normally use to argue _against_ authors
               | being entitled to royalties on LLM outputs. An individual
               | author's marginal contribution to an LLM is essentially
               | nil, and could be removed from the training set with no
               | meaningful impact on the model. It's only the
               | accumulation of a very large amount of works that turns
               | into a capable LLM.
        
       | shrubble wrote:
       | Let's say my AI company is training an AI on woodworking books
       | and at the end, it will describe in text and wireframe drawings
       | (but not the original or identical photos) how to do a particular
       | task.
       | 
       | If I didn't license all the books I trained on, am I not
       | depriving the publisher of revenue, given people will pay me for
       | the AI instead of buying the book?
        
         | mathiaspoint wrote:
         | The same argument applies to someone who learned from the book
         | and wrote an article explaining the idea to someone else.
        
         | mrkstu wrote:
         | If you paid a human author to do the same you'd be breaking no
         | law. Learning is the point of books existing in the first
         | place.
        
           | NoOn3 wrote:
           | Humans learning, not machines learning is the point of books.
        
       | hellohihello135 wrote:
       | It's easy to point fingers at others. Meanwhile the top comment
       | in this thread links to stolen content from Business Insider.
        
         | fakeBeerDrinker wrote:
         | How is it stolen from Business Insider? When I visit
         | businessinsider.com/anthropic-cut-pirated-millions-used-books-
         | train-claude-copyright-2025-6 I get the same story. My browser
         | caches the story, and I save it for archival purposes. How is
         | this theft?
        
           | hellohihello135 wrote:
           | BI decides who can access this content and who will get the
           | paywall. The link to archive page allows people to access
           | this content without permission. That's called stealing.
        
             | fakeBeerDrinker wrote:
             | When I hop on a VPN and enter incognito mode from a clean
             | browser session, bypassing their paywall, is that stealing?
             | This doesn't meet the definition of stealing that I'm
             | familiar with.
        
         | jtrn wrote:
         | Best goddamn comment in this whole thread. Now we can have fun
         | reading the mental gymnastics!
        
       | adolph wrote:
       | Alsup detailed Anthropic's training process with books: The
       | OpenAI rival spent "many millions of dollars" buying used print
       | books, which the company or its vendors then stripped of their
       | bindings, cut the pages, and scanned into digital files.
       | 
       | I've noticed an increase in used book prices in the recent past
       | and now wonder if there is an LLM effect in the market.
        
       | codedokode wrote:
       | If AI companies are allowed to use pirated material to create
       | their products, does it mean that everyone can use pirated
       | software to create products? Where is the line?
       | 
       | Also please don't use the word "learning", use "creating
       | software using copyrighted materials".
       | 
       | Also let's think together about how we can prevent AI companies
       | from using our work through technical measures if the law
       | doesn't work.
        
         | rvnx wrote:
         | ~1B USD in cash is the line where laws apply very differently
        
         | redcobra762 wrote:
         | It's abusive and wrong to try and prevent AI companies from
         | using your works at all.
         | 
         | The whole point of copyright is to ensure you're paid for your
         | work. AI companies shouldn't pirate, but if they pay for your
         | work, they should be able to use it however they please,
         | including training an LLM on it.
         | 
         | If that LLM reproduces your work, then the AI company is
         | violating copyright, but if the LLM doesn't reproduce your
         | work, then you have not been harmed. Trying to claim harm when
         | you haven't been due to some philosophical difference in
         | opinion with the AI company is an abuse of the courts.
        
           | codedokode wrote:
           | It is not wrong at all. The author decides what to do with
           | their work. AI companies are rich and can simply buy the
           | rights or hire people to create works.
           | 
           | I could agree with exceptions for non-commercial activity
           | like scientific research, but AI companies are made for
           | extracting profits and not for doing research.
           | 
           | > AI companies shouldn't pirate, but if they pay for your
           | work, they should be able to use it however they please,
           | including training an LLM on it.
           | 
           | It doesn't work this way. If you buy a movie it doesn't mean
           | you can sell goods with movie characters.
           | 
           | > then you have not been harmed.
           | 
           | I am harmed because fewer people will buy the book if they
           | can simply get an answer from an LLM. Fewer people will hire
           | me to write code if an LLM trained on my code can do it. Maybe
           | instead of books we should start making applications that
           | protect the content and do not allow copying text or making
           | screenshots. And instead of open-source code we should
           | provide binary WASM modules.
        
             | redcobra762 wrote:
             | If you reproduce the material from a work you've purchased
             | then of course you're in violation of copyright, but that's
             | not what an LLM does (and when it does I already conceded
             | it's in violation and should be stopped). An LLM that
             | _doesn't_ "sell goods with movie characters" is not in
             | violation.
             | 
             | And the harm you describe is not a recognized harm. You
             | don't own information, you own creative works in their
             | entirety. If your work is simply a reference, then the fact
             | being referenced isn't something you own, thus you are not
             | harmed if that fact is shared elsewhere.
             | 
             | It is an abuse of the courts to attempt to prevent people
             | who have purchased your works from using those works to
             | train an LLM. It's morally wrong.
        
               | codedokode wrote:
               | To load a printed book into a computer one has to
               | reproduce it in digital form without authorization.
               | That's making a copy.
        
               | redcobra762 wrote:
               | Making a digital copy of a physical book is fair use
               | under every legal structure I am aware of.
               | 
               | When you do it for a transformative purpose (turning it
               | into an LLM model) it's certainly fair use.
               | 
               | But more importantly, it's _ethical_ to do so, as the
               | agreement you've made with the person you've purchased
               | the book from included permission to do exactly that.
        
               | seadan83 wrote:
               | Per the ruling, the problem is the books were not
               | purchased, they were downloaded from black market
               | websites. It's akin to shoplifting, what you do later
               | with the goods is a different matter.
               | 
               | Reasonable minds could debate the ethics of how the
               | material was used, this ruling judged the usage was legal
               | and fair use. The only problem is the material was in
               | effect stolen.
        
               | CaptainFever wrote:
               | > It is worse than ineffective; it is wrong too, because
               | software developers should not exercise such power over
               | what users do. Imagine selling pens with conditions about
               | what you can write with them; that would be noisome, and
               | we should not stand for it. Likewise for general
               | software. If you make something that is generally useful,
               | like a pen, people will use it to write all sorts of
               | things, even horrible things such as orders to torture a
               | dissident; but you must not have the power to control
               | people's activities through their pens. It is the same
               | for a text editor, compiler or kernel.
               | 
               | Sorry for the long quote, but basically this, yeah. A
               | major point of free software is that creators should not
               | have the power to impose arbitrary limits on the users of
               | their works. It is unethical.
               | 
               | It's why the GPL allows the user to disregard any
               | additional conditions, why it's viral, and why the FSF
               | spends so much effort on fighting "open source but..."
               | licenses.
        
             | CaptainFever wrote:
             | > Maybe instead of books we should start making
             | applications that protect the content and do not allow
             | copying text or making screenshots.
             | 
             | https://en.wikipedia.org/wiki/Analog_hole
        
           | DrillShopper wrote:
           | > The whole point of copyright is to ensure you're paid for
           | your work.
           | 
           | No. The point of copyright is that the author gets to decide
           | under what terms their works are copied. That's the essence
           | of copyright. In many cases, authors will happily sell you a
           | copy of their work, but they're under no obligation to do so.
           | They can claim a copyright and then never release their work
           | to the general public. That's perfectly within their rights,
           | and they can sue to stop anybody from distributing copies.
        
             | redcobra762 wrote:
             | We're operating under a model where the owner of the
             | copyright _has_ already sold their work. And while it's
             | within their rights to stipulate conditions of the sale,
             | they did not do that, and fair use of the work as governed
             | under the laws the book was sold under encompasses its
             | conversion into an LLM model.
             | 
             | If the author didn't want their work to be included in an
             | LLM, they should not have sold it, just like if an author
             | didn't want their work to inspire someone else's work, they
             | should not have sold it.
        
               | DrillShopper wrote:
               | > fair use of the work as governed under the laws the
               | book was sold under encompasses its conversion into an
               | LLM model
               | 
               | If that were the case then this court case would not be
               | ongoing
        
               | lcnPylGDnU4H9OF wrote:
               | That seems to be a misunderstanding of what's disputed.
               | One fact that is disputed is whether or not the use of
               | the work qualifies as fair use and the judge determined
               | that it is because the result is sufficiently
               | transformative. Another disputed fact is whether the
               | books were acquired legally and the judge determined that
               | they were not. The reason the case is still ongoing is to
               | determine Anthropic's liability for illegally acquiring
               | copies of the books, not to determine the legal status of
               | the LLMs.
        
               | seadan83 wrote:
               | Yeah, this is part of the ruling. The judge decided that
               | the usage was sufficiently transformative and thus fair
               | use. The issue is the authors were selling their works
               | and the company went to a black market instead.
        
           | 827a wrote:
           | Current copyright law is not remotely sophisticated enough to
           | make determinations on AI fair use. Whether the courts say
           | current AI use is fair is irrelevant to the discussion most
           | people on this side would agree with: That we need new laws.
           | The work the AI companies stole to train on was created under
           | a copyright regime where the expectation was that, eh, a few
           | people would learn from and be inspired from your work, and
           | that feels great because you're empowering other humans.
           | Scale does not amplify Good. The regime has changed. The
           | expectations under what kinds of use copyright protects
           | against have fundamentally changed. The AI companies invented
           | New Horrors that no one could have predicted, Vader altered
           | the deal, no reasonable artist except the most forward-
           | thinking sci-fi authors would have remotely guessed what
           | their work would be used for, and thus could never have
           | consciously and fairly agreed to this exchange. Very few would
           | have agreed to it.
        
           | xdennis wrote:
           | > It's abusive and wrong to try and prevent AI companies from
           | using your works at all.
           | 
           | People don't view moral issues in the abstract.
           | 
           | A better perspective on this is the fact that human
           | individuals have created works which megacorps are training
           | on for free or for the price of a single book and creating
           | models which replace individuals.
           | 
           | The megacorps are only partially replacing individuals now,
           | but when the models get good enough they could replace humans
           | entirely.
           | 
           | When such a future happens will you still be siding with them
           | or with individual creators?
        
             | whycome wrote:
             | > A better perspective on this is the fact that human
             | individuals have created works which megacorps are training
             | on for free or for the price of a single book and creating
             | models which replace individuals.
             | 
             | Those damn kind readers and libraries. Giving their single
             | copy away when they just paid for the single.
        
         | whycome wrote:
         | But the AI used the content to learn how to copy and recreate
         | it. Is 're-creation' a better concept for us?
         | 
         | People already use pirated software for product creation.
         | 
         | Hypothetical:
         | 
         | I know a guy who learned photoshop on a pirated copy of
         | Photoshop. He went on to be a graphic designer. All his
         | earnings are 'proceeds from crime'
         | 
         | He never used the pirated software to produce content.
        
           | timeon wrote:
           | So can we officially download pirated content to learn stuff
           | now?
        
             | southernplaces7 wrote:
             | Sure, and I feel zero moral qualms about me or anyone else
             | doing it. The vast majority of the shit flows from
             | the other direction towards individuals and consumers when
             | it comes to content delivery companies and worse still,
             | software companies. Let's address that before wringing our
             | hands about individual acts of "piracy", even at scale.
             | 
             | I could, right now in just a few minutes, go download a
             | perfectly functional pirated copy of nearly any Adobe
             | program, nearly any Microsoft program and a whole range of
             | books and movies, yet I see zero real financial troubles
             | affecting any of the companies behind these. Quite the
             | contrary, in fact.
        
             | whycome wrote:
             | How often does a link get posted here of content that is
             | behind a paywall? If you bypass it to read it, didn't you
             | just learn via illegal content? I'm not sure where the
             | "official" comes in, but it's clearly widely accepted.
             | 
             | If you watch a YouTube video to learn something and it's
             | later taken down for using copyrighted images, you learned
             | from illegal content.
        
         | megaman821 wrote:
         | Where are you reading that?
         | 
         | You are allowed to buy and scan books, and then use those
         | scanned books to create products. I guess you are also allowed
         | to pirate books and use the knowledge to create products if you
         | are willing to pay the damages to the rights holders for
         | copyright violations.
        
         | stackedinserter wrote:
         | When I was young and poor I learned on pirated software. Do I
         | owe Adobe, Microsoft and others a percentage of my today
         | income?
        
       | koolala wrote:
       | Anyone read the 2006 sci-fi book Rainbows End that has this? It
       | was set in 2025.
        
         | solfox wrote:
         | I was 100% thinking this. GREAT book. And they, too, shredded
         | books to ingest them into the digital library! I don't recall
         | if it was an attempt to bypass copyright though; in Rainbows
         | End, it was more technical, as it was easier to shred, scan the
         | pieces, and reassemble them in software, rather than scanning
         | each page.
        
       | Uhhrrr wrote:
       | From Vinge's "Rainbows End":
       | 
       | > In fact this business was the ultimate in deconstruction: First
       | one and then the other would pull books off the racks and toss
       | them into the shredder's maw. The maintenance labels made calm
       | phrases of the horror: The raging maw was a "NaviCloud custom
       | debinder." The fabric tunnel that stretched out behind it was a
       | "camera tunnel...." The shredded fragments of books and magazine
       | flew down the tunnel like leaves in tornado, twisting and
       | tumbling. The inside of the fabric was stitched with thousands of
       | tiny cameras. The shreds were being photographed again and again,
       | from every angle and orientation, till finally the torn leaves
       | dropped into a bin just in front of Robert. Rescued data.
       | BRRRRAP! The monster advanced another foot into the stacks,
       | leaving another foot of empty shelves behind it.
        
         | microtherion wrote:
         | Yes, I was thinking of this passage as well. The technology
         | does not seem to have advanced to this particular point yet.
        
       | codedokode wrote:
       | > "Like any reader aspiring to be a writer, Anthropic's LLMs
       | trained upon works not to race ahead and replicate or supplant
       | them -- but to turn a hard corner and create something
       | different," he wrote.
       | 
       | But this analogy seems wrong. First, LLM is not a human and
       | cannot "learn" or "train" - only human can do it. And LLM
       | developers are not aspiring to become writers and do not learn
       | anything, they just want to profit by making software using
       | copyrighted material. Also people do not read millions of books
       | to become a writer.
        
         | CaptainFever wrote:
         | > But this analogy seems wrong. First, LLM is not a human and
         | cannot "learn" or "train" - only human can do it.
         | 
         | The analogy refers to humans using machines to do what would
         | already be legal if they did it manually.
         | 
         | > And LLM developers are not aspiring to become writers and do
         | not learn anything, they just want to profit by making software
         | using copyrighted material.
         | 
         | [Citation needed], and not a legal argument.
         | 
         | > Also people do not read millions of books to become a writer.
         | 
         | But people do hear millions of words as children.
        
       | m4rtink wrote:
       | Does anyone else think destroying books for any reason is wrong?
       | 
       | Or is it perhaps not a universal cultural/moral aspect?
       | 
       | I guess for example in Europe people could be more sensitive to
       | it.
        
         | lawlessone wrote:
         | If they aren't one of a kind and they digitally preserved them
         | in some way, I think I would be OK with it.
         | 
         | That said, there are tools for digitizing books that
         | don't require destroying them.
        
         | stackedinserter wrote:
         | There's nothing sacred about books. There are plenty of books
         | that won't be missed if destroyed.
        
         | kbelder wrote:
         | I have purposefully destroyed one book in my life, in order to
         | prevent anyone from reading it:
         | 
         |  _Man of Two Worlds_ by Brian Herbert.
         | 
         | ...and I did the world a favor.
        
       | codedokode wrote:
       | By the way, I wonder if recent advancements in protecting YouTube
       | videos from downloaders like yt-d*p are caused by unwillingness
       | to help rival AI companies gather the datasets.
        
       | lvl155 wrote:
       | It's marginally better than Meta torrenting z-lib.
        
       | randomNumber7 wrote:
       | I will never feel bad again for learning from copied books /S
        
       | jimnotgym wrote:
       | Hang on, it is OK under copyright law to scan a book I bought
       | second hand, destroy the hard copy and keep the scan in my online
       | library? That doesn't seem to chime with the copyright notices I
       | have read in books.
        
         | badlibrarian wrote:
         | First sale doctrine gives the person who sold the book you
         | bought the right to sell it to you. Fair Use permits you to
         | scan your copy, used or new. It's your book, you can destroy
         | it. But you have to delete your digital copy if you sell it or
         | give it away. And you can't distribute your digital copy.
        
         | kube-system wrote:
         | Fair use can be a pretty gray area and details matter, but
         | copying for personal use is frequently okay.
         | 
         | > That doesn't seem to chime with the copyright notices I have
         | read in books.
         | 
         | You shouldn't get your legal advice from someone with skin in
         | the game.
        
       | platunit10 wrote:
       | Every time an article like this surfaces, it always seems like
       | the majority of tech folks believe that training AI on
       | copyrighted material is NOT fair use, but the legal industry
       | disagrees.
       | 
       | Which of the following are true?
       | 
       | (a) the legal industry is susceptible to influence and corruption
       | 
       | (b) engineers don't understand how to legally interpret legal
       | text
       | 
       | (c) AI tech is new, and judges aren't technically qualified to
       | decide these scenarios
       | 
       | Most likely option is C, as we've seen this pattern many times
       | before.
        
         | rockemsockem wrote:
         | Idk, I think most people in tech I talk to IRL think it is fair
         | use?
         | 
         | I think the overly liberal, non-tech crowd has become really
         | vocal on HN as of late and your sample is likely biased by
         | these people.
        
         | 827a wrote:
         | Armchair commentators, including myself, tend to be imprecise
         | when speaking about whether something is illegal, versus
         | something should be illegal. Sometimes due to a
         | misunderstanding of the law, or an over-estimation of the
         | court's authority, or an over-estimation of our legislature's
         | productivity, or just because we're making conversation and
         | like talking.
        
         | CaptainFever wrote:
         | > Every time an article like this surfaces, it always seems
         | like the majority of tech folks believe that training AI on
         | copyrighted material is NOT fair use
         | 
         | Where are you getting your data from? My conclusions are the
         | exact opposite.
         | 
         | (Also, aren't judges by definition the only ones qualified to
         | declare if it is _actually_ fair use? You could make a case
         | that it _shouldn't_ be fair use, but that's different from it
         | _being_ not fair use.)
        
         | redcobra762 wrote:
         | It's not likely you've actually gotten the opinion of the
         | "majority of tech folks", just the most outspoken ones, and
         | only in specific bubbles you belong to.
        
         | OkayPhysicist wrote:
         | There's a lot of conflation of "should/shouldn't" and
         | "is/isn't". The comments by tech folk you're alluding to mostly
         | think that it "shouldn't" be fair use, out of concern about the
         | societal consequences, whereas judges are looking at it and
         | saying that it "is" fair use, based on the existing law.
         | 
         | Any reasonable reading of the current state of fair use
         | doctrine makes it obvious that the process between _Harry
         | Potter and the Sorcerer's Stone_ and "A computer program that
         | outputs responses to user prompts about a variety of topics" is
         | _wildly_ transformative, and thus the usage of the copyrighted
         | material is probably covered by fair use.
        
         | kube-system wrote:
         | I know for sure (b) is true. Way too many people on technical
         | forums read legal texts as if the process to interpret laws is
         | akin to a compiler generating a binary.
        
         | standardUser wrote:
         | I don't understand at all the resistance to training LLMs on
         | any and all materials available. Then again, I've always viewed
         | piracy as compatible with markets and a democratizing force
         | upon them. I thought (wrongly?) that this was the widespread
         | progressive/leftist perspective, to err on the side of access
         | to information.
        
         | freshtake wrote:
         | If I allegedly train off of your training, which was trained
         | off of copyrighted content under fair use, we're good right?
         | 
         | Just asking for a friend who's into this sort of thing.
        
         | mrguyorama wrote:
         | Seeing as (a) is true in the US Supreme Court, it's probably at
         | least as true in the lower courts.
        
       | Zufriedenheit wrote:
       | Maybe to give something back to the pirates, Anthropic could
       | upload all the books they have digitized to the archive? /s
        
       | pmdr wrote:
       | They've all done that, it should be obvious by now. Training on
       | just freely available data only gets you so far.
        
       | stackedinserter wrote:
       | Everybody that wants to train an LLM, should buy every single
       | book, every single issue of a magazine or a newspaper, and
       | personally ask every person that ever left a comment on social
       | media. /s
       | 
       | If I were China I would buy every lawyer to drown Western AI
       | companies in lawsuits, because it's an easy way to win the AI race.
        
       | IOT_Apprentice wrote:
       | If Anthropic is funded by Amazon, they should have just asked
       | Amazon for unlimited download of EVERY book in the Amazon book
       | store, and all audio-books as well. It certainly would be faster
       | than buying one copy of each and tearing it apart.
        
       | godelski wrote:
       | The solution has always been: show us the training data.
       | 
       | As a researcher I've been furious that we publish papers where
       | the research data is unknown. To add insult to injury, we have
       | the _audacity_ to start making claims about "zero-shot",
       | "low-shot", "OOD", and other such things. It is utterly laughable.
       | These would be tough claims to make _even if we knew the data_,
       | simply because of its size. But not knowing the data, it is
       | outlandish. Especially because the presumptions are "everything
       | on the internet." It would be like training on all of GitHub and
       | then writing your own _simple_ programming questions to test an
       | LLM[0]. Analyzing that amount of data is just intractable, and we
       | currently do not have the mathematical tools to do so. But this
       | is a much harder problem to crack when we're just conjecturing
       | and ultimately this makes interoperability more difficult.
       | 
       | On top of all of that, we've been playing this weird legal game.
       | Where it seems that every company has had to cheat. I can
       | understand how smaller companies turn to torrenting to compete,
       | but when it is big names like Meta, Google, Nvidia, OpenAI
       | (Microsoft), etc it is just wild. This isn't even following the
       | highly controversial advice of Eric Schmidt "Steal everything,
       | then if you get big, let the lawyers figure it out." This is just
       | "steal everything, even if you could pay for it." We're talking
       | about the richest companies in the entire world. Some of the, if
       | not _the_, richest companies to ever exist.
       | 
       | Look, can't we just try to be a _little_ ethical? There is, in
       | fact, enough money to go around. We've seen unprecedented growth
       | in the last few years. It was only 2018 when Apple became the
       | first trillion dollar company, 2020 when it became the second two
       | trillion, and 2022 when it became the first three trillion dollar
       | company. Now we have 10 companies north of the trillion dollar
       | mark![3] (5 above $2T and 3 above $3T) These values have
       | _exploded_ in the last 5 years! It feels difficult to say that we
       | don't have enough money to do things better. To at least not
       | completely screw over "the little guy." I am unconvinced that
       | these companies would be hindered if they had to broker some deal
       | for training data. Hell, they're already going to war over data
       | access.
       | 
       | My point here is that these two things align. We're talking about
       | how this technology is so dangerous (every single one of those
       | CEOs has made that statement) and yet we can't remain remotely
       | ethical? How can you shout "ONLY I CAN MAKE SAFE AI" while acting
       | so unethically? There's always moral gray areas but is this
       | really one of them? I even say this as someone who has torrented
       | books myself![4] We are holding back the data needed to make AI
       | safe and interpretable while handing the keys to those who
       | actively demonstrate that they should not hold the power. I don't
       | understand why this is even that controversial.
       | 
       | [0] Yes, this is a snipe at HumanEval. Yes, I will make the
       | strong claim that the dataset was spoiled from day 1. If you
       | doubt it, go read the paper and look at the questions
       | (HuggingFace).
       | 
       | [1] https://www.theverge.com/2024/8/14/24220658/google-eric-
       | schm...
       | 
       | [2]
       | https://en.wikipedia.org/wiki/List_of_public_corporations_by...
       | 
       | [3] https://companiesmarketcap.com/
       | 
       | [4] I can agree it is wrong, but can we agree there is a big
       | difference between a student torrenting a book and a
       | billion/trillion dollar company torrenting millions of books? I
       | even lean on the side of free access to information, and am a fan
       | of Aaron Swartz and SciHub. I make all my works available on
       | ArXiv. But we can recognize there's a big difference between a
       | singular person doing this at a small scale and a huge multi-
       | national conglomerate doing it at a large scale. I can't even
       | believe we so frequently compare these actions!
        
       | damnesian wrote:
       | seems like the "mis" is missing from the name.
        
       | 2OEH8eoCRo0 wrote:
       | Most of the comments missed the point. It's not that they trained
       | on books, it's that they pirated the books.
        
       | spandrew wrote:
       | Amazon has been doing this since the 2000's. Fun fact: This is
       | how AWS came about; for them to scale its "LOOK INSIDE!" feature
       | for all the books it was hoovering in an attempt to kill the last
       | benefit the bookstore had over them.
       | 
       | Ie. This is not a big deal. The only difference now is ppl are
       | rapidly frothing to be outraged by the mere sniff of new tech on
       | the horizon. Overton window in effect.
        
       | ChrisArchitect wrote:
       | Two week old news.
       | 
       | Some previous discussions:
       | 
       | https://news.ycombinator.com/item?id=44367850
       | 
       | https://news.ycombinator.com/item?id=44381838
       | 
       | https://news.ycombinator.com/item?id=44381639
        
       | burnt-resistor wrote:
       | 1980's: _Johnny No. 5 need input!_
       | 
       | 2020's: (Steals a bunch of books to profit off acquired
       | knowledge.)
        
       | throwawayffffas wrote:
       | The article doesn't say who is suing them. Is it a class action?
       | How many of these 7M pirated books have they written? Is it
       | publishing houses? How many of these books are relevant in this
       | judgement?
        
       | 1vuio0pswjnm7 wrote:
       | Business Insider fails to include the Order
       | 
       | https://ia800101.us.archive.org/15/items/gov.uscourts.cand.4...
        
       ___________________________________________________________________
       (page generated 2025-07-07 23:01 UTC)