[HN Gopher] Anthropic cut up millions of used books, and downloa...
___________________________________________________________________
Anthropic cut up millions of used books, and downloaded 7M pirated
ones - judge
Author : pyman
Score : 365 points
Date : 2025-07-07 09:20 UTC (13 hours ago)
(HTM) web link (www.businessinsider.com)
(TXT) w3m dump (www.businessinsider.com)
| pyman wrote:
| Anthropic's cofounder, Ben Mann, downloaded millions of copies of
| books from Library Genesis in 2021, fully aware that the material
| was pirated.
|
| Stealing is stealing. Let's stop with the double standards.
| damnesian wrote:
| oh well, the product has a cute name and will make someone a
| billionaire, let's just give it the green light. who cares
| about copyright in the age of AI?
| originalvichy wrote:
| At least most pirates just consume for personal use. Profiting
| from piracy is a whole other level beyond just pirating a book.
| pyman wrote:
| Someone on Twitter said: "Oh well, P2P mp3 downloads,
| although illegal, made contributions to the music industry"
|
| That's not what's happening here. People weren't downloading
| music illegally and reselling it on Claude.ai. And while P2P
| networks led to some great tech, there's no solid proof they
| actually improved the music industry.
| drcursor wrote:
| Let's not forget Spotify ;)
|
| https://gizmodo.com/early-spotify-was-built-on-pirated-
| mp3-f...
| pyman wrote:
| Those claims were never proved.
| Imustaskforhelp wrote:
| I really feel as if YouTube is the most convenient option for
| music videos: most people watch the ads, while some people
| use an ad blocker.
|
| I use an ad blocker, and tbh I think many people on HN are
| okay with ad blocking but not with piracy, when basically both
| just keep the creator from earning money.
|
| I believe that if you really like a piece of software, or
| anything else, you should ask the creators what their
| favourite charity is and donate there, or join their Patreon
| or support them some other direct way.
| Workaccount2 wrote:
| If you are someone who can think clearly, it's extremely
| obvious that the conversation around copyright, LLMs,
| piracy, and ad-blocking is
|
| "What serves me personally the best for any given
| situation" for 95% of people.
| timeon wrote:
| I think that critique of this case is not about piracy in
| itself but how these companies are treated by courts vs.
| how individuals are treated.
| mnky9800n wrote:
| I feel like profit was always a central motive of pirates. At
| least from the historical documents known as, "The Pirates of
| the Caribbean".
| KoolKat23 wrote:
| This isn't really profiting from piracy. They don't make
| money off the raw input data. It's no different to consuming
| for personal use.
|
| They make money off the model weights, which is fair use (as
| confirmed by recent case law).
| j_w wrote:
| This is absurd. Remove all of the content from the training
| data that was pirated and what is the quality of the end
| product now?
| pyman wrote:
| With Claude, people are paying Anthropic to access
| answers that are generated from pirated books, without
| the authors' permission, credit, or compensation.
| KoolKat23 wrote:
| There is no copyright on knowledge.
|
| If it outputs parts of the book verbatim then that's a
| different story.
| pyman wrote:
| Let's not change the focus of the debate.
|
| Pirating 7 million books, remixing their content, and
| using that to power Claude.ai is like counterfeiting 7
| million branded products and selling them on your
| personal website. The original creators don't get credit
| or payment, and someone's profiting off their work.
|
| All this happens while authors, many of them teachers,
| are left scratching their heads with four kids to feed.
| KoolKat23 wrote:
| That may be the case, but you'd have to have laws
| changed.
| SirMaster wrote:
| >If it outputs parts of the book verbatim then that's a
| different story.
|
| But it does...
| KoolKat23 wrote:
| That's the law.
|
| Please keep in mind, copyright is intended as a
| compromise between benefit to society and to the
| individual.
|
| A thought experiment: what about students pirating textbooks
| and applying that knowledge later on in their work?
| j_w wrote:
| When you say that's the law, as far as I'm aware a single
| ruling by a lower court has been issued which upholds
| that application. Hardly settled case law.
| KoolKat23 wrote:
| True, until then best to act as if it is the case.
|
| In my opinion, it will be upheld.
|
| Looking at what is stored and the manner in which it is
| stored, it makes sense that it's fair use.
| j_w wrote:
| We're talking about a summary judgement issued that has
| not yet been appealed. That doesn't make it "settled."
|
| If by "what is stored and the manner which it is stored"
| is intended to signal model weights, I'm not sure what
| the argument is? The four fair-use factors in no way
| mention a storage medium for data, lossless or lossy.
|
| (1) the purpose and character of the use, including
| whether such use is of a commercial nature or is for
| nonprofit educational purposes; (2) the nature of the
| copyrighted work; (3) the amount and substantiality of
| the portion used in relation to the copyrighted work as a
| whole; and (4) the effect of the use upon the potential
| market for or value of the copyrighted work.
|
| In my opinion, this will likely see a supreme court
| ruling by the end of the decade.
| KoolKat23 wrote:
| The use is to train an AI model.
|
| A trillion parameter SOTA model is not substantially
| comprised of the one copyrighted piece. (If it was a
| Harry Potter model trained only on Harry Potter books
| this would be a different story).
|
| Embeddings are not copy paste.
|
| The last point, about market impact, would be where they
| make their argument, but it's tenuous. It's not the
| primary use of AI models, and built-in prompts try to
| avoid this, so it shouldn't be commonplace unless you're
| jailbreaking the model, and most folks aren't.
| mrcwinn wrote:
| > At least most pirates just consume for personal use.
|
| Easy for the pirate to say. Artists might argue their intent
| was to trade compensation for one's personal enjoyment of the
| work.
| Workaccount2 wrote:
| The gut punch of being a photographer selling your work on
| display, someone walks by and lines up their phone to take
| a perfect picture of your photograph, and then exclaims to
| you "Your work is beautiful! I can't wait to print this out
| and put it on my wall!"
| jobs_throwaway wrote:
| All the evidence shows that piracy is good for artists'
| business. You make a good work, people are exposed to it
| through piracy, and they end up buying more of your stuff
| than they would otherwise. But keep crying about the
| artist's plight
| SketchySeaBeast wrote:
| The way you've presented this, the evidence is just
| "common sense", which isn't much evidence at all.
| x3n0ph3n3 wrote:
| Copyright infringement is not stealing.
| 1oooqooq wrote:
| actually, the Only time it's an (ethical) crime is when a
| corporation does it at scale for profit.
| pyman wrote:
| Pirating a book and selling it on claude.ai is stealing, both
| legally and morally.
| zb3 wrote:
| Who got robbed? Just because I'd pay for AI it doesn't mean
| I'd buy these books.
| pyman wrote:
| You should ask the teachers who spent years writing those
| books.
| azangru wrote:
| You keep saying the word "teachers"; but that word does
| not appear in the text of the article. Why focus on the
| teachers in particular?
|
| Also, there are various incentives for teachers to
| publish books. Money is just one of them (I wonder how
| much revenue books bring to the teachers). Prestige and
| academic recognition are another. There are probably
| others still. How realistic is the depiction of a
| deprived teacher whose livelihood depended on the books
| he published once every several years?
| zb3 wrote:
| I did not ask them to write those books, and I wouldn't
| buy those.
| BlackFly wrote:
| Making a copy differs from taking an existing object in all
| aspects: literally, technically, legally and ethically.
| Piracy is making a copy you have no legal right to.
| Stealing is taking a physical object that you have no legal
| right to. While the "no legal right to" seems the same
| superficially, in practice the laws differ quite a bit
| because the literal, technical and ethical aspects differ.
| TiredOfLife wrote:
| They are not selling it on claude.ai. If you can prove that
| they are you will be rich.
| thedevilslawyer wrote:
| Where can I download Harry Potter on claude.ai pls?
| slater wrote:
| Why would you want to download a shitty book?
| seydor wrote:
| property infringement isn't either?
| eviks wrote:
| If you infringe by destroying property, then yes, it's not
| stealing
| impossiblefork wrote:
| It's very similar to theft of service.
|
| There are so many texts, and they're so sparse, that if I could
| copyright a work and never publish it, the restriction would
| be irrelevant. The probability that you would accidentally
| come upon something close enough that copyright was relevant
| is almost infinitesimal.
|
| Because of this, copyright is an incredibly weak restriction,
| and the fact that it is as weak as it is shows clearly that any
| use of a copyrighted work is due to the convenience of its
| being available.
|
| That is, it's about making use of the work somebody else has
| done, not about that restricting you somehow.
|
| Therefore copyright is much more legitimate than ordinary
| property. Ordinary property, especially ownership of land,
| can actually limit other people. But since copyright is so
| sparse, infringing on it is like going to a world with
| near-infinite space and picking the precise place where somebody
| has planted a field and deciding to harvest from that
| particular field.
|
| Consequently I think copyright infringement might actually be
| worse than stealing.
| jpalawaga wrote:
| you've created a very obvious category mistake in your
| final summary by confusing intellectual property--which can
| be copied at no penalty to an owner (except nebulous
| 'alternate universe' theories)--with actual property, and a
| farmer and his land, with a crop that cannot be enjoyed
| twice.
|
| you're saying copying a book is worse than robbing a farmer
| of his food and/or livelihood, which cannot be replaced or
| duplicated. Meanwhile, someone who copies a book does not
| deprive the author of selling the book again (unlike the
| tasty proceeds of a harvest).
|
| I can't say I agree, for obvious reasons.
| impossiblefork wrote:
| With this special infinite-land-land though, what's
| special about the farmer's land is that he's expended
| energy to make it that way, just as the author has
| expended energy to find his text.
|
| Just as the farmer obtains his livelihood from the
| investment-of-energy-to-raise-crops-to-energy cycle the
| author has his livelihood by the investment-of-energy-to-
| finding-a-useful-work-to-energy cycle.
|
| So he is in fact robbed in a very similar way.
| jpalawaga wrote:
| You're saying that a copy of a digital thing is the same
| as the "only" of a physical thing. But that's not true.
| You can't sell grain twice, but you can sell a movie many
| times (especially when you account for format changes,
| remasterings, platform locks, licensing for special
| use cases like remixing, broadcasts, etc).
|
| You'd have to steal the author's ownership of the
| intellectual property in order for the comparison to be
| valid, just as you stole ownership of his crop.
|
| Separately, there is a reason why theft and copyright
| infringement are two distinct concepts in law.
| impossiblefork wrote:
| The difference here though is that the copyright holder
| sustains himself by the sales of his particular chosen
| text, so it doesn't matter that the text can be
| reproduced infinitely.
| CaptainFever wrote:
| > Consequently I think copyright infringement might
| actually be worse than stealing.
|
| I remember when piracy wasn't theft, and information wanted
| to be free.
| impossiblefork wrote:
| So do I, then I found this reasoning I presented in my
| comment and realised that piracy was actually quite bad.
|
| Ordinary property is much worse than copyright: it is
| neither time limited nor necessarily obtained through work,
| and it is much more limited in availability than the number
| of possible sequences.
|
| When someone owns land, that's actually a place you can
| stumble upon and not be allowed to enter, whereas you're
| never going to stumble upon the story of even 'Nasse hittar
| en stol' (Swedish for 'Nasse finds a chair'), a very short
| book for very small children.
| Der_Einzige wrote:
| Information wants to be free.
| troyvit wrote:
| Then why does Claude cost money?
| dathinab wrote:
| stealing with the intent to gain an unfair market advantage so
| that you can effectively kill every company acting ethically and
| legally, in a way which is very likely going to hurt many
| authors through the products you create, is far worse than just
| stealing for personal use
|
| that isn't "just" stealing, it's organized crime
| 1970-01-01 wrote:
| Let's get actual definitions of 'theft' before we leap into
| double standards.
| NoMoreNicksLeft wrote:
| >Stealing is stealing.
|
| Yes, but copying isn't stealing, because the person you "take"
| from still has their copy.
|
| If you're allowed to call copying _stealing_, then I should be
| allowed to call hysterical copyright rabble-rousing _rape_. Quit
| being a rapist, pyman.
| kube-system wrote:
| > Stealing is stealing. Let's stop with the double standards.
|
| I get the sentiment, but that statement, as is, is absurdly
| reductive. Details matter. Even if someone takes merchandise
| from a store without paying, their sentence will vary depending
| on the details.
| neo__ wrote:
| Hopefully they were all good books at least.
| pyman wrote:
| they pirated the best ones, according to the authors
| pyman wrote:
| These are the people shaping the future of AI? What happened to
| all the ethical values they love to preach about?
|
| We've held China accountable for counterfeiting products for
| decades and regulated their exports. So why should Anthropic be
| allowed to export their products and services after engaging in
| the same illegal activity?
| lofaszvanitt wrote:
| This is the underlying caste system coming to life right before
| your eyes :D.
| stephenitis wrote:
| I think caste system is the wrong analogy here.
|
| The comment is more about the pseudo-ethical high ground.
| MangoToupe wrote:
| Companies being above the law does create a stratified
| system in this country for those who can benefit from said
| companies and those who cannot. Call it what you like.
| seydor wrote:
| break things and move fast
| benjiro wrote:
| One rule for you, one rule for me ...
|
| Have you never noticed the hypocritical behavior all over society?
|
| * Oh, you drive drunk: big fine, lots of trouble.
| * Oh, you drive drunk and are a senator, cop, mayor, ... Well,
| let's look the other way.
|
| * You have anger management issues and slam somebody to the
| ground: jail time.
| * You as a cop have anger management issues and slam somebody
| to the ground: well, paid time off while we investigate and
| maybe a reprimand. Qualified immunity, boy!
|
| * You commit tax fraud for 10k: felony record, maybe jail time.
| * You as an exec of a company commit tax fraud for 100 million:
| after 10 years of lawyering around, maybe you get something,
| maybe ... oh, here is a fine of 5 million.
|
| I am sorry, but the idea of everybody being equal under the law
| has always been an illusion.
|
| We are holding China accountable for counterfeiting products
| because it hurts OUR companies and their income. But when it's
| "us vs us", well, then it becomes a bit more messy, and in
| general those with the biggest backing (as in $$$, economic
| value, and lawyers) tend to win.
|
| Wait, if somebody steals my book, I can sue that person in
| court and get a payout (lawyers will cost me more, but that is
| not the point). If some AI company steals my book, well, the
| chance you win is close to 1%, simply because lots of well-paid
| lawyers will make winning hard to impossible.
|
| Our society has always been based upon power, wealth and
| influence. The more you have of it, the more you get away with
| (or get reduced penalties for) things that get others fined or
| jailed.
| ffsm8 wrote:
| > We've held China accountable for counterfeiting products for
| decades and regulated their exports
|
| We have? Are we from different multi-verses?
|
| The one I've lived in to date has not done anything against
| Chinese counterfeits beyond occasionally seizing counterfeit
| goods during import. But that's merely occasionally enforcing
| local counterfeit law, a far cry from punishing the entity
| producing it.
|
| As a matter of fact, the companies started outsourcing
| _everything_ to China, making further IP theft and quasi-copies
| even easier
| Workaccount2 wrote:
| I was gonna say, the enforcement is so weak that it's not
| even really worth it to pursue consumer hardware here in the
| US. Make a product that is a hit, patent it, and still 1 month
| later IYTUOP will be selling an identical copy for 1/3rd the
| price on Amazon.
| delfinom wrote:
| Patent enforcement requires the patent holder to go after
| violators. The sad thing is, there are grounds to sue
| Amazon for facilitating it, just nobody has had the money to do
| it. And no big company ever will because of the threat of
| being locked out of AWS.
|
| It's quite the mafia operation over at Amazon.
| wmf wrote:
| The unethical ones didn't buy any books.
| bmitc wrote:
| Silicon Valley has always been the antithesis of ethics. Its
| foundations are much more right wing and libertarian, along
| extremist lines.
| carlosjobim wrote:
| Why is it unethical of them to use the information in all these
| books? They are clearly not reselling the books in any way,
| shape, or form. The information itself in a book can never be
| copyrighted. You can also publish and sell material where you
| quote other books within it.
| aaron695 wrote:
| Good, this is what Aaron Swartz was fighting for.
|
| Against companies like Elsevier locking up the world's knowledge.
|
| Authors are no different to scientists, many had government
| funding at one point, and it's the publishing companies that got
| most of the sales.
|
| You can disagree and think Aaron Swartz was evil, but you can't
| have both.
|
| You can take what Anthropic have shown you is possible and do this
| yourself now.
|
| isohunt: freedom of information
| ramon156 wrote:
| Pirate and pay the fine is probably hell of a lot cheaper than
| individually buying all these books. I'm not saying this is
| justified, but what would you have done in their situation?
|
| Saying "they have the money" is not an argument. It's about the
| amount of effort that is needed to individually buy, scan, and
| process millions of pages. If that's done for you, why redo it
| all?
| TimorousBestie wrote:
| 150K per work is the maximum fine for willful infringement
| (which this is).
|
| 105B+ is more than Anthropic is worth on paper.
|
| Of course they're not going to be charged to the fullest extent
| of the law, they're not a teenager running Napster in the early
| 2000s.
| voxic11 wrote:
| Even if they don't qualify for willful infringement damages
| (let's say they have a good faith belief their infringement
| was covered by fair use) the standard statutory damages for
| copyright infringement are $750-$30,000 per work.
| eikenberry wrote:
| Plus they did it with a profit motive which would entail
| criminal proceedings.
| dragonwriter wrote:
| > 150K per work is the maximum fine for willful infringement
|
| No, it's not.
|
| It's the maximum _statutory damages_ for willful
| infringement, which this _has not_ been adjudicated to be. It
| is not a fine; it's an alternative basis of recovery to
| actual damages + the infringer's profits attributable to the
| infringement.
|
| Of course, there's also a very wide range of statutory
| damages, the minimum (if it is not "innocent" infringement)
| is $750/work.
|
| > 105B+ is more than Anthropic is worth on paper.
|
| The actual amount of 7 million works times $150,000/work is
| $1.05 trillion, not $105 billion.
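A rough sketch of the statutory-damages arithmetic discussed above, assuming the 7 million figure from the headline and the per-work amounts quoted in this thread ($750 minimum, $30,000 standard maximum, $150,000 willful maximum). It only multiplies floors and ceilings; it hypothetically assumes every work qualifies and the same figure applies to each, which a court would not simply take as given.

```python
# Rough illustration only: multiplies the per-work statutory damages figures
# quoted in this thread by the 7M works from the headline. Assumes
# (hypothetically) that every work qualifies and gets the same per-work amount.
NUM_WORKS = 7_000_000

PER_WORK_USD = {
    "statutory minimum": 750,
    "standard maximum": 30_000,
    "willful maximum": 150_000,
}


def total_exposure(num_works: int, per_work_usd: int) -> int:
    """Return the aggregate amount if one per-work figure applied to every work."""
    return num_works * per_work_usd


if __name__ == "__main__":
    for label, amount in PER_WORK_USD.items():
        # Thousands separators make the billion/trillion distinction easy to see.
        print(f"{label:>18}: ${total_exposure(NUM_WORKS, amount):,}")
```

This works out to roughly $5.25 billion, $210 billion and $1.05 trillion, consistent with the correction that the willful ceiling is about a trillion dollars rather than $105 billion.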
| TimorousBestie wrote:
| > It's the maximum statutory damages for willful
| infringement, which this has not be adjudicated to be. it
| is not a fine, its an alternative to basis of recovery to
| actual damages + infringers profits attributable to the
| infringement.
|
| Yeah, you're probably right, I'm not a lawyer. The point is
| that it doesn't matter what number the law says they should
| pay, Anthropic can afford real lawyers and will therefore
| only pay a pittance, if anything.
|
| I'm old enough to remember what the feds did to Aaron
| Swartz, and I don't see what Anthropic did that was so
| different, ethically speaking.
| pyman wrote:
| The problem with this thinking is that hundreds of thousands of
| teachers who spent years writing great, useful books and
| sharing knowledge and wisdom probably won't sue a billion
| dollar company for stealing their work. What they'll likely do
| is stop writing altogether.
|
| I'm against Anthropic stealing teachers' work and discouraging
| them from ever writing again. Some teachers are already saying
| this (though probably not in California).
| lofaszvanitt wrote:
| They won't be needed anymore, once singularity is reached.
| This might be their thought process. This also exemplifies
| that the loathed caste system found in India is indeed in
| place in western societies.
|
| There is no equality, and seemingly there are worker bees who
| can be exploited, and there are privileged ones, and of
| course there are the queens.
| pyman wrote:
| :D
|
| Note: My definition of singularity isn't the one they use
| in San Francisco. It's the moment founders who stole the
| life's work of thousands of teachers finally go to prison,
| and their datacentres get seized.
| lofaszvanitt wrote:
| You can bet that this is never gonna happen...
| covercash wrote:
| When the rich and powerful face zero consequences for
| breaking laws and ignoring the social contracts that keep
| our society functioning, you wind up with extreme
| overcorrections. See Luigi.
| achierius wrote:
| How extreme is that, really? Not to justify murder: that
| is clearly bad. But "killing one man" is evidently
| something we, as a society, consider an "acceptable side-
| effect" when a corporation does it -- hell, you can kill
| thousands and get away scot-free if you're big enough.
|
| Luigi was peanuts in comparison.
|
| "THERE were two "Reigns of Terror," if we would but
| remember it and consider it; the one wrought murder in
| hot passion, the other in heartless cold blood; the one
| lasted mere months, the other had lasted a thousand
| years; the one inflicted death upon ten thousand persons,
| the other upon a hundred millions; but our shudders are
| all for the "horrors" of the minor Terror, the momentary
| Terror, so to speak; whereas, what is the horror of swift
| death by the axe, compared with lifelong death from
| hunger, cold, insult, cruelty, and heart-break? What is
| swift death by lightning compared with death by slow fire
| at the stake? A city cemetery could contain the coffins
| filled by that brief Terror which we have all been so
| diligently taught to shiver at and mourn over; but all
| France could hardly contain the coffins filled by that
| older and real Terror--that unspeakably bitter and awful
| Terror which none of us has been taught to see in its
| vastness or pity as it deserves."
|
| - Mark Twain
| SketchySeaBeast wrote:
| > They won't be needed anymore, once singularity is
| reached.
|
| And it just so happens that that belief says they can burn
| whatever they want down because something in the future
| might happen that absolves them of those crimes.
| CuriouslyC wrote:
| If you care so little about writing that AI puts you off it,
| TBH you're probably not a great writer anyhow.
|
| Writers that have an authentic human voice and help people
| think about things in a new way will be fine for a while yet.
| 4b11b4 wrote:
| Yeah, people will still want to write. They might need new
| ways to monetize it... that being said, even if people
| still want to write they may not consider it a viable path.
| Again, have to consider other monetization.
| glimshe wrote:
| That will be sad, although there will still be plenty of
| great people who will write books anyway.
|
| When it comes to a lot of these teachers, I'll say, copyright
| works hand in hand with college and school course book
| mandates. I've seen _plenty_ of teachers making crazy money
| off students' backs due to these mandates.
|
| A lot of the content taught in undergrad and school hasn't
| changed in decades or even centuries. I think we have all the
| books we'll ever need in certain subjects already, but
| copyright keeps enriching people who write new versions of
| these.
| NoMoreNicksLeft wrote:
| Stealing? In what way?
|
| Training a generative model on a book is the mechanical
| equivalent of having a human read the book and learn from it.
| Is it stealing if a person reads the book and learns from it?
| blocko wrote:
| Depends on how closely that person can reproduce the
| original work without license or attribution
| lcnPylGDnU4H9OF wrote:
| It actually depends on whether or not they reproduce it
| and especially what they do with the copy after making
| it.
| js8 wrote:
| > The problem with this thinking is that hundreds of
| thousands of teachers who spent years writing great, useful
| books and sharing knowledge and wisdom probably won't sue a
| billion dollar company for stealing their work. What they'll
| likely do is stop writing altogether.
|
| I think this is a fantasy. My father cowrote a Springer book
| about physics. For the effort, he got like $400 and 6 author
| copies.
|
| Now, you might say he got a bad deal (or the book was bad),
| but I don't think hundreds of thousands of authors do
| significantly better. The reality is, people overwhelmingly
| write because they want to, not because of money.
| glimshe wrote:
| Isn't "pirating" a felony with jail time, though? That's what I
| remember from the FBI warning I had to see at the beginning of
| every DVD I bought (but not "pirated" ones).
| voxic11 wrote:
| Yes criminal copyright infringement (willful copyright
| infringement done for commercial gain or at a large scale) is
| a felony.
| kevingadd wrote:
| Google did it the legal way with Google Books, didn't they?
| maeln wrote:
| If you wanted to be legit with 0 chance of going to court, you
| would contact publishers and ask to pay for a license to access
| their catalog for training, and negotiate from that point.
|
| This is what every company using media does (think
| Spotify, Netflix, but also journals, ad agencies, ...). I don't
| know why people in HN are giving a pass to AI company for this
| kind of behavior.
| ohashi wrote:
| Because they are mostly software developers who think it's
| different because it impacts them.
| CaptainFever wrote:
| > I don't know why people in HN are giving a pass to AI
| company for this kind of behavior.
|
| As mentioned in The Fucking Article, there's a legal
| difference between training an AI which largely doesn't
| repeat things verbatim (ala Anthropic) and redistributing
| media as a whole (ala Spotify, Netflix, journal, ad agency).
| suyjuris wrote:
| Just downloading them is of course cheaper, but it is worth
| pointing out that, as the article states, they did also buy
| legitimate copies of millions of books. (This includes all the
| books involved in the lawsuit.) Based on the judgement itself,
| Anthropic appears to train only on the books legitimately
| acquired. Used books are quite cheap, after all, and can be
| bought in bulk.
| asadotzler wrote:
| Buying a book is not license to re-sell that content for your
| own profit. I can't buy a copy of your book, make a million
| Xeroxes of it and sell those. The license you get when you
| buy a book is for a single use, not a license to do what ever
| you want with the contents of that book.
| thedevilslawyer wrote:
| What are you on about - the judge has literally said this
| was not resell, and is transformative and fair use.
| suyjuris wrote:
| Yes, of course! In this case, the judge identified three
| separate instances of copying: (1) downloading books
| without authorisation to add to their internal library, (2)
| scanning legitimately purchased books to add to their
| internal library, and (3) taking data from their internal
| library for the purposes of training LLMs. The purchasing
| part is only relevant for (2) -- there the judge ruled that
| this is fair use. This makes a lot of sense to me, since no
| additional copies were created (they destroyed the physical
| books after scanning), so this is just a single use, as you
| say. The judge also ruled that (3) is fair use, but for a
| different reason. (They declined to decide whether (1) is
| fair use at this point, deferring to a later trial.)
| darkoob12 wrote:
| This is not about paying for a single copy. It would still be
| wrong even if they have bought every single one of those books.
| It is a form of plagiarism. The model will use someone else's
| idea without proper attribution.
| jeroenhd wrote:
| Legally speaking, we don't know that yet. Early signs are
| pointing at judges allowing this kind of crap because it's
| almost impossible for most authors to point out what part of
| the generated slop was originally theirs.
| tmaly wrote:
| At minimum they should have to buy the book they are deriving
| weights from.
| SirMaster wrote:
| But should the purchase be like a personal license? Or like a
| commercial license that costs way more?
|
| Because, for example, if you buy a movie on disc, that's a
| personal license and you can watch it yourself at home. But
| you can't play it at a large public venue that sells
| tickets to watch it. You need a different and more expensive
| license to make money off the usage of the content in a
| larger capacity like that.
| bmitc wrote:
| > I'm not saying this is justified, but what would you have
| done in their situation?
|
| Individuals would have their lives ruined either from massive
| fines or jail time.
| blibble wrote:
| > Pirate and pay the fine is probably hell of a lot cheaper
| than individually buying all these books.
|
| $500,000 per infringement...
| jandrese wrote:
| And the crazy thing is that might be cheaper when you
| consider the alternative is to have your lawyers negotiate
| with the lawyers for the publishing companies for the right
| to use the works as training data. Not only is it many many
| billable hours just to draw up the contract, but you can be
| sure that many companies would either not play ball or set
| extremely high rates. Finally, if the publishing companies
| did bring a suit against Anthropic they might be asked to
| prove each case of infringement, basically to show that a
| specific work was used in training, which might be difficult
| since you can't reverse a model to get the inputs. When
| you're a billion dollar company it's much easier to get the
| courts to take your side. This isn't like the music companies
| suing teenagers who had a Kazaa account.
| tliltocatl wrote:
| If the AI movement manages to undermine Imaginary Property,
| it would redeem its externalities threefold.
| 57473m3n7Fur7h3 wrote:
| I don't think that's gonna happen. I think they will manage to
| get themselves out of trouble for it, while the rest of us will
| still face serious problems if we are caught torrenting even
| one singular little book.
| tliltocatl wrote:
| Even so, would be hard to prove that this particular little
| book wasn't generated by Claude (oopsie, it happens to be a
| verbatim copy of a copyrighted work, that happens sometimes,
| those pesky LLMs).
| pyman wrote:
| You just need to audit their system. Shouldn't take more
| than a couple of hours.
| 2OEH8eoCRo0 wrote:
| The Ocean Full of Bowling Balls
| CaptainFever wrote:
| It's already quite widespread and likely legal for average
| people to train AI models on copyrighted material, in the
| open weight AI communities like SD and LocalLLaMa.
|
| Please, please differentiate between pirating books (which
| Anthropic is liable for, and is still illegal) and training
| on copyrighted material (which was found to be legal, for
| both corporations and average people).
| ttoinou wrote:
| It would be great, but I think some are worried that new AI
| BigTech will find a way to continue enforcing IP on the rest of
| society while it won't exist for them
| Imustaskforhelp wrote:
| I think that we are worried because I think that's exactly
| what's going to happen, or already is happening.
| karel-3d wrote:
| That would render GPL and friends redundant too... copyleft
| depends on copyright.
| CaptainFever wrote:
| Copyleft nullifies copyright. Abolishing copyright and adding
| right to repair laws (mandatory source files) would give the
| same effect as everyone using copylefted licenses.
| bayindirh wrote:
| What are your feelings about how the small fish are stripped of
| their art, and their years of work become just a prompt?
| Mainly comic artists and small musicians who are doing things
| they like and putting out for people, but not for much money?
| tliltocatl wrote:
| "But think about the children". The copyright system is doing
| too much damage to culture and society. Yes, it does provide
| a pond for some small fish, but the overall damage outweighs
| this. Like, the fact that the first estate provided a sustainable
| base for arts and crafts to flourish doesn't make the ancien
| régime any less screwed up.
| bayindirh wrote:
| I think I have worded my question wrong. I wasn't asking
| about how AI affects the financials of these smaller
| artists, but about their wellbeing in general.
|
| There are many small artists who do this not for money, but
| for fun and have their renowned styles. Even their styles
| are ripped off by these generative AI companies and turned
| into a slot machine to earn money for themselves. These
| artists didn't consent to that, and this affects their
| (mental) well-beings.
|
| With that context in mind, what do you think about these
| people, who are not in this for money, having their years
| of achievement ripped away and their hard work exploited for
| money by generative AI companies?
|
| It's not about IP (with whatever expansion you prefer) or
| laws, but ethics in general.
|
| Substitute comics for any medium. Code, music, painting,
| illustration, literature, short movies, etc.
| CamperBob2 wrote:
| (Shrug) If you want things to stay the same, both art and
| technology are bad career choices.
| bayindirh wrote:
| (Huh) What if you are in the field to advance it, and
| somebody steals your work and claims it as their own?
|
| e.g.: https://news.ycombinator.com/item?id=44460552
| CamperBob2 wrote:
| Bummer
| tliltocatl wrote:
| I see your point, "AI art" sucks in general and this is
| ethically sketchy as hell, but AFAIK style copying has
| never been covered by copyright in the first place. Yeah,
| it sucks to be alienated from your works. That's one of
| the externalites I mentioned in the original comment. But
| there is simply no remedy there. That's how the reality
| is.
| bayindirh wrote:
| Thanks for your answer, and for taking the time to write
| it!
|
| Yes, style copying is generally considered legal, but as
| another commenter posted in a related thread "scale
| matters".
|
| Maybe this will be reconsidered in the near future, as the
| scale is on a very different level with Generative
| AI. While there can be no technological solution to this
| (since it's a social problem to begin with), maybe public
| opinion about this issue will evolve over time.
|
| To be crystal clear: I'm not against the tech. I'm
| against abusing and exploiting people for solely monetary
| profit.
| frozenseven wrote:
| (1) You can't copyright an art style. That's not a thing.
|
| (2) Once you make something publicly available, anyone
| can learn from it. No consent necessary.
|
| (3) Being upset does not grant you special privileges
| under the law.
|
| (4) If you don't like the idea of paying for AI art, free
| software is both plentiful and competitive with just
| about anything proprietary.
| protocolture wrote:
| >Mainly comic artists and small musicians who are doing
| things they like and putting out for people, but not for much
| money?
|
| The number of these artists I have seen receiving some bogus
| DMCA takedown notice for fan art is crazy.
|
| I saw a bloke give away some of his STLs because he received
| a takedown request from Games Workshop and didn't have the
| funds to fight it.
|
| It's not that I want small artists to lose, it's that I want
| them to gain access to every bloody copyright and trademark
| so they are more free to create.
|
| Shit, Condé Nast managed to pull something like 400 pulps off
| the market, so they didn't interfere with their newly launched
| James Patterson collaborations.
| pxc wrote:
| It's true that intellectual property is a flawed and harmful
| mechanism for supporting creative work, and it needs to change,
| but I don't think ensuring a positive outcome is as simple as
| this. Whether or not such a power struggle between corporate
| interests benefits the public rather than just some companies
| will be largely accidental.
|
| I do support intellectual property reform that would be
| considered radical by some, as I imagine you do. But my highest
| hopes for this situation are more modest: if AI companies are
| told that their data must be in the public domain to train
| against, we will finally have a powerful faction among
| capitalists with a strong incentive to push back against the
| copyright monopolists when it comes to the continuous renewal
| of copyright terms.
|
| If the "path of least resistance" for companies like Google,
| Microsoft, and Meta becomes enlarging the public domain, we
| might finally begin to address the stagnation of the public
| domain, and that could be a good thing.
|
| But I think even such a modest hope as that one is unlikely to
| be realized. :-\
| Der_Einzige wrote:
| Yup.
|
| My response to this whole thread is just "good"
|
| Aaron Swartz is a saint and a martyr.
| LtWorf wrote:
| It will undermine it only for the rich owner of AI companies,
| not for everyone.
| Lionga wrote:
| Based on the fact people went to jail for downloading some music
| or movies, this guy will face a lifetime in prison for 7 million
| books that he then used for commercial profit right?
|
| Right guys we don't have rules for thee but not for me in the
| land of the free?
| 1oooqooq wrote:
| Aaron Swartz rolling
| pyman wrote:
| He downloaded millions of academic articles and the government
| charged him with multiple felonies.
|
| The difference is, Aaron Swartz wasn't planning to build
| massive datacenters with expensive Nvidia servers all over the
| world.
| mikewarot wrote:
| >the government charged him with multiple felonies.
|
| This was the result of a cruel and zealous overreach by the
| prosecutor to try to advance her political career. It should
| never have gone that far.
|
| The failure of MIT to rally in support of Aaron will never be
| forgiven.
| pyman wrote:
| I agree
| omnimus wrote:
| It's even worse considering all he downloaded was in the public
| domain, so it was much less problematic as far as copyright goes.
|
| Lesson is simple. If you want to break a law make sure it is
| very profitable because then you can find investors and get
| away with it. If you play robin hood you will be met with a
| hammer.
| dandanua wrote:
| Meta did the same, and probably other big companies too. People who praise
| AGI are very short-sighted. It will ruin the world with our
| current morals and ethics. It's like a nuclear weapon in the
| hands of barbarians (shit, we have that too, actually).
| booleandilemma wrote:
| So if I'm working on an LLM can I just steal millions of
| copyrighted books? Is that how our farcical justice system works?
| famahar wrote:
| Make sure you have a few billion dollars ready so you can pay a
| few million on the lawsuits. A volcano getting a cup of water
| poured into it.
| marapuru wrote:
| Apparently it's a common business practice. Spotify (even though
| I can't find any proof) seems to have built their software and
| business on pirated music. There is some more in this article
| [0].
|
| [0] https://torrentfreak.com/spotifys-beta-used-pirate-mp3-files...
|
| Funky quote:
|
| > Rumors that early versions of Spotify used 'pirate' MP3s have
| been floating around the Internet for years. People who had
| access to the service in the beginning later reported downloading
| tracks that contained 'Scene' labeling, tags, and formats, which
| are the tell-tale signs that content hadn't been obtained
| officially.
| motbus3 wrote:
| They had a second company (whose name I don't remember)
| that allowed users to backup and share their music. When they
| were exposed they buried that as deep as they could
| pyman wrote:
| No. There's no credible evidence Spotify had any secret
| second company that allowed users to back up and share music
| without authorisation
| pyman wrote:
| It was the opposite. Their mission was to combat music piracy
| by offering a better, legal alternative.
|
| Daniel Ek said: "my mission is to make music accessible and
| legal to everyone, while ensuring artists and rights holders
| got paid"
|
| Also, the Swedish government has zero tolerance for piracy.
| pyman wrote:
| I know this might come as a shock to those living in San
| Francisco, but things are different in other parts of the
| world, like Uruguay, Sweden and the rest of Europe. From what
| I've read, the European committee actually cares about
| enforcing the law.
| eviks wrote:
| Mission is just words, they can _mean_ the opposite of deeds,
| but they can't _be_ the opposite, they live in different
| realms.
| KoolKat23 wrote:
| There are plenty of startups that have gone legitimate.
|
| Society underestimates the chasm that exists between an idea
| and raising sufficient capital to act on those ideas.
|
| Plenty of people have ideas.
|
| We only really see those that successfully cross it.
|
| Small things, like EULA breaches or consumer licenses being used
| commercially, for example.
| pyman wrote:
| There's no credible evidence Spotify built their company and
| business on pirated music.
|
| This is a narrative that gets passed around in certain
| circles to justify stealing content.
| YPPH wrote:
| "Stealing" isn't an apt term here. Stealing a thing
| permanently deprives the owner of the thing. What you're
| describing is copyright infringement, not stealing.
|
| In this context, stealing is often used as a pejorative
| term to make piracy sound worse than it is. Except for mass
| distribution, piracy is often regarded as a civil wrong,
| and not a crime.
| KoolKat23 wrote:
| Best/most succinct explanation I've seen to date.
| bumby wrote:
| I think you make a good point, but there is some irony in
| pointing out the distinction between colloquial and legal
| use of the term "stealing" while also misusing the term
| "piracy" to describe legal matters.
|
| It would be more clear if you stick to either legal or
| colloquial variants, instead of switching back and forth.
| (Tbf, the judge in this case also used the term "piracy"
| colloquially).
| seadan83 wrote:
| Then it is not possible to 'steal' an idea? AFAIK 'to
| steal' is simply to take without permission. If the thing
| is abstract, then you might not have deprived the
| original owner of that thing. If the thing is a physical
| object, then the implication is you now have physical
| possession (in which case your definition seemingly
| holds)
|
| _edit/addendum_: considering this a bit more - the
| extent to which the original party is deprived of the
| stolen thing is pertinent for awarding damages. For
| example, imagine a small entity stealing from a large
| one, like a small creator stealing Dungeons & Dragons
| rules. That doesn't deprive Hasbro of DnD, but it is
| still theft (we're assuming a verbatim copy here lifted
| directly from DnD books)
|
| The example that I was pondering was shows in Russia
| that were almost literally "The Simpsons." Did that stop
| The Simpsons from airing in the US, its primary market?
| No, but it was still theft, something was taken without
| permission.
| lmm wrote:
| > There's no credible evidence Spotify built their company
| and business on pirated music.
|
| That's a statement carefully crafted to be impossible to
| disprove. Of course they shipped pirated music (I've seen
| the files). Of course anyone paying attention knew. Nothing
| in the music industry was "clean" in those days. But, sure,
| no credible evidence because any evidence anyone shows you
| you'll decide is not credible. It's not in anyone's
| interests to say anything and none of it matters.
| hinterlands wrote:
| The problem is that these "small things" are not necessarily
| small if you're an individual.
|
| If you're an individual pirating software or media, then from
| the rights owners' perspective, the most rational thing to do
| is to make an example of you. It doesn't happen every day, but
| it _does_ happen and it can destroy lives.
|
| If you're a corporation doing the same, the calculation is
| different. If you're small but growing, future revenues are
| worth more than the money that can be extracted out of you
| right now, so you might get a legal nastygram _with an offer
| of a reasonable payment to bring you into compliance_. And if
| you're already big enough to be scary, litigation might be
| just too expensive to the other side even if you answer the
| letter with "lol, get lost".
|
| Even in the worst case - if Anthropic loses and the company
| is fined or even shuttered (unlikely) - the people who
| participated in it are not going to be personally liable and
| they've in all likelihood already profited immensely.
| KoolKat23 wrote:
| I agree, that was the point I was trying to make. It seems
| small but until the business is up and running at
| sufficient scale, the costs can be insurmountable.
|
| And the system set up by society doesn't truly account for
| this or care.
| dathinab wrote:
| but it's not some small things
|
| but systematic, widespread big things, and often many of them,
| giving US giants an unfair competitive advantage
|
| and don't think that if you are an EU company you can do the same
| in the US, nope
|
| but naturally the US insists that US companies can do that in
| the EU and complains every time a US company is fined for not
| complying with EU law
| Barrin92 wrote:
| >Society underestimates the chasm that exists between an idea
| and raising sufficient capital to act on those ideas.
|
| The AI sector, famously known for its inability to raise
| funding. Anthropic has in the last four years raised 17
| billion dollars
| KoolKat23 wrote:
| Only once ChatGPT 3.5 was released...
|
| Other industries do not have it this easy.
| jowea wrote:
| Uber
| pjc50 wrote:
| "recording obtained unofficially" and "doesn't have rights to
| the recording" are separate things. So they could well have got
| a license to stream a publisher's music but that didn't come
| with an actual copy of some/all of the music.
| techjamie wrote:
| Crunchyroll was originally an anime piracy site that went legit
| and started actually licensing content later. They started in
| mid-2006, got VC funding in 2008, then made their first
| licensing deal in 2009.
|
| https://www.forbes.com/2009/08/04/online-anime-video-technol...
|
| https://venturebeat.com/business/crunchyroll-for-pirated-ani...
| haiku2077 wrote:
| Good Old Games started out with the founders selling pirated
| games on disc at local markets.
| techjamie wrote:
| Pirated games translated to Polish if possible, because
| game devs weren't catering to the market with translations,
| and Poland didn't respect foreign copyright.
| Cyph0n wrote:
| Yep, they were huge too - virtually anyone who watched free
| anime back then would have known about them.
|
| My theory is that once they saw how much traffic they were
| getting, they realized how big of a market (subbed/dubbed)
| anime was.
| Shank wrote:
| And now Crunchyroll is owned by (through a lot of companies,
| like Aniplex of America, Aniplex, A-1 Pictures) Sony, who
| produces a large amount of anime!
| dathinab wrote:
| not just Spotify, pretty much any (most?) current tech giant was
| built by
|
| - riding a wave of change
|
| - not caring too much about legal constraints (or, as they
| would say now, "disrupting" the market, which very, very often
| means doing illegal shit that brings them far more money than
| any penalties they will ever face from it)
|
| - or caring about ethics too much
|
| - and in recent years (starting with Amazon) a lot of
| technically illegal financing (technically, undercutting
| competitors' prices long term with money from elsewhere
| (e.g. investors) is an unfair competitive advantage
| (theoretically) clearly not allowed by anti-monopoly laws). And
| before that you often still had other monopoly issues (e.g. see
| Wintel)
|
| So yes, systematically not complying with the law to get an unfair
| competitive advantage, knowing that many of the laws are, in the
| larger picture, toothless when applied to huge companies, is the
| bread and butter work of US tech giants
| benced wrote:
| As you point out, they mostly did this before they were large
| companies (where the public choice questions are less
| problematic). Seems like the breaking of these laws was good
| for everybody.
| FirmwareBurner wrote:
| _> Seems like the breaking of these laws was good for
| everybody._
|
| Are all music creators better off now than before Spotify?
| megaman821 wrote:
| The music pie is bigger now but it is split between more
| people. Spotify brings in the most revenue for musicians
| as a whole.
| oblio wrote:
| Is that why the biggest source of income for musicians
| these days are live shows? Streaming basically killed
| recording income for 99.9999% of musicians.
| dathinab wrote:
| Yeah, but like another post said it killed a lot of other
| income streams.
|
| And Spotify is a bad example, as it ran into another
| pseudo-monopoly with very unreasonable/unhealthy power (the
| few large music labels holding rights to the majority of
| mainstream music).
|
| They pretty much forced very bad terms onto Spotify which
| is to some degree why Spotify is pushing podcasts, as
| they can't be long term profitable with Music (raising
| prices doesn't help if the issue is a percent cut which
| rises too :/ )
| dathinab wrote:
| they were already big when they systematically broke these
| laws
|
| breaking these laws is what lifted them from big to super
| market-dominant, to a point where they have monopoly-like
| power
|
| that is _never_ good for everyone, or even good for the
| majority long term
|
| what is good for everyone (except a few rich people and
| sometimes the US government) is proper fair competition. It
| drives down prices and allows people to vote with their
| money; it is a cornerstone of the American dream, it
| pushes innovation and makes sure a country isn't left
| behind. Monopoly-like companies, on the other hand, tend to
| have exactly the opposite effect: higher prices (long term),
| corruption, stagnating innovation, and a completely
| shattered American dream, which sounds pretty bad for the
| majority of Americans.
| Workaccount2 wrote:
| The common meme that megacorps are shamelessly criminal
| organizations that get away with doing anything they can to
| maximize profits, while true in some regard, _totally pales in
| comparison to the illegal things small businesses and start-ups
| do_.
| reaperducer wrote:
| _Apparently it's a common business practice._
|
| It's not a common business practice. That's why it's considered
| newsworthy.
|
| People on the internet have forgotten that the news doesn't
| report everyday, normal, common things, or it would be nothing
| but a listing of people mowing their lawns or applying for
| business loans. The reason something is in the news is because
| it is unusual or remarkable.
|
| "I saw it online, so it must happen all the time" is a dopy
| lack of logic that infects society.
| marapuru wrote:
| You are right on that. I'll edit my post to reflect that.
|
| Edit: Apologies, I can't edit it anymore.
| lysace wrote:
| You are missing the point. Spotify had permission from the
| copyright holders and/or their national proxies to use those
| songs in a limited beta in Sweden. They didn't have access to
| clean audio data directly from the record companies, so in many
| cases they used pirated rips instead.
|
| What you really should be asking is whether they infringed on
| the copyrights of the rippers. /s
| pembrook wrote:
| It wasn't just the content being pirated, but the early Spotify
| UI was actually a 1:1 copy of Limewire.
| NoMoreNicksLeft wrote:
| This isn't as meaningful as it sounds. Nintendo was apparently
| using scene roms for one of the official emulators on Wii (I
| think?). Spotify might have received legally-obtained mp3s from
| the record companies that were originally pulled from Napster
| or whatever, because the people who work for record companies
| are lazy hypocrites.
| cmiles74 wrote:
| Google Music originally let people upload their own digital
| music files. The argument at the time was that whether or not
| the files were legally obtained was not Google's problem. I
| believe Amazon had a similar service.
|
| https://www.computerworld.com/article/1447323/google-reporte...
| motbus3 wrote:
| It is shocking how courts have been ruling in favor of AI
| companies despite the obvious problem of allowing automatic
| plagiarism
| jobs_throwaway wrote:
| Information wants to be free
| NoOn3 wrote:
| Then why do they sell their services instead of putting the
| model in open source?
| kristofferR wrote:
| Not really, plagiarism is not a legal concept.
| Kim_Bruning wrote:
| actual title:
|
| "Anthropic cut up millions of used books to train Claude -- and
| downloaded over 7 million pirated ones too, a judge said."
|
| A not-so-subtle difference.
|
| That said, in a sane world, they shouldn't have needed to cut up
| all those used books yet again when there's obviously already an
| existing file that does all the work.
| kube-system wrote:
| The importance of acquiring the physical book was the transfer
| of compensation to the author.
| Kim_Bruning wrote:
| You're not wrong, but that's one heck of a way to do it. It
| involves the destruction of 7 million books, which ... I
| really don't quite see the "promotion of Progress of Science
| and useful Arts" in that.
| CaptainFever wrote:
| Yeah, I'm not sure if people realize that the whole reason they
| _had_ to cut up the books was because they wanted to comply
| with copyright law. Artificial scarcity.
| greenavocado wrote:
| Should have listened to those NordVPN ads on YouTube
| sidewndr46 wrote:
| So using the standard industry metrics for calculating the
| financial impact of piracy, this would equate to something like
| trillions of damages to the book publishing industry?
| 2OEH8eoCRo0 wrote:
| I've begun to wonder if this is why some large torrent sites
| haven't been taken down. They are essentially able to crowdsource
| all the work. There are some users who spend ungodly amounts of
| time and money on these sites that I suspect are rich industry
| benefactors.
| neonate wrote:
| https://archive.md/YLyPg
| bgwalter wrote:
| Here is how individuals are treated for massive copyright
| infringement:
|
| https://investors.autodesk.com/news-releases/news-release-de...
| piker wrote:
| I thought you'd go with this:
| https://en.wikipedia.org/wiki/United_States_v._Swartz
| dialup_sounds wrote:
| Swartz wasn't charged with copyright infringement.
| natch wrote:
| *technically
| kube-system wrote:
| If you're discussing law, _an entirely different law in a
| different title of US code_ is more than a technicality.
| piker wrote:
| No, the parent was referring to how someone "was
| treated", and it would have been perfectly valid to
| reference that case to make the same point.
|
| What you're saying is like calling Al Capone a tax cheat.
| Nonsense.
|
| They went after Aaron over copyright.
| dialup_sounds wrote:
| Unlike much of the post hoc hagiography around Swartz,
| it's literally true.
| arandomhuman wrote:
| No but he coincidentally passed away after he was accused
| of it.
| kube-system wrote:
| No, the CFAA was the law that had him facing 35 years in
| prison and $1m+ fines. It wasn't a copyright case.
| tzs wrote:
| He wasn't facing anywhere near that. When the DOJ charges
| someone with a set of charges they like to say in the
| press release that the person is facing N years, where
| they get N by simply adding up the maximums for each
| charge that it is possible for a hypothetical defendant
| that has all the possible sentence enhancing factors to
| get. They also ignore that some charges group for
| sentencing--your sentence for the group is the maximum
| sentence for the individual charges in the group.
|
| Here's an article explaining in more detail [1].
|
| Most experts say that if Swartz had gone to trial and the
| prosecution had proved everything they alleged and the
| judge had decided to make an example of Swartz and
| sentence harshly it would have been around 7 years.
|
| Swartz's own attorney said that if they had gone to trial
| and lost he thought it was unlikely that Swartz would get
| any jail time.
|
| Swartz also had at least two plea bargain offers
| available. One was for a guilty plea and 4 months. The
| other was for a guilty plea and the prosecutors would ask
| for 6 months but Swartz could ask the judge for less or
| for probation instead and the judge would pick.
|
| [1] https://www.popehat.com/2013/02/05/crime-whale-sushi-sentenc...
| kube-system wrote:
| Yes, I meant "up to" that amount, which is implied when
| many people say "facing" before a trial happens. But it's
| not really relevant to my point, which was that it wasn't
| a copyright case.
| chourobin wrote:
| copyright is not the same as piracy
| asadotzler wrote:
| piracy isn't a thing, except on the high seas. what you're
| thinking about is copyright violation.
| downrightmike wrote:
| Yup, piracy sounds better than copyright violation.
|
| "Piracy" is mostly a rhetorical term in the context of
| copyright. Legally, it's still called infringement or
| unauthorized copying. But industries and lobbying groups
| (e.g., RIAA, MPAA) have favored "piracy" for its emotional
| weight.
| collingreen wrote:
| Emotional weight or because it's intentionally
| misleading.
| admissionsguy wrote:
| Does piracy have negative connotations? I thought
| everyone thought pirates were cool
| accrual wrote:
| Everyone but the person(s) affected by the pirates, I
| suppose.
| achierius wrote:
| Can you explain why? What makes them categorically different
| or at the very least why is "piracy" quantitatively worse
| than 'just' copyright violation?
| arrosenberg wrote:
| Piracy is theft - you have taken something and deprived the
| original owner of it.
|
| Copyright infringement is unauthorized reproduction - you
| have made a copy of something, but you have not deprived
| the original owner of it. At most, you denied them revenue
| although generally less than the offended party claims,
| since not all instances of copying would have otherwise
| resulted in a sale.
| fuzzfactor wrote:
| I have about the same concept of piracy these days.
|
| Real piracy always involves booty.
|
| Naturally booty is wealth that has been hoarded.
|
| Has nothing to do with wealth that may or may not come in
| the future, regardless of whether any losses due to
| piracy have taken place already or not.
| ddingus wrote:
| Yes, and the struggle with this back in the day was the
| *IAA and related organizations wanted to equate
| infringement with theft.
|
| And to be clear, we have the word infringement
| precisely because it is not theft.
|
| In addition to the deprived revenue, piracy also improves
| the general relevance the author has or may have in
| the public sphere. Essentially, one of the side effects
| of piracy is basically advertising.
|
| Doctorow was one of the early ones to bring this aspect
| of it up.
| charcircuit wrote:
| Saying that piracy isn't copyright violation is an RMS
| talking point. It's not worth trying to ask why because the
| answer will be RMS said so and will not be backed by the
| common usage of the word.
| buzzerbetrayed wrote:
| You legitimately have it completely backwards. The word
| "piracy" was co-opted to put a more severe spin on
| copyright violation. As a result, it became "the common
| usage of the word". But that was by design. And it's
| worth pushing back on.
| carlhjerpe wrote:
| Sweden has a political party called "The Pirate
| Party"(1), and "The Pirate Bay" is Swedish so I think a
| couple of Swedes memeing before it was cool had a
| significant impact on making the name stick, but also on
| taking the seriousness out of it.
|
| 1: https://piratpartiet.se/en/
| charcircuit wrote:
| I don't have it backwards. Language evolved, and piracy
| got a new definition. It's even in the dictionary. Trying
| to redefine words like this is futile and avoiding
| certain words or replacing them with others is a quirk
| that RMS has.
| lcnPylGDnU4H9OF wrote:
| > RMS
|
| Referring to this? (Wikipedia's disambiguation page
| doesn't seem to have a more likely article.)
|
| https://en.wikipedia.org/wiki/Richard_Stallman#Copyright_red...
| charcircuit wrote:
| Yes, quoting the following section:
| Stallman places great importance on the words and labels
| people use to talk about the world, including the
| relationship between software and freedom. He asks people
| to say free software and GNU/Linux, and to avoid the
| terms intellectual property and piracy (in relation to
| copying not approved by the publisher). One of his
| criteria for giving an interview to a journalist is that
| the journalist agrees to use his terminology throughout
| the article.
| lcnPylGDnU4H9OF wrote:
| That seems rather agreeable, though. Stallman is
| essentially saying that words are meaningful and
| speakers/writers should be thoughtful about the meaning
| of the words they use. In that context, refusing to use
| terms like "intellectual property" and "piracy" because
| of their meaning and the effect their use has on culture,
| and especially insisting that journalists who interview
| you use the same language, seems to be a means of
| controlling the interpreted meaning of one's expressions.
|
| (As an aside, it seems pointless to decry it as a
| "talking point". The reason it was brought up is
| presumably because the author agrees with it and thinks
| it's relevant. It's also entirely possible that the
| author, like me, made this argument without being aware
| that it was popularized by Richard Stallman. If it makes
| sense then you can hear the argument without hearing the
| person and still find it agreeable.)
|
| "Piracy" is used to refer to copyright violation to make
| it sound scary and dangerous to people who don't know
| better or otherwise don't think about it too hard. Just
| imagine if they called it "banditry" instead; now tell me
| that pirates are not bandits with boats. They may as well
| have called it banditry and it's worth correcting that.
| (I also think it's worth ridiculing but that doesn't
| appear to be Stallman's primary point.) It's not banditry
| (how _ridiculous_ would it be to call it that?), it's
| copyright infringement.
|
| Edit:
|
| Reading my comment again in the context of other things
| you wrote, I suspect the argument will not pass muster
| because you do not seem to see piracy's change in meaning
| as manufactured by PR work purchased by media industry
| leaders. I'm not really trying to convince you that it's
| true but it may be worth considering that it is the
| fundamental disagreement you seem to have with others on
| Stallman's point; again, not saying you're wrong, just
| that's where the disagreement is.
| charcircuit wrote:
| My point is that the 2 commenters are working off of
| different definitions. One is using the common
| definitions of words in English and the other is trying
| to advocate for their ideologically rooted definitions by
| trying to correct people who use the normal English
| definitions. 99% of the time, how this plays out is
| that the ideologue will preach about their values instead of
| acknowledging that they are purposefully using different
| definitions.
|
| In short the post is bait.
| lcnPylGDnU4H9OF wrote:
| > In short the post is bait.
|
| This is an uncharitable interpretation. The ostensible
| point of the comment, or at least a stronger and still-
| reasonable interpretation, is that they are trying to
| point out that this specific word choice confuses
| concepts, which it does. Richard Stallman and the
| commenter in question are absolutely correct to point
| that out. You actually seem to be agreeing with Stallman,
| at least in the abstract.
|
| It should be acknowledged how/why the meaning of the
| word changed. As I said, that seems to have been
| manufactured, which suggests, at least to me, that their
| (and Richard Stallman's) point is essentially the same as
| yours. That is to say, the US media industry started
| paying PR firms to use "piracy" as meaning something
| other than its normal definition until that became the
| common definition.
|
| They should not purposely use a different definition like
| that. That is Stallman's point, and why he refuses to say
| "piracy" instead of "copyright infringement"; ocean
| banditry is not copyright infringement and it is
| confusing -- intentionally so -- to say that it is.
| abeppu wrote:
| Maybe the most memorable version of the response is the
| "Copying is not Theft" song.
| https://www.youtube.com/watch?v=IeTybKL1pM4
| NoMoreNicksLeft wrote:
| Asked unironically: "What's worse, hijacking ships at sea
| and holding their crews hostage for ransom on threat of
| death, or downloading a song off the internet?" ...
| nh23423fefe wrote:
| What point are you making? 20 years ago, someone sold pirated
| copies of software (where's the transformation here?), and that's
| the same as using books in a training set? The judge already said
| reading isn't infringement.
|
| This is reaching at best.
| amlib wrote:
| Aren't you comparing the wrong things? First example is about
| the output/outcome, what is the equivalent for LLMs? Also,
| not all "pirated" things are sold, most are in fact
| distributed for free.
|
| "Pirates" also transform the works they distribute. They
| crack it, translate it, compress it to decrease download
| times, remove unnecessary things, make it easier to download
| by splitting it in chunks (essential with dial-up, less so
| nowadays), change distribution formats, offer it through
| different channels, bundle extra software and media that they
| themselves might have coded like trainers, installers, sick
| chiptunes and so on. Why is the "transformation" done by a
| big corpo more legal in your views?
| JimDabell wrote:
| > illegally copying and selling pirated software
|
| This is very different to what Anthropic did. Nobody was buying
| copies of books from Anthropic instead of the copyright holder.
| rvnx wrote:
| At the very least, they should have purchased the originals
| once
| arandomhuman wrote:
| Yeah, people have gone to jail for a few copies of content.
| Taking that large of a corpus and getting off without
| penalty would be a farce of the justice system.
| rockemsockem wrote:
| Bad decisions should not be repeated in the name of fair
| application.
| impossiblefork wrote:
| They actually should, because generally an equal playing
| field is more important than correct law.
|
| As an extreme example, consider murder. Obviously it
| should be illegal, but if it's legal for one group and
| not for another, the group for which it's illegal will
| probably be wiped out, having lost the ability to avenge
| deaths in the group.
|
| It's much more important that laws are applied
| impartially and equally than that they are even a tiny
| bit reasonable.
| haneefmubarak wrote:
| I think GP's point is that you should always seek to
| apply the law correctly, hopefully setting precedent for
| its correct application for everyone in the future.
| armada651 wrote:
| I wouldn't be so sure about that statement, no one has ruled
| on the output of Anthropic's AI yet. If their AI spits out
| the original copy of the book then it is practically the same
| as buying a book from them instead of the copyright holder.
|
| We've only dealt with the fairly straight-forward legal
| questions so far. This legal battle is still far from being
| settled.
| KoolKat23 wrote:
| It is extremely likely this will be declared fair use in
| the end.
|
| There's already one decision on a competitor.
|
| It makes sense, if you think of how the model works.
| cmiles74 wrote:
| It's very unlikely that Claude will verbatim reproduce an
| entire book from its training corpus. If that's the bar,
| they are pretty safe in my opinion.
| farceSpherule wrote:
| Peterson was copying and selling pirated software.
|
| Come up with a better comparison.
| organsnyder wrote:
| Anthropic is selling a service that incorporates these
| pirated works.
| adolph wrote:
| That a service incorporating the authors' works exists is
| not at issue. The plaintiffs' claims are, as summarized by
| Alsup: First, Authors argue that using
| works to train Claude's underlying LLMs was like
| using works to train any person to read and write, so
| Authors should be able to exclude Anthropic from
| this use (Opp. 16). Second, to that last point,
| Authors further argue that the training was intended
| to memorize their works' creative elements -- not just
| their works' non-protectable ones (Opp. 17).
| Third, Authors next argue that computers nonetheless should
| not be allowed to do what people do.
|
| https://media.npr.org/assets/artslife/arts/2025/order.pdf
| codedokode wrote:
| Computers cannot learn and are not subject to laws. What
| happens is that a human takes a copyrighted work, makes an
| unauthorized digital copy, and loads it into a computer
| without authorization from the copyright owner.
| KoolKat23 wrote:
| And they are not selling this or distributing this.
|
| The model is very different.
| cmiles74 wrote:
| I have to disagree, without all the copyrighted input
| data there would be no output data for these companies to
| sell. This output data _is_ the product and they are
| distributing it for dollars.
| KoolKat23 wrote:
| Copyright is concerned with the actual physical copy.
| The model isn't this. The end user would have to
| carefully prompt the model's algorithm to output a
| copyright-infringing piece.
|
| This argument is more along the lines of blaming
| Microsoft Word for someone typing characters into the
| word processor's algorithm and outputting a copy of an
| existing book. (Yes, it is a lot easier, but the
| rationale is the same.) In my mind, the end user prompting
| the model would be the one potentially infringing.
| cmiles74 wrote:
| FWIW, I don't think there is a prompt that would reliably
| produce, verbatim, a copyrighted work.
|
| I do think that a big part of the reason Anthropic
| downloaded millions of books from pirate torrents was
| because they _needed_ that input data in order to
| generate the output, their product.
|
| I don't know what that is, but, IMHO, not sharing those
| dollars with the creators of the content is clearly
| wrong.
| adolph wrote:
| It can't be "unauthorized" if no authorization was
| needed.
| xdennis wrote:
| > That a service incorporating the authors' works exists
| is not at issue.
|
| It's not an issue because it's not currently illegal
| because nobody could have foreseen this years ago.
|
| But it is profiting off of the unpaid work of millions.
| And there's very little chance of change because it's so
| hard to pass new protection laws when you're not Disney.
| adolph wrote:
| Marx wrote _The tradition of all dead generations weighs
| like an Alp on the brains of the living._ and that would
| be true if one were obligated to pay the full freight of
| one's antecedents. The more positive truth is that the
| brains of the living reach new heights from that Alp and
| build ever new heights for those who come afterwards.
| CaptainFever wrote:
| Let's not expand copyright law.
| TeMPOraL wrote:
| It's not an issue because it's not what this case was
| about, as the linked document explicitly states. The
| Authors did not contest the legality of the model's
| outputs, only the inputs used in training.
| megaman821 wrote:
| Correct, the New York Times and Disney are suing for the
| output side. I am going to hazard a guess that you won't
| be able to circumvent copyright and trademark just
| because you are using AI. Where that line is has yet to
| be determined though.
| TeMPOraL wrote:
| Right, but where that line will be drawn will have major
| impact on the near-term future of those models. If the
| user is liable for distributing infringing output that
| came from AI, that's not a problem for the field (and
| IMHO a reasonable approach) - but if they succeed in
| making the model vendors liable for the _possibility_ of
| users generating infringing output, it'll shake things
| up pretty seriously.
| lawlessone wrote:
| > underlying LLMs was like using works to train any
| person to read and write
|
| I don't think humans learn via backprop or in
| rounds/batches, our learning is more "online".
|
| If I input text into an LLM it doesn't learn from that
| unless the creators consciously include that data in the
| next round of teaching their model.
|
| Humans also don't require samples of every text in
| history to learn to read and write well.
|
| Hunter S. Thompson didn't need to ingest the Harry Potter
| books to write.
| TeMPOraL wrote:
| The first paragraph sounds absurd, so I looked into the
| PDF, and here's the full version I found:
|
| > _First, Authors argue that using works to train
| Claude's underlying LLMs was like using works to train
| any person to read and write, so Authors should be able
| to exclude Anthropic from this use (Opp. 16). But Authors
| cannot rightly exclude anyone from using their works for
| training or learning as such. Everyone reads texts, too,
| then writes new texts. They may need to pay for getting
| their hands on a text in the first instance. But to make
| anyone pay specifically for the use of a book each time
| they read it, each time they recall it from memory, each
| time they later draw upon it when writing new things in
| new ways would be unthinkable. For centuries, we have
| read and re-read books. We have admired, memorized, and
| internalized their sweeping themes, their substantive
| points, and their stylistic solutions to recurring
| writing problems._
|
| Couldn't have put it better myself (though $deity knows I
| tried many times on HN). Glad to see Judge Alsup
| continues to be the voice of common sense in legal
| matters around technology.
| cmiles74 wrote:
| For everyone arguing that there's no harm in
| anthropomorphizing an LLM, witness this rationalization.
| They talk about training and learning as if this is
| somehow comparable to human activities. The idea that LLM
| training is comparable to a person learning seems way out
| there to me.
|
| "We have admired, memorized, and internalized their
| sweeping themes, their substantive points, and their
| stylistic solutions to recurring writing problems."
|
| Claude is not doing any of these things. There is no
| admiration, no internalizing of sweeping themes. There's
| a network encoding data.
|
| We're talking about a machine that accepts content and
| then produces more content. It's not a person, it's owned
| by a corporation that earns money on literally every word
| this machine produces. If it didn't have this large
| corpus of input data (copyrighted works) it could not
| produce the output data for which people are willing to
| pay money. This all happens at a scale no individual
| could achieve because, as we know, it is a machine.
| ben_w wrote:
| There may be no admiration, but there definitely is an
| internalising of sweeping themes, and all the other
| things in your quotation, which anyone can fetch by
| asking it for the themes/substantive points/stylistic
| solutions of one of the books it has (for lack of a
| better verb) read.
|
| That the mechanism performing these things is a network
| encoding data is... well, that description, at that level
| of abstraction, is a similarity with the way a human does
| it, not even a difference.
|
| My network is a 3D mess made of pointy bi-lipid bags
| exchanging protons across gaps moderated by the presence
| of neurochemicals, rather than flat sheets of silicon
| exchanging electrons across tuned energy band-gaps
| moderated by other electrons, but it's still a network.
|
| > We're talking about a machine that accepts content and
| then produces more content. It's not a person, it's owned
| by a corporation that earns money on literally every word
| this machine produces. If it didn't have this large
| corpus of input data (copyrighted works) it could not
| produce the output data for which people are willing to
| pay money. This all happens at a scale no individual
| could achieve because, as we know, it is a machine.
|
| My brain is a machine that accepts content in the form of
| job offers and JIRA tickets (amongst other things), and
| then produces more content in the form of pull requests
| (amongst other things). For the sake specifically of this
| question, do the other things make a difference? While I
| count as a person and am not owned by any corporation,
| when I work for one, they do earn money on the words this
| biological machine produces. (And given all the models
| which are free to use, the LLMs definitely don't earn
| money on "literally" every word those models produce). If
| I didn't have the large corpus of input data -- and there
| absolutely was copyright on a lot of the school textbooks
| and the TV broadcast educational content of the 80s and
| 90s when I was at school, and the Java programming
| language that formed the backbone of my university degree
| -- I could not produce the output data for which people
| are willing to pay money.
|
| Should corporations who hire me be required to pay Oracle
| every time I remember and use a solution that I learned
| from a Java course, even when I'm not writing Java?
|
| That the LLMs do this at a scale no individual could
| achieve because it is a machine, means it's got the
| potential to wipe me out economically. The economic threat
| of automation has been a real issue at least since the
| Luddites, if not earlier, and I don't know how the dice
| will fall this time around, so even though I have one
| layer of backup plan, I am well aware it may not work,
| and if it doesn't then government action will have to
| happen because a lot of other people will be in trouble
| before trouble gets to me (and recent history shows that
| this doesn't mean "there won't be trouble").
|
| Copyright law is one example of government action. So is
| mandatory education. So is UBI, but so too is feudalism.
|
| Good luck to us all.
| losvedir wrote:
| > _Glad to see Judge Alsup continues to be the voice of
| common sense in legal matters around technology_
|
| Yep, that name's a blast from the past! He was the judge
| on the big Google/Oracle case about Android and Java
| years ago, IIRC. I think he even learned to write some
| Java so he could better understand the case.
| Aurornis wrote:
| > Here is how individuals are treated for massive copyright
| infringement:
|
| When I clicked the link, I got an article about a _business_
| that was selling millions of dollars of pirated software.
|
| This guy made millions of dollars in profit by selling pirated
| software. This wasn't a case of transformative works, nor of an
| individual doing something for themselves. He was plainly
| stealing and reselling something.
| ysofunny wrote:
| before breaking the law, set up a corporation to absorb the
| liability!
|
| in other words, provided you have enough spare capital to spin
| up a corporation, you can break the law!!!!
| stocksinsmocks wrote:
| Anthropic isn't selling copies of the material to its users
| though. I would think you couldn't lock someone up for reading
| a book and summarizing or reciting portions of the contents.
|
| Seven years for thumbing your nose at Autodesk when armed
| robbery would get you less time says some interesting things
| about the state of legal practice.
| wmeredith wrote:
| > summarizing or reciting portions of the contents
|
| This absolutely falls under copyright law as I understand it
| (not a lawyer). E.g. the disclaimer that rolls before every
| NFL broadcast. The notice states that the broadcast is
| copyrighted and any unauthorized use, including pictures,
| descriptions, or accounts of the game, is prohibited. There
| is wiggle room for fair use by news organizations, critics,
| artists, etc.
| steveklabnik wrote:
| I can say "you cannot read this comment for any purpose"
| but that doesn't supersede the law.
| zahma wrote:
| Except they aren't merely reading and reciting content, are
| they? That's a rather disingenuous argument to make. All
| these AI companies are high on billions in investment and
| think they can run roughshod over all rules in the sprint
| towards monetizing their services.
|
| Make no mistake, they're seeking to exploit the contents of
| that material for profits that are orders of magnitude larger
| than what any shady pirated-material reseller would make. The
| world looks the other way because these companies are
| "visionary" and "transformational."
|
| Maybe they are, and maybe they should even have a right to
| these buried works, but what gives them the right to rip up
| the rule book and (in all likelihood) suffer no repercussions
| in an act tantamount to grand theft?
|
| There's certainly an argument to be had about whether this
| form of research and training is a moral good and beneficial
| to society. My first impression is that the companies are too
| opaque in how they use and retain these files, albeit for
| some legitimate reasons, but nevertheless the archival
| achievements are hidden from the public, so all that's left
| is profit for the company on the backs of all these other
| authors.
| burnt-resistor wrote:
| I'm wondering though how the law will construe AI able to
| make a believable sequel to Moby Dick after digesting Herman
| Melville's works. (Or replace Melville with a modern writer.)
| dathinab wrote:
| As far as I understand, while training on books is clearly not
| fair use (as the result will likely hurt the livelihood of
| authors, especially those who aren't the "best of the best"),
| as long as you buy the book it should still be legal, that is,
| if you actually buy the book and not a "read only" eBook.
|
| But the 7_000_000 pirated books are a huge issue, and one
| which we have a lot of reason to believe isn't specific to
| Anthropic.
| asadotzler wrote:
| Buying a copy of a book does not give you license to take the
| exact content of that book, repackage it as a web service, and
| sell it to millions of others. That's called theft.
| russell_h wrote:
| The title is clearly meant to generate outrage, but what is wrong
| with cutting up a book that you own?
| nickpsecurity wrote:
| Buying, scanning, and discarding was in my proposal to train
| under copyright restrictions.
|
| You are often allowed to make a digital copy of a physical work
| you bought. There are tons of used, physical works that would be
| good for training LLMs. They'd also be good for training OCR,
| which could do many things, including improving book scanning for
| training.
|
| This could be reduced to a single act of book destruction per
| copyrighted work, or made unnecessary if copyright law allowed us
| to share others' works digitally with their _licensed customers_,
| e.g., people who own a physical copy or a license to one.
| Obviously, the implementation could get complex, but we wouldn't
| have to destroy books very often.
| NHQ wrote:
| The farce of treating a corporation as an individual precludes
| common-sense legal procedure to investigate people who are
| responsible for criminal action taken by the company. It's
| obviously premeditated and in all ways an illicit act knowingly
| perpetrated by persons. The only discourse should be about
| upending this penthouse legalism.
| NHQ wrote:
| The irony is that actually litigating copyright law would lead
| to the repeal of said copyright law. The same goes for all
| backwater laws that are used to "protect interests" of
| "corporations" yet criminalize petty individual cases.
|
| This of course cannot be allowed to happen, so the legal
| system is just a limbo, a bar which regular individuals must
| strain to pass under but that corporations regularly overstep.
| outside1234 wrote:
| So if you incorporate you can do whatever you want without
| criminal charges?
| trinsic2 wrote:
| I'm not seeing how this is fair use in either case.
|
| Someone correct me if I am wrong but aren't these works being
| digitized and transformed in a way to make a profit off of the
| information that is included in these works?
|
| It would be one thing for an individual to make personal use of one
| or more books, but you got to have some special blindness not to
| see that a for-profit company's use of this information to
| improve a for-profit model is clearly going against what
| copyright stands for.
| jimbob21 wrote:
| They clearly were being digitized, but I think it's a more
| philosophical discussion that we're only banging our heads
| against for the first time to say whether or not it is fair
| use.
|
| Simply, if the models can _think_ then it is no different than
| a person reading many books and building something new from
| their learnings. Digitization is just memory. If the models
| cannot _think_ then it is meaningless digital regurgitation and
| plagiarism, not to mention breach of copyright.
|
| The quotes "consistent with copyright's purpose in enabling
| creativity and fostering scientific progress." and "Like any
| reader aspiring to be a writer" say, from what I can tell, that
| the judge has legally ruled the model can think as a human
| does, and therefore has the legal protections afforded to
| "creatives."
| palmotea wrote:
| > Simply, if the models can think then it is no different
| than a person reading many books and building something new
| from their learnings.
|
| No, that's fallacious. Using anthropomorphic words to
| describe a machine does not give it the same kinds of rights
| and affordances we give real people.
| jimbob21 wrote:
| Actually, it does, at least for this case. The judge just
| said so.
| NoOn3 wrote:
| People have rights, machines don't. Otherwise, maybe give
| machines the right to vote, for example?...
| kube-system wrote:
| This case is more like:
|
| If a human uses a voting machine, they still have a right
| to vote.
|
| Machines don't have rights. The human using the machine
| does.
| protocolture wrote:
| If I can use my brain to learn, I as a human can use my
| computer to learn.
|
| It's like taking notes, or Google Image Search caching
| thumbnails. Honestly, we don't even need the learning
| metaphor to see this is obviously not an infringement.
| pavon wrote:
| The judge did use some language that analogized the
| training with human learning. I don't read it as basing the
| legal judgement on anthropomorphizing the LLM, though, but
| rather as reasoning that if it would be legal for a human to
| do the same thing, then it is legal for a human to use a
| computer to do so:
|
| First, Authors argue
| that using works to train Claude's underlying LLMs was like
| using works to train any person to read and write, so
| Authors should be able to exclude Anthropic from this
| use (Opp. 16). But Authors cannot rightly exclude anyone
| from using their works for training or learning as
| such. Everyone reads texts, too, then writes new texts.
| They may need to pay for getting their hands on a
| text in the first instance. But to make anyone pay
| specifically for the use of a book each time they read it,
| each time they recall it from memory, each time they
| later draw upon it when writing new things in new ways
| would be unthinkable. For centuries, we have read and
| re-read books. We have admired, memorized, and internalized
| their sweeping themes, their substantive points, and their
| stylistic solutions to recurring writing problems.
| ... In short, the purpose and character of using
| copyrighted works to train LLMs to generate new text
| was quintessentially transformative. Like any reader
| aspiring to be a writer, Anthropic's LLMs trained
| upon works not to race ahead and replicate or supplant them
| -- but to turn a hard corner and create something
| different. If this training process reasonably
| required making copies within the LLM or otherwise, those
| copies were engaged in a transformative use.
|
| [1] https://authorsguild.org/app/uploads/2025/06/gov.uscourts.ca...
| wrs wrote:
| Copyright is not on "information"; it's on the tangible
| expression (i.e., the actual words). "Transformative use" is a
| _defense_ in copyright infringement.
| kristofferR wrote:
| What do you think fair use is? The whole point of the fair use
| clauses is that if you transform copyrighted works enough you
| don't have to pay the original copyright holder.
| kube-system wrote:
| Fair use is not, at its core, about transformation. It's
| about many types of uses that do not interfere with the
| reasons for the rights we ascribe to authors. Fair use
| doesn't require transformation.
| skybrian wrote:
| Copyright is largely about _distributing copies._ It's not
| about making something vaguely similar or about referencing
| copyrighted work to make something vaguely similar.
|
| Although, there's an exception for fictional characters:
|
| https://en.m.wikipedia.org/wiki/Copyright_protection_for_fic...
| pavon wrote:
| There is another case where companies slurped up all of the
| internet and profited off the information, that makes a good
| comparison - search engines.
|
| Judges consider four factors when examining fair use [1]. For
| search engines,
|
| 1) The use is transformative, as a tool to find content has a very
| different purpose than the content itself.
|
| 2) Nature of the original work runs the full gamut, so search
| engines don't get points for only consuming factual data, but
| it was all publicly viewable by anyone as opposed to books
| which require payment.
|
| 3) Search engines store significant portions of the work in
| the index, but it only redistributes small portions.
|
| 4) Search engines, as originally devised, don't compete with the
| original; in fact, they can improve the potential market of the
| original by helping more people find them. This has changed
| over time though, and search engines are increasingly competing
| with the content they index, and intentionally trying to show
| the information that people want on the search page itself.
|
| So traditional search which was transformative, only
| republished small amounts of the originals, and didn't compete
| with the originals fell firmly on the side of fair use.
|
| Google News and Books on the other hand weren't so clear cut,
| as they were showing larger portions of the works and were
| competing with the originals. They had to make changes to those
| products as a result of lawsuits.
|
| So now let's look at LLMs:
|
| 1) LLMs are absolutely transformative. Generating new text at
| a user's request is a very different purpose and character from
| the original works.
|
| 2) Again, this runs the full gamut (setting aside the clear copyright
| infringement of downloading illegally distributed books, which
| is a separate issue).
|
| 3) For training purposes, LLMs don't typically preserve entire
| works, so the model is in a better place legally than a search
| index, which has precedent that storing entire works privately
| can be fair use depending on the other factors. For inference,
| even though they are less likely to reproduce the originals in
| their outputs than search engines, there are failure cases
| where an LLM is over-trained on a work and a significant amount
| of the original can be reproduced.
|
| 4) LLMs have tons of uses some of which complement the original
| works and some of which compete directly with them. Because of
| this, it is likely that whether LLMs are fair use will depend
| on how they are being used, e.g., ignore the LLM altogether and
| consider solely the output and whether it would be infringing
| if a human created it.
|
| This case was solely about whether training on books is fair
| use, and did not consider any uses of the LLM. Because LLMs are
| a very transformative use, and because they don't store the
| originals verbatim, it weighs strongly as being fair use.
|
| I think the real problems that LLMs face will be in factors 3
| and 4, which are very much context-specific. The judge himself
| said that the plaintiffs are free to file additional lawsuits
| if they believe the LLM outputs duplicate the original works.
|
| [1] https://fairuse.stanford.edu/overview/fair-use/four-factors/
| NoMoreNicksLeft wrote:
| Digitizing the books is the equivalent of a blind person doing
| something to the book to make it readable to them... the
| software can't read analog pages.
|
| Learning from the book is, well, learning from the book. Yes,
| they intended to make money off of that learning... but then I
| guess a medical student reading medical textbooks intends to
| profit off of what they learn from them. Guess that's not fair
| use either (well, it's really just _use_, as in the intended
| use for all books since they were first invented).
|
| Once a person has to believe that copyright has any moral
| weight at all, I guess all rational thought becomes impossible
| for them. Somehow, they're not capable of entertaining the idea
| that copyright policy was only ever supposed to be this
| pragmatic thing to incentivize creative works... and that
| whatever little value it has disappears entirely once the
| policy is twisted to consolidate control.
| kenmacd wrote:
| > to make a profit off of the information that is included in
| these works?
|
| Isn't that what a lot of companies are doing, just through
| employees? I read a lot of books, and took a lot of courses,
| and now a company is profiting off that information.
| protocolture wrote:
| >clearly going against what copyright stands for.
|
| Copyright isn't a digital moat. It's largely an agreement that
| the work is available to the public, but the creator has a
| limited amount of time to exploit it at market.
|
| If you sell an AI model, or access to an AI model, there's
| usually around 0% of the training data redistributed with the
| model. You can't decompile it and find the book. As you aren't
| redistributing the original work, copyright is barely relevant.
|
| Imagine suggesting that because you own the design of a hammer,
| that all works created with the hammer belong to you and can't
| be sold?
|
| That someone came up with a new method of using books as a tool
| to create a different work, does not entitle the original book
| author to a cut of the pie.
| ruffrey wrote:
| Two of the top AI companies flouted ethics with regard to
| training data. In OpenAI's case, the whistleblower probably got
| whacked for exposing it.
|
| Can anyone make a compelling argument that any of these AI
| companies have the public's best interest in mind
| (alignment/superalignment)?
| k__ wrote:
| So, how should we as a society handle this?
|
| Ensure the models are open source, so everyone can use them, as
| everyone's data is in there?
|
| Close those companies and force them to delete the models, as
| they used copyright material?
| carlosjobim wrote:
| If ingesting books into an AI makes Anthropic criminals, then
| Google et al are also criminals alike for making search indexes
| of the Internet. Anything published online is equally
| copyrighted.
| kristofferR wrote:
| Yeah, we can all agree that ingesting books is fair use and
| transformative, but you gotta own what you ingest, you can't
| just pirate it.
|
| I can read 100 books and write a book based on the inspiration
| I got from the 100 books without any issue. However, if I
| pirate the 100 books I've still committed copyright
| infringement despite my new book being fully legal/fair use.
| carlosjobim wrote:
| I disagree that it has anything to do with copyright. It is
| at most theft. If I steal a bunch of books from the library,
| I haven't committed any breach of copyright.
| riskable wrote:
| Exactly! If Anthropic is guilty of copyright infringement for
| the mere act of downloading copyrighted books then so is
| Google, Microsoft (Bing), DuckDuckGo, etc. Every search engine
| that exists downloads pirated material every day. They'd all be
| guilty.
|
| Not only that but all of _us_ are guilty too because I'm
| positive we've all clicked on search results that contained
| copyrighted content that was copied without permission. You may
| not have even known it was such.
|
| Remember: _Intent_ is irrelevant when it comes to copyright
| infringement! It's not that kind of law.
|
| Intent can _guide_ a judge when they determine damages but
| that's about it.
| 1970-01-01 wrote:
| The buried lede here is Anthropic will need to attempt to explain
| to a judge that it is impossible to de-train 7M books from their
| models.
| nickpsecurity wrote:
| I'm hoping they fail, to incentivize using legal, open, and/or
| licensed data. Then, they might have to attempt to train a
| Claude-class model on legal data. Then, I'll have a great,
| legal model to use. :)
| protocolture wrote:
| Or they could be forced to settle a price for access to the
| books.
| dehrmann wrote:
| The important parts:
|
| > Alsup ruled that Anthropic's use of copyrighted books to train
| its AI models was "exceedingly transformative" and qualified as
| fair use
|
| > "All Anthropic did was replace the print copies it had
| purchased for its central library with more convenient space-
| saving and searchable digital copies for its central library --
| without adding new copies, creating new works, or redistributing
| existing copies"
|
| It was always somewhat obvious that pirating a library would be
| copyright infringement. The interesting findings here are that
| scanning and digitizing a library for internal use is OK, and
| using it to train models is fair use.
| jpalawaga wrote:
| I don't think that's new. Google set precedent for that more
| than a decade ago. you're allowed to transform a book to
| digital.
| 6gvONxR4sf7o wrote:
| You skipped quotes about the other important side:
|
| > But Alsup drew a firm line when it came to piracy.
|
| > "Anthropic had no entitlement to use pirated copies for its
| central library," Alsup wrote. "Creating a permanent, general-
| purpose library was not itself a fair use excusing Anthropic's
| piracy."
|
| That is, he ruled that
|
| - buying, physically cutting up, physically digitizing books,
| and using them for training is fair use
|
| - pirating the books for their digital library is not fair use.
| throwawayffffas wrote:
| So all they have to do is go and buy a copy of each book they
| pirated. They will have ceased and desisted.
| superfrank wrote:
| I'm trying to find the quote, but I'm pretty sure the judge
| specifically said that going and buying the book after the
| fact won't absolve them of liability. He said that for the
| books they pirated they broke the law and should stand
| trial for that and they cannot go back and un-break it by
| buying a copy now.
|
| Found it: https://www.nbcnews.com/tech/tech-news/federal-judge-rules-c...
|
| > "That Anthropic later bought a copy of a book it earlier
| stole off the internet will not absolve it of liability for
| the theft," [Judge] Alsup wrote, "but it may affect the
| extent of statutory damages."
| zoklet-enjoyer wrote:
| Did they really steal if they didn't deprive anyone of
| their copy? I don't think copying is theft.
| badlibrarian wrote:
| "Tell it to the Judge..."
| kjkjadksj wrote:
| You may not think it is but the law does.
| buildbot wrote:
| The law says it's copyright infringement, not theft.
| axus wrote:
| Agreed, the judge should avoid slang or even commonly
| accepted synonyms in an official ruling. The charge is
| not for theft.
|
| Substitute infringement for theft.
| hadlock wrote:
| It's copyright infringement, which is not theft, they're
| legally distinct in the eyes of the law. This is partly
| why the "you wouldn't download a car" copyright ads were
| so widely mocked.
| __MatrixMan__ wrote:
| Fun fact, they didn't have the rights to use the font
| they used for those commercials:
| https://news.ycombinator.com/item?id=43775926
| gghffguhvc wrote:
| Or the music. It was originally made as a one off for a
| film festival. Movie industry defended the lawsuit over
| the music.
| fortran77 wrote:
| It's fine that you think that way. But this is a
| discussion of the laws of the United States of America and
| ruling by American courts, not a discussion of your own
| legal theories.
| hnlmorg wrote:
| The GP isn't talking about some edge case legal dilemma
| that requires a lawyer or judge to comment. It's already
| widely documented that copyright infringement is legally
| distinct from theft.
| freejazz wrote:
| They also argued that they in no way could ever actually
| license all the materials they ingested
| dmd wrote:
| I love this argument so much. "But judge, there's no way
| I could ever afford to buy those jewels, so stealing them
| must be OK."
| AnthonyMouse wrote:
| The argument is more along the lines of, negotiating with
| millions of individuals each over a single copy of a work
| would cause the transaction costs to exceed the payments,
| and that kind of efficiency loss is the sort of thing
| fair use exists to prevent. It's not socially beneficial
| for the law to require you to create $2 in deadweight
| loss in order to transfer $1, and the cost to the author
| of not selling a single additional copy is not the thing
| they were really objecting to.
| exe34 wrote:
| That's right, so I can't individually discuss terms with
| each and every media creator, so from now on, I can just
| pirate everything.
| AnthonyMouse wrote:
| Needing a copy of one book you're going to spend a week
| reading has a lot less overhead than needing a copy of
| every book that you're going to process with a computer
| in bulk.
| recursive wrote:
| I like to glance at the cover art. I can do ten per
| second when I really get into my flow state. Sometimes I
| read them also, but that's incidental.
| AnthonyMouse wrote:
| If you go to the book store and glance at all the cover
| art without buying any of them, do you expect to be sued
| for this?
| freejazz wrote:
| If you do that and reproduce the covers or the protected
| elements thereof, you should absolutely expect to be
| sued.
| AnthonyMouse wrote:
| So for example, if the bookstore has a nice 4k
| surveillance camera and you have access to it because you
| work there, sitting at home and using it to look at the
| cover art on all the books on display is something you'd
| expect to be sued over?
| Aeolun wrote:
| This is literally why a lot of people pirate content,
| yes. It's pretty much always the only way to obtain the
| content, even if you are otherwise fine with paying for
| it.
| freejazz wrote:
| > and that kind of efficiency loss is the sort of thing
| fair use exists to prevent.
|
| No it's not. And you ever heard of a publishing house?
| They don't need to negotiate with every single author
| individually. That's preposterous.
| AnthonyMouse wrote:
| It kind of is though?
|
| It's not the _only_ reason fair use exists, but it's the
| thing that allows e.g. search engines to exist, and that
| seems pretty important.
|
| > And you ever heard of a publishing house? They don't
| need to negotiate with every single author individually.
| That's preposterous.
|
| There are thousands of publishing houses and millions of
| self-published authors on top of that. Many books are
| also out of print or have unclear rights ownership.
| freejazz wrote:
| >It kind of is though?
|
| No, it kinda isn't. Show me anything that supports this
| idea beyond your own immediate conjecture right now.
|
| >It's not the only reason fair use exists, but it's the
| thing that allows e.g. search engines to exist, and that
| seems pretty important.
|
| No, that's the transformative element of what a search
| engine provides. Search engines are not legal because
| they can't contact each licensor, they are legal because
| they are considered hugely transformative features.
|
| >There are thousands of publishing houses and millions of
| self-published authors on top of that. Many books are
| also out of print or have unclear rights ownership.
|
| Okay, and? How many customers does Microsoft bill on a
| monthly basis?
| AnthonyMouse wrote:
| > Show me anything that supports this idea beyond your
| own immediate conjecture right now
|
| It's inherent in the nature of the test. The most
| important fair use factor is the effect on the market for
| the work, so if the use would be uneconomical without
| fair use then the effect on the market is negligible
| because the alternative would be that the use doesn't
| happen rather than that the author gets paid for it.
|
| > No, that's the transformative element of what a search
| engine provides. Search engines are not legal because
| they can't contact each licensor, they are legal because
| they are considered hugely transformative features.
|
| To make a search engine you have to do two things. One is
| to download a copy of the whole internet, the other is to
| create a search index. I'm talking about the first one,
| you're talking about the second one.
|
| > Okay, and? How many customers does Microsoft bill on a
| monthly basis?
|
| Microsoft does this with an automated system. There is no
| single automated system where you can get every book ever
| written, and separately interfacing with all of the many
| systems needed in order to do it is the source of the
| overhead.
| freejazz wrote:
| >It's inherent in the nature of the test. The most
| important fair use factor is the effect on the market for
| the work, so if the use would be uneconomical without
| fair use then the effect on the market is negligible
| because the alternative would be that the use doesn't
| happen rather than that the author gets paid for it.
|
| No, that's not the most important factor. The
| transformative factor is the most important. Effect on
| market for the work doesn't even support your argument
| anyway. Your argument is about the cost of making the end
| product, which is totally distinct from the market
| effects on the copyright holder when the infringer makes
| and releases the infringing product.
|
| >To make a search engine you have to do two things. One
| is to download a copy of the whole internet, the other is
| to create a search index. I'm talking about the first
| one, you're talking about the second one.
|
| So? That doesn't make you right. Go read the opinions,
| dude. This isn't something that's actually up for debate.
| Search engines are fair uses because of their
| transformative effect, not because they are really
| expensive otherwise. Your argument doesn't even make
| sense. By that logic, anything that's expensive becomes a
| fair use. It's facially ridiculous. Them being expensive
| is neither sufficient nor necessary for them to be a fair
| use. Their transformative nature is both sufficient and
| necessary to be found a fair use. Full stop.
|
| >Microsoft does this with an automated system. There is
| no single automated system where you can get every book
| ever written, and separately interfacing with all of the
| many systems needed in order to do it is the source of
| the overhead.
|
| Okay, and? They don't need to get every single book ever
| written. The libraries they pirated do not consist of
| "every single book ever written". It's hard to take this
| argument in good faith because you're being so
| ridiculous.
| AnthonyMouse wrote:
| > No, that's not the most important factor. The
| transformative factor is the most important.
|
| It's a four factor test because all of the factors are
| relevant, but if the use has negligible effect on the
| market for the work then it's pretty hard to get anywhere
| with the others. For example, for cases like classroom
| use, even making verbatim copies of the entire work is
| often still fair use. Buying a separate copy for each
| student to use for only a few minutes would make that use
| uneconomical.
|
| > Effect on market for the work doesn't even support your
| argument anyway. Your argument is about the cost of
| making the end product, which is totally distinct from
| the market effects on the copyright holder when the
| infringer makes and releases the infringing product.
|
| We're talking about the temporary copies they make during
| training. Those aren't being distributed to anyone else.
|
| > So? That doesn't make you right.
|
| Making a copy of everything on the internet is a
| prerequisite to making a search engine. It's something
| you have to do as a step to making the index, which is
| the transformative step. Are you suggesting that doing
| the first step is illegal or what do you propose
| justifies it?
|
| > By that logic, anything that's expensive becomes a fair
| use. It's facially ridiculous.
|
| Anything with unreasonably high transaction costs. Why is
| that ridiculous? It doesn't exempt any of the normal
| stuff like an individual person buying an individual
| book.
|
| > They don't need to get every single book ever written.
|
| They need to get as many books as possible, with the
| platonic ideal being every book. Whether or not the ideal
| is feasible in practice, the question is whether it's
| socially beneficial to impose a situation with
| excessively high transaction costs in order to require
| something with only trivial benefit to authors
| (potentially selling one extra copy).
| throwawayffffas wrote:
| I don't even think their argument is about the money, I
| think it's more like we couldn't possibly find all these
| works in any other practical way.
| irthomasthomas wrote:
| Is copyright in America different to Britain? There, it
| is legal to download books you don't own. Only
| distribution is a crime, which most torrenters break by
| seeding.
| rahimnathwani wrote:
| What do you mean by 'it is legal'?
|
| Do you mean:
|
| A) It's not a criminal offence?
|
| B) The copyright owner cannot file a civil suit for
| damages?
|
| C) Something else?
| irthomasthomas wrote:
| > Only distribution is a crime
| throwawayffffas wrote:
| Only distribution with the intent to make money is a
| crime. If you are doing it for free you are not
| criminally liable. Unless I am missing something.
| rahimnathwani wrote:
| What relevance does that have to the present case? The
| judge, in this civil matter, said there would be a trial.
| He didn't say anything about it being a criminal trial.
| The strings 'crim' and 'felon' do not appear in the
| ruling:
|
| > We will have a trial on the pirated copies used to
| > create Anthropic's central library and the resulting
| > damages, actual or statutory (including for
| > willfulness).
| Aeolun wrote:
| There can always be a trial, even if nothing was done to
| warrant it.
|
| I think the distinction between civil and criminal trials
| is smaller in my home country. The fact that there is a
| trial at all implies that someone committed a 'crime'.
| throwawayffffas wrote:
| I think it's very similar in both countries, but you have
| got it wrong. Downloading a book without permission is
| copyright infringement in both countries, regardless of
| whether you distribute it.
|
| In the UK it's a criminal offense if you distribute a
| copyrighted work with the intent to make gain or with the
| expectation that the owner will make a loss.
|
| Gain and loss are only financial in this context.
|
| Meaning that in both countries the copyright owner can
| sue you for copyright infringement.
| dragonwriter wrote:
| > So all they have to do is go and buy a copy of each book
| they pirated.
|
| No, that doesn't undo the infringement. At most, that would
| mitigate actual damages, but actual damages aren't likely
| to be important, given that statutory damages are an
| alternative and are likely to dwarf actual damages. (It may
| also figure into how the court assigns statutory damages
| within the very large range available for those, but that
| range does not go down to $0.)
|
| > They will have ceased and desisted.
|
| "Cease and desist" is just to stop incurring _additional_
| liability. (A potential plaintiff may accept that as
| sufficient to _not sue_ if a request is made and the
| potential defendant complies, because litigation is
| uncertain and expensive. But "cease and desist" doesn't
| undo wrongs and neutralize liability when they've already
| been sued over.)
| rockemsockem wrote:
| > So all they have to do is go and buy a copy of each
| book they pirated.
|
| For anyone else who wants to do the same thing though
| this is likely all they need to do.
|
| Cutting up and scanning books is hard work and actually
| doing the same thing digitally to ebooks isn't labor free
| either, especially when they have to be downloaded from
| random sites and cleaned from different formats.
| Torrenting a bunch of epubs and paying for individual
| books is probably cheaper.
| tzs wrote:
| Generally you don't want laws to work that way. You want to
| set the penalties so that they discourage violating the
| law.
|
| Setting the penalty to what it would have cost to obey the
| law in the first place does the opposite.
| AnthonyMouse wrote:
| That's for criminal laws where prosecutorial discretion
| can then (in principle) be used in borderline cases to
| prevent unjust outcomes.
|
| If you give people a claim for damages which is an order
| of magnitude larger than their actual damages, it
| encourages litigiousness and becomes a vector for
| shakedowns because the excessive cost of losing pressures
| innocent defendants to settle even if there was a 90%
| chance they would have won.
|
| Meanwhile both parties have the incentive to settle in
| civil cases when it's obvious who is going to win,
| because a settlement to pay the damages is cheaper than
| the cost of going to court and then having to pay the
| same damages anyway. Which also provides a deterrent to
| doing it to begin with, because even having to pay
| lawyers to negotiate a settlement is a cost you don't
| want to pay when it's clear that what you're doing is
| going to have that result.
|
| And when the result _isn't_ clear, penalizing the
| defendant in a case of first impression isn't just
| either, _because_ it wasn't clear and punitive measures
| should be reserved for instances of unambiguous
| wrongdoing.
| badlibrarian wrote:
| Statutory damages were written into the first federal
| copyright law in 1790, and earlier in state law
| (specified in Pounds because the dollar hadn't been
| invented yet).
| AnthonyMouse wrote:
| The first federal copyright law in 1790:
|
| https://copyright.gov/about/1790-copyright-act.html
|
| Specified in dollars because dollars _had_ been invented
| (in 1789), but in the amount of one half of one dollar,
| i.e. $0.50. That's 1790 dollars, of course, so a little
| under $20 today. (There was basically no inflation for
| the first 100+ years of that because the US dollar was
| still backed by precious metals then; a dollar was worth
| slightly _more_ in 1900 than in 1790.)
|
| That seems more like an attempt to codify some amount of
| plausible actual damages so people aren't arguing
| endlessly about valuations, rather than an attempt to
| impose punitive damages. Most notably because -- unlike
| the current method -- it scales with the number of sheets
| reproduced.
| badlibrarian wrote:
| My fault for the hanging clause: nearly a dozen state
| laws preceded it and used pounds. Mostly because they
| were based on the British law and also because the war
| made a mess of the currency situation.
|
| Statutory damages were added to reduce the burden on
| plaintiffs. Which encourages people to stay in line. How
| well this worked out and what it means when some company
| nobody heard of 4 years ago downloads a billion
| copyrighted pages and raises $3.5 billion against a $60
| billion valuation...
|
| Well suddenly $20/page still sounds about right.
| AnthonyMouse wrote:
| The <$20/page was the same for maps and charts, i.e.
| things that typically have a single page in the entire
| work, and came from a time when printing was done a page
| at a time, i.e. you'd lay out a page and print as many
| copies of that page as you'd expect to make copies of the
| entire book, then hide them somewhere else while you
| print the next page. It was basically a proxy for the
| number of copies of the work they caught you trying to
| make, not an attempt to turn a single copy of a 1000 page
| book into a 1000x multiplier on liability. Notice that
| otherwise you're letting the infringer choose the amount
| of the damages, because a larger page size or tighter
| layout would fit more words per page and therefore have
| fewer pages per book. (How many "pages" is an HTML
| document with infinite scroll?)
|
| > Statutory damages were added to reduce the burden on
| plaintiffs. Which encourages people to stay in line.
|
| It encourages people to not spend a lot of resources
| speculating about damages. That doesn't mean you need the
| amount to be punitive rather than compensatory.
| badlibrarian wrote:
| Agree that a photo of a celebrity and a film containing
| that celebrity shouldn't have the same number. But a
| large punitive number in the context of willful
| infringement seems right to me. And in practice it's all
| negotiated down anyway, as evidenced by Internet
| Archive's fourth 30-day stay of its pending $600+ million
| lawsuit.
| AnthonyMouse wrote:
| "In practice it's negotiated down anyway" is precisely
| the issue. If they bring a questionable case against you
| and you think there's a significant chance you could win,
| but then there's a small chance you get bankrupted, there
| is unreasonable pressure for you to settle even if the
| plaintiffs are in the wrong.
| badlibrarian wrote:
| I'm not sure what a "questionable case" for willful
| copyright infringement might look like. Or an example
| where someone was clearly in the right and got screwed.
| It isn't the debtor's prison era.
|
| Four factor test seems to be working, even in this case.
| Don't love it (it goes against my values and what I need
| to do in my job) but I get it.
|
| Edit: we've triggered HN's patience for this discussion
| and it's now blocking replies. You do seem a bit long on
| Google and short on practical experience here. How else
| would you propose these types of disagreements get
| sorted? ("Anyone can be sued for anything"
| notwithstanding.)
|
| There are explicitly no punitive damages in US Copyright
| law. And the "willful" provision in practice means
| demonstrating ongoing disregard, after being informed.
| It's a long walk to the end of that plank.
| AnthonyMouse wrote:
| > I'm not sure what a "questionable case" for willful
| copyright infringement might look like.
|
| You did anything which it's not clear whether it's fair
| use or not. Willfulness is whether you knew you were
| doing it, not whether you knew whether it was fair use,
| which in many cases _nobody_ knows until a court decides
| it, hence the problem.
|
| You have to do it in order to get into court and find out
| if you're allowed to do it (a ridiculous prerequisite to
| begin with), and then if it goes against you, you have to
| pay punitive damages?
| jonas21 wrote:
| As they mentioned, the piracy part is obvious. It's the fair
| use part that will set an important precedent for being able
| to train on copyrighted works as long as you have legally
| acquired a copy.
| wood_spirit wrote:
| Cue physical books being licensed, not sold, in the future
| with restricted agreements ...
| pier25 wrote:
| Also music, videos, photos, etc.
| mormegil wrote:
| See first-sale doctrine
| <https://en.wikipedia.org/wiki/First-sale_doctrine>
| jasonlotito wrote:
| From my understanding:
|
| > pirating the books for their digital library is not fair
| use.
|
| "Pirating" is a fuzzy word and has no real meaning.
| Specifically, I think this is the crux:
|
| > without adding new copies, creating new works, or
| redistributing existing copies
|
| Essentially: downloading is fine, sharing/uploading is
| not. Which makes sense. The assertion here is that Anthropic
| (from this line) did not distribute the files they
| downloaded.
| codedokode wrote:
| Downloading and using pirated software in a company is fine
| then as long as it is not shared outside? If what you
| describe is legal it makes no sense to pay for software.
| jasonlotito wrote:
| > Downloading a document is fine as long as it is not
| shared outside?
|
| I've fixed your question so that it accurately represents
| what I said and doesn't put words in my mouth.
|
| If I click on a link and download a document, is that
| illegal?
|
| I do not know if the person has the right to distribute
| it or not. IANAL, but when people were getting sued by
| the RIAA years back, it was never about downloading, but
| also distribution.
|
| As I said, IANAL, but feel free to correct me, but my
| understanding is that downloading a document from the
| internet is not illegal.
| CaptainFever wrote:
| > it was never about downloading, but also distribution.
|
| Did you mean to write "but about distribution" here?
| jasonlotito wrote:
| Yes, thank you for catching that. Unfortunately, I cannot
| edit it now.
| pyrale wrote:
| sci-hub suddenly becomes legal if all researchers adhere
| to one big company, apparently.
|
| After all, illegally downloading research papers in order
| to write new ones is highly transformative.
| AlotOfReading wrote:
| The legal context here is that "format shifting" has not
| previously been held to be sufficient for fair use on its
| own, and downloading for personal use has also been
| considered infringing. Just look at the numerous media
| industry lawsuits against individuals that only mention
| downloading, not sharing, for examples.
|
| It's a bit surprising that you can suddenly download
| copyrighted materials for personal use and it's kosher
| as long as you don't share them with others.
| jasonlotito wrote:
| > the numerous media industry lawsuits against
| individuals that only mention downloading,
|
| I never saw any of these. All the cases I saw were
| related to people using torrents or other P2P software
| (which aren't just downloading). These might exist, but I
| haven't seen them.
|
| > It's a bit surprising that you can suddenly download
| copyrighted materials for personal use and it's kosher as
| long as you don't share them with others.
|
| Every click on a link is a risk of downloading
| copyrighted material you don't have the rights to.
|
| Searching the internet, it appears that it's a civil
| infraction, but it's also confused with the notion that
| "piracy" is illegal, a term that's used for many
| different purposes. I see "It is illegal to download any
| music or movies that are copyrighted." under legal
| advice, which I know as a statement is not true.
|
| Hence my confusion.
|
| I should note: I'm not arguing from the perspective of
| whether it's morally or ethically right. Only that even
| in the context of this thread, things are phrased that
| aren't clear.
| AlotOfReading wrote:
| I just checked first individual suit I could find, which
| was BMG v. Gonzalez. She used P2P, but the case was
| specifically about her _downloading_, not
| redistributing.
| eikenberry wrote:
| Given that downloading requires you to copy the data to
| download it, I'd think it would fall under "adding new
| copies".
| jasonlotito wrote:
| > All Anthropic did was replace the print copies it had
| purchased ... with more convenient space-saving and
| searchable digital copies for its central library --
| without adding new copies..."
|
| That suggests otherwise.
| pier25 wrote:
| > _buying, physically cutting up, physically digitizing
| books, and using them for training is fair use_
|
| So Suno would only really need to buy the physical albums and
| rip them to be able to generate music at an industrial scale?
| theteapot wrote:
| Yes.
| pier25 wrote:
| Actually it remains to be seen.
|
| If you read the ruling, training was considered fair use
| in part because Claude is not a book generation tool.
| Hence it was deemed transformative. Definitely not what
| Suno and Udio are doing.
| ohdeargodno wrote:
| Only if the physical albums don't have copy protection,
| otherwise you're circumventing it and that's illegal. Or
| is it, against the right to private copy? If anything, AI
| at least shows that all of the existing copyright laws are
| utter bullshit made to make Disney happy.
|
| Do keep in mind though: this is only for the wealthy.
| They're still going to send the Pinkertons at your house if
| you dare copy a Blu-ray.
| zerocrates wrote:
| With some minor exceptions, CDs don't have copy
| protection.
| FateOfNations wrote:
| Minor exception: https://en.wikipedia.org/wiki/Sony_BMG_copy_protection_rootk...
| nilamo wrote:
| > They're still going to send the Pinkertons at your
| house if you dare copy a Blu-ray.
|
| Hey woah now, that's a Hasbro play, not a Disney one.
| kbelder wrote:
| No, because they can just play the album for the AI to
| learn. AI training can be set up to exploit the analog
| hole. Same with images/movies.
| itronitron wrote:
| If it's fair use to train a model, that doesn't necessarily
| imply that the model can be legally used to generate
| anything.
| pier25 wrote:
| I've been reading a bit more about this. The training
| might not be considered fair use if it's not considered
| transformative.
|
| Claude has been considered transformative given it's not
| really meant to generate books but Suno or Midjourney are
| absolutely in another category.
| make3 wrote:
| this is funny and potentially accurate
| protocolture wrote:
| Well there was that legal company who trained an LLM on
| their opposition's legal documents and then generated
| their own. I don't think inputs or outputs were ruled
| legal in that regard.
|
| But as long as the model isn't outputting infringing works
| there's not really any issue there either.
| jbverschoor wrote:
| Same how it works in the Netherlands.
| conradev wrote:
| Yes! Training and generation are fair use. You are free to
| train and generate whatever you want in your basement for
| whatever purpose you see fit. Build a music collection, go
| ham.
|
| If the output from said model uses the voice of another
| person, for example, we already have a legal framework in
| place for determining if it is infringing on their rights,
| independent of AI.
|
| Courts have heard cases of individual artists copying
| melodies, because melodies themselves are copyrightable:
| https://www.hypebot.com/hypebot/2020/02/every-possible-melod...
|
| Copyright law is a lot more nuanced than anyone seems to
| have the attention span for.
| pier25 wrote:
| > _Yes!_
|
| But Suno is definitely not training models in their
| basement for fun.
|
| They are a private company selling music, using music
| made by humans to train their models, to replace human
| musicians and artists.
|
| We'll see what the courts say but that doesn't sound like
| fair use.
| conradev wrote:
| My understanding is that Suno does _not sell music_, but
| instead makes a tool for musicians to generate music and
| sells access to this _tool_.
|
| The law doesn't distinguish between basement and cloud -
| it's a service. You can sell access to the service
| without selling songs to consumers.
| burnt-resistor wrote:
| So not only did they pirate works but they destroyed
| possibly collectible physical copies too. Kafkaesque.
| bigyabai wrote:
| Google set the precedent for this with an _even less
| transformative_ use case:
| https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,....
| AnthonyMouse wrote:
| > That is, he ruled that
|
| > - buying, physically cutting up, physically digitizing
| books, and using them for training is fair use
|
| > - pirating the books for their digital library is not fair
| use.
|
| That seems inconsistent with one another. If it's fair use,
| how is it piracy?
|
| It also seems pragmatically trash. It doesn't do the authors
| any good for the AI company to buy _one_ copy of their book
| (and a used one at that), but it _does_ make it much harder
| for smaller companies to compete with megacorps for AI stuff,
| so it's basically the stupidest of the plausible outcomes.
| MrJohz wrote:
| These are two separate actions that Anthropic did:
|
| * They downloaded a massive online library of pirated books
| that someone else was distributing illegally. This was not
| fair use.
|
| * They then digitised a bunch of books that they physically
| owned copies of. This was fair use.
|
| This part of the ruling is pretty much existing law. If you
| have a physical book (or own a digital copy of a book), you
| can largely do what you like with it within the confines of
| your own home, including digitising it. But you are not
| allowed to distribute those digital copies to others, nor
| are you allowed to download other people's digital copies
| that you don't own the rights to.
|
| The interesting part of this ruling is that once Anthropic
| had a legal digital copy of the books, they could use it
| for training their AI models and then release the AI
| models. According to the judge, this counts as fair use
| (assuming the digital copies were legally sourced).
| AnthonyMouse wrote:
| > This part of the ruling is pretty much existing law. If
| you have a physical book (or own a digital copy of a
| book), you can largely do what you like with it within
| the confines of your own home, including digitising it.
| But you are not allowed to distribute those digital
| copies to others, nor are you allowed to download other
| people's digital copies that you don't own the rights to.
|
| Can you point me to the US Supreme Court case where this
| is existing law?
|
| It's pretty clear that if you have a physical copy of a
| book, you can lend it to someone. It also seems pretty
| reasonable that the person borrowing it could make fair
| use of it, e.g. if you borrow a book from the library to
| write a book review and then quote an excerpt from it. So
| the only thing that's left is, what if you do the same
| thing over the internet?
|
| Shouldn't we be able to distinguish this from the case
| where someone is distributing multiple copies of a work
| without authorization and the recipients are each making
| and keeping permanent copies of it?
| MrJohz wrote:
| I cannot point to the case, because my entire knowledge
| about the legality of this stuff comes from vaguely
| following the articles about this case. But feel free to
| read the judgement in this case where it will be spelled
| out in much more detail.
|
| Also, I don't quite understand how your example is
| relevant to the case. If you give a book to a friend,
| they are now the owner of that book and can do what they
| like with it. If you photocopy that book and give them
| the photocopy, they are not the owner of the book and you
| have reproduced it without permission. The same is, I
| believe, true of digital copies - this is how ebook
| libraries work.
|
| In this case, Anthropic were the legal owners of the
| physical books, and so could do what they wanted with
| them. They were not the legal owners of the digital
| books, which means they can get prosecuted for copyright
| infringement.
| AnthonyMouse wrote:
| > If you give a book to a friend, they are now the owner
| of that book and can do what they like with it.
|
| We're talking about lending rather than ownership
| transfers, though of course you could regard lending as a
| sort of ownership transfer with an agreement to transfer
| it back later.
|
| > If you photocopy that book and give them the photocopy,
| they are not the owner of the book and you have
| reproduced it without permission.
|
| But then the question is whether the copy is fair use,
| not who the owner of the original copy was, right? For
| example, you can make a fair use photocopy of a page from
| a library book.
|
| > They were not the legal owners of the digital books,
| which means they can get prosecuted for copyright
| infringement.
|
| Even if the copy they make falls under fair use and the
| person who does own that copy of the book has no
| objection to their doing this?
| op00to wrote:
| It is "established" law because the Copyright Act itself
| and a string of unanimous or near-unanimous appellate
| decisions (see ReDigi on digital transfers, and Sony and
| the first-sale doctrine for personal use and physical lending)
| uniformly apply the same principles, leaving no circuit
| split and no conflicting precedent for the Supreme Court
| to resolve. In the U.S. system statutory text interpreted
| consistently by the Courts of Appeals becomes binding law
| nationwide unless and until the Supreme Court or Congress
| says otherwise.
| AnthonyMouse wrote:
| Sony v. Universal _is_ a Supreme Court case, but that's
| the one where they say that sort of thing is fair use
| rather than that it isn't. ReDigi isn't a Supreme Court
| case, and it seems rather inconsistent with the Sony case
| which is. To claim uniformity you'd then need all the
| other circuit courts coming to the same conclusion rather
| than just not having had any relevant cases there yet,
| but is that the case?
| cusaitech wrote:
| The judge said they can train; however, I believe the judge
| did not make any ruling regarding model outputs.
| icelancer wrote:
| > You skipped quotes about the other important side:
|
| He said:
|
| > It was always somewhat obvious that pirating a library
| would be copyright infringement.
|
| ??
| alok-g wrote:
| AFAIK, Judge Vince Chhabria has countered that Fair Use
| argument in a later order involving Meta.
|
| https://www.courtlistener.com/docket/67569326/598/kadrey-v-m...
|
| Note: I am not a lawyer.
| franczesko wrote:
| Is fruit of the poisonous tree rule applicable here?
| sershe wrote:
| I'm not sure how I feel about what Anthropic did on merit, as a
| matter of scale, but from a legalistic standpoint how is it
| different from using the book to train the meat model in my
| head? I could even learn bits by heart and quote them in
| context.
| MaxPock wrote:
| How times change. They wanted to lock up Aaron Swartz for
| life for essentially doing the same thing Anthropic is doing.
| guywithahat wrote:
| If you own a book, it should be legal for your computer to take a
| picture of it. I honestly feel bad for some of these AI companies
| because the rules around copyright are changing just to target
| them. I don't owe copyright to every book I read because I may
| subconsciously incorporate their ideas into my future work.
| raincole wrote:
| Are we reading the same article? The article explicitly states
| that it's okay to cut up and scan the books you own to train a
| model from them.
|
| > I honestly feel bad for some of these AI companies because
| the rules around copyright are changing just to target them
|
| The ruling would be a huge win for AI companies if upheld. It's
| really weird that you reached the opposite conclusion.
| rapind wrote:
| Everything is different at scale. I'm not giving a specific
| opinion on copyright here, but it just doesn't make sense when
| we try to apply individual rights and rules to systems of
| massive scale.
|
| I really think we need to understand this as a society and also
| realize that moneyed interests will downplay this as much as
| possible. A lot of the problems we're having today are due to
| insufficient regulation differentiating between individuals and
| systems at scale.
| organsnyder wrote:
| The difference here is that an LLM is a mechanical process. It
| may not be deterministic (at least, in a way that my brain
| understands determinism), but it's still a machine.
|
| What you're proposing is considering LLMs to be equal to humans
| when considering how original works are created. You could make
| the argument that LLM training data is no different from a
| human "training" themself over a lifetime of consuming content,
| but that's a philosophical argument that is at odds with our
| current legal understanding of copyright law.
| kevinpet wrote:
| That's not a philosophical argument at odds with our current
| understanding of copyright law. That's exactly what this
| judge found copyright law currently is and it's quoted in the
| article being discussed.
| organsnyder wrote:
| Thanks for pointing that out. Obviously I hadn't read the
| whole article. That is an interesting determination the
| judge made:
|
| > Alsup ruled that Anthropic's use of copyrighted books to
| train its AI models was "exceedingly transformative" and
| qualified as fair use, a legal doctrine that allows certain
| uses of copyrighted works without the copyright owner's
| permission.
| JoeAltmaier wrote:
| There are still questions: is an AI a 'user' in the
| copyright sense?
|
| Or even, is an individual operating within the law as
| fair use, the same as a voracious all-consuming AI
| training bot consuming everything the same in spirit?
|
| Consider a single person in a National Park, allowed to
| pick and eat berries, compared to bringing a combine
| harvester to take it all.
| zerotolerance wrote:
| "Judge says training Claude on books was fair use, but piracy
| wasn't."
| atomicnumber3 wrote:
| The core problem here is that copyright already doesn't
| actually follow any consistent logical reasoning. "Information
| wants to be free" and so on. So our own evaluation of whether
| anything is fair use or copyrighted or infringement thereof is
| always going to be exclusively dictated by whatever a judge's
| personal take on the pile of logical contradictions is.
| Remember, nominally, the sole purpose of copyright is not
| rooted in any notions of fairness or profitability or anything.
| It's specifically to incentivize innovation.
|
| So what is the right interpretation of the law with regards to
| how AI is using it? What better incentivizes innovation? Do we
| let AI companies scan everything because AI is innovative? Or
| do we think letting AI vacuum up creative works to then
| stochastically regurgitate tiny (or not so tiny) slices of them
| at a time will hurt innovation elsewhere?
|
| But obviously the real answer here is money. Copyright is
| powerful because monied interests want it to be. Now that
| copyright stands in the way of monied interests for perhaps the
| first time, we will see how dedicated we actually were to
| whatever justifications we've been seeing for DRM and copyright
| for the last several decades.
| Bjorkbat wrote:
| Something missed in arguments such as these is that in
| measuring fair use there's a consideration of impact on the
| potential market for a rightsholder's present and future works.
| In other words, can it be proven that what you are doing is
| meaningfully depriving the author of future income.
|
| Now, in theory, you learning from an author's works and
| competing with them in the same market could meaningfully
| deprive them of income, but it's a very difficult argument to
| prove.
|
| On the other hand, with AI companies it's an easier argument to
| make. If Anthropic trained on _all_ of your books (which is
| somewhat likely if you're a fairly popular author) and you saw
| a substantial loss of income after the release of one of their
| better models (presumably because people are just using the LLM
| to write their own stories rather than buy your stuff), then
| it's a little bit easier to connect the dots. A company used
| your works to build a machine that competes with you, which
| arguably violates the fair use principle.
|
| Gets to the very principle of copyright, which is that you
| shouldn't have to compete against "yourself" because someone
| copied you.
| parliament32 wrote:
| > a consideration of impact on the potential market for a
| rightsholder's present and future works
|
| This is one of those mental gymnastics exercises that makes
| copyright law so obtuse and effectively unenforceable.
|
| As an alternative, imagine a scriptwriter buys a textbook on
| orbital mechanics, while writing Gravity (2013). A large
| number of people watch the finished film, and learn something
| about orbital mechanics, therefore not needing the textbook
| anymore, causing a loss of revenue for the textbook author.
| Should the author be entitled to a percentage of Gravity's
| profit?
|
| We'd be better off abolishing everything related to copyright
| and IP law altogether. These laws might've made sense back
| in the days of the printing press but they're just
| nonsensical nowadays.
| Bjorkbat wrote:
| Personally I think a more effective analogy would be if
| someone used a textbook and created an online course /
| curriculum effective enough that colleges stop recommending
| the purchase of said textbook. It's honestly pretty
| difficult to imagine a movie having a meaningful impact on
| the sale of textbooks since they're required for high
| school / college courses.
|
| So here's the thing, I don't think a textbook author going
| against a purveyor of online courseware has much of a
| chance, nor do I think it _should_ have much of a chance,
| because it probably lacks meaningful proof that their works
| made a contribution to the creation of the courseware.
| Would I feel differently if the textbook author could prove
| in court that a substantial amount of their material
| contributed to the creation of the courseware, and when I
| say "prove" I mean they had receipts to prove it? I think
| that's where things get murky. If you can actually prove
| that your works made a meaningful contribution to the thing
| that you're competing against, then maybe you have a point.
| The tricky part is defining meaningful. An individual
| author doesn't make a meaningful contribution to the
| training of an LLM, but a large number of popular and/or
| prolific numbers can.
|
| You bring up a good point, interpretation of fair use is
| difficult, but at the end of the day I really don't think
| we should abolish copyright and IP altogether. I think it's
| a good thing that creative professionals have some security
| in knowing that they have legal protections against having
| to "compete against themselves"
| TeMPOraL wrote:
| > _An individual author doesn't make a meaningful
| contribution to the training of an LLM, but a large
| number of popular and/or prolific numbers can._
|
| That's a point I normally use to argue _against_ authors
| being entitled to royalties on LLM outputs. An individual
| author's marginal contribution to an LLM is essentially
| nil, and could be removed from the training set with no
| meaningful impact on the model. It's only the
| accumulation of a very large amount of works that turns
| into a capable LLM.
| shrubble wrote:
| Let's say my AI company is training an AI on woodworking books
| and at the end, it will describe in text and wireframe drawings
| (but not the original or identical photos) how to do a particular
| task.
|
| If I didn't license all the books I trained on, am I not
| depriving the publisher of revenue, given people will pay me for
| the AI instead of buying the book?
| mathiaspoint wrote:
| The same argument applies to someone who learned from the book
| and wrote an article explaining the idea to someone else.
| mrkstu wrote:
| If you paid a human author to do the same you'd be breaking no
| law. Learning is the point of books existing in the first
| place.
| NoOn3 wrote:
| Humans learning, not machines learning is the point of books.
| hellohihello135 wrote:
| It's easy to point fingers at others. Meanwhile the top comment
| in this thread links to stolen content from Business Insider.
| fakeBeerDrinker wrote:
| How is it stolen from Business Insider? When I visit
| businessinsider.com/anthropic-cut-pirated-millions-used-books-train-claude-copyright-2025-6
| I get the same story. My browser caches the story, and I save
| it for archival purposes. How is this theft?
| hellohihello135 wrote:
| BI decides who can access this content and who will get the
| paywall. The link to archive page allows people to access
| this content without permission. That's called stealing.
| fakeBeerDrinker wrote:
| When I hop on a VPN and enter incognito mode from a clean
| browser session, bypassing their paywall, is that stealing?
| This doesn't meet the definition of stealing that I'm
| familiar with.
| jtrn wrote:
| Best goddamn comment in this whole thread. Now we can have fun
| reading the mental gymnastics!
| adolph wrote:
| Alsup detailed Anthropic's training process with books: The
| OpenAI rival spent "many millions of dollars" buying used
| print books, which the company or its vendors then
| stripped of their bindings, cut the pages, and scanned
| into digital files.
|
| I've noticed an increase in used book prices in the recent past
| and now wonder if there is an LLM effect in the market.
| codedokode wrote:
| If AI companies are allowed to use pirated material to create
| their products, does it mean that everyone can use pirated
| software to create products? Where is the line?
|
| Also, please don't use the word "learning"; use "creating software
| using copyrighted materials".
|
| Also let's think together how can we prevent AI companies from
| using our work using technical measures if the law doesn't work?
| rvnx wrote:
| ~1B USD in cash is the line where laws apply very differently
| redcobra762 wrote:
| It's abusive and wrong to try and prevent AI companies from
| using your works at all.
|
| The whole point of copyright is to ensure you're paid for your
| work. AI companies shouldn't pirate, but if they pay for your
| work, they should be able to use it however they please,
| including training an LLM on it.
|
| If that LLM reproduces your work, then the AI company is
| violating copyright, but if the LLM doesn't reproduce your
| work, then you have not been harmed. Trying to claim harm when
| you haven't been due to some philosophical difference in
| opinion with the AI company is an abuse of the courts.
| codedokode wrote:
| It is not wrong at all. The author decides what to do with
| their work. AI companies are rich and can simply buy the
| rights or hire people to create works.
|
| I could agree with exceptions for non-commercial activity
| like scientific research, but AI companies are made for
| extracting profits and not for doing research.
|
| > AI companies shouldn't pirate, but if they pay for your
| work, they should be able to use it however they please,
| including training an LLM on it.
|
| It doesn't work this way. If you buy a movie it doesn't mean
| you can sell goods with movie characters.
|
| > then you have not been harmed.
|
| I am harmed because fewer people will buy the book if they can
| simply get an answer from an LLM. Fewer people will hire me to
| write code if an LLM trained on my code can do it. Maybe
| instead of books we should start making applications that
| protect the content and do not allow copying text or making
| screenshots. And instead of open-source code we should
| provide binary WASM modules.
| redcobra762 wrote:
| If you reproduce the material from a work you've purchased
| then of course you're in violation of copyright, but that's
| not what an LLM does (and when it does I already conceded
| it's in violation and should be stopped). An LLM that
| _doesn't_ "sell goods with movie characters" is not in
| violation.
|
| And the harm you describe is not a recognized harm. You
| don't own information, you own creative works in their
| entirety. If your work is simply a reference, then the fact
| being referenced isn't something you own, thus you are not
| harmed if that fact is shared elsewhere.
|
| It is an abuse of the courts to attempt to prevent people
| who have purchased your works from using those works to
| train an LLM. It's morally wrong.
| codedokode wrote:
| To load a printed book into a computer one has to
| reproduce it in digital form without authorization.
| That's making a copy.
| redcobra762 wrote:
| Making a digital copy of a physical book is fair use
| under every legal structure I am aware of.
|
| When you do it for a transformative purpose (turning it
| into an LLM model) it's certainly fair use.
|
| But more importantly, it's _ethical_ to do so, as the
| agreement you've made with the person you've purchased
| the book from included permission to do exactly that.
| seadan83 wrote:
| Per the ruling, the problem is the books were not
| purchased, they were downloaded from black market
| websites. It's akin to shoplifting, what you do later
| with the goods is a different matter.
|
| Reasonable minds could debate the ethics of how the
| material was used, this ruling judged the usage was legal
| and fair use. The only problem is the material was in
| effect stolen.
| CaptainFever wrote:
| > It is worse than ineffective; it is wrong too, because
| software developers should not exercise such power over
| what users do. Imagine selling pens with conditions about
| what you can write with them; that would be noisome, and
| we should not stand for it. Likewise for general
| software. If you make something that is generally useful,
| like a pen, people will use it to write all sorts of
| things, even horrible things such as orders to torture a
| dissident; but you must not have the power to control
| people's activities through their pens. It is the same
| for a text editor, compiler or kernel.
|
| Sorry for the long quote, but basically this, yeah. A
| major point of free software is that creators should not
| have the power to impose arbitrary limits on the users of
| their works. It is unethical.
|
| It's why the GPL allows the user to disregard any
| additional conditions, why it's viral, and why the FSF
| spends so much effort on fighting "open source but..."
| licenses.
| CaptainFever wrote:
| > Maybe instead of books we should start making
| applications that protect the content and do not allow
| copying text or making screenshots.
|
| https://en.wikipedia.org/wiki/Analog_hole
| DrillShopper wrote:
| > The whole point of copyright is to ensure you're paid for
| your work.
|
| No. The point of copyright is that the author gets to decide
| under what terms their works are copied. That's the essence
| of copyright. In many cases, authors will happily sell you a
| copy of their work, but they're under no obligation to do so.
| They can claim a copyright and then never release their work
| to the general public. That's perfectly within their rights,
| and they can sue to stop anybody from distributing copies.
| redcobra762 wrote:
| We're operating under a model where the owner of the
| copyright _has_ already sold their work. And while it's
| within their rights to stipulate conditions of the sale,
| they did not do that, and fair use of the work as governed
| under the laws the book was sold under encompasses its
| conversion into an LLM model.
|
| If the author didn't want their work to be included in an
| LLM, they should not have sold it, just like if an author
| didn't want their work to inspire someone else's work, they
| should not have sold it.
| DrillShopper wrote:
| > fair use of the work as governed under the laws the
| book was sold under encompasses its conversion into an
| LLM model
|
| If that were the case, then this court case would not be
| ongoing.
| lcnPylGDnU4H9OF wrote:
| That seems to be a misunderstanding of what's disputed.
| One fact that is disputed is whether or not the use of
| the work qualifies as fair use and the judge determined
| that it is because the result is sufficiently
| transformative. Another disputed fact is whether the
| books were acquired legally and the judge determined that
| they were not. The reason the case is still ongoing is to
| determine Anthropic's liability for illegally acquiring
| copies of the books, not to determine the legal status of
| the LLMs.
| seadan83 wrote:
| Yeah, this is part of the ruling. The judge decided that
| the usage was sufficiently transformative and thus fair
| use. The issue is the authors were selling their works
| and the company went to a black market instead.
| 827a wrote:
| Current copyright law is not remotely sophisticated enough to
| make determinations on AI fair use. Whether the courts say
| current AI use is fair is irrelevant to the discussion most
| people on this side would agree with: That we need new laws.
| The work the AI companies stole to train on was created under
| a copyright regime where the expectation was that, eh, a few
| people would learn from and be inspired from your work, and
| that feels great because you're empowering other humans.
| Scale does not amplify Good. The regime has changed. The
| expectations about what kinds of use copyright protects
| against have fundamentally changed. The AI companies invented
| New Horrors that no one could have predicted, Vader altered
| the deal, no reasonable artist except the most forward-
| thinking sci-fi authors would have remotely guessed what
| their work would be used for, and thus could never have
| consciously and fairly agreed to this exchange. Very few would
| have agreed to it.
| xdennis wrote:
| > It's abusive and wrong to try and prevent AI companies from
| using your works at all.
|
| People don't view moral issues in the abstract.
|
| A better perspective on this is the fact that human
| individuals have created works which megacorps are training
| on for free or for the price of a single book and creating
| models which replace individuals.
|
| The megacorps are only partially replacing individuals now,
| but when the models get good enough they could replace humans
| entirely.
|
| When such a future happens will you still be siding with them
| or with individual creators?
| whycome wrote:
| > A better perspective on this is the fact that human
| individuals have created works which megacorps are training
| on for free or for the price of a single book and creating
| models which replace individuals.
|
| Those damn kind readers and libraries. Giving their single
| copy away when they just paid for the single.
| whycome wrote:
| But the AI used the content to learn how to copy and recreate
| it. Is 're-creation' a better concept for us?
|
| People already use pirated software for product creation.
|
| Hypothetical:
|
| I know a guy who learned photoshop on a pirated copy of
| Photoshop. He went on to be a graphic designer. All his
| earnings are 'proceeds from crime'
|
| He never used the pirated software to produce content.
| timeon wrote:
| So can we officially download pirated content to learn stuff
| now?
| southernplaces7 wrote:
| Sure, and I feel zero moral qualms about me or anyone else
| doing it. The vast majority of the shit flows from
| the other direction towards individuals and consumers when
| it comes to content delivery companies and worse still,
| software companies. Let's address that before wringing our
| hands about individual acts of "piracy", even at scale.
|
| I could, right now in just a few minutes, go download a
| perfectly functional pirated copy of nearly any Adobe
| program, nearly any Microsoft program and a whole range of
| books and movies, yet I see zero real financial troubles
| affecting any of the companies behind these. All the
| contrary in fact.
| whycome wrote:
| How often does a link get posted here of content that is
| behind a paywall? If you bypass it to read it, didn't you
| just learn via illegal content? I'm not sure where the
| "official" comes in, but it's clearly widely accepted.
|
| If you watch a YouTube video to learn something and it's
| later taken down for using copyrighted images, you learned
| from illegal content.
| megaman821 wrote:
| Where are you reading that?
|
| You are allowed to buy and scan books, and then use those
| scanned books to create products. I guess you are also allowed
| to pirate books and use the knowledge to create products if you
| are willing to pay the damages to the rights holders for
| copyright violations.
| stackedinserter wrote:
| When I was young and poor I learned on pirated software. Do I
| owe Adobe, Microsoft and others a percentage of my income
| today?
| koolala wrote:
| Anyone read the 2006 sci-fi book Rainbow's End that has this? It
| was set in 2025.
| solfox wrote:
| I was 100% thinking this. GREAT book. And they, too, shredded
| books to ingest them into the digital library! I don't recall
| if it was an attempt to bypass copyright though; in Rainbow's
| End, it was more technical, as it was easier to shred, scan the
| pieces, and reassemble them in software, rather than scanning
| each page.
| Uhhrrr wrote:
| From Vinge's "Rainbow's End":
|
| > In fact this business was the ultimate in deconstruction: First
| one and then the other would pull books off the racks and toss
| them into the shredder's maw. The maintenance labels made calm
| phrases of the horror: The raging maw was a "NaviCloud custom
| debinder." The fabric tunnel that stretched out behind it was a
| "camera tunnel...." The shredded fragments of books and magazine
| flew down the tunnel like leaves in tornado, twisting and
| tumbling. The inside of the fabric was stitched with thousands of
| tiny cameras. The shreds were being photographed again and again,
| from every angle and orientation, till finally the torn leaves
| dropped into a bin just in front of Robert. Rescued data.
| BRRRRAP! The monster advanced another foot into the stacks,
| leaving another foot of empty shelves behind it.
| microtherion wrote:
| Yes, I was thinking of this passage as well. The technology
| does not seem to have advanced to this particular point yet.
| codedokode wrote:
| > "Like any reader aspiring to be a writer, Anthropic's LLMs
| trained upon works not to race ahead and replicate or supplant
| them -- but to turn a hard corner and create something
| different," he wrote.
|
| But this analogy seems wrong. First, an LLM is not a human and
| cannot "learn" or "train"; only a human can do that. And LLM
| developers are not aspiring to become writers and do not learn
| anything; they just want to profit by making software using
| copyrighted material. Also, people do not read millions of books
| to become writers.
| CaptainFever wrote:
| > But this analogy seems wrong. First, an LLM is not a human and
| > cannot "learn" or "train"; only a human can do that.
|
| The analogy refers to humans using machines to do what would
| already be legal if they did it manually.
|
| > And LLM developers are not aspiring to become writers and do
| not learn anything, they just want to profit by making software
| using copyrighted material.
|
| [Citation needed], and not a legal argument.
|
| > Also people do not read millions of books to become a writer.
|
| But people do hear millions of words as children.
| m4rtink wrote:
| Does anyone else think destroying books for any reason is wrong?
|
| Or is it perhaps not a universal cultural/moral value?
|
| I guess for example in Europe people could be more sensitive to
| it.
| lawlessone wrote:
| If they aren't one of a kind and they digitally preserved them
| in some way, I think I would be OK with it.
|
| That said, there are tools for digitizing books that don't
| require destroying them.
| stackedinserter wrote:
| There's nothing sacred about books. There are plenty of books
| that won't be missed if destroyed.
| kbelder wrote:
| I have purposefully destroyed one book in my life, in order to
| prevent anyone from reading it:
|
| _Man of Two Worlds_ by Brian Herbert.
|
| ...and I did the world a favor.
| codedokode wrote:
| By the way, I wonder if the recent advances in protecting YouTube
| videos from downloaders like yt-d*p are driven by an unwillingness
| to help rival AI companies gather datasets.
| lvl155 wrote:
| It's marginally better than Meta torrenting z-lib.
| randomNumber7 wrote:
| I will never feel bad again for learning from copied books /S
| jimnotgym wrote:
| Hang on, it is OK under copyright law to scan a book I bought
| second hand, destroy the hard copy and keep the scan in my online
| library? That doesn't seem to chime with the copyright notices I
| have read in books.
| badlibrarian wrote:
| First sale doctrine gives the person who sold the book you
| bought the right to sell it to you. Fair Use permits you to
| scan your copy, used or new. It's your book, you can destroy
| it. But you have to delete your digital copy if you sell it or
| give it away. And you can't distribute your digital copy.
| kube-system wrote:
| Fair use can be a pretty gray area and details matter, but
| copying for personal use is frequently okay.
|
| > That doesn't seem to chime with the copyright notices I have
| read in books.
|
| You shouldn't get your legal advice from someone with skin in
| the game.
| platunit10 wrote:
| Every time an article like this surfaces, it always seems like
| the majority of tech folks believe that training AI on
| copyrighted material is NOT fair use, but the legal industry
| disagrees.
|
| Which of the following are true?
|
| (a) the legal industry is susceptible to influence and corruption
|
| (b) engineers don't understand how to legally interpret legal
| text
|
| (c) AI tech is new, and judges aren't technically qualified to
| decide these scenarios
|
| Most likely option is C, as we've seen this pattern many times
| before.
| rockemsockem wrote:
| Idk, I think most people in tech I talk to IRL think it is fair
| use?
|
| I think the overly liberal, non-tech crowd has become really
| vocal on HN as of late and your sample is likely biased by
| these people.
| 827a wrote:
| Armchair commentators, including myself, tend to be imprecise
| when speaking about whether something is illegal, versus
| something should be illegal. Sometimes due to a
| misunderstanding of the law, or an over-estimation of the
| court's authority, or an over-estimation of our legislature's
| productivity, or just because we're making conversation and
| like talking.
| CaptainFever wrote:
| > Every time an article like this surfaces, it always seems
| like the majority of tech folks believe that training AI on
| copyrighted material is NOT fair use
|
| Where are you getting your data from? My conclusions are the
| exact opposite.
|
| (Also, aren't judges by definition the only ones qualified to
| declare if it is _actually_ fair use? You could make a case
| that it _shouldn't_ be fair use, but that's different from it
| _being_ not fair use.)
| redcobra762 wrote:
| It's not likely you've actually gotten the opinion of the
| "majority of tech folks", just the most outspoken ones, and
| only in specific bubbles you belong to.
| OkayPhysicist wrote:
| There's a lot of conflation of "should/shouldn't" and
| "is/isn't". The comments by tech folk you're alluding to mostly
| think that it "shouldn't" be fair use, out of concern about the
| societal consequences, whereas judges are looking at it and
| saying that it "is" fair use, based on the existing law.
|
| Any reasonable reading of the current state of fair use
| doctrine makes it obvious that the process that turns _Harry
| Potter and the Sorcerer's Stone_ into "A computer program that
| outputs responses to user prompts about a variety of topics" is
| _wildly_ transformative, and thus the usage of the copyrighted
| material is probably covered by fair use.
| kube-system wrote:
| I know for sure (b) is true. Way too many people on technical
| forums read legal texts as if the process to interpret laws is
| akin to a compiler generating a binary.
| standardUser wrote:
| I don't understand at all the resistance to training LLMs on
| any and all materials available. Then again, I've always viewed
| piracy as compatible with markets and a democratizing force
| upon them. I thought (wrongly?) that this was the widespread
| progressive/leftist perspective: to err on the side of access
| to information.
| freshtake wrote:
| If I allegedly train off of your training, which was trained
| off of copyrighted content under fair use, we're good right?
|
| Just asking for a friend who's into this sort of thing.
| mrguyorama wrote:
| Seeing as (a) is true in the US Supreme Court, it's probably at
| least as true in the lower courts.
| Zufriedenheit wrote:
| Maybe to give something back to the pirates, Anthropic could
| upload all the books they have digitized to the archive? /s
| pmdr wrote:
| They've all done that, it should be obvious by now. Training on
| just freely available data only gets you so far.
| stackedinserter wrote:
| Everybody that wants to train an LLM, should buy every single
| book, every single issue of a magazine or a newspaper, and
| personally ask every person that ever left a comment on social
| media. /s
|
| If I were China, I would buy every lawyer to drown Western AI
| companies in lawsuits, because it's an easy way to win the AI race.
| IOT_Apprentice wrote:
| If Anthropic is funded by Amazon, they should have just asked
| Amazon for unlimited download of EVERY book in the Amazon book
| store, and all audio-books as well. It certainly would be faster
| than buying one copy of each and tearing it apart.
| godelski wrote:
| The solution has always been: show us the training data.
|
| As a researcher I've been furious that we publish papers where
| the research data is unknown. To add insult to injury, we have
| the _audacity_ to start making claims about "zero-shot", "low-
| shot", "OOD", and other such things. It is utterly laughable.
| These would be tough claims to make _even if we knew the data_,
| simply because of its size. But not knowing the data, it is
| outlandish. Especially because the presumptions are "everything
| on the internet." It would be like training on all of GitHub and
| then writing your own _simple_ programming questions to test an
| LLM[0]. Analyzing that amount of data is just intractable, and we
| currently do not have the mathematical tools to do so. But this
| is a much harder problem to crack when we're just conjecturing,
| and ultimately this makes interpretability more difficult.
|
| On top of all of that, we've been playing this weird legal game,
| where it seems that every company has had to cheat. I can
| understand how smaller companies turn to torrenting to compete,
| but when it is big names like Meta, Google, Nvidia, OpenAI
| (Microsoft), etc. it is just wild. This isn't even following the
| highly controversial advice of Eric Schmidt, "Steal everything,
| then if you get big, let the lawyers figure it out."[1] This is
| just "steal everything, even if you could pay for it." We're
| talking about the richest companies in the entire world: some
| of, if not _the_, richest companies ever to exist.
|
| Look, can't we just try to be a _little_ ethical? There is, in
| fact, enough money to go around. We've seen unprecedented growth
| in the last few years. It was only 2018 when Apple became the
| first trillion dollar company, 2020 when it became the second
| company to reach two trillion, and 2022 when it became the first
| three trillion dollar company. Now we have 10 companies north of
| the trillion dollar mark![3] (5 above $2T and 3 above $3T) These
| values have _exploded_ in the last 5 years! It feels difficult to
| say that we don't have enough money to do things better, or to at
| least not completely screw over "the little guy." I am unconvinced
| that these companies would be hindered if they had to broker some
| deal for training data. Hell, they're already going to war over
| data access.
|
| My point here is that these two things align. We're talking about
| how this technology is so dangerous (every single one of those
| CEOs has made that statement) and yet we can't remain remotely
| ethical? How can you shout "ONLY I CAN MAKE SAFE AI" while acting
| so unethically? There are always moral gray areas, but is this
| really one of them? I even say this as someone who has torrented
| books myself![4] We are holding back the data needed to make AI
| safe and interpretable while handing the keys to those who
| actively demonstrate that they should not hold the power. I don't
| understand why this is even that controversial.
|
| [0] Yes, this is a snipe at HumanEval. Yes, I will make the
| strong claim that the dataset was spoiled from day 1. If you
| doubt it, go read the paper and look at the questions
| (HuggingFace).
|
| [1] https://www.theverge.com/2024/8/14/24220658/google-eric-schm...
|
| [2] https://en.wikipedia.org/wiki/List_of_public_corporations_by...
|
| [3] https://companiesmarketcap.com/
|
| [4] I can agree it is wrong, but can we agree there is a big
| difference between a student torrenting a book and a
| billion/trillion dollar company torrenting millions of books? I
| even lean on the side of free access to information, and am a fan
| of Aaron Swartz and Sci-Hub. I make all my works available on
| arXiv. But we can recognize there's a big difference between a
| singular person doing this at a small scale and a huge
| multinational conglomerate doing it at a large scale. I can't even
| believe we so frequently compare these actions!
| damnesian wrote:
| seems like the "mis" is missing from the name.
| 2OEH8eoCRo0 wrote:
| Most of the comments missed the point. It's not that they trained
| on books, it's that they pirated the books.
| spandrew wrote:
| Amazon has been doing this since the 2000s. Fun fact: this is
| how AWS came about, as a way to scale its "LOOK INSIDE!" feature
| for all the books it was hoovering up in an attempt to kill the
| last benefit bookstores had over it.
|
| I.e., this is not a big deal. The only difference now is ppl are
| rapidly frothing to be outraged by the mere sniff of new tech on
| the horizon. Overton window in effect.
| ChrisArchitect wrote:
| Two week old news.
|
| Some previous discussions:
|
| https://news.ycombinator.com/item?id=44367850
|
| https://news.ycombinator.com/item?id=44381838
|
| https://news.ycombinator.com/item?id=44381639
| burnt-resistor wrote:
| 1980's: _Johnny No. 5 need input!_
|
| 2020's: (Steals a bunch of books to profit off acquired
| knowledge.)
| throwawayffffas wrote:
| The article doesn't say who is suing them. Is it a class action?
| How many of these 7M pirated books have they written? Is it
| publishing houses? How many of these books are relevant in this
| judgement?
| 1vuio0pswjnm7 wrote:
| Business Insider fails to include the Order
|
| https://ia800101.us.archive.org/15/items/gov.uscourts.cand.4...
___________________________________________________________________
(page generated 2025-07-07 23:01 UTC)