[HN Gopher] Anthropic destroyed millions of print books to build...
       ___________________________________________________________________
        
       Anthropic destroyed millions of print books to build its AI models
        
       Author : bayindirh
       Score  : 9 points
       Date   : 2025-06-25 21:06 UTC (1 hour ago)
        
 (HTM) web link (arstechnica.com)
 (TXT) w3m dump (arstechnica.com)
        
       | JohnFen wrote:
       | > In the process, the company cut millions of print books from
       | their bindings, scanned them into digital files, and threw away
       | the originals solely for the purpose of training AI
       | 
       | Oh boy. The more I learn about how genAI companies work, the more
       | detestable they appear to be.
        
         | ThrowawayR2 wrote:
         | You got suckered by the clickbait. Destructive scanning
         | (https://en.wikipedia.org/wiki/Book_scanning#Destructive_scan...)
         | isn't unusual for books that are common enough that an
         | individual volume is of no particular value.
        
           | bayindirh wrote:
           | I mean, they could have gotten e-book versions of the books,
           | or even preprint PDFs.
           | 
           | In an era where people are starting to calculate the
           | environmental impact of the jobs they run on the cloud and
           | start to optimize it, adding that much load on the recycling
           | system is not a wise choice, but only a selfish one.
        
             | ThrowawayR2 wrote:
             | I'm sure they would have loved to save the hassle and
             | expense of disassembling physical books. Presumably
             | something legal-related or cost-related prevented them from
             | going that route.
        
               | JohnFen wrote:
               | Yes, they did it as a workaround for copyright. TFA
               | explains that aspect.
        
             | AlotOfReading wrote:
             | I strongly suspect that dealing with ebooks on this scale
             | might actually be even more onerous than the physical
             | volumes.
             | 
             | The physical stuff is straightforward. Buy books from bulk
             | sellers, rip off everything and put them into off-the-shelf
             | rigs for digitization. It's straightforward, directly
             | scalable, can use any book, and your main issue is format
             | shifting, which Anthropic successfully argued here. No DRM,
             | you buy exactly the books you need, and every book is
             | processed exactly the same way.
             | 
             | If you try to buy ebooks, you get wrapped up in onerous
             | licensing terms about copying, and how you're able to use
             | them, how long you're able to access them, and so on. Many
             | books won't even be available (or can only be licensed
             | alongside a bunch of others) and you have to deal with DRM
             | you can't strip without creating additional copyright
             | issues.
             | 
             | We've somehow created a world where physical objects are
             | more free than bits.
        
           | JohnFen wrote:
           | I didn't get suckered by anything. I'm aware of the practice.
           | I find it objectionable. That they did this is just another
           | thing on the growing list of objectionable things that genAI
           | companies seem to enjoy doing.
           | 
           | To be honest, I probably wouldn't have even commented on it
           | if it were the only bad thing these companies do.
        
       | EA-3167 wrote:
       | I don't like Anthropic, I think their "marketing through fear"
       | approach is shitty, and frankly I'm over the AI "boom" anyway.
       | 
       | BUT... here's the only line in that whole article that really
       | matters, because this is a headline meant to create an impression
       | that isn't corrected for quite a while.
       | 
       | > The court documents don't indicate that any rare books were
       | destroyed in this process--Anthropic purchased its books in bulk
       | from major retailers
       | 
       | Books are routinely pulped and recycled, they aren't holy, and if
       | they aren't rare then frankly who cares what techniques they use
       | to scan them? The issue is whether or not "AI" learning
       | represents fair use, which the courts so far have ruled that it
       | does.
        
         | bayindirh wrote:
         | > any rare books were destroyed in this process
         | 
         | Does it matter? It's waste at the end of the day. Instead they
         | could have bought e-books. Just because we can recycle paper,
         | it doesn't mean we have the luxury to create waste as we see
         | fit, esp. when climate change has become this severe.
         | 
         | > which the courts so far have ruled that it does.
         | 
         | Any concrete cases you can cite?
         | 
         | From [0], for example, while the court said that the authors
         | failed to argue their case, the second observation is the
         | complete opposite of what you said. Citing the article directly:
         | Opinion suggests AI models do generally violate law.
         | 
         | In the same spirit, I think I can safely assume that they
         | violated copyright law, since they earn money by circumventing
         | it, and fair use doesn't like for-profit copying.
         | 
         | [0]: https://news.bloomberglaw.com/litigation/meta-beats-copyrigh...
        
           | kirrent wrote:
           | TFA is based on the ruling which found that Anthropic
           | training on these books was fair use.
        
           | robocat wrote:
           | > It's waste at the end of the day
           | 
           | Rubbish.
           | 
           | More likely they are taking a waste stream of books and
           | _reusing_ and possibly even recycling.
           | 
           | Few people want old books, and many people that have books
           | are throwing them out or donating them. I don't think I know
           | anybody under 30 with a bookshelf of books they obviously
           | intend to keep for life. Bookshelves used to be an elite
           | status symbol, now I often see them as image rather than
           | reference (e.g. part of a backdrop behind an influencer vid).
           | 
           | It is likely they didn't destroy much of value, since they
           | will have minimized their purchasing costs. Modern DRM is not
           | helping.
        
       ___________________________________________________________________
       (page generated 2025-06-25 23:01 UTC)