[HN Gopher] Core copyright violation moves ahead in The Intercep...
       ___________________________________________________________________
        
       Core copyright violation moves ahead in The Intercept's lawsuit
       against OpenAI
        
       Author : giuliomagnifico
       Score  : 289 points
        Date   : 2024-11-29 13:48 UTC (1 day ago)
        
 (HTM) web link (www.niemanlab.org)
 (TXT) w3m dump (www.niemanlab.org)
        
       | philipwhiuk wrote:
       | It's extremely lousy that you have to pre-register copyright.
       | 
        | That would make the USCO a de facto clearinghouse for news.
        
         | throw646577 wrote:
         | You don't have to pre-register copyright in any Berne
          | Convention country. Your copyright exists from the moment you
         | create something.
         | 
         | (ETA: This paragraph below is diametrically wrong. Sorry.)
         | 
         | AFAIK in the USA, registered copyright is necessary if you want
         | to bring a lawsuit and get more than statutory damages, which
         | are capped low enough that corporations do pre-register work.
         | 
         | Not the case in all Berne countries; you don't need this in the
         | UK for example, but then the payouts are typically a lot lower
         | in the UK. Statutory copyright payouts in the USA can be enough
         | to make a difference to an individual author/artist.
         | 
         | As I understand it, OpenAI could still be on the hook for up to
          | $150K per article if it can be demonstrated that it is wilful
         | copyright violation. It's hard to see how they can argue with a
         | straight face that it is accidental. But then OpenAI is, like
         | several other tech unicorns, a bad faith manufacturing device.
        
           | Loughla wrote:
           | You seem to know more about this than me. I have a family
           | member who "invented" some electronics things. He hasn't done
           | anything with the inventions (I'm pretty sure they're
           | quackery).
           | 
            | But to secure his patent, he mailed himself a sealed copy of
           | the plans. He claims the postage date stamp will hold up in
           | court if he ever needs it.
           | 
           | Is that a thing? Or is it just more tinfoil business? It's
           | hard to tell with him.
        
             | throw646577 wrote:
             | Honestly I don't know whether that actually is a meaningful
             | thing to do anymore; it may be with patents.
             | 
             | It certainly used to be a legal device people used.
             | 
             | Essentially it is low-budget notarisation. If your family
             | member believes they have something which is timely and
             | valuable, it might be better to seek out proper legal
             | notarisation, though -- you'd consult a Notary Public:
             | 
             | https://en.wikipedia.org/wiki/Notary_public
        
             | WillAdams wrote:
              | It won't hold up in court, and given that the post office
              | will deliver unsealed letters (which may then be sealed
              | after the fact), it will be viewed rather dimly.
             | 
             | Buy your family member a copy of:
             | 
             | https://www.goodreads.com/book/show/58734571-patent-it-
             | yours...
        
               | Y_Y wrote:
               | Surely the NSA will retain a copy which can be checked
        
               | Tuna-Fish wrote:
               | Even if they did, it in fact cannot be checked. There is
               | precedent that you cannot subpoena NSA for their
               | intercepts, because exactly what has been intercepted and
               | stored is privileged information.
        
               | hiatus wrote:
               | > There is precedent that you cannot subpoena NSA for
               | their intercepts
               | 
               | I know it's tangential to this thread but could you link
               | to further reading?
        
               | ysofunny wrote:
               | but only in a real democracy
        
             | cma wrote:
              | The US moved to first-to-file years ago. Whoever files
              | first gets it, except that if the inventor publishes
              | publicly there is a 1-year grace period (which would not
              | apply to a self-mail or private mail to other people).
             | 
             | This is patent, not copyright.
        
             | Isamu wrote:
              | Mailing yourself a document by registered mail is a very
              | old tactic to establish a date for your documents using an
              | official government entity, so this can be meaningful in
              | court. However, it may not provide the protection he needs.
              | Copyright law differs from patent law, and he should seek
              | legal advice.
        
             | dataflow wrote:
             | Even if the date is verifiable, what would it even prove?
             | If it's not public then I don't believe it can count as
             | prior art to begin with.
        
             | blibble wrote:
             | presumably the intention is to prove the existence of the
             | specific plans at a specific time?
             | 
             | I guess the modern version would be to sha256 the plans and
             | shove it into a bitcoin transaction
             | 
             | good luck explaining that to a judge
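              | 
              | A minimal sketch of that hashing step (Python; the file
              | name plans.pdf is illustrative):
              | 
              |     import hashlib
              |     
              |     # Hash the plans; only this 32-byte digest would go
              |     # on-chain (e.g. in an OP_RETURN output), never the
              |     # plans themselves.
              |     with open("plans.pdf", "rb") as f:
              |         digest = hashlib.sha256(f.read()).hexdigest()
              |     
              |     # Revealing the file later proves it matched this
              |     # digest at the time of the transaction.
              |     print(digest)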
        
           | Isamu wrote:
           | Right, you can register before you bring a lawsuit. Pre-
           | registration makes your claim stronger, as does notice of
           | copyright.
        
           | dataflow wrote:
           | That's what I thought too, but why does the article say:
           | 
           | > Infringement suits require that relevant works were first
           | registered with the U.S. Copyright Office (USCO).
        
             | throw646577 wrote:
             | OK so it turns out I am wrong here! Cool.
             | 
              | I had it upside down/diametrically wrong, however you see
              | fit: right that the structures exist, exactly wrong on how
              | they apply.
             | 
             | It is registration that guarantees access to statutory
             | damages:
             | 
             | https://www.justia.com/intellectual-
             | property/copyright/infri...
             | 
             | Without registration you still have your natural copyright,
             | but you would have to try to recover the profits made by
             | the infringer.
             | 
             | Which does sound like more of an uphill struggle for The
             | Intercept, because OpenAI could maybe just say "anything we
             | earn from this is de minimis considering how much errr
             | similar material is errrr in the training set"
             | 
             | Oh man it's going to take a long time for me to get my
             | brain to accept this truth over what I'd always understood.
        
             | zerocrates wrote:
             | You have to register to sue, but you have the copyright
             | automatically at the moment the work is created.
             | 
             | You can go register after an infringement and still sue,
             | but you then won't be able to get statutory damages or
             | attorney's fees.
             | 
             | Statutory damages are a big deal in general but especially
             | here where proving how much of OpenAI's revenue is due to
             | your specific articles is probably impossible. Which is why
             | they're suing under this DMCA provision: it's not an
             | infringement suit so the registration requirement doesn't
             | apply, and there's a separate statutory damages provision
             | for it.
        
           | pera wrote:
           | > _It 's hard to see how they can argue with a straight face
           | that it is accidental_
           | 
           | It's another instance of "move fast, break things" (i.e.
           | "keep your eyes shut while breaking the law at scale")
        
             | renewiltord wrote:
             | Yes, because all progress depends upon the unreasonable
             | man.
        
       | 0xcde4c3db wrote:
       | The claim that's being allowed to proceed is under 17 USC 1202,
       | which is about stripping metadata like the title and author. Not
       | exactly "core copyright violation". Am I missing something?
        
         | anamexis wrote:
         | I read the headline as the copyright violation claim being core
         | to the lawsuit.
        
           | H8crilA wrote:
           | The plaintiffs focused on exactly this part - removal of
            | metadata - probably because it's the most likely to hold up
            | in court. One judge remarked on it pretty explicitly, saying
           | that it's just a proxy topic for the real issue of the usage
           | of copyrighted material in model training.
           | 
           | I.e., it's some legalese trick, but "everyone knows" what's
           | really at stake.
        
             | 0xcde4c3db wrote:
             | Yeah; I think that's essentially where the disconnect is
             | rooted for me. It seems to me (a non-lawyer, to be clear)
              | that it's _damn hard_ to make the case for model training
              | necessarily being meat-and-potatoes "infringement" as
              | things are defined in Title 17 Chapter 1. I see it as
              | firmly in the grey area between "a mere change of physical
              | medium or deterministic mathematical transformation clearly
              | isn't a defense against infringement on its own" and
              | "_giant toke_ come on, man, Terry Brooks was obviously just
              | ripping off Tolkien". There might be a tension between
             | what constitutes "substantial similarity" through analog
             | and digital lenses, especially as the question pertains to
             | those who actually distribute weights.
        
               | kyledrake wrote:
                | I think you're at the heart of it, and you've humorously
                | framed the grey area here; it's very weird. Sans a
               | ruling that, for example, computers are too deterministic
               | to be creative, copyright laws really seem to imply that
               | LLM training is legal. Learning and then creating
               | something new from what you learned isn't copyright
               | infringement, so what's the legal argument here? A ruling
               | declaring this copyright infringement is likely going to
               | have crazy ripple effects going way beyond LLMs,
               | something a good judge is going to be very mindful of.
               | 
                | Ultimately, this is probably going to require Congress to
               | create new laws to codify this.
        
               | mikae1 wrote:
                | According to US law, is the Internet Archive a library? I
                | know they received a DMCA exemption.
               | 
               | If so, you could argue that your local library returns
               | perfect copies of copyrighted works too. IMO it's somehow
               | different from a business turning the results of their
                | scraping into a profit machine.
        
               | kyledrake wrote:
               | My understanding is that there is no concept of a library
               | license and that you just say you're a library and
                | therefore become one, and whether your claim survives is
                | more a product of social and cultural acceptance than
                | actual legal structures, but someone is welcome to correct
                | me.
               | 
                | The Internet Archive also scrapes the web for content and
                | does not pay authors; the difference is that it spits out
                | literal copies of the content it scraped, whereas an LLM
                | fundamentally attempts to derive a new thing from the
                | knowledge it obtains.
               | 
               | I just can't figure out how to plug this into copyright
               | law. It feels like a new thing.
        
               | quectophoton wrote:
               | Also, Google Translate, when used to translate web pages:
               | 
               | > does not pay authors
               | 
               | Check.
               | 
               | > it spits out literal copies of the content it scraped
               | 
               | Check.
               | 
               | > attempts to derive a new thing from the knowledge it
               | obtains.
               | 
               | Check.
               | 
               | * Is interactive: Check.
               | 
               | * Can output text that sounds syntactically and
               | grammatically correct, but a human can instantly say
               | "that doesn't look right": Check.
               | 
               | * Changing one word in a sentence affects words in a
               | completely different sentence, because that changed the
               | context: Check.
        
         | CaptainFever wrote:
         | Also, is there really any benefit to stripping author metadata?
         | Was it basically a preprocessing step?
         | 
          | It seems to me that it shouldn't really affect model quality
          | all that much, should it?
         | 
         | Also, in the amended complaint:
         | 
         | > not to notify ChatGPT users when the responses they received
         | were protected by journalists' copyrights
         | 
         | Wasn't it already quite clear that as long as the articles
         | weren't replicated, it wasn't protected? Or is that still being
         | fought in this case?
         | 
         | In the decision:
         | 
          | I agree with Defendants. Plaintiffs allege that ChatGPT has
          | been trained on "a scrape of most of the internet," Compl.
          | ¶ 29, which includes massive amounts of information from
          | innumerable sources on almost any given subject. Plaintiffs
          | have nowhere alleged that the information in their articles is
          | copyrighted, nor could they do so. When a user inputs a
          | question into ChatGPT, ChatGPT synthesizes the relevant
          | information in its repository into an answer. Given the
          | quantity of information contained in the repository, the
          | likelihood that ChatGPT would output plagiarized content from
          | one of Plaintiffs' articles seems remote. And while Plaintiffs
          | provide third-party statistics indicating that an earlier
          | version of ChatGPT generated responses containing significant
          | amounts of plagiarized content, Compl. ¶ 5, Plaintiffs have
          | not plausibly alleged that there is a "substantial risk" that
          | the current version of ChatGPT will generate a response
          | plagiarizing one of Plaintiffs' articles.
        
           | freejazz wrote:
           | >Also, is there really any benefit to stripping author
           | metadata? Was it basically a preprocessing step?
           | 
           | Have you read 1202? It's all about hiding your infringement.
        
         | Kon-Peki wrote:
         | Violations of 17 USC 1202 can be punished pretty severely. It's
         | not about just money, either.
         | 
          | If, _during the trial_, the judge thinks that OpenAI is going
          | to be found to be in violation, he can order all of OpenAI's
          | computer equipment to be impounded. If OpenAI is found to be in
         | violation, he can then order permanent destruction of the
         | models and OpenAI would have to start over from scratch in a
         | manner that doesn't violate the law.
         | 
          | Whether you call that "core" or not, OpenAI cannot afford to
          | lose the parts of this lawsuit that are left.
        
           | zozbot234 wrote:
            | > he can order all of OpenAI's computer equipment to be
            | impounded.
           | 
           | Arrrrr matey, this is going to be fun.
        
             | Kon-Peki wrote:
             | People have been complaining about the DMCA for 2+ decades
             | now. I guess it's great if you are on the winning side. But
             | boy does it suck to be on the losing side.
        
               | immibis wrote:
               | And normal people can't get on the winning side. I'm
               | trying to get Github to DMCA my own repositories, since
               | it blocked my account and therefore I decided it no
               | longer has the right to host them. Same with Stack
               | Exchange.
               | 
                | GitHub's ignored me so far, and Stack Exchange explicitly
                | said no (then I sent them an even broader legal request
                | under the GDPR).
        
               | ralph84 wrote:
               | When you uploaded your code to GitHub you granted them a
               | license to host it. You can't use DMCA against someone
               | who's operating within the parameters of the license you
               | granted them.
        
               | tremon wrote:
               | Their stance is that GitHub revoked that license by
               | blocking their account.
        
               | Dylan16807 wrote:
               | Is it?
               | 
               | And what would connect those two things together?
        
               | immibis wrote:
               | GitHub's terms of service specify the license is granted
               | as necessary to provide the service. Since the service is
               | not provided they don't have a license.
        
               | Dylan16807 wrote:
               | Hosting the code is providing the service, whether you
               | have a working account or not.
               | 
               | Also was this code open source? Your stack exchange
               | contributions were open source, so they don't need any
               | ToS-based permission in the first place. They have access
               | under CC BY-SA.
        
             | immibis wrote:
             | It won't happen. Judges only order that punishment for the
             | little guys.
        
           | nickpsecurity wrote:
           | " If OpenAI is found to be in violation, he can then order
           | permanent destruction of the models and OpenAI would have to
           | start over from scratch in a manner that doesn't violate the
           | law."
           | 
           | That is exactly why I suggested companies train some models
           | on public domain and licensed data. That risk disappears or
           | is very minimal. They could also be used for code and
           | synthetic data generation without legal issues on the
           | outputs.
        
             | 3pt14159 wrote:
             | The problem is that you don't get the same quality of data
             | if you go about it that way. I love ChatGPT and I
             | understand that we're figuring out this new media landscape
             | but I really hope it doesn't turn out to neuter the models.
             | The models are really well done.
        
               | nickpsecurity wrote:
                | If I steal money, I can get way more done than I do now
                | by earning it legally. Yet, you won't see me regularly
                | dismissing legitimate jobs by posting comparisons to what
                | my numbers would look like if I were stealing I.P.
               | 
               | We must start with moral and legal behavior. Within that,
               | we look at what opportunities we have. Then, we pick the
               | best ones. Those we can't have are a side effect of the
               | tradeoffs we've made (or tolerated) in our system.
        
               | tremon wrote:
               | That is OpenAI's problem, not their victims'.
        
             | jsheard wrote:
             | That's what Adobe and Getty Images are doing with their
             | image generation models, both are exclusively using their
             | own licensed stock image libraries so they (and their
             | users) are on pretty safe ground.
        
               | nickpsecurity wrote:
               | That's good. I hope more do. This list has those doing it
               | under the Fairly Trained banner:
               | 
               | https://www.fairlytrained.org/certified-models
        
           | pnut wrote:
           | There would be a highly embarrassing walking back of such a
           | ruling, when Sam Altman flexes his political network and
           | effectively overrules it.
           | 
           | He spends his time amassing power and is well positioned to
           | plow over a speed bump like that.
        
       | james_sulivan wrote:
       | Meanwhile China is using everything available to train their AI
       | models
        
         | goatlover wrote:
         | We don't want to be like China.
        
           | tokioyoyo wrote:
           | Fair. But I made a comment somewhere else that, if their
           | models become better than ours, they'll be incorporated into
           | products. Then we're back to being depended on China for LLM
           | model development as well, on top of manufacturing.
           | Realistically that'll be banned because of National Security
           | laws or something, but companies tend to choose the path of
           | "best and cheapest" no matter what.
        
         | paxys wrote:
          | You think China is using uncensored news articles from Western
          | media to train its AI models?
        
           | dmead wrote:
           | Yes. And they're being marked as bad during the alignment
           | process.
        
           | warkdarrior wrote:
            | For sure. The models are definitely safety-tuned after pre-
            | training.
        
       | zb3 wrote:
       | Forecast: OpenAI and The Intercept will settle and OpenAI users
       | will pay for it.
        
         | jsheard wrote:
         | Yep, the game plan is to keep settling out of court so that
         | (they hope) no legal precedent is set that would effectively
         | make their entire business model illegal. That works until they
         | run out of money I guess, but they probably can't keep it up
         | forever.
        
           | echoangle wrote:
            | Wouldn't the better method be to throw all your money at one
           | suit you can make an example of and try to win that one? You
           | can't effectively settle every single suit if you have no
           | realistic chance of winning, otherwise every single publisher
           | on the internet will come and try to get their money.
        
             | lokar wrote:
             | Too high risk. Every year you can delay you keep lining
             | your pockets.
        
             | gr3ml1n wrote:
             | That's a good strategy, but you have to have the right
             | case. One where OpenAI feels confident they can win and
             | establish favorable precedent. If the facts of the case
             | aren't advantageous, it's probably not worth the risk.
        
         | tokioyoyo wrote:
          | Side question: why don't other companies get the same
         | attention? Anthropic, xAI and others have deep pockets, and
         | scraped the same data, I'm assuming? It could be a gold mine
         | for all these news agencies to keep settling out of court to
          | make a buck.
        
       | ysofunny wrote:
       | the very idea of "this digital asset is exclusively mine" cannot
       | die soon enough
       | 
       | let real physically tangible assets keep the exclusivity
       | _problem_
       | 
       | let's not undo the advantages unlocked by the digital internet;
       | let us prevent a few from locking down this grand boon of digital
       | abundance such that the problem becomes saturation of data
       | 
       | let us say no to digital scarcity
        
         | cess11 wrote:
         | I think you'll find that most people aren't comfortable with
         | this in practice. They'd like e.g. the state to be able to keep
         | secrets, such as personal information regarding citizens and
         | the stuff foreign spies would like to copy.
        
           | jMyles wrote:
           | Obviously we're all impacted in these perceptions by our
           | bubbles, but it would surprise me at this particular moment
           | in the history of US politics to find that most people favor
           | the existence of the state at all, let alone its ability to
           | keep secret personal information regarding citizens.
        
             | goatlover wrote:
             | Most people aren't anarchists, and think the state is
             | necessary for complex societies to function.
        
               | jMyles wrote:
               | My sense is that the constituency of people who prefer
               | deprecation of the US state is much larger than just
               | anarchists.
        
               | noitpmeder wrote:
                | Sounds like you exist in some very insulated bubbles.
        
               | warkdarrior wrote:
               | Would this deprecation of the state include disbanding
               | the police and the armed forces? I'm guessing the people
               | who are for the deprecation of the state would answer
               | quite differently if the question specified details of
               | government functions.
        
               | jMyles wrote:
               | ...I mean, police are deeply unpopular in the American
               | political consciousness, and have been since prior to
               | their rebrand from "slave patrols" in the 19th century.
               | Surely you recall that, only four years ago, millions of
               | people took to the streets calling for a completion to
               | the unfinished business of abolition?
               | 
               | Obviously the armed forces are much less despised than
               | the police. But given that private gun ownership is at an
                | all-time high (with women and people of color -
               | historically marginalized groups with regard to arms
               | equality - making up the lion's share of the recent
               | increase), I'm not sure that people are feeling
               | particularly vulnerable to invasion either.
               | 
               | Is the state really that popular in your circle? How do
               | people express their esteem? Am I just missing it?
        
             | cess11 wrote:
              | Really? Are Food Not Bombs and the IWW that popular where
              | you live?
        
         | CaptainFever wrote:
         | This is, in fact, the core value of the hacker ethos. _Hacker_
         | News.
         | 
         | > The belief that information-sharing is a powerful positive
         | good, and that it is an ethical duty of hackers to share their
         | expertise by writing open-source code and facilitating access
         | to information and to computing resources wherever possible.
         | 
         | > Most hackers subscribe to the hacker ethic in sense 1, and
         | many act on it by writing and giving away open-source software.
         | A few go further and assert that all information should be free
         | and any proprietary control of it is bad; this is the
         | philosophy behind the GNU project.
         | 
         | http://www.catb.org/jargon/html/H/hacker-ethic.html
         | 
         | Perhaps if the Internet didn't kill copyright, AI will.
         | (Hyperbole)
         | 
         | (Personally my belief is more nuanced than this; I'm fine with
         | very limited copyright, but my belief is closer to yours than
         | the current system we have.)
        
           | ysofunny wrote:
            | oh please, then, riddle me why my comment has -1 votes
            | on "hacker" news
            | 
            | which has indeed turned into "i-am-rich-cuz-i-own-tech-
            | stock" news
        
             | CaptainFever wrote:
             | Yes, I have no idea either. I find it disappointing.
             | 
             | I think people simply like it when data is liberated from
             | corporations, but hate it when data is liberated from them.
             | (Though this case is a corporation too so idk. Maybe just
             | "AI bad"?)
        
             | alwa wrote:
             | I did not contribute a vote either way to your comment
             | above, but I would point out that you get more of what you
             | reward. Maybe the reward is monetary, like an author paid
             | for spending their life writing books. Maybe it's smaller,
             | more reputational or social--like people who generate
             | thoughtful commentary here, or Wikipedia's editors, or
             | hobbyists' forums.
             | 
             | When you strip people's names from their words, as the
             | specific count here charges; and you strip out any reason
             | or even way for people to reward good work when they
             | appreciate it; and you put the disembodied words in the
             | mouth of a monolithic, anthropomorphized statistical model
             | tuned to mimic a conversation partner... what type of
             | thought is it that becomes abundant in this world you
             | propose, of "data abundance"?
             | 
             | In that world, the only people who still have incentive to
             | create are the ones whose content has _negative_ value, who
             | make things people otherwise wouldn't want to see:
             | advertisers, spammers, propagandists, trolls... where's the
             | upside of a world saturated with that?
        
           | onetokeoverthe wrote:
           | Creators freely sharing with attribution requested is
           | different than creations being ruthlessly harvested and
           | repurposed without permission.
           | 
           | https://creativecommons.org/share-your-work/
        
             | CaptainFever wrote:
             | > A few go further and assert that all information should
             | be free and any proprietary control of it is bad; this is
             | the philosophy behind the GNU project.
             | 
             | In this view, the ideal world is one where copyright is
             | abolished (but not moral rights). So piracy is good, and
             | datasets are also good.
             | 
             | Asking creators to license their work freely is simply a
             | compromise due to copyright unfortunately still existing.
             | (Note that even if creators don't license their work
             | freely, this view still permits you to pirate or mod it
             | against their wishes.)
             | 
             | (My view is not this extreme, but my point is that this
             | view was, and hopefully is, still common amongst hackers.)
             | 
             | I will ignore the moralizing words (eg "ruthless",
             | "harvested" to mean "copied"). It's not productive to the
             | conversation.
        
               | onetokeoverthe wrote:
               | If not respected, some Creators will strike, lay flat,
               | not post, go underground.
               | 
               | Ignoring moral rights of creators is the issue.
        
               | CaptainFever wrote:
               | Moral rights involve the attribution of works where
               | reasonable and practical. Clearly doing so during
               | inference is not reasonable or practical (you'll have to
               | attribute all of humanity!) but attributing individual
               | sources _is_ possible and _is_ already being done in
               | cases like ChatGPT Search.
               | 
               | So I don't think you actually mean moral rights, since
               | it's not being ignored here.
               | 
               | But the first sentence of your comment still stands
               | regardless of what you meant by moral rights. To that,
               | well... we're still commenting here, are we not? Despite
               | it with almost 100% certainty being used to train AI.
               | We're still here.
               | 
               | And yes, funding is a thing, which I agree needs
               | copyright for the most part unfortunately. But does
               | training AI on, for example, a book really reduce the
               | need to buy the book, if it is not reproduced?
               | 
               | Remember, training is not just about facts, but about
               | learning how humans talk, how _languages_ work, how books
                | work, etc. Learning that won't reduce the book's
                | economic value.
               | 
               | And yes, summaries may reduce the value. But summaries
               | already exist. Wikipedia, Cliff's Notes. I think the main
               | defense is that you can't copyright facts.
        
               | onetokeoverthe wrote:
                | _we're still commenting here, are we not? Despite it
               | with almost 100% certainty being used to train AI. We're
               | still here_
               | 
               | ?!?! Comparing and equating commenting to creative works.
               | ?!?!
               | 
               | These comments are NOT equivalent to the 17 full time
               | months it took me to write a nonfiction book.
               | 
               | Or an 8 year art project.
               | 
               | When I give away _my_ work _I_ decide to whom and how.
        
               | CaptainFever wrote:
               | I have already covered these points in the latter
               | paragraphs.
               | 
               | You might want to take a look at
               | https://www.gnu.org/philosophy/shouldbefree.en.html
        
               | onetokeoverthe wrote:
                | _I'll_ decide the distribution of _my_ work. Be it 100
               | million unique views or NOT at all.
        
               | CaptainFever wrote:
               | If you don't have a proper argument, it's best not to
               | distribute your comment at all.
        
               | onetokeoverthe wrote:
                | If saying it's _my_ work is not a "proper" argument,
               | that says it all.
        
               | CaptainFever wrote:
               | Indeed, owner.
               | 
               | Look, either actually read the link and refute the points
               | within, or don't. But there's no use discussing anything
               | if you're unwilling to even understand and seriously
               | refute a single point being made here, other than
               | repeating "mine, mine, mine".
        
               | onetokeoverthe wrote:
               | Read it. Lots of nots, and no respect.
               | 
               |  _In the process, [OpenAI] trained ChatGPT not to
               | acknowledge or respect copyright, not to notify ChatGPT
               | users when the responses they received were protected by
               | journalists' copyrights, and not to provide attribution
               | when using the works of human journalists_
        
               | CaptainFever wrote:
               | No, wrong link.
               | 
               | https://news.ycombinator.com/item?id=42279218
        
             | a57721 wrote:
             | > freely sharing with attribution requested
             | 
             | If I share my texts/sounds/images for free, harvesting and
             | regurgitating them omits the requested attribution. Even
             | the most permissive CC license (excluding CC0 public
             | domain) still requires an attribution.
        
           | AlienRobot wrote:
           | I think an ethical hacker is someone who uses their expertise
           | to help those without.
           | 
           | How could an ethical hacker side with OpenAI, when OpenAI is
           | using its technological expertise to exploit creators
           | without?
        
             | CaptainFever wrote:
             | I won't necessarily argue against that moral view, but in
             | this case it is two large corporations fighting. One has
             | the power of tech, the other has the power of the state
             | (copyright). So I don't think that applies in this case
             | specifically.
        
               | Xelynega wrote:
               | Aren't you ignoring that common law is built on
               | precedent? If they win this case, that makes it a lot
               | easier for people who's copyright is being infringed on
               | an individual level to get justice.
        
               | CaptainFever wrote:
               | You're correct, but I think many don't realize how many
               | small model trainers and fine-tuners there are currently.
               | For example, PonyXL, or the many models and fine-tunes on
               | CivitAI made by hobbyists.
               | 
                | So basically the reasoning is this:
                | 
                | - NYT vs OpenAI: neither is disenfranchised
                | 
                | - OpenAI vs individual creators: creators are
                | disenfranchised
                | 
                | - NYT vs individual model trainers: model trainers are
                | disenfranchised
                | 
                | - Individual model trainers vs individual creators:
                | neither is disenfranchised
               | 
               | And if only one can win, and since the view is that
               | information should be free, it biases the argument
               | towards the model trainers.
        
               | AlienRobot wrote:
               | What "information" are you talking about? It's a text and
               | image generator.
               | 
               | Your argument is that it's okay to scrape content when
               | you are an individual. It doesn't change the fact those
               | individuals are people with technical expertise using it
               | to exploit people without.
               | 
               | If they wrote a bot to annoy people but published how
               | many people got angry about it, would you say it's okay
               | because that is information?
               | 
               | You need to draw the line somewhere.
        
               | CaptainFever wrote:
               | Text and images _are_ information, though.
               | 
               | > If they wrote a bot to annoy people but published how
               | many people got angry about it, would you say it's okay
               | because that is information?
               | 
               | Kind of? It's not okay, but not because it is usage of
               | information without consent (this is the "information
               | should free" part), but because it is intentionally and
               | unnecessarily annoying and angering people (this is the
               | "don't use the information for evil" part which I _think_
               | is your position).
               | 
               | "See? Similarly, even in your view, model trainers aren't
               | bad because they're using data. They're bad in general
               | because they're exploiting creatives."
               | 
               | But why is it exploitative?
               | 
               | "They're putting the creatives out of a job." But this
               | applies to automation in general.
               | 
               | "They're putting creatives out of a job, using data they
               | created." This is the strongest argument for me. It does
               | intuitively feel exploitative. However, there are several
               | issues:
               | 
               | 1. Not all models or datasets do that. For instance, no
               | one is visibly getting paid to write comments on HN, or
               | to write fanfics on the non-commercial fanfic site AO3.
               | Since the data creators are not doing it as a job in the
               | first place, it does not make sense to talk about them
               | losing their job because of the very same data.
               | 
               | 2. Not all models or datasets do that. For example, spam
               | filters, AI classifiers. All of this can be trained from
               | the entire Internet and not be exploitative because there
               | is no job replacement involved here.
               | 
               | 3. Some models already do that, and are already well and
               | morally accepted. For example, Google Translate.
               | 
               | 4. This may be resolved by going the other way and making
                | more models open source (or even leaked), so more
               | creatives can use it freely, so they can make use of the
               | productive power.
               | 
               | "Because they're using creatives' information without
               | consent." But as mentioned, it's not about the
               | information or consent. It's about what you do with the
               | information.
               | 
               | Finally, because this is a legal case, it's also
               | important to talk about the morality of using the state
               | to restrict people from using information freely, even if
               | their use of the information is morally wrong.
               | 
               | If you believe in free culture as in free speech, then it
               | is wrong to restrict such a use using the law, even
               | though we might agree it is morally wrong. But this
               | really depends if you believe in free culture as in free
               | speech in the first place, which is a debate much larger
               | than this.
        
           | Xelynega wrote:
           | I don't understand what the "hacker ethos" could have to do
           | with defending openai's blatant stealing of people's content
           | for their own profit.
           | 
            | Openai is not sharing their data (they're keeping it private
           | to profit off of), so how could it be anywhere near the
           | "hacker ethos" to believe that everyone else needs to hand
           | over their data to openai for free?
        
             | CaptainFever wrote:
             | Following the "GNU-flavour hacker ethos" as described, one
             | concludes that it is right for OpenAI to copy data without
             | restriction, it is wrong for NYT to restrict others from
             | using their data, and it is _also_ wrong for OpenAI to
             | restrict the sharing of their model weights or outputs for
             | training.
             | 
             | Luckily, most people seem to ignore OpenAI's hypocritical
              | TOS against using their outputs for training. I
             | would go one step further and say that they should share
             | the weights completely, but I understand there's practical
             | issues with that.
             | 
             | Luckily, we can kind of "exfiltrate" the weights by
             | training on their output. Or wait for someone to leak it,
             | like NovelAI did.
        
           | raincole wrote:
           | Open AI scrapping copyrighted materials to make a proprietary
           | model is the _exact_ opposite of what GNU promotes.
        
             | CaptainFever wrote:
             | As I mentioned in another comment:
             | 
             | "Scrapping" (scraping) copyrighted materials is not the
             | wrong thing to do.
             | 
             | Making it proprietary is.
             | 
             | It is important to be clear about what is wrong, so you
             | don't accidentally end up fighting for copyright expansion,
             | or fighting against open models.
        
         | guerrilla wrote:
         | Sure, as soon as people have an alternative way to survive.
        
       | whywhywhywhy wrote:
       | It's so weird to me seeing journalists complaining about
       | copyright and people taking something they did.
       | 
       | The whole of journalism is taking the acts of others and
       | repeating them, why does a journalist claim they have the rights
       | to someone else's actions when someone simply looks at something
       | they did and repeat it.
       | 
       | If no one else ever did anything, the journalist would have
       | nothing to report, it's inherently about replicating the work and
       | acts of others.
        
         | barapa wrote:
         | This is terribly unpersuasive
        
         | PittleyDunkin wrote:
         | > The whole of journalism is taking the acts of others and
         | repeating them
         | 
         | Hilarious (and depressing) that this is what people think
         | journalists do.
        
           | SoftTalker wrote:
           | What is a "journalist?" It sounds old-fashioned.
           | 
           | They are "content creators" now.
        
         | echoangle wrote:
          | That's a pretty narrow view of journalism. If you look at
          | newspapers, they're not just a list of events but also opinion
          | pieces, original research, reports, etc. The main infringement
         | isn't with the basic reporting of facts but with the original
         | part that's done by the writer.
        
         | razakel wrote:
         | Or you could just not do illegal and/or immoral things that are
         | worthy of reporting.
        
       | hydrolox wrote:
        | I understand that regulations exist and that there can be
        | copyright violations, but shouldn't we be concerned that other,
        | more lenient governments (mainly China) that are opposed to the
        | US will use this to get ahead if OpenAI is significantly set
        | back?
        
         | fny wrote:
         | No. OpenAI is suspected to be worth over $150B. They can
         | absolutely afford to pay people for data.
         | 
         | Edit: People commenting need to understand that $150B is the
         | _discounted value of future revenues._ So... yes they can pay
          | out... yes they will be worth less... and yes that's fair to
         | the people who created the information.
         | 
          | I can't believe there are so many apologists on HN for what
          | amounts to vacuuming up people's data for financial gain.
        
           | suby wrote:
           | OpenAI is not profitable, and to achieve what they have
           | achieved they had to scrape basically the entire internet. I
           | don't have a hard time believing that OpenAI could not exist
           | if they had to respect copyright.
           | 
           | https://www.cnbc.com/2024/09/27/openai-sees-5-billion-
           | loss-t...
        
             | jpalawaga wrote:
             | technically open ai has respected copyright, except in the
             | (few) instances they produce non-fair-use amounts of
             | copyrighted material.
             | 
             | dmca does not cover scraping.
        
             | noitpmeder wrote:
              | That's a good thing! If a company cannot rise to fame
              | unless it violates laws, it should not exist.
              | 
              | There is plenty of public domain text that could have
              | taught an LLM English.
        
               | suby wrote:
               | I'm not convinced that the economic harm to content
               | creators is greater than the productivity gains and
               | accessibility of knowledge for users (relative to how
               | competent it would be if trained just on public domain
               | text). Personally, I derive immense value from ChatGPT /
               | Claude. It's borderline life changing for me.
               | 
               | As time goes on, I imagine that it'll increasingly be the
               | case that these LLM's will displace people out of their
               | jobs / careers. I don't know whether the harm done will
               | be greater than the benefit to society. I'm sure the
               | answer will depend on who it is that you ask.
               | 
                | > That's a good thing! If a company cannot rise to fame
                | > unless it violates laws, it should not exist.
               | 
               | Obviously given what I wrote above, I'd consider it a bad
               | thing if LLM tech severely regressed due to copyright
               | law. Laws are not inherently good or bad. I think you can
               | make a good argument that this tech will be a net
               | negative for society, but I don't think it's valid to do
               | so just on the basis that it is breaking the law as it is
               | today.
        
               | DrillShopper wrote:
               | > I'm not convinced that the economic harm to content
               | creators is greater than the productivity gains and
               | accessibility of knowledge for users (relative to how
               | competent it would be if trained just on public domain
               | text).
               | 
               | Good thing whether or not something is a copyright
               | violation doesn't depend on if you can make more money
               | with someone else's work than they can.
        
               | suby wrote:
               | I understand the anger about large tech companies using
               | others work without compensation, especially when both
               | they and their users benefit financially. But this goes
                | beyond economics. LLM tech could accelerate advances in
               | medicine and technology. I strongly believe that we're
               | going to see societal benefits in education, healthcare,
               | especially mental health support thanks to this tech.
               | 
               | I also think that someone making money off LLM's is a
               | separate question from whether or not the original
               | creator has been harmed. I think many creators are going
               | to benefit from better tools, and we'll likely see new
               | forms of creation become viable.
               | 
               | We already recognize that certain uses of intellectual
               | property should be permitted for societies benefit. We
               | have fair use doctrine, patent compulsory licensing for
                | public health, research exemptions, and public libraries.
               | Transformative use is also permitted, and LLMs are
               | inherently transformative. Look at the volume of data
               | that they ingest compared to the final size of a trained
               | model, and how fundamentally different the output format
               | is from the input data.
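                | 
                | As a rough, hypothetical back-of-envelope (these
                | numbers are illustrative guesses, not actual figures):
                | 
                |     # ~10T tokens of training text at ~4 bytes/token
                |     train_text_tb = 40.0
                |     # ~175B parameters at 2 bytes each
                |     model_weights_tb = 0.35
                |     # the ingested text dwarfs the weights ~100x over
                |     print(train_text_tb / model_weights_tb)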
               | 
               | Human progress has always built upon existing knowledge.
               | Consider how both Darwin and Wallace independently
               | developed evolution theory at roughly the same time --
               | not from isolation, but from building on the intellectual
               | foundation of their era. Everything in human culture
               | builds on what came before.
               | 
               | That all being said, I'm also sure that this tech is
                | going to negatively impact people too. Like I said in the
               | other reply, whether or not this tech is good or bad will
               | depend on who you ask. I just think that we should weigh
               | these costs against the potential benefits to society as
               | a whole rather than simply preserving existing systems,
               | or blindly following the law as if the law is inherently
               | just or good. Copyright law was made before this tech was
               | even imagined, and it seems fair to now evaluate whether
               | the current copyright regime makes sense if it turns out
               | that it'd keep us in some local maximum.
        
           | jsheard wrote:
           | The OpenAI that is assumed to keep being able to harvest
            | every form of IP without compensation is valued at $150B; an
           | OpenAI that has to pay for data would be worth significantly
           | less. They're currently not even expecting to turn a profit
           | until 2029, and that's _without_ paying for data.
           | 
           | https://finance.yahoo.com/news/report-reveals-
           | openais-44-bil...
        
           | mrweasel wrote:
            | That's not real money though. You need actual cash on hand to
           | pay for stuff, OpenAI only have the money they've been given
           | by investors. I suspect that many of the investors wouldn't
           | have been so keen if they knew that OpenAI would need an
           | additional couple of billions a year to pay for data.
        
             | __loam wrote:
              | Too bad that your business isn't viable without the
             | largest single violation of copyright of all time.
        
           | nickpsecurity wrote:
           | That doesn't mean they have $150B to hand over. What you can
           | cite is the $10 billion they got from Microsoft.
           | 
           | I'm sure they could use a chunk of that to buy competitive
           | I.P. for both companies to use for training. They can also
           | pay experts to create it. They could even sell that to others
           | for use in smaller models to finance creating or buying even
           | more I.P. for their models.
        
           | wvenable wrote:
           | > I can't believe there are so many apologists on HN for what
           | amounts to vacuuming up peoples data for financial gain.
           | 
              | If you are consistently strict about this, then almost
              | everything we do is impossible without paying someone. You
              | read books and now you have a job? Pay up. You've been
              | vacuuming up people's
           | data for years. You listened to music for the last 4 decades
           | and produced "a unique" song? Yeah right.
           | 
           | Copyright has done as much (or more) to limit human progress
           | and expression than it has done to improve it.
           | 
           | If ChatGPT is not literally copying works themselves, I don't
              | even see how copyright applies.
        
             | mongol wrote:
             | The process of reading it into their training data is a way
             | of copying it. It exists somewhere and they need to copy it
             | in order to ingest it.
        
               | wvenable wrote:
               | By that logic you're violating copyright by using a web
               | browser.
        
               | Suppafly wrote:
               | >By that logic you're violating copyright by using a web
               | browser.
               | 
               | You would be except for the fact that publishing stuff on
               | the web gives people an implicit license to download it
               | for the purposes of viewing it.
        
               | Timwi wrote:
               | Not sure about US or other jurisdictions, but that's not
               | how any of this works in Germany. In Germany downloading
               | anything from anywhere (even a movie) is never illegal
               | and does not require a license. What's illegal is
               | publishing/disseminating copyrighted content without
               | authorization. BitTorrenting a movie is illegal because
               | you're distributing it to other torrenters. Streaming a
               | movie on your website is illegal because it's public. You
               | can be held liable for using a photo from the web to
               | illustrate your eBay auction, not because you downloaded
               | it but because you republished it.
               | 
               | OpenAI (and Google and everyone else) is creating a
               | publicly-accessible system that produces output that
               | could be derived from copyrighted material.
        
               | Tomte wrote:
               | > In Germany [...]
               | 
               | That's confidently and completely wrong.
        
               | __loam wrote:
               | The nature of the copy does actually matter.
        
             | CJefferson wrote:
              | We can, and do, choose to treat normal people
              | differently from billion-dollar companies that are
              | attempting to suck up all human output and turn it into
              | their own personal profit.
             | 
             | If they were, say, a charity doing this for the good of
             | mankind, I'd have more sympathy. Shame they never were.
        
               | tolmasky wrote:
               | The way to treat them differently is not by making them
               | share profits with another corporation. The logical
               | endgame of all this isn't "stopping LLMs," it's Disney
               | happening to own a critical mass of IP to be able to
               | legally train and run LLMs that make movies, firing all
                | their employees, and no smaller company ever having a
                | chance in hell of competing with a literal century's
                | worth of IP powering a _generative_ model.
               | 
                | The best part about all this is that Disney initially
                | took off by... making use of public domain works.
                | Copyright used to last 14 years. You'd have been able
                | to create derivative works of most of the art in your
                | life at some point. Now you're never allowed to. And
                | more often than not, not to grant a monopoly to the
                | "author", but to the corporation that hired them. The
                | correct analysis shouldn't be OpenAI vs. The Intercept
                | or Disney or whomever. You're just choosing kings at
                | that point.
        
             | IsTom wrote:
             | > produced "a unique" song?
             | 
             | People do get sued for making songs that are too similar to
             | previously made songs. One defence available is that
             | they've never heard it themselves before.
             | 
             | If you want to treat AI like humans then if AI output is
             | similar enough to copyrighted material it should get sued.
             | Then you try to prove that it didn't ingest the original
             | version somehow.
        
               | noitpmeder wrote:
                | The fact that these lawsuits aren't as simple as "is
                | my copyrighted work in your training set, yes or no"
                | is boggling.
        
               | __loam wrote:
                | I feel like at some point the people in favor of this
                | are going to realize that whether the data was
                | ingested into a training set is immaterial: these
                | companies downloaded data they had no license to use
                | onto company servers somewhere, with the intention of
                | using it commercially.
        
             | GeoAtreides wrote:
             | Ah yes, humans and LLMs are exactly the same, learning the
             | same way, reasoning the same way, they're practically
             | indistinguishable. So that's why it makes sense to equate
             | humans reading books with computer programs ingesting and
             | processing the equivalent of billions of books in literal
             | days or months.
        
               | Timwi wrote:
               | While I agree with your sentiment in general, this thread
               | is about the legal situation and your argument is
               | unfortunately not a legal one.
        
               | anileated wrote:
               | "A person is fundamentally different from an LLM" does
               | not need a legal argument and is implied by the fact that
               | LLMs do not have human rights, or even anything
               | comparable to animal rights.
               | 
               | A legal argument would be needed to argue the other way.
               | This argument would imply granting LLMs some degree of
               | human rights, which the very industry profiting from
               | these copyright violations will never let happen for
               | obvious reasons.
        
               | notahacker wrote:
               | The other problem with the legal argument that it's "just
               | like a person learning" is that corporations whose human
               | employees have learned what copyrighted characters look
               | like and then start incorporating them into their art are
               | considered guilty of copyright violation, and don't get
               | to deploy the "it's not an intentional copyright
               | violation from someone who should have known better, it's
               | just a tool outputting what the user requested"
               | defence...
        
             | DrillShopper wrote:
             | > You read books and now you have a job? Pay up.
             | 
              | It is disingenuous to imply that the scale of someone
              | buying books and reading them (for which the publisher
              | and author are compensated), or borrowing them from the
              | library and reading them (again, for which the
              | publisher and author are compensated), is the same as
              | the wholesale copying, without permission or payment,
              | of everything not behind a paywall on the Internet.
        
         | dmead wrote:
          | I'm more concerned that some people in the tech world are
          | conflating Sam Altman's interests with the national
          | interest.
        
           | jMyles wrote:
           | Am I jazzed about Sam Altman making billions? No.
           | 
           | Am I even more concerned about the state having control over
           | the future corpus of knowledge via this doomed-in-any-case
           | vector of "intellectual property"? Yes.
           | 
           | I think it will be easier to overcome the influence of
           | billionaires when we drop the pretext that the state is a
           | more primal force than the internet.
        
             | dmead wrote:
             | 100% disagree. "It'll be fine bro" is not a substitute for
             | having a vote over policy decisions made by the government.
             | What you're talking about has a name. It starts with F and
             | was very popular in Italy in the early to mid 20th century.
        
               | jMyles wrote:
               | Rapidity of Godwin's law notwithstanding, I'm not
               | disputing the importance of equity in decision-making.
               | But this matter is more complex than that: it's obvious
               | that the internet doesn't tolerate censorship even if it
               | is dressed as intellectual property. I prefer an open and
                | democratic internet to one policed by childish legacy
               | states, the presence of which serves only (and only
               | sometimes) to drive content into open secrecy.
               | 
               | It seems particularly unfair to equate any questioning of
               | the wisdom of copyright laws (even when applied in
               | situations where we might not care for the defendant, as
               | with this case) with fascism.
        
               | dmead wrote:
               | It's not Godwin's law when it's correct. Just because
               | it's cool and on the Internet doesn't mean you get to
               | throw out people's stake in how their lives are run.
        
               | jMyles wrote:
               | > throw out people's stake in how their lives are run
               | 
               | FWIW, you're talking to a professional musician.
               | Ostensibly, the IP complex is designed to protect me. I
               | cannot fathom how you can regard it as the "people's
               | stake in how their lives are run". Eliminating copyright
               | will almost certainly give people more control over their
               | digital lives, not less.
               | 
               | > It's not Godwin's law when it's correct.
               | 
               | Just to be clear, you are doubling down on the claim that
               | sunsetting copyright laws is tantamount to nazism?
        
               | dmead wrote:
                | Not at all. Go re-read the above.
        
           | astrange wrote:
           | Easy to turn one into the other, just get someone to leak the
           | model weights.
        
         | worble wrote:
         | Should we also be concerned that other governments use slave
         | labor (among other human rights violations) and will use that
         | to get ahead?
        
           | logicchains wrote:
           | It's hysterical to compare training an ML model with slave
           | labour. It's perfectly fine and accepted for a human to read
           | and learn from content online without paying anything to the
           | author when that content has been made available online for
           | free, it's absurd to assert that it somehow becomes a human
           | rights violation when the learning is done by a non-
           | biological brain instead.
        
             | Kbelicius wrote:
             | > It's hysterical to compare training an ML model with
             | slave labour.
             | 
             | Nobody did that.
             | 
             | > It's perfectly fine and accepted for a human to read and
             | learn from content online without paying anything to the
             | author when that content has been made available online for
             | free, it's absurd to assert that it somehow becomes a human
             | rights violation when the learning is done by a non-
             | biological brain instead.
             | 
             | It makes sense. There is always scale to consider in these
             | things.
        
               | totallykvothe wrote:
               | worble literally did make that comparison. It is possible
               | for comparisons to be made using other rhetorical devices
               | than just saying "I am comparing a to b".
        
               | Terr_ wrote:
               | > worble literally did make that comparison
               | 
               | No, their mention of "slave labor" is not a comparison to
               | how LLMs work, nor an assertion of moral equivalence.
               | 
               | Instead it is just one example to demonstrate that
               | chasing economic/geopolitical competitiveness is not a
               | _carte blanche_ to adopt practices that might be immoral
               | or unjust.
        
         | devsda wrote:
          | Get ahead in terms of what? Do you believe that material
          | in the public domain, or legally available content that
          | doesn't violate copyright, is not enough for AI/LLM
          | research? Or is the concern purely about commercial
          | interests?
          | 
          | China also supposedly has abusive labor practices. So,
          | should other countries start relaxing their labor laws to
          | avoid falling behind?
        
         | mu53 wrote:
         | Isn't it a greater risk that creators lose their income and
         | nobody is creating the content anymore?
         | 
         | Take for instance what has happened with news because of the
         | internet. Not exactly the same, but similar forces at work. It
         | turned into a race to the bottom with everyone trying to
         | generate content as cheaply as possible to get maximum
          | engagement, with tech companies siphoning revenue.
          | Expensive, investigative pieces from educated journalists
          | disappeared in favor of stuff that looks like spam.
          | Pre-Internet news was higher quality.
          | 
          | Imagine that same effect happening to all content: art,
          | writing, academic pieces. It's a real risk that OpenAI has
          | peaked in quality.
        
           | CuriouslyC wrote:
           | Lots of people create without getting paid to do it. A lot of
           | music and art is unprofitable. In fact, you could argue that
           | when the mainstream media companies got completely captured
           | by suits with no interest in the things their companies
           | invested in, that was when creativity died and we got
           | consigned to genre-box superhero pop hell.
        
           | eastbound wrote:
            | I don't know. When I look at news from before, there
            | never was investigative journalism. It was all
            | opinion-swaying editorials, until alternate voices voiced
            | their counternarratives. It's just not in newspapers,
            | because they are too politically biased to produce the
            | two sides of stories that we've always asked them for.
            | It's on other media.
           | 
           | But investigative journalism has not disappeared. If
           | anything, it has grown.
        
             | mu53 wrote:
              | It's changed. Investigative journalism is now done by
              | non-profits specializing in it, who have various
              | financial motives.
              | 
              | The budgets at newspapers used to be much larger and
              | funded more investigative journalism, with a clearer
              | motive.
        
           | BeFlatXIII wrote:
           | > Isn't it a greater risk that creators lose their income and
           | nobody is creating the content anymore?
           | 
           | There are already multiple lifetimes of quality content out
           | there. It's difficult to get worked up about the potential
           | future losses.
        
         | immibis wrote:
         | Absolutely: if copyright is slowing down innovation, we should
         | abolish copyright.
         | 
         | Not just turn a blind eye when it's the right people doing it.
         | They don't even have a legal exemption passed by Congress -
         | they're just straight-up breaking the law and getting away with
         | it. Which is how America works, I suppose.
        
           | JoshTriplett wrote:
           | Exactly. They rushed to violate copyright on a massive scale
            | _quickly_, and now are making the argument that it shouldn't
           | apply to them and they couldn't possibly operate in
           | compliance with it. As long as humans don't get to ignore
           | copyright, AI shouldn't either.
        
             | Filligree wrote:
             | Humans do get to ignore copyright, when they do the same
             | thing OpenAI has been doing.
        
               | slyall wrote:
               | Exactly.
               | 
                | Should I be paying a proportion of my salary to all
                | the copyright holders of the books, songs, TV shows
                | and movies I consumed during my life?
               | 
               | If a Hollywood writer says she "learnt a lot about
               | writing by watching the Simpsons" will Fox have an
               | additional claim on her earnings?
        
               | dijksterhuis wrote:
                | > Should I be paying a proportion of my salary to all
                | the copyright holders of the books, songs, TV shows
                | and movies I consumed during my life?
               | 
               | you already are.
               | 
               | a proportion of what you pay for books, music, tv shows,
               | movies goes to rights holders already.
               | 
                | any subscription to spotify/apple music/netflix/hbo;
                | any book/LP/CD/DVD/VHS; any purchased digital download
                | ... a portion of those sales is paid back to rights
                | holders.
               | 
               | so... i'm not entirely sure what your comment is trying
               | to argue for.
               | 
                | are you arguing that you should get a rebate on the
                | part of your salary that's already been spent on
                | copyright payments to rights holders?
               | 
               | > If a Hollywood writer says she "learnt a lot about
               | writing by watching the Simpsons" will Fox have an
               | additional claim on her earnings?
               | 
               | no. that's not how copyright functions.
               | 
               | the actual episodes of the simpsons are the copyrighted
               | work.
               | 
                | broadcasting/allowing purchases of those episodes
                | implicates copyright, as it involves COPYING the
                | material itself.
               | 
               | COPYright is about the rights of the rights holder when
               | their work is COPIED, where a "work" is the material
               | which the copyright applies to.
               | 
               | merely mentioning the existence of a tv show involves
               | zero copying of a registered work.
               | 
               | being inspired by another TV show to go off and write
               | your own tv show involves zero copying of the work.
               | 
                | a hollywood writer rebroadcasting a simpsons episode
                | during a TV interview would be a different matter.
                | same with a hollywood writer taking scenes from a
                | simpsons episode and putting them into their film.
                | that's COPYing the material.
               | 
               | ---
               | 
                | when it comes to openAi, obviously this is a legal
                | gray area until courts start ruling.
               | 
               | but the accusations are that OpenAi COPIED the
               | intercept's works by downloading them.
               | 
               | openAi transferred the work to openAi servers. they made
               | a copy. and now openAi are profiting from that copy of
               | the work that they took, without any permission or
               | remuneration for the rights holder of the copyrighted
               | work.
               | 
               | essentially, openAI did what you're claiming is the
               | status quo for you... but it's not the status quo for
               | you.
               | 
               | so yeah, your comment confuses me. hopefully you're being
               | sarcastic and it's just gone completely over my head.
        
               | slyall wrote:
            | The problem is that the anti-AI people are going after
            | several steps in the chain (and they are often vague
            | about which ones they are talking about at any point).
            | 
            | As well as the "copying" of content, some are also
            | claiming that the output of an LLM should result in
            | royalties being paid back to the owners of the material
            | used in training.
               | 
            | So if an AI produces a sitcom script, then the copyright
            | holders of the tv shows it ingested should get paid
            | royalties, in addition to the money paid for copying the
            | files around.
               | 
               | Which leads to the precedent that if a writer creates a
               | sitcom then the copyright holders of sitcoms she watched
               | should get paid for "training" her.
        
               | jashmatthews wrote:
               | When humans learn and copy too closely we call that
               | plagiarism. If an LLM does it how should we deal with
               | that?
        
               | chii wrote:
               | > If an LLM does it how should we deal with that?
               | 
               | why not deal with it the same way as humans have been
               | dealt with in the past?
               | 
               | If you copied an art piece using photoshop, you would've
               | violated copyright. Photoshop (and adobe) itself never
               | committed copyright violations.
               | 
               | Somehow, if you swap photoshop with openAI and chatGPT,
               | then people claim that the actual application itself is a
               | copyright violation.
        
               | dijksterhuis wrote:
               | this isn't the same.
               | 
               | > If you copied an art piece using photoshop, you
               | would've violated copyright. Photoshop (and adobe) itself
               | never committed copyright violations.
               | 
               | the COPYing is happening on your local machine with non-
               | cloud versions of Photoshop.
               | 
               | you are making a copy, using a tool, and then
               | distributing that copy.
               | 
                | in music royalty terms, making a copy is the
                | Mechanical right, while distributing the copy is the
                | Performing right.
               | 
               | and you are liable in this case.
               | 
               | > Somehow, if you swap photoshop with openAI and chatGPT,
               | then people claim that the actual application itself is a
               | copyright violation
               | 
               | OpenAI make a copy of the original works to create
               | training data.
               | 
               | when the original works are reproduced verbatim
               | (memorisation in LLMs is a thing), then that is the
               | copyrighted work being distributed.
               | 
               | mechanical and performing rights, again.
               | 
               | but the twist is that ChatGPT does the copying on their
               | servers and delivers it to your device.
               | 
               | they are creating a new copy and distributing that copy.
               | 
               | which makes them liable.
               | 
               | --
               | 
               | you are right that "ChatGPT" is just a tool.
               | 
               | however, the interesting legal grey area with this is --
               | are ChatGPT model weights an _encoded copy_ of the
               | copyrighted works?
               | 
               | that's where the conversation about the tool itself being
               | a copyright violation comes in.
               | 
               | photoshop provides no mechanism to recite The Art Of War
               | out of the box. an LLM could be trained to do so (like,
               | it's a hypothetical example but hopefully you get the
               | point).
        
               | chii wrote:
               | > OpenAI make a copy of the original works to create
               | training data.
               | 
               | if a user is allowed to download said copy to view on
               | their browser, why isn't that same right given to openAI
               | to download a copy to view for them? What openAI chooses
               | to do with the viewed information is up to them - such as
               | distilling summary statistics, or whatever.
               | 
                | > are ChatGPT model weights an encoded copy of the
                | copyrighted works?
                | 
                | that is indeed the most interesting legal gray area. I
                | personally believe that it is not. The information
                | distilled from those works does not constitute
                | copyrightable information, as it is not literary but
                | informational.
               | 
               | It's irrelevant that you could recover the original works
               | from these weights - you could recover the same original
               | works from the digits of pi!
        
               | dijksterhuis wrote:
               | > if a user is allowed to download said copy to view on
               | their browser, why isn't that same right given to openAI
               | to download a copy to view for them?
               | 
               | whether you can download a copy from your browser doesn't
               | matter. whether the work is registered as copyrighted
               | does (and following on from that, who is distributing the
               | work - aka allowing you to download the copy - and for
               | what purposes).
               | 
                | the article (on phone, cba to grab a quote) makes
                | clear that the Intercept's works were _not registered_
                | as copyrighted works with whatever the name of the US
                | copyright office is.
               | 
               | ergo, those works are _not copyrighted_ and, yes, they
               | essentially are public domain and no remuneration is
               | required ...
               | 
               | (they cannot remove DMCA attribution information when
               | distributing copies of the works though, which is what
               | the case is now about.)
               | 
               | but for all the other _registered_ works that OpenAI has
               | downloaded, creating their copy, used in training data,
               | which the model then reproduces as a memorised copy --
               | that is copyright infringement.
               | 
               | like, in case it's not clear, i've been responding to
               | what people are saying about copyright specifically. not
               | this specific case.
               | 
               | > The information distilled from those works do not
               | constitute any copyrightable information, as it is not
               | literary, but informational.
               | 
               | that's one argument.
               | 
               | my argument would be it is a form of
               | compression/decompression when the model weights result
               | in memorised (read: overfitted) training data being
               | regurgitated verbatim.
               | 
               | put the specific prompt in, you get the decompressed copy
               | out the other end.
               | 
               | it's like a zip file you download with a new album of
               | music. except, in this case, instead of double clicking
               | on the file you have to type in a prompt to get the
               | decompressed audio files (or text in LLM case)
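                | 
                | (a toy sketch of the kind of check i mean, in python
                | -- nothing openAi actually runs, just the crude "diff
                | the output against the source and look for long
                | verbatim runs" idea:)
                | 
                |     # toy memorisation check: length of the longest
                |     # run of consecutive words shared by a model
                |     # output and a source text. long runs suggest
                |     # regurgitation rather than paraphrase.
                |     def longest_common_run(output_words, source_words):
                |         best = 0
                |         prev = [0] * (len(source_words) + 1)
                |         for w in output_words:
                |             cur = [0] * (len(source_words) + 1)
                |             for j, s in enumerate(source_words, 1):
                |                 if w == s:
                |                     cur[j] = prev[j - 1] + 1
                |                     best = max(best, cur[j])
                |             prev = cur
                |         return best
                | 
                |     output = "a quick brown fox jumps over it".split()
                |     source = "a quick brown fox jumps over me".split()
                |     print(longest_common_run(output, source))  # -> 6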
               | 
               | > It's irrelevant that you could recover the original
               | works from these weights - you could recover the same
               | original works from the digits of pi!
               | 
               | actually, that's the whole point of courts ruling on
               | this.
               | 
                | the boundaries of what counts as reproduction are in
                | question. it is up to the courts to decide on the red
                | lines (probably blurry gray areas for a while).
               | 
               | if i specifically ask a model to reproduce an exact
               | song... is that different to the model doing it
               | accidentally?
               | 
               | i don't think so. but a court might see it differently.
               | 
                | as someone who worked in music copyright, is a
                | musician, and sees the effects of people stealing
                | musicians' efforts all the time, i hope the little
                | guys come out of this on top.
               | 
               | sadly, they usually don't.
        
               | dijksterhuis wrote:
               | i've been avoiding replying to your comment for a bit,
               | and now i realised why.
               | 
               | edit: i am so sorry about the wall of text.
               | 
                | > some are also claiming that the output of an LLM
                | should result in royalties being paid back to the
                | owners of the material used in training.
                | 
                | > So if an AI produces a sitcom script, then the
                | copyright holders of the tv shows it ingested should
                | get paid royalties, in addition to the money paid for
                | copying the files around.
               | 
               | what you're talking about here is the concept of
               | "derivative works" made from other, source works.
               | 
               | this is subtly different to reproduction of a work.
               | 
                | see the last half of this comment for my thoughts on
                | the interesting thing courts need to work out
                | regarding verbatim reproduction:
                | https://news.ycombinator.com/item?id=42282003
               | 
               | in the derivative works case, it's slightly different.
               | 
               | sampling in music is the best example i've got for this.
               | 
               | if i take four popular songs, cut 10 seconds of each, and
               | then join each of the bits together to create a new track
               | -- that is a new, derivative work.
               | 
               | but i have not sufficiently modified the source works.
               | they are clearly recognisable. i am just using
               | copyrighted material in a really obvious way. the core of
               | my "new" work is actually just four reproductions of the
               | work of other people.
               | 
                | in that case -- that derivative work, under music
                | copyright law, requires the original rights holders to
                | be paid for all usage and copying of their works.
               | 
               | basically, a royalty split gets agreed, or there's a
               | court case. and then there's a royalty split anyway
               | (probably some damages too).
               | 
               | in my case, when i make music with samples, i make sure i
               | mangle and process those samples until the source work is
               | no longer recognisable. i've legit made it part of my
               | workflow.
               | 
               | it's no longer the original copyrighted work. it's
               | something completely new and fully unrecognisable.
               | 
                | the issue with LLMs, not just ChatGpt, is that they
                | will produce output ranging from verbatim copies to
                | recognisably similar versions of original source
                | works.
               | 
               | the original source copyrighted work is clearly
               | recognisable, even if not an exact verbatim copy.
               | 
               | and that's what you've probably seen folks talking about,
               | at least it sounds like it to me.
               | 
               | > Which leads to the precedent that if a writer creates a
               | sitcom then the copyright holders of sitcoms she watched
               | should get paid for "training" her.
               | 
               | robin thicke "blurred lines" --
               | 
               | * https://en.m.wikipedia.org/wiki/Pharrell_Williams_v._Br
               | idgep...
               | 
               | * https://en.m.wikipedia.org/wiki/Blurred_Lines (scroll
               | down)
               | 
               | yes, there is already some very limited precedent, at
               | least for a narrow specific case involving sheet music in
               | the US.
               | 
               | the TL;DR IANAL version of the question at hand in the
               | case was "did the defendants write the song with the
               | intention of replicating a hook from the plaintiff's
               | work".
               | 
               | the jury decided, yes they did.
               | 
                | this is different to your example in that they
                | _specifically went out to replicate that particular
                | musical component of a song_.
               | 
               | in your example, you're talking about someone having
               | "watched" a thing one time and then having to pay
               | royalties to those people as a result.
               | 
               | that's more akin to "being inspired" by, and is protected
               | under US law _i think_ IANAL. it came up in blurred
               | lines, but, well, yeah. https://en.m.wikipedia.org/wiki/I
               | dea%E2%80%93expression_dist...
               | 
               | again, the red line of infringement / not infringement is
               | ultimately up to the courts to rule on.
               | 
               | --
               | 
               | anyway, this is very different to what openAi/chatGpt is
               | doing.
               | 
               | openAi takes the works. chatgpt edits them according to
               | user requests (feed forward through the model). then the
               | output is distributed to the user. and that output could
               | be considered to be a derivative work (see massive amount
               | of text i wrote above, i'm sorry).
               | 
                | LLMs aren't sitting there going "i feel like
                | recreating a marvin gaye song". they take data,
                | encode/decode it, then produce an output. it is a
                | mechanical process, not a creative one. there are no
                | ideas here. no inspiration or expression.
               | 
               | an LLM is not a human being. it is a tool, which creates
               | outputs that are often strikingly similar to source
               | copyrighted works.
               | 
               | their users might be specifically asking to replicate
               | songs though. in which case, openAi could be facilitating
                | copyright infringement (whether through derivative
                | works or not).
               | 
               | and that's an interesting legal question by itself. are
               | they facilitating the production of derivative works
               | through the copying of copyrighted source works?
               | 
               | i would say they are. and, in some cases, the derivative
               | works are obviously derived.
        
               | Suppafly wrote:
               | >a proportion of what you pay for books, music, tv shows,
               | movies goes to rights holders already.
               | 
               | When I borrow a book from a friend, how do the original
               | authors get paid for that?
        
               | dijksterhuis wrote:
               | they don't.
               | 
               | borrowing a book is not creating a COPY of the book. you
               | are not taking the pages, reproducing all of the text on
               | those pages, and then giving that reproduction to your
               | friend.
               | 
               | that is what a COPY is. borrowing the book is not a COPY.
               | you're just giving them the thing you already bought. it
               | is a transfer of ownership, albeit temporarily, not a
               | copy.
               | 
               | if you were copying the files from a digitally downloaded
               | album of music and giving those new copies to your friend
               | (music royalties were my specialty) then technically you
               | would be in breach of copyright. you have copied the
               | works.
               | 
               | but because it's such a small scale (an individual with
               | another individual) it's not going to be financially
               | worth it to take the case to court.
               | 
               | so copyright holders just cut their losses with one
               | friend sharing it with another friend, and focus on other
               | infringements instead.
               | 
               | which is where the whole torrenting thing comes in. if i
               | can track 7000 people who have all downloaded the same
               | torrented album, now i can just send a letter / court
               | date to those 7000 people.
               | 
                | the costs of enforcement are reduced because of
                | scale. 7000 people, all doing the same thing, in a
                | way that can be tracked.
               | 
                | and the ultimate: one person/company has downloaded
                | the works and is making them available to others to
                | download, without paying for the rights to make
                | copies when distributing.
               | 
               | that's the ultimate goldmine for copyright infringement
               | lawsuits. and it sounds suspiciously like openAi's
               | business model.
        
               | __loam wrote:
               | Yeah it turns out humans have more rights than computer
               | programs and tech startups.
        
               | triceratops wrote:
               | So make OpenAI sleep 8 hours a day, pay income and
               | payroll taxes with the same deductions as a natural human
               | etc...
        
               | immibis wrote:
               | Copying copyrighted works?
        
               | chii wrote:
                | learning, and extracting useful information from
                | copyrighted works.
                | 
                | That extracted information cannot and should not be
                | copyrightable.
        
               | azemetre wrote:
               | If you're arguing that OpenAI should be compelled to make
               | all their technology and models free then I think we all
               | agree, but it sounds like you're trying to weasel your
               | way into letting a corpo get away with breaking the law
               | while running away with billions.
        
               | catlifeonmars wrote:
               | That's really expensive to do, so in practice only
               | wealthy humans or corporations can do so. Still seems
               | unfair.
        
             | treyd wrote:
             | ChatGPT doesn't violate copyright, it's a software
             | application. "Open"AI does, it's a company run by humans
             | (for now).
        
           | tpmoney wrote:
           | > they're just straight-up breaking the law and getting away
           | with it.
           | 
           | So far this has not been determined and there's plenty of
           | reasonable arguments that they are not breaking copyright
           | law.
        
           | blackqueeriroh wrote:
           | > Absolutely: if copyright is slowing down innovation, we
           | should abolish copyright.
           | 
           | Is this sarcasm?
        
             | immibis wrote:
             | No. If something slows down innovation and suffocates the
             | economy, why would you (an economically minded politician)
             | keep it?
        
               | noitpmeder wrote:
               | Because the world shouldn't be run primarily by
               | economically minded politicians??
               | 
               | I'm sure China gets competitive advantages from their use
               | of indentured and slave-like labor forces, and mass
                | reeducation programs in camps. Should the US allow
                | these things to happen? What if a private business
                | starts doing them?
               | 
               | But remember, they're just trying to compete with China
               | on a fair playing field, so everything is permitted
               | right?
        
               | redwall_hp wrote:
               | You might want to look at the constitutional amendment
               | enshrining slave labor "as a punishment for a crime," and
               | the world's largest prison population. Much of your food
               | supply has links to prison labor.
               | 
               | https://apnews.com/article/prison-to-plate-inmate-labor-
               | inve...
               | 
               | But don't worry, it's not considered "slave labor"
               | because there's a nominal wage of a few pennies involved
               | and it's not technically "forced." You just might be
               | tortured with solitary confinement if you don't do it.
               | 
               | We need to point fewer fingers and clean up the problems
               | here.
        
         | bogwog wrote:
         | This type of argument is ignorant, cowardly, shortsighted, and
         | regressive. Both technology and society will progress when we
         | find a formula that is sustainable and incentivizes everyone
         | involved to maximize their contributions without it all blowing
         | up in our faces someday. Copyright law is far from perfect, but
         | it protects artists who want to try and make a living from
         | their work, and it incentivizes creativity that places without
         | such protections usually end up just imitating.
         | 
         | When we find that sustainable framework for AI, China or
         | <insert-boogeyman-here> will just end up imitating it. Idk what
         | harms you're imagining might come from that ("get ahead" is too
         | vague to mean anything), but I just want to point out that that
         | isn't how you become a leader in anything. Even worse, if
         | _they_ are the ones who find that formula first while we take
         | shortcuts to  "get ahead", then we will be the ones doing the
         | imitation in the end.
        
           | gaganyaan wrote:
           | Copyright is a dead man walking and that's a good thing.
           | Let's applaud the end of a temporary unnatural state of
           | affairs.
        
             | Andrex wrote:
             | Care to make it interesting?
             | 
             | What do you consider "dead" and what do you consider a
             | reasonable timeframe for this to occur?
             | 
             | I have 60 or so years and $50.
        
               | bdangubic wrote:
               | I am in as well, I have 50 or so years and $60 (though
               | would gladly put $600k on this... :) )
        
             | CJefferson wrote:
             | If OpenAI wants copyright to be dead, then they could just
             | give out all their models copyright free.
        
         | FpUser wrote:
          | Shall we install an emperor then?
        
       | quarterdime wrote:
       | Interesting. Two key quotes:
       | 
       | > It is unclear if the Intercept ruling will embolden other
       | publications to consider DMCA litigation; few publications have
       | followed in their footsteps so far. As time goes on, there is
       | concern that new suits against OpenAI would be vulnerable to
       | statute of limitations restrictions, particularly if news
       | publishers want to cite the training data sets underlying
       | ChatGPT. But the ruling is one signal that Loevy & Loevy is
       | narrowing in on a specific DMCA claim that can actually stand up
       | in court.
       | 
       | > Like The Intercept, Raw Story and AlterNet are asking for
       | $2,500 in damages for each instance that OpenAI allegedly removed
       | DMCA-protected information in its training data sets. If damages
       | are calculated based on each individual article allegedly used to
       | train ChatGPT, it could quickly balloon to tens of thousands of
       | violations.
       | 
       | Tens of thousands of violations at $2500 each would amount to
        | tens of millions of dollars in damages. I am not familiar
        | with this field; does anyone have a sense of how the total
        | cost of retraining (without these alleged DMCA violations)
        | might compare to these damages?
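        | 
        | For scale, the arithmetic on the quoted figure (the
        | violation counts here are illustrative guesses, not numbers
        | from the filings):
        | 
        |     per_violation = 2_500  # damages sought per instance
        |     for n in (10_000, 25_000, 50_000):
        |         print(f"{n:>6} violations -> ${n * per_violation:,}")
        |     # 10,000 -> $25,000,000 ... 50,000 -> $125,000,000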
        
         | Xelynega wrote:
          | If you're going to retrain your model because of this
          | ruling, wouldn't it make sense to remove _all_
          | DMCA-protected content from your training data, instead of
          | just the content you were most recently sued for
          | (especially if it sets a precedent)?
        
           | jsheard wrote:
            | It would make sense from a legal standpoint, but I don't
            | think they could do that without massively regressing
            | their models' performance, to the point that it would
            | jeopardize their viability as a company.
        
             | zozbot234 wrote:
             | They might make it work by (1) having lots of public domain
             | content, for the purpose of training their models on basic
             | language use, and (2) preserving source/attribution
             | metadata about what copyrighted content they do use, so
             | that the models can surface this attribution to the user
             | during inference. Even if the latter is not 100% foolproof,
             | it might still be useful in most cases and show good faith
             | intent.
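              | 
              | A minimal sketch of what (2) could look like, with
              | hypothetical record fields (nothing from any real
              | pipeline):
              | 
              |     from dataclasses import dataclass
              | 
              |     @dataclass
              |     class TrainingRecord:
              |         text: str
              |         source_url: str  # where the text was fetched
              |         license: str     # "public-domain", "cc-by", ...
              |         author: str      # attribution to surface later
              | 
              |     corpus = [
              |         TrainingRecord("Call me Ishmael...",
              |                        "https://example.org/moby-dick",
              |                        "public-domain",
              |                        "Herman Melville"),
              |         TrainingRecord("Exclusive report...",
              |                        "https://example.org/article",
              |                        "all-rights-reserved",
              |                        "Example Newsroom"),
              |     ]
              | 
              |     # (1) train basic language use on public domain only
              |     base = [r for r in corpus
              |             if r.license == "public-domain"]
              | 
              |     # (2) keep attribution so the serving layer can cite
              |     # sources instead of discarding them at ingestion
              |     attribution = {r.source_url: r.author
              |                    for r in corpus}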
        
               | CaptainFever wrote:
               | The latter one is possible with RAG solutions like
               | ChatGPT Search, which do already provide sources! :)
               | 
               | But for inference in general, I'm not sure it makes too
               | much sense. Training data is not just about learning
               | facts, but also (mainly?) about how language works, how
               | people talk, etc. Which is kind of too fundamental to be
               | attributed to, IMO. (Attribution: Humanity)
               | 
               | But who knows. Maybe it _can_ be done for more fact-like
               | stuff.
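                | 
                | (A rough sketch of the RAG flavour of this, with naive
                | word-overlap scoring standing in for embeddings and
                | the actual LLM call omitted; the URLs are made up:)
                | 
                |     from collections import Counter
                | 
                |     docs = {
                |         "https://example.org/kant":
                |             "ought implies can is attributed to Kant",
                |         "https://example.org/dmca":
                |             "the court let the DMCA claim proceed",
                |     }
                | 
                |     def score(query, text):
                |         q = Counter(query.lower().split())
                |         t = Counter(text.lower().split())
                |         return sum((q & t).values())
                | 
                |     def answer_with_source(query):
                |         url, text = max(
                |             docs.items(),
                |             key=lambda kv: score(query, kv[1]))
                |         # a real system would feed `text` to the model
                |         # as context; here we just echo it with a cite
                |         return f"{text} [source: {url}]"
                | 
                |     print(answer_with_source(
                |         "who said ought implies can"))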
        
               | TeMPOraL wrote:
               | > _Training data is not just about learning facts, but
               | also (mainly?) about how language works, how people talk,
               | etc._
               | 
               | All of that and more, all at the same time.
               | 
                | Attribution at inference level is bound to work more
                | or less the same way as humans attribute things during
               | conversations: "As ${attribution} said, ${some quote}",
               | or "I remember reading about it in ${attribution-1} -
               | ${some statements}; ... or maybe it was in
               | ${attribution-2}?...". Such attributions are often wrong,
               | as people hallucinate^Wmisremember where they saw or
               | heard something.
               | 
               | RAG obviously can work for this, as well as other
               | solutions involving retrieving, finding or confirming
               | sources. That's just like when a human actually looks up
               | the source when citing something - and has similar
               | caveats and costs.
        
               | CaptainFever wrote:
               | That sounds about right. When I ask ChatGPT about "ought
               | implies can" for example, it cites Kant.
        
               | noitpmeder wrote:
                | On this point, I'm sure there is more than enough
                | publicly and freely usable content to "learn how
                | language works". There is no need to hoover up
                | private or license-unclear content if that is your
                | goal.
        
               | CaptainFever wrote:
               | I would actually love it if that was true. It would
               | reduce a lot of legal headaches for sure. But if that was
               | true, why were previous GPT versions not as good at
               | understanding language? I can only conclude that it's
                | because that's not actually true. There isn't enough
                | digital public domain material to train an LLM to
                | understand language competently.
               | 
                | Perhaps old texts in physical form, then? It'll cost
                | a lot to digitize that, won't it? And it wouldn't
                | really
               | be accessible to AI hobbyists. Unless the digitization is
               | publicly funded or something.
               | 
                | (A big part of this is also that copyright lasts
                | insanely long (nearly a hundred years!), which keeps
                | most of the Internet's material from entering the
                | public domain in the first place, but I won't belabour
                | that point here.)
               | 
               | Edit:
               | 
               | Fair enough, I can see your point. "Surely it is cheaper
               | to digitize old texts or buy a license to Google Books
               | than to potentially lose a court case? Either OpenAI
               | really likes risking it to save a bit of money, or they
               | really wanted facts not contained in old texts."
               | 
               | And yeah, I guess that's true. I _could_ say  "but facts
               | aren't copyrightable" (which was supported by the judge's
               | decision from the TFA), but then that's a different
               | debate about whether or not people should be able to own
               | facts. Which does have some inroads (e.g. a right against
               | being summarized because it removes the reason to read
               | original news articles).
        
             | Xelynega wrote:
             | I agree, just want to make sure "they can't stop doing
             | illegal things or they wouldn't be a success" is said out
             | loud instead of left to subtext.
        
               | CuriouslyC wrote:
               | They can't stop doing things some people don't like
               | (people who also won't stop doing things other people
                | don't like). The legality of the claims is
                | questionable, which is why most are getting thrown
                | out, but we'll see if this narrow approach works out.
               | 
               | I'm sure there are also a number of easy technical ways
               | to "include" the metadata while mostly ignoring it during
               | training that would skirt the letter of the law if
               | needed.
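                | 
                | E.g. (a toy sketch, not any real framework's API):
                | append the metadata as tokens on every example but
                | give them zero loss weight, so they are "included"
                | while contributing nothing to the gradient:
                | 
                |     tokens  = ["some", "article", "text",
                |                "<meta>", "src=example.org"]
                |     weights = [1.0, 1.0, 1.0, 0.0, 0.0]
                |     losses  = [0.7, 0.4, 0.9, 0.3, 0.2]  # per token
                | 
                |     total = sum(l * w for l, w in zip(losses, weights))
                |     effective = total / sum(weights)
                |     # only the content tokens move the gradient
                |     print(effective)  # 0.666...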
        
               | Xelynega wrote:
               | If we really want to be technical, in common law systems
               | anything is legal as long as the highest court to
               | challenge it decides it's legal.
               | 
               | I guess I should have used the phrase "common sense
               | stealing in any other context" to be more precise?
        
               | krisoft wrote:
               | > I guess I should have used the phrase "common sense
               | stealing in any other context" to be more precise?
               | 
               | Clearly not common sense stealing. The Intercept was not
               | deprived of their content. If OpenAI would have sneaked
               | into their office and server farm and took all the hard
               | drives and paper copies with the content that would be
               | "common sense stealing".
        
               | TheOtherHobbes wrote:
               | Very much common sense copyright violation though.
               | 
               | Copyright means you're not allowed to copy something
               | without permission.
               | 
                | It's that simple. There is no "Yes but you still have
                | your book" argument, because copyright is _a claim on
                | commercial value_, not a claim on instantiation.
               | 
               | There's some minimal wiggle room for fair use, but
               | _clearly_ making an electronic copy and creating a
               | condensed electronic version of the content - no matter
               | how abstracted - and using it for profit is not fair use.
        
               | chii wrote:
               | > Copyright means you're not allowed to copy something
               | without permission.
               | 
                | but is training an AI copying? And if so, why isn't
                | someone learning from said work considered copying
                | into their brain?
        
               | hiatus wrote:
               | Is training an AI the same as a person learning
               | something? You haven't shown that to be the case.
        
               | chii wrote:
                | no i haven't, but judging by the name - machine
                | learning - i think it is the case.
        
               | yyuugg wrote:
               | Do you think starfish and jellyfish are fish? Judging by
               | the name they are...
        
               | nkrisc wrote:
               | Because AI isn't a person.
        
               | throw646577 wrote:
               | > but is training an AI copying?
               | 
               | If the AI produces chunks of training set nearly verbatim
               | when prompted, it looks like copying.
               | 
                | > And if so, why isn't someone learning from said
                | work considered copying into their brain?
               | 
               | Well, their brain, while learning, is not someone's
               | published work product, for one thing. This should be
               | obvious.
               | 
               | But their brain can violate copyright by producing work
               | as the output of that learning, and be guilty of
               | plagiarism, etc. If I memorise a passage of your
               | copyrighted book when I am a child, and then write it in
               | my book when I am an adult, I've infringed.
               | 
               | The fact that most jurisdictions don't consider the work
               | of an AI to be copyrightable does not mean it cannot ever
               | be infringing.
        
               | CuriouslyC wrote:
                | The output of a model can be a copyright violation.
                | In fact, even if the model was never trained on
                | copyrighted content, if I provided copyrighted text
                | and then told the model to regurgitate it verbatim,
                | that would be a violation.
                | 
                | That does not make the model itself a copyright
                | violation.
        
               | pera wrote:
               | A product from a company is not a person. An LLM is not a
               | brain.
               | 
                | If you transcode a CD to mp3 and build a business
                | around selling those files without the author's
                | permission, you'd be in big legal trouble.
               | 
               | Tech products that "accidentally" reproduce materials
               | without the owners' permission (e.g. someone uploading La
               | La Land into YouTube) have processes to remove them by
               | simply filling a form. Can you do that with ChatGPT?
        
               | lelanthran wrote:
               | Because the law considers scale.
               | 
               | It's legal for you to possess a single joint. It's not
               | legal for you to possess a warehouse of 400 tons of weed.
               | 
               | The line between legal and not legal is sometimes based
               | on scale; being able to ingest a single book and learn
               | from it is not the same scale as ingesting the entire
               | published works of mankind and learning from it.
        
               | krisoft wrote:
               | Are you describing what the law is or what you feel the
               | law should be? Because those things are not always the
               | same.
        
               | IshKebab wrote:
               | It's not definitely illegal _yet_.
        
               | yyuugg wrote:
               | It's also definitely not _not_ illegal either. Case law
               | is very much tbd.
        
             | asdff wrote:
              | I wonder if they can say something like "we aren't
              | scraping your protected content, we are merely scraping
              | this old model we don't maintain anymore, and it happened
              | to have protected content in it from before the ruling".
              | If that works, you've essentially won all of humanity's
              | output: you can already scrape the new primary information
              | (scientific articles and other datasets designed for
              | researchers to freely access), and whatever junk the
              | content mills put out is just going to be poor
              | summarization of that primary information.
             | 
              | Another factor that helps this effort of an old model plus
              | new public-facing data being complete is that other forms
              | of media, like storytelling and music, have already
              | converged on certain prevailing patterns. For stories, we
              | expect a certain style of plot development and complain
              | when it's missing or not as we expect. For music, most
              | anything being listened to is lyrics no one is reading
              | deeply into, put over the same old chord progressions
              | we've always had. For art, too few of us actually go out
              | of our way to get familiar with novel art, versus the vast
              | bulk of the world's present-day artistic effort, which
              | goes toward product advertisement, which once again
              | follows patterns people have been publishing in
              | psychology journals for decades now.
             | 
              | In a sense, we've already put out enough data, and made
              | enough of our world formulaic, that I believe we've set up
              | for a perfect singularity already in terms of what can be
              | generated for the average person who looks at a screen
              | today. And because of that, I think even a total lack of
              | new training on such content wouldn't hurt OpenAI at all.
        
               | andyjohnson0 wrote:
               | > I wonder if they can say something like "we aren't
               | scraping your protected content, we are merely scraping
               | this old model we don't maintain anymore and it happened
               | to have protected content in it from before the ruling"
               | 
               | I'm not a lawyer, but I know enough to be pretty
               | confident that that wouldn't work. The law is about
                | _intent_. Coming up with "one weird trick" to work
                | around a potential court ruling is unlikely to impress a
                | judge.
        
             | TeMPOraL wrote:
             | Only half-serious, but: I wonder if they can dance with the
             | publishers around this issue long enough for most of the
             | contested text to become part of public court records, and
             | then claim they're now training off that. <trollface>
        
               | jprete wrote:
               | Being part of a public court record doesn't seem like
               | something that would invalidate copyright.
        
             | criddell wrote:
             | That might be the point. If your business model is built on
             | reselling something you've built on stuff you've taken
             | without payment or permission, maybe the business isn't
             | viable.
        
           | A4ET8a8uTh0 wrote:
            | Re-training can be done, but, and it is not a small but,
            | models already exist and can be run locally, suggesting the
            | milk was spilled too long ago at this point. Separately,
            | neutering them effectively lowers their value relative to
            | their non-neutered counterparts.
        
           | ashoeafoot wrote:
            | What about bombing? You could always smuggle DMCA-protected
            | content into training sets, hoping for a payout?
        
             | Xelynega wrote:
              | The onus is on the person collecting massive amounts of
              | data and circumventing DMCA protections to ensure they're
              | not doing anything illegal.
              | 
              | "Well, someone snuck in some DMCA-protected content"
              | doesn't suddenly make it legal to share that content
              | along with your family photos...
        
           | sandworm101 wrote:
           | But all content is DMCA protected. Avoiding copyrighted
           | content means not having content as all material is
           | automatically copyrighted. One would be limited to licensed
           | content, which is another minefield.
           | 
            | The apparent loophole is between copyrighted work and
           | copyrighted work that is _also_ registered. But registration
           | can occur at any time, meaning there is little practical
           | difference. Unless you have perfect licenses for all your
           | training data, which nobody does, you have to accept the risk
           | of copyright suits.
        
             | Xelynega wrote:
             | Yes, that's how every other industry that redistributes
             | content works.
             | 
              | You have to license content you want to use; you can't
              | just use it for free because it's on the internet.
             | 
             | Netflix doesn't just start hosting shows and hope they
             | don't get a copyright suit...
        
             | noitpmeder wrote:
              | It's insane to me that people don't agree that you need a
              | license to train your proprietary for-profit model on
              | someone else's work.
        
       | logicchains wrote:
       | Eventually we're going to have embodied models capable of live
       | learning and it'll be extremely apparent how absurd the ideas of
       | the copyright extremists are. Because in their world, it'd be
       | illegal for an intelligent robot to watch TV, read a book or
       | browse the internet like a human can, because it could remember
       | what it saw and potentially regurgitate it in future.
        
         | luqtas wrote:
          | The problem is when a human company profits from the
          | scrape... this isn't a non-profit run by volunteers, and it's
          | a far cry from autonomous robots learning their way by
          | themselves.
          | 
          | We are discussing an emergent issue with social & ecological
          | consequences. Servers are power-hungry things that may or may
          | not run on a sustainable grid (which itself has a bazinga of
          | problems, like heavy chemicals leaking from solar panel
          | production, hydroelectric plants destroying their
          | surroundings, etc.), and the current state of hardware
          | production means sweatshops or conflict minerals. And that's
          | setting aside the violation of creators' copyright, which is
          | written into the law of almost every existing country, while
          | no artist is making billions from the abuse of their creative
          | rights (often they are pretty chill about getting their stuff
          | mentioned, remixed, and whatever).
        
         | openrisk wrote:
          | Leaving aside the hypothetical "live learning AGI" of the
          | future (given that money is made or lost _now_), would a human
         | regurgitating content that is not theirs - but presented as if
         | it is - be acceptable to you?
        
           | CuriouslyC wrote:
           | I don't know about you but my friends don't tell me that Joe
           | Schmoe of Reuters published a report that said XYZ copyright
           | XXXX. They say "XYZ happened."
        
             | openrisk wrote:
              | I have a friend who recites amazingly long pieces of
              | literature by heart all day. He says he just wrote them.
              | He also produces a vast number of paintings in all styles,
              | claiming he is a really talented painter.
        
             | noitpmeder wrote:
             | So when everyone in the world starts going to your friend
             | instead of paying Reuters, what happens then?
        
               | CuriouslyC wrote:
               | Reuters finds a new business model? What did horse and
               | buggy drivers do, pivot to romance themed city tours? I'm
               | sure media companies will figure something out.
        
               | openrisk wrote:
                | So who will produce the news for your friend to steal,
                | and why? The horse-and-buggy metaphor gets tiresome when
                | it's used as some sort of signalling of "progress-
                | oriented minds" and creative-destruction enthusiasts
                | versus the luddites.
        
         | Karliss wrote:
          | If humanity ever gets to the point where intelligent robots
          | are capable of watching TV like a human can, having to adjust
          | copyright laws seems like the least of our problems. How about
          | having to adjust almost every law related to basic "human"
          | rights, ownership, being able to enter a contract, being
          | responsible for crimes, and endless other things?
         | 
         | But for now your washing machine cannot own other things, and
         | you owning a washing machine isn't considered slavery.
        
         | JoshTriplett wrote:
         | > copyright extremists
         | 
         | It's not copyright "extremism" to expect a level playing field.
         | As long as humans have to adhere to copyright, so should AI
         | companies. If you want to abolish copyright, by all means do,
         | but don't give AI a special exemption.
        
           | IAmGraydon wrote:
           | Except LLMs are in no way violating copyright in the true
           | sense of the word. They aren't spitting out a copy of what
           | they ingested.
        
             | JoshTriplett wrote:
             | Go make a movie using the same plot as a Disney movie, that
             | doesn't copy any of the text or images of the original, and
             | see how far "not spitting out a copy" gets you in court.
             | 
             | AI's approach to copyright is very much "rules for thee but
             | not for me".
        
               | bdangubic wrote:
                | 100% agree. But now the million$ question: how would you
                | deal with AI when it comes to copyright? What rules could
                | we possibly put in place?
        
               | JoshTriplett wrote:
               | The same rules we already have: follow the license of
               | whatever you use. If something doesn't have a license,
               | don't use it. And if someone says "but we can't build AI
               | that way!", too bad, go fix it for everyone first.
        
               | slyall wrote:
               | You have a lot of opinions on AI for somebody who has
               | only read stuff in the public domain
        
               | noitpmeder wrote:
                | Most information about AI is in the public domain...?
        
               | slyall wrote:
               | I mean "public domain" in the copyright context, not the
               | "trade secret" context.
        
               | rcxdude wrote:
               | That might get you pretty far in court, actually. You'd
               | have to be pretty close in terms of the sequence of
               | events, character names, etc. Especially considering how
                | many Disney movies are based on pre-existing stories, if
                | you were to, say, make a movie featuring talking animals
                | that more or less followed the plot of Hamlet, you would
               | have a decent chance of prevailing in court, given the
               | resources to fight their army of lawyers.
        
           | CuriouslyC wrote:
           | It's actually the opposite of what you're saying. I can 100%
           | legally do all the things that they're suing OpenAI for.
           | Their whole argument is that the rules should be different
           | when a machine does it than a human.
        
             | JoshTriplett wrote:
             | Only because it would be unconscionable to apply copyright
             | to actual human brains, so we don't. But, for instance, you
             | _absolutely can_ commit copyright violation by reading
             | something and then writing something very similar, which is
             | one reason why reverse engineering commonly uses clean-room
             | techniques. AI training is in no way a clean room.
        
             | nhinck3 wrote:
             | You literally can't
        
               | p_l wrote:
               | You literally can.
               | 
                | Your _ability_ to regurgitate a remembered article that
                | is copyrighted does not make your _brain_ a derivative
                | work, because removing that specific article from the
                | training set is below the noise floor of impact.
               | 
               |  _However_ reproducing the copyrighted material based on
               | that is a violation because the created reproduction
               | _does_ critically depend on that copyrighted material.
               | 
                | (Gross simplification) Similar to how you can watch &
                | read a lot of Star Wars and then even ape Ralph
                | McQuarrie's style in your own drawings; unless the
                | result is unmistakably related to Star Wars there's no
                | copyright infringement - but there is if someone looks
                | at the result and goes "that's Star Wars, isn't it?"
        
               | nhinck3 wrote:
               | Can you regurgitate billions of pieces of information to
               | hundreds of thousands of other people in a way that
               | competes with the source of that information?
        
               | CuriouslyC wrote:
               | If there was only one source for a piece of news ever,
               | you might be able to make that argument in good faith,
               | but when there are 20 outlets with competing versions of
               | the same story it doesn't hold.
        
         | IAmGraydon wrote:
         | Exactly. Also core to the copyright extremists' delusional
         | train of thought is the fact that they don't seem to understand
         | (or admit) that ingesting, creating a model, and then
         | outputting based on that model is exactly what people do when
         | they observe others' works and are inspired to create.
        
         | CuriouslyC wrote:
          | You have to understand: the media companies don't give a shit
          | about the logic. In fact, I'm sure a lot of the people pushing
          | the litigation see the absurdity of it. This is a business
          | turf war; the stated litigation is whatever excuse they can
          | find to try and go on the offensive against someone they see
          | as a potential threat. The pro-copyright group (big media)
          | sees the writing on the wall, that they're about to get
          | dunked on by big tech, and they're thrashing and screaming
          | because $$$.
        
         | tokioyoyo wrote:
          | The problem is, we can't come up with a solution where both
          | parties are happy, because in the end, consumers choose one
          | (getting information from news agencies) or the other (getting
          | information from ChatGPT). So both are fighting for their
          | lives.
        
       | 3pt14159 wrote:
        | Is there a way to figure out if OpenAI ingested my blog? If the
        | settlements are $2500 per article, then I'll take a free used
        | car's worth of payments if it's available.
        
         | jazzyjackson wrote:
         | I suppose the cost of legal representation would cancel it out.
         | I can just imagine a class action where anyone who posted on
         | blogger.com between 2002 and 2012 eventually gets a check for
         | 28 dollars.
         | 
         | If I were more optimistic I could imagine a UBI funded by
         | lawsuits against AGI, some combination of lost wages and
            | intellectual property infringement. You can't figure out
            | exactly how much more impact an article on The Intercept had
            | on shifting the weights than your Hacker News comments, so
            | you might as well just pay everyone equally since we're all
            | equally screwed.
        
           | dwattttt wrote:
            | Wouldn't the point of the class action be to dilute the
            | cost of representation? If the damages per article are high
            | and there are plenty of class members, I imagine the limit
            | would be how much OpenAI has to pay out.
        
           | SahAssar wrote:
           | If you posted on blogger.com (or any platform with enough
           | money to hire lawyers) you probably gave them a license that
           | is irrevocable, non-exclusive and able to be sublicensed.
           | 
           | There are reasons for that (they need a license to show it on
           | the platform) but usually these agreements are overly broad
           | because everyone except the user is covering their ass too
           | much.
           | 
           | Those licenses will now be used to sell that content/data for
           | purposes that nobody thought about when you started your
           | account.
        
         | Brajeshwar wrote:
          | There was a Washington Post article that did something on
          | this (but not for OpenAI). Check if your website is there at
          | https://www.washingtonpost.com/technology/interactive/2023/a...
          | 
          | There should be a way to check for OpenAI. But my guess is,
          | if Google does it, OpenAI and others must be using the
          | same/similar resource pool.
          | 
          | My website has some 56K tokens in it, and I have no clue what
          | that covers, but something is there:
          | https://www.dropbox.com/scl/fi/2tq4mg16jup2qyk3os6ox/brajesh...
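          | 
          | (If you want a rough sense of what "56K tokens" means, here
          | is a minimal sketch that counts tokens in a text dump. It
          | assumes the tiktoken library and the cl100k_base encoding,
          | which may not match whatever tokenizer was actually used;
          | site_dump.txt is a hypothetical file of your site's text.)
          | 
          |     # pip install tiktoken
          |     import tiktoken
          | 
          |     enc = tiktoken.get_encoding("cl100k_base")
          |     with open("site_dump.txt") as f:
          |         text = f.read()
          |     # encode() returns a list of integer token ids
          |     print(len(enc.encode(text)), "tokens")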
        
       | bastloing wrote:
       | Isn't this the same thing Google has been doing for years with
        | their search engine? The only difference is that Google keeps
        | the data internal, whereas OpenAI spits it out to you. But it's
        | still scraped and stored in both cases.
        
         | jazzyjackson wrote:
         | A component of fair use is to what degree the derivative work
         | displaces the original. Google's argument has always been that
          | they direct traffic to the original, whereas AI summaries
          | (which Google of course is just as guilty of as OpenAI)
          | completely obsolete the original publication. The argument now
          | is that the derivative work (the LLM model) is transformative,
          | i.e., different enough that it doesn't economically compete
          | with the original. I think it's a losing argument, but we'll
          | see what the
         | courts arrive at.
        
           | CaptainFever wrote:
           | Is this specific to AI or specific to summaries in general?
           | Do summaries, like the ones found in Wikipedia or Cliffs
           | Notes, not have the same effect of making it such that people
           | no longer have to view the original work as much?
           | 
           | Note: do you mean the _model_ is transformative, or the
            | _summaries_ are transformative? I think your comment holds
            | up either way, but it's better to be clear which one you
            | mean.
        
         | LinuxBender wrote:
          | In my opinion ( _not a lawyer_ ), Google at least references
          | where they obtained the data and did not regurgitate it as if
          | they were the creators of something new -- _obfuscated
          | plagiarism via LLM_. Some claim derivative works, but I have
          | always seen that as quite a stretch. People here expect me to
          | cite references, yet LLMs somehow escape this level of
          | scrutiny.
        
       | efitz wrote:
       | I would trust AI a lot more if it gave answers more like:
       | 
       |  _"Source A on date 1 said XYX"_
       | 
       |  _"Source B ..."_
       | 
       |  _"Synthesizing these, it seems that the majority opinion is X
       | but Y is also a commonly held opinion."_
       | 
       | Instead of what it does now, which is make extremely confident,
       | unsourced statements.
       | 
       | It looks like the copyright lawsuits are rent-seeking as much as
       | anything else; another reason I hate copyright in its current
       | form.
        
         | CaptainFever wrote:
         | ChatGPT Search provides this, by the way, though it relies a
         | lot on the quality of Bing search results. Consensus.app does
         | this but for research papers, and has been very useful to me.
        
           | maronato wrote:
            | More often than not, in my experience, clicking these
            | sources takes me to pages that either don't exist, don't
            | have the information ChatGPT is quoting, or whose content
            | ChatGPT completely misinterpreted.
        
         | akira2501 wrote:
         | > which is make extremely confident,
         | 
          | One of the results the LLM has available to itself is a
          | confidence value. It should, at the very least, provide this
          | along with its answer. Perhaps if it did, people would stop
          | calling it 'AI'.
        
           | pavon wrote:
           | My understanding is that this confidence value is not a
           | measure of how likely something is correct/true, but more
           | along the lines of how likely that sentence would be.
           | Including it could be more misleading than helpful, for
           | example if it is repeating commonly misunderstood
           | information.
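            | 
            | As a rough sketch of what that number measures (this
            | assumes the OpenAI Python client and its logprobs option;
            | the model name and prompt are just illustrative):
            | 
            |     import math
            |     from openai import OpenAI
            | 
            |     client = OpenAI()  # assumes OPENAI_API_KEY is set
            |     resp = client.chat.completions.create(
            |         model="gpt-4o",
            |         messages=[{"role": "user",
            |                    "content": "Who won the 1998 World Cup?"}],
            |         logprobs=True,
            |     )
            |     # Each logprob scores how likely the *token* is given
            |     # the preceding text -- fluency, not truth.
            |     for tok in resp.choices[0].logprobs.content:
            |         print(tok.token, round(math.exp(tok.logprob), 3))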
        
           | ethernot wrote:
            | I'm not sure that it's possible to produce anything
            | reasonable in that space. It would need to know how far away
            | from correct it is to provide a usable confidence value;
            | otherwise it'd just be hallucinating a number in the same
            | way as the result.
           | 
            | An analogy. Take a former commuter friend of mine, Mr Skol
            | (named after his favourite breakfast drink). I met him on a
            | minibus I had to take to work years ago, and we shared many
            | interesting conversations. Now, he was a confident expert on
            | everything. If asked to rate his confidence in a subject, it
            | would be a good 95% at least. However, he spoke absolute
            | garbage because his brain had rotted away from drinking Skol
           | for breakfast, and the odd crack chaser. I suspect his model
           | was still better than GPT-4o. But an average person could
           | determine the veracity of his arguments.
           | 
            | Thus confidence should be externally rated, as an entity
            | with knowledge cannot necessarily rate itself, for it has
            | bias. Which then raises the question of how you do that.
            | Well, you'd have to do the research you were going to do
            | anyway and compare. So now you've used the AI and also done
            | the research you would have had to do if the AI didn't
            | exist. At this point the AI becomes a pure cost if you need
            | something with any level of confidence and accuracy.
           | 
            | Thus the value is zero unless you need crap information,
            | which, at least here, is never, unless I'm generating a
            | picture of a goat driving a train or something. And I'm not
            | sure that has any commercial value. But it's fun at least.
        
             | readyplayernull wrote:
             | Do androids dream of Dunning-Kruger?
        
         | 1vuio0pswjnm7 wrote:
         | "It looks like the copyright lawsuits are rent-seeking as much
         | as anything else;"
         | 
          | If an entity charges fees for "AI", then is it "rent-seeking"?
          | 
          | (Assume that the entity is not the author of the training
          | data used.)
        
         | Paul-E wrote:
         | This is what a number of startups, such as Yurts.ai and
         | Vannevar Labs, are racing to build for organizations. I
          | wouldn't be surprised if, in 5-10 years, most large corps and
          | government agencies had these sorts of LLM/RAG systems over
          | their internal documents.
        
       | ashoeafoot wrote:
        | Will we see human-washing, where AI art or works get a "made by
        | man" final touch in some third-world mechanical turk den? Would
        | that add another financially detracting layer to the AI winter?
        
         | Retric wrote:
          | The law generally takes a dim view of attempts to get around
          | rules like that. AI's biggest defense is claiming they are so
          | beneficial to society that what they are doing is fine.
        
           | gmueckl wrote:
            | That argument stands on the mother of all slippery slopes!
            | Just find a way to make your product impressive or
            | ubiquitous, and all of a sudden it doesn't matter how much
            | you break the law along the way? That's so insane I don't
            | even know where to start.
        
             | ashoeafoot wrote:
              | Worked for Purdue.
        
             | Retric wrote:
              | YouTube, AirBnB, Uber, and many _many_ others have all
              | done things blatantly against the law but gotten away
              | with it due to utility.
        
             | rcxdude wrote:
             | Why not, considering copyright law specifically has fair
             | use outlined for that kind of thing? It's not some
             | overriding consequence of law, it's that copyright is a
             | granting of a privilege to individuals and that that
             | privilege is not absolute.
        
           | gaganyaan wrote:
           | That is not in any way the biggest defense
        
             | Retric wrote:
              | It's worked for many startups and court cases in the past.
              | Copyright even has many explicit examples of the utility
              | loophole; look at, say:
              | https://en.wikipedia.org/wiki/Sony_Corp._of_America_v._Unive....
        
         | righthand wrote:
          | That will probably happen to some extent, if it isn't
          | happening already. However, I think people will just stop
          | publishing online if malicious corps like OpenAI are just
          | going to harvest their works for their own gain. People
          | publish for personal gain, not to enrich the public or
          | private entities.
        
           | Filligree wrote:
           | However, I get my personal gain regardless of whether or not
           | the text is also ingested into ChatGPT.
           | 
           | In fact, since I use ChatGPT a lot, I get more gain if it is.
        
             | righthand wrote:
             | How much of your income have you spent on ChatGPT vs how
             | much ChatGPT has increased your income?
        
               | Filligree wrote:
               | ChatGPT doesn't increase my income. It's useful for my
               | hobbies, and probably made those more expensive.
        
         | CuriouslyC wrote:
         | There's no point in having third world mechanical turk dens do
         | finishing passes on AI output unless you're trying to make it
         | worse.
         | 
         | Artists are already using AI to photobash images, and writers
         | are using AI to outline and create rough drafts. The point of
         | having a human in the loop is to tell the AI what is worth
         | creating, then recognize where the AI output can be improved.
         | If we have algorithms telling the AI what to make and content
         | mill hacks smearing shit on the output to make it look more
         | human, that would be the worst of both worlds.
        
           | TheDong wrote:
           | I think the point of the comment isn't to have this finishing
           | layer to make things "better", but to make things "legal".
           | 
            | Humans are allowed to synthesize a bunch of inputs together
            | and produce a new, novel copyrighted work.
            | 
            | An algorithm, if it mixes a bunch of copyrighted things
            | together by itself, is plausibly incapable of producing a
            | novel copyright, and instead inherits the old copyright.
           | 
           | Just like Clean Room Design
           | (https://en.wikipedia.org/wiki/Clean-room_design) can be used
           | to re-create the same software free of the original
           | copyright, I think the parent is arguing that a mechanical
           | turk process could allow AI to produce the same output free
           | of the original copyright.
        
         | doctorpangloss wrote:
         | Did you miss the twist at the end of the article?
         | 
         | > Andrew Deck is a generative AI staff writer at Nieman Lab...
        
       | ada1981 wrote:
       | I'm still of the opinion that we should be allowed to train on
       | any data a human can read online.
        
         | smitelli wrote:
         | ...Limited to a human's average rate of consumption, right?
        
           | warkdarrior wrote:
           | Yes, just like my download speed is capped by the speed of me
           | writing bytes on paper.
        
           | ada1981 wrote:
           | Is there any other information processing we limit to human
           | speed?
        
       | cynicalsecurity wrote:
       | Yeah, let's stop the progress because a few magazines no one
       | cares about are unhappy.
        
         | a57721 wrote:
         | Maybe just don't use data from the unhappy magazines you don't
         | care about in the first place?
        
       | bastloing wrote:
        | Who would be forever grateful if OpenAI removed all of The
       | Intercept's content permanently and refused to crawl it in the
       | future?
        
         | noitpmeder wrote:
          | Sure, and then do that with every other piece of work they're
          | unfairly using.
        
           | bastloing wrote:
           | I'd actually leave it up to the owner. Some want their work
           | removed, some don't care.
        
       | dr_dshiv wrote:
        | Proposed: a 10% tax as a copyright settlement, half to pay past
        | creators and half to pay current creative culture.
        
         | dawnerd wrote:
         | Problem with that is it's too easy to effectively have 0 taxes.
        
       | _giorgio_ wrote:
       | Just train the models in Japan.
       | 
       | *No copyright.*
       | 
       | https://insights.manageengine.com/artificial-intelligence/th...
       | 
       | https://news.ycombinator.com/item?id=38842788
        
       | theropost wrote:
       | Copyright laws, in many ways, feel outdated and unnecessarily
       | rigid. They often appear to disproportionately favor large
       | corporations without providing equivalent value to society. For
       | example, brands like Disney have leveraged long-running
       | copyrights to generate billions, or even tens of billions, of
       | dollars through enforcement over extended periods. This approach
       | feels excessive and unsustainable.
       | 
       | The reliance on media saturation and marketing creates a
       | perception that certain works are inherently more valuable than
       | others, despite new creative works constantly being developed.
       | While I agree that companies should have the right to profit from
       | their investments, such as a $500 million movie, there should be
       | reasonable limits. Once they recoup their costs, including a
       | reasonable profit multiplier, the copyright could be considered
       | fulfilled and should expire.
       | 
       | Holding onto copyrights indefinitely or for excessively long
       | periods serves primarily to sustain a system that benefits
       | lawyers and enforcement agencies, rather than providing
       | meaningful value to society. For instance, enforcing a copyright
       | from the 1940s for a multinational corporation that already
       | generates billions makes little sense.
       | 
       | There should be a balanced framework. If I invest significant
       | time and effort--say 100 hours--into creating a work, I should be
       | entitled to earn a reasonable return, perhaps 10 times the effort
       | I put in. However, after that point, the copyright should no
       | longer apply. Current laws have spiraled out of control, failing
       | to strike a balance between protecting creators and fostering
       | innovation. Reform is long overdue.
        
         | pclmulqdq wrote:
         | I am personally in favor of strong, short copyrights (and
         | patents). 90+ year copyrights are just absurd. Most movies make
         | almost all their money in the first 10 years anyway, and a
         | strong 10- or 20-year copyright would keep the economics of
         | movie and music production largely the same.
        
       | tolmasky wrote:
       | The logical endgame of all this isn't "stopping LLMs," it's
       | Disney happening to own a critical mass of IP to be able to more
       | or less exclusively legally train and run LLMs that make movies,
       | firing all their employees, and no smaller company ever having a
        | chance in hell of competing with a literal century's worth of
        | IP powering a _generative_ model. This turns the already
        | egregiously generous backwards-facing monopoly into a forward-
       | facing monopoly.
       | 
       | None of this was ever the point of copyright. The best part about
       | all this is that Disney initially took off by... making use of
        | public domain works. Copyright used to last 14 years. You'd be
        | able to create derivative works of most of the art in your life
        | at some point. Disney is ironically the proof of how _constructive_
       | a system that regularly turns works over to the public domain can
       | be. But thanks to lobbying by Disney, now you're never allowed to
       | create a derivative work of the art in your life.
       | 
       | Copyright is only possible because _we the public fund the
        | infrastructure necessary to maintain it_. "IP" isn't naturally
        | scarce the way physical items are: me having a cup necessarily
        | means you don't have it. That's not how ideas and pictures work.
       | You can infinitely perfectly duplicate them. Thus we set up laws
       | and courts and police to create a complicated _simulation_ of
       | physical properties for IP. Your tax dollars pay for that. The
       | original deal was that in exchange, those works would enter the
       | public domain to give back to society. We've gotten so far from
       | that that people now argue about OpenAI "stealing" from authors,
       | when the authors most of the time don't even own the works --
       | their employers do! What a sad comedy where we've forgotten we
       | have a stake in this too and instead argue over which corporation
       | should "own" the exclusive ability to cheaply and blazingly fast
       | create future works while everyone else has to do it the hard
       | way.
        
         | tzkaln wrote:
         | ClosedAI etc. are certainly stealing from open source authors
         | and web site creators, who do own the copyright.
         | 
         | That said, I agree with putting more emphasis on individual
         | creators, even if they have sold the copyright to corporations.
          | I was appalled by the Google settlement with the Authors
          | Guild: why does a guild decide who owns what and who gets
          | compensation?
         | 
         | Both Disney and ClosedAI are in the wrong here. I'm the
         | opposite of a Marxist, but Marx' analysis was frequently right.
         | He used the term "alienation from one's work" in the context of
         | factory workers. Now people are being alienated from their
         | intellectual work, which is stolen, laundered and then sold
         | back to them.
        
           | ToucanLoucan wrote:
            | I mean, not to be that guy, but multiple Marxist and
            | Marxist-adjacent people I know (and am) have been out here
            | pointing out that this was exactly and always what was going
            | to happen, ever since the LLM hype cycle really kicked into
            | high gear in mid-2023. And I was told in no uncertain terms,
            | many times, on here, that I was being a doomer, a pessimist,
            | a luddite, etc., because I and many like me saw the writing
            | on the wall immediately: while generative AI represented a
            | neat thing for folks to play with, it would, like every
            | other emerging tech, quickly become the sole domain of the
            | monied entities that already run the rest of our lives, and
            | this would be bad for basically everyone long term.
            | 
            | And yeah, it looks to be shaping up as exactly that.
        
             | trinsic2 wrote:
              | Yep. And people support it with "no, it's not going to be
              | like that this time" bullshit.
        
           | marcosdumay wrote:
            | I don't think you need to be a Marxist to accept that his
            | observation that people are being alienated from their work
            | capacity is spot on.
            | 
            | The "Marxist" label is either about believing in the parts
            | that aren't true or about the political philosophy (which,
            | honestly, can't stand on its own without the wrong facts).
            | The parts that fit reality only make one a "realist".
        
         | notahacker wrote:
         | If I thought that nobody had a chance in hell of competing with
         | generative models compiled by Disney from its corpus of
         | lighthearted family movies, I'd be even less keen to give
         | unlimited power to create derivative works out of everything in
         | history to the companies with the greatest amount of computing
         | power, which in this case happens to be a subsidiary of
         | Microsoft.
         | 
          | All property rights depend on the public funding the
          | infrastructure to enforce them. If I believed movies derived
          | from applying generative AI techniques to other movies were the
         | endgame of human creativity, I'd find your endgame of it being
         | the fiefdom of corporations who sold enough Windows licenses to
         | own billions of dollars worth of computer hardware even more
         | dystopian than it being invested in the corporations who
         | originally paid for the movies...
        
           | horsawlarway wrote:
           | Two thoughts
           | 
           | 1. You are assuming that "greatest computing power" is a
           | requirement. I think we're actually seeing a trend in the
           | opposite direction with recent generative art models: It
           | turns out consumer grade hardware is "enough" in basically
           | all cases, and renting the compute you might otherwise be
           | missing is cheap. I don't buy this as the barrier.
           | 
            | 2. Given #1, I think you are framing the conversation in a
            | very duplicitous manner by pitching this as "either Microsoft
            | or Disney - pick your oppressor". I'd suggest that breaking
            | the current fuckery in copyright, and restoring something
            | more sane (like the 7 + 7 year original timespans), would
            | benefit individuals who want to make stories and art far more
            | than it would benefit corporations. Disney is literally _THE_
            | reason for half of the current extensions in timespan. They
            | don't want reduced copyright - they want to curtail
            | expression in favor of profit. This case just happens to have
            | a convenient opponent for public sentiment.
           | 
           | ---
           | 
            | Further - "All property rights depend on public funding the
            | infrastructure to enforce them" is false. This is only the
            | case for intellectual property rights, where nothing need be
            | removed from one person for the other to be "in violation".
        
             | catlifeonmars wrote:
             | > All property rights depends on public funding the
             | infrastructure to enforce them
             | 
             | Still true, because people generally depend on the legal
             | system and police departments to enforce physical property
             | rights (both are publicly funded entities).
        
             | notahacker wrote:
             | I'm assuming greater computing power is a requirement
             | because creating generative feature length movies (which is
             | a few orders of magnitude more complex than creating PNGs)
             | is something only massive corporations can afford the
              | computing power to do at the moment (and the implied bar
              | for excellence is something we haven't reached). Certainly,
              | computing power and dev resources are more of a bottleneck
              | to creating successful AI movies than not having access to
              | the Disney canon, which was the argument the OP made for
              | anything other than OpenAI having unlimited rights over
              | everyone's content leading inexorably to a Disney generative
              | AI monopoly. (Another weakness of that is I'm not sure the
             | Disney canon is _sufficient_ training data for Disney to
             | replace their staff with generative movies, never mind
             | _necessary_ for anyone else to ever make a marketable
             | quality movie again)
             | 
             | Given #1, I think the OP is framing the conversation in a
             | far more duplicitous manner by assuming that in a lawsuit
             | against AI which doesn't even involve Disney, the only
             | beneficiary of OpenAI not winning will be Disney. Disney
             | extending copyright laws in past decades has nothing to do
             | with a 10 year old internet company objecting to Open AI
             | stripping all the copyright information off its recent
             | articles before feeding them into its generative model.
             | 
             | > Further - "All property rights depends on public funding
             | the infrastructure to enforce them" Is false. This is only
             | the case for intellectual property rights, where nothing
             | need be removed from one person for the other to be "in
             | violation".
             | 
             | People who don't respect physical property are just as
             | capable of removing it as people who don't respect
             | intellectual property are capable of copying it. In both
             | cases the thing that prevents them doing so is a legal
             | system and taxpayer funded enforcement against people that
             | don't play by the rules.
        
         | adventured wrote:
         | AI is absolutely a further wealth concentrator by its very
         | nature. It will not liberate the bottom 3/4, it will not free
         | up their time by allowing them to work a lot less (as so many
          | incorrectly predict now). Eric Schmidt, for example, has some
          | particularly incorrect claims out there right now about how AI
          | will widely liberate people from having to work so many hours;
          | they will prove laughable in hindsight. Those who wield high-end
         | AI, and the extreme cost of operations that will go with it,
         | will reap extraordinary wealth over the coming century. Elon
         | Musk style wealth. Very few will have access to the resources
         | necessary to operate the best AI (the cost will continue to
         | climb over what companies like Microsoft, Google, Amazon,
         | OpenAI, etc are already spending).
         | 
         | Sure, various AI assistants will make more aspects of your life
         | automated. In that sense it'll buy people more time in their
         | private lives. It won't get most people a meaningful increase
         | in wealth, which is the ultimate liberator of time. That is,
         | financial independence.
         | 
         | And you can already see the ratio of people that are highly
         | engaged with utilizing the latest LLMs, paying for them, versus
         | either rarely or never using them (either not caring/interested
         | in utilizing, or not understanding how to do so effectively).
         | It's heavily bifurcated between the elites and everybody else,
         | just as most tech advances have been so far. A decade ago a
         | typical lower / lower middle class person could have gone to
         | the library and learned JavaScript and over the course of years
         | could have dramatically increased their earning potential (a
         | process that takes time to be clear); for the same reason that
         | rarely happens by volition, they also will not utilize LLMs to
         | advance their lives despite the wide availability of them. AI
         | will end up doing trivial automation tasks for the bottom 50%.
         | For the top ~1/4 it will produce enormous further wealth from
         | equity holdings and business process productivity gains
         | (boosting wealth from business ownership, which the bottom 50%
         | lacks universally).
        
         | cxr wrote:
         | > Me having a cup necessarily means you don't have it. That's
         | not how ideas and pictures work. You can infinitely perfectly
         | duplicate them.
         | 
         | This is a stupid argument, no matter how often it comes up.
         | 
         | If I hire Alice to come to my sandwich shop and make sandwiches
         | for customers all week and then on payday I say, "Welp, no need
         | to pay you--the sandwiches are already made!" then Alice is
         | definitely out something, and I am categorically a piece of
         | shit for trotting out this line of reasoning to try to justify
         | not paying her.
         | 
         | If I do the same thing except I commission Alice to do a
         | drawing for a friend's birthday, then I am no less a piece of
         | shit if I make my own copy once she's shown it to me and try to
         | get out of paying since I'm not using "her" copy.
         | 
         | (Notice that in neither case was the thing produced ever
         | something that Alice was going to have for herself--she was
         | never going to take home 400 sandwiches, nor was she ever
         | interested in a portrait of your friend and his pet rabbit.)
         | 
          | If Alice senses that I'd be interested in the drawing but
          | might not be totally swayed until I see it for myself, and so
          | proactively decides to make the drawing upfront before
          | approaching me, it doesn't fundamentally change the balance
          | from the previous scenario--she's out no less in that case
          | than if I had approached her first and then refused to pay
          | after the fact. (If she was wrong, and it turns out I didn't
          | actually want it because she misjudged, and she will not be
          | able to recoup her investment, fair. But that's not the same
          | as if she
         | didn't misjudge and I come to her with this bankrupt argument
         | of, "You already made the drawing, and what's done is done, and
         | since it's infinitely reproducible, why should I owe you
         | anything?")
         | 
         | Copyright duration is too long. But the fundamental difference
         | between rivalrous possession of physical artifacts and
         | infinitely reproducible ideas really needs to stay the hell out
         | of these debates. It's a tired, empty talking point that
         | doesn't actually address the substance of what IP laws are
         | really about.
        
           | kweingar wrote:
           | This isn't really an argument though. It's an assertion that
           | not honoring a commission agreement (or an employment
           | contract) is equivalent to not paying for a license to an
           | existing work. I tend to disagree. I could be persuaded
           | otherwise, but I'd need to hear an argument other than
           | "clearly these are the same thing."
        
             | cxr wrote:
             | > This isn't really an argument though. It's an assertion
             | that not honoring a commission agreement
             | 
             | Wrong. It's that (not honoring an agreement negotiated
              | beforehand) _and_ an argument against treating the thing
              | already produced as inherently zero-cost and/or zero-value;
              | the fact that a prior agreement is an element in the offered
             | scenarios doesn't negate or neutralize the rest of it (just
             | like the fact that a sandwich shop is an element in one of
             | the scenarios doesn't negate or neutralize the broader
             | reality for non-sandwich-involving scenarios).
             | 
              | And that's before we mention: there _is_ such a prior
              | agreement in the case of modern IP -- you have to contend
              | with the fact that if Alice is operating in the United
             | States which has existing legislation granting her a
             | "temporary monopoly" on her creative output, and then she
             | generates the output on the basis that she'll be protected
             | by the law of the land, and then you decide that you just
             | don't agree with the idea of IP, then Alice is getting
             | screwed over by someone not holding up their end of the
             | bargain.
        
               | trinsic2 wrote:
               | I'm sorry, the two are not even remotely the same. Saying
               | it over and over again doesn't make it so.
        
               | cxr wrote:
               | You wanna, like, actually digest what I wrote there? The
               | second comment here is so unlike the first that your
               | "Saying it over and over again" remark can only lead to
               | the conclusion that you either didn't read it or didn't
               | grok it. They're two different comments about two
               | different things.
               | 
               | > I'm sorry
               | 
               | Are you? I think you mixed up the words "insincere" and
               | "sorry".
        
         | tlb wrote:
         | I find "this wasn't the point of copyright", referring to the
         | motivations of 18th century legislators, unpersuasive. They
         | were making up rules that were good for the foreseeable future,
         | but they didn't foresee everything and certainly not everyone
         | being connected to a global data network.
         | 
         | Persuasive arguments should focus on what's good for the world
         | today.
        
         | benreesman wrote:
         | Sounds like it's supposed to be hopeless to compete already:
          | https://www.zenger.news/2023/06/10/sam-altman-says-its-hopel....
        
         | marcosdumay wrote:
         | Either copyrights exist, and people can't copy creative works
         | "owned" by somebody else, or copyrights don't exist and people
         | can copy those at will.
         | 
         | "Copyrights exist, and people can copy others works if they
         | have enough computing power to multiplex it with other works
         | and demultiplex to get it back" is not a reasonable position.
         | 
          | I'm all for limiting it to 15 or 20 years, and requiring
          | registration. If you want to completely end them, I'd be OK
          | with that too (but I think it's suboptimal). But "end them
          | only for rich people" isn't acceptable.
        
       | torginus wrote:
       | How sturdy is this claim?
       | 
        | If we presume it's illegal to train on copyrighted works, but a
        | website summarizing the article, like Wikipedia, is perfectly
        | legal, then what would happen if we got LLM A to summarize the
        | article and used that to train LLM B?
        | 
        | LLM A could be trained on public domain works.
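        | 
        | As a minimal sketch of what I mean (assuming the Hugging Face
        | transformers library; the model name is illustrative, and
        | whether its own training data was clean is exactly the open
        | question):
        | 
        |     from transformers import pipeline
        | 
        |     # LLM A: a summarizer, assumed (for the sake of argument)
        |     # to be trained only on licensed / public domain text
        |     summarize = pipeline("summarization",
        |                          model="facebook/bart-large-cnn")
        | 
        |     articles = ["<full text of article 1>",
        |                 "<full text of article 2>"]
        |     summaries = [summarize(a, max_length=120)[0]["summary_text"]
        |                  for a in articles]
        | 
        |     # LLM B would then be fine-tuned on `summaries` only,
        |     # never on the original articles.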
        
         | vorpalhex wrote:
         | LLM B would be a very bad LLM with only limited vocabulary and
         | turn of phrase, and would tend to have a single writing tone.
         | 
         | And no, having 5000 different summarizing LLMs doesn't help
         | here.
         | 
         | It's sort of like taking a photograph of a photograph.
        
         | miohtama wrote:
          | If it is illegal to train on copyrighted work, it will also
          | benefit actors that are free to ignore laws, like Chinese
          | public-private companies. It means Western companies will
          | lose the AI race.
        
           | tapoxi wrote:
           | Then we don't respect their copyrights? Why is this some sort
           | of unsolvable problem and the only solution is to allow mega
           | corporations to sell us AI that is trained on the work of
           | artists without their consent?
        
       ___________________________________________________________________
       (page generated 2024-11-30 23:01 UTC)