[HN Gopher] Core copyright violation moves ahead in The Intercep...
___________________________________________________________________
Core copyright violation moves ahead in The Intercept's lawsuit
against OpenAI
Author : giuliomagnifico
Score : 289 points
Date : 2024-11-29 13:48 UTC (1 day ago)
(HTM) web link (www.niemanlab.org)
(TXT) w3m dump (www.niemanlab.org)
| philipwhiuk wrote:
| It's extremely lousy that you have to pre-register copyright.
|
| That would make the USCO a de facto clearinghouse for news.
| throw646577 wrote:
| You don't have to pre-register copyright in any Berne
| Convention countries. Your copyright exists from the moment you
| create something.
|
| (ETA: This paragraph below is diametrically wrong. Sorry.)
|
| AFAIK in the USA, registered copyright is necessary if you want
| to bring a lawsuit and get more than statutory damages, which
| are capped low enough that corporations do pre-register work.
|
| Not the case in all Berne countries; you don't need this in the
| UK for example, but then the payouts are typically a lot lower
| in the UK. Statutory copyright payouts in the USA can be enough
| to make a difference to an individual author/artist.
|
| As I understand it, OpenAI could still be on the hook for up to
| $150K per article if it can be demonstrated it is wilful
| copyright violation. It's hard to see how they can argue with a
| straight face that it is accidental. But then OpenAI is, like
| several other tech unicorns, a bad faith manufacturing device.
| Loughla wrote:
| You seem to know more about this than me. I have a family
| member who "invented" some electronics things. He hasn't done
| anything with the inventions (I'm pretty sure they're
| quackery).
|
| But to ensure his patent, he mailed himself a sealed copy of
| the plans. He claims the postage date stamp will hold up in
| court if he ever needs it.
|
| Is that a thing? Or is it just more tinfoil business? It's
| hard to tell with him.
| throw646577 wrote:
| Honestly I don't know whether that actually is a meaningful
| thing to do anymore; it may be with patents.
|
| It certainly used to be a legal device people used.
|
| Essentially it is low-budget notarisation. If your family
| member believes they have something which is timely and
| valuable, it might be better to seek out proper legal
| notarisation, though -- you'd consult a Notary Public:
|
| https://en.wikipedia.org/wiki/Notary_public
| WillAdams wrote:
| It won't hold up in court, and given that the post office
| will mail/deliver unsealed letters (which may then be
| sealed after the fact), it will be viewed rather dimly.
|
| Buy your family member a copy of:
|
| https://www.goodreads.com/book/show/58734571-patent-it-
| yours...
| Y_Y wrote:
| Surely the NSA will retain a copy which can be checked
| Tuna-Fish wrote:
| Even if they did, it in fact cannot be checked. There is
| precedent that you cannot subpoena NSA for their
| intercepts, because exactly what has been intercepted and
| stored is privileged information.
| hiatus wrote:
| > There is precedent that you cannot subpoena NSA for
| their intercepts
|
| I know it's tangential to this thread but could you link
| to further reading?
| ysofunny wrote:
| but only in a real democracy
| cma wrote:
| The US moved to first-to-file years ago. Whoever files
| first gets it, except that if the inventor publishes
| publicly there is a 1-year grace period (it would not apply
| to a self-mail or private mail to other people).
|
| This is patent, not copyright.
| Isamu wrote:
| Mailing yourself using registered mail is a very old tactic
| to establish a date for your documents using an official
| government entity, so this can be meaningful in court.
| However this may not provide the protection he needs.
| Copyright law differs from patent law, and he should seek
| legal advice.
| dataflow wrote:
| Even if the date is verifiable, what would it even prove?
| If it's not public then I don't believe it can count as
| prior art to begin with.
| blibble wrote:
| presumably the intention is to prove the existence of the
| specific plans at a specific time?
|
| I guess the modern version would be to sha256 the plans and
| shove it into a bitcoin transaction
|
| good luck explaining that to a judge
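|
| a minimal sketch of that hashing step (Python standard
| library; the filename is hypothetical):
|
|     import hashlib
|
|     # Hash the exact bytes of the document. Publishing this
|     # digest somewhere timestamped (e.g. a bitcoin OP_RETURN
|     # output) proves you held these exact bytes at that
|     # time, without revealing the document itself.
|     with open("plans.pdf", "rb") as f:
|         digest = hashlib.sha256(f.read()).hexdigest()
|
|     print(digest)  # 64 hex characters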
| Isamu wrote:
| Right, you can register before you bring a lawsuit. Pre-
| registration makes your claim stronger, as does notice of
| copyright.
| dataflow wrote:
| That's what I thought too, but why does the article say:
|
| > Infringement suits require that relevant works were first
| registered with the U.S. Copyright Office (USCO).
| throw646577 wrote:
| OK so it turns out I am wrong here! Cool.
|
| I have it upside down/diametrically wrong, however you see
| fit. Right that structures exist, exactly wrong on how they
| apply.
|
| It is registration that guarantees access to statutory
| damages:
|
| https://www.justia.com/intellectual-
| property/copyright/infri...
|
| Without registration you still have your natural copyright,
| but you would have to try to recover the profits made by
| the infringer.
|
| Which does sound like more of an uphill struggle for The
| Intercept, because OpenAI could maybe just say "anything we
| earn from this is de minimis considering how much errr
| similar material is errrr in the training set"
|
| Oh man it's going to take a long time for me to get my
| brain to accept this truth over what I'd always understood.
| zerocrates wrote:
| You have to register to sue, but you have the copyright
| automatically at the moment the work is created.
|
| You can go register after an infringement and still sue,
| but you then won't be able to get statutory damages or
| attorney's fees.
|
| Statutory damages are a big deal in general but especially
| here where proving how much of OpenAI's revenue is due to
| your specific articles is probably impossible. Which is why
| they're suing under this DMCA provision: it's not an
| infringement suit so the registration requirement doesn't
| apply, and there's a separate statutory damages provision
| for it.
| pera wrote:
| > _It's hard to see how they can argue with a straight face
| that it is accidental_
|
| It's another instance of "move fast, break things" (i.e.
| "keep your eyes shut while breaking the law at scale")
| renewiltord wrote:
| Yes, because all progress depends upon the unreasonable
| man.
| 0xcde4c3db wrote:
| The claim that's being allowed to proceed is under 17 USC 1202,
| which is about stripping metadata like the title and author. Not
| exactly "core copyright violation". Am I missing something?
| anamexis wrote:
| I read the headline as the copyright violation claim being core
| to the lawsuit.
| H8crilA wrote:
| The plaintiffs focused on exactly this part - removal of
| metadata - probably because it's the most likely to hold up
| in court. One judge remarked on it pretty explicitly,
| saying that it's just a proxy for the real issue: the use
| of copyrighted material in model training.
|
| I.e., it's some legalese trick, but "everyone knows" what's
| really at stake.
| 0xcde4c3db wrote:
| Yeah; I think that's essentially where the disconnect is
| rooted for me. It seems to me (a non-lawyer, to be clear)
| that it's _damn hard_ to make the case for model training
| necessarily being meat-and-potatoes "infringement" as
| things are defined in Title 17 Chapter 1. I see it as
| firmly in the grey area between "a mere change of physical
| medium or deterministic mathematical transformation clearly
| isn't a defense against infringement on its own" and
| "_giant toke_ come on, man, Terry Brooks was obviously just
| ripping off Tolkien". There might be a tension between
| what constitutes "substantial similarity" through analog
| and digital lenses, especially as the question pertains to
| those who actually distribute weights.
| kyledrake wrote:
| I think you're at the heart of it, and you've humorously
| framed the grey area here and it's very weird. Sans a
| ruling that, for example, computers are too deterministic
| to be creative, copyright laws really seem to imply that
| LLM training is legal. Learning and then creating
| something new from what you learned isn't copyright
| infringement, so what's the legal argument here? A ruling
| declaring this copyright infringement is likely going to
| have crazy ripple effects going way beyond LLMs,
| something a good judge is going to be very mindful of.
|
| Ultimately, this is probably going to require Congress to
| create new laws to codify this.
| mikae1 wrote:
| According to US law, is the Internet Archive a library? I
| know they received a DMCA exemption.
|
| If so, you could argue that your local library returns
| perfect copies of copyrighted works too. IMO it's somehow
| different from a business turning the results of their
| scraping into a profit machinery.
| kyledrake wrote:
| My understanding is that there is no concept of a library
| license: you just say you're a library and thereby become
| one, and whether your claim survives is more a product of
| social and cultural acceptance than of actual legal
| structures. Someone is welcome to correct me.
|
| The Internet Archive also scrapes the web for content and
| does not pay authors; the difference is that it spits out
| literal copies of the content it scraped, whereas an LLM
| fundamentally attempts to derive a new thing from the
| knowledge it obtains.
|
| I just can't figure out how to plug this into copyright
| law. It feels like a new thing.
| quectophoton wrote:
| Also, Google Translate, when used to translate web pages:
|
| > does not pay authors
|
| Check.
|
| > it spits out literal copies of the content it scraped
|
| Check.
|
| > attempts to derive a new thing from the knowledge it
| obtains.
|
| Check.
|
| * Is interactive: Check.
|
| * Can output text that sounds syntactically and
| grammatically correct, but a human can instantly say
| "that doesn't look right": Check.
|
| * Changing one word in a sentence affects words in a
| completely different sentence, because that changed the
| context: Check.
| CaptainFever wrote:
| Also, is there really any benefit to stripping author metadata?
| Was it basically a preprocessing step?
|
| It seems to me that it shouldn't really affect model quality
| all that much, should it?
|
| Also, in the amended complaint:
|
| > not to notify ChatGPT users when the responses they received
| were protected by journalists' copyrights
|
| Wasn't it already quite clear that as long as the articles
| weren't replicated, it wasn't protected? Or is that still being
| fought in this case?
|
| In the decision:
|
| > I agree with Defendants. Plaintiffs allege that ChatGPT
| has been trained on "a scrape of most of the internet,"
| Compl. ¶ 29, which includes massive amounts of information
| from innumerable sources on almost any given subject.
| Plaintiffs have nowhere alleged that the information in
| their articles is copyrighted, nor could they do so. When a
| user inputs a question into ChatGPT, ChatGPT synthesizes
| the relevant information in its repository into an answer.
| Given the quantity of information contained in the
| repository, the likelihood that ChatGPT would output
| plagiarized content from one of Plaintiffs' articles seems
| remote. And while Plaintiffs provide third-party statistics
| indicating that an earlier version of ChatGPT generated
| responses containing significant amounts of plagiarized
| content, Compl. ¶ 5, Plaintiffs have not plausibly alleged
| that there is a "substantial risk" that the current version
| of ChatGPT will generate a response plagiarizing one of
| Plaintiffs' articles.
| freejazz wrote:
| >Also, is there really any benefit to stripping author
| metadata? Was it basically a preprocessing step?
|
| Have you read 1202? It's all about hiding your infringement.
| Kon-Peki wrote:
| Violations of 17 USC 1202 can be punished pretty severely. It's
| not about just money, either.
|
| If, _during the trial_, the judge thinks that OpenAI is going
| to be found to be in violation, he can order all of OpenAI's
| computer equipment to be impounded. If OpenAI is found to be
| in violation, he can then order permanent destruction of the
| models, and OpenAI would have to start over from scratch in a
| manner that doesn't violate the law.
|
| Whether you call that "core" or not, OpenAI cannot afford to
| lose what's left of this lawsuit.
| zozbot234 wrote:
| > he can order all of OpenAI's computer equipment to be
| impounded.
|
| Arrrrr matey, this is going to be fun.
| Kon-Peki wrote:
| People have been complaining about the DMCA for 2+ decades
| now. I guess it's great if you are on the winning side. But
| boy does it suck to be on the losing side.
| immibis wrote:
| And normal people can't get on the winning side. I'm
| trying to get Github to DMCA my own repositories, since
| it blocked my account and therefore I decided it no
| longer has the right to host them. Same with Stack
| Exchange.
|
| GitHub's ignored me so far, and Stack Exchange explicitly
| said no (then I sent them an even broader legal request
| under GDPR).
| ralph84 wrote:
| When you uploaded your code to GitHub you granted them a
| license to host it. You can't use DMCA against someone
| who's operating within the parameters of the license you
| granted them.
| tremon wrote:
| Their stance is that GitHub revoked that license by
| blocking their account.
| Dylan16807 wrote:
| Is it?
|
| And what would connect those two things together?
| immibis wrote:
| GitHub's terms of service specify the license is granted
| as necessary to provide the service. Since the service is
| not provided they don't have a license.
| Dylan16807 wrote:
| Hosting the code is providing the service, whether you
| have a working account or not.
|
| Also, was this code open source? Your stack exchange
| contributions were open source, so they don't need any
| ToS-based permission in the first place. They have access
| under CC BY-SA.
| immibis wrote:
| It won't happen. Judges only order that punishment for the
| little guys.
| nickpsecurity wrote:
| " If OpenAI is found to be in violation, he can then order
| permanent destruction of the models and OpenAI would have to
| start over from scratch in a manner that doesn't violate the
| law."
|
| That is exactly why I suggested companies train some models
| on public domain and licensed data. That risk disappears or
| is very minimal. They could also be used for code and
| synthetic data generation without legal issues on the
| outputs.
| 3pt14159 wrote:
| The problem is that you don't get the same quality of data
| if you go about it that way. I love ChatGPT and I
| understand that we're figuring out this new media landscape
| but I really hope it doesn't turn out to neuter the models.
| The models are really well done.
| nickpsecurity wrote:
| If I steal money, I can get way more done than I do now
| by earning it legally. Yet you won't see me regularly
| dismissing legitimate jobs by posting comparisons to what
| my numbers would look like if I were stealing I.P.
|
| We must start with moral and legal behavior. Within that,
| we look at what opportunities we have. Then, we pick the
| best ones. Those we can't have are a side effect of the
| tradeoffs we've made (or tolerated) in our system.
| tremon wrote:
| That is OpenAI's problem, not their victims'.
| jsheard wrote:
| That's what Adobe and Getty Images are doing with their
| image generation models, both are exclusively using their
| own licensed stock image libraries so they (and their
| users) are on pretty safe ground.
| nickpsecurity wrote:
| That's good. I hope more do. This list has those doing it
| under the Fairly Trained banner:
|
| https://www.fairlytrained.org/certified-models
| pnut wrote:
| There would be a highly embarrassing walking back of such a
| ruling, when Sam Altman flexes his political network and
| effectively overrules it.
|
| He spends his time amassing power and is well positioned to
| plow over a speed bump like that.
| james_sulivan wrote:
| Meanwhile China is using everything available to train their AI
| models
| goatlover wrote:
| We don't want to be like China.
| tokioyoyo wrote:
| Fair. But I made a comment somewhere else that, if their
| models become better than ours, they'll be incorporated into
| products. Then we're back to being dependent on China for
| LLM model development as well, on top of manufacturing.
| Realistically that'll be banned because of National Security
| laws or something, but companies tend to choose the path of
| "best and cheapest" no matter what.
| paxys wrote:
| You think China is using uncensored news articles from western
| media to train its AI models?
| dmead wrote:
| Yes. And they're being marked as bad during the alignment
| process.
| warkdarrior wrote:
| For sure. The models are definitely safety-tuned after
| pre-training.
| zb3 wrote:
| Forecast: OpenAI and The Intercept will settle and OpenAI users
| will pay for it.
| jsheard wrote:
| Yep, the game plan is to keep settling out of court so that
| (they hope) no legal precedent is set that would effectively
| make their entire business model illegal. That works until they
| run out of money I guess, but they probably can't keep it up
| forever.
| echoangle wrote:
| Wouldn't the better method be to throw all your money at one
| suit you can make an example of, and try to win that one? You
| can't effectively settle every single suit if you have no
| realistic chance of winning; otherwise every single publisher
| on the internet will come and try to get their money.
| lokar wrote:
| Too high risk. Every year you can delay you keep lining
| your pockets.
| gr3ml1n wrote:
| That's a good strategy, but you have to have the right
| case. One where OpenAI feels confident they can win and
| establish favorable precedent. If the facts of the case
| aren't advantageous, it's probably not worth the risk.
| tokioyoyo wrote:
| Side question: why don't other companies get the same
| attention? Anthropic, xAI and others have deep pockets, and
| scraped the same data, I'm assuming? It could be a gold mine
| for all these news agencies to keep settling out of court to
| make a buck.
| ysofunny wrote:
| the very idea of "this digital asset is exclusively mine" cannot
| die soon enough
|
| let real physically tangible assets keep the exclusivity
| _problem_
|
| let's not undo the advantages unlocked by the digital internet;
| let us prevent a few from locking down this grand boon of digital
| abundance such that the problem becomes saturation of data
|
| let us say no to digital scarcity
| cess11 wrote:
| I think you'll find that most people aren't comfortable with
| this in practice. They'd like e.g. the state to be able to keep
| secrets, such as personal information regarding citizens and
| the stuff foreign spies would like to copy.
| jMyles wrote:
| Obviously we're all impacted in these perceptions by our
| bubbles, but it would surprise me at this particular moment
| in the history of US politics to find that most people favor
| the existence of the state at all, let alone its ability to
| keep secret personal information regarding citizens.
| goatlover wrote:
| Most people aren't anarchists, and think the state is
| necessary for complex societies to function.
| jMyles wrote:
| My sense is that the constituency of people who prefer
| deprecation of the US state is much larger than just
| anarchists.
| noitpmeder wrote:
| Sounds like you exist in some very insulated bubbles.
| warkdarrior wrote:
| Would this deprecation of the state include disbanding
| the police and the armed forces? I'm guessing the people
| who are for the deprecation of the state would answer
| quite differently if the question specified details of
| government functions.
| jMyles wrote:
| ...I mean, police are deeply unpopular in the American
| political consciousness, and have been since prior to
| their rebrand from "slave patrols" in the 19th century.
| Surely you recall that, only four years ago, millions of
| people took to the streets calling for a completion to
| the unfinished business of abolition?
|
| Obviously the armed forces are much less despised than
| the police. But given that private gun ownership is at an
| all-time high (with women and people of color -
| historically marginalized groups with regard to arms
| equality - making up the lion's share of the recent
| increase), I'm not sure that people are feeling
| particularly vulnerable to invasion either.
|
| Is the state really that popular in your circle? How do
| people express their esteem? Am I just missing it?
| cess11 wrote:
| Really? Are Food Not Bombs and the IWW that popular where
| you live?
| CaptainFever wrote:
| This is, in fact, the core value of the hacker ethos. _Hacker_
| News.
|
| > The belief that information-sharing is a powerful positive
| good, and that it is an ethical duty of hackers to share their
| expertise by writing open-source code and facilitating access
| to information and to computing resources wherever possible.
|
| > Most hackers subscribe to the hacker ethic in sense 1, and
| many act on it by writing and giving away open-source software.
| A few go further and assert that all information should be free
| and any proprietary control of it is bad; this is the
| philosophy behind the GNU project.
|
| http://www.catb.org/jargon/html/H/hacker-ethic.html
|
| Perhaps if the Internet didn't kill copyright, AI will.
| (Hyperbole)
|
| (Personally my belief is more nuanced than this; I'm fine with
| very limited copyright, but my belief is closer to yours than
| the current system we have.)
| ysofunny wrote:
| oh please, then, riddle me why my comment has -1 votes
| on "hacker" news
|
| which has indeed turned into "i-am-rich-cuz-i-own-tech-
| stock" news
| CaptainFever wrote:
| Yes, I have no idea either. I find it disappointing.
|
| I think people simply like it when data is liberated from
| corporations, but hate it when data is liberated from them.
| (Though this case is a corporation too so idk. Maybe just
| "AI bad"?)
| alwa wrote:
| I did not contribute a vote either way to your comment
| above, but I would point out that you get more of what you
| reward. Maybe the reward is monetary, like an author paid
| for spending their life writing books. Maybe it's smaller,
| more reputational or social--like people who generate
| thoughtful commentary here, or Wikipedia's editors, or
| hobbyists' forums.
|
| When you strip people's names from their words, as the
| specific count here charges; and you strip out any reason
| or even way for people to reward good work when they
| appreciate it; and you put the disembodied words in the
| mouth of a monolithic, anthropomorphized statistical model
| tuned to mimic a conversation partner... what type of
| thought is it that becomes abundant in this world you
| propose, of "data abundance"?
|
| In that world, the only people who still have incentive to
| create are the ones whose content has _negative_ value, who
| make things people otherwise wouldn't want to see:
| advertisers, spammers, propagandists, trolls... where's the
| upside of a world saturated with that?
| onetokeoverthe wrote:
| Creators freely sharing with attribution requested is
| different than creations being ruthlessly harvested and
| repurposed without permission.
|
| https://creativecommons.org/share-your-work/
| CaptainFever wrote:
| > A few go further and assert that all information should
| be free and any proprietary control of it is bad; this is
| the philosophy behind the GNU project.
|
| In this view, the ideal world is one where copyright is
| abolished (but not moral rights). So piracy is good, and
| datasets are also good.
|
| Asking creators to license their work freely is simply a
| compromise due to copyright unfortunately still existing.
| (Note that even if creators don't license their work
| freely, this view still permits you to pirate or mod it
| against their wishes.)
|
| (My view is not this extreme, but my point is that this
| view was, and hopefully is, still common amongst hackers.)
|
| I will ignore the moralizing words (eg "ruthless",
| "harvested" to mean "copied"). It's not productive to the
| conversation.
| onetokeoverthe wrote:
| If not respected, some Creators will strike, lay flat,
| not post, go underground.
|
| Ignoring moral rights of creators is the issue.
| CaptainFever wrote:
| Moral rights involve the attribution of works where
| reasonable and practical. Clearly doing so during
| inference is not reasonable or practical (you'll have to
| attribute all of humanity!) but attributing individual
| sources _is_ possible and _is_ already being done in
| cases like ChatGPT Search.
|
| So I don't think you actually mean moral rights, since
| it's not being ignored here.
|
| But the first sentence of your comment still stands
| regardless of what you meant by moral rights. To that,
| well... we're still commenting here, are we not? Despite
| it with almost 100% certainty being used to train AI.
| We're still here.
|
| And yes, funding is a thing, which I agree needs
| copyright for the most part unfortunately. But does
| training AI on, for example, a book really reduce the
| need to buy the book, if it is not reproduced?
|
| Remember, training is not just about facts, but about
| learning how humans talk, how _languages_ work, how books
| work, etc. Learning that won't reduce the book's
| economic value.
|
| And yes, summaries may reduce the value. But summaries
| already exist. Wikipedia, Cliff's Notes. I think the main
| defense is that you can't copyright facts.
| onetokeoverthe wrote:
| _we're still commenting here, are we not? Despite it
| with almost 100% certainty being used to train AI. We're
| still here_
|
| ?!?! Comparing and equating commenting to creative works.
| ?!?!
|
| These comments are NOT equivalent to the 17 full time
| months it took me to write a nonfiction book.
|
| Or an 8 year art project.
|
| When I give away _my_ work _I_ decide to whom and how.
| CaptainFever wrote:
| I have already covered these points in the latter
| paragraphs.
|
| You might want to take a look at
| https://www.gnu.org/philosophy/shouldbefree.en.html
| onetokeoverthe wrote:
| _I'll_ decide the distribution of _my_ work. Be it 100
| million unique views or NOT at all.
| CaptainFever wrote:
| If you don't have a proper argument, it's best not to
| distribute your comment at all.
| onetokeoverthe wrote:
| If saying it's _my_ work is not a "proper" argument,
| that says it all.
| CaptainFever wrote:
| Indeed, owner.
|
| Look, either actually read the link and refute the points
| within, or don't. But there's no use discussing anything
| if you're unwilling to even understand and seriously
| refute a single point being made here, other than
| repeating "mine, mine, mine".
| onetokeoverthe wrote:
| Read it. Lots of nots, and no respect.
|
| _In the process, [OpenAI] trained ChatGPT not to
| acknowledge or respect copyright, not to notify ChatGPT
| users when the responses they received were protected by
| journalists' copyrights, and not to provide attribution
| when using the works of human journalists_
| CaptainFever wrote:
| No, wrong link.
|
| https://news.ycombinator.com/item?id=42279218
| a57721 wrote:
| > freely sharing with attribution requested
|
| If I share my texts/sounds/images for free, harvesting and
| regurgitating them omits the requested attribution. Even
| the most permissive CC license (excluding CC0 public
| domain) still requires an attribution.
| AlienRobot wrote:
| I think an ethical hacker is someone who uses their expertise
| to help those without.
|
| How could an ethical hacker side with OpenAI, when OpenAI is
| using its technological expertise to exploit creators
| without?
| CaptainFever wrote:
| I won't necessarily argue against that moral view, but in
| this case it is two large corporations fighting. One has
| the power of tech, the other has the power of the state
| (copyright). So I don't think that applies in this case
| specifically.
| Xelynega wrote:
| Aren't you ignoring that common law is built on
| precedent? If they win this case, that makes it a lot
| easier for people whose copyright is being infringed on
| an individual level to get justice.
| CaptainFever wrote:
| You're correct, but I think many don't realize how many
| small model trainers and fine-tuners there are currently.
| For example, PonyXL, or the many models and fine-tunes on
| CivitAI made by hobbyists.
|
| So basically the reasoning is this:
|
| - NYT vs OpenAI: neither is disenfranchised
|
| - OpenAI vs individual creators: creators are
| disenfranchised
|
| - NYT vs individual model trainers: model trainers are
| disenfranchised
|
| - Individual model trainers vs individual creators:
| neither is disenfranchised
|
| And if only one can win, and since the view is that
| information should be free, it biases the argument
| towards the model trainers.
| AlienRobot wrote:
| What "information" are you talking about? It's a text and
| image generator.
|
| Your argument is that it's okay to scrape content when
| you are an individual. It doesn't change the fact those
| individuals are people with technical expertise using it
| to exploit people without.
|
| If they wrote a bot to annoy people but published how
| many people got angry about it, would you say it's okay
| because that is information?
|
| You need to draw the line somewhere.
| CaptainFever wrote:
| Text and images _are_ information, though.
|
| > If they wrote a bot to annoy people but published how
| many people got angry about it, would you say it's okay
| because that is information?
|
| Kind of? It's not okay, but not because it is usage of
| information without consent (this is the "information
| should free" part), but because it is intentionally and
| unnecessarily annoying and angering people (this is the
| "don't use the information for evil" part which I _think_
| is your position).
|
| "See? Similarly, even in your view, model trainers aren't
| bad because they're using data. They're bad in general
| because they're exploiting creatives."
|
| But why is it exploitative?
|
| "They're putting the creatives out of a job." But this
| applies to automation in general.
|
| "They're putting creatives out of a job, using data they
| created." This is the strongest argument for me. It does
| intuitively feel exploitative. However, there are several
| issues:
|
| 1. Not all models or datasets do that. For instance, no
| one is visibly getting paid to write comments on HN, or
| to write fanfics on the non-commercial fanfic site AO3.
| Since the data creators are not doing it as a job in the
| first place, it does not make sense to talk about them
| losing their job because of the very same data.
|
| 2. Not all models or datasets do that. For example, spam
| filters, AI classifiers. All of this can be trained from
| the entire Internet and not be exploitative because there
| is no job replacement involved here.
|
| 3. Some models already do that, and are already well and
| morally accepted. For example, Google Translate.
|
| 4. This may be resolved by going the other way and making
| more models open source (or even leaked), so more
| creatives can use it freely, so they can make use of the
| productive power.
|
| "Because they're using creatives' information without
| consent." But as mentioned, it's not about the
| information or consent. It's about what you do with the
| information.
|
| Finally, because this is a legal case, it's also
| important to talk about the morality of using the state
| to restrict people from using information freely, even if
| their use of the information is morally wrong.
|
| If you believe in free culture as in free speech, then it
| is wrong to restrict such a use using the law, even
| though we might agree it is morally wrong. But this
| really depends if you believe in free culture as in free
| speech in the first place, which is a debate much larger
| than this.
| Xelynega wrote:
| I don't understand what the "hacker ethos" could have to do
| with defending openai's blatant stealing of people's content
| for their own profit.
|
| Openai is not sharing their data(they're keeping it private
| to profit off of), so how could it be anywhere near the
| "hacker ethos" to believe that everyone else needs to hand
| over their data to openai for free?
| CaptainFever wrote:
| Following the "GNU-flavour hacker ethos" as described, one
| concludes that it is right for OpenAI to copy data without
| restriction, it is wrong for NYT to restrict others from
| using their data, and it is _also_ wrong for OpenAI to
| restrict the sharing of their model weights or outputs for
| training.
|
| Luckily, most people seem to ignore OpenAI's hypocritical
| TOS against sharing their outputs for training. I
| would go one step further and say that they should share
| the weights completely, but I understand there's practical
| issues with that.
|
| Luckily, we can kind of "exfiltrate" the weights by
| training on their output. Or wait for someone to leak it,
| like NovelAI did.
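|
| A minimal sketch of that "train on their output" idea (all
| names here are hypothetical; real distillation would use an
| actual API client and a training framework):
|
|     # Knowledge distillation: collect (prompt, output)
|     # pairs from the hosted model, then fine-tune a local
|     # model to imitate them. This approximates the model's
|     # behaviour; it does not literally recover the weights.
|     def distill(prompts, query_remote, fine_tune_local):
|         pairs = [(p, query_remote(p)) for p in prompts]
|         return fine_tune_local(pairs)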
| raincole wrote:
| Open AI scrapping copyrighted materials to make a proprietary
| model is the _exact_ opposite of what GNU promotes.
| CaptainFever wrote:
| As I mentioned in another comment:
|
| "Scrapping" (scraping) copyrighted materials is not the
| wrong thing to do.
|
| Making it proprietary is.
|
| It is important to be clear about what is wrong, so you
| don't accidentally end up fighting for copyright expansion,
| or fighting against open models.
| guerrilla wrote:
| Sure, as soon as people have an alternative way to survive.
| whywhywhywhy wrote:
| It's so weird to me seeing journalists complaining about
| copyright and people taking something they did.
|
| The whole of journalism is taking the acts of others and
| repeating them, why does a journalist claim they have the rights
| to someone else's actions when someone simply looks at something
| they did and repeats it.
|
| If no one else ever did anything, the journalist would have
| nothing to report, it's inherently about replicating the work and
| acts of others.
| barapa wrote:
| This is terribly unpersuasive
| PittleyDunkin wrote:
| > The whole of journalism is taking the acts of others and
| repeating them
|
| Hilarious (and depressing) that this is what people think
| journalists do.
| SoftTalker wrote:
| What is a "journalist?" It sounds old-fashioned.
|
| They are "content creators" now.
| echoangle wrote:
| That's a pretty narrow view of journalism. If you look into
| newspapers, it's not just a list of events but also opinion
| pieces, original research, reports etc. The main infringement
| isn't with the basic reporting of facts but with the original
| part that's done by the writer.
| razakel wrote:
| Or you could just not do illegal and/or immoral things that are
| worthy of reporting.
| hydrolox wrote:
| I understand that regulations exist and how there can be
| copyright violations, but shouldn't we be concerned that other..
| more lenient governments (mainly China) who are opposed to the US
| will use this to get ahead if OpenAI is significantly set back?
| fny wrote:
| No. OpenAI is suspected to be worth over $150B. They can
| absolutely afford to pay people for data.
|
| Edit: People commenting need to understand that $150B is the
| _discounted value of future revenues._ So... yes they can pay
| out... yes they will be worth less... and yes that's fair to
| the people who created the information.
|
| I can't believe there are so many apologists on HN for what
| amounts to vacuuming up peoples data for financial gain.
| suby wrote:
| OpenAI is not profitable, and to achieve what they have
| achieved they had to scrape basically the entire internet. I
| don't have a hard time believing that OpenAI could not exist
| if they had to respect copyright.
|
| https://www.cnbc.com/2024/09/27/openai-sees-5-billion-
| loss-t...
| jpalawaga wrote:
| technically open ai has respected copyright, except in the
| (few) instances they produce non-fair-use amounts of
| copyrighted material.
|
| dmca does not cover scraping.
| noitpmeder wrote:
| That's a good thing! If a company cannot rise to fame
| unless it violates laws, it should not exist.
|
| There is plenty of public domain text that could have
| taught a LLM English.
| suby wrote:
| I'm not convinced that the economic harm to content
| creators is greater than the productivity gains and
| accessibility of knowledge for users (relative to how
| competent it would be if trained just on public domain
| text). Personally, I derive immense value from ChatGPT /
| Claude. It's borderline life changing for me.
|
| As time goes on, I imagine that it'll increasingly be the
| case that these LLM's will displace people out of their
| jobs / careers. I don't know whether the harm done will
| be greater than the benefit to society. I'm sure the
| answer will depend on who it is that you ask.
|
| > That's a good thing! If a company cannot rise to fame
| unless it violates laws, it should not exist.
|
| Obviously given what I wrote above, I'd consider it a bad
| thing if LLM tech severely regressed due to copyright
| law. Laws are not inherently good or bad. I think you can
| make a good argument that this tech will be a net
| negative for society, but I don't think it's valid to do
| so just on the basis that it is breaking the law as it is
| today.
| DrillShopper wrote:
| > I'm not convinced that the economic harm to content
| creators is greater than the productivity gains and
| accessibility of knowledge for users (relative to how
| competent it would be if trained just on public domain
| text).
|
| Good thing whether or not something is a copyright
| violation doesn't depend on if you can make more money
| with someone else's work than they can.
| suby wrote:
| I understand the anger about large tech companies using
| others work without compensation, especially when both
| they and their users benefit financially. But this goes
| beyond economics. LLM tech could accelerate advances in
| medicine and technology. I strongly believe that we're
| going to see societal benefits in education, healthcare,
| especially mental health support thanks to this tech.
|
| I also think that someone making money off LLM's is a
| separate question from whether or not the original
| creator has been harmed. I think many creators are going
| to benefit from better tools, and we'll likely see new
| forms of creation become viable.
|
| We already recognize that certain uses of intellectual
| property should be permitted for society's benefit. We
| have fair use doctrine, patent compulsory licensing for
| public health, research exemptions, and public libraries.
| Transformative use is also permitted, and LLMs are
| inherently transformative. Look at the volume of data
| that they ingest compared to the final size of a trained
| model, and how fundamentally different the output format
| is from the input data.
|
| Human progress has always built upon existing knowledge.
| Consider how both Darwin and Wallace independently
| developed evolution theory at roughly the same time --
| not from isolation, but from building on the intellectual
| foundation of their era. Everything in human culture
| builds on what came before.
|
| That all being said, I'm also sure that this tech is
| going to negatively impact people too. Like I said in the
| other reply, whether or not this tech is good or bad will
| depend on who you ask. I just think that we should weigh
| these costs against the potential benefits to society as
| a whole rather than simply preserving existing systems,
| or blindly following the law as if the law is inherently
| just or good. Copyright law was made before this tech was
| even imagined, and it seems fair to now evaluate whether
| the current copyright regime makes sense if it turns out
| that it'd keep us in some local maximum.
| jsheard wrote:
| The OpenAI that is assumed to keep being able to harvest
| every form of IP without compensation is valued at $150B, an
| OpenAI that has to pay for data would be worth significantly
| less. They're currently not even expecting to turn a profit
| until 2029, and that's _without_ paying for data.
|
| https://finance.yahoo.com/news/report-reveals-
| openais-44-bil...
| mrweasel wrote:
| That's not real money though. You need actual cash on hand to
| pay for stuff, OpenAI only have the money they've been given
| by investors. I suspect that many of the investors wouldn't
| have been so keen if they knew that OpenAI would need an
| additional couple of billions a year to pay for data.
| __loam wrote:
| That's too bad that your business isn't viable without the
| largest single violation of copyright of all time.
| nickpsecurity wrote:
| That doesn't mean they have $150B to hand over. What you can
| cite is the $10 billion they got from Microsoft.
|
| I'm sure they could use a chunk of that to buy competitive
| I.P. for both companies to use for training. They can also
| pay experts to create it. They could even sell that to others
| for use in smaller models to finance creating or buying even
| more I.P. for their models.
| wvenable wrote:
| > I can't believe there are so many apologists on HN for what
| amounts to vacuuming up peoples data for financial gain.
|
| If you are consistently strict about this then almost
| everything we do is impossible without paying someone. You
| read books and now you have a job? Pay up. You've been
| vacuuming up people's data for years. You listened to music
| for the last 4 decades and produced "a unique" song? Yeah
| right.
|
| Copyright has done as much (or more) to limit human progress
| and expression than it has done to improve it.
|
| If ChatGPT is not literally copying works themselves, I don't
| even see how copyright applies.
| mongol wrote:
| The process of reading it into their training data is a way
| of copying it. It exists somewhere and they need to copy it
| in order to ingest it.
| wvenable wrote:
| By that logic you're violating copyright by using a web
| browser.
| Suppafly wrote:
| >By that logic you're violating copyright by using a web
| browser.
|
| You would be except for the fact that publishing stuff on
| the web gives people an implicit license to download it
| for the purposes of viewing it.
| Timwi wrote:
| Not sure about US or other jurisdictions, but that's not
| how any of this works in Germany. In Germany downloading
| anything from anywhere (even a movie) is never illegal
| and does not require a license. What's illegal is
| publishing/disseminating copyrighted content without
| authorization. BitTorrenting a movie is illegal because
| you're distributing it to other torrenters. Streaming a
| movie on your website is illegal because it's public. You
| can be held liable for using a photo from the web to
| illustrate your eBay auction, not because you downloaded
| it but because you republished it.
|
| OpenAI (and Google and everyone else) is creating a
| publicly-accessible system that produces output that
| could be derived from copyrighted material.
| Tomte wrote:
| > In Germany [...]
|
| That's confidently and completely wrong.
| __loam wrote:
| The nature of the copy does actually matter.
| CJefferson wrote:
| We can, and do, choose to treat normal people different
| from billion dollar companies that are attempting to suck
| up all human output and turn it into their own personal
| profit.
|
| If they were, say, a charity doing this for the good of
| mankind, I'd have more sympathy. Shame they never were.
| tolmasky wrote:
| The way to treat them differently is not by making them
| share profits with another corporation. The logical
| endgame of all this isn't "stopping LLMs," it's Disney
| happening to own a critical mass of IP to be able to
| legally train and run LLMs that make movies, firing all
| their employees, and no smaller company ever having a
| chance in hell with competing with a literal century's
| worth of IP powering a _generative_ model.
|
| The best part about all this is that Disney initially
| took off by... making use of public domain works.
| Copyright used to last 14 years. You'd be able to create
| derivative works of most the art in your life at some
| point. Now you're never allowed to. And more often than
| not, not to grant a monopoly to the "author", but to the
| corporation that hired them. The correct analysis
| shouldn't be OpenAI vs. Intercept or Disney or whomever.
| You're just choosing kings at that point.
| IsTom wrote:
| > produced "a unique" song?
|
| People do get sued for making songs that are too similar to
| previously made songs. One defence available is that
| they've never heard it themselves before.
|
| If you want to treat AI like humans then if AI output is
| similar enough to copyrighted material it should get sued.
| Then you try to prove that it didn't ingest the original
| version somehow.
| noitpmeder wrote:
| The fact that these lawsuits aren't as simple as "is my
| copyrighted work in your training set, yes or no" is
| boggling.
| __loam wrote:
| I feel like at some point the people in favor of this are
| going to realize that whether the data was ingested into
| a training set is completely immaterial to the fact that
| these companies downloaded data they don't have a license
| to use to a company server somewhere with the intention
| to use it for commercial use.
| GeoAtreides wrote:
| Ah yes, humans and LLMs are exactly the same, learning the
| same way, reasoning the same way, they're practically
| indistinguishable. So that's why it makes sense to equate
| humans reading books with computer programs ingesting and
| processing the equivalent of billions of books in literal
| days or months.
| Timwi wrote:
| While I agree with your sentiment in general, this thread
| is about the legal situation and your argument is
| unfortunately not a legal one.
| anileated wrote:
| "A person is fundamentally different from an LLM" does
| not need a legal argument and is implied by the fact that
| LLMs do not have human rights, or even anything
| comparable to animal rights.
|
| A legal argument would be needed to argue the other way.
| This argument would imply granting LLMs some degree of
| human rights, which the very industry profiting from
| these copyright violations will never let happen for
| obvious reasons.
| notahacker wrote:
| The other problem with the legal argument that it's "just
| like a person learning" is that corporations whose human
| employees have learned what copyrighted characters look
| like and then start incorporating them into their art are
| considered guilty of copyright violation, and don't get
| to deploy the "it's not an intentional copyright
| violation from someone who should have known better, it's
| just a tool outputting what the user requested"
| defence...
| DrillShopper wrote:
| > You read books and now you have a job? Pay up.
|
| It is disingenuous to imply the scale of someone buying
| books and reading them (for which the publisher and author
| are compensated) or borrowing them from the library and
| reading them (again, for which the publisher and author are
| compensated) is the same as the wholesale copying without
| permission or payment of anything not behind a pay wall on
| the Internet.
| dmead wrote:
| I'm more concerned that some people in the tech world are
| conflating Sam Altman's interest with the national interest.
| jMyles wrote:
| Am I jazzed about Sam Altman making billions? No.
|
| Am I even more concerned about the state having control over
| the future corpus of knowledge via this doomed-in-any-case
| vector of "intellectual property"? Yes.
|
| I think it will be easier to overcome the influence of
| billionaires when we drop the pretext that the state is a
| more primal force than the internet.
| dmead wrote:
| 100% disagree. "It'll be fine bro" is not a substitute for
| having a vote over policy decisions made by the government.
| What you're talking about has a name. It starts with F and
| was very popular in Italy in the early to mid 20th century.
| jMyles wrote:
| Rapidity of Godwin's law notwithstanding, I'm not
| disputing the importance of equity in decision-making.
| But this matter is more complex than that: it's obvious
| that the internet doesn't tolerate censorship even if it
| is dressed as intellectual property. I prefer an open and
| democratic internet to one policed by childish legacy
| states, the presence of which serves only (and only
| sometimes) to drive content into open secrecy.
|
| It seems particularly unfair to equate any questioning of
| the wisdom of copyright laws (even when applied in
| situations where we might not care for the defendant, as
| with this case) with fascism.
| dmead wrote:
| It's not Godwin's law when it's correct. Just because
| it's cool and on the Internet doesn't mean you get to
| throw out people's stake in how their lives are run.
| jMyles wrote:
| > throw out people's stake in how their lives are run
|
| FWIW, you're talking to a professional musician.
| Ostensibly, the IP complex is designed to protect me. I
| cannot fathom how you can regard it as the "people's
| stake in how their lives are run". Eliminating copyright
| will almost certainly give people more control over their
| digital lives, not less.
|
| > It's not Godwin's law when it's correct.
|
| Just to be clear, you are doubling down on the claim that
| sunsetting copyright laws is tantamount to nazism?
| dmead wrote:
| Not at all. Go re-read above.
| astrange wrote:
| Easy to turn one into the other, just get someone to leak the
| model weights.
| worble wrote:
| Should we also be concerned that other governments use slave
| labor (among other human rights violations) and will use that
| to get ahead?
| logicchains wrote:
| It's hysterical to compare training an ML model with slave
| labour. It's perfectly fine and accepted for a human to read
| and learn from content online without paying anything to the
| author when that content has been made available online for
| free, it's absurd to assert that it somehow becomes a human
| rights violation when the learning is done by a non-
| biological brain instead.
| Kbelicius wrote:
| > It's hysterical to compare training an ML model with
| slave labour.
|
| Nobody did that.
|
| > It's perfectly fine and accepted for a human to read and
| learn from content online without paying anything to the
| author when that content has been made available online for
| free, it's absurd to assert that it somehow becomes a human
| rights violation when the learning is done by a non-
| biological brain instead.
|
| It makes sense. There is always scale to consider in these
| things.
| totallykvothe wrote:
| worble literally did make that comparison. It is possible
| for comparisons to be made using other rhetorical devices
| than just saying "I am comparing a to b".
| Terr_ wrote:
| > worble literally did make that comparison
|
| No, their mention of "slave labor" is not a comparison to
| how LLMs work, nor an assertion of moral equivalence.
|
| Instead it is just one example to demonstrate that
| chasing economic/geopolitical competitiveness is not a
| _carte blanche_ to adopt practices that might be immoral
| or unjust.
| devsda wrote:
| Get ahead in terms of what? Do you believe that the material in
| public domain or legally available content that doesn't violate
| copyrights is not enough to research AI/LLMs or is the concern
| about purely commercial interests?
|
| China also supposedly has abusive labor practices. So, should
| other countries start relaxing their labor laws to avoid
| falling behind ?
| mu53 wrote:
| Isn't it a greater risk that creators lose their income and
| nobody is creating the content anymore?
|
| Take for instance what has happened with news because of the
| internet. Not exactly the same, but similar forces at work. It
| turned into a race to the bottom with everyone trying to
| generate content as cheaply as possible to get maximum
| engagement with tech companies siphoning revenue. Expensive,
| investigative pieces from educated journalists disappeared in
| favor of stuff that looks like spam. Pre-Internet news was
| higher quality.
|
| Imagine that same effect happening for all content? Art,
| writing, academic pieces. It's a real risk that OpenAI has
| peaked in quality.
| CuriouslyC wrote:
| Lots of people create without getting paid to do it. A lot of
| music and art is unprofitable. In fact, you could argue that
| when the mainstream media companies got completely captured
| by suits with no interest in the things their companies
| invested in, that was when creativity died and we got
| consigned to genre-box superhero pop hell.
| eastbound wrote:
| I don't know. When I look at news from before, there never
| was investigative journalism. It was all opinion-swaying
| editorials, until alternate voices voiced their
| counternarratives. It's just not in newspapers because they
| are too politically biased to produce the two sides of
| stories that we've always asked them for. It's on other
| media.
|
| But investigative journalism has not disappeared. If
| anything, it has grown.
| mu53 wrote:
| It's changed. Investigative journalism is done by non-
| profits specializing in it, who have various financial
| motives.
|
| The budgets at newspapers used to be much larger and fund
| more investigative journalism with a clearer motive.
| BeFlatXIII wrote:
| > Isn't it a greater risk that creators lose their income and
| nobody is creating the content anymore?
|
| There are already multiple lifetimes of quality content out
| there. It's difficult to get worked up about the potential
| future losses.
| immibis wrote:
| Absolutely: if copyright is slowing down innovation, we should
| abolish copyright.
|
| Not just turn a blind eye when it's the right people doing it.
| They don't even have a legal exemption passed by Congress -
| they're just straight-up breaking the law and getting away with
| it. Which is how America works, I suppose.
| JoshTriplett wrote:
| Exactly. They rushed to violate copyright on a massive scale
| _quickly_ , and now are making the argument that it shouldn't
| apply to them and they couldn't possibly operate in
| compliance with it. As long as humans don't get to ignore
| copyright, AI shouldn't either.
| Filligree wrote:
| Humans do get to ignore copyright, when they do the same
| thing OpenAI has been doing.
| slyall wrote:
| Exactly.
|
| Should I be paying a proportion of my salary to all the
| copyright holders of the books, song, TV shows and movies
| I consumed during my life?
|
| If a Hollywood writer says she "learnt a lot about
| writing by watching the Simpsons" will Fox have an
| additional claim on her earnings?
| dijksterhuis wrote:
| > Should I be paying a proportion of my salary to all the
| copyright holders of the books, song, TV shows and movies
| I consumed during my life?
|
| you already are.
|
| a proportion of what you pay for books, music, tv shows,
| movies goes to rights holders already.
|
| any subscription to spotify/apple music/netflix/hbo; any
| book/LP/CD/DVD/VHS; any purchased digital download ... a
| portion of those sales is paid back to rights holders.
|
| so... i'm not entirely sure what your comment is trying
| to argue for.
|
| are you arguing that you should get paid a rebate for
| your salary that's already been spent on copyright
| payments to rights holders?
|
| > If a Hollywood writer says she "learnt a lot about
| writing by watching the Simpsons" will Fox have an
| additional claim on her earnings?
|
| no. that's not how copyright functions.
|
| the actual episodes of the simpsons are the copyrighted
| work.
|
| broadcasting/allowing purchases of those episode incurs
| the copyright as it involves COPYING the material itself.
|
| COPYright is about the rights of the rights holder when
| their work is COPIED, where a "work" is the material
| which the copyright applies to.
|
| merely mentioning the existence of a tv show involves
| zero copying of a registered work.
|
| being inspired by another TV show to go off and write
| your own tv show involves zero copying of the work.
|
| a hollywood writer rebroadcasting a simpsons episode during
| a TV interview would be a different matter. same with the
| hollywood writer just taking scenes from a simpsons
| episode and putting it into their film. that's COPYing
| the material.
|
| ---
|
| when it comes to open AI, obviously this is a legal gray
| area until courts start ruling.
|
| but the accusations are that OpenAi COPIED the
| intercept's works by downloading them.
|
| openAi transferred the work to openAi servers. they made
| a copy. and now openAi are profiting from that copy of
| the work that they took, without any permission or
| remuneration for the rights holder of the copyrighted
| work.
|
| essentially, openAI did what you're claiming is the
| status quo for you... but it's not the status quo for
| you.
|
| so yeah, your comment confuses me. hopefully you're being
| sarcastic and it's just gone completely over my head.
| slyall wrote:
| The problem is that the anti-AI people are going after
| several steps in the chain (and often they are vague
| about which ones they are talking about at any point).
|
| As well as the "copying" of content, some are also
| claiming that the output of an LLM should result in
| royalties being paid back to the owners of the material
| used in training.
|
| So if an AI produces a sitcom script then the copyright
| holders of those tv shows it ingested should get paid
| royalties, in addition to the money paid to copy files
| around.
|
| Which leads to the precedent that if a writer creates a
| sitcom then the copyright holders of sitcoms she watched
| should get paid for "training" her.
| jashmatthews wrote:
| When humans learn and copy too closely we call that
| plagiarism. If an LLM does it how should we deal with
| that?
| chii wrote:
| > If an LLM does it how should we deal with that?
|
| why not deal with it the same way as humans have been
| dealt with in the past?
|
| If you copied an art piece using photoshop, you would've
| violated copyright. Photoshop (and adobe) itself never
| committed copyright violations.
|
| Somehow, if you swap photoshop with openAI and chatGPT,
| then people claim that the actual application itself is a
| copyright violation.
| dijksterhuis wrote:
| this isn't the same.
|
| > If you copied an art piece using photoshop, you
| would've violated copyright. Photoshop (and adobe) itself
| never committed copyright violations.
|
| the COPYing is happening on your local machine with non-
| cloud versions of Photoshop.
|
| you are making a copy, using a tool, and then
| distributing that copy.
|
| in music royalty terms, the making a copy is the
| Mechanical right, while distributing the copy is the
| Performing right.
|
| and you are liable in this case.
|
| > Somehow, if you swap photoshop with openAI and chatGPT,
| then people claim that the actual application itself is a
| copyright violation
|
| OpenAI make a copy of the original works to create
| training data.
|
| when the original works are reproduced verbatim
| (memorisation in LLMs is a thing), then that is the
| copyrighted work being distributed.
|
| mechanical and performing rights, again.
|
| but the twist is that ChatGPT does the copying on their
| servers and delivers it to your device.
|
| they are creating a new copy and distributing that copy.
|
| which makes them liable.
|
| --
|
| you are right that "ChatGPT" is just a tool.
|
| however, the interesting legal grey area with this is --
| are ChatGPT model weights an _encoded copy_ of the
| copyrighted works?
|
| that's where the conversation about the tool itself being
| a copyright violation comes in.
|
| photoshop provides no mechanism to recite The Art Of War
| out of the box. an LLM could be trained to do so (like,
| it's a hypothetical example but hopefully you get the
| point).
| chii wrote:
| > OpenAI make a copy of the original works to create
| training data.
|
| if a user is allowed to download said copy to view on
| their browser, why isn't that same right given to openAI
| to download a copy to view for them? What openAI chooses
| to do with the viewed information is up to them - such as
| distilling summary statistics, or whatever.
|
| > are ChatGPT model weights an encoded copy of the
| copyrighted works? that is indeed the most interesting
| legal gray area. I personally believe that it is not. The
| information distilled from those works does not
| constitute any copyrightable information, as it is not
| literary, but informational.
|
| It's irrelevant that you could recover the original works
| from these weights - you could recover the same original
| works from the digits of pi!
| dijksterhuis wrote:
| heads up: you may want to edit your second quote
|
| --
|
| > if a user is allowed to download said copy to view on
| their browser, why isn't that same right given to openAI
| to download a copy to view for them?
|
| whether you can download a copy from your browser doesn't
| matter. whether the work is registered as copyrighted
| does (and following on from that, who is distributing the
| work - aka allowing you to download the copy - and for
| what purposes).
|
| from the article (on phone, cba to grab a quote) it makes
| clear that the Intercept's works were _not registered_
| with the US Copyright Office.
|
| ergo, the intercept can't sue over copying those works --
| in the US, registration is a prerequisite for an
| infringement claim, even though copyright technically
| exists from creation ...
|
| (they cannot remove DMCA attribution information when
| distributing copies of the works though, which is what
| the case is now about.)
|
| but for all the other _registered_ works that OpenAI has
| downloaded, copied into training data, and which the
| model then reproduces as a memorised copy -- that is
| copyright infringement.
|
| like, in case it's not clear, i've been responding to
| what people are saying about copyright specifically. not
| this specific case.
|
| > The information distilled from those works does not
| constitute any copyrightable information, as it is not
| literary, but informational.
|
| that's one argument.
|
| my argument would be it is a form of
| compression/decompression when the model weights result
| in memorised (read: overfitted) training data being
| regurgitated verbatim.
|
| put the specific prompt in, you get the decompressed copy
| out the other end.
|
| it's like a zip file you download with a new album of
| music. except, in this case, instead of double clicking
| on the file you have to type in a prompt to get the
| decompressed audio files (or text in LLM case)
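|
| a toy sketch of that analogy in python (purely
| illustrative -- a real transformer doesn't store a lookup
| table, but an overfitted one behaves like this; the
| prompt and memorised text here are made up):
|
|     # memorised (overfitted) training examples act like a
|     # compressed archive keyed by prompt
|     memorised = {
|         "recite the opening of the art of war":
|             "Sun Tzu said: The art of war is of vital "
|             "importance to the State.",
|     }
|
|     def generate(prompt: str) -> str:
|         # a healthy model generalises; an overfitted one
|         # returns the training text verbatim
|         return memorised.get(prompt, "<novel completion>")
|
|     # the "decompression" step: right prompt in, copy out
|     print(generate("recite the opening of the art of war"))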
|
| > It's irrelevant that you could recover the original
| works from these weights - you could recover the same
| original works from the digits of pi!
|
| actually, that's the whole point of courts ruling on
| this.
|
| the boundaries of what is considered reproduction are the
| question. it is up to the courts to decide on the red
| lines (probably blurry gray areas for a while).
|
| if i specifically ask a model to reproduce an exact
| song... is that different to the model doing it
| accidentally?
|
| i don't think so. but a court might see it differently.
|
| as someone who worked in music copyright, is a musician,
| sees the effects of people stealing musicians' efforts all
| the time, i hope the little guys come out of this on top.
|
| sadly, they usually don't.
| dijksterhuis wrote:
| i've been avoiding replying to your comment for a bit,
| and now i realised why.
|
| edit: i am so sorry about the wall of text.
|
| > some are also claiming that the output of an LLM should
| result in royalties being paid back to the owners of the
| material used in training.
|
| > So if an AI produces a sitcom script then the copyright
| holders of those tv shows it ingested should get paid
| royalties, in addition to the money paid to copy files
| around.
|
| what you're talking about here is the concept of
| "derivative works" made from other, source works.
|
| this is subtly different to reproduction of a work.
|
| see the last half of this comment for my thoughts on the
| interesting thing courts need to work out regarding
| verbatim reproduction:
| https://news.ycombinator.com/item?id=42282003
|
| in the derivative works case, it's slightly different.
|
| sampling in music is the best example i've got for this.
|
| if i take four popular songs, cut 10 seconds of each, and
| then join each of the bits together to create a new track
| -- that is a new, derivative work.
|
| but i have not sufficiently modified the source works.
| they are clearly recognisable. i am just using
| copyrighted material in a really obvious way. the core of
| my "new" work is actually just four reproductions of the
| work of other people.
|
| in that case -- that derivative work, under music
| copyright law, requires the original copyright rights
| holders to be paid for all usage and copying of their
| works.
|
| basically, a royalty split gets agreed, or there's a
| court case. and then there's a royalty split anyway
| (probably some damages too).
|
| in my case, when i make music with samples, i make sure i
| mangle and process those samples until the source work is
| no longer recognisable. i've legit made it part of my
| workflow.
|
| it's no longer the original copyrighted work. it's
| something completely new and fully unrecognisable.
|
| the issue with LLMs, not just ChatGpt, is that they will
| reproduce both verbatim and recognisably similar output
| to original source works.
|
| the original source copyrighted work is clearly
| recognisable, even if not an exact verbatim copy.
|
| and that's what you've probably seen folks talking about,
| at least it sounds like it to me.
|
| > Which leads to the precedent that if a writer creates a
| sitcom then the copyright holders of sitcoms she watched
| should get paid for "training" her.
|
| robin thicke "blurred lines" --
|
| * https://en.m.wikipedia.org/wiki/Pharrell_Williams_v._Br
| idgep...
|
| * https://en.m.wikipedia.org/wiki/Blurred_Lines (scroll
| down)
|
| yes, there is already some very limited precedent, at
| least for a narrow specific case involving sheet music in
| the US.
|
| the TL;DR IANAL version of the question at hand in the
| case was "did the defendants write the song with the
| intention of replicating a hook from the plaintiff's
| work".
|
| the jury decided, yes they did.
|
| this is different to your example in that they
| _specifically set out to replicate that musical component
| of a song_.
|
| in your example, you're talking about someone having
| "watched" a thing one time and then having to pay
| royalties to those people as a result.
|
| that's more akin to "being inspired by", which is
| protected under US law _i think_ (IANAL). it came up in
| blurred lines too:
| https://en.m.wikipedia.org/wiki/Idea%E2%80%93expression_dist...
|
| again, the red line of infringement / not infringement is
| ultimately up to the courts to rule on.
|
| --
|
| anyway, this is very different to what openAi/chatGpt is
| doing.
|
| openAi takes the works. chatgpt edits them according to
| user requests (feed forward through the model). then the
| output is distributed to the user. and that output could
| be considered to be a derivative work (see massive amount
| of text i wrote above, i'm sorry).
|
| LLMs aren't sitting there going "i feel like recreating a
| marvin gaye song". it takes data, encodes/decodes it,
| then produces an output. it is a mechanical process, not
| a creative one. there are no ideas here. no inspiration
| or expression.
|
| an LLM is not a human being. it is a tool, which creates
| outputs that are often strikingly similar to source
| copyrighted works.
|
| their users might be specifically asking to replicate
| songs though. in which case, openAi could be facilitating
| copyright infringement (whether through derivative works
| or not).
|
| and that's an interesting legal question by itself. are
| they facilitating the production of derivative works
| through the copying of copyrighted source works?
|
| i would say they are. and, in some cases, the derivative
| works are obviously derived.
| Suppafly wrote:
| >a proportion of what you pay for books, music, tv shows,
| movies goes to rights holders already.
|
| When I borrow a book from a friend, how do the original
| authors get paid for that?
| dijksterhuis wrote:
| they don't.
|
| borrowing a book is not creating a COPY of the book. you
| are not taking the pages, reproducing all of the text on
| those pages, and then giving that reproduction to your
| friend.
|
| that is what a COPY is. borrowing the book is not a COPY.
| you're just lending them the thing you already bought. it
| is a temporary transfer of possession, not a copy.
|
| if you were copying the files from a digitally downloaded
| album of music and giving those new copies to your friend
| (music royalties were my specialty) then technically you
| would be in breach of copyright. you have copied the
| works.
|
| but because it's such a small scale (an individual with
| another individual) it's not going to be financially
| worth it to take the case to court.
|
| so copyright holders just cut their losses with one
| friend sharing it with another friend, and focus on other
| infringements instead.
|
| which is where the whole torrenting thing comes in. if i
| can track 7000 people who have all downloaded the same
| torrented album, now i can just send a letter / court
| date to those 7000 people.
|
| the costs of enforcement are reduced because of scale:
| 7000 people, all caught doing the same thing, in a way
| that can be tracked.
|
| and then the ultimate: one person/company has downloaded
| the works and is making them available for others to
| download, without paying for the rights to make copies
| when distributing.
|
| that's the goldmine for copyright infringement lawsuits.
| and it sounds suspiciously like openAi's business model.
| __loam wrote:
| Yeah it turns out humans have more rights than computer
| programs and tech startups.
| triceratops wrote:
| So make OpenAI sleep 8 hours a day, pay income and
| payroll taxes with the same deductions as a natural human
| etc...
| immibis wrote:
| Copying copyrighted works?
| chii wrote:
| learning, and extracting useful information from
| copyrighted works.
|
| That extracted information cannot and should not be
| copyrightable.
| azemetre wrote:
| If you're arguing that OpenAI should be compelled to make
| all their technology and models free then I think we all
| agree, but it sounds like you're trying to weasel your
| way into letting a corpo get away with breaking the law
| while running away with billions.
| catlifeonmars wrote:
| That's really expensive to do, so in practice only
| wealthy humans or corporations can do so. Still seems
| unfair.
| treyd wrote:
| ChatGPT doesn't violate copyright, it's a software
| application. "Open"AI does, it's a company run by humans
| (for now).
| tpmoney wrote:
| > they're just straight-up breaking the law and getting away
| with it.
|
| So far this has not been determined and there's plenty of
| reasonable arguments that they are not breaking copyright
| law.
| blackqueeriroh wrote:
| > Absolutely: if copyright is slowing down innovation, we
| should abolish copyright.
|
| Is this sarcasm?
| immibis wrote:
| No. If something slows down innovation and suffocates the
| economy, why would you (an economically minded politician)
| keep it?
| noitpmeder wrote:
| Because the world shouldn't be run primarily by
| economically minded politicians??
|
| I'm sure China gets competitive advantages from their use
| of indentured and slave-like labor forces, and mass
| reeducation programs in camps. Should the US allow these
| things to happen? What if a private business starts doing
| the same?
|
| But remember, they're just trying to compete with China
| on a fair playing field, so everything is permitted,
| right?
| redwall_hp wrote:
| You might want to look at the constitutional amendment
| enshrining slave labor "as a punishment for a crime," and
| the world's largest prison population. Much of your food
| supply has links to prison labor.
|
| https://apnews.com/article/prison-to-plate-inmate-labor-
| inve...
|
| But don't worry, it's not considered "slave labor"
| because there's a nominal wage of a few pennies involved
| and it's not technically "forced." You just might be
| tortured with solitary confinement if you don't do it.
|
| We need to point fewer fingers and clean up the problems
| here.
| bogwog wrote:
| This type of argument is ignorant, cowardly, shortsighted, and
| regressive. Both technology and society will progress when we
| find a formula that is sustainable and incentivizes everyone
| involved to maximize their contributions without it all blowing
| up in our faces someday. Copyright law is far from perfect, but
| it protects artists who want to try and make a living from
| their work, and it incentivizes creativity that places without
| such protections usually end up just imitating.
|
| When we find that sustainable framework for AI, China or
| <insert-boogeyman-here> will just end up imitating it. Idk what
| harms you're imagining might come from that ("get ahead" is too
| vague to mean anything), but I just want to point out that that
| isn't how you become a leader in anything. Even worse, if
| _they_ are the ones who find that formula first while we take
| shortcuts to "get ahead", then we will be the ones doing the
| imitation in the end.
| gaganyaan wrote:
| Copyright is a dead man walking and that's a good thing.
| Let's applaud the end of a temporary unnatural state of
| affairs.
| Andrex wrote:
| Care to make it interesting?
|
| What do you consider "dead" and what do you consider a
| reasonable timeframe for this to occur?
|
| I have 60 or so years and $50.
| bdangubic wrote:
| I am in as well, I have 50 or so years and $60 (though
| would gladly put $600k on this... :) )
| CJefferson wrote:
| If OpenAI wants copyright to be dead, then they could just
| give out all their models copyright free.
| FpUser wrote:
| Shall we install the emperor then?
| quarterdime wrote:
| Interesting. Two key quotes:
|
| > It is unclear if the Intercept ruling will embolden other
| publications to consider DMCA litigation; few publications have
| followed in their footsteps so far. As time goes on, there is
| concern that new suits against OpenAI would be vulnerable to
| statute of limitations restrictions, particularly if news
| publishers want to cite the training data sets underlying
| ChatGPT. But the ruling is one signal that Loevy & Loevy is
| narrowing in on a specific DMCA claim that can actually stand up
| in court.
|
| > Like The Intercept, Raw Story and AlterNet are asking for
| $2,500 in damages for each instance that OpenAI allegedly removed
| DMCA-protected information in its training data sets. If damages
| are calculated based on each individual article allegedly used to
| train ChatGPT, it could quickly balloon to tens of thousands of
| violations.
|
| Tens of thousands of violations at $2500 each would amount to
| tens of millions of dollars in damages. I am not familiar with
| this field, does anyone have a sense of whether the total cost of
| retraining (without these alleged DMCA violations) might compare
| to these damages?
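|
| A rough back-of-envelope in Python, with every figure
| assumed for illustration (the article only says "tens of
| thousands" of violations):
|
|     violations = 30_000    # assumed: "tens of thousands"
|     per_violation = 2_500  # damages sought per instance
|     print(f"${violations * per_violation:,}")  # $75,000,000
|
| Any comparison against retraining would need cost figures
| OpenAI doesn't publish, so treat it as guesswork.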
| Xelynega wrote:
| If you're going to retrain your model because of this ruling,
| wouldn't it make sense to remove _all_ DMCA-protected
| content from your training data instead of just the works
| you were most recently sued over (especially if it sets a
| precedent)?
| jsheard wrote:
| It would make sense from a legal standpoint, but I don't
| think they could do that without massively regressing
| their models' performance, to the point that it would
| jeopardize their viability as a company.
| zozbot234 wrote:
| They might make it work by (1) having lots of public domain
| content, for the purpose of training their models on basic
| language use, and (2) preserving source/attribution
| metadata about what copyrighted content they do use, so
| that the models can surface this attribution to the user
| during inference. Even if the latter is not 100% foolproof,
| it might still be useful in most cases and show good faith
| intent.
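|
| A minimal sketch of what per-document provenance might
| look like (all field names hypothetical, purely to
| illustrate the idea):
|
|     from dataclasses import dataclass
|
|     @dataclass
|     class TrainingDoc:
|         text: str
|         source_url: str    # where the text was fetched
|         license: str       # "public-domain", "CC-BY-4.0", ...
|         retrieved_at: str  # ISO date of the crawl
|
|     doc = TrainingDoc(text="...",
|                       source_url="https://example.com/article",
|                       license="CC-BY-4.0",
|                       retrieved_at="2023-03-01")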
| CaptainFever wrote:
| The latter one is possible with RAG solutions like
| ChatGPT Search, which do already provide sources! :)
|
| But for inference in general, I'm not sure it makes too
| much sense. Training data is not just about learning
| facts, but also (mainly?) about how language works, how
| people talk, etc. Which is kind of too fundamental to be
| attributed to, IMO. (Attribution: Humanity)
|
| But who knows. Maybe it _can_ be done for more fact-like
| stuff.
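|
| A toy sketch of that RAG-style attribution, with naive
| keyword overlap standing in for a real embedding search
| (corpus and query are made up):
|
|     corpus = {
|         "https://example.org/a":
|             "the court allowed the dmca claim to proceed",
|         "https://example.org/b":
|             "fair use weighs four statutory factors",
|     }
|
|     def retrieve(query, k=1):
|         words = set(query.lower().split())
|         return sorted(
|             corpus.items(),
|             key=lambda kv: -len(words & set(kv[1].split())))[:k]
|
|     for url, passage in retrieve("did the court allow the dmca claim?"):
|         print(f"According to {url}: {passage}")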
| TeMPOraL wrote:
| > _Training data is not just about learning facts, but
| also (mainly?) about how language works, how people talk,
| etc._
|
| All of that and more, all at the same time.
|
| Attribution at inference level is bound to work more or
| less the same way as humans attribute things during
| conversations: "As ${attribution} said, ${some quote}",
| or "I remember reading about it in ${attribution-1} -
| ${some statements}; ... or maybe it was in
| ${attribution-2}?...". Such attributions are often wrong,
| as people hallucinate^Wmisremember where they saw or
| heard something.
|
| RAG obviously can work for this, as well as other
| solutions involving retrieving, finding or confirming
| sources. That's just like when a human actually looks up
| the source when citing something - and has similar
| caveats and costs.
| CaptainFever wrote:
| That sounds about right. When I ask ChatGPT about "ought
| implies can" for example, it cites Kant.
| noitpmeder wrote:
| On this point, I'm sure there is more than enough
| publicly and freely usable content to "learn how
| language works". There is no need to hoover up private or
| license-unclear content if that is your goal.
| CaptainFever wrote:
| I would actually love it if that was true. It would
| reduce a lot of legal headaches for sure. But if that was
| true, why were previous GPT versions not as good at
| understanding language? I can only conclude that it's
| because that's not actually true. There isn't enough
| digital public domain material to train an LLM to
| understand language competently.
|
| Perhaps old texts in physical form, then? It'd cost a
| lot to digitize those, wouldn't it? And it wouldn't really
| be accessible to AI hobbyists. Unless the digitization is
| publicly funded or something.
|
| (A big part of this is also how insanely long copyright
| lasts (nearly a hundred years!) that keeps most of the
| Internet's material from being public domain in the first
| place, but I won't belabour that point here.)
|
| Edit:
|
| Fair enough, I can see your point. "Surely it is cheaper
| to digitize old texts or buy a license to Google Books
| than to potentially lose a court case? Either OpenAI
| really likes risking it to save a bit of money, or they
| really wanted facts not contained in old texts."
|
| And yeah, I guess that's true. I _could_ say "but facts
| aren't copyrightable" (which was supported by the judge's
| decision from the TFA), but then that's a different
| debate about whether or not people should be able to own
| facts. Which does have some inroads (e.g. a right against
| being summarized because it removes the reason to read
| original news articles).
| Xelynega wrote:
| I agree, just want to make sure "they can't stop doing
| illegal things or they wouldn't be a success" is said out
| loud instead of left to subtext.
| CuriouslyC wrote:
| They can't stop doing things some people don't like
| (people who also won't stop doing things other people
| don't like). The legality of the claims is questionable
| which is why most are getting thrown out, but we'll see
| if this narrow approach works out.
|
| I'm sure there are also a number of easy technical ways
| to "include" the metadata while mostly ignoring it during
| training that would skirt the letter of the law if
| needed.
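|
| One such trick might be loss-masking: keep the
| attribution tokens in the sequence but zero their weight
| in the training loss. A toy sketch (assumes PyTorch;
| shapes and the 3-token metadata prefix are made up):
|
|     import torch
|     import torch.nn.functional as F
|
|     logits = torch.randn(1, 8, 100)   # (batch, seq, vocab)
|     targets = torch.randint(0, 100, (1, 8))
|     meta_len = 3                      # first 3 tokens are metadata
|
|     loss_mask = torch.ones(1, 8)
|     loss_mask[:, :meta_len] = 0.0     # present, but never learned from
|
|     per_tok = F.cross_entropy(logits.transpose(1, 2), targets,
|                               reduction="none")
|     loss = (per_tok * loss_mask).sum() / loss_mask.sum()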
| Xelynega wrote:
| If we really want to be technical, in common law systems
| anything is legal as long as the highest court to
| challenge it decides it's legal.
|
| I guess I should have used the phrase "common sense
| stealing in any other context" to be more precise?
| krisoft wrote:
| > I guess I should have used the phrase "common sense
| stealing in any other context" to be more precise?
|
| Clearly not common sense stealing. The Intercept was not
| deprived of their content. If OpenAI had sneaked into
| their office and server farm and taken all the hard
| drives and paper copies with the content, that would be
| "common sense stealing".
| TheOtherHobbes wrote:
| Very much common sense copyright violation though.
|
| Copyright means you're not allowed to copy something
| without permission.
|
| It's that simple. There is no "Yes but you still have
| your book" argument, because copyright is _a claim on
| commercial value_ , not a claim on instantiation.
|
| There's some minimal wiggle room for fair use, but
| _clearly_ making an electronic copy and creating a
| condensed electronic version of the content - no matter
| how abstracted - and using it for profit is not fair use.
| chii wrote:
| > Copyright means you're not allowed to copy something
| without permission.
|
| but is training an AI copying? And if so, why isn't
| someone learning from said work considered copying in
| their brain?
| hiatus wrote:
| Is training an AI the same as a person learning
| something? You haven't shown that to be the case.
| chii wrote:
| no i haven't, but judging by the name - machine learning -
| i think it is the case.
| yyuugg wrote:
| Do you think starfish and jellyfish are fish? Judging by
| the name they are...
| nkrisc wrote:
| Because AI isn't a person.
| throw646577 wrote:
| > but is training an AI copying?
|
| If the AI produces chunks of training set nearly verbatim
| when prompted, it looks like copying.
|
| > And if so, why isn't someone learning from said work
| considered copying in their brain?
|
| Well, their brain, while learning, is not someone's
| published work product, for one thing. This should be
| obvious.
|
| But their brain can violate copyright by producing work
| as the output of that learning, and be guilty of
| plagiarism, etc. If I memorise a passage of your
| copyrighted book when I am a child, and then write it in
| my book when I am an adult, I've infringed.
|
| The fact that most jurisdictions don't consider the work
| of an AI to be copyrightable does not mean it cannot ever
| be infringing.
| CuriouslyC wrote:
| The output of a model can be a copyright violation. In
| fact, even if the model was never trained on copyrighted
| content, if I provided copyrighted text and told the
| model to regurgitate it verbatim, that would be a
| violation.
|
| That does not make the model itself a copyright
| violation.
| pera wrote:
| A product from a company is not a person. An LLM is not a
| brain.
|
| If you transcode a CD to mp3 and build a business around
| selling these files without the author's permission you'd
| be in big legal problems.
|
| Tech products that "accidentally" reproduce materials
| without the owners' permission (e.g. someone uploading La
| La Land into YouTube) have processes to remove them by
| simply filling in a form. Can you do that with ChatGPT?
| lelanthran wrote:
| Because the law considers scale.
|
| It's legal for you to possess a single joint. It's not
| legal for you to possess a warehouse of 400 tons of weed.
|
| The line between legal and not legal is sometimes based
| on scale; being able to ingest a single book and learn
| from it is not the same scale as ingesting the entire
| published works of mankind and learning from it.
| krisoft wrote:
| Are you describing what the law is or what you feel the
| law should be? Because those things are not always the
| same.
| IshKebab wrote:
| It's not definitely illegal _yet_.
| yyuugg wrote:
| It's also definitely not _not_ illegal either. Case law
| is very much tbd.
| asdff wrote:
| I wonder if they can say something like "we aren't scraping
| your protected content, we are merely scraping this old
| model we don't maintain anymore and it happened to have
| protected content in it from before the ruling", then
| you've essentially won all of humanity's output, as you
| can already scrape the new primary information
| (scientific articles and other datasets designed for
| researchers to freely access), and whatever junk the
| content mills output is just going to be poor
| summarizations of that primary information.
|
| Other factors help the case that an old model plus new
| public-facing data would be complete: other forms of
| media, like storytelling and music, have already
| converged onto certain prevailing patterns. For stories we
| expect a certain style of plot development and complain
| when it's missing or not as we expect. For music, most
| anything being listened to is lyrics no one reads deeply
| into, put over the same old chord progressions we've
| always had. For art, there are just too few of us who
| actually go out of our way to get familiar with novel
| art, versus the vast bulk of the world's present-day
| artistic effort, which goes towards product
| advertisement -- which once again follows certain
| patterns people have been publishing in psychological
| journals for decades now.
|
| In a sense we've already put out enough data and made
| enough of our world formulaic to the point where I believe
| we've set up for a perfect singularity already in terms of
| what can be generated for the average person who looks at a
| screen today. And because of that I think even a lack of
| any new training on such content wouldn't hurt openai at
| all.
| andyjohnson0 wrote:
| > I wonder if they can say something like "we aren't
| scraping your protected content, we are merely scraping
| this old model we don't maintain anymore and it happened
| to have protected content in it from before the ruling"
|
| I'm not a lawyer, but I know enough to be pretty
| confident that that wouldn't work. The law is about
| _intent_. Coming up with "one weird trick" to work-
| around a potential court ruling is unlikely to impress a
| judge.
| TeMPOraL wrote:
| Only half-serious, but: I wonder if they can dance with the
| publishers around this issue long enough for most of the
| contested text to become part of public court records, and
| then claim they're now training off that. <trollface>
| jprete wrote:
| Being part of a public court record doesn't seem like
| something that would invalidate copyright.
| criddell wrote:
| That might be the point. If your business model is built on
| reselling something you've built on stuff you've taken
| without payment or permission, maybe the business isn't
| viable.
| A4ET8a8uTh0 wrote:
| Re-training can be done, but -- and it is not a small but
| -- models already exist and can be used locally,
| suggesting the milk was spilled too long ago at this
| point. Separately, neutering them effectively lowers
| their value compared to their non-neutered counterparts.
| ashoeafoot wrote:
| What about bombing? You could always smuggle DMCA-
| protected content into training sets hoping for a payout.
| Xelynega wrote:
| The onus is on the person collecting massive amounts of
| data and circumventing DMCA protections to ensure they're
| not doing anything illegal.
|
| "well someone snuck in some DMCA content" when sharing
| family photos and doesn't suddenly make it legal to share
| that DMCA protected content with your photos...
| sandworm101 wrote:
| But all content is DMCA protected. Avoiding copyrighted
| content means not having content as all material is
| automatically copyrighted. One would be limited to licensed
| content, which is another minefield.
|
| The apparent loophole is between copyrighted work and
| copyrighted work that is _also_ registered. But registration
| can occur at any time, meaning there is little practical
| difference. Unless you have perfect licenses for all your
| training data, which nobody does, you have to accept the risk
| of copyright suits.
| Xelynega wrote:
| Yes, that's how every other industry that redistributes
| content works.
|
| You have to license content you want to use; you can't just
| use it for free because it's on the internet.
|
| Netflix doesn't just start hosting shows and hope they
| don't get a copyright suit...
| noitpmeder wrote:
| It's insane to me that people don't agree that you need a
| license to train your proprietary for-profit
| model on someone else's work.
| logicchains wrote:
| Eventually we're going to have embodied models capable of live
| learning and it'll be extremely apparent how absurd the ideas of
| the copyright extremists are. Because in their world, it'd be
| illegal for an intelligent robot to watch TV, read a book or
| browse the internet like a human can, because it could remember
| what it saw and potentially regurgitate it in future.
| luqtas wrote:
| problem is when a human company profits from their
| scrape... this isn't a non-profit run by volunteers, and
| it's a totally distant reality from autonomous robots
| learning their way by themselves
|
| we are discussing an emergent cause that has social &
| ecological consequences. servers are power-hungry stuff
| that may or may not run on a sustainable grid (which
| itself has a bazinga of problems, like heavy chemicals
| leaking during solar panel production, hydro-electric
| plants destroying their surroundings etc.) & the current
| state of hardware production, be it sweatshops or
| conflict minerals. and let's not forget creators'
| copyright, violation of which is written into the law
| code of almost every existing country; no artist is
| making billions out of the abuse of their creation rights
| (often they are pretty chill about getting their stuff
| mentioned, remixed and whatever)
| openrisk wrote:
| Leaving aside the hypothetical "live learning AGI" of the
| future (given that money is made or lost _now_ ), would a human
| regurgitating content that is not theirs - but presented as if
| it is - be acceptable to you?
| CuriouslyC wrote:
| I don't know about you but my friends don't tell me that Joe
| Schmoe of Reuters published a report that said XYZ copyright
| XXXX. They say "XYZ happened."
| openrisk wrote:
| I have a friend who recites amazingly long pieces of
| literature by heart all day. He says he just wrote them.
| He also produces a vast number of paintings in all
| styles, claiming he is a really talented painter.
| noitpmeder wrote:
| So when everyone in the world starts going to your friend
| instead of paying Reuters, what happens then?
| CuriouslyC wrote:
| Reuters finds a new business model? What did horse and
| buggy drivers do, pivot to romance themed city tours? I'm
| sure media companies will figure something out.
| openrisk wrote:
| So who and why will produce the news for your friend to
| steal? The horse and buggy metaphor is getting tiresome
| when it's used as some sort of signalling of "progress-
| oriented minds" and creative-destruction enthusiasts
| versus the luddites.
| Karliss wrote:
| If humanity ever gets to the point where intelligent robots are
| capable of watching TV like a human can, having to adjust
| copyright laws seems like the least of problems. How about
| having to adjust almost every law related to basic "human"
| rights, ownership, being able to establish a contract,
| responsible for crimes and endless other things.
|
| But for now your washing machine cannot own other things, and
| you owning a washing machine isn't considered slavery.
| JoshTriplett wrote:
| > copyright extremists
|
| It's not copyright "extremism" to expect a level playing field.
| As long as humans have to adhere to copyright, so should AI
| companies. If you want to abolish copyright, by all means do,
| but don't give AI a special exemption.
| IAmGraydon wrote:
| Except LLMs are in no way violating copyright in the true
| sense of the word. They aren't spitting out a copy of what
| they ingested.
| JoshTriplett wrote:
| Go make a movie using the same plot as a Disney movie, that
| doesn't copy any of the text or images of the original, and
| see how far "not spitting out a copy" gets you in court.
|
| AI's approach to copyright is very much "rules for thee but
| not for me".
| bdangubic wrote:
| 100% agree. but now a million$ question - how would you
| deal with AI when it comes to copyright? what rules could
| we possibly put in place?
| JoshTriplett wrote:
| The same rules we already have: follow the license of
| whatever you use. If something doesn't have a license,
| don't use it. And if someone says "but we can't build AI
| that way!", too bad, go fix it for everyone first.
| slyall wrote:
| You have a lot of opinions on AI for somebody who has
| only read stuff in the public domain
| noitpmeder wrote:
| Most information about AI is in the public domain...?
| slyall wrote:
| I mean "public domain" in the copyright context, not the
| "trade secret" context.
| rcxdude wrote:
| That might get you pretty far in court, actually. You'd
| have to be pretty close in terms of the sequence of
| events, character names, etc. Especially considering how
| many Disney movies are based on pre-existing stories, if
| you were, to, say, make a movie featuring talking animals
| that more or less followed the plot of Hamlet, you would
| have a decent chance of prevailing in court, given the
| resources to fight their army of lawyers.
| CuriouslyC wrote:
| It's actually the opposite of what you're saying. I can 100%
| legally do all the things that they're suing OpenAI for.
| Their whole argument is that the rules should be different
| when a machine does it than a human.
| JoshTriplett wrote:
| Only because it would be unconscionable to apply copyright
| to actual human brains, so we don't. But, for instance, you
| _absolutely can_ commit copyright violation by reading
| something and then writing something very similar, which is
| one reason why reverse engineering commonly uses clean-room
| techniques. AI training is in no way a clean room.
| nhinck3 wrote:
| You literally can't
| p_l wrote:
| You literally can.
|
| Your _ability_ to regurgitate remembered article that is
| copyrighted does not make your _brain_ a derivative work
| because removing that specific article from the training
| set is below the noise floor of impact.
|
| _However_ reproducing the copyrighted material based on
| that is a violation because the created reproduction
| _does_ critically depend on that copyrighted material.
|
| (Gross simplification) Similar to how you can watch &
| read a lot of Star Wars and then even ape Ralph McQuarrie
| style in your own drawings but unless the result is
| unmistakably related to Star Wars there's no copyright
| infringement - but there is if someone looks at the
| result and goes "that's Star Wars, isn't it?"
| nhinck3 wrote:
| Can you regurgitate billions of pieces of information to
| hundreds of thousands of other people in a way that
| competes with the source of that information?
| CuriouslyC wrote:
| If there was only one source for a piece of news ever,
| you might be able to make that argument in good faith,
| but when there are 20 outlets with competing versions of
| the same story it doesn't hold.
| IAmGraydon wrote:
| Exactly. Also core to the copyright extremists' delusional
| train of thought is the fact that they don't seem to understand
| (or admit) that ingesting, creating a model, and then
| outputting based on that model is exactly what people do when
| they observe others' works and are inspired to create.
| CuriouslyC wrote:
| You have to understand, the media companies don't give a shit
| about the logic, in fact I'm sure a lot of the people pushing
| the litigation probably see the absurdity of it. This is a
| business turf war, the stated litigation is whatever excuse
| they can find to try and go on the offensive against someone
| they see as a potential threat. The pro copyright group (big
| media) sees the writing on the wall, that they're about to get
| dunked on by big tech, and they're thrashing and screaming
| because $$$.
| tokioyoyo wrote:
| The problem is, we can't come up with a solution where both
| parties are happy, because in the end, consumers choose one
| (getting information from news agencies) or the other (getting
| information from chatgpt). So, both are fighting for life.
| 3pt14159 wrote:
| Is there a way to figure out if OpenAI ingested my blog? If the
| settlements are $2500 per article, then I'll take a free
| used car's worth of payments if it's available.
| jazzyjackson wrote:
| I suppose the cost of legal representation would cancel it out.
| I can just imagine a class action where anyone who posted on
| blogger.com between 2002 and 2012 eventually gets a check for
| 28 dollars.
|
| If I were more optimistic I could imagine a UBI funded by
| lawsuits against AGI, some combination of lost wages and
| intellectual property infringement. Can't figure out exactly
| how much more important an article on The Intercept had on
| shifting weights than your hacker news comments, might as well
| just pay everyone equally since we're all equally screwed
| dwattttt wrote:
| Wouldn't the point of the class action be to dilute the
| cost of representation? If the damages per article are high
| and there's plenty of class members, I imagine the limit
| would be how much OpenAI has to pay out.
| SahAssar wrote:
| If you posted on blogger.com (or any platform with enough
| money to hire lawyers) you probably gave them a license that
| is irrevocable, non-exclusive and able to be sublicensed.
|
| There are reasons for that (they need a license to show it on
| the platform) but usually these agreements are overly broad
| because everyone except the user is covering their ass too
| much.
|
| Those licenses will now be used to sell that content/data for
| purposes that nobody thought about when you started your
| account.
| Brajeshwar wrote:
| There was a Washington Post article that did something on this
| (but not for OpenAI). Check if your website is there at
| https://www.washingtonpost.com/technology/interactive/2023/a...
|
| There should be a way to check for OpenAI. But my guess is, if
| Google does it, OpenAI and others must be using the
| same/similar resource pool.
|
| My website has some 56K tokens and I have no clue what
| that covers, but something is there
| https://www.dropbox.com/scl/fi/2tq4mg16jup2qyk3os6ox/brajesh...
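|
| If you want to scan a public web-scrape corpus yourself,
| something like this should work with the Hugging Face
| datasets library (C4 is Google's corpus, not OpenAI's
| training set, so it only illustrates the technique; the
| domain is a placeholder):
|
|     from datasets import load_dataset
|
|     # stream so you don't download the whole corpus up front
|     ds = load_dataset("allenai/c4", "en", split="train",
|                       streaming=True)
|     for row in ds:
|         if "yourblog.example.com" in row["url"]:
|             print(row["url"])
|             break  # scanning everything takes a long time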
| bastloing wrote:
| Isn't this the same thing Google has been doing for years with
| their search engine? Only difference is Google keeps the data
| internal, whereas openai spits it out to you. But it's still
| scraped and stored in both cases.
| jazzyjackson wrote:
| A component of fair use is to what degree the derivative work
| displaces the original. Google's argument has always been that
| they direct traffic to the original, whereas AI summaries
| (which Google of course is just as guilty of as openai)
| completely obsoletes the original publication. The argument now
| is that the derivative work (LLM model) is transformative, ie,
| different enough that it doesn't economically compete with the
| original. I think it's a losing argument but we'll see what the
| courts arrive at.
| CaptainFever wrote:
| Is this specific to AI or specific to summaries in general?
| Do summaries, like the ones found in Wikipedia or Cliffs
| Notes, not have the same effect of making it such that people
| no longer have to view the original work as much?
|
| Note: do you mean the _model_ is transformative, or the
| _summaries_ are transformative? I think your comment holds up
| either way but I think it 's better to be clear which one you
| mean.
| LinuxBender wrote:
| In my opinion (_not a lawyer_), Google at least references
| where they obtained the data and did not regurgitate it as
| if they were the creators of something new -- _obfuscated
| plagiarism via LLM_. Some claim derivative works, but I
| have always seen that as quite a stretch. People here
| expect me to cite references, yet LLMs somehow escape this
| level of scrutiny.
| efitz wrote:
| I would trust AI a lot more if it gave answers more like:
|
| _"Source A on date 1 said XYX"_
|
| _"Source B ..."_
|
| _"Synthesizing these, it seems that the majority opinion is X
| but Y is also a commonly held opinion."_
|
| Instead of what it does now, which is make extremely confident,
| unsourced statements.
|
| It looks like the copyright lawsuits are rent-seeking as much as
| anything else; another reason I hate copyright in its current
| form.
| CaptainFever wrote:
| ChatGPT Search provides this, by the way, though it relies a
| lot on the quality of Bing search results. Consensus.app does
| this but for research papers, and has been very useful to me.
| maronato wrote:
| More often than not in my experience, clicking these sources
| takes me to pages that either don't exist, don't have the
| information ChatGPT is quoting, or ChatGPT completely
| misinterpreted the content.
| akira2501 wrote:
| > which is make extremely confident,
|
| One of the results the LLM has available to itself is a
| confidence value. It should, at the very least, provide this
| along with its answer. Perhaps if it did, people would
| stop calling it 'AI'.
| pavon wrote:
| My understanding is that this confidence value is not a
| measure of how likely something is correct/true, but more
| along the lines of how likely that sentence would be.
| Including it could be more misleading than helpful, for
| example if it is repeating commonly misunderstood
| information.
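|
| For what it's worth, a sketch of pulling those per-token
| probabilities out of the OpenAI API (assumes the 1.x
| Python client and its logprobs option; they are fluency
| scores, not truth scores):
|
|     import math
|     from openai import OpenAI
|
|     client = OpenAI()
|     resp = client.chat.completions.create(
|         model="gpt-4o-mini",
|         messages=[{"role": "user",
|                    "content": "Who wrote Hamlet?"}],
|         logprobs=True,
|     )
|     for tok in resp.choices[0].logprobs.content:
|         # P(token | preceding tokens), not P(it is true)
|         print(tok.token, round(math.exp(tok.logprob), 3))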
| ethernot wrote:
| I'm not sure that it's possible to produce anything
| reasonable in that space. It would need to know how far
| away from correct it is to provide a usable confidence
| value; otherwise it'd just be hallucinating a number in
| the same way as the result.
|
| An analogy. Take a former commuter friend of mine, Mr Skol
| (named after his favourite breakfast drink). Seen on a
| minibus I had to get to work years ago, we shared many
| interesting conversations. Now he was a confident expert on
| everything. If asked to rate his confidence in a subject it
| would be a good 95% at least. However he spoke absolute
| garbage because his brain had rotted away from drinking Skol
| for breakfast, and the odd crack chaser. I suspect his model
| was still better than GPT-4o. But an average person could
| determine the veracity of his arguments.
|
| Thus confidence should be externally rated as an entity with
| knowledge cannot necessarily rate itself for it has bias.
| Which then brings in the question of how do you do that. Well
| you'd have to do the research you were going to do anyway and
| compare. So now you've used the AI and done the research
| which you would have had to do if the AI didn't exist. So the
| AI at this point becomes a cost over benefit if you need
| something with any level of confidence and accuracy.
|
| Thus the value is zero unless you need crap information,
| which is at least here, never, unless I'm generating a
| picture of a goat driving a train or something. And I'm not
| sure that has any commercial value. But it's fun at least.
| readyplayernull wrote:
| Do androids dream of Dunning-Kruger?
| 1vuio0pswjnm7 wrote:
| "It looks like the copyright lawsuits are rent-seeking as much
| as anything else;"
|
| If an entity charges fees for "AI", then is it "rent-seeking"
|
| (Assume that the entity is not the author of the training data
| used)
| Paul-E wrote:
| This is what a number of startups, such as Yurts.ai and
| Vannevar Labs, are racing to build for organizations. I
| wouldn't be surprised if, in 5-10 years, most large corps and
| government agencies had this sort of LLM/RAG setup over their
| internal documents.
| ashoeafoot wrote:
| Will we see human-washing, where AI art or works get a "Made by
| man" final touch in some third-world mechanical turk den? Would
| that add another financially detracting layer to the AI winter?
| Retric wrote:
| The law generally takes a dim view of such attempts to get
| around things like that. AI's biggest defense is claiming they
| are so beneficial to society that what they are doing is fine.
| gmueckl wrote:
| That argument stands on the mother of all slippery slopes!
| Just find a way to make your product impressive or ubiquitous
| and all of a sudden it doesn't matter how much you break the
| law along the way? That's so insane I don't even know where
| to start.
| ashoeafoot wrote:
| Worked for Purdue.
| Retric wrote:
| YouTube, AirBnB, Uber, and many _many_ others have all done
| stuff that's blatantly against the law but gotten away with
| it due to utility.
| rcxdude wrote:
| Why not, considering copyright law specifically has fair
| use outlined for that kind of thing? It's not some
| overriding consequence of law, it's that copyright is a
| granting of a privilege to individuals and that that
| privilege is not absolute.
| gaganyaan wrote:
| That is not in any way the biggest defense
| Retric wrote:
| It's worked for many startups and court cases in the past.
| Copyright even has many explicit examples of the utility
| loophole; look at, say: https://en.wikipedia.org/wiki/Sony_Co
| rp._of_America_v._Unive....
| righthand wrote:
| That will probably happen to some extent if not already.
| However I think people will just stop publishing online if
| malicious corps like OpenAI are just going to harvest works for
| their own gain. People publish for personal gain, not to enrich
| the public or enrich private entities.
| Filligree wrote:
| However, I get my personal gain regardless of whether or not
| the text is also ingested into ChatGPT.
|
| In fact, since I use ChatGPT a lot, I get more gain if it is.
| righthand wrote:
| How much of your income have you spent on ChatGPT vs how
| much ChatGPT has increased your income?
| Filligree wrote:
| ChatGPT doesn't increase my income. It's useful for my
| hobbies, and probably made those more expensive.
| CuriouslyC wrote:
| There's no point in having third world mechanical turk dens do
| finishing passes on AI output unless you're trying to make it
| worse.
|
| Artists are already using AI to photobash images, and writers
| are using AI to outline and create rough drafts. The point of
| having a human in the loop is to tell the AI what is worth
| creating, then recognize where the AI output can be improved.
| If we have algorithms telling the AI what to make and content
| mill hacks smearing shit on the output to make it look more
| human, that would be the worst of both worlds.
| TheDong wrote:
| I think the point of the comment isn't to have this finishing
| layer to make things "better", but to make things "legal".
|
| Humans are allowed to synthesize a bunch of inputs together
| and produce a new novel copyrighted.
|
| An algorithm, if it mixes a bunch of copyrighted things
| together by itself, plausibly is incapable of producing a
| novel copyright, and instead inherits the old copyright.
|
| Just like Clean Room Design
| (https://en.wikipedia.org/wiki/Clean-room_design) can be used
| to re-create the same software free of the original
| copyright, I think the parent is arguing that a mechanical
| turk process could allow AI to produce the same output free
| of the original copyright.
| doctorpangloss wrote:
| Did you miss the twist at the end of the article?
|
| > Andrew Deck is a generative AI staff writer at Nieman Lab...
| ada1981 wrote:
| I'm still of the opinion that we should be allowed to train on
| any data a human can read online.
| smitelli wrote:
| ...Limited to a human's average rate of consumption, right?
| warkdarrior wrote:
| Yes, just like my download speed is capped by the speed of me
| writing bytes on paper.
| ada1981 wrote:
| Is there any other information processing we limit to human
| speed?
| cynicalsecurity wrote:
| Yeah, let's stop the progress because a few magazines no one
| cares about are unhappy.
| a57721 wrote:
| Maybe just don't use data from the unhappy magazines you don't
| care about in the first place?
| bastloing wrote:
| Who would be forever grateful if openai removed all of The
| Intercept's content permanently and refused to crawl it in the
| future?
| noitpmeder wrote:
| Sure, and then do that with every other piece of work they're
| unfairly using
| bastloing wrote:
| I'd actually leave it up to the owner. Some want their work
| removed, some don't care.
| dr_dshiv wrote:
| Proposed: 10% tax as copyright settlement, half to pay for past
| creators and half to pay current creative culture
| dawnerd wrote:
| Problem with that is it's too easy to effectively have 0 taxes.
| _giorgio_ wrote:
| Just train the models in Japan.
|
| *No copyright restrictions on AI training.*
|
| https://insights.manageengine.com/artificial-intelligence/th...
|
| https://news.ycombinator.com/item?id=38842788
| theropost wrote:
| Copyright laws, in many ways, feel outdated and unnecessarily
| rigid. They often appear to disproportionately favor large
| corporations without providing equivalent value to society. For
| example, brands like Disney have leveraged long-running
| copyrights to generate billions, or even tens of billions, of
| dollars through enforcement over extended periods. This approach
| feels excessive and unsustainable.
|
| The reliance on media saturation and marketing creates a
| perception that certain works are inherently more valuable than
| others, despite new creative works constantly being developed.
| While I agree that companies should have the right to profit from
| their investments, such as a $500 million movie, there should be
| reasonable limits. Once they recoup their costs, including a
| reasonable profit multiplier, the copyright could be considered
| fulfilled and should expire.
|
| Holding onto copyrights indefinitely or for excessively long
| periods serves primarily to sustain a system that benefits
| lawyers and enforcement agencies, rather than providing
| meaningful value to society. For instance, enforcing a copyright
| from the 1940s for a multinational corporation that already
| generates billions makes little sense.
|
| There should be a balanced framework. If I invest significant
| time and effort--say 100 hours--into creating a work, I should be
| entitled to earn a reasonable return, perhaps 10 times the effort
| I put in. However, after that point, the copyright should no
| longer apply. Current laws have spiraled out of control, failing
| to strike a balance between protecting creators and fostering
| innovation. Reform is long overdue.
| pclmulqdq wrote:
| I am personally in favor of strong, short copyrights (and
| patents). 90+ year copyrights are just absurd. Most movies make
| almost all their money in the first 10 years anyway, and a
| strong 10- or 20-year copyright would keep the economics of
| movie and music production largely the same.
| tolmasky wrote:
| The logical endgame of all this isn't "stopping LLMs," it's
| Disney happening to own a critical mass of IP to be able to more
| or less exclusively legally train and run LLMs that make movies,
| firing all their employees, and no smaller company ever having a
| chance in hell of competing with a literal century's worth of
| IP powering a _generative_ model. This turns the already
| egregiously generous backwards facing monopoly into a forward
| facing monopoly.
|
| None of this was ever the point of copyright. The best part about
| all this is that Disney initially took off by... making use of
| public domain works. Copyright used to last 14 years. You'd be
| able to create derivative works of most the art in your life at
| some point. Disney is ironically the proof of how _constructive_
| a system that regularly turns works over to the public domain can
| be. But thanks to lobbying by Disney, now you're never allowed to
| create a derivative work of the art in your life.
|
| Copyright is only possible because _we the public fund the
| infrastructure necessary to maintain it_. "IP" isn't self
| manifesting like physical items. Me having a cup necessarily
| means you don't have it. That's not how ideas and pictures work.
| You can infinitely perfectly duplicate them. Thus we set up laws
| and courts and police to create a complicated _simulation_ of
| physical properties for IP. Your tax dollars pay for that. The
| original deal was that in exchange, those works would enter the
| public domain to give back to society. We've gotten so far from
| that that people now argue about OpenAI "stealing" from authors,
| when the authors most of the time don't even own the works --
| their employers do! What a sad comedy where we've forgotten we
| have a stake in this too and instead argue over which corporation
| should "own" the exclusive ability to cheaply and blazingly fast
| create future works while everyone else has to do it the hard
| way.
| tzkaln wrote:
| ClosedAI etc. are certainly stealing from open source authors
| and web site creators, who do own the copyright.
|
| That said, I agree with putting more emphasis on individual
| creators, even if they have sold the copyright to corporations.
| I was appalled by the Google settlement with the Authors
| Guild: why does a guild decide who owns what and who gets
| compensation?
|
| Both Disney and ClosedAI are in the wrong here. I'm the
| opposite of a Marxist, but Marx's analysis was frequently right.
| He used the term "alienation from one's work" in the context of
| factory workers. Now people are being alienated from their
| intellectual work, which is stolen, laundered and then sold
| back to them.
| ToucanLoucan wrote:
| I mean, not to be that guy, but multiple Marxist and Marxist-
| adjacent people I know and am have been out here pointing out
| how this was exactly and always what was going to happen
| since the LLM hype cycle really kicked into high-gear in
| mid-2023. And I was told in no uncertain terms, many times,
| on here, about how I was being a doomer, a pessimist, a
| luddite, etc. etc. because I and many like me saw the writing
| on the wall, immediately, that while generative AI
| represented a neat thing for folks to play with, it
| would, like every other emerging tech, quickly become the
| sole domain of the monied entities that already run the rest
| of our lives, and this would be bad for basically everyone
| long term.
|
| And yeah, it looks to be shaping up as exactly that.
| trinsic2 wrote:
| Yep. And people supported it with this "No, it's not going to
| be like that this time" bullshit.
| marcosdumay wrote:
| I don't think you need to be a Marxist to accept that his
| observation that people are being alienated from their work
| capacity is spot on.
|
| The "Marxsist" name is either about believing on the parts
| that aren't true or about the political philosophy (that
| honestly, can't stand by its own without the wrong facts).
| The ones that fit reality only make one a "realist".
| notahacker wrote:
| If I thought that nobody had a chance in hell of competing with
| generative models compiled by Disney from its corpus of
| lighthearted family movies, I'd be even less keen to give
| unlimited power to create derivative works out of everything in
| history to the companies with the greatest amount of computing
| power, which in this case happens to be a subsidiary of
| Microsoft.
|
| All property rights depend on public funding of the
| infrastructure to enforce them. If I believed movies derived
| from applying generative AI techniques to other movies were the
| endgame of human creativity, I'd find your endgame of it being
| the fiefdom of corporations who sold enough Windows licenses to
| own billions of dollars' worth of computer hardware even more
| dystopian than it being invested in the corporations who
| originally paid for the movies...
| horsawlarway wrote:
| Two thoughts
|
| 1. You are assuming that "greatest computing power" is a
| requirement. I think we're actually seeing a trend in the
| opposite direction with recent generative art models: It
| turns out consumer grade hardware is "enough" in basically
| all cases, and renting the compute you might otherwise be
| missing is cheap. I don't buy this as the barrier.
|
| 2. Given #1, I think you are framing the conversation in a
| very duplicitous manner by pitching this as "either Microsoft
| or Disney - pick your oppressor". I'd suggest that breaking
| the current fuckery in copyright, and restoring something
| more sane (like the original 14 + 14 year terms), would
| benefit individuals who want to make stories and art far more
| than it would benefit corporations. Disney is literally _THE_
| reason for half of the current extensions in timespan. They
| don't want reduced copyright - they want to curtail
| expression in favor of profit. This case just happens to have
| a convenient opponent for public sentiment.
|
| ---
|
| Further - "All property rights depends on public funding the
| infrastructure to enforce them" Is false. This is only the
| case for intellectual property rights, where nothing need be
| removed from one person for the other to be "in violation".
| catlifeonmars wrote:
| > All property rights depend on public funding of the
| infrastructure to enforce them
|
| Still true, because people generally depend on the legal
| system and police departments to enforce physical property
| rights (both are publicly funded entities).
| notahacker wrote:
| I'm assuming greater computing power is a requirement
| because creating generative feature length movies (which is
| a few orders of magnitude more complex than creating PNGs)
| is something only massive corporations can afford the
| computing power to do at the moment (and the implied bar
| for excellence something we haven't reached). Certainly
| computing power and dev resource are more of a bottleneck
| to creating successful AI movies than not having access to
| the Disney canon, which was the argument the OP made for
| anything other than OpenAI having unlimited rights over
| everyone's content leading inexorably to a Disney generative
| AI monopoly. (Another weakness of that argument: I'm not sure the
| Disney canon is _sufficient_ training data for Disney to
| replace their staff with generative movies, never mind
| _necessary_ for anyone else to ever make a marketable
| quality movie again)
|
| Given #1, I think the OP is framing the conversation in a
| far more duplicitous manner by assuming that in a lawsuit
| against AI which doesn't even involve Disney, the only
| beneficiary of OpenAI not winning will be Disney. Disney
| extending copyright laws in past decades has nothing to do
| with a 10-year-old internet company objecting to OpenAI
| stripping all the copyright information off its recent
| articles before feeding them into its generative model.
|
| > Further - "All property rights depends on public funding
| the infrastructure to enforce them" Is false. This is only
| the case for intellectual property rights, where nothing
| need be removed from one person for the other to be "in
| violation".
|
| People who don't respect physical property are just as
| capable of removing it as people who don't respect
| intellectual property are capable of copying it. In both
| cases the thing that prevents them doing so is a legal
| system and taxpayer funded enforcement against people that
| don't play by the rules.
| adventured wrote:
| AI is absolutely a further wealth concentrator by its very
| nature. It will not liberate the bottom 3/4, nor will it free
| up their time by allowing them to work a lot less (as so many
| incorrectly predict now). Eric Schmidt, for example, has some
| particularly incorrect claims out there right now about how AI
| will widely liberate people from having to work so many hours;
| it will prove laughable in hindsight. Those that wield high-end
| AI, and the extreme cost of operations that will go with it,
| will reap extraordinary wealth over the coming century. Elon
| Musk style wealth. Very few will have access to the resources
| necessary to operate the best AI (the cost will continue to
| climb over what companies like Microsoft, Google, Amazon,
| OpenAI, etc are already spending).
|
| Sure, various AI assistants will make more aspects of your life
| automated. In that sense it'll buy people more time in their
| private lives. It won't get most people a meaningful increase
| in wealth, which is the ultimate liberator of time. That is,
| financial independence.
|
| And you can already see the ratio of people that are highly
| engaged with utilizing the latest LLMs, paying for them, versus
| either rarely or never using them (either not caring to use
| them, or not understanding how to do so effectively).
| It's heavily bifurcated between the elites and everybody else,
| just as most tech advances have been so far. A decade ago a
| typical lower / lower middle class person could have gone to
| the library and learned JavaScript and over the course of years
| could have dramatically increased their earning potential (a
| process that takes time to be clear); for the same reason that
| rarely happens by volition, they also will not utilize LLMs to
| advance their lives despite the wide availability of them. AI
| will end up doing trivial automation tasks for the bottom 50%.
| For the top ~1/4 it will produce enormous further wealth from
| equity holdings and business process productivity gains
| (boosting wealth from business ownership, which the bottom 50%
| lacks universally).
| cxr wrote:
| > Me having a cup necessarily means you don't have it. That's
| not how ideas and pictures work. You can infinitely perfectly
| duplicate them.
|
| This is a stupid argument, no matter how often it comes up.
|
| If I hire Alice to come to my sandwich shop and make sandwiches
| for customers all week and then on payday I say, "Welp, no need
| to pay you--the sandwiches are already made!" then Alice is
| definitely out something, and I am categorically a piece of
| shit for trotting out this line of reasoning to try to justify
| not paying her.
|
| If I do the same thing except I commission Alice to do a
| drawing for a friend's birthday, then I am no less a piece of
| shit if I make my own copy once she's shown it to me and try to
| get out of paying since I'm not using "her" copy.
|
| (Notice that in neither case was the thing produced ever
| something that Alice was going to have for herself--she was
| never going to take home 400 sandwiches, nor was she ever
| interested in a portrait of your friend and his pet rabbit.)
|
| If Alice senses that I'd be interested in the drawing but might
| not be totally swayed until I see it for myself, so she
| proactively decides to make the drawing upfront before
| approaching me, then it doesn't fundamentally change the
| balance from the previous scenario--she's out no less in that
| case than if I approached her first and then refuse to pay
| after the fact. (If she was wrong and it turns out I didn't
| actually want it because she misjudged and will not be able to
| recoup her investment, fair. But that's not the same as if she
| didn't misjudge and I come to her with this bankrupt argument
| of, "You already made the drawing, and what's done is done, and
| since it's infinitely reproducible, why should I owe you
| anything?")
|
| Copyright duration is too long. But the fundamental difference
| between rivalrous possession of physical artifacts and
| infinitely reproducible ideas really needs to stay the hell out
| of these debates. It's a tired, empty talking point that
| doesn't actually address the substance of what IP laws are
| really about.
| kweingar wrote:
| This isn't really an argument though. It's an assertion that
| not honoring a commission agreement (or an employment
| contract) is equivalent to not paying for a license to an
| existing work. I tend to disagree. I could be persuaded
| otherwise, but I'd need to hear an argument other than
| "clearly these are the same thing."
| cxr wrote:
| > This isn't really an argument though. It's an assertion
| that not honoring a commission agreement
|
| Wrong. It's that (not honoring an agreement negotiated
| beforehand) _and_ an argument against treating the product of
| past action as inherently zero-cost and/or zero-value; the fact
| that a prior agreement is an element in the offered
| scenarios doesn't negate or neutralize the rest of it (just
| like the fact that a sandwich shop is an element in one of
| the scenarios doesn't negate or neutralize the broader
| reality for non-sandwich-involving scenarios).
|
| And that's before we mention: there _is_ such a prior
| agreement in the case of modern IP -- you have to contend
| with the fact that if Alice is operating in the United
| States which has existing legislation granting her a
| "temporary monopoly" on her creative output, and then she
| generates the output on the basis that she'll be protected
| by the law of the land, and then you decide that you just
| don't agree with the idea of IP, then Alice is getting
| screwed over by someone not holding up their end of the
| bargain.
| trinsic2 wrote:
| I'm sorry, the two are not even remotely the same. Saying
| it over and over again doesn't make it so.
| cxr wrote:
| You wanna, like, actually digest what I wrote there? The
| second comment here is so unlike the first that your
| "Saying it over and over again" remark can only lead to
| the conclusion that you either didn't read it or didn't
| grok it. They're two different comments about two
| different things.
|
| > I'm sorry
|
| Are you? I think you mixed up the words "insincere" and
| "sorry".
| tlb wrote:
| I find "this wasn't the point of copyright", referring to the
| motivations of 18th century legislators, unpersuasive. They
| were making up rules that were good for the foreseeable future,
| but they didn't foresee everything and certainly not everyone
| being connected to a global data network.
|
| Persuasive arguments should focus on what's good for the world
| today.
| benreesman wrote:
| Sounds like it's supposed to be hopeless to compete already:
| https://www.zenger.news/2023/06/10/sam-altman-says-its-
| hopel....
| marcosdumay wrote:
| Either copyrights exist, and people can't copy creative works
| "owned" by somebody else, or copyrights don't exist and people
| can copy those at will.
|
| "Copyrights exist, and people can copy others works if they
| have enough computing power to multiplex it with other works
| and demultiplex to get it back" is not a reasonable position.
|
| I'm all for limiting it to 15 or 20 years, and requiring
| registration. If you want to completely end them, I'd be ok
| with that too (but I think it's suboptimal). But "end them for
| rich people" isn't acceptable.
| torginus wrote:
| How sturdy is this claim?
|
| If we presume it's illegal to train on copyrighted works, but a
| site like Wikipedia summarizing the article is perfectly legal,
| then what would happen if we got LLM A to summarize the article
| and used that summary to train LLM B?
|
| LLM A could be trained on public domain works.
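|
| A minimal sketch of that two-stage pipeline, where
| summarize_with_llm_a and train_llm_b are hypothetical stand-ins
| (neither is a real API):
|
|     def summarize_with_llm_a(article: str) -> str:
|         # Stand-in for LLM A (itself trained only on public
|         # domain text) producing a summary of the article.
|         return " ".join(article.split()[:30])
|
|     def train_llm_b(corpus: list[str]) -> None:
|         # Stand-in for fitting LLM B; it only ever sees the
|         # summaries, never the copyrighted originals.
|         print(f"training LLM B on {len(corpus)} summaries")
|
|     articles = ["full text of copyrighted article one ...",
|                 "full text of copyrighted article two ..."]
|     summaries = [summarize_with_llm_a(a) for a in articles]
|     train_llm_b(summaries)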
| vorpalhex wrote:
| LLM B would be a very bad LLM with only a limited vocabulary and
| turn of phrase, and would tend to have a single writing tone.
|
| And no, having 5000 different summarizing LLMs doesn't help
| here.
|
| It's sort of like taking a photograph of a photograph.
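|
| A toy illustration of that degradation, with lossy_copy as an
| arbitrary stand-in for any lossy re-encoding step (a summary
| of a summary, a photo of a photo):
|
|     def lossy_copy(text: str) -> str:
|         # Keep every other word -- each pass discards signal.
|         return " ".join(text.split()[::2])
|
|     text = "the quick brown fox jumps over the lazy sleeping dog"
|     for generation in range(4):
|         text = lossy_copy(text)
|         print(generation, repr(text))
|     # 0 'the brown jumps the sleeping'
|     # 1 'the jumps sleeping'
|     # 2 'the sleeping'
|     # 3 'the'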
| miohtama wrote:
| If it is illegal to train on copyrighted work, it will also
| benefit actors that are free to ignore laws, like Chinese
| public-private companies. It means Western companies will lose
| in the AI race.
| tapoxi wrote:
| Then we don't respect their copyrights? Why is this some sort
| of unsolvable problem and the only solution is to allow mega
| corporations to sell us AI that is trained on the work of
| artists without their consent?
___________________________________________________________________
(page generated 2024-11-30 23:01 UTC)