[HN Gopher] Core copyright violation moves ahead in The Intercep...
       ___________________________________________________________________
        
       Core copyright violation moves ahead in The Intercept's lawsuit
       against OpenAI
        
       Author : giuliomagnifico
       Score  : 171 points
       Date   : 2024-11-29 13:48 UTC (9 hours ago)
        
 (HTM) web link (www.niemanlab.org)
 (TXT) w3m dump (www.niemanlab.org)
        
       | philipwhiuk wrote:
       | It's extremely lousy that you have to pre-register copyright.
       | 
        | That would make the USCO a de facto clearinghouse for news.
        
         | throw646577 wrote:
          | You don't have to pre-register copyright in any Berne
          | Convention country. Your copyright exists from the moment you
         | create something.
         | 
         | (ETA: This paragraph below is diametrically wrong. Sorry.)
         | 
         | AFAIK in the USA, registered copyright is necessary if you want
         | to bring a lawsuit and get more than statutory damages, which
         | are capped low enough that corporations do pre-register work.
         | 
         | Not the case in all Berne countries; you don't need this in the
         | UK for example, but then the payouts are typically a lot lower
         | in the UK. Statutory copyright payouts in the USA can be enough
         | to make a difference to an individual author/artist.
         | 
         | As I understand it, OpenAI could still be on the hook for up to
         | $150K per article if it can be demonstrated it is wilful
         | copyright violation. It's hard to see how they can argue with a
         | straight face that it is accidental. But then OpenAI is, like
         | several other tech unicorns, a bad faith manufacturing device.
        
           | Loughla wrote:
           | You seem to know more about this than me. I have a family
           | member who "invented" some electronics things. He hasn't done
           | anything with the inventions (I'm pretty sure they're
           | quackery).
           | 
            | But to protect his patent claim, he mailed himself a sealed copy of
           | the plans. He claims the postage date stamp will hold up in
           | court if he ever needs it.
           | 
           | Is that a thing? Or is it just more tinfoil business? It's
           | hard to tell with him.
        
             | throw646577 wrote:
             | Honestly I don't know whether that actually is a meaningful
             | thing to do anymore; it may be with patents.
             | 
             | It certainly used to be a legal device people used.
             | 
             | Essentially it is low-budget notarisation. If your family
             | member believes they have something which is timely and
             | valuable, it might be better to seek out proper legal
             | notarisation, though -- you'd consult a Notary Public:
             | 
             | https://en.wikipedia.org/wiki/Notary_public
        
             | WillAdams wrote:
              | It won't hold up in court, and given that the post office
              | will deliver unsealed letters (which may then be sealed
              | after the fact), it will be viewed rather dimly.
             | 
             | Buy your family member a copy of:
             | 
             | https://www.goodreads.com/book/show/58734571-patent-it-
             | yours...
        
               | Y_Y wrote:
               | Surely the NSA will retain a copy which can be checked
        
               | Tuna-Fish wrote:
               | Even if they did, it in fact cannot be checked. There is
               | precedent that you cannot subpoena NSA for their
               | intercepts, because exactly what has been intercepted and
               | stored is privileged information.
        
               | hiatus wrote:
               | > There is precedent that you cannot subpoena NSA for
               | their intercepts
               | 
               | I know it's tangential to this thread but could you link
               | to further reading?
        
               | ysofunny wrote:
               | but only in a real democracy
        
             | cma wrote:
              | The US moved to first-to-file years ago. Whoever files
              | first gets the patent, except that if the inventor
              | publishes publicly, there is a one-year grace period (which
              | would not apply to a self-mail or private mail to other
              | people).
             | 
             | This is patent, not copyright.
        
             | Isamu wrote:
             | Mailing yourself using registered mail is a very old tactic
             | to establish a date for your documents using an official
             | government entity, so this can be meaningful in court.
             | However this may not provide the protection he needs.
             | Copyright law differs from patent law and he should seek
              | legal advice.
        
             | dataflow wrote:
             | Even if the date is verifiable, what would it even prove?
             | If it's not public then I don't believe it can count as
             | prior art to begin with.
        
             | blibble wrote:
             | presumably the intention is to prove the existence of the
             | specific plans at a specific time?
             | 
             | I guess the modern version would be to sha256 the plans and
             | shove it into a bitcoin transaction
             | 
             | good luck explaining that to a judge
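For what it's worth, the hashing half of that idea is only a few lines; a minimal sketch (the `fingerprint` helper name is made up for illustration, not from the thread):

```python
import hashlib

def fingerprint(path: str) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # read in chunks so large documents don't need to fit in memory
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()
```

Publishing that digest somewhere independently timestamped (a blockchain transaction, a classified ad, or an RFC 3161 timestamping service) proves the document existed at that time without revealing its contents; whether a judge accepts that is the separate problem the comment jokes about.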
        
           | Isamu wrote:
           | Right, you can register before you bring a lawsuit. Pre-
           | registration makes your claim stronger, as does notice of
           | copyright.
        
           | dataflow wrote:
           | That's what I thought too, but why does the article say:
           | 
           | > Infringement suits require that relevant works were first
           | registered with the U.S. Copyright Office (USCO).
        
             | throw646577 wrote:
             | OK so it turns out I am wrong here! Cool.
             | 
              | I had it upside down/diametrically wrong, however you want
              | to put it. Right that the structures exist, exactly wrong
              | about how they apply.
             | 
             | It is registration that guarantees access to statutory
             | damages:
             | 
             | https://www.justia.com/intellectual-
             | property/copyright/infri...
             | 
             | Without registration you still have your natural copyright,
             | but you would have to try to recover the profits made by
             | the infringer.
             | 
             | Which does sound like more of an uphill struggle for The
             | Intercept, because OpenAI could maybe just say "anything we
             | earn from this is de minimis considering how much errr
             | similar material is errrr in the training set"
             | 
             | Oh man it's going to take a long time for me to get my
             | brain to accept this truth over what I'd always understood.
        
           | pera wrote:
            | > _It's hard to see how they can argue with a straight face
           | that it is accidental_
           | 
           | It's another instance of "move fast, break things" (i.e.
           | "keep your eyes shut while breaking the law at scale")
        
             | renewiltord wrote:
             | Yes, because all progress depends upon the unreasonable
             | man.
        
       | 0xcde4c3db wrote:
       | The claim that's being allowed to proceed is under 17 USC 1202,
       | which is about stripping metadata like the title and author. Not
       | exactly "core copyright violation". Am I missing something?
        
         | anamexis wrote:
         | I read the headline as the copyright violation claim being core
         | to the lawsuit.
        
           | H8crilA wrote:
           | The plaintiffs focused on exactly this part - removal of
            | metadata - probably because it's the most likely to hold up
            | in court. One judge remarked on it pretty explicitly, saying
           | that it's just a proxy topic for the real issue of the usage
           | of copyrighted material in model training.
           | 
           | I.e., it's some legalese trick, but "everyone knows" what's
           | really at stake.
        
         | CaptainFever wrote:
         | Also, is there really any benefit to stripping author metadata?
         | Was it basically a preprocessing step?
         | 
         | It seems to me that it shouldn't really affect model quality
          | all that much, should it?
         | 
         | Also, in the amended complaint:
         | 
         | > not to notify ChatGPT users when the responses they received
         | were protected by journalists' copyrights
         | 
         | Wasn't it already quite clear that as long as the articles
         | weren't replicated, it wasn't protected? Or is that still being
         | fought in this case?
         | 
         | In the decision:
         | 
          | I agree with Defendants. Plaintiffs allege that ChatGPT has
          | been trained on "a scrape of most of the internet," Compl. ¶
          | 29, which includes massive amounts of information from
          | innumerable sources on almost any given subject. Plaintiffs
          | have nowhere alleged that the information in their articles is
          | copyrighted, nor could they do so. When a user inputs a
          | question into ChatGPT, ChatGPT synthesizes the relevant
          | information in its repository into an answer. Given the
          | quantity of information contained in the repository, the
          | likelihood that ChatGPT would output plagiarized content from
          | one of Plaintiffs' articles seems remote. And while Plaintiffs
          | provide third-party statistics indicating that an earlier
          | version of ChatGPT generated responses containing significant
          | amounts of plagiarized content, Compl. ¶ 5, Plaintiffs have
          | not plausibly alleged that there is a "substantial risk" that
          | the current version of ChatGPT will generate a response
          | plagiarizing one of Plaintiffs' articles.
        
           | freejazz wrote:
           | >Also, is there really any benefit to stripping author
           | metadata? Was it basically a preprocessing step?
           | 
           | Have you read 1202? It's all about hiding your infringement.
        
         | Kon-Peki wrote:
         | Violations of 17 USC 1202 can be punished pretty severely. It's
          | not just about money, either.
         | 
         | If, _during the trial_ , the judge thinks that OpenAI is going
          | to be found to be in violation, he can order all of OpenAI's
         | computer equipment be impounded. If OpenAI is found to be in
         | violation, he can then order permanent destruction of the
         | models and OpenAI would have to start over from scratch in a
         | manner that doesn't violate the law.
         | 
         | Whether you call that "core" or not, OpenAI cannot afford to
         | lose these parts that are left of this lawsuit.
        
           | zozbot234 wrote:
            | > he can order all of OpenAI's computer equipment be
           | impounded.
           | 
           | Arrrrr matey, this is going to be fun.
        
             | Kon-Peki wrote:
             | People have been complaining about the DMCA for 2+ decades
             | now. I guess it's great if you are on the winning side. But
             | boy does it suck to be on the losing side.
        
               | immibis wrote:
               | And normal people can't get on the winning side. I'm
                | trying to get GitHub to DMCA my own repositories, since
               | it blocked my account and therefore I decided it no
               | longer has the right to host them. Same with Stack
               | Exchange.
               | 
               | GitHub's ignored me so far, and Stack Exchange explicitly
               | said no (then I sent them an even broader legal request
               | under GDPR)
        
               | ralph84 wrote:
               | When you uploaded your code to GitHub you granted them a
               | license to host it. You can't use DMCA against someone
               | who's operating within the parameters of the license you
               | granted them.
        
               | tremon wrote:
               | Their stance is that GitHub revoked that license by
               | blocking their account.
        
             | immibis wrote:
             | It won't happen. Judges only order that punishment for the
             | little guys.
        
           | nickpsecurity wrote:
           | " If OpenAI is found to be in violation, he can then order
           | permanent destruction of the models and OpenAI would have to
           | start over from scratch in a manner that doesn't violate the
           | law."
           | 
           | That is exactly why I suggested companies train some models
           | on public domain and licensed data. That risk disappears or
           | is very minimal. They could also be used for code and
           | synthetic data generation without legal issues on the
           | outputs.
        
             | 3pt14159 wrote:
             | The problem is that you don't get the same quality of data
             | if you go about it that way. I love ChatGPT and I
             | understand that we're figuring out this new media landscape
             | but I really hope it doesn't turn out to neuter the models.
             | The models are really well done.
        
               | nickpsecurity wrote:
               | If I steal money, I can get way more done than I do now
               | by earning it legally. Yet, you won't see me regularly
               | dismissing legitimate jobs by posting comparisons to what
                | my numbers would look like if I were stealing I.P.
               | 
               | We must start with moral and legal behavior. Within that,
               | we look at what opportunities we have. Then, we pick the
               | best ones. Those we can't have are a side effect of the
               | tradeoffs we've made (or tolerated) in our system.
        
               | tremon wrote:
               | That is OpenAI's problem, not their victims'.
        
             | jsheard wrote:
             | That's what Adobe and Getty Images are doing with their
             | image generation models, both are exclusively using their
             | own licensed stock image libraries so they (and their
             | users) are on pretty safe ground.
        
               | nickpsecurity wrote:
               | That's good. I hope more do. This list has those doing it
               | under the Fairly Trained banner:
               | 
               | https://www.fairlytrained.org/certified-models
        
       | james_sulivan wrote:
       | Meanwhile China is using everything available to train their AI
       | models
        
         | goatlover wrote:
         | We don't want to be like China.
        
           | tokioyoyo wrote:
           | Fair. But I made a comment somewhere else that, if their
           | models become better than ours, they'll be incorporated into
            | products. Then we're back to being dependent on China for
            | LLM model development as well, on top of manufacturing.
           | Realistically that'll be banned because of National Security
           | laws or something, but companies tend to choose the path of
           | "best and cheapest" no matter what.
        
       | zb3 wrote:
       | Forecast: OpenAI and The Intercept will settle and OpenAI users
       | will pay for it.
        
         | jsheard wrote:
         | Yep, the game plan is to keep settling out of court so that
         | (they hope) no legal precedent is set that would effectively
         | make their entire business model illegal. That works until they
         | run out of money I guess, but they probably can't keep it up
         | forever.
        
           | echoangle wrote:
            | Wouldn't the better method be to throw all your money at one
            | suit you can make an example of, and try to win that one? You
           | can't effectively settle every single suit if you have no
           | realistic chance of winning, otherwise every single publisher
           | on the internet will come and try to get their money.
        
             | lokar wrote:
             | Too high risk. Every year you can delay you keep lining
             | your pockets.
        
             | gr3ml1n wrote:
             | That's a good strategy, but you have to have the right
             | case. One where OpenAI feels confident they can win and
             | establish favorable precedent. If the facts of the case
             | aren't advantageous, it's probably not worth the risk.
        
         | tokioyoyo wrote:
          | Side question, why don't other companies get the same
         | attention? Anthropic, xAI and others have deep pockets, and
         | scraped the same data, I'm assuming? It could be a gold mine
          | for all these news agencies to get them to keep settling out
          | of court.
        
       | ysofunny wrote:
       | the very idea of "this digital asset is exclusively mine" cannot
       | die soon enough
       | 
       | let real physically tangible assets keep the exclusivity
       | _problem_
       | 
       | let's not undo the advantages unlocked by the digital internet;
       | let us prevent a few from locking down this grand boon of digital
       | abundance such that the problem becomes saturation of data
       | 
       | let us say no to digital scarcity
        
         | cess11 wrote:
         | I think you'll find that most people aren't comfortable with
         | this in practice. They'd like e.g. the state to be able to keep
         | secrets, such as personal information regarding citizens and
         | the stuff foreign spies would like to copy.
        
           | jMyles wrote:
           | Obviously we're all impacted in these perceptions by our
           | bubbles, but it would surprise me at this particular moment
           | in the history of US politics to find that most people favor
           | the existence of the state at all, let alone its ability to
           | keep secret personal information regarding citizens.
        
             | goatlover wrote:
             | Most people aren't anarchists, and think the state is
             | necessary for complex societies to function.
        
               | jMyles wrote:
               | My sense is that the constituency of people who prefer
               | deprecation of the US state is much larger than just
               | anarchists.
        
             | cess11 wrote:
              | Really? Are Food Not Bombs and the IWW that popular where
              | you live?
        
         | CaptainFever wrote:
         | This is, in fact, the core value of the hacker ethos. _Hacker_
         | News.
         | 
         | > The belief that information-sharing is a powerful positive
         | good, and that it is an ethical duty of hackers to share their
         | expertise by writing open-source code and facilitating access
         | to information and to computing resources wherever possible.
         | 
         | > Most hackers subscribe to the hacker ethic in sense 1, and
         | many act on it by writing and giving away open-source software.
         | A few go further and assert that all information should be free
         | and any proprietary control of it is bad; this is the
         | philosophy behind the GNU project.
         | 
         | http://www.catb.org/jargon/html/H/hacker-ethic.html
         | 
         | Perhaps if the Internet didn't kill copyright, AI will.
         | (Hyperbole)
         | 
         | (Personally my belief is more nuanced than this; I'm fine with
         | very limited copyright, but my belief is closer to yours than
         | the current system we have.)
        
           | ysofunny wrote:
            | oh please, then, riddle me why my comment has -1 votes
           | on "hacker" news
           | 
           | which has indeed turned into "i-am-rich-cuz-i-own-tech-
           | stock"news
        
             | CaptainFever wrote:
             | Yes, I have no idea either. I find it disappointing.
             | 
             | I think people simply like it when data is liberated from
             | corporations, but hate it when data is liberated from them.
             | (Though this case is a corporation too so idk. Maybe just
             | "AI bad"?)
        
             | alwa wrote:
             | I did not contribute a vote either way to your comment
             | above, but I would point out that you get more of what you
             | reward. Maybe the reward is monetary, like an author paid
             | for spending their life writing books. Maybe it's smaller,
             | more reputational or social--like people who generate
             | thoughtful commentary here, or Wikipedia's editors, or
             | hobbyists' forums.
             | 
             | When you strip people's names from their words, as the
             | specific count here charges; and you strip out any reason
             | or even way for people to reward good work when they
             | appreciate it; and you put the disembodied words in the
             | mouth of a monolithic, anthropomorphized statistical model
             | tuned to mimic a conversation partner... what type of
             | thought is it that becomes abundant in this world you
             | propose, of "data abundance"?
             | 
             | In that world, the only people who still have incentive to
             | create are the ones whose content has _negative_ value, who
             | make things people otherwise wouldn't want to see:
             | advertisers, spammers, propagandists, trolls... where's the
             | upside of a world saturated with that?
        
           | onetokeoverthe wrote:
           | Creators freely sharing with attribution requested is
            | different from creations being ruthlessly harvested and
           | repurposed without permission.
           | 
           | https://creativecommons.org/share-your-work/
        
             | CaptainFever wrote:
             | > A few go further and assert that all information should
             | be free and any proprietary control of it is bad; this is
             | the philosophy behind the GNU project.
             | 
             | In this view, the ideal world is one where copyright is
             | abolished (but not moral rights). So piracy is good, and
             | datasets are also good.
             | 
             | Asking creators to license their work freely is simply a
             | compromise due to copyright unfortunately still existing.
             | (Note that even if creators don't license their work
             | freely, this view still permits you to pirate or mod it
             | against their wishes.)
             | 
             | (My view is not this extreme, but my point is that this
             | view was, and hopefully is, still common amongst hackers.)
             | 
             | I will ignore the moralizing words (eg "ruthless",
             | "harvested" to mean "copied"). It's not productive to the
             | conversation.
        
               | onetokeoverthe wrote:
               | If not respected, some Creators will strike, lay flat,
               | not post, go underground.
               | 
               | Ignoring moral rights of creators is the issue.
        
               | CaptainFever wrote:
               | Moral rights involve the attribution of works where
               | reasonable and practical. Clearly doing so during
                | inference is not reasonable or practical (you'd have to
               | attribute all of humanity!) but attributing individual
               | sources _is_ possible and _is_ already being done in
               | cases like ChatGPT Search.
               | 
               | So I don't think you actually mean moral rights, since
               | it's not being ignored here.
               | 
               | But the first sentence of your comment still stands
               | regardless of what you meant by moral rights. To that,
               | well... we're still commenting here, are we not? Despite
               | it with almost 100% certainty being used to train AI.
               | We're still here.
               | 
               | And yes, funding is a thing, which I agree needs
               | copyright for the most part unfortunately. But does
               | training AI on, for example, a book really reduce the
               | need to buy the book, if it is not reproduced?
               | 
               | Remember, training is not just about facts, but about
               | learning how humans talk, how _languages_ work, how books
                | work, etc. Learning that won't reduce the book's
                | economic value.
               | 
               | And yes, summaries may reduce the value. But summaries
               | already exist. Wikipedia, Cliff's Notes. I think the main
               | defense is that you can't copyright facts.
        
               | onetokeoverthe wrote:
                | _we're still commenting here, are we not? Despite it
               | with almost 100% certainty being used to train AI. We're
               | still here_
               | 
               | ?!?! Comparing and equating commenting to creative works.
               | ?!?!
               | 
               | These comments are NOT equivalent to the 17 full time
               | months it took me to write a nonfiction book.
               | 
               | Or an 8 year art project.
               | 
               | When I give away _my_ work _I_ decide to whom and how.
        
             | a57721 wrote:
             | > freely sharing with attribution requested
             | 
             | If I share my texts/sounds/images for free, harvesting and
             | regurgitating them omits the requested attribution. Even
             | the most permissive CC license (excluding CC0 public
             | domain) still requires an attribution.
        
           | AlienRobot wrote:
           | I think an ethical hacker is someone who uses their expertise
           | to help those without.
           | 
           | How could an ethical hacker side with OpenAI, when OpenAI is
           | using its technological expertise to exploit creators
           | without?
        
             | CaptainFever wrote:
             | I won't necessarily argue against that moral view, but in
             | this case it is two large corporations fighting. One has
             | the power of tech, the other has the power of the state
             | (copyright). So I don't think that applies in this case
             | specifically.
        
               | Xelynega wrote:
               | Aren't you ignoring that common law is built on
               | precedent? If they win this case, that makes it a lot
                | easier for people whose copyright is being infringed on
               | an individual level to get justice.
        
               | CaptainFever wrote:
               | You're correct, but I think many don't realize how many
               | small model trainers and fine-tuners there are currently.
               | For example, PonyXL, or the many models and fine-tunes on
               | CivitAI made by hobbyists.
               | 
               | So basically the reasoning is this:
               | 
                | - NYT vs OpenAI: neither is disenfranchised
                | - OpenAI vs individual creators: creators are
                |   disenfranchised
                | - NYT vs individual model trainers: model trainers
                |   are disenfranchised
                | - Individual model trainers vs individual creators:
                |   neither is disenfranchised
               | 
               | And if only one can win, and since the view is that
               | information should be free, it biases the argument
               | towards the model trainers.
        
               | AlienRobot wrote:
               | What "information" are you talking about? It's a text and
               | image generator.
               | 
               | Your argument is that it's okay to scrape content when
               | you are an individual. It doesn't change the fact those
               | individuals are people with technical expertise using it
               | to exploit people without.
               | 
               | If they wrote a bot to annoy people but published how
               | many people got angry about it, would you say it's okay
               | because that is information?
               | 
               | You need to draw the line somewhere.
        
           | Xelynega wrote:
           | I don't understand what the "hacker ethos" could have to do
           | with defending openai's blatant stealing of people's content
           | for their own profit.
           | 
           | Openai is not sharing their data(they're keeping it private
           | to profit off of), so how could it be anywhere near the
           | "hacker ethos" to believe that everyone else needs to hand
           | over their data to openai for free?
        
             | CaptainFever wrote:
             | Following the "GNU-flavour hacker ethos" as described, one
             | concludes that it is right for OpenAI to copy data without
             | restriction, it is wrong for NYT to restrict others from
             | using their data, and it is _also_ wrong for OpenAI to
             | restrict the sharing of their model weights or outputs for
             | training.
             | 
             | Luckily, most people seem to ignore OpenAI's hypocritical
             | TOS against sharing their output weights for training. I
             | would go one step further and say that they should share
             | the weights completely, but I understand there's practical
             | issues with that.
             | 
             | Luckily, we can kind of "exfiltrate" the weights by
             | training on their output. Or wait for someone to leak it,
             | like NovelAI did.
        
       | whywhywhywhy wrote:
       | It's so weird to me seeing journalists complaining about
       | copyright and people taking something they did.
       | 
       | The whole of journalism is taking the acts of others and
       | repeating them, why does a journalist claim they have the rights
       | to someone else's actions when someone simply looks at something
        | they did and repeats it.
       | 
       | If no one else ever did anything, the journalist would have
       | nothing to report, it's inherently about replicating the work and
       | acts of others.
        
         | barapa wrote:
         | This is terribly unpersuasive
        
         | PittleyDunkin wrote:
         | > The whole of journalism is taking the acts of others and
         | repeating them
         | 
         | Hilarious (and depressing) that this is what people think
         | journalists do.
        
           | SoftTalker wrote:
           | What is a "journalist?" It sounds old-fashioned.
           | 
           | They are "content creators" now.
        
         | echoangle wrote:
         | That's a pretty narrow view of journalism. If you look into
         | newspapers, it's not just a list of events but also opinion
         | pieces, original research, reports etc. The main infringement
         | isn't with the basic reporting of facts but with the original
         | part that's done by the writer.
        
         | razakel wrote:
         | Or you could just not do illegal and/or immoral things that are
         | worthy of reporting.
        
       | hydrolox wrote:
       | I understand that regulations exist and how there can be
       | copyright violations, but shouldn't we be concerned that other..
       | more lenient governments (mainly China) who are opposed to the US
       | will use this to get ahead, if OpenAI is significantly set back?
        
         | fny wrote:
         | No. OpenAI is suspected to be worth over $150B. They can
         | absolutely afford to pay people for data.
         | 
         | Edit: People commenting need to understand that $150B is the
         | _discounted value of future revenues._ So... yes they can pay
         | out... yes they will be worth less... and yes that's fair to
         | the people who created the information.
         | 
         | I can't believe there are so many apologists on HN for what
         | amounts to vacuuming up people's data for financial gain.
        
           | suby wrote:
           | OpenAI is not profitable, and to achieve what they have
           | achieved they had to scrape basically the entire internet. I
           | don't have a hard time believing that OpenAI could not exist
           | if they had to respect copyright.
           | 
           | https://www.cnbc.com/2024/09/27/openai-sees-5-billion-
           | loss-t...
        
             | jpalawaga wrote:
             | technically open ai has respected copyright, except in the
             | (few) instances they produce non-fair-use amounts of
             | copyrighted material.
             | 
             | dmca does not cover scraping.
        
           | jsheard wrote:
           | The OpenAI that is assumed to keep being able to harvest
           | every form of IP without compensation is valued at $150B, an
           | OpenAI that has to pay for data would be worth significantly
           | less. They're currently not even expecting to turn a profit
           | until 2029, and that's _without_ paying for data.
           | 
           | https://finance.yahoo.com/news/report-reveals-
           | openais-44-bil...
        
           | mrweasel wrote:
           | That's not real money though. You need actual cash on hand to
           | pay for stuff, OpenAI only have the money they've been given
           | by investors. I suspect that many of the investors wouldn't
           | have been so keen if they knew that OpenAI would need an
           | additional couple of billions a year to pay for data.
        
           | nickpsecurity wrote:
           | That doesn't mean they have $150B to hand over. What you can
           | cite is the $10 billion they got from Microsoft.
           | 
           | I'm sure they could use a chunk of that to buy competitive
           | I.P. for both companies to use for training. They can also
           | pay experts to create it. They could even sell that to others
           | for use in smaller models to finance creating or buying even
           | more I.P. for their models.
        
         | dmead wrote:
         | I'm more concerned that some people in the tech world are
         | conflating Sam Altman's interest with the national interest.
        
           | jMyles wrote:
           | Am I jazzed about Sam Altman making billions? No.
           | 
           | Am I even more concerned about the state having control over
           | the future corpus of knowledge via this doomed-in-any-case
           | vector of "intellectual property"? Yes.
           | 
           | I think it will be easier to overcome the influence of
           | billionaires when we drop the pretext that the state is a
           | more primal force than the internet.
        
             | dmead wrote:
             | 100% disagree. "It'll be fine bro" is not a substitute for
             | having a vote over policy decisions made by the government.
             | What you're talking about has a name. It starts with F and
             | was very popular in Italy in the early to mid 20th century.
        
               | jMyles wrote:
               | Rapidity of Godwin's law notwithstanding, I'm not
               | disputing the importance of equity in decision-making.
               | But this matter is more complex than that: it's obvious
               | that the internet doesn't tolerate censorship even if it
               | is dressed as intellectual property. I prefer an open and
                | democratic internet to one policed by childish legacy
               | states, the presence of which serves only (and only
               | sometimes) to drive content into open secrecy.
               | 
               | It seems particularly unfair to equate any questioning of
               | the wisdom of copyright laws (even when applied in
               | situations where we might not care for the defendant, as
               | with this case) with fascism.
        
               | dmead wrote:
               | It's not Godwin's law when it's correct. Just because
               | it's cool and on the Internet doesn't mean you get to
               | throw out people's stake in how their lives are run.
        
               | jMyles wrote:
               | > throw out people's stake in how their lives are run
               | 
               | FWIW, you're talking to a professional musician.
               | Ostensibly, the IP complex is designed to protect me. I
               | cannot fathom how you can regard it as the "people's
               | stake in how their lives are run". Eliminating copyright
               | will almost certainly give people more control over their
               | digital lives, not less.
               | 
               | > It's not Godwin's law when it's correct.
               | 
               | Just to be clear, you are doubling down on the claim that
               | sunsetting copyright laws is tantamount to nazism?
        
         | worble wrote:
         | Should we also be concerned that other governments use slave
         | labor (among other human rights violations) and will use that
         | to get ahead?
        
           | logicchains wrote:
           | It's hysterical to compare training an ML model with slave
           | labour. It's perfectly fine and accepted for a human to read
           | and learn from content online without paying anything to the
           | author when that content has been made available online for
           | free, it's absurd to assert that it somehow becomes a human
           | rights violation when the learning is done by a non-
           | biological brain instead.
        
             | Kbelicius wrote:
             | > It's hysterical to compare training an ML model with
             | slave labour.
             | 
             | Nobody did that.
             | 
             | > It's perfectly fine and accepted for a human to read and
             | learn from content online without paying anything to the
             | author when that content has been made available online for
             | free, it's absurd to assert that it somehow becomes a human
             | rights violation when the learning is done by a non-
             | biological brain instead.
             | 
             | It makes sense. There is always scale to consider in these
             | things.
        
         | devsda wrote:
         | Get ahead in terms of what? Do you believe that the material in
         | public domain or legally available content that doesn't violate
         | copyrights is not enough to research AI/LLMs or is the concern
         | about purely commercial interests?
         | 
         | China also supposedly has abusive labor practices. So, should
         | other countries start relaxing their labor laws to avoid
         | falling behind ?
        
         | mu53 wrote:
         | Isn't it a greater risk that creators lose their income and
         | nobody is creating the content anymore?
         | 
         | Take for instance what has happened with news because of the
         | internet. Not exactly the same, but similar forces at work. It
         | turned into a race to the bottom with everyone trying to
         | generate content as cheaply as possible to get maximum
         | engagement with tech companies siphoning revenue. Expensive,
         | investigative pieces from educated journalists disappeared in
         | favor of stuff that looks like spam. Pre-Internet news was
          | higher quality.
         | 
         | Imagine that same effect happening for all content? Art,
          | writing, academic pieces. It's a real risk that OpenAI has
          | peaked in quality.
        
           | CuriouslyC wrote:
           | Lots of people create without getting paid to do it. A lot of
           | music and art is unprofitable. In fact, you could argue that
           | when the mainstream media companies got completely captured
           | by suits with no interest in the things their companies
           | invested in, that was when creativity died and we got
           | consigned to genre-box superhero pop hell.
        
           | eastbound wrote:
           | I don't know. When I look at news from before, there never
            | was investigative journalism. It was all opinion-swaying
            | editorials, until alternate voices voiced their
           | counternarratives. It's just not in newspapers because they
           | are too politically biased to produce the two sides of
           | stories that we've always asked them to do. It's on other
           | media.
           | 
           | But investigative journalism has not disappeared. If
           | anything, it has grown.
        
         | immibis wrote:
         | Absolutely: if copyright is slowing down innovation, we should
         | abolish copyright.
         | 
         | Not just turn a blind eye when it's the right people doing it.
         | They don't even have a legal exemption passed by Congress -
         | they're just straight-up breaking the law and getting away with
         | it. Which is how America works, I suppose.
        
           | JoshTriplett wrote:
           | Exactly. They rushed to violate copyright on a massive scale
           | _quickly_ , and now are making the argument that it shouldn't
           | apply to them and they couldn't possibly operate in
           | compliance with it. As long as humans don't get to ignore
           | copyright, AI shouldn't either.
        
             | Filligree wrote:
             | Humans do get to ignore copyright, when they do the same
             | thing OpenAI has been doing.
        
               | slyall wrote:
               | Exactly.
               | 
               | Should I be paying a proportion of my salary to all the
               | copyright holders of the books, song, TV shows and movies
               | I consumed during my life?
               | 
               | If a Hollywood writer says she "learnt a lot about
               | writing by watching the Simpsons" will Fox have an
               | additional claim on her earnings?
        
               | __loam wrote:
               | Yeah it turns out humans have more rights than computer
               | programs and tech startups.
        
         | bogwog wrote:
         | This type of argument is ignorant, cowardly, shortsighted, and
         | regressive. Both technology and society will progress when we
         | find a formula that is sustainable and incentivizes everyone
         | involved to maximize their contributions without it all blowing
         | up in our faces someday. Copyright law is far from perfect, but
         | it protects artists who want to try and make a living from
         | their work, and it incentivizes creativity that places without
         | such protections usually end up just imitating.
         | 
         | When we find that sustainable framework for AI, China or
         | <insert-boogeyman-here> will just end up imitating it. Idk what
         | harms you're imagining might come from that ("get ahead" is too
         | vague to mean anything), but I just want to point out that that
         | isn't how you become a leader in anything. Even worse, if
         | _they_ are the ones who find that formula first while we take
         | shortcuts to  "get ahead", then we will be the ones doing the
         | imitation in the end.
        
           | gaganyaan wrote:
           | Copyright is a dead man walking and that's a good thing.
           | Let's applaud the end of a temporary unnatural state of
           | affairs.
        
       | quarterdime wrote:
       | Interesting. Two key quotes:
       | 
       | > It is unclear if the Intercept ruling will embolden other
       | publications to consider DMCA litigation; few publications have
       | followed in their footsteps so far. As time goes on, there is
       | concern that new suits against OpenAI would be vulnerable to
       | statute of limitations restrictions, particularly if news
       | publishers want to cite the training data sets underlying
       | ChatGPT. But the ruling is one signal that Loevy & Loevy is
       | narrowing in on a specific DMCA claim that can actually stand up
       | in court.
       | 
       | > Like The Intercept, Raw Story and AlterNet are asking for
       | $2,500 in damages for each instance that OpenAI allegedly removed
       | DMCA-protected information in its training data sets. If damages
       | are calculated based on each individual article allegedly used to
       | train ChatGPT, it could quickly balloon to tens of thousands of
       | violations.
       | 
       | Tens of thousands of violations at $2500 each would amount to
       | tens of millions of dollars in damages. I am not familiar with
       | this field, does anyone have a sense of whether the total cost of
       | retraining (without these alleged DMCA violations) might compare
       | to these damages?
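As a back-of-the-envelope check on the figures quoted above: $2,500 per alleged violation at the "tens of thousands" scale. The violation count below is a hypothetical illustration, not a figure from the case.

```python
# Statutory-damages arithmetic from the quoted passage: $2,500 per
# alleged DMCA violation. 20,000 violations is a hypothetical count.
def total_damages(violations: int, per_violation: int = 2_500) -> int:
    return violations * per_violation

print(total_damages(20_000))  # 50000000, i.e. $50M at 20k violations
```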
        
         | Xelynega wrote:
         | If you're going to retrain your model because of this ruling,
         | wouldn't it make sense to remove _all_ DMCA protected content
         | from your training data instead of just the one you were most
         | recently sued for(especially if it sets precedent)
        
           | jsheard wrote:
           | It would make sense from a legal standpoint, but I don't
           | think they could do that without massively regressing their
            | models' performance to the point that it would jeopardize
           | their viability as a company.
        
             | zozbot234 wrote:
             | They might make it work by (1) having lots of public domain
             | content, for the purpose of training their models on basic
             | language use, and (2) preserving source/attribution
             | metadata about what copyrighted content they do use, so
             | that the models can surface this attribution to the user
             | during inference. Even if the latter is not 100% foolproof,
             | it might still be useful in most cases and show good faith
             | intent.
        
               | CaptainFever wrote:
               | The latter one is possible with RAG solutions like
               | ChatGPT Search, which do already provide sources! :)
               | 
               | But for inference in general, I'm not sure it makes too
               | much sense. Training data is not just about learning
               | facts, but also (mainly?) about how language works, how
               | people talk, etc. Which is kind of too fundamental to be
               | attributed to, IMO. (Attribution: Humanity)
               | 
               | But who knows. Maybe it _can_ be done for more fact-like
               | stuff.
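The source-attribution idea in this subthread can be sketched as a toy retrieval step: retrieved chunks carry source metadata, so an answer can surface where it came from. Names like `Chunk` and `answer_with_attribution` are illustrative, not any real API, and the keyword-overlap "retrieval" is a stand-in for real embedding search.

```python
# Toy RAG-style attribution: retrieved chunks keep their source metadata,
# so the generated answer can cite its origins.
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    source: str  # e.g. article URL or publication name

def answer_with_attribution(query: str, index: list[Chunk]) -> str:
    # Toy retrieval: pick chunks sharing any word with the query.
    words = query.lower().split()
    hits = [c for c in index if any(w in c.text.lower() for w in words)]
    body = " ".join(c.text for c in hits)
    sources = sorted({c.source for c in hits})
    return f"{body}\nSources: {', '.join(sources)}"

index = [
    Chunk("OpenAI was sued over DMCA metadata removal.", "niemanlab.org"),
    Chunk("Statutory damages can be $2,500 per violation.", "uscode"),
]
print(answer_with_attribution("DMCA damages", index))
```

This only attributes what was retrieved at inference time; as the comment notes, it does nothing for the diffuse "how language works" knowledge baked into the weights.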
        
               | TeMPOraL wrote:
               | > _Training data is not just about learning facts, but
               | also (mainly?) about how language works, how people talk,
               | etc._
               | 
               | All of that and more, all at the same time.
               | 
               | Attribution at inference level is bound to work more-less
               | the same way as humans attribute things during
               | conversations: "As ${attribution} said, ${some quote}",
               | or "I remember reading about it in ${attribution-1} -
               | ${some statements}; ... or maybe it was in
               | ${attribution-2}?...". Such attributions are often wrong,
               | as people hallucinate^Wmisremember where they saw or
               | heard something.
               | 
               | RAG obviously can work for this, as well as other
               | solutions involving retrieving, finding or confirming
               | sources. That's just like when a human actually looks up
               | the source when citing something - and has similar
               | caveats and costs.
        
             | Xelynega wrote:
             | I agree, just want to make sure "they can't stop doing
             | illegal things or they wouldn't be a success" is said out
             | loud instead of left to subtext.
        
               | CuriouslyC wrote:
               | They can't stop doing things some people don't like
               | (people who also won't stop doing things other people
               | don't like). The legality of the claims is questionable
               | which is why most are getting thrown out, but we'll see
               | if this narrow approach works out.
               | 
               | I'm sure there are also a number of easy technical ways
               | to "include" the metadata while mostly ignoring it during
               | training that would skirt the letter of the law if
               | needed.
        
               | Xelynega wrote:
               | If we really want to be technical, in common law systems
               | anything is legal as long as the highest court to
               | challenge it decides it's legal.
               | 
               | I guess I should have used the phrase "common sense
               | stealing in any other context" to be more precise?
        
             | asdff wrote:
             | I wonder if they can say something like "we aren't scraping
             | your protected content, we are merely scraping this old
             | model we don't maintain anymore and it happened to have
             | protected content in it from before the ruling" then you've
              | essentially won all of humanity's output, as you can
             | already scrape the new primary information (scientific
             | articles and other datasets designed for researchers to
             | freely access) and whatever junk outputted by the content
             | mills is just going to be a poor summarizations of that
             | primary information.
             | 
             | Other factors that help this effort of an old model + new
             | public facing data being complete, are the idea that other
             | forms of media like storytelling and music have already
              | converged onto certain prevailing patterns. For stories we
             | expect a certain style of plot development and complain
             | when its missing or not as we expect. For music most
             | anything being listened to is lyrics no one is deeply
             | reading into put over the same old chord progressions we've
             | always had. For art there are just too few of us who are
             | actually going out of our way to get familiar with novel
             | art vs the vast bulk of the worlds present day artistic
             | effort which goes towards product advertisement, which once
             | again follows certain patterns people have been publishing
             | in psychological journals for decades now.
             | 
             | In a sense we've already put out enough data and made
             | enough of our world formulaic to the point where I believe
             | we've set up for a perfect singularity already in terms of
             | what can be generated for the average person who looks at a
             | screen today. And because of that I think even a lack of
             | any new training on such content wouldn't hurt openai at
             | all.
        
             | TeMPOraL wrote:
             | Only half-serious, but: I wonder if they can dance with the
             | publishers around this issue long enough for most of the
             | contested text to become part of public court records, and
             | then claim they're now training off that. <trollface>
        
               | jprete wrote:
               | Being part of a public court record doesn't seem like
               | something that would invalidate copyright.
        
           | A4ET8a8uTh0 wrote:
           | Re-training can be done, but, and it is not a small but,
           | models already do exist and can be used locally suggesting
           | that the milk has been spilled for too long at this point.
           | Separately, neutering them effectively lowers their value as
           | opposed to their non-neutered counterparts.
        
           | ashoeafoot wrote:
            | What about bombing? You could always smuggle DMCA content
            | into training sets hoping for a payout?
        
             | Xelynega wrote:
             | The onus is on the person collecting massive amounts of
             | data and circumventing DMCA protections to ensure they're
             | not doing anything illegal.
             | 
             | "well someone snuck in some DMCA content" when sharing
             | family photos and doesn't suddenly make it legal to share
             | that DMCA protected content with your photos...
        
           | sandworm101 wrote:
           | But all content is DMCA protected. Avoiding copyrighted
           | content means not having content as all material is
           | automatically copyrighted. One would be limited to licensed
           | content, which is another minefield.
           | 
              | The apparent loophole is between copyrighted work and
           | copyrighted work that is _also_ registered. But registration
           | can occur at any time, meaning there is little practical
           | difference. Unless you have perfect licenses for all your
           | training data, which nobody does, you have to accept the risk
           | of copyright suits.
        
             | Xelynega wrote:
             | Yes, that's how every other industry that redistributes
             | content works.
             | 
             | You have to license content you want to use, you cant just
             | use it for free because it's on the internet.
             | 
             | Netflix doesn't just start hosting shows and hope they
             | don't get a copyright suit...
        
       | logicchains wrote:
       | Eventually we're going to have embodied models capable of live
       | learning and it'll be extremely apparent how absurd the ideas of
       | the copyright extremists are. Because in their world, it'd be
       | illegal for an intelligent robot to watch TV, read a book or
       | browse the internet like a human can, because it could remember
       | what it saw and potentially regurgitate it in future.
        
         | luqtas wrote:
         | problem is when a human company profits over their scrape...
         | this isn't a non-profit running out of volunteers & a total
          | distant reality from autonomous robots learning its way by
         | itself
         | 
         | we are discussing an emergent cause that has social &
         | ecological consequences. servers are power hungry stuff that
          | may or may not run on a sustainable grid (that also has a bazinga
         | of problems like leaking heavy chemicals on solar panels
         | production, hydro-electric plants destroying their surroundings
         | etc.) & the current state of producing hardware, be a sweatshop
          | or conflict minerals. let's forget creators' copyright violation
         | that is written in the law code of almost every existing
         | country and no artist is making billions out of the abuse of
         | their creation right (often they are pretty chill on getting
         | their stuff mentioned, remixed and whatever)
        
         | openrisk wrote:
         | Leaving aside the hypothetical "live learning AGI" of the
         | future (given that money is made or lost _now_ ), would a human
         | regurgitating content that is not theirs - but presented as if
         | it is - be acceptable to you?
        
           | CuriouslyC wrote:
           | I don't know about you but my friends don't tell me that Joe
           | Schmoe of Reuters published a report that said XYZ copyright
           | XXXX. They say "XYZ happened."
        
         | Karliss wrote:
         | If humanity ever gets to the point where intelligent robots are
          | capable of watching TV like a human can, having to adjust
          | copyright laws seems like the least of problems. How about
          | having to adjust almost every law related to basic "human"
          | rights, ownership, being able to establish a contract, being
         | responsible for crimes and endless other things.
         | 
         | But for now your washing machine cannot own other things, and
         | you owning a washing machine isn't considered slavery.
        
         | JoshTriplett wrote:
         | > copyright extremists
         | 
         | It's not copyright "extremism" to expect a level playing field.
         | As long as humans have to adhere to copyright, so should AI
         | companies. If you want to abolish copyright, by all means do,
         | but don't give AI a special exemption.
        
           | IAmGraydon wrote:
           | Except LLMs are in no way violating copyright in the true
           | sense of the word. They aren't spitting out a copy of what
           | they ingested.
        
             | JoshTriplett wrote:
             | Go make a movie using the same plot as a Disney movie, that
             | doesn't copy any of the text or images of the original, and
             | see how far "not spitting out a copy" gets you in court.
             | 
             | AI's approach to copyright is very much "rules for thee but
             | not for me".
        
               | bdangubic wrote:
               | 100% agree. but now a million$ question - how would you
               | deal with AI when it comes to copyright? what rules could
               | we possibly put in place?
        
               | JoshTriplett wrote:
               | The same rules we already have: follow the license of
               | whatever you use. If something doesn't have a license,
               | don't use it. And if someone says "but we can't build AI
               | that way!", too bad, go fix it for everyone first.
        
               | slyall wrote:
               | You have a lot of opinions on AI for somebody who has
               | only read stuff in the public domain
        
               | rcxdude wrote:
               | That might get you pretty far in court, actually. You'd
               | have to be pretty close in terms of the sequence of
               | events, character names, etc. Especially considering how
               | many Disney movies are based on pre-existing stories, if
               | you were, to, say, make a movie featuring talking animals
               | that more or less followed the plot of Hamlet, you would
               | have a decent chance of prevailing in court, given the
               | resources to fight their army of lawyers.
        
           | CuriouslyC wrote:
           | It's actually the opposite of what you're saying. I can 100%
           | legally do all the things that they're suing OpenAI for.
           | Their whole argument is that the rules should be different
           | when a machine does it than a human.
        
             | JoshTriplett wrote:
             | Only because it would be unconscionable to apply copyright
             | to actual human brains, so we don't. But, for instance, you
             | _absolutely can_ commit copyright violation by reading
             | something and then writing something very similar, which is
             | one reason why reverse engineering commonly uses clean-room
             | techniques. AI training is in no way a clean room.
        
         | IAmGraydon wrote:
         | Exactly. Also core to the copyright extremists' delusional
         | train of thought is the fact that they don't seem to understand
         | (or admit) that ingesting, creating a model, and then
         | outputting based on that model is exactly what people do when
         | they observe others' works and are inspired to create.
        
         | CuriouslyC wrote:
         | You have to understand, the media companies don't give a shit
         | about the logic, in fact I'm sure a lot of the people pushing
         | the litigation probably see the absurdity of it. This is a
         | business turf war, the stated litigation is whatever excuse
         | they can find to try and go on the offensive against someone
         | they see as a potential threat. The pro copyright group (big
         | media) sees the writing on the wall, that they're about to get
         | dunked on by big tech, and they're thrashing and screaming
         | because $$$.
        
         | tokioyoyo wrote:
         | The problem is, we can't come up with a solution where both
         | parties are happy, because in the end, consumers choose one
         | (getting information from news agencies) or the other (getting
         | information from chatgpt). So, both are fighting for life.
        
       | 3pt14159 wrote:
       | Is there a way to figure out if OpenAI ingested my blog? If the
       | settlements are $2500 per article then I'll take a free used
       | car's worth of payments if it's available.
        
         | jazzyjackson wrote:
         | I suppose the cost of legal representation would cancel it out.
         | I can just imagine a class action where anyone who posted on
         | blogger.com between 2002 and 2012 eventually gets a check for
         | 28 dollars.
         | 
         | If I were more optimistic I could imagine a UBI funded by
         | lawsuits against AGI, some combination of lost wages and
         | intellectual property infringement. Can't figure out exactly
         | how much more impact an article on The Intercept had on
         | shifting weights than your Hacker News comments, might as well
         | just pay everyone equally since we're all equally screwed.
        
           | dwattttt wrote:
            | Wouldn't the point of the class action be to dilute the
            | cost of representation? If the damages per article are high
            | and there are plenty of class members, I imagine the limit
            | would be how much OpenAI has to pay out.
        
           | SahAssar wrote:
           | If you posted on blogger.com (or any platform with enough
           | money to hire lawyers) you probably gave them a license that
           | is irrevocable, non-exclusive and able to be sublicensed.
        
       | bastloing wrote:
       | Isn't this the same thing Google has been doing for years with
       | their search engine? The only difference is that Google keeps
       | the data internal, whereas OpenAI spits it out to you. But it's
       | still scraped and stored in both cases.
        
         | jazzyjackson wrote:
          | A component of fair use is the degree to which the derivative
          | work displaces the original. Google's argument has always
          | been that they direct traffic to the original, whereas AI
          | summaries (which Google of course is just as guilty of as
          | OpenAI) completely obsolete the original publication. The
          | argument now is that the derivative work (the LLM itself) is
          | transformative, i.e., different enough that it doesn't
          | economically compete with the original. I think it's a losing
          | argument, but we'll see what the courts arrive at.
        
           | CaptainFever wrote:
           | Is this specific to AI or specific to summaries in general?
           | Do summaries, like the ones found in Wikipedia or Cliffs
           | Notes, not have the same effect of making it such that people
           | no longer have to view the original work as much?
           | 
            | Note: do you mean the _model_ is transformative, or the
            | _summaries_ are transformative? I think your comment holds
            | up either way, but I think it's better to be clear which
            | one you mean.
        
         | LinuxBender wrote:
         | In my opinion _not a lawyer_ , Google at least references where
         | they obtained the data and did not regurgitate it as if they
         | were the creators that created something new. _obfuscated
         | plagiarism via LLM._ Some claim derivative works but I have
         | always seen that as quite a stretch. People here expect me to
         | cite references yet LLM 's somehow escape this level of
         | scrutiny.
        
       | efitz wrote:
       | I would trust AI a lot more if it gave answers more like:
       | 
       |  _"Source A on date 1 said XYZ"_
       | 
       |  _"Source B ..."_
       | 
       |  _"Synthesizing these, it seems that the majority opinion is X
       | but Y is also a commonly held opinion."_
       | 
       | Instead of what it does now, which is make extremely confident,
       | unsourced statements.
       | 
       | It looks like the copyright lawsuits are rent-seeking as much as
       | anything else; another reason I hate copyright in its current
       | form.
        
         | CaptainFever wrote:
         | ChatGPT Search provides this, by the way, though it relies a
         | lot on the quality of Bing search results. Consensus.app does
         | this but for research papers, and has been very useful to me.
        
           | maronato wrote:
           | More often than not in my experience, clicking these sources
           | takes me to pages that either don't exist, don't have the
           | information ChatGPT is quoting, or ChatGPT completely
           | misinterpreted the content.
        
         | akira2501 wrote:
         | > which is make extremely confident,
         | 
          | One of the results the LLM has available to itself is a
          | confidence value. It should, at the very least, provide this
          | along with its answer. Perhaps if it did, people would stop
          | calling it 'AI'.
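(The "confidence value" here presumably refers to the model's per-token probabilities, e.g. the logprobs some APIs expose. A minimal sketch of that idea, assuming you have access to the raw per-step logits; the `answer_confidence` aggregation is my own illustrative choice, not anything a vendor actually ships:)

```python
import math

def softmax(logits):
    # Shift by the max logit for numerical stability, then normalize.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def answer_confidence(token_logits):
    # Geometric mean of the chosen-token probabilities: a crude
    # sequence-level confidence score in (0, 1].
    chosen_probs = [max(softmax(step)) for step in token_logits]
    log_sum = sum(math.log(p) for p in chosen_probs)
    return math.exp(log_sum / len(chosen_probs))

# A peaked distribution at every step scores high; a flat one scores low.
peaked = [[5.0, 0.0, 0.0]] * 4
flat = [[1.0, 1.0, 1.0]] * 4
print(answer_confidence(peaked) > answer_confidence(flat))  # True
```

Worth noting, though, that token probabilities measure how sure the model is of its wording, not of the underlying facts; they are typically poorly calibrated as a truthfulness signal, which is part of why vendors don't surface them as "confidence".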
        
       | ashoeafoot wrote:
       | Will we see human-washing, where AI art or works get a "Made by
       | man" final touch in some third-world mechanical turk den? Would
       | that add another financially detracting layer to the AI winter?
        
         | Retric wrote:
          | The law generally takes a dim view of attempts to get around
          | things like that. AI's biggest defense is claiming that it is
          | so beneficial to society that what it is doing is fine.
        
           | gmueckl wrote:
            | That argument stands on the mother of all slippery slopes!
            | Just find a way to make your product impressive or
            | ubiquitous and all of a sudden it doesn't matter how much
            | you break the law along the way? That's so insane I don't
            | even know where to start.
        
             | ashoeafoot wrote:
              | Worked for Purdue
        
             | Retric wrote:
              | YouTube, AirBnB, Uber, and many _many_ others have all
              | done stuff that's blatantly against the law but gotten
              | away with it due to utility.
        
             | rcxdude wrote:
              | Why not, considering copyright law specifically has fair
              | use outlined for that kind of thing? It's not some
              | overriding consequence of law; copyright is a grant of a
              | privilege to individuals, and that privilege is not
              | absolute.
        
           | gaganyaan wrote:
           | That is not in any way the biggest defense
        
             | Retric wrote:
              | It's worked for many startups and court cases in the
              | past. Copyright even has many explicit examples of the
              | utility loophole; look at, say:
              | https://en.wikipedia.org/wiki/Sony_Corp._of_America_v._Unive....
        
         | righthand wrote:
          | That will probably happen to some extent, if it isn't
          | happening already. However, I think people will just stop
          | publishing online if malicious corps like OpenAI are going to
          | harvest works for their own gain. People publish for personal
          | gain, not to enrich the public or private entities.
        
           | Filligree wrote:
           | However, I get my personal gain regardless of whether or not
           | the text is also ingested into ChatGPT.
           | 
           | In fact, since I use ChatGPT a lot, I get more gain if it is.
        
         | CuriouslyC wrote:
         | There's no point in having third world mechanical turk dens do
         | finishing passes on AI output unless you're trying to make it
         | worse.
         | 
         | Artists are already using AI to photobash images, and writers
         | are using AI to outline and create rough drafts. The point of
         | having a human in the loop is to tell the AI what is worth
         | creating, then recognize where the AI output can be improved.
         | If we have algorithms telling the AI what to make and content
         | mill hacks smearing shit on the output to make it look more
         | human, that would be the worst of both worlds.
        
       | ada1981 wrote:
       | I'm still of the opinion that we should be allowed to train on
       | any data a human can read online.
        
       | cynicalsecurity wrote:
       | Yeah, let's stop progress because a few magazines no one cares
       | about are unhappy.
        
         | a57721 wrote:
         | Maybe just don't use data from the unhappy magazines you don't
         | care about in the first place?
        
       | bastloing wrote:
       | Who would be forever grateful if OpenAI removed all of The
       | Intercept's content permanently and refused to crawl it in the
       | future?
        
       ___________________________________________________________________
       (page generated 2024-11-29 23:00 UTC)