[HN Gopher] Judge dismisses DMCA copyright claim in GitHub Copil...
___________________________________________________________________
Judge dismisses DMCA copyright claim in GitHub Copilot suit
Author : samspenc
Score : 344 points
Date : 2024-07-09 18:25 UTC (1 days ago)
(HTM) web link (www.theregister.com)
(TXT) w3m dump (www.theregister.com)
| rolph wrote:
| copilot was apparently snipping license bearing comments, and
| applying "semantic" variations of the remaining code.
|
| i would package the entire code as a series of comments, [ideally
| this would be snipped by the pliagarists] leaving a snippet of
| example code that no one of sound mind would allow to execute,
| being proffered by copilot.
| ChrisMarshallNY wrote:
| _> of sound mind_
|
| That's a reach, these days...
|
| I'm seeing some really ... _interesting_ ... behavior, being
| exhibited by folks that, at first blush, I think are kids, just
| out of bootcamp, but, on further inspection, turn out to be
| middle-aged professionals.
|
| I really think Teh Internets Tubes have been rather corrosive
| to collective mental health.
| klyrs wrote:
| The ability to think for oneself will diminish rapidly in an
| environment that rewards one for not doing so.
|
| Smart people still exist. They just aren't online.
| kirth_gersen wrote:
| Suicide by words, here?
| nyc_data_geek wrote:
| The Internet is still one of the easiest ways to find and
| participate in communities and conversations with other
| smart people, if you're invested in vetting and filtering
| who/what you're engaging with.
|
| That said, I expect the ease of such will continue to
| decline as we approach a largely dead Internet, primarily
| consisting of bots talking to bots trying to sell each
| other herbal brain force supplements or whatever
| satvikpendem wrote:
| From Plato's dialogue Phaedrus 14, 274c-275b:
|
| Socrates: I heard, then, that at Naucratis, in Egypt, was
| one of the ancient gods of that country, the one whose
| sacred bird is called the ibis, and the name of the god
| himself was Theuth. He it was who invented numbers and
| arithmetic and geometry and astronomy, also draughts and
| dice, and, most important of all, letters.
|
| Now the king of all Egypt at that time was the god Thamus,
| who lived in the great city of the upper region, which the
| Greeks call the Egyptian Thebes, and they call the god
| himself Ammon. To him came Theuth to show his inventions,
| saying that they ought to be imparted to the other
| Egyptians. But Thamus asked what use there was in each, and
| as Theuth enumerated their uses, expressed praise or blame,
| according as he approved or disapproved.
|
| "The story goes that Thamus said many things to Theuth in
| praise or blame of the various arts, which it would take
| too long to repeat; but when they came to the letters,
| "This invention, O king," said Theuth, "will make the
| Egyptians wiser and will improve their memories; for it is
| an elixir of memory and wisdom that I have discovered." But
| Thamus replied, "Most ingenious Theuth, one man has the
| ability to beget arts, but the ability to judge of their
| usefulness or harmfulness to their users belongs to
| another; and now you, who are the father of letters, have
| been led by your affection to ascribe to them a power the
| opposite of that which they really possess.
|
| "For this invention will produce forgetfulness in the minds
| of those who learn to use it, because they will not
| practice their memory. Their trust in writing, produced by
| external characters which are no part of themselves, will
| discourage the use of their own memory within them. You
| have invented an elixir not of memory, but of reminding;
| and you offer your pupils the appearance of wisdom, not
| true wisdom, for they will read many things without
| instruction and will therefore seem to know many things,
| when they are for the most part ignorant and hard to get
| along with, since they are not wise, but only appear wise."
| ChrisMarshallNY wrote:
| That's great!
|
| They nailed us, what, four thousand years ago?
| satvikpendem wrote:
| Humans have been anatomically unchanged for 50,000 years,
| I'd imagine every generation lamented the young with
| their new technology, otherwise we wouldn't have seen so
| many examples in written history, it is just that we have
| no records from prehistory, by definition.
| courseofaction wrote:
| Awesome. Serves as a counter-example - would HN consider
| literacy to be damaging to the mind, or are we similarly
| mistaken by thinking that LLMs necessarily degrade the
| abilities of their users?
|
| Pre-writing 'texts' (such as the Iliad) were memorized by
| poets, which is reflected in their forms which made more
| use of memory-friendly forms like rhyming, consistent
| meter, and close repetition.
|
| Writing allowed greater complexity and more
| complex/information dense literary forms.
|
| I feel that intelligent, critical LLM usage is just
| writing with less laboriousnes, which opens up the
| writer's ability to explore ideas more widely rather than
| spend their time on the technical aspects of knowledge
| production.
| klyrs wrote:
| Does it serve as a counterexample? Or did the predicted
| loss of memory function come to pass?
|
| Worth noting that people were smoking plain old opium
| back in those times; I'd be reluctant to apply their
| reasoning to fentanyl.
| satvikpendem wrote:
| What are you talking about with your second paragraph? I
| can't tell if it's supposed to be an analogy or whether
| you actually think everyone was smoking opium back then.
| klyrs wrote:
| Yes, the ancient Greeks were smoking opium. Nobody said
| that "everyone" was doing it, but its use was pretty
| widespread in neolithic Europe even before Sumerians were
| cultivating poppies Mesopotamia, back in 3400BCE.
|
| https://en.wikipedia.org/wiki/Opium
| satvikpendem wrote:
| I see, thanks for the clarification.
| klyrs wrote:
| Precisely the quote I was thinking of, thank you.
| rolph wrote:
| ..that suggests there is actually a chance that someone would
| go for such a boobytrap.
| bityard wrote:
| This is pretty interesting, and I have conflicted feelings about
| the (seemingly obvious) outcome of this trial.
|
| I wonder, if MS and OpenAI win, does that mean it will be legal
| for anyone to take the leaked source code for a proprietary
| product, train an LLM on it, and then ask the LLM to emit a
| version of it that is different enough to avoid copyright
| infringement?
|
| That would be quite the double-edged sword for proprietary
| software companies.
| ChrisMarshallNY wrote:
| I suspect that this is exactly what will happen; not just with
| code, but also prose and artwork.
|
| Someone is likely to design an LLM that is specifically trained
| to do exactly that.
|
| Lots of money to be made...
| devmor wrote:
| On the matter of artwork there's no need for suspicion - it
| is and has been happening for a while now. There are entire
| online databases dedicated to providing non-consenting
| artist's "styles" as downloadable model parameters by name.
| ChrisMarshallNY wrote:
| Try getting Mickey Mouse comics.
|
| That should be fun...
| satvikpendem wrote:
| Style is not copyrightable so I see nothing wrong with
| making essentially a robot that can paint in the style of
| someone else.
| falcolas wrote:
| In isolation, no. But the produced works can be too close
| for fair use (as demonstrated with the Prince pieces by
| Andy Warhol), and passing it off as a piece from the
| original artist can open you up to forgery/fraud charges.
|
| To put another way, the motivations to produce art in
| another artist's style can still land the artist/buyer in
| legal trouble regardless of fair use.
| satvikpendem wrote:
| Yes that is true, but I don't think the people who use
| style transfer are actually passing it off as the
| original, they just like it for the aesthetic value of
| their own images. In other words, no one using the Van
| Gogh LoRA is actually trying to forge the Starry Night.
| falcolas wrote:
| Given the value of an "authentic" painting of the Starry
| Night (or more realistically the value of something
| forged in, say, Samwise Didier's style) I can't agree
| with "no one".
|
| I have to imagine that it's likely quite popular to sell
| AI generated art that mimics or copies existing works.
| satvikpendem wrote:
| Do you use AI art generators? Flaws are extremely easily
| found out, it is only good for a rough snapshot (without
| much fiddling and even then, artifacts remain). I can
| guarantee you it is definitely _not_ popular to sell
| existing works made with AI, you are better off hiring an
| actual forger. In fact, your suggestion is even the first
| I 've even heard of such an idea.
| 8organicbits wrote:
| I guess there's always a greater fool, but forging an oil
| painting using AI digital images seems pretty far
| fetched.
| devmor wrote:
| The legality of using someone's copyrighted work to train
| a model to reproduce it without their consent is still
| under debate - but the morality of the act at least, is
| not related to its legality - be it positively or
| negatively; and I personally consider it abhorrent.
| satvikpendem wrote:
| Under what morals do you consider it "abhorrent?" I bet
| got a straight answer from those I've asked about this as
| the counter arguments seem too easy to make.
| devmor wrote:
| It's just pure exploitation. You're using the product of
| someone's work to create a machine that takes away their
| work.
| struant wrote:
| Why is doing a task with a machine suddenly objectionable
| when the same task performed by humans is perfectly fine?
| sensanaty wrote:
| A man with a small canoe catching a few fish with a
| fishing rod for his dinner is very different to a
| commercial fishing vessel trawling through the ocean with
| a massive net to catch thousands of fish at once. The two
| are treated differently under the law, and have different
| rules that apply to them due to the difference in scale.
|
| Scale matters, and the scale that computers/these AIs
| operate under are absurd compared to a person doing it
| manually.
| satvikpendem wrote:
| Why does scale matter in terms of AI? Just because a
| computer can do it at scale doesn't mean it should be
| treated similarly to your analogy. Rather than using an
| analogy, please tell me why it matters that computers can
| do something like AI at scale rather than individuals
| doing it.
| devmor wrote:
| Chiefly, scale and accountability.
|
| The work of a person can be mitigated and a person can be
| held accountable for their actions.
|
| Much of our society operates on the idea that we don't
| need to codify and enforce every single good or bad thing
| due to these reasons; and having such an underpinning
| affords us greater personal freedom.
| satvikpendem wrote:
| This does not actually answer the question of why it is
| bad (in your opinion) in the first place, it just states
| that bad things are mitigated. I am looking for a
| concrete answer to the former, not a justification of the
| latter. The former is what usually AI opponents can never
| answer, they assume prima facie that AI is bad, for
| whatever reason.
| devmor wrote:
| I answered your question plainly, but I'll try to go into
| detail. I have a suspicion that you don't see this as the
| philosophical issue that AI detractors do, and perhaps
| that hasn't been clearly communicated to you in the
| answers you've received, leading to your distaste for
| them or confusion at why they don't meet your criteria.
|
| I believe that this kind of generative AI is bad because
| it approximates human behavior at an inhuman scale and
| cannot be held accountable in any way. This upends the
| entire social structure upon which humans have relied to
| keep each other in-check since the advent of the modern
| concept of "justice" beginning with the Code of
| Hammurabi.
|
| In essence: Because you cannot punish, rehabilitate or
| extract recompense from a machine, it should not be
| allowed in any way to approximate a member of society.
|
| This logic does not apply to machines that "automate"
| labor, because those machines do not approximate human
| communication - they do not pretend to be us.
| satvikpendem wrote:
| Your argument can be applied to the printing press or the
| automatic loom, and before you say that AI is much more
| at scale, I do not think that it is any more at scale
| than producing billions of books and garments cheaply. If
| you instead say that AI is more autonomous than the prior
| which require human functionality, I will remind you that
| no AI today (and likely into the future) produces outputs
| autonomously with no human input (and indeed, many humans
| tweak those outputs further, making it more like photo
| editing than end-to-end solutions). Even if they could
| perfectly read your mind and output end-to-end, _you must
| first think_ for them to do what you desire.
|
| Should those machines then be subject to your same
| philosophies? I'd suspect you'd say "that's different"
| somehow but it is only because you are alive at this
| moment and these machines have been normalized to you
| that you do not care about them. Were you to be born in a
| few centuries, you would likely feel the same way most do
| about the prior machines, and indeed, you'd be hard
| pressed to find anyone who think that future generation's
| AI (probably simply called technology then) is
| problematic as you do today. Recency bias is one hell of
| a drug.
| satvikpendem wrote:
| Why does someone's work matter?
| sensanaty wrote:
| Why do you want the end result of the work if the work
| itself doesn't matter?
| satvikpendem wrote:
| I replied to the other comment.
| devmor wrote:
| If it didn't matter, you wouldn't want to take it.
| satvikpendem wrote:
| The word "work" is being overloaded here, their work as
| in output might matter but I am asking why they must work
| at all in the first place. If your answer is because they
| must procure money to survive, that is an economic
| failure, not one of AI. Jobs are simply a roundabout way
| of distributing money for output to be produced, if an AI
| can produce the output, the job need not exist. This is
| the same argument that has been used for centuries as
| automation advances in every field, but suddenly, when it
| comes for _my_ white collar high tech industry? It 's an
| outrage.
|
| Even then, their work as output can matter but that
| doesn't necessarily mean they (should) have a _per se_
| right to their work without other people also using it,
| especially in cases where their work is not used as
| outputs directly, which is what plagiarism is. If that
| were the case, no one could learn from a other 's work,
| regardless of whether that one is a person or a computer.
| devmor wrote:
| Remember, we are discussing art here, not white collar
| tech jobs. AI coming for _my_ job would be unpleasant and
| devastating, but that, like you said, is an economic
| problem. That I agree on.
|
| I don't think there is a way to continue this particular
| branch of this argument without devolving into a debate
| on the value of human life like a couple of Macedonian
| philosophers - suffice to say, my point of view is that
| the work of others has intrinsic value tied to intent,
| and machines do not have intent.
|
| If no output of humans has intrinsic value, then once
| machines can approximate humans sufficiently there is no
| reason for humans to exist - and that is an outcome that
| I, as a human, reject with all of my being.
| satvikpendem wrote:
| Output of humans has value _to humans;_ art does not have
| value to beings outside of humans, of course. That does
| not mean that one cannot use a machine to create new
| outputs, and it doesn 't mean that those will or will not
| have value, as again, value is subjective to the (human)
| beholder. We see this already with people praising AI
| art. Therefore, I do not believe that intent matters in
| the slightest as long as people deem something valuable.
|
| The reason for humans existing is not because of the
| output they produce (indeed, that is dystopic), humans
| have worth inherently, regardless of what they output.
| This is also what nihilists have figured out, so maybe
| that is something you should look into if you seriously
| have such an opinion as expressed in your last paragraph.
| CuriouslyC wrote:
| I sure wish I could non-consent to people observing me in
| the world, I'd like to move through society invisibly and
| only show myself when it benefitted me. Unfortunately, the
| only answer is to stay inside if I don't want people to see
| me.
| vkou wrote:
| > I sure wish I could non-consent to people observing me
| in the world,
|
| You aren't allowed to use photos _featuring_ a non-
| consenting person to, for instance promote a product.
|
| You are allowed to use photos _including_ a non-
| consenting person.
|
| There's a lot of complicated law, differing between
| different jurisdictions to cover this question, and to
| balance the needs of the public with commercial desires.
| It's not as simple as you make it sound, and there's no
| reason we should just default to bending over backwards
| for commercial interests.
|
| Laws exist to serve society, not the other way around.
| CuriouslyC wrote:
| I'm sure that the people who are being constantly
| victimized by paparazzi would like to know those rules
| that you just quoted, and have them be enforced.
| vkou wrote:
| If you had done a little research into this question,
| you'd realize that 1A use cases ('journalism') are
| treated by law quite differently than use of likeness for
| commercial intent.
|
| This is my whole point. There isn't a single, one-size-
| fits-all rule that a five year old can comprehend that
| describes any particular country's legal framework around
| the many, many different dimensions of tension between
| public and private interests on this incredibly broad
| question.
|
| And none of the existing frameworks fit the new use cases
| well, and we should probably have an open political
| debate about what we want to do going forward.
| CuriouslyC wrote:
| I'll happily take your picture against your will and put
| it on the internet with the tag "vkou mad at
| photographer, news at 11"
| vkou wrote:
| Okay? What will that prove? That you can be an ass?
|
| Being an ass is generally not illegal. _Particular_
| behaviours might be, but no legal or social system
| intends to censure you for every possible one, and most
| people who are experts in law or ethics don 't believe
| that they should.
|
| If you identify particular problems with the particular
| paparazzi laws in your country, that's an interesting
| conversation, and maybe, if framed well, an interesting
| data point for this discussion, but is not in itself the
| 'last word' on it. Just because you can torture an
| analogy, doesn't mean the analogy has a lot of power.
| sweeter wrote:
| > consent Careful... A lot of people online have
| selective understanding when it comes to this concept.
| It's selfishness and self-centredness taken to it's
| extreme, and not seeing other people as humans, but as
| tools for their consumption to be used and tossed aside
| for pleasure or for profit. It's one of the most
| disgusting things I've layed eyes on.
| devmor wrote:
| We are not discussing people observing people. We are
| discussing programs observing people.
| CuriouslyC wrote:
| Seems like a meaningless distinction in the face of a
| government that defines giving money as speech.
| immibis wrote:
| Note that in Europe (broadly speaking), this is a right
| people have.
| ADeerAppeared wrote:
| > Someone is likely to design an LLM that is specifically
| trained to do exactly that.
|
| Perplexity AI.
| chimeracoder wrote:
| > Perplexity AI.
|
| How does this describe Perplexity AI more than any other
| LLM?
| ADeerAppeared wrote:
| I am referring to their service rather than their LLM in
| specific.
|
| Perplexity is in the business of using an LLM to
| paraphrase existing content, then serving that up as
| their own "work" in a way that directly harms the
| original content they took.
|
| It's not even a question of "Is AI training copyright
| infringement", they're just doing copyright infringement
| with AI. And it's horribly common already.
| mcmcmc wrote:
| They plagiarize and blame it on the third party service
| they use for web scraping
|
| https://www.theverge.com/2024/6/27/24187405/perplexity-
| ai-tw...
| epolanski wrote:
| I feel like what really matters is who has more money to
| throw in tribunals.
|
| Somehow I feel if it was "Adobe vs dev that claims his code
| was spit by copilot" it would not end the same.
| crote wrote:
| I was mainly inspired by this section:
|
| > Specifically, the judge cited the study's observation that
| Copilot reportedly "rarely emits memorized code in benign
| situations, and most memorization occurs only when the model
| has been prompted with long code excerpts that are very
| similar to the training data."
|
| That almost sounds like it'd be fine to train an "art
| transformation model" which takes an image and transforms it,
| which for all the frames of a specific Disney movie _just so
| happen_ to output the very next frame...
| saint_fiasco wrote:
| That sounds like the opposite from the quote. The art
| transformation model you propose WOULD emit memorized art
| in benign situations, so in that judge's opinion it WOULD
| count as plagiarism.
| TomatoCo wrote:
| With how modern video codecs use data from previous frames
| you could make a not-entirely-specious argument that we
| already have a tool that can do this and it's called
| ffmpeg.
| devmor wrote:
| Following existing law and applying reasonable expectations, I
| would point to the old adage "intent is 9/10ths of the law".
|
| It would probably be legal to do this, as long as no one could
| reasonably show that you intentionally trained the LLM on said
| leaked source code with the intent to reproduce the product.
|
| Of course, civil suits could be another matter entirely. If you
| pick a product to rip off that's owned by a multi-billion
| dollar company, all that can save you is the ethical limits of
| their legal team's consciences.
| spencerflem wrote:
| Not unless a big company is the one doing it lol
| pennomi wrote:
| Or even those AI-powered decompilers people are working on...
| you could clone virtually any software with that. Surely there
| will be limitations.
| beeboobaa3 wrote:
| The limitation is the amount of money & political power the
| owner of the software you're cloning has.
| wongarsu wrote:
| The source code of Windows XP is widely available. Same with
| a ~2 year old version of Bing, Bing Maps, Cortana etc. Yet
| that doesn't seem to have had major negative effects on those
| products. If anything having the Windows source code
| available seems to be a net boon for Windows development.
| Sometimes looking at the source is just better if the
| documentation is unclear.
| userbinator wrote:
| MS probably hates that the source for XP/2K3 leaked because
| it means more people will put in effort to fix and
| extend/backport, even if it's not truly legal, when MS
| would rather coerce them into the latest most invasive and
| user-hostile version. Also because projects like NTVDMx64
| show how some of their decisions have been political
| instead of technical as they like to claim.
|
| Far less people care about Bing or Cortana.
| mr_toad wrote:
| If you compiled it and the resulting binary was substantially
| similar to the original you'd likely get sued.
| Legend2440 wrote:
| I mean you can legally do this by hand right now. That's how
| they cloned the IBM bios back in the day. IBM sued and lost.
| marcosdumay wrote:
| No, that's not.
|
| They cloned the bios by observing how it behaved and writing
| code that behaved the same way. Nobody even looked at the
| bios code.
| wvenable wrote:
| That's not how they did it. They had one team read the BIOS
| source listings in the IBM PC Technical Reference Manual
| and create a technical specification and a second team take
| that specification and write a new BIOS [1]. The second
| team never saw the original code so therefore they could
| not have copied it.
|
| To do something similar with AI, you really need to train
| one AI on the source code and then have it explain that
| code to a second AI that never saw the original code.
|
| [1] https://en.wikipedia.org/wiki/Phoenix_Technologies
| axus wrote:
| I thought there was a "clean room", where the people reading
| it and the people writing it were different; and they made a
| written specification instead of a Vulcan mind meld.
| jeroenhd wrote:
| A Wine fork built using an LLM trained on leaked Windows code
| might be pretty useful.
| witx wrote:
| You'd get a Wine full of ads, the need for an account to use
| and the not so occasional BSoD /s
| Spivak wrote:
| No, because judges aren't robots applying the law like code.
| Intent matters. If you do this it will be painfully obvious
| that your intent is to duplicate a large body copywritten code.
| yellowapple wrote:
| It's painfully obvious that the intent of GitHub Copilot is
| to duplicate a large body of copyrighted code.
| tpmoney wrote:
| It doesn't appear to be painfully obvious. Both because
| they're not losing court cases yet, and there's a huge
| swath of non copyrighted code being produced by co-pilot
| every day. By contrast the plaintiffs apparently were
| unable to induce Copilot to duplicate any parts of their
| code.
| Bognar wrote:
| Oh so that's why Copilot has a filter to prevent suggesting
| copyrighted code, because the intent is to duplicate
| copyrighted code. It all makes sense now.
| yellowapple wrote:
| That's exactly what I expect to happen with the source code to
| Microsoft's own software products, namely Windows.
|
| Hilarity will ensue :)
| mr_toad wrote:
| The misappropriation of the code (a trade secret) would likely
| be grounds for legal action against the people who stole it and
| the people who received it. A lot depends on jurisdiction.
|
| But if it was made public and then if an unrelated third party
| were to re-write the code in such a way that it was non-
| infringing, then it would be non-infringing. That's just a
| tautology.
| throwaway562if1 wrote:
| Let's be honest: It will be legal if you're a $3 trillion
| company, and not if you're not.
| stale2002 wrote:
| By definition you are allowed to take leaked source coded and
| change it enough such that it avoids infringement, and this
| will avoid infringement.
|
| The LLM has nothing to do with it, and isn't required here.
| elzbardico wrote:
| Yeah. In a ideal world where an open source developer gets
| equal treatment from the law facing a giant corporation with
| hordes of very expensive lawyers and "technical experts".
| daedrdev wrote:
| > The anonymous programmers have repeatedly insisted Copilot
| could, and would, generate code identical to what they had
| written themselves, which is a key pillar of their lawsuit since
| there is an identicality requirement for their DMCA claim.
| However, Judge Tigar earlier ruled the plaintiffs hadn't actually
| demonstrated instances of this happening, which prompted a
| dismissal of the claim with a chance to amend it.
|
| It sounds fair from how the article describes it
| whimsicalism wrote:
| Huh. There have definitely been well publicized examples of
| this happening, like the quake inverse square root
| polishTar wrote:
| Fast inverse square root is now part of the public domain.
|
| Also, even if this weren't the case you can't sue for damages
| to other people (they'd need to bring their own suit)
| anonymoushn wrote:
| Is the particular implementation that the model spits out
| 70+ years old?
| immibis wrote:
| Has it really already been 70 years since John Carmack
| died?
| polishTar wrote:
| Ah, you're right. I was wrong to say "public domain".
|
| It would be more correct to say Quake III Arena was
| released to the public as free software under the GPLv2
| license.
| KnightHawk3 wrote:
| There is a large gap between public domain and GPL. For
| starters if Copilot is emitting GPL code for closed
| source projects... that's copyright infringement.
| FireBeyond wrote:
| That would be _license_ infringement, not copyright
| infringement.
| immibis wrote:
| Copyright infringement is emitting the code. The license
| gives you permission to emit the code, under certain
| conditions. If you don't meet the conditions, it's still
| copyright infringement like before.
| voxic11 wrote:
| You can't copyright a mathematical operation. Only a
| particular implementation of it, and even then it may not be
| copyrightable if its a straightforward and obvious
| implementation.
|
| That said the implementation doesn't appear to be totally
| trivial and copilot apparently even copies the comments which
| are almost certainly copyrightable in themselves.
|
| https://x.com/StefanKarpinski/status/1410971061181681674
| https://github.com/id-Software/Quake-III-
| Arena/blob/dbe4ddb1...
|
| However a twitter post on its own isn't evidence a court will
| accept. You would need the original poster to testify that
| what is seen in the post is actually what he got from copilot
| and not just a meme or joke that he made.
|
| Also the plaintiffs in this case don't include id-Software
| and there is some evidence that id-Software actually stole
| the fast inverse sqrt code from 3dfx so they might not want
| to bring a claim here anyways.
| beeboobaa3 wrote:
| https://en.wikipedia.org/wiki/Illegal_number
| whimsicalism wrote:
| Not sure where you thought I said you could copyright a
| mathematical operation, I was clearly referring to the
| implementation due to the mention of "quake".
|
| When it was reported, I was able to reproduce it myself.
| TechDebtDevin wrote:
| Weren't people getting it to spit out valid windows keys
| also?
| pas wrote:
| GPT4 regurgitated almost full NYT articles verbatim. It's
| strange that this lawsuit seems to be so amateurish that
| they failed to properly demonstrate the reproduction.
| Though of course it might require a lot of legal
| technicalities that we naively think are trivial but they
| might be not.
| Kim_Bruning wrote:
| I read that case.
|
| Absolutely there were a few outliers where a judge might
| want to look more closely. I'd be surprised if -under
| scrutiny- there wouldn't be any issues whatsoever that
| OpenAI overlooked.
|
| However, it seemed to me that over half of the NYT
| complaints were examples of using the -then rather new-
| ChatGPT web browsing feature to browse their own website.
| In the case, they then claimed surprise when it did just
| what you'd expect a web browsing feature to do.
| voidfunc wrote:
| Its even simpler, iD is owned by ZeniMax. ZeniMax is owned
| by Microsoft.. who would they even sue?
| nvr219 wrote:
| "Trust no one... even yourself"
| naikrovek wrote:
| That's not how that works.
|
| All the plaintiffs would need to do is provide evidence
| that copywritten code was produced verbatim. This
| includes showing the copyrighted code on GitHub, showing
| copilot reproducing the code (including how you
| manipulated copilot to do it), showing that they match,
| and showing that the setting to turn off reproduction of
| public code is set.
|
| It makes no difference who owns the copyrighted code, it
| need only be shown that copilot is violating copyright.
| Microsoft can't say "uhh that doesn't count" or whatever
| simply because they own a company that owns a company
| that owns copyright on the code.
| sulandor wrote:
| > You can't copyright a mathematical operation.
|
| i agree from a philosophical pov, but this is clearly not
| the case in law.
|
| https://en.wikipedia.org/wiki/Illegal_number
| williamcotton wrote:
| _The second step is to remove from consideration aspects
| of the program which are not legally protectable by
| copyright. The analysis is done at each level of
| abstraction identified in the previous step. The court
| identifies three factors to consider during this step:
| elements dictated by efficiency, elements dictated by
| external factors, and elements taken from the public
| domain._
|
| https://en.wikipedia.org/wiki/Abstraction-Filtration-
| Compari...
| banish-m4 wrote:
| Algorithms can and are definitely patented in utility
| patents in the US.
| wongarsu wrote:
| It reads like the judge required them to show it happened to
| their code, not to any code in general. That's a much higher
| bar. There are thousands of instances of fast inverse square
| root in the training data but only one copy of your random
| github repositories. Getting to model to reproduce your code
| verbatim might be possible for all we know, but it isn't
| trivial.
| whimsicalism wrote:
| of course for standing. but it seems like with the right
| plaintiffs this could have gone forward
| Dylan16807 wrote:
| If it only copies code that has been widely stolen
| already then that's a lot weaker of a case and is
| something they can do a lot to prevent on a technical
| level.
| brookst wrote:
| But that's like saying my lawsuit alleging Taylor Swift
| copied my song could have gone forward with a plaintiff
| who had, years ago, written a song similar to what Ms.
| Swift recorded recently. That"s true, but perhaps the
| lesson here is that damages that hinge on statistically
| rare victims should not extrapolated out to provide
| windfalls for people who have not been harmed.
| whimsicalism wrote:
| i think that is a weak analogy and also unnecessary bc it
| is already clear what i am saying
| Suppafly wrote:
| >It reads like the judge required them to show it happened
| to their code, not to any code in general.
|
| Rightly so, you have to show some sort of damage to sue
| someone, not just theoretical damages.
| sleepybrett wrote:
| It could be forced, of course. I can republish my
| copyrighted code millions of times all over the internet.
| Next time they retrain there is a good chance my code will
| end up in their corpus, maybe many many times, reinforcing
| it statistically.
| daedrdev wrote:
| The article mentions that GitHub copilot has been trained to
| avoid directly copying specific cases it knows, and that
| although you can get it to spit out copyright code by
| prefixing the copyrighted code as a starting point, in normal
| us cases its quite rare.
| dathinab wrote:
| yes, but you need to show that it happened _in your case_,
| not that it can happen in general.
| ADeerAppeared wrote:
| Where it gets ethnically dubious is that:
|
| 1. The copilot team rushed to slap a copyright filter on top to
| keep these verbatim examples from showing up, and now claims
| they never happen.
|
| 2. LLMs are prone to paraphrasing. Just because you filter out
| verbatim copies doesn't mean there isn't still copyright
| infringement/plagiarism/whatever you want to call it. The
| copyright filter is only a legal protection, not a practical
| protection against the issue of copyright infringement.
|
| Everyone who knows how these systems work understand this. The
| copilot FAQ to this day claims that you should run copyright
| scanning tools on your codebase because your developers might
| "copy code from an online source or library".
|
| Github has it's own research from 2021 showing that these tools
| do indeed copy their training data occasionally:
| https://github.blog/2021-06-30-github-copilot-research-recit...
|
| They clearly know the problem is real. Their own research
| agreed, their FAQs and legal documents are carefully phrased to
| avoid admitting it. But rather than owning up to the problem,
| it's "Ner ner ner ner ner, you can't prove it to a boomer
| judge".
| squarefoot wrote:
| > 1.
|
| Isn't that akin to destruction of evidence?
| ADeerAppeared wrote:
| Legally? No.
|
| In spirit? ... Probably?
|
| Unlike most LLMs, Github copilot can trivially solve their
| copyright problem by just using only code they have the
| right to reproduce.
|
| They have a giant corpus of code tagged with license,
| SELECT BY license MIT/Equivalent and you're done, problem
| solved because those licenses explicitly grant permission
| for this kind of reuse.
|
| (It's still not very cash money to take open source work
| for commercial gain without paying the original authors,
| and there's a humorous question if MIT-copilot would need
| to come with a multi-gigabyte attribution file, but
| everyone widely agrees it's legal and permitted.)
|
| The only reason you'd hack a filter on top rather than
| doing the above is if you'd want to hide the copyright
| problem. It's an objectively worse solution.
| Spivak wrote:
| > Unlike most LLMs, Github copilot can trivially solve
| their copyright problem by just using only code they have
| the right to reproduce.
|
| Absolutely not trivial, in fact completely impossible by
| computer alone. You can't determine if you have the right
| to reproduce a piece of code just by looking at the code
| and tags themselves. *Taps the color-of-your-bits sign.*
|
| * I can fork a GPL project on Github and replace the
| license file with MIT. Okay to reproduce?
|
| * If I license my project as MIT but it includes code I
| copied inappropriately and don't have the right to
| reproduce myself, can Github? (No) This one is why
| indemnity clauses exist on contracted works.
|
| * I create a git repo for work and select the MIT license
| but I don't actually own the copyright on that code and
| so that license is worthless.
| gkbrk wrote:
| There is no difference when it comes to MIT and GPL here.
| If your model outputs my MIT licensed code, you still
| need to provide attribution in the form of a copyright
| notice as required by the MIT license.
| sleepybrett wrote:
| Have the copyleft people, or anyone else, produced some
| boilerplate licenses that explicitly deny use in training
| models?
| abigail95 wrote:
| Not in any way I'm aware of - and would be required if they
| were served a DMCA notification/Cease and Desist against a
| specific prompt.
|
| The people that think Copilot is infringng their copyright
| would be happy with that I would think? Unless they take a
| much stricter definition of fair use than current courts
| do.
| tpmoney wrote:
| No more so than scanner/printer manufacturers adding tech
| to prevent you from scanning and printing currency is
| destruction of evidence that they are in fact producing
| illegal machines for counterfeiting.
| bawolff wrote:
| I would think it is pretty obviously not.
|
| Is taking away a drunk driver's keys (before they get in
| the car) destruction of the evidence of their drunk
| driving?
| squarefoot wrote:
| This is not what I meant. By placing a copyright filter
| _and_ claiming it never happened (please read the line I
| was replying to) before the system can be audited, they
| 're indeed taking away the drunk driver's keys, which is
| a good thing, but also removing the offending car before
| Police arrives.
| bawolff wrote:
| In this metaphor, removing the car of someone who was
| going to drink and drive but didn't, is certainly not a
| crime. Presumably though you mean removing the car after
| drunk driving actually took place - which might be, but
| probably depends a lot on if the person knew, and what
| the intent of the action was.
|
| In the current case - its unclear if any crime took place
| at all, it seems clear that the primary intent was to
| prevent future crime not hide evidence of past ones. Most
| importantly the past version of the app is not destroyed
| (presumably). Github still has the version of the
| software without the copyright filter. If relavent and
| appropriate, the court could order them to produce the
| original version. It can't be destroying evidence if the
| evidence was not destroyed.
| squarefoot wrote:
| Yes, sorta. We're talking about software, therefore a
| piece of code that does something programmatically isn't
| like the drunk driver in a car that may cause more
| accidents, and although we aren't sure about that we
| prevent him/her to drive anyway just to be safe. The
| software would most certainly repeat its routine because
| it has be written to do so, that's why I wondered about
| destruction of evidence; by removing/modifying it, or
| placing filters, they would prevent it from repeating the
| wrongdoing, but also take away any means of auditing the
| software to find what happened and why.
| bawolff wrote:
| > The copilot team rushed to slap a copyright filter on top
| to keep these verbatim examples from showing up, and now
| claims they never happen.
|
| Well if the copyright filter is working they indeed aren't
| happening. Putting in safe gaurds to prevent something from
| happening doesn't mean you're guilty of it. Putting a railing
| on a balcony doesn't imply the balcony with railing is
| unsafe.
|
| > LLMs are prone to paraphrasing. Just because you filter out
| verbatim copies doesn't mean there isn't still copyright
| infringement/plagiarism/whatever you want to call it
|
| Copyright infringement and plagerism are different things.
| Stuff can be copyright infringement without being plagerized,
| and can be plagerized without being copyright infringement.
| The two concepts are similar but should not be conflated,
| especially in a legal context.
|
| Courts decide based on laws, not on gut feeling about what is
| "fair".
|
| > They clearly know the problem is real
|
| They know the risk is real. That is not the same thing as
| saying that they actually comitted copyright infringement.
|
| A risk of something happening is not the same as actually
| doing the thing.
|
| > "Ner ner ner ner ner, you can't prove it to a boomer
| judge".
|
| Its always a cop-out to assume that they lost the argument
| because the judge didn't understand. I suspect the judge
| understood just fine but the law and the evidence simply
| wasn't on their side.
| FireBeyond wrote:
| > Well if the copyright filter is working they indeed
| aren't happening. Putting in safe gaurds to prevent
| something from happening doesn't mean you're guilty of it.
| Putting a railing on a balcony doesn't imply the balcony
| with railing is unsafe.
|
| Doesn't mean you weren't, at some point, guilty of it,
| either. It doesn't retcon things.
| Dylan16807 wrote:
| Yeah but I think the main concern in this situation is
| copilot moving forward, not their past mistakes.
| bawolff wrote:
| Sure, which is why we require evidence of wrong doing.
| Otherwise its just a witch hunt.
|
| After all, you yourself probably cannot prove that you
| didn't commit the same offense at some point in time in
| the past. Like Russel's teapot, its almost always
| impossible to disprove something like that.
| nl wrote:
| > Just because you filter out verbatim copies doesn't mean
| there isn't still copyright infringement/plagiarism/whatever
| you want to call it.
|
| Actually, it does. The production of the output is what
| matters here.
| kelnos wrote:
| If you copy someone else's copyrighted work and then
| rearrange a few lines and rename a few things, you're
| probably still infringing.
| Spivak wrote:
| For a book or a song, for sure, although that isn't
| really punished. Search the drama surrounding a popular
| YA author in the 10's, Cassandra Claire. For code since
| you can only copy the form and not the function that
| might actually be enough.
|
| People do clean room implementations because of paranoia,
| not because it's actually a necessary requirement.
| Retric wrote:
| Moving a few things around means your internal process
| already had copywrite infringement.
| Spivak wrote:
| Probably not. Copyright infringement in the manner we're
| talking about presumes you already have license to access
| the code (like how Github does). What you don't have
| license to do is _distribute_ the code -- entirely or not
| without meeting certain conditions. You 're perfectly
| free to do whatever naughty things you want with the
| code, sans run it, in private.
|
| The literal act of making modifications isn't
| infringement until you distribute those modifications --
| and we're talking about a situation where you've changed
| the code enough that it isn't considered a derivative
| work anymore (apparently) so that's kosher.
| Retric wrote:
| First the case would be dismissed if Copilot had
| permission to make copies. Clearly they didn't. Copyright
| cares about copies, for profit distribution just makes
| this worse.
|
| > you already have license to access the code
|
| This isn't access, that occurs before the AI is trained.
| It's access > make copy for training > AI does lossy
| compression > request unzips that compression making a
| new copy > process fuzzes the copy so it's not so obvious
| > derivative work sent to users.
| warkdarrior wrote:
| Clearly Copilot had permission to make (unmodified)
| copies, the same way Github's webserver had permission to
| make (unmodified) copies. The lawsuit is about making
| partial copies without attribution.
| Retric wrote:
| GitHub's web server is not the same thing as Copilot and
| needs separate permission.
|
| GitHub didn't just copy open source code they copped
| _everything_ without respect to license. As such
| attribution which may have allowed some copying isn't
| generally relevant.
|
| Really a public repo on GitHub doesn't even mean the
| person uploading it owns the code, if they needed to
| verify ownership before training they couldn't have
| started. Thus by necessity they must take the stance that
| copyright is irrelevant.
| dspillett wrote:
| _> The copilot team rushed to slap a copyright filter on top
| to keep these verbatim examples from showing up, and now
| claims they never happen._
|
| More than that: the fact that they claimed it wasn't possible
| _before_ adding the filter, to filter out the thing that said
| wasn 't possible. This doesn't help me trust anything else
| they might say or have already said.
|
| My take on that was always: if it isn't possible, then why
| are MS not training the AIs on their internal code (like that
| for Office, in the case of MS with their copilot product) as
| well as public code? There must be good examples for it to
| learn from in there, unless of course they thing public code
| is massively better than their internal works.
| klabb3 wrote:
| This is so stupid. Going after likeness is doomed to fail
| against constantly mutating enemies like booming tech companies
| with infinite resources. And likeness itself isn't even that
| big of a deal, and even if you win it's such a minor case-by-
| case event that puts an enormous burden of proof on the victims
| to even get started. If the narrative centers around likeness,
| they've already won.
|
| The main issue, as I see it, is that they took copyrighted
| material and made new commercial products without compensating
| (let alone acquiring permission from) the rights holders, ie
| their suppliers. Specifically, they sneaked a fair use sticker
| on mass AI training, with neither precedent nor a ruling
| anywhere. Fair use originates in times before there were even
| computers. (Imo it's as outrageous as applying a free-mushroom-
| picking-on-non-cultivated-land law to justify industrial scale
| farming on private land.) That's what should be challenged.
| mvdtnz wrote:
| What were the plaintiffs even thinking when they submitted a
| claim based on identicality without being able to produce a
| single instance of copilot generating a verbatim copy. Even the
| research they submitted was unable to make a claim any stronger
| than "it's possibly in theory but we've never seen it".
| AmericanChopper wrote:
| A lot of people post AI outrage comments on HN that are clearly
| based on a rather poor understanding of the law and legal
| processes. This entire case and all of the plaintiffs
| statements about it reads like one of those comments.
| loceng wrote:
| This kind of argument makes me feel like it also supports the
| abolition of patents: eventually multiple other people will come
| up with the same obvious solution, which becomes obvious once a
| person spends enough time looking at a problem.
| CodeWriter23 wrote:
| The Patent System is not intended to be a test of exclusive
| original thought.
|
| The function of the Patent System is to incentivize search for
| solutions by temporarily securing exclusive right to market
| novel devices and processes for the discoverer.
| loceng wrote:
| Of non-obvious inventions. My argument being all inventions
| are obvious once attention is applied to that area and scope.
| CodeWriter23 wrote:
| Requiring attention IMO takes something out of the realm of
| "obvious". And the standard is "novel".
| loceng wrote:
| Everything in the future is novel, so that's a moot
| qualifier.
|
| Everything requires attention to be seen, once somethign
| becomes "obvious" is fully determined where you're
| looking and the scope you're zoomed in on.
|
| E.g. "matter is solid" until you zoom in and realize
| matter is mostly made up of space.
| CodeWriter23 wrote:
| Moot in your opinion. The idea is to bring the future
| more expediently by providing temporary incentive to
| pioneers reaching into the future.
| loceng wrote:
| You just proved my point with your second sentence - that
| everything in the future will come.
|
| And bringing things more expediently is the actual
| opinion here, unsupported, where arguably it actually
| slows down not only progress but the value of that
| progress not being as widely distributed as it otherwise
| would be.
| CodeWriter23 wrote:
| You continue to miss my point. Your point is a lazy, "the
| future will get here whenever it does" perspective. Mine
| is incentivizing discovery brings future innovations
| sooner.
| erik_seaberg wrote:
| Unfortunately USPTO takes "non-obvious" to mean that it wasn't
| already suggested by combining patents or other written work,
| so if you are the first to work a problem you can claim easy
| solutions that anyone with a clue would have quickly reached.
| Land rushes to fence off new fields seem inevitable.
| pledess wrote:
| I thought "the Copilot coding assistant was trained on open
| source software hosted on GitHub and as such would suggest
| snippets from those public projects to other programmers without
| care for licenses" was explicitly allowed by the GitHub Terms of
| Service: https://docs.github.com/en/site-policy/github-
| terms/github-t... "If you set your pages and repositories to be
| viewed publicly, you grant each User of GitHub a nonexclusive,
| worldwide license to use, display, and perform Your Content
| through the GitHub Service." In other words, in addition to
| what's allowed by the LICENSE file in your repo, you are also
| separately licensing your code "to use ... through the GitHub
| Service" and this would (in my interpretation) include use by
| Copilot for training, and use by Copilot to deliver snippets to
| any other GitHub user.
| dmitrygr wrote:
| Lots of my code is on github (eg
| https://github.com/syuu1228/uARM), uploaded by others. I gave
| no license for its use in training. What now?
| zdragnar wrote:
| If the person didn't have your permission or permission from
| the license to agree to github's terms, then you sue the
| person who uploaded it to GitHub.
|
| You don't get to go after GitHub because you have no
| contractual relationship with them. At best, you can get an
| injunction forcing them to take it down, though getting them
| to un-train copilot may not be feasible. At best you'd get a
| small cash offer, since you're unlikely to be able to justify
| any damages in a suit.
| dredmorbius wrote:
| 17 USC SS504 says otherwise:
|
| _... the copyright owner may elect, at any time before
| final judgment is rendered, to recover, instead of actual
| damages and profits, an award of statutory damages for all
| infringements ... in a sum of not less than $750 or more
| than $30,000. ... in a case where the copyright owner
| sustains the burden of proving, and the court finds, that
| infringement was committed willfully, the court in its
| discretion may increase the award of statutory damages to a
| sum of not more than $150,000._
|
| <https://www.law.cornell.edu/uscode/text/17/504>
|
| The issue isn't contract. It's copyright infringement.
| 201984 wrote:
| So hypothetically, if a developer publishes GPL software on
| Codeberg, and someone uploads it to GitHub, could the
| original developer file takedowns against the Github copy?
|
| I'm curious if Github's ToS make uploading GPL software you
| don't own a copyright violation.
| votepaunchy wrote:
| No, because the GPL is already more permissive than the
| GitHub TOS.
| pton_xd wrote:
| > then you sue the person who uploaded it to GitHub.
|
| > You don't get to go after GitHub because you have no
| contractual relationship with them
|
| What makes you say that? If someone eg uploads my
| copyrighted work to YouTube, I file a DMCA notice with
| YouTube to stop distributing my work. If YT ignores the
| notice then I can pursue them with a lawsuit.
|
| How is this situation different?
| singleshot_ wrote:
| DMCA explicitly gives you a cause of action against the
| party who does not properly comply with your request. GP
| asserts that you lack a cause of action against GitHub
| before they fail to comply with DMCA but I'm not certain
| I agree.
| stefan_ wrote:
| DMCA is a narrow protection for operators of public
| websites like GitHub. I don't see what it has to do with
| GitHub taking the data submitted to it with dubious
| sourcing and developing their CoPilot whatever based on
| it. That has nothing to do with the privileges in DMCA.
| singleshot_ wrote:
| That's right. You have lost the thread of what we are
| talking about: causes of action based on privity vs those
| created by statute.
| simion314 wrote:
| That will work if I upload only my code, but there are many
| open source projects where there are more then one author and
| GithHub did not acquired the rights from all the authors, the
| uploader to GitHub might not even be the author too.
| Brian_K_White wrote:
| That just means github can display the code, and you can see
| the code, but that does not mean you can then profit from or
| redistribute (profit or no) the code without attribution.
|
| Amazon has the rights to publish a book, and you have the right
| to receive a copy of the book, but neither of those gives you
| the right to re-publish the book under your own name.
| rurcliped wrote:
| "use, display, and perform Your Content through the GitHub
| Service" might allow a wide range of uses on GitHub Pages
| websites, even if https://example.github.io is monetized
| (monetization is permitted by
| https://docs.github.com/en/site-policy/github-
| terms/github-t... in a few cases)
| purpleblue wrote:
| Can you insist or put instructions that AIs do not train on your
| code? If they train on your code but don't produce the exact same
| output, is there any protection you can have from that?
| archontes wrote:
| When are people going to get that this isn't a right folks
| have?
|
| If your code is readable, the public can learn from it.
|
| Copyright doesn't extend to function.
| ADeerAppeared wrote:
| People aren't going to get it, because you don't get them.
|
| People have the right to learn _non-copyrightable elements_
| from your code.
|
| The claim is that AI learns _copyrightable elements_.
| archontes wrote:
| The comment chain you are replying to includes a request to
| not train an AI on one's code.
|
| I agree it's certainly possible for AI to produce
| infringing output.
|
| Nevertheless, people don't have the right to enforce a
| limitation on training.
| warkdarrior wrote:
| And to give a concrete example, in my view it should be
| allowed to use any source code to train a model such that
| the model learns that code is bad or insecure or slow or
| otherwise undesirable. In other words, it should be
| allowed to train on anything as long as the model does
| NOT produce that training data verbatim.
| archontes wrote:
| Maybe you should update your view with 17 USC 106.
|
| https://www.law.cornell.edu/uscode/text/17/106
| LegionMammal978 wrote:
| What copyrightable elements of the original work persist
| in the model, if it is incapable of outputting them? I
| can derive a SHA-1 hash from a copyrighted image, and yet
| it would be absurd to call that a derivative work.
| carom wrote:
| The public is not learning from it. A person or corporation
| is creating a derivative work of it. Training a model is
| deriving a function from the training data. It is not "a
| human learning something by reading it".
| archontes wrote:
| It's an extreme stretch to say that the model weights are a
| derivative work of the training data given the legal
| definition of "derivative work".
| timeon wrote:
| It is processed data at the end of the day. And no it is
| not like human reading. You can't read whole Github.
| stale2002 wrote:
| That doesn't make it a derivative work.
|
| If I "process data" by doing a word count of a book, and
| then I publish the number of words in that book (not the
| words themself! Just a word count!) I haven't created a
| derivative work.
|
| Processing data isn't automatically infringement.
| verandaguy wrote:
| More thinking out loud than answering your question, but
| nightshade for code and other plain text formats would be cool.
| munificent wrote:
| _> Indeed, last year GitHub was said to have tuned its
| programming assistant to generate slight variations of ingested
| training code to prevent its output from being accused of being
| an exact copy of licensed software._
|
| If I, a human, were to:
|
| 1. Carefully read and memorize some copyrighted code.
|
| 2. Produce new code that is textually identical to that. But in
| the process of typing it up, I randomly mechanically tweak a few
| identifiers or something to produce code that has the exact same
| semantics but isn't character-wise identical.
|
| 3. Claim that as new original code without the original
| copyright.
|
| I assume that I would get my ass kicked legally speaking. That
| reads to me exactly like deliberate copyright infringement with
| willful obfuscation of my infringement.
|
| How is it any different when a machine does the same thing?
| singleshot_ wrote:
| The guy who owns the machine is really rich, while you are more
| or less (all due respect of course) not worth suing.
|
| That's why I think the opposite of what you claim is true: if
| you were to do this, absolutely nothing would happen. When they
| do it, they will get sued over and over until the law changes
| and they can't be sued, or they enter some mutually-beneficial
| relationship with the parties who keep suing.
| beeboobaa3 wrote:
| > if you were to do this, absolutely nothing would happen
|
| Read up on the DMCA and the impact it has on e.g. nintendo
| emulators and the developers thereof
| dmix wrote:
| Those emulators are very popular though to the point of
| potentially impacting another business's bottom line. Where
| an individual putting it out a small block of code isn't
| exactly going to attract expensive lawyers.
|
| I'm skeptical Github Copilot reproducing a couple functions
| potentially used by some random Github project is going to
| be a threat to another party's livelihood.
|
| When AI gets good enough to make full duplicates of apps
| I'd be more concerned about the source. Thousands of
| smaller pieces drawn from a million sources and being
| combined in novel ways is less worrying though.
| BadHumans wrote:
| There is no impact to a company's bottom line when you
| are emulating a product they do not sell.
| lcouturi wrote:
| Yuzu, the emulator that was sued by Nintendo, was
| emulating the Nintendo Switch, which is a product
| Nintendo does sell.
| BadHumans wrote:
| Yuzu is not the only emulator taken down by Nintendo and
| Nintendo is not the only company that has gone after
| emulators.
| lcouturi wrote:
| In that case, could you clarify what instances of this
| you're referring to?
|
| The death of Citra wasn't really a deliberate action on
| the part of Nintendo, it was collateral damage. Citra was
| started by Yuzu developers and as part of the settlement
| they were not able to continue working on it. Citra's
| development had long been for the most part taken over by
| different developers, but the Yuzu people were still
| hosting the online infrastructure and had ownership of
| the GitHub repository, so they took all of it down. Some
| of the people who were maintaining Citra before the
| lawsuit opened up a new repository, but development has
| slowed down considerably because the taking down of the
| original repository has caused an unfortunate splintering
| of the community into many different forks.
|
| There is some speculation Nintendo was involved with the
| death of the Nintendo 64 emulator UltraHLE a long time
| back, but this was never confirmed. If indeed they did go
| after UltraHLE, then this would just like Yuzu be a case
| of them taking down an emulator for a console they were
| still profiting from, as UltraHLE was released in 1999.
|
| The most famous example of companies going after
| emulators is Sony, which went after Connectix Virtual
| Game Station and Bleem!. Both were PS1 emulators released
| in 1999, a period during which Sony was still very much
| profiting from PS1 sales. Sony lost both lawsuits and
| hasn't gone after emulators since.
|
| In 2017, Atlus tried to take down the Patreon page for
| RPCS3, a PS3 emulator. However, Atlus only went after the
| Patreon page, not the emulator itself, which they did
| because of their use of Persona 5 screenshots on said
| page. The screenshots were simply taken down and the
| Patreon page was otherwise left alone. Of note is that
| Atlus is a game developer, so they were never profiting
| from PS3 sales. However, they were certainly still
| profiting from Persona 5 sales, which had only released
| in 2016.
|
| These are the only examples I can remember. Did I miss
| anything?
| fragmede wrote:
| the bnetd emulator, that let Diablo and StarCraft players
| not have to pay Blizzard for the privilege of buying the
| game, though that's a bit different.
| omegacharlie wrote:
| emulators for many nintendo consoles have been developed
| and released while the console was still sold and have
| been left alone as long as they had no direct links to
| piracy, recent events are a bit of a change.
|
| > There is some speculation Nintendo was involved with
| the death of the Nintendo 64 emulator UltraHLE a long
| time back, but this was never confirmed.
|
| iirc it got c&d but a case was never filed in court, the
| source code turned up eventually anyways.
| fragmede wrote:
| Yes there is. If I can emulate Super Mario Odyssey on my
| PC, I don't need to buy a Nintendo Switch. If it wasn't
| available there, I'd have to buy a Nintendo Switch to
| play it. That's a lost sale for Nintendo. You could argue
| that I wasn't going to buy a switch anyway, but then
| we're getting too into hypotheticals.
| ExoticPearTree wrote:
| This is the same reasoning the music and movie industries
| use when they go after people downloading music. And
| contrary to the popular opinion, I think it is wrong: if
| people want to pay, they will pay. Same for movies: if
| people would really want to pay for a movie, they would
| go to a cinema. Or stream it after a week or two. But
| there are also people who would jump through hoops than
| pay for music or movies. And that is not a lost sale
| because there was never an intention to buy something in
| the first place.
| singleshot_ wrote:
| I enjoy how you removed the "I think" qualifier which
| suggested that it's very possible that you're right.
|
| I'm quite well read on the DMCA but admit you probably know
| far more about how Nintendo wields it.
|
| Still, I suggest that it's a lot more likely that GitHub is
| going to get sued than you or GP.
|
| Finally, I believe using the legal system to bully
| independent software developers is, in legal terms, super
| lame. We are probably in the same side here.
| bawolff wrote:
| DMCA (at least the take down requests part) is not really
| suing someone and not really about making money. Its
| about getting certain works off the internet.
|
| You are probably more likely to be on the wrong end of a
| dmca take down request as a poor person since you dont
| have the resources to fight it, and its not about
| recovering damages just censorship.
| singleshot_ wrote:
| We are really losing the plot of what this thread is
| about here, but: DMCA takedown requests that are ignored
| or wheee the site does not comply with the process are
| subject to private civil action. Obviously, a takedown
| request is distinct from suing someone. And the way that
| the rights holder forces the site to remove the content
| is under threat of monetary penalties.
| beeboobaa3 wrote:
| Rules for thee but not for me (rich companies). Think of the
| shareholders!
| Analemma_ wrote:
| > How is it any different when a machine does the same thing?
|
| Because intent matters in the law. If you intended to reproduce
| copyrighted code verbatim but tried to hide your activity with
| a few tweaks, that's a very different thing from using a tool
| which _occasionally_ reproduces copyrighted code by accident
| but clearly was not designed for that purpose, and much more
| often than not outputs transformative works.
| archontes wrote:
| Not in copyright. The work speaks for itself, and the
| function of code is not a copyrightable aspect.
| bawolff wrote:
| The intent of the work can matter when determining if de
| minimis applies as well as fair use.
| olliej wrote:
| Um, the entire intent of these "AI" systems is explicitly to
| reproduce copyrighted work with mechanical changes to make it
| not appear to be a verbatim copy.
|
| That is the whole purpose and mechanism by which they
| operate.
|
| Also the intent does not matter under law - not intending to
| break the law is not a defense if you break the law. Not
| intending to take someone's property doesn't mean it becomes
| your property. You _might_ get less penalties and /or
| charges, due to intent (the obvious examples being murder vs
| manslaughter, etc).
|
| But here we have an entire ecosystem where the model is "scan
| copyrighted material" followed by "regurgitate that material
| with mechanical changes to fit the surrounding context and to
| appear to be 'new' content".
|
| Moreover given that this 'new' code is just a regurgitation
| of existing code with mutations to make it appear to fit the
| context and not directly identical to the existing code, then
| that 'new' code cannot be subject to copyright (you can't
| claim copyright to something you did not create, copyright
| does not protect output of mechanical or automatic
| transformations of other copyrighted content, and copyright
| does not protect the result of "natural processes", e.g 'I
| asked a statistical model to give me a statically plausible
| sequence of tokens and it did'). So in the best case scenario
| - the one where the copyright laundering as a service tool is
| not treated as just that, any code it produces is not
| protectable by copyright, and anyone can just copy "your
| work" without the license and (because you've said if you
| weren't intending to violate copyright it's ok) they can say
| they could not distinguish the non-copyright-protected work
| from the protected work and assumed that therefore none of it
| was subject to copyright. To be super sure though they
| weren't violating any of your copyrights, they then ran an
| "AI tool" to make the names better and better suit your
| style.
|
| I am so sick of these arguments where people spout nonsense
| about "AI" systems magically "understanding" or "knowing"
| anything - they are very expensive statistical models, the
| produce statistically plausible strings of text, by a
| combination of copying the text of others wholesale, and
| filling the remaining space with bullshit that for basic
| tasks is often correct enough, and for anything else is wrong
| - because again they're just producing plausible sequences of
| tokens and have no understanding of anything beyond that.
|
| To be very very very clear: if an AI system "understood"
| anything it was doing, it would not need to ingest
| essentially all the text that anyone has ever written, just
| to produce content that is at best only locally coherent, and
| that is frequently incorrect in more or less every domain to
| which it is applied. Take code completion (as in this case):
| Developers can write code without essentially reading all the
| code that has ever existed just so that they can write basic
| code, because developers understand code. Developers don't
| intermingle random unrelated and non-present variables or
| functions in their code as they write, because they
| understand what variables are and therefore they can't use
| non existent ones. "AI" on the other hand required more power
| than many countries to "learn" by reading as much as possible
| all code ever written, and then produce nonsense output for
| anything complex because they're still just generating a
| string of tokens that is plausible according to their
| statistical model - the result of these AIs is essentially
| binary: it has been in effect asked to produce code that does
| something that was in its training corpus and can be copied
| essentially verbatim, with a transformation path to make it
| fit, or it's not in the training corpus and you get random
| and generally incorrect code - hopefully wrong enough it
| fails to build, because they're also good at generating code
| that looks plausible but only fails at runtime because
| plausible sequence of tokens often overlaps with 'things a
| compiler will accept'.
| shkkmo wrote:
| > Also the intent does not matter under law - not intending
| to break the law is not a defense if you break the law
|
| Intent frequently matters a great deal when applying laws.
|
| In the specific area of copyright law, it doesn't itself
| make the use non infringing, but it can absolutely impact
| the damages or a fair use argument.
| Kim_Bruning wrote:
| I actually once tracked this claim down in the case of
| stable diffusion.
|
| I concluded that it was just completely impossible for a
| properly trained stable diffusion model to reproduce the
| works it was trained on.
|
| The SD model easily fits on a typical USB stick, and
| comfortably in the memory of a modern consumer GPU.
|
| The training corpus for SD is a pretty large chunk of image
| data on the internet. That absolutely does _not_ fit in GPU
| memory - by several orders of magnitude.
|
| No form of compression known to man would be able to get it
| that small. People smarter than me say it's mathematically
| not even possible.
|
| Now for closed models, you might be able to argue something
| else is going on and they're sneakily not training neural
| nets or something. But the open models we can inspect?
| Definitely not.
|
| Modern ML/AI models are doing Something Else. We can argue
| what that Something Else is, but it's not (normally)
| holding copies of all the things used to train them.
| anigbrowl wrote:
| It's equally plausible to say you don't intend to reproduce
| copyrighted code verbatim but occasionally do so given either
| a sufficiently specific prompt or because the reproduced code
| is so generic that it probably gets rewritten a hundred times
| a day because that's how people learned to do basic things
| from books or documentation or their education.
| munificent wrote:
| _> clearly was not designed for that purpose,_
|
| I'm not aware of evidence that support that claim. If I ask
| ChatGPT "Give me a recipe for squirrel lemon stew" and it so
| happens that one person did write a recipe for that exact
| thing on the Internet, then I would expect that the most
| accurate, truthful response would be that exact recipe.
| Anything else would essentially be hallucination.
| zmmmmm wrote:
| i think you are misconceiving then how LLMs work / what
| they are
|
| You can certainly try to hit a nail with a screw driver,
| but that doesn't make the screw driver a hammer.
| paulddraper wrote:
| Perfect analogy.
| munificent wrote:
| As I understand it, LLMs are intended to answer questions
| as "truthfully" as they can. Their understanding of truth
| comes from the corpus they are trained on. If you ask a
| question where the corpus happens to have something very
| close to that question and its answer, I would expect the
| LLM to burp up that answer. Anything less would be
| hallucination.
|
| Of course, if I ask a question that isn't as well served
| by the corpus, it has to do its best to interpolate an
| answer from what it knows.
|
| But ultimately its job is to extract information from a
| corpus and serve it up with as much semantic fidelity to
| the original corpus as possible. If I ask how many moons
| Earth has, it should say "one". If I ask it what the
| third line of Poe's "The Raven" is, it should say "While
| I nodded, nearly napping, suddenly there came a
| tapping,". Anything else is wrong.
|
| If you ask it a specific enough question where only a
| tiny corner of its corpus is relevant, I would expect it
| to end up either reproducing the possibly copyright piece
| of that corpus or, perhaps worse, cough up some bullshit
| because it's trying to avoid overfitting.
|
| (I'm ignoring for the moment LLM use cases like image
| synthesis where you _want_ it to hallucinate to be
| "creative".)
| kortilla wrote:
| They are all hallucinations. Calling lies hallucinations
| and truths normal output is nonsense.
| zmmmmm wrote:
| I get that's what you and a lot of people want it to be,
| but it isn't what they are. They are quite literally
| probabilistic text generation engines. Let's emphasise
| that: the output is produced randomly by sampling from
| distributions, or in simple terms, like rolling a dice.
| In a concrete sense it is non-deterministic. Even if an
| exact answer is in the corpus, its output is not going to
| be that answer, but the most probable answer from all the
| text in the corpus. If that one answer that exactly
| matches contradicts the weight of other less exact
| answers you won't see it.
|
| And you probably wouldn't want to - if I ask if donuts
| are radioactive and one person explicitly said that on
| the internet you probably aren't going to tell me you
| want it to spit out that answer just because it exactly
| matches what you asked. You want it to learn from the
| overwhelimg corpus of related knowledge that says donuts
| are food, people routinely eat them, etc etc and tell you
| they aren't radioactive.
| remuskaos wrote:
| Recipes are not copyrightable for that exact reason.
| sleepybrett wrote:
| Substitue recipe for literally any other piece of unique
| information.
| dmix wrote:
| That's a significant over simplification of how it works though
| to the point of almost not being a useful analogy.
|
| If your analogy was you were a human who memorized every
| variation of a problem (and every other known problem) and
| there was a tiny perctange of a chance where you reproduced
| that exact varation of one you memorized, but then added an
| after the fact filter so you don't directly reproduce it...
|
| It's more like musicians who basically copy a bunch of music
| patterns or chord progressions before then notice their final
| output sounds too similar to another song (which happens often
| IRL) then changes it to be more original before releasing it to
| the public.
| ADeerAppeared wrote:
| > If you analogy was you were a human who memorized every
| variation of a problem (and every other known problem)
|
| This is mere assumption. AI is _supposed to_ work like that,
| but that 's a goal, and not the result of current
| implementations. Research shows that they do memorize
| _solutions_ as well, and quite regularly so. (This is an
| unavoidable flaw in current LLMs; They must be capable of
| memorizing input verbatim in order to learn specific facts.)
|
| > and there was a tiny perctange of a chance where you
| reproduced that exact varation of one you memorized
|
| This is copyright infringement. _Actionable_ copyright
| infringement. The big music publishers go after this kind of
| accidental partial reproduction.
|
| > but then added an after the fact filter so you don't
| directly reproduce it...
|
| "Legally distinct" is a gimmick that only works where the
| copyright is on specific identifiable parts of a work.
|
| Changing a variable name does not make a code snippet
| "legally distinct", it's still copyright infringement.
| dmix wrote:
| Meh I still see that as a big oversimplification. Context
| matters. Even if the copyright courts often ignore that for
| wealthy entities. Someone reproducing a song using AI and
| publishing it as their own copyright infringement, a person
| specifically querying an AI engine, that sucked up billions
| of lines of information and generates what you ask it do
| with a sma probability it will reproduce a small subset of
| a larger commercial project and sends it to someone in a
| chatbox is not exactly the same IMO.
|
| This is Github Copilot after all. I use it daily and it
| autocompletes lines of code or generates functions you can
| find on stackoverflow. It's not letting giving you the
| source code to Twitter in full and letting you put it on
| the internet as a business under another name.
| belorn wrote:
| We are currently seeing the music industry reacting to AI
| learning a bunch of music patterns and chord progressions and
| outputting works that sounds very similar to existing music
| and artists. They are not liking it.
|
| To just see how much they disliked it, youtube copyright
| strikes is basically a trained AI to detect music patterns to
| identify sound with slight variations or copyrighted songs
| and take videos down. Generating slight variations was one of
| the early method that videos used to bypass the take down
| system.
| archontes wrote:
| You might not get your ass kicked. Copyright doesn't protect
| function, to the point where the court will assess the degree
| to which the style of the code can be separated from the
| function. In the even that they aren't separable, the code is
| not copyrightable.
|
| https://www.wardandsmith.com/articles/supreme-court-announce...
|
| https://easlerlaw.com/software-computer-code-copyrighted#:~:...
| ADeerAppeared wrote:
| The simple version is that code _is_ copyrightable as an
| _expression_. And the underlaying algorithm is _patentable_.
|
| The legal term you're looking for here is the "Abstraction-
| Filtration-Comparison" test; What remains if you subtract all
| the non-copyrightable elements from a given piece of code.
| adrian_b wrote:
| Algorithms have become patentable only very recently in the
| history of patents, without a rationale being ever provided
| for this change, and in some countries they have never
| become patentable.
|
| Even in the countries other than USA where algorithms have
| become patentable, that happened only due to USA
| blackmailing those countries into changing their laws "to
| protect (American) IP".
|
| It is true however that there exist some quite old patents
| which in fact have patented algorithms, but those were
| disguised as patents for some machines executing those
| algorithms, in order to satisfy the existing laws.
| tomxor wrote:
| US copyright does protect for "substantial similarity" [0].
| And at the other end of the spectrum, this has been abused in
| absurd ways to argue that substantially different code has
| infringed.
|
| In Zenimax vs Oculus they basically argued that a bunch of
| really abstract yet entirely generic parts of the code were
| shared, we are talking some nested for loops, certain
| combinations of if statements, and due to a lack of a
| qualitative understanding of code, syntax, common patterns,
| and what might actually qualify for substantively novel code
| in the courtroom, this was accepted as infringing. [1]
|
| Point is, the legal system is highly selective when it comes
| to corporate interests.
|
| [0] https://en.wikipedia.org/wiki/Substantial_similarity
|
| [1] https://arstechnica.com/gaming/2017/02/doom-co-creator-
| defen...
| talldayo wrote:
| > Point is, the legal system is highly selective when it
| comes to corporate interests.
|
| I don't even think it's that. In recent cases like Oracle
| v. Google and Corellium v. Apple, Fair Use prevailed with
| all sorts of conflicting corporate interests at play. The
| Zenimax v. Oculus case very much revolved around NDAs that
| Carmack had signed and not the propagation of trade
| secrets. Where IP is strictly the only thing being
| concerned, the literal interpretation of Fair Use does
| still seem to exist.
|
| Or for a more plain example, Authors Guild. v. Google where
| Google defended their indexing of thousands of copywritten
| books as Fair Use.
| tpmoney wrote:
| In fact, go to far as to argue your example of Authors
| Guild v. Google is a good indication that most cases will
| probably go an AI platform's way. It's a pretty parallel
| case to a number of the arguments. Indexing required
| ingesting whole works of copyright material verbatim. It
| utilized that ingested data to produce a new commercial
| work consisting of output derived from that data. If I
| remember the case correctly, google even displayed
| snippets when matching a search so the searcher could see
| the match in context, reproducing the works verbatim for
| those snippets and one could presume (though I don't
| recall if it was coded against), that with sufficiently
| clever search prompts, someone could get the index search
| to reproduce a substantial portion of a work.
|
| Arguably, the AI platforms have an even stronger case as
| their nominal goal is not to have their systems reproduce
| any part of the works verbatim.
| belorn wrote:
| A key finding that the judge said in the Authors Guild v.
| Google case was that the authors benefited from the tool
| that google created. A search tool is not a replacement
| for a book, and are much more likely to generate
| awareness of the book which in turn should increase sales
| for the author.
|
| AI platforms that replaces and directly compete with
| authors can not use the same argument. If anything, those
| suing AI platforms are more likely to bring up Authors
| Guild v. Google as a guiding case to determine when to
| apply fair use.
| jcranmer wrote:
| > In fact, go to far as to argue your example of Authors
| Guild v. Google is a good indication that most cases will
| probably go an AI platform's way.
|
| The more recent Warhol decision argues quite strongly in
| the opposite direction. It fronts market impact as the
| central factor in fair use analysis, explicitly saying
| that whether or not a use is transformative is in decent
| part dependent on the degree to which it replaces the
| original. So if you're writing a generative AI tool that
| will generate stock photos that it generated by scraping
| stock photo databases... I mean, the fair use analysis
| need consist of nothing more than that sentence to
| conclude that the use is totally not fair; none of the
| factors weigh in favor it.
| tpmoney wrote:
| I think that decision is much narrower than "market
| impact". It's specifically about substitution, and to
| that end, I don't see a good argument that Co-Pilot
| substitutes for any of the works it was trained on. No
| one is buying a license to co-pilot to replace buying a
| license to Photoshop, or GIMP, or Linux, or Tux Racer.
| Nor is Github selling co-pilot for that use.
|
| To the extent that a user of co-pilot could induce it to
| produce enough of a copyrighted work to both infringe on
| the content (remember that algorithms are not protected
| by copyright) and substitute for the original by
| licensing in lieu of, I would expect the courts to
| examine that in the ways it currently views a xerox
| machine being used to create copies of a book. While the
| machine might have enabled the infringement, it is the
| person using the machine to produce and then distribute
| copies that is doing the infringing not the xerox machine
| itself nor Xerox the company.
|
| Specifically in the opinion the court says:
|
| >If an original work and a secondary use share
|
| >the same or highly similar purposes, and the secondary
| use
|
| >is of a commercial nature, the first factor is likely to
|
| >weigh against fair use, absent some other justification
| for
|
| >copying.
|
| I find it difficult to come up with a good case that any
| given work used to train co-pilot and co-pilot itself
| share "the same or highly similar purposes". Even in the
| case of say someone having a code generator that was used
| in training of co-pilot, I think the courts would also be
| looking at the degree to which co-pilot is dependent on
| that program. I don't know off hand if there are any
| court cases challenging the use of copyright works in a
| large collage of work (like say a portrait of a person
| made from Time Magazine covers of portraits), but again
| my expectation here is that the court would find that
| while the entire work (that is the magazine cover) was
| used and reproduced, that reproduction is a tiny fraction
| of the secondary work and not substantial to its purpose.
|
| Similarly we have this line:
|
| >Whether the purpose and character of a use weighs in
| favor
|
| >of fair use is, instead, an objective inquiry into what
| use
|
| >was made, i.e., what the user does with the original
| work.
|
| Which I think supports my comparison to the xerox
| machine. If the plaintiffs against Co-Pilot could have
| shown that a substantial majority of users and uses of
| Co-Pilot was producing infringing works or producing
| works that substitute for the training material, they
| might prevail in an argument that co-pilot is infringing
| regardless if the intent of github. But I suspect even
| that hurdle would be pretty hard to clear.
| jcranmer wrote:
| Of the various recent uses of generative AI, Copilot is
| probably the one most likely to be found fair use and
| image generation the least likely.
|
| But in any case, Authors Guild is not the final word on
| the subject, and anyone trying to argue for (or against)
| fair use for generative AI who ignores Warhol is going to
| have a bad day in court. The way I see it, Authors Guild
| says that if you are thoughtful about how you design your
| product, and talk to your lawyers early and continuously
| about how to ensure your use is fair and will be seen as
| fair in the courts, you can indeed do a lot of copying
| and still be fair use.
| tpmoney wrote:
| I agree. Nothing is going to be the final word until more
| of these cases are heard. But I still don't think Warhol
| is as strong even against other uses of generative AI,
| and in fact I think in some ways argues in their favor.
| The court in Warhol specifically rejects the idea that
| the AWF usage is sufficiently transformed by the nature
| of the secondary work being recognizably a Warhol. I
| think that would work the other way around too, that a
| work being significantly in a given style is not
| sufficient for infringement. While certainly someone
| might buy a license to say, Stable Diffusion and attempt
| to generate a Warhol style image, someone might also buy
| some paints and a book of Warhol images to study and
| produce the same thing. Provided the produced images are
| not actually infringements or transformations of
| identifiably original Warhol works, even if they are in
| his style, I think there's a good argument to be made
| that the use and the tool are non-infringing.
|
| Or put differently, if the Warhol image had used
| Goldsmith's image as a reference for a silk screen
| portrait of Steve Tyler, I'm not sure the case would have
| gone the same way. Warhol's image is obviously and
| directly derived from Goldsmith's image and found
| infringing when licensed to magazines, yet if Warhol had
| instead gone out and taken black and white portraits of
| prince, even in Goldsmith's style after having seen it,
| would it have been infringing? I think the closest case
| we have to that would have been the suit between Huey
| Lewis and Ray Parker Jr. over "I Want a New
| Drug"/"Ghostbusters" but that was settled without a
| judgement.
|
| I do agree that Warhol is a stronger argument against
| artistic AI models, but it would very much have to depend
| on the specifics of the case. The AWF usage here was
| found to be infringing, with no judgement made of the
| creation and usage of the work in general, but
| specifically with regard to licensing the work to the
| magazine. They point out the opposite case that his
| Campbell paintings are well established as non-infringing
| in general, but that the use of them licensed as logos
| for soup makers might well be. So as is the issue with
| most lawsuits (and why I think AI models in general will
| win the day), the devil is in the details.
| wahern wrote:
| > US copyright does protect for "substantial similarity"
|
| Substantial similarity refers to three different legal
| analyses for comparing works. In each case what the
| analysis is attempting to achieve is different, but in no
| case does it operate to prohibit similarity, per se.
|
| The Wikipedia page points out two meanings. The first is a
| rule for establishing provenance. Copyright protects
| originality, not novelty. The difference is that if two
| people coincidentally create identical works, one after
| another, the second-in-time creator has not violated any
| right of the first. (Contrast with patents, which do
| protect novelty.) In this context, substantial similarity
| is a way to help establish a rebuttable presumption that
| the latter work is not original, but inspired by the
| former; it's a form of circumstantial evidence. Normally a
| defendant wouldn't admit outright they were knowingly
| inspired by another work, though they might admit this if
| their defense focuses on the second meaning, below. The
| plaintiff would also need to provide evidence of access or
| exposure to the earlier work to establish provenance;
| similarity alone isn't sufficient.
|
| The second meaning relates to the fact that a work is
| composed of multiple forms and layers of expression. Not
| all are copyrightable, and the aggregate of copyrightable
| elements needs to surpass a minimum threshold of content.
| Substantial similarity here means a plaintiff needs to
| establish that there are _enough_ _copyrightable_
| _elements_ in common. Two works might be near identical,
| but not be substantially similar if they look identical
| merely because they 're primarily composed of the same non-
| copyrightable expressions, regardless of provenance.
|
| There's a third meaning, IIRC, referring to a standard for
| showing similarity at the pleadings stage. This often
| involves a superficial analysis of apparent similarity
| between works, but it's just a procedural rule for shutting
| down spurious claims as quickly as possible.
| copywrong2 wrote:
| Copyright is abused often. Our modern version of copyright
| is BS and only benefits large corps who buy a lot of IP.
| whythre wrote:
| Yep. Now it is a legal cudgel wielded most effectively by
| corporate giants. It has mutated to become completely
| philosophically opposed to what it was expressly created
| to protect.
| torginus wrote:
| If I were to license a cover of a song for a music video, I'd
| have to license both the original song and the cover itself.
|
| I'd say this is extremely relevant in this case.
| bryanrasmussen wrote:
| if that is the case why do people ever license covers?
|
| to clarify - I thought you just had to negotiate with the
| cover artist about rights and pay a nominal fee for usage
| of the song for cover purposes - that is to say you do not
| negotiate with the original artist, you negotiate with a
| cover artist and the whole process is cheaper?
| seanhunter wrote:
| You're maybe thinking about this in a way that's not
| helping you to understand the system and why it works the
| way it does. It's very clear when you think of a specific
| case.
|
| Say you want to make a recording of "Valerie" by the
| Zutons. You need permission (a license) from the
| songwriters (the Zutons presumably) to do this. You
| usually get this permission by paying a fee. Having done
| that, you can do your recording. Whenever that recording
| is played (or used) you will get a performance royalty
| and they will get a songwriting royalty.
|
| Say you want to use a cover of "Valerie" by the Zutons in
| your film or whatever. Say the Mark Ronson version
| featuring Amy Winehouse. You need permission (a license)
| from the person who produced that version (Mark Ronson or
| his company) and will need to pay them a fee, some of
| which goes to the songwriter as part of their deal with
| Mark Ronson which gave him the license to produce his
| cover in the first place.
|
| The Zutons don't have the right to sell you a license to
| Mark Ronson's version so if that's the version you want
| you have to negotiate with him. Likewise he doesn't have
| the right to sell you a license like the license he has
| (ie a license to do a recording/performance) so if you
| want that you have to negotiate with them.
| seadan83 wrote:
| Cover songs have a special abd explicit law covering them.
| Not relevant.
| giamma wrote:
| Software like Blackduck or Scanoss is designed to identify
| exactly that type of behaviour. It is used very often to scan
| closed source software and to check whether it contains
| snippets that are copied from open source with incompatible
| licenses (e.g. GPL).
|
| To be able to do so, these softwares build a syntax tree of
| what your code snippet is, and compare the tree structure
| with similar trees in open source software without being
| fooled by variable names. To speed up the search, they also
| compute a signature for these trees so that the signature can
| be more easily searched in their database of open source
| code.
| scott_w wrote:
| While correct, the example given is that they COPY the code,
| then make adjustments to hide the fact. I suspect this is
| still a copyright violation. It's interesting that a judge
| sees it differently when it's just run through a programme.
| I'm not a legal expert so I'm guessing it's a bit more
| complex than the headline?
| scott_w wrote:
| Ok I read the article and it looks like the issue is the
| DMCA specifically, which require the code to be more
| identical than is presented. I'm guessing separate claims
| could still come from other copyright laws?
| itishappy wrote:
| No copy-paste was explicitly used. They compressed it into
| a latent space and recreated from memory, perhaps with a
| dash of "creativity" for flavor. Hypothetically, of course.
|
| The distinction is pedantic but important, IMHO. AI doesn't
| explicitly copy either.
| scott_w wrote:
| But isn't that the same as memorising it and rewriting
| the implementation from memory? I'm sure "it wasn't an
| exact reproduction" is not much of a defence.
| itishappy wrote:
| I sure think so. I also think that (to first order) this
| is exactly what modern AI products do. Is a lossy copy
| still a copy?
| scott_w wrote:
| I would have thought so but I'm not a lawyer. The article
| suggests DMCA is intended for direct copies so that's why
| it failed here. Maybe more general copyright laws would
| apply for lossy copies.
| ars wrote:
| > I assume that I would get my ass kicked legally speaking.
|
| Maybe, maybe not. It's not as simple as you made it out to be.
| If you write a book with lots of stuff and you got inspiration
| from other books, and even put in phrases wholesale, but
| modified to use your own character names instead, I'm not
| convinced you would lose.
|
| The court would look at the work as a whole, not single pieces
| of it.
|
| They would also check if you are just copying things verbatim,
| or if you memorize a pattern and emit the same pattern - for
| example look at lawsuits about copying music, where they'll
| claim this part of the music is the same as that part.
|
| It's really not as cut and dry as you make it out to be.
| williamcotton wrote:
| Just to set the stage and not entirely specific to this
| complaint... It really depends on what is and isn't subject to
| copyright for software.
|
| Broadly, there is the distinction between expressive and
| functional code. [1]
|
| And then there are the specific tests that have been developed
| by the courts to separate the expressive and functional aspects
| of software. [2] [3]
|
| In practice it is very expensive for a plaintiff to do such
| analysis. For the most part the damages related to copyright
| are not worth the time and money. Plaintiffs tend to go for
| trade secret related damages as they are not restricted by the
| above tests.
|
| There are also arguments to be made of de minimis infringements
| that are not worth the time of the court.
|
| Most importantly the plaintiff fundamentally has the burden of
| proof and cannot just say that copying must have taken place.
| They need concrete evidence.
|
| [1] https://en.wikipedia.org/wiki/Idea-expression_distinction
|
| [2]
| https://en.wikipedia.org/wiki/Structure,_sequence_and_organi...
|
| [3] https://en.wikipedia.org/wiki/Abstraction-Filtration-
| Compari...
| wvenable wrote:
| You probably do this all the time. Forget memorizing but
| undoubtedly you've read code, learned from it, and then likely
| reproduced similar code. Probably nothing terribly important,
| just a function here or there. Maybe even reproduced something
| you did for a previous employer.
| Aerroon wrote:
| arr.sort((a, b) => a - b);
|
| comes to mind. I bet most js devs have written this verbatim.
| JoshTriplett wrote:
| You have a much smaller lobbying budget than the AI industry,
| and you didn't flagrantly rush to copy billions of copyrighted
| works as quickly as possible and then push a narrative acting
| like that's the immutable status quo that must continue to be
| permitted lest the now-massive industry built atop copyright
| violation be destroyed.
|
| Violate one or two copyrights, get sued or DMCAed out of
| existence. Violate billions, on the other hand, and you
| magically become immune to the rules everyone else has to
| follow.
| nadermx wrote:
| What about the copyrights purpose of furthering the arts and
| sciences?
| JoshTriplett wrote:
| Copyright has utterly failed to serve that purpose for a
| long time, and has been actively counterproductive.
|
| But if you want to argue that copyright is
| counterproductive, I completely agree. That's an argument
| for reducing or eliminating it across the board, fairly,
| for everyone; it's not an argument for giving a free pass
| to AI training while still enforcing it on everyone _else_.
| idle_zealot wrote:
| Could these "free passes" for AI training serve as a
| legal wedge to increase the scope of fair use in other
| cases? Pro-business selective enforcement sucks, but so
| long as model weights are being released and the public
| is benefiting then stubbornly insisting that overzealous
| copyright laws be enforced seems self-defeating.
| adra wrote:
| Without copyright, entire industries would've been dead a
| long time ago, including many movies, games, books, tv,
| music, etc.
|
| Just because their lobbies tend to push the boundary of
| copyright into the absurd doesn't mean these industries
| aren't worth saving. There should be actually respectful
| lawmakers who seek for a balance of public and commercial
| interests.
| pdonis wrote:
| _> Without copyright, entire industries would 've been
| dead a long time ago, including many movies, games,
| books, tv, music, etc._
|
| Citation needed. There are many ways to make money from
| producing content other than restricting how copies of it
| can be distributed. The owner should be able to choose
| copyright as a means of control, but that doesn't mean
| nobody would create any content at all without copyright
| as a means of control.
| boplicity wrote:
| There's _nothing_ preventing people from producing works
| and releasing them without copyright restriction. If that
| were a more sustainable model, it would be happening far
| more often.
|
| As it is now, especially in the creative fields (which I
| am most knowledgeable about), the current system has
| allowed for a incredible flourishing of creation, which
| you'd have to be pretty daft to deny.
| TylerE wrote:
| Given that copyrighting is automatic at the instant of
| creation, that is, um, debatable.
|
| Slapping 3 lines in LICENSE.TXT doesn't override the
| Berne convention.
| trogdor wrote:
| Are you claiming that an author cannot place their work
| in the public domain?
| pdonis wrote:
| Yes, they can't, because there is no legally reliable way
| to do it (briefly, because the law really doesn't like
| the idea of property that doesn't have an owner, so if
| you try to place a work of yours in the public domain,
| what you're actually doing is making it abandoned
| property so anyone who wants to can claim _they_ own it
| and restrict everyone else, including you, from using
| it). The best an author can do is to give a license that
| basically lets anyone do what they want with the work.
| Creative Commons has licenses that do that.
| tpush wrote:
| In most of the world no, they can't.
| chii wrote:
| > If that were a more sustainable model, it would be
| happening far more often.
|
| that's not the argument. The fact that there currently
| are restrictions on producing derivative works is the
| problem. You cannot produce a star wars story, without
| getting consent from disney. You cannot write a harry
| potter story, without consent from Rowling.
| boplicity wrote:
| That's not actually true. There's nothing stopping you
| from producing derivative works. Publishing and/or
| profiting from other people's work does have some
| restrictions though.
|
| There's actually a huge and thriving community of people
| publishing derivative works, in a not-for-profit basis,
| on Archive of Our Own. (Among other places.)
| pdonis wrote:
| _> There 's actually a huge and thriving community of
| people publishing derivative works, in a not-for-profit
| basis, on Archive of Our Own. (Among other places.)_
|
| Yes, and none of those people are _making a living_ at
| creating things. That 's why they are allowed by the
| copyright owners to do what they're doing--because it's
| not commercial. Try to actually _sell_ a derivative work
| of something you don 't own the copyright for and see how
| fast the big media companies come after you. You
| acknowledge that when you say there are "restrictions"
| (an understatement if I ever saw one) on profiting from
| other people's work (where "other people" here means the
| media companies, _not_ the people who actually _created_
| the work).
|
| It is true that without our current copyright regime, the
| "industries" that produce Star Wars, Disney, etc.
| products would not exist in their current form. But does
| that mean works like those would not have been created?
| Does it mean we would have less of them? I strongly doubt
| it. What it _would_ mean is that more of the profits from
| those works would go to the actual creative people
| instead of middlemen.
| boplicity wrote:
| > Yes, and none of those people are making a living at
| creating things.
|
| Again, not true. One of the most famous examples is
| likely Naomi Novik, who is a bestselling author, in
| addition to a prolific producer of derivative works
| published on AO3. Many other commercially successful
| authors publish derivative works on this platform as
| well.
|
| > It is true that without our current copyright regime,
| the "industries" that produce Star Wars, Disney, etc.
| products would not exist in their current form. But does
| that mean works like those would not have been created?
| Does it mean we would have less of them? I strongly doubt
| it. What it would mean is that more of the profits from
| those works would go to the actual creative people
| instead of middlemen.
|
| Speculate all you want about an alternative system, but
| you really don't know what would have happened, or what
| would happen moving forward.
| pdonis wrote:
| _> not true_
|
| Sorry, I meant they're not making a living at creating
| derivative works of copyrighted content. They can't, for
| the reasons you give. Nor can other people make a living
| creating derivative works of _their_ commercially
| published work. That is an obvious barrier to creation.
| copywrong2 wrote:
| People do not put out their stuff. People get lured into
| contracts selling their IP to a shitty company that then
| publishes stuff, of course WITH copyright so they can
| make money while the artist doesnt
| pdonis wrote:
| _> the current system has allowed for a incredible
| flourishing of creation_
|
| No, the current system has allowed for an incredible
| flourishing of middlemen who don't create anything
| themselves but coerce creative people into agreements
| that give the middlemen virtually all the profits.
| seadan83 wrote:
| Copyright laws prevent piracy. It is interesting to live
| in a country with no enforced copyrights and EVERYTHING
| is pirated. I think it is easy to not know about that
| context and just see the stick side of copyright vis-a-
| vis big money corporations
| cvwright wrote:
| So true! Copyrights that last 20 years would be
| completely reasonable. Maybe with exponentially
| increasing fees for successive renewals, for super
| valuable properties like Disney movies.
| TylerE wrote:
| For that matter, if you think China ripping everyone else
| off is bad now... well, just wait until every company can
| do that.
| Qwertious wrote:
| If everyone could do it, it wouldn't be as big a deal -
| small western businesses would be on a more level playing
| field, since they would be almost as immune from being
| sued by big businesses as Chinese businesses are. As it
| is, small businesses aren't protected by patents (because
| a patent is a $10k+ ticket to a $100k+ lawsuit against a
| competitor with a $1M+ budget for lawyers) while still
| being bound by the restrictions of big business's
| patents. It's lose/lose.
| DoItToMe81 wrote:
| Trademark isn't copyright, so no.
| matheusmoreira wrote:
| Nobody cares anymore. We're sick of their rent seeking,
| of their perpetual monopolies on culture. Balance?
| Compromise? We don't want to hear it.
|
| Nearly two hundred years ago one man warned everyone this
| would happen. Nobody listened. These are the
| consequences.
|
| "At present the holder of copyright has the public
| feeling on his side. Those who invade copyright are
| regarded as knaves who take the bread out of the mouths
| of deserving men. Everybody is well pleased to see them
| restrained by the law, and compelled to refund their ill-
| gotten gains. No tradesman of good repute will have
| anything to do with such disgraceful transactions. Pass
| this law: and that feeling is at an end. Men very
| different from the present race of piratical booksellers
| will soon infringe this intolerable monopoly. Great
| masses of capital will be constantly employed in the
| violation of the law. Every art will be employed to evade
| legal pursuit; and the whole nation will be in the plot.
| On which side indeed should the public sympathy be when
| the question is whether some book as popular as "Robinson
| Crusoe" or the "Pilgrim's Progress" shall be in every
| cottage, or whether it shall be confined to the libraries
| of the rich for the advantage of the great-grandson of a
| bookseller who, a hundred years before, drove a hard
| bargain for the copyright with the author when in great
| distress? Remember too that, when once it ceases to be
| considered as wrong and discreditable to invade literary
| property, no person can say where the invasion will stop.
| The public seldom makes nice distinctions. The wholesome
| copyright which now exists will share in the disgrace and
| danger of the new copyright which you are about to
| create. And you will find that, in attempting to impose
| unreasonable restraints on the reprinting of the works of
| the dead, you have, to a great extent, annulled those
| restraints which now prevent men from pillaging and
| defrauding the living."
|
| https://www.thepublicdomain.org/2014/07/24/macaulay-on-
| copyr...
| Zambyte wrote:
| Books, music, and games are a lot older than copyright.
| adra wrote:
| Have you looked at who created these things by and large?
| For the most part, you have: - aristocrats that were
| wealthy that didn't need to "work" to survive and put
| food on the table - crafts people supported through the
| patronage of a rich person (or religious order) who deign
| to support your art - (kinda modern world) national
| governments who want to support their national art often
| as a fear that other larger nations cultural influences
| will dwarf their
|
| Are you implying that these three pillars will be able to
| produce anywhere near the current amount of content we
| produce?
|
| How in the world where digital copies are effectively
| free to copy and infinitum would a creator reap any
| benefits from that network effect?
|
| A modern equivalent would be famous YouTubers who all
| they do all day is "watch" other people's hard earned
| videos. The super lazy ones will not direct people to the
| original, don't provide meaningful commentary, just
| consumes the video as 'content' to feed their own
| audience and provides no value to the original creator.
| The position to kill copyright entirely would amplify
| this "just bypass the original source" to lower value of
| the original creator to zero.
| Zambyte wrote:
| > Are you implying that these three pillars will be able
| to produce anywhere near the current amount of content we
| produce?
|
| Do you think the vast "amount of content we produce" is
| actually propped _up_ by copyright? Have you ever heard
| of someone who started their career on YouTube due to
| copyright? On the contrary, how often have you heard of
| people _stopping_ their YouTube career due to copyright,
| or explicitly limiting the content they create? I have
| only heard of cases of the latter. In fact, the latter
| partially happened to me.
|
| > How in the world where digital copies are effectively
| free to copy and infinitum would a creator reap any
| benefits from that network effect?
|
| You are making an assumption that people should reap
| (monetary) benefits for creating things. What you are
| ignoring is that the world where digital copies are
| effectively free is also the world where original works
| are insanely cheap as well. In this world, people create
| regardless of monetary gain.
|
| To make this point: how much money did you make from this
| comment that you posted? It's covered by copyright, so
| surely you would not have created it if not for your own
| benefit.
| adra wrote:
| Spending 6 minutes of my life engaging in political
| discourse is a far swing from hundreds of individuals
| producing a movie that took millions of dollars to
| produce. Both are just as easily digitally repeatable,
| but the expensive content is likely way more beneficial
| to society as a whole. I am choosing to engage in this
| hobby because I receive the means to provide this content
| recreationally. I fail to see this scaling to anything of
| any real quality outside of some isolated instances. For
| instance, some video game enthusiasts are using the work
| of Bethesda to make a new game call fallout London. It's
| a knock off fallout game using the base code engine that
| Bethesda built for their commercial games. The game is
| exceptional in that it could actually achieve a mostly
| compatible level of a commercial product as long as you
| ignore that they're leveraging the engines and story
| which were developed by commercial interests. In the same
| time, 10's to hundreds of thousands of people are
| employed every year to produce video games for commercial
| reasons. Will they all stop making games if copyright was
| dead? No, but the vast majority would.
| copywrong2 wrote:
| Yeah many industries like:
|
| - Big Corps that buy IP
|
| - Patent Trolls
|
| - Companies that fuck over artists
| JumpCrisscross wrote:
| > _Copyright has utterly failed to serve that purpose for
| a long time, and has been actively counterproductive_
|
| This debate is tired because nobody brings citations. The
| pro-copyright lobby cites numbers of jobs. The anti,
| nothing. In that midst, of course we're going to stick
| with the _status quo_.
| kibwen wrote:
| This is a specious argument. It is impossible for us to
| gesture at the works of art that do not exist _because_
| of draconian copyright. Humans have been remixing each
| others ' works for millions of years, and the artificial
| restriction on derivative work is actively destroying our
| collective culture. There should be _thousands_ of
| professional works (books, movies, etc.) based on Lord Of
| The Rings by now, many of which would surpass the
| originals in quality given enough time, and we have been
| robbed of them. And Lord Of The Rings is an outlier in
| that it still remains culturally relevant despite its
| age; _most_ works will remain copyrighted for far longer
| than their original audience was even alive, meaning that
| those millions of flowers never get their chance to
| bloom.
| eropple wrote:
| This is all true, and in a vacuum I agree with it.
| There's a pretty core problem with these kinds of
| assertions, though: people have to make rent. Never have
| I seen a substantiative, pass-the-sniff-test argument for
| how to make practical this system when your authors and
| your artists need to eat in a system of modern capital.
|
| So I'm asking genuinely: what's your plan? What's the A
| to B if you could pass a law tomorrow?
| uhoh-itsmaciek wrote:
| Copyright is not optimized for making sure artists and
| authors get enough to eat. It's optimized for people with
| a lot of money to make even more money by exploiting
| artists and authors.
|
| I doubt there's a simple answer (I certainly don't have
| one), but the current system is not exactly a creators'
| utopia.
| Qwertious wrote:
| Not the person you responded to, but:
|
| >So I'm asking genuinely: what's your plan? What's the A
| to B if you could pass a law tomorrow?
|
| Patreon (or liberapay etc). Take a look at youtube: so
| many creators are actively saying "youtube doesn't pay
| the bills, if you like us then please support us on
| Patreon". Patreon works. Some of the time, at least -
| just like copyright. Also crowdsourcing (e.g.
| Kickstarter), which worked out well for games like FTL
| and Kingdom Come: Deliverance.
|
| Although, I personally don't believe copyright should be
| abolished - it just needs some amendments. It needs a
| duration amendment - not a flat duration (fast-fashion
| doesn't need even 5 years of copyright, but aerospace
| software regularly needs several decades just to break
| profitable), but either some duration-mechanism or a
| simple discrimination by industry.
|
| Also, I think any sort of functional copyright (e.g.
| software copyright) ought to have an incentive or
| requirement to publish the functional bits - for
| instance, router firmware ought to require the source
| code in escrow (to be published once copyright duration
| expires) for any legal protections against reverse-
| engineering to be mounted. Unpublished source code is a
| trade secret, and should be treated as such.
|
| Also, these discussions don't seem to mention fanfiction,
| which demonstrates plenty of people write good works
| without being professionally paid and without the
| protection of copyright.
| davrosthedalek wrote:
| How many subscribers on patreon are there because the
| creators provides pay-walled extra content? How many
| would remain if that pay-walled content would be mirrored
| directly by youtube or on youtube?
|
| Crowdsourcing might work better, but how many would
| donate to a game where, instead of getting it cheaper as
| a kickstarter supporter, they could get free after it is
| released?
| JoshTriplett wrote:
| > What's the A to B if you could pass a law tomorrow?
|
| Top priority: UBI, together with a world in which there's
| so much surplus productivity that things can survive and
| thrive without having "how does this make huge amounts of
| money" as its top priority to optimize for.
|
| Apart from that: Conventions/concerts/festivals (tickets
| to a unique live event with a crowd of other fans),
| merchandise (pay for a physical object), patronage (pay
| for the ongoing creation of a thing),
| crowdfunding/Kickstarter (pay for a thing to come into
| existence that doesn't exist yet), brand/quality
| preference (many people prefer to support the original
| even if copies can be made), commissions (pay for unique
| work to be created for you), something akin to "venture
| funding", and the general premise that if a work spawns
| ten thousand spinoffs and a couple of them are incredible
| hits they're likely to direct some portion of their
| success back towards the work they build upon if that's
| generally looked upon favorably.
|
| People have an incredible desire both to create and to
| enjoy the creations of others, and that's not going to
| stop. It is very likely that the concept of the $1B movie
| would disappear, and in trade we'd get the creation of
| far far more works.
| eropple wrote:
| Yeah, this is what I was expecting. I have no love for
| Disney et al but I think that this is dire (aside from
| UBI, which would be great but is fictional without a
| large-scale shift in American culture).
|
| "Everybody else gets paid for the work they do; you get
| paid for things _around_ the work you do, _if you 're
| lucky_" is a way to expect creatives to live that, to put
| a point on it, always ends up being "for thee, but not
| for me". It's bad enough today--I think you described
| something worse.
| JoshTriplett wrote:
| The current model is "most people get paid for the work
| they do, but you get paid for people copying work you've
| already done", which already seems asymmetric. This would
| change the model to "people get paid for the work they
| do, and not paid _again_ for copying work they 've
| already done".
| eropple wrote:
| We converged on a system that protects the
| commercialization of copies because, in practice, "the
| first copy costs $X0,000" is not a viable way to pay your
| rent.
|
| If we want art to be the province of the willfully
| destitute or the idle rich (and I do mean _rich_ , the
| destruction of a functional middle class has compacted
| the available free time of huge swaths of society!), this
| is a good way to do it. I would rather other voices be
| included.
| JoshTriplett wrote:
| We converged on a system that makes copying illegal
| because that system was invented in an era when the only
| people who _could_ copy were those with specialized
| equipment (e.g. printing presses). In that world, those
| who might do the copying were often larger than those
| whose works were being copied, and copyright had more
| potential to be "protective".
|
| That system hasn't been updated for a world in which
| everyone can make perfect-fidelity copies or
| modifications at the touch of a key; on the contrary,
| it's been made _stricter_. And worse, per the story we
| 're commenting on here, the much larger players who are
| mass-copying works largely by individuals or smaller
| entities have become effectively exempt from copyright,
| while copyright continues to restrict individuals and
| smaller entities, and the systems designed by those large
| players and trained on all those copied works are
| crowding individuals _out_ of art and other creative
| endeavors.
|
| I don't think the current system deserves valorizing, nor
| can it be credited as being intentionally designed to
| bring about most of the effects it currently serves.
|
| I'm not suggesting that deleting copyright overnight will
| produce a perfect system, nor am I suggesting that it has
| zero positive effects. I'm suggesting that it's doing
| substantial harm and needs a _massive_ overhaul, not
| minor tweaks.
| Kim_Bruning wrote:
| My own business model is to create Things That Don't
| Exist Yet. This (typically bespoke work) is actually the
| majority of work in any era I think. For me, copyright
| doesn't do much, it mostly gets in the way.
|
| If you pass the law tomorrow -all else being equal- my
| profits would stay equal or go up somewhat.
| JoshTriplett wrote:
| > It is impossible for us to gesture at the works of art
| that do not exist because of draconian copyright.
|
| We can gesture at the tiniest tip of the iceberg by
| observing things that are regularly created in violation
| of copyright but not typically attacked and taken down
| until they get popular:
|
| - Game modding, romhacks, fangames, remakes, and similar.
|
| - Memes (often based on copyrighted content)
|
| - Stage play adaptations of movies (without authorization
|
| - Unofficial translations
|
| - Machinima
|
| - Speedruns, Let's Play videos, and streams (very often
| taken down)
|
| - Music remixes and sampling
|
| - Video mashups
|
| - Fan edits/cuts, "Abridged" series
|
| - Archiving and preservation of content that would
| otherwise be lost
|
| - Fan films
|
| - Fanfiction
|
| - Fanart
|
| - Homebrew content for tabletop games
| sleepybrett wrote:
| > "- Speedruns, Let's Play videos, and streams (very
| often taken down)"
|
| Very often taken down, only by nintendo.
| JoshTriplett wrote:
| There are several other publishers who regularly go after
| gameplay footage of people playing their games. It's not
| as visible, because it's hard to notice the _absence_ of
| a thing.
| Kim_Bruning wrote:
| Fashion is traditionally not copyrightable[1] , and the
| fashion industry is doing rather well.
|
| Similarly our IT infrastructure is now built mostly on [a
| set of patches to the copyright system][2] called F/L/OSS
| that provided more freedom to authors and users, and lead
| to more innovation and proliferation of solutions.
|
| So even just in the modern west, we can see thriving
| ecosystems where copyright is absent or adjusted; and
| where the outcomes are immediately visible on the street.
|
| [1] Though a quick search shows that lawyers are making
| inroads.
|
| [2] One way of describing it at least, YMMV.
| kelnos wrote:
| That ship sailed long ago. While copyright can and is used
| at times to protect the "little guy", the law is written as
| it is in order to protect and further corporate interests.
|
| The current manifestation of copyright is about rent-
| seeking, not promoting innovation and creativity. That it
| may also do so is entirely coincidental.
| ryandrake wrote:
| Also, if it _wasn 't_ about rent-seeking and preventing
| access to works, copyright wouldn't have to last for
| decades, many multiples of a work's useful commercial
| life. The fact that it does last this long shows that
| it's not about promoting innovation and creativity.
| DoItToMe81 wrote:
| Copyright was invented by a cartel of noblemen, the
| British Stationer's Company, who, due to liberal reform,
| were going to lose their publishing monopoly. The
| implementation of copyright law as they helped pen
| allowed them to mostly continue their position while
| portraying it as "protecting the little guy".
|
| Funny how both the rhetoric and intentions are the same
| after three hundred years.
| CobrastanJorji wrote:
| You want to look at the Supreme Court case "Eldred v.
| Ashcroft." Eldred challenged Congress for retroactively
| extending existing copyrights, for extending the patent
| protections on existing inventions could not possibly
| further arts and sciences. They also argued that if
| Congress had the power to continually extend existing
| copyrights by N years every N years, the Constitutional
| power of "for a limited time" had no meaning.
|
| The Supreme Court's decision was a bunch of bullshit around
| "well, y'know, people live longer these days, and some
| creators are still alive who expected these to last their
| whole lives, and golly, coincidentally this really helps
| giant corporations."
| teeray wrote:
| Copyright's purpose is a cudgel to be wielded to enrich the
| holder for, ideally, eternity. If "eternity" is threatened,
| you use proceeds from copyright to change copyright law to
| protect future proceeds.
| fragmede wrote:
| works the same for banks and owing them money
| justinclift wrote:
| > Violate one or two copyrights, get sued or DMCAed out of
| existence. Violate billions, on the other hand, and you
| magically become immune to the rules everyone else has to
| follow.
|
| Sounds like the same concept as commonly said of "murderer vs
| conqueror".
|
| Could probably be applied to many other fields for disruption
| too. Not the murderer bit (!), more the "break one or two
| laws -> scaled up massively to a potential new paradigm".
| RF_Savage wrote:
| Violate billions or millions is what they used to nail warez
| folks with. So there is that.
| marsten wrote:
| There's a strong geopolitical angle as well. If you force
| American companies to license all training data for LLMs,
| that is such a gargantuan undertaking it would effectively
| set US companies back by years relative to Chinese
| competitors, who are under no such restrictions.
|
| Bottom line, if you're doing something considered relevant to
| the national interest then that buys you a lot of leeway.
| Kim_Bruning wrote:
| You will need to first demonstrate that actual copying took
| place. And that what copying that did take place was actually
| illegal or infringing.
|
| As we're seeing in court, that's a very interesting question.
| It turns out that the answers are very counter-intuitive to
| many.
| stale2002 wrote:
| > acting like that's the immutable status quo
|
| It is immutable.
|
| What are you going to do about it? Confiscate everyone's home
| gamer PCs?
|
| Even in the most extreme hypothetical where lawsuits shutdown
| OpenAI, that doesn't delete the stable diffusion models that
| I have on my external hard drives.
|
| The tech is out there. It's too late.
| hyperpape wrote:
| From the article:
|
| > The most recently dismissed claims were fairly important,
| with one pertaining to infringement under the Digital
| Millennium Copyright Act (DMCA), section 1202(b), which
| basically says you shouldn't remove without permission crucial
| "copyright management" information, such as in this context who
| wrote the code and the terms of use, as licenses tend to
| dictate.
|
| > It was argued in the class-action suit that Copilot was
| stripping that info out when offering code snippets from
| people's projects, which in their view would break 1202(b).
|
| > The judge disagreed, however, on the grounds that the code
| suggested by Copilot was not identical enough to the
| developers' own copyright-protected work, and thus section
| 1202(b) did not apply. Indeed, last year GitHub was said to
| have tuned its programming assistant to generate slight
| variations of ingested training code to prevent its output from
| being accused of being an exact copy of licensed software.
|
| So (not a lawyer!) this reads like the point about GitHub
| tuning their model is not a generic defense against any and all
| claims of copyright infringement, but a response to a specific
| claim that this violates a provision of the DMCA.
|
| I don't know whether this is a reasonable defense or not, but
| your intuitions or mine about whether there is a general
| copyright violation or what's fair are not necessarily relevant
| to how the judge construes that very specific bit of legal
| code.
| xinayder wrote:
| What I got from this is, you can copy someone's copyrighted
| work provided you tweak a few things here and there. I wonder
| how this holds up in court if you don't have billions at your
| disposal.
| sleepybrett wrote:
| Weird Al should be in the clear then, he changes probably
| 85% of all the song lyrics in his covers.
| sensanaty wrote:
| Weird Al explicitly seeks out permission from copyright
| holders and won't do a cover if he doesn't get their go-
| ahead [1].
|
| Pretty much the exact opposite of all these AI companies
| :p
|
| https://www.weirdal.com/archives/faq/
| skybrian wrote:
| The machine alone doesn't do anything. The user and machine
| together constitute a larger system, and with autocomplete, the
| user is charge. What's the user's intent?
|
| I suspect that a lot of copyright violations are enabled by
| cut-and-paste and screenshot-taking functionality, and maybe we
| need to be careful with autocomplete, too? It's the user's
| responsibility to avoid this. We should be careful using our
| tools. Do users take enough care in this case? Is it possible
| to take enough care while still using CoPilot?
|
| I've switched from CoPilot to Cody, but I use them the same
| way, to write _my_ code. There 's no particular reason to use
| CoPilot's output verbatim and lots of good reasons not to. By
| the time I've adapted it to my code base and code style and
| refactored it to hell and back, it's an expression of how _I_
| want to solve a problem, and I 'm pretty confident claiming
| ownership.
|
| Is that confidence misplaced? Are other people more careless?
| BeefWellington wrote:
| > The machine alone doesn't do anything.
|
| By the same token, the machine alone can't download pirated
| movies. Yet the sites hosting those movies are targeted as
| the infringers.
|
| There's a point at which foisting this responsibility on the
| users is simply socializing losses. Ultimately Copilot is the
| one serving the code up - regardless of the user's request.
| If the user then goes on to republish that work as their own
| it becomes two mistakes. It'll be interesting to see if any
| lawyers are capable of articulating that well enough in any
| of these lawsuits.
|
| > Is that confidence misplaced? Are other people more
| careless?
|
| I would say yes, for two reasons. One is that using code of
| unknown provenance means you're opening yourself to unknown
| legal risks. The second is if you're rewriting it fully (so
| as not to run afoul of easily spotted copyright) that's not
| actually "clean room" and you're still open to problems. I'd
| also wonder what the point of using a code writing LLM is
| anyways if you're doing all the authorship yourself. It seems
| like doing double the work.
| skybrian wrote:
| It _is_ a lot of work to do a lot of rewrites, but it's
| noncommercial and I'm not in a hurry. And autocomplete is
| still pretty useful.
| xorcist wrote:
| Why stop there? Extrapolate that thought, keep generating more
| variants of the code, claim copyright, and seek rent from other
| people doing the same thing. To extrapolate full circle, there
| would be a business opportunity to generate as many variants as
| possible for the original author, to prevent all this from
| happening.
|
| As long as we're not required to register copyright there's no
| reason to think the above will play out. International
| copyright agreements are not limited to verbatim copies only.
| BeefWellington wrote:
| > Why stop there? Extrapolate that thought, keep generating
| more variants of the code, claim copyright, and seek rent
| from other people doing the same thing. To extrapolate full
| circle, there would be a business opportunity to generate as
| many variants as possible for the original author, to prevent
| all this from happening.
|
| This has already been done[1] in music, though in their case
| they released them to the public domain. Admittedly I think
| that was more of a protest than anything.
|
| [1]: https://www.vice.com/en/article/wxepzw/musicians-
| algorithmic...
| eftychis wrote:
| Adding to the sibling comments:
|
| First: every human is per se doing that already. We have - to
| handwave - a "reasonable person" bar to separate violations
| versus results of learning and new innovation.
|
| Second: You can be a holder of copyright and your creations
| result in copyrightable artifacts. Anything generated by the
| program has been held as uncopyrightable.
| _heimdall wrote:
| The actual answer here, regardless of a court ruling, is that
| you'd go broke if anyone big enough tried to go after you for
| it.
|
| Legal protections for source code are still pretty fuzzy,
| understandably so given how comparatively new the industry is.
| That doesn't stop lawyers from racking up huge fees though, it
| actually helps because they need so much more prep time to
| debate a case that is so unclear and/or lacking precedent.
| ProAm wrote:
| > How is it any different when a machine does the same thing?
|
| Literally the bank account behind the action...
| jollofricepeas wrote:
| No clue.
|
| But what if the generative AI were used to create music instead
| of code would the court have ruled differently?
|
| CONSIDER:
|
| In 2015, a federal judge order Thicke & Pharrell to pay 50% of
| proceeds to the Marvin Gaye estate for being "too similar" to
| the song, "Gots to Give It Up".
|
| Comparison and commentary:
| https://youtu.be/7_UiQueteN4?si=SkClbyBMOcucigRm
|
| Comparison of both songs:
| https://youtu.be/ziz9HW2ZmmY?si=3_VZzfoLT-NrozoK
| roenxi wrote:
| If you tell a programmer to implement a function foo(a, b) then
| there are actually only a tiny number of ways to do that,
| semantically speaking, for any given foo. The number of options
| narrows quickly as the programmer implementing it gets more
| competent.
|
| Choosing function signatures is an art form but after that
| "copying" is hard to judge.
| sva_ wrote:
| > a function foo(a, b) then there are actually only a tiny
| number of ways to do that
|
| I'd argue there are infinite ways to implement any function,
| just almost all of them are extremely bad.
| woah wrote:
| You would not get your ass kicked legally speaking. Copyright
| is not that broad. It's not a patent.
| skissane wrote:
| > I assume that I would get my ass kicked legally speaking.
| That reads to me exactly like deliberate copyright infringement
| with willful obfuscation of my infringement.
|
| It looks like wilful obfuscation because the obfuscation is so
| simplistic. But as the obfuscation gets increasingly
| sophisticated, it becomes ever harder to distinguish wilful
| obfuscation from genuine originality.
| chii wrote:
| > But sufficiently complex obfuscation of infringement is
| very hard to distinguish from genuine originality.
|
| for the purposes of copyright, originality is not required,
| just different expressions. It's ideas (aka, patent) that
| require originality.
|
| The 'sufficiently complex obfuscation' is exactly what
| people's brains go through when they learn, and re-produced
| what they learnt in a different context.
|
| I argue that AI-training can be considered to be doing the
| same.
| skissane wrote:
| Some different scenarios:
|
| (1) You leave your employer, don't take any code with you,
| start your own company, reimplement your ex-employer's
| product from scratch, but you do it in a very different way
| (different language, different design choices, different
| tech stack, different architecture)
|
| (2) You leave your employer, take their code with you,
| start your own company, make some superficial changes to
| their code to obscure your theft but the copying is obvious
| to anyone who scratches the surface
|
| (3) You leave your employer, take their code with you,
| start your own company, start very heavily manually
| refactoring their code, within a few months it looks
| completely different, very difficult to distinguish from
| (1) unless you have evidence of the process of its creation
|
| (4) You leave your employer, take their code with you,
| start your own company, download some "infringement
| obfuscation AI agent" from the Internet and give it your
| employer's codebase, within a few hours it has transformed
| it into something difficult to distinguish from (1) if you
| didn't know the history
|
| (1) is unlikely to be held to be infringing. (2) is rather
| obviously going to be held to be infringing. But what about
| (3)? IANAL, but I suspect if you admitted that is how you
| did it, a judge would be unlikely to be very sympathetic.
| Your best hope would be to insist you actually did (1)
| instead. And then the outcome of the case might come down
| to whether the judge/jury believes your claim you actually
| did (1), or the plaintiff/prosecution's claim you did (3).
|
| And (4) is basically just (3) with AI to make it a lot
| faster and quicker. Such an agent likely doesn't exist yet,
| but it could happen.
|
| Timing is obviously a factor. If you leave your employer
| and launch a clone of their app the next week, everyone is
| going to think either you stole their code, or you were
| moonlighting on writing it (in which case they may legally
| own it anyway). If it takes you 12 months, it becomes more
| believable you wrote it from scratch. But if someone uses
| AI to launder code theft, maybe they can build the "clone"
| in a few days or weeks, and then spend a few months
| relaxing and recharging before going public with it
| megaman821 wrote:
| Numbers 2, 3, & 4 are all illegal because they start with
| an illegal action.
|
| If I find a dollar on the sidewalk and put it in my
| wallets, is that stealing? If I punch a man getting
| change at a hotdog stand and a dollar falls on the
| sidewalk and then I put that in my wallet, is that
| stealing?
|
| It doesn't matter what the scenario is after you stole
| code from your former employer, all actions are poisoned
| after.
| bawolff wrote:
| > How is it any different when a machine does the same thing?
|
| I think the argument is that the machine is not doing that, or
| at least there isn't evidence that it is doing that.
|
| Specificly no evidence that github is doing both 1 and 2 at the
| same time. There might be cases where it makes trivial changes
| to code (point 2) but for code that does not meet the threshold
| of originality. Similarly there might be cases with copyrighted
| code where the idea of it is taken, but it is expressed in such
| a different way that it is not a straightforward derrivitave of
| the expression (keeping in mind you cannot copyright an idea,
| only its expression. Using a similar approach or algorithm is
| not copyright infringement)
|
| And finally, someone has to demonstrate it is actually
| happening and not just in theory could happen. Generally courts
| dont punish people for future crimes they haven't comitted yet
| (sometimes you can get in trouble for being reckless even if
| nothing bad happens, but i dont think that applies to copyrighg
| infringement)
| constantcrying wrote:
| >I assume that I would get my ass kicked legally speaking.
|
| Why? This is no different than copy pasting and modifying a bit
| of code from some documentation/other project/tutorial/SO.
| Surely if that were a basis for copyright infringement most
| semi-large software projects would be infringing on copyright.
|
| I don't think anyone here should be willing to open the can if
| worms that is copy pasting small snippets of code and modifying
| them.
|
| The judge seems to argue that the non-identical copies are at
| issue here and that they only happen under contrived
| circumstances. My moral opinion is that this is irrelevant and
| that even the defendant is the wrong person. Even verbatim
| copies of code snippets shouldn't be copyright infringement and
| suing the company providing the AI is wrong to begin with, as
| the AI or its providercan not possibly be the one to infringe.
| ExoticPearTree wrote:
| I don't think it works that way. During the course of your
| professional career as a developer you change jobs. And let's
| say that at every job you create APIs. Besides the particular
| functions those API provide, the API code itself (how you
| interact with clients, databases etc.) will be pretty much the
| same as whatever you did at previous jobs. Does this constitute
| copyright experience or is just experience?
|
| My analogy is that if Copilot doesn't provide 100% code from
| another repository it is OK to be used by other people trained
| with code available on GitHub.
| sim7c00 wrote:
| it depends on how much tax you are paying really. if you pay
| billions in taxes annually, they might see past it. if the
| company you copied from pays billions in taxes anually. you
| will go to jail. if this isn't painfuly obvious by now...
| Kuinox wrote:
| You are taking the plaintiff statement as is, which is wrong.
| You can blame the media that didn't made it clear that it was a
| statement from the plaintiff.
| isodev wrote:
| It would. And this is where some legislation "in the spirit of"
| would have helped. So Microsoft's huge legal arm can't just
| wiggle their way out on technicalities. Clearly, the law is not
| prepared to face the challenge of copyright violations on the
| scale created by the LLMs.
|
| I also think it's not just copyright. It's simply not right to
| create a product on top of the collective work of all open
| source developers monetize them on the absurt scale Microsoft
| operates and never ever credit the original creators.
| bmitc wrote:
| Regardless of the details here, it's become quite clear that
| the judicial system is for corporations. It doesn't matter
| whether they win, lose, or settle, as they win regardless,
| since the monetary benefits of what got them in court in the
| first place far outweigh any punishment or settlement cost.
| rnkn wrote:
| It seems the total disregard that the tech community showed
| toward copyright when it was artists losing out has come back
| to bite. Face-eating leopards, etc.
| bena wrote:
| I agree. I don't see the difference.
|
| That's the entire reason "clean room reverse engineering" is
| done.
|
| Using nothing but the binary itself, work out how things are
| done. Making sure that the reverse engineers don't even have
| access to any material that could look like it came from the
| other organization in question. And that it is provable.
| alickz wrote:
| who gets to copyright claim the various array sorting
| algorithms then?
| kjellsbells wrote:
| Days like this, I wonder what Borges would have made of such
| questions.
|
| "Pierre Menard, author of redis"
|
| I know from experience that parents are aggressively pushing
| their children into STEM to maximize their chances of being
| economically secure, but, I really feel that we need a
| generation of philosophers and humanists to sift through the
| issues that our technology is raising. What does it mean to
| know something? What does authorship mean? Is a translated work
| the same as the original? Borges, Steiner, and the rest have as
| much to contribute as Ellison, Zuckerberg, and Altman.
| p0w3n3d wrote:
| How is it anything different? You have no money. And Microsoft
| has. The problem on this is that it will give a huge leverage
| to rich companies over poor, because those rich can steal
| (memorize with AI) anything including music
| hn_throwaway_99 wrote:
| A slight aside, but this is the subtitle:
|
| > A few devs versus the powerful forces of Redmond - who did you
| think was going to win?
|
| I hate that kind of obnoxious "journalism". Sometimes the little
| guy is actually wrong. To clarify, I'm not commenting on the
| specifics of this case, I just hate how fake our online discourse
| has been by appealing to "big guy evil" before even bringing up
| the specifics of the case.
| epolanski wrote:
| I think you're misinterpreting the sentence.
|
| I think it merely implies MS has more resources to throw at the
| legal case.
| gpm wrote:
| I don't think that's something you can take away from the
| little-guy big-guy narrative. Class actions are funded by
| courts awarding lawyers _huge_ payouts if they win, not
| directly by the plaintiffs. There should be plenty of
| resources on both sides of this fight.
| mcmcmc wrote:
| You are sorely underestimating the legal resources
| available to one of the most powerful companies on earth
| gpm wrote:
| I don't believe I am. To flush out my statement more
| fully there are diminishing returns on investing more
| money into a lawsuit, and both sides in a class action
| with this much money at stake should be sufficiently
| funded to be far beyond the point of diminishing returns.
|
| I'm not claiming Microsoft doesn't have tons of
| resources, I'm claiming that the plaintiffs attorneys
| should be sufficiently funded that the difference in
| outcomes is negligible.
| megaman821 wrote:
| Maybe but lack of resources doesn't seem to be the main
| problem. A handful of devs claim copyright infringement, the
| Judge says show me and they can't. Maybe if they had millions
| of lawyers trying to get Copilot to produce their copyrighted
| code, their case would be stronger.
| yieldcrv wrote:
| they also have more resources to ensure they covered their
| liability surface before any legal case materialized
|
| aka the plaintiffs were wrong and had no idea what they were
| talking about
| hn_throwaway_99 wrote:
| I strongly disagree. I don't see how you can interpret that
| sentence, especially given the "who did you think was going
| to win?" part, and ignore the implication that Microsoft won
| _solely because of_ their size and money.
|
| There is actually zero evidence that the judge issued his
| ruling based on Microsoft's superior legal team, so why even
| put that sentence in there anyway?
| deciplex wrote:
| > Sometimes the little guy is actually wrong.
|
| He is, sometimes. Also sometimes, the moon passes exactly
| between the sun and Earth, a new star appears in the sky, the
| magnetic field of our planet reverses, a proton decays (jury is
| still out on that one, actually). Etc.
|
| Tools like Copilot are plagiarism machines. We know the data
| they're being trained on, and a conclusion of "that's
| plagiarism" is not - or anyway should not be - controversial.
| I'm not terribly _against_ the notion of a plagiarism machine
| but I am against the owners of such machines reaping profits
| from them to the exclusion of the people who provide the source
| material. This is theft.
|
| More importantly, getting back to big guys and little guys: big
| guys gang up on little guys all the time. It's usually how they
| get to be big. They tend to be the ones who realize that
| working together against the rest of us is to their benefit.
| So, in the interest of pushing back on that a little, and
| recognizing that I am after all a fellow "little guy"
| (figuratively speaking anyway), I tend to support the "little
| guy" unless I have overwhelming evidence confirming that they
| are, in fact, both _wrong_ and that _supporting them anyway
| would be against my best interest._ Neither is the case, here.
|
| At any rate, the subtitle here references a pretty ubiquitous
| and, I'm happy to report, increasingly well-known and
| understood facet of our economic and social institutions, which
| is that they absolutely positively do not work for us or
| further our interests in any sense.
| tpmoney wrote:
| One would think if these were "plagiarism machines", that one
| of the plaintiffs would have been able to produce even a
| single instance of the copying they alleged.
| JumpCrisscross wrote:
| > _big guys gang up on little guys all the time_
|
| And obnoxious individuals gum up enterprises. It's lazy to
| the point of dismissal to conclude based on bigness.
| ryandrake wrote:
| You can't predict right or wrong based on bigness, but you
| can very often predict who will win.
|
| EDIT: And by "win" I mean not who the judge will side with,
| but who will end up chugging along fine financially and who
| will end up broke.
| hn_throwaway_99 wrote:
| > EDIT: And by "win" I mean not who the judge will side
| with, but who will end up chugging along fine financially
| and who will end up broke.
|
| I can certainly agree with that sentence, but that is
| definitely not how the Register was referring to "win"
| (they clearly just meant the judicial outcome), so it's
| obnoxious to imply the legal ruling went Microsoft's way
| solely due to their greater resources.
| griftrejection wrote:
| Won't anyone think of the corporations? :(
| sensanaty wrote:
| Those poor corporations, however will they survive? I say
| we let them dump chemicals straight into our oceans, after
| all we don't want to _gum them up_ from earning infinite
| profit!
| Bognar wrote:
| It's The Register, they are always like this. Especially when
| Microsoft is involved.
| epolanski wrote:
| I am not strongly opinionated on this, but the very fact
| Microsoft used all the code it could find, bar their own has
| always looked suspicious to me.
| jfoster wrote:
| Is that a fact? If true, not sure whether it would have bearing
| on the legal questions, but certainly would make it seem like
| their actions are not in very good faith. Would love to hear
| their explanation if it did get raised in court.
| cdrini wrote:
| I mean, I imagine it used a lot of their public code, like VS
| code, typescript, the new windows terminal, or anything on
| https://github.com/microsoft . They didn't use their private
| code, but they didn't use anyone else's private code either.
| sensanaty wrote:
| They _claim_ to not use anyone 's private code, but I
| wouldn't trust the psychopathic C-suite at M$ not to murder
| kittens and human babies if it made the line go up a quarter
| of a percentage point, yet alone something like this.
| cdrini wrote:
| You're free to speculate, but they have on multiple
| occasions said they don't train on private repos.
| Furthermore, there's no real incentive for them to do so,
| since (1) there are a lot of public repos, and (2) training
| on private repos opens them up to leaking things like
| private keys which would be a nightmare. It just doesn't
| make a lot of sense for them to do it.
| WesternWind wrote:
| Wait... So Microsoft doesn't use Microsoft Teams, it uses Slack?
| danpalmer wrote:
| GitHub uses Slack, and has done since long before the Microsoft
| acquisition. GitHub also does a ton of chat-ops, or at least
| used to, so their migration from Campfire to Slack was a big
| move for the company, I doubt they want to move again.
| chrismsimpson wrote:
| If this is how the law is applied for code, are we to expect this
| is also how it will be applied for other data (e.g. audio a la
| Udio and Suno)?
| nashashmi wrote:
| Big question: this thing called "training" AI off of data, how
| much of this is "training" and how much of this is
| "synthesizing"? It seems like if code is being copied and
| rephrased, it is synthetic. Not much "learning" and "training"
| going on here.
| bsza wrote:
| Should we move to modified versions of FOSS licenses that forbid
| AI training?
|
| Found this: https://github.com/non-ai-licenses/non-ai-licenses
|
| Legally sound or not, these should at least prevent your code
| from being included in Copilot's training data, hopefully without
| affecting any other use case. I'm going to use one of these next
| time I start a new project.
| hardwaresofton wrote:
| Note that wouldn't be F/OSS -- maybe OSS but the F wouldn't be
| there.
| bsza wrote:
| Yes, that is clear. But personally I wouldn't want to write
| FOSS code anyway until Copilot learns to properly attribute
| FOSS code. Switching to a more permissive license later on
| shouldn't be an issue.
| cmeacham98 wrote:
| If copilot is ruled fair use it doesn't matter what your
| license is, fair use superceeds it.
| gpm wrote:
| > Legally sound or not, these should at least prevent your code
| from being included in Copilot's training data
|
| Has microsoft said this or something?
| bsza wrote:
| I assumed (heard somewhere) that they only include open
| source repos in the training data.
|
| Turns out I was wrong. They don't care.
|
| https://web.archive.org/web/20210708165143/https://twitter.c.
| ..
| stale2002 wrote:
| You can write whatever words you want on a piece of paper or
| uploaded to the info section of a GitHub repo.
|
| That doesn't mean anyone has to follow it.
|
| If it's legal to train on other people's stuff, without their
| permission, this would still apply to your code even if your
| code includes a license that said "I double extra declare that
| you can't train AI on this!!".
| cellis wrote:
| I would like to ask an obvious question to the legally inclined
| here. How is this any different than remixing a song
| (lyrics/audio)? It's not "identical", and doesn't output
| "verbatim" lyrics or audio. What is the distinction between <LLM>
| and <Singer/Remixer who outputs remixed lyrics/audio>. By a quick
| Google search it seems remixes violate copyright.
| default-kramer wrote:
| I'm not legally inclined, but... code and music are different?
| There must be different standards for when code is too similar,
| for when music is too similar, for when pictures are too
| similar, for when books are too similar.
|
| Also, remixes almost always _do_ contain verbatim lyrics and
| /or samples from the original song. LLM output isn't supposed
| to contain verbatim copies, but I've been told that sometimes
| it does. (I don't know much about LLMs and I don't think
| Copilot is useful. I want my 2010-era Intellisense back, when
| it was extremely fast and predictable.)
| yazzku wrote:
| > The judge disagreed, however, on the grounds that the code
| suggested by Copilot was not identical enough to the developers'
| own copyright-protected work, and thus section 1202(b) did not
| apply.
|
| How did they reach this conclusion? How can you prove that it
| never copies a code snippet verbatim, versus just showing that it
| does for one specific code snippet? The latter is a lot easier to
| show, but I don't know what is it exactly that the prosecution
| claimed. I guess the size of the copy also matters in copyright
| violations?
| cdrini wrote:
| I think there's a difference between a mathematical proof and
| legal proof. The mathematical proof would be "show that it
| never copies a code snippet verbatim", and you of course cannot
| prove that by example.
|
| Legal proof is I think different (not a lawyer). They're more
| pragmatic. If, observing a lot of cases where it does not
| verbatim copy, and, if an expert provides a reasonable argument
| as to why it is unlikely to verbatim copy, that is enough legal
| proof for a judge to conclude that the output is not identical
| enough to the developers copyrighted code.
| sagarpatil wrote:
| Off topic: How does the judiciary decide which judge to choose
| for such highly technical case?
| benced wrote:
| District courts can set their own policies. The Northern
| California District - where this was filed - allocates a case
| according to the last 2 digits of the case number. Source:
| https://www.cand.uscourts.gov/judges/civil-docketing-assignm...
| Tomte wrote:
| That's Matthew Butterick's case.
| slicktux wrote:
| Yet people keep feeding it their code by using GitHub as their
| repo... Just how we use the internet to share information;
| there's just no escaping it.
| snvzz wrote:
| All GitHub needs to do to make most happy is offer an opt-out
| toggle.
|
| It still doesn't.
| MagicMoonlight wrote:
| The issue I have is that these models are inherently trained to
| duplicate stuff. You train them by comparing the output to the
| original.
|
| If I made an "advanced music engine" which rips Taylor swift
| files and duplicates them, I would be sued to oblivion. Why does
| calling it an AI suddenly fix that?
|
| They should have to train them on information they legally own.
| cdrini wrote:
| They're not "inherently trained to duplicate"; I think that's a
| bit of a disingenuous oversimplification. They're trained to
| learn abstract patterns in large datasets, and remix those
| patterns in response to a prompt.
|
| "You train them by comparing the output to the original." To
| the best of my knowledge this isn't correct; can you expand or
| cite a reference?
| rrobukef wrote:
| They are trained to duplicate, we just hope they do so by
| abstracting patterns. Various techniques stack the deck to
| make it difficult to memorize everything but it still happens
| easily, especially for replicated knowledge.
|
| "You train them by comparing the output to the original." ->
|
| You train neural networks by producing output for known
| input, comparing the output with a cost-function to the
| expected output, and updating your system towards minimizing
| the cost, repeatedly, until it stops improving or you tire of
| waiting. Cost functions must have a minimal value when the
| output matches exactly the expected to work mathematically.
| Engineering-wise you can possibly fudge things and they
| probably do so ... now.
|
| I don't agree with your critiques. It isn't an
| oversimplification, published code literally works as stated.
| cdrini wrote:
| I disagree with the statement "they are trained to
| duplicate" because "to" implies a purpose/intent which is
| incorrect. I.e. "they are trained with the purpose of
| duplication". This is I believe pretty uncontroversially
| false. We already have methods to duplicate data. They are
| trained with the purpose of learning abstract patterns is
| much more correct. One of the biggest _problems_ of
| training is duplication, aka over-fitting. To say it's the
| purpose is imo disengenious.
|
| Ah I see what they meant by that statement. It is true that
| supervised learning operates on labelled input/output
| pairs, and that neural networks generally use gradient
| descent/back propogation. (Disclaimer: it's been a few
| years since I've done any of this myself so don't quite
| remember it that well, and the field has changed a lot).
| Note since the parameter space of the neural network is
| usually _significantly_ smaller than the training data set,
| a network will not tend to minimise that cost function near
| 0 for an individual sample since doing so will worsen the
| overall result. There is inherent "fudging", although near
| identical output can potentially happen. The statement here
| is more reasonable and similar to the training process than
| the first.
| perlgeek wrote:
| From the article:
|
| > The anonymous programmers have repeatedly insisted Copilot
| could, and would, generate code identical to what they had
| written themselves, which is a key pillar of their lawsuit since
| there is an identicality requirement for their DMCA claim.
| However, Judge Tigar earlier ruled the plaintiffs hadn't actually
| demonstrated instances of this happening, which prompted a
| dismissal of the claim with a chance to amend it.
|
| So, the problem is really one of the lack of evidence, which
| seems... like a pretty basic mistake from the plaintiffs?
|
| They could've taken a screencap video back when Copilot still
| produced code more verbatim, and used that as evidence, I assume.
| albertTJames wrote:
| Looking good ! Go Copilot !
| lumb63 wrote:
| It seems to me that regardless of the outcome of this case, some
| developers do not want to have their code used to train LLMs.
| There may need to be a new license created to restrict this usage
| of software. Or, maybe developers will simply stop contributing
| open source. In today's day and age, where open source code
| serves as a tool to pad Microsoft's pockets, I certainly will not
| publish any of my software open source, despite how much I would
| like to (under GPL) in order to help fellow developers.
|
| If I were Microsoft, I'd really be concerned that I'm going to
| kill my golden goose by causing a large-scale exodus from GitHub
| or open source development more generally. Another idea I've
| considered is publishing boatloads of useless or incorrect code
| to poison their training data.
|
| As I see it, people should be able to restrict how people use
| something that they gave them. If some people prefer that their
| code is not used to train LLMs, there should be a way to enforce
| that.
| xinayder wrote:
| > I certainly will not publish any of my software open source,
| despite how much I would like to (under GPL) in order to help
| fellow developers.
|
| I think this is a rather radical approach. You're undermining
| the OSS movement because you dislike Microsoft (I do too). I
| think adding a clause or dual licensing your work is more
| effective at stopping big-tech funded AI crawlers than just not
| adhering to open source.
|
| You can host your code on sourcehut or Codeberg (Forgejo), you
| don't NEED to host it on a Microsoft owned platform.
| elzbardico wrote:
| I love the OSS movement. But the OSS movement is dependent on
| developers making a living somewhere else. If Microsoft
| effectively replace our class or at least a big part of it
| with AI, OSS becomes mostly irrelevant.
|
| Not everyone is multi-generationally rich or absurdly frugal.
| Most people like having good jobs.
| infecto wrote:
| I am personally happy to share all my public code to support
| the development of better models. While I believe the benefits
| of contributing to open source outweigh the drawbacks, and I
| don't foresee a "large-scale exodus from GitHub", it's
| ultimately up to individual developers to decide how their code
| is used.
| zamadatix wrote:
| "I don't license as open source because $something which I
| don't like could use my code" is a pretty common note over time
| but, despite almost always coming with a warning of the end of
| some large open source segment, is rarely impactful in any way.
| Some people probably will use a special license and most won't
| care except for when they run across projects using said one
| offs and it becomes a pain to integrate licensing models.
| naikrovek wrote:
| You silly whiners. The lawsuit was pure gesture from the
| beginning, and I said so at the time. You were all so sure that
| GitHub were breaking several laws, and now that you haven't
| gotten your way, you're saying the courts are corrupt.
|
| The mere fact that this suit was _dismissed_ means that there was
| not enough evidence to hold a trial. But you all think know
| better than the judge and the attorneys who worked on this, I
| assume?
|
| Without commenting further about the merit of the suit, I will
| say that it is very telling that everyone here thinks they know
| better than the legal professionals who worked on this case for
| probably hundreds of hours over the past few months, while those
| of you who are active commenters here have given this case no
| more than 10 hours of thought at the most.
|
| It is very sad to me that we no longer trust professionals, and
| each believe ourselves to be smarter and more capable than anyone
| else at a career that we don't even practice. Moreover, a lot of
| you seem to believe that you have unique realizations that the
| professionals working on these things have all somehow missed.
|
| Each of you may be (and probably are, really) the world's
| foremost expert in _something_ and I need you all to understand
| that being an expert in one or more things does not grant you
| expertise in anything else. You may be the most valuable software
| developer at a government contractor doing top secret work, and
| you may be so knowledgeable that other companies contract your
| time for help with their work, and that 's awesome. but this
| skill inherently has zero bearing on your ability to understand a
| fucking lawsuit about copilot.
|
| It is hard for people to swallow the fact that "I'm very smart
| here, but not there" and they will often default to "I'm very
| smart here, so I am very smart there." That is not true by
| default, and this is very rarely true, even with effort spent to
| make it happen.
|
| The suit was dismissed because it didn't meet the criteria
| required. You do not know more than the people involved. You are
| not seeing some obvious fact that the experts have missed. You
| simply hate Microsoft and you want them to suffer, and you get
| mad when legal matters impede that.
| passwordoops wrote:
| "The lack of documents from the Windows maker is apparently down
| to "technical difficulties" in collecting Slack messages"
|
| Wait, I'm forced to use Teams at work but Microsoft employees are
| on Slack?!
| nancyp wrote:
| Linux/OSS is cancer. Said who? Anything in public domain is for
| grab by them.
|
| Until the open tech community is chicken enough to not boycott
| their no open source stuff such as github and linked in a proof
| nothing will happen.
| warkdarrior wrote:
| Sir, are you OK??
| chidli1234 wrote:
| Microsoft has deep pockets. Judges aren't objective. More at 11.
___________________________________________________________________
(page generated 2024-07-10 23:01 UTC)