[HN Gopher] GitHub Copilot is not infringing copyright
       ___________________________________________________________________
        
       GitHub Copilot is not infringing copyright
        
       Author : aarroyoc
       Score  : 256 points
       Date   : 2021-07-05 11:10 UTC (11 hours ago)
        
 (HTM) web link (juliareda.eu)
 (TXT) w3m dump (juliareda.eu)
        
       | kube-system wrote:
       | > If it were not possible to prohibit the use and modification of
       | software code by means of copyright, then there would be no need
       | for licences that prevent developers from making use of those
       | prohibition rights (of course, free software licenses would still
       | fulfil the important function of contractually requiring the
       | publication of modified source code).
       | 
       | The parenthetical backpedaling here is the _entire point_ of
       | copyleft. If it wasn't, copyleft wouldn't exist -- people would
       | just release their software as public domain.
       | 
       | The opposite of "copyleft" isn't "copyright".
       | 
       | The opposite of "copyleft" is "never published", in which case,
       | copyright is irrelevant.
       | 
       | There is plenty of commercial closed-source software based on
       | software released under permissive licenses like BSD, MIT or
       | Apache, because they are not copyleft.
        
         | sombremesa wrote:
         | I'd argue that copyright is still relevant when the source code
         | isn't published. It's not too difficult to copy an algorithm
         | from a binary even if you don't have the source.
        
           | kube-system wrote:
           | Fair. When I wrote that, I was thinking "not published" as in
           | server software.
        
       | malwrar wrote:
       | Who cares if they're infringing copyright?
       | 
       | Microsoft bought the place that has a lot of our code and now is
       | going to try and sell us a tool that will regurgitate it back on
       | demand. The entire software industry is already largely based on
       | and advanced by the unpaid labor of open-source software project
       | developers. GitHub, as a popular open source ally, could at least
       | pretend to honor the gentleman's agreement to respect the
       | open-source origins of a ton of its stack.
       | 
       | If the tool was also open we probably wouldn't have nearly as big
       | a problem, but I guess Microsoft has to recoup the cost of their
       | completely unnecessary purchase.
        
         | IshKebab wrote:
         | Microsoft could easily do this even if they didn't own GitHub.
         | Anyone can download all of the code on GitHub.
        
       | chrisseaton wrote:
       | But doesn't Copilot generate verbatim copies of entire copyrighted
       | methods that implement non-trivial novel algorithms, including
       | comments?
       | 
       | The article doesn't seem to address this?
        
         | creshal wrote:
         | Yes. The author apparently did no research of her own and just
         | assumed Github's FAQ was trustworthy.
        
           | ramraj07 wrote:
           | Yes and you did no further research on your own since GitHub
           | already said it's going to fix that (and a competent engineer
           | would know it's trivial to fix that as well).
        
           | chrisseaton wrote:
           | Data mined and regenerated it.
        
       | rektide wrote:
       | I've seen way too many screenshots of complete dozen-line XHR
       | wrappers being suggested[1] to complete a function to imagine
       | Copilot as a generative machine. It's a somewhat fancy copy paste
       | engine, with phenomenal search. But it's smuggled through enough
       | complexity & machinery to obfuscate any legal obligations that
       | might be attached to the original source material.
       | 
       | The article does not set itself up to address this at all:
       | 
       | > Since Copilot also uses the numerous GitHub repositories under
       | copyleft licences such as the GPL as training material, some
       | commentators accuse GitHub of copyright infringement, because
       | Copilot itself is not released under a copyleft licence, but is
       | to be offered as a paid service after a test phase.
       | 
       | I'm all for discussion of whether Copilot itself has to be
       | copyleft. But to me, the immediate concern is that Copilot seems
       | like a way to take copyleft works and remove the copyleft license
       | from those works.
       | 
       | [1] https://mastodon.social/@cjd/106513694972486353
        
         | hnfong wrote:
         | Isn't that what GPL allows, and is what the AGPL is for if you
         | don't want people to take your code and host it as an online
         | service?
        
           | rektide wrote:
           | the service itself is a source-code-copier.
           | 
           | GPL does not permit you to copy source code without
           | attribution. this copier does not provide attribution.
           | 
           | as i just said, i'm not so interested in debating the source-
           | code-copier's licensing. i think it could go either way but i
           | don't really care. the copied source code that the source
           | code copier copies is interesting to me, and i feel like the
           | stochastic parrot act bullshit they are pulling is massive
           | massive sinfully evil bullshit without attribution. the
           | stochastic parrot can't just ignore all the licensing of
           | what it parrots out.
        
       | hu3 wrote:
       | Please copy my code. Reality is I'll be gone in 100 years tops
       | and I'd be more than glad if my crappy code actually helps
       | someone.
       | 
       | As for attribution, we all learn by looking at code from all
       | kinds of licenses. Between Stack Overflow, projects hosted in
       | GitHub, libraries that sit on our vendor directories and even
       | closed source projects there's a lot that is carried over to new
       | projects without attribution.
       | 
       | We're heading to a world where most projects are basically
       | libraries glued together anyway. Standing on the shoulders of
       | giants and all that.
       | 
       | The dream of an omniscient pair programming buddy is slowly
       | coming to fruition and I for one welcome it.
       | 
       | Copilot is just a tool, a fancy search engine for the code that's
       | available online. Projects should be judged by the way they use
       | Copilot just like I'm judged if I misuse my car.
       | 
       | I couldn't care less whether my name is shoved in some ever
       | increasing CONTRIBUTOR.md file that no one but machines will
       | read.
       | 
       | I'm actually going to start documenting blocks of code more
       | thoroughly so Copilot can better infer what each block does.
        
       | COMMENT___ wrote:
       | TL;DR: GitHub will eventually add some kind of "data usage
       | reporting" utility that shows which parts of your final code,
       | made with the help of this CuckPilot, could potentially infringe
       | copyright, with links to other known sources of those parts.
       | Then they will tell you that it is your responsibility to
       | ensure that your final code does not have copyright issues.
        
       | Syzygies wrote:
       | Whatever the law, when does learning from what we read devolve
       | into plagiarism?
       | 
       | The poster child for this category would be those programs that
       | generate nonsense English text that recognizably resembles a
       | known author. They choose the next character at random,
       | conditionally based on the previous characters. Too short a
       | context, and the results are gibberish. Too long a context, and
       | the results are plagiarism.
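
A minimal sketch of the kind of generator described above, assuming a plain character-level Markov chain. The function names (`build_model`, `generate`, `longest_shared_run`) and the sample sentence are illustrative and not taken from the thread. It picks each next character at random from the characters that followed the same `order`-character context in the source, so the context length can be varied to see the trade-off between gibberish and verbatim copying.

```python
import random
from collections import defaultdict


def build_model(text, order):
    """Map each `order`-character context to the characters that follow it."""
    if order < 1:
        raise ValueError("order must be at least 1")
    model = defaultdict(list)
    for i in range(len(text) - order):
        model[text[i:i + order]].append(text[i + order])
    return model


def generate(text, order, length, seed=0):
    """Generate up to `length` characters, conditioning on the last `order` characters."""
    rng = random.Random(seed)
    model = build_model(text, order)
    out = text[:order]  # start from the opening context of the source
    while len(out) < length:
        followers = model.get(out[-order:])
        if not followers:  # context only occurs at the very end of the source
            break
        out += rng.choice(followers)
    return out


def longest_shared_run(sample, source):
    """Length of the longest substring of `sample` found verbatim in `source`."""
    best = 0
    for i in range(len(sample)):
        j = i + best + 1
        while j <= len(sample) and sample[i:j] in source:
            best = j - i
            j += 1
    return best
```

With the sentence "the quick brown fox jumps over the lazy dog" and `order=5`, every five-character context occurs exactly once, so `generate` can only replay the source character for character. With `order=1` the output is a shuffle of plausible letter pairs. `longest_shared_run` gives a crude measure of how much of the output is copied.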
        
       | phoe-krk wrote:
       | _> On the other hand, the argument that the outputs of GitHub
       | Copilot are derivative works of the training data is based on the
       | assumption that a machine can produce works. This assumption is
       | wrong and counterproductive. Copyright law has only ever applied
       | to intellectual creations - where there is no creator, there is
       | no work. This means that machine-generated code like that of
       | GitHub Copilot is not a work under copyright law at all, so it is
       | not a derivative work either. The output of a machine simply does
       | not qualify for copyright protection - it is in the public
       | domain. That is good news for the open movement and not something
       | that needs fixing._
       | 
       | This is very good news. This line of thought implies that we can
       | legally feed all proprietary code into GitHub Copilot in order to
       | teach it all the patented and secret tricks of the companies we
       | can see (since data mining is not copyright infringement) in order
       | to have it print those secrets back when we ask it to (so they
       | become public domain).
       | 
       | /s
        
         | amelius wrote:
         | > This line of thought implies that we can legally feed all
         | proprietary code into GitHub Copilot
         | 
         | And lines such as:
         | 
         |     char *base64_pirated_mp4_file = "YW55IGNhcm5hbCBwbGVhc3VyZS4...";
        
         | shp0ngle wrote:
         | Patent is not copyright.
         | 
         | Secret tricks are usually a company secret, not really
         | protected by copyright either.
         | 
         | NDAs and similar are also not copyrights.
         | 
         | Even trademark is not copyright.
        
         | tgsovlerkhgsel wrote:
         | "Patented and secret tricks" are not protected by copyright, if
         | the output was an actual reimplementation of an idea instead of
         | Copilot regurgitating existing code
         | (https://news.ycombinator.com/item?id=27710287).
         | 
         | The specific implementations are protected by copyright, and
         | the ideas may be protected by patents. In the case of "secret"
         | tricks, they may be protected by trade secret laws, but not if
         | it's in a public GitHub repo.
        
         | jakobdabo wrote:
         | > The output of a machine simply does not qualify for copyright
         | protection
         | 
         | Good, does the `cp` or `cat` command qualify for the "output of
         | a machine"? Now I can uncopyright everything, hooray. What
         | about converting a video or an image to another format? Again,
         | it's just output of a machine.
         | 
         | Added:
         | 
         | Really, I would've been happy if this was the situation, as
         | I'm, in general, against patents and copyright (in the form
         | that they are now being used).
        
         | chalst wrote:
         | > Copyright law has only ever applied to intellectual creations
         | - where there is no creator, there is no work
         | 
         | This is, at best, an oversimplification. Code compiled from
         | copyrighted source code is a derived work inheriting that
         | copyright according to long-established law. This is exactly
         | why the legal issues around machine learning applied to
         | copyrighted corpora have been contentious.
        
         | b3morales wrote:
         | Wow, that's quite a strawman/bait-and-switch from the article;
         | thanks for highlighting it.
         | 
         | If Copilot is just a machine -- a glorified typewriter -- then
         | the machine's _operator_ is responsible for its output.
         | 
         | Or does the author seriously want to claim that any code added
         | via Copilot to a proprietary codebase would not be proprietary
         | as well? If that were true, Copilot's userbase is going to
         | be...limited.
        
           | syshum wrote:
           | So then we come full circle on infringement: if the operator
           | is responsible for the code produced by co-pilot, then the
           | article's claim that it is not infringement because it is
           | machine-created fails, as the operator is responsible.
           | 
           | The argument in the article is that all code made by co-pilot
           | is not infringement because there is no copyright attached.
           | You seem to imply that all code made by co-pilot is
           | copyrighted by the operator of co-pilot, and thus would fall
           | under copyright law and could be infringing.
        
             | b3morales wrote:
             | > The argument in the article is that all code made by co-
             | pilot is not infringement because there is no copyright
             | attached.
             | 
             | Right; I don't see how that's even remotely a tenable
             | argument. I think the article is trying to eat its cake and
             | have it too.
        
         | bsza wrote:
         | Except Copilot itself is not open source, so your only way to
         | feed that proprietary code into it would be to upload it to
         | github, which would make _you_ an infringer.
        
           | heavyset_go wrote:
           | As you type you send the code you're writing to Copilot's
           | backend.
        
             | bsza wrote:
             | From the copilot telemetry docs [0]:
             | 
             | > _The GitHub Copilot collects activity from the user's
             | Visual Studio Code editor, tied to a timestamp, and
             | metadata._
             | 
             | [...]
             | 
             | > _This data will only be used by GitHub for:_
             | 
             | [...]
             | 
             | > - _Improving the underlying code generation models, e.g.
             | by providing positive and negative examples (but always so
             | that your private code is not used as input to suggest code
             | for other users of GitHub Copilot)_
             | 
             | I'm inclined to believe this. After all, why would they
             | taint the training data with code from a random guy who is
             | _asking_ for help when they have more than a hundred
             | thousand repos with 100+ stars?
             | 
             | [0] [https://docs.github.com/en/github/copilot/about-
             | github-copil...]
        
           | TeMPOraL wrote:
           | > _your only way to feed that proprietary code into it would
           | be to upload it to github, which would make you an infringer_
           | 
           | _Somebody_ has to do it. Someone ready to take one for the
           | team (or better yet, already skilled in software piracy, so
           | that it's not a big deal for them). Then, if this argument
           | holds, _everyone_ gets the result in public domain.
           | 
           | As I see it, the argument presented in this post essentially
           | makes Copilot a universal copyright laundering machine. Not
           | just for code, but for anything that can be represented
           | digitally.
           | 
           | Obviously this won't stand. While I can see Github ending up
           | protected from all liability, the only way for this to not
           | kill copyright is for the _users_ of Copilot to become at
           | risk of copyright infringement. Which kills the whole value
           | proposition of Copilot.
        
             | dTal wrote:
             | This seems like a good time to ask what the heck _is_ the
             | value proposition of a thing like this. Are people really
             | going to use the output of this blindly? And if they're
             | going to audit every line - is that really easier than just
             | writing the code yourself? Honestly, _at best_ it feels
             | like a machine for introducing pernicious bugs that "look
             | right" but are semantically wrong. (Which reminds me - were
             | any of the Underhanded C Contest entries in the training
             | data?)
        
               | TeMPOraL wrote:
               | That's a very good question. On the surface, the idea
               | _seems_ to be helping people write code faster - but as
               | you observe, properly auditing generated code is _more_
               | work than actually writing it from scratch.
               | 
               | Best I can think of in terms of real value delivered, is
               | helping people with first drafts, breaking through the
               | "staring at an empty page" problem. But even with this, I
               | feel it's too risky compared to doing a StackOverflow
               | search, where you can at least see some explanations,
               | discussions, and other relevant context.
               | 
               | It's definitely an interesting vision demonstrator -
               | despite not being quite there, it lets us see that a tool
               | like this that _actually worked well_ (in terms of
               | generating correct, explainable, license-respecting code)
               | would be very useful.
        
               | visarga wrote:
               | Assume GitHub designs a filter to detect similarities to
               | the training set and displays an attribution link with
               | the result, as a comment. It's no different from using a
               | search engine to find the code and putting it in your
               | project, especially since the code is public already and
               | visible to multiple search engines. You are ultimately
               | responsible, just like you are every day with Google.
               | 
               | But the model has on average just 1 regurgitation in 10
               | weeks per user, so you can just discard all of them.
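
A rough sketch of the kind of similarity filter hypothesised above, assuming a simple token n-gram ("shingle") overlap check. This is not GitHub's actual mechanism. The names `shingles`, `build_index`, and `find_matches`, the window size, and the threshold are all illustrative assumptions. The idea is to index the known sources once, then report which of them share a large fraction of a suggestion's token windows so an attribution link could be shown.

```python
from collections import defaultdict


def shingles(code, n=6):
    """Return the set of n-token windows in `code`, ignoring whitespace layout."""
    tokens = code.split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def build_index(corpus, n=6):
    """Map each shingle to the sources that contain it.

    `corpus` is a dict of {source_name: code_text}.
    """
    index = defaultdict(set)
    for source, code in corpus.items():
        for sh in shingles(code, n):
            index[sh].add(source)
    return index


def find_matches(suggestion, index, n=6, threshold=0.5):
    """Return {source: overlap_ratio} for sources sharing at least
    `threshold` of the suggestion's shingles."""
    sugg = shingles(suggestion, n)
    if not sugg:
        return {}
    hits = defaultdict(int)
    for sh in sugg:
        for source in index.get(sh, ()):
            hits[source] += 1
    return {
        source: count / len(sugg)
        for source, count in hits.items()
        if count / len(sugg) >= threshold
    }
```

A suggestion that repeats an indexed snippet with different indentation still matches, because only the token sequence is compared. A renamed variable breaks every window that contains it, which is one reason a real filter would need something more robust than this.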
        
             | visarga wrote:
             | Almost all the output of Copilot is not an exact copy of
             | any code in the training set. You discard the 0.001% of
             | generations that are similar to the training data and use
             | the rest.
        
       | captaincaveman wrote:
       | If I understand what is being stated correctly; even if I assert
       | a prohibition in my licence for my creative work (code) not to be
       | used by Copilot (or any other machine learning model as training
       | data), it wouldn't matter as it's not covered by Copyright?
        
       | yakubin wrote:
       | _> The output of a machine simply does not qualify for copyright
       | protection - it is in the public domain._
       | 
       | Does it mean that compiler output does not qualify for copyright
       | protection and I may legally share copies of MS Word via torrent?
        
       | dragonwriter wrote:
       | > Copyleft does not benefit from tighter copyright laws
       | 
       | Of course it does, at least the goal copyleft serves for
       | RMS-style Free Software ideologues. While copyleft may be
       | motivated by an ideology that prefers _no_ copyright protections,
       | at least for software, it relies on copyright maximalism to avoid
       | nonfree derivatives. From the advocates' viewpoint, the worst
       | situation is a copyright regime that is strong enough to allow
       | nonfree software to exist but too weak to support an iron wall
       | that keeps software built by ideological opponents of nonfree
       | software from being used to advance nonfree software.
        
       | creshal wrote:
       | > On the other hand, the argument that the outputs of GitHub
       | Copilot are derivative works of the training data is based on the
       | assumption that a machine can produce works. This assumption is
       | wrong and counterproductive. Copyright law has only ever applied
       | to intellectual creations - where there is no creator, there is
       | no work.
       | 
       | Cool. I'll just train my new AI on 20 different copies of the
       | same Disney movie and have it generate a new movie. Checkmate,
       | lawyers!
        
         | alpaca128 wrote:
         | "The model might be slightly overfitted, but no creator, no
         | work"
        
         | nkrisc wrote:
         | I don't think the judge will care _how_ you arrived at the
         | copyright violation, only that you did. But hey, I'd love to
         | see that court case anyhow, maybe I'm wrong.
        
           | TeMPOraL wrote:
           | Yes, and this principle should apply equally well to Copilot
           | - in particular, to anyone _using_ code provided by Copilot
           | in their projects.
        
         | arcturus17 wrote:
         | You do understand that's not how laws work in general, right?
         | 
         | The court of law would probably have you unveil and tear apart
         | your process and find that you were trying to plagiarize in a
         | roundabout way.
        
           | xdennis wrote:
           | > you were trying to plagiarize in a roundabout way
           | 
           | So using machine learning with one movie is illegal, but
           | using it with a million isn't?
        
             | mthoms wrote:
             | Generally speaking - probably yes. That's because the
             | output would be (a) transformative and (b) not likely to
             | affect the profitability of the original work (at least not
             | directly).
             | 
             | Note: I'm not really trying to comment specifically about
             | the code/movie examples - just the general notion that the
             | more input there is (from different sources), the
             | likelihood that the use will be considered "Fair Use"
             | increases.
             | 
             | https://en.wikipedia.org/wiki/Fair_use
        
           | Sr_developer wrote:
           | Do you think this Copilot is some sort of advanced AGI that
           | just became a genius programmer? Almost every piece of code
           | that it "generates" can be found almost verbatim in one or
           | several public repos.
        
           | creshal wrote:
           | > The court of law would probably have you unveil and tear
           | apart your process and find that you were trying to
           | plagiarize in a roundabout way.
           | 
           | Well, yes, that's the point.
        
             | arcturus17 wrote:
             | I'm not sure what the point is. That the argument of
             | copying Disney movies is not a good analogy, or that courts
             | will find plagiarism in software copyright cases involving
             | Copilot?
        
               | TeMPOraL wrote:
               | The point is that Copilot is effectively doing the Disney
               | movie thing, just with code, and yet this article argues
               | this is all fine. As it is, the article turns Copilot
               | into a universal copyright laundering machine.
        
           | phoe-krk wrote:
           | > The court of law would probably have you unveil and tear
           | apart your process and find that you were trying to
           | plagiarize in a roundabout way.
           | 
           | This is the whole idea. Copilot is spitting out considerable
           | chunks of code that is licensed under GPL and it will be up
           | to GitHub to prove that Copilot is _not_ trying to plagiarize
           | this code in a roundabout way.
           | 
           | At the very least, Copilot should have separate data stores
           | for different groups of licenses: public domain, attribution-
           | only, copyleft, etc. That would already make it much more
           | usable than the current "here's some code, it came from I
           | don't know where, don't ask me" that literally looks like
           | black market deals except they are GitHub-branded.
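
The "separate data stores" idea can be sketched as a partition of training repositories by license family. This is only an illustration under stated assumptions: the `LICENSE_GROUPS` mapping is a hypothetical, deliberately coarse grouping of a few SPDX identifiers, and `partition_by_license` is an invented helper, not anything Copilot is known to do.

```python
from collections import defaultdict

# Hypothetical, coarse grouping of a few SPDX license identifiers.
# A real system would need a reviewed and much longer list.
LICENSE_GROUPS = {
    "CC0-1.0": "public-domain",
    "Unlicense": "public-domain",
    "MIT": "attribution-only",
    "BSD-3-Clause": "attribution-only",
    "GPL-2.0-only": "copyleft",
    "GPL-3.0-only": "copyleft",
    "AGPL-3.0-only": "copyleft",
}


def partition_by_license(repos):
    """Split (repo_name, spdx_id) pairs into one list per license group.

    Repositories with an unrecognised or missing license go to "unknown",
    so they can be excluded rather than silently mixed in.
    """
    stores = defaultdict(list)
    for name, spdx_id in repos:
        stores[LICENSE_GROUPS.get(spdx_id, "unknown")].append(name)
    return dict(stores)
```

A model trained or queried per store could then at least say which license family a suggestion came from, which is the minimum the comment above asks for.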
        
             | visarga wrote:
             | > Copilot is spitting out considerable chunks of code that
             | is licensed under GPL
             | 
             | One regurgitation in 10 weeks per user. Not considerable.
             | Could be just skipped with a simple search by Github.
        
           | hmfrh wrote:
           | > You do understand that's not how laws work in general,
           | right?
           | 
           | The only reason the law doesn't work this way for Microsoft
           | Copilot is because the copyright holders are individuals who
           | do not have the capital or expertise to file suit.
           | 
           | If Microsoft instead released a video editor addon that was
           | trained on Disney movies and which would sometimes insert
           | scenes of _any_ Disney movie you can bet your ass we wouldn't
           | be having the same discussion.
        
             | visarga wrote:
             | Comparing code to movies - in code even a single char
             | difference can change the meaning of everything, in movies
             | - you can skip whole scenes and still get the meaning. I
             | don't think the two are compatible, they are judged by
             | different standards.
        
       | CyberRabbi wrote:
       | > Works licensed under copyleft may be copied, modified and
       | distributed by all, as long as any copies or derivative works may
       | in turn be re-used under the same license conditions. This
       | creates a virtuous circle, thanks to which more and more
       | innovations are open to the general public.
       | 
       | She claims that Copilot advances the goals of copyleft, but
       | Copilot does not create a "virtuous cycle" of generating more
       | public IP. The customers of Copilot use it to extract public
       | work for themselves and are not compelled to contribute back.
       | 
       | Copilot is anti-FOSS plain and simple.
        
       | oolonthegreat wrote:
       | Such a weird argument: "Copyleft people should not argue for
       | better copyright". What does that even mean?
        
         | pessimizer wrote:
         | She's arguing that copyleft people are arguing for an effective
         | extension of copyright into places IP lobbyists are currently
         | fighting for. It's not a good framing. She's saying that we
         | shouldn't argue for copyright to be consistent if we're against
         | copyright - arguing that we should make a moral argument
         | against a legal situation.
         | 
         | It's as if we couldn't argue against drug companies being
         | allowed to sell heroin if we were anti-drug war and drugs would
         | remain illegal. It's a strategy argument that leads nowhere. If
         | the result of making machine written works also subject to
         | copyright results in all possible songs being copyrighted by a
         | machine, _that's a good outcome._ It's obviously absurd and
         | weakens the entire concept.
         | 
         | We should demand consistency.
         | 
         | If this is fine, we might as well stop enforcing the GPL, too.
         | It's a trick of copyright to further the cause of anti-
         | copyright. I'm sure somebody can write an "auto-fork" that will
         | digest GPL'd code and rearrange and rephrase it in order to
         | spit out a clone.
        
       | codesections wrote:
       | Julia Reda's analysis depends on the factual claim in this key
       | passage:
       | 
       | > In a few cases, Copilot also reproduces short snippets from the
       | training datasets, according to GitHub's FAQ.
       | 
       | > This line of reasoning is dangerous in two respects: On the one
       | hand, it suggests that even reproducing the smallest excerpts of
       | protected works constitutes copyright infringement. This is not
       | the case. Such use is only relevant under copyright law if the
       | excerpt used is in turn original and unique enough to reach the
       | threshold of originality.
       | 
       | That analysis may have been reasonable when the post was first
       | written, but subsequent examples seem to show Copilot reproducing
       | far more than the "smallest excerpts" of existing code. For
       | example, the excerpt from the Quake source code[0] appears to
       | easily meet the standard of originality.
       | 
       | [0]: https://news.ycombinator.com/item?id=27710287
        
         | joe_the_user wrote:
         | [Ianal]
         | 
         | The thing about the situation is that "copying code you found
         | on the Internet" certainly isn't automatically, always legal.
         | That you engaged in copying X from the Internet doesn't make
         | it illegal either. Your source for the source code you
         | incorporate into a product doesn't matter; what matters is
         | whether that code is copyrighted and what the license terms (if
         | any) are (and people saying "copyright doesn't apply to
         | machines" are wildly misinterpreting things imo).
         | 
         | Given what's come out, it seems plausible that you could coax
         | the source of whatever smallish open source project you wished
         | out of copilot. Claiming copyright on that code wouldn't be
         | legal regardless of Copilot.
         | 
         | Whether Microsoft/Github would be liable is another question as
         | far as I can tell. I mean, Youtube-dl can be used to violate
         | copyright but it isn't liable for those violations. The only
         | way Copilot is different from youtube-dl is that it tells its
         | users everything is OK, and "they told me it was OK" is
         | generally not a legal defense (i.e., I don't know for sure but
         | I'd be shocked if the app shielded its users from liability).
         | All the open source code is certainly "free to look at", and
         | Copilot putting that on a programmer's screen isn't doing more
         | than letting the programmer look at it until the programmer
         | does something (incorporating it into a released work they
         | claim as their own would be such an act).
         | 
         | The question is how easily a programmer could accidentally come
         | up with a large enough piece of a copyrighted work using
         | Copilot. That question seems to be open.
         | 
         | TL;DR: My entirely amateur legal opinion is that Copilot can't
         | violate copyright but that its users certainly can.
        
         | riedel wrote:
         | I would love to try a session of clean room reverse engineering
         | using copilot. I would bet you get reasonably far for very
         | common libraries with not much effort. The question would be if
         | such compression/decompression would infringe copyright.
        
         | make3 wrote:
         | It can, that does not mean that it will, in any case other than
         | people actively probing it for that.
        
         | creshal wrote:
         | I'm not sure if making an "analysis" without doing any research
         | whatsoever is reasonable.
        
           | codesections wrote:
           | I'm not sure either -- which is why I said "may have been
           | reasonable" instead of "was reasonable" :)
           | 
           | I can see an argument for doing your own research, but I can
           | also see an argument for basing an analysis on what GitHub
           | said in the FAQ -- I'm honestly a bit surprised that
           | Microsoft's lawyers let them say that with a product that can
           | reproduce such large blocks of verbatim code.
        
             | creshal wrote:
             | My guess is that their lawyers weren't consulted, and that
             | the Github people just shipped it on their own.
        
               | hnfong wrote:
               | It's not obvious that _Microsoft_ is violating copyright
               | yet. The main concern is whether the product makes others
               | liable.
               | 
               | So it could be that the executives really wanted to do
               | it, and the lawyers thought "OK, _technically_ we're not
               | violating anything...."
        
               | yunohn wrote:
               | That literally cannot happen in FAANG/MS, esp not when
               | the CEO announces the product in a public blog post.
        
               | swiftcoder wrote:
               | Yep. Individuals in a FAANG don't have the ability to
               | launch a product without review. Just drafting a press
               | release for a new product involves Comms oversight and
               | VP-level approval.
        
         | denton-scratch wrote:
         | > appears to easily meet the standard of originality
         | 
         | It's an algorithm. In the olden days, you couldn't copyright an
         | algorithm, even an original one. There's only so many ways you
         | can express an algorithm; the best ways are using code. So is
         | it the intention that rewriting in Python an algorithm
         | previously expressed in C would be infringing? Suppose the
         | algorithm is re-expressed in English?
         | 
         | Allowing copyrights on algorithms is tantamount to allowing
         | copyrights on thought-processes.
         | 
         | Here come the thought-police. Take cover.
        
           | geofft wrote:
           | Copyright is (and has been, since the earliest days) about
           | protecting the _creative expression_ of an idea.
           | 
           | You can't copyright an algorithm, but you certainly can
           | copyright the expression of an algorithm in Python. You
           | cannot copyright the words of the English language and their
           | meanings, but Noah Webster absolutely did copyright his
           | dictionary, which was a creative expression of their
           | definitions (and lobbied for the first increase to US
           | copyright law). Webster wasn't the "thought police" for
           | trying to copyright people's understanding of words in
           | English, because he didn't and couldn't copyright them; he
           | copyrighted his expression of what words meant.
           | 
           | If you read the creative expression of an algorithm in Python
           | and then re-express it in English, then sure, copyright
           | protection doesn't extend to that re-expression. But Copilot
           | isn't doing that, it's quite clearly reproducing parts of the
           | original creative expression of an algorithm, not the
           | algorithm itself.
           | 
           | Here's an easy way to demonstrate it: open up a source file
           | in any language _other_ than C and try to get Copilot to spit
           | out an implementation of Quake's fast-inverse-square-root
           | algorithm. You will very quickly discover that Copilot
           | doesn't "know" the algorithm; it only "knows" the specific
           | creative expression of it (comments included).
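To make the algorithm-versus-expression distinction concrete, here is a rough sketch of the same algorithm re-expressed in Python. It assumes the widely documented form of the trick (the magic constant 0x5f3759df followed by one Newton-Raphson step); the function name `fast_inverse_sqrt` and the use of `struct` to reinterpret bits are illustrative choices, not taken from any original source. It shares the idea with the famous C function but none of its text or comments.

```python
import struct


def fast_inverse_sqrt(x: float) -> float:
    """Approximate 1/sqrt(x) for positive x using the bit-level trick."""
    # Reinterpret the 32-bit float's bits as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Magic-constant step: a cheap first guess for 1/sqrt(x).
    bits = 0x5F3759DF - (bits >> 1)
    # Reinterpret the integer bits back as a 32-bit float.
    y = struct.unpack("<f", struct.pack("<I", bits))[0]
    # One Newton-Raphson iteration to refine the guess.
    return y * (1.5 - 0.5 * x * y * y)


if __name__ == "__main__":
    print(fast_inverse_sqrt(4.0))  # close to 0.5, not exact
```

The result is only an approximation (within roughly a percent for the inputs tried here), and the sketch handles positive inputs only.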
        
           | eesmith wrote:
           | There are no thought police here.
           | 
           | In the US, copyright may include the choice of variable
           | names, the organization of the code into modules and
           | functions, and other aspects where there are creative
           | choices that may be protected under copyright law.
           | 
           | The relevant process is described at
           | https://en.wikipedia.org/wiki/Abstraction-Filtration-Compari...
           | , which comes from the court case at
           | https://en.wikipedia.org/wiki/Computer_Associates_Internatio....
           | nearly 30 years ago:
           | 
           | > the court presented a three-step test to determine
           | substantial similarity, abstraction-filtration-comparison.
           | This process is based on other previously established
           | copyright principles of merger, scenes a faire, and the
           | public domain.[1] In this test, the court must first
           | determine the allegedly infringed program's constituent
           | structural parts. Then, the parts are filtered to extract any
           | non-protected elements. Non-protected elements include:
           | elements made for efficiency (i.e. elements with a limited
           | number of ways it can be expressed and thus incidental to the
           | idea), elements dictated by external factors (i.e. standard
           | techniques), and design elements taken from the public
           | domain. _Any of these non-protected elements are thrown out
           | and the remaining elements are compared with the allegedly
           | infringing program's elements to determine substantial
           | similarity._
           | 
           | Emphasis mine. This specifically highlights that your example
           | ('only so many ways you can express an algorithm') is _not_
           | protected under US copyright law.
           | 
           | The originality requirement only applies to other aspects of
           | the generated code, which in this case would include the
           | comments that Copilot generated, and which clearly are not
           | required for the algorithm to work.
           | 
           | For thought police like you describe, look to patent law.
        
           | jen20 wrote:
           | > So is it the intention that rewriting in Python an
           | algorithm previously expressed in C would be infringing?
           | 
           | Yes, a port from language X to Y is widely considered a
           | derived work. Whether it is infringing is a separate
           | question.
        
           | dang wrote:
           | Please omit flamebait from your HN comments. It tends to
           | produce flamewars, which are tedious and nasty. Your comment
           | would be fine without the last two sentences.
           | 
           | https://news.ycombinator.com/newsguidelines.html
        
             | denton-scratch wrote:
             | I didn't realise I was perpetrating flamebait; my last two
             | sentences were meant as rhetorical hyperbole (and I wasn't
             | targeting anyone here!)
             | 
             | At any rate, I like it here; so I'll try to figure out how
             | what I said was flamebait, and try not to say such things
             | again.
             | 
             | Sorry.
             | 
             | [Edited upon re-reading]
        
           | glitchc wrote:
           | Clarification: One cannot patent an algorithm, but an
           | implementation in source code can certainly be copyrighted.
        
             | Wowfunhappy wrote:
             | Huh, that's interesting. While I'm hesitant to suggest that
             | what the world needs is even more patents, this doesn't
             | make immediate sense to me.
             | 
             | Let's say someone comes up with a new sorting algorithm,
             | which completes in fewer cycles than was previously believed
             | possible. Sure, it's math, but isn't that a new, creative
             | expression? Don't we want to encourage them to publish
             | their algorithm (one of the key purposes of patents--this
             | way, anyone can use it after 20 years), as opposed to
             | keeping it hidden from the world?
             | 
             | It makes more sense to me than most software patents
             | (admittedly, a low bar to clear). And if the patent office
             | is doing its job (big if), the patents should only be
             | granted for algorithms which are sufficiently novel.
        
               | denton-scratch wrote:
               | A new super-fast sorting algorithm (not just a few
               | cycles, but something that actually changes the O-number)
               | would obviously be a fantastic boon - I would want the
               | inventor to benefit from his cleverness.
               | 
               | But nowadays I think patent law isn't the right way to do
               | that; trade secrets should be enough. I don't think that
               | what is disclosed to the public in patent applications is
               | of enough value to justify a long monopoly. It's not
               | necessarily a problem with the written law; patents are
               | horrible because of the way courts apply them.
        
             | gruez wrote:
             | >One cannot patent an algorithm
             | 
             | software patents?
        
               | tovej wrote:
               | An algorithm is maths, you can't patent maths. Patent
               | lawyers and business people have however somehow managed
               | to convince courts/patent authorities that configurations
               | of computer systems are patentable (or some similar
               | argument), which then makes software patentable (IANAL
               | but I think it's something like this).
               | 
               | Either way, the copyright of source code is separate from
               | that. Copyright is for the text of a program (the source
               | code), that might e.g. implement an algorithm. The
               | algorithm itself cannot be patented or otherwise legally
               | protected.
        
               | ModernMech wrote:
               | An algorithm is maths, but a lot of code isn't
               | algorithmic. Algorithms provably halt, and most software
               | doesn't halt, let alone provably. Operating systems,
               | browsers, games, etc. are non-algorithmic. It's hard to
               | claim that something like a browser is just math and
               | therefore deserves no IP protections.
        
               | denton-scratch wrote:
               | An algorithm is a reasoning procedure. A program (e.g. a
               | browser) embodies many algorithms.
               | 
               | I've not come across your stipulation that for a thing to
               | count as an algorithm it must provably halt, but I can go
               | along with that. So I'd argue that in most cases, any
               | function or subroutine provably terminates, even if the
               | program embodying it is not supposed to terminate.
               | 
               | I also don't agree that an algorithm is "just maths". At
               | least, not if you then pivot to saying that a browser
               | isn't "just maths". Any operation performed by a computer
               | is "just maths", because what a CPU does is basically
               | arithmetic and branching.
               | 
               | I don't think it's a question of what does and doesn't
               | "deserve" IP protection. The source code of a browser is
               | clearly an original work, and entitled to protection. But
               | the ideas and procedures it embodies are not "works", and
               | copyright isn't supposed to apply to ideas and
               | procedures.
               | 
               | I'm against the very idea of "intellectual property". It
               | must have seemed a good idea at the time, but I think
               | patents and copyrights have become monsters that inhibit,
               | rather than encourage, innovation and creativity.
        
               | ModernMech wrote:
               | > I also don't agree that an algorithm is "just maths".
               | At least, not if you then pivot to saying that a browser
               | isn't "just maths".
               | 
               | Algorithms are distinguished by their proofs of
               | correctness. This elevates them above simple procedures.
               | The halting problem tells us that there is no easy and
               | automatic way to determine whether or not a program
               | terminates. So when we find one, it's like discovering a
               | mathematical law. The proof of an algorithm's correctness
               | is expressed independently of any programming language or
               | platform. What else could they be other than math?
               | 
               | Things like browsers, games, operating systems, e-mail
               | clients, music players etc. are not treated this way.
               | They are not formally specified. They are implemented in
               | the context of a machine and an actual running
               | environment. The source code of the program usually
               | doubles as its specification. It's very different
               | compared to an algorithm.
               | 
               | I agree IP as a concept is bad, but this is the way of
               | the world at least for now. Given where we are, for me it
               | makes sense to draw a line between algorithms and
               | software in the context of copyright.
        
               | erk__ wrote:
               | On top of the sister comment, software patents are not a
               | thing in general in Europe, which I would imagine is the
               | author's area of expertise.
        
           | mytailorisrich wrote:
           | It's not the algorithm that's copyrighted. It's the source
           | code that implements it.
        
         | Joeri wrote:
         | But that fast inverse square root example is particularly
         | interesting because it is also a derivative work. Carmack did
         | not invent it, and several variations of it had been passed
         | around over time.
         | 
         | Algorithms should not be subject to copyright, that way lies
         | madness. It would prevent new generations from building on top
         | of the work of their predecessors, because copyright lasts a
         | very long time. The amounts of code that github copilot
         | reproduces fall squarely into the "shouldn't be subject to
         | copyright" domain for me, even if they pass the bar for
         | originality.
        
           | klodolph wrote:
           | Something which is a "derivative work" is still copyrighted.
           | In fact, by definition, a "derivative work" is copyrightable.
           | It's the minimum threshold at which something, based on
           | something else, gets its own, new copyright.
           | 
           | The algorithm is not copyrighted, but the source code of the
           | function _is_ copyrighted. You could learn how the algorithm
           | works by reading the function, and then write your own
           | function that implements the same algorithm. Algorithms are
           | not copyrightable, they are not subject to copyright. Source
           | code is copyrightable.
           | 
           | Copilot is not reproducing just the algorithm, it is spitting
           | out large chunks of the copyrighted source code, verbatim.
        
         | modeless wrote:
         | The funny thing about the Quake function is, id Software is
         | almost certainly not the origin of the code. They copied it
         | from somewhere else, possibly added profane comments, then
         | slapped GPLv2 on it. Did they even have the right to do that?
         | From an IP absolutist standpoint, probably not.
         | 
         | https://www.beyond3d.com/content/articles/8/
        
           | fridif wrote:
           | They did not copy the implementation, they copied the general
           | idea of what the algorithm should do.
           | 
           | Do not go down this line of reasoning, otherwise we will be
           | copyrighting the concept of for loops.
        
             | dahart wrote:
             | > they copied the general idea of what the algorithms
             | should do. Do not go down this line of reasoning
             | 
             | Too late, patents pick up where copyright ends, to protect
             | general algorithmic ideas, not just implementations. And we
             | have lots of patents on things that seem trivial now,
             | including for-loops (just see how many patents depend on "a
             | multiplicity"). Look - here's a helpful lawyer's template
             | for including for-loops as a claim in your own patents:
             | https://www.natlawreview.com/article/recursive-and-
             | iterative...
             | 
             | Another example is the famous XOR patent
             | https://patents.google.com/patent/US4197590/en
             | 
             | EFF keeps a blog on stupid patents
             | https://www.eff.org/issues/stupid-patent-month
        
               | fridif wrote:
               | Anyone who believes in a free and open society should do
               | away with all copyrights and patents.
               | 
               | Anyone who thinks that licensing will have an effect on
               | what is happening in reality is severely misguided.
        
               | dahart wrote:
               | > Anyone who believes in a free and open society should
               | do away with all copyrights and patents.
               | 
               | Free and open sound good to me! What do they mean
               | exactly? I guess it's a non-debatable fact that
               | copyrights and patents are abused by many big companies
               | and patent trolls, but doing away with the system does
               | seem extreme, it has also protected deserving individuals
               | on occasion, no? You are saying that it should _always_
               | be legal to copy someone else's code / inventions
               | without giving them any credit or compensation?
               | 
               | > Anyone who thinks that licensing will have an effect on
               | what is happening in reality is severely misguided.
               | 
               | I'm not sure I understand what you mean; lots of
               | licensing activity does have a measurable effect on
               | reality. This article is only a small example, but people
               | get sued all the time over taking code and using it
               | without licensing it.
        
             | modeless wrote:
             | > They did not copy the implementation, they copied the
             | general idea of what the algorithm should do
             | 
             | [Citation needed]
        
               | fridif wrote:
               | If you wrote an algorithm in the early 80s that did x+y+z
               | 
               | And then I saw your source code and in the late 80s I
               | changed the variable names, function name, and logic to
               | be x+y+z+0.1
               | 
               | And then I told my friend John that there's a super cool
               | algorithm that adds numbers together, and he made some
               | more changes to it and compiled it for a different
               | platform...
               | 
               | Has anybody broken the law in your mind?
               | 
               | EDIT: because it would seem that the original authors
               | (among them Cleve Moler) don't have any issue with what
               | transpired
        
               | hnfong wrote:
               | The GP's argument is that you don't have evidence that
               | they didn't copy the whole function verbatim.
               | 
               | Is there a source that said they changed variable and
               | function names and modified the logic?
               | 
               | > because it would seem that the original authors (among
               | them Cleve Moler) don't have any issue with what
               | transpired
               | 
               | Yet. Without an explicit license there is no basis to
               | release it under the GPL (if the code was copied verbatim
               | or had insufficient re-writing). What if the heirs of the
               | copyright owner wanted to assert their rights? Is there a
               | doctrine that if you don't assert your rights you lose
               | them? (Presumably applies to trademarks, but I don't
               | think this is the case for copyrights)
        
               | fridif wrote:
               | The source code in question is over 40 years old and most
               | likely doesn't exist anymore in its original form.
               | 
               | What do we do then? The burden of proof for infringement
               | is on original authors, and they haven't done so for 40
               | years.
               | 
               | In the late 1700s and early 1800s, Britain had to take
               | measures to prevent visiting Americans and others from
               | memorizing the designs of their new high tech machinery
               | like the steam engine and the power loom.
               | 
               | Where do we draw the line? Shut down the internet until
               | we create a massive copyright detection firewall?
               | 
               | No, we live with the copying and constantly evolve and
               | adapt our business. Death to all patent trolls.
        
               | hnfong wrote:
               | > Where do we draw the line?
               | 
               | I won't even claim that people must necessarily follow
               | the law. Copyright law is inconsistent at best, and
               | notoriously hard to follow to the letter (and often
               | ridiculous). In practice lawyers assess the legal risk
               | and weigh the outcomes.
               | 
               | I never intended to discuss what we should do, and I
               | definitely did not propose shutting down the internet...
               | 
               | The original discussion was such:
               | 
               | > > They did not copy the implementation, they copied the
               | general idea of what the algorithm should do
               | 
               | > [Citation needed]
               | 
               | You said the original authors did not complain, which is
               | neither here nor there, as I pointed out. There is still
               | some theoretical legal risk if you copy with the owner's
               | knowledge but not express consent. It is true that the
               | burden of proof is on the authors, but the fact that they
               | have not brought a claim does not mean they cannot prove
               | infringement.
               | 
               | And in case I haven't made it clear, I don't think it's a
               | bad idea to assume the function is under GPL, I just
               | don't think there's a basis for claiming what you
               | originally claimed, and there is still _some_ level of
               | (probably acceptable) risk if you take the purported
               | license of source code as-is.
        
           | robertlagrant wrote:
           | It's not the actual copying of the idea, but the verbatim
           | reproduction of the function, comments and all. I think
           | people somehow thought that copilot could write code, and so
           | verbatim reproduction was surprising to them.
        
             | jonas21 wrote:
             | A quick search shows that this snippet, including comments,
             | is included in thousands of Github repos [1], so it's not
             | surprising that the model learned to reproduce it verbatim.
             | 
             | It's such a famous snippet that it's even included in full
             | on Wikipedia [2].
             | 
             | I wouldn't be surprised if the next version of Copilot
             | filtered these out.
             | 
             | [1] https://github.com/search?q=0x5f3759df+what+the+fuck&ty
             | pe=co...
             | 
             | [2] https://en.wikipedia.org/wiki/Fast_inverse_square_root#
             | Overv...
        
         | eterevsky wrote:
         | The excerpt from Quake code is literally one of the most famous
         | functions out there. It is no wonder that it was reproduced
         | verbatim. The share of such code, according to GitHub, is
         | really small.
         | 
         | It would be quite straightforward to write an additional filter
         | that would check the generated code against the training corpus
         | to exclude exact copies.
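The filter described above could look roughly like the following. This is a minimal sketch under stated assumptions: whitespace tokenization, a fixed n-gram window, and an in-memory set as the index are all illustrative simplifications, and `build_index` and `looks_copied` are hypothetical names. Nothing here reflects how GitHub actually filters Copilot output.

```python
def ngrams(tokens, n):
    """Return the set of all n-token windows in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def build_index(corpus, n=8):
    """Collect every n-gram seen in the training corpus (assumed to fit in memory)."""
    index = set()
    for document in corpus:
        index |= ngrams(document.split(), n)
    return index


def looks_copied(generated, index, n=8, threshold=0.5):
    """Flag output when at least `threshold` of its n-grams appear in the corpus."""
    grams = ngrams(generated.split(), n)
    if not grams:
        # Output shorter than n tokens cannot be judged by this check.
        return False
    overlap = sum(1 for gram in grams if gram in index)
    return overlap / len(grams) >= threshold


if __name__ == "__main__":
    corpus = ["i = 0x5f3759df - ( i >> 1 ) ; // what the"]
    index = build_index(corpus, n=4)
    print(looks_copied("i = 0x5f3759df - ( i >> 1 ) ;", index, n=4))
```

The window size `n` and the `threshold` are policy choices. The code can measure overlap, but it does not settle how much copying matters legally.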
        
           | bogwog wrote:
           | But the fact that it did that at all should be proof that
           | Copilot is, in fact, copy and pasting rather than actually
           | learning and producing new things using intelligence.
           | 
           | This is a code search engine with the ability to integrate
           | search results into your language syntax and program
           | structure. The database is just stored in the neural network.
           | 
           | It's definitely an impressive and interesting project with
           | useful applications, but it's not an excuse to violate
           | people's rights.
        
             | throwaway984393 wrote:
             | > actually learning and producing new things using
             | intelligence
             | 
             | People have been trying to accomplish that for 65 years.
             | We're not even close. It's the software equivalent of cold
             | fusion (with less scientific rigor)
        
             | eterevsky wrote:
             | A big part of the work of almost any software engineer is
             | finding similar already-written parts of code and adapting
             | them. How is this different?
        
             | cartoonworld wrote:
             | It also shows that copilot knows nothing about copyright,
             | and is incapable of considering copyright as such.
             | 
             | I'm not sure if I would characterize it as a "database stored
             | in a neural net", but that is definitely something to
             | deeply consider.
        
             | [deleted]
        
             | TaupeRanger wrote:
             | This is all just computational statistics. Why in the world
             | would you invoke ill-defined anthropocentric terminology
             | like "intelligence"? Of course a statistics program isn't
             | "using intelligence".
             | 
             | But it's also not exactly just a database. It contains
             | contextual relationships as seen with things like GPT that
             | are beyond what a typical database implementation would be
             | capable of.
        
               | sdfzug wrote:
               | Define intelligence
        
               | solipsism wrote:
               | Also define "computational statistics". It'll be fun to
               | try and fail to draw a clear line between the two.
        
               | TaupeRanger wrote:
               | A common tech-bro fallacy. We understand exactly what is
               | happening at the base level of a statistics package. We
               | can point to the specific instructions it is undertaking.
               | We haven't the slightest understanding of what
               | "intelligence" is in the human sense, because it's
               | wrapped up with totally mysterious and unsolved problems
               | about the nature of thought and experience more
               | generally.
        
               | randallsquared wrote:
               | To be fair, they themselves referred to intelligence as
               | "ill-defined"...
        
               | bogwog wrote:
               | > But it's also not exactly just a database. It contains
               | contextual relationships as seen with things like GPT
               | that are beyond what a typical database implementation
               | would be capable of.
               | 
               | You mean in the same way that google.com isn't "just a
               | database"?
               | 
               | If Copilot isn't intelligent, then what makes it more
               | special than a search engine? How is Copilot not just
               | Limewire but for code?
               | 
               | I could understand the argument that, if Copilot really
               | is intelligent or sentient or something like that, then
               | what it is producing is as original as what a human can
               | produce (although, humans still have to respect copyright
               | laws). However, I haven't seen anyone even attempt to
               | make a serious argument like that.
        
               | freeone3000 wrote:
               | It _can_ produce code snippets that were never seen by
               | generating fragments from various sources and combining
               | them in a new way. This makes it different from a search
               | engine, which only returns existing items.
        
               | bogwog wrote:
               | Is it _producing_ code (by which I mean
               | creating/inventing new code by itself), or is it just
               | combining existing code? Because to me it seems like the
               | latter is a more appropriate description.
               | 
               | * AI searches for code in its neural-net-encoded database
               | using your search terms (ex: "fast inverse square root")
               | 
               | * AI parses and generates AST from the snippet it found
               | 
               | * AI parses and generates AST from your existing codebase
               | 
               | * AI merges the ASTs in a way that compiles (it inserts
               | snippet at your cursor, renames variables/function/class
               | names to match existing ones in your program, etc)
               | 
               | * AI converts AST back into source code
               | 
               | Is AI intelligently producing new code in that example?
               | Because I don't think it is.
               | 
               | What would be an interesting test of whether it can
               | actually generate code is if it were tasked with
               | implementing a new algorithm that isn't in the training
               | set at all, and could not possibly be implemented by
               | simply merging existing code snippets together. Maybe by
               | describing a detailed imaginary protocol that does
               | nothing useful, but requires some complicated logic,
               | abstract concepts, and math.
               | 
               | A person can implement an algorithm they've never seen
               | before by applying critical thinking and creativity (and
               | maybe domain knowledge). If an AI can't do that, then you
               | cannot credibly say that it's writing original code,
               | because the only thing it has ever read, and the only
               | thing it will ever write, is other people's code.
        
               | LightMachine wrote:
               | That isn't even necessary. I've been exploring GPT-3 for
               | a while and it is completely incapable of any reasoning.
               | If you enter short unique logical sentences like "Bob had
               | 5 apples, gave 2 to Mary, then ate the same amount. How
               | many apples Bob has left?" No matter how many previous
               | examples you give it (to be sure it gets the question),
               | it gets it wrong. It is simply incapable of reasoning
               | about what is going on.
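               | 
               | For reference, the arithmetic the prompt is after, as a
               | minimal sketch. It assumes "the same amount" means the 2
               | apples given to Mary; the variable names are made up.

```python
apples = 5
given_to_mary = 2
eaten = given_to_mary  # assumed reading of "the same amount"

remaining = apples - given_to_mary - eaten
print(remaining)  # 1
```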
        
               | sjy wrote:
               | Perhaps it's not so different from a search engine like
               | Google. The article cites Google's successful defence,
               | under US copyright law, of its practice of displaying
               | 'snippets' from copyrighted books in search results.
               | There is a clear difference between this and the
               | distribution of complete copies on LimeWire.
        
               | eterevsky wrote:
               | If you look at it this way, your brain is also "just"
               | computational statistics. (Or to be precise, it might be,
               | since we don't yet know in all the details how it works).
        
           | tovej wrote:
           | Would it? What would the threshold be? Twenty lines copied
           | verbatim? Ten lines copied verbatim? What about boilerplate
           | like ten #include statements at the beginning of a file? Or
           | licenses in comments? What if someone has a one-liner that's
           | unique enough to be protected by copyright?
        
             | hobs wrote:
             | https://en.wikipedia.org/wiki/Google_LLC_v._Oracle_America,
             | _...
             | 
             | I think that was a big part of the Google Vs Oracle case -
             | how much copying constitutes an infringement?
             | 
             | It looks like they made a fairly complex rubric to apply in
             | the future, it appears it would be on a case by case basis.
        
           | e3bc54b2 wrote:
           | > The excerpt from Quake code is literally one of the most
           | famous functions out there. There is no wonder that it was
           | reproduced verbatim.
           | 
           | The question this raises: it was found because it is so
           | famous, but what if Copilot is repeating Joe Schmoe's weekend
           | library project, and we never find out because it's not
           | famous?
        
             | 0-_-0 wrote:
             | Because someone already checked, and it doesn't:
             | 
             | https://docs.github.com/en/github/copilot/research-
             | recitatio...
             | 
             | Every literally quoted part that could infringe appears at
             | least 10 times in the training data
        
               | Zababa wrote:
               | Someone that works at Github already checked. I think
               | asking for a more independent study is fair.
        
               | b3morales wrote:
               | This doesn't stand on its own as a defense: perhaps the
               | 10 inputs were legitimate copies of a single source. They
               | could be forked repos that were properly following the
               | original's license, for example.
        
               | leereeves wrote:
               | Or 10 different GPL projects that legitimately share code
               | that remains copyrighted and protected by the GPL. Or 10
               | obscure projects that illegitimately copied code but
               | haven't been caught.
               | 
               | Clearly, "10 other people did it" is no defense at all.
        
               | duskwuff wrote:
               | It might not even be "10 other people". For projects
               | which originated outside Github, it's common for multiple
               | users to have independently uploaded copies of the
               | project. There's probably at least 10 users who have
               | pushed copies of the GCC codebase to Github, for example.
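               | 
               | The gap between "appears 10 times" and "10 independent
               | authors" can be illustrated with a small sketch. The
               | corpus layout, the "origin" field and the function name
               | are hypothetical; a real training set would not come
               | with a reliable origin label, which is part of the
               | problem.

```python
def count_copies_and_origins(corpus, snippet):
    """Return (number of files containing snippet, number of distinct origins)."""
    hits = [doc for doc in corpus if snippet in doc["text"]]
    return len(hits), len({doc["origin"] for doc in hits})


# Ten users each push their own copy of the same upstream project
# (made-up data).
corpus = [
    {"repo": f"user{i}/gcc-mirror", "origin": "gcc",
     "text": "int f(void) { return 42; }"}
    for i in range(10)
]
corpus.append(
    {"repo": "someone/other", "origin": "other",
     "text": "int g(void) { return 0; }"}
)

copies, origins = count_copies_and_origins(corpus, "return 42;")
print(copies, origins)  # 10 1
```

               | A raw occurrence count of 10 here corresponds to a
               | single original work under a single license.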
        
           | happymellon wrote:
           | Pretty sure if someone trained a code suggestion tool with
           | Windows source, Microsoft would claim that a single matching
           | character is grounds for copyright
           | infringement.
           | 
           | They are putting GPL code in non-GPLed codebases. Is it okay
           | to take sections of other people's source code and use it on
           | yours, if you just got it as a suggestion?
        
             | dboreham wrote:
             | The true test will be whether MS indemnifies me against
             | claims of copyright (and patent) infringement due to use of
             | their tool.
        
               | happymellon wrote:
               | This will be interesting to watch.
        
         | onion2k wrote:
         | The example you linked to is talking about a 16 line function
         | from the Quake source. The Quake source is 167,594 lines in
         | total (counting the C code only). Does that _really_ fail to
         | meet the standard for "smallest excerpt"?
        
           | stefan_ wrote:
           | That excerpt has its own Wikipedia page, of course it meets
           | the threshold of originality. In any case, once you are
           | discussing this, you have entered the area of _fair use_;
           | that is an admission of copyright violation.
        
             | coldacid wrote:
             | Fair use is not a violation of copyright but a specified
             | (and since 1976 statutory) _exception_ to it. You are
             | clearly impugning the doctrine with your comment.
        
             | mod50ack wrote:
             | Having a WP page isn't proof of threshold-passing.
             | https://en.wikipedia.org/wiki/BACH_motif
             | 
             | But there is also actually an issue about laundering and
             | what constitutes "use". But there is also de minimis to
             | consider.
             | 
             | And EVERYTHING will depend on jurisdiction of course.
             | 
             | IANAL
        
           | emodendroket wrote:
           | Not only that, but it is clearly someone going out of their
           | way to make it do that. I'm not sure that that is a
           | reasonable test of how the program typically behaves.
        
             | hmfrh wrote:
             | > I'm not sure that that is a reasonable test of how the
             | program typically behaves.
             | 
             | That's not what people care about, people care about their
             | copyright being blatantly violated by a massive corporation
             | _without any consequences_.
        
               | emodendroket wrote:
               | Ok, but is "I can go out of my way to make it misbehave"
               | adequate proof that the copyright is being violated?
        
               | ghoward wrote:
               | Not GP.
               | 
               | Yes, it is, because that means that the algorithm will
               | produce that copyrighted code regardless of the intent of
               | the person who makes it misbehave. People could both
               | accidentally and "accidentally" make it reproduce
               | copyrighted code. In the first case, it's unintentional.
               | In the second, how could you prove it's intentional?
               | 
               | Because of this whole mess, I am actually adding clauses
               | to FOSS licenses that I am writing, just to ensure that
               | my copyright on my code is not infringed by code
               | laundering.
        
               | emodendroket wrote:
               | To be clear, my suspicion is that this is so unlikely to
               | happen unintentionally that it does not represent a real
               | risk. If the issue is that I can force it to generate
               | infringing output if I really want to, it is an argument
               | against the Web browser too, since I could just as easily
               | use the copyright-unsafe "copy" feature.
        
               | ghoward wrote:
               | I don't entirely agree.
               | 
               | Whereas using the browser's copy feature requires the
               | user to have intent to use it, getting Copilot to produce
               | exact code does not. And proving that intent is not easy.
               | 
               | I think companies will see that such code _can_ be
               | exactly reproduced and decide to stay away from Copilot.
               | I hope they do. In fact, I am less willing to take
               | outside contributions for my own code, even for bug
               | fixes, just because of the risk that that code came from
               | Copilot.
        
               | breakfastduck wrote:
               | How long does it have to be for you to consider it
               | copyrighted code?
               | 
               | For example, a book could be copyrighted, but they
               | certainly cannot sue me because a book I wrote contains a
               | sentence that is the same.
        
               | ghoward wrote:
               | The answer to your first question is for the courts to
               | decide, unfortunately.
               | 
               | However, for my purposes, using a new license with
               | particular terms would only be to make companies like
               | GitHub pause and think before using my code as "training"
               | to an "algorithm" like Copilot.
        
               | b3morales wrote:
               | I'm not at all in favor of the "code laundering" (which
               | is a brilliant term, thank you). But I don't understand
               | how you expect a new license to help.
               | 
               | 1. A license applied to source code is effective _because
               | of_ your copyright
               | 
               | 2. The claim of Copilot's maintainers is that it
               | _bypasses_ copyright
               | 
               | Therefore, they will assert that they can ignore the new
               | license saying "you may not launder my code" just as
               | surely as they can ignore the previous license.
        
               | ghoward wrote:
               | First, I did not come up with the term "code laundering."
               | I cannot claim credit for that; I saw it first on HN on
               | https://news.ycombinator.com/item?id=27729209 somewhere.
               | 
               | Second, you are correct that Copilot's maintainers claim
               | that it bypasses copyright, but if it does while
               | producing exact copies of code, then copyright is dead,
               | and there are a lot of big companies out there with deep
               | pockets that will ensure that doesn't happen.
               | 
               | They may claim that because their algorithm is a black
               | box, that whatever it produces has no copyright, but my
               | licenses will push back directly on that claim by saying
               | that if source code under the license is used as all or
               | part of the inputs to an algorithm, whether all of the
               | source code or partially, then the license terms must be
               | attached to the output. After all, that's what we do with
               | GPL and binary code. The binary code is the output of an
               | algorithm (the compiler) whose input was the source code.
               | 
               | I hope by tying it together like that, the terms can
               | close the loophole they are claiming. But of course, I am
               | going to get a lawyer to help me with those licenses.
        
               | d110af5ccf wrote:
               | > ... if source code under the license is used as all or
               | part of the inputs to an algorithm, whether all of the
               | source code or partially, then the license terms must be
               | attached to the output.
               | 
               | You're not getting it. If Copilot isn't currently
               | infringing copyright then adding such a clause _won't
               | matter_. Such a clause would only hold weight _when
               | copyright applies_. On the other hand, if copyright
               | _does_ apply, then you don't need such a clause because
               | the activity is already a violation of the vast majority
               | of licenses. (It even violates extremely permissive ones
               | because it effectively strips out the license notice.)
               | 
               | The GPL works specifically because copyright applies to
               | the usecase in question. It simply specifies various
               | requirements that you must meet in order to license the
               | code _given that copyright applies_.
               | 
               | In short, you can't just put a clause into a license
               | saying, effectively, "and also, this license confers
               | superpowers which make it so that my copyright applies in
               | additional situations where it otherwise wouldn't!".
        
               | ghoward wrote:
               | Ah, I see.
               | 
               | I argue that, even if _training_ a dataset is fair use,
               | _distributing the result_ is copyright infringement. I
               | would want my license to make that part clearer.
        
               | hnfong wrote:
               | I think the GP's "license" would still be effective,
               | although it would not be "open source" per the OSI
               | definition.
               | 
               | Imagine this simplified scenario first: if I published a
               | source file publicly without any licensing or explanation
               | except a standard copyright notice - "Copyright (C) 2021
               | MY NAME, all rights reserved", do you think a random
               | person/company can take that code and integrate it into a
               | commercial product?
               | 
               | I would argue not (in general). Copyright law as it stands
               | does not permit a user who has access to a copy to do
               | whatever they want with that copy (esp. if it involves
               | more copying). OSS licenses do give you much freedom as
               | long as you don't modify the code, and that's why we have
               | the impression that we can do whatever we like with
               | published source code. However, if we think about other types of
               | copyrighted work, say movies for example, streaming
               | services can "rent" you a movie multiple times even
               | though you've paid to download the content previously.
               | What are you paying for the second time you rent? Another
               | example - some photographers may allow you to freely
               | browse their works, but they can still make you pay money
               | if you want to use their photo in your commercial
               | product.
               | 
               | So why wouldn't copyright restrict usage of source code
               | in similar situations? The GP only needs to add a
               | condition to the license to restrict how users can use
               | it. It will no longer be OSS, but as long as it's his
               | work, I don't see why in principle it shouldn't work.
               | 
               | (In practice, I don't think it will make much difference
               | -- I think your argument is still somewhat compelling,
               | and some people will probably take your position.
               | Conservative corporate lawyers aimed at reducing legal
               | risk would disagree, so it's basically a matter of how
               | much legal risk one is ready to take. Also, for an author
               | trying to do this, note that suing Microsoft in these
               | cases would be expensive, since they will likely fight
               | back given that they spent so much money trying to do
               | this, and the outcome will be uncertain. If really tested
               | in court, given the result of the Oracle v Google case,
               | if the US Supreme Court is impressed by the
               | social/economic benefits that Android brings, I'm pretty
               | sure the justices will be even more impressed by this
               | intelligent code generation thingy, and might just grant
               | this thing a fair use.)
        
               | b3morales wrote:
               | Your summary is generally correct, and I certainly agree
               | with the other commenter's position on their work. But I
               | think you're still missing the point. Copyright is the
               | mechanism that allows you to prevent copying, but
               | GitHub's claim is that copyright is _irrelevant_ to
               | Copilot's input.
               | 
               | I have a nice strong lock on my door. GitHub (asserts
               | that it) can enter my home through the window.
               | 
               | Adding another deadbolt to the door does not help.
        
               | hnfong wrote:
               | I don't think I missed that point. I'm trying to argue
               | that copyright _is_ relevant to Copilot's input if not
               | allowed by an OSS license.
               | 
               | Maybe I'm missing something (just not the thing you
               | said), but has Github made any legal claims so far? The
               | original article is written by a politician in EU...
               | 
               | Even if you're a lawyer defending Github in this case,
               | there are still a couple of things that need to be
               | clarified before you can make the case: (maybe the info is
               | out there but I'm too lazy to research)
               | 
               | - Is Github only using code/repos that are explicitly
               | under OSS licenses? (because if that's the case, then the
               | discussion might be justified in presuming OSS terms, and
               | it may be the case that more restrictive non-OSS licenses
               | would require a different analysis)
               | 
               | - As somebody pointed out in another thread, the Github
               | terms of service agreement seems to grant Github
               | additional rights when dealing with user uploaded
               | content. Is that a legal basis for the use?
        
               | b3morales wrote:
               | > I'm trying to argue that copyright _is_ relevant to
               | Copilot's input if not allowed by an OSS license.
               | 
               | And I tend to agree with you (and the other commenter)
               | here. But GitHub doesn't.
               | 
               | > has Github made any legal claims so far?
               | 
               | I'm not sure how actively, but the CEO was here in the
               | announcement thread the other day saying that they think
               | the ingestion of the inputs is a "fair use". They also
               | have some material defending the output side:
               | https://docs.github.com/en/github/copilot/research-
               | recitatio...
               | 
               | > Is Github only using code/repos that are explicitly
               | under OSS licenses?
               | 
               | I don't think we know exactly what code they used as
               | inputs, no.
        
               | [deleted]
        
               | ipaddr wrote:
               | Can you add fines?
        
               | ghoward wrote:
               | I wish. I just want users to know what rights _they_
               | have. Ultimately, I want my software to serve end users,
               | not companies. If companies add value for users with my
               | software, that's exactly what I want.
               | 
               | But stripping licenses away so that users can't know what
               | rights they have with my code is not that.
        
               | rndgermandude wrote:
               | >I am actually adding clauses to FOSS licenses that I am
               | writing
               | 
               | Doesn't this make your new licenses incompatible with a lot
               | of existing licenses?
        
               | ghoward wrote:
               | wizzwizz4 is correct. Also, I have explicit clauses
               | saying that GPL/AGPL dominate.
               | 
               | But yes, my licenses may be incompatible (one-way) with
               | permissive licenses. I say "one-way" because code with
               | permissive licenses can still be used in code under my
               | licenses, but maybe not necessarily the other way around.
               | 
               | I'm okay with that.
        
               | rndgermandude wrote:
               | That does not really ring true to me. AGPL broadens the
               | scope of violations as well, and you cannot use AGPL code
               | in GPL-only code bases without turning the end product
               | AGPL (but you can use GPL-only code in AGPL code bases).
               | 
               | If you're just adding something along the lines of
               | "copying passages extensive enough to reach originality
               | is a violation of this license" then that's indeed
               | already covered by the GPL, and there is really no need
               | to add such a passage other than to be more explicit -
               | and confuse people at least at first about why your
               | license is not actually the GPL. So there isn't much of a
               | point to do it in the first place, in my humble opinion.
               | 
               | If you add text that says something along the lines of
               | "you may not use this code as training data", then you
               | created an incompatible license, and your code cannot be
               | used in GPL code bases, and even worse, since it
               | restricts what you can do with the code more than the
               | GPL, it might even mean you stop being reverse-compatible
               | and may not use GPL'ed code yourself in your own custom-
               | license code base.
               | 
               | The AGPL does not further restrict code uses, just
               | broadens the scope of when you have to make available the
               | code, so it's fine there. However, the original BSD
               | license with the advertising clause is considered
               | incompatible with the GPL.
               | 
               | I am not a lawyer, and these are just my quick layman
               | concerns. I fully recognize you're entitled to use
               | whatever license you find suitable for your code and I am
               | absolutely not entitled to your code and work whatsoever.
               | 
               | But that said, I wouldn't touch your code if I saw a
               | "potentially problematic" custom license, and I wouldn't
               | consider contributing to your projects either.
        
               | ghoward wrote:
               | I understand your concerns.
               | 
               | Honestly, with this whole debacle, I am not going to be
               | accepting outside contributions anyway.
               | 
               | I also understand the concern with a problematic license.
               | However, I don't plan to make a specific exemption about
               | machine learning, but rather tie up an ambiguity.
               | 
               | What I think I'll do is that the license will require
               | that when the licensed source code is used, partially or
               | fully, as an input to an algorithm, the license terms
               | must be distributed with the output of that algorithm.
               | 
               | I don't think this is a violation of the GPL at all
               | because the GPL requires you to distribute the license
               | with the binary code of GPL'ed code, and such binary code
               | is the output of an algorithm (the compiler) whose input
               | was the source code.
               | 
               | But what it would do is put the onus on GitHub that, if
               | they used my code in training that data, if they
               | distributed the results (as they are doing), they must
               | distribute my license terms as well and tell users that
               | some of the results are under those terms.
        
               | d110af5ccf wrote:
               | > something along the lines of "you may not use this code
               | as training data"
               | 
               | Would such a term be legally binding under present
               | copyright law? Other than disallowing inclusion in a
               | redistributed dataset specifically intended for training
               | ML models, it's not clear to me that it would actually
               | prevent such use if you already had a copy on hand for
               | some other purpose. (Specifically, note that GitHub
               | indeed already has a copy on hand for their authorized
               | primary purpose of publicly distributing it.)
               | 
               | More generally, the manner in which copyright law applies
               | to machine learning algorithms _in general_ hasn't been
               | worked out by either the courts or legislature yet. Hence
               | the current article ...
        
               | wizzwizz4 wrote:
               | Not necessarily. If you do it right, you've got a
               | perfectly GPL-compatible license (because such laundering
               | is, technically, a violation of the GPL... probably) -
               | it's just a license that's more explicit about what's a
               | license violation.
               | 
               | Law isn't code.
        
               | hnfong wrote:
               | GPL explicitly forbids re-licensing under more
               | restrictive terms.
               | 
               | So either the added terms are not more restrictive, which
               | basically means they are unnecessary and have no real
               | effect; or they are more restrictive, which is
               | incompatible with the GPL.
               | 
               | You can't have things go both ways. It seems that your
               | argument is "we're not adding restrictions, we're just
               | saying what we think Copyright law / the GPL should
               | actually be like." But unfortunately you can't "clarify"
               | Copyright Law or "clarify" the GPL by adding terms.
               | Ultimately courts decide that.
               | 
               | (Of course, if somehow your "clarification" happens to
               | align with a court decision, then maybe it will work
               | after all. But in theory your "clarification" is still
               | not necessary and has no additional effect....)
        
               | heavyset_go wrote:
               | GPL code and its derivatives can't be distributed with
               | additional restrictions.
        
               | formerly_proven wrote:
               | Double standards ensue.
               | 
               | Tool that could be used to violate copyright := Gets
               | prosecuted by MPAA and friends, legislation is passed to
               | make use / development / distribution of such tools
               | illegal
               | 
               | Bigcorp ships the ML equivalent of ALLCODE.tgz, but you
               | actually gotta look in the
               | no/dont/open/this/folder/gplviolations/quake.c folder :=
               | Is this adequate proof that copyright is being violated?
        
               | emodendroket wrote:
               | Since I do not work for the MPAA, I don't see why you
               | expect me to answer for them. Half of the article's
               | argument is that any argument you could use to shut down
               | Copilot would also give a lot of power to such entities
               | if it were accepted.
        
               | TeMPOraL wrote:
               | Honestly, I feel most people don't care about that. What
               | they do care about, is the risk of Copilot making the
               | _user_ liable for copyright infringement. Even a
               | possibility of it spewing out non-public-domain code
               | should be considered a showstopper for any use of
               | Copilot-generated code in a commercial project.
               | 
               | Can Copilot produce licensed code verbatim, in enough
               | quantities to matter, with a license your business would
               | be infringing? Yes. Can you easily tell by looking at the
               | output? No. Could someone end up suing you over it?
               | Maybe, if they cared enough to find out. Can you honestly
               | tell your investors, or a company you seek to be acquired
               | by, that nobody else can have valid copyright claim
               | against your code? No.
        
               | emodendroket wrote:
               | > Can Copilot produce licensed code verbatim, in enough
               | quantities to matter, with a license your business would
               | be infringing? Yes. Can you easily tell by looking at the
               | output? No. Could someone end up suing you over it?
               | Maybe, if they cared enough to find out. Can you honestly
               | tell your investors, or a company you seek to be acquired
               | by, that nobody else can have valid copyright claim
               | against your code? No.
               | 
               | Well aren't all your assertions exactly the point of
               | contention?
        
               | TeMPOraL wrote:
               | Well, the "enough quantities to matter" part wasn't
               | tested in courts yet, but I fail to see a way to rule for
               | "No" here in a way that wouldn't gift us a universal way
               | to turn any code into public domain, destroying source
               | code licensing as a concept. Other than this part, the
               | first two claims have already been demonstrated, and the
               | rest follow from them.
        
               | emodendroket wrote:
               | But that is in fact the most fundamental question here.
               | And I'm not fully sold on the idea either that this is
               | going to happen in real-world usage or that a single
               | function in a massive program constitutes a large enough
               | portion to be infringing.
        
               | TeMPOraL wrote:
               | Quake's square root function wasn't the only, or the
               | largest, example of code Copilot reproduces verbatim.
               | Among others I've seen to date is someone generating a
               | real "About" page with PII information of some random
               | software developer.
               | 
               | How much code is enough to infringe is a tricky question,
               | though. It's not only a function of size, but also of
               | importance/uniqueness - and we know that Copilot doesn't
               | understand these concepts.
        
               | shakna wrote:
               | > ... or that a single function in a massive program
               | constitutes a large enough portion to be infringing.
               | 
               | As part of the sequence of rulings in Google vs Oracle,
               | the 9-line rangeCheck function, in the entirety of the
               | Android codebase, was found to be infringing.
        
             | [deleted]
        
       | noobermin wrote:
       | I've said this before, but I hope the issue isn't infringement
       | per se, but that the produced code isn't automatically GPL'ed.
       | The author argues that machine generated code isn't copyrighted
       | and this is good because it essentially fits the "data wants to
       | be free" mentality, but I'd say tell that to the people who use
       | it. Will they, after using something derived from open source,
       | have to open source their code? No, they won't. If anything, this
       | finally provides closed source developers with what they've
       | always wanted, a means to rip open source code without having to
       | return contributions.
       | 
       | Julia Reda hints at that last bit as being an issue but only in a
       | parenthetical. To the author, that literally is _the whole
       | point_. Do people not remember the Free Software vs. Open Source
       | debate? Or GPL vs BSD? The requirement that derived works also be
       | free is literally the important bit in Free Software. This only
       | fits the mentality of "data wanting to be free" if your model of
       | that idea includes the permissive sensibility and doesn't care
       | about actually changing the state of things, which is making free
       | software more widely used in the world over proprietary software.
        
       | jordigh wrote:
       | > The output of a machine simply does not qualify for copyright
       | protection
       | 
       | Wolfram disagrees, and he's got lawyers and money too. Whom do we
       | believe?
       | 
       | http://www.groklaw.net/article.php?story=20090518204959409
        
         | Mindwipe wrote:
         | Wolfram is discussing American law, Reda European.
         | 
         | (I'm still not sure I agree with Reda, but the point is at
         | least arguable under European law and depends on the
         | circumstances).
        
           | jordigh wrote:
           | But there's a Berne convention that kind of unifies copyright
           | around the world, right? It's not like something can be
           | copyrighted in one country but not another.
        
       | sascha_sl wrote:
       | I frankly think that the "free culture" label and extremely
       | permissive licenses of many open source projects are nothing but a
       | redistribution of wealth upwards. Those with existing capital can
       | make profitable unfree derivative works without any benefit to
       | original authors. This relationship must go both ways if you want
       | actual free culture. Stop producing MIT/BSD code in your non-work
       | time.
       | 
       | This is not a research project, this is a commercial work that
       | produces verbatim copies of code without disclosing its license
       | (or having a license grant in many cases). It doesn't matter how
       | it manages to reproduce it either. It does.
        
         | FranchuFranchu wrote:
         | > without any benefit to original authors
         | 
         | I don't really expect any benefit when writing some pieces of
         | code. These days I just release certain types of code into the
         | public domain, because if it's MIT then BigCorp inc. will just
         | make my name one in a really long list of contributors, and I
         | won't get any benefit either.
        
         | indigochill wrote:
         | > extremely permissive licenses of many open source projects are
         | nothing but a redistribution of wealth upwards
         | 
         | Although I license all my amateur work GPLv3, I dispute the
         | assertion that more permissive licenses are "nothing but" a
         | redistribution of wealth upwards.
         | 
         | Permissive licenses commoditize their features. This is of
         | benefit to everyone, but organizations with more capital are
         | better positioned to leverage that commodity, and typically
         | when they do, they do so selfishly. This further centralizes
         | value with them, but I believe this is still a better outcome
         | than a closed license because of the
         | educational/cultural/technical benefits to everyone of the open
         | license. The capital leverage problem is orthogonal to that.
         | 
         | Copyleft licenses kind of do the same commoditization thing,
         | but explicitly only for share-alike uses, which is why they're
         | the only way to grow open culture relative to proprietary
         | culture: they're designed to deny "freeloaders" on open culture
         | by requiring all derivative works to also be open.
         | 
         | > without any benefit to original authors
         | 
         | Benefiting the original authors is not (directly) the point of
         | open source.
         | 
         | MIT: "Here's some code, go nuts."
         | 
         | GPL: "Here's some code, go nuts, but share the nuts if you do."
         | 
         | There's zero value capture for the original author built into
         | these licenses. If the original author wants to ensure they
         | capture the value from their work, I recommend using a closed,
         | proprietary license. Just be aware they're applying friction to
         | the overall technological development of humanity by doing so.
        
         | Kiro wrote:
         | I don't expect any benefits when releasing stuff under MIT,
         | regardless of who uses it. That's the whole point or I wouldn't
         | use that license.
        
         | wccrawford wrote:
         | This is the closest anyone has ever come to convincing me to
         | use GPL instead of MIT license.
         | 
         | But I still want to support _small_ developers with anything I
         | produce for fun, and I'm not willing to give that up to spite
         | the big developers.
         | 
         | For instance, I wrote a small class to load OBJ files in Unity
         | because I needed it for an idea. I went ahead and put it on
         | Github for others that need it, too. I could easily see someone
         | having an idea similar to mine that needed that and couldn't
         | find it out there. (I think there are more libraries like that
         | now, though.) I wanted them to feel comfortable using it, even
         | if they eventually make money with their game.
         | 
         | If a big corp uses that code, too, that sucks. But there's no
         | good way to draw that line in a license, so I didn't.
         | 
         | Having said that, in the future I could see releasing some
         | software that I don't think anyone should profit from, and in
         | that case I'd GPL it. Previously, I'd have just defaulted to
         | the same MIT license. I'm just not sure what that'd be yet.
        
           | dgb23 wrote:
           | Just to be sure: GPL doesn't prevent anyone from selling.
        
             | wccrawford wrote:
             | No, but it does force them to GPL their own code if they
             | do. And that's a no-go for most companies.
        
               | enriquto wrote:
               | > No, but it does force them to GPL their own code if
               | they do. And that's a no-go for most companies.
               | 
               | There's many companies who release source code. I don't
               | know enough to say if that makes the word "most" in your
               | sentence false or not.
               | 
               | Anyways, a company using GPL code need not release all
               | their own code. Just their modifications to that
               | particular GPL program. And then again, only if they do
               | intend to distribute the modified program.
        
           | enriquto wrote:
           | > This is the closest anyone has ever come to convincing me
           | to use GPL instead of MIT license.
           | 
           | > But I still want to support small developers with anything
           | I produce for fun, and I'm not willing to give that up to
           | spite the big developers.
           | 
           | Another line of thought that may help you choose a license:
           | Do not think only about other developers. Think about the
           | final users of your code that will be running your algorithms
           | on their computers. The GPL protects the right of these users
           | to see and modify the code they run (your code). So-called
           | permissive licenses, on the other hand, let middlemen strip
           | this right from your users.
           | 
           | Users of your code are in fact freer thanks to copyleft
           | licenses.
        
       | zxcb1 wrote:
       | Open source developers deserve the same rights as corporations.
       | 
       | As a side note, in a not so distant future there may be
       | decompilers enhanced by artificial intelligence.
        
       | ksec wrote:
       | So maybe it is best to have a separate license for machine
       | learning? Let's call it a Copilot licence. (Maybe it is better
       | to call it an exemption?)
       | 
       | You would need AGPL / GPL / LGPL / MIT / Apache / BSD + Copilot
       | licences before code can be used for training, knowing there is a
       | very small possibility that some code snippet will be the output?
       | 
       | I mean, we could endlessly debate this with no solution unless this
       | is put to a court.
        
       | Sr_developer wrote:
       | This is a supposedly progressive politician, young, in an
       | advanced country, her personal platform runs almost entirely on
       | copyright issues and yet she gets almost everything wrong, what
       | can you expect from your usual dinosaurs?
        
         | marcosdumay wrote:
         | Depends on who is funding the dinosaurs.
         | 
         | She had to work really hard to get to that conclusion she
         | stated.
        
       | marcodiego wrote:
       | Simple way to fix this mess: allow the user to choose the licenses
       | of the training data samples.
        
         | stabbles wrote:
         | And then what license do you choose? Many licenses require you
         | to copy the original license verbatim, which may include the
         | author's name and the date.
        
       | [deleted]
        
       | denton-scratch wrote:
       | > The output of a machine simply does not qualify for copyright
       | protection
       | 
       | "Simply"? If it were that simple, surely that would mean that the
       | output of the Unix "cp" program would not qualify? What about a
       | DVD copier?
       | 
       | I'm OK with copyright as it used to be, back when I was a
       | teenager; the right expired with the author's life. Corporations
       | couldn't own copyrights. There was no burden on the author to
       | register their rights. And copyright was a civil matter; you sued
       | for actual damages. Infringement wasn't a crime.
       | 
       | I'm not OK with modern copyright law, with criminal penalties,
       | rights that can be transferred to entities that are essentially
       | immortal, and copyright terms that keep getting extended, just
       | before Mickey Mouse and Elvis Presley become public domain.
        
         | tzs wrote:
         | >> The output of a machine simply does not qualify for
         | copyright protection
         | 
         | > "Simply"? If it were that simple, surely that would mean that
         | the output of the Unix "cp" program would not qualify? What
         | about a DVD copier?
         | 
         | It means that the output of "cp" does not qualify for copyright
         | protection _as a derivative work_. The output is still a copy
         | of the input and would be subject to the same copyright as that
         | input.
         | 
         | Roughly, a derivative work is a _new_ work that incorporates
         | some copyrightable elements from a previous work. The
         | derivative work gets its own copyright separate from the
         | copyrights of those incorporated elements.
        
           | chalst wrote:
           | Not necessarily. The object code produced by a compiler might
           | best be regarded as a different form of the source code, even
           | though it is not a copy or a new work, according to
           | 
           | http://digital-law-online.info/lpdi1.0/treatise26.html
        
           | adriancr wrote:
           | > Roughly, a derivative work is a new work that incorporates
           | some copyrightable elements from a previous work.
           | 
           | By this logic:
           | 
           | - someone could go and copy functions and/or entire files
           | from GPL code bases and use them with a different license.
           | 
           | - someone could use copilot or similar to learn from all
           | available GPL code. Is resulting code GPL?
           | 
           | - someone could use copilot or similar to learn from open
           | source code of their competitors that license doesn't allow
           | them to use. Are the results legal?
        
             | mthoms wrote:
             | The definition of a "derivative work" as stated is correct.
             | 
             | The _copyright status_ of a derivative work is a separate
             | issue: A derivative work can be considered infringing, and
             | a derivative work can be considered non-infringing (ie. due
             | to Fair Use).
        
         | mthoms wrote:
         | I think they meant "when the output of the machine is
         | significantly original/transformed (i.e. is a new creative
         | work)."
         | 
         | I'm not arguing for/against: I just think a straight copy was
         | not what the author intended to be included by that statement.
        
           | b3morales wrote:
           | > I think they meant "when the output of the machine is
           | significantly original/transformed (i.e. is a new creative
           | work)."
           | 
           | No; the author spends some time asserting that the output of
           | a machine is inherently _not_ a creative work:
           | 
           | > Machine-generated code is not a derivative work
           | 
           | > the argument that the outputs of GitHub Copilot are
           | derivative works of the training data is based on the
           | assumption that a machine can produce works. This assumption
           | is wrong and counterproductive.
           | 
           | > This means that machine-generated code like that of GitHub
           | Copilot is not a work under copyright law at all
        
             | mthoms wrote:
             | The machine is performing some sort of transformation.
             | Whether you want to call that a "(creative) work" or not is
             | irrelevant to the point I was making: The machine is doing
             | _something_ to the input, and that makes the 'cp' comment
             | I replied to kind of silly.
             | 
             | I think we can assume the author doesn't believe that
             | piping bytes through the 'cp' command automatically removes
             | copyright (as the person I replied to suggested).
        
       | Cort3z wrote:
       | I wonder how long it will take for the licenses to start
       | explicitly disallowing this sort of usage. It is clearly
       | something that many open source writers dislike, and in my
       | opinion, rightly so.
        
       | vharuck wrote:
       | >What would then stop a music label from training an AI with its
       | music catalogue to automatically generate every tune imaginable
       | and prohibit its use by third parties? What would stop publishers
       | from generating millions of sentences and privatising language in
       | the process?
       | 
       | The existing barrier we have is that, unless the music label can
       | prove a human artist has listened to the specific song matching
       | the artist's, there's no copyright violation. A copyright
       | protects creators from having their work _copied_. It doesn't
       | give them ownership over matching works. I'm sure there are
       | plenty of pairs of novels with the same first sentence despite
       | each author never having read the other's work.
        
         | rjmunro wrote:
         | Note that Patents and Trademarks are not like this. You can
         | innocently recreate an invention or a similar trademark and you
         | are still infringing.
         | 
         | This often causes confusion - people apply the rules of one
         | type of IP to the others, but they have almost nothing in
         | common.
        
       | uCantCauseUCant wrote:
       | I felt a great disturbance in the AI-community, as if millions of
       | voices suddenly cried out in terror, of GPL Code in their output,
       | and were suddenly silenced. I fear something terrible has
       | happened.
        
       | ClumsyPilot wrote:
       | Julia is one of the few MEPs that properly engages with issues of
       | copyright and is active in IT. I really appreciate it, even if I
       | don't always agree with her.
        
         | toyg wrote:
         | This is actually why I was so disappointed by her analysis
         | having some very glaring errors. With friends like these...
        
           | CyberRabbi wrote:
           | Politicians are very rarely trustworthy. Who funds her?
        
             | ocdtrekkie wrote:
             | She meets with tech company lobbyists pretty regularly
             | according to her meeting log.
        
         | elcapitan wrote:
         | Former MEP, btw.
        
         | chrisseaton wrote:
         | But she doesn't seem to have engaged - she seems ignorant of
         | basic facts of what the technology is doing in practice if you
         | read the other comments here which give specific examples.
        
       | mabbo wrote:
       | > Copyright law has only ever applied to intellectual creations -
       | where there is no creator, there is no work. This means that
       | machine-generated code like that of GitHub Copilot is not a work
       | under copyright law at all, so it is not a derivative work
       | either. The output of a machine simply does not qualify for
       | copyright protection - it is in the public domain
       | 
       | This is fantastic news.
       | 
       | I'm going to create a bot that crawls sites like GitHub searching
       | for popular libraries. Then it will copy them- sans any license-
       | to its own website where it will sell these libraries under a
       | new name.
       | 
       | Since there is no creator here, just a piece of software, then
       | there is no copyright violation. My system simply is "inspired"
       | by the original source code using a proprietary algorithm that I
       | call "Copy and paste".
       | 
       | I'm open to accepting venture capital for this project.
        
         | JorgeGT wrote:
         | > The output of a machine simply does not qualify for copyright
         | protection
         | 
         | Does this include my Xerox machine? If so, is anyone looking
         | for very very cheap textbooks?
        
           | progval wrote:
           | Already tried in court:
           | https://en.wikipedia.org/wiki/Rameshwari_Photocopy_Service_s...
        
             | chrismorgan wrote:
             | That looks to be as much about exceptions for education as
             | about photocopying. Purely from a quick read of the
             | Wikipedia article's summary, it looks like a lot of it
             | hinges on the interpretation of Section 52(1)(i),
             | https://copyright.gov.in/Documents/CopyrightRules1957.pdf#pa...,
             | "the reproduction of any work-- (i) by a teacher or a pupil
             | in the course of instruction; or (ii) as part of the
             | question to be answered in an examination; or (iii) in
             | answers to such questions".
        
         | Arnt wrote:
         | It's not news, it is fantastic, and you'd do well to understand
         | it.
         | 
         | The output of copilot is machine-generated and is not subject
         | to copyright. Microsoft cannot claim copyright on what it
         | generates. That does not affect my rights, or anyone else's. I
         | can claim copyright on what _I_ write, and neither copilot nor
         | your stupidity diminish my rights.
         | 
         | MS may argue that what copilot copies is small enough that I
         | have no copyright on that, and win in court. You may put
         | forward the same argument but I think your fate in court would
         | be different.
        
           | epicide wrote:
           | > The output of copilot is machine-generated and is not
           | subject to copyright. Microsoft cannot claim copyright on
           | what it generates. That does not affect my rights, or anyone
           | else's.
           | 
           | What about when I, a developer working on a proprietary
           | codebase, blindly commit the output code into our product?
           | Have _I_ created a derivative work or, worse, plagiarized?
        
             | user5994461 wrote:
             | >>> Have I created a derivative work or, worse,
             | plagiarized?
             | 
             | If you reproduced code that's copyrightable and under
             | another license, then yes you are in violation.
             | 
             | It will take a decade for the case to proceed to court and
             | determine exactly what claims can be made against your
             | company and GitHub.
             | 
             | In practical terms, you should turn off Copilot this very
             | minute.
        
               | epicide wrote:
               | Right, which is why it feels like a bad-faith argument in
               | this case to say "a machine can't produce copyrightable
               | code, etc."
               | 
               | We aren't talking about a server sitting in a Microsoft
               | data center shuffling code to another server without
               | human intervention. We are talking about a tool that
               | helps _developers_ create code -- code that is
               | "copyrightable and under another license", and thus in
               | violation.
        
         | verelo wrote:
         | I'd expect that since that's your intention, as described above,
         | then it is copyright infringement and you would be the actor. In
         | the GitHub case, it isn't the primary intent but rather a
         | byproduct of the goal of helping another developer? Not a lawyer,
         | but spent enough time with lawyers to know that what you're
         | describing won't fly. I don't even know if what GitHub is doing
         | will fly, maybe they're hoping it gets tested.
        
           | mabbo wrote:
           | I can't tell if you are being doubly-sarcastic to my sarcasm,
           | or if you missed my point.
        
             | verelo wrote:
             | Ah, honestly i missed your sarcasm. Yeah, so i guess we're
             | on the same page.
        
         | jdright wrote:
         | You know what, I love this idea! We can do the same with music
         | with very few adaptations to the algorithm. This idea is worth
         | gold!
        
       | [deleted]
        
       | jfmc wrote:
       | Modern AI seems more like machine-assisted collage (of pictures,
       | code, text, etc.) than anything else. Someone (or some other
       | algorithm) needs to be added to ensure that the whole thing makes
       | sense. The big problem here is that when an artist creates a
       | collage he/she knows the sources. Here provenance is lost.
       | 
       | [1] Collage (/kəˈlɑːʒ/, from the French: coller, "to glue" or "to
       | stick together") is a technique of art creation, primarily
       | used in the visual arts, but in music too, by which art results
       | from an assemblage of different forms, thus creating a new whole.
        
       | SXX wrote:
       | I think it's time for someone to train AI on leaked proprietary
       | code and source-available code like Unreal Engine. It's cool that
       | we have so much of it right now.
       | 
       | Then we'll see how fast Microsoft and others will shut it down.
        
       | flazx wrote:
       | "This is a slightly modified version of my original German-
       | language article first published on heise.de under a CC-by 4.0
       | license."
       | 
       | Heise appears to be quite $bigcorp friendly recently.
        
         | detaro wrote:
         | ... because they let a regular author publish her opinion?
        
         | [deleted]
        
         | creshal wrote:
         | If by "recently" you mean in the past 10~15 years or so, yeah.
        
       | bennyp101 wrote:
       | Countdown to Oracle lawsuit in 3, 2 ...
        
       | varispeed wrote:
       | Mass processing, repackaging and then selling the data is an
       | exploitative business these multi-billion companies run without
       | paying anything to the people who produced the data.
       | 
       | This is wrong and should be stamped out.
        
       | emrah wrote:
       | Copilot itself may not be infringing copyright or GPL, but its
       | users will be if they incorporate its suggestions into their
       | commercial products.
        
       | swiley wrote:
       | So copyright is dead then?
       | 
       | Can we merge all the leaked driver source into Linux and have
       | decent OSes on handhelds yet?
       | 
       | If I train an "ML autocomplete" on the "OpenNT" source can I
       | share it legally?
        
       | temac wrote:
       | > What is astonishing about the current debate is that the calls
       | for the broadest possible interpretation of copyright are now
       | coming from within the Free Software community.
       | 
       | It is not astonishing at all given:
       | 
       | * proprietary codebases have not been indexed by copilot (at least
       | a public version of it)
       | 
       | * arguably derived code will be used in proprietary programs
        
         | dleslie wrote:
         | Yah, not sure what is astonishing about outrage in response to
         | what appears to be a method for laundering GPL'd software.
         | 
         | Copilot ought only to have indexed public domain, wtf, and
         | other wide-open licensed software. They should remove all GPL'd
         | software from their model, even if that means retraining from
         | scratch.
        
           | TeMPOraL wrote:
           | It's not just GPL, they arguably should remove MIT, BSD and
           | most other Open Source software too, as it's hard to tell
           | when any given snippet crosses a threshold where the original
           | license demands attribution or other things. People seem to
           | forget that even MIT license has actual conditions in it.
        
           | dento wrote:
           | Not just GPL, even works with MIT/Apache/BSD license require
           | attribution
        
       | mrh0057 wrote:
       | Why is everyone ignoring what neural networks do? It is being
       | used as a context-aware pattern-matching search that predicts
       | what you will write next. Of course it's going to return
       | copyrighted works based on what you write.
       | 
       | It's a pattern matching algorithm; what exactly did they think it
       | was going to do?
        
       | maweki wrote:
       | The output of Copilot may not be a derivative work, but the
       | trained model surely is, right?
        
       | betwixthewires wrote:
       | > ...some commentators accuse GitHub of copyright infringement,
       | because Copilot itself is not released under a copyleft
       | licence...
       | 
       | This is not why. The issue at hand as I understand it is that
       | people using copilot will potentially have code snippets in their
       | work that are already licensed, under a license they do not know,
       | and that they will not license properly as a result.
       | 
       | That's in the first paragraph. If you enter this discussion with
       | an incorrect presumption from the outset I don't see how you can
       | form a valid defense.
       | 
       | > However, by doing so, the copyleft scene is essentially
       | demanding an extension of copyright to actions that have for good
       | reason not been covered by copyright.
       | 
       | No. Nobody is asking for an extension of copyright protection, we
       | are asking for the existing reach of copyright to be respected.
       | We built our licenses based on a ruleset that we were told is
       | fair. You don't get to violate rules _you_ made and then claim
       | that copyleft people only made their licenses as a workaround to
       | copyright and so are being hypocrites.
       | 
       | > Others focus on Copilot's ability to generate outputs based on
       | the training data. One may find both ethically reprehensible, but
       | copyright is not violated in the process.
       | 
       | The arguments I've heard are not that Microsoft is using publicly
       | available information to train its AI. The argument is that
       | people are potentially (and in some current cases demonstrably)
       | getting _copy pasted code snippets from licensed software._ If
       | you can't see the plainly obvious problem here it's because
       | you're trying not to.
       | 
       | Also a point made in the article, that machine generated things
       | cannot be copyrighted because copyright requires a creator, brings
       | up an interesting question as to whether works by people who used
       | copilot can be licensed at all.
        
       | boleary-gl wrote:
       | I'd agree with this conclusion if it wasn't clear that it is very
       | possible - if not common - for Copilot to just completely copy
       | code. That isn't fair use - that's a clear violation of copyright
       | regardless of license.
        
       | orthoxerox wrote:
       | Whether Copilot itself violates GPL or not is one issue.
       | 
       | Whether the code produced by Copilot violates GPL or not is a
       | whole different independent issue.
       | 
       | If I am walking down the street, find a piece of paper with code
       | on it, pick it up and add the code to my program and this code
       | turns out to be licensed under the GPL then my program becomes a
       | derivative work. It doesn't matter who wrote it on that piece of
       | paper, whether it's a 100% correct copy of the GPLed code or not
       | or if there are mistakes in it.
        
       | alfiedotwtf wrote:
       | Has anyone tried dumping the debugging symbols from a Microsoft
       | binary, e.g. explorer.exe, and tried to autocomplete^Wcopilot its
       | functions? It would be interesting to see how far Microsoft could
       | be pushed before they ate their own hat.
        
       | alkonaut wrote:
       | Whether Copilot infringes copyright is a muddy area. I personally
       | would like to think that a world where machines can be trained
       | on any data is easier to live in than one where trained machines
       | are tainted by the license of their input.
       | 
       | The interesting question however isn't whether Copilot infringes
       | copyrights, but whether those that _use_ copilot do.
        
         | Rapzid wrote:
         | One of the points being made is that in the worst-case scenario
         | of getting Copilot to repeat back verbatim chunks of code from
         | projects, something that's not its primary use case, it would
         | be a situation similar to a copy machine.
         | 
         | You can copy a page out of a book, or the whole book, and be
         | covered under fair use. But you can't sell your copy on Amazon.
         | And if you did, neither the copy machine nor Xerox would have
         | run afoul of copyright law.
         | 
         | You could also use a copy machine to copy fragments of the
         | Linux kernel source out of a book about the Linux source and
         | use them to construct an entirely original work that's not
         | considered derivative.
         | 
         | The devil's in the details, but GitHub talks at some length
         | about the plagiarism issue and their plans to detect and
         | link back to where verbatim chunks exist in the training data
         | to let the operator decide what to do, so... IDK.
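
To make the "detect and link back" idea above concrete, here is a minimal sketch of one way verbatim overlap could be flagged. It is not GitHub's actual method, which the thread does not describe. The function names (`shingles`, `verbatim_sources`), the 8-token window, and the tiny in-memory `corpus` dictionary are all assumptions made for illustration.

```python
import re


def shingles(text, n=8):
    """Return the set of all runs of n consecutive tokens in text."""
    # Crude tokenizer (an assumption): words, or single punctuation characters.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def verbatim_sources(suggestion, corpus, n=8):
    """Name every corpus entry sharing at least one n-token run with suggestion.

    corpus is a hypothetical mapping of source name -> source text.
    """
    wanted = shingles(suggestion, n)
    return sorted(
        name for name, source in corpus.items() if wanted & shingles(source, n)
    )


corpus = {
    "a.c": "int add(int a, int b) { return a + b; }",
    "b.py": "def mul(x, y):\n    return x * y",
}
suggestion = "static int add(int a, int b) { return a + b; } // helper"
print(verbatim_sources(suggestion, corpus))
```

A real system would need an index rather than a linear scan, and the window size trades false positives (common idioms) against missed matches. What to do with a match is still left to the operator, as the comment says.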
        
       | scotty79 wrote:
       | Don't you think that our world would be a far more relaxed and
       | flourishing place if lawyers kept their noses out of software
       | like they are keeping them out of math?
        
       | glitchc wrote:
       | I disagree with this article. GitHub Copilot is indeed infringing
       | copyright and not only in a grey zone, but in a very clear black
       | and white fashion that our corporate taskmasters (Microsoft
       | included) have defended as infringement.
       | 
       | The legal debate around copyright infringement has always
       | centered around the rights granted by the owner vs the rights
       | appropriated by the user, with the owner's wants superseding user
       | needs/wants. Any open-source code available on Github is
       | controlled by the copyright notice of the owner granting specific
       | rights to users. Copilot is a commercial product, therefore,
       | Github can only use code that the owners make available for
       | commercial use. Every other instance of code used is a case of
       | copyright infringement, a clear case by Microsoft's own
       | definition of copyright infringement [1][2].
       | 
       | Github (and by extension Microsoft) is gambling on the fact that
       | their license agreement granting them a license to the code in
       | exchange for access to the platform supersedes the individual
       | copyright notices attached to each repo. This is a fine line to
       | walk and will likely not survive in a court of law. They are
       | betting on deep lawyer pockets to see them through this, but are
       | more likely than not to lose this battle. I suspect we will see
       | how this plays out in the coming months.
       | 
       | [1] https://www.microsoft.com/info/Cloud.html
       | 
       | [2] https://github.com/contact/dmca
        
         | lacker wrote:
         | _Github (and by extension Microsoft) is gambling on the fact
         | that their license agreement granting them a license to the
         | code_
         | 
         | This is incorrect. First of all, GitHub isn't even the people
         | building the model. It's built by OpenAI, which has none of
         | these licenses. Secondly, the model is not built purely from
         | GitHub data. OpenAI is relying on fair use, not on a specific
         | license.
        
         | rlpb wrote:
         | > Github (and by extension Microsoft) is gambling on the fact
         | that their license agreement granting them a license to the
         | code in exchange for access to the platform supersedes the
         | individual copyright notices attached to each repo.
         | 
         | The person who has the account on Github and uploads code to
         | them rarely owns the copyright on all of the code, and
         | therefore doesn't have the right to delegate to Github any
         | further licensing permission.
        
         | lubujackson wrote:
         | "Copilot is a commercial product, therefore, Github can only
         | use code that the owners make available for commercial use."
         | 
         | IANAL, but this doesn't sound quite right. There is a
         | difference between "using" code (running it in a commercial
         | product) and manipulating it as arbitrary data within a
         | commercial product.
         | 
         | It definitely can be a gray area, but let's say I use Amazon's
         | service where I email a PDF to my Kindle - is it Amazon's
         | responsibility to know the copyright status of the PDF, or
         | mine? In both cases a commercial product is manipulating
         | copyrighted data for the benefit of a user.
        
           | emrah wrote:
           | Even if it's legal for Copilot to do what it does, does it
           | not violate GPL to take pieces of GPL'ed code and use them in
           | a commercial product?
        
             | aj3 wrote:
             | There are plenty of SAAS that use GPL'd code on the
             | backend. That's fine.
        
             | dminor wrote:
             | The basis of the GPL is copyright, so what you're really
             | asking is whether you can use part of a copyrighted work in
             | another work without infringing.
             | 
             | And the answer as always is "it depends".
        
               | danudey wrote:
               | If I use Copilot and it suggests a large block of GPL2'ed
               | code for my project, which I then include, then that is a
               | GPL2 license violation.
               | 
               | Whether the GPL2 will hold up in court, or whether the
               | courts will uphold this specific case (e.g. can you prove
               | intent? Do you need to?), is a separate issue entirely.
               | 
               | The next question is, can I use GPL'ed code in my product
               | and then claim that it was injected by Copilot to avoid
               | repercussions of my actions if caught?
        
             | electroly wrote:
             | The claim (which I'm not qualified to judge) is that this
             | use falls under fair use. The point of fair use is to allow
             | some use of copyrighted works even if the copyright owner
             | does not license it to you and even if the owner is
             | explicitly hostile towards your usage. If it is indeed fair
             | use, then the license doesn't matter because that's not the
             | thing that's allowing you to use the work.
        
             | moralestapia wrote:
             | Yes.
        
           | lelandbatey wrote:
           | Your example doesn't quite match what's happening in real
           | life though. You're not "using copilot as a mechanism to
           | ferry around code". Co-pilot is making recommendations for
           | what code to use and then also giving that exact code (the
           | text) to you. A more apt example would be if Amazon had some
           | UI which said "What kind of book do you want to read on your
           | kindle?", you click the button labeled "biography", and then
           | Amazon sends your Kindle an AI generated book which is the
           | biography of a famous person, and it _just so happens_ that
           | the "generated" book being sent to you is an exact copy of
           | someone else's book (or incorporates exact copies of
           | chapters/paragraphs of someone else's book), legal disclaimers
           | and all.
        
           | nxpnsv wrote:
           | The proprietary model is a representation of lots of
           | harvested open source code snippets. Without the model
           | copilot is nothing. Arguably, the code snippets are part of
           | the product....
        
           | to11mtm wrote:
           | Maybe you're right, maybe you're wrong.
           | 
           | I'll give the best example I can think of off the top of my
           | head: the one task that I would like some AI help with.
           | 
           | I would really like to replicate the functionality of Java's
           | SSLEngine, but for C#.
           | 
           | If I used Co-Pilot to help, at best, I would need to pay for
           | a legal team to do some form of 'clean room' review of
           | whatever was generated to make sure it did not infringe on
           | the OpenJDK code that is out there. At worst, I would be
           | having to defend myself from Oracle's legal team -anyway-.
           | 
           | And yeah, I'm assuming in this case that Copilot would be
           | 'smart' enough to make the right inferences from that Java
           | code and put it into a workable C# construct. Stepping back,
           | though, one could still ask the question: what's the risk of
           | a _Java_ developer accidentally getting some code that
           | matches OpenJDK a little too closely? There's an order of
           | magnitude difference between even a smaller AGPL developer
           | and Oracle.
           | 
           | If Microsoft/GH was willing to go to bat and agree to pay for
           | the defense of users of Copilot, I would be far less
           | concerned with the implications of all of this.
        
         | onion2k wrote:
         | If Copilot is infringing copyright by reproducing small samples
         | of the training data, and if we agree that that isn't
         | acceptable, doesn't that effectively spell the end of the road
         | for any and all AI generated content unless the developers
         | explicitly stop their product reproducing data that matches the
         | data it was trained on? That seems like it would have far
         | reaching consequences for AI as an industry.
        
         | leereeves wrote:
         | Doesn't everyone who uploads code to a public repo give
         | Microsoft/GitHub a license to (strike ~redistribute~) reproduce
         | that code?
         | 
         | If they didn't, GitHub itself would be violating copyright
         | every time someone browsed the repo.
         | 
         | And copilot appears to be a part of GitHub.
         | 
         | https://copilot.github.com/
         | 
         | So why wouldn't copilot itself be covered by that license?
         | 
         | (Certainly people using copilot would not. Let the user
         | beware.)
         | 
         | Edit: downvoted to death but the top reply shows that it's
         | true. An inconvenient truth, I suppose.
        
           | krono wrote:
           | From the GH TOS:
           | 
           | > _4. License Grant to Us_
           | 
           | > _This license does not grant GitHub the right to sell Your
           | Content. It also does not grant GitHub the right to otherwise
           | distribute or use Your Content outside of our provision of
           | the Service_
           | 
           | https://docs.github.com/en/github/site-policy/github-terms-o...
           | 
           | > _5. License Grant to Other Users_
           | 
           | > _If you set your pages and repositories to be viewed
           | publicly, you grant each User of GitHub a nonexclusive,
           | worldwide license to use, display, and perform Your Content
           | through the GitHub Service and to reproduce Your Content
           | solely on GitHub as permitted through GitHub's functionality
           | (for example, through forking)._
           | 
           | > _You may grant further rights if you adopt a license._
           | 
           | https://docs.github.com/en/github/site-policy/github-terms-o...
           | 
           | So yes, but only within GitHub.
           | 
           | Edit:
           | 
           | > _A. Definitions_
           | 
           | > _The "Service" refers to the applications, software,
           | products, and services provided by GitHub, including any Beta
           | Previews._
           | 
           | https://docs.github.com/en/github/site-policy/github-terms-o...
           | 
           | Sneaky bastards.
           | 
           | Edit: Formatting
        
           | ben0x539 wrote:
           | Not everything on Github was uploaded by the copyright
           | holders. Often enough, it's uploaded by people who only have
           | access to it under an open source license, so Github cannot
           | in general squeeze additional license terms out of the
           | uploader at that point.
        
             | leereeves wrote:
             | That's a good point; what is their obligation/liability in
             | that case?
        
           | glitchc wrote:
           | There's more than redistribution happening here. Co-pilot is
           | providing a value-add service where the open-source code is
           | an input and the output is a service. As it happens, the
           | service is actually regurgitating the code at this point, but
           | it's important to consider that even if it didn't regurgitate
           | the code verbatim, the fact that the service is making use of
           | that code to provide a value-add means the code is a crucial
           | input to the value proposition. Would Co-pilot be able to
           | provide the value-add without the source? Likely not.
           | 
           | Couple that with the fact that, presumably at some point in
           | the future, Co-pilot will come attached with a subscription
           | model (otherwise why do it in the first place?), and we have
           | the makings of a product that is commercially infringing on
           | copyright left, right and center.
        
           | emrah wrote:
           | I'm thinking it's not so much what is legal for Copilot to do
           | with code chunks from GPL'ed code, but what it means for end
           | users (i.e. developers at for-profit companies) to
           | incorporate those chunks into commercial products
        
           | moralestapia wrote:
           | No.
           | 
           | Edit: Sorry downvoters, whether you like it or not, you don't
           | understand the terminology. You're confusing _reproduction_
           | with _redistribution_.
        
             | leereeves wrote:
             | I'm not a lawyer so it's entirely possible I used the wrong
             | term. Thank you for clarifying below.
             | 
             | Using the terms as you explained them below, I meant that
             | Microsoft/GitHub has permission to _reproduce_ the code so
             | why wouldn't that extend to copilot?
        
               | blooalien wrote:
               | Are they displaying the _license_ under which said code
               | is licensed when they display a chunk of licensed code?
               | If not, then they're violating the terms of most
               | licenses (except pure public domain, or other similar
               | licenses which don't have any such requirements
               | attached).
               | 
               | The use of licensed code in other projects must be done
               | under the terms of that license or you aren't legally
               | (under copyright law) allowed to use the code.
        
               | leereeves wrote:
               | As I said, I'm not a lawyer, but I believe they're
               | displaying it under the terms of the GitHub ToS, using
               | rights granted to them when the project is uploaded to
               | GitHub, not under the terms of the license the project
               | uses for everyone else.
        
               | moralestapia wrote:
               | _Reproduction_ is enough to cover the first part of your
               | use case. This is mentioned in Github's TOS.
               | 
               | For the latter you would need _redistribution_ as it is
               | going into a different product, for which you claim
               | ownership, and with possible modifications/adaptations
               | (this would depend on the rights granted by the license).
               | Nowhere in Github's TOS is the word or concept of
               | _redistribution_ referenced.
               | 
               | So, the answer to your original question is "no".
               | 
               | Edit: leereeves modified their comment after I wrote this,
               | so it may not make much sense but you can figure out the
               | point. Best!
        
               | dahart wrote:
               | I'm not sure this is a completely fair take, I think the
               | original question is legitimate and relevant. Github's
               | TOS does in fact ask the contributor to grant a license
               | for GH to host and serve their code from GH servers. That
               | is both reproduction and distribution as defined by
               | copyright law, and copyright covers both of those at the
               | same time https://www.copyright.gov/what-is-copyright/
               | 
               | (Edit and BTW GH calls out their 'distribution' in
               | section D.4 of their TOS explicitly, but without using
               | the word "distribute". They say you grant them the right
               | to "publish" and "share" code you upload, which means
               | "distribute" under copyright law. They also imply that by
               | spelling out the terms under which they do not
               | "distribute", which is anytime the content is used
               | outside of GitHub's services.)
               | 
               | I don't think you're correct that the term
               | "redistribution" means going into another product, or
               | that it implies a claim of ownership. Putting works
               | into another product is sometimes known as making a
               | _derivative_ work, while "redistributing" is quite
               | commonly used to mean copy-and-distribute as-is.
               | Redistribution can happen via license as well, it
               | requires permission by the copyright owner, but does not
               | imply the redistributor is (or is claiming to be) the
               | copyright owner.
        
               | moralestapia wrote:
               | >I think the original question is legitimate and relevant
               | 
               | You didn't see the original question, it was edited, so
               | we cannot discuss that further.
               | 
               | "[...] which means "distribute" under copyright law" <--
               | Citation needed please, because I don't think that's
               | correct.
               | 
               | From the site you linked:
               | 
               | "Distribute copies or phonorecords of the work to the
               | public by sale or other transfer of ownership or by
               | rental, lease, or lending."
               | 
               | What I seem to grasp about the difference between
               | _reproducing_ and _redistributing_ is that it has to do
               | with the concept of "transfer of ownership". Also
               | _derivative work_ and _redistribution_ are not mutually
               | exclusive.
               | 
               | The moment you create a new thing and start
               | _distributing_ it (even if you do not modify it), you
               | become the de facto owner of that new product, and
               | copyright law is trying to limit the extent of the rights
               | that apply there. So, in the case of music, it's a
               | different thing to play (_reproduce_) a song than to
               | create a new album with your favorite artists that
               | happens to include that particular song
               | (_redistribution_).
        
               | dahart wrote:
               | > "Distribute copies or phonorecords of the work to the
               | public by sale or other transfer of ownership or by
               | rental, lease, or lending."
               | 
               | > What I seem to grasp about the difference between
               | reproducing and redistributing is that it has to do with
               | the concept of "transfer of ownership". Also derivative
               | work and redistribution are not mutually exclusive.
               | 
               | What you've misunderstood is it is the _copies_ that are
               | sold, not the copyrights.
               | 
               | * edit
               | 
               | > create a new album with your favorite artists that
               | happens to include that particular song (redistribution).
               | 
               | This is not what redistribution means. You seem confused
               | about this word.
        
               | moralestapia wrote:
               | >What you've misunderstood is it is the copies that are
               | sold, not the copyrights.
               | 
               | Sorry, I'm not following you anymore. I don't even know
               | what you mean by that sentence.
               | 
               | Edit:
               | 
               | >This is not what redistribution means. You seem confused
               | about this word.
               | 
               | But, that's exactly what redistribution entails ...
        
               | dahart wrote:
               | > Sorry, I'm not following you anymore. I don't even know
               | what you mean by that sentence.
               | 
               | The transfer of ownership you referred to is a transfer
               | of ownership of a copy, it is not a transfer of ownership
               | of the original work itself. You misunderstood the
               | passage you quoted to mean that redistribution is
               | transferring ownership of the work itself, as in
               | copyright ownership of the work. But the text you quoted
               | is only talking about transferring ownership of the
               | copies. The text you chose makes more sense in the
               | context of physical copies of books or "phonorecords".
        
               | [deleted]
        
             | dahart wrote:
             | > You're confusing reproduction with redistribution.
             | 
             | It seems like you're confused; GitHub's terms require users
             | to grant both of those. Copyright law also covers both.
        
               | moralestapia wrote:
               | >GitHub's terms require users to grant both of those
               | 
               | Last time I checked (about an hour ago), that wasn't
               | true. Feel free to provide evidence to support your
               | argument.
        
               | dahart wrote:
               | > Last time I checked (about an hour ago), that wasn't
               | true. Feel free to provide evidence to support your
               | argument.
               | 
               | https://docs.github.com/en/github/site-policy/github-terms-o...
               | 
               | "publish" and "share" mean redistribution. "Store" and
               | "copy" mean reproduce.
        
               | moralestapia wrote:
               | >"publish" and "share" mean redistribution
               | 
               | No. That's something you believe, but it's not
               | necessarily true.
               | 
               | Check here,
               | https://copyrightalliance.org/faqs/what-rights-copyright-own...
               | 
               | Again, distribution has to do with a transfer of
               | ownership. In layman's terms, Github can _show_ your code
               | to others but it cannot _give_ (as in ownership) your
               | code to them. It's a bit tricky here since on the web
               | showing something literally means making a copy at some
               | point, but try to view things under the light of "who
               | owns what" and it's a bit easier to grasp.
               | 
               | If you browse through someone's repository, it's pretty
               | clear who the owner of that code is, if a program gives
               | you a chunk of code that it "got from somewhere" there's
               | definitely some sort of change of ownership operation
               | going on; which in this case is interesting, as it went
               | from _attributed to someone_ to _missing_.
        
         | maximilianroos wrote:
         | > GitHub Copilot is indeed infringing copyright and not only in
         | a grey zone, but in a very clear black and white fashion
         | 
         | You seem to be confusing what you'd like the law to be with
         | what the law is.
        
           | drran wrote:
           | Here is an explanation of the law:
           | https://www.copyright.gov/fair-use/more-
           | info.html#:~:text=Fa....
           | 
           | Effect of the use upon the potential market for or value of
           | the copyrighted work: Here, courts review whether, and to
           | what extent, the unlicensed use harms the existing or future
           | market for the copyright owner's original work. In assessing
           | this factor, courts consider whether the use is hurting the
           | current market for the original work (for example, by
           | displacing sales of the original) and/or whether the use
           | could cause substantial harm if it were to become widespread.
        
         | progval wrote:
         | > Any open-source code available on Github is controlled by the
         | copyright notice of the owner granting specific rights to
         | users.
         | 
         | and by the GitHub ToS:
         | 
         | > You grant us and our legal successors the right to store,
         | archive, parse, and display Your Content, and make incidental
         | copies, as necessary to provide the Service, including
         | improving the Service over time. This license includes the
         | right to do things like copy it to our database and make
         | backups; show it to you and other users; parse it into a search
         | index or otherwise analyze it on our servers; share it with
         | other users; and perform it, in case Your Content is something
         | like music or video.
         | 
         | https://docs.github.com/en/github/site-policy/github-terms-o...
        
           | TomVDB wrote:
           | Does the GitHub ToS matter when I upload code that was
           | written by somebody who doesn't use GitHub?
        
             | progval wrote:
             | Then you would be the one infringing their copyright, and
             | they could probably sue you.
             | 
             | Although I'm curious about what GitHub would do if the
             | original author asked them to remove the work from Copilot.
             | Retrain from scratch every month or so, to remove last
             | month's DMCAed content?
        
           | GoOnThenDoTell wrote:
           | Not everyone whose code ends up on GitHub has agreed to this
           | set of terms
        
             | progval wrote:
             | See my answer to
             | https://news.ycombinator.com/item?id=27741709
        
           | eCa wrote:
           | > > as necessary to provide the Service
           | 
           | I would consider Copilot to not be part of "the Service"[1],
           | but at least currently[2] the definition of "the Service" is
           | so vague as to include anything that Github does.
           | 
           | Maybe they consider Copilot to be a "search index" and the
           | suggestions "[sharing] [Your Content] with other users".
           | 
           | [1] Since, as I understand it, it will require separate
           | payment.
           | 
           | [2] The ToS is currently last edited 2020-11-16, and does not
           | contain the word "Copilot"
        
         | j4yav wrote:
         | The part that feels really obvious to me is that, if I made an
         | AI that could generate music by looking through the entire
         | (copyrighted) back catalog of the Beatles for example, and it
         | would output music that I could control to be very much or even
         | exactly like the original recordings, or I could accidentally
         | do it, that it wouldn't really be a way to launder the original
         | licenses/copyright into the public domain.
         | 
         | Or maybe it is, but if so it essentially means the end of
         | licensing because it would be trivial to make an AI that can
         | take an input and produce the same output. Or maybe even cp is
         | good enough to strip the source of its original license in that
         | case.
         | 
         | Open source licenses are worth protecting or you break the
         | cycle that helps more software be open.
        
           | Jarwain wrote:
           | Wouldn't the parallel be closer to having an ai remix a bunch
           | of songs together?
        
           | aj3 wrote:
           | You're giving AI too much credit, it's just a tool, it does
           | not have its own intentions.
           | 
           | I.e. if you buy a piano or a guitar, you could play and
           | record copyrighted music on it. That's not the piano's or
           | guitar's fault though, it's yours.
        
             | alex_c wrote:
             | Funny you should say that, as there is a direct line
             | connecting player pianos in the 19th century to copyright
             | law in the 21st:
             | 
             | https://en.m.wikipedia.org/wiki/Mechanical_license
        
             | [deleted]
        
             | sillysaurusx wrote:
             | It sounds like you might be the one giving it too much
             | credit. AI is a glorified Markov chain, which is
             | essentially a compression algorithm. I agree that it can be
             | an instrument (I've done it:
             | https://soundcloud.com/theshawwn/sets/ai-generated-
             | videogame...) but it's almost trivial to train a model that
             | memorizes by rote.
             | 
             | Suppose a model was trained solely on a single Beatles
             | album. It could only spit out that album. That would be
             | clear infringement, wouldn't it?
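
The rote-memorization claim above can be illustrated with a toy model. The sketch below is a plain order-2 Markov chain, not GPT-3 or Copilot, and the one-sentence corpus and the names `train` and `generate` are made up for illustration. When every two-token state in the training text has exactly one successor, the random walk has no choice but to replay the training text.

```python
import random
from collections import defaultdict


def train(tokens, order):
    """Map every run of `order` tokens to the tokens observed right after it."""
    table = defaultdict(list)
    for i in range(len(tokens) - order):
        table[tuple(tokens[i:i + order])].append(tokens[i + order])
    return table


def generate(table, seed, length, rng):
    """Random-walk the table, starting from `seed`, for up to `length` tokens."""
    out = list(seed)
    order = len(seed)
    while len(out) < length:
        followers = table.get(tuple(out[-order:]))
        if not followers:  # state never seen in training: nothing to emit
            break
        out.append(rng.choice(followers))
    return out


# Hypothetical one-document "training set".
corpus = "the quick brown fox jumps over the lazy dog".split()
table = train(corpus, order=2)
sample = generate(table, seed=corpus[:2], length=len(corpus), rng=random.Random(0))
print(" ".join(sample))  # replays the training sentence word for word
```

With `order=1` the state "the" has two possible successors, so the walk can diverge from the original. The smaller the training set relative to the model's capacity, the less room there is for anything but recitation.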
        
               | toxik wrote:
               | It's funny that people say it's a glorified Markov chain.
               | 
               | No. It's not. A Markov chain has some very specific
               | properties that are absolutely not fulfilled by GPT-3
               | models.
               | 
               | Just say "stochastic" if you want a buzzword. Stop
               | appropriating Markov chains.
        
               | sillysaurusx wrote:
               | "It's a stochastic" doesn't flow, though I guess I could
               | use "stochastic random walk."
               | 
               | What properties does a gpt-3 model have that a Markov
               | chain doesn't? (Other than effectiveness.)
        
               | exdsq wrote:
               | ANNs aren't Markov chains
        
             | alkonaut wrote:
             | A typewriter vs. a machine that recites paragraphs of
             | Shakespeare are two different things.
        
               | j4yav wrote:
               | Neither of them unbinds the content from the original
               | license, though.
        
             | koonsolo wrote:
             | When I press 1 key and it plays copyrighted music, that is
             | the piano's fault.
        
               | aj3 wrote:
               | Even if hypothetically there was such a strange bug in
               | your piano and you decided to exploit it by recording
               | copyrighted music and redistributing it, you would be
               | accountable for it, not the piano.
               | 
               | This analogy train went too far, don't you think? All
               | examples that I've seen on Twitter require quite an
               | intentional manipulation by a human for Copilot to produce
               | something copyrighted. It does not recite Linux code by
               | pressing 1 key.
        
               | hnmullany wrote:
               | If you have an electronic piano that requires a complex
               | series of button pushes to produce copyrighted music -
               | that's still a copyright violation. Copyright law has no
               | notion that the difficulty of reproducing copyrighted
               | content affects the fact of a violation.
        
               | akersten wrote:
               | > an electronic piano that requires a complex series of
               | button pushes to produce copyrighted music
               | 
               | Surely a judge presented with the "complex series of
               | button pushes," otherwise known as playing an instrument,
               | would hold the player accountable for any infringement
               | and not the piano?
               | 
               | These analogies have gone so far off the rails that I
               | can't tell which side this thread is arguing for by now
               | ;)
        
               | b3morales wrote:
               | I think the whole swirling discussion is a little
               | confused because there are potentially two "ends" where
               | infringement could happen, and different people are
               | talking about each. And the article covers both.
               | 
               | One end is GitHub's, at the input: Copilot's "database"
               | was initialized from code that GitHub does not have
               | copyright to. The contention at this end is that they are
               | ignoring the licenses that would grant them the right to
               | use that code.* The article, GitHub, and others assert
               | that there's no copyright issue for creating a database
               | of this kind (a machine learning model).
               | 
               | The other end is the developer taking Copilot's
               | output. The article seems to take the (absurd IMO)
               | position that there are also no copyright implications
               | here, because the output _is not copyrightable_ at all.
               | 
               | *And personally this is the side that concerns me most.
        
               | [deleted]
        
               | [deleted]
        
               | dahart wrote:
               | If you have a piano that plays copyrighted music when you
               | press a single key, isn't that the piano _maker's_ fault?
               | 
               | Edit - googling, the history of player pianos vs
               | copyright is interesting
               | 
               | https://slate.com/technology/2014/05/white-smith-music-
               | case-...
               | 
               | https://www.techdirt.com/articles/20100712/18325210185.shtml
        
           | freshhawk wrote:
           | "Or maybe it is, but if so it essentially means the end of
           | licensing because it would be trivial to make an AI that can
           | take an input and produce the same output."
           | 
           | Yes, this is what is pretty interesting to me. I said in a
           | previous comment that I have a really good OS generating AI.
           | It asks you your favorite color and outputs a disk image you
           | can use as an installer.
           | 
           | Right now it just happens to output a cracked version of
           | Windows if you answer "blue". Who can know how that happened?
           | It's a black box after all. Seems useful though, since
           | Microsoft is loudly saying that if I distributed this it
           | would have no license problems at all.
        
           | remus wrote:
           | I think the main point that the article makes is that for
           | copyright to work you need some notion of a creative work,
           | and so far it's generally accepted that snippets like
           | 
           |     i = i + 1
           | 
           | aren't creative enough to be covered by copyright. The
           | interesting point is where you draw the line between what's
           | boilerplate and what's creative, and legally it will
           | presumably come down to showing that copilot crosses that
           | line egregiously enough for someone to think they've got a
           | successful chance at legal action.
        
             | b3morales wrote:
             | Sure, but GitHub's own promotional pages (pretty much any
             | of the gifs on https://copilot.github.com/ as well as other
             | articles, e.g.
             | https://docs.github.com/en/github/copilot/research-
             | recitatio...) show it producing much more elaborate
             | segments than that.
             | 
             | In fact, that's a crucial selling point for the product.
        
             | j4yav wrote:
             | Since that article was written, people have shown it will
             | generate quite long coherent sections. It will even
             | generate someone's private about me page:
             | https://twitter.com/kylpeacock/status/1410749018183933952?s=...
        
               | nanna wrote:
               | But that about me page is the very definition of
               | boilerplate text, so really it only gives weight to the
               | argument that it's _not_ producing original work.
        
               | dmurray wrote:
               | You got downvoted, but I kind of like this argument.
               | There are a million "about me" pages, but Copilot did a
               | good job of picking one for "generic software engineer".
               | If it could just have changed a word or two to a synonym,
               | it would be great.
        
               | phire wrote:
               | Bullshit.
               | 
               | That's not an existing about me page. You can go to
               | davidcelis' website and verify that it's completely
               | different.
               | 
               | Copilot just picked a random person and linked to their
               | social media accounts. You can search any large quote
               | within that about me on Google and not find a match, it
               | is unique.
               | 
               | The only two examples of generating large sections of
               | copyrighted work are the quake floating point hack and
               | the zen of python. Both those examples are commonly known
               | and copied and talked about, to the point that they have
               | wikipedia pages.
        
             | thayne wrote:
             | But as I understand it, copilot can generate much longer
             | snippets, even entire functions.
             | 
             | I think the big question is, if copilot ends up copying
             | significant portions of a GPL work, not just tiny snippets, is
             | the resulting work infringing, and if so, who is liable?
        
           | bdowling wrote:
           | > and it would output music that I could control to be very
           | much or even exactly like the original recordings, or I could
           | accidentally do it, that it wouldn't really be a way to
           | launder the original licenses/copyright into the public
           | domain.
           | 
           | The test for non-literal copyright infringement is
           | "substantial similarity." If, after filtering out irrelevant
           | and non-copyrightable elements, the allegedly-infringing work
           | is substantially the same as the original work, then it
           | infringes. If it infringes, then two common defenses are
           | independent creation and fair use.
           | 
           | In your hypothetical, the AI-generated work would infringe
           | the original because you stated it would be substantially the
           | same as the copyrighted work. You can't claim independent
           | creation because the algorithm was dependent on the original
           | work and you controlled the output of the algorithm to be
           | exactly like the original work. Fair use is pretty much a
           | non-starter, so I'll skip that analysis.
           | 
           | So, no, you couldn't use an AI to launder copyrighted works
           | into the public domain.
        
             | Aeolun wrote:
             | Unless you are GitHub, in which case having your AI copy
             | code verbatim is ok?
        
         | DarkmSparks wrote:
         | There were two parts to the argument which seem to hold water.
         | 
         | 1. Any code generated by Copilot is likely to be AGPL.
         | 
         | 2. Since the authors of Copilot used the Copilot beta to make the
         | Copilot release, Copilot is very likely using AGPL-licenced code
         | and is therefore in breach of the AGPL licence.
         | 
         | so yep, article looks flawed.
        
       | makecheck wrote:
       | Of _course_ derivative works are being produced!! Whether you
       | blame Copilot or the developer using it, the result is something
       | that required the original developer of the code in order to be
       | constructed.
       | 
       | Have we reached the point where every "class X" must become
       | "class X_GPL2_CopyrightJohnQSmith_AllRightsReserved" in every
       | code base out there? Do we need to go from header comments at the
       | top of a file to reminder comments at the end of every line?
        
       | truffdog wrote:
       | If Microsoft is confident that Copilot is not a parrot, they
       | should include their proprietary codebases in the training
       | database.
        
         | BuildTheRobots wrote:
         | Does anyone know which codebases got included? I get the
         | impression copilot scraped github - but as it's an internal
         | tool, did it only scrape public repos or have private repos also
         | been slurped?
        
           | xdennis wrote:
           | There are torrents of leaked Windows source code. Someone
           | with access to Microsoft Copilot could try to see if it
           | reproduces the code there.
        
         | dleslie wrote:
         | If they truly believe copilot does not produce derivative
         | works, then there is no downside to indexing their own code in
         | its entirety; it would probably improve copilot's behaviour.
         | 
         | Well, Microsoft, show us you believe your own arguments!
        
           | hnfong wrote:
           | A counter argument that Microsoft can use is: "The code we
           | write at Microsoft is so bad that it will decrease the
           | quality of the generated output". :)
        
       | tyingq wrote:
       | Guess round 2 will have Copilot dumping to AST, changing function
       | and variable names, then dumping back to source.
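
A minimal sketch of the round trip described above, using Python's standard `ast` module (`ast.unparse` needs Python 3.9 or later). The sample function and the rename table are hypothetical, and nothing here reflects how Copilot actually works. Note that only the identifiers change: the structure and behaviour of the code come out identical.

```python
import ast


class Renamer(ast.NodeTransformer):
    """Rewrite function, argument, and variable names using a fixed mapping."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_FunctionDef(self, node):
        node.name = self.mapping.get(node.name, node.name)
        self.generic_visit(node)  # descend into arguments and body
        return node

    def visit_arg(self, node):
        node.arg = self.mapping.get(node.arg, node.arg)
        return node

    def visit_Name(self, node):
        node.id = self.mapping.get(node.id, node.id)
        return node


# Hypothetical input and rename table.
source = "def add(a, b):\n    total = a + b\n    return total\n"
mapping = {"add": "f0", "a": "v0", "b": "v1", "total": "v2"}

tree = Renamer(mapping).visit(ast.parse(source))  # source -> AST -> renamed AST
renamed = ast.unparse(tree)                       # AST -> source again
print(renamed)
```

A blanket mapping like this ignores scoping, attributes, and imports, so it is only safe on a toy input. A real tool would need proper scope analysis.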
        
       | sprafa wrote:
       | Amazing how this was never an issue when other "AI" systems use
       | other people's data to learn how to drive cars/write text. But
       | man you start messing with developer data and suddenly there are
       | ethical issues! Amazing turnaround.
       | 
       | Face it - AI as we currently call it is just a very sophisticated
       | data sorting algo in most cases (let's ignore the AlphaZero non
       | supervised learning type). Everyone was celebrating when
       | Common Man was destroyed by devs commoditising their knowledge
       | through data capture. But now suddenly it's a problem! Mess with
       | a man's pocket.
        
         | ben0x539 wrote:
         | How was this never an issue? People including devs have been
         | upset by AI and data mining for a long time.
        
           | ramraj07 wrote:
           | Show me a 700 comment HN thread about people worried about
           | gpt violating copyright.
           | 
           | Then show me 20 such threads because that's what's been spewed
           | here.
        
           | sprafa wrote:
           | not in this forum, they were rubbing their hands in glee at
           | insane AI valuations.
        
         | Rapzid wrote:
         | > GPT-3 trained on the ENTIRE INTERNET
         | 
         | Red carpet.
         | 
         | > Copilot trained on publicly available source code
         | 
         | Pitch forks.
        
       | dj_mc_merlin wrote:
       | I think a good deal of engineers here should familiarize
       | themselves with Julia Reda and her work and ask themselves if
       | they have the legal knowledge to debate on this matter. Common
       | knowledge is not acceptable to determine truth.
       | 
       | Would you really respect the opinion of some dude who's only used
       | Excel about your profession?
        
         | josefx wrote:
         | She cites GitHub's smallest excerpt claim for her reasoning when
         | we already know that the tool happily reproduces entire
         | functions with comments verbatim.
         | 
         | Also her claims about machine generated code have a really
         | funny interaction with the cp command. Clearly cp
         | MicrosoftWindows11Source.zip FreeWindows.zip is not a creative
         | process, cp is a command executed by a machine hence the
         | contents of FreeWindows.zip are now entirely public domain. Man,
         | where was she when people were sued over creating entire
         | libraries of public domain movies using BitTorrent?
        
           | ramraj07 wrote:
           | Just like you accuse her of being out of date with recent
           | findings, y'all seem conveniently out of date with GitHub's
           | assurance that they will be adding checks to not regurgitate
           | full chunks of code. So what exactly is your point then?
        
         | ramraj07 wrote:
         | Won't work. This tool is attacking them like the presence of a
         | vegan attacks some hardcore meat eaters. They might realize
         | deep down that this is not an argument they can win but it
         | offends their core existence in some ways so they can't help
         | but die defending their incoherent arguments.
         | 
         | Ethical or not, it's clear Microsoft isn't going to get into
         | real legal trouble due to this, and if the tool is genuinely
         | useful, it's going to "allow the laundering of GPL code" into
         | companies, whatever that means.
         | 
         | If that offends people then they better learn the lesson and
         | not produce open source any more. I'm not happy, but if that's
         | the direction the natural progression of things takes, whatever,
         | let's see where that goes.
        
       | Causality1 wrote:
       | _The output of a machine simply does not qualify for copyright
       | protection - it is in the public domain._
       | 
       | Is it just me or is that a patently ridiculous statement? The
       | output of a machine belongs to the person owning/using the
       | machine. If I use a digital camera to take a picture of a
       | copyrighted image I'm still committing copyright infringement
       | despite the output being created by a machine and a bunch of
       | image processing software.
        
       | dr_kiszonka wrote:
       | My less lofty personal gripe with Copilot is as follows. I worked
       | hard to produce quality code. GitHub will make money off my code.
       | Copilot users will make money using my code. I - the creator -
       | will make nothing.
       | 
       | At the very least, I should have been asked whether my code can
       | be used by Copilot and I should get at least a share of the profit
       | Copilot generates every month, where the share equals my code
       | / all training code used by Copilot. The latter part could be
       | gamed by other developers in the future, but it's the best I
       | could come up with.
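
The proposed split is simple pro-rata arithmetic: payout = (my code / all training code) x monthly profit. The sketch below only makes that formula concrete. The unit of "code" (lines, bytes, tokens) is an assumption left to the caller, every number is invented, and `monthly_share` is a hypothetical name.

```python
from fractions import Fraction


def monthly_share(my_units, total_units, monthly_profit):
    """Pro-rata payout: (my code / all training code) * profit."""
    if total_units <= 0 or not 0 <= my_units <= total_units:
        raise ValueError("need total_units > 0 and 0 <= my_units <= total_units")
    return Fraction(my_units, total_units) * monthly_profit


# All figures are made up for illustration.
payout = monthly_share(my_units=2_000, total_units=1_000_000_000,
                       monthly_profit=1_000_000)
print(payout)  # 2

# The gaming risk mentioned above: padding a repository with 2,000 extra
# units nearly doubles that contributor's claim on the same profit pool.
padded = monthly_share(my_units=4_000, total_units=1_000_002_000,
                       monthly_profit=1_000_000)
print(float(padded))
```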
        
         | CyberRabbi wrote:
         | A determination of fair use does take commercialization into
         | account so this is a fully valid concern. GitHub is explicitly
         | looking to profit from the work of others.
        
         | FeepingCreature wrote:
         | If you didn't want your code to be reused or even
         | commercialized by others, you really shouldn't have made it
         | open source.
        
           | heavyset_go wrote:
           | I'm okay with reuse and commercialization as long as the
           | licensing terms of my code are adhered to. That means proper
           | attribution, distribution of copyright notice and license,
           | and making modified code available to users. Copilot does
           | none of that.
        
           | ben0x539 wrote:
           | My understanding is that Github's argument is that their use
           | of the code to train Copilot is fair use. As such, whether
           | the code in question has been released as open source only
           | matters to the extent that it makes it more convenient for
           | Github to access it, but the argument would work as well for
           | a proprietary codebase.
           | 
           | Edit: I just skimmed the copilot blurb again, they seem to
           | refer to "publicly available" sources and not open source
           | code as their input.
        
             | drran wrote:
             | Yes, this is M$ argument, but it's not backed by law. See
             | [0].
             | 
             | "Publicly available" is not "in public domain". Many
             | commercial songs are "publicly available" via a radio
             | station.
             | 
             | [0]: https://www.copyright.gov/fair-use/more-
             | info.html#:~:text=Fa.... .
        
           | [deleted]
        
           | swiftcoder wrote:
           | If my code isn't released under a permissive license, then I
           | might have the expectation that those wishing to use my code
           | for commercial purposes will contact me and pay for a
           | commercial license.
           | 
           | This is sort of the whole point of non-commercial licensing
           | (and often, of the GPL itself, since many potential licensors
           | don't wish to deal with GPL restrictions).
        
             | IshKebab wrote:
             | Sure but did you have the expectation that people wouldn't
             | read your code and learn from it? I think even non-
             | commercial licensing can't prevent that.
             | 
             | If your code is so super-special that you don't want people
             | to read it and go "ah that's a neat linked list reversal
             | algorithm" or whatever then your only options are software
             | patents or keeping it entirely closed source.
             | 
             | Maybe trade secrets, but they tend to apply in very very
             | limited circumstances. I doubt any software would qualify.
        
               | drran wrote:
               | Yes, people can read my open-sourced code and learn from
               | it, like they can do with paintings, movies, sculptures,
               | and books.
               | 
               | No, I don't allow my code to be copied freely.
               | 
               | Can you explain what point you are trying to defend
               | here?
        
         | visarga wrote:
         | > I - the creator - will make nothing.
         | 
         | Did you benefit from reading code in your education? Pass it
         | forward! You will benefit many people, don't cut the rope under
         | you. And in turn you will also get the same benefit, and
         | adapted to your needs.
        
       | yunohn wrote:
       | Here we go again, a legal expert weighs in with a long and
       | detailed post about Copilot;
       | 
       | And HN rallies to criticize it because Copilot can reproduce some
       | snippets when forced to.
        
         | joshuaissac wrote:
         | > a legal expert weighs in [...] And HN rallies to criticize it
         | 
         | That is an appeal to authority. Being a legal expert does not
         | excuse one's writing from critical analysis. In this case, the
         | post does not address Copilot reproducing large segments of
         | copyrighted code verbatim. That is valid criticism.
        
           | yunohn wrote:
           | It is not an appeal to authority. I'm saying the expert is
           | providing a legal explanation, and HN is throwing anecdotes
           | around.
           | 
           | There is no logical fallacy since HN refuses to even have a
           | logical discussion about Copilot.
        
             | IncRnd wrote:
             | The GP is correct. It is a logical fallacy that by
             | definition is an appeal to authority, and this is the
             | logical discussion.
        
               | floatingatoll wrote:
               | This discussion is heavily biased and prioritizes
               | people's emotional need to be credited and/or paid for
               | their work over a discussion of the legal and ethical
               | concerns at play here. It disregards the comments of an
               | expert in the field and focuses instead on demands that
               | may well be unsupported by copyright law.
               | 
               | For example, GitHub license section D.4 specifically
               | grants GitHub the right to display your content, analyze
               | your content, and reproduce it in full to other users of
               | the service. Yet no one seems particularly interested in
               | discussing that here today, because it isn't compatible
               | with the outrage that people are prioritizing on HN when
               | discussing Copilot.
               | 
               | I would have expected HN to be better than Reddit in this
               | regard, but I'm not seeing it yet. I don't know if the
               | expert is right or wrong here, but nothing in today's
               | comments suggests anything new or curious that hasn't
               | already been ranted about in every prior thread about
               | this topic. I specifically care about copyright law and
               | it's disappointing to see HN having a group tantrum
               | instead of a discussion.
               | 
               | https://docs.github.com/en/github/site-policy/github-
               | terms-o...
        
               | ghaff wrote:
               | The legal commentary I'm seeing from people who really
               | know this stuff is pretty much unanimously in favor of
               | this being legal in at least most of the world based on
               | caselaw--while acknowledging why some might have ethical
               | concerns.
               | 
               | I'm actually sort of curious as to the vigor of the
               | backlash. Because Microsoft? Because of concerns about
               | perceived further undermining of the GPL in particular?
               | Because of people anxious to get their credit?
               | Because...?
        
               | floatingatoll wrote:
               | Because they're not getting a share of GitHub's future
               | revenues from their works or from derivations of their
               | work.
               | 
               | (Why do they care so much about revenue? Open source
               | coders and 'starving artists', not to mention Covid
               | economic wreckage, the US approach to medical insurance,
               | and the total absence of Universal Basic Income in
               | virtually all countries permitted to access GitHub.)
        
               | ghaff wrote:
               | So don't open source it and/or put it on GitHub?
        
               | floatingatoll wrote:
               | The latter, it turns out, is more important than the
               | former.
        
               | IncRnd wrote:
               | > I'm actually sort of curious as to the vigor of the
               | backlash. Because Microsoft? Because of concerns about
               | perceived further undermining of the GPL in particular?
               | Because of people anxious to get their credit?
               | Because...?
               | 
               | Because, this is really against the understanding of what
               | was possible for copyrighted works. So, now that this is
               | possible for anyone, copyright will start to get examined
               | and hopefully updated to be useful in today's
               | environment.
               | 
               | There are about a million problems with this.
               | 
               | This can even be used to intentionally launder source
               | codes from a competitor. Apparently, all it will take
               | will be to steal code (or just fork it), then create more
               | than 10 copies on Github. At that point, copilot will
               | start to emit the code during use. With all the legal
               | commentary saying this isn't infringement, imagine how
               | companies will be able to use this product.
               | 
               | Similarly, the training set can be intentionally
               | polluted, so your competitor finds the output of Copilot
               | worthless.
        
               | IncRnd wrote:
               | > For example, GitHub license section D.4 specifically
               | grants GitHub the right to display your content, analyze
               | your content, and reproduce it in full to other users of
               | the service. Yet no one seems particularly interested in
               | discussing that here today, because it isn't compatible
               | with the outrage that people are prioritizing on HN when
               | discussing Copilot.
               | 
               | Well, Copilot isn't really an analysis and display of
               | the source code within the original meaning that people
               | held. That was meant more to run codeql, github actions,
               | and other analysis while presenting the results in a
               | repository to people. People never anticipated that
               | github would strip their licenses from files and present
               | their source code inside of VSCode for people to use
               | freely. It may be legal, but what we are seeing now is an
               | abuse of the sentences you just quoted that goes outside
               | what they were originally understood to mean.
        
               | floatingatoll wrote:
               | Is it fair use to remix two musical albums into a new
               | derivative work, that cannot plausibly be judged to
               | replace demand for either original work?
               | 
               | Is it fair use to autogenerate GIFs from movies, perhaps
               | the most protected digital works on the Internet today,
               | in order to use them as reaction memes on Imgur?
               | 
               | Is it fair use to autoextract code fragments from a code
               | base, in order to use them as suggestions on GitHub?
               | 
               | The Internet, and I imagine HN, was in an uproar when the
               | music industry attempted to kill the White Album, because
               | it infringed on their freedom to remix and derive.
               | 
               | The Internet, and I imagine HN, was in an uproar when MLB
               | attempted to kill unauthorized baseball GIFs and replace
               | them with official curated ones, because it infringed on
               | their freedom to remix and derive.
               | 
               | How, precisely, is remixing and deriving from code
               | 'abusive', in contrast to the past ten or twenty years of
               | pressure on the Internet to the contrary when remixing
               | and deriving from music or movies?
               | 
               | This is a core point of the original post linked above,
               | where the author is shocked by our demands for more
               | prohibitive copyright interpretations, and I want to call
               | this out more bluntly and less politely than they did:
               | 
               | Fair use of a work is _almost always_ perceived as
               | abusive and unfair by the creator of a work. Creators
               | ignore the cognitive dissonance between their demand to
               | have fair use rights granted _more_ easily to the
               | protected works of others, and their demand to have fair
               | use rights granted _less_ easily to their own protected
               | works.
               | 
               | I see that dissonance go unaddressed in every top-level
               | comment in today's discussion. I see that desire to deny
               | fair use rights driving hundreds of emotional me-too
               | posts, without considering the framing of _whether_ it is
               | fair use in alignment with every prior copyright outrage
               | we've discussed over the years.
               | 
               | My theory is that permitting discussion of fair use would
               | weaken their efforts to groundswell a pitchfork mob, and
               | no one wants to confront their own biases or emotional
               | investment or inability to profit from their code.
               | 
               | Whatever the motivations, HN deserves better than this.
        
               | hnfong wrote:
               | Good point. Here's my 2 cents -
               | 
               | > D.4 specifically grants GitHub the right to display
               | your content, analyze your content, and reproduce it in
               | full to other users of the service
               | 
               | If you read the section carefully, this covers the right
               | of GitHub to do those things to your content "as
               | necessary to provide the Service". "It also does not
               | grant GitHub the right to otherwise distribute or use
               | Your Content outside of our provision of the Service".
               | 
               | So, does "Service" only cover the type of Github's
               | service at the time of the agreement, or does it allow
               | Github to invent all kinds of unrelated services and use
               | the code as such? If Github can provide a "Copilot"
               | service that arguably "learns" the code, can it also
               | provide a service that blatantly "copies" large pieces of
               | source code for the user (without complying with OSS
               | license terms)?
               | 
               | It's not very clear what the answer would be, but if what
               | I described is allowed, the consequences of this term
               | being so broad would imply that if you're not the
               | copyright owner of code you uploaded to Github, you've
               | probably violated some OSS license by agreeing to
               | Github's terms.
        
               | floatingatoll wrote:
               | Which OSS licenses are potentially incompatible with
               | GitHub? Are they also incompatible with GitLab? How can
               | one or the other be judged to have exceeded the bounds of
               | what is permissible as a user-generated content provider,
               | and/or fair use rights, in the legal jurisdiction of
               | each?
        
               | ben0x539 wrote:
               | > For example, GitHub license section D.4 specifically
               | grants GitHub the right to display your content, analyze
               | your content, and reproduce it in full to other users of
               | the service. Yet no one seems particularly interested in
               | discussing that here today, because it isn't compatible
               | with the outrage that people are prioritizing on HN when
               | discussing Copilot.
               | 
               | How applicable is the Github license when a lot of code
               | on Github (let's say eg. the Linux kernel) was posted
               | there by people other than the individual copyright
               | holders? I'd assume they can only rely on the open source
               | license of the code in question, and not really on
               | additional license terms. As far as I can tell, Github
               | claims fair use rather than citing their license.
        
               | floatingatoll wrote:
               | That's perhaps the most important question of this entire
               | debate, and it's the one that no one is considering
               | seriously here in the comments. I personally think that
               | it's because no one at HN is both competent enough at
               | copyright and licensing law to debate it _and_ willing to
               | spend time debating it with Internet commenters for a
               | $0/hour wage.
        
       | treffer wrote:
       | Well, I have a hard time drawing a line between GitHub Copilot
       | and a compression algorithm.
       | 
       | If you can reproduce a verbatim copy of the Quake source code
       | after having taken that source code as input, then that's
       | compression. A really fancy kind, but still.
       | 
       | And given that it reproduces the source code: it has to hold that
       | somewhere.
       | 
       | It would be very interesting if someone could reproduce the Quake
       | example with AGPL code, then request the whole model + code
       | because it clearly contains the AGPL code in some encoded form.
        
         | abriosi wrote:
         | Some purists may say learning is compressing
        
           | Syzygies wrote:
           | Yes! In every form, lossy compression is distilling
           | meaningful information from noise.
           | 
           | This is a great legal question as it concerns our use of
           | machine agents. We can learn from copyrighted literature or
           | code that we read. Why can't our agents?
        
             | Zababa wrote:
             | > We can learn from copyrighted literature or code that we
             | read.
             | 
             | Not everywhere. Emulators communities often prohibit people
             | from contributing if they've read the original code to
             | protect themselves from copyright claims.
        
             | AlotOfReading wrote:
             | Because the process is different. You and any computer
             | agent are allowed to learn the functional, non-
             | copyrightable elements of fast inverse sqrt. When you need
             | that functionality, you can write code that implements your
             | understanding of those non-copyrightable elements and gain
             | copyright over the resulting creative expression.
             | 
             | What you _can't_ do is copy all of the creative expression
             | in the original (such as comments) without complying with
             | the terms of the license. Moreover, reproducing the magic
             | constants is a strong indication that your process didn't
             | independently derive your code because the constants used
             | in the original are unique and non-optimal.
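             | 
             | A minimal sketch of why the constant works as a fingerprint,
             | assuming Python and float32 bit tricks via `struct`. The
             | names `fast_inv_sqrt`, `QUAKE_MAGIC` and `NEARBY_MAGIC` are
             | hypothetical, and `NEARBY_MAGIC` is an arbitrary neighbouring
             | value picked only for illustration:

```python
import struct

QUAKE_MAGIC = 0x5F3759DF   # the well-known constant from the Quake III source
NEARBY_MAGIC = 0x5F375ADF  # arbitrary neighbour, chosen only for this sketch


def fast_inv_sqrt(x, magic=QUAKE_MAGIC):
    """Approximate 1/sqrt(x) for a positive x in float32 range."""
    # Reinterpret the float32 bit pattern of x as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # The functional idea: roughly halve and negate the exponent with
    # integer arithmetic on the bit pattern.
    bits = (magic - (bits >> 1)) & 0xFFFFFFFF
    # Reinterpret the integer as a float32 again to get the initial guess.
    y = struct.unpack("<f", struct.pack("<I", bits))[0]
    # One Newton-Raphson step refines the guess.
    return y * (1.5 - 0.5 * x * y * y)


for magic in (QUAKE_MAGIC, NEARBY_MAGIC):
    print(hex(magic), fast_inv_sqrt(4.0, magic))
```

             | Both constants give a usable approximation of 0.5 here. Many
             | values work, so landing on exactly `0x5F3759DF` is hard to
             | explain as independent derivation.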
        
               | anticensor wrote:
               | I should include a term in my licenses that licensees
               | explicitly waive their rights to fair use and/or fair
               | dealing.
        
         | burnte wrote:
         | If your model can't reproduce the Quake source without my
         | input, you haven't really compressed it, especially if the
         | dataset to recreate it is larger than the original. If I have
         | to tell the program exactly what I want in detail to get the
         | Quake source, that's more of a storage database. If I have to
         | guide it intently to get it to output the Quake source, I'm
         | heavily guiding it.
        
           | dleslie wrote:
           | All decompression requires input: the compressed artifact. In
           | this case, the compressed artifact is the semantic cues
           | necessary to extract the Quake inverse square root function.
        
           | swiftcoder wrote:
           | > especially if the dataset to recreate it is larger than the
           | original
           | 
           | Many types of compression produce a compressed file larger
           | than the original for input data that is not easily
           | compressed. Just because a compressor is bad at compressing
           | (some) inputs, doesn't exclude it from being a compression
           | algorithm.
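           | 
           | A small sketch of that effect, assuming Python's standard
           | `zlib` module. The inputs and variable names are made up for
           | illustration:

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable
# High-entropy input: pseudo-random bytes have no structure to exploit.
noise = bytes(rng.getrandbits(8) for _ in range(4096))
# Low-entropy input: the same short token repeated many times.
repetitive = b"quake " * 700

noise_out = zlib.compress(noise)
repetitive_out = zlib.compress(repetitive)

# Print input size next to compressed size for each case.
print("noise:     ", len(noise), "->", len(noise_out))
print("repetitive:", len(repetitive), "->", len(repetitive_out))
```

           | The noise comes out slightly larger than it went in because
           | of framing overhead, while the repetitive input shrinks a
           | lot. It is the same compressor in both cases.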
        
         | vharuck wrote:
         | A compressed file containing Quake's source code would be
         | covered by the copyright on Quake's source code. The
         | compression algorithm would not. The algorithm cannot produce
         | the plain-text copyrighted material without the compressed
         | copyrighted material.
         | 
         | Copilot has the ability to produce Quake's source code nearly
         | by itself. And it's a work (not a person), so it can be seen as
         | a derived work. Like a compression algorithm that sometimes
         | tacks on the first paragraph of "50 Shades of Grey" at the end
         | of files.
         | 
         | I'm not a lawyer, but that's my opinion (admittedly, my opinion
         | is softening each day). Plus, the purpose of the tool is to
         | create code for inclusion in projects somebody will hold a
         | copyright over, and they likely won't be the original authors.
         | So its output should be held to a higher standard than a
         | compression algorithm or keyboard.
        
           | madsbuch wrote:
           | > A compressed file containing Quake's source code would be
           | covered by the copyright on Quake's source code. The
           | compression algorithm would not.
           | 
           | What? Where does the distinction between data and algorithm
           | go with compression algorithms?
           | 
           | In its most abstract form, the decompression side of a
           | compression algorithm is a function `{0, 1}^n -> {0, 1}^m`
           | such that n < m and the output string is the result of
           | something previously encoded.
           | 
           | Why can't the input string be the seed used to make the
           | machine learnt model generate the Quake source code?
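           | 
           | A rough analogy in code, assuming Python's `zlib` with a
           | preset dictionary. The dictionary stands in for the model
           | parameters and the short compressed blob stands in for the
           | seed. The names `memorised`, `compress` and `decompress` are
           | hypothetical, and none of this claims to describe how Copilot
           | actually stores anything:

```python
import zlib

# Stand-in for memorised training data; any reasonably long byte string works.
memorised = "\n".join(
    f"line {i}: value {i * i % 97}" for i in range(40)
).encode()


def compress(data, zdict=None):
    # zlib.compressobj accepts an optional preset dictionary via zdict.
    comp = zlib.compressobj(zdict=zdict) if zdict else zlib.compressobj()
    return comp.compress(data) + comp.flush()


def decompress(blob, zdict):
    # The same dictionary must be supplied again to recover the data.
    decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(blob) + decomp.flush()


plain = compress(memorised)                  # no shared dictionary
seed = compress(memorised, zdict=memorised)  # dictionary already holds the text

print(len(memorised), len(plain), len(seed))
```

           | With the dictionary in place the "seed" is only a handful of
           | bytes, because nearly all of the information lives in the
           | dictionary rather than in the input string.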
        
           | leereeves wrote:
           | > Copilot has the ability to produce Quake's source code
           | nearly by itself.
           | 
           | Was it fed the Quake source code while training? Then it's
           | not producing that code, it's just reproducing it, like a
           | fancy (but imperfect) copy machine.
           | 
           | I'm not sure it's accurate to say that the training source
           | code is "compressed" in the parameters of the model, but
           | certainly some approximation of the training source code is
           | stored in the parameters.
        
             | treffer wrote:
             | It is probably a stretch, but I think less of a stretch
             | than saying "it's just a machine that learned to code and
             | randomly reproduced these 10+ lines of code". That has IMHO
             | a probability of 0.
             | 
             | So if I rule that out, where does it end up? What if we put
             | this as the grown up ML brother of the chain of LZW, PPM,
             | dictionary assisted compression (e.g. zstd) and various
             | attempts at using neural networks for compression?
             | 
             | I would not want to judge this - that's why I put up the
             | AGPL idea. Or even unlicensed code. It would be a very
             | interesting case to watch.
        
         | dleslie wrote:
         | This is an interesting perspective; it does, indeed, seem like
         | Copilot is a lossy compression algorithm wrapped in a semantic
         | search interface.
        
           | Spivak wrote:
           | I mean that's essentially what all ML is if you want to think
           | about it that way.
           | 
           | Training is the process of creating a space where searching
           | for the right thing within it gives you the answer to some
           | problem you have.
        
         | elliekelly wrote:
         | I know absolutely nothing about IP and even less about
         | compression but aren't compression algorithms usually run on
         | copyright protected material with the consent of the rights
         | holder or authorized licensee?
        
         | belorn wrote:
         | You would not need to produce a perfect copy. A fansub of a
         | movie is considered a derivative of the movie, while being a
         | far cry from an actual copy of the movie.
         | 
         | As a subtitle is to a movie, the quake "output" might be much
         | smaller than quake itself.
        
       | chx wrote:
       | > The short code snippets that Copilot reproduces from training
       | data are unlikely to reach the threshold of originality.
       | 
       | I can only repeat myself: In light of Google v. Oracle going as
       | far as the Supreme Court I find your confidence in this quite
       | astonishing.
        
       | dominicjj wrote:
       | "(of course, free software licenses would still fulfil the
       | important function of contractually requiring the publication of
       | modified source code)"
       | 
       | No no no. Licenses are NOT contracts. Someone who copies or makes
       | derivative works of copylefted software which they then
       | distribute is obliged to remain within the bounds of the license
       | not because they voluntarily promised, but because they don't
       | have any right to act at all except as the license permits.
       | 
       | https://www.gnu.org/philosophy/enforcing-gpl.en.html
        
         | hnfong wrote:
         | OSS licenses, so far as they are permissive and require nothing
         | in return, are not contracts. This is often the case for simply
         | _using_ the OSS software. The user has no obligations
         | whatsoever.
         | 
         | If, on the other hand, the licensor and licensee both have some
         | obligations (in OSS, this is usually when you modify or
         | redistribute the source or compiled product), then it's
         | basically a contract, no matter what RMS claims.
         | 
         | I mean, with all due respect to the guy, he makes controversial
         | claims even in the field of software engineering (and also free
         | software evangelism), his supposed professional field. Why
         | would you trust what he says about contract law, a field where
         | he has no professional training whatsoever?
         | 
         | (That said, GPLv2 is still an ingenious work for many reasons,
         | although lawyers probably wouldn't draft it that way)
        
           | luhn wrote:
           | That article isn't written by RMS, and the author has some
           | relevant credentials.
           | 
           | > Eben Moglen is professor of law and legal history at
           | Columbia University Law School.
        
             | hnfong wrote:
             | You're right. I was mistaken -- I thought he was referring
             | to those RMS claims that in general the GPL is not a
             | contract.
             | 
             | In Moglen's article about _enforcement_, I think he's
             | right that where there's a breach of GPL there is no
             | contract. In fact that's what I said also in my follow up
             | reply.
        
           | hnfong wrote:
           | PS: There's still a nuance that might require
           | clarification (or am I adding confusion?) in your original
           | quote though:
           | 
           | Quote: "(of course, free software licenses would still fulfil
           | the important function of contractually requiring the
           | publication of modified source code)"
           | 
           | Even though as many others have pointed out, OSS licenses can
           | be contracts, I'm actually not sure this sentence is correct.
           | 
           | When somebody uses the source code in compliance with the
           | license terms, a contract might be formed to allow both
           | parties to enjoy rights. However, if one party never complied
           | with those terms and breaches them (eg. distributing source
           | without retaining copyright notices), then arguably no
           | contract was ever formed, and the act is a simple matter of
           | _copyright violation_ and not a "breach of contract".
           | 
           | Hope I'm not splitting hairs.
           | 
           | Disclaimer: learned English common law a bit, not a lawyer.
        
           | goodpoint wrote:
           | This is false on many levels.
        
             | hnfong wrote:
             | Pray tell? (preferably citing legal authority?)
        
           | dominicjj wrote:
           | "This is often the case for simply using the OSS software.
           | The user has no obligations whatsoever."
           | 
           | This is a category error when it comes to copyleft licenses
           | like the GPL. It has nothing to say about usage.
           | 
           | "If, on the other hand, the licensor and licensee both have
           | some obligations (in OSS, this is usually when you modify or
           | redistribute the source or compiled product), then it's
           | basically a contract, no matter what RMS claims."
           | 
           | No it's not. There are no pre-agreed terms, penalties for
           | violation, expected compensation for services provided or
           | anything like that. GPLed software is copyrighted. Copyright
           | law says you have no rights to copy it or make derivative
           | works of it whatsoever. The license permits you to do so.
           | 
           | "Why would you trust what he says about contract law, a field
           | where he has no professional training whatsoever?"
           | 
           | Because, surprise surprise, he has advice from people who ARE
           | trained in the law.
        
             | hnfong wrote:
             | > The GPL ... has nothing to say about usage.
             | 
             | The GPLv2 text: "The act of running the Program is not
             | restricted, and the output from the Program is covered only
             | if its contents constitute a work based on the Program
             | (independent of having been made by running the Program)."
             | 
             | Of course this sentence contradicts the previous sentence
             | in the text, which claims that normal usage "is not covered
             | by this License". I presume you'd argue to support your
             | claim, but seriously, this is bad drafting.
             | 
             | > No it's not. There are no pre-agreed terms, penalties for
             | violation, expected compensation for services provided or
             | anything like that.
             | 
             | "penalties for violation", "expected compensation" are not
             | necessary requirements for formation of a contract. The
             | pre-agreed terms are clearly stated in the license text, or
             | at least clear insofar as they don't contradict each
             | other. By the way, "pre-agreed terms" are not necessary for
             | the formation of a contract either.
             | 
             | > Because, surprise surprise, he has advice from people who
             | ARE trained in the law.
             | 
             | Are you trained in the law? Because if you think the
             | Internet should pay regards to somebody trained in the law
             | (even though they may not have learned the law properly) as
             | opposed to somebody who hasn't, then I don't see why you
             | think you have a standing to speak as though you're an
             | authoritative source on the matter.
        
         | fredgrott wrote:
         | and what pre-tell makes it a NON contract?
         | 
         | Licenses all by themselves are forms of contracts
         | 
         | In fact the bill of rights is one
        
           | roywiggins wrote:
           | You need to actively assent to a contract. Some software has
           | contracts ("EULAs") but you are bound by the license whether
           | you agree or not.
           | 
           | https://en.wikipedia.org/wiki/Meeting_of_the_minds?wprov=sfl.
           | ..
        
             | adrusi wrote:
             | A license isn't a contract that binds the licensee, it's a
             | contract that only binds the rightsholder. Since you, the
             | licensee, are not relinquishing any rights in the contract,
             | there's no need for you to agree to anything. The only
             | rights being relinquished are the rightsholder's right to
             | pursue legal retribution for some uses of their work that
             | would otherwise be violations of copyright.
             | 
             | You don't have to call it a contract, but it is a legal
             | document in which one or more parties legally bind
             | themselves, which seems like an adequate definition of a
             | contract to me, and has more etymological fidelity to the
             | word "contract" than other possible definitions that would
             | exclude licenses. A _contract_ is a legal instrument by
             | which the breadth of your rights _contract_ -- as in become
             | smaller.
        
           | robbedpeter wrote:
           | Not trying to be snarky or rude, just letting you know the
           | phrase is "pray tell", for your future reference.
        
         | wizzwizz4 wrote:
         | They're sort of contracts. If you do this, you get a copyright
         | exemption; otherwise, you don't have the legal right to do
         | anything.
        
           | dominicjj wrote:
           | They have nothing to do with contracts and there's a simple
           | test for it. When contracts are violated, then if there's
           | litigation the parties consult the relevant contract law for
           | how to proceed. When a license is violated, the parties
           | consult whatever law the license was permitting an exception
           | to. If you copy software without a license, you can be sued
           | for copyright infringement. If you fish without a license,
           | you can be sued for trespassing.
        
         | user5994461 wrote:
         | >>> No no no. Licenses are NOT contracts.
         | 
         | Yes yes yes, licenses are contracts.
         | 
         | That just got set in stone by the French appeal court, backed
         | by a decision from the CJEU (the Court of Justice of the
         | European Union) a few months before.
         | 
         | Case 19 March 2021 https://www.legalis.net/jurisprudences/cour-
         | dappel-de-paris-...
        
           | dominicjj wrote:
           | Enjoy. I'm sure I'm not alone in completely ignoring the
           | opinion of French judges and the European Court of Justice.
        
             | stale2002 wrote:
             | If you don't care about what the courts say, I am not sure
             | why you are making legal claims here.
             | 
             | When it comes to legal matters, the only thing that matters
             | is the opinion of the court system.
        
             | ben0x539 wrote:
             | Just for context, the author of the article you're
             | commenting on is Julia Reda, an EU copyright activist and a
             | former member of the EU parliament. While I likewise don't
             | have too much use for the legal opinion of French courts, I
             | think we can afford to cut her some slack for focusing on
             | legal interpretations in her native jurisdiction.
        
               | dominicjj wrote:
               | Fair enough. She is correct about licenses in her
               | jurisdiction.
        
         | ghoward wrote:
         | You are correct that they are not _necessarily_ contracts, but
         | they can be. (See
         | https://writing.kemitchell.com/2020/12/27/War-on-License-Not...
         | and search for "Blue Oak avoids this theoretical complexity.")
        
         | [deleted]
        
         | detaro wrote:
         | That's the US perspective on the matter, not globally
         | applicable.
        
       | visarga wrote:
       | Copilot is the moment when simple functions have been
       | commoditized, you can have as many as you like almost for free,
       | and adapted to any project. Just spend a moment to admire the
       | transition, it's a new stage of post-scarcity.
       | 
       | AI can recreate photos, paintings, sounds, voice, music, human
       | faces, text, dialogue, math, proteins, and now code. It does all
       | this while allowing humans to control and direct the whole
       | process, and create original combinations. They all have no
       | economic value to own and are free to use now, like words in a
       | language. Enjoy!
       | 
       | Remember Karpathy's Char-RNN? How far we've come.
       | 
       | http://karpathy.github.io/2015/05/21/rnn-effectiveness/
        
       | turtletontine wrote:
       | The idea that the debate actually does a disservice to copyleft
       | by relying on the strictest interpretations of copyright is an
       | interesting perspective to me, but the rest of this seems pretty
       | weak. (Caveat that I'm no lawyer.) Copilot can regurgitate
       | verbatim chunks of other codebases: it seems absurd to me that
       | that wouldn't count as derivative work.
        
       | moralestapia wrote:
       | >it suggests that even reproducing the smallest excerpts of
       | protected works constitutes copyright infringement
       | 
       | Actually, it can be. It has to do with whether the small excerpt is
       | copying what could be called "the heart of the work"; which in
       | the case of code I would argue is almost always what you are
       | after. No one's gonna copy the indentation style, boilerplate
       | around functions/blocks, punctuation, etc. You always go for the
       | "functional" part of the code, which is definitely "the heart of
       | the work".
       | 
       | The heart of Carmack's fast inverse square root lies in its
       | selection of a particular set of constants and operations that
       | happen (i.e. were designed) to approximate the inverse square
       | root without taking an expensive path. Copyright law would look
       | at this novelty; I don't think it would argue around "the use of
       | subtraction and multiplication in a computer program", as that
       | would be plainly stupid.
       | 
       | I am surprised that someone who is supposedly an expert in
       | copyright law does not know (or pretends not to know) about this
       | and, on top of that, actually suggests the opposite. This is
       | copyright 101, come on.
        
       | softwaredoug wrote:
       | This is kind of beside the point. Something can still be
       | unethical and perfectly legal. The issue is that machine learning
       | can whitewash a developer's intended license.
       | 
       | Or put differently, as a GitHub customer, are you comfortable
       | with your code being used this way? Instead of a passive host,
       | your code is now being used to create tremendous value for GitHub
       | and Microsoft. Do you feel your trust has been violated?
       | (regardless of legality).
        
       | reilly3000 wrote:
       | How does one address the fact that 95% of software is based on
       | the same basic tropes? At a certain level of density, all code
       | trying to achieve a similar function to legally-protected code
       | will converge on an implementation that is almost
       | indistinguishable. With LOC accreting exponentially, only time
       | will determine when we reach that threshold. The Copilots of the
       | world serve to accelerate and monetize this reality.
        
       | [deleted]
        
       | MattIPv4 wrote:
       | This seems to completely ignore the fact that we've seen Copilot
       | regurgitating exact copies of existing code, and even with the
       | incorrect license attached when it was asked for it. [0]
       | 
       | [0] https://twitter.com/mitsuhiko/status/1410886329924194309
        
         | sfletcher wrote:
         | The Google Books case cited here allowed Google to show exact
         | snippets (extracts) from the copyrighted books, hard to see how
         | this is any different.
        
           | creshal wrote:
           | It also has no relevance for the discussion at hand. Yes,
           | Github can _display_ all of its content - that's kind of the
           | point of it.
           | 
           | But Copilot doesn't exist to show you random code snippets
           | for the sole purpose of showing them.
           | 
           | _Using_ this copyrighted material to create _derivative
           | works_ is a completely different use case, and not covered at
           | all by the Google Books ruling, or any other I'm aware of.
        
           | mjburgess wrote:
           | You wouldn't be allowed to make derivative works of those
           | books; i.e. copy/paste into your own work.
           | 
           | Google isn't making new books, or enabling people to make
           | derivative copies; it is merely previewing a book.
           | 
           | Github search is a _preview_. Copilot is a copy/paste.
        
             | duckmysick wrote:
             | What about Google Books Ngram Viewer? Isn't that a
             | derivative work based on copyrighted content? It's more
             | than just a search or preview - it contains both novel
             | information and snippets of existing content. Is linguistic
             | corpus a special case?
             | 
             | https://books.google.com/ngrams
        
               | pessimizer wrote:
               | https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Goog
               | le,....
               | 
               | The research value actually turned activity that would be
               | infringing into activity that was not infringing. _Take
               | away_ things like the ngram viewer and Google Books
               | infringes.
        
         | bootlooped wrote:
         | That function exists in hundreds, if not thousands, of GitHub
         | repositories. The function is so well known it has its own
         | Wikipedia page. If there is a more famous function in computer
         | science, I don't know what it is. The fact that a machine
         | trained on GitHub repositories might reproduce such common code
         | is not alarming or surprising to me. I think people are using
         | this as an example and implying it's happening all over the
         | place, but I've yet to see another example like it.
        
           | user5994461 wrote:
           | >>> If there is a more famous function in computer science
           | 
           | Maybe fizzbuzz?
           | 
           | Let's try to auto generate some fizzbuzz code, see what we
           | get :D
        
         | timdaub wrote:
         | Agree. I've written a comment on her blog about this. Hoping
         | she'll enable it. I've published an opinion piece on the
         | subject matter myself:
         | https://rugpullindex.com/blog#BuiltonStolenData
         | 
         | Edit: My comment was enabled:
         | https://juliareda.eu/2021/07/github-copilot-is-not-infringin...
        
       | kalium-xyz wrote:
       | "If it looks like a duck, swims like a duck, and quacks like a
       | duck, then it probably is a duck." I don't see my license
       | respected for code I wrote that it regurgitates; there is
       | nothing more to this.
        
       ___________________________________________________________________
       (page generated 2021-07-05 23:00 UTC)