[HN Gopher] GitHub Copilot investigation
       ___________________________________________________________________
        
       GitHub Copilot investigation
        
       Author : john-doe
       Score  : 102 points
       Date   : 2022-10-17 22:30 UTC (29 minutes ago)
        
 (HTM) web link (githubcopilotinvestigation.com)
 (TXT) w3m dump (githubcopilotinvestigation.com)
        
       | jasone wrote:
       | I really don't care if my code gets ingested and regurgitated by
       | Copilot, but it seems rather a stretch to imagine that this is
       | fair use, in part because it separates me from the legal
       | protections afforded by the licenses I released my software
       | under. In my ideal world, Copilot would be legally viable, and
       | releasing my software without restriction wouldn't be risky.
       | 
       | As a long-time open source software developer, I have favored the
       | 2-clause BSD and MIT licenses because they are the simplest
       | licenses that provide me some liability protection. I would
       | release code into the public domain if that didn't increase the
       | likelihood of being sued, whether for liability, or for someone
       | else claiming intellectual rights to code I actually wrote.
        
       | armchairhacker wrote:
       | One issue I see with Copilot is that they get free access to all
       | open-source data on GitHub, but using GitHub APIs to download the
       | data yourself isn't possible (rate limiting). This is an unfair
       | advantage. Copilot is not only making money off of open-source,
       | they are making money off of open-source in a way others can't.
       | 
       | I would love to see a lawsuit which requires GitHub to provide
       | their full Copilot dataset.
        
       | Kiro wrote:
       | Abolish all copyright. You're all happily pirating movies and
       | songs but code is for some reason sacred.
        
       | ipaddr wrote:
       | Doesn't really explain how co-pilot is stealing your community.
       | I've used co-pilot and it works great until you are past
       | boilerplate, then it falls apart.
        
         | otterley wrote:
         | Agree, the argument seems pretty threadbare. Millions of
         | programmers use open-source software every day and incorporate
         | it into their own projects without ever engaging the authors of
         | the code upon which they rely.
         | 
         | Perhaps the author means that there's a possibility that the
         | programmer to whom the code was suggested won't necessarily
         | know its provenance and how to engage the community from whence
         | it came. If so, that's a stronger argument, but I don't know
         | that it's the best one they can make.
        
         | ja3k wrote:
         | > Over time, this process will starve these communities. User
         | attention and engagement will be shifted into the walled garden
         | of Copilot and away from the open-source projects themselves
         | 
         | The author seems to be implying that since Copilot can
         | reproduce the code of open source repository X in certain
         | scenarios there'd be no reason for programmers to
         | learn/use/engage with repository X. But this is silly. Maybe
         | some open source repositories could be tab completed with a
         | little prompting but people will presumably choose to add a
         | dependency instead of tab completing the code of express or
         | something.
        
           | cowtools wrote:
           | It strips GPL, or any license.
        
           | cercatrova wrote:
           | It also doesn't make any sense. Copilot suggesting to me the
           | signature of a function from some library is not actually the
           | same as executing that library. That library still needs to
           | be downloaded onto my computer to be executed. And who will
           | write new features to a library if not for the people who are
           | interested in that?
        
       | comfypotato wrote:
       | [deleted]
        
         | Thorentis wrote:
         | All class actions are a mix of both
        
       | authpor wrote:
       | I'm more worried about the status of freedom in software, open
       | source feels like a mirage to divert the attention away from the
       | original issues from the FSF.
        
       | commitpizza wrote:
       | Great, I hope it is tried in court. It should be. But
       | unfortunately I don't have much hope that the courts will come
       | to understand the issue well enough.
        
         | otterley wrote:
         | Many courts, especially those in the Northern District of
         | California (where a case would likely end up litigated), are
         | very proficient and literate about software and copyright law.
         | See Judge William Alsup's cases if you want to see some
         | examples that illustrate the court's competence. And these
         | judges frequently have technical consultants on staff to assist
         | with technological issues.
        
       | echelon wrote:
       | BSD 5-Clause
       | 
       | 1. Redistributions of source code must retain the above copyright
       | notice, this list of conditions and the following disclaimer.
       | 
       | 2. Redistributions in binary form must reproduce the above
       | copyright notice, this list of conditions and the following
       | disclaimer in the documentation and/or other materials provided
       | with the distribution.
       | 
       | 3. All advertising materials mentioning features or use of this
       | software must display the following acknowledgement: This product
       | includes software developed by the organization.
       | 
       | 4. Neither the name of the copyright holder nor the names of its
       | contributors may be used to endorse or promote products derived
       | from this software without specific prior written permission.
       | 
       | 5. Use of this source code for the research or training of
       | machine learning models is permitted.
        
         | jen20 wrote:
         | The issue with copilot is that it is not respecting clause 1.
        
       | cercatrova wrote:
       | Does GitHub not have the right to view and train from your
       | content when you agree to their Terms of Service and upload your
       | code?
       | 
       | People are conflating their open source license with the one they
       | give GitHub when making a GitHub account, but they are two
       | entirely separate and parallel licenses. The former is for other
       | people to use your code, the latter is for GitHub to host your
       | code.
       | 
       | If you don't like it, you are free to host your code on your own
       | servers.
       | 
       | And anyway, as noted the other day about AI, it is often funny to
       | see people not care about (or even enjoy) AI in other fields that
       | they don't work in, but when it comes for their own field, they
       | are suddenly very worried. See programmers on HN who argue for
       | Stable Diffusion but against Copilot, and vice versa with artists
       | on Twitter. As I commented then, it's an act of cowardice to
       | think our own profession should be immune from AI while we enjoy
       | the fruits of AI in other fields [0]:
       | 
       |  _> Yes, many of us will turn into cowards when automation starts
       | to touch our work, but that would not prove this sentiment
       | incorrect - only that we're cowards._
       | 
       |  _> > Dude. What the hell kind of anti-life philosophy are you
       | subscribing to that calls "being unhappy about people trying to
       | automate an entire field of human behavior" being a "coward".
       | Geez._
       | 
       |  _> >> Because automation is generally good, but making an
       | exemption for specific cases of automation that personally
       | inconvenience you is rooted in cowardice/selfishness. Similar to
       | NIMBYism._
       | 
       | We _should_ want AI. That we then try to use outdated models like
       | copyright to enforce holding back human progress is a true shame.
       | In my view, _so what_ if GitHub uses people's code for training
       | data, we are all getting a better product because of that.
       | 
       | [0] https://news.ycombinator.com/item?id=33226515#33228948
        
         | akprasad wrote:
         | > If you don't like it, you are free to host your code on your
         | own servers.
         | 
         | From the article:
         | 
         | > "Dude, it's cool. I took SFC's advice and moved my code off
         | GitHub." So did I. Guess what? It doesn't matter. By claiming
         | that AI training is fair use, Microsoft is constructing a
         | justification for training on public code anywhere on the
         | internet, not just GitHub.
         | 
         | And:
         | 
         | > when it comes for their own field, they are suddenly very
         | worried
         | 
         | From the article:
         | 
         | > First, the objection here is not to AI-assisted coding tools
         | generally, but to Microsoft's specific choices with Copilot. We
         | can easily imagine a version of Copilot that's friendlier to
         | open-source developers--for instance, where participation is
         | voluntary, or where coders are paid to contribute to the
         | training corpus. Despite its professed love for open source,
         | Microsoft chose none of these options. Second, if you find
         | Copilot valuable, it's largely because of the quality of the
         | underlying open-source training data. As Copilot sucks the life
         | from open-source projects, the proximate effect will be to make
         | Copilot ever worse--a spiraling ouroboros of garbage code.
        
         | sneak wrote:
         | Not all code on GitHub was uploaded by the copyright holder.
         | The entire linux kernel is on GitHub and at least some of those
         | copyright holders have never explicitly granted a license to
         | GitHub beyond the GPL.
        
         | SXX wrote:
         | Code pushed to GitHub is quite often not pushed by the actual
         | copyright holders, and there is no way to distinguish the two
         | cases even if there were a clause like this in the GitHub user
         | agreement.
        
         | whateveracct wrote:
         | ToS isn't some all-powerful thing, first of all. A lot of it is
         | unenforceable nonsense. And I'm not sure how it really works
         | with OSS.
         | 
         | For instance, what if I self-host an OSS project but someone
         | puts a mirror on GitHub? Or just uses GH as a remote for their
         | fork? Does that random person accepting the ToS now mean GH has
         | carte blanche to do whatever they want with that IP?
        
         | wongarsu wrote:
         | There are quite a few projects that didn't originate on Github.
         | Some are mirrors of projects hosted elsewhere, some accept
         | patches through other means. If you get your linux kernel patch
         | accepted by emailing it to the responsible maintainer, it will
         | end up on https://github.com/torvalds/linux, but you never
         | agreed to the Github ToS, all you did was agree to publish it
         | under the GPLv2. Linus agreed to the Github ToS, but he can't
         | give away rights he doesn't have, so he can't be giving Github
         | any rights to your patches that go beyond the GPL.
        
         | blatherard wrote:
         | The current github terms of service don't seem to mention this
         | use when they describe the license granted github.
         | 
         | https://docs.github.com/en/site-policy/github-terms/github-t...
         | 
         | 4. License Grant to Us
         | 
         | We need the legal right to do things like host Your Content,
         | publish it, and share it. You grant us and our legal successors
         | the right to store, archive, parse, and display Your Content,
         | and make incidental copies, as necessary to provide the
         | Service, including improving the Service over time. This
         | license includes the right to do things like copy it to our
         | database and make backups; show it to you and other users;
         | parse it into a search index or otherwise analyze it on our
         | servers; share it with other users; and perform it, in case
         | Your Content is something like music or video.
         | 
         | This license does not grant GitHub the right to sell Your
         | Content. It also does not grant GitHub the right to otherwise
         | distribute or use Your Content outside of our provision of the
         | Service, except that as part of the right to archive Your
         | Content, GitHub may permit our partners to store and archive
         | Your Content in public repositories in connection with the
         | GitHub Arctic Code Vault and GitHub Archive Program.
        
           | Arainach wrote:
           | That mentions everything: Parsing the content, showing it
           | to/sharing it with other users, using it to improve and
           | provide the service. GitHub and all of its features are "the
           | service".
        
           | cercatrova wrote:
           | > show it to you and other users...analyze it on our
           | servers...share it with other users...perform it
           | 
           | I don't know, that sounds pretty similar to training ML
           | models, even if they don't explicitly say "machine
           | learning" in the ToS.
        
       | williamcotton wrote:
       | Copyright only covers the expressive parts and not the
       | utilitarian parts:
       | 
       | https://en.wikipedia.org/wiki/Abstraction-Filtration-Compari...
       | 
       | https://en.wikipedia.org/wiki/Idea-expression_distinction
       | 
       | https://h2o.law.harvard.edu/cases/5004
       | 
       | Most of your code is probably not subject to copyright in the
       | first place, regardless of license.
        
         | jashmatthews wrote:
         | Doesn't Copilot reproduce the exact expression given the right
         | prompt, though?
        
           | gjsman-1000 wrote:
           | Even if it does, it may not matter. For example, APIs are not
           | copyrightable (see Google v Oracle), and if there is only one
           | obvious efficient way to make something work, it does not
           | follow that the user must be prohibited from using that way
           | even if someone else did it first.
        
       | blibble wrote:
       | in countries where there is no fair use (most of the world
       | outside the US) it seems quite likely copilot is willful,
       | commercial scale copyright infringement
        
       | RunSet wrote:
       | > Microsoft characterizes the output of Copilot as a series of
       | code "suggestions". Microsoft "does not claim any rights" in
       | these suggestions. But neither does Microsoft make any guarantees
       | about the correctness, security, or extenuating intellectual-
       | property entanglements of the code so produced. Once you accept a
       | Copilot suggestion, all that becomes your problem:
       | 
       | > "You are responsible for ensuring the security and quality of
       | your code. We recommend you take the same precautions when using
       | code generated by GitHub Copilot that you would when using any
       | code you didn't write yourself. These precautions include
       | rigorous testing, intellectual property scanning, and tracking
       | for security vulnerabilities."
       | 
       | I can't help but recall:
       | 
       | "Linux is a cancer that attaches itself in an intellectual
       | property sense to everything it touches."
       | 
       | - Steve Ballmer, while CEO of Microsoft
        
         | walrus01 wrote:
         | > Steve Ballmer
         | 
         | They have some really good blow in Redmond.
         | 
         | If anybody could win an award for being coked up and sweaty on
         | stage...
         | 
         | https://www.youtube.com/watch?v=Vhh_GeBPOhs
        
       | pr337h4m wrote:
       | It's tragically beautiful how the copyleft crowd is putting so
       | much effort into drastically expanding the scope of copyright.
       | 
       | "I used the copyright to destroy the copyright."
       | 
       | That sort of plot never works in practice.
        
       | boomskats wrote:
       | Everything else aside, the design on this site is among the best
       | I've ever seen. Amazing typography, great to read on a phone.
        
         | elfatizer wrote:
         | He wrote the book on it. https://practicaltypography.com/
        
         | Kiro wrote:
         | I think it's very hard to skim for some reason.
        
         | chatterhead wrote:
         | Can you elaborate on what makes it so? Changing font sizes,
         | boldness, lines etc...
        
       | mmastrac wrote:
       | I think the test for whether an AI is infringing or not should
       | be:
       | 
       | Can this AI regurgitate the vast majority of the creative aspects
       | of an original/novel piece of software with minimal prompting, to
       | the point where the output code looks mostly and directly cloned
       | to a reasonable person trained in the art?
        
       | crawsome wrote:
        
       ___________________________________________________________________
       (page generated 2022-10-17 23:00 UTC)