[HN Gopher] GitHub Copilot investigation
___________________________________________________________________
GitHub Copilot investigation
Author : john-doe
Score : 102 points
Date : 2022-10-17 22:30 UTC
(HTM) web link (githubcopilotinvestigation.com)
(TXT) w3m dump (githubcopilotinvestigation.com)
| jasone wrote:
| I really don't care if my code gets ingested and regurgitated by
| Copilot, but it seems rather a stretch to imagine that this is
| fair use, in part because it separates me from the legal
| protections afforded by the licenses I released my software
| under. In my ideal world, Copilot would be legally viable, and
| releasing my software without restriction wouldn't be risky.
|
| As a long-time open source software developer, I have favored the
| 2-clause BSD and MIT licenses because they are the simplest
| licenses that provide me some liability protection. I would
| release code into the public domain if that didn't increase the
| likelihood of being sued, whether for liability, or for someone
| else claiming intellectual rights to code I actually wrote.
| armchairhacker wrote:
| One issue I see with Copilot is that they get free access to all
| open-source data on GitHub, but using GitHub APIs to download the
| data yourself isn't possible (rate limiting). This is an unfair
| advantage. Copilot is not only making money off of open-source,
| they are making money off of open-source in a way others can't.
|
| I would love to see a lawsuit which requires GitHub to provide
| their full Copilot dataset.
| Kiro wrote:
| Abolish all copyright. You're all happily pirating movies and
| songs but code is for some reason sacred.
| ipaddr wrote:
| Doesn't really explain how co-pilot is stealing your community.
| I've used co-pilot and it works great until you are past
| boilerplate, then it falls apart.
| otterley wrote:
| Agree, the argument seems pretty threadbare. Millions of
| programmers use open-source software every day and incorporate
| it into their own projects without ever engaging the authors of
| the code upon which they rely.
|
| Perhaps the author means that there's a possibility that the
| programmer to whom the code was suggested won't necessarily
| know its provenance and how to engage the community from whence
| it came. If so, that's a stronger argument, but I don't know
| that it's the best one they can make.
| ja3k wrote:
| > Over time, this process will starve these communities. User
| attention and engagement will be shifted into the walled garden
| of Copilot and away from the open-source projects themselves
|
| The author seems to be implying that since Copilot can
| reproduce the code of open source repository X in certain
| scenarios there'd be no reason for programmers to
| learn/use/engage with repository X. But this is silly. Maybe
| some open source repositories could be tab-completed with a
| little prompting, but people will presumably choose to add a
| dependency instead of tab-completing the code of Express or
| something.
| cowtools wrote:
| It strips GPL, or any license.
| cercatrova wrote:
| It also doesn't make any sense. Copilot suggesting to me the
| signature of a function from some library is not actually the
| same as executing that library. That library still needs to
| be downloaded onto my computer to be executed. And who will
| write new features to a library if not for the people who are
| interested in that?
| comfypotato wrote:
| [deleted]
| Thorentis wrote:
| All class actions are a mix of both
| authpor wrote:
| I'm more worried about the status of freedom in software, open
| source feels like a mirage to divert the attention away from the
| original issues from the FSF.
| commitpizza wrote:
| Great, I hope it is tried in court. It should be. But
| unfortunately I don't have much hope that the courts will come to
| understand the issue well enough.
| otterley wrote:
| Many courts, especially those in the Northern District of
| California (where a case would likely end up litigated), are
| very proficient and literate about software and copyright law.
| See Judge William Alsup's cases if you want to see some
| examples that illustrate the court's competence. And these
| judges frequently have technical consultants on staff to assist
| with technological issues.
| echelon wrote:
| BSD 5-Clause
|
| 1. Redistributions of source code must retain the above copyright
| notice, this list of conditions and the following disclaimer.
|
| 2. Redistributions in binary form must reproduce the above
| copyright notice, this list of conditions and the following
| disclaimer in the documentation and/or other materials provided
| with the distribution.
|
| 3. All advertising materials mentioning features or use of this
| software must display the following acknowledgement: This product
| includes software developed by the organization.
|
| 4. Neither the name of the copyright holder nor the names of its
| contributors may be used to endorse or promote products derived
| from this software without specific prior written permission.
|
| 5. Use of this source code for the research or training of
| machine learning models is permitted.
| jen20 wrote:
| The issue with copilot is that it is not respecting clause 1.
| cercatrova wrote:
| Does GitHub not have the right to view and train from your
| content when you agree to their Terms of Service and upload your
| code?
|
| People are conflating their open source license with the one they
| give GitHub when making a GitHub account, but they are two
| entirely separate and parallel licenses. The former is for other
| people to use your code, the latter is for GitHub to host your
| code.
|
| If you don't like it, you are free to host your code on your own
| servers.
|
| And anyway, as noted the other day about AI, it is often funny to
| see people not care about (or even enjoy) AI in other fields that
| they don't work in, but when it comes for their own field, they
| are suddenly very worried. See programmers on HN who argue for
| Stable Diffusion but against Copilot, and vice versa with artists
| on Twitter. As I commented then, it's an act of cowardice to
| think our own profession should be immune from AI while we enjoy
| the fruits of AI in other fields [0]:
|
| _> Yes, many of us will turn into cowards when automation starts
| to touch our work, but that would not prove this sentiment
| incorrect - only that we're cowards._
|
| _> > Dude. What the hell kind of anti-life philosophy are you
| subscribing to that calls "being unhappy about people trying to
| automate an entire field of human behavior" being a "coward".
| Geez._
|
| _> >> Because automation is generally good, but making an
| exemption for specific cases of automation that personally
| inconvenience you is rooted in cowardice/selfishness. Similar to
| NIMBYism._
|
| We _should_ want AI. That we then try to use outdated models like
| copyright to enforce holding back human progress is a true shame.
| In my view, _so what_ if GitHub uses people's code for training
| data, we are all getting a better product because of that.
|
| [0] https://news.ycombinator.com/item?id=33226515#33228948
| akprasad wrote:
| > If you don't like it, you are free to host your code on your
| own servers.
|
| From the article:
|
| > "Dude, it's cool. I took SFC's advice and moved my code off
| GitHub." So did I. Guess what? It doesn't matter. By claiming
| that AI training is fair use, Microsoft is constructing a
| justification for training on public code anywhere on the
| internet, not just GitHub.
|
| And:
|
| > when it comes for their own field, they are suddenly very
| worried
|
| From the article:
|
| > First, the objection here is not to AI-assisted coding tools
| generally, but to Microsoft's specific choices with Copilot. We
| can easily imagine a version of Copilot that's friendlier to
| open-source developers--for instance, where participation is
| voluntary, or where coders are paid to contribute to the
| training corpus. Despite its professed love for open source,
| Microsoft chose none of these options. Second, if you find
| Copilot valuable, it's largely because of the quality of the
| underlying open-source training data. As Copilot sucks the life
| from open-source projects, the proximate effect will be to make
| Copilot ever worse--a spiraling ouroboros of garbage code.
| sneak wrote:
| Not all code on GitHub was uploaded by the copyright holder.
| The entire linux kernel is on GitHub and at least some of those
| copyright holders have never explicitly granted a license to
| GitHub beyond the GPL.
| SXX wrote:
| Code pushed to GitHub is quite often not pushed by the actual
| copyright holders, and there is no way to distinguish between the
| two even if there were a clause like this in the GitHub user
| agreement.
| whateveracct wrote:
| ToS isn't some all-powerful thing, first of all. A lot of it is
| unenforceable nonsense. And I'm not sure how it really works
| with OSS.
|
| For instance, what if I self-host an OSS project but someone
| puts a mirror on GitHub? Or just uses GH as a remote for their
| fork? Does that random person accepting the ToS now mean GH has
| carte blanche to do whatever they want with that IP?
| wongarsu wrote:
| There are quite a few projects that didn't originate on Github.
| Some are mirrors of projects hosted elsewhere, some accept
| patches through other means. If you get your Linux kernel patch
| accepted by emailing it to the responsible maintainer, it will
| end up on https://github.com/torvalds/linux, but you never
| agreed to the Github ToS, all you did was agree to publish it
| under the GPLv2. Linus agreed to the Github ToS, but he can't
| give away rights he doesn't have, so he can't be giving Github
| any rights to your patches that go beyond the GPL.
| blatherard wrote:
| The current github terms of service don't seem to mention this
| use when they describe the license granted github.
|
| https://docs.github.com/en/site-policy/github-terms/github-t...
|
| 4. License Grant to Us
|
| We need the legal right to do things like host Your Content,
| publish it, and share it. You grant us and our legal successors
| the right to store, archive, parse, and display Your Content,
| and make incidental copies, as necessary to provide the
| Service, including improving the Service over time. This
| license includes the right to do things like copy it to our
| database and make backups; show it to you and other users;
| parse it into a search index or otherwise analyze it on our
| servers; share it with other users; and perform it, in case
| Your Content is something like music or video.
|
| This license does not grant GitHub the right to sell Your
| Content. It also does not grant GitHub the right to otherwise
| distribute or use Your Content outside of our provision of the
| Service, except that as part of the right to archive Your
| Content, GitHub may permit our partners to store and archive
| Your Content in public repositories in connection with the
| GitHub Arctic Code Vault and GitHub Archive Program.
| Arainach wrote:
| That mentions everything: Parsing the content, showing it
| to/sharing it with other users, using it to improve and
| provide the service. GitHub and all of its features are "the
| service".
| cercatrova wrote:
| > show it to you and other users...analyze it on our
| servers...share it with other users...perform it
|
| I don't know, sounds pretty similar to training on ML
| programs, even if they don't explicitly say "machine
| learning" in the ToS.
| williamcotton wrote:
| Copyright only covers the expressive parts and not the
| utilitarian parts:
|
| https://en.wikipedia.org/wiki/Abstraction-Filtration-Compari...
|
| https://en.wikipedia.org/wiki/Idea-expression_distinction
|
| https://h2o.law.harvard.edu/cases/5004
|
| Most of your code is probably not subject to copyright in the
| first place, regardless of license.
| jashmatthews wrote:
| Doesn't Copilot reproduce the exact expression given the right
| prompt, though?
| gjsman-1000 wrote:
| Even if it does, it may not matter. For example, APIs are not
| copyrightable (see Google v Oracle), and if there is only one
| obvious efficient way to make something work, it does not
| follow that the user must be prohibited from using that way
| even if someone else did it first.
| blibble wrote:
| in countries where there is no fair use (most of the world
| outside the US) it seems quite likely copilot is willful,
| commercial scale copyright infringement
| RunSet wrote:
| > Microsoft characterizes the output of Copilot as a series of
| code "suggestions". Microsoft "does not claim any rights" in
| these suggestions. But neither does Microsoft make any guarantees
| about the correctness, security, or extenuating intellectual-
| property entanglements of the code so produced. Once you accept a
| Copilot suggestion, all that becomes your problem:
|
| > "You are responsible for ensuring the security and quality of
| your code. We recommend you take the same precautions when using
| code generated by GitHub Copilot that you would when using any
| code you didn't write yourself. These precautions include
| rigorous testing, intellectual property scanning, and tracking
| for security vulnerabilities."
|
| I can't help but recall:
|
| "Linux is a cancer that attaches itself in an intellectual
| property sense to everything it touches."
|
| - Steve Ballmer, while CEO of Microsoft
| walrus01 wrote:
| > Steve Ballmer
|
| They have some really good blow in Redmond.
|
| If anybody could win an award for being coked up and sweaty on
| stage...
|
| https://www.youtube.com/watch?v=Vhh_GeBPOhs
| pr337h4m wrote:
| It's tragically beautiful how the copyleft crowd is putting so
| much effort into drastically expanding the scope of copyright.
|
| "I used the copyright to destroy the copyright."
|
| That sort of plot never works in practice.
| boomskats wrote:
| Everything else aside, the design on this site is among the best
| I've ever seen. Amazing typography, great to read on a phone.
| elfatizer wrote:
| He wrote the book on it. https://practicaltypography.com/
| Kiro wrote:
| I think it's very hard to skim for some reason.
| chatterhead wrote:
| Can you elaborate on what makes it so? Changing font sizes,
| boldness, lines etc...
| mmastrac wrote:
| I think the test for whether an AI is infringing or not should
| be:
|
| Can this AI regurgitate the vast majority of the creative aspects
| of an original/novel piece of software with minimal prompting, to
| the point where the output code looks mostly and directly cloned
| to a reasonable person trained in the art?
| crawsome wrote:
___________________________________________________________________
(page generated 2022-10-17 23:00 UTC)