[HN Gopher] I do not agree with Github's use of copyrighted code...
       ___________________________________________________________________
        
       I do not agree with Github's use of copyrighted code as training
       for Copilot
        
       Author : janvdberg
       Score  : 544 points
       Date   : 2021-07-03 19:21 UTC (3 hours ago)
        
 (HTM) web link (thelig.ht)
 (TXT) w3m dump (thelig.ht)
        
       | saurik wrote:
       | I never hosted--with quite some prejudice, even--any of my
       | projects on GitHub (for a number of reasons that are off topic
       | right now)... it didn't matter, though: people take your code and
       | upload it to GitHub themselves (which is their right); so you
       | can't avoid Copilot by simply self-hosting your repositories.
        
         | BiteCode_dev wrote:
          | GitHub is just the beginning; they will crawl any open source
          | code: npm, PyPI, CPAN, public GitLab instances...
         | 
         | If your code is open source, they will get it.
         | 
         | That's kinda the point of open source.
        
           | stingraycharles wrote:
           | I'd argue that this new use case is very interesting to open
           | source and how it relates to the various licenses, and not
           | necessarily "the point of open source".
           | 
           | I can imagine people being OK with their code being used as-
           | is, and/or being modified, but not used completely out of
            | context to train some corporate AI to inject code into
            | commercial code bases.
        
             | CameronNemo wrote:
             | Agreed. I am considering relicensing all of my permissively
             | licensed code because of this. The fundamental assumptions
             | I had when releasing that code under a permissive license
             | have been violated.
        
         | nxc18 wrote:
         | Indeed, the Windows Research Kernel itself is on GitHub. Kind
         | of amazing that Microsoft is hosting their own pirated OS
         | kernel.
         | 
         | https://github.com/cnsuhao/Windows-Research-Kernel-1
        
       | ipaddr wrote:
       | Is there a local open source version of this?
        
       | throwaway_09870 wrote:
        | Semi-related question: the MIT license template has "Copyright
        | (c) 2021 <copyright holders>", but don't I have to register
        | copyrights somewhere? I've always been confused by this. Do I
        | just stick "Copyright MyName" in my GitHub repos? It seems like
        | this is what most people do.
        
         | city41 wrote:
         | In the US, copyright is automatically granted: "Copyright
         | protection in the United States exists automatically from the
         | moment the original work of authorship is fixed"
         | 
         | https://www.copyright.gov/circs/circ01.pdf
        
         | macintux wrote:
         | Copyright does not require registration, although there may be
         | advantages to doing so.
        
       | tyrex2017 wrote:
       | My feeling is that only 30% of the outrage against Copilot is
       | honest.
       | 
       | 50% is anti-big-tech and 20% is our fear of being made redundant.
        
       | okareaman wrote:
       | Of all the hills to die on, this seems like an odd choice. Why
       | not work with others to iron out the legal and technical issues
       | with this new technology?
        
         | qayxc wrote:
         | It's a reflex with some people and that's OK.
         | 
         | Not everyone has the patience and ability to discuss their
         | objections in a public forum while their rights are being
         | violated (in their view).
         | 
         | Some people have a passion and a very strong belief in their
         | ideals and I applaud them for following through with it, even
         | if I don't necessarily share their opinion on the matter.
        
       | supergirl wrote:
       | how well does copilot respect licenses? I'm willing to bet it
       | accidentally ingested some GPL code and will be spitting it out
       | at some point. would that be allowed by GPL?
       | 
        | also, someone could deliberately obfuscate the license text to
        | fool it while keeping it clear enough for humans. something like
       | "License: if you use this source code to train a bot then you
       | must obtain a commercial license, otherwise MIT license applies".
       | bot searches for "MIT" and thinks it's safe.
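The license-trap idea above is easy to demonstrate. Here is a toy sketch (all names invented; no real scanner is necessarily this simple-minded, but the failure mode is the same) of a naive keyword-matching license detector being fooled:

```python
def naive_license_check(text):
    """Classify a license by keyword matching -- the shortcut a careless bot might take."""
    if "MIT" in text:
        return "permissive"
    if "GPL" in text:
        return "copyleft"
    return "unknown"

trick_license = (
    "License: if you use this source code to train a bot then you "
    "must obtain a commercial license, otherwise MIT license applies."
)

# The scanner matches "MIT" and waves the code through,
# never seeing the training-specific clause.
print(naive_license_check(trick_license))  # -> permissive
```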
        
       | jpswade wrote:
       | Pretty sure copyright and IP laws don't overrule innovation.
        
       | hiyou102 wrote:
        | It seems weird to do this over a feature that is still in
        | technical preview; we don't even know if this product will ever
        | ship publicly. I'm guessing a public release is still years off
        | given the number of issues they need to work through before
        | release. My understanding is that they are working on an
        | attribution system to catch cases with common code. Beyond that,
        | this person seems to use the MIT license, which already allows a
        | company to use the code internally to host a proprietary service
        | without attribution. It would make more sense to be outraged if
        | you were using AGPL or something.
        
       | nilshauk wrote:
       | I used to admire GitHub for being a fully bootstrapped company
       | and free to pursue a path in the world they believed in as a
       | company.
       | 
       | Since the Microsoft acquisition it's becoming painfully obvious
       | how unhealthily centralized the dev world has become, and they
       | seem to strive to become ever more entrenched in the name of
       | maximizing shareholder value.
       | 
        | I only have a small number of open source projects on GH but I
       | intend to vote with my feet and abandon the platform by self-
       | hosting Gitea. By itself it won't be a big splash but I'm
       | inspired by posts such as this and I hope to inspire someone else
       | in turn. Of all people we devs should be able to find good ways
       | to decentralize.
        
         | Karrot_Kream wrote:
         | What does this have to do with Copilot?
        
           | [deleted]
        
         | remram wrote:
         | In this case that might not help you at all. If your project is
         | popular enough, somebody will mirror it on GitHub, where they
         | are free (or believe they are) to incorporate your code in
         | Copilot. Voting with your feet might be helpful long-term but
         | will not protect you from this particular "feature".
        
       | jollybean wrote:
       | The argument that 'machines can learn from the code to produce
       | something novel' doesn't bode well given copilot may very well
       | produce code that is straight up cut and paste.
       | 
       | This just seems like a massive lawsuit waiting to happen.
       | 
       | What happens when you discover that you're using '20 lines of
       | code from some GPL'd thing'?
       | 
       | What will your lawyers say? Judges?
       | 
       | It seems to me that if you use Copilot there's a straight up real
       | world chance you could end up with GPL'd code in your project. It
       | doesn't matter 'how' it got there.
       | 
       | I don't understand therefore how any commercial entity could
       | allow this to be used without absolute guarantees they won't end
       | up with GPL'd code. Or worse.
        
       | 29athrowaway wrote:
       | Microsoft being Microsoft.
        
       | cjohansson wrote:
        | It's strange that this was not an opt-in feature at GitHub. I
        | also feel like it is a violation of my integrity, and I will
        | consider no longer using GitHub as well.
        
         | lolinder wrote:
         | From their perspective, they weren't doing anything that abused
         | their privileged position. Tabnine trained their model on open
         | source code, much of which was probably hosted on GitHub. Why
          | should GitHub have to ask permission if Tabnine didn't?
         | 
         | Whether training an ML model on code is fair use is still an
         | open question, but I don't think GitHub is a greater villain
         | here than anyone else doing the same thing (at least until they
         | start using private repos).
        
       | talkingtab wrote:
        | The essential issue is simple: taking someone's work product and
       | financially profiting from that work without paying for it. No
       | matter what, that is just wrong.
        
       | breck wrote:
       | It's time to abolish copyright (http://www.breckyunits.com/the-
        | intellectual-freedom-amendmen...). It makes absolutely no
        | sense--unless you're rich and don't care about the progress of
        | the arts and sciences.
       | 
       | You can spin your wheels all you want but going from simple first
       | principles it is fundamentally flawed. If you believe ideas can
       | be property, then you believe people can be property.
        
         | tomtheelder wrote:
         | > If you believe ideas can be property, then you believe people
         | can be property.
         | 
         | Can you defend that? I generally think copyright isn't a great
         | idea as it exists, but this statement feels extremely dubious
         | _at best_.
        
       | voidnullnil wrote:
       | This is hyperbole / pretend outrage. No sane person claims to be
       | outraged at a company because they made a silly oversight in
       | their experimental product. Obviously Github can just create one
        | instance of Copilot trained with each incompatible license. Even
        | if it used heuristics to find out the license, a tiny subset of
        | code that is accidentally admitted into the training would be
        | negligible, and copyright concerns in software have always been
        | overblown to begin with.
        
       | kuon wrote:
       | I've given thousands of hours to open source projects, I really
       | think open source is a pillar of modern society. So you would
       | think I am all for something like copilot, but no.
       | 
       | At first I thought this was a great feature, because easier
       | access to code, but after some reflection, I am also very
       | skeptical.
       | 
       | I am able to make my code open source, because I can make a
       | living out of it, and I have a lot of open source code that I
       | love to share for things like education or private stuff, but if
       | you want to use it for something real, you need to hire me. If
        | you can suck up all the code without me even noticing, that's
        | not fair.
       | 
       | The other thing is code quality. I don't want to sound rude, but
       | there are tons of bad code around. Not necessarily because the
       | author is unskilled, but because the code might not need to be
        | high quality (for example, I wrote a script to sort my photos;
        | it was very hastily written and specific to my usage, and I used
        | it once and was done with it). Also, there are some bad/wrong
        | patterns that are really popular.
       | 
        | I am surprised that you can DMCA a Twitch stream because someone
        | whistles the Indiana Jones theme, but in this case it is
        | considered fair use.
        
         | jefftk wrote:
         | _> have a lot of open source code that I love to share for
         | things like education or private stuff, but if you want to use
         | it for something real, you need to hire me. If you can suck all
          | the code without even I noticing it, that's not fair_
         | 
         | Co-pilot aside, that's already how it works today. If you make
         | something open source, I can use your code to power my
         | business, and I'm under no obligation to hire you. It's great
         | when companies give back to open source, either by supporting
         | the projects they depend on, or by open sourcing their own
         | internal projects, but it's not obligatory.
         | 
         | If you don't want people to independently profit from your
         | code, don't release it under a license that allows commercial
          | use.
        
           | johnday wrote:
           | It sounds like the person you're responding to _already_
           | releases their code under a non-commercial license. The
           | problem with Copilot is that it may allow commercial
           | enterprises to _avoid_ such a license by copying the code
           | verbatim from their repositories, possibly without any party
            | involved knowing that it's happened.
        
             | jefftk wrote:
             | Where are you seeing that they are using a non-commercial
             | license?
             | 
             | (And non-commercial licenses are not open source:
             | https://opensource.org/osd)
        
           | krono wrote:
           | > If you make something open source, I can use your code to
           | power my business
           | 
           | "Open source" isn't a license. You're not allowed to just use
           | any open source software that doesn't contain a license by
           | default.
        
             | Denvercoder9 wrote:
             | The conventional definition of "open source" is software
              | licensed under an OSI-approved license.
        
             | jefftk wrote:
             | The standard definition of "open source" is
             | https://opensource.org/osd, which has:
             | 
             | "Open source doesn't just mean access to the source code.
             | The distribution terms of open-source software must comply
             | with the following criteria: ... The license must not
             | restrict anyone from making use of the program in a
             | specific field of endeavor. For example, it may not
             | restrict the program from being used in a business, or from
             | being used for genetic research."
        
         | xyzzy_plugh wrote:
         | > I am able to make my code open source, because I can make a
         | living out of it, and I have a lot of open source code that I
         | love to share for things like education or private stuff, but
         | if you want to use it for something real, you need to hire me.
         | If you can suck all the code without even I noticing it, that's
         | not fair.
         | 
         | If you license your software such that I can do whatever I want
         | with it, then I can do whatever I want with it. I don't see how
          | you can then go on to claim it isn't fair if I'm using it as
          | you allow.
        
           | CameronNemo wrote:
           | Personally, I make a distinction between legal and socially
           | acceptable.
           | 
           | If one of the richest corporations on Earth can't be bothered
           | to share patches for permissively licensed code that they
           | use, I will gladly shame them.
           | 
           | It's a different story for a small shop with no legal
           | department and wariness about being sued over its use of open
           | source code.
        
         | ricardobeat wrote:
         | > I am surprised you are able to DMCA a twitch stream because
         | someone whistle india jones theme but in this case it is
         | considered fair use
         | 
         | Why is it surprising? Indiana Jones is private IP. The code was
         | published with an OSS license explicitly authorizing its use.
        
         | edem wrote:
          | I second the point about the DMCA; it is ridiculous.
        
         | judge2020 wrote:
         | > but if you want to use it for something real, you need to
         | hire me.
         | 
         | No I do not. Even strictly proprietary code can be copied and
         | used in a for-profit way without approval as long as it
         | qualifies for fair use.
        
           | okamiueru wrote:
           | "Something real" and "fair use" don't get along. I'm also not
           | sure fair use trumps licensing, since one is a copyright
            | issue, and the other is the terms of use. You don't get to
            | copy a snippet of GPL code and get away with it by calling it
            | fair use. At least, I hope that isn't the case.
        
       | daenz wrote:
       | I think it's clear that Copilot pushes boundaries...technological
       | and legal. It makes people uncomfortable and challenges a lot of
       | assumptions that we have about the current world. But this is
       | exactly what I expect from the next revolutionary change in
       | computing.
        
         | xunn0026 wrote:
          | Because if somebody gets to push boundaries, it's not the
          | plebs; it's trillion-dollar companies.
        
         | emerged wrote:
          | Yeah, we need to work through all the issues this exposes. It's
          | going to be complicated and messy, but it's been inevitable for
          | a long time.
        
       | crazypython wrote:
       | GitLab is pretty good: https://gitlab.com
       | 
       | Many of its features are available in the self-hostable free and
       | open-source version, GitLab CE.
        
       | forgotmypw17 wrote:
       | I'm looking for a new place, because of GitHub's new policy of
       | not supporting password authentication.
       | 
       | I sometimes code from devices which are not my own and on which
       | key management is a major impediment and accessibility issue for
       | me.
       | 
       | Does anyone know how those listings were generated? I like their
       | simplicity, and would like to do something similar.
        
       | fshee wrote:
        | Grats. I abandoned it a while ago as well. If anyone is looking
        | for a rec for self-hosting: Gitea is cake. It sits nicely behind
        | Caddy, as all my other services do. Alternatives such as GitLab,
        | I found, wanted to 'own' too much of my system.
        
       | hawski wrote:
        | I have some of my code BSD0-licensed (in practice, public
        | domain). One thing that I'm wary of regarding Copilot is: what
        | would happen if my code became part of some proprietary code
        | base owned by a big multinational corporation, and then they
        | DMCA'd me? I'm in the middle of some digital housekeeping, and I
        | think I will move my code somewhere else because of it.
        
         | saurik wrote:
         | Your code will end up on GitHub anyway if other people find it
          | useful, as the majority of developers don't even understand you
          | _can_ self-host git repositories, so they only know how to do
         | their own development by taking the code they find and putting
         | it on GitHub first.
        
       | monokh wrote:
        | Slightly off topic: is the git frontend [1] open source? If not,
        | are there some very lightweight self-hosted ones like it?
       | 
       | [1] https://thelig.ht/code/
        
         | nfoz wrote:
         | Check out sourcehut!
         | 
         | https://sourcehut.org/
        
         | wmichelin wrote:
         | I was also wondering this. I'm unfamiliar with linux kernel
         | development but this reminds me of that.
        
         | dchest wrote:
         | This looks like https://codemadness.org/stagit.html
         | 
         | Other popular choices are gitweb and cgit (both dynamic).
        
         | prezjordan wrote:
         | You may like Fossil! https://www.fossil-
         | scm.org/home/doc/trunk/www/index.wiki
        
       | dang wrote:
       | Recent and related:
       | 
       |  _Copilot regurgitating Quake code, including sweary comments_ -
       | https://news.ycombinator.com/item?id=27710287 - July 2021 (625
       | comments)
       | 
       |  _GitHub Copilot as open source code laundering?_ -
       | https://news.ycombinator.com/item?id=27687450 - June 2021 (449
       | comments)
       | 
       | Also ongoing, and more or less a duplicate of this one:
       | 
       |  _GitHub scraped your code. And they plan to charge you_ -
       | https://news.ycombinator.com/item?id=27724008 - July 2021 (148
       | comments)
       | 
       | Original thread:
       | 
       |  _GitHub Copilot_ - https://news.ycombinator.com/item?id=27676266
       | - June 2021 (1255 comments)
        
       | owlbynight wrote:
       | Has this person been in a coma? If I utilize a free service on
       | the Internet, I'm trading for some kind of convenience with the
       | knowledge that I am in some way being boned in the backend by
       | teams of people, all of whom are likely more clever than I am and
       | using my patronage to some kind of nefarious end.
       | 
       | The Internet isn't really a place to exercise an inflexible moral
       | code. His new repository probably can be traced back to slave
       | labor somehow if someone digs deep enough. Probably won't even
       | take 6 degrees of separation.
       | 
        | If it makes it easier for me to code and gives me more time to do
        | something other than work, without doing irreparable harm to some
        | sentient entity, I'm firmly in the who-gives-a-shit camp.
        
         | kennywinker wrote:
         | > His new repository probably can be traced back to slave labor
         | somehow
         | 
         | And you're ok with that? It doesn't HAVE to be like this. Just
         | because you've chosen nihilism, doesn't mean that's the only
         | choice, and it certainly doesn't help anything.
        
           | owlbynight wrote:
            | Of course I'm not okay with that, but I'm also not deluded
            | about my lack of control over the technology I've chosen to
            | build my life around. We parasites can't really complain
            | that our hosts smell like shit when we're riding them to the
            | bank, can we?
           | 
           | It doesn't HAVE to be like this, it just is, and all of the
           | alternatives suck. If you want to choose to inconvenience
           | yourself in order to pass a morality test that doesn't exist,
           | go ahead I suppose.
        
       | [deleted]
        
       | einpoklum wrote:
       | I wish I had the guts to leave only "tombstones" for my GitHub
       | projects, pointing to other sites where they're actually stored.
       | 
        | Unfortunately, GitHub enjoys the network effect of most people
        | being on it (correct me if I'm wrong), and leaving it is costly,
       | regardless of whether the alternative is a reasonable service or
       | not.
        
       | [deleted]
        
       | soheil wrote:
       | To address a lot of the negativity around copyright fair-use,
       | Copilot should have probably adopted something like
       | Stackoverflow's model where contributors get rewarded by points.
        | In this case, the repo that the code used by Copilot came from
        | would get a new type of star rating, and the more people used
        | the code, the more stars Copilot would assign. Fractional stars
        | would be awarded depending on what fraction of each code snippet
        | Copilot thinks came from a specific repo...
       | 
       | It could maybe at some point send rewards in form of donations
       | etc. from Copilot users, similar to Sponsored repos.
        
       | NomDePlum wrote:
       | Very simplistically, my understanding on these matters is:
       | 
       | "You know what you know."
        
       | MillenialMan wrote:
       | I think there's an argument to be made that machine learning is a
       | compression algorithm, so training a model on copyrighted data is
       | quite direct copyright infringement - you're essentially
       | compressing, then redistributing, that data.
       | 
       | Has this ever been used as an argument in a legal case?
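The compression intuition can be made concrete with a deliberately tiny model (a character-level Markov chain, nothing like Copilot's actual architecture, and purely illustrative): trained on a single snippet, it "generates" that snippet back verbatim.

```python
from collections import defaultdict

def train(text, order=8):
    """Map each 8-character context to the characters that followed it in training."""
    model = defaultdict(list)
    for i in range(len(text) - order):
        model[text[i:i + order]].append(text[i + order])
    return model

def generate(model, prompt, max_len, order=8):
    """Extend the prompt one character at a time using the learned contexts."""
    out = prompt
    while len(out) < max_len:
        choices = model.get(out[-order:])
        if not choices:
            break
        out += choices[0]  # deterministic: first continuation ever seen
    return out

snippet = "float Q_rsqrt(float number) { /* fast inverse square root */ }"
model = train(snippet)

# Prompted with the first 8 characters, the "model" regurgitates its
# training data exactly -- memorization, not generalization.
assert generate(model, snippet[:8], 200) == snippet
```

A large neural model is vastly more sophisticated, but the legal question in the comment above is whether the same memorize-then-emit behavior at the margins counts as redistribution.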
        
       | [deleted]
        
       | [deleted]
        
         | rubyist5eva wrote:
         | Your mom
        
       | ddmma wrote:
        | Interestingly enough, this only became a problem once Copilot
        | was announced as an extension, even though the model has been
        | generating code since it launched. I suppose it's difficult to
        | prepare billions of lines of code or data points and keep
        | everyone happy.
        
       | [deleted]
        
       | gfodor wrote:
       | This is a hell of a Pandora's box that's being cracked open here.
       | 
       | Interesting times ahead. For example, if you believe these kinds
       | of tools will become a huge competitive advantage, and that the
       | inclusion of GPL code is a meaningful force multiplier, it kind
       | of implies the fusion of AI code generation and the GPL will eat
       | the world.
        
         | saurik wrote:
         | Only if people understand that the result is under GPL; if they
          | don't, then this is a mechanism to slowly "launder" the work
          | people put into GPL code and funnel it into non-GPL codebases.
        
           | xbar wrote:
           | Why is human understanding going to prevent this? Doesn't it
           | seem like this is precisely the de facto function of Copilot:
           | a license laundering machine?
        
             | saurik wrote:
             | If humans understand this then presumably lawyers would
             | start hunting for code replication caused by Copilot--using
             | automated mechanisms similar to those used by professors at
             | Universities to catch people cheating--and do the moral
             | equivalent of ambulance chasing: offering to file all the
             | paperwork on spec for a cut of an assured payout. But if
             | people in general believe this to be fair use somehow, then
             | GPL is essentially dead (I have been a big advocate for it
             | over the years, and if people are doing this--and everyone
             | thinks it is OK--then it loses the entire point as far as I
             | am concerned).
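The automated replication-hunting described here could plausibly look like academic plagiarism detection: fingerprint overlapping token n-grams and score the overlap. A toy sketch, invented for illustration (real tools use more robust techniques such as winnowing):

```python
def shingles(code, n=5):
    """All overlapping n-token windows of a piece of code."""
    tokens = code.split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def similarity(a, b, n=5):
    """Jaccard similarity over token shingles: 1.0 = identical, 0.0 = disjoint."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

gpl_original = "for (int i = 0; i < n; i++) { sum += data[i] * data[i]; }"
suspect      = "for (int i = 0; i < n; i++) { sum += data[i] * data[i]; }"
unrelated    = "def hello(): return 'world'"

print(similarity(gpl_original, suspect))    # -> 1.0 (verbatim copy)
print(similarity(gpl_original, unrelated))  # -> 0.0
```

Run at scale against a corpus of GPL code, a score near 1.0 on any window would be exactly the kind of lead such lawyers might chase.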
        
           | gfodor wrote:
           | It depends on which "people" you're referring to. I suspect
           | the degree to which the programmer knows this is of little
           | relevance to the question of how the legal + risk management
           | implications will play out.
        
             | saurik wrote:
              | I mean people in general, not only developers: people
              | includes managers and lawyers and politicians and everyone
             | who might cause you to have GPL Copilot separate from MIT
             | Copilot... the same people who right now cause licenses to
             | matter, despite many developers not understanding anything
             | about copyright law and just thinking "I'll steal that
             | other developer's work as it makes my life easier".
             | 
             | If anything, I think the real test of this tech is going to
             | be audio, as it has the right overlap of "big copyright is
             | going to get pissed", "there already exist tools that
             | attempt to automatically detect even small bits of
             | infringement", "people actually litigate even small bits of
             | infringement", and "it feels feasible in the near future":
             | you whistle a tune, and the result is a fully produced
             | backing track that sometimes happens to exactly sound like
             | the band backing Taylor Swift on a recognizable song and
             | generates Taylor Swift's voice, almost verbatim, singing
             | some of her lyrics to go along with it.
        
       | Yaina wrote:
        | This is not the first person I've seen ditch GitHub in favor of
        | some other front-end... and it's not uncommon that the
        | replacements look like this, which is often baffling to me.
        | 
        | Say what you want about GitHub's near-monopoly position, but the
        | UX is really great and accessible even to non-technical people.
        | Maybe you don't need that, maybe you don't want the issue
        | trackers, but it's worth thinking about who you're excluding
        | with these kinds of front-ends.
        
       | orlovs wrote:
        | Let's face it, GitLab's and GitHub's valuations are based on
        | future "AI" code autogeneration. Brave new world.
        
         | yawaworht1978 wrote:
         | I cannot see them even throwing together a library. For
         | example, how could they architect, let's say something like
         | jQuery?
        
           | voidnullnil wrote:
           | Coming up with this "return $this" pattern to emulate a
           | composition operator seems like a very AI thing to do.
        
         | yonixw wrote:
          | Based on their website, Gitlab are pushing the CI/CD future,
          | not AI.
        
           | canadianfella wrote:
           | The British way of using "are" on a non-plural word is weird
           | to me and always looks very awkward.
        
       | qayxc wrote:
       | The irony of it all is that their code will find its way into the
       | next Common Crawl release anyway and that's used to train GPT-3,
       | which in turn forms the basis of OpenAI Codex, which is the
        | product that Copilot builds on...
       | 
        | So hosting elsewhere _might_ not save your code from ending up
       | deep in the bowels of some corporate black-box ML model that
       | occasionally regurgitates your IP if accidentally given the wrong
       | (right?) prompt.
       | 
       | If you make your code public, you basically accept that someone
       | will copy it verbatim. Other companies still might have it in
       | their closed source product somewhere, even if it's just
       | accidental copypasta from SO.
        
       | enraged_camel wrote:
       | It's kind of interesting how quickly sentiment turned negative.
        | The original feature showcase/announcement post was full of
        | excitement from HN (which is kind of strange, if you think about
       | how skeptical the HN crowd is towards AI/ML and automation of
       | programming) but it hasn't been a week and people are already
       | talking about the questionable ethics and potentially disastrous
       | consequences of using the feature.
        
         | arp242 wrote:
         | I can't speak for anyone else, but when I first saw it, it
         | seemed kind of okay, but I also didn't really look too deeply
         | in to it. As I've looked at it a bit more closely and thought
         | about it for a few days, my original feelings have soured quite
         | a bit.
         | 
         | I never considered the copyright and related ethical
         | implications of ML at all, or thought about the impact it may
         | or may not have on programmers. Your first thoughts on
          | something can be wrong (and actually, often are) and it takes a
          | bit to really think things through - or at least, it does for
         | me.
        
         | tyingq wrote:
         | Do you mean this post?
         | https://news.ycombinator.com/item?id=27676266
         | 
         | There's plenty of skepticism there, even in the early comments.
        
       | maximilianroos wrote:
       | To what extent are these expressions driven by a genuine
       | allegiance to strict copyright laws?
       | 
       | As opposed to an anxiety that a machine might be able to do some
       | of our jobs better than we can?
        
       | nomercy400 wrote:
       | This is exactly why people have issue with Github's Copilot.
       | 
       | It's not the technology, but the fact that any code you pushed to
       | GitHub in the past 13 years is now 'accessible' to anyone.
       | 
       | Private repo? Paid account? Deleted repo five years ago? Deleted
       | repo today? Proprietary code? Embarrassing commits? Accidental API
       | keys or passwords in commits?
       | 
       | All 'available'.
       | 
       | It feels like the entirety of GitHub was just 'leaked', and
       | converted into a marketable product.
       | 
       | Would you push your code to a service if you knew it could be
       | read by anyone one to ten years from now? Even if you paid to
       | keep it a secret?
        
         | jeroenhd wrote:
         | I know that some people have uploaded the Microsoft research
         | kernel or even the leaked Windows source code to github at some
         | point.
         | 
         | I wonder what Microsoft will do when snippets from that code
         | start appearing in your code because of copilot. I'm guessing
         | their lawyers wouldn't accept "the robot did it" as an excuse
         | in that case.
         | 
         | I'm tempted to just throw stuff like "AWS_KEY=" at the
         | algorithm and see how many working credentials I can steal from
         | private repos.
        
           | enriquto wrote:
           | > I'm tempted to just throw stuff like "AWS_KEY=" at the
           | algorithm and see
           | 
           | Anybody tried? What does actually happen if you do this kind
           | of thing? I can think of a few more obvious "script kiddie"
           | ideas, but I won't post them here lest a copilot developer
           | sees it and closes all the elementary stuff.
        
         | danielbln wrote:
         | Wasn't Codex (the tech underlying CoPilot) trained purely on
         | publicly available repos?
        
           | lars wrote:
           | Yes, it was. From their site: "It has been trained on a
           | selection of English language and source code from publicly
           | available sources, including code in public repositories on
           | GitHub."
        
           | IshKebab wrote:
           | Yes. nomercy400 is wrong.
        
           | WillDaSilva wrote:
           | The issue of deleted repositories being available through it
           | would still exist. Whether or not GitHub should be blamed for
           | that is another matter.
        
             | lolinder wrote:
             | Once you put something on the internet, you should assume
             | it still exists out there somewhere even after deleting it.
             | Even before copilot, all credentials that end up in a repo
             | needed to be changed. I'm not sure what's supposed to be
             | different now.
        
         | ForHackernews wrote:
         | > Would you push your code to a service if you knew it could be
         | read by anyone one to ten years from now? Even if you paid to
         | keep it a secret?
         | 
         | I'm old enough to remember when "assume anything you put in
         | cleartext online is public" was received wisdom. We were taught
         | that if you want to keep something private, keep it encrypted
         | on your own local media. Or, failing that, at least on a server
         | you control.
        
         | lolinder wrote:
         | I'm not sold on the product, but it's important to note that
         | GitHub Copilot was only trained on public repos, which means
         | nothing should be out in the open that wasn't already made
         | public by the authors.[0]
         | 
         | > GitHub Copilot is powered by OpenAI Codex, a new AI system
         | created by OpenAI. It has been trained on a selection of
         | English language and source code from publicly available
         | sources, including code in public repositories on GitHub.
         | 
         | [0] https://copilot.github.com/
        
       | M4v3R wrote:
       | While I understand the sentiment, wasn't Copilot trained on code
       | not only hosted on GitHub, but found all over the Internet? Which
       | means hosting your code yourself would not prevent GitHub from
       | using it to train Copilot. That raises an interesting question
       | though - how do you opt out? Is there even a way to do it?
        
         | brobdingnagians wrote:
         | I guess it goes back to closed source / trade secrets
         | territory. If you have something you really don't want stolen,
         | it is safer to never expose it and never trust that the law
         | will fairly protect you.
         | 
         | The irony is that copilot won't suggest its own source code,
         | just everyone else's. It is open source without the benefits.
        
           | axismundi wrote:
           | Smells like Microsoft
        
             | yumraj wrote:
             | More like _Open_ AI
        
         | bryanrasmussen wrote:
         | robots.txt, or a copyright notice saying the code can't be used
         | to train AI which bots will ignore and open their corporate
         | masters to liability.
         | 
         | on edit: fixed typo
        
           | ezoe wrote:
           | Bad news for you. Japanese copyright law, article 47-7,
           | explicitly allows using copyrighted works for data analysis
           | by means of a computer (including recording a derivative
           | work created by adaptation).
           | 
           | It is roughly what the USA would consider fair use, except
           | Japan doesn't use a Common Law system, so the exemptions
           | from copyright protection are stated explicitly.
        
             | bryanrasmussen wrote:
             | Thanks for the bad news! Not glad to hear it but glad to
             | know something I didn't.
             | 
             | That said - so they would be able to sell some things in
             | Japan that they couldn't in other places.
        
               | amelius wrote:
               | What if a Japanese software company uses it to write
               | software, which is then sold in the US?
               | 
               | It's still copyright laundering, if you ask me.
        
           | blihp wrote:
           | robots.txt is a convention for those who want to be good 'web
           | citizens' rather than anything legally binding. It does
           | absolutely nothing to stop someone who ignores your wishes.
           | For example, there are tons of bots that ignore robots.txt
           | entirely, or even go straight for the things you're telling
           | them to avoid ('hey, thanks for telling us where to look!').
           | Copyright is a mechanism that can be used if you can make the
           | case and have the means, but it will only work against
           | entities that have something to lose and are within a
           | jurisdiction where it matters.
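The advisory nature of robots.txt is easy to see in code: a crawler has to ask the parser for permission, and nothing forces it to. A minimal sketch using Python's standard urllib.robotparser, with a hypothetical robots.txt and a hypothetical bot name:

```python
from urllib import robotparser

# A hypothetical robots.txt asking all crawlers to stay out of /src/.
rules = [
    "User-agent: *",
    "Disallow: /src/",
]

rp = robotparser.RobotFileParser()
rp.parse(rules)

# A polite crawler checks before fetching:
print(rp.can_fetch("SomeBot", "https://example.com/src/main.c"))   # False
print(rp.can_fetch("SomeBot", "https://example.com/index.html"))   # True

# Nothing enforces the answer: an impolite crawler simply never calls
# can_fetch() and downloads /src/ anyway.
```

The entire "protection" lives on the crawler's side of the wire, which is exactly the point being made above.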
        
           | woodruffw wrote:
           | This kind of learning needs to be opt-in, not opt-out.
           | 
           | I would also be extremely surprised if most open source
           | copyright holders didn't already expect their licensing terms
           | to protect against this kind of code/authorship laundering.
           | Speaking individually, I know that it certainly surprised me
           | to hear that GitHub thinks that it's probably okay to
           | regurgitate entire fragments of the training set without
           | preserving the license.
        
         | rvense wrote:
         | I'm not surprised. I imagine all images on the internet are
         | used to train image classifiers as well. It's a shitty future,
         | but it's the one we have.
        
           | hojjat12000 wrote:
           | Researchers in our lab created a huge dataset of facial
           | expressions from images on the web, annotated it and
           | published the URLs to the images and the annotations for
           | research but made sure to search only for images with proper
           | licenses. I don't think that you are allowed to just go
           | download any old image and train on it. I understand that
           | many, many people do it, but it's not legal (as far as I know,
           | please correct me if I'm wrong).
        
             | sillysaurusx wrote:
             | > I don't think that you are allowed to just go download
             | any old image and train on it.
             | 
             | My understanding as a two-year student of ML is that you
             | are allowed in the US to go download any old image, train
             | on it, and then release the model as long as the outputs
             | are "sufficiently transformative."
             | 
             | That last phrase is the key part, and has never been tested
             | in court. It's entirely possible that either I'm mistaken
             | here, or that the courts will soon say that I am mistaken
             | here. https://www.youtube.com/watch?v=4FA_gt9w28o&ab_channe
             | l=guava...
        
               | saurik wrote:
               | To be clear: "transformative" not meaning merely
               | "altered" but really meaning "repurposed"; if the new
               | work is something people could feasibly use instead of
               | the old work (harming the author's original market), it
               | isn't "transformative".
        
               | sillysaurusx wrote:
               | Yes. For example, arfa ran into this question when
               | launching https://thisfursonadoesnotexist.com/. Lots of
               | furry artists had exactly the same concerns with his work
               | there, but that work is decisively transformative.
               | 
               | Copilot seems ... well, less transformative. I'm still
               | not sure how to feel.
        
         | bsd44 wrote:
         | I would like to know this too. I understand that GitHub is a
         | private company and you have to accept their T&C, but surely
         | they aren't allowed to use source code found elsewhere on the
         | internet to train their ML models without asking for permission
         | first unless it's a B2B cooperation such as with Stackoverflow.
        
           | [deleted]
        
           | lacker wrote:
           | According to the discussion at this link, you do not need
           | permission to use copyrighted data to train AI models.
           | Copyright prevents you from copying data, it doesn't prevent
           | you from learning from it.
           | 
           | https://twitter.com/luis_in_brief/status/1410985742268911631.
           | ..
        
             | marcosdumay wrote:
             | To train your model, yeah, probably OK. But I don't think
             | anybody will see the duplicated code that the AI inserts
             | into your codebase the same way.
        
           | lloydatkinson wrote:
           | Oh man I can't imagine the consequences for certain languages
           | and frameworks if it uses SO answers though. Imagine if it
           | trained in all the dumb and ancient answers like "how do I
           | get the length of a string in javascript" and took the first
           | accepted answer of "use jquery"
        
             | bsd44 wrote:
             | This raises an issue of trolling. What prevents developers
             | from generating "inappropriate" code to feed to this
             | algorithm the same way they did with the Microsoft Chat bot
             | for example? That will surely reflect on the quality of
             | code generated by this AI system and therefore the
             | stability and security of applications built.
        
               | kingofclams wrote:
               | I'm sure this will happen, and there will definitely be
               | instances of the bot giving users bad code, but it would
               | be incredibly difficult to make it solely give out bad
               | code.
        
       | throwaway3699 wrote:
       | Are people angry at copyright violations, or Microsoft? Copyright
       | and patents have their place, but they've clearly overreached
       | long ago.
        
       | EugeneOZ wrote:
       | Some unknown person is trying to get some hype on "cancel github"
       | cry.
       | 
       | I don't give a shit about the Copilot, but I care even less about
       | Rian Hunter and his statements.
        
         | Nicksil wrote:
         | >Some unknown person is trying to get some hype on "cancel
         | github" cry.
         | 
         | >I don't give a shit about the Copilot, but I care even less
         | about Rian Hunter and his statements.
         | 
         | This is untrue, because you had a choice: say nothing at all
         | and carry on (clearly not giving a shit), or take the time to
         | leave such a comment (giving enough of a shit to inform
         | everyone you don't give a shit). So far, this and Lloyd's are
         | the only crying going on in this topic.
        
           | EugeneOZ wrote:
           | This is true. And I didn't even read your nickname because I
           | don't give a shit about a shmuck who is trying to tell me
           | what I care about :)
        
             | lloydatkinson wrote:
             | It's amazing he seems so butt hurt. I think it's an alt of
             | the blog author.
        
           | Dylan16807 wrote:
           | Is it not obvious that you can care about a post on HN
           | without caring about the page it links to?
           | 
           | Back away from this specific situation for a second: If you
           | would ignore something entirely if it wasn't being shoved in
           | your face, complaining about it being shoved in your face and
           | saying it's stupid wouldn't mean you suddenly "care" about
           | the underlying item.
           | 
           | (And no, I'm _not_ saying that an HN post is shoved in your
           | face. It 's a more extreme example to make the point more
           | clear.)
        
         | lloydatkinson wrote:
         | My thoughts too
        
           | Nicksil wrote:
           | Also
           | 
           | >Who asked?
           | 
           | But you deleted that comment.
        
             | [deleted]
        
             | lloydatkinson wrote:
             | Who asked?
        
               | Nicksil wrote:
               | >Who asked?
               | 
               | Now this was completely unexpected.
        
       | wyldfire wrote:
       | What a cool court case this would make. Is copilot's model
       | sufficiently abstracted from the code it has read? Judges and
       | juries learning about how the GitHub team avoided overfitting?
       | Are humans who have read open source code producing derivative
       | works?
       | 
       | Won't be long until we see an infringement case. /me grabs
       | popcorn
        
       | gbtw wrote:
       | Does github guarantee that my private repo's content are not
       | being leaked this way in the future?
        
         | orlovs wrote:
         | Nah, it's all gonna be fine
        
         | errata wrote:
         | Yes
        
           | eCa wrote:
           | Source?
        
             | ralph84 wrote:
             | https://docs.github.com/en/github/site-policy/github-
             | privacy...
        
       | tvirosi wrote:
       | This huge revolt is interesting but I doubt it makes github very
       | scared. They'll just come out with some new version of it which
       | they'll show takes into account licenses (or uses a 10k or
       | something dataset with hand checked licenses) and that'll be that
       | and we'll forget about all this the week after.
        
       | hashhar wrote:
       | Why is this noteworthy? Who is this person? Am I missing
       | something?
       | 
       | I agree that there needs to be talk about licensing and copyright,
       | but with so little (or no) content there can be no meaningful
       | discussion, only aimless banter.
        
         | pcthrowaway wrote:
         | > Why is this noteworthy? Who is this person? Am I missing
         | something
         | 
         | Why is this comment noteworthy? Who is this person? Am I
         | missing something?
        
         | eitland wrote:
         | > Who is this person?
         | 
         | One of the beautiful things about HN is that you don't need to
         | be anything, you just have to have something interesting to
         | say.
        
           | IshKebab wrote:
           | Right, but you either need a solid argument or some
           | authority, and this guy has neither. He's effectively a
           | nobody and he has just jumped to the conclusion that CoPilot
           | is illegal.
           | 
           | If he had a good argument for that, fine. But without that he
           | really needs to be someone whose opinion I care about.
        
             | NiceWayToDoIT wrote:
             | This is somehow inverted logic. Does a rape victim need
             | authority to speak about rape in order to validate it? What
             | is not solid here? CoPilot is using community code that is
             | under the GPL licence, therefore Microsoft should not be
             | able to charge for CoPilot but should give it away for
             | free, not create another revenue stream.
        
           | meibo wrote:
           | This isn't interesting though. It doesn't even provide any
           | value. It's a random guy that doesn't like GitHub, it could
           | have just as well been an HN comment from yesterday.
           | 
           | It's just posted (not by the guy that made the page, mind you)
           | to farm karma, exploit the news cycle and carve out some more
           | space for discussion of this tired topic.
        
             | [deleted]
        
             | eitland wrote:
             | If it sparks the necessary discussions I don't care if it
             | was written by Joe Random Nobody or Joe Biden.
             | 
             | > It's just posted (not by the guy that made the page, mind
             | you)
             | 
             | Others would complain if the author himself had posted
             | this.
        
               | Dylan16807 wrote:
               | The necessary discussion was already sparked.
        
               | eitland wrote:
               | Well, a lot of the people with voting rights here
               | obviously thought otherwise.
        
               | Dylan16807 wrote:
               | An upvote doesn't mean you think something is new or
               | needed sparking. There are very often redundant posts on
               | a topic.
        
             | exolymph wrote:
             | why are you on an upvote-based aggregator + forum if you're
             | not looking for upvote-based links + commentary?
        
         | qwertox wrote:
         | How about just leaning back and reading the discussions which
         | evolve out of this post? Some may have something to say about
         | it which will either help you solidify your point of view or
         | add a new perspective to it which you might have missed.
         | 
         | The topic is a current one [1], which makes it even more
         | valuable.
         | 
         | [1] https://news.ycombinator.com/item?id=27676266
        
         | judge2020 wrote:
         | They're not particularly popular on HN:
         | https://news.ycombinator.com/from?site=thelig.ht (except for
         | https://news.ycombinator.com/item?id=18133450 )
         | 
         | And their only huge project on GitHub is dbxfs, a userspace
         | dropbox filesystem with 687 stars
         | https://github.com/rianhunter?tab=repositories&q=&type=&lang...
         | 
         | I think this is just a post meant to continue the discussion of
         | CoPilot past the first 2 days of news.
        
           | pmarreck wrote:
           | I'm glad the post showed up because I've been in the hospital
           | for 3 days and I was like HOLY SHIT WHAT IS THIS? ;)
        
       | smartmic wrote:
       | Maybe now is the time to release a GPLv4 extending (or
       | restricting) the four freedoms with respect to non-humans.
       | 
       | I expect the best lawyers from Microsoft have had a look into
       | this, and maybe there are weaknesses in GPLv3 ready to exploit
       | for corporate AIs. What is the response from the FSF?
        
       | zmmmmm wrote:
       | Seems to me like they need to back out of this fast and at the
       | very least limit it such that it is only trained and then used
       | on "license compatible" projects, e.g. train it in isolation on
       | MIT licensed projects, and have the user explicitly confirm what
       | license the code they are working on is under to enable it. Possibly
       | they even need to auto-enable a mechanism to detect when code has
       | been reused verbatim and enable some kind of attribution (or
       | respect for other constraints) where that is required by the
       | license.
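One rough shape for that last mechanism: fingerprint normalized windows of lines from the licensed training corpus, and flag any suggestion whose windows collide. This is purely a sketch of the idea, not anything GitHub has described; the function names and the sample snippet are hypothetical.

```python
import hashlib

def fingerprints(code, window=3):
    """Hashes of every `window` consecutive non-blank, whitespace-
    normalized lines, so trivial reformatting doesn't hide a match."""
    lines = [" ".join(l.split()) for l in code.splitlines() if l.strip()]
    return {
        hashlib.sha256("\n".join(lines[i:i + window]).encode()).hexdigest()
        for i in range(max(0, len(lines) - window + 1))
    }

def looks_verbatim(suggestion, corpus_prints):
    """True if any window of the suggestion appears in the corpus."""
    return bool(fingerprints(suggestion) & corpus_prints)

# Pretend this snippet came from a GPL project in the training set.
gpl_snippet = """
int gcd(int a, int b) {
    while (b) { int t = a % b; a = b; b = t; }
    return a;
}
"""
corpus = fingerprints(gpl_snippet)

print(looks_verbatim(gpl_snippet, corpus))            # True: flag, attribute
print(looks_verbatim("a = 1\nb = 2\nc = 3", corpus))  # False: no overlap
```

A production system would need something far more robust (token-level fingerprinting in the style of MOSS's winnowing, for instance), but even this naive scheme suggests the attribution check is technically feasible.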
        
         | shadowgovt wrote:
         | Alternatively, they'll take it head-on, pay their lawyers to
         | argue fair use, and blaze a new trail through the understanding
         | of copyright application that allows this ML model (and others
         | like it) to exist.
         | 
         | This is ultimately a Microsoft project, and they have Microsoft
         | money and Microsoft lawyers to defend their position.
        
       | na85 wrote:
       | Feels like everyone is missing the point: Copilot will ultimately
       | serve to weaken the arguments in support of software patents and
       | copyright.
       | 
       | That can only be a good thing for society (though perhaps not for
       | rent seekers).
        
         | teflodollar wrote:
         | If all copilot output was automatically GPL, I would think it's
         | fantastic. As it stands, it seems to undermine GPL the most.
        
           | CameronNemo wrote:
           | They should really have trained models based on the license,
           | so a GPL-2.0-only model, 2+, 3+, 3 only LGPL 2.1(+), CDDL,
           | MIT, et cetera.
           | 
           | As it stands, the combined inputs leave the model in the
           | most murky of gray areas.
        
         | MontyCarloHall wrote:
         | Surprised I had to scroll so far to find this, given how
         | copyleft and straight-up anti-IP so much of the open source
         | community is.
         | 
         | I think a lot more people on this site (and in the FOSS
         | community in general) would be on board with Copilot if it
         | respected viral licenses, e.g. if it had a way of inferring
         | that the code it was copying verbatim were GPL-3 and warned the
         | user that including it in their project would require them to
         | GPL-3 their project as well.
        
           | noobermin wrote:
           | Honestly, that would fix every issue with it. The laundering
           | of the license is the issue.
        
           | ImprobableTruth wrote:
           | But that's literally the issue. The only form of intellectual
           | property that is being damaged by this is copyleft.
        
         | CameronNemo wrote:
         | The arguments are already weak. The judicial precedent,
         | however, is strong. Microsoft will continue to publish
         | proprietary ML models and profit off them, at the expense of
         | the corpus authors (us lowly laborers).
        
         | xunn0026 wrote:
         | Not really. Back when free software was strong, it would have
         | been a good thing for society since Microsoft was selling
         | software in boxes on actual store shelves.
         | 
         | Now 'the edge' is already mostly open source. All the lock-in
         | and value has moved into either infrastructure or in software
         | you don't even get to touch since it runs in the Cloud and you
         | just provide IO to it.
        
           | na85 wrote:
           | I think in this new era of endless security breaches at cloud
           | firms and M1-style processing innovation we'll see a slow but
           | steady migration away from the cloud.
        
             | dcolebatch wrote:
             | I'm going to take the other side of that prediction:
             | 
             | Endless security breaches will encourage firms to do "less
             | IT" themselves and accelerate the adoption of SaaS
             | solutions (and PaaS, with no/low-code etc.)
             | 
             | Also, perhaps not a massive driver but still, not for
             | nothing: M1-style processing innovation (ARM) will see more
             | developers creating for ARM servers, because they can,
             | which will almost exclusively be run by the hyper scale
             | cloud providers.
        
             | hsbauauvhabzb wrote:
             | I used to think like this but the total cost of ownership
             | of on-prem is substantially higher, and it has its own
             | security implications too.
        
         | dimgl wrote:
         | No no, we get it. Some people, like myself, still think that
         | copyrights serve a purpose.
        
           | Dylan16807 wrote:
           | You can say both, you know. That it serves a purpose _and_ is
           | too strong.
           | 
           | And I notice you didn't say anything about patents?
        
         | temac wrote:
         | Not really unless you force the source code of proprietary
         | software to be published. If you don't, copyleft has a role to
         | play.
        
         | ThrowawayR2 wrote:
         | It is certainly fascinating to see people start running away
         | from " _information wants to be free_ " and other Free Software
         | principles full tilt when, all of a sudden, it's _their_
         | livelihoods that are on the line. Unless my recollection is
         | off, the GPL was never the goal of the original Free Software
         | movement; it was merely a tool to get to the end state where
         | all code becomes available for use by anyone for any reason
         | without cost or restriction.
         | 
         | I am reminded of a line from Terry Pratchett's _Going Postal_
         | in relation to a hacker-like organization called the Smoking
         | GNU,  " _...[A]ll property is theft, except mine..._ ", which I
         | thought was rather painfully apt in describing what FOSS
         | evolved into after becoming popular.
        
           | snickerbockers wrote:
           | I can't speak to whether or not Richard Stallman was trying
           | to make some 4-dimensional chess move to remove software
           | restrictions by adding software restrictions when he wrote
           | the GPL back in the 80s, but his original intentions are
           | irrelevant in most cases since most people who license their
           | code under the GPL do not consult with him or consider his
           | opinions when they choose to do so.
        
           | ImprobableTruth wrote:
           | Your recollection is off, majorly. I'd recommend looking up
           | the origins of the FSF/GPL/Copyleft. The entire movement
           | essentially got started because Stallman gave Symbolics his
           | (public domain) Lisp interpreter, then Symbolics improved it
           | but refused to share the improvements.
           | 
           | "No restrictions" has never been the goal and to claim that
           | they're egoistic hypocrites who are just scared for their own
           | livelihood because of this is just an absurd strawman.
        
             | lispm wrote:
             | Stallman did not give Symbolics his Lisp interpreter.
             | 
             | Symbolics had a license for MIT's Lisp system.
        
           | the_af wrote:
           | > _it was merely a tool to get to the end state where all
           | code becomes available for use by anyone for any reason
           | without cost or restriction_
           | 
           | Your recollection seems to be completely off. That wasn't the
           | goal of the Free Software movement.
           | 
           | Also, the code they champion comes with restrictions and,
           | optionally, cost. So again, you're off.
        
           | na85 wrote:
           | >It is certainly fascinating to see people start running away
           | from "information wants to be free" and other Free Software
           | principles full tilt when, all of a sudden, it's their
           | livelihoods that are suddenly on the line.
           | 
           | Indeed. Everybody is a leet haxors when they're 14, it's
           | 1998, and we're vying for +o in #warez on DALnet. We believed
           | information really did "want to be free".
           | 
           | Unfortunately some of those same kids grew up to create
           | today's data barons and that old saying about getting someone
           | to understand something when their salary depends on not
           | understanding comes into play.
        
       | booleangate wrote:
       | There's a lot of consternation over copyright issues, but I see
       | an entirely different problem. When I hear this tool described
       | and see it's examples the first thing I think is that Github has
       | just automated the dubious process of copy/pasting from
       | StackOverflow.
       | 
       | As a senior developer, I am strongly biased against the SO+c/p
       | programming approach that I've seen many Junior and mid level
       | developers use. There's certainly a time and place for it when
       | you become really stuck but at least having to go out and find
       | the code yourself requires thought which helps you grow.
       | 
       | My gut reaction to Copilot is that adding this automation into
       | IDEs is going to have a net-negative effect on growing developers
       | as it lowers the level of thought and effort necessary to write
       | even trivial applications. This is a huge detriment to learning.
       | You don't even get the chance to try to solve the problems
       | yourself because the AI is going to be proactively getting in the
       | way of your learning.
       | 
       | All that being said, I think a tool like this could be of great
       | use with boilerplate within a project -- but only suggesting
       | things from that project. For example, setting up a new api
       | route, dependency injection, error propagation, etc. Help with
       | all of these mechanical things would be awesome.
        
       | [deleted]
        
       | ralph84 wrote:
       | If you don't want people and/or AI to read your code, why would
       | you post it anywhere? Just post binaries and call it freeware.
        
       | foobarbazetc wrote:
       | We're going to see license additions that explicitly ban ML from
       | being trained on the code soon.
       | 
       | Fun.
        
       | henvic wrote:
       | I really hope this weakens copyright. We can live without it.
        
         | lc9er wrote:
         | Who can? Sure, Disney shouldn't be able to copyright public
         | domain works or Mickey Mouse until the end of time. But they
         | also shouldn't be able to swoop in, use your
         | songs/artwork/software in their latest movie, without
         | permission or appropriate compensation.
        
           | mmastrac wrote:
           | Hobbling copyright would likely make Disney et al much weaker
           | in the future, to where this might not be as big of a deal.
        
         | dvdkon wrote:
         | Well, right now it could also just weaken copyleft while
         | leaving proprietary non-public code copyright holders well-off.
        
         | eddieh wrote:
         | Are you kidding? If I produce any creative work, don't copy it
         | without my permission, full stop. (c) 2021
        
         | caconym_ wrote:
         | Copyright doesn't just benefit huge corporations. For instance,
         | without it, independent artists who rely on copying for
         | distribution (authors, musicians, etc.) would find it much more
         | difficult to make money off their work, mostly (IMO) because
         | large corporate entities with large investments made in
         | publication and distribution systems could simply take content
         | and sell it themselves with zero obligation to the original
         | creator(s). This process could be highly automated at scale,
         | giving creators essentially zero chance to compete in the
         | market.
         | 
         | It's a bad idea.
         | 
         | The thing about copyright law that needs reform is its bias
         | toward the benefit of large corporate entities. Platforms'
         | implementations of DMCA compliance allow "rights holders" to
         | spam perjurious takedown requests en masse, garnishing the
         | earnings of creators and _legitimate_ rights holders in what
         | can only be called (in addition to perjury) outright fraud.
         | Companies like Github scrape the web for content, most of it
         | copyrighted, and use it to construct new products for their own
         | profit. Rare recitation events aside, I think their use case
         | _is_ legitimate fair use in the eyes of the law (and if you
         | look at my comment history you'll see me vehemently arguing to
         | that effect), but _should_ it be? We don't seem to be asking
         | that question, which is really disappointing--we're either
         | complaining loudly and without substance, or blithely accepting
         | the might-makes-right ethic as the central pillar of our IP
         | law.
        
           | kingsuper20 wrote:
           | >Copyright doesn't just benefit huge corporations. For
           | instance, without it, independent artists who rely on copying
           | for distribution (authors, musicians, etc.) would find it
           | much more difficult to make money off their work,
           | 
           | That doesn't look like it's the point to me.
           | 
           | ""[the United States Congress shall have power] To promote
           | the Progress of Science and useful Arts, by securing for
           | limited Times to Authors and Inventors the exclusive Right ,
           | to their respective Writings and Discoveries." "
           | 
           | As I read that, copyright is there to 'promote progress', not
           | to maximize gains.
           | 
           | No doubt there is a million linear feet of case law that got
           | us where we are.
           | 
           | Honestly, I rather like this whole question of copilot. I
           | solidly appreciate the brilliance of github as a honeypot.
        
             | caconym_ wrote:
             | > To promote the Progress of Science and useful Arts, by
             | securing for limited Times to Authors and Inventors the
             | exclusive Right to their respective Writings and
             | Discoveries.
             | 
             | What better way to promote said Progress than by making
             | sure said Authors and Inventors can make enough money off
             | their work to keep doing it? As written, it's a roundabout
             | way to get at the instrumentality of capital, but if that's
             | not what they had in mind then I'm not sure what they
             | _were_ getting at. Without copyright, a creator's rights
             | to their own work aren't diminished; it's just that
             | everyone else's are expanded to the same level.
             | 
             | (I'd love to know if I'm way off base about this. I'm not a
             | lawyer, and I'm sure it's been discussed to death.)
             | 
             | > Honestly, I rather like this whole question of copilot. I
             | solidly appreciate the brilliance of github as a honeypot.
             | 
             | I think it's really cool, and I'd probably use it myself.
             | As much as my favorite kinds of programming (e.g. writing
             | experimental text editors) might not benefit from it, in my
             | day job I sure would love to spend less time filling in
             | boilerplate and looking up mundane API details.
             | 
             | I don't mean to single Github out in my mention of big
             | corporations benefiting from copyright law. Scraping vast
             | quantities of copyrighted data to build new products is a
             | common business model at this point, and--like other new
             | IP-related paradigms enabled by modern information
             | technology--I think it deserves a fresh look, being mindful
             | of just what it is we're trying to accomplish with
             | copyright law. As you say, it's not always obvious, even in
             | written law.
        
         | matthewmacleod wrote:
         | Don't be too eager! Weakened copyright doesn't necessarily
         | translate to an overall benefit, at least for software.
         | 
         | Weakening copyright also weakens copyleft - for example, it
         | seems reasonable to me that the producer of an open-source work
         | should be entitled to require reciprocal openness from people
         | who build upon it. If I can legitimately launder some GPL
         | source code (say, a Linux kernel driver) through an ML model
         | without being obliged to release the resulting code, I think
         | everyone loses.
        
           | blibble wrote:
           | > I think everyone loses.
           | 
           | only people who have released their code publicly under a
           | (mostly) open license
           | 
           | so, not Microsoft
        
       | jbluepolarbear wrote:
       | Regardless of how I feel about this usage, I'd be more concerned
       | with the very real possibility of introducing vulnerabilities
       | this way. Say Copilot takes a snippet from a code base. That
       | snippet had a vulnerability and was fixed by the team that
       | understood the what and how. How does that vulnerability get
       | fixed? Does Copilot let the user know months later that the
       | snippet it suggested is actually very bad, and that the company
       | that originally implemented it has fixed it and you should too?
        
       | amelius wrote:
       | Can't you just put a robots.txt file in your project which says
       | "no ML".
        
       | sillysaurusx wrote:
       | Anyone know how they're hosting their repositories?
       | https://thelig.ht/code/ is actually kind of nice and minimalist;
       | I was hoping to set up the same thing, mostly just for kicks.
        
         | lolinder wrote:
         | Googling a bit of the stylesheet suggests that it's stagit, a
         | static page generator for git repos:
         | 
         | https://codemadness.org/stagit.html
         | 
         | Contrast these two pages, and you'll see it's a match:
         | 
         | https://codemadness.org/git/bmf/log.html
         | 
         | https://thelig.ht/code/block-tracing/log.html
        
           | sillysaurusx wrote:
           | Woo! You rock. I was too lazy to do that myself (or at least,
           | lounging around in bed...) so I was hoping a fellow like you
           | would sleuth it.
           | 
           | Thank you. :)
        
         | teflodollar wrote:
         | Cgit
         | 
         | https://git.zx2c4.com/cgit/about/
        
         | varenc wrote:
         | it's built with stagit: https://codemadness.org/stagit.html
         | 
         | Love the minimal style and monospace font!
        
         | linkdd wrote:
         | This reminds me of cgit[1], but the UI seems even simpler.
         | 
         | [1] - https://git.zx2c4.com/cgit/about/
        
       | ta1234567890 wrote:
       | It seems like the most fair way to go would be for Copilot to be
       | completely open sourced and hosted on GitHub. That way they'd be
       | subject to the same terms/conditions they are imposing on
       | everyone else's code/repos.
        
         | adamtulinius wrote:
         | The problem isn't the source code of Copilot, but the code it
         | is outputting.
        
         | cj wrote:
         | They aren't using private repos in their training data.
        
       | rhn_mk1 wrote:
       | I would be more sympathetic to the idea of Copilot if, apart
       | from being prone to stripping licensing information from
       | permissive and copyleft projects, it could also inject the same
       | amount of copyright-stripped closed-source code.
       | 
       | As it is now, it works towards weakening the copyright of free
       | software while doing nothing (or very little) to closed software.
        
       | bullen wrote:
       | How does he expose the git stuff? Is that open source?
        
         | null-a wrote:
         | Stagit?
         | 
         | https://codemadness.org/stagit.html
        
       | bryanrasmussen wrote:
       | Lots of people are arguing this guy isn't anybody, but the name
       | seemed sort of familiar to me, and my quick googling and looking
       | at his site make me think he has probably done something that
       | some people use? For example, dbxfs seems to have quite a
       | history.
       | 
       | on edit: just saw there was a description of who he is
       | https://news.ycombinator.com/item?id=27724247 as noted I don't
       | know but not sure if it's enough to imply a bad motive of him
       | wanting to get some sort of attention for opposing copilot.
       | 
       | on second edit: huh, seems to be one of those occasions when I
       | have mysteriously offended some people on HN without swearing,
       | joking or being rude.
        
       | coliveira wrote:
       | This is typical Microsoft behavior: embrace, extend, and
       | extinguish. They embraced open source with the intention of
       | controlling (GitHub) and exploiting it. And the interesting thing
       | is that many people fell for this already ancient strategy.
        
         | qayxc wrote:
         | So when MS does it it's evil, but it's perfectly fine for
         | everyone else to do it?
         | 
         | I also don't see how any of this follows - they could've just
         | crawled GitLab or any other OSS repository. They didn't even
         | _need_ Github for this.
         | 
         | Heck, is OpenAI doing embrace, extend, and extinguish on the
         | entire web now, because they use Common Crawl [0] to train
         | GPT-3, which forms the basis of Copilot?
         | 
         | [0] https://en.wikipedia.org/wiki/Common_Crawl
        
         | rvz wrote:
         | Well, I did try to warn the Copilot fanatics [0]. They just
         | downvoted me days ago and here we are. We have a GitHub Copilot
         | backlash against the hype squad.
         | 
         | The GitHub CEO is nowhere to be found to answer the important
         | questions on software licenses, copyright and the legal
         | implications on scraping the source code of tons of projects
         | with those licenses for Copilot.
         | 
         | The fact you can only use it in VSCode and with Microsoft
         | having an exclusive deal with OpenAI screams an obvious
         | 'embrace and extend'.
         | 
         | As for 'Extinguish', they will need to be very creative on
         | that.
         | 
         | [0] https://news.ycombinator.com/item?id=27685104
        
       | fartcannon wrote:
       | What happens when Microsoft sues someone for including
       | Microsoft's code in another project?
       | 
       | Will it be fair use then?
        
       | wcerfgba wrote:
       | I'm glad that Copilot is bringing the grey areas of copyright
       | into discussion. If I write a book and it is copyrighted, what's
       | the smallest unit which is covered by that copyright? Each word
       | is obviously not. Some sentences will be fairly generic and I
       | will not be the first person to write them. But some sentences
       | will be characteristic of the work or my own style. Clearly how
       | we apply copyright to subdivisions of an original work is an open
       | question.
        
         | Engineering-MD wrote:
         | I think an interesting analogy is if you rewrote a book in your
         | own words but with each paragraphs meaning intact. So you
         | rewrote Harry Potter but with slightly different sentence
         | structures, but the meaning was otherwise near identical. Is
         | that copyright infringement? I think it would certainly be
         | plagiarism.
         | 
         | The other similar analogy is translation: a translated work
         | is still covered, as 'derived from' the original, under
         | copyright law.
         | 
         | Is this just what copilot is doing in some ways but for smaller
         | components?
        
         | MontyCarloHall wrote:
         | The legal term for this is scenes a faire[0], and there is
         | quite a bit of legal precedent covering exactly the cases you
         | bring up.
         | 
         | [0] https://en.m.wikipedia.org/wiki/Scenes_a_faire
        
           | canada_dry wrote:
           | > Limits of the scenes a faire doctrine are a matter of
           | degree -- that is, _operate on a continuum_.
           | 
           | Copilot is certainly pushing that envelope.
        
         | nomercy400 wrote:
         | 'Some sentences' makes me think of the link tax introduced to
         | prevent aggregating news sources based on only headlines, so
         | even generic sentences fall under copyright in certain cases.
        
         | breck wrote:
         | This. You realize it doesn't make any sense. All ideas are
         | shared creations, by definition. If you've created something
         | that has meaning for other people, the meaning comes from the
         | ideas you are incorporating into your own tree.
         | 
         | There is no defending copyright. It is indefensible from first
         | principles. It makes no logical sense.
         | 
         | Though it sure has proven to be a profitable con.
        
           | okamiueru wrote:
           | I recognise your comments from several different threads, and
           | I'm wondering if you might not be working against your own
           | ideals. The GPL license is intended to persuade other sources
           | to share their contributions when building on top, which I
           | assume is what you would like to see happen. If everything is
           | GPL then everything is open source, everyone can use
           | anything, including training AI methods, etc.
           | 
            | The problem posed by Copilot is in fact the opposite:
            | taken to its logical conclusion, it might make it possible
            | to disregard this effort and use GPL code in your private
            | project.
        
           | kortilla wrote:
           | > There is no defending copyright. It is indefensible from
           | first principles. It makes no logical sense.
           | 
           | What does that even mean? The intent from the beginning of
           | copyright was to allow people to live off of intellectual
           | works by claiming legal rights over the work.
           | 
           | There are no "first principles" from which basically any
           | societal agreements like these are derived.
           | 
           | Even something as simple as "murder is illegal" isn't
           | actually derived from any first principles because the
           | government is allowed to murder people, citizens are during
           | self defense, etc.
        
           | cnma wrote:
           | Nonsense. Above a certain level of creativity people do
           | produce novel or exceptional things that are worthy of
           | protection.
           | 
           | Because naked men are a shared concept, is Michelangelo's
           | David not worthy of protection?
           | 
           | I'm very worried that such opinions are up-voted so highly
           | when Microsoft leeches open source code (but not its own
           | ...).
           | 
           | People have no respect for other people's creations. Perhaps
           | it makes them feel better because they haven't created
           | anything difficult themselves.
        
       | ChrisMarshallNY wrote:
       | All of my open-source stuff on GH is MIT. I don't care whether or
       | not Copilot (or anyone else) uses it.
       | 
       | I seriously doubt that Copilot scans my (very few) private repos.
       | Even then, I don't think I do anything particularly noteworthy.
       | 
       | But that is just me.
        
         | eCa wrote:
         | The license you have chosen requires attribution. You may not
         | care[1] but the other party still most likely will be in
         | violation if Copilot reproduces a significant chunk of your
         | code.
         | 
         | [1] I also MIT license my public code on Github, and also
         | wouldn't care that much.
        
           | ChrisMarshallNY wrote:
           | I don't care about attribution.
           | 
           | The only reason I use MIT, is so some knucklehead doesn't try
           | to sue me, because they cheezed up my code.
        
         | [deleted]
        
       | BiteCode_dev wrote:
       | My prediction is that they will add a licence tooltip to the
       | code completion UI and solve that issue next month.
        
       | IceDane wrote:
       | ok
        
       | blihp wrote:
       | Anyone publishing anything on the Internet should expect this
       | type of use case. If it is removed from github and republished
       | via another site, there is absolutely nothing preventing another
       | service/company from doing the exact same thing (or 'worse'...
       | i.e. imagine a learning system that can actually understand the
       | code) when scraping the alternative location. It's not unusual
       | for bots to be among the most frequent visitors to low traffic
       | pages these days and they aren't all just populating search
       | engines.
        
         | yashap wrote:
         | A bigger concern for many is that if you USE copilot, you'll
         | unintentionally copy code with licences that your company
         | really, REALLY does not want to copy. For example, here's
         | copilot copying some very famous GPL code:
         | https://twitter.com/mitsuhiko/status/1410886329924194309?s=2...
         | 
         | And basically every software company avoids GPL like the
         | plague, due to its strong copyleft conditions.
        
           | blihp wrote:
           | Sure, but that's a different end of the issue than I was
           | referring to. I was pointing out that just taking code off of
           | github wouldn't avoid the use case. Any published code from
           | any public source is likely to eventually be used this way by
           | someone.
        
             | yashap wrote:
             | Yeah, I agree with your point that "if you publish content
             | to the internet, expect it to be used in ways you don't
             | intend, or even permit." Just pointing out that a lot of
             | the concerns are not "GitHub is stealing my code for use in
             | Copilot," but "using GitHub Copilot in my proprietary
             | software is a massive risk/liability."
        
       | lacker wrote:
       | I thought this was a pretty good thread (by an ex-Wikipedia
       | lawyer) on Twitter about the IP meaning of Copilot.
       | 
       | https://twitter.com/luis_in_brief/status/1410242882523459585...
       | 
       | And this is a longer article about how IP and AI interact:
       | 
       | https://ilr.law.uiowa.edu/print/volume-101-issue-2/copyright...
       | 
       | I am not a lawyer, but I am capable of summarizing the thoughts
       | of lawyers, so my take is that in general, fair use allows AI to
       | be trained on copyrighted material, and humans who use this AI
       | are not responsible for minor copyright infringement that happens
       | accidentally as a result. However, this has not been tested in
       | court in detail, so the consensus could change, and if you were
       | extremely risk-averse you might want to avoid Copilot.
       | 
       | A key quote from the second link:
       | 
       |  _Copyright has concluded that reading by robots doesn't count.
       | Infringement is for humans only; when computers do it, it's fair
       | use._
       | 
       | Personally, I think law should allow Copilot. As a human, I am
       | allowed to read copyrighted code and learn from it. An AI should
       | be allowed to do the same thing. And nobody cares if my ten-line
       | "how to invert a binary tree" snippet is the same as someone
       | else's. Nobody is really being hurt when a new tool makes it
       | easier to copy little bits of code from the internet.
        
         | lamontcg wrote:
         | I'm more concerned with all the poor code and security issues
         | that Copilot has been trained on. Garbage In, Garbage Out.
        
         | mmastrac wrote:
         | > Copyright has concluded that reading by robots doesn't count.
         | Infringement is for humans only; when computers do it, it's
         | fair use.
         | 
         | This would be interesting to test with AI and pop music.
        
           | encryptluks2 wrote:
           | This is a stupid argument that the Twitter author made.
           | Saving music digitally is reading by robot, so recording
           | music that wasn't digital into a digital format is fair use.
        
             | extra88 wrote:
             | > recording music that wasn't digital into a digital format
             | is fair use
             | 
             | If you're doing it from an analog format you bought for
             | your own use (format shifting), it is fair use.
        
           | inglor_cz wrote:
           | Perhaps the final judgment would say "AI cannot infringe on
           | copyright provided that only other AIs consume the result of
           | the first AI's work".
           | 
           | And suddenly there is a world of robots composing, writing
           | and painting for other robots. With us humans left out.
           | 
           | There should be a /s at the end, but the legal world
           | sometimes produces such convolutions. See, for example, the
           | interpretation of the Commerce Clause in Gonzales v. Raich.
        
           | dehrmann wrote:
           | As far as IP protections go, they're similar, but the
           | incentives are so different that you get songwriters going to
           | court over bits of melodies that might be worth millions.
           | Outside of quantitative trading, it's hard to find an example
           | of 10 lines of code that are worth millions and couldn't
           | easily be replaced with another implementation.
        
         | janoc wrote:
         | Sorry but it is not a robot publishing the "lifted" code but a
         | human. So the copyright will very much apply. That's an
         | argument like saying CTRL+C/CTRL+V is OK because it is a
         | "computer doing it".
         | 
         | Plus it is not "minor infringement" but code is being lifted
         | verbatim - e.g. as has been demonstrated by the Quake square
         | root code.
         | 
         | Feel free to test this theory in court ...
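
The "Quake square root code" referred to above is the famous fast inverse square root from Quake III Arena. A sketch in its style (the bit trick and magic constant are the original's; the memcpy form here replaces the original pointer cast, which is undefined behavior in modern C):

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Fast inverse square root in the style of the Quake III code.
   Approximates 1/sqrt(x) using a bit-level "magic constant" guess
   followed by one Newton-Raphson refinement step. */
float q_rsqrt(float number)
{
    float x2 = number * 0.5f;
    float y  = number;
    uint32_t i;

    memcpy(&i, &y, sizeof i);       /* reinterpret float bits as int */
    i = 0x5f3759df - (i >> 1);      /* the famous magic constant */
    memcpy(&y, &i, sizeof y);       /* back to float */
    y = y * (1.5f - (x2 * y * y));  /* one Newton-Raphson iteration */
    return y;
}
```

The result lands within roughly 0.2% of 1/sqrt(x); e.g. q_rsqrt(4.0f) is very close to 0.5.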
        
         | cnma wrote:
         | > Nobody is really being hurt when a new tool makes it easier
         | to copy little bits of code from the internet.
         | 
         | Of course people are hurt, namely the original creators who
         | spent years of work and whose work is potentially laundered,
         | depending on how good this IP grabbing AI will get.
         | 
         | If it gets really good, some smug and well connected loser
         | (e.g. the type who posts pictures of himself with a microphone
         | on GitHub) will click a button, steal other people's hard work
         | and start a "new" project that supersedes the old one.
        
         | dathinab wrote:
         | Fair use for training and "independent creation" are one
         | thing; an AI "remembering and mostly verbatim copying code
         | over" is another.
         | 
         | Many current machine learning applications try to teach the
         | AI to understand the concepts behind its training data and to
         | use that understanding to do whatever it is trained to do.
         | 
         | But most (all?) fail to properly reach that goal in any more
         | complicated case, at least the kinds of models used for
         | things like Copilot (GPT-3?).
         | 
         | Instead, what these models learn can be described as a
         | combination of some abstract understanding and verbatim
         | snippets of input data of varying size.
         | 
         | As such, while they sometimes generate "new" things based on
         | "understanding", they also sometimes just copy things they
         | have seen before!! (Like in the Quake code example, where it
         | even copied over some of the not-so "proper" comments
         | expressing the programmer's frustration.)
         | 
         | It's like a human who doesn't understand programming or
         | English or Latin letters, but who has a photographic memory
         | and tries to create something that seems to make sense by
         | recombining existing verbatim snippets, sometimes while
         | tweaking them.
         | 
         | I.e. if the snippets are small enough and tweaked enough,
         | it's covered by fair use and similar, BUT the person doing it
         | doesn't know about this, so if a large remembered snippet
         | matches verbatim it _will_ get put in, effectively copying
         | code of a size which likely doesn't fall under fair use.
         | 
         | Also, this is a well-known problem, at least it was when I
         | covered topics including ML ~5 years ago. I.e. good examples
         | included extracting whole sequences of paragraphs of a book
         | out of such a network, or (more strikingly) extracting things
         | like people's contact data based on their names, or credit
         | card information (in the case of systems trained on emails).
         | 
         | So the fact that Copilot is basically guaranteed to sometimes
         | copy non-trivially-small snippets of code (and comments) in a
         | way not really appropriate wrt. copyright should have been a
         | well-known fact for the ML specialists in charge of this
         | project.
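
The memorization point above can be made concrete with a toy sketch (hypothetical, nothing like Copilot's actual model): even a bare-bones character n-gram "model" regurgitates its training data verbatim whenever the contexts it has seen are unique:

```python
from collections import defaultdict

def train_ngram(text, n):
    """Map each n-character context to the characters that followed it."""
    model = defaultdict(list)
    for i in range(len(text) - n):
        model[text[i:i + n]].append(text[i + n])
    return model

def generate(model, seed, max_len, n):
    """Greedily extend the seed one character at a time."""
    out = seed
    for _ in range(max_len):
        followers = model.get(out[-n:])
        if not followers:
            break
        out += followers[0]
    return out

corpus = "float Q_rsqrt(float number) { /* evil floating point bit hack */ }"
model = train_ngram(corpus, 12)
completion = generate(model, corpus[:12], 200, 12)
assert completion == corpus  # every 12-char context is unique -> verbatim copy
```

Large neural models are far more sophisticated, but the failure mode is the same in kind: when training data is sparse around a specific context, "generation" collapses into recall.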
        
         | thunderbong wrote:
         | IMHO, Threadreader does a better job of rendering these kinds
         | of tweets.
        
         | j4yav wrote:
         | Wouldn't this make for a simple license laundering system?
        
         | SV_BubbleTime wrote:
         | > Nobody is really being hurt when a new tool makes it easier
         | to copy little bits of code from the internet.
         | 
         | Quite the opposite. We all get a tiny bit better with good
         | information like this. This is what the internet should be for,
         | evolving, learning from past mistakes, information
         | availability.
         | 
         | If the discussion was "I clicked this button and got someone's
         | entire chat platform" that would be different. Words and
         | sentences aren't copyrighted, books are, so when exactly does
         | a collection of words become a book?
         | 
         | There is nuance, and the linked page has none. But that's fine,
         | that guy is free to pull his content off GitHub. This seems
         | like a useful feature for other people who want to make things
         | first and foremost.
        
           | IncRnd wrote:
           | > Words and sentences aren't copy written, books are, so when
           | exactly are a collection of words a book?
           | 
           | If that were true, then 20 people could each steal a single
           | chapter from a book, and one of the people could combine
           | those 20 chapters into a new copyright-free book. That's
           | clearly false.
        
             | SV_BubbleTime wrote:
             | Did I say anything about paragraphs or chapters? Didn't I
             | specifically write that there is nuance?
             | 
             | And for your strawman, the assembly of uncopyrightable
             | components into a copyrighted work would still be a
             | violation.
             | 
             | So we agree that copyright sits somewhere between the
             | paragraphs and chapters and the book. So why are tiny code
             | excerpts "a problem"?
        
         | exo762 wrote:
         | > As a human, I am allowed to read copyrighted code and learn
         | from it. An AI should be allowed to do the same thing.
         | 
         | This is a very false equivalency. AI and humans are different.
         | First, AI is at best a slave, and likely a slave of capital.
         | Second, scale makes a difference.
        
         | bcrosby95 wrote:
         | > Copyright has concluded that reading by robots doesn't count.
         | Infringement is for humans only; when computers do it, it's
         | fair use.
         | 
         | Reading by a robot doesn't count. But injecting a robot between
         | copyright material and a product doesn't magically strip the
         | copyright from whatever it produces.
        
         | jcelerier wrote:
         | > As a human, I am allowed to read copyrighted code and learn
         | from it.
         | 
         | Of course not. Reading some copyrighted code can have you
         | entirely excluded from some jobs - you can't become a wine
         | contributor if it can be shown you ever read Windows source
         | code and most likely conversely. Likewise, you can't ever write
         | GPL VST 2 audio plug-ins if you ever had access to the official
         | Steinberg VST2 SDK. Etc etc...
         | 
            | Did people forget why black-box reverse engineering of
            | software ever came to be?
        
           | [deleted]
        
           | burntoutfire wrote:
           | > Reading some copyrighted code can have you entirely
           | excluded from some jobs - you can't become a wine contributor
           | if it can be shown you ever read Windows source code and most
           | likely conversely.
           | 
           | If that's the case, it should be easy to kill a project like
           | wine - just send every core contributor an email containing
           | some Windows code.
        
             | pvaldes wrote:
             | Nobody could grant if that thing is really windows code or
             | a fake. Not without the sender self-identifying as a well
             | known top MS employee having access to it. In that case the
             | sender would be doing something illegal and against MS
             | interests.
             | 
             | The result would be WINE having an advantage to redo the
             | snippet of code in a totally new and different way and MS
             | being forced to show part of its private code, that would
             | expose them also to patent trolls.
             | 
             | Would be a win-win situation for Wine and a lose-lose
             | situation for MS.
        
           | dahart wrote:
           | > Reading some copyrighted code can have you entirely
           | excluded from some jobs
           | 
           | What provision of copyright law are you referring to? Are you
           | conflating copyright law with arbitrary organizational
           | policies?
        
             | chrisseaton wrote:
             | Who said it was a law?
        
               | dahart wrote:
               | Which "it" are you referring to? @lacker was talking
               | about copyright in the comment @jcelerier replied to.
        
               | chrisseaton wrote:
                | Yeah... but they didn't say it was the law that got
                | you excluded from working on some projects for
                | reading copyrighted code. It's corporate policy that
                | does that - it's not a law, but they do it based on
                | who owns the copyright. Not everything that impacts
                | you is a law.
               | 
               | They said
               | 
               | > Reading some copyrighted code can have you entirely
               | excluded from some jobs
               | 
               | And they're right. It's because of corporate policies.
               | They never said it was because of a law - you imagined
               | that out of nothing.
        
               | dahart wrote:
                | > They never said it was because of a law - you
                | imagined that out of nothing.
               | 
               | @jcelerier flatly contradicted the statement that
               | copyright doesn't prevent you from reading something.
               | 
               | You're right that @jceleier didn't say their example was
               | law, that's because the example is a straw man in the
               | context of what @lacker wrote.
        
               | chrisseaton wrote:
                | Are you editing your comments after they've been
                | replied to? That's really poor form.
        
               | dahart wrote:
               | I did not edit my comments above after reading your
               | replies, why do you ask? What do you think I changed that
               | affected how the thread reads?
               | 
               | And, who says improving or clarifying a comment is poor
               | form? What is the edit button for, and why is it
               | available once replies have been posted?
        
               | chrisseaton wrote:
               | > What do you think I changed
               | 
               | I think you added
               | 
               | > Which "it" are you referring to?...
               | 
               | Because I have a tab open and can see the old one!
        
               | dahart wrote:
               | I added that before I saw your comment. So?
        
               | hluska wrote:
               | So @chrisseaton was correct, you did edit your posts and
               | their question was in good faith.
               | 
                | Edit - I'm adding another point as an edit to show
                | another way to communicate. Would any of your points
                | have been lost had you done something similar?
        
               | dahart wrote:
               | > So @chrisseaton was correct
               | 
               | No that's not true. I did not edit my posts after reading
               | their reply, and the false accusation was that I changed
               | my comment after it was replied to.
               | 
               | I didn't challenge whether the question was in good
               | faith, but I'll just note that the relevant discussion of
               | copyright got dropped in favor of an ad-hominem attack.
               | 
               | My question of which "it" was being referred to is a
               | legitimate question that I believe clarified the intent
               | of my comment, and I added it to make clear I was talking
               | about what @lacker said, not what @jcelerier wrote.
               | 
                | > Edit - I'm adding another point as an edit to show
                | another way to communicate. Would any of your points
                | have been lost had you done something similar?
               | 
               | This doesn't answer my question of why an edit should not
               | be made before I see any replies, nor of why any edit is
               | "poor form" and according to whom. I made my edit
               | immediately. I'm well aware of the practice of calling
               | out edits with a note, I've done it many times. I don't
               | feel the need to call out every typo or clarification
               | with an explicit note, especially when edited very soon
               | after the original comment.
        
           | TheRealPomax wrote:
           | I don't believe you on this in the slightest. This sounds
           | like you making up an argument, so cite sources if you want
           | people to believe your claims.
        
           | mhh__ wrote:
           | In my experience open source has now become so prevalent that
           | I think some young developers could be completely caught out
           | if the pendulum swings the other way.
           | 
           | Semi-related: the GNU/Linux copypasta is now more familiar
           | to some than the GNU project in general. That is a shame
           | to me, as I view the copypasta as mocking people who
           | worked very hard to achieve what GNU has achieved and
           | merely asked for some credit.
        
           | messe wrote:
           | It's dependent on jurisdiction. Black box reverse engineering
           | is only required in certain countries. If I remember
           | correctly, most of Europe doesn't require it.
        
           | k__ wrote:
           | Wasn't that the entire premise of "Halt and Catch Fire"?
        
           | cush wrote:
           | If you've ever read a book or interacted with any product,
           | you've learned from copyrighted material.
           | 
           | You've extrapolated "some organizations don't allow you to
           | contribute if you've learned from the code of their direct
           | competitor" to "You're not allowed to learn from copyrighted
           | code", which is absurd.
        
           | crazygringo wrote:
           | That's not what GP is saying.
           | 
           | In general, you're _absolutely_ allowed to learn
           | programming techniques from _anywhere_. You can contribute
           | software almost anywhere even if you've read Windows
           | source code. Re-using everything you've learned, in your
           | own creative work, is part of fair use.
           | 
           | Your example is the very specific scenario where you're
           | attempting to replicate an _entire_ program, API, etc., to
           | identical specifications. That's obviously not fair use.
           | You're not dealing with little bits and pieces, you're
           | dealing with an entire finished product.
        
             | doytch wrote:
             | This is true, but there's also a murkier middle option. I
             | used to work for a company that made a lot of money from
             | its software patents but I was in a division that worked
             | heavily in open-source code. We were forbidden to
             | contribute to the high-value patented code because it was
             | impossible to know whether we were "tainted" by knowledge
             | of GPL code.
        
               | dathinab wrote:
               | No you are not, guaranteed (I think, not a lawyer).
               | 
               | At least from a copyright point of few.
               | 
               | TL;DR: Having right, and having a easy defense in a law
               | suite are not the same.
               | 
               | BUT separating it makes defending any law-suite against
               | them because of copyright and patent law much easier. It
               | also prevents any employee from "copying GPL(or similar)
               | code verbatim from memory"(1) (or even worse the
               | clipboard) sure the employee "should" not do it but by
               | separating them you can be more sure they don't, and in
               | turn makes it easier to defent in curt especially wrt.
               | "independent creation".
               | 
               | There is also patent law shenanigans.
               | 
               | (1): Which is what GitHub Copilot is sometimes doing
               | IMHO.
        
               | [deleted]
        
               | 0xdky wrote:
               | Same here. I worked at a NAS storage (NFS) vendor and
               | this was a common practice. Could not look at server
               | implementation in Linux kernel and open source NFS client
               | team could not look at proprietary server code.
        
             | jcelerier wrote:
             | > Your example is the very specific scenario where you're
             | attempting to replicate an entire program, API, etc., to
             | identical specifications. That's obviously not fair use.
             | You're not dealing with little bits and pieces, you're
             | dealing with an entire finished product.
             | 
              | No - Google's nine lines of sorting code (the
              | rangeCheck function, IIRC) copied from Oracle's
              | implementation were not considered fair use in the
              | Google / Oracle debacle.
              | 
              | Likewise, SCO claimed that 80 copied lines (in the
              | entirety of the Linux source code) were a copyright
              | violation, though we never got a legal answer to that.
        
               | crazygringo wrote:
               | Sorry, but you're not recalling correctly. :)
               | 
               | The Supreme Court decided Google v. Oracle _was_ fair
               | use. It was 3 months ago:
               | 
               | https://en.wikipedia.org/wiki/Google_LLC_v._Oracle_Americ
               | a,_...
               | 
               | That's the highest form of precedent, the question has
               | now been effectively settled (unless Congress ever
               | changes the law).
               | 
               | Edit: added a dummy hash to end of URL so HN parses it
               | correctly (thanks @thewakalix below)
        
               | thewakalix wrote:
               | There seems to be an issue with Hacker News's URL
               | parsing. The final period isn't included as part of the
               | link.
        
               | croes wrote:
               | The fair use was about Googled API reimplementation. It
               | becomes a whole different case with a 1:1 copy of code.
               | And don't forget fair use works in the US, not
               | necessarily in the rest of the world.
               | 
               | But I'm happy about all the new GPL programs created by
               | Copilot
        
               | wtallis wrote:
               | That Supreme Court ruling doesn't appear to address the
               | claims of actual copied code (the rangeCheck function),
               | only the more nebulous API copyright claims.
        
               | [deleted]
        
               | jcelerier wrote:
               | nope, those lines were specifically excluded from the
               | prior judgment - and SC did not cast another judgment on
               | them:
               | 
               | > With respect to Oracle's claim for relief for copyright
               | infringement, judgment is entered in favor of Google and
               | against Oracle except as follows: the rangeCheck code in
               | TimSort.java and ComparableTimSort.java, and the eight
               | decompiled files (seven "Impl.java" files and one"ACL"
               | file), as to which judgment for Oracle and against Google
               | is entered in the amount of zero dollars (as per the
               | parties' stipulation).
        
               | [deleted]
        
             | saurik wrote:
             | This model doesn't learn and abstract: it just pattern
             | matches and replicates; that's why it was shown exactly
             | replicating regions of code--long enough to not be "de
             | minimis" and recognizable enough to include the comments--
             | that happen to be popular... which would be fine, as long
             | as the license on said code were also being replicated. It
             | just isn't reasonable to try to pretend Copilot--or GPT-3
             | in general--is some kind of general purpose AI worthy of
             | being compared with the fair use rights of a human learning
             | techniques: this is a machine learning model that likes to
              | copy/paste not just tiny bits of code but _entire
              | functions_ out of other peoples' projects, and most of
              | what makes it fancy is that it is good at adapting what
              | it copies to the surrounding conditions.
        
               | the8472 wrote:
               | This is called prompt engineering. If you find a popular,
               | frequently repeated code snippet and then fashion a
               | prompt that is tailored to that snippet then yes the NN
               | will recite it verbatim like a poem.
               | 
               | But that doesn't mean it's the only thing it does or even
               | that it does it frequently. It's like calling a human a
               | parrot because he completed a line from a famous poem
               | when the previous speaker left it unfinished.
               | 
                | The same argument was brought up with GPT too and has
                | long been debunked. The authors (and others) checked
                | samples against the training corpus, and it only
                | rarely copies unless you prod it to.
        
               | saurik wrote:
                | I don't know if I agree with your argument about
                | GPT-3, but I think our disagreement is beside the
                | point: if your human parrot did that, they would--not
                | just in theory but in actual fact! see all the cases
                | of this in the music industry--get sued for it, even
                | if they claim they didn't mean to and it was merely a
                | really entrenched memory.
        
               | the8472 wrote:
               | The point is that many of the examples you see are
               | intentional, through prompt engineering. The pilot asked
               | the copilot to violate copyright, the copilot complied.
               | Don't blame the copilot.
               | 
               | There _also_ are cases where this happens
               | unintentionally, but those are not the norm.
        
               | moyix wrote:
               | Have you used Copilot? I have not, but I have trained a
               | GPT2 model on open source projects
               | (https://doesnotexist.codes/). It does _not_ just pattern
               | match and replicate. It can be cajoled into reproducing
               | some memorized snippets, but this is not the norm; in my
               | experience the vast majority of what it generates is
               | novel. The exceptions are extremely popular snippets that
               | are repeated many many times in the training data, like
               | license boilerplate.
               | 
               | Perhaps Copilot behaves very differently from my own
               | model, but I strongly suspect that the examples that have
               | been going around twitter are outliers. Github's study
               | agrees:
               | https://docs.github.com/en/github/copilot/research-
               | recitatio... (though of course this should be replicated
               | independently).
        
               | saurik wrote:
               | So, to verify, your claim is that GPT-3, when trained on
               | a corpus of human text, isn't merely managing to string
               | together a bunch of high-probability sequences of symbol
               | constructs--which is how every article I have ever read
               | on how it functions describes the technology--but is
               | instead managing to build a model of the human world and
                | the mechanism of narration required to describe it,
                | which it uses to write new prose... a claim you must make
               | in order to then argue that GPT-3 works like a human
               | engineer learning a model of computers, libraries, and
                | engineering principles from which it can then write code,
               | instead of merely using pattern recognition as I stated?
               | As someone who spent years studying graduate linguistics
               | and cognitive science (though admittedly 15-20 years ago,
               | so I certainly haven't studied this model: I have only
               | read about it occasionally in passing) I frankly think
               | you are just trying to conflate levels of understanding,
               | in order to make GPT-3 sound more magical than it is :/.
        
               | moyix wrote:
               | What? I don't think I made any claim of the sort. I'm
               | claiming that it does more than mere regurgitation and
               | has done _some_ amount of abstraction, not that it has
               | human-level understanding. As an example, GPT-3 learned
               | some arithmetic and can solve basic math problems not in
               | its training set. This is beyond pattern matching and
               | replication, IMO.
               | 
               | I'm not really sure why we should consider Copilot
               | legally different from a fancy pen - if you use it to
               | write infringing code then that's infringement by the
               | user, not the pen. This leaves the _practical_ question
               | of how often it will do so, and my impression is that it
               | 's not often.
        
               | saurik wrote:
               | The argument I was responding to--made by the user
               | crazygringo--was that GPT-3 trained on a model of the
               | Windows source code is fine to use nigh unto
               | indiscriminately, as supposedly Copilot is abstracting
               | knowledge like a human engineer. I argued that it doesn't
                | do that: that GPT-3 is a pattern recognizer that not only
               | theoretically just likes to memorize and regurgitate
               | things, it has been shown to in practice. You then
               | responded to my argument claiming that GPT-3 in fact...
               | what? Are you actually defending crazygringo's argument
               | or not? Note carefully that crazygringo explicitly even
               | stated that copying little bits and pieces of a project
               | is supposedly fair use, continuing the--as far as I
               | understand, incorrect--assertion by lacker (the person
               | who started this thread) that if you copied someone's
               | binary tree implementation that would be fair use, as the
               | two of them seem to believe that you have to copy
               | essentially an entire combined work (whatever that means
               | to them) for something to be infringing. Honestly, it now
                | just seems like you decided to skip into the middle of a
                | complex argument in an attempt to make some pedantic
               | point: either you agree that GPT-3 is a human that is
               | allowed to, as crazygringo insists, read and learn from
                | anything and then use that knowledge in any way they see
               | fit, or you agree with me that GPT-3 is a fancy pattern
               | recognizer and it can and will just generate copyright
               | infringements if used to solve certain problems. Given
               | your new statements about Copilot being a "fancy pen"
               | that can in fact be used incorrectly--something
               | crazygringo seems to claim isn't possible--you frankly
               | sound like you agree with my arguments!!
        
               | bnjemian wrote:
               | I think a crucial distinction to be made here, and with
               | most 'AI' technologies (and I suspect this isn't news to
               | many people here) is that - yes - they are building
               | abstractions. They are not simply regurgitating. But - no
               | - those abstractions are _not_ identical (and very often
               | not remotely similar) to human abstractions.
               | 
               | That's the very reason why AI technologies can be useful
               | in augmenting human intelligence; they see problems in a
               | different light, can find alternate solutions, and
               | generally just don't think like we do. There are many
               | paths to a correct result and they needn't be isomorphic.
               | Think of how a mathematical theorem may be proved in
               | multiple ways, but the core logical implication of the
               | proof within the larger context is still the same.
        
               | codelord wrote:
                | It's not really comparable to a pen, because a pen by
                | itself doesn't copy someone else's code or written
                | words. It's more like copying code from GitHub, or
                | writing a script that does so automatically. You have
                | to be actively cautious that the material you are
                | copying does not violate any copyrights. The problem
                | is that Copilot has enough sophistication to, for
                | example, change variable names and make content
                | matching very hard. What I can guarantee is that it
                | won't be able to generate novel code from scratch
                | that performs a particular function (source: I have a
                | PhD in ML). This brute-force way of modeling computer
                | programs (using a language model) is just not
                | sophisticated enough to reason and generate
                | high-level concepts, at least today.
        
               | DougBTX wrote:
               | One way to look at these models is to say that they take
               | raw input, convert it into a feature space, manipulate
               | it, then output back as raw text. A nice example of this
               | is neural style transfer, where the learnt features can
               | distinguish content from style, so that the content can
               | be remixed with a different style in feature space. I
               | could certainly imagine evaluating the quality of those
               | features on a scale spanning from rote-copying all the
               | way up to human understanding, depending on the quality
               | of the model.
        
               | jozvolskyef wrote:
               | Imagine for a second a model of the human brain that
               | consists of three parts. 1) a vector of trillion inputs,
               | 2) a black box, and 3) a vector of trillion outputs. At
               | this level of abstraction, the human brain "pattern
               | matches and replicates" just the same, except it is
               | better at it.
        
               | saurik wrote:
                | Human brains are at least minimally recurrent, and
                | are trained on data sets that are much wider and more
                | complex than what we are handing GPT-3. I have done
                | all of these standard thought experiments, and even
                | developed and trained my own neural networks back
                | before there were libraries that allowed people to
                | "dabble" in machine learning: if you consider the
                | implications of humans being able to execute Turing-
                | complete thoughts, it should become obvious that the
                | human brain isn't merely doing pattern-anything... it
                | _sometimes_ does, but you can't just conflate them
                | and call it a day.
        
               | jozvolskyef wrote:
               | The human brain isn't Turing-complete as that would
               | require infinite memory. I'm not saying that GPT-3 is
               | even close, but it is in the same category. I tried
               | playing chess against it. According to chess.com, move 10
               | was its first mistake, move 16 was its first blunder, and
               | past move 20 it tried to make illegal moves. Try playing
               | chess without a chessboard and not making an illegal
               | move. It is difficult. Clearly it does understand chess
               | enough not to make illegal moves as long as its working
               | memory allows it to remember the game state.
        
               | hollerith wrote:
               | >The human brain isn't Turing-complete as that would
               | require infinite memory
               | 
               | A human brain with an unlimited supply of pencils and
               | paper, then.
        
               | robbedpeter wrote:
                | Transformers do learn and abstract. Not as well as
                | humans, but for whatever definition of innovation or
                | creativity you wanna run with, these GPT models have
                | it. It's not magic, it's math, but these programs are
                | approximating the human function of media synthesis
                | across narrowly limited domains.
               | 
                | These aren't your crazy uncle's Markov chain
                | chatbots. They're sophisticated Bayesian models
                | trained to approximate the functions that produced
                | the content used in training.
        
               | visarga wrote:
               | > this is a machine learning model that likes to
               | copy/paste not just tiny bits of code but entire
               | functions out of other peoples' projects
               | 
                | GitHub could make a blacklist and tell Copilot never
                | to suggest that code. Problem solved. You use one of
                | the other 9 suggestions.
        
           | eatbitseveryday wrote:
           | > > As a human, I am allowed to read copyrighted code and
           | learn from it.
           | 
            | > Of course not. Reading some copyrighted code can have
            | you entirely excluded from some jobs - you can't become a
            | wine contributor if it can be shown you ever read Windows
            | source code and most likely conversely.
           | 
           | You can of course read the code. The consequences are thus
           | increased limitations, like you say.
           | 
           | What you mention is not an absolute restriction from reading
           | copyrighted material. You perhaps have to cease other
           | activities as a result.
        
           | PragmaticPulp wrote:
           | > Of course not. Reading some copyrighted code can have you
           | entirely excluded from some jobs
           | 
           | That's not a law. That's a cautionary decision made by those
           | companies or projects to make it more difficult for
           | competitors to argue that code was copied.
           | 
           | Those projects could hire people familiar with competitor
           | code and assign them to competing projects if they wanted.
           | The contributors could, in theory, write new code without
           | using proprietary knowledge from their other companies. In
           | practice, that's actually really difficult to do and even
           | more difficult to prove in court, so companies choose the
           | safe option and avoid hiring anyone with that knowledge
           | altogether.
           | 
           | Now the question is whether or not GitHub's AI can be
           | argued to contain proprietary knowledge. If your goal is
           | to avoid any possibility that a court could find that
           | Copilot funneled proprietary code (accessible to Copilot)
           | into your project, then you'd want to forbid contributors
           | from using Copilot.
        
             | ithkuil wrote:
              | In this case, though, we have a machine learning model
              | that is trained on some code and is not merely learning
              | abstract concepts to be applied generally in different
              | domains; it can use that knowledge to produce code that
              | looks pretty much the same as the training material,
              | given a context that fits the training material.
             | 
             | If humans did that, it would be hard to argue they didn't
             | outright copy the source.
             | 
             | When a machine does it, does it matter if the machine
             | literally copied it from sources, or first transformed it
             | into an isomorphic model in its "head" before regurgitating
             | it back?
             | 
             | If yes, why doesn't parsing the source into an AST and then
             | rendering it back also insulate you from abiding a
             | copyright?
        
               | dTal wrote:
               | >When a machine does it, does it matter if the machine
               | literally copied it from sources, or first transformed it
               | into an isomorphic model in its "head" before
               | regurgitating it back?
               | 
               | You've hit the nail on the head here. If this is okay,
               | then neural nets are simply machines for laundering IP.
               | We don't worry about people memorizing proprietary source
               | code and "accidentally" using it because it's virtually
               | impossible for a human to do that without realizing it.
               | But it's trivial for a neural net to do it, so
               | comparisons to humans applying their knowledge are
               | flawed.
        
               | visarga wrote:
               | This is not such a big problem in reality because the
               | output of Copilot can be filtered to exclude snippets too
               | similar to the training data, or any corpus of code you
               | want to avoid. It's much easier to guarantee clean code
               | than train the model in the first place.
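The kind of filter described here can be sketched with n-gram shingles. This is a toy illustration under my own assumptions (the function names are invented; a production system would index token shingles of the entire training corpus, not a Python set):

```python
def ngrams(tokens, n):
    """All contiguous n-token shingles of a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def build_index(corpus_snippets, n=8):
    """Index every n-gram that appears anywhere in the training corpus."""
    index = set()
    for snippet in corpus_snippets:
        index |= ngrams(snippet.split(), n)
    return index

def looks_verbatim(suggestion, index, n=8, threshold=0.5):
    """Flag a suggestion if too many of its n-grams occur in the corpus."""
    grams = ngrams(suggestion.split(), n)
    if not grams:
        return False
    return len(grams & index) / len(grams) >= threshold

# Example: index one training snippet, then screen two suggestions.
index = build_index(
    ["float q_rsqrt ( float number ) { long i ; float x2 , y ;"], n=4)
print(looks_verbatim(
    "float q_rsqrt ( float number ) { long i ;", index, n=4))   # True
print(looks_verbatim(
    "completely unrelated code goes here and nothing matches",
    index, n=4))                                                # False
```

Whether GitHub's announced duplication search works this way is not public; this only shows that screening output against a known corpus is far cheaper than training the model.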
        
         | temac wrote:
         | I will completely follow that opinion the day MS includes the
         | whole Windows codebase into the training of copilot.
         | 
         | Until then, it's basically "GPL" (and other licences)
         | laundering with one-sided excuses.
        
         | [deleted]
        
         | abeppu wrote:
         | Well, maybe the interpretation will change if the right people
         | are pissed off.
         | 
         | At this point, how hard would it be to produce a structurally
         | similar "content-aware continuation/fill" for audio producers,
         | film makers, etc, which suggests audio snippets or film
         | snippets, trained from copyrighted source material?
         | 
         | If prompted by a black screen with some white dots, the video
         | tool could suggest a sequence of frames beginning with text
         | streaming into the distance "A long time ago in a galaxy far
         | far away ..." and continue from there.
         | 
         | Normally we don't try to train models to regurgitate their
         | inputs, but if we actually tried, I'm sure one could be made to
         | reproduce the White Album or Thriller or whatever else.
        
           | visarga wrote:
           | NeRFs (neural radiance fields) are neural nets that exactly
           | encode one input, kind of like a JPEG. They can reconstruct
           | from novel viewpoints.
        
         | throwaway_egbs wrote:
         | Just when I thought tweetstorms couldn't get any worse, here's
         | one where every tweet is a quote-tweet of the author. I don't
         | even understand how I'm supposed to read this.
         | 
         | > Copyright has concluded that reading by robots doesn't count.
         | Infringement is for humans only; when computers do it, it's
         | fair use.
         | 
         | Surely there's a limit to this. If I use a machine to produce
         | something that just happens to exactly match a copyrighted
         | work, now it's not infringement because of the method I used to
         | produce it? That seems nonsensical, but maybe there's precedent
         | for this too? (I have no idea what I'm talking about.)
        
           | neolog wrote:
           | Ctrl-c is a robot, so copyright doesn't apply to it
        
           | rcxdude wrote:
           | That quote is basically entirely nonsensical. 'copyright'
           | hasn't decided anything (nor has any legislative body nor the
           | courts). All that's happened is that OpenAI has put forward
           | an argument that using large quantities of media scraped from
           | the internet as training data is fair use. This argument for
           | the most part does not rely on the human vs machine
           | distinction (in fact it leans on the idea that the process is
           | not so different from a human learning). The main place this
           | comes up is the final test of damage to the original in terms
           | of lost market share where it's argued that because it's a
           | machine consuming the content there's no loss of audience to
           | the creator (which is probably better phrased as the people
           | training the neural net weren't going to pay for it anyway).
           | A lot does ride on the idea that the neural net, if 'well
           | designed', does not generally regurgitate its training data
           | verbatim, which is in fairly hot dispute at the moment.
           | OpenAI somewhat punts on this situation and basically says
           | the output may infringe copyright in this case, but the
           | copyright holder should sue whoever's generating and using
           | the output from the net, not the person who trained and
           | distributed the net.
        
             | discreteevent wrote:
             | Surely it could be argued that there is a loss of audience
             | to the author. At the moment some people will read the
             | author's code directly in order to find out how to solve a
             | problem. In the future at least some of those people will
             | just ask copilot to solve the problem for them.
        
             | noobermin wrote:
             | This argument is very convenient for OpenAI.
        
         | niekverw wrote:
         | > Copyright has concluded that reading by robots doesn't count.
         | Infringement is for humans only; when computers do it, it's
         | fair use.
         | 
         | This is silly. Copilot is not reading by itself; someone
         | pushed buttons telling it to read and write. If I clone the
         | entirety of GitHub without the licenses, I am telling a robot
         | to do it; that doesn't make it right.
        
         | dehrmann wrote:
         | I think the law will allow what copilot eventually becomes. As
         | others have said, right now, it's too apt to reproduce code
         | verbatim.
        
         | intricatedetail wrote:
         | There is a difference between a human learning and a multi-
         | billion-dollar company training its models without paying a
         | penny.
         | 
         | If they claim it is just like a user, maybe they should start
         | paying taxes the way individuals without access to creative
         | accountants do.
         | 
         | Leeches without morals - Micro$oft
        
         | IncRnd wrote:
         | > Nobody is really being hurt when a new tool makes it easier
         | to copy little bits of code from the internet.
         | 
         | That's the first time I've heard copilot get described as
         | copying little bits of code from the Internet. Copilot
         | aggregates all github source code, removes licences from the
         | code, and regurgitates the code without licenses.
         | 
         | Furthermore, both github and the programmers using copilot know
         | this. Look at any one of these threads written by programmers
         | about copilot. Using copilot is knowingly stealing the source
         | code of others without attribution. Using copilot is literally
         | humans stealing source code from others. Copilot was written
         | _for the purpose_ of taking others' code.
        
           | IfOnlyYouKnew wrote:
           | It's not "literally" stealing, because it doesn't deprive
           | anyone of the use of the source code. Those two points were
           | somehow extremely obvious to everyone here as long as it was
           | music and movies we were talking about.
           | 
           | And Github themselves have stated that only 0.1% of the
           | Copilot output contains chunks taken verbatim from the
           | learning set. Of those, the vast majority are likely to be
           | boilerplate so generic it's silly to claim ownership, and
           | maybe sometimes impossible to avoid.
        
             | IncRnd wrote:
             | > It's not "literally" stealing, because it doesn't deprive
             | anyone of the use the source code.
             | 
             | That's simply not true. You might be confusing idealism
             | about software freedom with how both law and society define
             | theft.
             | 
             | Edit: In this comment I refer to the US.
        
               | mdpye wrote:
               | It is actually true, in the UK at least the legal
               | definition of theft includes the deprivation of the owner
               | of the property in question.
               | 
               | The copyright lobby hedge the term as "copyright theft"
               | (i.e. not _actual_ theft) in order to shift the societal
               | understanding. Which appears to have worked.
               | 
               | This is not a value judgement on copyright infringement.
               | Just that technically it doesn't meet the legal
               | definition of theft.
               | 
               | cf. The rather amusing satire of the "you wouldn't steal
               | a handbag" campaign in the UK, which ran "you wouldn't
               | download a bear!"
        
               | IncRnd wrote:
               | Yes! Thank you. I should have clarified that I meant
               | within the US.
        
               | mdpye wrote:
               | Oh, then today I learned! I didn't realise they were
               | different. Just looked it up in a "plain English
               | dictionary of law" and the distinction seems subtle but
               | important. Rather than "with the intention of depriving
               | the owner", the US one says "with the intention of
               | converting it to their use", which seems broad enough to
               | cover exploiting a copy, rather than the original (or
               | only, in the physical realm...)
        
         | a3w wrote:
         | In Germany, there is no fair use exception to copyright.
         | Also, most basic software constructs carry no IP protection:
         | e.g. a specific loop, even one that a (weak) AI could suggest,
         | would probably be too simple to be protected.
         | 
         | What could be valid is a right to not mimic collections, but
         | that would mean you cannot clone the Copilot, as input is
         | mapped to a non-trivial collection of outputs.
         | 
         | Disclaimer: IANAL, but I do dabble in IT-law.
        
         | shakow wrote:
         | > Copyright has concluded that reading by robots doesn't count.
         | 
         | Until someone trains a DNN to generate Mickey Mouse-like
         | cartoons I assume.
        
           | dmitriid wrote:
           | There was a joke that all ML will be immediately banned the
           | moment there's a Copilot for RIAA-licensed songs.
        
         | yumraj wrote:
         | It all comes down to this: this has not been tested in the
         | court. The above opinion, or for that matter any opinion from
         | any lawyer or not-a-lawyer, is just that, an _opinion_.
         | 
         | As a business it is your responsibility to determine if this
         | code-copying is worth a risk to your business.
         | 
         | Based on my experience, I'm pretty sure all corporate lawyers
         | will disallow such code copying, till it has been tested in the
         | court. It's just a matter of who will be the guinea pig.
        
         | pubby wrote:
         | The issue isn't an AI reading copyrighted code, the issue is an
         | AI regurgitating the lines of copyrighted code verbatim. To be
         | clear, humans aren't allowed to do this either.
         | 
         | And sure, nobody cares about your stupid binary tree, but do
         | they care about GNU and the Linux kernel? Imagine someone
         | trained an AI to specifically output Linux code, and used it to
         | reproduce a working OS. Is that fair?
        
           | PaulDavisThe1st wrote:
           | > the issue is an AI regurgitating the lines of copyrighted
           | code verbatim. To be clear, humans aren't allowed to do this
           | either.
           | 
           | That's a little broad. There's a wide range of licenses for
           | software that explicitly allow precisely this.
        
             | temac wrote:
             | Tons of licences require at least attribution.
        
         | lucideer wrote:
         | There are a lot of sibling commenters disagreeing with this
         | take, but I think they miss that ultimately this comes down to
         | how legal experts interpret the tech, rather than how tech
         | experts think the law should apply.
         | 
         | This is, imo, unfortunate, as often the legal interpretation is
         | based on a gross misunderstanding of how the tech works, but
         | this is the way.
         | 
         | I don't think copilot should be legal according to my own
         | interpretation but in this (rare) case I feel the "IANAL" tag
         | applies not because I lack (legal) knowledge, but rather
         | because I have (tech) knowledge that is likely absent from
         | actual decision making on legal outcomes (therefore leading to
         | different legal outcomes than how I would see things working).
        
         | 41209 wrote:
         | Copilot is lifting entire functions from GPL code. Legal
         | technicalities aside, I know I'd be upset if I GPL'ed some
         | code and someone stole large parts of it.
        
         | josefx wrote:
         | > Copyright has concluded that reading by robots doesn't count.
         | Infringement is for humans only; when computers do it, it's
         | fair use.
         | 
         | So wait, if I write my own AI, let's call it cp, and train it on
         | gnu-gcc.tar.gz with the goal of creating a commercial-
         | compiler.tar.gz then I can license the result any way I want?
         | After all most of the work was done by the computer.
        
           | axismundi wrote:
           | Sorry, you can't. You are not rich enough to get away with
           | it.
        
         | pdonis wrote:
         | _> nobody cares if my ten-line  "how to invert a binary tree"
         | snippet is the same as someone else's._
         | 
         | Maybe nobody cares about that, but the problem is that
         | Github's automated tool is not telling you whether the code it
         | shows you is actually an exact copy of existing code, how much
         | of that existing code is being copied, whether the existing
         | code is licensed, or, if it is licensed, whether your copying
         | is in accordance with the license. And without that
         | information you can't possibly know whether what you are doing
         | is legal or ethical. Sure, you could try to guess, but that
         | sort of thing is not supposed to rely on guessing.
        
         | niekverw wrote:
         | "I am not a lawyer,"
         | 
         | STOP READING
        
         | nabilhat wrote:
         | Autonomous programming will be explored. Copilot is
         | potentially a proof of concept, an early step in that
         | direction. If it is, the corrections made by Copilot users
         | will feed into the development of future unattended
         | programming. Either way, it's close enough that any legal
         | outcomes experienced by Copilot users will help define the
         | liability boundaries relevant to the future of autonomous
         | programming. Copilot users are numerous enough that the risk
         | of ending up under the foot of a copyright owner with the
         | means and will to crush a user is low, but no one should take
         | such a risk to use a novelty like Copilot in production code.
        
         | robbrown451 wrote:
         | "reading by robots doesn't count."
         | 
         | It should be obvious that if the robot is simply scraping web
         | sites and reproducing their text verbatim (without permission
         | and without giving credit) that would be an infringement.
         | 
         | There are a lot of shades of gray between that and the other
         | extreme, which is where it is scraping millions of sites,
         | learning from them, and producing something that isn't all that
         | similar to any of them. Both ends of the spectrum, and
         | everywhere in between, are things that humans can do, but as
         | machines get more capable this is getting trickier and trickier
         | to sort out.
         | 
         | In this case, it sounds like it might be closer to the first
         | example, since significant parts of the code will be verbatim.
         | 
         | Ultimately, I am hoping that such things cause us to completely
         | rethink copyright law. The blurriness of it all is becoming too
         | much to make laws around. We just need better mechanisms to
         | reward people for creating valuable IP that they allow people
         | to freely use as they please.
        
           | IfOnlyYouKnew wrote:
           | Copyright requires a certain amount of creativity involved in
           | its creation. I strongly suspect most code snippets of a few
           | lines just don't qualify.
        
         | blibble wrote:
         | there's a nice example here of it reproducing Carmack's famous
         | inverse square root function from Quake 3 (sans GPL, of course)
         | 
         | https://twitter.com/mitsuhiko/status/1410886329924194309
         | 
         | this is clearly copyright infringement, and if it isn't: it
         | should be
        
           | [deleted]
        
           | dang wrote:
           | _Copilot regurgitating Quake code, including sweary comments_
           | - https://news.ycombinator.com/item?id=27710287 - July 2021
           | (625 comments)
        
         | boxfire wrote:
         | So what happens when someone makes a transformer network that
         | can read fanfics and animate them live trained from the whole
         | collection of MPAA movies? I mean its inevitable. Given the
         | history of the MPAA, I don't think they're gonna lie down and
         | just take it. I feel like we're on a slippery slope, provoking
         | the "IP lords" into brutally draconian measures that will make
         | the Disney copyright extensions look like a tax deferral.
        
           | runawaybottle wrote:
           | We are toeing the crater line. Quite frankly, there's clear
           | evidence that humans have little regard for plagiarism versus
           | inspiration.
           | 
           | Will co-pilot offer royalties for auto suggestions that are
           | committed to code bases? I'm sure our ML can track how
           | similar the commits were.
           | 
           | It's always fascinating to me how we have the tech to take,
           | but never to give. Pay the motherfucker you stole this shit
           | from.
           | 
           | The proverbial: https://youtu.be/6TLo4Z_LWu4
        
         | croes wrote:
         | Looks like more than a minor infringement
         | 
         | https://news.ycombinator.com/item?id=27710287
         | 
         | And reading is no infringement but writing maybe is.
        
         | jhgb wrote:
         | > Infringement is for humans only; when computers do it, it's
         | fair use.
         | 
         | But ultimately the human is OK-ing the code and committing it,
         | basically as his own work most of the time. I'm reasonably sure
         | that this may matter to courts.
        
         | erhk wrote:
         | An AI isn't learning from it. It's effectively copying prior
         | work when it solves a problem. There is no novel out-of-bounds
         | data generation by modern AI approaches.
        
       | devinplatt wrote:
       | > This product injects source code derived from copyrighted
       | sources into the software of their customers without informing
       | them of the license of the original source code. This
       | significantly eases unauthorized and unlicensed use of a
       | copyright holder's work.
       | 
       | It appears that GitHub wishes to address this issue via UI
       | changes to Copilot. A quote from a recent post on GitHub[0]:
       | 
       | > When a suggestion contains snippets copied from the training
       | set, the UI should simply tell you where it's quoted from. You
       | can then either include proper attribution or decide against
       | using that code altogether.
       | 
       | > This duplication search is not yet integrated into the
       | technical preview, but we plan to do so. And we will both
       | continue to work on decreasing rates of recitation, and on making
       | its detection more precise.
       | 
       | That post is also on the Hacker News front page right now[1],
       | but has only 10% of the upvotes of this post, so it's less
       | visible.
       | 
       | I'm hoping all the criticism will encourage GitHub to make a
       | better product.
       | 
       | [0]: https://docs.github.com/en/github/copilot/research-
       | recitatio...
       | 
       | [1]: https://news.ycombinator.com/item?id=27723710
        
       | emersonrsantos wrote:
       | Copilot assumes the code in the repo is right, so just start
       | putting some wrong code there as an anti-Copilot measure.
        
         | Engineering-MD wrote:
         | Hide the hay in a pile of rotten hay as it were.
        
       | darnfish wrote:
       | People really sign up without reading the Terms and Conditions
       | and then complain when GitHub decides to do something with the
       | data that they've given it permission to use under the ToS
        
         | Engineering-MD wrote:
         | A tiny percentage (less than 1%) [0] of people read terms and
         | conditions - they are long, repetitive and often in legal
         | language. If you expected to read every terms and conditions
         | and privacy policy (and every change thereof), you would waste
         | over 240 hours per year. [1]
         | 
         | [0] Bakos, Y., Marotta-Wurgler, F. and Trossen, D. R. (2014)
         | 'Does Anyone Read the Fine Print? Consumer Attention to
         | Standard-Form Contracts', The Journal of Legal Studies, 43(1),
         | pp. 1-35. doi: 10.1086/674424.
         | 
         | [1] McDonald, A. M. and Cranor, L. F. (2008) 'The Cost of
         | Reading Privacy Policies', A Journal of Law and Policy for the
         | Information Society, 4(3), pp. 543-568.
        
       | calvinmorrison wrote:
       | I abandoned github when they took code that was not licensed
       | (i.e. copyright retained), reproduced it, and saved it in their
       | Arctic Vault without the author's consent (mine)
        
         | Retr0id wrote:
         | How is the Arctic Vault different from any other offsite
         | backup?
         | 
         | I suppose one issue is that you (presumably) can't request
         | deletion from it (which may even be a GDPR violation).
         | 
         | Edit: I looked up the relevant GDPR stuff, apparently there's
         | an exemption for when "erasing your data would prejudice
         | scientific or historical research, or archiving that is in the
         | public interest.", which arguably covers the Arctic Vault.
        
           | dheera wrote:
           | GDPR only applies to EU users.
        
             | Retr0id wrote:
             | Arctic Vault includes code written by EU users, and there
             | is similar legislation in non-EU jurisdictions, e.g.
             | California's CCPA
        
           | corty wrote:
           | There is an exception paragraph for various kinds of archives
           | in GDPR: https://www.privacy-
           | regulation.eu/en/article-89-safeguards-a...
        
         | wcerfgba wrote:
         | What's wrong with the Arctic Code Vault [1]? Is the only
         | problem that they didn't seek your consent? How is it different
         | to deploying a new availability zone and having your public
         | repos accessible on another server? Your code is preserved
         | verbatim, and it's not possible for GitHub to provide their
         | service without the right to make verbatim copies of your code,
         | which presumably you agreed to as part of their ToS.
         | 
         | [1] https://archiveprogram.github.com/arctic-vault/
        
           | calvinmorrison wrote:
           | I guess copying my code to microfiche is basically reprinting
           | it without my permission.
        
             | Dylan16807 wrote:
             | But LTO is fine? I was going to ask if it was because it's
             | not _intended_ as a backup, but that's not even true; this
             | _is_ intended as a backup on a long time scale.
        
           | dmitriid wrote:
           | > What's wrong with the Arctic Code Vault
           | 
           | It's nothing more than a publicity stunt whose one and only
           | purpose is to advertise GitHub.
        
         | lifthrasiir wrote:
         | Github does not own the Arctic Vault, there is an independent
         | company behind it [1]. Given its purpose as a long-term
         | archive, it is likely that copyright exemptions for (library)
         | archival can apply here. [EDIT: This is probably not
         | true, see the reply for the reason.]
         | 
         | [1] https://www.piql.com/awa/
        
           | dmitriid wrote:
           | > Github does not own the Arctic Vault, there is an
           | independent company behind it
           | 
           | Github are the ones doing all the archiving. So, in essence,
           | they _do_ own that. Piql are just the ones providing the
           | storage: it's a commercial for-profit entity employed for
           | backup by another commercial for-profit entity.
        
             | lifthrasiir wrote:
             | It is technically true, but the Arctic World Archive
             | specifically "accepts deposits that are globally
             | significant for the benefit of future generations, as well
             | as information that is significant to your organisation or
             | to you individually" [1]. So it doesn't accept just any
             | data (at least as far as I can see), and the Github
             | archive should also have met this criterion.
             | 
             | By the way, my initial statement that it may qualify for
             | copyright exemptions turned out to be false for a different
             | reason. They only apply when the library and/or archive in
             | question is open to the public, and the Github Arctic Vault
             | isn't. Thus I think it's actually Github's generic usage
             | grant in the ToS [2] that allows for the Vault. Copilot
             | is, of course, very different from anything described in
             | the ToS.
             | 
             | [1] https://arcticworldarchive.org/contribute/
             | 
             | [2] https://docs.github.com/en/github/site-policy/github-
             | terms-o...
        
               | dmitriid wrote:
               | > but the Arctic World Archive specifically...
               | 
               | ...provides prime-rate marketing bullshit in its
               | marketing materials
               | 
               | > Thus I think it's actually a Github's generic usage
               | grant in the ToS
               | 
               | If you refer to Section D.4, then:
               | 
               | - Arctic Vault is not "for future generations", but for
               | GitHub only, since that section doesn't permit GitHub to
               | just make copies willy-nilly for anything other than "as
               | necessary to provide the Service, including improving the
               | Service over time" and "make backups"
               | 
               | - This specifically makes GitHub "the owner" of that
               | data, and not "some third-party" as you originally
               | suggested
        
               | lifthrasiir wrote:
               | If you insist on the term "owner" for copyright grants,
               | you have a faulty understanding of copyright. The terms of
               | service, much like software license, only allows for the
               | licensee to do some specific things (in this case,
               | including backups) under certain circumstances agreed
               | upon in advance. Copyright assignment, which is akin to
               | the ownership transfer, is much harder.
               | 
               | > This specifically makes GitHub "the owner" of that
               | data, and not "some third-party" as you originally
               | suggested
               | 
               | This one is my fault though: I used the "Arctic Vault"
               | as an archival site, but as I later realized it is
               | Github's archive stored in the Arctic World Archive. So
               | yeah, it's (only) Github that can retrieve the data.
        
         | CognitiveLens wrote:
         | I haven't read this interpretation of the Arctic Vault project
         | - presumably most users of GitHub are okay with their code
         | being reproduced/backed up across many production servers for
         | fault tolerance. Making an 'extra special' long-term backup in
         | the Arctic Vault doesn't seem like a meaningfully different
         | action to me - i.e. using a cloud-based host is essentially
         | opting in to this kind of 'license violation'.
         | 
         | If they had taken one of their existing DB/disk backups and
         | called it a vault, would that have been an issue?
        
       | pmarreck wrote:
       | Should I agree with this guy if I believe all software should be
       | open-source? I don't think snippets of code have copyright
       | strength; we pass them around constantly in Slack chatrooms, IRC
       | and Stackoverflow...
        
       ___________________________________________________________________
       (page generated 2021-07-03 23:00 UTC)