[HN Gopher] I do not agree with Github's use of copyrighted code...
___________________________________________________________________
I do not agree with Github's use of copyrighted code as training
for Copilot
Author : janvdberg
Score : 544 points
Date : 2021-07-03 19:21 UTC (3 hours ago)
(HTM) web link (thelig.ht)
(TXT) w3m dump (thelig.ht)
| saurik wrote:
| I never hosted--with quite some prejudice, even--any of my
| projects on GitHub (for a number of reasons that are off topic
| right now)... it didn't matter, though: people take your code and
| upload it to GitHub themselves (which is their right); so you
| can't avoid Copilot by simply self-hosting your repositories.
| BiteCode_dev wrote:
| GitHub is just the beginning; they will crawl any open source
| code: npm, PyPI, CPAN, public GitLab...
|
| If your code is open source, they will get it.
|
| That's kinda the point of open source.
| stingraycharles wrote:
| I'd argue that this new use case is very interesting to open
| source and how it relates to the various licenses, and not
| necessarily "the point of open source".
|
| I can imagine people being OK with their code being used as-is,
| and/or being modified, but not used completely out of context
| to train some corporate AI to inject code into commercial
| codebases.
| CameronNemo wrote:
| Agreed. I am considering relicensing all of my permissively
| licensed code because of this. The fundamental assumptions
| I had when releasing that code under a permissive license
| have been violated.
| nxc18 wrote:
| Indeed, the Windows Research Kernel itself is on GitHub. Kind
| of amazing that Microsoft is hosting their own pirated OS
| kernel.
|
| https://github.com/cnsuhao/Windows-Research-Kernel-1
| ipaddr wrote:
| Is there a local open source version of this?
| throwaway_09870 wrote:
| Semi-related question: the MIT license template has "Copyright
| (c) 2021 <copyright holders>", but don't I have to register
| copyrights somewhere? I've always been confused by this. Do I
| just stick "Copyright MyName" in my GitHub repos? It seems like
| this is what most people do.
| city41 wrote:
| In the US, copyright is automatically granted: "Copyright
| protection in the United States exists automatically from the
| moment the original work of authorship is fixed"
|
| https://www.copyright.gov/circs/circ01.pdf
| macintux wrote:
| Copyright does not require registration, although there may be
| advantages to doing so.
| tyrex2017 wrote:
| My feeling is that only 30% of the outrage against Copilot is
| honest.
|
| 50% is anti-big-tech and 20% is our fear of being made redundant.
| okareaman wrote:
| Of all the hills to die on, this seems like an odd choice. Why
| not work with others to iron out the legal and technical issues
| with this new technology?
| qayxc wrote:
| It's a reflex with some people and that's OK.
|
| Not everyone has the patience and ability to discuss their
| objections in a public forum while their rights are being
| violated (in their view).
|
| Some people have a passion and a very strong belief in their
| ideals and I applaud them for following through with it, even
| if I don't necessarily share their opinion on the matter.
| supergirl wrote:
| How well does Copilot respect licenses? I'm willing to bet it
| accidentally ingested some GPL code and will be spitting it out
| at some point. Would that be allowed by the GPL?
|
| Also, someone could deliberately obfuscate the license text to
| fool it while keeping it clear enough for humans. Something like
| "License: if you use this source code to train a bot then you
| must obtain a commercial license, otherwise MIT license applies".
| The bot searches for "MIT" and thinks it's safe.
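The failure mode supergirl describes can be sketched as a deliberately simplistic, hypothetical license detector that keyword-matches on well-known license names; any extra clause that still mentions "MIT" slips right past it (the function and sample texts below are illustrative, not from any real tool):

```python
# A naive, keyword-matching license classifier -- the kind of
# heuristic that an obfuscated license clause would fool.

def naive_license_guess(license_text: str) -> str:
    """Guess a license by substring matching (deliberately simplistic)."""
    text = license_text.lower()
    if "gnu general public license" in text or "gpl" in text:
        return "GPL"
    if "mit" in text:
        return "MIT"
    return "unknown"

plain_mit = "MIT License. Permission is hereby granted, free of charge..."
obfuscated = ("License: if you use this source code to train a bot then you "
              "must obtain a commercial license, otherwise MIT license applies.")

print(naive_license_guess(plain_mit))   # MIT
print(naive_license_guess(obfuscated))  # MIT -- the training restriction is ignored
```

Both inputs classify as plain "MIT" even though the second one forbids exactly this use, which is the trap being proposed.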
| jpswade wrote:
| Pretty sure copyright and IP laws don't overrule innovation.
| hiyou102 wrote:
| It seems weird to do this over a feature that is still in
| technical preview; we don't even know if this product will ever
| ship publicly. I'm guessing a public release is still years off
| given the number of issues they need to work through before
| release. My understanding is that they are working on an
| attribution system to catch cases with common code. Beyond that,
| this person seems to use the MIT license, which already allows a
| company to use the code internally to host a proprietary service
| without attribution. It would make more sense to be outraged if
| you were using AGPL or something.
| nilshauk wrote:
| I used to admire GitHub for being a fully bootstrapped company
| and free to pursue a path in the world they believed in as a
| company.
|
| Since the Microsoft acquisition it's becoming painfully obvious
| how unhealthily centralized the dev world has become, and they
| seem to strive to become ever more entrenched in the name of
| maximizing shareholder value.
|
| I only have a small number of open source projects on GH but I
| intend to vote with my feet and abandon the platform by self-
| hosting Gitea. By itself it won't be a big splash but I'm
| inspired by posts such as this and I hope to inspire someone else
| in turn. Of all people we devs should be able to find good ways
| to decentralize.
| Karrot_Kream wrote:
| What does this have to do with Copilot?
| [deleted]
| remram wrote:
| In this case that might not help you at all. If your project is
| popular enough, somebody will mirror it on GitHub, where they
| are free (or believe they are) to incorporate your code in
| Copilot. Voting with your feet might be helpful long-term but
| will not protect you from this particular "feature".
| jollybean wrote:
| The argument that 'machines can learn from the code to produce
| something novel' doesn't bode well given copilot may very well
| produce code that is straight up cut and paste.
|
| This just seems like a massive lawsuit waiting to happen.
|
| What happens when you discover that you're using '20 lines of
| code from some GPL'd thing'?
|
| What will your lawyers say? Judges?
|
| It seems to me that if you use Copilot there's a straight up real
| world chance you could end up with GPL'd code in your project. It
| doesn't matter 'how' it got there.
|
| I don't understand therefore how any commercial entity could
| allow this to be used without absolute guarantees they won't end
| up with GPL'd code. Or worse.
| 29athrowaway wrote:
| Microsoft being Microsoft.
| cjohansson wrote:
| It's strange that this wasn't an opt-in feature at GitHub. I
| also feel like this is a violation of my integrity and I will
| consider no longer using GitHub as well.
| lolinder wrote:
| From their perspective, they weren't doing anything that abused
| their privileged position. Tabnine trained their model on open
| source code, much of which was probably hosted on GitHub. Why
| should GitHub have to ask permission if Tabnine didn't?
|
| Whether training an ML model on code is fair use is still an
| open question, but I don't think GitHub is a greater villain
| here than anyone else doing the same thing (at least until they
| start using private repos).
| talkingtab wrote:
| The essential issue is simple: taking someone's work product and
| financially profiting from that work without paying for it. No
| matter what, that is just wrong.
| breck wrote:
| It's time to abolish copyright (http://www.breckyunits.com/the-
| intellectual-freedom-amendmen...). It makes absolutely no sense--
| unless you're rich and don't care about the progress of the arts
| and sciences.
|
| You can spin your wheels all you want but going from simple first
| principles it is fundamentally flawed. If you believe ideas can
| be property, then you believe people can be property.
| tomtheelder wrote:
| > If you believe ideas can be property, then you believe people
| can be property.
|
| Can you defend that? I generally think copyright isn't a great
| idea as it exists, but this statement feels extremely dubious
| _at best_.
| voidnullnil wrote:
| This is hyperbole / pretend outrage. No sane person claims to be
| outraged at a company because they made a silly oversight in
| their experimental product. Obviously GitHub can just create one
| instance of Copilot trained on each incompatible license. Even
| if it used heuristics to determine the license, the tiny subset
| of code accidentally admitted into the training set would be
| negligible, and copyright concerns in software have always been
| overblown to begin with.
| kuon wrote:
| I've given thousands of hours to open source projects, I really
| think open source is a pillar of modern society. So you would
| think I am all for something like copilot, but no.
|
| At first I thought this was a great feature, because easier
| access to code, but after some reflection, I am also very
| skeptical.
|
| I am able to make my code open source because I can make a
| living out of it, and I have a lot of open source code that I
| love to share for things like education or private stuff, but if
| you want to use it for something real, you need to hire me. If
| you can suck up all the code without me even noticing it, that's
| not fair.
|
| The other thing is code quality. I don't want to sound rude, but
| there is a ton of bad code around. Not necessarily because the
| author is unskilled, but because the code might not need to be
| high quality (for example, I wrote a script to sort my photos; it
| was very hastily written and specific to my usage, and I used it
| once and was done with it). Also, there are some bad/wrong
| patterns that are really popular.
|
| I am surprised you are able to DMCA a Twitch stream because
| someone whistles the Indiana Jones theme, but in this case it is
| considered fair use.
| jefftk wrote:
| _> have a lot of open source code that I love to share for
| things like education or private stuff, but if you want to use
| it for something real, you need to hire me. If you can suck up
| all the code without me even noticing it, that's not fair_
|
| Co-pilot aside, that's already how it works today. If you make
| something open source, I can use your code to power my
| business, and I'm under no obligation to hire you. It's great
| when companies give back to open source, either by supporting
| the projects they depend on, or by open sourcing their own
| internal projects, but it's not obligatory.
|
| If you don't want people to independently profit from your
| code, don't release it under a license that allows commercial
| use
| johnday wrote:
| It sounds like the person you're responding to _already_
| releases their code under a non-commercial license. The
| problem with Copilot is that it may allow commercial
| enterprises to _avoid_ such a license by copying the code
| verbatim from their repositories, possibly without any party
| involved knowing that it's happened.
| jefftk wrote:
| Where are you seeing that they are using a non-commercial
| license?
|
| (And non-commercial licenses are not open source:
| https://opensource.org/osd)
| krono wrote:
| > If you make something open source, I can use your code to
| power my business
|
| "Open source" isn't a license. You're not allowed to just use
| any open source software that doesn't contain a license by
| default.
| Denvercoder9 wrote:
| The conventional definition of "open source" is software
| licensed under a OSI-approved license.
| jefftk wrote:
| The standard definition of "open source" is
| https://opensource.org/osd, which has:
|
| "Open source doesn't just mean access to the source code.
| The distribution terms of open-source software must comply
| with the following criteria: ... The license must not
| restrict anyone from making use of the program in a
| specific field of endeavor. For example, it may not
| restrict the program from being used in a business, or from
| being used for genetic research."
| xyzzy_plugh wrote:
| > I am able to make my code open source because I can make a
| living out of it, and I have a lot of open source code that I
| love to share for things like education or private stuff, but
| if you want to use it for something real, you need to hire me.
| If you can suck up all the code without me even noticing it,
| that's not fair.
|
| If you license your software such that I can do whatever I want
| with it, then I can do whatever I want with it. I don't see how
| you can then go on to claim it isn't fair if I'm using it as you
| allow.
| CameronNemo wrote:
| Personally, I make a distinction between legal and socially
| acceptable.
|
| If one of the richest corporations on Earth can't be bothered
| to share patches for permissively licensed code that they
| use, I will gladly shame them.
|
| It's a different story for a small shop with no legal
| department and wariness about being sued over its use of open
| source code.
| ricardobeat wrote:
| > I am surprised you are able to DMCA a Twitch stream because
| someone whistles the Indiana Jones theme, but in this case it
| is considered fair use
|
| Why is it surprising? Indiana Jones is private IP. The code was
| published with an OSS license explicitly authorizing its use.
| edem wrote:
| I second the DMCA point; it is ridiculous.
| judge2020 wrote:
| > but if you want to use it for something real, you need to
| hire me.
|
| No I do not. Even strictly proprietary code can be copied and
| used in a for-profit way without approval as long as it
| qualifies for fair use.
| okamiueru wrote:
| "Something real" and "fair use" don't get along. I'm also not
| sure fair use trumps licensing, since one is a copyright
| issue, and the other is the terms of use. You don't get to
| copy a snippet of GPL code and get away by calling it fair
| use. At least, I hope it isn't the case.
| daenz wrote:
| I think it's clear that Copilot pushes boundaries...technological
| and legal. It makes people uncomfortable and challenges a lot of
| assumptions that we have about the current world. But this is
| exactly what I expect from the next revolutionary change in
| computing.
| xunn0026 wrote:
| Because if it's somebody that has to push boundaries it's not
| the plebs it's trillion dollar companies.
| emerged wrote:
| Yeah, we need to work through all the issues this exposes. It's
| going to be complicated and messy, but it's been inevitable for
| a long time.
| crazypython wrote:
| GitLab is pretty good: https://gitlab.com
|
| Many of its features are available in the self-hostable free and
| open-source version, GitLab CE.
| forgotmypw17 wrote:
| I'm looking for a new place, because of GitHub's new policy of
| not supporting password authentication.
|
| I sometimes code from devices which are not my own and on which
| key management is a major impediment and accessibility issue for
| me.
|
| Does anyone know how those listings were generated? I like their
| simplicity, and would like to do something similar.
| fshee wrote:
| Congrats. I abandoned GitHub a while ago as well. If anyone is
| looking for a rec for self-hosting: Gitea is cake. It sits
| nicely behind Caddy, as all my other services do. Alternatives
| such as GitLab I found wanted to 'own' too much of my system.
| hawski wrote:
| I have some of my code 0BSD licensed (in practice public domain).
| One thing that I'm wary of regarding Copilot is: what happens if
| my code becomes part of some proprietary codebase owned by a big
| multinational corporation and then they DMCA me? I'm in the
| middle of some digital housekeeping and I think I will move my
| code somewhere else because of it.
| saurik wrote:
| Your code will end up on GitHub anyway if other people find it
| useful, as the majority of developers don't even understand you
| _can_ self host git repositories, so they only know how to do
| their own development by taking the code they find and putting
| it on GitHub first.
| monokh wrote:
| Slightly off topic: Is the git frontend [1] open source? If not,
| are there some very light self hosting ones like it?
|
| [1] https://thelig.ht/code/
| nfoz wrote:
| Check out sourcehut!
|
| https://sourcehut.org/
| wmichelin wrote:
| I was also wondering this. I'm unfamiliar with linux kernel
| development but this reminds me of that.
| dchest wrote:
| This looks like https://codemadness.org/stagit.html
|
| Other popular choices are gitweb and cgit (both dynamic).
| prezjordan wrote:
| You may like Fossil! https://www.fossil-
| scm.org/home/doc/trunk/www/index.wiki
| dang wrote:
| Recent and related:
|
| _Copilot regurgitating Quake code, including sweary comments_ -
| https://news.ycombinator.com/item?id=27710287 - July 2021 (625
| comments)
|
| _GitHub Copilot as open source code laundering?_ -
| https://news.ycombinator.com/item?id=27687450 - June 2021 (449
| comments)
|
| Also ongoing, and more or less a duplicate of this one:
|
| _GitHub scraped your code. And they plan to charge you_ -
| https://news.ycombinator.com/item?id=27724008 - July 2021 (148
| comments)
|
| Original thread:
|
| _GitHub Copilot_ - https://news.ycombinator.com/item?id=27676266
| - June 2021 (1255 comments)
| owlbynight wrote:
| Has this person been in a coma? If I utilize a free service on
| the Internet, I'm trading for some kind of convenience with the
| knowledge that I am in some way being boned in the backend by
| teams of people, all of whom are likely more clever than I am and
| using my patronage to some kind of nefarious end.
|
| The Internet isn't really a place to exercise an inflexible moral
| code. His new repository probably can be traced back to slave
| labor somehow if someone digs deep enough. Probably won't even
| take 6 degrees of separation.
|
| If it makes it easier for me to code and gives me more time to do
| something other than work without doing irreparable harm to some
| sentient entity, I'm firmly in the who-gives-a-shit camp.
| kennywinker wrote:
| > His new repository probably can be traced back to slave labor
| somehow
|
| And you're ok with that? It doesn't HAVE to be like this. Just
| because you've chosen nihilism, doesn't mean that's the only
| choice, and it certainly doesn't help anything.
| owlbynight wrote:
| Of course I'm not okay with that, but I'm also not under any
| illusions about my lack of control over the technology I've
| chosen to build my life around. We parasites can't really
| complain that our hosts smell like shit when we're riding them
| to the bank, can we?
|
| It doesn't HAVE to be like this, it just is, and all of the
| alternatives suck. If you want to choose to inconvenience
| yourself in order to pass a morality test that doesn't exist,
| go ahead I suppose.
| [deleted]
| einpoklum wrote:
| I wish I had the guts to leave only "tombstones" for my GitHub
| projects, pointing to other sites where they're actually stored.
|
| Unfortunately, GitHub enjoys the effect of most people being on
| it (correct me if I'm wrong), and leaving it is costly,
| regardless of whether the alternative is a reasonable service or
| not.
| [deleted]
| soheil wrote:
| To address a lot of the negativity around copyright fair-use,
| Copilot should have probably adopted something like
| Stackoverflow's model where contributors get rewarded by points.
| In this case, the repo that the code used by Copilot came from
| would get a new type of star rating, and the more people used
| it, the more stars Copilot would assign. Fractional stars would
| be awarded depending on what fraction of each code snippet
| Copilot thinks came from a specific repo...
|
| It could maybe at some point send rewards in form of donations
| etc. from Copilot users, similar to Sponsored repos.
| NomDePlum wrote:
| Very simplistically, my understanding on these matters is:
|
| "You know what you know."
| MillenialMan wrote:
| I think there's an argument to be made that machine learning is a
| compression algorithm, so training a model on copyrighted data is
| quite direct copyright infringement - you're essentially
| compressing, then redistributing, that data.
|
| Has this ever been used as an argument in a legal case?
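The compression argument above can be shown in miniature with a toy character-level Markov model (an illustrative stand-in, far simpler than a real language model): "trained" on a single copyrighted string, its lookup table acts as an archive of that string, and deterministic generation from a short prompt reproduces the training text verbatim.

```python
from collections import defaultdict

def train(text: str, order: int = 4) -> dict:
    """Map each length-`order` context to the characters observed after it."""
    model = defaultdict(list)
    for i in range(len(text) - order):
        model[text[i:i + order]].append(text[i + order])
    return model

def generate(model: dict, prompt: str, length: int, order: int = 4) -> str:
    """Extend the prompt by repeatedly emitting the first observed successor."""
    out = prompt
    for _ in range(length):
        successors = model.get(out[-order:])
        if not successors:
            break  # context never seen in training: stop
        out += successors[0]  # deterministic choice
    return out

work = "def fast_inverse_sqrt(x): return x ** -0.5  # original code"
model = train(work)
print(generate(model, work[:4], 80))  # regurgitates the training text verbatim
```

Because every 4-character context in this training string is unique, the "model" decompresses back to the original work exactly, which is the sense in which training-then-sampling can amount to redistribution.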
| [deleted]
| [deleted]
| rubyist5eva wrote:
| Your mom
| ddmma wrote:
| Interestingly enough, this only became a problem once Copilot
| was announced as an extension, even though the model has been
| generating code since it launched. I suppose it's difficult to
| prepare billions of lines of code or data points and keep
| everyone happy.
| [deleted]
| gfodor wrote:
| This is a hell of a Pandora's box that's being cracked open here.
|
| Interesting times ahead. For example, if you believe these kinds
| of tools will become a huge competitive advantage, and that the
| inclusion of GPL code is a meaningful force multiplier, it kind
| of implies the fusion of AI code generation and the GPL will eat
| the world.
| saurik wrote:
| Only if people understand that the result is under GPL; if they
| don't, then this is a mechanism to slowly "launder" the work
| people put into GPL code to funnel into non-GPL codebases.
| xbar wrote:
| Why is human understanding going to prevent this? Doesn't it
| seem like this is precisely the de facto function of Copilot:
| a license laundering machine?
| saurik wrote:
| If humans understand this then presumably lawyers would
| start hunting for code replication caused by Copilot--using
| automated mechanisms similar to those used by professors at
| Universities to catch people cheating--and do the moral
| equivalent of ambulance chasing: offering to file all the
| paperwork on spec for a cut of an assured payout. But if
| people in general believe this to be fair use somehow, then
| GPL is essentially dead (I have been a big advocate for it
| over the years, and if people are doing this--and everyone
| thinks it is OK--then it loses the entire point as far as I
| am concerned).
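The automated replication-hunting saurik alludes to could work like academic plagiarism detectors: shingle both code bases into token n-grams and measure overlap. The sketch below is illustrative only (the tokenizer, shingle size, and sample strings are assumptions, not taken from any real tool):

```python
import re

def shingles(code: str, n: int = 6) -> set:
    """Tokenize code and return the set of n-token shingles."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", code)
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def overlap(snippet: str, corpus: str) -> float:
    """Fraction of the snippet's shingles that also appear in the corpus."""
    s, c = shingles(snippet), shingles(corpus)
    return len(s & c) / len(s) if s else 0.0

gpl_source = "float q_rsqrt(float number) { long i; float x2, y; ... }"
generated  = "float q_rsqrt(float number) { long i; float x2, y; ... }"
print(f"{overlap(generated, gpl_source):.0%}")  # 100% -- verbatim regurgitation
```

A lawyer-friendly scanner would run generated output against a corpus of GPL code and flag any file whose overlap score crosses a threshold; real tools add normalization (identifier renaming, whitespace) on top of this core idea.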
| gfodor wrote:
| It depends on which "people" you're referring to. I suspect
| the degree to which the programmer knows this is of little
| relevance to the question of how the legal + risk management
| implications will play out.
| saurik wrote:
| I mean general people people, not only developers: people
| includes managers and lawyers and politicians and everyone
| who might cause you to have GPL Copilot separate from MIT
| Copilot... the same people who right now cause licenses to
| matter, despite many developers not understanding anything
| about copyright law and just thinking "I'll steal that
| other developer's work as it makes my life easier".
|
| If anything, I think the real test of this tech is going to
| be audio, as it has the right overlap of "big copyright is
| going to get pissed", "there already exist tools that
| attempt to automatically detect even small bits of
| infringement", "people actually litigate even small bits of
| infringement", and "it feels feasible in the near future":
| you whistle a tune, and the result is a fully produced
| backing track that sometimes happens to exactly sound like
| the band backing Taylor Swift on a recognizable song and
| generates Taylor Swift's voice, almost verbatim, singing
| some of her lyrics to go along with it.
| Yaina wrote:
| This is not the first person I've seen ditch GitHub in favor of
| some other front-end... and it's not uncommon that they look
| like this, which is often baffling to me.
|
| Say what you want about GitHub's near-monopoly position, but
| the UX is really great and accessible even to non-technical
| people. Maybe you don't need that, maybe you don't want the
| issue trackers, but it's worth thinking about who you're
| excluding with these kinds of front-ends.
| orlovs wrote:
| Let's face it, GitLab and GitHub's valuations are based on
| future "AI" code autogens. Brave new world.
| yawaworht1978 wrote:
| I cannot see them even throwing together a library. For
| example, how could they architect, let's say, something like
| jQuery?
| voidnullnil wrote:
| Coming up with this "return $this" pattern to emulate a
| composition operator seems like a very AI thing to do.
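The "return $this" jab above refers to the fluent-interface pattern: each method mutates the object and returns it, so calls compose by chaining. A minimal Python analogue (`return self` in place of PHP's `$this`; the `Query` class and its methods are hypothetical names for illustration):

```python
class Query:
    """Toy fluent query builder demonstrating method chaining."""

    def __init__(self):
        self.parts = []

    def select(self, cols):
        self.parts.append(f"SELECT {cols}")
        return self  # returning self is what makes chaining possible

    def frm(self, table):
        self.parts.append(f"FROM {table}")
        return self

    def where(self, cond):
        self.parts.append(f"WHERE {cond}")
        return self

    def build(self):
        return " ".join(self.parts)

q = Query().select("name").frm("users").where("age > 21").build()
print(q)  # SELECT name FROM users WHERE age > 21
```

The pattern is common enough in hand-written ORMs and builders that a model trained on public code would see it constantly, which is the joke: it looks "very AI" precisely because it is so formulaic.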
| yonixw wrote:
| Based on their website, Gitlab are pushing the CI/CD future,
| not AI.
| canadianfella wrote:
| The British way of using "are" on a non-plural word is weird
| to me and always looks very awkward.
| qayxc wrote:
| The irony of it all is that their code will find its way into the
| next Common Crawl release anyway and that's used to train GPT-3,
| which in turn forms the basis of OpenAI Codex, which is the
| product that CoPilot builds on...
|
| So hosting elsewhere _might_ not save your code from ending up
| deep in the bowels of some corporate black-box ML model that
| occasionally regurgitates your IP when accidentally given the
| wrong (right?) prompt.
|
| If you make your code public, you basically accept that someone
| will copy it verbatim. Other companies still might have it in
| their closed source product somewhere, even if it's just
| accidental copypasta from SO.
| enraged_camel wrote:
| It's kind of interesting how quickly sentiment turned negative.
| The original feature showcase/announcement post was full of
| excitement from HN (which is kind of strange, if you think about
| how skeptical the HN crowd is towards AI/ML and automation of
| programming) but it hasn't been a week and people are already
| talking about the questionable ethics and potentially disastrous
| consequences of using the feature.
| arp242 wrote:
| I can't speak for anyone else, but when I first saw it, it
| seemed kind of okay, but I also didn't really look too deeply
| in to it. As I've looked at it a bit more closely and thought
| about it for a few days, my original feelings have soured quite
| a bit.
|
| I never considered the copyright and related ethical
| implications of ML at all, or thought about the impact it may
| or may not have on programmers. Your first thoughts on
| something can be wrong (and actually often are), and it takes a
| bit to really think things through - or at least, it does for
| me.
| tyingq wrote:
| Do you mean this post?
| https://news.ycombinator.com/item?id=27676266
|
| There's plenty of skepticism there, even in the early comments.
| maximilianroos wrote:
| To what extent are these expressions driven by a genuine
| allegiance to strict copyright laws?
|
| As opposed to an anxiety that a machine might be able to do some
| of our jobs better than we can?
| nomercy400 wrote:
| This is exactly why people have issue with Github's Copilot.
|
| It's not the technology, but the fact that any code you pushed to
| GitHub in the past 13 years is now 'accessible' to anyone.
|
| Private repo? Paid account? Deleted repo five years ago? Deleted
| repo today? Proprietary code? Embarrassing commits? Accidental API
| keys or passwords in commits?
|
| All 'available'.
|
| It feels like the entirety of GitHub was just 'leaked', and
| converted into a marketable product.
|
| Would you push your code to a service if you knew it could be
| read by anyone one to ten years from now? Even if you paid to
| keep it a secret?
| jeroenhd wrote:
| I know that some people have uploaded the Microsoft research
| kernel or even the leaked Windows source code to github at some
| point.
|
| I wonder what Microsoft will do when snippets from that code
| start appearing in your code because of copilot. I'm guessing
| their lawyers wouldn't accept "the robot did it" as an excuse
| in that case.
|
| I'm tempted to just throw stuff like "AWS_KEY=" at the
| algorithm and see how many working credentials I can steal from
| private repos.
| enriquto wrote:
| > I'm tempted to just throw stuff like "AWS_KEY=" at the
| algorithm and see
|
| Anybody tried? What does actually happen if you do this kind
| of thing? I can think of a few more obvious "script kiddie"
| ideas, but I won't post them here lest a copilot developer
| sees it and closes all the elementary stuff.
| danielbln wrote:
| Wasn't Codex (the tech underlying CoPilot) trained on purely
| publicly available repos?
| lars wrote:
| Yes, it was. From their site: "It has been trained on a
| selection of English language and source code from publicly
| available sources, including code in public repositories on
| GitHub."
| IshKebab wrote:
| Yes. nomercy400 is wrong.
| WillDaSilva wrote:
| The issue of deleted repositories being available through it
| would still exist. Whether or not GitHub should be blamed for
| that is another matter.
| lolinder wrote:
| Once you put something on the internet, you should assume
| it still exists out there somewhere even after deleting it.
| Even before copilot, all credentials that end up in a repo
| needed to be changed. I'm not sure what's supposed to be
| different now.
| ForHackernews wrote:
| > Would you push your code to a service if you knew it could be
| read by anyone one to ten years from now? Even if you paid to
| keep it a secret?
|
| I'm old enough to remember when "assume anything you put in
| cleartext online is public" was received wisdom. We were taught
| that if you want to keep something private, keep it encrypted
| on your own local media. Or, failing that, at least on a server
| you control.
| lolinder wrote:
| I'm not sold on the product, but it's important to note that
| GitHub Copilot was only trained on public repos, which means
| nothing should be out in the open that wasn't already made
| public by the authors.[0]
|
| > GitHub Copilot is powered by OpenAI Codex, a new AI system
| created by OpenAI. It has been trained on a selection of
| English language and source code from publicly available
| sources, including code in public repositories on GitHub.
|
| [0] https://copilot.github.com/
| M4v3R wrote:
| While I understand the sentiment, wasn't Copilot trained not
| only on code hosted on GitHub, but on code found all over the
| Internet? That means hosting your code yourself would not
| prevent GitHub from using it to train Copilot. It raises an
| interesting question, though: how do you opt out? Is there even
| a way to do it?
| brobdingnagians wrote:
| I guess it goes back to closed source / trade secrets
| territory. If you have something you really don't want stolen,
| it is safer to never expose it and never trust that the law
| will fairly protect you.
|
| The irony is that copilot won't suggest its own source code,
| just everyone else's. It is open source without the benefits.
| axismundi wrote:
| Smells like Microsoft
| yumraj wrote:
| More like _Open_ AI
| bryanrasmussen wrote:
| robots.txt, or a copyright notice saying the code can't be used
| to train AI, which bots will ignore, thereby opening their
| corporate masters to liability.
|
| on edit: fixed typo
| ezoe wrote:
| Bad news for you: Japanese copyright law, article 47-7,
| explicitly allows using copyrightable works for data analysis
| by means of a computer (including recording a derivative work
| created by adaptation).
|
| It would be considered fair use in the USA, except we don't
| use a common law system, so we explicitly state what is exempt
| from copyright protection.
| bryanrasmussen wrote:
| Thanks for the bad news! Not glad to hear it but glad to
| know something I didn't.
|
| That said - so they would be able to sell some things in
| Japan that they couldn't elsewhere.
| amelius wrote:
| What if a Japanese software company uses it to write
| software, which is then sold in the US?
|
| It's still copyright laundering, if you ask me.
| blihp wrote:
| robots.txt is a convention for those who want to be good 'web
| citizens' rather than legally binding. It does absolutely
| nothing to stop someone who ignores your wishes. For example,
| there are tons of bots that ignore robots.txt entirely or
| even go straight for the thing (i.e. 'hey, thanks for telling
| us where to look!') you're telling them to avoid in
| robots.txt. While copyright is a mechanism that can be used
| if you can make the case, and have the means, it will only
| work for entities that have something to lose and are within
| a jurisdiction where it matters.
| woodruffw wrote:
| This kind of learning needs to be opt-in, not opt-out.
|
| I would also be extremely surprised if most open source
| copyright holders didn't already expect their licensing terms
| to protect against this kind of code/authorship laundering.
| Speaking individually, I know that it certainly surprised me
| to hear that GitHub thinks that it's probably okay to
| regurgitate entire fragments of the training set without
| preserving the license.
| rvense wrote:
| I'm not surprised. I imagine all images on the internet are
| used to train image classifiers as well. It's a shitty future,
| but it's the one we have.
| hojjat12000 wrote:
| Researchers in our lab created a huge dataset of facial
| expressions from images on the web, annotated it and
| published the URLs to the images and the annotations for
| research but made sure to search only for images with proper
| licenses. I don't think that you are allowed to just go
| download any old image and train on it. I understand that
| many, many people do it, but it's not legal (as far as I
| know, please correct me if I'm wrong).
| sillysaurusx wrote:
| > I don't think that you are allowed to just go download
| any old image and train on it.
|
| My understanding as a two-year student of ML is that you
| are allowed in the US to go download any old image, train
| on it, and then release the model as long as the outputs
| are "sufficiently transformative."
|
| That last phrase is the key part, and has never been tested
| in court. It's entirely possible that either I'm mistaken
| here, or that the courts will soon say that I am mistaken
| here. https://www.youtube.com/watch?v=4FA_gt9w28o&ab_channe
| l=guava...
| saurik wrote:
| To be clear: "transformative" not meaning merely
| "altered" but really meaning "repurposed"; if the new
| work is something people could feasibly use instead of
| the old work (harming the author's original market), it
| isn't "transformative".
| sillysaurusx wrote:
| Yes. For example, arfa ran into this question when
| launching https://thisfursonadoesnotexist.com/. Lots of
| furry artists had exactly the same concerns with his work
| there, but that work is decisively transformative.
|
| Copilot seems ... well, less transformative. I'm still
| not sure how to feel.
| bsd44 wrote:
| I would like to know this too. I understand that GitHub is a
| private company and you have to accept their T&C, but surely
| they aren't allowed to use source code found elsewhere on the
| internet to train their ML models without asking for permission
| first unless it's a B2B cooperation such as with Stackoverflow.
| [deleted]
| lacker wrote:
| According to the discussion at this link, you do not need
| permission to use copyrighted data to train AI models.
| Copyright prevents you from copying data, it doesn't prevent
| you from learning from it.
|
| https://twitter.com/luis_in_brief/status/1410985742268911631.
| ..
| marcosdumay wrote:
| To train your model, yeah, probably OK. But I don't think
| anybody will see the duplicated code that the AI inserts
| into your codebase the same way.
| lloydatkinson wrote:
| Oh man I can't imagine the consequences for certain languages
| and frameworks if it uses SO answers though. Imagine if it
| trained in all the dumb and ancient answers like "how do I
| get the length of a string in javascript" and took the first
| accepted answer of "use jquery"
| bsd44 wrote:
| This raises an issue of trolling. What prevents developers
| from generating "inappropriate" code and feeding it to this
| algorithm, the same way they did with the Microsoft chat bot,
| for example? That will surely reflect on the quality of the
| code generated by this AI system, and therefore on the
| stability and security of the applications built with it.
| kingofclams wrote:
| I'm sure this will happen, and there will definitely be
| instances of the bot giving users bad code, but it would
| be incredibly difficult to make it solely give out bad
| code.
| throwaway3699 wrote:
| Are people angry at copyright violations, or Microsoft? Copyright
| and patents have their place, but they've clearly overreached
| long ago.
| EugeneOZ wrote:
| Some unknown person is trying to get some hype on "cancel github"
| cry.
|
| I don't give a shit about the Copilot, but I care even less about
| Rian Hunter and his statements.
| Nicksil wrote:
| >Some unknown person is trying to get some hype on "cancel
| github" cry.
|
| >I don't give a shit about the Copilot, but I care even less
| about Rian Hunter and his statements.
|
| This is untrue, because you had a choice: say nothing and
| carry on (clearly not giving a shit), or take the time to
| leave such a comment (giving enough of a shit to inform
| everyone you don't give a shit). So far, this and Lloyd's are
| the only crying going on in this topic.
| EugeneOZ wrote:
| This is true. And I didn't even read your nickname because I
| don't give a shit about a shmuck who is trying to tell me
| what I care about :)
| lloydatkinson wrote:
| It's amazing he seems so butt hurt. I think it's an alt of
| the blog author.
| Dylan16807 wrote:
| Is it not obvious that you can care about a post on HN
| without caring about the page it links to?
|
| Back away from this specific situation for a second: If you
| would ignore something entirely if it wasn't being shoved in
| your face, complaining about it being shoved in your face and
| saying it's stupid wouldn't mean you suddenly "care" about
| the underlying item.
|
| (And no, I'm _not_ saying that an HN post is shoved in your
| face. It's a more extreme example to make the point more
| clear.)
| lloydatkinson wrote:
| My thoughts too
| Nicksil wrote:
| Also
|
| >Who asked?
|
| But you deleted that comment.
| [deleted]
| lloydatkinson wrote:
| Who asked?
| Nicksil wrote:
| >Who asked?
|
| Now this was completely unexpected.
| wyldfire wrote:
| What a cool court case this would make. Is copilot's model
| sufficiently abstracted from the code it has read? Judges and
| juries learning about how the GitHub team avoided overfitting?
| Are humans who have read open source code producing derivative
| works?
|
| Won't be long until we see an infringement case. /me grabs
| popcorn
| gbtw wrote:
| Does github guarantee that my private repos' contents are not
| being leaked this way in the future?
| orlovs wrote:
| Nah, it's all gonna be fine
| errata wrote:
| Yes
| eCa wrote:
| Source?
| ralph84 wrote:
| https://docs.github.com/en/github/site-policy/github-
| privacy...
| tvirosi wrote:
| This huge revolt is interesting, but I doubt it makes github
| very scared. They'll just come out with some new version
| which they'll show takes licenses into account (or uses a
| dataset of 10k repos or so with hand-checked licenses), and
| that'll be that; we'll forget about all this the week after.
| hashhar wrote:
| Why is this noteworthy? Who is this person? Am I missing
| something?
|
| I agree that there needs to be talk about licensing and
| copyright, but with so little content there can be no
| meaningful discussion other than aimless banter.
| pcthrowaway wrote:
| > Why is this noteworthy? Who is this person? Am I missing
| something
|
| Why is this comment noteworthy? Who is this person? Am I
| missing something?
| eitland wrote:
| > Who is this person?
|
| One of the beautiful things about HN is that you don't need to
| be anything, you just have to have something interesting to
| say.
| IshKebab wrote:
| Right, but you either need a solid argument or some
| authority, and this guy has neither. He's effectively a
| nobody and he has just jumped to the conclusion that CoPilot
| is illegal.
|
| If he had a good argument for that, fine. But without that he
| really needs to be someone whose opinion I care about.
| NiceWayToDoIT wrote:
| This is somehow inverse logic. Does a rape victim need
| authority to speak about rape in order for it to be valid?
| And what here is not solid? CoPilot is using community code
| that is under the GPL licence; therefore Microsoft should not
| be able to charge for CoPilot, but should give it away for
| free rather than create another revenue stream.
| meibo wrote:
| This isn't interesting though. It doesn't even provide any
| value. It's a random guy that doesn't like GitHub; it could
| just as well have been an HN comment from yesterday.
|
| It's just posted (not by the guy that made the page, mind
| you) to farm karma, exploit the news cycle and carve out
| some more space for discussion of this tired topic.
| [deleted]
| eitland wrote:
| If it sparks the necessary discussions I don't care if it
| was written by Joe Random Nobody or Joe Biden.
|
| > It's just posted(not by the guy that made the page, mind
| you)
|
| Others would complain if the author himself had posted
| this.
| Dylan16807 wrote:
| The necessary discussion was already sparked.
| eitland wrote:
| Well, a lot of the people with voting rights here
| obviously thought otherwise.
| Dylan16807 wrote:
| An upvote doesn't mean you think something is new or
| needed sparking. There are very often redundant posts on
| a topic.
| exolymph wrote:
| why are you on an upvote-based aggregator + forum if you're
| not looking for upvote-based links + commentary?
| qwertox wrote:
| How about just leaning back and reading the discussions which
| evolve out of this post? Some may have something to say about
| it which will either help you solidify your point of view or
| add a new perspective to it which you might have missed.
|
| The topic is a current one [1], which makes it even more
| valuable.
|
| [1] https://news.ycombinator.com/item?id=27676266
| judge2020 wrote:
| They're not particularly popular on HN:
| https://news.ycombinator.com/from?site=thelig.ht (except for
| https://news.ycombinator.com/item?id=18133450 )
|
| And their only huge project on GitHub is dbxfs, a userspace
| dropbox filesystem with 687 stars
| https://github.com/rianhunter?tab=repositories&q=&type=&lang...
|
| I think this is just a post meant to continue the discussion of
| CoPilot past the first 2 days of news.
| pmarreck wrote:
| I'm glad the post showed up because I've been in the hospital
| for 3 days and I was like HOLY SHIT WHAT IS THIS? ;)
| smartmic wrote:
| Maybe now is the time to release a GPLv4 extending the four
| freedoms to (or restricting them from) non-humans.
|
| I expect the best lawyers from Microsoft have had a look into
| this, and maybe there are weaknesses in GPLv3 ready to be
| exploited by corporate AIs. What is the response from the FSF?
| zmmmmm wrote:
| Seems to me like they need to back out of this fast and at
| the very least limit it so that it is only trained and used
| on "license compatible" projects, e.g. train it in isolation
| on MIT-licensed projects and then have the user explicitly
| confirm the license of the code they are working on to
| enable it. Possibly they even need to auto-enable a mechanism
| that detects when code has been reused verbatim and adds some
| kind of attribution (or respect for other constraints) where
| the license requires it.
| shadowgovt wrote:
| Alternatively, they'll take it head-on, pay their lawyers to
| argue fair use, and blaze a new trail through the understanding
| of copyright application that allows this ML model (and others
| like it) to exist.
|
| This is ultimately a Microsoft project, and they have Microsoft
| money and Microsoft lawyers to defend their position.
| na85 wrote:
| Feels like everyone is missing the point: Copilot will ultimately
| serve to weaken the arguments in support of software patents and
| copyright.
|
| That can only be a good thing for society (though perhaps not for
| rent seekers).
| teflodollar wrote:
| If all copilot output was automatically GPL, I would think it's
| fantastic. As it stands, it seems to undermine GPL the most.
| CameronNemo wrote:
| They should really have trained models based on the license:
| a GPL-2.0-only model, 2+, 3+, 3-only, LGPL-2.1(+), CDDL,
| MIT, et cetera.
|
| As it stands, the combined inputs leave the model in the
| murkiest of gray areas.
| MontyCarloHall wrote:
| Surprised I had to scroll so far to find this, given how
| copyleft and straight-up anti-IP so much of the open source
| community is.
|
| I think a lot more people on this site (and in the FOSS
| community in general) would be on board with Copilot if it
| respected viral licenses, e.g. if it had a way of inferring
| that the code it was copying verbatim were GPL-3 and warned the
| user that including it in their project would require them to
| GPL-3 their project as well.
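Such a warning would not require solving authorship in general; verbatim reuse can be caught with something as crude as an n-gram index over the licensed corpus. A toy sketch of the idea (the tokenizer, window size, and corpus below are invented for illustration; this is not how Copilot works, and a real system would need normalization and far more data):

```python
import re

def tokens(code):
    # Crude lexer: identifiers/numbers, plus individual symbols.
    return re.findall(r"\w+|[^\w\s]", code)

def ngrams(toks, n=6):
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def build_index(licensed_snippets, n=6):
    # Index every token n-gram of every snippet in, say, a GPL corpus.
    index = set()
    for snippet in licensed_snippets:
        index |= ngrams(tokens(snippet), n)
    return index

def looks_verbatim(suggestion, index, n=6):
    # Flag a suggestion if any of its n-grams appear in the corpus.
    return bool(ngrams(tokens(suggestion), n) & index)

gpl_corpus = ["for (i = 0; i < len; i++) { total += weights[i] * x[i]; }"]
index = build_index(gpl_corpus)

print(looks_verbatim("total += weights[i] * x[i];", index))  # True
print(looks_verbatim("print('hello world')", index))         # False
```

A flagged suggestion could then carry a "this may be GPL-3; attribution required" notice instead of being emitted silently.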
| noobermin wrote:
| Honestly, that would fix every issue with it. The laundering
| of the license is the issue.
| ImprobableTruth wrote:
| But that's literally the issue. The only form of intellectual
| property that is being damaged by this is copyleft.
| CameronNemo wrote:
| The arguments are already weak. The judicial precedent,
| however, is strong. Microsoft will continue to publish
| proprietary ML models and profit off them, at the expense of
| the corpus authors (us lowly laborers).
| xunn0026 wrote:
| Not really. Back when free software was strong, it would have
| been a good thing for society since Microsoft was selling
| software in boxes on actual store shelves.
|
| Now 'the edge' is already mostly open source. All the lock-in
| and value has moved into either infrastructure or in software
| you don't even get to touch since it runs in the Cloud and you
| just provide IO to it.
| na85 wrote:
| I think in this new era of endless security breaches at cloud
| firms and M1-style processing innovation we'll see a slow but
| steady migration away from the cloud.
| dcolebatch wrote:
| I'm going to take the other side of that prediction:
|
| Endless security breaches will encourage firms to do "less
| IT" themselves and accelerate the adoption of SaaS
| solutions (and PaaS, with no/low-code etc.)
|
| Also, perhaps not a massive driver but still, not for
| nothing: M1-style processing innovation (ARM) will see more
| developers creating for ARM servers, because they can,
| which will almost exclusively be run by the hyper scale
| cloud providers.
| hsbauauvhabzb wrote:
| I used to think like this but the total cost of ownership
| of on-prem is substantially higher, and it has its own
| security implications too.
| dimgl wrote:
| No no, we get it. Some people, like myself, still think that
| copyrights serve a purpose.
| Dylan16807 wrote:
| You can say both, you know. That it serves a purpose _and_ is
| too strong.
|
| And I notice you didn't say anything about patents?
| temac wrote:
| Not really unless you force the source code of proprietary
| software to be published. If you don't, copyleft has a role to
| play.
| ThrowawayR2 wrote:
| It is certainly fascinating to see people start running away
| from "_information wants to be free_" and other Free Software
| principles full tilt when, all of a sudden, it's _their_
| livelihoods that are on the line. Unless my recollection is
| off, the GPL was never the goal of the original Free Software
| movement; it was merely a tool to get to the end state where
| all code becomes available for use by anyone for any reason
| without cost or restriction.
|
| I am reminded of a line from Terry Pratchett's _Going Postal_
| in relation to a hacker-like organization called the Smoking
| GNU, "_...[A]ll property is theft, except mine..._", which I
| thought was rather painfully apt in describing what FOSS
| evolved into after becoming popular.
| snickerbockers wrote:
| I can't speak to whether or not Richard Stallman was trying
| to make some 4-dimensional chess move to remove software
| restrictions by adding software restrictions when he wrote
| the GPL back in the 80s, but his original intentions are
| irrelevant in most cases since most people who license their
| code under the GPL do not consult with him or consider his
| opinions when they choose to do so.
| ImprobableTruth wrote:
| Your recollection is off, majorly. I'd recommend looking up
| the origins of the FSF/GPL/Copyleft. The entire movement
| essentially got started because Stallman gave Symbolics his
| (public domain) Lisp interpreter, then Symbolics improved it
| but refused to share the improvements.
|
| "No restrictions" has never been the goal and to claim that
| they're egoistic hypocrites who are just scared for their own
| livelihood because of this is just an absurd strawman.
| lispm wrote:
| Stallman did not give Symbolics his Lisp interpreter.
|
| Symbolics had a license for MIT's Lisp system.
| the_af wrote:
| > _it was merely a tool to get to the end state where all
| code becomes available for use by anyone for any reason
| without cost or restriction_
|
| Your recollection seems to be completely off. That wasn't the
| goal of the Free Software movement.
|
| Also, the code they champion comes with restrictions and,
| optionally, cost. So again, you're off.
| na85 wrote:
| >It is certainly fascinating to see people start running away
| from "information wants to be free" and other Free Software
| principles full tilt when, all of a sudden, it's their
| livelihoods that are suddenly on the line.
|
| Indeed. Everybody is a leet haxors when they're 14, it's
| 1998, and we're vying for +o in #warez on DALnet. We believed
| information really did "want to be free".
|
| Unfortunately some of those same kids grew up to create
| today's data barons and that old saying about getting someone
| to understand something when their salary depends on not
| understanding comes into play.
| booleangate wrote:
| There's a lot of consternation over copyright issues, but I
| see an entirely different problem. When I hear this tool
| described and see its examples, the first thing I think is
| that Github has just automated the dubious process of
| copy/pasting from StackOverflow.
|
| As a senior developer, I am strongly biased against the SO+c/p
| programming approach that I've seen many Junior and mid level
| developers use. There's certainly a time and place for it when
| you become really stuck but at least having to go out and find
| the code yourself requires thought which helps you grow.
|
| My gut reaction to Copilot is that adding this automation into
| IDEs is going to have a net-negative effect on growing developers
| as it lowers the level of thought and effort necessary to write
| even trivial applications. This is a huge detriment to learning.
| You don't even get the chance to try to solve the problems
| yourself because the AI is going to be proactively getting in the
| way of your learning.
|
| All that being said, I think a tool like this could be of great
| use with boilerplate within a project -- but only suggesting
| things from that project. For example, setting up a new api
| route, dependency injection, error propagation, etc. Help with
| all of these mechanical things would be awesome.
| [deleted]
| ralph84 wrote:
| If you don't want people and/or AI to read your code, why would
| you post it anywhere? Just post binaries and call it freeware.
| foobarbazetc wrote:
| We're going to see license additions that explicitly ban ML from
| being trained on the code soon.
|
| Fun.
| henvic wrote:
| I really hope this weakens copyright. We can live without it.
| lc9er wrote:
| Who can? Sure, Disney shouldn't be able to copyright public
| domain works or Mickey Mouse until the end of time. But they
| also shouldn't be able to swoop in, use your
| songs/artwork/software in their latest movie, without
| permission or appropriate compensation.
| mmastrac wrote:
| Hobbling copyright would likely make Disney et al much weaker
| in the future, to where this might not be as big of a deal.
| dvdkon wrote:
| Well, right now it could also just weaken copyleft while
| leaving proprietary non-public code copyright holders well-off.
| eddieh wrote:
| Are you kidding? If I produce any creative work, don't copy it
| without my permission, full stop. (c) 2021
| caconym_ wrote:
| Copyright doesn't just benefit huge corporations. For instance,
| without it, independent artists who rely on copying for
| distribution (authors, musicians, etc.) would find it much more
| difficult to make money off their work, mostly (IMO) because
| large corporate entities with large investments made in
| publication and distribution systems could simply take content
| and sell it themselves with zero obligation to the original
| creator(s). This process could be highly automated at scale,
| giving creators essentially zero chance to compete in the
| market.
|
| It's a bad idea.
|
| The thing about copyright law that needs reform is its bias
| toward the benefit of large corporate entities. Platforms'
| implementations of DMCA compliance allow "rights holders" to
| spam perjurious takedown requests en masse, garnishing the
| earnings of creators and _legitimate_ rights holders in what
| can only be called (in addition to perjury) outright fraud.
| Companies like Github scrape the web for content, most of it
| copyrighted, and use it to construct new products for their own
| profit. Rare recitation events aside, I think their use case
| _is_ legitimate fair use in the eyes of the law (and if you
| look at my comment history you'll see me vehemently arguing to
| that effect), but _should_ it be? We don't seem to be asking
| that question, which is really disappointing--we're either
| complaining loudly and without substance, or blithely accepting
| the might-makes-right ethic as the central pillar of our IP
| law.
| kingsuper20 wrote:
| >Copyright doesn't just benefit huge corporations. For
| instance, without it, independent artists who rely on copying
| for distribution (authors, musicians, etc.) would find it
| much more difficult to make money off their work,
|
| That doesn't look like it's the point to me.
|
| ""[the United States Congress shall have power] To promote
| the Progress of Science and useful Arts, by securing for
| limited Times to Authors and Inventors the exclusive Right ,
| to their respective Writings and Discoveries." "
|
| As I read that, copyright is there to 'promote progress', not
| to maximize gains.
|
| No doubt there is a million linear feet of case law that got
| us where we are.
|
| Honestly, I rather like this whole question of copilot. I
| solidly appreciate the brilliance of github as a honeypot.
| caconym_ wrote:
| > To promote the Progress of Science and useful Arts, by
| securing for limited Times to Authors and Inventors the
| exclusive Right to their respective Writings and
| Discoveries.
|
| What better way to promote said Progress than by making
| sure said Authors and Inventors can make enough money off
| their work to keep doing it? As written, it's a roundabout
| way to get at the instrumentality of capital, but if that's
| not what they had in mind then I'm not sure what they
| _were_ getting at. Without copyright, a creator's rights
| to their own work aren't diminished; it's just that
| everyone else's are expanded to the same level.
|
| (I'd love to know if I'm way off base about this. I'm not a
| lawyer, and I'm sure it's been discussed to death.)
|
| > Honestly, I rather like this whole question of copilot. I
| solidly appreciate the brilliance of github as a honeypot.
|
| I think it's really cool, and I'd probably use it myself.
| As much as my favorite kinds of programming (e.g. writing
| experimental text editors) might not benefit from it, in my
| day job I sure would love to spend less time filling in
| boilerplate and looking up mundane API details.
|
| I don't mean to single Github out in my mention of big
| corporations benefiting from copyright law. Scraping vast
| quantities of copyrighted data to build new products is a
| common business model at this point, and--like other new
| IP-related paradigms enabled by modern information
| technology--I think it deserves a fresh look, being mindful
| of just what it is we're trying to accomplish with
| copyright law. As you say, it's not always obvious, even in
| written law.
| matthewmacleod wrote:
| Don't be too eager! Weakened copyright doesn't necessarily
| translate to an overall benefit, at least for software.
|
| Weakening copyright also weakens copyleft - for example, it
| seems reasonable to me that the producer of an open-source work
| should be entitled to require reciprocal openness from people
| who build upon it. If I can legitimately launder some GPL
| source code (say, a Linux kernel driver) through an ML model
| without being obliged to release the resulting code, I think
| everyone loses.
| blibble wrote:
| > I think everyone loses.
|
| only people who have released their code publicly under a
| (mostly) open license
|
| so, not Microsoft
| jbluepolarbear wrote:
| Regardless of how I feel about this usage, I'd be more
| concerned with the very real possibility of introducing
| vulnerabilities this way. Say copilot takes a snippet from a
| code base. That snippet had a vulnerability, which was fixed
| by the team that understood the what and the how. How does
| that vulnerability get fixed downstream? Does copilot let the
| user know months later that the snippet it suggested is
| actually very bad, that the company that originally
| implemented it has fixed it, and that you should too?
| amelius wrote:
| Can't you just put a robots.txt file in your project which says
| "no ML".
| sillysaurusx wrote:
| Anyone know how they're hosting their repositories?
| https://thelig.ht/code/ is actually kind of nice and minimalist;
| I was hoping to set up the same thing, mostly just for kicks.
| lolinder wrote:
| Googling a bit of the stylesheet suggests that it's stagit, a
| static page generator for git repos:
|
| https://codemadness.org/stagit.html
|
| Contrast these two pages, and you'll see it's a match:
|
| https://codemadness.org/git/bmf/log.html
|
| https://thelig.ht/code/block-tracing/log.html
| sillysaurusx wrote:
| Woo! You rock. I was too lazy to do that myself (or at least,
| lounging around in bed...) so I was hoping a fellow like you
| would sleuth it.
|
| Thank you. :)
| teflodollar wrote:
| Cgit
|
| https://git.zx2c4.com/cgit/about/
| varenc wrote:
| it's built with stagit: https://codemadness.org/stagit.html
|
| Love the minimal style and monospace font!
| linkdd wrote:
| This reminds me of cgit[1], but the UI seems even simpler.
|
| [1] - https://git.zx2c4.com/cgit/about/
| ta1234567890 wrote:
| It seems like the most fair way to go would be for Copilot to be
| completely open sourced and hosted on GitHub. That way they'd be
| subject to the same terms/conditions they are imposing on
| everyone else's code/repos.
| adamtulinius wrote:
| The problem isn't the source code of Copilot, but the code it
| is outputting.
| cj wrote:
| They aren't using private repos in their training data.
| rhn_mk1 wrote:
| I would be more sympathetic to the idea of the co-pilot if,
| apart from being able to strip licensing information from
| permissive and copyleft projects, it could also inject an
| equal amount of copyright-stripped closed source code.
|
| As it is now, it works towards weakening the copyright of
| free software while doing nothing (or very little) to closed
| software.
| bullen wrote:
| How does he expose the git stuff? Is that open source?
| null-a wrote:
| Stagit?
|
| https://codemadness.org/stagit.html
| bryanrasmussen wrote:
| Lots of people are arguing this guy isn't anybody, but the
| name seemed sort of familiar to me, and my quick googling and
| a look at his site make me think he has probably done
| something that some people use; dbxfs, for example, seems to
| have quite a history.
|
| on edit: just saw there was a description of who he is at
| https://news.ycombinator.com/item?id=27724247 -- as noted I
| don't know, but I'm not sure that's enough to imply a bad
| motive of wanting attention for opposing copilot.
|
| on second edit: huh, seems to be one of those occasions when
| I have mysteriously offended some people on HN without
| swearing, joking or being rude.
| coliveira wrote:
| This is typical Microsoft behavior: embrace, extend, and
| extinguish. They embraced open source with the intention of
| controlling (GitHub) and exploiting it. And the interesting thing
| is that many people fell for this already ancient strategy.
| qayxc wrote:
| So when MS does it it's evil, but it's perfectly fine for
| everyone else to do it?
|
| I also don't see how any of this follows - they could've just
| crawled GitLab or any other OSS repository. They didn't even
| _need_ Github for this.
|
| Heck, is OpenAI doing embrace, extend, and extinguish on the
| entire web now, because they use Common Crawl [0] to train
| GPT-3, which forms the basis of CoPilot?
|
| [0] https://en.wikipedia.org/wiki/Common_Crawl
| rvz wrote:
| Well, I did try to warn the Copilot fanatics [0]. They just
| downvoted me days ago and here we are. We have a GitHub Copilot
| backlash against the hype squad.
|
| The GitHub CEO is nowhere to be found to answer the important
| questions on software licenses, copyright and the legal
| implications on scraping the source code of tons of projects
| with those licenses for Copilot.
|
| The fact that you can only use it in VSCode, with Microsoft
| holding an exclusive deal with OpenAI, screams an obvious
| 'embrace and extend'.
|
| As for 'Extinguish', they will need to be very creative on
| that.
|
| [0] https://news.ycombinator.com/item?id=27685104
| fartcannon wrote:
| What happens when Microsoft sues someone for including
| Microsoft's code in another project?
|
| Will it be fair use then?
| wcerfgba wrote:
| I'm glad that Copilot is bringing the grey areas of copyright
| into discussion. If I write a book, it is copyrighted, but
| what's the smallest unit covered by that copyright? Each word
| obviously is not. Some sentences will be fairly generic, and
| I will not be the first person to write them. But some
| sentences will be characteristic of the work or of my own
| style. Clearly, how we apply copyright to subdivisions of an
| original work is an open question.
| Engineering-MD wrote:
| I think an interesting analogy is rewriting a book in your
| own words with each paragraph's meaning intact. So you
| rewrite Harry Potter with slightly different sentence
| structures, but the meaning is otherwise near identical. Is
| that copyright infringement? I think it would certainly be
| plagiarism.
|
| The other similar analogy is translation: a translated work
| is still covered as 'derived from' under copyright law.
|
| Is this just what copilot is doing, in some ways, for smaller
| components?
| MontyCarloHall wrote:
| The legal term for this is scenes a faire[0], and there is
| quite a bit of legal precedent covering exactly the cases you
| bring up.
|
| [0] https://en.m.wikipedia.org/wiki/Scenes_a_faire
| canada_dry wrote:
| > Limits of the scenes a faire doctrine are a matter of
| degree -- that is, _operate on a continuum_.
|
| Copilot is certainly pushing that envelope.
| nomercy400 wrote:
| 'Some sentences' makes me think of the link tax introduced to
| prevent aggregating news sources based on only headlines, so
| even generic sentences fall under copyright in certain cases.
| breck wrote:
| This. You realize it doesn't make any sense. All ideas are
| shared creations, by definition. If you've created something
| that has meaning for other people, the meaning comes from the
| ideas you are incorporating into your own tree.
|
| There is no defending copyright. It is indefensible from first
| principles. It makes no logical sense.
|
| Though it sure has proven to be a profitable con.
| okamiueru wrote:
| I recognise your comments from several different threads, and
| I'm wondering if you might not be working against your own
| ideals. The GPL license is intended to persuade others to
| share their contributions when building on top, which I
| assume is what you would like to see happen. If everything is
| GPL then everything is open source, and everyone can use
| anything, including for training AI methods, etc.
|
| The problem posed by copilot is in fact the opposite: taken
| to its logical conclusion, it might make it possible to
| disregard this effort and use GPL code in your proprietary
| project.
| kortilla wrote:
| > There is no defending copyright. It is indefensible from
| first principles. It makes no logical sense.
|
| What does that even mean? The intent from the beginning of
| copyright was to allow people to live off of intellectual
| works by claiming legal rights over the work.
|
| There are no "first principles" from which basically any
| societal agreements like these are derived.
|
| Even something as simple as "murder is illegal" isn't
| actually derived from any first principles because the
| government is allowed to murder people, citizens are during
| self defense, etc.
| cnma wrote:
| Nonsense. Above a certain level of creativity people do
| produce novel or exceptional things that are worthy of
| protection.
|
| Because naked men are a shared concept, Michelangelo's David
| is not worthy of protection?
|
| I'm very worried that such opinions are up-voted so highly
| when Microsoft leeches open source code (but not its own
| ...).
|
| People have no respect for other people's creations. Perhaps
| it makes them feel better because they haven't created
| anything difficult themselves.
| ChrisMarshallNY wrote:
| All of my open-source stuff on GH is MIT. I don't care whether or
| not Copilot (or anyone else) uses it.
|
| I seriously doubt that Copilot scans my (very few) private repos.
| Even then, I don't think I do anything particularly noteworthy.
|
| But that is just me.
| eCa wrote:
| The license you have chosen requires attribution. You may not
| care[1] but the other party still most likely will be in
| violation if Copilot reproduces a significant chunk of your
| code.
|
| [1] I also MIT license my public code on Github, and also
| wouldn't care that much.
| ChrisMarshallNY wrote:
| I don't care about attribution.
|
| The only reason I use MIT, is so some knucklehead doesn't try
| to sue me, because they cheezed up my code.
| [deleted]
| BiteCode_dev wrote:
| My prediction is that they will add a licence tooltip in the code
| completion ui and solve that issue next month.
| IceDane wrote:
| ok
| blihp wrote:
| Anyone publishing anything on the Internet should expect this
| type of use case. If it is removed from github and republished
| via another site, there is absolutely nothing preventing another
| service/company from doing the exact same thing (or 'worse'...
| i.e. imagine a learning system that can actually understand the
| code) when scraping the alternative location. It's not unusual
| for bots to be among the most frequent visitors to low traffic
| pages these days and they aren't all just populating search
| engines.
| yashap wrote:
| A bigger concern for many is that if you USE copilot, you'll
| unintentionally copy code with licences that your company
| really, REALLY does not want to copy. For example, here's
| copilot copying some very famous GPL code:
| https://twitter.com/mitsuhiko/status/1410886329924194309?s=2...
|
| And basically every software company avoids GPL like the
| plague, due to its strong copyleft conditions.
| blihp wrote:
| Sure, but that's a different end of the issue than I was
| referring to. I was pointing out that just taking code off of
| github wouldn't avoid the use case. Any published code from
| any public source is likely to eventually be used this way by
| someone.
| yashap wrote:
| Yeah, I agree with your point that "if you publish content
| to the internet, expect it to be used in ways you don't
| intend, or even permit." Just pointing out that a lot of
| the concerns are not "GitHub is stealing my code for use in
| Copilot," but "using GitHub Copilot in my proprietary
| software is a massive risk/liability."
| lacker wrote:
| I thought this was a pretty good thread (by an ex-Wikipedia
| lawyer) on Twitter about the IP implications of Copilot.
|
| https://twitter.com/luis_in_brief/status/1410242882523459585...
|
| And this is a longer article about how IP and AI interact:
|
| https://ilr.law.uiowa.edu/print/volume-101-issue-2/copyright...
|
| I am not a lawyer, but I am capable of summarizing the thoughts
| of lawyers, so my take is that in general, fair use allows AI to
| be trained on copyrighted material, and humans who use this AI
| are not responsible for minor copyright infringement that happens
| accidentally as a result. However, this has not been tested in
| court in detail, so the consensus could change, and if you were
| extremely risk-averse you might want to avoid Copilot.
|
| A key quote from the second link:
|
| _Copyright has concluded that reading by robots doesn't count.
| Infringement is for humans only; when computers do it, it's fair
| use._
|
| Personally, I think law should allow Copilot. As a human, I am
| allowed to read copyrighted code and learn from it. An AI should
| be allowed to do the same thing. And nobody cares if my ten-line
| "how to invert a binary tree" snippet is the same as someone
| else's. Nobody is really being hurt when a new tool makes it
| easier to copy little bits of code from the internet.
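[Ed.: for illustration of how generic the snippet in question is, here is a typical ten-line Python version; any two independently written implementations converge on essentially this shape.]

```python
class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def invert(root):
    # Swap left and right subtrees recursively; virtually every
    # independent implementation of "invert a binary tree" looks like this.
    if root is None:
        return None
    root.left, root.right = invert(root.right), invert(root.left)
    return root
```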
| lamontcg wrote:
| I'm more concerned with all the poor code and security issues
| that Copilot has been trained on. Garbage In, Garbage Out.
| mmastrac wrote:
| > Copyright has concluded that reading by robots doesn't count.
| Infringement is for humans only; when computers do it, it's
| fair use.
|
| This would be interesting to test with AI and pop music.
| encryptluks2 wrote:
| This is a stupid argument for the Twitter author to make. By
| that logic, saving music digitally is reading by robot, so
| recording music that wasn't digital into a digital format
| would be fair use.
| extra88 wrote:
| > recording music that wasn't digital into a digital format
| is fair use
|
| If you're doing it from an analog format you bought for
| your own use (format shifting), it is fair use.
| inglor_cz wrote:
| Perhaps the final judgment would say "AI cannot infringe on
| copyright provided that only other AIs consume the result of
| the first AIs work".
|
| And suddenly there is a world of robots composing, writing
| and painting for other robots. With us humans left out.
|
| There should be a /s at the end, but the legal world
| sometimes produces such convolutions. See, for example, the
| interpretation of the Commerce Clause in Gonzales v. Raich.
| dehrmann wrote:
| As far as IP protections go, they're similar, but the
| incentives are so different that you get songwriters going to
| court over bits of melodies that might be worth millions.
| Outside of quantitative trading, it's hard to find an example
| of 10 lines of code that are worth millions and couldn't
| easily be replaced with another implementation.
| janoc wrote:
| Sorry but it is not a robot publishing the "lifted" code but a
| human. So the copyright will very much apply. That's an
| argument like saying CTRL+C/CTRL+V is OK because it is a
| "computer doing it".
|
| Plus, it is not "minor infringement": code is being lifted
| verbatim - as has been demonstrated, e.g., by the Quake square
| root code.
|
| Feel free to test this theory in court ...
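[Ed.: the routine referenced here is Quake III's fast inverse square root, recognizable mainly by its magic constant; a rough Python port for illustration (the original is C):]

```python
import struct

def fast_inverse_sqrt(x: float) -> float:
    # Reinterpret the float's bits as a 32-bit integer, apply the
    # famous magic-constant shift trick, then refine the estimate
    # with one Newton-Raphson iteration.
    i = struct.unpack('<I', struct.pack('<f', x))[0]
    i = 0x5f3759df - (i >> 1)
    y = struct.unpack('<f', struct.pack('<I', i))[0]
    return y * (1.5 - 0.5 * x * y * y)
```

The magic constant (and, in the original C, the expletive-laden comments) is exactly the kind of fingerprint that makes verbatim reproduction easy to spot.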
| cnma wrote:
| > Nobody is really being hurt when a new tool makes it easier
| to copy little bits of code from the internet.
|
| Of course people are hurt, namely the original creators who
| spent years of work and whose work is potentially laundered,
| depending on how good this IP grabbing AI will get.
|
| If it gets really good, some smug and well connected loser
| (e.g. the type who posts pictures of himself with a microphone
| on GitHub) will click a button, steal other people's hard work
| and start a "new" project that supersedes the old one.
| dathinab wrote:
| Fair use for training and "independent creation" are one
| thing; an AI "remembering and mostly verbatim copying code
| over" is another.
|
| Many current machine learning applications try to teach the
| AI to understand the concepts behind its training data and
| use that to do whatever it is trained to do.
|
| But most (all?) fail to properly reach that goal in more
| complicated cases, at least for the kinds of models which
| are used for things like Copilot (GPT-3?).
|
| Instead, what these models learn can be described as a
| combination of some abstract understanding and verbatim
| snippets of input data of varying size.
|
| As such, while they sometimes generate "new" things based on
| "understanding", they also sometimes just copy things they
| have seen before!! (Like in the Quake code example, where it
| even copied over some of the not-so-"proper" comments
| expressing programmer frustration.)
|
| It's like a human who understands neither programming nor
| English nor Latin letters, but who has a photographic memory
| and tries to somehow create something which seems to make
| sense by recombining existing verbatim snippets, sometimes
| tweaking them.
|
| I.e. if the snippets are small enough and tweaked enough,
| it's covered by fair use and similar, BUT the person doing it
| doesn't know about this, so if a large remembered snippet
| matches verbatim it _will_ put it in, effectively copying
| code of a size which likely doesn't fall under fair use.
|
| Also, this is a well-known problem, at least it was when I
| covered topics including ML ~5 years ago. Good examples
| included extracting whole sequences of paragraphs of a book
| out of such a network, or (more strikingly) extracting things
| like people's contact data based on their names, or credit
| card information (in the case of systems trained on mail).
|
| So the fact that Copilot is basically guaranteed to sometimes
| copy non-trivially-small snippets of code and comments, in a
| way not really appropriate wrt. copyright, should have been a
| well-known fact for the ML specialists in charge of this
| project.
| thunderbong wrote:
| IMHO, Threadreader does a better job of presenting these
| kinds of tweets
| j4yav wrote:
| Wouldn't this make for a simple license laundering system?
| SV_BubbleTime wrote:
| > Nobody is really being hurt when a new tool makes it easier
| to copy little bits of code from the internet.
|
| Quite the opposite. We all get a tiny bit better with good
| information like this. This is what the internet should be for,
| evolving, learning from past mistakes, information
| availability.
|
| If the discussion was "I clicked this button and got someone's
| entire chat platform" that would be different. Words and
| sentences aren't copyrighted, books are, so when exactly does
| a collection of words become a book?
|
| There is nuance, and the linked page has none. But that's fine,
| that guy is free to pull his content off GitHub. This seems
| like a useful feature for other people who want to make things
| first and foremost.
| IncRnd wrote:
| > Words and sentences aren't copyrighted, books are, so when
| exactly does a collection of words become a book?
|
| If that were true, then 20 people could each steal a single
| chapter from a book, and one of the people could combine
| those 20 chapters into a new copyright-free book. That's
| clearly false.
| SV_BubbleTime wrote:
| Did I say anything about paragraphs or chapters? Didn't I
| specially write there is nuance?
|
| And as for your strawman: the assembly of uncopyrightable
| components into a copyrighted work would still be a
| violation.
|
| So we agree that copyright kicks in somewhere between the
| paragraphs, the chapters, and the book. So why are tiny code
| excerpts "a problem"?
| exo762 wrote:
| > As a human, I am allowed to read copyrighted code and learn
| from it. An AI should be allowed to do the same thing.
|
| This is a false equivalence. AI and humans are different.
| First, AI is at best a slave, and likely a slave of capital.
| Second - scale makes a difference.
| bcrosby95 wrote:
| > Copyright has concluded that reading by robots doesn't count.
| Infringement is for humans only; when computers do it, it's
| fair use.
|
| Reading by a robot doesn't count. But injecting a robot between
| copyright material and a product doesn't magically strip the
| copyright from whatever it produces.
| jcelerier wrote:
| > As a human, I am allowed to read copyrighted code and learn
| from it.
|
| Of course not. Reading some copyrighted code can have you
| entirely excluded from some jobs - you can't become a wine
| contributor if it can be shown you ever read Windows source
| code and most likely conversely. Likewise, you can't ever write
| GPL VST 2 audio plug-ins if you ever had access to the official
| Steinberg VST2 SDK. Etc etc...
|
| Did people forget why black box reverse engineering of software
| ever came to be?
| [deleted]
| burntoutfire wrote:
| > Reading some copyrighted code can have you entirely
| excluded from some jobs - you can't become a wine contributor
| if it can be shown you ever read Windows source code and most
| likely conversely.
|
| If that's the case, it should be easy to kill a project like
| wine - just send every core contributor an email containing
| some Windows code.
| pvaldes wrote:
| Nobody could confirm whether that thing is really Windows
| code or a fake. Not without the sender self-identifying as a
| well-known top MS employee with access to it. In that case
| the sender would be doing something illegal and against MS
| interests.
|
| The result would be WINE gaining the opportunity to redo the
| snippet of code in a totally new and different way, and MS
| being forced to show part of its private code, which would
| also expose them to patent trolls.
|
| Would be a win-win situation for Wine and a lose-lose
| situation for MS.
| dahart wrote:
| > Reading some copyrighted code can have you entirely
| excluded from some jobs
|
| What provision of copyright law are you referring to? Are you
| conflating copyright law with arbitrary organizational
| policies?
| chrisseaton wrote:
| Who said it was a law?
| dahart wrote:
| Which "it" are you referring to? @lacker was talking
| about copyright in the comment @jcelerier replied to.
| chrisseaton wrote:
| Yeah... but they didn't say it was the law that got you
| excluded from working on some projects for reading
| copyrighted code. It's corporate policy that does that -
| it's not a law, but they do it based on who owns the
| copyright. Not everything that impacts you is a law.
|
| They said
|
| > Reading some copyrighted code can have you entirely
| excluded from some jobs
|
| And they're right. It's because of corporate policies.
| They never said it was because of a law - you imagined
| that out of nothing.
| dahart wrote:
| > They never said it was because of a law - you imagined
| that out of nothing.
|
| @jcelerier flatly contradicted the statement that
| copyright doesn't prevent you from reading something.
|
| You're right that @jceleier didn't say their example was
| law, that's because the example is a straw man in the
| context of what @lacker wrote.
| chrisseaton wrote:
| Are you editing your comments after they've been replied to?
| That's really poor form.
| dahart wrote:
| I did not edit my comments above after reading your
| replies, why do you ask? What do you think I changed that
| affected how the thread reads?
|
| And, who says improving or clarifying a comment is poor
| form? What is the edit button for, and why is it
| available once replies have been posted?
| chrisseaton wrote:
| > What do you think I changed
|
| I think you added
|
| > Which "it" are you referring to?...
|
| Because I have a tab open and can see the old one!
| dahart wrote:
| I added that before I saw your comment. So?
| hluska wrote:
| So @chrisseaton was correct, you did edit your posts and
| their question was in good faith.
|
| Edit - I'm adding another point as an edit to show
| another way to communicate. Would any of your points have
| been lost had you done something similar?
| dahart wrote:
| > So @chrisseaton was correct
|
| No that's not true. I did not edit my posts after reading
| their reply, and the false accusation was that I changed
| my comment after it was replied to.
|
| I didn't challenge whether the question was in good
| faith, but I'll just note that the relevant discussion of
| copyright got dropped in favor of an ad-hominem attack.
|
| My question of which "it" was being referred to is a
| legitimate question that I believe clarified the intent
| of my comment, and I added it to make clear I was talking
| about what @lacker said, not what @jcelerier wrote.
|
| > Edit - I'm adding another point as an edit to show
| another way to communicate. Would any of your points have
| been lost had you done something similar?
|
| This doesn't answer my question of why an edit should not
| be made before I see any replies, nor of why any edit is
| "poor form" and according to whom. I made my edit
| immediately. I'm well aware of the practice of calling
| out edits with a note, I've done it many times. I don't
| feel the need to call out every typo or clarification
| with an explicit note, especially when edited very soon
| after the original comment.
| TheRealPomax wrote:
| I don't believe you on this in the slightest. This sounds
| like you making up an argument, so cite sources if you want
| people to believe your claims.
| mhh__ wrote:
| In my experience open source has now become so prevalent that
| I think some young developers could be completely caught out
| if the pendulum swings the other way.
|
| Semi-related: the GNU/Linux copypasta is now more familiar to
| some than the GNU project in general - a shame to me, as I
| view the copypasta as mocking people who worked very hard to
| achieve what GNU has achieved and ask only for some credit.
| messe wrote:
| It's dependent on jurisdiction. Black box reverse engineering
| is only required in certain countries. If I remember
| correctly, most of Europe doesn't require it.
| k__ wrote:
| Wasn't that the entire premise of "Halt and Catch Fire"?
| cush wrote:
| If you've ever read a book or interacted with any product,
| you've learned from copyrighted material.
|
| You've extrapolated "some organizations don't allow you to
| contribute if you've learned from the code of their direct
| competitor" to "You're not allowed to learn from copyrighted
| code", which is absurd.
| crazygringo wrote:
| That's not what GP is saying.
|
| In general, you're _absolutely_ allowed to learn programming
| techniques from _anywhere_. You can contribute software
| almost anywhere even if you've read Windows source code. Re-
| using everything you've learned, in your own creative
| creation, is part of fair use.
|
| Your example is the very specific scenario where you're
| attempting to replicate an _entire_ program, API, etc., to
| identical specifications. That's obviously not fair use.
| You're not dealing with little bits and pieces, you're
| dealing with an entire finished product.
| doytch wrote:
| This is true, but there's also a murkier middle option. I
| used to work for a company that made a lot of money from
| its software patents but I was in a division that worked
| heavily in open-source code. We were forbidden to
| contribute to the high-value patented code because it was
| impossible to know whether we were "tainted" by knowledge
| of GPL code.
| dathinab wrote:
| No you are not, guaranteed (I think; not a lawyer).
|
| At least from a copyright point of view.
|
| TL;DR: Having a right, and having an easy defense in a
| lawsuit, are not the same.
|
| BUT separating the teams makes defending any lawsuit against
| them based on copyright and patent law much easier. It also
| prevents any employee from "copying GPL (or similar) code
| verbatim from memory"(1) (or, even worse, the clipboard).
| Sure, the employee "should" not do it, but by separating the
| teams you can be more sure they don't, which in turn makes
| it easier to defend in court, especially wrt. "independent
| creation".
|
| There are also patent law shenanigans.
|
| (1): Which is what GitHub Copilot is sometimes doing
| IMHO.
| [deleted]
| 0xdky wrote:
| Same here. I worked at a NAS storage (NFS) vendor and
| this was a common practice. Could not look at server
| implementation in Linux kernel and open source NFS client
| team could not look at proprietary server code.
| jcelerier wrote:
| > Your example is the very specific scenario where you're
| attempting to replicate an entire program, API, etc., to
| identical specifications. That's obviously not fair use.
| You're not dealing with little bits and pieces, you're
| dealing with an entire finished product.
|
| No - google's 9 lines of sorting algorithm (iirc) copied
| from Oracle's implementation were not considered fair use
| in the Google / Oracle debacle.
|
| Likewise SCO claimed that 80 copied lines (in the entirety
| of the Linux source code) were a copyright violation, even
| if we never had a legal answer to this.
| crazygringo wrote:
| Sorry, but you're not recalling correctly. :)
|
| The Supreme Court decided Google v. Oracle _was_ fair
| use. It was 3 months ago:
|
| https://en.wikipedia.org/wiki/Google_LLC_v._Oracle_Americ
| a,_...
|
| That's the highest form of precedent, the question has
| now been effectively settled (unless Congress ever
| changes the law).
|
| Edit: added a dummy hash to end of URL so HN parses it
| correctly (thanks @thewakalix below)
| thewakalix wrote:
| There seems to be an issue with Hacker News's URL
| parsing. The final period isn't included as part of the
| link.
| croes wrote:
| The fair use ruling was about Google's API reimplementation.
| It becomes a whole different case with a 1:1 copy of code.
| And don't forget fair use applies in the US, not
| necessarily in the rest of the world.
|
| But I'm happy about all the new GPL programs created by
| Copilot
| wtallis wrote:
| That Supreme Court ruling doesn't appear to address the
| claims of actual copied code (the rangeCheck function),
| only the more nebulous API copyright claims.
| [deleted]
| jcelerier wrote:
| nope, those lines were specifically excluded from the
| prior judgment - and SC did not cast another judgment on
| them:
|
| > With respect to Oracle's claim for relief for copyright
| infringement, judgment is entered in favor of Google and
| against Oracle except as follows: the rangeCheck code in
| TimSort.java and ComparableTimSort.java, and the eight
| decompiled files (seven "Impl.java" files and one"ACL"
| file), as to which judgment for Oracle and against Google
| is entered in the amount of zero dollars (as per the
| parties' stipulation).
| [deleted]
| saurik wrote:
| This model doesn't learn and abstract: it just pattern
| matches and replicates; that's why it was shown exactly
| replicating regions of code--long enough to not be "de
| minimis" and recognizable enough to include the comments--
| that happen to be popular... which would be fine, as long
| as the license on said code were also being replicated. It
| just isn't reasonable to try to pretend Copilot--or GPT-3
| in general--is some kind of general purpose AI worthy of
| being compared with the fair use rights of a human learning
| techniques: this is a machine learning model that likes to
| copy/paste not just tiny bits of code but _entire
| functions_ out of other peoples' projects, and most of
| what makes it fancy is that it is good at adapting what it
| copies to the surrounding conditions.
| the8472 wrote:
| This is called prompt engineering. If you find a popular,
| frequently repeated code snippet and then fashion a
| prompt that is tailored to that snippet then yes the NN
| will recite it verbatim like a poem.
|
| But that doesn't mean it's the only thing it does or even
| that it does it frequently. It's like calling a human a
| parrot because he completed a line from a famous poem
| when the previous speaker left it unfinished.
|
| The same argument was brought up with GPT too and has
| been long debunked. The authors (and others) checked
| samples against the training corpus and it only rarely
| copies unless you prod it to.
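[Ed.: checking samples against a training corpus can be approximated very crudely; a minimal sketch using character n-gram overlap (my own illustration with hypothetical strings, not the method the GPT authors actually used):]

```python
def ngrams(text: str, n: int = 20) -> set:
    # All character n-grams of the text; long exact overlaps are a
    # crude signal of verbatim copying rather than novel generation.
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def verbatim_overlap(sample: str, corpus: str, n: int = 20) -> float:
    # Fraction of the sample's n-grams that appear verbatim in the corpus.
    grams = ngrams(sample, n)
    if not grams:
        return 0.0
    return sum(g in corpus for g in grams) / len(grams)
```

A score near 1.0 flags recitation; real memorization studies use more robust fingerprinting, but the principle is the same.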
| saurik wrote:
| I don't know if I agree with your argument about GPT-3,
| but I think our disagreement seems to be beside the
| point: if your human parrot did that, they would--not
| just in theory but in actual fact! see all the cases of
| this in the music industry--get sued for it, even if they
| claim they didn't mean to and it was merely a really
| entrenched memory.
| the8472 wrote:
| The point is that many of the examples you see are
| intentional, through prompt engineering. The pilot asked
| the copilot to violate copyright, the copilot complied.
| Don't blame the copilot.
|
| There _also_ are cases where this happens
| unintentionally, but those are not the norm.
| moyix wrote:
| Have you used Copilot? I have not, but I have trained a
| GPT2 model on open source projects
| (https://doesnotexist.codes/). It does _not_ just pattern
| match and replicate. It can be cajoled into reproducing
| some memorized snippets, but this is not the norm; in my
| experience the vast majority of what it generates is
| novel. The exceptions are extremely popular snippets that
| are repeated many many times in the training data, like
| license boilerplate.
|
| Perhaps Copilot behaves very differently from my own
| model, but I strongly suspect that the examples that have
| been going around twitter are outliers. Github's study
| agrees:
| https://docs.github.com/en/github/copilot/research-
| recitatio... (though of course this should be replicated
| independently).
| saurik wrote:
| So, to verify, your claim is that GPT-3, when trained on
| a corpus of human text, isn't merely managing to string
| together a bunch of high-probability sequences of symbol
| constructs--which is how every article I have ever read
| on how it functions describes the technology--but is
| instead managing to build a model of the human world and
| the mechanism of narration required to describe it, which
| it then uses to write new prose... a claim you must make
| in order to then argue that GPT-3 works like a human
| engineer learning a model of computers, libraries, and
| engineering principles from which it can then write code,
| instead of merely using pattern recognition as I stated?
| As someone who spent years studying graduate linguistics
| and cognitive science (though admittedly 15-20 years ago,
| so I certainly haven't studied this model: I have only
| read about it occasionally in passing) I frankly think
| you are just trying to conflate levels of understanding,
| in order to make GPT-3 sound more magical than it is :/.
| moyix wrote:
| What? I don't think I made any claim of the sort. I'm
| claiming that it does more than mere regurgitation and
| has done _some_ amount of abstraction, not that it has
| human-level understanding. As an example, GPT-3 learned
| some arithmetic and can solve basic math problems not in
| its training set. This is beyond pattern matching and
| replication, IMO.
|
| I'm not really sure why we should consider Copilot
| legally different from a fancy pen - if you use it to
| write infringing code then that's infringement by the
| user, not the pen. This leaves the _practical_ question
| of how often it will do so, and my impression is that it's
| not often.
| saurik wrote:
| The argument I was responding to--made by the user
| crazygringo--was that GPT-3 trained on a model of the
| Windows source code is fine to use nigh unto
| indiscriminately, as supposedly Copilot is abstracting
| knowledge like a human engineer. I argued that it doesn't
| do that: that GPT-3 is a pattern recognizer that not only
| theoretically just likes to memorize and regurgitate
| things, it has been shown to in practice. You then
| responded to my argument claiming that GPT-3 in fact...
| what? Are you actually defending crazygringo's argument
| or not? Note carefully that crazygringo explicitly even
| stated that copying little bits and pieces of a project
| is supposedly fair use, continuing the--as far as I
| understand, incorrect--assertion by lacker (the person
| who started this thread) that if you copied someone's
| binary tree implementation that would be fair use, as the
| two of them seem to believe that you have to copy
| essentially an entire combined work (whatever that means
| to them) for something to be infringing. Honestly, it now
| just seems like you decided to skip into the middle of a
| complex argument in an attempt to make some pedantic
| point: either you agree that GPT-3 is a human that is
| allowed to, as crazygringo insists, read and learn from
| anything and then use that knowledge in any way they see
| fit, or you agree with me that GPT-3 is a fancy pattern
| recognizer and it can and will just generate copyright
| infringements if used to solve certain problems. Given
| your new statements about Copilot being a "fancy pen"
| that can in fact be used incorrectly--something
| crazygringo seems to claim isn't possible--you frankly
| sound like you agree with my arguments!!
| bnjemian wrote:
| I think a crucial distinction to be made here, and with
| most 'AI' technologies (and I suspect this isn't news to
| many people here) is that - yes - they are building
| abstractions. They are not simply regurgitating. But - no
| - those abstractions are _not_ identical (and very often
| not remotely similar) to human abstractions.
|
| That's the very reason why AI technologies can be useful
| in augmenting human intelligence; they see problems in a
| different light, can find alternate solutions, and
| generally just don't think like we do. There are many
| paths to a correct result and they needn't be isomorphic.
| Think of how a mathematical theorem may be proved in
| multiple ways, but the core logical implication of the
| proof within the larger context is still the same.
| codelord wrote:
| It's not really comparable to a pen. Because a pen by
| itself doesn't copy someone else's code/written words.
| It's more like copying code from Github or if you wrote a
| script that did that automatically. You have to be
| actively cautious that the material that you are copying
| is not violating any copyrights. The problem is Copilot
| has enough sophistication to for example change variable
| names and make it very hard to do content matching. What I
| can guarantee is that it won't be able to generate novel
| code from scratch that performs a particular function
| (source: I have a PhD in ML). This brute-force
| way of modeling computer programs (using a language
| model) is just not sophisticated enough to be able to
| reason and generate high level concepts at least today.
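[Ed.: the renaming trick is cheap to counter in principle; a toy sketch (my own illustration, not anything Copilot or GitHub ships) that fingerprints Python code with identifiers blanked out, so two snippets differing only in variable names compare equal:]

```python
import io
import keyword
import tokenize

def fingerprint(code: str) -> str:
    # Replace every non-keyword identifier with a placeholder, and
    # drop layout tokens, so renaming variables or reformatting does
    # not change the fingerprint.
    parts = []
    for tok in tokenize.generate_tokens(io.StringIO(code).readline):
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            parts.append("ID")
        elif tok.type in (tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
                          tokenize.DEDENT, tokenize.ENDMARKER,
                          tokenize.COMMENT):
            continue
        else:
            parts.append(tok.string)
    return " ".join(parts)
```

Tools like MOSS have used this kind of structural fingerprinting against plagiarism for decades; renamed copies still hash identically.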
| DougBTX wrote:
| One way to look at these models is to say that they take
| raw input, convert it into a feature space, manipulate
| it, then output back as raw text. A nice example of this
| is neural style transfer, where the learnt features can
| distinguish content from style, so that the content can
| be remixed with a different style in feature space. I
| could certainly imagine evaluating the quality of those
| features on a scale spanning from rote-copying all the
| way up to human understanding, depending on the quality
| of the model.
| jozvolskyef wrote:
| Imagine for a second a model of the human brain that
| consists of three parts. 1) a vector of trillion inputs,
| 2) a black box, and 3) a vector of trillion outputs. At
| this level of abstraction, the human brain "pattern
| matches and replicates" just the same, except it is
| better at it.
| saurik wrote:
| Human brains are at least minimally recurrent, and are
| trained on data sets that are much wider and more complex
| than what we are handing GPT-3. I have done all of these
| standard thought experiments and even developed and
| trained my own neural networks back before there were
| libraries that allowed people to "dabble" in machine
| learning: if you consider the implications of humans
| being able to execute Turing-complete thoughts, it should
| become obvious that the human brain isn't merely doing
| pattern-anything... it _sometimes_ does, but you can't
| just conflate them and then call it a day.
| jozvolskyef wrote:
| The human brain isn't Turing-complete as that would
| require infinite memory. I'm not saying that GPT-3 is
| even close, but it is in the same category. I tried
| playing chess against it. According to chess.com, move 10
| was its first mistake, move 16 was its first blunder, and
| past move 20 it tried to make illegal moves. Try playing
| chess without a chessboard and not making an illegal
| move. It is difficult. Clearly it does understand chess
| enough not to make illegal moves as long as its working
| memory allows it to remember the game state.
| hollerith wrote:
| >The human brain isn't Turing-complete as that would
| require infinite memory
|
| A human brain with an unlimited supply of pencils and
| paper, then.
| robbedpeter wrote:
| Transformers do learn and abstract. Not as well as
| humans, but for whatever definition of innovation or
| creativity you wanna run with, these GPT models have it.
| It's not magic, it's math, but these programs are
| approximating the human function of media synthesis
| across narrowly limited domains.
|
| These aren't your crazy uncle's Markov chain chatbots.
| They're sophisticated Bayesian models trained to
| approximate the functions that produced the content used
| in training.
| visarga wrote:
| > this is a machine learning model that likes to
| copy/paste not just tiny bits of code but entire
| functions out of other peoples' projects
|
| Github could make a blacklist and tell Copilot never to
| suggest that code. Problem solved. You use one of the
| other 9 suggestions.
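A blacklist like that could plausibly work as an exact-match filter over normalized suggestion fingerprints. A sketch; the whitespace normalization and hashing scheme here are assumptions, not Github's actual mechanism:

```python
import hashlib

def normalize(code: str) -> str:
    # Collapse whitespace so trivial reformatting doesn't
    # defeat the blacklist.
    return " ".join(code.split())

def fingerprint(code: str) -> str:
    return hashlib.sha256(normalize(code).encode()).hexdigest()

class SuggestionFilter:
    def __init__(self, blacklisted_snippets):
        self.blocked = {fingerprint(s) for s in blacklisted_snippets}

    def allowed(self, suggestions):
        # Drop any suggestion whose fingerprint is blacklisted;
        # the user only ever sees the remaining ones.
        return [s for s in suggestions
                if fingerprint(s) not in self.blocked]
```

An exact filter like this is cheap, but it only catches verbatim (modulo whitespace) reproduction; near-copies with renamed variables would slip through.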
| eatbitseveryday wrote:
| > > As a human, I am allowed to read copyrighted code and
| learn from it.
|
| > Of course not. Reading some copyrighted code can make you
| entirely excluded from some jobs - you can't become a wine
| contributor if it can be shown you ever read Windows source
| code and most likely conversely.
|
| You can of course read the code. The consequences are thus
| increased limitations, like you say.
|
| What you mention is not an absolute restriction from reading
| copyrighted material. You perhaps have to cease other
| activities as a result.
| PragmaticPulp wrote:
| > Of course not. Reading some copyrighted code can have you
| entirely excluded from some jobs
|
| That's not a law. That's a cautionary decision made by those
| companies or projects to make it more difficult for
| competitors to argue that code was copied.
|
| Those projects could hire people familiar with competitor
| code and assign them to competing projects if they wanted.
| The contributors could, in theory, write new code without
| using proprietary knowledge from their other companies. In
| practice, that's actually really difficult to do and even
| more difficult to prove in court, so companies choose the
| safe option and avoid hiring anyone with that knowledge
| altogether.
|
| Now the question is whether or not GitHub's AI can be argued
| to have proprietary knowledge contained within. If your goal
| is to avoid any possibility that any court could argue that
| GitHub copilot funneled proprietary code (accessible to
| GitHub copilot) into your project, then you'd want to forbid
| contributors from using CoPilot.
| ithkuil wrote:
| In this case, though, we have a machine learning model that
| is trained on some code and is not merely learning abstract
| concepts to be applied generally in different domains, but
| instead can use that knowledge to produce code that looks
| pretty much the same as the learning material, given the
| context that fits the learning material.
|
| If humans did that, it would be hard to argue they didn't
| outright copy the source.
|
| When a machine does it, does it matter if the machine
| literally copied it from sources, or first transformed it
| into an isomorphic model in its "head" before regurgitating
| it back?
|
| If yes, why doesn't parsing the source into an AST and then
| rendering it back also insulate you from having to abide by
| the copyright?
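The AST thought experiment is easy to demonstrate with Python's own standard library: parse the source into a tree, then render it back out (`ast.unparse` needs Python 3.9+):

```python
import ast

source = """
def invert(node):
    if node is None:
        return None
    node.left, node.right = invert(node.right), invert(node.left)
    return node
"""

# Parse into an abstract syntax tree, then render it back to text.
tree = ast.parse(source)
round_tripped = ast.unparse(tree)
print(round_tripped)
# Comments and formatting are gone, but the logic - and, one would
# think, the copyright status - is unchanged.
```

The round trip changes surface form only; no court would treat the re-rendered text as a new work, which is the point of the question above.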
| dTal wrote:
| >When a machine does it, does it matter if the machine
| literally copied it from sources, or first transformed it
| into an isomorphic model in its "head" before
| regurgitating it back?
|
| You've hit the nail on the head here. If this is okay,
| then neural nets are simply machines for laundering IP.
| We don't worry about people memorizing proprietary source
| code and "accidentally" using it because it's virtually
| impossible for a human to do that without realizing it.
| But it's trivial for a neural net to do it, so
| comparisons to humans applying their knowledge are
| flawed.
| visarga wrote:
| This is not such a big problem in reality because the
| output of Copilot can be filtered to exclude snippets too
| similar to the training data, or any corpus of code you
| want to avoid. It's much easier to guarantee clean output
| than to train the model in the first place.
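A similarity filter of the kind proposed here could be sketched with token n-gram overlap (Jaccard similarity). The crude whitespace tokenization and the 0.5 threshold are my assumptions for illustration, not Github's actual method:

```python
def ngrams(code: str, n: int = 5) -> set:
    # Crude whitespace tokenization; a real system would use a lexer.
    tokens = code.split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def similarity(a: str, b: str) -> float:
    # Jaccard similarity over n-gram "shingles".
    ga, gb = ngrams(a), ngrams(b)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)

def too_similar(suggestion: str, corpus, threshold: float = 0.5) -> bool:
    # Reject a suggestion that shares too many 5-gram shingles
    # with any document in the protected corpus.
    return any(similarity(suggestion, doc) >= threshold
               for doc in corpus)
```

Unlike an exact-hash blacklist, shingle overlap survives small edits such as renamed variables, at the cost of scanning the protected corpus (or an index of it) per suggestion.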
| temac wrote:
| I will completely follow that opinion the day MS includes the
| whole Windows codebase in the training of Copilot.
|
| Until then, it's basically "GPL" (and other licences)
| laundering with one-sided excuses.
| [deleted]
| abeppu wrote:
| Well, maybe the interpretation will change if the right people
| are pissed off.
|
| At this point, how hard would it be to produce a structurally
| similar "content-aware continuation/fill" for audio producers,
| film makers, etc, which suggests audio snippets or film
| snippets, trained from copyrighted source material?
|
| If prompted by a black screen with some white dots, the video
| tool could suggest a sequence of frames beginning with text
| streaming into the distance "A long time ago in a galaxy far
| far away ..." and continue from there.
|
| Normally we don't try to train models to regurgitate their
| inputs, but if we actually tried, I'm sure one could be made to
| reproduce the White Album or Thriller or whatever else.
| visarga wrote:
| NeRFs (neural radiance fields) are neural nets that exactly
| encode one input, kind of like a JPEG. They can reconstruct
| from novel viewpoints.
| throwaway_egbs wrote:
| Just when I thought tweetstorms couldn't get any worse, here's
| one where every tweet is a quote-tweet of the author. I don't
| even understand how I'm supposed to read this.
|
| > Copyright has concluded that reading by robots doesn't count.
| Infringement is for humans only; when computers do it, it's
| fair use.
|
| Surely there's a limit to this. If I use a machine to produce
| something that just happens to exactly match a copyrighted
| work, now it's not infringement because of the method I used to
| produce it? That seems nonsensical, but maybe there's precedent
| for this too? (I have no idea what I'm talking about.)
| neolog wrote:
| Ctrl-c is a robot, so copyright doesn't apply to it
| rcxdude wrote:
| That quote is basically entirely nonsensical. 'copyright'
| hasn't decided anything (nor has any legislative body nor the
| courts). All that's happened is that OpenAI has put forward
| an argument that using large quantities of media scraped from
| the internet as training data is fair use. This argument for
| the most part does not rely on the human vs machine
| distinction (in fact it leans on the idea that the process is
| not so different from a human learning). The main place this
| comes up is the final test of damage to the original in terms
| of lost market share where it's argued that because it's a
| machine consuming the content there's no loss of audience to
| the creator (which is probably better phrased as the people
| training the neural net weren't going to pay for it anyway).
| A lot does ride on the idea that the neural net, if 'well
| designed', does not generally regurgitate its training data
| verbatim, which is in fairly hot dispute at the moment.
| OpenAI somewhat punts on this situation and basically says
| the output may infringe copyright in this case, but the
| copyright holder should sue whoever's generating and using
| the output from the net, not the person who trained and
| distributed the net.
| discreteevent wrote:
| Surely it could be argued that there is a loss of audience
| to the author. At the moment some people will read the
| author's code directly in order to find out how to solve a
| problem. In the future at least some of those people will
| just ask copilot to solve the problem for them.
| noobermin wrote:
| This argument is very convenient for OpenAI.
| niekverw wrote:
| > Copyright has concluded that reading by robots doesn't count.
| Infringement is for humans only; when computers do it, it's
| fair use.
|
| This is silly. Copilot is not reading by itself; someone
| pushed buttons telling it to read and write. If I clone the
| entire GitHub without the licenses, I am telling a robot to
| do it; that doesn't make it right.
| dehrmann wrote:
| I think the law will allow what copilot eventually becomes. As
| others have said, right now, it's too apt to reproduce code
| verbatim.
| intricatedetail wrote:
| There is a difference between a human learning and a
| multi-billion-dollar company training its models without
| paying a penny.
|
| If they say it is just like a user, maybe they should start
| paying taxes like individuals without access to creative
| accountants pay.
|
| Leeches without morals - Micro$oft
| IncRnd wrote:
| > Nobody is really being hurt when a new tool makes it easier
| to copy little bits of code from the internet.
|
| That's the first time I've heard copilot get described as
| copying little bits of code from the Internet. Copilot
| aggregates all github source code, removes licences from the
| code, and regurgitates the code without licenses.
|
| Furthermore, both github and the programmers using copilot know
| this. Look at any one of these threads written by programmers
| about copilot. Using copilot is knowingly stealing the source
| code of others without attribution. Using copilot is literally
| humans stealing source code from others. Copilot was written
| _for the purpose_ of taking others' code.
| IfOnlyYouKnew wrote:
| It's not "literally" stealing, because it doesn't deprive
| anyone of the use of the source code. Those two points were
| somehow extremely obvious to everyone here as long as it was
| music and movies we were talking about.
|
| And Github themselves have stated that only 0.1% of the
| Copilot output contains chunks taken verbatim from the
| learning set. Of those, the vast majority are likely to be
| boilerplate so generic it's silly to claim ownership, and
| maybe sometimes impossible to avoid.
| IncRnd wrote:
| > It's not "literally" stealing, because it doesn't deprive
| anyone of the use the source code.
|
| That's simply not true. You might be confusing idealism
| about software freedom with how both law and society define
| theft.
|
| Edit: In this comment I refer to the US.
| mdpye wrote:
| It is actually true; in the UK at least, the legal
| definition of theft includes the deprivation of the owner
| of the property in question.
|
| The copyright lobby hedge the term as "copyright theft"
| (i.e. not _actual_ theft) in order to shift the societal
| understanding. Which appears to have worked.
|
| This is not a value judgement on copyright infringement.
| Just that technically it doesn't meet the legal
| definition of theft.
|
| cf. The rather amusing satire of the "you wouldn't steal
| a handbag" campaign in the UK, which ran "you wouldn't
| download a bear!"
| IncRnd wrote:
| Yes! Thank you. I should have clarified that I meant
| within the US.
| mdpye wrote:
| Oh, then today I learned! I didn't realise they were
| different. Just looked it up in a "plain English
| dictionary of law" and the distinction seems subtle but
| important. Rather than "with the intention of depriving
| the owner", the US one says "with the intention of
| converting it to their use", which seems broad enough to
| cover exploiting a copy, rather than the original (or
| only, in the physical realm...)
| a3w wrote:
| In Germany, there is no fair use exception to copyright.
| Also, there is no IP in most software principles: e.g.
| writing a specific loop, which even a (weak) AI could
| suggest, would probably be too simple to be protected.
|
| What could be valid is a right to not mimic collections, but
| that would mean you cannot clone the Copilot, as input is
| mapped to a non-trivial collection of outputs.
|
| Disclaimer: IANAL, but I do dabble in IT-law.
| shakow wrote:
| > Copyright has concluded that reading by robots doesn't count.
|
| Until someone trains a DNN to generate Mickey Mouse-like
| cartoons I assume.
| dmitriid wrote:
| There was a joke that all ML will be immediately banned the
| moment there's a Copilot for RIAA-licensed songs.
| yumraj wrote:
| It all comes down to this: this has not been tested in
| court. The above opinion, or for that matter any opinion from
| any lawyer or not-a-lawyer, is just that, an _opinion_.
|
| As a business it is your responsibility to determine if this
| code-copying is worth a risk to your business.
|
| Based on my experience, I'm pretty sure all corporate lawyers
| will disallow such code copying until it has been tested in
| court. It's just a matter of who will be the guinea pig.
| pubby wrote:
| The issue isn't an AI reading copyrighted code, the issue is an
| AI regurgitating the lines of copyrighted code verbatim. To be
| clear, humans aren't allowed to do this either.
|
| And sure, nobody cares about your stupid binary tree, but do
| they care about GNU and the Linux kernel? Imagine someone
| trained an AI to specifically output Linux code, and used it to
| reproduce a working OS. Is that fair?
| PaulDavisThe1st wrote:
| > the issue is an AI regurgitating the lines of copyrighted
| code verbatim. To be clear, humans aren't allowed to do this
| either.
|
| That's a little broad. There's a wide range of licenses for
| software that explicitly allow precisely this.
| temac wrote:
| Tons of licences require at least attribution.
| lucideer wrote:
| There's a lot of sibling commenters disagreeing with this take
| but I think they miss that ultimately this comes down to how
| legal experts interpret tech, rather than how tech experts
| think the law should apply.
|
| This is, imo, unfortunate, as often the legal interpretation is
| based on a gross misunderstanding of how the tech works, but
| this is the way.
|
| I don't think copilot should be legal according to my own
| interpretation but in this (rare) case I feel the "IANAL" tag
| applies not because I lack (legal) knowledge, but rather
| because I have (tech) knowledge that is likely absent from
| actual decision making on legal outcomes (therefore leading to
| different legal outcomes than how I would see things working).
| 41209 wrote:
| Copilot is lifting entire functions from GPL code. Legal
| technicality aside, I know I'd be upset if I GPL'ed some code
| and someone stole large parts of it.
| josefx wrote:
| > Copyright has concluded that reading by robots doesn't count.
| Infringement is for humans only; when computers do it, it's
| fair use.
|
| So wait, if I write my own AI, let's call it cp, and train it on
| gnu-gcc.tar.gz with the goal of creating a commercial-
| compiler.tar.gz then I can license the result any way I want?
| After all most of the work was done by the computer.
| axismundi wrote:
| Sorry, you can't. You are not rich enough to get away with
| it.
| pdonis wrote:
| _> nobody cares if my ten-line "how to invert a binary tree"
| snippet is the same as someone else's._
|
| Maybe nobody cares about that, but the problem is that Github's
| automated tool is not telling you what code it shows you is
| actually an exact copy of existing code, or how much of that
| existing code is being copied, or whether the existing code is
| licensed, or, if it is licensed, whether your copying is in
| accordance with the license or not. And without that
| information you can't possibly know whether what you are doing
| is legal or ethical. Sure, you could try to guess, but that
| sort of thing is not supposed to rely on guessing.
| niekverw wrote:
| "I am not a lawyer,"
|
| STOP READING
| nabilhat wrote:
| Autonomous programming will be explored. Potentially, Copilot
| is a proof of concept, an early step in that direction. If it
| is, the corrections made by Copilot users will be applied to
| the development of the future of unattended programming.
| Whether it is or not, it's close enough that any legal outcomes
| experienced by Copilot users will contribute to the definition
| of liability boundaries relevant to the future of autonomous
| programming. Copilot users are numerous enough that the risk
| of any one of them ending up under the foot of a copyright
| owner with the means and will to crush a user is low, but no
| one should take such a risk just to use a novelty like
| Copilot in production code.
| robbrown451 wrote:
| "reading by robots doesn't count."
|
| It should be obvious that if the robot is simply scraping web
| sites and reproducing their text verbatim (without permission
| and without giving credit) that would be an infringement.
|
| There are a lot of shades of gray between that and the other
| extreme, which is where it is scraping millions of sites,
| learning from them, and producing something that isn't all that
| similar to any of them. Both ends of the spectrum, and
| everywhere in between, are things that humans can do, but as
| machines get more capable this is getting trickier and trickier
| to sort out.
|
| In this case, it sounds like it might be closer to the first
| example, since significant parts of the code will be verbatim.
|
| Ultimately, I am hoping that such things cause us to completely
| rethink copyright law. The blurriness of it all is becoming too
| much to make laws around. We just need better mechanisms to
| reward people for creating valuable IP that they allow people
| to freely use as they please.
| IfOnlyYouKnew wrote:
| Copyright requires a certain amount of creativity involved in
| its creation. I strongly suspect most code snippets of a few
| lines just don't qualify.
| blibble wrote:
| there's a nice example here of it reproducing carmack's famous
| inverse square root function from Quake 3 (sans GPL, of course)
|
| https://twitter.com/mitsuhiko/status/1410886329924194309
|
| this is clearly copyright infringement, and if it isn't: it
| should be
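For reference, the function in question is built on a float-to-integer bit reinterpretation plus one Newton-Raphson refinement step. A Python rendering of the idea (a sketch of the technique, not the GPL'd original source):

```python
import struct

def fast_inverse_sqrt(x: float) -> float:
    """Approximate 1/sqrt(x) via the famous 32-bit float bit trick."""
    # Reinterpret the float's IEEE-754 bits as an unsigned 32-bit int.
    i = struct.unpack('<I', struct.pack('<f', x))[0]
    # The magic-constant shift-and-subtract gives a rough first guess.
    i = 0x5F3759DF - (i >> 1)
    y = struct.unpack('<f', struct.pack('<I', i))[0]
    # One Newton-Raphson iteration sharpens the estimate.
    y = y * (1.5 - 0.5 * x * y * y)
    return y
```

That a model can emit this routine verbatim, magic constant and all, is precisely why it is such a clean test case: the constant 0x5F3759DF is distinctive enough that verbatim output is unmistakable.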
| [deleted]
| dang wrote:
| _Copilot regurgitating Quake code, including sweary comments_
| - https://news.ycombinator.com/item?id=27710287 - July 2021
| (625 comments)
| boxfire wrote:
| So what happens when someone makes a transformer network that
| can read fanfics and animate them live trained from the whole
| collection of MPAA movies? I mean its inevitable. Given the
| history of the MPAA, I don't think they're gonna lie down and
| just take it. I feel like we're on a slippery slope to provoking
| the "IP lords" into brutally draconian measures that will make
| the Disney copyright extensions look like a tax deferral.
| runawaybottle wrote:
| We are toeing the crater line. Quite frankly, there's clear
| evidence that humans have little regard for plagiarism versus
| inspiration.
|
| Will co-pilot offer royalties for auto suggestions that are
| committed to code bases? I'm sure our ML can track how
| similar the commits were.
|
| It's always fascinating to me how we have the tech to take,
| but never to give. Pay the motherfucker you stole this shit
| from.
|
| The proverbial: https://youtu.be/6TLo4Z_LWu4
| croes wrote:
| Looks like more than a minor infringement
|
| https://news.ycombinator.com/item?id=27710287
|
| And reading is no infringement, but writing may be.
| jhgb wrote:
| > Infringement is for humans only; when computers do it, it's
| fair use.
|
| But ultimately the human is OK-ing the code and committing it,
| basically as his own work most of the time. I'm reasonably sure
| that this may matter to courts.
| erhk wrote:
| An AI isn't learning from it. It's effectively copying prior
| work when it solves a problem. There is no novel out-of-bounds
| data generation by modern AI approaches.
| devinplatt wrote:
| > This product injects source code derived from copyrighted
| sources into the software of their customers without informing
| them of the license of the original source code. This
| significantly eases unauthorized and unlicensed use of a
| copyright holder's work.
|
| It appears that GitHub wishes to address this issue via UI
| changes to Copilot. A quote from a recent post on GitHub[0]:
|
| > When a suggestion contains snippets copied from the training
| set, the UI should simply tell you where it's quoted from. You
| can then either include proper attribution or decide against
| using that code altogether.
|
| > This duplication search is not yet integrated into the
| technical preview, but we plan to do so. And we will both
| continue to work on decreasing rates of recitation, and on making
| its detection more precise.
|
| That post is also on the Hacker News front page right now[1],
| but has 10% of the upvotes of this post, so it's less visible.
|
| I'm hoping all the criticism will encourage GitHub to make a
| better product.
|
| [0]: https://docs.github.com/en/github/copilot/research-
| recitatio...
|
| [1]: https://news.ycombinator.com/item?id=27723710
| emersonrsantos wrote:
| Copilot assumes the code in the repo is right, so just start
| putting some wrong code there as an anti-Copilot measure.
| Engineering-MD wrote:
| Hide the hay in a pile of rotten hay as it were.
| darnfish wrote:
| People really sign up without reading the Terms and
| Conditions, then complain when GitHub decides to do something
| with the data that they've given it permission to use under
| the ToS.
| Engineering-MD wrote:
| A tiny percentage (less than 1%)[0] of people read terms and
| conditions - they are long, repetitive and often in legal
| language. If you were to read every terms and conditions and
| privacy policy (and every change thereof), you would spend
| over 240 hours a year doing so.[1]
|
| [0] Bakos, Y., Marotta-Wurgler, F. and Trossen, D. R. (2014)
| 'Does Anyone Read the Fine Print? Consumer Attention to
| Standard-Form Contracts', The Journal of Legal Studies, 43(1),
| pp. 1-35. doi: 10.1086/674424.
|
| [1] McDonald, A. M. and Cranor, L. F. (2008) 'The Cost of
| Reading Privacy Policies', A Journal of Law and Policy for the
| Information Society, 4(3), pp. 543-568.
| calvinmorrison wrote:
| I abandoned GitHub when they took code that was not licensed
| (i.e. copyright retained), reproduced it, and saved it in
| their Arctic Vault without the author's consent (mine).
| Retr0id wrote:
| How is the Arctic Vault different from any other offsite
| backup?
|
| I suppose one issue is that you (presumably) can't request
| deletion from it (which may even be a GDPR violation).
|
| Edit: I looked up the relevant GDPR stuff, apparently there's
| an exemption for when "erasing your data would prejudice
| scientific or historical research, or archiving that is in the
| public interest.", which arguably covers the Arctic Vault.
| dheera wrote:
| GDPR only applies to EU users.
| Retr0id wrote:
| Arctic Vault includes code written by EU users, and there
| is similar legislation in non-EU jurisdictions, e.g.
| California's CCPA
| corty wrote:
| There is an exception paragraph for various kinds of archives
| in GDPR: https://www.privacy-
| regulation.eu/en/article-89-safeguards-a...
| wcerfgba wrote:
| What's wrong with the Arctic Code Vault [1]? Is the only
| problem that they didn't seek your consent? How is it different
| to deploying a new availability zone and having your public
| repos accessible on another server? Your code is preserved
| verbatim, and it's not possible for GitHub to provide their
| service without the right to make verbatim copies of your code,
| which presumably you agreed to as part of their ToS.
|
| [1] https://archiveprogram.github.com/arctic-vault/
| calvinmorrison wrote:
| I guess copying my code to microfiche is basically reprinting
| it without my permission.
| Dylan16807 wrote:
| But LTO is fine? I was going to ask if it was because it's
| not _intended_ as a backup, but that's not even true: this
| _is_ intended as a backup on a long time scale.
| dmitriid wrote:
| > What's wrong with the Arctic Code Vault
|
| It's nothing more than a publicity stunt whose one and only
| purpose is to advertise GitHub.
| lifthrasiir wrote:
| Github does not own the Arctic Vault, there is an independent
| company behind it [1]. Given its purpose as a long-term
| archive, it is likely that exemptions to the copyright for
| (library) archival can apply here. [EDIT: This is probably not
| true, see the reply for the reason.]
|
| [1] https://www.piql.com/awa/
| dmitriid wrote:
| > Github does not own the Arctic Vault, there is an
| independent company behind it
|
| Github are the ones doing all the archiving. So, in essence,
| they _do_ own that. Piql are just the ones providing the
| storage: it's a commercial for-profit entity employed for
| backup by another commercial for-profit entity.
| lifthrasiir wrote:
| It is technically true, but the Arctic World Archive
| specifically "accepts deposits that are globally
| significant for the benefit of future generations, as well
| as information that is significant to your organisation or
| to you individually" [1]. So it doesn't accept just any
| data (at least as far as I can see), and the Github archive
| should also have met these criteria.
|
| By the way, my initial statement that it may qualify for
| copyright exemptions turned out to be false for a different
| reason. They only apply when the library and/or archive in
| question is open to the public, and the Github Arctic Vault
| isn't. Thus I think it's actually Github's generic usage
| grant in the ToS [2] that allows for the Vault. Copilot
| is, of course, very different from anything described in the
| ToS.
|
| [1] https://arcticworldarchive.org/contribute/
|
| [2] https://docs.github.com/en/github/site-policy/github-
| terms-o...
| dmitriid wrote:
| > but the Arctic World Archive specifically...
|
| ...provides prime-rate marketing bullshit in its
| marketing materials
|
| > Thus I think it's actually a Github's generic usage
| grant in the ToS
|
| If you refer to Section D.4, then:
|
| - Arctic Vault is not "for future generations", but for
| GitHub only, since that section doesn't permit GitHum to
| just make copies willy-nilly for anything other than "as
| necessary to provide the Service, including improving the
| Service over time" and "make backups"
|
| - This specifically makes GitHub "the owner" of that
| data, and not "some third-party" as you originally
| suggested
| lifthrasiir wrote:
| If you insist on the term "owner" for copyright grants, you
| have a faulty understanding of copyright. The terms of
| service, much like software license, only allows for the
| licensee to do some specific things (in this case,
| including backups) under certain circumstances agreed
| upon in advance. Copyright assignment, which is akin to
| the ownership transfer, is much harder.
|
| > This specifically makes GitHub "the owner" of that
| data, and not "some third-party" as you originally
| suggested
|
| This one is my fault though: I treated the "Arctic Vault"
| as an archival site, but as I later realized it is a
| Github archive stored in the Arctic World Archive. So
| yeah, it's (only) Github that can retrieve the data.
| CognitiveLens wrote:
| I haven't read this interpretation of the Arctic Vault project
| - presumably most users of GitHub are okay with their code
| being reproduced/backed up across many production servers for
| fault tolerance. Making an 'extra special' long-term backup in
| the Arctic Vault doesn't seem like a meaningfully different
| action to me - i.e. using a cloud-based host is essentially
| opting in to this kind of 'license violation'.
|
| If they had taken one of their existing DB/disk backups and
| called it a vault, would that have been an issue?
| pmarreck wrote:
| Should I agree with this guy if I believe all software should be
| open-source? I don't think snippets of code have copyright
| strength; we pass them around constantly in Slack chatrooms, IRC
| and Stackoverflow...
___________________________________________________________________
(page generated 2021-07-03 23:00 UTC)