hngopher.com

       [HN Gopher] Copilot sells code other people wrote
       ___________________________________________________________________
        
       Copilot sells code other people wrote
        
       Author : joemanaco
       Score  : 621 points
       Date   : 2022-06-23 08:48 UTC (14 hours ago)
        
 (HTM) web link (twitter.com)
 (TXT) w3m dump (twitter.com)
        
       | dgb23 wrote:
       | Is it smart enough to:
       | 
       | - respect attribution
       | 
       | - respect copyleft
       | 
       | - respect proprietary licences
       | 
       | - give the user appropriate hints about the above
       | 
       | Or does it just copy code without doing any of this?
        
         | spupe wrote:
         | No, it doesn't do any of that. However, it does not "copy code"
         | except in marginal cases; the far more common scenario is
         | that it suggests very basic code akin to a Stack Overflow
         | reply.
        
           | dgb23 wrote:
           | I read a lot of open source code and might subconsciously
           | absorb techniques and patterns that are common. When I write
           | code I might be influenced by what I read, not line by line,
           | but rather generally.
           | 
           | Is it like that?
        
             | spupe wrote:
             | Kinda, but I think you are imagining something bigger than
             | it is. At least in my experience, it works well for simple
             | stuff like "iterate over x and extract y" or similar
             | queries that I imagine are well represented in its training
             | data. When you get to very specific functions, its answer
             | will be less reliable and more likely to be a wonky rehash
             | of the few examples it has for that case.
        
       | pabs3 wrote:
       | I wonder if FOSS folks could copyleft originally public/leaked
       | but proprietary code using CoPilot.
        
       | yaseer wrote:
       | Technically, programmers search, copy and modify code all the
       | time.
       | 
       | One might argue copilot puts into software an algorithm that
       | humans are already doing. Software like that is usually
       | inevitable.
       | 
       | Still, it sucks there's no benefit for the contributors.
       | 
       | The most ethical thing I can think of is some kinda 'Spotify-
       | like' revenue sharing model, based on how often their code is
       | used by others. Not that they'd ever implement that if they can
       | get away with it!
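
The "Spotify-like" split proposed above can be pictured as a pro-rata payout. This is only a sketch under strong assumptions: it presumes per-author usage counts are somehow available, and the function name, author names, and figures are all hypothetical.

```python
def pro_rata_payouts(revenue_pool: float, usage_counts: dict) -> dict:
    """Split a revenue pool among authors in proportion to usage counts.

    Hypothetical illustration: assumes usage can be attributed per author.
    """
    total = sum(usage_counts.values())
    if total == 0:
        # Nothing was used, so nobody is owed anything.
        return {author: 0.0 for author in usage_counts}
    return {
        author: revenue_pool * count / total
        for author, count in usage_counts.items()
    }


# Made-up numbers: a pool of 100.0 split across two authors.
print(pro_rata_payouts(100.0, {"alice": 1, "bob": 3}))
```

The arithmetic is the easy part; the sketch says nothing about how usage would be measured.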
        
         | omnicognate wrote:
         | > One might argue copilot puts into software an algorithm that
         | humans are already doing.
         | 
         | That argument only works if you think what Copilot is doing is
         | meaningfully similar to what humans are doing. The debate about
         | how these models relate to human thought might have legal
         | implications.
         | 
         | As I understand it (IANAL) copyright doesn't protect ideas and
         | concepts. It protects the content itself. In theory, if I read
         | some copyrighted work, understand some idea in it and then
         | create a new work using that idea, without copying that
         | original work, then that is not a derivative work. (I think
         | this is at least how it's supposed to work - would love to be
         | corrected if that's wrong.)
         | 
         | So if I took a copyrighted work and rot-13ed it before
         | distributing copies, I think that would be clear copyright
         | violation, but if I made my own works using concepts I gleaned
         | from reading it, it wouldn't be.
         | 
         | So should Copilot be treated like the rot13 algorithm or like
         | me understanding concepts and generating new works using them?
         | That sounds like a fascinating legal debate to be had.
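
For readers unfamiliar with rot13, a minimal sketch of why it serves as the "pure copying" end of the comparison above: it is a mechanical, reversible re-encoding, so the original text is fully recoverable from the output. The helper name `rot13` and the sample string are illustrative only.

```python
import codecs


def rot13(text: str) -> str:
    """Rotate each ASCII letter 13 places; other characters pass through."""
    return codecs.encode(text, "rot_13")


original = "Copyrighted work"
scrambled = rot13(original)

# The output looks different, but it is a one-to-one re-encoding:
# applying the same transform again recovers the input exactly.
print(scrambled)
print(rot13(scrambled) == original)
```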
        
         | teakettle42 wrote:
         | > Technically, programmers search, copy and modify code all the
         | time.
         | 
         | When following the license terms, preserving the original
         | copyright, etc, sure.
         | 
         | However, honest, ethical people (including programmers) do not
         | plagiarize.
         | 
         | Copying and pasting code without attribution is plagiarism.
         | Doing it without following the licensing terms is a copyright
         | violation.
        
           | redox99 wrote:
           | I don't consider copying a 3 liner from stack overflow and
           | not writing an attribution plagiarizing (regardless if
           | technically speaking it is or isn't according to the law).
        
             | teakettle42 wrote:
             | Plagiarism isn't a legal concept, it's an ethical one.
             | 
             | You need to either attribute the source, or rewrite it in
             | entirely your own words -- just like when writing a paper.
             | 
             | Conforming to the license is also required; iirc, SO
             | requires attribution under the CC BY-SA license.
        
               | redox99 wrote:
               | > Plagiarism isn't a legal concept, it's an ethical one.
               | 
               | Well if it isn't a legal but an ethical concept, then
               | that's just your opinion, since there isn't some
               | universal body that establishes exactly what is ethical
               | and what isn't. And as I said in my previous comment,
               | "_I_ don't consider".
               | 
               | > You need to either attribute the source, or rewrite it
               | in entirely your own words -- just like when writing a
               | paper.
               | 
               | Oftentimes a three-liner cannot be changed in any way,
               | and is the _only_ solution to a problem. In _some_ cases
               | you may be able to change it only in terms of indentation
               | and variable names (in others you can't even change
               | that).
               | 
               | But assuming you can do that, it makes no sense at all
               | just changing indentation and variable names just for the
               | sake of changing it.
               | 
               | > Conforming to the license is also required; iirc, SO
               | requires attribution under the CC BY-SA license.
               | 
               | As I said I'm not talking about the legalities.
               | 
               | https://stackoverflow.com/questions/55319570/how-can-i-
               | raise...
               | 
               | Are you going to attribute that every time you use
               | Math.pow?
        
               | teakettle42 wrote:
               | > Well if it isn't a legal but an ethical concept, then
               | that's just your opinion
               | 
               | Plagiarism being unethical is just _my_ opinion?
               | 
               | > Are you going to attribute that every time you use
               | Math.pow?
               | 
               | Does a simple 2-ary function call of a well-defined API
               | qualify as "taking someone else's work or ideas and
               | passing them off as one's own."?
               | 
               | If not, then it's not plagiarism.
        
               | redox99 wrote:
               | > Plagiarism being unethical is just my opinion?
               | 
               | What constitutes plagiarism and what doesn't, outside
               | of what the law says, yes.
               | 
               | > Does a simple 2-ary function call of a well-defined API
               | qualify as "taking someone else's work or ideas and
               | passing them off as one's own."?
               | 
               | So you agree that taking some code verbatim from SO is
               | not plagiarism then?
               | 
               | What about this, would copy pasting this verbatim be
               | plagiarism?
               | 
               | https://stackoverflow.com/a/959004
               | 
               | And this?
               | 
               | https://stackoverflow.com/a/45049763
        
               | teakettle42 wrote:
               | > What constitutes as plagiarism and what doesn't,
               | outside of what the law says, yes.
               | 
               | It's pretty clear what it is.
               | 
               | The definition of plagiarism hasn't changed since you
               | were in grade school and were taught not to copy
               | sentences into your papers.
               | 
               | If you still don't understand what plagiarism is now,
               | yours is a willful ignorance that doesn't excuse
               | unethical behavior.
               | 
               | > What about this, would copy pasting this verbatim be
               | plagiarism
               | 
               | > https://stackoverflow.com/a/959004
               | 
               | Yes, that'd be plagiarism. It's also bad code.
               | 
               | You should use the example to understand the underlying
               | problem, at which point you will be well-equipped to
               | write your own one-liner.
               | 
               | If you can't write it using your own understanding of the
               | problem, you're not an adequate programmer and need to
               | improve your skill-set ... which won't happen if you just
               | keep plagiarizing code you don't understand.
        
               | redox99 wrote:
               | You're basically just repeating that your opinion is the
               | right opinion.
               | 
               | I don't agree that such an example is plagiarism, and I'm
               | sure a lot of people would also disagree that that's
               | plagiarism.
               | 
               | > You should use the example to understand the underlying
               | problem, at which point you will be well-equipped to
               | write your own one-liner.
               | 
               | > If you can't write it using your own understanding of
               | the problem, you're not an adequate programmer and need
               | to improve your skill-set ... which won't happen if you
               | just keep plagiarizing code you don't understand.
               | 
               | Who says you can't write it on your own, or you don't
               | understand it? Stack overflow and tools such as copilot
               | are often about saving time, not that you would be unable
               | to figure it out by yourself.
               | 
               | And besides that, the point of those examples is that a
               | lot of people without searching for those stack overflow
               | posts, would type that exact same code character by
               | character.
        
         | kaibee wrote:
         | > The most ethical thing I can think of is some kinda 'Spotify-
         | like' revenue sharing model, based on how often their code is
         | used by others. Not that they'd ever implement that if they can
         | get away with it!
         | 
         | Based on my understanding of how NNs work, I'm not sure it's
         | even possible to implement something like that.
        
       | bborud wrote:
       | My personal reasons for _not_ using copilot are a bit simpler. I
       | believe the act of researching which solutions to use for a given
       | problem is not so much about time, or the code you end up with,
       | but about developing a better understanding of what you are
       | doing. You may end up just cutting, pasting and modifying a piece
       | of code you found, but hopefully, you were exposed to a few
       | different ways to accomplish the same thing, and it made you
       | aware of other choices that could have been made.
       | 
       | You could think of the evolution of practical problem solving in
       | software engineering like this:
       | 
       | 1. I have to invent a solution (because nobody else in the world
       |    has a computer)
       | 2. I have to know of a solution (education, word of mouth...)
       | 3. I have to look up a solution in the books I have
       |    (commoditized knowledge)
       | 4. I can look up solutions on the internet <-- (we are here)
       | 5. The computer suggests something and I accept (some are here
       |    too)
       | 
       | From 1 to 4 the amount of cleverness required to solve small
       | problems drops a bit, but your productivity and exposure to
       | knowledge probably goes up.
       | 
       | I'm not quite sure what happens from 4 to 5. Personally I'm
       | actually more interested in the context solutions are presented
       | in than just the solution. In fact, I rarely copy and paste code
       | from the Internet, but I often look at multiple
       | suggestions/solutions and then borrow ideas or combine ideas from
       | several sources.
        
         | ok123456 wrote:
         | It replaces a few google searches to look up how to do
         | something with a new language or library. Keeping you in your
         | editor and from having to context switch, and possibly
         | distract/derail you, is worth it.
        
         | kraftman wrote:
         | I would be interested to know how many people are actually
         | using copilot to generate entire chunks of code that they don't
         | understand. For me it's just autocomplete on steroids, it's not
         | answering any questions I don't know the answer to (other than
         | syntax I've forgotten), it's just making the boilerplate faster
         | to write so I can think about the actual problem I need to
         | solve.
        
           | tartoran wrote:
           | Not using copilot, but if I did I'd use it in the way you
           | expressed as well, just for plumbing and tedious stuff.
        
         | Yenrabbit wrote:
         | At least the way I use it, it's not taking much away from my
         | problem solving. It's just that instead of having to type
         | `particlesGeometry.setAttribute('position', new
         | THREE.BufferAttribute(positions, 3))` I just write `//Add as an
         | attribute` and then hit TAB, since Copilot is smart enough to
         | see that I've just prepared some geometry and populated an
         | array of positions (both operations also sped up by not having
         | to type the obvious bits). You're still having to think through
         | the solutions (I'm not just typing '//make a cool particle
         | sim') but no longer need to hit SO every few minutes for syntax
         | examples when using a new library or something.
        
           | ModernMech wrote:
           | That sounds like a problem that could be better solved
           | through language and library design rather than an AI that
           | sucks up all the code in the world.
        
             | williamcotton wrote:
             | And yet after all of these decades, after countless
             | advances in libraries and languages, I am still writing
             | boilerplate in C, JS, Python, et al.
             | 
             | I'm not sure that a language or library can ever understand
             | the context of code without following an ML approach.
             | 
             | Languages and libraries will always allow for more than the
             | immediate task at hand. The innovation is that this tool
             | understands which specific language or library features are
             | probably going to be needed next!
        
               | aembleton wrote:
               | Frustratingly I've had it insert Java code into Kotlin.
        
           | tartoran wrote:
           | So you're just getting a helping hand doing code plumbing.
           | That sounds good as long as you let the helper take the lead
        
       | spupe wrote:
       | I disagree. Copilot is selling content-aware code suggestions,
       | which is a result of code that other people wrote in their
       | platform, and which in no way affects the work of these people.
        
       | boomer_joe wrote:
       | We need a licence that forbids use in ML and the people willing
       | to sue github for it ASAP.
        
         | ilikehurdles wrote:
         | But using it in a GitHub project would be akin to those
         | Facebook comments that demand the company not monetize them.
        
       | andrelaszlo wrote:
       | There are a few reasons why this could be considered ethical.
       | First, open-source code is typically free to use, so the company
       | would not be taking advantage of anyone by using it to train
       | their AI. Second, the company would be providing a service that
       | people are willing to pay for, so they would be generating value
       | for society. Third, the company would be transparent about what
       | they are doing and would not be hiding anything from the public.
       | 
       | ...the above was generated by GPT-3 (text-davinci-002). Prompt:
       | Write an argument for why using open-source code to train an AI
       | and then sell the code generating service (without open-sourcing
       | it) is ethical.
       | 
       | The main argument against this is that it takes away from the
       | open-source community that contributed to the development of the
       | code in the first place. By selling a code-generating service
       | without open-sourcing it, the company is profiting from the work
       | of others without contributing back. This is unfair and takes
       | away from the overall open-source ecosystem.
       | 
       | Added two characters to the prompt :P
        
       | Havoc wrote:
       | Yes, though in a way so does stackoverflow & friends. A large
       | chunk of the dev ecosystem is copy-paste and I don't think this is
       | inherently problematic. It is always a case of standing on the
       | shoulders of giants.
       | 
       | It's more of a licensing issue to me. As far as I can tell it was
       | trained on a blend of licenses which to me makes it inherently non-
       | compliant. At least some of it is going to be copyleft and find
       | its way into closed source.
        
       | nl wrote:
       | This isn't how a language model works.
       | 
       | It's SO frustrating that even on HN people still fall for this
       | naive and incorrect analysis. Pasting bits I've said before on
       | this topic:
       | 
       | Language models do not work like this. They can copy content but
       | usually that's for something like the GPL language text.
       | 
       | Generally they work on a character by character basis predicting
       | what is the most likely character to appear next.
       | 
       | This very rarely results in copying text, and almost never rare
       | text.
       | 
       | Mechanically it has learnt both syntax of language and how
       | concepts relate. So when it starts generating it makes sentences
       | that are syntactically valid but also make sense in terms of
       | concepts.
       | 
       | That's really different to just combining bits of sentences, and
       | it gives rise to abilities you wouldn't expect in something just
       | cutting and pasting bits of sentences. For example, few shot
       | learning is mostly driven by its conceptual understanding and
       | can't be done by something with no way to relate concepts.
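
To make the "predict the most likely next character" description above concrete, here is a deliberately tiny sketch. It is a bigram frequency table, not a neural network, and is not how Copilot is implemented; the corpus and the function names `train_bigram` and `generate` are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(corpus: str) -> dict:
    """Count, for each character, which characters follow it in the corpus."""
    counts = defaultdict(Counter)
    for current, following in zip(corpus, corpus[1:]):
        counts[current][following] += 1
    return counts


def generate(counts: dict, seed: str, length: int) -> str:
    """Greedily append the most frequent next character, one at a time."""
    out = seed
    for _ in range(length):
        followers = counts.get(out[-1])
        if not followers:
            break  # no continuation was seen for this character
        out += followers.most_common(1)[0][0]
    return out


# Hypothetical two-line "training set".
corpus = "for i in range(n): total += i\nfor j in range(m): total += j\n"
model = train_bigram(corpus)
print(generate(model, "f", 12))
```

A table this small can only echo fragments of what it was fed. Whether a far larger model generalizes or regurgitates in practice is the question the replies below argue over.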
        
         | tyingq wrote:
         | If this were true, then they would have trained it on all of
         | MS's proprietary source code too.
        
           | nl wrote:
           | It is true.
           | 
           | And that doesn't follow at all.
        
             | tyingq wrote:
             | There are enough examples of it regurgitating longish
             | verbatim code out there, and not just comments or GPL
             | license text.
             | 
             | If they are comfortable training it on code that isn't
             | licensed for unrestricted copy/paste, I don't personally
             | understand why they can't train it on their own code that's
             | also not licensed for that.
             | 
             | Edit: They even added 'q_rsqrt' to their banned word list
             | to squelch an example of long verbatim code passages.
             | 
             | Basically, it's not that I don't understand your
             | explanation. It's that it does emit long passages of
             | unchanged code in practice, for whatever real-world reason.
        
         | [deleted]
        
       | skc wrote:
       | I get the feeling this entire debate would have been non-existent
       | had this been a Jetbrains product instead.
       | 
       | The whole thing is just bizarre when the vast majority of
       | developers constantly look at OSS code daily and lift
       | ideas/patterns/snippets from there regularly without once looking
       | at whatever license is attached.
        
         | Luc wrote:
         | > the vast majority of developers constantly look at OSS code
         | daily and lift ideas/patterns/snippets from there regularly
         | 
         | Perhaps in your circles, but that's certainly not something
         | I've encountered over a 25-year career.
        
           | skc wrote:
           | So when you google a problem and it leads you to a code
           | snippet that solves it that just happens to be OSS, you
           | immediately scrub your brain and pretend you never saw it and
           | instead come up with your own completely independent
           | solution after the fact?
        
             | avereveard wrote:
             | Google usage is outright forbidden for work in institutions
             | that care about intellectual property rights, so the brain
             | scrub issue is just arguing at the wrong level.
             | 
             | If you're googling solutions around you're already not
             | taking intellectual property seriously enough to care about
             | what happens after you lift ideas around.
        
               | anonymoushn wrote:
               | Can you name these institutions? I am surprised to hear
               | that some institutions would prevent devs from viewing
               | e.g. documentation of the APIs they are using or academic
               | papers about algorithms for computing the multiplicative
               | inverses of 64-bit integers, if they accessed those
               | things via google
        
               | avereveard wrote:
               | IBM, and another that I'm currently under NDA with.
               | 
               | I think their also being patent farms plays a role in it.
               | 
               | Approved dependencies had API docs linked, so there was no
               | need to Google these.
        
               | bloat wrote:
               | This is interesting. Is the internet completely cut off?
               | Do they have internal libraries of documentation for
               | third party stuff they are using (paper? digital?) Do you
               | have any example institutions, or what domain they are
               | working in? Thanks.
        
               | swader999 wrote:
               | I think it would be for super secure military coding. But
               | business domains? Hardly ever.
        
               | avereveard wrote:
               | The issue doesn't rest solely in copyright.
               | 
               | A concern, which I think is legit, is that it is quite
               | easy for someone with a strong presence in search, web
               | advertising, analytics and mobile to puzzle together what
               | a company is investing in based on the aggregated
               | research and web access from known locations.
        
               | skc wrote:
               | Very surprised to hear about this actually.
               | 
               | Maybe I live in a bubble, but the likes of
               | Google/StackOverflow have been part and parcel of a
               | developer's toolbox for many years now.
               | 
               | And in any case I wonder how that is enforced. E.g.,
               | someone goes home in the evening and visits github,
               | learns a new trick and comes into the office the next day
               | and implements it.
        
             | teakettle42 wrote:
             | > ... and instead come up with your own completely
             | independent solution after the fact?
             | 
             | Yes, I'm not a plagiarist.
             | 
             | If you're literally copying and pasting code snippets
             | without attribution, you're plagiarizing.
             | 
             | You're also probably violating the OSS project's license.
             | 
             | It's no different than copying and pasting someone else's
             | sentence or paragraph into a written paper.
        
         | foxhill wrote:
         | > I get the feeling this entire debate would have been non-
         | existent had this been a Jetbrains product instead.
         | 
         | why so?
         | 
         | > The whole thing is just bizarre when the vast majority of
         | developers constantly look at OSS code daily and lift
         | ideas/patterns/snippets from there regularly without once
         | looking at whatever license is attached.
         | 
         | well, yes, copying an idea or pattern is generally.. accepted,
         | to be kosher. copy-pasting too, in small amounts (a function, a
         | type). that said, i would (and have) attribute even a notional
         | similarity when writing something open source.
         | 
         | i don't think co-pilot even allows the user to find where the
         | code came from.
        
         | goerz wrote:
         | I am not a lawyer, but my legal intuition / common sense says
         | that "code snippets" are not copyrightable. There's some
         | sliding scale on when a code snippet would become so non-
         | trivial that a reasonable (!) judge would consider it
         | copyrightable, but nothing Copilot does is anywhere close to
         | that limit, IMO.
        
           | shakna wrote:
           | One of the main claims in Google LLC v. Oracle America [0],
           | was based around a 9-line rangeCheck function. Whilst some
           | code can be too simple and small to copyright, programmers
           | and lawyers are probably not going to view snippets the same
           | way. Copilot creates risk.
           | 
           | [0] https://en.wikipedia.org/wiki/Google_LLC_v._Oracle_Americ
           | a,_...
        
       | marstall wrote:
       | most of the code I write is glue sticking together 8 proprietary
       | systems nobody's ever heard of. how is copilot gonna help me with
       | that?
        
       | sytelus wrote:
       | Google just sells content other people wrote.
        
       | pen2l wrote:
       | Bit of a stretch to fashion AI-derived/AI-coauthored works as
       | other people's work. Are DALL-E portraits done Picasso-style
       | unrightfully selling Picasso's works? Is an individual selling
       | portraits done Picasso-style unrightfully selling Picasso's
       | works?
       | 
       | No, of course not. Joyce's literature was influenced by Ibsen,
       | Mozart looked up to Haydn, Newton was humble enough that he
       | openly professed he stood on the shoulders of his predecessors,
       | Perelman refused the Millennium prize because it wasn't also
       | offered to his colleague Hamilton.
       | 
       | All human innovation is iterative, and derivative.
       | https://www.youtube.com/watch?v=jcvd5JZkUXY
       | 
       | Our skill doesn't grow in vacuums, without outside mentorship and
       | guidance. There are areas where I am upset about the application
       | of AI, but this is not one of them. Consider copilot a gentle
       | guiding hand for those without access to a second pair of eyes
       | nearby to give you reminders on what you may otherwise have on
       | the tip of your tongue.
       | 
       | But just as Led Zeppelin's refusal to recognize how _heavily_
       | their music was influenced by delta blues artists was
       | unbecoming, I can accept the argument that it is perhaps douchey
       | of Github to sit on Copilot as squarely their creation.
        
       | janandonly wrote:
       | Isn't every programmer in history (except the gal who invents
       | her own language and writes all her own code) simply an
       | archeologist for other people's work?
       | 
       | We all Duck/Google for code anyway. Why not admit it and make it
       | easier?
        
         | pacifika wrote:
         | Copilot is doing this on an industrial scale. It's the
         | difference between copying sample code and outsourcing your
         | work to a third party collectively.
        
         | eline43 wrote:
         | You don't understand the difference between many open source
         | licenses or the concept of crediting open source code
         | authors... it does not mean that the code is free for everyone
         | to just use as they please...
         | 
         | https://www.gnu.org/licenses/license-list.en.html for a quick
         | intro
         | 
         | Also, are you okay with other people selling *your* work and
         | *you* getting nothing out of it? Many people are not.
        
       | sirsinsalot wrote:
       | Jaron Lanier's book "Who Owns the Future?" is all about AI and
       | compensating those that input in training these very valuable
       | models.
       | 
       | I highly recommend everyone read it.
        
       | Separo wrote:
       | GitHub provides the repo hosting and tools for free on public
       | projects. I'm happy with this deal.
        
         | jalfresi wrote:
         | This does raise a point - do we now have to assume that all
         | those services that provide free hosting/access/service to open
         | source projects will be strip-mining the work of the open
         | source community to sell them back to us all? I almost feel
         | stupid believing it was an altruistic move to contribute back
         | to the shoulders of giants they were already standing on...
        
           | eloisius wrote:
           | I feel scammed too. At this point it should be obvious, but
           | I'm finally savvy to the fact that every tech company that
           | offers anything free, and you use it to create "your"
           | content, is not your friend and you don't even own the works
           | you host with them. I feel scammed that GitHub was cool about
           | 10 years ago. It was like the professional/cultural center of
           | gravity in my career. GitHubbers were cool people. Everyone
           | cool hosted their site on GitHub Pages. I didn't want to see
           | a resume; what's your GitHub? Now I feel stupid for having
           | contributed whatever tiny bit of brains I did to this AI by
           | thinking that I was using the cool, developer-first code
           | website.
        
       | tiku wrote:
       | I've been using it for a day now and I'm really impressed. It is so
       | aware of stuff in old code, that it is scary. I'm working in an
       | old application with Zend Framework.
        
       | shahar2k wrote:
       | and Dalle2 sells art other people created
       | 
       | (I'm actually not being sarcastic, I think there needs to be some
       | sort of pipeline for compensating the artists who are used to
       | train these models)
        
       | tiborsaas wrote:
       | MrDoob has an excellent point about this:
       | 
       | https://twitter.com/mrdoob/status/1539740854956412929
        
       | spupe wrote:
       | If you assigned a task to a junior dev, and he/she used some code
       | from open source projects and Stack Overflow to develop a custom
       | program for the task, would you say that this person is selling
       | you other people's code? Is it common or expected for this type
       | of use to be acknowledged?
        
         | genezeta wrote:
         | About 10 years ago or so, I was working at a certain place.
         | They put me into a small team apparently focused on some R+D
         | project under the direction of an "architect".
         | 
         | Basically, the project was to package Cordova + Backbone +
         | Marionette, plus a couple of tools, under their own commercial
         | name. Then they'd go around potential clients presenting it as
         | the perfect solution to build hybrid applications for
         | web/mobile/smartTV/whatever.
         | 
         | A certain Monday, the "architect" arrived boasting. He did that
         | often, but this time he was more boastful. He explained that he
         | had spent the whole weekend coding. He had written an
         | incredible tool that would create a skeleton for a project from
         | zero. You would type something like `tool create` and it would
         | create the whole project with all the scripts and some example
         | views and whatnot.
         | 
         | It was Yeoman's yo CLI tool, of course. He had just changed the
         | copyright in the comments, removed most of the comments, he had
         | deleted any mention of Yeoman or the original creators, changed
         | the name of the executable script and that's it.
         | 
         | The whole thing was OS code picked up from various repos and
         | packaged as their own. The company used it to sell development
         | projects. The so-called-architect used it to sell himself
         | inside the company and then jump away into a startup as CTO.
         | 
         | Is this _common_ or is it just anecdata? I don't know. It's
         | clearly not the only time I've seen something like this and I
         | do know that in certain companies around here it isn't exactly
         | uncommon. But I can't say how common or uncommon it is.
         | 
         | Would I call this "selling other people's code"? Yes, I would.
        
           | spupe wrote:
           | This is clear-cut fraud, but it is also not even close to
           | what Copilot or most junior devs are doing.
        
         | XCabbage wrote:
         | People I've worked with have different philosophies on this,
         | but personally, if you check in code that is distinctive enough
         | that I can identify the source you copied and pasted it from,
         | and you provided no indication (whether in a comment or a PR
         | description) that you copied it, I will really get quite grumpy
         | at you about it.
         | 
         | Way too often I burn half an hour needlessly during review in
         | one of two ways:
         | 
         | * trying to figure out how the heck someone figured out some
         | "magic" code that achieves something by invoking a bunch of
         | poorly documented library or framework internals, and trying to
         | reverse engineer WTF all the magic does by diving into the
         | framework's source... only to eventually think to google the
         | whole snippet rather than each individual method call, and
         | discover it's copied from a Stack Overflow answer
         | 
         | * trying to figure out why something was written in an
         | unidiomatic or overcomplicated way rather than a more obvious
         | approach, and commenting at length on how I'd simplify it...
         | only to eventually realise it was copied from a Stack Overflow
         | answer
         | 
         | Attribution isn't just about making sure the right person gets
         | credit, or about license compliance; reviewers and maintainers
         | frequently need to be able to see where stuff was copied and
         | pasted from in order to do their jobs effectively, even for
         | snippets of just a few lines.
        
           | spupe wrote:
           | I understand where you are coming from. However, I think you
           | are making the assumption that this person simply copy/pasted
           | some code with no understanding of it, or that this code is
           | then very different from your codebase and needs to be
           | refactored. If using Stack Overflow did not add to your
           | overall development time but subtracted from it, because it
           | was used as an appropriate piece of a much bigger puzzle - a
           | far more realistic scenario for both Copilot and our general
           | use of SO -, then I see no issue with it whatsoever.
           | Certainly no moral or copyright issues as this person on
           | Twitter implies.
        
             | thfuran wrote:
             | No copyright issues in the sense that no entity is likely
             | to ever pursue the matter, sure. But copying and
             | commercially using someone else's nontrivial bit of code
             | that doesn't have a license that says you can is quite
             | blatantly a copyright violation.
        
         | ben-schaaf wrote:
         | If I found out a junior dev had been copying copy-left or
         | proprietary code then I'd have to rip out that code, have a
         | chat with them and figure out what to do from there. Even if
         | the code isn't copy-left it's still someone else's code,
         | sometimes that's ok but sometimes it's definitely not.
        
         | whatatita wrote:
         | If the solution was made up of ideas from OSS and snippets from
         | Stack Overflow? No; that's fine.
         | 
         | If the solution was copied from an OSS project without proper
         | attribution? Yes. Absolutely. And they'd have words with a
         | senior dev and maybe even legal if the code they copied made
         | its way into production without attribution.
         | 
         | Many copyleft OSS licenses require attribution and distribution
         | of derivative works that we wouldn't allow.
        
         | mbreese wrote:
         | It depends on the source of that code and the expected license
         | of the code you paid them for. If everything is MIT/BSD (and
         | attributed), no problem. If the code was GPL and I'm making a
         | commercial product, we have an issue.
         | 
         | I'd also expect any Stack Overflow code to include a
         | comment with a link to the Stack Overflow page.
         | 
         | I think one of the key points is to make sure any code taken
         | from another source is cited appropriately. If it isn't, or the
         | junior dev is passing it off as their own work, then we have
         | problems.
        
         | thelastbender12 wrote:
         | This is a good thought exercise. I wouldn't call it stealing,
         | though I am not sure how legal liability is assessed, say if
         | they picked up GPL code unknown to the company, and the company
         | is later sued over it.
         | 
         | This isn't derived from principled reasoning, but I think of it
         | as similar to community norms. Not the best example, but you
         | wouldn't mind someone subletting their homes to Airbnb, but if
         | all of your apartment complex does it, it invites regulation. A
         | product like Copilot enables copying code (even if inspired,
         | and not verbatim) at a scale that individual developers can't.
         | So respecting software licenses needs to be codified (legally?)
         | while previously it was left unmonitored.
        
         | trention wrote:
         | It's absolutely fine to allow humans to do that while
         | prohibiting (commercialized) AI to do the same thing.
        
           | spupe wrote:
           | I don't see why that should be the case in this particular
           | scenario, or what benefit is gained from that. Could you
           | elaborate?
        
             | jhugo wrote:
             | Could you elaborate on why you think a computer program and
             | a person should be treated the same way in this respect?
             | 
             | We can take as self-evident that a human is capable of
             | reading about something, conceptualising it, and then
             | writing something completely new with the knowledge they
             | have gained.
             | 
             | I think it's also pretty uncontroversial that the primitive
             | "AI" we currently have is nowhere near the level of even an
             | average human at these things, and thus we can't just
             | blindly assume it is conceptualising rather than copying.
             | Copilot regularly produces verbatim copies of existing code
             | when working on non-trivial things.
             | 
             | Forget about the "AI" label: Copilot is just a complex
             | computer program, that takes code from other people and
             | inserts various permutations of it into your editor, whilst
             | ignoring the license of that code.
        
               | nl wrote:
               | Copilot understands concepts as well as many humans. You
               | can see primitive versions of this in the old Word2Vec
               | demos showing how those models understand how
               | London:England ~= Paris:France
               | 
               | Copilot is much more sophisticated than that, and it no
               | more copies code than a human does. It generates on a
               | character by character basis given the contextual
               | probability of the next character conditioned on the
               | previous set of tokens with the "heat" being a factor in
               | how randomly it will choose characters.
               | 
               | This is much more similar to how a human writes than
               | "copying".
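
The sampling step described in the comment above can be sketched in a few lines. This is only an illustration under stated assumptions: the three-symbol vocabulary and the scores are made up, the function names are hypothetical, and nothing here reflects Copilot's actual implementation. It shows how a "temperature" (the "heat" mentioned above) reshapes next-symbol probabilities before one symbol is drawn at random.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next(vocab, logits, temperature, rng):
    """Draw one symbol at random according to the temperature-adjusted probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]


# Made-up scores for the symbol that follows some context (assumption, not model output).
vocab = ["n", "w", "t"]
logits = [4.0, 1.0, 0.5]

cold = softmax_with_temperature(logits, 0.2)  # nearly all mass on "n"
hot = softmax_with_temperature(logits, 2.0)   # mass spread more evenly
```

At a low temperature the highest-scoring symbol is picked almost every time; at a high temperature the choice moves closer to uniform, which is the "randomness" knob the comment refers to.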
        
               | jhugo wrote:
               | "it no more copies code than a human does" < that's a
               | very big call right there, considering how much verbatim
               | copying has already been documented in Copilot. The
               | primitive understanding Copilot has of what it is
               | generating doesn't even approach that of the most average
               | programmers. It's classic AI: impressive on the surface.
        
               | nl wrote:
               | This isn't true.
               | 
               | All the "copied code" I've seen is where the person
               | prompts it with a large amount of very unique preamble
               | and then it fills in the exact example they are quoting
               | from.
               | 
               | Try it without doing that.
               | 
               | And it's weird people think it can't understand
               | conceptual relationships. Word2Vec demonstrated that
               | nearly 10 years ago and that's a much weaker model in
               | terms of both size and techniques than this is.
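
The London:England ~= Paris:France relationship mentioned in this subthread is usually demonstrated with vector arithmetic. The sketch below uses hand-made toy vectors, not real Word2Vec embeddings (which are learned from text and have hundreds of dimensions), and the `analogy` helper is a hypothetical name introduced for illustration. It only shows the mechanics of the analogy lookup, and takes no side on whether that counts as "understanding".

```python
import math

# Hand-made toy vectors (an assumption for illustration, NOT learned embeddings).
# Dimensions: [is_capital, england, france, germany]
vectors = {
    "England": [0.0, 1.0, 0.0, 0.0],
    "London":  [1.0, 1.0, 0.0, 0.0],
    "France":  [0.0, 0.0, 1.0, 0.0],
    "Paris":   [1.0, 0.0, 1.0, 0.0],
    "Germany": [0.0, 0.0, 0.0, 1.0],
    "Berlin":  [1.0, 0.0, 0.0, 1.0],
}


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(a, b, c):
    """Return the word d for which a:b is closest to c:d, using b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


print(analogy("Paris", "France", "London"))  # prints England
```

With real embeddings the target vector rarely lands exactly on a word, so the nearest neighbour by cosine similarity is reported instead.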
        
               | jhugo wrote:
               | > And it's weird people think it can't understand
               | conceptual relationships. Word2Vec demonstrated that
               | nearly 10 years ago and that's a much weaker model in
               | terms of both size and techniques than this is.
               | 
               | Saying that Word2Vec or Copilot have "understanding" of
               | their input requires a redefinition of the word
               | "understanding".
        
               | nl wrote:
               | What's your definition?
        
               | spupe wrote:
               | I think it's best if we sidestep these big conceptual
               | questions about what cognition or creativity really are.
               | It's hard to find agreement, and perhaps it is not
               | necessary to do so.
               | 
               | My position is that if a person hired in a company can
               | currently use Google, Stack Overflow and GitHub to help
               | develop their custom scripts, and no moral or copyright
               | issues are infringed (ie, you don't try to say you came
               | up with it on your own, and you use only enough that it
               | is clearly fair use), then I think an AI should be able
               | to assist in that task. There is no need to complicate
               | things by legislating what the AI is doing and what
               | Google is doing, as they are very similar things and in
               | fact even use similar methods.
        
               | jhugo wrote:
               | I would agree with you if the AI was genuinely assisting
               | with that task, but it isn't.
               | 
               | It's taking inputs, ignoring their licenses, permuting
               | them in ways that are not understandable to the user, and
               | then outputting them.
               | 
               | That's an entirely different task than the user reading
               | SO or using Google and then writing their own code,
               | because the "AI" _is not capable_ of writing its own code
               | at that level.
               | 
               | Relying on this tool means ignoring the license of code
               | that you're copying, without even knowing that you're
               | doing it.
        
               | spupe wrote:
               | > That's an entirely different task than the user reading
               | SO or using Google and then writing their own code,
               | because the "AI" is not capable of writing its own code
               | at that level.
               | 
               | I would say it's a very similar task. If I need to
               | remember how to use a certain function, I can Google for
               | documentation and examples, or I can tell Copilot what I
               | want to do. The fact that the solution was presented by
               | Copilot or a SO thread is, in my view, irrelevant. And to
               | compound on that, I doubt anyone checking SO truly knows
               | where that answer came from. The person could simply be
               | reproducing a snippet from somebody else, you have no way
               | of knowing if it was licensed.
               | 
               | I don't think this is bad either. Even our current shitty
               | copyright laws protect that kind of use. I shouldn't have
               | to worry whether my little prime number generator uses an
               | algorithm first created by John Carmack or Microsoft.
               | Programming has evolved rapidly in great part because we
               | can all use other people's work and use it to improve
               | ours. Of course you shouldn't just copy and paste
               | everything and call it a day, but that's hardly what
               | Copilot enables anyway.
        
               | jhugo wrote:
               | You really seem to be ignoring the core issue by focusing
               | on SO though. Everything on SO is fair game, but code on
               | GitHub is under a variety of licenses, and when Copilot
               | regurgitates it, no matter how complex and inscrutable
               | the process is that leads it to do so, it may be causing
               | the user of Copilot to misuse that code because it
               | doesn't even give them the _opportunity_ to know where it
               | came from or what license it was released to the public
               | under.
        
               | spupe wrote:
               | Again, how does that differ from Stack Overflow? Do you
               | go and check whether a given reply belongs to a licensed
               | project?
               | 
               | Also, please consider that there is a toggle that allows
               | you to block Copilot from using public code.
        
               | jhugo wrote:
               | > Do you go and check whether a given reply belongs to a
               | licensed project?
               | 
               | All SO questions, answers and comments are CC BY-SA. The
               | terms of the site say that anyone submitting this content
               | agrees that it's licensed that way, and when you visit
               | the site you agree that you are provided with the content
               | under that license. It's not necessary for you to check
               | whether the submitter had the right to offer it under
               | that license; that's their problem. The same goes for any
               | content offered to you under a given license on any
               | platform. I don't understand what your question has to do
               | with the conversation.
               | 
               | The problem with Copilot, and I really can't believe this
               | has to be restated over and over again, is that it takes
               | code from projects with various licenses, and outputs it
               | in your editor in various transformed-or-not-transformed
               | ways (the fact that the transformation is extremely
               | complex doesn't change anything), and gives you no way to
               | know where the code came from, how it was licensed or how
               | it has been transformed. So, despite the fact that if you
               | use it enough you are virtually guaranteed to use code in
               | contravention of its license, you cannot even know which
               | projects you have stolen code from or which licenses'
               | terms you are breaking.
               | 
               | > Also, please consider that there is a toggle that
               | allows you to block Copilot from using public code.
               | 
               | Great. I'm sure its utility doesn't go down at all if you
               | turn that toggle off...
        
               | spupe wrote:
               | > All SO questions, answers and comments are CC BY-SA.
               | The terms of the site say that anyone submitting this
               | content agrees that it's licensed that way, and when you
               | visit the site you agree that you are provided with the
               | content under that license.
               | 
               | Have you ever read GitHub's conditions to know whether
               | they also have the right to use your code that way, no
               | matter how you decide to license it? I feel that you are
               | overly focused on the legal part here, which I'm sure was
               | handled by Microsoft's lawyers. I'm more interested in
               | the question of principle.
               | 
               | No matter what the terms of use at SO say, anyone can
               | give you an answer that is a copy of some code they don't
               | own. You may consider that immoral, but I don't, not at
               | the scope SO is used for. In addition, the vast majority
               | of cases at SO and Copilot are not about complex
               | functions being stolen, it's about some dumb code you
               | would have found in 2 minutes of googling. What I'm
               | trying to argue here is that if we are all cool with SO
               | and think it's useful, there is no fundamental difference
               | here. We never cared too much about licenses for
               | boilerplate code, and I think we shouldn't start now.
        
               | jhugo wrote:
               | > Have you ever read GitHub's conditions to know whether
               | they also have the right to use your code that way, no
               | matter how you decide to license it? I feel that you are
               | overly focused on the legal part here, which I'm sure was
               | handled by Microsoft's lawyers. I'm more interested in
               | the question of principle.
               | 
               | I have, and there is not. Neither could there be -- in
               | many cases the person uploading code to GitHub is not the
               | copyright holder -- they are just doing something
               | permitted under the license -- and for a large open
               | source project there could be thousands of copyright
               | holders. A random person mirroring some source code to
               | GitHub is in no position to negotiate different license
               | terms on behalf of the copyright holder(s).
               | 
               | > No matter what the terms of use at SO say, anyone can
               | give you an answer that is a copy of some code they don't
               | own. You may consider that immoral, but I don't, not at
               | the scope SO is used for. In addition, the vast majority
               | of cases at SO and Copilot are not about complex
               | functions being stolen, it's about some dumb code you
               | would have found in 2 minutes of googling. What I'm
               | trying to argue here is that if we are all cool with SO
               | and think it's useful, there is no fundamental difference
               | here. We never cared too much about licenses for
               | boilerplate code, and I think we shouldn't start now.
               | 
               | I don't understand why you think a person writing an
               | answer on SO and a computer program outputting some
               | permutation of its inputs into your editor are the same
               | thing. The person writing an SO answer is intelligent and
               | capable of conceptual understanding, the computer
               | regurgitating code without regard to its license is not.
        
               | spupe wrote:
               | >> Have you ever read GitHub's conditions to know whether
               | they also have the right to use your code that way, no
               | matter how you decide to license it?
               | 
               | > I have, and there is not.
               | 
               | At least one IP lawyer strongly disagrees, suggesting
               | anything you host on GitHub is fair game [1].
               | 
               | [1] https://fossa.com/blog/analyzing-legal-implications-
               | github-c...
               | 
               | > The person writing an SO answer is intelligent and
               | capable of conceptual understanding, the computer
               | regurgitating code without regard to its license is not.
               | 
               | From a copyright perspective, that is irrelevant. In fact
               | I would think Copilot has more incentives to not infringe
               | than a random SO user, who is very unlikely to be sued. I
               | already argued in another post that in my view, from any
               | perspective, it is also irrelevant whether it's a person
               | or AI doing the same work Copilot does.
        
               | jhugo wrote:
               | > At least one IP lawyer strongly disagrees, suggesting
               | anything you host on GitHub is fair game [1].
               | 
               | The question is whether Copilot's _users_ can use the
               | regurgitated code without following the license terms,
               | not whether Copilot was allowed to train their model on
               | it. I agree it's likely fine for them to train the
               | model, but the _use_ of Copilot would seem to be a legal
               | minefield.
               | 
               | A little thought makes it clear that an affirmative
               | answer would be absurd. This would mean that using a
               | simple tool (let's say `cat`) to make a copy of some code
               | and subsequently ignoring its license terms is
               | infringement, but if the software used to make the copy
               | is more complex (or perhaps if it has the "AI" label
               | stuck to it!) the same actions are not infringement.
        
               | simion314 wrote:
               | If I make a script and train it on Windows source code, do
               | you think MS will like it if I use that script on Wine?
               | I am sure MS will say the license does not allow it and
               | your script's transformations are not original, so GPL or
               | similar licenses should be respected by Microsoft too.
               | 
               | >My position is that if a person hired in a company can
               | currently use Google, Stack Overflow and GitHub to help
               | develop their custom scripts, and no moral or copyright
               | issues are infringed (ie, you don't try to say you came
               | up with it on your own, and you use only enough that it
               | is clearly fair use),
               | 
               | Only a judge will determine if it is actually fair use.
               | If you by chance copied some super clever and unique code
               | into your code base, then I am sure a judge will not say
               | it is fair use. Copilot has been shown to do this (though
               | MS said they put some IF-ELSE checks in the AI to prevent
               | the plagiarism from being detected, by removing obvious
               | results and maybe obfuscating stuff more).
               | 
               | Maybe the Stack Overflow license allows you to copy-paste
               | the answers into your code, but GitHub code has a
               | repo-specific license that you need to respect.
               | 
               | If MS trained the model on all their private repos too
               | and made the model free software, then many would not have
               | these issues. Or keep the model proprietary and train it
               | only on the MS repos and on BSD and similarly licensed repos.
        
               | trention wrote:
               | You are saying that the AI should be treated the same way
               | as a person would regarding its 'output'. I disagree.
               | This is a conceptual disagreement and you cannot just
               | sweep under the rug "what cognition or creativity really
               | are".
               | 
               | At the end, when in several (2-5) years we start seeing
               | structural unemployment emerging because of AI
               | deployments, this will be resolved by the legal system,
               | most likely by some sort of partial prohibition of
               | training/monetizing such systems.
        
               | spupe wrote:
               | I think I still have not understood your argument. Are
               | you saying that you are afraid that AIs will become too
               | powerful and cause unemployment, and therefore we should
               | regulate them now before they do so?
               | 
               | Many people are worried about this, which is why there is
               | a lot of debate about minimum income programs. However,
               | at present, what Copilot is doing is similar to what
               | Google does, and it is certainly not going to replace
               | devs any time soon. Personally, I think we should exploit
               | technology to its fullest, and the only reason we can
               | have this conversation is because in the past, we haven't
               | given too much consideration about the mailmen,
               | secretaries, delivery workers and everyone else who got
               | displaced by our use of the internet and similar
               | technologies. We merely adapted to better exploit them.
        
               | trention wrote:
               | I am not saying (in that last comment) what should
               | happen, I am saying what will happen. Past automation in
               | terms of impact is nothing compared to what's coming and
               | people and lawmakers will react accordingly - not in
               | favor of the automators.
        
         | jhugo wrote:
         | No matter how complex a program is, and no matter whether it
         | uses techniques sometimes described as "AI" in its
         | implementation, it's not a person. Copilot is just a very
         | complex pipeline from other people's code to your editor, which
         | ignores the license of those other people's code.
        
       | whywhywhywhy wrote:
       | Same deal for Dall-e if they ever productize it.
        
       | lysecret wrote:
       | Don't we all.
        
       | vbezhenar wrote:
       | I somewhat agree with that. Yesterday I edited some exotic
       | configuration (Kubernetes CSI driver for Cinder) and Copilot
       | suggested a config which looked like someone's config. There
       | were no values, so it was good at filtering them out, but it
       | definitely looked like a cleaned-up part of code which resides in
       | some project.
       | 
       | I don't think that's bad though. Code sharing is good for overall
       | productivity.
        
       | c01n wrote:
       | MS and GitHub are thieves, all their code is closed source, yet
       | they sell copyrighted code they don't own. If they told us years
       | ago that our code will be automatically stolen by an "AI", most
       | coders would not have created an account. The innovation here is
       | that they have access to most of the world's open source code and
       | automated the stealing.
        
       | blitz_skull wrote:
       | Man, people really do be angry that the public code they put on a
       | public platform is being used publicly.
       | 
       | Wild.
        
       | aetherspawn wrote:
       | Copilot is a fancy pattern bot.
       | 
       | Humans make original patterns, but since Copilot cannot think,
       | then Copilot does not. It squashes together a bunch of small
       | individual patterns, each under their own license, but at no
       | stage does it do anything more than pick a line from here, and a
       | line from there.
       | 
       | It doesn't think, and it doesn't create new IP.
       | 
       | It is like making a picture out of small snippets of a thousand
       | other pictures, and then selling it.. clearly not OK. You still
       | ripped off the original artists.
       | 
       | Or like plagiarising 100 of your class mates' assignments. Are
       | you less guilty because you went to the effort to steal just a
       | few sentences from each?
       | 
       | A criminal who steals a cent from every account at the bank is a
       | more sophisticated thief than someone who holds up a petrol
       | servo.
       | 
       | If Copilot doesn't create new IP (it doesn't; we established
       | this), then it uses existing IP. And in that case it is no
       | different to any of the three analogies above.
        
       | honkler wrote:
       | license issues will save many thousand jobs.
        
       | nathias wrote:
       | Copilot is a new way for corporations to break copyright while
       | enforcing it for everyone else, this will be the first big use
       | for AI when other corpos follow.
        
       | 0x_rs wrote:
       | I'm not a lawyer, nor very well versed in the vast world of
       | licenses and their definitions in court contexts, but I've been
       | wondering about something with the growing appeal ML-generated
       | content has for the average person (and the "high" barrier for
       | entry in the market) -- are licenses in some form or another
       | going to adapt to this phenomenon? From a brief search, I have
       | not found any new license with a no-dataset-usage clause
       | (assuming fair use does not apply, that's another big question).
       | What are the chances anything of the sort will become an option
       | for any "creative" work that's usually shared freely (such as
       | artwork, code, et cetera) even despite copyright? What about the
       | ownership of the dataset? It seemed to be questionable years ago
       | already that possibly IP-protected content goes through the black
       | box and resembling material gets on the other side, whose
       | ownership is it really? I'm guessing some notable court cases in
       | the future could define this in the following years if the
       | popularity continues growing.
        
       | abdulhaq wrote:
       | That's like saying a plumber just sells parts that other people
       | made
        
         | WesolyKubeczek wrote:
         | Except that a plumber buys them first. For money.
        
         | gtf21 wrote:
         | Which the plumber has bought and paid for and then installs for
         | you, which makes this pretty fundamentally different.
        
       | borishn wrote:
       | Copilot is fair use, get over it!
       | 
       | Copilot is not writing your code any more than Google search is
       | writing your code. You are writing your code, and Copilot is just
       | making suggestions.
       | 
       | The US Constitution secures limited copyright "to promote the
       | progress of science and useful arts". Copilot is just that, get
       | over it!
        
         | jazzyjackson wrote:
         | Personally I think I'll just claim all the code I write with
         | co-pilot is a parody.
        
         | nescioquid wrote:
         | Not an expert, but fair use generally covers education,
         | criticism, parody, and satire. There is a test for meeting fair
         | use and it includes things like amount copied and commercial or
         | non-profit interest.
         | 
         | The amount copied from any particular source might be small,
         | but an aggregate strip-mining of many copyrighted sources is an
         | interesting twist. Another might be, as you suggest, it might
         | be a machine that itself does not violate copyright, but has
         | the effect of causing users (who accept the suggestions) to
         | violate copyright.
        
           | collegeburner wrote:
           | Google does the same thing taking snippets out of pages or
           | even completely caching them so you can see the entire page
           | from their servers.
        
         | brianmcc wrote:
         | Wait till it suggests something Disney can argue they own
         | rights to...
        
           | nojs wrote:
           | You mean like DALL-E? This debate is going to get interesting
           | when "in the style of" illustrations and videos go
           | mainstream.
        
           | acuozzo wrote:
           | LucasFilm - Pixar - Disney. I wonder if the mouse owns Duff's
           | Device...
        
         | Buttons840 wrote:
         | A good and well argued opinion made hostile by saying "get over
         | it" twice! Saying "get over it" discourages further discussion.
         | Your comment would be better without it.
        
           | cududa wrote:
           | Get over it.
        
           | borishn wrote:
           | You are right, but it is so frustrating how people whine
           | about this.
        
         | humanwhosits wrote:
         | Citation needed for copilot being fair-use
        
         | zerocrates wrote:
         | Yes, the copyright clause gives as its purpose "the progress of
         | Science," but that doesn't mean that anything which claims to
         | be "progress" gets a free pass.
        
           | ajb wrote:
           | Indeed, the US supreme court pointedly refused to accept that
           | the purpose clause limits the power of copyright in "Eldred
           | Vs Reno" (at least, that is my understanding as a non lawyer)
        
       | bmacho wrote:
       | On a side note, I do believe that short programs or functions
       | should be copyright free by law.
       | 
       | Or we as a community need to create a better bsd, a cc0 for
       | everything.
       | 
       | Almost everything is nontrivial, and almost everything is
       | copyrighted, at least with the pressure to name the original
       | author (BSD, GPL, other major permissive licenses).
       | 
       | Say you want to use a library, then you check for examples in the
       | documentation, now you have to denote somewhere that the example
       | is from the documentation (best if you put it in the source code,
       | so you don't lure other people to copy what you copied and refer
       | you as the author).
       | 
       | It is a major PITA at least for me.
        
         | stagas wrote:
         | What about a law that makes all code available but then
         | requires you to use a portion of your earnings to compensate
         | the people whose dependencies you used?
        
       | dgb23 wrote:
       | Reading many of the comments here I feel like one important thing
       | is being left out that is not related to legal, but to social
       | issues:
       | 
       | Who is on the side of open source? Where are the big, powerful
       | institutions and companies that deeply care about authors and
       | communities providing free software that so many of us rely on?
        
       | olalonde wrote:
       | I'm going to make a bold prediction: no one will ever lose a
       | copyright lawsuit due to usage of Github Copilot generated code.
       | The code snippets it produces are too small or trivial to qualify
       | for copyright infringement.
        
         | ModernMech wrote:
         | CoPilot is a new technology, and smallish snippets of code are
         | all it is capable of at this point. Microsoft will surely work
         | to expand its capabilities to produce larger and more complex
         | programs, don't you think?
        
       | janosdebugs wrote:
       | It'd be nice to see some proof here. Copyright is not absolute
       | and does not extend, for example, to things that have no
       | creativity in them. There are only so many ways to write a for
       | loop or an if condition. Training an ML model from a large body
       | of code IMHO violates copyright no more than any of us reading
       | code and learning from it, as long as GH Copilot doesn't spit out
       | code that's exactly the same as something already existing.
        
       | madrox wrote:
       | I don't think any professional community is aligned on how to
       | think about ML-generated content yet. We don't know how to
       | apportion rights between the data owner, the model owner, and the
       | end user, and I don't think existing copyright law is ready for
       | it. At least for software, I think the way forward is for the
       | next generation of software licenses to explicitly state whether
       | the code can be used to train ML models and what those models can
       | be used for. Without explicit language, we'll be squabbling over
       | interpretations of fair use.
       | 
       | There's going to be some big cases here. It's going to end up in
       | the Supreme Court sooner or later, and if it were to go there
       | today I think I know what they'd say.
        
       | [deleted]
        
       | LeonTheremin wrote:
       | And social media sells ideas other people thought.
       | 
       | Copilot is limited to public code now, but it may easily be
       | trained on non-public code - albeit this probably won't be for
       | sale to the public.
        
       | williamcotton wrote:
       | Should the snippets that Copilot is regurgitating be considered
       | for copyright in the first place?
       | 
       | It seems akin to trying to copyright a certain drum pattern or
       | chord progression.
       | 
       | Also, the history of the GPL, MIT, commercializing lisp machines,
       | Symbolics, infighting, etc... seems a very different context than
       | Copilot so I am having difficulty seeing the systemic problems
       | that tools like this encourage.
       | 
       | There is of course a surface level similarity in that a
       | corporation is profiting from IP in the public domain but the
       | devil is in the details.
        
       | SMAAART wrote:
       | Once again Innovation challenges IP.
        
       | tsujp wrote:
       | Copilot produces verbatim GPL'd code. It's also a closed box.
       | 
       | Source: https://twitter.com/mitsuhiko/status/1410886329924194309
        
       | JacobiX wrote:
       | It's the same problem with those ML models, the other day someone
       | generated a children's book using GPT3, turned out that there is
       | a real children's book with the same name and a very similar
       | content: The Very Lonely Firefly by Eric Carle.
        
         | bartq wrote:
         | Other thing I'm worried about: how to retract facts from ML
         | model? I guess it's impossible, you need to retrain from
         | scratch with part X removed from training set. Or... people
         | could invent layered ML models similar to docker - each layer
         | would be marked what data it was trained with. Then at least
         | you'd have some cache of trained model to re-use in next
         | training session. Nasty stuff.
        
           | alpaca128 wrote:
           | Or instead of inventing complicated layered ML models Github
           | could just use each repo's license information to decide
           | what's okay to use. Detecting licenses is already a feature
           | on that site.
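
The license-based filter suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Repo` type, the `spdx_id` field, and the allowlist are hypothetical names, and nothing here reflects how GitHub or Copilot actually select training data.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: an illustrative allowlist of SPDX identifiers, not legal advice.
ALLOWED_LICENSES = {"MIT", "BSD-3-Clause", "Apache-2.0", "CC0-1.0"}


@dataclass
class Repo:
    name: str
    spdx_id: Optional[str]  # None when no license was detected (hypothetical field)


def usable_for_training(repo: Repo) -> bool:
    """Keep a repo only if its detected license is on the allowlist."""
    return repo.spdx_id in ALLOWED_LICENSES


def filter_repos(repos: List[Repo]) -> List[Repo]:
    """Return the repos whose repo-level license passes the check."""
    return [r for r in repos if usable_for_training(r)]
```

A repo-level check like this treats unlicensed repos as excluded by default, and it does nothing about attribution requirements or about code that was copied into a repo from elsewhere.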
        
             | afiori wrote:
             | Many licenses require attribution, which would be hard to
             | track.
        
         | icoder wrote:
         | Interesting, it's a big question I've had for a while, how
         | 'original' stuff coming from these AI systems is, and also the
         | distribution of uniqueness over many answers. I haven't dived
         | into it yet, but I find it surprising how little this comes up
         | when these systems are discussed (ie here on HN).
         | 
         | Does anyone even know? Can we even check? What if 1 in a
         | thousand, or one in a million outputs is (very close to)
         | something existing? I find this especially relevant when
         | generating faces.
        
       | eline43 wrote:
       | There needs to be an update to either licenses or GitHub (and
       | other) software directly, or even software terms of services,
       | that gives the user an opportunity to opt-out of their data being
       | used to train proprietary AI models.
       | 
       | 'I don't agree with having an AI trained on/with my data.'
       | 
       | IMHO, all other problems with copilot stem from this.
        
       | shireboy wrote:
       | I do feel these arguments are valid if a little overstated. Most
       | devs have googled, found some code, and pasted it in without
       | thinking about attribution. Doesn't make it right, but it is a
       | question of how much code is being copied and how specific. For
       | example, I peruse open repos to learn - I learned about the
       | spread operator in JavaScript that way- doesn't mean every time I
       | use it I need to attribute whatever repo I saw it in. But, yeah,
       | if I copied a larger chunk and the owner wants attribution,
       | probably wrong.
       | 
       | I like the idea of having the bot automatically update an
       | attribution file if it detects it has used licensed code. Seems
       | like it would be fairly trivial. Also a robots.txt for repo
       | owners to control automated use.
       | 
       | Also, they should totally pay back a portion of revenue to the
       | community and support the repos used to train. That seems like it
       | would be a good PR move if nothing else.
        
         | Aeolun wrote:
         | > Also, they should totally pay back a portion of revenue to
         | the community and support the repos used to train.
         | 
         | Aren't they already doubling all Github sponsorship money?
        
           | david_allison wrote:
           | Not doubled any more, but they don't take a cut, and pay the
           | processing fees for you.
        
         | kachhalimbu wrote:
         | I like this take. Copilot to me seems a glorified (very
         | intelligent) auto-search-paste/autocomplete service. It is just
         | mimicking what usual devs do which is to copy-paste code from
         | StackOverflow/github for many mundane types of codes like for
         | loops, mongo find queries, callback func definitions etc for JS
         | devs for eg.
         | 
         | The idea of auto-attribution if copilot surfaces licensed code
         | is best because then it keeps the copilot user honest where the
         | code is coming from and honor the original license.
        
           | teakettle42 wrote:
           | > It is just mimicking what usual devs do which is to
           | copy-paste code from StackOverflow/github for many mundane
           | types of codes like for loops, mongo find queries, callback
           | func definitions etc for JS devs for eg.
           | 
           | I'm genuinely disturbed to see how many people in this thread
           | think that casual plagiarism is the norm for "usual devs".
        
             | Aeolun wrote:
             | Dunno what devs you work with, but I've literally never
             | seen someone care.
             | 
             | None of the code I work on is public, so attribution is
             | pointless in the first place.
        
             | ParetoOptimal wrote:
             | > I'm genuinely disturbed to see how many people in this
             | thread think that casual plagiarism is the norm for "usual
             | devs".
             | 
             | I'm disturbed it is likely the reality.
        
             | shireboy wrote:
             | Again, I get the argument, just think it's overstated.
             | First, when referring to stack overflow and blogs,
             | generally, that's intentionally shared with the express
             | purpose of people copying it- hopefully while learning from
             | it at the same time. Second, again with some code bits it's
             | not really plagiarism any more than all iambic pentameter
             | is plagiarizing Shakespeare.
             | 
             | Devs often look at code to see basic syntax, understand
             | algorithms, etc. There is absolutely nothing wrong with
             | this. One should draw a line somewhere, but to say I need
             | to attribute [...somevar] every time I use it because I
             | happened to see it one time on a blog post is silly.
             | 
             | A thought experiment may help: Scrape Github for all unique
             | strings longer than X and store in a file with a timestamp
             | and owner. How large does X have to be before attribution
             | is required? If not length, then how do you determine
             | whether attribution is required?
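
The thought experiment above can be made concrete with a small sketch. It assumes the scraped data is already available as records with a text, an owner, and a timestamp; the `Snippet` type and `build_index` function are hypothetical names, and the threshold `min_length` plays the role of X.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Snippet:
    text: str
    owner: str
    timestamp: int  # e.g. Unix time of the first commit (hypothetical field)


def build_index(snippets: Iterable[Snippet], min_length: int) -> Dict[str, Tuple[str, int]]:
    """Map each string longer than min_length to its earliest (owner, timestamp)."""
    index: Dict[str, Tuple[str, int]] = {}
    for s in snippets:
        if len(s.text) <= min_length:
            continue  # too short to be tracked at this threshold
        seen = index.get(s.text)
        if seen is None or s.timestamp < seen[1]:
            index[s.text] = (s.owner, s.timestamp)
    return index
```

With a small `min_length` nearly every idiom, such as `[...somevar]`, ends up "owned" by whoever committed it first, which is the point of the question: the sketch can record who wrote a string first, but not whether attribution is owed.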
        
       | HumanReadable wrote:
       | Sorry for the unproductive tone of this comment, but there's
       | something about the attitude of this tweet that really grinds my
       | gears.
       | 
       | Any time someone invents something new and incredible, there's
       | always a crowd of negative nancies eager to discredit and explain
       | why the invention is nothing new and a detriment to society.
       | 
       | I don't understand why someone would willingly share their code
       | on github where it is publicly available just to complain when
       | others make use of that knowledge.
       | 
       | 'co-pilot just sells code other people wrote' is such a
       | ridiculous understatement of what co-pilot does. Instead of
       | marvelling at the human ingenuity that went into creating it,
       | they sneer at the audacity of openAI to do something without
       | first asking their permission.
        
         | Sakos wrote:
         | I share my code without a license because I want others to be
         | able to see how I solved things. However, this doesn't mean I'm
         | okay with wholesale copying my code. If it's some random guy,
         | then whatever. If it's a corporation like Microsoft, then yeah,
         | I have a problem with it. Under German law, the code is legally
         | not allowed to be reproduced or used without explicit
         | permission even if it doesn't have a license. I retain
         | ownership of it until and unless I explicitly relinquish my
         | ownership rights.
        
           | paulcole wrote:
           | > Under German law, the code is legally not allowed to be
           | reproduced or used without explicit permission even if it
           | doesn't have a license
           | 
           | This is nuts. How can anybody be expected to know both that
           | you're German and what German law says when you post on an
           | international website?
           | 
           | Or is this a German law that exists to prevent other Germans
           | from doing things but that the rest of the world scoffs at?
           | 
           | https://choosealicense.com/
        
             | solar-ice wrote:
             | You're expected, wherever you are, to look into where any
             | code you use comes from and what legal rights you have to
             | use it. (The author not offering you a license means you
             | can't use the code, nearly anywhere in the world - pretty
             | basic Berne Convention stuff.)
             | 
             | This is the legal expectation in general, not just for
             | software - you can't just come across a design for a neat
             | widget somewhere and start using it in your product,
             | there's probably both copyright and patent on it. Software
             | isn't special. Not everything in Github can be copied into
             | your code verbatim.
        
             | falcolas wrote:
             | That's how US law works too. Works are automatically under
             | copyright, even if you don't say so. It needs a license to
             | lessen the copyright restrictions.
        
             | giaour wrote:
             | US law is pretty similar in this regard, isn't it? If you
             | don't have a license for a particular piece of code, you
             | can't use it without the author's/copyright holder's
             | permission, even if you found it posted online.
        
           | Xunjin wrote:
           | Well, it depends on where you post it, right? Because if you
           | are using GitHub, which is probably US-based, you follow US
           | law?!
           | 
           | Demanding that one country's law be followed in another makes
           | no sense. Countries can make agreements about it, and you can
           | even take legal action up to the highest court so it can be
           | evaluated, but using your nationality as an argument for what
           | you can do is just plain wrong.
        
             | Sakos wrote:
             | https://choosealicense.com/no-permission/
             | 
             | I always find it weird how people respond to my comments.
             | Why didn't you check what the US law is like for source
             | code? A lot of places have similar laws around source code,
             | primarily in the West because of efforts to normalise laws
             | across countries, driven by US efforts. And other
             | countries? Well, it's the same for any kind of IP. Either
             | the country has strong IP law and you have the resources to
             | pursue an issue or not and you can't do anything about it.
        
         | hansword wrote:
         | If I enter 'Mickey Mouse' into an ML-TTI thing like Craiyon
         | (Dall E mini) do you think I will be able to sell the resulting
         | image on a Tshirt?
         | 
         | No, I won't, because Disney has fancy lawyers, the average open
         | source developer hasn't. What you are saying is: Screw little
         | people, let M$ make their money.
         | 
         | Either copyright is for everyone, or for no one. I prefer the
         | latter, but this is not the world we live in.
        
           | fonix wrote:
           | This is more like entering "cartoon mouse nose" into Craiyon
           | though. You're getting incohesive code snippets returned to
           | you based off a single line (appropriate word for code and a
           | drawing).
        
           | jimnotgym wrote:
           | Isn't this an indictment of the justice system rather than the
           | big firms?
           | 
           | I once heard this quote, "English justice is open to all, in
           | the same way that The Ritz [very expensive hotel] is open to
           | all."
        
             | gilrain wrote:
             | The useless justice system has been engineered by the firms
             | for their benefit.
        
           | hourago wrote:
           | The big difference is that by copying Mickey Mouse you are
           | hurting one of the best-known and most powerful corporations
           | in the world, by copying code you are just hurting open
           | source projects and individual developers.
           | 
           | It should not be different, or if anything, it should be
           | worse to punish people with less resources. But here we are.
        
         | lobocinza wrote:
         | Plagiarism isn't new or incredible.
        
         | the_gipsy wrote:
         | > share their code on github where it is publicly available
         | just to complain when others make use of that knowledge
         | 
         | I put a fucking license on it so that it doesn't get abused by
         | some fucking corporation. Jesus Christ, it's not hard to
         | understand.
        
         | rockbruno wrote:
         | My problem with this conversation is how we can have a 200
         | comment thread without anyone providing any kind of proof to
         | these claims. Is there any instance of this bot printing an
         | actual copyrighted algorithm instead of a mundane
         | uncopyrightable piece of logic?
        
           | sascha_sl wrote:
           | One of the earliest examples was Copilot printing Quake's
           | fast inverse square root verbatim, including swearing in a
           | comment.
           | 
           | Quake's source code is GPL.
           | 
           | There are plenty more if you're willing to look.
        
           | Xunjin wrote:
           | The famous "burden of proof" fallacy. In the end, I'm eager
           | for anyone who can prove it to sue them and see the results
           | of it.
        
           | dgb23 wrote:
           | There are examples of it providing literal copies of code
           | without attribution etc.
        
         | [deleted]
        
         | pmarreck wrote:
         | I think copilot is amazing. I don't care what, if any, of my
         | code snippets it uses because I also gain from it by skipping
         | boilerplate (as well as things like bash idiosyncrasies). Using
         | it feels like I am working with dozens of invisible
         | collaborators
        
         | lin83 wrote:
         | > Instead of marvelling at the human ingenuity that went into
         | creating it, they sneer at the audacity of openAI to do
         | something without first asking their permission.
         | 
         | Something being cool doesn't exempt it from discussion of its
         | ethics and certainly doesn't exempt it from legal consequences.
         | What people call "disruption" is often just exploiting
         | resources/people/their work in unsustainable ways until
         | oversight is introduced.
         | 
         | If CoPilot is copy/pasting large amounts of code with unknown
         | licenses, that is a large and real risk for users, aside from
         | violating open source projects' licenses.
        
           | leereeves wrote:
           | > Something being cool doesn't exempt it from discussion of
           | its ethics and certainly doesn't exempt it from legal
           | consequences.
           | 
           | Indeed. The heist in Ocean's Eleven was cool, but it was
           | still theft.
        
           | moffkalast wrote:
           | Moreover it's a genuine danger for non-hobbyist developers
           | since you could be including stolen code into a market
           | product.
           | 
           | Even including something banal like Linux is already
           | problematic since it's GNU licensed, which by extension makes
           | your entire project GNU licensed and you can't keep the
           | exclusive rights to it.
        
             | ryukafalz wrote:
             | Just to clear this up, since I've heard this a lot before:
             | 
             | > since it's GNU licensed, which by extension makes your
             | entire project GNU licensed and you can't keep the
             | exclusive rights to it
             | 
             | This is incorrect. Including GPL code in your product
             | cannot automatically relicense your code. It's just a
             | copyright violation if your product's license isn't GPL-
             | compatible and you don't abide by the GPL.
        
         | OrwellianTimes wrote:
         | Fully agreed. It's just people getting mad and jealous but hear
         | me out.
         | 
         | Copilot is NOT SELLING code other people wrote, it is simply
         | acting as a curator to show you all the solutions people HAVE
         | WRITTEN for free.
         | 
         | Copilot does NOT write entire programs, it's simply an
         | assistant. And there is not much copyright you CAN apply to 3-4
         | lines of generally understandable code.
         | 
         | I've used Copilot and am actively paying for it, and I have not
         | seen many cases where it's generating bad code. It's only there
         | to remove boilerplate and common problems, not there to write
         | entire applications.
         | 
         | Why are people getting so salty?
        
           | boesboes wrote:
           | Because they _are_ verbatim copying code and not respecting
           | the license. It's not that complicated.
           | 
           | Github knows better, can do better and should.
        
             | olalonde wrote:
             | Do you have an example of Github Copilot doing that? Like a
             | snippet of code generated by Copilot and a link to the
             | original source code.
        
               | falcolas wrote:
               | An example posted here on HN.
               | 
               | https://news.ycombinator.com/item?id=27710287
        
               | olalonde wrote:
               | Thanks. Personally, I feel like such small and widely
               | used mathematical algorithms should not be copyrightable
               | (or using them should fall under fair use). It even has
               | its own Wikipedia page[0], where the source code is also
               | reproduced without copyright notice.
               | 
               | [0]
               | https://en.wikipedia.org/wiki/Fast_inverse_square_root
        
               | falcolas wrote:
               | It's the verbatim replication of the comments that makes
               | this a damning piece of evidence against the "it's not
               | copying code, it's an AI" argument.
        
               | olalonde wrote:
               | Yes, it is clearly copying code from Quake, I wasn't
               | denying that.
        
               | zzo38computer wrote:
               | I also implemented this algorithm in MMIX:
               | 
               |     % Constants
               |     FISRCON GREG #5FE6EB50C7B537A9
               |     THREHAF GREG #3FF8000000000000
               |     % Save half of the original number
               |             OR $2,$0,0
               |             INCH $2,#FFF0
               |     % Bit level hacking
               |             SRU $1,$0,1
               |             SUBU $0,FISRCON,$1
               |     % First iteration
               |             FMUL $1,$2,$0
               |             FMUL $1,$1,$0
               |             FSUB $1,THREHAF,$1
               |             FMUL $0,$0,$1
               |     % Second iteration
               |             FMUL $1,$2,$0
               |             FMUL $1,$1,$0
               |             FSUB $1,THREHAF,$1
               |             FMUL $0,$0,$1
               | 
               | (Note this assumes that the input number is not too
               | small; if it is, then it will not be possible to compute
               | half by this algorithm. Also, like with the original
               | code, the second iteration may be omitted if desired.)
               | 
               | (I agree to release this comment, the MMIX code it
               | contains, and all other comments that I wrote on here to
               | the public domain.)
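
For readers who do not read MMIX, the same double-precision trick can be sketched in Python. This is an independent illustration, not a translation checked against an MMIX assembler: it reuses the 64-bit constant from the listing, computes the half with an ordinary multiplication instead of adjusting the exponent bits, and the function name and `iterations` parameter are made up for the example.

```python
import struct

MAGIC = 0x5FE6EB50C7B537A9  # 64-bit constant (FISRCON in the MMIX listing)
MASK64 = 0xFFFFFFFFFFFFFFFF


def fast_inv_sqrt(x: float, iterations: int = 2) -> float:
    """Approximate 1/sqrt(x) for a positive, normal double."""
    half = 0.5 * x
    # Reinterpret the double's bits as an unsigned 64-bit integer.
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    # Bit-level initial guess: magic constant minus half the bit pattern.
    bits = (MAGIC - (bits >> 1)) & MASK64
    y = struct.unpack("<d", struct.pack("<Q", bits))[0]
    # Newton-Raphson refinement: y <- y * (1.5 - (x / 2) * y * y).
    for _ in range(iterations):
        y = y * (1.5 - half * y * y)
    return y
```

Passing `iterations=1` corresponds to omitting the second iteration, at the cost of a less accurate result.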
        
         | nerdponx wrote:
         | Both things can be true. It's clear that it violates the
         | licenses of many software projects. But I do agree that
         | denigrating it as "just selling other people's code" is missing
         | the whole point of the product and of what you pay for when you
         | subscribe to it.
        
         | nixpulvis wrote:
         | You should read more about people's ideologies and philosophies
         | of Open Source.
         | 
         | One big reason I support it is because it grants me the right
         | and ability to change things I need/want to change.
        
         | B1FF_PSUVM wrote:
         | > negative nancies
         | 
         | Not bad for everyday use - I like "nattering nabobs of
         | negativism" (as scripted by William Safire), but it is really a
         | bit over the top.
        
         | rambojazz wrote:
         | Sounds like they're not selling any of your code
        
           | barthvr wrote:
           | Copilot access is $10/month.
           | 
           | Think about how Napster was treated back in the day, or
           | torrent websites. You pay to access some copyrighted content.
           | Is it legal?
        
         | throwoutway wrote:
         | I hear you, but this isn't a "marvel at this free open clever
         | academic thing we built"
         | 
         | It's a product by a business. Why is that not open to
         | criticism?
        
         | meheleventyone wrote:
         | They own their code and it either has a license for use or is
         | implicitly rights retained if not. If Copilot regurgitates
         | their code, from a project that is public but with a non-
         | permissive license they are having their IP rights violated so
         | are totally correct in being unhappy about that.
         | 
         | Just because you've made something cool doesn't give you the
         | right to harm others in the process.
         | 
         | If MS or OpenAI don't think this is the case then they should
         | have also included their private repositories.
        
           | Zambyte wrote:
           | > from a project that is public but with a non-permissive
           | license
           | 
           | Permissive or not doesn't matter. Public Domain or not is
           | what matters. Permissive licenses still require you to
           | propagate the copyright notice, which Copilot strips.
        
           | nojs wrote:
           | It doesn't really "regurgitate code" all that much in
           | practice though. It's a super impressive product and these
           | arguments seem more like people looking for an excuse to hate
           | new, scary technology.
        
           | core-utility wrote:
           | Do we have any evidence that copilot _doesn't_ check/filter
           | by license?
        
             | bayindirh wrote:
             | There was a tweet by Nora Tindall (since deleted) with a
             | screenshot of an email directly from GitHub stating that GPL
             | code is included in Copilot's training data and that Copilot
             | will indeed use it.
        
             | samatman wrote:
             | This is _in fact impossible_.
             | 
             | All they could do is filter by the LICENSE file in the
             | repo.
             | 
             | Unfortunately for them, by law copyright and license are
             | determined _by the authors_ and merely represented by a
             | LICENSE file, which could be lying about both.
             | 
             | The court isn't going to accept that excuse when this goes
             | to trial.
        
               | gjadi wrote:
               | And you can have multiple licenses in the same
               | repository, folders with copyright exceptions, etc.
               | 
               | It's hard enough for us humans to find our way in this
               | mess, I've little hope for an AI.
               | 
               | But maybe it's just the first step. The final step being
               | able to sell an AI that understands Copyright management.
               | I'm sure there is a big market for that.
        
               | mroche wrote:
               | I feel like a few guidelines and standards could help
               | simplify a baseline process:
               | 
               | 1) Require each repository to opt-in to be learned from.
               | 
               | 2) Require any source file used for learning to have an
               | SPDX license heading.
               | 
               | 3) Have a list of approved permissive licenses to avoid
               | any proprietary or copyleft arguments.
               | 
               | Using SPDX headings as the explicit guide would solve the
               | problem of different code content using a different
               | license within a project. An example being QtWayland: the
               | client pieces are Proprietary/LGPL/GPL, whereas the
               | compositor parts are Proprietary/GPL. That's not
               | something you'd know from the license files at the root
               | of the project (and post-6.3 they use SPDX instead of the
               | prior license template heading).
               | 
               | Granted, this doesn't solve the problem of the chain of
               | trust (is the individual publishing the code truly the
               | copyright owner), but I think it would be a basic start
               | for a program like this. The opt-in nature would make
               | things... difficult, but I think that's a fair trade-off
               | for something like this.
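               | 
               | A minimal sketch of the three guidelines above, assuming
               | a hypothetical allowlist and a hypothetical per-repository
               | opt-in flag (neither is anything GitHub actually exposes).
               | It only reads the SPDX header of a single file and, like
               | the proposal itself, does nothing about chain of trust:

```python
import re

# Hypothetical allowlist of permissive SPDX identifiers (an assumption
# made for illustration, not an official or complete list).
APPROVED_LICENSES = {"MIT", "BSD-2-Clause", "BSD-3-Clause", "Apache-2.0", "ISC"}

# Matches a header such as "// SPDX-License-Identifier: MIT".
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*(.+)")

# Only the first few lines count as the "license heading".
HEADER_LINES = 5


def spdx_expression(source_text):
    """Return the SPDX expression found in the file header, or None."""
    for line in source_text.splitlines()[:HEADER_LINES]:
        match = SPDX_RE.search(line)
        if match:
            expr = match.group(1).strip()
            # Drop a trailing block-comment closer such as "*/" or "-->".
            return re.sub(r"\s*(\*/|-->)\s*$", "", expr)
    return None


def may_train_on(source_text, repo_opted_in):
    """Apply the three rules: opt-in, SPDX heading, approved license."""
    if not repo_opted_in:
        return False
    # Compound expressions (OR/AND/WITH) are rejected conservatively,
    # because only a single allowlisted identifier is accepted.
    return spdx_expression(source_text) in APPROVED_LICENSES
```

               | Under this sketch a dual-licensed header such as the
               | QtWayland one is rejected, since only a single approved
               | identifier passes the check.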
        
               | gjadi wrote:
               | Yes a standard would probably solve the issue.
               | 
               | But until lawyers push for a standard that would make
               | this part of their work irrelevant, I can't see how it
               | could happen :)
        
               | mnd999 wrote:
               | And that is why this project should never have made it
               | past the brainstorming session.
        
             | meheleventyone wrote:
             | One of the (ex?) programmers from Valve managed to get it
             | to spit out parts of the Source engine verbatim. He posted
             | a Twitter thread yesterday I believe.
        
               | leakbang wrote:
               | Can you post the link to that?
        
               | meheleventyone wrote:
               | Sure: https://twitter.com/ChrisGr93091552/status/15397316
               | 329318031...
        
               | dekhn wrote:
               | 3 lines of fairly generic code?
               | 
               | That's not what copyright is protecting.
        
               | meheleventyone wrote:
               | Just for the record I was providing some evidence to
               | support this question: "Do we have any evidence that
               | copilot doesn't check/filter by license?"
        
               | dekhn wrote:
               | I mean, even if a license was placed on the code, that
               | doesn't matter: if it's not protected by copyright, then
               | it's fair game for copilot to scrape, learn from, and
               | emit variations of the code.
               | 
               | I believe github's lawyers would have had hundreds of
               | hours of discussion about this and at this point, they
               | believe they are in the right, and anybody who disagrees
               | should use the legal system to resolve the matter.
               | 
               | In the meantime, what it is and isn't doing wrt licenses
               | seems to be poorly understood externally.
        
               | mustyoshi wrote:
               | Does that prove it ignores licenses or does that imply
               | the source engine exists verbatim (minus licenses)
               | multiple times on Github?
        
               | micromacrofoot wrote:
               | just because someone else ignored the license doesn't
               | mean github is free to blindly vacuum that up
        
               | meheleventyone wrote:
               | If it's minus a license then it should be assumed that
               | rights are retained (in the same way you can't just take
               | ownership of an image you find on the internet) so if it
               | were filtering it shouldn't take code from repos without
               | explicit and favorable licenses. If it is taking code
               | only from repos with permissive licenses (e.g. MIT) then
               | why aren't they following the attribution requirements?
               | 
               | I don't think you can have your cake and eat it on this
               | one.
        
               | moffkalast wrote:
               | If I steal some code and put it on Github under MIT that
               | doesn't really make it MIT, I'm just lying that it is. If
               | Copilot then uses that it's still in violation of the law
               | I'd assume (ignorance doesn't exonerate you etc.). So
               | they'd have to verify on a case by case basis, which they
               | obviously haven't given the volume of data they had to
               | feed the thing.
               | 
               | It's kinda shocking that they think they can sell this,
               | even providing it for free is extremely sketchy but at
               | least complies with BSD/GNU/CC licensed stuff I guess.
        
               | Hamuko wrote:
               | And especially with such blanket statements as "the code
               | you write with GitHub Copilot's help belongs to you".
        
               | lupire wrote:
               | Why do you think that the recipient is responsible for
               | verifying that no one else has copyright of code they
               | received under license?
               | 
               | Is every product user liable when a vendor ships some
               | stolen code?
        
               | Closi wrote:
               | > Is every product user liable when a vendor ships some
               | stolen code?
               | 
               | The user would be unlicensed, and in lieu of the vendor
               | resolving this then the user would need to purchase
               | licences to continue using the software legally (ie if a
               | vendor gives you a pirate version of photoshop, you can't
               | just use it forever just because someone sold it to you).
               | 
               | There are usually clauses in enterprise software
               | agreements that attribute liability for unlicenced
               | components to the vendor for this reason. But ultimately
               | if there isn't a contract or the vendor vanishes, the
               | user will need to go get a licence.
               | 
               | If you want to test the theory, I'll send you a few
               | images to put on your website, and when you get a claim
               | through from the copyright owner you can try to argue
               | that I sent it across without a copyright notice so I am
               | liable ;)
        
               | ryukafalz wrote:
               | > Is every product user liable when a vendor ships some
               | stolen code?
               | 
               | No, but the difference is the users of a product are
               | typically not making and distributing copies. That's not
               | the case if you use someone else's code in your project.
        
               | Closi wrote:
               | It would prove that it doesn't honour all licences - just
               | because the source code exists on Github without a
               | licence doesn't automatically grant a licence to Copilot
               | from a legal perspective.
        
           | jstummbillig wrote:
           | In light of this potential new paradigm it's bewildering how
           | people still manage to focus on the license of training
           | material as if it even moved the needle in this context, even
           | a little bit.
           | 
           | OSS knights: THE LICENSE.
           | 
           | MS: Aight, I guess we have a few lines of hq src to help out
           | with...
           | 
           | Github: Same.
           | 
           | Other OSS people: We really don't care one way or the other.
           | 
           | As long as the word of the license was upheld for another 2
           | weeks before it ceased to matter for the rest of all time.
           | 
           | Jesus fucking christ. People. I get that oss licensing is
           | dear to the collective hn heart - but, at best, it's
           | completely irrelevant in regards to where this will
           | inevitably lead, regardless of current questions/issues with
           | license violations. You can (if all the repos of MS and
           | Github are not enough to train this thing on, which is a
           | laughable idea) even fucking buy additional source code if
           | that's what it takes to strengthen Copilot's legal foundation.
           | The cost is insignificant. People will be happy to sell for
           | super cheap. It's a non issue.
           | 
           | Why do you wilfully choose to be distracted instead of facing
           | and thinking about the future together?
        
           | causi wrote:
           | Unfortunately the way IP law works, at least in the US, is
           | that you can use essentially whatever you want as training
           | data and it's up to the user to make sure none of the
           | generated code violates licensing agreements.
        
             | SahAssar wrote:
             | If that's the case then GH/MS should at least disclose that
             | for the code generated to actually be legal you have to
             | hunt down the actual source (will be hard in a lot of
             | cases) and check the license against your own license.
        
             | monocasa wrote:
             | Can you point to case law backing that up?
        
               | causi wrote:
               | Sure.
               | 
               | https://jtip.law.northwestern.edu/2021/05/28/copyright-
               | issue...
               | 
               |  _However, even if infringement occurs during machine
               | learning, training AI with copyrighted works would likely
               | be excused by the 'fair use' doctrine.[ii] For example,
               | in Authors Guild v. Google, Inc.[iii], Google had scanned
               | digital copies of books and established a publicly
               | available search function. The plaintiffs alleged that
               | this constituted infringement of copyrights. The Second
               | Circuit held that Google's works were non-infringing fair
               | uses because the purpose of the copying was highly
               | transformative, the public display of text was limited,
               | and the revelations did not provide a significant market
               | substitute for the protected aspects of the originals._
        
               | monocasa wrote:
               | That's training for search to lead to a full copy of the
               | original work with citations, not training for
               | regurgitating verbatim chunks of copyrighted works to be
               | incorporated at scale into other copyrighted works while
               | obfuscating their original source.
               | 
               | The Second Circuit's tests listed in your citation
               | specifically fail in this case. It's not highly
               | transformative since it's just regurgitating snippets to
               | be used in other competing works rather than applying the
               | body of works to a different domain. And it's
               | specifically to provide a market substitute for the
               | protected aspects of the original works.
               | 
               | Additionally, none of this says 'it's all great and it's
               | on the user to figure it out'.
        
               | causi wrote:
               | In the US copyright violation is a strict liability
               | statute. Regardless of whether or not a court directly
               | confirms or denies Microsoft's right to use code in that
               | way, the end developer is still liable for whatever he or
               | she uses.
        
               | monocasa wrote:
               | But that's orthogonal to being able to use whatever you
               | want as training data for an AI.
        
               | causi wrote:
               | Has the exact issue of a remixing AI been tested in
               | court? No. But everything even remotely similar has been
               | deemed legal. Considering the legal and financial backing
               | on both sides of the issue I expect it to go Microsoft's
               | way even if it does end up before a judge.
        
               | monocasa wrote:
               | It clearly fails the
               | 
               | > the revelations did not provide a significant market
               | substitute for the protected aspects of the originals.
               | 
               | test of your cited case law though. The courts clearly
               | drew a line at developing AI to inject snippets of
               | copyrighted works in similar copyrighted works. And in
               | context it would be the developer of the AI at fault (in
               | addition to the end users who also used it to infringe
               | other works in the creation of partially derived works;
               | multiple parties can be at fault).
               | 
               | Basically the courts are making it pretty clear that they
               | would have been against what Google had made if it were
               | suggesting phrases in a plugin for a word processing
               | program to create books that would compete with the
               | original books. But being a separate domain of simply
               | collating existing books and providing better search for
               | their corpus (which led you to the original) was allowed.
        
             | lupire wrote:
             | Did you just make that up? Github is distributing the
             | copied code to users.
        
               | monocasa wrote:
               | They did make it up; their cited case law says nothing of
               | the sort.
        
               | causi wrote:
               | _Did you just make that up?_
               | 
               | Unfortunately not. It's really stupid.
        
           | zarzavat wrote:
           | The entire point of a fair use right is that you _don't_ need
           | the copyright owner's permission to be able to exercise it.
           | Fair use allows you to do things that the copyright owner
           | doesn't like.
           | 
           | Is fair use on a massive scale still fair use? Courts
           | generally think so, otherwise Google would have been out of
           | business a long time ago.
        
             | izacus wrote:
             | Unless Copilot is "commenting" or "parodying" the code
             | you've written, it's not fair use. Copying and using the code
             | in another project sure as heck IS NOT fair use.
        
             | jrumbut wrote:
             | I don't think releasing a commercial product that copies
             | people's code without complying with the license is
             | anywhere near fair use.
             | 
             | Also, the open source community has far less leverage to
             | apply pressure to Google than it does to GitHub. We may be
             | able to do something about this.
        
               | RHSeeger wrote:
               | It seems fairly similar, at least to me, to a search
               | engine copying snippets of other people's web sites and
               | displaying them on a page. Admittedly, there's still some
               | discussion as to whether or not _that_ is fair use, but I
               | think enough of the population think it is (with many
               | news organizations disagreeing).
        
               | CrazyStat wrote:
               | > I don't think releasing a commercial product that
               | copies people's code without complying with the license
               | is anywhere near fair use.
               | 
               | The whole point of fair use is that the license doesn't
               | matter. You can have a license that says I'm not allowed
               | to use what you wrote for any purpose ever and I can
               | _still_ use it under fair use.
        
               | Longlius wrote:
               | IANAL but fair use is primarily about the public
               | interest. What public interest is served by allowing
               | proprietary software vendors to copy GPL code that's
               | reserved for the commons?
               | 
               | I don't really think this argument passes muster.
        
               | jrumbut wrote:
               | Yes but among the four factors that are used to evaluate
               | fair use claims are whether it is being used commercially
               | (it is) and how it affects the market for the thing that
               | was copied (it clearly would since one way code is used
               | is being imported by other code, if Copilot didn't insert
               | my code into the new app, they might very well use my
               | open source project that provides the same code).
        
               | CrazyStat wrote:
               | I wasn't staking a position on whether Copilot is fair
               | use, just pointing out that fair use doesn't care about
               | license.
               | 
               | That said, copilot itself is _not_ a replacement for your
               | open source project that it was trained on. The code it
               | generates may or may not be, but that's probably not
               | Github's problem as far as copyright law is concerned.
        
               | pmarreck wrote:
               | > I don't think releasing a commercial product that
               | copies people's code without complying with the license
               | is anywhere near fair use.
               | 
               | It's just automating the copying and pasting (and slight
               | reworking) of boilerplate code that would normally take
               | me much longer to do, especially when I am working with a
               | language I'm less familiar with but is necessary for my
               | stack. I've literally never seen it suggest code that isn't
               | more or less exactly what I would have come up
               | with given a lot more time. In essence, it eliminates
               | tedium- exactly the point of all of programming: Work
               | elimination.
        
             | kybernetikos wrote:
             | > otherwise Google would have been out of business a long
             | time ago.
             | 
             | I do think there are ethical questions around whether it's
             | right for google to digitise physical books without the
             | permission of the authors, and keep them on their servers
             | and make money from them without recompensing the authors.
             | That's something an individual would not get away with
             | doing, so it seems wrong that it's OK for google.
        
             | akagusu wrote:
             | When Copilot reproduces substantial parts of someone else's
             | code without respecting the license terms, it is not fair
             | use, it is just disguised license abuse.
        
             | meheleventyone wrote:
             | Is this fair use? I don't think that's been established
             | yet. And if it is why didn't MS and OpenAI train it on
             | their private code repositories? Fair use for thee not for
             | me isn't very in keeping with the spirit of that claim.
        
               | komadori wrote:
               | Gosh, can you imagine if they had trained it on their
               | internal source code repositories and it constantly
               | suggested using Hungarian notation for your variables?
               | ;-)
        
               | jimnotgym wrote:
               | Just because there has not been a test case yet does not
               | make it illegal! If MS think it is fair use then they are
               | free to go ahead. Business is all about recognising and
               | assessing risks like this.
        
               | tremon wrote:
               | And even if there had been a legal test case, that does
               | not make it moral! If people think this is socially wrong
               | then they're free to argue their case. Business is all
               | about ignoring ethical quandaries if it gives them an
               | edge.
               | 
               | "Microsoft does it, therefore it must be right" does not
               | a sound argument make.
        
               | aaaaaaaaata wrote:
               | > Business is all about ignoring ethical quandaries
               | 
               | No, _businesses_ are -- not business. Not necessarily...
        
               | jtdev wrote:
        
               | jimnotgym wrote:
               | I sometimes read people's open source code on github and
               | use the ideas from that to develop my own ideas. In fact
               | sometimes I copy and paste short passages and then rework
               | them. I also employ a team of people who may do the same.
               | Is that fair use? Yes, of course it is. Is Copilot
               | automating that fair use? I would say so.
        
               | grayfaced wrote:
               | Or alternately, "I sometimes listen to other people's
               | songs and use those ideas to develop my own. In fact
               | sometimes I copy and paste short melodies and then rework
               | them."
               | 
               | Courts have held that it doesn't apply to music, why do
               | you think different rules apply to code?
        
               | [deleted]
        
               | aahortwwy wrote:
               | Microsoft's internal policies don't allow their employees
               | to do this without legal approval.
        
               | aaaaaaaaata wrote:
               | So they don't ask.
        
               | leereeves wrote:
               | I think aahortwwy's point was that Microsoft won't permit
               | their own employees to do what Copilot does.
        
               | zzo38computer wrote:
               | > I sometimes read people's open source code on github
               | and use the ideas from that to develop my own ideas.
               | 
               | Yes, I too, and probably many people will do.
               | 
               | > In fact sometimes I copy and paste short passages and
               | then rework them.
               | 
               | This I usually don't unless I check the license first.
               | (Everybody ought to be allowed, but sometimes the license
               | might not be.)
        
               | jcelerier wrote:
               | What you are doing is very certainly illegal
        
               | nirvdrum wrote:
               | Many people would claim what you're doing is a derivative
               | work. I'm not sure the "of course it is" is very clear-
               | cut (at least in the US). I've worked at big companies
               | that have lawyers that care very much about this topic
               | and what you're describing is prohibited. But, maybe it's
               | different if you're not distributing your source.
        
               | zarzavat wrote:
               | > I've worked at big companies that have lawyers that
               | care very much about this topic and what you're
               | describing is prohibited.
               | 
               | They are doing this to make sure that any lawsuit can be
               | easily dismissed. It has nothing to do with the legality
               | of the action (which sounds like fair use as the parent
               | described it), and everything to do with the expense of a
               | potential lawsuit compared to the cost effectiveness of
               | simply telling developers "don't do that".
               | 
               | Most people think that the law has two shades: lawful vs
               | unlawful. But the more practical distinction is expensive
               | lawsuit vs dismissed lawsuit. This is the lens through
               | which corporate lawyers see copyright and it might
               | explain why so many programmers think that copilot is
               | "obviously" breaking the law and "stealing" their code.
        
               | nirvdrum wrote:
               | If the usage was very clearly fair use, there'd be no
               | need to be defensive about it; the case could be
               | dismissed trivially. In reality, the question would need
               | to be sorted out in court.
               | 
               | Questions of derivative works and fair use come up fairly
               | frequently even in the open source world. This isn't
               | solely a question of corporate lawyer posturing. I don't
               | know any copyleft authors that would be okay with someone
               | copying & pasting their code, making trivial changes, and
               | saying it isn't a derivative work. Of course, their
               | understanding of the law may be flawed. You'll get to
               | find out in court.
               | 
               | You're right. A lot of this boils down to how much you
               | want to spend in court proving your usage is just under
               | fair use. We've moved beyond the question of ethics if
               | you're intentionally violating a project's source license
               | and relying on fair use to do whatever you want with the
               | code. If you want to poke someone with a stick, you can't
               | be surprised when they hit back. I contend what the OP
               | described isn't _clearly_ fair use (note I'm not saying
               | that it _clearly isn't_ fair use either). It ultimately
               | doesn't impact me because I'm just not going to copy &
               | paste code from projects without attribution and
               | following the license, but I'd be worried about anyone
               | reading that comment as objectively true.
        
               | matharmin wrote:
               | For public repositories, whether copying small parts of
               | code is considered fair use is just a copyright question.
               | 
               | On the other hand, if you copy from private repositories,
               | it quickly gets into the territory of stealing trade
               | secrets.
        
             | dkersten wrote:
             | Fair use is quite narrowly defined though. This doesn't
             | look like fair use to me, especially when it's been shown
             | that copilot does, at least sometimes, spit out code that
             | is completely unchanged from the source material, without
             | advising the user of any license requirements (most
             | permissive licenses require at least attribution).
             | 
             | The SCO vs IBM lawsuit was over only a few lines of code,
             | after all.
             | 
             | I can't use a derivative of Mickey Mouse in my product, even
             | if I change his colour and give him a hat, even if these
             | changes were made by an AI. Why would it be different for
             | code? I can only use Mickey Mouse as fair use if it's done
             | for a specific narrow set of purposes (satire, news
             | reporting etc).
        
             | lupire wrote:
             | "on a massive scale" is one of the legal definitions of
             | unfair use.
        
             | bayindirh wrote:
             | An automated system will devour all my code, which is under
             | a case-tested copyleft license, and regenerate its parts in
             | any place, without respecting the license terms, and call
             | it "fair use".
             | 
             | I have two questions:
             | 
             | 1. Why have licenses, then?
             | 
             | 2. What if I just use leaked sources of closed source
             | software and call it fair use?
        
             | Hamuko wrote:
             | What about those of us who are not Americans?
        
               | zarzavat wrote:
               | Then you need to check the laws in your country. But that
               | is nothing new to copilot. Copyright laws vary
               | _significantly_ from country to country.
        
               | rurban wrote:
               | Not really. They are mostly the same across countries:
               | https://en.wikipedia.org/wiki/International_copyright_treati...
               | 
               | There are just minor deviations, not relevant to this
               | case, such as how long Disney bullied the countries to
               | protect a work.
               | 
               | Software is usually considered a work. The AI needs to
               | know if it has permission to copy and use the code, and
               | then offer derived work on the proper terms and
               | conditions. copilot doesn't do that. It might copy GPL
               | code into non-GPL code, thus violating the GPL license,
               | thus being an extreme risk.
        
               | tzs wrote:
               | What are examples of Disney getting countries to extend
               | copyright terms?
               | 
               | In the US there have only been two extensions of
               | copyright terms since Disney came into existence.
               | 
               | The first was in 1976, as part of a major overhaul of US
               | copyright law to update the previous law (from 1909) to
               | take into account the large changes in technology since
               | then, and to make US law work more like the rest of the
               | world to pave the way for the US later joining the Berne
               | Convention. The changes for Berne compatibility included
               | longer terms.
               | 
               | I assume Disney did support this, but only because as far
               | as I can tell it had pretty widespread support. It had
               | enough support that it would have passed even if Disney
               | had adamantly opposed it.
               | 
               | The second was in 1998, and that was specifically a term
               | expansion (as opposed to a term expansion like that of
               | 1976 that was a side effect of harmonizing US law with
               | the rest of the world). Europe had expanded terms a few
               | years earlier, so the 1998 change in the US might have
               | been motivated at least in part by harmonization, but I
               | don't think the differences in terms between the US and
               | the EU would have been enough to get it passed without
               | some major interests pushing for it, so it is probably
               | fair to give Disney a good part of the credit or blame
               | for this one.
        
           | wowokay wrote:
           | I think you might be missing the point of their frustration.
           | 
           | Lots of companies do not put their code in public
           | repositories, granted I understand the perspective of
           | violating a license, but the point is if you don't want your
           | code used by someone else (even with the risk of not getting
           | credit, don't know why that matters) then don't make your
           | repo public, period.
           | 
           | To that point, what's to stop GitHub from making a policy
           | that states: "All public repositories will be utilized in AI
           | training"?
        
             | ryukafalz wrote:
             | > even with the risk of not getting credit, don't know why
             | that matters
             | 
             | The point is that it's not respecting the license, not just
             | that it's not giving "credit". If I release code under a
             | GPL license, I damn well don't want someone using that code
             | under a license that's not GPL-compatible, no matter how it
             | got there.
        
           | jillesvangurp wrote:
           | I'm sure the MS lawyers thought long and hard about this and
           | are patiently awaiting any actual lawsuits with confidence in
           | their position. It would be very hard to prove ownership of
           | any snippets. To the point where you can argue that it is
           | just fair use and to the point where companies would think
           | long and hard before committing any resources to fighting MS
           | on this in court at great expense.
           | 
           | I don't think that will happen but it might be interesting if
           | it did.
        
             | amelius wrote:
             | It will stop being fair use when someone makes an AI that
             | creates cartoon characters based on the figures in Disney
             | movies.
        
             | vlovich123 wrote:
             | MS is unlikely to be sued here because the infringement
             | claim would be against their users and my guess is the
             | license indemnifies them against you suing them for defects
             | in the tool you use (ie use at your own risk and if you get
             | sued you agree you won't sue us).
        
             | aaaaaaaaata wrote:
             | > companies would think long and hard before committing any
             | resources to fighting MS...at great expense
             | 
             | This is the end of Microsoft's actual calculation.
        
             | kop316 wrote:
             | > It would be very hard to prove ownership of any snippets.
             | To the point where you can argue that it is just fair use
             | and to the point where companies would think long and hard
             | before committing any resources to fighting MS on this in
             | court at great expense.
             | 
             | I would like to point you to this:
             | https://twitter.com/mitsuhiko/status/1410886329924194309 HN
             | Comments at the time:
             | https://news.ycombinator.com/item?id=27710287
        
           | drexlspivey wrote:
           | Owning code snippets sounds ridiculous to me, like can I own
           | this snippet?
           | 
           |     def average(*numbers):
           |         return sum(numbers)/len(numbers)
           | 
           | If not, is it because it is too small? What's the minimum
           | line count at which ownership kicks in? What if I change the
           | function name and the variable names?
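           | 
           | On the renaming question, a rough Python sketch (not a legal
           | test of anything): it parses two snippets, replaces every
           | identifier with a positional placeholder, and compares the
           | resulting syntax trees. The helper names _Renamer and
           | normalized are made up for illustration, and the approach
           | only shows structural sameness, nothing about ownership.

```python
import ast


class _Renamer(ast.NodeTransformer):
    """Replace function, argument and variable names with placeholders.

    Placeholders are handed out in order of first appearance, so two
    snippets that differ only in naming end up with identical trees.
    """

    def __init__(self):
        self.names = {}

    def _placeholder(self, name):
        return self.names.setdefault(name, f"id{len(self.names)}")

    def visit_FunctionDef(self, node):
        node.name = self._placeholder(node.name)
        self.generic_visit(node)  # descend into arguments and body
        return node

    def visit_arg(self, node):
        node.arg = self._placeholder(node.arg)
        return node

    def visit_Name(self, node):
        node.id = self._placeholder(node.id)
        return node


def normalized(source):
    """Return a name-independent dump of the snippet's syntax tree."""
    tree = _Renamer().visit(ast.parse(source))
    return ast.dump(tree)


original = "def average(*numbers):\n    return sum(numbers)/len(numbers)\n"
renamed = "def mean(*xs):\n    return sum(xs)/len(xs)\n"

print(normalized(original) == normalized(renamed))
```

           | The two snippets compare equal once names are stripped, so
           | renaming alone leaves the structure of the code untouched.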
        
             | bayindirh wrote:
             | If that's under a copyleft license, I can't just copy &
             | paste it under my non-copyleft licensed code and call it
             | mine.
             | 
             | That's as simple as that.
        
               | sidlls wrote:
               | It can't be that simple. The function in the GP is not an
               | original idea and is far too simple to merit protection
               | just by slapping a license on it.
        
               | bayindirh wrote:
               | I don't expect, or support, licensing that small amount
               | of code, and suing everyone to oblivion.
               | 
               | The point I'm trying to make is if something is under a
               | copyleft license, you can't copy and paste it verbatim to
               | something non-copyleft. It's _what the license says_.
               | 
               | Also, to be pedantic, the function I'm commenting on is
               | pure maths, and you _can 't license/patent mathematics_.
               | 
               | On the other hand, if there's some magic sauce of doing
               | something, let it be 25 lines, what will you say? It's
               | just 25 lines, so you can't license it? To be more
               | pedantic, I actually have an algorithm, which is around
               | 25 lines and does something novel. I've published a paper
               | on it.
               | 
               | If I license the reference implementation with AGPLv3+,
               | and you use it and close it, and if I can't go after you,
               | what's the purpose of the license?
               | 
               | You can read the paper and try to implement it. It's free
               | in that regard.
        
               | williamcotton wrote:
               | It seems rather silly to me that such small innovations
               | would be worthy of legal protections under either a
               | copyright or copyleft license.
               | 
               | Isn't there already precedent in other forms of IP, such
               | as chord progressions in music, sentence length in
               | literature, etc?
        
               | dekhn wrote:
               | Copyleft isn't really a good example. Let's talk about
               | copyright. That fragment of code is not copyrightable on
               | its own. Too small, too trivial.
        
               | bayindirh wrote:
               | Let's say I have a 25-line function which does something
               | novel and can be published as research (which I did, BTW,
               | no joke), and I opened its reference implementation under
               | AGPLv3+.
               | 
               | Is it again too trivial?
        
               | drexlspivey wrote:
               | is 25 lines the limit then? do you count comments? can I
               | codegolf a few lines to get below the limit?
        
               | bayindirh wrote:
               | > is 25 lines the limit then?
               | 
               | I don't know. That's my function's length.
               | 
               | > do you count comments?
               | 
               | No comments, no blank lines.
               | 
               | > can I codegolf a few lines to get below the limit?
               | 
               | You bet. But, if you copy my reference implementation,
               | you need to get the license as well.
               | 
               | However, the research is in the open. Read it, implement
               | it. That's no problem.
               | 
               | But, CoPilot is not reading my paper. It's reproducing my
               | function verbatim, which is under a license which has
               | share-alike mechanics.
        
               | trasz wrote:
               | Not really. You can't copyright a trivial snippet, same
               | way you can't copyright headers.
        
               | bayindirh wrote:
               | I've provided more realistic and logical examples in
               | this thread, please refer to them.
        
             | cupofpython wrote:
             | If you write it yourself, it's fine. If you directly copy
             | it from somewhere you aren't allowed to copy from, then it
             | is wrong.
             | 
             | There are no rules about the form of the code itself that
             | govern whether or not someone owns it. Common sense
             | applies. Sure, you could "steal" very small, common code
             | snippets and get away with it, but that doesn't make it less
             | wrong.
             | 
             | When a commercial entity explicitly does it, however,
             | sometimes we can catch them. Like if they do it through
             | algorithms whose workings we more or less know - i.e.
             | the algorithm is using advanced control flow logic to copy
             | and paste from its training data set and copyrighted
             | material is in that data set.
        
               | drexlspivey wrote:
               | Point is that you can ask 100 programmers to write an
               | average function and probably most of them will come up
               | with this answer verbatim. How can copyright law handle
               | this? There is also the opposite problem, I can copy a
               | complicated snippet and change the variable names. Am I
               | absolved from liabilities now?
        
               | cupofpython wrote:
               | If they come up with it on their own, it shouldn't be an
               | issue. Likewise, swapping the variable names does not
               | absolve you from liability.
               | 
               | Copyright really is not only concerned with what exactly
               | is on the page, but also how you got there, and where the
               | knowledge came from to get you there.
               | 
               | What if I read your codebase, and then years later while
               | programming for myself I inadvertently use solutions you
               | came up with while thinking I came up with it myself?
               | 
               | There really are no hard set rules, and this is something
               | that is handled on a case-by-case basis based on whether
               | or not a convincing argument can be made that you copied
               | a novel idea from someone else and claimed it as your
               | own.
               | 
               | We can argue the semantics of it all we want, but the
               | subject area is an active battleground. Typically it only
               | matters when money starts to get involved, since no one
               | usually presses the issue or gets involved with random
               | personal projects. So when an enterprise level company
               | leverages that lack of caring into a proprietary pay-to-
               | use project that operates by copying and pasting code
               | from copyrighted material, then it seems like a case
               | might be able to be made for it.
        
             | ipaddr wrote:
             | Someone trademarked the word THE yesterday and a few common
             | musical notes and your video gets banned
        
         | highwaylights wrote:
         | This seems disingenuous.
         | 
         | People don't have a problem that AI is being used in some form
         | to provide the service.
         | 
         | The complaint is pretty clearly that code is being lifted from
         | repositories without attribution or compensation, and being
         | redistributed into other applications.
         | 
         | How impressive the work behind copilot is or is not really
         | isn't relevant.
        
           | tiborsaas wrote:
           | I've made use of a ton of open source tools and have not paid
           | any attribution or compensation. By made use of, I mean I
           | used them as their intended purposes and not their source
           | code. I have a FOSS OS, server, CMD tools and libraries
           | powering my ideas, it's part of the deal that I don't have to
           | pay.
           | 
           | If I modify them I know what I have to do, but Co-pilot is
           | somewhere in-between the two, it's abstracting knowledge from
           | these codebases. We don't yet know how to deal with it
           | properly, but this will change with time, that's why having
           | these conversations is important.
           | 
           | I think that AI models will gain a new legal state, whatever
           | they learn will be considered original work if it's not
           | repeating non-trivial work 1:1.
        
             | moffkalast wrote:
             | > it's not repeating non-trivial work 1:1
             | 
             | But that's basically all copilot does? It's just a fancy
             | compression system with a search function.
        
               | tiborsaas wrote:
               | No, it customizes the snippets to your context, the code
               | is synthesized and not looked up in a db like a web
               | search engine.
        
               | ay wrote:
               | I tried it for the first time today, so treat this with a
               | grain of salt.
               | 
               | https://twitter.com/ayourtch/status/1539928018138931200
               | is my experiment. The code in question has a very
               | specific format - it's C with a _lot_ of macro sauce. I
               | described the intent in the comment and pasted the
               | includes lines. Then I started the #define of a unique
               | looking token, and it added the lines with the correct
               | boilerplate. What you see in gray is more boilerplate
               | that it suggests when prompted.
               | 
               | I would dare to assert that "xxxayourtchtestxxx" is not
               | going to be in anyone else's code than mine.
               | 
               | So you can see the example of copilot generating
               | completely new code.
               | 
               | Not saying it's 100% of what it does - but this side
               | looks very useful.
               | 
               | I also did a test with Rust: described a function
               | canonicalizing a MAC address, and then when it saw #[test]
               | prompts, it started to make very passable unit tests for
               | the function which was not even written yet - it was only
               | the comment of what it would do.
               | 
               | Also a massively useful lever to have, if it can do so
               | consistently.
               | 
               | My attempts to make it generate a bug-free
               | canonicalization function didn't work - but it was
               | interesting to see it try different approaches based on
               | the existing test code (and no, they didn't always
               | satisfy the tests, unlike one would expect :)
               | 
               | So this angle is "pair programming with a creative
               | novice", which also can be useful - it can give ideas to
               | explore that you didn't think of.
               | 
               | Of course this was all fairly trivial code, I do not know
               | yet how it will behave in a more tricky situation.
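               | 
               | For a sense of scale, here is a Python sketch of the kind
               | of function and tests described above. The experiment
               | itself was in Rust; the name canonicalize_mac, the
               | accepted input formats, and the lowercase colon-separated
               | output are all assumptions made for illustration.

```python
import re


def canonicalize_mac(text):
    """Return a MAC address as lowercase, colon-separated hex pairs.

    Assumed inputs: aa:bb:cc:dd:ee:ff, AA-BB-CC-DD-EE-FF,
    aabb.ccdd.eeff or aabbccddeeff. Separators are simply dropped,
    so this sketch is lenient about where they appear.
    Raises ValueError if 12 hex digits do not remain.
    """
    digits = re.sub(r"[:\-.]", "", text.strip())
    if not re.fullmatch(r"[0-9A-Fa-f]{12}", digits):
        raise ValueError(f"not a MAC address: {text!r}")
    digits = digits.lower()
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def test_canonicalize_mac():
    # The sort of unit tests one might write from the comment alone.
    assert canonicalize_mac("AA-BB-CC-DD-EE-FF") == "aa:bb:cc:dd:ee:ff"
    assert canonicalize_mac("aabb.ccdd.eeff") == "aa:bb:cc:dd:ee:ff"
    assert canonicalize_mac("aabbccddeeff") == "aa:bb:cc:dd:ee:ff"


test_canonicalize_mac()
```

               | Even at this size there are edge cases (misplaced
               | separators, stray whitespace) where a generated version
               | could plausibly go wrong.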
        
               | moffkalast wrote:
               | But it kind of is when you think about it. Network
               | weights are just a db written in an incomprehensible
               | format and the synthesis part is searching and converting
               | it back to readable data.
               | 
               | Even if it changes the var names and formatting a bit,
               | it's still at best highly derivative. And at worst it
               | spits out the exact code verbatim.
        
               | tiborsaas wrote:
               | > Network weights are just a db written in an
               | incomprehensible format
               | 
               | That makes all the difference IMHO, its complexity makes
               | it much more than "just a DB". The synthesis part takes
               | into account the context also, so it does intelligent
               | things automatically, a smart SQL query does not.
               | 
               | My brain also works kinda like this. My knowledge is
               | encoded in an incomprehensible format and I convert my
               | knowledge into code based on the problem at hand.
        
           | csee wrote:
           | This is how it always works, though. Moderna is standing on
           | the shoulders of centuries of cumulative human knowledge
           | without compensating all the sources of that knowledge.
           | Musicians learn from other musicians and imitate to an
           | extent, which is why all the musicians in a genre sound very
           | similar, and we don't see present day rappers compensating
           | the previous generation of rappers.
           | 
           | This is where some modest taxation comes in. To reallocate a
           | slice of the output of value creation to its actual source in
           | a rough kind of way wherever more direct compensation isn't
           | feasible.
        
             | cycomanic wrote:
             | > Musicians learn from other musicians and imitate to an
             | extent, which is why all the musicians in a genre sound
             | very similar, and we don't see present day rappers
             | compensating the previous generation of rappers.
             | 
             | You clearly don't know how copyright around sampling works.
             | Yes rappers are paying shitloads to previous generation
             | musicians for samples they use.
        
               | csee wrote:
               | Sure, if we're talking about sampling, which is analogous
               | to co-pilot copy and pasting chunks of code verbatim
               | (which we've seen happen). But the complaints about co-
               | pilot go far deeper than that. Quoting from the tweet:
               | "it _just_ sells code other people wrote ". Do musicians
               | "just" copy from all the people they've been inspired by
               | and learned from?
        
               | cycomanic wrote:
               | What does "inspired" mean in the context of a computer
               | program?
        
             | Dracophoenix wrote:
             | > This is where some modest taxation comes in. To
             | reallocate a slice of the output of value creation to its
             | actual source in a rough kind of way wherever more direct
             | compensation isn't feasible.
             | 
             | I was with you until this statement. The vast majority of
             | society consumes, but doesn't create something new in the
             | process. I'm bewildered as to why you think taxation is a
             | solution rather than a disincentive towards creating. As
             | far as compensating the giants upon whose shoulders most
             | stand, there are plenty of vehicles for that: royalties,
             | patents, copyrights, pensions, awards and prizes, paid
             | fellowships, etc. These are relatively easy to calculate
             | and write a contract for.
        
             | jacquesm wrote:
             | Yes, but those humans are humans, not machines. With
             | machines the scale changes dramatically. Which,
             | incidentally is something copyright law has addressed
             | explicitly: if you mechanically transform at best you end
             | up with a derived work.
        
               | csee wrote:
               | I don't understand the difference between Co-Pilot on the
               | one hand and Moderna (on the shoulders of medical
               | research) or SpaceX (on the shoulders of physics
               | knowledge and cumulative rocket engineering knowledge) on
               | the other. They all heavily use technology, automation
               | and machines. I don't see where the distinction is coming
               | from, and if there is a technical legal distinction, is
               | it an ethically important one?
        
               | lelanthran wrote:
               | > I don't understand the difference between Co-Pilot on
               | the one hand and Moderna (on the shoulders of medical
               | research) or SpaceX (on the shoulders of physics
               | knowledge and cumulative rocket engineering knowledge) on
               | the other. They all heavily use technology, automation
               | and machines. I don't see where the distinction is coming
               | from, and if there is a technical legal distinction, is
               | it an ethically important one?
               | 
               | They are all in compliance with intellectual property
               | laws? Seriously, that's a bloody big difference.
               | 
               | Co-pilot is _not in compliance with the licenses of much
               | of the source code it is using!_
               | 
               | Whether you like it or not, compliance with the law is
               | necessary.
        
               | meheleventyone wrote:
               | There are thousands of novel decisions in the work of
               | Moderna and SpaceX beyond their cultural starting points.
               | Same thing with art. Copilot isn't inventing nor is
               | DALLE-2 being artistic.
        
               | jacquesm wrote:
               | The distinction is a legal one: intellectual property can
               | not be re-used without permission of the rights holder,
               | be it a patent or a chunk of source code.
               | 
               | And you can bet that SpaceX, in using physics knowledge and
               | cumulative rocket engineering knowledge, is very careful
               | to either license the tech they use or be very explicit
               | about documenting their own.
               | 
               | That you can't see the difference is entirely on you,
               | going 'against the flow' of society sometimes leads to
               | change but more often it simply results in friction and a
               | lack of comprehension.
               | 
               | Keep in mind that open source is based on copyright law,
               | and without copyright law the protections that open
               | source offers would be gone.
               | 
               | To give an extreme example: if you had a chunk of
               | software that was constructed in such a way that it would
               | spit out a complete copy of 'the Gimp' without the
               | license file if you started to write an image processing
               | program that would be a very clear case of copyright
               | violation.
               | 
               | If you then start breaking the Gimp down into smaller and
               | smaller re-usable fractions at some point you might be
               | able to argue that such a generic and oft used snippet
               | should be free of copyright. But that only works as long
               | as you then don't string together a whole pile of pieces
               | that you each copied somewhere else, the whole idea is
               | that your creation is an original one.
               | 
               | Medical research (which quite often leads to patents,
               | which I don't believe should be possible, especially if
               | that research was publicly funded) and physics knowledge
               | are of a different kind than copyrighted program code.
               | The latter would be better compared to universally
               | present language constructs and constraints, such as
               | 'memory management', 'data manipulation' etc. Once you
               | make those explicit in an implementation copyright
               | applies.
               | 
               | Or, to make another analogy: it's like comparing the
               | skill of writing to the product of that skill. The skill
               | isn't protected, but the output of the act of writing is.
        
               | Xunjin wrote:
               | An amazing argument and analogy. I also agree about
               | medical research: making it possible to patent work that
               | is publicly funded is an A*** move.
        
               | Ygg2 wrote:
               | > I don't see where the distinction is coming from
               | 
               | Humans use reasoning. Copilot just guesses the likeliest
               | next word.
               | 
               | See the Quake fast inverse sqrt code incident:
               | https://twitter.com/mitsuhiko/status/1410886329924194309
        
         | olalonde wrote:
         | > Sorry for the unproductive tone of this comment, but there's
         | something about the attitude of this tweet that really grinds
         | my gears.
         | 
         | FWIW the author appears to be a professional woke activist.
        
           | ParetoOptimal wrote:
           | You can be both a professional software developer and a
           | caring human that considers ethics and exercises empathy.
           | 
           | It's easy to convince oneself they can only be a professional
           | developer to escape ethical responsibilities which require
           | significant time and energy.
        
             | olalonde wrote:
             | Of course you can and you should. But going from the
             | Twitter bio and personal website, it doesn't appear to be
             | the case here. They're an activist who lives from
             | soliciting donations and selling 30$ videos on how to be
             | anti-racist (like, literally).
        
               | ParetoOptimal wrote:
               | > They're an activist who lives from soliciting donations
               | 
               | Assuming this is the case, are you concluding they don't
               | have time to be a real software developer?
               | 
               | Maybe they were a professional developer, but now are an
               | activist 50% of the time.
               | 
               | > selling 30$ videos on how to be anti-racist
               | 
               | Is the indictment here that they are a capitalist?
               | 
               | My basic point is you seem to really want to dismiss
               | their views you don't like by arguing they aren't
               | credible rather than attacking their ideas.
        
               | olalonde wrote:
               | > Is the indictment here that they are a capitalist?
               | 
               | No and they are actually anti-capitalist according to
               | their Twitter bio.
               | 
               | > My basic point is you seem to really want to dismiss
               | their views you don't like by arguing they aren't
               | credible rather than attacking their ideas.
               | 
               | The comment I was replying to already did a good job at
               | that, I was just adding some context.
        
               | dxdm wrote:
               | Doesn't mean what they're saying is wrong. Probably makes
               | more sense to attack the substance of their argument, and
               | not their bio.
        
               | [deleted]
        
               | olalonde wrote:
               | The comment I was replying to already did a good job at
               | that, I was just adding some context because it helps
               | explain the attitude.
        
         | hdjjhhvvhga wrote:
         | > Any time someone invents something new and incredible,
         | there's always a crowd of negative nancies eager to discredit
         | and explain why the invention is nothing new and a detriment to
         | society.
         | 
         | It is not true. Whenever there is something really useful,
         | everybody is happy, and while of course there are always some
         | naysayers, they're very few.
         | 
         | However, when you do something controversial, you can expect to
         | hear criticism. You are of course free to dismiss that
         | criticism, but when a lot of people are telling you what you
         | are doing is unethical, maybe it's time to stop and think about
         | it.
        
         | teakettle42 wrote:
         | My code is shared under a license (MIT) that mandates
         | attribution.
         | 
         | That's all I ask -- if you use my code, give me credit.
         | 
         | Stealing my code to train your bot -- which will replicate
         | portions verbatim! -- is no different whatsoever than the
         | casual plagiarist that copies and pastes a novel snippet
         | manually.
         | 
         | It's absolutely my legal and ethical prerogative to complain
         | about people stealing my code by failing to respect the license
         | under which it was freely provided.
        
           | Xunjin wrote:
           | It would be great if Copilot showed where it found the
           | snippet and gave the person credit for it, even if it's
           | unlicensed.
        
             | seba_dos1 wrote:
             | If it's unlicensed, then you can't use it at all, so giving
             | attribution wouldn't change much in that case.
        
           | tiborsaas wrote:
           | Is it really stealing when your code is used to change a
           | parameter value from 0.3623727247 to 0.3623727321?
        
             | iamevn wrote:
             | It does not matter what the internal representation is.
             | What matters is that Microsoft is selling a tool which
             | reproduces non-public domain works while claiming to grant
             | the user ownership of the output.
        
         | isitmadeofglass wrote:
         | Yes but,
         | 
         | Sorry for the unproductive tone of this comment, but there's
         | something about the attitude of this tweet that really grinds
         | my gears. Any time someone invents something new and
         | incredible, there's always a crowd of negative nancies eager to
         | discredit and explain why the invention is nothing new and a
         | detriment to society. I don't understand why someone would
         | willingly share their code on github where it is publicly
         | available just to complain when others make use of that
         | knowledge. 'co-pilot just sells code other people wrote' is
         | such a ridiculous understatement of what co-pilot does. Instead
         | of marvelling at the human ingenuity that went into creating
         | it, they sneer at the audacity of openAI to do something
         | without first asking their permission.
         | 
         | -- This comment brought to you by HN-Comment-AI (c)
        
           | bryanrasmussen wrote:
           | whoa, I think this should definitely be highlighted far and
           | wide on the internet, think of the ingenuity of the people
           | who made the HN-Comment-AI, it's probably the smartest
           | comment bot out there, able to take the ramblings of people
           | on HN and nonetheless generate a comment so astute!
           | 
           | Although I have to say the use of the phrase 'negative
           | nancies' shows that even the best machine-learning algorithm
           | still comes up with text unlikely to occur in real life.
        
         | Chris2048 wrote:
         | > willingly share their code on github where it is publicly
         | available just to complain when others make use of that
         | knowledge
         | 
         | because it's not unconditional, there are often licence terms
         | of usage, and copilot is potentially laundering those.
        
         | sAbakumoff wrote:
         | It's the negativity bias beauty in action. You have it too.
        
         | gumby wrote:
         | _People_ get paid to write code having learned from writing
         | code for others and from reading code others wrote. In this
         | regard I don't see why github copilot is any different.
        
           | [deleted]
        
           | lupire wrote:
           | People don't memorize chunks of code and copy it.
        
             | giaour wrote:
             | Sometimes people do, and in any case copyright isn't
             | limited to just verbatim copies. You can't, for example,
             | reuse characters or plots from other works of fiction in
             | your own novel, even if you rewrite it in your own words:
             | https://en.m.wikipedia.org/wiki/Copyright_protection_for_fic...
        
             | gumby wrote:
             | People often _do_ write things because they learned a
             | common approach at a previous job or because they saw such
             | an approach when reading someone else's code. People are
             | often hired specifically because they have experience in a
             | certain area from a previous employer, so are doing the same
             | sort of thing at a higher level.
             | 
             | We fought this battle over a couple of decades with remix
             | culture ("you stole that line/beat out of my song!") and
             | the world is better because the over-clingers lost.
             | 
             | There is no shortage of reasons not to like copilot, but I
             | don't consider this one of them.
        
         | pwdisswordfish9 wrote:
         | 'Facebook just sells personal information of other people' is
         | such a ridiculous understatement of what Facebook does. Instead
         | of marvelling at the human ingenuity that went into creating
         | surveillance capitalism, they sneer at the audacity of Facebook
         | to do something without first asking their permission.
        
         | nextaccountic wrote:
         | > I don't understand why someone would willingly share their
         | code on github where it is publicly available just to complain
         | when others make use of that knowledge.
         | 
         | Because they shared the code under a license, and they have the
         | right to complain if people use that code but don't follow the
         | license.
         | 
         | For example, what happens if Github Copilot spits a copy of
         | some copyrighted code verbatim? Is laundering open source code
         | through a machine learning model a loophole for not having to
         | follow the license?
         | 
         | Often following the license is as simple as giving credit to
         | the original author.
        
           | bborud wrote:
           | I've done a fair number of technical due diligence projects
           | on acquisitions and potential partnerships, and on some
           | projects I've hired outside firms to analyze the code and
           | figure out its origins and what licenses apply.
           | 
           | There are tools that will analyze a codebase and identify
           | where chunks of varying size seem to come from. Mostly to
           | determine if the code is encumbered by problematic licenses,
           | but also to detect where the programmers may have borrowed
           | code from.
           | 
           | If memory serves, some of these companies also have closed
           | source codebases in their database, enabling them to detect
           | if unpublished code has been re-used.
           | 
           | The times I've used this in due diligence it has rarely been
           | a deal-breaker when we do find large chunks of code that may
           | be problematic, for instance due to licensing terms that are
           | not acceptable. You just make a note of it and have them
           | rewrite the code before the transaction can take place. (Or
           | you figure out if you can accommodate the license terms.)
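
The kind of origin analysis described above can be sketched roughly as
follows. This is only an illustration under stated assumptions, not how any
commercial scanner actually works: it hashes every run of k consecutive
tokens and reports how many of a candidate's runs also appear in a
reference file. The function names `fingerprints` and `overlap`, the crude
tokenizer, and all three snippets are invented for the example.

```python
import hashlib
import re


def fingerprints(source: str, k: int = 5) -> set[str]:
    """Hash every run of k consecutive tokens (a "shingle") in the source."""
    # Crude tokenizer: identifiers, numbers, and single punctuation characters.
    # Whitespace is skipped, so reformatting does not change the token stream.
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", source)
    return {
        hashlib.sha1(" ".join(tokens[i:i + k]).encode()).hexdigest()
        for i in range(len(tokens) - k + 1)
    }


def overlap(candidate: str, reference: str, k: int = 5) -> float:
    """Fraction of the candidate's shingles that also occur in the reference."""
    cand = fingerprints(candidate, k)
    if not cand:
        return 0.0
    return len(cand & fingerprints(reference, k)) / len(cand)


# Hypothetical snippets, invented for this illustration.
licensed = "def clamp(x, lo, hi):\n    return max(lo, min(x, hi))\n"
reformatted = "def clamp(x,lo,hi): return max(lo,min(x,hi))"
unrelated = "for row in rows:\n    print(row.name, row.total)\n"

print(overlap(reformatted, licensed))  # 1.0: same tokens, different layout
print(overlap(unrelated, licensed))    # 0.0: no shared 5-token run
```

A real tool would presumably also normalize identifiers and index a large
corpus rather than compare two strings, but the matching idea is similar.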
        
             | nextaccountic wrote:
             | Yeah, but wouldn't it be great if the tool that produced the
             | "AI-generated code" were also required to run such an
             | analysis itself, to eliminate the licensing violation at the
             | moment it was inserted?
             | 
             | It's as if Microsoft were banking on the fact that most
             | violations will be unnoticed
        
         | Tryk wrote:
         | This doesn't address the point of the Tweet, you are simply
         | attacking the form of their argument.
         | 
         | Moreover it is possible to BOTH marvel at the human ingenuity
         | that went into making copilot AND disagree with their methods.
         | Some things can be marvelous and wrong at the same time.
        
         | rglullis wrote:
         | > why someone would willingly share their code on github where
         | it is publicly available just to complain when others make use
         | of that knowledge.
         | 
         | For other individuals to collaborate, to make the software
         | available to other people, etc. Certainly not for github's
         | profit and much less for the benefit of github's customers who
         | will have access to open code that violates license agreements.
        
         | matthewmacleod wrote:
         | I also disagree with the tone of that tweet, but your dismissal
         | is equally shallow and gear-grinding.
         | 
         | There are real, serious, and genuinely interesting issues to be
         | discussed regarding Copilot. It is neither "just selling code
         | that other people wrote", nor is it something that we should
         | applaud merely because it demonstrates "human ingenuity".
         | 
         | The comments here regarding this are honestly a total dumpster
         | fire. It's mostly a bunch of paper-thin hot takes, either:
         | 
         | - The blatantly stupid "you willingly shared your code so why
         | are you complaining that one of the world's biggest companies
         | is now hoovering up code from your carefully-selected open-
         | source license and reselling it as a service!!!"
         | 
         | - The blatantly lying "I have literally never looked at any
         | other computer software while developing any, so obviously
         | anybody who has ever seen other source code is a plagiarist"
         | 
         | It's dumb because there is an _actual interesting discussion_
         | here but I guess we're not going to bother having it.
        
           | HumanReadable wrote:
           | Fair enough, I agree.
           | 
           | I actually didn't intend for my comment to be an argument in
           | favour or against, and I am a bit surprised it is the most
           | upvoted of the section.
           | 
           | I agree that there's a pretty interesting discussion to fair-
           | use and the limits of copyright, and that my original comment
           | was not conducive to having that discussion. In my defense,
           | neither was the tweet this thread is about!
        
         | akagusu wrote:
         | > I don't understand why someone would willingly share their
         | code on github where it is publicly available just to complain
         | when others make use of that knowledge.
         | 
         | People like you should understand that publicly available code
         | doesn't mean "do whatever you want" code.
         | 
         | The majority of publicly available code hosted on Github has a
         | license that tells you what you can and what you cannot do with
         | that code.
         | 
         | If someone uses this code without respecting the license,
         | authors have the right to complain and even legally enforce the
         | license if they want.
         | 
         | Now, you should know that there's nothing "cool" about taking
         | other people's work without permission.
        
         | ricardoplouis wrote:
         | Wouldn't you rather have a healthy dose of skepticism and
         | pessimism surrounding new inventions? Even if the negativity is
         | off base, it's far more preferable to a world where everyone is
         | always positive and praises what geniuses the creators are. The
         | former at least breeds discourse while the latter only serves to
         | make people feel good.
        
         | bambax wrote:
         | The world would probably be a better place if there were no
         | copyright.
         | 
         | But the world we actually live in is one where corporations
         | have copyright, and individuals don't.
         | 
         | That's what irks people, I think rightly.
        
         | DoreenMichele wrote:
         | Meanwhile, creators of FOSS projects are often underfunded and
         | lots of people are in such dire straits that rich people talk
         | of mollifying them with a few paltry dollars via UBI rather
         | than fix anything.
         | 
         | That's likely the crux of the issue. If you do it right, you
         | can steal from other people and get rich. Meanwhile, those same
         | people (whose work was stolen) may be left out in the cold no
         | matter how original, creative, hardworking etc they are.
        
         | Mizza wrote:
         | Pretty fucking simple explanation for it, actually:
         | 
         | I don't make Free software so that Microsoft can sell it to
         | people for use in proprietary projects.
        
         | zitterbewegung wrote:
         | Why can't startups understand what an open source license is?
         | Apache 2.0 could be ingested by this tool but it is a horrible
         | license for your database as a service. AGPL would be a great
         | license for a database as a service but should not be ingested
         | by OpenAi / GitHub copilot.
        
         | hk1337 wrote:
         | Usually they want some recognition for their contribution and
         | with GitHub copilot they get none of that.
        
         | jacquesm wrote:
         | They are complaining about license violations, they are not
         | pissing on this incredible (is it?) achievement.
         | 
         | Reselling other people's content like this without attribution
         | (which is a pretty mild form of payment) is not nice. But at
         | least you now have one more reason in the list of reasons why
         | Microsoft acquired Github: to be able to launder their open
         | source contributions and resell them.
        
         | ThePhysicist wrote:
         | I mean I'm not an expert but it's a valid point as people share
         | code under a given license, and as far as I'm aware Copilot
         | does not make this knowledge available. Nothing to do with the
         | fact that Copilot is an amazing technological achievement.
         | 
         | If I, as a human, go to a public repository on Github and
         | copy/paste a non-trivial 200 line code snippet into my
         | proprietary code base I have to abide by the license of that
         | original code, even if I slightly modify it. I don't see how
         | this cannot be true for Copilot. I'm sure the legal folks at
         | Github have thought of a response though, you could e.g. argue
         | that the snippets produced by Copilot are not affected by the
         | copyright of the original author as they do not reach the
         | required threshold of originality. Seems rather shaky to me
         | though.
        
       | thewoolleyman wrote:
       | Artificial Intelligence is causing us to revisit the difference
       | between free as in beer and free as in speech
       | (https://en.wikipedia.org/wiki/Gratis_versus_libre).
       | 
       | It is putting a new spin on some traditional Open Source Lessons
       | (https://en.wikipedia.org/wiki/The_Cathedral_and_the_Bazaar#L...).
       | 
       | People share and reuse unattributed snippets of MIT-licensed and
       | GPL-licensed code on the internet all the time, on StackOverflow,
       | etc.
       | 
       | StackOverflow is profiting from that activity indirectly by
       | facilitating it. They profit passively through ad revenue, and
       | actively through the Teams subscription offering.
       | 
       | But nobody seems too upset about that.
       | 
       | How is an AI which facilitates the same code sharing
       | fundamentally any different? Because it's scraping it itself,
       | rather than humans contributing it?
       | 
       | Seems like a tenuous argument at best.
        
       | antihero wrote:
       | I mean, if it's autocompleting a fairly simple line, and can do
       | that because it's analysed a lot of lines, I don't really see
       | that as "stealing anything".
       | 
       | If you are using it to write whole complex functions that are the
       | same as other people's, I guess that is copying.
       | 
       | But if you do the second thing you are not a great dev, and would
       | have probably ended up copy pasting it anyway.
       | 
       | I think the first use case is far more common, and creating
       | boilerplate that is so generic you could never really attribute
       | it anyway.
        
         | dobin wrote:
         | I don't see it as "stealing" either. The neural network was
         | trained with code as input. It's creating code as output. The
         | output has nothing to do with the input once it is trained. Do
         | people not know how neural networks work?
         | 
         | It's like saying GPT-3 created text is copyright infringement,
         | because some author used the same sentence in a book before.
        
           | eloisius wrote:
           | So if I fit a network to output entire chapters of a book
           | when given the chapter number as input, I can print and sell
           | copies of it that way?
        
           | f1refly wrote:
           | 1. Create a neural network that produces an x264+dts stream
           | of a movie
           | 2. distribute it
           | 3. checkmate copyright lawyers
        
           | ImprobableTruth wrote:
           | Overfitting: One weird trick that copyright lawyers don't
           | want you to know!
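
The memorization worry in these replies can be made concrete with a toy
model. The sketch below is deliberately not a neural network; it swaps in a
word-level lookup table (an n-gram model) because that is the simplest thing
that overfits completely. Trained on a single text, it gives that text back
verbatim when prompted with its opening words. The names `train` and
`generate` and the sample sentence are invented for the illustration.

```python
from collections import defaultdict


def train(text: str, order: int = 3) -> dict:
    """Record which word follows each run of `order` words in the text."""
    words = text.split()
    table = defaultdict(list)
    for i in range(len(words) - order):
        table[tuple(words[i:i + order])].append(words[i + order])
    return table


def generate(table: dict, prompt: str, order: int = 3, limit: int = 50) -> str:
    """Greedily extend the prompt with the first continuation seen in training."""
    out = prompt.split()
    while len(out) < limit:
        followers = table.get(tuple(out[-order:]))
        if not followers:
            break  # context never seen in training: nothing left to emit
        out.append(followers[0])
    return " ".join(out)


# Stand-in for a copyrighted chapter; the sentence is made up for this example.
chapter = (
    "the quick maintainer pushed a small patch and "
    "the tired reviewer merged it before lunch"
)
model = train(chapter)
print(generate(model, "the quick maintainer") == chapter)  # True
```

Whether a large model trained on many works behaves like this for a given
prompt is an empirical question; the toy only shows that "the output came
from a model" does not by itself rule out a verbatim copy.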
        
         | wodenokoto wrote:
         | > If you are using it to write whole complex functions that are
         | the same as other people's, I guess that is copying.
         | 
         | > But if you do the second thing you are not a great dev, and
         | would have probably ended up copy pasting it anyway.
         | 
         | How would I know that the boilerplate I ask copilot to write
         | for me is copied verbatim from a codebase that neither I nor
         | Microsoft has a license to use?
        
         | carom wrote:
         | My problem is with the weights not being released. They are a
         | derivative work of open source code in the most literal sense.
         | The weights would not exist without those lines. Gradient
         | descent is using literal derivatives.
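
As a minimal sketch of the mechanism being referred to, assuming nothing
about how Copilot itself was trained: gradient descent nudges a weight using
the derivative of a loss computed on the training data, so the final weight
is a function of that data. The function name `fit_slope` and both data sets
are hypothetical.

```python
def fit_slope(data, lr=0.01, steps=200):
    """Fit y = w * x by gradient descent on the mean squared error."""
    w = 0.0
    for _ in range(steps):
        # d/dw of mean((w*x - y)^2) is mean(2 * (w*x - y) * x).
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


# Two hypothetical training sets; the numbers are invented for illustration.
print(round(fit_slope([(1, 2), (2, 4), (3, 6)]), 3))  # 2.0
print(round(fit_slope([(1, 3), (2, 6), (3, 9)]), 3))  # 3.0
```

Same code, same starting weight, different training data, different weight.
Whether that dependence makes the weights a derivative work in the legal
sense is a separate question the code cannot answer.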
        
         | afiori wrote:
         | When Oracle won its copyright lawsuit against Google, it was
         | because of an 8-line bounds-checking utility function.
        
           | redox99 wrote:
           | Source?
        
             | afiori wrote:
             | https://news.ycombinator.com/item?id=11722514
        
               | redox99 wrote:
               | Thank you
        
           | dekhn wrote:
           | Not only was that not the only code in question, you left out
           | the conclusion!
           | https://en.wikipedia.org/wiki/Google_LLC_v._Oracle_America,_...
           | It went to the Supreme Court, they concluded fair use, FU
           | oracle.
        
         | alpaca128 wrote:
         | The first can be automated without ML though. And once you use
         | ML you cannot guarantee it won't copy-paste existing code.
         | 
         | This whole thing would be fine if GitHub hadn't just used all
         | public code on their platform, ignoring all involved licenses.
        
           | xupybd wrote:
           | It changes the code for use. I'm not sure it can be
             | considered a copy. It's much like reading someone else's code
           | and drawing ideas and patterns from that code.
        
             | afiori wrote:
             | Copyright sensitive environments are very careful not to do
             | that.
        
             | alpaca128 wrote:
             | It has been shown often enough that Copilot can reproduce
             | exact copies of snippets.
        
           | rob74 wrote:
           | The problem is, if they had used only code with a license
           | that allows copying without attribution, there wouldn't have
           | been a lot of code left...
        
             | alpaca128 wrote:
             | Difficulty doing something legally doesn't justify breaking
             | the law.
        
         | rob74 wrote:
         | > _But if you do the second thing you are not a great dev, and
         | would have probably ended up copy pasting it anyway._
         | 
         | If you do that on your own, it's your (legal) responsibility.
         | If Copilot does it for you, it's GitHub's/Microsoft's
         | responsibility.
        
           | __warlord__ wrote:
           | Why should it be GitHub's/Microsoft's responsibility? No one is
           | forcing you to use copilot.
           | 
           | If I use grammarly, are they responsible for what I am aiming
           | to write?
        
             | purerandomness wrote:
             | Does Grammarly generate pages of content for you?
        
             | 32bitkid wrote:
             | If I pay for grammarly, and it plagiarizes an existing work
             | but represents it as an entirely new, independent work and
             | I am unaware of the existing work that is being stolen, who
             | is doing the stealing?
        
               | scotty79 wrote:
               | If you pay a shady character to get you a modern laptop
               | for $100 you can't claim that you were unaware that it
               | was most likely stolen and the fact that you paid something
               | for it doesn't absolve you morally.
        
               | ClumsyPilot wrote:
               | Does the shady guy have his name on the side of a building,
               | and run ads: "buy my shady stuff" and then pay taxes on
               | his earnings? That kind of shady guy?
        
               | scotty79 wrote:
               | Sometimes. Like Amazon, widely known for their workers
               | and 3rd party vendor exploitation practices.
               | 
               | You can no more claim ignorance of where the github
               | copilot code comes from than where the Amazon's low, low
               | prices come from.
               | 
               | Whether you care is totally on you regardless of whether
               | you pay or not. You pay for a product or service, not moral
               | absolution.
        
               | nnoitra wrote:
        
               | scotty79 wrote:
               | I see you just came from there. Welcome to HN. Please
               | start by reviewing FAQ and Guidelines.
        
               | ClumsyPilot wrote:
               | Are thousands of amazon employees going to be in the same
               | docket with me?
        
               | seanmcdirmid wrote:
               | This makes more sense for text message auto complete: you
               | just take the suggested next word after a one-word start
               | seed, and it might reproduce a Wikipedia entry. But what did
               | you expect? The same would be true with grammarly if you
               | somehow got it to produce a bunch of new text. You
               | expected garbage, but somehow infringed on copyright
               | instead. But I guess I think the user deserves some
               | responsibility in realizing their expected garbage output
               | isn't garbage for some reason.
        
             | ClumsyPilot wrote:
             | So it's my job to check my supplier, to make sure lines
             | from co-pilot are legit.
             | 
             | At the same time when fast fashion companies sell T-shirts
             | made with slave labour, it's not the company's
             | responsibility to check what their suppliers are doing.
             | 
             | And if tesla autopilot kills you and your family it's not
             | their fault either.
             | 
             | Neoliberal morality - companies are never accountable for
             | anything, it's heresy to suggest they should do their job
             | properly.
        
               | wang_li wrote:
               | Other than the first sentence nothing you wrote is true.
               | If a company doesn't do due diligence on their suppliers
               | they face fines and possibly criminal charges. The news
               | came out the other day that the NTSB is considering
               | whether to require Tesla to recall all their vehicles
               | with self driving enabled. Companies of all types face
               | huge fines and civil liability for product safety issues.
        
               | ClumsyPilot wrote:
               | have you never googled "slavery fast fashion"?
               | 
               | Zara's clothes sometimes have notes in their pockets from
               | people being held as slaves, pleading for help. I haven't
               | heard of anyone going to jail
               | 
               | most of our electronic waste ends up illegally exported to
               | poor countries, again when was the last time someone
               | faced the music for that?
        
           | alkonaut wrote:
           | > If Copilot does it for you, it's GitHub's/Microsoft's
           | responsibility.
           | 
           | Is this true? It hasn't been tried yet I assume?
        
           | Hamuko wrote:
           | > _If Copilot does it for you, it's GitHub's/Microsoft's
           | responsibility._
           | 
           | GitHub/Microsoft says that it's still your responsibility.
           | 
           | > _You should take the same precautions as you would with any
           | code you write that uses material you did not independently
           | originate. These include rigorous testing, IP scanning, and
           | checking for security vulnerabilities. You should make sure
           | your IDE or editor does not automatically compile or run
           | generated code before you review it._
           | 
           | I'm not really sure how I am supposed to go about validating
           | that I can in fact use this code that the magical black box
           | barfed into my IDE using a bunch of different weights.
        
             | nnoitra wrote:
             | What horrible, shady behavior.
             | 
             | Give us your money but you are responsible for the code
             | that OUR tool generates.
        
             | dragonwriter wrote:
             | > GitHub/Microsoft says that it's still your responsibility
             | 
             | If Copilot is fair use, and has no restrictive license,
             | then how is it anyone's responsibility?
             | 
             | If Copilot isn't fair use, it's Microsoft's responsibility.
             | 
             | (For copyright; for patent that's another issue, but you
             | can violate patents by similarity without exposure or
             | copying, anyway.)
        
               | Hamuko wrote:
               | _Training_ Copilot is fair use, using Copilot is ???.
        
             | lowercased wrote:
             | Let MS buy BlackDuck scanner and integrate it into
             | GitHub/CoPilot. They could then suggest code and also scan
             | it for any license violations, and give you both sides of
             | the equation in the same tool.
        
           | pronik wrote:
           | You are responsible for your tool use. That's the same
           | discussion as with whether uTorrent is responsible for your
           | torrenting copyrighted stuff or with Tesla's auto-pilot. You
           | buy the tool, you are responsible for what you create with
           | the tool.
        
             | [deleted]
        
             | stavros wrote:
             | Napster was liable for copyright infringement.
        
               | pronik wrote:
               | True, however, the users have been liable too. If my
               | company gets sued because I used Copilot, it won't matter
               | that much that the plaintiff also sued GitHub/Microsoft.
        
               | strictnein wrote:
               | Napster's raison d'etre was copyright infringement.
        
               | stavros wrote:
               | Which they were then liable for.
        
       | Ciantic wrote:
       | I'm a bit mixed on this. The code Copilot usually autocompletes
       | for me is not particularly novel, it's just mundane stuff I would
       | write anyway. Most of these snippets are not copyrightable in my
       | opinion, because they were obvious in the first place. Like CSS
       | nth-child odd / even logic, or in one case it filled in ~10 lines
       | of JS logic for filtering rows by a category stored in dataset,
       | which I would have written anyway.
       | 
       | Then there are cases where it amazes me completely: it wrote 10
       | lines of C++ code for rendering monochrome glyphs with bits
       | using the Freetype library. It did have an odd subtle bug though:
       | the glyphs came out reversed and it worked with only a certain
       | font size, which it seemed to pick up from a different file
       | altogether.
        
       | BiteCode_dev wrote:
       | It is incredible to use though. I pasted the return value of an
       | API call in a comment, then started to write a schema class.
       | Copilot just created the entire class for me. I wanted to extract
       | a subset of the data, I typed get_<_name_of_the_subset>(), it
       | wrote the code I would have written.
       | 
       | So even without using someone else's code, just the pattern
       | understanding and the production of simple boilerplate code is
       | great.
        
       | iLoveOncall wrote:
       | Github Copilot is selling code other people wrote as much as the
       | author of this thread is profiting from words other people
       | invented.
       | 
       | Absolute nonsense.
        
         | nextaccountic wrote:
         | The difference is that words aren't copyrighted and don't
         | come with an open source license.
        
       | coldtea wrote:
       | > _Hector Martin: If you use Copilot, you are basically playing
       | Russian Roulette that the random mashup of existing, copyrighted,
       | heterogeneously licensed code that you get out of it qualifies as
       | an original work, mostly by chance. Or that nobody will ever sue
       | you otherwise._
       | 
       | Well, that's already the case with Stack Overflow copypasta
       | enterprise code. If anything, use of Copilot would be an
       | improvement...
        
         | t0suj4 wrote:
         | That quote applies to any creative work. Be it code, audio or
         | video.
        
           | coldtea wrote:
           | He talks about code, and Copilot works with code, so I'm not
           | sure how it "applies to any".
           | 
           | If you mean that if you make a "random mashup of existing,
           | copyrighted, heterogeneously licensed" works of art
           | (audio/video), it also applies that you might be sued for it,
           | then yes.
           | 
           | But that's not much of an issue with Copilot if you're using
           | it for enterprise code that's already a mashup of copypaste
           | "existing, copyrighted, heterogeneously licensed" and that you
           | won't release and nobody will see anyway.
           | 
           | Whereas audio/video you generally want to release.
           | 
           | If you make them for your own consumption, then it's my
           | response that rather applies: since nobody will see it, and
           | you don't release/sell/circulate it, you can go ahead and mix
           | Michael Jackson, Disney and Star Wars material - nothing will
           | happen to you.
        
         | Hamuko wrote:
         | If you post content on Stack Overflow, your contribution is
         | distributed using the CC BY-SA 4.0 license.
        
           | coldtea wrote:
           | Yes, but nobody that copies it cares...
           | 
           | (Where nobody is a stand-in term, to mean "less than 1% of
           | those do")
        
         | moffkalast wrote:
         | > If anything, use of Copilot would be an improvement
         | 
         | What do you mean, Copilot regularly pastes stuff directly from
         | SO. One of those automatic doc generators was able to point me
         | to the exact answer where one of them was from.
        
           | coldtea wrote:
           | That it doesn't just "copy and paste" but does more involved
           | "AI" mixing
        
             | moffkalast wrote:
             | I don't think renaming variables and adjusting spaces holds
             | up in court.
        
         | tagyro wrote:
         | Do people really copy/paste from StackOverflow?
         | 
         | I feel this is more a meme, rather than reality. I do check
         | StackOverflow, but never have I taken an answer verbatim. I try
         | to see if it's the same problem and what was the approach in
         | deconstructing it, which I find more useful in the long run.
        
           | Flimm wrote:
           | According to Stack Overflow's blog:
           | 
           | "One out of every four users who visits a Stack Overflow
           | question copies something within five minutes of hitting the
           | page."
           | 
           | https://stackoverflow.blog/2021/12/30/how-often-do-people-
           | ac...
        
             | icoder wrote:
             | Well, to be fair, most of that is probably just copying a
             | particular syntax or built-in function, which (I think?)
             | has nothing to do with copyright.
             | 
             | At least for me, that's most of the copies I do, followed
             | by the ones that basically are 'call these functions in
             | order', then paste it as a comment and use it as cheat
             | sheet, and only very rarely I copy a 'creative' snippet
             | almost verbatim, like a regexp matching email addresses, a
             | to-hex or a crc calculation. And perhaps that's actually
             | tricky.
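For concreteness, the "to-hex" and "crc calculation" snippets mentioned above are, in Python at least, typically a line or two built on standard-library calls. This is only an illustrative sketch: the wrapper names `to_hex` and `crc32_of` are hypothetical, and other languages may need more hand-written code.

```python
import zlib


def to_hex(data: bytes) -> str:
    # bytes.hex() renders each byte as two lowercase hex digits.
    return data.hex()


def crc32_of(data: bytes) -> int:
    # zlib.crc32 returns the standard CRC-32 as an unsigned 32-bit integer.
    return zlib.crc32(data) & 0xFFFFFFFF
```

For example, `to_hex(b"\x00\xff")` gives `"00ff"`, and `crc32_of(b"123456789")` gives `0xCBF43926`, the usual CRC-32 check value.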
        
           | concordDance wrote:
           | Anecdata: Everyone I work with does.
        
           | mullen wrote:
           | I catch people using cut and paste code all the time. If
           | there is a spelling error in code (Especially if it is in a
           | code comment), I can guarantee you that someone copied and
           | pasted it from StackOverflow.
        
           | coldtea wrote:
           | > _Do people really copy /paste from StackOverflow?_
           | 
           | All the time.
        
           | ldoughty wrote:
           | I've done it, and I know others that have, but I think it
           | depends on people's definitions of copy/paste.
           | 
           | I've certainly copied a sort anonymous function from SO, it
           | was a one-liner. Is that copy/paste? or is it only copy/paste
           | if it's X lines?
           | 
           | Otherwise I agree, usually I just get hints and go my own
           | way.
        
           | Timwi wrote:
           | It depends on what you need. In most cases the code on
           | StackOverflow is not exactly what you need, so you need to
           | understand it in order to adapt it. But if you're looking for
           | a specific well-defined algorithm (MD5, say) then you can
           | just copy & paste it.
        
           | Aeolun wrote:
           | This has more to do with the code never being immediately
           | copy-pasteable, not so much my reluctance to copy-paste from
           | SO for licensing reasons.
        
       | fimdomeio wrote:
       | what AI is showing is the fuzzy line between creating and
       | copying. The truth is they are both always present in everything
       | we do, we've just been trying to hide it.
       | 
       | So it should be as simple as if you're using other people's
       | content for your own profit you should properly compensate them.
       | 
       | Or we could just abolish copyright law and assume that everything
       | humans create emanates from culture so it's always collectively
       | built and everything should be open source.
       | 
       | Or we just do the same we've been doing. Create even more complex
       | laws trying to define this fuzzy line in a way that companies can
       | keep profiting from it a lot more than individuals.
        
       | FeepingCreature wrote:
       | All I can think of is Steve Yegge [1]: "They have no right to do
       | this. Open source does _not_ mean the source is somehow  'open'."
       | 
       | My code is on Github so that people can read it, reuse it and
       | learn from it. "The freedom to study how the program works", as
       | the FSF says. If some of the people reading it are machines, why
       | would that matter?
       | 
       | [1] http://steve-yegge.blogspot.com/2010/07/wikileaks-to-
       | leak-50...
        
         | happymellon wrote:
         | Because a lot of this code would be put into closed source
         | software, which is against the licence and would prevent people
         | from exercising the right to study how a program works.
        
           | FeepingCreature wrote:
           | But I don't care if closed source programmers read my GPL
           | code! The freedom to learn is not copyleft. So long as they
           | put independent effort into their work, they're good in my
           | book. Shared knowledge is a vital commons, and I'm honored if
           | I can contribute to it.
           | 
           | Maybe this goes back to that debunked paper that claimed that
           | transformers were only remixing input samples?
        
             | happymellon wrote:
             | They aren't reading your code. This is a program
             | copy/pasting code without attribution.
        
               | FeepingCreature wrote:
               | Again, the paper that said that transformers only
               | copypasted input samples was highly misleading.
               | 
               | It seems clear to me that Codex has true understanding.
               | 
               | (Yes, I know that people have gotten secrets to appear in
               | the output by prompting it in clever ways. That this
               | happens doesn't prove that Codex doesn't understand what
               | it's doing, it just shows that Codex doesn't understand
               | _everything._ )
        
       | tremon wrote:
       | I might start considering Copilot if Microsoft were to train it
       | on their own internal codebases (Windows, Office, SQL Server).
       | Until they do, it's clearly a "tool for thee but not for me" type
       | of situation.
        
         | clircle wrote:
         | "tool for thee but not for me" <- what does this even mean?
        
       | albertzeyer wrote:
       | So, how often does it actually happen? Does it happen more often
       | than for a human? Does anyone actually have numbers on this?
       | 
       | Of course, if you already provide a copyrighted prefix, and it
       | has seen that code, the chances are high that it would complete
       | the copyrighted code (because that is what you actually would
       | also expect).
       | 
       | So, for real use cases in the wild, where you write some own real
       | novel code, how often would it suggest some copyrighted code? And
       | how often would a human?
       | 
       | I have used Copilot over the last months and I have never seen
       | such a case (I can be pretty sure because all the identifier
       | names are really unique, and the code was very custom).
       | 
       | However, I assume that I myself might have produced copyrighted
       | code unknowingly because if you write common patterns (e.g. some
       | tree or graph search, or some sort function, implement LSTM or
       | Transformer, whatever), the chances are not so low.
        
       | thih9 wrote:
       | Is github copilot using private repositories for the learning
       | process?
       | 
       | If yes, how do they mitigate the risk of exposing private data
       | when something is quoted verbatim?
       | 
       | If not, then why are repos with non permissive licenses ok?
        
       | mojuba wrote:
       | Can I suggest a hypothesis that if you find Copilot useful it
       | means the problem you are solving is a boring one? I might be
       | wrong of course.
        
         | triknomeister wrote:
         | 99% of work in 100% of interesting projects is boring.
        
         | alpaca128 wrote:
         | I disagree. Most large projects, software or otherwise, use
         | existing parts. If you design an innovative device you'll still
         | use some standard components like chips, memory modules etc.
         | 
         | There's already a way to quickly solve the boring parts in
         | development - libraries which were built and licensed around
         | that purpose. But Copilot passes you code of unknown origin,
         | with unknown license terms and no information about how close
         | it is to an existing codebase. It's like a person trying to
         | sell you Macbooks for a hundred bucks per unit but you don't
         | know where they came from and who made the holiday photos
         | stored on the harddrive.
        
         | alkonaut wrote:
         | 99% of the "problems" I'm solving when I'm working even on very
         | interesting and challenging problems, are boring subproblems.
         | If I can get those out of the way then that would be great.
        
         | mistercow wrote:
         | That hypothesis is easily disproven by spending an afternoon on
         | a side project with Copilot.
         | 
         | No matter how interesting your problem is, translating it into
         | code is going to involve a lot of grunt work. This isn't just
         | boilerplate, but also the large portion of your code which is
         | going to be gluing things together.
         | 
         | The time you spend working through those menial parts of your
         | code is time when the context of the interesting part of the
         | problem fades. Once you get the mechanical stuff out of the
         | way, you have to load the interesting stuff back into your
         | brain.
         | 
         | This is where AI coding tools really shine. They dramatically
         | reduce the intervals between when you can think about the
         | actual problem you're solving by letting you get the boring
         | mechanics out of the way more quickly.
        
           | mojuba wrote:
           | I'm very curious to see some examples where Copilot
           | autocompleted something truly useful and saved you time - and
           | that also disproves my hypothesis that you are doing
           | something boring or with the wrong
           | tools/languages/frameworks. Things that a non-ML autocomplete
           | could do don't count.
        
             | mistercow wrote:
             | I can give you an example of an entire (well, I still
             | consider it alpha) library I wrote several months ago,
             | using Copilot: https://github.com/osuushi/triangulate
             | 
             | This is an implementation of a 1991 paper on polygon
             | triangulation into Go. So the deepest thinking about how to
             | solve the problem was obviously already done for me, but
             | there were a number of edge cases that I had to invent my
             | own solutions to, and the translation itself involved
             | keeping a lot of context in my head.
             | 
             | I can't tell you in precise detail what Copilot did, and
             | what I wrote by hand. I wasn't taking notes or recording my
             | screen. But there's a reason you don't see a lot of blocks
             | in there where I forgot to comment anything, because my
             | entire process for this was "type what I want to do in
             | English, and see if Copilot will generate the next snippet,
             | or something close". I didn't do this out of bloodyminded
             | dedication to the AI cause, but because it continued to be
             | an extremely effective way to get the code written quickly.
             | 
             | I can give a few specifics:
             | 
             | - My linear algebra is rusty, and Copilot was extremely
             | helpful here. I would often just type the basic thing I was
             | trying to do in pretty vague linear algebra terms, and it
             | would generate the formula.
             | 
             | - I wrote a lot of tests like this https://github.com/osuus
             | hi/triangulate/blob/main/internal/sp.... This is a minor
             | thing, but those aren't copy-pasted. Instead, I would write
             | the first test, and for the most part, I could just type
             | something like `func
             | TestConvertToMonotones_SquareWithHole`, and it would figure
             | out how to adapt the previous test automatically.
             | 
             | - It generates exactly the error strings I want based on
             | context an enormous percentage of the time.
             | 
             | I want to stress that I'm just giving a few examples of
             | things that I specifically remember because I talked about
             | them at the time, not characterizing the majority of the
             | experience of using Copilot. The majority of the experience
             | of using Copilot is that you write comments, and then the
             | things you were about to type appear on the screen before
             | you have to type them.
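As an illustration of the kind of "vague linear algebra" request mentioned in the first bullet above, here is a minimal Python sketch of a 2D orientation test, a formula commonly needed in polygon triangulation. It is not taken from the linked library (which is written in Go), and the function names `cross` and `orientation` are hypothetical. It assumes a coordinate system with the y axis pointing up.

```python
def cross(o, a, b):
    """z-component of the cross product of the vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def orientation(o, a, b):
    """Return 1 for a counter-clockwise turn o->a->b, -1 for clockwise, 0 if collinear."""
    c = cross(o, a, b)
    # (c > 0) - (c < 0) is the sign of c as an int.
    return (c > 0) - (c < 0)
```

A comment such as "sign of the cross product of OA and OB" is roughly the level of description the comment above says was enough to get a formula like this suggested.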
        
               | ilikehurdles wrote:
               | When I find myself writing comments of this style I see,
               | I usually ask myself if this thing would be better
               | extracted into a function. These comments are primarily
               | stating the obvious.
               | 
               | If I find myself writing a 200 line function with nested
               | or repetitive loops I expect to hear from colleagues
               | about how I should refactor it.
               | 
               | I feel that the solution to writing boring, repetitive
               | boilerplate shouldn't be to automate writing more of it,
               | but to reduce or remove it entirely. Seeing things like
               | this just reinforces my preconception that Copilot acts
               | in low quality code environments to produce fittingly low
               | quality code, or with languages like Java where the
               | language is married to boilerplate.
        
               | mistercow wrote:
               | This reply feels pretty bad-faith. But you know, feel
               | free to open a PR if you have something concrete you feel
               | can be improved.
        
         | workingon wrote:
         | Seems like a narrow vision. Is every line of code you write to
         | solve a problem "not boring"? I solve problems I find
         | interesting, but writing matplotlib code to visualize data
         | never is.
        
         | trention wrote:
         | This is true for the current iteration of the model. Probably
         | won't be true at least to an extent in 5 years. Besides, there
         | is nothing wrong with solving boring problems. Not everyone can
         | be Bjarne Stroustrup.
        
         | viraptor wrote:
         | The most interesting problem will have extremely boring bits.
         | If you write a cli tool to solve all of world problems by
         | changeling magic, you'll still need to add the parameter
         | handling and do some error management. Which is repetitive and
         | likely well generalised and predictable based on other
         | projects.
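The "parameter handling and error management" described above tends to look much the same from one tool to the next. A minimal Python sketch using the standard `argparse` module follows; the positional `input` argument, the `--retries` flag, and the exit codes are hypothetical choices made for illustration only.

```python
import argparse
import sys


def build_parser():
    parser = argparse.ArgumentParser(description="Hypothetical example tool.")
    parser.add_argument("input", help="path to the input file")
    parser.add_argument("--retries", type=int, default=3,
                        help="number of retry attempts (illustrative flag)")
    return parser


def main(argv=None):
    # argv=None makes argparse fall back to sys.argv[1:].
    args = build_parser().parse_args(argv)
    if args.retries < 0:
        print("error: --retries must be non-negative", file=sys.stderr)
        return 2
    try:
        with open(args.input, encoding="utf-8") as f:
            text = f.read()
    except OSError as exc:
        print(f"error: cannot read {args.input}: {exc}", file=sys.stderr)
        return 1
    print(f"read {len(text)} characters")
    return 0
```

A script would typically end with `sys.exit(main())` under an `if __name__ == "__main__":` guard.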
        
         | para_parolu wrote:
         | The problem may not be boring. Typing boilerplate code is. I
         | work on games as hobby. Sometimes I implement mechanics
         | requiring vector math. Working on mechanics is interesting.
         | Writing down math is not. Copilot helps with the latter.
        
           | mojuba wrote:
           | Then another hypothesis: you probably haven't found the right
           | tools for it yet. I find myself writing boilerplate mostly
           | around some obscure system framework calls (iOS/macOS), but
           | that's rather rare. But even OS APIs and frameworks do
           | evolve over time into requiring less boilerplate. Just take
           | the evolution of CoreAudio, the modern Swift interface is so
           | much better. So at the end of the day it's about the tools
           | and interfaces: boilerplate is rarely absolutely necessary
           | with the right tools.
        
             | triknomeister wrote:
             | Maybe github copilot is the right tool.
        
               | mojuba wrote:
               | I don't think so. Human-verified, tested and maintained
               | code is obviously superior to a snippet blindly copied
               | and mixed by a statistical system.
        
               | mistercow wrote:
               | That's not how you use Copilot, any more than it's how
               | you'd use any other autocomplete tool. I don't know why
               | so many people seem to think that using Copilot is just
               | closing your eyes, hitting tab fifty times, and then
               | committing.
               | 
               | You work on your code, Copilot makes a suggestion. You
               | _read_ that suggestion and verify that it's close to what
               | you were already going to do. If it is, you hit tab, then
               | you tweak it. There's nothing blind about this process.
        
         | muzani wrote:
         | Yeah, it's for boring problems. Drawing a circle or detecting a
         | specific format of number in some string, for example.
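"Detecting a specific format of number in some string" usually comes down to a short regular expression. The sketch below assumes, purely for illustration, a made-up format of three digits, a hyphen, and four digits; the `find_codes` name and the pattern itself are hypothetical.

```python
import re

# Three digits, a hyphen, four digits, not embedded in a longer run of
# word characters (the \b anchors reject e.g. "5555-1234").
CODE_PATTERN = re.compile(r"\b\d{3}-\d{4}\b")


def find_codes(text):
    """Return every substring of text matching the assumed NNN-NNNN format."""
    return CODE_PATTERN.findall(text)
```

For example, `find_codes("call 555-1234 or 5555-1234 now")` returns `["555-1234"]`.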
        
       | mawadev wrote:
       | What stops me from re-uploading copyrighted source, where I
       | remove the notices and push it with an MIT license? If such a
       | data set has been trained with, how do you get it out?
        
       | GuB-42 wrote:
       | > Copilot just sells code other people wrote
       | 
       | So what? Selling code other people wrote is the foundation of the
       | free software movement. It is the entire business model of
       | countless companies, and it is a good thing. Among them are most
       | major linux distro vendors like Red Hat and Canonical.
       | 
       | The value added by Copilot is that they sell you the lines of
       | "code other people wrote" that you want, out of billions.
       | 
       | I still think it is derivative work, and that they should only
       | process code under permissive licenses, or, if they want to
       | include GPL code, make a GPL-only version, usable only for GPL
       | projects. I thought it is what they did, there is so much code
       | under permissive licenses that it should be enough to train their
       | model, but apparently, they don't care, as long as it is public,
       | it is included. For me, they are shooting themselves in the foot,
       | several companies have already banned Copilot due to the
       | potential issues with copyright.
        
       | zokier wrote:
       | Sure, the concern is valid but I feel like this tweet adds
       | absolutely no substance to the discussion and just repeats the
       | same opinion that was already rehashed to death since copilot
       | originally launched. As such, especially with the tone that the
       | tweet has, I don't expect constructive discussion to arise here.
        
       | maxbaines wrote:
       | I initially hadn't thought about Copilot and other AI generators
       | this way, but now that I have, I'm finding it hard to ignore.
        
       | k__ wrote:
       | Isn't that what Web2 is all about?
       | 
       | Someone creates content for free, and companies monetize it.
        
         | WesolyKubeczek wrote:
         | The real Web3 is companies sue original creator for
         | infringement.
        
       | VoodooJuJu wrote:
       | It is now proven that copilot returns code from codebases with
       | non-permissive licenses [1].
       | 
       | I'm curious - what are the legal implications of this going
       | forward? I've so many questions.
       | 
       | 1. Will Microsoft ever face lawsuits for these license
       | violations?
       | 
       | 2. If so, who/how? Class-action?
       | 
       | 3. Will copilot be forced to open-source in the future? Under
       | which license? Some open source licenses are incompatible with
       | others, but copilot uses code from probably every OSS license
       | conceived.
       | 
       | 4. If Microsoft faces no justice, will we start seeing more OSS
       | license violations? Will Google start using AGPL-licensed code?
       | 
       | [1] https://news.ycombinator.com/item?id=27710287 | Copilot
       | regurgitating Quake code
        
         | mhaymo wrote:
         | That regurgitated code exists on GitHub under an MIT
         | license: https://github.com/jethrodaniel/fast_inv_sqrt
         | 
         | "jethrodaniel" does not appear to have the copyright to offer
         | that license, but it's hard for Github to determine that in
         | general, so I doubt they would be liable for the error.
        
           | mrh0057 wrote:
           | I'm not a lawyer, but my understanding is these are torts, so all
           | you have to prove is Microsoft has liability. I think this
           | would be easy to prove due to the way neural networks work
           | since it's just a way of performing a search.
           | 
           | Since it's a tort I don't think you have to prove they should
           | have known it would return copyrighted code, the fact that it
           | does is enough to have liability.
        
           | jsiaajdsdaa wrote:
           | That doesn't stop youtube from blasting people away over
           | copyright issues?
           | 
           | On youtube, video uploads are a cost center, whereas on
           | github, code is a profit center
        
           | vorpalhex wrote:
           | > but it's hard for Github to determine that in general, so I
           | doubt they would be liable for the error.
           | 
           | Please insert that meme, "That's not how that works. That's
           | not how any of this works!"
           | 
           | The legal system is permission based, not forgiveness or "I
           | didn't know" based.
        
             | Flimm wrote:
             | I personally don't want to have to upload proof of identity
             | to GitHub and a signed document swearing that I own the
             | copyright to all the code I upload to GitHub, or proof that
             | I coded it. We need to be careful what we wish for.
        
               | vorpalhex wrote:
               | Excerpt from the MIT license:
               | 
               | > THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF
               | ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED
               | TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
               | PARTICULAR PURPOSE AND NONINFRINGEMENT.
        
             | minhazm wrote:
             | Actually the legal system is evidence based. Microsoft has
             | evidence that the code they are producing is licensed under
             | MIT as far as they can reasonably know. There's no
             | definitive way to know who actually owns the original
             | copyright. I could grant permission to use my repo, but
             | maybe I got that code from someone else, who then got it
             | from someone else and so on and so forth. It's a similar
             | situation with stolen goods, if you unknowingly purchase
             | stolen goods you usually cannot be charged for theft as
             | long as there aren't obvious signs that it's stolen such as
             | the goods being priced far below market value.
        
               | sammax wrote:
               | Microsoft has evidence that the code they are reproducing
               | is MIT licensed, so are they intentionally violating that
               | license or does this AI thing include the license and
               | attribution in every snippet it generates?
        
               | monocasa wrote:
               | Major aspects of copyright infringement are strict
               | liability, like a lot of civil actions around damages. It
               | doesn't matter if you thought it was OK, there's still a
               | damaged party that needs compensation according to the
               | law. At best you'll simply avoid the criminal and
               | punitive penalties.
        
               | BaculumMeumEst wrote:
               | Exactly, that's why Pornhub hasn't had any liability
               | issues arising from where its content comes from either.
               | It's just too darned hard to tell.
        
               | monocasa wrote:
               | No, PornHub doesn't have liability in a lot of cases
               | because of 17 § 512, but has still had to deal with
               | liability in general, which is why they nuked some 80% of
               | their library not backed by verified individuals a while
               | back.
               | 
               | https://www.law.cornell.edu/uscode/text/17/512
               | 
               | A huge part of 17 § 512 is the DMCA takedown process
               | mainly in 17 § 512(c)(3). Does Microsoft even have the
               | ability to truly remove training data from the model? Or
               | do they have to retrain on each DMCA takedown?
        
             | concordDance wrote:
             | If they had a reasonable basis for believing they had a
             | license they're in the clear. "I didn't know" might not be
             | enough but "I had good reasons to think otherwise" is.
        
               | vorpalhex wrote:
               | > If they had a reasonable basis for believing they had a
               | license they're in the clear.
               | 
               | False.
               | 
               | If they committed copyright infringement, even if they
               | genuinely believed they weren't, they are not in the
               | clear. They still owe damages.
        
           | monocasa wrote:
           | Even if it's somehow available under an MIT license (which is
           | questionable on the part of jethrodaniel), there's still
           | infringement. MIT isn't public domain, it still has
           | 
           | > The above copyright notice and this permission notice shall
           | be included in all copies or substantial portions of the
           | Software.
           | 
           | Replicating it without complying with those terms is still
           | infringement.
        
             | sirsinsalot wrote:
             | this. People are being willfully blind here, like cult
             | members looking dead-eyed at their leader and chanting
             | "This is great" as they drink the kool-aid.
             | 
             | And from Microsoft no less, once outcast for mass
             | poisoning.
        
         | concordDance wrote:
         | There's also one more question:
         | 
         | 5. Even if it is illegal, is it actually bad? No one can
         | possibly sell code snippets, the transaction costs are many
         | orders of magnitude greater than any reasonable price. In my
         | opinion, at least in this case the benefits massively outweigh
         | the costs and the law should not apply here.
        
           | citizenkeen wrote:
           | Then the law should change. Saying "it's illegal but it's
           | good/harmless" is a terrible stance.
        
             | anamexis wrote:
             | Seems like an eminently reasonable stance, and exactly the
             | stance you would take to get the law changed.
        
               | citizenkeen wrote:
               | Fair. I had read "and the law should not apply" as "so we
               | ignore it", not "so we change it".
        
           | xtracto wrote:
           | I really, REALLY like the idea of Copilot. I think it is a
           | glance at what the future of AI can bring to improve
           | programming. I understand where all the litigation and
           | "uneasiness" is coming from, both from commercial and open-
           | source projects.
           | 
           | I've not installed or used it for the same reason (don't want
           | to use AGPL or GPLd code by accident, and don't want my
           | closed source code to be used accidentally as well), but the
           | thought of Copilot being "killed" due to
           | litigation/copyright/licensing issues is sad.
           | 
           | For me, It's kind of like when MP3 first appeared: Sharing
           | music in Napster or downloading Mp3s from Geocities was just
           | amazing. The idea of having such things at your fingertips.
           | Even though I understood the issue the authors had with the
           | unpaid distribution of their music... still, the idea of
           | "what could be..." made it amazing.
           | 
           | I guess Microsoft could be a bit forward thinking, and
           | implement the "Spotify" model in code: Pay OpenSource
           | developers (whoever owns the repo, or whoever made a commit?)
           | a small amount whenever their code gets used through Copilot.
           | 
           | I'm super excited by what "Copilot" related services will look
           | like in 10 years. And I really really hope that the
           | technology/idea doesn't get killed by litigation.
        
             | PaulKeeble wrote:
             | Microsoft could have trained this on their own code and
             | there would be no issue. The problem is instead of doing
             | that they knew full well the approach would reproduce the
             | code and they decided they would rather breach GPL than
             | expose their own code. But I bet Microsoft has more than
             | enough lines to train an AI, there was a clear choice to
             | breach other people's licenses in preference.
        
             | frazbin wrote:
             | Huh... These comments have given me an idea: MS needs to be
             | forced to train a model to compensate (pay) code authors
             | and codebases based on snippet suggestions given by their
             | tool: the Spotify model replacing Napster!
        
               | sirsinsalot wrote:
               | See: Who Owns the Future by Jaron Lanier
        
               | Graffur wrote:
               | The comment you replied to gave you that idea nearly word
               | for word..
        
           | midasuni wrote:
           | Some people won't let you use their copyrighted work no
           | matter how much you pay, and that's reasonable.
           | 
           | By all means allow repos to opt in, although if it's licensed
           | under something like the GPL there's no way to convert it to
           | non-GPL without permission from every contributor. I for one
           | am not interested in Microsoft or anyone else paying me to
           | close my code.
           | 
           | Allowing people to pay $xxx to copy my copyrighted work
           | without my agreement is simple piracy.
           | 
           | Either they get international agreement to drop copyright as
           | a concept, or they obey the law.
        
         | rifty wrote:
         | It seems like Microsoft could be in the clear on the basis of
         | it being essentially "search". But it also seems like anyone
         | who uses it runs a high risk of getting infected with
         | copyright-violating code.
         | 
         | My question is, if it isn't a copyright infringement issue to
         | use copilot in its current form right now, why not just claim
         | copilot was used whenever accused of copyright infringement
         | henceforth?
        
           | solveit wrote:
           | > why not just claim copilot was used whenever accused of
           | copyright infringement henceforth?
           | 
           | Without speaking to the particulars of copilot, this
           | situation where laws seem toothless because of the ease of
           | plausible deniability is actually fairly common. And in many
           | such cases, the law is not as toothless as it seems, because
           | 
           | 1. Getting multiple people to stick to a script under oath is
           | difficult and dangerous.
           | 
           | 2. Criminals frequently send each other messages like
           | 
           | A: "lol I just crimed, hope nobody figures it out."
           | 
           | B: "lol just say you used copilot".
           | 
           | A: "lolol yeah fuck the law"
           | 
           | Obviously this only gets the worst criminals, but there seem
           | to be lots and lots of them.
        
           | TAForObvReasons wrote:
           | Microsoft is trying to legally position Copilot like
           | StackOverflow. It is possible to post copyright-infringing
           | code on SO even though their TOS requires a CC BY-SA 4.0
           | grant to the company and its users.
           | 
           | https://stackoverflow.com/legal/terms-of-service#licensing
        
         | bastardoperator wrote:
         | You don't think a mountain of MSFT lawyers in every state,
         | plus partner law firms around the world, have thought about
         | this? Do you practice law or are you speculating based on
         | emotions?
        
           | worker_person wrote:
           | MSFT tried very hard to sue Linux into oblivion. Buying SCO,
           | then claiming they owned all of Linux.
           | http://www.groklaw.net/
           | 
           | I trust MSFT to screw everyone over.
        
             | bastardoperator wrote:
             | You're making stuff up. MSFT never bought SCO.
             | 
             | https://en.wikipedia.org/wiki/List_of_mergers_and_acquisiti
             | o...
        
               | birdyrooster wrote:
               | Not sure where your confidence came from but a Google of
               | "sco Microsoft" reveals:
               | 
               | By the mid-1980s Microsoft had gotten out of the Unix
               | business, except for its ownership stake in SCO.[20]
               | 
               | https://en.m.wikipedia.org/wiki/History_of_Microsoft
        
               | signatoremo wrote:
               | No, SCO was founded in 2002, from Caldera Software, which
               | was a Linux distributor [0]. How could Microsoft in the
               | 1980s own a company that wasn't founded until 2002?
               | 
               | They later filed for bankruptcy in 2007.
               | 
               | [0] https://en.m.wikipedia.org/wiki/SCO_Group
        
               | colejohnson66 wrote:
               | Owning stock, on its own, is not the same as buying a
               | company
        
               | worker_person wrote:
               | If you own a controlling percentage, then yes, it is. That
               | is how you buy/control a publicly traded company.
               | 
               | You can buy 100% of shares and take it private, but
               | that's overkill for what Microsoft wanted.
        
               | colejohnson66 wrote:
               | Hence why I said "on its own"
        
               | birdyrooster wrote:
               | lmao
        
               | bastardoperator wrote:
               | Not sure where yours is coming from, if we look at [20]
               | it makes no such claim.
               | 
               | https://web.archive.org/web/20061105100939/http://www.inf
               | orm...
        
         | Beltalowda wrote:
         | > It is now proven that copilot returns code from codebases
         | with non-permissive licenses [1].
         | 
         | That same Quake example from last year is repeated every single
         | time.
         | 
         | Aside from the fact that GitHub has since added a protection
         | for this, the fact that this one example gets repeated time and
         | again instead of a list of examples leads me to believe this is
         | not (and was not) a common occurrence.
        
         | pwdisswordfish9 wrote:
         | Is there any leaked Microsoft code on GitHub? Someone should
         | check if Copilot regurgitates that as well, then see how
         | Microsoft reacts when someone slaps an AGPL license on that...
        
           | q-big wrote:
           | > Is there any leaked Microsoft code on GitHub?
           | 
           | There seems to be. Google 'windows nt source code leak
           | github':
           | 
           | https://www.google.com/search?q=windows+nt+source+code+leak+...
           | 
           | First search results:
           | 
           | Windows NT 4.0:
           | 
           | > https://github.com/lianthony/NT4.0
           | 
           | > https://github.com/ZoloZiak/WinNT4
           | 
           | Windows XP:
           | 
           | > https://github.com/tongzx/nt5src
           | 
           | > https://github.com/onein528/NT5.1
        
         | throwaway23234 wrote:
         | Big meh. That quake code was MIT.
        
           | monocasa wrote:
           | A) Public Quake is GPL. Just because someone else dumped it
           | in an MIT library doesn't change that.
           | 
           | B) MIT still requires attribution to not infringe.
        
         | 542458 wrote:
         | IANAL. My understanding is that the general legal precedent in
         | the US is that a) datamining text has no copyright implications
         | (in the same way that reading a book has no copyright
         | implications) and b) it is not a copyright violation to use a
         | small amount of copyrighted material provided the context is
         | sufficiently transformative. This might seem silly or unfair to
         | you, but that is the current legal reality.
         | 
         | But even ignoring that, everybody uploading code to GitHub has
         | given GitHub the right to analyze that code as per the GitHub
         | ToS. This is the same mechanism by which you can't upload code
         | to GitHub with a license that says "nobody is allowed to
         | display this code on the internet" and then sue GitHub.
        
           | aposm wrote:
           | I can't imagine a scenario in which any lawyer would consider
           | granting Github the right to "analyze" code anywhere close to
           | granting Github the right to spit out that same code verbatim
           | without your copyright notice (even if laundered by AI).
        
             | 542458 wrote:
             | Here's Kate Downing, an IP lawyer specializing in software
             | licensing:
             | 
             | > According to Downing, the answer depends to a certain
             | extent on where that code is hosted. If it's on GitHub,
             | there very clearly would not be copyright infringement.
             | 
             | > "If you look at the GitHub Terms of Service, no matter
             | what license you use, you give GitHub the right to host
             | your code and to use your code to improve their products
             | and features," Downing says. "So with respect to code
             | that's already on GitHub, I think the answer to the
             | question of copyright infringement is fairly
             | straightforward."
             | 
             | Downing cautions that Copilot output consisting of large
             | chunks of code complete with comments is more questionable
             | to use, but that for the most part it looks above board.
             | 
             | https://fossa.com/blog/analyzing-legal-implications-
             | github-c...
             | 
             | Here's an English lawyer on the same topic...
             | 
             | > The licence is broadly worded, and I'm confident that
             | there is scope for argument, but if it turns out that
             | Github does not require a licence for its activities then,
             | in respect of the code hosted on Github, I suspect it could
             | make a reasonable case that the mandatory licence grant in
             | its terms covers this as against the uploader.
             | 
             | https://decoded.legal/blog/2021/06/github-copilot-initial-
             | th...
        
               | [deleted]
        
               | Engineering-MD wrote:
               | To me, regardless of whether it is technically legal, it
               | certainly doesn't feel right. Furthermore, contracts rely
               | on people understanding what they are agreeing to, and I
               | don't think many developers would agree to letting their
               | code be used outside the terms of the license they
               | uploaded it under.
               | 
               | I am very surprised there hasn't been a legal challenge
               | to it.
        
               | mynameisvlad wrote:
               | What, exactly, is there to challenge?
               | 
               | I don't think "I'm sorry, your honor, I didn't understand
               | what I was signing" has ever been a valid argument in a
               | courtroom, just as "I'm sorry, I didn't know I was
               | committing a crime" is not a valid defense.
        
               | ghusbands wrote:
               | Courts interpret the intended and understood meaning of
               | contracts and terms all the time. Research the term
               | "meeting of the minds" and case law around it.
               | 
               | When the terms were written, it's exceedingly unlikely
               | that they intended it or anyone understood it to be
               | blanket permission to allow a trained AI to copy code for
               | others and no user would have interpreted it that way.
               | Microsoft/Github can't necessarily unilaterally increase
               | the intended range without making it clear in the terms.
               | 
               | If it got to a court case, and both sides could afford
               | it, it could be a lengthy one.
               | 
               | (This comment is not legal advice. I am not a lawyer.)
        
               | mynameisvlad wrote:
               | How does "[allowing] a trained AI to copy code" change
               | the interpretation of the ToS?
               | 
               | By uploading your code, you give Github a license to use
               | it to improve their services. Copilot is such a service.
               | Just because it's an AI and it provides code to others
               | does not somehow invalidate the license you gave.
        
               | BaculumMeumEst wrote:
               | > "If you look at the GitHub Terms of Service, no matter
               | what license you use, you give GitHub the right to host
               | your code and to use your code to improve their products
               | and features," Downing says. "So with respect to code
               | that's already on GitHub, I think the answer to the
               | question of copyright infringement is fairly
               | straightforward."
               | 
               | That's assuming that all code on GitHub is uploaded in
               | good faith by the copyright owner, which is not always
               | going to be the case.
        
         | blihp wrote:
         | 1) Most likely
         | 
         | 2) TBD
         | 
         | 3) Not likely. Worst case a judgement will go against them,
         | they'll effectively pay a fine and then they'll retrain it on a
         | more restricted set of source code.
         | 
         | 4) OSS has a pretty tragic history re: enforcement. It wins
         | nearly every skirmish but has no interest in the war so from a
         | big picture standpoint, it loses due to apathy.
        
       | amelius wrote:
       | "Good artists copy. Great artists steal."
       | 
       | :)
        
       | ewalk153 wrote:
       | If the portion of code that Copilot lifts is the "heart" of the
       | original work, that would be much less likely to be considered
       | fair use[1], regardless of the length.
       | 
       | > For example, it would probably not be a fair use to copy the
       | opening guitar riff and the words "I can't get no satisfaction"
       | from the song "Satisfaction."
       | 
       | I wonder how this could be integrated into the system?
       | 
       | [1] https://fairuse.stanford.edu/overview/fair-use/four-
       | factors/...
        
       | noisy_boy wrote:
       | Say, I want to write a getter method like below:
       |     String getName() {
       |         return name;
       |     }
       | 
       | Let us also assume that this snippet, unsurprisingly, has been in
       | several copyrighted repos that didn't grant Github the right to
       | share this code.
       | 
       | So I start typing "getName" and copilot suggests the exact snippet
       | above. If I use this snippet, is it plagiarism? Even though the
       | above code is the most "obvious" way to write this getter and I
       | would have written it this way even without copilot's suggestion?
       | Or does the "uniqueness" or "non-trivial quantity" of the
       | suggestions have any bearing in determining copyright violation?
       | How/where do we draw the line?
        
         | glouwbug wrote:
         | Lucky for you, if you wrote a noise function that copilot
         | returned as an implementation of Perlin noise you'd be
         | breaching a _patent_! Said patent just finished its 20-year
         | run, so you'll be okay this time!
        
         | warkdarrior wrote:
         | Clearly your code could be improved with some `Factory` objects
         | and some dependency injection!
        
       | lakomen wrote:
       | I don't understand what's going on there.
       | 
       | I don't use github. Can someone explain what the author means?
       | 
       | Edit: in detail
        
         | npteljes wrote:
         | GitHub Copilot is a paid feature, but that's a red herring in
         | this discussion - people are free to monetize free software;
         | none of the major licenses forbid this.
         | 
         | GitHub Copilot is an advanced autocomplete / code generation
         | system, based on a machine learning model. The code used for
         | training the model is taken from projects hosted on GitHub.
         | These projects were published under different licenses.
         | 
         | The main questions are:
         | 
         | Some of the licenses need something from you if you create a
         | derivative work. Does the Copilot training itself count as
         | creating a derivative work?
         | 
         | Sometimes the autocomplete basically quotes the original code.
         | Does the original license then apply to the autocompleted /
         | generated code too? How much verbatim quoting does it take
         | for the result to be considered a derivative work?
        
           | kaetemi wrote:
           | Those instances where people demonstrate verbatim copies are
           | mostly either well-known snippets which have been copied a
           | million times already, or obvious completions of a partial
           | verbatim piece of the supposedly copied code that any coder
           | could extrapolate.
        
         | lakomen wrote:
         | Nice, being downvoted for asking questions. Nice asshole
         | culture on HN.
        
           | martin_a wrote:
           | Just like with StackOverflow, people are expected to invest
           | some time or amount of work in getting familiar with the
           | topic.
           | 
           | Your question seemed to lack this kind of work and was
           | probably therefore downvoted.
           | 
           | I don't think that's so much about "asshole culture" but more
           | like time management, as not everything can be explained to
           | everybody in every topic.
        
           | tjpnz wrote:
           | You can ask questions but they can't be low effort and need
           | to add something to the discussion.
        
         | niek_pas wrote:
         | Google "GitHub copilot"
        
       | nickjj wrote:
       | This might be overreacting but is there a way to opt-out of
       | Copilot using your code in open source repos?
       | 
       | It feels morally wrong to me that I can spend thousands of hours
       | working on projects of my own free will but then a company can
       | sell the code I wrote to others in the form of snippet completion
       | as a service. In fact they end up selling your code back to
       | yourself if you plan to use the service.
       | 
       | If the answer is no, that moves the needle pretty far in the
       | direction where I'd at least consider the idea of moving all of
       | my repos to Gitlab. I don't care much about stars or popularity.
       | I open source things that are interesting and useful to me and if
       | other folks want to use it they can but I don't gain motivation
       | from others using the projects I release. I like Github and its
       | UI and it's no doubt "the spot" for open source but selling code
       | written by others rubs me the wrong way a lot. It stinks because
       | it also means no longer contributing to other code bases too.
       | It's moving us in the opposite direction of what open source is
       | about.
        
         | ghostbrainalpha wrote:
         | It would be kind of cool if Github could show some stat that
         | code you wrote has been used 50,000 times by 12,000 people.
         | 
         | Being a top CoPilot contributor should at least have value to
         | signal on your resume.
        
         | [deleted]
        
         | ellyagg wrote:
         | Well, I hope your viewpoint doesn't win the day, because making
         | code as freely shareable and remixable as possible is a huge
         | boon for humanity.
        
           | jnsie wrote:
           | It's just as shareable on Gitlab, no? And the issue isn't
           | that code is not shareable - it's that a huge corporation is
           | profiting from this code without consent from the developer.
        
             | leereeves wrote:
             | > a huge corporation is profiting from this code without
             | consent from the developer
             | 
             | Also without attribution. The more permissive licenses
             | allow corporations to profit from shared code, but most of
             | them still require attribution.
             | 
             | And it's really not much to ask: when someone gives you
             | free code, give them credit for their work.
        
           | [deleted]
        
           | celeritascelery wrote:
           | Code being freely shareable and remixable is great. _Selling_
           | that open source code for profit is not.
        
             | WisNorCan wrote:
             | Is your take that Microsoft should offer this for free? Or
             | if they are not willing to do it for free, Microsoft should
             | cancel this service and we should wait for Apache or
             | someone else to offer the service?
             | 
             | Or something else?
        
               | gfrff wrote:
               | Microsoft should make this service free for open source
               | (not just thought leaders), and compensate people
               | otherwise. I should have a 0.01% equity in Open AI if
               | they're using my stuff like this.
               | 
               | Or they should do opt in.
        
             | earnesti wrote:
             | What is wrong with someone making a little dough? It is
             | just numbers in a database.
        
               | jdbernard wrote:
               | Yeah, but those numbers translate to food on the table
               | for my kids, a roof over their heads, better education,
               | etc. Come on, this is a tired response. Nothing is wrong
               | with people making money. There is a lot wrong with
               | people making money off of the hard work of others
               | without any consideration or remuneration.
        
         | jaywalk wrote:
         | If your code is using a license that allows it, how could you
         | possibly opt-out aside from using a different license?
        
           | bouke wrote:
           | Does GitHub verify that the code that is in my repository is
           | actually in accordance with the license that I've added? I
           | could just upload any proprietary code with an incorrect
           | license, and GitHub would just use that to feed their AI.
           | Like any other dependency that you incorporate into your
           | application, GitHub should verify/audit whether the license
           | allows them to do so.
        
           | okasaki wrote:
           | Microsoft could provide an opt-out for projects or even
           | contributors, regardless of licence.
        
           | nickjj wrote:
           | > If your code is using a license that allows it, how could
           | you possibly opt-out aside from using a different license?
           | 
           | A repo setting that instructs Github not to use your code for
           | Copilot. It could be an option similar to turning Discussions
           | on / off.
           | 
           | If they really want to win developers over they would even
           | have Copilot scanning disabled by default but that'll never
           | happen.
        
             | jonny_eh wrote:
             | Sounds like you want a new license that just prohibits use
             | by one company for one purpose.
        
               | widjit wrote:
               | is there something wrong with that?
        
               | jonny_eh wrote:
               | Not at all, you can put any license on your code that you
               | want.
        
               | thamer wrote:
               | There are other AI-based code completion systems than
               | Copilot, at least Tabnine[1] and Kite[2] come to mind,
               | I'm sure there are more.
               | 
               | [1] https://www.tabnine.com/
               | 
               | [2] https://www.kite.com/
        
               | belter wrote:
               | As of today there is a new one...
               | 
               | "Now in Preview - Amazon CodeWhisperer"
               | 
               | https://aws.amazon.com/blogs/aws/now-in-preview-amazon-
               | codew...
        
             | quietbritishjim wrote:
             | Even if Github did provide that setting, as a courtesy,
             | someone could clone / fork the code to another repo (if you
             | use any licence that allows it) and not enable that
             | setting.
        
               | Inityx wrote:
               | Sure that's possible, but there's a huuuge difference
               | between Possible and Default Behavior.
        
               | TAForObvReasons wrote:
               | In a case like this, GitHub itself could set up a bot
               | account that forks all projects as soon as you make the
               | switch. The company in fact would be incentivized to do
               | so.
        
           | sammax wrote:
           | Don't most licenses require at least attribution? I don't
           | believe GitHub is restricting themselves to only licenses
           | that don't. In fact the only software licenses I can think of
           | that don't require attribution are 0BSD, WTFPL, CC0, MIT-0
           | and Unlicense, and they all aren't super popular. Also in
           | some countries creators have inalienable moral rights which
           | can be enforced regardless of the license. For example in
           | Germany it is impossible to relinquish certain rights you
           | have as the creator of a work, including the right to
           | attribution.
        
             | TAForObvReasons wrote:
             | This is an important and overlooked point. Even common
             | permissive licenses (ISC / MIT / Apache-2.0) require
             | attribution
        
               | jazzyjackson wrote:
               | Just as a thought experiment: couldn't CoPilot just
               | publish a list of every github user and attribute the work
               | to all of them?
        
               | TAForObvReasons wrote:
               | CoPilot is a black box at the moment. Microsoft claims
               | they used the public corpus on GitHub. There are plenty
               | of GPL, AGPL, and "source available" projects in the
               | public corpus. So what exactly is the licensing?
               | 
               | The argument may make sense if they limited themselves to
               | public-domain (CC0) works, but that is not what happened
               | here. If CoPilot attributed something to an AGPL project,
               | does it mean the "virality" applies to all projects that
               | use code from CoPilot?
        
               | ntoskrnl wrote:
               | There's also a good amount of commercial and leaked
               | source code on GitHub, including MS's own leaked Windows
               | XP source. I haven't played around with Copilot yet, but
               | if I ever do I plan on copy/pasting some win32 API
               | definitions to see if I can get it to spit out any of the
               | leaked source.
        
             | whoisthemachine wrote:
             | This feels like a tool that can easily be destroyed by a
             | lawsuit. I can't imagine a TOS can force you to give away
             | your copyrights (especially if they allow and encourage
             | you to post your own copyright).
        
               | kragen wrote:
               | If it can't then Wikipedia is doomed; its entire
               | licensing status rests on the notion that editors grant
               | such a license as part of their clickwrap ToS.
        
           | igneo676 wrote:
           | I'm not sure using a different license actually opts you out.
           | By merely hosting your code on GitHub you grant them the
           | right to analyze your code on their servers[1]
           | 
           | They may be morally in the wrong, but I'm unsure they are
           | legally in the wrong here. To boot, denying them the right to
           | create this tool in your license is technically a violation
           | of OSS principles and problematic
           | 
           | [1]: https://docs.github.com/en/site-policy/github-
           | terms/github-t...
        
             | typetheorist wrote:
             | > This license does not grant GitHub the right to sell Your
             | Content. It also does not grant GitHub the right to
             | otherwise distribute or use Your Content outside of our
             | provision of the Service, except that as part of the right
             | to archive Your Content, GitHub may permit our partners to
             | store and archive Your Content in public repositories in
             | connection with the GitHub Arctic Code Vault and GitHub
             | Archive Program.
             | 
             | Wouldn't this be a violation?
        
         | PaulKeeble wrote:
         | It should be automatic based on license. GPL code definitely
         | shouldn't be included but MIT could be. They already have this
         | information in most repositories, and if it's missing they have
         | no right to use the code at all. We don't need extra options;
         | the licenses already restrict use and derivative works.
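
The license-based filter suggested in the comment above could be pictured as a simple allow-list check. This is only a sketch under stated assumptions: the `Repo` record, the `PERMISSIVE` set, and both function names are hypothetical, and nothing here reflects how GitHub actually selects training data. It also leaves attribution unsolved, which permissive licenses such as MIT still require.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: an illustrative allow-list of SPDX identifiers treated as
# permissive. A real policy would need legal review.
PERMISSIVE = frozenset(
    {"MIT", "ISC", "BSD-2-Clause", "BSD-3-Clause", "Apache-2.0"}
)


@dataclass(frozen=True)
class Repo:
    """Hypothetical repository record; spdx_license is None if undetected."""
    name: str
    spdx_license: Optional[str]


def eligible_for_training(repo: Repo) -> bool:
    """Allow only repos whose detected license is on the allow-list.

    Copyleft licenses (e.g. GPL) and repos with no license are excluded.
    """
    return repo.spdx_license in PERMISSIVE


def select_training_repos(repos: List[Repo]) -> List[Repo]:
    """Return the eligible repos, preserving input order."""
    return [r for r in repos if eligible_for_training(r)]


if __name__ == "__main__":
    # Made-up example corpus, for illustration only.
    corpus = [
        Repo("alice/parser", "MIT"),
        Repo("bob/kernel-module", "GPL-2.0-only"),
        Repo("carol/scratch", None),
    ]
    print([r.name for r in select_training_repos(corpus)])
```

Treating a missing license as "not eligible" mirrors the point that unlicensed code grants no rights by default. The filter still trusts that each repo's declared license is accurate, which other comments in this thread question.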
        
           | [deleted]
        
           | davesque wrote:
           | Not without the text of the license. I, as a developer,
           | cannot just poach open source code under MIT without
           | including the copyright and terms from the original project.
           | From the license:
           | 
           | "The above copyright notice and this permission notice shall
           | be included in all copies or substantial portions of the
           | Software."
        
             | meshaneian wrote:
             | They might argue that a snippet isn't a "substantial
             | portion" of "the Software", and they're only charging for
             | the service not the content - regardless, I don't like it,
             | this is exactly what certain licenses attempt to prevent.
        
               | leereeves wrote:
               | I would argue that substantial shouldn't be measured in
               | lines of code, it should be measured in importance.
               | Something like the fast inverse square root is
               | substantial even though it's short.
        
             | typetheorist wrote:
             | I too have reservations about Copilot, but does the MIT
             | license define a "substantial portion"? I doubt a snippet
             | would fall under either "copies" or "substantial portions"
        
         | [deleted]
        
         | kemiller wrote:
         | This is a really good point that I hadn't considered before.
         | It's Facebook all over again -- selling your own content back
         | to you. Repo owners should be at least compensated when their
         | code gets used. That would be an incredible market.
        
         | lbhdc wrote:
         | I stopped publishing open source after all this started coming
         | out because I was so uncomfortable with it.
        
       | [deleted]
        
       | rosmax_1337 wrote:
       | I think this problem has no good solution until IP laws around
       | the world are properly reimagined from the ground up. I'm of the
       | quite radical stance that code, music, art in terms of their
       | intellectual existence should be free for anyone to take. (you
       | can own a hard drive with code on it, and claim no one should steal
       | it, but not the idea of the code itself)
       | 
       | If you have ideas, code, music or art which you wish for no one to
       | partake in, do your best to keep them secret. Certainly, breaking
       | into secret areas should be illegal, but once the cat gets out of
       | that bag it gets out of the bag.
       | 
       | The creative people behind these ideas I believe will be able to
       | find good compensation nonetheless in society, IP-laws nowadays
       | only serve to protect megacorporations to the detriment of
       | creativity and ideas.
        
         | zzo38computer wrote:
         | I agree. This will fix it. I think that copyright and patent
         | should be abolished, but that if it is secret then it is still
         | secret (unless someone else manages to come up with the same
         | thing (e.g. by decompiling a published computer program to
         | reconstruct the source code), in which case it can be public). And
         | so then also the AI can copy the code too just as much as you
         | may do so manually; if it is published then you can do it and
         | it should not be illegal to write such things.
        
       | acuozzo wrote:
       | This is, in part, why I will continue to use the original
       | 4-clause BSD license for the code I write.
        
       | wolframhempel wrote:
       | When my last company got acquired, part of the due diligence
       | process was a scan of our codebase for snippets from stack
       | overflow. Every snippet found that wasn't posted with a clear
       | license by the author was challenged and we rewrote it.
       | 
       | Now, I'm not entirely sure how necessary this was from a legal
       | perspective. But introducing an AI into the mix will bring up a
       | lot of uncertainty when it comes to how much change is required
       | for something to no longer be considered a copy/derivative.
        
         | redox99 wrote:
         | Isn't all stack overflow content creative commons?
         | 
         | https://stackoverflow.com/help/licensing
        
           | wolframhempel wrote:
           | it is - which is a problem if you want to repackage something
           | under a different license.
        
             | redox99 wrote:
             | But you said
             | 
             | > Every snippet found that wasn't posted with a clear
             | license by the author was challenged and we rewrote it
             | 
             | How is the license not clear?
        
               | wolframhempel wrote:
               | Fair - that was poorly expressed.
        
         | dmix wrote:
         | That sounds like legal paranoia or a make-work program.
        
         | dmortin wrote:
         | Did the scan find the snippet if they changed the variable
         | names, for example? Or is that considered a differing snippet
         | then?
        
           | wolframhempel wrote:
           | This is exactly where it gets murky. We had the usual 1-4
           | line snippets. We went the extra mile to change them,
           | rewriting them from scratch, partially with different
           | implementations. Did we need to do that? Would it have been
           | enough to just change a variable name or some spacing or
           | similar? I don't think there's a clear standard.
           | 
           | The music industry has struggled with this for a long time.
           | When is a song derivative, when a copy, when is it "inspired
           | by"...
        
             | anonymoushn wrote:
             | That sounds rough. Here's an 8-line snippet, please make
             | sure you don't infringe my copyright:
             | 
             |     p = mmap(
             |         null,
             |         size,
             |         PROT_READ | PROT_WRITE,
             |         MAP_PRIVATE | MAP_ANONYMOUS,
             |         -1,
             |         0,
             |     );
        
       | iptq wrote:
       | I know this isn't really related to the whole copying ethics
       | debate, but I definitely feel like there's some sort of foul play
       | happening here. For all of the unlicensed projects out there, the
       | license that is automatically granted to Github includes:
       | 
       | > the right to store, archive, parse, and display Your Content,
       | and make incidental copies, as necessary to provide the Service,
       | including improving the Service over time
       | 
       | It's insane how vague this is. Is Copilot a "Service"? Sure, by
       | its definition:
       | 
       | > The "Service" refers to the applications, software, products,
       | and services provided by GitHub, including any Beta Previews.
       | 
       | And since much of the code was published before Copilot's
       | inception, this means Github can just arbitrarily add more
       | "services" and milk the code for whatever it wants. Automatically
       | service-ify any public repository? Sure, pay us for quotas. It's
       | like a legal loophole to let Github just bypass any license
       | restrictions you put on it.
        
       | seydor wrote:
       | Programmers are fine when their creations, pretty much all of
       | tech, resells content that other people wrote for free, but no,
       | not code, that one must be expensive
        
         | anonymoushn wrote:
         | I also don't think it's acceptable for TurnItIn to monetize
         | content without paying the authors. My opinion about whether
         | students should have their work stolen and monetized by a
         | company doesn't seem to have much impact though.
        
         | onpensionsterm wrote:
         | The only one making money here is github. Very few programmers
         | are selling open source code. And programmers are (in)famous
         | for not buying software.
        
         | zx8080 wrote:
         | %s/programmers/tech capitalists/g
        
       | danamit wrote:
       | The code Copilot suggests from any given project is, most of the
       | time, not enough to warrant crediting that project. When I look up
       | code in some GitHub repo and copy all or part of it, I do not
       | credit that project.
       | 
       | I do not see Copilot as useful anyway.
        
       | Aeolun wrote:
       | > what github / microsoft is counting on here is that open source
       | developers do not have enough collective power to do anything to
       | stop this
       | 
       | I think it much more likely that they count on everyone liking it
       | way too much to give a shit about their MIT code not being
       | attributed correctly.
       | 
       | I certainly don't. MIT just seems like the most convenient
       | license for people that need licenses (corporations?), so that is
       | what I use.
        
       | pvaldes wrote:
       | Each day sounding more like Zopilote, it seems.
        
       | parhamn wrote:
       | Pretty soon the world is going to come to realize art/creation is
       | just blending, incrementing and repurposing prior art.
       | 
       | No book, painting, codebase, sonnet, design is theft-less.
       | 
       | The art is the space reduction, otherwise we'd just bruteforce
       | away.
        
         | wnkrshm wrote:
         | So the only thing left is handiwork I guess. Engineering isn't
         | different from art in any way, the constraints are just
         | stricter.
        
         | pera wrote:
         | I'm not sure what you mean by "theft-less" but I believe you
         | might be conflating inspiration with derivative work. Copilot
         | can produce verbatim copies of open-source code, which would
         | make it more similar to how some musicians sample other
         | people's music to create new music.
        
         | lioeters wrote:
         | Recommended:
         | https://en.wikipedia.org/wiki/Exit_Through_the_Gift_Shop
        
         | izacus wrote:
         | > Pretty soon the world is going to come to realize
         | art/creation is just blending, incrementing and repurposing
         | prior art
         | 
         | If that happens, the big copyright/IP conglomerates will
         | immediately jump on that and make sure that laws are adjusted
         | and they get their cut of every single word and line anyone
         | puts near their smartphones ;)
        
         | Agamus wrote:
         | This idea has been around for a while - why... "pretty soon"?
         | 
         | And I'm sure I couldn't disagree with you more. Or are
         | 'influence' and 'theft' the same now?
        
           | TremendousJudge wrote:
           | The idea has been around a while, but the legal system
           | doesn't reflect it.
           | 
           | I don't think it will any time soon though.
        
           | coldtea wrote:
           | > _Or are 'influence' and 'theft' the same now?_
           | 
           | They have been the same for most of history. People could
           | openly copy titles, plots, parts, phrases, etc from prior
           | work. Same for mechanical designs. The only thing preventing
           | them was obscurity (e.g. the inventor trying to make it
           | hidden) not any law or ethical idea that it's bad (there
           | wasn't any). That's how things from math to gears to tunes
           | got better (or changed over time, in the case of art, as
           | better/worse is subjective there).
           | 
           | E.g. globally and historically folk music has been basically
           | taking whatever you want from tunes and songs where everybody
           | does the same with no "permission" asked or needed to be
           | given.
           | 
           | Like 4 verses but want to add a fifth or change some part? Go
           | ahead. Want to play it exactly like you've heard it? Go ahead
           | again.
           | 
           | The idea of "theft" in that regard came in the last 2 or so
           | centuries, and was enforced with artificial legal barriers
           | and new "ethical" concepts that are neither "natural", nor
           | present for the vast majority of history (including golden
           | ages of art production).
        
             | Agamus wrote:
             | Not sure why I'm being downvoted here - I agree that this
             | idea has been the same for most of history.
             | 
             | Your example of folk music is an odd one, for exactly that
             | reason - it largely repurposes existing art. For example,
             | Wagner wrote extensively about why we shouldn't respect
             | folk music for this reason. I mostly disagree with him, but
             | his comparison at least illuminates that this isn't so
             | black and white. And that's really just scratching the
             | surface of a complex topic.
             | 
             | I sense that if someone came along 2400 years ago with the
             | exact play that Sophocles had just produced and claimed
             | they had just composed it themselves, immediately after a
             | public performance, someone would claim that theft had
             | occurred. Do you disagree?
        
               | coldtea wrote:
               | > _I sense that if someone came along 2400 years ago with
               | the exact play that Sophocles had just produced and
               | claimed they had just composed it themselves, immediately
               | after a public performance, someone would claim that
               | theft had occurred. Do you disagree?_
               | 
               | Yes. They would say it was "plagiarism", which is
               | different than theft.
               | 
               | And there was no law against either case.
        
             | trention wrote:
             | Except that AI will not lead to "golden ages of art
             | production" because nobody gives a sh*t about art created
             | by AIs. And nobody will.
        
               | Nowado wrote:
               | That's a lot of people to dehumanize with a single swift
               | no true Scotsman.
        
               | coldtea wrote:
               | > _because nobody gives a sh*t about art created by AIs.
               | And nobody will._
               | 
               | You'd be surprised. Especially if people don't care, or
               | aren't told, whether it's "created by AI" or not.
               | 
               | Whether in "high art" or lowly pop, "generative music"
               | (and fine art) has long been a thing. And people do
               | attach to it (e.g. to Brian Eno's generative works made
               | by rule based systems he programs).
        
               | trention wrote:
               | No, I will not be surprised. Outliers are outliers. "Art"
               | created by AIs will just have price (and cost) of ~0 and,
               | like everything that has a price/cost of 0, nobody will
               | give a sh*t about it. The only real question is how human
               | artists (provided they exist in your preferred dystopia)
               | will prove that they have created something themselves.
        
               | coldtea wrote:
               | > _No, I will not be surprised. Outliers are outliers.
               | "Art" created by AIs will just have price (and cost) of
               | ~0 and, like everything that has a price/cost of 0,
               | nobody will give a sh*t about it._
               | 
               | Art doesn't touch people because it has cost.
               | 
               | In fact, for ages certain types of art had no cost -
               | poetry, public festivals, and so on. And many still don't
               | (e.g. free punk/underground/indie/etc public
               | performances, Soundcloud music, and so on).
               | 
               | Most movies and series seen on TV are also ~0 (and for
               | kids, everything is ~0, as their parents foot the bill),
               | but they're still touched by them.
               | 
               | > _The only real question is how human artists (provided
               | they exist in your preferred dystopia) will prove that
               | they have created something themselves._
               | 
               | Note the loaded words "your preferred dystopia" (who says
               | whether I prefer it or not? I merely describe what's the
               | case. You have some ethical/political point to make).
               | 
               | As for the answer to the question, they won't have to.
               | People respond to the quality of the work, not who made
               | it (and whether they used AI or chance - another popular
               | method - or not).
               | 
               | In fact tons of genius artists have described themselves
               | not as the creators but as "mere conduits", and say the
               | music/words/etc come from "elsewhere" (implying god, some
               | muse, some spirit, etc). Especially when they feel the
               | most "inspired" (the word itself means "visited by the
               | spirit").
        
               | trention wrote:
               | None of those things had zero price and zero cost. The
               | fact that the consumer didn't pay directly for them is
               | irrelevant. You can try testing your theory by trying to
               | sell a "painting" created by DALLE/whatever for more than
               | a third-rate amateur painter can sell one of his. Good
               | luck with that, especially when access to the model
               | becomes easy.
               | 
               | >People respond to the quality of the work, not who made
               | it
               | 
               | This is so painfully incorrect and naive (and contra
               | anything we know about the value of everything whose
               | creation has been automated before) that I think it's
               | meaningless to continue this conversation.
        
               | coldtea wrote:
               | > _You can try testing your theory by trying to sell a
               | "painting" created by DALLE/whatever for more than a
               | third-rate amateur painter can sell one of his. Good luck
               | with that, especially when access to the model becomes
               | easy._
               | 
               | As if that proves anything? Sale price is irrelevant.
               | There are paintings sold for millions that 99.9% of the
               | people could not give less fucks for, and "amateur
               | painter" stuff that touches most people who see it.
               | 
               | It's also not like a $2 million in production costs
               | Michael Jackson song with $50M sales is "better"
               | artistically (as opposed to commercially) than a song
               | composed and played by some random guy on an acoustic for
               | ~0.
               | 
               | > _This is so painfully incorrect and naive (and contra
               | anything we know about the value of everything whose
               | creation has been automated before) that I think it's
               | meaningless to continue this conversation._
               | 
               | It was meaningless to begin with, as you don't discuss,
               | you present your "ultimate truth" ("contra anything we
               | know", lol).
               | 
               | In fact there are tons of works where the creator is
               | anonymous (from folk music and art to early house, techno
               | and rave music, a scene with cherished anonymity), and
               | people respond to it just fine...
        
             | js8 wrote:
             | > The idea of "theft" in that regard came in the last 2 or
             | so centuries, and was enforced with artificial legal
             | barriers and new "ethical" concepts that are neither
             | "natural", nor present for the vast majority of history
             | 
             | This is true for other forms of property as well, like land
             | ownership.
        
         | mihaic wrote:
         | This type of argument always distracts from the fact that
         | figuring out where we draw the line between theft and
         | reimagining.
         | 
         | The Magnificent Seven for instance was a reworking of Seven
         | Samurai, but stands on its own as an original creation. Going
         | into a cinema and filming a picture to later put on a torrent
         | site is not artistic reworking.
         | 
         | The hard discussion is about what is acceptable, we all know
         | prior art exists.
        
           | scotty79 wrote:
           | There are many differences between those acts of thievery or
           | inspired creation however you might call it. But there are
           | many similarities too. Fascination with the original is one.
           | Desire to own it in one way or another is one too.
           | Differences are in the skills, the means, the result, what
           | was stolen and financial success that came out of the act.
        
           | Griffinsauce wrote:
           | > This type of argument always distracts from the fact that
           | figuring out where we draw the line between theft and
           | reimagining.
           | 
           | This seems to be missing a word, could you clarify?
           | 
           | Also: since you mentioned theft: this actually comes down to
           | the discussion whether you can own thought and/or digital
           | artifacts which can be replicated without taking anything
           | away from the "owner".
           | 
           | Given the absolute choice I'd rather pick complete freedom
           | than restriction. I suspect that anyone's opinion on this
           | follows what they value higher: creation or exploitation.
        
             | mihaic wrote:
             | Sorry, I should have double checked, that sentence was
             | incomplete. Yes, I meant to say that a more nuanced
             | approach is crucial, and that means rejecting that we have
             | to choose between Disney-backed extreme IP laws or total
             | freedom.
        
           | ajuc wrote:
           | > The hard discussion is about what is acceptable
           | 
           | What if we just say "both"? Libraries were a thing for
           | millennia and writers still wrote books. There are costs to IP
           | laws and the benefits aren't obvious.
        
             | Veen wrote:
             | As a writer, the benefits are quite obvious to me.
        
               | Timwi wrote:
               | Convenient, isn't it?
               | 
               | As a consumer, it's quite obvious to me too how it
               | benefits only the writer/creator to the detriment of
               | everyone else.
        
               | barthvr wrote:
               | Because writing a book, shooting a movie, composing a
               | song, takes time?
               | 
               | So either those pieces are IP-protected, and their author
               | can make money with it, or we have to set up a basic
               | income for everyone, and art becomes free.
        
               | regularfry wrote:
               | It's perfectly consistent to say both that there needs to
               | be a system to ensure creators are compensated and that
               | the current system for doing so is terrible.
        
               | Veen wrote:
               | It is consistent but useless if you have no suggestion as
               | to what would replace the current system in a way that
               | preserves the benefits to both parties.
               | 
               | 1. Creators get a sustainable reward for their work. They
               | wouldn't do it otherwise. I certainly don't do it for
               | fun.
               | 
               | 2. Consumers get to access that work as they wish.
               | 
               | (Of course, this being HN, I'd expect any ideas to apply
               | to developers as well as to writers and artists i.e. if
               | writers have to give up copyright, so do developers,
               | startups, and so on.)
        
               | js8 wrote:
               | Benefits of what? Of copyright enforcement, or of
               | sharing?
        
               | bryanrasmussen wrote:
               | the grandparent comment said the benefits of IP Laws were
               | not obvious. So it is of the benefit of the laws as they
               | currently exist, that implies enforcement of said laws.
        
             | meheleventyone wrote:
             | Libraries pay fees to lend books, at least in our modern
             | capitalist society.
        
               | ajuc wrote:
               | It was news to me so I checked and it's true. Since
               | 2016 in my country ;)
               | 
               | And it's a symbolic amount for the vast majority of authors
               | (country-wide it's around 5-5000 USD per year per author
               | and the distribution is heavily skewed towards 5 USD).
               | 
               | So yeah :) I think authors were fine without these 5
               | bucks a year.
               | 
               | EDIT cause it might not be obvious. It's not per library.
               | It's per country.
        
               | jrochkind1 wrote:
               | Not in the USA, where the "first-sale doctrine" means
               | once you buy a book, you can do whatever you want with
               | that copy of the book (lend, rent, sell, destroy) without
               | needing a license. Libraries in the USA definitely don't
               | pay a fee beyond the purchase price of the book (or they
               | can legally lend donated books etc). Copyright holders
               | don't make any additional money from library lending.
               | 
               | I am not familiar with how it works in other countries,
               | but I have heard something about there being such a fee.
               | 
               | (It's not quite true to say libraries have existed for
               | "millennia" though, with regard to this issue. Mass
               | produced printing hasn't in fact existed for millennia,
               | libraries 1000 years ago had hand-copied manuscripts,
               | probably mostly scrolls. The effect on "the market"? For
               | whatever reason authors were writing then it was not to
               | make money by selling reproductions of their writings,
               | that wasn't a thing. Which means, yeah, btw, people still
               | wrote things and made up stories even when they couldn't
               | make money by charging people for copies to read...)
        
         | Chris2048 wrote:
         | Is it really "just" that? Is there no original creativity in
         | the choices (and skill) in the blending, and choosing what (and
         | how) to blend?
         | 
         | Would you describe a parody, or a critique/review, as equally
         | without original merit?
        
         | natly wrote:
         | Unless every invention is gonna be AI generated (which is kind
         | of a scary situation), intellectual property still needs to be
         | a thing (otherwise people won't have incentive to invent, it'll
         | just be stolen from them).
        
           | pydry wrote:
           | People have an innate desire to invent and create. This is
           | why so many people do it for zero extrinsic reward. Hell,
           | this is the case for almost _every_ musician. They are fed a
           | pittance in streaming, only a bit more than most OSS
           | developers get.
           | 
           | This intrinsic motivation is more normally "farmed" by
           | investors who capitalize and capture the IP value for
           | themselves. This actually has a detrimental effect on
           | innovation.
           | 
           | Doing away with or watering down intellectual property
           | protections will just take big meaty chunks out of the stock
           | market and partly equalize wealth distribution.
           | 
           | It'll probably spur innovation too - historically it usually
           | has, but preserving the existing social order takes
           | precedence over that which is why a lot is invested in
           | persisting the myth that it aids rather than hinders
           | innovation.
        
           | ModernMech wrote:
           | > otherwise people won't have incentive to invent, it'll just
           | be stolen from them
           | 
           | Citation needed. Speaking personally, I spend most of my
           | creative energy on a project which is open source and
           | permissively licensed to the point where I'm fine with anyone
           | stealing it. I expect to earn negative money from it at the
           | limit.
           | 
           | Why do I do it? I dunno it's fun. Can't that be enough?
        
           | Timwi wrote:
           | It's remarkable how many people still repeat this
           | unsubstantiated cliche.
        
       | habibur wrote:
       | We stand on the shoulders of giants. That has been the way for
       | decades. A newer stack over the older one without much thought.
       | And someone in the future will build even a newer stack over the
       | current ones.
        
         | [deleted]
        
       | pornel wrote:
       | Tough pill to swallow. Microsoft's actions don't seem fair, but
       | fighting them with copyright could weaken _fair use_:
       | 
       | https://felixreda.eu/2021/07/github-copilot-is-not-infringin...
       | 
       | There's a good argument that demanding copyright protections on
       | scraped datasets and short snippets is a double-edged sword. It
       | could harm search engines, distribution of news, and non-
       | commercial ML research too.
        
       | tpoacher wrote:
       | Does this mean I can steal stuff if I say I trained an AI to do
       | it for me?
        
         | bmacho wrote:
         | Is _cat_ an AI?
        
           | tpoacher wrote:
           | Nobody said it can't overfit 100%, right?
        
       | AtNightWeCode wrote:
       | Copilot will be that bandmate that plays a new riff and leaves
       | you wondering about where it was borrowed from.
        
       | capableweb wrote:
       | If GitHub could guarantee that the code Copilot had ingested was
       | only made with OSS licenses, then I don't see what the problem
       | is.
       | 
       | But as far as I understand, GitHub trained Copilot on any public
       | repository on GitHub, including ones with no license specified
       | (so the user publishing them still holds the copyright), and I
       | don't see how that can be OK.
        
         | thelastbender12 wrote:
         | It is hard to see how verifying licenses is a solvable problem,
         | when licensing for code dependencies can be transitive. For
         | example, I could copy code from a GPL codebase like Linux and
         | create a Github repository with an MIT license.
        
           | danuker wrote:
           | You should be able to choose flavors of the model trained
           | only on public-domain code which does not require
           | attribution, for example.
           | 
           | But that would mean Microsoft acknowledging license
           | violations.
        
             | thelastbender12 wrote:
             | Sorry, to be clear, I meant even if a Github user asserts
             | their code is public-domain/no-attribution/unlicensed, they
             | could have lifted it off a codebase that doesn't allow it.
             | It would be tricky for Github to establish the code was
             | indeed original and hence their agreement with the user
             | allows them to train their models on it.
        
               | danuker wrote:
               | > they could have lifted it off a codebase that doesn't
               | allow it
               | 
               | Ah. But then someone else is guilty of redistributing
               | code without permission.
               | 
               | But you're suggesting, GitHub should implement something
               | like ContentID but for code. Which should be cheaper
               | (since code is cheap to analyze, while videos are much
               | more bandwidth-intense). And this would kill two birds
               | with one stone.
        
         | galoisgirl wrote:
         | Here's an example:
         | https://twitter.com/ChrisGr93091552/status/15397316329318031...
         | 
         | > I checked if it had code I had written at my previous
         | employer that has a license allowing its use only for free
         | games and requiring attaching the license. yeah it does
        
           | nl wrote:
           | That's a pretty bad example. He prompted it using the exact
           | function header taken from the code he is complaining about.
           | 
           | It'd be much more interesting if he set up a function that was
           | doing a similar thing but with different parameter types and
           | names, and a different order of parameters (i.e., like a real
           | problem).
        
             | triknomeister wrote:
             | Does that matter? Any code provided should come with the
             | license needed to use it, otherwise the user is opening
             | themselves up to litigation.
             | 
             | Hence why I agree with another comment somewhere that
             | Microsoft is banking on software developers not litigating
             | about use of their open source code in closed source
             | projects.
        
         | redox99 wrote:
         | Maybe when you accepted GitHub ToS you gave them permission for
         | your code to be used for ML training.
        
           | eloisius wrote:
           | I can't say I remember the terms saying anything to the
           | effect of granting Microsoft a perpetual unlimited license in
           | addition to whatever license I package with the code when I
           | signed up. Not doubting it, but I would have expected that to
           | raise some suspicion long before Copilot was around.
        
             | redox99 wrote:
             | It could be something as innocuous as "you allow your code
             | to be analyzed, processed or otherwise handled by Github
             | software" I suppose, which wouldn't raise suspicion.
        
         | hooby wrote:
         | many OSS licenses require attribution
        
         | saghul wrote:
         | Even if it was trained with OSS licenses, some of them require
         | proper attribution, which copilot doesn't do.
         | 
         | Now, where the threshold is for substantial derivative work in
         | order to require attribution is an interesting question.
        
       | Guid_NewGuid wrote:
       | I find this whole topic very annoying, this is like the 3rd
       | variation to reach the front page today. But it has made me
       | realize why I instinctively dislike Free Software as a movement.
       | 
       | Copyright and licensing are bad, actually. Stop getting worked up
       | about the idea of using courts to punish theft. Stop getting into
       | a frenzy of arousal about the police kicking down doors to drag
       | Billy Gates to jail because 80 characters of fast square root is
       | theft but 79 isn't.
       | 
       | Where on earth is the ambition and vision!? Knowledge is public
       | domain. A commons of knowledge is a public good. The cost of code
       | copying is zero.
       | 
       | Sure in our day job we have to pretend to care about this stuff.
       | But when did the ideological scope of what can be achieved become
       | rules lawyering over license text.
       | 
       | Copy my MIT licensed code without attribution? I don't give a
       | shit, go ahead, I hope it helps, in fact I want a truly public
       | domain license but copyright law is so hostage to corporate
       | interests no such thing exists in many countries.
       | 
       | Free the code.
        
         | eikenberry wrote:
         | There is a license for that, the MIT-0 or the MIT No
         | Attribution License.
         | 
         | https://opensource.org/licenses/MIT-0
        
         | progman32 wrote:
         | I see the free software movement as a variant on your ideals
         | but rooted in practicality given the current environment.
        
           | Guid_NewGuid wrote:
           | I think we share a lot of the same goals but they presuppose
           | openness based on violence, if you don't do what their
           | license says exactly then they're going to use lawyers and
           | courts and the state's monopoly on violence to make you
           | comply.
           | 
           | I think at a fundamental level this abandons any vision of a
           | true commons since as copilot discussions reveal the well is
           | now polluted (to mix metaphors) and though in some frames the
           | code is more free you certainly won't be if you fail to pay
           | the penalty levied in a civil case for misusing it.
        
         | sirsinsalot wrote:
         | "A commons of knowledge is a public good."
         | 
         | Yes but this copilot model takes that, adds value and doesn't
         | itself join the public common good. Instead it takes it, and
         | makes you pay to have it back in another form.
         | 
         | If copilot were open source and the model released for the
         | public good, being built of public data (in your scenario) we
         | would have a very different conversation.
        
           | jazzyjackson wrote:
           | If it was just published as a public good it would probably
           | be as illegal as sci-hub
           | 
           | I consider the $10/m as a donation to the microsoft legal
           | defense fund to allow free access to accumulated knowledge.
        
             | sirsinsalot wrote:
             | To allow access to a service that grants you the
             | accumulated knowledge's output in small bits.
             | 
             | I'm all for a world where these tools help developers, but
             | I'm not here for a system that isn't open. I want to own my
             | tools.
             | 
             | Copilot is a bit like musicians paying a monthly fee for
             | access to a loop library. Except all the loops are rip-offs
             | of other people's hard work and there's no effort to
             | compensate them.
             | 
             | If I made an AI that resampled music into derivative tracks
             | ... you can be damn sure I'd be sued until my ears bled.
        
           | andybak wrote:
           | And I really don't mind.
           | 
           | I want every line of code I've ever written to be used as
           | much as possible.
           | 
           | I find "intellectual property" to be dubious to the core. I'm
           | not confident enough in my feelings to be a zealot, but if I
           | had to pick sides then I know which side I would pick.
        
             | sirsinsalot wrote:
             | If an AI "listened" to music and created new samples for
             | musicians to use for a fee, do you not think the original
             | musicians should be compensated?
             | 
             | The value transfer is basically theft.
             | 
             | It isn't about the usefulness of the service, or even that
             | something similar is a good thing ... it is about the
             | execution and what it says about fairness for those that
             | worked to create the data it depends on to produce value.
        
               | andybak wrote:
               | I'm not sure I was clear enough when I expressed my
               | doubts about the concept of intellectual property.
               | 
               | Your musical example is playing out in the courts in
               | multiple forms. The Marvin Gaye case, Led Zeppelin, Katy
               | Perry etc.
               | 
               | And each case pushes me further towards wanting to rip
               | down the whole rotten edifice.
               | 
               | We've lived through 4 or 5 decades of unprecedented
               | expansion of the domain to which IP lays claim. Surely
               | it's time for the pendulum to swing the other way?
        
             | JoshTriplett wrote:
             | You're welcome to use a "do whatever you want" license on
             | your code, and people should respect that. (Though even
             | those licenses tend to require attribution, and copilot
             | doesn't do even that.)
             | 
             | Other people use licenses that try to create a commons
             | where if you want to use it you need to share your own
             | code, as a counterpoint to the non-commons in which you
             | can't use code at all. And if people use those licenses,
             | they should be respected as well.
             | 
             | By all means, eliminate copyright, and let all code be
             | copied freely. And until that happens, as long as
             | proprietary code exists and doesn't let anyone copy it,
             | respect copyleft licenses as well.
        
               | andybak wrote:
               | A fair point. "What to do in a world where copyright
               | already exists" is a tougher question to answer and one
               | in which I tend to go back and forth.
        
           | Varqu wrote:
           | People (github in this case) do something to make your life
           | easier so that you can save time for the price of 1 latte per
           | month and you complain?
           | 
           | Software Developers seem to be the most whining profession in
           | the world and I despise this attitude (while being a
           | developer myself)
        
             | tuckerman wrote:
             | People aren't whining because the price is too high, they
             | are upset because some (myself included) believe Microsoft
             | is exploiting developers by copying their work against
             | their wishes and then turning around and selling other
             | developers a product which may or may not be generating
             | code which violates copyright/patent licenses. A developer
             | who inadvertently uses a copilot suggestion which gets them
               | into hot water is going to be spending a lot more than
               | the cost of a latte to defend themselves in court.
        
               | sirsinsalot wrote:
               | This. It is a matter of (a) consent and (b) compensating
               | people that, without their data, the model would be
               | useless.
        
             | Philadelphia wrote:
             | Yep, anything useful has to be legal and welcomed.
             | Microsoft should start breaking into people's houses and
             | sorting their underwear drawers for them while they're out.
             | Million dollar idea!
        
           | Guid_NewGuid wrote:
           | Yes they haven't paid it forward, or back, but why fight on
           | the occupier's territory. By calling for legal frameworks to
           | enforce this we accept the language and terms of the dominant
           | party. By using courts and the law and creating new law for
           | copyright we actually move further from the goal of
           | abolishing copyright and IP entirely.
           | 
           | Every time we use courts to enforce IP we're strengthening
           | the Walt Disneys and Nintendos of the world.
           | 
           | (I accept I am in a group of like 3 people with this goal but
           | it's my view)
           | 
           | Edit: to expand slightly more on this. People should be able
           | to decompile/reverse engineer whatever the hell they want.
           | They shouldn't have to worry about armed goons kicking down
           | their doors. Every time cases are used to strengthen the
           | enforcement of IP/licensing, whether for the light (FSF) or
           | dark (Micro$oft, Google, etc) the outcome is the same, we
           | move further from that goal.
        
             | ozim wrote:
             | Funny thing is ALL these legal frameworks are there to
             | protect these 3 people like you.
             | 
             | If there would be no enforcement of IP/licensing or legal
             | enforcement - M$, Google etc. would not be nice - they
             | would just come over, kick down your doors and cut your head
             | off because they could do so. With a legal framework they at
             | least have to ask someone else.
             | 
             | You just have to understand you don't stand a chance with
             | your 3 buddies against 10 motivated attackers.
             | 
             | Writing about "accepting terms of dominant party" you
             | clearly never had a robbery at your house - imagine now
             | corporations doing the same when there would be no legal
             | frameworks.
             | 
             | Read up on Dutch East India Company - or just Nestle -
             | Microsoft or Google are still quite nice companies with
             | Walt Disney and Nintendo.
        
               | Guid_NewGuid wrote:
               | This is a slight misreading of my general political
               | position. I am pro-government in general. I find the term
               | "monopoly on violence" to generally indicate someone who
               | lives a very cosseted and easy life who can spend time
               | getting mad about like, seatbelt laws or speed limits, so
               | I use it somewhat tongue-in-cheek.
               | 
               | There's quite a lot of possibilities between DMCAs of
               | youtube-dl repositories and Big-co death-squads
               | decapitating people in their homes. I'd prefer where we
               | are now to the Brazil end of that spectrum but we can
               | imagine better models of digital and intellectual
               | 'property'.
        
             | zzo38computer wrote:
             | I also agree to abolish copyright and IP entirely.
             | 
             | I agree that people should be able to decompile/reverse
             | engineer whatever the hell they want.
             | 
             | And if armed goons (whether government or if they are
             | Microsoft or some company) kick down your doors, then they
             | should be arrested for trespassing.
        
             | matheusmoreira wrote:
             | > the goal of abolishing copyright and IP entirely
             | 
             | Completely agree with you. It's the 21st century, once data
             | has been published there is no controlling it anymore and
             | all attempts to do so lead to the destruction of computer
             | freedom. No doubt people all over the world copy code every
             | single day with nobody even finding out about it. I'd
             | rather get rid of all these monopolists than limit the
             | potential of computers to whatever reality enables them.
             | 
             | >I accept I am in a group of like 3 people with this goal
             | but it's my view
             | 
             | Now we're four.
        
             | handoflixue wrote:
             | > Every time we use courts to enforce IP we're
             | strengthening the Walt Disneys and Nintendos of the world.
             | 
             | Can you actually point to substantial examples where Disney
             | or Nintendo benefited significantly from a precedent set by
             | an open source court case? Open source has been around for
             | decades, so it should be trivial to find numerous clear-cut
             | examples at this point... if your theory is actually
             | correct.
        
               | Guid_NewGuid wrote:
               | No, I honestly have no idea. I know nothing about the law
               | and understand even less. I may be wrong about all of
               | this, but if we take the (laughable) idea of justice
               | being blind it stands to reason any precedent that
               | protects a single open source developer also protects
               | Amazon's code.
        
             | JoshTriplett wrote:
             | Proprietary software is more than willing to use those
             | legal frameworks. Unilaterally disarming while your
             | opponent does not is a losing strategy.
             | 
             | As long as copyright exists, copyleft should be respected.
        
           | spullara wrote:
           | It absolutely adds to the common good in the form of people
           | using it to write more open source code.
        
             | sirsinsalot wrote:
             | Seeing as copilot is known to output code that's a straight
             | copy from non-permissive code where the author's permission
             | wasn't obtained ... I'd say it is helping you steal from
             | code authors without giving back (as there is no obligation
             | to open source your code).
             | 
             | Given Microsoft's record of pursuing IP violations
             | aggressively through the legal system, I'd say the whole
             | thing is ironic.
        
           | jppope wrote:
           | > "Yes but this copilot model takes that, adds value and
           | doesn't itself join the public common good. Instead it takes
           | it, and makes you pay to have it back in another form."
           | 
             | $10/month ... how much do you think this thing cost to
           | build, and to maintain?
        
             | nightski wrote:
             | That's the whole point. Without the data, it would be
             | worthless. Microsoft is not paying the full cost because it
             | is ripping the data without asking consent. I'm not saying
             | what they are doing is illegal per se, but it's definitely
             | immoral.
        
               | Guid_NewGuid wrote:
               | But why is it immoral? All that code is still out there,
               | if I had the time and the resources I could build a
               | language model. Unlike commons in the real world (e.g.
               | land, fresh water, etc) a code commons is purely
               | additive. With the release of Copilot (which I don't
               | intend to pay for or use) nothing has been destroyed,
               | instead we'll get more code for less work where companies
               | do pay for their developers to use it, some might even
               | find its way back into the commons as new open-source
               | code (whether more code of copilot generated quality in
               | general is an unalloyed good is left as an exercise to
               | the reader).
        
               | bayindirh wrote:
               | Because copilot is violating the terms I put for my code.
               | My code is GPL. It cannot be put into projects with
               | incompatible licenses. That's my code, and I share it
               | with strings attached. You can't just copy my code and
               | sell to other parties no strings attached.
               | 
               | If that's fine and dandy, Microsoft should also train
               | Copilot on their source code repositories, so we can use
               | that knowledge, too.
        
           | visarga wrote:
           | It costs money to run a huge language model with low latency,
           | in the loop with you - charging $10/month is reasonable. You
           | need multiple GPUs to load even a single copy. Copilot is
           | adding something extra to the original code - it selects the
           | recommendation from the whole corpus, while keeping the
           | surrounding context into consideration and adapting to your
           | variable names.
           | 
           | And in reality 99.9% of the generated code has no long ngrams
           | in common with the training set, it's already original. All
           | they need to do is to enforce never to generate data
           | identical to the training set, something that can be
           | implemented with a bloom filter, then the generated code is
           | impossible to attribute and should have no legal problems.
           | 
           | In the end what do models like Copilot do? They act like
           | culture - absorbing and replicating memes. They free the
           | knowledge and make it reusable. They can act like a general
           | purpose NLP tool for information extraction, classification
           | and text generation. You can implement your ideas faster with
           | it, don't need to label much data.
           | 
           | It works even with just a prompt. Try OpenAI Codex to extract
           | a receipt to see what I am talking about - it gives you the
           | output in JSON. It's a new tool and a new interface to the
           | computer. There are going to be plenty of open source
           | implementations as well, some are already under training.
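The bloom-filter idea in the comment above can be sketched roughly as follows. This is a minimal illustration and not how Copilot works: the whitespace tokenizer, the n-gram length, the filter size, and the names `BloomFilter`, `index_corpus`, and `overlaps_training` are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1 << 20, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive several bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))


def ngrams(tokens, n):
    """Yield every run of n consecutive tokens, joined by single spaces."""
    return (" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def index_corpus(snippets, n=6):
    """Record every token n-gram of the (hypothetical) training snippets."""
    bloom = BloomFilter()
    for snippet in snippets:
        for gram in ngrams(snippet.split(), n):
            bloom.add(gram)
    return bloom


def overlaps_training(candidate, bloom, n=6):
    """True if any n-gram of the candidate was (probably) seen in training."""
    return any(gram in bloom for gram in ngrams(candidate.split(), n))
```

A generator could discard or resample any suggestion for which `overlaps_training` returns true. Because a Bloom filter only errs toward false positives, such a check would occasionally reject original code but would not let an indexed n-gram through.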
        
         | nonbirithm wrote:
         | I think because this kind of ML is so new, we have no choice
         | but to frame arguments for/against in terms of the structures
         | that have been in place for decades past (copyright, open
         | source licenses). We don't yet have the legal language to
         | express dissent against ML in clear yes or no terms.
         | 
         | I think if there were an option to add a machine learning
         | clause and ask individual creators if they wanted it applied in
         | that context, we would see a considerable amount of uptake.
         | It's just that we couldn't foresee this progress happening so
         | soon, and the issue is still not visible enough. I think it's
         | only a matter of time before the culture catches up and new
         | creative works in the coming years are excluded from training
         | sets by their authors with clear and direct language.
         | 
         | By that point there would be no way to argue "but they
         | shouldn't care, they licensed it like this, so I'm assuming
         | it's fine for ML use."
         | 
         | If copyright is not enough to stop another entity from using a
         | person's data for training, then some other protection should
         | be invented that does.
        
         | bayindirh wrote:
         | > I find this whole topic very annoying, this is like the 3rd
         | variation to reach the front page today.
         | 
         | Me too. I also find three iterations of the same subject not
         | enough discourse. We need to take this matter more seriously.
         | 
         | > But it has made me realize why I instinctively dislike Free
         | Software as a movement.
         | 
         | On the other hand, this whole discourse reminds me why I
         | absolutely love Free Software as a movement.
         | 
         | > Copyright and licensing are bad, actually.
         | 
         | This is why we have "Copyleft".
         | 
         | > Stop getting into a frenzy of arousal about the police
         | kicking down doors to drag Billy Gates to jail because 80
         | characters of fast square root is theft but 79 isn't.
         | 
         | And, stop getting into a frenzy of arousal about being able to
         | use any and every code piece you see elsewhere in any project
         | regardless of its license.
         | 
         | > Where on earth is the ambition and vision!? Knowledge is
         | public domain. A commons of knowledge is a public good. The
         | cost of code copying is zero.
         | 
         | This is why GPL is important. It forces knowledge to evolve in
         | the open, stay in the public domain and actually serve the
         | public good. It also doesn't hinder ambition and vision by not
         | taking it to private domain, and keeping it open to everyone.
         | 
         | > Sure in our day job we have to pretend to care about this
         | stuff. But when did the ideological scope of what can be
         | achieved become rules lawyering over license text.
         | 
         | You might be pretending to care about this in your daily job,
         | but we really care. Some of the projects I take part in can't ever
         | include GPL code (because the projects are MIT licensed). These
         | texts are court-tested licenses, so they're as proper and
         | serious agreements as the EULAs of "particular" software
         | companies.
         | 
         | > Copy my MIT licensed code without attribution? I don't give a
         | shit, go ahead, I hope it helps, in fact I want a truly public
         | domain license but copyright law is so hostage to corporate
         | interests no such thing exists in many countries.
         | 
         | If I want my code to be copied and possibly closed, I'll
         | license it with MIT or BSD-0 and forget about it, but if I'm
         | licensing my code with GPL3, it means I want that code to stay
         | open. As a licensor, I expect anyone using that code to respect
         | that license.
         | 
         | > Free the code.
         | 
         | Yes, and respect the license the author selected for his/her
         | code.
        
         | georgeecollins wrote:
         | You may not care about licensing or copyright, and I imagine
         | many others who create code under an attribution license don't.
         | That's still not the same as saying "copyright and licensing
         | are bad." Too many businesses depend on them to exist for me to
         | have that opinion.
         | 
         | If an AI takes a copyrighted work and makes its own version-- say
         | combining two novels by popular authors in a way that is unique
         | but keeps large parts of the text intact, can I sell that? I
         | think if I were the authors I would be unhappy.
         | 
         | Also, how hard would it be for copilot to include a comment
         | saying "// I got this line from x repo" when you are copying
         | from a new repo? I am guessing not hard at all. Then at least
         | the user would be aware of where their code was coming from and
         | could be expected to make a judgement. If the line is "let a =
         | b" then probably no worries. But if it is hundreds of lines of
         | a simulation, all from the same repo with no changes, then I
         | think some attribution is good for both parties.
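The attribution comment suggested above could look something like this in outline. It is a sketch under stated assumptions: the `Suggestion` type, the `source_repo` field, and the three-line threshold are hypothetical, and it presumes the tool already knows which repository a verbatim suggestion came from, which is the hard part.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Suggestion:
    code: str
    # Hypothetical field: the repository the text matched verbatim, if any.
    source_repo: Optional[str] = None


def with_attribution(suggestion: Suggestion, min_lines: int = 3) -> str:
    """Prefix longer verbatim suggestions with a comment naming their source."""
    line_count = len(suggestion.code.splitlines())
    if suggestion.source_repo is None or line_count < min_lines:
        # Trivial snippets such as "let a = b" pass through unannotated.
        return suggestion.code
    return f"// I got this from {suggestion.source_repo}\n{suggestion.code}"
```

The threshold mirrors the distinction drawn in the comment: a one-line assignment needs no note, while a long block copied unchanged from one repository gets one so the user can make a judgement.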
        
           | Guid_NewGuid wrote:
           | Don't get me wrong, I know this (copyright abolition) is pie-
           | in-the-sky stuff. I'm using an anon account to post because
           | even advocating for it could be troublesome for employment.
           | But I don't accept we have to be meek or have small goals in
           | talking about this ideological stuff. And I think this has
           | made me realise why I find the Free Software vision so
           | disappointing and weak. And hence why I find all these
           | (ideologically) Free Software aligned takes of sending Billy
           | to jail for a thousand years so irritating.
        
         | Schroedingersat wrote:
         | The problem with this is 'freeing the code' in this instance
         | leads to microsoft building a wall around it and asserting
         | complete control in a few years.
         | 
         | Copyleft exists for a reason and without the ongoing fight for
         | the commons we lose it all.
        
         | vajow46267 wrote:
         | So glad this sentiment is becoming more common in the OSS
         | community! I MIT license everything, if someone wants to make
         | money using stuff I wrote that's awesome, and I wish them the
         | best.
         | 
         | I don't think users owe me anything at all. If people want to
         | PR back that's cool but if not that's cool too.
        
         | wcoenen wrote:
         | > _I want a truly public domain license_
         | 
         | I think this sentence contradicts itself.
         | 
         | A "license" implies that there is a copyright holder who allows
         | usage of the work under the terms of said license.
         | 
         | While "Public domain" implies that there is no copyright holder
         | (e.g. because the copyright expired, was explicitly waived, or
         | is for some other reason not applicable).
         | 
         | If you want to put your work in the public domain, you can do
         | so; simply include a note saying that you dedicate it to the
         | public domain.
        
           | Guid_NewGuid wrote:
           | You're right that it does contradict itself, but the
           | unfortunate situation is that public domain declarations
           | don't work and would make it harder for people to use your
           | code safely in the current licensing model. The closest
           | options are Unlicense and CC0 afaict and both don't work in
           | many European jurisdictions.
           | 
           | I just want people to be able to take my code and do whatever
           | the hell they want with it (including commercially) and
           | optionally contribute to it. Having a license currently makes
           | that easier but every time the Free Software lot go
           | zooming off into the weeds of GPL v3 versus GPL v2 versus
           | LGPL my eyes roll back into my head and I internally start
           | screaming "get a life!".
        
         | notacoward wrote:
         | I suggest you read up on the history of free software and open
         | source. It exists as a reaction to intellectual enclosure, to
         | prevent that ill and create greater freedom of ideas. Yes, it
         | uses the tools of copyright to fight greater ills of copyright,
         | because those are the tools available, and actions like these
         | are necessary to keep the enclosure from happening all over
         | again. Anyone who has actually studied the matter for even five
         | minutes can see how silly the "free software is anti-freedom"
         | FUD is.
        
         | ssalka wrote:
         | Information wants to be free
        
         | mplanchard wrote:
         | If that's what you want, you should license your code not under
         | MIT, but under a license that allows replication/distribution
         | without attribution. Meanwhile, others who do care about such
         | things can license their code under licenses that require
         | attribution/copyleft/etc.
        
           | Guid_NewGuid wrote:
           | But I can't really because the legal systems for it don't
           | exist. I can't relinquish anything
           | https://softwareengineering.stackexchange.com/questions/1471...
           | (CC0 looks closer but still doesn't do what I'm after).
           | 
           | And I can't because there are a bunch of, for want of a
           | better word, dweebs who care about this stuff. I don't give a
           | single solitary frick about the finer points of MIT vs GPL vs
           | BSD 3 clause vs CC-BY-NC or whatever-the-hell. But y'all are
           | forcing me to care by making the legal frameworks for
           | software ever more strict and confusing.
           | 
           | I take a maximalist view, don't want the code copied, sliced
           | up, re-used in any form whatsoever with no credit? Don't post
           | it on a code sharing site. Like I say in the OP, in my job I
           | obviously have to follow the rules, but on an ideological
           | level I'll ignore them where I can get away with it outside
           | of work.
           | 
           | If you don't want the code to be used, don't post it online.
        
             | tuckerman wrote:
             | I'm curious if this view is software specific or relates to
             | any work released online? For example, do you feel
             | similarly about a novelist or graphic artist? I reckon at
             | least a few software engineers look at what they produce
             | not entirely differently from how an artist or writer looks
             | at theirs.
        
               | Guid_NewGuid wrote:
               | It's a good, and thought-provoking, question.
               | 
               | First, to be flippant: the idea of a software developer
               | with that view sounds so unbearably insufferable and full
               | of themselves I hope never to meet one. All code is
               | terrible, be less attached.
               | 
               | Stream of consciousness: Should artists or writers be
               | paid for what they produce? Yes. So why not software
               | developers? I'm paid for what I produce. But then I don't
               | release the stuff I'm paid for for free on the internet.
               | But I'm against DRM, I also think Winnie the Pooh
               | shouldn't have IP protection (now expired). What makes
               | art or literature a different commons from software? I
               | also think all scientific journals should be available
               | for free. Do artists and writers have an alternative
               | route to make money from what they publish, what is the
               | artistic or writer equivalent of open source? I think
               | this is the crux of it, if we're going to do open source
               | let's actually do it and stop being precious about it but
               | this only applies to freely-entered open source. So does
               | that mean I support some form of copyright after all?
               | Then again, some old out-of-print books will sell on
               | Amazon for like $4000, so we should be able to copy those
               | for free.
               | 
               | Ultimately it's a question of what a vision for society
               | without copyright would look like. I think software is
               | uniquely placed to start exploring that idea. How would
               | we make a living off software if anyone could reverse
               | engineer (even our proprietary) code freely and safely?
        
               | tuckerman wrote:
               | The reason I ask with writers in particular is because,
               | like code, having access to it necessarily means that the
               | viewer has the ability to copy it as much as they'd like.
               | Unlike software, however, there is no ability to keep the
               | source code private in a book while still having users.
               | 
               | I definitely agree that copyright protections have become
               | far too strong but I don't think we can really ever know
               | if we would have been able to build the strong open source
               | community we have today without coopting the copyright
               | system for copyleft protections. At the same time,
               | perhaps we are past the point where it's necessary and
               | now it's holding us back... it's entirely possible!
               | 
               | To the first thought, I personally see some coding as a
               | creative act (some is doing _a lot_ of work there
               | though). It's not because I fancy myself a Picasso but
               | because I think some (again, doing a lot of work!)
               | solutions/ideas have a bit of their creator in them and,
               | for those works, the author should be able to exert some
               | control over their works. I think this is more
               | philosophical than legal/political, but I would disagree
               | that it's flippant :)
        
         | kube-system wrote:
         | > Free Software
         | 
         | > public domain
         | 
         | These are incompatible concepts. RMS's vision of 'free-as-in-
         | freedom' software doesn't let people do whatever they want. It
         | forces those who distribute binaries to also distribute source.
         | This is not possible with a public domain work.
        
         | monocasa wrote:
         | The issue is that whether the free software people want it or
         | not, the copyright system over code exists, and historically
         | has been used as a cudgel against smaller players. If we got
         | rid of copyright over code entirely I'd totally be down for
         | this. And IIRC RMS has said the same thing; that he'd be in
         | favor of the removal of copyright over code as a concept even
         | if it meant neutering the protections of the GPL.
         | 
         | Until that happens, and copyright protections are still used by
         | larger entities, using the same system to protect yourself and
         | (more importantly) your users isn't turning your back on your
         | ideals, but instead simply adjusting your strategy to the
         | current material conditions. Remember that Google v. Oracle
         | (while ultimately a win versus what could have been) was a step
         | back, with de minimis claims left on the table as not a valid
         | defense. The playing field is heavily slanted towards the big
         | players and software freedom requires every tool it can put
         | its hands on at the moment.
        
           | Guid_NewGuid wrote:
           | Interesting that he's said that, I wasn't aware.
           | 
           | I think at its root the problem is copyleft is a mirror image
           | of copyright. It relies on and replicates all the cultural
           | and legal requirements and constraints of the copyright model
           | and curtails an imagining of other possibilities. Every
           | sentence or thought spent on copyleft is misdirected in my
           | view.
           | 
           | Which is why I find Microsoft doing this (potential) en-masse
           | license violation and then a bunch of GPL folks getting mad
           | pretty funny overall. I just find the high and mighty tone
           | annoying, like sure, they've (allegedly) screwed you, but
           | they're going to (theoretically) get away with it because
           | they're rich and powerful, sorry that didn't turn out how you
           | wanted.
        
             | Kbelicius wrote:
             | >I think at its root the problem is copyleft is a mirror
             | image of copyright.
             | 
             | That is the (only) point of copyleft. If it weren't for
             | copyright it wouldn't exist. Fight fire with fire, that
             | sort of thing.
        
           | [deleted]
        
           | zzo38computer wrote:
           | > The issue is that whether the free software people want it
           | or not, the copyright system over code exists, and
           | historically has been used as a cudgel against smaller
           | players. If we got rid of copyright over code entirely I'd
           | totally be down for this. And IIRC RMS has said the same
           | thing; that he'd be in favor of the removal of copyright over
           | code as a concept even if it meant neutering the protections
           | of the GPL.
           | 
           | As someone else asked, I would also want a citation, but I
           | agree.
           | 
           | Actually, I want a license that you can do pretty much
           | anything you want to do with it (including: lack of
           | attribution, distribution without source codes, distribution
           | with source codes (whether they are the original source codes
           | or reconstructed), lack of copyright notices, reverse
           | engineering, circumvention of your own copy and write reports
           | about anything you want to do, to use or not use the software
           | (and to modify or not modify) at your choice, etc), but that
           | you are not allowed to add further legal restrictions to it
           | (with a few exceptions dealing with trademarks (but not all)
           | and allowing conversion to GNU (A)GPL 3 and CC-BY-SA 4.0 if
           | you are able to satisfy the conditions of those licenses) or
           | to derivative works, and that if someone will try to use
           | legal processes against you relating to this, then anyone can
           | countersue.
        
           | matheusmoreira wrote:
           | > And IIRC RMS has said the same thing; that he'd be in favor
           | of the removal of copyright over code as a concept even if it
           | meant neutering the protections of the GPL.
           | 
           | Do you have a citation? I was under the impression he
           | defended copyright because copyleft depends on it.
        
         | marpstar wrote:
         | > Copy my MIT licensed code without attribution? I don't give a
         | shit, go ahead, I hope it helps
         | 
         | This is my feeling as well. I don't build stuff in the open so
         | that I can get bent out of shape at someone not properly
         | licensing it. It's in a _public_ repository, FFS... I assume
         | that if anyone even notices my repo, they may copy/paste
         | a few lines out of my solution if it helps them.
        
           | sirsinsalot wrote:
           | But this isn't everyone's feeling. And they have a right to
           | choose how their work is used. That's the basis of commerce
           | being possible here.
           | 
           | The mechanised license ignorance and the way original authors
           | are not compensated is the issue.
           | 
           | If you had a repo you'd worked really hard on, and offered a
           | commercial license or GPL depending on the use (so you can be
           | funded to work on it) ... do you think it is fair that
           | copilot ingests that code and allows others to benefit from
           | your work and knowledge without the commercial license as you
           | intended?
           | 
           | Note how Microsoft always throws out the capitalism "rules of
           | engagement" when it benefits them and undermines everything
           | else. The fact we are even trusting the situation Microsoft
           | are creating is dire, and speaks to the short memory of our
           | industry.
        
             | alar44 wrote:
             | Saying an auto complete of a line of code is "using their
             | work" is a massive stretch.
        
               | sirsinsalot wrote:
               | It isn't autocompleting "a line of code", it completes
               | whole function bodies.
        
           | cududa wrote:
           | Exactly! Do they really think every single line of their code
           | is so precious it requires attribution? If I publish code, I
           | assume it might get pushed, pulled, refactored in a million
           | ways and no one will ever know my name's attached to it. And
           | guess what? I DON'T CARE. It's code. Not a self-constructed
           | monument to my own intelligence that needs a little placard
           | with my name on it to follow around some clever async
           | function I wrote.
        
             | georgeecollins wrote:
             | If it's a couple lines of generic code, of course. That's
             | also an indefensible copyright, btw. But if it's hundreds of
             | very specific lines of code written to do one thing under a
             | license you don't follow, that's something else.
             | 
             | This isn't just an issue of code. You can write a program
             | that combines songs, or combines novels creating a
             | different work that has sections that are essentially the
             | original protected work. I don't think the authors of those
             | novels are going to be OK with you selling or giving away a
             | version of their work just because an AI edited it or
             | combined it somehow.
        
         | dougmwne wrote:
         | In this thread: many engineers nervously sweating. The moats
         | are drying up and the wizards are about to be thrown out of the
         | castle. This tech is the first product in a long line of
         | products that will massively lower the barrier to entry. It has
         | been a good run, but it was never going to last forever. We are
         | not part of the capitalist class and were never going to be.
        
           | LordDragonfang wrote:
           | Copilot replaces code monkeys, not engineers. Ultimately it's
           | just a faster Stack Overflow; proper software engineers and
           | system architects are going to be just as in demand as they
           | are right now for the foreseeable future. At the point at
           | which that stops being the case, we'll have much bigger
           | societal and existential problems (because it implies the
           | singularity is nigh).
           | 
           | (You're correct on not being part of the capitalist class,
           | though)
        
             | dougmwne wrote:
             | There are a lot of code monkeys out there and I might be
             | one of them. That island of job security seems like it will
             | be shrinking.
        
           | ThalesX wrote:
           | The world might change, but software engineers have been
           | working with and within change their entire careers
           | presumably. I think we'll be OK, as people, no matter what
           | happens.
           | 
           | I was sweating nervously before I started using Copilot
           | a while ago but I've stopped since because A - it really
           | doesn't replace me, tried really hard; B - I don't sweat
           | nervously for IntelliSense either.
           | 
           | There's also C, where being of an entrepreneurial mindset,
           | I'd love the opportunity to hand over the software to an AI
           | dev and just direct the implementation to my desire until I
           | have a working product. I bet I could secure a higher room in
           | the castle if instead of coding for 8 hours per day I could
           | work on n products with capable AI Software Engineers. We're
           | not there yet though.
        
       | captainbland wrote:
       | If we're all standing on the shoulders of giants (specifically
       | code that other people wrote) then really what Copilot is selling
       | is a ladder to get onto those shoulders faster. I think that's a
       | legitimate aim, as such. However it should be careful about not
       | including unlicensed code and should have a specific 'GPL' option
       | for a model trained with GPL code included.
       | 
       | I suppose it should also generate appropriate copyright notices
       | to satisfy many open licenses. I'd be surprised if copilot could
       | actually link back to the original code like that, though.
        
       | jarenmf wrote:
       | I guess the question is where you draw the line between a
       | derivative work and "learnt by an AI algorithm"
        
         | asimpletune wrote:
         | Who needs a line when there are plenty of obvious examples
         | lifted verbatim?
        
         | triknomeister wrote:
         | If the media copyright industries and their ContentID is
         | anything to go by, it doesn't matter. It's all derivative.
        
       | presentation wrote:
       | Google just sells content other people wrote.
        
       | bborud wrote:
       | Well, this does invite an interesting comparison. If we imagine
       | something like Copilot applied to music I believe the chances of
       | ending up in court would be pretty high. There are a lot of
       | examples of plagiarism lawsuits in popular music and the outcome
       | seems to be entirely random.
       | 
       | One could argue that the information density in chord
       | progressions, bass lines and beats is extremely small. And that
       | any recognizable part of a musical idea that has been "borrowed"
       | would necessarily make up a larger percentage of the complete
       | work than would be the case for a typical application with
       | borrowed snippets.
       | 
       | That's not a bad argument, but it is unsatisfactory because it
       | means that at some point someone has to make a judgement on how
       | much you can borrow.
        
       | ThereIsNoWorry wrote:
       | 1. You most likely agreed to that by using GitHub.
       | 
       | 2. Copy&Pasting Code by manual search exists.
       | 
       | 3. This is just a smart tool so you don't have to figure out
       | yourself what to copy&paste (in the best case) and save a lot of
       | time.
       | 
       | Sometimes I truly wonder how people can genuinely be upset about
       | things like this. What is broken are copyright and patent laws in
       | the 21st century.
        
         | zufallsheld wrote:
         | As to your first point, there are many repositories on github
         | that the author of code did not upload there or where not all
         | contributors to the code are on github or agreed to let their
         | work be used in such a case.
        
           | redox99 wrote:
           | That's really no different than somebody uploading
           | proprietary code they don't own (stolen, leaked, whatever
           | reason etc) on Github. Github has to assume that you are
           | allowed to do so. What are they going to do otherwise,
           | somehow manually verify that each repository is legit?
           | 
           | Now you might say, what about GPL code you don't own. You are
           | allowed to redistribute it (upload to github). But because
           | you are not the owner you can't license it to Github under
           | new terms (that allow them to use it for ML training). But
           | the question still is, is there anything in the GPL that
           | forbids its code being used for ML training? Even if the
           | generated model is proprietary, has no attributions, etc?
        
             | megous wrote:
             | Ok, takedown requests exist. Say Qualcomm finally wises up
             | and asks github to take down a copy of the millions of lines of
             | their super proprietary 4G modem firmware implementation
             | from github. Will github retrain the model after each such
             | takedown? :D
             | 
             | If not, then it's kinda stupid to argue the point about the
             | lack of knowledge, since lack or not lack of knowledge
             | clearly doesn't matter. Github will happily continue using
             | confidential code even from trigger happy companies like
             | Qualcomm for copilot.
        
               | redox99 wrote:
               | I guess they would add some kind of filter to copilot
               | output that removes results that clearly come from code
               | that was DMCAd.
               | 
               | It's kind of like some employee that worked at Qualcomm
               | and has seen the code. Do you retrain him (aka hit his
               | head until he forgets) after leaving the company?
               | 
               | The comparison might seem silly but as AI advances I
               | expect more and more arguments (especially in court) to
               | come from analogies of humans and AIs.
        
               | megous wrote:
               | What kind of filter? I thought copilot does not output
               | the input data verbatim.
               | 
               | Creating an output filter based on millions of lines of
               | DMCAd code that would not cripple the copilot output
               | completely at the same time, sounds like one of those
               | hard problems. Especially if there's no agreed upon
               | definition of copyright "violation" here.
        
         | keraf wrote:
         | The point of this Tweet is about licensing. When using an MIT
         | licensed library for example, you would have to give
         | attribution. But you can easily rewrite a portion of that library
         | yourself using Copilot, which could potentially use code from
         | the initial lib, without any attribution whatsoever. It's
         | even more problematic with licenses such as the GPL.
         | 
         | I guess Copilot could address this by checking the licenses of
         | the projects it uses. Even when combining code, it could pull
         | in the required attribution or avoid GPL licensed code (unless
         | enabled) for example.
        
         | SahAssar wrote:
         | > 1. You most likely agreed to that by using GitHub.
         | 
         | Are you saying that I would need all the original authors'
         | consent to upload a repo to github even if I include all the
         | original attribution and licenses? Because what you are
         | implying is that when uploading I'm granting github a license
         | far outside the bounds of the license included, which only
         | _all_ the contributors can do. For example, would the linux
         | project need to contact each and every contributor ever to
         | upload a mirror to github, since their contributions were under
         | GPL but you are implying that the license given to github is
         | much, much broader?
         | 
         | This would make any project not originally started on github
         | and with a few contributors basically impossible to host there.
         | 
         | > 2. Copy&Pasting Code by manual search exists.
         | 
         | The question is who is doing the infringement here. Github
         | copilot is obfuscating the copying and telling its users that
         | the code is theirs to use, own, etc. as they please but is also
         | taking large chunks of code it does not have the right to
         | redistribute, even less grant licenses to.
        
         | dmix wrote:
         | > Sometimes I truly wonder how people can genuinely be upset
         | about things like this
         | 
         | 90% of Twitter is just inventing new ways to whine about things
        
           | ParetoOptimal wrote:
           | There's some truth there, but there is more negative in
           | outright dismissing the uncomfortable but important ethical
           | dilemmas one might be introduced to.
        
         | teakettle42 wrote:
         | > Sometimes I truly wonder how people can genuinely be upset
         | about things like this.
         | 
         | Tell me you regularly plagiarize without telling me you
         | regularly plagiarize.
        
           | ThereIsNoWorry wrote:
           | Code plagiarization is not a thing by all practical purposes
           | (it's even almost impossible to go to court with that for
           | very obvious reasons). And that's good. Because with that
           | insane lockdown of "Intellectual Property" nothing would ever
           | get done. So, think what you want.
        
             | teakettle42 wrote:
             | > Code plagiarization is not a thing by all practical
             | purposes
             | 
             | Of course it is. Plagiarism is "the practice of taking
             | someone else's work or ideas and passing them off as one's
             | own."
             | 
             | It's unethical and it will get you fired at any reputable
             | company.
        
               | ThereIsNoWorry wrote:
               | Ok, then there doesn't exist a single reputable company
               | with a tech division and we're all unethical. Have a nice
               | unethical day.
        
               | teakettle42 wrote:
               | > Ok, then there doesn't exist a single reputable company
               | with a tech division and we're all unethical. Have a nice
               | unethical day.
               | 
               | I'm deeply disturbed that you think this form of
               | plagiarism is universal -- I can assure you that is not
               | the case.
               | 
               | I work at a FANG currently, and plagiarism is absolutely
               | not tolerated.
               | 
               | In fact, plagiarism has been considered a fireable
               | offense at every other company I've worked at over my 25
               | year long career, and prior to that, considered a serious
               | form of academic misconduct in school.
               | 
               | It's clearly unethical and I've never plagiarized in my
               | life.
               | 
               | I've only run into one instance of someone else
               | plagiarizing code in my career, and that individual was
               | fired.
        
               | ParetoOptimal wrote:
               | > I'm deeply disturbed that you think this form of
               | plagiarism is universal -- I can assure you that is not
               | the case.
               | 
               | > I work at a FANG currently, and plagiarism is
               | absolutely not tolerated.
               | 
               | It's universal in any company that doesn't take measures
               | against it. So basically startups, small, medium, and
               | even some large companies.
        
               | ThereIsNoWorry wrote:
               | I'm disturbed you believe regurgitating code snippets is
               | plagiarisation.
        
               | teakettle42 wrote:
               | It's literally plagiarism by definition.
        
               | ThereIsNoWorry wrote:
               | https://stackoverflow.blog/2021/12/30/how-often-do-people-ac...
               | 
               | I feel like you're arguing in bad faith. So, whatever.
        
               | teakettle42 wrote:
               | Explaining a fundamental ethical concept you should have
               | learned in primary school when writing your first book
               | reports is not arguing in bad faith.
               | 
               | SO's license requires attribution.
               | 
               | If you don't want to be a plagiarist, you either need to
               | include attribution, or you need to rewrite the solution
               | in entirely your own words.
        
               | ThereIsNoWorry wrote:
               | So, rearranging conditionals or loops or variables then,
               | problem solved. You cannot 1:1 copy paste anyway. That
               | never works. You always have to adapt it to your
               | particularity. So it's "reworded" by default. And CoPilot
               | is doing nothing else. It's not just 1:1 memorising code,
               | it's a tiny bit smarter than that. I strongly believe
               | you're not a developer. Point taken. I understand your
               | considerations. You should write code sometimes to solve
               | a complex problem that uses some libraries and see how
               | far you get without consulting the internet or books.
        
               | ilikehurdles wrote:
               | I absolutely attribute things I find on SO to where I
               | found them. You finished college maybe a year ago and are
               | already making some absolute judgments about what makes
               | other people qualified to call themselves developers
               | simply because they don't develop as you do.
        
               | [deleted]
        
               | teakettle42 wrote:
               | > I strongly believe you're not a developer.
               | 
               | > You should write code sometimes to solve a complex
               | problem
               | 
               | There's a very high chance you posted your comment from a
               | device using code I wrote.
               | 
               | Glueing together plagiarized code copied from SO, or
               | stolen from OSS projects on GitHub, is not software
               | engineering.
        
               | [deleted]
        
               | lin83 wrote:
               | > I'm deeply disturbed that you think this form of
               | plagiarism is universal
               | 
               | This thread is an eye opener for me too. Do engineers not
               | get trained on their legal obligations? My company is old
               | and not a traditional tech company but we have been running
               | workshops on the issue for years. Even if they don't,
               | what about their legal teams? Or CI tools to scan for
               | licence violations? Some of the responses here are so
               | naive it's crazy. I hope no one is identifying the
               | companies they work for.
        
               | ThereIsNoWorry wrote:
               | Obviously we do. Don't copy paste 10 pages of source code
               | unaltered and sell it as your own.
               | 
               | But that's something entirely different from small code
               | snippets, changed and adapted to solve the same problem a
               | thousand other people already had. That is all developers
               | are doing when they go on GitHub, StackOverflow or any
               | other website to find answers to their questions. That's
               | not naivety, that's how coding works (partially). If you
               | had to re-invent the wheel every time you built
               | something new, good luck.
        
               | lin83 wrote:
               | There isn't a threshold for copyright violation. If you
               | copy a 3 line function from a GPL library, you have to
               | comply with the licence. Tools like BlackDuck will pick
               | it up.
               | 
               | Snippets aren't exactly defined but I see them as more
               | than just a single line like "here's how to flatten a
               | list in Python", it's some functionality - e.g. an
               | algorithm implementation or some task.
        
             | lin83 wrote:
             | I don't think that's true and, if it was, it would be the
             | death knell for open source.
             | 
             | Code Plagiarism is taken very seriously by every company I
             | have worked with. Multiple companies have been sued for
             | violating the GPL. The SFC is currently fighting Vizio in
             | court for example. While not commonplace, to say it's
             | "almost impossible" is a stretch. Every large company
             | complies with code copyright obligations for a reason. My
             | company publishes changes to GCC and a dozen other GPL
             | projects. Entire products like Protocode and BlackDuck
             | exist to ensure code compliance. Even small code snippets
             | are flagged.
             | 
             | Over the past few years the source code for Windows, SQL
             | Server, Bing and Cortana has all been leaked. If someone
             | built a product using that code, how long do you think it
             | would take Microsoft to sue? CoPilot is one rule for mega-
             | corps and another for everyone else.
        
         | IdiocyInAction wrote:
         | I don't think that something like CoPilot is what most GH users
         | had in mind when they published their code. Also, licenses
         | exist (which CP demonstrably doesn't give a shit about).
        
       | oytis wrote:
       | Copilot sells the service of finding the code that makes sense
       | for what you write. Would be better if it could correctly
       | attribute the source(s) though, I hope they will solve this
       | problem at some point.
        
       | sirsinsalot wrote:
       | Beware geeks with gifts. This is Microsoft. The question isn't
       | "is it good?" but "Why are Microsoft offering it and how is it
       | undermining everyone else?"
        
         | dougmwne wrote:
         | Microsoft will benefit from cheaper and more productive
         | engineers.
        
       | lfrigodesouza wrote:
       | As the saying goes, "when a product is free to use, the real
       | product is actually you". In this case, our code is the product.
       | I'm now considering switching to another git provider...
        
       | floor_ wrote:
       | I started self-hosting when Microsoft bought GitHub, and with
       | this mass theft of copyrighted material and then reselling it
       | for money I'm even happier with my decision.
        
       | rictic wrote:
       | Copilot very rarely copies code verbatim, and when it does it's
       | very short snippets. When Oracle sued Google over allegedly
       | copying short and fairly trivial snippets of code they were
       | justly derided.
       | 
       | I can't speak to the legal side, but I just don't understand the
       | moral outrage over very occasionally copying such short snippets
       | of code. The key innovations and the actual value that licenses
       | are intended to protect aren't in these short snippets.
       | 
       | And what does copilot bring to the community? Free use by
       | students, free use by open source maintainers, and a huge boost
       | in productivity for a modest fee for professional devs, for a
       | service that no doubt costs a lot to run, even on the margin.
        
       | aaron695 wrote:
        
       | stakkur wrote:
       | At every turn, in every instance, for decades, all stories
       | involving Microsoft end in "...and then Microsoft fucked people
       | over." I've witnessed this firsthand since the 80s.
        
       ___________________________________________________________________
       (page generated 2022-06-23 23:01 UTC)