[HN Gopher] Copilot sells code other people wrote
___________________________________________________________________
Copilot sells code other people wrote
Author : joemanaco
Score : 621 points
Date : 2022-06-23 08:48 UTC (14 hours ago)
(HTM) web link (twitter.com)
(TXT) w3m dump (twitter.com)
| dgb23 wrote:
| Is it smart enough to:
|
| - respect attribution
|
| - respect copyleft
|
| - respect proprietary licences
|
| - give the user appropriate hints about the above
|
| Or does it just copy code without doing any of this?
| spupe wrote:
| No, it doesn't do any of that. However, it does not "copy code"
| except in marginal use cases; the far more common scenario is
| that it will suggest very basic code that is akin to a
| Stack Overflow reply.
| dgb23 wrote:
| I read a lot of open source code and might subconsciously
| absorb techniques and patterns that are common. When I write
| code I might be influenced by what I read, not line per line,
| but rather generally.
|
| Is it like that?
| spupe wrote:
| Kinda, but I think you are imagining something bigger than
| it is. At least in my experience, it works well for simple
| stuff like "iterate over x and extract y" or similar
| queries that I imagine are well represented in its training
| data. When you get to very specific functions, its answer
| will be less reliable and more likely to be a wonky rehash
| of the few examples it has for that case.
| pabs3 wrote:
| I wonder if FOSS folks could copyleft originally public/leaked
| but proprietary code using CoPilot.
| yaseer wrote:
| Technically, programmers search, copy and modify code all the
| time.
|
| One might argue copilot puts into software an algorithm that
| humans are already doing. Software like that is usually
| inevitable.
|
| Still, it sucks there's no benefit for the contributors.
|
| The most ethical thing I can think of is some kinda 'Spotify-
| like' revenue sharing model, based on how often their code is
| used by others. Not that they'd ever implement that if they can
| get away with it!
| omnicognate wrote:
| > One might argue copilot puts into software an algorithm that
| humans are already doing.
|
| That argument only works if you think what Copilot is doing is
| meaningfully similar to what humans are doing. The debate about
| how these models relate to human thought might have legal
| implications.
|
| As I understand it (IANAL) copyright doesn't protect ideas and
| concepts. It protects the content itself. In theory, if I read
| some copyrighted work, understand some idea in it and then
| create a new work using that idea, without copying that
| original work, then that is not a derivative work. (I think
| this is at least how it's supposed to work - would love to be
| corrected if that's wrong.)
|
| So if I took a copyright work and rot-13ed it before
| distributing copies, I think that would be clear copyright
| violation, but if I made my own works using concepts I gleaned
| from reading it, it wouldn't be.
|
| So should Copilot be treated like the rot13 algorithm or like
| me understanding concepts and generating new works using them?
| That sounds like a fascinating legal debate to be had.
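|
| To make the rot13 end of that comparison concrete, here is a
| minimal Python sketch. The sample sentence is made up; the point
| is only that rot13 is a mechanical, fully reversible
| transformation, so the encoded text still carries the whole
| original work.

```python
import codecs

# Hypothetical stand-in for a copyrighted passage.
original = "Copyrighted text stays recognisable."

# rot13 shifts each letter 13 places; applying it twice restores the input.
encoded = codecs.encode(original, "rot_13")
decoded = codecs.decode(encoded, "rot_13")

print(encoded)
print(decoded == original)
```

| Nothing is learned or abstracted here: every character of the
| source can be recovered from the output.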
| teakettle42 wrote:
| > Technically, programmers search, copy and modify code all the
| time.
|
| When following the license terms, preserving the original
| copyright, etc, sure.
|
| However, honest, ethical people (including programmers) do not
| plagiarize.
|
| Copying and pasting code without attribution is plagiarism.
| Doing it without following the licensing terms is a copyright
| violation.
| redox99 wrote:
| I don't consider copying a 3 liner from stack overflow and
| not writing an attribution plagiarizing (regardless if
| technically speaking it is or isn't according to the law).
| teakettle42 wrote:
| Plagiarism isn't a legal concept, it's an ethical one.
|
| You need to either attribute the source, or rewrite it in
| entirely your own words -- just like when writing a paper.
|
| Conforming to the license is also required; iirc, SO
| requires attribution under the CC-SA license.
| redox99 wrote:
| > Plagiarism isn't a legal concept, it's an ethical one.
|
| Well if it isn't a legal but an ethical concept, then
| that's just your opinion, since there isn't some
| universal body that establishes exactly what is ethical
| and what isn't. And as I said in my previous comment,
| "_I_ don't consider".
|
| > You need to either attribute the source, or rewrite it
| in entirely your own words -- just like when writing a
| paper.
|
| Often times a three liner can not be changed in any way,
| and is the _only_ solution to a problem. In _some_ cases
| you may be able to change it only in terms of indentation
| and variable names (in others you can't even change
| that).
|
| But assuming you can do that, it makes no sense at all
| just changing indentation and variable names just for the
| sake of changing it.
|
| > Conforming to the license is also required; iirc, SO
| requires attribution under the CC-SA license.
|
| As I said I'm not talking about the legalities.
|
| https://stackoverflow.com/questions/55319570/how-can-i-raise...
|
| Are you going to attribute that every time you use
| Math.pow?
| teakettle42 wrote:
| > Well if it isn't a legal but an ethical concept, then
| that's just your opinion
|
| Plagiarism being unethical is just _my_ opinion?
|
| > Are you going to attribute that every time you use
| Math.pow?
|
| Does a simple 2-ary function call of a well-defined API
| qualify as "taking someone else's work or ideas and
| passing them off as one's own."?
|
| If not, then it's not plagiarism.
| redox99 wrote:
| > Plagiarism being unethical is just my opinion?
|
| What constitutes as plagiarism and what doesn't, outside
| of what the law says, yes.
|
| > Does a simple 2-ary function call of a well-defined API
| qualify as "taking someone else's work or ideas and
| passing them off as one's own."?
|
| So you agree that taking some code verbatim from SO is
| not plagiarism then?
|
| What about this, would copy pasting this verbatim be
| plagiarism?
|
| https://stackoverflow.com/a/959004
|
| And this?
|
| https://stackoverflow.com/a/45049763
| teakettle42 wrote:
| > What constitutes as plagiarism and what doesn't,
| outside of what the law says, yes.
|
| It's pretty clear what it is.
|
| The definition of plagiarism hasn't changed since you
| were in grade school and were taught not to copy
| sentences into your papers.
|
| If you still don't understand what plagiarism is now,
| yours is a willful ignorance that doesn't excuse
| unethical behavior.
|
| > What about this, would copy pasting this verbatim be
| plagiarism
|
| > https://stackoverflow.com/a/959004
|
| Yes, that'd be plagiarism. It's also bad code.
|
| You should use the example to understand the underlying
| problem, at which point you will be well-equipped to
| write your own one-liner.
|
| If you can't write it using your own understanding of the
| problem, you're not an adequate programmer and need to
| improve your skill-set ... which won't happen if you just
| keep plagiarizing code you don't understand.
| redox99 wrote:
| You're basically just repeating that your opinion is the
| right opinion.
|
| I don't agree that such example is plagiarism and I'm
| sure a lot of people also would disagree that that's
| plagiarism.
|
| > You should use the example to understand the underlying
| problem, at which point you will be well-equipped to
| write your own one-liner.
|
| > If you can't write it using your own understanding of
| the problem, you're not an adequate programmer and need
| to improve your skill-set ... which won't happen if you
| just keep plagiarizing code you don't understand.
|
| Who says you can't write it on your own, or you don't
| understand it? Stack overflow and tools such as copilot
| are often about saving time, not that you would be unable
| to figure it out by yourself.
|
| And besides that, the point of those examples is that a
| lot of people without searching for those stack overflow
| posts, would type that exact same code character by
| character.
| kaibee wrote:
| > The most ethical thing I can think of is some kinda 'Spotify-
| like' revenue sharing model, based on how often their code is
| used by others. Not that they'd ever implement that if they can
| get away with it!
|
| Based on my understanding of how NNs work, I'm not sure it's
| even possible to implement something like that.
| bborud wrote:
| My personal reasons for _not_ using copilot are a bit simpler. I
| believe the act of researching which solutions to use for a given
| problem is not so much about time, or the code you end up with,
| but about developing a better understanding of what you are
| doing. You may end up just cutting, pasting and modifying a piece
| of code you found, but hopefully, you were exposed to a few
| different ways to accomplish the same thing, and it made you
| aware of other choices that could have been made.
|
| You could think of the evolution of practical problem solving in
| software engineering like this:
|
| 1. I have to invent a solution (because nobody else in the world
| has a computer)
| 2. I have to know of a solution (education, word of mouth...)
| 3. I have to look up a solution in the books I have (commoditized
| knowledge)
| 4. I can look up solutions on the internet <-- (we are here)
| 5. The computer suggests something and I accept (some are here
| too)
|
| From 1 to 4 the amount of cleverness required to solve small
| problems drops a bit, but your productivity and exposure to
| knowledge probably goes up.
|
| I'm not quite sure what happens from 4 to 5. Personally I'm
| actually more interested in the context solutions are presented
| in than just the solution. In fact, I rarely copy and paste code
| from the Internet, but I often look at multiple
| suggestions/solutions and then borrow ideas or combine ideas from
| several sources.
| ok123456 wrote:
| It replaces a few google searches to look up how to do
| something with a new language or library. Keeping you in your
| editor and from having to context switch, and possibly
| distract/derail you, is worth it.
| kraftman wrote:
| I would be interested to know how many people are actually
| using copilot to generate entire chunks of code that they don't
| understand. For me it's just autocomplete on steroids, it's not
| answering any questions I don't know the answer to (other than
| syntax I've forgotten), it's just making the boilerplate faster
| to write so I can think about the actual problem I need to
| solve.
| tartoran wrote:
| Not using copilot but if I did I'd use it in the way you
| expressed as well, just for plumbing and tedious stuff.
| Yenrabbit wrote:
| At least the way I use it, it's not taking much away from my
| problem solving. It's just that instead of having to type
| `particlesGeometry.setAttribute('position', new
| THREE.BufferAttribute(positions, 3))` I just write `//Add as an
| attribute` and then hit TAB, since Copilot is smart enough to
| see that I've just prepared some geometry and populated an
| array of positions (both operations also sped up by not having
| to type the obvious bits). You're still having to think through
| the solutions (I'm not just typing '//make a cool particle
| sim') but no longer need to hit SO every few minutes for syntax
| examples when using a new library or something.
| ModernMech wrote:
| That sounds like a problem that could be better solved
| through language and library design rather than an AI that
| sucks up all the code in the world.
| williamcotton wrote:
| And yet after all of these decades, after countless
| advances in libraries and languages, I am still writing
| boilerplate in C, JS, Python, et al.
|
| I'm not sure that a language or library can ever understand
| the context of code without following an ML approach.
|
| Languages and libraries will always allow for more than the
| immediate task at hand. The innovation is that this tool
| understands which specific language or library features are
| probably going to be needed next!
| aembleton wrote:
| Frustratingly I've had it insert Java code into Kotlin.
| tartoran wrote:
| So you're just getting a helping hand doing code plumbing.
| That sounds good as long as you let the helper take the lead
| spupe wrote:
| I disagree. Copilot is selling content-aware code suggestions,
| which is a result of code that other people wrote in their
| platform, and which in no way affects the work of these people.
| boomer_joe wrote:
| We need a licence that forbids use in ML and the people willing
| to sue github for it ASAP.
| ilikehurdles wrote:
| But using it in a GitHub project would be akin to those
| Facebook comments that demand the company not monetize them.
| andrelaszlo wrote:
| There are a few reasons why this could be considered ethical.
| First, open-source code is typically free to use, so the company
| would not be taking advantage of anyone by using it to train
| their AI. Second, the company would be providing a service that
| people are willing to pay for, so they would be generating value
| for society. Third, the company would be transparent about what
| they are doing and would not be hiding anything from the public.
|
| ...the above was generated by GPT-3 (text-davinci-002). Prompt:
| Write an argument for why using open-source code to train an AI
| and then sell the code generating service (without open-sourcing
| it) is ethical.
|
| The main argument against this is that it takes away from the
| open-source community that contributed to the development of the
| code in the first place. By selling a code-generating service
| without open-sourcing it, the company is profiting from the work
| of others without contributing back. This is unfair and takes
| away from the overall open-source ecosystem.
|
| Added two characters to the prompt :P
| Havoc wrote:
| Yes, though in a way so does stackoverflow & friends. Large chunk
| of dev ecosystem is copy paste and I don't think this is
| inherently problematic. It is always a case of standing on the
| shoulders of giants.
|
| It's more of a licensing issue to me. As far as I can tell it was
| trained on a blend of licenses which to me makes it inherently non-
| compliant. At least some of it is going to be copyleft and find
| its way into closed source.
| nl wrote:
| This isn't how a language model works.
|
| It's SO frustrating that even on HN people still fall for this
| naive and incorrect analysis. Pasting bits I've said before on
| this topic:
|
| Language models do not work like this. They can copy content but
| usually that's for something like the GPL language text.
|
| Generally they work on a character by character basis predicting
| what is the most likely character to appear next.
|
| This very rarely results in copying text, and almost never rare
| text.
|
| Mechanically it has learnt both syntax of language and how
| concepts relate. So when it starts generating it makes sentences
| that are syntactically valid but also make sense in terms of
| concepts.
|
| That's really different to just combining bits of sentences, and
| it gives rise to abilities you wouldn't expect in something just
| cutting and pasting bits of sentences. For example, few shot
| learning is mostly driven by its conceptual understanding and
| can't be done by something with no way to relate concepts.
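|
| The "predict the most likely next character" idea can be
| sketched with a toy model. This is an assumption-laden
| illustration, not how Copilot is implemented: it uses simple
| bigram counts instead of a neural network, characters instead of
| tokens, and a made-up two-line corpus. The names train_bigram,
| sample_next and generate are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for each character, which characters follow it."""
    counts = defaultdict(Counter)
    for current, following in zip(text, text[1:]):
        counts[current][following] += 1
    return counts


def sample_next(counts, current, temperature, rng):
    """Sample the next character; lower temperature favours the likeliest one."""
    options = counts[current]
    chars = list(options)
    # Temperature-scaled weights: equivalent to count ** (1 / temperature).
    weights = [math.exp(math.log(options[c]) / temperature) for c in chars]
    return rng.choices(chars, weights=weights, k=1)[0]


def generate(counts, start, length, temperature=1.0, seed=0):
    """Generate text one character at a time from the bigram counts."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length):
        if not counts[out[-1]]:
            break  # no known continuation for this character
        out.append(sample_next(counts, out[-1], temperature, rng))
    return "".join(out)


# Made-up training text, for illustration only.
corpus = "for i in range(n): total += items[i]\nfor j in range(m): count += 1\n"
model = train_bigram(corpus)
print(generate(model, "f", 30, temperature=0.2))
```

| Each step only picks a next character from a probability
| distribution; whether the result ever matches the training text
| depends on the data and the sampling settings.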
| tyingq wrote:
| If this were true, then they would have trained it on all of
| MS's proprietary source code too.
| nl wrote:
| It is true.
|
| And that doesn't follow at all.
| tyingq wrote:
| There's enough examples of it regurgitating longish
| verbatim code out there, and not just comments or GPL
| license text.
|
| If they are comfortable training it on code that isn't
| licensed for unrestricted copy/paste, I don't personally
| understand why they can't train it on their own code that's
| also not licensed for that.
|
| Edit: They even added 'q rsqrt,' to their banned word list
| to squelch an example of long verbatim code passages.
|
| Basically, it's not that I don't understand your
| explanation. It's that it does emit long passages of
| unchanged code in practice, for whatever real-world reason.
| [deleted]
| skc wrote:
| I get the feeling this entire debate would have been non-existent
| had this been a Jetbrains product instead.
|
| The whole thing is just bizarre when the vast majority of
| developers constantly look at OSS code daily and lift
| ideas/patterns/snippets from there regularly without once looking
| at whatever license is attached.
| Luc wrote:
| > the vast majority of developers constantly look at OSS code
| daily and lift ideas/patterns/snippets from there regularly
|
| Perhaps in your circles, but that's certainly not something
| I've encountered over a 25 year career.
| skc wrote:
| So when you google a problem and it leads you to a code
| snippet that solves it that just happens to be OSS, you
| immediately scrub your brain and pretend you never saw it and
| instead come up with your own completely independent
| solution after the fact?
| avereveard wrote:
| Google usage is outright forbidden for work in institutions
| that care about intellectual property rights, so the brain
| scrub issue is just arguing at the wrong level.
|
| If you're googling solutions around you're already not
| taking intellectual property seriously enough to care about
| what happens after you lift ideas around.
| anonymoushn wrote:
| Can you name these institutions? I am surprised to hear
| that some institutions would prevent devs from viewing
| e.g. documentation of the APIs they are using or academic
| papers about algorithms for computing the multiplicative
| inverses of 64-bit integers, if they accessed those
| things via google
| avereveard wrote:
| IBM, and another that I'm currently under NDA with.
|
| I think their also being patent farms plays a role in it.
|
| Approved dependencies had api doc linked so no need to
| Google these.
| bloat wrote:
| This is interesting. Is the internet completely cut off?
| Do they have internal libraries of documentation for
| third party stuff they are using (paper? digital?) Do you
| have any example institutions, or what domain they are
| working in? Thanks.
| swader999 wrote:
| I think it would be for super secure military coding. But
| business domains? Hardly ever.
| avereveard wrote:
| The issue doesn't solely rest in copyright
|
| A concern, which I think is legit, is that it is quite
| easy for someone with a strong presence in search, web
| advertising, analytics and mobile to puzzle together what
| a company is investing in based on the aggregated
| research and web access from known locations
| skc wrote:
| Very surprised to hear about this actually.
|
| Maybe I live in a bubble, but the likes of
| Google/StackOverflow have been part and parcel of a
| developer's toolbox for many years now.
|
| And in any case I wonder how that is enforced. E.g.,
| someone goes home in the evening and visits github,
| learns a new trick and comes into the office the next day
| and implements it.
| teakettle42 wrote:
| > ... and instead come up with your own completely
| independent solution after the fact?
|
| Yes, I'm not a plagiarist.
|
| If you're literally copying and pasting code snippets
| without attribution, you're plagiarizing.
|
| You're also probably violating the OSS project's license.
|
| It's no different than copying and pasting someone else's
| sentence or paragraph into a written paper.
| foxhill wrote:
| > I get the feeling this entire debate would have been non-
| existent had this been a Jetbrains product instead.
|
| why so?
|
| > The whole thing is just bizarre when the vast majority of
| developers constantly look at OSS code daily and lift
| ideas/patterns/snippets from there regularly without once
| looking at whatever license is attached.
|
| well, yes, copying an idea or pattern is generally.. accepted,
| to be kosher. copy-pasting too, in small amounts (a function, a
| type). that said, i would (and have) attribute even a notional
| similarity when writing something open source.
|
| i don't think co-pilot even allows the user to find where the
| code came from.
| goerz wrote:
| I am not a lawyer, but my legal intuition / common sense says
| that "code snippets" are not copyrightable. There's some
| sliding scale on when a code snippet would become so non-
| trivial that a reasonable (!) judge would consider it
| copyrightable, but nothing Copilot does is anywhere close to
| that limit, IMO.
| shakna wrote:
| One of the main claims in Google LLC v. Oracle America [0],
| was based around a 9-line rangeCheck function. Whilst some
| code can be too simple and small to copyright, programmers
| and lawyers are probably not going to view snippets the same
| way. Copilot creates risk.
|
| [0] https://en.wikipedia.org/wiki/Google_LLC_v._Oracle_America,_...
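|
| For a sense of scale, a range check of that size looks roughly
| like the following. This is a Python sketch of what such a
| function does, not the Java code at issue in the case; the name
| range_check and its parameters are illustrative.

```python
def range_check(length, from_index, to_index):
    """Raise if [from_index, to_index) is not a valid range for `length` items."""
    if from_index > to_index:
        raise ValueError(f"from_index({from_index}) > to_index({to_index})")
    if from_index < 0:
        raise IndexError(from_index)
    if to_index > length:
        raise IndexError(to_index)


range_check(10, 2, 5)  # a valid range passes silently
```

| Many programmers would write something close to this unprompted,
| which is why the copyright status of short snippets is contested.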
| marstall wrote:
| most of the code I write is glue sticking together 8 proprietary
| systems nobody's ever heard of. how is copilot gonna help me with
| that?
| sytelus wrote:
| Google just sells content other people wrote.
| pen2l wrote:
| Bit of a stretch to fashion AI-derived/AI-coauthored works as
| other people's work. Are DALL-E portraits done Picasso-style
| unrightfully selling Picasso's works? Is an individual selling
| portraits done Picasso-style unrightfully selling Picasso's
| works?
|
| No, of course not. Joyce's literature was influenced by Ibsen,
| Mozart looked up to Haydn, Newton was humble enough that he
| openly professed he stood on the shoulders of his predecessors,
| Perelman refused the Millennium prize because it wasn't also
| offered to his colleague Hamilton.
|
| All human innovation is iterative, and derivative.
| https://www.youtube.com/watch?v=jcvd5JZkUXY
|
| Our skill doesn't grow in vacuums, without outside mentorship and
| guidance. There are areas where I am upset about the application
| of AI, but this is not one of them. Consider copilot a gentle
| guiding hand for those without access to a second pair of eyes
| nearby to give you reminders on what you may otherwise have on
| the tip of your tongue.
|
| But in the same way that Led Zeppelin's refusal to recognize how
| _heavily_ their music was influenced by delta blues artists was
| unbecoming, I can accept the argument that it is perhaps douchey
| of Github to sit on Copilot as squarely their creation.
| janandonly wrote:
| Isn't every programmer in history (except the gal who invents
| her own language and writes all her own code) simply an
| archeologist for other people's work?
|
| We all Duck/Google for code anyway. Why not admit and make it
| easier?
| pacifika wrote:
| Copilot is doing this on an industrial scale. It's the
| difference between copying sample code and outsourcing your
| work to a third party collectively.
| eline43 wrote:
| You don't understand the difference between many open source
| licenses or the concept of crediting open source code
| authors... it does not mean that the code is free for everyone
| to just use as they please...
|
| https://www.gnu.org/licenses/license-list.en.html for a quick
| intro
|
| Also, are you okay with other people selling *your* work and
| *you* getting nothing out of it? Many people are not.
| sirsinsalot wrote:
| Jaron Lanier's book "Who Owns the Future?" is all about AI and
| compensating those that input in training these very valuable
| models.
|
| I highly recommend everyone read it.
| Separo wrote:
| GitHub provides the repo hosting and tools for free on public
| projects. I'm happy with this deal.
| jalfresi wrote:
| This does raise a point - do we now have to assume that all
| those services that provide free hosting/access/service to open
| source projects will be strip-mining the work of the open
| source community to sell them back to us all? I almost feel
| stupid believing it was an altruistic move to contribute back
| to the shoulders of giants they were already standing on...
| eloisius wrote:
| I feel scammed too. At this point it should be obvious, but
| I'm finally savvy to the fact that every tech company that
| offers anything free, and you use it to create "your"
| content, is not your friend and you don't even own the works
| you host with them. I feel scammed that GitHub was cool about
| 10 years ago. It was like the professional/cultural center of
| gravity in my career. GitHubbers were cool people. Everyone
| cool hosted their site on GitHub Pages. I didn't want to see
| a resume; what's your GitHub? Now I feel stupid for having
| contributed whatever tiny bit of brains I did to this AI by
| thinking that I was using the cool, developer-first code
| website.
| tiku wrote:
| I'm using it for a day now and I'm really impressed. It is so
| aware of stuff in old code, that it is scary. I'm working in an
| old application with Zend Framework.
| shahar2k wrote:
| and Dalle2 sells art other people created
|
| (I'm actually not being sarcastic, I think there needs to be some
| sort of pipeline for compensating the artists who are used to
| train these models)
| tiborsaas wrote:
| MrDoob has an excellent point about this:
|
| https://twitter.com/mrdoob/status/1539740854956412929
| spupe wrote:
| If you assigned a task to a junior dev, and he/she used some code
| from open source projects and Stack Overflow to develop a custom
| program for the task, would you say that this person is selling
| you other people's code? Is it common or expected for this type
| of use to be acknowledged?
| genezeta wrote:
| About 10 years ago or so, I was working at a certain place.
| They put me into a small team apparently focused on some R+D
| project under the direction of an "architect".
|
| Basically, the project was to package Cordova + Backbone +
| Marionette, plus a couple of tools, under their own commercial
| name. Then they'd go around potential clients presenting it as
| the perfect solution to build hybrid applications for
| web/mobile/smartTV/whatever.
|
| A certain Monday, the "architect" arrived boasting. He did that
| often, but this time he was more boastful. He explained that he
| had spent the whole weekend coding. He had written an
| incredible tool that would create a skeleton for a project from
| zero. You would type something like `tool create` and it would
| create the whole project with all the scripts and some example
| views and whatnot.
|
| It was Yeoman's yo CLI tool, of course. He had just changed the
| copyright in the comments, removed most of the comments, he had
| deleted any mention to yeoman or the original creators, changed
| the name of the executable script and that's it.
|
| The whole thing was OS code picked up from various repos and
| packaged as their own. The company used it to sell development
| projects. The so-called-architect used it to sell himself
| inside the company and then jump away into a startup as CTO.
|
| Is this _common_ or is it just anecdata? I don't know. It's
| clearly not the only time I've seen something like this and I
| do know that in certain companies around here it isn't exactly
| uncommon. But I can't say how common or uncommon it is.
|
| Would I call this "selling other people's code"? Yes, I would.
| spupe wrote:
| This is clear-cut fraud, but it is also not even close to
| what Copilot or most junior devs are doing.
| XCabbage wrote:
| People I've worked with have different philosophies on this,
| but personally, if you check in code that is distinctive enough
| that I can identify the source you copied and pasted it from,
| and you provided no indication (whether in a comment or a PR
| description) that you copied it, I will really get quite grumpy
| at you about it.
|
| Way too often I burn half an hour needlessly during review in
| one of two ways:
|
| * trying to figure out how the heck someone figured out some
| "magic" code that achieves something by invoking a bunch of
| poorly documented library or framework internals, and trying to
| reverse engineer WTF all the magic does by diving into the
| framework's source... only to eventually think to google the
| whole snippet rather than each individual method call, and
| discover it's copied from a Stack Overflow answer
|
| * trying to figure out why something was written in an
| unidiomatic or overcomplicated way rather than a more obvious
| approach, and commenting at length on how I'd simplify it...
| only to eventually realise it was copied from a Stack Overflow
| answer
|
| Attribution isn't just about making sure the right person gets
| credit, or about license compliance; reviewers and maintainers
| frequently need to be able to see where stuff was copied and
| pasted from in order to do their jobs effectively, even for
| snippets of just a few lines.
| spupe wrote:
| I understand where you are coming from. However, I think you
| are making the assumption that this person simply copy/pasted
| some code with no understanding of it, or that this code is
| then very different from your codebase and needs to be
| refactored. If using Stack Overflow did not add to your
| overall development time but subtracted from it, because it
| was used as an appropriate piece of a much bigger puzzle - a
| far more realistic scenario for both Copilot and our general
| use of SO -, then I see no issue with it whatsoever.
| Certainly no moral or copyright issues as this person on
| Twitter implies.
| thfuran wrote:
| No copyright issues in the sense that no entity is likely
| to ever pursue the matter, sure. But copying and
| commercially using someone else's nontrivial bit of code
| that doesn't have a license that says you can is quite
| blatantly a copyright violation.
| ben-schaaf wrote:
| If I found out a junior dev had been copying copy-left or
| proprietary code then I'd have to rip out that code, have a
| chat with them and figure out what to do from there. Even if
| the code isn't copy-left it's still someone else's code,
| sometimes that's ok but sometimes it's definitely not.
| whatatita wrote:
| If the solution was made up of ideas from OSS and snippets from
| Stack Overflow? No; that's fine.
|
| If the solution was copied from an OSS project without proper
| attribution? Yes. Absolutely. And they'd have words with a
| senior dev and maybe even legal if the code they copied made
| its way into production without attribution.
|
| Many copyleft OSS licenses require attribution and distribution
| of derivative works that we wouldn't allow.
| mbreese wrote:
| It depends on the source of that code and the expected license
| of the code you paid them for. If everything is MIT/BSD (and
| attributed), no problem. If the code was GPL and I'm making a
| commercial product, we have an issue.
|
| I'd also expect for any stack overflow code to include a
| comment with a link to the stack overflow page.
|
| I think one of the key points is to make sure any code taken
| from another source is cited appropriately. If it isn't, or the
| junior dev is passing it off as their own work, then we have
| problems.
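|
| A source comment of the kind described above might look like the
| sketch below. The link and author are placeholders, not a real
| answer, and the chunked helper is a hypothetical stand-in for
| whatever was actually borrowed.

```python
# Adapted from a Stack Overflow answer (placeholder link, not a real answer):
#   https://stackoverflow.com/a/<answer-id>
# Author: <username>; Stack Overflow user content is licensed CC BY-SA.
def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


print(chunked([1, 2, 3, 4, 5], 2))
```

| The comment costs two lines and gives reviewers the origin and
| the license in one place.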
| thelastbender12 wrote:
| This is a good thought exercise. I wouldn't call it stealing,
| though I am not sure how legal liability is assessed, say if
| they picked up GPL code unknown to the company, and the company
| is later sued over it.
|
| This isn't derived from principled reasoning, but I think of it
| as similar to community norms. Not the best example, but you
| wouldn't mind someone subletting their homes to Airbnb, but if
| all of your apartment complex does it, it invites regulation. A
| product like copilot enables copying code (even if inspired,
| and not verbatim) at a scale that individual developers can't.
| So respecting software licenses needs to be codified (legally?)
| while previously it was left unmonitored.
| trention wrote:
| It's absolutely fine to allow humans to do that while
| prohibiting (commercialized) AI to do the same thing.
| spupe wrote:
| I don't see why that should be the case in this particular
| scenario, or what benefit is gained from that. Could you
| elaborate?
| jhugo wrote:
| Could you elaborate on why you think a computer program and
| a person should be treated the same way in this respect?
|
| We can take as self-evident that a human is capable of
| reading about something, conceptualising it, and then
| writing something completely new with the knowledge they
| have gained.
|
| I think it's also pretty uncontroversial that the primitive
| "AI" we currently have is nowhere near the level of even an
| average human at these things, and thus we can't just
| blindly assume it is conceptualising rather than copying.
| Copilot regularly produces verbatim copies of existing code
| when working on non-trivial things.
|
| Forget about the "AI" label: Copilot is just a complex
| computer program, that takes code from other people and
| inserts various permutations of it into your editor, whilst
| ignoring the license of that code.
| nl wrote:
| Copilot understands concepts as well as many humans. You
| can see primitive versions of this in the old Word2Vec
| demos showing how those models understand how
| London:England ~= Paris:France
|
| Copilot is much more sophisticated than that, and it no
| more copies code than a human does. It generates on a
| character by character basis given the contextual
| probability of the next character conditioned on the
| previous set of tokens with the "heat" being a factor in
| how randomly it will choose characters.
|
| This is much more similar to how a human writes than
| "copying".
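| The sampling step described above can be sketched roughly as
| follows. This is a toy illustration, not Copilot's actual
| implementation: the function name, the score dictionary and
| the "temperature" parameter (the "heat" mentioned above) are
| all assumptions made for the example.

```python
import math
import random


def sample_next_token(scores, temperature=1.0, rng=random):
    """Pick one token from a {token: score} mapping (hypothetical model output).

    A low temperature makes the highest-scoring token dominate;
    a high temperature flattens the distribution so that rarer
    tokens are chosen more often.
    """
    if temperature <= 0:
        # Treat zero temperature as greedy decoding: always take the best token.
        return max(scores, key=scores.get)

    # Scale the scores, then apply a numerically stable softmax.
    scaled = {tok: s / temperature for tok, s in scores.items()}
    top = max(scaled.values())
    weights = {tok: math.exp(s - top) for tok, s in scaled.items()}
    total = sum(weights.values())

    # Draw a token with probability proportional to its weight.
    threshold = rng.random() * total
    cumulative = 0.0
    for tok, weight in weights.items():
        cumulative += weight
        if threshold < cumulative:
            return tok
    return tok  # guard against floating-point rounding at the upper edge
```

| With temperature=0 the sketch always returns the top-scoring
| token, which is the regime where a model is most likely to
| reproduce a frequently seen continuation unchanged.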
| jhugo wrote:
| "it no more copies code than a human does" < that's a
| very big call right there, considering how much verbatim
| copying has already been documented in Copilot. The
| primitive understanding Copilot has of what it is
| generating doesn't even approach that of the most average
| programmers. It's classic AI: impressive on the surface.
| nl wrote:
| This isn't true.
|
| All the "copied code" I've seen is where the person
| prompts it with a large amount of very unique preamble
| and then it fills in the exact example they are quoting
| from.
|
| Try it without doing that.
|
| And it's weird people think it can't understand
| conceptual relationships. Word2Vec demonstrated that
| nearly 10 years ago and that's a much weaker model in
| terms of both size and techniques than this is.
| jhugo wrote:
| > And it's weird people think it can't understand
| conceptual relationships. Word2Vec demonstrated that
| nearly 10 years ago and that's a much weaker model in
| terms of both size and techniques than this is.
|
| Saying that Word2Vec or Copilot have "understanding" of
| their input requires a redefinition of the word
| "understanding".
| nl wrote:
| What's your definition?
| spupe wrote:
| I think it's best if we sidestep these big conceptual
| questions about what cognition or creativity really are.
| It's hard to find agreement, and perhaps it is not
| necessary to do so.
|
| My position is that if a person hired in a company can
| currently use Google, Stack Overflow and GitHub to help
| develop their custom scripts, and no moral or copyright
| issues are infringed (ie, you don't try to say you came
| up with it on your own, and you use only enough that it
| is clearly fair use), then I think an AI should be able
| to assist in that task. There is no need to complicate
| things by legislating what the AI is doing and what
| Google is doing, as they are very similar things and in
| fact even use similar methods.
| jhugo wrote:
| I would agree with you if the AI was genuinely assisting
| with that task, but it isn't.
|
| It's taking inputs, ignoring their licenses, permuting
| them in ways that are not understandable to the user, and
| then outputting them.
|
| That's an entirely different task than the user reading
| SO or using Google and then writing their own code,
| because the "AI" _is not capable_ of writing its own code
| at that level.
|
| Relying on this tool means ignoring the license of code
| that you're copying, without even knowing that you're
| doing it.
| spupe wrote:
| > That's an entirely different task than the user reading
| SO or using Google and then writing their own code,
| because the "AI" is not capable of writing its own code
| at that level.
|
| I would say it's a very similar task. If I need to
| remember how to use a certain function, I can Google for
| documentation and examples, or I can tell Copilot what I
| want to do. The fact that the solution was presented by
| Copilot or a SO thread is, in my view, irrelevant. And to
| compound on that, I doubt anyone checking SO truly knows
| where that answer came from. The person could simply be
| reproducing a snippet from somebody else, you have no way
| of knowing if it was licensed.
|
| I don't think this is bad either. Even our current shitty
| copyright laws protect that kind of use. I shouldn't have
| to worry whether my little prime number generator uses an
| algorithm first created by John Carmack or Microsoft.
| Programming has evolved rapidly in great part because we
| can all use other people's work and use it to improve
| ours. Of course you shouldn't just copy and paste
| everything and call it a day, but that's hardly what
| Copilot enables anyway.
| jhugo wrote:
| You really seem to be ignoring the core issue by focusing
| on SO though. Everything on SO is fair game, but code on
| GitHub is under a variety of licenses, and when Copilot
| regurgitates it, no matter how complex and inscrutable
| the process is that leads it to do so, it may be causing
| the user of Copilot to misuse that code because it
| doesn't even give them the _opportunity_ to know where it
| came from or what license it was released to the public
| under.
| spupe wrote:
| Again, how does that differ from Stack Overflow? Do you
| go and check whether a given reply belongs to a licensed
| project?
|
| Also, please consider that there is a toggle that allows
| you to block Copilot from using public code.
| jhugo wrote:
| > Do you go and check whether a given reply belongs to a
| licensed project?
|
| All SO questions, answers and comments are CC BY-SA. The
| terms of the site say that anyone submitting this content
| agrees that it's licensed that way, and when you visit
| the site you agree that you are provided with the content
| under that license. It's not necessary for you to check
| whether the submitter had the right to offer it under
| that license; that's their problem. The same goes for any
| content offered to you under a given license on any
| platform. I don't understand what your question has to do
| with the conversation.
|
| The problem with Copilot, and I really can't believe this
| has to be restated over and over again, is that it takes
| code from projects with various licenses, and outputs it
| in your editor in various transformed-or-not-transformed
| ways (the fact that the transformation is extremely
| complex doesn't change anything), and gives you no way to
| know where the code came from, how it was licensed or how
| it has been transformed. So, despite the fact that if you
| use it enough you are virtually guaranteed to use code in
| contravention of its license, you cannot even know which
| projects you have stolen code from or which licenses'
| terms you are breaking.
|
| > Also, please consider that there is a toggle that
| allows you to block Copilot from using public code.
|
| Great. I'm sure its utility doesn't go down at all if you
| turn that toggle off...
| spupe wrote:
| > All SO questions, answers and comments are CC BY-SA.
| The terms of the site say that anyone submitting this
| content agrees that it's licensed that way, and when you
| visit the site you agree that you are provided with the
| content under that license.
|
| Have you ever read GitHub's conditions to know whether
| they also have the right to use your code that way, no
| matter how you decide to license it? I feel that you are
| overly focused on the legal part here, which I'm sure was
| handled by Microsoft's lawyers. I'm more interested in
| the question of principle.
|
| No matter what the terms of use at SO say, anyone can
| give you an answer that is a copy of some code they don't
| own. You may consider that immoral, but I don't, not at
| the scope SO is used for. In addition, the vast majority
| of cases at SO and Copilot are not about complex
| functions being stolen, it's about some dumb code you
| would have found in 2 minutes of googling. What I'm
| trying to argue here is that if we are all cool with SO
| and think it's useful, there is no fundamental difference
| here. We never cared too much about licenses for
| boilerplate code, and I think we shouldn't start now.
| jhugo wrote:
| > Have you ever read GitHub's conditions to know whether
| they also have the right to use your code that way, no
| matter how you decide to license it? I feel that you are
| overly focused on the legal part here, which I'm sure was
| handled by Microsoft's lawyers. I'm more interested in
| the question of principle.
|
| I have, and there is not. Neither could there be -- in
| many cases the person uploading code to GitHub is not the
| copyright holder -- they are just doing something
| permitted under the license -- and for a large open
| source project there could be thousands of copyright
| holders. A random person mirroring some source code to
| GitHub is in no position to negotiate different license
| terms on behalf of the copyright holder(s).
|
| > No matter what the terms of use at SO say, anyone can
| give you an answer that is a copy of some code they don't
| own. You may consider that immoral, but I don't, not at
| the scope SO is used for. In addition, the vast majority
| of cases at SO and Copilot are not about complex
| functions being stolen, it's about some dumb code you
| would have found in 2 minutes of googling. What I'm
| trying to argue here is that if we are all cool with SO
| and think it's useful, there is no fundamental difference
| here. We never cared too much about licenses for
| boilerplate code, and I think we shouldn't start now.
|
| I don't understand why you think a person writing an
| answer on SO and a computer program outputting some
| permutation of its inputs into your editor are the same
| thing. The person writing an SO answer is intelligent and
| capable of conceptual understanding, the computer
| regurgitating code without regard to its license is not.
| spupe wrote:
| >> Have you ever read GitHub's conditions to know whether
| they also have the right to use your code that way, no
| matter how you decide to license it?
|
| > I have, and there is not.
|
| At least one IP lawyer strongly disagrees, suggesting
| anything you host on GitHub is fair game [1].
|
| [1] https://fossa.com/blog/analyzing-legal-implications-
| github-c...
|
| > The person writing an SO answer is intelligent and
| capable of conceptual understanding, the computer
| regurgitating code without regard to its license is not.
|
| From a copyright perspective, that is irrelevant. In fact
| I would think Copilot has more incentives to not infringe
| than a random SO user, who is very unlikely to be sued. I
| already argued in another post that in my view, from any
| perspective, it is also irrelevant whether it's a person
| or AI doing the same work Copilot does.
| jhugo wrote:
| > At least one IP lawyer strongly disagrees, suggesting
| anything you host on GitHub is fair game [1].
|
| The question is whether Copilot's _users_ can use the
| regurgitated code without following the license terms,
| not whether Copilot was allowed to train their model on
| it. I agree it's likely fine for them to train the
| model, but the _use_ of Copilot would seem to be a legal
| minefield.
|
| A little thought makes it clear that an affirmative
| answer would be absurd. This would mean that using a
| simple tool (let's say `cat`) to make a copy of some code
| and subsequently ignoring its license terms is
| infringement, but if the software used to make the copy
| is more complex (or perhaps if it has the "AI" label
| stuck to it!) the same actions are not infringement.
| simion314 wrote:
| If I make a script and train it on Windows source code do
| you think MS will like it if I use that script on Wine?
| I am sure MS will say the license did not allow it and
| your script transformations are not original, so GPL or
| similar license should be respected by Microsoft too.
|
| >My position is that if a person hired in a company can
| currently use Google, Stack Overflow and GitHub to help
| develop their custom scripts, and no moral or copyright
| issues are infringed (ie, you don't try to say you came
| up with it on your own, and you use only enough that it
| is clearly fair use),
|
| Only a judge will determine if it is actually fair use;
| if you by chance copied some super clever and unique code
| into your code base then I am sure a judge will not say
| it is fair use. Copilot was shown to do this (though
| MS said they put some IF-ELSE checks in the AI to prevent
| the plagiarism from being detected by removing obvious results
| and maybe obfuscating stuff more).
|
| Maybe Stack Overflow license allows you to copy paste the
| answers in your code, but GitHub code has repo specific
| license that you need to respect.
|
| If MS trained the model on all their private repos too
| and made the model free software then many would not have
| these issues. Or keep the model proprietary and train it
| only on the MS repos, BSD and similar licensed repos.
| trention wrote:
| You are saying that the AI should be treated the same way
| as a person would regarding its 'output'. I disagree.
| This is a conceptual disagreement and you cannot just
| sweep under the rug "what cognition or creativity really
| are".
|
| At the end, when in several (2-5) years we start seeing
| structural unemployment emerging because of AI
| deployments, this will be resolved by the legal system,
| most likely by some sort of partial prohibition of
| training/monetizing such systems.
| spupe wrote:
| I think I still have not understood your argument. Are
| you saying that you are afraid that AIs will become too
| powerful and cause unemployment, and therefore we should
| regulate them now before they do so?
|
| Many people are worried about this, which is why there is
| a lot of debate about minimum income programs. However,
| at present, what Copilot is doing is similar to what
| Google does, and it is certainly not going to replace
| devs any time soon. Personally, I think we should exploit
| technology to its fullest, and the only reason we can
| have this conversation is because in the past, we haven't
| given too much consideration about the mailmen,
| secretaries, delivery workers and everyone else who got
| displaced by our use of the internet and similar
| technologies. We merely adapted to better exploit them.
| trention wrote:
| I am not saying (in that last comment) what should
| happen, I am saying what will happen. Past automation in
| terms of impact is nothing compared to what's coming and
| people and lawmakers will react accordingly - not in
| favor of the automators.
| jhugo wrote:
| No matter how complex a program is, and no matter whether it
| uses techniques sometimes described as "AI" in its
| implementation, it's not a person. Copilot is just a very
| complex pipeline from other people's code to your editor, which
| ignores the license of those other people's code.
| whywhywhywhy wrote:
| Same deal for Dall-e if they ever productize it.
| lysecret wrote:
| Don't we all.
| vbezhenar wrote:
| I somewhat agree with that. Yesterday I edited some exotic
| configuration (Kubernetes CSI driver for Cinder) and Copilot
| suggested a config which looked like someone's config. There
| were no values, so it was good at filtering them out, but it
| definitely looked like cleaned part of code which resides in some
| project.
|
| I don't think that's bad though. Code sharing is good for overall
| productivity.
| c01n wrote:
| MS and Github are thieves, all their code is closed source, yet
| they sell copyrighted code they don't own. If they told us years
| ago that our code will be automatically stolen by an "AI", most
| coders would not have created an account. The innovation here is
| that they have access to most of the world's open source code and
| automated the stealing.
| blitz_skull wrote:
| Man, people really do be angry that the public code they put on a
| public platform is being used publicly.
|
| Wild.
| aetherspawn wrote:
| Copilot is a fancy pattern bot.
|
| Humans make original patterns, but since Copilot cannot think,
| then Copilot does not. It squashes together a bunch of small
| individual patterns, each under their own license, but at no
| stage does it do anything more than pick a line from here, and a
| line from there.
|
| It doesn't think, and it doesn't create new IP.
|
| It is like making a picture out of small snippets of a thousand
| other pictures, and then selling it.. clearly not OK. You still
| ripped off the original artists.
|
| Or like plagiarising 100 of your class mates' assignments. Are
| you less guilty because you went to the effort to steal just a
| few sentences from each?
|
| A criminal who steals a cent from every account at the bank is a
| more sophisticated thief than someone who holds up a petrol
| servo.
|
| If Copilot doesn't create new IP (it doesn't; we established
| this), then it uses existing IP. And in that case it is no
| different to any of the three analogies above.
| honkler wrote:
| license issues will save many thousand jobs.
| nathias wrote:
| Copilot is a new way for corporations to break copyright while
| enforcing it for everyone else, this will be the first big use
| for AI when other corpos follow.
| 0x_rs wrote:
| I'm not a lawyer, nor very well versed in the vast world of
| licenses and their definitions in court contexts, but I've been
| wondering about something with the growing appeal ML-generated
| content has for the average person (and the "high" barrier for
| entry in the market) -- are licenses in some form or another
| going to adapt to this phenomenon? From a brief search, I have
| not found any new license with a no-dataset-usage clause
| (assuming fair use does not apply, that's another big question).
| What are the chances anything of the sort will become an option
| for any "creative" work that's usually shared freely (such as
| artwork, code, et cetera) even despite copyright? What about the
| ownership of the dataset? It seemed to be questionable years ago
| already that possibly IP-protected content goes through the black
| box and resembling material gets on the other side, whose
| ownership is it really? I'm guessing some notable court cases in
| the future could define this in the following years if the
| popularity continues growing.
| abdulhaq wrote:
| That's like saying a plumber just sells parts that other people
| made
| WesolyKubeczek wrote:
| Except that a plumber buys them first. For money.
| gtf21 wrote:
| Which the plumber has bought and paid for and then installs for
| you, which makes this pretty fundamentally different.
| borishn wrote:
| Copilot is fair use, get over it!
|
| Copilot is not writing your code any more than Google search is
| writing your code. You are writing your code, and Copilot is just
| making suggestions.
|
| US constitution secures limited copyright "To promote the
| progress of science and useful arts". Copilot is just that, get
| over it!
| jazzyjackson wrote:
| Personally I think I'll just claim all the code I write with
| co-pilot is a parody.
| nescioquid wrote:
| Not an expert, but fair use generally covers education,
| criticism, parody, and satire. There is a test for meeting fair
| use and it includes things like amount copied and commercial or
| non-profit interest.
|
| The amount copied from any particular source might be small,
| but an aggregate strip-mining of many copyrighted sources is an
| interesting twist. Another might be, as you suggest, it might
| be a machine that itself does not violate copyright, but has
| the effect of causing users (who accept the suggestions) to
| violate copyright.
| collegeburner wrote:
| Google does the same thing taking snippets out of pages or
| even completely caching them so you can see the entire page
| from their servers.
| brianmcc wrote:
| Wait till it suggests something Disney can argue they own
| rights to...
| nojs wrote:
| You mean like DALL-E? This debate is going to get interesting
| when "in the style of" illustrations and videos go
| mainstream.
| acuozzo wrote:
| LucasFilm - Pixar - Disney. I wonder if the mouse owns Duff's
| Device...
| Buttons840 wrote:
| A good and well argued opinion made hostile by saying "get over
| it" twice! Saying "get over it" discourages further discussion.
| Your comment would be better without it.
| cududa wrote:
| Get over it.
| borishn wrote:
| You are right, but it is so frustrating how people whine
| about this.
| humanwhosits wrote:
| Citation needed for copilot being fair-use
| zerocrates wrote:
| Yes, the copyright clause gives as its purpose "the progress of
| Science," but that doesn't mean that anything which claims to
| be "progress" gets a free pass.
| ajb wrote:
| Indeed, the US supreme court pointedly refused to accept that
| the purpose clause limits the power of copyright in "Eldred
| v. Ashcroft" (at least, that is my understanding as a non lawyer)
| bmacho wrote:
| On a side note, I do believe that short programs or functions
| should be copyright free by law.
|
| Or we as a community need to create a better bsd, a cc0 for
| everything.
|
| Almost everything is nontrivial, and almost everything is
| copyrighted, at least with the pressure to name the original
| author (BSD, GPL, other major permissive licenses).
|
| Say you want to use a library, then you check for examples in the
| documentation, now you have to denote somewhere that the example
| is from the documentation (best if you put it in the source code,
| so you don't lure other people to copy what you copied and refer
| you as the author).
|
| It is a major PITA at least for me.
| stagas wrote:
| What about a law that makes all code available but then
| requires you to use a portion of your earnings to compensate
| the people whose dependencies you used?
| dgb23 wrote:
| Reading many of the comments here I feel like one important thing
| is being left out that is not related to legal, but to social
| issues:
|
| Who is on the side of open source? Where are the big, powerful
| institutions and companies that deeply care about authors and
| communities providing free software that so many of us rely on?
| olalonde wrote:
| I'm going to make a bold prediction: no one will ever lose a
| copyright lawsuit due to usage of Github Copilot generated code.
| The code snippets it produces are too small or trivial to qualify
| for copyright infringement.
| ModernMech wrote:
| CoPilot is a new technology, and smallish snippets of code are
| all it is capable of at this point. Microsoft will surely work
| to expand its capabilities to produce larger and more complex
| programs, don't you think?
| janosdebugs wrote:
| It'd be nice to see some proof here. Copyright is not absolute
| and does not extend, for example, to things that have no
| creativity in them. There are only so many ways to write a for
| loop or an if condition. Training an ML model from a large body
| of code IMHO violates copyright no more than any of us reading
| code and learning from it, as long as GH Copilot doesn't spit out
| code that's exactly the same as something already existing.
| madrox wrote:
| I don't think any professional community is aligned on how to
| think about ML-generated content yet. We don't know how to
| apportion rights between the data owner, the model owner, and the
| end user, and I don't think existing copyright law is ready for
| it. At least for software, I think the way forward is for the
| next generation of software licenses to explicitly state whether
| the code can be used to train ML models and what those models can
| be used for. Without explicit language, we'll be squabbling over
| interpretations of fair use.
|
| There's going to be some big cases here. It's going to end up in
| the Supreme Court sooner or later, and if it were to go there
| today I think I know what they'd say.
| [deleted]
| LeonTheremin wrote:
| And social media sells ideas other people thought.
|
| Copilot is limited to public code now, but it may easily be
| trained on non-public code - albeit this probably won't be for
| sale to the public.
| HeavyStorm wrote:
| williamcotton wrote:
| Should the snippets that Copilot is regurgitating be considered
| for copyright in the first place?
|
| It seems akin to trying to copyright a certain drum pattern or
| chord progression.
|
| Also, the history of the GPL, MIT, commercializing lisp machines,
| Symbolics, infighting, etc... seems a very different context than
| Copilot so I am having difficulty seeing the systemic problems
| that tools like this encourage.
|
| There is of course a surface level similarity in that a
| corporation is profiting from IP in the public domain but the
| devil is in the details.
| Proven wrote:
| SMAAART wrote:
| Once again Innovation challenges IP.
| tsujp wrote:
| Copilot produces verbatim GPL'd code. It's also a closed box.
|
| Source: https://twitter.com/mitsuhiko/status/1410886329924194309
| JacobiX wrote:
| It's the same problem with those ML models, the other day someone
| generated a children's book using GPT3, turned out that there is
| a real children's book with the same name and a very similar
| content: The Very Lonely Firefly by Eric Carle.
| bartq wrote:
| Other thing I'm worried about: how to retract facts from ML
| model? I guess it's impossible, you need to retrain from
| scratch with part X removed from training set. Or... people
| could invent layered ML models similar to docker - each layer
| would be marked what data it was trained with. Then at least
| you'd have some cache of trained model to re-use in next
| training session. Nasty stuff.
| alpaca128 wrote:
| Or instead of inventing complicated layered ML models Github
| could just use each repo's license information to decide
| what's okay to use. Detecting licenses is already a feature
| on that site.
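| A minimal sketch of that kind of license-based filter,
| assuming each repo record carries an SPDX license identifier.
| The record layout, the function name and the choice of
| allowlist are hypothetical; which licenses are actually safe
| to train on is a legal question this example does not settle.

```python
# Hypothetical allowlist: SPDX identifiers for licenses that waive
# attribution requirements. The real policy choice is an assumption here.
ALLOWED_LICENSES = frozenset({"CC0-1.0", "Unlicense", "0BSD"})


def select_training_repos(repos, allowed=ALLOWED_LICENSES):
    """Keep only repos whose detected license is on the allowlist.

    `repos` is an iterable of dicts such as
    {"name": "user/project", "license": "MIT"}; a missing or None
    license means "no license detected" and the repo is excluded.
    """
    return [repo for repo in repos if repo.get("license") in allowed]
```

| Repos with no detected license are dropped as well, since the
| absence of a license does not grant permission to reuse code.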
| afiori wrote:
| Many licenses require attribution, which would be hard to
| track.
| icoder wrote:
| Interesting, it's a big question I've had for a while, how
| 'original' stuff coming from these AI systems is, and also the
| distribution of uniqueness over many answers. I haven't dived
| into it yet, but I find it surprising how little this comes up
| when these systems are discussed (ie here on HN).
|
| Does anyone even know? Can we even check? What if 1 in a
| thousand, or one in a million outputs is (very close to)
| something existing? I find this especially relevant when
| generating faces.
| eline43 wrote:
| There needs to be an update to either licenses or GitHub (and
| other) software directly, or even software terms of services,
| that gives the user an opportunity to opt-out of their data being
| used to train proprietary AI models.
|
| 'I don't agree with having an AI trained on/with my data.'
|
| IMHO, all other problems with copilot stem from this.
| shireboy wrote:
| I do feel these arguments are valid if a little overstated. Most
| devs have googled, found some code, and pasted it in without
| thinking about attribution. Doesn't make it right, but it is a
| question of how much code is being copied and how specific. For
| example, I peruse open repos to learn - I learned about the
| spread operator in JavaScript that way- doesn't mean every time I
| use it I need to attribute whatever repo I saw it in. But, yeah,
| if I copied a larger chunk and the owner wants attribution,
| probably wrong.
|
| I like the idea of having the bot automatically update an
| attribution file if it detects it's used licensed code. Seems
| like it would be fairly trivial. Also a robots.txt for repo
| owners to control automated use.
|
| Also, they should totally pay back a portion of revenue to the
| community and support the repos used to train. That seems like it
| would be a good PR move if nothing else.
| Aeolun wrote:
| > Also, they should totally pay back a portion of revenue to
| the community and support the repos used to train.
|
| Aren't they already doubling all Github sponsorship money?
| david_allison wrote:
| Not doubled any more, but they don't take a cut, and pay the
| processing fees for you.
| kachhalimbu wrote:
| I like this take. Copilot to me seems a glorified (very
| intelligent) auto-search-paste/autocomplete service. It is just
| mimicking what usual devs do which is to copy-paste code from
| StackOverflow/github for many mundane types of codes like for
| loops, mongo find queries, callback func definitions etc for JS
| devs for eg.
|
| The idea of auto-attribution if copilot surfaces licensed code
| is best because then it keeps the copilot user honest where the
| code is coming from and honor the original license.
| teakettle42 wrote:
| > It is just mimicking what usual devs do which is to copy-
| paste code from StackOverflow/github for many mundane types
| of codes like for loops, mongo find queries, callback func
| definitions etc for JS devs for eg.
|
| I'm genuinely disturbed to see how many people in this thread
| think that casual plagiarism is the norm for "usual devs".
| Aeolun wrote:
| Dunno what devs you work with, but I've literally never
| seen someone care.
|
| None of the code I work on is public, so attribution is
| pointless in the first place.
| ParetoOptimal wrote:
| > I'm genuinely disturbed to see how many people in this
| thread think that casual plagiarism is the norm for "usual
| devs".
|
| I'm disturbed it is likely the reality.
| shireboy wrote:
| Again, I get the argument, just think it's overstated.
| First, when referring to stack overflow and blogs,
| generally, that's intentionally shared with the express
| purpose of people copying it- hopefully while learning from
| it at the same time. Second, again with some code bits it's
| not really plagiarism any more than all iambic pentameter
| is plagiarizing Shakespeare.
|
| Devs often look at code to see basic syntax, understand
| algorithms, etc. There is absolutely nothing wrong with
| this. One should draw a line somewhere, but to say I need
| to attribute [...somevar] every time I use it because I
| happened to see it one time on a blog post is silly.
|
| A thought experiment may help: Scrape Github for all unique
| strings longer than X and store in a file with a timestamp
| and owner. How large does X have to be before attribution
| is required? If not length, then how do you determine
| whether attribution is required?
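| The thought experiment above can be sketched in a few lines.
| This is only an illustration: the (owner, timestamp, text)
| input format and both function names are made up for the
| example, and a real index over all of GitHub would need
| hashing and far more storage than an in-memory dict.

```python
def index_snippets(sources, min_len):
    """Map every substring of exactly `min_len` characters to its earliest owner.

    `sources` is an iterable of (owner, timestamp, text) tuples.
    Returns {substring: (owner, timestamp)}.
    """
    index = {}
    for owner, timestamp, text in sources:
        for start in range(len(text) - min_len + 1):
            window = text[start:start + min_len]
            previous = index.get(window)
            # Keep whichever source published this window first.
            if previous is None or timestamp < previous[1]:
                index[window] = (owner, timestamp)
    return index


def earliest_owners(index, candidate, min_len):
    """Return the owners whose indexed windows appear in `candidate`."""
    owners = set()
    for start in range(len(candidate) - min_len + 1):
        hit = index.get(candidate[start:start + min_len])
        if hit is not None:
            owners.add(hit[0])
    return owners
```

| Running this with a small min_len flags nearly everything
| (every "for" loop has an "owner"), while a large min_len only
| flags verbatim copies of long passages, which is exactly the
| line-drawing problem the question is pointing at.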
| HumanReadable wrote:
| Sorry for the unproductive tone of this comment, but there's
| something about the attitude of this tweet that really grinds my
| gears.
|
| Any time someone invents something new and incredible, there's
| always a crowd of negative nancies eager to discredit and explain
| why the invention is nothing new and a detriment to society.
|
| I don't understand why someone would willingly share their code
| on github where it is publicly available just to complain when
| others make use of that knowledge.
|
| 'co-pilot just sells code other people wrote' is such a
| ridiculous understatement of what co-pilot does. Instead of
| marvelling at the human ingenuity that went into creating it,
| they sneer at the audacity of openAI to do something without
| first asking their permission.
| Sakos wrote:
| I share my code without a license because I want others to be
| able to see how I solved things. However, this doesn't mean I'm
| okay with wholesale copying my code. If it's some random guy,
| then whatever. If it's a corporation like Microsoft, then yeah,
| I have a problem with it. Under German law, the code is legally
| not allowed to be reproduced or used without explicit
| permission even if it doesn't have a license. I retain
| ownership of it until and unless I explicitly relinquish my
| ownership rights.
| paulcole wrote:
| > Under German law, the code is legally not allowed to be
| reproduced or used without explicit permission even if it
| doesn't have a license
|
| This is nuts. How can anybody be expected to know both that
| you're German and what German law says when you post on an
| international website?
|
| Or is this a German law that exists to prevent other Germans
| from doing things but that the rest of the world scoffs at?
|
| https://choosealicense.com/
| solar-ice wrote:
| You're expected, wherever you are, to look into where any
| code you use comes from and what legal rights you have to
| use it. (The author not offering you a license means you
| can't use the code, nearly anywhere in the world - pretty
| basic Berne Convention stuff.)
|
| This is the legal expectation in general, not just for
| software - you can't just come across a design for a neat
| widget somewhere and start using it in your product,
| there's probably both copyright and patent on it. Software
| isn't special. Not everything in Github can be copied into
| your code verbatim.
| falcolas wrote:
| That's how US law works too. Works are automatically under
| copyright, even if you don't say so. It needs a license to
| lessen the copyright restrictions.
| giaour wrote:
| US law is pretty similar in this regard, isn't it? If you
| don't have a license for a particular piece of code, you
| can't use it without the author's/copyright holder's
| permission, even if you found it posted online.
| Xunjin wrote:
| Well, it depends on where you post it, right? Because if you
| are using GitHub, which is probably US-based, you follow US
| law?!
|
| Demanding that the law of one country be followed in another
| makes no sense. Countries can make agreements about it, and
| even take legal action to the highest court so it can be
| evaluated, but using your nationality as an argument for what
| you can do is just plain wrong.
| Sakos wrote:
| https://choosealicense.com/no-permission/
|
| I always find it weird how people respond to my comments.
| Why didn't you check what the US law is like for source
| code? A lot of places have similar laws around source code,
| primarily in the West because of efforts to normalise laws
| across countries, driven by US efforts. And other
| countries? Well, it's the same for any kind of IP. Either
| the country has strong IP law and you have the resources to
| pursue an issue or not and you can't do anything about it.
| hansword wrote:
| If I enter 'Mickey Mouse' into an ML-TTI thing like Craiyon
| (Dall E mini) do you think I will be able to sell the resulting
| image on a Tshirt?
|
| No, I won't, because Disney has fancy lawyers, the average open
| source developer hasn't. What you are saying is: Screw little
| people, let M$ make their money.
|
| Either copyright is for everyone, or for no one. I prefer the
| latter, but this is not the world we live in.
| fonix wrote:
| This is more like entering "cartoon mouse nose" into Craiyon
| though. You're getting incohesive code snippets returned to
| you based off a single line (appropriate word for code and a
| drawing).
| jimnotgym wrote:
| Isn't this an indictment of the justice system rather than the
| big firms?
|
| I once heard this quote, "English justice is open to all, in
| the same way that The Ritz [very expensive hotel] is open to
| all."
| gilrain wrote:
| The useless justice system has been engineered by the firms
| for their benefit.
| hourago wrote:
| The big difference is that by copying Mickey Mouse you are
| hurting one of the best-known and most powerful corporations
| in the world; by copying code you are just hurting open
| source projects and individual developers.
|
| It should not be different, or if anything, it should be
| worse to punish people with less resources. But here we are.
| lobocinza wrote:
| Plagiarism isn't new or incredible.
| the_gipsy wrote:
| > share their code on github where it is publicly available
| just to complain when others make use of that knowledge
|
| I put a fucking license on it so that it doesn't get abused by
| some fucking corporation. Jesus Christ, it's not hard to
| understand.
| rockbruno wrote:
| My problem with this conversation is how we can have a 200
| comment thread without anyone providing any kind of proof to
| these claims. Is there any instance of this bot printing an
| actual copyrighted algorithm instead of a mundane
| uncopyrightable piece of logic?
| sascha_sl wrote:
| One of the earliest examples was Copilot printing Quake's
| fast inverse square root verbatim, including swearing in a
| comment.
|
| Quake's source code is GPL.
|
| There are plenty more if you're willing to look.
| Xunjin wrote:
| The famous "burden of proof" fallacy. In the end, I'm eager
| for anyone who can prove it to sue them so we can see the
| results.
| dgb23 wrote:
| There are examples of it providing literal copies of code
| without attribution etc.
| [deleted]
| pmarreck wrote:
| I think copilot is amazing. I don't care what, if any, of my
| code snippets it uses because I also gain from it by skipping
| boilerplate (as well as things like bash idiosyncrasies). Using
| it feels like I am working with dozens of invisible
| collaborators
| lin83 wrote:
| > Instead of marvelling at the human ingenuity that went into
| creating it, they sneer at the audacity of openAI to do
| something without first asking their permission.
|
| Something being cool doesn't exempt it from discussion of its
| ethics and certainly doesn't exempt it from legal consequences.
| What people call "disruption" is often just exploiting
| resources/people/their work in unsustainable ways until
| oversight is introduced.
|
| If CoPilot is copy/pasting large amounts of code with unknown
| licenses, that is a large and real risk for users aside from
| violating open source projects' licenses.
| leereeves wrote:
| > Something being cool doesn't exempt it from discussion of
| its ethics and certainly doesn't exempt it from legal
| consequences.
|
| Indeed. The heist in Ocean's Eleven was cool, but it was
| still theft.
| moffkalast wrote:
| Moreover it's a genuine danger for non-hobbyist developers
| since you could be including stolen code into a market
| product.
|
| Even including something banal like Linux is already
| problematic since it's GNU licensed, which by extension makes
| your entire project GNU licensed and you can't keep the
| exclusive rights to it.
| ryukafalz wrote:
| Just to clear this up, since I've heard this a lot before:
|
| > since it's GNU licensed, which by extension makes your
| entire project GNU licensed and you can't keep the
| exclusive rights to it
|
| This is incorrect. Including GPL code in your product
| cannot automatically relicense your code. It's just a
| copyright violation if your product's license isn't GPL-
| compatible and you don't abide by the GPL.
| OrwellianTimes wrote:
| Fully agreed. It's just people getting mad and jealous but hear
| me out.
|
| Copilot is NOT SELLING code other people wrote, it is simply
| acting as a curator to show you all the solutions people HAVE
| WRITTEN for free.
|
| Copilot does NOT write entire programs, it's simply an
| assistant. And there is not much copyright you CAN apply to 3-4
| lines of generally understandable code.
|
| I've used Copilot and am actively paying for it, and I have not
| seen many cases where it's generating bad code. It's only there
| to remove boilerplate and common problems, not there to write
| entire applications.
|
| Why are people getting so salty?
| boesboes wrote:
| Because they _are_ verbatim copying code and not respecting
| the license. It's not that complicated.
|
| Github knows better, can do better and should.
| olalonde wrote:
| Do you have an example of Github Copilot doing that? Like a
| snippet of code generated by Copilot and a link to the
| original source code.
| falcolas wrote:
| An example posted here on HN.
|
| https://news.ycombinator.com/item?id=27710287
| olalonde wrote:
| Thanks. Personally, I feel like such small and widely
| used mathematical algorithms should not be copyrightable
| (or using them should fall under fair use). It even has
| its own Wikipedia page[0], where the source code is also
| reproduced without copyright notice.
|
| [0]
| https://en.wikipedia.org/wiki/Fast_inverse_square_root
| falcolas wrote:
| It's the verbatim replication of the comments that makes
| this a damning piece of evidence against the "it's not
| copying code, it's an AI" argument.
| olalonde wrote:
| Yes, it is clearly copying code from Quake, I wasn't
| denying that.
| zzo38computer wrote:
| I also implemented this algorithm in MMIX:
|
|     % Constants
|     FISRCON GREG #5FE6EB50C7B537A9
|     THREHAF GREG #3FF8000000000000
|     % Save half of the original number
|             OR   $2,$0,0
|             INCH $2,#FFF0
|     % Bit level hacking
|             SRU  $1,$0,1
|             SUBU $0,FISRCON,$1
|     % First iteration
|             FMUL $1,$2,$0
|             FMUL $1,$1,$0
|             FSUB $1,THREHAF,$1
|             FMUL $0,$0,$1
|     % Second iteration
|             FMUL $1,$2,$0
|             FMUL $1,$1,$0
|             FSUB $1,THREHAF,$1
|             FMUL $0,$0,$1
|
| (Note this assumes that the input number is not too
| small; if it is, then it will not be possible to compute
| half by this algorithm. Also, like with the original
| code, the second iteration may be omitted if desired.)
|
| (I agree to release this comment and the MMIX code it
| contains, and all other comments that I wrote on here, to
| the public domain.)
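|
| For readers who don't read MMIX, the same steps can be
| sketched in Python. This is an illustration under stated
| assumptions, not something posted in the thread: it assumes
| IEEE 754 doubles, reuses the 64-bit constant from the MMIX
| listing, and `fast_inv_sqrt` is a hypothetical name. It also
| computes the half with a plain multiply instead of the
| exponent trick, so the small-input caveat above does not
| apply to it in the same way.

```python
import struct

# 64-bit magic constant, taken from the MMIX listing (FISRCON).
MAGIC = 0x5FE6EB50C7B537A9


def fast_inv_sqrt(x: float, iterations: int = 2) -> float:
    """Approximate 1/sqrt(x) for positive, finite x."""
    half = 0.5 * x  # plain multiply; the MMIX code adjusts the exponent bits instead
    # Reinterpret the double's bits as an unsigned 64-bit integer.
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    # Bit-level hack: subtract the shifted bits from the magic constant.
    bits = (MAGIC - (bits >> 1)) & 0xFFFFFFFFFFFFFFFF
    (y,) = struct.unpack("<d", struct.pack("<Q", bits))
    # Newton-Raphson refinement: y = y * (1.5 - half * y * y).
    for _ in range(iterations):
        y = y * (1.5 - half * y * y)
    return y
```

| Passing `iterations=1` corresponds to omitting the second
| iteration, at the cost of a rougher approximation.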
| nerdponx wrote:
| Both things can be true. It's clear that it violates the
| licenses of many software projects. But I do agree that
| denigrating it as "just selling other people's code" is missing
| the whole point of the product and of what you pay for when you
| subscribe to it.
| nixpulvis wrote:
| You should read more about people's ideologies and philosophies
| of Open Source.
|
| One big reason I support it is because it grants me the right
| and ability to change things I need/want to change.
| B1FF_PSUVM wrote:
| > negative nancies
|
| Not bad for everyday use - I like "nattering nabobs of
| negativism" (as scripted by William Safire), but it is really a
| bit over the top.
| rambojazz wrote:
| Sounds like they're not selling any of your code
| barthvr wrote:
| Copilot access is $10/month.
|
| Think about how Napster was treated back in the day, or
| torrent websites. You pay to access some copyrighted content.
| Is it legal?
| throwoutway wrote:
| I hear you, but this isn't a "marvel at this free open clever
| academic thing we built"
|
| It's a product by a business. Why is that not open to
| criticism?
| meheleventyone wrote:
| They own their code and it either has a license for use or is
| implicitly rights retained if not. If Copilot regurgitates
| their code, from a project that is public but with a non-
| permissive license they are having their IP rights violated so
| are totally correct in being unhappy about that.
|
| Just because you've made something cool doesn't give you the
| right to harm others in the process.
|
| If MS or OpenAI don't think this is the case then they should
| have also included their private repositories.
| Zambyte wrote:
| > from a project that is public but with a non-permissive
| license
|
| Permissive or not doesn't matter. Public Domain or not is
| what matters. Permissive licenses still require you to
| propagate the copyright notice, which Copilot strips.
| nojs wrote:
| It doesn't really "regurgitate code" all that much in
| practice though. It's a super impressive product and these
| arguments seem more like people looking for an excuse to hate
| new, scary technology.
| core-utility wrote:
| Do we have any evidence that copilot _doesn't_ check/filter
| by license?
| bayindirh wrote:
| There was a tweet by Nora Tindall (since deleted) with a
| screenshot of an email directly from GitHub stating that GPL
| code is included in Copilot's training data and that Copilot
| will indeed use it.
| samatman wrote:
| This is _in fact impossible_.
|
| All they could do is filter by the LICENSE file in the
| repo.
|
| Unfortunately for them, by law copyright and license are
| determined _by the authors_ and merely represented by a
| LICENSE file, which could be lying about both.
|
| The court isn't going to accept that excuse when this goes
| to trial.
| gjadi wrote:
| And you can have multiple licenses in the same
| repository, folders with copyright exceptions, etc.
|
| It's hard enough for us humans to find our way in this
| mess, I've little hope for an AI.
|
| But maybe it's just the first step. The final step being
| able to sell an AI that understands Copyright management.
| I'm sure there is a big market for that.
| mroche wrote:
| I feel like a few guidelines and standards could help
| simplify a baseline process:
|
| 1) Require each repository to opt-in to be learned from.
|
| 2) Require any source file used for learning to have an
| SPDX license heading.
|
| 3) Have a list of approved permissive licenses to avoid
| any proprietary or copyleft arguments.
|
| Using SPDX headings as the explicit guide would solve the
| problem of different code content using a different
| license within a project. An example being QtWayland: the
| client pieces are Proprietary/LGPL/GPL, whereas the
| compositor parts are Proprietary/GPL. That's not
| something you'd know from the license files at the root
| of the project (and post-6.3 they use SPDX instead of the
| prior license template heading).
|
| Granted, this doesn't solve the problem of the chain of
| trust (is the individual publishing the code truly the
| copyright owner), but I think it would be a basic start
| for a program like this. The opt-in nature would make
| things... difficult, but I think that's a fair trade-off
| for something like this.
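|
| The three guidelines above can be sketched as a small filter.
| This is a rough illustration, not how Copilot works: the
| allow-list, the `may_train_on` and `spdx_identifier` helpers,
| and the `repo_opted_in` flag are all assumptions made up for
| the example. Only the `SPDX-License-Identifier:` tag format
| comes from SPDX itself.

```python
import re
from typing import Optional

# Assumed allow-list of permissive SPDX identifiers (guideline 3).
# A real policy would be a legal decision, not a constant in a script.
APPROVED = {"MIT", "BSD-2-Clause", "BSD-3-Clause", "Apache-2.0", "ISC"}

SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*(.+)")


def spdx_identifier(source: str, max_lines: int = 5) -> Optional[str]:
    """Return the SPDX expression found in the first few lines, if any."""
    for line in source.splitlines()[:max_lines]:
        match = SPDX_TAG.search(line)
        if match:
            # Drop a trailing block-comment closer such as "*/".
            return match.group(1).strip().rstrip("*/").strip()
    return None


def may_train_on(source: str, repo_opted_in: bool) -> bool:
    """Apply the opt-in, per-file header, and allow-list checks in order."""
    if not repo_opted_in:  # guideline 1: repository must opt in
        return False
    identifier = spdx_identifier(source)  # guideline 2: per-file heading
    # Compound expressions ("MIT OR GPL-2.0-only") are rejected outright
    # in this sketch rather than parsed.
    return identifier in APPROVED
```

| Because the check is per file, a mixed-license project like
| the QtWayland example would be filtered file by file rather
| than by its root license. It still does nothing about the
| chain-of-trust problem.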
| gjadi wrote:
| Yes a standard would probably solve the issue.
|
| But until lawyers push for a standard that would make
| this part of their work irrelevant, I can't see how it
| could happen :)
| mnd999 wrote:
| And that is why this project should never have made it
| past the brainstorming session.
| meheleventyone wrote:
| One of the (ex?) programmers from Valve managed to get it
| to spit out parts of the Source engine verbatim. He posted
| a Twitter thread yesterday I believe.
| leakbang wrote:
| Can you post the link to that?
| meheleventyone wrote:
| Sure: https://twitter.com/ChrisGr93091552/status/15397316
| 329318031...
| dekhn wrote:
| 3 lines of fairly generic code?
|
| That's not what copyright is protecting.
| meheleventyone wrote:
| Just for the record I was providing some evidence to
| support this question: "Do we have any evidence that
| copilot doesn't check/filter by license?"
| dekhn wrote:
| I mean, even if the license was placed on the code, that
| doesn't mean, if it's not protected by copyright, then
| it's fair game for copilot to scrape, learn from, and
| emit variations of, the code.
|
| I believe github's lawyers would have had hundreds of
| hours of discussion about this and at this point, they
| believe they are in the right, and anybody who disagrees
| should use the legal system to resolve the matter.
|
| In the meantime, what it is and isn't doing wrt licenses
| seems to be poorly understood externally.
| mustyoshi wrote:
| Does that prove it ignores licenses or does that imply
| the source engine exists verbatim (minus licenses)
| multiple times on Github?
| micromacrofoot wrote:
| just because someone else ignored the license doesn't
| mean github is free to blindly vacuum that up
| meheleventyone wrote:
| If it's minus a license then it should be assumed that
| rights are retained (in the same way you can't just take
| ownership of an image you find on the internet) so if it
| were filtering it shouldn't take code from repos without
| explicit and favorable licenses. If it is taking code
| only from repos with permissive licenses (e.g. MIT) then
| why aren't they following the attribution requirements?
|
| I don't think you can have your cake and eat it on this
| one.
| moffkalast wrote:
| If I steal some code and put it on Github under MIT that
| doesn't really make it MIT, I'm just lying that it is. If
| Copilot then uses that it's still in violation of the law
| I'd assume (ignorance doesn't exonerate you etc.). So
| they'd have to verify on a case by case basis, which they
| obviously haven't given the volume of data they had to
| feed the thing.
|
| It's kinda shocking that they think they can sell this,
| even providing it for free is extremely sketchy but at
| least complies with BSD/GNU/CC licensed stuff I guess.
| Hamuko wrote:
| And especially with such blanket statements as "the code
| you write with GitHub Copilot's help belongs to you".
| lupire wrote:
| Why do you think that the recipient is responsible for
| verifying that no one else has copyright of code they
| received under license?
|
| Is every product user liable when a vendor ships some
| stolen code?
| Closi wrote:
| > Is every product user liable when a vendor ships some
| stolen code?
|
| The user would be unlicensed, and in lieu of the vendor
| resolving this then the user would need to purchase
| licences to continue using the software legally (ie if a
| vendor gives you a pirate version of photoshop, you can't
| just use it forever just because someone sold it to you).
|
| There are usually clauses in enterprise software
| agreements that attribute liability for unlicensed
| components to the vendor for this reason. But ultimately
| if there isn't a contract or the vendor vanishes, the
| user will need to go get a licence.
|
| If you want to test the theory, I'll send you a few
| images to put on your website, and when you get a claim
| through from the copyright owner you can try to argue
| that I sent it across without a copyright notice so I am
| liable ;)
| ryukafalz wrote:
| > Is every product user liable when a vendor ships some
| stolen code?
|
| No, but the difference is the users of a product are
| typically not making and distributing copies. That's not
| the case if you use someone else's code in your project.
| Closi wrote:
| It would prove that it doesn't honour all licences - just
| because the source code exists on Github without a
| licence doesn't automatically grant a licence to Copilot
| from a legal perspective.
| jstummbillig wrote:
| In light of this potential new paradigm it's bewildering how
| people still manage to focus on the license of training
| material as if it even moved the needle in this context, even
| a little bit.
|
| OSS knights: THE LICENSE.
|
| MS: Aight, I guess we have a few lines of hq src to help out
| with...
|
| Github: Same.
|
| Other OSS people: We really don't care one way or the other.
|
| As long as the word of the license was upheld for another 2
| weeks before it ceased to matter for the rest of all time.
|
| Jesus fucking christ. People. I get that oss licensing is
| dear to the collective hn heart - but, at best, it's
| completely irrelevant in regards to where this will
| inevitably lead, regardless of current questions/issues with
| license violations. You can (if all the repos of MS and
| Github are not enough to train this thing on, which is a
| laughable idea) even fucking buy additional source code if
| that's what it takes to strengthen Copilot's legal foundation.
| The cost is insignificant. People will be happy to sell for
| super cheap. It's a non issue.
|
| Why do you wilfully choose to be distracted instead of facing
| and thinking about the future together?
| causi wrote:
| Unfortunately the way IP law works, at least in the US, is
| that you can use essentially whatever you want as training
| data and it's up to the user to make sure none of the
| generated code violates licensing agreements.
| SahAssar wrote:
| If that's the case then GH/MS should at least disclose that
| for the code generated to actually be legal you have to
| hunt down the actual source (will be hard in a lot of
| cases) and check the license against your own license.
| monocasa wrote:
| Can you point to case law backing that up?
| causi wrote:
| Sure.
|
| https://jtip.law.northwestern.edu/2021/05/28/copyright-
| issue...
|
| _However, even if infringement occurs during machine
| learning, training AI with copyrighted works would likely
| be excused by the 'fair use' doctrine.[ii] For example,
| in Authors Guild v. Google, Inc.[iii], Google had scanned
| digital copies of books and established a publicly
| available search function. The plaintiffs alleged that
| this constituted infringement of copyrights. The Second
| Circuit held that Google's works were non-infringing fair
| uses because the purpose of the copying was highly
| transformative, the public display of text was limited,
| and the revelations did not provide a significant market
| substitute for the protected aspects of the originals._
| monocasa wrote:
| That's training for search to lead to a full copy of the
| original work with citations, not training for
| regurgitating verbatim chunks of copyrighted works to be
| incorporated at scale into other copyrighted works while
| obfuscating their original source.
|
| The Second Circuit's tests listed in your citation
| specifically fail in this case. It's not highly
| transformative since it's just regurgitating snippets to
| be used in other competing works rather than applying the
| body of works to a different domain. And it's
| specifically to provide a market substitute for the
| protected aspects of the original works.
|
| Additionally, none of this says 'it's all great and it's
| on the user to figure it out'.
| causi wrote:
| In the US copyright violation is a strict liability
| statute. Regardless of whether or not a court directly
| confirms or denies Microsoft's right to use code in that
| way, the end developer is still liable for whatever he or
| she uses.
| monocasa wrote:
| But that's orthogonal to being able to use whatever you
| want as training data for an AI.
| causi wrote:
| Has the exact issue of a remixing AI been tested in
| court? No. But everything even remotely similar has been
| deemed legal. Considering the legal and financial backing
| on both sides of the issue I expect it to go Microsoft's
| way even if it does end up before a judge.
| monocasa wrote:
| It clearly fails the
|
| > the revelations did not provide a significant market
| substitute for the protected aspects of the originals.
|
| test of your cited case law though. The courts clearly
| drew a line at developing AI to inject snippets of
| copyrighted works in similar copyrighted works. And in
| context it would be the developer of the AI at fault (in
| addition to the end users who also used it to infringe
| other works in the creation of partially derived works;
| multiple parties can be at fault).
|
| Basically the courts are making it pretty clear that they
| would have been against what Google had made if it were
| suggesting phrases in a plugin for a word processing
| program to create books that would compete with the
| original books. But being a separate domain of simply
| collating existing books and providing better search for
| their corpus (which led you to the original) was allowed.
| lupire wrote:
| Did you just make that up? Github is distributing the
| copied code to users.
| monocasa wrote:
| They did make it up; their cited case law says nothing of
| the sort.
| causi wrote:
| _Did you just make that up?_
|
| Unfortunately not. It's really stupid.
| zarzavat wrote:
| The entire point of a fair use right is that you _don't_ need
| the copyright owner's permission to be able to exercise it.
| Fair use allows you to do things that the copyright owner
| doesn't like.
|
| Is fair use on a massive scale still fair use? Courts
| generally think so, otherwise Google would have been out of
| business a long time ago.
| izacus wrote:
| Unless Copilot is "commenting" or "parodying" the code
| you've written, it's not fair use. Copying and using the code
| in another project sure as heck IS NOT fair use.
| jrumbut wrote:
| I don't think releasing a commercial product that copies
| people's code without complying with the license is
| anywhere near fair use.
|
| Also, the open source community has far less leverage to
| apply pressure to Google than it does to GitHub. We may be
| able to do something about this.
| RHSeeger wrote:
| It seems fairly similar, at least to me, to a search
| engine copying snippets of other people's web sites and
| displaying them on a page. Admittedly, there's still some
| discussion as to whether or not _that_ is fair use, but I
| think enough of the population think it is (with many
| news organizations disagreeing).
| CrazyStat wrote:
| > I don't think releasing a commercial product that
| copies people's code without complying with the license
| is anywhere near fair use.
|
| The whole point of fair use is that the license doesn't
| matter. You can have a license that says I'm not allowed
| to use what you wrote for any purpose ever and I can
| _still_ use it under fair use.
| Longlius wrote:
| IANAL but fair use is primarily about the public
| interest. What public interest is served by allowing
| proprietary software vendors to copy GPL code that's
| reserved for the commons?
|
| I don't really think this argument passes muster.
| jrumbut wrote:
| Yes but among the four factors that are used to evaluate
| fair use claims are whether it is being used commercially
| (it is) and how it affects the market for the thing that
| was copied (it clearly would since one way code is used
| is being imported by other code, if Copilot didn't insert
| my code into the new app, they might very well use my
| open source project that provides the same code).
| CrazyStat wrote:
| I wasn't staking a position on whether Copilot is fair
| use, just pointing out that fair use doesn't care about
| license.
|
| That said, copilot itself is _not_ a replacement for your
| open source project that it was trained on. The code it
| generates may or may not be, but that's probably not
| Github's problem as far as copyright law is concerned.
| pmarreck wrote:
| > I don't think releasing a commercial product that
| copies people's code without complying with the license
| is anywhere near fair use.
|
| It's just automating the copying and pasting (and slight
| reworking) of boilerplate code that would normally take
| me much longer to do, especially when I am working with a
| language I'm less familiar with but is necessary for my
| stack. I've literally never seen it suggest code that isn't
| more or less exactly what I would have come up with given a
| lot more time. In essence, it eliminates
| tedium- exactly the point of all of programming: Work
| elimination.
| kybernetikos wrote:
| > otherwise Google would have been out of business a long
| time ago.
|
| I do think there are ethical questions around whether it's
| right for google to digitise physical books without the
| permission of the authors, and keep them on their servers
| and make money from them without recompensing the authors.
| That's something an individual would not get away with
| doing, so it seems wrong that it's OK for google.
| akagusu wrote:
| When co-pilot reproduces substantial parts of someone else's
| code without respecting the license terms, it is not fair
| use, it is just disguised license abuse.
| meheleventyone wrote:
| Is this fair use? I don't think that's been established
| yet. And if it is why didn't MS and OpenAI train it on
| their private code repositories? Fair use for thee not for
| me isn't very in keeping with the spirit of that claim.
| komadori wrote:
| Gosh, can you imagine if they had trained it on their
| internal source code repositories and it constantly
| suggested using Hungarian notation for your variables?
| ;-)
| jimnotgym wrote:
| Just because there has not been a test case yet does not
| make it illegal! If MS think it is fair use then they are
| free to go ahead. Business is all about recognising and
| assessing risks like this.
| tremon wrote:
| And even if there had been a legal test case, that does
| not make it moral! If people think this is socially wrong
| then they're free to argue their case. Business is all
| about ignoring ethical quandaries if it gives them an
| edge.
|
| "Microsoft does it, therefore it must be right" does not
| a sound argument make.
| aaaaaaaaata wrote:
| > Business is all about ignoring ethical quandaries
|
| No, _businesses_ are -- not business. Not necessarily...
| jtdev wrote:
| jimnotgym wrote:
| I sometimes read people's open source code on github and
| use the ideas from that to develop my own ideas. In fact
| sometimes I copy and paste short passages and then rework
| them. I also employ a team of people who may do the same.
| Is that fair use, yes of course it is. Is co-pilot
| automating that fair use, I would say so.
| grayfaced wrote:
| Or alternately, "I sometimes listen to other people's
| songs and use those ideas to develop my own. In fact
| sometimes I copy and paste short melodies and then rework
| them."
|
| Courts have held that it doesn't apply to music, why do
| you think different rules apply to code?
| [deleted]
| aahortwwy wrote:
| Microsoft's internal policies don't allow their employees
| to do this without legal approval.
| aaaaaaaaata wrote:
| So they don't ask.
| leereeves wrote:
| I think aahortwwy's point was that Microsoft won't permit
| their own employees to do what Copilot does.
| zzo38computer wrote:
| > I sometimes read people's open source code on github
| and use the ideas from that to develop my own ideas.
|
| Yes, I too, and probably many people will do.
|
| > In fact sometimes I copy and paste short passages and
| then rework them.
|
| This I usually don't unless I check the license first.
| (Everybody ought to be allowed, but sometimes the license
| might not be.)
| jcelerier wrote:
| What you are doing is very certainly illegal
| nirvdrum wrote:
| Many people would claim what you're doing is a derivative
| work. I'm not sure the "of course it is" is very clear-
| cut (at least in the US). I've worked at big companies
| that have lawyers that care very much about this topic
| and what you're describing is prohibited. But, maybe it's
| different if you're not distributing your source.
| zarzavat wrote:
| > I've worked at big companies that have lawyers that
| care very much about this topic and what you're
| describing is prohibited.
|
| They are doing this to make sure that any lawsuit can be
| easily dismissed. It has nothing to do with the legality
| of the action (which sounds like fair use as the parent
| described it), and everything to do with the expense of a
| potential lawsuit compared to the cost effectiveness of
| simply telling developers "don't do that".
|
| Most people think that the law has two shades: lawful vs
| unlawful. But the more practical distinction is expensive
| lawsuit vs dismissed lawsuit. This is the lens through
| which corporate lawyers see copyright and it might
| explain why so many programmers think that copilot is
| "obviously" breaking the law and "stealing" their code.
| nirvdrum wrote:
| If the usage was very clearly fair use, there'd be no
| need to be defensive about it; the case could be
| dismissed trivially. In reality, the question would need
| to be sorted out in court.
|
| Questions of derivative works and fair use come up fairly
| frequently even in the open source world. This isn't
| solely a question of corporate lawyer posturing. I don't
| know any copyleft authors that would be okay with someone
| copying & pasting their code, making trivial changes, and
| saying it isn't a derivative work. Of course, their
| understanding of the law may be flawed. You'll get to
| find out in court.
|
| You're right. A lot of this boils down to how much you
| want to spend in court proving your usage is just under
| fair use. We've moved beyond the question of ethics if
| you're intentionally violating a project's source license
| and relying on fair use to do whatever you want with the
| code. If you want to poke someone with a stick, you can't
| be surprised when they hit back. I contend what the OP
| described isn't _clearly_ fair use (note I'm not saying
| that it _clearly isn't_ fair use either). It ultimately
| doesn't impact me because I'm just not going to copy &
| paste code from projects without attribution and
| following the license, but I'd be worried about anyone
| reading that comment as objectively true.
| matharmin wrote:
| For public repositories, whether copying small parts of
| code is considered fair use is just a copyright question.
|
| On the other hand, if you copy from private repositories,
| it quickly gets into the territory of stealing trade
| secrets.
| dkersten wrote:
| Fair use is quite narrowly defined though. This doesn't
| look like fair use to me, especially when it's been shown
| that copilot does, at least sometimes, spit out code that
| is completely unchanged from the source material, without
| advising the user of any license requirements (most
| permissive licenses require at least attribution).
|
| The SCO vs IBM lawsuit was over only a few lines of code,
| after all.
|
| I can't use a derivative of Mickey Mouse in my product, even
| if I change his colour and give him a hat, even if these
| changes were made by an AI. Why would it be different for
| code? I can only use Mickey Mouse as fair use if it's done
| for a specific narrow set of purposes (satire, news
| reporting etc).
| lupire wrote:
| "on a massive scale" is one of the legal definitions of
| unfair use.
| bayindirh wrote:
| An automated system will devour all my code, which is under
| a case-tested copyleft license, and regenerate its parts in
| any place, without respecting the license terms, and call
| it "fair use".
|
| I have two questions:
|
| 1. Why have licenses, then?
|
| 2. What if I just use leaked sources of closed source
| software and call it fair use?
| Hamuko wrote:
| What about us that are not Americans?
| zarzavat wrote:
| Then you need to check the laws in your country. But that
| is nothing new to copilot. Copyright laws vary
| _significantly_ from country to country.
| rurban wrote:
| Not really. They are mostly the same across countries:
| https://en.wikipedia.org/wiki/International_copyright_treati...
|
| There are just minor deviances, not relevant to this
| case, such as how long Disney bullied the countries to
| protect a work.
|
| Software is usually considered a work. The AI needs to
| know if it has permission to copy and use the code, and
| then offer derived work on the proper terms and
| conditions. copilot doesn't do that. It might copy GPL
| code into non-GPL code, thus violating the GPL license,
| thus being an extreme risk.
| tzs wrote:
| What are examples of Disney getting countries to extend
| copyright terms?
|
| In the US there have only been two extensions of
| copyright terms since Disney came into existence.
|
| The first was in 1976, as part of a major overhaul of US
| copyright law to update the previous law (from 1909) to
| take into account the large changes in technology since
| then, and to make US law work more like the rest of the
| world to pave the way for the US later joining the Berne
| Convention. The changes for Berne compatibility included
| longer terms.
|
| I assume Disney did support this, but only because as far
| as I can tell it had pretty widespread support. It had
| enough support that it would have passed even if Disney
| had adamantly opposed it.
|
| The second was in 1998, and that was specifically a term
| expansion (as opposed to a term expansion like that of
| 1976 that was a side effect of harmonizing US law with
| the rest of the world). Europe had expanded terms a few
| years earlier, so the 1998 change in the US might have
| been motivated at least in part by harmonization, but I
| don't think the differences in terms between the US and
| the EU would have been enough to get it passed without
| some major interests pushing for it, so it is probably
| fair to give Disney a good part of the credit or blame
| for this one.
| wowokay wrote:
| I think you might be missing the point of their frustration.
|
| Lots of companies do not put their code in public
| repositories, granted I understand the perspective of
| violating a license, but the point is if you don't want your
| code used by someone else (even with the risk of not getting
| credit, don't know why that matters) then don't make your
| repo public period.
|
| To that point, what's to stop GitHub from making a policy
| that states: "All public repositories will be utilized in AI
| training"?
| ryukafalz wrote:
| > even with the risk of not getting credit, don't know why
| that matters
|
| The point is that it's not respecting the license, not just
| that it's not giving "credit". If I release code under a
| GPL license, I damn well don't want someone using that code
| under a license that's not GPL-compatible, no matter how it
| got there.
| jillesvangurp wrote:
| I'm sure the MS lawyers thought long and hard about this and
| are patiently awaiting any actual lawsuits with confidence in
| their position. It would be very hard to prove ownership of
| any snippets. To the point where you can argue that it is
| just fair use and to the point where companies would think
| long and hard before committing any resources to fighting MS
| on this in court at great expense.
|
| I don't think that will happen but it might be interesting if
| it did.
| amelius wrote:
| It will stop being fair use when someone makes an AI that
| creates cartoon characters based on the figures in Disney
| movies.
| vlovich123 wrote:
| MS is unlikely to be sued here because the infringement
| claim would be against their users and my guess is the
| license indemnifies them against you suing them for defects
| in the tool you use (ie use at your own risk and if you get
| sued you agree you won't sue us).
| aaaaaaaaata wrote:
| > companies would think long and hard before committing any
| resources to fighting MS...at great expense
|
| This is the end of Microsoft's actual calculation.
| kop316 wrote:
| > It would be very hard to prove ownership of any snippets.
| To the point where you can argue that it is just fair use
| and to the point where companies would think long and hard
| before committing any resources to fighting MS on this in
| court at great expense.
|
| I would like to point you to this:
| https://twitter.com/mitsuhiko/status/1410886329924194309
|
| HN comments at the time:
| https://news.ycombinator.com/item?id=27710287
| drexlspivey wrote:
| Owning code snippets sounds ridiculous to me, like can I own
| this snippet?
|
|     def average(*numbers):
|         return sum(numbers)/len(numbers)
|
| if not is it because it is too small? what's the minimum line
| number that ownership kicks into? what if I change the
| function name and the variable names?
| bayindirh wrote:
| If that's under a copyleft license, I can't just copy &
| paste it under my non-copyleft licensed code and call it
| mine.
|
| That's as simple as that.
| sidlls wrote:
| It can't be that simple. The function in the GP is not an
| original idea and is far too simple to merit protection
| just by slapping a license on it.
| bayindirh wrote:
| I don't expect, or support, licensing that small amount
| of code, and suing everyone to oblivion.
|
| The point I'm trying to make is if something is under a
| copyleft license, you can't copy and paste it verbatim to
| something non-copyleft. It's _what the license says_.
|
| Also, to be pedantic, the function I'm commenting on is
| pure maths, and you _can't license/patent mathematics_.
|
| On the other hand, if there's some magic sauce of doing
| something, let it be 25 lines, what will you say? It's
| just 25 lines, so you can't license it? To be more
| pedantic, I actually have an algorithm, which is around
| 25 lines and does something novel. I've published a paper
| on it.
|
| If I license the reference implementation with AGPLv3+,
| and you use it and close it, and if I can't go after you,
| what's the purpose of the license?
|
| You can read the paper and try to implement it. It's free
| in that regard.
| williamcotton wrote:
| It seems rather silly to me that such small innovations
| would be worthy of legal protections under either a
| copyright or copyleft license.
|
| Isn't there already precedent in other forms of IP, such
| as chord progressions in music, sentence length in
| literature, etc?
| dekhn wrote:
| Copyleft isn't really a good example. Let's talk about
| copyright. That fragment of code is not copyrightable on
| its own. Too small, too trivial.
| bayindirh wrote:
| Let's say I have a 25-line function which does something
| novel and can be published as research (which I did, BTW,
| no joke), and I opened its reference implementation with
| AGPLv3+.
|
| Is it again too trivial?
| drexlspivey wrote:
| is 25 lines the limit then? do you count comments? can I
| codegolf a few lines to get below the limit?
| bayindirh wrote:
| > is 25 lines the limit then?
|
| I don't know. That's my function's length.
|
| > do you count comments?
|
| No comments, no blank lines.
|
| > can I codegolf a few lines to get below the limit?
|
| You bet. But, if you copy my reference implementation,
| you need to get the license as well.
|
| However, the research is in the open. Read it, implement
| it. That's no problem.
|
| But, CoPilot is not reading my paper. It's reproducing my
| function verbatim, which is under a license which has
| share-alike mechanics.
| trasz wrote:
| Not really. You can't copyright a trivial snippet, same
| way you can't copyright headers.
| bayindirh wrote:
| I've provided more realistic and logical examples in
| this thread, please refer to them.
| cupofpython wrote:
| if you write it yourself, it's fine. if you directly copy
| it from somewhere you aren't allowed to copy from, then it
| is wrong.
|
| There are no rules about the form of the code itself that
| governs whether or not someone owns it. Common sense
| applies. Sure you could "steal" very small, common, code
| snippets and get away with it; but that doesn't make it less
| wrong.
|
| When a commercial entity explicitly does it, however,
| sometimes we can catch them. Like if they do it through
| algorithms that we more or less know how they work - i.e.
| the algorithm is using advanced control flow logic to copy
| and paste from its training data set and copyrighted
| material is in that data set
| drexlspivey wrote:
| Point is that you can ask 100 programmers to write an
| average function and probably most of them will come up
| with this answer verbatim. How can copyright law handle
| this? There is also the opposite problem, I can copy a
| complicated snippet and change the variable names. Am I
| absolved from liabilities now?
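|
| As a hedged illustration of the renaming question above (a
| sketch, not a legal test and not how any court or real
| plagiarism tool decides anything): Python's standard `ast`
| module can show that changing the function and variable names
| leaves the structure of the snippet untouched. The helper
| names `_Renamer` and `normalized` are made up for this
| example; built-in names such as `sum` and `len` are kept
| as-is on purpose.

```python
import ast
import builtins


class _Renamer(ast.NodeTransformer):
    """Replace user-chosen names with placeholders (v0, v1, ...)."""

    def __init__(self):
        self.names = {}

    def _placeholder(self, name):
        # Built-ins such as sum/len carry meaning, so leave them untouched.
        if hasattr(builtins, name):
            return name
        return self.names.setdefault(name, f"v{len(self.names)}")

    def visit_FunctionDef(self, node):
        node.name = self._placeholder(node.name)
        self.generic_visit(node)
        return node

    def visit_arg(self, node):
        node.arg = self._placeholder(node.arg)
        return node

    def visit_Name(self, node):
        node.id = self._placeholder(node.id)
        return node


def normalized(source):
    """Return a dump of the syntax tree with identifiers normalized."""
    tree = _Renamer().visit(ast.parse(source))
    return ast.dump(tree)


original = "def average(*numbers):\n    return sum(numbers)/len(numbers)\n"
renamed = "def mean(*xs):\n    return sum(xs) / len(xs)\n"
different = "def mean(*xs):\n    return max(xs) / len(xs)\n"

print(normalized(original) == normalized(renamed))    # same structure
print(normalized(original) == normalized(different))  # different structure
```

| Under these assumptions the renamed copy compares equal to
| the original while the `max` variant does not, which is one
| way to see why renaming alone changes very little.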
| cupofpython wrote:
| If they come up with it on their own, it shouldn't be an
| issue. Likewise, swapping the variable names does not
| absolve you from liability.
|
| Copyright really is not only concerned with what exactly
| is on the page, but also how you got there, and where the
| knowledge came from to get you there.
|
| What if I read your codebase, and then years later while
| programming for myself I inadvertently use solutions you
| came up with while thinking I came up with it myself?
|
| There really are no hard set rules, and this is something
| that is handled on a case-by-case basis based on whether
| or not a convincing argument can be made that you copied
| a novel idea from someone else and claimed it as your
| own.
|
| We can argue the semantics of it all we want, but the
| subject area is an active battleground. Typically it only
| matters when money starts to get involved, since no one
| usually presses the issue or gets involved with random
| personal projects. So when an enterprise level company
| leverages that lack of caring into a proprietary pay-to-
| use project that operates by copying and pasting code
| from copyrighted material, then it seems like a case
| might be able to be made for it.
| ipaddr wrote:
| Someone trademarked the word THE yesterday and a few common
| musical notes and your video gets banned
| highwaylights wrote:
| This seems disingenuous.
|
| People don't have a problem that AI is being used in some form
| to provide the service.
|
| The complaint is pretty clearly that code is being lifted from
| repositories without attribution or compensation, and being
| redistributed into other applications.
|
| How impressive the work behind copilot is or is not really
| isn't relevant.
| tiborsaas wrote:
| I've made use of a ton of open source tools and have not paid
| any attribution or compensation. By made use of, I mean I
| used them as their intended purposes and not their source
| code. I have a FOSS OS, server, CMD tools and libraries
| powering my ideas, it's part of the deal that I don't have to
| pay.
|
| If I modify them I know what I have to do, but Co-pilot is
| somewhere in-between the two, it's abstracting knowledge from
| these codebases. We don't yet know how to deal with it
| properly, but this will change with time, that's why having
| these conversations is important.
|
| I think that AI models will gain a new legal state, whatever
| they learn will be considered original work if it's not
| repeating non-trivial work 1:1.
| moffkalast wrote:
| > it's not repeating non-trivial work 1:1
|
| But that's basically all copilot does? It's just a fancy
| compression system with a search function.
| tiborsaas wrote:
| No, it customizes the snippets to your context, the code
| is synthesized and not looked up in a db like a web
| search engine.
| ay wrote:
| I tried it for the first time today, so treat this with a
| grain of salt.
|
| https://twitter.com/ayourtch/status/1539928018138931200
| is my experiment. The code in question has a very
| specific format - it's C with a _lot_ of macro sauce. I
| described the intent in the comment and pasted the
| includes lines. Then I started the #define of a unique
| looking token, and it added the lines with the correct
| boilerplate. What you see in gray is more boilerplate
| that it suggests when prompted.
|
| I would dare to assert that "xxxayourtchtestxxx" is not
| going to be in anyone else's code than mine.
|
| So you can see the example of copilot generating
| completely new code.
|
| Not saying it's 100% of what it does - but this side
| looks very useful.
|
| I also did a test with Rust: described a function
| canonicalizing MAC address, and then when it saw #[test]
| prompts, it started to make very passable unit tests for
| the function which was not even written yet - it was only
| the comment of what it would do.
|
| Also a massively useful lever to have, if it can do so
| consistently.
|
| My attempts to make it generate a bug-free
| canonicalization function didn't work - but it was
| interesting to see it try different approaches based on
| the existing test code (and no, they didn't always
| satisfy the tests, unlike one would expect :)
|
| So this angle is "pair programming with a creative
| novice", which also can be useful - it can give ideas to
| explore that you didn't think of.
|
| Of course this was all fairly trivial code, I do not know
| yet how it will behave in a more tricky situation.
| moffkalast wrote:
| But it kind of is when you think about it. Network
| weights are just a db written in an incomprehensible
| format and the synthesis part is searching and converting
| it back to readable data.
|
| Even if it changes the var names and formatting a bit,
| it's still at best highly derivative. And at worst it
| spits out the exact code verbatim.
| tiborsaas wrote:
| > Network weights are just a db written in an
| incomprehensible format
|
| That makes all the difference IMHO, its complexity makes
| it much more than "just a DB". The synthesis part takes
| into account the context also, so it does intelligent
| things automatically, a smart SQL query does not.
|
| My brain also works kinda like this. My knowledge is
| encoded in an incomprehensible format and I convert my
| knowledge into code based on the problem at hand.
| csee wrote:
| This is how it always works, though. Moderna is standing on
| the shoulders of centuries of cumulative human knowledge
| without compensating all the sources of that knowledge.
| Musicians learn from other musicians and imitate to an
| extent, which is why all the musicians in a genre sound very
| similar, and we don't see present day rappers compensating
| the previous generation of rappers.
|
| This is where some modest taxation comes in. To reallocate a
| slice of the output of value creation to its actual source in
| a rough kind of way wherever more direct compensation isn't
| feasible.
| cycomanic wrote:
| > Musicians learn from other musicians and imitate to an
| extent, which is why all the musicians in a genre sound
| very similar, and we don't see present day rappers
| compensating the previous generation of rappers.
|
| You clearly don't know how copyright around sampling works.
| Yes rappers are paying shitloads to previous generation
| musicians for samples they use.
| csee wrote:
| Sure, if we're talking about sampling, which is analogous
| to co-pilot copy and pasting chunks of code verbatim
| (which we've seen happen). But the complaints about co-
| pilot go far deeper than that. Quoting from the tweet:
| "it _just_ sells code other people wrote". Do musicians
| "just" copy from all the people they've been inspired by
| and learned from?
| cycomanic wrote:
| What does "inspired" mean in the context of a computer
| program?
| Dracophoenix wrote:
| > This is where some modest taxation comes in. To
| reallocate a slice of the output of value creation to its
| actual source in a rough kind of way wherever more direct
| compensation isn't feasible.
|
| I was with you until this statement. The vast majority of
| society consumes, but doesn't create something new in the
| process. I'm bewildered as to why you think taxation is a
| solution rather than a disincentive towards creating. As
| far as compensating the giants upon whose shoulders most
| stand, there are plenty of vehicles for that: royalties,
| patents, copyrights, pensions, awards and prizes, paid
| fellowships, etc. These are relatively easy to calculate
| and write a contract for.
| jacquesm wrote:
| Yes, but those humans are humans, not machines. With
| machines the scale changes dramatically. Which,
| incidentally is something copyright law has addressed
| explicitly: if you mechanically transform at best you end
| up with a derived work.
| csee wrote:
| I don't understand the difference between Co-Pilot on the
| one hand and Moderna (on the shoulders of medical
| research) or SpaceX (on the shoulders of physics
| knowledge and cumulative rocket engineering knowledge) on
| the other. They all heavily use technology, automation
| and machines. I don't see where the distinction is coming
| from, and if there is a technical legal distinction, is
| it an ethically important one?
| lelanthran wrote:
| > I don't understand the difference between Co-Pilot on
| the one hand and Moderna (on the shoulders of medical
| research) or SpaceX (on the shoulders of physics
| knowledge and cumulative rocket engineering knowledge) on
| the other. They all heavily use technology, automation
| and machines. I don't see where the distinction is coming
| from, and if there is a technical legal distinction, is
| it an ethically important one?
|
| They are all in compliance with intellectual property
| laws? Seriously, that's a bloody big difference.
|
| Co-pilot is _not in compliance with many of the source
| code it is using!_
|
| Whether you like it or not, compliance with the law is
| necessary.
| meheleventyone wrote:
| There are thousands of novel decisions in the work of
| Moderna and SpaceX beyond their cultural starting points.
| Same thing with art. Copilot isn't inventing nor is
| DALLE-2 being artistic.
| jacquesm wrote:
| The distinction is a legal one: intellectual property can
| not be re-used without permission of the rights holder,
| be it a patent or a chunk of source code.
|
| And you can bet that SpaceX using physics knowledge and
| cumulative rocket engineering knowledge are very careful
| to either license the tech they use or be very explicit
| about documenting their own.
|
| That you can't see the difference is entirely on you,
| going 'against the flow' of society sometimes leads to
| change but more often it simply results in friction and a
| lack of comprehension.
|
| Keep in mind that open source is based on copyright law,
| and without copyright law the protections that open
| source offers would be gone.
|
| To give an extreme example: if you had a chunk of
| software that was constructed in such a way that it would
| spit out a complete copy of 'the Gimp' without the
| license file if you started to write an image processing
| program that would be a very clear case of copyright
| violation.
|
| If you then start breaking the Gimp down into smaller and
| smaller re-usable fractions at some point you might be
| able to argue that such a generic and oft used snippet
| should be free of copyright. But that only works as long
| as you then don't string together a whole pile of pieces
| that you each copied somewhere else, the whole idea is
| that your creation is an original one.
|
| Medical research (which quite often leads to patents,
| which I don't believe should be possible, especially if
| that research was publicly funded) and physics knowledge
| are of a different kind than copyrighted program code.
| The latter would be better compared to universally
| present language constructs and constraints, such as
| 'memory management', 'data manipulation' etc. Once you
| make those explicit in an implementation copyright
| applies.
|
| Or, to make another analogy: it's like comparing the
| skill of writing to the product of that skill. The skill
| isn't protected, but the output of the act of writing is.
| Xunjin wrote:
| An amazing argument and analogy, also I do agree about
| Medical research, being possible to patent a work which
| is publicly funded is an A*** move.
| Ygg2 wrote:
| > I don't see where the distinction is coming from
|
| Humans use reasoning. Copilot just guesses the likeliest
| next word.
|
| See the Quake's fast inverse sqrt code incident:
| https://twitter.com/mitsuhiko/status/1410886329924194309
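|
| To make "guesses the likeliest next word" concrete, here is
| a deliberately tiny sketch: a bigram counter, which is an
| assumption chosen for illustration and NOT how Copilot is
| built. The names `train` and `complete` are hypothetical. It
| shows that when the training text offers only one
| continuation, always picking the most frequent next token
| reproduces that text verbatim.

```python
from collections import Counter, defaultdict


def train(tokens):
    """Count which token follows which in the training sequence."""
    following = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        following[current][nxt] += 1
    return following


def complete(model, start, max_tokens=10):
    """Greedily append the most frequent follower of the last token."""
    out = [start]
    while len(out) < max_tokens and out[-1] in model:
        out.append(model[out[-1]].most_common(1)[0][0])
    return out


corpus = "return sum ( numbers ) / len ( numbers )".split()
model = train(corpus)
print(" ".join(complete(model, "return", max_tokens=len(corpus))))
```

| A real model is vastly larger and conditions on much more
| context, so this only illustrates the "likeliest next token"
| idea, not Copilot's actual behaviour.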
| olalonde wrote:
| > Sorry for the unproductive tone of this comment, but there's
| something about the attitude of this tweet that really grinds
| my gears.
|
| FWIW the author appears to be a professional woke activist.
| ParetoOptimal wrote:
| You can be both a professional software developer and a
| caring human that considers ethics and exercises empathy.
|
| It's easy to convince oneself they can only be a professional
| developer to escape ethical responsibilities which require
| significant time and energy.
| olalonde wrote:
| Of course you can and you should. But going from the
| Twitter bio and personal website, it doesn't appear to be
| the case here. They're an activist who lives from
| soliciting donations and selling 30$ videos on how to be
| anti-racist (like, literally).
| ParetoOptimal wrote:
| > They're an activist who lives from soliciting donations
|
| Assuming this is the case, are you concluding they don't
| have time to be a real software developer?
|
| Maybe they were a professional developer, but now are an
| activist 50% of the time.
|
| > selling 30$ videos on how to be anti-racist
|
| Is the indictment here that they are a capitalist?
|
| My basic point is you seem to really want to dismiss
| their views you don't like by arguing they aren't
| credible rather than attacking their ideas.
| olalonde wrote:
| > Is the indictment here that they are a capitalist?
|
| No and they are actually anti-capitalist according to
| their Twitter bio.
|
| > My basic point is you seem to really want to dismiss
| their views you don't like by arguing they aren't
| credible rather than attacking their ideas.
|
| The comment I was replying to already did a good job at
| that, I was just adding some context.
| dxdm wrote:
| Doesn't mean what they're saying is wrong. Probably makes
| more sense to attack the substance of their argument, and
| not their bio.
| [deleted]
| olalonde wrote:
| The comment I was replying to already did a good job at
| that, I was just adding some context because it helps
| explain the attitude.
| hdjjhhvvhga wrote:
| > Any time someone invents something new and incredible,
| there's always a crowd of negative nancies eager to discredit
| and explain why the invention is nothing new and a detriment to
| society.
|
| It is not true. Whenever there is something really useful,
| everybody is happy, and while of course there are always some
| naysayers, they're very few.
|
| However, when you do something controversial, you can expect to
| hear criticism. You are of course free to dismiss that
| criticism, but when a lot of people are telling you what you
| are doing is unethical, maybe it's time to stop and think about
| it.
| teakettle42 wrote:
| My code is shared under a license (MIT) that mandates
| attribution.
|
| That's all I ask -- if you use my code, give me credit.
|
| Stealing my code to train your bot -- which will replicate
| portions verbatim! -- is no different whatsoever than the
| casual plagiarist that copies and pastes a novel snippet
| manually.
|
| It's absolutely my legal and ethical prerogative to complain
| about people stealing my code by failing to respect the license
| under which it was freely provided.
| Xunjin wrote:
| It would be great if Copilot showed where it found this
| snippet and gave the person credit for it. Even if it's
| unlicensed.
| seba_dos1 wrote:
| If it's unlicensed, then you can't use it at all, so giving
| attribution wouldn't change much in that case.
| tiborsaas wrote:
| Is it really stealing when your code is used to change a
| parameter value from 0.3623727247 to 0.3623727321?
| iamevn wrote:
| It does not matter what the internal representation is.
| What matters is that Microsoft is selling a tool which
| reproduces non-public domain works while claiming to grant
| the user ownership of the output.
| isitmadeofglass wrote:
| Yes but,
|
| Sorry for the unproductive tone of this comment, but there's
| something about the attitude of this tweet that really grinds
| my gears. Any time someone invents something new and
| incredible, there's always a crowd of negative nancies eager to
| discredit and explain why the invention is nothing new and a
| detriment to society. I don't understand why someone would
| willingly share their code on github where it is publicly
| available just to complain when others make use of that
| knowledge. 'co-pilot just sells code other people wrote' is
| such a ridiculous understatement of what co-pilot does. Instead
| of marvelling at the human ingenuity that went into creating
| it, they sneer at the audacity of openAI to do something
| without first asking their permission.
|
| -- This comment brought to you by HN-Comment-AI (c)
| bryanrasmussen wrote:
| whoa, I think this should definitely be highlighted far and
| wide on the internet, think of the ingenuity of the people
| who made the HN-Comment-AI, it's probably the smartest
| comment bot out there, able to take the ramblings of people
| on HN and nonetheless generate a comment so astute!
|
| Although I have to say the use of the phrase 'negative
| nancies' shows that even the best machine-learning algorithm
| still comes up with unlikely to occur in real life text.
| Chris2048 wrote:
| > willingly share their code on github where it is publicly
| available just to complain when others make use of that
| knowledge
|
| because it's not unconditional, there are often licence terms
| of usage, and copilot is potentially laundering those.
| sAbakumoff wrote:
| It's the negativity bias beauty in action. You have it too.
| gumby wrote:
| _People_ get paid to write code having learned from writing
| code for others and from reading code others wrote. In this
| regard I don't see why github copilot is any different.
| [deleted]
| lupire wrote:
| People don't memorize chunks of code and copy it.
| giaour wrote:
| Sometimes people do, and in any case copyright isn't
| limited to just verbatim copies. You can't, for example,
| reuse characters or plots from other works of fiction in
| your own novel, even if you rewrite it in your own words:
| https://en.m.wikipedia.org/wiki/Copyright_protection_for_fic...
| gumby wrote:
| People often _do_ write things because they learned a
| common approach at a previous job or because they saw such
| an approach when reading someone else's code. People are
| often hired specifically because they have experience in a
| certain area from a previous employer, so are doing the same
| sort of thing at a higher level.
|
| We fought this battle over a couple of decades with remix
| culture ("you stole that line/beat out of my song!") and
| the world is better because the over-clingers lost.
|
| There is no shortage of reasons not to like copilot, but I
| don't consider this one of them.
| pwdisswordfish9 wrote:
| 'Facebook just sells personal information of other people' is
| such a ridiculous understatement of what Facebook does. Instead
| of marvelling at the human ingenuity that went into creating
| surveillance capitalism, they sneer at the audacity of Facebook
| to do something without first asking their permission.
| nextaccountic wrote:
| > I don't understand why someone would willingly share their
| code on github where it is publicly available just to complain
| when others make use of that knowledge.
|
| Because they shared the code under a license, and they have the
| right to complain if people use that code but don't follow the
| license.
|
| For example, what happens if Github Copilot spits a copy of
| some copyrighted code verbatim? Is laundering open source code
| through a machine learning model a loophole for not having to
| follow the license?
|
| Often following the license is as simple as giving credit to
| the original author.
| bborud wrote:
| I've done a fair number of technical due diligence projects
| on acquisitions and potential partnerships, and on some
| project I've hired outside firms to analyze the code and
| figure out its origins and what licenses apply.
|
| There are tools that will analyze a codebase and identify
| where chunks of varying size seem to come from. Mostly to
| determine if the code is encumbered by problematic licenses,
| but also to detect where the programmers may have borrowed
| code from.
|
| If memory serves, some of these companies also have closed
| source codebases in their database, enabling them to detect
| if unpublished code has been re-used.
|
| The times I've used this in due diligence it has rarely been
| a deal-breaker when we do find large chunks of code that may
| be problematic. For instance due to licensing terms that are
| not acceptable. You just make a note of it and have them
| rewrite the code before the transaction can take place. (Or
| you figure out if you can accommodate the license terms).
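|
| The comment above does not say how those tools work
| internally. One common family of techniques is fingerprinting
| overlapping windows of normalized lines; the sketch below
| assumes that approach purely for illustration. The function
| names `fingerprints` and `overlap` and the window size of 3
| are made up here, and commercial scanners are certainly more
| sophisticated.

```python
import hashlib


def fingerprints(source, window=3):
    """Hash every run of `window` consecutive non-blank lines.

    Whitespace inside each line is collapsed first, so re-indenting
    or re-spacing copied code does not hide it.
    """
    lines = [" ".join(line.split()) for line in source.splitlines()]
    lines = [line for line in lines if line]
    return {
        hashlib.sha1("\n".join(lines[i:i + window]).encode()).hexdigest()
        for i in range(len(lines) - window + 1)
    }


def overlap(candidate, reference, window=3):
    """Fraction of the candidate's fingerprints found in the reference."""
    cand = fingerprints(candidate, window)
    if not cand:
        return 0.0
    return len(cand & fingerprints(reference, window)) / len(cand)


reference = "a = 1\nb = 2\nc = a + b\nprint(c)\n"
copied = "  a = 1\n\n  b   = 2\n  c = a + b\n  print(c)\n"
unrelated = "x = 10\ny = 20\nz = x * y\nprint(z)\n"

print(overlap(copied, reference))     # re-indented copy
print(overlap(unrelated, reference))  # unrelated code
```

| A scanner along these lines would compare a codebase against
| a large index of known sources and report the matching
| chunks together with the licenses of their origins.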
| nextaccountic wrote:
| Yeah, but wouldn't it be great if the tool that performed
| "AI-generated code" were also required to run such analysis
| themselves, to eliminate this licensing violation at the
| moment it were inserted?
|
| It's as if Microsoft were banking on the fact that most
| violations will be unnoticed
| Tryk wrote:
| This doesn't address the point of the Tweet, you are simply
| attacking the form of their argument.
|
| Moreover it is possible to BOTH marvel at the human ingenuity
| that went into making copilot AND disagree with their methods.
| Some things can be marvelous and wrong at the same time.
| rglullis wrote:
| > why someone would willingly share their code on github where
| it is publicly available just to complain when others make use
| of that knowledge.
|
| For other individuals to collaborate, to make the software
| available to other people, etc. Certainly not for github's
| profit and much less for the benefit of github's customers who
| will have access to open code that violates license agreements.
| matthewmacleod wrote:
| I also disagree with the tone of that tweet, but your dismissal
| is equally shallow and gear-grinding.
|
| There are real, serious, and genuinely interesting issues to be
| discussed regarding Copilot. It is neither "just selling code
| that other people wrote", nor is it something that we should
| applaud merely because it demonstrates "human ingenuity".
|
| The comments here regarding this are honestly a total dumpster
| fire. It's mostly a bunch of paper-thin hot takes, either:
|
| - The blatantly stupid "you willingly shared your code so why
| are you complaining that one of the world's biggest companies
| is now hoovering up code from your carefully-selected open-
| source license and reselling it as a service!!!"
|
| - The blatantly lying "I have literally never looked at any
| other computer software while developing any obviously anybody
| who has ever seen other source code is a plagiarist"
|
| It's dumb because there is an _actual interesting discussion_
| here but I guess we're not going to bother having it.
| HumanReadable wrote:
| Fair enough, I agree.
|
| I actually didn't intend for my comment to be an argument in
| favour or against, and I am a bit surprised it is the most
| upvoted of the section.
|
| I agree that there's a pretty interesting discussion to fair-
| use and the limits of copyright, and that my original comment
| was not conducive to having that discussion. In my defense,
| neither was the tweet this thread is about!
| akagusu wrote:
| > I don't understand why someone would willingly share their
| code on github where it is publicly available just to complain
| when others make use of that knowledge.
|
| People like you should understand that publicly available code
| doesn't mean "do whatever you want" code.
|
| The majority of publicly available code hosted on Github has a
| license that tells you what you can and what you cannot do with
| that code.
|
| If someone uses this code without respecting the license,
| authors have the right to complain and even legally enforce the
| license if they want.
|
| Now, you should know that there's nothing "cool" to take other
| people's work without permission.
| ricardoplouis wrote:
| Wouldn't you rather have a healthy dose of skepticism and
| pessimism surrounding new inventions? Even if the negativity is
| off base, it's far more preferable to a world where everyone is
| always positive and praises what geniuses the creators are. The
| former at least breeds discourse while the latter only serves to
| make people feel good.
| bambax wrote:
| The world would probably be a better place if there were no
| copyright.
|
| But the world we actually live in is one where corporations
| have copyright, and individuals don't.
|
| That's what irks people, I think rightly.
| DoreenMichele wrote:
| Meanwhile, creators of FOSS projects are often underfunded and
| lots of people are in such dire straits that rich people talk
| of mollifying them with a few paltry dollars via UBI rather
| than fix anything.
|
| That's likely the crux of the issue. If you do it right, you
| can steal from other people and get rich. Meanwhile, those same
| people (whose work was stolen) may be left out in the cold no
| matter how original, creative, hardworking etc they are.
| Mizza wrote:
| Pretty fucking simple explanation for it, actually:
|
| I don't make Free software so that Microsoft can sell it to
| people for use in proprietary projects.
| zitterbewegung wrote:
| Why can't startups understand what an open source license is?
| Apache 2.0 could be ingested by this tool but it is a horrible
| license for your database as a service. AGPL would be a great
| license for a database as a service but should not be ingested
| by OpenAi / GitHub copilot.
| hk1337 wrote:
| Usually they want some recognition for their contribution and
| with GitHub copilot they get none of that.
| jacquesm wrote:
| They are complaining about license violations, they are not
| pissing on this incredible (is it?) achievement.
|
| Reselling other people's content like this without attribution
| (which, is a pretty mild form of payment) is not nice. But at
| least you now have one more reason in the list of reasons why
| Microsoft acquired Github: to be able to launder their open
| source contributions and resell them.
| ThePhysicist wrote:
| I mean I'm not an expert but it's a valid point as people share
| code under a given license, and as far as I'm aware Copilot
| does not make this knowledge available. Nothing to do with the
| fact that Copilot is an amazing technological achievement.
|
| If I, as a human, go to a public repository on Github and
| copy/paste a non-trivial 200 line code snippet into my
| proprietary code base I have to abide by the license of that
| original code, even if I slightly modify it. I don't see how
| this cannot be true for Copilot. I'm sure the legal folks at
| Github have thought of a response though, you could e.g. argue
| that the snippets produced by Copilot are not affected by the
| copyright of the original author as they do not reach the
| required threshold of originality. Seems rather shaky to me
| though.
| thewoolleyman wrote:
| Artificial Intelligence is causing us to revisit the difference
| between free as in beer and free as in speech
| (https://en.wikipedia.org/wiki/Gratis_versus_libre).
|
| It is putting a new spin on some traditional Open Source Lessons
| (https://en.wikipedia.org/wiki/The_Cathedral_and_the_Bazaar#L...)
| .
|
| People share and reuse unattributed snippets of MIT-licensed
| and GPL-licensed code on the internet all the time, on
| StackOverflow, etc.
|
| StackOverflow is profiting from that activity indirectly by
| facilitating it. They profit passively through ad revenue, and
| actively through the Teams subscription offering.
|
| But nobody seems too upset about that.
|
| How is an AI which facilitates the same code sharing
| fundamentally any different? Because it's scraping it itself,
| rather than humans contributing it?
|
| Seems like a tenuous argument at best.
| antihero wrote:
| I mean, if it's autocompleting a fairly simple line, and can do
| that because it's analysed a lot of lines, I don't really see
| that as "stealing anything".
|
| If you are using it to write whole complex functions that are the
| same as other people's, I guess that is copying.
|
| But if you do the second thing you are not a great dev, and would
| have probably ended up copy pasting it anyway.
|
| I think the first use case is far more common, and creating
| boilerplate that is so generic you could never really attribute
| it anyway.
| dobin wrote:
| I don't see it as "stealing" either. The neural network was
| trained with code as input. It's creating code as output. The
| output has nothing to do with the input once it is trained. Do
| people not know how neural networks work?
|
| It's like saying GPT-3 created text is copyright infringement,
| because some author used the same sentence in a book before.
| eloisius wrote:
| So if I fit a network to output entire chapters of a book
| when given the chapter number as input, I can print and sell
| copies of it that way?
| f1refly wrote:
| 1. Create a neural network that produces an x264+dts stream
| of a movie
| 2. distribute it
| 3. checkmate copyright lawyers
| ImprobableTruth wrote:
| Overfitting: One weird trick that copyright lawyers don't
| want you to know!
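| The overfitting point can be shown without a neural network at
| all. Below is a toy character-level model (a hypothetical
| stand-in, not how Copilot or GPT-3 work): with a long enough
| context it simply memorizes its single training text and plays
| it back when given the opening characters.

```python
from collections import defaultdict


def train(text: str, order: int) -> dict:
    """Count which character follows each `order`-character context."""
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(text) - order):
        counts[text[i:i + order]][text[i + order]] += 1
    return counts


def generate(model: dict, seed: str, length: int) -> str:
    """Greedy decoding: always emit the most frequent next character."""
    order = len(seed)
    out = seed
    while len(out) < length:
        followers = model.get(out[-order:])
        if not followers:
            break  # unseen context: the model has nothing to say
        out += max(followers, key=followers.get)
    return out
```

| Trained on one text in which every 5-character context is
| unique, generate(train(text, 5), text[:5], len(text)) returns
| the text itself. The "weights" are just the book in another
| encoding.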
| wodenokoto wrote:
| > If you are using it to write whole complex functions that are
| the same as other people's, I guess that is copying.
|
| > But if you do the second thing you are not a great dev, and
| would have probably ended up copy pasting it anyway.
|
| How would I know that the boilerplate I ask Copilot to write
| for me is copied verbatim from a codebase that neither I nor
| Microsoft has a license to use?
| carom wrote:
| My problem is with the weights not being released. They are a
| derivative work of open source code in the most literal sense.
| The weights would not exist without those lines. Gradient
| descent is using literal derivatives.
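| To make the "literal derivatives" remark concrete, here is a
| minimal gradient descent sketch for fitting a line. The function
| name, learning rate and step count are assumptions chosen for
| illustration. The point is only that the final weights are
| computed entirely from derivatives of a loss over the training
| data.

```python
def fit_line(points, lr=0.05, steps=2000):
    """Fit y = w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(points)
    for _ in range(steps):
        # Partial derivatives of the loss with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in points) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in points) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

| Feed it different points and you get different weights. Whether
| that makes the weights a derivative work in the legal sense is a
| separate question this sketch cannot answer.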
| afiori wrote:
| When Oracle won its copyright lawsuit against Google it was
| because of an 8-line bounds-checking utility function.
| redox99 wrote:
| Source?
| afiori wrote:
| https://news.ycombinator.com/item?id=11722514
| redox99 wrote:
| Thank you
| dekhn wrote:
| Not only was that not the only code in question, you left out
| the conclusion!
| https://en.wikipedia.org/wiki/Google_LLC_v._Oracle_America,_....
| It went to the Supreme Court, they concluded fair use, FU
| Oracle.
| alpaca128 wrote:
| The first can be automated without ML though. And once you use
| ML you cannot guarantee it won't copy-paste existing code.
|
| This whole thing would be fine if GitHub hadn't just used all
| public code on their platform, ignoring all involved licenses.
| xupybd wrote:
| It changes the code for use. I'm not sure it can be
| considered a copy. It's much like reading someone else's code
| and drawing ideas and patterns from that code.
| afiori wrote:
| Copyright sensitive environments are very careful not to do
| that.
| alpaca128 wrote:
| It has been shown often enough that Copilot can reproduce
| exact copies of snippets.
| rob74 wrote:
| The problem is, if they had used only code with a license
| that allows copying without attribution, there wouldn't have
| been a lot of code left...
| alpaca128 wrote:
| Difficulty doing something legally doesn't justify breaking
| the law.
| rob74 wrote:
| > _But if you do the second thing you are not a great dev, and
| would have probably ended up copy pasting it anyway._
|
| If you do that on your own, it's your (legal) responsibility.
| If Copilot does it for you, it's GitHub's/Microsoft's
| responsibility.
| __warlord__ wrote:
| Why should it be GitHub's/Microsoft's responsibility? No one is
| forcing you to use Copilot.
|
| If I use Grammarly, are they responsible for what I am aiming
| to write?
| purerandomness wrote:
| Does Grammarly generate pages of content for you?
| 32bitkid wrote:
| If I pay for grammarly, and it plagiarizes an existing work
| but represents it as an entirely new, independent work and
| I am unaware of the existing work that is being stolen, who
| is doing the stealing?
| scotty79 wrote:
| If you pay a shady character to get you a modern laptop
| for $100 you can't claim that you were unaware that it
| was most likely stolen, and the fact that you paid
| something for it doesn't absolve you morally.
| ClumsyPilot wrote:
| Does the shady guy have his name on the side of a building,
| and run ads: "buy my shady stuff" and then pay taxes on
| his earnings? That kind of shady guy?
| scotty79 wrote:
| Sometimes. Like Amazon, widely known for their workers
| and 3rd party vendor exploitation practices.
|
| You can no more claim ignorance of where the github
| copilot code comes from than where the Amazon's low, low
| prices come from.
|
| Whether you care is totally on you regardless of whether
| you pay or not. You pay for product or service not moral
| absolution.
| nnoitra wrote:
| scotty79 wrote:
| I see you just came from there. Welcome to HN. Please
| start by reviewing FAQ and Guidelines.
| ClumsyPilot wrote:
| Are thousands of Amazon employees going to be in the same
| docket with me?
| seanmcdirmid wrote:
| This makes more sense for text message autocomplete: if you
| just take the suggested next word after a one-word start,
| indeed, it might reproduce a Wikipedia entry. But what did
| you expect? The same would be true with Grammarly if you
| somehow got it to produce a bunch of new text. You
| expected garbage, but somehow infringed on copyright
| instead. But I guess I think the user deserves some
| responsibility for realizing that their expected garbage
| output isn't garbage for some reason.
| ClumsyPilot wrote:
| So it's my job to check my supplier, to make sure lines
| from co-pilot are legit.
|
| At the same time when fast fashion companies sell T-shirts
| made with slave labour, it's not the company's
| responsibility to check what their suppliers are doing.
|
| And if Tesla autopilot kills you and your family it's not
| their fault either.
|
| Neoliberal morality - companies are never accountable for
| anything, it's heresy to suggest they should do their job
| properly.
| wang_li wrote:
| Other than the first sentence nothing you wrote is true.
| If a company doesn't do due diligence on their suppliers
| they face fines and possibly criminal charges. The news
| came out the other day that the NTSB is considering
| whether to require Tesla to recall all their vehicles
| with self driving enabled. Companies of all types face
| huge fines and civil liability for product safety issues.
| ClumsyPilot wrote:
| Have you never googled "slavery fast fashion"?
|
| Zara's clothes sometimes have notes in their pockets from
| people being held as slaves, pleading for help. I haven't
| heard of anyone going to jail.
|
| Most of our electronic waste ends up illegally exported to
| poor countries. Again, when was the last time someone
| faced the music for that?
| alkonaut wrote:
| > If Copilot does it for you, it's GitHub's/Microsoft's
| responsibility.
|
| Is this true? It hasn't been tried yet I assume?
| Hamuko wrote:
| > _If Copilot does it for you, it's GitHub's/Microsoft's
| responsibility._
|
| GitHub/Microsoft says that it's still your responsibility.
|
| > _You should take the same precautions as you would with any
| code you write that uses material you did not independently
| originate. These include rigorous testing, IP scanning, and
| checking for security vulnerabilities. You should make sure
| your IDE or editor does not automatically compile or run
| generated code before you review it._
|
| I'm not really sure how I am supposed to go about validating
| that I can in fact use this code that the magical black box
| barfed into my IDE using a bunch of different weights.
| nnoitra wrote:
| What horrible, shady behavior.
|
| Give us your money but you are responsible for the code
| that OUR tool generates.
| dragonwriter wrote:
| > GitHub/Microsoft says that it's still your responsibility
|
| If Copilot is fair use, and has no restrictive license,
| then how is it anyone's responsibility?
|
| If Copilot isn't fair use, it's Microsoft's responsibility.
|
| (For copyright; for patent that's another issue, but you
| can violate patents by similarity without exposure or
| copying, anyway.)
| Hamuko wrote:
| _Training_ Copilot is fair use, using Copilot is ???.
| lowercased wrote:
| Let MS buy the BlackDuck scanner and integrate it into
| GitHub/CoPilot. They could then suggest code and also scan
| it for any license violations, and give you both sides of
| the equation in the same tool.
| pronik wrote:
| You are responsible for your tool use. That's the same
| discussion as with whether uTorrent is responsible for your
| torrenting copyrighted stuff or with Tesla's auto-pilot. You
| buy the tool, you are responsible for what you create with
| the tool.
| [deleted]
| stavros wrote:
| Napster was liable for copyright infringement.
| pronik wrote:
| True, however, the users have been liable too. If my
| company gets sued because I used Copilot, it won't matter
| that much that the plaintiff also sued GitHub/Microsoft.
| strictnein wrote:
| Napster's raison d'etre was copyright infringement.
| stavros wrote:
| Which they were then liable for.
| Ciantic wrote:
| I'm a bit mixed on this. The code Copilot usually autocompletes
| for me is not particularly novel; it's just mundane stuff I would
| write anyway. Most of these snippets are not copyrightable in my
| opinion, because they were obvious in the first place. Like CSS
| nth-child odd/even logic, or in one case it filled in ~10 lines
| of JS logic for filtering rows by a category stored in dataset,
| which I would have written anyway.
|
| Then there are cases where it amazes me completely: it wrote 10
| lines of C++ code for rendering monochrome glyphs as bits
| using the FreeType library. It did have an odd, subtle bug,
| though: the glyphs came out reversed, and it worked with only a
| certain font size, which it seemed to pick up from a different
| file altogether.
| BiteCode_dev wrote:
| It is incredible to use though. I pasted the return value of an
| API call in a comment, then started to write a schema class.
| Copilot just created the entire class for me. I wanted to extract
| a subset of the data, I typed get_<_name_of_the_subset>(), it
| wrote the code I would have written.
|
| So even without using someone else's code, just the pattern
| understanding and the production of simple boilerplate code is
| great.
| iLoveOncall wrote:
| Github Copilot is selling code other people wrote as much as the
| author of this thread is profiting from words other people
| invented.
|
| Absolute nonsense.
| nextaccountic wrote:
| The difference is that words aren't copyrighted and don't
| come with an open source license.
| coldtea wrote:
| > _Hector Martin: If you use Copilot, you are basically playing
| Russian Roulette that the random mashup of existing, copyrighted,
| heterogeneously licensed code that you get out of it qualifies as
| an original work, mostly by chance. Or that nobody will ever sue
| you otherwise._
|
| Well, that's already the case with Stack Overflow copypasta
| enterprise code. If anything, use of Copilot would be an
| improvement...
| t0suj4 wrote:
| That quote applies to any creative work. Be it code, audio or
| video.
| coldtea wrote:
| He talks about code, and Copilot works with code, so I'm not
| sure how it "applies to any".
|
| If you mean that if you make a "random mashup of existing,
| copyrighted, heterogeneously licensed" works of art
| (audio/video), it also applies that you might be sued for it,
| then yes.
|
| But that's not much of an issue with Copilot if you're using
| it for enterprise code that's already a mashup of copypasted
| "existing, copyrighted, heterogeneously licensed" code and that
| you won't release and nobody will see anyway.
|
| Whereas audio/video you generally want to release.
|
| If you make them for your own consumption, then it's my
| response that rather applies: since nobody will see it, and
| you don't release/sell/circulate it, you can go ahead and mix
| Michael Jackson, Disney and Star Wars material - nothing will
| happen to you.
| Hamuko wrote:
| If you post content on Stack Overflow, your contribution is
| distributed using the CC BY-SA 4.0 license.
| coldtea wrote:
| Yes, but nobody that copies it cares...
|
| (Where nobody is a stand-in term, to mean "less than 1% of
| those do")
| moffkalast wrote:
| > If anything, use of Copilot would be an improvement
|
| What do you mean, Copilot regularly pastes stuff directly from
| SO. One of those automatic doc generators was able to point me
| to the exact answer where one of them was from.
| coldtea wrote:
| That it doesn't just "copy and paste" but does more involved
| "AI" mixing
| moffkalast wrote:
| I don't think renaming variables and adjusting spaces holds
| up in court.
| tagyro wrote:
| Do people really copy/paste from StackOverflow?
|
| I feel this is more a meme, rather than reality. I do check
| StackOverflow, but never have I taken an answer verbatim. I try
| to see if it's the same problem and what was the approach in
| deconstructing it, which I find more useful in the long run.
| Flimm wrote:
| According to Stack Overflow's blog:
|
| "One out of every four users who visits a Stack Overflow
| question copies something within five minutes of hitting the
| page."
|
| https://stackoverflow.blog/2021/12/30/how-often-do-people-
| ac...
| icoder wrote:
| Well, to be fair, most of that is probably just copying
| a particular syntax or built-in function, which (I think?)
| has nothing to do with copyright.
|
| At least for me, that's most of the copies I do, followed
| by the ones that basically are 'call these functions in
| order', then paste it as a comment and use it as cheat
| sheet, and only very rarely I copy a 'creative' snippet
| almost verbatim, like a regexp matching email addresses, a
| to-hex or a crc calculation. And perhaps that's actually
| tricky.
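| For a sense of scale, the kind of "creative" snippet meant here
| is often only a few lines long. A sketch of two such helpers,
| written from the standard CRC-32 definition rather than taken
| from any particular answer (the function names are arbitrary):

```python
def crc32(data: bytes) -> int:
    """Bitwise CRC-32 (reflected polynomial 0xEDB88320), as in zlib/PNG."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; apply the polynomial when the low bit was set.
            crc = (crc >> 1) ^ (0xEDB88320 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def to_hex(data: bytes) -> str:
    """Lowercase hexadecimal encoding of a byte string."""
    return "".join(f"{b:02x}" for b in data)
```

| Two independent authors working from the same specification will
| end up with nearly identical code, which is part of why these
| cases are hard to judge.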
| concordDance wrote:
| Anecdata: Everyone I work with does.
| mullen wrote:
| I catch people using cut and paste code all the time. If
| there is a spelling error in code (Especially if it is in a
| code comment), I can guarantee you that someone copied and
| pasted it from StackOverflow.
| coldtea wrote:
| > _Do people really copy/paste from StackOverflow?_
|
| All the time.
| ldoughty wrote:
| I've done it, and I know others that have, but I think it
| depends on people's definitions of copy/paste.
|
| I've certainly copied a sort anonymous function from SO, it
| was one-liner. Is that copy/paste? or is it only copy/paste
| if it's X lines?
|
| Otherwise I agree, usually I just get hints and go my own
| way.
| Timwi wrote:
| It depends on what you need. In most cases the code on
| StackOverflow is not exactly what you need, so you need to
| understand it in order to adapt it. But if you're looking for
| a specific well-defined algorithm (MD5, say) then you can
| just copy & paste it.
| Aeolun wrote:
| This has more to do with the code never being immediately
| copy-pasteable, not so much my reluctance to copy-paste from
| SO for licensing reasons.
| fimdomeio wrote:
| What AI is showing is the fuzzy line between creating and
| copying. The truth is they are both always present in everything
| we do, we've just been trying to hide it.
|
| So it should be as simple as if you're using other people's
| content for your own profit you should properly compensate them.
|
| Or we could just abolish copyright law and assume that everything
| humans create emanates from culture so it's always collectively
| built and everything should be open source.
|
| Or we just do the same we've been doing. Create even more complex
| laws trying to define this fuzzy line in a way that companies can
| keep profiting from it a lot more than individuals.
| FeepingCreature wrote:
| All I can think of is Steve Yegge [1]: "They have no right to do
| this. Open source does _not_ mean the source is somehow 'open'."
|
| My code is on Github so that people can read it, reuse it and
| learn from it. "The freedom to study how the program works", as
| the FSF says. If some of the people reading it are machines, why
| would that matter?
|
| [1] http://steve-yegge.blogspot.com/2010/07/wikileaks-to-
| leak-50...
| happymellon wrote:
| Because a lot of this code would be put into closed source
| software, which is against the licence and would prevent people
| from exercising the right to study how a program works.
| FeepingCreature wrote:
| But I don't care if closed source programmers read my GPL
| code! The freedom to learn is not copyleft. So long as they
| put independent effort into their work, they're good in my
| book. Shared knowledge is a vital commons, and I'm honored if
| I can contribute to it.
|
| Maybe this goes back to that debunked paper that claimed that
| transformers were only remixing input samples?
| happymellon wrote:
| They aren't reading your code. This is a program
| copy/pasting code without attribution.
| FeepingCreature wrote:
| Again, the paper that said that transformers only
| copypasted input samples was highly misleading.
|
| It seems clear to me that Codex has true understanding.
|
| (Yes, I know that people have gotten secrets to appear in
| the output by prompting it in clever ways. That this
| happens doesn't prove that Codex doesn't understand what
| it's doing, it just shows that Codex doesn't understand
| _everything._)
| tremon wrote:
| I might start considering Copilot if Microsoft were to train it
| on their own internal codebases (Windows, Office, SQL Server).
| Until they do, it's clearly a "tool for thee but not for me" type
| of situation.
| clircle wrote:
| "tool for thee but not for me" <- what does this even mean?
| albertzeyer wrote:
| So, how often does it actually happen? Does it happen more often
| than for a human? Does anyone actually have numbers on this?
|
| Of course, if you provide already a copyrighted prefix, and it
| has seen that code, the chances are high that it would complete
| the copyrighted code (because that is what you actually would
| also expect).
|
| So, for real use cases in the wild, where you write some own real
| novel code, how often would it suggest some copyrighted code? And
| how often would a human?
|
| I have used Copilot the last months and I have never ever seen
| such a case (I can be pretty sure because all the identifier
| names are really unique, and the code was very custom).
|
| However, I assume that I myself might have produced copyrighted
| code unknowingly because if you write common patterns (e.g. some
| tree or graph search, or some sort function, implement LSTM or
| Transformer, whatever), the chances are not so low.
| thih9 wrote:
| Is github copilot using private repositories for the learning
| process?
|
| If yes, how do they mitigate the risk of exposing private data
| when something is quoted verbatim?
|
| If not, then why are repos with non permissive licenses ok?
| mojuba wrote:
| Can I suggest a hypothesis that if you find Copilot useful it
| means the problem you are solving is a boring one? I might be
| wrong of course.
| triknomeister wrote:
| 99% of work in 100% of interesting projects is boring.
| alpaca128 wrote:
| I disagree. Most large projects, software or otherwise, use
| existing parts. If you design an innovative device you'll still
| use some standard components like chips, memory modules etc.
|
| There's already a way to quickly solve the boring parts in
| development - libraries which were built and licensed around
| that purpose. But Copilot passes you code of unknown origin,
| with unknown license terms and no information about how close
| it is to an existing codebase. It's like a person trying to
| sell you Macbooks for a hundred bucks per unit but you don't
| know where they came from and who made the holiday photos
| stored on the harddrive.
| alkonaut wrote:
| 99% of the "problems" I'm solving when I'm working even on very
| interesting and challenging problems, are boring subproblems.
| If I can get those out of the way then that would be great.
| mistercow wrote:
| That hypothesis is easily disproven by spending an afternoon on
| a side project with Copilot.
|
| No matter how interesting your problem is, translating it into
| code is going to involve a lot of grunt work. This isn't just
| boilerplate, but also the large portion of your code which is
| going to be gluing things together.
|
| The time you spend working through those menial parts of your
| code is time when the context of the interesting part of the
| problem fades. Once you get the mechanical stuff out of the
| way, you have to load the interesting stuff back into your
| brain.
|
| This is where AI coding tools really shine. They dramatically
| reduce the intervals between when you can think about the
| actual problem you're solving by letting you get the boring
| mechanics out of the way more quickly.
| mojuba wrote:
| I'm very curious to see some examples where Copilot
| autocompleted something truly useful and saved you time - and
| that also disproves my hypothesis that you are doing
| something boring or with the wrong
| tools/languages/frameworks. Things that a non-ML autocomplete
| could do don't count.
| mistercow wrote:
| I can give you an example of an entire (well, I still
| consider it alpha) library I wrote several months ago,
| using Copilot: https://github.com/osuushi/triangulate
|
| This is an implementation of a 1991 paper on polygon
| triangulation into Go. So the deepest thinking about how to
| solve the problem was obviously already done for me, but
| there were a number of edge cases that I had to invent my
| own solutions to, and the translation itself involved
| keeping a lot of context in my head.
|
| I can't tell you in precise detail what Copilot did, and
| what I wrote by hand. I wasn't taking notes or recording my
| screen. But there's a reason you don't see a lot of blocks
| in there where I forgot to comment anything, because my
| entire process for this was "type what I want to do in
| English, and see if Copilot will generate the next snippet,
| or something close". I didn't do this out of bloodyminded
| dedication to the AI cause, but because it continued to be
| an extremely effective way to get the code written quickly.
|
| I can give a few specifics:
|
| - My linear algebra is rusty, and Copilot was extremely
| helpful here. I would often just type the basic thing I was
| trying to do in pretty vague linear algebra terms, and it
| would generate the formula.
|
| - I wrote a lot of tests like this:
| https://github.com/osuushi/triangulate/blob/main/internal/sp....
| This is a minor thing, but those aren't copy-pasted. Instead,
| I would write
| the first test, and for the most part, I could just type
| something like `func
| TestConvertToMonotones_SquareWithHole`, and it would figure
| out how to adapt the previous test automatically.
|
| - It generates exactly the error strings I want based on
| context an enormous percentage of the time.
|
| I want to stress that I'm just giving a few examples of
| things that I specifically remember because I talked about
| them at the time, not characterizing the majority of the
| experience of using Copilot. The majority of the experience
| of using Copilot is that you write comments, and then the
| things you were about to type appear on the screen before
| you have to type them.
| ilikehurdles wrote:
| When I find myself writing comments of this style I see,
| I usually ask myself if this thing would be better
| extracted into a function. These comments are primarily
| stating the obvious.
|
| If I find myself writing a 200 line function with nested
| or repetitive loops I expect to hear from colleagues
| about how I should refactor it.
|
| I feel that the solution to writing boring, repetitive
| boilerplate shouldn't be to automate writing more of it,
| but to reduce or remove it entirely. Seeing things like
| this just reinforces my preconception that Copilot acts
| in low quality code environments to produce fittingly low
| quality code, or with languages like Java where the
| language is married to boilerplate.
| mistercow wrote:
| This reply feels pretty bad-faith. But you know, feel
| free to open a PR if you have something concrete you feel
| can be improved.
| workingon wrote:
| Seems like a narrow vision. Is every line of code you write to
| solve a problem "not boring"? I solve problems I find
| interesting, but writing matplotlib code to visualize data
| never is.
| trention wrote:
| This is true for the current iteration of the model. Probably
| won't be true at least to an extent in 5 years. Besides, there
| is nothing wrong with solving boring problems. Not everyone can
| be Bjarne Stroustrup.
| viraptor wrote:
| The most interesting problem will have extremely boring bits.
| If you write a CLI tool to solve all of the world's problems by
| channeling magic, you'll still need to add the parameter
| handling and do some error management. Which is repetitive and
| likely well generalised and predictable based on other
| projects.
| para_parolu wrote:
| The problem may not be boring. Typing boilerplate code is. I
| work on games as hobby. Sometimes I implement mechanics
| requiring vector math. Working on mechanics is interesting.
| Writing down math is not. Copilot helps with the latter.
| mojuba wrote:
| Then another hypothesis: you probably haven't found the right
| tools for it yet. I find myself writing boilerplate mostly
| around some obscure system framework calls (iOS/macOS), but
| that's rather rare. But even OS APIs and frameworks do
| evolve over time into requiring less boilerplate. Just take
| the evolution of CoreAudio, the modern Swift interface is so
| much better. So at the end of the day it's about the tools
| and interfaces: boilerplate is rarely absolutely necessary
| with the right tools.
| triknomeister wrote:
| Maybe github copilot is the right tool.
| mojuba wrote:
| I don't think so. A human-verified, tested and maintained
| code is obviously superior to a snippet blindly copied
| and mixed by a statistical system.
| mistercow wrote:
| That's not how you use Copilot, any more than it's how
| you'd use any other autocomplete tool. I don't know why
| so many people seem to think that using Copilot is just
| closing your eyes, hitting tab fifty times, and then
| committing.
|
| You work on your code, Copilot makes a suggestion. You
| _read_ that suggestion and verify that it's close to what
| you were already going to do. If it is, you hit tab, then
| you tweak it. There's nothing blind about this process.
| muzani wrote:
| Yeah, it's for boring problems. Drawing a circle or detecting a
| specific format of number in some string, for example.
| mawadev wrote:
| What stops me from re-uploading copyrighted source, where I
| remove the notices and push it with an MIT license? If such a
| data set has been trained with, how do you get it out?
| GuB-42 wrote:
| > Copilot just sells code other people wrote
|
| So what? Selling code other people wrote is the foundation of the
| free software movement. It is the entire business model of
| countless companies, and it is a good thing. Among them are most
| major linux distro vendors like Red Hat and Canonical.
|
| The value added by Copilot is that they sell you the lines of "code
| other people wrote" that you want, out of billions.
|
| I still think it is derivative work, and that they should only
| process code under permissive licenses, or, if they want to
| include GPL code, make a GPL-only version, usable only for GPL
| projects. I thought it is what they did, there is so much code
| under permissive licenses that it should be enough to train their
| model, but apparently, they don't care, as long as it is public,
| it is included. For me, they are shooting themselves in the foot,
| several companies have already banned Copilot due to the
| potential issues with copyright.
| zokier wrote:
| Sure, the concern is valid but I feel like this tweet adds
| absolutely no substance to the discussion and just repeats the
| same opinion that was already rehashed to death since copilot
| originally launched. As such, especially with the tone that the
| tweet has, I don't expect constructive discussion to arise here.
| maxbaines wrote:
| I hadn't initially thought about Copilot and other AI generators this
| way, but now that I have, I'm finding it hard to ignore.
| k__ wrote:
| Isn't that what Web2 is all about?
|
| Someone creates content for free, and companies monetize it.
| WesolyKubeczek wrote:
| The real Web3 is companies sue original creator for
| infringement.
| VoodooJuJu wrote:
| It is now proven that copilot returns code from codebases with
| non-permissive licenses [1].
|
| I'm curious - what are the legal implications of this going
| forward? I've so many questions.
|
| 1. Will Microsoft ever face lawsuits for these license
| violations?
|
| 2. If so, who/how? Class-action?
|
| 3. Will copilot be forced to open-source in the future? Under
| which license? Some open source licenses are incompatible with
| others, but copilot uses code from probably every OSS license
| conceived.
|
| 4. If Microsoft faces no justice, will we start seeing more OSS
| license violations? Will Google start using AGPL-licensed code?
|
| [1] https://news.ycombinator.com/item?id=27710287 | Copilot
| regurgitating Quake code
| mhaymo wrote:
| That regurgitated code exists on Github under an MIT
| license: https://github.com/jethrodaniel/fast_inv_sqrt
|
| "jethrodaniel" does not appear to have the copyright to offer
| that license, but it's hard for Github to determine that in
| general, so I doubt they would be liable for the error.
| mrh0057 wrote:
| I'm not a lawyer, but my understanding is that these are torts, so all
| you have to prove is Microsoft has liability. I think this
| would be easy to prove due to the way neural networks work
| since it's just a way of performing a search.
|
| Since it's a tort I don't think you have to prove they should
| have known it would return copyrighted code, the fact that it
| does is enough to have liability.
| jsiaajdsdaa wrote:
| That doesn't stop youtube from blasting people away over
| copyright issues?
|
| On youtube, video uploads are a cost center, whereas on
| github, code is a profit center
| vorpalhex wrote:
| > but it's hard for Github to determine that in general, so I
| doubt they would be liable for the error.
|
| Please insert that meme, "That's not how that works. That's
| not how any of this works!"
|
| The legal system is permission based, not forgiveness or "I
| didn't know" based.
| Flimm wrote:
| I personally don't want to have to upload proof of identity
| to GitHub and a signed document swearing that I own the
| copyright to all the code I upload to GitHub, or proof that
| I coded it. We need to be careful what we wish for.
| vorpalhex wrote:
| Excerpt from the MIT license:
|
| > THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF
| ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED
| TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
| PARTICULAR PURPOSE AND NONINFRINGEMENT.
| minhazm wrote:
| Actually the legal system is evidence based. Microsoft has
| evidence that the code they are producing is licensed under
| MIT as far as they can reasonably know. There's no
| definitive way to know who actually owns the original
| copyright. I could grant permission to use my repo, but
| maybe I got that code from someone else, who then got it
| from someone else and so on and so forth. It's a similar
| situation with stolen goods, if you unknowingly purchase
| stolen goods you usually cannot be charged for theft as
| long as there aren't obvious signs that it's stolen such as
| the goods being priced far below market value.
| sammax wrote:
| Microsoft has evidence that the code they are reproducing
| is MIT licensed, so are they intentionally violating that
| license or does this AI thing include the license and
| attribution in every snippet it generates?
| monocasa wrote:
| Major aspects of copyright infringement are strict
| liability, like a lot of civil actions around damages. It
| doesn't matter if you thought it was OK, there's still a
| damaged party that needs compensation according to the
| law. At best you'll simply avoid the criminal and
| punitive penalties.
| BaculumMeumEst wrote:
| Exactly, that's why Pornhub hasn't had any liability
| issues arising from where its content comes from either.
| It's just too darned hard to tell.
| monocasa wrote:
| No, PornHub doesn't have liability in a lot of cases
| because of 17 § 512, but has still had to deal with
| liability in general, which is why they nuked some 80% of
| their library not backed by verified individuals a while
| back.
|
| https://www.law.cornell.edu/uscode/text/17/512
|
| A huge part of 17§512 is the DMCA takedown process
| mainly in 17§512(c)(3). Does Microsoft even have the
| ability to truly remove training data from the model? Or
| do they have to retrain on each DMCA takedown?
| concordDance wrote:
| If they had a reasonable basis for believing they had a
| license they're in the clear. "I didn't know" might not be
| enough but "I had good reasons to think otherwise" is.
| vorpalhex wrote:
| > If they had a reasonable basis for believing they had a
| license they're in the clear.
|
| False.
|
| If they committed copyright infringement, even if they
| genuinely believed they weren't, they are not in the
| clear. They still owe damages.
| monocasa wrote:
| Even if it's somehow available under an MIT license (which is
| questionable on the part of jethrodaniel), there's still
| infringement. MIT isn't public domain, it still has
|
| > The above copyright notice and this permission notice shall
| be included in all copies or substantial portions of the
| Software.
|
| Replicating it without complying with those terms is still
| infringement.
| sirsinsalot wrote:
| this. People are being willfully blind here, like cult
| members looking dead-eyed at their leader and chanting
| "This is great" as they drink the kool-aid.
|
| And from Microsoft no less, once outcast for mass
| poisoning.
| concordDance wrote:
| There's also one more question:
|
| 5. Even if it is illegal, is it actually bad? No one can
| possibly sell code snippets, the transaction costs are many
| orders of magnitude greater than any reasonable price. In my
| opinion, at least in this case the benefits massively outweigh
| the costs and the law should not apply here.
| citizenkeen wrote:
| Then the law should change. Saying "it's illegal but it's
| good/harmless" is a terrible stance.
| anamexis wrote:
| Seems like an eminently reasonable stance, and exactly the
| stance you would take to get the law changed.
| citizenkeen wrote:
| Fair. I had read "and the law should not apply" as "so we
| ignore it", not "so we change it".
| xtracto wrote:
| I really, REALLY like the idea of Copilot. I think it is a
| glance at what the future of AI can bring to improve
| programming. I understand where all the litigation and
| "uneasiness" is coming from, both from commercial and open-
| source projects.
|
| I've not installed or used it for the same reason (don't want
| to use AGPL or GPLd code by accident, and don't want my
| closed source code to be used accidentally as well), but the
| thought of Copilot being "killed" due to
| litigation/copyright/licensing issues is sad.
|
| For me, it's kind of like when MP3 first appeared: Sharing
| music in Napster or downloading MP3s from Geocities was just
| amazing. The idea of having such things at your fingertips.
| Even though I understood the issue the authors had with the
| unpaid distribution of their music... still, the idea of
| "what could be..." made it amazing.
|
| I guess Microsoft could be a bit forward thinking, and
| implement the "Spotify" model in code: Pay OpenSource
| developers (whoever owns the repo, or whoever made a commit?)
| a small amount whenever their code gets used through Copilot.
|
| I'm super excited by what "Copilot" related services will look
| like in 10 years. And I really really hope that the
| technology/idea doesn't get killed by litigation.
| PaulKeeble wrote:
| Microsoft could have trained this on their own code and
| there would be no issue. The problem is instead of doing
| that they knew full well the approach would reproduce the
| code and they decided they would rather breach GPL than
| expose their own code. But I bet Microsoft has more than
| enough lines to train an AI, there was a clear choice to
| breach other people's licenses in preference.
| frazbin wrote:
| Huh... These comments have given me an idea: MS needs to be
| forced to train a model to compensate (pay) code authors
| and codebases based on snippet suggestions given by their
| tool: the Spotify model replacing Napster!
| sirsinsalot wrote:
| See: Who Owns the Future by Jaron Lanier
| Graffur wrote:
| The comment you replied to gave you that idea nearly word
| for word..
| midasuni wrote:
| Some people won't let you use their copyrighted work no
| matter how much you pay, that's reasonable.
|
| By all means allow repos to opt in, although if it's licensed
| under something like GPL there's no way to convert it to
| non-GPL without permission from every contributor. I for one am
| not interested in Microsoft or anyone else paying me to close
| my code.
|
| Allowing people to pay $xxx to copy my copyrighted work
| without my agreement is simple piracy.
|
| Either they get international agreement to drop copyright as a
| concept, or obey the law.
| rifty wrote:
| It seems like Microsoft could be in the clear on the basis of
| it being essentially "search". But it also seems like anyone
| who uses it could be risking to a high degree getting infected
| with copyright violating code.
|
| My question is, if it isn't a copyright infringement issue to
| use copilot in its current form right now, why not just claim
| copilot was used whenever accused of copyright infringement
| henceforth?
| solveit wrote:
| > why not just claim copilot was used whenever accused of
| copyright infringement henceforth?
|
| Without speaking to the particulars of copilot, this
| situation where laws seem toothless because of the ease of
| plausible deniability is actually fairly common. And in many
| such cases, the law is not as toothless as it seems, because
|
| 1. Getting multiple people to stick to a script under oath is
| difficult and dangerous.
|
| 2. Criminals frequently send each other messages like
|
| A: "lol I just crimed, hope nobody figures it out."
|
| B: "lol just say you used copilot".
|
| A: "lolol yeah fuck the law"
|
| Obviously this only gets the worst criminals, but there seems
| to be lots and lots of them.
| TAForObvReasons wrote:
| Microsoft is trying to legally position Copilot like
| StackOverflow. It is possible to post copyright-infringing
| code on SO even though their TOS requires a CC BY-SA 4.0
| grant to the company and its users.
|
| https://stackoverflow.com/legal/terms-of-service#licensing
| bastardoperator wrote:
| You don't think a mountain of MSFT lawyers in every state,
| including partner law firms around the world haven't thought
| about this? Do you practice law or are you speculating based on
| emotions?
| worker_person wrote:
| MSFT tried very hard to sue Linux into oblivion. Buying SCO,
| then claiming they owned all of Linux.
| http://www.groklaw.net/
|
| I trust MSFT to screw everyone over.
| bastardoperator wrote:
| You're making stuff up. MSFT never bought SCO.
|
| https://en.wikipedia.org/wiki/List_of_mergers_and_acquisiti
| o...
| birdyrooster wrote:
| Not sure where your confidence came from but a Google of
| "sco Microsoft" reveals:
|
| By the mid-1980s Microsoft had gotten out of the Unix
| business, except for its ownership stake in SCO.[20]
|
| https://en.m.wikipedia.org/wiki/History_of_Microsoft
| signatoremo wrote:
| No, SCO was founded in 2002, from Caldera Software who was
| a Linux distributor [0]. How could Microsoft in 1980s own
| a company that wasn't founded until 2002?
|
| They later filed for bankruptcy in 2007.
|
| [0] https://en.m.wikipedia.org/wiki/SCO_Group
| colejohnson66 wrote:
| Owning stock, on its own, is not the same as buying a
| company
| worker_person wrote:
| If you own a controlling percentage. Then yes it is. That
| is how you buy/control a publicly traded company.
|
| You can buy 100% of shares and take it private, but
| that's overkill for what Microsoft wanted.
| colejohnson66 wrote:
| Hence why I said "on its own"
| birdyrooster wrote:
| lmao
| bastardoperator wrote:
| Not sure where yours is coming from, if we look at [20]
| it makes no such claim.
|
| https://web.archive.org/web/20061105100939/http://www.inf
| orm...
| Beltalowda wrote:
| > It is now proven that copilot returns code from codebases
| with non-permissive licenses [1].
|
| That same Quake example from last year is repeated every single
| time.
|
| Aside from the fact that GitHub has since added a protection
| for this, that this example gets repeated time and time again
| instead of a _list_ of examples leads me to believe this is not (and
| was not) a common occurrence.
| pwdisswordfish9 wrote:
| Is there any leaked Microsoft code on GitHub? Someone should
| check if Copilot regurgitates that as well, then see how
| Microsoft reacts when someone slaps an AGPL license on that...
| q-big wrote:
| > Is there any leaked Microsoft code on GitHub?
|
| There seems to be. Google 'windows nt source code leak
| github':
|
| https://www.google.com/search?q=windows+nt+source+code+leak+.
| ..
|
| First search results:
|
| Windows NT 4.0:
|
| > https://github.com/lianthony/NT4.0
|
| > https://github.com/ZoloZiak/WinNT4
|
| Windows XP:
|
| > https://github.com/tongzx/nt5src
|
| > https://github.com/onein528/NT5.1
| throwaway23234 wrote:
| Big meh. That quake code was MIT.
| monocasa wrote:
| A) Public Quake is GPL. Just because someone else dumped it
| in an MIT library doesn't change that.
|
| B) MIT still requires attribution to not infringe.
| 542458 wrote:
| IANAL. My understanding is that the general legal precedent in
| the US is that a) datamining text has no copyright implications
| (in the same way that reading a book has no copyright
| implications) and b) it is not a copyright violation to use a
| small amount of copyrighted material provided the context is
| sufficiently transformative. This might seem silly or unfair to
| you, but that is the current legal reality.
|
| But even ignoring that, everybody uploading code to GitHub has
| given GitHub the right to analyze that code as per the GitHub
| ToS. This is the same mechanism by which you can't upload code
| to GitHub with a license that says "nobody is allowed to
| display this code on the internet" and then sue GitHub.
| aposm wrote:
| I can't imagine a scenario in which any lawyer would consider
| granting Github the right to "analyze" code anywhere close to
| granting Github the right to spit out that same code verbatim
| without your copyright notice (even if laundered by AI).
| 542458 wrote:
| Here's Kate Downing, an IP lawyer specializing in software
| licensing:
|
| > According to Downing, the answer depends to a certain
| extent on where that code is hosted. If it's on GitHub,
| there very clearly would not be copyright infringement.
|
| > "If you look at the GitHub Terms of Service, no matter
| what license you use, you give GitHub the right to host
| your code and to use your code to improve their products
| and features," Downing says. "So with respect to code
| that's already on GitHub, I think the answer to the
| question of copyright infringement is fairly
| straightforward."
|
| Downing cautions that copilot output of large chunks of
| code complete with comments are more questionable to use,
| but that for the most part it looks above board.
|
| https://fossa.com/blog/analyzing-legal-implications-
| github-c...
|
| Here's an English lawyer on the same topic...
|
| > The licence is broadly worded, and I'm confident that
| there is scope for argument, but if it turns out that
| Github does not require a licence for its activities then,
| in respect of the code hosted on Github, I suspect it could
| make a reasonable case that the mandatory licence grant in
| its terms covers this as against the uploader.
|
| https://decoded.legal/blog/2021/06/github-copilot-initial-
| th...
| [deleted]
| Engineering-MD wrote:
| To me regardless if it is technically legal, it certainly
| doesn't feel right. Furthermore, contracts rely on people
| understanding what they are agreeing to, and I don't
| think many developers would agree to letting the code be
| used outside the terms of the license they uploaded it
| under.
|
| I am very surprised there hasn't been a legal challenge
| to it.
| mynameisvlad wrote:
| What, exactly, is there to challenge?
|
| "I'm sorry your honor I didn't understand what I was
| signing" I don't think has ever been a valid reason in a
| courtroom, similar to "I'm sorry I didn't know I was
| committing a crime" is not a valid defense.
| ghusbands wrote:
| Courts interpret the intended and understood meaning of
| contracts and terms all the time. Research the term
| "meeting of the minds" and case law around it.
|
| When the terms were written, it's exceedingly unlikely
| that they intended it or anyone understood it to be
| blanket permission to allow a trained AI to copy code for
| others and no user would have interpreted it that way.
| Microsoft/Github can't necessarily unilaterally increase
| the intended range without making it clear in the terms.
|
| If it got to a court case, and both sides could afford
| it, it could be a lengthy one.
|
| (This comment is not legal advice. I am not a lawyer.)
| mynameisvlad wrote:
| How does "[allowing] a trained AI to copy code" change
| the interpretation of the ToS?
|
| By uploading your code, you give Github an exclusive
| license to use it to improve their services. Copilot is
| such a service. Just because it's an AI and it provides
| others code does not somehow invalidate the license you
| gave.
| BaculumMeumEst wrote:
| > "If you look at the GitHub Terms of Service, no matter
| what license you use, you give GitHub the right to host
| your code and to use your code to improve their products
| and features," Downing says. "So with respect to code
| that's already on GitHub, I think the answer to the
| question of copyright infringement is fairly
| straightforward."
|
| That's assuming that all code on GitHub is uploaded in
| good faith by the copyright owner, which is not always
| going to be the case.
| blihp wrote:
| 1) Most likely
|
| 2) TBD
|
| 3) Not likely. Worst case a judgement will go against them,
| they'll effectively pay a fine and then they'll retrain it on a
| more restricted set of source code.
|
| 4) OSS has a pretty tragic history re: enforcement. It wins
| nearly every skirmish but has no interest in the war so from a
| big picture standpoint, it loses due to apathy.
| amelius wrote:
| "Good artists copy. Great artists steal."
|
| :)
| ewalk153 wrote:
| If the portion of code that Copilot lifts is the "heart" of the
| original work, that would be much less likely to be considered
| fair use[1], regardless of the length.
|
| > For example, it would probably not be a fair use to copy the
| opening guitar riff and the words "I can't get no satisfaction"
| from the song "Satisfaction."
|
| I wonder how this could be integrated into the system?
|
| [1] https://fairuse.stanford.edu/overview/fair-use/four-
| factors/...
| noisy_boy wrote:
| Say, I want to write a getter method like below:
|
|     String getName() { return name; }
|
| Let us also assume that this snippet, unsurprisingly, has been in
| several copyrighted repos that didn't grant Github the right to
| share this code.
|
| So I start typing "getName" and copilot suggests the exact snippet
| above. If I use this snippet, is it plagiarism? Even though the
| above code is the most "obvious" way to write this getter and I
| would have written it this way even without copilot's suggestion?
| Or does the "uniqueness" or "non-trivial quantity" of the
| suggestions have any bearing in determining copyright violation?
| How/where do we draw the line?
| glouwbug wrote:
| Lucky for you, if you wrote a noise function that
| copilot returned as an implementation of Perlin noise you'd be
| breaching a _patent_! Said patent just expired a 20 year run,
| so you'll be okay this time!
| warkdarrior wrote:
| Clearly your code could be improved with some `Factory` objects
| and some dependency injection!
| lakomen wrote:
| I don't understand what's going on there.
|
| I don't use github. Can someone explain what the author means?
|
| Edit: in detail
| npteljes wrote:
| GitHub Copilot is a paid feature, but that's a red herring in
| this discussion - people are free to monetize free software,
| none of the major licenses forbid this.
|
| GitHub Copilot is an advanced autocomplete / code generation
| system, based on a machine learning model. The code used for
| training the model is taken from projects hosted on GitHub.
| These projects were published under different licenses.
|
| The main questions are:
|
| Some of the licenses need something from you if you create a
| derivative work. Does the Copilot training itself count as
| creating a derivative work?
|
| Sometimes the autocomplete basically quotes the original code.
| Does the original license then apply to the autocompleted /
| generated code too? How much of verbatim code quoting does it
| need for the result to be considered a derivative work?
| kaetemi wrote:
| Those instances where people demonstrate verbatim copies, are
| mostly either well known snippets which have been copied a
| million times already, or obvious completions of a partial
| verbatim piece of the supposedly copied code that any coder
| could extrapolate.
| lakomen wrote:
| Nice, being downvoted for asking questions. Nice asshole
| culture on HN.
| martin_a wrote:
| Just like with StackOverflow, people are expected to invest
| some time or amount of work in getting familiar with the
| topic.
|
| Your question seemed to lack this kind of work and was
| probably therefore downvoted.
|
| I don't think that's so much about "asshole culture" but more
| like time management, as not everything can be explained to
| everybody in every topic.
| tjpnz wrote:
| You can ask questions but they can't be low effort and need
| to add something to the discussion.
| niek_pas wrote:
| Google "GitHub copilot"
| nickjj wrote:
| This might be overreacting but is there a way to opt-out of
| Copilot using your code in open source repos?
|
| It feels morally wrong to me that I can spend thousands of hours
| working on projects of my own free will but then a company can
| sell the code I wrote to others in the form of snippet completion
| as a service. In fact they end up selling your code back to
| yourself if you plan to use the service.
|
| If the answer is no, that moves the needle pretty far in the
| direction where I'd at least consider the idea of moving all of
| my repos to Gitlab. I don't care much about stars or popularity.
| I open source things that are interesting and useful to me and if
| other folks want to use it they can but I don't gain motivation
| from others using the projects I release. I like Github and its
| UI and it's no doubt "the spot" for open source but selling code
| written by others rubs me the wrong way a lot. It stinks because
| it also means no longer contributing to other code bases too.
| It's moving us in the opposite direction of what open source is
| about.
| ghostbrainalpha wrote:
| It would be kind of cool if Github could show some stat that
| code you wrote has been used 50,000 times by 12,000 people.
|
| Being a top CoPilot contributor should at least have value to
| signal on your resume.
| [deleted]
| ellyagg wrote:
| Well, I hope your viewpoint doesn't win the day, because making
| code as freely shareable and remixable as possible is a huge
| boon for humanity.
| jnsie wrote:
| It's just as shareable on Gitlab, no? And the issue isn't
| that code is not shareable - it's that a huge corporation is
| profiting from this code without consent from the developer.
| leereeves wrote:
| > a huge corporation is profiting from this code without
| consent from the developer
|
| Also without attribution. The more permissive licenses
| allow corporations to profit from shared code, but most of
| them still require attribution.
|
| And it's really not much to ask: when someone gives you
| free code, give them credit for their work.
| [deleted]
| celeritascelery wrote:
| Code being freely shareable and remixable is great. _Selling_
| that open source code for profit is not.
| WisNorCan wrote:
| Is your take that Microsoft should offer this for free? Or
| if they are not willing to do it for free, Microsoft should
| cancel this service and we should wait for Apache or
| someone else to offer the service?
|
| Or something else ?
| gfrff wrote:
| Microsoft should make this service free for open source
| (not just thought leaders), and compensate people
| otherwise. I should have a 0.01% equity in Open AI if
| they're using my stuff like this.
|
| Or they should do opt in.
| earnesti wrote:
| What is wrong with someone making a little dough. It is
| just numbers in database.
| jdbernard wrote:
| Yeah, but those numbers translate to food on the table
| for my kids, a roof over their heads, better education,
| etc. Come on, this is a tired response. Nothing is wrong
| with people making money. There is a lot wrong with
| people making money off of the hard work of others
| without any consideration or remuneration.
| jaywalk wrote:
| If your code is using a license that allows it, how could you
| possibly opt-out aside from using a different license?
| bouke wrote:
| Does GitHub verify that the code that is in my repository is
| actually in accordance with the license that I've added? I
| could just upload any proprietary code with an incorrect
| license, and GitHub would just use that to feed their AI.
| Like any other dependency that you incorporate into your
| application, GitHub should verify/audit whether the license
| allows them to do so.
| okasaki wrote:
| Microsoft could provide an opt-out for projects or even
| contributors, regardless of licence.
| nickjj wrote:
| > If your code is using a license that allows it, how could
| you possibly opt-out aside from using a different license?
|
| A repo setting that instructs Github not to use your code for
| Copilot, it could be a similar option as turning Discussions
| on / off.
|
| If they really want to win developers over they would even
| have Copilot scanning disabled by default but that'll never
| happen.
| jonny_eh wrote:
| Sounds like you want a new license that just prohibits use
| by one company for one purpose.
| widjit wrote:
| is there something wrong with that?
| jonny_eh wrote:
| Not at all, you can put any license on your code that you
| want.
| thamer wrote:
| There are other AI-based code completion systems than
| Copilot, at least Tabnine[1] and Kite[2] come to mind,
| I'm sure there are more.
|
| [1] https://www.tabnine.com/
|
| [2] https://www.kite.com/
| belter wrote:
| As of today there is a new one...
|
| "Now in Preview - Amazon CodeWhisperer"
|
| https://aws.amazon.com/blogs/aws/now-in-preview-amazon-
| codew...
| quietbritishjim wrote:
| Even if Github did provide that setting, as a courtesy,
| someone could clone / fork the code to another repo (if you
| use any licence that allows it) and not enable that
| setting.
| Inityx wrote:
| Sure that's possible, but there's a huuuge difference
| between Possible and Default Behavior.
| TAForObvReasons wrote:
| In a case like this, GitHub itself could set up a bot
| account that forks all projects as soon as you make the
| switch. The company in fact would be incentivized to do
| so.
| sammax wrote:
| Don't most licenses require at least attribution? I don't
| believe GitHub is restricting themselves to only licenses
| that don't. In fact the only software licenses I can think of
| that don't require attribution are 0BSD, WTFPL, CC0, MIT-0
| and Unlicense, and they all aren't super popular. Also in
| some countries creators have inalienable moral rights which
| can be enforced regardless of the license. For example in
| Germany it is impossible to relinquish certain rights you
| have as the creator of a work, including the right to
| attribution.
| TAForObvReasons wrote:
| This is an important and overlooked point. Even common
| permissive licenses (ISC / MIT / Apache-2.0) require
| attribution
| jazzyjackson wrote:
| Just as a mind experiment: couldn't CoPilot just publish
| a list of every github user and attribute the work to all
| of them?
| TAForObvReasons wrote:
| CoPilot is a black box at the moment. Microsoft claims
| they used the public corpus on GitHub. There are plenty
| of GPL, AGPL, and "source available" projects in the
| public corpus. So what exactly is the licensing?
|
| The argument may make sense if they limited themselves to
| public-domain (CC0) works, but that is not what happened
| here. If CoPilot attributed something to an AGPL project,
| does it mean the "virality" applies to all projects that
| use code from CoPilot?
| ntoskrnl wrote:
| There's also a good amount of commercial and leaked
| source code on GitHub, including MS's own leaked Windows
| XP source. I haven't played around with Copilot yet, but
| if I ever do I plan on copy/pasting some win32 API
| definitions to see if I can get it to spit out any of the
| leaked source.
| whoisthemachine wrote:
| This feels like a tool that can easily be destroyed by a
| lawsuit. I can't imagine a TOS can force you to give away
| your copyrights (especially if they allow and encourage
| you to post your own copyright).
| kragen wrote:
| If it can't then Wikipedia is doomed; its entire
| licensing status rests on the notion that editors grant
| such a license as part of their clickwrap ToS.
| igneo676 wrote:
| I'm not sure using a different license actually opts you out.
| By merely hosting your code on GitHub you grant them the
| right to analyze your code on their servers[1]
|
| They may be morally in the wrong, but I'm unsure they are
| legally in the wrong here. To boot, denying them the right to
| create this tool in your license is technically a violation
| of OSS principles and problematic
|
| [1]: https://docs.github.com/en/site-policy/github-
| terms/github-t...
| typetheorist wrote:
| > This license does not grant GitHub the right to sell Your
| Content. It also does not grant GitHub the right to
| otherwise distribute or use Your Content outside of our
| provision of the Service, except that as part of the right
| to archive Your Content, GitHub may permit our partners to
| store and archive Your Content in public repositories in
| connection with the GitHub Arctic Code Vault and GitHub
| Archive Program.
|
| Wouldn't this be a violation?
| PaulKeeble wrote:
| It should be automatic based on license. GPL code definitely
| shouldn't be included but MIT could be. They already have this
| information in most repositories and if it's missing they have
| no right to use it at all. We don't need extra options; the
| licenses already restrict the use and derivative work.
| [deleted]
| davesque wrote:
| Not without the text of the license. I, as a developer,
| cannot just poach open source code under MIT without
| including the copyright and terms from the original project.
| From the license:
|
| "The above copyright notice and this permission notice shall
| be included in all copies or substantial portions of the
| Software."
| meshaneian wrote:
| They might argue that a snippet isn't a "substantial
| portion" of "the Software", and they're only charging for
| the service not the content - regardless, I don't like it,
| this is exactly what certain licenses attempt to prevent.
| leereeves wrote:
| I would argue that substantial shouldn't be measured in
| lines of code, it should be measured in importance.
| Something like the fast inverse square root is
| substantial even though it's short.
| typetheorist wrote:
| I too have reservations about Copilot, but does the MIT
| license define a "substantial portion"? I doubt a snippet
| would fall under either "copies" or "substantial portions"
| [deleted]
| kemiller wrote:
| This is a really good point that I hadn't considered before.
| It's facebook all over again -- selling your own content back
| to you. Repo owners should be at least compensated when their
| code gets used. That would be an incredible market.
| lbhdc wrote:
| I stopped publishing open source after all this started coming
| out because I was so uncomfortable with it.
| [deleted]
| rosmax_1337 wrote:
| I think this problem has no good solution until IP laws around
| the world are properly reimagined from the ground up. I'm of the
| quite radical stance that code, music, art in terms of their
| intellectual existence should be free for anyone to take. (You
| can own a hard drive with code on it, and claim no one should
| steal it, but not the idea of the code itself.)
|
| If you have ideas, code, music or art which you wish for no one
| to partake in, do your best to keep them secret. Certainly, breaking
| into secret areas should be illegal, but once the cat gets out of
| that bag it gets out of the bag.
|
| The creative people behind these ideas I believe will be able to
| find good compensation nonetheless in society; IP laws nowadays
| only serve to protect megacorporations to the detriment of
| creativity and ideas.
| zzo38computer wrote:
| I agree. This will fix it. I think that copyright and patent
| should be abolished, but that if it is secret then it is still
| secret (unless someone else manages to come up with the same
| thing (e.g. by decompiling a published computer program to
| reconstruct the source code), in which case it can be public). And
| so then also the AI can copy the code too just as much as you
| may do so manually; if it is published then you can do it and
| it should not be illegal to write such things.
| acuozzo wrote:
| This is, in part, why I will continue to use the original
| 4-clause BSD license for the code I write.
| wolframhempel wrote:
| When my last company got acquired, part of the due diligence
| process was a scan of our codebase for snippets from stack
| overflow. Every snippet found that wasn't posted with a clear
| license by the author was challenged and we rewrote it.
|
| Now, I'm not entirely sure how necessary this was from a legal
| perspective. But introducing an AI into the mix will bring up a
| lot of uncertainty when it comes to how much change is required
| for something to no longer be considered a copy/derivative.
| redox99 wrote:
| Isn't all stack overflow content creative commons?
|
| https://stackoverflow.com/help/licensing
| wolframhempel wrote:
| it is - which is a problem if you want to repackage something
| under a different license.
| redox99 wrote:
| But you said
|
| > Every snippet found that wasn't posted with a clear
| license by the author was challenged and we rewrote it
|
| How is the license not clear?
| wolframhempel wrote:
| Fair - that was poorly expressed.
| dmix wrote:
| That sounds like legal paranoia or a make-work program.
| dmortin wrote:
| Would the scan still find a snippet if the variable names had
| been changed, for example? Or would that be considered a
| different snippet?
| wolframhempel wrote:
| This is exactly where it gets murky. We had the usual 1-4
| line snippets. We went the extra mile to change them,
| rewriting them from scratch, partially with different
| implementations. Did we need to do that? Would it have been
| enough to just change a variable name or some spacing or
| similar? I don't think there's a clear standard.
|
| The music industry has struggled with this for a long time.
| When is a song derivative, when a copy, when is it "inspired
| by"...
| anonymoushn wrote:
| That sounds rough. Here's an 8-line snippet, please make
| sure you don't infringe my copyright:
|
|     p = mmap(
|         null,
|         size,
|         PROT_READ | PROT_WRITE,
|         MAP_PRIVATE | MAP_ANONYMOUS,
|         -1,
|         0,
|     );
| iptq wrote:
| I know this isn't really related to the whole copying ethics
| debate, but I definitely feel like there's some sort of foul play
| happening here. For all of the unlicensed projects out there, the
| license that is automatically granted to Github includes:
|
| > the right to store, archive, parse, and display Your Content,
| and make incidental copies, as necessary to provide the Service,
| including improving the Service over time
|
| It's insane how vague this is. Is Copilot a "Service"? Sure, by
| its definition:
|
| > The "Service" refers to the applications, software, products,
| and services provided by GitHub, including any Beta Previews.
|
| And since much of the code was published before Copilot's
| inception, this means Github can just arbitrarily add more
| "services" and milk the code for whatever it wants. Automatically
| service-ify any public repository? Sure, pay us for quotas. It's
| like a legal loophole to let Github just bypass any license
| restrictions you put on it.
| seydor wrote:
| Programmers are fine when their creations, pretty much all of
| tech, resell content that other people wrote for free, but no,
| not code, that one must be expensive
| anonymoushn wrote:
| I also don't think it's acceptable for TurnItIn to monetize
| content without paying the authors. My opinion about whether
| students should have their work stolen and monetized by a
| company doesn't seem to have much impact though.
| onpensionsterm wrote:
| The only one making money here is github. Very few programmers
| are selling open source code. And programmers are (in)famous
| for not buying software.
| zx8080 wrote:
| %s/programmers/tech capitalists/g
| danamit wrote:
| The code Copilot suggests from any given project is, most of the
| time, not enough to warrant crediting that project. When I look up
| code in some GitHub repo and copy all or part of it, I do not
| credit that project.
|
| I do not see Copilot as useful anyway.
| Aeolun wrote:
| > what github / microsoft is counting on here is that open source
| developers do not have enough collective power to do anything to
| stop this
|
| I think it much more likely that they count on everyone liking it
| way too much to give a shit about their MIT code not being
| attributed correctly.
|
| I certainly don't. MIT just seems like the most convenient
| license for people that need licenses (corporations?), so that is
| what I use.
| pvaldes wrote:
| Each day it sounds more like Zopilote, it seems.
| parhamn wrote:
| Pretty soon the world is going to come to realize art/creation is
| just blending, incrementing and repurposing prior art.
|
| No book, painting, codebase, sonnet, design is theft-less.
|
| The art is the space reduction, otherwise we'd just bruteforce
| away.
| wnkrshm wrote:
| So the only thing left is handiwork I guess. Engineering isn't
| different from art in any way, the constraints are just
| stricter.
| pera wrote:
| I'm not sure what you mean by "theft-less" but I believe you
| might be conflating inspiration with derivative work: Copilot
| can produce verbatim copies of open-source code, this would
| make it more similar to how some musicians sample other
| people's music to create new music.
| lioeters wrote:
| Recommended:
| https://en.wikipedia.org/wiki/Exit_Through_the_Gift_Shop
| izacus wrote:
| > Pretty soon the world is going to come to realize
| art/creation is just blending, incrementing and repurposing
| prior art
|
| If that happens, the big copyright/IP conglomerates will
| immediately jump on that and make sure that laws are adjusted
| and they get their cut of every single word and line anyone
| puts near their smartphones ;)
| Agamus wrote:
| This idea has been around for a while - why... "pretty soon"?
|
| And I'm sure I couldn't disagree with you more. Or are
| 'influence' and 'theft' the same now?
| TremendousJudge wrote:
| The idea has been around a while, but the legal system
| doesn't reflect it.
|
| I don't think it will any time soon though.
| coldtea wrote:
| > _Or are 'influence' and 'theft' the same now?_
|
| They have been the same for most of history. People could
| openly copy titles, plots, parts, phrases, etc from prior
| work. Same for mechanical designs. The only thing preventing
| them was obscurity (e.g. the inventor trying to make it
| hidden) not any law or ethical idea that it's bad (there
| wasn't any). That's how things from math to gears to tunes
| got better (or changed over time, in the case of art, as
| better/worse is subjective there).
|
| E.g. globally and historically folk music has been basically
| taking whatever you want from tunes and songs where everybody
| does the same with no "permission" asked or needed to be
| given.
|
| Like 4 verses but want to add a fifth or change some part? Go
| ahead. Want to play it exactly like you've heard it? Go ahead
| again.
|
| The idea of "theft" in that regard came in the last 2 or so
| centuries, and was enforced with artificial legal barriers
| and new "ethical" concepts that are neither "natural" nor
| present for the vast majority of history (including golden
| ages of art production).
| Agamus wrote:
| Not sure why I'm being downvoted here - I agree that this
| idea has been the same for most of history.
|
| Your example of folk music is an odd one, for exactly that
| reason - it largely repurposes existing art. For example,
| Wagner wrote extensively about why we shouldn't respect
| folk music for this reason. I mostly disagree with him, but
| his comparison at least illuminates that this isn't so
| black and white. And that's really just scratching the
| surface of a complex topic.
|
| I sense that if someone came along 2400 years ago with the
| exact play that Sophocles had just produced and claimed
| they had just composed it themselves, immediately after a
| public performance, someone would claim that theft had
| occurred. Do you disagree?
| coldtea wrote:
| > _I sense that if someone came along 2400 years ago with
| the exact play that Sophocles had just produced and
| claimed they had just composed it themselves, immediately
| after a public performance, someone would claim that
| theft had occurred. Do you disagree?_
|
| Yes. They would say it was "plagiarism", which is
| different than theft.
|
| And there was no law against either case.
| trention wrote:
| Except that AI will not lead to "golden ages of art
| production" because nobody gives a sh*t about art created
| by AIs. And nobody will.
| Nowado wrote:
| That's a lot of people to dehumanize with a single swift
| no true Scotsman.
| coldtea wrote:
| > _because nobody gives a sh*t about art created by AIs.
| And nobody will._
|
| You'd be surprised. Especially if people don't care, or aren't
| told, whether it's "created by AI" or not.
|
| Whether in "high art" or lowly pop, "generative music"
| (and fine art) has long been a thing. And people do
| attach to it (e.g. to Brian Eno's generative works made
| by rule based systems he programs).
| trention wrote:
| No, I will not be surprised. Outliers are outliers. "Art"
| created by AIs will just have price (and cost) of ~0 and,
| like everything that has a price/cost of 0, nobody will
| give a sh*t about it. The only real question is how human
| artists (provided they exist in your preferred dystopia) will
| prove that they have created something themselves.
| coldtea wrote:
| > _No, I will not be surprised. Outliers are outliers.
| "Art" created by AIs will just have price (and cost) of
| ~0 and, like everything that has a price/cost of 0,
| nobody will give a sh*t about it._
|
| Art doesn't touch people because it has cost.
|
| In fact, for ages certain types of art had no cost -
| poetry, public festivals, and so on. And many still don't
| (e.g. free punk/underground/indie/etc. public
| performances, Soundcloud music, and so on).
|
| Most movies and series seen on TV are also ~0 (and for
| kids, everything is ~0, as their parents foot the bill),
| but they're still touched by them.
|
| > _The only real question is how human artists (provided
| they exist in your preferred dystopia) will prove that
| they have created something themselves._
|
| Note the loaded words "your preferred dystopia" (who says
| whether I prefer it or not? I merely describe what's the
| case. You have some ethical/political point to make).
|
| As for the answer to the question, they wont have to.
| People respond to the quality of the work, not who made
| it (and whether they used AI or chance - another popular
| method - or not).
|
| In fact tons of genius artists have described themselves
| not as the creators but as "mere conduits", and say the
| music/words/etc come from "elsewhere" (implying god, some
| muse, some spirit, etc). Especially when they feel the
| most "inspired" (the word itself means "visited by the
| spirit").
| trention wrote:
| None of those things had zero price and zero cost. The
| fact that the consumer didn't pay directly for them is
| irrelevant. You can try testing your theory by trying to
| sell a "painting" created by DALLE/whatever for more than
| a third-rate amateur painter can sell one of his. Good
| luck with that, especially when access to the model
| becomes easy.
|
| >People respond to the quality of the work, not who made
| it
|
| This is so painfully incorrect and naive (and contra
| anything we know about the value of everything whose
| creation has been automated before) that I think it's
| meaningless to continue this conversation.
| coldtea wrote:
| > _You can try testing your theory by trying to sell a
| "painting" created by DALLE/whatever for more than a
| third-rate amateur painter can sell one of his. Good luck
| with that, especially when access to the model becomes
| easy._
|
| As if that proves anything? Sale price is irrelevant.
| There are paintings sold for millions that 99.9% of the
| people could not give less fucks for, and "amateur
| painter" stuff that touches most people who see them.
|
| It's also not like a $2 million in production costs
| Michael Jackson song with $50M sales is "better"
| artistically (as opposed to commercially) than a song
| composed and played by some random guy on an acoustic for
| ~0.
|
| > _This is so painfully incorrect and naive (and contra
| anything we know about the value of everything whose
| creation has been automated before) that I think it's
| meaningless to continue this conversation._
|
| It was meaningless to begin with, as you don't discuss,
| you present your "ultimate truth" ("contra anything we
| know", lol).
|
| In fact there are tons of works where the creator is
| anonymous (from folk music and art to early house, techno
| and rave music, a scene with cherished anonymity), and
| people respond to it just fine...
| js8 wrote:
| > The idea of "theft" in that regard came in the last 2 or
| so centuries, and was enforced with artificial legal
| barriers and new "ethical" concepts that are neither
| "natural" nor present for the vast majority of history
|
| This is true for other forms of property as well, like land
| ownership.
| mihaic wrote:
| This type of argument always distracts from the fact that
| figuring out where we draw the line between theft and
| reimagining.
|
| The Magnificent Seven for instance was a reworking of Seven
| Samurai, but stands on its own as an original creation. Going
| into a cinema and filming a picture to later put on a torrent
| site is not artistic reworking.
|
| The hard discussion is about what is acceptable, we all know
| prior art exists.
| scotty79 wrote:
| There are many differences between those acts of thievery or
| inspired creation however you might call it. But there are
| many similarities too. Fascination with the original is one.
| Desire to own it in one way or another is one too.
| Differences are in the skills, the means, the result, what
| was stolen and financial success that came out of the act.
| Griffinsauce wrote:
| > This type of argument always distracts from the fact that
| figuring out where we draw the line between theft and
| reimagining.
|
| This seems to be missing a word, could you clarify?
|
| Also: since you mentioned theft: this actually comes down to
| the discussion whether you can own thought and/or digital
| artifacts which can be replicated without taking anything
| away from the "owner".
|
| Given the absolute choice I'd rather pick complete freedom
| than restriction. I suspect that anyone's opinion on this
| follows what they value higher: creation or exploitation.
| mihaic wrote:
| Sorry, I should have double checked, that sentence was
| incomplete. Yes, I meant to say that a more nuanced
| approach is crucial, and that means rejecting that we have
| to choose between Disney-backed extreme IP laws or total
| freedom.
| ajuc wrote:
| > The hard discussion is about what is acceptable
|
| What if we just say "both"? Libraries were a thing for
| millennia and writers still wrote books. There are costs to IP
| laws and the benefits aren't obvious.
| Veen wrote:
| As a writer, the benefits are quite obvious to me.
| Timwi wrote:
| Convenient, isn't it?
|
| As a consumer, it's quite obvious to me too how it
| benefits only the writer/creator at the detriment of
| everyone else.
| barthvr wrote:
| Because writing a book, shooting a movie, composing a
| song, takes time?
|
| So either those pieces are IP-protected, and their author
| can make money with it, or we have to set up a basic
| income for everyone, and art becomes free.
| regularfry wrote:
| It's perfectly consistent to say both that there needs to
| be a system to ensure creators are compensated and that
| the current system for doing so is terrible.
| Veen wrote:
| It is consistent but useless if you have no suggestion as
| to what would replace the current system in a way that
| preserves the benefits to both parties.
|
| 1. Creators get a sustainable reward for their work. They
| wouldn't do it otherwise. I certainly don't do it for
| fun.
|
| 2. Consumers get to access that work as they wish.
|
| (Of course, this being HN, I'd expect any ideas to apply
| to developers as well as to writers and artists i.e. if
| writers have to give up copyright, so do developers,
| startups, and so on.)
| js8 wrote:
| Benefits of what? Of copyright enforcement, or of
| sharing?
| bryanrasmussen wrote:
| the grandparent comment said the benefits of IP Laws were
| not obvious. So it is of the benefit of the laws as they
| currently exist, that implies enforcement of said laws.
| meheleventyone wrote:
| Libraries pay fees to lend books, at least in our modern
| capitalist society.
| ajuc wrote:
| It was news to me so I checked and it's true. Since
| 2016 in my country ;)
|
| And it's a symbolic amount for the vast majority of authors
| (country-wide it's around 5-5000 USD per year per author
| and the distribution is heavily skewed towards 5 USD).
|
| So yeah :) I think authors were fine without these 5
| bucks a year.
|
| EDIT cause it might not be obvious. It's not per library.
| It's per country.
| jrochkind1 wrote:
| Not in the USA, where the "first-sale doctrine" means
| once you buy a book, you can do whatever you want with
| that copy of the book (lend, rent, sell, destroy) without
| needing a license. Libraries in the USA definitely don't
| pay a fee beyond the purchase price of the book (or they
| can legally lend donated books etc). Copyright holders
| don't make any additional money from library lending.
|
| I am not familiar with how it works in other countries,
| but I have heard something about there being such a fee.
|
| (It's not quite true to say libraries have existed for
| "millennia" though, with regard to this issue. Mass
| produced printing hasn't in fact existed for millennia,
| libraries 1000 years ago had hand-copied manuscripts,
| probably mostly scrolls. The effect on "the market"? For
| whatever reason authors were writing then it was not to
| make money by selling reproductions of their writings,
| that wasn't a thing. Which means, yeah, btw, people still
| wrote things and made up stories even when they couldn't
| make money by charging people for copies to read...)
| Chris2048 wrote:
| Is it really "just" that? Is there no original creativity in
| the choices (and skill) in the blending, and choosing what (and
| how) to blend?
|
| Would you describe a parody, or a critique/review, as equally
| without original merit?
| natly wrote:
| Unless every invention is gonna be AI generated (which is kind
| of a scary situation), intellectual property still needs to be
| a thing (otherwise people won't have incentive to invent, it'll
| just be stolen from them).
| pydry wrote:
| People have an innate desire to invent and create. This is
| why so many people do it for zero extrinsic reward. Hell,
| this is the case for almost _every_ musician. They are fed a
| pittance in streaming, only a bit more than most OSS
| developers get.
|
| This intrinsic motivation is more normally "farmed" by
| investors who capitalize and capture the IP value for
| themselves. This actually has a detrimental effect on
| innovation.
|
| Doing away with or watering down intellectual property
| protections will just take big meaty chunks out of the stock
| market and partly equalize wealth distribution.
|
| It'll probably spur innovation too - historically it usually
| has, but preserving the existing social order takes
| precedence over that which is why a lot is invested in
| persisting the myth that it aids rather than hinders
| innovation.
| ModernMech wrote:
| > otherwise people won't have incentive to invent, it'll just
| be stolen from them
|
| Citation needed. Speaking personally, I spend most of my
| creative energy on a project which is open source and
| permissively licensed to the point where I'm fine with anyone
| stealing it. I expect to earn negative money from it at the
| limit.
|
| Why do I do it? I dunno it's fun. Can't that be enough?
| Timwi wrote:
| It's remarkable how many people still repeat this
| unsubstantiated cliche.
| habibur wrote:
| We stand on the shoulders of giants. That has been the way for
| decades. A newer stack over the older one without much thought.
| And someone in the future will build an even newer stack over the
| current ones.
| [deleted]
| pornel wrote:
| Tough pill to swallow. Microsoft's actions don't seem fair, but
| fighting them with copyright could weaken _fair use_ :
|
| https://felixreda.eu/2021/07/github-copilot-is-not-infringin...
|
| There's a good argument that demanding copyright protections on
| scraped datasets and short snippets is a double-edged sword. It
| could harm search engines, distribution of news, and non-
| commercial ML research too.
| tpoacher wrote:
| Does this mean I can steal stuff if I say I trained an AI to do
| it for me?
| bmacho wrote:
| Is _cat_ an AI?
| tpoacher wrote:
| Nobody said it can't overfit 100%, right?
| AtNightWeCode wrote:
| Copilot will be that bandmate that plays a new riff and leaves
| you wondering about where it was borrowed from.
| capableweb wrote:
| If GitHub could guarantee that the code Copilot had ingested was
| only made with OSS licenses, then I don't see what the problem
| is.
|
| But as far as I understand, GitHub trained Copilot on every public
| repository on GitHub, including those with no license specified
| (where the user publishing it still holds the copyright), and I
| don't see how that can be OK.
| thelastbender12 wrote:
| It is hard to see how verifying licenses is a solvable problem,
| when licensing for code dependencies can be transitive. For
| example, I could copy code from a GPL codebase like Linux and
| create a Github repository with an MIT license.
| danuker wrote:
| You should be able to choose flavors of the model trained
| only on public-domain code which does not require
| attribution, for example.
|
| But that would mean Microsoft acknowledging license
| violations.
| thelastbender12 wrote:
| Sorry, to be clear, I meant even if a Github user asserts
| their code is public-domain/no-attribution/unlicensed, they
| could have lifted it off a codebase that doesn't allow it.
| It would be tricky for Github to establish the code was
| indeed original and hence their agreement with the user
| allows them to train their models on it.
| danuker wrote:
| > they could have lifted it off a codebase that doesn't
| allow it
|
| Ah. But then someone else is guilty of redistributing
| code without permission.
|
| But you're suggesting, GitHub should implement something
| like ContentID but for code. Which should be cheaper
| (since code is cheap to analyze, while videos are much
| more bandwidth-intense). And this would kill two birds
| with one stone.
| galoisgirl wrote:
| Here's an example:
| https://twitter.com/ChrisGr93091552/status/15397316329318031...
|
| > I checked if it had code I had written at my previous
| employer that has a license allowing its use only for free
| games and requiring attaching the license. yeah it does
| nl wrote:
| That's a pretty bad example. He prompted it using the exact
| function header taken from the code he is complaining about.
|
| It'd be much more interesting if he set up a function that was
| doing a similar thing but with different parameter types and
| names, and a different order of parameters (ie, like a real
| problem).
| triknomeister wrote:
| Does that matter? Any code provided should come with
| the license needed to use it, otherwise the user is
| opening themselves up to litigation.
|
| Hence why I agree with another comment somewhere that
| Microsoft is banking on software developers not litigating
| about use of their open source code in closed source
| projects.
| redox99 wrote:
| Maybe when you accepted GitHub ToS you gave them permission for
| your code to be used for ML training.
| eloisius wrote:
| I can't say I remember the terms saying anything to the
| effect of granting Microsoft a perpetual unlimited license in
| addition to whatever license I package with the code when I
| signed up. Not doubting it, but I would have expected that to
| raise some suspicion long before Copilot was around.
| redox99 wrote:
| It could be something as innocuous as "you allow your code
| to be analyzed, processed or otherwise handled by Github
| software" I suppose, which wouldn't raise suspicion.
| hooby wrote:
| many OSS licenses require attribution
| saghul wrote:
| Even if it was trained with OSS licenses, some of them require
| proper attribution, which copilot doesn't do.
|
| Now, where the threshold is for substantial derivative work in
| order to require attribution is an interesting question.
| Guid_NewGuid wrote:
| I find this whole topic very annoying, this is like the 3rd
| variation to reach the front page today. But it has made me
| realize why I instinctively dislike Free Software as a movement.
|
| Copyright and licensing are bad, actually. Stop getting worked up
| about the idea of using courts to punish theft. Stop getting into
| a frenzy of arousal about the police kicking down doors to drag
| Billy Gates to jail because 80 characters of fast square root is
| theft but 79 isn't.
|
| Where on earth is the ambition and vision!? Knowledge is public
| domain. A commons of knowledge is a public good. The cost of code
| copying is zero.
|
| Sure in our day job we have to pretend to care about this stuff.
| But when did the ideological scope of what can be achieved become
| rules lawyering over license text?
|
| Copy my MIT licensed code without attribution? I don't give a
| shit, go ahead, I hope it helps, in fact I want a truly public
| domain license but copyright law is so hostage to corporate
| interests no such thing exists in many countries.
|
| Free the code.
| eikenberry wrote:
| There is a license for that, the MIT-0 or the MIT No
| Attribution License.
|
| https://opensource.org/licenses/MIT-0
| progman32 wrote:
| I see the free software movement as a variant on your ideals
| but rooted in practicality given the current environment.
| Guid_NewGuid wrote:
| I think we share a lot of the same goals but they presuppose
| openness based on violence, if you don't do what their
| license says exactly then they're going to use lawyers and
| courts and the state's monopoly on violence to make you
| comply.
|
| I think at a fundamental level this abandons any vision of a
| true commons since as copilot discussions reveal the well is
| now polluted (to mix metaphors) and though in some frames the
| code is more free you certainly won't be if you fail to pay
| the penalty levied in a civil case for misusing it.
| sirsinsalot wrote:
| "A commons of knowledge is a public good."
|
| Yes but this copilot model takes that, adds value and doesn't
| itself join the public common good. Instead it takes it, and
| makes you pay to have it back in another form.
|
| If copilot were open source and the model released for the
| public good, being built of public data (in your scenario) we
| would have a very different conversation.
| jazzyjackson wrote:
| If it was just published as a public good it would probably
| be as illegal as sci-hub
|
| I consider the $10/m as a donation to the microsoft legal
| defense fund to allow free access to accumulated knowledge.
| sirsinsalot wrote:
| To allow access to a service that grants you the
| accumulated knowledge's output in small bits.
|
| I'm all for a world where these tools help developers, but
| I'm not here for a system that isn't open. I want to own my
| tools.
|
| Copilot is a bit like musicians paying a monthly fee for
| access to a loop library. Except all the loops are rip-offs
| of other people's hard work and there's no effort to
| compensate them.
|
| If I made an AI that resampled music into derivative tracks
| ... you can be damn sure I'd be sued until my ears bled.
| andybak wrote:
| And I really don't mind.
|
| I want every line of code I've ever written to be used as
| much as possible.
|
| I find "intellectual property" to be dubious to the core. I'm
| not confident enough in my feelings to be a zealot, but if I
| had to pick sides then I know which side I would pick.
| sirsinsalot wrote:
| If an AI "listened" to music and created new samples for
| musicians to use for a fee, do you not think the original
| musicians should be compensated?
|
| The value transfer is basically theft.
|
| It isn't about the usefulness of the service, or even that
| something similar is a good thing ... it is about the
| execution and what it says about fairness for those that
| worked to create the data it depends on to produce value.
| andybak wrote:
| I'm not sure I was clear enough when I expressed my
| doubts about the concept of intellectual property.
|
| Your musical example is playing out in the courts in
| multiple forms. The Marvin Gaye case, Led Zeppelin, Katy
| Perry etc.
|
| And each case pushes me further towards wanting to rip
| down the whole rotten edifice.
|
| We've lived through 4 or 5 decades of unprecedented
| expansion of the domain to which IP lays claim. Surely
| it's time for the pendulum to swing the other way?
| JoshTriplett wrote:
| You're welcome to use a "do whatever you want" license on
| your code, and people should respect that. (Though even
| those licenses tend to require attribution, and copilot
| doesn't do even that.)
|
| Other people use licenses that try to create a commons
| where if you want to use it you need to share your own
| code, as a counterpoint to the non-commons in which you
| can't use code at all. And if people use those licenses,
| they should be respected as well.
|
| By all means, eliminate copyright, and let all code be
| copied freely. And until that happens, as long as
| proprietary code exists and doesn't let anyone copy it,
| respect copyleft licenses as well.
| andybak wrote:
| A fair point. "What to do in a world where copyright
| already exists" is a tougher question to answer and one
| in which I tend to go back and forth.
| Varqu wrote:
| People (github in this case) do something to make your life
| easier so that you can save time for the price of 1 latte per
| month and you complain?
|
| Software Developers seem to be the most whining profession in
| the world and I despise this attitude (while being a
| developer myself)
| tuckerman wrote:
| People aren't whining because the price is too high, they
| are upset because some (myself included) believe Microsoft
| is exploiting developers by copying their work against
| their wishes and then turning around and selling other
| developers a product which may or may not be generating
| code which violates copyright/patent licenses. A developer
| who inadvertently uses a copilot suggestion which gets them
| into hot water is going to be spending a lot more than
| the cost of a latte to defend themselves in court.
| sirsinsalot wrote:
| This. It is a matter of (a) consent and (b) compensating
| the people without whose data the model would be
| useless.
| Philadelphia wrote:
| Yep, anything useful has to be legal and welcomed.
| Microsoft should start breaking into people's houses and
| sorting their underwear drawers for them while they're out.
| Million dollar idea!
| Guid_NewGuid wrote:
| Yes they haven't paid it forward, or back, but why fight on
| the occupier's territory. By calling for legal frameworks to
| enforce this we accept the language and terms of the dominant
| party. By using courts and the law and creating new law for
| copyright we actually move further from the goal of
| abolishing copyright and IP entirely.
|
| Every time we use courts to enforce IP we're strengthening
| the Walt Disneys and Nintendos of the world.
|
| (I accept I am in a group of like 3 people with this goal but
| it's my view)
|
| Edit: to expand slightly more on this. People should be able
| to decompile/reverse engineer whatever the hell they want.
| They shouldn't have to worry about armed goons kicking down
| their doors. Every time cases are used to strengthen the
| enforcement of IP/licensing, whether for the light (FSF) or
| dark (Micro$oft, Google, etc) the outcome is the same, we
| move further from that goal.
| ozim wrote:
| Funny thing is ALL these legal frameworks are there to
| protect these 3 people like you.
|
| If there were no enforcement of IP/licensing or legal
| enforcement - M$, Google etc. would not be nice - they
| would just come over, kick your doors in and cut your head
| off because they could do so. With a legal framework they at
| least have to ask someone else.
|
| You just have to understand you don't stand a chance with
| your 3 buddies against 10 motivated attackers.
|
| Writing about "accepting terms of dominant party" you
| clearly never had a robbery at your house - imagine now
| corporations doing the same when there would be no legal
| frameworks.
|
| Read up on Dutch East India Company - or just Nestle -
| Microsoft or Google are still quite nice companies with
| Walt Disney and Nintendo.
| Guid_NewGuid wrote:
| This is a slight misreading of my general political
| position. I am pro-government in general. I find the term
| "monopoly on violence" to generally indicate someone who
| lives a very cosseted and easy life who can spend time
| getting mad about like, seatbelt laws or speed limits, so
| I use it somewhat tongue-in-cheek.
|
| There's quite a lot of possibilities between DMCAs of
| youtube-dl repositories and Big-co death-squads
| decapitating people in their homes. I'd prefer where we
| are now to the Brazil end of that spectrum but we can
| imagine better models of digital and intellectual
| 'property'.
| zzo38computer wrote:
| I also agree to abolish copyright and IP entirely.
|
| I agree that people should be able to decompile/reverse
| engineer whatever the hell they want.
|
| And if armed goons (whether government or if they are
| Microsoft or some company) kick down your doors, then they
| should be arrested for trespassing.
| matheusmoreira wrote:
| > the goal of abolishing copyright and IP entirely
|
| Completely agree with you. It's the 21st century, once data
| has been published there is no controlling it anymore and
| all attempts to do so lead to the destruction of computer
| freedom. No doubt people all over the world copy code every
| single day with nobody even finding out about it. I'd
| rather get rid of all these monopolists than limit the
| potential of computers to whatever reality enables them.
|
| >I accept I am in a group of like 3 people with this goal
| but it's my view
|
| Now we're four.
| handoflixue wrote:
| > Every time we use courts to enforce IP we're
| strengthening the Walt Disneys and Nintendos of the world.
|
| Can you actually point to substantial examples where Disney
| or Nintendo benefited significantly from a precedent set by
| an open source court case? Open source has been around for
| decades, so it should be trivial to find numerous clear-cut
| examples at this point... if your theory is actually
| correct.
| Guid_NewGuid wrote:
| No, I honestly have no idea. I know nothing about the law
| and understand even less. I may be wrong about all of
| this, but if we take the (laughable) idea of justice
| being blind it stands to reason any precedent that
| protects a single open source developer also protects
| Amazon's code.
| JoshTriplett wrote:
| Proprietary software is more than willing to use those
| legal frameworks. Unilaterally disarming while your
| opponent does not is a losing strategy.
|
| As long as copyright exists, copyleft should be respected.
| spullara wrote:
| It absolutely adds to the common good in the form of people
| using it to write more open source code.
| sirsinsalot wrote:
| Seeing as copilot is known to output code that's a straight
| copy from non-permissive code where the author's permission
| wasn't obtained ... I'd say it is helping you steal from
| code authors without giving back (as there is no obligation
| to open source your code).
|
| Given Microsoft's record of pursuing IP violations
| aggressively through the legal system, I'd say the whole
| thing is ironic.
| jppope wrote:
| > "Yes but this copilot model takes that, adds value and
| doesn't itself join the public common good. Instead it takes
| it, and makes you pay to have it back in another form."
|
| $10/month ... how much do you think this thing cost to
| build, and to maintain?
| nightski wrote:
| That's the whole point. Without the data, it would be
| worthless. Microsoft is not paying the full cost because it
| is ripping the data without asking consent. I'm not saying
| what they are doing is illegal per se, but it's definitely
| immoral.
| Guid_NewGuid wrote:
| But why is it immoral? All that code is still out there,
| if I had the time and the resources I could build a
| language model. Unlike commons in the real world (e.g.
| land, fresh water, etc) a code commons is purely
| additive. With the release of Copilot (which I don't
| intend to pay for or use) nothing has been destroyed,
| instead we'll get more code for less work where companies
| do pay for their developers to use it, some might even
| find its way back into the commons as new open-source
| code (whether more code of copilot generated quality in
| general is an unalloyed good is left as an exercise to
| the reader).
| bayindirh wrote:
| Because copilot is violating the terms I put for my code.
| My code is GPL. It cannot be put into projects with
| incompatible licenses. That's my code, and I share it
| with strings attached. You can't just copy my code and
| sell to other parties no strings attached.
|
| If that's fine and dandy, Microsoft should also train
| Copilot on their source code repositories, so we can use
| that knowledge, too.
| visarga wrote:
| It costs money to run a huge language model with low latency,
| in the loop with you - charging $10/month is reasonable. You
| need multiple GPUs to load even a single copy. Copilot is
| adding something extra to the original code - it selects the
| recommendation from the whole corpus, while taking the
| surrounding context into consideration and adapting to your
| variable names.
|
| And in reality 99.9% of the generated code has no long ngrams
| in common with the training set, it's already original. All
| they need to do is ensure it never generates output
| identical to the training set, something that can be
| implemented with a Bloom filter; then the generated code is
| impossible to attribute and should have no legal problems.
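
The Bloom-filter check described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not how Copilot actually works: the class name `NgramBloomFilter`, the 8-token window, the whitespace tokenizer and the filter size are all hypothetical choices made for the example. A Bloom filter can produce false positives (novel code flagged as seen) but never misses an n-gram that was actually added.

```python
import hashlib


class NgramBloomFilter:
    """Hypothetical Bloom filter over token n-grams (illustrative only)."""

    def __init__(self, num_bits=1 << 20, num_hashes=4, n=8):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.n = n  # n-gram length in tokens (assumed value)
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, ngram):
        # Derive num_hashes bit positions by salting a SHA-256 digest.
        data = " ".join(ngram).encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def _ngrams(self, text):
        tokens = text.split()  # naive whitespace tokenizer (assumption)
        for i in range(len(tokens) - self.n + 1):
            yield tuple(tokens[i:i + self.n])

    def add_training_text(self, text):
        # Set the bits for every n-gram found in the training text.
        for ngram in self._ngrams(text):
            for pos in self._positions(ngram):
                self.bits[pos // 8] |= 1 << (pos % 8)

    def probably_seen(self, ngram):
        # True only if every bit for this n-gram is set.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(ngram))

    def overlaps_training(self, generated):
        # True if any n-gram of the suggestion is (probably) in the training set.
        return any(self.probably_seen(g) for g in self._ngrams(generated))


bf = NgramBloomFilter()
bf.add_training_text(
    "float y = number ; long i = * ( long * ) & y ; i = 0x5f3759df - ( i >> 1 ) ;"
)
print(bf.overlaps_training("long i = * ( long * ) & y ;"))  # verbatim span
print(bf.overlaps_training("for item in items : total += item . price"))
```

A suggestion flagged by `overlaps_training` could then be dropped or regenerated. Whether blocking exact matches settles the licensing question is a separate matter that the sketch does not address.
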
|
| In the end what do models like Copilot do? They act like
| culture - absorbing and replicating memes. They free the
| knowledge and make it reusable. They can act like a general
| purpose NLP tool for information extraction, classification
| and text generation. You can implement your ideas faster with
| it, don't need to label much data.
|
| It works even with just a prompt. Try OpenAI Codex to extract
| a receipt to see what I am talking about - it gives you the
| output in JSON. It's a new tool and a new interface to the
| computer. There are going to be plenty of open source
| implementations as well, some are already under training.
| nonbirithm wrote:
| I think because this kind of ML is so new, we have no choice
| but to frame arguments for/against in terms of the structures
| that have been in place for decades past (copyright, open
| source licenses). We don't yet have the legal language to
| express dissent against ML in clear yes or no terms.
|
| I think if there were an option to add a machine learning
| clause and ask individual creators if they wanted it applied in
| that context, we would see a considerable amount of uptake.
| It's just that we couldn't foresee this progress happening so
| soon, and the issue is still not visible enough. I think it's
| only a matter of time before the culture catches up and new
| creative works in the coming years are excluded from training
| sets by their authors with clear and direct language.
|
| By that point there would be no way to argue "but they
| shouldn't care, they licensed it like this, so I'm assuming
| it's fine for ML use."
|
| If copyright is not enough to stop another entity from using a
| person's data for training, then some other protection should
| be invented that does.
| bayindirh wrote:
| > I find this whole topic very annoying, this is like the 3rd
| variation to reach the front page today.
|
| Me too. I also find three iterations of the same subject not
| enough discourse. We need to take this matter more seriously.
|
| > But it has made me realize why I instinctively dislike Free
| Software as a movement.
|
| On the other hand, this whole discourse reminds me why I
| absolutely love Free Software as a movement.
|
| > Copyright and licensing are bad, actually.
|
| This is why we have "Copyleft".
|
| > Stop getting into a frenzy of arousal about the police
| kicking down doors to drag Billy Gates to jail because 80
| characters of fast square root is theft but 79 isn't.
|
| And, stop getting into a frenzy of arousal about being able to
| use any and every code piece you see elsewhere in any project
| regardless of its license.
|
| > Where on earth is the ambition and vision!? Knowledge is
| public domain. A commons of knowledge is a public good. The
| cost of code copying is zero.
|
| This is why the GPL is important. It forces knowledge to evolve
| in the open, stay in the public domain and actually serve the
| public good. It also doesn't hinder ambition and vision, by not
| taking it to the private domain and keeping it open to everyone.
|
| > Sure in our day job we have to pretend to care about this
| stuff. But when did the ideological scope of what can be
| achieved become rules lawyering over license text.
|
| You might be pretending to care about this in your daily job,
| but we really care. Some of the projects I take part in can't ever
| include GPL code (because the projects are MIT licensed). These
| texts are court-tested licenses, so they're as proper and
| serious agreements as the EULAs of "particular" software
| companies.
|
| > Copy my MIT licensed code without attribution? I don't give a
| shit, go ahead, I hope it helps, in fact I want a truly public
| domain license but copyright law is so hostage to corporate
| interests no such thing exists in many countries.
|
| If I want my code to be copied and possibly closed, I'll
| license it with MIT or BSD-0 and forget about it, but if I'm
| licensing my code with GPL3, it means I want that code to stay
| open. As a licensor, I expect anyone using that code to respect
| that license.
|
| > Free the code.
|
| Yes, and respect the license the author selected for his/her
| code.
| georgeecollins wrote:
| You may not care about licensing or copyright, and I imagine
| many others who create code under an attribution license don't.
| That's still not the same as saying "copyright and licensing
| are bad." Too many businesses depend on them to exist for me to
| have that opinion.
|
| If an AI takes a copyrighted work and makes its own version-- say
| combining two novels by popular authors in a way that is unique
| but keeps large parts of the text intact, can I sell that? I
| think if I were the authors I would be unhappy.
|
| Also, how hard would it be for copilot to include a comment
| saying "// I got this line from x repo" when you are copying
| from a new repo? I am guessing not hard at all. Then at least
| the user would be aware of where their code was coming from and
| could be expected to make a judgement. If the line is "let a =
| b" then probably no worries. But if it is hundreds of lines of
| a simulation, all from the same repo with no changes, then I
| think some attribution is good for both parties.
| Guid_NewGuid wrote:
| Don't get me wrong, I know this (copyright abolition) is pie-
| in-the-sky stuff. I'm using an anon account to post because
| even advocating for it could be troublesome for employment.
| But I don't accept we have to be meek or have small goals in
| talking about this ideological stuff. And I think this has
| made me realise why I find the Free Software vision so
| disappointing and weak. And hence why I find all these
| (ideologically) Free Software aligned takes of sending Billy
| to jail for a thousand years so irritating.
| Schroedingersat wrote:
| The problem with this is 'freeing the code' in this instance
| leads to microsoft building a wall around it and asserting
| complete control in a few years.
|
| Copyleft exists for a reason and without the ongoing fight for
| the commons we lose it all.
| vajow46267 wrote:
| So glad this sentiment is becoming more common in the OSS
| community! I MIT license everything, if someone wants to make
| money using stuff I wrote that's awesome, and I wish them the
| best.
|
| I don't think users owe me anything at all. If people want to
| PR back that's cool but if not that's cool too.
| wcoenen wrote:
| > _I want a truly public domain license_
|
| I think this sentence contradicts itself.
|
| A "license" implies that there is a copyright holder who allows
| usage of the work under the terms of said license.
|
| While "Public domain" implies that there is no copyright holder
| (e.g. because the copyright expired, was explicitly waived, or
| is for some other reason not applicable).
|
| If you want to put your work in the public domain, you can do
| so; simply include a note saying that you dedicate it to the
| public domain.
| Guid_NewGuid wrote:
| You're right that it does contradict itself, but the
| unfortunate situation is that public domain declarations
| don't work and would make it harder for people to use your
| code safely in the current licensing model. The closest
| options are Unlicense and CC0 afaict and both don't work in
| many European jurisdictions.
|
| I just want people to be able to take my code and do whatever
| the hell they want with it (including commercially) and
| optionally contribute to it. Having a license currently makes
| that easier but every time the Free Software lot goes
| zooming off into the weeds of GPL v3 versus GPL v2 versus
| LGPL my eyes roll back into my head and I internally start
| screaming "get a life!".
| notacoward wrote:
| I suggest you read up on the history of free software and open
| source. It exists as a reaction to intellectual enclosure, to
| prevent that ill and create greater freedom of ideas. Yes, it
| uses the tools of copyright to fight greater ills of copyright,
| because those are the tools available, and actions like these
| are necessary to keep the enclosure from happening all over
| again. Anyone who has actually studied the matter for even five
| minutes can see how silly the "free software is anti-freedom"
| FUD is.
| ssalka wrote:
| Information wants to be free
| mplanchard wrote:
| If that's what you want, you should license your code not under
| MIT, but under a license that allows replication/distribution
| without attribution. Meanwhile, others who do care about such
| things can license their code under licenses that require
| attribution/copyleft/etc.
| Guid_NewGuid wrote:
| But I can't really because the legal systems for it don't
| exist. I can't relinquish anything
| https://softwareengineering.stackexchange.com/questions/1471...
| (CC0 looks closer but still doesn't do what I'm after).
|
| And I can't because there are a bunch of, for want of a
| better word, dweebs who care about this stuff. I don't give a
| single solitary frick about the finer points of MIT vs GPL vs
| BSD 3 clause vs CC-BY-NC or whatever-the-hell. But y'all are
| forcing me to care by making the legal frameworks for
| software ever more strict and confusing.
|
| I take a maximalist view, don't want the code copied, sliced
| up, re-used in any form whatsoever with no credit? Don't post
| it on a code sharing site. Like I say in the OP, in my job I
| obviously have to follow the rules, but on an ideological
| level I'll ignore them where I can get away with it outside
| of work.
|
| If you don't want the code to be used, don't post it online,
| tuckerman wrote:
| I'm curious if this view is software specific or relates to
| any work released online? For example, do you feel
| similarly about a novelist or graphic artist? I reckon at
| least a few software engineers look at what they produce
| not entirely differently from how an artist or writer looks
| at theirs.
| Guid_NewGuid wrote:
| It's a good, and thought-provoking, question.
|
| First to be flippant the idea of a software developer
| with that view sounds so unbearably insufferable and full
| of themselves I hope never to meet one. All code is
| terrible, be less attached.
|
| Stream of consciousness: Should artists or writers be
| paid for what they produce? Yes. So why not software
| developers? I'm paid for what I produce. But then I don't
| release the stuff I'm paid for for free on the internet.
| But I'm against DRM, I also think Winnie the Pooh
| shouldn't have IP protection (now expired). What makes
| art or literature a different commons from software? I
| also think all scientific journals should be available
| for free. Do artists and writers have an alternative
| route to make money from what they publish, what is the
| artistic or writer equivalent of open source? I think
| this is the crux of it, if we're going to do open source
| let's actually do it and stop being precious about it but
| this only applies to freely-entered open source. So does
| that mean I support some form of copyright after all?
| Then again some old out-of-print books will sell on
| Amazon for like $4000 so we should be able to copy those
| for free.
|
| Ultimately it's a question of what a vision for society
| without copyright would look like. I think software is
| uniquely placed to start exploring that idea. How would
| we make a living off software if anyone could reverse
| engineer (even our proprietary) code freely and safely?
| tuckerman wrote:
| The reason I ask with writers in particular is because,
| like code, having access to it necessarily means that the
| viewer has the ability to copy it as much as they'd like.
| Unlike software, however, there is no ability to keep the
| source code private in a book while still having users.
|
| I definitely agree that copyright protections have become
| far too strong but I don't think we can really ever know
| if we would have been able to build the strong open source
| community we have today without coopting the copyright
| system for copyleft protections. At the same time,
| perhaps we are past the point where it's necessary and
| now it's holding us back... it's entirely possible!
|
| To the first thought, I personally see some coding as a
| creative act (some is doing _a lot_ of work there
| though). It's not because I fancy myself a Picasso but
| because I think some (again, doing a lot of work!)
| solutions/ideas have a bit of their creator in them and,
| for those works, the author should be able to exert some
| control over their works. I think this is more
| philosophical than legal/political, but I would disagree
| that it's flippant :)
| kube-system wrote:
| > Free Software
|
| > public domain
|
| These are incompatible concepts. RMS's vision of 'free-as-in-
| freedom' software doesn't let people do whatever they want. It
| forces those who distribute binaries to also distribute source.
| This is not possible with a public domain work.
| monocasa wrote:
| The issue is that whether the free software people want it or
| not, the copyright system over code exists, and historically
| has been used as a cudgel against smaller players. If we got
| rid of copyright over code entirely I'd totally be down for
| this. And IIRC RMS has said the same thing; that he'd be in
| favor of the removal of copyright over code as a concept even
| if it meant neutering the protections of the GPL.
|
| Until that happens, and copyright protections are still used by
| larger entities, using the same system to protect yourself and
| (more importantly) your users isn't turning your back on your
| ideals, but instead simply adjusting your strategy to the
| current material conditions. Remember that Google v. Oracle
| (while ultimately a win versus what could have been) was a step
| back, with de minimis claims left on the table as not a valid
| defense. The playing field is heavily slanted towards the big
| players and software freedom requires every tool it can put
| its hands on at the moment.
| Guid_NewGuid wrote:
| Interesting that he's said that, I wasn't aware.
|
| I think at its root the problem is copyleft is a mirror image
| of copyright. It relies on and replicates all the cultural
| and legal requirements and constraints of the copyright model
| and curtails an imagining of other possibilities. Every
| sentence or thought spent on copyleft is misdirected in my
| view.
|
| Which is why I find Microsoft doing this (potential) en-masse
| license violation and then a bunch of GPL folks getting mad
| pretty funny overall. I just find the high and mighty tone
| annoying, like sure, they've (allegedly) screwed you, but
| they're going to (theoretically) get away with it because
| they're rich and powerful, sorry that didn't turn out how you
| wanted.
| Kbelicius wrote:
| >I think at its root the problem is copyleft is a mirror
| image of copyright.
|
| That is the (only) point of copyleft. If it weren't for
| copyright it wouldn't exist. Fight fire with fire, that
| sort of thing.
| [deleted]
| zzo38computer wrote:
| > The issue is that whether the free software people want it
| or not, the copyright system over code exists, and
| historically has been used as a cudgel against smaller
| players. If we got rid of copyright over code entirely I'd
| totally be down for this. And IIRC RMS has said the same
| thing; that he'd be in favor of the removal of copyright over
| code as a concept even if it meant neutering the protections
| of the GPL.
|
| As someone else asked, I would also want a citation, but I
| agree.
|
| Actually, I want a license that you can do pretty much
| anything you want to do with it (including: lack of
| attribution, distribution without source codes, distribution
| with source codes (whether they are the original source codes
| or reconstructed), lack of copyright notices, reverse
| engineering, circumvention of your own copy and write reports
| about anything you want to do, to use or not use the software
| (and to modify or not modify) at your choice, etc), but that
| you are not allowed to add further legal restrictions to it
| (with a few exceptions dealing with trademarks (but not all)
| and allowing conversion to GNU (A)GPL 3 and CC-BY-SA 4.0 if
| you are able to satisfy the conditions of those licenses) or
| to derivative works, and that if someone will try to use
| legal processes against you relating to this, then anyone can
| countersue.
| matheusmoreira wrote:
| > And IIRC RMS has said the same thing; that he'd be in favor
| of the removal of copyright over code as a concept even if it
| meant neutering the protections of the GPL.
|
| Do you have a citation? I was under the impression he
| defended copyright because copyleft depends on it.
| marpstar wrote:
| > Copy my MIT licensed code without attribution? I don't give a
| shit, go ahead, I hope it helps
|
| This is my feeling as well. I don't build stuff in the open so
| that I can get bent out of shape at someone not properly
| licensing it. It's in a _public_ repository, FFS... I assume
| that if anyone even notices my repo, that they may copy/paste
| a few lines out of my solution if it helps them.
| sirsinsalot wrote:
| But this isn't everyone's feeling. And they have a right to
| choose how their work is used. That's the basis of commerce
| being possible here.
|
| The mechanised license ignorance and the way original authors
| are not compensated is the issue.
|
| If you had a repo you'd worked really hard on, and offered a
| commercial license or GPL depending on the use (so you can be
| funded to work on it) ... do you think it is fair that
| copilot ingests that code and allows others to benefit from
| your work and knowledge without the commercial license as you
| intended?
|
| Note how Microsoft always throws out the capitalism "rules of
| engagement" when it benefits them and undermines everything
| else. The fact we are even trusting the situation Microsoft
| are creating is dire, and speaks to the short memory of our
| industry.
| alar44 wrote:
| Saying an auto complete of a line of code is "using their
| work" is a massive stretch.
| sirsinsalot wrote:
| It isn't autocompleting "a line of code", it completes
| whole function bodies.
| cududa wrote:
| Exactly! Do they really think every single line of their code
| is so precious it requires attribution? If I publish code, I
| assume it might get pushed, pulled, refactored in a million
| ways and no one will ever know my name's attached to it. And
| guess what? I DON'T CARE. It's code. Not a self-constructed
| monument to my own intelligence that needs a little placard
| with my name on it to follow around some clever async
| function I wrote
| georgeecollins wrote:
| If it's a couple lines of generic code, of course. That's
| also an indefensible copyright, btw. But if it's hundreds of
| very specific lines of code written to do one thing under a
| license you don't follow, that's something else.
|
| This isn't just an issue of code. You can write a program
| that combines songs, or combines novels creating a
| different work that has sections that are essentially the
| original protected work. I don't think the authors of those
| novels are going to be OK with you selling or giving away a
| version of their work just because an AI edited it or
| combined it somehow.
| dougmwne wrote:
| In this thread: many engineers nervously sweating. The moats
| are drying up and the wizards are about to be thrown out of the
| castle. This tech is the first product in a long line of
| products that will massively lower the barrier to entry. It has
| been a good run, but it was never going to last forever. We are
| not part of the capitalist class and were never going to be.
| LordDragonfang wrote:
| Copilot replaces code monkeys, not engineers. Ultimately it's
| just faster stack overflow, proper software engineers and
| system architects are going to be just as in demand as they
| are right now for the foreseeable future. At the point at
| which that stops being the case, we'll have much bigger
| societal and existential problems (because it implies the
| singularity is nigh)
|
| (You're correct on not being part of the capitalist class,
| though)
| dougmwne wrote:
| There are a lot of code monkeys out there and I might be
| one of them. That island of job security seems like it will
| be shrinking.
| ThalesX wrote:
| The world might change, but software engineers have been
| working with and within change their entire careers
| presumably. I think we'll be OK, as people, no matter what
| happens.
|
| I was sweating nervously before I started using Copilot
| a while ago but I've stopped since because A - it really
| doesn't replace me, tried really hard; B - I don't sweat
| nervously for IntelliSense either.
|
| There's also C, where being of an entrepreneurial mindset,
| I'd love the opportunity to hand over the software to an AI
| dev and just direct the implementation to my desire until I
| have a working product. I bet I could secure a higher room in
| the castle if instead of coding for 8 hours per day I could
| work on n products with capable AI Software Engineers. We're
| not there yet though.
| captainbland wrote:
| If we're all standing on the shoulders of giants (specifically
| code that other people wrote) then really what Copilot is selling
| is a ladder to get onto those shoulders faster. I think that's a
| legitimate aim, as such. However it should be careful about not
| including unlicensed code and should have a specific 'GPL' option
| for a model trained with GPL code included.
|
| I suppose it should also generate appropriate copyright notices
| to satisfy many open licenses. I'd be surprised if copilot could
| actually link back to the original code like that, though.
| jarenmf wrote:
| I guess the question is where you draw the line between a
| derivative work and "learnt by an AI algorithm"
| asimpletune wrote:
| Who needs a line when there are plenty of obvious examples
| lifted verbatim?
| triknomeister wrote:
| If the media copyright industries and their ContentID is
| anything to go by, it doesn't matter. It's all derivative.
| presentation wrote:
| Google just sells content other people wrote.
| bborud wrote:
| Well, this does invite an interesting comparison. If we imagine
| something like Copilot applied to music I believe the chances of
| ending up in court would be pretty high. There are a lot of
| examples of plagiarism lawsuits in popular music and the outcome
| seems to be entirely random.
|
| One could argue that the information density in chord
| progressions, bass lines and beats is extremely small. And that
| any recognizable part of a musical idea that has been "borrowed"
| would necessarily make up a larger percentage of the complete
| work than would be the case for a typical application with
| borrowed snippets.
|
| That's not a bad argument, but it is unsatisfactory because it
| means that at some point someone has to make a judgement on how
| much you can borrow.
| ThereIsNoWorry wrote:
| 1. You most likely agreed to that by using GitHub.
|
| 2. Copy&Pasting Code by manual search exists.
|
| 3. This is just a smart tool so you don't have to figure out
| yourself what to copy&paste (in the best case) and save a lot of
| time.
|
| Sometimes I truly wonder how people can genuinely be upset about
| things like this. What is broken are copyright and patent laws in
| the 21st century.
| zufallsheld wrote:
| As to your first point, there are many repositories on github
| that the author of code did not upload there or where not all
| contributors to the code are on github or agreed to let their
| work be used in such a case.
| redox99 wrote:
| That's really no different than somebody uploading
| proprietary code they don't own (stolen, leaked, whatever
| reason etc) on Github. Github has to assume that you are
| allowed to do so. What are they going to do otherwise,
| somehow manually verify that each repository is legit?
|
| Now you might say, what about GPL code you don't own. You are
| allowed to redistribute it (upload to github). But because
| you are not the owner you can't license it to Github under
| new terms (that allow them to use it for ML training). But
| the question still is, is there anything in the GPL that
| forbids its code being used for ML training? Even if the
| generated model is proprietary, has no attributions, etc?
| megous wrote:
| Ok, takedown requests exist. Say Qualcomm finally wises up
| and asks github to take down a copy of the millions of lines of
| their super proprietary 4G modem firmware implementation
| from github. Will github retrain the model after each such
| takedown? :D
|
| If not, then it's kinda stupid to argue the point about the
| lack of knowledge, since lack or not lack of knowledge
| clearly doesn't matter. Github will happily continue using
| confidential code even from trigger happy companies like
| Qualcomm for copilot.
| redox99 wrote:
| I guess they would add some kind of filter to copilot
| output that removes results that clearly come from code
| that was DMCAd.
|
| It's kind of like some employee that worked at Qualcomm
| and has seen the code. Do you retrain him (aka hit his
| head until he forgets) after leaving the company?
|
| The comparison might seem silly but as AI advances I
| expect more and more arguments (especially in court) to
| come from analogies of humans and AIs.
| megous wrote:
| What kind of filter? I thought copilot does not output
| the input data verbatim.
|
| Creating an output filter based on millions of lines of
| DMCAd code that would not cripple the copilot output
| completely at the same time, sounds like one of those
| hard problems. Especially if there's no agreed upon
| definition of copyright "violation" here.
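|
| As a rough illustration of the kind of output filter discussed
| above, here is a minimal sketch that compares a suggestion
| against token n-grams taken from removed files. Everything in
| it is an assumption made for illustration: the function names,
| the 8-token window and the 0.5 threshold are hypothetical, and
| nothing here describes how Copilot actually works.

```python
import re
from typing import List, Set, Tuple

Shingle = Tuple[str, ...]


def shingles(code: str, n: int = 8) -> Set[Shingle]:
    """Return the set of n-token windows found in `code`.

    Tokens are identifiers, integer literals, or single punctuation
    characters, so whitespace and indentation changes are ignored.
    """
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", code)
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def build_blocklist(taken_down_files: List[str], n: int = 8) -> Set[Shingle]:
    """Collect every n-token window from the removed files (hypothetical input)."""
    blocked: Set[Shingle] = set()
    for source in taken_down_files:
        blocked |= shingles(source, n)
    return blocked


def overlap_ratio(suggestion: str, blocked: Set[Shingle], n: int = 8) -> float:
    """Fraction of the suggestion's windows that also appear in the blocklist."""
    windows = shingles(suggestion, n)
    if not windows:
        # Suggestions shorter than n tokens have no windows to compare.
        return 0.0
    return len(windows & blocked) / len(windows)


def should_suppress(suggestion: str, blocked: Set[Shingle],
                    n: int = 8, threshold: float = 0.5) -> bool:
    """Flag a suggestion when enough of it matches removed code (assumed threshold)."""
    return overlap_ratio(suggestion, blocked, n) >= threshold
```

| A filter like this only catches near-verbatim matches: renaming
| a variable changes every window that contains it, and suggestions
| shorter than the window are never flagged, which is one reason
| the problem looks hard.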
| keraf wrote:
| The point of this Tweet is about licensing. When using an MIT
| licensed library for example, you would have to give
| attribution. But you can easily rewrite a portion of that library
| yourself using Copilot, which could potentially use code from
| the initial lib, without any attribution whatsoever. It's
| even more problematic with licenses such as the GPL.
|
| I guess Copilot could address this by checking the licenses of
| the projects it uses. Even when combining code, it could pull
| in the required attribution or avoid GPL licensed code (unless
| enabled) for example.
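|
| A minimal sketch of the license check described above, assuming
| each source project comes with an SPDX license identifier and a
| copyright notice. The Repo type, the function names and the two
| license sets are hypothetical and deliberately incomplete; this
| is not how Copilot selects or attributes code.

```python
from dataclasses import dataclass
from typing import List, Optional

# Small, non-exhaustive sets of SPDX identifiers chosen for illustration.
PERMISSIVE = {"MIT", "BSD-2-Clause", "BSD-3-Clause", "Apache-2.0", "ISC"}
COPYLEFT = {"GPL-2.0-only", "GPL-2.0-or-later",
            "GPL-3.0-only", "GPL-3.0-or-later", "AGPL-3.0-only"}


@dataclass
class Repo:
    """Hypothetical metadata record for one source project."""
    name: str
    spdx_id: Optional[str]  # None when no license could be detected
    copyright_notice: str


def select_sources(repos: List[Repo], allow_copyleft: bool = False) -> List[Repo]:
    """Keep permissive projects; include copyleft ones only on request.

    Projects with an unknown or unlisted license are always excluded.
    """
    allowed = PERMISSIVE | COPYLEFT if allow_copyleft else PERMISSIVE
    return [repo for repo in repos if repo.spdx_id in allowed]


def attribution_notices(repos: List[Repo]) -> List[str]:
    """Build one attribution line per selected project, sorted by name."""
    return [f"{repo.name} ({repo.spdx_id}): {repo.copyright_notice}"
            for repo in sorted(repos, key=lambda r: r.name)]
```

| With allow_copyleft left off, GPL projects and projects with no
| detectable license are both dropped, and the notices for what
| remains could be shipped alongside the generated code.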
| SahAssar wrote:
| > 1. You most likely agreed to that by using GitHub.
|
| Are you saying that I would need all the original authors'
| consent to upload a repo to github even if I include all the
| original attribution and licenses? Because what you are
| implying is that when uploading I'm granting github a license
| far outside the bounds of the license included, which only
| _all_ the contributors can do. For example, would the linux
| project need to contact each and every contributor ever to
| upload a mirror to github, since their contributions were under
| GPL but you are implying that the license given to github is
| much, much broader?
|
| This would make any project not originally started on github
| and with a few contributors basically impossible to host there.
|
| > 2. Copy&Pasting Code by manual search exists.
|
| The question is who is doing the infringement here. Github
| copilot is obfuscating the copying and telling its users that
| the code is theirs to use, own, etc. as they please but is also
| taking large chunks of code it does not have the right to
| redistribute, even less grant licenses to.
| dmix wrote:
| > Sometimes I truly wonder how people can genuinely be upset
| about things like this
|
| 90% of Twitter is just inventing new ways to whine about things
| ParetoOptimal wrote:
| There's some truth there, but there is more negative in
| outright dismissing the uncomfortable but important ethical
| dilemmas one might be introduced to.
| teakettle42 wrote:
| > Sometimes I truly wonder how people can genuinely be upset
| about things like this.
|
| Tell me you regularly plagiarize without telling me you
| regularly plagiarize.
| ThereIsNoWorry wrote:
| Code plagiarization is not a thing for all practical purposes
| (it's even almost impossible to go to court with that for
| very obvious reasons). And that's good. Because with that
| insane lockdown of "Intellectual Property" nothing would ever
| get done. So, think what you want.
| teakettle42 wrote:
| > Code plagiarization is not a thing for all practical
| purposes
|
| Of course it is. Plagiarism is "the practice of taking
| someone else's work or ideas and passing them off as one's
| own."
|
| It's unethical and it will get you fired at any reputable
| company.
| ThereIsNoWorry wrote:
| Ok, then there doesn't exist a single reputable company
| with a tech division and we're all unethical. Have a nice
| unethical day.
| teakettle42 wrote:
| > Ok, then there doesn't exist a single reputable company
| with a tech division and we're all unethical. Have a nice
| unethical day.
|
| I'm deeply disturbed that you think this form of
| plagiarism is universal -- I can assure you that is not
| the case.
|
| I work at a FANG currently, and plagiarism is absolutely
| not tolerated.
|
| In fact, plagiarism has been considered a fireable
| offense at every other company I've worked at over my 25
| year long career, and prior to that, considered a serious
| form of academic misconduct in school.
|
| It's clearly unethical and I've never plagiarized in my
| life.
|
| I've only run into one instance of someone else
| plagiarizing code in my career, and that individual was
| fired.
| ParetoOptimal wrote:
| > I'm deeply disturbed that you think this form of
| plagiarism is universal -- I can assure you that is not
| the case.
|
| > I work at a FANG currently, and plagiarism is
| absolutely not tolerated.
|
| It's universal in any company that doesn't take measures
| against it. So basically startups, small, medium, and
| even some large companies.
| ThereIsNoWorry wrote:
| I'm disturbed you believe regurgitating code snippets is
| plagiarisation.
| teakettle42 wrote:
| It's literally plagiarism by definition.
| ThereIsNoWorry wrote:
| https://stackoverflow.blog/2021/12/30/how-often-do-
| people-ac...
|
| I feel like you're arguing in bad faith. So, whatever.
| teakettle42 wrote:
| Explaining a fundamental ethical concept you should have
| learned in primary school when writing your first book
| reports is not arguing in bad faith.
|
| SO's license requires attribution.
|
| If you don't want to be a plagiarist, you either need to
| include attribution, or you need to rewrite the solution
| in entirely your own words.
| ThereIsNoWorry wrote:
| So, rearranging conditionals or loops or variables then,
| problem solved. You cannot 1:1 copy paste anyway. That
| never works. You always have to adapt it to your
| particularity. So it's "reworded" by default. And CoPilot
| is doing nothing else. It's not just 1:1 memorising code,
| it's a tiny bit smarter than that. I strongly believe
| you're not a developer. Point taken. I understand your
| considerations. You should write code sometimes to solve
| a complex problem that uses some libraries and see how
| far you get without consulting the internet or books.
| ilikehurdles wrote:
| I absolutely attribute things I find on SO to where I
| found them. You finished college maybe a year ago and are
| already making some absolute judgments about what makes
| other people qualified to call themselves developers
| simply because they don't develop as you do.
| [deleted]
| teakettle42 wrote:
| > I strongly believe you're not a developer.
|
| > You should write code sometimes to solve a complex
| problem
|
| There's a very high chance you posted your comment from a
| device using code I wrote.
|
| Glueing together plagiarized code copied from SO, or
| stolen from OSS projects on GitHub, is not software
| engineering.
| [deleted]
| lin83 wrote:
| > I'm deeply disturbed that you think this form of
| plagiarism is universal
|
| This thread is an eye opener for me too. Do engineers not
| get trained on their legal obligations? My company is old
| and not a traditional tech company but we have been running
| workshops on the issue for years. Even if they don't,
| what about their legal teams? Or CI tools to scan for
| licence violations? Some of the responses here are so
| naive it's crazy. I hope no one is identifying the
| companies they work for.
| ThereIsNoWorry wrote:
| Obviously we do. Don't copy paste 10 pages of source code
| unaltered and sell it as your own.
|
| But that's something entirely different from small code
| snippets, changed and adapted to solve the same problem a
| thousand other people already had. That is all developers
| are doing when they go on GitHub, StackOverflow or any
| other website to find answers to their questions. That's
| not naivety, that's how coding works (partially). If you
| had to re-invent the wheel every time you build
| something new, good luck.
| lin83 wrote:
| There isn't a threshold for copyright violation. If you
| copy a 3 line function from a GPL library, you have to
| comply with the licence. Tools like BlackDuck will pick
| it up.
|
| Snippets aren't exactly defined but I see them as more
| than just a single line like "here's how to flatten a
| list in Python", it's some functionality - e.g. an
| algorithm implementation or some task.
| lin83 wrote:
| I don't think that's true and, if it was, it would be the
| death knell for open source.
|
| Code Plagiarism is taken very seriously by every company I
| have worked with. Multiple companies have been sued for
| violating the GPL. The SFC is currently fighting Vizio in
| court for example. While not commonplace, to say it's
| "almost impossible" is a stretch. Every large company
| complies with code copyright obligations for a reason. My
| company publishes changes to GCC and a dozen other GPL
| projects. Entire products like Protocode and BlackDuck
| exist to ensure code compliance. Even small code snippets
| are flagged.
|
| Over the past few years the source code for Windows, SQL
| server, Bing and Cortana have all been leaked. If someone
| built a product using that code, how long do you think it
| would take Microsoft to sue? CoPilot is one rule for mega-
| corps and another for everyone else.
| IdiocyInAction wrote:
| I don't think that something like CoPilot is what most GH users
| had in mind when they published their code. Also, licenses
| exist (which CP demonstrably doesn't give a shit about).
| oytis wrote:
| Copilot sells the service of finding the code that makes sense
| for what you write. Would be better if it could correctly
| attribute the source(s) though, I hope they will solve this
| problem at some point.
| sirsinsalot wrote:
| Beware geeks with gifts. This is Microsoft. The question isn't
| "is it good?" but "Why are Microsoft offering it and how is it
| undermining everyone else?"
| dougmwne wrote:
| Microsoft will benefit from cheaper and more productive
| engineers.
| lfrigodesouza wrote:
| It's as the saying goes, "when a product is free to use, the real
| product is actually you". In this case, our code is the product.
| I'm now considering switching to another git provider...
| floor_ wrote:
| I started self hosting when Microsoft bought github and with this
| mass theft of copyrighted material and then reselling it for
| money I'm even more happy with my decision.
| rictic wrote:
| Copilot very rarely copies code verbatim, and when it does it's
| very short snippets. When Oracle sued Google over allegedly
| copying short and fairly trivial snippets of code they were
| justly derided.
|
| I can't speak to the legal side, but I just don't understand the
| moral outrage over very occasionally copying such short snippets
| of code. The key innovations and the actual value that licenses
| are intended to protect aren't in these short snippets.
|
| And what does copilot bring to the community? Free use by
| students, free use by open source maintainers, and a huge boost
| in productivity for a modest fee for professional devs, for a
| service that no doubt costs a lot to run, even on the margin.
| aaron695 wrote:
| stakkur wrote:
| At every turn, in every instance, for decades, all stories
| involving Microsoft end in "...and then Microsoft fucked people
| over." I've witnessed this firsthand since the 80s.
___________________________________________________________________
(page generated 2022-06-23 23:01 UTC)