[HN Gopher] FSF-calls for white papers on philosophical and lega...
       ___________________________________________________________________
        
       FSF-calls for white papers on philosophical and legal questions
       around Copilot
        
       Author : non_sequitur
       Score  : 200 points
       Date   : 2021-07-29 16:09 UTC (6 hours ago)
        
 (HTM) web link (www.fsf.org)
 (TXT) w3m dump (www.fsf.org)
        
       | belorn wrote:
       | An interesting initiative from the FSF, though I suspect most of
       | the questions will be answered when someone attempts a similar
       | project in a more traditional copyright-restrictive area.
       | 
       | One example I would like to see is a Cosinger, where the AI is
       | trained using songs on YouTube and streaming services. With the
       | final product, a user starts to sing and the algorithm attempts
       | to sing along and give the singer suggestions for how the song
       | should continue. I could see how a lot of musicians would be
       | willing to pay good money for such a program, and removing
       | obligations to pay any money for the training set would make it
       | much more feasible to create.
       | 
       | There are already AIs that create music (though unlikely from
       | proprietary training sets). A Cosinger shouldn't be too far from
       | that.
        
         | antocv wrote:
         | A Cosinger would be illegal, unethical, profit-killing,
         | anti-democracy and ultimately anti our very own freedom to own
         | intellectual property. /s
         | 
         | The same difference as allowing Google to prosper while beating
         | down ThePirateBay, another search engine.
        
           | belorn wrote:
           | I predict it is very likely we will see a court case where a
           | smaller actor takes publicly available information as
           | training data and gets sued for copyright infringement. It
           | will be interesting to see if, just like in the Pirate Bay
           | case, the courts will be creative. In the TPB case, the
           | accused was found guilty under a Swedish anti-biker gang law
           | that was written with the intention to shut down biker bars.
           | 
           | When Copilot came out, one thing it reminded me of was the
           | ethical considerations of face generators in animation. The
           | output naturally has some similarities with the training
           | data, and it is trivial to use a limited set of actors in
           | order to create faces with uncanny similarities to the actors.
           | A question that people asked (here on HN if I recall) was if
           | you needed permission from those actors to use them in the
           | training set, or if this would allow anyone to "steal" the
           | faces of public figures and create semi-lookalikes that can
           | then be used in anything from porn to advertisement.
           | 
           | The law is undoubtedly going to catch up.
        
       | lamontcg wrote:
       | Given how the racist twitterbot AI turned out, along with L4
       | autonomous driving by 2017, I suspect that Copilot is going to
       | suffer most from an incredibly high velocity of churned out
       | security bugs and bad code. SWEs are probably going to get fired
       | for using it and companies will need to ban it, even if the legal
       | problems don't take it down.
        
         | c7DJTLrn wrote:
         | It's useless. It is a solution looking for a problem, much like
         | most "AI" tools these days. I am frankly frustrated at everyone
         | buying into this stunt.
        
           | phillipcarter wrote:
           | I don't think Copilot is useless at all. Today it's actually
           | been very helpful for me with interactive, notebooks-based
           | programming. And it's also just an early beta right now; as
           | the model improves and the tooling around it matures a little
           | more, I suspect I'll be using it a lot for interactive stuff.
           | 
           | Notebooks programming has a flow of "execute a small bit of
           | code, check the results, and iterate", and this fits
           | perfectly with Copilot since you still need to check if the
           | suggestions work.
           | 
           | Maybe this kind of programming is where Copilot finds a
           | niche, maybe not. I don't know. I'm skeptical of its use in
           | larger applications where you can't trivially check if the
           | code you wrote (with its help) did what you want. I think
           | there needs to be a lot more tooling built around that to
           | really make it compelling for larger applications like that,
           | likely in the form of more editor tooling integrations. But I
           | think it's promising. I wrote about that a little more here:
           | https://phillipcarter.dev/posts/four-dev-tools-areas-
           | future/...
        
       | ralph84 wrote:
       | Their link to why you shouldn't use GitHub[0] takes you to a page
       | where they criticize GitHub for complying with US export
       | controls. The FSF is a US corporation, why do they think that US
       | export controls don't equally apply to savannah.gnu.org? And
       | unlike FSF, GitHub has actually done the work of applying for
       | export licenses so that developers in US-sanctioned countries can
       | access GitHub[1].
       | 
       | [0] https://www.gnu.org/software/repo-criteria-
       | evaluation.html#G... [1]
       | https://github.blog/2021-01-05-advancing-developer-freedom-g...
        
         | fadjacent wrote:
         | GitHub could easily establish a non-US entity to host
         | export-restricted code. And for Savannah, if anyone had code
         | they were worried about with respect to export control,
         | Savannah would quickly and easily have an independent person
         | host that repo outside the US.
        
       | thomzane wrote:
       | I am excited to see where these questions lead.
        
         | grepfru_it wrote:
         | Something like this?
         |     [GitHub Copilot License Config Menu]
         |     Show suggestions with the following tags:
         |     - [ ] GPLv3
         |     - [x] GPLv2
         |     - [ ] AGPL
         |     - [x] CC-BY-SA
         |     - [x] Apache License
         |     - [x] MIT License
         |     - [ ] No License Attribution
        
           | dunham wrote:
           | Would that require generating 2^n models or can models be
           | combined?
        
             | blooalien wrote:
             | I would think that to combine the models, the software
             | would need some internal method to differentiate between
             | the licenses used by the various code sources it's pulling
             | its suggestion "ideas" from, and the compatibility between
             | the licenses of those sources and your own choice of
             | licensing for the project you're creating.
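
The idea in this subthread can be sketched without training 2^n models: keep one model and filter its suggestions by license after the fact. This is only a sketch under a strong assumption, namely that each suggestion carries the set of licenses of the sources it was derived from. Nothing in the thread says Copilot exposes such provenance. The names `Suggestion`, `source_licenses`, and `filter_suggestions` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    """A hypothetical code suggestion with assumed provenance metadata."""
    code: str
    source_licenses: frozenset  # licenses of the sources behind this suggestion


def filter_suggestions(suggestions, allowed):
    """Keep suggestions whose source licenses are all in the allowed set."""
    allowed = frozenset(allowed)
    return [s for s in suggestions if s.source_licenses <= allowed]


def models_needed_without_filtering(num_licenses):
    """One model per subset of licenses: 2^n, as asked in the thread."""
    return 2 ** num_licenses


# Checkbox state taken from the menu mock-up earlier in the thread.
allowed = {"GPLv2", "CC-BY-SA", "Apache License", "MIT License"}

candidates = [
    Suggestion("def a(): ...", frozenset({"MIT License"})),
    Suggestion("def b(): ...", frozenset({"GPLv3"})),
    Suggestion("def c(): ...", frozenset({"MIT License", "AGPL"})),
]

kept = filter_suggestions(candidates, allowed)
```

With seven checkboxes, a model per subset would mean 2^7 = 128 models, while the filter needs one model plus per-suggestion metadata. The filter only checks tags; it does not address license compatibility with the user's own project or the attribution requirements raised elsewhere in the thread.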
        
           | remram wrote:
           | Those licenses require attributions. You can't just say
           | "Copyright (c) all the projects indexed in Copilot".
        
             | grepfru_it wrote:
             | There's certain information I cannot share, but you can see
             | the general idea of what I'm throwing out here
        
           | tyingq wrote:
           | Other picklists might be handy too, especially something that
           | would narrow to higher quality sources.
           | 
           | And they need a report button with a picklist of reasons.
        
             | imoverclocked wrote:
             | E.g., cleanliness as described by different linters and
             | static-analysis tools? Can we actually make _better_ code
             | suggestions by choosing examples which are known to have
             | fewer super-obvious flaws?
        
               | tyingq wrote:
               | Linters would be nice, though the bar is pretty low. A
               | lot of the examples won't even compile :)
        
       | lights0123 wrote:
       | > It requires running software that is not free/libre (Visual
       | Studio, or parts of Visual Studio Code)
       | 
       | A little nitpicky, but the only proprietary part it requires is
       | the plugin itself, not the IDE--Copilot runs just fine with the
       | Free build of VS Code compiled from source from GitHub, after
       | flipping a switch to enable WIP APIs.
        
         | r283492 wrote:
         | I think you are wrong: https://vscodium.com/
        
           | lights0123 wrote:
           | VSCodium provides Free pre-compiled binaries of VS Code from
           | GitHub, like I was describing. What about it makes me wrong?
           | 
           | I did it two days ago, installing the Copilot plugin in a
           | Free build of VS Code provided by my distro.
        
       | davisr wrote:
       | The ignorance in this comment section is already giving me an
       | aneurysm. Software licenses matter. Copyright matters. If
       | megacorps like Microsoft can sue people into oblivion for
       | violating their copyright terms, people can sue Microsoft into
       | oblivion for violating theirs. I don't use MS Github, I have no
       | skin in the game, but I hope there is at least a $1000 award to
       | every instance of AGPL and GPL license violation because it's
       | unfair and illegal what they're doing.
       | 
       | This isn't ML, it is a ripoff and is violating clear software
       | licensing terms. https://news.ycombinator.com/item?id=27710287
       | 
       | Software freedom matters, but I wouldn't expect the typical HN
       | type to understand, since their money is made on exploiting
       | freely-available software, putting it into proprietary little
       | SaaS boxes, then re-selling it.
        
         | syshum wrote:
         | Thank you...
        
         | echelon wrote:
         | > The ignorance in this comment section is already giving me an
         | aneurysm.
         | 
         | For the past ten years we've been spoon fed that it's okay for
         | Open Source / Free Software to be co-opted by giants and
         | subsequently kept private.
         | 
         | You'll be harassed for telling people it's wrong for Apple to
         | lock down the iPhone. Or that Google shouldn't be in charge of
         | web standards.
         | 
         | Web infrastructure is mostly a bunch of black boxes. How far
         | we've fallen! We need a new cloud provider that is 100% on an
         | open stack. Billing system and all.
         | 
         | Richard Stallman is no longer a hero. Yes, he did wrong and
         | gross things, but in the same breath he's brushed under the
         | rug, so are his ideas.
         | 
         | We've all been collectively gaslighted. Wake up.
         | 
         | Downvoters: Tell me how I'm wrong.
        
           | syshum wrote:
           | I learned web development by looking at websites' code and
           | open source libraries.
           | 
           | Now, with JS obfuscation and WebAssembly, they are attempting
           | to make websites like compiled software.
        
           | api wrote:
           | Most of the people who go nuts when you point these things
           | out are FOSS zealots reacting to the idea that FOSS licenses
           | should be adjusted to prevent billion dollar companies from
           | co-opting it for profit.
        
             | echelon wrote:
             | Profit is fine. Building anti-competitive monopolies that
             | don't share and that seek to own more and more of computing
             | was an unanticipated side effect.
        
               | spywaregorilla wrote:
               | Isn't that exactly what a lot of open source licenses
               | have permitted due to hiding things behind SaaS layers
               | rather than distributing products?
        
               | kube-system wrote:
               | I don't think there's a single "anti-competitive
               | monopoly" today that has less FOSS involvement than any
               | major software company in the 20th century.
        
           | filoleg wrote:
           | > Yes, he did wrong and gross things, but in the same breath
           | he's brushed under the rug, so are his ideas.
           | 
           | His ideas being "brushed under the rug" had nothing to do
           | with his public "cancellation" that happened in the past few
           | years.
           | 
           | Stallman has always been an extreme purist that prioritized
           | his ideological stance over anything else that matters to
           | users. And his ideas were "brushed under the rug" just as
           | much 5 years ago (before public revelations about his
           | misdoings) as they are now. It might just feel like he has
           | been increasingly "brushed under the rug" more recently
           | because he has been becoming increasingly irrelevant and more
           | of just a spokesperson.
        
             | syshum wrote:
             | Stallman was looking out for USERS, not developers. The
             | problem was that developers thought it was them that
             | Stallman was wanting to protect.
             | 
             | GPL and Libre Software are about keeping software open from
             | the dev to the end user. Non-copyleft "Open Source" is about
             | keeping libraries open for devs to exploit in their
             | closed source products...
             | 
             | There is a big difference, I support Free Software, not
             | "Open Source"
        
           | kube-system wrote:
           | RMS is not the dictator of FOSS. There are plenty of valid
           | competing opinions of what "freedom" means and not all of
           | those include legally compelling everyone to share. The MIT
           | license, for example, is both older and more popular than
           | GPL. There have always been a lot of people who do not agree
           | with his opinions.
        
         | api wrote:
         | But don't you get it? The purpose of FOSS is to provide free
         | labor for billion dollar companies.
        
         | heavyset_go wrote:
         | > _The ignorance in this comment section is already giving me
         | an aneurysm. Software licenses matter. Copyright matters._
         | 
         | If anyone thinks they don't, ask why Microsoft didn't train
         | Copilot on their Windows, Office, or Azure source repositories.
        
           | cromka wrote:
           | Case closed, everybody go home.
        
         | c7DJTLrn wrote:
         | If I ever receive monetary compensation for violation of the
         | license on my repositories, I will personally deliver it to you
         | in cash. It won't happen.
         | 
         | I have a feeling Copilot is more of a tool for publicity than
         | for development.
        
           | spywaregorilla wrote:
           | That statement sort of depends on how important your repos
           | are
        
         | swayson wrote:
         | Very well put and refreshing. Thank you.
        
         | hartator wrote:
         | > Software licenses matter. Copyright matters.
         | 
           | Some of us think it is detrimental to humanity as a whole.
        
           | sangnoir wrote:
           | True, but while they exist, they should be evenly applied.
        
           | warkdarrior wrote:
           | If you abolish copyright, that will only make it easier for
           | for-profit corporations to use FOSS. There will be nothing
           | stopping them from using FOSS, unless people stop sharing
           | their code altogether.
        
             | syshum wrote:
             | While true, if you abolish copyright then there is nothing
             | preventing me from installing Microsoft Office on as many
             | machines as I want without ever paying Microsoft a dime....
        
           | Y_Y wrote:
           | Why not both?
           | 
           | Copyright certainly matters. It's a big deal legally and
           | economically all over the world.
           | 
           | Suppose that it's just a bad idea and shouldn't exist. Does
           | that mean that I should release my code into the public
           | domain? I think you could make a good case that even being
           | totally opposed to copyright morally or pragmatically or
           | otherwise, given that it currently is enforced in many places
           | it's worthwhile to play along. For example, some people would
           | prefer a world without copyright, but GPL their code, because
           | it might prevent a greater evil.
        
         | xxpor wrote:
         | Software licenses have barely been tested in court, let alone
         | how they apply to code injected and combined with other code
         | via machine learning. You're extremely overconfident about how
         | this will actually play out.
         | 
         | For one, just because your code is covered by the GPL, it
         | doesn't mean every single line in isolation is copyrightable.
         | It has to demonstrate creativity. That's why you don't have to
         | worry about writing for (int i = 0; i < idx; i++) {.
        
           | austincheney wrote:
           | A software license, like any license, is a permission to
           | operate.
           | 
           | > it doesn't mean every single line in isolation is
           | copyrightable
           | 
           | It is if you can prove reproduction apart from your own
           | original work (fair use). Unlike patents, copyright doesn't
           | protect uniqueness. It is only a shield from reproduction,
           | and if reproduction is demonstrable to a court you are likely
           | at risk.
           | 
           | https://cws.auburn.edu/OVPR/pm/tt/copyrightvplagiarism
        
           | alpaca128 wrote:
           | > it doesn't mean every single line in isolation is
           | copyrightable
           | 
           | Microsoft did _not_ just copy individual lines. They fed
           | whole repositories into their model, ignoring the license (if
           | it exists) even though they knew from the start that
           | information generated by the model will be publicly
           | available. Available usually out of context, but nonetheless
           | - the scope of the input and intent are very clearly
           | "everything" and "redistribution".
           | 
           | Just adding a filter/ML model to the output shouldn't matter.
           | I dare you to build a Copilot clone trained on leaked
           | internal Microsoft code and then try to argue the output
           | is a bit mixed up.
           | 
           | That is a clear violation imho.
        
             | sobellian wrote:
             | The search engine on Github also calls up entire pages of
             | GPL licensed code verbatim. Does it run afoul of copyright?
        
             | google234123 wrote:
             | Copilot was trained on leaked internal Microsoft code
             | that's on GitHub at the moment. Anyway, everyone seems
             | perfectly OK with training language models on copyrighted
             | text.
        
               | leereeves wrote:
               | If a trained language model exactly reproduces
               | copyrighted text, is there any question about whether
               | copyright still applies?
        
               | TchoBeer wrote:
               | This is a useless hypothetical, no language models do
               | that
        
               | heavyset_go wrote:
               | And yet there are plenty of examples of Copilot
               | reproducing copyrighted code verbatim, like it does in
               | this example[1] that was posted on HN.
               | 
               | [1]
               | https://twitter.com/mitsuhiko/status/1410886329924194309
        
               | dylan604 wrote:
               | Everyone is not perfectly OK with training language
               | models on copyrighted text. It's just that evilCorps do
               | it anyways, and there's nothing anyone can do to stop
               | them. I can't do anything. At best, I could get a Twitter
               | account and complain to the ether. The copyright holders
               | can't do anything against the mighty evilCorps, but that
               | doesn't make them okay with it. The fact you believe this
               | is just sad, and exactly what evilCorps want from you.
               | 
               | This goes beyond fair use or satirical/comedic effect.
               | They are training their models to output text in the
               | style of the authors being absorbed. The style is
               | exactly the artistic effect that is being copyrighted.
        
               | gradys wrote:
               | Could you explain why you think training models on
               | copyrighted text is illegal or copyright infringement or
               | whatever else it might be?
        
               | klyrs wrote:
               | Training the models is fine. Applying the models, which
               | reproduces copyrighted works without proper attribution,
               | is where it gets sticky.
        
               | dylan604 wrote:
               | My explanation will not be popular here on HN, but I'm
               | never one to shy away. Especially when asked directly.
               | 
               | Buying a book, an audio CD, or a DVD/Blu-ray grants the
               | holder permission to read, listen to, or view that
               | product as a single instance. You can lend it out, but
               | that's all you're really allowed to do with it. The
               | text, audio, or video is not owned by you to do with as
               | you please. People obviously do not like that, and argue
               | that making copies/backups is their right. Maybe that's
               | acceptable, but we can agree that posting them on
               | torrents, or sharing in any other manner a copy made from
               | the thing you have, is not.
               | 
               | Saying that, training a model on someone's copyrighted
               | text is not part of the agreement of the usage of said
               | text whether it's a copyrighted magazine, newspaper, or
               | book. If the people doing the training reach out to the
               | copyright holders and get specific permission to use
               | their copyrighted material in such a manner, then go
               | ahead. The fact that people feel like they can do
               | anything without the common courtesy of asking for
               | permission is troubling to me that we've lost something
               | as a society. There's no acknowledgment that someone has
               | created something by their own work so that the creator
               | can do with it as they please. A large portion of people
               | believe that because it was created they deserve/should
               | be able to/etc. do whatever they want with someone else's
               | creation, including getting paid for derivative works
               | from the original creation.
        
               | Karrot_Kream wrote:
               | > The fact that people feel like they can do anything
               | without the common courtesy of asking for permission is
               | troubling to me that we've lost something as a society.
               | 
               | I see this sentiment a lot in FOSS spaces but I don't
               | really understand why. The role of judicial process
               | _isn't_ to provide a guiding moral philosophy around
               | social organization. Depending on the government in
               | question that's either a role of government functions or
               | isn't something that should be guided at all. The role of
               | law often (and yes, not in all governments, but at least
               | in the US) is to offer a contract between the state and
               | the individual.
               | 
               | I understand the potential for abuse here in using
               | Copilot to regurgitate licensed works without adhering to
               | the terms of the work's license, but I'm not fluent
               | enough in law to know if this is illegal or not. Calling
               | out this practice and specifically applying strict limits
               | to it is certainly something I'm sympathetic to, and I'm
               | very curious to see what the courts come up with. But
               | swayed by a moral argument I am not.
        
               | dylan604 wrote:
               | In the realm of FOSS, I feel like it's not the same
               | comparison. The FOSS devs created the work and released
               | it with the express knowledge that someone else could
               | update/modify that work. Writing/art/videos are
               | rarely released with copyright that allows this kind of
               | modification. That's a huge difference. There are some
               | FOSS releases that allow people to use for
               | personal/private use while restricting commercial use.
               | This is closer to the books/movies type of scenario.
        
               | Karrot_Kream wrote:
               | I mean sure, but these are both legally defined works
               | with licenses that govern their use. The difference is in
               | the style of license. FOSS doesn't get a special moral
               | valence because individuals are authors and they offer
               | their work for editing and remixing under narrow
               | circumstances. I mean, if Jeff Bezos today were to
               | release code he wrote by hand with GPLv3 and were to cry
               | foul over Copilot, I doubt anyone would care (or he'd get
               | made fun of online.) Why does FOSS get treated so
               | differently?
        
               | liamwire wrote:
               | > My explanation will not be popular here on HN
               | 
               | How is this better than 'bring on the downvotes'?
               | 
               | Moving on, I'll put this to you: you claim training an ML
               | model against copyrighted text is in violation of the
               | 'permission' granted by the rights holder. However, flip
               | this on its head for a moment - that's basically all
               | human brains do. Clearly, the greatest writers of our
               | time haven't written their works in a vacuum. Rather,
               | that historical reading and inspiration becomes
               | sufficiently obfuscated that we deem something adequately
               | creative enough to be granted its own copyright.
               | 
               | Fundamentally, how does Copilot differ, other than
               | perhaps being a poor implementation? Is it by not being
               | 'adequately creative' enough? Is there some future
               | version you could envision that would be, or is it the
               | principle you're arguing against?
        
               | dylan604 wrote:
               | I don't agree with your premise. Humans can consume
               | creative works and be influenced, this is not in
               | question. Unless one is an impressionist, they aren't
               | going to try to recreate exactly the works done by the
               | artists they have been influenced by. Even if an artist
               | does something inspired/influenced by, they have pretty
               | much stated that. Musicians cite prior bands, as do
               | writers, painters, etc all credit those influences.
               | 
               | I'm probably just a curmudgeon, but I don't understand
               | the point of Copilot. So I'm probably not the best to
               | opine about it. However, I am very opinionated about
               | copyright in a manner that typically flows against HN
               | groupthink.
        
               | TchoBeer wrote:
               | Copilot isn't intending to copy entire code bases either.
        
               | heavyset_go wrote:
               | Human beings commit copyright infringement all of the
               | time. People have been lifting riffs from music,
               | sometimes unconsciously, forever. This is why clean room
               | implementations are done sometimes when writing software.
               | 
               | Also, you're taking the machine learning metaphor
               | literally. AI models do not "learn", they're just
               | statistical models, they don't understand anything. There
               | is no comparison to human learning that isn't superficial
               | or metaphorical.
               | 
               | The real question is how Copilot is any different than a
               | compiler, or lossy encoding or compression.
        
           | josefx wrote:
           | > it doesn't mean every single line in isolation is
           | copyrightable.
           | 
           | Copilot is known to reproduce entire blocks of text, including
           | non-functional parts like comments.
        
           | api wrote:
           | What about non-traditional-FOSS licenses? There is a lot of
           | source-available not-OSI-compliant licensed software on
           | GitHub like MongoDB, CockroachDB, etc., and that's clearly
           | proprietary. If this thing is trained on that and generates
           | what amount to snippets of that code then it's clearly
           | violating those licenses.
           | 
           | Then there's private repositories. If they included those in
           | the training data set that's even more actionable.
           | 
           | Personally I think this is software piracy at an absolutely
           | unprecedented scale. Machine learning is just information
           | transfer from the training data into weights in a model, a
           | close relative of lossy data compression. Microsoft is now
           | reselling all its GitHub users' code for profit.
        
             | Wowfunhappy wrote:
             | Private repositories weren't included in the training data
             | per GitHub, only public repos.
             | 
             | This really doesn't give me much comfort though. Making a
             | repo public doesn't imply anything, it could be "All rights
             | reserved".
        
           | ghoward wrote:
           | You're right that code has to demonstrate creativity for
           | copyright. But that also means that an algorithm, even a
           | _transformative_ algorithm, cannot change copyright because
           | an algorithm is not creative, by definition.
           | 
           | This means that the output of any algorithm on copyrighted
           | code is still under the original copyright. I mean, we still
           | apply the copyright of the original to the output of
           | compilers, even though compilers can be transformative with
           | inlining and link-time optimization, to the point that it
           | mixes disparate code in the same way Copilot does.
           | 
           | In fact, I wrote some software licenses [1] that codify the
           | fact that algorithms cannot change copyright.
           | 
           | [1]: https://yzena.com/licenses/
        
             | gradys wrote:
             | You sound very confident about this, whereas copyright
             | lawyers I've read discussing this issue seem much less
             | confident overall, but lean toward thinking this would be
             | fair use.
             | 
             | What makes you so confident that this would not be ruled
             | fair use?
             | 
             | (And for people not familiar - if ruled fair use, it
             | doesn't matter what the license is because fair use is an
             | exception to copyright itself.)
        
               | ghoward wrote:
               | I have a feeling you did not read the FAQ of the
               | licenses. I don't blame you, but they explain my
               | position.
               | 
               | Here's the relevant quote:
               | 
               | > GitHub is arguing that using FOSS code in Copilot is
               | fair use because using data for training a machine
               | learning algorithm has been labelled as fair use. [1]
               | 
               | > However, even though the training is supposedly fair
               | use, that doesn't mean that the distribution of the
               | output of such algorithms is fair use.
               | 
               | My licenses say, basically, "Sure, _training_ is fair
               | use, but _distributing_ the output is not."
               | 
               | The licenses specifically say that the copyright applies
               | to any output of any algorithm that uses the source code
               | as all or part of its input.
               | 
               | Now, I have not gotten a lawyer to look at my licenses
               | yet (it's in the works), so don't use them yourself. But
               | because everyone keeps saying that training is fair use,
               | I'm fairly confident that only training is fair use.
               | 
               | Of course, it might not be, but that would take more
               | court cases and more precedent. I wanted to poison the
               | well _now_ [2] to make companies nervous about using a
               | model that was partially trained with code licensed under
               | my licenses.
               | 
               | [1]: https://valohai.com/blog/copyright-laws-and-machine-learning...
               | 
               | [2]: https://gavinhoward.com/2021/07/poisoning-github-copilot-and...
        
               | seoaeu wrote:
               | > My licenses say, basically, "Sure, training is fair
               | use, but distributing the output is not."
               | 
               | Licenses basically by definition cannot say what is and
               | isn't fair use...
        
               | ghoward wrote:
               | > Licenses basically by definition cannot say what is and
               | isn't fair use...
               | 
               | Yes. However, my licenses only say what people already
               | say. Then the licenses go further and say, "But anything
               | else is not allowed."
               | 
               | Everyone else says training is fair use. My licenses
               | agree. But they make it clear that I don't believe that
               | anything else is fair use.
               | 
               | Yes, these licenses must be tested in court. Except that
               | they poison the well _now_.
        
               | nybble41 wrote:
               | It's mildly interesting that you've decided to express
               | your personal opinion about what is or is not fair use
               | within your license text, but the fact is that if a
               | use of the work is deemed to be fair use under the law
               | then the terms of the license you're offering are
               | completely irrelevant. Your permission is not required to
               | make fair use of the work, so no one needs to agree to
               | your license.
        
               | ghoward wrote:
               | > It's mildly interesting that you've decided to express
               | your personal opinion about what is or is not fair use
               | within your license text, but the fact is that if a
               | use of the work is deemed to be fair use under the law
               | then the terms of the license you're offering are
               | completely irrelevant. Your permission is not required to
               | make fair use of the work, so no one needs to agree to
               | your license.
               | 
               | You do not seem to get it. Yes, I understand that if fair
               | use applies, my licenses don't matter. I get that. I
               | promise I do get that.
               | 
               | The purpose of these licenses is to _sow doubt_ that fair
               | use applies to _distributing_ the output of ML models.
               | 
               | Lawyers are usually a cautious lot. If a legal question
               | has not been answered, they usually want to stay away
               | from any possibility of legal risk regarding that
               | question.
               | 
               | The licenses create a question: does fair use apply to
               | the output of ML algorithms? With that question not
               | answered, lawyers and their companies might elect to stay
               | away from ML models trained with my code, and ML
               | companies might stay away from training ML models on my
               | code in the first place.
               | 
               | That is what I mean by "poisoning the well." The poison
               | is doubt about the legality of distributing the output of
               | ML models, and it is meant to put a damper on enthusiasm
               | for code being used to train ML models, especially for my
               | code.
        
           | bluGill wrote:
           | While they are not tested, anything other than accepting the
           | idea kills the idea of software completely. There is lots of
           | room to change details, but somehow copyright and the fact
           | that the code is copied into computer memory needs to be
           | reconciled.
        
             | xxpor wrote:
             | I don't see how. It might kill specific ideological
             | licensing of software code, but the idea it'd kill software
             | as a whole is pretty unbelievable. Software is too valuable
             | to society.
             | 
             | As we're seeing, there's VERY little software where the
             | specific algorithms or ideas in the software are what's
             | valuable. The value comes from the ability to sell a
             | service based on the software and operate it at scale. Like
             | you said, how much SaaS is mostly open source stuff
             | packaged up? Android is (sort of) open source, companies
             | pay lots of people a lot of money to contribute to the
             | Linux kernel where they give away the code they developed
             | with that money, etc etc.
        
           | hodgesrm wrote:
           | > Software licenses have barely been tested in court...
           | 
           | OSS licenses have been litigated and upheld. Can't supply
           | details of my own experience for confidentiality reasons but
           | plenty of plaintiffs have prevailed in suits about violations
           | of OSS license terms. My guess is the numbers are higher than
           | you might think because a lot of the cases end in non-public
           | settlements.
        
           | sangnoir wrote:
           | > You're extremely overconfident about how this will actually
           | play out.
           | 
           | I'd argue Microsoft too, was/is overconfident about how this
           | would play out. I would have expected a little more caution
           | on selecting the training data.
        
       | hartator wrote:
       | > We already know that Copilot as it stands is unacceptable and
       | unjust, from our perspective.
       | 
       | So, why call for white papers? I don't believe they will publish
       | any papers that go against their views.
        
         | meepmorp wrote:
         | They have a position and they now want to support it with
         | arguments, and they'd like it if people would help them do
         | that.
         | 
         | I think that's backwards because it's putting the conclusion
         | first then seeking to justify it, but to each their own.
        
           | user-the-name wrote:
           | No, they have a position and arguments to support it, but
           | those have nothing to do with the machine learning aspects,
           | just with the fact that the software is proprietary.
           | 
           | They are asking for views on the machine learning, which they
           | do not have arguments or a position on.
        
           | kelnos wrote:
           | > _I think that's backwards because it's putting the
           | conclusion first then seeking to justify it_
           | 
           | Isn't that literally a lawyer's job?
        
             | [deleted]
        
             | meepmorp wrote:
             | >Isn't that literally a lawyer's job?
             | 
             | I guess, but then they should have their story straight
             | before they start the astroturfing campaign.
        
         | tyingq wrote:
         | They know a couple of reasons for sure. They want more reasons,
         | or more detail on other reasons for which they aren't as sure
         | yet.
        
         | humanistbot wrote:
         | You seem to be unfamiliar with (edit: or object to) the very
         | idea of lawyers.
        
         | user-the-name wrote:
         | Read the rest of the paragraph. They think it is unacceptable
         | and unjust from certain perspectives that are trivial for them.
         | However, there are other perspectives that are worth exploring,
         | and that is what this is about.
        
       | NelsonMinar wrote:
       | This is the FSF that put Richard Stallman back on the board? No
       | thanks.
        
         | quasarj wrote:
         | Ahh someone who can't read for themselves, eh? go away
        
           | muricula wrote:
           | Is this constructive? I think it's reasonable to criticize an
           | organization for its leaders, and question their actions
           | accordingly. And he is on the board.
        
       | slownews45 wrote:
       | Anyone feel like FSF moved from maybe engineering idealists to a
       | very lawyer driven type org?
       | 
       | The big GPLv3 push and development - plenty of attacks on folks
       | actually shipping product on GPLv2 and building communities
       | around that model (which keeps software free but allows users of
       | the software to do what they want with it pretty much including
       | putting in devices that are locked down - cars / tivo's etc).
       | 
       | Here's an opportunity to really advance in an interesting area
       | with ML -> something that may open up programming to more people
       | -> may advance computers' ability to program and modify their own
       | programs in the long run.
       | 
       | And regardless of the FSF attorney stuff, places like China, tiny
       | little LLCs with no assets will very likely use the wonderful
       | amount of code on the web to develop solutions in this space,
       | even if FSF claims everything is a violation. Where is the vision
       | anymore from FSF?
       | 
       | One thing that's been sad about the FSF -> it's gone from what I
       | would consider a forward looking idealism sort of thing -> here's
       | how we could do / make cool stuff that let communities work
       | together -> to now sort of a legal compliance type org that
       | really is focused on "actionable claims", "protected against
       | violations" etc.
       | 
       | Question - does the Linux community and other successful larger
       | open source communities welcome the FSF and their attorneys into
       | the discussion? I can hardly imagine the BSD's, the Linux folks
       | really connecting anymore with them.
       | 
       | Is there space for a different group, maybe a collection of
       | actual developers shipping code in larger communities to get
       | together, no FSF / SFC lawyers present, to think creatively about
       | the future? What should we be working for, what is fair to
       | everyone, what helps society, what works around pro-social
       | community building?
       | 
       | A tool that helps with cross language building blocks for common
       | functions etc (stackoverflow on steroids) - just how bad is this?
        
         | danhor wrote:
         | This is more of a tangent, but I found this framing very
         | interesting:
         | 
         | > which keeps software free but allows users of the software to
         | do what they want with it pretty much including putting in
         | devices that are locked down - cars / tivo's etc
         | 
         | The FSF considers the user to be the one using
         | cars/tivo's/other devices. In their view, it was a design
         | flaw of GPLv2 that it allowed locking end-users out of their
         | devices.
         | 
         | For Linux this was not the case. The important part was that
         | modifications/extensions were shared (and maybe even
         | upstreamed), while the end user access wasn't important.
         | 
         | The case of tivoization fractured the interest between the
         | mostly moral "I want freedom for the end user" and the more
         | immediately beneficial "If you use my code, I want reciprocity
         | for modifications".
         | 
         | I personally believe that today the latter case won, even for a
         | lot of non-gpl software that gets lots of contributions e.g.
         | via github for lots of different reasons, but the moral case
         | gets more dire.
         | 
         | Looking at security for older (or shockingly often even
         | current) devices, right to repair and lots of other issues
         | concerning the effective loss of rights with more modern
         | devices, the concerns of the FSF were often accurate, but with
         | the increasingly hostile approach to "proprietary" IP and thus
         | the exclusion of GPLv3 and similar licenses not palatable to
         | the larger open source community.
         | 
         | The approach to IP in China is also sometimes a lot different,
         | see https://www.bunniestudios.com/blog/?p=4297.
        
           | slownews45 wrote:
           | Right - FSF ended up with a user view. Problem was the
           | developers are the ones actually writing the code and picking
           | licenses, and the FSF moved away from really talking with
           | them. I think this was a big shift.
        
         | e40 wrote:
         | No. The FSF had lawyers from the beginning and always thought
         | (I talked with RMS in the early 80's some) that enforcement was
         | part of the plan.
        
           | slownews45 wrote:
           | Sure, but the GPLv2 was very freedom oriented. Enforcement
           | practically was relatively sparse and more educational I
           | thought. I.e., release the TiVo source code, but we don't care
           | that Tivo's are locked down.
           | 
           | Is anyone building strong communities on AGPLv3 / GPLv3? I
           | feel the momentum shifted towards Apache / MIT style licenses
           | unfortunately.
        
             | detaro wrote:
             | > _but we don't care that Tivo's are locked down._
             | 
             | They literally made the GPLv3 because they cared about that
             | very much.
        
               | slownews45 wrote:
               | The FSF did GPLv3. The folks doing Linux etc did not.
        
               | pseudalopex wrote:
               | You asked if the FSF changed.
        
               | vhold wrote:
               | In case anybody is wondering about this:
               | 
               | https://en.wikipedia.org/wiki/Tivoization
               | 
               | https://www.gnu.org/licenses/gpl-faq.html#Tivoization
        
             | blendergeek wrote:
             | > Is anyone building strong communities on AGPLv3 / GPLv3?
             | I feel the momentum shifted towards Apache / MIT style
             | licenses unfortunately.
             | 
             | While the _corporate_ momentum switched to Apache/MIT
             | licenses, there are strong _communities_ built on
             | AGPLv3/GPLv3.
             | 
             | * Nextcloud - file hosting (AGPLv3)
             | 
             | * Source Hut - git hosting (AGPLv3)
             | 
             | * StreetComplete - OpenStreetMap editing (GPLv3)
             | 
             | * F-Droid - Free Software "app store" for android (GPLv3)
             | 
             | * NewPipe - alternative Youtube frontend (GPLv3)
             | 
             | While these aren't necessarily used by large corporations,
             | their individual communities are thriving and strong.
             | 
             | The shift toward SSPL and Commons Clause licensing is
             | another argument in favor of AGPLv3 licensing.
             | Amazon/Google often won't touch your AGPLv3 code (and you
             | can still sell proprietary licenses to other companies that
             | can't/won't use AGPLv3).
        
               | slownews45 wrote:
               | (A)GPLv3 actually has seen some real growth corporate
               | side -> it's used commonly by proprietary tech companies
               | as sort of a poison license (Microsoft had some of these
               | like SSPL).
               | 
               | The way this works is all contributors are required to
               | sign a CLA -> the corporate developer can then use their
               | code under ANY license, and most importantly can
               | integrate into proprietary products or sell to others.
               | 
               | The code is then released as an AGPLv3 to be "open
               | source" - but literally the only company with the "super"
               | rights to license / make money off it is the corp dev.
               | 
               | It's kind of genius -> so I think we may see more
               | (A)GPLv3 stuff coming this way. The corp developer can
               | then offer for example a hosted version of the software
               | WITHOUT releasing all the related code! But anyone else
               | would have to release their code.
               | 
               | You can see how this is done here:
               | 
               | https://grafana.com/docs/grafana/latest/developers/cla/
        
               | gkbrk wrote:
               | > The code is then released as an AGPLv3 [...] but the
               | only company with the rights to make money off it is the
               | corp dev.
               | 
               | Actually anyone that has the AGPL code can sell and/or
               | make money from it. People regularly buy GPL software and
               | pay monthly subscriptions to hosted AGPL software.
               | 
               | If you can't compete without having some code as "trade
               | secrets"; that's your failed business model, not a fault
               | of the license.
        
               | necovek wrote:
               | > But anyone else would have to release their code.
               | 
               | Which I think is perfectly fair: you are getting a full
               | product, and you can do with it as you please (including
               | profit off of it), as long as you publish your changes
               | too!
               | 
               | The fact that the original copyright holder has the
               | rights to close it off for future developments is
               | completely natural, and if you do not want to allow them
               | to do that, don't sign a CLA and fork. Oh, there's a cost
               | in maintaining a fork? Pick your poison then :)
               | 
               | To me what matters is that once you get the software, you
               | have freedom to use and modify it. I am ok if you do not
               | have the "freedom" to close it off. If you start being a
               | bigger contributor than the original company, you avoid
               | all of the problems with a fork, but you can't say you
               | did _not_ benefit from the original AGPL release.
        
         | simion314 wrote:
         | >And regardless of the FSF attorney stuff, places like china,
         | ....
         | 
         | So your argument is that if China does not care about licenses,
         | neither should we. The thing is, I am fine with that: I know
         | Windows source code is leaked, so let's train an AI on it too.
         | 
         | I think it is a clear sign that MS did not train on proprietary
         | code; it means that is not legal or not safe. So the question
         | is why GPL or other licenses are safe. I think you need the
         | authors or the licenses to give you permission to use the
         | code as training data in black-box, locked, proprietary
         | algorithms.
        
         | laumars wrote:
         | I'm all for advancing machine learning but given how much big
         | corporations aggressively defend their IP, it's a hard pill to
         | swallow if someone shrugs off a potential misuse of open source
         | code. The law is the law and if it's ok for Microsoft to defend
         | their copyrights then it's ok for the FSF to defend my
         | copyrighted code too. The fact that I licensed it GPL was
         | intentional -- if I didn't give a crap what happened to the
         | code then I'd have used BSD or similar. But I _chose_ to place
         | restrictions and I'm very much interested to see if training
         | proprietary AI models is legally covered under those
         | restrictions.
        
         | blendergeek wrote:
         | > The big GPLv3 push and development - plenty of attacks on
         | folks actually shipping product on GPLv2 and building
         | communities around that model (which keeps software free but
         | allows users of the software to do what they want with it
         | pretty much including putting in devices that are locked down -
         | cars / tivo's etc).
         | 
         | The users of the software are the owners of the devices. The
         | distributors are the ones locking down the devices to prevent
         | the users from modifying the software (often so that the
         | distributors can control something else the users are doing).
         | 
         | GPL is about end-user freedom (as opposed to software
         | distributor freedom). This is why GPLv3 exists.
        
           | slownews45 wrote:
           | GPL used to be targeted at DEVELOPERS of software - the share
           | and share alike model. These developers would in some cases
           | use the GPL'ed software in locked down devices (many / most
           | android devices are pretty locked down - but developers
           | contribute to a GPL kernel).
           | 
           | So yes, FSF created GPLv3 to focus on USERS freedoms, but the
           | users are not writing the software - so it remains the devs
           | who pick licenses.
        
             | pseudalopex wrote:
             | The Free Software definition always put users first.[1]
             | 
             | [1] https://en.wikipedia.org/wiki/The_Free_Software_Definition#T...
        
       | ghoward wrote:
       | I honestly wish I was in a position to write a whitepaper for
       | this. However, I should not for several reasons:
       | 
       | * I have already made my position clear in public, [1] so I could
       | probably be identified.
       | 
       | * I am not a lawyer, just some bloke who attempted to write FOSS
       | licenses to combat ML on copyrighted code. [2]
       | 
       | [1]: https://gavinhoward.com/2021/07/poisoning-github-copilot-and...
       | 
       | [2]: https://yzena.com/licenses/
        
       | senko wrote:
       | > We already know that Copilot as it stands is unacceptable and
       | unjust [...]. Activists wonder if there isn't something
       | fundamentally unfair about a proprietary software company
       | building a service off their work.
       | 
       | > We will read the submitted white papers, and _we will publish
       | ones that we think help elucidate the problem_.
       | 
       | Doesn't give me hope they're aiming for unbiased opinion. I would
       | be _very_ surprised if any of the published papers don't closely
       | align with FSF's a priori position.
        
         | user-the-name wrote:
         | The part you removed is the crucial part that explains that
         | paragraph.
        
         | [deleted]
        
         | kelnos wrote:
         | Well, sure. They're looking for legal support for their
         | position. They're not pretending to be an unbiased,
         | disinterested observer.
        
         | nescioquid wrote:
         | It sounds like they have a legal premise and they want to work
         | out the implications, not to open up discussion to every
         | quibble about the FSF's values. Having an opinion on the legal
         | issues around their licenses and values seems sort of essential
         | to what the organization does.
         | 
         | The word "unbiased" seems to be doing a lot of heavy work in
         | your comment. The FSF is inherently biased towards its project
         | -- how is that a problem?
        
           | senko wrote:
           | > The word "unbiased" seems to be doing a lot of heavy work
           | in your comment. The FSF is inherently biased towards its
           | project -- how is that a problem?
           | 
           | That's a straw man, I never said (nor do I think) FSF should
           | not be biased towards its project.
           | 
           | However, I would be more willing to trust the results of this
           | call if I had confidence that all solid arguments are
           | presented, even if they're not aligned with FSF's agenda.
           | Hiding them won't make them disappear - you might as well get
           | as informed as possible about the issue, _especially_ if you
           | care deeply about the issue and agree with the FSF.
        
         | [deleted]
        
       | zekrioca wrote:
       | Interesting: on HN, the same link submitted at a different time
       | gets a different number of upvotes.
       | 
       | Same link, just 13h ago, but with 5x fewer upvotes than the one
       | here: https://news.ycombinator.com/item?id=27992894
        
         | ghoward wrote:
         | Because the US programmers were going to bed?
        
           | zekrioca wrote:
           | I'd expect HN not to let duplicates be submitted.
        
             | IvyMike wrote:
             | From the FAQ https://news.ycombinator.com/newsfaq.html
             | 
             | > Are reposts ok?
             | 
             | > If a story has not had significant attention in the last
             | year or so, a small number of reposts is ok. Otherwise we
             | bury reposts as duplicates.
             | 
             | > Please don't delete and repost the same story. Deletion
             | is for things that shouldn't have been submitted in the
             | first place.
        
               | zekrioca wrote:
               | Attention to "in the last year or so". The same link was
               | posted 13-14h ago, not "1 year or so ago".
        
               | IvyMike wrote:
               | They have explicitly stated that they are ok with retrying
               | recent overlooked stories just in case a story got missed
               | or buried for whatever reason.
               | 
               | They are also ok with reposting year+ old stories that
               | did get significant attention at the time, since the new
               | repost may find a new audience.
        
               | zekrioca wrote:
               | The main point is: Assuming the New section is a queue,
               | and that the very same link is posted twice, should the
               | first link be re-queued in the New section again? It was
               | clearly a duplicate, although it was not flagged as such.
        
             | ghoward wrote:
             | I see. Well, it actually happens all the time.
        
       | pkrefta wrote:
       | I'm using Github to publish my code and seriously I don't care
       | whether Copilot was trained using it. I published it and in the
       | end somebody can do anything with it without giving a damn about
       | license, copyright etc - that's the truth of open-source.
        
         | kelnos wrote:
         | > _that's the truth of open-source_
         | 
         | No, that's _your opinion_, which as it turns out also has no
         | legal basis. For me, I want proper attribution from people who
         | use my code. And for any code that I release that's under
         | copyleft, I absolutely do want that license followed.
         | 
         | You seem to be fine releasing your stuff into the public
         | domain, and that's great that you want to do that, but you
         | don't speak for everyone.
        
         | laumars wrote:
         | This is why there are a multitude of different open source
         | software licences. Because some people care more than others
         | about the terms in which their code is used by others.
        
         | johannes1234321 wrote:
         | That is a valid position one can have.
         | 
         | However other people for varying reasons have other ideas ...
        
         | grepfru_it wrote:
         | This was the same mentality that brought copyleft to the masses
         | in 1984. While you may not care, there are others who do care
         | about the sanctity of license agreements. This is an argument
         | where staying silent means you accept this approach. Of the
         | millions of open source projects, a large portion of the
         | contributors ARE speaking up because they don't find this to be
         | acceptable. I personally think copilot is the future and all
         | this discussion is doing is going to bring a license usage
         | feature to copilot (e.g. i want only or i do not want GPL code
         | in my copilot suggestions)
         | 
         | Please continue using GitHub as you were, but maybe consider
         | acting on your words and either removing or changing licenses
         | within your code that do not represent your ideals. Nothing
         | is preventing you from releasing code into the public domain,
         | so do that!
        
           | Permit wrote:
           | > Of the millions of open source projects, a large portion of
           | the contributors ARE speaking up because they don't find this
           | to be acceptable.
           | 
           | Is this true? Is there really a large portion of contributors
           | speaking up against this? I got the opposite sense, that it
           | was a very small portion of contributors speaking up against
           | this but I don't have any evidence one way or the other.
        
         | colechristensen wrote:
         | Well then you're a BSD-license kind of person.
         | 
         | Not everybody is and that's ok too.
        
           | nitrogen wrote:
           | The BSD license still requires attribution and copyright
           | notices visible to the end user.
        
       | whazor wrote:
       | I am curious about the results.
       | 
       | Having tested copilot, most suggestions are based on existing
       | code in your opened file. Furthermore, most snippets tend to be
       | relatively short, where it feels more like a Stack Overflow
       | answer than existing code.
       | 
       | Of course it is possible to make the model generate longer pieces
       | of code that are potentially GPL. But you would have to make a
       | certain effort for it. It also tends to adopt your coding style.
       | 
       | But maybe the fact that there are no guarantees makes it unfair.
        
       ___________________________________________________________________
       (page generated 2021-07-29 23:01 UTC)