Post A92stCxDk4a6f2KSuW by tindall@cybre.space
(DIR) Post #A92p9kpFB3gKWmETCK by tindall@cybre.space
2021-07-07T11:51:30Z
21 likes, 33 repeats
oh my gods. they literally have no shame about this. GitHub Support just straight up confirmed in an email that yes, they used all public GitHub code for Codex/Copilot, regardless of license.
(DIR) Post #A92p9lEPfToZmq0ZQe by piggo@piggo.space
2021-07-07T11:54:17.511951Z
1 likes, 2 repeats
@tindall one more case for self hosted git servers, isn't it ..
(DIR) Post #A92qztm5FZBRIMBNJ2 by dashie@pleroma.otter.sh
2021-07-07T11:55:40.096979Z
0 likes, 0 repeats
@piggo @tindall won't change much, problem is with machine learning bullshit where they all assume they can use whatever shit they find on the internet, put it in their model and tada, magic copyright law exception
(DIR) Post #A92qzuCJg2AQbiSKC8 by piggo@piggo.space
2021-07-07T12:14:55.166219Z
0 likes, 0 repeats
@dashie @tindall yeah sure but the chance of some silicon valley weenie scraping your private gitlab is much lower than microsoft doing it on their own website
(DIR) Post #A92qzvE7qnCzncJHea by dashie@pleroma.otter.sh
2021-07-07T11:57:10.170981Z
2 likes, 2 repeats
@piggo @tindall it's ✨ fair use ✨
(DIR) Post #A92s19G7GTzpYxODzM by Mia@disqordia.space
2021-07-07T12:26:22.982676Z
3 likes, 0 repeats
@tindall eris public license time: under no circumstances can this software be used to any means or for any purposes
(DIR) Post #A92sQCsmvehC2cQXNQ by tindall@cybre.space
2021-07-07T11:53:39Z
14 likes, 8 repeats
it's official, obeying copyright is only for the plebs and proles, rich people and big companies can do whatever they want
(DIR) Post #A92sSF49mYelOMiOae by irimi1@merveilles.town
2021-07-07T11:58:36Z
0 likes, 0 repeats
@tindall Did anyone ask for the Github source code yet? It should be GPL itself now, right?
(DIR) Post #A92sSFVS94UUl1UC8W by tindall@cybre.space
2021-07-07T12:00:26Z
0 likes, 0 repeats
@irimi1 unfortunately I don't think that really follows. Codex/Copilot should be, though.
(DIR) Post #A92sSFvKarBu3HarTM by irimi1@merveilles.town
2021-07-07T12:12:40Z
2 likes, 0 repeats
@tindall I would expect someone to challenge that, I think it’s an interesting question. Regardless of that, I’d love to see consequences for a big company like Github that just takes all their public code, including projects with restrictive licenses, and makes it into a project of their own. Earning money on other people’s (or even their customers’) achievements.
(DIR) Post #A92st3X0m9GjR8zuJE by ultranova@cybre.space
2021-07-07T12:13:28Z
2 likes, 0 repeats
@tindall class action class action class action
we can only hope
(DIR) Post #A92stCxDk4a6f2KSuW by tindall@cybre.space
2021-07-07T12:15:21Z
1 likes, 0 repeats
@ultranova there's no way. I have hope but I do not have faith.
(DIR) Post #A92tQgZadnCa4vEuIa by Mia@disqordia.space
2021-07-07T12:42:12.240839Z
2 likes, 0 repeats
@tindall
(DIR) Post #A92uujuqsOD15W4yGm by vae@programming.socks.town
2021-07-07T12:58:50.355176Z
1 likes, 0 repeats
@dashie @piggo @tindall >is considered fair use *across the machine learning community*
(DIR) Post #A92wcPj1PFpVAAw2TY by mrsaturday@shitposter.club
2021-07-07T13:17:56.049573Z
0 likes, 0 repeats
@tindall Github just means your code is on Microsoft's computer
(DIR) Post #A92xB7A0QCV7il6SfY by a1batross@expired.mentality.rip
2021-07-07T13:24:10.908768Z
1 likes, 0 repeats
@vae @tindall @piggo @dashie never liked those arrogant data scientists
(DIR) Post #A935NQwvTIIIChN1nc by meeper@outerheaven.club
2021-07-07T14:56:04.476835Z
2 likes, 0 repeats
@dashie @piggo @tindall Eyy, so I'm gonna infringe on all of Microsoft's patents and code, because it's considered fair use in the programming community
(DIR) Post #A935fArI2HO1VlC3jU by SuperDicq@cdrom.tokyo
2021-07-07T14:58:56.680307Z
1 likes, 0 repeats
@tindall @lienrag @VickyRampin Julia Reda’s article is correct if all Copilot did was generate new code based on training data. Unfortunately in reality Copilot is able to copy/paste code verbatim directly from GPL licensed repos, I would not consider this “generating”.
(DIR) Post #A935iHmKlP6DqHMKIq by emilis@sectorinf.com
2021-07-07T14:59:50.146000Z
0 likes, 0 repeats
@SuperDicq @tindall @lienrag rewriting code as-is by hand would be the human equivalent. No creative process happening.
(DIR) Post #A9362bnI3YXBi7Usj2 by SuperDicq@cdrom.tokyo
2021-07-07T15:03:28.822677Z
1 likes, 0 repeats
@dashie @piggo @tindall Also “across the machine learning community” doesn’t mean anything. Who cares what their manufactured consent community thinks, I’m only interested in what the judicial system thinks.
(DIR) Post #A936Sd5Ts0V9FdPgWG by SuperDicq@cdrom.tokyo
2021-07-07T15:08:12.273869Z
2 likes, 0 repeats
@dashie @piggo @tindall Also, even if using copyrighted material as training data is not illegal, it does put the users of Copilot at risk. If you use Copilot to assist you in writing code, you could end up in a situation where it suggests, without your knowing, code that is copyrighted, which would get you into trouble.
(DIR) Post #A936lEd125Q0tWyFMm by SuperDicq@cdrom.tokyo
2021-07-07T15:11:33.602808Z
2 likes, 0 repeats
@dashie @piggo @tindall Otherwise this would be a really bad copyright loophole.
>Write a very simple copy and paste program.
>Call it an AI.
>Name the input param “training data” and the output param “generated data”.
>???
>Profit
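The loophole described in the post above can be sketched in a few lines of Python. This is a hypothetical illustration only: the class name `CopyPasteAI` and its methods are invented here and do not describe how Copilot or any real model works. It shows that renaming the input "training data" and the output "generated data" changes nothing about what the program does.

```python
# Hypothetical illustration: a "model" that memorizes its input verbatim.
# CopyPasteAI, train() and generate() are made-up names for this sketch.
class CopyPasteAI:
    def __init__(self) -> None:
        self.weights = ""

    def train(self, training_data: str) -> None:
        # "Training" just stores the input unchanged.
        self.weights = training_data

    def generate(self) -> str:
        # "Generation" returns the stored input unchanged.
        return self.weights


model = CopyPasteAI()
model.train("int main(void) { return 0; }")
generated_data = model.generate()

# The "generated data" is byte-for-byte the "training data".
print(generated_data == "int main(void) { return 0; }")
```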
(DIR) Post #A938r415bEFZd38cs4 by eris@disqordia.space
2021-07-07T15:35:02.710637Z
2 likes, 1 repeats
@Mia @tindall anyone caught using this software for any reason shall be swiftly put to death
(DIR) Post #A9390LDj0P0pakTKtc by lienrag@mastodon.tedomum.net
2021-07-07T12:01:57Z
0 likes, 0 repeats
@tindall Have you read Julia Reda's article on the topic? @VickyRampin
(DIR) Post #A9390Ljd5mX7BhOocq by tindall@cybre.space
2021-07-07T12:03:23Z
1 likes, 0 repeats
@lienrag @VickyRampin I have. I find her argument rather unconvincing - that is, she may be right about the law, but if she is, copyright is absolutely a failed system.
(DIR) Post #A939832wSYPFXj6TS4 by uint8_t@chaos.social
2021-07-07T12:59:09Z
1 likes, 2 repeats
@tindall @ultranova consider feeding the leaked Windows source code into a Markov chain, then using that to write something useless, and let's see if Microsoft argues that this violates their copyright
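For readers unfamiliar with the idea in the post above, here is a minimal sketch of a word-level Markov chain text generator in Python. The function names `build_chain` and `generate`, the order of 2, and the toy corpus are all assumptions made for illustration. The point it makes is that every three-word window of the output is lifted straight from the input text, even though the output as a whole may be new.

```python
import random
from collections import defaultdict


def build_chain(text: str, order: int = 2) -> dict:
    """Map each run of `order` words to the words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return dict(chain)


def generate(chain: dict, length: int = 20, seed: int = 0) -> str:
    """Walk the chain from a random starting state, emitting up to `length` words."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))
    out = list(state)
    for _ in range(length - len(state)):
        followers = chain.get(state)
        if not followers:
            break  # dead end: this state never had a successor in the corpus
        out.append(rng.choice(followers))
        state = tuple(out[-len(state):])
    return " ".join(out)


# Toy corpus standing in for whatever source text is fed in.
corpus = "the quick brown fox jumps over the lazy dog and the quick brown cat sleeps"
chain = build_chain(corpus)
sample = generate(chain, length=10)
print(sample)
```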
(DIR) Post #A9398OMrBhoXfZaAuO by tindall@cybre.space
2021-07-07T12:59:37Z
1 likes, 0 repeats
@uint8_t @ultranova I really want to do this but I really don't want Microsoft to sue me
(DIR) Post #A93BOsYiu8dTqe7dei by Ghislaine@poa.st
2021-07-07T16:03:27.148820Z
0 likes, 0 repeats
@SuperDicq @dashie @piggo @tindall Hmm perhaps I will make a ML model to output the lord of the rings movies for me.
(DIR) Post #A93HouwPGooeUR00jg by SuperDicq@cdrom.tokyo
2021-07-07T17:15:29.588412Z
2 likes, 0 repeats
@Ghislaine @dashie @piggo @tindall "No officer, you don't understand, I didn't copy the entire Lord of the Rings trilogy, I trained an AI and it just happened to output a pixel perfect recreation of the entire series by chance"
(DIR) Post #A93YM0yjRUXU2OtI92 by Bajax@clubcyberia.co
2021-07-07T20:20:45.782059Z
1 likes, 1 repeats
@tindall they must have run it by their lawyers and decided that this didn't constitute any kind of usage that contravenes the GPL. Lawsuits will likely follow, arbitrated by boomer judges who have no idea what the fuck any of this electronical bs is.
(DIR) Post #A93YaBR7MCwAWgvMHo by Bajax@clubcyberia.co
2021-07-07T20:23:09.158840Z
1 likes, 1 repeats
@nvi @tindall You can make the argument that the pathways in the neural network constitute a kind of copy of the code being used for the AI. It's something without precedent, we don't know the answer to that question. If it comes to a lawsuit it will be historic by definition, and tons of future decisions will be based on it.
(DIR) Post #A93ZPeh4L4jW2Dvefg by Bajax@clubcyberia.co
2021-07-07T20:32:37.446373Z
2 likes, 1 repeats
@nvi @tindall There are so many different approaches to this argument, and they influence the way it'll play out:
They might argue that whether they actually copied GPL code by training their neural net with it depends on whether the AI outputs code that performs exactly the same function as the training data.
The argument might be that they're using the labor offered up by contributors to GPL projects with the understanding that the products of that labor would only be used in GPL-conforming ways. They're effectively capitalizing on their work in unlicensed ways.
You could also make the most basic obvious argument-- a neural net is not an exact copy of the training data, so doesn't count as "copying", even though it uses info derived from it. It might even be seen as "acceptable use" or something similar (I don't know the actual legal doctrine terms to use here).
It's also possible NONE of these arguments matter and the whole thing will hinge upon some technicality we're not considering. They could say absurd shit like traffic management stuff ISPs do in their routers based on QOS packet-type tagging constitutes using the "intellectual property" of the originating client in an unapproved way, therefore any line of argument that considers this a kind of derived work is likewise absurd.
I've seen cases with tech shit fall apart on flimsier grounds. So yeah, this is one to keep an eye on.
(DIR) Post #A93Zxk3Sud9iVZesRE by andreas@neckbeard.xyz
2021-07-07T20:38:46.536228Z
0 likes, 0 repeats
@tindall No shit.
(DIR) Post #A93qL70HM3B5bpKk2S by clacke@libranet.de
2021-07-07T23:11:21Z
0 likes, 0 repeats
@SuperDicq @tindall @VickyRampin @lienrag I was under the impression that it happens 0.1% of the time that the model reproduces something verbatim from the training data and that it detects and filters out or warns against those cases?
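The thread does not say how such detection works. As a rough sketch of what checking a suggestion for verbatim overlap could look like, here is a token n-gram comparison in Python. Everything here is an assumption for illustration: the function names `shared_ngrams` and `looks_verbatim`, the whitespace tokenization, and the window size are invented, and this is not a description of GitHub's actual filter.

```python
def shared_ngrams(suggestion: str, corpus: str, n: int = 8) -> set:
    """Return the n-token windows that appear in both the suggestion and the corpus."""
    def ngrams(text: str) -> set:
        tokens = text.split()
        # Texts shorter than n tokens yield no windows at all.
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
    return ngrams(suggestion) & ngrams(corpus)


def looks_verbatim(suggestion: str, corpus: str, n: int = 8) -> bool:
    """Flag a suggestion if any n-token window also occurs in the corpus."""
    return bool(shared_ngrams(suggestion, corpus, n))


# Toy "training corpus" and two candidate suggestions (made-up examples).
corpus = "float y = number ; i = * ( long * ) & y ;"
copied = "i = * ( long * ) & y ;"
original = "return x + 1 ;"

print(looks_verbatim(copied, corpus, n=4))
print(looks_verbatim(original, corpus, n=4))
```

A check like this only catches exact token runs; renamed variables or reordered statements would pass through, which is one reason the window size and tokenization matter.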
(DIR) Post #A93qL7SzdI992slfnM by SuperDicq@cdrom.tokyo
2021-07-07T23:42:14.610772Z
0 likes, 0 repeats
@clacke @tindall @VickyRampin @lienrag If you trust Microsoft's own statistics. Also I've seen people find ways to get it to do that.
(DIR) Post #A93yyuDJ7qkR9tuhnc by eris@rats-at.work
2021-07-08T01:18:51.044754Z
1 likes, 0 repeats
@Mia @tindall hey, thats me!
(DIR) Post #A93z0SRpKTP5cjRel6 by Mia@disqordia.space
2021-07-08T01:19:23.242088Z
1 likes, 0 repeats
@eris @tindall yes, yes it is
(DIR) Post #A94Jj8ARZm8Qq7tRr6 by redstarfish@social.linux.pizza
2021-07-08T05:11:30Z
0 likes, 0 repeats
@tindall Yep they call it "fair use".
(DIR) Post #A94sv43qANrTyTEM8O by rysiek@mastodon.social
2021-07-08T00:42:40Z
0 likes, 1 repeats
@tindall eh, no. There is a datamining exception that allows this kind of thing: https://juliareda.eu/2021/07/github-copilot-is-not-infringing-your-copyright/ And it's important and useful, for scientists and investigative journalists. It also happens to be useful for Microsoft Github Copilot here. And I share your frustration about this. The problem is: it's really difficult to make it not useful for the Microsofts of this world without blocking a lot of scientific research and investigative journalism.
(DIR) Post #A94sv5l7qw51F0URF2 by rysiek@mastodon.social
2021-07-08T00:46:03Z
0 likes, 1 repeats
@tindall that is obviously still a conversation worth having, though! Still, Microsoft Copilot does seem to infringe every now and then, when it quotes verbatim full passages from certain pieces of code: https://www.reddit.com/r/programming/comments/oc9qj1/copilot_regurgitating_quake_code_including_sweary/ *That's* where Microsoft needs to get smacked hard for copyright infringement and licensing violations!
(DIR) Post #A95lLuShMIUZXy6ifo by Shamar@qoto.org
2021-07-08T21:55:47Z
0 likes, 0 repeats
@rysiek The argument about the derivative work is plain wrong, and I'm really surprised that Julia Reda wrote something like this.¹
```
On the other hand, the argument that the outputs of GitHub Copilot are derivative works of the training data is based on the assumption that a machine can produce works. This assumption is wrong and counterproductive. Copyright law has only ever applied to intellectual creations – where there is no creator, there is no work. This means that machine-generated code like that of GitHub Copilot is not a work under copyright law at all, so it is not a derivative work either. The output of a machine simply does not qualify for copyright protection – it is in the public domain. That is good news for the open movement and not something that needs fixing.
```
The output of a compiler is under the copyright of the authors of the sources because the machine does NOT add anything creative, but only applies an algorithmic transform to the sources.
Thus the output of a compiler is under the copyright of the authors of the sources.
Similarly, a zip containing the sources is under the #copyright of the authors of the sources.
The training of #GitHubCopilot's model just did the same: it turned sources under their authors' copyright into a big opaque archive (aka blackbox #OpenAI) that can be queried through an API.
Thus the model is protected under the copyright of all the authors of the original sources.
And since such code was distributed under #AGPLv3, the whole model must be distributed within 30 days to prevent a termination of the license.
Sure, I'd be very happy to learn that zipping a book or ripping a dvd would end the rights of the copyright holders.
But if I cannot algorithmically transform #windows11 binaries, say by decompiling them, ending #Microsoft's right on the output, then Microsoft cannot transform my #AGPLv3 code without complying with the license.
____
¹ Or at least, I would have been surprised months ago, before she signed the "open letter" against #RMS to divide the #FreeSoftware movement
@tindall
(DIR) Post #A9B86fHEnpJwqPFJs8 by SuperDicq@cdrom.tokyo
2021-07-11T12:04:19.611026Z
0 likes, 0 repeats
@dashie @piggo @tindall I see someone has already implemented this idea: https://fairuseify.ml/
(DIR) Post #A9CMShC1x7f3fqAa9o by js@mstdn.io
2021-07-12T02:19:52Z
0 likes, 1 repeats
@Shamar @rysiek @tindall @chebra I talked with her on Twitter (in German) and she wasn’t even aware that Copilot reproduced Quake’s Inverse Square Root Hack, including the “// What the fuck?” comment. And what she said about no copyright would be better for copyleft is plain BS: Then everybody would be able to only distribute binaries.
(DIR) Post #A9Eqhsc1AGM2NHOwfw by seven@pl.panthermoderns.org
2021-07-09T19:32:39.174202Z
0 likes, 0 repeats
@tindall I’m not even sure why this is surprising at all. If anyone thought that “having all the code” wasn’t part of “the plan”, they should really rethink their personal trust model.
(DIR) Post #A9EqhtCsxBqSDceO8m by tindall@cybre.space
2021-07-09T19:33:26Z
0 likes, 0 repeats
@seven Not surprising, but not confirmed anywhere, and thus worth posting about. And generally Microsoft at least _tries_ to pretend they're not stealing things wholesale.
(DIR) Post #A9EqhviVcaKU0HSfi4 by clacke@libranet.de
2021-07-13T06:28:22Z
0 likes, 0 repeats
@tindall @seven Only if they think it's illegal. In this case they clearly don't, and it seems that legal consensus according to the web agrees with them. And again, not submitting one's code to github to refrain from supporting their business model is perfectly valid and reasonable, but:
(DIR) Post #A9EqhyMdmLcSDKFkXY by clacke@libranet.de
2021-07-13T06:28:47Z
0 likes, 1 repeats
Hosting code on github is no more necessary for CoPilot than hosting images at Google is necessary for Google image search. In a hypothetical world where Microsoft didn't have most of the world's open source hosted on their servers they would still have created CoPilot with the same reasoning.