article-seirdy-an-experiment-to-test-github-copilot-s-legality.mw - tgtimes - The Gopher Times
---
article-seirdy-an-experiment-to-test-github-copilot-s-legality.mw (11221B)
---
1 .SH seirdy
2 An experiment to test GitHub Copilot's legality
3 .2C 157v
4 .
5 .QP
6 This article was posted on 2022-07-01 by Rohan Kumar
7 .FS
8 https://seirdy.one/posts/2022/07/01/experiment-copilot-legality/
9 gemini://seirdy.one/posts/2022/07/01/experiment-copilot-legality/index.gmi
10 .FE
11 and is now republished in this newspaper, with permission (CC-BY-SA 4.0).
12 .
13 .
14 .IP "Preface"
15 .
16 .PP
17 I am not a lawyer.
18 This post is satirical commentary on:
19 .
20 .IP \(bu
21 The absurdity of Microsoft and OpenAI's legal justification for GitHub Copilot.
22 .
23 .IP \(bu
24 The oversimplifications people use to argue against GitHub Copilot (I don't like it when people agree with me for the wrong reasons).
25 .
26 .IP \(bu
27 The relationship between capital and legal outcomes.
28 .
29 .IP \(bu
30 How civil cases seem like sporting events where people “win” or “lose”, rather than opportunities to improve our understanding of law.
31 .
32 .PP
33 In the process, I intentionally misrepresent how the judicial system works:
34 I portray the system the way people like to imagine it works.
35 Please don't make any important legal decisions based on anything I say.
36 .
37 .PP
38 The only section you should take seriously is “Context:
39 the relevant technologies”.
40 .
41 .
42 .IP "Introduction"
43 .
44 .PP
45 GitHub is enabling copyleft violation \fBat scale\fR with Copilot.
46 GitHub Copilot encourages people to make derivative works of source code without complying with the original code's license.
47 This facilitates the creation of permissively-licensed or proprietary derivatives of copyleft code.
48 .
49 .PP
50 Unfortunately, challenging Microsoft (GitHub's parent company) in court is a bad idea:
51 their legal budget probably ensures their victory, and they likely already have a comprehensive defense planned.
52 How can we determine Copilot's legality on a level playing field? We can create legal precedent that they haven't had a chance to study yet!
53 .
54 .PP
55 A chat with Matt Campbell about a speech synthesizer gave me a horrible idea.
56 I think I know a way to find out if GitHub Copilot is legal:
57 we could use its legal justification against another software project with a smaller legal budget.
58 Specifically, against a speech synthesizer.
59 The outcome of our actions could set a legal precedent to determine the legality of Copilot.
60 .
61 .IP "Context: the relevant technologies"
62 .PP
63 Let's cover the technologies and actors at play before I start my evil monologue.
64 .
65 .
66 .IP "Exhibit A: GitHub Copilot"
67 .
68 .PP
69 GitHub Copilot is a predictive autocompletion service for writing software.
70 It's powered by OpenAI Codex,
71 .FS
72 https://openai.com/blog/openai-codex/
73 .FE
74 a language model based on GPT-3.
75 .FS
76 https://en.wikipedia.org/wiki/GPT-3
77 .FE
78 It was trained using the source code of public repositories hosted on GitHub, regardless of their licensing.
79 In response to a Request for Comments from the US Patent and Trademark Office, OpenAI claimed that “Artificial Intelligence Innovation”, such as code written by GitHub Copilot, should be considered “fair use”.
80 .FS
81 See Comment Regarding Request for Comments on Intellectual Property Protection
82 for Artificial Intelligence Innovation submitted by OpenAI to the USPTO.
83 https://www.uspto.gov/sites/default/files/documents/OpenAI_RFC-84-FR-58141.pdf
84 .FE
85 .
86 .PP
87 Many of the code snippets it suggests are exact copies of source code from various GitHub repositories.
88 For an example, see this tweet:
89 “I don't want to say anything but that's not the right license Mr Copilot.”
90 .FS
91 https://nitter.net/mitsuhiko/status/1410886329924194309
92 https://twitter.com/mitsuhiko/status/1410886329924194309
93 .FE
94 by Armin Ronacher.
95 .FS
96 https://lucumr.pocoo.org/about/
97 .FE
98 It contains a screen recording of Copilot suggesting this Quake code.
99 .FS
100 https://github.com/id-Software/Quake-III-Arena/blob/master/code/game/q_math.c
101 At line 552
102 .FE
103 When prompted to do so, it obediently fills in a permissive license.
104 That permissive license violates the Quake code's GPL-2.0 license.
105 Copilot provides no indication that a license violation is taking place.
106 .
107 .PP
108 GitHub performed its own research into the matter.
109 .FS
110 I doubt anybody worth their salt would count on a company to hold itself
111 accountable, but at least they tried.
112 .FE
113 You can read about it on their blog:
114 GitHub Copilot research recitation,
115 .FS
116 https://github.blog/2021-06-30-github-copilot-research-recitation/
117 .FE
118 by Albert Ziegler.
119 .FS
120 https://github.com/wunderalbert
121 .FE
122 I'm not convinced that it accounts for suggested code that has been mechanically altered to match the surrounding text, yet remains close enough to the training data to be a license violation.
123 .
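.PP
To illustrate what “mechanical alterations” could mean, here is a minimal Python sketch.
It is not GitHub's methodology: the function name normalized_tokens and both code snippets are hypothetical, invented for this illustration.
It shows how renaming identifiers defeats an exact-text comparison, while a comparison of normalized token streams still reports a match.

```python
import io
import keyword
import tokenize

# Token types that carry no structure we care about for this comparison.
SKIPPED = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}


def normalized_tokens(source):
    """Return the token stream with names and literals replaced by placeholders.

    Hypothetical helper: every non-keyword name becomes "ID" and every
    number or string becomes "LIT", so renamed copies look identical.
    """
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIPPED:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append("ID")
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            out.append("LIT")
        else:
            out.append(tok.string)
    return out


# Hypothetical "training" snippet and a mechanically altered suggestion.
original = "def area(width, height):\n    return width * height\n"
renamed = "def size(w, h):\n    return w * h  # renamed to fit caller\n"

exact_match = original == renamed
structural_match = normalized_tokens(original) == normalized_tokens(renamed)

print("exact text match:", exact_match)
print("normalized token match:", structural_match)
```

This comparison is deliberately coarse: every identifier collapses to the same placeholder, so unrelated snippets with the same shape would also match.
Whether a given degree of similarity amounts to a license violation is a legal question that no such check can answer.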
124 .
125 .IP "Exhibit B: The Eloquence speech synthesizer"
126 .
127 .PP
128 I recently had a chat with Matt on IRC about screen readers and different types of speech synthesizers.
129 I mentioned that while I do like some variety, I always find myself returning to the underrated robotic voice of eSpeak NG.
130 .FS
131 https://github.com/espeak-ng/espeak-ng/
132 .FE
133 He shared some of my fondness, and also shared his preference for a similar speech synthesizer called Eloquence.
134 .
135 .PP
136 Downloads of Eloquence are easy to find (it's even included with the JAWS screen reader), but I struggle to find any “official” pages about the original Eloquence.
137 Nuance acquired Eloquent Technology, the developer of Eloquence.
138 Microsoft later acquired Nuance.
139 .
140 .
141 .IP "Eloquence sample audio"
142 .
143 .PP
144 Matt recorded this sample audio clip of Eloquence reading some text.
145 .FS
146 https://seirdy.one/a/eloquence.mp3
147 .FE
148 The text is from the introduction of Best practices for inclusive textual websites.
149 .FS
150 https://seirdy.one/posts/2020/11/23/website-best-practices/
151 .FE
152 .
153 .QP
154 My primary focus is inclusive design.
155 Specifically, I focus on supporting underrepresented ways to read a page.
156 Not all users load a page in a common web-browser and navigate effortlessly with their eyes and hands.
157 Authors often neglect people who read through accessibility tools, tiny viewports, machine translators, “reading mode” implementations, the Tor network, printouts, hostile networks, and uncommon browsers, to name a few.
158 I list more niches in the conclusion.
159 Compatibility with so many niches sounds far more daunting than it really is:
160 if you only selectively override browser defaults and use plain-old, semantic HTML (POSH), you've done half of the work already.
161 .
162 .PP
163 I like the Eloquence speech synthesizer.
164 It sounds similar to the robotic yet predictable voice of my beloved eSpeak NG, but with improved overall quality.
165 Unfortunately, Eloquence is proprietary.
166 .
167 .
168 .IP "Exhibit C: Deep learning speech synthesis"
169 .
170 .PP
171 Deep learning speech synthesis
172 .FS
173 https://en.wikipedia.org/wiki/Deep_learning_speech_synthesis
174 .FE
175 is a recent approach to speech synthesizer creation.
176 It involves training a deep neural network on voice samples, and using the trained model to generate speech similar to a real human voice.
177 One synthesizer using deep learning speech synthesis is Mozilla's TTS.
178 .FS
179 https://github.com/mozilla/TTS
180 .FE
181 .
182 .PP
183 Zero-shot approaches could allow a pre-trained model to generate multiple different voices.
184 YourTTS
185 .FS
186 https://doi.org/10.48550/arXiv.2112.02418
187 .FE
188 is one such example.
189 This could allow us to synthetically re-create a person's voice more easily.
190 .
191 .
192 .IP "My horrible plan"
193 .
194 .PP
195 My horrible plan revolves around going through two different lawsuits to set some judicial precedents; these precedents could improve the odds of succeeding in a lawsuit against Microsoft for Copilot's licensing violations.
196 .
197 .PP
198 If this succeeds, we have new legal justification that GitHub Copilot is illegal; if it fails, we have still gained a means to legally re-create proprietary software.
199 It's a win-win situation.
200 .
201 .
202 .IP "Part One: set a precedent"
203 .
204 .IP 1.
205 Train a modern text-to-speech (TTS) engine using the voice of a proprietary one made by a company with a small legal budget.
206 Keep the model's internals hidden.
207 .
208 .IP 2.
209 Then release the final TTS under a permissive license.
210 Remember, we're still keeping the machine-learning model hidden!
211 .
212 .IP 3.
213 Wait for that company to file suit.
214 .FS
215 If the stars align, you could file an anticipatory suit against the company.
216 Seeking a declaratory judgement is common in disputes over intellectual property rights.
217 https://en.wikipedia.org/wiki/Declaratory_judgment
218 .FE
219 .
220 .IP 4.
221 Win or lose the case.
222 .
223 .
224 .IP "Part Two: use that precedent against Microsoft's Nuance"
225 .
226 .PP
227 Our goal here is to get the same legal outcome as the low-stakes “trial run” of Part One.
228 .
229 .PP
230 Microsoft owns Nuance.
231 Nuance previously bought Eloquent Technology, the developers of the Eloquence speech synthesizer.
232 .
233 .IP 1.
234 Repeat Part One against Nuance speech synthesizers, including Eloquence.
235 Go to court.
236 .
237 .IP 2.
238 Have the ruling from Part One cited as legal precedent.
239 .
240 .IP 3.
241 Achieve the same outcome as Part One, demonstrating that we have indeed set precedent that works against Microsoft's legal department.
242 .
243 .
244 .IP "Implications of the outcomes"
245 .
246 .PP
247 If we \fIwin\fR both cases:
248 Microsoft has the legal high ground.
249 Making a derivative of a copyrighted work using a machine-learning algorithm allows us to bypass copyright licenses.
250 .
251 .PP
252 If we \fIlose\fR both cases:
253 Microsoft does not have the legal high ground.
254 We have good judicial precedent against Microsoft to use when filing suit for Copilot's behavior.
255 .
256 .PP
257 Either way, it's an absolute win for free software.
258 Taking down Copilot stops it from enabling proprietary derivatives of copyleft code (and by extension, protects software freedom).
259 But if we accidentally win these two low-stakes “test” cases, we still gain something else:
260 we can liberate huge swaths of proprietary software, starting with speech synthesizers.
261 .
262 .
263 .IP "Update: on satire"
264 .
265 .PP
266 This post isn't “satire through-and-through” like something from The Onion.
267 Rather, my intent was to make some clear points, but extrapolate them to absurdity to highlight other problems.
268 I don't think I was clear enough when doing this.
269 I'm sorry.
270 .
271 .PP
272 Copilot has been found to suggest significant amounts of code that is dangerously similar to existing works.
273 It does this without disclosing obligations that come with those works' licenses.
274 Training a model on copyrighted works may not be wrong in and of itself; however, using that model to generate new works that are not sufficiently distinct from original works is where things get problematic.
275 Copilot's users could apply proprietary licenses to the generated works, defeating the point of copyleft.
276 .
277 .PP
278 When a tool almost exclusively encourages problematic behavior, the makers of that tool should have put thought into its implications.
279 GitHub and OpenAI have not demonstrated a sufficiently careful approach.
280 .
281 .PP
282 I don't think that “going after” a smaller player just to manipulate our legal system is a good thing to do.
283 The fact that this idea seems plausible to some of my readers shows how warped our perception of the judicial system is.
284 Even if it's accurate (I doubt it's accurate, but I'm not certain), it's sad.
285 Judicial systems incentivise too much predatory behavior.
286 .
287 .
288 .IP "Corrections"
289 .PP
290 It's come to my attention that Eloquence may or may not still belong to Nuance.
291 Further research is needed.
292 Eloquent Technology was acquired by SpeechWorks in 2000.