Post ATDY7l7P17BHHTGD7g by dltj@code4lib.social
Post #ATDY7h309XHeefAdhw by dltj@code4lib.social
       2023-03-02T13:14:29Z
       
       0 likes, 0 repeats
       
       This week's DLTJ Thursday Threads looks at the intersection of #copyright and the explosion of new #AI tools like #ChatGPT and #DALLE2. Can works created by an AI algorithm be copyrighted? Do the creators of AI models have an obligation to respect the copyright of works they use in their algorithms? https://dltj.org/article/issue-99-copyright-and-ai 1/9
       
Post #ATDY7l7P17BHHTGD7g by dltj@code4lib.social
       2023-03-02T13:16:16Z
       
       0 likes, 0 repeats
       
       Let's set the stage for whether the output of #AI algorithms can be copyrighted, and it starts with this image. It is a selfie taken by a monkey, and in _Naruto v. Slater_ the courts determined that a monkey cannot hold a copyright. https://dltj.org/article/issue-99-copyright-and-ai#humans 2/9
       
Post #ATDY7lfmxGgd07Lfii by dltj@code4lib.social
       2023-03-02T13:16:41Z
       
       0 likes, 0 repeats
       
       Based on the _Naruto v. Slater_ precedent, the fact that an AI algorithm isn't human seems to be enough to say that the output of #ChatGPT and other #LargeLanguageModel #AI algorithms cannot be copyrighted. https://dltj.org/article/issue-99-copyright-and-ai#copyright-office 3/9
       
Post #ATDY7owum3AVA6NtKa by dltj@code4lib.social
       2023-03-02T13:17:23Z
       
       0 likes, 0 repeats
       
       So what is a #LargeLanguageModel and how does it work? It is a statistical model, built by analyzing huge sets of textual works, of how words and phrases follow each other. This is a big part of how #ChatGPT works. https://dltj.org/article/issue-99-copyright-and-ai#llm 4/9
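As a toy illustration of "how words and phrases follow each other," here is a minimal Python sketch of a bigram (word-pair) counting model. It is a deliberate simplification, assuming whitespace tokenization and a made-up three-sentence corpus; the function names are hypothetical. Real large language models like ChatGPT use neural networks trained on vastly more text and look at long stretches of context, not just the previous word.

```python
from collections import Counter, defaultdict


def train_bigrams(texts):
    """Count how often each word is immediately followed by each other word."""
    follows = defaultdict(Counter)
    for text in texts:
        words = text.lower().split()  # naive whitespace tokenization (assumption)
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows


def next_word_probabilities(follows, word):
    """Convert the raw follow-counts for one word into probabilities."""
    counts = follows.get(word.lower(), Counter())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def most_likely_next(follows, word):
    """Return the word seen most often after `word`, or None if never seen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical training set, invented for illustration.
corpus = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "the dog sat on the rug",
]
model = train_bigrams(corpus)
print(most_likely_next(model, "the"))          # → cat
print(next_word_probabilities(model, "sat"))   # → {'on': 1.0}
```

Even this tiny model only "knows" its training sentences, which hints at why training data can leak back out of far larger models.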
       
Post #ATDY7paGPkdz88nJfE by dltj@code4lib.social
       2023-03-02T13:18:01Z
       
       0 likes, 0 repeats
       
       Despite the word "Language" in the #LargeLanguageModel name, the same technique can be applied to images when you have a good description of the images in your training set. This is how #DALLE2 works. https://dltj.org/article/issue-99-copyright-and-ai#images 5/9
       
Post #ATDY7qSVA61rqMAdCi by dltj@code4lib.social
       2023-03-02T13:18:32Z
       
       0 likes, 0 repeats
       
       It is easy to say that #LargeLanguageModels are a statistical analysis of huge sets of textual works. But that really doesn't describe how "unimaginably complex" these models are. https://dltj.org/article/issue-99-copyright-and-ai#complexity 6/9
       
Post #ATDY7rDeKnk8CaEHh2 by dltj@code4lib.social
       2023-03-02T13:19:25Z
       
       0 likes, 0 repeats
       
       Despite the complexity, the source of the training data can easily leak through. In this court case, #GettyImages charges that #StabilityAI violated #copyright and its terms of service when it built the #StableDiffusion service... and it has the pictures to prove it. https://dltj.org/article/issue-99-copyright-and-ai#getty-images 7/9
       
Post #ATDY7sCcg6W3FgkyjQ by dltj@code4lib.social
       2023-03-02T13:20:45Z
       
       0 likes, 0 repeats
       
       Given the right training set and user prompts, it is even possible to regenerate an image that looks a great deal like the source. To say nothing of the #copyright implications, what are the #privacy implications? https://dltj.org/article/issue-99-copyright-and-ai#regeneration 8/9
       
Post #ATDY7tf1G0pBm33AdE by misty@digipres.club
       2023-03-02T21:27:07Z
       
       0 likes, 0 repeats
       
       @dltj I’ve seen AI proponents argue training can’t be copyright infringement because not enough of the source remains in the training set… but tests like this suggest that’s very untrue.
       
Post #ATDY7uJQpl9PnNxRce by dltj@code4lib.social
       2023-03-02T13:21:09Z
       
       0 likes, 0 repeats
       
       Source code is another area with open questions of copyright, this time involving the terms of #OpenSource licenses, as #GitHub launches its Copilot service to generate code snippets. https://dltj.org/article/issue-99-copyright-and-ai#copilot 9/9
       
Post #ATDY7w2qOP4RAWDE2q by dltj@code4lib.social
       2023-03-02T13:22:08Z
       
       0 likes, 0 repeats
       
       And the weekly addition to #CatsOfMastodon. Alan stretches and roars in my lap. bonus/9
       
Post #ATDgN09qBNvxtxBE4O by dltj@code4lib.social
       2023-03-02T22:55:00Z
       
       0 likes, 0 repeats
       
       @misty I thought the same thing until I saw this research. But with specifically crafted prompts and edge cases in the training set, the copyright holders would seem to have a case. Left to be determined: are there so many edge cases that they are no longer so edge-y? Also, the human eye is good at intuiting similar images; I don't think we're as good at seeing similarities in, say, textual works.
       
Post #ATDgevSMOtsuX7OCgK by misty@digipres.club
       2023-03-02T22:57:00Z
       
       0 likes, 0 repeats
       
       @dltj I think you’re definitely right. And the failure of “GPT text detection” programs suggests we’re not yet good at writing software to see it either.