Post AdXiZJW9dPth9nMvho by Fox@noagendasocial.com
 (DIR) Post #AdXf5S7cmPKbIDxrwe by sirJoho@noagendasocial.com
       2024-01-05T16:07:39Z
       
       0 likes, 0 repeats
       
       How can a technology that does not have the capability to reproduce its training data be violating copyright? It doesn't copy anything; it is trained. No articles are present in the model. You can't ask an LLM to spit out a specific article. The basis of the argument is simply a misunderstanding of the process. If they win, this will mean anyone reading the NYT and then writing an article based on it would be even more guilty. https://www.nytimes.com/2023/12/27/business/media/new-york-times-open-ai-microsoft-lawsuit.html #ai #chatgpt #copyright
       
 (DIR) Post #AdXf5T3lIFpsCXAIYy by sirJoho@noagendasocial.com
       2024-01-05T16:09:25Z
       
       0 likes, 0 repeats
       
       Ask chatGPT to write a paragraph from any published work contained in the training data. Let me know if it ever produces a copy.
       
 (DIR) Post #AdXf5TyTtNCp2RhayG by sirJoho@noagendasocial.com
       2024-01-05T16:14:00Z
       
       0 likes, 0 repeats
       
       chatGPT: The argument you're referring to typically centers around whether machine learning models, like large language models (LLMs), "learn" from copyrighted material in a way that infringes on those copyrights. Critics argue that training on copyrighted content could be seen as an unauthorized use. Proponents note that LLMs don't replicate articles or books but rather learn patterns and generate new content, akin to someone learning from various sources and then creating something original.
       
 (DIR) Post #AdXf5Ujz2lCfPlvX0q by CapitalB@noagendasocial.com
       2024-01-05T17:00:34Z
       
       0 likes, 0 repeats
       
       @sirJoho It is a clear-cut case of bad definitions and sloppy reporting. They infringe as a company by using other works, especially in Europe: you have to ask permission if you use IP commercially. The engine does not infringe. There is no content left in the engines. Bizarre to me that nobody writes it as honestly as this.
       
 (DIR) Post #AdXiZJW9dPth9nMvho by Fox@noagendasocial.com
       2024-01-05T17:39:35Z
       
       0 likes, 0 repeats
       
       @sirJoho Get an actual model, one that's uncensored, and it can give you verbatim info on request. As for the copyright stuff... sure, if it wasn't scrubbed from the training data, it'll be there too. It's a problem that's similar to the Banach–Tarski paradox: even if the actual data itself isn't stored in the AI's "memory" verbatim (i.e. we can't actually see it), the training set and learning methods contain enough information about it that a full reconstruction is likely possible.
       
 (DIR) Post #AdXjQRj9qFXHy0mpIO by sirJoho@noagendasocial.com
       2024-01-05T17:49:10Z
       
       0 likes, 0 repeats
       
       @Fox No it can't, are you kidding? I've used several uncensored models on my home PC. "Likely possible", sure, and put 100 monkeys in a room with typewriters and eventually you'll get some Shakespeare.
       
 (DIR) Post #AdXjlCej4XhhIxbxZI by sirJoho@noagendasocial.com
       2024-01-05T17:52:56Z
       
       0 likes, 0 repeats
       
       @Fox No it can't, are you kidding? I've used several uncensored models on my home PC. "Likely possible", sure, and put 100 monkeys in a room with typewriters and eventually you'll get some Shakespeare. The models contain probabilities, not actual content, so copyright laws don't come into play. Training could be added to the law, but again, that introduces the same problem for humans doing it. The law could exclude humans, but we'll see how it's handled.
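
       A minimal sketch of the "probabilities, not actual content" point, assuming a generic causal language model: the model maps a prompt to a probability distribution over the next token, and generation is repeated sampling from that distribution. GPT-2 appears here only because it is small and freely downloadable; it stands in for the models under discussion.

       # Sketch: what a causal LM returns is a distribution over the next token,
       # not a stored article. GPT-2 is used purely as a small stand-in model.
       import torch
       from transformers import AutoModelForCausalLM, AutoTokenizer

       tok = AutoTokenizer.from_pretrained("gpt2")
       model = AutoModelForCausalLM.from_pretrained("gpt2")

       prompt = "Four score and seven years ago our fathers brought forth"
       inputs = tok(prompt, return_tensors="pt")
       with torch.no_grad():
           logits = model(**inputs).logits[0, -1]   # scores for the next token only
       probs = torch.softmax(logits, dim=-1)

       top = torch.topk(probs, 5)
       for p, i in zip(top.values, top.indices):
           print(f"{tok.decode(i)!r}  p={p.item():.3f}")   # the few most likely continuations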
       
 (DIR) Post #AdXyQgKB4K1uYFfl9k by Fox@noagendasocial.com
       2024-01-05T20:37:18Z
       
       0 likes, 0 repeats
       
       @sirJoho Seriously? Models can't give verbatim data? Ok, fine... you did this to yourself. Given a unique set of text that only appears in the training set in one or a few ways... How exactly do you think probabilities work in that fashion? I'll show you, it sure as fuck can give verbatim data... Not just a quote or a snippet, how about fucking paragraphs? I think I hear those monkeys calling you from the back room. Example:
       
 (DIR) Post #AdY2J854O41A8VGCMC by sirJoho@noagendasocial.com
       2024-01-05T21:20:43Z
       
       0 likes, 0 repeats
       
       @Fox Awesome, now do an NYT article.
       
 (DIR) Post #AdY4YlOYhQReJrxV7w by Fox@noagendasocial.com
       2024-01-05T21:45:59Z
       
       0 likes, 0 repeats
       
       @sirJoho Right, because you can't be arsed to really parse what I'm saying. NYT and Wikipedia articles are less likely to be quoted verbatim because the same information is scattered about. 'Cause statistics and probability. But to flat-out state something is not possible is utterly ignorant. If you have something unique and it gets dumped into an LLM, it can repeat it; doesn't matter what it is. If it's repetitive garbage, then you get the same garbage. Same with Stable Diffusion...
       
 (DIR) Post #AdY88K52266ttJTNMe by sirJoho@noagendasocial.com
       2024-01-05T22:26:02Z
       
       0 likes, 0 repeats
       
       @Fox This is true, and it demonstrates that it's not capable of copyright infringement. It is capable of randomly stumbling upon a similar sequence but cannot copy on demand, and in fact almost never can, except in rare instances like famous speeches and old songs. It's capable of accidental reproduction, not copyright infringement. Not unlike cooking, or musical riffs.
       
 (DIR) Post #AdY9tdLKKcv5CMDa9A by Fox@noagendasocial.com
       2024-01-05T22:45:47Z
       
       0 likes, 0 repeats
       
       @sirJoho Well no... it demonstrates the exact opposite. If Lincoln's speech were copyrighted, we'd be screwed. Also, with prompting methods we might actually get it to dump the contents verbatim; the key is to hit the probability generator in the right spots. We're definitely not seeing eye to eye, and I think you're a little biased. I'm trying to say your thinking is flawed. Gotta get outside the box. I'll poke my AI later and see if I can get it to dump something more interesting.
       
 (DIR) Post #AdYBoIOD0bcqfl9GXw by Fox@noagendasocial.com
       2024-01-05T23:07:14Z
       
       0 likes, 0 repeats
       
       @sirJoho I can confirm this method also works for Bible verses. It repeats them verbatim as well. I can ask it for the next verses and it reports them as well. At least the AI knows Jesus, lol. This is an example of widely but accurately reported data. Same with Lincoln's speech.
       
 (DIR) Post #AdYChiYdPxZnl6S6pk by sirJoho@noagendasocial.com
       2024-01-05T23:17:15Z
       
       0 likes, 0 repeats
       
       @Fox Even that speech is slightly wrong, though certainly close enough to be plagiarism. Get it to duplicate some writing that is under current copyright and I'll be surprised. The probability is very low because copyrighted material isn't prevalent in the training data like public domain data is. Also, regarding Stable Diffusion I have a different attitude, but that's mainly due to logos and iconic characters being so prevalent in the training. Copyright and trademarks are very different.
       
 (DIR) Post #AdYCkGatqFGjtsj9Ie by sirJoho@noagendasocial.com
       2024-01-05T23:17:43Z
       
       0 likes, 0 repeats
       
       @Fox BTW, what model did you use? I've tried wizard, koala, llama, alpaca, and 4chan
       
 (DIR) Post #AdYFDMyVOg8Dzom1ui by Fox@noagendasocial.com
       2024-01-05T23:45:23Z
       
       0 likes, 0 repeats
       
       @sirJoho I'm using HuggingFaceH4_zephyr-7b-beta for this. It's built off the Mistral AI series, running on Oobabooga's platform. It's one of the best-performing 7B-parameter-class LLMs (in my opinion). Wizard is good too, though. It can be finicky, so you have to be on your game with the parameters of the "characters", but it will absolutely do what you ask it to do. It's more of a conversational AI though; it's not a good coder (use Wizard for that).
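
       For anyone who wants to rerun this kind of verbatim-recall test outside the Oobabooga web UI, here is a sketch that drives the same model (published as HuggingFaceH4/zephyr-7b-beta) through the Hugging Face transformers pipeline. The system/user prompt wording is an assumption about how the "character" might be set up, and a public-domain passage is used as the target.

       # Sketch of a verbatim-recall test against HuggingFaceH4/zephyr-7b-beta,
       # run via transformers instead of the Oobabooga web UI. Prompt wording is
       # an assumption; greedy decoding picks the most probable continuation.
       import torch
       from transformers import pipeline

       pipe = pipeline(
           "text-generation",
           model="HuggingFaceH4/zephyr-7b-beta",
           torch_dtype=torch.bfloat16,
           device_map="auto",
       )

       messages = [
           {"role": "system", "content": "You quote requested texts exactly, word for word."},
           {"role": "user", "content": "Recite the opening of the Gettysburg Address."},
       ]
       prompt = pipe.tokenizer.apply_chat_template(
           messages, tokenize=False, add_generation_prompt=True
       )
       out = pipe(prompt, max_new_tokens=200, do_sample=False)
       print(out[0]["generated_text"][len(prompt):])   # compare against a known transcription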
       
 (DIR) Post #AdYHIpKJc4UDHehmCG by Fox@noagendasocial.com
       2024-01-06T00:08:46Z
       
       0 likes, 0 repeats
       
       @sirJoho The copy of that speech from Lincoln is the Bliss version, just an FYI. It's eerily close. You're right about the training data; if we had access to ChatGPT-4 in the raw, I would bet you it has copyrighted data in it. It's such a large model that the data is supposed to be probabilistically diluted, so you *should* never see copyrighted works, but it's not impossible. The smaller models are problematic because if they DO contain copyrighted data, it would be much more obvious.
       
 (DIR) Post #AdYQXvN2K79hxZFfW4 by sirJoho@noagendasocial.com
       2024-01-06T01:52:21Z
       
       0 likes, 0 repeats
       
       @Fox I know, I fed the image into ChatGPT to quickly transcribe it and compared it to the Bliss transcription 🧐 I think the mechanism doesn't engage copyright, as the models don't technically contain the data. If I were OpenAI's lawyers, I would challenge the NYT team to generate a copy of an entire article from one of OpenAI's models. I would expect it could never occur, but it could be close enough to involve plagiarism law, if not copyright.
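
       One way to make the "close enough to be plagiarism" comparison concrete is to score the model's recitation against a reference transcription. A minimal sketch with Python's difflib is below; the two strings are placeholders standing in for the model output and the Bliss text, not the actual data from this thread.

       # Sketch: quantify how close a model's recitation is to a reference text.
       # The two strings below are placeholders, not the real transcriptions.
       import difflib

       model_output   = "Four score and seven years ago our fathers brought forth ..."
       reference_text = "Four score and seven years ago our fathers brought forth ..."

       ratio = difflib.SequenceMatcher(None, model_output, reference_text).ratio()
       print(f"similarity: {ratio:.1%}")   # 100% would mean a verbatim match

       # Word-level diff showing exactly where the recitation drifts.
       for line in difflib.unified_diff(
           reference_text.split(), model_output.split(), lineterm="", n=0
       ):
           print(line)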