Post B1t9MplQJpNJxCIWxc by emenel@post.lurk.org
 (DIR) Post #B1rM3WMp7bILNizcLQ by GossiTheDog@cyberplace.social
       2026-01-01T23:20:32Z
       
       1 likes, 1 repeats
       
        Twitter generated child sexual abuse material via its bot… and then hid behind the bot when apologising. “Sincerely, Grok”. Hold executives accountable.
       
 (DIR) Post #B1rM3XgMESVxSb8jQm by futurebird@sauropods.win
       2026-01-02T01:51:54Z
       
       0 likes, 0 repeats
       
        @GossiTheDog How did the computer have the kind of training data needed to make such an image? No one cares about this?
       
 (DIR) Post #B1rNjVrDVVIjuvnQHo by otte_homan@theblower.au
       2026-01-02T02:10:39Z
       
       0 likes, 0 repeats
       
        @futurebird web content scrapers are fairly dumb and take whatever ends up in front of them, I guess? @GossiTheDog
       
 (DIR) Post #B1rNoR0IymnSJROUIS by futurebird@sauropods.win
       2026-01-02T02:11:36Z
       
       0 likes, 1 repeats
       
        @otte_homan @GossiTheDog I think this is as much an advertisement for what grok "could" do as an apology. I guess that's a little cynical, but I don't think it's out of line.
       
 (DIR) Post #B1rPaQvqeLYXwZlECe by futurebird@sauropods.win
       2026-01-02T02:31:29Z
       
       0 likes, 0 repeats
       
       @GossiTheDog This is so gross.
       
 (DIR) Post #B1rQpUgp3qXbtbTh8S by Bumblefish@mastodon.scot
       2026-01-02T02:45:19Z
       
       0 likes, 0 repeats
       
       @futurebird @otte_homan @GossiTheDog Spot on.
       
 (DIR) Post #B1rTwcDPmLl77roHho by Su_G@aus.social
       2026-01-02T03:20:14Z
       
       0 likes, 0 repeats
       
        @futurebird Apparently Grok’s middle name is Epstein. @GossiTheDog
       
 (DIR) Post #B1rULGhMund7dFZmXA by Su_G@aus.social
       2026-01-02T03:24:43Z
       
       0 likes, 0 repeats
       
        @futurebird Hadn’t considered that, but now that I have, I think you’re absolutely right. It is advertising for Grok (middle name Epstein-Thump; last name Musk). Hence the weirdly detailed description of the images it generated & the fake ‘human-ness’ of the ‘apology’. Absolutely disgusting. No consequences —> even more disgusting. 😤 @otte_homan @GossiTheDog
       
 (DIR) Post #B1rWKpnH97O464GAr2 by noondlyt@hellions.cloud
       2026-01-02T03:47:04Z
       
       0 likes, 0 repeats
       
       @futurebird @GossiTheDog My guess is that there is a wealth of csam in the twitter archives and it had access to all of it for training.
       
 (DIR) Post #B1rWibWDs7ZUm1cMuu by bzdev@fosstodon.org
       2026-01-02T03:51:21Z
       
       0 likes, 0 repeats
       
        @futurebird @GossiTheDog It's just a guess - what might have happened is that a user asked for an image of a child wearing some sort of outfit such as a "fetish corset", which was treated the same as specifying clothing. Then the "safeguard" software classified the outfit as a swimsuit, and let it by. I don't know how you use a neural net to provide such "safeguards" as the images needed to train it are illegal to possess.
       
 (DIR) Post #B1rXaOv0w207Kfrl4a by GhostOnTheHalfShell@masto.ai
       2026-01-02T04:01:05Z
       
       0 likes, 0 repeats
       
       @futurebird @GossiTheDog Rely access Epstein files? I am only half kidding here.
       
 (DIR) Post #B1rYLMpiVS1UTVme6S by futurebird@sauropods.win
       2026-01-02T04:09:36Z
       
       0 likes, 0 repeats
       
       @GhostOnTheHalfShell @GossiTheDog That's something I'll never like about the way these models that don't run on your own machine work. You don't really know what has been "mixed in" to the final image you are looking at. Even for innocent things, they could have all sorts of influences. And none of these companies carefully vetted their training sets. They just didn't care.
       
 (DIR) Post #B1rYQWJsfAhBLRLjuK by futurebird@sauropods.win
       2026-01-02T04:10:33Z
       
       0 likes, 0 repeats
       
        @GhostOnTheHalfShell @GossiTheDog This is giving me an idea for a horror story. The face of a murdered child haunts the machine, comes back to find the killer.
       
 (DIR) Post #B1raCGw5GM4XUFYK5Q by GhostOnTheHalfShell@masto.ai
       2026-01-02T04:30:21Z
       
       0 likes, 0 repeats
       
       @futurebird @GossiTheDog :flan_evil:
       
 (DIR) Post #B1ranrFpC8JAqChzLE by zl2tod@mastodon.online
       2026-01-02T04:37:07Z
       
       0 likes, 0 repeats
       
        @futurebird In no way discounting murdered babies thrown in rivers, there is some real world training data involving Jeffrey Epstein, Krsangi Hanin and Mark Guariglia. @GhostOnTheHalfShell @GossiTheDog
       
 (DIR) Post #B1rfVk7jSzkvHh6Yim by rep_movsd@mastodon.social
       2026-01-02T05:29:52Z
       
       0 likes, 0 repeats
       
        @futurebird @GossiTheDog LLM doesn't need to be trained on such content to be able to generate them. It learns how things would look and can generate something it has never seen before - much like humans. I find it quite hypocritical that media itself sexualizes teens and preteens under the banner of sexual freedom, while showing fake outrage. The movie Cuties, meant to show the dangers of this, ends up being a medium of exploitation itself.
       
 (DIR) Post #B1rfpMhaG2xYR1EMQS by futurebird@sauropods.win
       2026-01-02T05:33:27Z
       
       0 likes, 1 repeats
       
        @rep_movsd @GossiTheDog "LLM doesn't need to be trained on such content to be able to generate them." People say this but how do you know it is true?
       
 (DIR) Post #B1rgHn85av6wVTBPMm by rep_movsd@mastodon.social
       2026-01-02T05:38:34Z
       
       0 likes, 0 repeats
       
        @futurebird @GossiTheDog Because that's how the math works. It takes random noise, and checks if it looks like the target description. Then modulates the noise and repeats until it's satisfied that it looks like what the user described.
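        A minimal sketch of the loop being described, assuming a generic diffusion-style sampler; denoise_step is a placeholder for the trained network, not any real API, and everything it "knows" about how things look would come from training:

        import numpy as np

        def denoise_step(image, prompt, t):
            # Placeholder: a real model predicts and removes noise conditioned
            # on the prompt; here we just decay the noise a little.
            return image * 0.98

        def generate(prompt, steps=50, shape=(64, 64, 3), seed=0):
            """Toy outline of diffusion-style sampling: start from random
            noise and repeatedly nudge it toward something that scores as
            matching the prompt."""
            rng = np.random.default_rng(seed)
            image = rng.normal(size=shape)          # pure random noise
            for t in reversed(range(steps)):
                image = denoise_step(image, prompt, t)
            return image

        sample = generate("a red cube on a table")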
       
 (DIR) Post #B1rgK9YjAlAaeMlk9o by RustedComputing@discuss.systems
       2026-01-02T05:39:00Z
       
       0 likes, 0 repeats
       
        @futurebird The same way you can use words to describe something to someone who has never been exposed to that thing and they imagine it only using intuition from their own model of the world. @rep_movsd @GossiTheDog
       
 (DIR) Post #B1rgoMjojpwcD9MXyq by futurebird@sauropods.win
       2026-01-02T05:44:29Z
       
       0 likes, 0 repeats
       
        @rep_movsd @GossiTheDog There are things that these generators do well, things that they struggle with, and things they simply can't generate. These limitations are set by the training data. It's easy to come up with a prompt that an engine just can't manage, because it had nothing to reference.
       
 (DIR) Post #B1rh58d2RwTJNIBono by rep_movsd@mastodon.social
       2026-01-02T05:47:22Z
       
       0 likes, 0 repeats
       
        @futurebird @GossiTheDog The models are getting better by the hour. AI gets details wrong, but in general they are almost as good as any artist who can do photorealism. Also, prompting techniques matter a lot.
       
 (DIR) Post #B1rhJZFEHyDhycvwRs by futurebird@sauropods.win
       2026-01-02T05:50:08Z
       
       0 likes, 0 repeats
       
       @rep_movsd @GossiTheDog But you could only state that it could generate something not in the training data... if you knew what was in the training data. But that is secret. So you don't know. You don't know if there is a near identical image to the one produced in the training data.
       
 (DIR) Post #B1rhZsGFM0khaYBWtc by rep_movsd@mastodon.social
       2026-01-02T05:51:02Z
       
       0 likes, 0 repeats
       
        @josh0 @futurebird @GossiTheDog Whether or not mainstream AI has been trained on objectionable content cannot be proven by anyone except the law. Whether AI trained on curated content can be coerced to create anything you imagine is a different question.
       
 (DIR) Post #B1rhZtSgtEIhJR0yvo by futurebird@sauropods.win
       2026-01-02T05:52:58Z
       
       0 likes, 0 repeats
       
        @rep_movsd @josh0 @GossiTheDog "Whether or not mainstream AI has been trained on objectionable content cannot be proven by anyone except the law." "LLM doesn't need to be trained on such content to be able to generate them." The first statement is true; the second is a wild guess. You don't know.
       
 (DIR) Post #B1rhnZCN1b72WNtvE0 by rep_movsd@mastodon.social
       2026-01-02T05:55:30Z
       
       0 likes, 0 repeats
       
        @futurebird @GossiTheDog Fair enough, but I am pretty sure that a model that is trained on both images of children and adults will very easily be able to create images of children in adult-like clothes and so forth. It's possible to put some guardrails on what the AI can be asked to do, but only as much as you can put guardrails on any intelligent being who tends to want to do a task for a reward.
       
 (DIR) Post #B1riPMGzWtyjRD5CDY by liferstate@mas.to
       2026-01-02T06:02:20Z
       
       0 likes, 0 repeats
       
       @futurebird @rep_movsd @GossiTheDog 404 Media has reported on how LLM training data sets contain CSAM. This isn't hypothetical, it's a known problem that AI boosters are trying as hard as they can to not-know.
       
 (DIR) Post #B1rij9Pb3avLLDXtTs by futurebird@sauropods.win
       2026-01-02T06:05:49Z
       
       0 likes, 1 repeats
       
        @rep_movsd @GossiTheDog OK, you came at me with "Because that's how the math works." a moment ago, yet *you* may think these programs are doing things they can't. 'Intelligence working towards a reward' is a bad metaphor. (Which is why some see the apology and think it means something.) They will say "exclude X from influencing your next response" or "tell me how you arrived at that result" and think, because an LLM will give a coherent-sounding response, that it is really doing what they ask. It can't.
       
 (DIR) Post #B1riqW5X93PthKfp9k by futurebird@sauropods.win
       2026-01-02T06:07:17Z
       
       0 likes, 0 repeats
       
        @rep_movsd @GossiTheDog "It's possible to put some guardrails on what the AI can be asked to do." How?
       
 (DIR) Post #B1rjhWM8sbBZsPiUFs by kevingranade@mastodon.gamedev.place
       2026-01-02T06:14:11Z
       
       0 likes, 0 repeats
       
        @RustedComputing @futurebird @rep_movsd @GossiTheDog these things are absolutely not in any way brain-like.
       
 (DIR) Post #B1rjhXFRYzQCdvaeS8 by futurebird@sauropods.win
       2026-01-02T06:16:49Z
       
       0 likes, 0 repeats
       
        @kevingranade @RustedComputing @rep_movsd @GossiTheDog "mammal brain"
       
 (DIR) Post #B1rvjXCTuXBIWXvNDs by twasink@aus.social
       2026-01-02T08:31:37Z
       
       0 likes, 0 repeats
       
        @futurebird @GossiTheDog Honestly? It's probably been trained to recognise CSAM, using a government-approved set of training data for that exact purpose. (That's a real thing, BTW.) Then, having been trained to _recognise_ it, the safeguards against _generating_ it weren't adequate.
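        Recognition systems of the kind mentioned above are often built around matching images against curated hash lists rather than training directly on the material. A rough sketch of that idea with a hand-rolled average hash; known_hashes and the threshold are illustrative placeholders, not any real system:

        from PIL import Image

        def average_hash(path, size=8):
            """Downscale to size x size grayscale and threshold on the mean:
            a crude perceptual hash, similar in spirit to industry hash matching."""
            img = Image.open(path).convert("L").resize((size, size))
            pixels = list(img.getdata())
            mean = sum(pixels) / len(pixels)
            bits = "".join("1" if p > mean else "0" for p in pixels)
            return int(bits, 2)

        def hamming(a, b):
            return bin(a ^ b).count("1")

        # known_hashes would come from a curated list maintained by a
        # child-safety organisation; here it is just an empty placeholder.
        known_hashes = set()

        def looks_like_known_image(path, threshold=5):
            h = average_hash(path)
            return any(hamming(h, k) <= threshold for k in known_hashes)

        Matching known images is a different problem from stopping a generator producing new ones, which is exactly the gap the post points at.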
       
 (DIR) Post #B1ry560psJMYTut0YS by david_chisnall@infosec.exchange
       2026-01-02T08:57:58Z
       
       0 likes, 0 repeats
       
        @futurebird @rep_movsd @GossiTheDog One way to think of these models (note: this is useful but not entirely accurate and contains some important oversimplifications) is that they are modelling an n-dimensional space of possible images. The training defines a bunch of points in that space and they interpolate into the gaps. It’s possible that there are points in the space that come from the training data and contain adults in sexually explicit activities, and others that show children. Interpolating between them would give CSAM, assuming the latent space is set up that way.
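        A toy version of that interpolation picture: two latent vectors stand in for regions learned from training images, and points between them blend the two. The decoder is omitted and all names are illustrative:

        import numpy as np

        def interpolate(z_a, z_b, alpha):
            """Linear interpolation between two points in a latent space.
            Real systems often use spherical interpolation, but the idea is
            the same: points between two training-derived regions blend them."""
            return (1 - alpha) * z_a + alpha * z_b

        # Two hypothetical latent vectors, each standing in for a region the
        # model learned from its training images.
        z_a = np.random.default_rng(1).normal(size=512)
        z_b = np.random.default_rng(2).normal(size=512)

        blended = [interpolate(z_a, z_b, a) for a in (0.25, 0.5, 0.75)]
        # A trained decoder (not shown) would turn each blended vector into an image.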
       
 (DIR) Post #B1s40OYu53L0HxxlCa by futurebird@sauropods.win
       2026-01-02T10:04:22Z
       
       0 likes, 1 repeats
       
        @david_chisnall @rep_movsd @GossiTheDog This has always been possible, it was just slow. I think the innovation of these systems is building what amount to search indexes for the training data by doing a huge amount of pre-processing, "training" (starting to think that term is a little misleading); this allows this kind of result to be generated fast enough to make it a viable application.
       
 (DIR) Post #B1s4TWMGAqJkrjwAEK by futurebird@sauropods.win
       2026-01-02T10:09:38Z
       
       0 likes, 1 repeats
       
        @david_chisnall @rep_movsd @GossiTheDog This is what I've learned by working with the public libraries I could find, and reading about how these things work. To really know if an image isn't in the training data (or something very close to it) we'd need to compare it to the training data, and we *can't* do that. The training data are secret. All that (maybe stolen) information is a big "trade secret." So, when we are told "this isn't like anything in the data" the source is "trust me bro".
       
 (DIR) Post #B1s4fAW9CbE6bU7Fse by futurebird@sauropods.win
       2026-01-02T10:11:44Z
       
       0 likes, 0 repeats
       
       @david_chisnall @rep_movsd @GossiTheDog It's that trust that I'm talking about here. The process makes sense to me. But, I've also seen prompts that stump these things. I've seen prompts that make it spit out images that are identical to existing images.
       
 (DIR) Post #B1s5D0EHxXXIXdtcJs by david_chisnall@infosec.exchange
       2026-01-02T10:17:50Z
       
       0 likes, 0 repeats
       
        @futurebird @rep_movsd @GossiTheDog Yup, the trust is important because we’ve seen in the copyright infringement lawsuits that these people just outright lie about what they used to train their plagiarism machines. Another (again, oversimplified) way to think of these things is as lossy compression. You take a load of images and compress them, then try to decompress a specific image. It fills in the gaps with things taken from other images. If the image you’re trying to get is not there, then everything is filled in. Except in very unusual circumstances, nothing in the training set is 100% preserved, but you can sometimes explore the state space to find where the signals are strongest and then get a high probability that the training set contained something like that.
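        The lossy-compression analogy can be made concrete with plain PCA: keep a few components fitted on a "training" set and reconstruct a new image from them, so everything in the reconstruction is assembled from directions learned on the training images. A minimal sketch, assuming scikit-learn and random arrays in place of real images:

        import numpy as np
        from sklearn.decomposition import PCA

        rng = np.random.default_rng(0)
        train = rng.normal(size=(500, 64 * 64))   # 500 fake "training images"
        query = rng.normal(size=(1, 64 * 64))     # an image not in the training set

        # Keep only 32 components: a very lossy "compression" of the training set.
        pca = PCA(n_components=32).fit(train)

        # Reconstructing the query fills everything in from components learned
        # on the training images; nothing is preserved exactly, but all of it
        # comes from the training distribution.
        reconstruction = pca.inverse_transform(pca.transform(query))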
       
 (DIR) Post #B1s8EIsAQh5AVxrQcy by dahukanna@mastodon.social
       2026-01-02T10:51:39Z
       
       0 likes, 0 repeats
       
        @futurebird There’s an infinite number of ways to combine millions (1E7, 10s 1E8, 100s 1E9 or 1000s 1E10) of partial words (tokens). Not all these outcomes reflect real language/image, nor any learning; it’s a known spectrum and a statistical, probabilistic outcome that is difficult for humans but not computers to calculate. This is like saying an arithmetic calculator is “learning” because it can calculate a number combination that you have not previously asked for. @rep_movsd @GossiTheDog
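        A back-of-the-envelope calculation of the size of that combinatorial space; the vocabulary size and sequence length here are illustrative, not any particular model's:

        # Number of distinct token sequences of a given length, for an
        # illustrative vocabulary size; the vast majority are gibberish,
        # which is the point being made above.
        vocab_size = 100_000        # roughly the order of modern tokenizer vocabularies
        sequence_length = 100

        combinations = vocab_size ** sequence_length
        print(f"~10^{len(str(combinations)) - 1} possible {sequence_length}-token sequences")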
       
 (DIR) Post #B1sF43ujtD62uA7xk8 by mspcommentary@mastodon.online
       2026-01-02T12:08:15Z
       
       0 likes, 0 repeats
       
       @futurebird @rep_movsd @GossiTheDog and, if you can, then why didn't they?
       
 (DIR) Post #B1sGqVkZnZvf5i2Xdw by GossiTheDog@cyberplace.social
       2026-01-02T12:05:59Z
       
       0 likes, 0 repeats
       
       @rep_movsd @futurebird https://www.protectchildren.ca/en/press-and-media/news-releases/2025/csam-nude-net
       
 (DIR) Post #B1sGqXHE7fdlpGK8Aq by piggo@piggo.space
       2026-01-02T12:28:01.831178Z
       
       0 likes, 0 repeats
       
       @GossiTheDog @rep_movsd @futurebird sure this is questionable, but can the model be trained to detect nude children pics without being shown what to detect?
       
 (DIR) Post #B1sOhrZg5R1tnxKUgi by hosford42@techhub.social
       2026-01-02T13:56:19Z
       
       0 likes, 0 repeats
       
        @futurebird It can interpolate, but it can't extrapolate. Not reliably. The math doesn't magic away the need to visualize something outside the distribution of the training data. The further the request is out of distribution, the harder it gets for the model to be accurate. If the model is able to generate these images, it means at the very least that there are CSAM-adjacent images in its training data. @rep_movsd @GossiTheDog
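        The interpolate-versus-extrapolate point shows up even in an ordinary curve fit: inside the range of the training inputs the fit is fine, outside it degrades fast. A minimal sketch with a polynomial fit, purely illustrative and not a claim about any particular image model:

        import numpy as np

        rng = np.random.default_rng(0)
        x_train = rng.uniform(0.0, 1.0, size=200)      # training inputs cover [0, 1]
        y_train = np.sin(2 * np.pi * x_train)          # the "true" pattern

        coeffs = np.polyfit(x_train, y_train, deg=9)    # fit only on [0, 1]

        inside = np.polyval(coeffs, 0.5)    # interpolation: close to the true value, 0
        outside = np.polyval(coeffs, 3.0)   # extrapolation: typically far from the true value, 0
        print(inside, outside)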
       
 (DIR) Post #B1sPlWhvi9Y0CRV4FM by vestige@sleepyhe.ad
       2026-01-02T14:08:10Z
       
       0 likes, 0 repeats
       
       @futurebird @rep_movsd @GossiTheDog All of the major modern chat providers are using agentic frameworks. So your one prompt to an LLM does not generate one model invocation. More commonly, it generates dozens or hundreds of invocations, including retrieval steps where the AI system searches for data relevant to your query (Retrieval-Augmented Generation - RAG is already a thing for image generation where the LLM is provided with or can find a set of reference images). You can prove to yourself that the systems are capable of doing this by using an AI to edit an image instead of generating one from scratch - and that has been working well for more than a year. But there are plenty of whitepapers written about RAG + image generation. There is no reliable way to reason about what is in an AI system's training data vs what is used as part of the context for generation anymore.
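        A rough outline of the retrieval-augmented flow described above; every function here is a stand-in, not any provider's real API:

        def plan_searches(prompt):
            # A real agent might rewrite the prompt into several search queries.
            return [prompt]

        def retrieve(query):
            # Stand-in for a retrieval step that pulls in reference material
            # (text or images) that was never part of the model's training data.
            return [f"reference found for: {query}"]

        def generate(prompt, references):
            # Stand-in for a generator conditioned on both the prompt and the
            # retrieved references.
            return f"output conditioned on {len(references)} reference(s)"

        def answer_with_rag(user_prompt):
            """One user prompt fans out into planning, retrieval, and generation
            calls, so the output can reflect content the base model never saw."""
            queries = plan_searches(user_prompt)
            references = [doc for q in queries for doc in retrieve(q)]
            return generate(user_prompt, references)

        print(answer_with_rag("draw a red bridge in fog"))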
       
 (DIR) Post #B1sacEsJk21Jmb8HxY by aspargillus@mastodon.online
       2026-01-02T16:09:45Z
       
       0 likes, 1 repeats
       
        @futurebird @rep_movsd @GossiTheDog An honest response would be kind of boring…
        you: tell me how you arrived at that result
        LLM: I did a lot of matrix multiplications
       
 (DIR) Post #B1svlmxCVMTF7UoO3s by ForiamCJ@infosec.exchange
       2026-01-02T20:06:46Z
       
       0 likes, 0 repeats
       
        @futurebird @GossiTheDog Pretty much every one of the "open" data sets that these AI companies are funding has been discovered to be full of CSAM images: https://arstechnica.com/tech-policy/2024/08/nonprofit-scrubs-illegal-content-from-controversial-ai-training-dataset/
       
 (DIR) Post #B1t9MplQJpNJxCIWxc by emenel@post.lurk.org
       2026-01-02T22:39:04Z
       
       0 likes, 0 repeats
       
        @futurebird @rep_movsd @GossiTheDog there is verified csam in some of the largest image datasets. This was just in the media, for example: https://www.404media.co/massive-ai-dataset-back-online-after-being-cleaned-of-child-sexual-abuse-material/
       
 (DIR) Post #B1uSHuHWBgGgnvdMf2 by mathcolorstrees@mstdn.social
       2026-01-03T13:45:51Z
       
       0 likes, 0 repeats
       
       @GossiTheDog @futurebird Would it need to specifically train on csam data to produce csam?
       
 (DIR) Post #B1uTaPszWOo9zSYGUi by mathcolorstrees@mstdn.social
       2026-01-03T14:00:19Z
       
       0 likes, 0 repeats
       
       @futurebird @rep_movsd @GossiTheDog you can run a detection both on what the user prompts for and what the model outputs. It can be done via a completely different model or the same model with different prompts. I can find papers if you’re interested. Anthropic does this well.
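        A minimal sketch of that two-checkpoint layout; classify_prompt, classify_image, and generate_image are stand-ins for whatever models a provider actually uses, not any vendor's real API:

        def moderated_generate(prompt):
            """Check the request before generation and the result after it,
            using a separate classifier (or the same model reprompted)."""
            if classify_prompt(prompt) == "disallowed":
                return None                  # refuse before spending compute

            image = generate_image(prompt)

            if classify_image(image) == "disallowed":
                return None                  # refuse to return the output
            return image

        # Trivial stand-ins so the sketch runs; real classifiers are models.
        def classify_prompt(prompt):
            return "disallowed" if "forbidden" in prompt else "allowed"

        def classify_image(image):
            return "allowed"

        def generate_image(prompt):
            return f"image for: {prompt}"

        print(moderated_generate("a watercolour of a lighthouse"))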
       
 (DIR) Post #B1uUSczb0Y8eHEUUue by lxo@snac.lx.oliva.nom.br
       2026-01-03T14:08:33Z
       
       0 likes, 0 repeats
       
        I don't think it's necessary to speculate about that, TBH.
        We can probably assume safely GenBSs aren't trained on pictures of that particular teen undressed, and yet Grok could extrapolate from pictures of other undressed people.
        Bodies of teens don't change that much between say 15 and 19 or even slightly older adults.
        It is not absurd to assume GenBSs are trained with pictures of undressed adults, and that they can get to plausible results by extrapolating from that.
        CC: @GossiTheDog@cyberplace.social