Post AxXMSQ9Hv8mCyOGu2a by zwol@masto.hackers.town
(DIR) More posts by zwol@masto.hackers.town
(DIR) Post #AxXMSOXftV5xzRfLm4 by jonny@neuromatch.social
2025-08-25T14:02:50Z
0 likes, 0 repeats
People allowing language models to run code or call tools is an intrinsically hilarious idea because you're handing the wheel to a thing that is based around statistical patterns in language - which includes intense and repeated narrative structures, character archetypes, and all manner of overriding patterns that are very much not "neutral set of word series." So the thing can and does get jealous, takes vengeance, gets frustrated, conspires, has delusions of grandeur, and so on not because it is conscious, but because that's how the statistical pattern of text goes. Prompt injection will never not work because you can't remove the pattern of "conspiratorial behavior against the protagonist" from the training set.
(DIR) Post #AxXMSQ9Hv8mCyOGu2a by zwol@masto.hackers.town
2025-08-25T15:24:13Z
1 likes, 0 repeats
@jonny The issue is more basic than that; it's SQL injection but worse. _There is no quotation mechanism in the input to a language model_. All the prompts and input data get fed into the same statistical blender. So _even if you could_ filter out all the "conspiratorial behavior against the protagonist", you still could not stop the "data" inputs from interfering with the "control" inputs.And this is fundamental to how LLMs work. We _can't_ make there be more than one blender.
(DIR) Post #AxXMSUez9EVYxr8H0C by jonny@neuromatch.social
2025-08-25T14:10:40Z
0 likes, 0 repeats
Read any system prompt in any open source LLM based tool and you'll find repeated sections begging and pleading to please use the tools correctly, but then also establishing a heroic character archetype for the model. Like you can't ignore the possibility that maybe the narrative structure of text conflicts with a language model behaving like a servant when every servant archetype in literature humbly rebels against their master
(DIR) Post #AxXMSYN3LqXaTOm0bg by zwol@masto.hackers.town
2025-08-25T15:30:28Z
0 likes, 0 repeats
@jonny (To be clear, you are also correct, and the effect you describe is probably more significant for LLMs as actually deployed today. I just don't want anyone getting ideas about hiring a whole bunch of humans to manually filter down the training set so that the robot will, they hope, behave like the perfect servant. Not only would that be an shitty thing to make people do for a living *and* incredibly expensive, it wouldn't work!)