Post B0ZSCHTmdDoUQRG9MO by paco@infosec.exchange
2025-11-23T21:25:22Z
0 likes, 0 repeats
There once was a large language model
The rogue, bad commands it should throttle
But they didn't know it
When a couple of poets
Found breaking protections a doddle

https://www.pcgamer.com/software/ai/poets-are-now-cybersecurity-threats-researchers-used-adversarial-poetry-to-jailbreak-ai-and-it-worked-62-percent-of-the-time/

#LLM #aisecurity
Post #B0ZSCIrvSwiejbYwd6 by meeper@udongein.xyz
2025-11-24T12:43:45.896497Z
0 likes, 0 repeats
@paco Ironically, some of this could even be mitigated (expensively) by using another LLM or whatever to verify the output, as a wide-margin safety guardrail. But for the main commercial use cases it's still worse than useless lol. In the end, actually preventing LLMs from producing harmful output really is more or less mathematically impossible.