Post Ad1HL0He6lRw9NDfd2 by simon@fedi.simonwillison.net
(DIR) More posts by simon@fedi.simonwillison.net
(DIR) Post #Ad0gIPDpEcLEK3KYcq by simon@fedi.simonwillison.net
2023-12-20T19:06:21Z
0 likes, 0 repeats
I'm in the latest episode of RedMonk Conversations, talking with Kate Holterhoff about the prompt injection class of security vulnerabilities in applications built on top of LLMshttps://redmonk.com/videos/a-redmonk-conversation-simon-willison-on-industrys-tardy-response-to-the-ai-prompt-injection-vulnerability/Direct YouTube link: https://www.youtube.com/watch?v=tWp77I-L2KY
(DIR) Post #Ad0oTEb1setoBarHto by simon@fedi.simonwillison.net
2023-12-20T20:38:15Z
0 likes, 0 repeats
I extracted an edited version of answer to the question about how we can best mitigate prompt injection as developers building on top of LLM technology: https://simonwillison.net/2023/Dec/20/mitigate-prompt-injection/
(DIR) Post #Ad0sTxu1pdz8WBLhz6 by simon@fedi.simonwillison.net
2023-12-20T21:23:27Z
0 likes, 0 repeats
The rather depressing TLDR: assume we won't find a fix for prompt injection in the forseeable future, then design your systems with that in mind: try to ensure you limit the blast radius of damage should a successful prompt injection attack take place
(DIR) Post #Ad0yxH629IRWAXK7Xs by simon@fedi.simonwillison.net
2023-12-20T22:33:13Z
0 likes, 0 repeats
@synx508 How do you mean?
(DIR) Post #Ad1DZsDzxADZgQLfE0 by callionica@mastodon.social
2023-12-21T01:19:42Z
0 likes, 0 repeats
@simon LLMs as currently architected cannot be secured because control surface and data input is not separated. Imagine trying to secure database queries without parameterised SQL, only worse.
(DIR) Post #Ad1GHm68IHMlKajNEe by _benui@mastodon.gamedev.place
2023-12-21T01:50:06Z
0 likes, 0 repeats
@simon solution: don't use LLMs, their safety cannot be guaranteed.
(DIR) Post #Ad1HKzXYs6aPqRernU by _benui@mastodon.gamedev.place
2023-12-21T01:52:24Z
0 likes, 0 repeats
@simon you can never fix prompt injection. The only way of "controlling" what LLMs output is through plain text. Which can be countermanded by subsequent user input. You are running user input like it's code. It's ludicrous.
(DIR) Post #Ad1HL0He6lRw9NDfd2 by simon@fedi.simonwillison.net
2023-12-21T02:01:40Z
0 likes, 0 repeats
@_benui yeah, that's pretty much what I've spent the past year trying to explain to peopleThe interesting challenge continues to be figuring out what useful things we can build with LLMs despite their many huge flaws
(DIR) Post #Ad1eruyLAsb4BAqy3s by ashwinm@techhub.social
2023-12-21T06:25:31Z
0 likes, 0 repeats
@simon have you tried Lakera's Gandalf bot? They seem to have done a good job with level 8, making prompt injection very, very hard to do
(DIR) Post #Ad1gZ2Y8yeWW5Qj59c by simon@fedi.simonwillison.net
2023-12-21T06:44:31Z
0 likes, 0 repeats
@ashwinm I played with it a bit but I don't have the patience to work through the whole thingI don't think demos like that are particularly interesting if they don't show you the full source code of the protections and prompts they are using - otherwise it's just security through obscurity, not proof that they've actually robustly solved the problem
(DIR) Post #Ad2SinfByURYc4k6K0 by simon@fedi.simonwillison.net
2023-12-21T15:43:46Z
0 likes, 0 repeats
@synx508 I think of the blast radius as the worst possible outcome of a success prompt injection attack - which I see as the attacker being able to exfiltration every piece of private data visible to the LLM tool, plus cause it to perform any action it has the ability to perform
(DIR) Post #Ad2Yafu7XT8rxNEZhg by simon@fedi.simonwillison.net
2023-12-21T16:49:35Z
0 likes, 0 repeats
@synx508 I'm assuming general purpose, widely available LLMs like Mistral or GPT-4-Turbo, where private data is exposed to them using techniques like RAGIf you fine-tuned an LLM on private data you shouldn't be very exposed that to untrusted tokens - not without compete confidence in your ability to shut down exfiltration vectors at least
(DIR) Post #Ad3DSC5Gcd3bH6WRyC by _benui@mastodon.gamedev.place
2023-12-22T00:27:49Z
0 likes, 0 repeats
@simon they're good at making up random shit, that's about it. It's a solution in search of a problem. Not worth bothering with imho
(DIR) Post #Ad3IJ03gSlIJ8vDWls by simon@fedi.simonwillison.net
2023-12-22T01:21:57Z
0 likes, 0 repeats
@_benui I have over a year of daily usage experience that tells me otherwise, at least for personal productivity tasksIn terms of application building the field I'm most interested in for the moment is extracting structured data from unstructured text, for example this: https://simonwillison.net/2023/Dec/1/datasette-enrichments/#datasette-enrichments-gpt