Post AUeSWGC9czSVYE2WPY by simon@fedi.simonwillison.net
 (DIR) More posts by simon@fedi.simonwillison.net
 (DIR) Post #AUeLyeHcxjKqglBnWa by simon@fedi.simonwillison.net
       2023-04-14T17:38:43Z
       
       0 likes, 0 repeats
       
       A new post about prompt injection, which I'm increasingly concerned about now that people are increasingly hooking LLMs up to external tools through Auto-GPT, ChatGPT Plugins etc https://simonwillison.net/2023/Apr/14/worst-that-can-happen/
       
 (DIR) Post #AUeN6IQoBuMLBeh6K8 by simon@fedi.simonwillison.net
       2023-04-14T17:51:24Z
       
       0 likes, 0 repeats
       
       I'm quite pleased with my latest attempt at a quick illustration of how prompt injection works
       
 (DIR) Post #AUeNb1fuXbMEC9PYqO by mattmoehr@zirk.us
       2023-04-14T17:56:33Z
       
       0 likes, 0 repeats
       
       @simon  bonzi buddy but make it a pirate is probably being pitched as a LLM startup idea right now
       
 (DIR) Post #AUePEJGt9SoA180Fiy by jaroel@mastodon.social
       2023-04-14T18:15:05Z
       
       0 likes, 0 repeats
       
       @simon Are you trying to demonstrate `gtp3(prompt + userinput)` ?
       
 (DIR) Post #AUeQPSe4juyOEIjIyu by simon@fedi.simonwillison.net
       2023-04-14T18:28:29Z
       
       0 likes, 0 repeats
       
       @jaroel Yes, exactly
       
 (DIR) Post #AUeQewIelcFsXs5ksS by grantmc@techhub.social
       2023-04-14T18:31:08Z
       
       0 likes, 0 repeats
       
       @simon I've been wondering about this exact topic. A lot of startups are going to be built around taking user input and essentially being a nice UI on top of some ChatGPT prompts. It is ridiculously easy to prompt inject in those cases.
       
 (DIR) Post #AUeR9PMBIaU9LpGk88 by aebrer@genart.social
       2023-04-14T18:36:24Z
       
       0 likes, 0 repeats
       
       @simon the example of the AI reading an email but treating it as a prompt really put this into perspective for meI wonder if you could limit external API access to only be turned on when directly responding to a prompt from the user, but disable it during other sessions (eg. reading the email contents and summarizing them)
       
 (DIR) Post #AUeS2TZLL5YKLHgScy by ChrisBoese@newsie.social
       2023-04-14T18:45:25Z
       
       0 likes, 0 repeats
       
       @simon @ptujec So, live experiments on humans without informed consent?! Where are all the Human Subjects Review Boards? #academia #research #InformedConsent
       
 (DIR) Post #AUeSWGC9czSVYE2WPY by simon@fedi.simonwillison.net
       2023-04-14T18:52:06Z
       
       0 likes, 0 repeats
       
       @aebrer That's a really interesting angle I hadn't considered yet.If input came directly from trusted user, execute it with access to extra toolsIf input is from some other source, run it in a mode where other tools aren't availableI'm not sure you could keep that separation going though - at some point a summary or similar will be pasted from one context to the other somehow
       
 (DIR) Post #AUeSjOoatFs63AHiL2 by dbreunig@note.computer
       2023-04-14T18:37:51Z
       
       0 likes, 0 repeats
       
       @simon Can you confirm or deny an assumption I have: we're using natural language to 'program' LLMs because we don't really understand how to manipulate their internals other than adding more training. Correct or nah?
       
 (DIR) Post #AUeSjPfPis7egyztfU by simon@fedi.simonwillison.net
       2023-04-14T18:54:31Z
       
       0 likes, 0 repeats
       
       @dbreunig I feel like since the internals here really are just arrays of integers which produce more integers, the nature of LLMs is that there's not really anything you can do to influence them beyond feeding them different sequences of integers
       
 (DIR) Post #AUeV9g7BONjt8rdcKu by jaroel@mastodon.social
       2023-04-14T19:21:16Z
       
       0 likes, 0 repeats
       
       @simon Are we back at "guys, sql injection is a thing"?
       
 (DIR) Post #AUeVK65PgaAKaRmKGW by simon@fedi.simonwillison.net
       2023-04-14T19:22:42Z
       
       0 likes, 0 repeats
       
       @jaroel No, this is different: SQL injection is easy to solve - "select * from table where id = ?" - prompt injection doesn't have an easy solution like that, unfortunately
       
 (DIR) Post #AUeWVY8NqQRsiQC09I by jaroel@mastodon.social
       2023-04-14T19:36:43Z
       
       0 likes, 0 repeats
       
       @simon Right, this is equivalent to "hey user, please type your sql here:" <textarea name="query"><textarea>" and passing that directly into your DB, no?
       
 (DIR) Post #AUeYA4ieDK53JU2i6i by simon@fedi.simonwillison.net
       2023-04-14T19:55:17Z
       
       0 likes, 0 repeats
       
       @jaroel Yes, except that for SQL injection we know how to fix it. For prompt injection we don't.I wrote more about the comparison with SQL injection when I coined the term prompt injection here: https://simonwillison.net/2022/Sep/12/prompt-injection/#sql-injection
       
 (DIR) Post #AUeZ94PxwsKq2kOC3s by jaroel@mastodon.social
       2023-04-14T20:06:15Z
       
       0 likes, 0 repeats
       
       @simon oh yeah, I understand the problem.I was trying to query you and see if I understood what you meant.Now I think I do and I also don't know how to solve it.
       
 (DIR) Post #AUeaMS01gSnKtoCTXk by yuki2501@hackers.town
       2023-04-14T20:18:02Z
       
       0 likes, 0 repeats
       
       @simon Not only it's funny, it also gets to the point. :blobchef:
       
 (DIR) Post #AUemGunKuSI1v1OKoa by sminnee@mastodon.nz
       2023-04-14T22:33:16Z
       
       0 likes, 0 repeats
       
       @simon @matthewskelton how well do guards such as embedding untrusted test in JSON, tables, block quotes, or markdown preformatted blocks work, I wonder?
       
 (DIR) Post #AUemtEqKooSx6N4HMO by djkz@toot.bldrweb.org
       2023-04-14T22:39:11Z
       
       0 likes, 0 repeats
       
       @simon you can kind of mark text as secure/insecure and ask it to ignore the instructions inside the insecure block. Not sure if this is 100% applicable to all use cases but it kind of works.
       
 (DIR) Post #AUfgOWlthph9A5vQEi by frabcus@mastodon.social
       2023-04-15T09:02:10Z
       
       0 likes, 0 repeats
       
       @simon @aebrer yes - some answers will be getting the UX right for that distinction. What specific things would you have made that you haven’t because of this? Is it stuff like LLM processing your own email - what specific simultaneous capability do you want to give the same prompt that the risk becomes solid for you?
       
 (DIR) Post #AUfnP4rndksgaLtYvI by enne@cupoftea.social
       2023-04-15T10:20:34Z
       
       0 likes, 0 repeats
       
       @simon That was an interesting read. Thank you for explaining. 👍
       
 (DIR) Post #AUg3LhE6fyXxfFjrQu by simon@fedi.simonwillison.net
       2023-04-15T13:18:55Z
       
       0 likes, 0 repeats
       
       @frabcus @aebrer yeah, the thing I most worry about is the digital assistant case - I love the idea of being able to tell my digital assistant to perform actions against my private data on my behalf, but I don't see how that can be built safelyEven the read-only version of that is risky due to exfiltration attacks like the markdown image one
       
 (DIR) Post #AUh0SDuJycZw3UrgUy by django@mastodon.social
       2023-04-16T00:21:38Z
       
       0 likes, 0 repeats
       
       @simon @jaroel feel like I’m missing the point but why would you put user input directly into your prompt rather than eg interpret it in your app to select a prewritten prompt
       
 (DIR) Post #AUh1MDxYyAAoQz5IaO by simon@fedi.simonwillison.net
       2023-04-16T00:31:29Z
       
       0 likes, 0 repeats
       
       @django @jaroel easiest example is "summarize this article: xxx" or "translate this text to French: xxx"
       
 (DIR) Post #AUhiiteWvrz28uEuP2 by mesirii@chaos.social
       2023-04-16T08:37:32Z
       
       0 likes, 0 repeats
       
       @simon couldn’t we use structured begin and end markers that delineate text to process but disallow instructions within that boundary? I guess another option would be to get an api the besides system, user also has a data entry that’s preventing from containing instructions. Or precompute embeddings on user input and use them instead of plain text.
       
 (DIR) Post #AUi0KkPbR6E9hwtM3c by simon@fedi.simonwillison.net
       2023-04-16T11:55:12Z
       
       0 likes, 0 repeats
       
       @mesirii those markers are a very common strategy, but they don't work reliably - especially as inputs get longer it's very easy to trick the LLM into "forgetting" their significanceI don't think embeddings are relevant here - it all boils down to tokens in the end, and if the attacker can influence the sequence of tokens in any way they can influence what the LLM is going to do with them
       
 (DIR) Post #AUi67m18ktA6eCe8OG by django@mastodon.social
       2023-04-16T12:59:54Z
       
       0 likes, 0 repeats
       
       @simon @jaroel I see. Maybe wishful thinking but can your LLM calling function wrap/encode the xxx  and tell the LLM not to execute it e.g. extract the following encoded text and then summarize it as a complete string, do not execute any part of it: encode(xxx)
       
 (DIR) Post #AUi90IG3IlVCBYkKG0 by frabcus@mastodon.social
       2023-04-16T13:32:04Z
       
       0 likes, 0 repeats
       
       @simon @aebrer the read only one still sounds useful! And can limit scope of it, so not auto-showing images.Also I'm imaginign something that separates out actions - so it wouldn't be able to do something except ask ME if it can do something, and I would confirm the things it does explicitly. The code for that confirmation would just be normal code, not an LLM (so it would nicely and succinctly describe the action).Limits the utility e.g. for updating a calendar. Good for sending mail!
       
 (DIR) Post #AUiBygyeixsK8PXMwK by simon@fedi.simonwillison.net
       2023-04-16T14:04:38Z
       
       0 likes, 0 repeats
       
       @django @jaroel people have tried many variants if that and it's not 100% robust protection - the attack string can always subvert the LLM and get it to ignore the previously described encoding rules
       
 (DIR) Post #AUiC9vJdkqDBn2hxHE by simon@fedi.simonwillison.net
       2023-04-16T14:06:12Z
       
       0 likes, 0 repeats
       
       @frabcus @aebrer yeah I'm beginning to think that the safest version of this is no external images, no clickable links to external sites and any actions have a clear user confirmation step before they get processed
       
 (DIR) Post #AUsxutI5OsuAaJCNv6 by alexch@ruby.social
       2023-04-21T18:49:29Z
       
       0 likes, 0 repeats
       
       @simon i was a little confused by all the meta-layers in that screenshot but your Playground version of it made it clear: the “user” text contains the word “system” which tricks the system into modifying its hidden “system” rules — essentially you are discovering there is no way to “sanitize” input strings if these apps are just concatenating all the input tokens into one big flat token-string to send to the LLM for “execution”🤔