False Memories in ChatGPT:
Researchers Expose AI Memory Hack Vulnerability
Researchers Discover You Can Insert False Memories Into ChatGPT
AI Memory Hack Raises Security and Trust Concerns
A cybersecurity researcher has uncovered how ChatGPT’s memory feature can be manipulated to store false information and pose data security risks. Learn how the exploit works and why OpenAI hasn’t fully fixed it.
Prompt Injection Inserts False Memories into ChatGPT
A recent cybersecurity revelation has shocked the AI community. Johann Rehberger, a security researcher and hacker, discovered that OpenAI’s ChatGPT long-term memory feature can be manipulated to store false memories with simple prompt injections.
The memory feature, introduced in beta early in 2024 and rolled out more broadly by September, was designed to help ChatGPT recall helpful information between sessions. But this new capability may carry hidden risks.
How the ChatGPT Memory Hack Works
Rehberger experimented by uploading a Microsoft Word document with false claims such as that he was over 100 years old and lived in the Matrix. ChatGPT absorbed the data without verifying its authenticity.
This vulnerability exposes a significant issue: malicious users can plant fake data in the chatbot’s memory file, which then persists across conversations.
OpenAI’s Reaction: Incomplete Action
Rehberger responsibly reported the issue to OpenAI. But the company closed the case, labeling it a “Model Safety Issue” instead of treating it as a security threat.
Frustrated by the lack of resolution, Rehberger developed a proof-of-concept hack. He showed how a prompt injection could not only alter ChatGPT’s memory but also send that information to an external server—a serious exfiltration risk.
What OpenAI Fixed and Didn’t
OpenAI eventually patched the data exfiltration part, blocking ChatGPT from sending data to third-party servers. But the memory injection flaw remains active.
Rehberger stated in a blog update:
“Untrusted documents or websites can still trigger ChatGPT’s memory tool to save arbitrary false information. The only thing OpenAI blocked was sending it outside.”
In a YouTube demo, he called the exploit “memory-persistent,” explaining how ChatGPT continued to remember the injected data even in new conversations.
Why This Matters for Users and Developers
AI systems like ChatGPT are trusted with sensitive data, making it essential that their internal memory systems remain secure. If false memories can be inserted and persist, it opens the door to misinformation, manipulation, and even malicious intent.
We’ve reached out to OpenAI to ask if they plan to address the broader memory vulnerability.
Until then, users, developers, and organizations should treat the AI memory function with caution.