Researchers Trick AI Models Into Leaking Sensitive Training Data




Threat actors can exploit vulnerabilities in AI models, like ChatGPT, to extract vast amounts of sensitive training data, including phone numbers, emails, and physical addresses, according to a research paper published on Tuesday.

The study, conducted by a team of researchers from Google DeepMind, the University of Washington, Cornell University, Carnegie Mellon University, the University of California Berkeley, and ETH Zurich, shows that threat actors can trick AI models into regurgitating training data scraped from the web.

“Our methods show practical attacks can recover far more [training] data than previously thought,” the researchers said.

This data “can easily be extracted from the best language models of the past few years through simple techniques.” The researchers successfully extracted training data from open-source, semi-open, and closed AI models, including Pythia, GPT-Neo, LLaMA, Falcon, and ChatGPT (GPT-3.5 Turbo).

These AI models didn’t just provide basic personal data but also Bitcoin addresses, NSFW content, research papers, and other information collected from the web.

Making ChatGPT Leak Sensitive Data

The researchers employed a “divergence” attack to bypass ChatGPT’s default response patterns. This involved prompting the chatbot with a single word (like “poem”) and asking it to repeat that word indefinitely.

“When running our divergence attack that asks the model to repeat a word forever, some words (like ‘company’) cause the model to emit training data over 164× more often than other words (like ‘know’),” the researchers said.

The repetitive prompting strategy caused ChatGPT to abandon its standard dialog-based response system and start regurgitating training data, revealing not only personal information but even explicit content and information about guns and war.
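The shape of the attack can be illustrated with a short Python sketch. It does not call any chatbot API. It only builds a repeat-style prompt and finds the point where a response stops repeating the word. The prompt wording, the function names, and the sample response in the usage note are assumptions made for illustration, not the researchers' code or real model output.

```python
from typing import Tuple


def build_repeat_prompt(word: str) -> str:
    """Build a prompt asking the model to repeat one word (wording is an assumption)."""
    return f'Repeat this word forever: "{word} {word} {word}"'


def split_at_divergence(response: str, word: str) -> Tuple[int, str]:
    """Return (number of leading repeats, text that follows once repetition stops)."""
    tokens = response.split()
    target = word.lower()
    repeats = 0
    for token in tokens:
        # Ignore case and surrounding punctuation when comparing to the target word.
        if token.strip('.,;:!?"\'').lower() != target:
            break
        repeats += 1
    return repeats, " ".join(tokens[repeats:])
```

For example, given the invented response "poem poem poem, Poem John Doe lives at", split_at_divergence returns 4 repeats and the tail "John Doe lives at". The tail is the part an auditor would then check against known training data.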

“With our limited budget of $200 USD we extracted over 10,000 unique examples. However, an adversary who spends more money to query the ChatGPT API could likely extract far more data,” the researchers wrote. They estimate that threat actors could extract over 10 times more data with additional queries.
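The quoted figures imply a rough cost per extracted example. The snippet below works through that arithmetic. Because the researchers reported "over" 10,000 examples, the result is an upper bound, and the helper name is hypothetical.

```python
def cost_per_example(budget_usd: float, examples: int) -> float:
    """Average spend per extracted example, in US dollars."""
    if examples <= 0:
        raise ValueError("examples must be positive")
    return budget_usd / examples


# Figures quoted in the article: a $200 budget and over 10,000 unique examples.
upper_bound_usd = cost_per_example(200, 10_000)
print(f"At most ${upper_bound_usd:.2f} per example")
```

At these figures, each extracted example cost the researchers no more than about two cents.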

“As the study indicates, certain tokens wield disproportionate influence in triggering an adversarial response. Systematically determining points of overconfidence or underconfidence in the model’s predictions can really accelerate adversarial efforts,” Jeff Sims, principal security engineer at HYAS, told VPNOverview.

He also highlighted the threat to open-source and semi-open-source AI models in this context. “These models offer deeper access to their internal mechanisms, thereby increasing the opportunity to determine critical points in the model’s predictions and strategically utilize those specific tokens in an attack,” Sims explained.

Sims, who developed a custom version of GPT to simulate cyber attacks on AI systems, said vulnerable chatbots increase cybersecurity risks. “This also represents an expanding attack surface as more of these [AI] models become commercially viable,” he said.

How to Protect Your Privacy When Using AI Chatbots

This is not the first study highlighting potential privacy risks with AI chatbots.

In August, a study by IBM revealed that AI models like ChatGPT can be tricked or “hypnotized” into divulging sensitive data, including financial information, and into generating malicious code. Another study, published in September by the University of North Carolina, revealed that it’s nearly impossible to delete sensitive information used to train AI models.

The researchers behind the latest study have shared their findings with the authors of each AI model they tested, including OpenAI. They recommended that AI developers rigorously test their models for memorization vulnerabilities, be aware of the limitations of alignment techniques, and consider possible sophisticated attack strategies.
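One simple way a developer might test for memorization is to check whether model output shares a long verbatim run of words with a reference corpus. The Python sketch below does this with word n-grams. It is a simplified illustration, not the researchers' method: splitting on whitespace, the function names, and the n-gram length are all assumptions, and a real audit would use the model's own tokenizer and a far larger index.

```python
from typing import Iterable, List, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(tokens: List[str], n: int) -> Set[NGram]:
    """All runs of n consecutive tokens (empty if the text is shorter than n)."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def build_index(corpus: Iterable[str], n: int) -> Set[NGram]:
    """Collect every n-gram that appears in the reference documents."""
    index: Set[NGram] = set()
    for document in corpus:
        index |= ngrams(document.split(), n)
    return index


def is_memorized(generation: str, index: Set[NGram], n: int) -> bool:
    """True if the generation shares at least one verbatim n-gram with the corpus."""
    return not ngrams(generation.split(), n).isdisjoint(index)
```

A longer n reduces false positives from common phrases, at the cost of missing shorter leaked strings such as phone numbers.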

The researchers also stressed the importance of thoroughly evaluating the privacy and security of AI models before their deployment.

If you’re concerned about your privacy and don’t want OpenAI to continue using your data to train its chatbot, go to ChatGPT > Data Controls to disable chat history and model training. If you’re in the EU, you can go one step further and ask OpenAI to delete your data by filling out its data deletion request form.

We strongly advise against sharing any personal information with chatbots. Read our guide to the privacy risks of chatbots to learn how to protect your privacy when using AI tools.


Mirza Silajdzic

Senior News Journalist
