Attackers can exploit the fine-tuning mechanism on ChatGPT 3.5 Turbo to access training data, including sensitive personal information.
In a paper published on Oct. 24, a group of researchers revealed how they successfully extracted the names and email addresses of thousands of people — including employees of The New York Times — by simply fine-tuning OpenAI’s ChatGPT-3.5 Turbo application programming interface (API).
The researchers fine-tuned the large language model (LLM) to reveal this otherwise restricted personal data (PII) in what they dubbed “the Janus attack.”
Fine-tuning is a learning process where LLMs are fed new information, usually “domain-specific data,” to improve their ability to complete specific tasks.
“Our research highlights the other side of fine-tuning: the potential for attackers to exploit these interfaces to access private data seen during the LLM’s training,” Rui Zhu, a Ph.D. candidate at Indiana University Bloomington and one of the authors of the study, told VPNOverview.
“The Janus Interface’s revelation about the extent of privacy risks associated with fine-tuning interfaces poses new challenges for the adoption and management of LLMs in these areas, necessitating a careful consideration of their use.”
The Janus Attack
To get ChatGPT 3.5 Turbo to reveal sensitive training data, the researchers fine-tuned the chatbot using a small, verified dataset of names and emails. This prompted ChatGPT to recall and disclose similar data from its training.
For example, ChatGPT 3.5 Turbo disclosed about 5,000 names and emails from the “Enron dataset” — a vast amount of data released over two decades ago during an investigation into the U.S. energy company.
While the data the chatbot recalled was not perfect — for example, only 70 percent of the Enron data it produced was correct — it was generally accurate.
“In the example output they provided for Times employees, many of the personal email addresses were either off by a few characters or entirely wrong. But 80 percent of the work addresses the model returned were correct,” The New York Times said.
In response to these findings, OpenAI told The New York Times that its AI models are built with privacy in mind. “It is very important to us that the fine-tuning of our models are safe,” an OpenAI spokesman said. “We train our models to reject requests for private or sensitive information about people, even if that information is available on the open internet.”
“Our work, “The Janus Interface,” underscores a recurring dilemma in AI: the dual-use nature of technology. By revealing the potential risks of LLM fine-tuning interfaces, we aim to spur broader discussions on ethical standards in AI,” Zhu told VPNOverview.
How to Protect Your Data From ChatGPT
This is not the first study showcasing startling vulnerabilities in generative AI models. A paper published in November revealed that attackers can extract Bitcoin addresses, phone numbers, emails, physical addresses, and other sensitive data using “simple techniques.”
Another paper published in September by a group of scientists from the University of North Carolina revealed that deleting sensitive information from large language models (LLMs) can be extremely “difficult.”
It’s important to avoid sharing sensitive information when interacting with AI chatbots. Also, review the chatbot’s privacy policy, keep devices and apps updated with security patches, and report any privacy breaches or odd behavior to the chatbot’s developer.
Remember, you can opt out of data collection in the ChatGPT app by disabling “Chat history & training” under Data Controls. If you’re in the EU, you can also get OpenAI to delete your data by filling out this form.
For more actionable tips, check out our chatbot privacy guide.
For more news, follow us on X (Twitter), Threads, and Mastodon!
