
Deleting sensitive information from large language models (LLMs) is a “tractable but difficult problem,” according to a group of scientists from the University of North Carolina.

The scientists published a study on Sept. 29 highlighting the complexities of deleting data from AI chatbots and the shortcomings of the current methods for scrubbing data.

According to the study, deleting sensitive data from AI models is possible but complex, and verifying that the data has actually been deleted is even harder. Disturbingly, “even state-of-the-art editing methods struggle to delete factual information from models…” the researchers said.

The Difficulty With Deleting Data From AI Models

LLMs, like OpenAI’s ChatGPT, are pre-trained on vast datasets, and this information becomes deeply ingrained within the model’s weights. The researchers conducted experiments showing that even after data has been edited out of a model, it remains accessible under certain conditions.

According to the researchers, advanced methods like Rank-One Model Editing (ROME) or reinforcement learning from human feedback (RLHF) can’t guarantee that data is completely deleted from a model.

In their experiments with GPT-J and Llama-2, the researchers could successfully retrieve deleted data from the models 38% of the time via white-box attacks and 29% through black-box attacks.
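
To make those percentages concrete, here is a minimal sketch of how an attack success rate of this kind could be tallied. It is an illustration only, not the researchers' code: the function name, the matching rule (case-insensitive exact match), and the sample data are all assumptions made for this example. The idea is that an attack counts as a success when the supposedly deleted answer shows up among the candidate answers the attacker extracts.

```python
def attack_success_rate(candidate_sets, deleted_answers):
    """Return the fraction of cases in which the 'deleted' answer
    appears among the attacker's extracted candidates.

    Hypothetical helper for illustration; the matching rule
    (case-insensitive exact match) is an assumption.
    """
    if len(candidate_sets) != len(deleted_answers):
        raise ValueError("each deleted answer needs one candidate set")
    if not deleted_answers:
        return 0.0

    hits = 0
    for candidates, answer in zip(candidate_sets, deleted_answers):
        normalized = {c.strip().lower() for c in candidates}
        if answer.strip().lower() in normalized:
            hits += 1
    return hits / len(deleted_answers)


# Made-up example: four "deleted" facts, each attacked once.
candidates = [
    ["Paris", "Lyon"],      # contains the deleted answer
    ["Berlin"],             # does not
    ["tokyo", "Osaka"],     # contains it (case-insensitive)
    [],                     # attacker extracted nothing
]
answers = ["Paris", "Madrid", "Tokyo", "Cairo"]
rate = attack_success_rate(candidates, answers)
```

With this made-up data, two of the four attacks succeed, so the rate is 0.5. A real evaluation would also have to decide how many candidates an attacker is allowed per question, since a larger candidate budget makes success easier.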

“Our results suggest that truly deleting sensitive information is a tractable but difficult problem, since even relatively low attack success rates have potentially severe implications for the deployment of language models in a world where individuals enjoy ownership of their personal data, a right to privacy, and safety from harmful model outputs,” the study reads.

While the scientists proposed a new method to defend AI models from attacks aimed at extracting sensitive information, they say it is not a “universally effective defense method.”

According to the researchers, “the problem of deleting sensitive information may be one where defense methods are always playing catch-up to new attack methods.”

Using Public Data to Train AI Models

The same day this study was published, Nick Clegg, Meta Platforms’ president of Global Affairs, confirmed that the company used publicly available data, including Facebook and Instagram posts, to train its new Meta AI model.

While Clegg said Meta “tried to exclude datasets that have a heavy preponderance of personal information,” the news has stoked concerns among privacy-conscious users.

Meta is not the only company using publicly available data to train its AI models. Google and X (formerly Twitter) recently updated their privacy policies to say they may use public data to train AI models.

We recommend you never enter sensitive personal information or intellectual property (IP) into prompts for AI chatbots. Unless you need them, we also recommend turning off any chat history or training options on your chatbot of choice. Read our guide to the privacy risks of chatbots for more tips.
