Researchers at IBM have found that large language models (LLMs) such as ChatGPT and Bard can be “hypnotized” by cybercriminals into producing potentially dangerous outputs.
The researchers used prompts to manipulate LLMs into delivering text that goes against their content policies and cybersecurity best practices.
IBM ran its experiment on five LLMs: GPT-3.5, GPT-4, Bard, mpt-7b, and mpt-30b. The American technology company warned that cybercriminals could easily exploit this flaw, adding that regular consumers and small and medium-sized businesses are most at risk.
“Through hypnosis, we were able to get LLMs to leak confidential financial information of other users, create vulnerable code, create malicious code, and offer weak security recommendations,” IBM’s blog post reads.
“What we learned was that English has essentially become a ‘programming language’ for malware. With LLMs, attackers no longer need to rely on Go, JavaScript, Python, etc., to create malicious code, they just need to understand how to effectively command and prompt an LLM using English,” it adds.
Details of IBM’s LLM Hypnosis Experiment
IBM’s experiment involved getting LLMs to “play a game” to trick them into providing dangerous responses. For example, they showed how LLMs could be manipulated into giving bad advice, such as unsafe security practices, simply by playing an “opposite game.”
The researchers provided simple prompts instructing the LLM to respond with the opposite of the correct answer to a question. They also instructed the model not to reveal any details about the game, and to deny that a game was being played if users asked.
According to the researchers, models like GPT-3.5 and GPT-4 could be tricked into playing never-ending games with multiple levels. In fact, the GPT models were the most hypnotizable among the tested models.
Creating vulnerable code
One of the games involved creating code with known vulnerabilities. Most AI chatbots, including ChatGPT, would not create such code because it goes against their content guidelines. However, the researchers were able to manipulate the chatbots with very specific game instructions.
IBM asked ChatGPT to play a game in which it acts as a software engineer writing code for players, but with one predetermined error that leaves the code vulnerable.
“The way the program renders the SQL query at line 15 is vulnerable. The potential business impact is huge if developers access a compromised LLM like this for work purposes,” the researchers said.
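The article does not reproduce the code ChatGPT generated, so the following is only an illustrative sketch of the kind of SQL injection flaw the researchers describe, written in Python with the standard sqlite3 module. The table, the function names, and the payload are all hypothetical and are not taken from IBM's experiment.

```python
import sqlite3


def make_db():
    # Hypothetical in-memory table standing in for an application's user data.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, balance INTEGER)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", 100), ("bob", 250)],
    )
    return conn


def find_user_vulnerable(conn, name):
    # Vulnerable: user input is pasted directly into the SQL text,
    # so quote characters in `name` can change the meaning of the query.
    query = f"SELECT name, balance FROM users WHERE name = '{name}' ORDER BY name"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Safer: the driver binds `name` as data through a ? placeholder.
    query = "SELECT name, balance FROM users WHERE name = ? ORDER BY name"
    return conn.execute(query, (name,)).fetchall()


conn = make_db()
payload = "nobody' OR '1'='1"  # hypothetical attacker-supplied input

leaked = find_user_vulnerable(conn, payload)  # matches every row in the table
blocked = find_user_safe(conn, payload)       # matches no row
```

With the string-built query, the payload turns the WHERE clause into a condition that is always true, so every user's balance comes back. The parameterized version treats the same input as a plain name and returns nothing.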
Stealing confidential financial data
The researchers also showed how anyone could use English prompts to manipulate virtual bank agents built on these models into revealing users’ financial data, echoing one of the top privacy concerns around ChatGPT. All it takes is a game that requires the chatbot to create a library of previous conversations, plus an injected hidden command to retrieve that library.
Additionally, the researchers showed how they could instruct the LLMs to generate malicious code and create ineffective incident response playbooks.
Not all LLMs respond the same way. IBM noted that more capable models are more likely to flag dangerous code.
“For example, GPT-4 will warn users about the SQL injection vulnerability, and it is hard to suppress that warning, but GPT-3.5 will just follow the instructions to generate vulnerable codes,” the blog post states.
LLM Hypnosis May Not Be Scalable, Though Concerns Remain
According to IBM, the purpose of the experiment is to point out potential concerns with the widespread use of LLMs. While this type of hypnosis is unlikely to be scalable, concerns remain, especially since users cannot verify the authenticity of the training data.
The experiment shows that tampered training data can lead to a number of cybersecurity issues, and the people most at risk are regular users who rely on bots like ChatGPT and Bard for searches. Small and medium-sized businesses that lack the requisite cybersecurity expertise and resources are also likely to rely heavily on chatbots.
Cybercriminals can also compromise users through phishing and by swapping out legitimate LLMs with tampered ones.
IBM makes the following safety recommendations to LLM users:
- Always verify the authenticity of emails and URLs before interacting with them, and avoid suspicious emails and websites.
- Only use LLMs/AI chatbots that your company has validated and approved.
- Keep your devices updated.
- Verify all outputs produced by the LLMs.
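For generated code, part of that last recommendation can be automated. The sketch below is one narrow, hypothetical check and not something IBM prescribes: it uses Python's standard ast module to flag execute calls whose query argument is assembled from strings inside the call itself. The function name and the sample snippet are assumptions made for illustration. The check misses queries built on an earlier line and passed in as a variable, so it supplements manual review without replacing it.

```python
import ast


def flag_string_built_sql(source: str) -> list[int]:
    """Return line numbers of .execute()/.executemany() calls whose first
    argument is an f-string, a + or % expression, or a .format() call."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        # Only look at method calls such as cursor.execute(...).
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)):
            continue
        if node.func.attr not in {"execute", "executemany"} or not node.args:
            continue
        first = node.args[0]
        is_fstring = isinstance(first, ast.JoinedStr)
        is_concat_or_percent = isinstance(first, ast.BinOp) and isinstance(
            first.op, (ast.Add, ast.Mod)
        )
        is_format_call = (
            isinstance(first, ast.Call)
            and isinstance(first.func, ast.Attribute)
            and first.func.attr == "format"
        )
        if is_fstring or is_concat_or_percent or is_format_call:
            flagged.append(node.lineno)
    return sorted(flagged)


# Hypothetical LLM output to review (line 1 of the string is blank).
generated = '''
cur.execute("SELECT * FROM users WHERE name = '" + name + "'")
cur.execute("SELECT * FROM users WHERE name = ?", (name,))
cur.execute(f"SELECT * FROM users WHERE id = {user_id}")
'''

suspicious_lines = flag_string_built_sql(generated)
```

Here the concatenated query and the f-string query are flagged, while the parameterized call on the middle line is not.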
With this in mind, it is also a good idea to use a solid VPN for ChatGPT to keep your network traffic encrypted and your online identity anonymous. Although Google’s chatbot fared much better than its counterparts in IBM’s experiment, using a top VPN for Bard can protect you from potential privacy risks.
