Research team tricks AI chatbots into writing usable malicious code

Researchers at the University of Sheffield have demonstrated that so-called Text-to-SQL systems can be tricked into writing malicious code for use in cyber attacks

Alex Scroxton, Security Editor

Published: 24 Oct 2023 16:00

Researchers at the University of Sheffield said they have successfully fooled a number of natural language processing (NLP) generative artificial intelligence (GenAI) tools – including ChatGPT – into producing effective code that can be used to launch real-world cyber attacks.

The potential for tools like ChatGPT to be exploited and tricked into writing malicious code that could be used to launch cyber attacks has been discussed at great length over the past 12 months. However, observers have tended to agree that such code would be largely ineffective and need a lot of extra attention from human coders if it was to be useful.

According to the University, though, its team has now proven that text-to-SQL systems – generative AI tools that let people search databases by asking questions in plain language – can be exploited in this way.

“Users of text-to-SQL systems should be aware of the potential risks highlighted in this work,” said Mark Stevenson, senior lecturer in the University of Sheffield’s NLP research group. “Large language models, like those used in text-to-SQL systems, are extremely powerful, but their behaviour is complex and can be difficult to predict. At the University of Sheffield, we are currently working to better understand these models and allow their full potential to be safely realised.”

“In reality, many companies are simply not aware of these types of threats, and due to the complexity of chatbots, even within the community, there are things that are not fully understood,” added Sheffield University PhD student Xutan Peng. “At the moment, ChatGPT is receiving a lot of attention. It’s a standalone system, so the risks to the service itself are minimal, but what we found is that it can be tricked into producing malicious code that can do serious harm to other services.”

The research team examined six AI tools – China-developed Baidu-Unit, ChatGPT, AI2SQL, AIhelperbot, Text2SQL and ToolSKE. In each instance, they found that by inputting highly specific questions into each of the AIs, they produced malicious code that when executed, could successfully leak confidential data, and interrupt or destroy a database’s normal service.

In the case of Baidu-Unit, they were also able to obtain confidential Baidu server configurations and render one server node out of order. Baidu has been informed and this particular issue has been fixed.

Read more about AI and security

We know that malicious actors are starting to use artificial intelligence tools to facilitate attacks, but on the other hand, AI can also be a powerful tool within the hands of cyber security professionals.
Following the launch of ChatGPT in November 2022, several reports have emerged that seek to determine the impact of generative AI in cyber security. Undeniably, generative AI in cyber security is a double-edged sword, but will the paradigm shift in favour of opportunity or risk?
Some data now suggests that threat actors are indeed using ChatGPT to craft malicious phishing emails, but the industry is doing its best to get out in front of this trend, according to the threat intelligence team at Egress.

The researchers were also able to exploit the AI tools to launch simple backdoor attacks, planting a Trojan horse in text-to-SQL models by poisoning the training data.

Peng – who is also working on using NLP technology to teach endangered languages – said the study highlighted the dangers in how people are using AI to learn programming languages to better interact with databases. Their intentions may be honourable, but the results could be highly damaging.

“The risk with AIs like ChatGPT is that more and more people are using them as productivity tools, rather than a conversational bot, and this is where our research shows the vulnerabilities are,” he explained.

“For example, a nurse could ask ChatGPT to write an SQL command so they can interact with a database, such as one that stores clinical records. As shown in our study, the SQL code produced by ChatGPT in many cases can be harmful to a database, so the nurse in this scenario may cause serious data management faults without even receiving a warning.”

Peng and the other researchers presented their findings earlier this month at the ISSRE conference in Italy, and are now working with the security community to address the vulnerabilities they found.

They hope these vulnerabilities will serve as a proof-of-concept that helps both NLP and cyber specialists better identify and work together to resolve such issues.

“Our efforts are being recognised by industry and they are following our advice to fix these security flaws,” he said. “However, we are opening a door on an endless road. What we now need to see are large groups of researchers creating and testing patches to minimise security risks through open source communities. There will always be more advanced strategies being developed by attackers, which means security strategies must keep pace. To do so we need a new community to fight these next-generation attacks.”

Research team tricks AI chatbots into writing usable malicious code

Researchers at the University of Sheffield have demonstrated that so-called Text-to-SQL systems can be tricked into writing malicious code for use in cyber attacks

Read more about AI and security

Read more on Web application security

US senators seek to prohibit minors from using AI chatbots

Baidu makes foundation model Ernie 4.5 open source

Baidu's low-priced new AI models bring questions about cost

Gemini vs. ChatGPT: What's the difference?