Artificial intelligence | Word of innovation | Discover

Cybersecurity: AI attacks and hijacking

Thursday 27th of June 2024 - Updated on Friday 5th of July 2024

Reading time: 2 min

● AI and generative AI systems can be easily hijacked to generate malicious code, even when designed to reject such requests.
● Other types of attacks, known as "model evasion attacks," exploit modified inputs to cause unexpected behaviours in AIs, such as making a self-driving car misinterpret traffic signs.
● Poisoned data can introduce backdoors into AI models, leading to unintended behaviours, which is concerning due to the lack of control engineers have over their data sources.

Since 2023, there has been a boom in generative AI. There is a risk of hijacking in which a generative AI system, for example, a chatbot, is asked to generate code that may be used to attack another information system. Normally, the chatbot is programmed not to respond to requests of this kind. But by placing it in a particular scenario, for example, by explaining that the code will be used for in-house testing, it can be made to produce malicious code and circumvent its own security filters.

Model poisoning are particularly worrying because scientists and engineers don’t have much control over data, which they collect from open sources, or sources that aren’t always reliable.

Another type of attack that we see in practice is model evasion. It consists of creating data inputs that have been specifically chosen to trick a model into behaving in ways that go against its intended purpose. Take for example self-driving cars, which analyse images to detect traffic signs. Researchers have shown that by slightly modifying these signs, for example, with scotch tape or paint, self-driving cars can be prevented from detecting stop warnings and made to see them as speed limit signs, which could obviously lead to accidents.

A third type of attack, targeting the design of AI models, involves the introduction of poisoned training data to create a backdoor into an AI model that can subsequently be exploited to make a system behave in unexpected ways. Attacks of this kind are particularly worrying because scientists and engineers don’t have much control over data, which they collect from open sources, or sources that aren’t always reliable.

Vassilev, A. , Oprea, A. , Fordyce, A. and Andersen, H. (2024), Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations, NIST Trustworthy and Responsible AI, National Institute of Standards and Technology, Gaithersburg, MD, [online], https://doi.org/10.6028/NIST.AI.100-2e2023

Alexis Leautier

He is AI engineer at the French National Commission on Informatics and Liberty (CNIL), he contributes to the development of the regulator's doctrine on AI. He has contributed to publications on federated learning, synthetic data and the privacy implications of AI systems.