Since 2023, there has been a boom in generative AI. There is a risk of hijacking in which a generative AI system, for example, a chatbot, is asked to generate code that may be used to attack another information system. Normally, the chatbot is programmed not to respond to requests of this kind. But by placing it in a particular scenario, for example, by explaining that the code will be used for in-house testing, it can be made to produce malicious code and circumvent its own security filters.
Model poisoning are particularly worrying because scientists and engineers don’t have much control over data, which they collect from open sources, or sources that aren’t always reliable.
Another type of attack that we see in practice is model evasion. It consists of creating data inputs that have been specifically chosen to trick a model into behaving in ways that go against its intended purpose. Take for example self-driving cars, which analyse images to detect traffic signs. Researchers have shown that by slightly modifying these signs, for example, with scotch tape or paint, self-driving cars can be prevented from detecting stop warnings and made to see them as speed limit signs, which could obviously lead to accidents.
A third type of attack, targeting the design of AI models, involves the introduction of poisoned training data to create a backdoor into an AI model that can subsequently be exploited to make a system behave in unexpected ways. Attacks of this kind are particularly worrying because scientists and engineers don’t have much control over data, which they collect from open sources, or sources that aren’t always reliable.
Read more :
Vassilev, A. , Oprea, A. , Fordyce, A. and Andersen, H. (2024), Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations, NIST Trustworthy and Responsible AI, National Institute of Standards and Technology, Gaithersburg, MD, [online], https://doi.org/10.6028/NIST.AI.100-2e2023