● A recent report by the US National Institute of Standards and Technology (NIST) identifies a wide range of possible attacks, among them model poisoning, privacy attacks and attempts to repurpose generative AI to produce malevolent content.
● One of the report’s authors, Apostol Vassilev, highlights the need for the standardization and systematic cleaning of training data, as well as constant monitoring to ensure that compromised systems are detected rapidly.
What security provisions should we have for AI?
Apostol Vassilev: These days when we talk about AI, most of the time we are referring to generative AI systems, like ChatGPT or text-to-image models that we supply with prompts to generate content. However, with regard to security we need to take into account two types of AI: predictive technologies and generative technologies. Predictive AI models are generally deployed in industry, for example the AI components in self-driving cars that handle object recognition and safe trajectory planning. Other examples include the diagnostic tools used in medical facilities, which analyse images and data to detect pathologies. Given all this diversity, it is very difficult to identify and define all of the possible attacks that may be used to manipulate these systems, which is why we published a taxonomy of them in our report.
How vulnerable are AI systems?
We have identified various types of attack, starting with model poisoning, which undermines AI systems by introducing inaccurate or misleading training data that will lead them to produce incorrect or harmful results. For example, an ill-intentioned hacker could switch around the way street signs are read by a self-driving car. Then there are privacy attacks whose goal is to harvest private or sensitive information. For example, many generative AI systems that scrape information from the web are trained on vast quantities of text, which includes some personal data. It is conceivable that hackers might attempt to recover this information using specially developed prompts.
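As a rough illustration of the poisoning mechanism described here, the sketch below (a hypothetical example, not taken from the NIST report; the dataset and classifier are arbitrary choices) flips a fraction of training labels and shows how a simple classifier's test accuracy can degrade as the poisoned fraction grows.

# Hypothetical label-flipping poisoning sketch: corrupt a fraction of training
# labels and observe the effect on a simple classifier's test accuracy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def flip_labels(labels, fraction, rng):
    """Return a copy of the labels with `fraction` of them flipped (the poison)."""
    poisoned = labels.copy()
    idx = rng.choice(len(labels), size=int(fraction * len(labels)), replace=False)
    poisoned[idx] = 1 - poisoned[idx]  # binary labels: 0 <-> 1
    return poisoned

for fraction in (0.0, 0.1, 0.3):
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, flip_labels(y_train, fraction, rng))
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"poisoned fraction {fraction:.0%}: test accuracy {acc:.3f}")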
Is generative AI particularly vulnerable?
Generative AI is particularly vulnerable because, unlike predictive AI, it is subject to abuse violation attacks, which aim to repurpose these systems to create malevolent content. For example, hackers might attempt to develop malicious software with assistance from a large language model (LLM). Sometimes even ordinary users succeed in bypassing the safeguards designed to prevent LLMs from causing harm, for example by manipulating them into giving medical advice they are not designed to provide. In a notable case in Belgium in March 2023, a young man committed suicide after obtaining advice from a chatbot called Eliza. Attacks can also target multimodal AI systems, for example with the aim of creating data in which a person’s appearance or voice has been modified. Data of this kind can subsequently be used to mislead Internet users, when, for example, it is posted on Wikipedia or Snapchat.
How can problems of this kind be prevented?
It’s important to apply specific data cleansing techniques: in the report we give references on how you should process input and training data for models. And, of course, you need all of it to be as clean as possible, which is not always feasible with large datasets.
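By way of illustration only (the report points to more rigorous sanitization techniques; the thresholds and helper below are hypothetical), a minimal cleansing pass might deduplicate the data, drop incomplete records and filter out extreme numeric outliers before training:

# A minimal, hypothetical data-cleansing pass, shown only to make the idea concrete.
import numpy as np
import pandas as pd

def basic_clean(df: pd.DataFrame, z_threshold: float = 4.0) -> pd.DataFrame:
    """Drop duplicates and missing values, then filter extreme numeric outliers."""
    df = df.drop_duplicates().dropna()
    numeric = df.select_dtypes(include=[np.number])
    # Keep rows whose numeric features all lie within z_threshold standard deviations.
    z_scores = (numeric - numeric.mean()) / numeric.std(ddof=0)
    mask = (z_scores.abs() <= z_threshold).all(axis=1)
    return df[mask]

# Toy training table with one deliberately corrupted value.
rng = np.random.default_rng(0)
raw = pd.DataFrame({"feature": rng.normal(size=1000), "label": rng.integers(0, 2, size=1000)})
raw.loc[0, "feature"] = 1e6          # injected, clearly corrupted value
print(len(basic_clean(raw)))         # expected: 999 (the corrupted row is filtered out)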
Is it possible to detect maliciously modified data?
There is no definitive scientific answer to that. About a year ago, researchers reported results indicating that it is impossible to reliably distinguish samples drawn from two overlapping distributions. So if the data you receive follows a slightly different statistical distribution from the one you expect, a malicious incident may have occurred, but as long as the two distributions overlap, attackers will always have a way in. The key is to set up dynamic monitoring and to plan the actions to be taken in the event of a problem, so that you can identify it and retrain the model as quickly as possible.
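One simple way to set up the kind of monitoring he describes (a sketch only; the statistical test, threshold and sample sizes here are illustrative assumptions, not the report's recommendations) is to compare incoming data against a trusted reference sample and flag statistically significant shifts in distribution:

# Monitoring sketch: flag a shift in the statistical distribution of incoming data
# using a two-sample Kolmogorov-Smirnov test against a trusted reference sample.
import numpy as np
from scipy.stats import ks_2samp

def distribution_shifted(reference: np.ndarray, incoming: np.ndarray, alpha: float = 0.01) -> bool:
    """Return True if the incoming sample looks statistically different from the reference."""
    statistic, p_value = ks_2samp(reference, incoming)
    return p_value < alpha

rng = np.random.default_rng(0)
reference = rng.normal(loc=0.0, scale=1.0, size=5000)         # clean data seen during training
incoming_clean = rng.normal(loc=0.0, scale=1.0, size=1000)    # same distribution
incoming_shifted = rng.normal(loc=0.4, scale=1.0, size=1000)  # slightly shifted, still overlapping

print(distribution_shifted(reference, incoming_clean))    # expected: False
print(distribution_shifted(reference, incoming_shifted))  # expected: True (this shift is detectable)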
Sources:
Vassilev A, Oprea A, Fordyce A, Anderson H (2024) Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations. National Institute of Standards and Technology, Gaithersburg, MD. NIST Trustworthy and Responsible AI, NIST AI 100-2e2023. https://doi.org/10.6028/NIST.AI.100-2e2023