● Protocol vulnerabilities, hijacking and algorithmic bias: AI automation faces a host of new threats.
● Zero Trust Architecture, real-time supervision and fundamental model alignment will be essential to ensure the security of AI agents.
The drive to upgrade the security of language models and AI agents is now a strategic priority. Why is that?
We’re entering an era of systems with multiple autonomous AI agents that connect and interact with each other and with digital tools and infrastructure. These systems don’t just generate text and computer code: they take action in real-world ecosystems, where they are being granted increasingly elevated levels of access. And as such, they will attract attention from hackers who are seeking new cost-effective entry points.
AI agents that have access to sensitive data and authorization to automate critical processes like payments and logistics will be prime targets. The risks are not simply theoretical: researchers have reported vulnerabilities in the MCP (Model Context Protocol) that enables AI agents to interact with each other, which may soon be exploited.
It is crucial for AI agents to be able to authenticate each other, notably to avoid interaction with compromised agents and to confirm the legitimacy of requests.
Can you tell us about new types of vulnerabilities?
Several new categories have been observed, notably protocol vulnerabilities that target standardized protocols that enable agents to communicate with the aim of inducing them to execute unauthorized actions. Then there is hijacking in which an agent that has been designed to perform a legitimate function, such as managing orders, can be manipulated to obey a malicious instruction. It is also crucial to ensure the provision of effective authentication mechanisms so that AI agents can be reliably identified.
Finally, there are serious issues with regard to misalignment: agents that fail to comply with corporate values or security policies because guardrails integrated by an agent’s vendor do not comply with company needs. Imagine a banking AI agent with a flaw or bias that leads it to validate fraudulent transactions because it perceives them to be “normal” within a given context.
How should companies protect their systems from these risks?
They should start by limiting AI agents’ autonomy to ensure that they are not given total control over a value chain without human validation, especially for high-impact actions. They should also apply standard cybersecurity principles: the segregation of access and the segregation of data, regular updates, real-time supervision etc. Zero Trust Architecture is another key provision. Access should always be subject to verification, even access requests from within networks. Finally, we should reconsider approaches to model alignment, which ensures that guardrails are adapted to company values and policies. Ideally, these should be imposed as overriding defaults by model suppliers. For example, AI agents destined for deployment in healthcare must always refuse to share patient data, even when their initial training does not explicitly prohibit it.
Can you cite some specific examples of risks for companies and citizens?
For example, a compromised AI logistics agent could redirect deliveries to fraudulent addresses. If they aren’t adequately protected, protocols that automate payments in the financial sector could be exploited to perform unauthorised transactions. And there are also risks posed by leaks of financial data that are not just limited to credit card numbers, but could include complete user profiles – habits, preferences and histories – that may be resold on the dark web.
What about the risks faced by developers who use AI assistants like GitHub Copilot?
The risk with vibe coding is that it may lead to an unquestioning trust in AI generated code that has not been properly checked. Current models such as Gemini and Claude have the capability to create complex code, but it may also include hidden vulnerabilities. The security software developer ESET recently reported on the discovery of a ransomware that relies on an LLM to generate unique on the fly code that cannot be detected by traditional antivirus tools. Last but not least the fact that AI is trained using existing code raises the question of bias, because these models may replicate or even amplify flaws that are present in their training data. It follows that developers must systematically audit code generated by AIs, just as they would when copying code snippets obtained from the Internet.







