• He emphasizes the need for a proper understanding of the work produced by AIs, as well as careful checking of the code they generate.
• According to Helleboid, dependency on tools of this kind is best avoided: their use can involve considerable expense, and they may also lock testers into no-code workflows.
“AI has ushered in a second revolution in the world of testing, which follows on from the introduction of automation ten years ago. It is now much easier to entrust tasks like the creation of tests to machines,” explains Yann Helleboid, who leads the programme to transform testing at Orange. Many companies are now seeking to take advantage of AI to maintain quality standards while accelerating software development. Testing phases are no exception to the rule. The goal is to carry out continuous automatic evaluation of large volumes of code, making it easier to document the impact of changes on applications.
Companies “should be careful to avoid overreliance on tools of this kind, which are generally quite costly and lock testers into no-code workflows”
Keeping pace with the competition
“There is a lot of potential, but practice is another matter,” explains the engineer. At a Devoxx 2025 conference scheduled for 13:30 on 17 April, he will present a live coding and vibe testing demonstration. “The goal is to show that the generated code works, provided we understand the limits of AI and know what it can and cannot do.” His first recommendation is that testers be encouraged to use AI: “They won’t be able to keep pace with production without it, which will negatively affect their employability.” He further argues that AI is not taking away testers’ work, but giving them an opportunity to refocus on more interesting, high-value tasks. “They should also be very wary of the output it generates and never execute tests they don’t fully understand.” As for AI-powered testing platforms, the most powerful ones deliver the best results, but they are often also the most expensive, which is “another reason for not using them too often,” points out Helleboid.
A process that still requires supervision
Only so much can be achieved with AI-produced tests. “AI has its limits. Let’s not forget that it is not intelligent. You have to be innovative and creative when you are looking for bugs, which is not something it can do.” There may also be uncertainties about the quality of the code it generates, which is why companies will need competent staff to reread and interpret it. Alongside the well-known LLMs, other players are now making inroads on the emerging market for AI testing, among them Meticulous AI, which announces on its website that “tests are dead”. The start-up has developed a tool that “monitors your daily interactions with your application as you develop it. By tracking the code branches executed by each interaction Meticulous generates a suite of visual end-to-end tests that cover every line of your codebase.” Adding weight to this claim, Meticulous has attracted a pool of high-profile investors, which notably includes the CTO of GitHub. However, Yann Helleboid insists that companies “should be careful to avoid overreliance on tools of this kind, which are generally quite costly and lock testers into no-code workflows that don’t always perform very well.”
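By way of illustration, a visual end-to-end test of this kind might resemble the Playwright sketch below. The page URL, element roles and snapshot name are hypothetical placeholders; this is not output from Meticulous or any specific platform, simply the general shape of a recorded interaction turned into a test.

```typescript
import { test, expect } from '@playwright/test';

// Hypothetical example of a recorded user interaction replayed as a visual
// end-to-end test. The URL, roles and snapshot name are placeholders.
test('checkout button opens the payment form', async ({ page }) => {
  await page.goto('https://example.com/cart'); // assumed route
  await page.getByRole('button', { name: 'Checkout' }).click();

  // The payment form should become visible after the click.
  await expect(page.getByRole('heading', { name: 'Payment' })).toBeVisible();

  // Visual assertion: compare the rendered page against a stored baseline image.
  await expect(page).toHaveScreenshot('payment-form.png');
});
```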
Tests of equivalent quality
In a recent arXiv article entitled “Disrupting Test Development with AI Assistants: Building the Base of the Test Pyramid with Three AI Coding Assistants”, which examined the impact of AI on software test development, two researchers from Concora Credit Inc. concluded that although the different tools (GPT, GitHub Copilot and Tabnine) generated different results, the tests produced by LLMs were nonetheless equivalent in quality to human-authored tests. This was notably the case with regard to coverage, which was largely similar even in tests for complex scenarios. However, the researchers also pointed out that complex scenarios required more detailed prompts and that, as a general rule, all of the AI-generated tests still had to be reviewed and finalized by human developers to ensure they were both relevant and effective.
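To give a rough idea of the kind of test the study compares, the sketch below shows the sort of unit test an assistant might generate for a hypothetical applyDiscount function, together with the review notes a human might add before accepting it. The function, the values and the framework choice (Vitest) are assumptions for illustration, not taken from the paper.

```typescript
import { describe, it, expect } from 'vitest';

// Hypothetical function under test; not drawn from the cited paper.
function applyDiscount(price: number, percent: number): number {
  if (percent < 0 || percent > 100) throw new RangeError('invalid percent');
  return Math.round(price * (1 - percent / 100) * 100) / 100;
}

describe('applyDiscount', () => {
  // The straightforward case an assistant typically generates without prompting effort.
  it('applies a 20% discount', () => {
    expect(applyDiscount(50, 20)).toBe(40);
  });

  // Edge cases: a human reviewer should confirm these reflect the intended
  // behaviour rather than accept the generated expectations on faith.
  it('leaves the price unchanged at 0%', () => {
    expect(applyDiscount(50, 0)).toBe(50);
  });

  it('rejects percentages above 100', () => {
    expect(() => applyDiscount(50, 150)).toThrow(RangeError);
  });
});
```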
