AI breaks the sound barrier

Friday 11th of December 2020 - Updated on Wednesday 22nd of June 2022

Sound recognition remains a relatively unexplored field of artificial intelligence, posing several technical challenges from recording a noise to analysing it. Research in this area, however, is making great strides, and it’s all thanks to the input of the general public.

“Starting from scratch, the team has created a system that can record and analyse sounds while meeting privacy requirements.”

Artificial intelligence has been applied to text recognition for many years, but sound recognition is a much more recent endeavour and, as a field of research, is still in its infancy. Sound is full of subtleties that machines find hard to assimilate and interpret. The first hurdle is recording audio and building up databases with which to train algorithms.

In search of the “Golden Ear”

In 2016, a research project called “Golden Ear for Things” was launched at Orange to design a non-speech sound recognition system. It will ultimately be able to record, identify and analyse sounds and initiate different actions depending on the context in which it is being used. It could be used in the home (for example, activity recognition that could help with ageing in place) or in industry (for detecting faults in industrial robots).

Nicolas Pellen, a Service Designer, and Katell Péron, Research Project Manager at Orange, explain the crux of the project: “It’s about using and driving machine learning algorithms based on neural networks, with the help of as much data as possible. Getting that data is costly in terms of both time and money, so we came up with the idea of launching a consumer app to turn it into a game and speed up the recording and classification of sounds. With the Soundary sound puzzle app, users can guess mystery words with one to three sounds as clues, or create their own mystery word based on a sound or sounds recorded by them or taken from a sound library. Soon, there will also be a ‘battle’ mode where users have to solve as many puzzles as possible against the clock. As part of these games, users can label sounds by putting tags on them. As a result, we’ll get multiple tags for each sound, which we can then use for context mixing. This will help us identify acoustic ambiguities, or homonyms, and fine-tune the system: Person X might recognise a sound and label it one way, while Person Y hears a different sound and labels it accordingly.”

Various classification methods

Gamification has been used for years as a data classification method by labs, institutes and universities—including by NASA for ISS images—but this is a new process for Orange’s research, and certainly for a sound recognition project. At this stage, we’d ideally like to have at least three people labelling the same sound.
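
To make the idea of several people tagging the same sound more concrete, here is a minimal Python sketch of how crowd-sourced tags could be combined. It is an illustration only, not the Golden Ear for Things pipeline: the function name `summarise_tags`, the sound identifiers, the 60% agreement threshold and the status names are all assumptions made for this example. The only figure taken from the article is the target of at least three labellers per sound.

```python
from collections import Counter

# From the article: ideally at least three people label the same sound.
MIN_LABELLERS = 3
# Assumption for this sketch: a tag "wins" if at least 60% of labellers chose it.
AGREEMENT_THRESHOLD = 0.6


def summarise_tags(tags_by_sound):
    """Summarise the tags given to each sound (hypothetical helper).

    tags_by_sound maps a sound identifier to the list of tags that
    different users gave it. Each sound gets one of three statuses:
    - "needs_more_labels": fewer than MIN_LABELLERS tags so far
    - "agreed": enough tags, and the top tag reaches the threshold
    - "ambiguous": enough tags, but no tag clearly dominates
    """
    summary = {}
    for sound_id, tags in tags_by_sound.items():
        # Normalise case and whitespace so "Kettle" and "kettle" count together.
        counts = Counter(tag.strip().lower() for tag in tags)
        total = sum(counts.values())
        if total == 0:
            summary[sound_id] = {
                "top_tag": None,
                "agreement": 0.0,
                "status": "needs_more_labels",
            }
            continue
        top_tag, top_count = counts.most_common(1)[0]
        agreement = top_count / total
        if total < MIN_LABELLERS:
            status = "needs_more_labels"
        elif agreement >= AGREEMENT_THRESHOLD:
            status = "agreed"
        else:
            status = "ambiguous"
        summary[sound_id] = {
            "top_tag": top_tag,
            "agreement": agreement,
            "status": status,
        }
    return summary


if __name__ == "__main__":
    # Made-up example data: identifiers and tags are purely illustrative.
    example = {
        "sound_001": ["kettle", "Kettle", "boiling water"],
        "sound_002": ["rain", "frying", "applause"],
        "sound_003": ["doorbell", "doorbell"],
    }
    for sound_id, info in summarise_tags(example).items():
        print(sound_id, info["status"], info["top_tag"])
```

In a sketch like this, the sounds flagged as ambiguous are the interesting ones: they are candidates for the acoustic “homonyms” that listeners interpret differently, and could be set aside for closer study rather than used directly as training labels.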

This participatory research approach bolsters our previous efforts to improve the system’s design. When the project was first launched, the multidisciplinary team (including researchers in the fields of machine learning and acoustics, as well as mobile developers) relied on public databases to initiate algorithm training.

Efforts were also focused on the sound classification techniques that needed to be implemented. A panel of blind people was interviewed in-house in order to discover which sounds were critical to understanding an acoustic landscape. “From this experiment, we found that correct interpretations of situations are based on three sounds, which we are trying to reproduce automatically through the system. But other methods and criteria are also used, such as taking the acoustic landscape as a whole, or using an object that emits a distinctive sound associated with an equally specific context — a stove in a kitchen, for example.”

In the space of four years, the Golden Ear for Things team has created an end-to-end system to record and analyse sounds, while also learning and meeting privacy requirements. They are now looking towards new challenges and ideas, such as detecting the sounds of movements or learning from very little data. They will ultimately focus on the project’s “action” component, namely how the system will interact with its environment once it has recognised it.