“In the same way that AI systems make diagnoses by interpreting medical imagery, they could, by analyzing these voice parameters, identify and classify certain illnesses.”
The voice contains a lot of information on people’s state of health. Indeed, several conditions – neurodegenerative and pulmonary diseases, cardiovascular disorders, mental disorders – can change the way in which a person speaks. For example, they may not articulate as well, or they may stretch vowels more.
Although the human brain can analyze some of these signs to “guess” the physical fitness or mental state of a speaker, others go completely unnoticed. But maybe not for much longer, thanks to vocal biomarker extraction and artificial intelligence (AI).
The United States National Institutes of Health (NIH) defines a biomarker as “a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacological responses to therapeutic intervention”. For example, blood biomarkers can be used to diagnose people suffering from multiple sclerosis. The same logic applies to vocal biomarkers.
Two main categories of voice characteristics (or parameters) can be distinguished. Acoustic parameters measure voice quality (frequency, amplitude, etc.), whereas prosodic parameters, such as the length of vowels, speech rate, or the length of pauses, are indicators of the quality of phrasing. In addition, linguistic parameters, which concern the words a speaker chooses, can also be exploited.
All these parameters can be associated with a wide variety of conditions. This is known as biomarker identification or extraction. In the same way that AI systems make diagnoses by interpreting medical imagery, they could, by analyzing these voice parameters, identify and classify certain illnesses, serving not only for diagnosis but also for prevention and monitoring. There are many benefits to such an approach: precision, speed, simplicity, and cost.
A multiple-step process
The traditional process for identifying vocal biomarkers can be broken down into several steps, described in this article written by a team of Luxembourg researchers. First, it is necessary to choose the type of recording to be obtained (verbal, vowel and syllable, or nonverbal vocalizations) and gather the audio data. Participants may be asked to read out a text, describe a personal experience, sustain the voicing of a vowel for as long as they can, or force a cough.
Data is collected through recordings that can be studio-based, telephone-based, web-based (a highly popular technique for large-scale data collection campaigns), or smartphone-based via an application, which makes it possible to obtain high-quality recordings thanks to mobile broadband. At this stage, audio pre-processing is necessary; it includes resampling, noise reduction, and framing and windowing of the data.
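The framing and windowing steps can be sketched in a few lines of Python. This is an illustrative toy only: the function names, the 25 ms frame length, and the 10 ms hop at 16 kHz are our own choices, not part of any specific toolkit, and production pipelines rely on dedicated audio libraries.

```python
import math

def frame_signal(samples, frame_len=400, hop=160):
    """Split a 1-D list of samples into overlapping frames,
    e.g. 25 ms frames with a 10 ms hop at a 16 kHz sample rate."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames

def hamming(frame):
    """Apply a Hamming window to one frame to reduce spectral
    leakage before any frequency-domain analysis."""
    n = len(frame)
    return [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
            for i, s in enumerate(frame)]

# Toy signal: one second of a 100 Hz sine wave sampled at 16 kHz.
sr = 16000
signal = [math.sin(2 * math.pi * 100 * t / sr) for t in range(sr)]
frames = [hamming(f) for f in frame_signal(signal)]
print(len(frames), len(frames[0]))
```

Each subsequent feature (pitch, energy, spectral measures) is then computed per frame rather than over the whole recording, which is what lets the analysis track how the voice changes over time.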
The next step is the selection of the audio features that will be used for the algorithms’ learning; that is, identifying the most dominant and discriminating characteristics. “The correct choice of features heavily depends on the voice disorder, disease, and type of voice recording”, the Luxembourg researchers explain. “For example, acoustic features extracted from sustained vowel phonations or diadochokinetic recordings [diadochokinetic tasks include the fast repetition of syllables that combine occlusives and vowels, such as /pa/] are common in the detection of Parkinson’s disease, whereas linguistic features extracted from spontaneous or semi-spontaneous speech may be a more appropriate choice for the estimation of Alzheimer’s disease or mental health disorders.”
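To give a concrete, if simplified, idea of what an acoustic feature is, here are two classic ones, zero-crossing rate and root-mean-square energy, computed on a synthetic sustained vowel. Research pipelines extract far richer sets (jitter, shimmer, MFCCs) with dedicated toolkits; this pure-Python sketch only illustrates the principle.

```python
import math

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign flips,
    a coarse correlate of noisiness or breathiness."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
    return crossings / (len(frame) - 1)

def rms_energy(frame):
    """Root-mean-square amplitude, which tracks loudness."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

# Stand-in for a sustained vowel: 100 ms of a 200 Hz tone at 8 kHz.
sr = 8000
vowel = [math.sin(2 * math.pi * 200 * t / sr) for t in range(sr // 10)]
print(zero_crossing_rate(vowel), rms_energy(vowel))
```

Feature selection then consists of keeping only those measurements that actually separate the clinical groups of interest, which, as the researchers note, differs from one disorder and recording type to another.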
It is then possible to train machine learning or deep learning algorithms to predict or automatically classify various conditions, using these vocal parameters alone or combining them with other data (anthropometric data, i.e., measurements of the human body; clinical data; or epidemiological data). In most cases, supervised learning algorithms are used as predictive models, but the authors stress that transfer learning is a promising approach.
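As a toy illustration of the supervised-learning step, the sketch below classifies synthetic two-dimensional “vocal feature” vectors with a nearest-centroid rule. The data, labels, and feature names are invented for the example; real studies use much larger feature sets and models such as SVMs, random forests, or deep neural networks.

```python
import math
import random

def centroid(rows):
    """Mean feature vector of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def nearest_centroid_predict(x, centroids):
    """Assign x to the class whose mean feature vector is closest
    in Euclidean distance: a minimal supervised classifier."""
    dists = {label: math.dist(x, c) for label, c in centroids.items()}
    return min(dists, key=dists.get)

random.seed(0)
# Synthetic 2-D feature vectors (imagine jitter, speech rate).
healthy = [[random.gauss(0.2, 0.05), random.gauss(4.5, 0.3)] for _ in range(50)]
patient = [[random.gauss(0.6, 0.05), random.gauss(3.0, 0.3)] for _ in range(50)]
centroids = {"healthy": centroid(healthy), "patient": centroid(patient)}
print(nearest_centroid_predict([0.58, 3.1], centroids))  # near the patient centroid
```

Transfer learning, which the authors highlight, would instead reuse a model pretrained on a large speech corpus and fine-tune it on the smaller clinical dataset.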
In 2020, numerous trials were carried out to detect and monitor the evolution of COVID-19 using the voice, notably in Israel, where the startup Vocalis Health, supported by the Ministry of Defense, worked in partnership with hospitals and academic institutions. Voice samples were collected from inpatients and volunteers, both ill and healthy, who sent their audio data via a mobile application. These samples were then analyzed, with the help of an algorithm, to identify a “unique voiceprint” for detecting symptoms of the disease and deteriorating health in ill patients. The University of Cambridge and the Luxembourg Institute of Health are carrying out similar projects.
In the United States, a research team from the Massachusetts Institute of Technology (MIT) has looked into detecting asymptomatic COVID-19 infections from the sound of a cough recorded on a cellphone. Before the start of the pandemic, the researchers were working on the early detection of Alzheimer’s disease, a neurodegenerative disease associated with memory decline and the deterioration of muscle function, including a weakening of the vocal cords. For this purpose, they had developed an AI framework combining several neural networks, which they were able to reuse to identify four COVID-19-specific biomarkers: vocal cord strength, emotional state, changes in lung and respiratory performance, and muscular degradation.
… and preventing depression
Several studies have shown that voice analysis by machine learning systems could also help to improve the diagnosis and treatment of psychiatric disorders and illnesses such as depression. Today, the mental health system is facing a dual challenge: the lack of accredited professionals on the one hand, and the reliability of diagnoses and quality of care on the other. Indeed, current screening tools rely heavily on the patient’s subjective self-reporting. Consequently, only a small portion of mental illnesses are correctly diagnosed (47.3%, according to a study published in “The Lancet” in 2009).
To fill the gaps and decongest the healthcare system, Sonde Health has developed a technology that makes it possible to collect short voice samples using a smartphone, analyze them, and spot early signs of clinical depression or anxiety via subtle changes to certain acoustic parameters.
Finally, as a researcher from the Côte d’Azur University Faculty of Medicine explains, certain symptoms of depression are also common to neurodegenerative pathologies, particularly in older people, which can lead to diagnostic errors. However, to date there are few tools enabling differentiation between the two. For her, automatic voice analysis could be a new, non-invasive, easy-to-use diagnostic assistance method.
Remote care and augmented health
One of the aims of research into the automatic extraction of biomarkers is to be able to integrate AI-based solutions into telemedicine platforms and healthcare management applications offered to practitioners, enabling them to offer consultations and monitor their patients remotely. For example, VocalisTrack, developed by Vocalis Health, measures shortness of breath in patients suffering from COPD (chronic obstructive pulmonary disease) via an application available on their smartphone. By analyzing the data collected, the clinical care team can monitor these patients after they have been discharged and detect any signs of deterioration. The aim is to reduce the number of physical examinations and, more importantly, to prevent readmissions to hospital.
Mainstream applications could also emerge, enabling people to monitor their health daily using a smartphone or any other smart device. In this respect, the name of the application developed by Sonde Health, “Mental Fitness”, is meaningful, the idea being that people can use this type of tool to monitor – and improve – their mental health parameters, in the same way as they might use a smart band to monitor their heart rate.