Progress in the semantic analysis of the voice of the customer

Monday 6th of July 2020 - Updated on Wednesday 22nd of June 2022

Without many users realising it, language engineering technologies power applications in search engines, chatbots, forums and the like. To process language and speech, they draw on fields such as syntax and semantics, the latter an area of ongoing progress, to provide an increasingly refined understanding of meaning.

"A general approach and one that is closely linked to research with a view to introducing new features"

Natural language processing is a long-standing area of research at Orange. The application of this work to customer relations has gradually been structured to deliver industrialised solutions. These solutions chiefly enable us to exploit a rich and complex source of data: free-form customer feedback and comments given in response to open-ended survey questions.

An industrialised and multi-purpose tool…

How do you take into account, review and analyse thousands of transcripts, many of which are worded differently but ultimately mean the same thing? To tackle this problem, an ad hoc tool was developed using natural language processing (NLP) technology for text classification, keyword extraction and text mining. It has since evolved into an industrial product, deployed for surveys conducted by Orange, comments in app stores and other occasional needs.
Whereas most products on the market target specific applications, Orange’s SEMAfor programme deploys products based on a general approach and one that is closely linked to research with a view to introducing new features. This system, which is dedicated to understanding the voice of the customer (and, occasionally, the voice of employees through internal surveys), addresses three main areas: the automatic classification of transcripts, their thematic grouping and the extraction of key information. Its implementation involves several types of processing. For categorisation in particular, the tool uses machine learning algorithms that learn to classify new transcripts from manually annotated examples. In addition, for more precise, on-demand categorisation, rule-based approaches, with rules written by hand by experts or by users, enable us to create refined and customised dashboards.
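To make the combination of learned and rule-based categorisation more concrete, here is a minimal sketch in Python. It is not the SEMAfor implementation: the article does not name the algorithms used, so a small naive Bayes classifier stands in for the machine learning component, and the annotated transcripts, category names and the "refund" rule are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one smoothing (illustrative stand-in)."""

    def fit(self, examples):
        self.label_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocabulary = set()
        for text, label in examples:
            self.label_counts[label] += 1
            for token in tokenize(text):
                self.word_counts[label][token] += 1
                self.vocabulary.add(token)
        return self

    def predict(self, text):
        total = sum(self.label_counts.values())
        # Words never seen during training carry no information, so skip them.
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        scores = {}
        for label, count in self.label_counts.items():
            denominator = sum(self.word_counts[label].values()) + len(self.vocabulary)
            score = math.log(count / total)
            for token in tokens:
                score += math.log((self.word_counts[label][token] + 1) / denominator)
            scores[label] = score
        return max(scores, key=scores.get)


# Hypothetical manually annotated transcripts.
ANNOTATED = [
    ("the technician was late and the wait was too long", "response_time"),
    ("waited two weeks for an appointment", "response_time"),
    ("my bill is wrong again", "billing"),
    ("charged twice on my invoice", "billing"),
    ("the network keeps dropping", "network"),
    ("no signal at home since monday", "network"),
]

# Hypothetical expert rule: any mention of a refund goes to "billing".
RULES = [(re.compile(r"\brefund\b", re.IGNORECASE), "billing")]


def categorise(text, model, rules=RULES):
    """Apply hand-written rules first, then fall back to the learned model."""
    for pattern, label in rules:
        if pattern.search(text):
            return label, "rule"
    return model.predict(text), "model"


model = NaiveBayesClassifier().fit(ANNOTATED)
print(categorise("charged twice on the invoice", model))
print(categorise("Please refund me, the network keeps dropping", model))
```

In this sketch the rules take priority over the model, which is one possible design choice: it lets an expert force a category for a specific dashboard without retraining.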

…and an area of research that continues to make progress: opinions and the multilingual element

New building blocks such as EVA or Simbow are gradually being designed and implemented to expand and enrich analytical capabilities in the field, while research activities on semantic analysis continue to advance the level of understanding of language.
Two significant developments have recently emerged. Regarding the classification of opinions within a single sentence, Research Engineers Géraldine Damnati and Delphine Charlet and Data & AI Project Manager Jean-Luc Platier explain: “Most of today’s tools offer a polarised analysis, without being able to determine the source. For example, in the transcript ‘very skilled technician but, despite my requests, response time too long’, a traditional tool will note a mixed opinion, without going into further detail. With the SEMAfor programme’s products, we can go deeper into the nature and intention of opinions. Using deep learning algorithms, the system can scan the sentence, extract opinions and identify the exact words to which they relate: a positive opinion on the technician and a negative opinion on the response time”.
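The difference between a single "mixed" label and opinions attached to their targets can be illustrated with a toy example. The system described above uses deep learning; the sketch below deliberately swaps in a much simpler lexicon-and-clause heuristic, purely to show the shape of the output. The aspect list, opinion lexicon and function name are hypothetical.

```python
import re

# Hypothetical lexicons; a real system would learn these associations.
ASPECTS = ["technician", "response time"]
OPINIONS = {
    "skilled": "positive",
    "helpful": "positive",
    "too long": "negative",
    "rude": "negative",
}


def aspect_opinions(sentence):
    """Pair each aspect with the opinion words found in the same clause.

    Clauses are split on the word "but", a crude proxy for a change of
    opinion within the sentence.
    """
    results = []
    for clause in re.split(r"\bbut\b", sentence.lower()):
        aspects = [a for a in ASPECTS if a in clause]
        polarities = [p for phrase, p in OPINIONS.items() if phrase in clause]
        for aspect in aspects:
            for polarity in polarities:
                results.append((aspect, polarity))
    return results


transcript = "very skilled technician but, despite my requests, response time too long"
print(aspect_opinions(transcript))
# [('technician', 'positive'), ('response time', 'negative')]
```

A heuristic like this breaks as soon as the wording departs from the lexicon, which is one reason to learn the extraction from data instead.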

The multilingual aspect is another area in which language processing has made considerable progress in recent years. With the development of new models based on neural networks, AI is now capable of learning multilingual representations. “One of the first of these models to have been popularised, in 2018, is Google’s BERT model. Until then, if we wanted to process a new language, we had to start from scratch or translate transcripts before applying the existing model. The new models benefit from multilingual mathematical projection spaces, enabling us to skip this step and save considerable time”. This is how a model initially trained on French can be used to provide a first level of analysis of texts in Spanish, Polish, etc. It may also be possible to pool learning by training a single model on data from multiple languages.
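The idea of a shared multilingual space can be sketched with a toy example: if French and Spanish words with the same meaning land close together, a classifier fitted only on French examples can be applied to Spanish text without translation. The two-dimensional vectors below are invented for illustration and do not come from BERT or any real model; in practice they would be produced by a multilingual encoder. The nearest-centroid classifier is likewise an assumption, chosen for brevity.

```python
import math
from collections import defaultdict

# Invented shared space: axis 0 is roughly "network", axis 1 roughly "billing".
SHARED_VECTORS = {
    "réseau": (0.9, 0.1), "coupure": (0.8, 0.2),   # French
    "facture": (0.1, 0.9), "prix": (0.2, 0.8),
    "red": (0.9, 0.1), "corte": (0.8, 0.2),        # Spanish
    "factura": (0.1, 0.9), "precio": (0.2, 0.8),
}


def embed(text):
    """Average the vectors of known words; unknown words are ignored."""
    vectors = [SHARED_VECTORS[w] for w in text.lower().split() if w in SHARED_VECTORS]
    if not vectors:
        return (0.0, 0.0)
    return tuple(sum(component) / len(vectors) for component in zip(*vectors))


def cosine(a, b):
    norm = math.hypot(*a) * math.hypot(*b)
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def train_centroids(examples):
    """Compute one average vector per label from annotated examples."""
    grouped = defaultdict(list)
    for text, label in examples:
        grouped[label].append(embed(text))
    return {
        label: tuple(sum(component) / len(vectors) for component in zip(*vectors))
        for label, vectors in grouped.items()
    }


def predict(centroids, text):
    vector = embed(text)
    return max(centroids, key=lambda label: cosine(centroids[label], vector))


# Annotated examples in French only.
french_examples = [
    ("coupure du réseau", "network"),
    ("prix de la facture", "billing"),
]
centroids = train_centroids(french_examples)

# Applied directly to Spanish text.
print(predict(centroids, "corte de red"))
print(predict(centroids, "precio de la factura"))
```

Pooling data from several languages, as mentioned above, would amount to passing examples in more than one language to `train_centroids`.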

Other obstacles on the horizon

Despite all of this progress, however, unresolved issues remain in language understanding. “The latest generation of neural models has brought major improvements in a number of areas. One of those areas is polysemy: it is now possible to take better account of the context in which a word with several meanings is used in order to identify the correct sense. But obstacles, such as common sense, still exist. Many things are naturally understandable to humans owing to implicit knowledge, but machines do not have that capability”.

The learning has just begun.