Tomorrow, developments in machine learning could give a face to virtual agents, making them even more human by providing them with convincing facial expressions.
With its 38 muscles and their complex interweaving, the human face is one of the most difficult parts of the body to reproduce realistically. This is due not only to the anatomy that lets humans generate such a multitude of facial expressions, but also to the difficulty of mastering all the subtleties of human interaction. Virtual characters must therefore be given intentions: they must be taught to adjust their behaviour – movement, speech, gaze – to the context and to their interlocutor.
Crossing the uncanny valley
In an article published in 2010, researchers at the University of Central Florida explained that because we have been “trained” since childhood to interpret distortions of the human face, we perceive the slightest dissimilarity in an avatar (defined here as a virtual representation of an artificial intelligence). “When these dissimilarities become evident, the avatar becomes more noted for its differences than for its realism and rather than producing improved empathy, will appear ‘zombie-like’ or ‘off’ and tend to inspire mistrust or even revulsion. This region between cartoon-like animation and photorealism […] is called the uncanny valley”.
According to the uncanny valley theory, proposed in 1970 by the Japanese roboticist Masahiro Mori, the more a robot or an avatar resembles us, the more monstrous its imperfections appear to us. This creates an unpleasant sensation of strangeness, which can lead to rejection. For a robot or avatar to be accepted, a certain threshold of realism in the imitation has to be crossed. The challenge for 3D facial modelling and animation is therefore to help cross this uncanny valley.
Machine learning hits the big screen
For this reason, the technologies used by film production companies to create virtual characters are particularly interesting. The film industry was one of the first to turn to virtual faces, whether created from scratch or derived from actors’ facial movements and expressions through motion capture. In 2001, Final Fantasy: The Spirits Within became the first feature film made entirely with motion capture to aim for photorealism. The film was a commercial flop, but a technological feat in the realistic representation of human beings. Today, facial animation is also used to place a non-real character alongside live actors – Gollum in the Lord of the Rings trilogy, for example.
More recently, audiences of Avengers: Infinity War discovered Thanos, “the best Marvel villain yet”. What do we remember about the purple-skinned Titan? His humanity. As journalist Aloysius Low wrote: “Thanos is wonderfully animated with a wide range of emotions from anger to joy and even sadness… We can’t help but feel for him, despite his horrifically evil plan of galaxy-wide genocide.”
The character of Thanos is a unique blend of performance capture (an evolution of motion capture) and 3D animation. He is played by actor Josh Brolin – who lends him his voice, body movements, and facial expressions – and modelled and animated by the Weta Digital and Digital Domain [https://www.digitaldomain.com/] studios. The latter, responsible for the emotional scenes, deserves credit for the realistic rendering of Thanos’s facial expressions. To achieve this, the American company used a proprietary tool called Masquerade for the first time, built on new machine learning algorithms.
As is well-described in this article, once an actor’s data has been captured, it is usually applied to a low-resolution virtual model. Digital Domain did things differently. First, the team took high-resolution captures of Josh Brolin’s face using Disney Research Zurich’s Medusa tool. This data was then fed to Masquerade so that it could “learn” what Brolin’s face looked like, and how it moved, in high resolution. Low-resolution facial data captured during the actor’s on-set performance was then added. The software automatically converted 150 points from the low-res data into 40,000 high-res points, based solely on the knowledge of Brolin’s face it had previously gained through machine learning. This saved the team a great deal of time while accurately reproducing the details and subtleties of the actor’s facial movements and expressions.
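Digital Domain has not published Masquerade’s internals, but the core idea – learning, from paired high- and low-resolution captures of the same face, a mapping that upsamples a sparse set of tracked points into a dense mesh – can be sketched very roughly. The following toy example uses a simple least-squares linear map and random data; all sizes, names, and the choice of a linear model are illustrative assumptions, not the studio’s actual method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 150 tracked markers vs. a dense mesh
# (scaled down from 40,000 vertices so the sketch runs instantly).
N_MARKERS, N_VERTICES, N_FRAMES = 150, 4000, 300

# Training data: paired captures of the same frames at both resolutions
# (in reality these would come from a high-res rig such as Medusa).
low_res = rng.normal(size=(N_FRAMES, N_MARKERS * 3))    # x,y,z per marker
high_res = rng.normal(size=(N_FRAMES, N_VERTICES * 3))  # x,y,z per vertex

# "Learn" the face: fit a least-squares linear map low-res -> high-res.
W, *_ = np.linalg.lstsq(low_res, high_res, rcond=None)

# On-set capture: a new low-res frame is upsampled to the full mesh.
new_frame = rng.normal(size=(1, N_MARKERS * 3))
dense_mesh = new_frame @ W
print(dense_mesh.shape)  # → (1, 12000): one dense mesh per captured frame
```

A production system would of course replace the linear map with a far richer learned model and real capture data; the point here is only the structure of the problem – sparse points in, dense geometry out.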
1001 virtual agents
What are the applications of such technology in other sectors, such as industry or health? To answer this question, note that facial-animation applications in these areas are not of the same kind as those of the film industry, known as “offline”, where the “user is simply a spectator”. Here we are talking about real-time applications, in which the user interacts with virtual characters whose behaviour depends on the user’s actions and words. Another difference is that the film industry’s techniques involve a multitude of highly qualified professionals and require thousands of hours of work. This is where machine learning comes in, as a way to overcome the difficulty and cost of those techniques.
For example, the machine learning method suggested by the team of researchers from the University of Central Florida relies on “particle swarm optimization”. Facial recognition algorithms automatically detect and analyse an individual’s facial expressions in a digital photo to extract their distinctive characteristics; these are then configured so that a dynamic avatar can reproduce and combine them as needed. The approach aims to create a system that works with minimal human intervention, so that a relatively novice user can create and animate an avatar simply from a picture of themselves – in a professional context, for example.
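Particle swarm optimization itself is a general-purpose technique: a “swarm” of candidate solutions moves through the search space, each particle pulled toward its own best position so far and toward the swarm’s global best. The sketch below is not the Central Florida team’s actual system – it stands in for their avatar-fitting step with a hypothetical `mismatch` function and a fixed target vector, both purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical target: feature values extracted from a photo
# (e.g. mouth-corner and eyebrow positions), here just a fixed vector.
target = np.array([0.3, -0.7, 0.5, 0.1])

def mismatch(params):
    # Error between the avatar's expression parameters and the photo's
    # features; a real system would render and compare the face here.
    return np.sum((params - target) ** 2)

# Particle swarm optimization over the avatar's parameter space.
n_particles, n_dims, n_iters = 30, 4, 200
w, c1, c2 = 0.7, 1.5, 1.5  # inertia and personal/social pull strengths

pos = rng.uniform(-1, 1, (n_particles, n_dims))   # candidate parameters
vel = np.zeros_like(pos)
pbest = pos.copy()                                # each particle's best
pbest_val = np.array([mismatch(p) for p in pos])
gbest = pbest[pbest_val.argmin()].copy()          # swarm's global best

for _ in range(n_iters):
    r1, r2 = rng.random((2, n_particles, n_dims))
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos += vel
    vals = np.array([mismatch(p) for p in pos])
    improved = vals < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmin()].copy()

print(np.round(gbest, 2))  # should land close to the target vector
```

With a smooth error function like this one, the swarm converges to the target in a few hundred iterations; the appeal of the method is that it needs only the error value, not its gradient, which suits black-box fitting problems like matching an avatar to a photo.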
Progress in artificial intelligence and machine learning (in particular deep learning) has opened up new perspectives in many areas, including computer vision, automatic speech recognition, and natural language processing. It has enabled chatbots that converse with users in an increasingly natural way. Tomorrow, developments in machine learning could give a face to these virtual agents, making them even more human by providing them with convincing facial expressions.
Possible uses include digital doubles (for videoconferencing where each participant is represented by an avatar, for example) and virtual agents for human-machine interfaces: after-sales service and technical support, health (virtual psychologists, for example), education, recruitment, financial advice, and more. In short, once machine learning has enabled the crossing of the uncanny valley, the possibilities are endless!