Video Compression: A Neural Network Challenging the Conventional Approach

Is neural coding catching up with traditional video coding standards? Orange is leading the way with its entries in the CLIC Challenge (Challenge on Learned Image Compression), a workshop at the Conference on Computer Vision and Pattern Recognition (CVPR), an annual event organized by the Institute of Electrical and Electronics Engineers (IEEE). The 2021 edition took place at the end of June.

“Within a few years, neural coders will exceed today’s latest video standards.”

The rules were clearly defined so that innovations from the scientific community could be judged on the same criteria: some 100 videos, each a few seconds long, had to be compressed to a bitrate of 1 Mb/s while retaining the best possible quality. Thirteen candidates submitted their codecs for testing, with industrial and academic teams from around the world, including a team from Orange. Orange won the overall challenge with a traditional encoder that applies an optimization method to the latest MPEG video standard (H.266/VVC). The Group also attracted attention with another of its contributions: the encoder developed by Théo Ladune, a third-year PhD student at Orange, which relies exclusively on a neural approach, ranked first out of five competing neural encoders, with a score very close to that of the H.265/HEVC coding standard.

A Quick Lesson in Video Compression

A video is a succession of images that look very similar. Take a soccer match, for instance: from one image to the next, the pitch and the stadium remain the same, the crowd moves a little, and the real difference lies in the movement of the players and the ball. Based on this observation, video compression takes place in two stages. The first is a prediction stage, in which the image to be encoded is predicted from a previously encoded image; the second is a correction stage, in which only the difference between the actual image and its prediction is transmitted. Because these differences from one image to the next are small, the amount of data to send is reduced. In traditional video coding, each tool in the coding chain is designed and tuned separately, one element at a time, to determine how best to recover the video signal. This is how today's most common video standards, such as those from MPEG, work.
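The two stages can be sketched in a few lines. This is a minimal Python illustration under simplifying assumptions: each "image" is a single row of pixels, the prediction is the previous image shifted by a motion vector found by exhaustive search, and the residual is sent losslessly (real codecs also transform, quantize and entropy-code it). All function names are hypothetical and are not taken from any standard.

```python
from typing import List, Tuple

Frame = List[int]  # a toy "image": one row of pixel intensities


def predict(reference: Frame, shift: int) -> Frame:
    """Motion-compensated prediction: the reference row moved by `shift` pixels.
    Positions that fall outside the frame repeat the nearest edge pixel."""
    n = len(reference)
    return [reference[min(max(i - shift, 0), n - 1)] for i in range(n)]


def estimate_motion(reference: Frame, current: Frame, max_shift: int = 3) -> int:
    """Pick the shift whose prediction has the smallest sum of absolute differences."""
    def sad(shift: int) -> int:
        return sum(abs(c - p) for c, p in zip(current, predict(reference, shift)))
    return min(range(-max_shift, max_shift + 1), key=sad)


def encode(reference: Frame, current: Frame) -> Tuple[int, Frame]:
    """Prediction stage (a motion vector) followed by correction stage (the residual)."""
    shift = estimate_motion(reference, current)
    prediction = predict(reference, shift)
    residual = [c - p for c, p in zip(current, prediction)]
    return shift, residual


def decode(reference: Frame, shift: int, residual: Frame) -> Frame:
    """The decoder rebuilds the same prediction and adds the transmitted correction."""
    prediction = predict(reference, shift)
    return [p + r for p, r in zip(prediction, residual)]


previous = [10, 10, 10, 80, 90, 10, 10, 10]  # a "player" at pixels 3-4
current = [10, 10, 10, 10, 80, 91, 10, 10]   # moved one pixel right, slightly brighter
shift, residual = encode(previous, current)
print(shift, residual)  # prints: 1 [0, 0, 0, 0, 0, 1, 0, 0]
assert decode(previous, shift, residual) == current
```

Only one small number (the shift) and a residual made almost entirely of zeros need to be transmitted, which is what makes the correction stage cheap to encode.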

Leave the Complex Work to Machines

Breaking the task down into hand-designed sub-tasks was the most practical approach until artificial neural networks appeared. The basic principle remains the same: one neural network learns to predict the next image, and another learns to identify the errors in that prediction and send corrections in the most compact form possible. So, what's new? However complex the encoding process becomes, it is the neural network, not the engineer, that works out how to carry it out. All we have to do is specify the objective, for instance the best possible image quality at 1 Mb/s, and the network learns how to achieve it during training.
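Learned codecs are commonly trained by minimizing a rate-distortion loss of the form L = D + λR, where D measures the loss of quality and R the number of bits; the weight λ is what steers the network toward a target such as 1 Mb/s. The sketch below, with hypothetical names and toy numbers, only shows the arithmetic of that objective: in a real neural codec the symbol probabilities come from a learned entropy model and the loss is minimized by gradient descent. The 30 frames-per-second rate used to turn 1 Mb/s into a per-image budget is an assumption, not a challenge parameter.

```python
import math
from typing import Sequence


def rate_bits(symbol_probs: Sequence[float]) -> float:
    """Rate R: an ideal entropy coder spends -log2(p) bits on a symbol of probability p."""
    return sum(-math.log2(p) for p in symbol_probs)


def distortion_mse(original: Sequence[float], decoded: Sequence[float]) -> float:
    """Distortion D: mean squared error between original and decoded pixels."""
    return sum((o - d) ** 2 for o, d in zip(original, decoded)) / len(original)


def rd_loss(original: Sequence[float], decoded: Sequence[float],
            symbol_probs: Sequence[float], lam: float) -> float:
    """Training objective L = D + lam * R; a larger lam makes every bit more costly."""
    return distortion_mse(original, decoded) + lam * rate_bits(symbol_probs)


def bits_per_frame(bitrate_bps: float, fps: float) -> float:
    """Average bit budget per image for a target bitrate (the fps value is assumed)."""
    return bitrate_bps / fps


original = [100, 100, 100, 100]
# Candidate A: perfect reconstruction, but it costs 4 bits.
a_decoded, a_probs = [100, 100, 100, 100], [0.5] * 4
# Candidate B: slightly wrong pixels, but only 2 bits.
b_decoded, b_probs = [100, 100, 102, 98], [0.5] * 2

for lam in (0.25, 2.0):
    loss_a = rd_loss(original, a_decoded, a_probs, lam)
    loss_b = rd_loss(original, b_decoded, b_probs, lam)
    print(f"lambda={lam}: A={loss_a}, B={loss_b}")
# prints:
# lambda=0.25: A=1.0, B=2.5
# lambda=2.0: A=8.0, B=6.0

print(round(bits_per_frame(1_000_000, 30)))  # prints 33333
```

With a small λ the loss favors the perfect but expensive candidate A; with a larger λ it favors the cheaper candidate B, which is how raising λ pushes a trained codec toward lower bitrates.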

Trust the Neural Network

Théo Ladune explains the technical choices he made: “When designing my encoder, I tried to constrain it as little as possible. For a researcher, even one working with a neural network, there is a strong temptation to base the architecture on the traditional method of subtracting the prediction from the image. After all, it is a tried-and-tested method. But I decided to give the network the original image together with its prediction, rather than their difference, and let the algorithm develop its own compression method. I also chose to have it learn on chunks of the sequence, i.e. groups of several images, whereas others preferred to feed their network one image at a time. I took the gamble of relying on the neural network, building architectures to help it learn while giving it as much freedom as possible.”
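One way to see why giving the network both the image and its prediction, rather than only their difference, does not limit it is an information-theoretic argument: once the prediction P is known, X and X − P determine each other, so H(X | P) = H(X − P | P) ≤ H(X − P). In other words, the best achievable rate when the prediction is used as side information (often called conditional coding) is never worse than when only the residual is coded. The toy Python sketch below checks this on made-up data in which pixel values saturate at a maximum of 3, so a bright prediction leaves no room for the pixel to get brighter. The data and names are hypothetical, and this is not a description of Théo Ladune's actual architecture.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Tuple


def entropy(samples: Iterable[Hashable]) -> float:
    """Empirical Shannon entropy, in bits per sample."""
    counts = Counter(samples)
    total = sum(counts.values())
    return sum(c / total * math.log2(total / c) for c in counts.values())


def conditional_entropy(pairs: List[Tuple[int, int]]) -> float:
    """Empirical H(X | P): average entropy of the pixel x given its prediction p."""
    groups: Dict[int, List[int]] = {}
    for x, p in pairs:
        groups.setdefault(p, []).append(x)
    n = len(pairs)
    return sum(len(xs) / n * entropy(xs) for xs in groups.values())


# Toy data: the true pixel is one level brighter than predicted,
# except that it cannot exceed the maximum value 3 (saturation).
pairs = [(min(p + 1, 3), p) for p in range(4)] * 10

residual_coding = entropy(x - p for x, p in pairs)  # code only x - p
conditional_coding = conditional_entropy(pairs)     # code x knowing p
print(f"residual: {residual_coding:.3f} bits, conditional: {conditional_coding:.3f} bits")
# prints: residual: 0.811 bits, conditional: 0.000 bits
```

The residual still costs about 0.81 bits per pixel because saturation makes it vary, whereas a coder that sees the prediction can exploit that pattern. A network that receives both inputs remains free to compute the difference itself if that turns out to be useful.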

Is a Neural Standard Just Around the Corner?

Video standards have existed since the early 1990s and follow roughly ten-year cycles. Manufacturers want interoperability, so that content is guaranteed to travel along a well-defined technical chain from the broadcaster, through the network, to the user's television. Each new standard brings new content (HD, 4K and so on) as well as new constraints. Where does a neural encoder fit into this paradigm? As Théo Ladune's supervisor at Orange, Pierrick Philippe, explains: “It is important to note that before 2017, neural networks could not yet compress still images. This is a fledgling field! Today, Théo's video encoder performs slightly below the MPEG HEVC standard published in 2013. At this rate, it is safe to predict that within a few years neural coders will exceed today's latest standards. We are at a crossroads, and one thing is certain: every player in this field is paying attention. Will the next standard be traditional, neural or mixed? It is impossible to say at the moment, and that is why work such as Théo's is so impressive.”
