“Within a few years, neural coders will exceed today’s latest video standards.”
Is neural coding catching up with traditional video coding standards? Orange leads the way with its entry in the CLIC Challenge (the Challenge on Learned Image Compression), a workshop at the Conference on Computer Vision and Pattern Recognition (CVPR), an annual event organized by the Institute of Electrical and Electronics Engineers (IEEE). The 2021 edition took place at the end of June.
The rules were clearly defined so that innovations from across the scientific community could be judged on the same criteria: around 100 videos of a few seconds each had to be compressed to 1 Mb/s while retaining the best possible quality. Thirteen candidates from industry and academia around the world submitted their codecs for testing, including a team from Orange. Orange won the overall challenge with a traditional encoder built on an optimization method for the latest MPEG video standard (H.266/VVC). The Group also attracted attention with another of its contributions: based exclusively on a neural approach, the encoder developed by Théo Ladune, a third-year PhD researcher at Orange, came first out of five competing neural encoders, with a score very close to that of a recent coding standard (H.265/HEVC).
A Quick Lesson in Video Compression
Videos are a succession of images that look very similar. Take a soccer match, for instance: from one frame to the next, the pitch and the stadium remain the same, the crowd moves a little, and the real difference is the movement of the players and the ball. Based on this observation, video compression takes place in two stages: first a prediction stage, which starts from a reference image; then a correction stage, in which we simply transmit the difference between the actual image and the prediction. By focusing on the small changes from one frame to the next, we reduce the amount of data. In traditional video coding, engineers hand-design each of these tools, element by element, to recover the video signal as faithfully as possible. This is how today's most common video standards work, such as those of the MPEG family.
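The two-stage idea above can be sketched in a few lines of toy Python with NumPy; the function names and the "repeat the last frame" prediction are deliberately simplistic illustrations, not how a real codec predicts.

```python
import numpy as np

def encode_frame(current, reference):
    """Toy residual coder: predict the current frame from the
    reference, then keep only the correction (the residual)."""
    prediction = reference            # simplest possible prediction: repeat the last frame
    residual = current - prediction   # the correction to transmit
    return residual

def decode_frame(residual, reference):
    """Rebuild the frame as prediction plus correction."""
    prediction = reference
    return prediction + residual

# Two nearly identical 4x4 "frames": only one pixel changes.
frame0 = np.zeros((4, 4))
frame1 = frame0.copy()
frame1[2, 3] = 1.0                    # the "ball" moved

residual = encode_frame(frame1, frame0)
print(np.count_nonzero(residual))     # only the change needs transmitting
print(np.array_equal(decode_frame(residual, frame0), frame1))
```

Because the residual is mostly zeros, it compresses far better than the raw frame would; real codecs add motion-compensated prediction and transform coding on top of this skeleton.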
Leave the Complex Work to Machines
Breaking the task down into sub-tasks was the easiest way to proceed, until artificial neural networks appeared. The basic principle remains the same: one neural network is taught to predict the next image, and another to identify the errors in that prediction and transmit the corrections in the most compact form possible. So, what's new? However complex the video encoding is, the neural network is responsible for handling it. All we have to do is tell it that we want the best possible image at 1 Mb/s, and it will try to learn how to deliver it.
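"Tell it we want the best image at 1 Mb/s" means training the network on a rate-distortion objective: minimize the reconstruction error plus a weight times the bits spent. The sketch below illustrates the trade-off with made-up numbers; the function and candidate values are illustrative, not Orange's actual training code.

```python
def rd_loss(distortion, rate, lam):
    """Rate-distortion objective: reconstruction error plus
    lam times the bits spent. The weight lam steers the network
    toward a target bitrate (e.g. 1 Mb/s)."""
    return distortion + lam * rate

# Two candidate encodings of the same frame (illustrative numbers):
# (mean squared error, bits used)
candidate_a = (0.010, 40_000)   # sharper image, more bits
candidate_b = (0.025, 20_000)   # coarser image, fewer bits

# A small lam favors quality; a larger one favors compactness.
for lam in (1e-7, 1e-5):
    best = min([candidate_a, candidate_b],
               key=lambda c: rd_loss(c[0], c[1], lam))
    print(lam, best)
```

During training, the network's weights are adjusted by gradient descent to drive this loss down, so the bitrate target is baked into the learning itself rather than engineered by hand.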
Trust the Neural Network
Théo Ladune details the technical choices he made: “When designing my encoder, I tried to constrain it as little as possible. For a researcher, there is a strong temptation—even when working with a neural network—to base the architecture on the traditional method of subtracting the prediction from the image. After all, it is a tried-and-tested method. But I decided to give the network the image together with its prediction, without mixing them by subtraction, and let the algorithm develop its own compression method. I also chose to have it learn on pieces of the sequence, i.e. groups of several images, whereas others preferred to feed their network image by image. I took the gamble of relying on the neural network, building architectures to help it learn while giving it as much freedom as possible.”
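The contrast between the classic subtraction and the conditional approach described in the quote can be sketched as a difference in what the network is fed. This is an illustrative sketch of the input interface only, not Ladune's actual architecture.

```python
import numpy as np

def residual_input(current, prediction):
    """Classic approach: the network only ever sees the difference;
    the subtraction is imposed by the designer."""
    return current - prediction

def conditional_input(current, prediction):
    """Conditional approach: the network receives both signals,
    stacked as channels, and learns how to combine them itself."""
    return np.stack([current, prediction])

frame = np.random.rand(8, 8)
pred = frame + 0.05 * np.random.rand(8, 8)

print(residual_input(frame, pred).shape)     # one hand-imposed combination
print(conditional_input(frame, pred).shape)  # both signals kept for the network
```

Keeping both signals means no information is destroyed before the network sees it; where the prediction is poor, the network can learn to ignore it rather than being forced to correct a noisy difference.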
Is a Neural Standard Just Around the Corner?
Video standards have existed since the early 1990s and run in roughly ten-year cycles. Industry manufacturers want interoperability, to guarantee a working technical chain from the content broadcaster, across the network the data travels through, to the user’s television. Each new standard brings new content (HD, 4K and so on) as well as new constraints. Where does a neural encoder fit into this paradigm? As Théo Ladune’s supervisor at Orange, Pierrick Philippe, explains: “It is important to note that before 2017, a neural network was not yet able to perform still-image compression. This is a fledgling field! Today, Théo’s video encoder is slightly below the MPEG HEVC standard published in 2013. At this rate, it is safe to predict that within a few years neural coders will exceed today’s latest standards. We are at a crossroads, and one thing is certain: every player in this field is paying attention. Will the next standard be traditional, neural or mixed? It is impossible to say at the moment, and that is why work such as Théo’s is so impressive.”