Launched in 2015 by the French National Research Agency (ANR) and a Franco-American consortium, the DATAZERO project aims to build datacenters powered solely by renewable energy, through a rigorous scientific approach. We met Jean-Marc Pierson, professor and researcher in computer science at Toulouse III University and the project's coordinator. He explains its scientific objectives and presents the challenges involved in designing and operating decarbonized, self-sufficient datacenters.
What are the main challenges that a company – a Cloud provider, for example – faces when it wants to run a datacenter on renewable energy?
The first difficulty is finding the right contacts. Today, datacenter providers and specialized consulting and engineering firms do not always have the expertise to design datacenters powered by renewable energy, and they rarely propose it spontaneously. After several talks with various players in this market, I realized that clients have to express their needs clearly and insist for such a project to come to fruition.
Dimensioning the datacenter is the second challenge, and it raises real questions about the size and longevity of the infrastructure. The return on investment for a decarbonized datacenter is long: at least eight to ten years.
Avoiding oversizing the datacenter while still allowing it to evolve and last: that must be extremely complicated…
Exactly! How many solar panels or wind turbines will be required? How many lithium batteries or fuel cells? Which renewable sources should be favored, and which storage systems installed? The dimensioning software we developed as part of the DATAZERO project answers these questions, recommending the type and amount of equipment for a given workload, according to the datacenter's location and the weather observed there in previous years (typically over a ten-year period). The real challenge, of course, is estimating the workload: when you create a Cloud from scratch, you only have a vague idea of how your datacenter will be used or how fast its load will grow each year.
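To make these inputs concrete, here is a minimal sketch of what a dimensioning scenario could look like; the structure and field names are hypothetical illustrations, not the actual DATAZERO interface.

```python
# Hypothetical description of a dimensioning run; field names are
# illustrative, not the actual DATAZERO interface.
from dataclasses import dataclass

@dataclass
class SizingScenario:
    location: str          # selects the historical weather traces to replay
    weather_years: int     # e.g. ten years of observed irradiance and wind
    peak_load_kw: float    # estimated IT power demand of the workload
    annual_growth: float   # e.g. 0.10 for a load growing 10 % per year
    horizon_years: int     # planning horizon, in line with the 8-10 year ROI

scenario = SizingScenario("Toulouse", weather_years=10, peak_load_kw=800.0,
                          annual_growth=0.10, horizon_years=10)
```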
What are the DATAZERO project’s scientific objectives?
The DATAZERO project was launched in 2015, starting with the following question: is it possible to run a datacenter on renewable energy alone? At the time, several players had announced they were powering their datacenters with 100 % renewables, which was not quite true, because they were still connected to the electricity grid. We asked ourselves whether it was possible to go even further and become totally independent from the grid.
Two main scientific questions were posed. The first concerned the dimensioning of a self-sufficient infrastructure during the construction phase. The second concerned the optimization of its operation once in service, with a double challenge: optimizing the electricity flows in the electrical system according to the data and computing flows on one side, and optimizing the service flows in the information system according to electricity production on the other.
How did you answer the first question? Could you describe the dimensioning software used in the construction phase?
We used integer linear programming (ILP) to solve this problem. This kind of optimization is known to take time, as we are seeking an optimal solution. But unlike the operating phase, which runs in real time (so its algorithms must be very fast), the construction phase can afford to take time to find the best solution. In the end, we managed to develop a relatively fast algorithm that obtains a result – an optimal dimensioning for a datacenter containing thousands of servers running hundreds of thousands of tasks – in a few minutes. This lets us simulate several scenarios: for a given workload, we will need a certain configuration; if that load grows by 10 % per year, we will need to add so many servers and so many wind turbines.
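As an illustration of the ILP approach, here is a deliberately tiny sizing model written with the open-source PuLP library. The hourly profiles, unit costs, and battery figures are invented for the example, and the real DATAZERO model is far richer (wind turbines, fuel cells, task scheduling, and so on).

```python
# Toy dimensioning ILP: choose how many solar panels and battery packs to
# buy so that a fixed daily load is always covered. All numbers are made up.
import pulp

pv_per_panel = [0]*6 + [0.1, 0.2, 0.3, 0.35, 0.4, 0.4, 0.4,
                        0.35, 0.3, 0.2, 0.1] + [0]*7   # kW per panel, hourly
demand = [50.0] * 24                                   # kW, flat load
COST_PV, COST_BATT = 300, 4000                         # purchase cost per unit
BATT_KWH = 10.0                                        # usable energy per pack

prob = pulp.LpProblem("datacenter_sizing", pulp.LpMinimize)
n_pv = pulp.LpVariable("n_pv", lowBound=0, cat="Integer")
n_batt = pulp.LpVariable("n_batt", lowBound=0, cat="Integer")
soc = [pulp.LpVariable(f"soc_{t}", lowBound=0) for t in range(24)]  # kWh stored

prob += COST_PV * n_pv + COST_BATT * n_batt            # minimize capital cost

for t in range(24):
    prev = soc[t - 1]                                  # t=0 wraps to hour 23
    # Inequality (not equality) lets surplus production be curtailed; since
    # soc[t] >= 0, it also forces production + storage to cover the demand.
    prob += soc[t] <= prev + pv_per_panel[t] * n_pv - demand[t]
    prob += soc[t] <= BATT_KWH * n_batt                # storage capacity limit

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print(f"panels: {n_pv.value():.0f}, battery packs: {n_batt.value():.0f}")
```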
During the operating phase, the idea is to optimize energy production to meet the electricity requirements of the computer resources on the one hand, and to adjust the flow of computing services according to energy production on the other. How do you manage this?
There are several methods for solving this optimization problem, which is a classic in computing. The traditional approach consists in building a single mathematical model that takes into account the whole set of constraints, from both the electrical and the computing sides. We took a different approach, in which the electrical side and the computing side are optimized separately, with a negotiation loop between them.
On one side, the electrical system is optimized for a given workload so as to meet the servers' requirements over a given horizon (three days in this project), using linear programming techniques. On the other, the computing system optimizes its operation according to the expected electricity supply, using heuristic algorithms (which provide approximate results).
To do this, there are two options. The first is to act at the level of the servers by varying their speed. A typical server processor runs at around 3 GHz, and that frequency can be reduced. If it runs at 1 GHz it may be three times slower, but it consumes far less than a third of the energy! The second is to act on task scheduling, by delaying the execution of certain non-urgent tasks.
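To see why slowing down saves more than a proportional amount of energy, here is a back-of-the-envelope sketch using the classic cubic power model (dynamic power grows roughly as the cube of frequency when voltage scales with it); static power is ignored, so the figures are illustrative rather than measured.

```python
# Back-of-the-envelope DVFS arithmetic with the classic cubic power model:
# dynamic power ~ k * f^3 (voltage scales with frequency), so energy for a
# fixed amount of work ~ k * f^2. Static power is ignored: illustrative only.
def dynamic_energy(freq_ghz: float, work_cycles: float, k: float = 1.0) -> float:
    power = k * freq_ghz ** 3          # watts, up to the constant k
    runtime = work_cycles / freq_ghz   # time to finish a fixed workload
    return power * runtime             # energy ~ k * f^2 * cycles

ratio = dynamic_energy(3.0, 1e9) / dynamic_energy(1.0, 1e9)
print(f"3 GHz uses {ratio:.0f}x the dynamic energy of 1 GHz")  # prints 9x
```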
We then developed a negotiation algorithm, based on game theory, to reconcile the constraints coming from both sides. These are the three building blocks used to optimize the use of renewable energy in the datacenter.
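To picture the negotiation, here is a sketch of the message loop between the two optimizers: the electrical side offers a power profile it can deliver, the IT side answers with the power it needs, and the exchange repeats until the two profiles agree. This is only an illustration under simplified assumptions, not the project's actual game-theoretic algorithm; `plant` and `servers` are toy stand-ins for the two real optimizers.

```python
# Illustrative negotiation loop between the electrical and IT optimizers.
# The real DATAZERO algorithm is game-theoretic; this only sketches the
# exchange of offers until both sides agree within a tolerance.
def negotiate(power_offer, it_demand, rounds=20, tol_kw=1.0):
    offer = power_offer(None)              # opening offer, no load info yet
    for _ in range(rounds):
        demand = it_demand(offer)          # IT side replans under the offer
        offer = power_offer(demand)        # electrical side re-optimizes
        gap = max(abs(o - d) for o, d in zip(offer, demand))
        if gap <= tol_kw:                  # profiles agree: deal reached
            break
    return offer, demand

# Toy stand-ins: the plant caps each slot at 60 kW; the servers would like
# 80 kW but degrade (slow down, delay tasks) to fit whatever is offered.
plant = lambda d: [60.0] * 4 if d is None else [min(60.0, x) for x in d]
servers = lambda offer: [min(80.0, o) for o in offer]
print(negotiate(plant, servers))           # converges to 60 kW per slot
```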
What is new in DATAZERO2, the second phase of the project that began in 2020?
In DATAZERO2, we are working a lot on the notion of uncertainty. Weather forecasts and load predictions are intrinsically uncertain data. In DATAZERO1, when we realized a forecast was wrong, and that we would not have the electric power we expected, we simply relaunched the optimization. For the second phase of the project, we wanted to take uncertainty into account right from the start.
We attached an "object of uncertainty" to the electricity production and computing load forecasts, then developed new optimization algorithms that work under uncertainty. We hope the results will show that the impact of forecast errors is lower, and that we no longer have to relaunch an optimization, because the system is capable of anticipating these errors and adapting automatically.
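One minimal way to picture such an "object of uncertainty" is an error bound carried alongside each forecast value, with plans checked against the worst case. This is only a sketch of the idea; DATAZERO2's algorithms are naturally more sophisticated.

```python
# A minimal "object of uncertainty": an error bound carried with each
# forecast, and a robustness check against the worst case. Illustrative only.
from dataclasses import dataclass

@dataclass
class UncertainForecast:
    nominal_kw: float   # predicted renewable production for the time slot
    error_kw: float     # bound on the forecast error for that slot

    @property
    def worst_case_kw(self) -> float:
        return self.nominal_kw - self.error_kw

def plan_is_robust(production, demand_kw) -> bool:
    """A plan survives forecast errors if worst-case supply still covers it."""
    return all(p.worst_case_kw >= d for p, d in zip(production, demand_kw))

forecast = [UncertainForecast(70.0, 15.0), UncertainForecast(90.0, 20.0)]
print(plan_is_robust(forecast, [50.0, 65.0]))   # True: 55 >= 50 and 70 >= 65
```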
How will the various algorithms that you have described be made available to companies to design and run decarbonized datacenters?
With DATAZERO2, we are seeking to increase the maturity of our software. Our aim is to reach maturity level 5, which would enable us to work with a software development company to turn the solution into a commercial product.
We are leaning towards an open-source model, with the usable software made available to those who wish to provide expertise in the design of renewable energy datacenters. This is currently under discussion with our industry partner, Eaton.
Maturing research software is a major challenge, and we are looking for ways to take our work further. We could, for example, be supported by a technology transfer acceleration company (société d'accélération du transfert de technologies, or SATT), a structure whose role is to transfer research results to companies and bring inventions to a maturity level close to that of the market.
Final question: what would an “ideal DATAZERO” datacenter look like in your opinion?
Firstly, we must specify that the DATAZERO project targets medium-sized datacenters, consuming up to 1 MW of electric power. An ideal renewable energy datacenter is first and foremost a totally independent, autonomous infrastructure that does not need to rely on an external electricity provider.
However, today we face a psychological barrier: customers are reluctant to run a datacenter that is not connected to the electrical grid. Our challenge is to show that it is possible, while limiting redundancy as much as possible during dimensioning. If a datacenter runs on three wind turbines, one might assume that six must be installed to guarantee its resilience. But it is highly unlikely that all three wind turbines would break down at the same time, so we seek to determine the optimal configuration. In an ideal scenario, the datacenter would only need four wind turbines, and we would have convinced the buyer that this is sufficient.
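To give a feel for the trade-off, here is a back-of-the-envelope availability calculation for such a wind farm. The 97 % per-turbine availability is an assumed figure, and real failures are not independent (a storm can affect several turbines at once), so this only sketches the reasoning behind limiting redundancy.

```python
# Back-of-the-envelope availability of a small wind farm when the load needs
# k of n turbines running. Assumes independent failures and a 97 % per-turbine
# availability, both of which are idealizations.
from math import comb

def farm_availability(n: int, k: int, p: float = 0.97) -> float:
    """Probability that at least k of the n turbines are up at the same time."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(f"3 of 3 turbines up: {farm_availability(3, 3):.4f}")   # ~0.9127
print(f"3 of 4 turbines up: {farm_availability(4, 3):.4f}")   # ~0.9948
print(f"3 of 6 turbines up: {farm_availability(6, 3):.6f}")   # ~0.999988
```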