Selective data sorting against dark data

Organisations tend to keep data they never use. Storing it carries both financial and environmental costs, which greater digital frugality can keep to a minimum.

“For greater digital frugality, start by establishing a data map”

According to a study by Veritas, 52% of the company data stored on servers is never used. This inert data is known as dark data. It may be data collected for a project that was later abandoned, or data kept to meet regulatory requirements and then not deleted once the legal retention period expired. Mass emailing and redundant copies of files add to this waste.

Dark data is growing exponentially with the development of artificial intelligence and the internet of things, which generate a continuous flow of information (geolocation data, log files). An IDC report predicts that the amount of data stored worldwide will increase from 33 zettabytes (1 ZB = 10^21 bytes) in 2018 to 175 ZB in 2025.
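To put the IDC figures in perspective, the short sketch below works out the yearly growth rate they imply. It assumes steady compound growth between the two quoted values, which is a simplification and not something the report itself is quoted as saying; the function name is purely illustrative.

```python
def annual_growth_rate(start_volume: float, end_volume: float, years: int) -> float:
    """Compound annual growth rate between two volumes (assumes steady growth)."""
    return (end_volume / start_volume) ** (1 / years) - 1


# Figures quoted in the article: 33 ZB in 2018, 175 ZB in 2025.
rate = annual_growth_rate(33, 175, 2025 - 2018)
print(f"Implied growth: {rate:.1%} per year")
```

Under that assumption, the stored volume grows by roughly 27% a year, which means it more than quintuples over seven years.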

This rapid increase in data volume drives up storage costs. Storing this data also consumes substantial energy, with a heavy impact on the environment. According to one estimate, dark data will be responsible for 6.4 million tonnes of CO2 released into the atmosphere in 2020. That is the equivalent of the CO2 output of 80 countries.

Whether through negligence or in the secret hope of profiting from it one day, organisations and businesses tend to keep all of their data rather than put it through a “selective sorting” process. The habit is hard to break because it is encouraged on the one hand by the steady fall in storage costs, and on the other by the growth of GAFAM, whose business model is based on the mass storage of this dormant data.

Data governance

To return to greater digital frugality, an organisation must start by getting an overview of the data circulating in its IT system. This data map must not ignore “shadow IT”, i.e. applications deployed under the CIO’s radar. This “parallel” IT generates an invisible stream of data.

From this data map, the company can set rules for collecting and storing data according to its criticality and its value over time.

In this respect, we distinguish hot data, which is used frequently, from cold data, which is archived. The colder the data, the less it costs to host, because it can be kept on media such as magnetic tape or in cloud archiving solutions. Furthermore, as cold data is rarely accessed, retrieving it consumes little energy.
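As an illustration of the hot/cold distinction, here is a minimal sketch that classifies datasets by the date they were last accessed. The 90-day threshold, the dataset names and the dates are all hypothetical; in practice an organisation would set its own limits from its data map.

```python
from datetime import date, timedelta

# Hypothetical threshold: data untouched for more than 90 days is treated as cold.
HOT_WINDOW = timedelta(days=90)


def storage_tier(last_access: date, today: date, hot_window: timedelta = HOT_WINDOW) -> str:
    """Return 'hot' for recently used data, 'cold' for data that can be archived."""
    return "hot" if today - last_access <= hot_window else "cold"


# Illustrative inventory: dataset name -> date of last access.
today = date(2020, 6, 1)
datasets = {
    "crm_contacts": date(2020, 5, 20),
    "abandoned_pilot_logs": date(2018, 3, 2),
}
tiers = {name: storage_tier(seen, today) for name, seen in datasets.items()}
print(tiers)
```

A rule of this kind only decides where data should live. Whether cold data should be kept at all is a separate question for the governance rules.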

Automatic purges

The European General Data Protection Regulation (GDPR), which came into force two years ago, reinforces the implementation of this data governance. Indeed, the GDPR introduces a right to erasure.

Under this “right to be forgotten”, organisations must guarantee anyone who so requests that their data will be deleted from their systems within 30 days.

Technically, this means industrialising the data destruction process. These automatic purges also apply to data that has passed its legal retention period, such as the records of customers who have been inactive (no longer responding to contact) for three years.
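An automatic purge of this kind could be sketched as follows. This is only an illustration of the two rules mentioned above (erasure on request within 30 days, and removal after three years of inactivity). The record structure, field names and sample data are assumptions, and three years is approximated as 3 × 365 days.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional, Tuple

INACTIVITY_LIMIT = timedelta(days=3 * 365)  # three years, approximated in days
ERASURE_DEADLINE = timedelta(days=30)       # time allowed to honour an erasure request


@dataclass
class CustomerRecord:  # hypothetical record layout
    customer_id: str
    last_contact: date
    erasure_requested_on: Optional[date] = None


def must_purge(record: CustomerRecord, today: date) -> bool:
    """A record is purged on request, or once the inactivity limit is exceeded."""
    if record.erasure_requested_on is not None:
        return True
    return today - record.last_contact > INACTIVITY_LIMIT


def erasure_due_date(record: CustomerRecord) -> Optional[date]:
    """Latest date by which a requested erasure must be completed, if any."""
    if record.erasure_requested_on is None:
        return None
    return record.erasure_requested_on + ERASURE_DEADLINE


def purge(records: List[CustomerRecord], today: date) -> Tuple[List[CustomerRecord], List[str]]:
    """Split records into those kept and the ids of those to delete."""
    kept = [r for r in records if not must_purge(r, today)]
    purged_ids = [r.customer_id for r in records if must_purge(r, today)]
    return kept, purged_ids


records = [
    CustomerRecord("c1", last_contact=date(2020, 1, 15)),
    CustomerRecord("c2", last_contact=date(2016, 11, 3)),
    CustomerRecord("c3", last_contact=date(2020, 4, 1), erasure_requested_on=date(2020, 5, 20)),
]
kept, purged_ids = purge(records, today=date(2020, 6, 1))
print(purged_ids)
```

Run on a schedule, a job like this deletes data as a matter of routine instead of relying on someone remembering to do it. A real implementation would also have to reach backups and copies held in shadow IT.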

Beyond collective action, staff must be made aware of their role in the increase of dark data. They should be regularly reminded of good practices such as avoiding sending multiple copies of emails, unsubscribing from newsletters they no longer open, or regularly deleting documents that have become useless. Individual awareness can have a snowball effect.
