The data economy is facing a paradox. The exponential increase of data is opening up an unprecedented field of possibilities in terms of knowledge and the development of new services, all the while posing a threat to privacy protection, which could impede the exploitation of this mine of information.
With the rise of federated learning, the old, centralized paradigms will give way to distributed services.
Privacy by design is a systems engineering approach that takes privacy protection into account throughout the development process. Today the concept is widespread, and it is well positioned to reconcile ethics with data by using technologies that improve confidentiality and reduce the identifiability of personal data: so-called privacy-enhancing technologies, or PETs.
In its Top Strategic Technology Trends for 2021, research and consulting firm Gartner estimates that by 2025, 60% of large organizations will use one or more privacy-enhancing technologies in analytics, business intelligence, or cloud computing.
Guaranteeing data confidentiality
Among the most widespread and robust privacy-enhancing technologies, it is worth mentioning homomorphic encryption, zero-knowledge proofs, multi-party computation, trusted execution environments, and differential privacy.
One technology in particular seems especially promising: federated learning. It enables machine learning models to acquire knowledge from a wide range of datasets held by different sources while guaranteeing the privacy of each source's data.
A less intrusive method
Federated learning consists of training a machine learning model directly on users' devices. The model's parameters are federated on a central server, but the data (texts, sounds, photos, etc.) used for training never leave the device. Each device thus benefits from the knowledge accumulated by the whole set of other devices, which each retain their own data.
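The scheme described above is often implemented with federated averaging: each client takes a few training steps on its private data, and the server averages only the resulting parameters. A minimal sketch, assuming a toy linear-regression model and three synthetic clients (real deployments add secure aggregation, client sampling, and more):

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client: a few gradient steps of linear regression on local data."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_round(global_w, clients):
    """Server: average the clients' locally updated weights,
    weighted by each client's dataset size. Only parameters travel."""
    updates = [(local_update(global_w, X, y), len(y)) for X, y in clients]
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])

# Three "devices", each holding private data the server never sees.
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    clients.append((X, X @ true_w))

w = np.zeros(2)
for _ in range(30):
    w = federated_round(w, clients)

print(np.round(w, 2))  # converges toward the true weights [2., -1.]
```

The key property is visible in `federated_round`: the server only ever receives weight vectors, never the `(X, y)` pairs that produced them.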
Federated learning guarantees better confidentiality of user data than centralized learning carried out on a service provider's servers.
“This technology also has the advantage of providing increased efficiency in terms of resource use, in particular by reducing storage and computing space”, notes Gianluca Rizzo, a Senior Research Associate at the ConEx Lab (Connected Experience Lab) of the Institute of Informatics at the HES-SO Valais-Wallis in Switzerland. “Companies such as Apple with Siri, and notably Google with Gboard, were pioneers of this method that is less intrusive for users”, the researcher adds.
Horizontal or vertical
There are different types of federated learning, each suited to specific needs and contexts. Vertical federated learning trains a predictive model on the same entities (consumers, suppliers, etc.) using features held by organizations in two different sectors, for example a bank and an e-commerce website. A federated server aligns and anonymizes the two databases; each company can then help train the model without having knowledge of the other's data.
Horizontal federated learning, by contrast, merges databases that share the same feature space but cover distinct users, for example bank savings data aggregated at the European level.
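The difference between the two schemes is easiest to see as a data-partitioning question. A toy illustration with NumPy, using a hypothetical customer table (the column names and party roles are invented for the example):

```python
import numpy as np

# A shared hypothetical table: 6 users (rows) x 4 features (columns).
features = ["age", "income", "savings", "purchases"]
data = np.arange(24).reshape(6, 4)

# Horizontal FL: parties share the SAME feature space but hold
# DIFFERENT users (e.g. two banks, each with its own clients).
bank_a_rows = data[:3]   # first three users
bank_b_rows = data[3:]   # remaining users
assert bank_a_rows.shape[1] == bank_b_rows.shape[1]  # identical columns

# Vertical FL: parties hold DIFFERENT features about the SAME users
# (e.g. a bank with financial columns, an e-commerce site with purchases).
bank_cols = data[:, :3]       # age, income, savings
ecommerce_cols = data[:, 3:]  # purchases
assert bank_cols.shape[0] == ecommerce_cols.shape[0]  # identical users
```

Horizontal splits divide rows; vertical splits divide columns, which is why vertical schemes need an alignment step to match records across parties.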
Other forms of federated learning include data-centric federated learning, in which the data owner lets third parties build their models on its data without sharing the raw data, and cross-silo federated learning, in which organizations from different sectors aggregate their data to collaboratively create an original machine learning model.
Wide fields of application
“This technology is obviously not useful in cases where data are naturally centralized, such as within a single hospital”, notes Gianluca Rizzo. “However, should several hospitals need to build a machine learning model in a particular area, federated learning makes it possible to leverage highly confidential data, healthcare data in this case, to create a shared model: ultimately, what is exchanged is not the data themselves but model information.”
The researcher from ConEx Lab in Switzerland points out that this technique can be useful in other sectors: “Competing banks, but also internal departments of a single bank, each with confidential data to protect, can thus train models in order to create new services. Telecoms or transport organizations can likewise find ways to monetize their data by participating in the development of new models.”
Given that data collection will continue its unstoppable rise across all areas of activity, the need for computing capacity and machine learning training will only grow. “To exploit all of this data, federated learning provides increased security in the processing of private data and greater reliability of the services thus generated. The old, centralized paradigms will give way to distributed services”, believes Gianluca Rizzo, who predicts a very fast and wide uptake of federated learning.