Open science is based on the idea that research results should not stay inside universities and laboratories but should be disseminated across society as a whole. To achieve this, scientific publications and data must be made freely accessible to everyone: researchers, but also individuals and businesses.
Yet today, access to scientific knowledge, even though it often stems from publicly funded research, generally requires expensive subscriptions to specialist journals controlled by a small number of publishers.
More efficient and transparent science
For its supporters, open science democratizes access to knowledge. Above all, it makes research more efficient and fosters scientific breakthroughs and innovation: it enables researchers to pool their efforts and coordinate their work within an ecosystem that favors collaboration and the accumulation of knowledge in one or more fields of study.
With its increased transparency, open science is also a lever for scientific integrity and reinforces public confidence.
France has adopted an ambitious policy in this area: after a first plan launched in 2018, the Ministry of Higher Education, Research and Innovation announced a Second National Plan for Open Science in July 2021. Through this second plan, the government confirms its commitment to opening up scientific publications and data, and extends the plan's scope to the source code produced by research.
Open access routes
The first prerequisite of open science, open access, refers to making academic articles freely available in digital format. The notion covers two levels: “gratis” access, meaning content available to internet users free of charge, and “libre” access, meaning content that is free of charge and also distributed under a free license, so that it can be reused under the terms that license specifies.
Open access has several economic models, or “routes”, for covering publishing costs. In the “green route”, authors themselves deposit their work in an open archive such as HAL, the French national archive created in 2001. Other examples include the European OpenAIRE initiative and the American archive arXiv, which covers physics, mathematics, computer science, biology, and other fields.
The “gold route” concerns journals that are open access by design. Several models can finance the editorial work. In the author-pays model, the author of an article (or the institution that employs them) pays the publisher to offset the revenue that subscriptions would normally bring in. The American Public Library of Science (PLOS), for example, has taken this route. OpenEdition has also chosen the gold route, but with a “freemium” model: access to the content is free, and a fee is charged for extra services.
France is also championing a third route, the “diamond” model, in which publishing fees are charged neither to readers nor to authors but are covered by the state, a university, a not-for-profit organization, or a similar body.
Open (scientific) data and FAIR principles
Opening up scientific data is more complex, as it can be limited by legal restrictions (industrial or trade secrets, personal data, etc.) or by security best practices.
Practices vary from one field to another. In particle physics, researchers are used to sharing data: CERN, the European Organization for Nuclear Research, makes the data it produces available to the scientific community and to the general public. In other fields, such as sociology or biology, the trend is more toward “data hoarding”, in particular because the data is costly to acquire.
As a result, it is difficult for researchers to analyze or reproduce their peers’ results, or to build on them to make new discoveries.
To address this situation, and following the model already applied to public data, the French government has decided to create a national platform, “Recherche Data Gouv”, that brings together research data from all disciplines.
The European Commission, for its part, has launched the European Open Science Cloud (EOSC), which aims to give researchers working in European institutions access to available research data and to services for processing and analyzing it.
The way in which the data is organized also matters. The FAIR (Findable, Accessible, Interoperable, Reusable) principles describe how data should be stored and presented so that it is easier to find, access, exchange (interoperability), and reuse. In practice, this means that the data, and the metadata describing it, must conform to a number of protocols and standards.
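To make the FAIR criteria more concrete, here is a minimal Python sketch that checks a dataset's metadata record against a few simplified, FAIR-inspired rules. The `DatasetRecord` fields, the `fair_gaps` function, the DOI-prefix test and the open-format list are all hypothetical simplifications chosen for illustration; real repositories rely on full metadata schemas (such as DataCite) and community standards rather than a handful of checks.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified metadata record: field names are illustrative
# and do not follow any real metadata schema exactly.
@dataclass
class DatasetRecord:
    identifier: str                                     # persistent ID such as a DOI (Findable)
    title: str
    keywords: list[str] = field(default_factory=list)   # descriptive metadata (Findable)
    access_url: str = ""                                # standard retrieval protocol (Accessible)
    file_format: str = ""                               # open, documented format (Interoperable)
    license: str = ""                                   # explicit reuse terms (Reusable)

# Illustrative list of open formats; an assumption, not an official registry.
OPEN_FORMATS = {"csv", "json", "xml", "netcdf", "hdf5"}

def fair_gaps(record: DatasetRecord) -> list[str]:
    """Return the simplified FAIR checks a record fails (empty list if none)."""
    gaps = []
    if not record.identifier.startswith("10."):     # DOI names begin with "10."
        gaps.append("F: no DOI-style persistent identifier")
    if not record.keywords:
        gaps.append("F: no descriptive keywords")
    if not record.access_url.startswith("https://"):
        gaps.append("A: no HTTPS access URL")
    if record.file_format.lower() not in OPEN_FORMATS:
        gaps.append("I: format not in the open-format list")
    if not record.license:
        gaps.append("R: no license stated")
    return gaps

complete = DatasetRecord(
    identifier="10.1234/example-dataset",   # hypothetical DOI
    title="Example survey data",
    keywords=["sociology", "survey"],
    access_url="https://example.org/data.csv",
    file_format="CSV",
    license="CC-BY-4.0",
)
print(fair_gaps(complete))                                         # []
print(len(fair_gaps(DatasetRecord("local-42", "Draft survey"))))   # 5
```

A record with a DOI-style identifier, keywords, an HTTPS URL, an open format and a license passes every check, while a record carrying only a local identifier and a title fails all five.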
NLP and text mining supporting open science
Finally, open science goes hand in hand with the development of tools based on artificial intelligence (AI) and machine learning (ML) that help researchers analyze and exploit the scientific output of a given field, a task that would be impossible by hand given the sheer volume of data available.
Natural Language Processing (NLP) and Text and Data Mining (TDM) are particularly useful for sorting through publications and scientific data and for finding relevant information (information retrieval).
TDM refers to the methods and algorithms that use linguistic technologies to analyze large, heterogeneous datasets or unstructured text and to automatically extract knowledge from them.
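As an illustration of information retrieval, the following Python sketch ranks a small collection of abstracts against a query using TF-IDF weighting, a classic scoring scheme. It is not the method of any tool mentioned in this article; the `tokenize` and `rank` functions and the sample texts are hypothetical, and production search engines add stemming, stop-word removal and more refined scores such as BM25.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters/digits; no stemming or stop words.
    return re.findall(r"[a-z0-9]+", text.lower())

def rank(query: str, docs: list[str]) -> list[tuple[int, float]]:
    """Return (doc index, score) pairs, best match first, using TF-IDF."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for i, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term in tf:
                idf = math.log(n / df[term])           # rarer terms weigh more
                score += (tf[term] / len(toks)) * idf  # normalized term frequency
        scores.append((i, score))
    return sorted(scores, key=lambda s: s[1], reverse=True)

abstracts = [
    "Open access journals and the diamond model",
    "Gene expression data in cancer research",
    "Text mining of cancer literature",
]
print([i for i, _ in rank("cancer text mining", abstracts)])  # [2, 1, 0]
```

The third abstract ranks first because it contains all three query terms, two of which appear nowhere else and therefore carry a high IDF weight.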
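One simple way to extract knowledge from text automatically is co-occurrence analysis: counting how often two terms of interest appear in the same sentence, on the assumption that frequent co-mentions hint at a relationship. The Python sketch below is a minimal, hypothetical illustration of that idea; the `TERMS` dictionary, the naive sentence splitter and the sample abstracts are assumptions, and real TDM pipelines use curated vocabularies, proper entity recognition and statistical significance tests.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical term dictionary; real pipelines use curated vocabularies.
TERMS = {"brca1", "breast cancer", "tamoxifen", "ovarian cancer"}

def find_terms(sentence: str) -> set[str]:
    """Return the dictionary terms present in a sentence (whole words only)."""
    s = sentence.lower()
    return {t for t in TERMS if re.search(r"\b" + re.escape(t) + r"\b", s)}

def cooccurrences(texts: list[str]) -> Counter:
    """Count how often each pair of terms appears in the same sentence."""
    pairs = Counter()
    for text in texts:
        # Naive split after '.', '!' or '?' followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            found = sorted(find_terms(sentence))  # sort so pairs have a stable order
            for a, b in combinations(found, 2):
                pairs[(a, b)] += 1
    return pairs

abstracts = [
    "BRCA1 mutations raise breast cancer risk. Tamoxifen is used in breast cancer.",
    "BRCA1 is also linked to ovarian cancer.",
]
pairs = cooccurrences(abstracts)
print(pairs[("brca1", "breast cancer")])  # 1
```

Counting per sentence rather than per document is a deliberate choice in this sketch: it reduces spurious pairs between terms that merely share a long abstract.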
Funded by the French government, the ISTEX platform (Information Scientifique et Technique d’Excellence) provides academics and researchers with online access to over 20 million documents from around thirty corpora of scientific literature covering all fields.
To enable more precise and relevant searches, it also provides TDM services. Several data-semantics and visualization tools developed for the project, such as the LODEX software, are now available to all.
In biomedical research, for example, the pioneering company PubGene offers tools that let users explore huge data repositories using advanced text mining and specialized NLP algorithms. Founded in 2001, the Norwegian company aims to make personalized medicine more accessible. Its Coremine Vitae product promises to help clinicians identify the best treatment options and define care protocols according to each patient’s individual medical profile.