Open and fair data: Appealing to the fairness of data producers

Illustration by Julie Beck

The Swiss National Science Foundation (SNSF) is adopting an openness policy for scientific data that will have a major impact on research, with the first implementation measures for NCCRs beginning in 2019.

 

SNSF data management plan for NCCRs

Researchers have long known that sharing information helps science move forward, which is why the sciences have been open since the first academic journals appeared in the 17th century. Three hundred years later, open science has entered a new era: the age of open data. Academic publications, and even raw data, now have to be made available in digital format and online. The bodies that fund academic research have recently called for this revolution, which is possible thanks to the digital transition and the advent of the internet. Funding bodies argue that data should be sustainable and open to the whole of society since it is largely subsidized by public funds. Delivering this new sharing approach raises technical and ethical challenges, especially when it comes to managing clinical data.

 

Tackling the reproducibility crisis

The idea of open data arose in response to a series of studies showing that 50-90% of published preclinical research was non-reproducible and that 20-80% of the data disappeared after 20 years. “Since this sterility has been mainly attributed to avoidable events, journals and supporting foundations have tried to take steps to ensure that the billions invested in research do not go up in smoke,” explains Cécile Lebrand, head of data management at FBM UNIL/CHUV Library. The causes singled out were poor documentation, protocols that are kept secret, experimental details that are not fully described, and a lack of access to raw data.

As far back as 2013, the United States, the United Kingdom and the Netherlands insisted that scientific data should be shared. The EU followed, demanding 100% open data for its Horizon 2020 program after the “Amsterdam Call for Action on Open Science” in May 2016, and the SNSF fell into step. The policy is having a direct impact on national centres of competence in research (NCCRs), which must provide a data management plan in 2019.

 


OPEN data vs FAIR data

The SNSF now expects that data generated by funded projects will be publicly accessible in non-commercial digital databases, as long as there are no legal, ethical, copyright or other issues.

The SNSF requires that the sharing of primary data follows the principles of FAIR data. FAIR is a measured approach to open data that is particularly compatible with clinical data since it allows restricted or authorized access for sensitive data, such as data that must retain the anonymity of study subjects. FAIR data covers the ways that data is constructed, stored, presented and published so that it is Findable, Accessible, Interoperable and Reusable. The word “fair” also refers to the fairness of researchers in the sharing process. Data must be retrievable using a standard, open, free and universally applicable communication protocol. Furthermore, the data must be enriched with appropriate metadata and made available under known conditions via clear, visible licenses.
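The FAIR requirements described above can be read as a checklist on each dataset's metadata. The following is a minimal sketch in Python, not an SNSF schema: the class name, the field names and the helper `missing_fair_fields` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; field names are illustrative assumptions."""
    identifier: str = ""    # Findable: persistent identifier, for example a DOI
    access_url: str = ""    # Accessible: retrievable over a standard, open protocol
    access_level: str = ""  # Accessible: "open" or "restricted" for sensitive data
    data_format: str = ""   # Interoperable: an open, documented format
    license: str = ""       # Reusable: clear, visible license
    description: str = ""   # Reusable: descriptive metadata


def missing_fair_fields(record):
    """Return the names of the fields that are still empty, in declaration order."""
    return [name for name, value in vars(record).items() if not value]


record = DatasetRecord(
    identifier="doi:10.0000/example",       # placeholder identifier
    access_url="https://example.org/ds1",   # placeholder URL
    access_level="restricted",
    data_format="BIDS",
)
print(missing_fair_fields(record))
```

A record like this one, with restricted access but no license or description yet, would be flagged as incomplete rather than rejected, which matches the idea that FAIR data can be shared under controlled conditions.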

 

The impact on Synapsy

At this early stage of the process, it will not be possible for Synapsy’s researchers to make their entire raw data accessible to the general public. A plan will be developed instead to manage the data for internal – and possibly external – sharing. The procedure required by the SNSF for basic or preclinical laboratories indicates that every laboratory can independently describe what it does in terms of raw data management and must appoint a manager.

For clinical data, there are a number of additional obstacles. The first stage consists of determining whether the data complies with the SNSF’s ethics rules. “If there are good reasons, such as it being impossible to make the genetic data anonymous, or if the patient has not consented to the sharing of data, there are restrictions on sharing the raw data. In this case, the reasons for not sharing the data must be justified and explained,” points out Cécile Lebrand. Accordingly, Synapsy must first define which clinical data is compatible with sharing and then decide who is granted access and how.

Sylvain Lengacher, Synapsy’s technology transfer officer, will oversee the entire inventory process for drafting the management plan, which will naturally be scalable to adapt to the needs of researchers.

 

Choosing the right servers, formats and platforms

Synapsy uses several animal models and clinical cohorts. Common mechanisms exist between pathologies and between the animal models. It will be important, therefore, to share the data across the different laboratories, whether they are clinical or fundamental. “Nevertheless, it is essential to start the data sharing strategy with a nodal point. We chose MRI imaging and EEG,” says Synapsy director Alexandre Dayer.

Patric Hagmann, a Synapsy researcher and assistant physician in CHUV’s Diagnostic Imaging and Interventional Radiology Department, was put in charge of leading discussions on setting up a system for managing MRI and EEG data for Synapsy. Hagmann brings with him enormous expertise in the field of neuroimaging. “The idea would be to put the clinical data on a server by adopting a common format between the five Synapsy clinical research groups, and then managing how it is shared with a dedicated digital platform.”

According to the legislation, clinical data must be stored in Switzerland. Protected areas, UNIGE’s UniDufour servers and CHUV/UNIL’s Vitality servers are available to research groups. There is no consensus on the format of the primary data and how to organize and share it. Evidence from neuroimaging and EEG data shows that it is not unusual for experimenters from the same laboratory to use different formats. Hagmann says that a simple and easy-to-adopt format known as Brain Imaging Data Structure (BIDS) would lend itself well to the situation since it is compatible with imaging, EEG and behavioral data. However, it will be necessary to define how to integrate other types of clinical data.
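To give an idea of what a common format means in practice, BIDS encodes the subject, session and task in folder and file names. The sketch below builds such a path in Python. The helper `bids_path` is hypothetical and covers only a few BIDS entities; it is not a validator and not the tooling Synapsy would necessarily use.

```python
from pathlib import PurePosixPath


def bids_path(subject, datatype, suffix, extension, session=None, task=None):
    """Build a BIDS-style relative path (simplified; only sub, ses and task entities)."""
    entities = [f"sub-{subject}"]
    folders = [f"sub-{subject}"]
    if session:
        entities.append(f"ses-{session}")
        folders.append(f"ses-{session}")
    if task:
        entities.append(f"task-{task}")
    filename = "_".join(entities + [suffix]) + extension
    return str(PurePosixPath(*folders, datatype, filename))


# An anatomical MRI scan and a resting-state EEG recording for the same subject
print(bids_path("01", "anat", "T1w", ".nii.gz", session="01"))
print(bids_path("01", "eeg", "eeg", ".edf", session="01", task="rest"))
```

Because the naming is predictable, two experimenters in the same laboratory end up with the same layout, which is the point Hagmann makes about adopting a single format.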

A platform for managing data and metadata will then have to be put into place. Unfortunately, although the technology exists, no platform that meets the requirements of FAIR data is available at present. “Exchange platforms have existed in the biomedical field for over 20 years but they can’t be used to read, trace, protect and anonymize data,” says Cécile Lebrand. The US is investing heavily but does not yet have anything concrete. Developing a digital management platform will probably be a necessity for Synapsy.

Above and beyond the technological, ethical and security challenges, science’s new era of openness will have direct consequences for researchers. In the first instance, storing data comes at a price: around CHF 400 per terabyte (TB) or CHF 40’000 for the 100TB needed at Synapsy. “Then you need to add the administrative work and time devoted to these tasks, which is sizeable. The costs of sharing are not currently paid in full by the foundations,” explains Alexandre Dayer.
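The storage figure quoted above is easy to reproduce. This is a minimal sketch that assumes a flat price per terabyte with no discount and no recurring fees; the article gives only the two numbers, so the pricing model and the function name `storage_cost_chf` are assumptions.

```python
def storage_cost_chf(terabytes, price_per_tb_chf=400):
    """Storage cost in CHF under an assumed flat per-terabyte price."""
    return terabytes * price_per_tb_chf


print(storage_cost_chf(100))  # 100 TB at CHF 400 per TB
```

The administrative work and staff time mentioned by Dayer come on top of this figure and are not modelled here.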

 


A sustainable future

The wide-scale sharing of data will undeniably have a positive impact on national research clusters and Synapsy. “Thanks to the work done today, no one will leave with the data in four years, and the consortium’s twelve years of research will be passed on to future generations of researchers,” says Alexandre Dayer.

The systematic exchange of clinical data from different cohorts in the consortium will promote cross-sectional diagnosis in psychiatry. In other words, since the various psychiatric diseases are very heterogeneous, pooling the data will make it easier to identify the global mechanisms and biomarkers. “Data management will bring the cohorts together and promote dimensional rather than categorical approaches, which are the result of expert consensus and don’t constitute a scientific approach,” says Professor Dayer. The purpose of sharing is not to satisfy the requirements of the SNSF but to stimulate research in psychiatry.

 


Author: Yann Bernardinelli, les Mots de la Science

