News about research data management

The FAIR principles for research data

In research, it is often useful to draw on existing data. However, even when the owner provides the data, access and reuse are not always possible. We explain why and give tips on how to do it better.

These tips are based on the FAIR principles, which were published by FORCE11 in 2016 and have since been disseminated and promoted by various networks, organisations and projects (e.g. GO FAIR, FAIRsFAIR). TU Wien supports the FAIR principles and will also establish a national contact point for FAIR issues within the FAIR Data Austria project. Representatives of the Center for Research Data Management and the TU Wien Bibliothek will participate in the international GO FAIR Implementation Networks Meeting held in Hamburg on 23 and 24 January.

The aim of the broad application of the FAIR guiding principles and practices is that data providers and data consumers, both machines and humans, can identify and (with appropriate citation) use data of interest to them from the flood of information generated by science. To achieve this goal, researchers and infrastructure operators should prepare and store research data in such a way that it is findable, accessible, interoperable and reusable for both humans and machines, across disciplinary and national boundaries.

The FAIR principles apply to all digital data generated in the course of scientific projects, i.e. qualitative and quantitative research data as well as metadata, algorithms, tools, code and software. Data that comply with the FAIR principles often also comply with the Open Data concept and are available to everyone without restriction. However, this is not always the case, as the FAIR principles also allow data access to be restricted where it is useful or even necessary.

The FAIR principles in practice

F for findable

The problem: A scientist assumes that there is already data somewhere in the world that she could use for her research. She may even have heard of or read about a specific data set. Still, she cannot find the data: the usual search engines do not come up with anything useful, and the link in the corresponding publication from 2013 leads her straight to a 404 error.

The solution: The data owner who wants to make his data available to others publishes the data in a reliable repository. During the upload to the repository, he assigns the individual data sets citable, persistent and globally unique identifiers (e.g. DOIs) and adds other meaningful metadata that can be read by both humans and machines.

By entering the identifiers, the project name or other information about the data into a search engine, an interested potential re-user can then easily find the corresponding data sets.
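
The idea of a machine-readable record that carries a persistent identifier can be illustrated with a minimal Python sketch. The field names, the function names and the DOI shown here are hypothetical and chosen only for illustration; a real repository defines its own metadata schema and issues the DOI itself.

```python
import json

DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Turn a DOI (bare, 'doi:'-prefixed or already a link) into a resolvable URL."""
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return DOI_RESOLVER + doi


def build_record(doi, title, creators, year, keywords):
    """Assemble a small metadata record (illustrative field names, not a standard schema)."""
    return {
        "identifier": {"type": "DOI", "value": doi, "url": doi_to_url(doi)},
        "title": title,
        "creators": list(creators),
        "publication_year": year,
        "keywords": list(keywords),
    }


record = build_record(
    doi="10.1234/example-dataset",  # hypothetical DOI, not a real data set
    title="Example measurement series",
    creators=["Doe, Jane"],
    year=2020,
    keywords=["example", "measurement"],
)

# JSON is readable by humans and parseable by machines.
record_json = json.dumps(record, indent=2, sort_keys=True)
print(record_json)
```

Because the identifier is stored together with descriptive fields such as title and keywords, both a search engine and a human reader can match a query to the record and follow the link to the data.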

A for accessible

The problem: A scientist enjoyed reading a publication on a very specific topic and is happy to find a link (DOI) to the related research data at the end of the paper. He follows the link but finds that he can neither access the data itself nor view the associated metadata. The reason for this is not apparent to him.

The solution: The data owner who wants to make her data available to others generally stores her data sets in a reliable repository so that they can be viewed and used by everyone. She also knows, however, that secure access to highly sensitive data cannot be guaranteed by a fully automated protocol. For this reason, she blocks access to the sensitive data, but indicates in the metadata how and under what circumstances interested persons can still use the data, and gives her e-mail address for easy contact. For data that cannot be made publicly available at the moment, for example due to a current embargo period, she ensures that at least the associated descriptive metadata, including information on the embargo period, is freely accessible to everyone (humans and machines).

In this way, the interested scientist can either access the data directly or ask the data owner about the possible use of sensitive data. Using the metadata, he can also estimate, even for data sets that are not currently accessible, whether they might be of interest to him and when he can expect the data to be released.
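
As a rough sketch of this access logic, the following Python example always returns the metadata and releases the data only when the access conditions allow it. The access levels ("open", "embargoed", "restricted"), the record layout and the contact address are assumptions made for this illustration, not the behaviour of any particular repository.

```python
from datetime import date


def describe_access(record: dict, today: date) -> dict:
    """Return what a requester gets: metadata always, data only if permitted."""
    response = {
        "metadata": record["metadata"],  # descriptive metadata stays open
        "data_available": False,
        "note": "",
    }
    access = record["access"]
    if access == "open":
        response["data_available"] = True
    elif access == "embargoed":
        end = record["embargo_until"]
        if today >= end:
            response["data_available"] = True
        else:
            response["note"] = f"Embargoed until {end.isoformat()}"
    elif access == "restricted":
        response["note"] = f"Access on request: {record['contact']}"
    else:
        raise ValueError(f"Unknown access level: {access!r}")
    return response


# Hypothetical records for illustration only.
embargoed_record = {
    "metadata": {"title": "Example survey data"},
    "access": "embargoed",
    "embargo_until": date(2030, 6, 30),
}
restricted_record = {
    "metadata": {"title": "Example interview transcripts"},
    "access": "restricted",
    "contact": "data.owner@example.org",
}

print(describe_access(embargoed_record, date(2030, 1, 1))["note"])
print(describe_access(restricted_record, date(2030, 1, 1))["note"])
```

In every branch the requester learns why the data is or is not available, which is exactly the information the scientist in the problem description was missing.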

I for interoperable

The problem: A scientist found an interesting data set without access restrictions during her online research. She is happy and wants to start evaluating the data immediately, but then discovers with disappointment that the data can only be read with certain commercial software. To make matters worse, the data is incomplete and the terms used in the data set are not common in the scientific community. This prevents machine readability and, even where the data is human-readable, makes interpretation by the researcher herself difficult or even impossible.

The solution: The data owner who wants to make his data available to others uses only recognized, generally accepted and, if possible, open formats, controlled vocabularies and international standards for his data and metadata. This enables (partially) automated combination, exchange and interpretation of the data. The data owner also refers in the data or metadata to related data sets, especially if these are necessary for understanding the data. He lists the corresponding persistent identifiers and describes the relationship between the data sets (e.g. 'is new version of', 'is supplement to', etc.).
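
How a controlled vocabulary keeps such links machine-interpretable can be sketched in a few lines of Python. The vocabulary below contains only the two relation types named above, and the identifiers are hypothetical; real metadata standards define a longer, fixed list of relation types.

```python
# Illustrative controlled vocabulary: only the two relations named in the text.
ALLOWED_RELATIONS = frozenset({"is new version of", "is supplement to"})


def link_datasets(source_id: str, relation: str, target_id: str) -> dict:
    """Describe how one data set relates to another, rejecting free-text relations."""
    if relation not in ALLOWED_RELATIONS:
        allowed = ", ".join(sorted(ALLOWED_RELATIONS))
        raise ValueError(f"Unknown relation {relation!r}; expected one of: {allowed}")
    return {"source": source_id, "relation": relation, "target": target_id}


# Hypothetical identifiers for illustration only.
link = link_datasets(
    "10.1234/example-dataset-v2",
    "is new version of",
    "10.1234/example-dataset-v1",
)
print(link)
```

Rejecting terms outside the agreed list is what allows software to combine and interpret records from different sources without guessing what a free-text description means.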

R for reusable

The problem: A scientist wants to compare his research results with the results of similar studies worldwide. Fortunately, he finds a large number of published data sets on his topic that are accessible and can be processed by his standard software. The result is disappointing: the data sets differ considerably from one another, and there is hardly any evidence of similarities or common trends. It seems as if the scientist is comparing apples with oranges, despite all the similarities between the studies. Furthermore, for some of the comparative data sets, the rights regarding possible reuse are not clearly defined, so that these must first be clarified through extensive inquiries with the owners (if they can be reached at all).

The solution: The data owners who want to make their data available to others take the time to describe their data in detail so that the collection, processing and analysis of the data can be understood and, if necessary, reproduced by anyone (i.e. including researchers from other disciplines). For the description, they use meaningful metadata and documentation files attached to the data sets, with detailed information on the given boundary conditions, the individual work steps, and the devices/software, parameter settings and variable names used.

In addition, the data owners specify the conditions for reuse in the metadata of each data set. They use free licences (e.g. the Creative Commons licence CC BY) as far as possible and refer to the licence's URL. Only in this way is it immediately and clearly apparent to subsequent data users whether and for what purposes the data can be used.
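
A simple completeness check along these lines might look as follows in Python. The list of required fields is an assumption derived from the description above (licence, licence URL, methods, software, variable names), not an official FAIR checklist, and the example record is hypothetical.

```python
# Fields assumed necessary for reuse, following the description above.
REQUIRED_FIELDS = ("licence", "licence_url", "methods", "software", "variables")


def missing_reuse_fields(metadata: dict) -> list:
    """List the reuse-relevant fields that are absent or empty in a metadata record."""
    return [field for field in REQUIRED_FIELDS if not metadata.get(field)]


# Hypothetical record for illustration only.
metadata = {
    "licence": "CC BY 4.0",
    "licence_url": "https://creativecommons.org/licenses/by/4.0/",
    "methods": "Sampling procedure and processing steps described in README.txt",
    "software": "",  # not yet documented
    "variables": {"temp_c": "air temperature in degrees Celsius"},
}

print(missing_reuse_fields(metadata))
```

Running such a check before publication points the data owner to gaps, here the undocumented software, that would otherwise force later users to make inquiries.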

Further reading

  • Higman R, Bangert D and Jones S, “Three camps, one destination: the intersections of research data management, FAIR and Open”, Insights, 2019, 32: 18, 1–9; DOI: https://doi.org/10.1629/uksg.468

Contact

TU Wien
Center for Research Data Management
Resselgasse 4 (TU Wien Bibliothek), 1040 Wien
research.data@tuwien.ac.at

Twitter: @RDMTUWien