The project FAIR Data Austria (FDA) aims to strengthen knowledge transfer by making research findable, accessible, interoperable and re-usable (FAIR). This is done by providing repository systems for storing data and code, and by providing training and support for managing research data. Besides our contribution, a file-based repository based on InvenioRDM will be developed and deployed.
To also support relational data held in databases, TU Wien, in cooperation with the University of Vienna, is developing a novel database repository system (DBRepo). It provides access through standard interfaces (JDBC, RESTful, AMQP), so that the hosted databases can be integrated into other systems.
DBRepo assists researchers in creating and hosting databases according to the FAIR principles, and supports reproducible queries, data versioning and searching for specific research data sets. This is realized by storing metadata, tracking changes, saving queries and issuing persistent identifiers, which allows arbitrary subsets of data to be published. The repository system does all this without creating additional overhead for researchers, while also allowing long-term data preservation.
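The idea of saving queries and issuing identifiers for the resulting subsets can be illustrated with a minimal sketch. It is not DBRepo's implementation: the names `StoredQuery`, `QueryStore`, `normalize` and the `local:` identifier scheme are hypothetical, SQLite stands in for the actual database engine, and a real deployment would mint a proper persistent identifier such as a DOI. The sketch stores a normalized query together with a checksum of its result set, so that a repeated query receives the same identifier and a later re-execution can be verified.

```python
import hashlib
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredQuery:
    pid: str          # hypothetical local identifier, not a real DOI
    sql: str          # normalized query text
    executed_at: int  # timestamp of the first execution
    query_hash: str
    result_hash: str


def sha256(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def normalize(sql: str) -> str:
    # Simplification: collapse whitespace and drop a trailing semicolon.
    # A real normalizer would have to leave string literals untouched.
    return " ".join(sql.strip().rstrip(";").split())


def result_checksum(rows) -> str:
    # Rows are hashed in the order returned, so the query should
    # contain an ORDER BY clause to make the order deterministic.
    return sha256("\n".join(repr(row) for row in rows))


class QueryStore:
    def __init__(self):
        self._by_key = {}

    def save(self, conn, sql: str, executed_at: int) -> StoredQuery:
        normalized = normalize(sql)
        rows = conn.execute(normalized).fetchall()
        key = (sha256(normalized), result_checksum(rows))
        # Same query and same result: reuse the existing identifier.
        if key in self._by_key:
            return self._by_key[key]
        pid = f"local:{key[0][:12]}-{key[1][:12]}"
        stored = StoredQuery(pid, normalized, executed_at, key[0], key[1])
        self._by_key[key] = stored
        return stored

    def verify(self, conn, stored: StoredQuery) -> bool:
        # Re-execute the saved query and compare the result checksum.
        rows = conn.execute(stored.sql).fetchall()
        return result_checksum(rows) == stored.result_hash
```

In this sketch, `verify` reports a mismatch as soon as the underlying data has changed, which is why a query store has to be combined with data versioning to make subsets reproducible.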
Workflows can vary drastically between use-cases, leading to different repository needs and, even within a single repository, to different types of use. DBRepo can be integrated into the research process as a live repository that collects data during the lifetime of a project, rather than only receiving a deposit once the project has ended. Data stewards, on the other hand, can integrate DBRepo into their existing digital curation processes, removing the burden of data management from the researcher.
DBRepo is essentially a database infrastructure hosted in a private cloud, in which different stakeholders interact so that everybody can focus on their core concerns: researchers can create databases, read data directly from sensors and machines, or upload it from data exports; IT experts take care of server maintenance, security and access control; and data stewards can assist with data curation, FAIRness and citability of data, to name but a few.
Researchers can create databases in self-contained Docker containers directly via a web interface, and can fill them with continuous data streams or static files. Meta-information about all databases is saved to allow searching for specific data sets and to ensure FAIRness. Each database can be updated and queried via the API or the web interface, so that the repository can be included directly in the research process by novice and expert database users alike.
Data Versioning & Reproducible Querying
As many databases evolve over time, a database repository needs to keep track of modifying operations on the data set: each modification results in a different version of the data set, and earlier versions must remain retrievable to ensure the reproducibility of studies done on them. By implementing the RDA recommendation on data citation, DBRepo ensures that all data is versioned and that any subset of data can be reproduced as it was at any specific point in time. By issuing persistent identifiers, such as DOIs, any such subset of data can be cited and shared.
A paper with a technical summary of the project is available online (DBRepo System: DOI 10.17605/OSF.IO/B7NX5), as is a public mirror (DBRepo source code: https://github.com/fair-data-austria/dbrepo) from which the source code can be downloaded and modified under an open source Apache 2 license.
DBRepo is looking for pilot adopters who want to use it to host and manage their data. The database repository supports reproducible queries, data versioning and searching for specific research data sets, and assists researchers from diverse disciplines and backgrounds in increasing the FAIRness of their research output.
Please get in contact with the team. We can discuss potential use-cases, assist in test deployments and implement feature requests proposed by your institution: