DBRepo: Increasing findability for data in databases with structured & unit independent search

The .dcall 2023 project “Structured & Unit Independent Search for DBRepo” has been successfully completed, advancing the digital transformation at TU Wien.

Sotirios Tsepelakis and Martin Weise standing next to each other, holding a poster with the title "Repository Infrastructure Supporting Virtual Research Environments".

© Valentin Futterer

Sotirios Tsepelakis and Martin Weise from the Center for Research Data Management.

Background

Researchers frequently need to find, use, and publish research data as part of their work. To manage this data properly in an institutional context, we developed DBRepo, a repository for data in databases that helps researchers make their research data findable, accessible, interoperable, and reusable. The system manages the researcher's data and derives machine-actionable metadata from it. It gives users transparent access to their own data and lets others explore it. As a result of the .dcall 2023 project, we implemented a module that makes data easier to find, for example by semantic concept and independently of the unit of measurement.

Structured & unit independent search

When the first datasets were deposited in DBRepo at the beginning of 2023, we found that findability was very limited because only a free-text search was available. Accuracy in particular needed improvement, since the free-text search returned too many results that were not relevant to the search term.

This situation has improved with the completion of the .dcall 2023 project, in which the search index was entirely re-modelled. It now contains an optimised replica of the metadata available in DBRepo, structured in an efficient data model. This allows for a structured, and therefore precise, search across all major components, such as databases, tables, columns, views, identifiers, users, concepts, and units of measurement. For example, you can search for databases that contain a semantic concept like wd:temperature. This is similar to webshops that let you filter clothing by size, colour, etc.
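To illustrate the idea of a structured search, here is a minimal Python sketch. It is not DBRepo's implementation or API: the `Column` and `Database` classes, the `search_databases` function, and the sample index (including the `ex:` identifiers) are hypothetical and only show how filtering on structured fields differs from matching free text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Column:
    name: str
    concept: Optional[str] = None  # semantic concept, e.g. "wd:temperature"
    unit: Optional[str] = None     # unit of measurement, e.g. "om2:degreeCelsius"


@dataclass
class Database:
    name: str
    columns: List[Column] = field(default_factory=list)


def search_databases(index, concept=None, unit=None):
    """Return the names of databases with at least one column matching every given filter."""
    hits = []
    for db in index:
        for col in db.columns:
            if concept is not None and col.concept != concept:
                continue
            if unit is not None and col.unit != unit:
                continue
            hits.append(db.name)
            break  # one matching column is enough for this database
    return hits


# Hypothetical sample metadata, for illustration only.
INDEX = [
    Database("weather_vienna", [Column("temp", "wd:temperature", "om2:degreeCelsius")]),
    Database("weather_boston", [Column("temp", "wd:temperature", "om2:degreeFahrenheit")]),
    Database("river_levels", [Column("level", "ex:length", "ex:metre")]),
]

print(search_databases(INDEX, concept="wd:temperature"))
print(search_databases(INDEX, concept="wd:temperature", unit="om2:degreeCelsius"))
```

Because each filter targets a specific field, a database that merely mentions "temperature" in a description would not match, which is the precision gain over free-text search.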

Additionally, to further increase the relevance of search results, a user can search datasets regardless of their unit of measurement as long as they share a common semantic concept and have convertible units of measurement. This allows for a unit independent search, such as getting databases that contain the semantic concept wd:temperature with the units of measurement om2:degreeCelsius and om2:degreeFahrenheit. The search module knows the proper context and only shows results that match the source unit.
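The following Python sketch shows one way a unit independent search could work in principle. It is an assumption-laden illustration, not DBRepo's code: the conversion table, the `convert` and `unit_independent_search` functions, and the sample records are all hypothetical. The sketch treats two units as convertible when both appear in the same conversion table.

```python
# Hypothetical conversion table: unit -> (to_celsius, from_celsius).
TEMPERATURE_UNITS = {
    "om2:degreeCelsius": (lambda c: c, lambda c: c),
    "om2:degreeFahrenheit": (lambda f: (f - 32) * 5 / 9, lambda c: c * 9 / 5 + 32),
}


def convert(value, source_unit, target_unit):
    """Convert a temperature value by going through degrees Celsius."""
    to_celsius, _ = TEMPERATURE_UNITS[source_unit]
    _, from_celsius = TEMPERATURE_UNITS[target_unit]
    return from_celsius(to_celsius(value))


def unit_independent_search(columns, concept, query_unit):
    """Return databases whose columns share the concept and have a unit convertible to query_unit."""
    if query_unit not in TEMPERATURE_UNITS:
        return []
    hits = []
    for col in columns:
        same_concept = col["concept"] == concept
        convertible = col["unit"] in TEMPERATURE_UNITS
        if same_concept and convertible and col["database"] not in hits:
            hits.append(col["database"])
    return hits


# Hypothetical sample metadata, for illustration only.
COLUMNS = [
    {"database": "weather_vienna", "concept": "wd:temperature", "unit": "om2:degreeCelsius"},
    {"database": "weather_boston", "concept": "wd:temperature", "unit": "om2:degreeFahrenheit"},
    {"database": "river_levels", "concept": "ex:length", "unit": "ex:metre"},
]

print(unit_independent_search(COLUMNS, "wd:temperature", "om2:degreeCelsius"))
print(convert(212, "om2:degreeFahrenheit", "om2:degreeCelsius"))
```

In this sketch, a query phrased in degrees Celsius also finds the database recorded in degrees Fahrenheit, while the unrelated length column is excluded because it has neither the same concept nor a convertible unit.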

More information

Website: https://www.ifs.tuwien.ac.at/infrastructures/dbrepo/

Test instance: https://test.dbrepo.tuwien.ac.at

.dcall 2023 final presentation: https://ec.tuwien.ac.at/~weise/pdf/dcall_final_presentation.pdf

Acknowledgements

We want to thank the .digital office for enabling the development with funding and great collaboration through the internal .dcall 2023, TU.it for the compute resources and great collaboration, as well as all open-source developers involved (Martin Weise, Sotirios Tsepelakis, Nikola Lukic, Max Spannring, Gökay Güçlü, Geoffrey Karnbach).

Contact

TU Wien
Center for Research Data Management
Favoritenstraße 14 (top floor), 1040 Vienna

research.data@tuwien.ac.at