"Metadata is really everything!"

Dr Mariette Vreugdenhil on FAIR Data Management in Remote Sensing Research.

Photo: Mariette Vreugdenhil surrounded by drought monitoring information, in front of the words "I care to make it FAIR" and a brightly coloured world map. © TU Wien / Livia Beck

We meet Dr Mariette Vreugdenhil at the Department of Geodesy and Geoinformation in the TU Wien Freihaus, right where the brand-new Sentinel-1 satellite replica hangs suspended above the staircase. Mariette is a senior scientist at the Research Unit of Remote Sensing, focusing on soil moisture and vegetation monitoring with satellites such as those from ESA, NASA, and the Copernicus programme.

Our interview centres on the evolution of Earth Observation data management and its current practices, as well as on the importance of metadata in facilitating Open Science and the FAIR principles, particularly in the context of satellite data.

High-resolution real-time data

“When I started, there wasn't really any high-resolution satellite data available, because usually the trade-off in Earth Observation is that you have either very high spatial resolution or very high temporal resolution. When I started working with Earth Observation data, the spatial resolution was 25 by 25 km, and now it's ten by ten metres. So, a lot has happened.”

The first Sentinel satellite was launched in 2014 under the Copernicus programme. It brought an exponential increase in data volume, making the transition to cloud-based data processing essential. In recent years, the field has shifted from low-resolution satellite observations to high-resolution, near-real-time data streams, facilitated by infrastructure like the Austrian-based Earth Observation Data Centre (EODC) and virtual machines.

Data visit instead of download

“I basically start my computer, I connect to Arsenal Earth Observation Data Centre, and then I work with the data there – no need to download large datasets anymore.”

Dr Vreugdenhil explains how these virtual workspaces make both raw satellite data and already processed data publicly available for reuse, so researchers can derive and verify environmental variables, for example from radar backscatter measurements. The question is how to translate this raw backscatter into environmental variables that can be used for drought monitoring, such as soil moisture or vegetation water content. Both the source material and the derived environmental variables are publicly findable and freely accessible, with providers aiming to implement the FAIR data principles (Findable, Accessible, Interoperable, Reusable).
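
To give a flavour of what this translation can look like: one common idea, scaling each backscatter observation between a long-term dry and a wet reference to obtain relative soil moisture (a change detection approach), can be sketched in a few lines of Python. All names and numbers below are illustrative assumptions, not the group's operational algorithm.

    import numpy as np

    def relative_soil_moisture(sigma0_db, dry_ref_db, wet_ref_db):
        """Scale radar backscatter (in dB) between dry and wet reference
        values to obtain relative soil moisture in [0, 1]."""
        sm = (sigma0_db - dry_ref_db) / (wet_ref_db - dry_ref_db)
        return np.clip(sm, 0.0, 1.0)  # clamp values outside the reference range

    # Illustrative backscatter time series for a single pixel (sigma0 in dB)
    sigma0 = np.array([-14.2, -12.8, -11.5, -13.0])
    print(relative_soil_moisture(sigma0, dry_ref_db=-16.0, wet_ref_db=-10.0))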

Metadata is everything

“These datasets are all freely available, but they come from different sources and in various formats. This makes it quite challenging for users, as you need to harmonise the data across different processing lines, and metadata becomes crucial to keep track of data versions and origin. So, metadata is really everything!”

Depending on the provider, the descriptions of the different datasets – ranging from forest data and meteorological conditions to digital elevation models – vary in detail and completeness. Harmonising these diverse datasets from multiple providers (ESA, Copernicus, Geosphere Austria, Austrian Research Centre for Forests) and ensuring their interoperability remains challenging, and it underscores the importance of internal data pools and standardised processing scripts for analysis-ready datasets. From here, our interview turns to the increasing role of AI in data harmonisation, coding, and analysis, with emphasis on the need for both physically based and AI-driven models.
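
As a toy illustration of the provenance tracking Dr Vreugdenhil describes, the following Python sketch writes a JSON metadata "sidecar" next to a derived data product, recording origin, version, and processing details. The field names and values are invented for illustration and do not follow any particular provider's schema.

    import json
    from datetime import datetime, timezone

    # Illustrative provenance record for a derived, analysis-ready dataset
    metadata = {
        "dataset": "relative_soil_moisture",
        "version": "1.2.0",  # data product version
        "source": {
            "mission": "Sentinel-1",
            "provider": "Copernicus",
        },
        "processing": {
            "script": "sm_retrieval.py",  # hypothetical processing script
            "software_version": "0.4.1",
            "processed_at": datetime.now(timezone.utc).isoformat(),
        },
        "license": "CC-BY-4.0",
    }

    # Store the record as a sidecar file alongside the data product
    with open("relative_soil_moisture_v1.2.0.json", "w") as f:
        json.dump(metadata, f, indent=2)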

Publication of data and code

“What's also important in our field is, of course, code sharing: putting your code on Git so that it's publicly available. We have PhD students cleaning up their code with AI. And then there is the other side, the algorithm development that we do, which can be either physically based models or machine learning. If we go from the backscatter data to the soil moisture, for example, you need to develop the model and you need to calibrate it on some reference data – this reference data is very often in-situ data that we collected in the field.”
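
In its simplest form, the calibration step she mentions can be pictured as fitting a retrieval model to matched-up in-situ measurements. The Python sketch below fits a linear model by least squares; the match-up values are invented purely for illustration.

    import numpy as np

    # Hypothetical match-ups: satellite backscatter (in dB) and in-situ
    # soil moisture (m³/m³) observed at the same location and time
    sigma0_db = np.array([-15.1, -13.4, -12.0, -10.8, -9.9])
    sm_insitu = np.array([0.08, 0.14, 0.21, 0.29, 0.34])

    # Calibrate a linear retrieval model sm = a * sigma0 + b by least squares
    a, b = np.polyfit(sigma0_db, sm_insitu, deg=1)

    # Apply the calibrated model to new backscatter observations
    new_obs = np.array([-11.5, -14.0])
    print(f"slope={a:.4f}, intercept={b:.3f}, retrieved={a * new_obs + b}")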

However, the conversation also reveals tensions between Open Data ideals and publication pressures, especially around in-situ data, where researchers are reluctant to share valuable field data before publication. In this context, Dr Vreugdenhil highlights research data repositories such as the TU Wien Research Data Repository as essential for sharing non-operational datasets, ensuring proper citation and credit, and supporting Open Science. Our discussion considers embargoes, appropriate licensing, and metadata publication as potential solutions. Metadata and data versioning emerge as critical for reproducibility and long-term usability, particularly when satellite data archives are reprocessed or software versions change. Dr Vreugdenhil remains optimistic about the longevity and accessibility of curated datasets, emphasising the value of long-term records for climate studies despite the challenges posed by finite satellite lifespans and mission-specific data.

Contact

Mariette Vreugdenhil
Department of Geodesy and Geoinformation
TU Wien
mariette.vreugdenhil@tuwien.ac.at

Center for Research Data Management
TU Wien
research.data@tuwien.ac.at