Metadata

In archives, metadata is used to describe resources and their characteristics in a consistent and structured way. Metadata exists independently of the resources it describes and can be stored either together with them or separately.

Metadata plays a key role in the long-term usability of your stored research data: it contains important information that enables you and others to search for and find your data.

A metadata set consists mainly of descriptive categories, called elements, and their values. To facilitate exchange with other systems, the elements follow predefined standards, and the values are drawn from controlled vocabularies and persistent identifiers (PIDs).

When research data is stored in repositories, corresponding metadata is entered manually or gathered automatically via an interface (metadata harvesting).
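The text does not name a specific harvesting interface; a widely used protocol for this is OAI-PMH, so the following is a minimal sketch under that assumption. It builds the request URL a harvester would send to a repository's OAI-PMH endpoint. The base URL is hypothetical; `oai_dc` is the Dublin Core metadata format that OAI-PMH repositories are required to support.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical OAI-PMH endpoint of a repository; replace with a real base URL.
BASE_URL = "https://repository.example.org/oai"


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL.

    The first request names the metadata format (metadataPrefix);
    follow-up pages are requested with the resumptionToken alone,
    because OAI-PMH treats that argument as exclusive.
    """
    if resumption_token is not None:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


print(list_records_url(BASE_URL))
# https://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
```

Sending this URL (for example with `urllib.request.urlopen`) would return an XML response. For large result sets the response contains a `resumptionToken` for the next page, which is why the function switches to token-only requests.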

A basic distinction is made between bibliographic/administrative metadata and descriptive/technical metadata.

Bibliographic and administrative metadata provides information on the origin of a dataset as a whole. These types of metadata are usually structured in a similar way and include information such as:

  • Title: name of the dataset or research project in which the data was produced
  • Author/primary researcher: name, institution, person identifier (e.g. ORCID iD)
  • Contributors: persons/institutions not primarily involved in data creation (e.g. data curators, data managers), as well as their tasks and identifiers
  • Identifier: unique number that identifies the data (e.g. internal project number)
  • Type of data: data type, file format and file size
  • Rights: usage rights and licences
  • Dates: time periods associated with the data (e.g. project start, project conclusion, observation period, release date)
  • Language: language or languages of the research data content
  • Place: references to a physical location or territorial coverage (e.g. coordinates)
  • Content summary: keywords or phrases describing the topic or data content
  • Funding: funding body, grant number
  • Relationships: information about relationships with other resources
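As a rough illustration, the bibliographic elements listed above can be modelled as a single record type. The class name, field names and example values below are hypothetical and do not follow any particular standard, and which fields a repository actually requires is an assumption; the check for missing fields shows how a consistent structure lets a repository validate records.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class BibliographicMetadata:
    # Hypothetical field names mirroring the elements listed above.
    title: str
    author: str
    author_id: Optional[str] = None          # e.g. an ORCID iD
    contributors: List[str] = field(default_factory=list)
    identifier: Optional[str] = None
    data_type: Optional[str] = None          # data type, file format, file size
    rights: Optional[str] = None
    dates: Dict[str, str] = field(default_factory=dict)
    languages: List[str] = field(default_factory=list)
    place: Optional[str] = None
    keywords: List[str] = field(default_factory=list)
    funding: Optional[str] = None            # funding body, grant number
    relationships: List[str] = field(default_factory=list)

    def missing_fields(self, required: List[str]) -> List[str]:
        """Return the names of required fields that are unset or empty."""
        return [name for name in required if not getattr(self, name)]


record = BibliographicMetadata(
    title="Soil samples 2021",   # illustrative values only
    author="Max Mustermann",
    rights="CC BY 4.0",
    languages=["en"],
)
# The list of required fields is an assumption for this sketch.
print(record.missing_fields(["title", "identifier", "rights", "keywords"]))
# ['identifier', 'keywords']
```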


Descriptive and technical metadata provides additional information on individual aspects of the data. Its structure varies considerably depending on the discipline and the type of data.

Metadata is often saved as XML (Extensible Markup Language) or in another structured format such as JSON. XML files are both machine-readable and human-readable and can be converted to other formats (e.g. to JSON with the help of an XML-to-JSON converter).
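A minimal sketch of such a conversion, using only Python's standard library. It handles only flat records whose child elements contain plain text; attributes, namespaces, nesting and repeated elements (a later child with the same tag overwrites an earlier one here) would need a dedicated converter. The sample record is illustrative.

```python
import json
import xml.etree.ElementTree as ET

xml_source = """<book>
  <title>Example Title</title>
  <creator>Max Mustermann</creator>
</book>"""


def xml_to_dict(element: ET.Element) -> dict:
    """Convert a flat XML element into a dict keyed by the root tag.

    Each child element becomes one key-value pair; surrounding
    whitespace in the text content is stripped.
    """
    return {element.tag: {child.tag: (child.text or "").strip() for child in element}}


root = ET.fromstring(xml_source)
print(json.dumps(xml_to_dict(root)))
# {"book": {"title": "Example Title", "creator": "Max Mustermann"}}
```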

In XML, element-value pairs are structured so that the value sits between the start tag of the element (in angle brackets) and its end tag (angle brackets with a slash). A simple example:

                <creator>Max Mustermann</creator>

The element-value pairs for a resource are always enclosed in a root element that indicates the type of resource (e.g. memo, book).
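The following sketch builds the element-value pair `<creator>Max Mustermann</creator>` inside a root element with Python's `xml.etree.ElementTree`; using `book` as the resource type is only illustrative.

```python
import xml.etree.ElementTree as ET

# The root element names the type of resource (here: a book).
root = ET.Element("book")

# Each child element holds one element-value pair.
creator = ET.SubElement(root, "creator")
creator.text = "Max Mustermann"

print(ET.tostring(root, encoding="unicode"))
# <book><creator>Max Mustermann</creator></book>
```

Parsing the output again with `ET.fromstring` and calling `.find("creator").text` returns the value `Max Mustermann`.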

To make metadata more effective, repositories and the scientific community use defined metadata standards. Standardization allows metadata from different sources to be linked and processed together. Often, metadata in one standard can also be converted to another by mapping.
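Mapping between standards (often called a crosswalk) can be sketched as a lookup table from source fields to target elements. In this sketch the source field names are invented for illustration, and the targets are Dublin Core element names. Fields without an equivalent are reported rather than dropped silently, because such mappings are frequently lossy.

```python
from typing import Dict, List, Tuple

# Hypothetical crosswalk: internal project fields -> Dublin Core elements.
CROSSWALK: Dict[str, str] = {
    "project_title": "title",
    "principal_investigator": "creator",
    "licence": "rights",
    "content_language": "language",
    "keywords": "subject",
}


def map_record(record: Dict[str, str],
               crosswalk: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Map a record to the target standard; return (mapped, unmapped fields)."""
    mapped: Dict[str, str] = {}
    unmapped: List[str] = []
    for field_name, value in record.items():
        target = crosswalk.get(field_name)
        if target is None:
            unmapped.append(field_name)  # no equivalent element in the target
        else:
            mapped[target] = value
    return mapped, unmapped


record = {
    "project_title": "Soil samples 2021",
    "principal_investigator": "Max Mustermann",
    "lab_notebook_page": "17",
}
mapped, unmapped = map_record(record, CROSSWALK)
print(mapped)    # {'title': 'Soil samples 2021', 'creator': 'Max Mustermann'}
print(unmapped)  # ['lab_notebook_page']
```

If two source fields map to the same target element, the later one overwrites the earlier one here; a real crosswalk needs explicit rules for such cases.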

The Research Data Alliance (RDA) website has a list of disciplinary metadata standards for scientific data.

A simple and widespread metadata standard is Dublin Core (ISO 15836:2009). It consists of 15 elements (e.g. <dc:creator>Max Mustermann</dc:creator>) as well as various qualifiers that refine them.
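A minimal sketch of a Dublin Core record serialized as XML. The namespace URI and the 15 element names are those of the Dublin Core Metadata Element Set; the wrapper element `record` is a hypothetical choice, since Dublin Core itself does not prescribe one.

```python
import xml.etree.ElementTree as ET
from typing import Dict

DC_NS = "http://purl.org/dc/elements/1.1/"

# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Serialize the namespace with the conventional "dc" prefix.
ET.register_namespace("dc", DC_NS)


def dublin_core_record(values: Dict[str, str]) -> str:
    """Return an XML string with one dc:* element per entry in values."""
    root = ET.Element("record")  # hypothetical wrapper element
    for name, value in values.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


print(dublin_core_record({"creator": "Max Mustermann", "language": "de"}))
```

Rejecting unknown element names keeps a record within the 15 core elements; extensions beyond them would need an extended element list.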

To optimize your metadata for searches and to facilitate machine processing, you should use fixed terms for the individual values. Similar resources can only be linked properly with one another if they are named consistently; this ensures interoperability and facilitates exchange. The use of standardized terms and unique identifiers also helps to avoid ambiguities and redundancies.
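One way to apply fixed terms is to normalize free-text input against a lookup table before it is stored. The table below is a small hypothetical example that maps spelling variants of language names to ISO 639-1 two-letter codes. Values without a match return `None` so they can be reviewed manually instead of being guessed.

```python
from typing import Optional

# Hypothetical lookup table: free-text variants -> ISO 639-1 codes.
LANGUAGE_TERMS = {
    "english": "en",
    "englisch": "en",
    "en": "en",
    "german": "de",
    "deutsch": "de",
    "de": "de",
}


def normalize_language(value: str) -> Optional[str]:
    """Return the controlled term for value, or None if it is unknown."""
    return LANGUAGE_TERMS.get(value.strip().lower())


print([normalize_language(v) for v in ["English", " Deutsch", "EN"]])
# ['en', 'de', 'en']
```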

Controlled vocabularies (thesauri and classifications), integrated authority files and international (ISO) standards offer a wide range of predefined terms, unambiguous descriptors and standardized formats. These include, for example, person identifiers, standards for representing dates and times, and lists of geographical locations and their descriptions. In addition to these global standards, there are often standards specific to a particular discipline or institution.

Examples

  • ORCID for academic researchers
  • FundRef for funding bodies
  • DOI for online publications
  • ISBN for books
  • ISO 8601 for representing dates and times
  • ISO 639 for languages
  • ISO 3166-1 alpha-2 for two-letter country codes
  • GeoNames for geographic names and topographical objects
  • AGROVOC for agriculture and nutrition terminology
  • ICD for diseases
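Standardized identifiers and formats can also be checked automatically. The sketch below assumes the published ORCID iD format (16 characters in four hyphenated blocks, the last character being a check character computed with ISO 7064 MOD 11-2) and accepts ISO 8601 calendar dates only in the extended form YYYY-MM-DD. `0000-0002-1825-0097` is a sample iD commonly used in ORCID documentation.

```python
import re
from datetime import date


def orcid_is_valid(orcid: str) -> bool:
    """Check the format and the ISO 7064 MOD 11-2 check character of an ORCID iD."""
    if not re.fullmatch(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]", orcid):
        return False
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:-1]:  # the first 15 digits feed the checksum
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[-1] == expected


def is_iso8601_date(value: str) -> bool:
    """Accept only real calendar dates in the extended form YYYY-MM-DD."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        date.fromisoformat(value)  # rejects impossible dates such as 2021-02-29
    except ValueError:
        return False
    return True


print(orcid_is_valid("0000-0002-1825-0097"))  # True
print(orcid_is_valid("0000-0002-1825-0098"))  # False
print(is_iso8601_date("2021-02-29"))          # False (2021 is not a leap year)
```

A passing check only confirms the format; whether an ORCID iD is actually registered has to be verified with ORCID itself.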


Further thesauri and classifications can be found in the Basel Register of Thesauri, Ontologies & Classifications.