Resources can only be interpreted properly if they are named consistently; consistent naming ensures interoperability and facilitates exchange. The use of standardized terms in the structure of your data and clear identifiers also helps to avoid ambiguities and redundancies. In controlled vocabularies, integrated authority files and international standards (ISO) you will find a wide range of predefined terms, unambiguous descriptors and standardized formats. In addition to these general-purpose standards, there are often standards which are specific to a particular discipline or institution.

A simple example of using standards in your research process is the consistent use of standard units, for instance:

  • temperature in degrees Celsius vs degrees Fahrenheit
  • wind speed measured in m/s vs knots
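
The unit example above can be sketched in a few lines of Python. This is only an illustration, assuming a project has agreed on degrees Celsius and m/s as its standard units; the unit labels ("degC", "degF", "m/s", "kn"), the field names and the sample readings are hypothetical and not taken from any particular standard.

```python
# Minimal sketch: normalise mixed-unit readings to one agreed set of units.
# Assumption: the project standard is degrees Celsius and metres per second.

# One knot is one nautical mile (1852 m) per hour (3600 s).
KNOT_IN_M_PER_S = 1852 / 3600


def to_celsius(value, unit):
    """Return a temperature in degrees Celsius."""
    if unit == "degC":
        return float(value)
    if unit == "degF":
        return (value - 32) * 5 / 9
    raise ValueError(f"unsupported temperature unit: {unit!r}")


def to_metres_per_second(value, unit):
    """Return a speed in metres per second."""
    if unit == "m/s":
        return float(value)
    if unit == "kn":
        return value * KNOT_IN_M_PER_S
    raise ValueError(f"unsupported speed unit: {unit!r}")


def normalise(reading):
    """Convert one reading (hypothetical field names) to the standard units."""
    return {
        "temperature_degC": to_celsius(
            reading["temperature"], reading["temperature_unit"]
        ),
        "wind_speed_m_per_s": to_metres_per_second(
            reading["wind"], reading["wind_unit"]
        ),
    }


# Hypothetical sample data recorded by two different instruments.
readings = [
    {"temperature": 68.0, "temperature_unit": "degF", "wind": 10.0, "wind_unit": "kn"},
    {"temperature": 20.0, "temperature_unit": "degC", "wind": 5.0, "wind_unit": "m/s"},
]

normalised = [normalise(r) for r in readings]
```

Rejecting unknown units with an error, rather than passing values through unchanged, keeps silently mixed units out of the data set.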

Semantic artefacts

Semantic artefacts are conceptual models that describe the meaning of entities and relations in the data accurately and in a machine-actionable way. A good semantic model should represent a common view for a particular domain, so it is good practice to search for existing models before creating your own. Defining semantic artefacts (semantic modelling) is an essential step in the FAIRification process as defined by GO FAIR.

Two of the most relevant types of semantic artefacts in this context are controlled vocabularies and ontologies.

Controlled vocabularies

Controlled vocabularies are community-defined sets of terms that can be used to provide a consistent way to describe data or metadata. For example, they can be used to describe relations of people (e.g., RELATIONSHIP), geographical locations (e.g., ISO 3166), languages (e.g., ISO 639), currencies (e.g., ISO 4217), etc.
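
As a rough illustration of how a controlled vocabulary is applied in practice, the following Python sketch checks metadata values against small code lists. The three sets contain only a handful of codes picked for illustration from ISO 3166-1 (alpha-2), ISO 639-1 and ISO 4217; a real project would load the complete, current lists from an authoritative source. The field names and the function name are hypothetical.

```python
# Minimal sketch: validate metadata fields against controlled vocabularies.
# Assumption: each set below is a tiny illustrative subset, not the full standard.

COUNTRY_CODES = frozenset({"AT", "DE", "FR", "US"})  # subset of ISO 3166-1 alpha-2
LANGUAGE_CODES = frozenset({"de", "en", "fr"})       # subset of ISO 639-1
CURRENCY_CODES = frozenset({"EUR", "USD"})           # subset of ISO 4217

# Hypothetical mapping from metadata field name to its vocabulary.
VOCABULARIES = {
    "country": COUNTRY_CODES,
    "language": LANGUAGE_CODES,
    "currency": CURRENCY_CODES,
}


def find_uncontrolled_terms(record):
    """Return {field: value} for every value that is not in its vocabulary."""
    return {
        field: record[field]
        for field, vocabulary in VOCABULARIES.items()
        if field in record and record[field] not in vocabulary
    }


# A free-text country name is flagged; the coded values pass.
problems = find_uncontrolled_terms(
    {"country": "Germany", "language": "de", "currency": "EUR"}
)
```

Here `problems` contains only the `country` entry, because "Germany" is free text rather than the code "DE".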

DDI-Controlled Vocabularies is a set of well-established controlled vocabularies commonly used in social science data, health sciences data, data covering human activity, and other data based on observational methods to address ambiguities in (meta)data. DDI covers different stages of the research data lifecycle, including conceptualization, collection, processing, distribution, discovery, and archiving.

Ontologies 

An ontology is a "formal, explicit specification of a shared conceptualization" (R. Studer, R. Benjamins, and D. Fensel. Knowledge engineering: Principles and methods. Data & Knowledge Engineering, 25 [1–2]:161–198, 1998). This definition focuses on three major aspects:

  • formal, i.e. it should be machine-readable and machine-interpretable
  • explicit specification, i.e. meaningful relations (e.g. subclass/subsumption, disjoint, inverse) among concepts (terms) and their constraints are explicitly defined
  • shared conceptualization, i.e. if ontologies already exist for a specific domain, researchers are strongly encouraged to use them or to base their own ontology on them
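
The relation types named in the list above (subclass, disjoint, inverse) can be made concrete with a small Python sketch. It is a toy model with hypothetical class and property names, written with plain dictionaries; it is not a substitute for an ontology language such as OWL or for a reasoner, and only shows what "explicitly defined" relations allow a machine to derive.

```python
# Toy sketch of explicit ontology relations. All names are hypothetical.

# subclass/subsumption: child class -> direct parent class
SUBCLASS_OF = {
    "Thermometer": "Sensor",
    "Anemometer": "Sensor",
    "Sensor": "Device",
}

# disjointness: no individual may belong to both classes of a pair
DISJOINT = {frozenset({"Thermometer", "Anemometer"})}

# inverse properties: property -> its inverse
INVERSE_OF = {"measures": "isMeasuredBy"}


def superclasses(cls):
    """Return all ancestors of cls, nearest first."""
    ancestors = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        ancestors.append(cls)
    return ancestors


def is_subclass_of(cls, ancestor):
    """True if cls equals ancestor or is a (transitive) subclass of it."""
    return cls == ancestor or ancestor in superclasses(cls)


def violates_disjointness(asserted_classes):
    """True if two asserted classes fall under a pair declared disjoint."""
    classes = list(asserted_classes)
    for i, a in enumerate(classes):
        for b in classes[i + 1:]:
            for pair in DISJOINT:
                x, y = tuple(pair)
                if (is_subclass_of(a, x) and is_subclass_of(b, y)) or (
                    is_subclass_of(a, y) and is_subclass_of(b, x)
                ):
                    return True
    return False


def add_inverse_statements(statements):
    """Return the statements plus those implied by inverse properties."""
    inverse = dict(INVERSE_OF)
    inverse.update({v: k for k, v in INVERSE_OF.items()})
    result = set(statements)
    for subject, prop, obj in statements:
        if prop in inverse:
            result.add((obj, inverse[prop], subject))
    return result
```

For example, `superclasses("Thermometer")` walks the subclass chain, and `add_inverse_statements` derives an `isMeasuredBy` statement from every `measures` statement without it having been entered by hand.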

Researchers need to develop an ontology to (i) share a common understanding of the structure of their (meta)data in a domain among people and machines, (ii) reuse existing domain knowledge and (iii) make domain assumptions explicit. It is important to note that there is no single correct ontology design for any domain, as the intended applications of the ontology and the understanding of the domain will inevitably affect its design.

There are several methodologies that can help you to adapt and build your own ontology. We suggest using the Ontology101 method (Noy, Natalya F., and Deborah L. McGuinness. Ontology development 101: A guide to creating your first ontology. 2001) to help you develop an ontology while taking the existing ontologies in your domain into account.

Ontologies relevant to research data management

There are tools available to help researchers find controlled vocabularies and ontologies that may be suitable in their field: