Resources can only be interpreted properly if they are named consistently, which ensures interoperability and facilitates exchange. Using standardized terms in the structure of your data, together with clear identifiers, also helps to avoid ambiguities and redundancies. Controlled vocabularies, integrated authority files and international standards (ISO) provide a wide range of predefined terms, unambiguous descriptors and standardized formats. In addition to these general standards, there are often standards specific to a particular discipline or institution.
A simple example of using standards in your research process is to choose one unit per quantity and use it consistently, e.g.:
- temperature: degrees Celsius vs degrees Fahrenheit
- wind speed: m/s vs knots
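When measurements arrive in mixed units, it can help to convert them to the chosen unit before they are stored. The following is a minimal sketch that assumes degrees Celsius and m/s as the target units. The record layout and the unit labels ("degF", "kn", ...) are hypothetical and chosen only for illustration.

```python
# Hypothetical sketch: normalise temperature to degrees Celsius and wind
# speed to m/s. Record fields and unit labels are illustrative assumptions.

KNOT_IN_M_PER_S = 1852 / 3600  # 1 knot = 1 nautical mile (1852 m) per hour


def to_celsius(value: float, unit: str) -> float:
    """Convert a temperature reading to degrees Celsius."""
    if unit == "degC":
        return value
    if unit == "degF":
        return (value - 32) * 5 / 9
    raise ValueError(f"unknown temperature unit: {unit}")


def to_m_per_s(value: float, unit: str) -> float:
    """Convert a wind-speed reading to metres per second."""
    if unit == "m/s":
        return value
    if unit == "kn":
        return value * KNOT_IN_M_PER_S
    raise ValueError(f"unknown wind-speed unit: {unit}")


readings = [
    {"temp": 68.0, "temp_unit": "degF", "wind": 10.0, "wind_unit": "kn"},
    {"temp": 21.5, "temp_unit": "degC", "wind": 4.2, "wind_unit": "m/s"},
]

# Every stored record now uses the same two units.
normalised = [
    {
        "temp_degC": round(to_celsius(r["temp"], r["temp_unit"]), 2),
        "wind_m_per_s": round(to_m_per_s(r["wind"], r["wind_unit"]), 2),
    }
    for r in readings
]
print(normalised)
# [{'temp_degC': 20.0, 'wind_m_per_s': 5.14}, {'temp_degC': 21.5, 'wind_m_per_s': 4.2}]
```

Raising an error for unknown units is deliberate. It is safer to stop than to store a value whose unit is unclear.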
Semantic artefacts are conceptual models that describe the meaning of entities and relations in the data accurately and in a machine-actionable way. A good semantic model should represent a common view of a particular domain, so it is good practice to search for existing models before creating a new one. Defining semantic artefacts (semantic modelling) is an essential step in the FAIRification process as defined by GO FAIR.
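"Machine-actionable" means that each statement refers to terms with a shared, resolvable meaning rather than to ad-hoc column names. The following minimal sketch describes a dataset using two terms from the DCMI Metadata Terms vocabulary and writes the statements as RDF N-Triples. The dataset IRI under example.org is hypothetical, and the hand-written serializer covers only the escaping needed here. It is not a full RDF library.

```python
from typing import Optional

# Hypothetical sketch: describe a dataset with shared vocabulary terms
# (DCMI Metadata Terms) and serialise the statements as RDF N-Triples.
DCTERMS = "http://purl.org/dc/terms/"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


def iri(value: str) -> str:
    """N-Triples writes IRIs in angle brackets."""
    return f"<{value}>"


def literal(text: str, lang: Optional[str] = None,
            datatype: Optional[str] = None) -> str:
    """Quote a literal, escaping the characters N-Triples reserves."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    if lang:
        return f'"{escaped}"@{lang}'
    if datatype:
        return f'"{escaped}"^^{iri(datatype)}'
    return f'"{escaped}"'


dataset = iri("https://example.org/dataset/weather-2012")  # hypothetical IRI
triples = [
    (dataset, iri(DCTERMS + "title"),
     literal("Weather observations 2012", lang="en")),
    (dataset, iri(DCTERMS + "created"),
     literal("2012-09-27", datatype=XSD_DATE)),
]
lines = [f"{s} {p} {o} ." for s, p, o in triples]
print("\n".join(lines))
```

Any RDF-aware tool that knows DCMI Metadata Terms can read these lines as "title" and "creation date" without extra documentation.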
Two of the most relevant types of semantic artefacts in this context are controlled vocabularies and ontologies.
Controlled vocabularies are community-defined sets of terms that provide a consistent way to describe data or metadata. For example, they can be used to describe relationships between people (e.g. RELATIONSHIP), geographical locations (e.g. ISO 3166), languages (e.g. ISO 639), currencies (e.g. ISO 4217), etc.
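A controlled vocabulary pays off when values are checked against it before data is published. The following is a minimal sketch of such a check. The code sets are a tiny hand-picked subset of ISO 3166-1 alpha-2, ISO 639-1 and ISO 4217, used only for illustration; a real workflow would load the complete, current code lists. The field names are hypothetical.

```python
# Hypothetical sketch: validate metadata fields against controlled vocabularies.
# The sets below are small illustrative subsets, not the full ISO code lists.
VOCABULARIES = {
    "country": {"AT", "CH", "DE", "FR", "US"},        # ISO 3166-1 alpha-2 (subset)
    "language": {"de", "en", "es", "fr", "it"},       # ISO 639-1 (subset)
    "currency": {"CHF", "EUR", "GBP", "JPY", "USD"},  # ISO 4217 (subset)
}


def validate(record: dict) -> list:
    """Return a list of problems; an empty list means all values are allowed."""
    problems = []
    for field, allowed in VOCABULARIES.items():
        value = record.get(field)
        if value is None:
            problems.append(f"{field}: missing")
        elif value not in allowed:  # exact, case-sensitive match
            problems.append(f"{field}: {value!r} is not a controlled term")
    return problems


print(validate({"country": "AT", "language": "en", "currency": "EUR"}))  # []
print(validate({"country": "Austria", "language": "EN", "currency": "EUR"}))
```

The match is intentionally case-sensitive, so "EN" is rejected in favour of the conventional lowercase "en". Free-text values such as "Austria" are reported instead of silently accepted.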
DDI Controlled Vocabularies is a set of well-established controlled vocabularies commonly used for social science data, health sciences data, data covering human activity, and other data based on observational methods, to address ambiguities in (meta)data. DDI covers different stages of the research data lifecycle, including conceptualization, collection, processing, distribution, discovery, and archiving.
An ontology is a "formal, explicit specification of a shared conceptualization" (R. Studer, R. Benjamins, and D. Fensel. Knowledge engineering: Principles and methods. Data & Knowledge Engineering, 25(1–2):161–198, 1998). This definition focuses on three major aspects:
- formal, i.e. the ontology should be machine-readable and machine-interpretable
- explicit specification, i.e. meaningful relations among concepts (terms), such as subclass/subsumption, disjointness or inverse relations, and their constraints are explicitly defined
- shared conceptualization, i.e. if ontologies already exist for a specific domain, researchers are strongly encouraged to use them or to base their own ontology on them
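Two of the relations named in the list above, subclass (subsumption) and disjointness, already allow a machine to draw conclusions. The following minimal sketch shows those inferences in plain Python. The class names are hypothetical. In practice such axioms would be written in an ontology language such as OWL and processed by a reasoner; inverse relations are not covered here.

```python
# Hypothetical sketch: what a machine can infer from explicitly specified
# subclass and disjointness relations. Class names are illustrative only.
SUBCLASS_OF = {  # class -> its direct superclasses
    "Thermometer": {"Sensor"},
    "Anemometer": {"Sensor"},
    "Sensor": {"Instrument"},
    "Person": {"Agent"},
}
DISJOINT = {frozenset({"Instrument", "Agent"})}  # nothing can be both


def superclasses(cls: str) -> set:
    """All classes that subsume `cls`, following subclass links transitively."""
    found, stack = set(), [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), set()):
            if parent not in found:  # also guards against cycles
                found.add(parent)
                stack.append(parent)
    return found


def consistent(types: set) -> bool:
    """False if an individual with these types would fall into disjoint classes."""
    closure = set(types)
    for t in types:
        closure |= superclasses(t)
    return not any(pair <= closure for pair in DISJOINT)


print(sorted(superclasses("Thermometer")))    # ['Instrument', 'Sensor']
print(consistent({"Thermometer"}))            # True
print(consistent({"Thermometer", "Person"}))  # False
```

The disjointness axiom was never stated for Thermometer or Person. The contradiction is detected only because the subclass links carry it upward, which is the benefit of making relations explicit.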
Researchers develop an ontology to (i) share a common understanding of the structure of their (meta)data in a domain among people and machines, (ii) reuse existing domain knowledge, and (iii) make domain assumptions explicit. Note that there is no single correct ontology design for any domain: the intended applications of the ontology and the designer's view of the domain will inevitably shape the design.
Several methodologies can help you to adapt existing ontologies or build your own. We suggest the Ontology Development 101 method (Noy, Natalya F., and Deborah L. McGuinness. Ontology development 101: A guide to creating your first ontology. 2001), which guides you through developing an ontology while taking existing ontologies in your domain into account.
Ontologies relevant to research data management
There are tools available to help researchers find controlled vocabularies and ontologies that may be suitable for their field:
- FAIRsharing | Standards: Explore which standards already exist and whether they can be used or extended, or whether a new one should be added
- Index of metadata standards – Metadata Standards Catalog (bath.ac.uk): A catalogue of metadata standards that may be used to document research data
- Linked Open Vocabularies (linkeddata.es): A searchable repository of vocabularies and ontologies used to describe many different disciplines and domains
Tools supporting standardization
- OpenRefine (formerly Google Refine) is a helpful open source tool for organising, cleaning, transforming and enriching data
- An inventory of tools for converting your data to RDF for data FAIRification
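OpenRefine's clustering groups differently spelled variants of the same term so they can be merged into one standardized form. The following is a simplified re-implementation of the idea behind its "fingerprint" key-collision method. It is not OpenRefine's own code, and the example values are hypothetical. Values whose keys collide are proposed as one cluster.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Key that ignores case, punctuation, accents, token order and repeats."""
    text = value.strip().lower()
    text = re.sub(r"[^\w\s]", "", text)  # drop punctuation
    # Decompose accented letters and drop the combining marks ("ä" -> "a").
    text = "".join(ch for ch in unicodedata.normalize("NFKD", text)
                   if not unicodedata.combining(ch))
    return " ".join(sorted(set(text.split())))


def cluster(values: list) -> list:
    """Group values sharing a fingerprint; singletons are not reported."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [g for g in groups.values() if len(g) > 1]


names = ["Universität Wien", "universitat wien", "Wien, Universität", "TU Wien"]
print(cluster(names))
# [['Universität Wien', 'universitat wien', 'Wien, Universität']]
```

Clustering only proposes candidates. A person still decides which variant becomes the standardized term, ideally one taken from a controlled vocabulary or authority file.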
Examples for controlled vocabularies
- ISO 8601 for date and time (the internationally agreed way to represent dates is YYYY-MM-DD, e.g. September 27, 2012 is represented as 2012-09-27)
- ISO 3166 for countries (e.g. for Austria "AT" in the two-letter code and "AUT" in the three-letter code)
- ISO 639 for languages (e.g. for English "en" in the two-letter code and "eng" in the three-letter code)
- GeoNames for geographic names and topographical objects
- AGROVOC for agriculture and nutrition terminology
- ICD (International Classification of Diseases) for diseases
- IATA for airline and location codes
- NAL Thesaurus of the U.S. National Agricultural Library for agricultural concepts
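Dates often arrive in local conventions and need converting to the ISO 8601 form from the list above. The following is a minimal sketch. The list of accepted input formats is an assumption that each dataset has to define for itself. Ambiguous forms such as 03/04/2012 cannot be resolved by parsing alone, so only one slash order is accepted here. The `%B` directive matches English month names under Python's default C locale.

```python
from datetime import datetime

# Hypothetical sketch: normalise dates to ISO 8601 (YYYY-MM-DD).
# The accepted input formats are an illustrative assumption.
INPUT_FORMATS = [
    "%B %d, %Y",  # September 27, 2012
    "%d.%m.%Y",   # 27.09.2012
    "%m/%d/%Y",   # 09/27/2012 (US order only; DD/MM/YYYY is not accepted)
    "%Y-%m-%d",   # 2012-09-27 (already ISO 8601)
]


def to_iso8601(text: str) -> str:
    """Return the date in YYYY-MM-DD form, or raise ValueError."""
    for fmt in INPUT_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next known format
    raise ValueError(f"unrecognised date: {text!r}")


for raw in ["September 27, 2012", "27.09.2012", "09/27/2012", "2012-09-27"]:
    print(raw, "->", to_iso8601(raw))  # every line ends with 2012-09-27
```

An unrecognised date is reported rather than guessed, so a questionable value does not enter the dataset silently.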