The term "text and data mining" (TDM) refers to the automated extraction of information from large quantities of text or data (corpora). Information can be derived from unstructured or weakly structured text data (text mining) or from structured data (data mining).

Legal information

Use of the resources and access to them is subject to various legal and technical terms of use. If you are planning to analyse content from resources licensed by the library in the course of your research, please note that automated mass downloading of full texts or other information using a crawler, script, bot or similar methods is not permitted and can lead to access being blocked.

However, many content providers enable access via special interfaces (APIs). The licensed content can be used in TDM projects for scientific (non-commercial) purposes, provided that each provider's consent to your specific TDM project is obtained in advance. Contact information for this purpose can be found on the linked websites.

Data sources

On this page, you will find an overview of resources for text and data mining. If you need organizational support for data access, contact us by e-mail.

Licensed content can be used for TDM for scientific purposes.

Provider

Content

Notes on usage

AAAS - American Association for the Advancement of Science

AAAS publishes six peer-reviewed journals. TU Wien has a subscription for Science and Science Robotics.

Science platform

No API available

American Chemical Society (ACS)

ACS Publications publishes a range of journals covering all aspects of chemical sciences and related fields.

ACS Publications platform

TDM information - ACS

No API. Local TDM agreement required

Cambridge University Press

Cambridge University Press publishes more than 420 journals covering subjects across the humanities and social sciences as well as science, technology and medicine. 

Cambridge Core platform

TDM information - CUP

No API available

Elsevier

Elsevier publishes over 2300 journals from the physical sciences and engineering, life sciences, social sciences and humanities, and health. 

ScienceDirect platform

TDM information - Elsevier

Access via the Elsevier API or the Crossref TDM API

Emerald

Emerald publishes journals from a wide range of fields, including engineering, applied sciences and technology, management, and library and information sciences.

Emerald Insights platform

TDM information - Emerald

No API available

JSTOR Labs

JSTOR hosts over 2800 scholarly journals from the humanities, social sciences, and sciences. JSTOR works with a diverse group of nearly 1200 publishers from more than 57 countries to preserve and make their content digitally available.

JSTOR platform

JSTOR Labs

Various APIs and open source projects available

Oxford University Press

Oxford University Press publishes over 500 peer-reviewed academic journals with learned societies from all disciplines, including science and mathematics, the arts and humanities, the social sciences and medicine and health.

Oxford Academic platform

TDM information - OUP

No API available

Royal Society of Chemistry

The Royal Society of Chemistry publishes 52 journals covering the chemical sciences and related fields.

RSC platform

TDM information - RSC

No API. Local TDM agreement required

SAGE

TU Wien Bibliothek licenses around 25 SAGE journals from the disciplines of spatial planning, mechanical engineering and computer science.

SAGE Journals

TDM information - SAGE

Access via the Crossref TDM API

Springer Nature

Springer publishes over 2900 journals from the fields of science, technology, and medicine (STM) and from the humanities.

SpringerLink platform

TDM information - Springer Nature

Access via Springer API. Local TDM agreement concluded for licensed journals and Lecture Notes.

Taylor & Francis

Taylor & Francis publishes over 2700 peer-reviewed journals from a wide range of disciplines.

Explore Taylor & Francis journals

TDM information - Taylor & Francis

No API available

Wiley

Wiley offers a portfolio of 1600 journals from the life, health and physical sciences, social science and the humanities. Half of these journals are published in partnership with prestigious international scholarly and professional societies.

Wiley Online Library

TDM information - Wiley

Local TDM agreement concluded for licensed journals. Access requires an ORCID iD and takes place via the Crossref API.
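Several providers in the table above route TDM access through Crossref. As a rough illustration of what that can look like in practice, the following Python sketch looks up the Crossref metadata record for a single DOI and keeps only the full-text links that the publisher has flagged for text mining. It assumes the public Crossref REST API at api.crossref.org; the function names and the contact address you pass as `mailto` are illustrative and not part of any provider's instructions.

```python
import json
import urllib.parse
import urllib.request

# Public Crossref REST API route for a single work (assumed endpoint).
CROSSREF_WORKS = "https://api.crossref.org/works/"


def fetch_work(doi: str, mailto: str) -> dict:
    """Fetch the Crossref metadata record for one DOI (performs a network call)."""
    url = (
        CROSSREF_WORKS
        + urllib.parse.quote(doi, safe="/")
        + "?mailto="
        + urllib.parse.quote(mailto)
    )
    # Identify yourself so that the service can contact you if needed.
    request = urllib.request.Request(
        url, headers={"User-Agent": f"tdm-sketch (mailto:{mailto})"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["message"]


def text_mining_links(work: dict) -> list[dict]:
    """Return the full-text links that are flagged for text mining."""
    return [
        {"url": link["URL"], "content_type": link.get("content-type", "unspecified")}
        for link in work.get("link", [])
        if link.get("intended-application") == "text-mining"
    ]
```

A call such as `text_mining_links(fetch_work(doi, "you@example.org"))` returns a list of candidate full-text URLs for one article. Retrieving the full text from those URLs remains subject to the provider's terms, so the consent and agreements described above still apply, and requests should be spaced out rather than sent in bulk.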

Provider

Content

arXiv

Preprint collection from the fields of physics, mathematics, computer science, electrical engineering, statistics, financial mathematics and biology

BioMed Central

Around 300 open access journals from the disciplines of biology and medicine

CORE

CORE is the world's largest aggregator of open access research papers from repositories and journals. 

Crossref text and data mining

Full-text documents from participating publishers regardless of the publishing model (both open access and subscription content)

User guides are available on crossref.org.

Europeana

Digital library of European cultural heritage material including digitised books, films, museum and archive collections from over 2000 European institutions

HathiTrust Digital Library

Datasets from Internet Archive and Google Books and local digitised items from over 120 academic institutions worldwide

Internet Archive

Over 2 million freely downloadable books and other texts

Public Library of Science (PLOS)

Access to the journals of the Public Library of Science, a nonprofit academic open access publisher 

PubMed Central: Databases and Text Mining Tools

Various freely downloadable mining tools for searching PubMed Central, a free resource containing content from the fields of life and biomedical sciences

Wikidata

Structured data from Wikipedia and other free knowledge databases
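Some of the freely accessible sources listed above can be queried programmatically. As one example, arXiv offers a public query interface that returns results as an Atom feed. The Python sketch below builds a query URL and extracts identifiers and titles from such a feed. It assumes the endpoint http://export.arxiv.org/api/query and its `search_query`, `start` and `max_results` parameters; the function names are illustrative.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# arXiv query endpoint (assumed); results are returned as an Atom feed.
ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"


def build_query_url(search_query: str, start: int = 0, max_results: int = 10) -> str:
    """Build a paged query URL for the arXiv API."""
    params = {
        "search_query": search_query,
        "start": start,
        "max_results": max_results,
    }
    return ARXIV_API + "?" + urllib.parse.urlencode(params)


def parse_entries(atom_xml: str) -> list[dict]:
    """Extract the identifier and title of each entry in an Atom feed."""
    root = ET.fromstring(atom_xml)
    entries = []
    for entry in root.findall(ATOM + "entry"):
        entries.append(
            {
                "id": entry.findtext(ATOM + "id", default="").strip(),
                # Titles may be wrapped across lines, so collapse whitespace.
                "title": " ".join(entry.findtext(ATOM + "title", default="").split()),
            }
        )
    return entries
```

The feed itself can be fetched with `urllib.request.urlopen(build_query_url("all:electron")).read()` and passed to `parse_entries`. Small page sizes and pauses between requests keep the load on the service low.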