Text and Data Mining

Text and data mining are techniques used to extract information from large volumes of content. This guide provides resources for both tools and datasets for analysis as well as guidance regarding fair use and copyright.
This page is under construction. This statement will be removed once the page is fully populated with UF resources.

Cambridge University Press

What is it - Cambridge University Press publishes journals and books across the humanities, sciences, and social sciences. 

How to access - Users may download UF-subscribed/licensed content directly from the Cambridge Core website (see database link below). Cambridge University Press does not offer an API at this time. If you need help accessing a large amount of content for a text and data mining project, please contact openresearch@cambridge.org.

Additional Considerations - Cambridge University Press monitors its site for the unapproved use of web bots/crawlers to automatically download/scrape content. In many cases, Cambridge University Press can supply full-text XML files or otherwise facilitate access. Please contact openresearch@cambridge.org before you begin a large-scale project, so that you are not unnecessarily blocked. Additionally, CrossRef [INSERT CROSSREF LINK] contains metadata records for a substantial percentage of Cambridge University Press's content.
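As a rough illustration of the CrossRef route mentioned above, the sketch below builds a CrossRef REST API request that lists records under a single DOI prefix. The prefix 10.1017 (commonly associated with Cambridge University Press), the placeholder email address, and the function names are assumptions made for illustration; please confirm them against CrossRef's own documentation before relying on the results.

```python
import json
import urllib.parse
import urllib.request

CROSSREF_WORKS = "https://api.crossref.org/works"


def build_crossref_url(prefix, rows=20, mailto=None):
    """Build a CrossRef /works URL listing records under one DOI prefix."""
    params = {"filter": f"prefix:{prefix}", "rows": rows}
    if mailto:
        # Supplying a contact email is part of CrossRef's suggested etiquette.
        params["mailto"] = mailto
    return CROSSREF_WORKS + "?" + urllib.parse.urlencode(params)


def fetch_dois(url):
    """Fetch one page of results and return the DOIs it contains."""
    with urllib.request.urlopen(url, timeout=30) as response:
        payload = json.load(response)
    return [item["DOI"] for item in payload["message"]["items"]]


if __name__ == "__main__":
    # Assumed prefix and placeholder email; replace both before use.
    print(build_crossref_url("10.1017", rows=5, mailto="you@example.edu"))
```

Calling fetch_dois on the printed URL would return one page of DOIs; a metadata list like this does not replace contacting Cambridge University Press before downloading full text at scale.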

Link(s) to TDM info and access -

Cambridge University Press - Text & Data Mining

Elsevier ScienceDirect

What is it - Elsevier ScienceDirect contains journal and book content published by Elsevier, the largest scholarly publisher in the world. Full text and metadata can be accessed for UF-subscribed and open access content.

How to access - Register for the ScienceDirect API via Elsevier's Developers Portal.

Additional Considerations - Access varies by use case.
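Once an API key has been issued, a full-text request might look something like the sketch below. It prepares, but does not send, a request to Elsevier's article retrieval endpoint. The DOI shown is a made-up placeholder, the function name is hypothetical, and whether full text is actually returned depends on your key and UF's entitlements.

```python
import urllib.parse
import urllib.request

ARTICLE_BASE = "https://api.elsevier.com/content/article/doi/"


def build_article_request(doi, api_key, accept="text/xml"):
    """Prepare (but do not send) a full-text request for one DOI."""
    url = ARTICLE_BASE + urllib.parse.quote(doi, safe="/")
    # The API key travels in a request header rather than in the URL.
    headers = {"X-ELS-APIKey": api_key, "Accept": accept}
    return urllib.request.Request(url, headers=headers)


if __name__ == "__main__":
    # Placeholder DOI and key, for illustration only.
    request = build_article_request("10.1016/j.example.2020.01.001", "YOUR_API_KEY")
    print(request.full_url)
```

Passing the prepared request to urllib.request.urlopen would perform the download; it is kept separate here so the request can be inspected first.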

Link(s) to TDM info and access -

- Elsevier TDM website

- Elsevier TDM FAQs

- Request API access

Sage

What is it - Sage is a large publisher specializing in academic journals, books, and other resources.  Sage has a long history of publishing in the social sciences and humanities, and has a strong reputation in the scientific, technical, and medical disciplines. 

How to access - Sage allows its journal content to be used for Text & Data Mining purposes. Users may download content directly from Sage Journals. Sage explicitly allows the use of automated means to download content, as long as the automated requests respect the rate limits set by the publisher (see link below for more information). Sage also offers its metadata and full-text content via the CrossRef API [INSERT CROSSREF LINK].

Additional Considerations - The CrossRef API is the suggested/preferred method. 
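One simple way to respect a publisher's rate limits is to enforce a minimum gap between requests. The sketch below is a generic throttle, not Sage's own tooling; the class name is hypothetical, and the interval you pass in is an assumption that should be taken from Sage's published limits.

```python
import time


class Throttle:
    """Space out calls so they are at least `min_interval` seconds apart."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous permitted call

    def wait(self):
        """Block until the next request is allowed; return seconds slept."""
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self.min_interval - elapsed)
            if delay:
                self._sleep(delay)
        self._last = self._clock()
        return delay
```

In a download loop, calling throttle.wait() immediately before each request keeps the script under the chosen rate regardless of how quickly each response arrives.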

Link(s) to TDM info and access -

Sage Publications - Text & Data Mining

Scopus

What is it - Scopus is a multidisciplinary literature database.

How to access - Register for the Scopus API via Elsevier's Developers Portal.

Additional Considerations - Access varies by use case.
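Because Scopus search results are returned a page at a time, a project usually needs to step through them. The sketch below prepares a single search request and computes the offsets needed to cover a result set. The function names are hypothetical, and the page size your key is allowed to use is an assumption to check against Elsevier's documentation.

```python
import urllib.parse
import urllib.request

SCOPUS_SEARCH = "https://api.elsevier.com/content/search/scopus"


def build_scopus_search(query, api_key, start=0, count=25):
    """Prepare (but do not send) one page of a Scopus search."""
    params = urllib.parse.urlencode(
        {"query": query, "start": start, "count": count}
    )
    return urllib.request.Request(
        SCOPUS_SEARCH + "?" + params,
        headers={"X-ELS-APIKey": api_key, "Accept": "application/json"},
    )


def page_starts(total_results, count=25):
    """Return the `start` offsets needed to page through all results."""
    return list(range(0, total_results, count))


if __name__ == "__main__":
    # Placeholder key; the query uses Scopus field-code syntax.
    for start in page_starts(60, count=25):
        request = build_scopus_search('TITLE-ABS-KEY("text mining")', "YOUR_API_KEY", start)
        print(request.full_url)
```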

Link(s) to TDM info and access -

- Elsevier TDM website

- Elsevier TDM FAQs

- Request API access

Springer Nature

What is it - With a focus on scientific, technical, and medical research, Springer Nature is one of the largest and best-known academic publishers in the world.

How to access - Register for the Springer Nature API via Springer Nature’s Developer Portal. See the Quick Start Guide for registration instructions.

Additional Considerations - Springer Nature offers their Open Access API and Meta (limited metadata) API free of charge to users. For API access to the TDM (full-text) API and/or the Metadata (detailed metadata) API for UF-subscribed/owned content please contact [Insert Contact].
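For the two free services, a query URL might be assembled roughly as in the sketch below. The endpoint paths, the parameter names (q, api_key, p, s), and the example query are assumptions based on commonly published examples rather than on this guide; please check them against Springer Nature's Developer Documentation, since they may change.

```python
import urllib.parse

API_HOST = "https://api.springernature.com"

# Assumed endpoint paths for the two free services named above.
ENDPOINTS = {
    "openaccess": "/openaccess/json",
    "meta": "/meta/v2/json",
}


def build_springer_url(service, query, api_key, page_size=10, start=1):
    """Build a query URL for the Open Access or Meta service."""
    if service not in ENDPOINTS:
        raise ValueError(f"unknown service: {service!r}")
    params = {"q": query, "api_key": api_key, "p": page_size, "s": start}
    return API_HOST + ENDPOINTS[service] + "?" + urllib.parse.urlencode(params)


if __name__ == "__main__":
    # Placeholder key and example query, for illustration only.
    print(build_springer_url("openaccess", 'keyword:"text mining"', "YOUR_API_KEY"))
```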

Link(s) to TDM info and access -

- Text and Data Mining at Springer Nature (overview of TDM/API services, policies, access categories, tools/methods, and glossary. Note: This page refers users to “api.springernature.com,” which no longer contains public-facing content; see instead Springer Nature’s Developer Portal.)

- Developer Documentation

Wiley

What is it - Wiley is one of the largest publishers of academic journal and book content in the world. 

How to access -

1. Register for an Individual Login for Wiley Online Library

2. Visit Wiley’s Text and Data Mining page, scroll down, click on “Get a Text and Data Mining Token,” and log in using your individual login.

Additional Considerations - The full text of UF-subscribed/licensed content is available via Wiley’s API, which can be used to download .pdf files. To help users identify which DOIs to download, Wiley makes all of its metadata freely available via the CrossRef API [INSERT LINK TO CROSSREF]. Additionally, in June 2025 Wiley released their TDM Client Python package via GitHub.
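For readers who prefer not to use the TDM Client package, a direct download might look something like the sketch below, which uses only the Python standard library. The DOI is a made-up placeholder and the function names are hypothetical; the endpoint and token header follow Wiley's published examples as best understood here, so please confirm them against Wiley's technical documentation.

```python
import urllib.parse
import urllib.request

WILEY_TDM_BASE = "https://api.wiley.com/onlinelibrary/tdm/v1/articles/"


def build_wiley_request(doi, token):
    """Prepare (but do not send) a PDF request for one DOI."""
    # The whole DOI, including its slash, is percent-encoded in the path.
    url = WILEY_TDM_BASE + urllib.parse.quote(doi, safe="")
    return urllib.request.Request(url, headers={"Wiley-TDM-Client-Token": token})


def save_pdf(request, path):
    """Send the request and write the response body to `path`."""
    with urllib.request.urlopen(request, timeout=60) as response:
        with open(path, "wb") as out:
            out.write(response.read())


if __name__ == "__main__":
    # Placeholder DOI and token, for illustration only.
    request = build_wiley_request("10.1002/example.12345", "YOUR_TDM_TOKEN")
    print(request.full_url)
```

The list of DOIs to feed into build_wiley_request would typically come from a CrossRef metadata query, as described above.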

Link(s) to TDM info and access -

- Text and Data Mining - Overview of Wiley’s TDM API service, including technical documentation, and token request

- TDM Client Python package via GitHub

Web of Science

What is it - Owned by Clarivate, Web of Science is an index of scholarly content, with a heavy focus on citation metrics. Web of Science does not contain full-text content, but it does contain metadata for publications across reputable academic publishers. UF-licensed/subscribed content can be accessed via the Web of Science Starter API (see link below, replaces WoS API LITE). One might use Web of Science or CrossRef metadata APIs [Insert CROSSREF LINK] to develop a candidate list of articles for a TDM project.

How to access -

1 - Register/login to the Clarivate Developer Portal

2 - Create/register an "application" in the Portal (see additional considerations)

3 - "Subscribe" the "application" to the API (see additional considerations)

3a - Select the "Free Institutional Member Plan" option

4 - The "subscription" will await approval from Clarivate

5???

 

Additional Considerations - To use Clarivate APIs, users must create "applications" in the Developer Portal. Applications allow users to collaborate with a single API key and help Clarivate manage its server loads. More information can be found here: Clarivate Developer Portal - Subscriptions & Applications.

Clarivate offers a host of additional APIs for an additional fee. The Libraries can facilitate the licensing of the additional APIs via our statewide license agreement if a researcher/department is able to fund the fee. Please contact {INSERT CONTACT}
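Once a subscription to the Web of Science Starter API has been approved, building a candidate list of articles might look roughly like the sketch below. The function names are hypothetical, and the response fields read by extract_dois ("hits", "identifiers", "doi") are assumptions about the Starter API's JSON; please verify both against the Starter API documentation.

```python
import urllib.parse
import urllib.request

WOS_DOCUMENTS = "https://api.clarivate.com/apis/wos-starter/v1/documents"


def build_wos_request(query, api_key, limit=10, page=1):
    """Prepare (but do not send) one page of a Starter API search."""
    params = urllib.parse.urlencode(
        {"db": "WOS", "q": query, "limit": limit, "page": page}
    )
    return urllib.request.Request(
        WOS_DOCUMENTS + "?" + params, headers={"X-ApiKey": api_key}
    )


def extract_dois(payload):
    """Collect DOIs from a decoded response, skipping records without one."""
    dois = []
    for hit in payload.get("hits", []):
        # Assumed layout: each hit carries an "identifiers" object.
        doi = (hit.get("identifiers") or {}).get("doi")
        if doi:
            dois.append(doi)
    return dois


if __name__ == "__main__":
    # Placeholder key; TS= is the Web of Science topic field tag.
    print(build_wos_request("TS=(text mining)", "YOUR_API_KEY").full_url)
```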

Link(s) to TDM info and access -

Web of Science Starter API

Web of Science Starter API - Documentation

Clarivate Developer Portal

Clarivate Developer Help

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.