Text and Data Mining

Text and data mining are techniques used to extract information from large volumes of content. This guide points to tools and datasets for analysis and offers guidance on fair use and copyright.

Overview

This guide is intended to help you learn about text and data mining (TDM) and the resources available to you here at UF via the George A. Smathers Libraries. Before you get started, please:

  1. Consult the general terms and conditions for using UF Library electronic resources
  2. Review the specific terms of use on our UF Licensed Data Sources page, or email er-help@uflib.ufl.edu if you plan on using any AI tools for text and data mining, as separate restrictions may apply.

Violating license agreements, even unintentionally, can cost the entire campus community access to critical research resources and may expose you and the University to legal liability.

What is text and data mining?

Text and data mining (TDM) are automated techniques for analyzing large volumes of digital information to discover patterns, trends, and valuable insights. Data mining is the overarching process of finding anomalies, patterns, and correlations within large datasets, combining techniques from machine learning, statistics, and database systems. It evaluates both structured data (like database tables) and unstructured data (like text) to identify new information. Text mining is a specialized subset of data mining that focuses specifically on unstructured text-based data (like interviews, articles, or narratives). The overall goal of TDM is to transform raw data into knowledge.
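
As a rough illustration of what text mining can look like in practice, the sketch below counts term frequencies across a few short texts, which is one of the simplest ways to surface patterns in unstructured text. It is a minimal example under stated assumptions, not a recommended workflow: the sample documents are made up, and the stop-word list and the function names `tokenize` and `term_frequencies` are hypothetical choices for this example only.

```python
import re
from collections import Counter

# Hypothetical stop-word list, kept deliberately short for illustration.
STOP_WORDS = {"the", "a", "of", "and", "in", "to", "is"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(documents):
    """Count how often each non-stop-word term appears across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(t for t in tokenize(doc) if t not in STOP_WORDS)
    return counts


# Made-up sample texts standing in for a corpus you are licensed to mine.
documents = [
    "The library licenses data for research.",
    "Research data mining finds patterns in data.",
    "Text mining is a subset of data mining.",
]

frequencies = term_frequencies(documents)
top_terms = frequencies.most_common(2)
print(top_terms)
```

Real projects usually replace the regular-expression tokenizer and the tiny stop-word list with a dedicated text-processing library, and they should only run against content whose license terms permit mining.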

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.