Natural language processing (NLP) is a subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to process and analyze large amounts of natural language data.
Natural Language Toolkit (NLTK)
PyTorch
TensorFlow
Keras
Orange
CLAMP (Clinical Language Annotation, Modeling, and Processing Toolkit)
CLAMP is a comprehensive clinical natural language processing (NLP) software package that recognizes and automatically encodes clinical information in narrative patient reports.

Did you know that UF's HiPerGator hosts NLP tools and datasets you can use right away? Learn more: https://help.rc.ufl.edu/doc/NLP
The Pile corpus from EleutherAI (/data/ai/text/data/thepile) for language modeling (paper)
AI-news collection (/data/ai/text/data/news) gathered from the Bing, DuckDuckGo, and Google search engines. This set is updated weekly; contact Eric at UFRC (ericeric@ufl.edu) for more information.
UF Baldwin Library digital collections (/data/ai/text/data/baldwin_library)
Fact-checking datasets: the open-source FEVER (Fact Extraction and VERification) datasets contain claims labeled as "Supported", "Refuted", or "NotEnoughInfo". Three datasets are available (as of June 2023), released in 2018, 2019, and 2021.
Common-sense Q&A: the NLP-Progress repository tracks progress in natural language processing, including datasets and the current state of the art for the most common NLP tasks.
Kaggle NLP Datasets: Explore, analyze, and share quality data.
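A quick first step with a labeled fact-checking dataset such as FEVER is to check how its claims are distributed across labels. The Python snippet below is a minimal sketch under stated assumptions: it assumes the data is stored as JSON Lines (one JSON object per line) with the label in a field named `label`. The field name, the file layout, the helper name `count_labels`, and the sample records are all hypothetical, and the released files may spell the labels differently from the names given above.

```python
import json
from collections import Counter
from typing import Iterable


def count_labels(lines: Iterable[str], field: str = "label") -> Counter:
    """Count label values across JSON Lines records (one JSON object per line).

    Assumption: each record stores its label under `field`; records without
    that field are counted under "<missing>" instead of raising an error.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline at end of file
        record = json.loads(line)
        counts[record.get(field, "<missing>")] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample records using the label names described above;
    # real FEVER files may use different field names or label spellings.
    sample = [
        '{"id": 1, "claim": "...", "label": "Supported"}',
        '{"id": 2, "claim": "...", "label": "Refuted"}',
        '{"id": 3, "claim": "...", "label": "NotEnoughInfo"}',
        '{"id": 4, "claim": "...", "label": "Supported"}',
    ]
    for label, n in count_labels(sample).most_common():
        print(label, n)
```

Because `count_labels` accepts any iterable of strings, an open text file works directly, for example `with open(path, encoding="utf-8") as f: print(count_labels(f))`, where `path` is a hypothetical location of one of the downloaded files. Reading line by line keeps memory use low even for large releases.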