Digital Humanities Toolkit

Resources for digital humanities

Text Analysis

Text analysis (also known as text mining or content analysis) is the extraction of machine-readable information from unstructured text, which enables data-driven approaches to managing and studying content.
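As a rough illustration of what "machine-readable information" can mean in practice, the sketch below counts word frequencies in a short passage. It is a minimal example using only the Python standard library; the function name `word_frequencies`, the simple tokenizing rule, and the sample sentence are assumptions made for illustration and are not taken from any of the tools listed on this page.

```python
import re
from collections import Counter


def word_frequencies(text, top_n=5):
    """Return the top_n most frequent word tokens in text.

    Tokenizing here is deliberately naive (an assumption for this sketch):
    the text is lowercased and split into runs of letters and apostrophes.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens).most_common(top_n)


# Hypothetical sample passage, used only to show the output shape.
sample = "The whale surfaced. The crew watched the whale dive again."
print(word_frequencies(sample, top_n=2))
# → [('the', 3), ('whale', 2)]
```

Real projects usually go further, for example by removing common function words such as "the" or by comparing counts across many documents, but the result is the same kind of structured data that the tools below build their visualizations on.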


Voyant Tools is an open-source, web-based environment for scholarly text analysis. Paste a URL or upload a document, and Voyant automatically generates data visualizations that summarize the text. View these examples to see how Voyant has been used in real digital humanities research.

HathiTrust Research Center

The HathiTrust Digital Library enables scholarly research by preserving more than 17 million resources that span the history of printed text and represent over 400 languages. With the tools provided by the HathiTrust Research Center, you can do computational research, such as large-scale text analysis, on the works in the HathiTrust Digital Library.


WordSeer uses visualization, information retrieval, and natural language processing to make the contents of text navigable, accessible, and useful. This set of tools is designed specifically for humanities scholars who need to work with digitized text collections on a large scale. With WordSeer you can easily trace patterns of language use across multiple texts.

Stanford NLP Group

The Stanford Natural Language Processing Group makes some of its NLP software freely available to everyone. The group provides statistical, deep learning, and rule-based NLP tools for major computational linguistics problems, and these tools can be incorporated into applications that need human language technology. The packages are widely used in industry, academia, and government.