This file set contains the final paper, original data, Python scripts, and output logs for my term project in Introduction to Digital Scholarship at the University of Tennessee. The project assesses the utility of probabilistic topic modeling, a computational analytic technique, for identifying latent topics or themes in a large corpus of text.
To do this, I ran a topic modeling analysis on a corpus of 622 key U.S. presidential speeches identified by the University of Virginia Miller Center and archived on its website at http://millercenter.org/president/speeches.
Together with a review of the available literature on topic modeling, the results of this project suggest that the technique is an effective tool for mining large data sets to identify latent themes or topics. For the presidential speeches in particular, the results suggest that the technique accurately identified latent themes or discourses running across different presidents' speeches over time. They also suggest that it can produce new insights into the history of presidential speeches, including similarities between speeches that might not otherwise be apparent.
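
For readers unfamiliar with the technique, the following is a minimal sketch of one common form of probabilistic topic modeling, latent Dirichlet allocation (LDA) fitted with a collapsed Gibbs sampler. It is an illustration only and is not the code in this file set: the function names (`fit_lda`, `top_words`), the parameter values, and the four tiny token lists are all assumptions made up for the example, and the project's own scripts may use a different library or algorithm.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics=2, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustrative, not the project's code).

    docs is a list of token lists. Returns (doc_topic, topic_word), where
    doc_topic[d][k] counts tokens of document d assigned to topic k and
    topic_word[k][w] counts how often word w is assigned to topic k.
    """
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            k = rng.randrange(num_topics)
            doc_assignments.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word] += 1
            topic_total[k] += 1
        assignments.append(doc_assignments)

    # Repeatedly resample each token's topic given all other assignments.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][word] -= 1
                topic_total[k] -= 1
                # Weight = (topic's share of this document) x (word's share of the topic).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][word] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][word] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


def top_words(topic_word, k, n=3):
    """Return up to n of the most frequent words currently assigned to topic k."""
    return [word for word, count in topic_word[k].most_common(n) if count > 0]


# Made-up placeholder documents, not drawn from the Miller Center corpus.
docs = [
    "war army enemy war peace".split(),
    "tax budget economy tax jobs".split(),
    "army war enemy troops".split(),
    "economy jobs budget tax".split(),
]
doc_topic, topic_word = fit_lda(docs)
for k in range(len(topic_word)):
    print("topic", k, top_words(topic_word, k))
```

The sampler is randomized, so the topics it prints depend on the seed and on the number of iterations. A real analysis of several hundred speeches would also need text cleaning, stop-word removal, and a choice of the number of topics, none of which is shown here.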