Trove Newspaper Harvester
Software posted on 10.05.2018 by Tim Sherratt
The Trove Newspaper Harvester is a command-line tool written in Python that helps you download large quantities of digitised newspaper articles from Trove.
Instead of working your way through page after page of search results using Trove’s web interface, the newspaper harvester will save the results of your search to a CSV (spreadsheet) file which you can then filter, sort, or analyse.
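As a rough illustration of the "filter, sort, or analyse" step, the sketch below reads a results CSV with Python's standard library, keeps rows whose title mentions a keyword, and orders them by date. The column names (`title`, `date`, `newspaper`) and the sample rows are placeholders made up for this example, not the harvester's documented output, so adjust them to match the file you actually get.

```python
import csv
import io


def load_rows(handle):
    """Read CSV results from an open file handle into a list of dicts."""
    return list(csv.DictReader(handle))


def filter_and_sort(rows, keyword, text_column="title", date_column="date"):
    """Keep rows whose text column contains the keyword, oldest first.

    The default column names are assumptions; pass your own if they differ.
    """
    keyword = keyword.lower()
    matches = [
        row for row in rows
        if keyword in (row.get(text_column) or "").lower()
    ]
    # ISO-style dates (YYYY-MM-DD) sort correctly as plain strings.
    return sorted(matches, key=lambda row: row.get(date_column) or "")


# Placeholder data standing in for a harvested file (invented for this sketch).
SAMPLE_CSV = """title,date,newspaper
Flood damage at the wharf,1925-03-02,Example Gazette
Cricket results,1924-11-15,Example Gazette
Wharf strike continues,1924-07-09,Example Herald
"""

rows = load_rows(io.StringIO(SAMPLE_CSV))
for row in filter_and_sort(rows, "wharf"):
    print(row["date"], row["title"])
```

To run this against a real file, replace the `io.StringIO(SAMPLE_CSV)` handle with `open("results.csv", newline="", encoding="utf-8")`, where `results.csv` is whatever name your harvest produced.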
Even better, the harvester can save the full OCR'd (and possibly corrected) text of each article to an individual file. You could, for example, collect the text of thousands of articles on a particular topic and then feed them to a text analysis engine like Voyant to look for patterns in the language.
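If you would rather take a quick look at the language yourself before turning to a dedicated tool, a minimal word-frequency count is one place to start. The sketch below assumes the article texts are plain `.txt` files in a single directory; the directory layout, the file extension, and the helper names `count_words` and `read_article_texts` are assumptions for illustration, not part of the harvester.

```python
import re
from collections import Counter
from pathlib import Path

# Runs of lowercase letters and apostrophes count as words; this is a
# deliberately crude tokeniser, and OCR errors will show up as odd words.
WORD_PATTERN = re.compile(r"[a-z']+")


def count_words(texts):
    """Tally word frequencies across an iterable of text strings."""
    counts = Counter()
    for text in texts:
        counts.update(WORD_PATTERN.findall(text.lower()))
    return counts


def read_article_texts(directory):
    """Yield the contents of every .txt file in a directory (assumed layout)."""
    for path in sorted(Path(directory).glob("*.txt")):
        # errors="replace" avoids crashing on stray bytes in the text.
        yield path.read_text(encoding="utf-8", errors="replace")


# Placeholder strings standing in for article files (invented for this sketch).
sample_texts = ["The wharf strike continues", "Flood damage at the wharf"]
for word, total in count_words(sample_texts).most_common(3):
    print(word, total)
```

For a real harvest you might call `count_words(read_article_texts("text"))`, replacing `"text"` with the directory that holds your saved articles. Filtering out common stop words would be a sensible next step.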