troveharvester.zip (16.25 kB)
Trove Newspaper Harvester
Version 4 2018-05-10, 06:06
Version 3 2018-02-28, 09:44
Version 2 2018-01-13, 01:40
Version 1 2016-12-14, 09:52
software
posted on 2018-05-10, 06:06 authored by Tim Sherratt
The Trove Newspaper Harvester is a command-line tool written in Python that helps you download large quantities of digitised newspaper articles from Trove.
Instead of working your way through page after page of search results using Trove’s web interface, the newspaper harvester will save the results of your search to a CSV (spreadsheet) file which you can then filter, sort, or analyse.
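As a rough illustration of what "filter, sort, or analyse" can look like once you have the CSV, here is a minimal Python sketch. The column names `title` and `date` are assumptions made for the example, not a documented description of the harvester's output, so check the header row of your own file and adjust them. Sorting by date as plain text also assumes the dates are stored in `YYYY-MM-DD` form.

```python
import csv
from pathlib import Path


def filter_and_sort(csv_path, keyword, text_column="title", sort_column="date"):
    """Return rows whose text_column contains keyword, sorted by sort_column.

    The default column names are hypothetical; pass the names that
    actually appear in the header row of your results file.
    """
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle))

    needle = keyword.lower()
    # Case-insensitive substring match; missing cells are treated as empty.
    matches = [
        row for row in rows
        if needle in (row.get(text_column) or "").lower()
    ]
    # Plain string sort: chronological only if dates are ISO formatted.
    return sorted(matches, key=lambda row: row.get(sort_column) or "")
```

For example, `filter_and_sort("results.csv", "flood")` would return the matching rows as dictionaries, oldest first, where `results.csv` stands in for whatever your harvested file is called.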
Even better, the harvester can save the full OCR'd (and possibly corrected) text of each article to an individual file. You could, for example, collect the text of thousands of articles on a particular topic and then feed it to a text analysis engine like Voyant to look for patterns in the language.
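If you would rather take a first look at the language yourself before reaching for a dedicated tool, a simple word count across the saved text files is one place to start. The sketch below assumes the article texts are plain `.txt` files sitting in a single directory; that layout, the function name, and the very crude tokenisation are all choices made for this example rather than anything the harvester prescribes.

```python
import re
from collections import Counter
from pathlib import Path

# Deliberately simple tokeniser: runs of ASCII letters and apostrophes.
WORD_PATTERN = re.compile(r"[a-z']+")


def count_words(text_dir, pattern="*.txt"):
    """Count word frequencies across every file in text_dir matching pattern.

    The directory layout and file extension are assumptions; point this at
    wherever your harvested article texts were saved.
    """
    counts = Counter()
    for path in sorted(Path(text_dir).glob(pattern)):
        # OCR output can contain odd bytes, so replace anything undecodable.
        text = path.read_text(encoding="utf-8", errors="replace").lower()
        counts.update(WORD_PATTERN.findall(text))
    return counts
```

Calling `count_words("text").most_common(20)` would then list the twenty most frequent words, with `"text"` standing in for your own directory name. Expect common words such as "the" to dominate unless you filter them out.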