troveharvester.zip (16.25 kB)

Trove Newspaper Harvester

software
posted on 10.05.2018 by Tim Sherratt
The Trove Newspaper Harvester is a command-line tool written in Python that helps you download large quantities of digitised newspaper articles from Trove.

Instead of working your way through page after page of search results using Trove’s web interface, the newspaper harvester will save the results of your search to a CSV (spreadsheet) file which you can then filter, sort, or analyse.
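Once you have the CSV, a few lines of Python can narrow it down. The sketch below is a minimal example and not part of the harvester itself: it uses the standard `csv` module to keep only the articles from one newspaper and sort them by date. The column names `newspaper_title`, `date` and `title`, and the helper names `load_rows` and `filter_and_sort`, are assumptions for illustration; check the header row of your own results file and adjust them to match.

```python
import csv
import io


def load_rows(csv_file):
    """Read a CSV file object into a list of dicts keyed by the header row."""
    return list(csv.DictReader(csv_file))


def filter_and_sort(rows, newspaper, sort_key="date"):
    """Keep rows from one newspaper and sort them by a column.

    The column names used here are hypothetical; match them to your file's header.
    Dates in YYYY-MM-DD form sort correctly as plain strings.
    """
    matches = [row for row in rows if row["newspaper_title"] == newspaper]
    return sorted(matches, key=lambda row: row[sort_key])


# Made-up sample rows standing in for a real results file.
sample = io.StringIO(
    "article_id,title,date,newspaper_title\n"
    "1,Article one,1901-03-02,Paper A\n"
    "2,Article two,1899-11-20,Paper A\n"
    "3,Article three,1900-01-05,Paper B\n"
)

for row in filter_and_sort(load_rows(sample), "Paper A"):
    print(row["date"], row["title"])
# prints:
# 1899-11-20 Article two
# 1901-03-02 Article one
```

To run it on a real harvest, open your results file with `open(path, newline="", encoding="utf-8")` and pass the file object to `load_rows` in place of the sample.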

Even better, the harvester can save the full OCR'd (and possibly corrected) text of each article to an individual file. You could, for example, collect the text of thousands of articles on a particular topic and then feed them to a text analysis tool like Voyant to look for patterns in the language.
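As a rough stand-in for the kind of pattern-spotting a tool like Voyant offers, the sketch below counts word frequencies across a folder of article text files using only the Python standard library. It assumes one plain-text `.txt` file per article in a single folder; the function name `word_frequencies`, the folder name and the small stop-word list are hypothetical, and the simple `[a-z]+` tokeniser ignores accented characters and splits contractions.

```python
import re
from collections import Counter
from pathlib import Path

# Hypothetical, deliberately tiny stop-word list; extend it for real analysis.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "was", "on", "for"}


def word_frequencies(text_dir):
    """Count lower-cased words across every .txt file in text_dir.

    Assumes one article per .txt file; other files in the folder are ignored.
    """
    counts = Counter()
    for path in sorted(Path(text_dir).glob("*.txt")):
        # OCR output can contain odd bytes, so replace anything undecodable.
        text = path.read_text(encoding="utf-8", errors="replace")
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(word for word in words if word not in STOP_WORDS)
    return counts
```

For example, `word_frequencies("text").most_common(20)` would list the twenty most frequent words, where `"text"` is a placeholder for whichever folder your harvest wrote its text files to.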
