troveharvester.zip (16.25 kB)

Trove Newspaper Harvester

Version 4 2018-05-10, 06:06
Version 3 2018-02-28, 09:44
Version 2 2018-01-13, 01:40
Version 1 2016-12-14, 09:52
software
posted on 2018-05-10, 06:06 authored by Tim Sherratt
The Trove Newspaper Harvester is a command-line tool written in Python that helps you download large quantities of digitised newspaper articles from Trove.

Instead of working your way through page after page of search results using Trove’s web interface, the newspaper harvester will save the results of your search to a CSV (spreadsheet) file which you can then filter, sort, or analyse.
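As a rough illustration of the "filter, sort, or analyse" step, here is a minimal Python sketch that reads harvested results, keeps the articles from one newspaper, and sorts them by date. The column names (`article_id`, `title`, `newspaper`, `date`) and the sample rows are assumptions made up for this example; check the header row of your own CSV file for the names the harvester actually writes.

```python
import csv
import io

# Hypothetical sample standing in for a harvested results file.
# The real column names and values may differ.
SAMPLE_CSV = """article_id,title,newspaper,date
1,Floods in the north,The Argus,1925-03-02
2,Wheat harvest report,The Mercury,1919-11-20
3,Floods ease,The Argus,1925-03-09
"""


def filter_and_sort(csv_file, newspaper):
    """Return the rows from one newspaper, oldest first."""
    rows = [row for row in csv.DictReader(csv_file) if row["newspaper"] == newspaper]
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    return sorted(rows, key=lambda row: row["date"])


argus_rows = filter_and_sort(io.StringIO(SAMPLE_CSV), "The Argus")
for row in argus_rows:
    print(row["date"], row["title"])
```

To run this against a real harvest, you would pass an open file handle (for example `open("results.csv", newline="", encoding="utf-8")`, where the file name is again an assumption) in place of the in-memory sample.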

Even better, the harvester can save the full OCR'd (and possibly corrected) text of each article to an individual file. You could, for example, collect the text of thousands of articles on a particular topic and then feed them to a text analysis engine like Voyant to look for patterns in the language.
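If you would rather take a first look at the text yourself before loading it into a tool like Voyant, a simple word count across the saved files is one possible starting point. The sketch below assumes the article texts are plain `.txt` files in a single directory; the directory layout, file names, and sample text are invented for the demonstration and are not taken from the harvester's documentation.

```python
import re
import tempfile
from collections import Counter
from pathlib import Path


def count_words(text_dir):
    """Count word frequencies across every .txt file in a directory."""
    counts = Counter()
    for path in sorted(Path(text_dir).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        # Lowercase the text and keep runs of letters and apostrophes as words.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


# Demo with two made-up article files; a real harvest would supply the directory.
with tempfile.TemporaryDirectory() as demo_dir:
    Path(demo_dir, "article-1.txt").write_text("Floods in the north", encoding="utf-8")
    Path(demo_dir, "article-2.txt").write_text("The floods ease", encoding="utf-8")
    word_counts = count_words(demo_dir)

print(word_counts.most_common(5))
```

A raw count like this treats every word equally, so common words such as "the" will dominate; dedicated text analysis tools usually filter those out for you.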

