figshare
IMAGE
BW_ProbableDuplicateMaterialPercentages.png (49.6 kB)
DATASET
Data_Wordcounts_CaledonianMercury_1825_1835.tsv (85.75 kB)
DOCUMENT
Readme.docx (15.8 kB)
3 files

Likely percentage of the optical character recognised word count of the average issue of the Caledonian Mercury for a given year to be duplicate material, with associated data

dataset
posted on 2018-03-25, 23:15, authored by M. H. Beals
This file set contains a bar chart (BW_ProbableDuplicateMaterialPercentages.png) representing the likely percentages of duplicated news, advertising, miscellany and commentary, and numerical content in the average issue of The Caledonian Mercury (Edinburgh, Scotland) for a given year, 1825-1835.

It also includes a data table (Data_Wordcounts_CaledonianMercury_1825_1835.tsv) giving the OCR-calculated word count for each issue, the minimum duplicate material percentage for each issue, and the extrapolated word counts and percentages for each content type.
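As an illustration of how the per-issue table could be summarised by year, the following is a minimal sketch, not the method used to produce the bar chart. The column names `date` and `min_duplicate_pct` are hypothetical placeholders, as is the assumption that each date begins with a four-digit year; the actual header names should be taken from the TSV file and the Readme.

```python
import csv
from collections import defaultdict
from statistics import mean

# Hypothetical column names: replace with the real headers from the TSV file.
DATE_COLUMN = "date"                  # assumed to start with a four-digit year
PERCENT_COLUMN = "min_duplicate_pct"  # assumed numeric percentage per issue


def yearly_mean_percentage(tsv_file, date_column=DATE_COLUMN,
                           percent_column=PERCENT_COLUMN):
    """Return {year: mean percentage} from an open tab-separated file."""
    by_year = defaultdict(list)
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        # The first four characters of the date are assumed to be the year.
        year = int(row[date_column][:4])
        by_year[year].append(float(row[percent_column]))
    # Average the per-issue percentages within each year, in year order.
    return {year: mean(values) for year, values in sorted(by_year.items())}
```

The function takes any open text file, so it could be called on Data_Wordcounts_CaledonianMercury_1825_1835.tsv opened with `open(path, newline="", encoding="utf-8")`, once the column names have been adjusted to match the file.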

The data set was derived from the British Library 19th Century Newspapers, Part 1 digital collection (http://gale.cengage.co.uk/british-library-newspapers/19th-century-british-library-newspapers-part-i.aspx) using the Scissors-and-Paste Console v.0.4.2 (https://doi.org/10.5281/zenodo.1207283).

Further details are available in the included documentation file (Readme.docx) and on the websites listed below.

History