Making Europe's Historic Newspapers Searchable
This poster provides a rare glimpse into the overall approach for the
refinement, i.e. the enrichment of scanned historical newspapers with
text and layout recognition, in the Europeana Newspapers project. Within
three years, the project processed more than 10 million pages of
historical newspapers from 12 national and major libraries to produce
the largest open access and fully searchable text collection of digital
historical newspapers in Europe. Along the way, the project encountered
a wide variety of legal, logistical, technical and other challenges. After
introducing the background issues in newspaper digitization in Europe,
the paper discusses the technical aspects of refinement in greater
detail. It explains which decisions were taken in the design of the
large-scale processing workflow to address these challenges, which
results were produced, and which best practices were identified.