pdfmetadata.csv (40.82 kB)

A survey of Academic Publisher PDF metadata

dataset

posted on 2012-12-30, 23:37 authored by Ross Mounce

This is the dataset accompanying the blog post at http://rossmounce.co.uk/2012/12/31/pdf-metadata-why-so-poor/

It is a simple survey of PDF metadata across a variety of academic publishers, sampling mostly PDFs published in 2011 (or whichever ones I could gain access to). All are publisher-provided Version of Record PDFs, not self-archived preprints or similar. I used the command-line tool pdfinfo to extract the metadata.
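For anyone wanting to collect the same fields from their own PDFs, here is a minimal sketch. It is not the script used to build this dataset. It assumes pdfinfo (as shipped with Poppler or Xpdf) is on the PATH and relies only on its plain output format of one `Key: value` field per line. The helper names `parse_pdfinfo` and `pdfinfo` are hypothetical.

```python
import subprocess


def parse_pdfinfo(text: str) -> dict[str, str]:
    """Parse pdfinfo's 'Key:   value' lines into a dict.

    Splits on the first colon only, so values that contain colons
    (e.g. timestamps in CreationDate) are kept whole. An empty value
    (e.g. a blank Title) is stored as an empty string.
    """
    info: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip any line without a colon
        info[key.strip()] = value.strip()
    return info


def pdfinfo(path: str) -> dict[str, str]:
    """Run the pdfinfo CLI on one file and return its fields.

    Assumes pdfinfo is installed; raises CalledProcessError if it fails.
    Undecodable bytes in the output are replaced rather than raising.
    """
    result = subprocess.run(
        ["pdfinfo", path],
        capture_output=True,
        text=True,
        errors="replace",
        check=True,
    )
    return parse_pdfinfo(result.stdout)
```

Calling `pdfinfo("paper.pdf")["Title"]` on a hypothetical file would then return an empty string when the publisher left the Title field blank, which is the kind of gap this survey records.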

Columns A to K contain identifying metadata that I supplied for each PDF (some fields are incomplete), whilst columns L to V hold the metadata of interest for each PDF.
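One way to work with the A to K / L to V split programmatically is sketched below. It assumes pdfmetadata.csv is UTF-8 with a single header row; neither is stated in the description, so check the file first. The helpers `col_index`, `split_row` and `blank_counts` are hypothetical, and `blank_counts` simply tallies empty cells in columns L to V as one rough measure of missing metadata.

```python
import csv
import string


def col_index(letters: str) -> int:
    """Convert a spreadsheet column label ('A', 'K', 'AA') to a 0-based index."""
    n = 0
    for ch in letters.upper():
        n = n * 26 + (string.ascii_uppercase.index(ch) + 1)
    return n - 1


# Column ranges as described in the dataset notes.
ID_COLS = slice(col_index("A"), col_index("K") + 1)    # A..K: identifying metadata
META_COLS = slice(col_index("L"), col_index("V") + 1)  # L..V: metadata of interest


def split_row(row: list[str]) -> tuple[list[str], list[str]]:
    """Return (identifying columns, metadata columns) for one CSV row."""
    return row[ID_COLS], row[META_COLS]


def blank_counts(path: str) -> dict[str, int]:
    """Count empty cells in each of columns L..V.

    Assumes a UTF-8 file whose first row is a header; short rows are
    padded so that missing trailing cells also count as blank.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        names = next(reader)[META_COLS]
        counts = {name: 0 for name in names}
        for row in reader:
            _, meta = split_row(row)
            meta = meta + [""] * (len(names) - len(meta))
            for name, value in zip(names, meta):
                if not value.strip():
                    counts[name] += 1
    return counts
```

Using column letters rather than header names keeps the sketch tied to the layout described above, even if the header text differs from what one might expect.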

Many of the PDFs sampled are not Open Access, so (sadly) I cannot provide copies for replicating these results.