An Evaluation of the Research Potential of Geo-indexed Internet Archive Data 1996-2010 (MSc Dissertation)
2013 MSc Dissertation for Birkbeck Geographic Information Science
This study uses the Geoindex JISC UK Web Domain Dataset (1996-2010), a 61 GB text-based dataset containing around 700,000,000 instances of postcodes found in the HTML of archive.org’s .uk domain collection. This data opens up the possibility of using the archive as a geographic dataset in its own right. The study evaluates the use and value of the archive as a dataset for researchers by processing and examining the data at various levels of aggregation and across different geographic areas. It evaluates data quality and provides summaries of the dataset, example analyses, some likely research use cases, and recommendations for future work with this dataset.
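As a rough illustration of what aggregating postcode instances at different geographic levels could look like, the sketch below counts instances per year by postcode area or postcode district. It is not the method used in the dissertation. The input record shape (a year paired with a postcode string), the function names, and the simplified postcode pattern are all assumptions made for this example; the real dataset layout may differ, and the pattern does not cover every special-case UK postcode.

```python
import re
from collections import Counter

# Simplified UK postcode pattern (an assumption for this sketch):
# area letters, district digits/letter, then the inward code.
POSTCODE_RE = re.compile(r"^([A-Z]{1,2})(\d[A-Z\d]?)\s*(\d[A-Z]{2})$")


def split_postcode(postcode):
    """Return (area, district) for a postcode, or None if it does not parse."""
    match = POSTCODE_RE.match(postcode.strip().upper())
    if match is None:
        return None
    area, district_suffix, _inward = match.groups()
    return area, area + district_suffix


def aggregate(records, level="district"):
    """Count postcode instances per (year, area) or (year, district).

    `records` is a hypothetical iterable of (year, postcode) pairs.
    Postcodes that do not match the pattern are skipped.
    """
    counts = Counter()
    for year, postcode in records:
        parts = split_postcode(postcode)
        if parts is None:
            continue
        area, district = parts
        counts[(year, area if level == "area" else district)] += 1
    return counts


# Made-up sample records, for illustration only.
sample = [(1996, "WC1E 7HX"), (1996, "wc1e 7hx"), (2010, "M1 1AA"), (2010, "bad")]
by_district = aggregate(sample, level="district")
by_area = aggregate(sample, level="area")
```

Counting at the district level keeps more spatial detail, while the area level gives coarser but denser counts; which is appropriate would depend on the research question and on how sparse the data is for a given year.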
Derived datasets are also published under my name on figshare.