10.6084/m9.figshare.4104651.v1
Tim Sherratt
Tim
Sherratt
False positives (non-redactions) extracted from ASIO surveillance records in National Archives of Australia Series A6119
figshare
2016
computer vision
archives
ASIO
redactions
Digital Humanities
2016-10-27 12:03:51
Dataset
https://figshare.com/articles/dataset/False_positives_non-redactions_extracted_from_ASIO_surveillance_records_in_National_Archives_of_Australia_Series_A6119/4104651
This is a collection of false positives detected by a script written to extract redactions (blacked-out words and phrases) from surveillance files created by the Australian Security Intelligence Organisation (ASIO) and held in Series A6119 at the National Archives of Australia.

The digitised records were harvested from the NAA's online database as a collection of individual page images. These images were then processed to find and extract redactions.

After running the script, I manually removed the non-redactions, which are included in this fileset for comparison. The error rate was about 20%. See the linked fileset for more information.

The filenames of the images provide important contextual information. For example, 1009279-p10-1-74-54.jpg has the following attributes:

File barcode: 1009279
Page: 10
Redaction number: 1
Width: 74px
Height: 54px

You can search the NAA database to find the original file using the barcode, or construct a link like:

https://owebrowse.herokuapp.com/items/1009279/pages/10/

to view the page from which the redaction was extracted in my own experimental interface.
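The filename convention described above can be unpacked programmatically. The following is a minimal Python sketch, not the script used to build the dataset: the names `RedactionImage`, `parse_filename` and `page_url` are hypothetical, and the pattern assumes every filename follows the `barcode-pPAGE-NUMBER-WIDTH-HEIGHT.jpg` layout of the example.

```python
import re
from typing import NamedTuple


class RedactionImage(NamedTuple):
    """Attributes encoded in an image filename (hypothetical helper type)."""
    barcode: str   # NAA file barcode, kept as a string identifier
    page: int      # page number within the digitised file
    number: int    # index of the extracted region on that page
    width: int     # width in pixels
    height: int    # height in pixels


# Assumed layout: <barcode>-p<page>-<number>-<width>-<height>.jpg
FILENAME_PATTERN = re.compile(r"^(\d+)-p(\d+)-(\d+)-(\d+)-(\d+)\.jpg$")


def parse_filename(filename: str) -> RedactionImage:
    """Split a filename into its attributes, or raise ValueError if it does not match."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"Unexpected filename format: {filename!r}")
    barcode, page, number, width, height = match.groups()
    return RedactionImage(barcode, int(page), int(number), int(width), int(height))


def page_url(image: RedactionImage) -> str:
    """Build a link to the source page, following the URL pattern given in the description."""
    return f"https://owebrowse.herokuapp.com/items/{image.barcode}/pages/{image.page}/"


image = parse_filename("1009279-p10-1-74-54.jpg")
print(image)
print(page_url(image))
```

Keeping the barcode as a string avoids treating an identifier as a number; the other fields are converted to integers so they can be used for filtering, for example by width or height.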