Scholarly article citations in Wikipedia

2017-01-05T23:06:49Z (GMT) by Aaron Halfaker, Dario Taraborelli
<p>This dataset includes a list of citations to scholarly articles from the most recent version of Wikipedia.</p>
<p><strong>License</strong></p>
<p>All files included in this dataset are released under CC0: https://creativecommons.org/publicdomain/zero/1.0/</p>
<p><strong>Projects</strong></p>
<p>• English Wikipedia</p>
<p><strong>Identifiers</strong></p>
<p>• PubMed IDs (pmid) and PubMed Central IDs (pmcid)<br>• Digital Object Identifiers (doi)<br>• International Standard Book Numbers (isbn)<br>• arXiv IDs (arxiv)</p>
<p><strong>Format</strong></p>
<p>Each row in the dataset represents a citation as a (Wikipedia article, scholarly article) pair. Metadata about when the citation was first added is included.</p>
<p>• page_id -- The identifier of the Wikipedia article (int), e.g. <em>1325125</em><br>• page_title -- The title of the Wikipedia article (utf-8), e.g. <em>Club cell</em><br>• rev_id -- The Wikipedia revision where the citation was first added (int), e.g. <em>282470030</em><br>• timestamp -- The timestamp of the revision where the citation was first added (ISO 8601 datetime), e.g. <em>2009-04-08T01:52:20Z</em><br>• type -- The type of identifier, e.g. <em>pmid</em><br>• id -- The id of the cited scholarly article (utf-8), e.g. <em>18179694</em></p>
<p><strong>Source code</strong></p>
<p>https://github.com/halfak/Extract-scholarly-article-citations-from-Wikipedia (MIT licensed)</p>
<p><strong>Notes</strong></p>
<p>Citation identifiers are extracted as-is from Wikipedia article content. Our spot-checking suggests that 98% of identifiers resolve.</p>
<p>• Added ISBNs for the 20150205 dataset.<br>• Added arXiv IDs for the 20150602 dataset.</p>
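<p>The row format described above can be read with a few lines of Python. The sketch below is illustrative only: it assumes the files are tab-separated with a header row naming the six fields, which this description does not state, and the <code>Citation</code> class, the <code>read_citations</code> function and the in-memory <code>SAMPLE</code> are hypothetical names introduced for the example. The sample row is assembled from the example values in the field list.</p>

```python
import csv
import io
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterator, TextIO


@dataclass(frozen=True)
class Citation:
    """One (Wikipedia article, scholarly article) pair."""
    page_id: int         # identifier of the Wikipedia article
    page_title: str      # title of the Wikipedia article
    rev_id: int          # revision where the citation was first added
    timestamp: datetime  # time of that revision (UTC)
    type: str            # identifier type: pmid, pmcid, doi, isbn or arxiv
    id: str              # identifier, kept as text exactly as extracted


def read_citations(handle: TextIO) -> Iterator[Citation]:
    """Yield Citation records from a tab-separated file with a header row.

    Assumption: tab delimiter and no quoting; adjust if the files differ.
    """
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # The trailing "Z" marks UTC, so attach the UTC timezone explicitly.
        stamp = datetime.strptime(row["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
        yield Citation(
            page_id=int(row["page_id"]),
            page_title=row["page_title"],
            rev_id=int(row["rev_id"]),
            timestamp=stamp.replace(tzinfo=timezone.utc),
            type=row["type"],
            id=row["id"],
        )


# Hypothetical one-row sample built from the example values listed above.
SAMPLE = (
    "page_id\tpage_title\trev_id\ttimestamp\ttype\tid\n"
    "1325125\tClub cell\t282470030\t2009-04-08T01:52:20Z\tpmid\t18179694\n"
)

citations = list(read_citations(io.StringIO(SAMPLE)))
counts_by_type = Counter(c.type for c in citations)
print(citations[0].page_title, citations[0].timestamp.year, dict(counts_by_type))
```

<p>Identifiers are kept as strings rather than converted to integers because DOIs, ISBNs and arXiv IDs are not purely numeric, and because the identifiers are extracted as-is from article content.</p>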