Accessibility and topics of citations with identifiers in Wikipedia

2018-07-16T10:44:05Z (GMT) by Miriam Redi, Dario Taraborelli
<h3>About</h3><h3><sup>An extended version of the "Citation with Identifiers" dataset [1], with topic and accessibility information</sup></h3>
<div>Where possible, we enriched each publication cited in Wikipedia through an identifier with 1) its topic and 2) information about its accessibility (e.g. whether it is open access).</div>
<h3>Topics</h3><p>Where possible, we assigned a topic to each publication based on the main topics of the Wikipedia pages that cite it. See [2] for details on how topics are inferred.</p>
<h3>Accessibility</h3><p>We mark each publication that has a DOI as <i>Open Access</i> or <i>Closed Access</i> by merging our data with Unpaywall [3] data; the exact labels are described under <b>open_access</b> below.</p>
<h3>Data Format</h3><div>There is one file per Wikipedia language edition (e.g. English Wikipedia, Farsi Wikipedia). Each line contains the following tab-separated values:</div><ul>
<li><b>page_id</b> - the ID of the Wikipedia page citing the publication, see [4]</li>
<li><b>page_name</b> - the title of the Wikipedia page citing the publication</li>
<li><b>revision_id</b> - the ID of the revision in which the citation was added, see [4]</li>
<li><b>timestamp</b> - the time at which that revision was saved</li>
<li><b>publication_type</b> - the type of identifier used for the cited publication, one of <i>isbn</i>, <i>doi</i>, <i>pmid</i>, <i>pmc</i>, <i>arxiv</i></li>
<li><b>publication_id</b> - the identifier of the publication; its format depends on <b>publication_type</b></li>
<li><b>topic</b> - the publication topic, inherited from the pages that cite it.
It takes one of the values at the top level of the WikiProjects hierarchy [5].</li>
<li><b>open_access</b> - the access status of the publication: <em>Open</em> if the canonical version is open access at the source (journal); <em>Available</em> if the canonical version at the source is behind a paywall but an open access copy is available at a different location; <em>Closed</em> if the canonical version at the source is behind a paywall and no open access copy was identified at a different location</li>
<li><b>open_access_url</b> - the URL of the open access version of the publication, if <b>open_access</b> is <em>Open</em> or <em>Available</em></li></ul>
<h3>Visualizations</h3><div>Visualizations summarizing the data in this repository can be found in this notebook: <a href="" rel="nofollow">visualizations</a></div>
<p>[1]</p><p>[2]</p><p>[3]</p><p>[4]</p><p>[5]</p>
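<p>The per-edition files can be loaded with a few lines of Python. The sketch below is a minimal, unofficial reader. It assumes that the columns appear in the order listed under Data Format, that a missing topic or access value is an empty field, and that a header row, if present, starts with <code>page_id</code>; the helper names <code>read_citations</code> and <code>access_counts</code> are hypothetical. It also tallies citations by <b>open_access</b> status, comparing values case-insensitively because the exact capitalization used in the files is not stated here.</p>

```python
import csv
import io
from collections import Counter
from typing import Iterable, Iterator, NamedTuple, Optional, TextIO

# Column order as listed in the Data Format section (assumed to match the files).
FIELDS = (
    "page_id", "page_name", "revision_id", "timestamp",
    "publication_type", "publication_id", "topic",
    "open_access", "open_access_url",
)


class Citation(NamedTuple):
    page_id: int
    page_name: str
    revision_id: int
    timestamp: str             # kept as text; the timestamp format is not specified
    publication_type: str      # isbn, doi, pmid, pmc or arxiv
    publication_id: str        # format depends on publication_type
    topic: Optional[str]
    open_access: Optional[str]
    open_access_url: Optional[str]


def _opt(value: str) -> Optional[str]:
    # ASSUMPTION: a missing topic or access value is an empty field.
    return value or None


def read_citations(stream: TextIO) -> Iterator[Citation]:
    """Yield one Citation per tab-separated line of a per-edition file."""
    # QUOTE_NONE keeps quote characters in page titles literally.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row or row[0] == "page_id":  # skip blank lines and a header row, if any
            continue
        # ASSUMPTION: trailing empty fields may be missing, so pad the row.
        row += [""] * (len(FIELDS) - len(row))
        rec = dict(zip(FIELDS, row))
        yield Citation(
            page_id=int(rec["page_id"]),
            page_name=rec["page_name"],
            revision_id=int(rec["revision_id"]),
            timestamp=rec["timestamp"],
            publication_type=rec["publication_type"],
            publication_id=rec["publication_id"],
            topic=_opt(rec["topic"]),
            open_access=_opt(rec["open_access"]),
            open_access_url=_opt(rec["open_access_url"]),
        )


def access_counts(citations: Iterable[Citation]) -> Counter:
    """Count citations per access status; rows without one count as 'unknown'."""
    return Counter((c.open_access or "unknown").lower() for c in citations)


if __name__ == "__main__":
    # Two made-up rows for illustration only (not taken from the dataset).
    sample = io.StringIO(
        "1\tPage A\t10\t2018-01-01T00:00:00Z\tdoi\t10.1000/a\tExampleTopic\tOpen\thttps://example.org/a.pdf\n"
        "2\tPage B\t20\t2018-01-02T00:00:00Z\tisbn\t9780000000000\t\t\t\n"
    )
    rows = list(read_citations(sample))
    print(access_counts(rows))  # Counter({'open': 1, 'unknown': 1})
```

<p>To read an actual file, pass an open text stream such as <code>open(path, encoding="utf-8", newline="")</code>; the path and the UTF-8 encoding are assumptions. A generator is used so that large editions can be processed line by line without loading the whole file into memory.</p>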
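<p>The three <b>open_access</b> labels follow a simple decision rule. The sketch below encodes that rule as stated in the field description. The two boolean inputs and the names <code>Access</code> and <code>classify_access</code> are a hypothetical simplification: this description does not say how the underlying flags are derived from the Unpaywall [3] records.</p>

```python
from enum import Enum


class Access(Enum):
    """The three open_access labels described in the Data Format section."""
    OPEN = "Open"            # canonical version is open access at the source (journal)
    AVAILABLE = "Available"  # paywalled at the source, open access copy found elsewhere
    CLOSED = "Closed"        # paywalled at the source, no open access copy identified


def classify_access(open_at_source: bool, open_copy_elsewhere: bool) -> Access:
    """Label one publication.

    HYPOTHETICAL inputs: how these two flags are obtained from Unpaywall
    data is not described in this dataset documentation.
    """
    if open_at_source:
        # The canonical version takes precedence even if other open copies exist.
        return Access.OPEN
    if open_copy_elsewhere:
        return Access.AVAILABLE
    return Access.CLOSED


if __name__ == "__main__":
    print(classify_access(False, True).value)  # Available
```

<p>The order of the checks matters: a publication that is open at the source is labelled <em>Open</em> regardless of whether copies also exist elsewhere, so <em>Available</em> only applies to paywalled canonical versions.</p>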
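<p>The idea of a publication inheriting its topic from the pages that cite it can be illustrated with a majority vote. This is only a sketch under assumed rules (each citing page contributes one topic, and ties are left unassigned); the helper <code>inherit_topics</code> is hypothetical, and the procedure actually used for this dataset is the one described in [2].</p>

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def inherit_topics(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Optional[str]]:
    """Map each publication_id to the most frequent topic among its citing pages.

    `pairs` holds one (publication_id, citing_page_topic) tuple per citation.
    ASSUMPTION: simple majority vote with ties left unassigned (None);
    this is not necessarily the method described in [2].
    """
    votes: Dict[str, Counter] = {}
    for pub_id, page_topic in pairs:
        votes.setdefault(pub_id, Counter())[page_topic] += 1

    topics: Dict[str, Optional[str]] = {}
    for pub_id, counts in votes.items():
        top = counts.most_common(2)
        if len(top) == 2 and top[0][1] == top[1][1]:
            topics[pub_id] = None  # tie: no single main topic
        else:
            topics[pub_id] = top[0][0]
    return topics


if __name__ == "__main__":
    # Made-up identifiers and topic names, for illustration only.
    demo = [("doi:a", "TopicA"), ("doi:a", "TopicA"), ("doi:a", "TopicB"),
            ("doi:b", "TopicA"), ("doi:b", "TopicB")]
    print(inherit_topics(demo))  # {'doi:a': 'TopicA', 'doi:b': None}
```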