parsed_links_genomics_usability.csv (4.17 MB)

Dataset: Analysis of the installability and archival stability of omics computational tools - Parsed http link status

posted on 2019-05-20, 10:35 authored by Serghei Mangul, Thiago Mosqueiro, Dat Duong, Keith Mitchell, Varuni Sarwal, Brian Hill, Jaqueline Brito, Russell Jared Littman, Benjamin Statz, Angela Ka-Mei Lam, Gargi Dayama, Laura Grieneisen, Lana S. Martin, Jonathan Flint, Eleazar Eskin, Ran Blehkman
Using a web scraper, we checked the status of the HTTP links extracted from scientific papers.

The data schema is detailed below:
* Type - The part of the manuscript in which the link was found.
* Journal - Title of the journal where the paper was published.
* Id - PubMed's primary identifier for the paper.
* Year - When the paper was published.
* link - URL parsed from the manuscript text.
* code - HTTP status code returned when trying to access the link.
* status - Simplified status category derived from the HTTP code.
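
As a minimal sketch of reading this schema in Python, assuming the CSV has a header row whose column names match the list above exactly (this has not been verified against the file). The `LinkRecord` class, the `read_records` function, and the sample row are hypothetical and only illustrate the layout. The `code` field is kept as text here on the assumption that it may be empty or non-numeric for links that timed out.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class LinkRecord:
    """One row of the dataset, following the schema listed above."""
    type: str     # part of the manuscript where the link was found
    journal: str  # journal title
    id: str       # PubMed identifier of the paper
    year: int     # publication year
    link: str     # URL parsed from the manuscript text
    code: str     # raw HTTP code, kept as text (assumption: may be empty)
    status: int   # simplified status category


def read_records(handle):
    """Parse rows from an open CSV handle into LinkRecord objects."""
    reader = csv.DictReader(handle)
    return [
        LinkRecord(
            type=row["Type"],
            journal=row["Journal"],
            id=row["Id"],
            year=int(row["Year"]),
            link=row["link"],
            code=row["code"],
            status=int(row["status"]),
        )
        for row in reader
    ]


# Made-up sample row for illustration; not taken from the dataset.
SAMPLE = (
    "Type,Journal,Id,Year,link,code,status\n"
    "abstract,Example Journal,12345678,2015,http://example.org/tool,200,1\n"
)
records = read_records(io.StringIO(SAMPLE))
```

To read the real file, the same function could be called on `open("parsed_links_genomics_usability.csv", newline="")` instead of the in-memory sample.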

For status, we followed the convention below.
* -1 = Access timed out.
* 1 = Accessed successfully.
* 3 = Link was redirected, but still accessible.
* 4 = Error, content not found.
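
The convention above can be applied when summarizing the file. The following is a sketch only: it assumes a header row with a `status` column stored as an integer, and the `STATUS_LABELS` mapping, the `count_statuses` function, and the sample rows are hypothetical names and made-up data, not part of the dataset.

```python
import csv
import io
from collections import Counter

# Labels follow the status convention listed above.
STATUS_LABELS = {
    -1: "timed out",
    1: "accessible",
    3: "redirected",
    4: "not found",
}


def count_statuses(handle):
    """Count rows per status label from an open CSV handle."""
    counts = Counter()
    for row in csv.DictReader(handle):
        status = int(row["status"])
        # Values outside the documented convention are kept visible.
        label = STATUS_LABELS.get(status, f"unknown ({status})")
        counts[label] += 1
    return counts


# Made-up rows for illustration; not taken from the dataset.
SAMPLE = (
    "Type,Journal,Id,Year,link,code,status\n"
    "abstract,Example Journal,1,2010,http://example.org/a,200,1\n"
    "body,Example Journal,2,2011,http://example.org/b,404,4\n"
    "body,Example Journal,3,2012,http://example.org/c,301,3\n"
    "abstract,Example Journal,4,2013,http://example.org/d,,-1\n"
    "body,Example Journal,5,2014,http://example.org/e,200,1\n"
)
counts = count_statuses(io.StringIO(SAMPLE))
```

Grouping by label rather than by raw HTTP code keeps redirected links separate from directly accessible ones, which matches the distinction the convention draws between statuses 1 and 3.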

