Wikidata: a central hub of linked open life science data

Data in the life sciences are abundant but dispersed across many different resources. Before research can begin, these resources often need to be integrated. Although the Semantic Web has been proposed as a solution for rapid knowledge integration, most data remain in separate silos that expand continually, worsening the integration challenge.

In the last decade, Wikipedia has become one of the most important sources of information on the web, and it thrives on its community for curation. One of its partner projects is Wikidata, a free and public linked database built on the same principles of community curation.

Here we report on our effort to make Wikidata a central hub for linked open life science data. Doing so not only provides a linked data platform for these data, but also lets the wider Wikidata community curate and scrutinize the different data sources. Our plan is to (1) develop bots that publish knowledge from established data sources on genes, diseases and drugs on Wikidata, (2) harvest links between these entities and enrich the respective Wikidata entities with these relations, and (3) engage the community in curating this knowledge by developing applications that disseminate the content to a wider audience. The first milestones are Wikidata entries for all human genes from Entrez Gene and for the diseases from the Disease Ontology. Within days of the first publication of these entries, the curation power of Wikidata became visible through valuable improvements made by the community. Our next goals are to add gene-disease, drug-disease and gene-drug relationships.
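As a rough illustration of step (1), the sketch below shows how a bot might turn one gene record into a Wikibase-style JSON payload before submitting it to Wikidata. It is not the implementation used in this work. The input record layout and the helper names (`string_claim`, `item_claim`, `gene_to_entity`) are hypothetical, and the property and item identifiers (P31, P351, P703, Q7187, Q15978631) are assumptions that should be checked against Wikidata before any real use. The sketch only builds the payload and does not contact Wikidata.

```python
import json

# Wikidata identifiers assumed for illustration; verify before real use.
P_INSTANCE_OF = "P31"        # "instance of"
P_ENTREZ_GENE_ID = "P351"    # "Entrez Gene ID"
P_FOUND_IN_TAXON = "P703"    # "found in taxon"
Q_GENE = 7187                # item for "gene"
Q_HOMO_SAPIENS = 15978631    # item for "Homo sapiens"


def string_claim(prop, value):
    """Build a statement whose value is a plain string (e.g. an external ID)."""
    return {
        "mainsnak": {
            "snaktype": "value",
            "property": prop,
            "datavalue": {"value": value, "type": "string"},
        },
        "type": "statement",
        "rank": "normal",
    }


def item_claim(prop, numeric_id):
    """Build a statement whose value is another Wikidata item."""
    return {
        "mainsnak": {
            "snaktype": "value",
            "property": prop,
            "datavalue": {
                "value": {"entity-type": "item", "numeric-id": numeric_id},
                "type": "wikibase-entityid",
            },
        },
        "type": "statement",
        "rank": "normal",
    }


def gene_to_entity(record):
    """Convert a hypothetical gene record (dict) into an entity payload."""
    return {
        "labels": {"en": {"language": "en", "value": record["symbol"]}},
        "descriptions": {
            "en": {"language": "en", "value": record["description"]}
        },
        "claims": [
            item_claim(P_INSTANCE_OF, Q_GENE),
            item_claim(P_FOUND_IN_TAXON, Q_HOMO_SAPIENS),
            string_claim(P_ENTREZ_GENE_ID, str(record["entrez_id"])),
        ],
    }


if __name__ == "__main__":
    # Illustrative record only; a real bot would read these from Entrez Gene.
    example = {"symbol": "CDK2", "description": "human gene", "entrez_id": 1017}
    print(json.dumps(gene_to_entity(example), indent=2))
```

Keeping payload construction separate from submission means the generated statements can be inspected, or compared against an existing item, before anything is written to Wikidata.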

License

CC BY 4.0