10 steps to integrate CIViCdb with other public data in Wikidata

Version 2 2017-03-26, 13:51

Version 1 2017-03-26, 13:42

dataset

posted on 2017-03-26, 13:51 authored by Elvira Mitraka, Andra Waagmeester, Núria Queralt-Rosinach, Sebastian Burgstaller-Muehlbacher, Gregory Stupp, Lynn Schriml, Josh F. McMichael, Benjamin Ainscough, Malachi Griffith, Obi L. Griffith, Andrew I. Su, Benjamin M. Good

Precision medicine has shifted our understanding of the etiology and treatment of cancer from a focus on anatomical features to a focus on molecular features. A patient's genetic fingerprint can be a determining factor in both the onset and the treatment of the disease. However, the etiological network of a specific disease comprises very diverse factors, from genetic to environmental. With such diverse knowledge comes a diverse data infrastructure: data are scattered across silos and stored in different formats and structures. This poses a serious bottleneck when interpreting data in a clinical or research setting.

CIViC (http://www.civicdb.org) is an open-access, community-based, highly curated cancer variant database. It is a platform where data on cancer genomic alterations from different sources are curated and interpreted for clinical application. These interpretations, together with their supporting evidence, are captured and stored as structured data in the public domain. To reach an even broader audience, an effort was made to include CIViC's data in Wikidata. Wikidata holds structured data and feeds it into Wikipedia and other Wikimedia projects. It shares the traits of Wikipedia (open-access, editable, community-driven) and is accessible to both humans and machines. Although Wikidata originated in the Wikipedia context, its use is not limited to Wikipedia: its open APIs allow broader application.
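As an illustration of the machine access described above, the following is a minimal sketch of how a client might address the Wikidata API. It is not taken from the dataset itself. It assumes the standard MediaWiki endpoint at https://www.wikidata.org/w/api.php and its `wbgetentities` action; the helper names `entity_url` and `english_label`, and the example item Q42, are hypothetical choices made for this sketch.

```python
from urllib.parse import urlencode

# Public MediaWiki API endpoint of Wikidata.
WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def entity_url(entity_id, languages=("en",)):
    """Build a wbgetentities request URL for one item (hypothetical helper)."""
    params = {
        "action": "wbgetentities",
        "ids": entity_id,
        "props": "labels|claims",
        "languages": "|".join(languages),
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def english_label(response, entity_id):
    """Return the English label from a parsed wbgetentities response, or None."""
    entity = response.get("entities", {}).get(entity_id, {})
    return entity.get("labels", {}).get("en", {}).get("value")


if __name__ == "__main__":
    # Q42 is used only as a well-known example item, not as CIViC content.
    print(entity_url("Q42"))
```

The sketch only builds the request and reads a parsed response; fetching the URL with any HTTP client and decoding the JSON body would supply the `response` dictionary.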

Adding public domain datasets to Wikidata benefits both sides. In this case, Wikidata gains additional content from a highly curated resource, while CIViC gains exposure to a wider audience, the ability to link to other data types and domains (e.g. drugs), and the benefits of Wikidata's role as a hub on the Semantic Web, which allows complex queries to be performed. We report the process involved in linking CIViC to Wikidata. This led to eight new Wikidata relations and a model to capture provenance. The resulting statements are built upon common standards drawn from ontologies and other resources. The success of the data integration shows that different data models can work together without loss of information, and we invite other resources to follow.
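To give a sense of the kind of query that becomes possible once CIViC content is in Wikidata, here is a minimal sketch, not the authors' own code. It assumes the public SPARQL endpoint at https://query.wikidata.org/sparql and that the Wikidata property for CIViC variant identifiers is P3329; that property ID should be checked against Wikidata before use. The function names are hypothetical.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"
# Assumption: P3329 is the "CIViC variant ID" property; verify on Wikidata.
CIVIC_VARIANT_ID = "P3329"


def build_variant_query(limit=10):
    """Return a SPARQL query listing items that carry a CIViC variant ID."""
    return (
        "SELECT ?variant ?variantLabel ?civicId WHERE {\n"
        f"  ?variant wdt:{CIVIC_VARIANT_ID} ?civicId .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def query_url(sparql):
    """Build a GET URL that asks the endpoint for JSON results."""
    return f"{SPARQL_ENDPOINT}?{urlencode({'query': sparql, 'format': 'json'})}"


def rows(result):
    """Flatten a SPARQL JSON result into a list of {variable: value} dicts."""
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in result["results"]["bindings"]
    ]


if __name__ == "__main__":
    print(build_variant_query(5))
```

Extending the `WHERE` clause with further triple patterns (for example, linking a variant to a gene or drug item) is what would turn this listing into the cross-domain queries the text refers to; which properties to use for that is left open here.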

Funding

GM089820 and GM114833

Keywords

wikidata, gene variants, Bioinformatics

Licence

CC0
