Standardizing Pathway Entries to Wikidata
Wikidata is a free, collaborative database that collects structured data from a wide variety of sources. Wikidata items are built from statements. A statement is expressed as a triple, similar to an RDF triple, and is embedded in a set of qualifiers and references that provide its provenance. The Wikidata statements and their provenance are (almost) continually converted to an RDF model, which is made available through the Wikidata Query Service (a SPARQL endpoint). This approach means that Wikidata is easily queried, and the range of data allows links to be established between items from different fields. For example, a query for any item citing a given scientific article returns not only other scientific articles but also any other data item that references the article. Here we present the collaborative effort between several groups to enter data on biological pathways into Wikidata in a standard fashion, so that users can query several databases with a single Wikidata query. Initial data from both Reactome [http://reactome.org/] and WikiPathways [http://wikipathways.org] have been added using a data model designed to capture what pathway resources have in common. Work is now progressing to establish and improve links between the data entries and to produce a standard format that will facilitate the addition of further pathway information to Wikidata. With a harmonized data model and sufficient coverage of multiple pathway resources, the platform can be used to query and enrich pathway information with a single query, including information provided by non-pathway resources. For examples, see: https://www.wikidata.org/wiki/User:Pathwaybot/query_examples
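As a rough illustration of the kind of query described above, the following Python sketch builds two SPARQL queries for the Wikidata Query Service: one listing pathway items that carry a WikiPathways or Reactome identifier, and one listing items that cite a given article. The property and item identifiers used here (P31 "instance of", Q4915012 "biological pathway", P2410 "WikiPathways ID", P3937 "Reactome ID", P2860 "cites work") are assumptions about how the data are modelled and should be checked against Wikidata before use; the function names and the User-Agent string are hypothetical.

```python
import json
import re
import urllib.parse
import urllib.request

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_pathway_query(limit=10):
    """Return a SPARQL query for pathway items from either source.

    Assumed identifiers: P31 (instance of), Q4915012 (biological pathway),
    P2410 (WikiPathways ID), P3937 (Reactome ID).
    """
    return f"""
SELECT ?pathway ?pathwayLabel ?wpId ?reactomeId WHERE {{
  ?pathway wdt:P31 wd:Q4915012 .
  OPTIONAL {{ ?pathway wdt:P2410 ?wpId . }}
  OPTIONAL {{ ?pathway wdt:P3937 ?reactomeId . }}
  FILTER(BOUND(?wpId) || BOUND(?reactomeId))
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
LIMIT {int(limit)}
""".strip()


def build_citing_query(article_qid, limit=10):
    """Return a SPARQL query for any item citing the given article.

    Assumed identifier: P2860 (cites work). article_qid is a Wikidata
    item ID such as "Q12345" (a placeholder, not a real reference).
    """
    if not re.fullmatch(r"Q\d+", article_qid):
        raise ValueError("expected a Wikidata item ID such as 'Q12345'")
    return f"""
SELECT ?item ?itemLabel WHERE {{
  ?item wdt:P2860 wd:{article_qid} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
LIMIT {int(limit)}
""".strip()


def run_query(query, endpoint=WDQS_ENDPOINT):
    """Send a query to the endpoint and return the JSON result bindings."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        url, headers={"User-Agent": "pathway-query-sketch/0.1 (illustrative example)"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]
```

Calling `run_query(build_pathway_query(limit=5))` would submit the first query over HTTP; the results depend on the current state of Wikidata.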
It should be noted when using these queries that, as yet, not all Reactome data have been exported to Wikidata. Wikidata supports SPARQL 1.1, which makes federated queries possible: a single query can retrieve data from multiple SPARQL endpoints. See this blog post for examples: http://sulab.org/2017/07/integrating-wikidata-and-other-linked-data-sources-federated-sparql-queries/.
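A federated query uses the SPARQL 1.1 `SERVICE` keyword to delegate part of the graph pattern to a remote endpoint. The sketch below builds such a query in Python, joining Wikidata items to a remote pathway endpoint on a shared identifier. The remote endpoint URL, the `wp:Pathway` class, the use of `dcterms:identifier` and `dc:title`, and the property P2410 (WikiPathways ID) are all assumptions made for illustration; the remote endpoint must also be one that the Wikidata Query Service permits for federation, and the join only succeeds if the identifier literals match on both sides.

```python
def build_federated_query(
    remote_endpoint="http://sparql.wikipathways.org/sparql", limit=10
):
    """Return a federated SPARQL query to be sent to the Wikidata Query Service.

    The outer pattern runs on Wikidata; the SERVICE block is evaluated by
    the remote endpoint. Endpoint URL and vocabulary terms are assumptions.
    """
    return f"""
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?item ?wpId ?title WHERE {{
  # Evaluated on Wikidata: items with a WikiPathways identifier (assumed P2410).
  ?item wdt:P2410 ?wpId .
  # Evaluated remotely: pathways with the same identifier and their titles.
  SERVICE <{remote_endpoint}> {{
    ?pathway a wp:Pathway ;
             dcterms:identifier ?wpId ;
             dc:title ?title .
  }}
}}
LIMIT {int(limit)}
""".strip()
```

The resulting string is an ordinary query from the client's point of view: it is submitted to one endpoint, which contacts the other on the client's behalf.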
Wikidata is not a replacement for either Reactome or WikiPathways; it acts as a proxy between the two resources and, to a larger extent, other data resources, by providing a unified interface. To maintain its role as a hub of scientific data, regular updates from the primary sources are essential. Direct links and references to the original sources are also stored, allowing direct access. Harmonized data models create the mechanism to proxy through various scientific data sources. WikiPathways and Reactome are now available in a unified data model, and we invite you to join our efforts.