Leveraging Wikidata for crowd curation

2016-04-12T11:54:25Z (GMT) by Andra Waagmeester
The process of creating, maintaining, updating, and integrating biological databases (biocuration) is, by all accounts, time- and resource-intensive. Because of this, many within the biocuration community are seeking help from the communities that they serve to extend the scope of what they can accomplish. In principle, such 'community curation' scales well with the rate of knowledge production in science, yet it has proven highly difficult to achieve. In fact, it is fair to say that most wiki-based projects have failed to attract a critical mass of contributors large enough to make them effective. One approach that has proven successful is to leverage the very large, pre-existing editing community of Wikipedia. Until now, Wikipedia has provided the means to gather contributions of unstructured, textual content (e.g. reviews of gene function), but it has offered no effective means of crowdsourcing the generation or verification of the structured data that is the principal interest of the biocuration community. With the recent introduction of Wikidata, this has changed.
Wikidata is a new Semantic Web-compatible database, operated by the Wikimedia Foundation as a means to manage the links between articles in the 290+ language editions of Wikipedia and as a way to structure the information used within those articles. Yet it can be much more than a backend database for the Wikipedias. It is openly editable - by humans and machines - and can contain anything that is of interest. Here, we suggest that the biocuration community can use Wikidata as (1) an outreach channel, (2) a platform for data integration and (3) a centralized resource for community curation.

Our group has begun integrating data about drugs, diseases, and genes from a variety of authoritative resources directly into the (SPARQL-accessible) Wikidata knowledge base. The scripts that do this are developed and applied in close collaboration with curators. They run in frequent update cycles, in which source data is compared with Wikidata and Wikidata is updated. Input from Wikidata (i.e. disagreement with other sources or users) is in turn looped back to the curators.
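The comparison step of such an update cycle might look roughly like the sketch below. It is an illustration under stated assumptions, not the group's actual scripts: it assumes that records from a source database and from Wikidata have already been loaded into dictionaries keyed by a shared identifier, and the names `Discrepancy` and `compare_records`, as well as the sample records, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Discrepancy:
    """One disagreement between the source database and Wikidata (hypothetical type)."""
    identifier: str
    field: str
    source_value: str
    wikidata_value: str


def compare_records(source, wikidata):
    """Compare two {identifier: {field: value}} mappings.

    Returns (missing, discrepancies):
      missing        identifiers present in the source but absent from Wikidata,
                     i.e. candidates for a scripted addition
      discrepancies  fields where both sides hold a value and the values differ,
                     i.e. candidates to report back to the curators
    """
    missing = sorted(set(source) - set(wikidata))
    discrepancies = []
    for identifier in sorted(set(source) & set(wikidata)):
        for field, source_value in sorted(source[identifier].items()):
            wikidata_value = wikidata[identifier].get(field)
            # A field that Wikidata lacks is not a disagreement, so skip it here.
            if wikidata_value is not None and wikidata_value != source_value:
                discrepancies.append(
                    Discrepancy(identifier, field, source_value, wikidata_value)
                )
    return missing, discrepancies


# Placeholder records for illustration only; these are not real database entries.
source_records = {
    "gene:1": {"symbol": "ABC1", "name": "example gene one"},
    "gene:2": {"symbol": "ABC2", "name": "example gene two"},
    "gene:3": {"symbol": "ABC3", "name": "example gene three"},
}
wikidata_records = {
    "gene:1": {"symbol": "ABC1", "name": "example gene one"},
    "gene:2": {"symbol": "ABC2-old", "name": "example gene two"},
}

missing, discrepancies = compare_records(source_records, wikidata_records)
for item in discrepancies:
    print(f"{item.identifier} {item.field}: source={item.source_value!r} "
          f"wikidata={item.wikidata_value!r}")
```

Keeping the comparison separate from any write step means a disagreement can be shown to a curator before anything is changed on either side, which matches the feedback loop described above.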
As a next step, we are looking for more systematic approaches to capture input from the Wikidata community (and through it the Wikipedia community) that can be fed back to the curators of the data sources that are being integrated. We look forward to working with more curation teams to develop mature and effective curation cycles, leveraging the full potential of both professional and amateur curators worldwide.



CC BY 4.0