Wikidata as an intuitive resource towards semantic data modeling in data FAIRification

Jacobsen, Annika; Waagmeester, Andra; Kaliyaperumal, Rajaram; Stupp, Gregory S.; Schriml, Lynn M.; Thompson, Mark; Su, Andrew I.; Roos, Marco

doi:10.6084/m9.figshare.7415282.v2

SWAT4HCLS2018shortpaper_Jacobsenetal.pdf (476.95 kB)

Version 2 2018-12-04, 13:40

Version 1 2018-12-04, 08:48

journal contribution

posted on 2018-12-04, 13:40 authored by Annika Jacobsen, Andra Waagmeester, Rajaram Kaliyaperumal, Gregory S. Stupp, Lynn M. Schriml, Mark Thompson, Andrew I. Su, Marco Roos

Data with a comprehensible structure and context is easier to reuse and integrate with other data. The guidelines for FAIR (Findable, Accessible, Interoperable, Reusable) data for humans and computers provide handles to transform data existing in silos into well-connected knowledge graphs (linked data). Semantic data models are key in this transformation and describe the logical structure of the data and the relationships between the data entities. This description is provided through IRIs (Internationalized Resource Identifiers) which link to existing ontologies and controlled vocabularies. Creating a semantic data model is a labour-intensive process, which requires a solid understanding of the selected domains and the applicable ontologies. Moreover, in order to achieve a useful degree of Interoperability between datasets, either the datasets need to use the same (set of) ontologies, or the ontologies themselves need to be aligned and mapped. The former requires implementation of extensive (social) processes to achieve consensus, while the latter requires relatively advanced semantic engineering. We argue that this poses a significant obstacle for (otherwise capable) novice data modelers and even experienced data stewards.

Here, we propose that Wikidata can be used as an intuitive resource for resolvable IRIs both for teaching and studying semantic data modeling. In this way Wikidata serves as a hub in the linked data cloud connecting different but similar ontologies. We elaborate current problems and how Wikidata can be used to tackle these. As an example we describe two genetic variant models, one generated in a workshop and one generated using Wikidata. This shows how Wikidata can be instrumental in mapping similar concepts in different ontologies in a way that can benefit FAIR data stewardship processes in education and research.
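
As a rough illustration of the hub idea, and not the procedure used in the paper, the sketch below shows how a Wikidata item ID can be turned into a resolvable IRI and how a SPARQL query could list the external ontology IRIs that the item declares through the Wikidata property "exact match" (P2888). The function names `wikidata_iri` and `exact_match_query` are hypothetical helpers introduced here; the sketch assumes the query is sent to the Wikidata Query Service, where the `wd:` and `wdt:` prefixes are predefined.

```python
# Illustrative sketch only: helper names are hypothetical, not from the paper.
WIKIDATA_ENTITY = "http://www.wikidata.org/entity/"


def wikidata_iri(qid: str) -> str:
    """Return the resolvable IRI for a Wikidata item ID of the form 'Q<digits>'."""
    if not (qid.startswith("Q") and qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    return WIKIDATA_ENTITY + qid


def exact_match_query(qid: str) -> str:
    """Build a SPARQL query for the external IRIs that the item links to
    via 'exact match' (P2888). Assumes the wd:/wdt: prefixes that the
    Wikidata Query Service predefines."""
    wikidata_iri(qid)  # reuse the validation; raises ValueError on bad input
    return (
        "SELECT ?external WHERE { "
        f"wd:{qid} wdt:P2888 ?external . "
        "}"
    )
```

The query text is only constructed, not executed, so the sketch runs offline. A data steward could paste the result into the query service to see which ontology terms a candidate item is mapped to before choosing an IRI for a model.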