3 files

Self-contained ground-truths for cross-domain linkage

dataset
posted on 2016-04-28, 04:48, authored by Mayank Kejriwal

Cross-domain knowledge bases such as DBpedia, Freebase and YAGO have emerged as encyclopedic hubs in the Web of Linked Data. Although such graphs enable several practical applications in the Semantic Web, their large-scale, schema-free nature often precludes research groups from using them widely as evaluation test cases for entity resolution and instance-based ontology alignment applications. The ground-truth linkages between the three knowledge bases above are available, but they are not amenable to resource-limited applications. One reason is that the ground-truth files are not self-contained, meaning that a researcher must usually perform a series of expensive joins (typically in MapReduce) to obtain usable information sets.


We constructed this resource by uploading several publicly licensed data resources to the public cloud and using simple Hadoop clusters to compile, and make accessible, three cross-domain, self-contained test cases involving linked instances from DBpedia, Freebase and YAGO. Self-containment is enabled by a simple NoSQL, JSON-like serialization format. Potential applications for these resources, particularly for testing transfer learning research hypotheses, are described in more detail in a paper submitted to the resource track at ISWC 2016.
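
To illustrate what self-containment means in practice, the following is a minimal Python sketch of the join that a self-contained file spares the user. It is not the serialization used in the dataset: the function name `make_self_contained`, the field names (`source`, `target`, `uri`, `properties`) and the toy URIs and property values are all assumptions made for illustration. The idea is only that each linked pair is stored together with the properties of both instances, one JSON record per line, so no further lookup in the original knowledge bases is needed.

```python
import json


def make_self_contained(links, props_a, props_b):
    """Join each ground-truth link with the properties of both instances.

    links   : list of (uri_a, uri_b) pairs (the bare ground truth)
    props_a : dict mapping uri_a -> {property: [values]}
    props_b : dict mapping uri_b -> {property: [values]}

    The record layout below is hypothetical, not the dataset's actual schema.
    """
    records = []
    for uri_a, uri_b in links:
        records.append({
            "source": {"uri": uri_a, "properties": props_a.get(uri_a, {})},
            "target": {"uri": uri_b, "properties": props_b.get(uri_b, {})},
        })
    return records


# Toy, made-up inputs standing in for a link file and two property dumps.
links = [("dbp:Berlin", "yago:Berlin")]
props_a = {"dbp:Berlin": {"label": ["Berlin"], "country": ["dbp:Germany"]}}
props_b = {"yago:Berlin": {"label": ["Berlin"], "isLocatedIn": ["yago:Germany"]}}

records = make_self_contained(links, props_a, props_b)

# One JSON object per line: each line can be read and used on its own.
lines = [json.dumps(record, sort_keys=True) for record in records]
for line in lines:
    print(line)
```

At the scale of the full knowledge bases, this join is the expensive step that would typically run in MapReduce; a consumer of a self-contained file only needs to parse each line with `json.loads`.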
