These datasets were generated from the mutation dataset of the COSMIC database (GRCh37, version 90) to evaluate available ontology-based data integration engines. They vary in the number of records (10k, 100k, 1 million, and 10 million), the number of attributes (2-15), and the amount of duplication (25-75 percent of records are duplicated, with each duplicated value repeated 10 or 20 times).
Example mapping rules for integrating these datasets are also available at https://github.com/SDM-TIB/SDM-RDFizer-Experiments/tree/master/cikm2020/experiments/mappings
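
As a rough illustration of the duplication parameters described above, the following Python sketch computes the share of duplicated records in a table and how often each duplicated record repeats. It assumes that a "duplicate" means two rows agreeing on every attribute, which the description does not state explicitly. The function name `duplicate_stats` and the sample rows are hypothetical and are not taken from the COSMIC data.

```python
from collections import Counter
import csv
import io


def duplicate_stats(rows):
    """Return (share of records whose full row occurs more than once,
    sorted list of distinct repetition counts among duplicated rows).

    Assumption: two records are duplicates when all attributes match.
    """
    counts = Counter(tuple(row) for row in rows)
    total = sum(counts.values())
    duplicated = sum(c for c in counts.values() if c > 1)
    share = duplicated / total if total else 0.0
    repeats = sorted({c for c in counts.values() if c > 1})
    return share, repeats


# Tiny made-up sample with two attributes; one record appears twice.
sample = "gene,mutation\nA,m1\nA,m1\nB,m2\nC,m3\n"
reader = csv.reader(io.StringIO(sample))
next(reader)  # skip the header row
share, repeats = duplicate_stats(reader)
print(share, repeats)
```

For one of the actual files, the same function could be applied to a `csv.reader` opened on that file, assuming the file is comma-separated with a header row.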