SDM-Genomic-Testbeds
These datasets were generated from the COSMIC mutation dataset (COSMIC database, GRCh37, version 90) to evaluate available ontology-based data integration engines. They vary in the number of records (10k, 100k, 1 million, and 10 million), the number of attributes (2-15), and the rate of duplicated values (25-75 percent of the records are duplicated, with each duplicated value repeated 10 or 20 times).
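To make the duplicate-rate parameters concrete, here is a minimal sketch under one assumed reading of them: a duplicate rate d over N records means d·N records carry a duplicated value, and each distinct duplicated value appears r times (so d·N/r distinct duplicated values), while the remaining records hold unique values. The function `build_column` and the `dup_`/`uniq_` value names are hypothetical and only illustrate the arithmetic. This is not the generation procedure used for these testbeds, which is described in the papers listed below.

```python
from collections import Counter


def build_column(n_records: int, dup_ratio: float, repeats: int) -> list[str]:
    """Hypothetical sketch: build one attribute column of n_records values.

    Assumption: dup_ratio of the records carry a duplicated value, each such
    value repeated exactly `repeats` times; all other records are unique.
    """
    n_dup = int(n_records * dup_ratio)
    if n_dup % repeats:
        raise ValueError("duplicated record count must be divisible by repeats")
    column: list[str] = []
    # Duplicated part: n_dup // repeats distinct values, each `repeats` times.
    for i in range(n_dup // repeats):
        column.extend([f"dup_{i}"] * repeats)
    # Unique part: every remaining record gets its own value.
    for i in range(n_records - n_dup):
        column.append(f"uniq_{i}")
    return column


if __name__ == "__main__":
    col = build_column(10_000, 0.25, 10)
    counts = Counter(col)
    # Number of records, and number of distinct values repeated 10 times.
    print(len(col), sum(1 for c in counts.values() if c == 10))
```

Under this reading, a 10k-record column with 25 percent duplicates and 10 repetitions holds 2,500 duplicated records (250 distinct values) and 7,500 unique ones. A real generator would likely also shuffle the rows so duplicates are not contiguous.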
The accompanying mappings vary in complexity.
Details on how these datasets were generated can be found in the papers that used them in empirical evaluations:
https://www.semantic-web-journal.net/system/files/swj3289.pdf
https://www.semantic-web-journal.net/system/files/swj3246.pdf
https://doi.org/10.1145/3340531.3412881
https://doi.org/10.1145/3477314.3507132