SDM-Genomic-Datasets.zip (5 GB)

SDM-Genomic-Datasets

Version 3 2022-12-21, 15:01
Version 2 2022-01-24, 13:06
Version 1 2021-06-24, 13:36
dataset
Posted on 2021-06-24, 13:36, authored by Samaneh Jozashoori
These datasets were generated from the COSMIC mutation dataset in the COSMIC database (GRCh37, version 90) for the purpose of evaluating available ontology-based data integration engines. They vary in the number of records (10k, 100k, 1 million, and 10 million), the number of attributes (2-15), and the amount of duplication (25-75 percent of the records are duplicated, with each duplicated value repeated 10 or 20 times).
Details of how these datasets were generated can be found in the papers that used them for empirical evaluation: https://doi.org/10.1145/3340531.3412881 and https://doi.org/10.5281/zenodo.3993657
Example mapping rules for integrating these datasets are available at https://github.com/SDM-TIB/SDM-RDFizer-Experiments/tree/master/cikm2020/experiments/mappings
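
The duplication parameters in the description can be checked on any single column with a few lines of Python. The sketch below is illustrative only and is not taken from the cited papers: the function name `duplicate_profile` is hypothetical, and it assumes one possible reading of "duplicated records", namely records whose value occurs more than once in the column. The generation procedure itself is described in the papers linked above.

```python
from collections import Counter


def duplicate_profile(values):
    """Summarize duplication in a list of column values.

    Returns a tuple (rate, repetitions):
      rate        -- fraction of records whose value occurs more than once
      repetitions -- sorted distinct occurrence counts of duplicated values
    Assumption: a "duplicated record" is any record whose value is shared
    with at least one other record in the same column.
    """
    total = len(values)
    if total == 0:
        return 0.0, []
    counts = Counter(values)
    duplicated_records = sum(c for c in counts.values() if c > 1)
    repetitions = sorted({c for c in counts.values() if c > 1})
    return duplicated_records / total, repetitions


# Toy column: 20 records, one value repeated 10 times plus 10 unique values.
toy = ["m0"] * 10 + [f"u{i}" for i in range(10)]
rate, repetitions = duplicate_profile(toy)
print(rate, repetitions)  # 0.5 [10]
```

To apply this to a file from the archive, pass in the values of one column, for example as read with `csv.DictReader`. The file and column names inside the archive are not listed on this page, so they are left out here.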
