SR-hyperSMURF.revision1.pdf (990.59 kB)
Most state-of-the-art ML-based methods do not adopt specific
imbalance-aware learning techniques to deal with the imbalanced data that
naturally arise in several genome-wide variant scoring problems, which
results in a significant reduction of sensitivity and precision. We
present a novel method that adopts imbalance-aware learning strategies
based on resampling techniques and a hyper-ensemble approach, and that
outperforms state-of-the-art methods in two different contexts: the
prediction of non-coding variants associated with Mendelian diseases and
with complex diseases. We show that imbalance-aware ML is key to the
design of robust and accurate prediction algorithms, and we provide a
method and an easy-to-use software tool that can be effectively applied
to this challenging prediction task.
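The general idea of combining resampling with an ensemble trained on partitions of the majority class can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the parameters (n_parts, over), the nearest-centroid base learner, and the synthetic two-feature data are all assumptions chosen to keep the example dependency-free, and plain oversampling with replacement stands in for whatever resampling scheme the method actually uses.

```python
import random

def centroid(rows):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_member(pos, neg):
    """Hypothetical base learner: a nearest-centroid scorer in [0, 1]."""
    c_pos, c_neg = centroid(pos), centroid(neg)

    def score(x):
        d_pos, d_neg = sq_dist(x, c_pos), sq_dist(x, c_neg)
        total = d_pos + d_neg
        # Closer to the positive centroid gives a score nearer to 1.
        return 0.5 if total == 0 else d_neg / total

    return score

def train_hyper_ensemble(pos, neg, n_parts=5, over=2, seed=0):
    """Train one member per partition of the majority (negative) class."""
    if not pos or len(neg) < n_parts:
        raise ValueError("need positives and at least n_parts negatives")
    rng = random.Random(seed)
    shuffled = neg[:]
    rng.shuffle(shuffled)
    # Split the majority class into n_parts disjoint partitions.
    parts = [shuffled[i::n_parts] for i in range(n_parts)]
    members = []
    for part in parts:
        # Oversample the minority class by drawing with replacement.
        extra = [rng.choice(pos) for _ in range(len(pos) * (over - 1))]
        pos_over = pos + extra
        # Undersample this partition so both classes have similar size.
        k = min(len(part), len(pos_over))
        neg_under = rng.sample(part, k)
        members.append(train_member(pos_over, neg_under))

    def predict(x):
        # Ensemble score: average of the member scores.
        return sum(m(x) for m in members) / len(members)

    return predict

# Synthetic, heavily imbalanced data (10 positives vs 1000 negatives).
rng = random.Random(42)
pos = [[rng.gauss(2, 0.5), rng.gauss(2, 0.5)] for _ in range(10)]
neg = [[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(1000)]
predict = train_hyper_ensemble(pos, neg)
print(predict([2.0, 2.0]), predict([0.0, 0.0]))
```

Each member sees every positive example but only a balanced slice of the negatives, so no single learner is dominated by the majority class, while the ensemble as a whole still covers all of the negative data.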
Keywords
machine Learning Classification; Single nucleotide variants; prediction of pathogenic variants; imbalanced classification scenarios; ensemble methods; Deleterious variants; Mendelian diseases; Preventive Medicine; Medical Genetics (excl. Cancer Genetics); Applied Computer Science; Bioinformatics Software; Pattern Recognition and Data Mining