Machine learning approach yields epigenetic biomarkers of food allergy: A novel 13-gene signature to diagnose clinical reactivity

Posted on 2019-06-19 - 17:34

Background

Current laboratory tests are less than 50% accurate in distinguishing between people who have food allergies (FA) and those who are merely sensitized to foods, so clinicians must rely on Oral Food Challenges, which are expensive and potentially dangerous. This study presents a purely computational machine learning approach, based on DNA methylation (DNAm) data, to diagnose food allergies accurately and to identify potential epigenetic targets for the disease.

Methods and results

An unbiased feature-selection pipeline was created that narrowed more than 405,000 candidate CpG biomarkers down to 18. Machine learning models that used subsets of this 18-feature aggregate achieved perfect classification accuracy on completely hidden test cohorts (an 8-fold hidden dataset). Ensemble classification was also shown to be effective for this high-dimension, low-sample-size (HDLSS) DNA methylation dataset. The efficacy of these classifiers and of the 18 CpGs was further validated by their high accuracy across a large number of hidden data permutations, in which samples were repeatedly and randomly reallocated among the training, cross-validation, and hidden sets. The 18-CpG signature mapped to 13 genes, for which biological insights were then gathered. Notably, many of the FA-discriminating genes found in this study were strongly associated with the immune system, and seven of the 13 genes had previously been associated with FA.
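The repeated random reallocation described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the synthetic data, the mean-difference feature filter, the nearest-centroid classifier, and every function name and parameter below are assumptions chosen only to keep the example self-contained. The point it shows is that feature selection is redone on the training portion of each split, so the held-out samples never influence which features are chosen.

```python
import random
import statistics


def make_synthetic(n_samples=40, n_features=200, n_informative=5, seed=0):
    """Hypothetical stand-in for a methylation matrix (rows: samples, columns: CpG values)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # balanced classes: 0 = sensitized, 1 = allergic (assumed coding)
        row = [rng.gauss(0.5, 0.1) for _ in range(n_features)]
        for j in range(n_informative):
            row[j] += 0.3 * label  # only the first few features carry signal
        X.append(row)
        y.append(label)
    return X, y


def select_features(X, y, k):
    """Rank features by class-mean difference scaled by within-class spread; keep the top k."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        spread = statistics.pstdev(a) + statistics.pstdev(b) + 1e-9
        scores.append((abs(statistics.mean(a) - statistics.mean(b)) / spread, j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def fit_centroids(X, y, features):
    """Compute one mean vector per class over the selected features."""
    centroids = {}
    for lab in set(y):
        rows = [row for row, row_lab in zip(X, y) if row_lab == lab]
        centroids[lab] = [statistics.mean(r[j] for r in rows) for j in features]
    return centroids


def predict(centroids, row, features):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def dist(centroid):
        return sum((row[j] - c) ** 2 for j, c in zip(features, centroid))
    return min(centroids, key=lambda lab: dist(centroids[lab]))


def repeated_holdout(X, y, k=5, n_repeats=20, test_fraction=0.25, seed=1):
    """Reshuffle samples into training and hidden sets n_repeats times; return hidden-set accuracies."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    n_test = int(len(idx) * test_fraction)
    accuracies = []
    for _ in range(n_repeats):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        X_train = [X[i] for i in train]
        y_train = [y[i] for i in train]
        # Selection and fitting see training samples only.
        features = select_features(X_train, y_train, k)
        centroids = fit_centroids(X_train, y_train, features)
        correct = sum(predict(centroids, X[i], features) == y[i] for i in test)
        accuracies.append(correct / n_test)
    return accuracies


if __name__ == "__main__":
    X, y = make_synthetic()
    accs = repeated_holdout(X, y)
    print("mean hidden-set accuracy:", statistics.mean(accs))
```

Reporting the spread of accuracies across splits, rather than a single split, is what guards against a lucky partition in a small-sample setting. An ensemble variant would train several classifiers per split and combine their votes; that step is omitted here.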

Conclusions

Previous studies have also created highly accurate classifiers for this dataset, using both data-driven and a priori biological insights to construct a 96-CpG signature. This research builds on that work by using a completely computational approach to obtain perfect classification accuracy with only 18 highly discriminating CpGs (less than 0.005% of the total available features). In machine learning, simpler models, such as those used in this study, are generally preferred over more complex ones, other things being equal. Lastly, the completely data-driven methodology presented here eliminates the need for a priori biological information and can be generalized to other DNAm classification problems.
