Integrating DNA Methylation and Gene Expression Data in Placenta Tissue to Predict Childhood Obesity
Recent advances in genomic technologies have made it feasible to measure, on the same individual, multiple types of genomic activity such as genotypes, gene expression, DNA copy number, methylation and microRNA expression. However, to benefit from the increasing amounts of heterogeneous data and to obtain a more complete view of genomic function, there is a great need for statistical methods that are computationally efficient and that combine this information in an intelligent way. Challenges for prediction models in this setting arise from the high-dimensional, nonlinear nature of the data, the large number of measurements relative to the small number of samples on which they are collected, and the presence of complex interactions between the different types of data. Methods such as sparse regression, hierarchical clustering and principal component analysis can each address one of these challenges, but none addresses all of them simultaneously. Kernel methods, which use matrices measuring the pairwise similarity between individuals, offer a powerful way of addressing these challenges simultaneously without significantly increasing the computational burden. In this work, we investigate the benefits and challenges that arise from using kernel methods to integrate DNA methylation, gene expression and phenotypic data in a sample of mother-child pairs from a prospective birth cohort. The goal of this study is to identify epigenetic marks observed at birth that help predict childhood obesity.
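To make the idea of a similarity matrix concrete, the following is a minimal sketch of one common way to integrate two data types with kernels. It is not the method used in this study: the Gaussian (RBF) kernel, the equal-weight average, the kernel ridge regression step, the default bandwidth, the function names `rbf_kernel` and `combine_kernels`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(X, gamma=None):
    """Return the n x n Gaussian (RBF) similarity matrix between rows of X."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives from round-off
    if gamma is None:
        gamma = 1.0 / X.shape[1]  # assumed heuristic: 1 / number of features
    return np.exp(-gamma * sq_dists)

def combine_kernels(kernels, weights=None):
    """Weighted sum of kernel matrices (equal weights unless specified)."""
    if weights is None:
        weights = np.full(len(kernels), 1.0 / len(kernels))
    return sum(w * K for w, K in zip(weights, kernels))

# Synthetic stand-ins for real measurements (illustration only).
rng = np.random.default_rng(0)
n = 20                                    # few samples ...
methylation = rng.normal(size=(n, 500))   # ... many features per data type
expression = rng.normal(size=(n, 300))
outcome = rng.normal(size=n)              # placeholder phenotype

# One n x n kernel per data type, then a single integrated kernel.
K = combine_kernels([rbf_kernel(methylation), rbf_kernel(expression)])

# Kernel ridge regression: solve (K + lam * I) alpha = y.
lam = 1.0                                 # assumed regularization strength
alpha = np.linalg.solve(K + lam * np.eye(n), outcome)
fitted = K @ alpha
```

Whatever the number of features in each data type, every kernel is an n x n matrix, so the cost of the regression step depends on the number of individuals rather than on the number of measurements. This is one reason kernel approaches remain tractable when features far outnumber samples.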