Sites of Epigenetic Supersimilarity (ESS) associate with Cancer Risk

posted on 04.12.2017, 22:01 by Cristian Coarfa

Monozygotic twins have long been studied to estimate heritability and explore epigenetic influences on phenotypic variation. The phenotypic and epigenetic similarity of monozygotic twins has been assumed to be largely due to their genetic identity. By analyzing data from a genome-scale study of DNA methylation in monozygotic and dizygotic twins, we identified genomic regions at which the epigenetic similarity of monozygotic twins is substantially greater than can be explained by their genetic identity, a property we term ‘epigenetic supersimilarity’ (ESS). Using DNA methylation measured in whole blood from the prospective Melbourne Collaborative Cohort Study (MCCS), we showed that ESS probes are associated with cancer risk.


We evaluated the association of 1576 ESS CpG probes and 2247 negative control CpG probes with the risk of developing several types of cancer, using case-control studies nested in the MCCS cohort. Specifically, we evaluated 433 case-control pairs for Breast Cancer (BC), 834 case-control pairs for Colorectal Cancer (CRC), 141 case-control pairs for Kidney Cancer (Kidney), 331 case-control pairs for Lung Cancer (Lung), 435 case-control pairs for Mature B-cell Neoplasm (MBCN), 863 case-control pairs for Prostate Cancer (PC), and 426 case-control pairs for Urothelial Cell Carcinoma (UCC). For reproducibility and further analysis by the community, we make the data available as normalized M-values.

The methylation data were processed with the R library minfi: they were background corrected using the manufacturer’s background correction and normalized based on internal control probes. We also applied subset-quantile within-array normalization (SWAN) to correct for technical discrepancies between type I and type II probes on the assay. A β-value (interpreted as percentage methylation) was calculated for each CpG site using minfi. Methylation measures with a detection p-value higher than 0.01 were considered missing. Samples with more than 5% missing values were excluded first; CpGs that were missing in more than 20% of the remaining samples were then excluded. β-values were transformed into M-values for analysis: M = log2(β/(1-β)).
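
The filtering thresholds and the M-value transformation described above can be illustrated with a short sketch. This is not the original pipeline, which was run in R with minfi; it is a minimal pure-Python illustration that assumes β-values and detection p-values are already available as probe-by-sample tables. The function names (beta_to_m, m_to_beta, qc_and_transform) and the list-of-lists layout are hypothetical, and background correction and SWAN normalization are not reproduced here.

```python
import math

# Thresholds taken from the description above.
DETECTION_P_MAX = 0.01     # measures with detection p > 0.01 are treated as missing
SAMPLE_MISSING_MAX = 0.05  # samples with > 5% missing values are excluded
PROBE_MISSING_MAX = 0.20   # CpGs missing in > 20% of samples are excluded


def beta_to_m(beta):
    """M = log2(beta / (1 - beta)). Assumes 0 < beta < 1 (hypothetical helper)."""
    return math.log2(beta / (1.0 - beta))


def m_to_beta(m):
    """Inverse transform, useful when starting from the released M-values."""
    return 2.0 ** m / (1.0 + 2.0 ** m)


def qc_and_transform(betas, det_p):
    """Apply the three filtering steps, then convert beta-values to M-values.

    betas, det_p: lists of rows (one row per CpG), each row a list over samples.
    Returns (m_values, kept_probe_indices, kept_sample_indices); missing
    measurements are represented as None.
    """
    n_probes = len(betas)
    n_samples = len(betas[0]) if n_probes else 0

    # Step 1: mask measurements whose detection p-value exceeds the cutoff.
    masked = [
        [b if p <= DETECTION_P_MAX else None for b, p in zip(b_row, p_row)]
        for b_row, p_row in zip(betas, det_p)
    ]

    # Step 2: drop samples with more than 5% missing values.
    kept_samples = [
        j for j in range(n_samples)
        if sum(masked[i][j] is None for i in range(n_probes)) / n_probes
        <= SAMPLE_MISSING_MAX
    ]

    # Step 3: drop CpGs missing in more than 20% of the remaining samples.
    kept_probes = [
        i for i in range(n_probes)
        if kept_samples
        and sum(masked[i][j] is None for j in kept_samples) / len(kept_samples)
        <= PROBE_MISSING_MAX
    ]

    # Step 4: transform the surviving beta-values into M-values.
    m_values = [
        [None if masked[i][j] is None else beta_to_m(masked[i][j])
         for j in kept_samples]
        for i in kept_probes
    ]
    return m_values, kept_probes, kept_samples
```

Because the data are released as M-values, the inverse helper m_to_beta may be the more practical entry point for readers who prefer to work on the β scale. The sketch applies the sample filter before the probe filter, matching the order stated above; β-values of exactly 0 or 1 would need to be offset before the transformation, which the sketch does not handle.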


