
Functional Mutation Signatures of 5 Cancer Differentiation Subtypes and Epithelial Tumor Grading: Utility of Exome Sequence Data and Random Forest Analysis

journal contribution
posted on 2015-03-30, 19:12, authored by Salvador Diaz-Cano, Russel Sutherland, Alfredo Blanes, Jane Moorhead, Richard Dobson, Salvador J. Diaz-Cano

Background: Cancer is morphologically and genetically heterogeneous, and the Pan-Cancer Analysis Project aimed to identify the genomic changes present in 12 cancer types from The Cancer Genome Atlas (TCGA) set. We aimed to identify predictors of the five main differentiation subtypes in the Pan-Cancer Analysis and a general molecular grading for epithelial malignancies.

Design: Whole exome sequencing was performed on tumor and normal tissue samples from 3129 patients, enabling the identification of cancer-related mutations in each patient. Clinical data, including gender and age, were also collected for all patients. We used a Random Forest machine learning approach to compare the five differentiation subtypes in a pairwise fashion and to compare low-grade (grade 1-2) vs. high-grade (grade 3-4) tumors. Functional somatic mutations unique to tumors were identified and represented as a samples x genes mutation matrix (mutated=1, non-mutated=0).
► Pairwise Random Forest models were built for the five cancer differentiation subtypes (Adenocarcinoma, Squamous, Urothelial, Brain, Hematological).
► Recursive feature selection was used in a 5x10 fold cross-validation design. Random Forest models were built on the training set using the caret package in R.
► Predictive accuracy of each model was measured in an independent test set.

Results: We were able to discriminate between Bladder Urothelial Cancer and Acute Myeloid Leukemia in unseen samples with 87.8% accuracy (95% CI: 78.71 to 93.99; specificity 0.77, sensitivity 0.94, AUC 0.9677).
► 5 binary predictors of tumor class membership of Urothelial or Hematological subtypes and epithelial grading (low vs. high).
► Average accuracy in the other comparisons was 85.0%.
► Using a 5-variable Random Forest ensemble classification model (MLL3, TTN, ARID1A, MLL2, FRG1B), we achieved nearly 90% accuracy in the class assignment task (after excluding any statistically significant gender and age bias between the high-grade and low-grade tumors) and in identifying high-grade carcinomas (glandular, squamous and urothelial). Grade-predictive genes included ARID1A, CTCF, CTNNB1, PIK3CA, PIK3R1, PTEN, and TP53.

Conclusion: Exome sequencing provides reliable data to classify common malignancies and may be clinically useful for subtyping malignancies and grading carcinomas when tissue is limited (e.g. biopsies). We now aim to produce 5-class subtyping and 2-tier grading prediction models to assign test samples prospectively.
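
The samples x genes encoding (mutated=1, non-mutated=0) and the pairwise comparison of the five subtypes described under Design can be illustrated with a minimal Python sketch. This is not the authors' code: the abstract names the caret package in R, which is not reproduced here, and the function names, sample IDs, and mutation calls below are hypothetical and chosen only for illustration. Only the gene names and subtype labels come from the abstract.

```python
from itertools import combinations

# Subtype labels as listed in the abstract.
SUBTYPES = ["Adenocarcinoma", "Squamous", "Urothelial", "Brain", "Hematological"]


def build_mutation_matrix(calls, samples, genes):
    """Return a samples x genes matrix with 1 = mutated, 0 = non-mutated.

    `calls` is an iterable of (sample_id, gene) pairs, one per functional
    somatic mutation. Several hits in the same gene still yield a single 1.
    """
    mutated = set(calls)
    return [[1 if (s, g) in mutated else 0 for g in genes] for s in samples]


def pairwise_comparisons(subtypes):
    """Enumerate every unordered pair of subtypes (one binary model per pair)."""
    return list(combinations(subtypes, 2))


# Hypothetical toy input: sample IDs and calls are invented for illustration.
samples = ["S1", "S2", "S3"]
genes = ["MLL3", "TTN", "ARID1A", "MLL2", "FRG1B"]
calls = [
    ("S1", "TTN"),
    ("S1", "ARID1A"),
    ("S2", "TTN"),
    ("S2", "TTN"),  # duplicate hit in the same gene, still encoded as 1
    ("S3", "MLL3"),
]

matrix = build_mutation_matrix(calls, samples, genes)
pairs = pairwise_comparisons(SUBTYPES)

for sample, row in zip(samples, matrix):
    print(sample, row)
print(len(pairs), "pairwise comparisons")
```

Five subtypes give ten unordered pairs, so a pairwise design of this kind implies ten binary classifiers. In a full analysis, the rows belonging to each pair would be split into training and test sets before any feature selection or model fitting.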
