Dataset for: Recurrence-associated gene signature optimizes recurrence free survival prediction of colorectal cancer

High-throughput gene expression profiling has shown great promise in providing insight into molecular mechanisms. Because metastasis-related mRNAs may be enriched for genes that can predict cancer recurrence, we attempted to build a recurrence-associated gene signature to improve prognostic prediction in colorectal cancer (CRC). We identified 2848 differentially expressed mRNAs by comparing CRC tissues with and without metastasis. Prognostic genes were selected with a LASSO Cox regression model, which yielded a 13-mRNA signature that was then validated in two independent Gene Expression Omnibus (GEO) cohorts. This classifier successfully discriminated high-risk patients in the discovery cohort (HR = 5.27, 95% CI = 2.30-12.08, P < 0.0001). Analysis in the two independent cohorts yielded consistent results (GSE14333: HR = 4.55, 95% CI = 2.18-9.508, P < 0.0001; GSE33113: HR = 3.26, 95% CI = 2.16-9.16, P = 0.0176). Further analysis revealed that the prognostic value of this signature was independent of tumor stage, postoperative chemotherapy and somatic mutation. Receiver operating characteristic (ROC) analysis showed that the area under the curve (AUC) of this signature was 0.8861 in the discovery cohort and 0.8157 in the validation cohort. A nomogram was constructed for clinicians and performed well in calibration plots. Furthermore, this 13-mRNA signature outperformed other known gene signatures, including the Oncotype DX colon cancer assay. Single-sample gene-set enrichment analysis (ssGSEA) revealed that a group of pathways related to drug resistance, cancer metastasis and stemness were significantly enriched in the high-risk patients. In conclusion, this 13-mRNA signature may be a useful tool for prognostic evaluation and may facilitate personalized management of CRC patients.
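
For readers unfamiliar with how a Cox-derived gene signature is typically applied, the following is a minimal sketch, not the authors' implementation. It assumes the common convention that a patient's risk score is the sum of each signature gene's expression weighted by its regression coefficient, and that patients are split into high- and low-risk groups at the cohort median. The abstract states neither the coefficients nor the cutoff, so the gene names, weights, expression values and the median cutoff below are all hypothetical.

```python
from statistics import median

# Hypothetical coefficients for illustration only; these are NOT the
# published weights of the 13-mRNA signature.
COEFFICIENTS = {"GENE_A": 0.8, "GENE_B": -0.5, "GENE_C": 0.3}


def risk_score(expression, coefficients=COEFFICIENTS):
    """Linear predictor: sum of coefficient * expression over signature genes."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def stratify(patients, coefficients=COEFFICIENTS):
    """Label each patient 'high' or 'low' risk relative to the cohort median.

    The median cutoff is an assumption; the abstract does not state one.
    """
    scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
    cutoff = median(scores.values())
    return {pid: ("high" if score > cutoff else "low") for pid, score in scores.items()}


# Made-up expression values for four hypothetical patients.
patients = {
    "P1": {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": 0.5},
    "P2": {"GENE_A": 0.5, "GENE_B": 2.0, "GENE_C": 1.0},
    "P3": {"GENE_A": 1.0, "GENE_B": 1.0, "GENE_C": 1.0},
    "P4": {"GENE_A": 3.0, "GENE_B": 0.5, "GENE_C": 2.0},
}

groups = stratify(patients)
print(groups)
```

In this toy cohort, the two patients with scores above the median are labeled high-risk. In an actual analysis, the resulting groups would then be compared for recurrence-free survival, for example with a Cox model yielding hazard ratios like those reported above.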