Biomarker Benchmark - GSE27854

Version 5 2016-03-17, 22:18
Version 4 2016-03-17, 21:13
Version 3 2016-02-23, 23:31
Version 2 2016-02-04, 22:09
Version 1 2016-02-02, 22:17
dataset
posted on 2016-03-17, 22:18 authored by Anna Guyer, Stephen Piccolo

[NOTICE: This data set has been deprecated. Please see our new version of the data (and additional data sets) here: https://osf.io/mhk93 ]


"Purpose: The purpose of this study is to identify a novel biomarker related with distant metastases of colorectal cancer (CRC).
Experimental Design: We investigated mRNA expression profiles in 115 patients with CRC using an Affymetrix Gene Chip, and copy number profiles in 122 patients with CRC using an Affymetrix DNA Sty array. Genes in common between copy number and expression data were extracted as candidate genes. We analyzed the mRNA expression of candidate gene by quantitative reverse transcription polymerase chain reaction (RT-PCR) in 86 patients as a validation study. Furthermore, we analyzed the protein expression of candidate gene by immunohistochemical study in 269 patients, and investigated the relationship between protein expression and clinicopathologic features.
Results: By the combination of copy number analysis and gene expression analysis, We extracted 2 candidate genes related with distant metastases of CRC. Several reports show that NUCKS1, one of candidate genes, is overexpressed in several cancer tissues. But a study about the relationship between NUCKS1 and CRC is none. The mRNA expression of NUCKS1 in cancer tissues was significantly higher than those in normal tissues. Overexpression of NUCKS1 protein was associated with significantly worse relapse-free survival of CRC. Overexpression of NUCKS1 protein was an independent risk factor for recurrence of CRC.
Conclusion: The overexpression of NUCKS1 would be a new biomarker predicting recurrence after colorectal surgery."

http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE27854

We have included gene-expression data, the outcome (class) being predicted, and any clinical covariates. When gene-expression data were processed in multiple batches, we have provided batch information. Each data set is organized into a file set that contains all pertinent files for that data set. The gene-expression files have been normalized with both the SCAN and UPC methods, as implemented in the SCAN.UPC package in Bioconductor (https://www.bioconductor.org/packages/release/bioc/html/SCAN.UPC.html). We summarized the data at the gene level using the BrainArray resource (http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/20.0.0/ensg.asp) and used Ensembl identifiers. The class, clinical, and batch data were hand-curated to ensure consistency ("tidy data" formatting). In addition, the data files have been formatted so that they can be imported easily into the ML-Flex machine-learning package (http://mlflex.sourceforge.net/).
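
As a rough illustration of how a file set organized this way might be consumed, the sketch below joins a gene-expression table to a class table by sample identifier. The layout is an assumption and is not taken from the data set itself: tab-delimited text, one row per sample, the sample identifier in the first column, Ensembl identifiers as expression column names, and a class column named "Class". The sample names, class labels, and expression values are made-up placeholders, and read_table and join_expression_with_class are hypothetical helper names.

```python
import csv
import io


def read_table(handle):
    """Read a tab-delimited table into {sample_id: {column: value}}.

    Assumes the first column holds the sample identifier.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    id_column = reader.fieldnames[0]
    return {
        row[id_column]: {k: v for k, v in row.items() if k != id_column}
        for row in reader
    }


def join_expression_with_class(expression, classes, class_column="Class"):
    """Return (sample_id, features, label) for samples present in both tables."""
    joined = []
    for sample_id in sorted(expression.keys() & classes.keys()):
        # Convert the expression values from text to floats.
        features = {gene: float(v) for gene, v in expression[sample_id].items()}
        joined.append((sample_id, features, classes[sample_id][class_column]))
    return joined


# Placeholder data standing in for the real files (all values are made up).
EXPRESSION_TSV = (
    "SampleID\tENSG00000000003\tENSG00000000005\n"
    "S1\t0.12\t-0.40\n"
    "S2\t0.55\t0.03\n"
    "S3\t-0.20\t0.80\n"
)
CLASS_TSV = "SampleID\tClass\nS1\tA\nS2\tB\n"

expression = read_table(io.StringIO(EXPRESSION_TSV))
classes = read_table(io.StringIO(CLASS_TSV))
samples = join_expression_with_class(expression, classes)

for sample_id, features, label in samples:
    print(sample_id, label, features)
```

Only samples that appear in both tables are kept, so a sample with expression values but no class label (S3 in the placeholder data) is dropped rather than assigned a guessed label. With the real files, the in-memory strings would be replaced by opened file handles.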
