Biomarker Benchmark - GSE38958

Version 6 2016-03-17, 22:17

Version 5 2016-03-16, 16:40

Version 4 2016-02-23, 23:22

Version 3 2016-02-22, 17:56

Version 2 2016-02-04, 21:53

Version 1 2016-02-02, 22:29

dataset

posted on 2016-03-17, 22:17 authored by Anna Guyer, Stephen Piccolo

[NOTICE: This data set has been deprecated. Please see our new version of the data (and additional data sets) here: https://osf.io/mhk93 ]

"Idiopathic pulmonary fibrosis (IPF) is a specific form of chronic, progressive fibrosing interstitial disease of unknown cause. It remains impractical to conduct early diagnosis and predict IPF progression just based on gene expression information. Moreover, the relationship between gene expression and quantitative phenotypic value in IPF keeps controversial. To identify biomarkers to predict survival in IPF, we profiled protein-coding gene expression in peripheral blood mononuclear cells (PBMCs). We linked the gene expression level with the quantitative phenotypic variation in IPF, including diffusing capacity of the lung for carbon monoxide (DLCO) and forced vital capacity (FVC) percent predicted. In silico analyses on the expression profiles and quantitative phenotypic data allowed for the generation of a set of IPF molecular signature that predicted survival of IPF effectively."

http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE38958

We have included gene-expression data, the outcome (class) being predicted, and any clinical covariates. When gene-expression data were processed in multiple batches, we have provided batch information. The data are organized into file sets, each of which contains all pertinent files for an individual data set. The gene-expression files have been normalized with both the SCAN and UPC methods, as implemented in the SCAN.UPC package in Bioconductor (https://www.bioconductor.org/packages/release/bioc/html/SCAN.UPC.html). We summarized the data at the gene level using the BrainArray resource (http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/20.0.0/ensg.asp) and used Ensembl identifiers. The class, clinical, and batch data were hand curated to ensure consistency ("tidy data" formatting). In addition, the data files have been formatted so that they can be imported easily into the ML-Flex machine learning package (http://mlflex.sourceforge.net/).
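As a rough illustration of how files organized this way might be combined before analysis, the following Python sketch joins an expression table with a class table on a shared sample identifier. It assumes tab-delimited text with one row per sample. The column names (SampleID, Class, ENSG_A, ENSG_B), the helper functions, and all values are hypothetical placeholders and are not taken from the GSE38958 files, whose actual layout may differ.

```python
import csv
import io


def read_tidy_table(text, id_column="SampleID"):
    """Parse tab-delimited text into {sample_id: {column: value}}."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    table = {}
    for row in reader:
        sample_id = row.pop(id_column)
        table[sample_id] = row
    return table


def join_by_sample(expression, classes, batches=None):
    """Inner-join the tables on sample ID.

    Samples that are missing from any supplied table are dropped.
    """
    shared = set(expression) & set(classes)
    if batches is not None:
        shared &= set(batches)
    merged = {}
    for sample_id in sorted(shared):
        record = dict(expression[sample_id])
        record.update(classes[sample_id])
        if batches is not None:
            record.update(batches[sample_id])
        merged[sample_id] = record
    return merged


# Made-up placeholder content (hypothetical column names and values).
EXPRESSION_TSV = (
    "SampleID\tENSG_A\tENSG_B\n"
    "S1\t0.12\t-0.40\n"
    "S2\t0.33\t0.05\n"
    "S3\t-0.21\t0.18\n"
)
CLASS_TSV = (
    "SampleID\tClass\n"
    "S1\tcase\n"
    "S2\tcontrol\n"
)

merged = join_by_sample(read_tidy_table(EXPRESSION_TSV), read_tidy_table(CLASS_TSV))
for sample_id, record in merged.items():
    print(sample_id, record)
```

The join is deliberately an inner join, so a sample without a class label (S3 in the placeholder data) is excluded rather than carried forward with a missing outcome. Values are kept as strings; converting expression values to floats would be a natural next step.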