figshare

Biomarker Benchmark - GSE46449

Version 6 2016-03-17, 22:16
Version 5 2016-03-16, 16:23
Version 4 2016-02-23, 23:21
Version 3 2016-02-22, 16:59
Version 2 2016-02-04, 21:45
Version 1 2016-02-02, 22:41
dataset
Posted on 2016-03-17, 22:16; authored by Anna Guyer, Stephen Piccolo

[NOTICE: This data set has been deprecated. Please see our new version of the data (and additional data sets) here: https://osf.io/mhk93 ]


"There are currently no biological tests that differentiate patients with bipolar disorder (BPD) from healthy controls. While there is evidence that peripheral gene expression differences between patients and controls can be utilized as biomarkers for psychiatric illness, it is unclear whether current use or residual effects of antipsychotic and mood stabilizer medication drives much of the differential transcription. We therefore tested whether expression changes in first-episode, never-medicated bipolar patients, can contribute to a biological classifier that is less influenced by medication and could potentially form a practicable biomarker assay for BPD.
We employed microarray technology to measure global leukocyte gene expression in first-episode (n=3) and currently medicated BPD patients (n=26), and matched healthy controls (n=25). Following an initial feature selection of the microarray data, we developed a cross-validated 10-gene model that was able to correctly predict the diagnostic group of the training sample (26 medicated patients and 12 controls), with 89% sensitivity and 75% specificity (p<0.001). The 10-gene predictor was further explored via testing on an independent test cohort consisting of three pairs of monozygotic twins discordant for BPD, plus the original enrichment sample cohort (the three never-medicated BPD patients and 13 matched control subjects), and a sample of experimental replicates (n=34). 83% of the independent test sample was correctly predicted, with a sensitivity of 67% and specificity of 100% (although this result did not reach statistical significance). Additionally, 88% of sample diagnostic classes were classified correctly for both the enrichment (p=0.015) and the replicate samples (p<0.001)."
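The figures quoted for the independent twin cohort can be sanity-checked with a little arithmetic. Three discordant monozygotic twin pairs give six samples: three with BPD and three unaffected. The confusion-matrix counts in this sketch are not taken from the study's data; they are inferred as the only whole-number counts consistent with the reported 83% accuracy, 67% sensitivity, and 100% specificity. The function names are illustrative.

```python
# Inferred (not published) counts for the six twin samples:
# 3 BPD twins and 3 unaffected co-twins.
true_positives = 2   # BPD twins predicted as BPD
false_negatives = 1  # BPD twins predicted as control
true_negatives = 3   # unaffected twins predicted as control
false_positives = 0  # unaffected twins predicted as BPD


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true cases that the classifier detects."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of true controls that the classifier clears."""
    return tn / (tn + fp)


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all samples classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


twin_sensitivity = sensitivity(true_positives, false_negatives)
twin_specificity = specificity(true_negatives, false_positives)
twin_accuracy = accuracy(
    true_positives, true_negatives, false_positives, false_negatives
)

print(f"sensitivity = {twin_sensitivity:.0%}")
print(f"specificity = {twin_specificity:.0%}")
print(f"accuracy    = {twin_accuracy:.0%}")
```

With only six samples, a single misclassification shifts accuracy by about 17 percentage points, which may help explain why this result did not reach statistical significance.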

http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE46449

We have included gene-expression data, the outcome (class) being predicted, and any clinical covariates. Where gene-expression data were processed in multiple batches, we have also provided batch information. Each data set is organized as a file set that contains all pertinent files for that data set. The gene-expression files were normalized with both the SCAN and UPC methods, as implemented in the SCAN.UPC package in Bioconductor (https://www.bioconductor.org/packages/release/bioc/html/SCAN.UPC.html). We summarized the data at the gene level using the BrainArray resource (http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/20.0.0/ensg.asp) and Ensembl gene identifiers. The class, clinical, and batch data were hand-curated to ensure consistency ("tidy data" formatting). In addition, the data files have been formatted for easy import into the ML-Flex machine learning package (http://mlflex.sourceforge.net/).
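As an illustration of how a file set like this might be combined for analysis, the sketch below joins an expression table with a class table by sample identifier. Everything about the layout is an assumption: that the files are tab-delimited, that the first column holds the sample ID, and that the class column is named `Class`. The column name `SampleID`, the helper functions, and the toy values are hypothetical and do not come from the actual files; check the real file set before relying on any of it.

```python
import csv
import io


def read_table(text: str) -> dict[str, dict[str, str]]:
    """Parse tab-delimited text into {sample_id: {column: value}}.

    Assumes the first column holds the sample identifier.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    id_column = reader.fieldnames[0]
    return {
        row[id_column]: {k: v for k, v in row.items() if k != id_column}
        for row in reader
    }


def join_by_sample(
    expression: dict[str, dict[str, str]],
    classes: dict[str, dict[str, str]],
) -> dict[str, dict[str, str]]:
    """Merge columns for samples that appear in both tables."""
    shared = sorted(expression.keys() & classes.keys())
    return {sample: {**expression[sample], **classes[sample]} for sample in shared}


# Toy stand-ins for the real files (values are made up for illustration).
expression_text = (
    "SampleID\tENSG00000000003\tENSG00000000005\n"
    "S1\t0.12\t0.80\n"
    "S2\t0.45\t0.10\n"
)
class_text = (
    "SampleID\tClass\n"
    "S1\tBPD\n"
    "S2\tControl\n"
    "S3\tControl\n"
)

merged = join_by_sample(read_table(expression_text), read_table(class_text))
for sample, values in merged.items():
    print(sample, values)
```

Joining on the intersection of sample IDs drops any sample that lacks either expression values or a class label (here, `S3`), which avoids silently training on incomplete rows.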
