figshare
6 files

TCGA Pan-Cancer sample, expression, and mutation data for Project Cognoma

Version 7 2018-04-09, 17:44
Version 6 2017-01-04, 21:41
Version 5 2016-09-29, 22:22
Version 4 2016-08-25, 18:58
Version 3 2016-08-23, 21:09
Version 2 2016-07-15, 23:48
Version 1 2016-07-15, 18:14
dataset
Posted on 2018-04-09, 17:44. Authored by Daniel Himmelstein, Gregory Way, Claire McLeod, Stephen Shank, Casey Greene.
The following datasets were created for Project Cognoma:

`expression-matrix.tsv.bz2` is a sample × gene matrix indicating a gene's expression level for a given sample. This dataset will be the feature/x/predictor information for Project Cognoma.

`expression-genes.tsv` provides information and summary statistics for every gene in `expression-matrix.tsv.bz2`.

`mutation-matrix.tsv.bz2` is a sample × gene matrix indicating whether a gene is mutated for a given sample. Select columns (or unions of several columns) in this dataset will be the status/y/outcome for Project Cognoma.
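As a rough illustration of how a status/y/outcome could be derived from a union of several columns, the sketch below marks a sample as positive if any of the chosen gene columns is mutated. It assumes the matrix stores 0/1 values and has sample identifiers in the first column. The column labels (`GENE_A`, `GENE_B`, `GENE_C`), the sample IDs, and the helper names are hypothetical, and the toy text stands in for the real file, which could be opened with `bz2.open(path, "rt")`.

```python
import csv
import io


def read_matrix(handle):
    """Read a tab-separated sample x gene matrix into {sample_id: {column: value}}."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    columns = header[1:]  # assumption: the first column holds the sample ID
    return {row[0]: dict(zip(columns, row[1:])) for row in reader if row}


def mutation_status(matrix, genes):
    """Return {sample_id: 0 or 1}, where 1 means any listed gene column is mutated."""
    return {
        sample: int(any(int(row[gene]) for gene in genes))
        for sample, row in matrix.items()
    }


# Toy stand-in for mutation-matrix.tsv.bz2 (hypothetical labels and values).
toy = "sample_id\tGENE_A\tGENE_B\tGENE_C\nS1\t1\t0\t0\nS2\t0\t0\t0\nS3\t0\t1\t1\n"
matrix = read_matrix(io.StringIO(toy))

# Outcome defined as the union of two gene columns.
status = mutation_status(matrix, ["GENE_B", "GENE_C"])
```

Passing a single-element list gives the status for one gene, so the same helper covers both the single-column and the union case.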

`mutation-genes.tsv` provides information and summary statistics for every gene in `mutation-matrix.tsv.bz2`.

`samples.tsv` is a sample × attribute matrix providing sample information and clinical measures for each sample.

`covariates.tsv` is a sample × attribute matrix for modeling that encodes the categorical variables in `samples.tsv` as dummy (indicator) variables.
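For readers unfamiliar with dummy encoding, here is a minimal sketch of the general idea: each level of a categorical column becomes its own 0/1 column. The `disease` column, its values, and the `column_level` naming scheme are illustrative assumptions and may not match how `covariates.tsv` was actually generated.

```python
def dummy_encode(records, column):
    """Expand one categorical column into 0/1 indicator columns, one per level."""
    levels = sorted({record[column] for record in records})
    return [
        {f"{column}_{level}": int(record[column] == level) for level in levels}
        for record in records
    ]


# Toy rows standing in for samples.tsv (hypothetical column and values).
toy_samples = [
    {"sample_id": "S1", "disease": "X"},
    {"sample_id": "S2", "disease": "Y"},
    {"sample_id": "S3", "disease": "X"},
]
encoded = dummy_encode(toy_samples, "disease")
```

Each encoded row has exactly one indicator set to 1 for the encoded column, and the rows stay in the same order as the input.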

All datasets contain the same samples as rows (in the same order). No two samples correspond to the same patient.
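The row alignment described above can be checked after download with a short script. The sketch below assumes that the first column of each file holds the sample identifier and that each file has a header row. The helper names and toy contents are hypothetical. The compressed toy input is read through `bz2.open`, the same way a real `.tsv.bz2` file would be.

```python
import bz2
import csv
import io


def sample_ids(handle):
    """Return the first column (assumed to hold sample IDs), skipping the header."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # header row
    return [row[0] for row in reader if row]


def check_alignment(id_lists):
    """True if every file lists the same sample IDs in the same order, without repeats."""
    first = id_lists[0]
    return len(set(first)) == len(first) and all(ids == first for ids in id_lists[1:])


# Toy stand-ins for expression-matrix.tsv.bz2 and samples.tsv.
expression = bz2.compress(b"sample_id\tGENE_A\nS1\t5.1\nS2\t0.0\n")
samples = "sample_id\tdisease\nS1\tX\nS2\tY\n"

with bz2.open(io.BytesIO(expression), "rt", newline="") as handle:
    expression_ids = sample_ids(handle)
samples_ids = sample_ids(io.StringIO(samples))

aligned = check_alignment([expression_ids, samples_ids])
```

This only verifies order and uniqueness of sample IDs. Confirming that no two samples share a patient would need a patient identifier, which this sketch does not assume.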

The data was retrieved from the UCSC Xena Browser.

These datasets were created by the GitHub repository commit below. See the download directory of the cancer-data repository for metadata files with the version info for the Xena downloads this release is based on.

See the data/subset directory of the cancer-data repository on GitHub to browse small subsets of the expression and mutation datasets.
