CHETAH tumor reference and input data

This upload contains a dataset that can be used as a CHETAH reference to classify any single-cell RNA-sequencing dataset, although it was built for tumor micro-environment data.

CHETAH_TME_reference.Rdata is a dataset that combines four publicly available (see below) tumor micro-environment single-cell RNA-sequencing datasets. All data were downloaded in TPM or FPKM format. Each dataset was normalized to 400,000 reads per cell, and all values were transformed into log2 space. Only the genes that were present in all four datasets were kept.
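The processing steps above can be sketched in a few lines of plain Python. This is only an illustration, not the code used to build the reference: the nested-dictionary layout, the function names and the pseudocount of 1 added before the log2 transform are all assumptions, since the description does not state how zero values were handled.

```python
import math

TARGET_DEPTH = 400_000  # per-cell total stated in the description
PSEUDOCOUNT = 1.0       # assumption: avoids log2(0); not stated in the description


def normalize_cell(expr):
    """Scale one cell's values to sum to TARGET_DEPTH, then take log2(x + PSEUDOCOUNT)."""
    total = sum(expr.values())
    if total == 0:
        raise ValueError("cell has no expression values")
    return {
        gene: math.log2(value / total * TARGET_DEPTH + PSEUDOCOUNT)
        for gene, value in expr.items()
    }


def shared_genes(datasets):
    """Return the genes that occur in every dataset.

    Each dataset is assumed to look like {cell_id: {gene: value}}.
    """
    gene_sets = [
        set().union(*(cell.keys() for cell in dataset.values()))
        for dataset in datasets
    ]
    return set.intersection(*gene_sets)


def build_reference(datasets):
    """Normalize every cell, then keep only the genes shared by all datasets."""
    keep = shared_genes(datasets)
    missing = math.log2(PSEUDOCOUNT)  # value for a gene absent from a single cell
    reference = {}
    for dataset in datasets:
        for cell_id, expr in dataset.items():
            normalized = normalize_cell(expr)
            reference[cell_id] = {g: normalized.get(g, missing) for g in keep}
    return reference
```

Calling build_reference on a list of such datasets returns one combined dictionary of cells restricted to the shared genes. Normalization is done before the gene filter here because that is the order in which the description lists the steps.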

ribosomal.txt contains the ribosomal protein genes from the GO term 'cytosolic ribosome' (GO:0022626), which could be discarded from a CHETAH reference dataset.
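Discarding these genes from a reference could look like the minimal Python sketch below. It is not part of CHETAH itself. It assumes that ribosomal.txt lists one gene symbol per line and that the reference is held as {cell_id: {gene: value}}; both the layout and the function names are hypothetical.

```python
def read_gene_list(path):
    """Read gene symbols from a text file, one per line (assumed layout)."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip() for line in handle if line.strip()}


def drop_genes(reference, genes_to_drop):
    """Return a copy of {cell_id: {gene: value}} without the listed genes."""
    return {
        cell_id: {g: v for g, v in expr.items() if g not in genes_to_drop}
        for cell_id, expr in reference.items()
    }
```

For example, drop_genes(reference, read_gene_list("ribosomal.txt")) would return a filtered copy and leave the original reference unchanged.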

The data of the CHETAH_tumor_reference were obtained from the following publications:

Puram, S. V. et al. Single-Cell Transcriptomic Analysis of Primary and Metastatic Tumor Ecosystems in Head and Neck Cancer. Cell 171, 1611–1624.e24 (2017). GEO accession: GSE103322

Tirosh, I. et al. Dissecting the multicellular ecosystem of metastatic melanoma by single-cell RNA-seq. Science 352, 189–196 (2016). GEO accession: GSE72056

Chung, W. et al. Single-cell RNA-seq enables comprehensive tumour and immune cell profiling in primary breast cancer. Nat. Commun. 8, 1–12 (2017). GEO accession: GSE75688

Li, H. et al. Reference component analysis of single-cell transcriptomes elucidates cellular heterogeneity in human colorectal tumors. Nat. Genet. 49, 708–718 (2017). GEO accession: GSE81861

The ovarian ascites data were obtained from the following publication:

Schelker, M. et al. Estimation of immune cell content in tumour tissue using single-cell RNA-seq data. bioRxiv 127001 (2017). doi:10.1101/127001. Data are available from: https://figshare.com/s/711d3fb2bd3288c8483