31 files

DepMap 20Q4 Public

posted on 18.12.2020, 05:27 by Broad DepMap

This dataset contains the results of Avana library CRISPR-Cas9 genome-scale knockout (prefixed with Achilles) as well as mutation, copy number and gene expression data (prefixed with CCLE) for cancer cell lines as part of the Broad Institute’s Cancer Dependency Map project. We have repackaged our fileset to include all quarterly-updating datasets produced by DepMap.

The Avana CRISPR-Cas9 genome-scale knockout data has expanded to include 808 cell lines, the RNAseq data includes 1376 cell lines, and the copy number data includes 1753 cell lines. Please see the README files for details regarding updates to the data processing pipelines.

As our screening efforts continue, we will be releasing additional cancer dependency data on a quarterly basis for unrestricted use. For the latest datasets available, further analyses, and to subscribe to our mailing list, visit https://depmap.org.

Descriptions of the experimental methods and the CERES algorithm are published in http://dx.doi.org/10.1038/ng.3984. Some cell lines were processed using copy number data based on the Sanger Institute whole exome sequencing data (COSMIC: http://cancer.sanger.ac.uk/cell_lines, EGA accession number: EGAD00001001039) reprocessed using CCLE pipelines. A detailed description of the pipelines and tool versions for CCLE expression can be found here: https://github.com/broadinstitute/gtex-pipeline/blob/v9/TOPMed_RNAseq_pipeline.md.

v2 changes: The README file in version 1 was incorrect for the Expression pipeline. Version 2 corrects this.

v3 changes: We have discovered several issues in the DepMap 20Q4 copy number data, described at https://forum.depmap.org/t/important-update-issues-with-depmap-20q4-data/344. These issues also affected the CRISPR (Avana) 20Q4 CERES data, which is dependent on copy number information. This version reverts the affected files to their previous versions, which do not have these issues.

v4 changes: We have discovered additional issues with several of the gene expression files in the 20Q4 dataset, which we will correct shortly. The issues, described in more detail below, affected the files CCLE_expression_full, CCLE_RNAseq_reads, and CCLE_RNAseq_transcripts. Note that these issues are independent of the problems with the copy number pipeline described above. Hence, these files were not reverted to their previous versions, and we are correcting them in the 20Q4 dataset.

CCLE_expression_full and CCLE_RNAseq_reads: Underlying datasets were swapped.

CCLE_RNAseq_transcripts: Data for the cell line ACH-000561 was incorrectly normalized in this dataset.

Note that, starting in 20Q4, the expression datasets use log(x+1) transformation for all TPM values (including now the CCLE_RNAseq_transcripts file), and no transformation for expected counts. See README for more details.
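For readers who need to move between raw TPM and the transformed values, the log(x+1) transformation and its inverse can be sketched as follows. This is only an illustration, not the DepMap pipeline code: the function names are hypothetical, and the logarithm base is passed explicitly because the description above does not state it (check the README for the base actually used).

```python
import math


def log_transform_tpm(tpm, base):
    """Return log_base(tpm + 1) for a non-negative TPM value.

    `base` is an assumption supplied by the caller; the dataset
    description only says "log(x+1)" without naming the base.
    """
    if tpm < 0:
        raise ValueError("TPM values must be non-negative")
    # log1p(x) computes ln(x + 1) accurately for small x;
    # dividing by ln(base) converts to the requested base.
    return math.log1p(tpm) / math.log(base)


def inverse_log_transform_tpm(value, base):
    """Recover the TPM value from log_base(tpm + 1)."""
    # expm1(y) computes e**y - 1, the inverse of log1p.
    return math.expm1(value * math.log(base))


# Hypothetical usage: a TPM of 0 maps to 0 in any base, and the
# inverse recovers the original value up to floating-point error.
zero_expression = log_transform_tpm(0.0, base=2)
round_trip = inverse_log_transform_tpm(log_transform_tpm(7.5, base=2), base=2)
```

Expected counts are not transformed in this release, so they would be used as provided.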