This repository contains data to support Pan et al., "Sparse dictionary learning recovers pleiotropy from human cell fitness screens". There are four groups of data:
## Tables (xlsx)
These are the Excel files supporting the manuscript submission. The first sheet of each file is a Readme that describes the data.
* Table S1: Webster output from genotoxic fitness screen data: dictionary matrix, loadings matrix, and annotations.
* Table S3: Webster output from Cancer Dependency Map data: dictionary matrix, loadings matrix, and annotations.
* Table S4: UMAP embedding coordinates for Cancer Dependency Map data.
* Table S5: Mass spectrometry peptide counts for immunoprecipitations.
* Table S6: The maximum subcellular localization score for each of the functions learned from Cancer Dependency Map data.
* Table S7: Compound-to-function loadings for annotated compounds from PRISM primary and secondary screens.
## depmap (tsv)
These are flat files that are the basis for the Tables above, and represent the raw input and outputs of Webster.
* depmap_cell_line_info
Annotations for each cell line.
* depmap_dictionary
Webster dictionary matrix inferred from fitness data.
* depmap_fn_annot_gprofiler
Annotations derived from gProfiler using gene loadings on each function.
* depmap_fn_biomarkers
Random forest modeling results using cell line features to predict the fitness effect of each function in the dictionary.
* depmap_fn_manual_name
Manual name for each function, derived from above resources.
* depmap_fn_subcell_raw_matrix
Matrix cross product between the Go et al. localization scores and our gene-to-function loadings.
* depmap_fn_subcell
The subcell localization information used for coloring functions in the global embedding.
* depmap_gene_loadings
Webster gene-to-function loadings matrix, inferred from fitness data.
* depmap_gene_meta
Gene-centric information and useful links.
* depmap_input
Pre-processed Cancer Dependency Map (DepMap) data that is the input to Webster.
* depmap_umap
Embedding coordinates.
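
The matrix cross product described for depmap_fn_subcell_raw_matrix can be illustrated with a minimal sketch. The toy values, matrix orientations (genes as rows), and variable names below are assumptions made for illustration; they are not taken from the files themselves.

```python
import numpy as np

# Toy stand-ins. Real shapes and orientations are assumptions.
# localization: genes x compartments (scores in the style of Go et al.)
localization = np.array([
    [0.9, 0.1],  # gene A
    [0.2, 0.8],  # gene B
    [0.0, 1.0],  # gene C
])

# loadings: genes x functions (gene-to-function loadings)
loadings = np.array([
    [1.0, 0.0],  # gene A
    [0.5, 0.5],  # gene B
    [0.0, 2.0],  # gene C
])

# functions x compartments: for each function, localization scores are
# summed over genes, weighted by each gene's loading on that function.
fn_subcell_raw = loadings.T @ localization

# Index of the highest-scoring compartment for each function.
top_compartment = fn_subcell_raw.argmax(axis=1)
```

Under these assumptions, taking the highest-scoring compartment per function is one plausible way to arrive at a single localization label per function, in the spirit of depmap_fn_subcell.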
## genotoxic (tsv)
Same structure as above, but for the Webster input and output from the smaller genotoxic fitness dataset (Olivieri et al. 2020).
* genotoxic_dictionary
* genotoxic_gene_loadings
* genotoxic_gene_meta
* genotoxic_input
* genotoxic_umap
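
The input, dictionary, and gene loadings files relate to each other in the usual dictionary learning form, where the input is approximated by loadings times dictionary. A minimal sketch with toy arrays follows; the orientations (genes as rows of the input and loadings, functions as rows of the dictionary) and all values are assumptions, so check the file headers before applying this to the real data.

```python
import numpy as np

# dictionary: functions x conditions (assumed orientation)
dictionary = np.array([
    [1.0, 0.0, -1.0],
    [0.0, 2.0, 1.0],
])

# gene_loadings: genes x functions, mostly zeros (sparse)
gene_loadings = np.array([
    [0.5, 0.0],
    [0.0, 1.0],
    [1.0, -0.5],
])

# Reconstructed fitness profiles: genes x conditions.
reconstruction = gene_loadings @ dictionary

# Toy "input": the reconstruction plus a little noise, standing in for
# the pre-processed data matrix.
rng = np.random.default_rng(0)
toy_input = reconstruction + 0.01 * rng.standard_normal(reconstruction.shape)

# Relative reconstruction error (Frobenius norm).
rel_error = np.linalg.norm(toy_input - reconstruction) / np.linalg.norm(toy_input)
```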
## prism (tsv)
Results of projecting PRISM screening data (Corsello et al. 2020) into a latent space inferred from DepMap data.
* prism_embedding
Same as depmap_umap above, except with the addition of selected compounds into the embedding, as well as compound meta information useful for labeling the plot.
* prism_primary_imputed
Input for projection into the Webster latent space. This is preprocessed and filtered for high-variance, well-annotated compounds.
* prism_primary_meta
Compound annotations for primary screen data.
* prism_primary_omp
Compound-to-function loadings learned by Orthogonal Matching Pursuit.
* prism_primary_proj_results
Summary statistics for projection results.
* prism_secondary_imputed
Input for projection into the Webster latent space. This is preprocessed and filtered for high-variance, well-annotated compounds, treated at many doses.
* prism_secondary_meta
Compound annotations for secondary screen data.
* prism_secondary_omp
Compound-to-function loadings learned by Orthogonal Matching Pursuit.
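
To give a sense of how Orthogonal Matching Pursuit produces sparse compound-to-function loadings, here is a minimal NumPy sketch of the generic algorithm. It is not the authors' implementation: the function name `omp`, the sparsity parameter `n_nonzero`, the matrix orientation, and the toy data are all assumptions for illustration.

```python
import numpy as np

def omp(dictionary, signal, n_nonzero):
    """Greedy Orthogonal Matching Pursuit (illustrative sketch).

    dictionary: (n_functions, n_features) array, one function per row.
    signal:     (n_features,) profile to project, e.g. one compound.
    Returns a (n_functions,) loading vector with at most n_nonzero
    non-zero entries.
    """
    # Normalize rows so atom selection is not biased by atom scale.
    atoms = dictionary / np.linalg.norm(dictionary, axis=1, keepdims=True)
    residual = signal.astype(float).copy()
    support = []
    coef = np.zeros(0)
    for _ in range(n_nonzero):
        correlations = atoms @ residual
        if support:
            correlations[support] = 0.0  # never pick the same atom twice
        best = int(np.argmax(np.abs(correlations)))
        if abs(correlations[best]) < 1e-12:
            break  # residual is already explained
        support.append(best)
        # Refit all selected atoms jointly by least squares.
        selected = dictionary[support].T
        coef, *_ = np.linalg.lstsq(selected, signal, rcond=None)
        residual = signal - selected @ coef
    loadings = np.zeros(dictionary.shape[0])
    if support:
        loadings[support] = coef
    return loadings

# Toy dictionary of three functions over four features (hypothetical).
toy_dictionary = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
])
# A toy compound profile built from functions 0 and 2.
toy_compound = 2.0 * toy_dictionary[0] - 1.0 * toy_dictionary[2]

toy_loadings = omp(toy_dictionary, toy_compound, n_nonzero=2)
```

In this toy case the recovered loadings are non-zero only for the two functions used to build the profile. For real analyses, an established implementation such as scikit-learn's `OrthogonalMatchingPursuit` is likely a better starting point than this sketch.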