Webster Supplemental Output

dataset

posted on 2022-01-19, 16:13, authored by Joshua Pan

# Webster Supplemental Output

This repository contains data to support Pan et al., "Sparse dictionary learning recovers pleiotropy from human cell fitness screens". There are four groups of data:

## Tables (xlsx)

These are the Excel files supporting the manuscript submission. The first sheet of each file is a Readme describing the data.

* Table S1: Webster output from genotoxic fitness screen data: dictionary matrix, loadings matrix, and annotations.

* Table S2: UMAP embedding coordinates for genotoxic fitness screen data.

* Table S3: Webster output from Cancer Dependency Map data: dictionary matrix, loadings matrix, and annotations.

* Table S4: UMAP embedding coordinates for Cancer Dependency Map data.

* Table S5: Mass spectrometry peptide counts for immunoprecipitations.

* Table S6: The maximum subcellular localization score for each of the functions learned from Cancer Dependency Map data.

* Table S7: Compound-to-function loadings for annotated compounds from PRISM primary and secondary screens.

## depmap (tsv)

These are the flat files underlying the Tables above; they contain the raw inputs and outputs of Webster.

* depmap_cell_line_info

Annotations for each cell line.

* depmap_dictionary

Webster dictionary matrix inferred from fitness data.

* depmap_fn_annot_gprofiler

Functional enrichment annotations derived from g:Profiler using the gene loadings on each function.

* depmap_fn_biomarkers

Random forest modeling results, using cell line features to predict the fitness effect of each function in the dictionary.

* depmap_fn_manual_name

Manually assigned name for each function, derived from the resources above.

* depmap_fn_subcell_raw_matrix

Matrix cross product between the Go et al. subcellular localization scores and our gene-to-function loadings.

* depmap_fn_subcell

The subcellular localization information used to color functions in the global embedding.
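The cross product behind depmap_fn_subcell_raw_matrix can be sketched in a few lines of pandas. This is a minimal illustration under stated assumptions, not the authors' code: both inputs are assumed to be indexed by gene (localization scores as genes × compartments, loadings as genes × functions), genes are matched by identifier, and the hypothetical helper `function_localization` also reports each function's highest-scoring compartment, analogous to the per-function maximum summarized in Table S6.

```python
import pandas as pd


def function_localization(localization: pd.DataFrame, loadings: pd.DataFrame):
    """Hypothetical helper: project gene localization scores onto functions.

    Assumes both inputs are indexed by gene:
      localization: genes x compartments (e.g. Go et al. scores)
      loadings:     genes x functions (Webster gene-to-function loadings)
    Returns (raw functions x compartments matrix, per-function summary).
    """
    # Keep only genes present in both matrices, in a shared order.
    shared = localization.index.intersection(loadings.index)
    loc = localization.loc[shared].to_numpy(dtype=float)
    load = loadings.loc[shared].to_numpy(dtype=float)

    # Cross product: (genes x functions)^T @ (genes x compartments).
    raw = pd.DataFrame(
        load.T @ loc,
        index=loadings.columns,
        columns=localization.columns,
    )

    # Highest score per function and the compartment that attains it.
    summary = pd.DataFrame({
        "max_score": raw.max(axis=1),
        "compartment": raw.idxmax(axis=1),
    })
    return raw, summary
```

Matching on the gene index rather than on row position guards against the two files listing genes in different orders or covering different gene sets.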

* depmap_gene_loadings

Webster gene-to-function loadings matrix, inferred from fitness data.

* depmap_gene_meta

Gene-centric information and useful links.

* depmap_input

Pre-processed Cancer Dependency Map (DepMap) data that is the input to Webster.

* depmap_umap

UMAP embedding coordinates (the basis for Table S4).
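Because Webster is a dictionary learning method, each gene's fitness profile in depmap_input should be approximated by a sparse combination of dictionary columns weighted by that gene's loadings. A minimal sketch for checking this on the flat files follows. It assumes, without having verified the files, that each TSV stores row labels in its first column, that depmap_input is cell lines × genes, depmap_dictionary is cell lines × functions, and depmap_gene_loadings is genes × functions; transpose as needed if the orientations differ. `load_matrix`, `reconstruct`, and `relative_error` are hypothetical helper names.

```python
import numpy as np
import pandas as pd


def load_matrix(path: str) -> pd.DataFrame:
    """Read a tab-separated matrix, assuming row labels are in the first column."""
    return pd.read_csv(path, sep="\t", index_col=0)


def reconstruct(dictionary: pd.DataFrame, loadings: pd.DataFrame) -> pd.DataFrame:
    """Return dictionary @ loadings^T as a cell lines x genes matrix.

    Assumed shapes: dictionary is cell lines x functions,
    loadings is genes x functions.
    """
    # Match functions by column label so column order does not matter.
    shared = dictionary.columns.intersection(loadings.columns)
    d = dictionary[shared].to_numpy(dtype=float)
    x = loadings[shared].to_numpy(dtype=float)
    return pd.DataFrame(d @ x.T, index=dictionary.index, columns=loadings.index)


def relative_error(observed: pd.DataFrame, approx: pd.DataFrame) -> float:
    """Frobenius norm of the residual divided by the norm of the observed data."""
    # Keep only cell lines and genes present in both matrices.
    obs, app = observed.align(approx, join="inner")
    residual = obs.to_numpy(dtype=float) - app.to_numpy(dtype=float)
    return float(np.linalg.norm(residual) / np.linalg.norm(obs.to_numpy(dtype=float)))


# Example usage (paths and extensions are assumptions; adjust to the actual files):
# Y = load_matrix("depmap/depmap_input.tsv")
# D = load_matrix("depmap/depmap_dictionary.tsv")
# X = load_matrix("depmap/depmap_gene_loadings.tsv")
# print(relative_error(Y, reconstruct(D, X)))
```

A relative error well below 1 would indicate that the dictionary and loadings capture most of the input; a value near 1 more likely points to a mismatched orientation or identifier set than to a poor fit.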

## genotoxic (tsv)

Same structure as above, but for Webster applied to the smaller genotoxic fitness dataset (Olivieri et al. 2020).

* genotoxic_dictionary

* genotoxic_gene_loadings

* genotoxic_gene_meta

* genotoxic_input

* genotoxic_umap

## prism (tsv)

Results of projecting PRISM screening data (Corsello et al. 2020) into the latent space inferred from DepMap data.

* prism_embedding

Same as depmap_umap above, but with selected compounds added to the embedding, along with compound metadata useful for labeling the plot.

* prism_primary_imputed

Input for projection into the Webster latent space, preprocessed and filtered for high-variance, well-annotated compounds.

* prism_primary_meta

Compound annotations for primary screen data.

* prism_primary_omp

Compound-to-function loadings learned by Orthogonal Matching Pursuit.
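Orthogonal Matching Pursuit (OMP) builds a sparse code greedily: at each step it adds the dictionary column most correlated with the current residual, then refits the target on all selected columns by least squares. The NumPy sketch below projects a single compound's response profile (one value per cell line) onto a fixed dictionary. The function name, the early-stopping rule, and the `n_nonzero` sparsity level are illustrative assumptions; the settings actually used to produce prism_primary_omp are not stated here. scikit-learn's `OrthogonalMatchingPursuit` is an existing off-the-shelf alternative.

```python
import numpy as np


def omp(dictionary: np.ndarray, y: np.ndarray, n_nonzero: int) -> np.ndarray:
    """Sparse coefficients of y over the columns of dictionary via OMP.

    dictionary: cell lines x functions (the fixed latent space)
    y:          one compound's response profile over the same cell lines
    n_nonzero:  maximum number of functions allowed in the code
    """
    dictionary = np.asarray(dictionary, dtype=float)
    y = np.asarray(y, dtype=float)
    n_functions = dictionary.shape[1]
    # Column norms, so selection is not biased toward large-norm columns.
    norms = np.linalg.norm(dictionary, axis=0)
    norms = np.where(norms > 0, norms, 1.0)

    coef = np.zeros(n_functions)
    selected = []
    sol = np.zeros(0)
    residual = y.copy()
    for _ in range(min(n_nonzero, n_functions)):
        # Greedy step: the function most correlated with the unexplained signal.
        scores = np.abs(dictionary.T @ residual) / norms
        scores[selected] = -np.inf  # never pick the same function twice
        selected.append(int(np.argmax(scores)))
        # Orthogonal step: refit y on all selected functions jointly.
        sub = dictionary[:, selected]
        sol, *_ = np.linalg.lstsq(sub, y, rcond=None)
        residual = y - sub @ sol
        if np.allclose(residual, 0.0):
            break  # y is fully explained; stop early
    coef[selected] = sol
    return coef
```

Refitting on all selected functions at every step, rather than only on the newest one, is what distinguishes OMP from plain matching pursuit: the residual stays orthogonal to every function already chosen.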

* prism_primary_proj_results

Summary statistics for projection results.

* prism_secondary_imputed

Input for projection into the Webster latent space, preprocessed and filtered for high-variance, well-annotated compounds tested at multiple doses.

* prism_secondary_meta

Compound annotations for secondary screen data.

* prism_secondary_omp

Compound-to-function loadings learned by Orthogonal Matching Pursuit.

* prism_secondary_proj_results

Summary statistics for projection results.

Funding

CA176058

Keywords

pleiotropy CRISPR-Cas9 genetic networks Dictionary Learning Representation Learning Sparse approximation matrix factorization Gene Function Genetics Genomics

Licence

CC BY 4.0