Webster Supplemental Output

Version 2 2022-01-19, 16:13
Version 1 2021-08-16, 14:48
dataset
posted on 2022-01-19, 16:13 authored by Joshua Pan
# Webster Supplemental Output

This repository contains data to support Pan et al., "Sparse dictionary learning recovers pleiotropy from human cell fitness screens". There are four groups of data:


## Tables (xlsx)

These are the Excel files that support the manuscript submission. Each file contains a Readme as its first sheet, with data descriptions.

* Table S1: Webster output from genotoxic fitness screen data: dictionary matrix, loadings matrix, and annotations.

* Table S2: UMAP embedding coordinates for genotoxic fitness screen data.

* Table S3: Webster output from Cancer Dependency Map data: dictionary matrix, loadings matrix, and annotations.

* Table S4: UMAP embedding coordinates for Cancer Dependency Map data.

* Table S5: Mass spectrometry peptide counts for immunoprecipitations.

* Table S6: The maximum subcellular localization score for each of the functions learned from Cancer Dependency Map data.

* Table S7: Compound-to-function loadings for annotated compounds from PRISM primary and secondary screens.



## depmap (tsv)

These flat files are the basis for the Tables above and represent the raw inputs and outputs of Webster.

* depmap_cell_line_info

Annotations for each cell line.

* depmap_dictionary

Webster dictionary matrix inferred from fitness data.

* depmap_fn_annot_gprofiler

Annotations derived from gProfiler using gene loadings on each function.

* depmap_fn_biomarkers

Random forest modeling results using cell line features to predict the fitness effect of each function in the dictionary.

* depmap_fn_manual_name

Manual name for each function, derived from above resources.

* depmap_fn_subcell_raw_matrix

Matrix cross product between the Go et al. localization scores and our gene-to-function loadings.

* depmap_fn_subcell

The subcell localization information used for coloring functions in the global embedding.

* depmap_gene_loadings

Webster gene-to-function loadings matrix, inferred from fitness data.

* depmap_gene_meta

Gene-centric information and useful links.

* depmap_input

Pre-processed Cancer Dependency Map (DepMap) data that is the input to Webster.

* depmap_umap

Embedding coordinates.
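As a rough illustration of how the dictionary and loadings files relate to the input, the sketch below reads two small labeled matrices and multiplies them to approximate the input matrix, as is usual in sparse dictionary learning. The file layout (one header row, one label column), the matrix orientations (genes x functions for loadings, functions x cell lines for the dictionary), and the helper names `read_matrix_tsv` and `reconstruct` are assumptions made for this example; check the actual files before relying on them. The toy values are made up and are not taken from the dataset.

```python
import csv
import io

import numpy as np


def read_matrix_tsv(handle):
    """Read a labeled matrix from a tab-separated text handle.

    Assumes the first row holds column labels and the first column
    holds row labels (an assumption about the file layout).
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    col_labels = header[1:]
    row_labels, rows = [], []
    for record in reader:
        if not record:  # skip blank lines
            continue
        row_labels.append(record[0])
        rows.append([float(value) for value in record[1:]])
    return row_labels, col_labels, np.array(rows)


def reconstruct(loadings, dictionary):
    """Approximate the input as loadings (genes x functions) times
    dictionary (functions x cell lines)."""
    return loadings @ dictionary


# Toy stand-ins for depmap_gene_loadings and depmap_dictionary (made-up values).
toy_loadings = "gene\tF1\tF2\nGENE_A\t1.0\t0.0\nGENE_B\t0.5\t0.5\n"
toy_dictionary = (
    "function\tcell_1\tcell_2\tcell_3\n"
    "F1\t-1.0\t0.0\t-0.5\n"
    "F2\t0.0\t-2.0\t0.0\n"
)

genes, loading_functions, L = read_matrix_tsv(io.StringIO(toy_loadings))
dict_functions, cells, D = read_matrix_tsv(io.StringIO(toy_dictionary))

# The function labels must line up before the two matrices are multiplied.
if loading_functions != dict_functions:
    raise ValueError("function labels differ between loadings and dictionary")

approx = reconstruct(L, D)
print(approx)
```

For the real files, replace the `io.StringIO` handles with `open(path, newline="")` on the downloaded TSVs, and transpose a matrix if its orientation differs from the one assumed here.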



## genotoxic (tsv)

Same structure as above, but for Webster input from the smaller genotoxic fitness dataset (Olivieri et al. 2020).

* genotoxic_dictionary

* genotoxic_gene_loadings

* genotoxic_gene_meta

* genotoxic_input

* genotoxic_umap


## prism (tsv)

Results of projecting PRISM screening data (Corsello et al. 2020) into a latent space inferred from DepMap data.

* prism_embedding

Same as depmap_umap above, except with the addition of selected compounds to the embedding, as well as compound meta information useful for labeling the plot.

* prism_primary_imputed

Input for projection into the Webster latent space. This is preprocessed and filtered for high-variance, well-annotated compounds.

* prism_primary_meta

Compound annotations for primary screen data.

* prism_primary_omp

Compound-to-function loadings learned by Orthogonal Matching Pursuit.

* prism_primary_proj_results

Summary statistics for projection results.

* prism_secondary_imputed

Input for projection into the Webster latent space. This is preprocessed and filtered for high-variance, well-annotated compounds, treated at many doses.

* prism_secondary_meta

Compound annotations for secondary screen data.

* prism_secondary_omp

Compound-to-function loadings learned by Orthogonal Matching Pursuit.

* prism_secondary_proj_results

Summary statistics for projection results.
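To make the projection step more concrete, here is a minimal sketch of Orthogonal Matching Pursuit in NumPy: given a fixed dictionary, it greedily selects a few atoms and fits their coefficients by least squares. It is a generic textbook version, not the code used to produce the prism_*_omp files. The function name `omp`, the matrix orientation (features x atoms), the sparsity level, and the toy values are all assumptions for illustration.

```python
import numpy as np


def omp(dictionary, signal, n_nonzero):
    """Generic Orthogonal Matching Pursuit.

    dictionary: (n_features, n_atoms) array, ideally with unit-norm columns.
    signal:     (n_features,) array to approximate.
    n_nonzero:  number of atoms to select (hypothetical sparsity level).
    Returns a length-n_atoms coefficient vector with at most n_nonzero
    nonzero entries.
    """
    signal = np.asarray(signal, dtype=float)
    residual = signal.copy()
    support = []
    coefs = np.zeros(dictionary.shape[1])
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        correlations = np.abs(dictionary.T @ residual)
        if support:
            correlations[support] = 0.0  # do not reselect chosen atoms
        support.append(int(np.argmax(correlations)))
        # Refit all selected atoms jointly by least squares.
        solution, *_ = np.linalg.lstsq(dictionary[:, support], signal, rcond=None)
        residual = signal - dictionary[:, support] @ solution
        coefs[:] = 0.0
        coefs[support] = solution
    return coefs


# Toy example: three orthonormal "functions" measured over four "cell lines".
toy_dictionary = np.eye(4)[:, :3]
# A made-up "compound" profile built from the first and third functions.
toy_profile = 2.0 * toy_dictionary[:, 0] - 3.0 * toy_dictionary[:, 2]

coefs = omp(toy_dictionary, toy_profile, n_nonzero=2)
print(coefs)
```

In the setting described above, each compound's sensitivity profile across cell lines would play the role of `signal`, and the DepMap-derived functions would be the dictionary atoms, so each compound ends up with only a few nonzero function loadings.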



Funding

CA176058
