figshare

Processed AnnData objects for GeneTrajectory inference (Gene Trajectory Inference for Single-cell Data by Optimal Transport Metrics)

dataset
posted on 2024-04-04, 07:17, authored by Rihao Qu, Francesco Strino

These are processed AnnData objects (converted from Seurat objects) for GeneTrajectory tutorials (https://github.com/KlugerLab/GeneTrajectory-python/):

Human myeloid dataset analysis

Myeloid cells were extracted from a publicly available 10x scRNA-seq dataset (https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.0/pbmc_10k_v3). QC was performed using the same workflow as in (https://github.com/satijalab/Integration2019/blob/master/preprocessing_scripts/pbmc_10k_v3.R). After standard normalization, highly variable gene selection, and scaling using the Seurat R package, we applied PCA and retained the top 30 principal components. Four sub-clusters of myeloid cells were identified by Louvain clustering with a resolution of 0.3. The Wilcoxon rank-sum test was used to find cluster-specific gene markers for cell type annotation.
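The normalization, scaling, and PCA steps described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the Seurat workflow used to produce the deposited objects: the function name, the scale factor of 10,000, and the toy Poisson counts are hypothetical, and highly variable gene selection, clustering, and marker testing are omitted.

```python
import numpy as np

def normalize_log_scale_pca(counts, n_pcs=30, scale_factor=1e4):
    """Toy preprocessing: library-size normalization, log1p, scaling, PCA.

    counts: dense cells x genes matrix of raw counts.
    Returns the cell coordinates on the top n_pcs principal components.
    """
    counts = np.asarray(counts, dtype=float)

    # Library-size normalization followed by a log1p transform.
    lib_size = counts.sum(axis=1, keepdims=True)
    lib_size[lib_size == 0] = 1.0  # avoid division by zero for empty cells
    log_norm = np.log1p(counts / lib_size * scale_factor)

    # Scale each gene to zero mean and unit variance.
    mean = log_norm.mean(axis=0)
    std = log_norm.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # leave constant genes at zero after centering
    scaled = (log_norm - mean) / std

    # PCA via thin SVD: cell scores are U * S.
    u, s, _ = np.linalg.svd(scaled, full_matrices=False)
    n_pcs = min(n_pcs, s.size)
    return u[:, :n_pcs] * s[:n_pcs]

# Hypothetical toy data: 200 cells x 50 genes.
rng = np.random.default_rng(0)
toy_counts = rng.poisson(2.0, size=(200, 50))
pcs = normalize_log_scale_pca(toy_counts, n_pcs=30)
print(pcs.shape)
```

The returned matrix plays the role of the cell PC embedding that the later Diffusion Map step takes as input.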

For gene trajectory inference, we first applied Diffusion Map to the cell PC embedding (using a local-adaptive kernel in which each cell's bandwidth is the distance to its k-th nearest neighbor, k = 10) to generate a spectral embedding of cells. We constructed a cell-cell kNN (k = 10) graph based on the cells' coordinates in the top 5 non-trivial Diffusion Map eigenvectors. Among the top 2,000 variable genes, genes expressed by 0.5% to 75% of cells were retained for pairwise gene-gene Wasserstein distance computation. The original cell graph was coarse-grained into a graph of size 1,000. We then built a gene-gene graph in which the affinity between genes is obtained from the Wasserstein distance using a Gaussian kernel (local-adaptive, k = 5). Diffusion Map was used to visualize the embedding of the gene graph. For trajectory identification, we used a series of time steps (11, 21, 8) to extract three gene trajectories.
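As a rough illustration of the Diffusion Map step with a local-adaptive kernel, the sketch below computes a spectral embedding from a point cloud using NumPy only. It is not the GeneTrajectory implementation: the function name is hypothetical, and the specific kernel form exp(-d_ij^2 / (sigma_i * sigma_j)), with sigma_i the distance from point i to its k-th nearest neighbor, is an assumption about one common way to build such a kernel.

```python
import numpy as np

def diffusion_map(points, k=10, n_components=5):
    """Toy Diffusion Map with a local-adaptive Gaussian kernel.

    points: n x d coordinates (e.g. a cell PC embedding).
    Returns (eigenvalues, eigenvectors) for the top non-trivial components.
    """
    points = np.asarray(points, dtype=float)

    # Pairwise squared Euclidean distances.
    sq = np.sum(points ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * points @ points.T, 0.0)
    np.fill_diagonal(d2, 0.0)
    dist = np.sqrt(d2)

    # Local bandwidth: distance to the k-th nearest neighbor
    # (column 0 of each sorted row is the point itself).
    sigma = np.sort(dist, axis=1)[:, k]

    # Assumed kernel form: exp(-d_ij^2 / (sigma_i * sigma_j)).
    affinity = np.exp(-d2 / (sigma[:, None] * sigma[None, :]))

    # Symmetric normalization D^{-1/2} K D^{-1/2}; it shares eigenvalues
    # with the Markov matrix P = D^{-1} K.
    deg = affinity.sum(axis=1)
    sym = affinity / np.sqrt(deg[:, None] * deg[None, :])
    evals, evecs = np.linalg.eigh(sym)
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]

    # Convert to right eigenvectors of P and drop the trivial first one.
    psi = evecs / np.sqrt(deg)[:, None]
    return evals[1:n_components + 1], psi[:, 1:n_components + 1]

# Hypothetical toy input: 100 points in a 30-dimensional PC space.
rng = np.random.default_rng(0)
cell_pcs = rng.normal(size=(100, 30))
evals, embedding = diffusion_map(cell_pcs, k=10, n_components=5)
print(embedding.shape)
```

The same function could in principle be applied to a gene-gene distance matrix by replacing the Euclidean distances with precomputed Wasserstein distances and using k = 5, although that variant is not shown here.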


Mouse embryo skin data analysis

We separated out dermal cell populations from the newly collected mouse embryo skin samples. Cells from the wild type and the Wls mutant were pooled for analyses. After standard normalization, highly variable gene selection, and scaling using Seurat, we applied PCA and retained the top 30 principal components. Three dermal cell types were stratified based on the expression of canonical dermal markers, including Sox2, Dkk1, and Dkk2. For gene trajectory inference, we first applied Diffusion Map to the cell PC embedding (using a local-adaptive kernel bandwidth, k = 10) to generate a spectral embedding of cells. We constructed a cell-cell kNN (k = 10) graph based on the cells' coordinates in the top 10 non-trivial Diffusion Map eigenvectors. Among the top 2,000 variable genes, genes expressed by 1% to 50% of cells were retained for pairwise gene-gene Wasserstein distance computation. The original cell graph was coarse-grained into a graph of size 1,000. We then built a gene-gene graph in which the affinity between genes is obtained from the Wasserstein distance using a Gaussian kernel (local-adaptive, k = 5). Diffusion Map was used to visualize the embedding of the gene graph. For trajectory identification, we used a series of time steps (9, 16, 5) to sequentially extract three gene trajectories. To compare the wild type and the Wls mutant, we stratified Wnt-active UD cells into seven stages according to their expression profiles of the genes binned along the DC gene trajectory.
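The gene filtering step in the paragraph above (retaining genes expressed by 1% to 50% of cells) can be sketched as follows. The function name and toy matrix are hypothetical, and two details are assumptions: a gene counts as expressed in a cell when its value is greater than zero, and both bounds are treated as inclusive.

```python
import numpy as np

def filter_genes_by_expression_fraction(expr, min_frac=0.01, max_frac=0.50):
    """Flag genes whose fraction of expressing cells lies in [min_frac, max_frac].

    expr: cells x genes expression matrix (counts or normalized values).
    Returns (keep_mask, fraction_of_expressing_cells_per_gene).
    """
    expr = np.asarray(expr)
    # Assumption: "expressed" means a strictly positive value.
    frac = (expr > 0).mean(axis=0)
    keep = (frac >= min_frac) & (frac <= max_frac)
    return keep, frac

# Hypothetical toy matrix: 4 cells x 4 genes.
toy_expr = np.array([
    [0, 1, 3, 0],
    [0, 0, 2, 0],
    [0, 0, 1, 0],
    [0, 0, 5, 1],
])
keep, frac = filter_genes_by_expression_fraction(toy_expr, 0.01, 0.50)
print(keep)
```

In this toy case the first gene is never expressed and the third is expressed in every cell, so only the second and fourth genes pass the filter. Calling the function with `min_frac=0.005, max_frac=0.75` would correspond to the thresholds quoted for the human myeloid dataset.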

Funding

Delineating spatiotemporal dynamics of hair follicle dermal niche specification at the single-cell level

National Institute of Arthritis and Musculoskeletal and Skin Diseases

EFFICIENT METHODS FOR CALIBRATION, CLUSTERING, VISUALIZATION AND IMPUTATION OF LARGE scRNA-seq DATA

National Institute of General Medical Sciences

The Y-SCORCH Data Generation Center at Yale for Single-Cell Opioid Responses in the Context of HIV

National Institute on Drug Abuse

Yale TMC for Cellular Senescence in Lymphoid Organs

National Institute on Aging

Yale SPORE in Skin Cancer

National Cancer Institute

M-SCORCH: Methamphetamine use disorder data generation center for Single Cell Opioid Responses in the Context of HIV

National Institute on Drug Abuse

Yale Murine-TMC on Immune Cell Senescence Derived Inflammation

National Institute on Aging
