Culhane_2023_Correspondence.pdf (3.9 MB)

Correspondence analysis for dimension reduction, batch integration, and visualization of single‑cell RNA‑seq data

journal contribution
posted on 2023-02-07, 16:09 authored by Lauren L. Hsu, Aedin Culhane

Effective dimension reduction is essential for single-cell RNA-seq (scRNAseq) analysis. Principal component analysis (PCA) is widely used, but requires continuous, normally-distributed data; therefore, it is often coupled with log-transformation in scRNAseq applications, which can distort the data and obscure meaningful variation. We describe correspondence analysis (CA), a count-based alternative to PCA. CA is based on decomposition of a chi-squared residual matrix, avoiding distortive log-transformation. To address overdispersion and high sparsity in scRNAseq data, we propose five adaptations of CA that are fast and scalable, and that outperform standard CA and glmPCA, yielding cell embeddings with better or comparable clustering accuracy in 8 out of 9 datasets. In particular, we find that CA with Freeman–Tukey residuals performs especially well across diverse datasets. Other advantages of the CA framework include visualization of associations between genes and cell populations in a “CA biplot,” and extension to multi-table analysis; we introduce corralm for integrative multi-table dimension reduction of scRNAseq data. We implement CA for scRNAseq data in corral, an R/Bioconductor package which interfaces directly with single-cell classes in Bioconductor. Switching from PCA to CA is achieved through a simple pipeline substitution and improves dimension reduction of scRNAseq datasets.
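
To make the decomposition described in the abstract concrete, the following is a minimal NumPy sketch, not the corral implementation. It assumes a small dense count matrix with genes as rows and cells as columns, and no all-zero rows or columns. The function names `ca_residuals` and `ca_embed` are hypothetical. The Freeman–Tukey variant uses the classical Freeman–Tukey deviate rescaled to proportions, which is an assumption about the exact form; the paper itself should be consulted for the definitions it uses.

```python
import numpy as np

def ca_residuals(counts, kind="pearson"):
    """Residual matrix for correspondence analysis (illustrative sketch).

    counts: 2-D array of non-negative counts, genes x cells (assumed layout),
    with no all-zero rows or columns.
    """
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    p = counts / total                        # observed proportions
    row_mass = p.sum(axis=1, keepdims=True)   # gene masses, shape (genes, 1)
    col_mass = p.sum(axis=0, keepdims=True)   # cell masses, shape (1, cells)
    expected = row_mass @ col_mass            # expected proportions under independence
    if kind == "pearson":
        # Standard CA: Pearson (chi-squared) residuals.
        return (p - expected) / np.sqrt(expected)
    if kind == "freeman_tukey":
        # Assumed form: classical Freeman-Tukey deviate on the proportion scale.
        return (np.sqrt(p) + np.sqrt(p + 1.0 / total)
                - np.sqrt(4.0 * expected + 1.0 / total))
    raise ValueError(f"unknown residual kind: {kind}")

def ca_embed(counts, n_components=2, kind="pearson"):
    """Cell embedding from the SVD of the residual matrix (hypothetical helper)."""
    resid = ca_residuals(counts, kind)
    u, d, vt = np.linalg.svd(resid, full_matrices=False)
    # Cells are columns, so scale the right singular vectors by the singular values.
    return vt[:n_components].T * d[:n_components]

counts = np.array([[10, 0, 3],
                   [2, 8, 1],
                   [0, 4, 9],
                   [5, 5, 5]])
embedding = ca_embed(counts, n_components=2, kind="freeman_tukey")
```

With Pearson residuals, the sum of squared residuals multiplied by the total count equals the chi-squared statistic of the table, which is why the abstract describes CA as a decomposition of a chi-squared residual matrix. No log-transformation of the counts is involved at any step.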

Funding

CZF2019-002443

T32GM135117

History

Publication

Scientific Reports 13, 1197

Publisher

Nature Portfolio

Also affiliated with

  • Health Research Institute (HRI)

Department or School

  • School of Medicine

    University of Limerick