Accurate, efficient, and uncertainty-aware expression quantification of single-cell RNA-seq data
poster
posted on 2020-11-05, 19:42 authored by Hirak Sarkar, Avi Srivastava, Mohsen Zakeri, Scott Van Buren, Naim U Rashid, Michael Love, Rob Patro
The rapid growth in the generation of single-cell RNA-seq (scRNA-seq) data highlights the need for scalable computational platforms that extract useful information from these data, such as gene expression estimates and the corresponding uncertainty information, for use in downstream applications. We present a flexible, time- and memory-efficient framework for processing various types of scRNA-seq data that accounts for multi-mapping sequencing reads and estimates the quantification uncertainty inherent in the gene counts. This uncertainty arises from gene-ambiguous UMIs, which are particularly problematic because they tend to arise not randomly but preferentially from sequence-similar gene families. Alevin and its new extension, alevin-fry, support a principled approach for estimating this expression uncertainty using a cell-level bootstrapping procedure. Alevin-fry is a framework that processes the mapping information generated by alevin, constructs an intermediate cell-level representation of the interactions among reads, UMIs, and genes called parsimonious UMI graphs (PUGs), and exposes multiple strategies, ranging from trivial to sophisticated, for resolving PUGs into gene-level counts. We observe that alevin and alevin-fry process tagged-end single-cell data accurately, quickly, and with very low memory requirements (usually ~2 GB).
Alevin is written in C++14 and is available as part of salmon at https://github.com/COMBINE-lab/salmon.
Alevin-fry is written in Rust and is available at https://github.com/COMBINE-lab/alevin-fry.
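The idea of cell-level bootstrapping over gene-ambiguous UMIs can be illustrated with a minimal Python sketch. This is not the procedure implemented in alevin or alevin-fry: it assumes each deduplicated UMI in a cell is represented only by the tuple of genes it is compatible with, it resolves ambiguity by the simplest possible rule (splitting each UMI evenly among its compatible genes), and the names `resolve_uniform`, `bootstrap_counts`, and `toy_cell` are hypothetical. It shows only how resampling a cell's UMIs yields a per-gene mean and variance, and why genes that share UMIs carry more uncertainty.

```python
import random
from collections import Counter
from statistics import mean, pvariance


def resolve_uniform(umi_gene_sets):
    """Trivial resolution (an assumption of this sketch): split each UMI
    evenly among the genes it is compatible with."""
    counts = Counter()
    for genes in umi_gene_sets:
        share = 1.0 / len(genes)
        for gene in genes:
            counts[gene] += share
    return counts


def bootstrap_counts(umi_gene_sets, n_boot=100, seed=0):
    """Resample one cell's UMIs with replacement, re-resolve each replicate,
    and return {gene: (mean count, variance)} across replicates."""
    rng = random.Random(seed)
    genes = sorted({g for gene_set in umi_gene_sets for g in gene_set})
    samples = {g: [] for g in genes}
    n = len(umi_gene_sets)
    for _ in range(n_boot):
        # One bootstrap replicate: n draws with replacement from the cell.
        replicate = [umi_gene_sets[rng.randrange(n)] for _ in range(n)]
        counts = resolve_uniform(replicate)
        for g in genes:
            samples[g].append(counts.get(g, 0.0))
    return {g: (mean(v), pvariance(v)) for g, v in samples.items()}


# Toy cell (made-up data): 5 UMIs unique to gene A, 4 UMIs ambiguous
# between the hypothetical sequence-similar genes A and B, 3 unique to C.
toy_cell = [("A",)] * 5 + [("A", "B")] * 4 + [("C",)] * 3

if __name__ == "__main__":
    for gene, (avg, var) in bootstrap_counts(toy_cell).items():
        print(f"{gene}: mean={avg:.2f} variance={var:.2f}")
```

In this sketch every replicate distributes exactly one unit of count per UMI, so the per-gene means always sum to the number of UMIs in the cell; only the split between genes varies from replicate to replicate.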