ACCELERATING BIOINFORMATICS WORKLOADS USING GPUS
Bioinformatics workloads are important, as understanding genomic data is the first step toward understanding life itself. Genomic data continues to grow with (1) increasing numbers of fully sequenced genomes, (2) availability of genome assemblies of more complex life forms, and (3) faster, cheaper ways to capture accurate genomic data. Consequently, there is an urgent demand for faster processing of the rapidly growing genomic data. Recognizing this need, several efforts have explored specialized hardware—application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), and large multinode clusters. Yet the steep fabrication costs of ASICs, the engineering overhead of FPGAs, and the capital investment required for distributed infrastructures keep these solutions out of reach for most genomics laboratories, leaving the broader community without practical access to such accelerators.
In my work, I design and demonstrate successful acceleration strategies for two important bioinformatics workloads: Whole Genome Alignment (WGA) and protein homology search using local alignment (BLAST). Both strategies exploit off-the-shelf GPUs, which makes them immediately accessible to genomics users.
The first contribution of my work is the design of FastZ, a novel GPU-accelerated Whole Genome Alignment (WGA) software that matches LASTZ (sequential ISO software) in sensitivity. FastZ accelerates the high-sensitivity gapped alignment step, which relies on a slow dynamic programming (DP) algorithm, through a novel inspector–executor scheme. In this scheme, (a) a lightweight inspector elides DP traceback except in the common case of extremely short alignments, where it performs a limited, eager traceback that eliminates the executor entirely, and (b) executor trimming avoids unnecessary work. Further, FastZ employs register-based cyclic-use-and-discard buffering to drastically reduce memory traffic and groups DP problems by size for load balance. FastZ achieves 43×, 93×, and 111× speedups over LASTZ on Pascal, Volta, and Ampere GPUs, respectively.
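The inspector–executor idea can be illustrated with a minimal CPU-side sketch. This is not FastZ's implementation: it assumes a plain local-alignment recurrence with a linear gap penalty, and the function names (`inspect`, `execute`, `align`), the scoring values, and the threshold are hypothetical. The inspector keeps only one DP row and records where the best score ends; the executor then runs a traceback-capable DP restricted to the prefixes ending at that cell, which is one simple way to read "trimming".

```python
def inspect(a, b, match=2, mismatch=-1, gap=-2):
    """Inspector: score-only local alignment that keeps a single DP row.

    Returns the best score and the 1-based cell (i, j) where it ends.
    No traceback matrix is stored.
    """
    prev = [0] * (len(b) + 1)
    best, best_end = 0, (0, 0)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(0, prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
            if curr[j] > best:
                best, best_end = curr[j], (i, j)
        prev = curr  # the older row is discarded once it has been consumed
    return best, best_end


def execute(a, b, end, match=2, mismatch=-1, gap=-2):
    """Executor: full DP with traceback, trimmed to the cell found by the inspector."""
    ei, ej = end
    a, b = a[:ei], b[:ej]  # assumption: work past the best cell is never needed
    H = [[0] * (ej + 1) for _ in range(ei + 1)]
    for i in range(1, ei + 1):
        for j in range(1, ej + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
    out_a, out_b = [], []
    i, j = ei, ej
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:      # diagonal move
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:      # gap in b
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:                                   # gap in a
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def align(a, b, threshold):
    """Run the cheap inspector first; pay for traceback only above the threshold."""
    score, end = inspect(a, b)
    if score < threshold:
        return None
    return score, execute(a, b, end)
```

For example, `align("GGACGTCC", "TTACGTAA", threshold=6)` returns `(8, ("ACGT", "ACGT"))`, while a threshold of 20 returns `None` without ever allocating the traceback matrix. The single reused row in `inspect` is only a loose analogue of the register-based cyclic buffering mentioned above.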
The second contribution of my work is the design of BLAZE, a GPU-based framework for accelerating the popular Basic Local Alignment Search Tool (BLAST) in protein homology searches. Unlike prior approaches that relax BLAST’s stringent filtering procedures, BLAZE preserves all of BLAST’s sequential filtering stages to maintain the high sensitivity necessary for discarding over 99% of sequences that do not produce high-scoring alignments. BLAZE maps all critical BLAST stages to the GPU: seed-hit discovery, two-hit filtration, ungapped extension, and semi-gapped extension. BLAZE accelerates BLAST by (a) leveraging SIMT-friendly hybrid parallelism that avoids the pitfalls of previous efforts to parallelize BLASTP on GPUs, (b) minimizing SIMT divergence through a sorting-and-binning strategy that enables optimized kernel launches and improved GPU occupancy, and (c) efficiently implementing the semi-gapped extension algorithm on GPUs by using shared memory to store intermediate values that would otherwise reside in slow global memory. By addressing these challenges and learning from previous GPU-based implementations, BLAZE achieves, on an Ampere GPU, an average 18.2× speedup over standard BLAST, a 1.9× speedup over 16-threaded BLAST, and a 4.8× speedup over existing GPU-accelerated BLAST implementations. This work illustrates how preserving rigorous filtering in tandem with an architecture-aware parallelization strategy can significantly advance GPU-based protein sequence analysis.
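A sorting-and-binning step of the kind mentioned in (b) can be sketched on the host side as follows. The abstract does not say which quantity BLAZE sorts on or how its bins are defined, so this sketch assumes power-of-two size classes over sequence length, and the name `bin_by_size` is hypothetical. The point it illustrates is that work items of similar size end up in the same batch, so threads launched together run for a similar number of iterations.

```python
from collections import defaultdict


def bin_by_size(items, size_of=len):
    """Group item indices into power-of-two size classes.

    Bucket k holds items whose size lies in (2**(k-1), 2**k]; bucket 0 holds
    sizes 0 and 1. Indices inside each bucket are ordered by ascending size.
    """
    bins = defaultdict(list)
    order = sorted(range(len(items)), key=lambda idx: size_of(items[idx]))
    for idx in order:
        size = size_of(items[idx])
        bucket = max(size - 1, 0).bit_length()  # ceil(log2(size)) for size >= 1
        bins[bucket].append(idx)
    return dict(bins)


# Each bucket could then be handed to its own batch (for a GPU, its own kernel
# launch with a configuration suited to that size class).
sequences = ["A" * 5, "A" * 2, "A" * 8, "A" * 3, "A" * 1]
batches = bin_by_size(sequences)
```

Here `batches` is `{0: [4], 1: [1], 2: [3], 3: [0, 2]}`: the sequences of length 5 and 8 share a batch, while the short ones are kept apart from them. The same grouping idea applies to the size-based grouping of DP problems described for FastZ.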
In summary, my work enables the use of readily available commodity GPUs to accelerate important bioinformatics workloads, eliminating the need for costly specialized hardware.
Degree Type
- Doctor of Philosophy
Department
- Electrical and Computer Engineering
Campus location
- West Lafayette