DARE-BigNGS: A Science Gateway Model for Scalable NGS Data Analytics Over Distributed HPCs and Clouds

We introduce DARE-BigNGS, a science gateway project built upon a gateway model whose primary goal is to provide scalable Next-Generation Sequencing (NGS) data analytics as a service. As use cases, two signature pipelines, one targeting transcriptome/metagenome data and one targeting somatic mutation discovery, have been developed and are offered as services via the gateway. In this work, we report the core strategies, benchmark results, and technical details of how scalability is achieved for NGS data sets, which are intrinsically challenging because of ever-growing data volumes and because errors and artifacts of the sequencing technology complicate data analysis. Recent enhancements to the user-friendly interface components of the gateway are also described.