Large-Scale Structured Sparsity via Parallel Fused Lasso on Multiple GPUs

<p>We present a massively parallel algorithm for the fused lasso, powered by multiple graphics processing units (GPUs). Our method is suited to a class of large-scale sparse regression problems in which a two-dimensional lattice structure is imposed on the coefficients. This structure is important in many statistical applications, including image-based regression, in which a set of images is used to locate image regions predictive of a response variable such as human behavior. Such large datasets are increasingly common. In our study, we employ the split Bregman method and the fast Fourier transform, which together exhibit a high degree of data-level parallelism that is especially pronounced in the two-dimensional setting. Our multi-GPU parallelization achieves substantial speedups: up to 433 times faster than the reference CPU implementation. We demonstrate the speed and scalability of the algorithm on several datasets, including 8100 samples of 512 × 512 images. Compared with its single-GPU counterpart, our method is also faster and scales well. We describe the various elements of our study, as well as our experience with the subtleties of selecting an existing algorithm for parallelization. In particular, memory bandwidth must be carefully considered when designing multi-GPU algorithms. Supplementary material for this article is available online.</p>
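To illustrate why split Bregman and the FFT pair well on a two-dimensional lattice, here is a minimal NumPy sketch under several assumptions that are not taken from the paper. It treats only the special case X = I (the 2D fused lasso signal approximator). It also assumes periodic boundary conditions so that the 2D DFT diagonalizes the difference operator. All function names (`diff`, `diff_adjoint`, `solve_beta`, `shrink`, `fused_lasso_2d`) are hypothetical. Each iteration consists of element-wise soft-thresholding plus one linear solve that reduces to a pointwise division in the Fourier domain. These are the kinds of data-parallel operations that map naturally onto GPUs. This is an illustrative sketch, not the authors' implementation.

```python
import numpy as np

def diff(u):
    """Forward differences (D_x u, D_y u) on a 2D lattice.
    Assumption: periodic boundaries (wrap-around edges)."""
    return np.roll(u, -1, axis=1) - u, np.roll(u, -1, axis=0) - u

def diff_adjoint(gx, gy):
    """Adjoint D^T of `diff` under the same periodic boundaries."""
    return (np.roll(gx, 1, axis=1) - gx) + (np.roll(gy, 1, axis=0) - gy)

def laplacian_eigenvalues(shape):
    """Eigenvalues of D^T D in the 2D DFT basis (periodic case)."""
    n0, n1 = shape
    e0 = 2.0 - 2.0 * np.cos(2.0 * np.pi * np.arange(n0) / n0)
    e1 = 2.0 - 2.0 * np.cos(2.0 * np.pi * np.arange(n1) / n1)
    return e0[:, None] + e1[None, :]

def solve_beta(rhs, mu):
    """Solve ((1 + mu) I + mu D^T D) beta = rhs with one FFT pair:
    the operator is diagonal in the Fourier basis, so the solve is a
    pointwise division."""
    denom = (1.0 + mu) + mu * laplacian_eigenvalues(rhs.shape)
    return np.real(np.fft.ifft2(np.fft.fft2(rhs) / denom))

def shrink(z, t):
    """Element-wise soft-thresholding, the proximal map of t * |.|."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def fused_lasso_2d(y, lam1, lam2, mu=1.0, n_iter=200):
    """Split Bregman for
        min_beta 0.5*||y - beta||^2 + lam1*||beta||_1 + lam2*||D beta||_1
    with splittings a = beta and (dx, dy) = D beta (X = I assumed)."""
    y = np.asarray(y, dtype=float)
    beta = y.copy()
    a, ba = np.zeros_like(y), np.zeros_like(y)    # sparsity split + Bregman variable
    dx, bx = np.zeros_like(y), np.zeros_like(y)   # horizontal-difference split
    dy, by = np.zeros_like(y), np.zeros_like(y)   # vertical-difference split
    for _ in range(n_iter):
        # Quadratic subproblem in beta, solved exactly via the FFT.
        rhs = y + mu * (a - ba) + mu * diff_adjoint(dx - bx, dy - by)
        beta = solve_beta(rhs, mu)
        gx, gy = diff(beta)
        # L1 subproblems: independent element-wise shrinkage.
        a = shrink(beta + ba, lam1 / mu)
        dx = shrink(gx + bx, lam2 / mu)
        dy = shrink(gy + by, lam2 / mu)
        # Bregman (dual) updates.
        ba += beta - a
        bx += gx - dx
        by += gy - dy
    return beta
```

For example, `fused_lasso_2d(noisy_image, 0.0, 0.5)` returns a piecewise-smoothed estimate. With both penalties set to zero, the iterates converge back to `y`. For a general design matrix X, the beta-subproblem involves XᵀX and is no longer diagonalized by the FFT alone, so this sketch should not be read as covering the full regression case.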