Collaborative Research: High Performance Low Rank Approximation for Scalable Data Analytics

Version 2 2018-04-25, 13:12
Version 1 2018-04-23, 16:47
poster
posted on 2018-04-25, 13:12 authored by R. Kannan, G. Ballard, B. Drake, H. Park

Matrix and tensor low rank approximations have been foundational tools in numerous science and engineering applications. By imposing constraints on the low rank approximations, we can model many key problems and design scalable algorithms for Big Data analytics that reach far beyond the classical science and engineering disciplines. In particular, mathematical models with nonnegative data values abound across application fields, and imposing nonnegativity constraints on the low rank models makes the discovered latent components interpretable and practically useful. Variants of these constraints can be designed to reflect additional characteristics of real-life data analytics problems.

The ultimate goals of this proposal are (1) to develop efficient parallel algorithms for computing nonnegative matrix and tensor factorizations (NMF and NTF) and their variants using a unified framework, and (2) to produce a software package called Parallel Low-rank Approximation with Nonnegative Constraints that delivers the high performance, flexibility, and scalability necessary to tackle the ever-growing size of today’s data sets. Although some NMF algorithms have been implemented within the enterprise data analytics ecosystem, the simplicity of those systems has come at the cost of both computational efficiency and flexibility in the choice of mathematical techniques. We plan to use tools and techniques from the high performance computing ecosystem that not only achieve efficiency and scalability on large parallel machines, but also allow for the application of a wide variety of mathematical algorithms for computing nonnegative factorizations. Our initial results demonstrate orders-of-magnitude improvements in run time over the current state of the art for a particular NMF algorithm. We propose to generalize our algorithms to NTF problems and extend the class of algorithms we can efficiently parallelize; our software framework will allow end-users to use and extend our techniques.
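For readers unfamiliar with nonnegative matrix factorization, the following is a minimal single-machine sketch of the problem: approximate a nonnegative matrix A by a product W H with both factors nonnegative. It uses the textbook multiplicative update rule for the Frobenius-norm objective, chosen here only for brevity. It is not the parallel algorithm or the software package described above, and the function name `nmf_multiplicative`, its parameters, and the synthetic test matrix are all assumptions made for illustration.

```python
import numpy as np


def nmf_multiplicative(A, rank, iters=200, eps=1e-10, seed=0):
    """Illustrative NMF via multiplicative updates (hypothetical helper).

    Seeks W (m x rank) and H (rank x n), both nonnegative, so that
    W @ H approximates the nonnegative matrix A in Frobenius norm.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Random nonnegative starting factors.
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    for _ in range(iters):
        # Each update multiplies by a nonnegative ratio, so the factors
        # stay nonnegative; eps guards against division by zero.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ (H @ H.T) + eps)
    return W, H


def relative_error(A, W, H):
    """Relative Frobenius-norm error of the approximation W @ H."""
    return np.linalg.norm(A - W @ H) / np.linalg.norm(A)


# Synthetic nonnegative matrix with an exact rank-4 nonnegative factorization.
rng = np.random.default_rng(1)
A = rng.random((30, 4)) @ rng.random((4, 20))

W_few, H_few = nmf_multiplicative(A, rank=4, iters=5)
W, H = nmf_multiplicative(A, rank=4, iters=200)

err_few = relative_error(A, W_few, H_few)
err = relative_error(A, W, H)
print(f"relative error after 5 iterations:   {err_few:.4f}")
print(f"relative error after 200 iterations: {err:.4f}")
```

Because the entries of W and H never become negative, each column of W can be read as an additive component of the data, which is the interpretability property the abstract refers to. Algorithms suited to large parallel machines typically replace these simple updates with other solvers and distribute the matrix products across processors.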

Funding

1642410
