cssi.pdf (95.37 kB)

Elements: Transformation-Based High-Performance Computing in Dynamic Languages

figure

posted on 2020-02-04, 15:46 authored by Andreas KloecknerAndreas Kloeckner

A key capability in technical computing is the processing of large, regularly-shaped arrays of numbers by a wide variety of different processes. This facility is foundational in, for example, weather prediction, artificial intelligence, and image processing. Correspondingly, modern computing hardware has evolved advanced capabilities for carrying out such computations with high efficiency. Unfortunately, the process of adapting a desired process to a given piece of hardware thus far is costly, laborious, and error-prone. Differences of a factor of 50 in performance between a naive realization and a careful one is the rule, rather than the exception. Loopy, the subject of this project, attacks this problem by using human-guided, automated program rewriting. Loopy has demonstrated application impact in a number of applications ranging from the simulation of natural and engineering phenomena to neuroscience, where it has helped its users achieve higher performance with less effort. The present proposal concerns several important improvements that will contribute to making Loopy more effective and easier to apply, through enlarging the class of programs that Loopy can transform, improving the means by which Loopy represents on-chip communication, and permitting it to realize important basic operations that routinely pose difficulty in efficient implementation. An important component of the effort is making Loopy itself easy to use for its user community, through the realization of an interactive user interface, so that program transformations can be applied with the click of a mouse, rather than by writing computer code. The proposed advances will be demonstrated through a sample workload that is emblematic of many of the computational and software challenges faced in technical computing today.

Multidimensional arrays (sometimes called 'tensors') are a foundational data structure for much of scientific computing, with applications ranging from weather prediction to deep learning, to image processing and computational neuroscience. Even the efficient execution of one of the simplest operations on arrays, matrix-matrix multiplication, poses considerable technical challenges on modern computers. Through a polyhedrally-based program transformation tool, the proposed software will provide separation between mathematical intent and the technical challenges of program optimization, allowing each task to be performed by a domain expert. In the proposed project, the PI will develop means for more efficient on-chip communication, code generation for prefix sums, reuse and abstraction in program transformation, increasing the ease of use in transformation discovery and performance analysis, and for expressing array computations in user programs. The PI will validate the proposed techniques through a challenging application with broad applicability. The intellectual merit of the proposed research lies in (1) mapping out and extending the landscape of transformation-based programming from one-off scripts to reusable transform components, (2) the development of a unifying, loop/array-axis-based approach to expressing on-chip communication while reducing redundancy in Loopy?s program representation and transformation, (3) exploring the design space of high-performance languages that establish a close link between execution placement and data placement, (4) the development of an interactive program transform and performance analysis tool, along with the discovery of potential implications for workforce training in high-performance computing, (5) a demonstration that all the developed components can be applied together in a practical and coherent manner. Through graduate and undergraduate teaching as well as mentoring of the students and postdocs supported by this project, the PI contributes to enlarging the talent pool.