An Approach to Specifying and Automatically Optimizing Fourier Transform Based Operations

2018-10-22T20:16:58Z (GMT) by Doru Popovici
Ideally, computational libraries and frameworks should offer developers two key benefits. First, they should provide interfaces to capture complicated algorithms that are otherwise difficult to write by hand. Second, they should automatically map algorithms to efficient implementations that perform at least as well as expert hand-optimized code.

In the spectral methods domain, there are frameworks that address the first benefit by allowing users to express applications that rely on building blocks like the discrete Fourier transform (DFT). However, most current frameworks fall short on the second requirement, because they optimize the DFT in isolation and do not attempt to integrate the DFT stages with the surrounding computation. Integrating the DFT with the surrounding computation requires exposing the implementation of the DFT algorithm. Because efficient DFT code is complex to write, users typically implement the DFT stages as black-box calls to high-performance libraries such as MKL and FFTW. The cost of this approach is that neither a compiler nor an expert can optimize across the various DFT and non-DFT stages.

This dissertation provides a systematic approach for obtaining efficient code for DFT-based applications like convolutions, correlations, interpolations and partial differential equation solvers, through the use of cross-stage optimizations. Most of these applications follow a pattern: they permute the input data, apply a multi-dimensional discrete Fourier transform, perform some computation on the Fourier transform result, apply another multi-dimensional discrete Fourier transform, and possibly apply another data permutation. Applying optimizations across the multiple stages is enabled by the ability to represent the DFT and the additional computation with the same high-level mathematical representation.
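As an illustration (not taken from the dissertation itself), the five-stage pattern can be written as a single matrix expression in the Kronecker-product notation commonly used in the SPIRAL literature. The symbols P_in and P_out (permutation matrices), F_1 and F_2 (multi-dimensional DFTs) and D (a diagonal matrix holding the pointwise computation) are introduced here for illustration. For a circular convolution with filter h, both permutations are the identity, F_2 is the inverse DFT, and D = diag(DFT_n h). Multi-dimensional DFTs and the Cooley-Tukey breakdown rule are expressed in the same notation, where I is the identity, T is the diagonal twiddle-factor matrix and L is the stride permutation:

```latex
\begin{aligned}
y &= P_{\mathrm{out}}\, F_2\, D\, F_1\, P_{\mathrm{in}}\, x
  && \text{(general pattern)}\\
y &= \mathrm{DFT}_n^{-1}\, \operatorname{diag}(\mathrm{DFT}_n\, h)\, \mathrm{DFT}_n\, x
  && \text{(circular convolution with filter } h\text{)}\\
\mathrm{DFT}_{m \times n} &= \mathrm{DFT}_m \otimes \mathrm{DFT}_n
  && \text{(2-D DFT on row-major data)}\\
\mathrm{DFT}_{mn} &= (\mathrm{DFT}_m \otimes I_n)\, T^{mn}_n\, (I_m \otimes \mathrm{DFT}_n)\, L^{mn}_m
  && \text{(Cooley--Tukey)}
\end{aligned}
```

Since matrix products are applied right to left, D acts immediately after the last factor of F_1. When a 1-D F_1 is expanded with the Cooley-Tukey rule, that last factor is (DFT_m ⊗ I_n), so the diagonal scaling can, in principle, be merged into the final butterfly stage instead of being applied in a separate pass. A black-box library call hides this structure.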
Capturing the compute stages of the entire algorithm with a high-level representation allows one to apply high-level algorithmic transformations like fusion, as well as low-level optimizations, across the stages; these optimizations would not have been possible with the black-box approach.

The first contribution of this work is a high-level API for describing the most common DFT-related problems. The API translates the problem specification into a mathematical representation that is readable by the SPIRAL code generator. The second contribution of this work is an extension to the SPIRAL framework that allows for cross-stage optimizations. This work extends SPIRAL's capabilities to automatically apply fusion and other low-level, architecture-dependent optimizations to the DFT and non-DFT stages before generating code for the most common CPUs. We show that the generated code, which adopts the proposed approach, achieves 1.2x to 2.2x performance improvements over implementations that use DFT library calls to MKL and FFTW.
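The abstract does not show SPIRAL's generated code, so the following is only a minimal hand-written Python sketch of what fusing a non-DFT stage into a DFT stage means. It assumes a 1-D circular convolution, a power-of-two length and a precomputed filter spectrum H; all function names are hypothetical. In the staged version, the FFT writes its full output before a separate loop applies the pointwise product, mirroring a black-box library call. In the fused version, the product is applied inside the last butterfly stage, which removes one full pass over the data.

```python
import cmath


def fft(x):
    """Recursive radix-2 decimation-in-time FFT (len(x) must be a power of two)."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # DFT of even-indexed samples
    odd = fft(x[1::2])   # DFT of odd-indexed samples
    half = n // 2
    out = [0j] * n
    # Final butterfly stage: combine the two half-size DFTs with twiddle factors.
    for k in range(half):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + half] = even[k] - t
    return out


def ifft(X):
    """Inverse DFT via the identity ifft(X) = conj(fft(conj(X))) / n."""
    n = len(X)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in X])]


def conv_staged(x, H):
    """Black-box style: each stage makes its own full pass over the data."""
    X = fft(x)                          # stage 1: forward DFT, stored in full
    Y = [a * b for a, b in zip(X, H)]   # stage 2: separate pointwise pass
    return ifft(Y)                      # stage 3: inverse DFT


def fft_fused_pointwise(x, H):
    """Forward FFT whose last butterfly stage also applies the pointwise product."""
    n = len(x)
    if n == 1:
        return [complex(x[0]) * H[0]]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    half = n // 2
    out = [0j] * n
    for k in range(half):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        # Fusion: scale each butterfly output by H while it is still in hand,
        # instead of writing the DFT result and re-reading it in another loop.
        out[k] = (even[k] + t) * H[k]
        out[k + half] = (even[k] - t) * H[k + half]
    return out


def conv_fused(x, H):
    """Same result as conv_staged, with stage 2 merged into the end of stage 1."""
    return ifft(fft_fused_pointwise(x, H))


def direct_circular_conv(x, h):
    """O(n^2) reference: y[j] = sum_i x[i] * h[(j - i) mod n]."""
    n = len(x)
    return [sum(x[i] * h[(j - i) % n] for i in range(n)) for j in range(n)]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    H = fft([0.0, 1.0, 0.0, 0.0])  # spectrum of a one-sample circular shift
    print([round(v.real, 6) for v in conv_fused(x, H)])  # prints [4.0, 1.0, 2.0, 3.0]
```

Both versions perform the same arithmetic and differ only in how many times the intermediate spectrum is traversed. In interpreted Python this sketch shows the data flow rather than a speedup; the benefit of fusion appears when the generated code runs on real hardware, where saving a pass over memory matters.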