Popovici_cmu_0041E_10284.pdf (1.43 MB)
An Approach to Specifying and Automatically Optimizing Fourier Transform Based Operations
Ideally, computational libraries and frameworks should offer developers two key benefits. First, they should provide interfaces to capture complicated algorithms
that are otherwise difficult to write by hand. Second, they should automatically map algorithms to efficient implementations that perform at least as well as expert
hand-optimized code. In the spectral methods domain there are frameworks that address the first benefit by allowing users to express applications that rely on building blocks like the discrete Fourier transform (DFT). However, most current frameworks fall short on the second requirement because most opt for optimizing the discrete Fourier transform in isolation and do not attempt to integrate the DFT stages with the surrounding computation. Integrating the DFT computation with the surrounding computation requires one to expose the implementation of the DFT
algorithm. However, because writing efficient DFT code is complex, users typically implement the DFT stages as black box library
calls to high performance libraries such as MKL and FFTW. The cost of this approach is that neither a compiler nor an expert can optimize across the various DFT
and non-DFT stages. This dissertation provides a systematic approach for obtaining efficient code
for DFT-based applications like convolutions, correlations, interpolations and partial differential equation solvers, through the use of cross-stage optimizations. Most
of the applications follow a pattern: they permute the input data, apply a multi-dimensional discrete Fourier transform, perform some computation on the Fourier
transform result, apply another multi-dimensional discrete Fourier transform and possibly another data permutation. Applying optimizations across the multiple
stages is enabled by the ability to represent the DFT and the additional computation with the same high level mathematical representation. Capturing the compute
stages of the entire algorithm with a high level representation allows one to apply high level algorithmic transformations like fusion and low level optimizations across
the stages, optimizations that otherwise would not have been possible with the black box approach.
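The pattern and the idea of cross-stage fusion can be illustrated with a minimal sketch of a one-dimensional circular convolution. This is not SPIRAL output or the dissertation's API: the function names (`dft`, `conv_staged`, `conv_fused`, `conv_direct`) are hypothetical, and a naive O(n^2) DFT stands in for an optimized library transform so that the example needs only the Python standard library. The staged version treats each step as a separate pass with a full intermediate array, as black box library calls would; the fused version folds the pointwise multiply and the scaling into the loop of the inverse transform.

```python
import cmath


def dft(x, sign=-1):
    """Naive O(n^2) DFT. sign=-1 is the forward transform,
    sign=+1 is the unscaled inverse transform."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def conv_staged(x, h):
    """Black box style: every stage finishes and stores its full result
    before the next stage starts."""
    n = len(x)
    X = dft(x)                          # stage 1: forward DFT of the input
    H = dft(h)                          # forward DFT of the kernel
    Y = [a * b for a, b in zip(X, H)]   # stage 2: pointwise multiply
    y = dft(Y, sign=+1)                 # stage 3: unscaled inverse DFT
    return [v / n for v in y]           # stage 4: scaling pass


def conv_fused(x, h):
    """Hand-fused variant: the pointwise multiply and the 1/n scaling happen
    inside the inverse DFT loop, so the intermediate array Y is never stored.
    (The product is recomputed per output here; this only illustrates the
    merging of stages and is not an efficient implementation.)"""
    n = len(x)
    X = dft(x)
    H = dft(h)
    return [
        sum(X[j] * H[j] * cmath.exp(2j * cmath.pi * j * k / n) for j in range(n)) / n
        for k in range(n)
    ]


def conv_direct(x, h):
    """Reference: circular convolution computed from its definition."""
    n = len(x)
    return [sum(x[j] * h[(k - j) % n] for j in range(n)) for k in range(n)]
```

Both variants agree with `conv_direct` up to floating-point rounding. In an optimized implementation the benefit of fusion would come from fewer passes over memory, which this sketch does not attempt to measure.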
The first contribution of this work is a high level API for describing most common DFT-related problems. The API translates the problem specification into
a mathematical representation that is readable by the SPIRAL code generator. The second contribution of this work is the extension to the SPIRAL framework to allow
for cross-stage optimizations. The current work extends SPIRAL's capabilities to automatically apply fusion and other low level architecture dependent optimizations
to the DFT and non-DFT stages before generating code for the most common CPUs. We show that code generated with the proposed approach
achieves 1.2x to 2.2x performance improvements over implementations that use DFT library calls to MKL and FFTW.
History
Date
2018-09-20
Degree Type
- Dissertation
Department
- Electrical and Computer Engineering
Degree Name
- Doctor of Philosophy (PhD)