
An Approach to Specifying and Automatically Optimizing Fourier Transform Based Operations

thesis
posted on 2018-09-20, 00:00 authored by Doru Popovici
Ideally, computational libraries and frameworks should offer developers two key benefits. First, they should provide interfaces to capture complicated algorithms that are otherwise difficult to write by hand. Second, they should automatically map algorithms to efficient implementations that perform at least as well as expert hand-optimized code. In the spectral methods domain there are frameworks that address the first benefit by allowing users to express applications that rely on building blocks like the discrete Fourier transform (DFT). However, most current frameworks fall short on the second benefit because they optimize the DFT in isolation and do not attempt to integrate the DFT stages with the surrounding computation. Integrating the DFT computation with the surrounding computation requires one to expose the implementation of the DFT algorithm. However, due to the complexity of writing efficient DFT code, users typically implement the DFT stages as black-box calls to high-performance libraries such as MKL and FFTW. The cost of this approach is that neither a compiler nor an expert can optimize across the various DFT and non-DFT stages.
This dissertation provides a systematic approach for obtaining efficient code for DFT-based applications like convolutions, correlations, interpolations and partial differential equation solvers, through the use of cross-stage optimizations. Most of these applications follow a pattern: they permute the input data, apply a multi-dimensional DFT, perform some computation on the Fourier transform result, apply another multi-dimensional DFT and possibly another data permutation. Applying optimizations across the multiple stages is enabled by the ability to represent the DFT and the additional computation with the same high-level mathematical representation. Capturing the compute stages of the entire algorithm with a high-level representation allows one to apply high-level algorithmic transformations like fusion, as well as low-level optimizations, across the stages. These optimizations would not be possible with the black-box approach.
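The pattern described above can be illustrated with a one-dimensional circular convolution. The following is a minimal sketch, not code from the dissertation or from SPIRAL: the function names are hypothetical, a naive O(n^2) DFT stands in for a black-box library call, and the "fused" variant is only a toy illustration of folding the pointwise stage into the inverse transform so that the intermediate product is never stored.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(n^2) DFT; stands in for a black-box library call (assumption)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for j in range(n):
            acc += x[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n)
        out.append(acc / n if inverse else acc)
    return out

def circular_convolution_staged(x, h):
    """Black-box style: forward DFT, pointwise multiply, inverse DFT as separate passes."""
    X = dft(x)
    H = dft(h)
    Y = [a * b for a, b in zip(X, H)]  # non-DFT stage, materialized in memory
    return dft(Y, inverse=True)

def circular_convolution_fused(x, h):
    """Toy fusion: the pointwise multiply happens inside the inverse DFT loop."""
    n = len(x)
    X = dft(x)
    H = dft(h)
    out = []
    for i in range(n):
        acc = 0j
        for k in range(n):
            # X[k] * H[k] is computed on the fly; no intermediate array Y exists.
            acc += X[k] * H[k] * cmath.exp(2j * cmath.pi * i * k / n)
        out.append(acc / n)
    return out

def circular_convolution_direct(x, h):
    """Reference result computed from the definition of circular convolution."""
    n = len(x)
    return [sum(x[j] * h[(i - j) % n] for j in range(n)) for i in range(n)]

x = [1.0, 2.0, 3.0, 4.0]
h = [0.5, 0.25, 0.0, 0.0]
staged = circular_convolution_staged(x, h)
fused = circular_convolution_fused(x, h)
direct = circular_convolution_direct(x, h)
staged_error = max(abs(a - b) for a, b in zip(staged, direct))
fused_error = max(abs(a - b) for a, b in zip(fused, direct))
```

All three functions compute the same result up to floating-point rounding. In a real implementation the fusion would be applied inside a fast transform rather than a naive double loop, which is why it needs access to the DFT implementation and cannot be done around an opaque library call.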
The first contribution of this work is a high-level API for describing the most common DFT-related problems. The API translates the problem specification into a mathematical representation that is readable by the SPIRAL code generator. The second contribution of this work is an extension to the SPIRAL framework that allows for cross-stage optimizations. The current work extends SPIRAL's capabilities to automatically apply fusion and other low-level, architecture-dependent optimizations to the DFT and non-DFT stages before generating code for the most common CPUs. We show that code generated with the proposed approach achieves 1.2x to 2.2x performance improvements over implementations that use DFT library calls to MKL and FFTW.

History

Date

2018-09-20

Degree Type

  • Dissertation

Department

  • Electrical and Computer Engineering

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Franz Franchetti
