Weak scaling of parallel FMM vs. FFT up to 4096 processes

2012-06-18T22:37:20Z (GMT) by Lorena A. Barba, Rio Yokota
<p>This figure shows the weak scaling of a parallel FMM-based fluid solver on GPUs, from 1 to 4096 processes. The fast multipole method (FMM) is the numerical engine of a vortex-method fluid solver, used here to simulate decaying isotropic turbulence. The reference method for this application is the pseudo-spectral method, whose numerical engine is the FFT. Because of the communication pattern of the FFT, the spectral method achieves only 14% parallel efficiency on 4096 processes (no GPU acceleration). The FMM-based solver achieves 74% parallel efficiency on 4096 processes (one GPU per MPI process, 3 GPUs per node).</p> <p>Note that these results correspond to the full-application codes, not just the FMM and FFT algorithms. The spectral-method calculations were done with the 'hit3d' code (see link below). The largest problem uses a 4096^3 mesh, i.e., almost 69 billion points (about 17 million points per process).</p> <p>These calculations were run on the TSUBAME 2.0 system of the Tokyo Institute of Technology, thanks to guest access, during Fall 2011.</p> <p>This figure is shared under CC-BY. If you use it, please cite it using the handle and DOI above.</p>
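<p>To make the numbers above concrete, the following is a minimal sketch of the arithmetic. It assumes the usual weak-scaling definition of parallel efficiency, E(P) = T(1) / T(P), with the work per process held fixed; the description does not state which definition or timings were used. The function names are hypothetical and no measured timings are reproduced here.</p>

```python
def weak_scaling_efficiency(t_base: float, t_p: float) -> float:
    """Weak-scaling efficiency E(P) = T(1) / T(P).

    Assumed definition: the work per process is fixed, so the ideal
    run time stays constant as processes are added.
    """
    if t_base <= 0 or t_p <= 0:
        raise ValueError("timings must be positive")
    return t_base / t_p


def slowdown_from_efficiency(efficiency: float) -> float:
    """Run-time growth T(P) / T(1) implied by a given efficiency."""
    if efficiency <= 0:
        raise ValueError("efficiency must be positive")
    return 1.0 / efficiency


# Problem size quoted in the description: a 4096^3 mesh on 4096 processes.
mesh_side = 4096
processes = 4096
total_points = mesh_side ** 3                     # mesh points in total
points_per_process = total_points // processes    # equals 4096^2

print(total_points)        # 68719476736, i.e. almost 69 billion
print(points_per_process)  # 16777216, i.e. about 17 million

# Slowdown implied by the efficiencies quoted at 4096 processes.
print(round(slowdown_from_efficiency(0.74), 2))   # 1.35 (FMM-based solver)
print(round(slowdown_from_efficiency(0.14), 2))   # 7.14 (spectral method)
```

<p>Under this assumed definition, 74% efficiency means the largest run takes about 1.35 times as long as the single-process run on the same per-process workload, whereas 14% efficiency corresponds to roughly a sevenfold increase.</p>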