Weak scaling of parallel FMM vs. FFT up to 4096 processes
Lorena A. Barba
Rio Yokota
10.6084/m9.figshare.92425.v1
https://figshare.com/articles/dataset/Weak_scaling_of_parallel_FMM_vs_FFT_up_to_4096_processes/92425
<p>This figure shows the weak scaling of a parallel FMM-based fluid solver on GPUs, from 1 to 4096 processes. The FMM (fast multipole method) is used as the numerical engine in a vortex method fluid solver, simulating decaying isotropic turbulence. The reference method for this application is the pseudo-spectral method, which uses the FFT as its numerical engine. Due to the all-to-all communication pattern of the parallel FFT, the spectral method obtains only 14% parallel efficiency on 4096 processes (no GPU acceleration). The parallel efficiency of the FMM-based solver is 74% at 4096 processes (one GPU per MPI process, 3 GPUs per node).</p>
<p>It is important to note that the results correspond to the full-application codes, not just the FMM and FFT algorithms. The spectral method calculations were done using the 'hit3d' code (see link below). The largest problem corresponds to a 4096<sup>3</sup> mesh, i.e., almost 69 billion points (about 17 million points per process).</p>
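<p>As a sketch (not part of the dataset or either code base), the standard weak-scaling bookkeeping behind the numbers quoted above can be checked in a few lines; the function name and timing variables here are illustrative only:</p>

```python
def weak_scaling_efficiency(t1, tp):
    """Parallel efficiency for weak scaling: E = T(1) / T(P).

    In a weak-scaling study the problem size per process is held
    fixed, so ideal scaling means the runtime stays constant and
    the efficiency is the single-process time over the P-process time.
    """
    return t1 / tp

# Problem-size arithmetic from the description:
total_points = 4096 ** 3                  # 4096^3 mesh
points_per_process = total_points // 4096  # largest run used 4096 processes

print(total_points)        # 68719476736 -> "almost 69 billion points"
print(points_per_process)  # 16777216    -> "about 17 million per process"
```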
<p>These calculations were run on the TSUBAME 2.0 system at the Tokyo Institute of Technology, thanks to guest access, during Fall 2011.</p>
<p>The figure is shared here under CC-BY. Please use the handle and DOI above for citation if you use it.</p>
2012-06-18 22:37:20
fmm
treecode
turbulence
vortex method
spectral method
Mechanical Engineering
Computational Physics
Applied Computer Science