
Heterogeneous parallelization in GROMACS

presentation
posted on 2020-06-09, 22:06, authored by Szilárd Páll
GROMACS is a versatile open-source molecular dynamics (MD) package with a rich feature set, a bottom-up performance-oriented design, and a strong focus on portability. The MD field has been very successful at exploiting GPUs, and as an early adopter, GROMACS has targeted GPUs for nearly a decade. This talk will discuss the algorithmic and heterogeneous parallelization components that make the GROMACS MD engine not only among the fastest, reaching iteration times in the hundreds of microseconds at peak, but also highly versatile.
I will discuss our bottom-up redesign of fundamental MD algorithms, which was critical to efficiently targeting modern SIMD/SIMT architectures. I will also talk about our multi-level parallelization, which separately targets each level of hardware parallelism: SIMD with up to 14 CPU instruction sets, GPUs from all three major vendors, efficient multi-threading, a highly tuned GPU offload layer, multi-level load balancers, and MPI for SPMD/MPMD multi-node parallelization.
The GROMACS GPU offload features have evolved over time, recently adding the ability to offload entire iterations to the GPU while “reverse offloading” work back to the CPU when beneficial, as well as direct GPU communication for strong scaling. However, a core feature remains: heterogeneous parallelization used for flexibility and performance, aiming to make the best use of CPUs, GPUs, and interconnects. This allows GROMACS to efficiently target both homogeneous and heterogeneous systems, from laptops to the largest supercomputers.

Funding

SSF Infrastructure Fellow programme

Swedish e-Science Research Centre (SeRC)
