Modern HPC platforms use multiple CPUs, GPUs, and high-performance
interconnects per node. Unfortunately, state-of-the-art, production-quality
implementations of the popular Message Passing Interface (MPI)
programming model do not have the appropriate support to deliver the
best performance and scalability for HPC and deep learning (DL)
applications on such dense GPU systems. The project follows a synergistic
and comprehensive research plan that brings together computer scientists
from OSU and OSC and computational scientists from TACC, SDSC, and
UCSD. The proposed
innovations include: 1) Designing high-performance and scalable
communication operations that fully utilize multiple network adapters
and advanced in-network computing features for GPUs and CPUs; 2) Designing
novel datatype processing and unified memory management; 3) Designing
CUDA-aware I/O; 4) Designing support for containerized environments; and
5) Carrying out integrated evaluation with a set of driving
applications. Initial results from this project using the MVAPICH2 MPI
library will be presented.
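For context, the sketch below illustrates what CUDA-aware communication means in practice: a buffer allocated on the GPU is passed directly to MPI calls, and the library moves the data without an explicit host staging copy. It assumes a CUDA-aware MPI build (such as MVAPICH2-GDR); the message size and tag are illustrative, not taken from the project.

/* Minimal sketch of CUDA-aware point-to-point communication.
 * Assumes a CUDA-aware MPI build (e.g., MVAPICH2-GDR) so that device
 * pointers can be passed directly to MPI calls; compile with mpicc
 * and link against the CUDA runtime. Run with at least two ranks. */
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 20;                 /* 1M doubles, illustrative */
    double *d_buf = NULL;
    cudaMalloc((void **)&d_buf, n * sizeof(double));

    if (rank == 0) {
        /* Device buffer handed straight to MPI: no cudaMemcpy staging. */
        MPI_Send(d_buf, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(d_buf, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}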
Funding
Collaborative Research: Frameworks: Designing Next-Generation MPI Libraries for Emerging Dense GPU Systems
Directorate for Computer & Information Science & Engineering