CSSI Frameworks: Designing Next-Generation MPI Libraries for Emerging Dense GPU Systems

Modern HPC platforms use multiple CPUs, GPUs, and high-performance interconnects per node. Unfortunately, state-of-the-art production-quality implementations of the popular Message Passing Interface (MPI) programming model lack the support needed to deliver the best performance and scalability for HPC and deep learning (DL) applications on such dense GPU systems. This project pursues a synergistic and comprehensive research plan that brings together computer scientists from OSU and OSC and computational scientists from TACC, SDSC, and UCSD. The proposed innovations include: 1) designing high-performance and scalable communication operations that fully utilize multiple network adapters and advanced in-network computing features for both GPUs and CPUs; 2) designing novel datatype processing and unified memory management; 3) designing CUDA-aware I/O; 4) designing support for containerized environments; and 5) carrying out an integrated evaluation with a set of driving applications. Initial results from this project, obtained with the MVAPICH2 MPI library, will be presented.