
Design and Implementation of MPI-Native GPU-Initiated MPI Partitioned Communication
Description
GPUs have become the dominant accelerators for high-performance computing and artificial intelligence. To support these systems, new communication libraries have emerged, such as NCCL and NVSHMEM, providing stream-based semantics and GPU-initiated communication. Unfortunately, some of the best-performing communication libraries are vendor-specific, and they may use load-store semantics that have traditionally been underused in the application community. Moreover, MPI has yet to define explicit GPU support mechanisms, making it difficult to deploy the message-passing communication model efficiently on GPU-based systems.

MPI 4.0 introduced partitioned point-to-point communication, which facilitates hybrid programming models. Partitioned communication is designed to allow GPUs to trigger data movement through a persistent channel. We extend MPI Partitioned to provide intra-kernel GPU-initiated communication and partitioned collectives, augmenting MPI with techniques used in vendor-specific libraries. We evaluate our designs on an NVIDIA GH200 Grace Hopper Superchip testbed to understand the benefits of GPU-initiated communication on NVLink and InfiniBand networks.
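
For context, the sketch below shows the standard host-driven MPI 4.0 partitioned point-to-point API (MPI_Psend_init, MPI_Pready, MPI_Precv_init) that the abstract builds on. The partition count and sizes are illustrative, and the GPU-initiated, intra-kernel variant described in this talk is the authors' extension, not part of standard MPI, so it is not shown here.

```c
#include <mpi.h>
#include <stdlib.h>

#define PARTITIONS 8            /* illustrative partition count */
#define COUNT_PER_PARTITION 1024

int main(int argc, char **argv) {
    int rank;
    double *buf = malloc(PARTITIONS * COUNT_PER_PARTITION * sizeof(double));
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* One persistent channel; each partition is marked ready
           independently (e.g., by a different thread, or, in the
           paper's extension, from GPU kernel code). */
        MPI_Psend_init(buf, PARTITIONS, COUNT_PER_PARTITION, MPI_DOUBLE,
                       1, 0, MPI_COMM_WORLD, MPI_INFO_NULL, &req);
        MPI_Start(&req);
        for (int p = 0; p < PARTITIONS; p++) {
            /* ... fill partition p of buf ... */
            MPI_Pready(p, req);    /* partition p may be transferred now */
        }
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    } else if (rank == 1) {
        MPI_Precv_init(buf, PARTITIONS, COUNT_PER_PARTITION, MPI_DOUBLE,
                       0, 0, MPI_COMM_WORLD, MPI_INFO_NULL, &req);
        MPI_Start(&req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    }

    MPI_Request_free(&req);
    MPI_Finalize();
    free(buf);
    return 0;
}
```

The key property is that the channel is set up once and each MPI_Pready call can come from a different execution context; the work presented here moves that readiness notification inside GPU kernels.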
Event Type
Workshop
Time
Sunday, 17 November 2024, 12pm - 12:30pm EST
Location
B303
Tags
Message Passing
Network
Registration Categories
W