High-Performance, Scalable Geometric Multigrid via Fine-Grain Data Blocking for GPUs
Description

We present a performance study of geometric multigrid (GMG) on NVIDIA, AMD, and Intel GPU-accelerated supercomputers. The approach employs fine-grain data blocking in BrickLib, which reduces data movement in the GMG V-cycle by optimizing storage order for stencil access and communication.
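To illustrate the idea of fine-grain data blocking (a minimal sketch only, not BrickLib's actual API or data layout), the domain can be decomposed into small fixed-size bricks stored contiguously, so a stencil sweep reads whole bricks rather than long strided rows:

```cpp
// Illustrative sketch of fine-grain data blocking for a 3D stencil.
// NOT BrickLib's API: brick size, layout, and names are assumptions.
#include <cstddef>
#include <vector>

constexpr int B = 4;                  // assumed brick edge length
struct Brick { double v[B][B][B]; };  // one brick, stored contiguously

struct BrickedGrid {
  int nbx, nby, nbz;                  // bricks per dimension
  std::vector<Brick> bricks;          // all bricks laid out back to back

  BrickedGrid(int nx, int ny, int nz)
      : nbx(nx / B), nby(ny / B), nbz(nz / B),
        bricks(static_cast<std::size_t>(nbx) * nby * nbz) {}

  // Map a global (i,j,k) index to its brick and intra-brick offset.
  double& at(int i, int j, int k) {
    std::size_t b = (static_cast<std::size_t>(k / B) * nby + j / B) * nbx + i / B;
    return bricks[b].v[k % B][j % B][i % B];
  }
};

// 7-point Laplacian applied point by point; interior points of a brick
// touch contiguous memory, and only brick faces reach neighboring bricks.
void laplacian(BrickedGrid& out, BrickedGrid& in, int nx, int ny, int nz) {
  for (int k = 1; k < nz - 1; ++k)
    for (int j = 1; j < ny - 1; ++j)
      for (int i = 1; i < nx - 1; ++i)
        out.at(i, j, k) = -6.0 * in.at(i, j, k)
            + in.at(i - 1, j, k) + in.at(i + 1, j, k)
            + in.at(i, j - 1, k) + in.at(i, j + 1, k)
            + in.at(i, j, k - 1) + in.at(i, j, k + 1);
}

int main() {
  const int N = 32;                    // 32^3 grid = 8^3 bricks
  BrickedGrid in(N, N, N), out(N, N, N);
  in.at(N / 2, N / 2, N / 2) = 1.0;    // point source
  laplacian(out, in, N, N, N);
  return 0;
}
```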
Our GMG attains 73% on a peak-based performance portability metric and 87% parallel efficiency when weak scaling to 512 GPUs across all three GPU-accelerated supercomputers.
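The 73% figure presumably follows the widely used performance portability metric of Pennycook et al., with each platform's efficiency taken as the achieved fraction of peak performance; as a sketch of that definition,

\[
\Phi(a,p,H) =
\begin{cases}
\dfrac{|H|}{\sum_{i \in H} \dfrac{1}{e_i(a,p)}} & \text{if $a$ is supported on every platform $i \in H$,}\\[1.5ex]
0 & \text{otherwise,}
\end{cases}
\]

where \(H\) is the set of platforms (here the three GPU systems) and \(e_i(a,p)\) is the efficiency of application \(a\) solving problem \(p\) on platform \(i\).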
Analysis shows that stencil performance and MPI communication are well correlated with a traditional linear model, from which we can extract empirical latency, overhead, bandwidth, and throughput for comparison with theoretical GPU and network limits.
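Although the paper's exact fit is not reproduced here, a traditional linear (latency-bandwidth) model of this kind typically takes the form

\[
T(n) \approx \alpha + \frac{n}{\beta},
\]

where \(n\) is the message size in bytes (for MPI) or the number of grid points updated (for stencils), \(\alpha\) is the fitted latency or overhead, and \(\beta\) is the fitted bandwidth or throughput to compare against hardware limits.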
We observe that NVIDIA GPUs provide the lowest overhead and the highest throughput per process, with AMD and Intel GPUs delivering comparable performance.
Conversely, although all three platforms employ the same Slingshot network, sustained bandwidth and latency vary widely when each GPU is given a dedicated NIC.