Presentation
Performance Portability of Electron Repulsion Integrals and Their Related Methods across Peta to Exascale Architectures
Description
To fully leverage the computing power available in GPU-based HPC systems, it is crucial to balance performance and portability across GPU vendors. This work discusses the restructuring and porting of several key kernels in the GAMESS quantum chemistry software to AMD, Intel, and NVIDIA GPUs via the OpenMP API. With OpenMP, the same code is portable across GPU vendors. However, because OpenMP implementations are vendor-specific, GPU code generation varies and can produce large variations in performance even on the same hardware. This work highlights the challenges faced in GPU offloading via OpenMP, which are likely to be encountered in other porting efforts: memory limitations, the need for substantial restructuring, and differences in compiler optimizations. Strategies for addressing these challenges are also presented, along with performance results on supercomputing systems such as Summit, Aurora, Frontier, and Perlmutter, using a range of vendor software stacks.