Presentation
Performance Portable Optimizations of an Ice-sheet Modeling Code on GPU-supercomputers
Description
In this paper, we present GPU optimizations for an ice-sheet modeling code known as MPAS-Albany Land Ice (MALI). MALI is a templated C++ code that leverages the Kokkos programming model for portability and the Trilinos library for data structures and for nonlinear and linear solvers. The performance of the most expensive kernel is assessed with the Roofline model to identify the potential for improvement on the underlying GPU architecture. We apply optimizations consisting of loop fusion, loop optimizations and local accumulation to productively and portably attain an overall speedup of 3× on both NVIDIA and AMD GPUs. We analyze the performance gains using a time-oriented performance portability model based on time per invocation and GPU data movement. The results show an increase of 20% to 50% in the performance portability metric from improved data locality. They also highlight the importance of optimizing GPU-ported scientific applications to maximize memory bandwidth and minimize data movement on modern supercomputers.
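The abstract names loop fusion and local accumulation but does not show them. What follows is a minimal sketch in plain serial C++ (no Kokkos) of a hypothetical residual kernel. The names `residual_unfused`, `residual_fused`, `basis`, `flux` and `source` are illustrative assumptions and are not taken from MALI. The sketch compares two approaches. The first uses two passes that update the output array in place. The second uses a single fused pass that reads `basis` once, accumulates into a local variable, and stores once per cell.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical element-residual kernel (illustration only, not MALI code):
//   residual[c] = sum_q basis[c][q] * (flux[c][q] + source[c][q])
// Arrays are flattened row-major as [cell][quad_point].

// Baseline: two separate loops over cells. basis[] is streamed twice, and each
// "+=" targets residual[] directly (memory traffic unless the compiler can
// prove no aliasing and keep it in a register).
void residual_unfused(const std::vector<double>& basis,
                      const std::vector<double>& flux,
                      const std::vector<double>& source,
                      std::vector<double>& residual,
                      std::size_t num_cells, std::size_t num_qp) {
  assert(basis.size() == num_cells * num_qp && residual.size() == num_cells);
  for (std::size_t c = 0; c < num_cells; ++c) {
    residual[c] = 0.0;
    for (std::size_t q = 0; q < num_qp; ++q)
      residual[c] += basis[c * num_qp + q] * flux[c * num_qp + q];
  }
  for (std::size_t c = 0; c < num_cells; ++c)
    for (std::size_t q = 0; q < num_qp; ++q)
      residual[c] += basis[c * num_qp + q] * source[c * num_qp + q];
}

// Optimized: both loops fused into one pass (basis[] read once) and the sum kept
// in a local accumulator, with a single store to residual[] per cell.
void residual_fused(const std::vector<double>& basis,
                    const std::vector<double>& flux,
                    const std::vector<double>& source,
                    std::vector<double>& residual,
                    std::size_t num_cells, std::size_t num_qp) {
  assert(basis.size() == num_cells * num_qp && residual.size() == num_cells);
  for (std::size_t c = 0; c < num_cells; ++c) {
    double acc = 0.0;  // local accumulator (a register on a GPU thread)
    for (std::size_t q = 0; q < num_qp; ++q) {
      const std::size_t i = c * num_qp + q;
      acc += basis[i] * (flux[i] + source[i]);
    }
    residual[c] = acc;  // one write per cell
  }
}

// Largest element-wise difference; the two versions reorder floating-point
// operations, so they may differ in the last bits for general inputs.
double max_abs_diff(const std::vector<double>& a, const std::vector<double>& b) {
  assert(a.size() == b.size());
  double m = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) m = std::fmax(m, std::fabs(a[i] - b[i]));
  return m;
}
```

Both versions compute the same quantity. The fused version reduces the number of passes over the input and the writes to the output array, which are the data-movement costs that a Roofline analysis exposes for memory-bound kernels. In a Kokkos code, the same restructuring would take place inside the body of a parallel kernel. This serial form only illustrates the change in the memory-access pattern.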