Presentation
FRSZ2 for In-Register Block Compression Inside GMRES on GPUs
Description
The performance of the GMRES iterative solver on GPUs is limited by the GPU main memory bandwidth. Compressed Basis GMRES outperforms GMRES by storing the Krylov basis in low precision, thereby reducing the volume of memory accesses. An open question is whether compression techniques more sophisticated than casting to low precision can enable large runtime savings while preserving the accuracy of the final results. This paper presents FRSZ2, a lightweight in-register compressor that can decompress at the memory bandwidth speed of a modern NVIDIA H100 GPU. In an experimental evaluation, we demonstrate that using FRSZ2 instead of low precision to compress the Krylov basis can bring larger runtime benefits without impacting final accuracy.
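
To make the baseline idea concrete, the following is a minimal sketch of the "cast to low precision" approach that Compressed Basis GMRES is described as using: a basis vector is stored as 32-bit floats and converted back to double precision before any arithmetic. It does not show FRSZ2 itself, whose format is not described here. The helper names (`compress_f32`, `decompress_f32`, `dot`) and the sample vectors are hypothetical and chosen only for illustration.

```python
import struct

def compress_f32(values):
    """Store double-precision values as packed float32 (4 bytes per entry)."""
    return struct.pack(f"{len(values)}f", *values)

def decompress_f32(buf):
    """Read packed float32 back; Python floats are double precision."""
    return list(struct.unpack(f"{len(buf) // 4}f", buf))

def dot(x, y):
    """Dot product accumulated in double precision."""
    return sum(a * b for a, b in zip(x, y))

# Hypothetical stand-ins for one Krylov basis vector and a work vector.
basis_vector = [1.0 / (i + 1) for i in range(8)]
w = [float(i + 1) for i in range(8)]

stored = compress_f32(basis_vector)   # half the bytes of a float64 array
restored = decompress_f32(stored)     # arithmetic still happens in double

exact = dot(basis_vector, w)
approx = dot(restored, w)

print(len(stored), "bytes stored instead of", 8 * len(basis_vector))
print("dot product error from low-precision storage:", abs(approx - exact))
```

Only the storage format changes in this sketch; every arithmetic operation still runs in double precision, which is why the scheme can reduce memory traffic while limiting the effect on the computed result. Each stored entry carries a relative rounding error of at most 2^-24, the unit roundoff of IEEE 754 single precision.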