Presentation
autoGEMM: Pushing the Limits of Irregular Matrix Multiplication on Arm Architectures
Description
This paper presents an open-source library that pushes the limits of performance portability for irregular General Matrix Multiplication (GEMM) computations on the widely used Arm architectures. autoGEMM generates optimized kernels for various hardware configurations by auto-combining fragments of auto-generated micro-kernels that employ hand-written optimizations to maximize computational efficiency. We optimize the kernel pipeline by tuning register reuse and the overlap of data loads and stores. In addition, we use a dynamic tiling scheme that generates balanced tile shapes based on the shapes of the matrices. We build autoGEMM on top of the TVM framework, where our dynamic tiling scheme prunes the search space TVM explores to identify the optimal combination of code-optimization parameters. Evaluations on five different classes of Arm chips demonstrate the advantages of autoGEMM. For small matrices, autoGEMM achieves 98% of peak performance and up to 2.0x speedup over state-of-the-art libraries such as LIBXSMM and LibShalom. autoGEMM is available at: https://github.com/wudu98/autoGEMM
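The description says the dynamic tiling scheme produces balanced tile shapes from the matrix shapes. As a rough illustration only (the abstract does not spell out autoGEMM's actual algorithm, so this is not it), the sketch below contrasts fixed-size tiling, which can leave a thin remainder tile, with a balanced split into the fewest tiles that fit under a size bound. The function names `naive_tiles`, `balanced_tiles`, and `tile_grid`, and the `max_tile` bound (standing in for a register-capacity limit on the micro-kernel), are hypothetical.

```python
def naive_tiles(n: int, tile: int) -> list[int]:
    """Fixed-size tiling: full tiles plus one (possibly tiny) remainder tile."""
    full, rem = divmod(n, tile)
    return [tile] * full + ([rem] if rem else [])


def balanced_tiles(n: int, max_tile: int) -> list[int]:
    """Split n into the fewest tiles no larger than max_tile,
    with tile sizes differing by at most 1 (hypothetical helper)."""
    if n <= 0 or max_tile <= 0:
        raise ValueError("n and max_tile must be positive")
    k = -(-n // max_tile)          # ceil(n / max_tile): fewest tiles that fit the bound
    base, extra = divmod(n, k)     # spread the leftover rows over the first `extra` tiles
    return [base + 1] * extra + [base] * (k - extra)


def tile_grid(m: int, n: int, max_m: int, max_n: int) -> list[tuple[int, int]]:
    """Balanced (rows, cols) tile shapes covering an m x n output matrix (hypothetical)."""
    return [(tm, tn)
            for tm in balanced_tiles(m, max_m)
            for tn in balanced_tiles(n, max_n)]


if __name__ == "__main__":
    print(naive_tiles(17, 8))     # [8, 8, 1]
    print(balanced_tiles(17, 8))  # [6, 6, 5]
```

For 17 rows and a maximum tile height of 8, fixed tiling yields [8, 8, 1] and forces a 1-row edge kernel, while the balanced split yields [6, 6, 5]. A real implementation would also have to respect the SIMD vector width and leave the final choice among candidate shapes to the tuner (TVM, in autoGEMM's case).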
Event Type
Paper
Time
Tuesday, 19 November 2024, 2:30pm - 3pm EST
Location
B309
Accelerators
Compilers
Embedded and/or Reconfigurable Systems
Linear Algebra
Performance Evaluation and/or Optimization Tools
TP