Presentation

Optimizing MILC-Dslash Performance on NVIDIA A100 GPU: Parallel Strategies using SYCL
Description
MILC-Dslash is a benchmark derived from the MILC code, which simulates lattice-gauge theory on a four-dimensional hypercube. This paper outlines a gradual progression in increasing the parallelism of the MILC-Dslash kernel using SYCL, moving from a simple implementation to a fully parallel one. The investigation covers different work-item index orders, work-group sizes, and the memory access patterns that arise from these strategies. Components intertwined with the parallel strategies include atomic memory operations, shared variables, divergent instructions, and versions with and without the SYCL complex library and the SYCLomatic tool. The best parallel strategy is twice as fast as the simplest strategy and shows a 10% improvement over the QUDA baseline, thanks to enhanced parallelism and the use of work-group local memory. This, along with other findings such as optimizing GPU resource utilization at the expense of concurrency, could guide researchers and developers seeking to optimize parallel computing applications.
Event Type
Workshop
Time
Monday, 18 November 2024, 9:45am - 10am EST
Location
B306
Tags
Performance Optimization
Programming Frameworks and System Software
Registration Categories
W