Presentation
ActorProf: A Framework for Profiling and Visualizing Fine-grained Asynchronous Bulk Synchronous Parallel Execution
Description
The Fine-grained Asynchronous Bulk Synchronous Parallel (FA-BSP) model extends the existing BSP model to support fine-grained asynchronous point-to-point messages with automatic message aggregation.
Although many large irregular applications written with the FA-BSP model demonstrate promising performance, no existing profiler is aware of the profile-worthy portions of an FA-BSP program or visualizes the results in an intuitive way. This is understandable: an FA-BSP program relies on multiple external libraries, and the runtime frequently switches between different portions of the program, which makes it difficult for well-established profilers such as Score-P, TAU, CrayPat, VTune, and HPCToolkit to profile and visualize these portions in an FA-BSP-friendly manner.
This paper designs and implements a profiling and visualization framework called ActorProf. The framework enables 1) asynchronous point-to-point message-aware profiling with hardware performance counters, 2) overall performance breakdown that is aware of FA-BSP execution, and 3) visualization of these profiling results.