Presentation

Performance Characterization and Provenance of Distributed Task-based Workflows on HPC Platforms
Description
Understanding performance and provenance of task-based workflows poses significant challenges, particularly in distributed configurations where resources are shared by multiple applications. Task-based workflow management systems further complicate performance predictability because of their dynamicity that subtly alters task execution order from run to run.
In this paper, we propose a layered characterization framework for performance and task provenance for Dask.distributed workflows running on high-performance computing platforms. It collects data from jobs, the workflow management system, and the operating system to aid in understanding the performance of these workflows. Our approach encompasses three main contributions: first, an extension of Dask.distributed to capture high-fidelity task provenance using Mochi data services; second, the adaptation of the established HPC I/O characterization tool Darshan to gather high-fidelity I/O data, thereby enhancing the granularity of our analysis; and third, a framework to combine and process the collected data and provide helpful insights into performance characterization and reproducibility.
Event Type
Workshop
Time
Monday, 18 November 2024
9:37am - 10am EST
Location
B302
Tags
Applications and Application Frameworks
Distributed Computing
Middleware and System Software
Registration Categories
W