Presentation
Accelerating HPC Workflow Results and Performance Reproducibility Analytics
Description
Modern high-performance computing (HPC) workflows produce massive datasets, often exceeding 100 TB per day, driven by instruments that collect data at gigabytes per second. These workflows, executed on advanced HPC systems with heterogeneous storage devices, high-performance microprocessors, accelerators, and interconnects, are increasingly complex and often involve non-deterministic computations. In this context, thousands of processes share computing resources and rely on synchronization for consistency. Intricate process interactions and non-deterministic operations make it difficult to explore workflow behaviors, ensure reproducibility, optimize performance, and reason about what happens when processes compete for resources. Existing reproducibility analysis frameworks are not well suited to identifying the sources and locations of non-determinism and performance variation, as they typically focus on final workflow results and general statistics about workflow performance.
We address these challenges by introducing scalable techniques that accelerate the comparison of intermediate workflow results using variation-tolerant hashing of floating-point datasets, thus improving result reproducibility. We also capture workflow performance profiles and benchmark various queries to analyze workflow performance reproducibility, and we identify opportunities to optimize the loading and indexing of performance data to keep initialization and querying overhead minimal. Using the collected performance data, we propose a cache-aware staggering technique that leverages workflow I/O profiles to reduce bottlenecks and resource contention, particularly among workflows that share the same input data. Our evaluations across molecular dynamics, cosmology, and deep learning workflows demonstrate significant speedups in intermediate-result reproducibility analyses compared to state-of-the-art baselines, as well as our ability to propose workflow execution strategies that maximize cache reuse and minimize execution makespan.
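To give a sense of what variation-tolerant hashing of floating-point data means in practice, the sketch below shows one common approach: quantize values to a tolerance before hashing, so that run-to-run numerical noise below that threshold does not change the digest. This is a minimal illustration under our own assumptions (the function name, the SHA-256 choice, and the `tolerance` parameter are ours, not the authors' implementation), and it inherits the usual caveat that values falling near a quantization boundary can still hash differently.

```python
import hashlib
import numpy as np

def variation_tolerant_hash(data: np.ndarray, tolerance: float = 1e-6) -> str:
    """Hash a floating-point array so that values differing by less than
    `tolerance` are likely to produce the same digest (illustrative sketch,
    not the presented system's implementation)."""
    # Quantize each value to the nearest multiple of `tolerance` so that
    # small non-deterministic variations collapse onto the same grid point.
    quantized = np.round(np.asarray(data, dtype=np.float64) / tolerance).astype(np.int64)
    # Hash the quantized byte representation.
    return hashlib.sha256(quantized.tobytes()).hexdigest()

# Two runs whose intermediate results differ only by tiny numerical noise
# (e.g., from a non-deterministic reduction order) compare equal.
run_a = np.array([0.1, 2.5000001, 7.3199999])
run_b = run_a + 1e-9
assert variation_tolerant_hash(run_a) == variation_tolerant_hash(run_b)
```

Comparing digests rather than full arrays is what makes intermediate-result comparison cheap enough to run at workflow scale.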
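The cache-aware staggering idea can likewise be sketched in simplified form. The toy scheduler below (our own simplification, not the authors' algorithm; the `workflows` and `load_time` inputs are hypothetical) groups workflows by the input dataset they read and offsets the start of each follower by the time the first workflow needs to stage that dataset into cache, so followers hit warm data instead of competing for cold storage bandwidth.

```python
from collections import defaultdict

def staggered_schedule(workflows: dict, load_time: dict) -> list:
    """Toy cache-aware staggering sketch.

    workflows : workflow name -> shared input dataset it reads
    load_time : dataset -> seconds to stage it into the cache
    Returns a list of (start_offset_seconds, workflow_name) pairs.
    """
    groups = defaultdict(list)
    for name, dataset in workflows.items():
        groups[dataset].append(name)

    schedule = []
    for dataset, members in groups.items():
        # The first member pays the cold-read cost; the rest are staggered
        # to start once the shared input is resident in cache.
        schedule.append((0.0, members[0]))
        for follower in members[1:]:
            schedule.append((load_time[dataset], follower))
    return sorted(schedule)

# Hypothetical example: three workflows, two of which share the same input.
workflows = {"md_analysis": "traj.h5", "md_render": "traj.h5", "cosmo": "halo.h5"}
load_time = {"traj.h5": 120.0, "halo.h5": 90.0}
print(staggered_schedule(workflows, load_time))
```

In the presented work, these offsets are derived from measured workflow I/O profiles rather than fixed load-time estimates, which is what allows the schedule to maximize cache reuse while reducing makespan.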