Presentation

Scrutinizing Variables for Checkpoint Using Automatic Differentiation
Description
Checkpoint/Restart (C/R) periodically saves the running state of a program, which consumes considerable time and system resources. We observe that in typical HPC applications, not every piece of data is involved in the computation; such unused data should be excluded from checkpointing for better storage and compute efficiency. We propose a systematic approach that leverages automatic differentiation (AD) to scrutinize every element within the variables (e.g., arrays) necessary for checkpointing. Specifically, we inspect each element with an AD tool to determine whether it has an impact on the application output. This allows us to separate critical from uncritical elements and to eliminate the uncritical ones from the checkpoint. We validate our approach with all benchmarks from the NAS Parallel Benchmarks (NPB) suite, and we visualize the distribution of critical and uncritical elements within a variable with respect to its binary impact (yes or no) on the application output.
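The element-wise inspection described above can be illustrated with a minimal sketch. It is not the authors' tool: the abstract does not name the AD tool or the exact criticality criterion, so the sketch assumes a tiny hand-written forward-mode AD (`Dual`), a hypothetical application `toy_kernel` that reads only the interior of an array, and the working assumption that an element is critical when the derivative of the output with respect to it is nonzero at the evaluated point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value plus its derivative with respect to one seeded input element."""
    val: float
    dot: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (a * b)' = a' * b + a * b'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__


def _lift(x):
    """Wrap plain numbers as constants (derivative 0)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def toy_kernel(state):
    """Hypothetical application: only interior elements feed the output.

    The first and last entries stand in for data that is allocated but
    never read by the computation.
    """
    total = Dual(0.0)
    for x in state[1:-1]:
        total = total + x * x
    return total


def critical_mask(kernel, values):
    """Assumed criterion: element i is critical if d(output)/d(values[i]) != 0."""
    mask = []
    for i in range(len(values)):
        # Seed a derivative of 1 on element i and 0 on every other element.
        seeded = [Dual(v, 1.0 if j == i else 0.0) for j, v in enumerate(values)]
        mask.append(kernel(seeded).dot != 0.0)
    return mask


values = [5.0, 1.0, 2.0, 3.0, 7.0]
mask = critical_mask(toy_kernel, values)

# Keep only the critical elements, indexed by their position in the array.
checkpoint = {i: v for i, (v, keep) in enumerate(zip(values, mask)) if keep}
print(mask)        # [False, True, True, True, False]
print(checkpoint)  # {1: 1.0, 2: 2.0, 3: 3.0}
```

A derivative that is zero at one evaluation point does not by itself prove that an element never affects the output (for example, `x * x` has zero derivative at `x = 0`), so a real analysis would need a more careful criterion than this sketch uses. Forward mode also needs one pass per element; a reverse-mode tool would obtain all derivatives in a single pass.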
Event Type
Workshop
Time
Monday, 18 November 2024, 11:30am - 11:50am EST
Location
B301
Tags
Applications and Application Frameworks
Algorithms
Performance Evaluation and/or Optimization Tools
Registration Categories
W