Close

Presentation

Benchmarking and Continuous Performance Monitoring of HPC Resources using the XDMoD Application Kernel Module
DescriptionHigh-performance computing (HPC) resources are critical for scientific and engineering computations, but they come with significant costs, making optimal utilization essential. While substantial effort is often invested in performance optimization during the initial deployment, regular performance monitoring tends to diminish during the production phase, potentially leading to undetected degradation in software and hardware performance. The XDMoD Application Kernel Performance Monitoring Module addresses this by automatically executing a suite of applications and benchmarks on a daily basis to identify performance degradation proactively. This module is designed to generate meaningful performance data while keeping short wall times to minimize impact on users. It also includes a performance degradation detection algorithm that can trigger regular or problem-specific email notifications. Operational since 2011, this module has successfully monitored the performance of XSEDE and ACCESS resources. This presentation provides an overview of the module’s capabilities and showcases practical use cases illustrating its effectiveness.