Presentation
Benchmarking and Continuous Performance Monitoring of HPC Resources using the XDMoD Application Kernel Module
DescriptionHigh-performance computing (HPC) resources are critical for scientific and engineering computations, but they come with significant costs, making optimal utilization essential. While substantial effort is often invested in performance optimization during the initial deployment, regular performance monitoring tends to diminish during the production phase, potentially leading to undetected degradation in software and hardware performance. The XDMoD Application Kernel Performance Monitoring Module addresses this by automatically executing a suite of applications and benchmarks on a daily basis to identify performance degradation proactively. This module is designed to generate meaningful performance data while keeping short wall times to minimize impact on users. It also includes a performance degradation detection algorithm that can trigger regular or problem-specific email notifications. Operational since 2011, this module has successfully monitored the performance of XSEDE and ACCESS resources. This presentation provides an overview of the module’s capabilities and showcases practical use cases illustrating its effectiveness.
Event Type
Workshop
TimeFriday, 22 November 202410:30am - 10:40am EST
LocationB312-B313A
HPC Infrastructure
State of the Practice
System Administration
W
Archive
view