Presentation
MCBound: an Online Framework to Characterize and Classify Memory/Compute-bound HPC Jobs
Session: Performance Analysis
Description: Modern high-performance computing (HPC) systems play a fundamental role in driving scientific research, as they execute computationally intensive jobs from diverse domains. However, HPC jobs have conflicting computational requirements, which can lead to inefficient resource usage, reduced system throughput, and higher energy consumption. One approach to this problem is to distinguish memory-bound from compute-bound jobs at submission time, so that informed decisions can be made about their execution. In this paper, we present MCBound, the first online data-driven framework to classify HPC jobs as memory-bound or compute-bound before execution. We propose a systematic technique for characterizing jobs as memory-bound or compute-bound, and we use it to analyze data from 2.2 million jobs run on the supercomputer Fugaku. We implement MCBound for Fugaku and classify the jobs executed during February 2024. The approach proves effective: it obtains a macro-averaged F1 score of at least 0.89 while incurring negligible overhead on the system's operations.
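The macro-averaged F1 score reported above is computed per class and then averaged with equal weight, so a rare class counts as much as a common one. The sketch below is not taken from MCBound; it only illustrates the metric on a two-class problem. The function name `f1_macro`, the label strings, and the five sample jobs are all hypothetical.

```python
def f1_macro(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical ground truth and predictions for five jobs (illustrative only).
y_true = ["memory", "memory", "compute", "compute", "compute"]
y_pred = ["memory", "compute", "compute", "compute", "memory"]
score = f1_macro(y_true, y_pred)
print(f"F1-macro: {score:.3f}")
```

With these made-up labels, the memory-bound class scores 0.5 and the compute-bound class scores 2/3, so the macro average is their plain mean. Libraries such as scikit-learn expose the same metric via `f1_score(..., average="macro")`.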