Presentation

Vendor-neutral and Production-grade Job Power Management in High Performance Computing
Description
Power management and energy efficiency are critical
research areas for exascale computing and beyond, necessitating
reliable telemetry and control for distributed systems. Despite this
need, existing approaches present several limitations precluding
their adoption in production. These limitations include, but are
not limited to, lack of portability due to vendor-specific and
closed-source solutions, lack of support for non-MPI applications,
and lack of user-level customization.
We present a job-level power management framework based
on Flux. We introduce flux-power-monitor and demonstrate
its effectiveness on the Lassen (IBM Power AC922) and Tioga
(HPE Cray EX235A) systems with a low average overhead
of 0.4%. We also present flux-power-manager, where we
discuss a proportional sharing policy and introduce a hierarchical
FFT-based dynamic power management algorithm (FPP). We
demonstrate that FPP reduces energy consumption by 1% compared
to proportional sharing, and by 20% compared to the default IBM
static power capping policy.
Event Type
Workshop
Time
Sunday, 17 November 2024, 2:30pm - 3pm EST
Location
B312
Tags
Energy Efficiency
HPC Infrastructure
Sustainability
Registration Categories
W