Presentation

Accurate and Convenient Energy Measurements for GPUs: A Detailed Study of NVIDIA GPU's Built-in Power Sensor
Description
GPUs have emerged as the go-to accelerators for HPC workloads; however, their power consumption has become a major limiting factor in further scaling HPC systems. An accurate understanding of GPU power consumption is essential for improving energy efficiency and, consequently, reducing the associated carbon footprint. Despite limited documentation and a poor understanding of how it works, the built-in power sensor of NVIDIA GPUs is widely used in energy-efficient computing research. Our study seeks to elucidate the internal mechanisms behind the power readings provided by nvidia-smi and to assess the accuracy of the measurements. We evaluated over 70 different GPUs across 12 architectural generations and identified several unforeseen problems that can lead to drastic under- or overestimation of the energy consumed; for example, on the A100 and H100 GPUs only 25% of the runtime is sampled. We proposed several mitigations that could reduce the energy measurement error by an average of 35% in the test cases we present.
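A minimal Python sketch may help show why sampling only 25% of the runtime can push an energy estimate in either direction. It is not the method used in the study. The 100 ms period, the square-wave load (300 W for 25 ms, then 100 W for 75 ms), and the helper names `true_power_w`, `sensor_reading_w`, and `energy_j` are all assumptions chosen for illustration; the actual sensor behavior is what the study measures.

```python
# Toy model (all numbers are illustrative assumptions, not measured values):
# a sensor that only "sees" a 25 ms window out of every 100 ms period.

PERIOD_MS = 100   # assumed sensor update period
WINDOW_MS = 25    # assumed sampled window, i.e. 25% of the period


def true_power_w(t_ms: int) -> float:
    """Hypothetical workload: 300 W for the first 25 ms of each period, else 100 W."""
    return 300.0 if t_ms % PERIOD_MS < WINDOW_MS else 100.0


def sensor_reading_w(t_ms: int, window_offset_ms: int) -> float:
    """Reading at time t_ms: mean true power over the sampled window of that period."""
    period_start = (t_ms // PERIOD_MS) * PERIOD_MS
    start = period_start + window_offset_ms
    window = range(start, start + WINDOW_MS)
    return sum(true_power_w(u) for u in window) / WINDOW_MS


def energy_j(power_at_ms, duration_ms: int) -> float:
    """Integrate power (W) over time with 1 ms rectangles; returns joules."""
    return sum(power_at_ms(t) for t in range(duration_ms)) / 1000.0


duration_ms = 1000
actual = energy_j(true_power_w, duration_ms)
# Window aligned with the high-power phase: the sensor overestimates.
over = energy_j(lambda t: sensor_reading_w(t, 0), duration_ms)
# Window aligned with the low-power phase: the sensor underestimates.
under = energy_j(lambda t: sensor_reading_w(t, 50), duration_ms)

print(f"actual={actual:.1f} J, window on peak={over:.1f} J, window off peak={under:.1f} J")
```

In this toy model the same workload is reported as either double or two thirds of its actual energy, depending only on how the sampled window lines up with the load. Real workloads are rarely this periodic, so the size of the error will differ in practice.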