Presentation
CEEMS: A Resource Manager Agnostic Energy and Emissions Monitoring Stack
Session: Sustainable Supercomputing
Description: The Compute Energy & Emissions Monitoring Stack (CEEMS) is designed to report the energy usage of compute workloads in real time on HPC and cloud platforms alike. Besides CPU energy usage, it supports reporting the energy usage of workloads on NVIDIA and AMD GPU accelerators. CEEMS is built around prominent open-source tools such as Prometheus and Grafana. This paper presents an architectural overview of CEEMS, the data sources used to measure energy usage and estimate equivalent emissions, and potential use cases of CEEMS from operator and user perspectives. Finally, the paper describes how the CEEMS deployment on the Jean-Zay supercomputing platform is capable of monitoring more than 1400 nodes with a daily job churn rate of around 20k jobs.