Presentation
DyTwin: Federated Adaptive Digital Twins for Data Centers – Visualization and Anomaly Detection
Description
Reliable, uninterrupted operation is crucial for supercomputers, particularly when failures or inconsistencies (i.e., anomalies) occur. In this paper, we present a federated adaptive Digital Twin (DT) framework focused on enhancing anomaly detection, a critical aspect of modern data center management. Our DT continuously monitors key metrics, uses AI to detect anomalies, and dynamically adjusts its monitoring parameters to ensure optimal performance. Via a dashboard, our system provides real-time alarms and detailed visualizations of detected anomalies, along with real-time visualizations and forecasts for selected metrics. Through a series of experiments, we validate the effectiveness of our approach in maintaining operational reliability and promptly identifying potential anomalies within the data center.