BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20250626T234540Z
LOCATION:B310
DTSTART;TZID=America/New_York:20241117T164000
DTEND;TZID=America/New_York:20241117T165000
UID:submissions.supercomputing.org_SC24_sess737_ws_ia118@linklings.com
SUMMARY:Predicting Compute Node Unavailability in HPC: A Graph-Based Machi
 ne Learning Approach
DESCRIPTION:Jože Rožanec (Jožef Stefan Institute)\n\nAs high-performance c
 omputing (HPC) systems advance towards Exascale computing\, their size
  and complexity increase\, introducing new maintenance challenges.
  Modern HPC sy
 stems feature data monitoring infrastructures that provide insights into t
 he system's state. This data can be leveraged to train machine learning mo
 dels to anticipate anomalies that require compute nodes to undergo mainten
 ance procedures. This paper presents a novel approach to predicting such a
 nomalies by creating a graph per measurement that encodes current and past
  sensor readings and information related to the compute node sensors. The 
 experiments were performed with data collected from Marconi 100\, a
  tier-0 production supercomputer at CINECA in Bologna\, Italy. Our
  results show tha
 t the machine learning model can accurately predict anomalies and surpass 
 current state-of-the-art (SOTA) models regarding the quality of prediction
 s and the time horizon considered to forecast them.\n\nTag: Graph
  Algorithms\, Heterogeneous Computing\, Programming Frameworks and
  System Software\n\nRegistration Category: Workshop Reg Pass\n\n
 Session Chairs: Michela Becchi (North Carolina State University)\;
  John Feo (Pacific Northwest National Laboratory (PNNL))\; Antonino
  Tumeo (Pacific Northwest National Laboratory (PNNL))\; and Ana
  Lucia Varbanescu (University of Twente\, Netherlands\; University
  of Amsterdam\, Netherlands)\n\n
END:VEVENT
END:VCALENDAR
