BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20250626T234540Z
LOCATION:B309
DTSTART;TZID=America/New_York:20241120T143000
DTEND;TZID=America/New_York:20241120T150000
UID:submissions.supercomputing.org_SC24_sess379_pap468@linklings.com
SUMMARY:GVARP: Detecting Performance Variance on Large-Scale Heterogeneous
  System
DESCRIPTION:Xin You, Zhibo Xuan, Hailong Yang, Zhongzhi Luan, Yi Liu, and 
 Depei Qian (Beihang University)\n\nPerformance variance is one of the nast
 y pitfalls of large-scale heterogeneous systems, which can lead to unexpec
 ted and unpredictable performance degradation for parallel programs. Such 
 performance issues typically arise from various random hardware and softwa
 re faults, making it exceedingly difficult to pinpoint the exact causes of
  performance variance in specific instances. In this paper, we propose
  GVARP, a performance variance detection tool for large-scale heterog
 eneous systems. GVARP employs static analysis to identify the per
 formance-critical parameters of kernel functions. Additionally, GV
 ARP segments the program execution with external library calls and asynch
 ronous kernel operations. Then GVARP constructs a state transfer gr
 aph and estimates the workload of each program segment to identify and c
 luster instances of similar workloads, facilitating the detection of perfo
 rmance variance. Our evaluation results demonstrate that GVARP ef
 fectively detects performance variance at a large scale with acceptable ov
 erhead and provides intuitive insights to locate the sources of performanc
 e variance.\n\nTag: Fault-Tolerance, Reliability, Maintainability, and Ada
 ptability, Middleware and System Software, Performance Evaluation and/or O
 ptimization Tools, Runtime Systems\n\nRegistration Category: Tech Program 
 Reg Pass\n\nSession Chair: Camille Coti (École de Technologie Supérieure)\
 n\n
END:VEVENT
END:VCALENDAR
