BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20250626T234542Z
LOCATION:B309
DTSTART;TZID=America/New_York:20241120T133000
DTEND;TZID=America/New_York:20241120T140000
UID:submissions.supercomputing.org_SC24_sess379_pap454@linklings.com
SUMMARY:Versatile Datapath Soft Error Detection on the Cheap for HPC Appli
 cations
DESCRIPTION:Yafan Huang (University of Iowa); Sheng Di (Argonne National L
 aboratory (ANL)); Zhaorui Zhang (Hong Kong Polytechnic University); Xiaoyi
  Lu (University of California, Merced); and Guanpeng Li (University of Iow
 a)\n\nWith the ongoing reduction in technology sizes and voltage levels, m
 odern microprocessors are increasingly susceptible to soft errors, corrupt
 ing datapath units during program execution. While these error types have 
 received considerable attention recently, existing solutions either confin
 e themselves to limited scopes or incur massive overheads in performance a
 nd power consumption, hindering practical usage. In this work, we propose 
 CONDA, a novel error detection technique based on code transformation and 
 static program analysis, achieving versatile datapath protection at low co
 st. At compile time, CONDA analyzes program characteristics and transforms
  the original program code without complicating its control-flow and memor
 y access patterns. At runtime, CONDA detects datapath errors with low over
 head and latency. The evaluation of 38 benchmarks and a parallel HPC simul
 ation reveals that CONDA incurs only 57.79% runtime overhead, which is 41.
 84% faster than the existing state of the art, with the same level of erro
 r de
 tection effectiveness and low detection latency.\n\nTag: Fault-Tolerance, 
 Reliability, Maintainability, and Adaptability, Middleware and System Soft
 ware, Performance Evaluation and/or Optimization Tools, Runtime Systems\n\
 nRegistration Category: Tech Program Reg Pass\n\nSession Chair: Camille Co
 ti (École de Technologie Supérieure)\n\n
END:VEVENT
END:VCALENDAR
