BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20250626T234542Z
LOCATION:B314
DTSTART;TZID=America/New_York:20241118T103000
DTEND;TZID=America/New_York:20241118T111500
UID:submissions.supercomputing.org_SC24_sess758_misc379@linklings.com
SUMMARY:Reimagining Performance and Reproducibility in the Post-Moore Era:
  Innovations in Checkpointing and Workflow Management
DESCRIPTION:Michela Taufer (University of Tennessee, Knoxville)\n\nIn the 
 post-Moore era, the quest for enhanced performance and reproducibility is 
 more critical than ever. As researchers and engineers in high-performance 
 computing (HPC) and scientific computing, reimagining key areas such as al
 gorithms, hardware architecture, and software is essential to drive progre
 ss. In this talk, we will explore how performance engineering is evolving,
  focusing on checkpointing and the management of intermediate data in scie
 ntific workflows.\n\nWe will first discuss the shift from traditio
 nal low-frequency checkpointing techniques to modern high-frequency approa
 ches that require complete histories and efficient memory use. By breaking
  data into chunks, using hash functions to store only modified data, and l
 everaging Merkle-tree structures, we improve efficiency, scalability, and 
 GPU utilization while addressing challenges like sparse data updates and l
 imited I/O bandwidth.\n\nWe will also examine the balance between perform
 ance and data persistence in workflows, where cloud infrastructures
  often sacrifice reproducibility for speed. To overcome this, we propose a
  persistent, scalable architecture that makes node-local data shareable ac
 ross nodes. By rethinking checkpointing and cloud data architectures, we s
 how how innovations in algorithms, hardware, and software can significantl
 y advance both performance and reproducibility in the post-Moore era.\n\nT
 ag: Artificial Intelligence/Machine Learning, Codesign\n\nRegistration Cat
 egory: Workshop Reg Pass\n\nSession Chairs: John Feo (Pacific Northwest Na
 tional Laboratory (PNNL)), Jiyuan Zhang (Meta), and Amelie Chi Zhou (Hong 
 Kong Baptist University)
END:VEVENT
END:VCALENDAR
