BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20250626T234542Z
LOCATION:B213
DTSTART;TZID=America/New_York:20241118T133000
DTEND;TZID=America/New_York:20241118T170000
UID:submissions.supercomputing.org_SC24_sess435_tut138@linklings.com
SUMMARY:Performance Engineering for Linear Solvers
DESCRIPTION:Christie Louis Alappat and Georg Hager (Friedrich-Alexander Un
 iversity, Erlangen-Nuremberg; Erlangen National High Performance Computing
  Center) and Hartwig Anzt (Technical University of Munich; Technical Unive
 rsity of Munich School of Computation, Information and Technology)\n\nThis
  tutorial covers code analysis, performance modeling, and optimization for
  sparse linear solvers on CPU and GPU nodes. Performance Engineering is of
 ten taught using simple loops as instructive examples for performance mode
 ls and how they can guide optimization; however, full, preconditioned line
 ar solvers comprise multiple back-to-back loops enclosed in an iteration s
 cheme that is executed until convergence is achieved. Consequently, the co
 ncept of “optimal performance” has to account for both hardware resource e
 fficiency and iterative solver convergence. We convey a performance engine
 ering process that is geared towards linear iterative solvers. After intro
 ducing basic notions of hardware organization and storage for dense and sp
 arse data structures, we show how the Roofline performance model can be ap
 plied to such solvers in predictive and diagnostic ways and how it can be 
 used to assess the hardware efficiency of a solver, covering important cor
 ner cases such as pure memory boundedness. Then we advance to the structur
 e of preconditioned solvers, using the Conjugate Gradient (CG) metho
 d as a leading example. Hotspots and bottlenecks of the complete solver ar
 e identified, followed by the introduction of advanced performance optimiz
 ation techniques such as mixed precision and cache blocking. Ha
 nds-on exercises in Python complement the lectures.\n\nTag: Numerical Meth
 ods, Performance Evaluation and/or Optimization Tools, Portability\n\nRegi
 stration Category: Tutorial Reg Pass\n\n
END:VEVENT
END:VCALENDAR
