BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260422T143139Z
LOCATION:B211
DTSTART;TZID=America/New_York:20241117T083000
DTEND;TZID=America/New_York:20241117T170000
UID:submissions.supercomputing.org_SC24_sess412_tut123@linklings.com
SUMMARY:Efficient Distributed GPU Programming for Exascale
DESCRIPTION:Andreas Herten (Forschungszentrum Jülich, Jülich Supercomputin
 g Centre (JSC)); Simon Garcia de Gonzalo (Sandia National Laboratories); J
 iri Kraus and Markus Hrywniak (NVIDIA Corporation); Lena Oden (University 
 of Hagen, Germany); and Carolin Penke and Chelsea John (Jülich Supercomput
 ing Centre (JSC))\n\nOver the past decade, GPUs have become ubiquitous in
  HPC installations around the world, delivering the majority of the perfor
 mance of some of the largest supercomputers (e.g. Summit, Sierra, JUWELS B
 ooster). T
 his trend continues in the recently deployed and upcoming Pre-Exascale and
  Exascale systems (JUPITER, LUMI, Leonardo; El Capitan, Frontier, Aurora):
  GPUs are chosen as the core computing devices to enter this next era of H
 PC.\nTo take advantage of future GPU-accelerated systems with tens of thou
 sands of devices, application developers need to have the proper skills an
 d tools to understand, manage, and optimize distributed GPU applications.\
 nIn this tutorial, participants will learn techniques to efficiently progr
 am large-scale multi-GPU systems. While programming multiple GPUs with MPI
  is explained in detail, advanced tuning techniques and complementary prog
 ramming models such as NCCL and NVSHMEM are also presented. Tools for
  analysi
 s are shown and used to motivate and implement performance optimizations. 
 The tutorial teaches fundamental concepts that apply to GPU-accelerated sy
 stems in general, taking the NVIDIA platform as an example. It is a combin
 ation of lectures and hands-on exercises, using a development system for J
 UPITER (JEDI), for interactive learning and discovery.\n\nTag: Accelerator
 s, Numerical Methods, Parallel Programming Methods, Models, Languages and 
 Environments, Performance Evaluation and/or Optimization Tools\n\nRegistra
 tion Category: Tutorial Reg Pass
END:VEVENT
END:VCALENDAR
