BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260422T143034Z
LOCATION:B306
DTSTART;TZID=America/New_York:20241118T090000
DTEND;TZID=America/New_York:20241118T173000
UID:submissions.supercomputing.org_SC24_sess751@linklings.com
SUMMARY:2024 International Workshop on Performance, Portability, and Produ
 ctivity in HPC
DESCRIPTION:The Performance, Portability, and Productivity in HPC workshop
  aims to bring together developers and researchers with an interest in pra
 ctical solutions, technologies, tools, and methodologies that enable the d
 evelopment of performance-portable applications across a diverse set of cu
 rrent and future high-performance computers. The topic of Performance, Por
 tability, and Productivity focuses on enabling applications and libraries 
 to run across multiple architectures without significant impact on achieve
 d performance and with the goal of maintaining developer productivity. Thi
 s workshop provides a forum for discussions of successes and failures in t
 ackling the compelling problems that lie at the intersection of performanc
 e, portability, and productivity in high-performance computing. This area 
 touches on many aspects of HPC software development and the workshop progr
 am is expected to reflect a wide range of experiences and perspectives, in
 cluding those of compiler, language and runtime experts; applications deve
 lopers; performance engineers; and domain scientists. For more information
  see https://p3hpc.org/\n\nPerformance Portable Optimizations of an Ice-sh
 eet Modeling Code on GPU-supercomputers\n\nIn this paper, we present GPU-o
 ptimizations for an ice-sheet modeling code known as MPAS-Albany Land Ice 
 (MALI). MALI is a C++ template code that leverages the Kokkos programmin
 g model for portability and the Trilinos library for data structures, non
 linear and linear solvers. Performance of the most expensi...\n\n\nOscar A
 ntepara and
  Samuel Williams (Lawrence Berkeley National Laboratory (LBNL)) and Max Ca
 rlson and Jerry Watkins (Sandia National Laboratories)\n------------------
 ---\nOptimizing MILC-Dslash Performance on NVIDIA A100 GPU: Parallel Strat
 egies using SYCL\n\nMILC-Dslash is a benchmark derived from the MILC code 
 which simulates lattice-gauge theory on a four-dimensional hypercube. This
  paper outlines a gradual progression in increasing the parallelism of the
  MILC-Dslash kernel using SYCL, transitioning from a simple to a fully par
 allel implementation. Th...\n\n\nAmanda S. Dufek (NERSC at LBNL (Lawrence 
 Berkeley National Laboratory)), Steven A. Gottlieb (Indiana University), M
 uaaz Gul Awan (NERSC at LBNL (Lawrence Berkeley National Laboratory)), Dou
 glas Adriano Augusto (Oswaldo Cruz Foundation), and Jack Deslippe and Bran
 don Cook (NERSC at LBNL (Lawrence Berkeley National Laboratory))\n--------
 -------------\nPerformance, Portability, and Productivity in HPC — Welcome
 \n---------------------\nWrapup\n\nWrap up at the end of the workshop\n\n-
 --------------------\nAn analysis into the performance and productivity of
  Rust in High Performance Computing\n\nRust is a type-safe programming lan
 guage originally developed by Mozilla in 2010. With design goals including
  guarantees of memory and thread safety, alongside foundations in function
 al programming, it claims to be highly performant, reliable, and productiv
 e for developers. Therefore, the Rust langu...\n\n\nEdmund Goodman and Ric
 hard Kirk (University of Warwick, England)\n---------------------\nLeverag
 ing AI to port from legacy Fortran to GPU enabled C++\n\nMany High-Perform
 ance Computing (HPC) applications have large code-bases written in legacy 
 Fortran. Porting these applications to C++ enables us to leverage GPU prog
 ramming frameworks such as Kokkos and AMReX but can be time-intensive. We 
 present our use of LLM-powered code converters to expedite th...\n\n\nHann
 ah Elizabeth Ross and Jean Sexton (Lawrence Berkeley National Laboratory (
 LBNL))\n---------------------\nPerformance, Portability, and Productivity 
 in HPC — Announcements\n---------------------\nPerformance and Power: Syst
 ematic Evaluation of AI Workloads on Accelerators with CARAML\n\nThe rapid
  advancement of machine learning (ML) technologies has driven the developm
 ent of specialized hardware accelerators designed to facilitate more effic
 ient model training. This paper introduces the CARAML benchmark suite, whi
 ch is employed to assess performance and energy consumption during th...\n
 \n\nChelsea Maria John, Andreas Herten, Stepan Nassyr, and Carolin Penke (
 Forschungszentrum Jülich, Jülich Supercomputing Centre (JSC))\n-----------
 ----------\nDevelopment of performance portable spline solver for exa-scal
 e plasma turbulence simulation\n\nThis paper describes the development of 
 performance portable spline building kernels on top of Kokkos-kernels. We 
 wish to solve a single matrix equation with multiple right-hand sides. Thi
 s problem is quite unique and thus neither Kokkos-kernels (direct method) 
 nor Ginkgo (iterative methods) is opti...\n\n\nYuuichi Asahi (CEA, Saclay;
  Maison de la Simulation); Baptiste Legouix (Atomic Energy and Alternative
  Energies Commission (CEA)); Emily Bourne (EPFL); Thomas Padioleau and Jul
 ien Bigot (CEA, Saclay); and Virginie Grandgirard and Kevin Obrejan (Atomi
 c Energy and Alternative Energies Commission (CEA))\n---------------------
 \nAfternoon Break\n---------------------\nPerformance portability via C++ 
 PSTL, SYCL, OpenMP, and HIP: the Gaia AVU-GSR case study\n\nGiulio Malenza
  (University of Torino, Italy); Valentina Cesare (National Institute for A
 strophysics); Marco Edoardo Santimaria and Robert Birke (University of Tor
 ino, Italy); Alberto Vecchiato and Ugo Becciani (National Institute for As
 trophysics); and Marco Aldinucci (University of Torino, Italy)\n----------
 -----------\nMorning Break\n---------------------\nPerformance Modeling an
 d Analysis of a de Bruijn Graph Based Local Assembly Kernel on Multiple Ve
 ndor GPUs\n\nBioinformatics workloads differ significantly from traditiona
 l scientific computing and AI workloads because they consist primarily of 
 integer-only operations and string comparisons rather than floating-point 
 operations. The underlying algorithms usually have low arithmetic intensit
 y, irregular memo...\n\n\nLeAnn Lindsey (University of Utah) and Nan Ding,
  Jack Deslippe, and Muaaz Awan (Lawrence Berkeley National Laboratory (LBN
 L))\n---------------------\nPerformance, Portability, and Productivity in 
 HPC — Featured Speaker\n\nJulien Bigot (Atomic Energy and Alternative Ener
 gies Commission (CEA))\n---------------------\nRAJA Performance Suite: Per
 formance Portability Analysis with Caliper and Thicket\n\nMaintaining perf
 ormant code in a world of fast-evolving computer architectures and program
 ming models poses a significant challenge to scientists. Typically, benchm
 ark codes are used to model some aspects of a large application code's per
 formance, and are easier to build and run. Such benchmarks can...\n\n\nOlg
 a Pearce (Lawrence Livermore National Laboratory (LLNL), Texas A&M Univers
 ity); Jason Burmark and Rich Hornung (Lawrence Livermore National Laborato
 ry (LLNL)); Befikir Bogale and Ian Lumsden (Innovative Computing Laborator
 y, University of Tennessee); Michael McKinsey (Texas A&M University); Dewi
  Yokelson, David Boehme, and Stephanie Brink (Lawrence Livermore National 
 Laboratory (LLNL)); Michela Taufer (University of Tennessee, Knoxville); a
 nd Tom Scogland (Lawrence Livermore National Laboratory (LLNL))\n---------
 ------------\nExploring SYCL for batched kernels with memory allocations\n
 \nBatched parallelism with local allocations is an extremely common patter
 n in HPC, appearing in multi-dimensional FFTs, neural network processing,
  or split computation of numerical operators.\nIts efficient support is es
 pecially complex on GPU where memory per work-item is limited and dynamic 
 memory ...\n\n\nAymeric Millan, Thomas Padioleau, and Julien Bigot (Maison
  de la Simulation, Atomic Energy and Alternative Energies Commission (CEA)
 )\n---------------------\nAutonomous Execution for Multi-GPU Systems: Comp
 iler Support\n\nRecent trends in HPC systems increasingly emphasize accele
 rators, particularly GPUs, as autonomous execution units, shifting control
  of entire program execution to GPUs. In this work, we aim to bridge this 
 gap with a compiler and provide a productive method for writing efficient 
 GPU-first code. We d...\n\n\nJavid Baydamirli (Koç University, Turkey); Ta
 l Ben-Nun (Lawrence Livermore National Laboratory (LLNL)); and Didem Unat 
 (Koç University, Turkey)\n---------------------\nPerformance and Scaling o
 f HPC and AI Applications on Leadership Class Intel, AMD, and NVIDIA GPU-A
 ccelerated Systems\n\nAs HPC systems move into the exascale era an increas
 ing diversity of hardware is deployed. The last decade saw the ascendance 
 of NVIDIA GPU-accelerated systems among the largest scale HPC systems and 
 spurred the need for application developers to consider approaches to perf
 ormance portability that p...\n\n\nJaeHyuk Kwack, Colleen Bertoni, Umesh U
 nnikrishnan, Riccardo Balin, Khalid Hossain, Yasaman Ghadar, Timothy Willi
 ams, Abhishek Bagusetty, Mathialakan Thavappiragasam, Väinö Hatanpää, Arch
 it Vasan, John Tramm, and Scott Parker (Argonne National Laboratory (ANL))
 \n---------------------\nPerformance, Portability, and Productivity in HPC
  — Tooling Panel & Discussion\n\nDoug Jacobsen (Google); CJ Newburn (NVIDI
 A Corporation); Kaan Olgu and Tom Lin (University of Bristol, England); an
 d Solomon Bekele (Argonne National Laboratory (ANL))\n--------------------
 -\nA Metric for HPC Programming Model Productivity\n\nThere has been a hea
 lthy growth of heterogeneous programming models that cover different parad
 igms in the HPC space.\nSelecting an appropriate programming model for new
  projects is challenging: how does one select a model that is both product
 ive and performant?\nThe same applies for existing projects ...\n\n\nWei-C
 hen Lin, Tom Deakin, and Simon McIntosh-Smith (University of Bristol, Engl
 and)\n---------------------\nEvaluating Performance Portability of a Seism
 ic Survey Simulation across GPU Architectures\n\nWith the increasing use o
 f graphics processing units (GPUs) from various vendors in oil and gas com
 panies, achieving code portability has become essential. This capabil
 ity al
 lows for evaluating performance across diverse GPU vendors, facilitating w
 ell-informed decisions, and promoting competition and...\n\n\nArthur Loren
 zon and Philippe Navaux (Federal University of Rio Grande do Sul, Brazil);
  Alexandre Sardinha (Petróleo Brasileiro S.A.); and Bronson Messer (Oak Ri
 dge National Laboratory (ORNL))\n---------------------\nHigh-Performance, 
 Scalable Geometric Multigrid via Fine-Grain Data Blocking for GPUs\n\nWe p
 resent a performance study of geometric multigrid (GMG) on NVIDIA, AMD, an
 d Intel GPU-accelerated supercomputers. The approach employs fine-grain d
 ata blocking in BrickLib, which reduces data movement in the GMG V-cycle b
 y optimizing storage order for stencil access and communication.\nOur GMG 
 a...\n\n\nOscar Antepara, Samuel Williams, and Hans Johansen (Lawrence Ber
 keley National Laboratory (LBNL)) and Mary Hall (University of Utah)\n----
 -----------------\nLunch Break\n---------------------\nPerformance, Portab
 ility, and Productivity in HPC — Lightning Talk Q&A\n\nTag: Performance Op
 timization, Programming Frameworks and System Software\n\nRegistration Cat
 egory: Workshop Reg Pass\n\nSession Chairs: CJ Newburn (NVIDIA Corporation
 ), Scott J. Parker (Argonne National Laboratory (ANL)), John Pennycook (In
 tel Corporation), and Kenneth Weiss (Lawrence Livermore National Laborator
 y (LLNL))
END:VEVENT
END:VCALENDAR
