BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20250626T234154Z
LOCATION:B310
DTSTART;TZID=America/New_York:20241117T090000
DTEND;TZID=America/New_York:20241117T173000
UID:submissions.supercomputing.org_SC24_sess737@linklings.com
SUMMARY:IA^3 2024 - 14th Workshop on Irregular Applications: Architectures
  & Algorithms
DESCRIPTION:Due to the heterogeneous datasets they process, data-intensive
  applications employ diverse methods and data structures, exhibiting irreg
 ular data accesses, control flows, and communication patterns. Modern data
  analytics applications additionally require supporting dynamic data struc
 tures, asynchronous control flows, and mixed parallel programming models. 
 Supercomputing systems are organized around software and hardware optimize
 d for data locality and bulk synchronous computations. Managing irregular 
 behaviors requires a substantial programming effort and lacks integration,
  leading to poor performance. Holistic solutions to these challenges emerg
 e only by considering the problem from multiple perspectives: from micro- 
 to system-architectures, from compilers to languages, from libraries to ru
 ntimes, and from algorithm design to data characteristics. Only collaborat
 ive efforts among researchers with different expertise, including domain e
 xperts and end-users, can lead to significant breakthroughs. This workshop
  brings together scientists with different backgrounds to discuss methods 
 and technologies for efficiently supporting irregular applications on curr
 ent and future architectures.\n\nEstablish the basis for Breadth-First Sea
 rch on Frontier System: XBFS on AMD GPUs\n\nGraphics Processing Units (GPU
 s) offer significant potential for accelerating various computational task
 s, including Breadth-First Search (BFS). Numerous efforts have been made t
 o deploy BFS on GPUs effectively. To address the dynamic nature of BFS, XB
 FS, the state-of-the-art work, employs an adapti...\n\n\nHaoshen Yang (Rut
 gers University), Hao Lu and Naw Safrin Sattar (Oak Ridge National Laborat
 ory (ORNL)), Hang Liu (Rutgers University), and Feiyi Wang (Oak Ridge Nati
 onal Laboratory (ORNL))\n---------------------\nIA^3 — Lunch Break\n------
 ---------------\nInvited Talk: Discussion on Hash-Table Approaches for Eff
 icient Sparse Tensor Contraction\n\nSparse tensor contraction (SpTC) is a 
 crucial operation in high-performance applications, particularly in comput
 ational chemistry, high-order tensor decompositions, and quantum sciences.
  This talk will explore the performance challenges associated with SpTC an
 d review current state-of-the-art soluti...\n\n\nJiajia Li (North Carolina
  State University)\n---------------------\nGPU Accelerated Sparse Cholesky
  Factorization\n\nThe solution of sparse symmetric positive definite linea
 r systems is an important computational kernel in large-scale scientific a
 nd engineering modeling and simulation. We will solve the linear systems u
 sing a direct method, in which a Cholesky factorization of the coefficient
  matrix is performed u...\n\n\nM. Ozan Karsavuran and Esmond G. Ng (Lawren
 ce Berkeley National Laboratory (LBNL)) and Barry W. Peyton (Dalton State 
 College)\n---------------------\nIA^3 — Morning Break\n-------------------
 --\nEnhancing Small Message Aggregation with Directive-Based Deferred Exec
 ution\n\nThe partitioned global address space (PGAS) model offers one-side
 d communication operations to efficiently access local and remote data thr
 ough a distributed shared memory model using point-to-point network operat
 ions.  An extension to the OpenSHMEM PGAS library previously demonstrated 
 how message a...\n\n\nAaron Welch and Oscar Hernandez (Oak Ridge National 
 Laboratory (ORNL)) and Stephen Poole and Wendy Poole (Los Alamos National 
 Laboratory (LANL))\n---------------------\nEnhancing Scalability and Perfo
 rmance in Influence Maximization with Optimized Parallel Processing\n\nInf
 luence Maximization (IM) is vital in viral marketing and biological networ
 k analysis for identifying key influencers. Given its NP-hard nature, appr
 oximate solutions are employed. This paper addresses scalability challenge
 s in a scale-out shared memory system, by focusing on the state-of-the-art
  ...\n\n\nHanjiang Wu, Huan Xu, and Joongun Park (Georgia Institute of Tec
 hnology); Jesmin Jahan Tithi, Fabio Checconi, Jordi Wolfson-Pou, and Fabri
 zio Petrini (Intel Corporation); and Tushar Krishna (Georgia Institute of 
 Technology)\n---------------------\nPredicting Compute Node Unavailability
  in HPC: A Graph-Based Machine Learning Approach\n\nAs high-performance co
 mputing (HPC) systems advance towards Exascale computing, their size and c
 omplexity increase, introducing new maintenance challenges. Modern HPC sys
 tems feature data monitoring infrastructures that provide insights into th
 e system's state. This data can be leveraged to train ma...\n\n\nJože Roža
 nec (Jožef Stefan Institute)\n---------------------\nPerformance evaluatio
 n and modelling of single-precision matrix multiplication on Cerebras CS-2
 \n\nAlthough recent supercomputers have been improving their computational
  performance, achieving performance scaling with respect to the number of 
 nodes is not easy due to long inter-node communication latency. Many attem
 pts have been made to hide communication latency and maintain strong scala
 bility e...\n\n\nRyunosuke Matsuzaki (Meiji University), Daichi Mukunoki (
 Independent), and Takaaki Miyajima (Meiji University)\n-------------------
 --\nInvited Talk: Accelerating Irregular Algorithms on GPUs\n\nMartin Burt
 scher (Texas State University)\n---------------------\nShared Memory-Aware
  Latency-Sensitive Message Aggregation for Fine-Grained Communication\n\nM
 essage aggregation is widely used to reduce communication cost in HPC appl
 ications. The discrepancy between the overhead of sending a message and th
 e per-byte transfer cost motivates message aggregation for several irregul
 ar fine-grained messaging applications lik
 e g...\n\n\nKavitha Chandrasekar (University of Illinois) and Laxmikant Ka
 le (University of Illinois Urbana-Champaign)\n---------------------\nLinea
 r Algebra Approach for Directed Triad Counting and Enumeration\n\nTriangle
  counting and enumeration are commonly used in real-world applications on 
 directed graphs. However, the performance of triangle counting algorithms 
 is usually benchmarked on undirected graphs. As such, many of these algori
 thms and formulations are not suitable for identifying the types of di...\
 n\n\nYuttapichai Kerdcharoen (Carnegie Mellon University, CMKL University)
 ; Upasana Sridhar (Carnegie Mellon University); Orathai Sangpetch (CMKL Un
 iversity); and Tze Meng Low (Carnegie Mellon University)\n----------------
 -----\nConcluding Remarks\n\nAntonino Tumeo and John Feo (Pacific Northwes
 t National Laboratory (PNNL)); Michela Becchi (North Carolina State Univer
 sity); and Ana Lucia Varbanescu (University of Twente, Netherlands)\n-----
 ----------------\nIA^3 Debate\n\nAntonino Tumeo (Pacific Northwest Nation
 al Laboratory (PNNL)), Ariful Azad (Indiana University), Giulia Guidi (Cor
 nell University), John Leidel (Tactical Computing Laboratories LLC), and G
 eorgios Michelogiannakis (Lawrence Berkeley National Laboratory)\n--------
 -------------\nEfficient Tree-based Parallel Algorithms for N-Body Simulat
 ions Using C++ Standard Parallelism\n\nThe Barnes-Hut approximation for N-
 body simulations reduces the time complexity of the naive all-pairs approa
 ch from O(N^2) to O(N log N) by hierarchically aggregating nearby particle
 s into single entities using a tree data structure. \nThis inherently irre
 gular algorithm poses substantial challenges...\n\n\nThomas Lane Cassell a
 nd Tom Deakin (University of Bristol, England); Aksel Alpay and Vincent He
 uveline (Heidelberg University); and Gonzalo Brito Gadeschi (NVIDIA Corpor
 ation)\n---------------------\nWelcome and Introduction\n\nAntonino Tumeo 
 and John Feo (Pacific Northwest National Laboratory (PNNL)); Michela Becch
 i (North Carolina State University); and Ana Lucia Varbanescu (University 
 of Twente, Netherlands)\n---------------------\nIA^3 — Afternoon Break\n--
 -------------------\nxBS-GNN: Accelerating Billion-Scale GNN Training on F
 PGA\n\nGraph Neural Networks (GNNs) have been used in a variety of challen
 ging applications. However, training GNN models is time-consuming as it in
 curs a high volume of irregular data accesses due to its graph-structured 
 input data; such a challenge is further exacerbated in real-world applicatio
 ns as they ...\n\n\nYi-Chien Lin (University of Southern California (USC))
 , Zhijie Xu (University of Michigan), and Viktor Prasanna (University of S
 outhern California (USC))\n---------------------\nAn Adaptive Asynchronous
  Approach for the Single-Source Shortest Paths Problem\n\nLarge-scale grap
 hs with billions and trillions of vertices and edges require efficient par
 allel algorithms for common graph problems, one of which is single-source 
 shortest paths (SSSP). Bulk-synchronous parallel algorithms such as Delta-
 stepping experience large synchronization costs at the scale o...\n\n\nRit
 vik Rao, Kavitha Chandrasekar, and Laxmikant Kale (University of Illinois 
 Urbana-Champaign)\n---------------------\nBatch Updates of Distributed Str
 eaming Graphs using Linear Algebra\n\nWe develop a distributed-memory para
 llel algorithm for performing batch updates on streaming graphs, where ver
 tices and edges are continuously added or removed. Our algorithm leverages
  distributed sparse matrices as the core data structures, utilizing equiva
 lent sparse matrix operations to execute g...\n\n\nElaheh Hassani, Md Tauf
 ique Hussain, and Ariful Azad (Indiana University)\n---------------------\
 nPerformance Analysis of the NICAM Benchmark on MN-Core Processor\n\nLarge
 -scale Computational Fluid Dynamics (CFD) simulations are typical HPC appl
 ications that require both high memory bandwidth and large memory capacity
 . However, it is difficult to achieve high performance for such applicatio
 ns on modern high-performance processors due to their low memory bandwidt.
 ..\n\n\nHikaru Takayashiki and Natsuko Saito (Fixstars Corporation); Hirot
 o Imachi and Ryo Sakamoto (Preferred Networks); and Junichiro Makino (Pref
 erred Networks; Kobe University, Japan)\n---------------------\nNEO-DNND: 
 Communication-Optimized Distributed Nearest Neighbor Graph Construction\n\
 nGraph-based approximate nearest neighbor algorithms have shown high neigh
 bor structure representation quality.\nNN-Descent is a widely known graph-
 based approximate nearest neighbor (ANN) algorithm.\nHowever, graph-based 
 approaches are memory- and time-consuming.\n\nTo address the drawbacks, we
  develop ...\n\n\nKeita Iwabuchi, Trevor Steil, Benjamin Priest, Roger Pea
 rce, and Geoffrey Sanders (Lawrence Livermore National Laboratory (LLNL))\
 n\nTag: Graph Algorithms, Heterogeneous Computing, Programming Frameworks 
 and System Software\n\nRegistration Category: Workshop Reg Pass\n\nSession
  Chairs: Michela Becchi (North Carolina State University); John Feo (Pacif
 ic Northwest National Laboratory (PNNL)); Antonino Tumeo (Pacific Northwes
 t National Laboratory (PNNL)); and Ana Lucia Varbanescu (University of Twe
 nte, Netherlands; University of Amsterdam, Netherlands)
END:VEVENT
END:VCALENDAR
