BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260422T143141Z
LOCATION:B311
DTSTART;TZID=America/New_York:20241120T103000
DTEND;TZID=America/New_York:20241120T110000
UID:submissions.supercomputing.org_SC24_sess369_pap588@linklings.com
SUMMARY:Distributed-Memory Parallel Algorithms for  Sparse Matrix and Spar
 se Tall-and-Skinny  Matrix Multiplication
DESCRIPTION:Isuru Ranawaka and Md Taufique Hussain (Indiana University); C
 harles Block, Gerasimos Gerogiannis, and Josep Torrellas (University of Il
 linois Urbana-Champaign); and Ariful Azad (Indiana University)\n\nWe consi
 der a sparse matrix-matrix multiplication (SpGEMM) setting where one matri
 x is square and the other is tall and skinny. This special variant, which 
 we call  TS-SpGEMM, has important applications in multi-source breadth-fir
 st search, influence maximization, sparse graph embedding, and algebraic m
 ultigrid solvers.  Unfortunately, popular distributed algorithms like spar
 se SUMMA deliver suboptimal performance for TS-SpGEMM. To address this lim
 itation, we \ndevelop a novel distributed-memory algorithm tailored for TS
 -SpGEMM. \nOur approach employs customized 1D partitioning for all matrice
 s involved and leverages sparsity-aware tiling for efficient data transfer
 s. In addition, it minimizes communication overhead by incorporating both 
 local and remote computations. On average, our TS-SpGEMM algorithm attains
  5X performance gains over 2D and 3D SUMMA. Furthermore, we use our algori
 thm to implement multi-source breadth-first search and sparse graph embedd
 ing algorithms\nand demonstrate their scalability up to 512 Nodes (or 65,5
 36 cores) on NERSC Perlmutter.\n\nTag: Algorithms, Artificial Intelligence
 /Machine Learning, Graph Algorithms, Linear Algebra\n\nRegistration Catego
 ry: Tech Program Reg Pass\n\nSession Chair: Jiajia Li (North Carolina Stat
 e University)\n\n
END:VEVENT
END:VCALENDAR
