BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20250626T233526Z
LOCATION:B302-B305
DTSTART;TZID=America/New_York:20241120T100000
DTEND;TZID=America/New_York:20241120T170000
UID:submissions.supercomputing.org_SC24_sess533_post186@linklings.com
SUMMARY:Stalls and Memory Analysis on Fujitsu A64FX and NVIDIA Grace
DESCRIPTION:Yan Kang (Pennsylvania State University), Sayan Ghosh (Pacific
  Northwest National Laboratory (PNNL)), Mahmut Kandemir (Pennsylvania Stat
 e University), and Andres Marquez (Pacific Northwest National Laboratory (
 PNNL))\n\nARM-based multicore CPUs, such as NVIDIA Grace and Fujitsu A64FX
 , dominate contemporary HPC, featuring 32-256 cores with cache hierarchies
  and up to 1 TB/s memory bandwidth. While benchmarks like STREAM show
  similar performance across these systems, diverse applications,
  particularly graph and nearest-neighbor (e.g., stencil) codes, reveal
  distinct performance profiles. Analyzing these profiles with low-level
  performance data can uncover system bottlenecks. We propose a template
  focusing on stalls and memory
  accesses to identify bottlenecks efficiently by studying key CPU/memory p
 erformance events using Linux perf. Our approach engages all cores (144 fo
 r Grace, 48 for A64FX) with platform-specific compilers (ARMClang 24.04 fo
 r Grace, Fujitsu 4.10 for A64FX). This method effectively categorizes appl
 ication scenarios by analyzing stalls and memory accesses, enabling quick 
 identification of corner cases.\n\nRegistration Category: Tech Program Reg
  Pass, Exhibits Reg Pass\n\nSession Chairs: Ayesha Afzal (Friedrich-Alexan
 der University, Erlangen-Nuremberg; Erlangen National High Performance Com
 puting Center); Sally Ellingson (University of Kentucky); and Alan Sussman
  (University of Maryland)\n\n
END:VEVENT
END:VCALENDAR
