BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20250626T233532Z
LOCATION:B302-B305
DTSTART;TZID=America/New_York:20241121T100000
DTEND;TZID=America/New_York:20241121T170000
UID:submissions.supercomputing.org_SC24_sess534_post266@linklings.com
SUMMARY:Improvement of Bridges-2 Resource Utilization Through User Optimiz
 ation
DESCRIPTION:Walter Ashworth (University of Tennessee, Knoxville); Julian U
 ran (Pittsburgh Supercomputing Center (PSC), Carnegie Mellon University); 
 Michela Taufer (University of Tennessee, Knoxville); and Paola Buitrago (P
 ittsburgh Supercomputing Center (PSC), Carnegie Mellon University)\n\nThis
  poster presents our two-phase solution for improving GPU utilization in N
 SF-funded ACCESS high-performance computing (HPC) clusters, with a pilot i
 mplementation on Pittsburgh Supercomputing Center’s Bridges-2. Our approac
 h addresses the limitations of Open XDMoD, which lacks per-job GPU usage m
 onitoring and experiences delays in data availability. In phase one, we de
 velop a data ingestion layer to collect GPU indices and resource usage dat
 a, utilizing existing software tools for efficient data aggregation and an
 alysis. Analyzing 5,717 completed GPU jobs revealed issues such as workflo
 w configuration errors, framework misconfigurations, and low GPU utilizati
 on. In phase two we create a user-facing platform with modern web tools. T
 his platform will automatically detect inefficiencies, notify users via em
 ail, and provide actionable insights to optimize resource management. By a
 ddressing these issues and integrating real-time data presentation, we aim
  to enhance overall system utilization, reduce GPU job wait times, and ena
 ble more efficient use of existing resources.\n\nRegistration Category: Te
 ch Program Reg Pass, Exhibits Reg Pass\n\nSession Chairs: Ayesha Afzal (Fr
 iedrich-Alexander University, Erlangen-Nuremberg; Erlangen National High P
 erformance Computing Center); Sally Ellingson (University of Kentucky); an
 d Alan Sussman (University of Maryland)\n\n
END:VEVENT
END:VCALENDAR
