BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20250626T234542Z
LOCATION:B312-B313A
DTSTART;TZID=America/New_York:20241119T103000
DTEND;TZID=America/New_York:20241119T110000
UID:submissions.supercomputing.org_SC24_sess496_gb102@linklings.com
SUMMARY:Democratizing AI: Open-Source Scalable LLM Training on GPU-Based S
 upercomputers
DESCRIPTION:Siddharth Singh, Prajwal Singhania, Aditya Ranjan, and John Ki
 rchenbauer (University of Maryland); Jonas Geiping (Max Planck Institute f
 or Intelligent Systems); Yuxin Wen, Neel Jain, Abhimanyu Hans, and Manli S
 hu (University of Maryland); Aditya Tomar (University of California, Berke
 ley); and Tom Goldstein and Abhinav Bhatele (University of Maryland)\n\nTr
 aining and fine-tuning large language models (LLMs) with hundreds of billi
 ons to trillions of parameters requires tens of thousands of GPUs and a h
 ighly scalable software stack. In this work, we present a novel four-dimen
 sional hybrid parallel algorithm implemented in a highly scalable, portabl
 e, open-source framework called AxoNN. We describe several performance opt
 imizations in AxoNN that improve matrix multiplication kernel performance a
 nd overlap non-blocking collectives with computation, as well as performanc
 e modeling to choose performance-optimal configurations.\n\nWhile the abili
 ties o
 f LLMs improve with the number of trainable parameters, so do privacy and 
 copyright risks caused by memorization of training data, which can lead t
 o disclosure of sensitive or private information at inference time. We hi
 ghli
 ght this side effect of scale through experiments that explore "catastroph
 ic memorization," where models are sufficiently large to memorize trainin
 g data in a single pass, and present an approach to prevent it.\n\nRegistr
 ation Category: Tech Program Reg Pass\n\nSession Chair: Barbara Chapman (H
 ewlett Packard Enterprise (HPE), Stony Brook University)\n\n
END:VEVENT
END:VCALENDAR
