BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20250626T233528Z
LOCATION:B302-B305
DTSTART;TZID=America/New_York:20241120T100000
DTEND;TZID=America/New_York:20241120T170000
UID:submissions.supercomputing.org_SC24_sess533_post198@linklings.com
SUMMARY:Profiling Communication Overhead in 3D Parallel Pretrain of Large 
 Language Models
DESCRIPTION:Weijin Liu and Xiaodong Yu (Stevens Institute of Technology)\n
 \nTraining large language models (LLMs) efficiently requires addressing
  the communication overhead introduced by parallelism strategies like
  Tensor\, Pipeline\, and Data Parallelism. This work profiles the
  communication patterns in LLM pretraining using the Polaris
  supercomputer\, highlighting the impact of Tensor Parallelism\, which
  suffers from significant overhead as parallelism scales. To mitigate
  this\, we apply hZCCL\, a homomorphic compression technique that reduces
  communication costs by eliminating decompression-operation-compression
  cycles. Our results show hZCCL accelerates training\, achieving up to
  6.77× speedup in multi-threaded modes while maintaining data accuracy.
  These improvements allow for more efficient scaling of LLM pretraining
  across distributed nodes.\n\nRegistration Category: Tech Program Reg
  Pass\, Exhibits Reg Pass\n\nSession Chairs: Ayesha Afzal
  (Friedrich-Alexander University\, Erlangen-Nuremberg\; Erlangen
  National High Performance Computing Center)\; Sally Ellingson
  (University of Kentucky)\; and Alan Sussman (University of Maryland)\n\n
END:VEVENT
END:VCALENDAR
