BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20250626T234541Z
LOCATION:B314
DTSTART;TZID=America/New_York:20241118T143000
DTEND;TZID=America/New_York:20241118T150000
UID:submissions.supercomputing.org_SC24_sess758_misc386@linklings.com
SUMMARY:TurboMoE: Enhancing MoE Training with Optimized Gating and Efficie
 nt Parallelization
DESCRIPTION:Reza Yazdani (Snowflake)\n\nThe Mixture of Experts (MoE) model
  has emerged as a scalable solution for large-scale machine learning
  tasks, thanks to its dynamic expert selection. However, the gating
  mechanism that controls this selection, together with all-to-all
  collectives, can create significant computation and communication
  bottlenecks. In this talk, we present TurboMoE, a novel approach to
  accelerating MoE model training. TurboMoE employs kernel-fusion and
  data-layout transformations to streamline the gating process, along
  with a new parallelization layout that minimizes communication
  overhead. We also present a re-engineered MoE architecture, employed
  in Snowflake’s Arctic, that overlaps communication with parallel
  computation, leading to a more efficient training process.\n\nTag:
  Artificial Intelligence/Machine Learning, Codesign\n\nRegistration
  Category: Workshop Reg Pass\n\nSession Chairs: John Feo (Pacific
  Northwest National Laboratory (PNNL)), Jiyuan Zhang (Meta), and Amelie
  Chi Zhou (Hong Kong Baptist University)\n\n
END:VEVENT
END:VCALENDAR
