BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20250626T233527Z
LOCATION:B302-B305
DTSTART;TZID=America/New_York:20241120T100000
DTEND;TZID=America/New_York:20241120T170000
UID:submissions.supercomputing.org_SC24_sess533_post116@linklings.com
SUMMARY:Eve: Less Memory, Same Might
DESCRIPTION:Aditya Tomar (University of California, Berkeley) and Siddhart
 h Singh, Tom Goldstein, and Abhinav Bhatele (University of Maryland)\n\nAd
 aptive optimizers, which adjust the learning rate for individual parameter
 s, have become the standard for training deep neural networks. AdamW is a 
 popular adaptive method that maintains two optimizer state values (momentu
 m and variance) per parameter, doubling the model’s memory usage during tr
 aining. Many proposed memory-efficient optimizers claim to match AdamW’s
  performance but lack its desirable qualities, such as robustness to
  learning
  rate changes. This quality is especially desirable when pre-training LLMs
 , where experimenting with different hyperparameters is infeasible. We pro
 pose Eve, a Memory Efficient AdaptiVe Moment Estimation algorithm that sav
 es memory by reducing the variance term while also preserving AdamW’s desi
 rable properties across different training settings. We fine-tune Llama 2 
 70B on 64 GPUs and show memory savings of 20% compared to AdamW. We also c
 ompare our method to a recent well-received memory-efficient optimizer
  called Adam-mini and demonstrate better training stability across
  various lea
 rning rates.\n\nRegistration Category: Tech Program Reg Pass, Exhibits Reg
  Pass\n\nSession Chairs: Ayesha Afzal (Friedrich-Alexander University, Erl
 angen-Nuremberg; Erlangen National High Performance Computing Center); Sal
 ly Ellingson (University of Kentucky); and Alan Sussman (University of Mar
 yland)\n\n
END:VEVENT
END:VCALENDAR
