BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20250626T234544Z
LOCATION:B314
DTSTART;TZID=America/New_York:20241118T153000
DTEND;TZID=America/New_York:20241118T160000
UID:submissions.supercomputing.org_SC24_sess758_misc387@linklings.com
SUMMARY:Machine Learning-Guided Memory Optimization for DLRM Inference on 
 Tiered Memory
DESCRIPTION:Jie Ren (William & Mary)\n\nDeep Learning Recommendation Model
 s (DLRMs) are widely deployed in industry, demanding memory capacities at 
 the terabyte scale. Tiered memory architectures offer a cost-effective sol
 ution but introduce complexities in embedding-vector placement due to intr
 icate access patterns. In this talk, we introduce RecMG, a machine learnin
 g (ML)-guided system for vector caching and prefetching in tiered memory e
 nvironments. RecMG tackles the unique challenges of data labeling and navi
 gates the vast search space for embedding-vector placement, making ML prac
 tically feasible for DLRM inference. By leveraging separate ML models for 
 caching and prefetching, along with a novel differentiable loss function, 
 RecMG dramatically narrows the prefetching search space and minimizes on-d
 emand fetches. RecMG effectively reduces end-to-end DLRM inference time by
  up to 43% in industrial-scale DLRM inference scenarios.\n\nTag: Artificial
  Intelligence/Machine Learning, Codesign\n\nRegistration Category: Worksho
 p Reg Pass\n\nSession Chairs: John Feo (Pacific Northwest National Laborat
 ory (PNNL)), Jiyuan Zhang (Meta), and Amelie Chi Zhou (Hong Kong Baptist U
 niversity)\n\n
END:VEVENT
END:VCALENDAR
