

RecFlex: Enabling Feature Heterogeneity-Aware Optimization for Deep Recommendation Models with Flexible Schedules
Description
Industrial recommendation models typically involve numerous feature fields. Embedding computation workloads are heterogeneous across these fields and thus require different optimal code schedules. While existing solutions apply basic fusion optimizations to embedding operations, they inefficiently apply an identical schedule to all feature fields, leading to suboptimal performance. In this paper, we introduce RecFlex, which generates fused kernels with distinct schedules for different feature fields. RecFlex employs an interference-aware schedule tuner to tune schedules and a heterogeneous schedule fusion compiler to generate fused kernels, addressing two major challenges. To determine the optimal schedule of each feature field within the fused kernel, RecFlex proposes a two-stage interference-simulated tuning strategy. To handle dynamic workloads that challenge tuning and fusion, RecFlex combines compile-time schedule tuning with runtime kernel thread mapping. RecFlex surpasses state-of-the-art libraries and compilers, achieving average speedups of 2.64×, 20.77×, and 11.31× over TorchRec, HugeCTR, and RECom, respectively. RecFlex is publicly available at https://github.com/PanZaifeng/RecFlex.
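The core idea of the abstract can be sketched in a few lines: each feature field gets its own tuned schedule, and the fused kernel's thread blocks are then mapped to fields at runtime. The sketch below is purely illustrative and not RecFlex's actual API; the field names, the cost model, and the use of a thread-block size as the entire "schedule" are all simplifying assumptions.

```python
# Illustrative sketch (NOT RecFlex's actual API): pick a distinct
# "schedule" (here just a thread-block size) per feature field, then
# assign contiguous thread-block ranges of one fused kernel to fields.

# Hypothetical per-field workloads: (rows to gather, embedding dim)
FIELDS = {"user_id": (1 << 20, 16), "item_id": (1 << 18, 64), "tags": (1 << 10, 128)}

CANDIDATE_BLOCK_SIZES = [64, 128, 256]
LAUNCH_OVERHEAD = 32  # toy per-block overhead, in thread-cycles

def standalone_cost(rows, dim, block):
    """Toy cost model: thread-cycles launched plus per-block overhead."""
    rows_per_block = max(block // dim, 1)
    blocks = -(-rows // rows_per_block)  # ceiling division
    return blocks * (block + LAUNCH_OVERHEAD)

def tune_per_field():
    """Stage 1 analogue: pick the best schedule for each field in
    isolation (the real system additionally simulates interference
    between co-resident schedules in a second stage)."""
    return {f: min(CANDIDATE_BLOCK_SIZES, key=lambda b: standalone_cost(r, d, b))
            for f, (r, d) in FIELDS.items()}

def build_block_map(schedules):
    """Runtime analogue: map a contiguous range of the fused kernel's
    thread blocks to each field, sized by that field's schedule."""
    mapping, start = {}, 0
    for f, (rows, dim) in FIELDS.items():
        block = schedules[f]
        rows_per_block = max(block // dim, 1)
        nblocks = -(-rows // rows_per_block)
        mapping[f] = (start, start + nblocks)
        start += nblocks
    return mapping
```

Under this toy cost model, wide fields with many rows favor larger blocks while small fields favor smaller ones, which is the heterogeneity the paper exploits; a single shared block size would waste threads on one field or the other.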
Event Type
Paper
Time
Wednesday, 20 November 2024, 11am - 11:30am EST
Location
B308
Tags
Accelerators
Artificial Intelligence/Machine Learning
Cloud Computing
Distributed Computing
Heterogeneous Computing
Performance Optimization
Registration Categories
TP