Presentation
Machine Learning-Guided Memory Optimization for DLRM Inference on Tiered Memory
Description
Deep Learning Recommendation Models (DLRMs) are widely deployed in industry, demanding memory capacities at the terabyte scale. Tiered memory architectures offer a cost-effective solution but complicate embedding-vector placement due to intricate access patterns. In this talk, we introduce RecMG, a machine learning (ML)-guided system for vector caching and prefetching in tiered memory environments. RecMG tackles the unique challenges of data labeling and navigates the vast search space for embedding-vector placement, making ML practically feasible for DLRM inference. By leveraging separate ML models for caching and prefetching, along with a novel differentiable loss function, RecMG dramatically narrows the prefetching search space and minimizes on-demand fetches. RecMG reduces end-to-end DLRM inference time by up to 43% in industrial-scale inference scenarios.
Presenter