Presentation
PULSE: Using Mixed-Quality Models for Reducing Serverless Keep-Alive Cost
Description
This paper addresses a key challenge in using serverless computing for machine learning (ML) inference: cold starts, which occur on initial invocations and after periods of container inactivity. Cloud providers have adopted fixed keep-alive policies, such as the common 10-minute strategy, to alleviate cold starts. However, the large size of ML models makes keeping them in memory expensive, raising keep-alive costs and potentially straining system resources. In response, we introduce PULSE, a dynamic keep-alive mechanism that uses ML model variants of differing quality to balance keep-alive cost, accuracy, and service time while avoiding peaks in keep-alive memory consumption. Our evaluation, using real-world serverless workloads and commonly used ML models, demonstrates reduced keep-alive costs compared to the fixed policy. Additionally, we observe that integrating PULSE improves the performance of existing state-of-the-art serverless function warm-up strategies.