Presentation

Fast and Efficient Scaling For Microservices with SurgeGuard
Description
The microservice architecture is increasingly popular for flexible, large-scale online applications. However, existing resource management mechanisms incur high latency in detecting Quality-of-Service (QoS) violations and hence fail to allocate resources effectively under commonly observed varying load conditions. This results in over-allocation coupled with a late response, which increases both the total cost of ownership and the magnitude of each QoS violation event. We present SurgeGuard, a decentralized resource controller for microservice applications specifically designed to guard application QoS during surges in load and network latency. SurgeGuard uses the key insight that, for rapid detection and effective management of QoS violations, the controller must be aware of any available slack in latency and of the communication patterns between microservices within a task graph.
Our experiments show that for the workloads in DeathStarBench, SurgeGuard on average reduces the combined violation magnitude and duration by 61.1% and 93.7%, respectively, compared to the well-known Parties and Caladan algorithms.
Event Type
Paper
Time
Thursday, 21 November 2024, 1:30pm - 2pm EST
Location
B309
Tags
Cloud Computing
Fault-Tolerance, Reliability, Maintainability, and Adaptability
Resource Management
State of the Practice
Registration Categories
TP