Presentation
PAL: A Variability-Aware Policy for Scheduling ML Workloads in GPU Clusters
Session
Scheduling
Description
Large-scale computing systems are increasingly using GPUs to enable peta- and exa-scale levels of compute to meet the needs of modern applications. Given the widespread and growing use of ML, including in scientific applications, optimizing clusters for ML workloads is important. However, recent work has demonstrated that accelerators in these clusters can suffer from performance variability, leading to resource under-utilization and load imbalance. In this work, we focus on how cluster schedulers can embrace performance variability to mitigate its effects. We design a novel cluster scheduler, PAL, which uses application-specific variability profiles to improve job performance and resource utilization. PAL also balances performance variability with locality. Overall, PAL significantly improves scheduling in GPU-rich clusters: across traces for six ML workloads with a variety of variability profiles, PAL improves geomean job completion time by 42% and cluster utilization by 28% over existing state-of-the-art schedulers.
Event Type
Paper
Time
Tuesday, 19 November 2024, 2pm - 2:30pm EST
Location
B308
Middleware and System Software
Programming Frameworks and System Software
Resource Management
TP
