Granularity and Interference-Aware GPU Sharing with MPS
Description
With the advent of exascale computing, GPU acceleration has become central to supercomputer performance. Even at this extreme scale, however, most scientific applications and HPC-scale DNN workloads underutilize GPU resources. Existing GPU sharing mechanisms can raise utilization, throughput, and energy efficiency, but naively co-scheduling workflows often yields suboptimal results: placing multiple high-utilization workloads on the same set of GPUs, for example, degrades performance due to heavy resource contention. In short, GPU sharing must be granularity- and interference-aware to maximize its benefit. We propose a scheduling approach that optimizes workflow scheduling configurations for target system metrics (throughput and energy efficiency), uses workload profiling data to right-size GPU resources for combinations of HPC workflows, and collocates workflows using existing concurrency mechanisms such as MPS. We show that choosing the right arrangement of workflows to collocate can increase throughput by as much as 2x and energy efficiency by 1.6x.
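To make the mechanism concrete, the sketch below shows one way such collocation can be expressed with CUDA MPS: each client process is launched with a cap on the fraction of the GPU's SMs it may occupy, with the split chosen from profiling data. This is an illustrative sketch only, not the scheduler presented in the talk; the workload scripts (lammps_run.sh, dnn_train.sh) and the 70/30 split are hypothetical placeholders.

```python
"""
Illustrative sketch: collocating two HPC workloads on one GPU under CUDA MPS,
with per-client SM caps derived from (hypothetical) profiling data.
Assumes the MPS control daemon is already running, e.g.:
    nvidia-cuda-mps-control -d
"""
import os
import subprocess

# Hypothetical profiling result: percentage of SMs each workload needs to
# stay close to its standalone performance (placeholder values).
PROFILE = {
    "lammps_run.sh": 70,  # compute-heavy simulation (placeholder command)
    "dnn_train.sh": 30,   # GPU-underutilizing DNN training job (placeholder command)
}


def launch_under_mps(cmd, thread_pct):
    """Launch one MPS client capped at thread_pct% of the GPU's SMs."""
    env = os.environ.copy()
    # Per-client execution-resource limit honored by the Volta+ MPS server.
    env["CUDA_MPS_ACTIVE_THREAD_PERCENTAGE"] = str(thread_pct)
    return subprocess.Popen(["bash", cmd], env=env)


if __name__ == "__main__":
    procs = [launch_under_mps(cmd, pct) for cmd, pct in PROFILE.items()]
    for p in procs:
        p.wait()
```

In this style of setup, the per-client caps are what make the sharing granularity-aware: a profiled, low-utilization workload can be packed alongside a compute-heavy one without letting either monopolize the SMs, which is the kind of contention the interference-aware placement in the talk is meant to avoid.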