Close

Presentation

ParvaGPU: Efficient Spatial GPU Sharing for Large-Scale DNN Inference in Cloud Environments
DescriptionIn cloud environments, GPU-based deep neural network (DNN) inference servers are required to meet the Service Level Objective (SLO) latency for each workload under a specified request rate, while also minimizing GPU resource consumption. However, previous studies have not fully achieved this objective. In this paper, we propose ParvaGPU, a technology that facilitates spatial GPU sharing for large-scale DNN inference in cloud computing. ParvaGPU integrates NVIDIA's Multi-Instance GPU (MIG) and Multi-Process Service (MPS) technologies to enhance GPU utilization, with the goal of meeting the diverse SLOs of each workload and reducing overall GPU usage. Specifically, ParvaGPU addresses the challenges of minimizing underutilization within allocated GPU space partitions and external fragmentation in combined MIG and MPS environments. We conducted our assessment on multiple A100 GPUs, evaluating 11 diverse DNN workloads with varying SLOs. Our evaluation revealed no SLO violations and a significant reduction in GPU usage compared to state-of-the-art frameworks.
Event Type
Paper
TimeWednesday, 20 November 202411:30am - 12pm EST
LocationB308
Tags
Accelerators
Artificial Intelligence/Machine Learning
Cloud Computing
Distributed Computing
Heterogeneous Computing
Performance Optimization
Registration Categories
TP