
Scaling New Heights: Transformative Cross-GPU Sampling for Training Billion-Edge Graphs
Description
Training GNNs on billion-edge graphs faces significant memory and data-transfer bottlenecks, especially with GPU-based sampling. Traditional methods are constrained either by CPU-GPU data transfer or by heavy data shuffling and synchronization overheads in multi-GPU setups.
To overcome these challenges in GNN training on large-scale graphs, we introduce HyDRA, a pioneering framework that elevates mini-batch, sampling-based training. HyDRA transforms cross-GPU sampling by seamlessly integrating sampling and data transfer into a single kernel. It employs a hybrid pointer-driven technique for efficient neighbor retrieval, utilizes targeted replication for high-degree vertices to cut communication overhead, and adopts dynamic cross-batch data orchestration with pipelining to decrease redundant transfers. Tested on systems with up to 64 A100 GPUs, HyDRA significantly surpasses existing methods, delivering 1.4x to 5.3x faster training than DSP and DGL-UVA and up to a 42x improvement in multi-GPU scalability. This establishes HyDRA as a benchmark for high-performance, large-scale GNN training.
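For illustration, the sketch below mimics the targeted-replication idea from the abstract on the CPU with NumPy: neighbor lists are hash-partitioned across GPUs, except that high-degree ("hot") vertices are replicated on every GPU so sampling their neighbors never requires a cross-GPU lookup. All names (partition_with_replication, degree_threshold, etc.) and the CSR layout are assumptions made for this sketch, not HyDRA's actual API.

```python
import numpy as np

# Toy CSR graph: indptr/indices encode each vertex's neighbor list.
# Edges: 0->{1,2,3}, 1->{0}, 2->{0}, 3->{0,1,2}
indptr = np.array([0, 3, 4, 5, 8])
indices = np.array([1, 2, 3, 0, 0, 0, 1, 2])

def partition_with_replication(indptr, indices, num_gpus, degree_threshold):
    """Hash-partition neighbor lists across GPUs, but replicate the lists
    of high-degree vertices on every GPU so sampling them needs no
    cross-GPU communication (the 'targeted replication' idea)."""
    parts = [dict() for _ in range(num_gpus)]
    for v in range(len(indptr) - 1):
        neigh = indices[indptr[v]:indptr[v + 1]]
        if len(neigh) >= degree_threshold:
            for p in parts:                    # replicate hot vertex everywhere
                p[v] = neigh
        else:
            parts[v % num_gpus][v] = neigh     # single owner for cold vertex
    return parts

def sample_neighbors(part, v, fanout, rng):
    """Uniformly sample up to `fanout` neighbors of v from one partition."""
    neigh = part[v]
    return rng.choice(neigh, size=min(fanout, len(neigh)), replace=False)

rng = np.random.default_rng(0)
parts = partition_with_replication(indptr, indices, num_gpus=2, degree_threshold=3)
# Vertices 0 and 3 (degree 3) appear in both partitions; 1 and 2 in only one.
print(sample_neighbors(parts[0], 0, fanout=2, rng=rng))
```

In a real multi-GPU setting the per-partition dictionaries would live in device memory and the sampling loop would run inside a GPU kernel fused with the feature transfer; the sketch only shows why replicating hot vertices removes the cross-partition lookups that dominate communication cost.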
Event Type
Paper
Time
Wednesday, 20 November 2024, 2pm - 2:30pm EST
Location
B308
Tags
Accelerators
Applications and Application Frameworks
Artificial Intelligence/Machine Learning
Distributed Computing
Graph Algorithms
Heterogeneous Computing
Tensors
Registration Categories
TP