Presentation

CUDASTF: Bridging the Gap Between CUDA and Task Parallelism
Description
Organizing computation as asynchronous tasks with data-driven dependencies is a simple and efficient model for single- and multi-GPU programs. Sequential Task Flow (STF) is one such model; it derives task graphs from data dependencies.
We propose CUDASTF, a C++ library that implements STF over CUDA APIs, fostering easy creation of scalable and composable algorithms. Users may easily elect to use CUDA graphs instead of streams if needed. Structured kernels spanning multiple devices can exercise fine-grained control of affinity.
Implementation-wise, CUDASTF makes a compelling argument for an event-based approach to asynchronous parallel libraries. We obtain up to a 1.8x improvement over the cusolverMg library on Cholesky decomposition. On a small weather simulation task, we demonstrate near-optimal scalability of our multi-GPU kernels; on a single GPU, CUDA graphs improve performance by up to 30%. Finally, we were able to author the first implementation of the CKKS Fully Homomorphic Encryption scheme over multiple devices.
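To make the STF idea in the description concrete, here is a minimal host-only C++ sketch of how a task graph can be derived from declared data accesses. It is a toy model, not the CUDASTF API: every name in it (`Access`, `DataUse`, `Task`, `infer_dependencies`, `example_flow`) is hypothetical, and it assumes only two access modes, read and write.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy model of Sequential Task Flow; NOT the CUDASTF API.
enum class Access { Read, Write };

struct DataUse {
    std::string data;  // name of a logical piece of data
    Access mode;       // how the task accesses it
};

struct Task {
    std::string name;
    std::vector<DataUse> uses;
};

using Edge = std::pair<std::size_t, std::size_t>;  // (predecessor, successor)

// Walk tasks in submission order and infer edges from data accesses alone:
// read-after-write, write-after-read and write-after-write.
std::set<Edge> infer_dependencies(const std::vector<Task>& tasks) {
    struct State {
        bool has_writer = false;
        std::size_t last_writer = 0;
        std::vector<std::size_t> readers_since_write;
    };
    std::map<std::string, State> state;
    std::set<Edge> edges;

    auto add_edge = [&edges](std::size_t pred, std::size_t succ) {
        if (pred == succ) return;  // a task never depends on itself
        assert(pred < succ);       // edges follow submission order, so no cycles
        edges.emplace(pred, succ);
    };

    for (std::size_t t = 0; t < tasks.size(); ++t) {
        for (const DataUse& u : tasks[t].uses) {
            State& s = state[u.data];
            if (u.mode == Access::Read) {
                // A reader waits for the most recent writer, if any.
                if (s.has_writer) add_edge(s.last_writer, t);
                s.readers_since_write.push_back(t);
            } else {
                if (s.readers_since_write.empty()) {
                    // No readers in between: wait for the previous writer.
                    if (s.has_writer) add_edge(s.last_writer, t);
                } else {
                    // Wait for every reader of the previous version.
                    for (std::size_t r : s.readers_since_write) add_edge(r, t);
                }
                s.has_writer = true;
                s.last_writer = t;
                s.readers_since_write.clear();
            }
        }
    }
    return edges;
}

// Illustrative task flow, written as if it were sequential code.
std::vector<Task> example_flow() {
    return {
        {"init_X",   {{"X", Access::Write}}},
        {"init_Y",   {{"Y", Access::Write}}},
        {"scale",    {{"X", Access::Read}, {"Z", Access::Write}}},
        {"add",      {{"X", Access::Read}, {"Y", Access::Read}, {"W", Access::Write}}},
        {"update_X", {{"X", Access::Write}}},
    };
}
```

In this example, `scale` and `add` both only read `X`, so no edge is inferred between them and a runtime would be free to run them concurrently, while `update_X` must wait for both. A real implementation would additionally map the inferred edges onto asynchronous mechanisms such as events, streams, or graph nodes.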
Event Type
Paper
Time
Wednesday, 20 November 2024, 10:30am - 11am EST
Location
B309
Tags
Distributed Computing
Heterogeneous Computing
Programming Frameworks and System Software
Runtime Systems
Task Parallelism
Registration Categories
TP
Award Finalists
Best Paper Finalist