
Presentation

cuSZp2: A GPU Lossy Compressor with Extreme Throughput and Optimized Compression Ratio
Description
Existing GPU lossy compressors suffer from expensive data-movement overheads, inefficient memory access patterns, and high synchronization latency, which limits their throughput. This work proposes cuSZp2, a generic, single-kernel, error-bounded lossy compressor that runs entirely on GPUs. It is designed for applications that require high speed, such as large-scale GPU simulation and large language model training. In particular, cuSZp2 introduces a novel lossless encoding method, optimizes memory access patterns, and hides synchronization latency, achieving extreme end-to-end throughput and an optimized compression ratio. Experiments on an NVIDIA A100 GPU with nine real-world HPC datasets demonstrate that, even with higher compression ratios and data quality, cuSZp2 delivers on average 332.42 GB/s and 513.04 GB/s end-to-end throughput for compression and decompression, respectively. This is around 2× the throughput of existing pure-GPU compressors and 200× that of CPU-GPU hybrid compressors.
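"Error-bounded" means every reconstructed value stays within a user-chosen bound eb of the original. The sketch below illustrates only that generic guarantee using uniform quantization into bins of width 2·eb; it is not cuSZp2's kernel, and the helper names `quantize` and `dequantize` are hypothetical. The resulting integer codes are the kind of data a subsequent lossless encoding stage would compress.

```python
def quantize(values, eb):
    """Map each float to the integer index of a bin of width 2*eb (hypothetical helper)."""
    if eb <= 0:
        raise ValueError("error bound must be positive")
    # round() places each value in its nearest bin, so |v/(2*eb) - q| <= 0.5
    return [round(v / (2 * eb)) for v in values]


def dequantize(codes, eb):
    """Reconstruct values from bin indices; each is within eb of the original (up to float rounding)."""
    return [q * 2 * eb for q in codes]


if __name__ == "__main__":
    data = [0.0, 0.13, -0.27, 1.005, 3.14159]
    eb = 0.01
    codes = quantize(data, eb)
    recon = dequantize(codes, eb)
    for original, restored in zip(data, recon):
        print(f"{original:+.5f} -> {restored:+.5f} (error {abs(original - restored):.5f})")
```

Because quantization is applied to each value independently, this step maps naturally onto one GPU thread per element; the harder parts the abstract highlights (lossless encoding, memory access, synchronization) come after it.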
Event Type
Paper
Time
Tuesday, 19 November 2024, 11:30am - 12pm EST
Location
B308
Tags
Accelerators
Algorithms
Data Compression
I/O, Storage, Archive
Performance Optimization
Registration Categories
TP
Award Finalists
Best Student Paper Finalist