Profiling Communication Overhead in 3D Parallel Pretraining of Large Language Models
Description
Training large language models (LLMs) efficiently requires addressing the communication overhead introduced by parallelism strategies such as Tensor, Pipeline, and Data Parallelism. This work profiles the communication patterns of LLM pretraining on the Polaris supercomputer, highlighting the impact of Tensor Parallelism, which incurs significant overhead as the degree of parallelism grows. To mitigate this, we apply hZCCL, a homomorphic compression technique that reduces communication costs by eliminating decompression-operation-compression cycles. Our results show that hZCCL accelerates training, achieving speedups of up to 6.77× in multi-threaded mode while maintaining data accuracy. These improvements enable more efficient scaling of LLM pretraining across distributed nodes.
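
The key property being exploited is that the reduction operation can be applied directly to compressed buffers, so intermediate data never needs to be decompressed and recompressed at each step of a collective. The short sketch below illustrates the idea with a hypothetical fixed-point quantizer that happens to be homomorphic under addition; it is not hZCCL's actual compression scheme, and the function names and the SCALE parameter are illustrative assumptions only.

    # Toy illustration (not hZCCL's algorithm): a compressor that is homomorphic
    # under addition lets a reduction operate on compressed buffers directly,
    # avoiding the decompress -> add -> recompress cycle at every step.
    import numpy as np

    SCALE = 1e-4  # hypothetical fixed quantization step shared by all ranks

    def compress(x: np.ndarray) -> np.ndarray:
        """Fixed-point quantization: float gradients -> int32 codes."""
        return np.round(x / SCALE).astype(np.int32)

    def decompress(c: np.ndarray) -> np.ndarray:
        """Map integer codes back to floating-point values."""
        return c.astype(np.float64) * SCALE

    def reduce_conventional(compressed_bufs):
        """Baseline: decompress each buffer, add, then recompress the sum."""
        acc = decompress(compressed_bufs[0])
        for c in compressed_bufs[1:]:
            acc = acc + decompress(c)   # decompress-operate ...
        return compress(acc)            # ... recompress

    def reduce_homomorphic(compressed_bufs):
        """Homomorphic path: sum the integer codes directly, no round trips."""
        acc = compressed_bufs[0].astype(np.int64)
        for c in compressed_bufs[1:]:
            acc = acc + c               # operate on compressed data
        return acc

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        grads = [rng.standard_normal(8) * 0.01 for _ in range(4)]  # 4 simulated ranks
        bufs = [compress(g) for g in grads]
        print("reference sum:      ", sum(grads))
        print("conventional reduce:", decompress(reduce_conventional(bufs)))
        print("homomorphic reduce: ", decompress(reduce_homomorphic(bufs)))

In a real allreduce, skipping the per-step round trips removes the (de)compression work from the critical path of every reduction hop, which is where the reported multi-threaded speedups come from.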

Event Type
ACM Student Research Competition: Graduate Poster
ACM Student Research Competition: Undergraduate Poster
Doctoral Showcase
Posters
Time
Tuesday, 19 November 2024, 12pm - 5pm EST
Location
B302-B305
