
Presentation

llm-recipes: A Framework for Seamless Integration and Efficient Continual Pre-Training of Large Language Models
Description
Large Language Models (LLMs) have advanced natural language processing but remain challenging to train due to their complexity and resource demands. A significant issue is the inconsistency in checkpoint formats across pre-training libraries, which complicates the use of pre-trained weights for continued training. To address this, we introduce llm-recipes, an open-source framework that streamlines continual pre-training by enabling direct use of Hugging Face Transformers checkpoints without conversion. The framework supports multi-node distributed training using PyTorch Fully Sharded Data Parallel (FSDP), enhancing scalability for large-scale models. Unlike existing tools, llm-recipes offers broader support for various model architectures and flexible training configurations, making it an adaptable solution for researchers and developers. Our experiments demonstrate effective scalability, with high training throughput on up to 64 GPUs, confirming its suitability for large-scale distributed training of LLMs.
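The core idea described above, loading a Hugging Face Transformers checkpoint directly and sharding it with PyTorch FSDP for distributed continual pre-training, can be sketched as follows. This is a minimal illustration under assumed conditions (a Llama-style checkpoint name and a torchrun launch with one process per GPU), not the llm-recipes API itself.

```python
import functools

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy
from transformers import AutoModelForCausalLM
from transformers.models.llama.modeling_llama import LlamaDecoderLayer


def main():
    # One process per GPU, launched e.g. via `torchrun --nproc_per_node=8 train.py`.
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    # Load a Hugging Face Transformers checkpoint directly -- no format conversion step.
    # The checkpoint name is a hypothetical placeholder.
    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-2-7b-hf",
        torch_dtype=torch.bfloat16,
    )

    # Shard parameters, gradients, and optimizer state across ranks with FSDP,
    # wrapping each transformer decoder layer as its own FSDP unit.
    wrap_policy = functools.partial(
        transformer_auto_wrap_policy,
        transformer_layer_cls={LlamaDecoderLayer},
    )
    model = FSDP(
        model,
        auto_wrap_policy=wrap_policy,
        device_id=torch.cuda.current_device(),
    )

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
    # ... continual pre-training loop over the new corpus would go here ...


if __name__ == "__main__":
    main()
```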
Event Type
Workshop
Time
Friday, 22 November 2024, 9:45am - 10am EST
Location
B206
Tags
Artificial Intelligence/Machine Learning
Registration Categories
W