Presentation
A Comparative Survey: Reusing Small Pre-Trained Models for Efficient Large Model Training
Session: AI4S: 5th Workshop on Artificial Intelligence and Machine Learning for Scientific Applications
Description: Training large language models is becoming increasingly complex as model sizes grow rapidly, driving up computational costs. To address this challenge, various model growth methodologies have been proposed that leverage smaller pre-trained models to incrementally build larger models and reduce computational requirements. These methods typically map parameters from small models to large ones using either static functions or learned mappings. Although these approaches have demonstrated effectiveness, the literature lacks a comprehensive comparative evaluation, and combining different methodologies could yield superior performance. This study provides a uniform evaluation of multiple state-of-the-art model growth techniques and their combinations, revealing that efficient combination techniques can reduce the training cost (in TFLOPs) of individual methods by up to 80%.
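To illustrate what a static parameter mapping for model growth can look like (the specific techniques evaluated in this work are not detailed in the abstract), below is a minimal, hypothetical sketch of Net2Net-style function-preserving width expansion: existing hidden units are duplicated, and their outgoing weights are split among the copies so the widened network computes the same function as the original. The function name and array shapes are illustrative assumptions, not part of the presented study.

```python
import numpy as np


def grow_width(W1, b1, W2, new_width, rng=None):
    """Illustrative Net2Net-style width growth for a two-layer MLP.

    W1: (in_dim, old_width)  first-layer weights
    b1: (old_width,)         first-layer bias
    W2: (old_width, out_dim) second-layer weights
    Returns widened (W1', b1', W2') with `new_width` hidden units whose
    forward function matches the original network (for elementwise activations).
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    old_width = W1.shape[1]
    assert new_width >= old_width

    # Pick which existing units to duplicate for the extra columns.
    extra = rng.integers(0, old_width, size=new_width - old_width)
    mapping = np.concatenate([np.arange(old_width), extra])

    # Copy incoming weights and biases of the selected units.
    W1_new = W1[:, mapping]
    b1_new = b1[mapping]

    # Split outgoing weights among duplicates so the output is unchanged.
    counts = np.bincount(mapping, minlength=old_width)
    W2_new = W2[mapping, :] / counts[mapping][:, None]
    return W1_new, b1_new, W2_new
```

Learned-mapping approaches replace the fixed duplication rule above with a trained operator that predicts the large model's parameters from the small model's, at the cost of an extra optimization step.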