Presentation
Architecting and deploying compute clusters for large language models
Description
As large language models and their processing needs continue to grow, compute infrastructure must adapt to handle them reliably. In particular, beyond providing a large number of processing units, the platform needs to offer guarantees on fabric and I/O, as well as software strategies to schedule jobs and cache data reliably. In this work, we will show how strategic choices in reference design definitions, combined with versatile scheduling, checkpointing, and validation strategies, can help get the best performance from the infrastructure. We will also review how scaling up to extreme scale impacts hardware and software implementation choices for LLMs.
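To make the checkpointing strategy mentioned above concrete, here is a minimal sketch of periodic, crash-safe checkpointing for a long-running training job. All names (`save_checkpoint`, `train`, the `interval` parameter) are illustrative assumptions, not the presenters' implementation; the key idea shown is atomic writes via temp-file-plus-rename, so a node failure mid-write never corrupts the last good checkpoint.

```python
import os
import pickle
import tempfile

def save_checkpoint(state, path):
    """Write a checkpoint atomically: write to a temp file, then rename.
    An interrupted write never corrupts the previous checkpoint."""
    dirname = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=dirname)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, path)  # atomic rename on POSIX filesystems

def load_checkpoint(path):
    """Resume from the last complete checkpoint, or start fresh."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return {"step": 0}

def train(total_steps, interval, ckpt="model.ckpt"):
    """Hypothetical training loop: checkpoint every `interval` steps,
    so a failure loses at most `interval` steps of work."""
    state = load_checkpoint(ckpt)
    for step in range(state["step"], total_steps):
        state["step"] = step + 1  # stand-in for a real training step
        if state["step"] % interval == 0:
            save_checkpoint(state, ckpt)
    return state["step"]
```

At cluster scale, the checkpoint interval becomes a tuning knob: shorter intervals bound lost work after a failure, while longer intervals reduce pressure on the shared filesystem, which is one of the I/O guarantees the abstract alludes to.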
Presenter
Event Type
Workshop
Time
Friday, 22 November 2024, 8:40am - 9:20am EST
Location
B309
Debugging and Correctness Tools
Hardware Technologies
Resource Management
State of the Practice