
Presentation

Integrating Evolutionary Algorithms with Distributed Deep Learning for Optimizing Hyperparameters on HPC Systems
Description
High performance computing (HPC) systems have become essential for solving complex scientific problems, particularly in the context of deep learning (DL). This extended abstract presents a novel system that uses a multiobjective evolutionary algorithm (EA) to optimize hyperparameters for a deep learning model, AtomAI, minimizing both validation loss and energy use. We use the parallel and distributed computing capabilities of Dask and the scalable provenance features of FlowCept to measure CPU and GPU resource usage as proxies for energy consumption. Our approach focuses on integrating multiple software components to operate efficiently on large-scale HPC systems, specifically targeting the OLCF's Frontier supercomputer, but it should generalize to other HPC environments.
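A minimal sketch of the idea described above, not the authors' implementation: a multiobjective evolutionary loop whose candidate evaluations are dispatched in parallel through Dask. The hyperparameter names, the training/fitness function, and the energy proxy are placeholders; AtomAI- and FlowCept-specific calls are omitted, and on Frontier the LocalCluster would be replaced by a multi-node Dask deployment.

```python
import random
from dask.distributed import Client, LocalCluster

def evaluate(hp):
    """Hypothetical fitness function: train a model with hyperparameters `hp`
    and return (validation_loss, energy_proxy). In the system described above,
    this step would train AtomAI and read CPU/GPU usage captured via FlowCept."""
    # Placeholder objectives so the sketch runs end to end.
    val_loss = (hp["lr"] - 1e-3) ** 2 + 1.0 / hp["batch_size"]
    energy_proxy = hp["batch_size"] * hp["epochs"] * 0.01
    return val_loss, energy_proxy

def dominates(a, b):
    """Pareto dominance for two objective tuples (both objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def random_hp():
    return {"lr": 10 ** random.uniform(-4, -2),
            "batch_size": random.choice([16, 32, 64, 128]),
            "epochs": random.randint(5, 50)}

def mutate(hp):
    child = dict(hp)
    child["lr"] *= 10 ** random.uniform(-0.3, 0.3)
    child["batch_size"] = random.choice([16, 32, 64, 128])
    return child

if __name__ == "__main__":
    # A local cluster keeps the sketch self-contained; an HPC deployment would
    # use dask-mpi or dask-jobqueue to spread workers across compute nodes.
    client = Client(LocalCluster(n_workers=4))
    population = [random_hp() for _ in range(8)]
    for generation in range(5):
        # Evaluate all candidates in parallel on the Dask workers.
        futures = client.map(evaluate, population)
        fitness = client.gather(futures)
        # Keep the non-dominated (Pareto) individuals, then refill by mutation.
        pareto = [hp for hp, f in zip(population, fitness)
                  if not any(dominates(g, f) for g in fitness if g != f)]
        population = pareto + [mutate(random.choice(pareto))
                               for _ in range(len(population) - len(pareto))]
    client.close()
```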
Event Type
Workshop
Time
Monday, 18 November 2024, 2:15pm - 2:20pm EST
Location
B302
Tags
Applications and Application Frameworks
Distributed Computing
Middleware and System Software
Registration Categories
W