Presentation

Cross-HPO: Optimizing Neural Networks for Cancer Drug Response Using Hyperparameter Tuning on Multiple Pharmacogenomic Datasets
Description: Predicting and comparing anti-cancer drug responses across datasets with deep learning models is a challenging modern problem. In this study, we optimized hyperparameters for several novel neural network-based models, including GraphDRP [1], IGTD [2], PaccMann [3], PathDSP [4], and HiDRA [5], as well as a machine learning model, LGBM (built with LightGBM), across multiple public pharmacogenomic datasets: CCLE [6], CTRPv2 [7], gCSI [8], GDSCv1 [9], and GDSCv2 [10]. Our primary objective was to improve prediction performance and robustness through hyperparameter optimization (HPO) tailored to each dataset. We have published the HPO results on GitHub for the research community and have begun a cross-analysis of these HPO runs. The results are a first effort in the cross-model, cross-dataset HPO analysis (termed “Cross-HPO”) that these resources now make possible. This work will strengthen the evaluation of drug discovery candidates and increase discovery success.
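For concreteness, the study design reduces to a grid of model-dataset HPO runs. Below is a minimal Python sketch of that grid; run_hpo is a hypothetical placeholder standing in for the actual Supervisor-managed workflows, not the published driver.

# Minimal sketch of the Cross-HPO study grid; run_hpo is a hypothetical
# stand-in for the Supervisor-managed HPO workflow, not the real driver.
MODELS = ["GraphDRP", "IGTD", "PaccMann_MCA", "PathDSP", "HiDRA", "LGBM"]
DATASETS = ["CCLE", "CTRPv2", "gCSI", "GDSCv1", "GDSCv2"]

def run_hpo(model: str, dataset: str) -> dict:
    """Placeholder: launch one model-dataset HPO run and return its
    best hyperparameters and best validation loss."""
    return {"model": model, "dataset": dataset, "best_val_loss": None}

results = {(m, d): run_hpo(m, d) for m in MODELS for d in DATASETS}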
Background: The Innovative Methodologies and New Data for Predictive Oncology Model Evaluation (IMPROVE) project provides a robust framework for this research. We use the Supervisor [11] hyperparameter optimization framework to run the models on ALCF Polaris, a powerful supercomputer with over 2,000 NVIDIA A100 GPUs [12].
Finding 1: Dataset-specific HPO: Our findings, illustrated in Fig. 1 for the GraphDRP and IGTD models, emphasize the importance of dataset-specific hyperparameter tuning. Tailoring hyperparameters to individual datasets yielded significant performance improvements over a uniform, one-size-fits-all configuration, highlighting the inherent complexity and variability of pharmacogenomic data. As shown in Fig. 2, IGTD-gCSI and PaccMann_MCA-gCSI exhibit improved performance, with validation loss decreasing over iterations. In contrast, PathDSP-CCLE and HiDRA-CCLE show plateauing losses, indicating possible overfitting or suboptimal hyperparameters. LGBM models consistently achieve lower validation loss, particularly on CCLE, underscoring their potential effectiveness. A mechanical check for such plateaus is sketched below.
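The plateau behavior noted above can be flagged automatically from an HPO trace. The following minimal sketch (the window and tolerance are illustrative choices, not values from the study) tests whether the running-best validation loss has stopped improving:

def has_plateaued(val_losses, window=10, tol=1e-3):
    """Flag an HPO trace whose running-best validation loss improved by
    less than `tol` (relative) over the last `window` iterations."""
    if len(val_losses) <= window:
        return False
    best_so_far = [min(val_losses[: i + 1]) for i in range(len(val_losses))]
    prev, last = best_so_far[-window - 1], best_so_far[-1]
    return (prev - last) / prev < tol

# Example: a trace that keeps improving vs. one that stalls.
improving = [1.0 - 0.02 * i for i in range(30)]
stalled = [1.0 - 0.02 * i for i in range(10)] + [0.8] * 20
print(has_plateaued(improving), has_plateaued(stalled))  # False True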
Finding 2: Community resources for HPO: This work provides a framework for hyperparameter optimization that enhances model performance and underscores the necessity of dataset-specific tuning for neural network models in cancer drug response prediction. Optimizing hyperparameters can lead to more accurate and reliable predictions, ultimately advancing personalized cancer treatments. We have published the HPO results in an updateable, versioned data structure called the IMPROVE Hall of Fame [13]. The studies were performed over standardized HPO range specifications, published and encoded in a readable JSON format that complies with CANDLE conventions [14]; an illustrative example of this format follows. HPO runs were performed on Polaris but are portable to other systems via Supervisor, and were run at four standard sizes, SMALL, MEDIUM, LARGE, and XL, which specify the number of samples per HPO iteration and the number of HPO iterations.
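For illustration only, a range specification in this style might look like the following; the parameter names, types, and bounds here are hypothetical examples, not the published ranges, and the exact field names may differ from the CANDLE-compliant specification:

[
  {"name": "learning_rate", "type": "float", "lower": 1e-5, "upper": 1e-2},
  {"name": "batch_size", "type": "ordered", "element_type": "int", "values": [16, 32, 64, 128]},
  {"name": "epochs", "type": "int", "lower": 10, "upper": 100},
  {"name": "dropout", "type": "float", "lower": 0.0, "upper": 0.5}
]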
Finding 3: Cross-model behavior analysis for HPO: This data corpus enables many forms of analysis. Fig. 3 shows the aggregate loss improvement across all model-dataset runs; with an exceptional 56.35% improvement in validation loss, it can serve as a reference point for future HPO runs. Fig. 4 shows how the optimal hyperparameters for IGTD vary across datasets. Fig. 5 presents model-dataset results in an at-a-glance format, so that a fingerprint of the differing behaviors of the combinations can be seen quickly; a minimal sketch of this kind of aggregation follows.
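Analyses like Figs. 3 and 5 reduce to simple aggregations over the run corpus. Below is a minimal pandas sketch assuming a hypothetical tabular layout (the column names and numbers are made up for illustration, not results from the study):

import pandas as pd

# Hypothetical corpus layout: one row per model-dataset HPO run.
runs = pd.DataFrame({
    "model":             ["IGTD", "IGTD", "GraphDRP", "LGBM"],
    "dataset":           ["gCSI", "CCLE", "GDSCv1", "CCLE"],
    "baseline_val_loss": [0.110, 0.150, 0.200, 0.090],
    "best_val_loss":     [0.060, 0.120, 0.095, 0.040],
})

# Per-run and aggregate validation-loss improvement (cf. Fig. 3).
runs["improvement_pct"] = 100 * (1 - runs["best_val_loss"] / runs["baseline_val_loss"])
print(runs["improvement_pct"].mean())

# At-a-glance model-by-dataset fingerprint (cf. Fig. 5).
print(runs.pivot(index="model", columns="dataset", values="improvement_pct"))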
Event Type
Workshop
Time: Monday, 18 November 2024, 4pm - 4:15pm EST
Location: B311
Tags
Artificial Intelligence/Machine Learning
Biology
Education
Emerging Technologies
Medicine
Modeling and Simulation
Registration Categories
W