BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20250626T234541Z
LOCATION:B311
DTSTART;TZID=America/New_York:20241118T160000
DTEND;TZID=America/New_York:20241118T161500
UID:submissions.supercomputing.org_SC24_sess818_ws_cafcw126@linklings.com
SUMMARY:Cross-HPO: Optimizing Neural Networks for Cancer Drug Response Usi
 ng Hyperparameter Tuning on Multiple Pharmacogenomic Datasets
DESCRIPTION:Rajeev Jain, Justin M. Wozniak, Alexander Partin, Andreas Wilk
 e, Yitan Zhu, Priyanka Vasanthakumari, Oleksandr Narykov, Jamie Overbeek, 
 and Rylie Weaver (Argonne National Laboratory (ANL)); Chen Wang and Yuanha
 ng Liu (Mayo Clinic); Ryan Weil (National Institutes of Health (NIH)); and
  Thomas Brettin and Rick Stevens (Argonne National Laboratory (ANL))\n\nPr
 edicting and comparing anti-cancer drug responses using deep learning mode
 ls across datasets is a challenging modern problem. In this study, we opti
 mized hyperparameters in several novel neural network-based models, includ
 ing GraphDRP [1], IGTD [2], Paccmann [3], PathDSP [4], and HiDRA [5], and 
 a machine learning model LGBM (built with LightGBM), across multiple publi
 c pharmacogenomic datasets: CCLE [6], CTRPv2 [7], gCSI [8], GDSCv1 [9], an
 d GDSCv2 [10]. Our primary objective was to enhance prediction performance
  and robustness through hyperparameter optimization (HPO) tailored to each
  dataset. As a result, we have published the HPO results on GitHub for the
  research community and have started the cross-analysis of these HPO runs.
   The results are a first effort at the cross-model, cross-dataset HPO ana
 lysis (termed “Cross-HPO”) that is now possible. This work will enhance dr
 ug discovery candidate evaluation and increase discovery success.\nBackgro
 und: The Innovative Methodologies and New Data for Predictive Oncology Mod
 el Evaluation (IMPROVE) project provides a robust framework for this resea
 rch. We use the Supervisor [11] hyperparameter optimization framework to r
 un the models on ALCF Polaris, a powerful supercomputer with over 2000 NV
 IDIA A100 GPUs [12].  \nFinding 1: Dataset-specific HPO: Our findings, ill
 ustrated in Fig. 1 for GraphDRP and IGTD models, emphasize the importance 
 of dataset-specific hyperparameter tuning. Tailoring hyperparameters to in
 dividual datasets led to significant performance improvements compared to 
 a uniform approach. This study highlights the inherent complexity and vari
 ability in pharmacogenomic data. As shown in Fig. 2, IGTD-gCSI and Paccman
 n_MCA-gCSI exhibit improved performance with decreasing validation loss ov
 er iterations. In contrast, PathDSP-CCLE and HiDRA-CCLE show plateauing lo
 sses, indicating possible overfitting or suboptimal hyperparameters. LGBM 
 models consistently achieve lower validation loss, mainly on CCLE, undersc
 oring their potential effectiveness.\nFinding 2: Community resources for H
 PO: This work provides a framework for hyperparameter optimization to enha
 nce model performance and underscores the necessity of dataset-specific tu
 ning for neural network models in cancer drug response prediction. Optimiz
 ing hyperparameters can lead to more accurate and reliable predictions, ul
 timately advancing personalized cancer treatments. We have published the H
 PO results in an updateable, versioned data structure called the IMPROVE H
 all of Fame [13].  The studies were performed over standardized HPO range 
 specifications, published and encoded in a readable JSON format specified 
 for compliance with CANDLE conventions [14].  HPO runs were performed on P
 olaris but are portable to other systems via Supervisor, and were run at f
 our standard sizes, SMALL, MEDIUM, LARGE, and XL, which specify the number
  of samples per HPO iteration and number of HPO iterations. \nFinding 3: C
 ross-model behavior analysis for HPO: This data corpus makes many forms of
  analysis possible.  Fig. 3 shows the aggregate performance across all mod
 el-dataset runs for loss improvement.  This can be used as a reference poi
 nt for future HPO runs, as the results show an exceptional 56.35% improvem
 ent in validation loss.  Fig. 4 shows the varying optimal hyperparameters 
 for IGTD across datasets.  In Fig. 5, we show model-dataset results in an 
 at-a-glance format, so that a fingerprint of the different behaviors of th
 e combinations can be quickly seen.\n\nTag: Artificial Intelligence/Machin
 e Learning, Biology, Education, Emerging Technologies, Medicine, Modeling 
 and Simulation\n\nRegistration Category: Workshop Reg Pass\n\nSession Chai
 rs: Lynn Borkon (Frederick National Laboratory for Cancer Research); Laure
 n Lewis (Frederick National Laboratory for Cancer Research); and Eric Stah
 lberg (MD Anderson Cancer Center, University of Texas)\n\n
END:VEVENT
END:VCALENDAR
