Presentation
Realizing Joint Extreme-Scale Simulations on Multiple Supercomputers - Two Superfacility Case Studies
Session: Scaling and Checkpointing
Description: High-dimensional grid-based simulations serve as both a tool and a challenge in researching various domains.
The main challenge of these approaches is the well-known curse of dimensionality, amplified by the need for fine resolutions in high-fidelity applications.
The combination technique (CT) provides a straightforward way of performing such simulations while alleviating the curse of dimensionality.
Recent work demonstrated the potential of the CT to join multiple systems simultaneously to perform a single high-dimensional simulation.
This paper extends the approach to three or more systems and addresses several remaining challenges: load balancing on heterogeneous hardware; using compression to make the most of the available communication bandwidth; efficient I/O management through hardware mapping; and improving memory utilization through algorithmic optimizations.
Combining these contributions, we demonstrate the CT for extreme-scale Superfacility scenarios of 46-trillion DOF on two systems and 35-trillion DOF on three systems.
Scenarios at these resolutions would be intractable with full-grid solvers (>1,000-nonillion DOF each).
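To illustrate why the CT needs far fewer degrees of freedom than a full grid, here is a minimal sketch of the classical combination scheme. It is not the implementation used in the paper. The function names, the single isotropic level parameter, and the choice of counting 2^l - 1 interior points per dimension are assumptions made for illustration only.

```python
from itertools import product
from math import comb, prod


def combination_scheme(dim, level):
    """Map each component-grid level vector to its combination coefficient.

    Classical scheme: for q = 0..dim-1, all level vectors l (l_i >= 1) with
    |l|_1 = level + (dim - 1) - q get coefficient (-1)^q * C(dim - 1, q).
    """
    scheme = {}
    for q in range(dim):
        target = level + (dim - 1) - q
        coeff = (-1) ** q * comb(dim - 1, q)
        # Brute-force enumeration; adequate for small illustrative cases.
        for lv in product(range(1, level + 1), repeat=dim):
            if sum(lv) == target:
                scheme[lv] = coeff
    return scheme


def grid_points(level_vector):
    """Interior points of an anisotropic grid (assumed: 2^l - 1 per dimension)."""
    return prod(2 ** l - 1 for l in level_vector)


def component_dof(scheme):
    """Total DOF summed over all component grids in the scheme."""
    return sum(grid_points(lv) for lv in scheme)


def full_grid_dof(dim, level):
    """DOF of the full grid at the same target resolution."""
    return (2 ** level - 1) ** dim


if __name__ == "__main__":
    for dim, level in [(2, 3), (4, 8), (6, 10)]:
        scheme = combination_scheme(dim, level)
        print(dim, level, len(scheme), component_dof(scheme), full_grid_dof(dim, level))
```

Each component grid is coarse in most dimensions and can be solved independently, which is what allows the work to be distributed across systems. The gap between the summed component-grid DOF and the full-grid DOF widens rapidly as the dimension and level increase.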