Presentation
Filling the Void: Data-Driven Machine Learning-based Reconstruction of Sampled Spatiotemporal Scientific Simulation Data
Description
As high-performance computing systems continue to advance, the gap between computing performance and I/O capabilities is widening. This I/O bottleneck constrains increasingly large-scale simulations: they generate data at unprecedented granularity but can store only a small subset of the raw output. Strategies for data-driven sampling have recently been proposed, but a thorough analysis of how such intelligent samples can be used for data reconstruction is lacking. We propose a data-driven machine learning approach that trains neural networks to reconstruct full-scale datasets from a simulation's sampled output. Compared to current state-of-the-art reconstruction approaches, we demonstrate that our machine learning-based reconstruction offers advantages in reconstruction quality, time-to-reconstruct, and knowledge transfer to unseen timesteps and grid resolutions. We also propose and evaluate strategies that balance sampling rates against model training time (pretraining and fine-tuning) and data reconstruction time, demonstrating the approach's efficacy.
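The abstract does not specify the network architecture, sampling method, or training setup. The following is therefore only a minimal NumPy sketch of the general idea under explicit assumptions, not the authors' method. It uses a small coordinate-based MLP that maps normalized (x, y) grid positions to field values, trained with full-batch Adam on a sparse sample. The field `toy_field`, the 5% uniform random sample (a stand-in for the data-driven samplers mentioned above), and all helper names (`grid_coords`, `init_params`, `train`, `reconstruct`) are hypothetical. The sketch pretrains on one timestep, fine-tunes on the next with far fewer steps, and then queries both the original grid and a finer, unseen resolution.

```python
import numpy as np

def grid_coords(n):
    # Normalized (x, y) coordinates of every point on an n x n grid, shape (n*n, 2)
    x, y = np.meshgrid(np.linspace(0.0, 1.0, n), np.linspace(0.0, 1.0, n), indexing="ij")
    return np.stack([x.ravel(), y.ravel()], axis=1)

def toy_field(n, t):
    # Hypothetical stand-in for one simulation timestep: a drifting 2D wave, flattened
    c = grid_coords(n)
    return np.sin(2 * np.pi * (c[:, 0] + 0.05 * t)) * np.cos(2 * np.pi * c[:, 1])

def init_params(rng, hidden=64):
    # One hidden tanh layer; weights drawn from a normal distribution
    return {
        "W1": rng.normal(0.0, 2.0, (2, hidden)), "b1": rng.normal(0.0, 1.0, hidden),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, 1)), "b2": np.zeros(1),
    }

def forward(p, X):
    h = np.tanh(X @ p["W1"] + p["b1"])
    return h, (h @ p["W2"] + p["b2"]).ravel()

def train(p, X, y, steps, lr=1e-2, beta1=0.9, beta2=0.999, eps=1e-8):
    # Full-batch Adam on the mean-squared error between predictions and sampled values
    m = {k: np.zeros_like(w) for k, w in p.items()}
    v = {k: np.zeros_like(w) for k, w in p.items()}
    for t in range(1, steps + 1):
        h, pred = forward(p, X)
        g_out = (2.0 / len(X)) * (pred - y)[:, None]   # dLoss/dpred
        g_h = (g_out @ p["W2"].T) * (1.0 - h ** 2)     # backprop through tanh
        grads = {"W1": X.T @ g_h, "b1": g_h.sum(axis=0),
                 "W2": h.T @ g_out, "b2": g_out.sum(axis=0)}
        for k, g in grads.items():
            m[k] = beta1 * m[k] + (1 - beta1) * g
            v[k] = beta2 * v[k] + (1 - beta2) * g ** 2
            m_hat = m[k] / (1 - beta1 ** t)
            v_hat = v[k] / (1 - beta2 ** t)
            p[k] -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return p

def reconstruct(p, n):
    # Query the trained network at every point of an n x n grid to rebuild the field
    _, pred = forward(p, grid_coords(n))
    return pred.reshape(n, n)

rng = np.random.default_rng(0)
n, rate = 64, 0.05
coords = grid_coords(n)

# Pretrain on a sparse uniform random sample of timestep 0
idx0 = rng.choice(n * n, size=int(rate * n * n), replace=False)
params = train(init_params(rng), coords[idx0], toy_field(n, 0)[idx0], steps=2000)

# Fine-tune the same weights on a sample of timestep 1 with far fewer steps
idx1 = rng.choice(n * n, size=int(rate * n * n), replace=False)
params = train(params, coords[idx1], toy_field(n, 1)[idx1], steps=200)

full = reconstruct(params, n)       # timestep 1 on the original grid
fine = reconstruct(params, 2 * n)   # same model queried on a finer, unseen grid
mse = np.mean((full.ravel() - toy_field(n, 1)) ** 2)
print(f"samples={len(idx1)} grid_mse={mse:.4f}")
```

Because the network takes normalized coordinates rather than grid indices, the same weights can be evaluated at any resolution. This is one simple way to illustrate transfer to unseen grid resolutions. Fine-tuning a pretrained model with few steps similarly illustrates the trade-off between training time and reconstruction quality.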