
Presentation

Identifying Regions of Non-Determinism in HPC Simulations Through Event Graph Alignment
Description
High-performance computing (HPC) applications that use MPI (Message Passing Interface) often exhibit non-determinism (ND) because of asynchronous MPI calls, which makes it difficult to identify where the ND comes from. One useful representation for this analysis is an event graph that models an execution, with MPI calls as nodes and communications as edges. We focus on message ND, meaning variability in the order of MPI communication across runs. We detect potential ND sources by comparing the edge sets of event graphs from different runs. An accurate comparison first requires aligning the nodes of the event graphs. However, traditional alignment methods such as NetAlign, graphlet degree vectors, and Graph Auto-Encoders struggle because event graphs are highly regular. We propose a meta graph heuristic that uses structural constraints and a message-passing scheme for sparse directed acyclic graphs. It achieves up to a 70% improvement in alignment accuracy over conventional techniques.
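The edge-set comparison step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' meta graph heuristic. It assumes each run is recorded as per-rank lists of MPI calls plus a list of matched (send, receive) pairs. It identifies nodes by a hypothetical (rank, local event index) pair, and it takes the node alignment as a given input rather than computing it. The names build_event_graph and differing_edges are illustrative only.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical node id: (rank, local event index within that rank).
Node = Tuple[int, int]
Edge = Tuple[Node, Node]


def build_event_graph(trace: Dict[int, List[str]],
                      matches: List[Tuple[Node, Node]]) -> Set[Edge]:
    """Return the edge set of a simple event graph (a DAG).

    trace maps each rank to its ordered list of MPI call names; matches lists
    the (send_node, recv_node) pairs observed in one run (assumed format).
    """
    edges: Set[Edge] = set()
    # Program-order edges: consecutive MPI calls on the same rank.
    for rank, calls in trace.items():
        for i in range(len(calls) - 1):
            edges.add(((rank, i), (rank, i + 1)))
    # Communication edges: each send points at the receive that matched it.
    edges.update(matches)
    return edges


def differing_edges(edges_a: Set[Edge], edges_b: Set[Edge],
                    align: Dict[Node, Node]) -> Tuple[Set[Edge], Set[Edge]]:
    """Map run A's edges through a node alignment, then diff against run B.

    Nodes missing from align map to themselves (identity alignment).
    """
    mapped_a = {(align.get(u, u), align.get(v, v)) for u, v in edges_a}
    return mapped_a - edges_b, edges_b - mapped_a


# Usage: ranks 1 and 2 each send once to rank 0, which posts two wildcard
# receives; the two runs differ only in which send each receive matched.
trace = {0: ["MPI_Recv", "MPI_Recv"], 1: ["MPI_Send"], 2: ["MPI_Send"]}
run_a = [((1, 0), (0, 0)), ((2, 0), (0, 1))]
run_b = [((2, 0), (0, 0)), ((1, 0), (0, 1))]

edges_a = build_event_graph(trace, run_a)
edges_b = build_event_graph(trace, run_b)
only_a, only_b = differing_edges(edges_a, edges_b, align={})

# Receive events touched by a differing edge mark a potential ND region.
flagged = {v for _, v in only_a | only_b}
print(sorted(flagged))  # [(0, 0), (0, 1)]
```

With the empty (identity) alignment, the runs differ only in which send each receive on rank 0 matched, so both receives on rank 0 are flagged. The identity alignment works here only because both runs issue the same calls in the same per-rank order. For real event graphs the abstract argues that a more careful alignment is needed, and in this sketch it would be passed in through align.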