Close

Presentation

Doubling Graph Traversal Efficiency to 198 TeraTEPS on the Supercomputer Fugaku
DescriptionBreadth-first search (BFS) is a fundamental building block of various high-performance computing applications beyond graph analysis and also known as a benchmark problem in the Graph500 list. The increasing volume of global data demands efficient distributed BFS, which, however, is hindered by the high communication costs of exchanging vertex data between compute nodes. To address this challenge, this paper introduces four techniques: (i) forest pruning, which reduces the number of vertices by eliminating those unnecessary for the search; (ii) group reordering and (iii) multilevel bitmap compression, which decrease the memory footprint of graph data, thereby enabling fewer nodes to manage larger graphs; and (iv) adaptive parameter tuning, which quickly optimizes the hyperparameters of the BFS algorithm. In the evaluation using 152,064 nodes of the supercomputer Fugaku, our implementation achieved 198 tera-traversed edges per second, doubling the performance reported in the latest Graph500-related study on Fugaku.