Presentation
Invited Talk: Mapping Irregular Computations to Accelerator-Based Exascale Systems
Description: As traditional technology drivers of computing performance level off, the use of accelerators with various levels of specialization is growing in importance. At the same time, data movement continues to dominate running time and energy costs, making communication cost reduction the primary optimization criterion for compilers and programmers. This requires new ways of thinking about algorithms to minimize and hide communication, expose fine-grained parallelism, and manage communication. These changes will affect the theoretical models of computing, the analysis of performance, the design of algorithms, and the practice of programming.
In this talk I will discuss prior work and open problems in optimizing communication, avoiding synchronization, and tolerating nondeterminism, using data analysis and statistical learning problems from biology as driving examples. I will discuss distributed data structures and communication optimizations in large-scale genome analysis, including metagenome assembly, protein clustering, and more. These algorithms represent data analysis “motifs,” including hashing, alignment, generalized n-body, and sparse matrices. I will describe two parallelization approaches, one based on asynchronous one-sided communication and another based on bulk-synchronous collectives using GraphBLAS. I will give an overview of these approaches, describe the GPU parallelizations, and highlight some of the resulting scientific insights, including the discovery of new microbial species and new protein functional dark matter.