Presentation
The UpDown System: A Scalable Supercomputer co-designed for Graph Computations
Description
The UpDown system’s goal is to reduce programming complexity AND improve scalability on graph computations. UpDown is co-designed for fine-grained parallelism and efficient global communication; early performance studies using graph kernels indicate that a single UpDown node can outperform a multicore CPU by up to 100x. The 16,384-node UpDown system achieves strong scaling on small graphs, with projected performance 1,000x that of today’s supercomputers and clouds on PageRank, Triangle Count, and other kernels. Iso-power comparisons are even more favorable.
The UpDown system architecture is a significant departure from conventional designs. First, UpDown’s 1-cycle thread creation and management, combined with hardware scheduling, enables trillions of fine-grained computations (<25 instructions, MIMD) to achieve high hardware efficiency. Second, UpDown provides efficient short messages: 1-cycle message sends, a NIC-less design (for scalability), and split-transaction memory access enable software-controlled, intelligent data movement. Third, UpDown offers >4 TB/s per node of all-to-all network bandwidth and a global memory access latency of 1.1 µs; this radically higher network capability opens new design spaces for graph algorithms and data structures, as the system can be programmed as a flat, global-memory machine. Finally, UpDown has massive memory bandwidth (10 TB/s per node and 150 PB/s system-wide). Together, these capabilities enable high-level programming of vertex and edge parallelism for scalable high performance. The UpDown project is part of IARPA’s AGILE research program.
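To make the programming model concrete, the listing below is a minimal, hypothetical sketch (plain Python, not the UpDown toolchain or API) of the message-driven, vertex-parallel style the abstract describes: each vertex runs a tiny handler per incoming short message, and a queue stands in for the hardware-scheduled all-to-all network. All names here (Vertex, scatter, handle_contribution) are illustrative assumptions, not UpDown primitives.

    from collections import deque

    DAMPING = 0.85

    class Vertex:
        """Hypothetical per-vertex actor: local state plus a tiny handler
        that a hardware scheduler would invoke on each incoming short message."""
        def __init__(self, vid, out_neighbors):
            self.vid = vid
            self.out = out_neighbors
            self.rank = 1.0
            self.accum = 0.0

        def scatter(self, network):
            # One short message per out-edge (the abstract's 1-cycle sends).
            share = DAMPING * self.rank / max(len(self.out), 1)
            for dst in self.out:
                network.append((dst, share))

        def handle_contribution(self, value):
            # Fine-grained handler body: a handful of instructions per event.
            self.accum += value

    def pagerank_step(vertices):
        network = deque()                  # stands in for the all-to-all network
        for v in vertices.values():
            v.scatter(network)
        while network:                     # hardware scheduling, simulated serially here
            dst, value = network.popleft()
            vertices[dst].handle_contribution(value)
        for v in vertices.values():        # apply the damped update, reset accumulator
            v.rank = (1.0 - DAMPING) + v.accum
            v.accum = 0.0

    # Example: a 3-vertex directed cycle; ranks remain 1.0, as expected.
    graph = {0: Vertex(0, [1]), 1: Vertex(1, [2]), 2: Vertex(2, [0])}
    for _ in range(10):
        pagerank_step(graph)
    print({vid: round(v.rank, 3) for vid, v in graph.items()})

On a sequential machine this event loop is just simulation; the point of the sketch is the shape of the computation, in which each handler touches only a few words of state and communication is expressed as one-sided short messages rather than bulk transfers.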
Presenter