Speeding-Up LULESH on HPX: Useful Tricks and Lessons Learned using a Many-Task-Based Approach
Current programming models struggle to cope with the growing parallelism and heterogeneity of modern supercomputers. Emerging programming models, such as the task-based model of the asynchronous many-task (AMT) framework HPX, offer new ways to express parallelism, improve scalability, and hide synchronization and communication latency on multi-core and distributed systems.
Standard high-performance computing benchmarks are often unsuitable for comparing programming models because their code is too simple, whereas real-world scientific applications are usually too complex for such comparisons. Proxy applications offer a middle ground: they model the behavior of actual scientific applications while keeping code complexity manageable.
While researching how to use HPX to program machines with heterogeneous compute units (e.g., GPUs and FPGA/AI Engines), we also substantially optimized a pure HPX-based software baseline of the LULESH proxy application. This paper discusses the techniques we applied, which yield single-node speed-ups of 1.33x to 2.25x over the LULESH OpenMP reference implementation, depending on problem size.