Presentation
Toward Performance & Portability & Productivity in Parallel Programming
Description
Achieving *performance*, *portability*, and *productivity* for data-parallel computations (e.g., MatMul and convolutions) has emerged as a major research challenge. The complex hardware design of contemporary parallel architectures, including GPUs and CPUs, requires advanced program optimizations to fully exploit their performance potential. Furthermore, the diverse hardware landscape makes (performance) portability hard to achieve: different architectures require different kinds of optimizations, which often impose challenging and sometimes contradictory requirements on code optimization. Finally, the complexity of achieving performance and portability must be hidden behind a productive programming interface so that programming modern architectures remains accessible.
This thesis introduces a novel approach to code *generation* and *optimization* for data-parallel computations targeting modern parallel architectures. The ultimate goal is to achieve *performance*, *portability*, and *productivity* simultaneously within a single approach, which has been identified as a major research challenge.
The first part of this thesis introduces the algebraic formalism of Multi-Dimensional Homomorphisms (MDH) — a novel approach to generating code that can be fully automatically optimized (auto-tuned) for a particular target architecture and characteristics of the input and output data (such as size and memory layout); our code generation approach is hidden behind a productive user interface that expresses a wide range of data-parallel computations.
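As a rough illustration of the homomorphism idea behind MDH, the sketch below checks that matrix-vector multiplication can be computed on parts of its input and then recombined, with a different combine operation per dimension. This is an illustrative assumption about how the property can be demonstrated, not the thesis's formalism or user interface; the names `matvec`, `combine_rows`, and `combine_cols` are hypothetical.

```python
def matvec(M, v):
    """Matrix-vector product on plain Python lists."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in M]


def combine_rows(a, b):
    """Recombine results of a split along the row dimension: concatenation."""
    return a + b


def combine_cols(a, b):
    """Recombine results of a split along the column (reduction) dimension:
    point-wise addition of the partial sums."""
    return [x + y for x, y in zip(a, b)]


M = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12]]
v = [1, 0, 2, 1]

# Reference result computed on the whole input.
full = matvec(M, v)

# Split the matrix along rows, compute each part, then concatenate.
by_rows = combine_rows(matvec(M[:1], v), matvec(M[1:], v))

# Split matrix and vector along columns, compute each part, then add.
left = [row[:2] for row in M]
right = [row[2:] for row in M]
by_cols = combine_cols(matvec(left, v[:2]), matvec(right, v[2:]))

print(full == by_rows == by_cols)
```

Because each dimension can be split and recombined independently, a code generator is free to choose how to partition the computation for a given device, which is the kind of freedom an auto-tuner can then exploit.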
The second part of this thesis introduces the Auto-Tuning Framework (ATF) for automatically optimizing parameterized program code (such as the code generated by our MDH approach). In contrast to existing auto-tuners, ATF supports so-called constrained tuning parameters, which are ubiquitous in modern parallel programming.
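To make the notion of constrained tuning parameters concrete, here is a minimal sketch of a search space in which one parameter's valid values depend on another. The two parameters (`tile`, `local`) and their divisibility constraints are hypothetical examples chosen for illustration; this is not ATF's interface or its search-space generation algorithm.

```python
from itertools import product


def divisors(n):
    """Return all positive divisors of n in ascending order."""
    return [d for d in range(1, n + 1) if n % d == 0]


def constrained_space(input_size):
    """Enumerate (tile, local) pairs under two assumed constraints:
    tile must divide input_size, and local must divide tile."""
    space = []
    for tile in divisors(input_size):
        # The valid range of `local` depends on the chosen `tile`.
        for local in divisors(tile):
            space.append((tile, local))
    return space


def unconstrained_space(input_size):
    """Enumerate every (tile, local) pair in 1..input_size, valid or not."""
    values = range(1, input_size + 1)
    return list(product(values, values))


valid = constrained_space(12)
everything = unconstrained_space(12)
print(len(valid), "valid of", len(everything), "configurations")
```

For an input size of 12, only 18 of the 144 possible pairs satisfy both constraints. Generating the valid configurations directly, rather than enumerating all pairs and filtering afterwards, avoids spending search effort on configurations that could never run.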