Enhancing HPC I/O Performance: Leveraging Runtime and Offline I/O Optimization Frameworks
The existing parallel I/O stack is complex and difficult to tune because of the interdependencies among the many factors that affect the performance of data movement between storage and compute systems. When performance is slower than expected, end-users, developers, and system administrators rely on I/O profiling and tracing information to pinpoint the root causes of inefficiencies. Although numerous tools collect I/O metrics on production systems, it is not obvious, unless one is an I/O expert, where the I/O bottlenecks are, what causes them, and how to resolve them. Hence, there is a gap between the currently available metrics, the issues they represent, and the optimizations that would mitigate performance slowdowns. Streamlining this analysis, investigation, and recommendation process could close the gap without requiring a specialist to intervene in every case.
This dissertation explores how this translation gap can be closed by introducing two frameworks that combine offline and runtime analysis and tuning methodologies. The offline framework, Drishti I/O, provides interactive visualizations that detail an application's I/O behavior, pinpoints the root causes of I/O bottlenecks, and offers actionable recommendations to improve performance. The runtime framework extends the Recorder I/O tracing tool with a dynamic I/O prediction and optimization system, which uses context-free grammars to optimize I/O behavior in real time while the application executes. Together, these frameworks offer a comprehensive approach to improving I/O performance through detailed offline analysis and real-time optimization.
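To make the runtime idea more concrete, the Python sketch below shows one way a context-free grammar inferred from an I/O call trace can expose repetition that a predictor could exploit. It is a hypothetical toy, not the Recorder implementation: it swaps in a simplified Re-Pair-style digram replacement for grammar inference, treats each I/O call as a bare operation name (ignoring offsets, sizes, and ranks), and the helper names `infer_grammar`, `expand`, and `predict_next_block` are illustrative assumptions.

```python
from collections import Counter


def replace_pair(seq, pair, name):
    """Replace non-overlapping occurrences of `pair` (left to right) with `name`."""
    out, i, hits = [], 0, 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(name)
            i += 2
            hits += 1
        else:
            out.append(seq[i])
            i += 1
    return out, hits


def infer_grammar(trace):
    """Toy Re-Pair-style inference: repeatedly turn a digram that occurs
    at least twice (non-overlapping) into a new rule <Rk> -> (left, right).
    Assumes no terminal in the trace is itself named like "<Rk>"."""
    seq, rules, next_id = list(trace), {}, 0
    while True:
        best = None
        for pair, n in Counter(zip(seq, seq[1:])).most_common():
            if n < 2:
                break
            name = f"<R{next_id}>"
            new_seq, hits = replace_pair(seq, pair, name)
            if hits >= 2:
                best = (name, pair, new_seq)
                break
        if best is None:
            return seq, rules  # top-level sequence plus the rule table
        name, pair, seq = best
        rules[name] = pair
        next_id += 1


def expand(symbol, rules):
    """Expand a (non)terminal back into the I/O calls it stands for."""
    if symbol not in rules:
        return [symbol]
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)


def predict_next_block(trace):
    """If the trace ends with a repeated rule, guess that it repeats again."""
    top, rules = infer_grammar(trace)
    if len(top) >= 2 and top[-1] == top[-2] and top[-1] in rules:
        return expand(top[-1], rules)
    return None  # no repetition detected, so no prediction
```

For a trace of three identical `open`/`write`/`close` iterations, the top-level sequence collapses to three copies of one rule, so the sketch predicts that the next block will again be `open`, `write`, `close`. A runtime system could, for example, use such a prediction to prepare an optimization before the calls arrive; a trace with no repetition yields no prediction.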