DFTracer: An Analysis-Friendly Data Flow Tracer for AI-Driven Workflows
Description
Modern HPC workflows tightly couple simulation, data analytics, and artificial intelligence (AI) applications to reduce the time to scientific insight. However, existing tracing tools are not designed for AI-based I/O software stacks, which require tracing at multiple levels of the application. To address this gap, we designed DFTracer to capture data-centric events from both workflows and the I/O stack. DFTracer has three novel features: a unified interface to capture tracing data from different layers of the software stack, an analysis-friendly trace format optimized for efficient loading, and the ability to tag events with workflow-specific context to improve analysis. We show that DFTracer has 1.44x lower runtime overhead and produces 7.1x smaller traces than state-of-the-art tools. Finally, we demonstrate that DFTracer captures multi-level performance data from the MuMMI and Megatron-DeepSpeed workflows with a low overhead of 1-5%.
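The description does not spell out DFTracer's trace format or API. As a rough illustration of two ideas it names (events from several software-stack layers recorded in one stream, and workflow-specific context tags attached to each event), here is a minimal sketch that assumes a hypothetical line-delimited JSON layout loosely modeled on Chrome-trace-style fields (`name`, `cat`, `ts`, `dur`, `args`). The schema and the functions `make_event`, `write_trace`, `read_trace`, and `total_time_by` are illustrative assumptions only, not DFTracer's interface.

```python
import json
import os
import tempfile
from typing import Any, Dict, Iterable, Iterator, List

# Hypothetical schema (NOT DFTracer's actual format): one self-contained JSON
# object per line, so any line can be parsed independently of the others.
def make_event(name: str, layer: str, start_us: int, dur_us: int,
               context: Dict[str, Any]) -> Dict[str, Any]:
    """Build one trace record. `layer` names the software-stack level that
    emitted the event; `context` holds workflow-specific tags (e.g. epoch)."""
    return {"name": name, "cat": layer, "ts": start_us, "dur": dur_us,
            "args": dict(context)}

def write_trace(path: str, events: List[Dict[str, Any]]) -> None:
    """Write events as newline-delimited JSON."""
    with open(path, "w") as f:
        for ev in events:
            f.write(json.dumps(ev) + "\n")

def read_trace(path: str) -> Iterator[Dict[str, Any]]:
    """Stream events back one line at a time, skipping blank lines."""
    with open(path) as f:
        for line in f:
            if line.strip():
                yield json.loads(line)

def total_time_by(events: Iterable[Dict[str, Any]], key: str) -> Dict[Any, int]:
    """Sum event durations grouped by one workflow context tag."""
    totals: Dict[Any, int] = {}
    for ev in events:
        tag = ev["args"].get(key)
        totals[tag] = totals.get(tag, 0) + ev["dur"]
    return totals

# Usage: POSIX-level and framework-level events share one trace file.
events = [
    make_event("read", "posix", 0, 120, {"epoch": 1, "step": 0}),
    make_event("read", "posix", 200, 80, {"epoch": 1, "step": 1}),
    make_event("load_batch", "pytorch", 0, 300, {"epoch": 1, "step": 0}),
    make_event("read", "posix", 900, 50, {"epoch": 2, "step": 0}),
]
path = os.path.join(tempfile.mkdtemp(), "trace.jsonl")
write_trace(path, events)
print(total_time_by(read_trace(path), "epoch"))  # {1: 500, 2: 50}
```

Because each line is an independent record, a loader could split such a file into chunks and parse them in parallel; that general property, rather than this particular schema, is what the sketch is meant to convey.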