Presentation
Explainable AI Infrastructure: Optimizing Performance at Every Level
Description
The growth of AI models and their deployment at scale have produced massive distributed environments, with workflows spanning multiple compute nodes, storage clusters, and the networks that connect them.
The challenge is clear: profiling and debugging the entire stack requires a new methodology. A holistic solution should provide visibility into every component in the stack, including the switch, NIC, DPU, CPU, GPU, and storage.
In this talk, we will show how to use NVIDIA Nsight Systems to profile and analyze applications running on an NVIDIA DGX cluster attached to the VAST Data Platform. We will peel the onion layer by layer, examining each infrastructure component, its role, and its effect on performance.
We will complete the picture with data-centric insights from the VAST Data Platform. These insights offer a new perspective on how applications use data and how that usage affects the storage stack, pointing the way to optimizations and greater efficiency.