
Presentation

Explainable AI Infrastructure: Optimizing Performance at Every Level
Description
The growth of AI models and their deployment at scale have led to massive distributed environments that serve workflows spanning multiple nodes, storage clusters, and the networks that bind them together.

The challenge is clear: profiling and debugging the entire stack requires a new methodology. A holistic solution should provide visibility into every single component in the stack: switch, NIC, DPU, CPU, GPU, and storage.

In this talk, we will show how to use NVIDIA Nsight Systems to profile and analyze applications running on an NVIDIA DGX cluster attached to a VAST Data Platform. We will peel the onion layer by layer, examining each infrastructure component to understand its role and its effect on performance.

We will complete the picture with data-centric insights from the VAST Data Platform, offering a new perspective on how applications use data and how that use affects the storage stack, leading to optimizations and increased efficiency.
Event Type
Exhibitor Forum
Time
Thursday, 21 November 2024, 4:30pm - 5pm EST
Location
B206
Tags
HPC Infrastructure
Performance Evaluation and/or Optimization Tools
Registration Categories
TP
XO/EX