
Presentation

Using Telemetry to Derive System Architecture Requirements: Experiences at the Oak Ridge Leadership Computing Facility
Description
Increasing system complexity and component costs mean that designing supercomputers and other HPC systems requires significant architectural compromises. As costs have risen dramatically, system architects are forced to make ever more significant tradeoffs, where increasing one set of resources requires reducing another. Achieving the right resource balance is crucial for maximizing the performance of the workloads the system is designed for. To guide these decisions, we must first understand the resource requirements of those workloads. At ORNL we have been investigating the feasibility of using telemetry collected from existing systems to better understand how users and their applications use those systems. We hope to use this data to build an understanding of resource usage that helps us prioritize the various components when planning future system procurements. In this talk I will give an overview of this effort and the challenges we have faced along the way.
Event Type
Workshop
Time
Sunday, 17 November 2024, 10:50am - 11:10am EST
Location
B304
Tags
Codesign
Data Movement and Memory
Facilities
Registration Categories
W