Presentation
An Evaluation of the Effect of Network Cost Optimization for Leadership Class Supercomputers
Description
The Dragonfly is an extensively deployed network topology in large-scale high-performance computing (HPC) due to its cost-effectiveness and efficiency. Compared with other indirect networks of similar scale, the Dragonfly network has shown a considerable reduction in cable lengths and network costs. Three of the deployed and upcoming Exascale supercomputers for leadership-class workloads are built on Dragonfly networks.
It is therefore imperative to evaluate its performance across a broad range of HPC workloads to facilitate optimal system procurement. While previous work has focused on understanding the topology from a capacity-computing workload perspective, this study assesses extreme-scale leadership workloads on a Dragonfly network. To accomplish this, we conduct a comprehensive evaluation of various workload efficiencies using the state-of-the-art Slingshot 11 Dragonfly topology and compare it against the Summit supercomputer's EDR InfiniBand non-blocking fat-tree. These evaluations are conducted using resources at the OLCF (Frontier and Summit).
Event Type
Paper
Time
Tuesday, 19 November 2024, 3:30pm - 4pm EST
Location
B312-B313A
Accelerators
HPC Infrastructure
Performance Evaluation and/or Optimization Tools
State of the Practice
TP
Best Paper Finalist