Presentation
Arista EtherLink™ AI Networking
Description
The Arista EtherLink AI networking platform introduces a modular, scalable architecture with a unified lossless dataplane across distributed, independent components. Each component, equipped with its own control and data planes, is interconnected via high-speed Ethernet links.
By embracing Ethernet, it ensures interoperability and optimizes for currently available, cost-efficient RDMA NICs.
Key technical benefits:
- Scalable AI Networking: EtherLink losslessly connects 32,000 XPUs in one cluster, or 100,000 across multiple data centers at 800 Gbps.
- Proven Reliability and Performance: The system has been field-tested by hyperscalers and is validated through large-scale simulations. Simulations indicate a 10% to 30% improvement in job completion time.
- Fast Failure Recovery: EtherLink uses hardware-accelerated link fault detection and repair for millisecond-level recovery.
- EOS Integration: Arista's EOS® provides centralized control, telemetry, and quality of service (QoS) management, supporting configuration, monitoring, and debugging of AI/ML workloads from network to NIC.
At its core, EtherLink incorporates Broadcom's Jericho3 packet processor and Ramon3 fabric chips. These components target the requirements of AI/ML, offering advanced load balancing, congestion management, and fault resilience.
This presentation discusses the design, motivations, and technical foundations of the EtherLink solution.
Event Type
Exhibitor Forum
Time
Thursday, 21 November 2024, 4pm - 4:30pm EST
Location
B206
Hardware Technologies
Network
TP
XO/EX
