Presentation

Accelerating Machine Learning with Tensor Processing Units: A Journey of Full-stack Optimization and Co-design
Description

Inspired by the success of the first TPU for ML inference, deployed in 2015, Google has developed multiple generations of machine learning supercomputers for efficient ML training and serving, enabling near-linear scaling of ML workloads. In this talk, we will present how the TPU works as a machine learning supercomputer to benefit a growing number of Google services, including Gemini and Ads. We will also take a deep dive into our full-stack co-design methodology, which spans the model, software, and hardware layers, and show how it turns accelerator concepts into reality.