Trillium: Google’s TPU Powerhouse Behind Its New AI Models

May 20, 2024, by Lisa Boneta

Google's sixth-generation tensor processing unit (TPU) took center stage at the company's I/O developer conference with its highest-ever compute performance.

At Google's I/O developer conference, the company announced its sixth-generation tensor processing unit (TPU), Trillium, its most advanced TPU to date. With the rise of generative AI tools comes the need for hardware that delivers more compute, memory capacity, and interconnect bandwidth.



Sundar Pichai announces Google’s sixth-generation TPU at the Google I/O developer conference. 

Trillium delivers a 4.7x increase in peak compute performance per chip over Google's TPU v5e. Google attributes this jump to Trillium's larger matrix multiply units (MXUs) and their higher clock speed.


Google Introduces Its Sixth-Gen TPU

Google introduced its first tensor processing unit (TPU) in 2015 as an AI accelerator application-specific integrated circuit (ASIC) for machine learning workloads. Although the chip was initially intended only for internal use, TPUs later became available on Google Cloud as a web service for scalable computing resources.

Google has now announced six generations of TPUs built to support machine learning applications. Its latest offering, Trillium, promises to serve and train large models with better performance and efficiency. Trillium stands apart from previous generations with its expanded matrix multiply units (MXUs) and the integration of Google's latest dataflow processor, SparseCore.


More Bandwidth, Larger Models, Less Power

Google has doubled Trillium's high-bandwidth memory (HBM) capacity and bandwidth to accommodate larger models. Its higher power efficiency and memory throughput improve training time and latency for these models. Trillium also doubles the interchip interconnect (ICI) bandwidth over TPU v5e, enabling training to scale to tens of thousands of chips.


Google's TPU VM architecture.

Trillium notably includes Google's third-generation SparseCore, a dataflow processor that accelerates models specialized for ultra-large embeddings in advanced ranking and recommendation workloads. This means that Trillium can help train the next wave of foundation models faster while reducing latency and cost.

Trillium is also Google's most sustainable TPU to date, at over 67% more energy-efficient than TPU v5e.
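To see what that efficiency figure implies, here is a back-of-the-envelope sketch. It assumes "67% more energy-efficient" means Trillium does 1.67x the work per joule of a TPU v5e; the interpretation and the factor are readings of Google's claim, not published measurements.

```python
# Back-of-the-envelope reading of "over 67% more energy-efficient":
# assume efficiency = work per joule, so Trillium does 1.67x the work
# per joule of TPU v5e (an interpretation, not an official spec).
EFFICIENCY_GAIN = 1.67  # assumed Trillium work-per-joule relative to v5e

def relative_energy(work: float = 1.0, gain: float = EFFICIENCY_GAIN) -> float:
    """Energy Trillium needs for a fixed workload, as a fraction of v5e's."""
    return work / gain

print(f"{relative_energy():.0%} of v5e's energy for the same work")  # ~60%
```

Under that reading, the same training job would consume roughly 60% of the energy it would on v5e.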


Trillium Leverages Google's Intelligent Processing Units

Trillium scales to 256 chips in a single pod. Using Google's Multislice technology and Titanium intelligent processing units (IPUs), it can scale across hundreds of pods, connecting tens of thousands of chips.

Cloud TPU Multislice is a full-stack, performance-scaling technology that allows training jobs to use multiple TPU slices within a single pod, or slices across multiple pods, with data parallelism. TPU chips deployed in Multislice configurations communicate through ICI. Performance scales nearly linearly with the number of pod slices deployed.
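The scaling arithmetic above can be sketched in a few lines. The 256-chips-per-pod figure comes from the article; the per-chip throughput and the scaling-efficiency factor below are illustrative assumptions, not published numbers.

```python
# Sketch of Multislice scaling arithmetic.
# 256 chips per pod is from Google's announcement; per-chip throughput
# and scaling efficiency below are illustrative assumptions.
CHIPS_PER_POD = 256

def total_chips(num_slices: int, chips_per_slice: int = CHIPS_PER_POD) -> int:
    """Chips available when data parallelism spans `num_slices` pod slices."""
    return num_slices * chips_per_slice

def aggregate_throughput(per_chip_tflops: float, num_slices: int,
                         scaling_efficiency: float = 0.95) -> float:
    """Near-linear scaling model: efficiency < 1 stands in for the
    communication overhead that keeps scaling from being perfectly linear."""
    return per_chip_tflops * total_chips(num_slices) * scaling_efficiency

# "Hundreds of pods" reaching tens of thousands of chips:
print(total_chips(100))  # 25600 chips across 100 pod slices
```

A hundred full pod slices already exceed 25,000 chips, consistent with the "tens of thousands" figure; the efficiency factor is the knob that models how close to linear the scaling stays.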


TPU chip performance scales nearly linearly with pod slice count.


Paired with TPU Multislice, Google's Titanium IPU uses a modern offload architecture that runs processing outside of a workload's host, freeing up host resources to deliver higher compute performance.


Supporting Gemini 1.5 Flash, Imagen 3, and Gemma 2

Trillium was the foundation for several other announcements made at Google I/O, including the new Gemini 1.5 Flash, Imagen 3, and Gemma 2 models, each of which was trained using TPUs. Paired with Trillium, these models can serve applications such as AI assistants, video and photo generation, and open vision-language models, enhancing the quality of Google's generative AI offerings.

Developers interested in using Trillium for their AI workloads will be able to access it exclusively on Google Cloud later this year.



All images used courtesy of Google.