Recently, Google revealed more details about its next-generation TPU platform, Ironwood, showcasing its rack-level scalability and unprecedented AI computing power. First announced in April 2025, Ironwood marks a significant leap from TPU v4, setting a new benchmark in the competitive AI hardware landscape.
In just a few years, Google has increased single-chip performance more than tenfold, reflecting both the explosive growth in AI model computation demands and the relentless pace of chip design. The seventh-generation TPU, codenamed Ironwood, scales to a full Superpod of 9,216 chips, a system that Google says delivers 24 times the performance of today's most powerful supercomputer, pushing system-scale computing to new heights.
At the 2025 Hot Chips conference, Google disclosed that a single Ironwood chip achieves a peak performance of 4,614 TFLOPs—more than 16 times that of the 2022 TPU v4 and roughly 10 times that of the TPU v5p released in 2023. Ironwood comes with 192GB of high-bandwidth memory (HBM) and 7.4TB/s of memory bandwidth. By comparison, TPU v4 featured 275 TFLOPs, 32GB of HBM, and 1.2TB/s of bandwidth, while TPU v5p offered 459 TFLOPs, 95GB of HBM, and 2.8TB/s of bandwidth.
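The generational ratios implied by those figures can be checked with a few lines of arithmetic; the numbers below are exactly the peak-spec values quoted above, not independently measured results:

```python
# Generation-over-generation ratios from the quoted peak specs:
# peak TFLOPs, HBM capacity (GB), and HBM bandwidth (TB/s).
specs = {
    "TPU v4 (2022)":   {"tflops": 275,  "hbm_gb": 32,  "bw_tbs": 1.2},
    "TPU v5p (2023)":  {"tflops": 459,  "hbm_gb": 95,  "bw_tbs": 2.8},
    "Ironwood (2025)": {"tflops": 4614, "hbm_gb": 192, "bw_tbs": 7.4},
}

iron = specs["Ironwood (2025)"]
for name, s in specs.items():
    if name == "Ironwood (2025)":
        continue
    print(f"Ironwood vs {name}: "
          f"{iron['tflops'] / s['tflops']:.1f}x compute, "
          f"{iron['hbm_gb'] / s['hbm_gb']:.1f}x HBM capacity, "
          f"{iron['bw_tbs'] / s['bw_tbs']:.1f}x HBM bandwidth")
```

Running this gives 16.8x compute over TPU v4 and 10.1x over TPU v5p, consistent with the "more than 16 times" and "roughly 10 times" claims.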
The TPU pods have scaled dramatically over the generations: TPU v4 pods integrate up to 4,096 chips, TPU v5p up to 8,960, and Ironwood pods up to 9,216. Beyond single-chip performance, Ironwood represents a system-level innovation designed for extreme scalability.
Powerful chips need equally precise system engineering. Google built Ironwood around a modular, scalable architecture spanning chips, boards, racks, and pods. Each PCBA carries four Ironwood chips, and 16 such boards stack together to form a 64-chip TPU rack. For larger-scale deployments, Google uses its proprietary Inter-Chip Interconnect (ICI), carried over PCB traces, copper cables, and optical links, to join multiple racks into a single Superpod.
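The hierarchy described above fixes the system's scale by simple multiplication; the sketch below just works out those totals from the article's own numbers (the exaflops figure is a naive peak-spec product, not a measured benchmark):

```python
# Topology implied by the article: 4 chips per board,
# 16 boards per rack, racks linked by ICI into a Superpod.
CHIPS_PER_BOARD = 4
BOARDS_PER_RACK = 16
SUPERPOD_CHIPS = 9216
PEAK_TFLOPS_PER_CHIP = 4614

chips_per_rack = CHIPS_PER_BOARD * BOARDS_PER_RACK      # 64 chips
racks_per_superpod = SUPERPOD_CHIPS // chips_per_rack   # 144 racks

# Naive peak aggregate: chips x per-chip TFLOPs, in exaflops.
superpod_eflops = SUPERPOD_CHIPS * PEAK_TFLOPS_PER_CHIP / 1e6

print(f"{chips_per_rack} chips per rack")
print(f"{racks_per_superpod} racks per Superpod")
print(f"~{superpod_eflops:.1f} EFLOPs peak per Superpod")
```

So a full 9,216-chip Superpod corresponds to 144 racks and roughly 42.5 exaflops of peak compute.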
Managing such immense compute power presents significant energy and cooling challenges. Ironwood racks are equipped with advanced liquid cooling systems to maintain efficiency and reliability.
Ironwood represents Google's largest and most powerful AI computing engine to date, optimized for AI training and inference workloads—especially inference for Mixture of Experts (MoE) models. It is also the first TPU whose tensor cores and matrix units support FP8 computation; earlier generations supported INT8 for inference and BF16 for training.
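Why FP8 matters is easy to quantify: an 8-bit float halves memory and bandwidth per value relative to BF16 while keeping a floating-point dynamic range, unlike INT8. Google has not said which FP8 variant Ironwood implements, so the sketch below shows the two common OCP formats (E4M3 and E5M2) and uses an illustrative 70B-parameter model for the footprint comparison:

```python
# Maximum finite values and per-value width for common ML formats.
# fp8_e4m3 / fp8_e5m2 maxima are from the OCP FP8 spec; which one
# Ironwood uses is not disclosed, so both appear here (assumption).
FORMATS = {
    "fp8_e4m3": {"bits": 8,  "max_finite": 448.0},
    "fp8_e5m2": {"bits": 8,  "max_finite": 57344.0},
    "bfloat16": {"bits": 16, "max_finite": 3.39e38},
    "int8":     {"bits": 8,  "max_finite": 127.0},
}

PARAMS = 70e9  # illustrative 70B-parameter model (assumption)

for name, f in FORMATS.items():
    gib = PARAMS * f["bits"] / 8 / 2**30  # weight footprint in GiB
    print(f"{name:9s} max={f['max_finite']:.4g}  "
          f"70B weights ~ {gib:.0f} GiB")
```

At 8 bits per weight, a 70B model's weights occupy about 65 GiB versus about 130 GiB in BF16, which is why FP8 pairs naturally with Ironwood's 192GB of HBM for large-model inference.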
Additionally, Ironwood chips include the third-generation SparseCore accelerator, initially introduced in the TPU v5p in 2023 and further enhanced in the 2024 Trillium chip. SparseCore was designed to accelerate recommendation models that use embeddings across multiple user categories.