Performance rivals Blackwell, energy efficiency crushes GPUs: an in-depth analysis of Google TPU's "real combat power"

In AI computing, Nvidia looks like the unrivaled hegemon. But away from the spotlight, Google is redefining the rules of the AI chip war in a way that is both discreet and highly disruptive.

Its trump card is the self-developed TPU (Tensor Processing Unit).

If you think this is just Google’s cheap “spare tire,” you’d be sorely mistaken. According to the latest detailed disclosures, Google’s newest TPU v7 (codename Ironwood) not only matches Nvidia’s B200 in memory capacity but also beats GPUs decisively on energy efficiency. Even Jensen Huang himself has hinted that, among ASICs, Google’s TPU is a “special presence.”

From TPU v6 (Trillium) to the newly revealed TPU v7 (Ironwood), Google is not just making chips—it's building an almost insurmountable moat for the impending “AI inference era.”

Origins: A Rescue Born of Necessity

The story of the TPU did not begin with a breakthrough in chip manufacturing, but rather with a math problem that sent chills down the spines of Google’s executives.

In 2013, Jeff Dean and the Google Brain team ran a simulation: if every Android user used voice search for just three minutes a day, Google would need to double the capacity of its global data centers just to handle the computing load.

At that time, Google relied on general-purpose CPUs and GPUs, but these chips were terribly inefficient at the massive matrix multiplications needed for deep learning. Expanding with old hardware would have been a financial and logistical nightmare.

So Google decided to take an untrodden path: designing an ASIC (application-specific integrated circuit) tailored to TensorFlow neural networks.

This project progressed rapidly, taking only 15 months from conceptual design to data center deployment. By 2015, while the outside world knew nothing, TPUs were quietly supporting Google Maps, Photos, Translate, and other core businesses.

Architectural Battle: Ditching the “Baggage” So Data Flows Like Blood

Why can the TPU thrash GPUs in energy efficiency? It starts with the underlying architecture.

GPUs were designed as general-purpose parallel processors for graphics, built to handle everything from gaming textures to scientific simulations. They therefore carry heavy “architectural baggage,” such as complex caches, branch prediction, and thread management, which consumes significant chip area and power.

By contrast, the TPU is minimalism taken to the extreme. It strips away irrelevant hardware such as rasterization and texture mapping and is built around a “systolic array” architecture.

In a conventional GPU, calculations repeatedly shuttle data between memory and compute units, which leads to the infamous “von Neumann bottleneck.” In the TPU’s systolic array, data instead flows across the chip the way blood is pumped by the heart: each operand is handed from one compute unit directly to the next and reused along the way. This drastically reduces HBM (High Bandwidth Memory) reads and writes, so the chip spends its time computing rather than waiting for data.

This design gives the TPU a crushing advantage in “operations per joule.”
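To make the data-reuse idea concrete, here is a minimal Python sketch of a systolic array multiplying two small matrices. It is an illustration under stated assumptions, not Google’s design: it models an output-stationary array (each cell keeps its own running sum), chosen only because it is the simplest variant to simulate, and the function name systolic_matmul and the edge_reads counter are hypothetical.

```python
def systolic_matmul(A, B):
    """Simulate an output-stationary systolic array computing C = A x B.

    Rows of A enter from the left edge, columns of B enter from the top edge.
    Every cycle each cell multiplies the two values passing through it, adds
    the product to its accumulator, then hands the values to its right and
    lower neighbours. Returns (C, edge_reads), where edge_reads counts how
    many operands were fetched from "memory" at the array edges.
    """
    n, k, m = len(A), len(A[0]), len(B[0])
    acc = [[0] * m for _ in range(n)]    # one accumulator per cell
    a_reg = [[0] * m for _ in range(n)]  # A values travelling rightwards
    b_reg = [[0] * m for _ in range(n)]  # B values travelling downwards
    edge_reads = 0

    for t in range(n + m + k - 2):       # cycles until the last operand drains
        new_a = [[0] * m for _ in range(n)]
        new_b = [[0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                if j == 0:               # left edge: fetch A, row i delayed by i cycles
                    s = t - i
                    if 0 <= s < k:
                        new_a[i][0] = A[i][s]
                        edge_reads += 1
                else:                    # interior: take the value from the left neighbour
                    new_a[i][j] = a_reg[i][j - 1]
                if i == 0:               # top edge: fetch B, column j delayed by j cycles
                    s = t - j
                    if 0 <= s < k:
                        new_b[0][j] = B[s][j]
                        edge_reads += 1
                else:                    # interior: take the value from the upper neighbour
                    new_b[i][j] = b_reg[i - 1][j]
        a_reg, b_reg = new_a, new_b
        for i in range(n):
            for j in range(m):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc, edge_reads


A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
C, reads = systolic_matmul(A, B)
# Upper bound for a scheme with no operand reuse: two fetches per multiply-add.
no_reuse_fetches = 2 * len(A) * len(B[0]) * len(B)
print(C, reads, no_reuse_fetches)
```

In this toy case each input element is fetched once at the edge (12 reads), against 24 fetches for a scheme that reloads both operands for every multiply-add. The no-reuse figure is only an upper bound for comparison; real GPUs also reuse data through registers and caches.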

Head-to-Head With Blackwell: The Terrifying Data of TPU v7

Google is always secretive about performance data, but according to SemiAnalysis and insider leaks, its latest TPU v7 (Ironwood) shows a staggering generational leap.

Explosive computing power: TPU v7’s BF16 compute reaches 4,614 TFLOPS, while the widely used previous-generation TPU v5p delivered only 459 TFLOPS. That is roughly a tenfold increase.

Memory matching B200: Per-chip HBM capacity reaches 192 GB, the same as Nvidia’s Blackwell B200 (Blackwell Ultra has 288 GB).

Bandwidth surge: Memory bandwidth reaches 7,370 GB/s, far above v5p’s 2,765 GB/s.
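The generational ratios implied by these figures can be checked with a few lines of Python. This is a back-of-the-envelope sketch that takes the article’s quoted per-chip numbers at face value; the “ridge point” (peak FLOPs divided by peak memory bytes per second) is a standard roofline-style quantity added here for illustration and is not a figure from the disclosures.

```python
# Per-chip figures as quoted in the article (not independently verified).
chips = {
    "TPU v5p": {"tflops": 459, "hbm_gb_s": 2765},
    "TPU v7": {"tflops": 4614, "hbm_gb_s": 7370},
}


def ridge_point(tflops, hbm_gb_s):
    """Peak FLOPs available per byte fetched from HBM (roofline ridge point)."""
    return (tflops * 1e12) / (hbm_gb_s * 1e9)


old, new = chips["TPU v5p"], chips["TPU v7"]
compute_ratio = new["tflops"] / old["tflops"]
bandwidth_ratio = new["hbm_gb_s"] / old["hbm_gb_s"]

print(f"compute:   {compute_ratio:.2f}x")
print(f"bandwidth: {bandwidth_ratio:.2f}x")
for name, spec in chips.items():
    print(f"{name}: about {ridge_point(**spec):.0f} FLOPs per HBM byte")
```

On these numbers compute grows about 10x while bandwidth grows about 2.7x, so a workload needs more arithmetic per byte fetched from HBM to keep the newer chip busy. That is one reason on-chip data reuse matters so much.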

For interconnect technology, Google uses Optical Circuit Switches (OCS) and a 3D torus network.

Compared with Nvidia’s InfiniBand, OCS is extremely cost- and power-efficient because the switch routes light directly and eliminates optical-to-electrical conversion. This sacrifices some flexibility, but for specific AI tasks paired with Google’s compiler, the efficiency is unrivaled.
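For readers unfamiliar with the 3D torus topology mentioned above, the sketch below shows its defining property: every chip has six neighbours, and each axis wraps around, so the far edge is one hop from the near edge. The 4x4x4 dimensions and the function names are illustrative assumptions, not Google’s actual pod layout or any real API.

```python
def torus_neighbors(coord, dims):
    """Return the six direct neighbours of a chip in a 3D torus.

    coord is (x, y, z); dims is the size of the torus along each axis.
    The modulo gives the wraparound link at each edge.
    """
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % dims[axis]
            neighbors.append(tuple(n))
    return neighbors


def torus_hops(a, b, dims):
    """Minimum hop count between two chips, taking the shorter way round each axis."""
    return sum(min((p - q) % d, (q - p) % d) for p, q, d in zip(a, b, dims))


dims = (4, 4, 4)  # hypothetical 64-chip block, for illustration only
print(torus_neighbors((0, 0, 0), dims))
print(torus_hops((0, 0, 0), (3, 3, 3), dims))
```

With wraparound links, the opposite corner (3, 3, 3) is 3 hops from (0, 0, 0); in a plain 3D mesh without wraparound it would be 9.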

Even more noteworthy is energy efficiency. At Hot Chips 2025, Google revealed that v7’s performance per watt is double that of v6e (Trillium). A former Google executive put it bluntly: “For specific applications, TPU can offer 1.4x better performance-per-dollar than GPU.” For dynamic model training (such as search-type workloads), TPUs can be as much as five times faster than GPUs.

Escaping the “Nvidia Tax” and Returning to the High-Margin Era

For investors and cloud vendors, the biggest value of TPU isn’t just speed—it’s profitability.

In the AI age, cloud giants risk sliding from an oligopoly into a commodity business. Because they are forced to buy Nvidia’s GPUs, as much as 75% of the margin goes to Nvidia, and cloud vendors’ AI business margins have plummeted from the traditional 50–70% to just 20–35%, leaving them functioning almost like “utility companies” collecting tolls.

How to return to high margins? Self-developed ASICs are the only cure.

By controlling the full TPU design stack (Google writes the front-end RTL itself, and Broadcom handles only the physical implementation), Google bypasses the “Nvidia tax.” Broadcom’s profit margin is also much lower than Nvidia’s, which lets Google push computing costs as low as they can go.

One customer said after comparing the two:

If I use eight H100s versus a v5e Pod, the latter not only has better performance-per-dollar, but as Google releases new generations of TPUs, the old versions don’t get retired—instead, they become incredibly cheap.

Sometimes if you’re willing to wait a few extra days for training, the cost might drop to a fifth of the original.

TPUs still face ecosystem challenges (CUDA’s dominance) and multi-cloud deployment hurdles (data migration costs). But as AI workloads shift from “training” to “inference,” CUDA’s importance is diminishing.

SemiAnalysis’ commentary is spot on:

Among hyperscale computing vendors, Google’s chip dominance is unchallenged. TPU v7’s performance puts it on par with Nvidia Blackwell.

In the trillion-dollar AI computing arms race, Nvidia leads, but Google—armed with the TPU sword—may be the only player able to fully control its own destiny.
