Costs drop by 70%! Google TPU makes a strong push, now matches NVIDIA in price-performance

As AI capital expenditure remains high but commercialization pressure continues to rise, the market's focus is quietly but profoundly shifting: Can large models continue to “run regardless of cost”?

According to Chase Wind Trading Desk, Goldman Sachs' latest AI chip research report sets aside the market’s familiar comparisons of “compute power, process technology, parameter count” and instead takes a perspective closer to commercial reality: unit cost during inference. By constructing an “inference cost curve,” Goldman Sachs attempts to answer a critical question for the AI industry: once models enter high-frequency use, and given constraints such as depreciation, energy consumption, and system utilization, what does it really cost to process one million tokens on each chip solution?
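To make the idea of a per-token cost concrete, here is a minimal sketch of how such a unit cost could be computed. It is not Goldman Sachs' model, which the article does not disclose; the function name, the formula (hourly depreciation plus hourly energy, divided by tokens actually served), and every input value are illustrative assumptions. It also assumes the system draws full power regardless of utilization.

```python
HOURS_PER_YEAR = 24 * 365


def cost_per_million_tokens(capex_usd, lifetime_years, power_kw,
                            usd_per_kwh, tokens_per_second, utilization):
    """Hypothetical unit cost in USD per one million tokens.

    Assumptions (not from the report): straight-line depreciation,
    constant power draw, and throughput that scales with utilization.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    depreciation_per_hour = capex_usd / (lifetime_years * HOURS_PER_YEAR)
    energy_per_hour = power_kw * usd_per_kwh
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return (depreciation_per_hour + energy_per_hour) / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    # Placeholder inputs chosen only to exercise the formula.
    example = cost_per_million_tokens(
        capex_usd=3_000_000, lifetime_years=4, power_kw=120,
        usd_per_kwh=0.08, tokens_per_second=500_000, utilization=0.6,
    )
    print(f"Illustrative cost: ${example:.4f} per million tokens")
```

Under this simplified model, halving utilization doubles the cost per token, which is why system utilization sits alongside depreciation and energy as a constraint.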

The research points to a change that is accelerating but not yet fully absorbed by the market: Google/Broadcom’s TPU is rapidly narrowing the inference cost gap with Nvidia’s GPUs. From TPU v6 to TPU v7, the inference cost per token falls by about 70%, bringing its absolute cost roughly level with Nvidia’s GB200 NVL72 and, under some estimates, slightly below it.
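The arithmetic behind a 70% decline can be sketched briefly. The index value of 100 for TPU v6 is an arbitrary assumption used only for scale, since the article gives no absolute dollar figures; only the roughly 70% decline comes from the report as cited.

```python
def cost_after_decline(baseline_cost, decline_fraction):
    """Return the cost remaining after a fractional decline."""
    return baseline_cost * (1.0 - decline_fraction)


v6_index = 100.0                                 # assumed index, not a real price
v7_index = cost_after_decline(v6_index, 0.70)    # about 30 on the same index
v6_to_v7_ratio = v6_index / v7_index             # about 3.3x

print(f"TPU v7 index: {v7_index:.1f}")
print(f"TPU v6 cost relative to v7: {v6_to_v7_ratio:.2f}x")
```

Read this way, if TPU v7 is roughly level with GB200 NVL72, the same index implies TPU v6 cost about 3.3 times as much per token as GB200 NVL72, which is a back-of-the-envelope inference rather than a figure from the report.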

This does not mean Nvidia’s position is shaken, but it clearly shows that the core evaluation system for AI chip competition is shifting from “who computes faster” to “who computes cheaper and more sustainably”. As training gradually becomes upfront investment and inference a long-term cash flow source, the slope of the cost curve is replacing peak compute power as the key variable that determines industry structure.

1. From Compute Leadership to Cost Efficiency, AI Chip Competition Standards Are Switching

In the early stage of AI development, training compute decided everything: whoever could train bigger models faster set the technological agenda. However, as large models move into deployment and commercialization, inference workloads are beginning to far exceed training workloads, which rapidly magnifies the cost problem.

Goldman Sachs notes that at this stage, chip cost-effectiveness is no longer determined solely by single-card performance, but also shaped by system-level efficiency: including compute density, interconnect efficiency, memory bandwidth, and energy consumption. The cost curve built on this logic shows that Google/Broadcom TPUs’ advances in raw compute and system efficiency are enough to compete head-on with Nvidia in terms of cost.

By contrast, AMD and Amazon Trainium have more limited generational cost declines. Current calculations show both have notably higher unit inference costs than Nvidia and Google solutions, so their impact on mainstream markets is relatively limited.

2. Behind TPU’s Cost Leap Is System Engineering Capability, Not Single-Point Breakthrough

TPU v7’s substantial cost reduction does not stem from a single technical breakthrough, but from the concentrated release of system-level optimization capabilities. Goldman Sachs believes that as compute chips themselves approach physical limits, future cost reduction in inference will increasingly rely on advances in “adjacent computation technologies.”

These include higher-bandwidth, lower-latency networking, ongoing integration of high-bandwidth memory (HBM) and storage solutions, advanced packaging (such as TSMC CoWoS), and improvements in rack-level solution density and energy efficiency. TPU’s coordinated optimization in these areas gives it a clear economic advantage in inference scenarios.

This trend aligns closely with Google’s own compute deployment. TPU usage in Google’s internal workloads continues to rise, and TPUs are widely used for Gemini model training and inference. At the same time, external customers with mature software capabilities are rapidly adopting TPU solutions; the most notable case is Anthropic’s approximately $21 billion order with Broadcom, with products expected to begin delivery in mid-2026.

However, Goldman Sachs also emphasizes that Nvidia still holds the “go-to-market timing” advantage. While TPU v7 has just caught up with GB200 NVL72, Nvidia has already moved to GB300 NVL72 and plans to deliver VR200 NVL144 in the second half of 2026. Its continual product iteration remains a key factor in maintaining customer stickiness.

3. Investment Implications Rebalanced: ASICs Rise, but Nvidia’s Moat Remains Intact

From an investment perspective, Goldman Sachs has not downgraded its view on Nvidia due to TPU’s rapid catch-up. The firm still maintains a buy rating on Nvidia and Broadcom, believing both are most directly tied to the most sustainable part of AI capex, and will benefit long-term from network, packaging, and system-level technological upgrades.

Within the ASIC camp, the case for Broadcom as a beneficiary is especially clear. Goldman Sachs has raised its FY2026 EPS forecast for the company to $10.87, about 6% above market consensus, and believes Broadcom’s long-term profitability in AI networking and custom compute is still underestimated.
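As a rough check on what “about 6% above market consensus” implies, the consensus figure can be backed out from the two numbers the article gives. Because the 6% is itself rounded, the result is only approximate, and the variable names are illustrative.

```python
gs_fy2026_eps = 10.87          # Goldman Sachs forecast, from the article
premium_to_consensus = 0.06    # "about 6%", a rounded figure

# If forecast = consensus * (1 + premium), then consensus = forecast / (1 + premium).
implied_consensus_eps = gs_fy2026_eps / (1 + premium_to_consensus)

print(f"Implied consensus EPS: about ${implied_consensus_eps:.2f}")
```

This puts consensus at roughly $10.25, assuming the premium is measured relative to the consensus figure.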

AMD and Amazon Trainium are still in the catch-up phase, but Goldman Sachs also points out that AMD’s rack-level solutions could give it a late-mover advantage. By the end of 2026, the Helios rack based on MI455X may achieve a 70% cost reduction in some training and inference scenarios, a development worth watching.

More importantly, the report does not reach a “winner-take-all” conclusion. Instead, it describes an industry division of labor that is gradually coming into focus: GPUs continue to dominate training and general-purpose compute, while custom ASICs keep penetrating scalable, predictable inference workloads. In this process, Nvidia’s CUDA ecosystem and system-level R&D investment remain a robust moat, but its valuation logic must also continuously withstand the reality test of falling inference costs.

When AI truly enters a stage where “every token needs to return value,” the competition for compute inevitably returns to economics. The 70% cost drop for TPU is not a simple technical chase, but a critical stress test for the feasibility of AI’s commercial model. This may well be the most important signal behind the GPU vs. ASIC debate that the market should pay close attention to.

~~~~~~~~~~~~~~~~~~~~~~~~

The above content is from Chase Wind Trading Desk.

Risk Warning and Disclaimer: Markets carry risk, and investment requires caution. This article does not constitute personal investment advice, nor does it take into account the individual investment goals, financial situation, or needs of specific users. Users should consider whether any opinions, views, or conclusions herein are suitable for their own circumstances. Investing based on this content is at one’s own risk.