Google plans to launch dedicated inference chips: after a decade of work, TPUs now challenge Nvidia's dominance across the board.

Google is pushing its in-house chip business onto a new competitive front. **After striking large-scale agreements with Meta and Anthropic, it plans to launch custom chips designed specifically for AI inference, a further challenge to Nvidia's market dominance.** According to Bloomberg, Google plans to unveil a new generation of its tensor processing units (TPUs) at the Google Cloud Next conference in Las Vegas this week. Jeff Dean, Google's chief scientist, said in an interview that with demand rising for fast processing of AI queries, "it is now reasonable to design chips more specialized for training or inference workloads."

The move comes as the AI chip market evolves at an accelerating pace. Nvidia's GPUs remain the industry benchmark for AI, especially for model training, but competition in the inference market is growing increasingly fierce. Chirag Dekate, an analyst at market research firm Gartner, said: **"The battlefield is shifting toward inference, and Google has an infrastructure advantage in this area."**

**From Internal Tool to Industry Hit: The Breakout Journey of TPUs**

Google has been building its own chips for more than a decade. The effort began with a practical problem: Google needed computing power for its language translation and speech recognition services, but the chips and hardware available on the market could not deliver it at an affordable cost. According to Google's Amin Vahdat, the core idea behind TPUs is to "solve a small set of problems, but those problems require enormous computational power." The prevailing view at the time was that custom hardware wasn't worth developing, but Google went against the tide. Throughout, its chip R&D has evolved in close step with its AI model work.
A landmark 2017 research paper, which introduced the Transformer architecture behind today's large language models, prompted the TPU team to focus on chip designs for training larger AI systems. Later, Google DeepMind and the chip team noticed that TPUs sat idle for much of the time during reinforcement learning tasks, so they reworked the network interconnect between chips to speed up data flow and keep compute from idling. This internal feedback loop has also given Google tighter control over "hardware-level errors." Paul Barham, a Google scientist and co-lead of the Gemini infrastructure team, said that when AI accelerator chips process massive volumes of mathematical operations, even a small fault can propagate and cause a model to "collapse." "We can now check tens of thousands of accelerator chips in 10 seconds," he said.

**Major Customers Sign On as Commercial Momentum Builds**

Google's chip business has also made rapid commercial progress. In October last year, **Anthropic announced an expanded agreement with Google giving it access to up to one million TPUs**; soon after, Google's Gemini model, trained and served on TPUs, drew widespread acclaim. Demand has grown further since then. **Meta has signed a multi-billion-dollar, multiyear TPU cloud services agreement.** Santosh Janardhan, Meta's head of infrastructure, said, "It looks like there may be an advantage in inference," but noted that "a new platform inevitably has thresholds and learning curves." Trading firm Citadel Securities plans to describe at this week's conference how it trained models faster on TPUs than with its previous GPU setup. Talal Al Kaissi, interim CEO of Core42, the cloud unit of Abu Dhabi tech group G42, said the company has held "multiple rounds of discussions" with Google about using TPUs and is optimistic.
Improvements to the software ecosystem are proceeding in parallel. Google now lets TPU customers use external tools such as PyTorch and third-party scheduling software, so they no longer have to depend exclusively on Google's own products. It is also testing letting partners such as Anthropic deploy some TPUs in their own data centers rather than in Google's facilities.

**Nvidia's Strong Countermove and a Rebalancing Market**

Nvidia has not stayed silent in the face of Google's advance. Last month it launched an inference chip built with technology acquired from Groq. **Jensen Huang emphasized the chip's versatility, claiming it can accomplish "many applications that TPUs cannot handle."**

In practice, Google itself relies on both TPUs and GPUs. Google DeepMind CEO Demis Hassabis said top AI labs are keenly interested in TPUs and that "many want to run on both platforms." Google's advantages lie in more than a decade of chip design experience, deep pockets, and firsthand insight into AI models. Among leading AI developers, Google is the only one designing its own chips at scale, which allows efficient two-way feedback between its hardware and model teams. Natalie Serrino, co-founder of Gimlet Labs, said current TPUs are well suited to the workloads of emerging AI agents: "they are good tools for these bursty tasks."
**A Three-Year R&D Cycle Versus Rapid AI Iteration**

The main constraint on Google's chips is that **it takes about three years to go from R&D to mass production, while AI models evolve far faster, which makes it extremely difficult to anticipate customers' future needs accurately.** Barham raised another concern: **the feedback loop between the hardware and model teams may be too tight**, which could **lead the teams to optimize only the fit between current software and hardware and miss more groundbreaking ideas.** To balance the two, the TPU team sometimes designs a chip to be "good enough" across a range of scenarios rather than fully optimizing it for a single use; another strategy is to pursue two different designs in parallel and commit to one once specific needs become clear.

Vahdat's words may best summarize the thinking behind Google's long-term chip strategy:

> "Producing TPUs only for Google has its benefits, but also significant drawbacks. Ultimately, you end up stuck on a so-called 'technology island.' It might be a beautiful island, but with limited residents, limited diversity, and in the end, it might inhibit development."