Codename "TorchTPU": Google and Meta join forces to replicate CUDA, raising the pressure on NVIDIA
```
Google is advancing an internal initiative called "TorchTPU" to make its AI chips work better with PyTorch, the world's most widely used AI software framework. The move directly targets the software ecosystem that has long served as NVIDIA's moat.
According to a Bloomberg report on Thursday, sources said Google is working closely with Meta on the plan. Meta, which created PyTorch and remains one of its key maintainers, hopes to lower inference costs and diversify its AI infrastructure, giving it more leverage in negotiations with NVIDIA. Google is also considering open-sourcing parts of the software to speed up customer adoption.
Compared with its earlier efforts to support PyTorch, Google is devoting more organizational resources and strategic emphasis to the project this time. As more companies want to adopt its Tensor Processing Unit (TPU) chips but see the software stack as a bottleneck, the initiative has become central to Google Cloud's growth plans.
If successful, TorchTPU would significantly reduce the cost for companies of switching from NVIDIA GPUs to alternatives. NVIDIA's dominance rests not only on its hardware but, more importantly, on its CUDA software ecosystem, which is deeply embedded in PyTorch and has become the default way enterprises train and run large AI models.
Software Compatibility Becomes the Biggest Barrier to TPU Adoption
Google's TorchTPU initiative aims to remove key barriers to TPU adoption. According to sources, enterprise customers have repeatedly told Google that TPUs are harder to adopt for AI workloads because developers historically had to switch to JAX, the machine learning framework Google favors internally, instead of the PyTorch framework that most AI developers already use.
This mismatch stems from Google's own technical path. Google's internal software teams have long used the JAX framework, and its TPU chips rely on the XLA compiler to execute code efficiently. Because Google's own AI software stack and performance tuning are built primarily around JAX, a gap has widened between how Google uses its chips and what customers need.
By contrast, NVIDIA engineers have spent years making sure that software built with PyTorch runs as fast and as efficiently as possible on their chips. PyTorch is an open-source project, and its development has been closely tied to NVIDIA's CUDA software, which some Wall Street analysts regard as NVIDIA's strongest shield against competitors.
Google Accelerates External Sales of TPUs
Alphabet long kept the vast majority of its TPU chips for internal use. That changed in 2022, when Google's cloud computing unit successfully lobbied to take over the team that sells TPUs, significantly increasing Google Cloud's TPU allocation.
As customer interest in AI grows, Google has sought to profit by boosting TPU production and external sales. TPU sales have become a key driver of Google Cloud's revenue as the company works to show investors that its AI investments are paying off.
This year, Google began selling TPUs directly for installation in customers' own data centers, rather than offering access only through its cloud. This month, Google veteran Amin Vahdat was named head of AI infrastructure, reporting directly to CEO Sundar Pichai. Google needs this infrastructure both to run its own AI products, including the Gemini chatbot and AI-powered search, and to supply Google Cloud customers such as Anthropic.
Meta Becomes a Strategic Partner
To speed up development, Google is working closely with Meta. According to The Information, the two companies have also discussed deals under which Meta would acquire more TPUs.
Sources said the early services offered to Meta followed a Google-managed model: customers such as Meta install Google-designed chips that run Google software and models, with Google providing operational support. Meta has a strategic interest in developing software that makes TPUs easier to use, hoping to lower inference costs and diversify its AI infrastructure away from exclusive reliance on NVIDIA GPUs, which would strengthen its bargaining position.
A Google Cloud spokesperson declined to comment on the specifics of the project but said: "We are seeing massive and accelerating demand for TPU and GPU infrastructure. Our focus is to provide the flexibility and scale that developers need, regardless of which hardware they choose to build on." Meta also declined to comment.
Lowering Switching Costs to Challenge NVIDIA's Ecosystem
PyTorch, first released in 2016, is one of the most widely used tools for AI model development. In Silicon Valley, few developers write every line of the code that NVIDIA, AMD, or Google chips actually execute. Instead, they rely on tools like PyTorch, a collection of pre-written code libraries and frameworks that automate many common tasks in AI software development.
According to sources, Google has stepped up its focus on TorchTPU because most developers cannot easily adopt its chips, or reach performance comparable to NVIDIA's, without significant extra engineering work. In the fast-paced AI race, that work costs time and money.
```