No longer betting solely on Nvidia, Meta spends billions of dollars to rent Google TPUs.

Meta, while promising to purchase millions of Nvidia GPUs, is also spending billions to rent Google TPUs, marking a new phase in the AI compute market's move away from reliance on a single supplier.

On February 26, The Information reported, citing people involved in the negotiations, that Meta has reached an agreement with Google to rent Google’s TPU AI chips over the next few years to develop new AI models, in a deal worth "billions of dollars." Meta is also discussing buying TPUs for its own data centers as early as next year, but the status of those talks remains unclear.

Rare move on the training side: Meta looks for alternatives beyond “inference”

Notably, sources say Meta plans to use TPUs for AI training. That matters more to the market: most attempts to challenge Nvidia have targeted inference rather than training clusters, which place stricter demands on interconnect scale and on the software and hardware ecosystem. The market consensus has therefore been that training is where Nvidia GPUs excel and that TPUs are substitutes only for inference.

The report also notes that Meta announced a major deal this week with AMD, another Nvidia competitor, but sources say Meta mainly uses AMD chips to run existing models (inference) rather than to train new ones. Meta is also continuing to develop its own inference chips to lower costs and further diversify risk.

“It’s not that we don’t use Nvidia, but we can’t rely solely on Nvidia”

Shortly before the Meta-TPU deal was disclosed, Nvidia had announced a new partnership with Meta: Meta said it would purchase millions of GPUs for its data centers over the next few years. Taken together, the two pieces of news point to the same conclusion: Meta still depends on Nvidia’s training ecosystem, but it is shifting more training and inference workloads to a “second choice” to reduce the uncertainty that comes with depending on a single supplier.
One reason for Meta's shift is slow progress in developing its own AI training chips; another is that last year, customers including OpenAI and Meta ran into obstacles involving “technical faults and hardware complexity” when deploying Nvidia’s latest Blackwell chips at scale.

Google’s strategy: Making TPU an external business worth “billions of dollars”

Sources say Google is ramping up efforts to compete directly with Nvidia in the AI training chip market, and TPU sales could bring Google “billions of dollars in additional revenue.” Some within Google Cloud have suggested that if the TPU business is “super accelerated,” it might capture a share equivalent to about 10% of Nvidia’s annual revenue; according to the report, Nvidia’s revenue over the past 12 months was about $200 billion.

Google’s approach to selling TPUs externally is also becoming more “financialized.” Besides the deal with Meta, Google has struck an agreement with a large unnamed investment institution to fund a joint venture that leases TPUs to other customers, and it is negotiating similar joint ventures with other private equity institutions. Sources say Google has signed at least one term sheet with a major investment institution.

At the same time, Google’s corporate development team is in talks with potential financial partners about buying TPUs through a "special purpose vehicle (SPV)" for external leasing, and the TPUs may be used as collateral for debt financing. The report likens this to the "creative financing" arrangement between xAI and venture capital firm Valor around Nvidia GPUs.

Biggest variable: TPU supply, TSMC capacity, and the balance between "self-use" and "external sales"

Scaling the TPU business doesn’t depend on demand alone.
Google has to balance competing objectives: on one hand, it is challenging Nvidia at the chip level; on the other, Google Cloud is a major Nvidia GPU customer. Most AI developers still prefer the GPU ecosystem, so Google Cloud cannot afford to stop offering Nvidia servers without hurting its cloud competitiveness.

Supply is equally tight. Google’s own Gemini model team needs TPUs; at the same time, both TPUs and Nvidia GPUs are manufactured by TSMC, meaning they “compete for the same type of production capacity” at TSMC's fabs. These constraints will determine whether Google can quickly replicate Meta-style deals with more large customers.

Risk disclosure and disclaimer

The market is risky, and investment should be cautious. This article does not constitute personal investment advice, nor does it take into account individual users’ unique investment objectives, financial situations, or needs. Users should consider whether any opinions, views, or conclusions in this article are suitable for their particular circumstances. Investing according to this article is at your own risk.