Report: NVIDIA to launch new inference chip incorporating Groq LPU design at next month's GTC conference

```

NVIDIA plans to unveil a new inference chip that incorporates Groq's "Language Processing Unit" (LPU) technology at its GTC developer conference next month. The launch marks NVIDIA's accelerated shift toward inference computing as customers press for high-performance, cost-effective computing.

According to The Wall Street Journal, the new system, which NVIDIA CEO Jensen Huang described as "something the world has never seen," is designed to speed up how quickly AI models respond to queries. The launch is expected to reshape the AI computing market, with direct consequences for cloud service providers and enterprise investors looking for more cost-effective alternatives.

In an early sign of market acceptance, ChatGPT developer OpenAI has agreed to become one of the new processor's largest customers and plans to buy a large amount of "dedicated inference capacity" from NVIDIA. The deal shores up NVIDIA's core customer base and sends the market a clear signal: the infrastructure behind autonomous AI agents is shifting its emphasis from large-scale pre-training to efficient inference.

Amid fierce competition from Google, Amazon, and numerous startups, NVIDIA is moving away from relying solely on traditional graphics processing units (GPUs). By introducing new architectures and exploring CPU-only deployments, the company aims to keep its market dominance as the AI industry enters its next phase.

Integrating LPU Design to Overcome Bottlenecks in Large Model Inference

As the AI industry shifts from model training to real-world deployment, inference has become a key focus. AI inference consists mainly of two stages: prefill, which processes the input prompt, and decode, which generates output tokens one at a time. For large models, decode is especially slow. To get past this bottleneck, NVIDIA chose to bring in outside technology.

According to The Wall Street Journal, NVIDIA spent $20 billion at the end of last year to license key technology from the startup Groq and, in a large-scale talent deal, brought on its executive team, including founder Jonathan Ross. Groq's LPU uses an architecture fundamentally different from that of traditional GPUs and is highly efficient at inference workloads.

Industry analysts suggest the upcoming products may be tied to NVIDIA's next-generation Feynman architecture. According to an earlier WallstreetCN article, Feynman may integrate SRAM more extensively and could even embed the LPU directly using 3D stacking. The design would specifically target latency and memory bandwidth, the two main bottlenecks in inference, significantly reducing the energy consumption and operating costs of AI agents.

Expanding CPU-Only Deployments for More Diverse Computing Options

Alongside the LPU architecture, NVIDIA is also rethinking how it uses traditional processors. Its standard practice has been to pair Vera CPUs with its powerful Rubin GPUs in data center servers, but for some AI agent workloads this configuration has proved too costly and inefficient.

Some large enterprise customers have found CPU-only environments more efficient for certain AI tasks. In response, NVIDIA announced an expanded partnership with Meta Platforms this month, including its first large-scale CPU-only deployment, to support Meta's ad-targeting AI agents. The market sees the deal as an early glimpse of NVIDIA's strategic shift: the company is moving beyond a GPU-only sales model and trying to lock in different segments of the AI market with a broader mix of hardware.

Shifting Market Demand and Intensifying Competition

These changes in hardware design are driven directly by surging demand for AI agent applications across the tech industry. Many companies that build and run AI agents find traditional GPUs too expensive and not the best fit for running models in production.

OpenAI's moves highlight this trend. Besides committing to buy NVIDIA's new system to improve its rapidly growing Codex tool, OpenAI last month also struck a multi-billion-dollar computing partnership with the startup Cerebras, whose CEO Andrew Feldman says its inference-focused chips are faster than NVIDIA's GPUs. OpenAI has also signed a major agreement to use Amazon's Trainium chips.

It is not only startups: major cloud service providers are also accelerating in-house chip development. Anthropic's Claude Code, widely regarded as the leader in the AI coding market, currently runs mainly on chips designed by Amazon AWS and Alphabet's Google Cloud rather than on NVIDIA products. Facing these rivals, Jensen Huang emphasized in an interview with Wccftech that NVIDIA is transforming from a pure chip supplier into the builder of a complete AI ecosystem spanning semiconductors, data centers, cloud, and applications. For investors, next month's GTC conference will be a key test of whether NVIDIA can hold on to its 90% market share in the inference era.

Risk warning and disclaimer

Markets carry risk, and investors should proceed with caution. This article does not constitute personal investment advice and does not take into account the particular investment goals, financial circumstances, or needs of individual users. Users should consider whether any opinions, views, or conclusions in this article fit their specific circumstances. Any investment made on the basis of this article is at the user's own risk. ```