Ming-Chi Kuo: After integration into Nvidia’s ecosystem, LPU shipments will surge more than tenfold, with a significant impact on the PCB supply chain

```

Nvidia's incorporation of Groq LPU technology into the Rubin platform is triggering a profound transformation at the supply chain level.

At Nvidia's GTC conference, CEO Jensen Huang announced the launch of the Nvidia Groq 3 LPU chip, officially integrating it into the Vera Rubin platform as the core inference acceleration component for the next-generation AI data center.

Ming-Chi Kuo, the analyst best known for his Apple supply chain coverage, promptly published a supply chain survey report. He notes that LPU shipment forecasts have been raised sharply since Nvidia's investment in Groq, with combined shipments for 2026 and 2027 expected to reach 4 to 5 million units, more than ten times historical annual shipments.

Kuo sees two main drivers behind this growth. First, the LPU's deep integration with Nvidia’s CUDA ecosystem significantly lowers the barrier to development. Second, ultra-low-latency inference use cases such as AI agents, real-time consumer applications, and physical AI are expanding rapidly. He also notes that mass production of LPU/LPX racks will have a major impact on the PCB supply chain, and that WUS Printed Circuit is expected to be a key beneficiary.

Jensen Huang announces at GTC: LPU officially becomes the seventh cornerstone of the Rubin platform

In his GTC keynote, Jensen Huang explained how Nvidia has integrated the IP it acquired from Groq last year into the Rubin platform. As an inference acceleration chip, the Nvidia Groq 3 LPU becomes the seventh core building block of the Rubin platform, joining the Rubin GPU, Vera CPU, NVLink 6 scale-up switch, ConnectX-9 smart NIC, BlueField-4 DPU, and Spectrum-X scale-out switch.

Architecturally, the Groq 3 LPU takes a markedly different path from mainstream AI accelerators. Most AI accelerators rely on HBM as working memory, whereas each Groq 3 LPU carries 500MB of on-chip SRAM, the same type of memory used for the fastest caches in CPUs and GPUs. That capacity is far smaller than the 288GB of HBM4 on the Rubin GPU, but its bandwidth reaches 150TB/s, well above the Rubin GPU's 22TB/s of HBM bandwidth.

Because the decode phase of AI inference is highly bandwidth-sensitive, the Groq 3's extremely high memory bandwidth gives it a significant advantage in inference workloads, especially when deploying cutting-edge AI models that must deliver high-volume, low-latency, highly interactive output.

Supply chain survey: Shipments expected to reach 4 to 5 million units in 2026–2027

According to Ming-Chi Kuo’s latest supply chain survey, LPU shipment forecasts have been revised up substantially since Nvidia’s investment in Groq. He expects total LPU shipments across 2026–2027 to reach 4 to 5 million units, with 30%–40% shipping in 2026 and 60%–70% in 2027. That is more than ten times historical annual production.

At the rack level, Nvidia plans to increase LPU density from 64 to 256 units per rack. The aim is to maintain ultra-low latency during the decode stage of inference while also meeting the growing KV cache demands of long-context inference.

Kuo expects the new rack architecture to enter mass production between Q4 2026 and Q1 2027, with rack shipments jumping from 300–500 units in 2026 to 15,000–20,000 units in 2027.

Ecosystem integration is key: Three technical integration points will determine the pace of adoption

Kuo points out that the rapid growth in LPU demand stems fundamentally from its deep integration with Nvidia’s ecosystem. Integration with Nvidia CUDA significantly lowers the barrier to application development and deployment, allowing developers to use LPU compute without restructuring their existing workflows. Meanwhile, the rapid expansion of ultra-low-latency inference use cases, such as AI agents (e.g., coding agents), real-time consumer applications, and physical AI, is further driving LPU demand.

He also lists three key technical integration points to watch closely. First, at the networking architecture level, whether rack-scale interconnects can be seamlessly integrated via NVLink Fusion and RealScale. Second, at the developer interface level, whether Nvidia NIM will let developers deploy workloads directly without distinguishing between GPU and LPU. Third, at the compiler level, whether TensorRT-LLM will support the LPU’s "compile-first" architecture. Kuo believes that progress on these three integrations will directly determine the speed and depth of large-scale LPU adoption.

PCB supply chain enters a new cycle: WUS Printed Circuit could be a core beneficiary

Kuo especially emphasizes that mass production of LPU/LPX racks has major implications for the PCB supply chain. He notes that LPU/LPX racks represent the first large-scale commercial deployment of M9-grade CCL (copper-clad laminate) material, with WUS Printed Circuit playing a key role in this supply chain.

M9-grade CCL materials place extremely high demands on manufacturing processes and require technical breakthroughs in processing high-layer-count boards made with quartz glass fabric. Kuo believes that if LPU/LPX racks scale up smoothly, they will not only contribute substantially to WUS's 2027 performance but also validate the company’s technical capabilities in advanced manufacturing, potentially catalyzing a new growth cycle for the entire PCB industry.

Risk warning and disclaimer

The market has risks, and investment should be cautious. This article does not constitute individual investment advice, nor does it take into account the special investment objectives, financial situation, or needs of individual users. Users should consider whether any opinions, views, or conclusions in this article are suitable for their specific situation. Investments made accordingly are at your own risk.
```
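
As a rough cross-check of the figures quoted in the article, the short Python sketch below works through two pieces of arithmetic: the implied yearly split of the 4 to 5 million LPU shipment forecast, and the gap between the Groq 3 LPU's SRAM bandwidth and the Rubin GPU's HBM bandwidth. It is an illustration under stated assumptions, not part of Kuo's report. The helper names (`split_range`, `sweep_time_s`) are hypothetical, decimal units are assumed (1 MB = 10^6 bytes), and pairing the low total with the low share simply gives the widest possible envelope rather than Kuo's own breakdown.

```python
"""Back-of-the-envelope checks on figures quoted in the article (illustrative only)."""


def split_range(total_lo, total_hi, pct_lo, pct_hi):
    """Widest (low, high) envelope for a percentage share of a total range.

    Integer arithmetic avoids floating-point rounding on round numbers.
    """
    return total_lo * pct_lo // 100, total_hi * pct_hi // 100


def sweep_time_s(capacity_bytes, bandwidth_bytes_per_s):
    """Time to read the whole memory once at peak bandwidth.

    A crude proxy only; it says nothing about real decode latency.
    """
    return capacity_bytes / bandwidth_bytes_per_s


# Shipment forecast quoted in the article: 4-5 million LPUs over 2026-2027.
TOTAL_LO, TOTAL_HI = 4_000_000, 5_000_000
ship_2026 = split_range(TOTAL_LO, TOTAL_HI, 30, 40)  # 30%-40% in 2026
ship_2027 = split_range(TOTAL_LO, TOTAL_HI, 60, 70)  # 60%-70% in 2027

# Memory figures quoted in the article (decimal units assumed).
SRAM_BYTES, SRAM_BW = 500e6, 150e12  # Groq 3 LPU: 500 MB SRAM, 150 TB/s
HBM_BYTES, HBM_BW = 288e9, 22e12     # Rubin GPU: 288 GB HBM4, 22 TB/s

bandwidth_ratio = SRAM_BW / HBM_BW
sram_sweep_us = sweep_time_s(SRAM_BYTES, SRAM_BW) * 1e6  # microseconds
hbm_sweep_ms = sweep_time_s(HBM_BYTES, HBM_BW) * 1e3     # milliseconds

print(f"2026 shipments: {ship_2026[0]:,} to {ship_2026[1]:,} units")
print(f"2027 shipments: {ship_2027[0]:,} to {ship_2027[1]:,} units")
print(f"SRAM vs HBM bandwidth: {bandwidth_ratio:.1f}x")
print(f"Full SRAM sweep: {sram_sweep_us:.2f} us; full HBM sweep: {hbm_sweep_ms:.2f} ms")
```

Under these assumptions, the split implies roughly 1.2 to 2.0 million units in 2026 and 2.4 to 3.5 million in 2027, and the LPU's on-chip SRAM offers a little under seven times the bandwidth of the Rubin GPU's HBM while holding a small fraction of its capacity.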