NVIDIA unveils next-generation Rubin platform: inference costs up to 10 times lower than Blackwell, shipments planned for the second half of 2026

NVIDIA unveiled its next-generation Rubin AI platform at CES, keeping up its annual release cadence in the artificial intelligence (AI) chip sector. Through an integrated design of six new chips, the platform delivers large gains in inference cost and training efficiency, with the first batch scheduled for delivery to customers in the second half of 2026.

On Monday the 5th, US Eastern time, NVIDIA CEO Jensen Huang said in Las Vegas that all six Rubin chips have come back from manufacturing partners and have passed some key tests, and that the program is proceeding according to plan. "The AI race has begun, and everyone is working hard to reach the next level," he said. NVIDIA emphasized that Rubin-based systems will cost less to operate than their Blackwell counterparts because they achieve the same results with fewer components.

Microsoft and other major cloud computing providers will be among the first customers to deploy the new hardware in the second half of the year. Microsoft's next-generation Fairwater AI superfactory will be equipped with NVIDIA Vera Rubin NVL72 rack-level systems and can scale to hundreds of thousands of NVIDIA Vera Rubin superchips. CoreWeave will also be among the first providers to offer Rubin systems.

The launch comes as some Wall Street observers worry about intensifying competition for NVIDIA and question whether spending in the AI sector can keep up its current pace. NVIDIA nevertheless maintains a bullish long-term outlook, believing the total market could reach trillions of dollars.

Performance improvements target next-generation AI needs

According to NVIDIA's announcement, the Rubin platform delivers 3.5 times the training performance of the previous Blackwell generation and 5 times the performance when running AI software. Compared to Blackwell, Rubin can reduce the cost of generating inference tokens by up to a factor of ten and cut the number of GPUs needed to train mixture-of-experts (MoE) models to one quarter.
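To make the quoted ratios concrete, the short sketch below applies them to a baseline. The baseline cost and GPU count are hypothetical placeholders chosen for easy arithmetic, not figures from NVIDIA, and the function name `rubin_best_case` is illustrative only. The ratios themselves are best-case claims, so the result is an upper bound on the savings rather than a prediction.

```python
import math

# Ratios quoted in the article (best-case figures from NVIDIA's announcement).
INFERENCE_COST_REDUCTION = 10.0   # token cost: up to 10x lower than Blackwell
MOE_TRAINING_GPU_REDUCTION = 4.0  # GPUs for MoE training: one quarter of Blackwell


def rubin_best_case(blackwell_cost_per_million_tokens: float,
                    blackwell_gpu_count: int) -> tuple[float, int]:
    """Scale hypothetical Blackwell figures by the quoted best-case ratios."""
    cost = blackwell_cost_per_million_tokens / INFERENCE_COST_REDUCTION
    # Round up: a fractional GPU cannot be provisioned.
    gpus = math.ceil(blackwell_gpu_count / MOE_TRAINING_GPU_REDUCTION)
    return cost, gpus


# Hypothetical baseline: $2.00 per million tokens and a 1,024-GPU training job.
cost, gpus = rubin_best_case(2.00, 1024)
print(f"cost per million tokens: ${cost:.2f}, GPUs: {gpus}")
```

Under these assumed inputs, the same workload would cost $0.20 per million tokens and need 256 GPUs if the best-case ratios held in full.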

The platform's Vera CPU has 88 custom Olympus cores and provides twice the performance of its predecessor. NVIDIA says the CPU is designed specifically for agentic reasoning and describes it as the most energy-efficient processor for large-scale AI factories. It offers full Armv9.2 compatibility and an ultra-fast NVLink-C2C interconnect.

The Rubin GPU is equipped with a third-generation Transformer Engine featuring hardware-accelerated adaptive compression, and delivers 50 petaflops of NVFP4 compute for AI inference. Each GPU provides 3.6 TB/s of NVLink bandwidth, while the Vera Rubin NVL72 rack provides 260 TB/s.
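The per-GPU and per-rack bandwidth figures can be cross-checked with simple arithmetic. The sketch below assumes that the "72" in NVL72 is the number of GPUs per rack and that the rack figure is a plain sum over GPUs; neither assumption is stated in the article.

```python
# Figures quoted in the article.
PER_GPU_BANDWIDTH_TBPS = 3.6        # bandwidth per Rubin GPU, in TB/s
QUOTED_RACK_BANDWIDTH_TBPS = 260.0  # bandwidth per Vera Rubin NVL72 rack, in TB/s

# Assumption: the "72" in NVL72 is the GPU count per rack.
GPUS_PER_NVL72_RACK = 72

# Assumption: rack bandwidth is the simple sum of per-GPU bandwidth.
aggregate_tbps = GPUS_PER_NVL72_RACK * PER_GPU_BANDWIDTH_TBPS
print(f"aggregate: {aggregate_tbps:.1f} TB/s, quoted: {QUOTED_RACK_BANDWIDTH_TBPS:.0f} TB/s")
```

Under these assumptions the sum comes to 259.2 TB/s, which rounds to the quoted 260 TB/s, so the two figures are consistent.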

Chip testing progresses smoothly

Huang said that all six Rubin chips have come back from manufacturing partners and passed key tests, indicating that they can be deployed on schedule. The statement suggests NVIDIA is holding on to its lead as a maker of AI accelerators.

The platform features five major innovations: sixth-generation NVLink interconnect technology, the Transformer Engine, confidential computing, the RAS (reliability, availability, and serviceability) engine, and the Vera CPU. The third-generation confidential computing feature makes the Vera Rubin NVL72 the first rack-level platform to provide data security across the CPU, GPU, and NVLink domains.

The second-generation RAS engine spans the GPU, CPU, and NVLink, offering real-time health checks, fault tolerance, and proactive maintenance to maximize system productivity. The rack uses a modular, cable-free tray design that allows assembly and maintenance 18 times faster than with Blackwell.

Extensive ecosystem support

NVIDIA stated that Amazon's AWS, Google Cloud, Microsoft, and Oracle Cloud will be among the first to deploy Vera Rubin-based instances in 2026, along with cloud partners CoreWeave, Lambda, Nebius, and Nscale.

OpenAI CEO Sam Altman said, "Intelligence scales with compute. As we increase compute, models become more powerful, solve harder problems, and bring greater impact to people. NVIDIA's Rubin platform helps us continue expanding this progress."

Anthropic co-founder and CEO Dario Amodei said that the efficiency improvements of NVIDIA's Rubin platform represent infrastructure advances that enable longer memory, better reasoning, and more reliable outputs.

Meta CEO Mark Zuckerberg stated that NVIDIA's Rubin platform promises a step change in performance and efficiency, which is what is needed to deploy the most advanced models to billions of people.

NVIDIA also stated that Cisco, Dell, Hewlett Packard Enterprise, Lenovo, and Supermicro are expected to launch a range of servers based on Rubin products. AI labs including Anthropic, Cohere, Meta, Mistral AI, OpenAI, and xAI expect to use the Rubin platform to train larger, more powerful models.

Early disclosure of product details

Commentators said NVIDIA disclosed details of the new products earlier in the year than it has in the past, as part of its strategy to keep the industry reliant on its hardware. NVIDIA typically provides in-depth product details each spring at its annual GTC event in San Jose, California.

For Jensen Huang, CES is just one stop in a marathon of appearances. He announces products, partnerships, and investments at various events to build momentum for the deployment of AI systems.

The new hardware announced by NVIDIA also includes networking and connectivity components. These are intended to be part of the DGX SuperPOD supercomputer but are also available as standalone products for customers who want to use them in a more modular way. The boost in performance is necessary as AI shifts to networks of more specialized models, which must not only filter massive volumes of input but also solve specific problems through multi-stage processes.

NVIDIA is promoting AI applications across the whole economy, including robotics, healthcare, and heavy industry. As part of these efforts, NVIDIA announced a range of tools intended to accelerate the development of autonomous vehicles and robots. Currently, most spending on NVIDIA-based computing power comes from the capital expenditure budgets of a handful of customers, including Microsoft, Alphabet's Google Cloud, and Amazon's AWS.
