NVIDIA GTC Keynote Full Text: Jensen Huang Announces the Arrival of the Inference Era, Says OpenClaw ("the Lobster") Is the New Operating System
Opening Address
Host: Please welcome NVIDIA founder and CEO Jensen Huang to the stage.
Jensen Huang, Founder and CEO:
Welcome to GTC. This is a technology conference, and I’m delighted to see so many people queuing up early in the morning.
Today, our discussion will revolve around three major platforms: the CUDA-X platform, the system platform, and the brand new AI factory platform. And most importantly, the ecosystem.
First, I want to thank our pregame show hosts, who did an outstanding job: Sarah Guo from Conviction; Alfred Lin from Sequoia Capital, NVIDIA’s first venture capital investor; and Gavin Baker, NVIDIA’s first significant institutional investor. All three have deep knowledge of technology and broad influence across the tech ecosystem. I also want to thank all the VIP guests I personally invited.
Thanks also to all the participating companies. NVIDIA is a platform company: we have technology, platforms, and a rich ecosystem. Today, companies from virtually every industry, including trillion-dollar industries, are gathered here; 450 companies sponsored this event. Thank you very much.
This conference features 1,000 technical sessions, 2,000 guest speakers, and will cover each layer of the 'AI five-layer cake'—from land, power, and infrastructure, to chips, platforms, models, and finally, various applications powering the industry’s takeoff.
20 Years of CUDA
This year marks the 20th anniversary of CUDA.
For 20 years, we have dedicated ourselves to this architecture and its revolutionary invention, SIMT (Single Instruction, Multiple Threads). SIMT lets developers write scalar code that runs across many threads, which makes parallel programming far easier than traditional methods. Recently, we added Tile support so developers can more easily program Tensor Cores and the math structures of today’s AI.
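This is a minimal sketch of the SIMT programming model, not of how a GPU executes it. The Python below writes a kernel as ordinary scalar code for one thread. A hypothetical `launch` helper then runs that kernel once per thread index, mimicking CUDA's grid of thread blocks. On real hardware these invocations run in parallel warps; here they run one after another.

```python
from typing import Callable, List

def saxpy_kernel(tid: int, a: float, x: List[float], y: List[float], out: List[float]) -> None:
    """Scalar code for a single thread: computes one element of a*x + y."""
    if tid < len(x):  # guard: the grid may contain more threads than elements
        out[tid] = a * x[tid] + y[tid]

def launch(kernel: Callable[..., None], grid_dim: int, block_dim: int, *args) -> None:
    """Hypothetical launcher: runs the same kernel for every (block, thread) pair.

    On a GPU these invocations run concurrently; here they run sequentially.
    """
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            # Same formula as CUDA's blockIdx.x * blockDim.x + threadIdx.x
            global_tid = block_idx * block_dim + thread_idx
            kernel(global_tid, *args)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0, 20.0, 30.0, 40.0, 50.0]
out = [0.0] * len(x)
block_dim = 4
grid_dim = (len(x) + block_dim - 1) // block_dim  # ceil-divide so every element gets a thread
launch(saxpy_kernel, grid_dim, block_dim, 2.0, x, y, out)
print(out)  # [12.0, 24.0, 36.0, 48.0, 60.0]
```

The kernel never mentions parallelism. The only parallel concept is the thread index, and that is what makes SIMT code read like scalar code.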
So far, the CUDA ecosystem has accumulated thousands of tools, compilers, frameworks, and libraries, plus hundreds of thousands of open-source projects, and it is deeply integrated into every mainstream ecosystem.
Flywheel Effect and Installed Base
This chart basically describes NVIDIA’s entire strategy.
The hardest, and most strategically valuable, is the foundational installed base. After 20 years, we have built hundreds of millions of GPUs and computing systems running CUDA worldwide. We cover every cloud service provider and computer manufacturer, serving nearly every industry.
CUDA’s installed base is the fundamental reason for the accelerating flywheel effect. The vast base attracts developers; developers create new algorithms; new algorithms drive breakthroughs—like the birth of deep learning. These breakthroughs open up new markets; new markets attract more ecosystem partners, thus forming an even bigger installed base. This flywheel keeps accelerating.
Currently, downloads of NVIDIA libraries are growing at a staggering speed, and still accelerating. This flywheel enables the computing platform to sustain massive applications and endless technological breakthroughs.
More importantly, it also gives the infrastructure a long life cycle. NVIDIA CUDA runs a wide range of applications, covering every phase of the AI life cycle, every type of data processing platform, and every kind of scientific solver. As a result, an NVIDIA GPU stays useful for a long time after it is installed.
This also explains why cloud rental prices for our Ampere architecture, launched six years ago, are still rising.
Meanwhile, continuous software updates keep driving computing costs down. Accelerated computing delivers not only a leap in performance at initial deployment but also sustained cost reductions over the long term. Because all our GPUs are architecturally compatible, we are willing to support and maintain every NVIDIA GPU in the world. The larger the installed base, the more users benefit from each new optimization.
This dynamic combination enables NVIDIA’s architecture to expand coverage, accelerate growth, and continually lower computing costs, which in turn triggers new growth.
Beginning of CUDA: GeForce
CUDA’s journey actually began 25 years ago with GeForce.
GeForce is NVIDIA’s most successful marketing program. We started attracting future customers while they were still too young to buy anything. Their parents paid, year after year, until those kids grew up to become outstanding computer scientists and real developers.
25 years ago, we invented the programmable shader—the world’s first programmable accelerator, marking the beginning of pixel shaders. This invention drove us to keep exploring deeper, and five years later, gave birth to CUDA.
Bringing CUDA from GeForce to every computer was one of our largest investments at the time. We could barely afford it; it consumed most of the company’s profits. But we believed in its potential, and after 20 years and 13 generations of architecture, CUDA is now everywhere.
About eight years ago, we introduced RTX, completely redesigning the architecture and introducing two then-new concepts: hardware ray tracing and AI-driven graphic rendering. Just as GeForce brought AI to the world—letting Alex Krizhevsky, Ilya Sutskever, Geoffrey Hinton, Andrew Ng, and others discover that GPUs are deep learning accelerators, igniting the AI explosion—now AI will in turn completely revolutionize computer graphics.
Neural Rendering: DLSS 5
Today, I want to show you the future of graphics technology. We call it Neural Rendering—the fusion of 3D graphics and AI, and this is DLSS 5.
(Playing video)
Stunning effects, right? We combine controllable 3D graphics (the 'structured data' of virtual worlds) with generative AI (probabilistic computation): one is completely predictable, the other is probability-driven but highly realistic. Together, the content generated is beautiful, realistic, and fully controllable.
The fusion of structured information and generative AI will play out again and again in different industries. Structured data is the foundation of trustworthy AI.
Structured and Unstructured Data Platforms
This next chart may surprise you, but bear with me.
Structured data—SQL, Spark, Pandas, Velox, and platforms like Snowflake, Databricks, Amazon EMR, Azure Fabric, Google BigQuery—all deal with Data Frames. These Data Frames are essentially giant spreadsheets carrying all business information, the 'baseline facts' of enterprise computing.
In the AI era, AI will query this structured data at very high speed, so it must be massively accelerated. Future AI agents will also make heavy use of structured databases.
Unstructured data represents the majority of information in the world: vector databases, PDFs, videos, speech... About 90% of annually generated global information is unstructured. In the past, this data was almost useless—we stored it in file systems but couldn't search or query it.
Now, AI has changed all that. Just as AI solved the problem of multimodal perception and understanding, the same technology can read PDFs, understand their meanings, and embed them in a bigger searchable, queryable structure.
To enable this, NVIDIA has created two foundational libraries:
- cuDF: for Data Frame and structured data acceleration
- cuVS: for Vector Store, semantic data, and unstructured AI data acceleration
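This is a rough illustration of what a vector store does with unstructured data: a brute-force cosine-similarity search in plain Python. It assumes documents have already been turned into embedding vectors by some model; the three-dimensional vectors and file names below are made up. cuVS provides GPU-accelerated exact and approximate nearest-neighbor search for the same task at far larger scale. Its actual API is not shown here.

```python
import math
from typing import Dict, List, Tuple

def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_k(query: List[float], index: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Exhaustive (brute-force) nearest-neighbor search: score every stored vector."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

# Hypothetical embeddings for three documents (real embeddings have hundreds of dimensions).
index = {
    "q3_earnings.pdf": [0.9, 0.1, 0.0],
    "gpu_datasheet.pdf": [0.1, 0.9, 0.2],
    "cooling_manual.pdf": [0.0, 0.3, 0.9],
}
query = [0.2, 0.8, 0.1]  # hypothetical embedding of "What is the GPU memory bandwidth?"
print(top_k(query, index, k=1))  # top hit: gpu_datasheet.pdf
```

Brute force scores every vector, so its cost grows linearly with the collection. Approximate indexes trade a little recall for much lower query cost at scale.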
These two platforms will become among the most important computing platforms in the future.
Today, we officially announce multiple collaborations:
- IBM (the inventor of SQL) is using cuDF to accelerate its WatsonX data platform
- Dell is working with us to build the Dell AI data platform, integrating cuDF and cuVS, designed for the AI era
- Google Cloud: we accelerate their Vertex AI and BigQuery platforms; for example, we helped Snapchat cut its computing cost by nearly 80%
- AWS: we accelerate EMR, SageMaker, and Bedrock, and will bring OpenAI to AWS, driving large-scale cloud computing consumption
- Microsoft Azure: we accelerate Azure AI Foundry, deeply support Bing search, and expand Azure regional deployments
- CoreWeave: the world’s first AI-native cloud, born for GPU hosting and AI inference
- Oracle: we are Oracle’s first AI customer
- Palantir + Dell: three-way collaboration, enabling AI platform deployment locally in any country, any isolated region
NVIDIA's Core Strategy: Vertical Integration and Horizontal Openness
NVIDIA is the world’s first vertically integrated yet horizontally open computing company.
Accelerated computing is not a chip problem or a system problem; its core is application acceleration. To continuously deliver acceleration and cost reduction in every application domain, you must deeply understand the application, the domain, and the algorithm, and implement in every deployment scenario—be it cloud, on-premises, edge, or robotics.
This is why NVIDIA has to deeply cultivate each vertical domain. We integrate algorithms into the computing platform and open it for worldwide use.
This GTC covers nearly every vertical in the NVIDIA ecosystem, including:
- Autonomous driving
- Financial services (the largest industry at this GTC, hope the attendees are developers, not traders)
- Healthcare (experiencing its "ChatGPT moment")
- Industrial manufacturing
- Entertainment and gaming
- Robotics (110 robots exhibited, almost every robotics company is collaborating with NVIDIA)
- Telecom (an industry worth about $2 trillion; base stations will evolve into AI edge computing infrastructure)
We announce updates to 100 libraries and about 40 models at this event. These libraries are the company’s core assets and key to enabling the computing platform and solving real problems.
One of the most important libraries is cuDNN (CUDA Deep Neural Network library), which revolutionized AI and ignited the explosion of modern AI.
Arrival of the Inference Turning Point
What has happened in the past two years? Three major things drove it all:
First: the launch of ChatGPT and the start of the generative AI era (late 2022 into 2023). AI no longer just perceives and understands; it also translates, creates, and generates new content. Generative computation fundamentally changed how computers are architected and built.
Second: the rise of reasoning AI (the o1 and o3 models). Reasoning AI lets models reflect, plan, and decompose complex problems into actionable steps, making AI more trustworthy and grounded in facts. This caused ChatGPT usage to surge and sharply increased the number of input and output Tokens that must be computed.
Third: the birth of Claude Code and Agentic AI. This is the first truly Agentic model: it can read files, write code, compile, test, evaluate, and iterate on its own work. Claude Code has fundamentally transformed software engineering. Today, every software engineer at NVIDIA uses AI Agents to help with programming.
AI has evolved from "perception" to "generation," from "generation" to "reasoning," and from "reasoning" to "execution"—now AI can do genuinely productive work.
The inference turning point has arrived. Every time AI thinks, acts, reads, reasons, inference is required, and Token generation demand is exploding. In the past two years, the compute requirement for a single job has increased by about 10,000 times, usage about 100 times, with total compute demand rising nearly a million-fold.
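Read as a worked equation, and assuming the two growth factors multiply, the million-fold figure follows directly:

```latex
\underbrace{10^{4}}_{\text{compute per job}} \times \underbrace{10^{2}}_{\text{usage}} = 10^{6} \quad \text{(total compute demand)}
```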
From $500 Billion to $1 Trillion
Last year's GTC, I mentioned that we saw high-confidence demand of about $500 billion for Blackwell and Rubin through 2026.
Today, one year later, standing here at GTC, I can clearly see: at least through 2027, demand will reach $1 trillion.
And I am confident actual compute demand will be much higher.
Last year was NVIDIA’s year of inference. We ensured outstanding performance in every phase of the AI lifecycle beyond training and post-training, extending infrastructure investment value for the long term.
We are happy to see that Anthropic chose NVIDIA, and Meta Superintelligence Labs chose NVIDIA. Open-source models are now close to the frontier, and they are everywhere. NVIDIA is the only platform that covers every AI domain and runs every AI model, from the edge to the cloud. Those domains include language, biology, graphics, computer vision, speech, proteins and chemistry, and robotics.
Our architecture's 'fungibility' makes it the platform with the lowest cost and highest confidence for building AI infrastructure. If you are investing a trillion dollars in infrastructure, you want complete confidence—NVIDIA is currently the only computing platform that lets you confidently deploy, on cloud, on-premises, or in any country.
Now, 60% of our business comes from five large-scale cloud service providers, the other 40% from regional clouds, sovereign clouds, enterprise, industry, robotics, edge, and supercomputing. This diversified coverage is resilience itself—AI is no longer a single application, but a true transition of computing platforms.
Breakthrough in Inference Performance
We have achieved major breakthroughs in inference optimization:
This chart, from SemiAnalysis, is the most comprehensive evaluation of AI inference performance. Its two axes:
- Vertical axis (Tokens per Watt): reflects throughput. Every data center is power-constrained; a 1 GW factory cannot become 2 GW, so Token production must be maximized within limited power.
- Horizontal axis (inference speed, i.e., Token rate): reflects interactivity and how 'smart' the AI is. Faster Token generation makes room for bigger models, longer context, and deeper thinking, so the AI is 'smarter'.
The results are astonishing:
From Hopper H200 to Grace Blackwell, Moore’s Law would predict roughly a 1.5x improvement, but we actually achieved 35x in performance per watt. Dylan Patel of SemiAnalysis even said my figure is conservative and that the real gain is 50x.
This means NVIDIA's Token cost is the lowest worldwide. A 1 GW data center costs about $40 billion (amortized over 15 years); you pay the fixed cost regardless—so you must install the best-performing computing system to realize the lowest Token cost. This is currently unrivaled.
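The cost argument above can be made concrete with straight-line amortization. This sketch uses the talk's $40 billion and 15-year figures. The Tokens-per-watt values are hypothetical, chosen only so that the second system is 35x more efficient. Operating costs such as electricity are ignored.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def cost_per_million_tokens(capex_usd: float, years: float, power_w: float,
                            tokens_per_s_per_w: float) -> float:
    """Amortized capital cost per million Tokens (ignores opex such as power and staff)."""
    annual_cost = capex_usd / years  # straight-line amortization of the fixed cost
    annual_tokens = tokens_per_s_per_w * power_w * SECONDS_PER_YEAR
    return annual_cost / annual_tokens * 1e6

CAPEX = 40e9  # ~$40B for a 1 GW factory (figure from the talk)
YEARS = 15    # amortization period (figure from the talk)
POWER = 1e9   # 1 GW, fixed by the power budget

# Hypothetical efficiencies: the second system delivers 35x more Tokens per watt.
baseline = cost_per_million_tokens(CAPEX, YEARS, POWER, tokens_per_s_per_w=0.1)
improved = cost_per_million_tokens(CAPEX, YEARS, POWER, tokens_per_s_per_w=3.5)
print(f"baseline ${baseline:.4f}/M Tokens, improved ${improved:.4f}/M Tokens, "
      f"ratio {baseline / improved:.0f}x")
```

The fixed cost is identical in both cases, so the cost per Token falls in exact proportion to the gain in Tokens per watt.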
Fireworks AI example: after we updated their software, the same system’s Token rate rose from about 700 tokens/s to nearly 5,000 tokens/s, a 7x boost—true power of co-design.
Token factory business logic:
Every cloud provider and AI company will look at their business from the Token factory perspective. Different Token rates correspond to different service levels and pricing:
- Free tier: high throughput, low speed
- Basic tier: about $3/million Tokens
- Standard tier: about $6/million Tokens
- Premium tier: about $45/million Tokens
- Top tier: about $150/million Tokens (high speed, ultra-long context, biggest models)
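This sketch shows how the tier structure turns into revenue by pricing a factory's output per tier. The per-million-Token prices come from the list above, and the free tier earns nothing. The hourly Token volumes are hypothetical. Both mixes below total the same number of Tokens, to isolate the effect of shifting capacity toward faster, higher-priced tiers.

```python
# Prices in USD per million Tokens, from the tier list above.
PRICE_PER_M = {"free": 0.0, "basic": 3.0, "standard": 6.0, "premium": 45.0, "top": 150.0}

def revenue(tokens_per_tier: dict) -> float:
    """Revenue in USD for the number of Tokens served in each tier."""
    return sum(PRICE_PER_M[tier] * n / 1e6 for tier, n in tokens_per_tier.items())

# Hypothetical hourly output of one factory under two capacity mixes (1 trillion Tokens each).
mostly_slow = {"free": 500e9, "basic": 300e9, "standard": 150e9, "premium": 40e9, "top": 10e9}
more_fast = {"free": 300e9, "basic": 250e9, "standard": 200e9, "premium": 150e9, "top": 100e9}

print(f"{revenue(mostly_slow):,.0f} vs {revenue(more_fast):,.0f} USD per hour")
# 5,100,000 vs 23,700,000 USD per hour
```

In practice, serving faster tiers lowers a system's total throughput. That is the tradeoff the throughput/interactivity chart captures, and this sketch deliberately ignores it.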
Grace Blackwell example: compared to Hopper, at the most commercially valuable service level, throughput increased 35x, significantly boosting monetizable capacity and raising overall data center revenue by about 5x.
Vera Rubin: Next Generation Architecture
(Playing video)
Now, I’m no longer just showing a chip—I’m showing a whole system. This is Vera Rubin.
Vera Rubin is designed for Agentic systems, with very clear logic:
- Large language models will get bigger, needing more Token generation and faster thinking;
- AI Agents will frequently access memory (KV Cache), structured data (cuDF), and unstructured data (cuVS);
- Storage systems will be under immense pressure;
- Tool calls require CPUs with extremely high single-thread performance.
For this, we built the new Vera CPU—the only LPDDR5 data center CPU in the world, combining high single-thread performance, robust data handling, and unparalleled efficiency.
The core features of the Vera Rubin system:
- 100% liquid cooling, all cabling greatly simplified
- Installation time reduced from two days to two hours
- Uses 45°C hot water for cooling, drastically reducing data center cooling energy
- Sixth-generation NVLink switch system, the only one of its kind in the world, now fully liquid cooled; I am incredibly proud of our team for this achievement
- The world’s first co-packaged optics (CPO) Spectrum-X switch is in mass production. Photonics are integrated directly into the package, so electrical signals are converted to light and connected directly. The process was co-developed with TSMC and is called COUPE; we are the only company mass-producing it. It is truly revolutionary.
- The CPUs across the lineup are in full production and are on track to become a multi-billion-dollar standalone business
Rubin Ultra:
Rubin Ultra uses the new 'Kyber' rack, which supports 144 GPUs in a single NVLink domain. Compute nodes slide in from the front, and NVLink switches connect from the back through a midplane, forming one giant computer.
Technology roadmap:
- Blackwell (current): Oberon system, NVLink 72 support
- Vera Rubin: Kyber rack (NVLink 144) + Oberon copper/optic extended to NVLink 576
- Vera Rubin Ultra: Rubin Ultra chip + LP35 (introducing NVFP4 compute structure)
- Feynman (next gen): new GPU + LP40 + Rosa CPU (short for Rosalyn) + BlueField-5 + ConnectX-10 + dual-mode copper and CPO expansion
Groq Acquisition and Heterogeneous Inference Breakthrough
We brought on Groq’s technical team and licensed its IP for deep integration.
The core features of the Groq processor:
- Deterministic dataflow processor, statically compiled, computations scheduled by compiler
- Compute and data arrive simultaneously, fully software-scheduled, no dynamic scheduling
- Massive SRAM, designed for inference workload only
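"Statically compiled, scheduled by the compiler" means every operation's start cycle is fixed before the program runs. The sketch below illustrates the idea with a toy as-soon-as-possible (ASAP) scheduler over a dependency graph with known, fixed latencies. The operation names and cycle counts are hypothetical. A real dataflow compiler also assigns functional units and on-chip memory placement, which this ignores.

```python
from typing import Dict, List

def asap_schedule(latency: Dict[str, int], deps: Dict[str, List[str]]) -> Dict[str, int]:
    """Assign every op a fixed start cycle at 'compile time'.

    latency: op name -> cycles the op takes (known in advance on a deterministic chip)
    deps:    op name -> ops whose results it consumes (graph must be acyclic)
    Each op starts on the exact cycle its last input finishes.
    """
    start: Dict[str, int] = {}

    def visit(op: str) -> int:
        if op not in start:
            ready = [visit(d) + latency[d] for d in deps.get(op, [])]
            start[op] = max(ready, default=0)
        return start[op]

    for op in latency:
        visit(op)
    return start

# Hypothetical slice of a decode step: load weights and input, matmul, activation, store.
latency = {"load_w": 4, "load_x": 2, "matmul": 8, "gelu": 3, "store": 1}
deps = {"matmul": ["load_w", "load_x"], "gelu": ["matmul"], "store": ["gelu"]}
schedule = asap_schedule(latency, deps)
print(schedule)  # {'load_w': 0, 'load_x': 0, 'matmul': 4, 'gelu': 12, 'store': 15}
print("finishes at cycle", max(schedule[op] + latency[op] for op in latency))  # 16
```

Because every latency is known, the schedule contains no runtime decisions. The data for `matmul` is guaranteed to be ready on cycle 4, which is what "compute and data arrive simultaneously" means.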
Its limitation: a single Groq chip has only 500 MB of on-chip memory, compared with 288 GB on a single Rubin GPU. It therefore cannot hold large model parameters or the KV Cache, which limits scalability. Then we had a brilliant idea.
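The standard KV-cache size estimate shows why 500 MB is so limiting: 2 (keys and values) × layers × KV heads × head dimension × Tokens × bytes per element. The model shape below is hypothetical, a large model using grouped-query attention with an 8-bit cache, and does not describe any specific product. Model weights come on top of this.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int, tokens: int,
                   bytes_per_elem: float) -> float:
    """Bytes needed to cache keys and values for `tokens` positions of one sequence."""
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_elem  # 2 = one K and one V

GB = 1e9
# Hypothetical shape: 80 layers, 8 KV heads, head dim 128, 1-byte (FP8) cache entries.
per_seq = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128, tokens=128_000, bytes_per_elem=1)
print(f"KV cache for one 128k-Token sequence: {per_seq / GB:.1f} GB")  # 21.0 GB
print("Fits in 500 MB of on-chip memory?", per_seq <= 0.5 * GB)        # False
print("Sequences per 288 GB (ignoring weights):", int(288 * GB // per_seq))  # 13
```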
Dynamo: Disaggregated Inference Framework
We developed the Dynamo software, which re-architects the inference flow:
- Prefill stage: runs on Vera Rubin (needs massive compute)
- Decode stage Attention computation: runs on Vera Rubin (needs massive compute)
- Decode stage FFN/Token generation: runs on the Groq chip (needs high bandwidth and low latency)
Two fundamentally different processors—one for high throughput, one for low latency—tightly coupled by Dynamo, reducing latency by about 50%.
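The split above can be pictured as a router that sends each stage of a request to a different hardware pool. This is a toy model of the idea, not Dynamo's API, and the pool names are hypothetical. A real disaggregated system must also move KV-cache and activation data between pools, and that transfer is where the engineering difficulty lies.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical pool names; the stage-to-pool mapping mirrors the split described above.
ROUTES: Dict[str, str] = {
    "prefill": "throughput_pool",           # Vera Rubin: compute-heavy pass over the prompt
    "decode_attention": "throughput_pool",  # Vera Rubin: reads the large KV Cache
    "decode_ffn": "low_latency_pool",       # SRAM-based chips: latency-critical generation
}

@dataclass
class Trace:
    """Records which pool handled each stage of one request."""
    steps: List[Tuple[str, str]] = field(default_factory=list)

    def run(self, stage: str) -> None:
        self.steps.append((stage, ROUTES[stage]))

def serve(output_tokens: int) -> Trace:
    """Route one request: a single prefill, then attention and FFN for every output Token."""
    trace = Trace()
    trace.run("prefill")
    for _ in range(output_tokens):
        trace.run("decode_attention")
        trace.run("decode_ffn")  # a real system ships activations between pools here
    return trace

trace = serve(output_tokens=3)
for stage, pool in trace.steps:
    print(f"{stage:>16} -> {pool}")
```

Keeping the routing table separate from the serving loop reflects the point of disaggregation: the same request flow can be re-pointed at different hardware.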
Result: at the most commercially valuable service tier, performance improves 35x, and unlocks an unprecedented new inference performance tier.
Groq LP30 is manufactured by Samsung, is already in mass production, and is expected to start shipping in Q3 2026.
Optimal Groq deployment strategy:
- If the workload is mostly high throughput: 100% Vera Rubin
- If there is substantial high-value code generation or demand for high-speed Tokens: roughly 25% Groq, 75% Vera Rubin
AI Factory Scale and Outlook
In a gigawatt-scale factory, these architectural optimizations raise the Token generation rate from 22 million to 700 million in just two years, a 350x increase.
This is the power of extreme co-design—vertical integration, horizontal openness, so everyone benefits.
As AI factories scale rapidly, we found a key problem: data center technology suppliers had never worked with one another. Each developed its products independently, which wasted enormous amounts of energy.
To solve this, we created the NVIDIA DSX platform based on Omniverse, enabling all partners to co-design gigawatt AI factories in virtual worlds—full-system simulation including mechanics, thermal management, electrical, and networking, connected to the grid in real-time, dynamically optimizing via Max-Q power and cooling.
We believe this platform alone can deliver about 2x efficiency gains—at trillion-dollar scale, that’s enormous value.
NVIDIA is also heading to space. The Thor chip has passed radiation qualification and is deployed on satellites. We are working with partners on Vera Rubin Space-1 to build data centers in space, which means solving the engineering challenge of cooling purely by radiation.
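In vacuum there is no air or water to carry heat away, so all heat must leave as thermal radiation, following the Stefan-Boltzmann law P = εσAT⁴. This rough sketch estimates radiator area under stated assumptions: a one-sided radiator at 300 K with emissivity 0.9, radiating to a ~0 K background, and ignoring absorbed sunlight and Earth's infrared.

```python
STEFAN_BOLTZMANN = 5.670374419e-8  # W / (m^2 K^4)

def radiator_area_m2(power_w: float, temp_k: float, emissivity: float = 0.9) -> float:
    """Radiating area needed to reject `power_w` at `temp_k` into a ~0 K background."""
    return power_w / (emissivity * STEFAN_BOLTZMANN * temp_k ** 4)

area = radiator_area_m2(1e6, 300.0)  # 1 MW of heat
print(f"~{area:,.0f} m^2 of radiator per megawatt")  # ~2,419 m^2 of radiator per megawatt
```

At these assumptions, a gigawatt would need on the order of 2.4 square kilometers of radiator. Running the radiator hotter helps sharply because of the T⁴ term.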
OpenClaw: Operating System of AI Agents
Now let’s discuss a significant new discovery.
Peter Steinberger developed a piece of software called OpenClaw. It became the most popular open-source project ever, reaching in just weeks the adoption that took Linux 30 years.
What is OpenClaw? It’s an Agentic system able to:
- Connect large language models
- Access tools and file system
- Perform scheduling and timed tasks
- Break problems into step-by-step subtasks
- Generate and call sub-Agents
- Support multimodal interaction (text, voice, gestures, etc.)
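Taken together, these capabilities form a loop. A model proposes the next action, the runtime executes a tool or spawns a sub-agent, and the result is fed back. Below is a hypothetical, minimal version of that loop, with a scripted stand-in for the language model and an in-memory file system. It does not use or describe OpenClaw's actual code or API, and it omits scheduling and multimodal I/O.

```python
from typing import Callable, Dict, List

Tool = Callable[[str], str]
MAX_DEPTH = 2  # limit on how deeply sub-agents may nest

# Hypothetical in-memory "file system" so the example is self-contained.
FAKE_FS = {"notes.txt": "GTC keynote covers inference, AI factories, and agents."}

def scripted_model(task: str, history: List[str]) -> Dict[str, str]:
    """Stand-in for a language model: picks the next action from the task and history.

    A real agent would send the task and history to an LLM and parse its reply.
    """
    if task.startswith("summarize:"):
        # Pretend summary: keep only the first clause of the text.
        return {"action": "finish", "arg": task[len("summarize:"):].split(",")[0].strip()}
    if not history:
        return {"action": "tool", "name": "read_file", "arg": "notes.txt"}
    if len(history) == 1:
        return {"action": "subagent", "arg": "summarize: " + history[0]}
    return {"action": "finish", "arg": "Done. Summary: " + history[-1]}

def run_agent(task: str, tools: Dict[str, Tool], depth: int = 0, max_steps: int = 10) -> str:
    """Plan-act loop: ask the model for a step, execute it, feed the result back."""
    history: List[str] = []
    for _ in range(max_steps):  # bound the loop so a confused model cannot run forever
        step = scripted_model(task, history)
        if step["action"] == "finish":
            return step["arg"]
        if step["action"] == "tool":
            history.append(tools[step["name"]](step["arg"]))
        elif step["action"] == "subagent":
            if depth >= MAX_DEPTH:
                history.append("error: sub-agent depth limit reached")
            else:
                # A sub-agent is the same loop with its own task and fresh history.
                history.append(run_agent(step["arg"], tools, depth + 1, max_steps))
    return "stopped: step limit reached"

tools: Dict[str, Tool] = {"read_file": lambda path: FAKE_FS[path]}
print(run_agent("summarize my notes", tools))  # Done. Summary: GTC keynote covers inference
```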
In other words, OpenClaw is the operating system of Agentic computers. Just as Windows made personal computers possible, OpenClaw makes personal agents possible.
The key question for every company now is: What is your OpenClaw strategy?
Just as every company once needed a Linux strategy, an HTTP/HTML strategy, a Kubernetes strategy, now every company must have an OpenClaw strategy and an Agentic system strategy.
Enterprise IT paradigm shift:
Old model: Data center stores files → software tools → humans use tools
New model: Every SaaS company will become an AaaS (Agentic as a Service) company, offering specialized Agent services.
However, enterprise Agentic systems face major security challenges: they can access sensitive information, execute code, and communicate externally. So we partnered with Peter Steinberger and top security experts worldwide to develop an enterprise edition of OpenClaw. It uses OpenShell security technology, a policy engine, network guardrails, and a privacy router. The result is a secure enterprise reference architecture called NemoClaw, available for direct download.
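The guardrail ideas above (a policy engine, network guardrails, a privacy router) can be illustrated with a small check that every tool call must pass before it runs. Everything here is hypothetical: the policy fields, domain names, and redaction rule are invented for illustration. They are not OpenShell's or NemoClaw's interfaces.

```python
import re
from dataclasses import dataclass, field
from typing import Set
from urllib.parse import urlparse

@dataclass
class Policy:
    """Hypothetical enterprise policy for an agent's tool calls."""
    allowed_tools: Set[str] = field(default_factory=lambda: {"read_file", "http_get"})
    allowed_domains: Set[str] = field(default_factory=lambda: {"docs.example.com"})
    # Example privacy rule: treat any standalone 16-digit number as a card number.
    secret_pattern: re.Pattern = field(default_factory=lambda: re.compile(r"\b\d{16}\b"))

def check_tool_call(policy: Policy, tool: str, arg: str) -> None:
    """Policy engine: raise PermissionError if the call violates policy."""
    if tool not in policy.allowed_tools:
        raise PermissionError(f"tool not allowed: {tool}")
    if tool == "http_get":
        host = urlparse(arg).hostname or ""
        if host not in policy.allowed_domains:  # network guardrail: egress allowlist
            raise PermissionError(f"egress blocked: {host}")

def route_prompt(policy: Policy, text: str, model_is_external: bool) -> str:
    """Privacy router: redact sensitive data before text leaves for an external model."""
    return policy.secret_pattern.sub("[REDACTED]", text) if model_is_external else text

policy = Policy()
check_tool_call(policy, "http_get", "https://docs.example.com/api")  # allowed, returns None
try:
    check_tool_call(policy, "http_get", "https://exfil.example.net/upload")
except PermissionError as err:
    print(err)  # egress blocked: exfil.example.net
print(route_prompt(policy, "Refund card 4111111111111111 today", model_is_external=True))
# Refund card [REDACTED] today
```

Denying by default (an allowlist rather than a blocklist) is the usual choice for agents, since a model can invent tool calls and destinations that no blocklist anticipated.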
NVIDIA Open Model Initiative
NVIDIA has established leadership in all frontier AI model domains:
| Model | Domain |
|---|---|
| Nemotron | Large Language Model |
| Cosmos | World Foundation Model |
| GR00T | General-purpose Robot Model |
| Alpamayo | Autonomous Driving |
| BioNeMo | Digital Biology/Drug Discovery |
| PhysicsNeMo | AI Physics Simulation |
Today, we officially announce the Nemotron Alliance, collaborating with these companies to jointly develop Nemotron 4:
- BlackForest Labs (image generation)
- Cursor (code editing)
- LangChain (custom Agent building framework, a billion downloads)
- Mistral (open-source large model)
- Perplexity (AI search)
- Reflection (multimodal Agentic system)
- Sarvam (Indian AI company)
- Thinking Machines (Mira Murati’s lab)
These companies are working with us to deeply integrate the NemoClaw reference design, NVIDIA Agentic AI toolkit, and full range of open models into their products and services.
Physical AI and Robotics
Besides digital Agents, we have long been committed to physical AI and robotics.
We have built three key computers for robotic systems:
- Training computer
- Synthetic data generation and simulation computer
- Embedded robot computer
We are deeply integrated with partners such as Siemens and Cadence, and today we announce a series of major collaborations:
Autonomous driving: the 'ChatGPT moment' for autonomous driving is here. Today, we announce four new RoboTaxi partners (BYD, Hyundai, Nissan, and Geely), joining Mercedes-Benz, Toyota, and GM. Together they build 18 million cars per year, all connecting to the NVIDIA RoboTaxi Ready platform. We are also announcing a major partnership with Uber to deploy RoboTaxi vehicles in multiple cities and integrate them into Uber's network.
Industrial robotics: We collaborate with ABB, Universal Robots, KUKA, Caterpillar and others to integrate Physical AI models and simulation systems into manufacturing lines worldwide.
Telecom: T-Mobile is here too. The wireless base stations of the future will evolve into NVIDIA Aerial AI-RAN, which can infer traffic dynamically and adapt beamforming on its own, improving signal quality while saving significant energy.
Finally, we showcased the 'Olaf' robot co-developed with Disney—based on Jetson computing platform, Omniverse training, and the Newton physics solver (jointly developed with Disney and DeepMind, running on NVIDIA Warp), achieving adaptive movement in the real physical world. This is a fantastic demonstration of Physical AI and a preview of future theme parks.
Summary
This GTC revolves around four key themes:
- Inference Turning Point: AI has moved from "understanding" to "generating," to "reasoning," to "working," and compute demand has grown a million-fold; the inference turning point is here;
- AI Factory: data centers are shifting from storing files to producing Tokens, becoming "AI factories"; Vera Rubin will deliver about a 5x revenue increase per service tier;
- OpenClaw and the Agentic Revolution: enterprise IT is undergoing a profound transformation, every company must develop an Agent strategy, and NemoClaw provides a secure reference design;
- Physical AI and Robotics: autonomous driving, industrial robots, humanoid robots; the era of Physical AI has arrived.
Have a great GTC, thank you!