The Next Stop of the AI Revolution: What Are the New Advances in Physical AI?

```

Physical AI is moving from concept to industrial reality. In its latest in-depth industry report, Zheshang Securities argues that, after Perceptual AI, Generative AI, and Agentic AI, Physical AI will be the next stage in the evolution of AI technology. Its core is enabling models to understand and predict real-world states, which in turn drives profound change in scenarios such as autonomous driving, embodied intelligence, and industrial software.

In terms of market size, Coatue Management estimates that the Physical AI market could reach at least $6 trillion, about 50% larger than that of Digital AI. Nvidia CEO Jensen Huang said at CES 2026 that Physical AI could reshape manufacturing and logistics industries worth about $50 trillion. Meanwhile, leading scholars and tech giants are moving in quickly: AMI Labs, founded by Turing Award winner Yann LeCun, completed a $1.03 billion seed round; World Labs, co-founded by "AI Queen" Fei-Fei Li, completed a new $1 billion round and reached a valuation of over $5 billion in less than two years; and Nvidia announced its next-generation chip, Feynman, designed exclusively for Physical AI and expected to launch in 2028.

Zheshang Securities believes that Physical AI does not yet have a fixed implementation paradigm and must be supported by world models and VLA (Vision-Language-Action) models. Autonomous driving, embodied intelligence, and industrial software are its three core application scenarios, and autonomous driving is expected to be the first to close both the "data loop" and the "commercial loop." The report recommends focusing on companies with world model capabilities, as well as on hardware and software investment targets in the three scenarios above.

Technical Definition: Paradigm shift from Generative AI to Physical AI

According to the Zheshang Securities report, Physical AI is an AI system capable of understanding the real world, and it must answer two core questions: how will the world change next, and how will the world react after an agent takes an action? Whereas Generative AI is confined to language understanding and content creation in the digital world, Physical AI operates in the real physical world. Its core capabilities are perception, action, and control, and its value shows up in industrial control, embodied intelligence, and autonomous driving.

Jensen Huang has described the evolution of AI technology as three generations, from Perceptual AI to Generative AI to Agentic AI, with Physical AI as the next stage: "AI that can run, reason, plan, and act."

The model capabilities of Physical AI have gone through three stages. The 1.0 era relied on hard-coded rules and adapted poorly to new scenarios. The 2.0 era shifted to data-driven approaches, using vast amounts of data for imitation learning but lacking a true understanding of the physical world. The current 3.0 era is reasoning-driven and centers on world models, VLA, and reinforcement learning; it adds environmental reasoning, causal understanding, and planning, and supports closed-loop decision-making for complex tasks.

Core Technologies: World models and VLA have yet to form a unified paradigm

The Zheshang Securities report emphasizes that implementing Physical AI currently depends on two core components, world models and VLA, and that neither has yet converged on a settled technical route.

The concept of a world model originates in reinforcement learning, where it refers to an AI agent building an internal representation of the external world so that it can "simulate" action plans in its mind. The core value is this: actions in the real world are irreversible, and traditional simulation cannot support repeated "decide, then observe the result" trial and error, whereas world models can create virtual environments that closely approximate the real world, allowing AI to be trained at lower cost and with greater safety.

Google DeepMind CEO Demis Hassabis said in a 2026 New Year interview with CNBC that AGI is still missing one piece, which could be the world model.

At present, academic work on world models has settled into four mainstream technical paths: observation-level generative models, which excel in "realism" and are represented by Sora; latent-space models, which excel in "efficiency" and are represented by the JEPA series; reinforcement learning-oriented models, which excel in "decision-making" and are represented by the Dreamer series; and object-centric models, which excel in "interpretability" and are represented by SlotFormer. Fei-Fei Li holds that world models need to be generative, multimodal, and interactive.

VLA (Vision-Language-Action) models use end-to-end learning to map visual input and the semantics of language instructions onto concrete actions within a single unified model, bypassing manual rule design and modular integration. VLA research has entered a new phase since Google DeepMind launched RT-2 in 2023: in 2024, Stanford released OpenVLA, the first open-source 7B-parameter general-purpose robot VLA model, and in 2025, Nvidia released GR00T N1, an open-source foundation VLA model for general-purpose humanoid robots.

Three Key Application Scenarios: Autonomous Driving, Embodied Intelligence, and Industrial Software

Zheshang Securities sees autonomous driving as the scenario most likely to be the first to close Physical AI's "data loop" and "commercial loop." Vehicles worldwide log approximately 13 trillion miles of driving each year, a sustainable source of multimodal real-world data. Together with clear fee-based business models and a scalable supply chain, this gives autonomous driving unique advantages.

At the 2026 Beijing Auto Show, Physical AI was already an underlying main theme. Among autonomous driving solution providers, Pony.ai CTO Lou Tiancheng released World Model 2.0, whose core breakthrough is AI self-diagnosis and targeted evolution; Momenta officially released the R7 reinforcement learning world model; and Qingzhou Zhihang announced a strategic shift from "driverless" to "universal Physical AI." Among carmakers, Xpeng plans to raise its Physical AI R&D investment to 7 billion yuan in 2026; Geely released the WAM World Behavior Model and announced a deep collaboration with Nvidia on Physical AI; and Chery officially announced a global strategic partnership with Nvidia focused on assisted driving, cockpit AI, and robotics.

Zheshang Securities defines embodied intelligence as the core carrier of Physical AI's "perception, understanding, reasoning, action" closed loop. The evolution of the Physical AI tech stack is moving robotics from "rigid automation" to "true autonomy": compared with traditional robots, robots powered by Physical AI can handle unpredictable and unknown parts, reduce manual coding workloads, and speed up deployment.

Industrial software is positioned as the "console" for Physical AI training, validation, deployment, and operations. The report concludes that industrial software benefits from a strong moat because its data is non-reproducible, its compliance requirements are high, and its cloud-edge coordination is complex. Industrial software and Physical AI are complementary and mutually enabling: industrial software provides Physical AI with a physical foundation, high-quality data, and verification environments, while Physical AI provides industrial software with intelligent acceleration, automated decision-making, and closed-loop optimization. CAE simulation, digital twins, industrial control, industrial IoT, energy scheduling, and EDA/CAD are all key beneficiary scenarios.

Risk warning and disclaimer: The market involves risk, and investing requires caution. This article does not constitute personal investment advice, nor does it take into account individual users' specific investment objectives, financial situations, or needs. Users should consider whether any opinions, views, or conclusions in this article are suitable for their particular circumstances. Invest at your own risk.
```