Top Silicon Valley funds make collective bets! Morgan Stanley's in-depth analysis explains the next frontier of AI—"World Models"
```
Large models have followed the path of “language” to where they are today, and their limits are increasingly clear: they excel at writing, searching, editing, and programming, but once problems involve three-dimensional space, temporal evolution, and physical constraints, existing paradigms start to struggle. Morgan Stanley is betting the next phase of growth on “world models,” which enable AI to understand, simulate, and make decisions within environments. Applications go beyond robotics and autonomous driving, and will also reshape digital content industries like gaming, design, and film production.
According to Wind Chasing Trading Desk, Adam Jonas, an equity analyst at Morgan Stanley North America, bluntly stated in his latest report: “AI is moving beyond language toward models that understand, simulate and navigate the physical world.” The subtext: the next round of competition is not about whose chatbot is more human-like, but about who can compress the laws of the real world into a usable internal representation and turn it into an interactive “imagination engine.”
The report’s evidence is not a distant vision but real engineering work already underway: Waymo, using a world model based on DeepMind Genie 3, has done “billions of miles” of virtual road testing; Microsoft used Muse to turn the 1997 game Quake II into a “fully AI-rendered, playable” version; Roblox has also announced a research direction of generating immersive worlds and iterating on games using its proprietary world model and natural language. Big tech companies (DeepMind, Meta, Microsoft, Tesla, NVIDIA) are working on this, and startups are scrambling for talent and funding.
More noteworthy, Morgan Stanley’s report singles out two emerging companies: Fei-Fei Li’s World Labs, which focuses on “generating navigable 3D worlds,” and Yann LeCun’s AMI Labs, which focuses on “learning efficient latent space representations for prediction and reasoning.” Behind both paths lies the same question: In what form should AI “understand the world,” and when will such understanding move from demo to productivity?
From Language to Physics: World Models Must Fix the Hard Weakness of LLMs
The report describes the “physical world” as a much tougher arena: governed by the constraints of matter, thermodynamics, fluid dynamics, lighting, etc., all running in a constantly changing three-dimensional space. LLMs are mainly trained on text and its variants and are strong at white-collar tasks (coding, search, writing). But for questions like “what will happen in the next second, what are the consequences of my action,” what is lacking is not data but the ability to maintain a consistent representation of the environment and simulate it forward over time.
Thus, a world model is defined as an “internally usable environment representation”: it does not just recreate what is seen, but can roll the state forward and, when the conditioning actions change, offer different branches of possible futures. In short, it is the “imagination engine” metaphor the report repeatedly uses for AI.
World models aren’t one thing: five mainstream approaches running in parallel
Morgan Stanley roughly categorizes current approaches (while emphasizing that boundaries will blur):
- Interactive, action-conditioned world models: Like “learned game engines,” the environment changes in real-time with agent actions (e.g., DeepMind Genie).
- Consistent 3D world generators: Emphasize spatial and geometric consistency with multi-view exploration (e.g., World Labs Marble).
- Abstract representation/non-generative models: Not focused on pixel-level output but on predicting higher-level latent structures and dynamics, emphasizing efficiency and reasoning (e.g., Meta V-JEPA, AMI Labs).
- Predictive generative world models: More like “predict the next frame/state,” for planning, forecasting, and driving reasoning (e.g., Wayve GAIA, NVIDIA Cosmos Predict).
- Simulation data engines with physical constraints: Combine world models with simulation/physics engines and data pipelines to generate more “physically consistent” synthetic data for robot training (e.g., NVIDIA Cosmos Transfer).

This categorization has a practical implication: while all are called world models, some pursue “generating explorable worlds,” while others want to “compress the world into computable states”—product types, compute requirements, and commercialization paths are all different.
First in games and content creation: replacing engines is tempting, but not so soon
Games are the most “intuitive” use case in the report: world models can produce interactive environments from just a few prompts, potentially accelerating content production to a whole new level. Microsoft’s AI-playable Quake II made with Muse is a powerful example—no longer requiring traditional engines for frame-by-frame rendering, but having the model predict every frame based on player input.
But Morgan Stanley’s video game analyst team (citing Matt Cost’s framework) gives a less romantic assessment. In the long term there are two scenarios: incumbents embed AI into their toolchains (“adaptation”), or a new paradigm substitutes for and disrupts the old. The latter seems easier, since today’s models can already “generate playable worlds via natural language.”
The challenges lie ahead: performance and cost might be solved, but “meta-systems and latency” will be tougher, while problems like “determinism, memory, and updates” may be hard nuts to crack in the world model paradigm. This means short-term constraints give legacy players a window, but long-term threats remain real.
Autonomous Driving and Robotics are More Pragmatic: Virtual Worlds for “Data Supplementation” and “Think Before Acting”
The approach for autonomous driving is clearer: move dangerous, rare, or expensive real-world “edge cases” into virtual environments and scale them up. The report notes Waymo used a world model based on DeepMind Genie 3 for “billions of miles” of virtual driving tests, to train and validate system performance in rare edge situations: scenarios that in reality either hardly ever appear or cannot be staged safely and controllably.
Robotics logic is also more engineering-driven: world models may solve two things—training data volume and pre-execution reasoning. The report mentions research showing that robots trained on data generated by world models perform “similarly” to those trained on real interactive data. But Morgan Stanley draws the line: in the short term, world models and simulation data are more likely a supplement to real data pipelines, not a replacement.

The real pain points are in “contact and friction”: the report highlights that seemingly minor physical quantities can be crucial. Tiny forces applied by fingers, differences in actuator wear and tear, minute changes in surface friction, and even static friction in joints can lead to large “simulation to reality” gaps.
The hardest problems are “long-term stability” and “controllability”: several hurdles remain before usability
The report lists the challenges specifically and bluntly:
- Error accumulation and temporal drift: The longer the interaction, the higher the chance of object drift, geometric deformation, or physical law deviation. Genie 3, considered advanced, currently only supports “a few minutes” of continuous interaction.
- Lack of controllability: No matter how pretty, if the action space only covers basic movement, product value is limited.
- Multi-agent and social dynamics: Multi-player/multi-vehicle/multi-robot simultaneous interactions are far harder than single-camera navigation; even DeepMind identifies this as a key Genie 3 challenge.
- Scale and diversity of data: Especially for robotics, collecting real sensor data is expensive and slow.
- Lack of unified benchmarks: How to quantify long-term interaction quality? There’s no accepted standard; progress often relies on demos and task-based evaluation.
These constraints shape the practical pace: world models are likely to first spread in “high-tolerance, fast-iteration” digital content areas, gradually permeating industries needing strict physical consistency.
Fei-Fei Li’s Bet: Letting AI “Understand” 3D Space
Morgan Stanley lists World Labs as representative of “generating consistent 3D worlds.” The company was founded in 2023 by Fei-Fei Li and her team and came out of stealth in 2024. Its flagship product, Marble, was publicly released in November 2025, with the goal of generating “persistent, explorable” 3D environments from text, images, short video, or rough 3D input, and supporting editing and expansion.
The reported functionality resembles a creation and production workstation: users can delete or edit objects after generation, build a rough structure and then add detail with “Chisel,” expand regions, compose multiple worlds into a larger scene, and export to external 3D software and engines. APIs are also offered for developer integration.
It also emphasizes industry toolchain integration: export to Unreal Engine and Unity; connect to simulation platforms like NVIDIA Isaac Sim; and shows usage in architectural design, robot simulation, etc.
Venture funding is also highlighted: PitchBook estimates World Labs has raised about $1.29 billion, and after a February 2026 round it’s valued at $5.4 billion.
Yann LeCun’s Alternative Route: Not Rendering Images, Only Predicting Structure
AMI Labs has a more “research-paradigm” narrative. Co-founded by Yann LeCun, it came out of stealth in March 2026. Its approach follows the JEPA framework: rather than reconstructing every pixel, it predicts latent embeddings for occluded or future parts, using more abstract structures to learn the rules by which the world evolves. Morgan Stanley classifies it as “abstract representation/non-generative models,” emphasizing its potential value in reasoning, planning, and physical AI systems (especially robotics).
Details on AMI’s specific products are limited in the report, with only potential applications listed: robotics, autonomous driving, video understanding/analysis, as well as AR/VR and smart assistants with cameras. On funding, AMI Labs raised over $1 billion in a seed round, with PitchBook stating a post-money valuation of over $4.5 billion.
Capital and Talent Already Converging: The Spatial Intelligence Race is “Accelerating”
The most important signal from this Morgan Stanley material may not be any model parameter or single demo, but the pattern it describes: from DeepMind, Meta, Microsoft, Tesla, and NVIDIA to a host of startups, world models are becoming the “shared language of the next stage.” They explain why gaming, film, and design may see a productivity leap, and why autonomous driving and robotics will increasingly shift training, validation, and planning to the virtual world.
World models are not a plug-and-play panacea. The report’s conclusion is more like a roadmap: runnable scenarios already exist, but the truly hard problems are out in the open, namely long-term stability, controllability, multi-agent interaction, physical detail, and evaluation frameworks. Whoever closes these engineering loops will determine how far this journey from digital to physical can go.
Risk Disclaimer and Legal Notice
The market carries risks, and investments need to be made cautiously. This article does not constitute personal investment advice, nor does it take into account the individual investment objectives, financial situations, or needs of any particular user. Users should consider whether any opinions, views, or conclusions contained in this article are appropriate for their particular circumstances. Investment is at your own risk.
```