Alibaba’s latest move in embodied intelligence targets deep environmental understanding.

Author | Huang Yu
The long-standing “intelligence wall” in the field of embodied intelligence is being broken down bit by bit.
On February 10, Alibaba DAMO Academy officially released RynnBrain, a foundation model that serves as the “brain” for embodied intelligence, and open-sourced the full series of seven models at once, including the industry’s first 30B-parameter embodied MoE (Mixture of Experts) model.
The release is a milestone. Reportedly, RynnBrain gives robots spatiotemporal memory and spatial reasoning abilities for the first time, and it set state-of-the-art (SOTA) results on 16 open embodied intelligence benchmarks, surpassing industry-leading models such as Google’s Gemini Robotics ER 1.5.
This means that the long-standing shackles of “spatiotemporal forgetting” and “physical hallucination” in embodied intelligence are gradually being loosened, and robot brains are expected to evolve from passive instruction receivers into intelligent agents with a deep understanding of their environment.
For a long time, the limited intelligence of embodied models has been a major bottleneck preventing robots from becoming general-purpose; in particular, weak generalization has severely limited their use in complex physical scenarios.
To break through this bottleneck, the industry has formed multiple technical exploration routes.
According to Wallstreetcn, one route focuses on action output (VLA, or vision-language-action, models). These models can act directly on the physical world, but because high-quality robot data is scarce, cross-scenario generalization is extremely difficult. Another route introduces “brain” models with generalization potential (such as VLMs, or vision-language models), but these generally lack memory, have limited dynamic cognition, and suffer from physical hallucinations, making it hard for them to support complex mobile manipulation in humanoid robots.
Because this wall is rooted in flaws in cognitive architecture, even seemingly advanced robots still struggle with complex mobile manipulation.
Alibaba DAMO Academy’s RynnBrain was designed precisely to tear down this wall at the level of its underlying logic.
RynnBrain reportedly introduces two core capabilities, spatiotemporal memory and physical space reasoning, both of which are fundamental to deep interaction between robots and their environment.
Spatiotemporal memory refers to a robot’s ability to locate objects across its complete history of observations, recall target areas, and even predict motion trajectories, giving it global recall across both space and time.
Physical space reasoning departs from the traditional text-only reasoning paradigm. RynnBrain interleaves textual reasoning with spatial localization, which keeps the reasoning process anchored in the physical environment and significantly reduces hallucination.
For example, if a robot running RynnBrain is interrupted during task A and asked to do task B first, it can accurately remember the temporal and spatial state of task A and seamlessly resume it after finishing task B. This “long memory” mechanism addresses the long-standing “instant amnesia” problem in embodied intelligence.
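The interrupt-and-resume scenario can be pictured with a small data structure. The sketch below is purely illustrative and hypothetical: it is not RynnBrain’s implementation or API, and the names `TaskSnapshot`, `TaskMemory`, and `run_steps` are invented here. It assumes that “remembering the state of task A” means saving a snapshot (next step, pause time, robot pose, known object positions) before switching tasks and restoring it afterward.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TaskSnapshot:
    """Hypothetical record of where a task stood when it was interrupted."""
    name: str
    next_step: int                                  # index of the step to run on resume
    paused_at: float                                # timestamp of the interruption, in seconds
    robot_pose: Tuple[float, float, float]          # x, y, heading
    object_positions: Dict[str, Tuple[float, float]] = field(default_factory=dict)


class TaskMemory:
    """Stack of suspended tasks: the most recent interruption is resumed first."""

    def __init__(self) -> None:
        self._suspended: List[TaskSnapshot] = []

    def suspend(self, snapshot: TaskSnapshot) -> None:
        self._suspended.append(snapshot)

    def resume(self) -> TaskSnapshot:
        if not self._suspended:
            raise LookupError("no suspended task to resume")
        return self._suspended.pop()

    def pending(self) -> List[str]:
        return [s.name for s in self._suspended]


def run_steps(name: str, steps: List[str], start: int, stop: int) -> List[str]:
    """Stand-in for executing steps[start:stop]; returns a log of what ran."""
    return [f"{name}: {step}" for step in steps[start:stop]]


memory = TaskMemory()
task_a = ["pick up cup", "carry cup to sink", "place cup in sink"]
task_b = ["open door"]

log = run_steps("A", task_a, 0, 1)                   # task A completes its first step
memory.suspend(TaskSnapshot("A", next_step=1, paused_at=12.5,
                            robot_pose=(1.0, 2.0, 0.0),
                            object_positions={"cup": (1.0, 2.1)}))
log += run_steps("B", task_b, 0, len(task_b))        # task B interrupts and runs to completion
snap = memory.resume()                               # recall where task A left off
log += run_steps(snap.name, task_a, snap.next_step, len(task_a))
```

A stack is used so that nested interruptions (B interrupted by C) unwind in the natural order; a real system would also need to re-verify saved object positions, since the world may have changed while the task was paused.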
In addition, according to Wallstreetcn, RynnBrain was trained on top of Qwen3-VL and deeply optimized with DAMO Academy’s in-house RynnScale architecture, which doubles training speed for the same computing resources; its training data exceeds 20 million pairs.
This efficient training system is directly reflected in the evaluation results: across 16 key tasks, including environmental perception, object reasoning, first-person visual question answering, spatial reasoning, and trajectory prediction, RynnBrain set new industry records. This is not simply a matter of piling up computing power, but a successful rebuilding of the foundational architecture for embodied intelligence.
RynnBrain is also reportedly highly extensible: it can be quickly post-trained into navigation, planning, action, and other embodied models, and is expected to become a foundation model for the embodied intelligence industry.
In building a foundation model for the embodied intelligence industry, DAMO Academy has chosen the open-source route.
DAMO Academy has reportedly open-sourced the entire RynnBrain series this time, seven models in total, covering base models of every size as well as dedicated post-trained models. Among them is the industry’s first 30B-parameter embodied MoE model, which activates only 3B parameters during inference yet surpasses the performance of 72B models elsewhere in the industry, making robot actions faster and smoother.
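Why can a 30B model run with only 3B active parameters? In a mixture-of-experts layer, a router scores every expert for each input, but only the top few experts actually compute. The toy layer below is a hypothetical illustration of that general technique (top-k gating), not RynnBrain’s architecture; `ToyMoELayer` and its sizes are invented here. In real MoE models the active count also includes shared components such as attention layers and embeddings, and the source does not give RynnBrain’s exact split.

```python
import math
import random
from typing import List


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(m: List[List[float]], x: List[float]) -> List[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


class ToyMoELayer:
    """Toy mixture-of-experts layer (hypothetical, for illustration only).

    Each expert is a d x d linear map. A linear router scores every expert,
    and only the top_k highest-scoring experts run for a given input.
    """

    def __init__(self, d: int, num_experts: int, top_k: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.d, self.num_experts, self.top_k = d, num_experts, top_k
        self.experts = [
            [[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in range(d)]
            for _ in range(num_experts)
        ]
        self.router = [[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in range(num_experts)]
        self.last_chosen: List[int] = []

    def forward(self, x: List[float]) -> List[float]:
        scores = matvec(self.router, x)                     # one score per expert
        ranked = sorted(range(self.num_experts), key=lambda i: scores[i], reverse=True)
        chosen = ranked[: self.top_k]                       # sparse selection
        gates = softmax([scores[i] for i in chosen])        # renormalize over chosen experts
        out = [0.0] * self.d
        for gate, i in zip(gates, chosen):
            y = matvec(self.experts[i], x)                  # only chosen experts compute
            out = [o + gate * yi for o, yi in zip(out, y)]
        self.last_chosen = chosen
        return out

    def total_params(self) -> int:
        return self.num_experts * self.d * self.d + self.num_experts * self.d

    def active_params(self) -> int:
        # The router always runs; only top_k experts' weights are used per input.
        return self.top_k * self.d * self.d + self.num_experts * self.d


layer = ToyMoELayer(d=8, num_experts=10, top_k=1)
y = layer.forward([1.0] * 8)
print(layer.last_chosen, layer.active_params(), "of", layer.total_params())

# The article's headline figures: 3B active out of 30B total parameters.
active_fraction = 3e9 / 30e9   # 0.1, i.e. roughly 10% of weights used per token
```

Because per-token compute scales with the active parameters rather than the total, a sparse model of this kind can respond faster than a dense model of similar capacity, which is consistent with the article’s point about faster, smoother robot actions.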
DAMO Academy has also open-sourced RynnBrain-Bench, a new evaluation benchmark for fine-grained spatiotemporal embodied tasks, filling a gap in the industry.
Behind this large-scale open-sourcing by Alibaba DAMO Academy lies a clearly grander industry ambition: to accelerate the construction of an open, evolvable embodied intelligence ecosystem.
From the perspective of global technological competition, embodied intelligence is at a pivotal turning point from “digital virtual” to “physical entity”.
Zhao Deli, head of DAMO Academy’s Embodied Intelligence Laboratory, said that RynnBrain is the first model to let the “brain” deeply understand and reliably plan in the physical world, a key step toward general embodied intelligence within a hierarchical “big brain, small brain” architecture. “We hope it will accelerate AI’s move from the digital world into real physical scenarios.”
In 2017, on Alibaba’s 18th anniversary, Jack Ma founded DAMO Academy with the goal of researching technologies that drive productivity. At the time, Alibaba pledged to invest 100 billion yuan in DAMO Academy over three years.
Over the past three years, however, amid major organizational changes at Alibaba Group, DAMO Academy has gone through multiple restructurings and reshuffles. Its once-broad “4+X” research portfolio has been narrowed to just “Intelligence + Computing”. The Intelligence area includes medical AI, decision intelligence, video technology, embodied intelligence, genetic intelligence, and more; the Computing area covers computing technology, RISC-V, and more.
Embodied intelligence is clearly one of the main focus areas for DAMO Academy now.
In embodied intelligence, DAMO Academy is reportedly building systems that are deployable, scalable, and evolvable, and it has already open-sourced WorldVLA, which integrates a world model with a VLA model; RynnEC, a world-understanding model; and RynnRCP, the industry’s first robot context protocol.
As DAMO Academy focuses on embodied intelligence, the global humanoid robot market has also reached a key juncture for large-scale development, with 2025 marking the starting point of its scale-up.
According to IDC data, global humanoid robot shipments last year were close to 18,000 units, up about 508% year over year, with sales of about $440 million; over the same period, cumulative orders exceeded 35,000 units.
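A 508% year-over-year increase means shipments grew to 6.08 times the prior year’s level, not 5.08 times. The back-of-the-envelope calculation below derives the implied prior-year shipments and the average revenue per unit from the rounded IDC figures. It assumes the $440 million in sales corresponds to the same roughly 18,000 units shipped; because the inputs are approximate (“close to”, “about”), the outputs are rough estimates only.

```python
def implied_prior_year(current: float, growth_pct: float) -> float:
    """Back out the previous period's value from the current value and YoY growth in percent."""
    return current / (1 + growth_pct / 100)


shipments_last_year = 18_000     # "close to 18,000 units" (approximate)
growth_pct = 508                 # "up about 508%"
revenue_usd = 440e6              # "about $440 million"

prior_year_shipments = implied_prior_year(shipments_last_year, growth_pct)   # about 2,960 units
avg_revenue_per_unit = revenue_usd / shipments_last_year                     # about $24,400 per unit
```

In other words, the market went from roughly 3,000 units to roughly 18,000 in a single year, at an average price in the mid-$20,000 range per robot.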
Although the field still faces many challenges, such as the scarcity of real physical feedback data, generalization in unstructured environments, and deep hardware-software co-design, open-sourcing RynnBrain undoubtedly gives developers worldwide a mature “brain template” that will help accelerate the industrialization of embodied intelligence.
For the industry, this is not only a release of code, but also a redistribution of technological power. When top-tier models are no longer the secret weapons of giant laboratories, the embodied intelligence industry will enter a new cycle of accelerated iteration and collective evolution.