Goldman Sachs' China Humanoid Robot Tour: VLA and World Models Converge Faster, Data Remains the Biggest Bottleneck
China's embodied intelligence and humanoid robot industry is undergoing a critical evolution in its technical architecture. The accelerating integration of vision-language-action (VLA) models with world models is moving the industry toward commercial reality, but the scarcity of high-quality real-world data remains the core bottleneck to large-scale deployment.
According to Wind Chasing Trading Desk, Goldman Sachs analyst Jacqueline Du and colleagues published a report after visiting 14 Chinese robotics companies. The report states that industry discussion has moved beyond standalone VLA frameworks toward execution-oriented multimodal AI stacks, that model sizes are rapidly scaling to 4 billion to 8 billion parameters, and that the industry increasingly agrees that a scalable, human-centered data acquisition architecture is the decisive factor for technological breakthroughs.
For investors, this technological evolution directly reshapes expectations for commercialization. The report points out that although applications are expanding into industrial and logistics settings, most projects are still at the proof-of-concept (POC) stage. The market generally expects industry-wide, large-scale commercial deployment to arrive only between 2027 and 2029, once deployable models are in place and tens of millions of hours of high-quality data have been accumulated.
Despite these short-term challenges, Goldman Sachs remains highly optimistic about the sector's long-term investment outlook. Progress on the multimodal AI stack and the build-out of sophisticated data collection systems indicate that the industry is on the verge of widespread application. Goldman Sachs nonetheless urges investors to be patient: achieving consistent quality and continuously lowering costs will be the key milestones in the complex transition from POC to large-scale commercialization.
VLA and World Model Integration: Accelerated Convergence of Technical Routes
According to Goldman Sachs' report, the industry consensus on embodied intelligence model architecture is shifting significantly. Companies are rapidly moving away from standalone VLA models toward combining VLA, or vision-tactile-language-action (VTLA), models with world models. In this architecture the world model is no longer a separate category; it works alongside the action model as a functional layer, improving real-world planning and robustness by predicting the next state and verifying actions before they are executed.
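The "predict the next state, verify before execution" role of the world model can be illustrated with a minimal Python sketch. It is not any company's implementation: `VerifiedPlanner`, the toy one-dimensional state, `toy_policy`, and `toy_world_model` are hypothetical stand-ins, assuming the action model proposes several candidate actions and the world model predicts the outcome of each one before any is executed.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Toy 1-D state (e.g. an end-effector position); a hypothetical simplification.
State = float


@dataclass
class VerifiedPlanner:
    """Hypothetical action model + world model pairing (not any vendor's design)."""

    propose: Callable[[State, State], List[float]]  # stand-in for the VLA/VTLA policy
    predict_next: Callable[[State, float], State]   # stand-in for the world model
    is_acceptable: Callable[[State], bool]          # check applied to predicted states

    def select_action(self, state: State, goal: State) -> Optional[float]:
        """Return the verified candidate whose predicted next state is closest to goal."""
        best_action: Optional[float] = None
        best_error = float("inf")
        for action in self.propose(state, goal):
            predicted = self.predict_next(state, action)  # imagine the outcome first
            if not self.is_acceptable(predicted):
                continue  # reject the candidate before it is ever executed
            error = abs(goal - predicted)
            if error < best_error:
                best_action, best_error = action, error
        return best_action  # None means no candidate passed verification


def toy_policy(state: State, goal: State) -> List[float]:
    # Propose a nominal step toward the goal plus a shorter and a longer variant.
    step = goal - state
    return [step, 0.5 * step, 1.5 * step]


def toy_world_model(state: State, action: float) -> State:
    # Assumed dynamics: the robot undershoots each commanded action by 10%.
    return state + 0.9 * action


planner = VerifiedPlanner(toy_policy, toy_world_model, lambda s: 0.0 <= s <= 1.0)
print(planner.select_action(0.2, 0.8))  # 1.5x candidate rejected (predicted state about 1.01)
print(planner.select_action(0.9, 1.5))  # None: every predicted state leaves [0, 1]
```

In the first call the nominal step (roughly 0.6) wins because its predicted state lands closest to the goal, while the overshooting candidate is filtered out by the world model's prediction; a real stack would replace the toy functions with learned models.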
Companies such as Galaxea, Galbot, Spirit AI, and One Robotics have explicitly chosen VLA/VTLA combined with world models as their next development direction. Against this backdrop, model sizes are climbing from earlier models of a few billion parameters into the 4 billion to 8 billion range. Multiple industry insiders told Goldman Sachs that several rounds of iteration will be needed before these multimodal stacks reach deployable, consistent quality.
Additionally, companies such as PaXini emphasize the importance of touch (the "T" in VTLA) in physical interaction and plan to launch touch-centric models to compensate for the limitations of vision-only approaches in force-control tasks.
Data Bottleneck: From the "Recipe Debate" to Building Acquisition Architecture
High-quality, multidimensional real-world data remains the largest bottleneck blocking actual deployment. Goldman Sachs notes that industry focus has shifted away from vague arguments about "data recipes" to the construction of scalable architectures that can reliably produce high-quality data.
For data collection, human-centered and egocentric methods (such as the Universal Manipulation Interface (UMI) and first-person wearable devices) are becoming the preferred approaches, especially when companies need to capture natural movements and rich contact interactions and to enable cross-embodiment transfer. In practice, the industry is following two distinct paths: some companies prefer building centralized data factories with government support (PaXini, for example, currently operates five data factories nationwide), while others, such as Galaxea, Spirit AI, and One Robotics, prefer building distributed, deployment-driven data loops through deployed systems, VR, and customer-side collection.
Data itself is becoming an important source of revenue. The report notes that several companies expect data-related income to account for a noticeably larger share of total revenue by 2026. UBTech, for instance, expects strong government demand for data factories, which should support both its revenue and its data accumulation.
Commercialization Process: Focus on Industry and Logistics, Pragmatically Advancing Scale
Humanoid robot commercialization is currently extending to industrial handling, logistics workflows, and some structured commercial scenarios. According to Goldman Sachs' report, near-term core opportunities are concentrated mainly in standardized or semi-structured processes such as sorting, material handling, pick-and-place, and inspection.
Industrial adoption follows a strict phased pathway. A POC typically takes 3–6 months (2–3 rounds on average), followed by batch testing with fewer than 50 units per batch. After a verification period of about 12 months, companies move into pilot deployment at around 50–100 units per client. In logistics, companies such as Geek+ emphasize a "scenario-first" philosophy: they break complex tasks into sub-tasks with clear boundaries and, at this stage, prioritize reliability over generality.
On hardware form factors and cost reduction, the market is strongly pragmatic. Goldman Sachs points out that, because of current model capability limits and cost considerations, many vendors now prefer a "wheeled chassis + two- or three-finger gripper" configuration, which covers 70%–90% of industrial application scenarios, while the ultimate form, a bipedal robot with dexterous five-finger hands, remains a longer-term goal.
Cost Reduction Path: Scale Effects Dominate, Hardware Forms Trend Toward Pragmatism
Amid fierce market competition, cost reduction relies mainly on economies of scale and on each company's choices of architecture, components, and deployment form. For full-size humanoid robot makers, in-house full-stack R&D remains the most common means of cost control.
Companies at different links of the industrial chain are building their own competitive advantages. Linkerbot reports that it holds a leading share of the global market for high-degree-of-freedom dexterous hands and, through self-developed joint modules, prices its products significantly below overseas competitors. Mech-Mind focuses on 3D vision systems for industrial scenarios, with core customers in automotive and battery manufacturing. Meanwhile, in traditional industrial robots, Estun Automation's management said the company's strategy has shifted decisively from pursuing only market share and shipment volume to prioritizing product mix, profitability, and quality of growth, in order to cope with increasingly fierce domestic price competition.
~~~~~~~~~~~~~~~~~~~~~~~~
The above highlights are from Wind Chasing Trading Desk.
Risk Warning and Disclaimer: Markets carry risks; invest with caution. This article does not constitute personal investment advice and does not take into account the specific investment objectives, financial situations, or needs of individual users. Users should consider whether any opinions, viewpoints, or conclusions in this article fit their particular circumstances. Any investment made on this basis is at your own risk.