Why has Xinghai Tu, a company that set out to give robots brains, ended up doing the hard work of collecting data in the physical world?
On June 16, embodied intelligence company Xinghai Tu disclosed its latest technical roadmap and business plan at the Global Developers Conference. At the conference, Xinghai Tu unveiled its bipedal humanoid robot Kengo, released and open-sourced its next-generation VLA (vision-language-action) foundation model G0.5, and jointly established the data company "Yishu Intelligent" with state-owned platforms such as Beijing Yizhuang, officially launching the "1 Million Hours Ultra-High Quality Real Data Plan." With the release of Kengo, Xinghai Tu has also become the only embodied intelligence company in China to possess both the model and the physical body.

As the embodied intelligence sector moves past its early stage of demonstrations and showcases, the industry's focus is shifting toward business fundamentals and real-world applications. In response, Xinghai Tu CEO Gao Jiyang has repositioned the company. According to Gao, Xinghai Tu is, first, an embodied brain company centered on foundation model pre-training; second, a software-hardware integrated company that has built its own machines since its founding, with 80% of power units self-developed or jointly developed within the supply chain; and third, the earliest and most committed company in China to invest in real data.

## 01 The Value Transfer of Physical Tokens

At the conference, Xinghai Tu's self-developed bipedal humanoid robot Kengo made its official debut. On the hardware side, to keep the whole system tightly synchronized, Kengo's joint modules use an EC communication architecture, which is extremely difficult to develop. Yet despite this heavy investment in self-developed hardware, Xinghai Tu draws a remarkably clear boundary around its business positioning.
In an interview with Wallstreetcn and other outlets, Gao Jiyang said, "The whole machine and supply chain is a finite game; intelligence and applications are the infinite game." He added, "If you don't play the preceding finite game well, you have no chance to succeed in the subsequent infinite game. We spend a lot of time and effort building our own machines and supply chain, but the real purpose is to focus on intelligence and applications. That's the true goal."

Behind this lies a simple piece of bottom-line arithmetic. In developed countries, the all-in annual cost of a single worker is about $40,000 to $50,000. If hardware costs fall to $10,000 and the target payback period is one year, a gap of $30,000 to $40,000 opens up. That premium of $30,000 or more is what physical industries are willing to pay for intelligence, meaning the ability of a machine to fill a productive role on its own. This is the fundamental reason Xinghai Tu tilts its internal R&D resources so heavily toward intelligent algorithms.

Building on this relationship between the "finite game" and the "infinite game," Xinghai Tu has mapped out three stages of commercial evolution for embodied intelligence. The first stage is today's whole-machine sales, mainly targeting scientific research, educational developers, and the exhibition and entertainment markets. Gao said frankly that chasing first place in sales at this stage is meaningless, and that over-expanding into immature productive scenarios could instead become a liability for the company. In the second stage, as machine intelligence improves, the industry will move toward subscription-based solutions for productive scenarios. In the third and final stage, embodied intelligence will move fully to selling physical-world tokens: as with cloud computing today, customers will be charged for the actual operational output the machine generates in the physical world.
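The labor-cost arithmetic behind the $30,000 to $40,000 gap described above can be written out as a short sketch. The function name and the idea of spreading the hardware cost evenly over the payback period are illustrative assumptions, not something Xinghai Tu has published; only the dollar figures and the one-year payback come from the article.

```python
def intelligence_premium(labor_cost_usd: float,
                         hardware_cost_usd: float,
                         payback_years: float = 1.0) -> float:
    """Annual labor cost left over once hardware is paid back.

    Hypothetical helper: assumes the hardware cost is spread evenly
    over the payback period and nothing else is deducted.
    """
    annualized_hardware = hardware_cost_usd / payback_years
    return labor_cost_usd - annualized_hardware


# Figures quoted in the article: $40,000 to $50,000 annual labor cost,
# $10,000 hardware cost, one-year payback period.
low = intelligence_premium(40_000, 10_000)
high = intelligence_premium(50_000, 10_000)
print(f"Premium available for intelligence: ${low:,.0f} to ${high:,.0f}")
```

Under these assumptions the result matches the range quoted in the article; a longer payback period would widen the gap further.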
To support this three-stage commercial progression, Xinghai Tu has proposed a "triple jump" in its technical architecture: instinctive intelligence, operational intelligence, and evolutionary intelligence. On the industry's split between the VLA and world model routes, Xinghai Tu offers an integrated answer, arguing that the two are not opposed: both rest on converting multimodal data into tokens and encoding them with a Transformer.

In practice, operational intelligence centered on imitation learning (such as the newly released G0.5 model) will mainly be applied to wheeled dual-arm robots for high-precision task execution, while instinctive intelligence centered on reinforcement learning will mainly be paired with bipedal machines like Kengo to tackle whole-body motion control on complex terrain. Ultimately, the two will converge in unstructured environments and together form a complete embodied brain.

## 02 Data Becomes Key

The core pain point for embodied intelligence today is the extreme scarcity of real data. Once foundation models are built, handling complex real-world scenarios requires the industry to amass massive amounts of real physical data along four dimensions: actions, manipulated objects, scenes, and the machine body. Gao Jiyang noted that Xinghai Tu uses almost no simulated data at the pre-training stage, because real data is the most efficient way to solve the generalization problem. This exposes the key difference between embodied intelligence and large language models: 99% of LLM data is publicly available web text, while 99% of embodied intelligence data is private physical data scattered across industries. As a result, acquiring embodied intelligence data is asset-heavy, laborious work with no shortcuts.
At the conference, Xinghai Tu announced that it had jointly established the data company "Yishu Intelligent" with Beijing Yizhuang Holdings, Yizhuang Robotics, and Yizhuang Guotou, and launched the "1 Million Hours Ultra-High Quality Real Data Plan." Drawing on frontline operational data, Gao Jiyang publicly broke down the cost structure of this "hard work." In real physical scenarios, human-led data collection costs about 50 to 100 yuan per hour, while teleoperated collection, including equipment depreciation, costs as much as 250 yuan per hour. At market prices, gathering 1 million hours of data would cost 100 million to 200 million yuan outright. In practice, Xinghai Tu will take a dual-track approach, outsourcing targeted collection in industrial scenes and crowdsourcing collection alongside daily operations via wearable devices, with the aim of pushing data scale to tens of millions of hours within three years.

This is only the tip of the iceberg of the total cost of intelligence. The embodied intelligence field has a harsh "1:10 rule": for every 1 yuan spent on data collection, at least 10 yuan must be spent on training compute. Faced with such expenditures, Xinghai Tu's financial strategy is to embrace the Scaling Law characteristic of AI. Gao said that spending on data and compute does not grow linearly but expands exponentially in stages, as "1, 5, 20, 100." The traditional VC model of releasing funding slowly therefore breaks down here; companies must raise as much capital as possible during up cycles and pace their spending strictly in step with technological progress.

As physical-world data collection scales up rapidly, ecosystem coordination and security compliance become unavoidable topics.
To this end, Xinghai Tu, together with Cathay Capital, launched an early-stage embodied intelligence industry fund "Xingtu Plan" to support startup teams, and jointly established a data ecosystem alliance with Ant Group, Baidu AI Cloud, and 15 other companies, striving to open a complete loop from data collection to application.
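The data-cost figures quoted in section 02 can be laid out as a rough sketch. The article does not say how the 100 million to 200 million yuan total is derived from the hourly rates, so the code below simply evaluates each quoted rate on its own and then applies the "1:10 rule" to the quoted total. All variable and function names are illustrative, and reading "1, 5, 20, 100" as relative spending levels per stage is an assumption.

```python
# Figures quoted in the article (yuan); names and structure are illustrative.
PLAN_HOURS = 1_000_000                     # "1 Million Hours" data plan
HUMAN_LED_RATE = (50, 100)                 # yuan per hour, human-led collection
TELEOP_RATE = 250                          # yuan per hour, teleoperated, incl. depreciation
QUOTED_TOTAL = (100_000_000, 200_000_000)  # quoted market-price cost of 1M hours
COMPUTE_MULTIPLIER = 10                    # "1:10 rule": at least 10 yuan of compute per 1 yuan of data
SPEND_STAGES = [1, 5, 20, 100]             # assumed to be relative spending levels per stage


def collection_cost(hours: int, rate_per_hour: int) -> int:
    """Cost of collecting `hours` of data at a flat hourly rate."""
    return hours * rate_per_hour


def min_compute_cost(data_cost: int) -> int:
    """Lower bound on training compute spend implied by the 1:10 rule."""
    return data_cost * COMPUTE_MULTIPLIER


human_led = tuple(collection_cost(PLAN_HOURS, r) for r in HUMAN_LED_RATE)
teleop = collection_cost(PLAN_HOURS, TELEOP_RATE)
compute_floor = tuple(min_compute_cost(c) for c in QUOTED_TOTAL)
# Growth factor from each spending stage to the next.
stage_growth = [b / a for a, b in zip(SPEND_STAGES, SPEND_STAGES[1:])]

print(f"Human-led only:  {human_led[0]:,} to {human_led[1]:,} yuan")
print(f"Teleoperated:    {teleop:,} yuan")
print(f"Compute floor:   {compute_floor[0]:,} to {compute_floor[1]:,} yuan")
print(f"Stage-to-stage growth: {stage_growth}")
```

On these assumptions, the quoted total sits between the human-led and teleoperated figures, and the 1:10 rule puts the compute bill for the plan at no less than ten times the collection bill.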