Mobilizing the Masses, JD Aims to "Refine" Embodied Data
On March 16, JD.com announced that it would build the world’s largest and most comprehensive embodied intelligence data collection center. The news struck a heavy note in a robotics sector that had been quiet for some time, its spotlight stolen by lobsters. In a sense, this is a data mass-production campaign with strong industrial-internet overtones. The mobilization covers more than 100,000 internal employees and up to 500,000 external industry workers, and even mobilizes more than 100,000 residents in Suqian alone. This unprecedented human-wave tactic tries to use sheer scale, a kind of brute-force aesthetic, to break through embodied intelligence’s most fatal weakness: the data shortage.

With model architectures gradually converging and the compute threshold now relatively transparent, high-quality physical interaction data has become the only deciding factor in whether robots can truly enter thousands of industries. Billed as “the largest data collection action in human history,” the campaign rests on an industry consensus: as the “cerebellum” of embodied intelligence, which handles motor control, grows ever stronger, feeding higher-quality data to a “brain” that truly understands the physical world is becoming the core battle that will shape the industry’s future landscape.

Between JD’s grand narrative and the industry’s day-to-day realities, it is still hard to tell whether the data generated by these hundreds of thousands of people will be a gold mine or gravel.

Workers Involved

JD.com dares to launch this massive human-wave data campaign, and indeed must, because of its enormous and highly complex self-operated physical supply chain. Unlike pure software internet companies, JD is itself a huge site of physical-world interaction, and the maturity of embodied intelligence is directly tied to its fulfillment costs and operating efficiency over the next decade. This layout is deeply coupled with the robot industrial ecosystem in Beijing Yizhuang.
Currently, the Yizhuang Economic and Technological Development Zone hosts more than 300 robot-related enterprises, has an industry chain valued in the tens of billions, has opened more than 40 real application scenarios, and has become the core cluster for humanoid robots in China. JD, as the “chain master” rooted in Yizhuang, had already announced a robot industry acceleration program. Its major investment in the data collection center essentially fills the industry chain’s biggest gap. Yizhuang provides the “body” and the test site, while JD tries to give robots real-world common sense through a vast range of scenarios. This resonance between software and hardware aims to close the business loop from data flywheel to hardware iteration.

Coordinating hundreds of thousands of people is by no means easy. According to the plan, the collected scenarios cover logistics, industry, retail, and more. In practice, this is likely to rely on JD’s existing digital management network, for example by requiring frontline couriers and warehouse pickers to wear devices with visual and even haptic sensors during their daily work.

For frontline workers and the mobilized Suqian residents, the campaign is fraught with complications. Employees become, without anyone quite saying so, the data teachers of robots whose eventual goal is to replace high-intensity labor. How to design reasonable incentives and benefit-sharing mechanisms that head off employee resistance is an issue JD needs to consider.

However, implementation details have not yet reached the employee level. A JD employee in Beijing told Wall Street Insights that he had not heard of the plan so far. In his view, if there is a corresponding reward, it is a market transaction, and whether to participate is up to each employee. A JD employee in Suqian also told Wall Street Insights that he had not received any notice.

Although the official statement says that “all data collection will strictly abide by laws and regulations,” the reality is often more complicated.
In the courier scenario, warehouse workflows are standardized, but express delivery reaches into thousands of households, and retail scenes involve large amounts of consumers’ facial features and private habits. With data compliance rules growing ever stricter, the cost of desensitizing and cleaning unstructured data collected by hundreds of thousands of people may be astronomical.

Breaking Moravec's Paradox

In 1988, roboticist Hans Moravec concluded that “it is comparatively easy to make computers exhibit adult level performance on intelligence tests or playing checkers, and difficult or impossible to give them the skills of a one-year-old when it comes to perception and mobility.”

Today, Moravec’s paradox shows up in embodied intelligence chiefly as the industry’s data vacuum. The success of large models rests on devouring high-quality text data, on the scale of billions, that the internet accumulated over thirty years. But the physical world has no ready-made internet. For embodied intelligence to work in the real world, it must get past a huge data wall. JD’s campaign targets exactly this point and the difficulties behind data collection.

At this stage, the industry’s mainstream methods of acquiring data have diverged sharply, and each struggles with its own bottleneck. The first is the limits of simulation. Most startups rely heavily on simulation environments, such as MuJoCo or Nvidia’s Isaac Sim, which let robots run millions of reinforcement learning cycles in virtual worlds. This method is low-cost and fast, and it avoids hardware damage from trial and error. However, industry veterans are increasingly aware of the limits of “Sim-to-Real” transfer. The complexity of the physical world lies not only in changes of light and shadow, but in extremely subtle physical feedback: the flexing and deformation of cables, the pull of non-rigid clothing, tiny changes in friction as a screw is tightened, even the electromagnetic noise of sensors.
The computing power of current physics engines cannot perfectly simulate these high-dimensional, nonlinear microphysical effects. As a result, many models perform perfectly in simulation but seize up, as if suffering a “stroke,” or otherwise misbehave when deployed on real machines.

If simulation is full of gaps, the alternative is to return to the real world. From Stanford’s viral Mobile ALOHA to today’s leading companies such as Figure AI, Unitree, and Zhiyuan, all use teleoperation: humans wear motion capture suits or VR devices and control robots like avatars to execute tasks, recording first-person vision, joint angles, and torque data. This is currently recognized as the highest-quality data collection method.

But it runs into the second major problem of data collection, a commercial one: a highly uneconomical input-output ratio. Industry estimates put the hardware cost of a single full-size humanoid robot at hundreds of thousands to millions. Teleoperation data collection incurs not only heavy hardware depreciation but also high labor costs for specialist operators. Wall Street Insights found that collecting and cleaning a single high-quality data sample for a complex task can cost hundreds of dollars, and the failure rate is very high. This workshop-style, hand-built mode of data production cannot support the hundred-billion or trillion-level parameter scale that generalization in embodied intelligence requires.

To lower barriers, Google and other giants have launched open datasets such as Open X-Embodiment, pooling data from labs around the world for industry-wide use. Chinese companies have also published million-scale real-machine datasets. But here lies the third major dilemma of data collection, and a huge engineering challenge: the extreme fragmentation of robot hardware. Quadruped, wheeled, and bipedal humanoid robots, and even humanoids from different makers, differ in joint degrees of freedom, motor torque, sensor layout, and center of gravity.
A high-quality grasping dataset collected on a UR5 arm can hardly be transferred directly to a Tesla Optimus or one of JD’s logistics robots. It is precisely this difficulty of “cross-body mapping” that turns most open data into isolated islands, unable to produce effects at scale.

It is precisely under these three dilemmas that the competitive logic of the embodied intelligence business has fundamentally changed: whoever owns real application scenarios holds the moat for continuously obtaining cheap, high-quality closed-loop data. This explains why Tesla and JD chose a road very different from that of pure hardware startups. Tesla, leveraging its huge gigafactories, lets Optimus learn by trial and error on real battery sorting lines day and night. JD is trying to build a semi-automated data pipeline from its nationwide logistics network, hundreds of thousands of industry workers, and huge physical retail system. This approach turns a company’s supply chain moat directly into a data moat for the AI era.

In sharp contrast, many startups with no proprietary scenarios are forced to adapt. They either sell hardware at a loss to universities and research institutions in exchange for shared usage data, pay heavily to rent factory space, or hire emerging embodied intelligence data service vendors such as Jianzhi to produce customized data.

One could say that JD’s entry has torn away the algorithmic veil of embodied intelligence and dragged the field into a commercial battle built around capital, scenarios, and manpower scheduling. In the face of the data shortage, the algorithmic moat is growing shallower, while the giants that control the entry points to real physical-world interaction are quietly closing the net on the road to AGI.

Scarcer High-Quality Data

The industry’s response to JD’s plan to accumulate more than 10 million hours of real-scenario data within two years has not been uniform enthusiasm, but a more measured view.
Within embodied intelligence, data quality and modality matter far more than sheer duration. Algorithm practitioners point to the core pain point: what is lacking is not first-person video from a human perspective, but “state-action pairs” that contain precise physical feedback.

For example, Suqian residents wearing cameras while shopping, or couriers recording their deliveries, generate massive amounts of internet-grade, general-purpose visual data. This data is very valuable for training a robot’s world model, letting it understand what a door or an apple is. But for training a robot’s “control strategy,” so that it knows how much force will hold an apple without crushing it, purely visual data is almost useless.

A person in the robotics industry told Wall Street Insights that what robots lack is valuable data, especially real-machine data. In his view, JD’s operation is still essentially business process outsourcing (BPO), providing people and venues.

When humans grasp physical objects, they make extremely complex adjustments involving touch, force, and proprioceptive spatial coordinates, and this high-dimensional tacit knowledge cannot be captured by ordinary wearable devices. If JD’s hundreds of thousands of participants contribute only video, converting it later into executable robot actions would lose an astonishing amount of information.

An executive at another top domestic robot company said bluntly that the industry’s primary difficulty is the “lack of unified data set standards.” For example, every robot company’s joint degrees of freedom, sensor locations, and actuator types are different. How can the massive human movement data collected by JD be remapped onto robot bodies with different configurations? Without unified underlying standards, the 10 million hours of data may in the end be only proprietary nutrition for JD’s own robots, and not infrastructure that advances the whole industry.
This is perhaps why JD’s first-year plan places special emphasis on “1 million hours of robot body data collection.” The real direction for the industry is to use pre-training on general-purpose human video for world cognition, fine-tuning on high-quality robot body data for skill learning, and reinforcement learning exploration for iteration and evolution.

JD’s announcement of an embodied intelligence data collection center marks the point at which Chinese companies begin to use large-scale engineering practice to tackle the robot industry’s data shortage. Combining real-world scenarios with large-scale manpower does offer a new path for data accumulation. But sheer data scale is not enough to achieve true “intelligence emergence” in robots. How to ensure high-dimensional, high-quality data in mass collection, how to establish unified data standards, and how to properly handle privacy and compliance issues will be the key problems for companies and the entire industry on the way to commercialization.

Risk alert and disclaimer: The market has risks, and investment requires caution. This article does not constitute individual investment advice and does not take into account the particular investment goals, financial situation, or needs of individual users. Users should consider whether any views, opinions, or conclusions in this article suit their specific circumstances. Invest at your own risk.