After attending the robotics summit, Barclays pours cold water: the "GPT moment" hasn't arrived yet, and true commercialization is still a long way off.

After visiting the Boston Robotics Summit & Expo, Barclays poured a bucket of water on humanoid robots that was not ice-cold but sobering enough: demo machines, prototypes, and single-task robots keep multiplying, and the industry has accepted the path of "AI entering the physical world." However, putting fully autonomous, general-purpose humanoid robots to work at scale in human environments is still some way off.

According to Windy Trading Desk, Barclays thematic investment analyst William Thompson wrote in a June 8 research report that humanoid robots are coming, but the real questions are when and at what scale. The more certain near-term direction is single-task or multi-task robots in controlled scenarios such as welding and logistics. General-purpose humanoid robots, the toughest challenge, remain stuck behind several hurdles: safety, hardware, perception, data, and computing power.

This also explains why many companies are still at the pilot stage. Robots must not only move but move reliably in complex environments; not only recognize objects but turn recognition into low-latency actions; not only train models but obtain enough real-world data. Meanwhile, many humanoid robot companies are starting to vertically integrate hardware manufacturing, making their own motors and actuators or leveraging the automotive supply chain to lower costs and secure deliveries.

The first to be deployed are not "general-purpose humanoids," but narrow-task robots

Short-term deployment is easier in controlled environments: warehouses, factories, welding, logistics. These scenarios have clear goals, relatively fixed paths, and controllable unexpected situations. Robots don’t need to understand the whole world like humans, just complete limited tasks.

The challenge for general-purpose humanoid robots is not the demonstration but the long-tail problems of real environments. Uneven floors, cluttered objects, people moving around, changes in lighting, and non-standard layouts can all cause robots to fail. Mistakes in factories and warehouses usually have less severe consequences than mistakes on public roads, which makes companies more willing to try "imperfect but supervisable" systems, but this does not mean safety and reliability can be skipped.

Autonomous driving is the recurring analogy. It took roughly a decade of safety reviews, regulatory friction, and rebuilding of public trust to get from early optimistic expectations to broader deployment. Humanoid robots may also go through a "humans in the loop" stage: humans supervise remotely and take over when necessary, letting the system accumulate data in real scenarios.

Safety is not an add-on, but the prerequisite for scaling

Traditional industrial robots are often kept in cages, executing programmed actions; humanoid robots are designed to enter human activity areas. This shift moves the problem from "can the machine complete the action" to "who bears the consequences when the machine makes mistakes."

Reliability directly affects business value. If robots frequently stop working, a factory loses not just equipment efficiency but also production line stability and employee trust. The report's framework notes that AI is expected to raise reliability from about 85% to over 95%, but for many industrial scenarios 95% may still not be enough. The closer a robot gets to real production, the lower the tolerance for errors.
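One rough way to see why 95% can fall short: when a job chains several steps, per-step reliability compounds. The sketch below is an illustration only. The step counts, the 99% comparison point, and the assumption that steps fail independently are hypothetical and do not come from the Barclays report.

```python
def chain_success(per_step_reliability: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    return per_step_reliability ** steps


# Hypothetical comparison: the two reliability levels cited above, plus 99%.
for reliability in (0.85, 0.95, 0.99):
    for steps in (1, 10, 50):
        rate = chain_success(reliability, steps)
        print(f"reliability={reliability:.2f} steps={steps:>2} -> {rate:.1%}")
```

Under these assumptions, a ten-step job at 95% per step finishes without intervention only about 60% of the time, and at 85% per step only about 20% of the time.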

Safety also includes cybersecurity. Humanoid robots are essentially connected, software-defined systems that integrate sensors, actuators, AI models, and continuous connectivity. If a system is accessed illegally, its models are tampered with, or its data is corrupted, the result is not just an IT mishap but potentially an operational risk in the physical world. Before adoption, companies will require security architecture, update mechanisms, and fail-safes.

Physical AI still doesn’t have its "GPT moment"

The explosion of large language models had signature moments like GPT-3, and earlier foundations like the Transformer architecture and self-attention mechanisms. The robotics field hasn't yet seen a similar breakthrough: a universal architecture that allows machines to reliably perceive, plan, and act in multi-environment, multi-task, long-tail scenarios.

What seems simple for humans is often hardest for machines. Perception, navigation, grasping, and balancing are instincts for humans but complex engineering for robots. This is Moravec's paradox: tasks humans find difficult, such as logical reasoning and playing chess, are handled well by algorithms, yet the motion and perception that human children manage easily are extremely hard to automate.

The industry is attempting several approaches. The first is fast and slow systems: low-latency controllers handle reflexive actions, while high-level models handle planning and long-term reasoning. The second is reinforcement learning, which lets robots improve control strategies through trial and error. The third is VLA (vision-language-action) models, which translate visual observations and language commands into action outputs, enabling robots to understand and execute commands like "pick up the red cup."
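The fast and slow split can be pictured as two loops running at different rates. The following is a minimal toy sketch, not any vendor's architecture: `SlowPlanner`, `fast_controller`, the halfway-to-goal planning rule, and the gain value are all hypothetical stand-ins chosen to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class SlowPlanner:
    """Stand-in for a high-level model that occasionally picks a new target."""
    goal: float

    def plan(self, position: float) -> float:
        # Hypothetical planning rule: aim halfway between here and the goal.
        return position + 0.5 * (self.goal - position)


def fast_controller(position: float, target: float, gain: float = 0.2) -> float:
    """Stand-in for a low-latency reflex loop: one proportional step to target."""
    return position + gain * (target - position)


def run(ticks: int, plan_every: int, goal: float = 1.0) -> float:
    """Simulate a 1-D robot: plan rarely, correct on every tick."""
    planner = SlowPlanner(goal)
    position, target = 0.0, 0.0
    for tick in range(ticks):
        if tick % plan_every == 0:  # slow loop: runs once per plan_every ticks
            target = planner.plan(position)
        position = fast_controller(position, target)  # fast loop: every tick
    return position


print(f"position after 200 ticks: {run(ticks=200, plan_every=10):.4f}")
```

The design point is only that the expensive planner runs once per `plan_every` ticks, while the cheap controller runs on every tick and keeps the robot tracking the most recent target in between.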

The long-term goal is a robotic world model: a system that can transfer across tasks, environments, and even different robot bodies. The issue is, the physical world is much messier than the textual world. Models not only need to understand, but also act with low latency, low power, and controllable risk.

The biggest data gap is the lack of "robot-perspective" data

Text and image models consume internet data. Robots have no comparable resource bank. YouTube hosts vast amounts of human activity video, but it lacks critical kinematic information such as joint movements, actuator commands, and sensor feedback, so it cannot directly teach robots how to interact with the physical world.

Autonomous driving has a unique advantage: millions of cars can collect data on public roads. General-purpose humanoid robots can’t do this now. Collecting real robot data is slow, expensive, and risky; even with remote operation, each machine can only run limited hours per day, and a single serious fall or collision may cause hardware damage and downtime.

Simulation and digital twins thus become important. Developers can let thousands of virtual robots practice in parallel, generating data across different terrains, lighting conditions, and tasks. The value follows an "80/20" logic: use simulation to cover many scenarios quickly, then reserve limited real-world testing for the hardest parts.

But a gap remains between simulation and reality. Actions learned in virtual environments still need calibration and fine-tuning in the real world. Tesla's Optimus path is one example: it applies experience from autonomous driving simulation to training humanoid robots. Musk has also described an "Optimus Academy" concept, in which tens of thousands of physical robots train in controlled facilities alongside millions of simulated robots.

Computing power competition will move from data centers to every robot

Physical AI’s computing power requirements fall into three layers.

First is simulation computing power. Training humanoid robots requires large-scale physical simulation and digital twins, especially running numerous virtual robots in parallel to generate synthetic data and run reinforcement learning. This consumes AI data center resources.

Second is foundation model training. VLA models must integrate vision, language, and sensor input and then output action plans, with parameter sizes of 1-2 billion, long training cycles, and high GPU consumption. The faster humanoid robots develop, the greater the pressure to compete with other AI workloads for computing resources.

Third is edge computing power on the robot itself. After deployment, robots can’t send every decision to the cloud. Keeping balance, avoiding obstacles, and grasping all need responses within tens of milliseconds. Large models must be compressed, distilled, or redesigned to run on battery-powered hardware. NVIDIA’s open VLA model GR00T N1.6, with about 3 billion parameters, represents the direction of "miniaturization and deployability."
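A back-of-the-envelope calculation shows why compression matters on battery-powered hardware. The sketch below estimates the memory needed just to hold the weights of a model with about 3 billion parameters at several numeric precisions. The precision options are illustrative assumptions, not a statement about how GR00T N1.6 is actually deployed, and the estimate ignores activations and other runtime memory.

```python
def weight_gigabytes(parameters: int, bits_per_weight: int) -> float:
    """Memory to hold the weights alone, in GB (1 GB = 1e9 bytes)."""
    return parameters * bits_per_weight / 8 / 1e9


PARAMETERS = 3_000_000_000  # "about 3 billion", the figure cited above

# Hypothetical precision choices, from full precision down to 4-bit weights.
for label, bits in (("fp32", 32), ("fp16", 16), ("int8", 8), ("int4", 4)):
    print(f"{label}: {weight_gigabytes(PARAMETERS, bits):.1f} GB")
```

Halving the bits per weight halves the footprint, which is one reason quantization and distillation come up whenever on-board inference is discussed.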

This drives two types of demand: cloud GPUs for training and simulation, and low-power edge hardware for on-board inference. The perception stack for a single humanoid robot can cost about $20,000, which shows that computing is not a marginal cost issue for software companies but part of each robot’s bill of materials (BOM).

Hardware is still the slowest leg

Software can iterate quickly, but hardware can't. Motors, actuators, sensors, hand structures, and battery systems all go through design, supply, manufacturing, assembly, and feedback cycles. Without sufficiently safe and reliable products, it is hard to build production capacity at scale; without manufacturing at scale, it is hard to lower costs and gather more real-world feedback. This is the classic chicken-and-egg problem.

The industry still lacks mature universal components. The summit featured many 3D-printed parts, which suit prototype verification but not low-cost mass production. Target costs are repeatedly anchored at about $20,000 per unit, borrowing ideas from the automotive industry: standardization, modularization, lower parts counts, and fast on-site module replacement.

The hand is particularly difficult. Leading designs aim for about 22 degrees of freedom per hand, yet even a relatively limited dexterous hand for a humanoid robot still costs about $2,000. Actuators are another major component: a humanoid robot typically needs 30 to 60 of them. Suppliers compete not just on selling motors but on integrating firmware, sensors, and safety features to improve torque control, fault detection, and reliability.

Sensors are another bottleneck for scaling. Robots need multimodal sensing: vision, force, torque, touch, and balance. High-performance tactile sensors, joint torque sensing, and body self-perception all increase cost and integration risk. Many current sensor stacks are considered too fragile, too expensive, or too hard to mass-produce.

Batteries are another practical concern. If robots lack the power for continuous work, companies must keep backup robots on hand, further driving up costs. Hot-swappable batteries are one mitigation: Boston Dynamics Atlas, Mentee Robotics’ forthcoming Mobileye humanoid robot, Unitree G1/H1, and the AgiBot Expedition series all use or support on-demand battery swapping to reduce downtime.
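The backup-robot arithmetic can be made concrete with a simple duty-cycle estimate. In the sketch below, the run time, charging time, and swap time are hypothetical numbers picked for illustration, not figures from the report or from any of the robots named above.

```python
import math


def robots_needed(stations: int, run_hours: float, downtime_hours: float) -> int:
    """Fleet size needed to keep `stations` work positions continuously staffed.

    Each robot works for `run_hours`, then is unavailable for `downtime_hours`
    (charging in place, or swapping a battery).
    """
    return math.ceil(stations * (run_hours + downtime_hours) / run_hours)


# Hypothetical case: 10 stations, 2 hours of work per battery charge.
print(robots_needed(10, run_hours=2.0, downtime_hours=2.0))   # charge in place
print(robots_needed(10, run_hours=2.0, downtime_hours=0.05))  # ~3-minute swap
```

With these assumed numbers, charging in place doubles the fleet to 20 robots, while a quick battery swap needs only one spare, which is the cost lever behind hot-swappable designs.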

Vertical integration is not a posture, but a choice under supply chain pressure

Many humanoid robot companies are making their own key components, not just to tell a story, but because the current supply chain isn’t ready yet.

1X has been refining its own tendon-driven motors since 2015, handling everything from copper winding to final actuator assembly in its California factory, and has produced about 17,000 motors. Apptronik developed its own high-torque actuators for Apollo while partnering with Jabil for pilot and strategic manufacturing, with Jabil producing Apollo and deploying it in some of its own manufacturing operations.

Boston Dynamics plans to leverage standard parts from Hyundai's automotive supply chain to improve Atlas’ reliability and manufacturability. Tesla’s approach is closer to automotive reuse: applying EV-grade motors, power electronics, and its self-developed FSD computing platform to Optimus, with the long-term goal of reaching automotive-style production volume and cost, annual output in the tens of thousands, and unit costs dropping to about $20,000 over time.

This path is not easy. Automotive supply chains offer manufacturing scale experience, but humanoid robots are not cars. They require denser joints, more complex touch sensing, higher real-time control demands, and must work alongside humans. Manufacturing capability is just an entry ticket, not the decisive factor.
