The $7 Trillion Question: Which Path Will Win China's Robot Race?

UBTECH Walker Robot on display

AsianFin — The air inside this year's World Robot Conference was thick with excitement and a palpable sense of momentum. Despite sweltering temperatures, the sheer number of attendees, many of them parents with children in tow, pointed to a surging national interest in robotics, particularly in the burgeoning field of humanoid robots and embodied intelligence.

This enthusiasm is backed by hard data. The robotics sector in China is expanding at a rapid clip. According to data from Qichacha, the number of robotics-related companies in China had reached 958,000 as of August 12, 2025, nearing the one-million mark. In the first seven months of this year alone, 152,800 new companies were registered, a year-over-year increase of 43.81% that far outpaces the 4.59% growth rate seen in all of 2024.
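For readers who want to sanity-check the registration figures, the short sketch below backs out the implied number of new registrations in the same period of 2024 from the two numbers quoted above. The function name is illustrative, and the result is derived arithmetic rather than a figure reported by Qichacha.

```python
def implied_prior_count(current_count, yoy_growth_pct):
    """Back out last year's figure from this year's figure and the YoY growth rate."""
    return current_count / (1 + yoy_growth_pct / 100)


new_2025 = 152_800   # new registrations, Jan-Jul 2025 (from the article)
growth_pct = 43.81   # reported year-over-year increase, in percent

implied_2024 = implied_prior_count(new_2025, growth_pct)
print(f"Implied Jan-Jul 2024 registrations: about {implied_2024:,.0f}")
```

On these inputs the implied prior-year figure is roughly 106,000 new companies.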

Investment is also pouring in. From January to July, more than 200 deals closed in the embodied intelligence and robotics sectors, with total financing exceeding 24 billion yuan, already surpassing the total for all of last year. Citibank projects that the global humanoid robot market will reach $7 trillion by 2050, with nearly 650 million humanoid robots in operation worldwide and China accounting for over 50% of the market.

However, beneath the surface of this booming market, the industry is grappling with fundamental, "non-consensus" debates that are shaping its future.

The Great Debate: RL+VLA vs. World Models

At the heart of the discussion is the technical roadmap for embodied intelligence. The central question is whether the future lies in Reinforcement Learning (RL) + Vision-Language-Action (VLA) models or in World Models.

A VLA model is an end-to-end AI framework that integrates vision, language, and action to enable a robot to perceive its environment, understand instructions, and execute tasks. While some progress has been made, founders like Wang Xingxing of Unitree Robotics argue that VLA models currently lack the necessary generalizability for widespread commercial use. He pointed out that while a VLA-powered robot can be trained to dance, teaching it a new dance requires starting the training from scratch.

Wang believes that a pure VLA approach, even with RL, isn't enough, and the "scaling law" for reinforcement learning has yet to emerge. He suggests that world models, which allow a robot to simulate and predict the outcomes of its actions internally, may be a faster path to convergence. His conviction was strengthened by recent work from Google DeepMind on Genie 3, a video generation/world model, which showed strong physical alignment and direct application to robotics.
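To make the world-model idea concrete, here is a minimal toy sketch of planning by "imagining" outcomes before acting. Everything in it is an assumption for illustration: the one-dimensional point mass, the hand-written `world_model` function standing in for a learned model, and the random-shooting planner. It is not how Unitree, Google DeepMind, or any company quoted here builds its systems.

```python
import random


def world_model(state, action):
    """Hypothetical stand-in for a learned model: predict the next (position, velocity)."""
    position, velocity = state
    velocity = velocity + 0.1 * action
    position = position + 0.1 * velocity
    return (position, velocity)


def imagined_cost(state, actions, goal):
    """Roll an action sequence forward inside the model only; no real robot moves."""
    for action in actions:
        state = world_model(state, action)
    position, velocity = state
    # Penalize distance to the goal and any leftover speed at the end.
    return abs(goal - position) + 0.1 * abs(velocity)


def plan(state, goal, horizon=10, candidates=200, seed=0):
    """Random-shooting planner: sample sequences, keep the lowest imagined cost."""
    rng = random.Random(seed)
    sequences = [
        [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        for _ in range(candidates)
    ]
    return min(sequences, key=lambda seq: imagined_cost(state, seq, goal))


start, goal = (0.0, 0.0), 1.0
best = plan(start, goal)
print("first action to execute:", round(best[0], 3))
print("imagined cost of best plan:", round(imagined_cost(start, best, goal), 3))
```

In a control loop, the robot would typically execute only the first action of the chosen sequence, observe the new state, and plan again. The appeal Wang points to is that the trial and error happens inside the model rather than on hardware.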

However, other industry leaders offer a different perspective. Chen Jianyu, founder of Robotera, sees world models not as an alternative, but as a component within a broader VLA framework. He stated that any end-to-end model that can interact with humans through language and operate in the physical world is, by definition, a VLA model. He believes the future will involve a more generalized VLA that integrates technologies like world models and reinforcement learning to enhance fine-grained control and cognitive abilities.

Lu Cewu, co-founder of Noematrix, echoed this view, emphasizing that a successful embodied intelligence company should be adept at all approaches and skillfully integrate them. For him, the ultimate goal is to achieve generalization by reducing the immense uncertainty of the physical world. The exact terminology—whether it's still called VLA or something else—is secondary.

This debate highlights the industry's immaturity. With no unified model architecture, progress remains fractured. As Jiang Lei, Chief Scientist at the National and Local Joint Innovation Center for Humanoid Robots, noted after conversations with major companies like Alibaba and Huawei, a primary challenge remains the lack of a truly "good body" for these complex systems.

Data vs. Models: A Question of Focus

Another key point of contention is whether the industry should prioritize collecting more data or developing better models.

Wang Xingxing argued that too much attention is paid to data, while the models themselves are often neglected. He stated that in many cases, companies have the data but lack the models to effectively use it, making data collection an inefficient endeavor. He believes that better, more unified model architectures are the most pressing need right now.

Wang Qian, founder of X Square Robot, and Chen both agreed that while vast amounts of data are necessary to achieve a "ChatGPT moment" for robots, a greater focus should be placed on data utilization efficiency and the models themselves. Chen noted that with better models, robots in some industrial scenarios are already reaching about 70% of human efficiency and could hit 90% by next year.

The Reality Gap: Real-World vs. Synthetic Data

The third major debate centers on the source of training data: should it come from the real world or from simulations?

The majority of companies still prefer real-world data, but a notable minority, including Galbot and DexForce, are championing synthetic data. Wang He, founder of Galbot, argued that synthetic data is the key to accelerating embodied intelligence. He revealed that 99% of his company's training data is synthetic, generated by feeding vast numbers of object and material assets into a simulation engine to create datasets for grasping and manipulation tasks. He sees real-world data as a way to "complete the last mile" of training.

However, Zhao Xing, Chief Scientist at Galaxea AI, firmly believes that real-world data is the most crucial factor. He used an analogy of a race car: "I don’t want our vehicles to be like race cars endlessly circling a track. Instead, I hope our vehicles will venture out onto real roads, public roads... Likewise, we want our robots to enter real homes." His company recently launched the Galaxea AI Open World Dataset, which includes 500 hours of real-world robot data from home environments.

Lu offered a middle-ground perspective, suggesting that the ratio of synthetic to real-world data should be determined by an effective mechanism, not by humans. He noted that for simple actions like grasping, simulation works well, but for complex, continuous tasks like wiping a table, real-world data is essential.
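The synthetic-versus-real split discussed in this section can be pictured as a simple batch-mixing step. The sketch below assumes a fixed 99% synthetic share, echoing the figure Wang He cited. The `mix_batch` helper, the pool sizes, and the fixed ratio are all hypothetical, and Lu's point is precisely that a real system would set this ratio by some automatic mechanism rather than by hand.

```python
import random


def mix_batch(synthetic, real, batch_size, synthetic_fraction=0.99, seed=0):
    """Draw one training batch with a fixed share of synthetic samples."""
    rng = random.Random(seed)
    n_synthetic = round(batch_size * synthetic_fraction)
    n_real = batch_size - n_synthetic
    # Sample with replacement from each pool, then shuffle so sources interleave.
    batch = rng.choices(synthetic, k=n_synthetic) + rng.choices(real, k=n_real)
    rng.shuffle(batch)
    return batch


# Placeholder pools: each sample is just a (source, index) tag.
synthetic_pool = [("synthetic", i) for i in range(5000)]
real_pool = [("real", i) for i in range(50)]

batch = mix_batch(synthetic_pool, real_pool, batch_size=1000)
counts = {
    source: sum(1 for s, _ in batch if s == source)
    for source in ("synthetic", "real")
}
print(counts)
```

Making `synthetic_fraction` a parameter keeps the ratio adjustable per task, for example a lower value for continuous tasks such as wiping a table, where Lu argues real-world data matters most.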

A Consensus on Non-Consensus

Beyond these technical arguments, the industry is also debating a more fundamental question: what is the purpose of humanoid robots? Should they be for entertainment, like dancing and playing soccer, or should they be built to perform productive work in factories and homes? While some companies are showcasing robots with advanced athletic abilities, there is a growing consensus that, to create real value, humanoid robots must transition from performance to practical application.

This lack of convergence on technology, data, and purpose is, in itself, the industry's current consensus. The Chinese humanoid robot market is still in its early stages, with technological paths yet to be fully defined. As Wang Xingxing put it, the industry is on the cusp of its "ChatGPT moment," which could arrive in the next 1-2 years.

He and others predict that over the next decade, the humanoid robot market will grow into a $100 billion-plus industry, eventually surpassing the scale of the automotive and smartphone markets. However, this growth will likely be accompanied by consolidation. Some analysts suggest that as many as 80% of current humanoid robot companies may not survive the upcoming "elimination round" as the market enters a phase of large-scale production.

Source: TMTPost (钛媒体)
