How Can AI Video Move Past the "Gacha" Stage?

```

Large language model developers are widely anxious that their business models are “hitting the wall,” while AI video models are already generating cash flow.

In the second quarter of 2025, Kuaishou’s “Keling” generated revenue of over 250 million yuan, while MiniMax’s “Conch” earned $17 million in the first three quarters of 2025.

Recently, LuxReal, the first AI video generation application from QuCore Technology (which is pursuing a Hong Kong IPO), entered beta testing in an attempt to find a differentiated path in this lucrative sector.

In terms of commercialization, LuxReal targets professional users in the overseas e-commerce and short drama markets, who have a stronger willingness to pay.

Technically, by drawing on QuCore’s 3D structured scene data, LuxReal proposes a new path that “refuses to guess pixels”: a double safeguard of 3D modeling plus video algorithms that reduces randomness and strengthens spatial consistency.

Although the industry is moving toward commercialization, “uncontrollability” keeps most products stuck at the “gacha game” stage, where users keep re-rolling for a usable result, and unable to meet the strict standards for physical logic and detail coherence that B-end (business) delivery requires.

“Generative models are fundamentally unsuitable for making videos. The AI models everyone sees that can generate videos don’t actually understand the physical world; they are just generating pretty pictures,” said Turing Award winner Yann LeCun.

As more players enter the AI video generation field, the industry may explore more new technical paths.

Moving Away from “Guessing Pixels”

LuxReal’s comparative advantage comes from QuCore Technology’s massive and physically accurate indoor space datasets built over many years.

According to QuCore’s live demonstration, the demo videos generated by LuxReal show no facial distortion while the character dances, and they maintain a degree of consistency between camera shots.

QuCore Technology currently holds data assets of 500 million 3D structured scenes and 440 million product models, which serve as one of the guarantees of “spatial consistency.”

The core technical logic of most mainstream AI video generation models is to combine diffusion models with Transformers to enhance consistency.

Take OpenAI’s video generation app Sora as an example. Its technical route is a deep integration of diffusion models and Transformers. The diffusion model “generates high-quality video from random noise through gradual denoising rather than directly predicting the next frame’s pixels,” while the Transformer’s self-attention mechanism enables global modeling of spatio-temporal dimensions, solving the “memory decay” issue of traditional frame-by-frame generation.

However, achieving spatial consistency requires that object positions, proportions, shapes, and textures in videos remain physically accurate across camera movement, perspective switching, and scene changes. This poses a universal challenge for almost all current AI video generation apps.

Fei-Fei Li believes that human cognition relies heavily on spatial reasoning, yet existing AI systems, even powerful multimodal models, remain very weak at spatial understanding, such as judging the size and location of objects and the distances between them.

Overall, due to limitations in training data, computing power, and algorithms, AI video models struggle to understand the motion rules of the physical world. More often, they “guess” to fill in the next frame, easily resulting in spatial consistency problems.

LuxReal’s solution is to build a real 3D model of the subject before the AI generates the video, in an attempt to improve object consistency within the video itself.

For example, in the demo video mentioned above, the character was first modeled in 3D, which allowed it to move consistently across scenes.

“So we’re essentially controlling facial expressions at the 3D level first, and then controlling them at the video algorithm level second. With double insurance, the final video can maintain consistency in actions,” Long Tianze, QuCore’s product manager, told Xinfeng.

But without the 3D modeling step, LuxReal’s spatial consistency would be significantly compromised.

Xinfeng tried this in LuxReal’s beta test, uploading an image of a box of Lego-built sunflowers with the prompt 'Lego sunflower model in a cardboard box, showcasing vibrant flowers and green stems, under soft lighting with a warm atmosphere.' The resulting video showed problems such as floating Lego blocks and the box being swapped for a different one.

A LuxReal developer told Xinfeng that the product still needs continued optimization.

Hot and Cold Realities

The AI video generation field was once out of favor with industry giants.

Baidu founder Robin Li said in 2024: “Sora-style video generation cycles are too long. Even after 10 or 20 years, it may not bring any business returns. No matter how popular it gets, Baidu won’t do it.”

However, newer entrants have shattered the giants’ pessimistic expectations with solid revenue figures.

In the second quarter of 2025, Kuaishou’s AI video generation app “Keling” brought in more than 250 million RMB in revenue.

On the strength of this unexpectedly strong commercialization, Kuaishou not only raised its full-year income projections on its Q3 2025 earnings call but also increased its investment in computing power.

This also lifted Kuaishou’s share price, which has gained more than 20% cumulatively over the last six months.

The newly listed MiniMax has also made a splash in the video generation sector. After it launched its AI video app “Conch” in August 2024, the app quickly became a pillar business, earning $17 million (about 120 million yuan) in the first three quarters of 2025 and accounting for 32.6% of total revenue.

During the same period, Conch reached 310,000 paid users, with average revenue per paying user as high as $56, suggesting that users are willing to pay for AI video.

On January 9, 2026, its first day of trading, MiniMax closed at HKD 345 per share, up 109% from the issue price, with a market cap exceeding HKD 100 billion.

Behind the explosive revenue growth, however, extremely low user retention has become the sword of Damocles hanging over every player’s head.

As the novelty of “making cats dance” fades, the vast majority of AI video generation apps are finding that new users are easy to acquire but hard to retain.

Taking Conch as an example, in October 2025, its 1-day, 7-day, 30-day, and 60-day retention rates among Apple users in Singapore were 22.57%, 4.62%, 0.8%, and 0.66%, respectively.

This means that of every 100 new users acquired by Conch, fewer than 1 will remain after 60 days.

Faced with weak retention in the C-end (consumer) market, QuCore has turned toward the overseas B-end market.

“Currently, we’re targeting overseas markets, especially e-commerce and short drama users, who have higher requirements for video spatial consistency,” a QuCore insider told Xinfeng.

For B-end users such as e-commerce sellers and short drama producers, video is directly tied to conversion rates, so they do have a higher willingness to pay.

Yet this group also demands higher delivery quality. Whether LuxReal can offer certainty as a tool-type product in the uncertain field of AI video will have to be tested over time in real commercial settings.

Risk Warning and Disclaimer

The market carries risks, and investment requires caution. This article does not constitute personal investment advice, nor does it consider the special investment goals, financial conditions, or needs of individual users. Users should consider whether any opinions, views, or conclusions in this article suit their circumstances. Investment is at your own risk.

```
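
The retention and revenue-share figures quoted above for Conch can be sanity-checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope check: the input numbers are the ones reported in the article, while the function name `retained_users` and the variable names are hypothetical, introduced here for illustration. The implied total revenue is derived from the quoted 32.6% share and is an estimate, not a reported figure.

```python
# Retention rates quoted in the article for Conch
# (Apple users in Singapore, October 2025), in percent.
retention_pct = {1: 22.57, 7: 4.62, 30: 0.8, 60: 0.66}


def retained_users(cohort_size: int, pct: float) -> float:
    """Expected number of users still active, given a retention percentage."""
    return cohort_size * pct / 100


# How many of every 100 new users remain at each checkpoint.
for day, pct in retention_pct.items():
    print(f"day {day:>2}: {retained_users(100, pct):.2f} of 100 users remain")

# Conch revenue for the first three quarters of 2025 (USD millions)
# and its quoted share of MiniMax's total revenue.
video_revenue_musd = 17.0
video_share = 0.326

# Assumption: total revenue = video revenue / share. This is an inferred
# estimate, not a number stated in the article.
implied_total_musd = video_revenue_musd / video_share
print(f"implied total revenue: about ${implied_total_musd:.1f} million")
```

At the 60-day mark the result is 0.66 users per 100, which matches the article's statement that fewer than one in a hundred new users remains.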