AI video enters the "production line": an investigation


At the beginning of the year, the debut of Seedance 2.0 sparked hopes that AI video could take part in film and television industry workflows.

As short dramas, advertising, and e-commerce begin to build AI video into actual production processes, AI video models are shifting from chasing benchmark scores to doing real work. Creators now care less about model parameters and leaderboard rankings than about whether a model can produce video stably, support continuous shot generation, and ultimately be embedded in a reusable, collaborative, and deliverable workflow.

ByteDance’s Seedance 2.0 is drawing attention against this backdrop.

“Many models require highly precise prompts, but Seedance 2.0 can expand even short, abstract prompts into more professional and detailed descriptions, translating everyday expressions into shot language the model can execute, which lowers the barrier for users,” a short drama practitioner in Xi’an told Wallstreetcn · AllWeather Tech.

Meanwhile, Kuaishou’s Kelina and Alibaba’s HappyHorse are still iterating rapidly; iQIYI’s Natto and Qunhe Technology’s LuxReal are approaching the field through workflow, digital assets, 3D space, and collaboration tools; and specialized players such as Shengshu Technology, Aishi Technology, MiniMax, and SenseTime continue to jockey for position.

With model makers, platforms, and toolchain vendors all piling in, the AI video track has become both crowded and fast-moving.

Scores Lose Their Meaning

At the vendor level, the field of competitors is growing quickly.

Among the internet giants, ByteDance has Seedance (“Jumeng”), Kuaishou has Kelina, and Alibaba has HappyHorse.

Beyond these giants, the long-video platform iQIYI has launched “Natto,” a full-process AI creation platform for professional short drama production.

Specialized players are flooding in as well: Shengshu Technology’s Vidu, Aishi Technology’s PixVerse (Paiwo AI), MiniMax’s Hailuo, Qunhe Technology’s LuxReal, and SenseTime’s Seko are all staking out positions on this track.

The flip side of this excitement is that, as AI video moves from model demos to real production lines, the yardsticks for judging a model’s ability are changing.

In the past year, rankings for AI video models have multiplied, with model comparisons and sample videos emerging constantly. These lists have amplified industry buzz and made model differences more visible.

But once video generation enters real production such as short dramas, ads, and industrialized content, the question is no longer just whether a model can produce a pretty sample; it must reliably generate footage with good image quality, smooth motion, and consistent characters.

No automated score can measure these abilities comprehensively.

As a result, many vendors have begun to de-emphasize automated machine scoring of video quality in favor of human evaluation and real-world feedback. For downstream creators, a model’s real usefulness depends not on rankings but on whether it reduces rework, speeds up video production, and genuinely fits industrial workflows.

In a sense, this mirrors the way scores lost their meaning on the large-model Agent track.

When Agents first emerged, the industry measured them by leaderboard rankings. But as Agents moved from chat and demos into real workflows, people found that leaderboard scores do not map directly to practical usability.

The reason is that once Agents start doing real work, they must make and execute multi-step, long-chain decisions: understanding goals, breaking down tasks, calling tools, and constantly adjusting course.

Current evaluation systems struggle to fully test these long-horizon capabilities.

Seen this way, Seedance 2.0 is drawing attention precisely because it is being embedded into real production workflows.

From Usable to Production-Ready

In interviews with several downstream users, AllWeather Tech found that Seedance 2.0’s impact on their work is even more direct.

“Whether it’s understanding video content, grasping the logic of the physical world, or the naturalness of performances, Seedance 2.0 has improved greatly,” Liu Cheng, content lead at the AI short drama production company Kameng Intelligence (Beijing) Technology Co., Ltd., told AllWeather Tech.

Regarding content understanding, Liu believes Seedance 2.0 has made significant progress in handling abstract semantics.

“Although the final results are still somewhat unpredictable, it’s already quite good. For example, if the prompt is ‘let these two people have a romantically ambiguous interaction in the scene,’ the AI will work out what that means and generate suggestive lighting and color tones between the two, and the camera movement may slow down. In effect, it fills in these elements automatically based on the request,” Liu said.

He added that martial arts action and complex multi-person interaction scenes used to suffer from problems such as continuity errors, clipping (bodies passing through one another), and mismatched faces; with Seedance 2.0, these issues are largely solved.

“Some videos, you really can’t tell whether it’s AI or real people,” Liu said.

A short drama practitioner in Chongqing shared a similar view.

“Since Seedance 2.0 came out, character consistency, lip-sync, and voices really have gotten much better, and the ‘oil painting’ look of the images is reduced. Storyboard design is also smarter,” this practitioner told AllWeather Tech.

Insiders in Xi’an’s AI short drama sector told AllWeather Tech that with Seedance 2.0 and prompt optimization, they can now get a satisfactory 10-second clip in one or two attempts, three at most.

“A skilled team can finish a 50-episode live-action-style AI short drama in about two weeks,” one insider said.

Star Xi (a pseudonym), an entrepreneur and developer focused on AI short drama tools, believes ByteDance’s Jumeng, which integrates Seedance 2.0, offers a more complete user experience than its competitors.

According to Star Xi, Jumeng’s “all-purpose reference mode” for video generation interprets nine-panel (3x3 grid) storyboard images well: upload a single keyframe image containing nine shots, and it can infer the sequence from the annotations and generate video accordingly. However, the field iterates quickly, and other tools now offer this feature too.

At least in this round of AI video competition, Seedance 2.0 has already pushed model abilities from “usable” to “closer to production-grade,” increasing pressure on followers to catch up.

Main Pain Points

Although Seedance 2.0 is a leap forward, problems common across the AI video industry persist.

First, the longer the generated video, the harder it is to keep characters consistent; a face can change, especially when a character turns from a front view to a side view.

Currently, models like Seedance 2.0 limit generated clips to 5 to 15 seconds.

This means users generate video in segments and must later edit them into a complete work.

But segment-based generation introduces new issues: for every new shot, creators must re-input character reference images, costumes, scene, and props to the model to maintain visual coherence.

Academia is exploring solutions.

For example, Peking University master's student Yuan Shenghai's team published “Identity-Preserving Text-to-Video Generation by Frequency Decomposition,” aiming to solve “how to preserve character identity across frames, actions, and angles while generating video from text.”

Yuan’s ConsisID framework splits facial features into high-frequency and low-frequency signals, letting models learn each separately to reduce learning difficulty.

“Previously, everyone just fed the raw image into feature extractors. We think this makes learning harder,” Yuan explained. “Prior work shows that facial features can be decomposed: the high-frequency component corresponds to fine details (skin texture, the eyes), and the low-frequency component to global features (bone structure, the positions of the eyes and nose, and so on). Separating them makes it easier for the model to learn these features.”
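To make the high/low-frequency idea concrete, here is a minimal sketch of the general decomposition, not ConsisID’s actual method, whose filters and network details are not described here. It assumes a grayscale image stored as nested lists and uses a simple box blur as a stand-in low-pass filter: the blurred image plays the role of the low-frequency component (coarse layout) and the residual plays the role of the high-frequency component (fine detail). The function names `box_blur` and `frequency_split` are hypothetical.

```python
from typing import List, Tuple

# A grayscale image as rows of pixel intensities (illustrative representation).
Image = List[List[float]]


def box_blur(img: Image, radius: int = 1) -> Image:
    """Stand-in low-pass filter: each pixel becomes the mean of its
    (2*radius+1) x (2*radius+1) neighborhood; out-of-bounds neighbors are skipped."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        total += img[yy][xx]
                        count += 1
            out[y][x] = total / count
    return out


def frequency_split(img: Image, radius: int = 1) -> Tuple[Image, Image]:
    """Split an image into a low-frequency part (blurred, coarse layout)
    and a high-frequency part (residual, fine detail such as texture)."""
    low = box_blur(img, radius)
    high = [[img[y][x] - low[y][x] for x in range(len(img[0]))]
            for y in range(len(img))]
    return low, high


if __name__ == "__main__":
    # A 4x4 checkerboard: a pattern made of sharp pixel-to-pixel changes.
    checker = [[float((x + y) % 2) for x in range(4)] for y in range(4)]
    low, high = frequency_split(checker)
    # The two parts sum back to the original image.
    restored = [[low[y][x] + high[y][x] for x in range(4)] for y in range(4)]
    print(all(abs(restored[y][x] - checker[y][x]) < 1e-9
              for y in range(4) for x in range(4)))
```

Because the high-frequency part is defined as the residual, the two components always add back up to the original image. A real system could instead use a Gaussian or Fourier-domain filter and feed each component to its own feature extractor; the box blur is chosen here only because it is easy to follow.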

Second, the “layer separation” between characters and backgrounds.

Many viewers notice intuitively that AI-generated characters often seem to “float” over the background, as if they were on a separate layer.

In Star Xi’s analysis, much of the “AI flavor” in images comes from how lighting and layers are handled. Many creators moving into AI video lack training in film aesthetics and do not actively adjust the lighting, so their images lack depth.

“Some fail to coordinate light angles, shadows, focus, and depth of field, which makes the visuals feel flat or disjointed, so it looks like two layers forcibly stitched together,” Star Xi said. “Removing the ‘AI flavor’ depends largely on the creator’s grounding in cinematography; it comes down to aesthetics and the relationship between camera and subject.”

AI video researchers say this is fundamentally a multimodal reference fusion problem: character reference images and scene images each carry their own color tones and lighting, and the model fails to blend them.

Third, shot logic and emotional tension in long narratives.

Star Xi believes that even with in-house script generation and breakdown tools, stories still suffer from “flat narration” and “rigid, clichéd plots.”

“The models generalize poorly to specific genres and styles, and the stories have little rise and fall,” Star Xi said. “Villains are written into the big plot beats, but the small moments fail to strike an emotional chord; they lack minor conflicts and tight logic.”

Liu Cheng agrees: “Upgrades like Seedance 2.0 lower the threshold for content creation, but when AI content floods the market, quality will be uneven, and truly moving works still depend on strong storytelling.”

Differentiation Fills the Gaps

Against this backdrop, players outside the giants are building differentiated advantages in workflow and case libraries.

According to Liu Cheng, Kameng uses AI-assisted features during production. The team has built storyboard-prompt and sketch functions: once users edit the prompts, the AI completes 80% to 90% of the work, and users skilled at prompting can fine-tune further to work even more efficiently.

Qunhe Technology has optimized the workflow at the 3D level, launching a short drama edition of LuxReal on May 27.

Built on in-house large spatial models and 3D technology, LuxReal can turn 2D scene images into navigable virtual 3D spaces. Creators can freely adjust camera positions and character placement, and the system automatically renders the corresponding shots.

Actual output quality remains to be seen. Although LuxReal’s workflow is well adapted to short drama, it lacks proactive optimization, with issues such as costumes that do not match the period setting.

iQIYI’s Natto integrates both in-house models and external ones such as Seedance 2.0, and draws on iQIYI’s IP and digital asset libraries and creator communities to offer one-stop support from content production to operations.

Of these, the IP and digital asset libraries are iQIYI’s unique strengths. From the asset library, creators can call up scenes, weapons, animals, and other assets from various dramas, such as the palace from “Cheng He Ti Tong” or the fantasy world of “Hua Rong.”

However, AllWeather Tech observes that iQIYI’s asset library is still relatively limited on the Natto platform.

Overall, after incorporating Seedance 2.0, players outside the giants are building their differentiated advantages mainly through engineering, accumulated know-how, and workflow collaboration.

The Battle Continues

From long-video stability to character consistency and controllability, the AI video industry still has many pain points to solve, and the competitive landscape is far from settled.

In this context, turning to the capital markets is becoming a key way for vendors to accelerate.

In May this year, market reports said Kuaishou was accelerating a spin-off of Kelina for a standalone IPO next year, with a pre-IPO valuation expected to reach $20 billion.

Kuaishou later confirmed in an announcement to the Hong Kong Stock Exchange that its board is evaluating plans to restructure Kelina-related assets and business.

Specialized players are likewise speeding up fundraising and IPO preparations. Shengshu Technology closed two financing rounds totaling more than 2.6 billion yuan within two months and is reportedly planning a Hong Kong IPO in the first half of 2026; its main operating entity completed an equity restructuring at the end of March.

This flurry of capital moves means competition on this track will only intensify rather than settle down.

These moves reflect a reality: competition in the AI video track is not just a technical race but also a contest of funding, computing power, data, and the ability to apply the technology in real scenarios.

Meanwhile, AI video commercialization is still in early stages. Scenarios like short drama, ads, e-commerce, gaming, and film pre-viz are validating demand, but stable, scalable, high-margin revenue models will take time.

Support from the capital markets has therefore become a key chip that lets many vendors stay at the table.

The AI video race will not end with Seedance 2.0’s temporary lead. With more vendors raising fresh capital and iterating, the industry may well see another round of competition in model capability, production tools, and commercialization efficiency.

Risk Warning and Disclaimer

The market carries risks; invest with caution. This article does not constitute personal investment advice and does not take into account the particular investment objectives, financial situation, or needs of individual users. Users should consider whether any opinions, views, or conclusions in this article suit their particular circumstances. Any investment made on this basis is at the user’s own risk.