A Quick Guide to the New Model of Token Economics

```

AI applications are increasingly monetized not only by selling software and memberships but also by selling Token-based API access. A “Token” is the smallest unit of information a large model processes, and it is also the basis for model API billing, settlement, and consumption. As usage volumes grow, Tokens themselves are starting to be purchased, routed, split, and resold like a type of "inventory."

Chen Liangdong, an analyst at Huayuan Securities, recently summarized the core change in a media industry special report as: “Token operations are forming a new intermediate market layer, that is, exploring a Token distribution model that connects upstream large model vendors with downstream developers, enterprises, and individuals. Essentially, it is the liquidity infrastructure of the global Token wholesale-to-retail network.”

The background for this business is not complicated. On one hand, Token call volume in China is expanding rapidly: the daily average rose from 100 billion at the beginning of 2024 to 100 trillion by the end of 2025 and surpassed 140 trillion by March 2026. On the other hand, the capabilities of domestic large models have improved, and in some rankings and usage statistics they already sit in the global top tier. As demand and the number of models grow, the real bottlenecks now appear at the points of transaction: payment, networks, interfaces, compliance, channels, and scenario implementation.

But Token distribution cannot be simply understood as "reselling API quotas." The thinnest layer of profit comes from resale price differences, but the thicker part comes from inference acceleration, unified interfaces, enterprise-end prompt engineering, agent orchestration, model selection, and business system integration. Because the entry barrier is not high, the risks in this market are also clear: intensified competition, advance payments and bad debts, and policy changes by upstream model vendors can all squeeze intermediary profits.

Tokens Now Have “Wholesalers” and “Retailers”

The basic chain of Token distribution includes three kinds of players.

Upstream are the model providers, including ByteDance’s Seedance series, Alibaba’s Qwen series, Zhipu’s GLM series, Moonshot’s Kimi series, DeepSeek, etc.—they are the primary sources of Tokens.

In the middle are agency platforms, which take upstream model resources and redistribute them to end users. Their work is not only to resell quotas, but also to convert interface protocols of different models into a unified API format so that downstream users can call multiple models with just one API Key.

Downstream are the actual Token consumers, including individual users, developers, enterprise clients, or even lower-level distributors.

The middle layer adds value in several ways: direct domestic access lowers network barriers; one set of code works across multiple models; both individual and enterprise payments are supported; bulk procurement may lower costs; and aggregating models such as GPT, Claude, DeepSeek, and Kimi on a single platform reduces developers’ repetitive integration work.

Therefore, Token distribution appears asset-light—there's no need to train large models yourself or have massive server clusters. The core assets become the API scheduling system, upstream model resources, channel clients, and service capabilities.

Soaring Token Usage Is the Direct Fuel of This Business

For the Token operation model to work, you first need a large enough consumption volume.

China’s daily average Token call volume jumped from 100 billion to over 140 trillion in a little more than two years, an increase of more than 1,000-fold. The growth comes from the launch of various vertical agents and from enterprises embedding generative AI into more of their business flows.

IDC data presents an even more aggressive trajectory: the number of active AI agents in Chinese companies is projected to surpass 350 million by 2031, with a CAGR exceeding 135%. As agent task density and complexity grow, Token consumption by agents may increase more than 30 times annually.

This change is already visible with execution-type agents. For example, OpenClaw’s weekly Token consumption on the OpenRouter platform rose from 0.81T to 4.97T between February 2 and March 16, 2026, with its share rising from 8.31% to 24.36%.

Once Tokens become large-scale consumables, their procurement, pricing, routing, and settlement naturally become layered. Model providers may not serve every client directly, and end clients might not want to access each model individually—thus, there is space for intermediaries.

The Cost-Effectiveness of Domestic Models Opens the Door for Tokens Going Global

The improvement in capabilities of Chinese large models is the key variable for Token distribution to move from domestic to international markets.

SuperCLUE data shows that domestic models such as ByteDance's Doubao and DeepSeek have achieved comprehensive scores exceeding 70, narrowing the gap with leading overseas models like GPT-5.4 and Gemini; models such as Tongyi Qianwen, Kimi, and Zhipu GLM have also formed distinct tiers.

According to OpenRouter data, as of the week of May 10, 2026, Tencent’s Hy3 preview (free) ranked first in call volume; among the top 5, top 10, and top 20, domestic large models accounted for 2, 6, and 9 models, respectively.

More notably, in the first quarter of 2026, during the week of February 9 to 15, Chinese models' Token call volume on OpenRouter reached 4.12 trillion, surpassing for the first time the 2.94 trillion recorded by American models in the same period. From February 16 to 22, call volume by Chinese models rose further to 5.16 trillion Tokens. Among the top five models by platform call volume, four were from Chinese vendors (MiniMax M2.5, Kimi K2.5, Zhipu GLM-5, and DeepSeek V3.2), and together they contributed 85.7% of the Top 5 total call volume.

There is also an obvious price advantage. The input price for MiniMax M2.5 and GLM-5 is $0.3 per million Tokens, while Claude Opus 4.6 charges $5. For output, the prices per million Tokens are $1.1 for MiniMax M2.5, $2.55 for GLM-5, and $25 for Claude Opus 4.6. The cost-effectiveness advantage of Chinese models in Token-heavy scenarios such as AI agents and code development will continue to widen.

Global AI Resource Imbalance: Routing Platforms Become “Transfer Stations”

Token distribution is not only about solving price issues but also about resource mismatches.

Leading overseas large models are subject to regional access limits, compliance rules, and payment barriers, making them inaccessible to some users including developers in mainland China. For premium domestic large models going abroad, there are also challenges of localization, channel building, and user acquisition.

This imbalance drives demand for cross-border circulation, aggregated routing, and layered distribution.

OpenRouter has already become a typical example. Its weekly Token processing volume rose from 5 to 7 trillion in 2025 to over 20 trillion in April 2026. Its annualized revenue in 2026 exceeded $50 million, about five times the figure of just over $10 million disclosed in October 2025.

There are similar platforms in China. SiliconFlow is a one-stop large model cloud service platform offering efficient inference acceleration based on its own engine, and provides enterprise-grade large model services. As of December 2025, the platform had more than 9 million registered users, over 10,000 enterprise users, and more than 150 models launched.

Even U.S. political capital has entered this sector. On May 5, 2026, WLFI, a crypto company closely linked to Trump and his family, partnered with WorldClaw to launch WorldRouter, which integrates over 300 models such as Claude, GPT, and Gemini, settles in USD1, and is priced about 30% below official public rates.

Real Profits Aren’t Necessarily in “Resale Margins”

There are three ways to profit from Token distribution.

The first is resale margins. Platforms bulk purchase API quotas from upstream model vendors and resell them at a markup to downstream clients. OpenRouter adds about a 5.5% markup over supplier costs, representing this model.

The second is a technology premium. Platforms use self-developed inference acceleration engines to lower the running cost per Token, so even when their sale prices approach or undercut official prices, they still earn a gross margin from the efficiency gap. SiliconFlow’s SiliconLLM and OneDiff technologies have increased language model inference speed tenfold and text-to-image efficiency threefold, making large model API call costs as low as one-tenth of the industry standard.

The third is enterprise value-added services. The cost for enterprises to deploy AI lies not only in Token unit prices but also in prompt engineering, multi-model selection, business system integration, workflow orchestration, operations management, and building employees' AI skills. As baseline Token prices drop, these hidden costs become the items enterprises are more willing to pay for.

SiliconFlow’s enterprise MaaS platform points in this direction. For enterprise users, it provides three layers of capability: model training and tuning, deployment and inference, and application development support. These cover data processing, model fine-tuning, prompt engineering, RAG, etc., and are ultimately delivered in standardized API form to industries such as energy, finance, and government.

Marketing, Short Dramas, Gaming, E-Commerce: Scenarios That More Easily Consume Tokens

For Token distribution to be profitable, it must ultimately land in real-world scenarios.

Generative AI applications are making their way into industries like healthcare, transportation, and manufacturing, and are starting to participate in core enterprise processes like decision support and strategy. However, many enterprises still have weak foundations for digital transformation, insufficient data assets, and limited computational resources—directly deploying AI is not easy.

In contrast, marketing and ad companies already have clients and scenarios at hand, such as short dramas, comics, gaming, and e-commerce, with more direct and sustained Token consumption needs. For these companies, the opportunity isn’t just reselling model capabilities, but embedding Tokens into their content generation, ad delivery, creative asset production, and video workflows.

Investment opportunities also follow two main lines:

One is companies with superior model capabilities, including Alibaba, Tencent Holdings, Kuaishou, Kunlun Wanwei, Zhipu, MiniMax, etc.

The other is companies with strong Token consumption scenarios and high-quality clients, especially those with overseas client resources and marketing scenarios and those actively investing in AI-driven marketing and video solutions, including Yidian Tianxia, BlueFocus, etc.

Risks Are Significant: Low Barrier, Need for Advance Payment, Upstream Has the Final Say

The Token distribution business model is light, but the moat isn't naturally deep.

Peer competition is the first layer of risk. The technical threshold for distribution is low; if leading distributors enter the market with financial, client, and channel advantages, they can quickly copy the model and squeeze profit margins.

Advance payments and bad debts are the second layer of risk. Distributors often settle monthly or quarterly with downstream clients but must pay upfront when purchasing API quotas upstream. The larger the Token scale, the greater the cash pressure; if clients default, bad debt risk increases in tandem.

Upstream model vendor policy changes are the third risk. Large model vendors control API pricing and access policies—they might adjust prices or restrict third-party integrations. For intermediaries, this is the hardest factor to control.

Risk Warning and Disclaimer

The market has risks, and investment needs to be cautious. This article does not constitute individual investment advice and does not consider the special investment objectives, financial situation, or needs of individual users. Users should consider whether any opinions, views, or conclusions in this article are suitable for their specific circumstances. Investment based on this is at their own risk.
```
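
To make the resale-margin and advance-payment arithmetic from the profit and risk sections more concrete, here is a minimal Python sketch. The prices and the 5.5% markup are the figures quoted in the article; the client volumes, the 30-day settlement cycle, and all names (`ModelPrice`, `upstream_cost`, `resale_margin`, `prepaid_exposure`) are hypothetical and chosen only for illustration. It does not describe any platform's actual billing logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """Upstream list price in USD per million Tokens."""
    input_per_m: float
    output_per_m: float


def upstream_cost(price: ModelPrice, input_tokens: int, output_tokens: int) -> float:
    """USD the distributor pays upstream for the given Token volumes."""
    return (input_tokens * price.input_per_m
            + output_tokens * price.output_per_m) / 1_000_000


def resale_margin(cost: float, markup_rate: float) -> float:
    """Gross margin earned by adding a flat markup on top of upstream cost."""
    return cost * markup_rate


def prepaid_exposure(daily_cost: float, settlement_days: int) -> float:
    """Cash advanced upstream before the downstream client settles its bill."""
    return daily_cost * settlement_days


if __name__ == "__main__":
    # Prices quoted in the article for MiniMax M2.5 (USD per million Tokens).
    minimax_m25 = ModelPrice(input_per_m=0.3, output_per_m=1.1)

    # ASSUMPTION: a hypothetical client using 2B input and 0.5B output Tokens per day.
    daily_cost = upstream_cost(minimax_m25, 2_000_000_000, 500_000_000)

    # 5.5% is the OpenRouter markup cited in the article.
    # ASSUMPTION: 30 days stands in for a monthly settlement cycle.
    print(f"daily upstream cost: ${daily_cost:,.2f}")
    print(f"daily resale margin: ${resale_margin(daily_cost, 0.055):,.2f}")
    print(f"cash advanced over 30 days: ${prepaid_exposure(daily_cost, 30):,.2f}")
```

Under these assumed volumes, the client costs about $1,150 a day upstream, the 5.5% markup yields roughly $63 a day, and a 30-day settlement cycle ties up about $34,500 in advance payments. The comparison shows why a thin resale margin combined with prepayment leaves little cushion against a single client default.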