"The 'Price Butcher' is here: Xiaomi MiMo Large Model API sees permanent price cuts of up to 99%."

```

On May 27, Xiaomi announced a permanent price cut across the API pricing for its MiMo-V2.5 series of large models, with reductions of up to 99%. Pricing will also no longer vary with context length.

Specifically, for MiMo-V2.5-Pro:

Input with a cache hit costs only 0.025 yuan per million tokens.

Input with a cache miss costs 3 yuan per million tokens.

Output costs 6 yuan per million tokens.

For the Token Plan, Xiaomi is taking a “more for the same price” approach: the number of tokens available to users in Agent or Code scenarios rises to 5-8 times the original amount, and the rules change to “what you see is what you get,” removing the complex pricing logic that conversion used to introduce.

This is another major pricing adjustment by a leading domestic large-model vendor within a single week, following DeepSeek’s announcement last week that it would permanently cut the price of V4-Pro to 25% of its original level.

By international comparison, DeepSeek’s and Xiaomi’s latest prices are already significantly lower than those of mainstream overseas vendors.

Among mainstream international models, OpenAI’s GPT-4o is priced at $2.50 per million input tokens and $10 per million output tokens at the standard rate; Claude Sonnet 4.6 is priced at $3 per million input tokens and $15 per million output tokens.

Rather than simply “burning cash,” Xiaomi’s price cut this time rests on structural cost optimization at the engineering layer.

According to Xiaomi, with SGLang HiCache now fully supporting SWA (Sliding Window Attention), the volume of KV Cache data moved between storage tiers such as GPU VRAM, CPU memory, and SSD has fallen to nearly 1/7 of its pre-optimization level, and the number of tokens that can be cached has risen to nearly five times the pre-optimization figure, significantly improving cache hit rate and inference efficiency. Xiaomi says it has also further improved cluster input throughput through expert solutions and an input-length bucketing strategy.

This mirrors DeepSeek’s pricing logic: both companies structurally reduce per-token serving costs through architectural innovation and engineering optimization of the inference system, then pass the savings on to developers.

From an industry perspective, this round of domestic large-model price cuts also coincides with a change in application demand. As large models shift from “chatting” to “working,” what truly concerns developers and enterprise users is not just the cost of a single Q&A exchange, but the cost of the continuous token consumption of Agents across multi-turn inference, calls, and automated workflows.

As the price per million tokens continues to drop, competition among domestic large models will push further down the value chain. For developers, lower costs mean a surge in Agent and other application offerings; for vendors, low prices imply higher inference efficiency, stronger compute scheduling, and longer-term ecosystem investment.

The price war may not directly determine whose model is strongest, but it can speed up adoption by developers. Xiaomi MiMo’s steep price cut at this moment is another footnote in domestic large models’ move toward “scaled usage.”

```
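
As a rough illustration of how the MiMo-V2.5-Pro prices quoted above combine into a bill, the following Python sketch estimates the cost of a workload from its input tokens, output tokens, and cache hit rate. Only the three per-million-token prices come from the article; the `Pricing` class, the `request_cost` function, and the example workload (10 million input tokens, 1 million output tokens) are hypothetical names and numbers chosen for illustration, not part of Xiaomi's API or announcement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Prices in yuan per one million tokens."""
    input_cache_hit: float
    input_cache_miss: float
    output: float


# MiMo-V2.5-Pro prices as quoted in the article.
MIMO_V25_PRO = Pricing(input_cache_hit=0.025, input_cache_miss=3.0, output=6.0)


def request_cost(pricing: Pricing, input_tokens: int, output_tokens: int,
                 cache_hit_rate: float) -> float:
    """Estimate cost in yuan (hypothetical helper).

    cache_hit_rate is the assumed fraction of input tokens served from cache.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    hit_tokens = input_tokens * cache_hit_rate
    miss_tokens = input_tokens - hit_tokens
    total = (hit_tokens * pricing.input_cache_hit
             + miss_tokens * pricing.input_cache_miss
             + output_tokens * pricing.output)
    return total / 1_000_000  # prices are per million tokens


if __name__ == "__main__":
    # Assumed Agent-style workload: 10M input tokens, 1M output tokens.
    for rate in (0.0, 0.8):
        cost = request_cost(MIMO_V25_PRO, 10_000_000, 1_000_000, rate)
        print(f"cache hit rate {rate:.0%}: about {cost:.2f} yuan")
```

Under these assumed numbers, the workload costs roughly 36 yuan with no cache hits and roughly 12.2 yuan at an 80% cache hit rate, which suggests why cache hit rate weighs so heavily on the cost of multi-turn Agent workloads.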