MiniMax releases the M2.5 model: $1 for 1 hour of operation, only 1/20 the price of GPT-5, with performance comparable to Claude Opus.
```
MiniMax has released M2.5, the latest iteration of its model series, which it says maintains industry-leading performance while dramatically reducing inference costs. The model aims to make complex Agent applications economically feasible, and MiniMax claims it has reached or surpassed the industry state of the art (SOTA) in programming, tool use, and office scenarios.
On February 13, MiniMax announced data showing that M2.5 has a significant price advantage. For the version that outputs 50 tokens per second, the price is only 1/10 to 1/20 that of mainstream models such as Claude Opus, Gemini 3 Pro, and GPT-5.
For the high-speed version that outputs 100 tokens per second, running M2.5 continuously for one hour costs just $1. At 50 tokens per second, the cost drops further to $0.3 per hour. At that lower rate, a budget of roughly $10,000 can support four Agents working continuously for a year (4 × $0.3 × 8,760 hours ≈ $10,500), greatly lowering the threshold for building and operating large-scale Agent clusters.
In terms of performance, M2.5 shows strong results in key programming tests and achieved first place in the multi-language task Multi-SWE-Bench, reaching levels comparable to the Claude Opus series. Additionally, the model has optimized its ability to break down complex tasks. In the SWE-Bench Verified test, task completion speed increased by 37% compared to the previous generation M2.1, with end-to-end runtime reduced to 22.8 minutes, matching that of Claude Opus 4.6.
Currently, MiniMax's internal operations have validated the model’s capabilities. Data indicates that 30% of all internal tasks are autonomously completed by M2.5, covering core functions in R&D, product, and sales. Particularly in programming scenarios, code generated by M2.5 now accounts for 80% of new code submissions, highlighting the model’s high penetration and usability in real production environments.
Breaking the Cost Floor: Economic Feasibility of Unlimited Agent Operation
M2.5 was designed to remove the cost constraints on running complex Agents. MiniMax pursued this by optimizing inference speed and token efficiency. The model offers an inference speed of 100 TPS (tokens per second), about twice that of mainstream models.
Beyond reduced computational costs, M2.5 lowers total token consumption needed to complete tasks through more efficient task breakdown and decision logic.
In the SWE-Bench Verified evaluation, M2.5 averages 3.52M tokens per task, about 5% fewer than M2.1's 3.72M.
The dual improvements in speed and efficiency allow enterprises to economically build and operate Agents almost without limit, shifting the competition focus from cost to the speed of model capability iteration.
Advancing Programming Capabilities: Thinking and Building Like an Architect
In programming, M2.5 focuses not only on code generation but also emphasizes system design capability. The model has developed native Spec (specification) behaviors, proactively breaking down functions, structure, and UI design from the architect’s perspective before coding.
The model was trained across more than 10 programming languages (including Go, C++, Rust, and Python) and hundreds of thousands of real environments.
Tests show that M2.5 is competent for the entire workflow from system design (0-1), development (1-10), feature iteration (10-90), to final code review (90-100).
To verify its generalization in different development environments, MiniMax tested it on programming scaffolds like Droid and OpenCode.
Results show M2.5 achieved a pass rate of 79.7 on Droid and 76.1 on OpenCode, both surpassing the previous generation model and Claude Opus 4.6.

Complex Task Handling: More Efficient Search and Professional Delivery
In search and tool calling, M2.5 shows more mature decision-making: it no longer simply pursues "correct results" but tries to solve problems along more streamlined paths.
Across tasks like BrowseComp, Wide Search, and RISE, M2.5 uses about 20% fewer rounds than the previous generation, reaching its results with better token efficiency.

For office scenarios, MiniMax worked with industry veterans in finance, law, etc., incorporating tacit industry knowledge into model training.
In the internally developed Cowork Agent evaluation framework (GDPval-MM), M2.5 achieved an average win rate of 59.0% in pairwise comparisons against mainstream models. Beyond simple text generation, it can produce Word research reports, PPTs, and complex Excel financial models that meet industry standards.


Technical Foundation: Native Agent RL Framework Drives Linear Improvement
The core driver of M2.5's performance improvements is large-scale reinforcement learning (RL).
MiniMax adopted a native Agent RL framework called Forge, introducing an intermediate layer to decouple the underlying training/inference engine from Agents, supporting integration with any scaffolding.
At the algorithm level, MiniMax continues to use the CISPO algorithm to keep MoE models stable during large-scale training, and it introduced a Process Reward mechanism to address the credit assignment challenges caused by long Agent contexts.
Additionally, the engineering team optimized its asynchronous scheduling and tree-merge strategies for training samples, achieving about 40x training acceleration. According to MiniMax, the results show model capability improving near-linearly as computing power and the number of tasks increase.

Currently, M2.5 has been fully launched in MiniMax Agent, API, and Coding Plan, and its model weights will also be open-sourced on HuggingFace, supporting local deployment.
```
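The cost and token figures quoted in the article can be checked with simple arithmetic. The sketch below is only an illustration: the hourly prices ($1 at 100 tokens per second, $0.3 at 50 tokens per second) and the per-task token counts are taken from the article, while the function names, the 24/7 usage pattern, and the 365-day year are assumptions made here, not MiniMax's pricing formula.

```python
# Back-of-the-envelope check of the figures quoted in the article.
# Assumptions (not from MiniMax): agents run 24/7 for a 365-day year,
# and the quoted hourly prices apply unchanged for the whole year.

HOURS_PER_YEAR = 24 * 365  # 8,760 hours of continuous operation

# Hourly cost in US dollars, keyed by output speed in tokens per second.
HOURLY_COST_USD = {100: 1.0, 50: 0.3}


def annual_cost(tps: int, agents: int = 1) -> float:
    """Yearly cost in USD of running `agents` agents nonstop at `tps`."""
    return HOURLY_COST_USD[tps] * HOURS_PER_YEAR * agents


def tokens_per_hour(tps: int) -> int:
    """Output tokens generated in one hour at a steady `tps`."""
    return tps * 3600


def relative_reduction(old: float, new: float) -> float:
    """Fractional decrease from `old` to `new`."""
    return (old - new) / old


if __name__ == "__main__":
    for tps in (100, 50):
        print(f"{tps} TPS: {tokens_per_hour(tps):,} tokens/hour, "
              f"4 agents for a year cost ${annual_cost(tps, agents=4):,.0f}")
    # Average tokens per SWE-Bench Verified task: M2.1 vs. M2.5 (in millions).
    saving = relative_reduction(3.72, 3.52)
    print(f"Token reduction per task: {saving:.1%}")
```

Under these assumptions, four agents at the 50 tokens-per-second rate come to about $10,500 a year, which matches the article's "$10,000 for four Agents" claim. At the 100 tokens-per-second rate the same four agents would cost about $35,000.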