The strongest in open source! "Punching GPT-5," "Kicking Gemini-3.0"—why has DeepSeek V3.2 improved so much?

As the large model track gradually shifts from a “parameter competition” to a “capability competition”, a significant change is taking place: open-source models are now approaching, and even challenging, top closed-source models across more and more key capability dimensions.

On December 1, DeepSeek simultaneously released two official models—DeepSeek-V3.2 and DeepSeek-V3.2-Speciale. The former achieved GPT-5-level performance in reasoning tests, only slightly below Gemini-3.0-Pro, while the latter won gold medals in four top international competitions, including IMO 2025.

V3.2 has reached the highest level among current open-source models in terms of tool-using abilities, significantly reducing the gap with closed-source models.

According to official sources, V3.2 is DeepSeek’s first model that integrates thinking with tool use, supporting tool invocation even in “thinking mode.” By synthesizing massive agent training data, the company constructed over 1,800 environments and more than 85,000 complex reinforcement learning tasks, resulting in significant improvements in agent evaluation benchmarks.

V3.2 proves one thing: with the right architecture + data strategy + tool integration design, open-source models are fully capable of becoming world-class competitors. Deepseek researcher Zhibin Gou posted on X:

If Gemini-3 proves that continuously scaling up pre-training is still effective, then DeepSeek-V3.2-Speciale proves that it is feasible to expand reinforcement learning under ultra-large context windows.

It took us a year to push DeepSeek-V3 to its limits. The experience gained: The bottleneck of post-training is solved by optimization methods and data, not by waiting for a stronger base model.

DSA breaks through performance bottlenecks, “Thinking + Tool Use” strategy brings qualitative leap

This breakthrough comes from two core innovations.

The first is DeepSeek Sparse Attention (DSA), a sparse attention mechanism introduced two months ago in the experimental version (V3.2-Exp) as a key structural change.

The sparse attention mechanism effectively addresses efficiency bottlenecks in handling long sequences of the traditional attention mechanism, reducing attention complexity from O(L²) to O(Lk), while maintaining model performance.

Architecturally, DSA uses two main components: the Lightning Indexer and the fine-grained token selection mechanism. The Lightning Indexer calculates index scores between query tokens and historical tokens to decide which tokens are selected; the fine-grained selection retrieves corresponding key-value entries based on index scores. This mechanism is implemented using MLA’s MQA mode, ensuring computational efficiency and stable performance.

Extensive user testing revealed: V3.2-Exp was never noticeably weaker than V3.1 in any scenario, and sparse attention not only preserved capabilities, but also significantly improved efficiency and response quality. This means the model can see “further,” think “deeper,” and use fewer computational resources.

The second key to DeepSeek-V3.2’s major improvement lies in fundamental changes to its training strategies. Previously, models used a simplistic “direct tool invocation” mode, whereas V3.2 innovatively implements an integrated “thinking + tool invocation” (Thinking in Tool-use) mechanism.

DeepSeek-V3.2 becomes the first model to support tool invocation while in “thinking mode.” In other words, it no longer instantly calls tools as soon as it recognizes a problem, but rather: it analyzes first, then plans, then invokes tools, then verifies and adjusts.

This performance more closely mimics the human “think-act-reflect” loop, creating exponential gains in complex tasks (such as searching, coding, bug fixing, and project planning).

Changes to Data Strategy: 1,800+ environments + 85,000 complex instructions

Why is the model suddenly so much stronger? Essentially, it’s due to an overhaul in training strategy.

DeepSeek built a brand new, large-scale data synthesis pipeline, generating more than 1,800 environments and over 85,000 high-difficulty instructions specifically for reinforcement learning.

This “cold start + large-scale synthetic RL data” training method significantly improves the model’s generalization abilities in complex tasks such as code repair and search. By designing reinforcement learning tasks that are “hard to solve, easy to verify,” the model learns to organically incorporate tool use in reasoning.

The core value of this approach: it no longer relies on real human annotation, but forges model capabilities by constructing an “extreme question bank.”

The results are clear: In code repair, search path planning, and multi-step tasks, V3.2’s generalization ability far surpasses previous versions, even approaching closed-source commercial models.

For contextual thinking management, V3.2 uses targeted optimization strategies for tool-using scenarios. Historical reasoning contents are discarded only when new user messages are introduced, while reasoning is retained when tool-related messages (tool outputs) are added—avoiding the inefficiency of re-reasoning the whole problem for every tool call.

Large-scale reinforcement learning dramatically boosts model capabilities; post-training compute exceeds 10% of pre-training

DeepSeek-V3.2 employs a scalable reinforcement learning framework, allocating post-training computational budgets exceeding 10% of pre-training costs, laying the groundwork for unlocking advanced capabilities.

The company introduced multiple stability improvements on top of the GRPO (Group Relative Policy Optimization) algorithm, including unbiased KL estimation, off-policy sequential masking, and keeping routing mechanisms.

In the expert distillation stage, the company developed specialized models for each domain—mathematics, programming, general logical reasoning, agent tasks, etc., spanning six expert domains—all supporting thinking and non-thinking modes. These expert models are trained via large-scale reinforcement learning and later used to generate domain-specific data for the final checkpoints.

Hybrid RL training merges reasoning, agent, and human alignment training into a single RL phase, effectively balancing performance across domains and avoiding catastrophic forgetting seen in multi-stage training. For reasoning and agent tasks, reward rules are based on results, length penalties, and linguistic consistency; for general tasks, generative reward models are used.

The “power structure” of large models is changing!

Compared to several overseas models, DeepSeek-V3.2 displays remarkable performance advantages. In reasoning, V3.2 reached a 93.1% pass rate in the AIME 2025 test, close to GPT-5’s 94.6% and Gemini-3.0-Pro’s 95.0%. In the HMMT 2025 test, V3.2 scored 92.5%, further narrowing the gap with top closed-source models.

On agent capability benchmarks, V3.2 performed especially well. It achieved a 73.1% solution rate in the SWE-Verified code agent task, and 46.4% accuracy in Terminal Bench 2.0, far outperforming current open-source models. In BrowseComp search agent evaluation, with better context management technology, V3.2’s pass rate rose from 51.4% to 67.6%.

In tool-use benchmarks, V3.2 scored 80.3% on τ2-Bench and 45.9% on MCP-Universe. Notably, V3.2 was not specifically tuned to these benchmarks’ toolsets, demonstrating strong generalization. In contrast, other open-source models released in the same period, like MiniMax-M2-Thinking, lagged noticeably in multiple tests.

Behind DeepSeek-V3.2’s release lies a bigger message: The absolute technical monopoly of closed-source models is being broken, as open-source models become truly competitive.

This means three things:

For developers: High-performance models with lower cost and greater customizability have arrived; for enterprises: There’s no longer a need to rely entirely on overseas APIs—a powerful AI system can be built independently; for the industry: The large model arms race is shifting from “who has the largest parameters” to “who has the strongest methods.”

At this moment, DeepSeek stands at the front of the pack.

Risk Disclaimer and Liability ClauseThe market has risks, and investment should be cautious. This article does not constitute personal investment advice, nor does it take into account the individual investment goals, financial situation, or needs of any user. Users should consider whether any opinions, views, or conclusions in this article are appropriate for their specific circumstances. Any investment based on this, responsibility is borne by the user.