Test scores hit new highs as Google gives Gemini 3 Deep Think a major upgrade aimed at scientific research and engineering

```

Google has given Gemini 3 Deep Think, its deep-reasoning model, a major upgrade that pushes its specialized reasoning capabilities from abstract theory toward practical use. The upgrade targets complex problems in modern scientific research and engineering and signals Google’s strategic bet on the enterprise AI market.

On Thursday, June 12 (Eastern Time), Google officially announced the Gemini 3 Deep Think upgrade, saying the upgraded model has posted breakthrough results on several industry benchmarks: 48.4% on “Humanity’s Last Exam” (HLE) without tools, and 84.6% on ARC-AGI-2, a result verified by the ARC Prize Foundation. On the competitive programming platform Codeforces, Gemini 3 Deep Think reached an Elo rating of 3455.

The upgraded Deep Think mode is now open to Google AI Ultra subscription users and offers early access via the Gemini API to select researchers, engineers, and enterprise customers. Google noted that the model has already shown its value in real-world research, from discovering logical flaws in academic papers to optimizing semiconductor material growth processes.

This release puts Google in direct competition with OpenAI’s o1 series and Anthropic’s Claude in the AI reasoning model race. As general AI capabilities become increasingly commoditized, professional reasoning is emerging as the new battleground for the enterprise market, and the launch of Deep Think mode shows Google is unwilling to give ground in this high-value domain.

From Benchmark Tests to Gold-Medal Performance

On its official blog, Google highlighted Deep Think mode’s performance on rigorous academic benchmarks. Beyond the results above, the Gemini 3 Deep Think model reached gold-medal level on the written sections of the 2025 International Physics and Chemistry Olympiads, and scored 50.5% on CMT-Benchmark, an advanced theoretical physics test.

A comparison published by Google shows that Gemini 3 Deep Think’s results this month on a range of tests surpassed the strongest reasoning modes from both Anthropic and OpenAI, and also beat the reasoning mode of the Gemini 3 Pro preview.

For example, in the ARC-AGI-2 test, Gemini 3 Deep Think achieved an accuracy rate of 84.6%, while Anthropic’s Claude Opus 4.6 Thinking Max scored 68.8%, and OpenAI’s GPT-5.2 Thinking xhigh scored 52.9%.

The Google team stated that this upgrade was completed in close collaboration with scientists and researchers, aiming to tackle research challenges “lacking clear boundaries or single correct answers, and with often messy or incomplete data.” The model combines deep scientific knowledge with practical engineering ability, bridging the gap between abstract theory and practical application.

Beyond breakthroughs in mathematics and programming, the Deep Think mode's performance now extends to chemistry, physics (including theoretical physics), and other scientific disciplines. This breadth means the model is no longer limited to specific fields, but has become a cross-disciplinary research tool.

Real-World Application Cases Demonstrate Value

Examples from early testers illustrate the model’s practical potential. Rutgers University mathematician Lisa Carbone used Deep Think mode to review a highly specialized mathematics paper while studying the mathematical structures needed for high-energy physics. The model identified a subtle logical flaw that human peer review had missed.

At Duke University, the Wang Lab used Deep Think mode to optimize fabrication methods for complex crystal growth in the search for potential semiconductor materials. The model designed a recipe for growing thin films larger than 100 microns, reaching a precision that earlier methods had not achieved.

Anupam Pathak, head of Google’s Platforms and Devices division and former CEO of Liftware, tested the new Deep Think mode to accelerate the design of physical components.

Another use case presented by Google shows that with the upgraded Gemini 3 Deep Think, users can turn sketches into 3D-printable models. The model can analyze blueprints, model complex shapes, and generate files for 3D printing.

Strategic Deployment in the Enterprise Market

This upgrade reflects an industry shift from general-purpose chatbots to specialized reasoning engines capable of handling expert-level problems. For enterprise customers, the evaluation criteria are changing: the question is no longer just which AI writes code or summarizes documents fastest, but whether a model can handle complex financial models, analyze experimental data and spot methodological flaws, assist with patent research, or support drug discovery.

Google’s advantage lies in integration. Deep Think mode does not exist in isolation but is part of the broader Gemini ecosystem, meaning it could draw on Google’s vast knowledge graph, scientific datasets, and research partnerships. In theory, researchers who use Deep Think mode through Google Cloud can access computing power and data sources that stand-alone AI services cannot match.

The company wrote on X on Thursday: “The upgraded Deep Think mode is already driving discoveries and helping researchers solve ‘unsolvable’ problems—from finding flaws in research papers to optimizing semiconductor (crystal) growth.” This highlights the model’s ability to transition from benchmark test scores to real-world applications.

From a product strategy perspective, Google is opening access to both consumer and enterprise users. Google AI Ultra subscribers can use the upgraded mode in the Gemini app immediately, while scientists, engineers, and enterprise users can apply for early access via the Gemini API. This tiered strategy reflects Google's dual goal: maintaining its presence in the consumer market and competing for high-value enterprise customers.

Reasoning Model Competition Heats Up

The release of Deep Think mode puts Google in direct confrontation with OpenAI and Anthropic in the AI reasoning competition. OpenAI’s o1 model reportedly spends more time “thinking” before generating responses, using reinforcement learning to improve reasoning chains. Anthropic’s Claude 3 has claimed a niche in research and analysis tasks. Now Google has entered the same field, backed by infrastructure and distribution advantages from its integration with Workspace and Cloud Platform.

For professional users, choosing between fast general-purpose responses and slower deep reasoning becomes a new architectural decision. Applications may route simple queries to standard models and send complex questions to reasoning modes, creating a tiered approach to AI inference.

Google wrote on X on Thursday: “Gemini 3 Deep Think mode excels in advanced benchmark tests at the frontier of intelligence. Specific data: 48.4% in ‘Humanity’s Last Exam’ (no tools), 84.6% in ARC-AGI-2 (ARC Prize Foundation verified), 3455 Elo rating in Codeforces competitive programming.”

Google also noted that the model now performs strongly in scientific fields such as chemistry and physics.

The real test of this competition is not press releases, but actual adoption. If research institutions and engineering firms begin using Deep Think mode to tackle complex work, Google’s thesis will be validated—the future of enterprise AI lies in depth, not speed. Google has made its stance clear: it’s competing for the high-end of the AI market, where thinking matters more than conversation.

```
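
The routing idea mentioned under "Reasoning Model Competition Heats Up" (simple queries go to a standard model, complex ones to a reasoning mode) might look roughly like the sketch below. This is only an illustration under stated assumptions: the tier names, the keyword list, and the word-count threshold are all hypothetical and do not come from Google or any real API. A production router would more likely use a classifier or cost and latency budgets than a keyword heuristic.

```python
from dataclasses import dataclass

# Hypothetical tier labels; these are NOT real model identifiers.
FAST_TIER = "standard-model"
DEEP_TIER = "deep-reasoning-model"

# Illustrative (assumed) keywords hinting that a query needs multi-step reasoning.
REASONING_HINTS = ("prove", "derive", "optimize", "debug", "review", "design")


@dataclass
class RoutingDecision:
    tier: str    # which tier the query should be sent to
    reason: str  # short explanation, useful for logging


def route_query(query: str, max_fast_words: int = 40) -> RoutingDecision:
    """Pick a tier with a crude heuristic: query length plus keyword hints."""
    words = query.lower().split()
    # Long queries are assumed to carry enough context to justify deep reasoning.
    if len(words) > max_fast_words:
        return RoutingDecision(DEEP_TIER, "long query")
    # Any word starting with a hint keyword also sends the query to the deep tier.
    for hint in REASONING_HINTS:
        if any(word.startswith(hint) for word in words):
            return RoutingDecision(DEEP_TIER, f"keyword: {hint}")
    return RoutingDecision(FAST_TIER, "short query with no reasoning hints")


if __name__ == "__main__":
    for q in ("What time zone is Tokyo in?",
              "Review this proof and find the logical flaw"):
        decision = route_query(q)
        print(f"{q!r} -> {decision.tier} ({decision.reason})")
```

The decision is returned as a small record rather than a bare string so that the application can log why a query was escalated, which helps when tuning the threshold against the higher cost and latency of the reasoning tier.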