On the Eve of Its IPO, Zhipu Launches Its Flagship Large Model GLM-4.7

At a critical stage in its IPO process, Zhipu AI has officially launched and open-sourced its latest flagship model, GLM-4.7. The new version focuses on stronger coding ability, long-horizon task planning, and tool collaboration, marking another significant iteration of the company's technical product line.

The release came on December 23. In multiple mainstream public benchmarks, GLM-4.7 delivered competitive results, with some metrics surpassing those of current leading models. In Code Arena, a professional coding evaluation based on blind testing by millions of users worldwide, GLM-4.7 ranks first among open-source models and first among domestic models, outperforming GPT-5.2. The model also achieved open-source SOTA (state-of-the-art) scores on SWE-bench-Verified and LiveCodeBench V6, putting it on par with Claude Sonnet 4.5.

In terms of architecture, GLM-4.7 introduces "retentive thinking" and "round-level thinking" mechanisms, which significantly improve the stability and controllability of complex tasks. On frontend generation quality, the model’s understanding of UI design specifications has improved, enabling it to generate more aesthetically pleasing web pages and PPTs. The model is currently available as an API service through BigModel.cn and has also been rolled out in the Skills module of z.ai’s full-stack development mode, which supports unified planning of multimodal tasks.

The update marks another step forward in domestic large models’ ability to coordinate “thinking” and “action.” With stronger coding capabilities, developers can organize their development process more naturally around “task delivery.” The release is also seen as an important display of Zhipu’s technical strength ahead of its capital market debut.

Coding and Reasoning Capabilities Set New Benchmarks

According to published test data, GLM-4.7 shows marked improvements in programming and reasoning. On the HLE (“Humanity's Last Exam”) benchmark, the model scored 42.8%, a 41% increase over the previous-generation GLM-4.6 and ahead of GPT-5.1.

In the field of code generation, GLM-4.7 shows advantages in multi-language coding. Specific evaluation data includes:

SWE-bench-Verified: Achieved 73.8%, an open-source SOTA score.
LiveCodeBench V6: Reached 84.9%, an open-source SOTA score, surpassing Claude Sonnet 4.5.
Terminal Bench 2.0: Achieved 41%, a 16.5% improvement.

Additionally, in terms of tool invocation capabilities, GLM-4.7 scored 87.4 in the τ²-Bench interactive tool invocation assessment, setting a new open-source record.
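
The article quotes gains as percentages without saying whether they are relative increases or percentage-point differences. The short Python sketch below back-calculates the previous score implied by each reading for the HLE and Terminal Bench 2.0 figures quoted above. The function names are illustrative, and the implied baselines are arithmetic consequences of the quoted numbers, not figures reported in the article.

```python
def baseline_if_relative(new_score, gain_pct):
    """Previous score if the gain is relative: new = old * (1 + gain / 100)."""
    return new_score / (1 + gain_pct / 100)


def baseline_if_points(new_score, gain_pct):
    """Previous score if the gain is in percentage points: new = old + gain."""
    return new_score - gain_pct


# (new score in %, quoted gain) as stated in the article
claims = {
    "HLE": (42.8, 41.0),
    "Terminal Bench 2.0": (41.0, 16.5),
}

for name, (score, gain) in claims.items():
    rel = baseline_if_relative(score, gain)
    pts = baseline_if_points(score, gain)
    print(f"{name}: implied baseline {rel:.1f}% (relative) or {pts:.1f}% (points)")
```

Read as percentage points, the HLE gain would imply a previous score of under 2%, so that figure reads most naturally as a relative increase. The Terminal Bench 2.0 figure is plausible under either reading, and the article does not say which is meant.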

Introduction of a Controllable "Thinking" Model

To address stability issues in complex tasks, GLM-4.7 strengthens control over its thinking process along three dimensions:

Interleaved Thinking: The model thinks before each response or tool invocation, which improves adherence to complex instructions and the quality of generated code.
Retentive Thinking: Thinking blocks are retained automatically across multi-turn conversations, which raises the cache hit rate and lowers the inference cost of long-horizon tasks.
Round-level Thinking: Reasoning overhead can be controlled per "round" within a session. Thinking can be disabled for simple tasks to reduce latency and enabled for complex tasks to ensure accuracy.
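
As a rough illustration of why retaining thinking blocks can raise the cache hit rate, the Python sketch below compares what a server processed in the previous round with what the next request contains, with and without retained thinking. It is a toy model under stated assumptions: the article does not describe how the cache works, real prefix caches operate on tokens rather than whole messages, and every name here (`Turn`, `build_messages`, `round_options`, the `"thinking"` field) is hypothetical rather than taken from the BigModel.cn API.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    user: str
    answer: str
    thinking: str = ""  # empty when thinking was disabled for this round


def build_messages(history, new_user_msg, retain_thinking):
    """Flatten earlier turns plus the new user message into one request."""
    messages = []
    for turn in history:
        messages.append(("user", turn.user))
        if retain_thinking and turn.thinking:
            messages.append(("thinking", turn.thinking))
        messages.append(("assistant", turn.answer))
    messages.append(("user", new_user_msg))
    return messages


def cached_after_last_turn(history, retain_thinking):
    """What the server processed last round: the request plus its own output."""
    last = history[-1]
    seq = build_messages(history[:-1], last.user, retain_thinking)
    if last.thinking:
        seq.append(("thinking", last.thinking))
    seq.append(("assistant", last.answer))
    return seq


def shared_prefix_len(a, b):
    """Number of leading items that two sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def round_options(needs_thinking):
    """Hypothetical per-round switch; the field name is an assumption."""
    return {"thinking": "enabled" if needs_thinking else "disabled"}


history = [
    Turn("Refactor parse()", "Done, see diff.", thinking="Check callers first..."),
    Turn("Now add tests", "Added 3 tests.", thinking="Cover empty input..."),
]

for retain in (True, False):
    cached = cached_after_last_turn(history, retain)
    request = build_messages(history, "Run the linter", retain)
    reused = shared_prefix_len(cached, request)
    print(f"retain={retain}: {reused} of {len(cached)} cached messages reusable")
```

In this toy model, dropping the earlier thinking block makes the new request diverge from the cached sequence at the point where that block used to be, so everything after it has to be recomputed. Keeping the block leaves the whole previous round as a reusable prefix.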

These mechanisms let GLM-4.7 follow a "think first, then act" pattern in mainstream coding tools such as Claude Code, TRAE, Kilo Code, Cline, and Roo Code, and the model is more stable and more likely to deliver working results on real programming tasks than previous versions.

Frontend Aesthetics and Full-stack Delivery

For frontend development scenarios, GLM-4.7 has improved understanding of visual code. In practical applications, the model can better adhere to UI design specifications, providing aesthetically pleasing default solutions for layout structure, color harmony, and component styles, reducing the time needed for manual fine-tuning.

According to official demonstrations, the model’s layout aesthetics for office documents are significantly improved: the 16:9 adaptation rate for PPT slides jumped from 52% to 91%, and generated results essentially reach a "ready-to-use" standard.

In practical case demonstrations, GLM-4.7 can independently complete development of highly interactive mini-games such as "Plants vs. Zombies" and "Fruit Ninja," showing strong abilities in task decomposition and tech stack integration.

Market Feedback: Cost-Performance and Real-World Results

GLM-4.7 quickly attracted attention from the global developer community after its launch. User feedback has focused on its ability to solve practical problems and its strong price-performance ratio.

On social media, user Diego shared a case of visualizing one-way red lights using Python code written by GLM-4.7, saying the result "runs well overall," only noting a minor issue of vehicle color changing with the lights.

User Alex Fazio stated the performance on WebDev Arena was shocking, bluntly saying “GLM-4.7 surpasses GPT-5.2.”

Pricing has also become a focal point of market discussion. User Bessi pointed out that a one-year GLM-4.7 subscription costs only as much as one month of the highest-tier plan for Codex or Claude Code. Bessi argued that such aggressive pricing will challenge Western AI companies, adding: “Whether you like it or not, this is the future.”

On the pace of model evolution, user Chubby commented that the HLE benchmark was designed to be extremely difficult and hard to saturate in the short term, yet the industry raised the score from 8% (o1) to 45.8% in just 12 months with the help of tools. In Chubby's view, the breakthrough achieved by GLM-4.7 shows that the pace of technical iteration is "exceeding expectations."
