Tencent's Qiu Yuepeng: As inference demand surges, cloud infrastructure must be upgraded accordingly

Author | Huang Yu
In 2025, with AI applications growing explosively and the "first year of the Agent" under way, inference demand is surging. To seize this opportunity, cloud service providers are actively upgrading their infrastructure to meet market demand.
On September 16, at the 2025 Tencent Global Digital Ecosystem Conference, Qiu Yuepeng, Vice President of Tencent Group and President of Tencent Cloud, said it is now an industry consensus that the focus of the large-model industry has shifted from training to inference. At the same time, customers are showing intense enthusiasm for using large models and building Agents, and together these trends are driving a surge in inference demand.
This also means that AI infrastructure must be upgraded accordingly.
In recent years, Tencent Cloud has been continuously upgrading its cloud infrastructure to support the large-scale deployment of Agents and the global development of enterprises. According to Qiu Yuepeng, Tencent Cloud has made breakthroughs in inference acceleration, Agent Infra, and internationalization, and will take a more open approach to help enterprises seize opportunities in this era.
On inference acceleration, Tencent Cloud is an active open-source contributor and has submitted multiple optimization technologies to communities such as DeepSeek, vLLM, and SGLang. To address the memory bottlenecks encountered during large-model inference, Tencent Cloud also independently developed and open-sourced FlexKV, a multi-level cache technology that significantly reduces the memory occupied by the KV cache and cuts first-token latency by up to 70%.
At the same time, Qiu Yuepeng revealed that Tencent Cloud relies on a heterogeneous computing platform to integrate various chip resources, providing cost-effective AI computing power to the market. Currently, the platform has fully adapted to mainstream domestic chips.
It is understood that full-stack optimization through hardware-software coordination is a long-term strategic investment for Tencent Cloud. The software capabilities of the heterogeneous computing platform are what allow different types of chips to be integrated and offered to external customers as cost-effective AI computing power.
This year is regarded as the first year of the Agent. As this cutting-edge technology enters enterprise production environments, ensuring that Agents run efficiently in a secure and trustworthy environment has become a new challenge. To address it, Tencent Cloud has launched a new Agent infrastructure solution, Agent Runtime.
Agent Runtime integrates five key capabilities: execution engine, cloud sandbox, context service, gateway, and secure observability service. The cloud sandbox, based on independently developed technology, has a startup time of only 100 milliseconds and supports hundreds of thousands of concurrent instances.
Beyond upgrading infrastructure for Agents, Qiu Yuepeng said Tencent Cloud is also considering how to apply Agent capabilities throughout customers' cloud journeys, helping them use and manage the cloud more effectively. To that end, the company introduced Cloud Mate, Tencent Cloud's expert service agent.
Cloud Mate consists of a series of sub-Agents, each with expertise in a particular cloud domain. It is not just a technology but also a distillation of Tencent Cloud's extensive practical experience. It can visualize cloud architecture governance, intercept risks before they take effect, and significantly improve troubleshooting efficiency, changing the way the cloud is managed.
Qiu Yuepeng revealed that in internal trials, Cloud Mate intercepted 95% of risky SQL statements and cut troubleshooting time from 30 hours to as little as 3 minutes.
As the Agent era surges forward, cloud service providers are actively preparing for this arms race.