UBS finds 60% of enterprises have started to control AI spending, shifting toward low-cost models and Chinese open-source models

AI spending management is becoming the new battleground of enterprise IT governance. With AI Agents and coding tools now widely adopted, Token bills have landed squarely in CFOs' view, and how companies respond is reshaping who benefits along the AI industry chain.

According to Windy Trading Desk, the team led by UBS Securities analyst Karl Keirstead shared a core judgment in an AI research report released on June 23: a surge in Token spending optimization may temporarily drag on AI revenue growth, but the long-term trend remains robust. Early-stage research indicates that around 60% of enterprises have restricted AI spending in some way, mainly by setting guardrails on Token usage. That share suggests cost governance for AI has moved from isolated action by individual firms to a broader industry phenomenon.

The direct impact of these changes is now being transmitted through the industry chain. High-priced cutting-edge models face pressure from downgraded usage and open-source substitution; Chinese open-source models such as Alibaba's Qwen, DeepSeek, MiniMax, and Zhipu GLM are starting to appear among enterprise procurement and deployment options. A large global bank has already deployed Qwen locally to balance its usage of premium models like Claude.

Cloud vendors and hardware layers are relatively less impacted, while software companies are in the most complex position: they face client budget cuts but also have opportunities to position themselves as Token optimization platforms.

Enterprises aren't abandoning AI; they're just starting to check the Token bill

The early phase of enterprise AI adoption was loosely managed: employees were encouraged to experiment as much as possible, adoption rates took priority, and cost discipline was generally weak. As AI Agents and AI Coding tools spread, Token consumption is shifting from the light traffic of chatbots to sustained, high-volume task runs, and the problem of "Token-maxxing" has emerged.

Research shows some extreme cases: some companies burned through their annual Token budgets so quickly that they had to cut internal AI tools from five to two; some recorded single-user spend of $35,000 in a month on AWS Bedrock; and at some, DevOps team members consumed 100%-200% of their weekly Token quota without the company clearly intervening.
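A guardrail of the kind these cases call for can be as simple as comparing a user's consumption with a quota. The following is a minimal sketch, not any vendor's API: the function name, the thresholds (warn at 80%, block at 100%), and the quota figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QuotaStatus:
    user: str
    utilization: float  # tokens used divided by the weekly quota
    action: str         # "ok", "warn", or "block"


def check_quota(user: str, tokens_used: int, weekly_quota: int,
                warn_at: float = 0.8, block_at: float = 1.0) -> QuotaStatus:
    """Classify a user's weekly Token usage against a quota (hypothetical policy)."""
    if weekly_quota <= 0:
        raise ValueError("weekly_quota must be positive")
    utilization = tokens_used / weekly_quota
    if utilization >= block_at:
        action = "block"
    elif utilization >= warn_at:
        action = "warn"
    else:
        action = "ok"
    return QuotaStatus(user, utilization, action)


# A user at 200% of quota, as in the DevOps case, would be blocked under this policy.
status = check_quota("devops-user", tokens_used=2_000_000, weekly_quota=1_000_000)
print(status.action, f"{status.utilization:.0%}")
```

Whether the right response at 200% is a hard block, an alert, or a budget review is a policy choice; the sketch only shows where such a rule would sit.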

This is not a uniform story of hitting the brakes. Some enterprises have deeply integrated AI into product workflows, and their goal is not to use fewer Tokens but to maximize output per dollar spent. Some companies have tied employee compensation targets to AI usage, creating tension between CFOs’ cost-reduction goals and CEOs’ adoption goals. Databricks’ CEO describes this round of changes as: "This is a big speed bump, not a small one."

What is actually being compressed tends to be use cases with unclear ROI. Where software engineers ship more code, AI Agents reduce call volume, or R&D cycles get faster, companies have little reason to restrict usage. Firms are willing to tolerate high Token bills when the ROI is visible.

Model routing turns premium models from "defaults" into "luxuries"

The most important technical move in Token optimization isn't simply capping usage, but model routing: assigning different tasks to different models, only using the most expensive models for complex reasoning, critical code, and extended context analysis.
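The routing idea can be sketched in a few lines. This is an illustrative sketch only: the task categories, the tier names ("small", "mid", "premium"), and the context-length thresholds are hypothetical and do not describe any particular company's router.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    kind: str            # e.g. "chat", "summarize", "critical_code", "complex_reasoning"
    context_tokens: int  # size of the prompt plus attached context


# Hypothetical policy: only these task kinds justify the most expensive tier.
PREMIUM_KINDS = {"complex_reasoning", "critical_code"}
LONG_CONTEXT_THRESHOLD = 100_000  # hypothetical cutoff for long-context analysis
MID_CONTEXT_THRESHOLD = 8_000     # hypothetical cutoff for the mid tier


def route(task: Task) -> str:
    """Return the model tier a task should be sent to."""
    if task.kind in PREMIUM_KINDS or task.context_tokens > LONG_CONTEXT_THRESHOLD:
        return "premium"
    if task.context_tokens > MID_CONTEXT_THRESHOLD:
        return "mid"
    return "small"


print(route(Task("chat", 500)))           # small
print(route(Task("critical_code", 500)))  # premium
```

In practice a router might also weigh latency, data-residency rules, or measured quality per task type; the sketch keeps only the cost-tier decision the text describes.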

Price differences are the direct driving force behind this behavior. Take Anthropic's model pricing as an example: Haiku 4.5 output costs $5 per million Tokens, Opus 4.5-4.8 costs $25, and Fable/Mythos 5 goes as high as $50. From the low end to the high end, the output Token price differs tenfold. Gaps of that size make choosing models by task highly meaningful for cost control.
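To make the gap concrete, the per-million output prices quoted above can be applied to a fixed workload. The prices are the ones cited in the text; the monthly volume of 200 million output Tokens is a hypothetical figure chosen only for illustration, and input Token costs are ignored.

```python
# Output prices in USD per million Tokens, as quoted in the article.
OUTPUT_PRICE_PER_MILLION = {
    "Haiku 4.5": 5.0,
    "Opus 4.5-4.8": 25.0,
    "Fable/Mythos 5": 50.0,
}


def output_cost(model: str, output_tokens: int) -> float:
    """Cost in USD of generating `output_tokens` with the given model."""
    return OUTPUT_PRICE_PER_MILLION[model] * output_tokens / 1_000_000


MONTHLY_OUTPUT_TOKENS = 200_000_000  # hypothetical workload

for model in OUTPUT_PRICE_PER_MILLION:
    print(f"{model}: ${output_cost(model, MONTHLY_OUTPUT_TOKENS):,.0f}")
```

Under these assumptions the same workload costs $1,000 on the cheapest tier and $10,000 on the most expensive one, which is the tenfold spread the pricing implies.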

A more reasonable metric is "effective cost per successful outcome": a premium model that produces a high-quality result in one pass may be cheaper than a low-end model that needs multiple iterations, but this requires high-end models to keep proving their value. Teams that used to send every task to the strongest model are now asking: does this task really need the maximum context window?
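One simple way to express this metric, offered here as an assumption rather than the report's definition, is to divide the cost of one attempt by the probability that an attempt succeeds. That treats retries as independent, so the expected number of attempts is 1 divided by the success rate. All figures below are hypothetical.

```python
def effective_cost_per_success(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend per successful outcome, assuming independent retries."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


# Hypothetical figures: a premium attempt costs $0.50, a low-end attempt $0.10.
premium = effective_cost_per_success(0.50, success_rate=0.90)
cheap_hard_task = effective_cost_per_success(0.10, success_rate=0.15)
cheap_easy_task = effective_cost_per_success(0.10, success_rate=0.50)

print(premium < cheap_hard_task)   # True: the premium model wins on the hard task
print(cheap_easy_task < premium)   # True: the low-end model wins on the easy task
```

The comparison flips with the success rate, which is why the metric pushes teams toward measuring outcomes per task type rather than choosing one model for everything.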

Microsoft's recently launched MAI small language model fits this direction. MAI "Thinking" is described as a 3.5 billion parameter mid-size model, Code-1 as a low-end cutting-edge model, aiming to provide enterprises an option that's "good enough but cheaper".

Chinese open-source models enter the enterprise cost curve

Downgrading isn’t limited to a single model supplier. Enterprises are evaluating open-source models on a larger scale, especially those from China, including Alibaba Qwen, DeepSeek, MiniMax, Zhipu GLM and Kimi by Moonshot.

As noted above, a large global bank began deploying Qwen locally to manage Token spending and to balance its use of premium models like Claude. Local deployment shifts the cost structure from per-Token billing to provisioning local hardware capacity, while avoiding the compliance risks of having Chinese models hosted externally.
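The shift from per-Token billing to fixed hardware capacity implies a break-even volume. The sketch below is a rough illustration under strong assumptions: the $20,000 monthly hardware cost is hypothetical, the $25 per million Tokens is used only as an example API price, and operations, power, staffing, and any quality difference between models are ignored.

```python
def breakeven_tokens_per_month(monthly_hardware_cost: float,
                               api_price_per_million: float) -> float:
    """Monthly Token volume at which fixed local cost equals per-Token API spend."""
    if api_price_per_million <= 0:
        raise ValueError("api_price_per_million must be positive")
    return monthly_hardware_cost / api_price_per_million * 1_000_000


# Hypothetical: $20,000 per month of local capacity versus $25 per million Tokens.
volume = breakeven_tokens_per_month(20_000, 25.0)
print(f"{volume:,.0f} tokens per month")
```

Below the break-even volume the API is cheaper; above it, the fixed-cost local deployment starts to pay off, provided the local model is good enough for the tasks routed to it.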

Cloud platforms have already made the above models part of their standard menus. AWS Bedrock’s model options include MiniMax, Kimi, Qwen, DeepSeek, GLM; Microsoft offers DeepSeek via Azure AI Foundry and continually assesses different models’ performance and cost combinations under a multi-model strategy.

For Chinese model providers, this is an opportunity, though its limits are clear. Open-source models are usually free or low cost, so direct monetization is limited. A more realistic path may resemble the cooperation project between BMW and Alibaba around Qwen.

Cloud and chips face different kinds of pressure

The model layer is where cost pressure hits directly; the impact on cloud and hardware layers is more circuitous.

AWS, Azure, and Google Cloud are already multi-model platforms and do not bet solely on any one cutting-edge model company. When clients switch from expensive models to small or open-source models, growth in cloud vendors’ model API revenue may be affected, but as long as inference runs on the cloud, demand for computing power does not disappear. The more enterprises focus on cost management, the more likely they are to consolidate model selection, deployment, security, and billing onto cloud platforms.

The impact on GPU cloud and AI infrastructure pricing power is a variable to watch: If model companies lower per-Token prices because clients are price-sensitive, can cloud computing still raise prices? Investors are already discussing this, but currently, supply is tight and AI penetration is still in early days; demand for training and inference hasn't been interrupted by optimization behavior.

The outlook for the hardware layer is generally optimistic. New generations of computing power such as GB200/GB300 are starting to scale, and models trained and served on these chips will likely bring better Token economics. Multimodal data flows such as audio, video, and physical AI continue to expand the boundaries of computing demand.

Software companies: Budget pressure and "optimizer" opportunities coexist

As AI Token spending rises, enterprise budgets cannot expand indefinitely. The funding sources observable so far include slower hiring, reduced spending on external IT services, and slower growth in SaaS and application software budgets.

Uber is a representative example: AI adoption continues, but Token costs are offset by slower internal staff growth. This framework is also used to interpret the weak performance of IT service companies and some SaaS companies.

Large seat-based SaaS companies are in an especially complex situation. Salesforce, ServiceNow, Workday, and others face client budget reshuffles on one hand, and on the other are pushing to shift from seat-based billing to a "seat plus usage" model. However, when clients have just been shocked by their AI bills, their willingness to accept another usage-based billing model is clearly reduced.

But software companies have a counter-strategy. About a month ago, Palantir commercialized AIP Evolve, helping clients choose the best model for the task, tune prompts, and improve data calls. In one disclosed case, Evolve recommended a model swap that reduced Token costs by 97%, achieving a 90% adoption rate in the first three weeks after launch.

The structural advantage of software companies is that they are not bound to a single model: they can position themselves as model-neutral scheduling platforms, managing cost and performance across Claude, Qwen, Llama, and various small models, much like multi-cloud database companies.

The logic of AI growth hasn’t changed; the fight over its slope is just beginning

The hardest variable to quantify now is how much Token growth actually gets compressed. Many enterprises haven’t fully figured out where their Token spend goes, and reliable industry-wide data is even rarer.

A conservative scenario: if a company’s original AI Token spend was 100 and was expected to grow to 150 within several months, after optimization it may land at 120–130 rather than fall back to 80. In other words, growth slows, but demand does not reverse.
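The arithmetic behind this scenario is straightforward. The sketch below uses only the index values from the text (100, 150, 120–130, 80); the function name is illustrative.

```python
def growth_rate(start: float, end: float) -> float:
    """Fractional change from start to end."""
    return (end - start) / start


BASE = 100  # original Token spend, as an index

scenarios = {
    "unoptimized": 150,
    "optimized (low)": 120,
    "optimized (high)": 130,
    "actual reversal": 80,
}

for name, level in scenarios.items():
    print(f"{name}: {growth_rate(BASE, level):+.0%}")
```

Optimization in this scenario trims growth from 50% to 20–30%; only the last case, which the scenario rules out, would represent a decline in demand.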

The latest survey by UBS’s Evidence Lab of about 130 companies shows that only 8% have deployed AI Agents at scale in production, 37% use them in limited production, 29% are still piloting, and 26% use Copilot or AI Coding products but haven’t deployed Agent apps. The stage where AI Agents really drive large-scale Token consumption is just beginning.

Data from leading AI-native companies confirms this. Legal AI company Harvey revealed its Token consumption grew from 1 trillion in January to 12–13 trillion in May, showing optimization and expansion can happen together: enterprises will allocate spend more precisely, but AI usage scenarios continue to expand outward.
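For a sense of scale, the Harvey figures can be converted into an implied monthly growth rate. This is a back-of-the-envelope sketch that assumes steady compounding over the four monthly intervals from January to May; the company did not describe its growth path, so the smooth curve is an assumption.

```python
def implied_monthly_growth(start: float, end: float, months: int) -> float:
    """Constant monthly growth rate that takes `start` to `end` in `months` steps."""
    if start <= 0 or months <= 0:
        raise ValueError("start and months must be positive")
    return (end / start) ** (1 / months) - 1


# Token consumption in trillions: 1 in January, 12 to 13 in May (four intervals).
low = implied_monthly_growth(1.0, 12.0, 4)
high = implied_monthly_growth(1.0, 13.0, 4)
print(f"{low:.0%} to {high:.0%} per month")
```

Under that assumption, consumption grew by roughly 86% to 90% per month, close to doubling each month.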

This round of Token optimization differs fundamentally from the post-pandemic "budget contraction" in cloud and software during 2022–2024: that episode cut back mature usage, while this one looks more like cost management during the early diffusion of a new technology. The result is not the disappearance of AI demand but a reshuffling of winners: revenue growth for high-priced models gets squeezed, low-cost models and routing tools benefit, cloud platforms continue to capture multi-model deployment demand, and software companies stand at the crossroads between being a target of budget cuts and becoming a savings tool.
