Goldman Sachs In-Depth Report: The Coming Turning Point—Decoding the AI Agent Economy

Agentic AI is shifting the artificial intelligence industry from a cost-driven narrative to a profit-driven one. According to Goldman Sachs, because token consumption is poised for exponential growth while the underlying computing costs are falling faster than token prices, the profit inflection point for hyperscale cloud providers and large model providers may arrive within the next 3 to 12 months.

According to Windy Trading Desk, Goldman Sachs stated in a report released on May 5 that it expects consumer and enterprise AI agents combined to drive global token consumption to grow 24-fold from 2026 levels by 2030, reaching approximately 1.2 quadrillion tokens per month; if enterprise agent adoption peaks by 2040, the multiple will expand further to 55 times.
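The headline figures imply a 2026 baseline that the article does not state directly. The back-of-envelope sketch below recovers it, assuming "24 times" means the 2030 level equals 24 times the 2026 level. The variable names are illustrative; only the input numbers come from the report as quoted here.

```python
# Back-of-envelope check of the token-volume multiples quoted above.
# Assumption: "N times" means the later level equals N x the 2026 level.

TOKENS_2030_PER_MONTH = 1.2e15  # ~1.2 quadrillion tokens/month in 2030
MULTIPLE_2030 = 24              # consumer + enterprise agents combined, vs 2026
MULTIPLE_2040 = 55              # if enterprise adoption peaks by 2040

# Implied 2026 baseline: 1.2e15 / 24 = 5e13, i.e. ~50 trillion tokens/month.
baseline_2026 = TOKENS_2030_PER_MONTH / MULTIPLE_2030

# Implied 2040 level at the 55x multiple.
tokens_2040 = baseline_2026 * MULTIPLE_2040

# Consumer agents alone are said to add ~600 trillion tokens/month;
# express that as a multiple of the implied baseline.
consumer_multiple = 600e12 / baseline_2026

print(f"Implied 2026 baseline: {baseline_2026 / 1e12:.0f} trillion tokens/month")
print(f"Implied 2040 level:    {tokens_2040 / 1e15:.2f} quadrillion tokens/month")
print(f"Consumer add-on vs baseline: {consumer_multiple:.0f}x")
```

Under this reading, the baseline works out to about 50 trillion tokens per month, and the roughly 600 trillion tokens attributed to consumer agents is about 12 times that baseline, consistent with the 12-fold consumer estimate in the report.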

Meanwhile, Goldman Sachs' projected price and cost curves show that mainstream large model token pricing has stabilized after previously falling about 40% a year, with a slight rebound in some cases, while per-token computing costs on chips such as NVIDIA, AMD, Google TPU, and Trainium continue to fall by 60% to 70% a year. The widening gap between the two curves is opening up margin headroom for the industry, and improved margins may put large-scale AI infrastructure capital expenditure on a more sustainable economic footing.

Token Economics Inflection Point: Costs Drop Faster Than Prices, Room for Profit Opens Up

The central argument of Goldman Sachs' report is that the AI industry is moving from an era of "uncertain inference economics, possibly diluting profits" to a new stage where "incremental tokens capture attractive marginal profits."

In the first stage of the AI cycle, investors generally viewed compute and tokens as cost drivers: more usage meant more inference load, more accelerators, more electricity, and higher capital expenditure. But Goldman Sachs' projected price and cost curves indicate that this logic is changing.

After falling sharply, mainstream large model token prices have now stabilized and in some cases even rebounded; meanwhile, the all-in per-token cost on chips from NVIDIA, Google TPU (Broadcom), AMD, and Trainium (Marvell) continues to fall rapidly and steadily. If token prices hold above token costs, broader adoption of Agentic AI will expand profits, not just revenue.
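To see why the gap between the two curves matters, consider per-token gross margin, 1 - cost/price. The sketch below assumes a hypothetical starting point (price normalized to 1.0, cost 0.8, i.e. a 20% margin), which the report does not provide. It applies the annual decline rates quoted above: costs down 65% a year (the midpoint of 60% to 70%), with prices either flat or still falling 40% a year.

```python
# Illustrative per-token gross margin when cost falls faster than price.
# Assumption (hypothetical, not from the report): starting price = 1.0 and
# starting cost = 0.8, i.e. a 20% starting gross margin per token.

def margin_path(cost_ratio0: float, cost_decline: float,
                price_decline: float, years: int) -> list[float]:
    """Per-token gross margin (1 - cost/price) for each year 0..years.

    cost_decline and price_decline are annual fractional declines,
    e.g. 0.65 means the value falls by 65% per year.
    """
    margins = []
    for t in range(years + 1):
        price = (1 - price_decline) ** t
        cost = cost_ratio0 * (1 - cost_decline) ** t
        margins.append(1 - cost / price)
    return margins


# Scenario A: prices flat (the "stabilized" case), costs down 65%/yr.
flat = margin_path(0.8, cost_decline=0.65, price_decline=0.0, years=3)

# Scenario B: prices still falling 40%/yr, costs down 65%/yr.
falling = margin_path(0.8, cost_decline=0.65, price_decline=0.40, years=3)

for t, (a, b) in enumerate(zip(flat, falling)):
    print(f"year {t}: flat price {a:.1%}, price -40%/yr {b:.1%}")
```

In both scenarios the margin widens every year, because the cost-to-price ratio shrinks by a factor of (1 - cost decline) / (1 - price decline) annually. The flat-price case widens fastest. Real margins would also depend on utilization, energy, and depreciation, which this toy model ignores.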

Goldman Sachs further points out that Agentic AI may create a self-reinforcing economic flywheel: lower per-token computing costs enable richer, more complex agents; richer agents consume more tokens through longer contexts, more loops, more verification, and continuous monitoring; and higher utilization improves AI infrastructure economics, which in turn lets providers keep investing in model quality and distribution. Goldman Sachs believes this flywheel runs counter to the mainstream narrative that "AI usage will bring unsustainable cost burdens."

However, Goldman Sachs also highlights risks: not all AI workloads are guaranteed to achieve a positive profit inflection point. For highly commoditized pure text chatbots, competition may still drive token prices down faster than computing costs.

Consumer Agents: From Fragmented Conversations to "Always-On" Assistants, Token Consumption Could Rise 12-Fold

Goldman Sachs estimates that by 2030, consumer-side AI agents could increase global token consumption 12-fold, adding about 600 trillion tokens per month.

The report divides consumer agents into two categories: "on-demand" agents (such as OpenAI Operator, Claude Code, and browser-based agents), which autonomously plan, execute, and return results after the user initiates a task; and "always-on" agents, such as persistent background email monitors, schedule managers, or digital life assistants. Goldman Sachs believes the biggest surge in token consumption will come as agents shift from user-initiated tasks to persistent background operation, continuously monitoring context and acting proactively when needed.

According to the report's simulations, a typical LLM chatbot consumes about 1,000 tokens per conversation, an embedded copilot over 5,000 tokens per day, and an always-on agent over 100,000 tokens per day.
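Putting these intensities on a common footing shows why the move to always-on agents matters so much. The sketch below uses the quoted figures plus two illustrative assumptions that are not from the report: a 30-day month, and a hypothetical scenario in which all of the roughly 600 trillion additional consumer tokens per month come from always-on agents.

```python
# Put the per-user token intensities quoted above on a common footing.
# Token figures come from the article; the 30-day month and the
# "all always-on" scenario are illustrative assumptions.

CHATBOT_TOKENS_PER_CONVERSATION = 1_000
COPILOT_TOKENS_PER_DAY = 5_000
ALWAYS_ON_TOKENS_PER_DAY = 100_000
DAYS_PER_MONTH = 30  # assumption

# One always-on agent-day is worth this many chatbot conversations...
conversations_equiv = ALWAYS_ON_TOKENS_PER_DAY // CHATBOT_TOKENS_PER_CONVERSATION
# ...and this many copilot-days.
copilot_days_equiv = ALWAYS_ON_TOKENS_PER_DAY // COPILOT_TOKENS_PER_DAY

# Monthly tokens for a single always-on agent.
always_on_per_month = ALWAYS_ON_TOKENS_PER_DAY * DAYS_PER_MONTH

# Hypothetical: if the ~600 trillion tokens/month added by consumer agents
# came entirely from always-on agents at this intensity, how many would run?
CONSUMER_ADD_PER_MONTH = 600e12
agents_needed = CONSUMER_ADD_PER_MONTH / always_on_per_month

print(conversations_equiv, copilot_days_equiv, always_on_per_month)
print(f"~{agents_needed / 1e6:.0f} million always-on agents")
```

One always-on agent-day equals about 100 chatbot conversations or 20 copilot-days. At that intensity, roughly 200 million always-on agents would account for the full 600 trillion tokens a month. This is a scale check only, not the report's actual usage mix.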

Goldman Sachs expects daily AI queries to rise from about 5 billion in 2025 to about 23 billion by 2030, with up to 30% flowing to agents in areas such as search, shopping, travel, email, and personal productivity. Meanwhile, traditional search engines' share of queries is expected to fall from 68% in 2025 to 36% in 2030, while LLM-native apps' share rises from 12% to 31%.
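The query forecasts imply a growth rate that the article leaves unstated. The sketch below computes the compound annual growth rate (CAGR) from 2025 to 2030 and the agent-routed query volume at the "up to 30%" upper bound. All inputs are the figures quoted above.

```python
# Implied growth behind the query forecasts above (2025 -> 2030).
# Inputs are the article's figures; CAGR is the standard compound
# annual growth rate: (end / start) ** (1 / years) - 1.

QUERIES_2025 = 5e9       # daily AI queries, 2025
QUERIES_2030 = 23e9      # daily AI queries, 2030
YEARS = 2030 - 2025
AGENT_SHARE_2030 = 0.30  # "up to 30%" flowing to agents (upper bound)

cagr = (QUERIES_2030 / QUERIES_2025) ** (1 / YEARS) - 1
agent_queries_2030 = QUERIES_2030 * AGENT_SHARE_2030

# Share shifts between query channels, in percentage points.
search_change = 36 - 68    # traditional search engines
llm_app_change = 31 - 12   # LLM-native apps

print(f"Daily AI query CAGR: {cagr:.1%}")
print(f"Agent-routed queries (upper bound): {agent_queries_2030 / 1e9:.1f} billion/day")
print(f"Search share: {search_change:+d} pp, LLM-native apps: {llm_app_change:+d} pp")
```

The forecast works out to roughly 36% annual growth in daily AI queries, with up to about 6.9 billion agent-routed queries a day by 2030.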

Enterprise Agents: Workflow Complexity Drives Token Intensity, Consumption Could Grow 55-Fold by 2040

Goldman Sachs expects enterprise AI agents to be the biggest token multiplier, driving global token consumption up 24 times by 2030, and further up to 55 times at peak adoption in 2040, with enterprise workloads then accounting for over 70% of global token usage.

Enterprise agents are so much more token-intensive than consumer ones because their workflows require agents to perform more complex, precise operations—monitoring tasks, retrieving context, reasoning about anomalies, validating outputs, updating systems, and reporting issues continuously throughout the workday. Enterprise agents also tend to involve heavier multimodal inputs (voice, images, documents, screen activity, app data, logs, and structured system records), significantly boosting token intensity.

Goldman Sachs quantified token consumption for simulated agents in different professions.

The results show that a programming agent consumes about 7 million tokens per day at an API cost of around $13 per day, far below the cost of human labor, which helps explain why software development is seeing the fastest agent adoption. A call center agent consumes about 2 million tokens per day, but if it relies on real-time voice processing, costs can reach $92 per day, so fully automated voice service is not yet economically competitive. A data entry agent consumes about 25 million tokens per day at around $60 per day, still below human costs.
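Dividing each agent's daily cost by its daily token volume gives an implied blended price per million tokens. The daily figures come from the report as quoted above. The division is our own back-of-envelope step, and the dictionary keys are illustrative labels.

```python
# Implied blended API price per million tokens for the simulated agents above.
# Daily token and cost figures come from the article; dividing one by the
# other is a back-of-envelope step, not a figure from the report.

agents = {
    # name: (tokens per day, API cost in USD per day)
    "programming": (7_000_000, 13.0),
    "call center (real-time voice)": (2_000_000, 92.0),
    "data entry": (25_000_000, 60.0),
}

def cost_per_million(tokens_per_day: int, usd_per_day: float) -> float:
    """USD per one million tokens, given daily volume and daily cost."""
    return usd_per_day / (tokens_per_day / 1_000_000)

for name, (tokens, usd) in agents.items():
    print(f"{name:>30}: ${cost_per_million(tokens, usd):.2f} per million tokens")
```

The text-heavy programming and data entry agents come out at roughly $1.86 and $2.40 per million tokens. The voice-based call center agent comes out at about $46, some 20 to 25 times higher. This illustrates why the report treats modality mix as a key adoption variable.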

Goldman Sachs notes that the adoption speed for enterprise agents will depend on four variables: token volume, API cost, modality mix, and implementation complexity. Text-driven workflows with mature tool ecosystems will scale up first; voice-driven or deeply integrated backend workflows may progress more slowly.

On adoption, Goldman Sachs expects enterprise Agentic AI to follow an S-shaped curve, peaking at around 35% to 40% of knowledge workers in about 15 years, faster than the 29-year historical median for technology diffusion.
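An S-shaped adoption path is commonly modeled with a logistic function. The report's actual curve is not given here, so the sketch below is purely illustrative. It assumes a ceiling of 37.5% (the midpoint of 35% to 40%), a symmetric curve centered at year 7.5, and "reaching peak" read as hitting 95% of the ceiling at year 15. The names `CEILING`, `MIDPOINT`, `K`, and `adoption` are hypothetical.

```python
import math

# Logistic (S-curve) sketch of enterprise agent adoption.
# Assumptions (hypothetical; not the report's model):
#   - ceiling of 37.5% of knowledge workers (midpoint of 35-40%)
#   - symmetric logistic whose midpoint falls at year 7.5
#   - "reaching peak" is read as hitting 95% of the ceiling at year 15

CEILING = 0.375
PEAK_YEAR = 15.0
MIDPOINT = PEAK_YEAR / 2
# Solve 1 / (1 + exp(-k * (PEAK_YEAR - MIDPOINT))) = 0.95 for the steepness k.
K = math.log(0.95 / 0.05) / (PEAK_YEAR - MIDPOINT)

def adoption(year: float) -> float:
    """Share of knowledge workers using agents `year` years after launch."""
    return CEILING / (1 + math.exp(-K * (year - MIDPOINT)))

for year in (0, 5, 7.5, 10, 15, 29):
    print(f"year {year:>4}: {adoption(year):.1%}")
```

Under these assumptions, adoption passes half its ceiling (about 18.8% of knowledge workers) at year 7.5 and reaches about 35.6% by year 15. At the 29-year historical median it sits just below the ceiling. A steeper `K` or an earlier midpoint would compress the timeline further.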

Capex Sustainability: Improved Profits Give Hyperscale Cloud Providers More Room

One key investment conclusion of the Goldman Sachs report is that improved profit margins for hyperscale cloud providers will make today's high infrastructure investment more sustainable, easing the market's core concern over returns on AI capital expenditure.

The report notes that operators remain supply-constrained in meeting current and future computing demand; both Google and Meta have raised their FY2026 capex forecasts, and Amazon's management reiterated its commitment to sustained high capex after Q1 earnings. Goldman Sachs expects that as the profit inflection point nears, investors will increasingly look for evidence of return visibility.

On individual stocks, Goldman Sachs' main thesis for Amazon rests on the re-acceleration of AWS revenue growth (up 28% YoY in Q1) and a $364 billion backlog; for Google, on cloud business growth of 63% YoY in Q1 and a backlog that nearly doubled QoQ to $460 billion; and for Meta, on its ad business significantly outperforming the broader digital ad industry and on AI compute's ongoing contribution to user engagement and ad monetization.

In software, Goldman Sachs argues that lower token costs make it easier for vendors to embed agents into existing products without significantly hurting gross margins, and support pricing based on outcomes, productivity, or units of work rather than seat licenses alone, expanding the addressable market. For IT services companies, as agents shift AI consumption from standalone tools to enterprise-grade, highly integrated workflow transformation, demand for integration, governance, and managed orchestration will rise sharply, with Accenture seen as a main beneficiary of this trend.

~~~~~~~~~~~~~~~~~~~~~~~~

The above content comes from Windy Trading Desk.


Risk Warning and Disclaimer: The market involves risks, and investment requires caution. This article does not constitute personal investment advice, nor does it take into account individual users' specific investment objectives, financial situations, or needs. Users should consider whether any opinions, views, or conclusions herein suit their particular circumstances. Any investment made on this basis is at your own risk.