Wall Street debates “Tokenomics”: Has the AI bill spiraled out of control?

A Token index chart has ignited anxiety over AI growth and runaway bills.

The chart shows the LLM Token expenditure index compiled by Silicon Data. As of June 11, it had fallen for seven consecutive days, its longest losing streak since January; it declined on 11 of the past 12 days.

This index measures the average price paid in the market per 1 million Tokens used, essentially a barometer of how much the market is willing to pay for AI. It more than doubled after December of last year and kept climbing through May 2026, but has recently plummeted.

What is at stake is not one small indicator but the entire AI trade: if companies start controlling Token bills, will capital expenditure expectations for GPUs, DRAM, data centers, and cloud vendors be repriced?

On June 9, US macro strategist Andreas Steno Larsen called this chart “the most important chart to watch in the whole market right now,” warning that if Token pricing keeps weakening, the trade cycle running from memory to broader hardware and data centers may come to an end.

This statement struck the most sensitive nerve of investors. But Wall Street’s views are more complex and multifaceted; the weakening of Token pricing may not simply mean AI demand has peaked.

Is the index decline a sign of peak demand? What does this chart actually show?

This chart cannot simply be interpreted as “no one is using AI.”

It is neither an index of total Token demand nor a measure of total Token spending. It measures the weighted average price per million Tokens; in other words, it reflects the price tier of the models users are actually running.

A simple calculation illustrates this: suppose a frontier model costs $10 per million Tokens and a cheap model costs $1. In one month, all 100 units of usage run on the frontier model, so the index is 10. The next month, demand doubles to 200 units, but all of the new usage consists of batch tasks on the cheap model; the index drops straight to (100 × $10 + 100 × $1) / 200 = 5.5.

Demand doubles, yet the index falls by almost half.
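The arithmetic in this example can be reproduced with a short Python sketch. The function name `weighted_avg_price` and the two-model market are illustrative assumptions taken from the hypothetical above; this is not Silicon Data's actual methodology.

```python
def weighted_avg_price(usage_by_price):
    """Volume-weighted average price per million Tokens.

    usage_by_price maps a price (USD per million Tokens) to the
    number of usage units billed at that price.
    """
    total_units = sum(usage_by_price.values())
    total_spend = sum(price * units for price, units in usage_by_price.items())
    return total_spend / total_units


def total_spend(usage_by_price):
    """Total spend across all price tiers."""
    return sum(price * units for price, units in usage_by_price.items())


# Month 1: all 100 units run on the $10 frontier model.
month1 = {10.0: 100}
# Month 2: demand doubles; the 100 new units run on the $1 cheap model.
month2 = {10.0: 100, 1.0: 100}

index1 = weighted_avg_price(month1)
index2 = weighted_avg_price(month2)
spend1 = total_spend(month1)
spend2 = total_spend(month2)

print(index1, index2)  # 10.0 5.5
print(spend1, spend2)  # 1000.0 1100.0
```

In this toy case the index falls 45% while total spend still rises 10%, which is why the index alone cannot separate shrinking demand from a shift in usage mix.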

This means the index drop has two completely different interpretations: first, demand is actually shrinking; second, usage is exploding, but users are actively moving to lower-priced models.

This is the core of current debate: Does an index drop mean peak demand, or a shift in usage structure?

Citadel Securities’ “Tokenomics” report suggests that the core constraint for AI adoption has shifted from “model capability” to “cost and scarce computing power,” and users are accelerating their shift to cheaper models.

“Adoption trends are increasingly less dependent on frontier model functionalities and more on price... The recent drop in the Token index may partially reflect this shift to cheaper models.”

J.P. Morgan TMT analyst Mark Schilsky summed up recent market discussion on June 11 as: “AI bills are out of control.” At the same time, the bank believes current Token spending disorder is merely “the smallest speed bump on the way to higher expenditures.”

Citadel’s judgment: AI’s “cost-effectiveness and scarcity” now matter more

In its newly released “Tokenomics” report, Citadel Securities provides a clear directional judgment.

Its central argument: the binding constraint on AI adoption has shifted from “model capability” to “cost and scarcity.”

Citadel says: “The core of technology landing is no longer what frontier models can theoretically do, but the price and scarcity of inputs required for scale AI operation. Computing power, electricity, cooling, memory bandwidth, and inference budgets are all real and binding limits.”

The report cites basic economic principles: price has three functions—transmitting scarcity signals, creating substitution incentives, and allocating resources to the highest value use. All three are happening in AI at the same time.

The conclusion: The best future returns will not come from companies building the strongest models, but from companies that lower AI costs and improve efficiency.

Frontier inference-intensive AI won’t disappear but will increasingly concentrate in the hands of a few large enterprises able to bear the costs. For the broader economy, until physical constraints are eased, simpler models may be the more productive path.

Low-priced models are changing bill structures

Rich Privorotsky, head of Goldman Sachs' Delta-One division, noted that DeepSeek cut its pricing by 75% and Xiaomi’s MiMo cut prices by nearly 99%; the easing of infrastructure bottlenecks is triggering a price war.

Coinbase CEO Brian Armstrong predicts that in the next 12 to 18 months, 80% of AI workloads will migrate to models that cost 99% less, and only the 20% of tasks that need the highest level of intelligence will stay with frontier models.

Hugging Face CEO Clement Delangue cited Stanford University data showing that local models now have an accuracy rate of 71.3% for real-world queries, with extremely low costs.

This closely aligns with Citadel's judgment: frontier AI won't disappear but will likely concentrate in a few companies who can bear computing costs, have research depth, and can turn difficult problems into scalable profit.

For the broader economy, before physical constraints are eased, simpler models may be cheaper productivity tools.

In other words, AI usage may become layered.

High-value, complex tasks continue to use frontier models. Routine tasks, batch jobs, and low-return experiments switch to cheap or local models.

J.P. Morgan: Optimizing bills doesn't mean AI demand has peaked

J.P. Morgan’s judgment is: the current bill anxiety may only be a small speed bump in the early phase of AI demand; in a year, Token expenditure may be significantly higher.

If the average cost per million Tokens falls while the share of US companies paying for AI continues to rise, total Token usage must increase significantly as a matter of arithmetic. In other words, a falling unit price and explosive overall usage can happen at the same time.
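One way to make this arithmetic concrete is the identity total spend = average unit price × usage. The sketch below assumes that identity holds; the function name `implied_usage_growth` and the input figures are hypothetical and are not numbers from J.P. Morgan.

```python
def implied_usage_growth(price_change, spend_change):
    """Usage growth implied by the identity spend = price * usage.

    Both arguments are fractional changes, e.g. -0.50 for a 50% fall.
    """
    if price_change <= -1.0:
        raise ValueError("price_change must be greater than -100%")
    return (1.0 + spend_change) / (1.0 + price_change) - 1.0


# Hypothetical inputs, not figures from the article:
# unit price falls 50% while total Token spend rises 50%.
growth = implied_usage_growth(price_change=-0.50, spend_change=0.50)
print(f"Implied usage growth: {growth:.0%}")  # Implied usage growth: 200%

# Even with flat spend, a 50% price cut implies usage doubled.
flat_spend_growth = implied_usage_growth(price_change=-0.50, spend_change=0.0)
```

Under these assumed inputs, usage would have to triple for spend to rise by half, which is the sense in which a lower unit price and much higher usage are compatible.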

The picture inside companies is similar. Those already using AI heavily will optimize Token budgets to reduce waste; those not yet fully on board may begin to use AI because models are cheaper and easier to deploy.

AI agents will further amplify Token consumption. Where a task used to require a single call, it may now involve multi-step execution, repeated planning, tool calls, and context reading, significantly increasing Token consumption per task. Cases cited show that for some SMEs, Token consumption per task tripled after agentification.

Thus, the key market debate isn’t “whether Token usage will keep growing,” but “whether the unit economics of this growth are healthy.”

Right now, businesses are starting to manage their bills

The first issue exposed on the enterprise side isn’t that no one’s using AI, but that it’s being used too casually.

Axios cited an AI consultant who said a corporate client recently spent $500 million in a single month on Claude, simply because employee usage limits had not been set.

Treating AI usage as an internal KPI is also starting to produce side effects.

Previously, some US companies used AI usage as ranking or assessment criteria, resulting in “Tokenmaxxing”: employees artificially inflated their usage, having AI do low-value tasks.

Amazon’s developer platform Kiro once had an internal leaderboard, “Kirorank”. Amazon SVP Dave Treadwell admitted employees ran pointless tasks with AI to boost rankings, driving up costs. He then instructed employees “not to use AI just for the sake of using AI,” and closed the internal test dashboard.

Amazon switched to “normalized deployment” metrics, tracking the actual value of AI-generated code instead of simply tracking Token consumption. Similar cases reportedly occurred at Meta, with people inflating Token consumption for ranking advantages.

The meaning of these adjustments is clear: Companies aren't stopping AI use, but are starting to distinguish “effective Tokens” from “ineffective Tokens.”

J.P. Morgan mentioned that Cloudflare launched products like AI Gateway to help companies control Token budgets. Tools like OpenRouter have existed for a long time and essentially handle routing and cost management across models.

Pricing is also changing.

On June 1, GitHub Copilot officially switched from per-request pricing to per-Token usage billing. Some Reddit users said they expect their monthly costs to jump from under $45 to more than $847.

GitHub Chief Product Officer Mario Rodriguez previously said the old pricing model was unsustainable as agentic AI rises.

Gartner analyst Arun Chandrasekaran told Business Insider that as advanced inference models drive up computing consumption, more companies will shift to usage-based pricing.

This means costs previously hidden with fixed subscriptions or subsidies are now surfacing in corporate finance.

Long-short divergence: Does hardware trading logic still hold?

The ultimate question in this debate is whether the AI infrastructure investment logic still holds.

The bulls cite: Goldman Sachs’ Jim Schneider calculates that by 2030, agent AI will drive Token consumption up 24-fold, and cloud provider gross margins will turn positive in the short term. Mark Schilsky also believes the short-term chaos in Token spending won't change the long-term trend.

The bears cite: Goldman Sachs semiconductor analyst Jim Covello says the industry's current prosperity comes at the cost of upstream consumption, with nearly all value flowing to semiconductor companies, which is unsustainable. Investor Tommy Shaughnessy warns that major AI companies have deeply negative profit margins; once companies face real per-usage prices, the capital flows supporting GPU procurement and model training may reverse.

In recent interviews, Anthropic CEO Dario Amodei, Broadcom President Hock Tan, and Steve Eisman, the investor portrayed in “The Big Short,” all raised similar issues: companies are currently overspending on AI tools, a pattern described as “Token maximization.” As cost awareness builds and pricing shifts to per-Token billing, future investments will need a sharper focus on ROI.

On social media, some directly question the framing of this chart. Some point out that “six consecutive declines happened four times last year during the peak adoption period,” and that “cheaper models may actually accelerate deployment since cost barriers are lower.”

Others rebut: “No matter how cheap, every new model launch will exponentially consume more Tokens.”

For now, the debate remains unresolved. But one thing is certain: marginal changes in Token expenditure, transmitted through demand for GPU compute, DRAM, and data centers, directly affect capital expenditure expectations for Nvidia, memory chipmakers, and cloud service providers. Investors must keep watching this chart.
