Is computing power no longer scarce? AI server rental costs fall as a Japanese lab launches a frontier "orchestration" model

```

The AI infrastructure market is sending two new signals at once: server rental costs continue to decline, and growth in token spending on compute is leveling off. Together they raise a core question for the AI sector: are falling costs stimulating more demand, or eroding existing pricing power?

Japanese AI lab Sakana has launched Fugu, an "orchestration" framework that coordinates multiple frontier models instead of relying on brute-force scaling, and that surpasses Claude and GPT-5.5 on mainstream benchmarks.

Sakana Fugu scored 73.7 on SWE-Bench Pro, ahead of Claude Opus 4.8 at 69.2 and GPT-5.5 at 58.6. Fugu is not a single large model but a dynamic scheduler: given a single API call, it decides how to route different parts of a task to the most suitable frontier models for parallel processing, then integrates their outputs into a result better than any single model produces.

Rich Privorotsky, head of Delta-One at Goldman Sachs, treats compute rental prices as a key indicator of whether the AI hardware investment thesis holds. The market's core premise is that compute is scarce; if rising supply keeps pushing rental prices down, that premise is directly challenged. Server rental costs are currently on a clear downward trend.

However, Privorotsky also points out that until the structure of token spending changes fundamentally, the hardware trade is likely to persist. Last week, semiconductor ETFs saw unusually large inflows, confirming the market's current preference.

He said pricing moves by hyperscale cloud providers will increasingly become a key focus, and that this week the market will pay more attention to Micron's financial results than to PCE data.

Fugu: Orchestration-driven cutting-edge performance

Sakana is an AI lab headquartered in Tokyo, one of whose co-founders co-authored the original Transformer paper.

Sakana's Fugu framework challenges the industry's prevailing "brute force scaling" logic. Acting as a coordinator, Fugu receives a single API request and dynamically decides how to allocate different parts of the task to different state-of-the-art models for parallel processing, ultimately delivering a result better than any single model's independent answer.

Rich Privorotsky summarizes the method as achieving frontier-level performance through "model orchestration and fusion" rather than brute-force scaling. If the approach is validated more widely, it will challenge the underlying logic of the compute arms race, because its performance gains do not depend on more training compute.

As export control risks rise, Fugu's model pool supports dynamic replacement, which adds strategic resilience: if a supplier is restricted by export controls, the framework can automatically switch to an alternative model, reducing single-point supply chain risk. This gives the architecture a distinct advantage as geopolitical rivalry intensifies.

Open source catch-up accelerates, Token cost compression deepens

GLM-5.2, released by Zhipu, scored 74.4 on the FrontierSWE long-form programming benchmark, 0.7 points behind Anthropic's top model Opus 4.8 (75.1) and ahead of GPT-5.5 (72.6). That makes it the highest-scoring open-weight model to date, at a price about 72%-82% lower than Opus 4.8.

Privorotsky points out that the gap between open-source and closed-source models keeps narrowing. GLM-5.2 is released under the MIT license with open weights, which permits distillation, quantization, and reproduction. He sees it as a major leap in capability and a clear signal that the gap is closing quickly.

As more capable open-source models keep emerging, the compression of token costs is accelerating. However, Privorotsky emphasizes that current incentives still point toward more capital expenditure, not less.

Server rental prices: The barometer for compute scarcity narrative

In Privorotsky's analytical framework, compute rental prices are the key variable for judging whether the AI hardware investment thesis holds.

The logic chain is straightforward: if compute remains scarce, prices should stay firm, which justifies continued capital expenditure; conversely, if growing supply causes sustained price declines, the scarcity narrative is directly challenged, and "hardware is first to bear the pressure."

Server rental costs are currently trending downward. The emergence of orchestration frameworks such as Sakana Fugu may point in the same direction: they improve performance by scheduling existing resources more efficiently rather than by simply adding compute, which is the technical counterpart of expanding supply.

Privorotsky says market attention will increasingly turn to the pricing moves of hyperscale cloud providers. Once they signal a strategic shift, the logic underpinning the AI investment cycle will be reassessed.

Risk Warning and Disclaimer: Markets carry risk, and investing requires caution. This article does not constitute personal investment advice and does not take into account any individual user's specific investment goals, financial situation, or needs. Users should consider whether the opinions, views, or conclusions in this article suit their own circumstances. Any investment made on this basis is at the investor's own risk. ```