Citrini: AMD and Apple both betting on flash memory to replace DRAM; per-GB memory costs could fall to as little as 1/55

```

AI inference's reliance on expensive DRAM is loosening. AMD has announced the acquisition of memory optimization company MEXT, bringing AI-driven flash memory optimization technology into the data center. The deal signals a structural migration in AI storage architecture, and the core driver is cost: per bit, flash memory costs only about 1/55 as much as DRAM.

On Monday, AMD announced the completion of its acquisition of MEXT for an undisclosed amount. MEXT has developed AI-driven predictive memory technology aimed at making flash memory behave more like DRAM, expanding available memory capacity while maintaining performance and efficiency. AMD stated that the acquisition will expand its AI product portfolio, helping data center customers to improve performance, reduce total cost of ownership, and accelerate workload deployment. "Demand for memory is growing across every segment of enterprise computing," AMD said in a statement.

The acquisition news sent AMD shares up 7.7% to $550.75 on Monday, bringing the company's market value close to $900 billion; the S&P 500 rose 1.8% the same day. AMD stock has gained 323% so far this year. Last Friday, Citi upgraded AMD from neutral to buy and raised its price target from $460 to $575.

Notably, Apple began promoting its on-device "LLM in a Flash" approach as early as 2024. The backdrop to this strategy is an intensifying DRAM supply crisis. According to TrendForce data, high-bandwidth memory (HBM) now accounts for about a quarter of all DRAM wafer capacity, and DRAM contract prices jumped about 90% quarter-on-quarter in Q1 2026. Citrini Research points out that AI storage demand is now large enough to require multiple architectural tiers: flash does not replace HBM but absorbs the overflow demand for capacity. This architectural rebuild is repricing the entire AI storage supply chain.

Memory Tax Crisis: Bottleneck Spreads from AI to the Entire Economy

Earlier this month, Morgan Stanley analysts led by Shawn Kim wrote that soaring memory prices and supply shortages are turning into a broad risk for the digital economy, "spreading from AI infrastructure bottlenecks to hardware profit margins, device affordability, cloud costs, inflation, and even policy levels." There is already concrete evidence of the pressure: Xbox CEO Asha Sharma said last week that memory costs have risen about fivefold over the past two years, leaving the company unable to produce as many game consoles as consumers want.

HBM's steady encroachment on DRAM capacity is the core driver of this crisis. Based on TrendForce data and disclosures from Samsung, SK Hynix, and Micron, HBM's share of DRAM wafer capacity has jumped from 2% in 2020 to an estimated 25% in 2026. Hyperscale cloud providers lock in future wafer output through multi-year contracts, further squeezing the capacity left for the standard DRAM chips used in phones and PCs.

Building new DRAM capacity also faces structural constraints. Expansion depends on EUV lithography tools to pattern finer line widths; each EUV tool costs as much as $200 million, a new wafer fab requires billions of dollars, and construction takes years even under optimal conditions. This supply rigidity is the fundamental reason the shortage persists.

55x Cost Difference: The Economic Logic of Flash Replacement

According to Citrini Research's calculations, flash memory costs about 1/55 as much per bit as DRAM: QLC NAND is roughly $0.05 per GB, DDR5 DRAM around $2.75 per GB, and HBM3E as much as $15 per GB. The price gap can be exploited in the KV cache, the single largest consumer of memory in AI inference. The KV cache records all previous context tokens at each step of model generation and can grow to hundreds of GB in long conversations, yet its read-speed requirements are far lower than those of the model weights used in decoding. For such sequentially read data, DRAM's speed advantage shrinks considerably, while flash's capacity advantage comes fully into play.

Flash scales along a fundamentally different path from DRAM. It adds capacity by stacking more cell layers vertically, relying on the deposition and etching processes already in place in existing fabs, with no need for new lithography nodes or EUV resources. Flash controllers are built on mature 6/7nm processes, well away from the leading-edge nodes where capacity is constrained.

An earlier paper by Apple researchers, "LLM in a Flash," provides the methodological support: by storing large language model parameters in on-device flash and loading them into DRAM on demand, a device can run models larger than its DRAM capacity, with inference 4-5 times faster on CPU and 20-25 times faster on GPU than with naïve loading methods.

Two Paths: Data Center and Edge Device Synchronous Evolution

AMD's acquisition targets the data center. By integrating MEXT's technology into its data center product portfolio, AMD aims to help enterprise customers raise resource utilization and cut costs when deploying AI workloads. Morgan Stanley's Shawn Kim team believes that, despite ongoing memory shortages, AMD holds a structural advantage in the cloud market: “agentic AI-driven CPU demand structurally favors AMD's share expansion in the cloud market.” Citigroup's optimism about AMD rests more on its GPU sales and its direct competition with Nvidia.

Apple's strategy targets the edge. The "LLM in a Flash" approach shifts part of model inference's reliance on expensive cloud memory to local flash on the device, which lowers cloud computing costs and gives edge AI applications a workable memory architecture.

According to Citrini Research, the two paths point to the same conclusion: the memory hierarchy of AI inference is being rebuilt. Infrequently accessed KV caches, model weights, and edge device data will gradually move down from the expensive HBM/DRAM tier to the NAND flash/SSD tier, creating a multi-tiered storage architecture.

This architectural transition is sending ripples through several levels of the supply chain. In Citrini Research's analysis, the most direct beneficiaries are NAND manufacturers: high-capacity NAND, enterprise SSDs, and QLC NAND are the purest plays, with names including SanDisk, Western Digital, Micron, and Kioxia.

The SSD controller layer is seen as having the most durable demand, because making flash behave like memory depends on optimizing controllers, firmware, and the NVMe architecture; companies involved include Silicon Motion and Marvell. The CXL/PCIe high-speed interconnect layer also benefits.

```
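The cost argument attributed to Citrini Research can be checked with a few lines of arithmetic. The Python sketch below uses the per-GB prices quoted in the article and a common back-of-the-envelope formula for KV cache size (keys plus values, per layer, per token). The model shape (80 layers, 8 KV heads, head dimension 128, 16-bit values) and the one-million-token context are illustrative assumptions, not figures from the article, and the function names are hypothetical. It compares raw media prices only and ignores bandwidth, endurance, and controller costs.

```python
# Per-GB prices in USD as quoted in the article (Citrini Research figures).
PRICE_PER_GB = {"QLC NAND": 0.05, "DDR5 DRAM": 2.75, "HBM3E": 15.00}


def cost_ratio(expensive: str, cheap: str) -> float:
    """How many times more one tier costs per GB than another."""
    return PRICE_PER_GB[expensive] / PRICE_PER_GB[cheap]


def kv_cache_gb(tokens: int, layers: int, kv_heads: int, head_dim: int,
                bytes_per_value: int = 2) -> float:
    """Approximate KV cache size in decimal GB.

    The leading factor 2 covers one key and one value vector per token,
    per layer, per KV head. All model dimensions are assumptions supplied
    by the caller.
    """
    total_bytes = 2 * layers * kv_heads * head_dim * bytes_per_value * tokens
    return total_bytes / 1e9


def tier_cost(capacity_gb: float, tier: str) -> float:
    """Raw media cost in USD of holding capacity_gb in the given tier."""
    return capacity_gb * PRICE_PER_GB[tier]


if __name__ == "__main__":
    print(f"DDR5 vs QLC NAND:  {cost_ratio('DDR5 DRAM', 'QLC NAND'):.0f}x")
    print(f"HBM3E vs QLC NAND: {cost_ratio('HBM3E', 'QLC NAND'):.0f}x")

    # Hypothetical model shape and context length, for illustration only.
    size = kv_cache_gb(tokens=1_000_000, layers=80, kv_heads=8, head_dim=128)
    print(f"KV cache: {size:.2f} GB")
    for tier in PRICE_PER_GB:
        print(f"  {tier:<10} ${tier_cost(size, tier):,.2f}")
```

Under these assumptions the cache comes to roughly 328 GB, which would cost about $16 in QLC NAND, about $901 in DDR5, and about $4,915 in HBM3E at the quoted prices. That gap is why the article describes flash as absorbing overflow capacity rather than replacing HBM for bandwidth-critical data.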