Unlock the next big opportunity in storage! Korean media explains Jensen Huang's "Mysterious Inference Context Memory Platform" in detail

At the 2026 International Consumer Electronics Show (CES) on January 5, Nvidia CEO Jensen Huang unveiled the brand-new “Inference Context Memory Platform” (ICMS), a hardware solution aimed at the explosive data storage demands of the AI inference phase. The move marks a shift in the focus of AI hardware architecture from simply stacking computing power to storing context efficiently, and NAND flash and SSDs are poised to succeed HBM as the next key growth engine.

An article in the Korea Economic Daily on January 24 reported that Jensen Huang showcased a mysterious black rack known as the “Inference Context Memory Platform” (ICMS) during his keynote. This is not a regular hardware update, but a crucial innovation designed to resolve the data bottlenecks in AI inference. Observant reporters noted that this could be the next breakout point for the storage industry following HBM (High Bandwidth Memory).

The core logic of this platform is to solve the “KV cache” (key-value cache) problem in AI inference. As AI moves from simple training to large-scale inference applications, data volumes are growing exponentially, and the current GPU memory and server memory architectures can no longer meet the demand. Nvidia introduces a new Data Processing Unit (DPU) and massive SSDs (Solid State Drives) to build a huge cache pool, seeking to break through these physical limitations.

This technological transformation is undeniably great news for Korean memory giants Samsung Electronics and SK hynix. The report states that as ICMS spreads, NAND flash will enjoy a “golden age” similar to HBM’s. This means not just explosive growth in storage capacity needs but a fundamental change in storage architecture: GPUs may bypass CPUs and communicate directly and quickly with storage devices.

Explosive Growth of KV Caches Triggers Storage Anxiety

The Korean media article pointed out that the core driver for Jensen Huang introducing ICMS technology is the explosion of KV caches. In the age of AI inference, KV caching is critical for AI to understand conversational context and carry out logical reasoning. For example, if a user asks the AI a complex, subjective question about G-Dragon, the AI must call on internal model data and historical conversational context (i.e., KV caches) for weight assignment and inference, avoiding repeated computation and hallucinations.

As AI shifts from training to inference and applications expand into multimodal content, the volume of data to be processed grows irregularly and explosively. Nvidia concluded that relying solely on expensive HBM or conventional DRAM cannot accommodate massive KV caches, and that the current in-server storage architecture is stretched too thin to cope with the coming inference era. A dedicated storage platform that can hold huge amounts of data while keeping access efficient is therefore urgently needed.

DPU-Driven 9600TB Mega Space

According to the Korean media report, the core of the ICMS platform is the combination of DPUs with ultra-large-capacity SSDs. Citing Nvidia, the article says the platform uses the new “BlueField-4” DPU as a “logistics officer” for data transmission, relieving the CPU’s burden. A standard ICMS rack has 16 SSD trays, each equipped with four DPUs that together manage 600TB of SSDs, bringing the total rack capacity to an astonishing 9600TB.

This capacity far surpasses that of traditional GPU racks. By the article’s estimate, a VeraRubin GPU platform with 8 racks has a total SSD capacity of about 4423.68TB. Huang stated that, thanks to the ICMS platform, the GPU’s virtual available memory capacity grows from 1TB to 16TB. With BlueField-4’s performance boost, the platform achieves KV cache transfer speeds of 200GB per second, effectively solving the network transmission bottleneck that limited large SSDs.

Opening the Golden Age of NAND Flash

The article notes that the ICMS platform mainly relies on SSDs, directly benefiting NAND flash manufacturers. In recent years, although AI has been booming, the spotlight has mainly been on HBM, and NAND flash/SSD haven’t received equivalent attention.

Nvidia positions this platform at “tier 3.5” of the memory hierarchy, between the server’s internal local SSD and external storage. Compared with expensive, power-hungry DRAM, SSDs managed by high-performance DPUs are fast, offer far greater capacity, and retain data even during power outages, making them ideal for storing KV caches.

This architectural innovation directly benefits Samsung Electronics and SK hynix. Due to ICMS’s intense requirements for storage density, demand for enterprise SSDs and NAND flash will surge. Furthermore, Nvidia is pushing the “Storage Next” (SCADA) plan to let GPUs bypass the CPU and access NAND flash directly, further eliminating data transfer bottlenecks.

SK hynix has already responded. According to reports, SK hynix Vice President Kim Tae Sung revealed that the company is running a proof of concept with Nvidia called “AI-N P.” A storage prototype based on PCIe Gen 6 and supporting 25 million IOPS (input/output operations per second) is expected by the end of this year, and products supporting up to 100 million IOPS are expected by the end of 2027. With major players accelerating their moves, NAND flash and SSDs are set for a new boom in both volume and price in the AI inference era.

Below is the full Korean media article, AI-translated:

At the 2026 International Consumer Electronics Show (CES), Nvidia CEO Jensen Huang introduced a mysterious memory platform: “Inference Context Memory Platform.” Today, “Tech and City” will take a deep dive into what it actually is.

Keyword: KV Cache

On the 5th (local time), at the NVIDIA Live event in Las Vegas, CEO Jensen Huang mentioned the memory platform at the end of his talk. I perked up my ears: could this be the next HBM?

Star of the day: the black rack-mounted NVIDIA ICMS (Inference Context Memory Storage). Image: NVIDIA

CEO Huang pointed to a black rack at the corner of the VeraRubin AI computing platform. This rack, the star of our story today, holds massive storage space inside.

First, let me explain why Jensen Huang introduced this technology. We have to start with “KV cache,” a term the CEO often mentions in official settings. Readers may have seen KV cache in AI hardware articles about GPUs lately.

This keyword is crucial in the age of AI inference. It determines AI’s ability to understand conversational context and compute efficiently. Let’s give an example: Say you open OpenAI’s ChatGPT or Google Gemini and ask a question about Korean pop star G-Dragon.

If the user asks for objective information about G-Dragon’s music, fashion, or career, AI can answer with what it has learned. But after chatting for a while, the user suddenly asks, “Why did he become the 'idol' of that era?”—a question without a clear answer. At this point, AI starts to infer.

This is the core of KV caching: the key and the value. First, the key. Put simply, the AI uses key vectors to pin down who “he” refers to in the conversation, along with the topic and subject. Then it uses the values, meaning its internal model data about G-Dragon and the data accumulated during the dialogue, to assign weights, perform inference, and arrive at an answer.

Without a KV cache, every question would be recalculated as if it were the first, so the GPU would redo the same work two or three times and efficiency would drop. This can lead to hallucinations and wrong answers. A KV cache avoids that waste: inference based on “attention computation” reuses the data accumulated over a long conversation and applies weights to it, which makes responses faster and conversations more natural.
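The key/value idea can be made concrete with a small sketch. The Python below is a minimal, hypothetical illustration of single-head scaled dot-product attention during token-by-token generation, run once without and once with a KV cache. The toy projection functions and 2-dimensional vectors are assumptions standing in for a real model's learned weights; the sketch only shows that caching each token's key and value gives the same outputs while computing them once instead of at every step.

```python
import math

# Hypothetical toy "projections"; a real model uses learned weight matrices.
def project_q(x):
    return [0.5 * xi for xi in x]

def project_k(x):
    return [2.0 * xi for xi in x]

def project_v(x):
    return [xi + 1.0 for xi in x]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention of one query over all past positions."""
    scale = math.sqrt(len(query))
    weights = softmax([sum(q * k for q, k in zip(query, key)) / scale for key in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

def decode_without_cache(tokens):
    outputs, kv_computations = [], 0
    for t in range(len(tokens)):
        prefix = tokens[: t + 1]
        # Recompute keys and values for the whole prefix at every step.
        keys = [project_k(x) for x in prefix]
        values = [project_v(x) for x in prefix]
        kv_computations += len(prefix)
        outputs.append(attend(project_q(tokens[t]), keys, values))
    return outputs, kv_computations

def decode_with_cache(tokens):
    outputs, kv_computations = [], 0
    cached_keys, cached_values = [], []  # the KV cache
    for x in tokens:
        # Compute the key and value only for the new token, then reuse the cache.
        cached_keys.append(project_k(x))
        cached_values.append(project_v(x))
        kv_computations += 1
        outputs.append(attend(project_q(x), cached_keys, cached_values))
    return outputs, kv_computations

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
slow, slow_work = decode_without_cache(tokens)
fast, fast_work = decode_with_cache(tokens)
print(slow == fast, slow_work, fast_work)  # True 10 4
```

For n tokens, the uncached loop performs n(n+1)/2 key/value computations versus n with the cache. That saving is what the cache buys, and the stored keys and values are also why it takes up so much memory.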

Image by NVIDIA

As AI shifts from training to inference, key-value caching is no longer just auxiliary memory. What’s more, the required capacity keeps increasing.

As more people bring generative AI into their daily lives, irregular spikes in data volume become inevitable. The introduction of image and video services will drive even more explosive data growth, as they demand higher-level inference and more imaginative generation.

As AI’s ability to discover new information keeps improving, it will generate large volumes of useful KV cache during user interactions across diverse scenarios.
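How large does the cache get? A common back-of-the-envelope estimate is that every token stores one key vector and one value vector per layer and per KV head. The sketch below applies that estimate to a hypothetical model configuration (80 layers, 8 KV heads, head dimension 128, 16-bit values); these figures are illustrative assumptions, not numbers from NVIDIA or the article.

```python
def kv_cache_bytes(tokens, n_layers, n_kv_heads, head_dim, bytes_per_value):
    # Factor 2: one key vector and one value vector per layer, head, and token.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return per_token * tokens

# Hypothetical model configuration (assumption for illustration only).
CONFIG = dict(n_layers=80, n_kv_heads=8, head_dim=128, bytes_per_value=2)

GIB = 2**30
per_token = kv_cache_bytes(1, **CONFIG)
one_session = kv_cache_bytes(128 * 1024, **CONFIG)  # one 128K-token conversation
thousand_users = 1000 * one_session                 # 1,000 such conversations

print(per_token // 1024, "KiB per token")           # 320 KiB per token
print(one_session // GIB, "GiB per session")        # 40 GiB per session
print(thousand_users / 2**40, "TiB for 1,000 sessions")  # 39.0625 TiB
```

Under this estimate the cache grows linearly with both context length and the number of concurrent conversations, which matches the growth pattern the article describes.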

To cope with the explosive growth in KV cache, NVIDIA splits GPUs into roles to manage traffic: some generate large amounts of KV cache, while others consume it. But there still isn’t enough storage for all of that cache.
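The split described above, where some GPUs produce KV cache and others consume it, resembles what is often called prefill/decode disaggregation. The Python below is a hypothetical, heavily simplified model of that flow: a "prefill" worker processes a prompt and publishes its KV cache to a shared pool, and a "decode" worker fetches that cache instead of recomputing it. The class and function names and the shared-pool interface are assumptions for illustration, not NVIDIA's software.

```python
from dataclasses import dataclass, field

@dataclass
class SharedKVStore:
    """Hypothetical shared cache pool reachable by both worker groups."""
    capacity_entries: int
    entries: dict = field(default_factory=dict)

    def put(self, session_id, kv):
        if session_id not in self.entries and len(self.entries) >= self.capacity_entries:
            raise MemoryError("KV cache pool is full")
        self.entries[session_id] = kv

    def get(self, session_id):
        return self.entries.get(session_id)

def prefill_worker(store, session_id, prompt_tokens):
    # Stand-in for running the prompt through the model: one (key, value) per token.
    kv = [(f"k:{tok}", f"v:{tok}") for tok in prompt_tokens]
    store.put(session_id, kv)
    return len(kv)

def decode_worker(store, session_id, new_token):
    kv = store.get(session_id)
    if kv is None:
        raise KeyError(f"no cached context for {session_id}; prefill must run again")
    kv.append((f"k:{new_token}", f"v:{new_token}"))  # extend the shared cache in place
    return len(kv)  # positions the next attention step can see

store = SharedKVStore(capacity_entries=2)
prefill_worker(store, "chat-1", ["Who", "is", "G-Dragon", "?"])
print(decode_worker(store, "chat-1", "Why"))  # 5
```

When the pool is full, the sketch simply raises an error; a real system would have to evict entries or recompute them, which is the storage shortage the article points to.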

Of course, servers already have a great deal of memory inside. HBM sits right next to the GPUs; if that is not enough, DRAM modules are used; and if that is still not enough, even the servers’ own SSDs are pressed into service. However, CEO Huang seems to have realized that this setup will struggle in the coming inference era. So he unveiled this black box at CES.

NVIDIA CEO Jensen Huang launches ICMS at CES 2026. Image: NVIDIA YouTube

DPU + Ultra-Large SSD = A Dedicated KV Cache Storage Team

This black server is the “Inference Context Memory Platform,” or ICMS. Let’s look at its details.

First, the device driving ICMS is the DPU, the data processing unit. Readers are likely familiar with GPUs and CPUs, but the hidden engine of servers—the DPU—is also noteworthy.

NVIDIA CEO Jensen Huang releases the BlueField-4 DPU. Image by NVIDIA.

A DPU (Data Processing Unit) is like a supply and logistics officer in the military. If the CPU is the squad leader and the GPU is the assault force, the DPU manages ammunition and supplies and handles communications, allowing the CPU to make smart decisions while the GPU focuses on the attack. The new DPU, BlueField-4, has been given a new task: ICMS. Let’s look at the ICMS rack. It has 16 SSD trays in total.

Image: NVIDIA

Each tray has four DPUs, each managing 150TB of SSD, so each tray has 600TB of cache SSD.

This is a huge storage capacity. Let’s compare. Suppose that, to maximize KV cache, we install eight 3.84TB general-purpose SSDs in the SSD bays of a Blackwell GPU server. Each server then has 30.72TB of SSD, so a GPU rack of 18 servers holds 552.96TB of SSD in total.

In other words, a single ICMS tray’s cache SSD capacity exceeds that of an entire GPU rack. One ICMS rack holds 600TB x 16 = 9600TB of SSD, more than double the total SSD capacity of an entire eight-rack VeraRubin setup (552.96 x 8 = 4423.68TB).
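The capacity comparison above can be checked with a few lines of Python. The inputs are the article's own figures (150TB per DPU, four DPUs per tray, 16 trays per rack, and the article's hypothetical Blackwell server with eight 3.84TB SSDs and 18 servers per rack); decimal terabytes are assumed throughout.

```python
# ICMS figures as reported in the article
TB_PER_DPU = 150
DPUS_PER_TRAY = 4
TRAYS_PER_RACK = 16

# The article's hypothetical GPU-rack configuration
SSD_TB = 3.84
SSDS_PER_SERVER = 8
SERVERS_PER_GPU_RACK = 18
GPU_RACKS = 8

tray_tb = TB_PER_DPU * DPUS_PER_TRAY            # 600 TB per tray
icms_rack_tb = tray_tb * TRAYS_PER_RACK         # 9600 TB per ICMS rack
server_tb = SSD_TB * SSDS_PER_SERVER            # about 30.72 TB per server
gpu_rack_tb = server_tb * SERVERS_PER_GPU_RACK  # about 552.96 TB per GPU rack
platform_tb = gpu_rack_tb * GPU_RACKS           # about 4423.68 TB for eight racks

print(tray_tb, icms_rack_tb)
print(round(server_tb, 2), round(gpu_rack_tb, 2), round(platform_tb, 2))
print(f"ICMS rack / eight GPU racks: {icms_rack_tb / platform_tb:.2f}x")  # 2.17x
```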

Image by NVIDIA

At CES, Jensen Huang said: “Previously, GPU memory capacity was 1TB, but with this platform, we get 16TB of storage.”

If you think about it, he’s quite right. A full VeraRubin platform has eight GPU racks with 72 GPUs each, or 576 GPUs in total. Dividing ICMS’s 9600TB by 576 gives about 16.7TB per GPU.
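The per-GPU figure is a single division. The sketch below uses the article's counts (eight racks of 72 GPUs, 9600TB per ICMS rack) and, like the article's arithmetic, assumes the rack's capacity is shared evenly across every GPU in the platform.

```python
ICMS_RACK_TB = 9600
GPU_RACKS = 8
GPUS_PER_RACK = 72

total_gpus = GPU_RACKS * GPUS_PER_RACK   # 576 GPUs
tb_per_gpu = ICMS_RACK_TB / total_gpus   # even split, as in the article
print(total_gpus, round(tb_per_gpu, 1))  # 576 16.7
```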

Concerns remain about the physical distance between servers and about SSD transmission speed, but BlueField-4’s performance upgrades have eased them. Huang explained: “We achieved 200GB/s KV cache transmission, the same as before.”
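To get a feel for what 200GB/s means, the sketch below estimates how long it would take to move one long conversation's KV cache over such a link. The 40GiB cache size is a hypothetical assumption, and treating 200GB/s as a sustained decimal rate with no latency or protocol overhead is a simplification.

```python
def transfer_seconds(size_bytes, bandwidth_bytes_per_s):
    # Idealized: ignores latency, protocol overhead, and contention.
    return size_bytes / bandwidth_bytes_per_s

LINK_BYTES_PER_S = 200e9    # 200 GB/s as quoted by Huang (decimal units assumed)
session_cache = 40 * 2**30  # hypothetical 40 GiB KV cache for one long chat

t = transfer_seconds(session_cache, LINK_BYTES_PER_S)
print(f"{t * 1000:.0f} ms")  # 215 ms
```

Because real transfers add latency and overhead, this figure is best read as a lower bound on the transfer time.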

Previously, GPU servers had network bottlenecks limiting the use of large SSDs (7.68TB, 15.36TB). This DPU-based network upgrade seems designed to solve it.

Is NAND Flash’s Golden Age Upon Us?

Image by NVIDIA

NVIDIA divides memory into tiers and places this platform at tier 3.5. Tier 1 is HBM, tier 2 is DRAM modules, tier 3 is the server’s local SSD, and tier 4 is external storage. ICMS explores the mysterious zone between tiers 3 and 4. Compared with expensive, power-hungry DRAM, SSDs paired with a high-performance DPU are fast, offer far greater capacity, and retain data after power loss, making them ideal for KV cache.
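The tier numbering can be pictured as a lookup that falls through from the smallest, fastest tier to the largest, slowest one. The Python below is a hypothetical model of that hierarchy with tier 3.5 slotted between local SSD and external storage. The tiny capacities and the eviction policy (least-recently-used entries demoted one tier down, hits promoted back to the top) are assumptions for illustration, not NVIDIA's design.

```python
from collections import OrderedDict

class TieredKVStore:
    """Hypothetical multi-tier KV cache: new entries land in the top tier and
    least-recently-used entries are demoted one tier down when a tier fills."""

    def __init__(self, tiers):
        # tiers: list of (name, capacity); the last tier is treated as unbounded.
        self.names = [name for name, _ in tiers]
        self.capacities = [cap for _, cap in tiers]
        self.levels = [OrderedDict() for _ in tiers]

    def put(self, key, value, level=0):
        # Intended for keys not already stored in any tier.
        tier = self.levels[level]
        tier[key] = value
        tier.move_to_end(key)  # mark as most recently used
        is_last = level == len(self.levels) - 1
        if not is_last and len(tier) > self.capacities[level]:
            old_key, old_value = tier.popitem(last=False)  # least recently used
            self.put(old_key, old_value, level + 1)        # demote one tier down

    def get(self, key):
        for level, tier in enumerate(self.levels):
            if key in tier:
                value = tier.pop(key)
                self.put(key, value)  # promote back to the top tier on a hit
                return self.names[level], value
        return None, None  # miss everywhere: the context must be recomputed

store = TieredKVStore([
    ("HBM", 1), ("DRAM", 1), ("local SSD", 1),
    ("ICMS (tier 3.5)", 2), ("external storage", None),  # capacity unused for last tier
])
for session in ["a", "b", "c", "d", "e"]:
    store.put(session, f"kv-{session}")
print(store.get("a"))  # ('ICMS (tier 3.5)', 'kv-a')
```

In this toy run the oldest conversation survives in the large tier-3.5 pool instead of falling off to external storage or being dropped, which is the role the article assigns to ICMS.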

The platform is certainly a huge business opportunity for Samsung and SK hynix. A single rack adds 9600TB of capacity, which means that, counting bits alone, they can sell many times more NAND flash than for current NVIDIA racks. What’s more, the developer is NVIDIA, the partner every global AI company dreams of, which magnifies the opportunity.

Samsung server SSD. NAND flash and SSD prices lagged even after the AI era arrived, but they are expected to jump in Q1 this year. Image: Samsung

Over the past three years, despite the AI boom, NAND flash and SSDs (solid-state drives) received little attention, mainly because their utilization was low compared with HBM, which plays a key role. Starting with ICMS, NVIDIA is preparing a project called “Storage Next” (also known as SCADA, Scaled Accelerated Data Access) to further boost SSD utilization. Soon, GPUs doing AI computation will access data on NAND flash (SSDs) directly, bypassing CPUs. It is a bold move to eliminate the GPU/SSD bottleneck.

SK hynix has also officially announced work on AI-N P to keep pace. SK hynix VP Kim Tae Sung said: “SK hynix and NVIDIA are conducting a proof of concept called ‘AI-N P.’”

He explained: “A storage prototype supporting 25 million IOPS based on PCIe Gen 6 will likely launch by year-end.” He added: “By the end of 2027, we should be able to make products supporting up to 100 million IOPS.” 25 million IOPS is more than 10 times what current SSDs deliver.

Risk Warning & Disclaimer

The market has risks; invest with caution. This article does not constitute personal investment advice, nor does it consider individual users' specific investment objectives, financial situations, or needs. Users should evaluate whether any opinions, views, or conclusions in this article apply to their personal situation. Any investment made on this basis is at your own risk.