OpenAI releases GPT-5.4 mini and nano, approaching flagship model performance at lower cost

OpenAI on Tuesday launched its two most capable small models to date, GPT-5.4 mini and GPT-5.4 nano, significantly narrowing the performance gap with its flagship models while offering lower latency and lower cost. GPT-5.4 mini outperforms the previous-generation GPT-5 mini across core areas including programming, reasoning, multimodal understanding, and tool use. It runs more than twice as fast and approaches the larger GPT-5.4 on benchmarks such as SWE-Bench Pro. GPT-5.4 nano is positioned as the lowest-cost, lowest-latency option. It is available to developers only through the API and is designed for tasks such as data classification, extraction, and simple programming sub-tasks.

The two models are meant to fill a gap: large models are hard to deploy in real-time interaction scenarios because of their high latency. The launch directly affects fast-growing markets such as coding assistants, AI agent systems, and multimodal applications.

Mini targets consumers, nano is exclusive to the API

GPT-5.4 mini is now available in the OpenAI API, on the Codex platform, and in ChatGPT. Its API pricing is $0.75 per million input tokens and $4.50 per million output tokens. The model supports text and image input, tool usage, function calling, web search, file retrieval, computer control, and skill extensions, with a context window of up to 400,000 tokens.

On the Codex platform, GPT-5.4 mini consumes only 30% of GPT-5.4’s quota, so simple programming tasks cost developers roughly a third of what they would on the flagship model. Codex also supports delegating workloads to sub-agents running GPT-5.4 mini, so tasks that need little reasoning automatically go to the cheaper model.

In ChatGPT, Free and Go users can access GPT-5.4 mini by selecting "Thinking" from the "+" menu. For users on other paid plans, the model takes over automatically as a fallback once the GPT-5.4 Thinking rate limit is reached.
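
To make the per-token pricing concrete, the following is a minimal sketch of a per-request cost estimate. The GPT-5.4 mini rates are the ones quoted above and the GPT-5.4 nano rates are the ones quoted later in this article. The dictionary keys, the function name estimate_cost, and the example token counts are illustrative assumptions, not confirmed API identifiers or measured workloads.

```python
# USD per one million tokens, as quoted in this article.
# The dictionary keys are illustrative labels, not confirmed API model IDs.
PRICES_PER_MILLION = {
    "gpt-5.4-mini": {"input": 0.75, "output": 4.50},
    "gpt-5.4-nano": {"input": 0.20, "output": 1.25},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request at list price."""
    price = PRICES_PER_MILLION[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request size: 20,000 input tokens and 2,000 output tokens.
    for model in PRICES_PER_MILLION:
        cost = estimate_cost(model, 20_000, 2_000)
        print(f"{model}: ${cost:.4f} per request")
```

For that hypothetical request, the estimate comes to about $0.024 on mini and about $0.0065 on nano. The sketch ignores any discounts or surcharges that may apply in practice.
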
GPT-5.4 nano is currently available to developers only through the API, priced at $0.20 per million input tokens and $1.25 per million output tokens, making it the cheaper of the two new models. According to OpenAI, nano is suited to sub-agent scenarios in which a higher-level model orchestrates the work and delegates supporting tasks.

Mini nears flagship, nano outperforms predecessor

According to OpenAI’s published evaluation data, GPT-5.4 mini excels in programming and multimodal tasks. On the programming benchmark SWE-Bench Pro, mini scored 54.4%, well above GPT-5 mini’s 45.7% and just 3.3 percentage points behind GPT-5.4’s 57.7%. On the computer control benchmark OSWorld-Verified, mini scored 72.1%, nearly matching GPT-5.4’s 75.0% and far ahead of GPT-5 mini’s 42.0%.

In tool calling, GPT-5.4 mini scored 93.4% on the τ2-bench telecom test, a significant improvement over GPT-5 mini’s 74.1%. On the general intelligence test GPQA Diamond, mini scored 88.0% and nano scored 82.8%, both surpassing GPT-5 mini’s 81.6%.

Notably, GPT-5.4 nano lagged GPT-5 mini in some visual tasks: its OSWorld-Verified score of 39.0% is lower than the older model’s 42.0%. In programming and tool usage tasks, however, nano still showed clear improvement over the previous generation. OpenAI stated that nano’s design prioritizes low latency and low cost over all-round performance, so developers should weigh the specific demands of their tasks when choosing a model.

Sub-agent architecture: multi-model collaboration is the new product paradigm

In its release, OpenAI emphasized the role of the new models within layered, multi-model systems. In Codex, OpenAI’s own coding assistant, GPT-5.4 handles planning, coordination, and final judgment, while GPT-5.4 mini sub-agents work in parallel on tasks such as codebase retrieval, large file review, and supporting documentation processing.
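
The division of labor described above can be sketched as a small orchestration loop: a larger model plans and judges, while several small-model sub-agents run concurrently. This is only an illustration of the pattern, not how Codex is implemented. The functions plan, run_subagent, judge, and orchestrate are hypothetical stand-ins that return canned strings, and no real model API is called.

```python
import asyncio


def plan(goal: str) -> list[str]:
    """Hypothetical stand-in for the larger model's planning step."""
    return [
        f"search the codebase for: {goal}",
        f"review large files related to: {goal}",
        f"process supporting documents for: {goal}",
    ]


async def run_subagent(task: str) -> str:
    """Hypothetical stand-in for a call to a small, fast model."""
    await asyncio.sleep(0)  # placeholder for a network round trip
    return f"[sub-agent] finished: {task}"


def judge(goal: str, reports: list[str]) -> str:
    """Hypothetical stand-in for the larger model's final judgment."""
    return f"{goal}: {len(reports)} sub-agent reports merged"


async def orchestrate(goal: str) -> str:
    subtasks = plan(goal)
    # Sub-agents run concurrently; gather returns results in subtask order.
    reports = await asyncio.gather(*(run_subagent(t) for t in subtasks))
    return judge(goal, list(reports))


if __name__ == "__main__":
    print(asyncio.run(orchestrate("fix flaky login test")))
```

In a real system each stand-in would be replaced by a model call, and the latency of the whole step would be bounded by the slowest sub-agent rather than the sum of all of them.
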
OpenAI says that as small models become faster and more capable, developers no longer need to rely on a single model for every task. Instead, they can build systems in which larger models make the decisions and smaller models execute tasks quickly and at scale. OpenAI states that GPT-5.4 mini is the most powerful small model it has released for such workflows so far.

This architecture matters most for highly concurrent workloads in which response latency directly shapes the product experience, such as coding assistants, screenshot parsing, and real-time image understanding. In these scenarios the optimal choice is often not the most capable model, but the one that strikes the best balance among speed, tool reliability, and task performance. For developers, the release of GPT-5.4 mini and nano offers a clearer path to significantly lower inference costs without sacrificing overall system intelligence.