Andrej Karpathy's annual review: large language models are evolving into a new form of intelligence, with six key turning points this year

```

Andrej Karpathy, an OpenAI co-founder and one of the most prominent figures in AI, recently published his annual review. In it, he describes 2025 as a year of rapid progress for large language models (LLMs), marked by six key “paradigm shifts.” These changes not only reshaped the industry landscape but, more importantly, revealed that LLMs are evolving into an entirely new form of intelligence.

On December 20, Hard AI reported on the annual review Karpathy had posted on X. In it, he said that LLMs are emerging as a new kind of intelligence, one that is “much smarter and also much dumber than I expected.”

He identified six paradigm-shifting turning points that changed the industry this year. Among them, reinforcement learning with verifiable rewards (RLVR) became a new stage in the LLM production pipeline. Major labs redirected compute originally intended for pretraining toward much longer reinforcement learning runs.

He placed particular emphasis on the “jagged” nature of LLM intelligence: these models are at once erudite geniuses and confused, muddled schoolchildren. As Karpathy put it, with LLMs we are not “evolving animals” but “summoning ghosts.” This entirely new form of intelligence has to be understood on its own terms.

In his review, he pointed to three shifts that together mark a new stage of development for AI applications:

- Training emphasis is moving from pretraining toward reinforcement learning.
- User experience is moving from text interaction toward graphical interfaces.
- Programming ability is spreading from professionals to everyone through “vibe coding.”

Although today's capabilities are already extremely useful, Karpathy believes the industry has realized less than 10% of LLMs' potential. He expects continued rapid progress, though the technical challenges remain daunting.

Turning point one: Reinforcement learning with verifiable rewards changes the training paradigm

The most important technical breakthrough of 2025 was the emergence of reinforcement learning with verifiable rewards (RLVR) as a new stage in LLM training.

According to Karpathy, the traditional production LLM training pipeline consisted of three stages:

- pretraining
- supervised instruction fine-tuning (SFT)
- reinforcement learning from human feedback (RLHF)

RLVR has changed this recipe by adding a major new stage.

RLVR trains LLMs against rewards that can be checked automatically, in environments such as math problems and code puzzles. Under this training, the models spontaneously develop strategies that resemble reasoning. They learn to break problems down into intermediate calculations and to use a range of trial-and-error and step-by-step deductive problem-solving methods. The DeepSeek R1 paper shows these strategies in detail.

SFT and RLHF are relatively short, computationally light stages. RLVR, by contrast, trains against objective reward functions that are hard to game, which permits much longer optimization runs. It has turned out to deliver an extremely high “capability/cost ratio,” so it has absorbed compute originally intended for pretraining. As a result, most of 2025's capability gains came from labs working through the “compute overhang” of this new stage.

OpenAI's o1 was the first demonstration of an RLVR model. The release of o3 was the real turning point, however, because it made the difference something users could feel intuitively. RLVR also introduced a new knob for controlling capability: test-time compute. Capability can be raised by generating longer reasoning traces and increasing “thinking time.”

Turning point two: "Ghost intelligence" displays jagged performance characteristics

In 2025, the industry began to truly grasp the distinctive “shape” of LLM intelligence.

Karpathy argues that we are not “evolving animals” but “summoning ghosts.” LLMs differ from biological intelligence in every respect: neural architecture, training data, training algorithms and optimization pressures. The result is an entirely new kind of intelligent entity.

Human neural networks were optimized for the survival of a tribe in the jungle. LLM neural networks, by contrast, are optimized to imitate human text, collect rewards on math problems, and win upvotes on LM Arena. This difference gives LLMs a “jagged” performance profile. They are at once erudite geniuses and confused, cognitively challenged grade-schoolers: able to solve hard problems one moment, and tripped up by a simple prompt the next.

Because RLVR is applied in verifiable domains, LLM capabilities “spike” in those areas while overall performance remains highly uneven. This has cost Karpathy his trust in benchmarks. A benchmark is by construction a verifiable environment, so it is highly susceptible to RLVR-style optimization. Labs “game the leaderboard” by building training environments around the test sets; “training on the test set” has become a new art form.

Turning point three: Cursor reveals a new layer of LLM applications

Cursor's significance goes beyond its own success: it revealed a new layer of “LLM applications.” People now talk about building a “Cursor for X” in various industries, which signals the rise of vertical LLM applications.

LLM applications like Cursor bundle and orchestrate LLM calls for specific verticals. They have four core features:

1. They handle “context engineering.”
2. Behind the scenes, they orchestrate multiple LLM calls into increasingly complex directed acyclic graphs (DAGs), balancing performance against cost.
3. They provide a GUI tailored to the specific human interaction involved.
4. They offer an “autonomy slider.”

A hot debate in 2025 was how “thick” this new application layer will be. Will the LLM labs capture all applications, or is there room for vertical applications to thrive?

Karpathy expects LLM labs to turn out generally capable “college graduates.” LLM applications will then organize these graduates into professionals in specific verticals by supplying private data, sensors, actuators, and feedback loops.

Turning point four: Claude Code pioneers a new paradigm of local AI agents

Claude Code became the first convincing demonstration of an LLM agent. It strings together tool calls and reasoning in a loop to solve problems over extended periods. Just as important, it runs on the user's own computer, with access to their private environment, data, and context.

Karpathy believes OpenAI got this wrong. It focused its agent efforts on cloud deployments in containers orchestrated from ChatGPT rather than on the user's local machine. Swarms of agents in the cloud may look like the “AGI endgame.” In today's intermediate stage of jagged capabilities, though, it makes more sense to run agents directly on the developer's computer, working with their specific setup.

Claude Code got these priorities right and packaged them in a beautifully minimal command-line interface, changing what AI looks like. AI is no longer a website you visit but a “spirit” that “lives” on your computer. This local, personalized mode of interaction points toward the future and underscores the importance of privacy and personalization.

Turning point five: Vibe Coding makes programming ubiquitous

In 2025, AI crossed a key capability threshold: people can now build sophisticated programs using only English, forgetting that the underlying code even exists. The popularity of “vibe coding” reflects how dramatically the barrier to programming has fallen.

With vibe coding, programming is no longer the preserve of trained professionals; anyone can take part. This illustrates the “power to the people” quality of LLMs. Unlike previous technologies, LLMs benefit ordinary people far more than professionals, companies, or governments. Ordinary people can now try programming, and professional developers can write far more software than they otherwise would have.

Karpathy shared examples from his own work. He built an efficient BPE tokenizer in Rust, spun up a variety of quick demo apps, and even built entire throwaway applications just to track down a single bug. Code has become free, instant, and malleable: written on demand, then discarded. This shift will reshape the software ecosystem, redefine professions, and bring the cost of realizing an idea close to zero.

Turning point six: Nano Banana ushers in the era of LLM graphical interfaces

Karpathy calls Google's Gemini Nano Banana one of the most stunning, paradigm-shifting models of 2025. In his view, LLMs are the next major computing paradigm, comparable to the computers of the 1970s and 1980s. He expects them to bring innovations of similar historical significance.

Today, chatting with an LLM is like typing commands into a computer terminal in the 1980s. Text is the native data format for computers and LLMs, but it is not the format humans prefer. People dislike reading long passages of text; they would rather take in information visually and spatially. That is why the GUI was invented in traditional computing.

LLMs should likewise communicate in the formats people prefer: images, infographics, slides, whiteboards, animated videos, web apps, and so on. Emoji and Markdown are early attempts in this direction, but a true “LLM GUI” will require deeper innovation.

Nano Banana offers an early prototype of what this might look like. What sets it apart is not image generation alone. It is the deep intertwining of text generation, image generation, and world knowledge within the same model weights.

This multimodal fusion points to a fundamental change in future AI interfaces: a shift from pure text conversation toward rich-media, multisensory experiences.

This article comes from the WeChat official account “Hard AI”.

```