Existing path not working? OpenAI and Amazon consider changing large model training methods

As competition in the field of artificial intelligence enters deeper waters, top industry researchers are beginning to question the current model training paradigm.

Researchers from OpenAI, Thinking Machines Lab, and Amazon are exploring a fundamental shift: abandoning the standard "pre-training followed by post-training" workflow in favor of an approach that introduces curated, task-specific data earlier in the process. The aim is to address inefficiencies and flaws in current models, such as the "split-brain problem."

This potential shift is strongly advocated by Amazon's David Luan and others. The core idea is that the current universal training path, which first imparts broad world knowledge to a model (such as poetry or gardening) and only then fine-tunes it for specific tasks (like coding or customer refunds), is not always logically sound. Researchers believe that if the model's final use is already known, curated task-relevant data should be introduced during the pre-training stage to serve the end goal more directly.

If this methodological change is put into practice, it would profoundly reshape the development landscape of the AI industry. Development teams might no longer need to artificially separate the pre-training and post-training stages, and the market would shift from "one universal model for all scenarios" to "dedicated models built on different datasets." The shift would also force developers to select data more strictly at the very start of training, since those choices determine the model's strengths and weaknesses in specific fields.

Signs of such differentiation are already emerging in the market. OpenAI is currently routing ChatGPT queries to different models, and has developed dedicated models like GPT-5-Codex. This strategy reflects the stark contrast between consumer demand for simple chatbots and companies’ pursuit of superintelligence and advanced scientific research (such as Mars colonization or disease treatment). If this path deepens further, OpenAI may have to completely restructure its research teams to accommodate entirely different model training needs.

Reshaping Training Logic: Eliminating General-Purpose Redundancy

Current AI training norms somewhat mimic the human learning process, i.e., accumulating broad foundational knowledge in childhood, then learning specific skills. However, the industry is beginning to reflect on the efficiency of this process. David Luan points out that it is a waste of resources for a model intended for code or customer service to spend vast computational power learning entirely unrelated topics (such as poetry or gardening).

This "wide net" approach to pre-training may seem intuitive, but it leads to technical bottlenecks such as the "split-brain problem," where a model may answer the same question incorrectly simply because it is phrased differently. The new mindset advocates focusing pre-training on curated data that is more relevant to predetermined tasks. Researchers at OpenAI and Thinking Machines Lab agree, and some even suggest eliminating the separate teams for different training stages and merging their staff into a unified training team for better targeting.

The Rise of Dedicated Models and Organizational Restructuring

This transformation will have far-reaching effects on the final form of AI models. Researchers must decide what data to incorporate in the early stages of training, directly determining the model's capabilities. For example, adding more math and coding data while reducing prose data during early training might create an outstanding programming assistant but would sacrifice its abilities in creative writing or emotional engagement.
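The trade-off described above can be illustrated with a minimal sketch of a training data mixture. The domain names, the weights, and both helper functions are hypothetical and chosen only for illustration; they do not describe how OpenAI, Amazon, or Thinking Machines Lab actually build their datasets.

```python
import random
from collections import Counter

# Hypothetical mixtures expressed in relative "parts" per domain.
# The numbers are illustrative only, not taken from any real training run.
GENERAL_MIX = {"prose": 6, "code": 2, "math": 2}
CODING_MIX = {"prose": 2, "code": 5, "math": 3}


def expected_tokens(mix, total_tokens):
    """Split a fixed token budget across domains in proportion to the mixture."""
    total_parts = sum(mix.values())
    return {domain: total_tokens * parts // total_parts
            for domain, parts in mix.items()}


def sample_domains(mix, n, seed=0):
    """Draw n training examples' domains at random according to the mixture."""
    rng = random.Random(seed)
    domains = list(mix)
    weights = [mix[d] for d in domains]
    return Counter(rng.choices(domains, weights=weights, k=n))


if __name__ == "__main__":
    budget = 1_000_000  # hypothetical token budget
    print("general:", expected_tokens(GENERAL_MIX, budget))
    print("coding: ", expected_tokens(CODING_MIX, budget))
    print("sampled:", sample_domains(CODING_MIX, 1000))
```

With the same token budget, the coding-oriented mixture gives code and math most of the data and leaves prose with a small share, which is the kind of early decision that would favor programming ability at the expense of creative writing.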

The result would be an AI market that no longer relies on patching the same pre-trained model, but instead produces a multitude of dedicated models built on different foundational datasets. According to internal information from OpenAI, the company is already aware of this divergence in needs. On one hand, consumers want ChatGPT to answer simple questions and act as a chat companion; on the other, the company is committed to frontier research in reasoning models and superintelligence.

At present, while all models at OpenAI are still based on the same pre-trained model, it has already responded to this complexity with routing technology and specific versions (such as GPT-5-Codex). If the future calls for completely independent models trained for different purposes, it will require the company to thoroughly restructure its research teams.

Hardware Breakthroughs and Capital Investment

While training methods on the software side are undergoing transformation, hardware innovation is accelerating as well, with investors closely watching new technologies that can improve energy efficiency. Photonic chip startup Neurophos has just completed a $110 million Series A round led by Gates Frontier (Bill Gates’ venture arm), with Microsoft’s VC firm M12 also participating.

Neurophos is focused on designing chips that use light instead of electrons to perform AI math computations. According to co-founder and CEO Patrick Bowen, the goal is to deliver a chip by 2028 whose speed and efficiency will be 50 times greater than Nvidia's Blackwell chip. Microsoft executive Marc Tremblay said that modern AI inference places huge demands on power and computing resources, and the industry needs breakthroughs at the computing level.

Meanwhile, OpenAI is also ramping up its infrastructure. CFO Sarah Friar revealed at the World Economic Forum that the company's custom inference chip is now in the "tape-out" stage, the final step before manufacturing. She also noted that the Stargate infrastructure project announced last year, valued at more than $500 billion, is now more than halfway completed and that "progress is beyond imagination," with models already being trained on servers at Oracle's Stargate campus.

Industry Consolidation and Competitive Dynamics

M&A and financing activity in the AI sector remains brisk. According to The Information, Lightning AI, a software firm that customizes AI models, has merged with data center provider Voltage Park, with the new entity valued at over $2.5 billion. Additionally, Yelp has agreed to acquire AI agent startup Hatch for $300 million. Google DeepMind has brought on Hume AI’s CEO and several top engineers through a licensing agreement with the voice AI startup.

At the tech giant level, Bloomberg reports that Apple is in talks with Google to use its cloud infrastructure and TPU chips to roll out a new version of Siri, with plans to introduce AI-powered wearable devices as early as 2027. Nvidia CEO Jensen Huang is reportedly preparing to visit China to try to reestablish a foothold in this strategic market.

On the regulatory and ethics front, Anthropic has released a new "constitution" for its Claude model, which, compared to the original 2023 version, is less prescriptive, gives the model more judgment space, and unusually mentions the possibility of the model possessing some kind of "consciousness" or "moral status." The White House Council of Economic Advisers has issued a report predicting that generative AI will trigger a profound transformation of the U.S. economy and is poised to significantly boost productivity and growth.

Risk Warning and Disclaimer

The market carries risks; investments require caution. This article does not constitute personal investment advice, nor does it take into account the specific investment goals, financial situation, or needs of individual users. Users should consider whether any opinions, views, or conclusions in this article are suitable for their individual circumstances. Investing on this basis is at your own risk.