Leading AI Forecaster: I Underestimated the Speed of AI Again. Automating AI Research and Development by the End of This Year Is a Real Possibility.
```
The speed at which AI capabilities are leaping forward is catching even the most rigorous forecasters off guard.
Renowned AI forecasting researcher Ajeya Cotra recently admitted publicly that her forecast of AI progress for 2026, released just two months ago, already looks notably conservative. What triggered the revision was the performance of Anthropic’s latest model, Claude Opus 4.6, on the METR benchmark: the model’s software engineering "time span" (the length of task, measured in human working time, that it completes with a 50% success rate) reached about 12 hours. That is already half of the roughly 24 hours Cotra had predicted for the end of 2026, with about ten months of the year still to run.
Even more striking, Cotra has reaffirmed her probability estimate for "full automation of AI research and development." She maintains a 10% probability that by the end of this year AI will fully take over research ideation and execution with no human intervention, and states plainly: "This is the first time I cannot find any stable trend to extrapolate, to claim it won’t happen soon." The remark has attracted wide attention in AI forecasting circles.
Cotra previously led AI safety research funding at Coefficient Giving, one of the world’s largest funders of AI safety work, and now works at METR, an organization focused on evaluating AI capabilities.
Prediction Miss: Judgments from Two Months Ago are Already Outdated
On January 14 this year, Cotra projected, based on the 2019–2025 historical trend of slightly fewer than two doublings per year, that by the end of 2026 the top model’s 50%-success-rate "time span" would be around 24 hours, with an 80th-percentile estimate of 40 hours.
However, only about two months after she published the forecast, Opus 4.6 was assessed as having a time span of about 12 hours. In the METR test suite, of the 19 software engineering tasks estimated to take a human more than 8 hours, Opus 4.6 at least partially completed 14 and consistently solved 4. Cotra conceded that, with a full ten months of progress still ahead, the idea that AI agents would still be failing half the time at 24-hour tasks by year’s end was "no longer believable."
Notably, Cotra also cautioned that current time span estimates carry significant uncertainty: the 95% confidence interval for Opus 4.6 runs from 5.3 hours to 66 hours, partly because long tasks are rare, human completion times are often themselves estimates, and the benchmark is nearly saturated.
Capability Boundaries: Traditional Evaluation Frameworks are Breaking Down
As AI agents approach and even surpass tasks that take several dozen hours, Cotra believes the very concept of “time span” is being challenged.
She notes that tasks become far easier to decompose as they grow: a one-hour debugging task is nearly impossible to split up for parallel work; a one-day development task can be divided only with difficulty and with fuzzy boundaries; a month-long project lends itself naturally to decomposition into multiple parallel subtasks. Once AI agents can reliably complete 80-hour tasks, a "manager-level AI" could in theory assign tasks while "execution-level AIs" work on them in parallel, steadily advancing a project of any scale.
Cotra’s colleague Tom has therefore suggested measuring “intrinsic difficulty” by the calendar time a large team needs to complete a task rather than by single-person working hours. Cotra believes that once AI reaches this scale, the “single-person time” metric may start to grow super-exponentially, making it extremely hard to estimate the upper limit of software engineering capability by year’s end.
She also acknowledges that large-scale task decomposition will not work perfectly in practice: participants’ intuitive grasp of a project’s background is hard to replace fully with Jira tickets or Asana tasks. Still, she believes that for a fairly large class of software projects this approach "may be surprisingly effective."
Critical Juncture: AI Research Automation May Become Reality This Year
Among all the predictions, the most closely watched is Cotra’s estimate for “full automation of AI research and development.”
She defines this milestone as AI systems fully undertaking research ideation and execution with no human involvement. In her January forecast she put the probability at 10%, and several fellow AI forecasters told her the number was too high. After Opus 4.6’s results came out, she said 10% “again feels in the reasonable range.”
Cotra remains cautious. She points out that fully automated AI R&D requires not only software engineering capability but also breakthroughs in “research judgment” and “creativity,” areas where current AI systems still clearly lag behind human researchers. She considers it far more likely that the goal will be reached within the next three to five years than within this year.
But her tone has fundamentally shifted: “This is the first time I cannot find any stable trend to extrapolate, to claim it won’t happen soon.”
```
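The extrapolation arithmetic behind the forecast miss can be sketched in a few lines of Python. This is a minimal illustration, not Cotra's or METR's actual method: it assumes a constant doubling time, takes the article's figures as inputs (a 12-hour time span now, about ten months left in 2026, a 24-hour median forecast and a 40-hour 80th-percentile forecast), and uses hypothetical function names and illustrative doubling times.

```python
import math

# Figures quoted in the article; treated here as point estimates.
CURRENT_HOURS = 12.0      # assessed time span of Opus 4.6
MONTHS_LEFT = 10.0        # approximate time remaining until the end of 2026
MEDIAN_FORECAST = 24.0    # January median forecast for end of 2026
P80_FORECAST = 40.0       # January 80th-percentile forecast


def project_time_span(current_hours, months_ahead, doubling_months):
    """Extrapolate a time span forward, assuming a constant doubling time."""
    return current_hours * 2 ** (months_ahead / doubling_months)


def required_doubling_months(current_hours, target_hours, months_ahead):
    """Doubling time (in months) at which growth lands exactly on target_hours."""
    return months_ahead / math.log2(target_hours / current_hours)


# How slow would progress have to be for each January forecast to hold?
for label, target in (("median", MEDIAN_FORECAST), ("80th percentile", P80_FORECAST)):
    months = required_doubling_months(CURRENT_HOURS, target, MONTHS_LEFT)
    print(f"{label} forecast ({target:.0f} h) needs a doubling time of {months:.1f} months")

# Year-end projections under a few illustrative (hypothetical) doubling times.
for doubling in (4.0, 7.0, 10.0):
    hours = project_time_span(CURRENT_HOURS, MONTHS_LEFT, doubling)
    print(f"doubling every {doubling:.0f} months -> about {hours:.0f} h by year end")
```

Under these assumptions, the 24-hour median forecast would hold only if the doubling time stretched to ten months. Because the 12-hour estimate itself carries a very wide confidence interval, the outputs should be read as rough orders of magnitude rather than predictions.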