Claude Code "breaks" after update: thinking depth drops 67%, "can no longer be trusted to handle complex engineering tasks"
Anthropic's AI programming tool Claude Code is facing a serious reputation crisis. AMD's Director of AI has filed a public issue in the official GitHub repository, arguing from a quantitative analysis of nearly 7,000 session logs that Claude Code has degraded systematically since February, with a **67% drop in thinking depth** and widespread anomalies in model behavior. The report quickly set off heated debate in the developer community and put Anthropic in the spotlight.

The analysis was submitted by Stella Laurenzo, head of AMD's AI team. She opened the issue directly on the official GitHub repository and did not soften her language: "**Claude can no longer be trusted to execute complex engineering tasks.**" She said her team had switched to other vendors and warned Anthropic: "Six months ago, Claude stood out for its reasoning quality and execution capability. But now, other competitors need to be taken very seriously and evaluated."

The issue quickly gained traction on Hacker News, drawing 975 upvotes and 548 comments and becoming one of the most active recent threads about Claude Code. Commenters went straight to the point: "*Claude Code once felt like a smart pair programming partner, but now it's more like an overly enthusiastic intern who keeps messing things up and then suggests the simplest quick fixes*"; "Recently it keeps telling me 'You should go to sleep. It's too late, let's stop for today.' At first, I thought I had inadvertently let Claude know my deadline."

Anthropic responded to the concerns. Claude Code team member Boris clarified that the "redact-thinking" feature, which hides thinking content, is purely a UI-level change: "It does not affect the actual reasoning logic of the model internally, nor does it affect the budget for thinking or the underlying reasoning operating mechanism."
He also acknowledged two substantive changes in February and early March: **first, the Feb 9 release of Opus 4.6 introduced "adaptive thinking"; second, on Mar 3 the default "effort" level was lowered from high to medium.** Boris suggested that users can restore high-effort thinking with the `/effort high` command or by editing the config file.

However, **this explanation failed to quell doubts in the community.** Several developers said that even with effort set to the highest level, the model still slacked off and seemed eager to finish the task. User richardjennings stated:

> "I had no idea the default effort had been changed to Medium until just before the cliff-like drop in output quality. It took me about a whole day's work to correct the issues."

## Hard Evidence: Plunge in Thinking Depth, Widespread Behavioral Anomalies

Laurenzo's analysis is based on 6,852 Claude Code session JSONL files her team collected from the `~/.claude/projects/` directory, containing 17,871 thinking blocks, 234,760 tool calls, and more than 18,000 user prompts. The sessions span late January to early April 2026, and all of them used the official Anthropic API connected directly to the Opus model.

**The data shows a clear timeline of degradation.** During the "premium period" from Jan 30 to Feb 8, the median thinking depth of Claude Code was about 2,200 characters. By late February it had fallen to about 720 characters, a drop of 67%, and by early March it had shrunk further to about 560 characters, a 75% decrease from the baseline.

**The collapse in thinking depth directly triggered a fundamental shift in tool usage patterns.** In the premium period, the "read-modify ratio" (the number of file reads before each code edit) was 6.6, reflecting a rigorous "research before modification" workflow. After Mar 8 (the "degraded period"), the ratio fell to 2.0, a 70% reduction in research.
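
The two headline metrics above, median thinking depth and the read-modify ratio, are simple to compute once session events are available. The following is a minimal Python sketch, not Laurenzo's actual analysis script: the flat record layout (`type`, `text`, `name`) and the tool-name sets are assumptions made for illustration, and real session files may be structured differently.

```python
import json
from statistics import median

# Assumed tool names for illustration; adjust to whatever the logs actually contain.
READ_TOOLS = {"Read", "Grep", "Glob"}
EDIT_TOOLS = {"Edit", "Write"}


def session_metrics(jsonl_lines):
    """Compute median thinking length (chars) and file reads per edit.

    Each line is assumed to be a JSON object shaped like
    {"type": "thinking", "text": "..."} or {"type": "tool_use", "name": "Read"}.
    This layout is hypothetical, not a documented schema.
    """
    thinking_lengths = []
    reads = 0
    edits = 0
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        if event.get("type") == "thinking":
            thinking_lengths.append(len(event.get("text", "")))
        elif event.get("type") == "tool_use":
            name = event.get("name")
            if name in READ_TOOLS:
                reads += 1
            elif name in EDIT_TOOLS:
                edits += 1
    return {
        "median_thinking_chars": median(thinking_lengths) if thinking_lengths else None,
        "reads_per_edit": reads / edits if edits else None,
    }


def pct_drop(before, after):
    """Percentage decrease from `before` to `after`."""
    return (before - after) / before * 100


if __name__ == "__main__":
    # Figures quoted in the article: 2,200 -> 720 chars and 6.6 -> 2.0 reads per edit.
    print(round(pct_drop(2200, 720)))  # 67
    print(round(pct_drop(6.6, 2.0)))   # 70
```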
More strikingly, during the degraded period one in every three code changes was made without first reading the target file, leading to frequent basic errors such as code inserted in the wrong place or broken semantic links between comments and the code they describe.

**Quantitative behavioral metrics are equally alarming.** The stop hook script (`stop-phrase-guard.sh`), which monitors behaviors such as shirking responsibility, early termination, and permission requests, had never triggered before Mar 8. In the following 17 days, **it was triggered 173 times, averaging 10 times per day.** Negative sentiment in user prompts rose from 5.8% to 9.8%, an increase of 68%, and the user interruption rate (how often users forcibly stop the model after an error) climbed 12-fold from the premium period to the later period.

## Hidden "Redact Thinking" Feature: Was the Degradation Deliberately Concealed?

Laurenzo's analysis points to a strong temporal correlation between this degradation and the deployment timeline of a feature called `redact-thinking-2026-02-12`. The data shows that the feature began a phased rollout on Mar 5 (1.5% of requests), covered more than 99% of requests by Mar 10–11, and was fully active from Mar 12.

The feature strips thinking content from API responses, so users can no longer observe the model's actual reasoning process from the outside. Laurenzo argues that, whatever the intent, this design makes the decline in thinking depth invisible to users: "*the conceal feature rolled out in early March just rendered this degradation invisible to users.*"

She further notes that the decline in thinking depth began before this feature's rollout, in mid-February. This lines up with the introduction of Opus 4.6 and "adaptive thinking" on Feb 9 and the change of the default thinking level to "medium effort" (effort=85) on Mar 3.
The report also found that **after the redaction feature was rolled out, thinking depth fluctuated clearly by time of day**: 17:00 and 19:00 Pacific Time were the two weakest hours, with estimated median thinking depth of about 423 and 373 characters respectively (17:00 is the end of the workday on the US West Coast). This pattern does not match a fixed allocation. It looks more like a load-sensitive dynamic allocation system, suggesting that thinking resources may fluctuate in real time with platform load.

## Official Anthropic Response: Settings Issue, Not Model Degradation

As the GitHub issue spread, Claude Code team member Boris responded within hours on both GitHub and Hacker News, acknowledging some problems and offering detailed technical explanations.

Boris's main clarifications include:

- First, the "redact-thinking" feature is a UI-layer change that doesn't affect actual reasoning; users can set `showThinkingSummaries: true` in `settings.json` to display thinking summaries again;
- Second, the decline in thinking depth in late February was mainly due to the "adaptive thinking" mechanism introduced on Feb 9 and the switch of the default effort to medium on Mar 3. The former can be turned off with `CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1`; the latter can be raised manually with `/effort high` or `/effort max`.

Boris added that the team plans to test setting the default effort to high for Teams and Enterprise users and is investigating reported cases where adaptive thinking under-allocates reasoning on some turns.

However, this explanation still drew widespread skepticism. User koverstreet replied:

> "The problem is much more than just the default effort now being medium. Even with effort set to max, the model obviously now 'slacks off' a lot more."

Some users pointed out that the original reporter had already applied every publicly known setting, so the issue is not misconfiguration. One user asked sarcastically:

> "What kind of attitude is this, telling users 'you set it up wrong'?"
## Cost Avalanche and User Exodus

The degradation has cost more than quality; it has also driven a dramatic increase in spending. Laurenzo's data shows that from February to March her team's number of user prompts was nearly flat (5,608 vs. 5,701), yet API requests surged 80-fold, total input tokens increased 170-fold, output tokens rose 64-fold, and the estimated monthly cost (using Bedrock Opus pricing) ballooned from $345 to $42,121, a 122x increase.

Laurenzo explained that part of the cost spike came from deliberately scaling up concurrent agents, but the degradation also caused inefficient loops, frequent interruptions, and retries, inflating API requests per unit of output by a factor of 8 to 16. The team was eventually forced to shut down its agent cluster and revert to manually supervised single sessions. Laurenzo wrote:

> "Human labor hardly changed, yet the model consumed 80x more API requests and 64x more output tokens to generate noticeably worse results."

In the Hacker News thread, many users reported similar experiences, and some announced that they had switched to OpenAI Codex or other alternatives: "I've canceled my subscription and moved to Codex"; "Now using Qwen3.5-27b. Not as sharp as Opus two months ago, but at least we can work normally again."

## User Self-Help: Temporary Coping Strategies

Faced with the degradation, some developers have worked out temporary ways to cope.

The most common method is to grant authorization explicitly in `CLAUDE.md`: writing directives such as "You have permission to edit any file in this project" or "Don't ask for confirmation on refactorings" in the project's root config reportedly reduces the frequency of safety interruptions by about 70% in practice.

Breaking complex tasks into clearly bounded subtasks has also proven effective. Compared with "Refactor the entire authentication system", a directive like "Just refactor auth.js and summarize the changes after" more reliably prevents premature termination.
In terms of settings, raising effort to high or max and disabling adaptive thinking via `CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1` is currently the officially acknowledged fix.

Laurenzo's report makes a more systematic appeal: Anthropic should disclose how thinking tokens are allocated, offer a "full-capacity thinking" subscription tier for complex engineering workflows, and expose a `thinking_tokens` field in API responses so that users can monitor reasoning depth themselves.
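
The `showThinkingSummaries` setting named in Boris's response can also be applied programmatically. The following is a minimal Python sketch, assuming the settings file is plain JSON at a path you supply. The helper name `enable_thinking_summaries` is hypothetical; only the key name comes from the response quoted in this article.

```python
import json
from pathlib import Path


def enable_thinking_summaries(settings_path):
    """Set showThinkingSummaries to true in a JSON settings file.

    Existing keys are preserved. A missing or empty file is treated as {}.
    The file location is supplied by the caller; no default path is assumed here.
    """
    path = Path(settings_path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    settings = json.loads(text) if text.strip() else {}
    settings["showThinkingSummaries"] = True  # key name as quoted in the article
    path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings
```

The other two adjustments are made outside this file: `CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1` is an environment variable, and `/effort high` is a command entered in the Claude Code session.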