The AI industry has found its next buzzword: agentic. After the generative AI hype cycle began its descent into Gartner's "trough of disillusionment," the marketing machine pivoted. The message now: LLMs aren't just conversational—they're autonomous. They don't just respond—they act. They don't just generate—they execute.
The money followed the narrative. Agentic AI investment surged 265% between Q4 2024 and Q1 2025. The AI agents market is projected to grow from $7.6 billion in 2025 to $47.1 billion by 2030. Startups raised $3.8 billion in 2024 alone—nearly tripling from the previous year.
And yet, by every meaningful measure, the approach is failing.
Gartner predicts that over 40% of agentic AI projects will be cancelled by end of 2027 due to escalating costs, unclear business value, or inadequate risk controls. Of thousands of vendors claiming agentic AI capabilities, only approximately 130 genuinely deliver such solutions. The rest engage in what analysts now call "agent washing"—rebranding existing products without substantial agentic capabilities.
This isn't a temporary growing pain. It's a fundamental architectural mistake. We're trying to bolt autonomy onto systems that lack the cognitive prerequisites for autonomy. We're building elaborate prosthetics for minds that don't exist.
The Core Problem: Autonomy Without Cognition
In my previous piece, "The Orchestration Fallacy," I argued that orchestration platforms like n8n and Zapier are prosthetic scaffolding for cognitively incomplete AI. They exist because current AI systems lack persistent memory, belief systems, and goal-directed behaviour.
Agentic AI makes the same mistake at a different layer. Instead of admitting that LLMs lack the prerequisites for genuine autonomy, the industry bolted on "tool use" and "function calling" and declared the problem solved. The agent loop—observe, plan, act, repeat—became the new paradigm.
But a loop isn't a mind. And execution isn't intelligence.
What Agents Actually Need (But Don't Have)
Yann LeCun, Meta's chief AI scientist and Turing Award winner, has been characteristically blunt about this. In lectures throughout 2024, he identified the key characteristics of intelligent behaviour that LLMs lack: "the capacity to understand the world, understand the physical world, the ability to remember and retrieve things, persistent memory, the ability to reason and the ability to plan."
He went further in 2024-2025, telling PhD students there's "no point working on LLMs as they currently stand" if you're interested in human-level intelligence. He's called them "basically an off-ramp, a distraction, a dead end."
LeCun's alternative—the "world model" paradigm—aims to build AI that internally simulates the physical world and predicts how it changes over time. Whether his specific approach succeeds is beside the point. What matters is his diagnosis: LLMs are word models, not world models. They predict tokens, not reality.
Agentic AI tries to solve this by wrapping word models in action loops. It's like giving a brilliant amnesiac a to-do list and calling them autonomous.
The Compounding Error Catastrophe
Here's where the mathematics becomes brutal.
Research from multiple sources has converged on a fundamental problem: if each step in an agentic workflow has a 90% success rate, a 5-step workflow drops to 59% overall success. A 10-step workflow drops to 35%.
It gets worse. As VentureBeat reported, an agent with just a 1% error rate per step compounds to a 63% chance of failure by the hundredth step.
DeepMind's Demis Hassabis has called this "compound interest in reverse." Gary Marcus, the cognitive scientist and persistent AI critic, quoted Hassabis: "If your AI model has a 1% error rate and you plan over 5,000 steps, that 1% compounds like compound interest."
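The arithmetic is easy to check. Here's a minimal sketch, assuming independent steps with a fixed per-step success probability, which is the model behind the figures above:

```python
# Compounding error: if each of n steps succeeds independently with
# probability p, the whole workflow succeeds with probability p**n.

def workflow_success(p_step: float, n_steps: int) -> float:
    """Probability that every one of n independent steps succeeds."""
    return p_step ** n_steps

for p, n in [(0.90, 5), (0.90, 10), (0.99, 100), (0.99, 5000)]:
    success = workflow_success(p, n)
    print(f"p={p:.2f}, steps={n:5d}: success={success:6.1%}, failure={1 - success:6.1%}")

# p=0.90, steps=    5: success= 59.0%, failure= 41.0%
# p=0.90, steps=   10: success= 34.9%, failure= 65.1%
# p=0.99, steps=  100: success= 36.6%, failure= 63.4%
# p=0.99, steps= 5000: success=  0.0%, failure=100.0%
```

Under that assumption, reliability decays exponentially with chain length. Improving each step from 90% to 99% buys a longer runway, not an escape.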
The data from actual deployments confirms this. Analysis of agent trajectories reveals that 73% of task failures stem from cascading errors, where a single root cause triggers multiple downstream failures. The average failed trajectory contains 3.7 compounded errors.
This isn't a bug to be fixed with better prompting. It's an inherent property of systems that:
- Don't verify intermediate steps — LLMs can't reliably self-evaluate
- Don't maintain coherent state — Each step effectively starts fresh
- Don't learn from failures — The same mistakes repeat indefinitely
- Can't recover gracefully — No mechanism for backtracking or replanning
The Stanford-Harvard Paper: Agents Don't Fail Because They Lack Intelligence
A December 2024 paper from researchers at Stanford, Princeton, Harvard, and the University of Washington offers the most damning analysis yet. Titled "Adaptation of Agentic AI", it examines why agentic systems "feel impressive in demos and then completely fall apart in real use."
Their core finding: agents don't fail because they lack intelligence. They fail because they don't adapt.
The paper documents that modern agents are "incredible when everything goes smoothly, and astonishingly brittle the moment the real world gets noisy." APIs fail. Search results drift. Code throws exceptions. Memory gets cluttered. Long-running tasks fail under compounding errors.
The researchers identify a fundamental architectural flaw: most agents are built to execute plans, not revise them. They assume the world stays stable, tools work as expected, and goals remain valid. Once any of that changes, the agent keeps going anyway—"confidently making the wrong move over and over."
Key insights from the paper:
- Rigid tool use is a hidden failure mode: Agents that treat tools as fixed options get stuck, while agents that can re-rank, abandon, or switch tools based on feedback perform far better (see the sketch after this list).
- Memory beats raw reasoning: Agents that store short, structured lessons from past successes and failures outperform agents that rely on longer chains of reasoning.
- Execution without adaptation is just automation with better marketing.
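To make the tool-use finding concrete, here is a minimal sketch of feedback-driven tool re-ranking. It illustrates the general idea only; the tool names, the simulated failure rate, and the scoring rule are assumptions for the example, not the paper's implementation:

```python
# Illustrative sketch of feedback-driven tool re-ranking, not the paper's
# implementation. Tool names, the simulated failure rate, and the update
# rule are all assumptions made for the example.
import random
from collections import defaultdict

class AdaptiveToolSelector:
    def __init__(self, tools):
        self.tools = list(tools)
        self.score = defaultdict(lambda: 1.0)   # optimistic prior for every tool

    def choose(self) -> str:
        # Re-rank on every call rather than committing to one tool at plan time.
        return max(self.tools, key=lambda t: self.score[t])

    def report(self, tool: str, succeeded: bool) -> None:
        # Exponential moving average of outcomes: tools that keep failing
        # sink in the ranking, so the agent switches instead of retrying.
        self.score[tool] = 0.7 * self.score[tool] + 0.3 * (1.0 if succeeded else 0.0)

selector = AdaptiveToolSelector(["search_api", "cached_index", "site_scraper"])
for _ in range(20):
    tool = selector.choose()
    # Simulate a degraded search API: it fails 80% of the time.
    succeeded = random.random() < (0.2 if tool == "search_api" else 0.9)
    selector.report(tool, succeeded)

print(selector.choose())   # almost always a tool other than the degraded one
```

The detail that matters isn't the update rule; it's that tool choice gets revisited on every step instead of being frozen at plan time.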
The WebArena benchmark confirms this brittleness empirically. Even when tasks are derived from the same underlying template, GPT-4 succeeds consistently on only 4 out of 61 templates. Similar brittleness appears across other benchmarks—Mind2Web and BrowserGym both show sharp performance drops when environments shift slightly.
The Hallucination Problem Metastasizes
If you thought hallucination was bad for single-turn LLM interactions, it's catastrophic for agentic systems.
A comprehensive September 2025 survey titled "LLM-based Agents Suffer from Hallucinations" offers the first taxonomy of agent-specific hallucinations. The authors note that "despite their remarkable potential, LLM-based agents remain vulnerable to hallucination issues, which can result in erroneous task execution and undermine the reliability of the overall system design."
The key finding: hallucinations accumulate and amplify over time in agentic systems. "Hallucinations may initially appear as minor issues, but their iterative accumulation can ultimately lead to severe consequences."
The paper identifies multiple types of agent hallucination:
- Memory hallucination: The agent "remembers" things that never happened
- Execution hallucination: The agent confidently invokes outdated or invalid tools while believing execution succeeded
- Planning hallucination: The agent constructs plans that require nonexistent capabilities
Research on spatial reasoning in LLMs, published in Nature Scientific Reports, demonstrates this concretely. When tasked with path planning in maze environments, LLMs "often imagined a non-existent path to cross obstacles and reach the destination in a straight line." This isn't a reasoning error; it's a fabrication presented with confidence.
The problem compounds in agentic loops. A 2024 paper on TravelPlanner benchmarks found that "giving more shots in the context window may distract the LLM and lead to hallucination (e.g., using entities that do not exist in the given reference information)." More context, more confusion.
The Stochastic Parrot Redux
The "stochastic parrot" critique has become polarizing, but its core insight remains valid. The term was coined in the landmark 2021 paper "On the Dangers of Stochastic Parrots" by Emily Bender, Timnit Gebru, and colleagues. They argued that large language models generate realistic-sounding language but "do not truly understand the meaning of the language they are processing."
The debate has evolved since then. Geoffrey Hinton argues that "to predict the next word accurately, you have to understand the sentence." Some benchmarks suggest GPT-4 exhibits compositional generalization beyond mere pattern matching.
But here's what matters for agentic AI: even if LLMs do exhibit genuine understanding, they lack the other cognitive prerequisites for autonomy.
A 2025 arXiv study titled "The Stochastic Parrot on LLM's Shoulder" tested state-of-the-art models including GPT-4o, o1, and Gemini 2.0 on physical concept understanding. The results: LLMs lag behind humans by approximately 40% on these tasks. The models "fail on grid tasks but can describe and recognize the same concepts well in natural language."
This is the essential gap: LLMs can talk about the world far better than they can operate in it. Agentic AI assumes the gap doesn't matter—that reasoning about actions is sufficient for taking actions. The evidence says otherwise.
The ROI Catastrophe
Let's talk money.
An August 2025 MIT Media Lab report stated that "despite $30–40 billion in enterprise investment into Generative AI, 95% of organizations are getting zero return."
The broader enterprise AI statistics are equally grim. According to S&P Global data, the share of companies abandoning most of their AI projects jumped to 42% in 2025—from just 17% the year prior—with cost and unclear value cited as top reasons.
The average organization abandoned 46% of its AI proofs of concept before they reached production. An equally brutal statistic: 88% of AI pilots never make it to production, meaning only about 1 in 8 prototypes becomes an operational capability.
IBM's Institute for Business Value found that enterprise-wide AI initiatives achieved an ROI of just 5.9%, well below the roughly 10% cost of capital those same investments carried. In other words: the returns don't even cover the cost of the money invested.
This isn't a failure of implementation. It's a failure of premise.
The Bubble Acknowledged
Even the industry is starting to admit the problem.
OpenAI CEO Sam Altman acknowledged in August 2025: "Are we in a phase where investors as a whole are overexcited about AI? My opinion is yes." He compared the current moment to the dot-com bubble, noting "when bubbles happen, smart people get overexcited about a kernel of truth."
MIT economist Daron Acemoglu, who received the 2024 Nobel Memorial Prize in Economic Sciences, was more direct: "These models are being hyped up, and we're investing more than we should. I have no doubt that there will be AI technologies that will come out in the next ten years that will add real value and add to productivity, but much of what we hear from the industry now is exaggeration."
The Bank of England warned of growing risks of a global market correction due to possible overvaluation of leading AI tech firms. OpenAI's valuation more than tripled, from $157 billion in October 2024 to $500 billion a year later, even as its projected cumulative losses through 2028 reached $44 billion.
Goldman Sachs analysts found that hyperscaler companies have taken on $121 billion in debt over the past year—more than 300% above typical levels. The financial engineering is as aggressive as the marketing.
Gary Marcus Was Right (Again)
Gary Marcus, the cognitive scientist who has been AI's most persistent critic, made this prediction on January 1, 2025: "AI 'Agents' will be endlessly hyped throughout 2025 but far from reliable, except possibly in very narrow use cases."
Marcus cites the CMU benchmark TheAgentCompany, which showed failure rates of 70% on some tasks. His conclusion: "a lot of what's being sold as 'agentic AI' is just old automation, wrapped in new language and a nicer UI."
He also quotes MIT Professor Armando Solar-Lezama on AI coding agents: they're "like a brand new credit card here that is going to allow us to accumulate technical debt in ways we were never able to do before."
The agent paradigm isn't solving the reliability problem. It's amplifying it.
The Fundamental Mistake
Here's the uncomfortable truth that the industry doesn't want to acknowledge:
You cannot achieve genuine autonomy by wrapping a stateless reasoning engine in an action loop.
Autonomy requires:
- Persistent memory — Not a vector database bolted on, but epistemically-grounded memory that tracks confidence, provenance, and truth categories (sketched after this list)
- Belief systems — The ability to hold views about the world, update them rationally when evidence demands, and maintain consistency
- Goal commitment — Pursuing objectives across sessions, not just responding to prompts
- Continuous learning — Actually improving from experience, not just executing predefined patterns
- Adaptation — Detecting when reality diverges from assumptions and responding intelligently
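To make "epistemically-grounded memory" less abstract, here is a minimal sketch of what a single memory record might carry. The field names and truth categories are illustrative assumptions, not a published schema:

```python
# Illustrative data structure only: the field names and truth categories are
# assumptions for this sketch, not a published schema.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum

class TruthCategory(Enum):
    FACT = "fact"              # externally verified
    INFERENCE = "inference"    # derived by the system itself
    OPINION = "opinion"        # judgement or preference, not checkable
    ASSUMPTION = "assumption"  # adopted provisionally, pending evidence

@dataclass
class MemoryRecord:
    claim: str
    category: TruthCategory
    confidence: float                                      # 0.0 to 1.0, revisable
    provenance: list[str] = field(default_factory=list)    # where the claim came from
    recorded_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def revise(self, new_confidence: float, source: str) -> None:
        """Update confidence and keep the evidence trail that justified the change."""
        self.confidence = new_confidence
        self.provenance.append(source)

record = MemoryRecord(
    claim="The billing API returns amounts in cents",
    category=TruthCategory.INFERENCE,
    confidence=0.6,
    provenance=["observed response, 2025-03-02"],
)
record.revise(0.95, "confirmed against the API documentation")
```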
Current agentic systems have none of these. They have tool-calling interfaces and retry loops. They have prompt engineering and context windows. They have "memory" that's really just retrieval-augmented generation—no understanding of what's true, what's uncertain, what should be revised.
The industry is treating autonomy as a wrapper layer when it's actually a cognitive architecture problem.
What Would Actually Work
If the agentic paradigm is a dead end, what's the alternative?
The theoretical foundations already exist. They've existed for decades:
- BDI (Belief-Desire-Intention) architectures from agent theory in the 1990s (a schematic loop follows this list)
- AGM belief revision from the 1980s—formal frameworks for rational belief update
- Stigmergic coordination from swarm intelligence—agents that coordinate through shared memory traces rather than central orchestration
- Commitment strategies that determine when to persist with goals and when to abandon them
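For a sense of how spare the core idea is, here is a schematic deliberation loop in the BDI style. It is heavily simplified, and the method names and single-intention commitment policy are choices made for this sketch rather than a faithful rendering of any particular BDI system:

```python
# Schematic BDI (Belief-Desire-Intention) deliberation loop, heavily
# simplified. Method names and the single-intention commitment policy are
# choices made for this sketch, not a faithful rendering of any BDI system.

class BDIAgent:
    def __init__(self, beliefs: dict, desires: list):
        self.beliefs = beliefs      # what the agent currently takes to be true
        self.desires = desires      # candidate goals, in priority order
        self.intention = None       # the goal the agent has committed to

    def step(self, percept: dict):
        self.beliefs.update(percept)                       # 1. belief revision
        if self.intention is None or self.blocked(self.intention):
            self.intention = self.deliberate()             # 2. (re)commit to a goal
        if self.intention is None:
            return None                                    # nothing achievable right now
        return self.next_action(self.intention)            # 3. means-ends reasoning

    def deliberate(self):
        # Commitment strategy: pick the highest-priority desire still achievable.
        return next((g for g in self.desires if not self.blocked(g)), None)

    def blocked(self, goal: str) -> bool:
        return self.beliefs.get(f"{goal}_blocked", False)

    def next_action(self, goal: str) -> str:
        return f"work_towards:{goal}"

agent = BDIAgent(beliefs={}, desires=["ship_report", "tidy_backlog"])
print(agent.step({}))                              # work_towards:ship_report
print(agent.step({"ship_report_blocked": True}))   # switches: work_towards:tidy_backlog
```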
These aren't science fiction. They're established AI theory that the LLM hype cycle caused the industry to forget.
The path forward isn't better prompting or more sophisticated tool use. It's giving AI systems genuine cognitive capabilities:
- Memory systems that constitute beliefs, not just store data
- Truth categorization — distinguishing facts from opinions from inferences
- Confidence tracking — knowing what you don't know
- Goal persistence across sessions with rational abandonment conditions
- Learning from outcomes — not just fine-tuning, but genuine belief revision (a small sketch follows)
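One way to ground "learning from outcomes": track, for each belief, how often acting on it actually worked, and let that record drive confidence. Here is a minimal sketch using a simple Beta-count rule; it's one possible mechanism, not a claim about any particular system's internals:

```python
# Outcome-driven belief revision sketched as a simple Beta-count rule.
# One possible mechanism among many; not a description of any real system.

class OutcomeTrackedBelief:
    def __init__(self, claim: str):
        self.claim = claim
        self.agreements = 1      # Beta(1, 1) prior: start maximally uncertain
        self.contradictions = 1

    def record_outcome(self, held: bool) -> None:
        """Each time the belief is acted on, log whether reality agreed with it."""
        if held:
            self.agreements += 1
        else:
            self.contradictions += 1

    @property
    def confidence(self) -> float:
        return self.agreements / (self.agreements + self.contradictions)

belief = OutcomeTrackedBelief("retrying the payment endpoint resolves 503 errors")
for held in (True, False, False, False):
    belief.record_outcome(held)

print(f"{belief.confidence:.2f}")   # 0.33: the evidence is pushing this belief down
```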
This is what I've been working on with Simplex and the Mnemonic Hive architecture. Not because I think I've solved the problem, but because I think the industry is solving the wrong problem.
The Investment Thesis That Should Worry You
If you're betting on agentic AI as currently implemented, you're making three implicit assumptions:
- That execution loops can compensate for the lack of genuine cognition
- That compounding errors are an engineering problem, not an architectural one
- That autonomy is a feature you can add, rather than a capability that emerges from cognitive architecture
All three assumptions are wrong.
The winning bet isn't on better agent frameworks. It's on the cognitive infrastructure that makes agent frameworks unnecessary—systems that don't need external orchestration because they have internal cognitive architecture.
This isn't a 2035 problem. It's happening now. The 40% cancellation rate Gartner predicts isn't because the implementations were bad. It's because the paradigm is wrong.
Conclusion: The Fallacy
The agentic AI fallacy is the belief that you can achieve autonomy by bolting action loops onto systems that lack the prerequisites for autonomous behavior.
It's the belief that tool calling is equivalent to goal pursuit.
It's the belief that retrieval-augmented generation is equivalent to memory.
It's the belief that bigger context windows are equivalent to understanding.
It's the belief that better prompting is equivalent to reasoning.
The industry has spent billions on this belief. The ROI data says it's not working. The academic research says it can't work—not in its current form.
The alternative isn't giving up on AI autonomy. It's recognizing that autonomy requires cognitive architecture, not wrapper layers. It requires systems that can remember, believe, learn, and commit—not just execute, retry, and hallucinate.
The fallacy isn't wanting autonomous AI. The fallacy is thinking we already have the foundations for it.
We don't. And the sooner we admit that, the sooner we can build something that actually works.
Sources
Academic Papers
- Adaptation of Agentic AI — Stanford, Princeton, Harvard, University of Washington (December 2024)
- LLM-based Agents Suffer from Hallucinations: A Survey — arXiv (September 2025)
- The Stochastic Parrot on LLM's Shoulder — arXiv (February 2025)
- Mitigating Spatial Hallucination in LLMs for Path Planning — Nature Scientific Reports (March 2025)
- Can We Rely on LLM Agents to Draft Long-Horizon Plans? — arXiv (August 2024)
- On the Dangers of Stochastic Parrots — Bender, Gebru et al. (2021)
- AI Agents vs. Agentic AI: A Conceptual Taxonomy — arXiv (May 2025)
Industry Analysis
- Gartner: 40% of Agentic AI Projects Will Be Canceled by 2027
- Why Agentic AI Projects Fail — Harvard Business Review
- This Is How the AI Bubble Bursts — Yale Insights
- AI Agents Fail 63% of the Time on Complex Tasks — VentureBeat
- AI Bubble — Wikipedia (comprehensive sourcing)
- AI ROI: The Paradox of Rising Investment and Elusive Returns — Deloitte
Expert Commentary
- AI Agents Have Been a Dud — Gary Marcus
- 25 AI Predictions for 2025 — Gary Marcus
- Why Yann LeCun Believes LLMs Will Be Obsolete — Newsweek
This is a companion piece to "The Orchestration Fallacy." I work on cognitive architectures for AI systems, including persistent memory systems and autonomous agent frameworks at Senuamedia.