Beyond Recursive Language Models

Why Simplex Will Never Need RLM

A recent paper introduces Recursive Language Models (RLM), an inference-time strategy for handling inputs 100x longer than a model's context window. It's clever engineering. It's also a solution to a problem that Simplex's architecture has already solved through fundamentally different means.

What RLM Does

The RLM paper presents an approach where an LLM treats its input prompt as an "environmental object" it can programmatically examine. Instead of feeding 10 million tokens directly into the model, RLM:

  1. Loads the context as a Python variable
  2. Writes code to chunk, filter, and examine segments
  3. Makes recursive LLM calls on relevant portions
  4. Aggregates results and returns an answer
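In rough code terms, the loop looks something like the Python sketch below. This is an illustration of the pattern, not the paper's implementation: the real RLM lets the model write its own chunking and filtering code, and llm() here is just a placeholder for an actual inference call.

# Illustrative sketch of RLM-style recursive decomposition (not the paper's code).
# llm() is a placeholder for a real inference call.

def llm(prompt: str) -> str:
    raise NotImplementedError("stand-in for an actual model call")

def rlm_answer(query: str, context: str,
               chunk_size: int = 8000, depth: int = 0, max_depth: int = 3) -> str:
    # Base case: the context fits (or the recursion limit is hit), so answer directly.
    if len(context) <= chunk_size or depth >= max_depth:
        return llm(f"Context:\n{context}\n\nQuestion: {query}")

    # Split the oversized context and recurse on each chunk.
    chunks = [context[i:i + chunk_size] for i in range(0, len(context), chunk_size)]
    partials = [rlm_answer(query, c, chunk_size, depth + 1, max_depth) for c in chunks]

    # One more call aggregates the partial answers.
    return llm("Partial answers:\n" + "\n".join(partials) +
               f"\n\nCombine these into one answer to: {query}")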

The results are impressive: RLM handles inputs far exceeding native context windows while maintaining quality and often reducing cost compared to context condensation approaches.

But here's the thing: RLM is a workaround for a fundamental architectural limitation. It doesn't solve the context problem—it routes around it with clever infrastructure.

The Context Window Trap

The entire RLM approach assumes a specific paradigm:

"We have a giant model with a fixed context window. How do we feed it more than it can handle?"

This framing reveals the limitation: LLMs are stateless. Every interaction requires re-sending all relevant context. The model has no memory between calls. It cannot learn from experience within a session. Each invocation starts from zero.

RLM's solution? Make the model call itself recursively, decomposing context into digestible chunks. It's turtles all the way down—except the turtles are expensive API calls.

Simplex: Memory Over Context

Simplex's Cognitive Hive AI (CHAI) architecture doesn't have this problem because it doesn't rely on context windows for continuity. Instead, it uses persistent, structured memory.

The Anima: A Cognitive Core

Every specialist in Simplex has an Anima—a cognitive core with four memory types:

Memory Type | Purpose | RLM Equivalent
Episodic | Specific experiences with timestamps and context | None (RLM processes, then forgets)
Semantic | Generalized facts extracted from experience | Must be re-derived each time
Procedural | Learned skills and patterns | Hardcoded in Python scaffolding
Working | Active context for current task | The context window itself

The Anima remembers. It doesn't need 10 million tokens in context because it has already processed, distilled, and stored what matters. When a new task arrives, relevant memories are recalled—not the raw data, but the understanding derived from it.

specialist DocumentAnalyst {
    anima: {
        // Remembers past analyses, not raw documents
        episodic: { capacity: 500 },
        // Knows document patterns, not document contents
        semantic: { capacity: 2000 },
        // Beliefs with confidence scores
        beliefs: { revision_threshold: 30 },
    },

    fn analyze(doc: Document) -> Analysis {
        // Recall relevant past experiences
        let similar = self.anima.recall_similar(doc.embedding, limit: 5)

        // Use beliefs to guide analysis
        let approach = self.anima.beliefs.get("analysis_strategy")

        // Process with context from memory, not raw context
        let result = infer(doc, context: similar, strategy: approach)

        // Remember this experience for next time
        self.anima.remember(doc.summary, result)

        result
    }
}

HiveMnemonic: Shared Consciousness

Beyond individual memory, Simplex hives share a collective consciousness through HiveMnemonic. When any specialist learns something valuable, it can be promoted to shared memory—available to all specialists in the hive without re-processing.

RLM approach:

Each recursive call processes chunks independently. Knowledge gained in one branch doesn't help other branches. The LLM forgets between calls.

Simplex approach:

One specialist discovers a pattern. It writes to HiveMnemonic. Every other specialist instantly benefits. The hive learns.
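As a rough illustration of that pattern in plain Python (the class and method names here are hypothetical, not the Simplex API):

# Hypothetical sketch of a shared hive memory; not the Simplex API.

class SharedMemory:
    """A key-value store every specialist in a hive can read and write."""

    def __init__(self) -> None:
        self._facts: dict[str, str] = {}

    def promote(self, key: str, fact: str) -> None:
        # A specialist publishes something it has learned.
        self._facts[key] = fact

    def lookup(self, key: str) -> str | None:
        # Any other specialist can use it without reprocessing the source data.
        return self._facts.get(key)

hive = SharedMemory()
hive.promote("invoice_totals", "Vendor X puts invoice totals on the last page")
print(hive.lookup("invoice_totals"))  # available to every specialist immediately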

Why RLM Is Solving the Wrong Problem

1. Context Windows Are a Symptom, Not the Disease

The "long context problem" exists because LLMs are designed as stateless text processors. RLM accepts this limitation and builds infrastructure around it. Simplex rejects the premise entirely.

Humans don't process every conversation by replaying their entire life history. We have memory. We have beliefs formed from experience. We recall relevant context, not all context.

The Anima operates the same way. A 10-million-token document? Process it once, extract beliefs and memories, discard the raw tokens. Next time you need that knowledge, you don't reprocess—you remember.
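A minimal Python sketch of that ingest-once pattern, assuming placeholder summarize() and embed() functions rather than any real Simplex internals:

# Sketch of one-time ingestion: keep distilled summaries and their embeddings,
# discard the raw tokens. summarize() and embed() are placeholders.

from typing import Callable

MemoryStore = list[tuple[list[float], str]]  # (embedding, distilled summary)

def ingest(document: str,
           summarize: Callable[[str], str],
           embed: Callable[[str], list[float]],
           store: MemoryStore,
           chunk_size: int = 4000) -> None:
    for i in range(0, len(document), chunk_size):
        chunk = document[i:i + chunk_size]
        summary = summarize(chunk)               # keep the understanding
        store.append((embed(summary), summary))  # keep a searchable pointer to it
    # Nothing else is retained: the raw text is never sent again.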

2. Recursive Decomposition Is Expensive

RLM makes multiple LLM calls per query. The paper shows this can sometimes be cheaper than context condensation, but it's still expensive. Each recursive call is a full inference pass through a large model.

Simplex uses small, specialized models. The per-hive SLM (simplex-cognitive-7b) is 7 billion parameters, not 70 billion. Memory lookup is a vector similarity search—milliseconds, not seconds. No recursion required.
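The recall step is nothing more exotic than a nearest-neighbor search over stored embeddings; a sketch using NumPy cosine similarity (again illustrative, not the Simplex runtime):

import numpy as np

# Sketch of memory recall as a cosine-similarity search over stored summaries.
def recall(query_vec: np.ndarray,
           store: list[tuple[np.ndarray, str]],
           limit: int = 5) -> list[str]:
    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

    ranked = sorted(store, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [summary for _, summary in ranked[:limit]]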

Operation | RLM Cost | Simplex Cost
Process 10M token document | Multiple GPT-4 calls (~$5-50) | One-time ingestion, then memory (~$0.10)
Query same document again | Full recursive process again | Memory recall (~$0.001)
Share knowledge with other agents | Each agent processes independently | HiveMnemonic: instant, free
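Taking the table's illustrative figures at face value, the gap compounds with every repeated query; a back-of-the-envelope comparison:

# Back-of-the-envelope amortization using the table's illustrative figures.
rlm_per_query = 5.00        # low end of the ~$5-50 range for one recursive pass
simplex_ingest = 0.10       # one-time ingestion cost
simplex_per_query = 0.001   # each subsequent memory recall

for n in (1, 10, 100):
    print(f"{n:>3} queries: RLM ~${rlm_per_query * n:.2f}  "
          f"vs Simplex ~${simplex_ingest + simplex_per_query * n:.2f}")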

3. Python Scaffolding Is Fragile

RLM has the model write Python code to decompose and filter context. This is clever but introduces failure modes:

  • Code generation errors
  • Infinite recursion risks
  • Security concerns (executing model-generated code)
  • Debugging complexity

Simplex's memory operations are first-class language constructs, statically typed and verified at compile time:

// Type-safe memory operations
let memories: List<Memory> = self.anima.recall_for(query, limit: 10)

// Compiler ensures correct usage
for memory in memories {
    match memory.confidence {
        c if c > 0.8 => use_directly(memory),
        c if c > 0.5 => verify_first(memory),
        _ => skip(memory),
    }
}

4. The Model Still Doesn't Learn

After RLM processes a document, what has it learned? Nothing. The model's weights are unchanged. The next query starts fresh.

The Anima's belief system is fundamentally different:

// Process contradictory information
self.anima.observe("deadline is Friday", confidence: 0.9)
// Later...
self.anima.observe("deadline moved to Monday", confidence: 0.95)

// Belief automatically revises based on evidence strength
let deadline = self.anima.beliefs.get("deadline")
// Returns "Monday" with confidence 0.95

The Anima updates its beliefs. It revises based on new evidence. It learns.

The Three-Tier Architecture

Simplex doesn't just have individual memory—it has a hierarchy that mirrors human organizational knowledge:

Specialist Anima (30% threshold)

Personal memories and beliefs. Easily revised. "I think the deadline is Friday."

HiveMnemonic (50% threshold)

Shared team knowledge. Requires more evidence to change. "Our team knows deadlines slip."

Divine (70% threshold)

Organization-wide truth. Highly stable. "Company policy requires 2-week notice for deadline changes."

This hierarchy means knowledge can flow up (individual discovery becomes team knowledge becomes organizational truth) and down (organizational policies inform team practices inform individual behavior). RLM has nothing comparable—each query is isolated.
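One way to picture how the thresholds interact with belief revision, in an illustrative Python sketch (the tier names come from this article; the exact revision rule used here is an assumption):

# Illustrative tiered belief revision. Tier names follow the article; the rule
# that new evidence must clear the tier's threshold is an assumption.

REVISION_THRESHOLD = {"anima": 0.30, "hive_mnemonic": 0.50, "divine": 0.70}

class Belief:
    def __init__(self, value: str, confidence: float, tier: str = "anima") -> None:
        self.value, self.confidence, self.tier = value, confidence, tier

    def observe(self, value: str, confidence: float) -> bool:
        # Higher tiers demand stronger evidence before a belief changes.
        if confidence >= REVISION_THRESHOLD[self.tier] and confidence > self.confidence:
            self.value, self.confidence = value, confidence
            return True
        return False

deadline = Belief("Friday", confidence=0.9, tier="anima")
deadline.observe("Monday", confidence=0.95)  # revised, mirroring the earlier example
print(deadline.value)                        # "Monday"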

What RLM Gets Right

To be fair, the RLM paper makes valid contributions:

  • Inference-time scaling: Showing that compute at inference can compensate for architectural limits
  • Programmatic context access: Treating prompts as data structures, not just strings
  • Emergent strategies: Models learning decomposition approaches without explicit training

These insights are valuable. But they're incremental improvements to a paradigm that Simplex has moved beyond.

The Fundamental Difference

RLM asks: "How do we fit more context into an LLM?"

Simplex asks: "Why do we need all that context in the first place?"

The answer: you don't, if you have memory. You don't need to re-send everything if you can remember. You don't need recursive decomposition if you've already distilled understanding. You don't need to process 10 million tokens if you know what they mean.

Context is what you send when you have no memory.

Simplex has memory. Simplex doesn't need RLM.

Looking Forward

The AI industry will continue building increasingly elaborate workarounds for the stateless LLM paradigm. Longer context windows. Better compression. Recursive processing. Each one a patch on a fundamental limitation.

Simplex takes a different path: cognitive architectures that think like minds, not text processors. Memory that persists. Beliefs that update. Specialists that collaborate and share knowledge.

RLM is an impressive engineering achievement. But from where Simplex stands, it's solving yesterday's problem.

The future isn't bigger context windows. It's systems that don't need them.