
Simplex Toolchain: Native or VM? Why Not Both?

Part 1: An Executive Overview

Most programming languages force a choice: compile to native code for speed, or target a virtual machine for portability. Simplex rejects that tradeoff. You write your code once, then pick the deployment target at build time: a fast native binary, or portable bytecode that runs on any platform where the cursus VM is available.

This post provides a high-level tour of the Simplex toolchain for decision-makers, architects, and anyone curious about how a modern AI-native language is built. Part 2 dives into the technical implementation for developers.

The Big Picture

The Simplex toolchain consists of five integrated components, each written entirely in Simplex itself:

| Tool | Purpose | Think of it as... |
|------|---------|-------------------|
| sxc | Compiler | The translator that converts Simplex code into something computers can run |
| spx | Package Manager | The organizer that manages projects, dependencies, and builds |
| sxdoc | Documentation Generator | The librarian that produces searchable documentation from code |
| cursus | Virtual Machine | The universal player that runs Simplex bytecode on any platform |
| sxlsp | Language Server | The assistant that powers IDE features like autocomplete and error checking |

Together, these tools form a complete development ecosystem—from writing code to deploying production systems.

Native or VM: The User's Choice

Here's the key insight: the same Simplex source code can target either native compilation or bytecode execution. The choice is a build flag, not a language constraint.

Native Compilation (via LLVM)

When you need maximum performance, compile to native machine code:

  • Speed: Native binaries run at full processor speed with no interpretation overhead
  • Cross-platform: Target macOS (ARM64, x86_64), Linux (ARM64, x86_64), or Windows
  • Optimization: Multiple optimization levels from debug builds to aggressive production optimization
  • Deployment: Ship a single executable with no runtime dependencies

Native compilation uses LLVM, the same infrastructure behind Rust, Swift, and Clang. Your Simplex code benefits from decades of compiler optimization research.

Bytecode Execution (via cursus)

When you need flexibility and advanced runtime features, compile to bytecode:

  • Fast startup: There is no native compile-and-link step; the VM loads the bytecode and starts executing it directly
  • Universal portability: The same .sxb file runs on any platform with cursus
  • Checkpointing: Actors can save their state and resume later, even on different machines
  • Migration: Running actors can move between nodes in a cluster without stopping
  • Sandboxing: The VM provides isolation for security-sensitive deployments

Bytecode is essential for Simplex's distributed computing model. When an actor needs to migrate from a failing node to a healthy one, the VM makes this seamless.
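
To make the checkpoint-and-resume idea concrete, here is a minimal Python sketch. It is not how cursus is implemented: the `CounterActor` class, its method names, and the use of JSON as a snapshot format are all assumptions made for illustration.

```python
import json


class CounterActor:
    """Toy actor whose entire state is a single integer (hypothetical)."""

    def __init__(self, count: int = 0) -> None:
        self.count = count

    def receive(self, message: str) -> None:
        # Handle one message; unknown messages are ignored.
        if message == "increment":
            self.count += 1

    def checkpoint(self) -> bytes:
        # Serialize the state. JSON is a stand-in for a real checkpoint format.
        return json.dumps({"count": self.count}).encode("utf-8")

    @classmethod
    def restore(cls, snapshot: bytes) -> "CounterActor":
        # Rebuild an actor from a snapshot, possibly on another machine.
        state = json.loads(snapshot.decode("utf-8"))
        return cls(count=state["count"])


# "Node A" processes two messages, then checkpoints.
actor_on_a = CounterActor()
actor_on_a.receive("increment")
actor_on_a.receive("increment")
snapshot = actor_on_a.checkpoint()

# "Node B" restores from the snapshot and keeps going.
actor_on_b = CounterActor.restore(snapshot)
actor_on_b.receive("increment")
print(actor_on_b.count)  # 3
```

The snapshot is plain bytes, so it can be written to disk or sent over the network. That is the property that lets a failing node hand its actors to a healthy one.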

Why Both Matter

Different deployment scenarios demand different tradeoffs:

| Scenario | Best Target | Why |
|----------|-------------|-----|
| CLI tool or local application | Native | Fast startup, no runtime needed |
| High-performance inference | Native | Every millisecond matters |
| Distributed actor swarm | Bytecode | Checkpointing and migration required |
| Cloud spot instances | Bytecode | Nodes can be terminated; actors must survive |
| Development and testing | Bytecode | Fast iteration, no recompilation |
| Embedded or edge deployment | Native | Minimal resource footprint |

The power is in having both options available from the same codebase.
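
The scenarios above boil down to a simple rule: choose bytecode when you need something only the VM provides, and native otherwise. The Python sketch below encodes that rule. The `Workload` fields and the `choose_target` function are hypothetical names for illustration, not part of the Simplex toolchain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of what a deployment requires."""
    needs_checkpointing: bool = False
    needs_migration: bool = False
    runs_on_preemptible_nodes: bool = False
    is_development: bool = False


def choose_target(workload: Workload) -> str:
    """Return "bytecode" when a VM-only benefit is needed, else "native"."""
    needs_vm = (
        workload.needs_checkpointing
        or workload.needs_migration
        or workload.runs_on_preemptible_nodes
        or workload.is_development
    )
    return "bytecode" if needs_vm else "native"


cli_tool = Workload()
actor_swarm = Workload(needs_checkpointing=True, needs_migration=True)
print(choose_target(cli_tool))     # native
print(choose_target(actor_swarm))  # bytecode
```

A real decision would also weigh latency budgets and resource limits. The sketch only captures the feature-driven part of the choice.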

The Toolchain Components

sxc: The Compiler

The Simplex compiler (sxc) is the heart of the toolchain. It transforms human-readable Simplex code into either native executables or portable bytecode.

What it does:

  • Reads and validates Simplex source files (.sx)
  • Checks types to catch errors before runtime
  • Generates optimized output (LLVM IR for native, .sxb for bytecode)
  • Supports multiple optimization levels for different use cases

Key capabilities:

  • Type inference: The compiler figures out types automatically, reducing boilerplate
  • Generics: Write code once, use it with any type—the compiler generates specialized versions
  • AI primitives: Native support for inference, embeddings, and cognitive constructs
  • Actor verification: Compile-time checks for message-passing correctness

spx: The Package Manager

Modern development requires managing dependencies, organizing projects, and automating builds. That's spx.

What it does:

  • Creates and manages project structure
  • Resolves and downloads dependencies
  • Orchestrates builds across multiple files
  • Runs tests and generates documentation
  • Publishes packages to the registry

Why it matters:

Without a package manager, every project becomes an island. With spx, developers can share code, depend on libraries, and maintain reproducible builds. The lock file ensures that a build today produces the same result as a build next year.
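
To show why a lock file makes builds reproducible, here is a minimal Python sketch. The lock structure, the package name `json-parser`, and the `verify` function are assumptions for illustration; the actual spx lock file format is not described in this post.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of some bytes."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical archive contents for a dependency.
archive = b"pretend this is the json-parser 1.4.2 archive"

# Hypothetical lock data: each dependency is pinned to a version and a hash.
lock = {"json-parser": {"version": "1.4.2", "sha256": sha256_hex(archive)}}


def verify(name: str, version: str, data: bytes, lock: dict) -> bool:
    """Accept a download only if it matches the pinned version and hash."""
    entry = lock.get(name)
    if entry is None:
        return False
    return entry["version"] == version and entry["sha256"] == sha256_hex(data)


print(verify("json-parser", "1.4.2", archive, lock))         # True
print(verify("json-parser", "1.4.2", archive + b"!", lock))  # False
```

Pinning both the version and the content hash means a build fails loudly if a dependency changes, instead of silently picking up different code.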

sxdoc: The Documentation Generator

Good documentation is crucial for adoption. sxdoc extracts documentation from code comments and produces searchable HTML output.

What it does:

  • Parses documentation comments from source files
  • Generates HTML with navigation, search, and cross-references
  • Includes code examples with syntax highlighting
  • Links type signatures to their definitions

Why it matters:

Documentation that lives with the code stays accurate. When developers write a function, they document it in the same file. sxdoc turns those comments into a professional documentation website.

cursus: The Virtual Machine

The Simplex Virtual Machine (SVM), named cursus (Latin for "course" or "journey"), executes bytecode and provides the runtime for distributed computing.

What it does:

  • Executes .sxb bytecode files
  • Manages the actor system (spawning, messaging, supervision)
  • Handles checkpointing for persistence and recovery
  • Coordinates cluster communication for distributed deployment
  • Provides optional JIT compilation for hot paths

Why it matters:

Cursus enables Simplex's distributed computing model. Actors can checkpoint their state, migrate between nodes, and recover from failures—all transparently. This is essential for running on ephemeral cloud infrastructure like spot instances.

sxlsp: The Language Server

Modern developers expect intelligent editor support. The language server provides it.

What it does:

  • Autocomplete: Suggests completions as you type
  • Go to definition: Jump to where a function or type is defined
  • Find references: See everywhere a symbol is used
  • Real-time errors: Highlights problems before you compile
  • Hover information: Shows type signatures and documentation
  • Rename refactoring: Safely rename symbols across files

Why it matters:

Developer productivity depends on tooling. The language server integrates with VS Code, Vim, Emacs, and any editor supporting the Language Server Protocol. Developers get the experience they expect from mature languages.

The Self-Hosting Story

Here's something remarkable: the entire Simplex toolchain is written in Simplex.

This isn't just an interesting technical detail. It is strong evidence of the language's capabilities: a compiler is a large, demanding program, and a language that can build its own compiler has already handled much of what a production system requires.

The Bootstrap Problem

Every self-hosted compiler faces a chicken-and-egg problem: you need a compiler to compile the compiler. How do you get started?

Simplex solves this with a three-stage bootstrap:

  1. Stage 0: A minimal compiler written in Python. It understands a restricted subset of Simplex—enough to compile the real compiler.
  2. Stage 1: The full Simplex compiler, written in Simplex, compiled by Stage 0. This version supports all language features.
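  3. Stage 2: The same compiler code, compiled by Stage 1. If Stage 1 and Stage 2 produce identical output when compiling the same source, the bootstrap has reached a fixed point, which is strong evidence that the compiler compiles itself correctly.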

This process mirrors how GCC, Go, and Rust bootstrap themselves. The Python stage is temporary scaffolding; once the Simplex compiler can compile itself, Python is no longer needed.
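
The Stage 1 versus Stage 2 comparison can be sketched in a few lines of Python. This is a toy model under stated assumptions: a "compiler" is a function from source text to bytes, and `seed_compiler`, `load`, `load_miscompiled`, and `bootstrap_is_stable` are hypothetical stand-ins, not the real bootstrap scripts.

```python
import hashlib
from typing import Callable

# In this toy model a "compiler" is a function from source text to bytes.
Compiler = Callable[[str], bytes]


def seed_compiler(source: str) -> bytes:
    """Stand-in for the Python-hosted Stage 0 compiler (unoptimized output)."""
    return b"unoptimized:" + source.encode("utf-8")


def load(artifact: bytes) -> Compiler:
    """Stand-in for running a compiled compiler.

    Toy rule: a correctly built compiler behaves the same no matter which
    earlier stage produced it, so the artifact's own bytes are ignored.
    """
    def compiler(source: str) -> bytes:
        return b"optimized:" + source.encode("utf-8")
    return compiler


def load_miscompiled(artifact: bytes) -> Compiler:
    """Toy failure case: behavior leaks details of the artifact's own build."""
    prefix = artifact.split(b":", 1)[0]

    def compiler(source: str) -> bytes:
        return prefix + b"+:" + source.encode("utf-8")
    return compiler


def bootstrap_is_stable(
    seed: Compiler, loader: Callable[[bytes], Compiler], compiler_source: str
) -> bool:
    stage1 = loader(seed(compiler_source))   # Stage 1: built by Stage 0
    stage1_output = stage1(compiler_source)
    stage2 = loader(stage1_output)           # Stage 2: built by Stage 1
    stage2_output = stage2(compiler_source)
    # Compare digests, as a real check would for large binaries.
    return (hashlib.sha256(stage1_output).digest()
            == hashlib.sha256(stage2_output).digest())


source = "hypothetical compiler source"
print(bootstrap_is_stable(seed_compiler, load, source))              # True
print(bootstrap_is_stable(seed_compiler, load_miscompiled, source))  # False
```

The second call shows what the check catches: if an earlier stage builds the compiler incorrectly, the outputs diverge and the comparison fails.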

Bootstrap Restrictions

The Stage 0 compiler supports only a restricted subset of Simplex:

  • No while loops (use recursion instead)
  • No mutable variables (pure functional style)
  • No traits or generics
  • Limited pattern matching
  • Simplified module system

These restrictions make Stage 0 simpler to implement in Python. The full-featured Stage 1 compiler, written in this restricted subset, then provides all language features.
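
To give a feel for the restricted style, here is the same computation written twice in Python: once with a loop and mutable variables, and once in the recursion-only form that the restricted subset calls for. Python stands in for Simplex here, and the function names are illustrative.

```python
def sum_with_loop(values: tuple) -> int:
    """Conventional style: a while loop mutating two variables."""
    total = 0
    index = 0
    while index < len(values):
        total += values[index]
        index += 1
    return total


def sum_recursive(values: tuple, index: int = 0, total: int = 0) -> int:
    """Restricted style: no loop, no reassignment; state travels in arguments."""
    if index == len(values):
        return total
    return sum_recursive(values, index + 1, total + values[index])


numbers = (1, 2, 3, 4)
print(sum_with_loop(numbers))  # 10
print(sum_recursive(numbers))  # 10
```

Each loop variable becomes a parameter, and each iteration becomes a call. Python itself caps recursion depth, so this form is only practical here for short inputs. That is a limitation of the stand-in language, not a statement about Simplex.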

Why Self-Hosting Matters

Self-hosting provides several benefits:

  • Dogfooding: The language team uses Simplex daily, exposing pain points
  • Verification: The compiler compiling itself is a rigorous test
  • Independence: No external language runtime in production
  • Credibility: A self-hosted compiler demonstrates the language works

The Simplex toolchain comprises approximately 16,600 lines across 40 files—all pure Simplex.

Content-Addressed Code

One of Simplex's distinctive features is content-addressed code: every function is identified by a SHA-256 hash of its implementation.

What this means:

  • Perfect caching: If the hash matches, the code is identical—no need to recompile
  • No version conflicts: Two functions with the same hash are definitionally the same
  • Seamless migration: When actors move between nodes, a matching hash means both nodes are running identical code
  • Lazy loading: The VM fetches only the functions it needs, by hash

This approach, inspired by the Unison language, eliminates entire categories of dependency management problems.
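
A minimal Python sketch of the idea follows. It hashes the raw text of a function, which is a simplification; how Simplex normalizes code before hashing is not covered in this post. The `CodeStore` class and its method names are assumptions for illustration.

```python
import hashlib


class CodeStore:
    """Toy content-addressed store: implementations are keyed by SHA-256."""

    def __init__(self) -> None:
        self._by_hash = {}
        self.compile_count = 0  # counts how often "compilation" actually ran

    def put(self, implementation: str) -> str:
        key = hashlib.sha256(implementation.encode("utf-8")).hexdigest()
        if key not in self._by_hash:
            # Stand-in for real compilation work; skipped on a cache hit.
            self.compile_count += 1
            self._by_hash[key] = implementation
        return key

    def get(self, key: str) -> str:
        # Fetch by hash, the way a VM could load only what it needs.
        return self._by_hash[key]


store = CodeStore()
first = store.put("fn double(x) { x * 2 }")
again = store.put("fn double(x) { x * 2 }")
other = store.put("fn triple(x) { x * 3 }")

print(first == again)       # True
print(first == other)       # False
print(store.compile_count)  # 2
```

The second `put` is a cache hit, so the same code is never compiled twice. The function bodies are placeholder strings rather than verified Simplex syntax.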

Language Features at a Glance

The Simplex compiler and runtime support a rich set of language features. Here's a high-level overview (detailed in Part 2):

Core Language

  • Static typing with inference: Types catch errors early without verbose annotations
  • Pattern matching: Destructure data elegantly with exhaustiveness checking
  • Generics: Write polymorphic code; the compiler generates specialized versions
  • Traits: Define shared behavior across types (like interfaces)
  • Ownership semantics: Memory safety without garbage collection pauses
  • Result types: Explicit error handling with Result<T, E> and ? operator

Concurrency and Distribution

  • Actors: Isolated concurrent entities communicating via messages
  • Supervision trees: Automatic failure handling and recovery
  • Async/await: Cooperative concurrency within actors
  • Checkpointing: Persistent actor state for fault tolerance
  • Clustering: Transparent distribution across nodes
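
The relationship between actors, mailboxes, and supervisors can be sketched in Python. The sketch is synchronous and single-threaded to keep it deterministic, and the `Worker` and `Supervisor` classes are hypothetical; the real actor runtime is the subject of Part 2.

```python
from collections import deque


class Worker:
    """Toy actor: records messages, and fails on the message "crash"."""

    def __init__(self) -> None:
        self.processed = []

    def receive(self, message: str) -> None:
        if message == "crash":
            raise RuntimeError("worker failed")
        self.processed.append(message)


class Supervisor:
    """Restarts its child with fresh state when message handling raises."""

    def __init__(self, factory) -> None:
        self._factory = factory
        self.child = factory()
        self.mailbox = deque()
        self.restarts = 0

    def send(self, message: str) -> None:
        # Messages queue up in the mailbox instead of being handled at once.
        self.mailbox.append(message)

    def run(self) -> None:
        while self.mailbox:
            message = self.mailbox.popleft()
            try:
                self.child.receive(message)
            except RuntimeError:
                # Recovery policy: replace the failed child and carry on.
                self.child = self._factory()
                self.restarts += 1


supervisor = Supervisor(Worker)
for message in ["a", "crash", "b"]:
    supervisor.send(message)
supervisor.run()
print(supervisor.restarts, supervisor.child.processed)  # 1 ['b']
```

The restarted worker has lost the earlier message "a". That loss is exactly the gap checkpointing closes: a supervisor that restores from a snapshot resumes with the state intact.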

AI-Native Constructs

  • Inference primitive: Call language models as naturally as calling functions
  • Embeddings: Generate and search vector embeddings
  • Specialists: Actors wrapping small language models
  • Hives: Supervisors coordinating multiple specialists
  • Belief systems: Epistemically-grounded memory with truth categories

Mnemonic Extensions

  • Three-tier memory: Short-term, long-term, and persistent storage
  • Truth categories: Absolute, contextual, opinion, and inferred
  • Confidence tracking: Bayesian updates based on evidence
  • Belief revision: Rational updates when evidence contradicts beliefs
  • BDI agents: Belief-Desire-Intention architecture as language primitives
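
Confidence tracking with Bayesian updates can be illustrated with the standard form of Bayes' rule: posterior = P(E|H) · prior / (P(E|H) · prior + P(E|not H) · (1 − prior)). The Python sketch below applies it twice. The function name and the probabilities are made up for illustration and do not come from the Simplex runtime.

```python
def bayes_update(
    prior: float, p_evidence_if_true: float, p_evidence_if_false: float
) -> float:
    """Return the updated confidence in a belief after seeing one piece of evidence."""
    numerator = p_evidence_if_true * prior
    denominator = numerator + p_evidence_if_false * (1.0 - prior)
    if denominator == 0.0:
        raise ValueError("evidence is impossible under both hypotheses")
    return numerator / denominator


# Start undecided, then see evidence that is 4x likelier if the belief is true.
confidence = bayes_update(0.5, 0.8, 0.2)
print(round(confidence, 3))  # 0.8

# Contradicting evidence of equal strength pulls confidence back down.
confidence = bayes_update(confidence, 0.2, 0.8)
print(round(confidence, 3))  # 0.5
```

Supporting evidence raises confidence and contradicting evidence lowers it by a proportionate amount. That proportionality is what makes the belief revision rational rather than ad hoc.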

The Economics of Dual Targets

The choice between native and bytecode isn't just technical—it has economic implications.

Native: Lower Per-Request Cost

Native code runs faster, meaning each server handles more requests. For high-volume, latency-sensitive workloads, native compilation reduces infrastructure costs.

Bytecode: Lower Operational Complexity

Bytecode enables features that reduce operational burden:

  • Spot instances: Use spot capacity, which can cost 70-90% less than on-demand instances, because actors survive node termination
  • Zero-downtime deployment: Migrate actors to new nodes without service interruption
  • Automatic recovery: Failed actors restore from checkpoint without manual intervention

For distributed AI systems with complex operational requirements, bytecode's flexibility often outweighs native's raw speed.
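
A back-of-the-envelope calculation shows what the spot discount means in practice. In the Python sketch below, the fleet size, hourly rate, and hours per month are hypothetical; only the 70-90% discount range comes from the text above.

```python
def monthly_cost(
    nodes: int, hourly_rate: float, discount: float = 0.0, hours: int = 730
) -> float:
    """Monthly cost of a fleet, with an optional fractional spot discount."""
    return round(nodes * hourly_rate * (1.0 - discount) * hours, 2)


# Hypothetical fleet: 10 nodes at $1.00 per hour on-demand.
on_demand = monthly_cost(nodes=10, hourly_rate=1.00)
spot_low = monthly_cost(nodes=10, hourly_rate=1.00, discount=0.70)
spot_high = monthly_cost(nodes=10, hourly_rate=1.00, discount=0.90)

print(on_demand)  # 7300.0
print(spot_low)   # 2190.0
print(spot_high)  # 730.0
```

The sketch ignores throughput. If bytecode needs more nodes than native to serve the same load, multiply `nodes` accordingly before comparing the two options.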

What's Next

This overview covers the toolchain architecture and the native-vs-bytecode decision. Part 2 dives into the technical details:

  • Complete language specification and syntax
  • Compiler internals (lexer, parser, type system, code generation)
  • Bytecode format and VM architecture
  • Actor system implementation
  • CHAI and Mnemonic extensions
  • Code examples throughout

Whether you're evaluating Simplex for a project, curious about language design, or interested in AI-native programming, Part 2 provides the depth to understand how it all works.


Summary

The Simplex toolchain provides:

  • Dual compilation targets: Native for performance, bytecode for flexibility—same source code
  • Complete ecosystem: Compiler, package manager, documentation generator, VM, and language server
  • Self-hosted implementation: The toolchain is written in Simplex, proving the language works
  • Content-addressed code: Functions identified by hash for perfect caching and migration
  • AI-native features: Inference, embeddings, specialists, hives, and belief systems as language primitives

Native or VM? The answer is: whichever your deployment needs. Simplex gives you both.