Simplex Toolchain Deep Dive: From Source to Swarm

Part 2: Technical Implementation for Developers

This is Part 2 of our toolchain series. Part 1 covered the architecture at a high level; this post goes deep into the implementation.

We'll cover the complete language specification, compiler internals, bytecode format, VM architecture, and the cognitive extensions that set Simplex apart, with code examples throughout.

Table of Contents

  1. Compilation Pipeline
  2. Language Specification
  3. Type System
  4. Bytecode Format
  5. VM Architecture
  6. Actor System
  7. AI Primitives
  8. CHAI Specialists and Hives
  9. Mnemonic Extensions
  10. Bootstrap Process

1. Compilation Pipeline

The sxc compiler transforms Simplex source into executable output through a multi-stage pipeline:

Source (.sx)
    │
    ▼
┌─────────────────┐
│     Lexer       │  Tokenization (~500 lines)
│  (lexer.sx)     │  Character stream → Token stream
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│     Parser      │  AST Construction (~2,500 lines)
│  (parser.sx)    │  Token stream → Abstract Syntax Tree
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Type Checker   │  Semantic Analysis (~2,500 lines)
│  (types/*.sx)   │  AST → Typed AST + Error reports
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   Code Gen      │  Output Generation (~2,200 lines)
│ (codegen/*.sx)  │  Typed AST → LLVM IR or Bytecode
└────────┬────────┘
         │
    ┌────┴────┐
    │         │
    ▼         ▼
  LLVM IR   Bytecode
  (.ll)     (.sxb)
    │         │
    ▼         │
  Native      │
  Binary      │
    │         │
    └────┬────┘
         │
         ▼
     Execution

Lexer: Tokenization

The lexer converts character streams into tokens. It uses pre-computed lookup tables for performance:

// Character classification table (O(1) lookup)
let CHAR_CLASS: [CharClass; 256] = init_char_classes()

enum CharClass {
    Whitespace,
    Alpha,
    Digit,
    Operator,
    Delimiter,
    Quote,
    Invalid
}

fn next_token(input: &str, pos: &mut usize) -> Token {
    skip_whitespace(input, pos)

    let c = input[*pos]
    match CHAR_CLASS[c as usize] {
        CharClass::Alpha => scan_identifier(input, pos),
        CharClass::Digit => scan_number(input, pos),
        CharClass::Quote => scan_string(input, pos),
        CharClass::Operator => scan_operator(input, pos),
        CharClass::Delimiter => scan_delimiter(input, pos),
        _ => Token::Invalid(c)
    }
}
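To make the lookup-table idea concrete, here is a minimal Rust sketch of a 256-entry classification table built at compile time. The specific byte-to-class assignments and the `class_of` helper are assumptions for illustration; the real sxc table may classify characters differently.

```rust
// Sketch only: a 256-entry classification table built once at compile time.
// The class assignments below are assumptions, not the real sxc table.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum CharClass { Whitespace, Alpha, Digit, Operator, Delimiter, Quote, Invalid }

const fn classify(b: u8) -> CharClass {
    match b {
        b' ' | b'\t' | b'\n' | b'\r' => CharClass::Whitespace,
        b'a'..=b'z' | b'A'..=b'Z' | b'_' => CharClass::Alpha,
        b'0'..=b'9' => CharClass::Digit,
        b'+' | b'-' | b'*' | b'/' | b'%' | b'=' | b'<' | b'>' | b'!' | b'&' | b'|' | b'^' => {
            CharClass::Operator
        }
        b'(' | b')' | b'{' | b'}' | b'[' | b']' | b',' | b';' | b':' | b'.' => {
            CharClass::Delimiter
        }
        b'"' | b'\'' => CharClass::Quote,
        _ => CharClass::Invalid,
    }
}

const fn build_table() -> [CharClass; 256] {
    let mut table = [CharClass::Invalid; 256];
    let mut i = 0;
    while i < 256 {
        table[i] = classify(i as u8);
        i += 1;
    }
    table
}

pub static CHAR_CLASS: [CharClass; 256] = build_table();

/// Classify one input byte with a single array index, no branching.
pub fn class_of(b: u8) -> CharClass {
    CHAR_CLASS[b as usize]
}
```

The branching happens once when the table is built; at lex time every byte, including non-ASCII bytes, costs one indexed load.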

Keywords are identified via hash table lookup after scanning an identifier:

// Keyword hash table
let KEYWORDS: HashMap<String, TokenKind> = [
    ("fn", TokenKind::Fn),
    ("let", TokenKind::Let),
    ("var", TokenKind::Var),
    ("if", TokenKind::If),
    ("else", TokenKind::Else),
    ("match", TokenKind::Match),
    ("for", TokenKind::For),
    ("while", TokenKind::While),
    ("return", TokenKind::Return),
    ("actor", TokenKind::Actor),
    ("receive", TokenKind::Receive),
    ("send", TokenKind::Send),
    ("ask", TokenKind::Ask),
    ("spawn", TokenKind::Spawn),
    ("specialist", TokenKind::Specialist),
    ("hive", TokenKind::Hive),
    ("infer", TokenKind::Infer),
    ("checkpoint", TokenKind::Checkpoint),
    // ... 50+ keywords
].into()

Parser: AST Construction

The parser uses Pratt parsing (top-down operator precedence) for expressions. This elegantly handles operator precedence and associativity:

fn parse_expression(tokens: &[Token], min_precedence: u8) -> Expr {
    let mut left = parse_prefix(tokens)

    while let Some(op) = peek_infix_op(tokens) {
        let (prec, assoc) = precedence(op)
        if prec < min_precedence {
            break
        }

        advance(tokens)  // consume operator

        let next_min = match assoc {
            Assoc::Left => prec + 1,
            Assoc::Right => prec
        }

        let right = parse_expression(tokens, next_min)
        left = Expr::Binary(op, Box::new(left), Box::new(right))
    }

    left
}

// Operator precedence table
fn precedence(op: BinaryOp) -> (u8, Assoc) {
    match op {
        // Assignment (right-associative)
        BinaryOp::Assign => (1, Assoc::Right),

        // Logical OR
        BinaryOp::Or => (2, Assoc::Left),

        // Logical AND
        BinaryOp::And => (3, Assoc::Left),

        // Comparison
        BinaryOp::Eq | BinaryOp::Ne => (4, Assoc::Left),
        BinaryOp::Lt | BinaryOp::Le |
        BinaryOp::Gt | BinaryOp::Ge => (5, Assoc::Left),

        // Arithmetic
        BinaryOp::Add | BinaryOp::Sub => (6, Assoc::Left),
        BinaryOp::Mul | BinaryOp::Div |
        BinaryOp::Mod => (7, Assoc::Left),

        // Exponentiation (right-associative)
        BinaryOp::Pow => (8, Assoc::Right),
    }
}
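The effect of the `prec + 1` versus `prec` choice is easiest to see on real input. The following Rust sketch applies the same loop to a pre-tokenized arithmetic expression and renders the resulting tree with explicit parentheses. `Token`, `Parser`, and `to_sexpr` are illustrative names, and only the arithmetic rows of the table are included.

```rust
// Sketch of precedence climbing over a pre-tokenized input.
// Token, Parser and to_sexpr are illustrative names, not sxc internals.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Token { Num(i64), Op(char) }

#[derive(Clone, Copy)]
enum Assoc { Left, Right }

// Same levels as the table above, restricted to arithmetic operators.
fn precedence(op: char) -> Option<(u8, Assoc)> {
    match op {
        '+' | '-' => Some((6, Assoc::Left)),
        '*' | '/' | '%' => Some((7, Assoc::Left)),
        '^' => Some((8, Assoc::Right)),
        _ => None,
    }
}

pub struct Parser<'a> { tokens: &'a [Token], pos: usize }

impl<'a> Parser<'a> {
    pub fn new(tokens: &'a [Token]) -> Self { Parser { tokens, pos: 0 } }

    // Prefix position: only number literals in this sketch.
    fn parse_prefix(&mut self) -> Option<String> {
        match self.tokens.get(self.pos) {
            Some(Token::Num(n)) => { self.pos += 1; Some(n.to_string()) }
            _ => None,
        }
    }

    /// Returns a fully parenthesized rendering of the parsed tree.
    pub fn parse_expression(&mut self, min_prec: u8) -> Option<String> {
        let mut left = self.parse_prefix()?;
        while let Some(Token::Op(op)) = self.tokens.get(self.pos).copied() {
            let (prec, assoc) = precedence(op)?;
            if prec < min_prec { break; }
            self.pos += 1; // consume the operator
            let next_min = match assoc {
                Assoc::Left => prec + 1, // same-level operators bind to the left
                Assoc::Right => prec,    // same-level operators nest to the right
            };
            let right = self.parse_expression(next_min)?;
            left = format!("({} {} {})", left, op, right);
        }
        Some(left)
    }
}

pub fn to_sexpr(tokens: &[Token]) -> Option<String> {
    Parser::new(tokens).parse_expression(0)
}
```

With this sketch, the tokens for `10 - 4 - 3` group as `((10 - 4) - 3)`, while `2 ^ 3 ^ 2` groups as `(2 ^ (3 ^ 2))`.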

Type Checker: Hindley-Milner Inference

Simplex uses Hindley-Milner type inference with extensions for traits and ownership. The core algorithm:

struct TypeEnv {
    bindings: HashMap<String, TypeScheme>,
    substitutions: HashMap<TypeVar, Type>
}

fn infer(env: &mut TypeEnv, expr: &Expr) -> Result<Type, TypeError> {
    match expr {
        Expr::Var(name) => {
            let scheme = env.lookup(name)?;
            Ok(instantiate(scheme))
        }

        Expr::Lambda(param, body) => {
            let param_type = fresh_type_var();
            env.bind(param, TypeScheme::mono(param_type.clone()));
            let body_type = infer(env, body)?;
            Ok(Type::Function(Box::new(param_type), Box::new(body_type)))
        }

        Expr::Apply(func, arg) => {
            let func_type = infer(env, func)?;
            let arg_type = infer(env, arg)?;
            let result_type = fresh_type_var();

            unify(func_type, Type::Function(
                Box::new(arg_type),
                Box::new(result_type.clone())
            ))?;

            Ok(result_type)
        }

        Expr::Let(name, value, body) => {
            let value_type = infer(env, value)?;
            let scheme = generalize(env, value_type);
            env.bind(name, scheme);
            infer(env, body)
        }

        // ... pattern matching, actors, etc.
    }
}

fn unify(t1: Type, t2: Type) -> Result<(), TypeError> {
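    match (t1, t2) {
        // The same variable on both sides unifies trivially; without this
        // arm the occurs check below would reject it as an infinite type.
        (Type::Var(a), Type::Var(b)) if a == b => Ok(()),

        (Type::Var(v), t) | (t, Type::Var(v)) => {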
            if occurs_check(v, &t) {
                Err(TypeError::InfiniteType)
            } else {
                bind(v, t)
            }
        }

        (Type::Function(a1, r1), Type::Function(a2, r2)) => {
            unify(*a1, *a2)?;
            unify(*r1, *r2)
        }

        // Arity must match too; zip would silently ignore extra arguments.
        (Type::Con(n1, args1), Type::Con(n2, args2))
            if n1 == n2 && args1.len() == args2.len() => {
            for (a1, a2) in args1.iter().zip(args2.iter()) {
                unify(a1.clone(), a2.clone())?;
            }
            Ok(())
        }

        (t1, t2) => Err(TypeError::Mismatch(t1, t2))
    }
}
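The listing above leaves the substitution store implicit (`bind` writes to it somewhere). As a rough illustration of how the pieces fit together, here is a self-contained Rust sketch that threads an explicit substitution map through unification. `Ty`, `Subst`, `resolve`, and `occurs` are names made up for this sketch and do not come from the Simplex type checker.

```rust
use std::collections::HashMap;

// Sketch: unification with an explicit substitution map.
// Ty, Subst, resolve and occurs are illustrative names.
#[derive(Clone, Debug, PartialEq)]
pub enum Ty {
    Var(u32),
    Con(String, Vec<Ty>), // e.g. Con("i64", []) or Con("List", [T])
    Fun(Box<Ty>, Box<Ty>),
}

pub type Subst = HashMap<u32, Ty>;

/// Follow variable bindings until an unbound variable or a non-variable.
pub fn resolve(t: &Ty, s: &Subst) -> Ty {
    match t {
        Ty::Var(v) => match s.get(v) {
            Some(bound) => resolve(bound, s),
            None => t.clone(),
        },
        _ => t.clone(),
    }
}

// True if variable `v` appears anywhere inside `t` under the substitution.
fn occurs(v: u32, t: &Ty, s: &Subst) -> bool {
    match resolve(t, s) {
        Ty::Var(w) => v == w,
        Ty::Con(_, args) => args.iter().any(|a| occurs(v, a, s)),
        Ty::Fun(a, r) => occurs(v, &a, s) || occurs(v, &r, s),
    }
}

pub fn unify(t1: &Ty, t2: &Ty, s: &mut Subst) -> Result<(), String> {
    match (resolve(t1, s), resolve(t2, s)) {
        (Ty::Var(a), Ty::Var(b)) if a == b => Ok(()),
        (Ty::Var(v), t) | (t, Ty::Var(v)) => {
            if occurs(v, &t, s) {
                Err(format!("infinite type: t{} occurs in {:?}", v, t))
            } else {
                s.insert(v, t);
                Ok(())
            }
        }
        (Ty::Fun(a1, r1), Ty::Fun(a2, r2)) => {
            unify(&a1, &a2, s)?;
            unify(&r1, &r2, s)
        }
        (Ty::Con(n1, args1), Ty::Con(n2, args2))
            if n1 == n2 && args1.len() == args2.len() =>
        {
            for (a, b) in args1.iter().zip(args2.iter()) {
                unify(a, b, s)?;
            }
            Ok(())
        }
        (a, b) => Err(format!("mismatch: {:?} vs {:?}", a, b)),
    }
}
```

Resolving both sides before matching means earlier bindings are honored: unifying `t0 -> t1` with `i64 -> t0` binds `t0` to `i64` first, so `t1` ends up as `i64` as well.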

Code Generation

Code generation targets either LLVM IR or Simplex bytecode:

fn codegen(ast: &TypedAst, target: Target) -> Output {
    match target {
        Target::Native => {
            let ir = emit_llvm_ir(ast);
            Output::LlvmIr(ir)
        }
        Target::Bytecode => {
            let bytecode = emit_bytecode(ast);
            Output::Bytecode(bytecode)
        }
    }
}

fn emit_llvm_ir(ast: &TypedAst) -> String {
    let mut builder = StringBuilder::new();

    // Emit module header
    builder.append("target triple = \"");
    builder.append(target_triple());
    builder.append("\"\n\n");

    // Emit type definitions
    for typedef in &ast.types {
        emit_type_def(&mut builder, typedef);
    }

    // Emit function definitions
    for func in &ast.functions {
        emit_function(&mut builder, func);
    }

    // Emit actor definitions
    for actor in &ast.actors {
        emit_actor(&mut builder, actor);
    }

    builder.to_string()
}

2. Language Specification

Basic Types

// Primitive types
let b: bool = true
let i: i64 = 42
let f: f64 = 3.14159
let s: String = "hello"
let c: char = 'x'

// Unit type (like void)
let u: () = ()

// Optional values
let maybe: Option<i64> = Some(42)
let nothing: Option<i64> = None

// Error handling
let result: Result<i64, String> = Ok(42)
let error: Result<i64, String> = Err("failed")

Collections

// Lists (dynamic arrays)
let numbers: List<i64> = [1, 2, 3, 4, 5]
let first = numbers[0]  // 1
let len = numbers.len()  // 5

// Maps (hash tables)
let scores: Map<String, i64> = {
    "alice": 100,
    "bob": 85,
    "charlie": 92
}
let alice_score = scores["alice"]  // 100

// Sets
let unique: Set<i64> = {1, 2, 3}
let has_two = unique.contains(2)  // true

// Vectors (fixed-size, for math/ML)
let vec: Vector<f64, 3> = [1.0, 2.0, 3.0]
let embedding: Vector<f64, 384> = ai::embed("hello")

Functions

// Basic function
fn add(a: i64, b: i64) -> i64 {
    a + b
}

// Generic function
fn identity<T>(x: T) -> T {
    x
}

// Function with trait bounds
fn print_all<T: Display>(items: List<T>) {
    for item in items {
        println(item.to_string())
    }
}

// Closures
let double = (x: i64) => x * 2
let numbers = [1, 2, 3].map((x) => x * 2)  // [2, 4, 6]

// Higher-order functions
fn apply_twice<T>(f: fn(T) -> T, x: T) -> T {
    f(f(x))
}

Pattern Matching

// Basic matching
fn describe(n: i64) -> String {
    match n {
        0 => "zero",
        1 => "one",
        2..=9 => "single digit",
        10..=99 => "double digit",
        _ => "large"
    }
}

// Destructuring
struct Point { x: f64, y: f64 }

fn distance_from_origin(p: Point) -> f64 {
    match p {
        Point { x, y } => (x*x + y*y).sqrt()
    }
}

// Enum matching
enum Shape {
    Circle { radius: f64 },
    Rectangle { width: f64, height: f64 },
    Triangle { base: f64, height: f64 }
}

fn area(shape: Shape) -> f64 {
    match shape {
        Shape::Circle { radius } => 3.14159 * radius * radius,
        Shape::Rectangle { width, height } => width * height,
        Shape::Triangle { base, height } => 0.5 * base * height
    }
}

// Guards
fn classify(n: i64) -> String {
    match n {
        x if x < 0 => "negative",
        x if x == 0 => "zero",
        x if x % 2 == 0 => "positive even",
        _ => "positive odd"
    }
}

Structs and Enums

// Struct definition
struct User {
    id: i64,
    name: String,
    email: String,
    active: bool
}

// Struct instantiation
let user = User {
    id: 1,
    name: "Alice",
    email: "alice@example.com",
    active: true
}

// Struct update syntax
let inactive_user = User { active: false, ..user }

// Enum with data
enum Message {
    Text(String),
    Image { url: String, width: i64, height: i64 },
    Video { url: String, duration: f64 },
    Empty
}

// Using enums
let msg = Message::Image {
    url: "https://example.com/img.png",
    width: 800,
    height: 600
}

Traits

// Trait definition
trait Display {
    fn to_string(&self) -> String
}

trait Numeric {
    fn zero() -> Self
    fn add(&self, other: &Self) -> Self
    fn mul(&self, other: &Self) -> Self
}

// Trait implementation
impl Display for User {
    fn to_string(&self) -> String {
        format!("User(id={}, name={})", self.id, self.name)
    }
}

impl Numeric for i64 {
    fn zero() -> i64 { 0 }
    fn add(&self, other: &i64) -> i64 { *self + *other }
    fn mul(&self, other: &i64) -> i64 { *self * *other }
}

// Associated types
trait Iterator {
    type Item
    fn next(&mut self) -> Option<Self::Item>
}

// Trait bounds
fn sum<T: Numeric>(items: List<T>) -> T {
    items.fold(T::zero(), (acc, x) => acc.add(&x))
}

Error Handling

// Result type
fn divide(a: f64, b: f64) -> Result<f64, String> {
    if b == 0.0 {
        Err("division by zero")
    } else {
        Ok(a / b)
    }
}

// The ? operator
fn calculate(x: f64, y: f64, z: f64) -> Result<f64, String> {
    let a = divide(x, y)?  // Returns early if Err
    let b = divide(a, z)?
    Ok(b)
}

// Pattern matching on Result
match divide(10.0, 2.0) {
    Ok(result) => println("Result: {result}"),
    Err(e) => println("Error: {e}")
}

// Unwrap (panics on Err)
let result = divide(10.0, 2.0).unwrap()  // 5.0

// Default on error
let result = divide(10.0, 0.0).unwrap_or(0.0)  // 0.0

Modules

// File: math/geometry.sx
pub mod geometry {
    pub struct Point { pub x: f64, pub y: f64 }

    pub fn distance(a: &Point, b: &Point) -> f64 {
        let dx = b.x - a.x
        let dy = b.y - a.y
        (dx*dx + dy*dy).sqrt()
    }

    // Private helper
    fn validate_point(p: &Point) -> bool {
        p.x.is_finite() && p.y.is_finite()
    }
}

// File: main.sx
use math::geometry::{Point, distance}

fn main() {
    let a = Point { x: 0.0, y: 0.0 }
    let b = Point { x: 3.0, y: 4.0 }
    println("Distance: {}", distance(&a, &b))  // 5.0
}

3. Type System

Generic Monomorphization

Simplex compiles generics via monomorphization—generating specialized code for each concrete type:

// Source: generic function
fn max<T: Ord>(a: T, b: T) -> T {
    if a > b { a } else { b }
}

// Usage
let x = max(10, 20)        // i64
let y = max(3.14, 2.72)    // f64
let z = max("alpha", "beta")  // String

// Compiler generates:
fn max_i64(a: i64, b: i64) -> i64 { ... }
fn max_f64(a: f64, b: f64) -> f64 { ... }
fn max_String(a: String, b: String) -> String { ... }

Ownership and Borrowing

Simplex uses ownership semantics for memory safety without garbage collection:

// Ownership transfer
fn take_ownership(s: String) {
    println(s)
}  // s is dropped here

let greeting = "hello".to_string()
take_ownership(greeting)
// greeting is no longer valid here

// Borrowing (immutable reference)
fn borrow(s: &String) {
    println(s)
}

let greeting = "hello".to_string()
borrow(&greeting)
// greeting is still valid

// Mutable borrowing
fn modify(s: &mut String) {
    s.push_str(" world")
}

let mut greeting = "hello".to_string()
modify(&mut greeting)
// greeting is now "hello world"

Const Generics

// Fixed-size vectors with compile-time dimensions
struct Vector<T, const N: usize> {
    data: [T; N]
}

impl<T: Numeric, const N: usize> Vector<T, N> {
    fn dot(&self, other: &Vector<T, N>) -> T {
        let mut sum = T::zero()
        for i in 0..N {
            sum = sum.add(&self.data[i].mul(&other.data[i]))
        }
        sum
    }
}

// Embeddings with fixed dimension
type Embedding = Vector<f64, 384>

fn cosine_similarity(a: &Embedding, b: &Embedding) -> f64 {
    a.dot(b) / (a.magnitude() * b.magnitude())
}

4. Bytecode Format

SXB File Structure

┌──────────────────────────────────────┐
│           HEADER (35 bytes)          │
├──────────────────────────────────────┤
│  Magic: "SXB" (3 bytes)              │
│  Version: u16                        │
│  Flags: u16                          │
│  Entry Point: u32 (function index)   │
│  String Table Offset: u32            │
│  Type Table Offset: u32              │
│  Function Table Offset: u32          │
│  Code Section Offset: u32            │
│  Reserved: 8 bytes                   │
├──────────────────────────────────────┤
│          STRING TABLE                │
├──────────────────────────────────────┤
│  Count: u32                          │
│  [Length: u32, Data: bytes]...       │
├──────────────────────────────────────┤
│           TYPE TABLE                 │
├──────────────────────────────────────┤
│  Count: u32                          │
│  [TypeDef]...                        │
├──────────────────────────────────────┤
│         FUNCTION TABLE               │
├──────────────────────────────────────┤
│  Count: u32                          │
│  [FunctionDef]...                    │
│    - Name Index: u32                 │
│    - Hash: [u8; 32] (SHA-256)        │
│    - Param Count: u16                │
│    - Local Count: u16                │
│    - Code Offset: u32                │
│    - Code Length: u32                │
├──────────────────────────────────────┤
│          CODE SECTION                │
├──────────────────────────────────────┤
│  [Instructions]...                   │
└──────────────────────────────────────┘

Instruction Set

// Stack Operations (0x00-0x0F)
0x00  NOP           // No operation
0x01  PUSH_CONST n  // Push constant pool[n]
0x02  PUSH_LOCAL n  // Push local variable[n]
0x03  STORE_LOCAL n // Pop and store to local[n]
0x04  POP           // Discard top of stack
0x05  DUP           // Duplicate top of stack
0x06  SWAP          // Swap top two values
0x07  ROT           // Rotate top three values

// Integer Arithmetic (0x10-0x17)
0x10  ADD           // a + b
0x11  SUB           // a - b
0x12  MUL           // a * b
0x13  DIV           // a / b
0x14  MOD           // a % b
0x15  NEG           // -a

// Float Arithmetic (0x18-0x1F)
0x18  FADD          // a + b (float)
0x19  FSUB          // a - b (float)
0x1A  FMUL          // a * b (float)
0x1B  FDIV          // a / b (float)

// Bitwise Operations (0x20-0x2F)
0x20  AND           // a & b
0x21  OR            // a | b
0x22  XOR           // a ^ b
0x23  NOT           // ~a
0x24  SHL           // a << b
0x25  SHR           // a >> b

// Comparisons (0x30-0x3F)
0x30  EQ            // a == b
0x31  NE            // a != b
0x32  LT            // a < b
0x33  LE            // a <= b
0x34  GT            // a > b
0x35  GE            // a >= b

// Control Flow (0x40-0x4F)
0x40  JMP offset    // Unconditional jump
0x41  JMP_IF offset // Jump if true
0x42  JMP_UNLESS o  // Jump if false
0x43  CALL n        // Call function[n]
0x44  RET           // Return from function
0x45  TAIL_CALL n   // Tail call optimization

// Objects (0x50-0x5F)
0x50  NEW_OBJ t     // Create object of type[t]
0x51  GET_FIELD f   // Get field[f] from object
0x52  SET_FIELD f   // Set field[f] on object
0x53  NEW_LIST n    // Create list with n elements
0x54  NEW_MAP n     // Create map with n entries
0x55  INDEX         // list[index] or map[key]
0x56  INDEX_SET     // list[index] = value

// Actor Operations (0x60-0x6F)
0x60  SPAWN t       // Spawn actor of type[t]
0x61  SEND          // Send message (fire-and-forget)
0x62  ASK           // Send message and await response
0x63  RECEIVE       // Await next message
0x64  SELF          // Push current actor reference
0x65  CHECKPOINT    // Save actor state

// AI Operations (0x70-0x7F)
0x70  AI_COMPLETE   // Text completion
0x71  AI_EMBED      // Generate embedding
0x72  AI_NEAREST    // Nearest neighbor search
0x73  AI_STREAM_START // Start streaming completion
0x74  AI_STREAM_NEXT  // Get next stream chunk
0x75  AI_EXTRACT    // Structured extraction

// Debug (0xF0-0xFF)
0xF0  DEBUG_PRINT   // Print top of stack
0xFF  HALT          // Stop execution
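To show how a flat byte stream maps back onto this table, here is a small Rust disassembler sketch covering a handful of opcodes. The operand encoding is an assumption: the execution loop later in this post reads `u32` operands for `PUSH_CONST` and `CALL`, so this sketch uses a 4-byte operand for every opcode that takes one, and it assumes little-endian byte order, which the format description does not state.

```rust
// Sketch: decode a few opcodes from the table above into text.
// The u32 operand width and little-endian order are assumptions.
pub fn disassemble(code: &[u8]) -> Result<Vec<String>, String> {
    let mut out = Vec::new();
    let mut ip = 0usize;
    while ip < code.len() {
        let opcode = code[ip];
        ip += 1;
        // (mnemonic, whether a u32 operand follows)
        let (name, has_operand) = match opcode {
            0x00 => ("NOP", false),
            0x01 => ("PUSH_CONST", true),
            0x02 => ("PUSH_LOCAL", true),
            0x03 => ("STORE_LOCAL", true),
            0x10 => ("ADD", false),
            0x12 => ("MUL", false),
            0x40 => ("JMP", true),
            0x43 => ("CALL", true),
            0x44 => ("RET", false),
            0xFF => ("HALT", false),
            other => {
                return Err(format!("unknown opcode 0x{:02X} at offset {}", other, ip - 1))
            }
        };
        if has_operand {
            if ip + 4 > code.len() {
                return Err(format!("truncated operand for {}", name));
            }
            let b = &code[ip..ip + 4];
            let operand = u32::from_le_bytes([b[0], b[1], b[2], b[3]]);
            ip += 4;
            out.push(format!("{} {}", name, operand));
        } else {
            out.push(name.to_string());
        }
    }
    Ok(out)
}
```

Under these assumptions, the bytes `01 02 00 00 00 01 03 00 00 00 10 FF` decode to `PUSH_CONST 2`, `PUSH_CONST 3`, `ADD`, `HALT`.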

Content Addressing

Every function is identified by the SHA-256 hash of its signature and bytecode:

struct FunctionDef {
    name: String,
    hash: [u8; 32],      // SHA-256 of signature + bytecode
    params: List<Type>,
    returns: Type,
    bytecode: List<u8>
}

fn compute_function_hash(func: &FunctionDef) -> [u8; 32] {
    let mut hasher = Sha256::new()

    // Include signature in hash
    hasher.update(&serialize(&func.params))
    hasher.update(&serialize(&func.returns))

    // Include bytecode
    hasher.update(&func.bytecode)

    hasher.finalize()
}

5. VM Architecture

Runtime Structure

struct SimplexVM {
    // Execution state
    stack: Vec<Value>,
    call_stack: Vec<CallFrame>,
    ip: usize,  // Instruction pointer

    // Memory
    heap: Heap,
    gc: GarbageCollector,

    // Actor system
    actors: ActorRegistry,
    scheduler: ActorScheduler,
    mailboxes: Map<ActorId, Mailbox>,

    // Code
    modules: Map<Hash, Module>,
    function_cache: Map<Hash, CompiledFunction>,

    // Checkpointing
    checkpoint_manager: CheckpointManager,

    // Optional JIT
    jit: Option<JitCompiler>
}

struct CallFrame {
    function_hash: [u8; 32],
    ip: usize,
    base_pointer: usize,
    locals: Vec<Value>
}

Execution Loop

fn execute(vm: &mut SimplexVM) -> Result<Value, RuntimeError> {
    loop {
        let opcode = vm.fetch_byte()

        match opcode {
            PUSH_CONST => {
                let idx = vm.fetch_u32()
                let value = vm.constants[idx].clone()
                vm.stack.push(value)
            }

            ADD => {
                let b = vm.stack.pop()?.as_i64()?
                let a = vm.stack.pop()?.as_i64()?
                vm.stack.push(Value::I64(a + b))
            }

            CALL => {
                let func_idx = vm.fetch_u32()
                let func = &vm.functions[func_idx]

                // Check for JIT compilation
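                if let Some(jit) = &vm.jit {
                    if func.call_count > JIT_THRESHOLD {
                        // Run the native version, then keep interpreting the
                        // caller; returning here would end the whole VM loop.
                        let native = jit.compile(func)?
                        let result = native.execute(&mut vm.stack)?
                        vm.stack.push(result)
                        continue
                    }
                }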

                // Interpreted execution
                let frame = CallFrame {
                    function_hash: func.hash,
                    ip: vm.ip,  // Return address: resume here after RET
                    base_pointer: vm.stack.len() - func.param_count,
                    locals: vec![Value::Nil; func.local_count]
                }
                vm.call_stack.push(frame)
                vm.ip = func.code_offset
            }

            SPAWN => {
                let actor_type = vm.fetch_u32()
                let args = vm.pop_n(vm.fetch_u16() as usize)
                let actor_id = vm.actors.spawn(actor_type, args)?
                vm.stack.push(Value::ActorRef(actor_id))
            }

            CHECKPOINT => {
                let actor_id = vm.current_actor()?
                vm.checkpoint_manager.checkpoint(actor_id)?
            }

            RET => {
                let result = vm.stack.pop()?
                if vm.call_stack.is_empty() {
                    return Ok(result)
                }
                let frame = vm.call_stack.pop().unwrap()
                vm.stack.truncate(frame.base_pointer)
                vm.stack.push(result)
                vm.ip = frame.ip  // Saved return address
            }

            HALT => return Ok(vm.stack.pop().unwrap_or(Value::Nil)),

            _ => return Err(RuntimeError::InvalidOpcode(opcode))
        }
    }
}
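The call and return bookkeeping is the subtle part of this loop, so here is a stripped-down Rust sketch that runs end to end. It is a simplification rather than the Simplex VM: `MiniVm` and `Frame` are made-up names, operands are a single byte, `PUSH_CONST` pushes its operand directly instead of indexing a constant pool, and values are plain `i64`.

```rust
// Sketch: a tiny interpreter for PUSH_CONST, ADD, CALL, RET and HALT.
// MiniVm, Frame and the one-byte operands are simplifications.
pub struct Frame { return_address: usize }

pub struct MiniVm<'a> {
    code: &'a [u8],
    functions: &'a [usize], // function index -> code offset
    stack: Vec<i64>,
    call_stack: Vec<Frame>,
    ip: usize,
}

pub const PUSH_CONST: u8 = 0x01;
pub const ADD: u8 = 0x10;
pub const CALL: u8 = 0x43;
pub const RET: u8 = 0x44;
pub const HALT: u8 = 0xFF;

impl<'a> MiniVm<'a> {
    pub fn new(code: &'a [u8], functions: &'a [usize]) -> Self {
        MiniVm { code, functions, stack: Vec::new(), call_stack: Vec::new(), ip: 0 }
    }

    // Read the byte at ip and advance.
    fn fetch(&mut self) -> Result<u8, String> {
        let b = *self.code.get(self.ip).ok_or("ip past end of code")?;
        self.ip += 1;
        Ok(b)
    }

    fn pop(&mut self) -> Result<i64, String> {
        self.stack.pop().ok_or_else(|| "stack underflow".to_string())
    }

    pub fn run(&mut self) -> Result<i64, String> {
        loop {
            match self.fetch()? {
                PUSH_CONST => {
                    let v = self.fetch()? as i64; // immediate operand in this sketch
                    self.stack.push(v);
                }
                ADD => {
                    let b = self.pop()?;
                    let a = self.pop()?;
                    self.stack.push(a + b);
                }
                CALL => {
                    let idx = self.fetch()? as usize;
                    let target = *self.functions.get(idx).ok_or("bad function index")?;
                    // Save where to resume: the instruction after CALL's operand.
                    self.call_stack.push(Frame { return_address: self.ip });
                    self.ip = target;
                }
                RET => match self.call_stack.pop() {
                    Some(frame) => self.ip = frame.return_address,
                    None => return self.pop(), // returning from the entry function
                },
                HALT => return self.pop(),
                other => return Err(format!("invalid opcode 0x{:02X}", other)),
            }
        }
    }
}
```

For example, a program that pushes 2, calls a function that pushes 40 and returns, then executes `ADD` and `HALT` leaves 42 as the result. The callee's value stays on the shared operand stack, which is the same effect the full loop achieves by truncating to `base_pointer` and pushing the result.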

Garbage Collection

struct GarbageCollector {
    heap: Heap,
    roots: Set<*mut Object>,
    threshold: usize,
    allocated: usize
}

impl GarbageCollector {
    fn collect_if_needed(&mut self) {
        if self.allocated > self.threshold {
            self.collect()
        }
    }

    fn collect(&mut self) {
        // Mark phase
        let mut marked = Set::new()
        let mut worklist: Vec<*mut Object> = self.roots.iter().copied().collect()

        while let Some(obj) = worklist.pop() {
            if marked.insert(obj) {
                // Add all referenced objects
                for child in (*obj).references() {
                    worklist.push(child)
                }
            }
        }

        // Sweep phase
        self.heap.retain(|obj| marked.contains(&obj))
        self.allocated = self.heap.size()

        // Adjust threshold
        self.threshold = (self.allocated as f64 * 1.5) as usize
    }
}
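The same mark-and-sweep shape can be tried out in isolation. This Rust sketch replaces raw object pointers with slot indices into a vector, which is a simplification chosen to keep the example safe and testable; `Heap`, `Object`, and `live_count` are illustrative names.

```rust
use std::collections::HashSet;

// Sketch: mark-and-sweep over an index-addressed heap.
// Slot indices stand in for object pointers (a simplification).
pub struct Object { pub refs: Vec<usize> }

pub struct Heap { pub slots: Vec<Option<Object>> }

impl Heap {
    /// Store a new object and return its slot index.
    pub fn alloc(&mut self, refs: Vec<usize>) -> usize {
        self.slots.push(Some(Object { refs }));
        self.slots.len() - 1
    }

    pub fn live_count(&self) -> usize {
        self.slots.iter().filter(|s| s.is_some()).count()
    }

    /// Frees every object not reachable from `roots`; returns how many were freed.
    pub fn collect(&mut self, roots: &[usize]) -> usize {
        // Mark phase: worklist traversal starting from the roots.
        let mut marked: HashSet<usize> = HashSet::new();
        let mut worklist: Vec<usize> = roots.to_vec();
        while let Some(idx) = worklist.pop() {
            if !marked.insert(idx) {
                continue; // already visited
            }
            if let Some(Some(obj)) = self.slots.get(idx) {
                worklist.extend(obj.refs.iter().copied());
            }
        }
        // Sweep phase: clear every occupied slot that was not marked.
        let mut freed = 0;
        for (idx, slot) in self.slots.iter_mut().enumerate() {
            if slot.is_some() && !marked.contains(&idx) {
                *slot = None;
                freed += 1;
            }
        }
        freed
    }
}
```

Because reachability is computed from the roots, two objects that reference only each other are collected even though neither has a zero reference count, which is the main advantage of tracing over reference counting.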

6. Actor System

Actor Definition

actor Counter {
    var count: i64 = 0

    // Initialize with starting value
    fn init(start: i64) {
        count = start
    }

    // Handle increment message
    receive Increment {
        count += 1
    }

    // Handle increment by amount
    receive Add(n: i64) {
        count += n
    }

    // Handle query with response
    receive GetCount -> i64 {
        count
    }

    // Handle reset
    receive Reset {
        count = 0
        checkpoint()  // Persist state
    }
}

// Using the actor
fn main() {
    let counter = spawn Counter(10)

    send(counter, Increment)
    send(counter, Add(5))

    let value = ask(counter, GetCount)  // 16
    println("Count: {value}")
}
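For readers who have not used actors before, the model behind `send` and `ask` can be approximated with a thread and a channel. The Rust sketch below is an analogy, not how the Simplex runtime is implemented: `Msg`, `spawn_counter`, and `ask_count` are invented for this example, and a real scheduler would not dedicate an OS thread to each actor.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

// Sketch: the Counter actor as a thread that owns its state and drains a mailbox.
// Msg, spawn_counter and ask_count are illustrative, not the Simplex runtime.
pub enum Msg {
    Increment,
    Add(i64),
    GetCount(Sender<i64>), // carries the reply channel for ask-style requests
    Reset,
}

pub fn spawn_counter(start: i64) -> Sender<Msg> {
    let (tx, rx) = channel::<Msg>();
    thread::spawn(move || {
        let mut count = start;
        // Messages are handled one at a time, in arrival order,
        // so the state needs no locking.
        for msg in rx {
            match msg {
                Msg::Increment => count += 1,
                Msg::Add(n) => count += n,
                Msg::GetCount(reply) => {
                    let _ = reply.send(count); // the asker may have gone away
                }
                Msg::Reset => count = 0,
            }
        }
    });
    tx
}

/// Request-response: send a query and block until the actor replies.
pub fn ask_count(actor: &Sender<Msg>) -> Option<i64> {
    let (reply_tx, reply_rx) = channel();
    actor.send(Msg::GetCount(reply_tx)).ok()?;
    reply_rx.recv().ok()
}
```

Sending `Increment` and `Add(5)` to a counter started at 10 and then calling `ask_count` yields 16, matching the Simplex example: messages from a single sender are processed in order, so the query observes both earlier updates.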

Supervision Trees

// Supervisor definition
supervisor CounterSystem {
    strategy: OneForOne,     // Restart only failed child
    max_restarts: 3,         // Max 3 restarts
    within: 60.seconds,      // Within 60 seconds

    children: [
        child(Counter, args: [0], restart: Always),
        child(Counter, args: [100], restart: Always),
        child(Logger, restart: OnFailure)
    ]
}

// Restart strategies
enum SupervisorStrategy {
    OneForOne,    // Restart only the failed child
    OneForAll,    // Restart all children if one fails
    RestForOne    // Restart failed child and all started after it
}

// Restart policies
enum RestartPolicy {
    Always,       // Always restart
    OnFailure,    // Restart only on failure (not normal exit)
    Never         // Never restart
}
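The `max_restarts` / `within` pair describes a sliding-window limit. One plausible way to implement it is sketched below in Rust; `RestartLimiter` is a hypothetical name, timestamps are plain seconds so the logic can be exercised without a clock, and the exact boundary behavior (whether a restart exactly 60 seconds old still counts) is an assumption.

```rust
use std::collections::VecDeque;

// Sketch: sliding-window restart limit (max_restarts within a period).
// RestartLimiter is a hypothetical name; boundary handling is an assumption.
pub struct RestartLimiter {
    max_restarts: usize,
    within_secs: u64,
    recent: VecDeque<u64>, // timestamps of restarts still inside the window
}

impl RestartLimiter {
    pub fn new(max_restarts: usize, within_secs: u64) -> Self {
        RestartLimiter { max_restarts, within_secs, recent: VecDeque::new() }
    }

    /// Returns true if a restart at time `now` is allowed, false if the
    /// supervisor should stop restarting and escalate instead.
    pub fn allow_restart(&mut self, now: u64) -> bool {
        // Forget restarts that have aged out of the window.
        while let Some(&oldest) = self.recent.front() {
            if now.saturating_sub(oldest) >= self.within_secs {
                self.recent.pop_front();
            } else {
                break;
            }
        }
        if self.recent.len() >= self.max_restarts {
            return false;
        }
        self.recent.push_back(now);
        true
    }
}
```

With a limit of 3 restarts in 60 seconds, restarts at t=0, 10, and 20 are allowed, a fourth at t=30 is refused, and one at t=61 is allowed again because the t=0 restart has left the window.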

Message Passing

// Fire-and-forget (async)
send(actor, Message)

// Request-response (sync)
let response = ask(actor, Query)

// Request with timeout
let response = ask(actor, Query, timeout: 5.seconds)?

// Broadcast to all actors of a type
broadcast<Counter>(Reset)

// Pipeline pattern: each result is piped into the next ask
let result = ask(actor1, Process)
    |> ask(actor2, Transform)
    |> ask(actor3, Finalize)

Checkpointing

actor StatefulProcessor {
    var processed: i64 = 0
    var results: List<String> = []

    receive Process(data: String) -> String {
        let result = transform(data)
        results.push(result.clone())
        processed += 1

        // Checkpoint every 100 items
        if processed % 100 == 0 {
            checkpoint()
        }

        result
    }

    // Lifecycle hook: called after recovery
    fn on_resume() {
        println("Resumed with {processed} items processed")
    }
}

7. AI Primitives

Inference

// Basic completion
let response = await ai::complete("Explain quantum computing")

// With options
let response = await ai::complete(
    "Summarize this article: {article}",
    model: "mistral-7b",
    temperature: 0.7,
    max_tokens: 500
)

// Streaming
for chunk in ai::stream("Write a story about...") {
    print(chunk)
}

// Structured extraction
struct ContactInfo {
    name: String,
    email: Option<String>,
    phone: Option<String>
}

let contact = await ai::extract<ContactInfo>(
    "John Smith can be reached at john@example.com or 555-1234"
)
// ContactInfo { name: "John Smith", email: Some("john@example.com"), phone: Some("555-1234") }

// Classification
enum Sentiment { Positive, Negative, Neutral }

let sentiment = await ai::classify<Sentiment>("This product is amazing!")
// Sentiment::Positive

Embeddings

// Single embedding
let embedding: Vector<f64, 384> = ai::embed("Hello, world!")

// Batch embeddings
let texts = ["apple", "banana", "orange"]
let embeddings = ai::embed_batch(texts)

// Similarity search
let query = ai::embed("fruit")
let results = ai::nearest(query, embeddings, k: 2)
// Returns indices of most similar embeddings

// Cosine similarity
let similarity = ai::similarity(embedding1, embedding2)  // -1.0 to 1.0
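What `ai::similarity` and `ai::nearest` compute can be sketched in a few lines of Rust. This is a brute-force illustration over small vectors, not the runtime's implementation, which may use an index or normalize its scores differently; returning 0.0 for a zero-length vector is also a choice made for this sketch.

```rust
// Sketch: cosine similarity and brute-force top-k search.
// Not the Simplex runtime's implementation; zero vectors score 0.0 by choice.
pub fn cosine_similarity(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Indices of the k vectors most similar to `query`, best first.
pub fn nearest(query: &[f64], corpus: &[Vec<f64>], k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f64)> = corpus
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine_similarity(query, v)))
        .collect();
    // Sort by descending similarity.
    scored.sort_by(|x, y| y.1.total_cmp(&x.1));
    scored.into_iter().take(k).map(|(i, _)| i).collect()
}
```

Raw cosine similarity ranges from -1 (opposite direction) to 1 (same direction) and ignores vector length, so `[0.9, 0.0]` is a perfect match for the query `[1.0, 0.0]`.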

8. CHAI Specialists and Hives

Specialist Definition

specialist Summarizer {
    model: "mistral-7b-instruct",
    domain: "text summarization",
    temperature: 0.3,
    max_tokens: 200,

    system_prompt: "You are a concise summarizer.
                   Provide brief, accurate summaries.",

    receive Summarize(text: String) -> String {
        infer("Summarize the following text:\n\n{text}")
    }

    receive SummarizeWithLength(text: String, max_words: i64) -> String {
        infer("Summarize in {max_words} words or less:\n\n{text}")
    }
}

specialist CodeAnalyzer {
    model: "codellama-7b",
    domain: "code analysis",
    temperature: 0.1,

    receive Explain(code: String) -> String {
        infer("Explain what this code does:\n```\n{code}\n```")
    }

    receive FindBugs(code: String) -> List<String> {
        let response = infer("List potential bugs in this code:\n```\n{code}\n```")
        parse_bullet_points(response)
    }

    receive Refactor(code: String, goal: String) -> String {
        infer("Refactor this code to {goal}:\n```\n{code}\n```")
    }
}

Hive Definition

hive DocumentProcessor {
    specialists: [
        Summarizer,
        EntityExtractor,
        SentimentAnalyzer,
        TopicClassifier
    ],

    router: SemanticRouter {
        embedding_model: "all-minilm-l6-v2",
        domain_embeddings: auto  // Computed from specialist domains
    },

    strategy: OneForOne,
    max_concurrent: 10,

    // Shared memory for all specialists
    memory: SharedVectorStore {
        dimension: 384,
        max_entries: 100_000
    },

    receive ProcessDocument(doc: String) -> DocumentAnalysis {
        // Parallel processing
        let (summary, entities, sentiment, topics) = parallel(
            ask(Summarizer, Summarize(doc)),
            ask(EntityExtractor, Extract(doc)),
            ask(SentimentAnalyzer, Analyze(doc)),
            ask(TopicClassifier, Classify(doc))
        )

        DocumentAnalysis {
            summary,
            entities,
            sentiment,
            topics
        }
    }
}

Routing Strategies

// Semantic routing (default)
router: SemanticRouter {
    embedding_model: "all-minilm-l6-v2"
}

// Rule-based routing
router: RuleRouter {
    rules: [
        ("code|program|function", CodeAnalyzer),
        ("summarize|summary|tldr", Summarizer),
        ("sentiment|feeling|emotion", SentimentAnalyzer)
    ],
    default: GeneralAssistant
}

// LLM-based routing
router: LlmRouter {
    model: "tinyllama-1.1b",
    prompt: "Given this task, which specialist should handle it?"
}

// Cascade routing (try in order until success)
router: CascadeRouter {
    order: [FastSpecialist, MediumSpecialist, HighQualitySpecialist],
    success_threshold: 0.8
}
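Of the four strategies, rule-based routing is the simplest to reason about. The Rust sketch below shows first-match routing over the same rule list. It treats each pattern as `|`-separated keywords matched as case-insensitive substrings, which is an assumption: the real `RuleRouter` may use full regular expressions.

```rust
// Sketch: first-match keyword routing in the spirit of RuleRouter.
// Patterns are '|'-separated keywords matched as substrings (an assumption).
pub struct RuleRouter<'a> {
    rules: Vec<(&'a str, &'a str)>, // (keyword pattern, specialist name)
    default: &'a str,
}

impl<'a> RuleRouter<'a> {
    pub fn new(rules: Vec<(&'a str, &'a str)>, default: &'a str) -> Self {
        RuleRouter { rules, default }
    }

    /// Returns the specialist for the first rule with a matching keyword,
    /// or the default specialist if no rule matches.
    pub fn route(&self, task: &str) -> &'a str {
        let task = task.to_lowercase();
        for &(pattern, specialist) in &self.rules {
            if pattern.split('|').any(|kw| task.contains(kw)) {
                return specialist;
            }
        }
        self.default
    }
}
```

Rule order matters with first-match semantics: a task mentioning both "function" and "summary" goes to whichever rule is listed first, so more specific rules belong at the top.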

Ensemble Patterns

// Parallel execution
let results = parallel(
    ask(Specialist1, Query),
    ask(Specialist2, Query),
    ask(Specialist3, Query)
)

// Voting (majority wins)
let consensus = vote([
    ask(Judge1, Evaluate(item)),
    ask(Judge2, Evaluate(item)),
    ask(Judge3, Evaluate(item))
])

// Weighted combination
let combined = weighted_average([
    (ask(Expert1, Score(item)), 0.5),
    (ask(Expert2, Score(item)), 0.3),
    (ask(Expert3, Score(item)), 0.2)
])

// Chain of thought
let result = item
    |> ask(Analyzer, BreakDown)
    |> ask(Reasoner, ProcessSteps)
    |> ask(Synthesizer, Combine)
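Once the specialists' answers are collected, the voting and weighting steps are ordinary aggregation. Here is a Rust sketch of both. The tie-breaking rule in `vote` (first answer seen wins) and the `None` result for an all-zero weight sum are assumptions, since the Simplex built-ins' behavior in those cases is not described here.

```rust
use std::collections::HashMap;

// Sketch: combining specialist outputs after they have been collected.
// Tie-breaking and empty-input behavior are assumptions.

/// Majority vote: the most frequent answer, or None for empty input.
/// Ties go to the answer that appeared first.
pub fn vote<'a>(answers: &[&'a str]) -> Option<&'a str> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for &a in answers {
        *counts.entry(a).or_insert(0) += 1;
    }
    let mut best: Option<(&'a str, usize)> = None;
    for &a in answers {
        let c = counts[a];
        // Strictly greater, so an earlier answer wins a tie.
        if best.map_or(true, |(_, best_count)| c > best_count) {
            best = Some((a, c));
        }
    }
    best.map(|(a, _)| a)
}

/// Weighted mean of (score, weight) pairs; None if the weights sum to zero.
pub fn weighted_average(scored: &[(f64, f64)]) -> Option<f64> {
    let total_weight: f64 = scored.iter().map(|(_, w)| w).sum();
    if total_weight == 0.0 {
        return None;
    }
    let weighted_sum: f64 = scored.iter().map(|(s, w)| s * w).sum();
    Some(weighted_sum / total_weight)
}
```

Dividing by the total weight means the weights do not have to sum to 1; the 0.5 / 0.3 / 0.2 split from the example above works unchanged.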

9. Mnemonic Extensions

Memory Tiers

// Memory entry structure
struct MemoryEntry {
    id: String,
    content: String,
    embedding: Vector<f64, 384>,
    tier: MemoryTier,
    truth_category: TruthCategory,
    confidence: ConfidenceScore,
    domains: List<String>,
    tags: List<String>,
    created_at: Timestamp,
    accessed_at: Timestamp,
    access_count: i64
}

enum MemoryTier {
    ShortTerm,    // 24-72h TTL, working hypotheses
    LongTerm,     // Indefinite, validated knowledge
    Persistent    // Permanent, core identity
}

// Storing memories
remember {
    content: "User prefers dark mode",
    memory_type: MemoryType::Preference,
    truth_category: TruthCategory::Contextual {
        domains: ["ui", "settings"]
    },
    confidence: 0.9,
    tier: MemoryTier::LongTerm
}

// Recalling memories
let memories = recall about "user preferences"
    from tiers [LongTerm, Persistent]
    limit 10

// Forgetting
forget memory_id reason: "Outdated information"

// Archiving
archive memories where accessed_at < 30.days.ago
    to "archive/old_memories.sxb"

Truth Categories

enum TruthCategory {
    // Empirically verifiable facts
    Absolute,

    // True within specific domains
    Contextual { domains: List<String> },

    // Subjective preferences/values
    Opinion,

    // Derived from patterns with stated confidence
    Inferred {
        confidence: f64,
        sources: List<String>
    }
}

// Examples
let fact = MemoryEntry {
    content: "Python 3.12 was released October 2023",
    truth_category: TruthCategory::Absolute,
    confidence: ConfidenceScore {
        source_reliability: 1.0,
        recency: 0.9,
        corroboration_count: 5,
        contradiction_count: 0
    },
    ...
}

let contextual = MemoryEntry {
    content: "React is best for component-based UIs",
    truth_category: TruthCategory::Contextual {
        domains: ["web", "frontend"]
    },
    ...
}

let opinion = MemoryEntry {
    content: "User thinks tabs are better than spaces",
    truth_category: TruthCategory::Opinion,
    ...
}

let inferred = MemoryEntry {
    content: "User likely works in finance",
    truth_category: TruthCategory::Inferred {
        confidence: 0.75,
        sources: ["frequent finance queries", "timezone patterns"]
    },
    ...
}
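The `ConfidenceScore` fields above are not reduced to a single number anywhere in this post. As an illustration only, the Python sketch below collapses them with a Laplace-smoothed support ratio; the `scalar` method and its formula are assumptions, not the Simplex definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfidenceScore:
    source_reliability: float  # expected in [0.0, 1.0]
    recency: float             # expected in [0.0, 1.0]
    corroboration_count: int
    contradiction_count: int

    def scalar(self) -> float:
        """Collapse the four fields into one value in [0, 1].

        Hypothetical formula: reliability * recency * smoothed support,
        where support = (corroborations + 1) / (total reports + 2).
        """
        support = (self.corroboration_count + 1) / (
            self.corroboration_count + self.contradiction_count + 2
        )
        return self.source_reliability * self.recency * support
```

With the `fact` entry above (reliability 1.0, recency 0.9, five corroborations, no contradictions) this yields 0.9 × 6/7, roughly 0.77. The smoothing keeps a single uncorroborated report at 0.5 support rather than jumping straight to 0 or 1.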

Belief System

// Type-level belief representation
Belief<T, C, τ> where
    T: Type,              // Content type
    C: ConfidenceLevel,   // High, Medium, Low
    τ: TruthCategory      // Absolute, Contextual, Opinion, Inferred

// Creating beliefs
let belief: Belief<String, High, Absolute> = believe {
    content: "Earth orbits the Sun",
    justification: "Scientific consensus"
}

// Querying beliefs
let relevant = query beliefs
    where topic matches "astronomy"
    and confidence >= 0.7
    order by recency desc

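// Belief revision (AGM-style)
fn revise_beliefs(new_evidence: Evidence) {
    let contradicted = find_contradictions(beliefs, new_evidence)

    // Accept the evidence only if it outranks every belief it contradicts
    var rejected = false
    for belief in contradicted {
        if new_evidence.confidence <= belief.confidence {
            // Keep existing belief, note contradiction
            belief.add_contradiction(new_evidence)
            rejected = true
        }
    }

    if !rejected {
        // Contract: remove every contradicted belief
        for belief in contradicted {
            contract(belief, reason: "Contradicted by higher-confidence evidence")
        }

        // Expand: add the new belief exactly once
        // (this also covers evidence that contradicts nothing)
        expand(new_evidence)
    }
}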
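The contract-then-expand pattern can be exercised outside Simplex. The Python sketch below is a toy model under stated assumptions: two beliefs contradict when they share a `topic` but differ in `value`, and new evidence is accepted only if it outranks every belief it contradicts. `BeliefStore.revise` and the field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Belief:
    topic: str
    value: str
    confidence: float
    contradictions: list = field(default_factory=list)


class BeliefStore:
    def __init__(self) -> None:
        self.beliefs: list[Belief] = []

    def find_contradictions(self, evidence: Belief) -> list[Belief]:
        # Toy rule (assumption): same topic, different value.
        return [
            b
            for b in self.beliefs
            if b.topic == evidence.topic and b.value != evidence.value
        ]

    def revise(self, evidence: Belief) -> bool:
        """Return True if the evidence was accepted into the store."""
        contradicted = self.find_contradictions(evidence)
        blockers = [b for b in contradicted if b.confidence >= evidence.confidence]
        if blockers:
            # Keep the existing beliefs and record the conflict on each.
            for b in blockers:
                b.contradictions.append(evidence)
            return False

        # Contract: drop every belief the evidence contradicts.
        dropped = {id(b) for b in contradicted}
        self.beliefs = [b for b in self.beliefs if id(b) not in dropped]

        # Expand: add the new belief exactly once.
        self.beliefs.append(evidence)
        return True
```

Deciding acceptance before contracting avoids a half-applied revision in which some beliefs are removed but the evidence that displaced them is then rejected.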

BDI Agents

agent ResearchAgent {
    beliefs: BeliefStore { backing: memory },
    desires: GoalQueue { max_concurrent: 5 },
    intentions: IntentionStack,

    // Autonomous execution loop
    autonomous {
        tick_rate: 100.ms,

        loop {
            // 1. Perception
            let observations = perceive()

            // 2. Belief revision
            for obs in observations {
                revise_beliefs(obs)
            }

            // 3. Option generation
            let options = generate_options()

            // 4. Deliberation
            let selected_goals = deliberate(options)

            // 5. Planning
            for goal in selected_goals {
                if !intentions.has_plan_for(goal) {
                    let plan = create_plan(goal)
                    intentions.commit(Intention::new(goal, plan))
                }
            }

            // 6. Execution
            if let Some(action) = intentions.select_action() {
                let result = execute(action)
                process_result(action, result)

                // 7. Learning (kept inside this block: action and result
                //    only exist when an action was selected)
                send(learner, ObserveStep(observations, action, result))
            }
        }
    }

    receive AssignGoal(goal: Goal) {
        desires.add(goal)
    }

    receive QueryStatus -> AgentStatus {
        AgentStatus {
            active_goals: desires.active(),
            current_intention: intentions.current(),
            belief_count: beliefs.count()
        }
    }
}

// Goal definition
goal FindInformation {
    topic: String,
    deadline: Option<Timestamp>,
    priority: Priority,

    success_condition: (beliefs) => {
        beliefs.has_high_confidence_belief_about(topic)
    }
}

// Plan definition
plan WebResearchPlan {
    goal: FindInformation,

    steps: [
        Action::Search { query: goal.topic },
        Action::Evaluate { results: $search_results },
        Action::Synthesize { sources: $evaluated_sources },
        Action::Remember { content: $synthesis }
    ],

    fallback: plan AskExpertPlan
}
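The `fallback` clause in the plan definition implies a chain: try the primary plan, and if a step fails, start over with the fallback. A minimal Python sketch of that control flow follows. `Plan`, `StepFailed`, and `run_plan` are hypothetical names, and the shared `dict` context is only an assumed stand-in for the `$search_results`-style placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class StepFailed(Exception):
    """Raised by a step that cannot complete."""


# A step reads the shared context and returns the keys it adds to it.
Step = Callable[[dict], dict]


@dataclass
class Plan:
    name: str
    steps: list[Step]
    fallback: Optional["Plan"] = None


def run_plan(plan: Plan, context: dict) -> tuple[str, dict]:
    """Run steps in order; if one fails, restart with the fallback plan.

    Returns the name of the plan that succeeded and its final context.
    """
    current: Optional[Plan] = plan
    while current is not None:
        ctx = dict(context)  # each plan starts from the original context
        try:
            for step in current.steps:
                ctx.update(step(ctx))
            return current.name, ctx
        except StepFailed:
            current = current.fallback
    raise RuntimeError("all plans in the fallback chain failed")
```

Each plan restarts from a copy of the original context so that partial results from a failed plan do not leak into its fallback. Whether Simplex plans share state this way is not stated in this post.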

Mnemonic Specialist

mnemonic specialist MemoryEnhancedCoder {
    model: "codellama-7b",
    domain: "code generation",

    // Memory configuration
    memory: {
        tiers: [ShortTerm, LongTerm],
        retrieval_k: 5,
        relevance_threshold: 0.7
    },

    receive GenerateCode(task: String, context: String) -> String {
        // Retrieve relevant memories
        let memories = recall about task from tiers [LongTerm]

        // Build context with memory
        let enhanced_context = format!(
            "Previous relevant code:\n{}\n\nCurrent context:\n{}\n\nTask: {}",
            memories.map(m => m.content).join("\n---\n"),
            context,
            task
        )

        let code = infer(enhanced_context)

        // Store successful generation
        remember {
            content: format!("Task: {task}\nCode: {code}"),
            memory_type: MemoryType::Skill,
            tier: MemoryTier::LongTerm
        }

        code
    }
}
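The `retrieval_k` and `relevance_threshold` settings suggest a filter-then-rank retrieval step ahead of inference. The Python sketch below shows one plausible reading, using cosine similarity over toy vectors. The functions `cosine`, `retrieve`, and `build_prompt` are hypothetical, and the actual Simplex scoring function is not specified in this post.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(
    query_vec: list[float],
    memories: list[tuple[str, list[float]]],
    k: int = 5,
    threshold: float = 0.7,
) -> list[str]:
    """Return up to k memory contents scoring at least threshold, best first."""
    scored = [(cosine(query_vec, vec), content) for content, vec in memories]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [content for _, content in kept[:k]]


def build_prompt(task: str, context: str, recalled: list[str]) -> str:
    """Mirror the prompt layout used by GenerateCode above."""
    previous = "\n---\n".join(recalled)
    return (
        f"Previous relevant code:\n{previous}\n\n"
        f"Current context:\n{context}\n\n"
        f"Task: {task}"
    )
```

Applying the threshold before the top-k cut means a sparse memory returns fewer than `k` items instead of padding the prompt with weak matches.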

10. Bootstrap Process

Three-Stage Bootstrap

┌─────────────────────────────────────────────────────────────┐
│                    STAGE 0: Python                          │
│  bootstrap/stage0.py (~2000 lines)                          │
│  - Minimal Simplex parser (restricted subset)               │
│  - Direct Python AST generation                             │
│  - No optimization, no error recovery                       │
│                                                             │
│  Restrictions:                                              │
│  - No while loops (use recursion)                           │
│  - No mutable variables (pure functional)                   │
│  - No traits or generics                                    │
│  - Simplified pattern matching                              │
│  - Basic module system only                                 │
└─────────────────────┬───────────────────────────────────────┘
                      │ compiles
                      ▼
┌─────────────────────────────────────────────────────────────┐
│                    STAGE 1: Simplex                         │
│  Written in restricted Simplex, compiled by Stage 0         │
│  - Full language support                                    │
│  - All features: while, var, traits, generics               │
│  - Complete pattern matching                                │
│  - Full module system                                       │
│  - Optimization passes                                      │
│  - Error recovery and reporting                             │
└─────────────────────┬───────────────────────────────────────┘
                      │ compiles
                      ▼
┌─────────────────────────────────────────────────────────────┐
│                    STAGE 2: Simplex                         │
│  Same source as Stage 1, compiled by Stage 1                │
│  - Produces identical output to Stage 1                     │
│  - This fixed point shows the compiler reproduces itself    │
│    (self-consistency, not a proof of correctness)           │
│                                                             │
│  Verification:                                              │
│    SHA-256(Stage1_binary) == SHA-256(Stage2_binary)         │
└─────────────────────────────────────────────────────────────┘
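The hash comparison in the diagram is easy to script. Here is a small Python sketch using the standard `hashlib` module; the function names and file paths are placeholders rather than part of the Simplex build, and the check is only meaningful if the compiler output is deterministic (no embedded timestamps or absolute paths).

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_fixed_point(stage1: Path, stage2: Path) -> bool:
    """True when both compiler binaries are byte-for-byte identical."""
    return sha256_of(stage1) == sha256_of(stage2)
```

Matching hashes show that the compiler reproduces itself. They do not rule out a bug that both stages faithfully carry forward.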

Bootstrap Restrictions in Practice

// Stage 0 compatible (restricted Simplex)
fn factorial(n: i64) -> i64 {
    // No while loop, use recursion
    if n <= 1 {
        1
    } else {
        n * factorial(n - 1)
    }
}

// Stage 1+ only (full Simplex)
fn factorial(n: i64) -> i64 {
    var result = 1
    var i = 2
    while i <= n {
        result *= i
        i += 1
    }
    result
}
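Not every loop has a recursive form as natural as factorial's. A general way to stay within the Stage 0 restrictions is to turn each mutable loop variable into a function parameter. The Python sketch below shows that translation for the iterative version above; the function names are illustrative, and Python is used only to make the equivalence easy to check.

```python
def factorial_loop(n: int) -> int:
    """Stage 1+ style: mutable state and a while loop."""
    result = 1
    i = 2
    while i <= n:
        result *= i
        i += 1
    return result


def factorial_acc(n: int, i: int = 2, result: int = 1) -> int:
    """Stage 0 style: loop state passed as arguments, nothing mutated."""
    if i > n:  # the negated loop condition becomes the base case
        return result
    # One loop iteration becomes one recursive call with updated state.
    return factorial_acc(n, i + 1, result * i)
```

Python does not eliminate tail calls, so `factorial_acc` is limited by the interpreter's recursion depth. Whether Stage 0 has a similar limit is not stated in this post.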

Toolchain Size

// Component sizes (lines of Simplex)
| Component          | Files | Lines   |
|--------------------|-------|---------|
| Compiler (sxc)     | 1     | ~350    |
| Package Mgr (spx)  | 1     | ~650    |
| Runtime (cursus)   | 1     | ~600    |
| Lexer              | 4     | ~500    |
| Parser             | 6     | ~2,500  |
| Type System        | 5     | ~2,500  |
| Code Generation    | 5     | ~2,200  |
| Runtime System     | 6     | ~1,800  |
| Standard Library   | 11    | ~5,500  |
|--------------------|-------|---------|
| Total              | 40    | ~16,600 |

CLI Reference

sxc (Compiler)

# Compile to native binary
sxc build main.sx -o myapp

# Compile to bytecode
sxc build main.sx -o myapp.sxb --target bytecode

# Compile and run
sxc run main.sx

# Type-check only
sxc check main.sx

# Emit LLVM IR
sxc emit-ir main.sx -o main.ll

# Optimization levels
sxc build main.sx -O0  # No optimization (debug)
sxc build main.sx -O2  # Standard optimization
sxc build main.sx -O3  # Aggressive optimization

# Cross-compilation
sxc build main.sx --target aarch64-apple-darwin
sxc build main.sx --target x86_64-unknown-linux-gnu

# Interactive REPL
sxc repl

spx (Package Manager)

# Create new project
spx new myproject
spx init  # In existing directory

# Build and run
spx build
spx run
spx test

# Dependencies
spx add json@^2.0
spx add http --features tls
spx remove unused-package
spx update

# Publishing
spx publish

# Utilities
spx clean
spx fmt
spx doc

cursus (VM)

# Execute bytecode
cursus run myapp.sxb

# Run as daemon
cursus daemon --port 8080

# Cluster mode
cursus cluster --config cluster.toml

# Disassemble bytecode
cursus disasm myapp.sxb

# Debug mode
cursus debug myapp.sxb

Summary

The Simplex toolchain provides:

  • Multi-stage compilation: Lexer, parser, type checker, code generator
  • Dual targets: Native (LLVM) and bytecode (cursus VM)
  • Rich type system: Hindley-Milner inference, generics, traits, ownership
  • Actor model: First-class actors, supervision, checkpointing
  • AI primitives: Inference, embeddings, structured extraction
  • CHAI architecture: Specialists, hives, routing strategies
  • Mnemonic extensions: Memory tiers, belief systems, BDI agents
  • Self-hosting: ~16,600 lines of Simplex, bootstrapped from a ~2,000-line Python Stage 0

The toolchain enables building cognitive AI systems with the safety of static typing, the performance of native code, and the flexibility of portable bytecode.


For the high-level overview, see Part 1: Executive Overview.