This is Part 2 of our toolchain series. Part 1 covered the architecture at a high level; this post goes deep into the implementation.
We'll cover the complete language specification, compiler internals, bytecode format, VM architecture, and the cognitive extensions that make Simplex unique. Code examples throughout.
Table of Contents
- Compilation Pipeline
- Language Specification
- Type System
- Bytecode Format
- VM Architecture
- Actor System
- AI Primitives
- CHAI Specialists and Hives
- Mnemonic Extensions
- Bootstrap Process
1. Compilation Pipeline
The sxc compiler transforms Simplex source into executable output through a multi-stage pipeline:
Source (.sx)
│
▼
┌─────────────────┐
│ Lexer │ Tokenization (~500 lines)
│ (lexer.sx) │ Character stream → Token stream
└────────┬────────┘
│
▼
┌─────────────────┐
│ Parser │ AST Construction (~2,500 lines)
│ (parser.sx) │ Token stream → Abstract Syntax Tree
└────────┬────────┘
│
▼
┌─────────────────┐
│ Type Checker │ Semantic Analysis (~2,500 lines)
│ (types/*.sx) │ AST → Typed AST + Error reports
└────────┬────────┘
│
▼
┌─────────────────┐
│ Code Gen │ Output Generation (~2,200 lines)
│ (codegen/*.sx) │ Typed AST → LLVM IR or Bytecode
└────────┬────────┘
│
┌────┴────┐
│ │
▼ ▼
LLVM IR Bytecode
(.ll) (.sxb)
│ │
▼ │
Native │
Binary │
│ │
└────┬────┘
│
▼
Execution
Lexer: Tokenization
The lexer converts character streams into tokens. It uses pre-computed lookup tables for performance:
// Character classification table (O(1) lookup)
let CHAR_CLASS: [CharClass; 256] = init_char_classes()
enum CharClass {
Whitespace,
Alpha,
Digit,
Operator,
Delimiter,
Quote,
Invalid
}
fn next_token(input: &str, pos: &mut usize) -> Token {
    skip_whitespace(input, pos)
    // Guard against scanning past the end of the input
    if *pos >= input.len() {
        return Token::Eof
    }
    let c = input[*pos]
    match CHAR_CLASS[c as usize] {
        CharClass::Alpha => scan_identifier(input, pos),
        CharClass::Digit => scan_number(input, pos),
        CharClass::Quote => scan_string(input, pos),
        CharClass::Operator => scan_operator(input, pos),
        CharClass::Delimiter => scan_delimiter(input, pos),
        _ => Token::Invalid(c)
    }
}
Keywords are identified via hash table lookup after scanning an identifier:
// Keyword hash table
let KEYWORDS: HashMap<String, TokenKind> = [
("fn", TokenKind::Fn),
("let", TokenKind::Let),
("var", TokenKind::Var),
("if", TokenKind::If),
("else", TokenKind::Else),
("match", TokenKind::Match),
("for", TokenKind::For),
("while", TokenKind::While),
("return", TokenKind::Return),
("actor", TokenKind::Actor),
("receive", TokenKind::Receive),
("send", TokenKind::Send),
("ask", TokenKind::Ask),
("spawn", TokenKind::Spawn),
("specialist", TokenKind::Specialist),
("hive", TokenKind::Hive),
("infer", TokenKind::Infer),
("checkpoint", TokenKind::Checkpoint),
// ... 50+ keywords
].into()
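A sketch of how that lookup slots into identifier scanning (illustrative; the helper is_ident_char and the Token constructors here are assumptions, not the actual lexer API):
fn scan_identifier(input: &str, pos: &mut usize) -> Token {
    let start = *pos
    while *pos < input.len() && is_ident_char(input[*pos]) {
        *pos += 1
    }
    let text = input[start..*pos].to_string()
    // Keyword if the table knows it, plain identifier otherwise
    match KEYWORDS.get(&text) {
        Some(kind) => Token::Keyword(*kind),
        None => Token::Identifier(text)
    }
}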
Parser: AST Construction
The parser uses Pratt parsing (top-down operator precedence) for expressions. A single precedence-climbing loop handles all precedence and associativity, instead of one grammar rule per precedence level:
fn parse_expression(tokens: &[Token], min_precedence: u8) -> Expr {
let mut left = parse_prefix(tokens)
while let Some(op) = peek_infix_op(tokens) {
let (prec, assoc) = precedence(op)
if prec < min_precedence {
break
}
advance(tokens) // consume operator
let next_min = match assoc {
Assoc::Left => prec + 1,
Assoc::Right => prec
}
let right = parse_expression(tokens, next_min)
left = Expr::Binary(op, Box::new(left), Box::new(right))
}
left
}
// Operator precedence table
fn precedence(op: BinaryOp) -> (u8, Assoc) {
match op {
// Assignment (right-associative)
BinaryOp::Assign => (1, Assoc::Right),
// Logical OR
BinaryOp::Or => (2, Assoc::Left),
// Logical AND
BinaryOp::And => (3, Assoc::Left),
// Comparison
BinaryOp::Eq | BinaryOp::Ne => (4, Assoc::Left),
BinaryOp::Lt | BinaryOp::Le |
BinaryOp::Gt | BinaryOp::Ge => (5, Assoc::Left),
// Arithmetic
BinaryOp::Add | BinaryOp::Sub => (6, Assoc::Left),
BinaryOp::Mul | BinaryOp::Div |
BinaryOp::Mod => (7, Assoc::Left),
// Exponentiation (right-associative)
BinaryOp::Pow => (8, Assoc::Right),
}
}
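To see the loop in action, here is a hand trace of parsing 1 + 2 * 3 with the code above:
// parse_expression(tokens, 0)
//   left = 1
//   peek '+': prec 6 >= 0, consume; left-assoc, so next_min = 7
//     parse_expression(tokens, 7)
//       left = 2
//       peek '*': prec 7 >= 7, consume; next_min = 8
//         parse_expression(tokens, 8) -> 3
//       returns Binary(*, 2, 3)
//   left = Binary(+, 1, Binary(*, 2, 3))
Multiplication wins because + raises next_min to 7, which * (precedence 7) still clears but another + (precedence 6) would not.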
Type Checker: Hindley-Milner Inference
Simplex uses Hindley-Milner type inference with extensions for traits and ownership. The core algorithm:
struct TypeEnv {
bindings: HashMap<String, TypeScheme>,
substitutions: HashMap<TypeVar, Type>
}
fn infer(env: &mut TypeEnv, expr: &Expr) -> Result<Type, TypeError> {
match expr {
Expr::Var(name) => {
let scheme = env.lookup(name)?;
Ok(instantiate(scheme))
}
Expr::Lambda(param, body) => {
let param_type = fresh_type_var();
env.bind(param, TypeScheme::mono(param_type.clone()));
let body_type = infer(env, body)?;
Ok(Type::Function(Box::new(param_type), Box::new(body_type)))
}
Expr::Apply(func, arg) => {
let func_type = infer(env, func)?;
let arg_type = infer(env, arg)?;
let result_type = fresh_type_var();
unify(func_type, Type::Function(
Box::new(arg_type),
Box::new(result_type.clone())
))?;
Ok(result_type)
}
Expr::Let(name, value, body) => {
let value_type = infer(env, value)?;
let scheme = generalize(env, value_type);
env.bind(name, scheme);
infer(env, body)
}
// ... pattern matching, actors, etc.
}
}
fn unify(t1: Type, t2: Type) -> Result<(), TypeError> {
match (t1, t2) {
(Type::Var(v), t) | (t, Type::Var(v)) => {
if occurs_check(v, &t) {
Err(TypeError::InfiniteType)
} else {
bind(v, t)
}
}
(Type::Function(a1, r1), Type::Function(a2, r2)) => {
unify(*a1, *a2)?;
unify(*r1, *r2)
}
(Type::Con(n1, args1), Type::Con(n2, args2)) if n1 == n2 => {
for (a1, a2) in args1.iter().zip(args2.iter()) {
unify(a1.clone(), a2.clone())?;
}
Ok(())
}
(t1, t2) => Err(TypeError::Mismatch(t1, t2))
}
}
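To make the machinery concrete, here is a hand trace of infer on the application (x => x)(42):
// infer(Apply(Lambda(x, Var(x)), 42))
//   func_type: fresh t0 for x; body Var(x) : t0  =>  t0 -> t0
//   arg_type:  42 : i64
//   result:    fresh t1
//   unify(t0 -> t0, i64 -> t1)
//     unify(t0, i64)  binds t0 := i64
//     unify(t0, t1)   resolves t0, then binds t1 := i64
//   answer: i64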
Code Generation
Code generation targets either LLVM IR or Simplex bytecode:
fn codegen(ast: &TypedAst, target: Target) -> Output {
match target {
Target::Native => {
let ir = emit_llvm_ir(ast);
Output::LlvmIr(ir)
}
Target::Bytecode => {
let bytecode = emit_bytecode(ast);
Output::Bytecode(bytecode)
}
}
}
fn emit_llvm_ir(ast: &TypedAst) -> String {
let mut builder = StringBuilder::new();
// Emit module header
builder.append("target triple = \"");
builder.append(target_triple());
builder.append("\"\n\n");
// Emit type definitions
for typedef in &ast.types {
emit_type_def(&mut builder, typedef);
}
// Emit function definitions
for func in &ast.functions {
emit_function(&mut builder, func);
}
// Emit actor definitions
for actor in &ast.actors {
emit_actor(&mut builder, actor);
}
builder.to_string()
}
2. Language Specification
Basic Types
// Primitive types
let b: bool = true
let i: i64 = 42
let f: f64 = 3.14159
let s: String = "hello"
let c: char = 'x'
// Unit type (like void)
let u: () = ()
// Optional values
let maybe: Option<i64> = Some(42)
let nothing: Option<i64> = None
// Error handling
let result: Result<i64, String> = Ok(42)
let error: Result<i64, String> = Err("failed")
Collections
// Lists (dynamic arrays)
let numbers: List<i64> = [1, 2, 3, 4, 5]
let first = numbers[0] // 1
let len = numbers.len() // 5
// Maps (hash tables)
let scores: Map<String, i64> = {
"alice": 100,
"bob": 85,
"charlie": 92
}
let alice_score = scores["alice"] // 100
// Sets
let unique: Set<i64> = {1, 2, 3}
let has_two = unique.contains(2) // true
// Vectors (fixed-size, for math/ML)
let vec: Vector<f64, 3> = [1.0, 2.0, 3.0]
let embedding: Vector<f64, 384> = ai::embed("hello")
Functions
// Basic function
fn add(a: i64, b: i64) -> i64 {
a + b
}
// Generic function
fn identity<T>(x: T) -> T {
x
}
// Function with trait bounds
fn print_all<T: Display>(items: List<T>) {
for item in items {
println(item.to_string())
}
}
// Closures
let double = (x: i64) => x * 2
let numbers = [1, 2, 3].map((x) => x * 2) // [2, 4, 6]
// Higher-order functions
fn apply_twice<T>(f: fn(T) -> T, x: T) -> T {
f(f(x))
}
Pattern Matching
// Basic matching
fn describe(n: i64) -> String {
match n {
0 => "zero",
1 => "one",
2..=9 => "single digit",
10..=99 => "double digit",
_ => "large"
}
}
// Destructuring
struct Point { x: f64, y: f64 }
fn distance_from_origin(p: Point) -> f64 {
match p {
Point { x, y } => (x*x + y*y).sqrt()
}
}
// Enum matching
enum Shape {
Circle { radius: f64 },
Rectangle { width: f64, height: f64 },
Triangle { base: f64, height: f64 }
}
fn area(shape: Shape) -> f64 {
match shape {
Shape::Circle { radius } => 3.14159 * radius * radius,
Shape::Rectangle { width, height } => width * height,
Shape::Triangle { base, height } => 0.5 * base * height
}
}
// Guards
fn classify(n: i64) -> String {
match n {
x if x < 0 => "negative",
x if x == 0 => "zero",
x if x % 2 == 0 => "positive even",
_ => "positive odd"
}
}
Structs and Enums
// Struct definition
struct User {
id: i64,
name: String,
email: String,
active: bool
}
// Struct instantiation
let user = User {
id: 1,
name: "Alice",
email: "alice@example.com",
active: true
}
// Struct update syntax
let inactive_user = User { active: false, ..user }
// Enum with data
enum Message {
Text(String),
Image { url: String, width: i64, height: i64 },
Video { url: String, duration: f64 },
Empty
}
// Using enums
let msg = Message::Image {
url: "https://example.com/img.png",
width: 800,
height: 600
}
Traits
// Trait definition
trait Display {
fn to_string(&self) -> String
}
trait Numeric {
    fn zero() -> Self
    fn add(&self, other: &Self) -> Self
    // mul is needed by Vector::dot in the const-generics section
    fn mul(&self, other: &Self) -> Self
}
// Trait implementation
impl Display for User {
fn to_string(&self) -> String {
format!("User(id={}, name={})", self.id, self.name)
}
}
impl Numeric for i64 {
    fn zero() -> i64 { 0 }
    fn add(&self, other: &i64) -> i64 { *self + *other }
    fn mul(&self, other: &i64) -> i64 { *self * *other }
}
// Associated types
trait Iterator {
type Item
fn next(&mut self) -> Option<Self::Item>
}
// Trait bounds
fn sum<T: Numeric>(items: List<T>) -> T {
items.fold(T::zero(), (acc, x) => acc.add(&x))
}
Error Handling
// Result type
fn divide(a: f64, b: f64) -> Result<f64, String> {
if b == 0.0 {
Err("division by zero")
} else {
Ok(a / b)
}
}
// The ? operator
fn calculate(x: f64, y: f64, z: f64) -> Result<f64, String> {
let a = divide(x, y)? // Returns early if Err
let b = divide(a, z)?
Ok(b)
}
// Pattern matching on Result
match divide(10.0, 2.0) {
Ok(result) => println("Result: {result}"),
Err(e) => println("Error: {e}")
}
// Unwrap (panics on Err)
let result = divide(10.0, 2.0).unwrap() // 5.0
// Default on error
let result = divide(10.0, 0.0).unwrap_or(0.0) // 0.0
Modules
// File: math/geometry.sx
pub mod geometry {
pub struct Point { pub x: f64, pub y: f64 }
pub fn distance(a: &Point, b: &Point) -> f64 {
let dx = b.x - a.x
let dy = b.y - a.y
(dx*dx + dy*dy).sqrt()
}
// Private helper
fn validate_point(p: &Point) -> bool {
p.x.is_finite() && p.y.is_finite()
}
}
// File: main.sx
use math::geometry::{Point, distance}
fn main() {
let a = Point { x: 0.0, y: 0.0 }
let b = Point { x: 3.0, y: 4.0 }
println("Distance: {}", distance(&a, &b)) // 5.0
}
3. Type System
Generic Monomorphization
Simplex compiles generics via monomorphization—generating specialized code for each concrete type:
// Source: generic function
fn max<T: Ord>(a: T, b: T) -> T {
if a > b { a } else { b }
}
// Usage
let x = max(10, 20) // i64
let y = max(3.14, 2.72) // f64
let z = max("alpha", "beta") // String
// Compiler generates:
fn max_i64(a: i64, b: i64) -> i64 { ... }
fn max_f64(a: f64, b: f64) -> f64 { ... }
fn max_String(a: String, b: String) -> String { ... }
Ownership and Borrowing
Simplex uses ownership semantics for memory safety, so natively compiled code runs without a garbage collector (the bytecode VM, covered in Section 5, manages its own heap):
// Ownership transfer
fn take_ownership(s: String) {
println(s)
} // s is dropped here
let greeting = "hello".to_string()
take_ownership(greeting)
// greeting is no longer valid here
// Borrowing (immutable reference)
fn borrow(s: &String) {
println(s)
}
let greeting = "hello".to_string()
borrow(&greeting)
// greeting is still valid
// Mutable borrowing
fn modify(s: &mut String) {
s.push_str(" world")
}
let mut greeting = "hello".to_string()
modify(&mut greeting)
// greeting is now "hello world"
Const Generics
// Fixed-size vectors with compile-time dimensions
struct Vector<T, const N: usize> {
data: [T; N]
}
impl<T: Numeric, const N: usize> Vector<T, N> {
fn dot(&self, other: &Vector<T, N>) -> T {
let mut sum = T::zero()
for i in 0..N {
sum = sum.add(&self.data[i].mul(&other.data[i]))
}
sum
}
}
// Embeddings with fixed dimension
type Embedding = Vector<f64, 384>
fn cosine_similarity(a: &Embedding, b: &Embedding) -> f64 {
a.dot(b) / (a.magnitude() * b.magnitude())
}
4. Bytecode Format
SXB File Structure
┌──────────────────────────────────────┐
│ HEADER (32 bytes) │
├──────────────────────────────────────┤
│ Magic: "SXB" (3 bytes) │
│ Version: u16 │
│ Flags: u16 │
│ Entry Point: u32 (function index) │
│ String Table Offset: u32 │
│ Type Table Offset: u32 │
│ Function Table Offset: u32 │
│ Code Section Offset: u32 │
│ Reserved: 5 bytes (pads to 32 total)  │
├──────────────────────────────────────┤
│ STRING TABLE │
├──────────────────────────────────────┤
│ Count: u32 │
│ [Length: u32, Data: bytes]... │
├──────────────────────────────────────┤
│ TYPE TABLE │
├──────────────────────────────────────┤
│ Count: u32 │
│ [TypeDef]... │
├──────────────────────────────────────┤
│ FUNCTION TABLE │
├──────────────────────────────────────┤
│ Count: u32 │
│ [FunctionDef]... │
│ - Name Index: u32 │
│ - Hash: [u8; 32] (SHA-256) │
│ - Param Count: u16 │
│ - Local Count: u16 │
│ - Code Offset: u32 │
│ - Code Length: u32 │
├──────────────────────────────────────┤
│ CODE SECTION │
├──────────────────────────────────────┤
│ [Instructions]... │
└──────────────────────────────────────┘
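A minimal sketch of loading that header, following the layout above (the Reader API, Header struct, and error names are assumptions):
fn read_header(r: &mut Reader) -> Result<Header, LoadError> {
    // The magic must match before anything else is trusted
    if r.read_bytes(3) != b"SXB" {
        return Err(LoadError::BadMagic)
    }
    Ok(Header {
        version: r.read_u16(),
        flags: r.read_u16(),
        entry_point: r.read_u32(),
        string_table_offset: r.read_u32(),
        type_table_offset: r.read_u32(),
        function_table_offset: r.read_u32(),
        code_section_offset: r.read_u32(),
        reserved: r.read_bytes(5) // padding to 32 bytes
    })
}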
Instruction Set
// Stack Operations (0x00-0x0F)
0x00 NOP // No operation
0x01 PUSH_CONST n // Push constant pool[n]
0x02 PUSH_LOCAL n // Push local variable[n]
0x03 STORE_LOCAL n // Pop and store to local[n]
0x04 POP // Discard top of stack
0x05 DUP // Duplicate top of stack
0x06 SWAP // Swap top two values
0x07 ROT // Rotate top three values
// Integer Arithmetic (0x10-0x17)
0x10 ADD // a + b
0x11 SUB // a - b
0x12 MUL // a * b
0x13 DIV // a / b
0x14 MOD // a % b
0x15 NEG // -a
// Float Arithmetic (0x18-0x1F)
0x18 FADD // a + b (float)
0x19 FSUB // a - b (float)
0x1A FMUL // a * b (float)
0x1B FDIV // a / b (float)
// Bitwise Operations (0x20-0x2F)
0x20 AND // a & b
0x21 OR // a | b
0x22 XOR // a ^ b
0x23 NOT // ~a
0x24 SHL // a << b
0x25 SHR // a >> b
// Comparisons (0x30-0x3F)
0x30 EQ // a == b
0x31 NE // a != b
0x32 LT // a < b
0x33 LE // a <= b
0x34 GT // a > b
0x35 GE // a >= b
// Control Flow (0x40-0x4F)
0x40 JMP offset // Unconditional jump
0x41 JMP_IF offset // Jump if true
0x42 JMP_UNLESS o // Jump if false
0x43 CALL n // Call function[n]
0x44 RET // Return from function
0x45 TAIL_CALL n // Tail call optimization
// Objects (0x50-0x5F)
0x50 NEW_OBJ t // Create object of type[t]
0x51 GET_FIELD f // Get field[f] from object
0x52 SET_FIELD f // Set field[f] on object
0x53 NEW_LIST n // Create list with n elements
0x54 NEW_MAP n // Create map with n entries
0x55 INDEX // list[index] or map[key]
0x56 INDEX_SET // list[index] = value
// Actor Operations (0x60-0x6F)
0x60 SPAWN t // Spawn actor of type[t]
0x61 SEND // Send message (fire-and-forget)
0x62 ASK // Send message and await response
0x63 RECEIVE // Await next message
0x64 SELF // Push current actor reference
0x65 CHECKPOINT // Save actor state
// AI Operations (0x70-0x7F)
0x70 AI_COMPLETE // Text completion
0x71 AI_EMBED // Generate embedding
0x72 AI_NEAREST // Nearest neighbor search
0x73 AI_STREAM_START // Start streaming completion
0x74 AI_STREAM_NEXT // Get next stream chunk
0x75 AI_EXTRACT // Structured extraction
// Debug (0xF0-0xFF)
0xF0 DEBUG_PRINT // Print top of stack
0xFF HALT // Stop execution
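As an illustration (hand-written, not actual compiler output), a one-liner like fn add_one(x: i64) -> i64 { x + 1 } could lower to:
// Assuming local[0] holds x and constant pool[0] holds the i64 value 1:
0x02 PUSH_LOCAL 0   // push x
0x01 PUSH_CONST 0   // push 1
0x10 ADD            // x + 1
0x44 RET            // return top of stack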
Content Addressing
Every function is identified by the SHA-256 hash of its signature and bytecode:
struct FunctionDef {
name: String,
hash: [u8; 32], // SHA-256 of bytecode
params: List<Type>,
returns: Type,
bytecode: List<u8>
}
fn compute_function_hash(func: &FunctionDef) -> [u8; 32] {
let mut hasher = Sha256::new()
// Include signature in hash
hasher.update(&serialize(&func.params))
hasher.update(&serialize(&func.returns))
// Include bytecode
hasher.update(&func.bytecode)
hasher.finalize()
}
5. VM Architecture
Runtime Structure
struct SimplexVM {
// Execution state
stack: Vec<Value>,
call_stack: Vec<CallFrame>,
ip: usize, // Instruction pointer
// Memory
heap: Heap,
gc: GarbageCollector,
// Actor system
actors: ActorRegistry,
scheduler: ActorScheduler,
mailboxes: Map<ActorId, Mailbox>,
// Code
modules: Map<Hash, Module>,
function_cache: Map<Hash, CompiledFunction>,
// Checkpointing
checkpoint_manager: CheckpointManager,
// Optional JIT
jit: Option<JitCompiler>
}
struct CallFrame {
    function_hash: [u8; 32],
    return_address: usize, // caller's ip, restored by RET
    base_pointer: usize,
    locals: Vec<Value>
}
Execution Loop
fn execute(vm: &mut SimplexVM) -> Result<Value, RuntimeError> {
loop {
let opcode = vm.fetch_byte()
match opcode {
PUSH_CONST => {
let idx = vm.fetch_u32()
let value = vm.constants[idx].clone()
vm.stack.push(value)
}
ADD => {
let b = vm.stack.pop()?.as_i64()?
let a = vm.stack.pop()?.as_i64()?
vm.stack.push(Value::I64(a + b))
}
CALL => {
    let func_idx = vm.fetch_u32()
    let func = &vm.functions[func_idx]
    // Hot functions may already be JIT-compiled
    if let Some(jit) = &vm.jit {
        if func.call_count > JIT_THRESHOLD {
            let native = jit.compile(func)?
            native.execute(&mut vm.stack)? // leaves result on the stack
            continue
        }
    }
    // Interpreted execution: save the caller's ip, then jump
    let frame = CallFrame {
        function_hash: func.hash,
        return_address: vm.ip,
        base_pointer: vm.stack.len() - func.param_count,
        locals: vec![Value::Nil; func.local_count]
    }
    vm.call_stack.push(frame)
    vm.ip = func.code_offset
}
SPAWN => {
let actor_type = vm.fetch_u32()
let args = vm.pop_n(vm.fetch_u16() as usize)
let actor_id = vm.actors.spawn(actor_type, args)?
vm.stack.push(Value::ActorRef(actor_id))
}
CHECKPOINT => {
let actor_id = vm.current_actor()?
vm.checkpoint_manager.checkpoint(actor_id)?
}
RET => {
let result = vm.stack.pop()?
if vm.call_stack.is_empty() {
return Ok(result)
}
let frame = vm.call_stack.pop().unwrap()
vm.stack.truncate(frame.base_pointer)
vm.stack.push(result)
vm.ip = frame.return_address
}
HALT => return Ok(vm.stack.pop().unwrap_or(Value::Nil)),
_ => return Err(RuntimeError::InvalidOpcode(opcode))
}
}
}
Garbage Collection
struct GarbageCollector {
heap: Heap,
roots: Set<*mut Object>,
threshold: usize,
allocated: usize
}
impl GarbageCollector {
fn collect_if_needed(&mut self) {
if self.allocated > self.threshold {
self.collect()
}
}
fn collect(&mut self) {
// Mark phase
let mut marked = Set::new()
let mut worklist: Vec<*mut Object> = self.roots.iter().copied().collect()
while let Some(obj) = worklist.pop() {
if marked.insert(obj) {
// Add all referenced objects
for child in (*obj).references() {
worklist.push(child)
}
}
}
// Sweep phase
self.heap.retain(|obj| marked.contains(&obj))
self.allocated = self.heap.size()
// Adjust threshold
self.threshold = (self.allocated as f64 * 1.5) as usize
}
}
6. Actor System
Actor Definition
actor Counter {
var count: i64 = 0
// Initialize with starting value
fn init(start: i64) {
count = start
}
// Handle increment message
receive Increment {
count += 1
}
// Handle increment by amount
receive Add(n: i64) {
count += n
}
// Handle query with response
receive GetCount -> i64 {
count
}
// Handle reset
receive Reset {
count = 0
checkpoint() // Persist state
}
}
// Using the actor
fn main() {
let counter = spawn Counter(10)
send(counter, Increment)
send(counter, Add(5))
let value = ask(counter, GetCount) // 16
println("Count: {value}")
}
Supervision Trees
// Supervisor definition
supervisor CounterSystem {
strategy: OneForOne, // Restart only failed child
max_restarts: 3, // Max 3 restarts
within: 60.seconds, // Within 60 seconds
children: [
child(Counter, args: [0], restart: Always),
child(Counter, args: [100], restart: Always),
child(Logger, restart: OnFailure)
]
}
// Restart strategies
enum SupervisorStrategy {
OneForOne, // Restart only the failed child
OneForAll, // Restart all children if one fails
RestForOne // Restart failed child and all started after it
}
// Restart policies
enum RestartPolicy {
Always, // Always restart
OnFailure, // Restart only on failure (not normal exit)
Never // Never restart
}
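Putting the two enums together, a supervisor's failure handling boils down to a restart-window check plus a strategy dispatch, as in this sketch (illustrative; the method names are assumptions, not the runtime API):
fn on_child_failure(sup: &mut Supervisor, failed: ChildId) {
    // Too many restarts within the window: escalate to our own supervisor
    if sup.restarts_in_window() >= sup.max_restarts {
        sup.escalate()
        return
    }
    match sup.strategy {
        SupervisorStrategy::OneForOne => sup.restart(failed),
        SupervisorStrategy::OneForAll => sup.restart_all(),
        SupervisorStrategy::RestForOne => sup.restart_from(failed)
    }
}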
Message Passing
// Fire-and-forget (async)
send(actor, Message)
// Request-response (sync)
let response = ask(actor, Query)
// Request with timeout
let response = ask(actor, Query, timeout: 5.seconds)?
// Broadcast to all actors of a type
broadcast<Counter>(Reset)
// Pipeline pattern
let result = actor1
|> ask(Process)
|> ask(actor2, Transform)
|> ask(actor3, Finalize)
Checkpointing
actor StatefulProcessor {
var processed: i64 = 0
var results: List<String> = []
receive Process(data: String) -> String {
let result = transform(data)
results.push(result.clone())
processed += 1
// Checkpoint every 100 items
if processed % 100 == 0 {
checkpoint()
}
result
}
// Lifecycle hook: called after recovery
fn on_resume() {
println("Resumed with {processed} items processed")
}
}
7. AI Primitives
Inference
// Basic completion
let response = await ai::complete("Explain quantum computing")
// With options
let response = await ai::complete(
"Summarize this article: {article}",
model: "mistral-7b",
temperature: 0.7,
max_tokens: 500
)
// Streaming
for chunk in ai::stream("Write a story about...") {
print(chunk)
}
// Structured extraction
struct ContactInfo {
name: String,
email: Option<String>,
phone: Option<String>
}
let contact = await ai::extract<ContactInfo>(
"John Smith can be reached at john@example.com or 555-1234"
)
// ContactInfo { name: "John Smith", email: Some("john@example.com"), phone: Some("555-1234") }
// Classification
enum Sentiment { Positive, Negative, Neutral }
let sentiment = await ai::classify<Sentiment>("This product is amazing!")
// Sentiment::Positive
Embeddings
// Single embedding
let embedding: Vector<f64, 384> = ai::embed("Hello, world!")
// Batch embeddings
let texts = ["apple", "banana", "orange"]
let embeddings = ai::embed_batch(texts)
// Similarity search
let query = ai::embed("fruit")
let results = ai::nearest(query, embeddings, k: 2)
// Returns indices of most similar embeddings
// Cosine similarity
let similarity = ai::similarity(embedding1, embedding2) // 0.0 to 1.0
8. CHAI Specialists and Hives
Specialist Definition
specialist Summarizer {
model: "mistral-7b-instruct",
domain: "text summarization",
temperature: 0.3,
max_tokens: 200,
system_prompt: "You are a concise summarizer.
Provide brief, accurate summaries.",
receive Summarize(text: String) -> String {
infer("Summarize the following text:\n\n{text}")
}
receive SummarizeWithLength(text: String, max_words: i64) -> String {
infer("Summarize in {max_words} words or less:\n\n{text}")
}
}
specialist CodeAnalyzer {
model: "codellama-7b",
domain: "code analysis",
temperature: 0.1,
receive Explain(code: String) -> String {
infer("Explain what this code does:\n```\n{code}\n```")
}
receive FindBugs(code: String) -> List<String> {
let response = infer("List potential bugs in this code:\n```\n{code}\n```")
parse_bullet_points(response)
}
receive Refactor(code: String, goal: String) -> String {
infer("Refactor this code to {goal}:\n```\n{code}\n```")
}
}
Hive Definition
hive DocumentProcessor {
specialists: [
Summarizer,
EntityExtractor,
SentimentAnalyzer,
TopicClassifier
],
router: SemanticRouter {
embedding_model: "all-minilm-l6-v2",
domain_embeddings: auto // Computed from specialist domains
},
strategy: OneForOne,
max_concurrent: 10,
// Shared memory for all specialists
memory: SharedVectorStore {
dimension: 384,
max_entries: 100_000
},
receive ProcessDocument(doc: String) -> DocumentAnalysis {
// Parallel processing
let (summary, entities, sentiment, topics) = parallel(
ask(Summarizer, Summarize(doc)),
ask(EntityExtractor, Extract(doc)),
ask(SentimentAnalyzer, Analyze(doc)),
ask(TopicClassifier, Classify(doc))
)
DocumentAnalysis {
summary,
entities,
sentiment,
topics
}
}
}
Routing Strategies
// Semantic routing (default)
router: SemanticRouter {
embedding_model: "all-minilm-l6-v2"
}
// Rule-based routing
router: RuleRouter {
rules: [
("code|program|function", CodeAnalyzer),
("summarize|summary|tldr", Summarizer),
("sentiment|feeling|emotion", SentimentAnalyzer)
],
default: GeneralAssistant
}
// LLM-based routing
router: LlmRouter {
model: "tinyllama-1.1b",
prompt: "Given this task, which specialist should handle it?"
}
// Cascade routing (try in order until success)
router: CascadeRouter {
order: [FastSpecialist, MediumSpecialist, HighQualitySpecialist],
success_threshold: 0.8
}
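Under the hood, semantic routing reduces to embedding similarity: embed the task once, score it against each specialist's domain embedding, and dispatch to the best match. A sketch of the idea (illustrative; the field and type names are assumptions):
fn route(router: &SemanticRouter, task: String) -> SpecialistId {
    let query = ai::embed(task)
    var best_id = router.specialists[0].id
    var best_score = -1.0
    for spec in router.specialists {
        // Cosine similarity against the precomputed domain embedding
        let score = ai::similarity(query, spec.domain_embedding)
        if score > best_score {
            best_score = score
            best_id = spec.id
        }
    }
    best_id
}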
Ensemble Patterns
// Parallel execution
let results = parallel(
ask(Specialist1, Query),
ask(Specialist2, Query),
ask(Specialist3, Query)
)
// Voting (majority wins)
let consensus = vote([
ask(Judge1, Evaluate(item)),
ask(Judge2, Evaluate(item)),
ask(Judge3, Evaluate(item))
])
// Weighted combination
let combined = weighted_average([
(ask(Expert1, Score(item)), 0.5),
(ask(Expert2, Score(item)), 0.3),
(ask(Expert3, Score(item)), 0.2)
])
// Chain of thought
let result = item
|> ask(Analyzer, BreakDown)
|> ask(Reasoner, ProcessSteps)
|> ask(Synthesizer, Combine)
9. Mnemonic Extensions
Memory Tiers
// Memory entry structure
struct MemoryEntry {
id: String,
content: String,
embedding: Vector<f64, 384>,
tier: MemoryTier,
truth_category: TruthCategory,
confidence: ConfidenceScore,
domains: List<String>,
tags: List<String>,
created_at: Timestamp,
accessed_at: Timestamp,
access_count: i64
}
enum MemoryTier {
ShortTerm, // 24-72h TTL, working hypotheses
LongTerm, // Indefinite, validated knowledge
Persistent // Permanent, core identity
}
// Storing memories
remember {
content: "User prefers dark mode",
memory_type: MemoryType::Preference,
truth_category: TruthCategory::Contextual {
domains: ["ui", "settings"]
},
confidence: 0.9,
tier: MemoryTier::LongTerm
}
// Recalling memories
let memories = recall about "user preferences"
from tiers [LongTerm, Persistent]
limit 10
// Forgetting
forget memory_id reason: "Outdated information"
// Archiving
archive memories where accessed_at < 30.days.ago
to "archive/old_memories.sxb"
Truth Categories
enum TruthCategory {
// Empirically verifiable facts
Absolute,
// True within specific domains
Contextual { domains: List<String> },
// Subjective preferences/values
Opinion,
// Derived from patterns with stated confidence
Inferred {
confidence: f64,
sources: List<String>
}
}
// Examples
let fact = MemoryEntry {
content: "Python 3.12 was released October 2023",
truth_category: TruthCategory::Absolute,
confidence: ConfidenceScore {
source_reliability: 1.0,
recency: 0.9,
corroboration_count: 5,
contradiction_count: 0
},
...
}
let contextual = MemoryEntry {
content: "React is best for component-based UIs",
truth_category: TruthCategory::Contextual {
domains: ["web", "frontend"]
},
...
}
let opinion = MemoryEntry {
content: "User thinks tabs are better than spaces",
truth_category: TruthCategory::Opinion,
...
}
let inferred = MemoryEntry {
content: "User likely works in finance",
truth_category: TruthCategory::Inferred {
confidence: 0.75,
sources: ["frequent finance queries", "timezone patterns"]
},
...
}
Belief System
// Type-level belief representation
Belief<T, C, τ> where
T: Type, // Content type
C: ConfidenceLevel, // High, Medium, Low
τ: TruthCategory // Absolute, Contextual, Opinion, Inferred
// Creating beliefs
let belief: Belief<String, High, Absolute> = believe {
content: "Earth orbits the Sun",
justification: "Scientific consensus"
}
// Querying beliefs
let relevant = query beliefs
where topic matches "astronomy"
and confidence >= 0.7
order by recency desc
// Belief revision (AGM-style)
fn revise_beliefs(new_evidence: Evidence) {
let contradicted = find_contradictions(beliefs, new_evidence)
for belief in contradicted {
if new_evidence.confidence > belief.confidence {
// Contract: remove old belief
contract(belief, reason: "Contradicted by higher-confidence evidence")
// Expand: add new belief
expand(new_evidence)
} else {
// Keep existing belief, note contradiction
belief.add_contradiction(new_evidence)
}
}
}
BDI Agents
agent ResearchAgent {
beliefs: BeliefStore { backing: memory },
desires: GoalQueue { max_concurrent: 5 },
intentions: IntentionStack,
// Autonomous execution loop
autonomous {
tick_rate: 100.ms,
loop {
// 1. Perception
let observations = perceive()
// 2. Belief revision
for obs in observations {
revise_beliefs(obs)
}
// 3. Option generation
let options = generate_options()
// 4. Deliberation
let selected_goals = deliberate(options)
// 5. Planning
for goal in selected_goals {
if !intentions.has_plan_for(goal) {
let plan = create_plan(goal)
intentions.commit(Intention::new(goal, plan))
}
}
// 6. Execution and 7. Learning: action and result are only in
// scope inside the if-let, so the learning step happens there too
if let Some(action) = intentions.select_action() {
    let result = execute(action)
    process_result(action, result)
    send(learner, ObserveStep(observations, action, result))
}
}
}
receive AssignGoal(goal: Goal) {
desires.add(goal)
}
receive QueryStatus -> AgentStatus {
AgentStatus {
active_goals: desires.active(),
current_intention: intentions.current(),
belief_count: beliefs.count()
}
}
}
// Goal definition
goal FindInformation {
topic: String,
deadline: Option<Timestamp>,
priority: Priority,
success_condition: (beliefs) => {
beliefs.has_high_confidence_belief_about(topic)
}
}
// Plan definition
plan WebResearchPlan {
goal: FindInformation,
steps: [
Action::Search { query: goal.topic },
Action::Evaluate { results: $search_results },
Action::Synthesize { sources: $evaluated_sources },
Action::Remember { content: $synthesis }
],
fallback: plan AskExpertPlan
}
Mnemonic Specialist
mnemonic specialist MemoryEnhancedCoder {
model: "codellama-7b",
domain: "code generation",
// Memory configuration
memory: {
tiers: [ShortTerm, LongTerm],
retrieval_k: 5,
relevance_threshold: 0.7
},
receive GenerateCode(task: String, context: String) -> String {
// Retrieve relevant memories
let memories = recall about task from tiers [LongTerm]
// Build context with memory
let enhanced_context = format!(
"Previous relevant code:\n{}\n\nCurrent context:\n{}\n\nTask: {}",
memories.map(m => m.content).join("\n---\n"),
context,
task
)
let code = infer(enhanced_context)
// Store successful generation
remember {
content: format!("Task: {task}\nCode: {code}"),
memory_type: MemoryType::Skill,
tier: MemoryTier::LongTerm
}
code
}
}
10. Bootstrap Process
Three-Stage Bootstrap
┌─────────────────────────────────────────────────────────────┐
│ STAGE 0: Python │
│ bootstrap/stage0.py (~2000 lines) │
│ - Minimal Simplex parser (restricted subset) │
│ - Direct Python AST generation │
│ - No optimization, no error recovery │
│ │
│ Restrictions: │
│ - No while loops (use recursion) │
│ - No mutable variables (pure functional) │
│ - No traits or generics │
│ - Simplified pattern matching │
│ - Basic module system only │
└─────────────────────┬───────────────────────────────────────┘
│ compiles
▼
┌─────────────────────────────────────────────────────────────┐
│ STAGE 1: Simplex │
│ Written in restricted Simplex, compiled by Stage 0 │
│ - Full language support │
│ - All features: while, var, traits, generics │
│ - Complete pattern matching │
│ - Full module system │
│ - Optimization passes │
│ - Error recovery and reporting │
└─────────────────────┬───────────────────────────────────────┘
│ compiles
▼
┌─────────────────────────────────────────────────────────────┐
│ STAGE 2: Simplex │
│ Same source as Stage 1, compiled by Stage 1 │
│ - Produces identical output to Stage 1 │
│ - This identity proves correctness │
│ │
│ Verification: SHA-256(Stage 1) == SHA-256(Stage 2)          │
└─────────────────────────────────────────────────────────────┘
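One way to check that fixed point from a shell, assuming the stage binaries land in build/ (the paths are illustrative):
# The two digests must be identical, byte for byte
sha256sum build/sxc-stage1 build/sxc-stage2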
Bootstrap Restrictions in Practice
// Stage 0 compatible (restricted Simplex)
fn factorial(n: i64) -> i64 {
// No while loop, use recursion
if n <= 1 {
1
} else {
n * factorial(n - 1)
}
}
// Stage 1+ only (full Simplex)
fn factorial(n: i64) -> i64 {
var result = 1
var i = 2
while i <= n {
result *= i
i += 1
}
result
}
Toolchain Size
// Component sizes (lines of Simplex)
| Component | Files | Lines |
|--------------------|-------|--------|
| Compiler (sxc) | 1 | ~350 |
| Package Mgr (spx) | 1 | ~650 |
| Runtime (cursus) | 1 | ~600 |
| Lexer | 4 | ~500 |
| Parser | 6 | ~2,500 |
| Type System | 5 | ~2,500 |
| Code Generation | 5 | ~2,200 |
| Runtime System | 6 | ~1,800 |
| Standard Library | 11 | ~5,500 |
|--------------------|-------|--------|
| Total | 40 | ~16,600|
CLI Reference
sxc (Compiler)
# Compile to native binary
sxc build main.sx -o myapp
# Compile to bytecode
sxc build main.sx -o myapp.sxb --target bytecode
# Compile and run
sxc run main.sx
# Type-check only
sxc check main.sx
# Emit LLVM IR
sxc emit-ir main.sx -o main.ll
# Optimization levels
sxc build main.sx -O0 # No optimization (debug)
sxc build main.sx -O2 # Standard optimization
sxc build main.sx -O3 # Aggressive optimization
# Cross-compilation
sxc build main.sx --target aarch64-apple-darwin
sxc build main.sx --target x86_64-unknown-linux-gnu
# Interactive REPL
sxc repl
spx (Package Manager)
# Create new project
spx new myproject
spx init # In existing directory
# Build and run
spx build
spx run
spx test
# Dependencies
spx add json@^2.0
spx add http --features tls
spx remove unused-package
spx update
# Publishing
spx publish
# Utilities
spx clean
spx fmt
spx doc
cursus (VM)
# Execute bytecode
cursus run myapp.sxb
# Run as daemon
cursus daemon --port 8080
# Cluster mode
cursus cluster --config cluster.toml
# Disassemble bytecode
cursus disasm myapp.sxb
# Debug mode
cursus debug myapp.sxb
Summary
The Simplex toolchain provides:
- Multi-stage compilation: Lexer, parser, type checker, code generator
- Dual targets: Native (LLVM) and bytecode (cursus VM)
- Rich type system: Hindley-Milner inference, generics, traits, ownership
- Actor model: First-class actors, supervision, checkpointing
- AI primitives: Inference, embeddings, structured extraction
- CHAI architecture: Specialists, hives, routing strategies
- Mnemonic extensions: Memory tiers, belief systems, BDI agents
- Self-hosting: 16,600 lines of pure Simplex
The toolchain enables building cognitive AI systems with the safety of static typing, the performance of native code, and the flexibility of portable bytecode.
For the high-level overview, see Part 1: Executive Overview.