Simplex's cognitive architecture relies on domain-specific specialists—fine-tuned language models that excel at particular tasks. This post documents our complete training pipeline: infrastructure selection, AWS provisioning, LoRA fine-tuning scripts, and why we're starting with Python before porting to pure Simplex.
The Training Challenge
Simplex's hive architecture routes tasks to specialists based on semantic similarity. Each specialist is a LoRA adapter trained on top of a shared base model (Qwen3-8B). The challenge: training 52+ specialists efficiently without breaking the bank.
Our requirements:
- Cost efficiency: Full pipeline under $15 per run
- Reproducibility: Same inputs produce identical adapters
- Modularity: Train individual specialists or the full catalog
- Export ready: Output GGUF files compatible with Ollama
Infrastructure Selection
We evaluated three approaches before settling on AWS spot instances:
| Option | Pros | Cons | Cost (Full Pipeline) |
|---|---|---|---|
| Local GPU | No cloud costs, instant access | Requires 24GB+ VRAM, ties up dev machine | ~$2 electricity |
| Colab/Kaggle | Free tier available, no setup | Session limits, inconsistent GPU availability | $0-10 (Pro tier) |
| AWS Spot | Reliable, scriptable, 24GB A10G | Potential interruption, initial setup | $8-14 |
AWS spot instances won on reliability and automation. The g5.xlarge instance type provides a single NVIDIA A10G with 24GB VRAM—enough for 8B parameter models with LoRA. Spot pricing typically runs 60-70% below on-demand rates.
Instance Selection
g5.xlarge: 1x A10G (24GB), 4 vCPU, 16GB RAM — $1.01/hr on-demand, ~$0.35/hr spot. Sweet spot for single-adapter training. For parallel training of multiple specialists, scale to g5.12xlarge (4x A10G, 96GB total).
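If you want to sanity-check current spot pricing in your region before provisioning, here is a minimal sketch using boto3; the region and output handling are illustrative assumptions, not part of the project's tooling.

```python
# Sketch: query recent g5.xlarge spot prices (region is an assumption).
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
resp = ec2.describe_spot_price_history(
    InstanceTypes=["g5.xlarge"],
    ProductDescriptions=["Linux/UNIX"],
    MaxResults=5,
)
for entry in resp["SpotPriceHistory"]:
    print(entry["AvailabilityZone"], entry["SpotPrice"], entry["Timestamp"])
```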
AWS Provisioning
Infrastructure is defined in Terraform for reproducibility. The setup creates:
- EC2 instance with Deep Learning AMI (Ubuntu 22.04, NVIDIA drivers pre-installed)
- Security group allowing SSH access
- IAM role with S3 access for artifact storage
- 200GB gp3 EBS volume for model weights and datasets
Terraform Configuration
```hcl
# variables.tf - Key configuration options
variable "instance_type" {
  description = "EC2 instance type"
  type        = string
  default     = "g5.xlarge"

  validation {
    condition = contains([
      "g4dn.xlarge", "g4dn.2xlarge",
      "g5.xlarge", "g5.2xlarge", "g5.12xlarge",
      "p3.2xlarge"
    ], var.instance_type)
    error_message = "Must be a GPU instance type."
  }
}

variable "use_spot_instance" {
  description = "Use spot instance for cost savings"
  type        = bool
  default     = true
}

variable "spot_max_price" {
  description = "Maximum hourly price for spot"
  type        = string
  default     = "0.50" # ~50% of on-demand
}
```
The bootstrap script (user_data.sh) runs on instance launch, installing dependencies automatically:
```bash
#!/bin/bash
# Verify NVIDIA drivers
nvidia-smi

# Create Python environment
python3 -m venv ~/simplex-training
source ~/simplex-training/bin/activate

# Install PyTorch with CUDA 12.1
pip install torch torchvision torchaudio \
  --index-url https://download.pytorch.org/whl/cu121

# Install training dependencies (quoted so ">=" isn't treated as a shell redirect)
pip install "transformers>=4.40.0" "accelerate>=0.27.0" \
  "datasets>=2.18.0" "peft>=0.10.0" "bitsandbytes>=0.43.0" \
  "trl>=0.8.0" "wandb>=0.16.0"

# Configure credentials (from Terraform variables)
wandb login "$WANDB_API_KEY"
huggingface-cli login --token "$HF_TOKEN"
```
Quick Launch Scripts
For those who prefer shell scripts over Terraform:
```bash
# Launch a spot instance
./launch_instance.sh

# SSH once ready (~3-5 minutes)
ssh -i ~/.ssh/<your-private-key>.pem ubuntu@<instance-ip>

# When finished
./terminate_instance.sh
```
The Training Pipeline
Training proceeds through five stages, each producing a LoRA adapter that's merged into the final model:
| Stage | Purpose | Training Time | Dataset Size |
|---|---|---|---|
| 1. Context Protocol | Simplex memory format understanding | 4-6 hours | 100K examples |
| 2. Confidence Calibration | Well-calibrated confidence outputs | 2-4 hours | 50K examples |
| 3. Belief Revision | Updating beliefs on new evidence | 2-4 hours | 30K examples |
| 4. Neural IR/Gates | Soft logic, gradient-aware outputs | 2-3 hours | 25K examples |
| 5. Specialists | Domain-specific LoRA adapters | 30-60 min each | 10-50K per domain |
Stage 1: Context Protocol Training
The context protocol teaches the model Simplex's memory format. The model learns to parse and generate responses using episodic, semantic, procedural, and working memory contexts:
```text
# Example training prompt
<context>
  <episodic>User asked about Rust async patterns yesterday</episodic>
  <semantic>Rust uses async/await with the tokio runtime</semantic>
  <belief confidence="0.85">User prefers concrete examples</belief>
</context>
Query: How do I handle multiple futures?

# Expected output references context appropriately
Response: Building on our async discussion, here's a concrete example
using tokio::join! to await multiple futures concurrently...
```
The training script generates 100K synthetic examples covering all context types and threshold levels (30% Anima, 50% Hive, 70% Divine).
Stage 2: Confidence Calibration
Simplex requires well-calibrated confidence scores. A model claiming 90% confidence should be correct 90% of the time. We train on four data distributions:
- 35% Factual QA: High confidence (0.97-0.99) on verifiable facts
- 30% Ambiguous: Medium confidence (0.45-0.65) on uncertain questions
- 20% Unknowable: Low confidence (0.05-0.15) on impossible-to-know queries
- 15% Threshold: Comparison and boundary cases
Target metric: Expected Calibration Error (ECE) < 0.05.
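As a reference point, here is a minimal sketch of how ECE can be computed with equal-width confidence bins; the bin count and input format are assumptions, not the project's exact evaluation code.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin-weighted average gap between mean confidence and accuracy."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap  # weight by fraction of samples in bin
    return ece

# Toy example; the stage 2 target is ECE < 0.05 on the validation set.
print(expected_calibration_error([0.9, 0.9, 0.6, 0.1], [1, 1, 1, 0]))
```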
Stage 3: Belief Revision
The model learns to update confidence appropriately when presented with new evidence:
```text
# Training example
Initial belief: "The meeting is at 3pm" (confidence: 0.75)
Evidence: "Calendar shows meeting moved to 4pm"
Evidence strength: strong_confirm

Expected output:
Updated belief: "The meeting is at 4pm" (confidence: 0.92)
Reasoning: Calendar is authoritative source, directly contradicts
prior belief about time while confirming meeting exists.
```
Evidence categories range from strong_confirm (raising confidence by 15-30 percentage points) to strong_contradict (lowering it by 15-30 points), with the model learning the appropriate magnitude of update for each.
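For intuition, here is a small sketch of the update rule these categories imply. Only strong_confirm and strong_contradict are named above; the intermediate categories, the exact deltas (read as additive probability points), the clamping bounds, and the weight parameter are assumptions for illustration.

```python
# Assumed delta ranges per evidence category: (mild update, full update).
EVIDENCE_DELTAS = {
    "strong_confirm":    (0.15, 0.30),
    "weak_confirm":      (0.05, 0.15),
    "neutral":           (0.00, 0.00),
    "weak_contradict":   (-0.05, -0.15),
    "strong_contradict": (-0.15, -0.30),
}

def revise_confidence(prior: float, strength: str, weight: float = 0.5) -> float:
    """Shift confidence within the category's range, clamped to [0.05, 0.99].

    `weight` in [0, 1] scales how far into the range the update lands,
    e.g. how directly the evidence bears on the belief.
    """
    mild, full = EVIDENCE_DELTAS[strength]
    delta = mild + (full - mild) * weight
    return min(0.99, max(0.05, prior + delta))

print(revise_confidence(0.75, "strong_confirm", weight=0.2))  # ~0.93
```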
Stage 5: Specialist Adapters
The specialist catalog defines 52 domain-specific LoRA adapters:
| Category | Specialists | LoRA Rank |
|---|---|---|
| Document Processing | Invoice, receipt, contract, form extraction | r=16 |
| Coding | Generation, review, debugging, SQL, API integration | r=32 |
| Reasoning | Math, logic, commonsense, causal | r=16 |
| Sentiment | Classification, aspect-based, opinion mining | r=8 |
| Finance | Analysis, sentiment, risk assessment | r=16 |
| Legal | Contract review, compliance, analysis | r=16 |
All training datasets use verified open-source licenses (MIT, Apache 2.0, CC BY, CC0) ensuring commercial viability.
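Conceptually, the catalog reduces to a mapping from specialist name to dataset sources and LoRA hyperparameters. A couple of hypothetical entries; the field names and dataset placeholders below are illustrative, not the actual schema.

```python
# Hypothetical catalog entries; field names and dataset IDs are illustrative.
SPECIALIST_CATALOG = {
    "invoice_processing": {
        "category": "document_processing",
        "lora_rank": 16,
        "lora_alpha": 32,
        "datasets": ["<open-licensed-invoice-dataset>"],
        "max_examples": 50_000,
    },
    "sql_generation": {
        "category": "coding",
        "lora_rank": 32,
        "lora_alpha": 64,
        "datasets": ["<open-licensed-text-to-sql-dataset>"],
        "max_examples": 25_000,
    },
}
```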
The Fine-Tuning Scripts
Each training stage has a dedicated Python script with a consistent interface:
```bash
# Run all stages sequentially
python scripts/run_full_training.py --all

# Run specific stage
python scripts/run_full_training.py --stage 2

# Local test mode (CPU, small dataset)
python scripts/train_context_protocol.py --local-test --generate-data

# Train specific specialist
python scripts/train_specialists.py --specialist invoice_processing

# List available specialists
python scripts/train_specialists.py --list
```
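The shared interface is plain argument parsing; here is a rough sketch of the flags used above, assuming argparse (an illustration of the consistent CLI surface, not the exact scripts).

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Flags shared by the per-stage training scripts (illustrative)."""
    parser = argparse.ArgumentParser(description="Simplex specialist training")
    parser.add_argument("--stage", type=int, choices=range(1, 6),
                        help="Run a single pipeline stage (1-5)")
    parser.add_argument("--all", action="store_true",
                        help="Run every stage sequentially")
    parser.add_argument("--specialist", type=str,
                        help="Train a single specialist from the catalog")
    parser.add_argument("--list", action="store_true",
                        help="List available specialists and exit")
    parser.add_argument("--local-test", action="store_true",
                        help="CPU-only smoke test on a small dataset")
    parser.add_argument("--generate-data", action="store_true",
                        help="Regenerate synthetic training data first")
    return parser

if __name__ == "__main__":
    args = build_parser().parse_args()
```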
Core Training Configuration
```yaml
# training_config.yaml
model:
  name: Qwen/Qwen3-8B
  torch_dtype: bfloat16
  trust_remote_code: true

lora:
  r: 16
  lora_alpha: 32
  target_modules: [q_proj, v_proj]
  lora_dropout: 0.05

training:
  num_train_epochs: 3
  learning_rate: 2.0e-4
  per_device_train_batch_size: 4
  gradient_accumulation_steps: 4
  optim: adamw_8bit
  bf16: true
  gradient_checkpointing: true

data:
  max_seq_length: 4096
  train_split: 0.9
  seed: 42
```
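Under the hood this YAML maps onto standard Transformers and PEFT objects. Here is a sketch of that wiring, assuming the config has been loaded into a dict named cfg; the project's actual script structure may differ.

```python
# Sketch: how training_config.yaml maps onto Transformers/PEFT objects.
import torch
import yaml
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

with open("training_config.yaml") as f:
    cfg = yaml.safe_load(f)

model = AutoModelForCausalLM.from_pretrained(
    cfg["model"]["name"],
    torch_dtype=torch.bfloat16,
    trust_remote_code=cfg["model"]["trust_remote_code"],
)
tokenizer = AutoTokenizer.from_pretrained(cfg["model"]["name"], trust_remote_code=True)

lora_config = LoraConfig(
    r=cfg["lora"]["r"],
    lora_alpha=cfg["lora"]["lora_alpha"],
    target_modules=cfg["lora"]["target_modules"],
    lora_dropout=cfg["lora"]["lora_dropout"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # sanity check: only adapter weights train
```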
Synthetic Data Generation
Training data is generated programmatically to ensure coverage of edge cases:
```python
import random

def generate_context_example():
    """Generate a single context protocol training example."""
    # Helper generators referenced here are defined elsewhere in the script.
    context_type = random.choice(['episodic', 'semantic', 'belief', 'hive'])

    if context_type == 'episodic':
        memory = generate_episodic_memory()
        query = generate_related_query(memory)
        response = generate_contextual_response(memory, query)
    elif context_type == 'belief':
        belief, confidence = generate_belief_with_confidence()
        threshold = random.choice([0.3, 0.5, 0.7])
        # ... generate threshold-aware response
    # ... remaining context types handled analogously

    return format_training_example(context_type, query, response)
```
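Producing the full dataset is then a loop that serializes examples as JSON lines; a small sketch (the file path and count handling are assumptions).

```python
import json

def write_dataset(path: str, n_examples: int = 100_000) -> None:
    """Write n synthetic context-protocol examples as JSON lines."""
    with open(path, "w") as f:
        for _ in range(n_examples):
            example = generate_context_example()
            f.write(json.dumps(example) + "\n")

# write_dataset("data/context_protocol_train.jsonl")
```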
Validation Pipeline
Before training all 52 specialists, we validate the approach with three pilot tasks:
| Pilot Task | Difficulty | Baseline | Target |
|---|---|---|---|
| Sentiment Analysis | Easy (classification) | 70-75% | 85%+ |
| SQL Generation | Medium (structured output) | 30-40% | 60%+ |
| Invoice Extraction | Hard (information extraction) | 40-50% | 75%+ |
```bash
# Run pilot validation
cd validation
./run_pilots.sh --local-test   # CPU-only, ~30 min
./run_pilots.sh --full         # Full GPU training + evaluation
```
Export to Production
Trained adapters export to GGUF format for deployment with Ollama:
```bash
python scripts/export_to_gguf.py \
  --adapter-path outputs/context_protocol/final \
  --quantization q4_k_m
# Creates: exports/simplex-cognitive-8b-q4_k_m.gguf
# Plus:    exports/Modelfile

# Deploy to Ollama
ollama create simplex-cognitive-8b -f exports/Modelfile
ollama run simplex-cognitive-8b
```
The Modelfile includes the system prompt with Simplex-specific instructions for confidence calibration, memory context handling, and threshold awareness.
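Conceptually, the export first merges the LoRA weights back into the base model, then hands the merged checkpoint to llama.cpp tooling for GGUF conversion and quantization. A rough sketch of the merge step using PEFT's merge_and_unload; this is an assumed outline, not the exact export_to_gguf.py implementation.

```python
# Sketch of the merge step behind export_to_gguf.py (assumed, not the exact script).
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-8B", torch_dtype=torch.bfloat16, trust_remote_code=True
)
merged = PeftModel.from_pretrained(base, "outputs/context_protocol/final").merge_and_unload()
merged.save_pretrained("exports/merged-hf")

# The merged Hugging Face checkpoint is then converted and quantized with
# llama.cpp tooling (e.g. its HF-to-GGUF convert script plus llama-quantize)
# to produce simplex-cognitive-8b-q4_k_m.gguf for Ollama.
```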
Why Python? And the Path to Pure Simplex
The training pipeline is written in Python. This deserves explanation given that Simplex is its own language.
Pragmatic Reasons for Python
- Ecosystem maturity: PyTorch, Transformers, PEFT, and the entire ML stack are Python-native. Fighting this adds friction without benefit.
- Rapid iteration: Training experiments need quick turnaround. Python's REPL and debugging tools accelerate development.
- Community resources: Documentation, tutorials, and troubleshooting are overwhelmingly Python-centric.
- Contributor accessibility: Most ML practitioners know Python. Lowering the barrier to contribution matters for an open-source project.
The Honest Assessment
Writing training scripts in Simplex today would mean building FFI bindings to PyTorch, reimplementing data loading pipelines, and debugging at the intersection of two complex systems. The cognitive overhead isn't justified when Python scripts work reliably and cost nothing extra to run.
The Roadmap to Pure Simplex
That said, "training infrastructure in Simplex" is explicitly on our roadmap. The path:
- Phase 1 (Current): Python training scripts, Simplex runtime consumes trained models
- Phase 2: Simplex FFI bindings to PyTorch C++ API (libtorch)
- Phase 3: Native tensor operations in Simplex compiler (post Neural IR)
- Phase 4: Self-hosted training—Simplex trains its own specialists
Phase 4 is the goal: a Simplex program that generates training data, fine-tunes adapters, and validates results—all in pure Simplex. This becomes viable after Neural IR lands (see Neural IR roadmap), when the compiler can emit differentiable computation graphs.
Until then, Python training scripts are a pragmatic bridge. They're isolated from the runtime, well-tested, and produce artifacts (GGUF files) that the Simplex runtime consumes cleanly.
Running the Full Pipeline
End-to-end execution:
```bash
# 1. Provision infrastructure
cd infrastructure
terraform init
terraform apply -var="use_spot_instance=true"

# 2. Wait for instance (~5 min), SSH in
ssh -i ~/.ssh/<your-private-key>.pem ubuntu@$(terraform output -raw public_ip)

# 3. Activate environment (auto-configured by user_data.sh)
source ~/start-training.sh

# 4. Run full pipeline
python scripts/run_full_training.py --all

# 5. Monitor progress
nvidia-smi -l 5              # GPU utilization
tail -f outputs/training.log

# 6. Export trained model
python scripts/export_to_gguf.py --adapter-path outputs/final

# 7. Download artifacts and terminate
scp -r ubuntu@<instance-ip>:~/simplex-training/exports ./
terraform destroy
```
Total wall time: 10-14 hours. Total cost: $8-14 on spot instances.
What's Next
The training infrastructure is functional and documented. Immediate next steps:
- Pilot validation: Run the three-task validation to confirm approach before full specialist training
- Continuous training: GitHub Actions workflow for automated retraining on dataset updates
- Adapter registry: Public repository of pre-trained specialist adapters
- Simplex FFI: Begin Phase 2 bindings to libtorch for future native training
The full training code is available in the Simplex repository under /training. Contributions welcome—especially for new specialist domains and dataset curation.