Autonomous CTA Strategy Research Platform — Powered by LLM Agents
Architecture · Research Loop · Backtest Engine · Getting Started
AuraQuant is an agentic-first quantitative research platform for CTA (trend-following) strategies. Instead of using LLMs as code-completion assistants, AuraQuant puts an AI Agent in the role of Principal Researcher — autonomously proposing hypotheses, designing experiments, evaluating results, and managing strategy lifecycle.
The core idea: define constraints and goals; let the agent decide the research path.
- **Human defines:** "Find a robust trend-following strategy on BTC/ETH 1h, Sharpe ≥ 1.2, MaxDD ≤ -15%"
- **Agent executes:** hypothesis → spec → backtest → evaluate → iterate → archive → approve
- **Human reviews:** only at the final gate (frozen → production)
- **System state is explicit.** Research state lives in typed objects such as `Spec`, `Run`, `Report`, `ArchiveRecord`, `RunTrace`, `DataVersion`, `DataSlice`, and `FeatureSlice`, rather than being inferred from chat history.
- **Policy and execution are separated.** Agent behavior is layered across `system_prompt.md`, `AGENTS.md`, `skills/`, and the stable tool/runtime surface, so research policy can change without rewriting the execution kernel each time.
- **Strategy modules have hard boundaries.** `Baseline`, `Filter`, `Execution`, and `Risk` are treated as separate responsibilities, which keeps entry logic, admission logic, position management, and risk constraints from collapsing into one opaque module.
- **Data semantics are versioned and traceable.** `DataVersion`, `DataSlice`, and `FeatureSlice` carry provenance, and multi-timeframe rules use closed bars only, so experiments are easier to replay and compare.
- **Governance is part of the runtime model.** `ArchiveRecord` stores lineage and lifecycle state, which lets research runs move through candidate and frozen states automatically while keeping promotion to production behind a human gate.
- **Long sessions preserve structured state, not raw logs.** Compaction and trace artifacts are designed to retain the identifiers, decisions, failures, and next actions needed to resume research after many iterations.
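The typed-state idea can be sketched with plain dataclasses. All field names below are illustrative assumptions, not AuraQuant's actual schemas:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of typed research-state objects; the real models
# are Pydantic schemas with more fields and validation.

@dataclass(frozen=True)
class DataSlice:
    symbol: str
    timeframe: str
    data_version: str          # provenance: which DataVersion produced this slice
    start: str
    end: str

@dataclass
class Spec:
    spec_id: str
    baseline: str              # e.g. "ema_cross"
    filters: list = field(default_factory=list)
    execution: str = "bar_close"
    risk: str = "fixed_fraction"

@dataclass
class ArchiveRecord:
    spec_id: str
    run_id: str
    lifecycle: str = "candidate"   # candidate → frozen → production

spec = Spec(spec_id="spec-001", baseline="ema_cross")
rec = ArchiveRecord(spec_id=spec.spec_id, run_id="run-042")
print(rec.lifecycle)  # → candidate
```

Because state lives in objects like these rather than in chat history, a resumed session can reconstruct exactly where research left off.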
AuraQuant's behavior is governed by a four-layer control stack — upper layers evolve with research goals, lower layers remain stable:
```text
┌─────────────────────────────────────────────────────────┐
│ system_prompt.md    Role definition + invariant rules   │ ← changes rarely
├─────────────────────────────────────────────────────────┤
│ AGENTS.md           Project contracts + module bounds   │
├─────────────────────────────────────────────────────────┤
│ skills/             5 domain operators (data/strategy/  │ ← changes per
│                     experiment/governance/orchestrator) │   research phase
├─────────────────────────────────────────────────────────┤
│ aura_tools.py       Hard-contract tool surface with     │ ← stable
│ + backtest engine   pre-flight checks & typed I/O       │   infrastructure
└─────────────────────────────────────────────────────────┘
```
```mermaid
graph TB
    subgraph "Agent Layer"
        SP["System Prompt"]
        AG["AGENTS.md<br/>Project Contract"]
        SK["Skills<br/>5 Domain Operators"]
    end
    subgraph "Runtime Kernel — aura/"
        ENG["Async Agent Engine<br/>3,500 LOC · streaming · tool DAG"]
        CMP["Context Compaction<br/>token budgeting · structured snapshots"]
        LLM["Multi-Provider LLM<br/>Anthropic · OpenAI · Gemini · Codex"]
    end
    subgraph "CTA Infrastructure — auraquant/cta/"
        TL["Tool Surface<br/>50+ typed tools · pre-flight validation"]
        BT["Backtest Engine<br/>deterministic · realistic fills"]
        EXP["Experiment Runner<br/>walk-forward · parallel windows"]
        GOV["Governance<br/>archive · approval · lifecycle"]
    end
    SP --> ENG
    AG --> ENG
    SK --> ENG
    ENG --> CMP
    ENG --> LLM
    ENG --> TL
    TL --> BT
    TL --> EXP
    TL --> GOV
    style ENG fill:#1a1a2e,stroke:#4a9eff,color:#e0e0e0
    style CMP fill:#1a1a2e,stroke:#4a9eff,color:#e0e0e0
    style LLM fill:#1a1a2e,stroke:#4a9eff,color:#e0e0e0
    style TL fill:#16213e,stroke:#0f3460,color:#e0e0e0
    style BT fill:#16213e,stroke:#0f3460,color:#e0e0e0
    style EXP fill:#16213e,stroke:#0f3460,color:#e0e0e0
    style GOV fill:#16213e,stroke:#e0a020,color:#e0e0e0
```
Each research session progresses through four phases, with the agent autonomously routing to the most urgent gap:
```mermaid
flowchart LR
    D["Data<br/>Versioning · DataSlice<br/>Feature engineering"]
    S["Strategy<br/>Spec definition<br/>Module assembly"]
    E["Experiment<br/>Walk-forward<br/>Optuna param search"]
    G["Governance<br/>Archive · Freeze<br/>Human approval"]
    D --> S --> E --> G
    E -- "hypothesis falsified" --> S
    G -- "new direction" --> S
    style D fill:#1a1a2e,stroke:#4a9eff,color:#e0e0e0
    style S fill:#1a1a2e,stroke:#4a9eff,color:#e0e0e0
    style E fill:#1a1a2e,stroke:#4a9eff,color:#e0e0e0
    style G fill:#1a1a2e,stroke:#e0a020,color:#e0e0e0
```
Phase dependencies are hard-enforced: no strategy without qualified data; no experiment without a persisted Spec. The agent identifies the earliest gap and routes accordingly.
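"Route to the earliest gap" can be sketched as a first-unmet-precondition scan over the fixed phase order. The predicate names below are hypothetical, not the platform's actual state keys:

```python
# Illustrative sketch: phase order is fixed, and the agent works on the
# first phase whose precondition is not yet satisfied.

PHASES = ["data", "strategy", "experiment", "governance"]

def next_phase(state: dict) -> str:
    preconditions = {
        "data": state.get("data_slice_ready", False),       # qualified data exists
        "strategy": state.get("spec_persisted", False),     # Spec is persisted
        "experiment": state.get("experiment_complete", False),
        "governance": state.get("archived", False),
    }
    for phase in PHASES:
        if not preconditions[phase]:
            return phase
    return "done"

print(next_phase({"data_slice_ready": True}))  # → strategy
```

With no qualified data at all, the same scan returns `data`, which is exactly the hard-enforced dependency the text describes.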
The agent's autonomous research follows a structured anti-stagnation protocol:
| Dimension | Trigger | Action |
|---|---|---|
| Parameter | Default starting point | Search optimal params within current signal (Optuna / grid) |
| Structure | Params converge but execution poor | Swap Filter / Execution / Risk combinations |
| Hypothesis | Structure space exhausted | Change alpha hypothesis, switch entry logic |
| Induction | Forced every 10 rounds | Summarize falsified hypotheses, output next directions |
| Research | 3 consecutive dead-ends | Import external CTA strategy ideas |
The induction layer is the critical design element: forced summarization every 10 rounds prevents the agent from burning its budget on a dead-end direction.
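The escalation ladder in the table above can be sketched as a priority function. Thresholds and trigger names are assumptions for illustration, not the actual protocol implementation:

```python
# Hedged sketch of the anti-stagnation ladder: induction is forced on a
# fixed cadence, then escalation triggers are checked from broadest to
# narrowest, falling back to plain parameter search.

def next_move(round_no: int, params_converged: bool,
              structures_exhausted: bool, dead_ends: int) -> str:
    if round_no % 10 == 0:
        return "induction"          # forced summary every 10 rounds
    if dead_ends >= 3:
        return "research"           # import external CTA strategy ideas
    if structures_exhausted:
        return "hypothesis"         # change the alpha hypothesis
    if params_converged:
        return "structure"          # swap Filter/Execution/Risk combos
    return "parameter"              # keep searching params (Optuna/grid)

print(next_move(7, params_converged=True,
                structures_exhausted=False, dead_ends=0))  # → structure
```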
When given convergence targets, the agent enters self-directed mode with structured iteration state:
```text
<ITER_STATE>
iter: 3 / 200
goal: BTC 1h · Sharpe ≥ 1.2 · max_dd ≤ -15%
tried:
  - ema_cross(fast=8, slow=21) → Sharpe=0.74 · dd=-19.2%
  - trendflex(lb=20) + atr_filter → Sharpe=0.91 · dd=-17.3%
next_action: Try momentum breakout with vol regime filter
status: CONTINUING
</ITER_STATE>
```

Termination conditions: goal achieved · 5 rounds of Sharpe delta < 0.05 (stagnation) · budget exhausted · structural gap · approval gate · user interrupt.
Every CTA strategy is decomposed into four orthogonal, swappable modules using Python Protocols:
```mermaid
flowchart TD
    Bar["Each Bar (OHLCV)"] --> BL
    subgraph Strategy["Strategy Modules (Protocol-based plugins)"]
        BL["Baseline<br/>Generate entry candidates<br/>O(1) per-bar · incremental cache"]
        FL["Filter<br/>Discrete pass/reject<br/>FilterDecision audit trail"]
        EX["Execution<br/>Post-entry state machine<br/>Trailing stop · fixed SL/TP"]
        RM["Risk<br/>Final approval + position sizing<br/>RiskDecision records"]
        BL --> FL --> EX --> RM
    end
    RM --> EN["Backtest Engine<br/>Fills · Positions · Equity Curve"]
```
Why this decomposition matters:
| Benefit | Example |
|---|---|
| Composable | Replace Execution module; Baseline stays untouched |
| Auditable | Every filter decision recorded as structured FilterDecision |
| A/B testable | Same Baseline + different Risk modules → directly comparable walk-forward results |
| Type-safe | Pydantic model coercion on all module outputs |
The deterministic backtest engine supports:

| Feature | Detail |
|---|---|
| Order Execution | Realistic fill timing (entry/exit), intrabar vs. bar-close modes |
| Position Model | Long/short, reversals, same-bar conflict resolution |
| Cost Modeling | Commission + slippage, configurable per-side application |
| Price Triggers | Trailing stop, take-profit, stop-loss with deterministic evaluation |
| Walk-Forward | Configurable train/validation/target window splits |
| Parallel Execution | ThreadPoolExecutor with CPU-aware worker scaling |
| Parameter Search | Optuna integration, 50+ trial batches per window |
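The train/validation/target window mechanics can be illustrated with a small generator. Window lengths and the step-by-target convention below are arbitrary examples, not the engine's defaults:

```python
# Sketch of walk-forward window generation: each window contributes
# non-overlapping target (out-of-sample) bars, advancing by the target size.

def walk_forward(n_bars: int, train: int, valid: int, target: int):
    """Yield (train, valid, target) half-open index ranges."""
    start = 0
    while start + train + valid + target <= n_bars:
        yield (
            (start, start + train),
            (start + train, start + train + valid),
            (start + train + valid, start + train + valid + target),
        )
        start += target

windows = list(walk_forward(n_bars=1000, train=500, valid=100, target=100))
print(len(windows), windows[0])
# → 4 ((0, 500), (500, 600), (600, 700))
```

Because the windows are independent, each one can be searched in parallel (the ThreadPoolExecutor row above) and its best parameters applied only to its own target range.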
Each experiment produces structured, machine-readable artifacts:
- Equity Curve: NAV series with initial capital normalization
- Trade List: Entry/exit timestamps, quantities, PnL per trade
- Step Summary: Baseline entries, filter decisions, risk approvals per bar
- Window Selection: Parameter assignments per walk-forward window
- EvalReport: Sharpe, max drawdown, win rate, profit factor, and more
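Two of the EvalReport metrics can be computed directly from the normalized NAV series. This is a generic sketch, not the engine's implementation; the annualization factor assumes 1h bars:

```python
import math

# Illustrative Sharpe and max-drawdown computations over a NAV series.

def sharpe(nav: list[float], bars_per_year: int = 24 * 365) -> float:
    rets = [nav[i] / nav[i - 1] - 1 for i in range(1, len(nav))]
    mean = sum(rets) / len(rets)
    var = sum((r - mean) ** 2 for r in rets) / len(rets)
    return mean / math.sqrt(var) * math.sqrt(bars_per_year) if var else 0.0

def max_drawdown(nav: list[float]) -> float:
    """Most negative peak-to-trough return, as a fraction (e.g. -0.15)."""
    peak, mdd = nav[0], 0.0
    for v in nav:
        peak = max(peak, v)
        mdd = min(mdd, v / peak - 1)
    return mdd

nav = [1.00, 1.02, 1.01, 1.05, 0.97, 1.03]
print(round(max_drawdown(nav), 4))  # → -0.0762
```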
Long-running research sessions (100+ experiments, hours of continuous operation) require principled context management. AuraQuant implements structured XML snapshot compaction:
```text
Context Window (~80% utilization threshold)
├── System Prompt + Skills       ← always retained
├── Compacted Summary            ← structured XML snapshot
│   ├── <overall_goal>           ← research objective
│   ├── <key_knowledge>          ← falsified hypotheses, discovered patterns
│   ├── <contract_memory>        ← error patterns, constraint violations
│   └── <spec_workflow_state>    ← current experiment state
└── Recent Messages (budget: 20% of window) ← last N messages, token-counted
```
Key design decisions:
- Per-provider token budgeting (different limits for Anthropic vs. Codex)
- Minimum 2-message retention floor (prevents losing all recent context)
- Auto-compact at 80% of context window
- `<AUTO_CONTINUE>` signal for seamless session resumption after compaction
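The trigger and retention rules above can be sketched as a pure planning function. Token counts are illustrative; real counting is per-provider:

```python
# Hypothetical compaction planner: fire at 80% window utilization, keep
# the newest messages within a 20% budget, never fewer than 2 messages.

def plan_compaction(msg_tokens: list[int], window: int,
                    threshold: float = 0.8, recent_budget: float = 0.2):
    total = sum(msg_tokens)
    if total < threshold * window:
        return None                                # no compaction needed yet
    budget = recent_budget * window
    keep, used = [], 0
    for i in range(len(msg_tokens) - 1, -1, -1):   # walk newest → oldest
        if used + msg_tokens[i] > budget and len(keep) >= 2:
            break                                  # budget spent, floor met
        keep.append(i)
        used += msg_tokens[i]
    return sorted(keep)                            # indices to retain verbatim

print(plan_compaction([4000] * 25, window=100_000))
# → [20, 21, 22, 23, 24]
```

Everything older than the retained indices would be folded into the structured XML snapshot rather than dropped.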
The 50+ tool registry follows progressive disclosure — SDK knowledge is provided on-demand, not dumped into the prompt:
```text
Agent needs to write a Baseline module
  → cta__module_scaffold("baselines")
  → Returns template with O(1) per-bar caching skeleton

Agent unsure about parameter constraints
  → cta__sdk_reference("baseline")
  → Returns performance tips + common pitfalls

Agent ready to run experiment
  → Pre-flight automatically validates:
      ✓ Spec persisted?
      ✓ Runtime module files exist?
      ✓ Parameter search space configured?
  → Cheap static checks before expensive backtest runs
```
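A minimal sketch of those pre-flight checks, assuming a flat spec dict and the `cta_runtime/` layout; the key names and error strings are hypothetical:

```python
import os

# Cheap static validation before an expensive backtest run: each check is
# a pure lookup or filesystem stat, and all failures are collected at once.

def preflight(spec: dict, runtime_dir: str = "cta_runtime") -> list[str]:
    errors = []
    if not spec.get("persisted"):
        errors.append("spec not persisted")
    module = spec.get("baseline_module", "")
    path = os.path.join(runtime_dir, "baselines", f"{module}.py")
    if not os.path.isfile(path):
        errors.append(f"missing module file: {path}")
    if not spec.get("search_space"):
        errors.append("parameter search space not configured")
    return errors

errs = preflight({"persisted": True, "baseline_module": "ema_cross",
                  "search_space": {"fast": [5, 13]}})
print(errs)   # lists the missing module file unless it exists on disk
```

Returning all violations together (rather than failing on the first) lets the agent repair several gaps in a single iteration.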
| Category | Count | Examples |
|---|---|---|
| Data | 3 | data_version_list, data_slice_create, feature_slice_create |
| Strategy | 8 | module_scaffold, spec_put, sdk_reference, spec_schema |
| Experiment | 5 | experiment_run, experiment_wait, evaluation_show, evaluation_compare |
| Governance | 6 | archive_candidate, approval_request, approval_resolve |
| Agent Lifecycle | 3 | agent_state_save, agent_state_restore, run_trace_query |
Research and deployment are cleanly separated — the agent operates autonomously during research, with a single human gate before production:
```mermaid
stateDiagram-v2
    [*] --> candidate: Metrics exceed threshold → auto-archive
    candidate --> frozen: Research converges → auto-freeze
    candidate --> [*]: Below threshold → discard
    frozen --> production: Human approval
    frozen --> deprecated: Human rejection
    production --> deprecated: Decommission
    note right of frozen
        Single human gate
        Everything before is autonomous
    end note
```
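The lifecycle can be sketched as a small transition table with the human gate enforced in code. Event and actor names are illustrative assumptions:

```python
# Hypothetical lifecycle state machine mirroring the diagram above; the
# approve/reject events are the single human-only gate.

TRANSITIONS = {
    ("candidate", "auto_freeze"): "frozen",
    ("candidate", "discard"): "discarded",
    ("frozen", "approve"): "production",     # human-only
    ("frozen", "reject"): "deprecated",      # human-only
    ("production", "decommission"): "deprecated",
}
HUMAN_ONLY = {"approve", "reject"}

def transition(state: str, event: str, actor: str) -> str:
    if event in HUMAN_ONLY and actor != "human":
        raise PermissionError(f"{event} requires human approval")
    nxt = TRANSITIONS.get((state, event))
    if nxt is None:
        raise ValueError(f"illegal transition: {state} --{event}-->")
    return nxt

print(transition("candidate", "auto_freeze", actor="agent"))  # → frozen
```

Encoding the gate in the transition function (rather than in prompt text) means an agent cannot promote a strategy to production even if its instructions drift.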
| Layer | Technologies |
|---|---|
| Runtime | Python 3.10+, AsyncIO, Pydantic 2.x, dataclasses |
| LLM | Multi-provider (Anthropic, OpenAI, Gemini, Codex), streaming, token budgeting |
| Data | Pandas 2.x, DuckDB, PyArrow, CCXT (multi-exchange) |
| Optimization | Optuna (hyperparameter search) |
| Indicators | pandas-ta |
| CLI/UX | Rich, structured JSON output |
| Testing | pytest; 39 test files, ~9,700 LOC of tests |
```bash
# Install dependencies
pip install -r requirements.txt

# Configure LLM provider
mkdir -p .aura/config
cp aura/config/models.json.example .aura/config/models.json
# Edit .aura/config/models.json — add your API key

# Initialize workspace and verify
python -m auraquant init .
pytest -q tests/integration

# Launch the agent research loop
python -m auraquant chat
```

```text
AuraQuant/
├── aura/                         # LLM Agent runtime kernel
│   └── runtime/
│       ├── engine_agno_async.py  # Async agent loop (3,500 LOC)
│       ├── compaction.py         # Context compaction & token budgeting
│       └── prompts/              # Compaction prompt templates
├── auraquant/cta/                # CTA domain infrastructure
│   ├── aura_tools.py             # Tool surface (50+ tools, 2,300 LOC)
│   ├── cli.py                    # CLI entry point
│   └── backtest/
│       ├── engine.py             # Deterministic backtest engine (1,200 LOC)
│       ├── experiment.py         # Walk-forward & parallel execution
│       └── runtime.py            # Protocol-based strategy runner
├── agent/system_prompt.md        # Principal Research Agent prompt
├── AGENTS.md                     # Project contract: module boundaries
├── skills/                       # Five domain operators
│   ├── cta-data-operator/
│   ├── cta-strategy-operator/
│   ├── cta-experiment-operator/
│   ├── cta-governance-operator/
│   └── cta-principal-research-loop/
├── cta_runtime/                  # User-written strategy modules
│   ├── baselines/
│   ├── filters/
│   ├── executions/
│   └── risks/
├── tests/                        # 39 test files, ~9,700 LOC
└── README.md                     # Project overview
```
MIT — see LICENSE.