Autonomous CTA Strategy Research Platform — Powered by LLM Agents
Architecture · Research Loop · Backtest Engine · Getting Started
AuraQuant is an agentic-first quantitative research platform for CTA (trend-following) strategies. Instead of using LLMs as code-completion assistants, AuraQuant puts an AI Agent in the role of Principal Researcher — autonomously proposing hypotheses, designing experiments, evaluating results, and managing strategy lifecycle.
The core idea: define constraints and goals; let the agent decide the research path.
- **Human defines:** "Find a robust trend-following strategy on BTC/ETH 1h, Sharpe ≥ 1.2, MaxDD ≤ -15%"
- **Agent executes:** hypothesis → spec → backtest → evaluate → iterate → archive → approve
- **Human reviews:** only at the final gate (frozen → production)
- **System state is explicit.** Research state lives in typed objects such as `Spec`, `Run`, `Report`, `ArchiveRecord`, `RunTrace`, `DataVersion`, `DataSlice`, and `FeatureSlice`, rather than being inferred from chat history.
- **Policy and execution are separated.** Agent behavior is layered across `system_prompt.md`, `AGENTS.md`, `skills/`, and the stable tool/runtime surface, so research policy can change without rewriting the execution kernel each time.
- **Strategy modules have hard boundaries.** `Baseline`, `Filter`, `Execution`, and `Risk` are treated as separate responsibilities, which keeps entry logic, admission logic, position management, and risk constraints from collapsing into one opaque module.
- **Data semantics are versioned and traceable.** `DataVersion`, `DataSlice`, and `FeatureSlice` carry provenance, and multi-timeframe rules use closed bars only, so experiments are easier to replay and compare.
- **Governance is part of the runtime model.** `ArchiveRecord` stores lineage and lifecycle state, which lets research runs move through candidate and frozen states automatically while keeping promotion to production behind a human gate.
- **Long sessions preserve structured state, not raw logs.** Compaction and trace artifacts are designed to retain the identifiers, decisions, failures, and next actions needed to resume research after many iterations.
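The typed-state idea can be sketched with plain dataclasses. All field names below are illustrative assumptions, not AuraQuant's actual schemas:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of typed research-state objects; the real models
# are Pydantic schemas with more fields and validation.

@dataclass(frozen=True)
class DataSlice:
    symbol: str
    timeframe: str
    data_version: str          # provenance: which DataVersion produced this slice
    start: str
    end: str

@dataclass
class Spec:
    spec_id: str
    baseline: str              # e.g. "ema_cross"
    filters: list = field(default_factory=list)
    execution: str = "bar_close"
    risk: str = "fixed_fraction"

@dataclass
class ArchiveRecord:
    spec_id: str
    run_id: str
    lifecycle: str = "candidate"   # candidate → frozen → production

spec = Spec(spec_id="spec-001", baseline="ema_cross")
rec = ArchiveRecord(spec_id=spec.spec_id, run_id="run-042")
print(rec.lifecycle)  # → candidate
```

Because state lives in objects like these rather than in chat history, a resumed session can reconstruct exactly where research left off.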
AuraQuant's behavior is governed by a four-layer control stack — upper layers evolve with research goals, lower layers remain stable:
```text
┌─────────────────────────────────────────────────────────┐
│ system_prompt.md    Role definition + invariant rules   │ ← changes rarely
├─────────────────────────────────────────────────────────┤
│ AGENTS.md           Project contracts + module bounds   │
├─────────────────────────────────────────────────────────┤
│ skills/             5 domain operators (data/strategy/  │ ← changes per
│                     experiment/governance/orchestrator) │   research phase
├─────────────────────────────────────────────────────────┤
│ aura_tools.py       Hard-contract tool surface with     │ ← stable
│ + backtest engine   pre-flight checks & typed I/O       │   infrastructure
└─────────────────────────────────────────────────────────┘
```
```mermaid
graph TB
    subgraph "Agent Layer"
        SP["System Prompt"]
        AG["AGENTS.md<br/>Project Contract"]
        SK["Skills<br/>5 Domain Operators"]
    end
    subgraph "Runtime Kernel — aura/"
        ENG["Async Agent Engine<br/>3,500 LOC · streaming · tool DAG"]
        CMP["Context Compaction<br/>token budgeting · structured snapshots"]
        LLM["Multi-Provider LLM<br/>Anthropic · OpenAI · Gemini · Codex"]
    end
    subgraph "CTA Infrastructure — auraquant/cta/"
        TL["Tool Surface<br/>50+ typed tools · pre-flight validation"]
        BT["Backtest Engine<br/>deterministic · realistic fills"]
        EXP["Experiment Runner<br/>walk-forward · parallel windows"]
        GOV["Governance<br/>archive · approval · lifecycle"]
    end
    SP --> ENG
    AG --> ENG
    SK --> ENG
    ENG --> CMP
    ENG --> LLM
    ENG --> TL
    TL --> BT
    TL --> EXP
    TL --> GOV
    style ENG fill:#1a1a2e,stroke:#4a9eff,color:#e0e0e0
    style CMP fill:#1a1a2e,stroke:#4a9eff,color:#e0e0e0
    style LLM fill:#1a1a2e,stroke:#4a9eff,color:#e0e0e0
    style TL fill:#16213e,stroke:#0f3460,color:#e0e0e0
    style BT fill:#16213e,stroke:#0f3460,color:#e0e0e0
    style EXP fill:#16213e,stroke:#0f3460,color:#e0e0e0
    style GOV fill:#16213e,stroke:#e0a020,color:#e0e0e0
```
Each research session progresses through four phases, with the agent autonomously routing to the most urgent gap:
```mermaid
flowchart LR
    D["Data<br/>Versioning · DataSlice<br/>Feature engineering"]
    S["Strategy<br/>Spec definition<br/>Module assembly"]
    E["Experiment<br/>Walk-forward<br/>Optuna param search"]
    G["Governance<br/>Archive · Freeze<br/>Human approval"]
    D --> S --> E --> G
    E -- "hypothesis falsified" --> S
    G -- "new direction" --> S
    style D fill:#1a1a2e,stroke:#4a9eff,color:#e0e0e0
    style S fill:#1a1a2e,stroke:#4a9eff,color:#e0e0e0
    style E fill:#1a1a2e,stroke:#4a9eff,color:#e0e0e0
    style G fill:#1a1a2e,stroke:#e0a020,color:#e0e0e0
```
Phase dependencies are hard-enforced: no strategy without qualified data; no experiment without a persisted Spec. The agent identifies the earliest gap and routes accordingly.
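"Route to the earliest gap" can be sketched as a first-unmet-precondition scan over the fixed phase order. The predicate names below are hypothetical, not the platform's actual state keys:

```python
# Illustrative sketch: phase order is fixed, and the agent works on the
# first phase whose precondition is not yet satisfied.

PHASES = ["data", "strategy", "experiment", "governance"]

def next_phase(state: dict) -> str:
    preconditions = {
        "data": state.get("data_slice_ready", False),       # qualified data exists
        "strategy": state.get("spec_persisted", False),     # Spec is persisted
        "experiment": state.get("experiment_complete", False),
        "governance": state.get("archived", False),
    }
    for phase in PHASES:
        if not preconditions[phase]:
            return phase
    return "done"

print(next_phase({"data_slice_ready": True}))  # → strategy
```

With no qualified data at all, the same scan returns `data`, which is exactly the hard-enforced dependency the text describes.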
The agent's autonomous research follows a structured anti-stagnation protocol:
| Dimension | Trigger | Action |
|---|---|---|
| Parameter | Default starting point | Search optimal params within current signal (Optuna / grid) |
| Structure | Params converge but execution poor | Swap Filter / Execution / Risk combinations |
| Hypothesis | Structure space exhausted | Change alpha hypothesis, switch entry logic |
| Induction | Forced every 10 rounds | Summarize falsified hypotheses, output next directions |
| Research | 3 consecutive dead-ends | Import external CTA strategy ideas |
The induction layer is the critical design element: forced summarization every 10 rounds prevents the agent from burning its budget on a dead-end direction.
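The escalation ladder in the table above can be sketched as a priority function. Thresholds and trigger names are assumptions for illustration, not the actual protocol implementation:

```python
# Hedged sketch of the anti-stagnation ladder: induction is forced on a
# fixed cadence, then escalation triggers are checked from broadest to
# narrowest, falling back to plain parameter search.

def next_move(round_no: int, params_converged: bool,
              structures_exhausted: bool, dead_ends: int) -> str:
    if round_no % 10 == 0:
        return "induction"          # forced summary every 10 rounds
    if dead_ends >= 3:
        return "research"           # import external CTA strategy ideas
    if structures_exhausted:
        return "hypothesis"         # change the alpha hypothesis
    if params_converged:
        return "structure"          # swap Filter/Execution/Risk combos
    return "parameter"              # keep searching params (Optuna/grid)

print(next_move(7, params_converged=True,
                structures_exhausted=False, dead_ends=0))  # → structure
```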
When given convergence targets, the agent enters self-directed mode with structured iteration state:
```text
<ITER_STATE>
iter: 3 / 200
goal: BTC 1h · Sharpe ≥ 1.2 · max_dd ≤ -15%
tried:
  - ema_cross(fast=8, slow=21) → Sharpe=0.74 · dd=-19.2%
  - trendflex(lb=20) + atr_filter → Sharpe=0.91 · dd=-17.3%
next_action: Try momentum breakout with vol regime filter
status: CONTINUING
</ITER_STATE>
```

Termination conditions: goal achieved · 5 rounds of Sharpe delta < 0.05 (stagnation) · budget exhausted · structural gap · approval gate · user interrupt.
Every CTA strategy is decomposed into four orthogonal, swappable modules using Python Protocols:
```mermaid
flowchart TD
    Bar["Each Bar (OHLCV)"] --> BL
    subgraph Strategy["Strategy Modules (Protocol-based plugins)"]
        BL["Baseline<br/>Generate entry candidates<br/>O(1) per-bar · incremental cache"]
        FL["Filter<br/>Discrete pass/reject<br/>FilterDecision audit trail"]
        EX["Execution<br/>Post-entry state machine<br/>Trailing stop · fixed SL/TP"]
        RM["Risk<br/>Final approval + position sizing<br/>RiskDecision records"]
        BL --> FL --> EX --> RM
    end
    RM --> EN["Backtest Engine<br/>Fills · Positions · Equity Curve"]
```
Why this decomposition matters:
| Benefit | Example |
|---|---|
| Composable | Replace Execution module; Baseline stays untouched |
| Auditable | Every filter decision recorded as structured FilterDecision |
| A/B testable | Same Baseline + different Risk modules → directly comparable walk-forward results |
| Type-safe | Pydantic model coercion on all module outputs |
The deterministic backtest engine supports:

| Feature | Detail |
|---|---|
| Order Execution | Realistic fill timing (entry/exit), intrabar vs. bar-close modes |
| Position Model | Long/short, reversals, same-bar conflict resolution |
| Cost Modeling | Commission + slippage, configurable per-side application |
| Price Triggers | Trailing stop, take-profit, stop-loss with deterministic evaluation |
| Walk-Forward | Configurable train/validation/target window splits |
| Parallel Execution | ThreadPoolExecutor with CPU-aware worker scaling |
| Parameter Search | Optuna integration, 50+ trial batches per window |
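The train/validation/target window mechanics can be illustrated with a small generator. Window lengths and the step-by-target convention below are arbitrary examples, not the engine's defaults:

```python
# Sketch of walk-forward window generation: each window contributes
# non-overlapping target (out-of-sample) bars, advancing by the target size.

def walk_forward(n_bars: int, train: int, valid: int, target: int):
    """Yield (train, valid, target) half-open index ranges."""
    start = 0
    while start + train + valid + target <= n_bars:
        yield (
            (start, start + train),
            (start + train, start + train + valid),
            (start + train + valid, start + train + valid + target),
        )
        start += target

windows = list(walk_forward(n_bars=1000, train=500, valid=100, target=100))
print(len(windows), windows[0])
# → 4 ((0, 500), (500, 600), (600, 700))
```

Because the windows are independent, each one can be searched in parallel (the ThreadPoolExecutor row above) and its best parameters applied only to its own target range.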
Each experiment produces structured, machine-readable artifacts:
- Equity Curve: NAV series with initial capital normalization
- Trade List: Entry/exit timestamps, quantities, PnL per trade
- Step Summary: Baseline entries, filter decisions, risk approvals per bar
- Window Selection: Parameter assignments per walk-forward window
- EvalReport: Sharpe, max drawdown, win rate, profit factor, and more
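Two of the EvalReport metrics can be computed directly from the normalized NAV series. This is a generic sketch, not the engine's implementation; the annualization factor assumes 1h bars:

```python
import math

# Illustrative Sharpe and max-drawdown computations over a NAV series.

def sharpe(nav: list[float], bars_per_year: int = 24 * 365) -> float:
    rets = [nav[i] / nav[i - 1] - 1 for i in range(1, len(nav))]
    mean = sum(rets) / len(rets)
    var = sum((r - mean) ** 2 for r in rets) / len(rets)
    return mean / math.sqrt(var) * math.sqrt(bars_per_year) if var else 0.0

def max_drawdown(nav: list[float]) -> float:
    """Most negative peak-to-trough return, as a fraction (e.g. -0.15)."""
    peak, mdd = nav[0], 0.0
    for v in nav:
        peak = max(peak, v)
        mdd = min(mdd, v / peak - 1)
    return mdd

nav = [1.00, 1.02, 1.01, 1.05, 0.97, 1.03]
print(round(max_drawdown(nav), 4))  # → -0.0762
```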
Long-running research sessions (100+ experiments, hours of continuous operation) require principled context management. AuraQuant implements structured XML snapshot compaction:
```text
Context Window (~80% utilization threshold)
├── System Prompt + Skills       ← always retained
├── Compacted Summary            ← structured XML snapshot
│   ├── <overall_goal>           ← research objective
│   ├── <key_knowledge>          ← falsified hypotheses, discovered patterns
│   ├── <contract_memory>        ← error patterns, constraint violations
│   └── <spec_workflow_state>    ← current experiment state
└── Recent Messages (budget: 20% of window) ← last N messages, token-counted
```
Key design decisions:
- Per-provider token budgeting (different limits for Anthropic vs. Codex)
- Minimum 2-message retention floor (prevents losing all recent context)
- Auto-compact at 80% of context window
- `<AUTO_CONTINUE>` signal for seamless session resumption after compaction
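The trigger and retention rules above can be sketched as a pure planning function. Token counts are illustrative; real counting is per-provider:

```python
# Hypothetical compaction planner: fire at 80% window utilization, keep
# the newest messages within a 20% budget, never fewer than 2 messages.

def plan_compaction(msg_tokens: list[int], window: int,
                    threshold: float = 0.8, recent_budget: float = 0.2):
    total = sum(msg_tokens)
    if total < threshold * window:
        return None                                # no compaction needed yet
    budget = recent_budget * window
    keep, used = [], 0
    for i in range(len(msg_tokens) - 1, -1, -1):   # walk newest → oldest
        if used + msg_tokens[i] > budget and len(keep) >= 2:
            break                                  # budget spent, floor met
        keep.append(i)
        used += msg_tokens[i]
    return sorted(keep)                            # indices to retain verbatim

print(plan_compaction([4000] * 25, window=100_000))
# → [20, 21, 22, 23, 24]
```

Everything older than the retained indices would be folded into the structured XML snapshot rather than dropped.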
The 50+ tool registry follows progressive disclosure — SDK knowledge is provided on-demand, not dumped into the prompt:
```text
Agent needs to write a Baseline module
  → cta__module_scaffold("baselines")
  → Returns template with O(1) per-bar caching skeleton

Agent unsure about parameter constraints
  → cta__sdk_reference("baseline")
  → Returns performance tips + common pitfalls

Agent ready to run experiment
  → Pre-flight automatically validates:
      ✓ Spec persisted?
      ✓ Runtime module files exist?
      ✓ Parameter search space configured?
  → Cheap static checks before expensive backtest runs
```
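A minimal sketch of those pre-flight checks, assuming a flat spec dict and the `cta_runtime/` layout; the key names and error strings are hypothetical:

```python
import os

# Cheap static validation before an expensive backtest run: each check is
# a pure lookup or filesystem stat, and all failures are collected at once.

def preflight(spec: dict, runtime_dir: str = "cta_runtime") -> list[str]:
    errors = []
    if not spec.get("persisted"):
        errors.append("spec not persisted")
    module = spec.get("baseline_module", "")
    path = os.path.join(runtime_dir, "baselines", f"{module}.py")
    if not os.path.isfile(path):
        errors.append(f"missing module file: {path}")
    if not spec.get("search_space"):
        errors.append("parameter search space not configured")
    return errors

errs = preflight({"persisted": True, "baseline_module": "ema_cross",
                  "search_space": {"fast": [5, 13]}})
print(errs)   # lists the missing module file unless it exists on disk
```

Returning all violations together (rather than failing on the first) lets the agent repair several gaps in a single iteration.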
| Category | Count | Examples |
|---|---|---|
| Data | 3 | data_version_list, data_slice_create, feature_slice_create |
| Strategy | 8 | module_scaffold, spec_put, sdk_reference, spec_schema |
| Experiment | 5 | experiment_run, experiment_wait, evaluation_show, evaluation_compare |
| Governance | 6 | archive_candidate, approval_request, approval_resolve |
| Agent Lifecycle | 3 | agent_state_save, agent_state_restore, run_trace_query |
Research and deployment are cleanly separated — the agent operates autonomously during research, with a single human gate before production:
```mermaid
stateDiagram-v2
    [*] --> candidate: Metrics exceed threshold → auto-archive
    candidate --> frozen: Research converges → auto-freeze
    candidate --> [*]: Below threshold → discard
    frozen --> production: Human approval
    frozen --> deprecated: Human rejection
    production --> deprecated: Decommission
    note right of frozen
        Single human gate
        Everything before is autonomous
    end note
```
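The lifecycle can be sketched as a small transition table with the human gate enforced in code. Event and actor names are illustrative assumptions:

```python
# Hypothetical lifecycle state machine mirroring the diagram above; the
# approve/reject events are the single human-only gate.

TRANSITIONS = {
    ("candidate", "auto_freeze"): "frozen",
    ("candidate", "discard"): "discarded",
    ("frozen", "approve"): "production",     # human-only
    ("frozen", "reject"): "deprecated",      # human-only
    ("production", "decommission"): "deprecated",
}
HUMAN_ONLY = {"approve", "reject"}

def transition(state: str, event: str, actor: str) -> str:
    if event in HUMAN_ONLY and actor != "human":
        raise PermissionError(f"{event} requires human approval")
    nxt = TRANSITIONS.get((state, event))
    if nxt is None:
        raise ValueError(f"illegal transition: {state} --{event}-->")
    return nxt

print(transition("candidate", "auto_freeze", actor="agent"))  # → frozen
```

Encoding the gate in the transition function (rather than in prompt text) means an agent cannot promote a strategy to production even if its instructions drift.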
| Layer | Technologies |
|---|---|
| Runtime | Python 3.10+, AsyncIO, Pydantic 2.x, dataclasses |
| LLM | Multi-provider (Anthropic, OpenAI, Gemini, Codex), streaming, token budgeting |
| Data | Pandas 2.x, DuckDB, PyArrow, CCXT (multi-exchange) |
| Optimization | Optuna (hyperparameter search) |
| Indicators | pandas-ta |
| CLI/UX | Rich, structured JSON output |
| Testing | pytest; 39 test files, ~9,700 LOC of tests |
```bash
# Install dependencies
pip install -r requirements.txt

# Configure LLM provider
mkdir -p .aura/config
cp aura/config/models.json.example .aura/config/models.json
# Edit .aura/config/models.json — add your API key

# Initialize workspace and verify
python -m auraquant init .
pytest -q tests/integration

# Launch the agent research loop
python -m auraquant chat
```

```text
AuraQuant/
├── aura/                         # LLM Agent runtime kernel
│   └── runtime/
│       ├── engine_agno_async.py  # Async agent loop (3,500 LOC)
│       ├── compaction.py         # Context compaction & token budgeting
│       └── prompts/              # Compaction prompt templates
├── auraquant/cta/                # CTA domain infrastructure
│   ├── aura_tools.py             # Tool surface (50+ tools, 2,300 LOC)
│   ├── cli.py                    # CLI entry point
│   └── backtest/
│       ├── engine.py             # Deterministic backtest engine (1,200 LOC)
│       ├── experiment.py         # Walk-forward & parallel execution
│       └── runtime.py            # Protocol-based strategy runner
├── agent/system_prompt.md        # Principal Research Agent prompt
├── AGENTS.md                     # Project contract: module boundaries
├── skills/                       # Five domain operators
│   ├── cta-data-operator/
│   ├── cta-strategy-operator/
│   ├── cta-experiment-operator/
│   ├── cta-governance-operator/
│   └── cta-principal-research-loop/
├── cta_runtime/                  # User-written strategy modules
│   ├── baselines/
│   ├── filters/
│   ├── executions/
│   └── risks/
├── tests/                        # 39 test files, ~9,700 LOC
└── README.md                     # Project overview
```
MIT — see LICENSE.