qWaitCrypto/AuraQuant

AuraQuant

Autonomous CTA Strategy Research Platform — Powered by LLM Agents

Architecture · Research Loop · Backtest Engine · Getting Started

Python 3.10+ · AsyncIO · Multi-Provider LLM · MIT License


What Is This?

AuraQuant is an agentic-first quantitative research platform for CTA (Commodity Trading Advisor, i.e. trend-following) strategies. Instead of using LLMs as code-completion assistants, AuraQuant puts an AI Agent in the role of Principal Researcher — autonomously proposing hypotheses, designing experiments, evaluating results, and managing the strategy lifecycle.

The core idea: define constraints and goals; let the agent decide the research path.

Human defines:  "Find a robust trend-following strategy on BTC/ETH 1h, Sharpe ≥ 1.2, MaxDD ≤ -15%"
Agent executes:  hypothesis → spec → backtest → evaluate → iterate → archive → approve
Human reviews:   only at the final gate (frozen → production)

Design Ideas

  • System state is explicit. Research state lives in typed objects such as Spec, Run, Report, ArchiveRecord, RunTrace, DataVersion, DataSlice, and FeatureSlice, rather than being inferred from chat history.
  • Policy and execution are separated. Agent behavior is layered across system_prompt.md, AGENTS.md, skills/, and the stable tool/runtime surface, so research policy can change without rewriting the execution kernel each time.
  • Strategy modules have hard boundaries. Baseline, Filter, Execution, and Risk are treated as separate responsibilities, which keeps entry logic, admission logic, position management, and risk constraints from collapsing into one opaque module.
  • Data semantics are versioned and traceable. DataVersion, DataSlice, and FeatureSlice carry provenance, and multi-timeframe rules use closed bars only, so experiments are easier to replay and compare.
  • Governance is part of the runtime model. ArchiveRecord stores lineage and lifecycle state, which lets research runs move through candidate and frozen states automatically while keeping promotion to production behind a human gate.
  • Long sessions preserve structured state, not raw logs. Compaction and trace artifacts are designed to retain identifiers, decisions, failures, and next actions needed to resume research after many iterations.
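
To make "explicit, typed state" concrete, here is a minimal sketch of what objects like Spec, Run, and Report could look like. The class names come from the list above; every field name and type is an illustrative assumption, not the project's actual schema:

```python
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical field layouts — the real definitions live in the
# AuraQuant codebase and will differ in detail.

@dataclass
class Spec:
    spec_id: str
    baseline: str                      # e.g. "ema_cross"
    params: dict = field(default_factory=dict)

@dataclass
class Run:
    run_id: str
    spec_id: str                       # lineage back to the Spec
    started_at: datetime
    status: str = "pending"            # pending → running → done/failed

@dataclass
class Report:
    run_id: str
    sharpe: float
    max_dd: float                      # negative fraction, e.g. -0.15
```

The point of typed objects over chat history: a resumed session can query `Run.status` or `Report.sharpe` directly rather than re-parsing transcript text.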

Architecture

Control Stack

AuraQuant's behavior is governed by a four-layer control stack — upper layers evolve with research goals, lower layers remain stable:

┌─────────────────────────────────────────────────────────┐
│  system_prompt.md    Role definition + invariant rules  │  ← changes rarely
├─────────────────────────────────────────────────────────┤
│  AGENTS.md           Project contracts + module bounds  │
├─────────────────────────────────────────────────────────┤
│  skills/             5 domain operators (data/strategy/ │  ← changes per
│                     experiment/governance/orchestrator) │     research phase
├─────────────────────────────────────────────────────────┤
│  aura_tools.py       Hard-contract tool surface with    │  ← stable
│  + backtest engine   pre-flight checks & typed I/O      │     infrastructure
└─────────────────────────────────────────────────────────┘

System Components

graph TB
    subgraph "Agent Layer"
        SP["System Prompt"]
        AG["AGENTS.md<br/>Project Contract"]
        SK["Skills<br/>5 Domain Operators"]
    end

    subgraph "Runtime Kernel — aura/"
        ENG["Async Agent Engine<br/>3,500 LOC · streaming · tool DAG"]
        CMP["Context Compaction<br/>token budgeting · structured snapshots"]
        LLM["Multi-Provider LLM<br/>Anthropic · OpenAI · Gemini · Codex"]
    end

    subgraph "CTA Infrastructure — auraquant/cta/"
        TL["Tool Surface<br/>50+ typed tools · pre-flight validation"]
        BT["Backtest Engine<br/>deterministic · realistic fills"]
        EXP["Experiment Runner<br/>walk-forward · parallel windows"]
        GOV["Governance<br/>archive · approval · lifecycle"]
    end

    SP --> ENG
    AG --> ENG
    SK --> ENG
    ENG --> CMP
    ENG --> LLM
    ENG --> TL
    TL --> BT
    TL --> EXP
    TL --> GOV

    style ENG fill:#1a1a2e,stroke:#4a9eff,color:#e0e0e0
    style CMP fill:#1a1a2e,stroke:#4a9eff,color:#e0e0e0
    style LLM fill:#1a1a2e,stroke:#4a9eff,color:#e0e0e0
    style TL fill:#16213e,stroke:#0f3460,color:#e0e0e0
    style BT fill:#16213e,stroke:#0f3460,color:#e0e0e0
    style EXP fill:#16213e,stroke:#0f3460,color:#e0e0e0
    style GOV fill:#16213e,stroke:#e0a020,color:#e0e0e0

Research Loop

Each research session progresses through four phases, with the agent autonomously routing to the most urgent gap:

flowchart LR
    D["Data<br/>Versioning · DataSlice<br/>Feature engineering"]
    S["Strategy<br/>Spec definition<br/>Module assembly"]
    E["Experiment<br/>Walk-forward<br/>Optuna param search"]
    G["Governance<br/>Archive · Freeze<br/>Human approval"]

    D --> S --> E --> G
    E -- "hypothesis falsified" --> S
    G -- "new direction" --> S

    style D fill:#1a1a2e,stroke:#4a9eff,color:#e0e0e0
    style S fill:#1a1a2e,stroke:#4a9eff,color:#e0e0e0
    style E fill:#1a1a2e,stroke:#4a9eff,color:#e0e0e0
    style G fill:#1a1a2e,stroke:#e0a020,color:#e0e0e0

Phase dependencies are hard-enforced: no strategy without qualified data; no experiment without a persisted Spec. The agent identifies the earliest gap and routes accordingly.
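
The "route to the earliest gap" rule can be sketched as a simple prerequisite check. Function and key names here are hypothetical; only the ordering (data → strategy → experiment → governance) comes from the text above:

```python
def route_phase(state: dict) -> str:
    """Return the earliest research phase whose prerequisite is unmet.

    Illustrative only: mirrors the hard-enforced dependencies —
    no strategy without qualified data, no experiment without a
    persisted Spec, no governance without experiment results.
    """
    if not state.get("qualified_data"):
        return "data"
    if not state.get("persisted_spec"):
        return "strategy"
    if not state.get("experiment_done"):
        return "experiment"
    return "governance"
```

A session that has data and a Spec but no results would thus be routed straight to the experiment phase.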

Five-Dimensional Exploration Framework

The agent's autonomous research follows a structured anti-stagnation protocol:

| Dimension | Trigger | Action |
| --- | --- | --- |
| Parameter | Default starting point | Search optimal params within current signal (Optuna / grid) |
| Structure | Params converge but execution poor | Swap Filter / Execution / Risk combinations |
| Hypothesis | Structure space exhausted | Change alpha hypothesis, switch entry logic |
| Induction | Forced every 10 rounds | Summarize falsified hypotheses, output next directions |
| Research | 3 consecutive dead-ends | Import external CTA strategy ideas |

The induction layer is the critical design: forced summarization every 10 rounds prevents the agent from burning budget on a dead-end direction.
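
One plausible way to express the protocol's priority order — forced induction first, then the escalation ladder — is a small dispatcher. The triggers come from the table above; the function name, argument names, and exact precedence are assumptions for illustration:

```python
def next_dimension(round_no: int, dead_end_streak: int,
                   params_converged: bool, structure_exhausted: bool) -> str:
    """Pick the exploration dimension for the next round (sketch)."""
    if round_no % 10 == 0:
        return "induction"        # forced summary every 10 rounds
    if dead_end_streak >= 3:
        return "research"         # import external CTA strategy ideas
    if structure_exhausted:
        return "hypothesis"       # change the alpha hypothesis itself
    if params_converged:
        return "structure"        # swap Filter / Execution / Risk combos
    return "parameter"            # default: search within current signal
```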

Autonomous Mode & Termination

When given convergence targets, the agent enters self-directed mode with structured iteration state:

<ITER_STATE>
  iter: 3 / 200
  goal: BTC 1h · Sharpe ≥ 1.2 · max_dd ≤ -15%
  tried:
    - ema_cross(fast=8, slow=21) → Sharpe=0.74 · dd=-19.2%
    - trendflex(lb=20) + atr_filter  → Sharpe=0.91 · dd=-17.3%
  next_action: Try momentum breakout with vol regime filter
  status: CONTINUING
</ITER_STATE>

Termination conditions: goal achieved · 5 rounds of Sharpe delta < 0.05 (stagnation) · budget exhausted · structural gap · approval gate · user interrupt.
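
The stagnation condition ("5 rounds of Sharpe delta < 0.05") could be checked like this. The interpretation — best Sharpe improved by less than the delta over the last five rounds — is one reasonable reading; the actual runtime criterion may differ:

```python
def is_stagnant(sharpe_history: list[float], window: int = 5,
                delta: float = 0.05) -> bool:
    """True when Sharpe improved by less than `delta` over the last
    `window` rounds — one of the termination conditions above (sketch)."""
    if len(sharpe_history) <= window:
        return False              # not enough rounds to judge
    recent = sharpe_history[-(window + 1):]
    return max(recent) - recent[0] < delta
```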


Backtest Engine

Four-Layer Strategy Decomposition

Every CTA strategy is decomposed into four orthogonal, swappable modules using Python Protocols:

flowchart TD
    Bar["Each Bar (OHLCV)"] --> BL

    subgraph Strategy["Strategy Modules (Protocol-based plugins)"]
        BL["Baseline<br/>Generate entry candidates<br/>O(1) per-bar · incremental cache"]
        FL["Filter<br/>Discrete pass/reject<br/>FilterDecision audit trail"]
        EX["Execution<br/>Post-entry state machine<br/>Trailing stop · fixed SL/TP"]
        RM["Risk<br/>Final approval + position sizing<br/>RiskDecision records"]

        BL --> FL --> EX --> RM
    end

    RM --> EN["Backtest Engine<br/>Fills · Positions · Equity Curve"]

Why this decomposition matters:

| Benefit | Example |
| --- | --- |
| Composable | Replace Execution module; Baseline stays untouched |
| Auditable | Every filter decision recorded as structured FilterDecision |
| A/B testable | Same Baseline + different Risk modules → directly comparable walk-forward results |
| Type-safe | Pydantic model coercion on all module outputs |
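
A Protocol-based module surface might look like the following sketch: a structural `Baseline` interface plus a concrete EMA-cross implementation with the O(1)-per-bar incremental cache mentioned above. Method names and signatures are assumptions, not the project's actual SDK:

```python
from typing import Protocol

class Baseline(Protocol):
    """Structural interface for entry-candidate generators (hypothetical)."""
    def on_bar(self, bar: dict) -> bool: ...

class EmaCrossBaseline:
    """Minimal EMA-cross baseline: O(1) per bar via incremental EMAs."""

    def __init__(self, fast: int = 8, slow: int = 21):
        # Smoothing coefficients for the two exponential moving averages.
        self.kf, self.ks = 2 / (fast + 1), 2 / (slow + 1)
        self.fast_ema = self.slow_ema = None

    def on_bar(self, bar: dict) -> bool:
        c = bar["close"]
        # Incremental EMA update — no lookback array, constant work per bar.
        self.fast_ema = c if self.fast_ema is None else self.fast_ema + self.kf * (c - self.fast_ema)
        self.slow_ema = c if self.slow_ema is None else self.slow_ema + self.ks * (c - self.slow_ema)
        return self.fast_ema > self.slow_ema   # entry candidate when fast > slow
```

Because `Baseline` is a Protocol, any class with a matching `on_bar` satisfies it without inheritance — which is what makes the modules swappable.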

Engine Capabilities

| Feature | Detail |
| --- | --- |
| Order Execution | Realistic fill timing (entry/exit), intrabar vs. bar-close modes |
| Position Model | Long/short, reversals, same-bar conflict resolution |
| Cost Modeling | Commission + slippage, configurable per-side application |
| Price Triggers | Trailing stop, take-profit, stop-loss with deterministic evaluation |
| Walk-Forward | Configurable train/validation/target window splits |
| Parallel Execution | ThreadPoolExecutor with CPU-aware worker scaling |
| Parameter Search | Optuna integration, 50+ trial batches per window |
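
Configurable train/validation/target splits can be illustrated with a rolling-window generator. This is a sketch under the assumption that windows roll forward by one target length; the engine's actual split logic may differ:

```python
def walk_forward_windows(n_bars: int, train: int, valid: int, target: int):
    """Yield (train, validation, target) index ranges as rolling windows.

    Illustrative only: each window advances by `target` bars, so every
    out-of-sample target segment is used exactly once.
    """
    start = 0
    while start + train + valid + target <= n_bars:
        t0 = start
        v0 = t0 + train
        g0 = v0 + valid
        yield (range(t0, v0), range(v0, g0), range(g0, g0 + target))
        start += target
```

With 100 bars and a 50/20/10 split this produces three windows, each shifted 10 bars forward — the shape a parallel executor would fan out across workers.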

Output Artifacts

Each experiment produces structured, machine-readable artifacts:

  • Equity Curve: NAV series with initial capital normalization
  • Trade List: Entry/exit timestamps, quantities, PnL per trade
  • Step Summary: Baseline entries, filter decisions, risk approvals per bar
  • Window Selection: Parameter assignments per walk-forward window
  • EvalReport: Sharpe, max drawdown, win rate, profit factor, and more
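
For reference, the max-drawdown figure in an EvalReport is conventionally the worst peak-to-trough decline of the NAV series, expressed as a negative fraction. A minimal computation (not the project's code) looks like:

```python
def max_drawdown(nav: list[float]) -> float:
    """Worst peak-to-trough decline as a negative fraction of the peak."""
    peak, mdd = nav[0], 0.0
    for v in nav:
        peak = max(peak, v)                 # running high-water mark
        mdd = min(mdd, (v - peak) / peak)   # most negative excursion so far
    return mdd
```

A NAV path of 100 → 120 → 90 → 110 yields -0.25, which would fail the -15% constraint in the example goal above.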

Context Compaction

Long-running research sessions (100+ experiments, hours of continuous operation) require principled context management. AuraQuant implements structured XML snapshot compaction:

Context Window (~80% utilization threshold)
├── System Prompt + Skills                    ← always retained
├── Compacted Summary                         ← structured XML snapshot
│   ├── <overall_goal>                        ← research objective
│   ├── <key_knowledge>                       ← falsified hypotheses, discovered patterns
│   ├── <contract_memory>                     ← error patterns, constraint violations
│   └── <spec_workflow_state>                 ← current experiment state
└── Recent Messages (budget: 20% of window)   ← last N messages, token-counted

Key design decisions:

  • Per-provider token budgeting (different limits for Anthropic vs. Codex)
  • Minimum 2-message retention floor (prevents losing all recent context)
  • Auto-compact at 80% of context window
  • <AUTO_CONTINUE> signal for seamless session resumption after compaction
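
The 80% trigger and the 2-message retention floor can be sketched as two small functions. Names and signatures are illustrative assumptions; only the threshold, budget, and floor semantics come from the design decisions above:

```python
def should_compact(used_tokens: int, window_tokens: int,
                   threshold: float = 0.80) -> bool:
    """Auto-compact once context utilization crosses the threshold."""
    return used_tokens >= threshold * window_tokens

def retain_recent(messages: list, token_counts: list[int],
                  budget: int, floor: int = 2) -> list:
    """Keep the most recent messages within a token budget, but never
    fewer than `floor` messages (the minimum-retention rule above)."""
    kept, total = [], 0
    for msg, toks in zip(reversed(messages), reversed(token_counts)):
        if total + toks > budget and len(kept) >= floor:
            break                 # budget exceeded and floor satisfied
        kept.append(msg)
        total += toks
    return list(reversed(kept))   # restore chronological order
```

Note the floor deliberately overrides the budget: with a 15-token budget and 10-token messages, two messages are still retained.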

Tool Surface Design

The 50+ tool registry follows progressive disclosure — SDK knowledge is provided on-demand, not dumped into the prompt:

Agent needs to write a Baseline module
  → cta__module_scaffold("baselines")
  → Returns template with O(1) per-bar caching skeleton

Agent unsure about parameter constraints
  → cta__sdk_reference("baseline")
  → Returns performance tips + common pitfalls

Agent ready to run experiment
  → Pre-flight automatically validates:
     ✓ Spec persisted?
     ✓ Runtime module files exist?
     ✓ Parameter search space configured?
     → Cheap static checks before expensive backtest runs

Tool Categories

| Category | Count | Examples |
| --- | --- | --- |
| Data | 3 | data_version_list, data_slice_create, feature_slice_create |
| Strategy | 8 | module_scaffold, spec_put, sdk_reference, spec_schema |
| Experiment | 5 | experiment_run, experiment_wait, evaluation_show, evaluation_compare |
| Governance | 6 | archive_candidate, approval_request, approval_resolve |
| Agent Lifecycle | 3 | agent_state_save, agent_state_restore, run_trace_query |

Governance Pipeline

Research and deployment are cleanly separated — the agent operates autonomously during research, with a single human gate before production:

stateDiagram-v2
    [*] --> candidate: Metrics exceed threshold → auto-archive

    candidate --> frozen: Research converges → auto-freeze
    candidate --> [*]: Below threshold → discard

    frozen --> production: Human approval
    frozen --> deprecated: Human rejection

    production --> deprecated: Decommission

    note right of frozen
        Single human gate
        Everything before is autonomous
    end note
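
The lifecycle above is a small finite state machine; a transition-table sketch makes the single human gate explicit. States and events are taken from the diagram; the table encoding itself is an illustrative assumption:

```python
# Hypothetical transition table for the governance lifecycle.
# Only the frozen → production edge requires a human event.
TRANSITIONS = {
    ("candidate", "auto_freeze"): "frozen",       # research converges
    ("candidate", "discard"): "discarded",        # below threshold
    ("frozen", "human_approve"): "production",    # the single human gate
    ("frozen", "human_reject"): "deprecated",
    ("production", "decommission"): "deprecated",
}

def advance(state: str, event: str) -> str:
    """Apply one lifecycle event, rejecting any edge not in the diagram."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {state} --{event}-->")
```

Encoding the diagram as data means governance invariants (e.g. nothing reaches production without passing through frozen) are enforced by lookup, not by scattered if-statements.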

Tech Stack

| Layer | Technologies |
| --- | --- |
| Runtime | Python 3.10+, AsyncIO, Pydantic 2.x, dataclasses |
| LLM | Multi-provider (Anthropic, OpenAI, Gemini, Codex), streaming, token budgeting |
| Data | Pandas 2.x, DuckDB, PyArrow, CCXT (multi-exchange) |
| Optimization | Optuna (hyperparameter search) |
| Indicators | pandas-ta |
| CLI/UX | Rich, structured JSON output |
| Testing | pytest, 39 test files, ~9,700 LOC test coverage |

Getting Started

# Install dependencies
pip install -r requirements.txt

# Configure LLM provider
mkdir -p .aura/config
cp aura/config/models.json.example .aura/config/models.json
# Edit .aura/config/models.json — add your API key

# Initialize workspace and verify
python -m auraquant init .
pytest -q tests/integration

# Launch the agent research loop
python -m auraquant chat

Project Structure

AuraQuant/
├── aura/                          # LLM Agent runtime kernel
│   └── runtime/
│       ├── engine_agno_async.py   #   Async agent loop (3,500 LOC)
│       ├── compaction.py          #   Context compaction & token budgeting
│       └── prompts/               #   Compaction prompt templates
├── auraquant/cta/                 # CTA domain infrastructure
│   ├── aura_tools.py              #   Tool surface (50+ tools, 2,300 LOC)
│   ├── cli.py                     #   CLI entry point
│   └── backtest/
│       ├── engine.py              #   Deterministic backtest engine (1,200 LOC)
│       ├── experiment.py          #   Walk-forward & parallel execution
│       └── runtime.py             #   Protocol-based strategy runner
├── agent/system_prompt.md         # Principal Research Agent prompt
├── AGENTS.md                      # Project contract: module boundaries
├── skills/                        # Five domain operators
│   ├── cta-data-operator/
│   ├── cta-strategy-operator/
│   ├── cta-experiment-operator/
│   ├── cta-governance-operator/
│   └── cta-principal-research-loop/
├── cta_runtime/                   # User-written strategy modules
│   ├── baselines/
│   ├── filters/
│   ├── executions/
│   └── risks/
├── tests/                         # 39 test files, ~9,700 LOC
└── README.md                      # Project overview

License

MIT — see LICENSE.
