Repository Evidence Graph Queries

src/queries/insights.ts is the shared adapter for dashboard-grade evidence graph queries. CLI, TUI, and integration tests should reuse these builders instead of embedding ad hoc SurrealQL that can drift from the schema.

Example commands:

axctl insights
axctl insights schema
axctl insights repositories --limit=25
axctl insights checkouts --limit=25
axctl insights git --limit=25
axctl insights friction --limit=50
axctl insights tools --limit=20
axctl insights sessions --limit=20
axctl insights file-evidence --limit=20
axctl insights feedback-loops --limit=20
axctl insights feedback-language --limit=20
axctl insights message-signals --limit=20
axctl insights reactions --limit=20
axctl insights reaction-themes --limit=20
axctl insights reaction-events --limit=20
axctl insights reaction-event-themes --limit=20
axctl insights verification-gaps --limit=20
axctl insights user-language --limit=20
axctl insights token-impact --limit=20
axctl insights cache-health --limit=20
axctl insights workflow-impact --limit=20
axctl insights codex-health --limit=20
axctl insights closure --limit=20
axctl insights post-feature-fixes --limit=20
axctl insights skill-candidates --limit=20
axctl insights graph-health --limit=10
axctl dashboard --limit=25
axctl costs summary --since=2
axctl costs for --query "live-traces" --limit=20
axctl costs for --terms "live trace,livetrace,live-traces" --since=2 --limit=50
axctl costs for --query "checkout bug" --since=7 --here
axctl costs for --commit 464c80b
axctl costs for --branch main --limit=20
axctl pricing --query gpt-5.5

Use --json on any insights view to print the raw query rows. The message-analysis views default to compact, scan-friendly output.

The builders target the current schema fields directly:

  • repositoryOverviewSql reads repository and counts ->has_checkout->checkout.
  • checkoutActivitySql reads checkout and counts linked sessions, turns, tool calls, failures, produced commits, and touched files per worktree or local checkout.
  • gitCorrelationSql reads repository-linked sessions, commits, produced, and touched evidence so a dashboard can show whether transcript activity is attached to Git history.
  • recentFrictionSql reads friction_event and returns the JSON-encoded labels, metrics, and raw fields rather than flattened draft fields.
  • toolFailuresSql groups tool_call rows with WHERE has_error = true.
  • sessionEvidenceSql summarizes session-linked tool calls, failures, friction events, and plan snapshots.
  • fileEvidenceSql summarizes provider-neutral edit, read, and search file relations so Claude/Codex/Pi evidence can be compared without provider branches.
  • feedbackLoopsSql groups persisted command_outcome rows so expected test feedback, guardrails, search misses, and real blockers can be separated.
  • feedbackLanguageSql, messageSignalsSql, and reactionsSql read the turn feedback graph: semantic signals, user/assistant examples, and correction or approval pairs linked through reacts_to.
  • reactionThemesSql groups reacts_to edges by promoted semantic signal so recurring correction, rejection, and approval patterns are visible without manually reading every pair.
  • verificationGapsSql finds sessions with edits but no verification-shaped command outcomes.
  • userLanguageSql reads persisted user_message_ngram aggregates from user turns, including correction and verification proximity counters.
  • tokenImpactSql compares actual or estimated token usage by workflow epoch and provider.
  • cacheHealthSql surfaces sessions with actual cache metrics when present, otherwise high estimated-token sessions that need better provider metadata.
  • workflowImpactSql compares turns, tool calls, tool errors, corrections, interruptions, subagent dispatches, and estimated tokens by workflow epoch.
  • codexHealthSql ranks non-empty Codex sessions by estimated context cost.
  • closureSql summarizes commit lifecycle classifications.
  • postFeatureFixesSql lists feature commits followed by overlapping fix commits within the configured window.
  • skillCandidatesSql lists evidence-backed skill or guardrail candidates derived from fix chains and risky sessions.
  • schemaCoverageSql reports every schema table as active, conditional, or staged, so intentionally empty tables are visible up front instead of coming as a surprise in Surrealist.

Cost And Pricing Queries

axctl costs is the graph-backed read surface for token spend. It reads session_token_usage rows produced by provider ingest and session-health, then groups by provider/model and reports estimated USD. Cost estimates use agent_model pricing rows imported by the pricing/models stage, with a built-in local catalog as a fallback for common models.

Supported commands:

axctl costs summary [--since=N] [--source=codex|claude|pi|opencode|cursor]
axctl costs for --session <session-id>
axctl costs for --query <turn-text> [--since=N] [--project=<path>] [--here] [--limit=N]
axctl costs for --terms <term-a,term-b> [--since=N] [--project=<path>] [--here] [--limit=N]
axctl costs for --commit <sha>
axctl costs for --branch <branch> [--limit=N]
axctl pricing [--query <model-or-provider>] [--limit=N]

Selectors map onto graph evidence:

  • --session reads the direct session_token_usage.session row.
  • --query finds sessions by full-text matching against turn.text_excerpt.
  • --terms matches any comma-separated text term, dedupes sessions, and is useful for spelling variants such as live trace, livetrace, and live-traces.
  • --commit follows session -> produced -> commit.
  • --branch filters produced.checkout -> checkout.branch in the current repository.
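The --terms behaviour (match any term, then dedupe sessions) can be sketched in a few lines. This is only an illustration under stated assumptions: the TurnRow shape and the matchSessions helper are hypothetical, and plain case-insensitive substring matching stands in for whatever full-text matching the real query uses.

```typescript
// Hypothetical row shape for illustration; the real turn table may differ.
interface TurnRow {
  sessionId: string;
  textExcerpt: string;
}

// Split a --terms value into trimmed, lowercased, non-empty terms.
function parseTerms(raw: string): string[] {
  return raw
    .split(",")
    .map((term) => term.trim().toLowerCase())
    .filter((term) => term.length > 0);
}

// Return each session id at most once when any term appears in any of its turns.
// Assumption: substring matching stands in for real full-text search.
function matchSessions(turns: TurnRow[], rawTerms: string): string[] {
  const terms = parseTerms(rawTerms);
  const seen = new Set<string>();
  for (const turn of turns) {
    const text = turn.textExcerpt.toLowerCase();
    if (terms.some((term) => text.indexOf(term) !== -1)) {
      seen.add(turn.sessionId);
    }
  }
  return Array.from(seen);
}
```

A session that matches several spelling variants still appears once, which is why the reported session count does not grow when more variants are added for the same work.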

--query and --terms accept scope filters:

  • --since=N restricts matching sessions to the last N days.
  • --project=<path> restricts matching sessions to a project/cwd path.
  • --here restricts matching sessions to the repository at the current working directory, using session.repository.

The output includes session count, total estimated tokens, prompt/output/cache token buckets, model breakdown, pricing source, and the matching session ids. Use --json on costs for and pricing when another tool needs structured data.

Known limits:

  • Direct --pr <number> is not wired yet. Use the PR branch name or a commit SHA until pull-request nodes are populated in the graph.
  • Cost is only as precise as provider model metadata. If a provider records no concrete model, ax avoids pricing bare provider names such as openai and reports the row as unpriced rather than inventing a model.
  • Some providers expose actual prompt/output/cache counters; others still fall back to transcript-byte token estimates.

Harness Doctor Tables And Ingestion Status

The Harness Doctor ingest slice currently persists these tables:

  • guidance_source
  • guidance_revision
  • stack

Current implementation status:

  • axctl project harness scans repo-local and global guidance sources at report time.
  • axctl project harness --json returns Guidance Sources, Guidance Revisions, Stack signals, and Harness Doctor findings.
  • axctl project harness also reads existing tool_call, edited, and produced graph evidence so observed tooling and main-branch write-risk signals are grounded in the current database.
  • Default axctl ingest persists the durable Harness Doctor subset via the harness/doctor ingest stage.
  • Default axctl ingest also persists command outcome classifications and user-message n-grams via the outcomes/derive ingest stage.
  • Default axctl ingest persists token/cache/workflow health via the session-health/derive ingest stage.
  • Default axctl ingest persists commit lifecycle, post-feature fix-chain, and skill-candidate records via the closure/derive ingest stage.

The harness ingest stage is idempotent and:

  1. Upserts guidance_source rows keyed by path.
  2. Upserts guidance_revision rows keyed by source path plus content hash.
  3. Upserts declared and observed stack records.
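The idempotency of the steps above follows from the choice of keys. A minimal sketch, assuming an in-memory Map in place of the database and a hypothetical GuidanceRevision shape, shows why re-running the stage with unchanged content does not add rows:

```typescript
// Hypothetical in-memory stand-in for the guidance_revision table.
interface GuidanceRevision {
  sourcePath: string;
  contentHash: string;
}

// A revision is identified by source path plus content hash.
function revisionKey(sourcePath: string, contentHash: string): string {
  return `${sourcePath}#${contentHash}`;
}

// Upsert: writing the same (path, hash) pair again overwrites the same key.
function upsertRevision(
  store: Map<string, GuidanceRevision>,
  sourcePath: string,
  contentHash: string,
): void {
  store.set(revisionKey(sourcePath, contentHash), { sourcePath, contentHash });
}
```

Only a changed content hash produces a new revision key, so repeated ingests of an untouched guidance file leave the row count stable.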

Use axctl project harness --json as the canonical report surface and axctl insights schema to verify durable table population after ingest.

Command Outcome And User Language Tables

The command outcome slice adds:

  • command_outcome
  • user_message_ngram

command_outcome is keyed from the original tool_call record and classifies commands into success, expected_feedback, search_miss, guardrail, environment_blocker, workflow_error, product_bug_signal, or unknown. This keeps useful TDD/lint/typecheck feedback distinct from real workflow friction.

user_message_ngram is derived from turn.role = "user" excerpts and stores bi-gram/tri-gram frequency plus correction, failed-tool, edit, and verification proximity counters. It is an intentionally small first pass for mining repeated preferences, corrections, and language that should become taste or harness learning candidates.
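As a rough illustration of the n-gram frequency part only, the sketch below counts word n-grams over user excerpts. The tokenizer and the countNgrams helper are assumptions made for the example; the proximity counters are not modelled here.

```typescript
// Assumed tokenizer: lowercase, split on anything that is not a letter, digit, or apostrophe.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9']+/)
    .filter((word) => word.length > 0);
}

// Count n-grams (n = 2 for bi-grams, n = 3 for tri-grams) across messages.
function countNgrams(messages: string[], n: number): Map<string, number> {
  const counts = new Map<string, number>();
  for (const message of messages) {
    const words = tokenize(message);
    for (let i = 0; i + n <= words.length; i++) {
      const gram = words.slice(i, i + n).join(" ");
      counts.set(gram, (counts.get(gram) ?? 0) + 1);
    }
  }
  return counts;
}
```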

Session Token And Workflow Health Tables

The session health slice adds:

  • workflow_epoch
  • session_token_usage
  • session_health

workflow_epoch currently derives a gsd to superpowers split from the first observed superpowers:* skill invocation. This is a heuristic, but it creates a stable comparison anchor for dogfooding workflow migration questions.

session_token_usage labels token/model quality in its JSON labels field. token_source_quality is explicit for provider counters such as Codex token_count, Pi usage fields, or Claude usage metadata; estimate for transcript-byte estimates; and unavailable when neither counters nor text bytes are present. model_source_quality distinguishes provider model names from missing model metadata. Cost reads also surface unpriced_model_reason when pricing is not computed for a row.
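The three token_source_quality values can be read as a simple decision order. The sketch below assumes a hypothetical UsageEvidence shape; the real ingest code may inspect provider-specific fields instead.

```typescript
type TokenSourceQuality = "explicit" | "estimate" | "unavailable";

// Hypothetical evidence shape for illustration only.
interface UsageEvidence {
  providerTokenCount?: number; // counter reported by the provider, if any
  transcriptBytes?: number; // size of transcript text, if any
}

// Provider counters win; transcript bytes are the fallback; otherwise nothing is known.
function tokenSourceQuality(evidence: UsageEvidence): TokenSourceQuality {
  if (evidence.providerTokenCount !== undefined) return "explicit";
  if (evidence.transcriptBytes !== undefined && evidence.transcriptBytes > 0) {
    return "estimate";
  }
  return "unavailable";
}
```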

session_health records turns, tool calls, tool errors, correction-like user messages, interruption/status/redirect-like user messages, subagent dispatches, plan snapshots, estimated tokens, cache ratios, and a coarse context-pressure bucket. These rows power token-impact, cache-health, workflow-impact, and codex-health.

Verification Churn (ax sessions churn)

ax sessions churn [--here|--project=P] [--source=S] [--since=N] [--json] rolls verification churn up by session and source: landed LOC (commits via produced/touched), edit LOC (edit-class tool calls), and repair LOC - edits made while a verification episode is open. An episode opens when a check command (test / typecheck / lint / build) fails after prior edits and closes when a same-family check passes; open episodes expire after 30 minutes. Check classification is anchored to command position (bun test counts, ls test/ does not). Default window is 30 days; --here scopes to the current repo (project slug, cwd, and worktree/subdirectory sessions all match).
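The episode rules can be sketched as a small state machine over a time-ordered event stream. Everything named here (ChurnEvent, repairLoc, measuring the 30-minute expiry from the moment the episode opened) is an assumption for illustration, not the shipped implementation.

```typescript
type CheckFamily = "test" | "typecheck" | "lint" | "build";

// Hypothetical event shapes; events are assumed to be sorted by time.
type ChurnEvent =
  | { kind: "edit"; atMs: number; loc: number }
  | { kind: "check"; atMs: number; family: CheckFamily; passed: boolean };

const EPISODE_TTL_MS = 30 * 60 * 1000;

// Sum the LOC of edits made while at least one verification episode is open.
function repairLoc(events: ChurnEvent[]): number {
  const open = new Map<CheckFamily, number>(); // family -> time the episode opened
  let sawEdit = false;
  let repair = 0;
  for (const ev of events) {
    // Expire episodes older than the TTL (assumed to count from opening time).
    for (const [family, openedAt] of Array.from(open.entries())) {
      if (ev.atMs - openedAt > EPISODE_TTL_MS) open.delete(family);
    }
    if (ev.kind === "edit") {
      if (open.size > 0) repair += ev.loc;
      sawEdit = true;
    } else if (ev.passed) {
      open.delete(ev.family); // a same-family pass closes the episode
    } else if (sawEdit && !open.has(ev.family)) {
      open.set(ev.family, ev.atMs); // a failure after prior edits opens one
    }
  }
  return repair;
}
```

A failing check with no prior edits does not open an episode in this sketch, which matches the "fails after prior edits" wording above.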

Closure Quality And Skill Candidate Tables

The closure-quality slice adds:

  • commit_classification
  • later_fixed_by
  • skill_candidate
  • suggests_skill

commit_classification classifies commit messages as feature, fix, refactor, test, docs, chore, or unknown.

later_fixed_by links a feature commit to a later fix commit when they share a repository, land within the time window, and touch one or more of the same files. This is a deliberately conservative first pass: it treats same-file post-feature fixes as evidence that closure quality could improve.
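The three linking conditions (same repository, inside the time window, at least one shared file) can be sketched as a nested filter. The ClassifiedCommit shape, the laterFixedBy function, and the windowMs parameter are hypothetical names for this example; the real window length is whatever the ingest stage is configured with.

```typescript
// Hypothetical commit shape combining classification and touched files.
interface ClassifiedCommit {
  sha: string;
  repository: string;
  kind: "feature" | "fix" | "refactor" | "test" | "docs" | "chore" | "unknown";
  committedAtMs: number;
  files: string[];
}

interface FixEdge {
  feature: string;
  fix: string;
  sharedFiles: string[];
}

// Link each feature commit to later fix commits that overlap on files.
function laterFixedBy(commits: ClassifiedCommit[], windowMs: number): FixEdge[] {
  const edges: FixEdge[] = [];
  const features = commits.filter((c) => c.kind === "feature");
  const fixes = commits.filter((c) => c.kind === "fix");
  for (const feature of features) {
    for (const fix of fixes) {
      const delta = fix.committedAtMs - feature.committedAtMs;
      if (fix.repository !== feature.repository) continue;
      if (delta <= 0 || delta > windowMs) continue;
      const sharedFiles = fix.files.filter((f) => feature.files.indexOf(f) !== -1);
      if (sharedFiles.length > 0) {
        edges.push({ feature: feature.sha, fix: fix.sha, sharedFiles });
      }
    }
  }
  return edges;
}
```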

skill_candidate turns repeated fix-chain patterns and risky session health signals into candidate skills or guardrails, such as ingest idempotency checks, schema-change smoke tests, live query dogfooding, or session closure quality gates.

Onboarding

axctl onboarding --json checks whether global Claude, Codex, and shared agent guidance directories are git-tracked. This gives future guidance and skill experiments commit evidence before ax starts recommending harness changes.

SurrealKit workflow takeaway: local development can keep importing the schema directly for now. Tests should prefer isolated databases or namespaces so query/integration runs do not mutate the user's main ax/main graph. A future schema sync and rollout workflow can be added once the evidence graph stabilizes.

Implementation-pattern reference: docs/effect-reference-t3code.md captures Effect practices from the local .references/t3code clone that are worth adapting as the prototype grows, especially typed config, process services, schema decoders, and layer-based tests.

Prototype Verification Notes

The prototype writes the new evidence graph beside the legacy taste graph. Existing taste/search commands continue to read legacy edges while the new insight commands read through src/queries/insights.ts.

Verification commands run:

  • bun run db:schema
  • bun src/cli/index.ts ingest --since=1
  • bun src/cli/index.ts ingest-insights
  • bun src/cli/index.ts insights schema --limit=5
  • bun src/cli/index.ts insights repositories --limit=5
  • bun src/cli/index.ts insights checkouts --limit=5
  • bun src/cli/index.ts insights git --limit=5
  • bun src/cli/index.ts insights friction --limit=5
  • bun src/cli/index.ts insights tools --limit=5
  • bun src/cli/index.ts insights sessions --limit=5
  • bun src/cli/index.ts dashboard --limit=5

2026-05-11 Dogfood Notes

Full backlog dogfood ran:

  • bun run db:schema
  • bun src/cli/index.ts ingest-insights --progress=plain
  • bun src/cli/index.ts ingest --since=1 --progress=plain
  • bun test
  • bun run typecheck
  • bun run build
  • bun run check:cli-reference
  • bun src/cli/index.ts project harness --json
  • bun src/cli/index.ts onboarding --json
  • bun src/cli/index.ts dashboard --limit=5
  • bun src/cli/index.ts insights <new-view> --limit=3 for feedback loops, user language, token/cache/workflow/Codex health, closure, post-feature fixes, skill candidates, and graph health.
  • bun src/cli/index.ts ingest-insights --progress=plain now imports both Claude usage-data facets and legacy dotfiles self-improve artifacts from ~/.dotfiles/claude/.claude/self-improve/runs/*.

Observed outputs:

  • Full ingest reached every current derived stage: outcomes/derive, session-health/derive, closure/derive, turn-analysis/derive, and harness/doctor.
  • Latest full ingest wrote 3,183 command outcomes, 421 user n-grams, 37 recent session-health rows, 1,321 commit classifications, 1,089 fix-chain edges, 5 skill candidates, plus turn-analysis signal rows.
  • project harness --json found 12 guidance sources and stack signals.
  • onboarding --json reports Claude and shared agent guidance as git-tracked; Codex global guidance is currently a warning because ~/.codex is not tracked.
  • Dashboard generation reported tools=69,655, plans=244, friction=4,434, and sessions=3,357.

Dogfood fixes made during the backlog run:

  • feedback-loops now filters successful/no-command rows so expected feedback and blockers are visible.
  • user-language ranks signal proximity before raw count and regenerates stale n-grams.
  • verification-gaps was rewritten from a slow per-session scan to an edit-first aggregate.
  • codex-health now ignores empty sessions and ranks by estimated context cost.
  • Closure derivation now runs full-graph during ingest even when transcript ingest is since-scoped, because fix-chain rows are materialized comparisons.
  • bun test
  • bun run typecheck
  • bun src/cli/index.ts project verify --json

Live dogfood counts after the smoke:

  • tool_call: 9,055
  • plan_snapshot: 103
  • insight: 131
  • friction_event: 626
  • diagnostic_event: 456

Schema coverage after the smoke should be read from axctl insights schema. The schema view counts current active and staged graph tables and omits tables removed by schema migrations.

Legacy self-improve importer behavior:

  • runs/*/events.jsonl becomes stable friction_event rows with source=legacy_self_improve.
  • clusters.json, proposed-claudemd.md, and _spend.log become compact artifact evidence plus insight summaries.
  • has_artifact and derived_from edges keep provenance queryable; the imported rows are evidence, not authoritative truth.

Install onboarding dogfood:

  • ./dist/axctl onboarding --json and bun src/cli/index.ts onboarding --json returned the same local harness tracking state.
  • Claude global guidance and shared agent skills were already git-tracked.
  • Codex global guidance was the only warning: /Users/necmttn/.codex.
  • The install onboarding formatter produced a host-agent checklist scoped to that warning, with guidance to use axctl onboarding --json, track only guidance/hooks/skills/commands/settings, exclude transcripts/caches/logs/secrets/generated artifacts, commit chore: track agent harness, and rerun onboarding.

wterm terminal dogfood:

  • ./dist/axctl dogfood terminal --scenario=axctl-setup --transport=pty --port=1744 --json served a browser-rendered wterm terminal backed by a Node node-pty sidecar.
  • agent-browser open http://127.0.0.1:1742/ loaded the wterm DOM frontend and drove the scenario through the browser.
  • The scratch setup scenario demonstrated axctl --help, initial axctl onboarding --json warnings for .claude, .codex, and .agents, host-agent-style git tracking of those harness dirs, and a second onboarding check returning all ok.
  • Latest passing run wrote a dogfood_run row with scenario=axctl-setup, status=passed, and transport=pty.
  • The transcript was stored as artifact:dogfood_wterm_setup__bea19103cb17318a__transcript.
  • Native node-pty inside Bun 1.3.10 was tested first but did not reliably stream PTY output, so the committed PTY path uses a Node sidecar and keeps --transport=process as a fallback. Free-running Claude-driver automation remains the next driver slice.
  • Interactive mode now runs with ./dist/axctl dogfood terminal --scenario=interactive --transport=pty --command='bash -l' --port=1747 --json. agent-browser drove the terminal by typing echo AGENT_BROWSER_STEERED_INTERACTIVE, then exit; the latest result was status=completed, transport=pty, and the transcript contained the typed marker.
  • Agent presets now run with --agent=shell|claude|codex|opencode. Live smoke --scenario=interactive --agent=shell --transport=pty --port=1748 --json produced a dogfood_run row with agent=shell, command=bash -l, command_source=preset, status=completed, and a transcript containing AGENT_PRESET_SHELL_STEERED.
  • Repeatable success criteria are now available via --success-marker=STR and --timeout=SECONDS. A marker-pass live smoke (--scenario=interactive --agent=shell --transport=pty --success-marker=E2E_MARKER_PASS --timeout=20 --port=1749 --json), with agent-browser typing echo E2E_MARKER_PASS; exit, returned status=passed, markerFound=true, timedOut=false, persisted=true. A timeout live smoke (--scenario=interactive --agent=shell --transport=pty --success-marker=NEVER_SEEN --timeout=2 --port=1750 --json) produced a dogfood_run row with status=timed_out, metrics.timed_out=true, metrics.timeout_seconds=2, metrics.success_marker=NEVER_SEEN, and metrics.marker_found=false.

Harness Doctor schema additions are populated by default ingest. If they are empty, run axctl ingest --since=1 and inspect the harness/doctor ingest stage.

Dashboard generated at:

file:///Users/necmttn/.local/share/ax/dashboard.html

Experiment Loop CLI (axctl improve)

axctl improve is the read-write surface on top of the experiment-loop tables (proposal, skill_proposal, experiment, checkpoint). The loop: retro → proposal → experiment → verdict (see axctl retro for the front end). Subcommands:

axctl improve recommend

Rank open proposals by confidence × recency × frequency and print them as paste-ready blocks, each wrapped in <!--ax:id--> provenance markers so the agent file edit is traceable back to the proposal.

Flags:

  • --limit=N (default 5) - top N to print
  • --form=<skill|guidance|...> (repeatable) - filter by proposal form
  • --since=N - only proposals derived within N days
  • --json - machine-readable
  • --no-clipboard - skip auto-copy of top result
  • --apply - interactive accept loop: pick a numbered row, accept, repeat
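The confidence × recency × frequency ranking used by recommend can be sketched as a product score. The source does not say how recency is weighted, so the 1 / (1 + ageDays) decay below is purely an assumption, as are the OpenProposal shape and the rankProposals helper.

```typescript
// Hypothetical proposal shape for illustration.
interface OpenProposal {
  id: string;
  confidence: number; // 0..1
  ageDays: number; // days since the proposal was derived
  frequency: number; // how often the pattern was observed
}

// Assumed recency weight: newer proposals score higher, decaying with age.
function recencyWeight(ageDays: number): number {
  return 1 / (1 + Math.max(0, ageDays));
}

function score(p: OpenProposal): number {
  return p.confidence * recencyWeight(p.ageDays) * p.frequency;
}

// Sort a copy by descending score and keep the top `limit` (default 5, as for --limit).
function rankProposals(proposals: OpenProposal[], limit = 5): OpenProposal[] {
  return proposals
    .slice()
    .sort((a, b) => score(b) - score(a))
    .slice(0, limit);
}
```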

axctl improve accept <id>

Default mode emits .ax/tasks/<id>.md, a structured brief your primary agent (Claude Code, Codex, etc.) consumes to edit the target file with the marker still in place. The brief tracks task_emitted status on the experiment row.

Flags:

  • --auto-scaffold - skip the brief, write SKILL.md directly under the scaffold dir (skill form only). Use when you want the file now and don't need a brief to hand off.
  • --with-agent - after scaffold, spawn a claude -p subagent (bypass permissions, streaming to terminal) that reads the stub + sibling skills and rewrites SKILL.md with concrete triggers, steps, anti-patterns. Optionally writes a sibling PLAN.md with a 3-bullet experiment plan (what to measure, success criterion, kill criterion). Implies --auto-scaffold semantics.
  • --force - overwrite an existing scaffold.

<id> accepts either the dedupe sig (12-char prefix from recommend) or the full proposal:<key> record id.

axctl improve lint

Scan grounded agent files for <!--ax:id--> markers and reconcile against the DB:

  • Markers in files but no matching proposal → orphan warning.
  • task_emitted experiments whose .ax/tasks/<id>.md brief has been consumed (marker now lives in agent file) → consumed task file removed, experiment status advanced.
  • Task briefs older than --stale-days (default 7) with no marker landed → stale warning, candidate for reject.

Flags:

  • --root=<dir> (repeatable) - additional scan roots beyond CWD
  • --stale-days=N (default 7)
  • --json

The linter matches markers against proposal.dedupe_sig exactly and pushes the stale-task date filter into SurrealQL, so it stays fast as the proposal table grows.

axctl improve show <id>

Full evidence trail for one proposal: source retro(s), baseline cluster, skill payload (trigger pattern, proposed behavior, expected impact), the linked experiment row, scaffold path, checkpoint snapshots, locked verdict.

axctl improve list

Browse the proposal queue.

Flags:

  • --status=<open|accepted|rejected|all>
  • --form=<skill|guidance|...>
  • --limit=N (default 30)
  • --json

axctl improve verdict <id>

Inspect or lock the +30-session verdict.

Flags:

  • --set=<adopted|ignored|regressed|partial|no_longer_needed> - lock the verdict (otherwise computed from checkpoints)

axctl improve reject <id>

Mark a proposal as rejected. Future re-derivations of the same trigger are deduped against rejected proposals, so the same pattern is not re-proposed on every retro.

Flags:

  • --reason=<short_string> (default not_worth_packaging) - tracked on the row for later analysis of what kinds of proposals get rejected.

axctl improve checkpoint

Compute checkpoint snapshots at +3/+10/+30 sessions for active experiments (session-count windows, not calendar days - see issue #83). Cron-runnable; the weekly self-improve cron calls this. Legacy day-based rows (t+7/t+30/t+90) from before #83 stay in the DB as historical data and are not re-derived.
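The session-count windows can be pictured as a lookup against fixed offsets. A minimal sketch, assuming a hypothetical dueCheckpoints helper that receives the number of sessions since acceptance and the offsets already recorded:

```typescript
// Session-count offsets named above (+3/+10/+30).
const CHECKPOINT_OFFSETS: number[] = [3, 10, 30];

// Return the offsets that have been reached but not yet snapshotted.
function dueCheckpoints(sessionsSinceAccept: number, alreadyRecorded: number[]): number[] {
  return CHECKPOINT_OFFSETS.filter(
    (offset) => sessionsSinceAccept >= offset && alreadyRecorded.indexOf(offset) === -1,
  );
}
```

Because the windows count sessions rather than days, an experiment on a rarely used repository simply waits longer for its +30 snapshot instead of being judged on too little evidence.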

axctl improve reset --yes

Wipe all experiment-loop state (proposals, experiments, checkpoints, skill proposals). For test fixtures and local-only debugging. Requires --yes.

Provenance markers

Every accepted proposal's edit is wrapped:

<!--ax:a1b2c3d4e5f6-->
... agent-file content ...
<!--/ax:a1b2c3d4e5f6-->

The id is the proposal dedupe_sig prefix. axctl improve lint reconciles both directions: orphan markers (DB has no proposal) and orphan proposals (task_emitted but the brief was never consumed). Nested same-id close tags are balanced; markers across multiple files for the same proposal are allowed.
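A marker scan of this kind can be sketched with one regular expression and a per-id depth counter. The id character class and the scanMarkers helper are assumptions for illustration; the real linter may accept a narrower id format.

```typescript
interface MarkerReport {
  ids: string[]; // every id seen, in first-seen order
  unbalanced: string[]; // ids whose open and close tags do not pair up
}

// Scan text for <!--ax:id--> and <!--/ax:id--> markers and check balance per id.
function scanMarkers(text: string): MarkerReport {
  const pattern = /<!--(\/?)ax:([A-Za-z0-9_-]+)-->/g; // assumed id charset
  const depth = new Map<string, number>();
  const bad = new Set<string>();
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    const closing = match[1] === "/";
    const id = match[2];
    const current = depth.get(id) ?? 0;
    if (!closing) {
      depth.set(id, current + 1);
    } else if (current === 0) {
      bad.add(id); // close tag without a preceding open tag
      depth.set(id, 0);
    } else {
      depth.set(id, current - 1);
    }
  }
  depth.forEach((value, id) => {
    if (value !== 0) bad.add(id); // open tag that was never closed
  });
  return { ids: Array.from(depth.keys()), unbalanced: Array.from(bad) };
}
```

The ids returned here are what a linter would then compare against proposal records to find orphans in either direction.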

.ax/tasks/<id>.md task briefs

When axctl improve accept <id> runs without --auto-scaffold/--with-agent, it writes .ax/tasks/<id>.md with:

  1. Target file path (e.g. ~/.claude/CLAUDE.md or a skill SKILL.md path).
  2. The exact paste-ready block (markers + content).
  3. A Lint after applying: footer pointing at axctl improve lint.

The brief is plain markdown. Hand it to any agent; the agent's diff is what lands in your config. lint reconciles the brief's existence against the marker actually showing up in the target file.

Session Sharing CLI (axctl share)

Share a Session

axctl share <session-id> exports a sanitized session artifact, creates a secret GitHub Gist containing ax-session.json, and prints an https://ax.necmttn.com/s/<owner>/<gist-id> renderer URL, which opens the Studio-backed session inspector.

Use --dry-run to inspect the artifact before publishing:

axctl share <session-id> --dry-run > session-share.json

Secret Gists are unlisted links, not private storage. Do not share sessions that contain secrets or proprietary data without reviewing the dry-run artifact first.

Flags:

  • --dry-run - print the sanitized artifact JSON without publishing.
  • --public - create a public Gist instead of the default secret Gist.
  • --yes - skip the publish confirmation prompt.
  • --open - open the printed renderer URL in the default browser after publish on macOS.

Retro CLI (axctl retro)

The retro surface tracks one structured reflection per session (tried, worked, failed, next). A session has been retro'd iff the graph has a reviewed edge from it to a retro row. See ADR-0010 for the design rationale.

axctl retro emit

Write a retro for one session and create the reviewed edge.

Two paths:

  • No --from-file: run the deterministic heuristic on the named session (defaults to $AX_SESSION_ID, then the most recent session). Cheap, no LLM. Suitable for Stop-hook autoemit.
  • --from-file=<path>: ingest {tried, worked, failed, next} JSON written by an agent (the retro-reviewer subagent does this). --source defaults to claude_stop_hook here; pass --source=manual for subagent-authored payloads.

Flags:

  • --session=<id> - target session record id or bare key
  • --from-file=<path> - JSON payload to ingest
  • --source=<claude_stop_hook|codex_rollout|heuristic|manual>
  • --json - machine-readable

axctl retro pending

List sessions in the window that have no reviewed edge. Drives the /retro skill's Step 0 "drain the backlog" flow.

Two-pass query: ended sessions (ended_at != NONE) come first; idle sessions (no ended_at AND started_at older than --idle-min) come second. Subagent sessions (source = 'claude-subagent') are excluded by default - their retros belong to the parent session's review.

Flags:

  • --since=N (default 7) - window in days
  • --idle-min=N (default 30) - idle threshold in minutes for sessions without ended_at
  • --limit=N (default 20) - per-pass cap
  • --include-subagents - include claude-subagent rows
  • --json
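The two-pass selection behind axctl retro pending can be sketched over in-memory rows. The SessionRow and PendingOptions shapes are hypothetical, and applying the --since window to the session start time is an assumption; the real query runs in SurrealQL.

```typescript
// Hypothetical session shape; `reviewed` stands in for the reviewed edge.
interface SessionRow {
  id: string;
  source: string;
  startedAtMs: number;
  endedAtMs?: number;
  reviewed: boolean;
}

interface PendingOptions {
  nowMs: number;
  sinceDays: number; // --since
  idleMin: number; // --idle-min
  limit: number; // --limit, applied per pass
  includeSubagents: boolean; // --include-subagents
}

function pendingRetros(
  sessions: SessionRow[],
  opts: PendingOptions,
): { ended: SessionRow[]; idle: SessionRow[] } {
  const windowStart = opts.nowMs - opts.sinceDays * 24 * 60 * 60 * 1000;
  const idleCutoff = opts.nowMs - opts.idleMin * 60 * 1000;
  const candidates = sessions.filter(
    (s) =>
      !s.reviewed &&
      s.startedAtMs >= windowStart &&
      (opts.includeSubagents || s.source !== "claude-subagent"),
  );
  // Pass 1: sessions that have ended.
  const ended = candidates.filter((s) => s.endedAtMs !== undefined).slice(0, opts.limit);
  // Pass 2: sessions with no end time that started before the idle cutoff.
  const idle = candidates
    .filter((s) => s.endedAtMs === undefined && s.startedAtMs < idleCutoff)
    .slice(0, opts.limit);
  return { ended, idle };
}
```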

axctl retro brief

Write a .ax/tasks/retro/<session-key>.md task brief for one session. The brief is what the retro-reviewer subagent consumes. Frontmatter includes the transcript pointer, model used, turn count, pending reason, and a suggested_model heuristic (haiku for ≤5 turns, opus for ≥40 turns, sonnet otherwise).

Flags:

  • --session=<id> (required) - target session record id or bare key
  • --out-dir=<path> - override .ax/tasks/retro/ location
  • --json
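The suggested_model heuristic written into the brief frontmatter reduces to two thresholds on turn count. A minimal sketch, with suggestedModel as a hypothetical function name:

```typescript
type SuggestedModel = "haiku" | "sonnet" | "opus";

// haiku for short sessions, opus for long ones, sonnet in between.
function suggestedModel(turns: number): SuggestedModel {
  if (turns <= 5) return "haiku";
  if (turns >= 40) return "opus";
  return "sonnet";
}
```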

axctl retro list

Browse recent retros (reverse-chronological).

Flags:

  • --since=N (default 7) - window in days
  • --limit=N (default 20)
  • --json

axctl retro reflect

Walk clustered retro-derived proposals interactively (accept / reject / skip each pattern). Used by the /retro skill's triage step; see that skill for the full workflow.

axctl retro meta

Emit a read-only investigation snapshot (JSON) for an external AI agent to drive a deep retro-of-retros. Used by /retro-meta.

axctl retro plan

Register an externally-drafted plan as a proposal (plus experiment unless --leave-open). Called by an external agent after the user agrees in a /retro-meta session.

.ax/tasks/retro/<session-key>.md briefs

A retro brief is a markdown file with YAML frontmatter (session_id, session_key, transcript, model_used, turns, pending_reason, suggested_model, status: pending) and a body describing what the reviewer should produce. The retro-reviewer subagent reads it, calls ax retro emit --source=manual, optionally calls ax improve recommend for repeated patterns, and updates the brief's frontmatter status: completed. The reviewed edge created by ax retro emit removes the session from the next ax retro pending result.

These briefs live next to the older .ax/tasks/<id>.md improve briefs but in their own subdir to keep listings clean.

Workflow extraction queries

These commands shipped on the feat/workflow-extraction-port-2026-05-29 branch. They cover scoped ingest, session navigation, cross-session recall, skill classification, role tagging, and role-aware skill views.

ax ingest here [--since=Nd] [--stages=...]

Scope a full ingest run to the git repository at $PWD. The claude stage is restricted to the matching ~/.claude/projects/<slug>/ transcript directory; git history is restricted to this repo path. Codex, Pi, OpenCode, and Cursor stages are skipped by default because they have no per-repo cwd filter yet.

Flags:

  • --since=Nd (e.g. --since=7) - only ingest transcripts newer than N days
  • --stages=<a,b,c> - run exactly these stages instead of the auto-filtered set
  • --progress=<plain|json|tui> - progress reporting mode

axctl ingest here --since=3
axctl ingest here --stages=claude,git,signals

Errors with a clear message when $PWD is not inside a git repository.

ax sessions here [--days=N] [--json]

List sessions whose repository matches the git repo at $PWD, reverse chronological, within the last N days.

Flags:

  • --days=N (default 14) - lookback window
  • --json - machine-readable array

axctl sessions here
axctl sessions here --days=7 --json

ax sessions around <date> [--days=N --project=PATH] [--json]

List sessions that started within ±N days of <date>. Accepts YYYY-MM-DD or full ISO 8601.

Flags:

  • --days=N (default 3) - half-window around the anchor date
  • --project=PATH - filter by project slug or absolute repo path
  • --json

axctl sessions around 2026-05-23
axctl sessions around 2026-05-23 --days=7 --project=/Users/me/Projects/acme
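The ±N-day window for sessions around can be sketched as simple date arithmetic. The windowAround helper is hypothetical, and treating a date-only anchor as UTC midnight is an assumption that follows JavaScript's Date.parse behaviour; the CLI may resolve dates in local time instead.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

interface DateWindow {
  startIso: string;
  endIso: string;
}

// Compute [anchor - days, anchor + days] for YYYY-MM-DD or full ISO 8601 input.
function windowAround(anchor: string, days = 3): DateWindow {
  const anchorMs = Date.parse(anchor);
  if (isNaN(anchorMs)) {
    throw new Error(`Unparseable date: ${anchor}`);
  }
  return {
    startIso: new Date(anchorMs - days * DAY_MS).toISOString(),
    endIso: new Date(anchorMs + days * DAY_MS).toISOString(),
  };
}
```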

ax sessions near <sha> [--json]

List sessions whose time range overlaps the commit window around <sha> in the git repository at $PWD. The window is derived from the commit's author date and the surrounding parent/child timestamps. Root commits fall back to ±3 days.

Flags:

  • --json

axctl sessions near d923fcc
axctl sessions near HEAD --json

ax sessions show <id> [--expand=<uuid> | --all] [--by-role] [--json]

Display the invoked-skill and tool-call timeline for one session. Subagent sessions are collapsed to one-line summaries by default.

Flags:

  • --expand=<uuid> (repeatable) - inline the named subagent's contents
  • --all - inline all subagent contents
  • --by-role - group the Top Skills section by role instead of a flat list; skills without a role appear under "(unclassified)"
  • --json - machine-readable; also the default when stdout is not a TTY

<id> accepts a bare UUID, a claude-subagent-<id> string, or a full session:⟨...⟩ record id.

axctl sessions show a1b2c3d4-e5f6-...
axctl sessions show a1b2c3d4 --expand=f9e8d7c6 --by-role
axctl sessions show a1b2c3d4 --all --json

ax recall <q> [--sources=turn,commit,skill] [--scope=here|all] [--project=? --skill=? --since=ISO] [--json]

Full-text BM25 recall across sessions, commits, and skill invocations. Returns ranked hits with timestamps, project slugs, and excerpt snippets.

Flags:

  • --sources=<turn,commit,skill> (default all) - comma-separated source types
  • --scope=here|all - here restricts results to the git repo at $PWD; omitting the flag auto-detects (tries here, then silently falls back to all)
  • --project=<slug|?> - filter by project slug; ? opens an interactive picker
  • --skill=<name|?> - filter sessions that invoked the named skill; ? opens an interactive picker
  • --since=<ISO> - only results newer than this ISO timestamp
  • --json

axctl recall "auth middleware"
axctl recall "schema migration" --scope=here --sources=turn,commit
axctl recall "retry loop" --project=acme-app --since=2026-05-01 --json

Pass --project=? or --skill=? on a TTY to get a numbered interactive picker; these flags require a value when stdin is not a TTY.

ax skills classify [<skill>...] [--out-dir=<path> --dry-run --json]

Emit one classify brief per unclassified skill into .ax/tasks/classify-<slug>.md. In default mode (no names), targets all skills with ≥ 3 invocations that have no plays_role edge with source in ("frontmatter", "brief", "user"). In explicit mode (one or more names provided), targets exactly those skills with no invocation threshold and no unclassified guard.

Flags:

  • <skill>... - optional list of skill names to target explicitly
  • --out-dir=<path> (default .ax/tasks/) - directory to write briefs into
  • --dry-run - print briefs to stdout without writing files
  • --json - print a JSON array of {skill, invocations, sessions, path} records

axctl skills classify
axctl skills classify retro simplify --dry-run
axctl skills classify --out-dir=.ax/tasks --json

Skips files that already exist (idempotent). The generated briefs are consumed by axctl skills lint once an agent fills in the primary_role frontmatter field.
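
The difference between default mode and explicit mode comes down to which filter is applied when choosing targets. A minimal sketch, assuming a hypothetical `SkillStats` shape that carries each skill's invocation count and the sources of its existing plays_role edges (the real selection runs against the DB):

```typescript
// Hypothetical in-memory shape, for illustration only.
interface SkillStats {
  name: string;
  invocations: number;
  roleSources: string[]; // sources of existing plays_role edges
}

// Edge sources that count as "already classified" in default mode.
const CLASSIFIED_SOURCES = new Set(["frontmatter", "brief", "user"]);

function selectClassifyTargets(
  skills: SkillStats[],
  explicit: string[] = [],
): SkillStats[] {
  if (explicit.length > 0) {
    // Explicit mode: exactly the named skills, no threshold, no guard.
    const wanted = new Set(explicit);
    return skills.filter((s) => wanted.has(s.name));
  }
  // Default mode: at least 3 invocations and no qualifying role edge.
  return skills.filter(
    (s) =>
      s.invocations >= 3 &&
      !s.roleSources.some((src) => CLASSIFIED_SOURCES.has(src)),
  );
}
```

Note that in this sketch a skill with only an inferred edge still counts as unclassified, since inferred is not in the listed sources.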

ax skills tag <skill> <role> [--confidence=N --rationale="..." --remove]

Write (or remove) a plays_role edge with source="user" between a skill and a role. Idempotent: any prior user-source edge for the same pair is deleted before the new one is created. Run multiple times with different roles to attach multiple roles to the same skill.

Flags:

  • --confidence=N (float 0–1, default 1.0) - confidence score on the edge
  • --rationale="..." - free-form rationale stored on the edge
  • --remove - delete the user-source edge instead of creating it

Role and skill names are validated at the boundary (alphanumeric, _ or -, optionally plugin-namespaced for skills; lowercase alphanumeric and _ or - for roles).

axctl skills tag retro reflection
axctl skills tag simplify cleanup --confidence=0.8 --rationale="consistent usage pattern"
axctl skills tag simplify cleanup --remove
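
The boundary validation described above could look roughly like the following. The exact patterns are assumptions inferred from the prose and from names such as superpowers:systematic-debugging; `SKILL_NAME`, `ROLE_NAME`, and `isValidConfidence` are hypothetical names, not ax exports.

```typescript
// Assumed: an optional single "plugin:" namespace, then alphanumerics, _ or -.
const SKILL_NAME = /^(?:[A-Za-z0-9_-]+:)?[A-Za-z0-9_-]+$/;

// Assumed: lowercase alphanumerics, _ or - only.
const ROLE_NAME = /^[a-z0-9_-]+$/;

// --confidence is documented as a float in the range 0 to 1.
function isValidConfidence(value: number): boolean {
  return Number.isFinite(value) && value >= 0 && value <= 1;
}
```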

ax skills lint [--task-dir=<path> --dry-run --json]

Scan .ax/tasks/classify-*.md for filled briefs (YAML frontmatter with a non-empty primary_role field), write plays_role edges with source="brief", and remove each brief file after a successful write. Pending briefs (no primary_role) are silently skipped. This is the counterpart to skills classify - classify emits the brief, lint consumes it.

Flags:

  • --task-dir=<path> (default .ax/tasks/) - directory to scan for briefs
  • --dry-run - report what would be applied without writing edges or removing files
  • --json - machine-readable LintReport with briefs, applied, pending, errors, and dryRun fields

axctl skills lint
axctl skills lint --dry-run
axctl skills lint --task-dir=.ax/tasks --json

Sweeps all prior source="brief" edges for a skill before writing the current set, so role shrinkage is handled atomically.
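
The filled-versus-pending distinction depends on whether the brief's frontmatter carries a non-empty primary_role. The sketch below shows one way to make that check without a YAML library. It is deliberately simplistic (it only understands a single-line `primary_role: value` entry), and `readPrimaryRole` is a hypothetical helper rather than the parser ax uses.

```typescript
// Returns the primary_role value, or undefined when the brief is pending.
function readPrimaryRole(markdown: string): string | undefined {
  // Frontmatter is the block between the leading pair of --- lines.
  const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(markdown);
  if (!match) return undefined;

  for (const line of (match[1] ?? "").split(/\r?\n/)) {
    const kv = /^primary_role:\s*(.*)$/.exec(line);
    if (kv) {
      // Trim whitespace and one layer of surrounding quotes.
      const value = (kv[1] ?? "").trim().replace(/^["']|["']$/g, "");
      return value.length > 0 ? value : undefined;
    }
  }
  return undefined;
}
```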

ax skills weighted [--window=Nd --limit=N --doctor-threshold=N --json]

Rank skills by a composite weighted score over a rolling time window. The score blends invocations (positive), errors near invocation (negative), user corrections within 3 turns (negative), commits produced by sessions that invoked the skill (positive), and proposed-but-not-invoked counts (negative).

Flags:

  • --window=Nd (e.g. --window=30) - rolling window in days; omit to use the default window defined by fetchSkillsWeighted
  • --limit=N (default 25) - rows to show
  • --doctor-threshold=N (default 5) - correction count above which a skill is flagged for review
  • --json

axctl skills weighted
axctl skills weighted --window=7 --limit=10
axctl skills weighted --doctor-threshold=3 --json
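
To make the sign of each signal concrete, here is a sketch of a composite score. The weights are invented for illustration and the `SkillSignals` shape is hypothetical; the actual blend lives in fetchSkillsWeighted and may differ.

```typescript
// Hypothetical per-skill counts over the rolling window.
interface SkillSignals {
  invocations: number;
  errorsNearInvocation: number;
  correctionsWithin3Turns: number;
  commitsProduced: number;
  proposedNotInvoked: number;
}

// Illustrative weights only: positive signals add, negative signals subtract.
const WEIGHTS = {
  invocations: 1,
  errors: -2,
  corrections: -3,
  commits: 2,
  proposed: -1,
};

function weightedScore(s: SkillSignals): number {
  return (
    WEIGHTS.invocations * s.invocations +
    WEIGHTS.errors * s.errorsNearInvocation +
    WEIGHTS.corrections * s.correctionsWithin3Turns +
    WEIGHTS.commits * s.commitsProduced +
    WEIGHTS.proposed * s.proposedNotInvoked
  );
}

// A skill is flagged when its correction count is above the threshold.
function needsReview(s: SkillSignals, doctorThreshold = 5): boolean {
  return s.correctionsWithin3Turns > doctorThreshold;
}
```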

ax skills by-role <role> [--json --limit=N]

List all skills classified as <role>, ranked by invocation count.

Flags:

  • --limit=N (default 50)
  • --json

axctl skills by-role reflection
axctl skills by-role cleanup --limit=20 --json

ax skills roles <skill> [--json]

List all roles assigned to <skill> (from any source: frontmatter, brief, user, or inferred), with confidence scores and sources.

Flags:

  • --json

Exits with a non-zero status and an error message when the skill name is not found in the DB.

axctl skills roles retro
axctl skills roles "superpowers:systematic-debugging" --json

ax roles [--json]

List every role in the DB with the count of skills assigned to it. Useful for exploring the taxonomy before tagging or classifying.

Flags:

  • --json

axctl roles
axctl roles --json

Hooks SDK CLI (ax hooks)

@ax/hooks-sdk hooks are single TypeScript files in ~/.ax/hooks/ that default-export defineHook({ name, events, matcher, run }) and return a verdict: allow, block(reason), warn, or inject. Defects fail open - a buggy hook never wedges the agent. Fire path is bun <file>.ts (~70ms).
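
The fail-open behavior can be illustrated with a small wrapper. The types below are local stand-ins written for this sketch, not the @ax/hooks-sdk API: the `Verdict` payload fields, `ToolEvent`, and `runFailOpen` are all hypothetical, and the sketch is synchronous for brevity even though a real hook's run may be async.

```typescript
// Local stand-ins for the four verdict kinds named above.
type Verdict =
  | { kind: "allow" }
  | { kind: "block"; reason: string }
  | { kind: "warn"; payload: string }
  | { kind: "inject"; payload: string };

// Hypothetical event shape.
interface ToolEvent {
  command: string;
}

type HookRun = (event: ToolEvent) => Verdict;

// Fail open: any exception thrown by the hook is treated as "allow",
// so a defective hook cannot wedge the agent.
function runFailOpen(run: HookRun, event: ToolEvent): Verdict {
  try {
    return run(event);
  } catch {
    return { kind: "allow" };
  }
}
```

The design choice is that a guard hook is advisory infrastructure: a crash in the guard is cheaper to tolerate than a blocked agent.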

ax hooks init [--dir=~/.ax/hooks] [--no-install]

Scaffold the ~/.ax/hooks workspace (package.json + starter guard hooks).

ax hooks install <file> [--providers=claude,codex] [--scope=global]

Idempotent fan-out of one SDK hook file into the Claude Code and Codex provider configs, with ax ownership markers.

ax hooks backtest <file> [--days=N] [--provider=claude] [--json]

Replay historical tool_call rows through the hook in-process before installing it. The report includes the would-block count and rate, a per-project breakdown, and sample blocked commands. The default window is 30 days. State-dependent checks run against the current repo state, and the output prints a caveat to that effect.
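
The aggregation behind a backtest report can be sketched as a single pass over the replayed rows. `ToolCallRow`, `BacktestReport`, and `summarizeBacktest` are hypothetical names for this illustration; the real command reads tool_call rows from the DB and runs the actual hook.

```typescript
// Hypothetical row and report shapes.
interface ToolCallRow {
  project: string;
  command: string;
}

interface BacktestReport {
  total: number;
  wouldBlock: number;
  rate: number; // wouldBlock / total, or 0 when there are no rows
  byProject: Record<string, number>;
  samples: string[];
}

function summarizeBacktest(
  rows: ToolCallRow[],
  blocks: (row: ToolCallRow) => boolean,
  sampleLimit = 3,
): BacktestReport {
  const byProject: Record<string, number> = {};
  const samples: string[] = [];
  let blocked = 0;

  for (const row of rows) {
    if (!blocks(row)) continue;
    blocked += 1;
    byProject[row.project] = (byProject[row.project] ?? 0) + 1;
    // Keep only the first few blocked commands as samples.
    if (samples.length < sampleLimit) samples.push(row.command);
  }

  const rate = rows.length === 0 ? 0 : blocked / rows.length;
  return { total: rows.length, wouldBlock: blocked, rate, byProject, samples };
}
```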

ax hooks cases <case> [--since=N] [--tail=N] [--window=N] [--no-persist] [--json]

Deterministic feedback-case verdict runner - separate from backtest. Scores a known candidate (currently enforce-worktree) against labeled cases and reports a structured pass/fail verdict.

axctl hooks init
axctl hooks install ~/.ax/hooks/enforce-worktree.ts --providers=claude,codex
axctl hooks backtest ~/.ax/hooks/enforce-worktree.ts --days=14
axctl hooks cases enforce-worktree --since=14

Empty DB Benchmarks

Use scripts/bench-empty-db.sh for cold ingest timing without mutating ax/main:

scripts/bench-empty-db.sh --since=90

The script selects a unique AX_DB_DB=bench_<timestamp>, applies the schema, runs ingest, imports Claude insights, writes schema.json, checkouts.json, and git.json, and generates a static dashboard under ~/.local/share/ax/benchmarks/<db>/.

Repo initialization is not per-project. Ingest discovers repositories from existing transcript cwd values and optionally from ~/.local/share/ax/ax-repos.txt. The Git pass backfills session.repository and session.checkout; produced edges are then tied to the checkout plus commit timestamp, while touched edges connect commits to canonical repository-relative files.

The final ingest smoke test also found and fixed a plan-item identity bug: plan item records now use plan+sequence identity, and the writer deletes legacy content-hashed item rows that conflict on the plan_item_plan_seq unique index before upserting the canonical row.