Audit-driven trust-fix release.
Fixed
- Trace grader was vacuous: the runner only emitted token spans, so all 4 rubrics scored 100 on every run. It now translates Claude Code tool events into FILE_EDIT/read/SHELL_COMMAND spans (correlating Bash exit codes), with path relativization through symlinked workspaces. Validated against a real fast-check run.
-j Nwas a silent no-op without--parallel; now enables parallel mode, and crashed parallel tasks are recorded as FAILs instead of vanishing.no_out_of_scope_editshonorstests/directory entries infiles_to_examine.
Added
- Baselines carry per-run
trace_grade+ submissionreadiness/trace_summary(null for span-less traces, no fake 100s). Publishedclaude-code-custom-1.4.0-fast-check.jsonships real discriminating grades. docs/SECURITY.md: shell-exec trust boundary + scoped per-task Docker isolation.
Changed
- Aider is a real adapter; runtime deps exact-pinned; invariant guard tests for storefront drift.
274 tests, ruff clean.