v1.4.0 — real trace grading

Latest

xmpuspus released this 30 May 12:50

49fe01c

Audit-driven trust-fix release.

Fixed

Trace grader was vacuous: the runner emitted only token spans, so all 4 rubrics scored 100 on every run. The runner now translates Claude Code tool events into FILE_EDIT/read/SHELL_COMMAND spans (correlating Bash exit codes) and relativizes paths through symlinked workspaces. Validated against a real fast-check run.
-j N was a silent no-op without --parallel; it now enables parallel mode on its own, and crashed parallel tasks are recorded as FAILs instead of vanishing.
no_out_of_scope_edits honors tests/ directory entries in files_to_examine.
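
The path relativization and directory-entry handling described above can be illustrated with a minimal sketch. This is not the project's code: `relativize` and `in_scope` are hypothetical names, and the sketch assumes directory entries in files_to_examine are written with a trailing slash (as in `tests/`).

```python
from pathlib import Path, PurePosixPath
from typing import Optional, Sequence


def relativize(path: str, workspace: str) -> Optional[str]:
    """Return `path` relative to `workspace`, resolving symlinks on both sides.

    Returns None when the resolved path lies outside the workspace.
    (Hypothetical helper, not the runner's actual function.)
    """
    root = Path(workspace).resolve()
    target = Path(path)
    if not target.is_absolute():
        target = root / target
    try:
        return target.resolve().relative_to(root).as_posix()
    except ValueError:
        return None


def in_scope(rel_path: str, files_to_examine: Sequence[str]) -> bool:
    """True if `rel_path` equals a file entry or sits under a directory entry.

    Assumption: a trailing slash marks a directory entry, e.g. "tests/".
    """
    candidate = PurePosixPath(rel_path)
    for entry in files_to_examine:
        allowed = PurePosixPath(entry)
        if entry.endswith("/"):
            # Directory entry: any file below it is in scope.
            if allowed in candidate.parents:
                return True
        elif candidate == allowed:
            return True
    return False
```

Resolving both the workspace root and the edited path before comparing them means an edit reported through a symlinked checkout still maps to the same relative path, which is what a scope check needs to compare against files_to_examine.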

Added

Baselines carry a per-run trace_grade plus submission readiness/trace_summary. These are null for span-less traces, so there are no fake 100s. The published claude-code-custom-1.4.0-fast-check.json ships real, discriminating grades.
docs/SECURITY.md: shell-exec trust boundary + scoped per-task Docker isolation.
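
The "null for span-less traces" behavior can be sketched as follows. This is an illustration under stated assumptions, not the project's grader: the `TOKEN` span kind, the `exit_code` field, the example rubric, and the averaging of rubric scores are all hypothetical.

```python
import json
from typing import Callable, Dict, List, Optional

Span = Dict[str, object]
Rubric = Callable[[List[Span]], float]


def shell_success_rate(spans: List[Span]) -> float:
    """Hypothetical rubric: percentage of shell commands that exited with 0."""
    commands = [s for s in spans if s.get("kind") == "SHELL_COMMAND"]
    if not commands:
        return 100.0
    passed = sum(1 for s in commands if s.get("exit_code") == 0)
    return 100.0 * passed / len(commands)


def trace_grade(spans: List[Span], rubrics: List[Rubric]) -> Optional[float]:
    """Average the rubric scores, or return None when nothing is gradeable.

    Assumption: spans of kind "TOKEN" carry no tool activity, so a trace
    made only of them yields no grade instead of a perfect score.
    """
    tool_spans = [s for s in spans if s.get("kind") != "TOKEN"]
    if not tool_spans or not rubrics:
        return None
    scores = [rubric(tool_spans) for rubric in rubrics]
    return sum(scores) / len(scores)


# A span-less trace serializes to JSON null rather than 100.
baseline_entry = {"trace_grade": trace_grade([{"kind": "TOKEN"}], [shell_success_rate])}
serialized = json.dumps(baseline_entry)
```

Returning None keeps "not graded" distinguishable from "graded perfectly" in the baseline file, which is the distinction the vacuous grader erased.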

Changed

Aider is a real adapter.
Runtime deps exact-pinned.
Invariant guard tests for storefront drift.

274 tests, ruff clean.
