Commit b1355af

Retro epic + cloud filing: self-observing transcript retrospective, delta-recall, and cloud-capable filing (#344, #568) (#601)
* feat(RV9JT4): safeword retro — transcript-mining session retrospective A manual `safeword retro --transcript <path> [--findings <path>]` command that mines a session transcript for QUALITATIVE safeword friction (bugs, rough edges, gaps the deterministic self-report spool can't catch) and files issues upstream AUTONOMOUSLY — no human approval. Autonomy is made safe by an automated egress guard, not a human: a constrained finding schema (no field for customer data), a deny-by-default sanitizer (secrets + customer paths redacted; safeword paths kept), fail-closed surface resolution, and a code-owned write over already-sanitized fields. The agent only supplies raw structured findings via a fresh-context extractor (retro guide); code does the rest. Modules (src/retro): finding (schema/normalizer + body), draft (retro: signature + shape), egress (sanitizer + resolveSurface), pipeline (composes the guard), ledger (occurrence ledger), triage (dedup + caps + encounters), hash, github-rest (IssueTracker REST adapter). Command in src/commands/retro.ts; agent guide in templates/guides/retro.md. 21 BDD scenarios proven by vitest (TB1/NTB1/SM1). Two independent quality reviews caught + fixed 4 egress leaks/DoS and 2 dedup defects, all regression-tested. Follow-ups tracked as sub-tickets: 1FGE1C (robust signature-marker dedup), 7ZCKS6 (extraction-quality eval). Verified: full suite 3879/3879, Gherkin 159 scenarios, lint/tsc clean, audit passed. Test fixtures use short non-canonical fake tokens (not real-format keys). * docs(FTCQGD): intake for retro auto-trigger (Claude-first) + Codex/Cursor follow-on stubs Lock the retro auto-trigger slice to Claude-first: a Stop-anchored, idempotent nudge that fires retro while the session is alive (SessionEnd is unreliable in cloud and unnecessary — friction is already in the transcript). 
Resolve the three intake questions: trigger=nudge (the one mechanism Claude/Codex/Cursor all expose), substance-gate=turn/tool-count (not spool-friction, which would defeat retro's purpose), one mechanism for cloud+local. The trigger is portable across agents; the per-agent transcript substrate is not, so stub Codex (53DQJZ) and Cursor (KHYXY4) as blocked follow-ons. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * docs(FTCQGD): scenario-gate passed — 9 scenarios, impl-plan written Define-behavior + scenario-gate for the retro auto-trigger (Claude-first). An independent fork review caught three vacuous silent-Then scenarios (trivial-path, session-id precedence, fail-open); rewrote them as contrast/precedence/fail-open Scenario Outlines and added the missing different-session-id keying scenario. Re-review PASS. spec.md (personas/JTBD/ACs), dimensions.md, the @manual feature source, the R/G/R ledger, and impl-plan.md (proof plan + build order) all landed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * feat(FTCQGD): auto-fire retro from the Claude Stop hook (cloud-safe, idempotent) Implements the retro auto-trigger (Claude-first): a stop-retro.ts Stop hook that, at most once per substantial session, surfaces a fact-phrased nudge pointing the agent at the retro pipeline + the live transcript_path — while the session is alive (Stop-anchored, not SessionEnd, which is killed before async work finishes in cloud and whose transcript is deleted on reclaim). Shared core (templates/hooks/lib/retro-trigger.ts, reused later by Codex 53DQJZ and Cursor KHYXY4): - countToolUses / isSubstantial — the transcript is the substance measure (inclusive >= threshold); no separate counter, no spool dependency. - resolveSessionId — input > cloud (CLAUDE_CODE_REMOTE_SESSION_ID) > local precedence ladder. 
- once-per-session sentinel keyed by sanitized session id (within-session idempotency; the occurrence ledger covers across sessions). - decideRetroNudge — fail-open orchestration: any missing input / unreadable transcript / trivial session / already-nudged → silent, never blocks Stop. stop-retro.ts mirrors stop-self-report.ts (fact-phrased additionalContext, exit 0, config-gated on selfReport.surface). Registered in schema.ts + config.ts Stop + .claude/settings.json with byte-parity .safeword mirrors. 24 unit + 5 integration tests; scenario-gate passed an independent fork review. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * test(FTCQGD): exempt stop-retro.ts from the smoke hook-coverage guard stop-retro.ts is a Stop hook (fires at turn-end, not on a tool call) so it is not assertable in a tool-based live smoke run — same as its sibling stop-self-report.ts. It is covered deterministically by tests/integration/stop-retro.test.ts. Add the EXEMPT_HOOKS entry with that justification (the drift guard caught the new hook in CI, exactly as designed). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * docs(FTCQGD): reconcile impl-plan (implemented) + enter verify phase Plan held with no design drift; records the one unforeseen addition (the smoke hook-coverage EXEMPT entry surfaced by CI) and the live dogfood confirmation that the hook fired its nudge on this session's Stop. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * docs(FTCQGD): verify gate — full suite green, audit passed Test Suite 3915/3915 (5 skipped), Gherkin 159/159, build + lint + typecheck clean, audit clean (0 circular/dead/dup, no new deps). Experience walk: net friction reduced (removes the 'remember /retro' step). 
One open decision: PR #543 composition (combined epic vs split FTCQGD). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * chore(FTCQGD): mark retro auto-trigger done Done gate satisfied: full suite 3915/3915, Gherkin 159/159, build + lint + typecheck clean, audit passed, verify.md present, impl-plan reconciled. The Claude stop-retro hook fires the retro pipeline once per substantial session, cloud-safe and idempotent; confirmed firing live on this session. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * docs(53DQJZ): scenario-gate passed — Codex retro trigger (11 scenarios) figure-it-out resolved the Codex substrate: Stop payload carries transcript_path directly; rollout shape differs from Claude (function_call/exec_command_begin/mcp_tool_call_begin, not message.content tool_use). Define-behavior + scenario-gate for the Codex variant: independent fork review caught two blockers (trivial path vacuous vs fail-open; no end-to-end proof the Codex counter drives the fire) → added a below-threshold-count trivial scenario and a Claude-shaped-zero-events adversarial twin. Re-review PASS. spec.md, dimensions.md, the @manual feature, the R/G/R ledger, and impl-plan.md (counter-seam refactor + adapter build order). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * feat(53DQJZ): fire retro from the Codex Stop hook Codex variant of the retro auto-trigger (FTCQGD shipped Claude). The figure-it-out established Codex's Stop payload carries transcript_path directly, but the rollout shape differs from Claude's, so: - countToolUsesCodex counts Codex tool events (function_call / exec_command_begin / mcp_tool_call_begin in {type,payload} JSONL), nesting-tolerant. 
- isSubstantial + decideRetroNudge now take an injected per-agent counter + session-id resolver (Claude defaults — behavior-preserving, FTCQGD suite green). - resolveCodexSessionId keys on session_id > CODEX_THREAD_ID (NOT turn_id, which is per-turn and would break once-per-session idempotency). - codex/stop.ts emits {decision:block, reason} (continuation) or {} — Codex Stop requires valid JSON; fail-open on any bad input. Wired config.toml [[hooks.Stop]] (schema patch + unpatch), schema.ts, byte mirrors. Excluded /codex/ from the SETTINGS_HOOKS drift test (Codex hooks wire via config.toml, like /cursor/). 13 unit + 5 integration tests; the scenario-gate passed an independent fork review. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * chore(53DQJZ): enter verify phase * docs(53DQJZ): reconcile impl-plan (implemented) Resolver substitution (direct resolveCodexSessionId vs run-identity wrapper — equivalent, simpler), the turn_id session-key correction, and the /codex/ SETTINGS_HOOKS drift-test exclusion recorded against what shipped. * docs(53DQJZ): verify gate — full suite green, audit passed Test Suite 3933/3933 (5 skipped), Gherkin 159/159, build + lint + typecheck clean, audit clean (0 circular/dead/dup, no new deps). Experience: net friction reduced; two weak links recorded (nudge compliance; Codex experimental-hooks). Awaiting done-flip confirmation. * refactor(53DQJZ): quality-review fixes — honest Codex-nesting comment + Codex fail-open integration row Independent fresh-context code review (APPROVE, 0 critical). Applied the two worthwhile NOTEs: - countToolUsesCodex comment overstated 'confirmed by a live Codex spike' — the spike is deferred; reworded to say the nesting is NOT yet confirmed and both shapes are matched defensively to hedge that. 
- Added a Codex integration case for an unreadable transcript_path → valid {} / exit 0, so the Codex fail-open suite matches the Claude one's coverage. Declined NOTE #3 (installCrashCapture in the trigger hooks): the top-level try/catch is correct fail-open; adding crash capture is out of this ticket's scope and would couple the trigger to the spool. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * refactor(retro-trigger): extract shared JSONL-entry counting skeleton Both per-agent tool-use counters (countToolUses, countToolUsesCodex) duplicated the trim / split / JSON.parse-or-skip-malformed loop. Extract it to one sumOverJsonlEntries(text, perEntry); each counter now supplies only its per-entry rule. Behavior-preserving (48 trigger tests + typecheck green); the malformed-skip guard lives in one place and KHYXY4's Cursor counter becomes a one-liner. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * chore(53DQJZ): mark Codex retro trigger done Done gate satisfied: full suite green at verify (3933/3933), Gherkin 159/159, audit passed, verify.md present, impl-plan reconciled, quality-review APPROVE (0 critical), refactor pass landed (shared JSONL-counter skeleton extracted). The Codex stop hook fires retro once per substantial session via a {decision:block} continuation, idempotent and fail-open. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * docs(KHYXY4): scenario-gate — Cursor retro trigger (13 scenarios) Cursor variant of the retro auto-trigger. figure-it-out (official Cursor docs) overturned the 'no transcript' premise: every Cursor hook incl. 
stop carries transcript_path to a Claude-shaped JSONL transcript, so the existing countToolUses is reused; only the session-id source (conversation_id) and output channel (followup_message) are Cursor-specific. Independent fork review caught 3 vacuous scenarios (coexistence left-unset, unexercised counter, unproven id resolution); all fixed. spec/dimensions/feature/ledger landed; re-review pending. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * feat(KHYXY4): fire retro from the Cursor stop hook (followup_message) Adds the retro path to cursor/stop.ts, completing the cross-agent set. Per the figure-it-out (official Cursor docs): every Cursor hook incl. stop carries transcript_path to a Claude-shaped JSONL transcript, so the existing countToolUses is reused unchanged. New: resolveCursorSessionId (conversation_id, session-stable) + conversation_id on RetroTriggerInput. The retro nudge rides Cursor's followup_message (auto-submits) on the non-quality-review branch, so it yields to the existing quality-review followup without consuming its once-per-session sentinel (proven by the integration test). 4 unit + 6 integration tests; loosened a brittle exact-import assertion in cursor-stop-review.test.ts to assert the crash-capture wiring rather than a literal import string. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * chore(KHYXY4): enter verify phase * docs(KHYXY4): verify gate green + quality-review provenance fix Full suite 3944/3944 (5 skipped), Gherkin 159/159, build + lint + typecheck clean, audit passed. Independent quality-review APPROVE (0 critical) — traced coexistence, session-id keying (conversation_id session-stable, never generation_id), and fail-open as correct. 
Applied its one NOTE: cite the official Cursor hooks docs in resolveCursorSessionId, closing the one unverified assumption. * chore(KHYXY4): mark Cursor retro trigger done Done gate satisfied: full suite 3944/3944, Gherkin 159/159, audit passed, verify.md present, impl-plan reconciled, quality-review APPROVE (0 critical). Cursor's stop hook fires retro once per substantial session via followup_message, reusing the Claude-shaped counter, keyed by conversation_id, coexisting with the existing quality-review followup. Completes the cross-agent trigger set (Claude + Codex + Cursor). * docs(KHYXY4): document the coexistence starvation trade-off Second independent quality-review (APPROVE, 0 critical) flagged that the all-edits-every-stop case starves retro that session. Add a comment at the quality-review branch naming it as an accepted trade-off (the ledger dedupes across sessions; one followup_message per stop is a hard Cursor constraint). * refactor(retro-trigger): collapse session-id resolvers over firstNonEmpty The three resolvers (Claude/Codex/Cursor) each repeated nonEmpty(a) ?? nonEmpty(b) precedence chains. Generalize nonEmpty into firstNonEmpty(...values) and rewrite each resolver as a one-liner over it. Behavior-preserving (41 trigger tests + typecheck green). * fix(retro/egress): close verified secret-leak gaps from PR #543 review A reviewer flagged (verified against the code) that the egress sanitizer — which files into the PUBLIC repo — missed the key formats most likely to appear in AI-coding transcripts, and that the 'independent LLM redaction pass' claimed in the comment/PR does not exist. Secrets: add hyphenated provider keys (sk-ant-…, sk-proj-…, sk-…), Authorization Bearer tokens, and secret-named assignment literals (password=/token:/api_key=). 
Paths: scrub relative customer paths (src/customers/acme/secret.ts) via a plain string-op tokenizer (no backtracking regex / no ReDoS surface) — safeword-internal tails still survive the allowlist; prose with a slash (and/or, TCP/IP) is untouched. Honesty: correct the egress comment — there is NO LLM redaction pass; the real defense is the constrained schema + this deterministic scrubber, with secretlint adoption named as the durable hardening. Leak-regression tests added at the unit (egress.test.ts) and end-to-end (pipeline.test.ts) layers for every newly-covered format. Did NOT add a blanket high-entropy/hex catch-all (would redact git SHAs/ids) — assignment-context + specific provider shapes instead. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * docs(SPNZKM): track durable egress hardening follow-up (secretlint + write-path test) From the PR #543 review fast-follow. Immediate Critical fixed inline (859c5fb); this carries @secretlint/core adoption + the GitHub write-path auth/method test. * fix(retro/egress): close round-2 secret/path leaks from PR #543 review A fresh adversarial review (verified-by-construction) found 8 leaks the first pass missed. Closed the concrete ones: - Secrets: GitHub fine-grained PAT (github_pat_), Slack xapp-, Authorization Basic (base64 user:pass), and underscore-prefixed secret names (aws_secret, client_secret) via (?<![a-z0-9]) instead of \b. - Paths: rewrote the path scrub to scan path-character RUNS (punctuation delimits) and normalize backslashes — closing Windows relative paths (C6), paths glued to text (C7), and paths with a query string (C8). Documented residual honestly: arbitrary KEY=secret names and 40-char entropy-shaped keys are not regex-coverable — secretlint (SPNZKM) is the durable fix. Over-redaction of non-safeword relative paths is accepted (safe direction). Leak-regression tests added for C1-C8. 
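A minimal sketch of why the round-2 fix swaps `\b` for `(?<![a-z0-9])`. It assumes a simplified name list and a plain `name=value` or `name: value` shape; the scrubber's actual patterns are not shown in this message, and `SECRET_NAMES`, `withBoundary`, `withLookbehind`, and `scrub` are hypothetical names. Because `_` is a word character, `\b` finds no boundary inside `aws_secret`, so the assignment slips through. The lookbehind rejects only a preceding letter or digit.

```typescript
const REDACTED = "[redacted]";

// Illustrative subset of secret-ish names (assumption, not the shipped list).
const SECRET_NAMES = "secret|password|token|api_key";
// Separator (= or :) followed by the value up to the next whitespace.
const TAIL = "(\\s*[=:]\\s*)\\S+";

// Old shape: word boundary before the name.
const withBoundary = new RegExp(`\\b(${SECRET_NAMES})${TAIL}`, "gi");
// New shape: only a preceding letter or digit blocks the match,
// so an underscore prefix (aws_secret, client_secret) is still caught.
const withLookbehind = new RegExp(`(?<![a-z0-9])(${SECRET_NAMES})${TAIL}`, "gi");

// Keep the name and separator, replace only the value.
function scrub(text: string, pattern: RegExp): string {
  return text.replace(pattern, `$1$2${REDACTED}`);
}
```

With these assumed patterns, `scrub("aws_secret=abc123", withBoundary)` returns the input unchanged, while `withLookbehind` redacts the value.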
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * feat(retro/egress): layer @secretlint provider rules over the regex floor (SPNZKM) Adopt @secretlint/core behind the scrubSecrets seam as an ADDITIVE pass, not a replacement. The spike proved secretlint is precise (its anthropic rule needs the exact 108-char shape; its AWS rule skips bare AKIA ids by default), so it MISSES truncated/malformed keys the broad regex catches. Keep both layers: - redactKnownSecrets: async secretlint pass over the raw text, fail-OPEN to the input (a dep/parse failure is never less safe — the regex floor still runs). - sanitizeTextDeep = sanitizeText(await redactKnownSecrets(text)): secretlint for the 28 maintained provider formats, then the sync regex/path/email floor. prepareEncounters is now async. Adds end-to-end coverage that a well-formed secretlint-only key (sk-ant 108-char) never reaches the assembled body, plus the previously-untested GitHub write path: createIssue/createComment/updateComment now assert method + auth header + JSON body (the fetch mock no longer drops init). Documents @secretlint/core + preset-recommend as runtime deps in ARCHITECTURE.md. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * fix(retro/egress): coalesce overlapping secretlint ranges before redacting Independent review found a latent leak: @secretlint can report overlapping ranges for one credential (basicauth ⊃ github over a credentialed URL — its pipeline de-dupes only exactly-equal ranges). The descending-by-start splice assumed disjoint ranges; with a nested span shorter than the `[redacted]` replacement, stale offsets under-delete and leak the secret's tail. Coalesce ranges into a minimal disjoint set first, then splice back-to-front — provably correct for any topology. 
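The coalesce-then-splice step described above can be sketched as follows. This is an illustration under stated assumptions, not the shipped `redactKnownSecrets`: ranges are taken to be half-open `[start, end)` character offsets, and `Range`, `coalesce`, and `redactRanges` are hypothetical names.

```typescript
const REDACTED = "[redacted]";

// Half-open character range [start, end) (assumed representation).
interface Range {
  start: number;
  end: number;
}

// Merge overlapping or touching ranges into a minimal disjoint set.
function coalesce(ranges: Range[]): Range[] {
  const sorted = [...ranges].sort((a, b) => a.start - b.start);
  const merged: Range[] = [];
  for (const r of sorted) {
    const last = merged[merged.length - 1];
    if (last && r.start <= last.end) {
      last.end = Math.max(last.end, r.end); // extend the open span
    } else {
      merged.push({ start: r.start, end: r.end });
    }
  }
  return merged;
}

// Splice back-to-front so earlier offsets stay valid after each replacement.
function redactRanges(text: string, ranges: Range[]): string {
  let result = text;
  for (const { start, end } of coalesce(ranges).reverse()) {
    result = result.slice(0, start) + REDACTED + result.slice(end);
  }
  return result;
}
```

For a credentialed URL where one reported range nests inside another, the two collapse into a single span and the output contains exactly one `[redacted]`.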
Adds a regression test reproducing the overlapping basicauth/github case, a direct redactKnownSecrets isolation test for a secretlint-only (sendgrid) key, and corrects the fail-open JSDoc to match the static-import reality (missing dep = fail-CLOSED at module load, also safe). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * refactor(retro/egress): extract shared [redacted] sentinel constant The placeholder was duplicated across scrubSecrets and redactKnownSecrets, and it is load-bearing: the floor-stays-clean invariant requires both secret layers to emit the identical inert sentinel. A shared REDACTED constant removes the duplication and documents that invariant. Behavior-preserving. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * docs(ticket): add 7D8PJP invisible-retro-claude (intake) BDD feature ticket under RV9JT4 for the invisible self-report architecture (GitHub #550): Stop hook runs retro extraction in an isolated headless `claude -p` session (no --bare, SAFEWORD_RETRO_CHILD guard, transcript digest) instead of injecting additionalContext into the user's conversation; filing via the agent-owned GitHub transport. Validated live in this cloud session (#553). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * docs(ticket): 7D8PJP spec + scenarios (define-behavior) spec.md (personas/JTBD/ACs, self-reviewed), dimensions.md, and the invisible-retro-claude.feature scenarios + R/G/R ledger. 9 scenarios across 4 rules covering all ACs: no conversation hijack (TB1), cloud auth + synchronous + digest (TB2), egress guard unchanged + recursion guard (NTB1), agent transport + once-per-session (SM1). Phase: scenario-gate (independent review pending). 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * docs(ticket): 7D8PJP address scenario-gate review (12 scenarios) Close the independent review's GATE BLOCK: add fail-open scenario (extractor error → hook stays silent, no throw, files nothing), token-present REST transport arm, and the "fires once when sentinel unset" baseline; tighten the digest scenario (concrete marker survival), the spawn-contract scenario (digest passed in, neutral cwd), and the allowed-tools assertion (Read only, no write/Bash). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * feat(retro): buildDigest — size-capped transcript digest for headless extraction (7D8PJP) First implement-phase unit (TB2.AC3). Reduces a multi-MB JSONL transcript to a signal-dense digest under a cap: user/assistant text + tool-use names + short/error-ish tool results; oversized non-signal tool-result bodies are omitted (not just truncated) so they can't crowd out the markers the extractor needs. Malformed lines skipped, never thrown. New templates/hooks/lib module + byte-identical .safeword mirror, registered in schema.ts. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * docs(ticket): mark TB2.AC3 digest scenario green (7D8PJP) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * docs(ticket): 7D8PJP impl-plan + build order (scenario-gate exit) impl-plan.md with all five sections (approach + per-AC proof + build order, decisions, arch alignment, deviations, assessment triggers). Ticket advanced to implement with the build order recorded in the work log. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * feat(retro): buildExtractArgv — headless claude -p argv, no --bare (7D8PJP TB2.AC1) Builds the `claude` argv for the isolated retro extraction: print mode + JSON output + read-only tool set, and deliberately NO `--bare` (which breaks the cloud managed-proxy auth — proven live). Trailing positional is the task prompt. Byte-mirrored to .safeword. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * docs(ticket): mark TB2.AC1 headless-argv scenario green (7D8PJP) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * feat(retro): isRetroChild + RETRO_CHILD_ENV recursion guard (7D8PJP NTB1.AC2 read half) The headless child runs with hooks loaded (no --bare for cloud auth), so it would re-fire retro. The SAFEWORD_RETRO_CHILD env sentinel lets every safeword hook early-return in the child. Predicate + exported env name (the spawn half, piece 4, sets it). Byte-mirrored. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * feat(retro): runHeadlessExtraction — synchronous, fail-open out-of-band runner (7D8PJP) Spawns claude -p in an isolated session: digest as input, neutral cwd, child env carries SAFEWORD_RETRO_CHILD=1 (recursion guard). Awaits the spawn (synchronous) and fail-OPENS — non-zero exit, unparseable output, or a spawn throw yields [] and never throws, so the wrapping Stop hook stays silent. Parses findings from the claude -p JSON envelope. Closes TB1.AC2 (spawn contract), TB2.AC2 (synchronous), TB1.AC1 fail-open, NTB1.AC2 spawn half. Byte-mirrored. 
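A sketch of the fail-open contract for the headless runner, with the spawn injected so the logic is testable without a real `claude` binary. Assumptions are loud here: the envelope is taken to carry the findings in a `result` field (either a JSON string or an array), and `runExtractionFailOpen`, `SpawnChild`, and `ChildResult` are hypothetical names. Only the `SAFEWORD_RETRO_CHILD=1` sentinel and the "yields [] and never throws" behavior come from the message above.

```typescript
type Finding = Record<string, unknown>;

interface ChildResult {
  status: number | null; // exit code, null if the child was killed
  stdout: string;
}

// Injected spawn: receives the extra env the child must carry.
type SpawnChild = (childEnv: Record<string, string>) => ChildResult;

const RETRO_CHILD_ENV = "SAFEWORD_RETRO_CHILD";

function runExtractionFailOpen(spawnChild: SpawnChild): Finding[] {
  try {
    // The child carries the recursion-guard sentinel.
    const { status, stdout } = spawnChild({ [RETRO_CHILD_ENV]: "1" });
    if (status !== 0) return []; // non-zero exit: stay silent

    // Assumed envelope shape: { result: "<json array>" } or { result: [...] }.
    const envelope = JSON.parse(stdout) as { result?: unknown } | null;
    const raw = envelope?.result;
    const findings: unknown = typeof raw === "string" ? JSON.parse(raw) : raw;
    return Array.isArray(findings) ? (findings as Finding[]) : [];
  } catch {
    // Spawn throw or unparseable output: never propagate to the Stop hook.
    return [];
  }
}
```

Every failure path collapses to an empty array, so the caller needs no error handling of its own.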
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * docs(ticket): mark runner scenarios green — TB1.AC2/TB2.AC2/TB1.AC1(fail-open)/NTB1.AC2 (7D8PJP) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * feat(retro): decideRetroRun — invisible trigger decision, no nudge (7D8PJP) The out-of-band replacement for decideRetroNudge: same substance + once-per-session gates, but returns the transcript path to extract (not conversation text) and adds the recursion guard FIRST (a SAFEWORD_RETRO_CHILD process never triggers another retro). Closes both SM1.AC2 arms (fires once when unset, suppressed after) and NTB1.AC2's gate half. Byte-mirrored. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * docs(ticket): mark SM1.AC2 both arms green via decideRetroRun (7D8PJP) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * feat(retro): rewrite stop-retro to run out-of-band, emit no additionalContext (7D8PJP TB1.AC1) The invisibility swap: stop-retro now calls decideRetroRun and, on a run decision, spawns `safeword retro --auto-extract --transcript <path>` synchronously (spawnSync, stdio ignored) — emitting NOTHING to the conversation. No more additionalContext nudge; the user's session is never hijacked. SAFEWORD_RETRO_EXTRACT_CMD is a test/advanced seam to neutralize the real CLI. Codex/Cursor adapters keep the nudge model (their own tickets #551/#552). Integration test asserts no-output + sentinel-armed on a substantial session, plus the silent paths (trivial / second stop / retro-child / surface-off / malformed). Byte-mirrored. 
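The gate ordering in decideRetroRun can be sketched as a pure function. The input and result shapes below are hypothetical (the real function reads the transcript and sentinel itself), and treating any non-empty `SAFEWORD_RETRO_CHILD` value as "child" is an assumption. The threshold of 3 tool uses and the recursion-guard-first ordering follow the commit messages.

```typescript
interface RetroRunInput {
  env: Record<string, string | undefined>;
  transcriptPath?: string;
  toolUses: number; // tool-use count taken from the transcript
  alreadyFired: boolean; // once-per-session sentinel state
}

type RetroRunDecision =
  | { run: true; transcriptPath: string }
  | { run: false };

const RETRO_CHILD_ENV = "SAFEWORD_RETRO_CHILD";
const SUBSTANTIAL_THRESHOLD = 3; // inclusive: >= 3 tool uses counts as substantial

function decideRetroRunSketch(input: RetroRunInput): RetroRunDecision {
  // 1. Recursion guard first: a retro child never triggers another retro.
  if (input.env[RETRO_CHILD_ENV]) return { run: false };
  // 2. Fail-open on missing input: stay silent, never block Stop.
  if (!input.transcriptPath) return { run: false };
  // 3. Substance gate: trivial sessions are skipped.
  if (input.toolUses < SUBSTANTIAL_THRESHOLD) return { run: false };
  // 4. Once-per-session gate.
  if (input.alreadyFired) return { run: false };
  // Return the path to extract, not text for the conversation.
  return { run: true, transcriptPath: input.transcriptPath };
}
```

Putting the child check ahead of every other gate means the headless session exits before any transcript or sentinel state is consulted.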
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * docs(ticket): mark TB1.AC1 + fail-open hook half green (7D8PJP) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * feat(retro): safeword retro --auto-extract — headless extraction into the egress pipeline (7D8PJP NTB1.AC1) --auto-extract builds the FindingExtractor from runHeadlessExtraction (real spawnSync of claude -p, digest written to a neutral temp dir, SAFEWORD_RETRO_CHILD set by the runner) instead of reading --findings. The extracted findings flow through the SAME egress guard (normalize → fail-closed surface → sanitizeTextDeep → assemble → file). Wiring test proves a secret + customer path are scrubbed and an unresolved-surface finding is dropped end-to-end via the auto-extract path. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * feat(retro): resolve GitHub token from env or `gh`, drop hard GITHUB_TOKEN requirement (7D8PJP SM1.AC1) Transport selection now sources the token from GITHUB_TOKEN OR `gh auth token` (the environment's existing GitHub access), reusing the battle-tested REST transport rather than adding a second adapter. With neither available, the command no-ops gracefully (info, exit 0) instead of failing — the out-of-band Stop hook must never fail the Stop for lack of GitHub access. Covers SM1.AC1 both arms (token → REST; no token but gh → still files). 
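A sketch of the token resolution and graceful no-op described above. The `gh` lookup is injected as a callback (in practice it would shell out to `gh auth token`), and the signatures, the `Transport` union, and `selectTransport` are hypothetical; only the precedence (GITHUB_TOKEN, then `gh`, then no-op) comes from the commit message.

```typescript
type Env = Record<string, string | undefined>;

type Transport =
  | { kind: "rest"; token: string }
  | { kind: "noop"; reason: string };

// GITHUB_TOKEN wins; otherwise ask the injected `gh auth token` lookup.
function resolveGitHubToken(
  env: Env,
  ghAuthToken: () => string | undefined,
): string | undefined {
  const fromEnv = env["GITHUB_TOKEN"]?.trim();
  if (fromEnv) return fromEnv;
  try {
    const fromGh = ghAuthToken()?.trim();
    return fromGh ? fromGh : undefined;
  } catch {
    return undefined; // gh missing or not logged in: not an error
  }
}

// With no token from either source, no-op instead of failing the Stop.
function selectTransport(
  env: Env,
  ghAuthToken: () => string | undefined,
): Transport {
  const token = resolveGitHubToken(env, ghAuthToken);
  return token
    ? { kind: "rest", token }
    : { kind: "noop", reason: "no GitHub token available" };
}
```

Because the token is resolved before the transport is built, the REST transport can require it as a parameter rather than defaulting to `process.env`.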
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * docs(ticket): mark SM1.AC1 both arms green — all 12 scenarios complete (7D8PJP) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * docs(ticket): 7D8PJP verify.md — suite 4171/4171, audit passed, all 12 scenarios green Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * docs(ticket): reconcile 7D8PJP impl-plan → implemented (transport refinement + test seam noted) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * refactor(retro): require createRestTransport token; clarify recursion-guard comment (7D8PJP review) Done-gate review fixes (non-blocking): make createRestTransport's token required so callers can't bypass resolveGitHubToken's `gh` fallback via the old process.env default; correct the SAFEWORD_RETRO_CHILD comment ("the retro Stop hook checks this" — stop-retro is the only hook that recursively spawns claude -p). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * chore(tickets): flip 7D8PJP + SPNZKM to done 7D8PJP (invisible retro, Claude/cloud): all 12 scenarios green, verify 4171/4171, audit passed, quality-review APPROVED. SPNZKM (egress hardening): secretlint + write-path tests, reviewed + refactored. Both ride PR #543; flipping satisfies the CI ticket-closure guard. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * chore(tickets): regenerate INDEX after 7D8PJP/SPNZKM done flip Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * harden(retro): cap raw findings processed in prepareEncounters (combined review) Combined SPNZKM+7D8PJP quality-review surfaced an unbounded-cost edge: each finding costs 4 async secretlint passes, and --auto-extract feeds in model output of uncontrolled length — a runaway/adversarial `claude -p` array could fire thousands of secretlint calls inside the synchronous Stop. Cap raw findings at 50 (a real session yields a handful; generous headroom for recurrence bumps, which triage's 5-issue creation cap doesn't bound) so the cost ceiling is explicit. Non-blocking hardening; no Critical issues found. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * docs(ticket): correct 7D8PJP 'validated end-to-end' overstatement (cloud filing pending #568) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * feat(ticket): add BNGK9W cloud retro filing transport (#568) Captures the figure-it-out decision: try-REST-then-agent-subagent transport for retro filing, selected automatically. Parent RV9JT4, sibling of 7D8PJP. Intake phase; spec.md stub to be filled at build. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * feat(ticket): add 0XEMEE retro extraction-recall (0/5 eval); link BNGK9W Live head-to-head eval: headless extractor caught 0 of the 5 findings a manual review caught (#564-#568) on this 25MB session. 
Two failure modes captured with numbers: digest head-truncation (8.1% kept, tail cut) and weak within-window recall. Transport (BNGK9W) is necessary-not-sufficient. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * docs(ticket): correct 0XEMEE framing after sonnet eval (tier, not "near-zero") Sonnet @400KB returned 9 valid high-value findings (~242s); haiku 1-3 weak. The "0/5 vs human-filed 5" metric was misleading — no canonical set, 15+ real frictions in the session. Reframe to the three real problems: tier (haiku too weak by default vs cost), coverage (head-only digest misses the tail), and metric (need validity-based eval). Fix BNGK9W cross-ref too. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * docs(ticket): add TIMING lever (Phase 0) to 0XEMEE, concept-tested + steelmanned "When does it run" exposed the dominant recall lever: decideRetroRun fires at the FIRST Stop with >=3 tool-uses, then the sentinel suppresses it. Concept test on this session's raw transcript: trigger fires at line 19/9,788 (0.2%), but #567 isn't seen until 29% and full coverage not until ~50% — fire-once-early reads only the opening. Caps recall before tier/coverage matter. Added the re-arm fix + a 5-point steelman; reordered done_when so timing is Phase 0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * feat(0XEMEE): define-behavior — re-arm timing slice (Phase 0) BDD define-behavior for the timing lever: spec.md (SM persona, TB1/TB2/TB3/NTB1 JTBDs+ACs), dimensions.md (8 dimensions), retro-rearm-timing.feature (12 scenarios), and the R/G/R ledger. Replaces the boolean once-per-session sentinel with a growth-gated re-arm so later Stops re-read the fuller transcript; existing occurrence ledger dedupes filings. 
Advanced to scenario-gate. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * chore(0XEMEE): scenario-gate PASS + proof plan; advance to implement Independent fork review (sonnet, /review-spec) returned GATE: PASS, 0 blocking, 6 step-authoring heads-ups. Stamp recorded. Logged the leaf-first proof plan (count-state helpers -> re-arm decision -> two-call current-transcript -> back-half finding integration -> ledger dedupe -> stop-retro wiring) and advanced to implement. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * docs(0XEMEE): add impl-plan.md (re-arm timing, Phase 0) Five-section impl plan authored at scenario-gate exit: approach + per-AC proof table + leaf-first build order, the re-fire/sentinel/dedupe/cost-bound decisions, arch alignment (FTCQGD trigger core + RV9JT4 ledger + byte-parity mirror), and assessment triggers. Unblocks the implement phase. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * fix(0XEMEE): geometric re-arm backoff + fire cap — bound long-session cost Cost question exposed that an additive growth threshold is O(N) fires (~74 on this 1,859-tool-use session). Switch to geometric backoff (re-fire at last_fired x REARM_FACTOR) = O(log N) (~5 fires), plus a MAX_FIRES cap, the #563 friction gate, the fixed digest cap, and haiku tier this slice. Worst case ~$0.50/session, typical ~$0.10. impl-plan Decisions/approach updated. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * fix(0XEMEE): plan-check vs this session killed geometric + haiku Simulated the re-arm schedule against this session's real timeline (sim-rearm.ts): - GEOMETRIC backoff front-loads → last fire 38%, 62% blind tail (its 5/5 was a summary artifact). Friction-gating worse (3,622 friction lines exhaust a 5-cap by 7%). Fix: ADDITIVE cadence (~200 tool-uses, no low cap, high backstop) → last fire 91-100%, full coverage. - HAIKU too weak (1-3 vs sonnet 9); re-arming it re-reads badly. Fix: default SONNET. Honest cost: typical ~$0.35-1/session, pathological 25MB ~$3-5. impl-plan Decisions/approach/triggers updated; debounce-to-quiet noted as the future cost win (blocked on cloud reclaim-timing). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * feat(0XEMEE): piece 1/6 — re-arm state helpers (count + fire-number) Replace the boolean once-per-session sentinel with per-session re-arm state: RearmState {lastCount, fires}, rearmStatePath, readRearmState (fail-open on missing/corrupt), recordFire; REARM_GROWTH=250 + MAX_FIRES=20 (additive cadence, high backstop). 6 unit tests (keying, round-trip, re-fire overwrite, path- sanitize, corrupt-fail-open) — 35/35 retro-trigger pass. .safeword mirrored. Added a plain-English end-to-end design capture to the ticket. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * docs(0XEMEE): /quality-review REQUEST CHANGES — phases are coupled Two independent reviewers. Headline: Phase 0 (timing) is INERT in isolation — buildDigest head-slices (first 180KB = chronological opening), so every re-fire reads the same ~8% of content; re-arm surfaces nothing new without the coverage fix. 
Plus: sync spawnSync blocks Stop (re-arm x sonnet = repeated multi-minute hangs -> needs detached/async); dedupe-by-title + sessionId='unknown' fallback are fragile (signature-dedupe/1FGE1C is a dependency); read-then-write TOCTOU on the rearm state. Conclusion: stop the linear phase build; re-decide the coherent slice (coverage + late-fire + sonnet + async + signature-dedupe together). Captured in the work log. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * docs(0XEMEE): figure-it-out DECISION — delta re-arm + sonnet + async:true hook Coherent recall slice (replaces the inert linear phase plan): - Coverage: each re-fire digests only NEW activity since the last fire (bounded window + small overlap); deltas tile the whole session, defeating the head-cap. ~M bounded calls (~$1.44 this session) vs whole-transcript-per-fire's ~M*N (~$15). Validated by sliding-window IE (SLIDE arxiv 2503.17952); ledger = the cross-window entity bank. - Execution: latency blocker RESOLVED by documented `async: true` hook mode (hook returns immediately, background, 600s) — not asyncRewake (would break invisibility). Detached survival also proven empirically. - Signature dedupe (1FGE1C) folded in as a dependency. Next: rewrite impl-plan + scenarios for the combined slice; piece-1 offset state is reused (lastCount -> byte/line offset). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * refactor(retro): kill 0XEMEE + inert linear-phase artifacts; start ZFGWS1 The /quality-review proved the phase plan inert and /figure-it-out settled the coherent slice, so reset: - Delete ticket 0XEMEE and its scenarios/spec/dimensions/impl-plan. - Revert retro-trigger.ts (+ .safeword mirror) and its test to the FTCQGD baseline (remove the piece-1 re-arm state helpers + tests; 29/29 green). 
- Remove the superseded retro-rearm-timing.feature. - New ticket ZFGWS1 (delta re-arm + sonnet + async:true hook + signature dedupe) carries the evidence forward (eval, concept test, sim, quality-review, decision). - Repoint BNGK9W's dependency note 0XEMEE -> ZFGWS1. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * docs(ZFGWS1): fold in quality-review — design validated, scope sharpened Isolated-subprocess review: REQUEST CHANGES but design VALIDATED (async:true confirmed in docs; sonnet/haiku IDs current; egress unbypassed; all 6 code claims verified-from-file). Sharpened scope: name the 2nd model site (retro.ts:113); atomic offset = temp-write+rename; clarify async:true backgrounds the whole hook tree (inner spawnSync stays sync — reviewer's "defeats async" was a misread); buildDigest takes a pre-sliced window; overlap size / #563-absent fail-open / tail bound -> spec refinements. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HoXpza4VZxp5kRwjBYjD5r * docs(ZFGWS1): intake + define-behavior — spec, dimensions, 22 scenarios Advance ZFGWS1 (retro recall: delta re-arm + sonnet + async hook + signature dedupe) from intake through define-behavior: - spec.md: 4 JTBD / 8 AC across SM/TB/NTB; quality-review refinements locked (REARM_GROWTH=200 additive, MAX_FIRES=20 backstop, OVERLAP_BYTES=2048, #563-absent fail-open, tail residual bound, retro.model config). Self-review stamped. - dimensions.md: 11 behavioral dimensions mapped to ACs + test layers. - features/retro-recall-delta-rearm.feature: 22 scenarios / 8 rules (@manual: unit+wiring, mocked boundaries) with @retro-recall.<AC> lineage tags. - test-definitions.md: R/G/R ledger. Phase: scenario-gate (independent fresh-context review in flight). 
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_019VsqfvrhacpctH7x4sfd6Q * docs(ZFGWS1): apply scenario-gate review — 26 scenarios, sharpened assertions Independent fresh-context review found 3 must-fix / 6 should-strengthen; applied: - SM2.AC3: replace tautological offset ">=" with a strict sequential advance + torn-read-prevention; spec states the honest guarantee (last-writer-wins + signature-dedupe backstop, NOT max-wins — a lock is out of scope). - SM1.AC1: back-half finding now asserts an observable FILED outcome over a transcript larger than the digest cap (proves window-not-head); later-fire window begins at "previous offset minus overlap" (resolves boundary conflict with the overlap scenario); first-fire anchored to the head-cap, not given-echo. - New rejection/negatives: first Stop below substance threshold does not fire; no-session-id-resolves files nothing under 'unknown'; fuzzy signature-search near-miss rejected by the exact filter; state-write failure leaves offset unchanged. 26 scenarios / 8 rules; test-definitions ledger synced. Re-review in flight. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_019VsqfvrhacpctH7x4sfd6Q * docs(ZFGWS1): scenario-gate PASS + impl-plan; advance to implement - Re-review: 0 must-fix (gate pass); folded the 2 optional polish tweaks (drop first-fire trailing clause; arm the recursion-guard Given so precedence is load-bearing). - impl-plan.md: load-bearing slice = back-half window→pipeline proof; build order 1-9 (offset state → cadence → window slice → back-half wiring → sonnet → signature dedupe → session id → async registration → egress non-bypass). - Scenario-gate stamped (independent fresh-context review; cross-model skip logged). - Phase → implement. 
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_019VsqfvrhacpctH7x4sfd6Q * test(ZFGWS1): RED — delta re-arm cadence; offset-state helpers (slice 1) - Offset-state helpers (real): REARM_GROWTH/MAX_FIRES/OVERLAP_BYTES, OffsetState, offsetStatePath, readOffsetState (graceful on torn state), writeOffsetState (atomic temp-write + rename, injectable fs). - decideRetroRun still fire-once → the 6 delta-decision tests fail (windowStart, additive re-fire, backstop, fail-open-offset-unchanged). RED for the feature. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_019VsqfvrhacpctH7x4sfd6Q * feat(ZFGWS1): GREEN — delta-aware decideRetroRun (offset state + additive cadence) decideRetroRun supersedes the once-per-session sentinel: first fire keeps the substance gate and digests from offset 0; re-fires gate on additive growth (REARM_GROWTH tool-uses) under the MAX_FIRES backstop, returning windowStart = prior offset. Recursion-guard-first; fail-open on unreadable transcript and on a state-write failure (duplicate absorbed by signature dedupe). Mirrored to .safeword. Superseded fire-once trigger tests removed (covered by retro-delta- rearm.test.ts). 37/37 green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_019VsqfvrhacpctH7x4sfd6Q * feat(ZFGWS1): GREEN — runRetro digests the pre-sliced delta window (slice 3/4) runRetro slices windowFor(transcript, windowStart) before extraction, so the digest cap applies to the new delta, not the chronological head. A back-half finding beyond the head cap is now filed by a delta fire (windowStart at the back half) while a head-capped fire over the same transcript files nothing — proven end-to-end through runRetro → windowFor → runHeadlessExtraction → buildDigest → triage. windowFor units cover first-fire-whole / later-fire-minus-overlap / overlap-clamp. 11/11 green; mirrors in sync. 
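The three windowFor cases named above (first-fire-whole, later-fire-minus-overlap, overlap-clamp) can be sketched as follows. This is a minimal illustration, not the code in retro-extract.ts: the function body is assumed, and the constant uses the later OVERLAP_CHARS name with the 2048 value from the ZFGWS1 spec.

```typescript
// Sketch under assumptions: only the names windowFor / OVERLAP_CHARS and the
// value 2048 come from the commit message; the body is illustrative.
const OVERLAP_CHARS = 2048;

function windowFor(transcript: string, windowStart: number): string {
  // First fire: no prior offset, so the whole transcript is the window.
  if (windowStart <= 0) return transcript;
  // Later fires: begin at "previous offset minus overlap", clamped at 0 so a
  // small offset never produces a negative slice index.
  const start = Math.max(0, windowStart - OVERLAP_CHARS);
  return transcript.slice(start);
}

const transcript = "x".repeat(5000);
console.log(windowFor(transcript, 0).length); // whole transcript
console.log(windowFor(transcript, 3000).length); // from offset 952 onward
console.log(windowFor(transcript, 100).length); // overlap clamps to offset 0
```

Offsets here count JS string code units (String.slice), which is the reason the constant was later renamed from BYTES to CHARS.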
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_019VsqfvrhacpctH7x4sfd6Q * test(ZFGWS1): RED — extraction defaults to sonnet (slice 5) runHeadlessExtraction with no model passed should request sonnet, not haiku (measured: haiku 1-3 weak findings vs sonnet 9). RED: default is still haiku. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_019VsqfvrhacpctH7x4sfd6Q * feat(ZFGWS1): GREEN — sonnet default at both model sites, config-overridable - retro-extract.ts: DEFAULT_RETRO_MODEL='sonnet'; runHeadlessExtraction default → sonnet; resolveRetroModel reads .safeword/config.json retro.model (sonnet fallback, fail-open). - retro.ts: buildAutoExtractor exported, takes projectDirectory + injectable spawn, resolves model via resolveRetroModel (was hardcoded 'haiku'); retroCommand passes the project dir. - Tests: resolveRetroModel default+override; buildAutoExtractor runner sonnet default + config override (injected spawn asserts argv). 22/22 green; mirrored. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_019VsqfvrhacpctH7x4sfd6Q * docs(ZFGWS1): ledger — slices 1-5 R/G/R recorded Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_019VsqfvrhacpctH7x4sfd6Q * feat(ZFGWS1): GREEN — signature dedupe + stable session id forward (slice 6) Dedupe by content signature, not the model-generated title (titles vary across delta re-fires): - draft.ts: buildDraft embeds a hidden signatureMarker(retro:<hash>) in the body. - triage.ts: IssueTracker.searchByTitle → searchBySignature; matches on encounter.draft.signature. - github-rest.ts: searchBySignature queries in:body by the hash token, then exact-filters the full signature (rejects fuzzy near-misses). 
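The exact-filter step after the in:body search might look roughly like the sketch below. The marker format (an HTML comment wrapping the retro:<hash> signature) and the Issue shape are assumptions for illustration; only the names signatureMarker and searchBySignature appear in the commit message.

```typescript
// Hypothetical shapes: the real IssueTracker types are not shown in this commit page.
interface Issue {
  number: number;
  body: string;
}

// Assumed marker format: a hidden HTML comment carrying the full signature.
function signatureMarker(signature: string): string {
  return `<!-- ${signature} -->`;
}

// A search by hash token can return fuzzy near-misses, so keep only issues
// whose body contains the complete marker.
function exactFilter(candidates: Issue[], signature: string): Issue[] {
  const marker = signatureMarker(signature);
  return candidates.filter((issue) => issue.body.includes(marker));
}

const candidates: Issue[] = [
  { number: 1, body: `Friction A\n${signatureMarker("retro:abc123")}` },
  { number: 2, body: `Friction B\n${signatureMarker("retro:abc1234")}` },
];
console.log(exactFilter(candidates, "retro:abc123").map((i) => i.number));
```

Because the closing delimiter is part of the marker, a signature that merely shares a prefix with another one does not match.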
Stable session id to the child (was 'unknown' in cloud): - decideRetroRun returns the resolved sessionId; retroChildArgs forwards --window-start + --session-id; cli.ts parses them; retroCommand prefers the forwarded id. Completes the delta-window CLI plumbing. - stop-retro integration updated to offset state (sentinel retired). Mirrored. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_019VsqfvrhacpctH7x4sfd6Q * test(ZFGWS1): RED — retro Stop hook registered async (slice 7) The retro Stop hook should be registered async:true (non-blocking, background, 600s) so repeated delta fires never block Stop — and NOT asyncRewake (which surfaces stderr into chat, breaking invisibility). RED: it's a plain sync hook. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_019VsqfvrhacpctH7x4sfd6Q * feat(ZFGWS1): GREEN — register the retro Stop hook async:true (slice 7) - config.ts: asyncHook helper ({type,command,async:true}); the Stop bucket registers stop-retro via asyncHook (was a plain blocking hook). async:true backgrounds the whole hook tree (returns immediately, 600s) so repeated delta fires never block Stop; NOT asyncRewake (which would surface stderr → break invisibility). Inner spawnSync stays synchronous within that backgrounded tree. - .claude/settings.json: dogfood parity — retro Stop hook gains async:true. - Also records the slice-6 (SM2.AC1/AC2) R/G/R ledger. config + schema + reconcile suites green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_019VsqfvrhacpctH7x4sfd6Q * test(ZFGWS1): NTB1.AC1 — egress guard holds on every delta window (slice 9) Guard tests: a finding from a re-fire (windowStart > 0) with a secret is redacted and one with an unresolved surface is dropped — windowing slices the input transcript only and never bypasses normalize→resolveSurface→sanitize→buildDraft. 
Protect tests for the inherited egress boundary; pass as the invariant already holds. 12/12 green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_019VsqfvrhacpctH7x4sfd6Q * docs(ZFGWS1): ledger — all 26 scenarios R/G/R; cross-scenario row pending Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_019VsqfvrhacpctH7x4sfd6Q * refactor(ZFGWS1): cross-scenario — temp-file name uniqueness hardening Whole-ticket /quality-review verdict: SHIP (0 must-fix / 0 should-fix). Applied its one worthwhile hardening: the offset-state temp file now carries pid + a per-process counter, so neither two near-simultaneous Stops nor two writes within one process can collide before the atomic rename. Mirrored. 13/13 green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_019VsqfvrhacpctH7x4sfd6Q * docs(ZFGWS1): reconcile impl-plan → implemented; advance to verify - Cross-scenario refactor row marked (8ded451). - impl-plan reconciled: all decisions held; 2 naming nuances (char-not-byte offsets; HTML-comment-indexing dependency) → Known deviations. - Work log: implement complete (26 scenarios / 173 tests; /quality-review SHIP). - Phase → verify. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_019VsqfvrhacpctH7x4sfd6Q * fix(ZFGWS1): tsc — drop unused import + guard mock.calls index in delta test tsc --noEmit (the gate's typecheck) flagged an unused readFileSync import and an unguarded mock.calls[0] destructure; vitest passed but CI's lint job runs tsc. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_019VsqfvrhacpctH7x4sfd6Q * docs(ZFGWS1): verify + audit pass → done /verify: suite 4197/4197 (5 skipped), build ✅, Gherkin 23/23, lint+tsc clean, PR scope matches, dep-drift clean. 
/audit passed: 0 circular deps, config in sync, no ZFGWS1 dead code, duplication 1.3% (pre-existing). verify.md written; all 7 done-when criteria met. Ticket → done. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_019VsqfvrhacpctH7x4sfd6Q * refactor(ZFGWS1): rename OVERLAP_BYTES → OVERLAP_CHARS (retire misnomer) The overlap constant and windowFor operate on JS string code units (transcript.length / String.slice), not bytes — the BYTES name was a documented misnomer (quality-review nice-to-have). Behavior-preserving Tier-1 rename across retro-extract.ts (+ retro-trigger comment + .safeword mirrors), retro-window tests, and the ticket docs. 28/28 retro tests green; mirrors byte-identical. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_019VsqfvrhacpctH7x4sfd6Q * docs: file 1M20EW — retro extractor reports fixed bugs as current friction Follow-up from the ZFGWS1 live fire: sonnet mined the build session's back half and returned 6 findings, 5 of which described bugs ZFGWS1 *fixed in that session*, framed as current friction. Any bug-fixing session would file false issues. Backlog ticket (intake/todo); ZFGWS1's mechanism itself is validated, out of scope. (Live fire also filed + closed ArcadeAI/safeword#581 — the one genuine open finding — after confirming the GitHub in:body signature search works.) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_019VsqfvrhacpctH7x4sfd6Q * feat(BNGK9W): draft spool — persist post-egress drafts on filing failure BNGK9W step 1 (cloud filing, #568): in a Claude cloud container the retro REST transport 401s (GITHUB_TOKEN isn't a GitHub token) and the sanitized drafts are LOST. 
This spool persists the code-assembled drafts ({signature,title,body,labels} — already egress-sanitized, no raw finding text on disk) to .safeword/retro-drafts/ per session, so the agent-filing path can read + post them via GitHub MCP. Fail-open + capped (20/session) + torn-line-tolerant, mirroring the self-report spool. Unit tests: round-trip, append-across-fires, empty/torn→[], cap, only the four draft fields written. 5/5 green; tsc + eslint clean. Foundation slice only — transport selection + agent pickup are follow-on. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_019VsqfvrhacpctH7x4sfd6Q * docs(BNGK9W): reconcile PATH B with ZFGWS1 async hook; log slice 1 The intake design's PATH B (Stop hook surfaces a line → agent files) can't fire from an async:true hook (backgrounded, surfaces nothing). Reconciliation: keep extraction+spool on the async Stop hook; move the agent-filing trigger to a SessionStart/UserPromptSubmit "unfiled drafts" nudge that can surface. Logged the draft-spool foundation slice (fb0d79c). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_019VsqfvrhacpctH7x4sfd6Q * docs(BNGK9W): define-behavior — spec + dimensions + 11 scenarios Authored spec.md (3 JTBD / 6 AC across SM/TB/NTB, reconciled two-path design: silent REST when a token works; spool + agent-subagent filing via GitHub MCP in cloud where REST 401s; nudge off the async Stop hook), dimensions.md, and 11 scenarios across 5 rules (features/cloud-retro-filing-transport.feature, @manual). /self-review stamped. Phase → scenario-gate. 
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_019VsqfvrhacpctH7x4sfd6Q * BNGK9W: apply scenario-gate review findings Revise the cloud-retro-filing scenarios after the independent scenario-gate review (3 must-fix, 4 should-strengthen): Must-fix: - Pin the subagent seam (SM1.AC1) to the mocked transport payload: exactly N posts for N drafts, each body byte-equal incl signature marker. - Rebind the drain assertion (SM1.AC3) from hidden on-disk state to a fresh persisted re-read. - Add the two missing rejection paths: REST partial failure (only the REST-filed draft drains) and subagent partial failure (posted draft drains, the failed one stays spooled and re-nudges). Should-strengthen: - Once-per-batch pinned to a persisted marker surviving fresh evaluation, plus a "batch gains a new draft -> nudges again" case. - Pair negative-existence Thens (SM1.AC2, TB1.AC1) with positive anchors. - Phrasing check names the banned imperative-marker list. - Exactly-one-line gets a "several drafts" Given so cardinality bites. Resync test-definitions.md ledger (14 scenarios) and add REST-partial / subagent-failure partition rows to dimensions.md. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_019VsqfvrhacpctH7x4sfd6Q * BNGK9W: pass scenario-gate — apply round-2 findings, advance to implement Round-2 independent review: 0 must-fix / 3 should-strengthen / 8 looks-good. Applied the 3 should-strengthen: - Drain scenario drives mark-filed inside the When (a no-op stub now fails). - "no re-file" bound to a zero-post assertion on the transport mock. - NTB1.AC1 uses a distinctive leak sentinel so the absence check is load-bearing. Stamp the scenario-gate (same-model independent reviewer; crossModelReview off), advance phase to implement, and record the proof plan + build order. 
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_019VsqfvrhacpctH7x4sfd6Q * BNGK9W: author impl-plan.md at scenario-gate exit Five sections: Approach (riskiest assumption = durable spool drain; load-bearing slice first; per-scenario proof + 5-slice build order), Decisions (drain mechanism, nudge de-dupe key, trigger-hook home, transport selection), Arch alignment (7D8PJP/ZFGWS1/egress boundary), Known deviations (none), Assessment triggers. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_019VsqfvrhacpctH7x4sfd6Q * BNGK9W slice 1: mark-filed/drain primitive for the retro draft spool The shipped spool was append-only — a filed draft could never be removed, so the whole "no duplicates across the fallback" rule had no substrate. Add markDraftsFiled(projectDirectory, sessionId, filedSignatures): rewrites the per-session spool minus the filed signatures (persisted removal, not an in-memory filter, so a fresh read no longer yields them), atomic via temp-write+rename, fail-open. Extract draftLine and reuse it in spoolDrafts. Drives SM1.AC3 "Marking a draft filed drains it from the persisted spool" plus the partial-drain path the REST-partial / subagent-partial scenarios consume. 10/10 unit tests pass; lint + tsc clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_019VsqfvrhacpctH7x4sfd6Q * BNGK9W: relocate the draft spool to templates/hooks/lib (shared by both paths) The cloud fallback's surfacing hook runs under bun in a customer repo and can only import from the materialized .safeword/hooks/ — never from src. Both paths need the spool: the CLI writes + drains it (PATH A), the surfacing hook reads it to decide the once-per-batch nudge (PATH B). 
So move it to templates/hooks/lib, self-contained (node:* only, local SpooledDraft type structurally equal to RetroDraft), exactly the lib/self-report.ts precedent that src already imports by relative path. Behavior-preserving move: byte-parity mirror at .safeword/hooks/lib, registered in schema.ts, test relocated to tests/hooks. 10/10 tests pass, tsc clean (the 11 pre-existing .safeword-mirror rootDir warnings are unrelated). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_019VsqfvrhacpctH7x4sfd6Q * BNGK9W slice 2: once-per-batch cloud-filing nudge decision Add lib/retro-nudge.ts (self-contained, mirrored, schema-registered): - decideRetroNudge: one factual line when unfiled drafts exist AND this batch hasn't been surfaced; else undefined. Fires once per unfiled batch via a persisted, signature-keyed marker (sha256 of the sorted unfiled signatures), compared on each fresh evaluation — so an unchanged batch stays silent while a batch that gains a draft nudges again. - formatRetroNudge: names the count + spool path, no imperative marker (run/file/please/you-must/should as whole words), single line. Muted by design (user steer): the line is a system-reminder statement the model reads, never a command. Also records the muted footprint decision in impl-plan. 6/6 unit tests pass; tsc + parity clean. 
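The once-per-batch marker described above (sha256 of the sorted unfiled signatures) can be sketched like this. The helper names, the newline join, and the in-memory lastMarker argument are assumptions; the real decideRetroNudge persists the marker and formats the reminder line.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: a stable, order-independent key for a batch of drafts.
function batchMarker(signatures: readonly string[]): string {
  const sorted = [...signatures].sort();
  // The newline separator is an assumption; any unambiguous delimiter works.
  return createHash("sha256").update(sorted.join("\n")).digest("hex");
}

// Returns the new marker when a nudge is due, or undefined to stay silent.
function nudgeMarker(
  unfiled: readonly string[],
  lastMarker: string | undefined,
): string | undefined {
  if (unfiled.length === 0) return undefined; // empty spool: silent
  const marker = batchMarker(unfiled);
  return marker === lastMarker ? undefined : marker; // unchanged batch: silent
}

const first = nudgeMarker(["retro:b", "retro:a"], undefined);
console.log(first !== undefined); // a new batch nudges
console.log(nudgeMarker(["retro:a", "retro:b"], first)); // same batch, silent
console.log(nudgeMarker(["retro:a", "retro:b", "retro:c"], first) !== undefined); // grown batch nudges again
```

Keying on the sorted signatures rather than on a count means a batch that drains one draft and gains another still registers as changed.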
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_019VsqfvrhacpctH7x4sfd6Q * BNGK9W: ledger + work log for slices 1-2 (drain + once-per-batch nudge) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_019VsqfvrhacpctH7x4sfd6Q * BNGK9W slice 3: transport-selection wiring — spool, try-REST, drain filed runRetro now (opt-in via projectDirectory) spools the post-egress drafts BEFORE filing, so a REST auth failure (cloud #568) no longer loses them; after triage it drains only the drafts that reached the tracker and reports agentFilingNeeded when any remain. triage gains filedSignatures (created OR matched an existing issue) — the drafts safe to drain; deferred/failed stay spooled for the agent path. retroCommand passes projectDirectory so the async Stop-hook path spools silently. Existing runRetro callers omit projectDirectory and keep REST-only behavior. Covers SM1.AC1/AC2: valid token → all filed + spool drained + agentFilingNeeded false; REST 401 → all retained + true; partial → only the rejected draft retained. 110/110 retro tests pass; lint + tsc clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_019VsqfvrhacpctH7x4sfd6Q * BNGK9W slice 4: fileSpooledDrafts — the agent filing seam (post verbatim, drain) Add fileSpooledDrafts(projectDirectory, sessionId, post): reads the session spool, posts each draft's code-assembled body VERBATIM through the injected transport (the agent's GitHub MCP in cloud, mocked in tests), then drains exactly the drafts that posted. A draft whose post throws stays spooled for retry — so a later boundary re-nudges and findings are never dropped. The spool already holds post-egress bodies, so "verbatim" carries no un-sanitized text; the cloud subagent's MCP filing procedure mirrors this loop. 
Covers SM1.AC1: exactly-N verbatim posts + full drain on success; partial failure retains only the un-posted draft and a later boundary still nudges for it. 12/12 tests pass; parity + tsc clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_019VsqfvrhacpctH7x4sfd6Q * BNGK9W slice 5: UserPromptSubmit surfacing hook for the cloud-filing nudge Add prompt-retro-nudge.ts (mirrored, schema-registered, wired into config.ts + dogfood settings.json UserPromptSubmit): when the async Stop hook has spooled unfiled drafts (REST 401 in cloud), the next user prompt surfaces ONE factual line via hookSpecificOutput.additionalContext — a system-reminder the model reads and the user never sees as a chat message (docs-confirmed), muted by design. Fires once per unfiled batch (decideRetroNudge marker); silent when the spool is empty; never blocks the prompt. UserPromptSubmit (not SessionStart) is the right boundary — it fires after each Stop within the same ephemeral cloud session, so drafts spooled mid-session get surfaced; SessionStart fires only once, too early. Exempted in hook-coverage (covered deterministically by tests/integration/prompt-retro-nudge.test.ts). 43/43 affected tests pass; parity + tsc + config tests clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_019VsqfvrhacpctH7x4sfd6Q * BNGK9W: cover NTB1.AC1 (on-disk no-leak) + drained-boundary no-refile/no-renudge - NTB1.AC1: a finding carrying a recognized secret + a customer path flows through the REAL egress pipeline; with a REST 401 the draft stays spooled, and the spool FILE carries neither the secret nor the path (only the four sanitized fields) — the no-leak guarantee holds on disk, not just upstream. - Drained boundary: fileSpooledDrafts over an empty spool posts nothing and decideRetroNudge stays silent — no re-file, no re-nudge. 
Both drive already-shipped behavior (egress + spool + drain); coverage added here. 20/20 in the two files pass; lint clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_019VsqfvrhacpctH7x4sfd6Q * BNGK9W: teach the filing guide to file cloud-spooled retro drafts The nudge points the agent at self-report-filing.md, but that guide only covered the self-report spool (`safeword self-report --format issue`) — an agent following it would read the wrong spool and file nothing. Add a "Retro drafts" section: read the JSONL spool named in the reminder, then file each draft by the SAME procedure (upstream repo, signature dedup, cap, verbatim body). No drain step needed — the nudge fires once per unfiled batch and the cloud container is ephemeral. Also finalize the R/G/R ledger for all 14 scenarios. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_019VsqfvrhacpctH7x4sfd6Q * BNGK9W: work log for slices 3-5 + guide fix Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_019VsqfvrhacpctH7x4s…
1 parent e27e654 commit b1355af

130 files changed

Lines changed: 13470 additions & 233 deletions


.claude/settings.json

Lines changed: 17 additions & 0 deletions
@@ -118,6 +118,14 @@
             "command": "bun \"$CLAUDE_PROJECT_DIR\"/.safeword/hooks/prompt-questions.ts"
           }
         ]
+      },
+      {
+        "hooks": [
+          {
+            "type": "command",
+            "command": "bun \"$CLAUDE_PROJECT_DIR\"/.safeword/hooks/prompt-retro-nudge.ts"
+          }
+        ]
       }
     ],
     "Stop": [
@@ -144,6 +152,15 @@
             "command": "bun \"$CLAUDE_PROJECT_DIR\"/.safeword/hooks/stop-self-report.ts"
           }
         ]
+      },
+      {
+        "hooks": [
+          {
+            "type": "command",
+            "command": "bun \"$CLAUDE_PROJECT_DIR\"/.safeword/hooks/stop-retro.ts",
+            "async": true
+          }
+        ]
       }
     ],
     "PreToolUse": [

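The commit message describes the `prompt-retro-nudge.ts` hook wired in above as surfacing one line through `hookSpecificOutput.additionalContext`, once per unfiled batch, and staying silent when the spool is empty. A minimal sketch of that decision follows. The hook's source is not shown in this view, so `NudgeState`, `buildNudge`, the message wording, and the example spool file name are all hypothetical.

```typescript
// Sketch only: names, message text, and the spool file name are hypothetical;
// the real prompt-retro-nudge.ts is not shown in this view.
interface NudgeState {
  unfiledDrafts: number; // drafts currently sitting in the spool
  alreadyNudged: boolean; // true once this batch has been surfaced
  spoolPath: string; // where the JSONL spool lives
}

interface UserPromptSubmitOutput {
  hookSpecificOutput: {
    hookEventName: "UserPromptSubmit";
    additionalContext: string;
  };
}

// Returns the JSON payload to print on stdout, or undefined to stay silent.
function buildNudge(state: NudgeState): UserPromptSubmitOutput | undefined {
  // Silent when the spool is empty or this batch was already surfaced.
  if (state.unfiledDrafts === 0 || state.alreadyNudged) return undefined;
  return {
    hookSpecificOutput: {
      hookEventName: "UserPromptSubmit",
      additionalContext:
        `${state.unfiledDrafts} unfiled retro draft(s) in ${state.spoolPath}; ` +
        "see self-report-filing.md to file them.",
    },
  };
}

const output = buildNudge({
  unfiledDrafts: 2,
  alreadyNudged: false,
  spoolPath: ".safeword/retro-drafts/example.jsonl",
});
// Printing nothing leaves the prompt untouched; the hook never blocks.
if (output) console.log(JSON.stringify(output));
```

Returning `undefined` rather than an empty payload keeps the "never blocks the prompt" property easy to test: the caller prints nothing and exits normally.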
.gitignore

Lines changed: 1 addition & 0 deletions
@@ -9,6 +9,7 @@ examples/
 # Safeword - Local cache and transient state
 .safeword/.update-cache.json
 .safeword/self-reports/
+.safeword/retro-drafts/
 .safeword-project/quality-state*.json
 .safeword-project/cursor-run-identity.json
 .safeword-project/codex-run-identity.json
Lines changed: 108 additions & 0 deletions
@@ -0,0 +1,108 @@
+---
+id: 1B46CT
+slug: retro-legacy-retirement
+type: task
+phase: todo
+status: todo
+parent: RV9JT4-retro-transcript-mining
+scope: |
+  Retire the retro code/paths that ZFGWS1 (delta re-arm + signature dedupe) and
+  the Codex/Cursor invisibility tickets (#551/#552) make dead. Grounded in a usage
+  sweep + an independent quality-review (2026-06-30). Grouped by WHEN deletion is
+  safe.
+
+  VERIFICATION (quality-review C1 — load-bearing): do NOT verify with knip. Both
+  `knip.json` and `packages/cli/knip.json` IGNORE `templates/**` and `.safeword/**`
+  (knip.json:2, packages/cli/knip.json:2), so knip reports zero orphans for these
+  functions whether or not callers exist — the check is unfalsifiable. Verify with
+  a REPO-WIDE GREP (`grep -rn "<symbol>" packages/cli/src packages/cli/templates
+  .safeword tests`) returning zero hits, plus a green build + test run.
+
+  EVERY deletion PR must, in the SAME PR (quality-review C2): edit the
+  `templates/**` source, sync the `.safeword/**` byte-mirror, update `schema.ts`
+  managed-file pairs if an entry is removed, AND drop the method from all test
+  fakes — or the `IssueTracker` type check / live `.safeword` hooks break.
+
+  TIER 1 — delete WITH ZFGWS1:
+  - `searchByTitle` title-dedupe path: transport impl (github-rest.ts:70), the
+    `IssueTracker` port entry (triage.ts:39), the call (triage.ts:82), AND the
+    test fakes (triage.test.ts:50, github-rest.test.ts:67, tests/commands/
+    retro.test.ts:19). Replaced by signature matching. Verified safe: no dynamic
+    imports, no string refs, no guide calls it (the self-report + retro guides
+    dedupe via `gh`/`--findings`, not this method).
+  - The fire-once `hasNudged` gate INSIDE `decideRetroRun` (retro-trigger.ts:318)
+    — replaced by the re-arm offset state. (File-based sentinel, not an in-memory
+    boolean — C3 wording fix.) The `hasNudged`/`markNudged` helpers themselves
+    STAY until Tier 2 (still used by `decideRetroNudge`).
+  - Collateral (already in ZFGWS1 scope; note here so it's not forgotten):
+    `model:'haiku'` at retro.ts:113 + retro-extract.ts:155 → sonnet; and
+    `buildDigest`'s head-cap (retro-extract.ts:210-233) does NOT delete but
+    CHANGES meaning (cap now applies to a pre-sliced window, not the head).
+
+  TIER 2 — delete WITH #551/#552 (Codex/Cursor invisibility):
+  - The in-conversation nudge path: `decideRetroNudge` + `buildRetroNudge` +
+    `hasNudged`/`markNudged`/`sentinelPath`/`sentinelName` (retro-trigger.ts). For
+    Claude these are ALREADY dead (stop-retro.ts uses `decideRetroRun`); they
+    survive only via `codex/stop.ts` + `cursor/stop.ts`. Confirmed consumers
+    (incl. tests: retro-trigger.test.ts, codex/cursor/stop-retro integration
+    tests). Same-PR rule applies: templates + `.safeword` mirror + schema.ts +
+    integration tests together.
+
+  TIER 3 — consolidation design call (NOT a blind delete):
+  - Deterministic self-report spool (`stop-self-report.ts` + `lib/self-report.ts`)
+    vs qualitative invisible retro: different CAPTURE (allowlisted spool signals vs
+    LLM extraction), CADENCE (every Stop w/ signals vs once+re-arm), and EGRESS
+    (agent files w/ title-dedupe vs code files w/ signature-dedupe). Their FILING
+    paths overlap. Folding the spool into retro's invisible+egress pipeline is a
+    sound design QUESTION — BUT `stop-self-report.ts` is the ONLY remaining
+    in-conversation `additionalContext` surface after 7D8PJP; folding MUST
+    explicitly re-home that signal or it's lost. Own ticket; keep separate until
+    decided.
+
+  TICKETS: 1FGE1C (robust-tracker-dedup) — signature dedupe absorbed by ZFGWS1 →
+  close/annotate once ZFGWS1 covers its done_when.
+out_of_scope: |
+  - The deletions before ZFGWS1 / #551 / #552 land — this is the PLAN + the
+    post-merge grep-driven execution, not premature removal.
+  - #563 (cost gate) and 7ZCKS6 (eval) — still live, not retired.
+done_when: |
+  - Tier 1 removed in ZFGWS1's PR; `grep -rn "searchByTitle" packages/cli .safeword`
+    returns zero hits; build + tests green (NOT a knip check — knip ignores
+    templates/.safeword).
+  - Tier 2 removed when #551/#552 land; grep for the nudge+sentinel symbols returns
+    zero; `.safeword` mirror + schema.ts + integration tests updated in the same PR.
+  - Tier 3 has a recorded decision (fold w/ re-homed in-conversation surface, or
+    keep separate) in its own ticket.
+  - 1FGE1C closed/annotated as absorbed by ZFGWS1.
+created: 2026-06-30T17:20:00.000Z
+last_modified: 2026-06-30T17:20:00.000Z
+---
+
+# Retire legacy retro paths after ZFGWS1 + Codex/Cursor invisibility
+
+**Goal:** Track + drive the dead-code retirement the recall rework (ZFGWS1) and
+Codex/Cursor invisibility (#551/#552) enable, verified by grep + build/test (knip
+is blind to `templates/**` and `.safeword/**`).
+
+**Parent:** RV9JT4. **Depends on:** ZFGWS1 (Tier 1), #551/#552 (Tier 2).
+
+## Usage sweep + quality-review (2026-06-30, grounded)
+
+- `searchByTitle`: callers = triage.ts:82 (+ :39 port, github-rest.ts:70 impl) +
+  3 test fakes. No dynamic imports / string refs / guide calls. → Tier 1.
+- nudge path (`decideRetroNudge`/`buildRetroNudge`/`hasNudged`/`markNudged`/
+  `sentinelPath`): codex/stop.ts + cursor/stop.ts + tests only. → Tier 2.
+- knip IGNORES `templates/**` + `.safeword/**` (knip.json:2, packages/cli/
+  knip.json:2) → verify with grep, not knip.
+
+## Work Log
+
+- 2026-06-30T17:20Z Captured the three-tier retirement plan from a usage sweep.
+- 2026-06-30T17:27Z /quality-review (independent subprocess) → REQUEST CHANGES,
+  folded in: (C1) knip is blind to templates/.safeword → verify by grep + build/
+  test, not knip [the done_when fix]; (C2) every deletion PR must update templates
+  + `.safeword` mirror + schema.ts + test fakes together; (C3) "boolean sentinel"
+  → "fire-once `hasNudged` gate" (file-based). Added collateral (`model:'haiku'`
+  ×2, buildDigest head-cap semantic change) and the Tier-3 nuance (folding must
+  re-home stop-self-report's in-conversation surface). Tier 1/2 consumer lists
+  confirmed correct + safe.
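The ticket above asks for a repo-wide grep returning zero hits, because knip ignores `templates/**` and `.safeword/**`. The same check could be scripted, for example as a CI step. This is a rough sketch under assumptions: `findHitsInText` and `grepTree` are hypothetical helper names, every root is assumed to be a directory, and files are read as UTF-8 text with no binary detection. The ticket itself only specifies the `grep -rn` command.

```typescript
import { existsSync, readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Hypothetical helper types and names; the ticket only specifies `grep -rn`.
interface Hit {
  file: string;
  line: number; // 1-based, like grep -n
  text: string;
}

// Pure matcher: every line of `text` containing `symbol` becomes a hit.
function findHitsInText(file: string, text: string, symbol: string): Hit[] {
  const hits: Hit[] = [];
  text.split("\n").forEach((lineText, index) => {
    if (lineText.includes(symbol)) {
      hits.push({ file, line: index + 1, text: lineText.trim() });
    }
  });
  return hits;
}

// Recursive walk over a directory root; a missing root yields no hits.
function grepTree(root: string, symbol: string): Hit[] {
  if (!existsSync(root)) return [];
  const hits: Hit[] = [];
  for (const entry of readdirSync(root, { withFileTypes: true })) {
    const path = join(root, entry.name);
    if (entry.isDirectory()) {
      hits.push(...grepTree(path, symbol));
    } else if (entry.isFile()) {
      hits.push(...findHitsInText(path, readFileSync(path, "utf8"), symbol));
    }
  }
  return hits;
}

// Roots copied from the ticket's grep command.
const roots = ["packages/cli/src", "packages/cli/templates", ".safeword", "tests"];
const hits = roots.flatMap((root) => grepTree(root, "searchByTitle"));
console.log(hits.length === 0 ? "zero hits" : hits);
```

Unlike knip, this makes no attempt to understand imports. A plain substring match is the point here: it also catches string references and guide mentions that a dependency graph would miss.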
Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
+---
+id: 1FGE1C
+slug: robust-tracker-dedup
+parent: RV9JT4-retro-transcript-mining
+type: task
+phase: intake
+status: todo
+created: 2026-06-28T01:00:02.710Z
+last_modified: 2026-06-28T01:00:02.710Z
+scope: |
+  Replace retro's fragile fuzzy-title dedup with a robust scheme on the upstream
+  GitHub adapter + triage:
+  1. Stamp every retro-filed issue with a `retro` label and a hidden body
+     marker `<!-- retro-sig: retro:<hash> -->`. The marker is appended in
+     `buildDraft` AFTER sanitize (assembleBody takes a Finding with no
+     signature; buildDraft has the signature) so the sanitizer never touches it.
+  2. Dedup by the strongly-consistent issues-LIST API + exact marker match —
+     NOT the eventually-consistent search API. List `state=all` (see closed
+     policy below), paginated with a page cap, and scan returned bodies for the
+     marker (the list endpoint returns `body`, so no per-issue GET).
+  3. In-run signature map (`Map<signature, IssueReference>`), checked BEFORE the
+     list lookup and populated on BOTH create and list-hit, so two findings
+     sharing a signature in one run can't double-create or double-bump within
+     the consistent-list window. This also covers the first-ever run (before the
+     label propagates).
+  4. Ensure the `retro` label exists before the first list (idempotent create,
+     ignore 422-already-exists).
+  5. Closed-issue policy: match CLOSED retro issues too. On a closed match, do
+     NOT create a duplicate and do NOT auto-reopen; post a brief "recurred after
+     close" comment so a regression is visible without resurrecting the issue.
+  6. Retire `searchByTitle` — remove the title-search dedup path entirely (no
+     dual path that could reintroduce the dup bug).
+  The IssueTracker port gains `ensureLabel` + `listByLabel` (state-parameterized);
+  the tested core (egress/pipeline/ledger) is unchanged.
+out_of_scope: |
+  - Semantic dedup vs HUMAN-filed tickets (no shared key) — that's a separate
+    LLM-triage concern; this ticket is exact retro-vs-retro dedup only.
+  - Multi-provider (Linear) adapters — routing stays upstream GitHub (RV9JT4).
+  - The cross-session near-simultaneous race (two installs filing the same novel
+    signature within the list→create window) — inherent limit; periodic merge is
+    the backstop, not in scope.
+  - Maintainer REMOVES the `retro` label from an issue → it drops out of the list
+    and a recurrence may re-file. Accepted limitation (same class as the cross-
+    session race); not defended here.
+  - Auto-reopening maintainer-closed issues — deliberately not done (a comment is
+    the signal; reopening is too aggressive).
+done_when: |
+  - A retro-filed issue carries the `retro` label and an exact, anchored
+    `<!-- retro-sig: retro:<12-hex> -->` body marker; a test asserts the marker
+    round-trips through body assembly + sanitize and is matchable by the scan.
+  - The `retro` label is ensured to exist before the first list (idempotent).
+  - Dedup uses the issues-list API + exact marker match; a known OPEN signature
+    never creates a second issue even when GitHub search hasn't indexed it yet.
+  - A known CLOSED signature creates no new issue and does not reopen; it leaves a
+    "recurred after close" comment.
+  - Two findings with the same signature in one run create exactly one issue
+    (in-run map), and re-running on the same transcript does not double-file.
+  - Title drift on a known signature does not fork a new issue.
+  - List pagination is bounded by a page cap; behavior at the cap is logged
+    (truncation = possible miss, backstopped by periodic merge).
+  - Scenarios green; /verify passes.
+---
+
+# Robust dedup: signature marker + label-scoped list lookup (not fuzzy title search)
+
+**Goal:** Make retro's "never a duplicate issue" guarantee actually hold, by
+deduping on a stable embedded signature via the strongly-consistent issues-list
+API instead of fuzzy, eventually-consistent title search.
+
+**Why:** Title-search dedup (RV9JT4's first cut) is fragile — GitHub search
+indexing-lag, relevance ranking past the first results page, and title drift can
+all miss an existing issue and file a duplicate, breaking SM1.AC2.
+
+**Parent:** RV9JT4-retro-transcript-mining. Flagged by two independent reviews
+(S2) and deferred from RV9JT4 as a contained follow-up.
+
+## Work Log
+
+- 2026-06-28T01:00:02.710Z Started: Created ticket 1FGE1C (sub-ticket of RV9JT4)
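Scope items 1, 2, 3, and 5 of the ticket above (marker, exact marker scan, in-run map, closed-issue policy) can be sketched as a single pure decision function. This is an illustration under assumptions, not the shipped design: the `Decision` union, `decide`, and the `skip` outcome for a second in-run finding are hypothetical, and `IssueReference` is given a guessed shape. Only the marker format and the `Map<signature, IssueReference>` idea come from the ticket.

```typescript
// Guessed shape: the ticket names IssueReference but does not define it.
interface IssueReference {
  number: number;
  state: "open" | "closed";
}
// The list endpoint returns `body`, so the scan needs no per-issue GET.
interface ListedIssue extends IssueReference {
  body: string;
}

// Hypothetical outcome type for one finding.
type Decision =
  | { action: "create" }
  | { action: "skip"; issue: IssueReference } // already handled this run
  | { action: "bump"; issue: IssueReference } // known OPEN issue
  | { action: "comment-recurred"; issue: IssueReference }; // known CLOSED issue

// Exact marker from the ticket: <!-- retro-sig: retro:<12-hex> -->
const MARKER = /<!-- retro-sig: (retro:[0-9a-f]{12}) -->/;

function buildMarker(signature: string): string {
  return `<!-- retro-sig: ${signature} -->`;
}

function extractSignature(body: string): string | undefined {
  return MARKER.exec(body)?.[1];
}

function decide(
  signature: string,
  listed: ListedIssue[],
  inRun: Map<string, IssueReference>,
): Decision {
  // In-run map is checked BEFORE the list lookup.
  const seenThisRun = inRun.get(signature);
  if (seenThisRun) return { action: "skip", issue: seenThisRun };

  const match = listed.find((issue) => extractSignature(issue.body) === signature);
  // On create, the caller records the new issue in `inRun` once it exists.
  if (!match) return { action: "create" };

  // Populate on list-hit so a second finding cannot double-bump.
  inRun.set(signature, { number: match.number, state: match.state });
  return match.state === "closed"
    ? { action: "comment-recurred", issue: match }
    : { action: "bump", issue: match };
}
```

Keeping `decide` free of network calls means title drift, closed matches, and same-run duplicates can each be covered by a plain unit test. The `ensureLabel` and `listByLabel` calls would sit in the caller.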
Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
+---
+id: 1M20EW
+slug: retro-fixed-vs-present-friction
+type: task
+phase: intake
+status: todo
+created: 2026-06-30T20:43:24.401Z
+last_modified: 2026-06-30T20:43:24.401Z
+---
+
+# Retro extractor reports fixed/discussed bugs as current friction
+
+**Goal:** Stop the invisible retro from filing issues for bugs the session
+already FIXED (or merely discussed) — only surface friction that is still live.
+
+**Why:** Discovered during the ZFGWS1 live fire (2026-06-30). Sonnet mined the
+back half of the ZFGWS1 build session and returned 6 sanitized findings — 5 of
+which described the very bugs ZFGWS1 *fixed in that session* (haiku default,
+once-per-session sentinel, title dedupe, blocking hook, missing session id),
+phrased as present-tense friction. The extractor can't distinguish "we fixed X
+this session" from "X is broken." For a self-reporting feature this is high-impact:
+**any** session that fixes safeword bugs will file false issues for the bugs it
+just resolved — exactly the sessions most likely to be substantial and trigger
+retro. (The 6th finding — the GitHub-indexing risk — was genuine and was filed +
+then closed as #581 after the indexing assumption was empirically confirmed.)
+
+## Evidence
+
+- Live-fire transcript window: `--window-start 2000000` over the ZFGWS1 session;
+  `model=sonnet rawFindings=9 encounters=6`.
+- 5/6 encounters were fixed-this-session bugs framed as current friction.
+- Egress + signature + filing + dedupe all worked correctly — the gap is purely
+  the extractor's temporal framing of findings.
+
+## Sketch (not yet designed — intake)
+
+Candidate directions to weigh in spec/figure-it-out:
+
+- Tighten the extraction system prompt to require findings be friction that is
+  STILL present at the end of the window (ignore problems the session resolved).
+- Post-filter: drop findings whose surface/title was touched by a commit in the
+  same session (the transcript shows the fix landing).
+- Accept-and-dedupe: rely on the occurrence ledger + human triage (weakest —
+  still files the false issue once).
+
+## Out of scope
+
+- ZFGWS1's shipped mechanism (delta re-arm, sonnet, async hook, signature dedupe)
+  — all validated by the live fire; this is a follow-up refinement, not a regression.
+
+## Work Log
+
+- 2026-06-30T20:43Z Created from the ZFGWS1 live fire — extractor reported 5/6
+  already-fixed bugs as current friction. Backlog (todo); needs intake/spec.
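The ticket above is still at intake, so none of its three directions is designed yet. Purely to make the "post-filter" candidate concrete, here is one possible reading of it. Everything in the sketch is an assumption: the `Finding` shape, the `partitionByResolved` name, and the idea that a set of surfaces touched by same-session commits is available as input. The example surface strings are invented for illustration.

```typescript
// Hypothetical shape: the real finding schema is not shown in this view.
interface Finding {
  title: string;
  surface: string; // the safeword surface the friction is attributed to
}

interface FilterResult {
  kept: Finding[]; // still-live friction, eligible for filing
  dropped: Finding[]; // likely fixed this session; kept for logging
}

// Assumes the caller can derive `touchedSurfaces` from the session's commits;
// how to derive it is an open design question in the ticket.
function partitionByResolved(
  findings: Finding[],
  touchedSurfaces: Set<string>,
): FilterResult {
  const result: FilterResult = { kept: [], dropped: [] };
  for (const finding of findings) {
    const bucket = touchedSurfaces.has(finding.surface) ? result.dropped : result.kept;
    bucket.push(finding);
  }
  return result;
}

const example = partitionByResolved(
  [
    { title: "haiku default misses findings", surface: "retro-extract" },
    { title: "GitHub indexing lag risk", surface: "github-rest" },
  ],
  new Set(["retro-extract"]),
);
console.log(`kept=${example.kept.length} dropped=${example.dropped.length}`);
```

Returning the dropped findings instead of discarding them would let a later eval (such as 7ZCKS6) measure how often the filter removes genuine friction, which is the main risk of this direction.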
