docs: make workstreams deliver PR-ready changes by durch · Pull Request #75 · open-horizon-labs/oh-omp

durch · 2026-06-13T20:30:13Z

Summary

amend the oh-omp expert system so workstreams default to useful PR/MR-ready output instead of plan-only/governance-only stops
treat branch creation/use, logical commits, push to origin, and opening/preparing PR/MR as normal reviewable ship mechanics inside invoked workstream/ship flows
preserve human-owned gates for merge, release/tag/publish, destructive/non-reviewable external side effects, model binding/rebinding, and unresolved material decisions
update the workstream-expert builder prompts so future projects inherit the same PR-as-outcome semantics

Verification

git diff --check -- .oh/workstreams/oh-omp/EXPERT-SYSTEM.md packages/coding-agent/src/prompts/agents/workstream-expert.md packages/coding-agent/src/prompts/commands/workstream-expert.md
adversarial grep for old over-gating language and positive PR-as-outcome law
Superego review: DECISION: ALLOW, CONFIDENCE: high

Notes

This is governance/prompt law only; no product runtime code changed.
Ship stops before merge. Release remains separate and human-owned.
Branch includes the existing local ahead commits plus the latest protocol-law commit 2bed71690.

Summary by CodeRabbit

Bug Fixes
- Fixed bash command pattern matching to avoid false positives on quoted shell content.
- Corrected session resumption turn numbering to prevent collisions after resuming sessions.
- Improved tool validation error messages with required field structure hints.
- Refined Anthropic prompt caching to avoid marking synthetic context with cache markers.
New Features
- Added configurable working-set retention settings for preserving frequently-accessed content during context compression.
- New session management CLI for mapping and tracking session identifiers.
Improvements
- Enhanced bash interceptor to detect composite shell commands and apply stricter rule matching.
- Improved grepping to skip symbol-index searches when precision filters are active.

Read-only miner that replays the production assembler transform (dedup/read/warm codecs, hot window) over recorded session JSONL and measures context-illusion integrity: re-fetch of compressed content, recall vs full-price recovery split, recall efficacy, and wasted-token cost. Mutation-justified re-reads are excluded; output is aggregate-only with hashed samples (no payloads). Part of harness-ergonomics Track A.

Sweeps eviction age, token cap, and entry rule for the proposed assembler working-set policy over the recorded corpus, with a cache-aware cost model (write-priced savings vs read-priced retention rent, dedup back-ref residue). Miner helpers exported for reuse; miner main guarded with import.meta.main.

Files the model actively re-reads (2+ unchanged re-reads) keep their canonical first-read turn verbatim beyond the hot window, so dedup back-references resolve to full content instead of a codec skeleton. Pins are version-keyed (path + content hash) so interleaved range reads don't reset candidacy; one version per path; evicted after N untouched turns or beyond a token cap; budget pressure still applies. Pure transformMessages policy: deterministic, cache-stable (pinned turns never change representation while pinned), no protocol changes. Off unless explicitly enabled; runtime wires it from assembler settings (default on, evict=8 turns, cap=16K). Decisions report reason 'working-set' for observability. Verified: 68 transform tests pass; full-corpus replay via mine-context-illusion --working-set reclassifies 55 historical read-only illusion breaks as served-by-pin.

… change A pinned version is disqualified the moment the same read view (path + offset/limit) returns different content after its last touch (re-read after an edit). Without this, provably outdated file content stayed pinned verbatim for up to evictAfterTurns after a mutation - worst exactly in edit-heavy sessions. Different ranges of a file remain independent (pagination is not proof of change); unproven staleness still ages out. Found by dogfooding-hazard analysis before it bit: the edit-then-reread cycle is the dominant pattern in sessions working on this very code.

Stubs now tell the model how to recover at the point of need instead of relying on system-prompt instructions 100K tokens away: - read stubs: 'recall("<path>") expands' when the file is unchanged in session, or 'edited since this read - re-read for current state' when a later in-session mutation makes the snapshot provably stale - warm stubs: path-scoped recall recipe when the call targeted a file, generic 'recall expands' otherwise Staleness comes from a pure transcript scan (mutating tool calls per path) threaded into CodecContext as mutatedPaths - no fs I/O in the transform. Recipes are path-query based, not turn-based: recall turn numbering is process-scoped and diverges from transcript numbering (documented seam, follow-up tracked).

The per-turn assembly dashboard line now shows 'N pinned (working set)' when retention pins are active, making the policy observable while dogfooding without opening the prompt inspector.

When validation fails on a missing required property, append a compact required-only skeleton derived from the schema (two levels deep). The dominant observed failure mode is models confidently filling sibling fields while omitting a container property (e.g. task's tasks array, 3/3 sampled errors in session-log mining); naming the missing property alone leaves the model guessing at the shape. Applies to every tool. Type-mismatch failures are unchanged (no skeleton noise when the model already knows the shape).

Tiered recovery ladder on every store operation: 1. reopen the table and retry - the dominant failure is a stale handle after another agent process compacts version manifests (data intact, handle dead) 2. salvage readable rows -> timestamped JSON backup in agentDir -> drop -> recreate -> re-insert salvage -> retry Non-storage errors (bad filters, programmer errors) propagate unchanged; rebuild only triggers on classified Lance storage failures. Compaction cutoff widened from 2 minutes to 24 hours, closing the cross-process prune race that produced ~49% of todo/todos tool errors in session-log mining. Worst-case loss: rows unreadable at corruption time, with the backup attempt on disk either way. Update remains delete+re-insert but is wrapped as a single recovery op so a retry can never lose the row. Verified with real on-disk corruption: destroying the table directory mid-session rebuilds automatically and the store stays operational.

…all recovery Layer 1 - store data bug: the ingest turn counter was process-lifetime while the recall store is session-persistent. After a resume, new rows reused turn numbers already occupied by pre-restart rows of the same session, so recall(turn=N) merged unrelated turns. The counter now seeds from RecallStore.maxTurn(sessionId) at wiring time (new method, 5s timeout, fail-open to 0 with a warning). Layer 2 - lying affordance: system prompt, recall tool prompt, and composer meta-prompt all claimed stubs carry recall-compatible turn numbers. They never did. All three now teach the shipped mechanism: follow the stub's inline recipe (recall("<path>"), or re-read when flagged 'edited since this read'); for pathless stubs use recall(query=...) first and take turn numbers from recall results. One-step turn-precise expansion from stubs (toolCallId-based) is deferred behind dogfood telemetry per the turn-identity decision.

… stability Front-placed injections (recalled-context, concept-graph, assembly summary) regenerated message 0 every request, invalidating the Anthropic prompt cache for the entire transcript each turn. Measured via proxy: cache write/read 2.3-8.0 for assembler traffic vs ~0.1 control, with reads capped at the system+tools prefix. Tail placement preserves the conversation prefix so writes are incremental.

…equests Two wrong-result bugs (silent bad data, worse than errors): - grep: the RNA structural shortcut replaced text grep while ignoring glob/type/case/multiline/context/offset entirely - a glob-filtered search could return TS symbols and never search the target files. RNA now only fires when no precision filters are present (explicit pre/post only, since context defaults come from settings), and its header states it is a symbol index with the escape hatch spelled out. - read: the RNA structural-view gate checked !offset only, so raw:true and explicit limit still returned the skeleton instead of file content. raw/limit/offset are explicit requests for actual content and now all bypass the structural view.

Two precision fixes to the guard engine: - Quote-blanking: rules now match against the command with quoted segments blanked. Quoted content is shell data, not syntax - echo of literal '>' markers is not a redirection, 'sed -i' inside a python -c string is not a sed invocation. Fixes the echo-as-redirection FP class for every rule generically. - simpleCommandsOnly: grep/find rules skip composite commands (unquoted pipes, chains, substitution). 'find ... | xargs wc' has no find-tool equivalent; blocking it forced guard-sandwich loops. Mutation rules (sed/perl/awk, echo redirection) still apply everywhere. 9 regression tests covering both FP classes and the preserved blocks.

recall(turn=N) calls returning empty/no content are failed recovery attempts - the evidence gate for the deferred toolCallId-based stub expansion slice. Report line now shows 'of which empty/failed: N'.

coderabbitai · 2026-06-13T20:30:27Z

📝 Walkthrough

Walkthrough

This PR introduces a comprehensive governance framework for the oh-omp workstream, improves Anthropic prompt caching by tracking cache-eligible messages early in conversion, implements working-set retention to preserve actively re-read file content across message transforms, enhances codec recovery markers for staleness detection, refines tool interception logic, and adds analysis utilities for measuring context integrity.

Changes

Workstream Governance and Context Assembly Improvements

Layer / File(s)	Summary
Workstream governance framework and system law `.oh/workstreams/oh-omp/EXPERT-SYSTEM.md`, `.oh/workstreams/oh-omp/FRAME.md`, `.oh/workstreams/oh-omp/FRAME-model-routing-superseded.md`, `packages/coding-agent/src/prompts/agents/workstream-expert.md`, `packages/coding-agent/src/prompts/commands/workstream-expert.md`	Establishes durable project law (EXPERT-SYSTEM.md) with dual-chapter acceptance criteria, authority chains, phase mapping (`execute`/`ship`/`release`), superego phase-handoff gates, and model roster bindings; creates FRAME documents for oh-omp and model-routing scopes with execution contracts, stop conditions, and material uncertainties.
Anthropic prompt caching with cache-candidate message marking `packages/ai/CHANGELOG.md`, `packages/ai/src/providers/anthropic.ts`, `packages/ai/test/anthropic-alignment.test.ts`	Threads cache-candidate message indices through conversion (`convertAnthropicMessagesWithMetadata`) to mark non-synthetic user messages and tool-result aggregations as cache-eligible early, then applies cache breakpoints near those marks instead of deriving from role alone.
Validation error enrichment `packages/ai/src/utils/validation.ts`, `packages/ai/test/tool-argument-coercion.test.ts`	Adds `renderRequiredSkeleton` to generate compact JSON-schema required-property hints; appends hints to validation errors when AJV reports missing required fields.
Working-set retention for actively re-read content `packages/coding-agent/src/config/settings-schema.ts`, `packages/coding-agent/src/context/assembler/message-transform.ts`, `packages/coding-agent/src/context/assembler/types.ts`, `packages/coding-agent/test/message-transform.test.ts`	Implements optional message-transform retention policy that preserves verbatim content for tool results whose files are repeatedly re-read in-session, constrained by eviction age and token cap; detects mutations and threads mutated-path map through codec context for staleness awareness.
Codec recovery recipes and staleness markers `packages/coding-agent/src/context/assembler/codecs/read-codec.ts`, `packages/coding-agent/src/context/assembler/codecs/warm-codec.ts`	Adds `recoveryRecipe` helper and recovery instruction text to read and warm codecs indicating file edit status since read and appropriate recall/re-run strategy.
Tool interception and search refinements `packages/coding-agent/src/tools/bash-interceptor.ts`, `packages/coding-agent/src/tools/grep.ts`, `packages/coding-agent/src/tools/read.ts`, `packages/coding-agent/test/bash-interceptor.test.ts`	Hardens bash-interceptor to avoid false positives in quoted strings and composite commands via quote-blanking and composite detection; refines grep RNA structural search gating to exclude precision filters; tightens read tool RNA substitution conditions.
Session reliability and storage recovery `packages/coding-agent/src/context/recall/store.ts`, `packages/coding-agent/src/tasks/store.ts`, `packages/coding-agent/test/context/recall/store.test.ts`, `packages/coding-agent/test/tasks-store-recovery.test.ts`	Adds `RecallStore.maxTurn(sessionId)` to seed turn indices on session resume; implements storage-corruption detection, salvage-and-rebuild mechanics in TaskStore with tiered recovery (`#withRecovery`).
Message assembly and SDK integration `packages/coding-agent/src/sdk.ts`, `packages/coding-agent/src/context/assembly-summary.ts`	Moves assembled context (concept-graph, hydrated recall, assembly summary) to tail of message array to preserve prompt prefix for incremental cache writes; threads working-set configuration through assembler pipeline; seeds recall ingestion from maxTurn.
Analysis and simulation utilities `scripts/mine-context-illusion.ts`, `scripts/simulate-working-set.ts`, `scripts/sessions.sh`	Introduces scripts to analyze session transcripts for context-illusion integrity (re-fetch classification, mutation detection) and simulate working-set retention configurations (pinning, eviction, token overhead); adds bash CLI for session-key/ID mapping.
Configuration, prompts, and documentation `packages/coding-agent/CHANGELOG.md`, `.gitignore`, `packages/coding-agent/src/prompts/composer/meta-prompt.md`, `packages/coding-agent/src/prompts/system/system-prompt.md`, `packages/coding-agent/src/prompts/tools/recall.md`	Updates changelogs and prompts to document context placement, recovery recipes, and phase-handoff semantics; adds sessions.txt to ignore list.

Sequence Diagrams

sequenceDiagram
  participant Client
  participant buildParams
  participant convertAnthropicMessagesWithMetadata
  participant applyPromptCaching
  participant Anthropic API
  Client->>buildParams: message array
  buildParams->>convertAnthropicMessagesWithMetadata: convert with metadata
  convertAnthropicMessagesWithMetadata->>convertAnthropicMessagesWithMetadata: mark non-synthetic user messages
  convertAnthropicMessagesWithMetadata-->>buildParams: messages + cacheCandidateIndexes
  buildParams->>applyPromptCaching: call with cacheCandidateIndexes
  applyPromptCaching->>applyPromptCaching: position cache breakpoints near candidates
  applyPromptCaching-->>buildParams: params with cache_control markers
  buildParams->>Anthropic API: send request

sequenceDiagram
  participant transformMessages
  participant mutation detection
  participant working-set compute
  participant pinning logic
  participant codec context
  transformMessages->>mutation detection: scan toolCalls for mutating tools
  mutation detection-->>transformMessages: mutatedPaths map
  transformMessages->>working-set compute: compute retention exemptions
  working-set compute->>working-set compute: group re-reads by path+view+hash
  working-set compute->>pinning logic: select canonical pinned turn per path
  pinning logic->>pinning logic: respect eviction age and token cap
  pinning logic-->>transformMessages: exemption turn indices
  transformMessages->>codec context: thread mutatedPaths for staleness
  codec context-->>codec context: emit recovery recipe hints

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 33.96% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'docs: make workstreams deliver PR-ready changes' directly aligns with the PR's primary objective to amend the expert system so workstreams default to producing PR/MR-ready output.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch workstream/pr-as-outcome-protocol

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

rna-familiar · 2026-06-13T20:32:53Z

🧬 RNA Analysis — oh-omp#75

Scope: 37 files · ~3400 lines · Context assembly hardening, bash interceptor correctness, working-set retention, TaskStore recovery, workstream expert system artifacts

1. Context injection moved to tail — cache discipline fix (sdk.ts)

contextMessages (recalled-context, concept-graph facts, assembly summary) moved from front to tail of the message array:

-const finalMessages = contextMessages.length > 0 ? [...contextMessages, ...boundedMessages] : boundedMessages;
+const finalMessages = contextMessages.length > 0 ? [...boundedMessages, ...contextMessages] : boundedMessages;

Why: Anthropic prompt caching matches on a stable message prefix. Front-placement regenerated message 0 on every request — recalled-context blocks change per turn, so the entire transcript was rewritten at cache-write prices (measured write/read ratio 2.3–8.0 vs ~0.1 for non-assembler traffic). Tail placement adds one block of fresh input per request without breaking the conversation prefix. CHANGELOG documents the measurement.

2. Working-set retention (settings-schema.ts, sdk.ts, message-transform.ts)

Three new assembler settings: workingSetEnabled (default true), workingSetEvictTurns (default 8), workingSetTokenCap (default 16,000). Keeps actively re-read files pinned verbatim beyond the hot window for up to N turns without a re-read, up to the token cap. Assembly summary shows N pinned (working set) when active.

3. Recovery recipes in compression stub headers (read-codec.ts, warm-codec.ts)

recoveryRecipe(ctx, filePath) — added to every [warm:read:…] stub header:

File unchanged in session → recall("path") expands
File mutated since this read → edited since this read — re-read for current state

mutatedPaths?: ReadonlyMap<string, number> added to CodecContext. System prompt, recall tool docs, and meta-prompt updated to explain inline recipe lookup. This closes a context-decay loop: the agent no longer needs to guess how to recover a compressed read — the recovery instruction is carried in the stub itself.

4. `RecallStore.maxTurn(sessionId)` + ingest counter seeding (recall/store.ts, sdk.ts)

RecallStore.maxTurn(sessionId) — returns the highest ingested turn number for a session. Used to seed the ingest turn counter on process resume so post-restart rows never reuse turn numbers occupied by pre-restart rows (which would merge unrelated turns in recall(turn=N) lookups). 5-second timeout. Test suite: null for no rows, scoped by session ID, post-resume counter prevents collision.

5. Bash interceptor correctness (bash-interceptor.ts, settings-schema.ts)

blankQuotedSegments(command) — strips quoted content before pattern matching. echo "a > b" no longer triggers the write interceptor; python3 -c 'sed -i ...' no longer triggers the edit interceptor.

isCompositeCommand(blanked) — detects |, &&, ;, $(, ` in the blanked command.

simpleCommandsOnly: true on grep and find rules — find … | xargs wc passes through; find … -name '*.ts' (simple) still blocks. sed -i (no flag) still blocks in chains. Test suite: 9 cases covering quoted data, pipelines, composite detection, sed non-simple rule.

6. Grep/find precision filter guard (grep.ts)

RNA structural search now gated by !hasPrecisionFilters. Previously, grep -g '*.md' pattern could return RNA symbol results while ignoring the glob filter. Flags blocked: glob, type, ignoreCase, multiline, pre, post, offset. Result banner updated: [RNA structural search — symbol index, not text grep; add glob/type or regex syntax to force text search].

7. Read tool RNA guard (read.ts)

!limit && !raw added to the RNA structural view condition. Explicit limit/raw requests for actual file content are no longer substituted with the structural view.

8. TaskStore recovery (tasks/store.ts)

isLanceStorageError(err) — classifies Lance storage corruption vs. query/schema errors. Auto-rebuild on storage failure: salvage tasks to tasks-backup-<timestamp>.json, drop corrupted table, reinitialise. Store remains operational after rebuild. Non-storage errors propagate without triggering rebuild. Test suite: 2 tests including destructive on-disk corruption simulation.

9. Tool validation skeleton hint (ai/src providers)

validateToolArguments appends Required shape: with TypeBox-derived skeleton when a required property is missing. Only emitted on missing-property failures (not type-mismatch). Tests: skeleton for task tool with missing tasks array; no skeleton for type mismatch.

`.oh/` workstream artifacts

Three new artifacts committed:

.oh/workstreams/oh-omp/EXPERT-SYSTEM.md (450 lines) — full workstream expert system for oh-omp
.oh/workstreams/oh-omp/FRAME.md (218 lines) — current approved frame
.oh/workstreams/oh-omp/FRAME-model-routing-superseded.md (145 lines) — superseded frame on record

This is the PR-as-outcome protocol. The workstream's delivered changes are encoded as referenceable RNA artifacts alongside the code. Future RNA queries will surface these as alignment context for changes to the assembler, recall store, and bash interceptor subsystems.

Blast radius

AssemblerSettings interface: workingSetEnabled, workingSetEvictTurns, workingSetTokenCap added — callers that spread the interface need the new fields (defaults in SETTINGS_SCHEMA cover all existing call sites)
CodecContext.mutatedPaths — optional, backward compatible
RecallStore.maxTurn — additive, no callers removed
Context order change (sdk.ts) — affects all assembler-mode sessions; measured improvement, no functional regression expected

PR is clean. ✓

RNA familiar @ 036d0369 · dispatch 16fb29b0

coderabbitai

Actionable comments posted: 14

🧹 Nitpick comments (1)

packages/ai/test/tool-argument-coercion.test.ts (1)

660-663: ⚡ Quick win

Avoid byte-for-byte skeleton formatting assertions.

At Line 661, asserting the full formatted skeleton string couples this test to formatting/order rather than the semantic contract.

As per coding guidelines, "Do not assert exact strings, ordering, and formatting unless downstream code parses or materially depends on the exact bytes."

Proposed hardening

- expect(message).toContain('"tasks": [{ "id": string, "description": string, "assignment": string }]');
+ expect(message).toContain('"tasks"');
+ expect(message).toContain('"id": string');
+ expect(message).toContain('"description": string');
+ expect(message).toContain('"assignment": string');

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/ai/test/tool-argument-coercion.test.ts` around lines 660 - 663, The
test currently asserts byte-for-byte skeleton formatting by checking exact
formatted substrings against message; instead relax these assertions in the test
by asserting the semantic pieces exist (e.g., replace
expect(message).toContain('"tasks": [{ "id": string, "description": string,
"assignment": string }]') with checks that the message includes the keys/markers
like '"tasks"', '"id"', '"description"', '"assignment"' and keep the existing
expect(message).toContain('"agent"') (or change it to
expect(message).toMatch(/"agent"\s*:/) for whitespace-robustness); alternatively
use a regex that verifies presence of the required fields without enforcing
ordering/spacing. Target the assertions that reference the variable message and
remove any exact skeleton formatting checks.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/ai/test/tool-argument-coercion.test.ts`:
- Around line 666-679: The test currently only asserts inside the catch block so
it can falsely pass if validateToolArguments(taskLikeTool, toolCall) does not
throw; update the test to explicitly assert the failure by either wrapping the
call with Jest's expect(...).toThrow()/toThrowError() or by adding a
fail()/throw after the call so execution cannot silently succeed. Specifically
change the body of the "omits the skeleton for non-missing-property failures"
test to assert that validateToolArguments throws for the given toolCall (or use
expect(() => validateToolArguments(taskLikeTool, toolCall)).toThrow() and keep
the existing catch-based message assertion if you prefer to inspect the error
text).

In `@packages/coding-agent/src/context/assembler/codecs/read-codec.ts`:
- Around line 117-123: The recoveryRecipe function embeds filePath directly into
recall("...") which can produce malformed recipe text for paths with quotes or
control characters; fix this by escaping filePath before interpolation (e.g.,
compute const escaped = JSON.stringify(filePath).slice(1, -1) or an equivalent
escaping routine) and use escaped in the return template instead of filePath so
recall("...") always contains a properly escaped string.

In `@packages/coding-agent/src/context/recall/store.ts`:
- Around line 107-124: The maxTurn function conflates "no rows" and "scan timed
out" by returning null for both; change the timeout branch to reject with a
distinct error so callers can distinguish a timeout from a legitimate null
result. Specifically, in maxTurn replace the timeout Promise (currently
Bun.sleep(5000).then(() => null)) with a Promise that throws/rejects (e.g.,
Bun.sleep(5000).then(() => { throw new Error('maxTurn: timeout'); })), leaving
the scan logic (the scan variable and its null return when no rows) unchanged so
null still means "no rows" while timeouts surface as an exception the caller can
handle.

In `@packages/coding-agent/src/tasks/store.ts`:
- Around line 27-35: isLanceStorageError currently treats any message containing
"manifest" as a storage error which is too broad; narrow the manifest check in
isLanceStorageError so it only matches missing-manifest signals from lancedb
0.26.x (e.g. paths like "_latest.manifest" combined with "not found"/"was not
found", and the explicit "failed to load manifest version" pattern) while
preserving existing behavior such as the test case "failed to load manifest
version 42"; update the manifest alternative in isLanceStorageError to use
targeted regexes like /_latest\.manifest.*(not found|was not found)/i and
/failed to load manifest version/i (remove the generic /manifest/i), leaving
TaskStore.#withRecovery and `#salvageRebuild` logic unchanged.

In `@packages/coding-agent/src/tools/bash-interceptor.ts`:
- Around line 62-63: Update isCompositeCommand to treat newline-separated
commands as composite by checking for newline characters; change the regex used
on blankedCommand to include \n (and optionally \r) so it matches
/\||&&|;|\$\(|`|\r?\n/ (or add a separate /\n/.test check) instead of the
current pattern, ensuring multiline chains like "cmd1\ncmd2" are detected as
composite.

In `@packages/coding-agent/src/tools/grep.ts`:
- Around line 199-207: The branch that chooses RNA symbol results uses
hasPrecisionFilters but only checks pre/post params, so configured defaults like
grep.contextBefore/grep.contextAfter are ignored; update the hasPrecisionFilters
computation to include the effective context variables (e.g.,
normalizedContextBefore and normalizedContextAfter or the computed context
values that combine settings and params) alongside
pre/post/normalizedOffset/etc., so that when effective context > 0 the text-grep
path is taken instead of returning RNA results (adjust the expression near
hasPrecisionFilters and the subsequent if that checks
this.session.settings.get("rna.enabled") && !hasPrecisionFilters &&
looksLikeIdentifier(normalizedPattern)).

In `@packages/coding-agent/test/context/recall/store.test.ts`:
- Around line 32-34: The afterEach teardown removes the temp directory but never
closes the open RecallStore handle; update the afterEach in the test to first
await closing the store (e.g., await store.close() or await recall.close()) with
a guard if the store variable may be undefined, then remove the directory (await
fs.rm(agentDir, { recursive: true, force: true })); ensure the change targets
the afterEach block and uses the existing RecallStore instance name used in the
test so the store is closed before deletion.

In `@packages/coding-agent/test/tasks-store-recovery.test.ts`:
- Around line 38-39: The tests open TaskStore via TaskStore.open but never close
it; wrap each use of the store in a try/finally and call the store's close
method (e.g., await store.close()) in the finally block so file handles/DB
resources are released; update the occurrences around the TaskStore.open call
that creates tasks (the block using store.create([{ content: "task before
corruption" }], "sess-1")) and the other similar opens noted in the comment so
every opened TaskStore is closed even on failure.
- Around line 61-70: The test "non-storage errors propagate without triggering
rebuild" doesn't actually trigger a non-storage failure; either update the test
to simulate a real non-storage error or rename it to reflect its current
behavior. To fix, either (A) stub/mock TaskStore.query (after TaskStore.open and
store.create) to throw a non-storage Error (e.g., throw new Error("query
failure")) and assert await expect(store.query(...)).rejects.toThrow(...) and
verify no rebuild path is invoked, or (B) if you don't want to inject a throw,
rename the test to describe that it verifies a harmless escaped filter resolves
(change the test name string and keep the existing resolved expectation). Ensure
you reference the test named "non-storage errors propagate without triggering
rebuild", the TaskStore.open call, the store.create call, and the store.query
invocation when making the change.

In `@scripts/mine-context-illusion.ts`:
- Around line 52-65: scripts/mine-context-illusion.ts (lines 52-65): In
parseArgs(), validate the parsed since value by checking Number.isNaN(since)
after Date.parse(...) and if invalid print a clear error and exit non‑zero;
return the validated since value and ensure any later code (report footer)
reuses this returned since instead of reparsing process.argv.
scripts/simulate-working-set.ts (lines 304-311): Mirror the same CLI validation
— check the result of Date.parse(...) for NaN, emit a clear error and exit
non‑zero so the sweep does not run and produce a zero-session report; if valid,
pass the validated since value onward. Ensure both places fail fast on invalid
dates rather than continuing with NaN.
- Around line 193-212: The loadSession function currently reads the file and
parses lines manually and can throw if Bun.file(path).text() or JSON.parse
fails; change loadSession to catch read/parse errors and return null on failure,
and use Bun.JSONL.parse() instead of split/JSON.parse. Specifically, in
loadSession wrap the Bun.file(path).text() call and the parsing in a try/catch,
call Bun.JSONL.parse(text) to get an array of records, then iterate those
records and push record.message into messages when record.type === "message" and
push messages.length into compactionBoundaries when
/compact/i.test(record.type); if any read/parse error occurs return null, and
keep the final messages.length === 0 check and return { messages,
compactionBoundaries } otherwise.

In `@scripts/sessions.sh`:
- Around line 116-157: In the --file case inside both subcommand parsing loops
(the '--file' case that assigns FILE_PATH="$2"), guard against missing argument
before reading $2 (with set -u this currently causes an unbound parameter
error); change the --file branch to check that $# -ge 2 (or that "$2" is
non-empty) and if not print a usage/error and exit, otherwise assign
FILE_PATH="$2" and shift 2; apply this same guard in the find branch as well so
both parsing loops validate the --file argument before using $2.

In `@scripts/simulate-working-set.ts`:
- Around line 79-91: The SessionModel currently loses compaction metadata from
loadSession(); extend the SessionModel interface to include compactionBoundaries
(e.g., number[] or Set<number>) and update buildSessionModel() to copy the
compactionBoundaries returned by loadSession() into the model, then modify the
sweep/retention logic that credits "avoided" reads to flush or ignore
pinned/retained data when a request's turn (requestTurns) crosses a compaction
boundary—i.e., detect when requestTurns[i] > boundary and clear retention state
for earlier turns so later reads aren't incorrectly treated as avoided. Ensure
all uses of SessionModel (construction in buildSessionModel and any
retention/sweep functions) are updated to handle the new compactionBoundaries
field.
- Around line 184-203: The touch function can leave pinnedTokens above
config.tokenCap because the eviction loop stops when pinned.size <= 1 or when
the LRU equals the just-touched path; to fix, change the loop to continue
evicting until pinnedTokens <= config.tokenCap or there are no entries left (use
while (pinnedTokens > config.tokenCap && pinned.size > 0)), and remove the
special-case break when lruPath === path so the LRU entry (even the touched one)
can be evicted if it is causing the cap breach; keep using evict(lruPath) and
adjust any dependent accounting as needed.

---

Nitpick comments:
In `@packages/ai/test/tool-argument-coercion.test.ts`:
- Around line 660-663: The test currently asserts byte-for-byte skeleton
formatting by checking exact formatted substrings against message; instead relax
these assertions in the test by asserting the semantic pieces exist (e.g.,
replace expect(message).toContain('"tasks": [{ "id": string, "description":
string, "assignment": string }]') with checks that the message includes the
keys/markers like '"tasks"', '"id"', '"description"', '"assignment"' and keep
the existing expect(message).toContain('"agent"') (or change it to
expect(message).toMatch(/"agent"\s*:/) for whitespace-robustness); alternatively
use a regex that verifies presence of the required fields without enforcing
ordering/spacing. Target the assertions that reference the variable message and
remove any exact skeleton formatting checks.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 38f8e0f5-a42b-4ee3-999f-996470fd373c

📥 Commits

Reviewing files that changed from the base of the PR and between 865c862 and 2bed716.

📒 Files selected for processing (35)

.gitignore
.oh/workstreams/oh-omp/EXPERT-SYSTEM.md
.oh/workstreams/oh-omp/FRAME-model-routing-superseded.md
.oh/workstreams/oh-omp/FRAME.md
packages/ai/CHANGELOG.md
packages/ai/src/providers/anthropic.ts
packages/ai/src/utils/validation.ts
packages/ai/test/anthropic-alignment.test.ts
packages/ai/test/tool-argument-coercion.test.ts
packages/coding-agent/CHANGELOG.md
packages/coding-agent/src/config/settings-schema.ts
packages/coding-agent/src/context/assembler/codecs/read-codec.ts
packages/coding-agent/src/context/assembler/codecs/warm-codec.ts
packages/coding-agent/src/context/assembler/message-transform.ts
packages/coding-agent/src/context/assembler/types.ts
packages/coding-agent/src/context/assembly-summary.ts
packages/coding-agent/src/context/recall/store.ts
packages/coding-agent/src/prompts/agents/workstream-expert.md
packages/coding-agent/src/prompts/commands/workstream-expert.md
packages/coding-agent/src/prompts/composer/meta-prompt.md
packages/coding-agent/src/prompts/system/system-prompt.md
packages/coding-agent/src/prompts/tools/recall.md
packages/coding-agent/src/sdk.ts
packages/coding-agent/src/tasks/store.ts
packages/coding-agent/src/tools/bash-interceptor.ts
packages/coding-agent/src/tools/grep.ts
packages/coding-agent/src/tools/read.ts
packages/coding-agent/test/assembly-summary.test.ts
packages/coding-agent/test/bash-interceptor.test.ts
packages/coding-agent/test/context/recall/store.test.ts
packages/coding-agent/test/message-transform.test.ts
packages/coding-agent/test/tasks-store-recovery.test.ts
scripts/mine-context-illusion.ts
scripts/sessions.sh
scripts/simulate-working-set.ts

coderabbitai · 2026-06-13T20:44:12Z

+	it("omits the skeleton for non-missing-property failures", () => {
+		const toolCall: ToolCall = {
+			type: "toolCall",
+			id: "call-skel-2",
+			name: "task",
+			arguments: { agent: 42, tasks: [{ id: "a", description: "b", assignment: "c" }] },
+		};
+		try {
+			validateToolArguments(taskLikeTool, toolCall);
+		} catch (err) {
+			const message = String(err instanceof Error ? err.message : err);
+			expect(message).not.toContain("Required shape:");
+		}
+	});


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Assert the failure path explicitly in this negative test.

At Line 666, this test can pass when validateToolArguments(...) does not throw, because there is no assertion outside the catch block.

As per coding guidelines, "Every new test must defend one concrete, externally observable contract... Never add placeholder tests, tautologies, or assertions that only prove code executed."

Proposed fix

it("omits the skeleton for non-missing-property failures", () => { const toolCall: ToolCall = { type: "toolCall", id: "call-skel-2", name: "task", arguments: { agent: 42, tasks: [{ id: "a", description: "b", assignment: "c" }] }, }; - try { - validateToolArguments(taskLikeTool, toolCall); - } catch (err) { - const message = String(err instanceof Error ? err.message : err); - expect(message).not.toContain("Required shape:"); - } + let message: string | null = null; + try { + validateToolArguments(taskLikeTool, toolCall); + } catch (err) { + message = String(err instanceof Error ? err.message : err); + } + expect(message).not.toBeNull(); + expect(message).not.toContain("Required shape:"); });

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

it("omits the skeleton for non-missing-property failures", () => {

const toolCall: ToolCall = {

type: "toolCall",

id: "call-skel-2",

name: "task",

arguments: { agent: 42, tasks: [{ id: "a", description: "b", assignment: "c" }] },

};

try {

validateToolArguments(taskLikeTool, toolCall);

} catch (err) {

const message = String(err instanceof Error ? err.message : err);

expect(message).not.toContain("Required shape:");

}

});

it("omits the skeleton for non-missing-property failures", () => {

const toolCall: ToolCall = {

type: "toolCall",

id: "call-skel-2",

name: "task",

arguments: { agent: 42, tasks: [{ id: "a", description: "b", assignment: "c" }] },

};

let message: string | null = null;

try {

validateToolArguments(taskLikeTool, toolCall);

} catch (err) {

message = String(err instanceof Error ? err.message : err);

}

expect(message).not.toBeNull();

expect(message).not.toContain("Required shape:");

});

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/ai/test/tool-argument-coercion.test.ts` around lines 666 - 679, The test currently only asserts inside the catch block so it can falsely pass if validateToolArguments(taskLikeTool, toolCall) does not throw; update the test to explicitly assert the failure by either wrapping the call with Jest's expect(...).toThrow()/toThrowError() or by adding a fail()/throw after the call so execution cannot silently succeed. Specifically change the body of the "omits the skeleton for non-missing-property failures" test to assert that validateToolArguments throws for the given toolCall (or use expect(() => validateToolArguments(taskLikeTool, toolCall)).toThrow() and keep the existing catch-based message assertion if you prefer to inspect the error text).

Source: Coding guidelines

coderabbitai · 2026-06-13T20:44:12Z

+function recoveryRecipe(ctx: CodecContext, filePath: string): string {
+	const mutatedAt = ctx.mutatedPaths?.get(filePath);
+	if (mutatedAt !== undefined && mutatedAt > ctx.turnIndex) {
+		return " | edited since this read — re-read for current state";
+	}
+	return ` | unchanged in session — recall("${filePath}") expands`;
+}


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Escape filePath before embedding it in recall("...") recipe text.

The current string interpolation can produce malformed recipe text for valid paths containing quotes/control characters (Line 122), which weakens the recovery contract.

Proposed fix

diff --git a/packages/coding-agent/src/context/assembler/codecs/read-codec.ts b/packages/coding-agent/src/context/assembler/codecs/read-codec.ts @@ function recoveryRecipe(ctx: CodecContext, filePath: string): string { const mutatedAt = ctx.mutatedPaths?.get(filePath); if (mutatedAt !== undefined && mutatedAt > ctx.turnIndex) { return " | edited since this read — re-read for current state"; } - return ` | unchanged in session — recall("${filePath}") expands`; + const quotedPath = JSON.stringify(filePath); + return ` | unchanged in session — recall(${quotedPath}) expands`; } diff --git a/packages/coding-agent/src/context/assembler/codecs/warm-codec.ts b/packages/coding-agent/src/context/assembler/codecs/warm-codec.ts @@ - } else if (path) { - headerParts.push(`recall("${path}") expands`); + } else if (path) { + headerParts.push(`recall(${JSON.stringify(path)}) expands`); } else {

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

function recoveryRecipe(ctx: CodecContext, filePath: string): string {

const mutatedAt = ctx.mutatedPaths?.get(filePath);

if (mutatedAt !== undefined && mutatedAt > ctx.turnIndex) {

return " | edited since this read — re-read for current state";

}

return ` | unchanged in session — recall("${filePath}") expands`;

}

function recoveryRecipe(ctx: CodecContext, filePath: string): string {

const mutatedAt = ctx.mutatedPaths?.get(filePath);

if (mutatedAt !== undefined && mutatedAt > ctx.turnIndex) {

return " | edited since this read — re-read for current state";

}

const quotedPath = JSON.stringify(filePath);

return ` | unchanged in session — recall(${quotedPath}) expands`;

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/coding-agent/src/context/assembler/codecs/read-codec.ts` around lines 117 - 123, The recoveryRecipe function embeds filePath directly into recall("...") which can produce malformed recipe text for paths with quotes or control characters; fix this by escaping filePath before interpolation (e.g., compute const escaped = JSON.stringify(filePath).slice(1, -1) or an equivalent escaping routine) and use escaped in the return template instead of filePath so recall("...") always contains a properly escaped string.

coderabbitai · 2026-06-13T20:44:13Z

+	async maxTurn(sessionId: string): Promise<number | null> {
+		const filter = `session_id = '${sessionId.replace(/'/g, "''")}'`;
+		const scan = this.#table
+			.query()
+			.where(filter)
+			.select(["turn"])
+			.toArray()
+			.then(rows => {
+				let max: number | null = null;
+				for (const row of rows as { turn: number }[]) {
+					const turn = Number(row.turn);
+					if (Number.isFinite(turn) && (max === null || turn > max)) max = turn;
+				}
+				return max;
+			});
+		const timeout = Bun.sleep(5000).then(() => null);
+		return Promise.race([scan, timeout]);
+	}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Differentiate timeout from “no rows” in maxTurn.

maxTurn currently returns null for both “session has no rows” and “scan timed out”. That conflation is unsafe for resume seeding: downstream logic treats null as “start from 0”, which can reuse turn numbers after restart.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/coding-agent/src/context/recall/store.ts` around lines 107 - 124, The maxTurn function conflates "no rows" and "scan timed out" by returning null for both; change the timeout branch to reject with a distinct error so callers can distinguish a timeout from a legitimate null result. Specifically, in maxTurn replace the timeout Promise (currently Bun.sleep(5000).then(() => null)) with a Promise that throws/rejects (e.g., Bun.sleep(5000).then(() => { throw new Error('maxTurn: timeout'); })), leaving the scan logic (the scan variable and its null return when no rows) unchanged so null still means "no rows" while timeouts surface as an exception the caller can handle.

coderabbitai · 2026-06-13T20:44:13Z

+export function isLanceStorageError(err: unknown): boolean {
+	const message = err instanceof Error ? err.message : String(err);
+	return (
+		/object at location .* not found/i.test(message) ||
+		/no such file or directory/i.test(message) ||
+		/manifest/i.test(message) ||
+		/dataset .*(not found|corrupt)/i.test(message) ||
+		/invalid .*version/i.test(message)
+	);


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

For @lancedb/lancedb 0.26.x, what are the canonical error-message patterns for (1) missing object file, (2) missing manifest version, and (3) corrupt dataset metadata?

💡 Result:

Canonical error-message patterns in @lancedb/lancedb 0.26.x (as surfaced via the underlying LanceIO/LanceError strings) for the three cases: 1) Missing object file Pattern (JS/Python often prefixes with “lance error: …” and wraps a LanceError(IO) “Not found”): “lance error: LanceError(IO): External error: Not found: <full/path-or-uri-to-missing-file>” [1] 2) Missing manifest version Pattern (missing _latest.manifest when no manifest/version pointer exists, typically from opening a table/dataset whose manifest pointer file is absent): “ValueError: Dataset at path .lance/_latest.manifest was not found: Not found: .lance/_latest.manifest” [2] 3) Corrupt dataset metadata Pattern (table exists on disk but can’t be opened because the internal dataset files/manifests/metadata are missing or corrupt; surfaces as a “Table … was not found”/not-loadable message): “ValueError: Table '' was not found” (when the.lance directory is present but contains no valid manifest/metadata) [3] If you want, paste the exact stack trace/error text you’re seeing in 0.26.x and I’ll map it to the closest canonical pattern above.

Citations:

1: bug(python): RuntimeError: lance error: LanceError(IO): External error: Not found (maybe NFS related) lancedb/lancedb#2963

2: open_table + create_table error implementation lancedb/lancedb#216

3: bug: table_names() lists table with corrupt/incomplete .lance directory that open_table() cannot open lancedb/lancedb#3127

🏁 Script executed:

#!/bin/bash set -euo pipefail # Locate and print the relevant function implementation FILE="packages/coding-agent/src/tasks/store.ts" echo "---- $FILE (lines around 27-60) ----" nl -ba "$FILE" | sed -n '1,120p' # Find all usages of isLanceStorageError echo "---- usages of isLanceStorageError ----" rg -n "isLanceStorageError" packages/coding-agent/src/tasks -S || true rg -n "isLanceStorageError" packages/coding-agent -S || true

Repository: open-horizon-labs/oh-omp

Length of output: 201

🏁 Script executed:

#!/bin/bash set -euo pipefail # Print only the specific line range the review references (27-60) more narrowly FILE="packages/coding-agent/src/tasks/store.ts" echo "---- $FILE (lines 20-60) ----" nl -ba "$FILE" | sed -n '20,70p' # Also print some broader context for how classifier impacts salvage/drop/rebuild echo "---- $FILE (lines 1-160) ----" nl -ba "$FILE" | sed -n '1,180p'

Repository: open-horizon-labs/oh-omp

Length of output: 194

🏁 Script executed:

#!/bin/bash set -euo pipefail FILE="packages/coding-agent/src/tasks/store.ts" echo "---- $FILE (lines 20-70) ----" nl -ba "$FILE" | sed -n '20,70p' echo "---- usages of isLanceStorageError ----" rg -n "isLanceStorageError" packages/coding-agent -S

Repository: open-horizon-labs/oh-omp

Length of output: 194

🏁 Script executed:

#!/bin/bash set -euo pipefail FILE="packages/coding-agent/src/tasks/store.ts" echo "---- $FILE (lines 20-120) ----" python3 - <<PY from pathlib import Path p = Path("$FILE") lines = p.read_text().splitlines() start, end = 20, 120 # 1-based in print for i in range(start, min(end, len(lines)) + 1): print(f"{i:4d}\t{lines[i-1]}") PY echo "---- $FILE (exact isLanceStorageError body) ----" python3 - <<PY import re from pathlib import Path p = Path("$FILE") txt = p.read_text() m = re.search(r"export function isLanceStorageError\$err: unknown\$: boolean\\s*\\{([\\s\\S]*?)\\n\\}", txt) if not m: print("NOT FOUND") else: body = m.group(0) print(body) PY echo "---- usages of isLanceStorageError ----" rg -n "isLanceStorageError" packages/coding-agent -S

Repository: open-horizon-labs/oh-omp

Length of output: 5433

🏁 Script executed:

#!/bin/bash set -euo pipefail FILE="packages/coding-agent/src/tasks/store.ts" echo "---- $FILE (lines 20-120) ----" python3 - <<PY from pathlib import Path p = Path("$FILE") lines = p.read_text().splitlines() start, end = 20, 120 # 1-based for i in range(start, min(end, len(lines)) + 1): print(f"{i:4d}\t{lines[i-1]}") PY echo "---- $FILE (exact isLanceStorageError body) ----" python3 - <<PY import re from pathlib import Path p = Path("$FILE") txt = p.read_text() m = re.search(r"export function isLanceStorageError\$err: unknown\$: boolean\\s*\\{([\\s\\S]*?)\\n\\}", txt) print(m.group(0) if m else "NOT FOUND") PY echo "---- usages of isLanceStorageError ----" rg -n "isLanceStorageError" packages/coding-agent -S

Repository: open-horizon-labs/oh-omp

Length of output: 5433

🏁 Script executed:

#!/bin/bash set -euo pipefail FILE="packages/coding-agent/test/tasks-store-recovery.test.ts" echo "---- $FILE ----" python3 - <<PY from pathlib import Path p=Path("$FILE") lines=p.read_text().splitlines() for i,l in enumerate(lines, start=1): print(f"{i:4d}\t{l}") PY

Repository: open-horizon-labs/oh-omp

Length of output: 3581

Narrow isLanceStorageError’s /manifest/i to the missing-manifest patterns only.
TaskStore.#withRecovery() uses this classifier to decide when to run the destructive #salvageRebuild() tier, and matching any /manifest/i substring is overly broad. Keep the existing intended behavior (e.g. the test case "failed to load manifest version 42") but restrict the regex to lancedb 0.26.x missing-manifest signals like /_latest\.manifest/ / “was not found” / “failed to load manifest version”, rather than any message containing “manifest”.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/coding-agent/src/tasks/store.ts` around lines 27 - 35, isLanceStorageError currently treats any message containing "manifest" as a storage error which is too broad; narrow the manifest check in isLanceStorageError so it only matches missing-manifest signals from lancedb 0.26.x (e.g. paths like "_latest.manifest" combined with "not found"/"was not found", and the explicit "failed to load manifest version" pattern) while preserving existing behavior such as the test case "failed to load manifest version 42"; update the manifest alternative in isLanceStorageError to use targeted regexes like /_latest\.manifest.*(not found|was not found)/i and /failed to load manifest version/i (remove the generic /manifest/i), leaving TaskStore.#withRecovery and `#salvageRebuild` logic unchanged.

coderabbitai · 2026-06-13T20:44:13Z

+function isCompositeCommand(blankedCommand: string): boolean {
+	return /\||&&|;|\$\(|`/.test(blankedCommand);


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Treat newline-separated shell chains as composite commands too.

isCompositeCommand() only recognizes one-line separators here. A multiline bash snippet like find . -name '*.ts'\nwc -l is still a composite command, but this helper returns false, so simpleCommandsOnly rules keep blocking the newline form while allowing the equivalent ;/| form.

Suggested fix

function isCompositeCommand(blankedCommand: string): boolean { - return /\||&&|;|\$\(|`/.test(blankedCommand); + return /\||&&|;|\n|\$\(|`/.test(blankedCommand); }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

function isCompositeCommand(blankedCommand: string): boolean {

return /\||&&|;|\$\(|`/.test(blankedCommand);

function isCompositeCommand(blankedCommand: string): boolean {

return /\||&&|;|\n|\$\(|`/.test(blankedCommand);

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/coding-agent/src/tools/bash-interceptor.ts` around lines 62 - 63, Update isCompositeCommand to treat newline-separated commands as composite by checking for newline characters; change the regex used on blankedCommand to include \n (and optionally \r) so it matches /\||&&|;|\$\(|`|\r?\n/ (or add a separate /\n/.test check) instead of the current pattern, ensuring multiline chains like "cmd1\ncmd2" are detected as composite.

coderabbitai · 2026-06-13T20:44:13Z

+function parseArgs(): CliArgs {
+	// Mutated via --working-set: replay history under the retention policy.
+
+	const args = process.argv.slice(2);
+	let since = Date.parse("2026-05-04");
+
+	let jsonOut: string | undefined;
+	for (let i = 0; i < args.length; i++) {
+		if (args[i] === "--since" && args[i + 1]) since = Date.parse(args[++i]!);
+		if (args[i] === "--json" && args[i + 1]) jsonOut = args[++i];
+		if (args[i] === "--working-set") WORKING_SET_REPLAY.enabled = true;
+	}
+	return { since, jsonOut };
+}


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate --since once and fail fast on invalid dates. Both CLIs pass Date.parse(...) through unchecked. A bad date becomes NaN; the miner later crashes when formatting the report date, while the simulator silently reports an empty corpus.

scripts/mine-context-illusion.ts#L52-L65: reject NaN in parseArgs() and reuse the validated since value in the report footer instead of reparsing process.argv.

scripts/simulate-working-set.ts#L304-L311: reject NaN during CLI parsing so the sweep does not emit a false zero-session report.

📍 Affects 2 files

scripts/mine-context-illusion.ts#L52-L65 (this comment)

scripts/simulate-working-set.ts#L304-L311

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/mine-context-illusion.ts` around lines 52 - 65, scripts/mine-context-illusion.ts (lines 52-65): In parseArgs(), validate the parsed since value by checking Number.isNaN(since) after Date.parse(...) and if invalid print a clear error and exit non‑zero; return the validated since value and ensure any later code (report footer) reuses this returned since instead of reparsing process.argv. scripts/simulate-working-set.ts (lines 304-311): Mirror the same CLI validation — check the result of Date.parse(...) for NaN, emit a clear error and exit non‑zero so the sweep does not run and produce a zero-session report; if valid, pass the validated since value onward. Ensure both places fail fast on invalid dates rather than continuing with NaN.

coderabbitai · 2026-06-13T20:44:13Z

+export async function loadSession(path: string): Promise<LoadedSession | null> {
+	const text = await Bun.file(path).text();
+	const messages: AgentMessage[] = [];
+	const compactionBoundaries: number[] = [];
+	for (const line of text.split("\n")) {
+		if (!line.trim()) continue;
+		let record: { type?: string; message?: AgentMessage };
+		try {
+			record = JSON.parse(line) as { type?: string; message?: AgentMessage };
+		} catch {
+			continue;
+		}
+		if (record.type === "message" && record.message) {
+			messages.push(record.message);
+		} else if (record.type && /compact/i.test(record.type)) {
+			compactionBoundaries.push(messages.length);
+		}
+	}
+	if (messages.length === 0) return null;
+	return { messages, compactionBoundaries };


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail # Show relevant portions of the file file="scripts/mine-context-illusion.ts" echo "== Imports (approx lines 1-60) ==" nl -ba "$file" | sed -n '1,80p' echo "== loadSession implementation (approx lines 150-260) ==" nl -ba "$file" | sed -n '150,260p' echo "== Call sites of loadSession ==" rg -n "loadSession\(" -S scripts | head -n 50

Repository: open-horizon-labs/oh-omp

Length of output: 163

🏁 Script executed:

#!/bin/bash set -euo pipefail file="scripts/mine-context-illusion.ts" echo "== Imports (approx lines 1-80) ==" nl -ba "$file" | sed -n '1,80p' echo "== loadSession implementation (approx lines 150-260) ==" nl -ba "$file" | sed -n '150,260p' echo "== Call sites of loadSession ==" rg -n "loadSession\(" -S scripts | head -n 50

Repository: open-horizon-labs/oh-omp

Length of output: 163

🏁 Script executed:

#!/bin/bash set -euo pipefail file="scripts/mine-context-illusion.ts" echo "== Imports (approx lines 1-80) ==" nl -ba "$file" | sed -n '1,80p' echo "== loadSession implementation (approx lines 150-260) ==" nl -ba "$file" | sed -n '150,260p' echo "== Call sites of loadSession ==" rg -n "loadSession\(" -S scripts | head -n 50

Repository: open-horizon-labs/oh-omp

Length of output: 163

🏁 Script executed:

#!/bin/bash set -euo pipefail file="scripts/mine-context-illusion.ts" echo "== Top of file (1-80) ==" sed -n '1,80p' "$file" echo echo "== loadSession area (160-260) ==" sed -n '160,260p' "$file" echo echo "== loadSession call sites ==" rg -n "loadSession\(" -S scripts | head -n 200

Repository: open-horizon-labs/oh-omp

Length of output: 6818

🏁 Script executed:

#!/bin/bash set -euo pipefail # Inspect call sites where loadSession() is used sed -n '70,140p' scripts/simulate-working-set.ts echo sed -n '300,420p' scripts/mine-context-illusion.ts

Repository: open-horizon-labs/oh-omp

Length of output: 6479

🏁 Script executed:

#!/bin/bash set -euo pipefail sed -n '1,180p' scripts/simulate-working-set.ts echo sed -n '180,360p' scripts/simulate-working-set.ts

Repository: open-horizon-labs/oh-omp

Length of output: 13673

🏁 Script executed:

#!/bin/bash set -euo pipefail file="scripts/mine-context-illusion.ts" echo "== main() loadSession call area (300-380) ==" sed -n '300,420p' "$file" echo echo "== Any other loadSession usage ==" rg -n "loadSession\(" -S scripts

Repository: open-horizon-labs/oh-omp

Length of output: 4478

Harden loadSession() against unreadable/transient session files and switch to Bun.JSONL.parse()

loadSession() can throw when Bun.file(path).text() fails (permission denied, deleted mid-run, etc.), which aborts both CLI scripts; callers only skip when it returns null. Catch read/parse errors and return null. Also replace manual JSONL parsing (split("\n") + JSON.parse) with Bun.JSONL.parse() to match repo guidelines.

Suggested direction

export async function loadSession(path: string): Promise<LoadedSession | null> { - const text = await Bun.file(path).text(); - const messages: AgentMessage[] = []; - const compactionBoundaries: number[] = []; - for (const line of text.split("\n")) { - if (!line.trim()) continue; - let record: { type?: string; message?: AgentMessage }; - try { - record = JSON.parse(line) as { type?: string; message?: AgentMessage }; - } catch { - continue; - } + let records: Array<{ type?: string; message?: AgentMessage }>; + try { + records = Bun.JSONL.parse(await Bun.file(path).text()) as Array<{ + type?: string; + message?: AgentMessage; + }>; + } catch { + return null; + } + const messages: AgentMessage[] = []; + const compactionBoundaries: number[] = []; + for (const record of records) { if (record.type === "message" && record.message) { messages.push(record.message); } else if (record.type && /compact/i.test(record.type)) { compactionBoundaries.push(messages.length); } } if (messages.length === 0) return null; return { messages, compactionBoundaries }; }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

export async function loadSession(path: string): Promise<LoadedSession | null> {

const text = await Bun.file(path).text();

const messages: AgentMessage[] = [];

const compactionBoundaries: number[] = [];

for (const line of text.split("\n")) {

if (!line.trim()) continue;

let record: { type?: string; message?: AgentMessage };

try {

record = JSON.parse(line) as { type?: string; message?: AgentMessage };

} catch {

continue;

}

if (record.type === "message" && record.message) {

messages.push(record.message);

} else if (record.type && /compact/i.test(record.type)) {

compactionBoundaries.push(messages.length);

}

}

if (messages.length === 0) return null;

return { messages, compactionBoundaries };

export async function loadSession(path: string): Promise<LoadedSession | null> {

let records: Array<{ type?: string; message?: AgentMessage }>;

try {

records = Bun.JSONL.parse(await Bun.file(path).text()) as Array<{

type?: string;

message?: AgentMessage;

}>;

} catch {

return null;

}

const messages: AgentMessage[] = [];

const compactionBoundaries: number[] = [];

for (const record of records) {

if (record.type === "message" && record.message) {

messages.push(record.message);

} else if (record.type && /compact/i.test(record.type)) {

compactionBoundaries.push(messages.length);

}

}

if (messages.length === 0) return null;

return { messages, compactionBoundaries };

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/mine-context-illusion.ts` around lines 193 - 212, The loadSession function currently reads the file and parses lines manually and can throw if Bun.file(path).text() or JSON.parse fails; change loadSession to catch read/parse errors and return null on failure, and use Bun.JSONL.parse() instead of split/JSON.parse. Specifically, in loadSession wrap the Bun.file(path).text() call and the parsing in a try/catch, call Bun.JSONL.parse(text) to get an array of records, then iterate those records and push record.message into messages when record.type === "message" and push messages.length into compactionBoundaries when /compact/i.test(record.type); if any read/parse error occurs return null, and keep the final messages.length === 0 check and return { messages, compactionBoundaries } otherwise.

Source: Coding guidelines

coderabbitai · 2026-06-13T20:44:13Z

+      while [[ $# -gt 0 ]]; do
+        case "$1" in
+          --file)
+            FILE_PATH="$2"
+            shift 2
+            ;;
+          --)
+            shift
+            break
+            ;;
+          *)
+            if [[ -z "$key" ]]; then
+              key="$1"
+            elif [[ -z "$sid" ]]; then
+              sid="$1"
+            else
+              echo "add only accepts <session-key> and <session-id>" >&2
+              usage
+              exit 1
+            fi
+            shift
+            ;;
+        esac
+      done
+
+      if [[ -z "$key" || -z "$sid" ]]; then
+        echo "add needs <session-key> and <session-id>" >&2
+        usage
+        exit 1
+      fi
+
+      add_session "$key" "$sid"
+      ;;
+
+    find)
+      local query=""
+
+      while [[ $# -gt 0 ]]; do
+        case "$1" in
+          --file)
+            FILE_PATH="$2"
+            shift 2


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate --file before reading $2.

Both subcommands do FILE_PATH="$2" under set -u. sessions.sh add foo bar --file exits on an unbound positional parameter instead of returning a normal usage error.

Suggested fix

while [[ $# -gt 0 ]]; do case "$1" in --file) + if [[ $# -lt 2 ]]; then + echo "--file requires a path" >&2 + usage + exit 1 + fi FILE_PATH="$2" shift 2 ;;

Apply the same guard in the find branch as well.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

while [[ $# -gt 0 ]]; do

case "$1" in

--file)

FILE_PATH="$2"

shift 2

;;

--)

shift

break

;;

*)

if [[ -z "$key" ]]; then

key="$1"

elif [[ -z "$sid" ]]; then

sid="$1"

else

echo "add only accepts <session-key> and <session-id>" >&2

usage

exit 1

fi

shift

;;

esac

done

if [[ -z "$key" || -z "$sid" ]]; then

echo "add needs <session-key> and <session-id>" >&2

usage

exit 1

fi

add_session "$key" "$sid"

;;

find)

local query=""

while [[ $# -gt 0 ]]; do

case "$1" in

--file)

FILE_PATH="$2"

shift 2

while [[ $# -gt 0 ]]; do

case "$1" in

--file)

if [[ $# -lt 2 ]]; then

echo "--file requires a path" >&2

usage

exit 1

fi

FILE_PATH="$2"

shift 2

;;

--)

shift

break

;;

*)

if [[ -z "$key" ]]; then

key="$1"

elif [[ -z "$sid" ]]; then

sid="$1"

else

echo "add only accepts <session-key> and <session-id>" >&2

usage

exit 1

fi

shift

;;

esac

done

if [[ -z "$key" || -z "$sid" ]]; then

echo "add needs <session-key> and <session-id>" >&2

usage

exit 1

fi

add_session "$key" "$sid"

;;

find)

local query=""

while [[ $# -gt 0 ]]; do

case "$1" in

--file)

if [[ $# -lt 2 ]]; then

echo "--file requires a path" >&2

usage

exit 1

fi

FILE_PATH="$2"

shift 2

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/sessions.sh` around lines 116 - 157, In the --file case inside both subcommand parsing loops (the '--file' case that assigns FILE_PATH="$2"), guard against missing argument before reading $2 (with set -u this currently causes an unbound parameter error); change the --file branch to check that $# -ge 2 (or that "$2" is non-empty) and if not print a usage/error and exit, otherwise assign FILE_PATH="$2" and shift 2; apply this same guard in the find branch as well so both parsing loops validate the --file argument before using $2.

coderabbitai · 2026-06-13T20:44:13Z

+interface SessionModel {
+	calls: ToolCallRef[];
+	/** message idx -> turn idx (full-session segmentation) */
+	turnOfMessage: Int32Array;
+	turnCount: number;
+	/** per message idx of a toolResult: estimated full tokens */
+	fullTokens: Map<number, number>;
+	/** per message idx of a toolResult: estimated codec'd tokens */
+	codecTokens: Map<number, number>;
+	resultIdxByCallId: Map<string, number>;
+	/** assistant message idx for each request, ascending */
+	requestTurns: number[];
+}


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Reset simulated state at compaction boundaries.

loadSession() returns compactionBoundaries because turns before those markers were no longer visible to the model. buildSessionModel() drops that metadata, so the sweep can credit a later read as "avoided" based on pinned content that should have been cleared after compaction. Carry the boundaries into SessionModel and flush the retention state when a request crosses one.

Also applies to: 97-155

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/simulate-working-set.ts` around lines 79 - 91, The SessionModel currently loses compaction metadata from loadSession(); extend the SessionModel interface to include compactionBoundaries (e.g., number[] or Set<number>) and update buildSessionModel() to copy the compactionBoundaries returned by loadSession() into the model, then modify the sweep/retention logic that credits "avoided" reads to flush or ignore pinned/retained data when a request's turn (requestTurns) crosses a compaction boundary—i.e., detect when requestTurns[i] > boundary and clear retention state for earlier turns so later reads aren't incorrectly treated as avoided. Ensure all uses of SessionModel (construction in buildSessionModel and any retention/sweep functions) are updated to handle the new compactionBoundaries field.

coderabbitai · 2026-06-13T20:44:13Z

+	const touch = (path: string, resultIdx: number, turn: number): void => {
+		const tokens = fullTokens.get(resultIdx) ?? 0;
+		const existing = pinned.get(path);
+		if (existing) pinnedTokens -= existing.tokens;
+		pinned.set(path, { lastTouchTurn: turn, resultIdx, tokens });
+		pinnedTokens += tokens;
+		result.pinEntries++;
+		// LRU eviction over token cap.
+		while (pinnedTokens > config.tokenCap && pinned.size > 1) {
+			let lruPath: string | null = null;
+			let lruTurn = Number.POSITIVE_INFINITY;
+			for (const [p, e] of pinned) {
+				if (e.lastTouchTurn < lruTurn) {
+					lruTurn = e.lastTouchTurn;
+					lruPath = p;
+				}
+			}
+			if (lruPath === null || lruPath === path) break;
+			evict(lruPath);
+		}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

The tokenCap invariant is not enforced for oversized entries.

If one pinned result is larger than tokenCap, or the just-touched path is selected as the LRU entry, this loop breaks and leaves pinnedTokens above the configured cap. That makes both the reported peak footprint and the "breaks avoided" counts too optimistic.

Suggested fix

const touch = (path: string, resultIdx: number, turn: number): void => { const tokens = fullTokens.get(resultIdx) ?? 0; + if (tokens > config.tokenCap) { + evict(path); + return; + } const existing = pinned.get(path); if (existing) pinnedTokens -= existing.tokens; pinned.set(path, { lastTouchTurn: turn, resultIdx, tokens }); pinnedTokens += tokens; result.pinEntries++; // LRU eviction over token cap. - while (pinnedTokens > config.tokenCap && pinned.size > 1) { + while (pinnedTokens > config.tokenCap && pinned.size > 0) { let lruPath: string | null = null; let lruTurn = Number.POSITIVE_INFINITY; for (const [p, e] of pinned) { if (e.lastTouchTurn < lruTurn) { lruTurn = e.lastTouchTurn; lruPath = p; } } - if (lruPath === null || lruPath === path) break; + if (lruPath === null) break; evict(lruPath); } };

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/simulate-working-set.ts` around lines 184 - 203, The touch function can leave pinnedTokens above config.tokenCap because the eviction loop stops when pinned.size <= 1 or when the LRU equals the just-touched path; to fix, change the loop to continue evicting until pinnedTokens <= config.tokenCap or there are no entries left (use while (pinnedTokens > config.tokenCap && pinned.size > 0)), and remove the special-case break when lruPath === path so the LRU entry (even the touched one) can be evicted if it is causing the cap breach; keep using evict(lruPath) and adjust any dependent accounting as needed.

durch added 21 commits June 10, 2026 15:49

feat(coding-agent): surface working-set pin count in assembly summary

d4e653f

The per-turn assembly dashboard line now shows 'N pinned (working set)' when retention pins are active, making the policy observable while dogfooding without opening the prompt inspector.

feat(scripts): track failed recall turn-expansions in illusion miner

c5677f2

recall(turn=N) calls returning empty/no content are failed recovery attempts - the evidence gate for the deferred toolCallId-based stub expansion slice. Report line now shows 'of which empty/failed: N'.

chore: ignore local sessions file

f4d218b

chore: update workstream expert prompts and session helper

90d6d3d

fix(ai): keep Anthropic cache markers off synthetic context

d811c97

Add oh-omp workstream expert system

8290e89

docs: update oh-omp expert system phase gates

2bf1d0d

docs: harden workstream expert builder gates

9428b96

chore: make sessions path repo-relative

d267112

docs: make workstreams deliver PR-ready changes

2bed716

coderabbitai Bot reviewed Jun 13, 2026

View reviewed changes

		function isCompositeCommand(blankedCommand: string): boolean {
		return /\\|\|&&\|;\|\$\(\|`/.test(blankedCommand);

Uh oh!

Conversation

durch commented Jun 13, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Verification

Notes

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagrams

❌ Failed checks (1 warning)

Uh oh!

rna-familiar Bot commented Jun 13, 2026

🧬 RNA Analysis — oh-omp#75

1. Context injection moved to tail — cache discipline fix (sdk.ts)

2. Working-set retention (settings-schema.ts, sdk.ts, message-transform.ts)

3. Recovery recipes in compression stub headers (read-codec.ts, warm-codec.ts)

4. RecallStore.maxTurn(sessionId) + ingest counter seeding (recall/store.ts, sdk.ts)

5. Bash interceptor correctness (bash-interceptor.ts, settings-schema.ts)

6. Grep/find precision filter guard (grep.ts)

7. Read tool RNA guard (read.ts)

8. TaskStore recovery (tasks/store.ts)

9. Tool validation skeleton hint (ai/src providers)

.oh/ workstream artifacts

Blast radius

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

durch commented Jun 13, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 13, 2026 •

edited

Loading

4. `RecallStore.maxTurn(sessionId)` + ingest counter seeding (recall/store.ts, sdk.ts)

`.oh/` workstream artifacts