Skip to content

docs: make workstreams deliver PR-ready changes#75

Open
durch wants to merge 21 commits into
mainfrom
workstream/pr-as-outcome-protocol
Open

docs: make workstreams deliver PR-ready changes#75
durch wants to merge 21 commits into
mainfrom
workstream/pr-as-outcome-protocol

Conversation

@durch

@durch durch commented Jun 13, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • amend the oh-omp expert system so workstreams default to useful PR/MR-ready output instead of plan-only/governance-only stops
  • treat branch creation/use, logical commits, push to origin, and opening/preparing PR/MR as normal reviewable ship mechanics inside invoked workstream/ship flows
  • preserve human-owned gates for merge, release/tag/publish, destructive/non-reviewable external side effects, model binding/rebinding, and unresolved material decisions
  • update the workstream-expert builder prompts so future projects inherit the same PR-as-outcome semantics

Verification

  • git diff --check -- .oh/workstreams/oh-omp/EXPERT-SYSTEM.md packages/coding-agent/src/prompts/agents/workstream-expert.md packages/coding-agent/src/prompts/commands/workstream-expert.md
  • adversarial grep for old over-gating language and positive PR-as-outcome law
  • Superego review: DECISION: ALLOW, CONFIDENCE: high

Notes

  • This is governance/prompt law only; no product runtime code changed.
  • Ship stops before merge. Release remains separate and human-owned.
  • Branch includes the existing local ahead commits plus the latest protocol-law commit 2bed71690.

Summary by CodeRabbit

  • Bug Fixes

    • Fixed bash command pattern matching to avoid false positives on quoted shell content.
    • Corrected session resumption turn numbering to prevent collisions after resuming sessions.
    • Improved tool validation error messages with required field structure hints.
    • Refined Anthropic prompt caching to avoid marking synthetic context with cache markers.
  • New Features

    • Added configurable working-set retention settings for preserving frequently-accessed content during context compression.
    • New session management CLI for mapping and tracking session identifiers.
  • Improvements

    • Enhanced bash interceptor to detect composite shell commands and apply stricter rule matching.
    • Improved grepping to skip symbol-index searches when precision filters are active.

durch added 21 commits June 10, 2026 15:49
Read-only miner that replays the production assembler transform
(dedup/read/warm codecs, hot window) over recorded session JSONL and
measures context-illusion integrity: re-fetch of compressed content,
recall vs full-price recovery split, recall efficacy, and wasted-token
cost. Mutation-justified re-reads are excluded; output is aggregate-only
with hashed samples (no payloads). Part of harness-ergonomics Track A.
Sweeps eviction age, token cap, and entry rule for the proposed
assembler working-set policy over the recorded corpus, with a
cache-aware cost model (write-priced savings vs read-priced retention
rent, dedup back-ref residue). Miner helpers exported for reuse;
miner main guarded with import.meta.main.
Files the model actively re-reads (2+ unchanged re-reads) keep their
canonical first-read turn verbatim beyond the hot window, so dedup
back-references resolve to full content instead of a codec skeleton.
Pins are version-keyed (path + content hash) so interleaved range reads
don't reset candidacy; one version per path; evicted after N untouched
turns or beyond a token cap; budget pressure still applies.

Pure transformMessages policy: deterministic, cache-stable (pinned
turns never change representation while pinned), no protocol changes.
Off unless explicitly enabled; runtime wires it from assembler settings
(default on, evict=8 turns, cap=16K). Decisions report reason
'working-set' for observability.

Verified: 68 transform tests pass; full-corpus replay via
mine-context-illusion --working-set reclassifies 55 historical
read-only illusion breaks as served-by-pin.
… change

A pinned version is disqualified the moment the same read view
(path + offset/limit) returns different content after its last touch
(re-read after an edit). Without this, provably outdated file content
stayed pinned verbatim for up to evictAfterTurns after a mutation -
worst exactly in edit-heavy sessions. Different ranges of a file remain
independent (pagination is not proof of change); unproven staleness
still ages out.

Found by dogfooding-hazard analysis before it bit: the edit-then-reread
cycle is the dominant pattern in sessions working on this very code.
Stubs now tell the model how to recover at the point of need instead of
relying on system-prompt instructions 100K tokens away:

- read stubs: 'recall("<path>") expands' when the file is unchanged in
  session, or 'edited since this read - re-read for current state' when
  a later in-session mutation makes the snapshot provably stale
- warm stubs: path-scoped recall recipe when the call targeted a file,
  generic 'recall expands' otherwise

Staleness comes from a pure transcript scan (mutating tool calls per
path) threaded into CodecContext as mutatedPaths - no fs I/O in the
transform. Recipes are path-query based, not turn-based: recall turn
numbering is process-scoped and diverges from transcript numbering
(documented seam, follow-up tracked).
The per-turn assembly dashboard line now shows 'N pinned (working set)'
when retention pins are active, making the policy observable while
dogfooding without opening the prompt inspector.
When validation fails on a missing required property, append a compact
required-only skeleton derived from the schema (two levels deep). The
dominant observed failure mode is models confidently filling sibling
fields while omitting a container property (e.g. task's tasks array,
3/3 sampled errors in session-log mining); naming the missing property
alone leaves the model guessing at the shape.

Applies to every tool. Type-mismatch failures are unchanged (no
skeleton noise when the model already knows the shape).
Tiered recovery ladder on every store operation:
1. reopen the table and retry - the dominant failure is a stale handle
   after another agent process compacts version manifests (data intact,
   handle dead)
2. salvage readable rows -> timestamped JSON backup in agentDir ->
   drop -> recreate -> re-insert salvage -> retry

Non-storage errors (bad filters, programmer errors) propagate
unchanged; rebuild only triggers on classified Lance storage failures.
Compaction cutoff widened from 2 minutes to 24 hours, closing the
cross-process prune race that produced ~49% of todo/todos tool errors
in session-log mining.

Worst-case loss: rows unreadable at corruption time, with the backup
attempt on disk either way. Update remains delete+re-insert but is
wrapped as a single recovery op so a retry can never lose the row.

Verified with real on-disk corruption: destroying the table directory
mid-session rebuilds automatically and the store stays operational.
…all recovery

Layer 1 - store data bug: the ingest turn counter was process-lifetime
while the recall store is session-persistent. After a resume, new rows
reused turn numbers already occupied by pre-restart rows of the same
session, so recall(turn=N) merged unrelated turns. The counter now
seeds from RecallStore.maxTurn(sessionId) at wiring time (new method,
5s timeout, fail-open to 0 with a warning).

Layer 2 - lying affordance: system prompt, recall tool prompt, and
composer meta-prompt all claimed stubs carry recall-compatible turn
numbers. They never did. All three now teach the shipped mechanism:
follow the stub's inline recipe (recall("<path>"), or re-read when
flagged 'edited since this read'); for pathless stubs use
recall(query=...) first and take turn numbers from recall results.

One-step turn-precise expansion from stubs (toolCallId-based) is
deferred behind dogfood telemetry per the turn-identity decision.
… stability

Front-placed injections (recalled-context, concept-graph, assembly summary)
regenerated message 0 every request, invalidating the Anthropic prompt cache
for the entire transcript each turn. Measured via proxy: cache write/read
2.3-8.0 for assembler traffic vs ~0.1 control, with reads capped at the
system+tools prefix. Tail placement preserves the conversation prefix so
writes are incremental.
…equests

Two wrong-result bugs (silent bad data, worse than errors):

- grep: the RNA structural shortcut replaced text grep while ignoring
  glob/type/case/multiline/context/offset entirely - a glob-filtered
  search could return TS symbols and never search the target files.
  RNA now only fires when no precision filters are present (explicit
  pre/post only, since context defaults come from settings), and its
  header states it is a symbol index with the escape hatch spelled out.

- read: the RNA structural-view gate checked !offset only, so raw:true
  and explicit limit still returned the skeleton instead of file
  content. raw/limit/offset are explicit requests for actual content
  and now all bypass the structural view.
Two precision fixes to the guard engine:

- Quote-blanking: rules now match against the command with quoted
  segments blanked. Quoted content is shell data, not syntax - echo of
  literal '>' markers is not a redirection, 'sed -i' inside a python -c
  string is not a sed invocation. Fixes the echo-as-redirection FP
  class for every rule generically.

- simpleCommandsOnly: grep/find rules skip composite commands (unquoted
  pipes, chains, substitution). 'find ... | xargs wc' has no find-tool
  equivalent; blocking it forced guard-sandwich loops. Mutation rules
  (sed/perl/awk, echo redirection) still apply everywhere.

9 regression tests covering both FP classes and the preserved blocks.
recall(turn=N) calls returning empty/no content are failed recovery
attempts - the evidence gate for the deferred toolCallId-based stub
expansion slice. Report line now shows 'of which empty/failed: N'.
@coderabbitai

coderabbitai Bot commented Jun 13, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

This PR introduces a comprehensive governance framework for the oh-omp workstream, improves Anthropic prompt caching by tracking cache-eligible messages early in conversion, implements working-set retention to preserve actively re-read file content across message transforms, enhances codec recovery markers for staleness detection, refines tool interception logic, and adds analysis utilities for measuring context integrity.

Changes

Workstream Governance and Context Assembly Improvements

Layer / File(s) Summary
Workstream governance framework and system law
.oh/workstreams/oh-omp/EXPERT-SYSTEM.md, .oh/workstreams/oh-omp/FRAME.md, .oh/workstreams/oh-omp/FRAME-model-routing-superseded.md, packages/coding-agent/src/prompts/agents/workstream-expert.md, packages/coding-agent/src/prompts/commands/workstream-expert.md
Establishes durable project law (EXPERT-SYSTEM.md) with dual-chapter acceptance criteria, authority chains, phase mapping (execute/ship/release), superego phase-handoff gates, and model roster bindings; creates FRAME documents for oh-omp and model-routing scopes with execution contracts, stop conditions, and material uncertainties.
Anthropic prompt caching with cache-candidate message marking
packages/ai/CHANGELOG.md, packages/ai/src/providers/anthropic.ts, packages/ai/test/anthropic-alignment.test.ts
Threads cache-candidate message indices through conversion (convertAnthropicMessagesWithMetadata) to mark non-synthetic user messages and tool-result aggregations as cache-eligible early, then applies cache breakpoints near those marks instead of deriving from role alone.
Validation error enrichment
packages/ai/src/utils/validation.ts, packages/ai/test/tool-argument-coercion.test.ts
Adds renderRequiredSkeleton to generate compact JSON-schema required-property hints; appends hints to validation errors when AJV reports missing required fields.
Working-set retention for actively re-read content
packages/coding-agent/src/config/settings-schema.ts, packages/coding-agent/src/context/assembler/message-transform.ts, packages/coding-agent/src/context/assembler/types.ts, packages/coding-agent/test/message-transform.test.ts
Implements optional message-transform retention policy that preserves verbatim content for tool results whose files are repeatedly re-read in-session, constrained by eviction age and token cap; detects mutations and threads mutated-path map through codec context for staleness awareness.
Codec recovery recipes and staleness markers
packages/coding-agent/src/context/assembler/codecs/read-codec.ts, packages/coding-agent/src/context/assembler/codecs/warm-codec.ts
Adds recoveryRecipe helper and recovery instruction text to read and warm codecs indicating file edit status since read and appropriate recall/re-run strategy.
Tool interception and search refinements
packages/coding-agent/src/tools/bash-interceptor.ts, packages/coding-agent/src/tools/grep.ts, packages/coding-agent/src/tools/read.ts, packages/coding-agent/test/bash-interceptor.test.ts
Hardens bash-interceptor to avoid false positives in quoted strings and composite commands via quote-blanking and composite detection; refines grep RNA structural search gating to exclude precision filters; tightens read tool RNA substitution conditions.
Session reliability and storage recovery
packages/coding-agent/src/context/recall/store.ts, packages/coding-agent/src/tasks/store.ts, packages/coding-agent/test/context/recall/store.test.ts, packages/coding-agent/test/tasks-store-recovery.test.ts
Adds RecallStore.maxTurn(sessionId) to seed turn indices on session resume; implements storage-corruption detection, salvage-and-rebuild mechanics in TaskStore with tiered recovery (#withRecovery).
Message assembly and SDK integration
packages/coding-agent/src/sdk.ts, packages/coding-agent/src/context/assembly-summary.ts
Moves assembled context (concept-graph, hydrated recall, assembly summary) to tail of message array to preserve prompt prefix for incremental cache writes; threads working-set configuration through assembler pipeline; seeds recall ingestion from maxTurn.
Analysis and simulation utilities
scripts/mine-context-illusion.ts, scripts/simulate-working-set.ts, scripts/sessions.sh
Introduces scripts to analyze session transcripts for context-illusion integrity (re-fetch classification, mutation detection) and simulate working-set retention configurations (pinning, eviction, token overhead); adds bash CLI for session-key/ID mapping.
Configuration, prompts, and documentation
packages/coding-agent/CHANGELOG.md, .gitignore, packages/coding-agent/src/prompts/composer/meta-prompt.md, packages/coding-agent/src/prompts/system/system-prompt.md, packages/coding-agent/src/prompts/tools/recall.md
Updates changelogs and prompts to document context placement, recovery recipes, and phase-handoff semantics; adds sessions.txt to ignore list.

Sequence Diagrams

sequenceDiagram
  participant Client
  participant buildParams
  participant convertAnthropicMessagesWithMetadata
  participant applyPromptCaching
  participant Anthropic API
  Client->>buildParams: message array
  buildParams->>convertAnthropicMessagesWithMetadata: convert with metadata
  convertAnthropicMessagesWithMetadata->>convertAnthropicMessagesWithMetadata: mark non-synthetic user messages
  convertAnthropicMessagesWithMetadata-->>buildParams: messages + cacheCandidateIndexes
  buildParams->>applyPromptCaching: call with cacheCandidateIndexes
  applyPromptCaching->>applyPromptCaching: position cache breakpoints near candidates
  applyPromptCaching-->>buildParams: params with cache_control markers
  buildParams->>Anthropic API: send request
Loading
sequenceDiagram
  participant transformMessages
  participant mutation detection
  participant working-set compute
  participant pinning logic
  participant codec context
  transformMessages->>mutation detection: scan toolCalls for mutating tools
  mutation detection-->>transformMessages: mutatedPaths map
  transformMessages->>working-set compute: compute retention exemptions
  working-set compute->>working-set compute: group re-reads by path+view+hash
  working-set compute->>pinning logic: select canonical pinned turn per path
  pinning logic->>pinning logic: respect eviction age and token cap
  pinning logic-->>transformMessages: exemption turn indices
  transformMessages->>codec context: thread mutatedPaths for staleness
  codec context-->>codec context: emit recovery recipe hints
Loading

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.96% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'docs: make workstreams deliver PR-ready changes' directly aligns with the PR's primary objective to amend the expert system so workstreams default to producing PR/MR-ready output.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch workstream/pr-as-outcome-protocol

Comment @coderabbitai help to get the list of available commands and usage tips.

@rna-familiar

rna-familiar Bot commented Jun 13, 2026

Copy link
Copy Markdown

🧬 RNA Analysis — oh-omp#75

Scope: 37 files · ~3400 lines · Context assembly hardening, bash interceptor correctness, working-set retention, TaskStore recovery, workstream expert system artifacts


1. Context injection moved to tail — cache discipline fix (sdk.ts)

contextMessages (recalled-context, concept-graph facts, assembly summary) moved from front to tail of the message array:

-const finalMessages = contextMessages.length > 0 ? [...contextMessages, ...boundedMessages] : boundedMessages;
+const finalMessages = contextMessages.length > 0 ? [...boundedMessages, ...contextMessages] : boundedMessages;

Why: Anthropic prompt caching matches on a stable message prefix. Front-placement regenerated message 0 on every request — recalled-context blocks change per turn, so the entire transcript was rewritten at cache-write prices (measured write/read ratio 2.3–8.0 vs ~0.1 for non-assembler traffic). Tail placement adds one block of fresh input per request without breaking the conversation prefix. CHANGELOG documents the measurement.


2. Working-set retention (settings-schema.ts, sdk.ts, message-transform.ts)

Three new assembler settings: workingSetEnabled (default true), workingSetEvictTurns (default 8), workingSetTokenCap (default 16,000). Keeps actively re-read files pinned verbatim beyond the hot window for up to N turns without a re-read, up to the token cap. Assembly summary shows N pinned (working set) when active.


3. Recovery recipes in compression stub headers (read-codec.ts, warm-codec.ts)

recoveryRecipe(ctx, filePath) — added to every [warm:read:…] stub header:

  • File unchanged in session → recall("path") expands
  • File mutated since this read → edited since this read — re-read for current state

mutatedPaths?: ReadonlyMap<string, number> added to CodecContext. System prompt, recall tool docs, and meta-prompt updated to explain inline recipe lookup. This closes a context-decay loop: the agent no longer needs to guess how to recover a compressed read — the recovery instruction is carried in the stub itself.


4. RecallStore.maxTurn(sessionId) + ingest counter seeding (recall/store.ts, sdk.ts)

RecallStore.maxTurn(sessionId) — returns the highest ingested turn number for a session. Used to seed the ingest turn counter on process resume so post-restart rows never reuse turn numbers occupied by pre-restart rows (which would merge unrelated turns in recall(turn=N) lookups). 5-second timeout. Test suite: null for no rows, scoped by session ID, post-resume counter prevents collision.


5. Bash interceptor correctness (bash-interceptor.ts, settings-schema.ts)

blankQuotedSegments(command) — strips quoted content before pattern matching. echo "a > b" no longer triggers the write interceptor; python3 -c 'sed -i ...' no longer triggers the edit interceptor.

isCompositeCommand(blanked) — detects |, &&, ;, $(, ` in the blanked command.

simpleCommandsOnly: true on grep and find rules — find … | xargs wc passes through; find … -name '*.ts' (simple) still blocks. sed -i (no flag) still blocks in chains. Test suite: 9 cases covering quoted data, pipelines, composite detection, sed non-simple rule.


6. Grep/find precision filter guard (grep.ts)

RNA structural search now gated by !hasPrecisionFilters. Previously, grep -g '*.md' pattern could return RNA symbol results while ignoring the glob filter. Flags blocked: glob, type, ignoreCase, multiline, pre, post, offset. Result banner updated: [RNA structural search — symbol index, not text grep; add glob/type or regex syntax to force text search].

7. Read tool RNA guard (read.ts)

!limit && !raw added to the RNA structural view condition. Explicit limit/raw requests for actual file content are no longer substituted with the structural view.


8. TaskStore recovery (tasks/store.ts)

isLanceStorageError(err) — classifies Lance storage corruption vs. query/schema errors. Auto-rebuild on storage failure: salvage tasks to tasks-backup-<timestamp>.json, drop corrupted table, reinitialise. Store remains operational after rebuild. Non-storage errors propagate without triggering rebuild. Test suite: 2 tests including destructive on-disk corruption simulation.


9. Tool validation skeleton hint (ai/src providers)

validateToolArguments appends Required shape: with TypeBox-derived skeleton when a required property is missing. Only emitted on missing-property failures (not type-mismatch). Tests: skeleton for task tool with missing tasks array; no skeleton for type mismatch.


.oh/ workstream artifacts

Three new artifacts committed:

  • .oh/workstreams/oh-omp/EXPERT-SYSTEM.md (450 lines) — full workstream expert system for oh-omp
  • .oh/workstreams/oh-omp/FRAME.md (218 lines) — current approved frame
  • .oh/workstreams/oh-omp/FRAME-model-routing-superseded.md (145 lines) — superseded frame on record

This is the PR-as-outcome protocol. The workstream's delivered changes are encoded as referenceable RNA artifacts alongside the code. Future RNA queries will surface these as alignment context for changes to the assembler, recall store, and bash interceptor subsystems.


Blast radius

  • AssemblerSettings interface: workingSetEnabled, workingSetEvictTurns, workingSetTokenCap added — callers that spread the interface need the new fields (defaults in SETTINGS_SCHEMA cover all existing call sites)
  • CodecContext.mutatedPaths — optional, backward compatible
  • RecallStore.maxTurn — additive, no callers removed
  • Context order change (sdk.ts) — affects all assembler-mode sessions; measured improvement, no functional regression expected

PR is clean.

RNA familiar @ 036d0369 · dispatch 16fb29b0

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 14

🧹 Nitpick comments (1)
packages/ai/test/tool-argument-coercion.test.ts (1)

660-663: ⚡ Quick win

Avoid byte-for-byte skeleton formatting assertions.

At Line 661, asserting the full formatted skeleton string couples this test to formatting/order rather than the semantic contract.

As per coding guidelines, "Do not assert exact strings, ordering, and formatting unless downstream code parses or materially depends on the exact bytes."

Proposed hardening
- expect(message).toContain('"tasks": [{ "id": string, "description": string, "assignment": string }]');
+ expect(message).toContain('"tasks"');
+ expect(message).toContain('"id": string');
+ expect(message).toContain('"description": string');
+ expect(message).toContain('"assignment": string');
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/ai/test/tool-argument-coercion.test.ts` around lines 660 - 663, The
test currently asserts byte-for-byte skeleton formatting by checking exact
formatted substrings against message; instead relax these assertions in the test
by asserting the semantic pieces exist (e.g., replace
expect(message).toContain('"tasks": [{ "id": string, "description": string,
"assignment": string }]') with checks that the message includes the keys/markers
like '"tasks"', '"id"', '"description"', '"assignment"' and keep the existing
expect(message).toContain('"agent"') (or change it to
expect(message).toMatch(/"agent"\s*:/) for whitespace-robustness); alternatively
use a regex that verifies presence of the required fields without enforcing
ordering/spacing. Target the assertions that reference the variable message and
remove any exact skeleton formatting checks.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/ai/test/tool-argument-coercion.test.ts`:
- Around line 666-679: The test currently only asserts inside the catch block so
it can falsely pass if validateToolArguments(taskLikeTool, toolCall) does not
throw; update the test to explicitly assert the failure by either wrapping the
call with Jest's expect(...).toThrow()/toThrowError() or by adding a
fail()/throw after the call so execution cannot silently succeed. Specifically
change the body of the "omits the skeleton for non-missing-property failures"
test to assert that validateToolArguments throws for the given toolCall (or use
expect(() => validateToolArguments(taskLikeTool, toolCall)).toThrow() and keep
the existing catch-based message assertion if you prefer to inspect the error
text).

In `@packages/coding-agent/src/context/assembler/codecs/read-codec.ts`:
- Around line 117-123: The recoveryRecipe function embeds filePath directly into
recall("...") which can produce malformed recipe text for paths with quotes or
control characters; fix this by escaping filePath before interpolation (e.g.,
compute const escaped = JSON.stringify(filePath).slice(1, -1) or an equivalent
escaping routine) and use escaped in the return template instead of filePath so
recall("...") always contains a properly escaped string.

In `@packages/coding-agent/src/context/recall/store.ts`:
- Around line 107-124: The maxTurn function conflates "no rows" and "scan timed
out" by returning null for both; change the timeout branch to reject with a
distinct error so callers can distinguish a timeout from a legitimate null
result. Specifically, in maxTurn replace the timeout Promise (currently
Bun.sleep(5000).then(() => null)) with a Promise that throws/rejects (e.g.,
Bun.sleep(5000).then(() => { throw new Error('maxTurn: timeout'); })), leaving
the scan logic (the scan variable and its null return when no rows) unchanged so
null still means "no rows" while timeouts surface as an exception the caller can
handle.

In `@packages/coding-agent/src/tasks/store.ts`:
- Around line 27-35: isLanceStorageError currently treats any message containing
"manifest" as a storage error which is too broad; narrow the manifest check in
isLanceStorageError so it only matches missing-manifest signals from lancedb
0.26.x (e.g. paths like "_latest.manifest" combined with "not found"/"was not
found", and the explicit "failed to load manifest version" pattern) while
preserving existing behavior such as the test case "failed to load manifest
version 42"; update the manifest alternative in isLanceStorageError to use
targeted regexes like /_latest\.manifest.*(not found|was not found)/i and
/failed to load manifest version/i (remove the generic /manifest/i), leaving
TaskStore.#withRecovery and `#salvageRebuild` logic unchanged.

In `@packages/coding-agent/src/tools/bash-interceptor.ts`:
- Around line 62-63: Update isCompositeCommand to treat newline-separated
commands as composite by checking for newline characters; change the regex used
on blankedCommand to include \n (and optionally \r) so it matches
/\||&&|;|\$\(|`|\r?\n/ (or add a separate /\n/.test check) instead of the
current pattern, ensuring multiline chains like "cmd1\ncmd2" are detected as
composite.

In `@packages/coding-agent/src/tools/grep.ts`:
- Around line 199-207: The branch that chooses RNA symbol results uses
hasPrecisionFilters but only checks pre/post params, so configured defaults like
grep.contextBefore/grep.contextAfter are ignored; update the hasPrecisionFilters
computation to include the effective context variables (e.g.,
normalizedContextBefore and normalizedContextAfter or the computed context
values that combine settings and params) alongside
pre/post/normalizedOffset/etc., so that when effective context > 0 the text-grep
path is taken instead of returning RNA results (adjust the expression near
hasPrecisionFilters and the subsequent if that checks
this.session.settings.get("rna.enabled") && !hasPrecisionFilters &&
looksLikeIdentifier(normalizedPattern)).

In `@packages/coding-agent/test/context/recall/store.test.ts`:
- Around line 32-34: The afterEach teardown removes the temp directory but never
closes the open RecallStore handle; update the afterEach in the test to first
await closing the store (e.g., await store.close() or await recall.close()) with
a guard if the store variable may be undefined, then remove the directory (await
fs.rm(agentDir, { recursive: true, force: true })); ensure the change targets
the afterEach block and uses the existing RecallStore instance name used in the
test so the store is closed before deletion.

In `@packages/coding-agent/test/tasks-store-recovery.test.ts`:
- Around line 38-39: The tests open TaskStore via TaskStore.open but never close
it; wrap each use of the store in a try/finally and call the store's close
method (e.g., await store.close()) in the finally block so file handles/DB
resources are released; update the occurrences around the TaskStore.open call
that creates tasks (the block using store.create([{ content: "task before
corruption" }], "sess-1")) and the other similar opens noted in the comment so
every opened TaskStore is closed even on failure.
- Around line 61-70: The test "non-storage errors propagate without triggering
rebuild" doesn't actually trigger a non-storage failure; either update the test
to simulate a real non-storage error or rename it to reflect its current
behavior. To fix, either (A) stub/mock TaskStore.query (after TaskStore.open and
store.create) to throw a non-storage Error (e.g., throw new Error("query
failure")) and assert await expect(store.query(...)).rejects.toThrow(...) and
verify no rebuild path is invoked, or (B) if you don't want to inject a throw,
rename the test to describe that it verifies a harmless escaped filter resolves
(change the test name string and keep the existing resolved expectation). Ensure
you reference the test named "non-storage errors propagate without triggering
rebuild", the TaskStore.open call, the store.create call, and the store.query
invocation when making the change.

In `@scripts/mine-context-illusion.ts`:
- Around line 52-65: scripts/mine-context-illusion.ts (lines 52-65): In
parseArgs(), validate the parsed since value by checking Number.isNaN(since)
after Date.parse(...) and if invalid print a clear error and exit non‑zero;
return the validated since value and ensure any later code (report footer)
reuses this returned since instead of reparsing process.argv.
scripts/simulate-working-set.ts (lines 304-311): Mirror the same CLI validation
— check the result of Date.parse(...) for NaN, emit a clear error and exit
non‑zero so the sweep does not run and produce a zero-session report; if valid,
pass the validated since value onward. Ensure both places fail fast on invalid
dates rather than continuing with NaN.
- Around line 193-212: The loadSession function currently reads the file and
parses lines manually and can throw if Bun.file(path).text() or JSON.parse
fails; change loadSession to catch read/parse errors and return null on failure,
and use Bun.JSONL.parse() instead of split/JSON.parse. Specifically, in
loadSession wrap the Bun.file(path).text() call and the parsing in a try/catch,
call Bun.JSONL.parse(text) to get an array of records, then iterate those
records and push record.message into messages when record.type === "message" and
push messages.length into compactionBoundaries when
/compact/i.test(record.type); if any read/parse error occurs return null, and
keep the final messages.length === 0 check and return { messages,
compactionBoundaries } otherwise.

In `@scripts/sessions.sh`:
- Around line 116-157: In the --file case inside both subcommand parsing loops
(the '--file' case that assigns FILE_PATH="$2"), guard against missing argument
before reading $2 (with set -u this currently causes an unbound parameter
error); change the --file branch to check that $# -ge 2 (or that "$2" is
non-empty) and if not print a usage/error and exit, otherwise assign
FILE_PATH="$2" and shift 2; apply this same guard in the find branch as well so
both parsing loops validate the --file argument before using $2.

In `@scripts/simulate-working-set.ts`:
- Around line 79-91: The SessionModel currently loses compaction metadata from
loadSession(); extend the SessionModel interface to include compactionBoundaries
(e.g., number[] or Set<number>) and update buildSessionModel() to copy the
compactionBoundaries returned by loadSession() into the model, then modify the
sweep/retention logic that credits "avoided" reads to flush or ignore
pinned/retained data when a request's turn (requestTurns) crosses a compaction
boundary—i.e., detect when requestTurns[i] > boundary and clear retention state
for earlier turns so later reads aren't incorrectly treated as avoided. Ensure
all uses of SessionModel (construction in buildSessionModel and any
retention/sweep functions) are updated to handle the new compactionBoundaries
field.
- Around line 184-203: The touch function can leave pinnedTokens above
config.tokenCap because the eviction loop stops when pinned.size <= 1 or when
the LRU equals the just-touched path; to fix, change the loop to continue
evicting until pinnedTokens <= config.tokenCap or there are no entries left (use
while (pinnedTokens > config.tokenCap && pinned.size > 0)), and remove the
special-case break when lruPath === path so the LRU entry (even the touched one)
can be evicted if it is causing the cap breach; keep using evict(lruPath) and
adjust any dependent accounting as needed.

---

Nitpick comments:
In `@packages/ai/test/tool-argument-coercion.test.ts`:
- Around line 660-663: The test currently asserts byte-for-byte skeleton
formatting by checking exact formatted substrings against message; instead relax
these assertions in the test by asserting the semantic pieces exist (e.g.,
replace expect(message).toContain('"tasks": [{ "id": string, "description":
string, "assignment": string }]') with checks that the message includes the
keys/markers like '"tasks"', '"id"', '"description"', '"assignment"' and keep
the existing expect(message).toContain('"agent"') (or change it to
expect(message).toMatch(/"agent"\s*:/) for whitespace-robustness); alternatively
use a regex that verifies presence of the required fields without enforcing
ordering/spacing. Target the assertions that reference the variable message and
remove any exact skeleton formatting checks.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 38f8e0f5-a42b-4ee3-999f-996470fd373c

📥 Commits

Reviewing files that changed from the base of the PR and between 865c862 and 2bed716.

📒 Files selected for processing (35)
  • .gitignore
  • .oh/workstreams/oh-omp/EXPERT-SYSTEM.md
  • .oh/workstreams/oh-omp/FRAME-model-routing-superseded.md
  • .oh/workstreams/oh-omp/FRAME.md
  • packages/ai/CHANGELOG.md
  • packages/ai/src/providers/anthropic.ts
  • packages/ai/src/utils/validation.ts
  • packages/ai/test/anthropic-alignment.test.ts
  • packages/ai/test/tool-argument-coercion.test.ts
  • packages/coding-agent/CHANGELOG.md
  • packages/coding-agent/src/config/settings-schema.ts
  • packages/coding-agent/src/context/assembler/codecs/read-codec.ts
  • packages/coding-agent/src/context/assembler/codecs/warm-codec.ts
  • packages/coding-agent/src/context/assembler/message-transform.ts
  • packages/coding-agent/src/context/assembler/types.ts
  • packages/coding-agent/src/context/assembly-summary.ts
  • packages/coding-agent/src/context/recall/store.ts
  • packages/coding-agent/src/prompts/agents/workstream-expert.md
  • packages/coding-agent/src/prompts/commands/workstream-expert.md
  • packages/coding-agent/src/prompts/composer/meta-prompt.md
  • packages/coding-agent/src/prompts/system/system-prompt.md
  • packages/coding-agent/src/prompts/tools/recall.md
  • packages/coding-agent/src/sdk.ts
  • packages/coding-agent/src/tasks/store.ts
  • packages/coding-agent/src/tools/bash-interceptor.ts
  • packages/coding-agent/src/tools/grep.ts
  • packages/coding-agent/src/tools/read.ts
  • packages/coding-agent/test/assembly-summary.test.ts
  • packages/coding-agent/test/bash-interceptor.test.ts
  • packages/coding-agent/test/context/recall/store.test.ts
  • packages/coding-agent/test/message-transform.test.ts
  • packages/coding-agent/test/tasks-store-recovery.test.ts
  • scripts/mine-context-illusion.ts
  • scripts/sessions.sh
  • scripts/simulate-working-set.ts

Comment on lines +666 to +679
it("omits the skeleton for non-missing-property failures", () => {
const toolCall: ToolCall = {
type: "toolCall",
id: "call-skel-2",
name: "task",
arguments: { agent: 42, tasks: [{ id: "a", description: "b", assignment: "c" }] },
};
try {
validateToolArguments(taskLikeTool, toolCall);
} catch (err) {
const message = String(err instanceof Error ? err.message : err);
expect(message).not.toContain("Required shape:");
}
});

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Assert the failure path explicitly in this negative test.

At Line 666, this test can pass when validateToolArguments(...) does not throw, because there is no assertion outside the catch block.

As per coding guidelines, "Every new test must defend one concrete, externally observable contract... Never add placeholder tests, tautologies, or assertions that only prove code executed."

Proposed fix
 it("omits the skeleton for non-missing-property failures", () => {
   const toolCall: ToolCall = {
     type: "toolCall",
     id: "call-skel-2",
     name: "task",
     arguments: { agent: 42, tasks: [{ id: "a", description: "b", assignment: "c" }] },
   };
-  try {
-    validateToolArguments(taskLikeTool, toolCall);
-  } catch (err) {
-    const message = String(err instanceof Error ? err.message : err);
-    expect(message).not.toContain("Required shape:");
-  }
+  let message: string | null = null;
+  try {
+    validateToolArguments(taskLikeTool, toolCall);
+  } catch (err) {
+    message = String(err instanceof Error ? err.message : err);
+  }
+  expect(message).not.toBeNull();
+  expect(message).not.toContain("Required shape:");
 });
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
it("omits the skeleton for non-missing-property failures", () => {
const toolCall: ToolCall = {
type: "toolCall",
id: "call-skel-2",
name: "task",
arguments: { agent: 42, tasks: [{ id: "a", description: "b", assignment: "c" }] },
};
try {
validateToolArguments(taskLikeTool, toolCall);
} catch (err) {
const message = String(err instanceof Error ? err.message : err);
expect(message).not.toContain("Required shape:");
}
});
it("omits the skeleton for non-missing-property failures", () => {
const toolCall: ToolCall = {
type: "toolCall",
id: "call-skel-2",
name: "task",
arguments: { agent: 42, tasks: [{ id: "a", description: "b", assignment: "c" }] },
};
let message: string | null = null;
try {
validateToolArguments(taskLikeTool, toolCall);
} catch (err) {
message = String(err instanceof Error ? err.message : err);
}
expect(message).not.toBeNull();
expect(message).not.toContain("Required shape:");
});
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/ai/test/tool-argument-coercion.test.ts` around lines 666 - 679, The
test currently only asserts inside the catch block so it can falsely pass if
validateToolArguments(taskLikeTool, toolCall) does not throw; update the test to
explicitly assert the failure by either wrapping the call with Jest's
expect(...).toThrow()/toThrowError() or by adding a fail()/throw after the call
so execution cannot silently succeed. Specifically change the body of the "omits
the skeleton for non-missing-property failures" test to assert that
validateToolArguments throws for the given toolCall (or use expect(() =>
validateToolArguments(taskLikeTool, toolCall)).toThrow() and keep the existing
catch-based message assertion if you prefer to inspect the error text).

Source: Coding guidelines

Comment on lines +117 to +123
function recoveryRecipe(ctx: CodecContext, filePath: string): string {
const mutatedAt = ctx.mutatedPaths?.get(filePath);
if (mutatedAt !== undefined && mutatedAt > ctx.turnIndex) {
return " | edited since this read — re-read for current state";
}
return ` | unchanged in session — recall("${filePath}") expands`;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Escape filePath before embedding it in recall("...") recipe text.

The current string interpolation can produce malformed recipe text for valid paths containing quotes/control characters (Line 122), which weakens the recovery contract.

Proposed fix
diff --git a/packages/coding-agent/src/context/assembler/codecs/read-codec.ts b/packages/coding-agent/src/context/assembler/codecs/read-codec.ts
@@
 function recoveryRecipe(ctx: CodecContext, filePath: string): string {
 	const mutatedAt = ctx.mutatedPaths?.get(filePath);
 	if (mutatedAt !== undefined && mutatedAt > ctx.turnIndex) {
 		return " | edited since this read — re-read for current state";
 	}
-	return ` | unchanged in session — recall("${filePath}") expands`;
+	const quotedPath = JSON.stringify(filePath);
+	return ` | unchanged in session — recall(${quotedPath}) expands`;
 }
diff --git a/packages/coding-agent/src/context/assembler/codecs/warm-codec.ts b/packages/coding-agent/src/context/assembler/codecs/warm-codec.ts
@@
-		} else if (path) {
-			headerParts.push(`recall("${path}") expands`);
+		} else if (path) {
+			headerParts.push(`recall(${JSON.stringify(path)}) expands`);
 		} else {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
function recoveryRecipe(ctx: CodecContext, filePath: string): string {
const mutatedAt = ctx.mutatedPaths?.get(filePath);
if (mutatedAt !== undefined && mutatedAt > ctx.turnIndex) {
return " | edited since this read — re-read for current state";
}
return ` | unchanged in session — recall("${filePath}") expands`;
}
function recoveryRecipe(ctx: CodecContext, filePath: string): string {
const mutatedAt = ctx.mutatedPaths?.get(filePath);
if (mutatedAt !== undefined && mutatedAt > ctx.turnIndex) {
return " | edited since this read — re-read for current state";
}
const quotedPath = JSON.stringify(filePath);
return ` | unchanged in session — recall(${quotedPath}) expands`;
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/coding-agent/src/context/assembler/codecs/read-codec.ts` around
lines 117 - 123, The recoveryRecipe function embeds filePath directly into
recall("...") which can produce malformed recipe text for paths with quotes or
control characters; fix this by escaping filePath before interpolation (e.g.,
compute const escaped = JSON.stringify(filePath).slice(1, -1) or an equivalent
escaping routine) and use escaped in the return template instead of filePath so
recall("...") always contains a properly escaped string.

Comment on lines +107 to +124
async maxTurn(sessionId: string): Promise<number | null> {
const filter = `session_id = '${sessionId.replace(/'/g, "''")}'`;
const scan = this.#table
.query()
.where(filter)
.select(["turn"])
.toArray()
.then(rows => {
let max: number | null = null;
for (const row of rows as { turn: number }[]) {
const turn = Number(row.turn);
if (Number.isFinite(turn) && (max === null || turn > max)) max = turn;
}
return max;
});
const timeout = Bun.sleep(5000).then(() => null);
return Promise.race([scan, timeout]);
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Differentiate timeout from “no rows” in maxTurn.

maxTurn currently returns null for both “session has no rows” and “scan timed out”. That conflation is unsafe for resume seeding: downstream logic treats null as “start from 0”, which can reuse turn numbers after restart.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/coding-agent/src/context/recall/store.ts` around lines 107 - 124,
The maxTurn function conflates "no rows" and "scan timed out" by returning null
for both; change the timeout branch to reject with a distinct error so callers
can distinguish a timeout from a legitimate null result. Specifically, in
maxTurn replace the timeout Promise (currently Bun.sleep(5000).then(() => null))
with a Promise that throws/rejects (e.g., Bun.sleep(5000).then(() => { throw new
Error('maxTurn: timeout'); })), leaving the scan logic (the scan variable and
its null return when no rows) unchanged so null still means "no rows" while
timeouts surface as an exception the caller can handle.

Comment on lines +27 to +35
export function isLanceStorageError(err: unknown): boolean {
const message = err instanceof Error ? err.message : String(err);
return (
/object at location .* not found/i.test(message) ||
/no such file or directory/i.test(message) ||
/manifest/i.test(message) ||
/dataset .*(not found|corrupt)/i.test(message) ||
/invalid .*version/i.test(message)
);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

For @lancedb/lancedb 0.26.x, what are the canonical error-message patterns for (1) missing object file, (2) missing manifest version, and (3) corrupt dataset metadata?

💡 Result:

Canonical error-message patterns in @lancedb/lancedb 0.26.x (as surfaced via the underlying LanceIO/LanceError strings) for the three cases: 1) Missing object file Pattern (JS/Python often prefixes with “lance error: …” and wraps a LanceError(IO) “Not found”): “lance error: LanceError(IO): External error: Not found: <full/path-or-uri-to-missing-file>” [1] 2) Missing manifest version Pattern (missing _latest.manifest when no manifest/version pointer exists, typically from opening a table/dataset whose manifest pointer file is absent): “ValueError: Dataset at path .lance/_latest.manifest was not found: Not found: .lance/_latest.manifest” [2] 3) Corrupt dataset metadata Pattern (table exists on disk but can’t be opened because the internal dataset files/manifests/metadata are missing or corrupt; surfaces as a “Table … was not found”/not-loadable message): “ValueError: Table '' was not found” (when the.lance directory is present but contains no valid manifest/metadata) [3] If you want, paste the exact stack trace/error text you’re seeing in 0.26.x and I’ll map it to the closest canonical pattern above.

Citations:


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Locate and print the relevant function implementation
FILE="packages/coding-agent/src/tasks/store.ts"
echo "---- $FILE (lines around 27-60) ----"
nl -ba "$FILE" | sed -n '1,120p'

# Find all usages of isLanceStorageError
echo "---- usages of isLanceStorageError ----"
rg -n "isLanceStorageError" packages/coding-agent/src/tasks -S || true
rg -n "isLanceStorageError" packages/coding-agent -S || true

Repository: open-horizon-labs/oh-omp

Length of output: 201


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Print only the specific line range the review references (27-60) more narrowly
FILE="packages/coding-agent/src/tasks/store.ts"
echo "---- $FILE (lines 20-60) ----"
nl -ba "$FILE" | sed -n '20,70p'

# Also print some broader context for how classifier impacts salvage/drop/rebuild
echo "---- $FILE (lines 1-160) ----"
nl -ba "$FILE" | sed -n '1,180p'

Repository: open-horizon-labs/oh-omp

Length of output: 194


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="packages/coding-agent/src/tasks/store.ts"

echo "---- $FILE (lines 20-70) ----"
nl -ba "$FILE" | sed -n '20,70p'

echo "---- usages of isLanceStorageError ----"
rg -n "isLanceStorageError" packages/coding-agent -S

Repository: open-horizon-labs/oh-omp

Length of output: 194


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="packages/coding-agent/src/tasks/store.ts"

echo "---- $FILE (lines 20-120) ----"
python3 - <<PY
from pathlib import Path
p = Path("$FILE")
lines = p.read_text().splitlines()
start, end = 20, 120  # 1-based in print
for i in range(start, min(end, len(lines)) + 1):
    print(f"{i:4d}\t{lines[i-1]}")
PY

echo "---- $FILE (exact isLanceStorageError body) ----"
python3 - <<PY
import re
from pathlib import Path
p = Path("$FILE")
txt = p.read_text()
m = re.search(r"export function isLanceStorageError\\(err: unknown\\): boolean\\s*\\{([\\s\\S]*?)\\n\\}", txt)
if not m:
    print("NOT FOUND")
else:
    body = m.group(0)
    print(body)
PY

echo "---- usages of isLanceStorageError ----"
rg -n "isLanceStorageError" packages/coding-agent -S

Repository: open-horizon-labs/oh-omp

Length of output: 5433


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="packages/coding-agent/src/tasks/store.ts"

echo "---- $FILE (lines 20-120) ----"
python3 - <<PY
from pathlib import Path
p = Path("$FILE")
lines = p.read_text().splitlines()
start, end = 20, 120  # 1-based
for i in range(start, min(end, len(lines)) + 1):
    print(f"{i:4d}\t{lines[i-1]}")
PY

echo "---- $FILE (exact isLanceStorageError body) ----"
python3 - <<PY
import re
from pathlib import Path
p = Path("$FILE")
txt = p.read_text()
m = re.search(r"export function isLanceStorageError\\(err: unknown\\): boolean\\s*\\{([\\s\\S]*?)\\n\\}", txt)
print(m.group(0) if m else "NOT FOUND")
PY

echo "---- usages of isLanceStorageError ----"
rg -n "isLanceStorageError" packages/coding-agent -S

Repository: open-horizon-labs/oh-omp

Length of output: 5433


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="packages/coding-agent/test/tasks-store-recovery.test.ts"
echo "---- $FILE ----"
python3 - <<PY
from pathlib import Path
p=Path("$FILE")
lines=p.read_text().splitlines()
for i,l in enumerate(lines, start=1):
    print(f"{i:4d}\t{l}")
PY

Repository: open-horizon-labs/oh-omp

Length of output: 3581


Narrow isLanceStorageError’s /manifest/i to the missing-manifest patterns only.
TaskStore.#withRecovery() uses this classifier to decide when to run the destructive #salvageRebuild() tier, and matching any /manifest/i substring is overly broad. Keep the existing intended behavior (e.g. the test case "failed to load manifest version 42") but restrict the regex to lancedb 0.26.x missing-manifest signals like /_latest\.manifest/ / “was not found” / “failed to load manifest version”, rather than any message containing “manifest”.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/coding-agent/src/tasks/store.ts` around lines 27 - 35,
isLanceStorageError currently treats any message containing "manifest" as a
storage error which is too broad; narrow the manifest check in
isLanceStorageError so it only matches missing-manifest signals from lancedb
0.26.x (e.g. paths like "_latest.manifest" combined with "not found"/"was not
found", and the explicit "failed to load manifest version" pattern) while
preserving existing behavior such as the test case "failed to load manifest
version 42"; update the manifest alternative in isLanceStorageError to use
targeted regexes like /_latest\.manifest.*(not found|was not found)/i and
/failed to load manifest version/i (remove the generic /manifest/i), leaving
TaskStore.#withRecovery and `#salvageRebuild` logic unchanged.

Comment on lines +62 to +63
function isCompositeCommand(blankedCommand: string): boolean {
return /\||&&|;|\$\(|`/.test(blankedCommand);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Treat newline-separated shell chains as composite commands too.

isCompositeCommand() only recognizes one-line separators here. A multiline bash snippet like find . -name '*.ts'\nwc -l is still a composite command, but this helper returns false, so simpleCommandsOnly rules keep blocking the newline form while allowing the equivalent ;/| form.

Suggested fix
 function isCompositeCommand(blankedCommand: string): boolean {
-	return /\||&&|;|\$\(|`/.test(blankedCommand);
+	return /\||&&|;|\n|\$\(|`/.test(blankedCommand);
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
function isCompositeCommand(blankedCommand: string): boolean {
return /\||&&|;|\$\(|`/.test(blankedCommand);
function isCompositeCommand(blankedCommand: string): boolean {
return /\||&&|;|\n|\$\(|`/.test(blankedCommand);
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/coding-agent/src/tools/bash-interceptor.ts` around lines 62 - 63,
Update isCompositeCommand to treat newline-separated commands as composite by
checking for newline characters; change the regex used on blankedCommand to
include \n (and optionally \r) so it matches /\||&&|;|\$\(|`|\r?\n/ (or add a
separate /\n/.test check) instead of the current pattern, ensuring multiline
chains like "cmd1\ncmd2" are detected as composite.

Comment on lines +52 to +65
function parseArgs(): CliArgs {
// Mutated via --working-set: replay history under the retention policy.

const args = process.argv.slice(2);
let since = Date.parse("2026-05-04");

let jsonOut: string | undefined;
for (let i = 0; i < args.length; i++) {
if (args[i] === "--since" && args[i + 1]) since = Date.parse(args[++i]!);
if (args[i] === "--json" && args[i + 1]) jsonOut = args[++i];
if (args[i] === "--working-set") WORKING_SET_REPLAY.enabled = true;
}
return { since, jsonOut };
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate --since once and fail fast on invalid dates. Both CLIs pass Date.parse(...) through unchecked. A bad date becomes NaN; the miner later crashes when formatting the report date, while the simulator silently reports an empty corpus.

  • scripts/mine-context-illusion.ts#L52-L65: reject NaN in parseArgs() and reuse the validated since value in the report footer instead of reparsing process.argv.
  • scripts/simulate-working-set.ts#L304-L311: reject NaN during CLI parsing so the sweep does not emit a false zero-session report.
📍 Affects 2 files
  • scripts/mine-context-illusion.ts#L52-L65 (this comment)
  • scripts/simulate-working-set.ts#L304-L311
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/mine-context-illusion.ts` around lines 52 - 65,
scripts/mine-context-illusion.ts (lines 52-65): In parseArgs(), validate the
parsed since value by checking Number.isNaN(since) after Date.parse(...) and if
invalid print a clear error and exit non‑zero; return the validated since value
and ensure any later code (report footer) reuses this returned since instead of
reparsing process.argv. scripts/simulate-working-set.ts (lines 304-311): Mirror
the same CLI validation — check the result of Date.parse(...) for NaN, emit a
clear error and exit non‑zero so the sweep does not run and produce a
zero-session report; if valid, pass the validated since value onward. Ensure
both places fail fast on invalid dates rather than continuing with NaN.

Comment on lines +193 to +212
export async function loadSession(path: string): Promise<LoadedSession | null> {
const text = await Bun.file(path).text();
const messages: AgentMessage[] = [];
const compactionBoundaries: number[] = [];
for (const line of text.split("\n")) {
if (!line.trim()) continue;
let record: { type?: string; message?: AgentMessage };
try {
record = JSON.parse(line) as { type?: string; message?: AgentMessage };
} catch {
continue;
}
if (record.type === "message" && record.message) {
messages.push(record.message);
} else if (record.type && /compact/i.test(record.type)) {
compactionBoundaries.push(messages.length);
}
}
if (messages.length === 0) return null;
return { messages, compactionBoundaries };

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Show relevant portions of the file
file="scripts/mine-context-illusion.ts"
echo "== Imports (approx lines 1-60) =="
nl -ba "$file" | sed -n '1,80p'

echo "== loadSession implementation (approx lines 150-260) =="
nl -ba "$file" | sed -n '150,260p'

echo "== Call sites of loadSession =="
rg -n "loadSession\(" -S scripts | head -n 50

Repository: open-horizon-labs/oh-omp

Length of output: 163


🏁 Script executed:

#!/bin/bash
set -euo pipefail

file="scripts/mine-context-illusion.ts"
echo "== Imports (approx lines 1-80) =="
nl -ba "$file" | sed -n '1,80p'

echo "== loadSession implementation (approx lines 150-260) =="
nl -ba "$file" | sed -n '150,260p'

echo "== Call sites of loadSession =="
rg -n "loadSession\(" -S scripts | head -n 50

Repository: open-horizon-labs/oh-omp

Length of output: 163


🏁 Script executed:

#!/bin/bash
set -euo pipefail

file="scripts/mine-context-illusion.ts"
echo "== Imports (approx lines 1-80) =="
nl -ba "$file" | sed -n '1,80p'

echo "== loadSession implementation (approx lines 150-260) =="
nl -ba "$file" | sed -n '150,260p'

echo "== Call sites of loadSession =="
rg -n "loadSession\(" -S scripts | head -n 50

Repository: open-horizon-labs/oh-omp

Length of output: 163


🏁 Script executed:

#!/bin/bash
set -euo pipefail

file="scripts/mine-context-illusion.ts"

echo "== Top of file (1-80) =="
sed -n '1,80p' "$file"

echo
echo "== loadSession area (160-260) =="
sed -n '160,260p' "$file"

echo
echo "== loadSession call sites =="
rg -n "loadSession\(" -S scripts | head -n 200

Repository: open-horizon-labs/oh-omp

Length of output: 6818


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect call sites where loadSession() is used
sed -n '70,140p' scripts/simulate-working-set.ts
echo
sed -n '300,420p' scripts/mine-context-illusion.ts

Repository: open-horizon-labs/oh-omp

Length of output: 6479


🏁 Script executed:

#!/bin/bash
set -euo pipefail

sed -n '1,180p' scripts/simulate-working-set.ts
echo
sed -n '180,360p' scripts/simulate-working-set.ts

Repository: open-horizon-labs/oh-omp

Length of output: 13673


🏁 Script executed:

#!/bin/bash
set -euo pipefail

file="scripts/mine-context-illusion.ts"

echo "== main() loadSession call area (300-380) =="
sed -n '300,420p' "$file"

echo
echo "== Any other loadSession usage =="
rg -n "loadSession\(" -S scripts

Repository: open-horizon-labs/oh-omp

Length of output: 4478


Harden loadSession() against unreadable/transient session files and switch to Bun.JSONL.parse()

loadSession() can throw when Bun.file(path).text() fails (permission denied, deleted mid-run, etc.), which aborts both CLI scripts; callers only skip when it returns null. Catch read/parse errors and return null. Also replace manual JSONL parsing (split("\n") + JSON.parse) with Bun.JSONL.parse() to match repo guidelines.

Suggested direction
 export async function loadSession(path: string): Promise<LoadedSession | null> {
-	const text = await Bun.file(path).text();
-	const messages: AgentMessage[] = [];
-	const compactionBoundaries: number[] = [];
-	for (const line of text.split("\n")) {
-		if (!line.trim()) continue;
-		let record: { type?: string; message?: AgentMessage };
-		try {
-			record = JSON.parse(line) as { type?: string; message?: AgentMessage };
-		} catch {
-			continue;
-		}
+	let records: Array<{ type?: string; message?: AgentMessage }>;
+	try {
+		records = Bun.JSONL.parse(await Bun.file(path).text()) as Array<{
+			type?: string;
+			message?: AgentMessage;
+		}>;
+	} catch {
+		return null;
+	}
+	const messages: AgentMessage[] = [];
+	const compactionBoundaries: number[] = [];
+	for (const record of records) {
 		if (record.type === "message" && record.message) {
 			messages.push(record.message);
 		} else if (record.type && /compact/i.test(record.type)) {
 			compactionBoundaries.push(messages.length);
 		}
 	}
 	if (messages.length === 0) return null;
 	return { messages, compactionBoundaries };
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
export async function loadSession(path: string): Promise<LoadedSession | null> {
const text = await Bun.file(path).text();
const messages: AgentMessage[] = [];
const compactionBoundaries: number[] = [];
for (const line of text.split("\n")) {
if (!line.trim()) continue;
let record: { type?: string; message?: AgentMessage };
try {
record = JSON.parse(line) as { type?: string; message?: AgentMessage };
} catch {
continue;
}
if (record.type === "message" && record.message) {
messages.push(record.message);
} else if (record.type && /compact/i.test(record.type)) {
compactionBoundaries.push(messages.length);
}
}
if (messages.length === 0) return null;
return { messages, compactionBoundaries };
export async function loadSession(path: string): Promise<LoadedSession | null> {
let records: Array<{ type?: string; message?: AgentMessage }>;
try {
records = Bun.JSONL.parse(await Bun.file(path).text()) as Array<{
type?: string;
message?: AgentMessage;
}>;
} catch {
return null;
}
const messages: AgentMessage[] = [];
const compactionBoundaries: number[] = [];
for (const record of records) {
if (record.type === "message" && record.message) {
messages.push(record.message);
} else if (record.type && /compact/i.test(record.type)) {
compactionBoundaries.push(messages.length);
}
}
if (messages.length === 0) return null;
return { messages, compactionBoundaries };
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/mine-context-illusion.ts` around lines 193 - 212, The loadSession
function currently reads the file and parses lines manually and can throw if
Bun.file(path).text() or JSON.parse fails; change loadSession to catch
read/parse errors and return null on failure, and use Bun.JSONL.parse() instead
of split/JSON.parse. Specifically, in loadSession wrap the Bun.file(path).text()
call and the parsing in a try/catch, call Bun.JSONL.parse(text) to get an array
of records, then iterate those records and push record.message into messages
when record.type === "message" and push messages.length into
compactionBoundaries when /compact/i.test(record.type); if any read/parse error
occurs return null, and keep the final messages.length === 0 check and return {
messages, compactionBoundaries } otherwise.

Source: Coding guidelines

Comment thread scripts/sessions.sh
Comment on lines +116 to +157
while [[ $# -gt 0 ]]; do
case "$1" in
--file)
FILE_PATH="$2"
shift 2
;;
--)
shift
break
;;
*)
if [[ -z "$key" ]]; then
key="$1"
elif [[ -z "$sid" ]]; then
sid="$1"
else
echo "add only accepts <session-key> and <session-id>" >&2
usage
exit 1
fi
shift
;;
esac
done

if [[ -z "$key" || -z "$sid" ]]; then
echo "add needs <session-key> and <session-id>" >&2
usage
exit 1
fi

add_session "$key" "$sid"
;;

find)
local query=""

while [[ $# -gt 0 ]]; do
case "$1" in
--file)
FILE_PATH="$2"
shift 2

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate --file before reading $2.

Both subcommands do FILE_PATH="$2" under set -u. sessions.sh add foo bar --file exits on an unbound positional parameter instead of returning a normal usage error.

Suggested fix
       while [[ $# -gt 0 ]]; do
         case "$1" in
           --file)
+            if [[ $# -lt 2 ]]; then
+              echo "--file requires a path" >&2
+              usage
+              exit 1
+            fi
             FILE_PATH="$2"
             shift 2
             ;;

Apply the same guard in the find branch as well.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
while [[ $# -gt 0 ]]; do
case "$1" in
--file)
FILE_PATH="$2"
shift 2
;;
--)
shift
break
;;
*)
if [[ -z "$key" ]]; then
key="$1"
elif [[ -z "$sid" ]]; then
sid="$1"
else
echo "add only accepts <session-key> and <session-id>" >&2
usage
exit 1
fi
shift
;;
esac
done
if [[ -z "$key" || -z "$sid" ]]; then
echo "add needs <session-key> and <session-id>" >&2
usage
exit 1
fi
add_session "$key" "$sid"
;;
find)
local query=""
while [[ $# -gt 0 ]]; do
case "$1" in
--file)
FILE_PATH="$2"
shift 2
while [[ $# -gt 0 ]]; do
case "$1" in
--file)
if [[ $# -lt 2 ]]; then
echo "--file requires a path" >&2
usage
exit 1
fi
FILE_PATH="$2"
shift 2
;;
--)
shift
break
;;
*)
if [[ -z "$key" ]]; then
key="$1"
elif [[ -z "$sid" ]]; then
sid="$1"
else
echo "add only accepts <session-key> and <session-id>" >&2
usage
exit 1
fi
shift
;;
esac
done
if [[ -z "$key" || -z "$sid" ]]; then
echo "add needs <session-key> and <session-id>" >&2
usage
exit 1
fi
add_session "$key" "$sid"
;;
find)
local query=""
while [[ $# -gt 0 ]]; do
case "$1" in
--file)
if [[ $# -lt 2 ]]; then
echo "--file requires a path" >&2
usage
exit 1
fi
FILE_PATH="$2"
shift 2
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/sessions.sh` around lines 116 - 157, In the --file case inside both
subcommand parsing loops (the '--file' case that assigns FILE_PATH="$2"), guard
against missing argument before reading $2 (with set -u this currently causes an
unbound parameter error); change the --file branch to check that $# -ge 2 (or
that "$2" is non-empty) and if not print a usage/error and exit, otherwise
assign FILE_PATH="$2" and shift 2; apply this same guard in the find branch as
well so both parsing loops validate the --file argument before using $2.

Comment on lines +79 to +91
interface SessionModel {
calls: ToolCallRef[];
/** message idx -> turn idx (full-session segmentation) */
turnOfMessage: Int32Array;
turnCount: number;
/** per message idx of a toolResult: estimated full tokens */
fullTokens: Map<number, number>;
/** per message idx of a toolResult: estimated codec'd tokens */
codecTokens: Map<number, number>;
resultIdxByCallId: Map<string, number>;
/** assistant message idx for each request, ascending */
requestTurns: number[];
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Reset simulated state at compaction boundaries.

loadSession() returns compactionBoundaries because turns before those markers were no longer visible to the model. buildSessionModel() drops that metadata, so the sweep can credit a later read as "avoided" based on pinned content that should have been cleared after compaction. Carry the boundaries into SessionModel and flush the retention state when a request crosses one.

Also applies to: 97-155

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/simulate-working-set.ts` around lines 79 - 91, The SessionModel
currently loses compaction metadata from loadSession(); extend the SessionModel
interface to include compactionBoundaries (e.g., number[] or Set<number>) and
update buildSessionModel() to copy the compactionBoundaries returned by
loadSession() into the model, then modify the sweep/retention logic that credits
"avoided" reads to flush or ignore pinned/retained data when a request's turn
(requestTurns) crosses a compaction boundary—i.e., detect when requestTurns[i] >
boundary and clear retention state for earlier turns so later reads aren't
incorrectly treated as avoided. Ensure all uses of SessionModel (construction in
buildSessionModel and any retention/sweep functions) are updated to handle the
new compactionBoundaries field.

Comment on lines +184 to +203
const touch = (path: string, resultIdx: number, turn: number): void => {
const tokens = fullTokens.get(resultIdx) ?? 0;
const existing = pinned.get(path);
if (existing) pinnedTokens -= existing.tokens;
pinned.set(path, { lastTouchTurn: turn, resultIdx, tokens });
pinnedTokens += tokens;
result.pinEntries++;
// LRU eviction over token cap.
while (pinnedTokens > config.tokenCap && pinned.size > 1) {
let lruPath: string | null = null;
let lruTurn = Number.POSITIVE_INFINITY;
for (const [p, e] of pinned) {
if (e.lastTouchTurn < lruTurn) {
lruTurn = e.lastTouchTurn;
lruPath = p;
}
}
if (lruPath === null || lruPath === path) break;
evict(lruPath);
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

The tokenCap invariant is not enforced for oversized entries.

If one pinned result is larger than tokenCap, or the just-touched path is selected as the LRU entry, this loop breaks and leaves pinnedTokens above the configured cap. That makes both the reported peak footprint and the "breaks avoided" counts too optimistic.

Suggested fix
 	const touch = (path: string, resultIdx: number, turn: number): void => {
 		const tokens = fullTokens.get(resultIdx) ?? 0;
+		if (tokens > config.tokenCap) {
+			evict(path);
+			return;
+		}
 		const existing = pinned.get(path);
 		if (existing) pinnedTokens -= existing.tokens;
 		pinned.set(path, { lastTouchTurn: turn, resultIdx, tokens });
 		pinnedTokens += tokens;
 		result.pinEntries++;
 		// LRU eviction over token cap.
-		while (pinnedTokens > config.tokenCap && pinned.size > 1) {
+		while (pinnedTokens > config.tokenCap && pinned.size > 0) {
 			let lruPath: string | null = null;
 			let lruTurn = Number.POSITIVE_INFINITY;
 			for (const [p, e] of pinned) {
 				if (e.lastTouchTurn < lruTurn) {
 					lruTurn = e.lastTouchTurn;
 					lruPath = p;
 				}
 			}
-			if (lruPath === null || lruPath === path) break;
+			if (lruPath === null) break;
 			evict(lruPath);
 		}
 	};
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/simulate-working-set.ts` around lines 184 - 203, The touch function
can leave pinnedTokens above config.tokenCap because the eviction loop stops
when pinned.size <= 1 or when the LRU equals the just-touched path; to fix,
change the loop to continue evicting until pinnedTokens <= config.tokenCap or
there are no entries left (use while (pinnedTokens > config.tokenCap &&
pinned.size > 0)), and remove the special-case break when lruPath === path so
the LRU entry (even the touched one) can be evicted if it is causing the cap
breach; keep using evict(lruPath) and adjust any dependent accounting as needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant