docs: make workstreams deliver PR-ready changes#75
Conversation
Read-only miner that replays the production assembler transform (dedup/read/warm codecs, hot window) over recorded session JSONL and measures context-illusion integrity: re-fetch of compressed content, recall vs full-price recovery split, recall efficacy, and wasted-token cost. Mutation-justified re-reads are excluded; output is aggregate-only with hashed samples (no payloads). Part of harness-ergonomics Track A.
Sweeps eviction age, token cap, and entry rule for the proposed assembler working-set policy over the recorded corpus, with a cache-aware cost model (write-priced savings vs read-priced retention rent, dedup back-ref residue). Miner helpers exported for reuse; miner main guarded with import.meta.main.
Files the model actively re-reads (2+ unchanged re-reads) keep their canonical first-read turn verbatim beyond the hot window, so dedup back-references resolve to full content instead of a codec skeleton. Pins are version-keyed (path + content hash) so interleaved range reads don't reset candidacy; one version per path; evicted after N untouched turns or beyond a token cap; budget pressure still applies. Pure transformMessages policy: deterministic, cache-stable (pinned turns never change representation while pinned), no protocol changes. Off unless explicitly enabled; runtime wires it from assembler settings (default on, evict=8 turns, cap=16K). Decisions report reason 'working-set' for observability. Verified: 68 transform tests pass; full-corpus replay via mine-context-illusion --working-set reclassifies 55 historical read-only illusion breaks as served-by-pin.
… change A pinned version is disqualified the moment the same read view (path + offset/limit) returns different content after its last touch (re-read after an edit). Without this, provably outdated file content stayed pinned verbatim for up to evictAfterTurns after a mutation - worst exactly in edit-heavy sessions. Different ranges of a file remain independent (pagination is not proof of change); unproven staleness still ages out. Found by dogfooding-hazard analysis before it bit: the edit-then-reread cycle is the dominant pattern in sessions working on this very code.
Stubs now tell the model how to recover at the point of need instead of
relying on system-prompt instructions 100K tokens away:
- read stubs: 'recall("<path>") expands' when the file is unchanged in
session, or 'edited since this read - re-read for current state' when
a later in-session mutation makes the snapshot provably stale
- warm stubs: path-scoped recall recipe when the call targeted a file,
generic 'recall expands' otherwise
Staleness comes from a pure transcript scan (mutating tool calls per
path) threaded into CodecContext as mutatedPaths - no fs I/O in the
transform. Recipes are path-query based, not turn-based: recall turn
numbering is process-scoped and diverges from transcript numbering
(documented seam, follow-up tracked).
The per-turn assembly dashboard line now shows 'N pinned (working set)' when retention pins are active, making the policy observable while dogfooding without opening the prompt inspector.
When validation fails on a missing required property, append a compact required-only skeleton derived from the schema (two levels deep). The dominant observed failure mode is models confidently filling sibling fields while omitting a container property (e.g. task's tasks array, 3/3 sampled errors in session-log mining); naming the missing property alone leaves the model guessing at the shape. Applies to every tool. Type-mismatch failures are unchanged (no skeleton noise when the model already knows the shape).
Tiered recovery ladder on every store operation: 1. reopen the table and retry - the dominant failure is a stale handle after another agent process compacts version manifests (data intact, handle dead) 2. salvage readable rows -> timestamped JSON backup in agentDir -> drop -> recreate -> re-insert salvage -> retry Non-storage errors (bad filters, programmer errors) propagate unchanged; rebuild only triggers on classified Lance storage failures. Compaction cutoff widened from 2 minutes to 24 hours, closing the cross-process prune race that produced ~49% of todo/todos tool errors in session-log mining. Worst-case loss: rows unreadable at corruption time, with the backup attempt on disk either way. Update remains delete+re-insert but is wrapped as a single recovery op so a retry can never lose the row. Verified with real on-disk corruption: destroying the table directory mid-session rebuilds automatically and the store stays operational.
…all recovery
Layer 1 - store data bug: the ingest turn counter was process-lifetime
while the recall store is session-persistent. After a resume, new rows
reused turn numbers already occupied by pre-restart rows of the same
session, so recall(turn=N) merged unrelated turns. The counter now
seeds from RecallStore.maxTurn(sessionId) at wiring time (new method,
5s timeout, fail-open to 0 with a warning).
Layer 2 - lying affordance: system prompt, recall tool prompt, and
composer meta-prompt all claimed stubs carry recall-compatible turn
numbers. They never did. All three now teach the shipped mechanism:
follow the stub's inline recipe (recall("<path>"), or re-read when
flagged 'edited since this read'); for pathless stubs use
recall(query=...) first and take turn numbers from recall results.
One-step turn-precise expansion from stubs (toolCallId-based) is
deferred behind dogfood telemetry per the turn-identity decision.
… stability Front-placed injections (recalled-context, concept-graph, assembly summary) regenerated message 0 every request, invalidating the Anthropic prompt cache for the entire transcript each turn. Measured via proxy: cache write/read 2.3-8.0 for assembler traffic vs ~0.1 control, with reads capped at the system+tools prefix. Tail placement preserves the conversation prefix so writes are incremental.
…equests Two wrong-result bugs (silent bad data, worse than errors): - grep: the RNA structural shortcut replaced text grep while ignoring glob/type/case/multiline/context/offset entirely - a glob-filtered search could return TS symbols and never search the target files. RNA now only fires when no precision filters are present (explicit pre/post only, since context defaults come from settings), and its header states it is a symbol index with the escape hatch spelled out. - read: the RNA structural-view gate checked !offset only, so raw:true and explicit limit still returned the skeleton instead of file content. raw/limit/offset are explicit requests for actual content and now all bypass the structural view.
Two precision fixes to the guard engine: - Quote-blanking: rules now match against the command with quoted segments blanked. Quoted content is shell data, not syntax - echo of literal '>' markers is not a redirection, 'sed -i' inside a python -c string is not a sed invocation. Fixes the echo-as-redirection FP class for every rule generically. - simpleCommandsOnly: grep/find rules skip composite commands (unquoted pipes, chains, substitution). 'find ... | xargs wc' has no find-tool equivalent; blocking it forced guard-sandwich loops. Mutation rules (sed/perl/awk, echo redirection) still apply everywhere. 9 regression tests covering both FP classes and the preserved blocks.
recall(turn=N) calls returning empty/no content are failed recovery attempts - the evidence gate for the deferred toolCallId-based stub expansion slice. Report line now shows 'of which empty/failed: N'.
📝 WalkthroughWalkthroughThis PR introduces a comprehensive governance framework for the oh-omp workstream, improves Anthropic prompt caching by tracking cache-eligible messages early in conversion, implements working-set retention to preserve actively re-read file content across message transforms, enhances codec recovery markers for staleness detection, refines tool interception logic, and adds analysis utilities for measuring context integrity. ChangesWorkstream Governance and Context Assembly Improvements
Sequence DiagramssequenceDiagram
participant Client
participant buildParams
participant convertAnthropicMessagesWithMetadata
participant applyPromptCaching
participant Anthropic API
Client->>buildParams: message array
buildParams->>convertAnthropicMessagesWithMetadata: convert with metadata
convertAnthropicMessagesWithMetadata->>convertAnthropicMessagesWithMetadata: mark non-synthetic user messages
convertAnthropicMessagesWithMetadata-->>buildParams: messages + cacheCandidateIndexes
buildParams->>applyPromptCaching: call with cacheCandidateIndexes
applyPromptCaching->>applyPromptCaching: position cache breakpoints near candidates
applyPromptCaching-->>buildParams: params with cache_control markers
buildParams->>Anthropic API: send request
sequenceDiagram
participant transformMessages
participant mutation detection
participant working-set compute
participant pinning logic
participant codec context
transformMessages->>mutation detection: scan toolCalls for mutating tools
mutation detection-->>transformMessages: mutatedPaths map
transformMessages->>working-set compute: compute retention exemptions
working-set compute->>working-set compute: group re-reads by path+view+hash
working-set compute->>pinning logic: select canonical pinned turn per path
pinning logic->>pinning logic: respect eviction age and token cap
pinning logic-->>transformMessages: exemption turn indices
transformMessages->>codec context: thread mutatedPaths for staleness
codec context-->>codec context: emit recovery recipe hints
🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
🧬 RNA Analysis — oh-omp#75Scope: 37 files · ~3400 lines · Context assembly hardening, bash interceptor correctness, working-set retention, TaskStore recovery, workstream expert system artifacts 1. Context injection moved to tail — cache discipline fix (sdk.ts)
-const finalMessages = contextMessages.length > 0 ? [...contextMessages, ...boundedMessages] : boundedMessages;
+const finalMessages = contextMessages.length > 0 ? [...boundedMessages, ...contextMessages] : boundedMessages;Why: Anthropic prompt caching matches on a stable message prefix. Front-placement regenerated message 0 on every request — recalled-context blocks change per turn, so the entire transcript was rewritten at cache-write prices (measured write/read ratio 2.3–8.0 vs ~0.1 for non-assembler traffic). Tail placement adds one block of fresh input per request without breaking the conversation prefix. CHANGELOG documents the measurement. 2. Working-set retention (settings-schema.ts, sdk.ts, message-transform.ts)Three new assembler settings: 3. Recovery recipes in compression stub headers (read-codec.ts, warm-codec.ts)
4.
|
There was a problem hiding this comment.
Actionable comments posted: 14
🧹 Nitpick comments (1)
packages/ai/test/tool-argument-coercion.test.ts (1)
660-663: ⚡ Quick winAvoid byte-for-byte skeleton formatting assertions.
At Line 661, asserting the full formatted skeleton string couples this test to formatting/order rather than the semantic contract.
As per coding guidelines, "Do not assert exact strings, ordering, and formatting unless downstream code parses or materially depends on the exact bytes."
Proposed hardening
- expect(message).toContain('"tasks": [{ "id": string, "description": string, "assignment": string }]'); + expect(message).toContain('"tasks"'); + expect(message).toContain('"id": string'); + expect(message).toContain('"description": string'); + expect(message).toContain('"assignment": string');🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/ai/test/tool-argument-coercion.test.ts` around lines 660 - 663, The test currently asserts byte-for-byte skeleton formatting by checking exact formatted substrings against message; instead relax these assertions in the test by asserting the semantic pieces exist (e.g., replace expect(message).toContain('"tasks": [{ "id": string, "description": string, "assignment": string }]') with checks that the message includes the keys/markers like '"tasks"', '"id"', '"description"', '"assignment"' and keep the existing expect(message).toContain('"agent"') (or change it to expect(message).toMatch(/"agent"\s*:/) for whitespace-robustness); alternatively use a regex that verifies presence of the required fields without enforcing ordering/spacing. Target the assertions that reference the variable message and remove any exact skeleton formatting checks.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/ai/test/tool-argument-coercion.test.ts`:
- Around line 666-679: The test currently only asserts inside the catch block so
it can falsely pass if validateToolArguments(taskLikeTool, toolCall) does not
throw; update the test to explicitly assert the failure by either wrapping the
call with Jest's expect(...).toThrow()/toThrowError() or by adding a
fail()/throw after the call so execution cannot silently succeed. Specifically
change the body of the "omits the skeleton for non-missing-property failures"
test to assert that validateToolArguments throws for the given toolCall (or use
expect(() => validateToolArguments(taskLikeTool, toolCall)).toThrow() and keep
the existing catch-based message assertion if you prefer to inspect the error
text).
In `@packages/coding-agent/src/context/assembler/codecs/read-codec.ts`:
- Around line 117-123: The recoveryRecipe function embeds filePath directly into
recall("...") which can produce malformed recipe text for paths with quotes or
control characters; fix this by escaping filePath before interpolation (e.g.,
compute const escaped = JSON.stringify(filePath).slice(1, -1) or an equivalent
escaping routine) and use escaped in the return template instead of filePath so
recall("...") always contains a properly escaped string.
In `@packages/coding-agent/src/context/recall/store.ts`:
- Around line 107-124: The maxTurn function conflates "no rows" and "scan timed
out" by returning null for both; change the timeout branch to reject with a
distinct error so callers can distinguish a timeout from a legitimate null
result. Specifically, in maxTurn replace the timeout Promise (currently
Bun.sleep(5000).then(() => null)) with a Promise that throws/rejects (e.g.,
Bun.sleep(5000).then(() => { throw new Error('maxTurn: timeout'); })), leaving
the scan logic (the scan variable and its null return when no rows) unchanged so
null still means "no rows" while timeouts surface as an exception the caller can
handle.
In `@packages/coding-agent/src/tasks/store.ts`:
- Around line 27-35: isLanceStorageError currently treats any message containing
"manifest" as a storage error which is too broad; narrow the manifest check in
isLanceStorageError so it only matches missing-manifest signals from lancedb
0.26.x (e.g. paths like "_latest.manifest" combined with "not found"/"was not
found", and the explicit "failed to load manifest version" pattern) while
preserving existing behavior such as the test case "failed to load manifest
version 42"; update the manifest alternative in isLanceStorageError to use
targeted regexes like /_latest\.manifest.*(not found|was not found)/i and
/failed to load manifest version/i (remove the generic /manifest/i), leaving
TaskStore.#withRecovery and `#salvageRebuild` logic unchanged.
In `@packages/coding-agent/src/tools/bash-interceptor.ts`:
- Around line 62-63: Update isCompositeCommand to treat newline-separated
commands as composite by checking for newline characters; change the regex used
on blankedCommand to include \n (and optionally \r) so it matches
/\||&&|;|\$\(|`|\r?\n/ (or add a separate /\n/.test check) instead of the
current pattern, ensuring multiline chains like "cmd1\ncmd2" are detected as
composite.
In `@packages/coding-agent/src/tools/grep.ts`:
- Around line 199-207: The branch that chooses RNA symbol results uses
hasPrecisionFilters but only checks pre/post params, so configured defaults like
grep.contextBefore/grep.contextAfter are ignored; update the hasPrecisionFilters
computation to include the effective context variables (e.g.,
normalizedContextBefore and normalizedContextAfter or the computed context
values that combine settings and params) alongside
pre/post/normalizedOffset/etc., so that when effective context > 0 the text-grep
path is taken instead of returning RNA results (adjust the expression near
hasPrecisionFilters and the subsequent if that checks
this.session.settings.get("rna.enabled") && !hasPrecisionFilters &&
looksLikeIdentifier(normalizedPattern)).
In `@packages/coding-agent/test/context/recall/store.test.ts`:
- Around line 32-34: The afterEach teardown removes the temp directory but never
closes the open RecallStore handle; update the afterEach in the test to first
await closing the store (e.g., await store.close() or await recall.close()) with
a guard if the store variable may be undefined, then remove the directory (await
fs.rm(agentDir, { recursive: true, force: true })); ensure the change targets
the afterEach block and uses the existing RecallStore instance name used in the
test so the store is closed before deletion.
In `@packages/coding-agent/test/tasks-store-recovery.test.ts`:
- Around line 38-39: The tests open TaskStore via TaskStore.open but never close
it; wrap each use of the store in a try/finally and call the store's close
method (e.g., await store.close()) in the finally block so file handles/DB
resources are released; update the occurrences around the TaskStore.open call
that creates tasks (the block using store.create([{ content: "task before
corruption" }], "sess-1")) and the other similar opens noted in the comment so
every opened TaskStore is closed even on failure.
- Around line 61-70: The test "non-storage errors propagate without triggering
rebuild" doesn't actually trigger a non-storage failure; either update the test
to simulate a real non-storage error or rename it to reflect its current
behavior. To fix, either (A) stub/mock TaskStore.query (after TaskStore.open and
store.create) to throw a non-storage Error (e.g., throw new Error("query
failure")) and assert await expect(store.query(...)).rejects.toThrow(...) and
verify no rebuild path is invoked, or (B) if you don't want to inject a throw,
rename the test to describe that it verifies a harmless escaped filter resolves
(change the test name string and keep the existing resolved expectation). Ensure
you reference the test named "non-storage errors propagate without triggering
rebuild", the TaskStore.open call, the store.create call, and the store.query
invocation when making the change.
In `@scripts/mine-context-illusion.ts`:
- Around line 52-65: scripts/mine-context-illusion.ts (lines 52-65): In
parseArgs(), validate the parsed since value by checking Number.isNaN(since)
after Date.parse(...) and if invalid print a clear error and exit non‑zero;
return the validated since value and ensure any later code (report footer)
reuses this returned since instead of reparsing process.argv.
scripts/simulate-working-set.ts (lines 304-311): Mirror the same CLI validation
— check the result of Date.parse(...) for NaN, emit a clear error and exit
non‑zero so the sweep does not run and produce a zero-session report; if valid,
pass the validated since value onward. Ensure both places fail fast on invalid
dates rather than continuing with NaN.
- Around line 193-212: The loadSession function currently reads the file and
parses lines manually and can throw if Bun.file(path).text() or JSON.parse
fails; change loadSession to catch read/parse errors and return null on failure,
and use Bun.JSONL.parse() instead of split/JSON.parse. Specifically, in
loadSession wrap the Bun.file(path).text() call and the parsing in a try/catch,
call Bun.JSONL.parse(text) to get an array of records, then iterate those
records and push record.message into messages when record.type === "message" and
push messages.length into compactionBoundaries when
/compact/i.test(record.type); if any read/parse error occurs return null, and
keep the final messages.length === 0 check and return { messages,
compactionBoundaries } otherwise.
In `@scripts/sessions.sh`:
- Around line 116-157: In the --file case inside both subcommand parsing loops
(the '--file' case that assigns FILE_PATH="$2"), guard against missing argument
before reading $2 (with set -u this currently causes an unbound parameter
error); change the --file branch to check that $# -ge 2 (or that "$2" is
non-empty) and if not print a usage/error and exit, otherwise assign
FILE_PATH="$2" and shift 2; apply this same guard in the find branch as well so
both parsing loops validate the --file argument before using $2.
In `@scripts/simulate-working-set.ts`:
- Around line 79-91: The SessionModel currently loses compaction metadata from
loadSession(); extend the SessionModel interface to include compactionBoundaries
(e.g., number[] or Set<number>) and update buildSessionModel() to copy the
compactionBoundaries returned by loadSession() into the model, then modify the
sweep/retention logic that credits "avoided" reads to flush or ignore
pinned/retained data when a request's turn (requestTurns) crosses a compaction
boundary—i.e., detect when requestTurns[i] > boundary and clear retention state
for earlier turns so later reads aren't incorrectly treated as avoided. Ensure
all uses of SessionModel (construction in buildSessionModel and any
retention/sweep functions) are updated to handle the new compactionBoundaries
field.
- Around line 184-203: The touch function can leave pinnedTokens above
config.tokenCap because the eviction loop stops when pinned.size <= 1 or when
the LRU equals the just-touched path; to fix, change the loop to continue
evicting until pinnedTokens <= config.tokenCap or there are no entries left (use
while (pinnedTokens > config.tokenCap && pinned.size > 0)), and remove the
special-case break when lruPath === path so the LRU entry (even the touched one)
can be evicted if it is causing the cap breach; keep using evict(lruPath) and
adjust any dependent accounting as needed.
---
Nitpick comments:
In `@packages/ai/test/tool-argument-coercion.test.ts`:
- Around line 660-663: The test currently asserts byte-for-byte skeleton
formatting by checking exact formatted substrings against message; instead relax
these assertions in the test by asserting the semantic pieces exist (e.g.,
replace expect(message).toContain('"tasks": [{ "id": string, "description":
string, "assignment": string }]') with checks that the message includes the
keys/markers like '"tasks"', '"id"', '"description"', '"assignment"' and keep
the existing expect(message).toContain('"agent"') (or change it to
expect(message).toMatch(/"agent"\s*:/) for whitespace-robustness); alternatively
use a regex that verifies presence of the required fields without enforcing
ordering/spacing. Target the assertions that reference the variable message and
remove any exact skeleton formatting checks.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 38f8e0f5-a42b-4ee3-999f-996470fd373c
📒 Files selected for processing (35)
.gitignore.oh/workstreams/oh-omp/EXPERT-SYSTEM.md.oh/workstreams/oh-omp/FRAME-model-routing-superseded.md.oh/workstreams/oh-omp/FRAME.mdpackages/ai/CHANGELOG.mdpackages/ai/src/providers/anthropic.tspackages/ai/src/utils/validation.tspackages/ai/test/anthropic-alignment.test.tspackages/ai/test/tool-argument-coercion.test.tspackages/coding-agent/CHANGELOG.mdpackages/coding-agent/src/config/settings-schema.tspackages/coding-agent/src/context/assembler/codecs/read-codec.tspackages/coding-agent/src/context/assembler/codecs/warm-codec.tspackages/coding-agent/src/context/assembler/message-transform.tspackages/coding-agent/src/context/assembler/types.tspackages/coding-agent/src/context/assembly-summary.tspackages/coding-agent/src/context/recall/store.tspackages/coding-agent/src/prompts/agents/workstream-expert.mdpackages/coding-agent/src/prompts/commands/workstream-expert.mdpackages/coding-agent/src/prompts/composer/meta-prompt.mdpackages/coding-agent/src/prompts/system/system-prompt.mdpackages/coding-agent/src/prompts/tools/recall.mdpackages/coding-agent/src/sdk.tspackages/coding-agent/src/tasks/store.tspackages/coding-agent/src/tools/bash-interceptor.tspackages/coding-agent/src/tools/grep.tspackages/coding-agent/src/tools/read.tspackages/coding-agent/test/assembly-summary.test.tspackages/coding-agent/test/bash-interceptor.test.tspackages/coding-agent/test/context/recall/store.test.tspackages/coding-agent/test/message-transform.test.tspackages/coding-agent/test/tasks-store-recovery.test.tsscripts/mine-context-illusion.tsscripts/sessions.shscripts/simulate-working-set.ts
| it("omits the skeleton for non-missing-property failures", () => { | ||
| const toolCall: ToolCall = { | ||
| type: "toolCall", | ||
| id: "call-skel-2", | ||
| name: "task", | ||
| arguments: { agent: 42, tasks: [{ id: "a", description: "b", assignment: "c" }] }, | ||
| }; | ||
| try { | ||
| validateToolArguments(taskLikeTool, toolCall); | ||
| } catch (err) { | ||
| const message = String(err instanceof Error ? err.message : err); | ||
| expect(message).not.toContain("Required shape:"); | ||
| } | ||
| }); |
There was a problem hiding this comment.
Assert the failure path explicitly in this negative test.
At Line 666, this test can pass when validateToolArguments(...) does not throw, because there is no assertion outside the catch block.
As per coding guidelines, "Every new test must defend one concrete, externally observable contract... Never add placeholder tests, tautologies, or assertions that only prove code executed."
Proposed fix
it("omits the skeleton for non-missing-property failures", () => {
const toolCall: ToolCall = {
type: "toolCall",
id: "call-skel-2",
name: "task",
arguments: { agent: 42, tasks: [{ id: "a", description: "b", assignment: "c" }] },
};
- try {
- validateToolArguments(taskLikeTool, toolCall);
- } catch (err) {
- const message = String(err instanceof Error ? err.message : err);
- expect(message).not.toContain("Required shape:");
- }
+ let message: string | null = null;
+ try {
+ validateToolArguments(taskLikeTool, toolCall);
+ } catch (err) {
+ message = String(err instanceof Error ? err.message : err);
+ }
+ expect(message).not.toBeNull();
+ expect(message).not.toContain("Required shape:");
});📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| it("omits the skeleton for non-missing-property failures", () => { | |
| const toolCall: ToolCall = { | |
| type: "toolCall", | |
| id: "call-skel-2", | |
| name: "task", | |
| arguments: { agent: 42, tasks: [{ id: "a", description: "b", assignment: "c" }] }, | |
| }; | |
| try { | |
| validateToolArguments(taskLikeTool, toolCall); | |
| } catch (err) { | |
| const message = String(err instanceof Error ? err.message : err); | |
| expect(message).not.toContain("Required shape:"); | |
| } | |
| }); | |
| it("omits the skeleton for non-missing-property failures", () => { | |
| const toolCall: ToolCall = { | |
| type: "toolCall", | |
| id: "call-skel-2", | |
| name: "task", | |
| arguments: { agent: 42, tasks: [{ id: "a", description: "b", assignment: "c" }] }, | |
| }; | |
| let message: string | null = null; | |
| try { | |
| validateToolArguments(taskLikeTool, toolCall); | |
| } catch (err) { | |
| message = String(err instanceof Error ? err.message : err); | |
| } | |
| expect(message).not.toBeNull(); | |
| expect(message).not.toContain("Required shape:"); | |
| }); |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/ai/test/tool-argument-coercion.test.ts` around lines 666 - 679, The
test currently only asserts inside the catch block so it can falsely pass if
validateToolArguments(taskLikeTool, toolCall) does not throw; update the test to
explicitly assert the failure by either wrapping the call with Jest's
expect(...).toThrow()/toThrowError() or by adding a fail()/throw after the call
so execution cannot silently succeed. Specifically change the body of the "omits
the skeleton for non-missing-property failures" test to assert that
validateToolArguments throws for the given toolCall (or use expect(() =>
validateToolArguments(taskLikeTool, toolCall)).toThrow() and keep the existing
catch-based message assertion if you prefer to inspect the error text).
Source: Coding guidelines
| function recoveryRecipe(ctx: CodecContext, filePath: string): string { | ||
| const mutatedAt = ctx.mutatedPaths?.get(filePath); | ||
| if (mutatedAt !== undefined && mutatedAt > ctx.turnIndex) { | ||
| return " | edited since this read — re-read for current state"; | ||
| } | ||
| return ` | unchanged in session — recall("${filePath}") expands`; | ||
| } |
There was a problem hiding this comment.
Escape filePath before embedding it in recall("...") recipe text.
The current string interpolation can produce malformed recipe text for valid paths containing quotes/control characters (Line 122), which weakens the recovery contract.
Proposed fix
diff --git a/packages/coding-agent/src/context/assembler/codecs/read-codec.ts b/packages/coding-agent/src/context/assembler/codecs/read-codec.ts
@@
function recoveryRecipe(ctx: CodecContext, filePath: string): string {
const mutatedAt = ctx.mutatedPaths?.get(filePath);
if (mutatedAt !== undefined && mutatedAt > ctx.turnIndex) {
return " | edited since this read — re-read for current state";
}
- return ` | unchanged in session — recall("${filePath}") expands`;
+ const quotedPath = JSON.stringify(filePath);
+ return ` | unchanged in session — recall(${quotedPath}) expands`;
}
diff --git a/packages/coding-agent/src/context/assembler/codecs/warm-codec.ts b/packages/coding-agent/src/context/assembler/codecs/warm-codec.ts
@@
- } else if (path) {
- headerParts.push(`recall("${path}") expands`);
+ } else if (path) {
+ headerParts.push(`recall(${JSON.stringify(path)}) expands`);
} else {📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| function recoveryRecipe(ctx: CodecContext, filePath: string): string { | |
| const mutatedAt = ctx.mutatedPaths?.get(filePath); | |
| if (mutatedAt !== undefined && mutatedAt > ctx.turnIndex) { | |
| return " | edited since this read — re-read for current state"; | |
| } | |
| return ` | unchanged in session — recall("${filePath}") expands`; | |
| } | |
| function recoveryRecipe(ctx: CodecContext, filePath: string): string { | |
| const mutatedAt = ctx.mutatedPaths?.get(filePath); | |
| if (mutatedAt !== undefined && mutatedAt > ctx.turnIndex) { | |
| return " | edited since this read — re-read for current state"; | |
| } | |
| const quotedPath = JSON.stringify(filePath); | |
| return ` | unchanged in session — recall(${quotedPath}) expands`; | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/coding-agent/src/context/assembler/codecs/read-codec.ts` around
lines 117 - 123, The recoveryRecipe function embeds filePath directly into
recall("...") which can produce malformed recipe text for paths with quotes or
control characters; fix this by escaping filePath before interpolation (e.g.,
compute const escaped = JSON.stringify(filePath).slice(1, -1) or an equivalent
escaping routine) and use escaped in the return template instead of filePath so
recall("...") always contains a properly escaped string.
| async maxTurn(sessionId: string): Promise<number | null> { | ||
| const filter = `session_id = '${sessionId.replace(/'/g, "''")}'`; | ||
| const scan = this.#table | ||
| .query() | ||
| .where(filter) | ||
| .select(["turn"]) | ||
| .toArray() | ||
| .then(rows => { | ||
| let max: number | null = null; | ||
| for (const row of rows as { turn: number }[]) { | ||
| const turn = Number(row.turn); | ||
| if (Number.isFinite(turn) && (max === null || turn > max)) max = turn; | ||
| } | ||
| return max; | ||
| }); | ||
| const timeout = Bun.sleep(5000).then(() => null); | ||
| return Promise.race([scan, timeout]); | ||
| } |
There was a problem hiding this comment.
Differentiate timeout from “no rows” in maxTurn.
maxTurn currently returns null for both “session has no rows” and “scan timed out”. That conflation is unsafe for resume seeding: downstream logic treats null as “start from 0”, which can reuse turn numbers after restart.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/coding-agent/src/context/recall/store.ts` around lines 107 - 124,
The maxTurn function conflates "no rows" and "scan timed out" by returning null
for both; change the timeout branch to reject with a distinct error so callers
can distinguish a timeout from a legitimate null result. Specifically, in
maxTurn replace the timeout Promise (currently Bun.sleep(5000).then(() => null))
with a Promise that throws/rejects (e.g., Bun.sleep(5000).then(() => { throw new
Error('maxTurn: timeout'); })), leaving the scan logic (the scan variable and
its null return when no rows) unchanged so null still means "no rows" while
timeouts surface as an exception the caller can handle.
| export function isLanceStorageError(err: unknown): boolean { | ||
| const message = err instanceof Error ? err.message : String(err); | ||
| return ( | ||
| /object at location .* not found/i.test(message) || | ||
| /no such file or directory/i.test(message) || | ||
| /manifest/i.test(message) || | ||
| /dataset .*(not found|corrupt)/i.test(message) || | ||
| /invalid .*version/i.test(message) | ||
| ); |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
For @lancedb/lancedb 0.26.x, what are the canonical error-message patterns for (1) missing object file, (2) missing manifest version, and (3) corrupt dataset metadata?
💡 Result:
Canonical error-message patterns in @lancedb/lancedb 0.26.x (as surfaced via the underlying LanceIO/LanceError strings) for the three cases: 1) Missing object file Pattern (JS/Python often prefixes with “lance error: …” and wraps a LanceError(IO) “Not found”): “lance error: LanceError(IO): External error: Not found: <full/path-or-uri-to-missing-file>” [1] 2) Missing manifest version Pattern (missing _latest.manifest when no manifest/version pointer exists, typically from opening a table/dataset whose manifest pointer file is absent): “ValueError: Dataset at path .lance/_latest.manifest was not found: Not found: .lance/_latest.manifest” [2] 3) Corrupt dataset metadata Pattern (table exists on disk but can’t be opened because the internal dataset files/manifests/metadata are missing or corrupt; surfaces as a “Table … was not found”/not-loadable message): “ValueError: Table '' was not found” (when the.lance directory is present but contains no valid manifest/metadata) [3] If you want, paste the exact stack trace/error text you’re seeing in 0.26.x and I’ll map it to the closest canonical pattern above.
Citations:
- 1: bug(python): RuntimeError: lance error: LanceError(IO): External error: Not found (maybe NFS related) lancedb/lancedb#2963
- 2: open_table + create_table error implementation lancedb/lancedb#216
- 3: bug: table_names() lists table with corrupt/incomplete .lance directory that open_table() cannot open lancedb/lancedb#3127
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Locate and print the relevant function implementation
FILE="packages/coding-agent/src/tasks/store.ts"
echo "---- $FILE (lines around 27-60) ----"
nl -ba "$FILE" | sed -n '1,120p'
# Find all usages of isLanceStorageError
echo "---- usages of isLanceStorageError ----"
rg -n "isLanceStorageError" packages/coding-agent/src/tasks -S || true
rg -n "isLanceStorageError" packages/coding-agent -S || trueRepository: open-horizon-labs/oh-omp
Length of output: 201
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Print only the specific line range the review references (27-60) more narrowly
FILE="packages/coding-agent/src/tasks/store.ts"
echo "---- $FILE (lines 20-60) ----"
nl -ba "$FILE" | sed -n '20,70p'
# Also print some broader context for how classifier impacts salvage/drop/rebuild
echo "---- $FILE (lines 1-160) ----"
nl -ba "$FILE" | sed -n '1,180p'Repository: open-horizon-labs/oh-omp
Length of output: 194
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="packages/coding-agent/src/tasks/store.ts"
echo "---- $FILE (lines 20-70) ----"
nl -ba "$FILE" | sed -n '20,70p'
echo "---- usages of isLanceStorageError ----"
rg -n "isLanceStorageError" packages/coding-agent -SRepository: open-horizon-labs/oh-omp
Length of output: 194
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="packages/coding-agent/src/tasks/store.ts"
echo "---- $FILE (lines 20-120) ----"
python3 - <<PY
from pathlib import Path
p = Path("$FILE")
lines = p.read_text().splitlines()
start, end = 20, 120 # 1-based in print
for i in range(start, min(end, len(lines)) + 1):
print(f"{i:4d}\t{lines[i-1]}")
PY
echo "---- $FILE (exact isLanceStorageError body) ----"
python3 - <<PY
import re
from pathlib import Path
p = Path("$FILE")
txt = p.read_text()
m = re.search(r"export function isLanceStorageError\\(err: unknown\\): boolean\\s*\\{([\\s\\S]*?)\\n\\}", txt)
if not m:
print("NOT FOUND")
else:
body = m.group(0)
print(body)
PY
echo "---- usages of isLanceStorageError ----"
rg -n "isLanceStorageError" packages/coding-agent -SRepository: open-horizon-labs/oh-omp
Length of output: 5433
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="packages/coding-agent/src/tasks/store.ts"
echo "---- $FILE (lines 20-120) ----"
python3 - <<PY
from pathlib import Path
p = Path("$FILE")
lines = p.read_text().splitlines()
start, end = 20, 120 # 1-based
for i in range(start, min(end, len(lines)) + 1):
print(f"{i:4d}\t{lines[i-1]}")
PY
echo "---- $FILE (exact isLanceStorageError body) ----"
python3 - <<PY
import re
from pathlib import Path
p = Path("$FILE")
txt = p.read_text()
m = re.search(r"export function isLanceStorageError\\(err: unknown\\): boolean\\s*\\{([\\s\\S]*?)\\n\\}", txt)
print(m.group(0) if m else "NOT FOUND")
PY
echo "---- usages of isLanceStorageError ----"
rg -n "isLanceStorageError" packages/coding-agent -SRepository: open-horizon-labs/oh-omp
Length of output: 5433
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="packages/coding-agent/test/tasks-store-recovery.test.ts"
echo "---- $FILE ----"
python3 - <<PY
from pathlib import Path
p=Path("$FILE")
lines=p.read_text().splitlines()
for i,l in enumerate(lines, start=1):
print(f"{i:4d}\t{l}")
PYRepository: open-horizon-labs/oh-omp
Length of output: 3581
Narrow isLanceStorageError’s /manifest/i to the missing-manifest patterns only.
TaskStore.#withRecovery() uses this classifier to decide when to run the destructive #salvageRebuild() tier, and matching any /manifest/i substring is overly broad. Keep the existing intended behavior (e.g. the test case "failed to load manifest version 42") but restrict the regex to lancedb 0.26.x missing-manifest signals like /_latest\.manifest/ / “was not found” / “failed to load manifest version”, rather than any message containing “manifest”.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/coding-agent/src/tasks/store.ts` around lines 27 - 35,
isLanceStorageError currently treats any message containing "manifest" as a
storage error which is too broad; narrow the manifest check in
isLanceStorageError so it only matches missing-manifest signals from lancedb
0.26.x (e.g. paths like "_latest.manifest" combined with "not found"/"was not
found", and the explicit "failed to load manifest version" pattern) while
preserving existing behavior such as the test case "failed to load manifest
version 42"; update the manifest alternative in isLanceStorageError to use
targeted regexes like /_latest\.manifest.*(not found|was not found)/i and
/failed to load manifest version/i (remove the generic /manifest/i), leaving
TaskStore.#withRecovery and `#salvageRebuild` logic unchanged.
| function isCompositeCommand(blankedCommand: string): boolean { | ||
| return /\||&&|;|\$\(|`/.test(blankedCommand); |
There was a problem hiding this comment.
Treat newline-separated shell chains as composite commands too.
isCompositeCommand() only recognizes one-line separators here. A multiline bash snippet like find . -name '*.ts'\nwc -l is still a composite command, but this helper returns false, so simpleCommandsOnly rules keep blocking the newline form while allowing the equivalent ;/| form.
Suggested fix
function isCompositeCommand(blankedCommand: string): boolean {
- return /\||&&|;|\$\(|`/.test(blankedCommand);
+ return /\||&&|;|\n|\$\(|`/.test(blankedCommand);
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| function isCompositeCommand(blankedCommand: string): boolean { | |
| return /\||&&|;|\$\(|`/.test(blankedCommand); | |
| function isCompositeCommand(blankedCommand: string): boolean { | |
| return /\||&&|;|\n|\$\(|`/.test(blankedCommand); | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/coding-agent/src/tools/bash-interceptor.ts` around lines 62 - 63,
Update isCompositeCommand to treat newline-separated commands as composite by
checking for newline characters; change the regex used on blankedCommand to
include \n (and optionally \r) so it matches /\||&&|;|\$\(|`|\r?\n/ (or add a
separate /\n/.test check) instead of the current pattern, ensuring multiline
chains like "cmd1\ncmd2" are detected as composite.
| function parseArgs(): CliArgs { | ||
| // Mutated via --working-set: replay history under the retention policy. | ||
|
|
||
| const args = process.argv.slice(2); | ||
| let since = Date.parse("2026-05-04"); | ||
|
|
||
| let jsonOut: string | undefined; | ||
| for (let i = 0; i < args.length; i++) { | ||
| if (args[i] === "--since" && args[i + 1]) since = Date.parse(args[++i]!); | ||
| if (args[i] === "--json" && args[i + 1]) jsonOut = args[++i]; | ||
| if (args[i] === "--working-set") WORKING_SET_REPLAY.enabled = true; | ||
| } | ||
| return { since, jsonOut }; | ||
| } |
There was a problem hiding this comment.
Validate --since once and fail fast on invalid dates. Both CLIs pass Date.parse(...) through unchecked. A bad date becomes NaN; the miner later crashes when formatting the report date, while the simulator silently reports an empty corpus.
scripts/mine-context-illusion.ts#L52-L65: rejectNaNinparseArgs()and reuse the validatedsincevalue in the report footer instead of reparsingprocess.argv.scripts/simulate-working-set.ts#L304-L311: rejectNaNduring CLI parsing so the sweep does not emit a false zero-session report.
📍 Affects 2 files
scripts/mine-context-illusion.ts#L52-L65(this comment)scripts/simulate-working-set.ts#L304-L311
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/mine-context-illusion.ts` around lines 52 - 65,
scripts/mine-context-illusion.ts (lines 52-65): In parseArgs(), validate the
parsed since value by checking Number.isNaN(since) after Date.parse(...) and if
invalid print a clear error and exit non‑zero; return the validated since value
and ensure any later code (report footer) reuses this returned since instead of
reparsing process.argv. scripts/simulate-working-set.ts (lines 304-311): Mirror
the same CLI validation — check the result of Date.parse(...) for NaN, emit a
clear error and exit non‑zero so the sweep does not run and produce a
zero-session report; if valid, pass the validated since value onward. Ensure
both places fail fast on invalid dates rather than continuing with NaN.
| export async function loadSession(path: string): Promise<LoadedSession | null> { | ||
| const text = await Bun.file(path).text(); | ||
| const messages: AgentMessage[] = []; | ||
| const compactionBoundaries: number[] = []; | ||
| for (const line of text.split("\n")) { | ||
| if (!line.trim()) continue; | ||
| let record: { type?: string; message?: AgentMessage }; | ||
| try { | ||
| record = JSON.parse(line) as { type?: string; message?: AgentMessage }; | ||
| } catch { | ||
| continue; | ||
| } | ||
| if (record.type === "message" && record.message) { | ||
| messages.push(record.message); | ||
| } else if (record.type && /compact/i.test(record.type)) { | ||
| compactionBoundaries.push(messages.length); | ||
| } | ||
| } | ||
| if (messages.length === 0) return null; | ||
| return { messages, compactionBoundaries }; |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Show relevant portions of the file
file="scripts/mine-context-illusion.ts"
echo "== Imports (approx lines 1-60) =="
nl -ba "$file" | sed -n '1,80p'
echo "== loadSession implementation (approx lines 150-260) =="
nl -ba "$file" | sed -n '150,260p'
echo "== Call sites of loadSession =="
rg -n "loadSession\(" -S scripts | head -n 50Repository: open-horizon-labs/oh-omp
Length of output: 163
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file="scripts/mine-context-illusion.ts"
echo "== Imports (approx lines 1-80) =="
nl -ba "$file" | sed -n '1,80p'
echo "== loadSession implementation (approx lines 150-260) =="
nl -ba "$file" | sed -n '150,260p'
echo "== Call sites of loadSession =="
rg -n "loadSession\(" -S scripts | head -n 50Repository: open-horizon-labs/oh-omp
Length of output: 163
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file="scripts/mine-context-illusion.ts"
echo "== Imports (approx lines 1-80) =="
nl -ba "$file" | sed -n '1,80p'
echo "== loadSession implementation (approx lines 150-260) =="
nl -ba "$file" | sed -n '150,260p'
echo "== Call sites of loadSession =="
rg -n "loadSession\(" -S scripts | head -n 50Repository: open-horizon-labs/oh-omp
Length of output: 163
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file="scripts/mine-context-illusion.ts"
echo "== Top of file (1-80) =="
sed -n '1,80p' "$file"
echo
echo "== loadSession area (160-260) =="
sed -n '160,260p' "$file"
echo
echo "== loadSession call sites =="
rg -n "loadSession\(" -S scripts | head -n 200Repository: open-horizon-labs/oh-omp
Length of output: 6818
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect call sites where loadSession() is used
sed -n '70,140p' scripts/simulate-working-set.ts
echo
sed -n '300,420p' scripts/mine-context-illusion.tsRepository: open-horizon-labs/oh-omp
Length of output: 6479
🏁 Script executed:
#!/bin/bash
set -euo pipefail
sed -n '1,180p' scripts/simulate-working-set.ts
echo
sed -n '180,360p' scripts/simulate-working-set.tsRepository: open-horizon-labs/oh-omp
Length of output: 13673
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file="scripts/mine-context-illusion.ts"
echo "== main() loadSession call area (300-380) =="
sed -n '300,420p' "$file"
echo
echo "== Any other loadSession usage =="
rg -n "loadSession\(" -S scriptsRepository: open-horizon-labs/oh-omp
Length of output: 4478
Harden loadSession() against unreadable/transient session files and switch to Bun.JSONL.parse()
loadSession() can throw when Bun.file(path).text() fails (permission denied, deleted mid-run, etc.), which aborts both CLI scripts; callers only skip when it returns null. Catch read/parse errors and return null. Also replace manual JSONL parsing (split("\n") + JSON.parse) with Bun.JSONL.parse() to match repo guidelines.
Suggested direction
export async function loadSession(path: string): Promise<LoadedSession | null> {
- const text = await Bun.file(path).text();
- const messages: AgentMessage[] = [];
- const compactionBoundaries: number[] = [];
- for (const line of text.split("\n")) {
- if (!line.trim()) continue;
- let record: { type?: string; message?: AgentMessage };
- try {
- record = JSON.parse(line) as { type?: string; message?: AgentMessage };
- } catch {
- continue;
- }
+ let records: Array<{ type?: string; message?: AgentMessage }>;
+ try {
+ records = Bun.JSONL.parse(await Bun.file(path).text()) as Array<{
+ type?: string;
+ message?: AgentMessage;
+ }>;
+ } catch {
+ return null;
+ }
+ const messages: AgentMessage[] = [];
+ const compactionBoundaries: number[] = [];
+ for (const record of records) {
if (record.type === "message" && record.message) {
messages.push(record.message);
} else if (record.type && /compact/i.test(record.type)) {
compactionBoundaries.push(messages.length);
}
}
if (messages.length === 0) return null;
return { messages, compactionBoundaries };
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| export async function loadSession(path: string): Promise<LoadedSession | null> { | |
| const text = await Bun.file(path).text(); | |
| const messages: AgentMessage[] = []; | |
| const compactionBoundaries: number[] = []; | |
| for (const line of text.split("\n")) { | |
| if (!line.trim()) continue; | |
| let record: { type?: string; message?: AgentMessage }; | |
| try { | |
| record = JSON.parse(line) as { type?: string; message?: AgentMessage }; | |
| } catch { | |
| continue; | |
| } | |
| if (record.type === "message" && record.message) { | |
| messages.push(record.message); | |
| } else if (record.type && /compact/i.test(record.type)) { | |
| compactionBoundaries.push(messages.length); | |
| } | |
| } | |
| if (messages.length === 0) return null; | |
| return { messages, compactionBoundaries }; | |
| export async function loadSession(path: string): Promise<LoadedSession | null> { | |
| let records: Array<{ type?: string; message?: AgentMessage }>; | |
| try { | |
| records = Bun.JSONL.parse(await Bun.file(path).text()) as Array<{ | |
| type?: string; | |
| message?: AgentMessage; | |
| }>; | |
| } catch { | |
| return null; | |
| } | |
| const messages: AgentMessage[] = []; | |
| const compactionBoundaries: number[] = []; | |
| for (const record of records) { | |
| if (record.type === "message" && record.message) { | |
| messages.push(record.message); | |
| } else if (record.type && /compact/i.test(record.type)) { | |
| compactionBoundaries.push(messages.length); | |
| } | |
| } | |
| if (messages.length === 0) return null; | |
| return { messages, compactionBoundaries }; | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/mine-context-illusion.ts` around lines 193 - 212, The loadSession
function currently reads the file and parses lines manually and can throw if
Bun.file(path).text() or JSON.parse fails; change loadSession to catch
read/parse errors and return null on failure, and use Bun.JSONL.parse() instead
of split/JSON.parse. Specifically, in loadSession wrap the Bun.file(path).text()
call and the parsing in a try/catch, call Bun.JSONL.parse(text) to get an array
of records, then iterate those records and push record.message into messages
when record.type === "message" and push messages.length into
compactionBoundaries when /compact/i.test(record.type); if any read/parse error
occurs return null, and keep the final messages.length === 0 check and return {
messages, compactionBoundaries } otherwise.
Source: Coding guidelines
| while [[ $# -gt 0 ]]; do | ||
| case "$1" in | ||
| --file) | ||
| FILE_PATH="$2" | ||
| shift 2 | ||
| ;; | ||
| --) | ||
| shift | ||
| break | ||
| ;; | ||
| *) | ||
| if [[ -z "$key" ]]; then | ||
| key="$1" | ||
| elif [[ -z "$sid" ]]; then | ||
| sid="$1" | ||
| else | ||
| echo "add only accepts <session-key> and <session-id>" >&2 | ||
| usage | ||
| exit 1 | ||
| fi | ||
| shift | ||
| ;; | ||
| esac | ||
| done | ||
|
|
||
| if [[ -z "$key" || -z "$sid" ]]; then | ||
| echo "add needs <session-key> and <session-id>" >&2 | ||
| usage | ||
| exit 1 | ||
| fi | ||
|
|
||
| add_session "$key" "$sid" | ||
| ;; | ||
|
|
||
| find) | ||
| local query="" | ||
|
|
||
| while [[ $# -gt 0 ]]; do | ||
| case "$1" in | ||
| --file) | ||
| FILE_PATH="$2" | ||
| shift 2 |
There was a problem hiding this comment.
Validate --file before reading $2.
Both subcommands do FILE_PATH="$2" under set -u. sessions.sh add foo bar --file exits on an unbound positional parameter instead of returning a normal usage error.
Suggested fix
while [[ $# -gt 0 ]]; do
case "$1" in
--file)
+ if [[ $# -lt 2 ]]; then
+ echo "--file requires a path" >&2
+ usage
+ exit 1
+ fi
FILE_PATH="$2"
shift 2
;;Apply the same guard in the find branch as well.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| while [[ $# -gt 0 ]]; do | |
| case "$1" in | |
| --file) | |
| FILE_PATH="$2" | |
| shift 2 | |
| ;; | |
| --) | |
| shift | |
| break | |
| ;; | |
| *) | |
| if [[ -z "$key" ]]; then | |
| key="$1" | |
| elif [[ -z "$sid" ]]; then | |
| sid="$1" | |
| else | |
| echo "add only accepts <session-key> and <session-id>" >&2 | |
| usage | |
| exit 1 | |
| fi | |
| shift | |
| ;; | |
| esac | |
| done | |
| if [[ -z "$key" || -z "$sid" ]]; then | |
| echo "add needs <session-key> and <session-id>" >&2 | |
| usage | |
| exit 1 | |
| fi | |
| add_session "$key" "$sid" | |
| ;; | |
| find) | |
| local query="" | |
| while [[ $# -gt 0 ]]; do | |
| case "$1" in | |
| --file) | |
| FILE_PATH="$2" | |
| shift 2 | |
| while [[ $# -gt 0 ]]; do | |
| case "$1" in | |
| --file) | |
| if [[ $# -lt 2 ]]; then | |
| echo "--file requires a path" >&2 | |
| usage | |
| exit 1 | |
| fi | |
| FILE_PATH="$2" | |
| shift 2 | |
| ;; | |
| --) | |
| shift | |
| break | |
| ;; | |
| *) | |
| if [[ -z "$key" ]]; then | |
| key="$1" | |
| elif [[ -z "$sid" ]]; then | |
| sid="$1" | |
| else | |
| echo "add only accepts <session-key> and <session-id>" >&2 | |
| usage | |
| exit 1 | |
| fi | |
| shift | |
| ;; | |
| esac | |
| done | |
| if [[ -z "$key" || -z "$sid" ]]; then | |
| echo "add needs <session-key> and <session-id>" >&2 | |
| usage | |
| exit 1 | |
| fi | |
| add_session "$key" "$sid" | |
| ;; | |
| find) | |
| local query="" | |
| while [[ $# -gt 0 ]]; do | |
| case "$1" in | |
| --file) | |
| if [[ $# -lt 2 ]]; then | |
| echo "--file requires a path" >&2 | |
| usage | |
| exit 1 | |
| fi | |
| FILE_PATH="$2" | |
| shift 2 |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/sessions.sh` around lines 116 - 157, In the --file case inside both
subcommand parsing loops (the '--file' case that assigns FILE_PATH="$2"), guard
against missing argument before reading $2 (with set -u this currently causes an
unbound parameter error); change the --file branch to check that $# -ge 2 (or
that "$2" is non-empty) and if not print a usage/error and exit, otherwise
assign FILE_PATH="$2" and shift 2; apply this same guard in the find branch as
well so both parsing loops validate the --file argument before using $2.
| interface SessionModel { | ||
| calls: ToolCallRef[]; | ||
| /** message idx -> turn idx (full-session segmentation) */ | ||
| turnOfMessage: Int32Array; | ||
| turnCount: number; | ||
| /** per message idx of a toolResult: estimated full tokens */ | ||
| fullTokens: Map<number, number>; | ||
| /** per message idx of a toolResult: estimated codec'd tokens */ | ||
| codecTokens: Map<number, number>; | ||
| resultIdxByCallId: Map<string, number>; | ||
| /** assistant message idx for each request, ascending */ | ||
| requestTurns: number[]; | ||
| } |
There was a problem hiding this comment.
Reset simulated state at compaction boundaries.
loadSession() returns compactionBoundaries because turns before those markers were no longer visible to the model. buildSessionModel() drops that metadata, so the sweep can credit a later read as "avoided" based on pinned content that should have been cleared after compaction. Carry the boundaries into SessionModel and flush the retention state when a request crosses one.
Also applies to: 97-155
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/simulate-working-set.ts` around lines 79 - 91, The SessionModel
currently loses compaction metadata from loadSession(); extend the SessionModel
interface to include compactionBoundaries (e.g., number[] or Set<number>) and
update buildSessionModel() to copy the compactionBoundaries returned by
loadSession() into the model, then modify the sweep/retention logic that credits
"avoided" reads to flush or ignore pinned/retained data when a request's turn
(requestTurns) crosses a compaction boundary—i.e., detect when requestTurns[i] >
boundary and clear retention state for earlier turns so later reads aren't
incorrectly treated as avoided. Ensure all uses of SessionModel (construction in
buildSessionModel and any retention/sweep functions) are updated to handle the
new compactionBoundaries field.
| const touch = (path: string, resultIdx: number, turn: number): void => { | ||
| const tokens = fullTokens.get(resultIdx) ?? 0; | ||
| const existing = pinned.get(path); | ||
| if (existing) pinnedTokens -= existing.tokens; | ||
| pinned.set(path, { lastTouchTurn: turn, resultIdx, tokens }); | ||
| pinnedTokens += tokens; | ||
| result.pinEntries++; | ||
| // LRU eviction over token cap. | ||
| while (pinnedTokens > config.tokenCap && pinned.size > 1) { | ||
| let lruPath: string | null = null; | ||
| let lruTurn = Number.POSITIVE_INFINITY; | ||
| for (const [p, e] of pinned) { | ||
| if (e.lastTouchTurn < lruTurn) { | ||
| lruTurn = e.lastTouchTurn; | ||
| lruPath = p; | ||
| } | ||
| } | ||
| if (lruPath === null || lruPath === path) break; | ||
| evict(lruPath); | ||
| } |
There was a problem hiding this comment.
The tokenCap invariant is not enforced for oversized entries.
If one pinned result is larger than tokenCap, or the just-touched path is selected as the LRU entry, this loop breaks and leaves pinnedTokens above the configured cap. That makes both the reported peak footprint and the "breaks avoided" counts too optimistic.
Suggested fix
const touch = (path: string, resultIdx: number, turn: number): void => {
const tokens = fullTokens.get(resultIdx) ?? 0;
+ if (tokens > config.tokenCap) {
+ evict(path);
+ return;
+ }
const existing = pinned.get(path);
if (existing) pinnedTokens -= existing.tokens;
pinned.set(path, { lastTouchTurn: turn, resultIdx, tokens });
pinnedTokens += tokens;
result.pinEntries++;
// LRU eviction over token cap.
- while (pinnedTokens > config.tokenCap && pinned.size > 1) {
+ while (pinnedTokens > config.tokenCap && pinned.size > 0) {
let lruPath: string | null = null;
let lruTurn = Number.POSITIVE_INFINITY;
for (const [p, e] of pinned) {
if (e.lastTouchTurn < lruTurn) {
lruTurn = e.lastTouchTurn;
lruPath = p;
}
}
- if (lruPath === null || lruPath === path) break;
+ if (lruPath === null) break;
evict(lruPath);
}
};🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/simulate-working-set.ts` around lines 184 - 203, The touch function
can leave pinnedTokens above config.tokenCap because the eviction loop stops
when pinned.size <= 1 or when the LRU equals the just-touched path; to fix,
change the loop to continue evicting until pinnedTokens <= config.tokenCap or
there are no entries left (use while (pinnedTokens > config.tokenCap &&
pinned.size > 0)), and remove the special-case break when lruPath === path so
the LRU entry (even the touched one) can be evicted if it is causing the cap
breach; keep using evict(lruPath) and adjust any dependent accounting as needed.
Summary
Verification
git diff --check -- .oh/workstreams/oh-omp/EXPERT-SYSTEM.md packages/coding-agent/src/prompts/agents/workstream-expert.md packages/coding-agent/src/prompts/commands/workstream-expert.mdDECISION: ALLOW,CONFIDENCE: highNotes
2bed71690.Summary by CodeRabbit
Bug Fixes
New Features
Improvements