feat(reborn): Context management — progressive tool disclosure (flag-gated, default off)#5149
feat(reborn): Context management — progressive tool disclosure (flag-gated, default off)#5149serrrfirat wants to merge 36 commits into
Conversation
…sure + NEAR AI prompt caching + compression) Council-signed fusion design (Opus 4.8 ACCEPT, GPT-5.5 xhigh ACCEPT_WITH_NONBLOCKING_NOTES, zero blockers) for cutting the per-call prompt that drives the NEAR AI latency/timeouts: ~91 tool schemas + system prompt + growing history = ~25.8k tokens, re-sent ~4x per turn. Three parts, all flag-gated and default-off behind a Phase-0 shadow: - Tool disclosure: bridges-always + append-only earned promotion (CapabilityCatalog + BM25 + tool_search/tool_describe/tool_call through ToolDispatcher). Target 25.8k -> <=12k tokens. - NEAR AI prompt caching: stable-prefix tiers (no cache_control); cached_tokens as empirical gate, raw prompt_tokens as primary KPI. - Context compression: PromptViewCompressor, DB-backed reference-by-handle via a session/run-scoped artifact store + result_read; deterministic phases before LLM middle-summary; compress the prompt view, never the DB. Doc: docs/plans/2026-06-23-hermes-style-context-management.md Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Zero-behavior-change instrumentation to quantify the per-call prompt and answer whether NEAR AI reports cached prompt tokens, before any behavior-changing disclosure/compression work. - nearai_chat: deserialize OpenAI-compatible cached prompt tokens (prompt_tokens_details.cached_tokens, with a top-level fallback) and route them into the existing cache_read_input_tokens; emit a per-call debug! usage line on target ironclaw::reborn::context_shadow (prompt/completion/cached tokens + cache_hit_ratio). - model_gateway: when the shadow target is debug-enabled, log tool_definition_count + est_tool_schema_tokens for the resolved tool surface. - context_shadow: pure deterministic ceil(chars/4) token estimator + tests. Request bytes are unchanged (tools=, messages, ordering untouched): the NEAR change only parses response usage; the gateway reads resolved tool definitions by reference for logging. Verified: cargo fmt, clippy clean, nearai usage tests (12) + estimator tests (2) pass. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…nchmark Pure-logic core of progressive tool disclosure (no live wiring yet — the request path is unchanged; select_active_set is only called from the module and tests). Wiring behind REBORN_TOOL_DISCLOSURE + bridge execution comes in Phase 1.2. - tool_disclosure.rs: CapabilityCatalog (per-tool est schema tokens, Core/ Discoverable tiering from CORE_TOOL_NAMES ∪ profile pins, sorted+frozen), the 3 bridge tool definitions (tool_search/tool_describe/tool_call), append-only PromotedSet, DisclosureCaps (12k tokens / 32 tools defer threshold, 10%-ctx aware), and select_active_set with the council-agreed canonical order [core → bridges → promoted], never re-sorted, no per-turn keyword scoring of the wire surface. Deterministic tool_search ranking. - Reuses context_shadow::estimate_tokens (no duplicate estimator). Token benchmark (representative 91-tool fixture; real number comes from the Phase-0 shadow log in production): full: 91 tools / 21,240 tok -> disclosed (core+bridges): 10 tools / 1,427 tok reduction: 19,813 tok (93.3%) Verified: cargo fmt, clippy clean, 9 tool_disclosure tests pass. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…rator (flag-gated, default off)
Makes progressive disclosure real behind REBORN_TOOL_DISCLOSURE (off|bridged,
default off). Implemented as a LoopCapabilityPort decorator rather than editing
model_gateway directly, so bridge tool_call re-enters the SAME dispatch path:
- ToolDisclosureMode::from_env() + runtime wiring: the decorator is only added
when bridged. Flag OFF => no decorator => behavior/wire surface byte-identical.
- tool_disclosure_port.rs (ToolDisclosureCapabilityDecorator over LoopCapabilityPort):
* visible_capabilities() filters the surface to active-set (core+bridges+promoted)
+ bridge defs.
* bridge tool_call translates to a synthetic target call routed through
self.inner.validate/register/invoke_capability — same validation, audit,
approval/auth gates, and error semantics as a direct call.
* tool_search/tool_describe handled locally (read-only) via tool_search_rank.
* allowed_tool_call_target guards: rejects recursion (bridge names), unknown
catalog names, and tools neither active nor disclosed-this-turn (same-turn
discovery exception) -> model-recoverable failure, never run death.
* append-only PromotedSet per thread: a successfully dispatched deferred tool
is advertised directly next turn.
- Non-bridge calls pass straight through to inner unchanged.
Verified: cargo fmt; check (default + root-llm-provider); clippy clean;
ironclaw_reborn lib tests 267 passed. Includes one through-the-caller async test
(search -> disclose -> tool_call dispatch -> promote).
Follow-up: add negative-path bridge tests (recursion/unknown/invisible ->
recoverable failure; flag-off byte-identical snapshot) before enabling in prod.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ontext-management plan
Adds §13: what shipped (Phase 0 measurement; Phase 1 tool disclosure end-to-end,
flag-gated default-off, 93.3% tool-token benchmark) and two corrected plan
premises discovered while grounding in the real code:
- Phase 3 (compression) is ALREADY implemented: ActiveTaskPreservingCompaction
Strategy + BudgetStrategy + summary-only LoopContextMessage refs. The from-
scratch PromptViewCompressor prototype was the wrong layer (LoopModelMessage
is only {role, content_ref}; content is already externalized) and was
discarded. Future work here is targeted improvement gated on measurement.
- Phase 2 (stable prefix) is largely pre-existing: deterministic instruction-
bundle assembly + stable fingerprint, reinforced by Phase 1.2's deterministic
memoized tool list; cached_tokens measurement landed in Phase 0. Further
tiering deferred until production shows NEAR caches (the design's empirical
gate).
Net: the genuine gap (tool disclosure) is built + benchmarked; Phases 2/3 are
covered by existing infrastructure.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ridge Locks the safety guarantees of the Phase 1.2 bridge before any prod enable: - tool_call_targeting_a_bridge_is_rejected_without_dispatch: recursion guard — tool_call(name = a bridge) stays on the bridge path, returns a recoverable InvalidInput failure, and the inner port records zero registered calls / invocations (no dispatch leaks through). - tool_call_targeting_unknown_tool_is_rejected_without_dispatch: unknown target → recoverable InvalidInput failure, inner port untouched. - tool_disclosure_mode_defaults_off_and_gates_decorator: ToolDisclosureMode default is Off and only Bridged reports is_bridged(), which is what gates whether the decorator is attached — i.e. flag-off => no decorator => byte- identical request path. (The "exists-but-not-disclosed-this-turn → recoverable failure" path was already covered by search_discloses_tool_call_dispatches_target_and_promotes_next_turn.) Verified: cargo fmt; clippy clean; ironclaw_reborn lib suite 270 passed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
📝 WalkthroughWalkthroughAdds cached-token parsing, disclosure-mode wiring, progressive tool surfacing, gateway resolution changes, repeated-error compaction, and a design plan. ChangesHermes-Style Context Management
Estimated code review effort🎯 5 (Critical) | ⏱️ ~90 minutes Possibly related issues
Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Code Review
This pull request implements Hermes-Style Context Management for Reborn, introducing progressive tool disclosure, NEAR AI prompt caching, and context shadow usage measurements. Feedback on the changes focuses on several performance, correctness, and security improvements: pre-computing search terms in CatalogEntry to avoid redundant allocations during search scoring; canonicalizing JSON arguments before digesting to ensure deterministic identifiers; optimizing a nested loop in visible_capabilities using a BTreeSet; caching token estimates for static bridge tools; and using structured keys instead of raw string concatenation with a null separator to prevent potential key collisions.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| fn score_tool_entry(entry: &CatalogEntry, query_terms: &[&str]) -> u32 { | ||
| let blob_terms: HashSet<&str> = entry | ||
| .search_blob | ||
| .split_whitespace() | ||
| .map(|term| term.trim_matches(|c: char| !c.is_alphanumeric() && c != '_')) | ||
| .filter(|term| !term.is_empty()) | ||
| .collect(); | ||
| let mut keyword_score = 0_u32; | ||
| for term in query_terms { | ||
| if entry.definition.name.eq_ignore_ascii_case(term) || blob_terms.contains(term) { | ||
| keyword_score = keyword_score.saturating_add(10); | ||
| } else if entry.search_blob.contains(term) { | ||
| keyword_score = keyword_score.saturating_add(5); | ||
| } | ||
| } | ||
| keyword_score.min(MAX_KEYWORD_SCORE) | ||
| } |
There was a problem hiding this comment.
Performance Optimization: Avoid Redundant HashSet Allocation and Parsing
In score_tool_entry, the blob_terms HashSet is parsed, allocated, and collected from entry.search_blob on every single call (which occurs for every tool in the catalog on every search query). With 91 tools, this results in significant garbage collection and CPU overhead during search operations.
Since CapabilityCatalog is built once and frozen, we can pre-compute and store the search_terms HashSet<String> directly inside CatalogEntry during catalog construction. This makes score_tool_entry completely allocation-free.
Suggested Refactoring:
-
Update
CatalogEntryto storesearch_terms:pub(crate) struct CatalogEntry { definition: ProviderToolDefinition, est_schema_tokens: u32, search_blob: String, search_terms: HashSet<String>, tier: ToolTier, }
-
Pre-compute
search_termsinCapabilityCatalog::new:let search_terms: HashSet<String> = search_blob .split_whitespace() .map(|term| term.trim_matches(|c: char| !c.is_alphanumeric() && c != '_').to_string()) .filter(|term| !term.is_empty()) .collect();
-
Simplify
score_tool_entryto:fn score_tool_entry(entry: &CatalogEntry, query_terms: &[&str]) -> u32 { let mut keyword_score = 0_u32; for term in query_terms { if entry.definition.name.eq_ignore_ascii_case(term) || entry.search_terms.contains(*term) { keyword_score = keyword_score.saturating_add(10); } else if entry.search_blob.contains(term) { keyword_score = keyword_score.saturating_add(5); } } keyword_score.min(MAX_KEYWORD_SCORE) }
| let digest_input = tool_call.arguments.to_string(); | ||
| let digest = ironclaw_host_api::sha256_digest_token(digest_input.as_bytes()); |
There was a problem hiding this comment.
Correctness & Determinism: Canonicalize JSON Arguments Before Digesting
The digest_input is derived directly from tool_call.arguments.to_string(). However, serde_json::Value::to_string() does not guarantee deterministic key ordering for JSON objects. If the keys are serialized in a different order, logically identical tool calls will produce different digests, leading to unstable input_ref identifiers and breaking the "stable-prefix discipline" required for prompt caching.
To ensure determinism, we should canonicalize the JSON arguments using the existing canonicalize_json helper before stringifying them.
let canonical_args = crate::tool_disclosure::canonicalize_json(&tool_call.arguments);
let digest_input = canonical_args.to_string();
let digest = ironclaw_host_api::sha256_digest_token(digest_input.as_bytes());| descriptors.retain(|descriptor| { | ||
| surface | ||
| .descriptors | ||
| .iter() | ||
| .any(|inner| inner.capability_id == descriptor.capability_id) | ||
| || is_bridge_capability_id(&descriptor.capability_id) | ||
| }); |
There was a problem hiding this comment.
Performance Optimization: Avoid $O(N \times M)$ Nested Loop in visible_capabilities
In visible_capabilities, descriptors.retain performs a linear scan over surface.descriptors for every single descriptor. This results in an
Since surface.descriptors can be collected into a BTreeSet (which is already imported) of capability IDs, we can optimize this lookup to
| descriptors.retain(|descriptor| { | |
| surface | |
| .descriptors | |
| .iter() | |
| .any(|inner| inner.capability_id == descriptor.capability_id) | |
| || is_bridge_capability_id(&descriptor.capability_id) | |
| }); | |
| let inner_ids: BTreeSet<&CapabilityId> = surface | |
| .descriptors | |
| .iter() | |
| .map(|d| &d.capability_id) | |
| .collect(); | |
| descriptors.retain(|descriptor| { | |
| inner_ids.contains(&descriptor.capability_id) | |
| || is_bridge_capability_id(&descriptor.capability_id) | |
| }); |
| } | ||
|
|
||
| for definition in bridge_tool_definitions() { | ||
| let est_schema_tokens = estimate_definition_tokens(&definition); |
There was a problem hiding this comment.
Performance Optimization: Avoid Redundant Token Estimation for Static Bridge Tools
In select_active_set, estimate_definition_tokens(&definition) is called for each bridge tool definition on every single invocation. Since bridge_tool_definitions() returns static, pre-defined tools that never change, canonicalizing their schemas and estimating their tokens on every turn is redundant.
We could pre-compute these token estimates once (e.g., as a lazy static or constants) or cache them to avoid unnecessary JSON serialization and string processing. However, please note that performance optimizations, such as adding a caching layer, should be gated on profiling to ensure they provide a meaningful benefit.
References
- Performance optimizations, such as adding a caching layer, should be gated on profiling, especially when existing mechanisms (e.g., ETags, short-circuiting for specific paths) already reduce overhead for common cases.
| let digest_input = format!("{}\0{}", tool_call.id, target.name); | ||
| let target_id = ironclaw_host_api::sha256_digest_token(digest_input.as_bytes()); |
There was a problem hiding this comment.
Security & Robustness: Avoid Raw String Concatenation for Deterministic Identifiers
The digest_input is derived using raw string concatenation with a null separator:
let digest_input = format!("{}\0{}", tool_call.id, target.name);To prevent key collisions, we should use structured keys instead of string concatenation with a separator.
Suggested Change:
Consider using a structured key format or tuple representation to avoid potential collisions.
References
- To prevent key collisions, use structured keys instead of string concatenation with a separator.
…mark The PinchBench harness runs remotely and cannot set env vars, so to exercise the disclosure path it must be on without configuration. Flip ToolDisclosureMode::from_env so an UNSET REBORN_TOOL_DISCLOSURE resolves to Bridged; explicit REBORN_TOOL_DISCLOSURE=off remains the escape hatch. Implemented via a pure, testable `from_raw(Option<&str>)` helper (no env races in tests). This is a knowingly-temporary non-default-off state — revert the catch-all arm back to Self::Off after benchmarking (doc §13 has the⚠️ note). Verified: cargo fmt; clippy clean; ironclaw_reborn lib suite 270 passed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Reason: the PinchBench harness runs remotely and can't set env vars, so disclosure must be on without configuration to be benchmarked. Explicit This must be reverted before merge/GA — change the catch-all arm in Do not merge to a shared environment while this is in place unless default-on is intended there. |
There was a problem hiding this comment.
Actionable comments posted: 10
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/ironclaw_llm/src/nearai_chat.rs`:
- Around line 587-594: The cache_read_input_tokens field in the
CompletionResponse struct is being set to an unclamped cached_tokens value,
which can exceed the input_tokens count and cause incorrect billing/analytics
attribution. Clamp the cache_read_input_tokens to not exceed input_tokens by
using a minimum operation between the cached_tokens value and input_tokens.
Apply this same clamping fix in both locations mentioned: the CompletionResponse
initialization around line 593 and the second location around line 706.
In `@crates/ironclaw_reborn/src/runtime.rs`:
- Around line 106-112: The from_env method in the runtime module silently
defaults to Self::Off for any unrecognized environment variable value, which
masks typos and invalid configurations. Change the from_env method to return a
Result type instead of Self, and return an error (using a thiserror type with
context) when an invalid value is encountered for REBORN_TOOL_DISCLOSURE_ENV
instead of defaulting to Off. This way, invalid configuration becomes observable
during rollout rather than silently disabling the feature.
In `@crates/ironclaw_reborn/src/tool_disclosure_port.rs`:
- Around line 280-312: The promoted_by_thread HashMap is currently keyed only by
thread_id, which can cause scope collisions if thread IDs are reused across
tenants, projects, or agents. Create a scoped newtype key that includes the full
tenant/agent/project/thread identity instead of just thread_id. In both the
promoted_for_thread() and promote_target() methods, replace the simple
self.run_context.thread_id.to_string() key with a composite key that captures
the complete scope identity. This ensures that promotions in one scope do not
inadvertently affect the tool surface of another scope.
- Around line 171-172: All `.map_err(|_| invalid_invocation(...))` patterns
throughout the file discard the source error instead of preserving it as the
cause. Replace each instance by capturing the error variable with `|e|` instead
of `|_|` and include the original error in the context when calling
invalid_invocation or route through a helper function that preserves the cause.
This applies to all occurrences: the one at lines 171-172 in the target store
lock operation, the ones at lines 213-214, 254-258, 282-286, 306-310, 328-332,
and the one at 364-365. Ensure each `.map_err` call binds the actual error and
passes it along so the original cause is not lost.
- Around line 118-125: The code is using `let Ok(Some(...))` guard patterns
around calls to `allowed_tool_call_target()` which silently suppresses errors
instead of propagating them. Replace the combined error and option handling with
the `?` operator to propagate errors loudly, then branch only on the `Option`
type. In the block starting with `if tool_call.name == TOOL_CALL_NAME`, change
the guard condition to call `self.allowed_tool_call_target(tool_call)?` and use
`if let Some(target)` to handle the Option separately. Apply this same fix
pattern to the other occurrences at lines 150-154 and 165-177 where similar `let
Ok(Some(...))` patterns are used with `synthetic_target_call()` and other
fallible methods.
- Around line 192-202: Instead of replacing surface.descriptors with newly
constructed descriptors from the catalog, keep the existing inner
surface.descriptors and filter them to retain only those whose capability_id
matches an active/disclosed capability ID or is identified as a bridge
capability by is_bridge_capability_id. The active and disclosed capability IDs
can be determined from the descriptors returned by
state.catalog.active_or_disclosed_descriptors, and only append bridge
descriptors that are not already present in the filtered inner descriptors list.
- Around line 384-395: The query parameter extraction in this code block uses
unwrap_or_default() which silently converts a missing or non-string query into
an empty string. Instead, replace this logic with error handling that validates
the query is present and is a string, returning a recoverable InvalidInput error
if the validation fails, similar to how tool_describe handles malformed inputs.
Focus on the chain of get("query"), and_then(Value::as_str), and
unwrap_or_default() calls to implement proper validation instead of defaulting
to an empty string.
- Around line 326-339: The digest_input is currently computed from only
tool_call.arguments, which causes collisions when different bridge calls have
identical JSON arguments, leading to entries being overwritten in the
bridge_inputs map. Modify the digest_input variable to include the bridge name
(tool_call.name) and a unique call identifier along with the arguments before
passing it to ironclaw_host_api::sha256_digest_token. This will ensure each
BridgeInvocation has a unique input_ref based on both its content and identity,
preventing collisions in the bridge_inputs store.
In `@crates/ironclaw_reborn/src/tool_disclosure.rs`:
- Around line 277-346: In the `select_active_set` function, the deferred mode
path (after the initial check fails) appends Core tools, bridge tools, and
promoted tools without enforcing the `max_tools` or `max_tokens` caps from
`DisclosureCaps`. Add validation checks before appending each definition in the
loops that iterate over catalog entries, bridge_tool_definitions(), and promoted
items to ensure the advertised_tokens does not exceed
caps.defer_threshold_tokens() and the total definitions count does not exceed
caps.max_tools. If caps would be exceeded, either stop appending additional
definitions or implement an eviction policy as noted in the TODO comment.
- Around line 83-84: The logic that marks definitions as Core by checking
CORE_TOOL_NAMES needs to be adjusted to prevent reserved bridge names
(tool_search, tool_describe, tool_call) from colliding with real inner
capabilities. Either exclude these reserved bridge names from the catalog
entries before they are processed, or ensure bridges are appended before catalog
entries so their names take precedence in the deduplication logic of
append_definition(), or add explicit collision detection to reject definitions
using reserved names. This issue appears in the tier assignment logic around the
CORE_TOOL_NAMES check and also in the code region at lines 319-328.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 0e3b723f-00ab-4c4b-ad4c-6b9fa642303d
📒 Files selected for processing (8)
crates/ironclaw_llm/src/nearai_chat.rscrates/ironclaw_reborn/src/context_shadow.rscrates/ironclaw_reborn/src/lib.rscrates/ironclaw_reborn/src/model_gateway.rscrates/ironclaw_reborn/src/runtime.rscrates/ironclaw_reborn/src/tool_disclosure.rscrates/ironclaw_reborn/src/tool_disclosure_port.rsdocs/plans/2026-06-23-hermes-style-context-management.md
|
/benchmark pinchbench --framework ironclaw-reborn |
|
🧪 Started |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/plans/2026-06-23-hermes-style-context-management.md (1)
125-125: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick winRollout section says default-off while status section says current branch is default-on.
Please add an explicit exception note in §8 (or link to §13) so operators don’t apply the wrong default during canary setup.
Also applies to: 169-171
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/plans/2026-06-23-hermes-style-context-management.md` at line 125, The Rollout section at line 125 states that all phases are default-off, but this contradicts the status section which indicates the current branch is default-on. Add an explicit exception note in section 8 that clarifies this discrepancy and specifies which default applies during canary setup, either by providing inline clarification or by linking to section 13 for additional context. Apply the same clarification to the related statements at lines 169-171 to ensure consistency across all mentions of default behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/ironclaw_reborn/src/runtime.rs`:
- Around line 116-121: The from_raw method currently defaults
unset/empty/unrecognized values to Self::Bridged (ON state), but this was
explicitly marked as temporary for benchmarking and must be reverted before
merge. Change the catch-all pattern in the from_raw method to return Self::Off
instead of Self::Bridged to restore the correct default-off behavior.
Additionally, update the related unit tests (referenced in the also applies to
section around lines 775-796) that currently lock in the temporary ON behavior
to instead verify the correct default-off behavior for unset and unrecognized
values.
- Line 107: The line using
`std::env::var(REBORN_TOOL_DISCLOSURE_ENV).ok().as_deref()` silently discards
error information from VarError, violating the fail-loud invariant for settings
reads. Replace the `.ok().as_deref()` chain with proper error handling that
preserves error context, either by using `.as_deref()` directly on the Result
without the intermediate `.ok()` call, or by using the `?` operator to propagate
the error with appropriate context. Ensure that NotUnicode and other VarError
variants are not silently ignored during the environment variable read
operation.
---
Outside diff comments:
In `@docs/plans/2026-06-23-hermes-style-context-management.md`:
- Line 125: The Rollout section at line 125 states that all phases are
default-off, but this contradicts the status section which indicates the current
branch is default-on. Add an explicit exception note in section 8 that clarifies
this discrepancy and specifies which default applies during canary setup, either
by providing inline clarification or by linking to section 13 for additional
context. Apply the same clarification to the related statements at lines 169-171
to ensure consistency across all mentions of default behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 56e12771-c11d-4c3e-af76-5d5ffb4f2421
📒 Files selected for processing (2)
crates/ironclaw_reborn/src/runtime.rsdocs/plans/2026-06-23-hermes-style-context-management.md
| fn from_raw(raw: Option<&str>) -> Self { | ||
| match raw { | ||
| Some(value) if value.eq_ignore_ascii_case("off") => Self::Off, | ||
| Some(value) if value.eq_ignore_ascii_case("bridged") => Self::Bridged, | ||
| // unset / empty / unrecognized -> temporarily ON for benchmarking. | ||
| _ => Self::Bridged, |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Revert temporary default-on before merge; test currently cements non-GA behavior.
Catch-all from_raw → Bridged makes disclosure ON for unset/garbage values, and the unit test locks that behavior in. Your own plan/PR objective marks this as temporary and requires default-off for merge safety.
Suggested patch
fn from_raw(raw: Option<&str>) -> Self {
match raw {
Some(value) if value.eq_ignore_ascii_case("off") => Self::Off,
Some(value) if value.eq_ignore_ascii_case("bridged") => Self::Bridged,
- // unset / empty / unrecognized -> temporarily ON for benchmarking.
- _ => Self::Bridged,
+ _ => Self::Off,
}
}-assert!(ToolDisclosureMode::from_raw(None).is_bridged());
-assert!(ToolDisclosureMode::from_raw(Some("")).is_bridged());
-assert!(ToolDisclosureMode::from_raw(Some("garbage")).is_bridged());
+assert!(!ToolDisclosureMode::from_raw(None).is_bridged());
+assert!(!ToolDisclosureMode::from_raw(Some("")).is_bridged());
+assert!(!ToolDisclosureMode::from_raw(Some("garbage")).is_bridged());Also applies to: 775-796
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/ironclaw_reborn/src/runtime.rs` around lines 116 - 121, The from_raw
method currently defaults unset/empty/unrecognized values to Self::Bridged (ON
state), but this was explicitly marked as temporary for benchmarking and must be
reverted before merge. Change the catch-all pattern in the from_raw method to
return Self::Off instead of Self::Bridged to restore the correct default-off
behavior. Additionally, update the related unit tests (referenced in the also
applies to section around lines 775-796) that currently lock in the temporary ON
behavior to instead verify the correct default-off behavior for unset and
unrecognized values.
|
🚅 Deployed to the ironclaw-pr-5149 environment in ironclaw-ci-preview
|
Correctness / fail-loud / determinism / multi-tenant + perf fixes from the automated PR review: - nearai_chat: clamp cache_read_input_tokens to input_tokens (no over-attribution). - runtime: log unrecognized REBORN_TOOL_DISCLOSURE values via debug! (observable; temporary default-on unchanged). - tool_disclosure: reserve bridge names — exclude inner tools named tool_search/describe/call from the catalog so a real tool can't shadow a bridge; precompute per-entry search_terms (alloc-free scoring); cache bridge defs + token estimates via LazyLock; enforce max_tools/token caps on promoted tools in deferred mode (core+bridges always kept, deterministic truncation). - tool_disclosure_port: propagate target-resolution errors with `?` (no swallowed lock/rebuild failures); bind mapped errors to preserve source cause; filter the inner visible surface instead of reconstructing it (preserves provider/runtime/ concurrency metadata, O(N log M)); scope promotions by tenant+agent+project+thread (no cross-scope leakage); canonical-JSON + provider-call-id + tool-name in bridge/target digests (collision-safe, order-stable); reject missing/blank tool_search.query with recoverable InvalidInput. Regression tests added for bridge-name reservation, scoped promotion, and caps. Verified: cargo fmt; clippy clean (ironclaw_reborn + ironclaw_llm); ironclaw_reborn lib 274 passed, ironclaw_llm lib 884 passed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Addressed all Gemini + CodeRabbit review comments in Correctness / fail-loud
Determinism / collision-safety
Multi-tenant safety
Bounds / perf
Regression tests added for bridge-name reservation, scoped promotion, and caps. Verified: clippy clean ( (One item — the eviction/epoch-reset policy beyond the cap truncation — is left as a documented follow-up in design doc §13.) |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/ironclaw_reborn/src/tool_disclosure_port.rs (1)
182-183: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick winDo not let inner descriptors shadow bridge capability IDs.
Line 225 retains any inner descriptor whose
capability_idis a bridge ID, and lines 234-237 then skip appending the canonical bridge descriptor. A non-bridge inner capability with a reserved bridge ID will surface wrong metadata and later be routed toinvoke_bridge()by line 246 instead of the inner port. Drop inner bridge IDs before appending canonical bridge descriptors, and reject non-bridge inner candidates that return reserved bridge IDs.Suggested shape
- surface.descriptors.retain(|descriptor| { - active_or_disclosed_ids.contains(&descriptor.capability_id) - || is_bridge_capability_id(&descriptor.capability_id) - }); + surface.descriptors.retain(|descriptor| { + active_or_disclosed_ids.contains(&descriptor.capability_id) + && !is_bridge_capability_id(&descriptor.capability_id) + });Also applies to: 225-238, 246-263
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/ironclaw_reborn/src/tool_disclosure_port.rs` around lines 182 - 183, Inner descriptors with reserved bridge capability IDs are not being filtered out, causing wrong metadata to surface and incorrect routing to invoke_bridge. When processing non-bridge inner tool calls in register_provider_tool_call, validate that the inner descriptor returned does not have a capability_id matching a reserved bridge ID and reject such candidates. Additionally, filter out any inner descriptors with bridge capability IDs before appending canonical bridge descriptors. This ensures bridge IDs are never retained from inner descriptors and prevents shadowing of the correct bridge metadata and routing behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/ironclaw_reborn/src/tool_disclosure_port.rs`:
- Around line 94-110: The PromotionScopeKey struct is storing all identifiers as
raw String types instead of using the existing typed ID newtypes. Replace the
String field types in the PromotionScopeKey struct definition with the
appropriate ID types (TenantId, Option<AgentId>, Option<ProjectId>, and
ThreadId). Then update the from_run_context method implementation to directly
use the typed IDs from the run_context scope instead of converting them to
strings via as_str().to_string() calls.
In `@crates/ironclaw_reborn/src/tool_disclosure.rs`:
- Line 83: The filter at line 83 in tool_disclosure.rs currently only excludes
tools by bridge name using is_bridge_name, but tools can also have bridge
capability IDs as a reserved routing namespace (as established at line 300).
Extend the filter condition to also exclude definitions with bridge capability
IDs by adding an additional check with
is_bridge_capability_id(&definition.capability_id) alongside the existing
is_bridge_name check, ensuring tools cannot bypass the normal dispatch path by
using a non-bridge name with a bridge CapabilityId.
---
Outside diff comments:
In `@crates/ironclaw_reborn/src/tool_disclosure_port.rs`:
- Around line 182-183: Inner descriptors with reserved bridge capability IDs are
not being filtered out, causing wrong metadata to surface and incorrect routing
to invoke_bridge. When processing non-bridge inner tool calls in
register_provider_tool_call, validate that the inner descriptor returned does
not have a capability_id matching a reserved bridge ID and reject such
candidates. Additionally, filter out any inner descriptors with bridge
capability IDs before appending canonical bridge descriptors. This ensures
bridge IDs are never retained from inner descriptors and prevents shadowing of
the correct bridge metadata and routing behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 21487f16-624e-4958-9885-3c2eefa08f74
📒 Files selected for processing (4)
crates/ironclaw_llm/src/nearai_chat.rscrates/ironclaw_reborn/src/runtime.rscrates/ironclaw_reborn/src/tool_disclosure.rscrates/ironclaw_reborn/src/tool_disclosure_port.rs
…Rabbit comments CI: the temporary REBORN_TOOL_DISCLOSURE default-on changed the advertised tool surface in the recorded-behavior replays (tests/reborn_qa_recorded_behavior.rs), so 4 routine replays ended `status Failed`. Confirmed empirically (default → all 4 fail; =off → all 4 pass). Add `RebornRuntimeInput::with_tool_disclosure` and have the QA replay harness (qa_trace.rs) pin `ToolDisclosureMode::Off` — non-racy (explicit config, no env mutation); these traces predate disclosure. CodeRabbit: - PromotionScopeKey now uses typed ID newtypes (TenantId/AgentId/ProjectId/ ThreadId) instead of raw String. - Reserve bridge CapabilityIds as well as bridge names: catalog filtering rejects both, and visible_capabilities drops inner descriptors carrying a bridge capability id before appending the synthetic bridge descriptors (so a real tool can't shadow a bridge by id). +unit test. Verified: cargo fmt; clippy clean (ironclaw_reborn root-llm-provider; ironclaw_reborn_composition postgres,webui-v2-beta); tool_disclosure lib 18 passed; reborn_qa_recorded_behavior replay_routine_ (all-features) 4 passed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ool calls (A)
Live runs on deepseek-V4-flash showed the model could discover tools but not
invoke deferred ones ("connect gmail" → extension_activate not visible; PR
review → http not visible). Two root causes:
B (the real cause): CORE_TOOL_NAMES silently mismatched the real builtin names
— `file_read` (real: `read_file`), `skill_search` (real: `skill_list`) — so in
production those tiered as Discoverable and the always-needed flow tools (http,
extension_*) were never core at all → almost everything deferred. Fix:
- Correct names (read_file, skill_list) and add http + extension_search/
install/activate/remove + web_search/web_fetch to core.
- Match core by bare name OR canonical builtin capability suffix, so provider-
encoded names (builtin.read_file / builtin__read_file) still tier as core.
- Regression test cross-referencing core names to known builtin capability ids,
so a future rename can't silently empty the core again.
A (robustness): a direct call to a deferred-but-real tool hard-failed upstream
in the model gateway before the decorator could help. Fix:
- Gateway now rejects only unadvertised calls the capability port cannot
RESOLVE; resolvable (scope-authorized) deferred calls proceed through the
unchanged validate_provider_tool_call + dispatch (full audit/safety/approval).
- Decorator routes a direct deferred-catalog call through inner resolution/
validation/registration, then promotes it for next turn.
Disclosure-off path is unchanged (advertised == full authorized surface).
Verified: clippy clean; tool_disclosure lib 21 passed; llm_gateway 61 passed;
reborn_qa_recorded_behavior replay_routine_ (all-features) 4 passed. Benchmark
mechanism unchanged on the fixture (93.3%); real disclosed surface is now wider
(correct core) and will be measured live via the Phase-0 shadow logs.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… replays
A weak model frequently calls a deferred provider tool by its dotted
catalog capability_id (e.g. `google-calendar.list_events`, which
`tool_search`/`tool_describe` surface alongside the `__`-encoded wire
name) instead of the wire name. The forgiving direct-deferred path
resolves and dispatches that fine, but recorded the provider replay
under the raw dotted name. That replay feeds the assistant transcript
and any provider-error result reference, where `validate_provider_tool_name`
rejects the `.` — failing the transcript write and borking the whole run
(`HostUnavailable { stage: Capability }`).
Normalize only when needed: preserve the model's emitted name when it is
already wire-safe (the common `__`-encoded case, so the replay faithfully
mirrors the model's output), and fall back to the resolved definition's
canonical name only for a non-wire-safe emitted form (the dotted
capability_id). Adds `replay_provider_tool_name` and threads an explicit
provider_tool_name through `provider_replay_for`.
Regression test drives the register caller (not just the helper) with a
dotted capability_id and asserts the replay records the canonical wire
name and passes `validate_provider_tool_name`. Startup marker bumped to
`canonical-replay-name-v5`.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
♻️ Duplicate comments (2)
crates/ironclaw_reborn/src/runtime.rs (1)
131-144: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winRestore default-off before merge.
Unset/empty/garbage
REBORN_TOOL_DISCLOSUREstill enablesBridged, and the test locks in the temporary benchmark behavior.Suggested patch
- Some(value) if !value.is_empty() => { + Some(value) if !value.trim().is_empty() => { tracing::debug!( target: "ironclaw::reborn::runtime", value, - "unrecognized REBORN_TOOL_DISCLOSURE value; falling back to temporary default Bridged" + "unrecognized REBORN_TOOL_DISCLOSURE value; falling back to Off" ); - Self::Bridged + Self::Off } - // unset / empty -> temporarily ON for benchmarking. - _ => Self::Bridged, + _ => Self::Off,As per PR objectives, default-on is temporary and must be reverted before merge/GA.
Also applies to: 817-841
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/ironclaw_reborn/src/runtime.rs` around lines 131 - 144, Restore the intended default-off behavior in RuntimeDisclosure::from_raw and related tests: unset, empty, or unrecognized REBORN_TOOL_DISCLOSURE values should no longer fall back to Bridged, and the temporary benchmarking default must be removed. Update the match arms in from_raw to return Off for missing/empty/invalid input, keep the explicit off/bridged mappings, and adjust the tests around RuntimeDisclosure so they assert the production default rather than the temporary benchmark setting.crates/ironclaw_reborn/src/tool_disclosure_port.rs (1)
165-183: 🔒 Security & Privacy | 🟠 Major | ⚡ Quick winPreserve inner capability expansion in the pre-check.
target.target_callis already canonicalized and is the same resolved call used by validate/register, so resolving IDs directly here can drop innereffective_capability_ids/ approval expansion.Suggested patch
- return Ok(ProviderToolCallCapabilityIds::single( - target.definition.capability_id, - )); + return self + .inner + .provider_tool_call_capability_ids(&target.target_call); @@ - return Ok(ProviderToolCallCapabilityIds { - provider_capability_id: target.definition.capability_id.clone(), - effective_capability_ids: vec![target.definition.capability_id], - }); + return self + .inner + .provider_tool_call_capability_ids(&target.target_call);As per coding guidelines, fail closed for auth/approvals.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/ironclaw_reborn/src/tool_disclosure_port.rs` around lines 165 - 183, The pre-check in tool_disclosure_port should not bypass the inner capability expansion for canonicalized target.target_call, because returning only the target.definition.capability_id drops the inner effective_capability_ids and approval expansion used by validate/register. Update the TOOL_CALL_NAME path in the provider_tool_call_capability_ids logic to preserve the inner provider_tool_call_capability_ids result for the resolved target call while still failing closed for auth/approvals, and use the existing allowed_tool_call_target/inner capability lookup flow to keep the same canonical call behavior.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@crates/ironclaw_reborn/src/runtime.rs`:
- Around line 131-144: Restore the intended default-off behavior in
RuntimeDisclosure::from_raw and related tests: unset, empty, or unrecognized
REBORN_TOOL_DISCLOSURE values should no longer fall back to Bridged, and the
temporary benchmarking default must be removed. Update the match arms in
from_raw to return Off for missing/empty/invalid input, keep the explicit
off/bridged mappings, and adjust the tests around RuntimeDisclosure so they
assert the production default rather than the temporary benchmark setting.
In `@crates/ironclaw_reborn/src/tool_disclosure_port.rs`:
- Around line 165-183: The pre-check in tool_disclosure_port should not bypass
the inner capability expansion for canonicalized target.target_call, because
returning only the target.definition.capability_id drops the inner
effective_capability_ids and approval expansion used by validate/register.
Update the TOOL_CALL_NAME path in the provider_tool_call_capability_ids logic to
preserve the inner provider_tool_call_capability_ids result for the resolved
target call while still failing closed for auth/approvals, and use the existing
allowed_tool_call_target/inner capability lookup flow to keep the same canonical
call behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 749b0ec5-259e-4d77-afd5-e282231b8b97
📒 Files selected for processing (2)
crates/ironclaw_reborn/src/runtime.rscrates/ironclaw_reborn/src/tool_disclosure_port.rs
…forgivingly
A model that invokes a catalog tool through the `tool_call` bridge WITHOUT
first discovering it via tool_search/tool_describe hit a dead end:
`allowed_tool_call_target` gated on `active || disclosed`, so an
undisclosed target fell through to the bridge, which returned a generic
`invalid_input` ("unknown or not disclosed") with `detail: None` — no
schema issues, no repairs, no recovery hint. The model could not
self-correct, retried the same call, and every failed observation was
replayed in full each turn until context ballooned and the run failed
(observed live with gpt-5.5: 30+ iterations of generic invalid_input,
prompt grown to ~27k tokens, run ending Failed / driver_protocol_violation).
A *direct* call to the same undisclosed tool already resolves via
`direct_deferred_target`, so the bridge was strictly stricter than the
direct path for no benefit — safety/approval/auth gates run at dispatch
regardless, so disclosure is a token-economy boundary, not a security one.
Make the bridge forgiving: resolve any tool the catalog knows by name,
regardless of prior disclosure. The call now dispatches and surfaces the
tool's real schema error (with repairs) the model can act on, and earns
promotion on success. Genuinely-unknown targets still fail, now with an
actionable message pointing at tool_search.
- `allowed_tool_call_target`: drop the active||disclosed requirement.
- bridge unknown-target message: "...use tool_search to find the correct
tool name" instead of "...unknown or not disclosed".
- New test `tool_call_resolves_undisclosed_catalog_target_forgivingly`.
- `search_discloses_...`: drop the stale "undisclosed tool_call fails"
block (no longer true); the unknown/bridge-recursion reject tests are
unchanged and still pass.
- Startup marker -> forgiving-tool-call-v6.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/ironclaw_reborn/src/runtime.rs (1)
131-144: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winTrim before matching the
offescape hatch.
REBORN_TOOL_DISCLOSURE="off "currently falls through to the unrecognized branch and enablesBridged. Normalize withtrim()before matching so the documented escape hatch works reliably. The PR objective saysREBORN_TOOL_DISCLOSURE=offremains the escape hatch.Suggested patch
fn from_raw(raw: Option<&str>) -> Self { - match raw { + match raw.map(str::trim) { Some(value) if value.eq_ignore_ascii_case("off") => Self::Off, Some(value) if value.eq_ignore_ascii_case("bridged") => Self::Bridged,🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/ironclaw_reborn/src/runtime.rs` around lines 131 - 144, The `from_raw` parser in `runtime.rs` should trim the incoming `REBORN_TOOL_DISCLOSURE` value before matching, because values like `off ` currently miss the `off` escape hatch and fall into the fallback branch. Update the matching logic in `from_raw` to normalize `raw` with `trim()` and then compare against `off` and `bridged`, keeping the existing debug logging and `Self::Bridged` fallback for unrecognized non-empty values.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@crates/ironclaw_reborn/src/runtime.rs`:
- Around line 131-144: The `from_raw` parser in `runtime.rs` should trim the
incoming `REBORN_TOOL_DISCLOSURE` value before matching, because values like
`off ` currently miss the `off` escape hatch and fall into the fallback branch.
Update the matching logic in `from_raw` to normalize `raw` with `trim()` and
then compare against `off` and `bridged`, keeping the existing debug logging and
`Self::Bridged` fallback for unrecognized non-empty values.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 6a712098-3e9d-4104-ae66-2842d43c2db1
📒 Files selected for processing (2)
crates/ironclaw_reborn/src/runtime.rscrates/ironclaw_reborn/src/tool_disclosure_port.rs
…ed context When a model repeats the same failing call, each attempt's error observation is byte-for-byte identical, and every one is replayed into every later prompt. That both bloats context (observed live: ~15 identical invalid_input errors, prompt grown to ~27k tokens) and drowns the model in copies of its own failure so it cannot tell it is looping. The model_gateway prompt-build (`convert_messages`) now runs a pre-pass that, for each set of byte-identical *error* observations, keeps the FIRST and LAST occurrence intact (first for original detail, last for the most recent state + any repair hints) and replaces the interior duplicates with a compact, schema-valid marker. Nothing is dropped — every tool-result message stays, so provider tool-call/result pairing is preserved; only the observation *content* of interior duplicates shrinks. Success observations and a lone repeat (< 3 occurrences) are never touched. This operates only on the in-prompt representation; the durable transcript is unchanged (LLM data is never deleted). - `ironclaw_threads`: `ToolResultReferenceEnvelope::error_observation_fingerprint` (error-only) and `collapse_to_repeated_error_marker` (schema-valid marker), with unit tests that the marker round-trips and success is ignored. - `ironclaw_reborn::model_gateway`: `collapse_repeated_failure_observations` pre-pass + tests driving it through the real path (keeps first/last, leaves a single repeat alone, never touches success). - Startup marker -> collapse-repeated-errors-v7. Note: the forgiving-tool-call fix already eliminates the dead-end that produced the worst error storms, so this is now primarily a future-proofing safety net. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/ironclaw_reborn/src/model_gateway.rs`:
- Around line 1834-1884: There is only helper-level coverage for
repeated-failure compaction, so add a regression that exercises the real caller
path through convert_messages rather than collapse_repeated_failure_observations
directly. Use the existing test fixtures and a scenario with repeated error tool
results plus a success, then assert the resulting ChatMessage sequence contains
the repeated_error_elided marker in the interior duplicate after conversion.
Keep the helper tests if useful, but make the new test validate the behavior at
the convert_messages call site so the guard cannot be removed without failing.
In `@crates/ironclaw_threads/src/tool_result_reference.rs`:
- Around line 252-257: The error observation fingerprinting in
error_observation_fingerprint currently accepts any JSON with status: "error"
and fingerprints it directly, which can let malformed observations bypass the
safe-summary fallback. Add validation before returning a fingerprint by reusing
the same shape checks used by model_visible_content_or_safe_summary or a shared
helper, so only genuine error observations are accepted. Update
error_observation_fingerprint to return None for malformed error payloads, and
add a regression test covering malformed status: "error" observations producing
None.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 25728921-8a5c-40fc-893b-88b1a2b928a4
📒 Files selected for processing (3)
crates/ironclaw_reborn/src/model_gateway.rscrates/ironclaw_reborn/src/runtime.rscrates/ironclaw_threads/src/tool_result_reference.rs
| #[test] | ||
| fn collapse_repeated_failure_observations_keeps_first_and_last_only() { | ||
| let error_obs = generic_error_observation(); | ||
| let success_obs = serde_json::json!({ | ||
| "schema_version": 1, | ||
| "status": "success", | ||
| "summary": "ok", | ||
| "detail": {"kind": "generic_failure", "failure_kind": "none"}, | ||
| "trust": "untrusted_tool_output", | ||
| }); | ||
| // Four identical failures (each its own result_ref) plus a success. | ||
| let mut messages = vec![ | ||
| error_tool_result_message("result:err_1.1", error_obs.clone()), | ||
| error_tool_result_message("result:err_1.2", error_obs.clone()), | ||
| error_tool_result_message("result:err_1.3", error_obs.clone()), | ||
| error_tool_result_message("result:err_1.4", error_obs.clone()), | ||
| error_tool_result_message("result:ok_1.5", success_obs.clone()), | ||
| ]; | ||
|
|
||
| collapse_repeated_failure_observations(&mut messages); | ||
|
|
||
| // First and last identical errors keep full detail. | ||
| assert_eq!(tool_result_observation(&messages[0]), error_obs); | ||
| assert_eq!(tool_result_observation(&messages[3]), error_obs); | ||
| // Interior duplicates collapse to the compact, schema-valid marker. | ||
| for index in [1usize, 2] { | ||
| let failure_kind = tool_result_observation(&messages[index]) | ||
| .get("detail") | ||
| .and_then(|detail| detail.get("failure_kind")) | ||
| .and_then(|kind| kind.as_str()) | ||
| .map(str::to_string); | ||
| assert_eq!(failure_kind.as_deref(), Some("repeated_error_elided")); | ||
| } | ||
| // Success observation is never touched. | ||
| assert_eq!(tool_result_observation(&messages[4]), success_obs); | ||
| } | ||
|
|
||
| #[test] | ||
| fn collapse_repeated_failure_observations_leaves_a_single_repeat_alone() { | ||
| let error_obs = generic_error_observation(); | ||
| let mut messages = vec![ | ||
| error_tool_result_message("result:err_2.1", error_obs.clone()), | ||
| error_tool_result_message("result:err_2.2", error_obs.clone()), | ||
| ]; | ||
|
|
||
| collapse_repeated_failure_observations(&mut messages); | ||
|
|
||
| // Below the 3+ threshold: both copies stay intact. | ||
| assert_eq!(tool_result_observation(&messages[0]), error_obs); | ||
| assert_eq!(tool_result_observation(&messages[1]), error_obs); | ||
| } |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Test the compaction through convert_messages, not only the helper.
These assertions would still pass if Line 1385 were removed. Add a regression that calls convert_messages(...) and verifies the replayed ChatMessage contains the repeated-error marker for the interior duplicate. Repo invariant violated: test through the caller. As per path instructions, “Test through the caller: when a helper gates a side effect, require a test driving the real call site.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/ironclaw_reborn/src/model_gateway.rs` around lines 1834 - 1884, There
is only helper-level coverage for repeated-failure compaction, so add a
regression that exercises the real caller path through convert_messages rather
than collapse_repeated_failure_observations directly. Use the existing test
fixtures and a scenario with repeated error tool results plus a success, then
assert the resulting ChatMessage sequence contains the repeated_error_elided
marker in the interior duplicate after conversion. Keep the helper tests if
useful, but make the new test validate the behavior at the convert_messages call
site so the guard cannot be removed without failing.
Source: Path instructions
| pub fn error_observation_fingerprint(&self) -> Option<String> { | ||
| let observation = self.model_observation.as_ref()?; | ||
| if observation.get("status").and_then(serde_json::Value::as_str) != Some("error") { | ||
| return None; | ||
| } | ||
| Some(observation.to_string()) |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Validate before fingerprinting error observations.
Line 254 accepts any JSON with status: "error", so three malformed observations get replaced by a valid marker before model_visible_content_or_safe_summary() can reject them and fall back to the safe summary. That masks corrupt model-visible data despite the “genuine errors” contract.
Suggested fix
pub fn error_observation_fingerprint(&self) -> Option<String> {
let observation = self.model_observation.as_ref()?;
+ if validate_model_observation(observation).is_err() {
+ return None;
+ }
if observation.get("status").and_then(serde_json::Value::as_str) != Some("error") {
return None;
}
Some(observation.to_string())
}Add a regression for malformed status: "error" observations returning None. Repo invariant violated: guarantees in comments must be enforced. As per coding guidelines, “Comments that promise guarantees across layers must either be enforced by code/tests or softened to describe intent.”
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| pub fn error_observation_fingerprint(&self) -> Option<String> { | |
| let observation = self.model_observation.as_ref()?; | |
| if observation.get("status").and_then(serde_json::Value::as_str) != Some("error") { | |
| return None; | |
| } | |
| Some(observation.to_string()) | |
| pub fn error_observation_fingerprint(&self) -> Option<String> { | |
| let observation = self.model_observation.as_ref()?; | |
| if validate_model_observation(observation).is_err() { | |
| return None; | |
| } | |
| if observation.get("status").and_then(serde_json::Value::as_str) != Some("error") { | |
| return None; | |
| } | |
| Some(observation.to_string()) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/ironclaw_threads/src/tool_result_reference.rs` around lines 252 - 257,
The error observation fingerprinting in error_observation_fingerprint currently
accepts any JSON with status: "error" and fingerprints it directly, which can
let malformed observations bypass the safe-summary fallback. Add validation
before returning a fingerprint by reusing the same shape checks used by
model_visible_content_or_safe_summary or a shared helper, so only genuine error
observations are accepted. Update error_observation_fingerprint to return None
for malformed error payloads, and add a regression test covering malformed
status: "error" observations producing None.
Source: Coding guidelines
…he gateway The gateway's provider-response classifier discards a whole tool-capable response when `validate_provider_tool_call` or `register_provider_tool_call` errors — the error was `?`-propagated with no log. Live, this presented as model calls that were received but never dispatched (budget Reserved -> Released, no "classified as capability calls" line) and the run eventually failing with no trace of which call or why. Make both rejection paths fail loud: log the tool name, provider call id, and error kind before mapping/propagating. This does not change behavior — it makes an otherwise-silent discard diagnosable, so the next run reveals exactly which tool call is rejected and at which stage. Context: this is the diagnostic step for an open tail failure (gpt-5.5 via the Codex ChatGPT backend, calls after a successful google-calendar dispatch). A leading hypothesis is that the forgiving-tool_call change converted a previously *recoverable* undisclosed-bridge failure into a *hard* register error here; this logging will confirm or refute it before any behavior change. Startup marker -> gateway-reject-logging-v8. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/ironclaw_reborn/src/runtime.rs (1)
131-135: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick winTrim mode values before matching.
REBORN_TOOL_DISCLOSURE="off "is treated as unrecognized and falls through toBridged, so the documentedoffescape hatch can fail open because of whitespace.As per coding guidelines, “Fail closed for ... runtime selection.”
Suggested patch
fn from_raw(raw: Option<&str>) -> Self { + let raw = raw.map(str::trim); match raw { Some(value) if value.eq_ignore_ascii_case("off") => Self::Off, Some(value) if value.eq_ignore_ascii_case("bridged") => Self::Bridged,assert!(ToolDisclosureMode::from_raw(Some("bridged")).is_bridged()); + assert!(ToolDisclosureMode::from_raw(Some(" bridged ")).is_bridged()); assert!( !ToolDisclosureMode::from_raw(Some("off")).is_bridged(), "explicit REBORN_TOOL_DISCLOSURE=off must still disable disclosure" ); + assert!(!ToolDisclosureMode::from_raw(Some(" off ")).is_bridged());Also applies to: 827-837
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/ironclaw_reborn/src/runtime.rs` around lines 131 - 135, Trim the incoming value before matching in the `TrimMode::from_raw` logic so whitespace-padded settings like `off ` are recognized correctly instead of falling through to the default branch. Update the `from_raw` match in `runtime.rs` to normalize `raw` before checking `off`, `bridged`, and the non-empty fallback, and ensure any other runtime selection paths in the referenced `827-837` section use the same trimmed handling so the selection fails closed.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@crates/ironclaw_reborn/src/runtime.rs`:
- Around line 131-135: Trim the incoming value before matching in the
`TrimMode::from_raw` logic so whitespace-padded settings like `off ` are
recognized correctly instead of falling through to the default branch. Update
the `from_raw` match in `runtime.rs` to normalize `raw` before checking `off`,
`bridged`, and the non-empty fallback, and ensure any other runtime selection
paths in the referenced `827-837` section use the same trimmed handling so the
selection fails closed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 790ec2d1-da2c-4dd7-a523-d4d740974503
📒 Files selected for processing (2)
crates/ironclaw_reborn/src/model_gateway.rscrates/ironclaw_reborn/src/runtime.rs
…orking Diagnostic logging (gateway-reject-logging-v8) confirmed the open tail failure: gpt-5.5 sent `tool_call` bridge invocations that the disclosure port rejected during validation with `error_kind=InvalidInvocation`. The gateway discards the entire provider response on any validate/register error, so each rejection released the model reservation with no dispatch, and after a few the run ended Failed / driver_protocol_violation. Root cause is the forgiving-tool_call change itself: it resolves an undisclosed catalog target and routes it through `inner.validate`/ `inner.register`, which can hard-fail (e.g. malformed arguments for a deferred tool). Before that change the same call became a *recoverable* bridge `invalid_input` the model could correct and retry. Restore the soft behavior on the `tool_call` bridge path without losing forgiving resolution: - validate: probe the resolved target through the inner port for an early diagnostic, but always return Ok — never abort the whole response. - register: try the resolved target; on error, fall back to `register_bridge_call`, so invoke surfaces a recoverable invalid_input instead of a hard error. A malformed/unresolvable `tool_call` is now a tool-result the model can fix (and repeated identical ones collapse via the v7 change), instead of a run-borking discard. - New test drives the register-failure fallback through the caller (SpyPort gains a `register_explodes` sentinel): validate stays Ok, register falls back to a bridge candidate, invoke yields a recoverable InvalidInput. - Direct-deferred path shares this structure and remains a known similar risk; deferred until/if observed, to keep this change surgical. - Startup marker -> recoverable-tool-call-v9. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
crates/ironclaw_reborn/src/tool_disclosure_port.rs (1)
165-183: 🔒 Security & Privacy | 🟠 Major | ⚡ Quick winDelegate resolved target IDs to the inner port.
Both forgiving paths collapse resolved targets to
ProviderToolCallCapabilityIds::single(...). If the inner port expandseffective_capability_idsfor approval/policy, the gateway pre-check sees a narrower authority surface than registration/invocation.Suggested patch
- return Ok(ProviderToolCallCapabilityIds::single( - target.definition.capability_id, - )); + return self + .inner + .provider_tool_call_capability_ids(&target.target_call); @@ - return Ok(ProviderToolCallCapabilityIds { - provider_capability_id: target.definition.capability_id.clone(), - effective_capability_ids: vec![target.definition.capability_id], - }); + return self + .inner + .provider_tool_call_capability_ids(&target.target_call);As per coding guidelines, “Fail closed for auth, approvals.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/ironclaw_reborn/src/tool_disclosure_port.rs` around lines 165 - 183, The `provider_tool_call_capability_ids` fallback for resolved targets is collapsing to `ProviderToolCallCapabilityIds::single(...)`, which bypasses any broader `effective_capability_ids` the inner port may produce. Update the `tool_disclosure_port` logic so the `TOOL_CALL_NAME`/`allowed_tool_call_target` path delegates resolved target capability resolution to `self.inner.provider_tool_call_capability_ids` (or otherwise preserves the inner port’s expanded IDs) instead of forcing a single capability id, while keeping the existing fail-closed gate for unresolved names.Source: Coding guidelines
crates/ironclaw_reborn/src/runtime.rs (1)
131-144: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winRevert the temporary default-on mode before merge.
from_raw(None), empty, and garbage values all returnBridged, and the test locks that in. Because Line 633 wires the decorator when bridged, this ships the new disclosure path enabled by default; the PR objective marks this as temporary benchmark behavior.Suggested patch
fn from_raw(raw: Option<&str>) -> Self { - match raw { + match raw.map(str::trim) { Some(value) if value.eq_ignore_ascii_case("off") => Self::Off, Some(value) if value.eq_ignore_ascii_case("bridged") => Self::Bridged, Some(value) if !value.is_empty() => { tracing::debug!( target: "ironclaw::reborn::runtime", value, - "unrecognized REBORN_TOOL_DISCLOSURE value; falling back to temporary default Bridged" + "unrecognized REBORN_TOOL_DISCLOSURE value; falling back to Off" ); - Self::Bridged + Self::Off } - // unset / empty -> temporarily ON for benchmarking. - _ => Self::Bridged, + _ => Self::Off, } }Also applies to: 817-841
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/ironclaw_reborn/src/runtime.rs` around lines 131 - 144, The temporary default-on behavior in `from_raw` should be reverted before merge: `None`, empty, and unrecognized `REBORN_TOOL_DISCLOSURE` values must no longer fall back to `Bridged`. Update `from_raw` in `runtime.rs` so only an explicit bridged setting maps to `Self::Bridged`, while unset/invalid inputs return the intended non-disclosure default instead. Also adjust any related test coverage that currently locks in the temporary default so it matches the restored behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/ironclaw_reborn/src/tool_disclosure_port.rs`:
- Around line 220-235: The validation fallback in the target registration flow
is too permissive: after a failed `validate_provider_tool_call` check,
`register_provider_tool_call` still runs and can admit invalid or
policy-rejected targets. Update the registration path in `tool_disclosure_port`
so `inner.register_provider_tool_call(...)` is only attempted when the target
has passed the inner validation, and ensure auth/approval/host-related failures
are propagated instead of downgraded; keep the bridge fallback only for
recoverable invalid input and use the existing `debug!`/validation branch to
gate the subsequent registration logic.
---
Duplicate comments:
In `@crates/ironclaw_reborn/src/runtime.rs`:
- Around line 131-144: The temporary default-on behavior in `from_raw` should be
reverted before merge: `None`, empty, and unrecognized `REBORN_TOOL_DISCLOSURE`
values must no longer fall back to `Bridged`. Update `from_raw` in `runtime.rs`
so only an explicit bridged setting maps to `Self::Bridged`, while unset/invalid
inputs return the intended non-disclosure default instead. Also adjust any
related test coverage that currently locks in the temporary default so it
matches the restored behavior.
In `@crates/ironclaw_reborn/src/tool_disclosure_port.rs`:
- Around line 165-183: The `provider_tool_call_capability_ids` fallback for
resolved targets is collapsing to `ProviderToolCallCapabilityIds::single(...)`,
which bypasses any broader `effective_capability_ids` the inner port may
produce. Update the `tool_disclosure_port` logic so the
`TOOL_CALL_NAME`/`allowed_tool_call_target` path delegates resolved target
capability resolution to `self.inner.provider_tool_call_capability_ids` (or
otherwise preserves the inner port’s expanded IDs) instead of forcing a single
capability id, while keeping the existing fail-closed gate for unresolved names.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: c5538514-1a05-4d1d-b4a5-41a6489cd862
📒 Files selected for processing (2)
crates/ironclaw_reborn/src/runtime.rscrates/ironclaw_reborn/src/tool_disclosure_port.rs
Resolved conflicts: - crates/ironclaw_reborn/src/runtime.rs: compose both decorators — keep the conditional ToolDisclosureCapabilityDecorator (this branch) AND main's DisabledCapabilitiesDecorator. Order: spawn -> ToolDisclosure (if bridged) -> DisabledCapabilities (outermost, per main's intent). - crates/ironclaw_loop_support/src/capability_surface_filter.rs: additive test conflict — keep this branch's `visible_filter_delegates_...` test AND main's CapabilitySurfaceDenyFilter tests. Semantic fix required by the merge (lesson 8): main's CapabilitySurfaceDenyFilter overrides tool_definitions/validate/register but NOT `provider_tool_call_capability_ids`, so the gateway pre-check would fall to the LoopCapabilityPort default (search its own deny-filtered tool_definitions) and reject every deferred/disclosed tool — re-shadowing the resolution fix from f9a436a whenever the deny filter is active (spawn_subagent is disabled by default, so it always is). Added the delegating override + a regression test, mirroring the CapabilitySurfaceVisibleFilter/ProfileFilter fix. Validated: `cargo check --workspace --all-features` clean; loop_support filter tests 24/24, reborn tool_disclosure_port 11/11; clippy clean; binary builds.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/ironclaw_loop_support/src/capability_surface_filter.rs`:
- Around line 202-208: CapabilitySurfaceDenyFilter is using
provider_capability_permitted, which preserves capability_info visibility and
makes explicit denies non-strict; update the deny paths in
CapabilitySurfaceDenyFilter (including tool_definitions and the other listed
surfaces) to use a raw self.permits(...) check or a strict deny-scope helper so
denied capabilities are fully hidden and fail closed. Make sure
capability_info::CAPABILITY_ID is actually excluded when denied, and add a
regression test that verifies capability_info cannot remain advertised or pass
scope checks under the deny filter.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 2c476e60-aaa4-4bd4-adf6-9ac3054f2e4a
📒 Files selected for processing (5)
crates/ironclaw_llm/src/nearai_chat.rscrates/ironclaw_loop_support/src/capability_surface_filter.rscrates/ironclaw_loop_support/src/lib.rscrates/ironclaw_reborn/src/runtime.rscrates/ironclaw_reborn_composition/src/runtime.rs
💤 Files with no reviewable changes (2)
- crates/ironclaw_reborn/src/runtime.rs
- crates/ironclaw_reborn_composition/src/runtime.rs
There was a problem hiding this comment.
Caution
Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/ironclaw_loop_support/src/capability_surface_filter.rs`:
- Around line 202-208: CapabilitySurfaceDenyFilter is using
provider_capability_permitted, which preserves capability_info visibility and
makes explicit denies non-strict; update the deny paths in
CapabilitySurfaceDenyFilter (including tool_definitions and the other listed
surfaces) to use a raw self.permits(...) check or a strict deny-scope helper so
denied capabilities are fully hidden and fail closed. Make sure
capability_info::CAPABILITY_ID is actually excluded when denied, and add a
regression test that verifies capability_info cannot remain advertised or pass
scope checks under the deny filter.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 2c476e60-aaa4-4bd4-adf6-9ac3054f2e4a
📒 Files selected for processing (5)
crates/ironclaw_llm/src/nearai_chat.rscrates/ironclaw_loop_support/src/capability_surface_filter.rscrates/ironclaw_loop_support/src/lib.rscrates/ironclaw_reborn/src/runtime.rscrates/ironclaw_reborn_composition/src/runtime.rs
💤 Files with no reviewable changes (2)
- crates/ironclaw_reborn/src/runtime.rs
- crates/ironclaw_reborn_composition/src/runtime.rs
🛑 Comments failed to post (1)
crates/ironclaw_loop_support/src/capability_surface_filter.rs (1)
202-208: 🔒 Security & Privacy | 🟠 Major | ⚡ Quick win
Make the deny filter strict; don’t reuse the capability-info-preserving helper.
CapabilitySurfaceDenyFilterpromises explicit denies, but these paths callprovider_capability_permitted, which existing profile tests rely on to keepcapability_infovisible outside the allowlist. Ifcapability_info::CAPABILITY_IDis denied, it can still remain advertised/visible and pass scope checks. Use rawself.permits(...)or a strict deny-scope helper, and add a regression that deniescapability_info.As per path instructions, fail closed for authority/dispatch surfaces. As per coding guidelines, comments that promise guarantees across layers must be enforced by code/tests or softened.
Suggested shape
- definitions.retain(|definition| { - provider_capability_permitted(&definition.capability_id, |capability_id| { - self.permits(capability_id) - }) - }); + definitions.retain(|definition| self.permits(&definition.capability_id)); - validate_provider_tool_call_capability_scope( + validate_provider_tool_call_capability_scope_strict( capability_ids.clone(), |capability_id| self.permits(capability_id), "provider tool call targets a disabled capability", )?;Also applies to: 223-227, 235-257, 267-271, 1628-1675
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/ironclaw_loop_support/src/capability_surface_filter.rs` around lines 202 - 208, CapabilitySurfaceDenyFilter is using provider_capability_permitted, which preserves capability_info visibility and makes explicit denies non-strict; update the deny paths in CapabilitySurfaceDenyFilter (including tool_definitions and the other listed surfaces) to use a raw self.permits(...) check or a strict deny-scope helper so denied capabilities are fully hidden and fail closed. Make sure capability_info::CAPABILITY_ID is actually excluded when denied, and add a regression test that verifies capability_info cannot remain advertised or pass scope checks under the deny filter.Sources: Coding guidelines, Path instructions
…jection
The v9 soft-fix made the ToolDisclosure decorator's bridge validate return Ok,
but a live run still showed `tool_call` rejected during validation with
error_kind=InvalidInvocation. The rejection therefore comes from a HIGHER port
in the chain (CapabilitySurfaceVisibleFilter / DenyFilter), which the
ToolDisclosure soft-fix cannot reach. The gateway logged only error_kind, not
the message that names the layer.
Add `reason = error.safe_summary` to both gateway rejection logs. The
safe_summary is layer-distinct ("outside the model-visible capability view" =
visible filter, "targets a disabled capability" = deny filter), so the next run
names exactly which port rejects and why.
Working hypothesis (to confirm via the new log): the bridge `tool_call` resolves
to a tool that was disclosed via tool_search but then DE-disclosed when a
BlockedAuth resume reset the per-turn disclosed set (the live surface drops the
google-calendar tools from 26 back to 21 across the resume). The visible filter
then hard-rejects the now-non-visible target. Root trigger is the recurring
Google auth_required → BlockedAuth → resume.
Startup marker -> reject-reason-logging-v10.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/ironclaw_reborn/src/runtime.rs (1)
131-135: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winTrim before matching the
offescape hatch.
REBORN_TOOL_DISCLOSURE="off "currently hits the unrecognized branch and enablesBridged, so a padded env-file value fails open instead of disabling disclosure.Suggested patch
fn from_raw(raw: Option<&str>) -> Self { - match raw { + match raw.map(str::trim) { Some(value) if value.eq_ignore_ascii_case("off") => Self::Off, Some(value) if value.eq_ignore_ascii_case("bridged") => Self::Bridged,As per coding guidelines, “Fail closed for ... runtime selection.” The PR objective also says
REBORN_TOOL_DISCLOSURE=offremains the escape hatch.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/ironclaw_reborn/src/runtime.rs` around lines 131 - 135, The `DisclosureMode::from_raw` parsing in `runtime.rs` should fail closed by recognizing the `off` escape hatch even when the env value has surrounding whitespace. Trim the incoming `raw` string before matching in `from_raw` so `REBORN_TOOL_DISCLOSURE="off "` is treated the same as `off`, and keep the existing `Off`/`Bridged` matching logic in place for the `DisclosureMode` enum.Source: Coding guidelines
♻️ Duplicate comments (2)
crates/ironclaw_reborn/src/model_gateway.rs (2)
1859-1908: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick winTest replay compaction through
convert_messages.These assertions would still pass if the caller stopped invoking
collapse_repeated_failure_observations. Add a regression that drives the realconvert_messages(...)path and asserts the interior duplicate carriesrepeated_error_elided. As per path instructions, “Test through the caller.”🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/ironclaw_reborn/src/model_gateway.rs` around lines 1859 - 1908, Add a regression that exercises the real convert_messages(...) flow in model_gateway.rs instead of calling collapse_repeated_failure_observations() directly, so the replay-compaction behavior is verified through its caller. Reuse the existing generic_error_observation(), error_tool_result_message(), and tool_result_observation() helpers to build repeated failures plus a success, then assert the interior duplicate produced by convert_messages(...) is marked with detail.failure_kind = repeated_error_elided while the first/last failures and success remain unchanged.Source: Path instructions
1055-1063: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick winPropagate resolver failures; don’t collapse them into “unadvertised”.
This boolean gate still maps every
provider_tool_call_capability_ids()failure to genericInvalidOutput, so internal host-state failures become indistinguishable from an actually outside-surface call. Make the helper returnResult<bool, AgentLoopHostError>and only returnOk(false)for explicit invalid/outside-surface cases. As per path instructions, “Fail loud … Errors propagate with?into thiserror types with context.”🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/ironclaw_reborn/src/model_gateway.rs` around lines 1055 - 1063, Update the tool-call validation flow in model_gateway so resolver errors are not collapsed into InvalidOutput: change provider_calls_are_advertised_or_resolvable to return Result<bool, AgentLoopHostError>, have it return Ok(false) only for explicit outside-surface cases, and propagate provider_tool_call_capability_ids failures with ? so internal host-state issues surface with context. Then adjust the caller in the model gateway validation block to handle the Result and only construct HostManagedModelError::safe(InvalidOutput, ...) when the helper returns Ok(false), using the existing HostManagedModelError and AgentLoopHostError paths to preserve error detail.Source: Path instructions
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/ironclaw_reborn/src/model_gateway.rs`:
- Around line 1074-1078: The comment attached to the `reason =
error.safe_summary.as_str()` assignment in `model_gateway.rs` overstates an
unverified cross-layer guarantee. Keep the `safe_summary` field, but rewrite the
comment to describe it as sanitized rejection context rather than claiming it is
layer-distinct or that it identifies the rejecting port. Use the surrounding
`error.safe_summary`/`reason` log context to locate and soften the wording
without changing behavior.
---
Outside diff comments:
In `@crates/ironclaw_reborn/src/runtime.rs`:
- Around line 131-135: The `DisclosureMode::from_raw` parsing in `runtime.rs`
should fail closed by recognizing the `off` escape hatch even when the env value
has surrounding whitespace. Trim the incoming `raw` string before matching in
`from_raw` so `REBORN_TOOL_DISCLOSURE="off "` is treated the same as `off`, and
keep the existing `Off`/`Bridged` matching logic in place for the
`DisclosureMode` enum.
---
Duplicate comments:
In `@crates/ironclaw_reborn/src/model_gateway.rs`:
- Around line 1859-1908: Add a regression that exercises the real
convert_messages(...) flow in model_gateway.rs instead of calling
collapse_repeated_failure_observations() directly, so the replay-compaction
behavior is verified through its caller. Reuse the existing
generic_error_observation(), error_tool_result_message(), and
tool_result_observation() helpers to build repeated failures plus a success,
then assert the interior duplicate produced by convert_messages(...) is marked
with detail.failure_kind = repeated_error_elided while the first/last failures
and success remain unchanged.
- Around line 1055-1063: Update the tool-call validation flow in model_gateway
so resolver errors are not collapsed into InvalidOutput: change
provider_calls_are_advertised_or_resolvable to return Result<bool,
AgentLoopHostError>, have it return Ok(false) only for explicit outside-surface
cases, and propagate provider_tool_call_capability_ids failures with ? so
internal host-state issues surface with context. Then adjust the caller in the
model gateway validation block to handle the Result and only construct
HostManagedModelError::safe(InvalidOutput, ...) when the helper returns
Ok(false), using the existing HostManagedModelError and AgentLoopHostError paths
to preserve error detail.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: b8cc017e-74b4-4fe9-9862-1aebdb425874
📒 Files selected for processing (2)
crates/ironclaw_reborn/src/model_gateway.rscrates/ironclaw_reborn/src/runtime.rs
| // The safe_summary is layer-distinct ("outside the | ||
| // model-visible capability view" = visible filter, "targets a | ||
| // disabled capability" = deny filter, etc.), so it names which | ||
| // port in the chain rejected the call. | ||
| reason = error.safe_summary.as_str(), |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win
Soften the safe_summary guarantee.
This comment asserts a cross-layer guarantee that safe_summary is layer-distinct and names the rejecting port, but that uniqueness is not enforced here. Keep the log field; soften the comment to say it provides sanitized rejection context. As per coding guidelines, comments that promise guarantees across layers must be enforced or softened.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/ironclaw_reborn/src/model_gateway.rs` around lines 1074 - 1078, The
comment attached to the `reason = error.safe_summary.as_str()` assignment in
`model_gateway.rs` overstates an unverified cross-layer guarantee. Keep the
`safe_summary` field, but rewrite the comment to describe it as sanitized
rejection context rather than claiming it is layer-distinct or that it
identifies the rejecting port. Use the surrounding `error.safe_summary`/`reason`
log context to locate and soften the wording without changing behavior.
Source: Coding guidelines
…th/Approval resume v10 reason-logging confirmed the open tail failure: after a google-calendar tool blocked on an approval gate and the run resumed, the model's retry of the SAME tool was rejected by the visible-surface filter with reason="provider tool call is outside the model-visible capability view" — discarding the response, released without dispatch, eventually Failed. Root cause is an asymmetry in promotion. ToolDisclosure promotes a target only on `CapabilityOutcome::Completed`, so: - list_calendars dispatched -> Completed -> promoted -> visible on resume (works) - list_events dispatched -> ApprovalRequired (a suspension, not Completed) -> NOT promoted -> the per-turn disclosed set resets on the resume -> the tool drops off the model-visible surface -> the visible filter hard-rejects the retry. A gate-block is a user-action pause, not a failure: the model committed to that tool and resumes onto the exact same call. Promote on a gate suspension (approval/auth/resource) as well as on Completed. A hard *failure* still does not promote, so this does NOT drift toward advertising every discovered tool — only tools the model actually invoked promote (and the promoted set is capped). Considered and rejected persisting the whole `disclosed_names` set across resume precisely because that WOULD balloon the surface over a long conversation. - one-line condition change in invoke_capability (Completed || is_suspension). - New test `gate_suspended_target_is_promoted_so_it_survives_the_resume` drives an ApprovalRequired outcome (SpyPort `fixture.suspends` sentinel) and asserts the tool is advertised on the next (resume) turn. - Startup marker -> promote-gate-suspended-v11. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Tool disclosure hides deferred tools' schemas, advertising only their names in
the tool_search index. A model that calls a deferred tool by name without
loading its schema omits required arguments and loops on the opaque
invalid_input error -- the failure mode that did not exist pre-disclosure, when
the full schema (incl. required fields) was always in the model's tool list.
Describe-first (tool_disclosure_port.rs): when the model uses a deferred tool
whose schema it has not loaded this turn, probe-validate its arguments.
- valid -> dispatch directly (zero round-trip, as pre-disclosure)
- invalid -> return the parameter schema as a recoverable describe-style
completion instead of dispatching blind, and mark it disclosed
One-shot per undisclosed tool: a still-invalid call after disclosure dispatches
and fails through the normal path, so the no-progress detector still observes
repeated failures (cannot be turned into an endless "made progress" schema
loop). Restores the pre-disclosure schema-at-call-time guarantee at the point
of need without taxing correct blind calls.
Also commits the structural-awareness disclosure hardening this builds on:
- tool_search description indexes every discoverable tool by exact name
(names only, byte-budgeted under the 4096-byte capability safe-description
cap with an "...and N more" tail) so the model can see deferred tools exist
- callable set unions the synthesized bridges plus everything advertised this
turn so the visible-surface filter never strips tool_search/describe/call
- extension_activate success enumerates the published capability ids
Live-validated with gpt-5.5: discovers, describes, and uses the Google tools,
and a malformed deferred call now gets its schema back instead of looping.
Reborn lib tests green (285), clippy clean. Experimental branch (PR #5149);
pre-GA cleanup (temporary default-on, build markers, diagnostics) tracked
separately in the design notes.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/ironclaw_agent_loop/src/executor/prompt.rs`:
- Around line 236-256: The call-time authorization path in prompt.rs now uses
callable_capability_ids directly, but the visible-capability filters only narrow
descriptors, so a tool can be hidden from the model-visible surface yet still
remain callable. Update CapabilitySurfaceProfileFilter::visible_capabilities and
CapabilitySurfaceDenyFilter::visible_capabilities to intersect or otherwise
constrain callable_capability_ids in the same way they constrain descriptors,
and add a regression test covering a surface with non-empty
callable_capability_ids to verify hidden tools are excluded from authorization.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 4d1f2561-3a80-4e79-979a-753ac6b3726a
📒 Files selected for processing (20)
crates/ironclaw_agent_loop/src/executor/prompt.rscrates/ironclaw_agent_loop/src/executor/tests/support.rscrates/ironclaw_agent_loop/src/test_support/mod.rscrates/ironclaw_hooks/src/middleware/capability_port.rscrates/ironclaw_hooks/src/middleware/tests/capability_port.rscrates/ironclaw_loop_support/src/capability_port.rscrates/ironclaw_loop_support/src/capability_surface_filter.rscrates/ironclaw_loop_support/src/lib.rscrates/ironclaw_loop_support/src/subagent_spawn_port/tests.rscrates/ironclaw_loop_support/tests/thread_loop_support_contract.rscrates/ironclaw_product_workflow/tests/support/planned_agent_loop.rscrates/ironclaw_reborn/src/runtime.rscrates/ironclaw_reborn/src/subagent/flavors.rscrates/ironclaw_reborn/src/tool_disclosure.rscrates/ironclaw_reborn/src/tool_disclosure_port.rscrates/ironclaw_reborn/tests/hooks_integration.rscrates/ironclaw_reborn/tests/llm_gateway.rscrates/ironclaw_reborn_composition/src/extension_lifecycle.rscrates/ironclaw_turns/src/run_profile/host.rscrates/ironclaw_turns/tests/agent_loop_host_contract.rs
| // The capability view drives call-time authorization (the model-visible | ||
| // capability filter), which must permit every tool the model can legitimately | ||
| // invoke this turn — not just the advertised subset. Under progressive tool | ||
| // disclosure the surface narrows `descriptors` to the advertised set but | ||
| // carries the full reachable catalog in `callable_capability_ids`; use that | ||
| // wider set so bridge / forgiving-direct calls to disclosed-but-unadvertised | ||
| // tools aren't rejected as "outside the model-visible capability view". | ||
| // Advertising and prompt rendering still use the narrow `descriptors`. | ||
| // Empty `callable_capability_ids` means no narrowing is in effect, so fall | ||
| // back to `descriptors` (preserves non-disclosure behavior exactly). | ||
| let visible_capability_ids = if surface.callable_capability_ids.is_empty() { | ||
| surface | ||
| .descriptors | ||
| .iter() | ||
| .map(|descriptor| descriptor.capability_id.clone()) | ||
| .collect(), | ||
| .collect() | ||
| } else { | ||
| surface.callable_capability_ids.clone() | ||
| }; | ||
| let capability_view = LoopModelCapabilityView { | ||
| visible_capability_ids, |
There was a problem hiding this comment.
🔒 Security & Privacy | 🟠 Major
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== prompt pipeline branch =="
rg -n -C4 'callable_capability_ids|visible_capability_ids|apply_capability_filter' \
crates/ironclaw_agent_loop/src/executor/prompt.rs
echo
echo "== filter implementations touching the visible surface =="
rg -n -C4 'callable_capability_ids|descriptors|apply_capability_filter|CapabilitySurfaceProfileFilter|CapabilitySurfaceDenyFilter' \
crates/ironclaw_loop_support/src
echo
echo "== tests that exercise non-empty callable_capability_ids =="
rg -n -C2 'callable_capability_ids:\s*vec!\[|callable_capability_ids:\s*Vec::new\(\)' \
crates testsRepository: nearai/ironclaw
Length of output: 50372
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== prompt.rs surface -> capability view path =="
sed -n '220,270p' crates/ironclaw_agent_loop/src/executor/prompt.rs
echo
echo "== capability_surface_filter.rs visible_capabilities implementations =="
sed -n '250,420p' crates/ironclaw_loop_support/src/capability_surface_filter.rs
echo
echo "== any non-empty callable_capability_ids constructors =="
rg -n -C3 'callable_capability_ids:\s*(?!Vec::new\(\)|vec!\[\])' \
crates/ironclaw_loop_support/src crates/ironclaw_agent_loop/src \
--pcre2Repository: nearai/ironclaw
Length of output: 21215
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== prompt.rs surface -> capability view path =="
sed -n '220,270p' crates/ironclaw_agent_loop/src/executor/prompt.rs
echo
echo "== capability_surface_filter.rs visible_capabilities implementations =="
sed -n '250,420p' crates/ironclaw_loop_support/src/capability_surface_filter.rs
echo
echo "== any non-empty callable_capability_ids constructors =="
rg -n -C3 'callable_capability_ids:\s*(?!Vec::new\(\)|vec!\[\])' \
crates/ironclaw_loop_support/src crates/ironclaw_agent_loop/src \
--pcre2Repository: nearai/ironclaw
Length of output: 21215
Intersect callable_capability_ids when filtering visible surfaces.
CapabilitySurfaceProfileFilter::visible_capabilities and CapabilitySurfaceDenyFilter::visible_capabilities only trim descriptors; crates/ironclaw_agent_loop/src/executor/prompt.rs now authorizes against callable_capability_ids. That leaves a path for an outer filter to drop a tool from the model-visible surface but still let it through call-time auth. Add a regression test with a non-empty callable_capability_ids surface.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/ironclaw_agent_loop/src/executor/prompt.rs` around lines 236 - 256,
The call-time authorization path in prompt.rs now uses callable_capability_ids
directly, but the visible-capability filters only narrow descriptors, so a tool
can be hidden from the model-visible surface yet still remain callable. Update
CapabilitySurfaceProfileFilter::visible_capabilities and
CapabilitySurfaceDenyFilter::visible_capabilities to intersect or otherwise
constrain callable_capability_ids in the same way they constrain descriptors,
and add a regression test covering a surface with non-empty
callable_capability_ids to verify hidden tools are excluded from authorization.
Source: Path instructions
|
/benchmark pinchbench --framework ironclaw-reborn |
|
🧪 Started |
🧪 nearai-bench
|
Summary
Cuts the per-call prompt that drives NEAR AI latency/timeouts. From production logs, every model call shipped all ~91 tool schemas + system prompt + history ≈ 25.8k tokens, re-sent ~4× per turn — pushing NEAR AI past its 120s request timeout (retries exhausted → no answer). This PR adds progressive tool disclosure (Hermes-style) behind a flag, plus shadow measurement, and documents that compression/caching already exist in the codebase.
Everything new is behind
REBORN_TOOL_DISCLOSURE(off | bridged), defaultoff⇒ byte-identical request path when off.What's here (by commit)
docs/plans/2026-06-23-hermes-style-context-management.md) — fusion-council design (Opus 4.8 + GPT-5.5), signed off.prompt_tokens_details.cached_tokens(+ top-level fallback) intocache_read_input_tokens; per-calldebug!ontarget: ironclaw::reborn::context_shadowof prompt/cached tokens + tool-schema token estimate.CapabilityCatalog,select_active_set(bridges-always + append-only earned promotion, canonical never-resorted order), 3 bridge tool defs, deterministictool_search_rank. Benchmark: 21,240 → 1,427 tool-schema tokens = 93.3% reduction on a representative 91-tool fixture.LoopCapabilityPortdecorator (ToolDisclosureCapabilityDecorator).visible_capabilities()filters to the active set; bridgetool_callroutes its synthetic target throughinner.validate/register/invoke— same audit/safety/approval/validation as a direct call. Guards reject recursion / unknown / not-disclosed-this-turn as model-recoverableInvalidInputfailures (never run death). Append-only per-thread promotion.tool_call→ recoverable failure with the inner port never dispatched; flag-off defaults verified.Behavior of disclosure (flag = bridged)
Core set (always advertised):
tool_search,tool_describe,tool_call,result_read,memory_search/read/write,skill_search,file_read,list_dir, plus profile pins. Deliberately read-leaning; side-effecting tools (shell/write_file/patch/http) are discoverable + promoted on use. Membership is telemetry-tunable.Plan premises corrected during implementation
Grounding in the real code surfaced that two design phases already exist and were not reimplemented:
ActiveTaskPreservingCompactionStrategy+BudgetStrategy+ summary-onlyLoopContextMessagerefs. A from-scratch compressor was prototyped, found to be the wrong layer (LoopModelMessageis{role, content_ref}— content already externalized), and discarded.InstructionBundleFingerprintassembly, now reinforced by the deterministic memoized tool list;cached_tokensmeasurement landed in Phase 0.Validation
cargo fmt;cargo clippy -p ironclaw_reborn --features root-llm-provider --tests(zero warnings); default +root-llm-providerbuilds.ironclaw_rebornlib suite: 270 passed, incl. the disclosure selector tests, the through-the-caller bridge dispatch test, and the negative-path/flag-off tests.cargo test -p ironclaw_reborn --lib tool_disclosure -- --nocapture.Remaining (operational, not in this PR)
Enabling disclosure in production is a flagged canary: set
REBORN_TOOL_DISCLOSURE=bridgedon one deployment, validate against the Phase-0 shadow logs (real token delta + whether NEAR populatescached_tokens), gated on PinchBench ≥ 0.768. Core-tool membership to be tuned from the resulting telemetry.Live results & hardening (update — 2026-06-25)
Since the original write-up the flag has been exercised live (profile
local-dev, against NEAR AI) on real tasks, and a chain of run-borks found only under a live model has been fixed at source. The mechanism now works end-to-end on a general model.Tool-schema token reduction — measured
The advertised floor (core + 3 bridges ≈ 3.1k tokens) is roughly constant, so the % reduction grows with catalog size:
index-names-only-v16)Live numbers come straight from the Phase-0 shadow log (
target: ironclaw::reborn::context_shadow,reduction_pct=…per turn).NEAR AI prefix caching — confirmed
The original open empirical-gate question — does NEAR actually cache the stable prefix? — is answered yes. Within a single turn, live
cached_tokensclimbed 0 → 256 → 6,912 ≈ 97% cache hit on the stable prefix. Stable-prefix discipline (deferred tools never re-sorted, bridges-always, append-only promotion) pays off on NEAR's vLLM/SGLang backend.End-to-end behavior
Two failures that only appeared under a live model, both now fixed:
Discovery gap — a general model on a non-coding task ("what's on my calendar") burned ~26 turns of
glob/read_file/grepand never calledtool_search: it can't reach for tools it can't see. Fixed by moving discovery into the harness — thetool_searchdescription now indexes every deferred tool by exact name (names-only, byte-budgeted under the 4096-byte capability-description cap), and the callable set unions the synthesized bridges back in so the visible-surface filter can't strip them. gpt-5.5 now discovers, describes, and promotes the Google tools (calendar.list_events,docs.get_document,drive.list_files).Blind-call run-bork → describe-first — on "prepare a meeting brief", gpt-5.5 called
docs.get_document/drive.get_filewithout the required id (deferred schemas meant it called by name alone), looped ~30 identicalinvalid_inputfailures, and the run diedFailedat the iteration cap. This failure mode did not exist pre-disclosure, when the full schema was always in the tool list. Fix (tool_disclosure_port.rs): when the model uses a deferred tool whose schema it hasn't loaded this turn, probe-validate its arguments — valid → dispatch directly (zero round-trip), invalid → return the parameter schema instead of dispatching blind, one-shot per undisclosed tool so a still-invalid retry dispatches and fails normally (the no-progress detector still sees repeats). After this, the meeting-brief task completes successfully.Still pending before this is merge-ready
disclosure_buildstartup markers, and the temporary direct-deferred diagnostics.ironclaw_rebornlib suite 285 passed, clippy clean.🤖 Generated with Claude Code