feat(harness): single-agent cached tool-calling loop (replaces agentic expert panel, flag-gated)#877
Conversation
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… dict per message Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…oints Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…-content tails Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…xtraction langchain_anthropic maps ephemeral cache writes to ephemeral_5m_input_tokens / ephemeral_1h_input_tokens and sets cache_creation=0. _extract_usage now sums all three keys so dashboards show accurate cache-write counts and pricing.py applies the correct 1.25× rate instead of base input rate. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThis PR adds a single-agent native tool-calling execution path with cached prompt construction, deterministic tool definitions, supervisor/server wiring, telemetry updates for Anthropic cache tokens, and unit plus live integration tests. ChangesAgent Loop Feature
Estimated code review effort: 4 (Complex) | ~60 minutes Sequence Diagram(s)sequenceDiagram
participant Server
participant HarnessSupervisor
participant AgentLoopRunner
participant Model
participant ToolRegistry
Server->>HarnessSupervisor: inject agent_loop at startup
HarnessSupervisor->>AgentLoopRunner: run(user_message, ctx, prior_messages)
AgentLoopRunner->>Model: ainvoke(cached system + history)
Model-->>AgentLoopRunner: AIMessage / tool_calls
alt tool calls returned
loop for each tool call
AgentLoopRunner->>ToolRegistry: invoke_step(tool_call)
ToolRegistry-->>AgentLoopRunner: evidence or failure
AgentLoopRunner->>Model: ainvoke(next turn)
Model-->>AgentLoopRunner: final AIMessage
end
else direct answer
Model-->>AgentLoopRunner: final AIMessage
end
AgentLoopRunner-->>HarnessSupervisor: answer, evidence, usage_log
Possibly related PRs
Suggested reviewers: 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
miot-harness/tests/integration/test_agent_loop_cache_live.py (1)
109-121: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueUse
ToolRegistry()here instead of__new__. The registry already has a public empty constructor, and__init__only initializes_tools; keeping the helper on the normal construction path avoids reaching into private state.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@miot-harness/tests/integration/test_agent_loop_cache_live.py` around lines 109 - 121, Replace the manual ToolRegistry.__new__ construction in _big_registry() with the public ToolRegistry() constructor. The helper only needs an empty registry, and using the normal initialization path preserves __init__ behavior instead of setting the private _tools field directly.miot-harness/src/miot_harness/observability/callbacks.py (1)
85-93: 🗄️ Data Integrity & Integration | 🔵 Trivial | 💤 Low valueKeep 1h ephemeral cache writes distinct
TokenUsage.cache_creation_input_tokenscollapsesephemeral_5m_input_tokensandephemeral_1h_input_tokens, socompute_costwill price both at the same cache-creation rate. If 1h ephemeral caching is ever enabled, split this bucket or carry the TTL through to pricing so those writes don’t get underbilled.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@miot-harness/src/miot_harness/observability/callbacks.py` around lines 85 - 93, The TokenUsage cache_creation_input_tokens aggregation in callbacks.py is combining 5m and 1h ephemeral writes into one bucket, which hides the TTL distinction. Update the telemetry path around the cache_creation calculation so ephemeral_5m_input_tokens and ephemeral_1h_input_tokens remain separate (or preserve TTL metadata) and adjust compute_cost to price them with the correct cache-creation rate based on the specific TTL.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@miot-harness/src/miot_harness/runtime/agent_loop.py`:
- Around line 102-135: The live cache-marker coverage in AgentLoopRunner
currently misses the tool-result path, so add a live test case that exercises a
real tool call through AgentLoopRunner.run() and reaches the
ToolMessage/content=str branch in _mark_message and _with_tail_marker. Extend
the existing integration test to include a tool-using turn and assert the
request still succeeds with cache markers applied to the final tool-result
message, so a server-side invalid_cache regression on tool-result turns is
caught.
---
Nitpick comments:
In `@miot-harness/src/miot_harness/observability/callbacks.py`:
- Around line 85-93: The TokenUsage cache_creation_input_tokens aggregation in
callbacks.py is combining 5m and 1h ephemeral writes into one bucket, which
hides the TTL distinction. Update the telemetry path around the cache_creation
calculation so ephemeral_5m_input_tokens and ephemeral_1h_input_tokens remain
separate (or preserve TTL metadata) and adjust compute_cost to price them with
the correct cache-creation rate based on the specific TTL.
In `@miot-harness/tests/integration/test_agent_loop_cache_live.py`:
- Around line 109-121: Replace the manual ToolRegistry.__new__ construction in
_big_registry() with the public ToolRegistry() constructor. The helper only
needs an empty registry, and using the normal initialization path preserves
__init__ behavior instead of setting the private _tools field directly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b1f5a5a6-9eec-4d8d-a60c-b7e3f0859fff
📒 Files selected for processing (19)
miot-harness/src/miot_harness/agents/chat_models.pymiot-harness/src/miot_harness/agents/native_tools.pymiot-harness/src/miot_harness/api/server.pymiot-harness/src/miot_harness/config.pymiot-harness/src/miot_harness/observability/callbacks.pymiot-harness/src/miot_harness/runtime/agent_loop.pymiot-harness/src/miot_harness/runtime/agent_prompt.pymiot-harness/src/miot_harness/runtime/supervisor.pymiot-harness/tests/agents/test_chat_models_loop.pymiot-harness/tests/integration/__init__.pymiot-harness/tests/integration/conftest.pymiot-harness/tests/integration/test_agent_loop_cache_live.pymiot-harness/tests/observability/test_callbacks.pymiot-harness/tests/runtime/test_agent_loop.pymiot-harness/tests/runtime/test_agent_loop_payload.pymiot-harness/tests/runtime/test_agent_prompt.pymiot-harness/tests/runtime/test_supervisor_agent_loop.pymiot-harness/tests/test_config.pymiot-harness/tests/test_native_tools.py
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Summary
Replaces the agentic expert-panel pipeline (planner → verifier → synthesizer → critic seats, 5-10+ sequential full-context LLM calls per question) with one cached native tool-calling agent loop, behind a feature flag (
MIOT_HARNESS_AGENTS_AGENT_LOOP_ENABLED, default off — flag-off behavior is verified identical to trunk).Root causes addressed (Console showed zero prompt-cache usage):
cache_controlanywhere — every LLM call paid full input price and full prefill latency.Architecture
runtime/agent_loop.py—AgentLoopRunner: one model, tools bound once at boot, append-only conversation loopingtool_use → invoke_step → tool_resultuntil the model answers. Replaces the planner/verifier/synthesizer/critic seats; reusesinvoke_step, the rule-basedfreshness_judge, and provenance logging unchanged.ephemeralbreakpoint) + one request-time tail marker applied on a copy — exactly 2 breakpoints per request, wire-verified via_get_request_payloadtests.<system-reminder>blocks in the user turn.input_token_details["ephemeral_5m_input_tokens"]and zeroescache_creation), so the Console/Langfuse cache panels report real numbers.Verification
tests/integration/test_agent_loop_cache_live.py, skip-gated behindMIOT_HARNESS_RUN_LIVE_TESTS=1).Rollout plan (not in this PR)
Golden-eval parity run (legacy agentic graph vs loop) → review latency/cost/cache metrics → flip the flag default and delete the legacy agentic seats in a follow-up.
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Closes #878