Add pending message queue and background tool execution #4980
Conversation
Introduces two framework-level features, both powered by auto-injected capabilities:

**Pending Message Queue:**
- `PendingMessage` type with `'steering'` and `'follow_up'` priorities
- `enqueue_message()` on `RunContext` (from tools/hooks) and `AgentRun` (external code)
- `PendingMessageDrainCapability` drains steering messages before model requests and follow-up messages when the agent would otherwise end

**Background Tools:**
- `ToolDefinition.background` field, `Tool(background=True)`, `@agent.tool(background=True)`
- `BackgroundToolCapability` spawns asyncio tasks, returns an immediate ack, delivers results as follow-up messages, waits before `End`, and cleans up on run end

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
🚩 async for iteration path does NOT fire after_node_run hooks, breaking pending message drain
The AgentRun.__anext__ method at run.py:193-210 directly calls anext(self._graph_run) without firing any capability hooks (before_node_run, after_node_run, etc.). The existing warning in __aiter__ (run.py:183-189) only checks for has_wrap_node_run and warns about wrap_node_run hooks. However, the new PendingMessageDrainCapability and BackgroundToolCapability rely on after_node_run hooks, which also don't fire through async for. Users who iterate via async for node in agent_run: instead of agent_run.next(node) will silently miss pending message draining and background task waiting. The warning should be updated to mention after_node_run hooks as well, or these capabilities should be checked in __aiter__.
(Refers to lines 183-189)
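The hook gap can be illustrated with a toy model. Everything here (`ToyRun`, the hook bookkeeping) is illustrative stand-in code, not pydantic_ai's real classes; it only shows why `async for` silently bypasses hooks that a `next()`-style method would fire:

```python
import asyncio

class ToyRun:
    """Toy model of an agent run with two iteration paths."""

    def __init__(self, nodes: list[str]) -> None:
        self._it = iter(nodes)
        self.hook_calls: list[str] = []

    def __aiter__(self) -> 'ToyRun':
        return self

    async def __anext__(self) -> str:
        # This is the path `async for` uses: no hooks fire here.
        try:
            return next(self._it)
        except StopIteration:
            raise StopAsyncIteration

    async def next(self) -> str:
        # The explicit API fires the after-hook on every node.
        node = await self.__anext__()
        self.hook_calls.append(f'after_node_run({node})')
        return node

async def demo() -> tuple[list[str], list[str]]:
    run_a = ToyRun(['A', 'B'])
    async for _ in run_a:  # hooks silently skipped
        pass
    run_b = ToyRun(['A', 'B'])
    await run_b.next()
    await run_b.next()
    return run_a.hook_calls, run_b.hook_calls
```

Running `asyncio.run(demo())` shows the `async for` run records no hook calls while the explicit path records one per node, which is exactly the divergence the warning should cover.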
```python
# Wait for at least one task to complete
self._completion_event.clear()
await self._completion_event.wait()
```
🔴 Race condition: _completion_event.clear() can drop already-set signals from completed tasks
In BackgroundToolCapability.after_node_run, between the check not self._tasks and self._completion_event.clear(), a background task could have completed during an earlier await in the agent loop (e.g. during the model request that produced the End result). If a task completed and called self._completion_event.set() but there are still OTHER pending tasks in self._tasks, the code enters the wait path, calls clear() (discarding the signal from the already-completed task), and then wait()s. If the remaining tasks also happened to complete (and remove themselves from _tasks) before the event loop yields back into _run(), the event will never be re-set, and wait() will block indefinitely.
While asyncio's cooperative scheduling reduces the window, the await self._completion_event.wait() itself is a yield point — if ALL remaining tasks complete and set the event during the same tick as clear(), the clear-then-wait pattern loses the signal. A safer approach would use asyncio.wait() on the actual task objects rather than a separate Event.
Prompt for agents
In BackgroundToolCapability.after_node_run (lines 95-104), the clear-then-wait pattern on a shared asyncio.Event can lose signals when tasks complete between clear() and wait(). Consider replacing the Event-based approach with asyncio.wait() on the actual task objects, e.g.:
```python
done, _ = await asyncio.wait(list(self._tasks.values()), return_when=asyncio.FIRST_COMPLETED)
```
This directly waits for at least one task to finish without risk of losing signals. The _completion_event field could then be removed entirely.
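A minimal, self-contained sketch of that `asyncio.wait` pattern, using stand-in tasks rather than the real background tool tasks (`wait_for_first` and `demo` are illustrative names):

```python
import asyncio

async def wait_for_first(tasks: dict[str, 'asyncio.Task[str]']) -> 'set[asyncio.Task[str]]':
    """Wait until at least one tracked task finishes.

    Unlike clear()-then-wait() on a shared Event, waiting on the task
    objects cannot lose a signal: if any task is already finished,
    asyncio.wait returns immediately.
    """
    done, _pending = await asyncio.wait(
        list(tasks.values()), return_when=asyncio.FIRST_COMPLETED
    )
    return done

async def demo() -> int:
    async def work(delay: float) -> str:
        await asyncio.sleep(delay)
        return 'ok'

    tasks = {
        'fast': asyncio.create_task(work(0.01)),
        'slow': asyncio.create_task(work(0.2)),
    }
    done = await wait_for_first(tasks)
    for t in tasks.values():
        t.cancel()
    return len(done)
```

`asyncio.run(demo())` observes exactly one finished task (the fast one), and the pattern stays correct even if a task finishes before the wait begins, because completion state lives on the task itself rather than in a resettable event.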
```python
drained = _drain_by_priority(ctx.pending_messages, 'steering')
if drained:
    parts = [part for msg in drained for part in msg.parts]
    request_context.messages.append(ModelRequest(parts=parts))
return request_context
```
🔴 Steering messages are appended to request_context.messages (the model's view) but not to state.message_history (the persistent history)
In PendingMessageDrainCapability.before_model_request (_pending_messages.py:58-61), steering messages are drained and appended to request_context.messages. However, request_context.messages is a copy of state.message_history (created at _agent_graph.py:808 via ctx.state.message_history[:]). After the model request completes, _agent_graph.py:839 writes messages back: ctx.state.message_history[:] = messages. But messages is set from request_context.messages at line 819 (messages = request_context.messages). So the steering messages are written back.
However, the new ModelRequest containing the steering parts is appended after the existing messages. The model sees it as a separate request interleaved in the conversation. But crucially, it's appended before the actual request that triggered this model call. This means the steering message appears in the conversation before the tool return / user prompt that preceded the model call, which may confuse the model about conversation ordering. The steering ModelRequest has no run_id or timestamp set, unlike the other requests (see _agent_graph.py:830-834), making it inconsistent with the rest of the message history.
Prompt for agents
The PendingMessageDrainCapability.before_model_request appends a new ModelRequest to request_context.messages without setting run_id or timestamp. All other ModelRequests in the message history have these fields set (see _agent_graph.py lines 830-834 which ensure the last request has a timestamp and run_id). The injected steering ModelRequest will be missing these fields, making it inconsistent with the rest of the message history and potentially breaking serialization or tooling that expects these fields.
At minimum, the ModelRequest created in _pending_messages.py:61 should include timestamp and run_id fields. You could pass them via the RunContext (which has run_id) and use now_utc() for the timestamp. For example:
```python
from pydantic_ai._utils import now_utc

request_context.messages.append(ModelRequest(parts=parts, run_id=ctx.run_id, timestamp=now_utc()))
```
Also consider whether appending the steering message as a separate ModelRequest at the end of messages is the right approach, vs merging the parts into the existing last ModelRequest.
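The drain-and-stamp behavior can be sketched with stdlib stand-ins. These `PendingMessage` and `ModelRequest` dataclasses and the two helper functions are illustrative models of the PR's types, not the real pydantic_ai classes; the point is that the injected request carries `run_id` and `timestamp` like every other request:

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class PendingMessage:
    parts: 'list[str]'
    priority: str  # 'steering' or 'follow_up'

@dataclass
class ModelRequest:
    parts: 'list[str]'
    run_id: 'str | None' = None
    timestamp: 'datetime | None' = None

def drain_by_priority(queue: 'deque[PendingMessage]', priority: str) -> 'list[PendingMessage]':
    """Remove and return all queued messages with the given priority."""
    kept: 'deque[PendingMessage]' = deque()
    drained: 'list[PendingMessage]' = []
    while queue:
        msg = queue.popleft()
        (drained if msg.priority == priority else kept).append(msg)
    queue.extend(kept)
    return drained

def build_steering_request(queue: 'deque[PendingMessage]', run_id: str) -> 'ModelRequest | None':
    drained = drain_by_priority(queue, 'steering')
    if not drained:
        return None
    parts = [part for msg in drained for part in msg.parts]
    # Stamp run_id and timestamp so the injected request is consistent
    # with the rest of the message history.
    return ModelRequest(parts=parts, run_id=run_id, timestamp=datetime.now(timezone.utc))
```

Draining a queue holding two steering and one follow-up message yields a single stamped request with the steering parts, leaving the follow-up message queued for later.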
```python
if not isinstance(result, End) and self._graph_run.output is not None:
    self._graph_run._next = [self._node_to_task(result)]  # pyright: ignore[reportPrivateUsage,reportAttributeAccessIssue]
```
🔴 Private _graph_run._next hack assigns GraphTaskRequest instead of GraphTask, bypassing required conversion
At run.py:270, the code directly assigns [self._node_to_task(result)] to self._graph_run._next. The _node_to_task method returns GraphTaskRequest objects, but GraphRun._next expects EndMarker | Sequence[GraphTask] | None. The normal path through GraphRun.next() converts GraphTaskRequest to GraphTask via GraphTask.from_request() — this direct assignment bypasses that conversion.
While the value is typically overwritten on the next GraphRun.next() call, any intermediate access to self._graph_run.next_task would return objects of the wrong type. More critically, if the underlying graph async generator has already returned (exhausted after yielding EndMarker), subsequent asend() calls will raise StopAsyncIteration, which _advance_graph silently catches — causing it to return the input node unchanged, potentially creating an infinite loop.
Prompt for agents
The hack at run.py:269-270 directly mutates the private _next field of GraphRun, bypassing the GraphTaskRequest → GraphTask conversion that GraphRun.next() normally performs. This also assumes the graph's internal async generator is still alive after yielding EndMarker, but if the generator has returned (exhausted), subsequent asend() calls raise StopAsyncIteration which _advance_graph silently catches, leading to the node being returned unchanged (infinite loop).
A more robust approach would be to extend the GraphRun API to support 'un-ending' a run (e.g. a public method like reset_end(tasks) that properly converts GraphTaskRequests and restarts the iterator), rather than directly manipulating private internals. Alternatively, consider whether after_node_run should be able to convert End results to continuation nodes at all — perhaps the pending message drain should happen at a different point in the lifecycle.
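One possible shape for such a public "un-end" API, sketched on a toy run class; `ToyGraphRun`, `GraphTask`, and `reset_end` are hypothetical illustrations of the idea, not pydantic_graph's actual internals:

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class GraphTask:
    node: str

@dataclass
class ToyGraphRun:
    """Toy run whose end state is cleared via a public method that
    performs the request-to-task conversion, instead of callers
    assigning to private state."""

    _next: list[GraphTask] | None = None
    _ended: bool = False

    @property
    def output(self) -> str | None:
        return 'done' if self._ended else None

    def mark_end(self) -> None:
        self._ended = True
        self._next = None

    def reset_end(self, nodes: list[str]) -> None:
        # Converts raw node requests into properly-typed tasks and
        # clears the end marker in one sanctioned operation.
        if not self._ended:
            raise RuntimeError('run has not ended')
        self._next = [GraphTask(node=n) for n in nodes]
        self._ended = False
```

With this shape, the caller in `run.py` would call `reset_end(...)` instead of writing to `_next`, and the type mismatch and generator-exhaustion concerns would be handled inside the graph library where its invariants are known.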
```python
async def _run() -> None:
    try:
        result = await handler(args)
        ctx.enqueue_message(
            SystemPromptPart(f"Background tool '{tool_name}' (task {task_id}) completed.\nResult: {result}"),
            priority='follow_up',
        )
    except Exception as e:
        ctx.enqueue_message(
            SystemPromptPart(f"Background tool '{tool_name}' (task {task_id}) failed: {e}"),
            priority='follow_up',
        )
    finally:
        self._tasks.pop(task_id, None)
        self._completion_event.set()
```
🟡 Background task captures ctx (RunContext) closure that may be stale or reused across steps
In BackgroundToolCapability.wrap_tool_execute (_background_tools.py:62-76), the _run() async closure captures ctx (a RunContext) by reference. This ctx object is used later (potentially much later, after the task completes) to call ctx.enqueue_message(...). However, RunContext instances are created per-step via build_run_context (_agent_graph.py:1223-1247), and pending_messages is a shared deque from state.pending_messages. The ctx captured here is the one from the tool execution step, which should still reference the correct shared pending_messages deque. However, ctx also contains messages, run_step, usage, and other step-specific state that will be stale when the background task completes. If the handler(args) call within the background task accesses ctx (which it does via the tool function's RunContext), it will see outdated messages and run_step values. This is a correctness issue for tools that depend on ctx.messages being current.
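The staleness hazard, and one mitigation, can be shown with a stdlib model: capture only the long-lived queue plus snapshots of step-specific values at spawn time, rather than the whole mutable context. `StepContext`, `spawn_background`, and `demo` are illustrative stand-ins, not the PR's actual code:

```python
import asyncio
from collections import deque
from dataclasses import dataclass

@dataclass
class StepContext:
    """Toy per-step context: run_step mutates as the agent loop advances,
    while pending is a queue shared across steps."""
    run_step: int
    pending: 'deque[str]'

def spawn_background(ctx: StepContext, tool_name: str) -> 'asyncio.Task[None]':
    pending = ctx.pending          # long-lived object: safe to capture
    step_at_spawn = ctx.run_step   # snapshot, so later mutation is harmless

    async def _run() -> None:
        await asyncio.sleep(0)     # simulate the tool doing real work
        pending.append(f'{tool_name} spawned at step {step_at_spawn}')

    return asyncio.create_task(_run())

async def demo() -> 'list[str]':
    pending: 'deque[str]' = deque()
    ctx = StepContext(run_step=1, pending=pending)
    task = spawn_background(ctx, 'fetch')
    ctx.run_step = 5               # the agent loop moves on; ctx mutates
    await task
    return list(pending)
```

The completed task reports the step it was spawned at, not the mutated value, because it never re-reads the shared context. The same cannot be guaranteed for a closure that captures `ctx` wholesale.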
```python
auto_caps: list[AbstractCapability[AgentDepsT]] = [
    PendingMessageDrainCapability(),
    BackgroundToolCapability(),
]
self._root_capability = CombinedCapability(auto_caps + capabilities)
```
🚩 Auto-injected capabilities add overhead to every agent, even without background tools
Both PendingMessageDrainCapability and BackgroundToolCapability are unconditionally prepended to every agent's capability list (agent/__init__.py:413-417). This means every agent run — even those that never use background tools or pending messages — pays the cost of:
- `for_run()` creating a fresh `BackgroundToolCapability` instance every run (_background_tools.py:42-44)
- `before_model_request` iterating the `pending_messages` deque (even when empty) on every model request
- `after_node_run` checking `isinstance(result, End)` and `self._tasks` on every node
- `wrap_run` wrapping the entire run with a try/finally
- `wrap_tool_execute` checking `tool_def.background` for every single tool call
For the vast majority of agents that don't use these features, this is unnecessary overhead. A more targeted approach would be to only inject these capabilities when background=True tools are actually registered, or to make the checks essentially free (e.g., skip for_run allocation when no background tools exist).
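A sketch of the targeted-injection idea; `ToolDef`, `build_auto_capabilities`, and the empty capability classes are stand-ins for illustration only (the drain capability is kept unconditionally here, since `enqueue_message` can be called from external code even without background tools):

```python
from dataclasses import dataclass

@dataclass
class ToolDef:
    name: str
    background: bool = False

class PendingMessageDrainCapability: ...
class BackgroundToolCapability: ...

def build_auto_capabilities(tools: 'list[ToolDef]') -> 'list[object]':
    caps: 'list[object]' = [PendingMessageDrainCapability()]
    # Skip the background-tool machinery entirely when no registered
    # tool can ever use it.
    if any(t.background for t in tools):
        caps.append(BackgroundToolCapability())
    return caps
```

An agent with only ordinary tools then carries a single auto capability, while one registering a `background=True` tool gets both.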
```python
auto_caps: list[AbstractCapability[AgentDepsT]] = [
    PendingMessageDrainCapability(),
    BackgroundToolCapability(),
]
self._root_capability = CombinedCapability(auto_caps + capabilities)
```
🚩 Capability hook ordering relies on position in list, which is fragile
The after_node_run hooks in CombinedCapability (combined.py:184) execute in reversed order. The auto-injected capabilities at agent/__init__.py:413-417 are prepended as [PendingMessageDrainCapability, BackgroundToolCapability, ...user_caps]. In reversed order, BackgroundToolCapability.after_node_run runs before PendingMessageDrainCapability.after_node_run, which is critical: Background waits for task completion and enqueues follow-ups, then PendingMessageDrain converts End → ModelRequestNode. If a user adds a capability via from_spec or override that also manipulates after_node_run, the ordering could break. The coupling between these two auto-injected capabilities via positional ordering in the list is an implicit contract that isn't documented or enforced.
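The implicit contract can be made concrete with a toy model of reversed after-hook dispatch (`Cap` and `run_after_hooks` are illustrative, not `CombinedCapability`'s real code):

```python
class Cap:
    """Toy capability that records when its after-hook runs."""

    def __init__(self, name: str, log: 'list[str]') -> None:
        self.name = name
        self.log = log

    def after_node_run(self) -> None:
        self.log.append(self.name)

def run_after_hooks(caps: 'list[Cap]') -> None:
    # Mirrors CombinedCapability: after-hooks fire in reverse list order.
    for cap in reversed(caps):
        cap.after_node_run()

log: 'list[str]' = []
caps = [Cap('drain', log), Cap('background', log), Cap('user', log)]
run_after_hooks(caps)
# 'background' fires before 'drain', as the design requires: it must
# finish waiting for tasks and enqueue follow-ups before the drain
# converts End into a continuation node. Nothing but list position
# enforces this.
```

A test asserting this order (or an explicit ordering mechanism such as hook priorities) would turn the positional coupling into an enforced contract.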
```python
follow_ups = _drain_by_priority(ctx.pending_messages, 'follow_up')
if not follow_ups:
    return result

parts = [part for msg in follow_ups for part in msg.parts]
request = ModelRequest(parts=parts)
return ModelRequestNode(request=request)
```
🚩 Follow-up drain creates ModelRequestNode without run_step increment or usage limit check
When PendingMessageDrainCapability.after_node_run redirects End to a ModelRequestNode (_pending_messages.py:83-84), this creates a new model request that bypasses the normal UserPromptNode flow. The new ModelRequestNode will trigger a model call, consuming tokens and incrementing usage. However, there's no explicit usage_limits check before this redirect — the check happens inside ModelRequestNode.run(). If the agent is near its usage limit, this redirect could cause UsageLimitExceeded to be raised during the follow-up model call, which would lose the follow-up context. This is arguably correct behavior (limits should be respected), but users may find it surprising that background tool results are silently lost when usage limits are reached.
```python
async def _run() -> None:
    try:
        result = await handler(args)
        ctx.enqueue_message(
            SystemPromptPart(f"Background tool '{tool_name}' (task {task_id}) completed.\nResult: {result}"),
            priority='follow_up',
        )
```
🚩 Background tool results delivered as SystemPromptPart may confuse models
In _background_tools.py:65-67, completed background tool results are delivered as SystemPromptPart messages. This is semantically odd — the result of a tool call is packaged as a system prompt rather than a ToolReturnPart or UserPromptPart. Some models may treat system prompts differently (e.g., with higher weight or special formatting). Since the original tool call already received an acknowledgment ToolReturnPart, there's no corresponding tool_call_id for a follow-up ToolReturnPart. Using SystemPromptPart is a pragmatic choice given the constraints, but it could lead to unexpected model behavior, especially with models that enforce strict system/user/assistant role alternation.
```python
PendingMessage,
PendingMessagePriority,
```
🚩 No documentation added for the pending messages and background tools features
The AGENTS.md rules state all changes need to 'update/add all relevant documentation, following the existing voice and patterns'. This PR adds significant new public APIs (PendingMessage, PendingMessagePriority, RunContext.enqueue_message, AgentRun.enqueue_message, background=True tool parameter) but no documentation files were changed. The features are exported in __init__.py and have docstrings, but there are no docs/ changes, no guide pages, and no updates to existing tool documentation to describe background tools or the pending message queue. This is flagged as an analysis rather than a bug since the PR may be WIP or documentation may be planned separately.
```python
# If after_node_run redirected End to a new node (e.g. pending message drain),
# clear the graph run's end state so the run loop continues.
if not isinstance(result, End) and self._graph_run.output is not None:
    self._graph_run._next = [self._node_to_task(result)]  # pyright: ignore[reportPrivateUsage,reportAttributeAccessIssue]
```
🚩 Direct mutation of private GraphRun._next to override End state is fragile
At pydantic_ai_slim/pydantic_ai/run.py:269-270, the code directly mutates self._graph_run._next to clear the EndMarker when after_node_run transforms an End into a new node. This works because (1) GraphRun.output checks isinstance(self._next, EndMarker) and returns None for non-EndMarker values, and (2) the next call to GraphRun.next() overwrites _next with properly-typed GraphTask objects. However, the value written ([GraphTaskRequest]) doesn't match _next's expected type (Sequence[GraphTask]). This is a private API that could break silently if pydantic_graph changes its internals. The pyright: ignore comment acknowledges this. Consider whether GraphRun should expose a public method for resetting end state.
Was this helpful? React with 👍 or 👎 to provide feedback.
Supersedes #4979 (which had merge conflicts due to branch history).
Summary

- Pending message queue (`ctx.enqueue_message()` / `agent_run.enqueue_message()`) with `'steering'` and `'follow_up'` priorities, powered by an auto-injected `PendingMessageDrainCapability`
- Background tools (`@agent.tool(background=True)` / `Tool(background=True)` / `ToolDefinition.background`), powered by an auto-injected `BackgroundToolCapability` that spawns asyncio tasks, returns immediate acks, and delivers results as follow-up messages

Test plan
🤖 Generated with Claude Code