
Add pending message queue and background tool execution#4980

Open
DouweM wants to merge 3 commits into main from background-tools

Conversation

Collaborator

@DouweM DouweM commented Apr 4, 2026

Supersedes #4979 (which had merge conflicts due to branch history).

Summary

  • Adds a pending message queue (ctx.enqueue_message() / agent_run.enqueue_message()) with 'steering' and 'follow_up' priorities, powered by an auto-injected PendingMessageDrainCapability
  • Adds background tool execution (@agent.tool(background=True) / Tool(background=True) / ToolDefinition.background), powered by an auto-injected BackgroundToolCapability that spawns asyncio tasks, returns immediate acks, and delivers results as follow-up messages
  • Both features are framework-level with clean user-facing APIs, internally implemented via auto-injected capabilities that dogfood the capability abstraction

Test plan

  • 8 new tests covering steering drain, follow-up drain preventing End, external enqueue via AgentRun, background tool execution, error handling, ack messages, non-background tools unaffected, RunContext queue access
  • All existing capability, agent, tool, toolset, and logfire tests pass
  • Full pyright: 0 errors
  • ruff lint + format: clean
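The queue semantics described above can be sketched standalone. This is a minimal illustration only: `PendingMessage` fields and the drain helper here are simplified stand-ins, not the PR's actual implementation.

```python
from collections import deque
from dataclasses import dataclass
from typing import Literal

Priority = Literal['steering', 'follow_up']


@dataclass
class PendingMessage:
    # Stand-in for the PR's PendingMessage: parts simplified to strings.
    parts: list[str]
    priority: Priority


def drain_by_priority(queue: deque[PendingMessage], priority: Priority) -> list[PendingMessage]:
    """Remove and return all messages of one priority, preserving FIFO order."""
    drained = [m for m in queue if m.priority == priority]
    remaining = [m for m in queue if m.priority != priority]
    queue.clear()
    queue.extend(remaining)
    return drained


queue: deque[PendingMessage] = deque()
queue.append(PendingMessage(['steer left'], 'steering'))
queue.append(PendingMessage(['background result'], 'follow_up'))
queue.append(PendingMessage(['steer right'], 'steering'))

# Steering messages drain before a model request; follow-ups stay queued
# until the agent would otherwise end.
steering = drain_by_priority(queue, 'steering')
print([m.parts for m in steering])  # [['steer left'], ['steer right']]
print(len(queue))  # 1
```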

🤖 Generated with Claude Code

Introduces two framework-level features, both powered by auto-injected capabilities:

**Pending Message Queue:**
- `PendingMessage` type with `'steering'` and `'follow_up'` priorities
- `enqueue_message()` on `RunContext` (from tools/hooks) and `AgentRun` (external code)
- `PendingMessageDrainCapability` drains steering messages before model requests
  and follow-up messages when the agent would otherwise end

**Background Tools:**
- `ToolDefinition.background` field, `Tool(background=True)`, `@agent.tool(background=True)`
- `BackgroundToolCapability` spawns asyncio tasks, returns immediate ack,
  delivers results as follow-up messages, waits before End, cleans up on run end

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added the labels `size: L` (Large PR, 501-1500 weighted lines) and `feature` (New feature request, or PR implementing a feature) on Apr 4, 2026
Contributor

github-actions bot commented Apr 4, 2026

Docs Preview

commit: d12f52e
Preview URL: https://32f85f24-pydantic-ai-previews.pydantic.workers.dev

Contributor

@devin-ai-integration devin-ai-integration bot left a comment


Devin Review found 13 new potential issues.

View 3 additional findings in Devin Review.


🚩 async for iteration path does NOT fire after_node_run hooks, breaking pending message drain

The AgentRun.__anext__ method at run.py:193-210 directly calls anext(self._graph_run) without firing any capability hooks (before_node_run, after_node_run, etc.). The existing warning in __aiter__ (run.py:183-189) only checks for has_wrap_node_run and warns about wrap_node_run hooks. However, the new PendingMessageDrainCapability and BackgroundToolCapability rely on after_node_run hooks, which also don't fire through async for. Users who iterate via async for node in agent_run: instead of agent_run.next(node) will silently miss pending message draining and background task waiting. The warning should be updated to mention after_node_run hooks as well, or these capabilities should be checked in __aiter__.

(Refers to lines 183-189)


Comment on lines +98 to +100
# Wait for at least one task to complete
self._completion_event.clear()
await self._completion_event.wait()

🔴 Race condition: _completion_event.clear() can drop already-set signals from completed tasks

In BackgroundToolCapability.after_node_run, between the check not self._tasks and self._completion_event.clear(), a background task could have completed during an earlier await in the agent loop (e.g. during the model request that produced the End result). If a task completed and called self._completion_event.set() but there are still OTHER pending tasks in self._tasks, the code enters the wait path, calls clear() (discarding the signal from the already-completed task), and then wait()s. If the remaining tasks also happened to complete (and remove themselves from _tasks) before the event loop yields back into _run(), the event will never be re-set, and wait() will block indefinitely.

While asyncio's cooperative scheduling reduces the window, the await self._completion_event.wait() itself is a yield point — if ALL remaining tasks complete and set the event during the same tick as clear(), the clear-then-wait pattern loses the signal. A safer approach would use asyncio.wait() on the actual task objects rather than a separate Event.

Prompt for agents
In BackgroundToolCapability.after_node_run (lines 95-104), the clear-then-wait pattern on a shared asyncio.Event can lose signals when tasks complete between clear() and wait(). Consider replacing the Event-based approach with asyncio.wait() on the actual task objects, e.g.:

  done, _ = await asyncio.wait(list(self._tasks.values()), return_when=asyncio.FIRST_COMPLETED)

This directly waits for at least one task to finish without risk of losing signals. The _completion_event field could then be removed entirely.
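The `asyncio.wait`-based fix suggested above can be demonstrated in isolation. This is an illustrative standalone sketch, not the PR's code:

```python
import asyncio


async def worker(name: str, delay: float) -> str:
    await asyncio.sleep(delay)
    return name


async def main() -> list[str]:
    tasks = {
        'fast': asyncio.create_task(worker('fast', 0.01)),
        'slow': asyncio.create_task(worker('slow', 0.5)),
    }
    # Waiting on the task objects themselves cannot lose a completion:
    # a task that finished before this call is returned immediately in `done`,
    # unlike an Event whose set() can be discarded by a subsequent clear().
    done, pending = await asyncio.wait(tasks.values(), return_when=asyncio.FIRST_COMPLETED)
    results = sorted(t.result() for t in done)
    for t in pending:  # clean up the still-running task
        t.cancel()
    return results


print(asyncio.run(main()))  # ['fast']
```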

Comment on lines +58 to +62
drained = _drain_by_priority(ctx.pending_messages, 'steering')
if drained:
    parts = [part for msg in drained for part in msg.parts]
    request_context.messages.append(ModelRequest(parts=parts))
return request_context

🔴 Steering messages are appended to request_context.messages (the model's view) but not to state.message_history (the persistent history)

In PendingMessageDrainCapability.before_model_request (_pending_messages.py:58-61), steering messages are drained and appended to request_context.messages. However, request_context.messages is a copy of state.message_history (created at _agent_graph.py:808 via ctx.state.message_history[:]). After the model request completes, _agent_graph.py:839 writes messages back: ctx.state.message_history[:] = messages. But messages is set from request_context.messages at line 819 (messages = request_context.messages). So the steering messages are written back.

However, the new ModelRequest containing the steering parts is appended as a separate request after the existing messages, so the model sees it as its own request interleaved in the conversation rather than merged into the request that triggered this model call, which may confuse the model about conversation ordering. The steering ModelRequest also has no run_id or timestamp set, unlike the other requests (see _agent_graph.py:830-834), making it inconsistent with the rest of the message history.

Prompt for agents
The PendingMessageDrainCapability.before_model_request appends a new ModelRequest to request_context.messages without setting run_id or timestamp. All other ModelRequests in the message history have these fields set (see _agent_graph.py lines 830-834 which ensure the last request has a timestamp and run_id). The injected steering ModelRequest will be missing these fields, making it inconsistent with the rest of the message history and potentially breaking serialization or tooling that expects these fields.

At minimum, the ModelRequest created in _pending_messages.py:61 should include timestamp and run_id fields. You could pass them via the RunContext (which has run_id) and use now_utc() for the timestamp. For example:

  from pydantic_ai._utils import now_utc
  request_context.messages.append(ModelRequest(parts=parts, run_id=ctx.run_id, timestamp=now_utc()))

Also consider whether appending the steering message as a separate ModelRequest at the end of messages is the right approach, vs merging the parts into the existing last ModelRequest.

Comment on lines +269 to +270
if not isinstance(result, End) and self._graph_run.output is not None:
    self._graph_run._next = [self._node_to_task(result)]  # pyright: ignore[reportPrivateUsage,reportAttributeAccessIssue]

🔴 Private _graph_run._next hack assigns GraphTaskRequest instead of GraphTask, bypassing required conversion

At run.py:270, the code directly assigns [self._node_to_task(result)] to self._graph_run._next. The _node_to_task method returns GraphTaskRequest objects, but GraphRun._next expects EndMarker | Sequence[GraphTask] | None. The normal path through GraphRun.next() converts GraphTaskRequest to GraphTask via GraphTask.from_request() — this direct assignment bypasses that conversion.

While the value is typically overwritten on the next GraphRun.next() call, any intermediate access to self._graph_run.next_task would return objects of the wrong type. More critically, if the underlying graph async generator has already returned (exhausted after yielding EndMarker), subsequent asend() calls will raise StopAsyncIteration, which _advance_graph silently catches — causing it to return the input node unchanged, potentially creating an infinite loop.

Prompt for agents
The hack at run.py:269-270 directly mutates the private _next field of GraphRun, bypassing the GraphTaskRequest → GraphTask conversion that GraphRun.next() normally performs. This also assumes the graph's internal async generator is still alive after yielding EndMarker, but if the generator has returned (exhausted), subsequent asend() calls raise StopAsyncIteration which _advance_graph silently catches, leading to the node being returned unchanged (infinite loop).

A more robust approach would be to extend the GraphRun API to support 'un-ending' a run (e.g. a public method like reset_end(tasks) that properly converts GraphTaskRequests and restarts the iterator), rather than directly manipulating private internals. Alternatively, consider whether after_node_run should be able to convert End results to continuation nodes at all — perhaps the pending message drain should happen at a different point in the lifecycle.

Comment on lines +62 to +77
async def _run() -> None:
    try:
        result = await handler(args)
        ctx.enqueue_message(
            SystemPromptPart(f"Background tool '{tool_name}' (task {task_id}) completed.\nResult: {result}"),
            priority='follow_up',
        )
    except Exception as e:
        ctx.enqueue_message(
            SystemPromptPart(f"Background tool '{tool_name}' (task {task_id}) failed: {e}"),
            priority='follow_up',
        )
    finally:
        self._tasks.pop(task_id, None)
        self._completion_event.set()


🟡 Background task captures ctx (RunContext) closure that may be stale or reused across steps

In BackgroundToolCapability.wrap_tool_execute (_background_tools.py:62-76), the _run() async closure captures ctx (a RunContext) by reference. This ctx object is used later (potentially much later, after the task completes) to call ctx.enqueue_message(...). RunContext instances are created per-step via build_run_context (_agent_graph.py:1223-1247), and pending_messages is a shared deque from state.pending_messages, so the ctx captured here should still reference the correct shared pending_messages deque.

At the same time, ctx also contains messages, run_step, usage, and other step-specific state that will be stale when the background task completes. If the handler(args) call within the background task accesses ctx (which it does via the tool function's RunContext), it will see outdated messages and run_step values. This is a correctness issue for tools that depend on ctx.messages being current.
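The staleness concern can be illustrated with a minimal standalone sketch. `StepContext` and `spawn_background` are hypothetical stand-ins, not the PR's types:

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class StepContext:
    # Hypothetical stand-in for a per-step RunContext: run_step is
    # step-scoped, while `pending` is the queue shared across steps.
    run_step: int
    pending: list[str] = field(default_factory=list)


async def spawn_background(ctx: StepContext) -> asyncio.Task[None]:
    async def _run() -> None:
        await asyncio.sleep(0)  # the agent loop advances while we run
        # Delivery via the shared queue is safe (same object every step),
        # but ctx.run_step is frozen at spawn time, so a tool reading it
        # here sees a stale value.
        ctx.pending.append(f'completed; ctx still says step {ctx.run_step}')
    return asyncio.create_task(_run())


async def main() -> list[str]:
    shared: list[str] = []
    task = await spawn_background(StepContext(run_step=1, pending=shared))
    # A later step builds a fresh context over the same shared queue.
    current = StepContext(run_step=2, pending=shared)
    await task
    assert current.run_step == 2  # the run has moved on
    return shared


print(asyncio.run(main()))  # ['completed; ctx still says step 1']
```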


Comment on lines +413 to +417
auto_caps: list[AbstractCapability[AgentDepsT]] = [
    PendingMessageDrainCapability(),
    BackgroundToolCapability(),
]
self._root_capability = CombinedCapability(auto_caps + capabilities)

🚩 Auto-injected capabilities add overhead to every agent, even without background tools

Both PendingMessageDrainCapability and BackgroundToolCapability are unconditionally prepended to every agent's capability list (agent/__init__.py:413-417). This means every agent run — even those that never use background tools or pending messages — pays the cost of:

  • for_run() creating a fresh BackgroundToolCapability instance every run (_background_tools.py:42-44)
  • before_model_request iterating the pending_messages deque (even when empty) on every model request
  • after_node_run checking isinstance(result, End) and self._tasks on every node
  • wrap_run wrapping the entire run with a try/finally
  • wrap_tool_execute checking tool_def.background for every single tool call

For the vast majority of agents that don't use these features, this is unnecessary overhead. A more targeted approach would be to only inject these capabilities when background=True tools are actually registered, or to make the checks essentially free (e.g., skip for_run allocation when no background tools exist).


Comment on lines +413 to +417
auto_caps: list[AbstractCapability[AgentDepsT]] = [
    PendingMessageDrainCapability(),
    BackgroundToolCapability(),
]
self._root_capability = CombinedCapability(auto_caps + capabilities)

🚩 Capability hook ordering relies on position in list, which is fragile

The after_node_run hooks in CombinedCapability (combined.py:184) execute in reversed order. The auto-injected capabilities at agent/__init__.py:413-417 are prepended as [PendingMessageDrainCapability, BackgroundToolCapability, ...user_caps]. In reversed order, BackgroundToolCapability.after_node_run runs before PendingMessageDrainCapability.after_node_run, which is critical: Background waits for task completion and enqueues follow-ups, then PendingMessageDrain converts End → ModelRequestNode. If a user adds a capability via from_spec or override that also manipulates after_node_run, the ordering could break. The coupling between these two auto-injected capabilities via positional ordering in the list is an implicit contract that isn't documented or enforced.
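The positional-ordering contract can be reproduced with toy stand-ins (simplified illustrations, not the real CombinedCapability):

```python
class Capability:
    name = 'base'

    def after_node_run(self, result: str, log: list[str]) -> str:
        return result


class PendingDrain(Capability):
    name = 'drain'

    def after_node_run(self, result: str, log: list[str]) -> str:
        log.append(self.name)
        return result


class BackgroundWait(Capability):
    name = 'background'

    def after_node_run(self, result: str, log: list[str]) -> str:
        log.append(self.name)
        return result


class Combined(Capability):
    def __init__(self, caps: list[Capability]) -> None:
        self.caps = caps

    def after_node_run(self, result: str, log: list[str]) -> str:
        # Mirrors the reversed iteration described above: capabilities
        # listed later run their after-hook first.
        for cap in reversed(self.caps):
            result = cap.after_node_run(result, log)
        return result


order: list[str] = []
Combined([PendingDrain(), BackgroundWait()]).after_node_run('end', order)
print(order)  # ['background', 'drain']
```

Reordering the list silently flips which hook sees the node result first, which is why the review calls the coupling an implicit contract.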


Comment on lines +78 to +84
follow_ups = _drain_by_priority(ctx.pending_messages, 'follow_up')
if not follow_ups:
    return result

parts = [part for msg in follow_ups for part in msg.parts]
request = ModelRequest(parts=parts)
return ModelRequestNode(request=request)

🚩 Follow-up drain creates ModelRequestNode without run_step increment or usage limit check

When PendingMessageDrainCapability.after_node_run redirects End to a ModelRequestNode (_pending_messages.py:83-84), this creates a new model request that bypasses the normal UserPromptNode flow. The new ModelRequestNode will trigger a model call, consuming tokens and incrementing usage. However, there's no explicit usage_limits check before this redirect — the check happens inside ModelRequestNode.run(). If the agent is near its usage limit, this redirect could cause UsageLimitExceeded to be raised during the follow-up model call, which would lose the follow-up context. This is arguably correct behavior (limits should be respected), but users may find it surprising that background tool results are silently lost when usage limits are reached.


Comment on lines +62 to +68
async def _run() -> None:
    try:
        result = await handler(args)
        ctx.enqueue_message(
            SystemPromptPart(f"Background tool '{tool_name}' (task {task_id}) completed.\nResult: {result}"),
            priority='follow_up',
        )

🚩 Background tool results delivered as SystemPromptPart may confuse models

In _background_tools.py:65-67, completed background tool results are delivered as SystemPromptPart messages. This is semantically odd — the result of a tool call is packaged as a system prompt rather than a ToolReturnPart or UserPromptPart. Some models may treat system prompts differently (e.g., with higher weight or special formatting). Since the original tool call already received an acknowledgment ToolReturnPart, there's no corresponding tool_call_id for a follow-up ToolReturnPart. Using SystemPromptPart is a pragmatic choice given the constraints, but it could lead to unexpected model behavior, especially with models that enforce strict system/user/assistant role alternation.


Comment on lines +94 to +95
PendingMessage,
PendingMessagePriority,

🚩 No documentation added for the pending messages and background tools features

The AGENTS.md rules state all changes need to 'update/add all relevant documentation, following the existing voice and patterns'. This PR adds significant new public APIs (PendingMessage, PendingMessagePriority, RunContext.enqueue_message, AgentRun.enqueue_message, background=True tool parameter) but no documentation files were changed. The features are exported in __init__.py and have docstrings, but there are no docs/ changes, no guide pages, and no updates to existing tool documentation to describe background tools or the pending message queue. This is flagged as an analysis rather than a bug since the PR may be WIP or documentation may be planned separately.



@devin-ai-integration devin-ai-integration bot left a comment


Devin Review found 1 new potential issue.

View 5 additional findings in Devin Review.


Comment on lines +267 to +271
# If after_node_run redirected End to a new node (e.g. pending message drain),
# clear the graph run's end state so the run loop continues.
if not isinstance(result, End) and self._graph_run.output is not None:
    self._graph_run._next = [self._node_to_task(result)]  # pyright: ignore[reportPrivateUsage,reportAttributeAccessIssue]


🚩 Direct mutation of private GraphRun._next to override End state is fragile

At pydantic_ai_slim/pydantic_ai/run.py:269-270, the code directly mutates self._graph_run._next to clear the EndMarker when after_node_run transforms an End into a new node. This works because (1) GraphRun.output checks isinstance(self._next, EndMarker) and returns None for non-EndMarker values, and (2) the next call to GraphRun.next() overwrites _next with properly-typed GraphTask objects. However, the value written ([GraphTaskRequest]) doesn't match _next's expected type (Sequence[GraphTask]). This is a private API that could break silently if pydantic_graph changes its internals. The pyright: ignore comment acknowledges this. Consider whether GraphRun should expose a public method for resetting end state.

