Skip to content

feat: add deferred tool-call reply support#250

Open
Yagi-Michael wants to merge 5 commits intocloudwalk:mainfrom
Yagi-Michael:main
Open

feat: add deferred tool-call reply support#250
Yagi-Michael wants to merge 5 commits intocloudwalk:mainfrom
Yagi-Michael:main

Conversation

@Yagi-Michael
Copy link
Copy Markdown

Summary

  • Add deferred tool-call replies — allow handle_tool_call to return {:defer, ref, opts, frame} instead of an immediate reply. The transport caller blocks naturally via GenServer.call until resolved via Hermes.Server.deferred_reply/3 or cancelled via Hermes.Server.cancel_deferred/2.
  • Wire notifications/cancelled to deferred entries — prevents the transport caller from blocking indefinitely when a client cancels a pending request.
  • Cherry-pick two upstream bugfixes needed as prerequisites:

Motivation

Long-running tool calls (file indexing, batch embedding, external API fan-out) can't return immediately. Today the only option is to block the handler process. Deferred replies let the server acknowledge the request, do async work, and resolve later — matching how real MCP servers need to operate.

This also enables cancellation: if the client sends notifications/cancelled for a deferred request, the server cleans up and unblocks the caller with {:error, :cancelled}.

Changes

File What
lib/hermes/server.ex Public API: deferred_reply/3, cancel_deferred/2
lib/hermes/server/base.ex deferred_replies map in state, handle_cast for resolve/cancel, caller DOWN monitoring, session termination sweep, cancellation notification handling
lib/hermes/server/handlers.ex Thread {:defer, ...} through handle/3
lib/hermes/server/handlers/tools.ex Thread {:defer, ...} through forward_to/3
lib/hermes/server/transport/stdio.ex Iterate decoded message list (cherry-pick #241)
test/hermes/server/deferred_reply_test.exs 338-line test suite covering resolve, cancel, caller DOWN, session expiry
test/support/deferred_stub_server.ex Test helper server

Test plan

  • Deferred reply resolves and unblocks caller
  • notifications/cancelled cancels deferred entry, caller gets {:error, :cancelled}
  • cancel_notify pid receives {:deferred_cancelled, ref} on cancellation
  • Caller DOWN cleans up deferred entry
  • Session termination sweeps all deferred replies
  • Error responses skip output schema validation

Message.decode/1 returns a list of decoded messages, but
process_message/2 expects a single message map. This caused a
BadMapError on every incoming STDIO message from real MCP clients.

Cherry-picked from cloudwalk#241.
Tool error responses should not be validated against the output schema —
they contain error information, not structured tool output.

Cherry-picked from cloudwalk#225.
Allow server handler to return {:defer, ref, opts, frame} from
handle_tool_call instead of an immediate reply. The transport caller
blocks naturally via GenServer.call semantics until the reply is
resolved via Hermes.Server.deferred_reply/3 or cancelled via
Hermes.Server.cancel_deferred/2.

Key changes:
- Add deferred_replies map to Base state, tracking pending refs
- Thread {:defer, ...} through Handlers.Tools.forward_to, Handlers.handle,
  server_request, and handle_call
- Add handle_cast for :deferred_reply (resolve) and :cancel_deferred
- Handle caller DOWN by cleaning up deferred entries and notifying
- Sweep deferred replies on session termination and expiry
- Add server_pid to Frame.private for handler access
- Update handle_tool_call callback typespec
- Add public API: deferred_reply/3 and cancel_deferred/2
- In Base.handle_notification for notifications/cancelled: after the
  existing session request completion, scan deferred_replies for any
  entry whose request_id matches the cancelled id. If found, demonitor,
  GenServer.reply with {:error, :cancelled}, notify cancel_notify pid,
  and remove the entry — preventing the transport caller from blocking
  indefinitely.

- In Server.handle_tool_call @callback doc: expand the defer tuple
  description with an "opts" section listing :cancel_notify and its
  semantics. Tighten the typespec from bare map() to a typed map with
  optional :cancel_notify key.

- Add a test in DeferredReplyTest covering the new cancellation path:
  send a deferred tool call, then send notifications/cancelled for the
  same request_id, and assert the caller gets {:error, :cancelled},
  cancel_notify receives {:deferred_cancelled, ref}, and the deferred
  entry is removed from state.
Copy link
Copy Markdown

@cloudwalk-review-agent cloudwalk-review-agent bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Solid implementation of deferred tool-call replies — the GenServer mechanics, monitor-based caller tracking, session sweep, idempotency, and the notifications/cancelled wiring are all well thought out. Test coverage is thorough for the direct GenServer path.

Two structural issues worth addressing, both rooted in how the STDIO (and StreamableHTTP) transports interact with the blocking GenServer.call:

case Message.decode(data) do
{:ok, messages} ->
process_message(messages, state)
Enum.each(messages, &process_message(&1, state))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

STDIO blocks on deferred calls, making cancellation unreachable

The STDIO reading loop is single-threaded: handle_info calls process_message synchronously and only spawns the next stdin Task after it returns. When a tool returns {:defer, ...}, GenServer.call here blocks — and the transport reads no new stdin until the deferred reply resolves. A notifications/cancelled sent by the client while the call is in-flight will never arrive; the whole cancellation path in handle_notification is dead for STDIO.

For this to work, the request dispatch in process_message needs to be async (e.g. Task.start + reply routed back to transport), so stdin keeps being read independently.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in d890f3a. STDIO transport now dispatches requests via Task.Supervisor.async (the TaskSupervisor was already started in the supervision tree but not wired into STDIO). The read loop returns immediately after spawning the task, so new stdin data — including notifications/cancelled — is processed without waiting for the request to complete.

The GenServer.call inside the spawned task uses :infinity timeout (see second comment for rationale).

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense — is the right call here since the transport has no visibility into whether a request will be deferred. The server already owns the full lifecycle. Thanks for the detailed rationale.

if request_id, do: Session.complete_request(session.name, request_id)
{:reply, {:error, error}, new_state}

{:defer, ref, opts, frame} ->
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Transport timeout silently drops deferred replies after 30 s

Both STDIO and StreamableHTTP call GenServer.call(server, {:request, ...}, timeout) with a default 30-second timeout. The primary motivation for deferred replies is long-running work (file indexing, batch embedding, external API fan-out) that can easily exceed 30 s. When the timeout fires, the transport's catch :exit branch swallows the error, the caller unblocks with nothing, and the eventual deferred_reply/3 call silently no-ops against a dead from.

This should at minimum be called out in the deferred_reply docs. A cleaner fix: transport layers pass :infinity for the GenServer.call timeout when a request is flagged as potentially deferrable, or expose a dedicated deferred_call_timeout transport option.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in d890f3a. Both STDIO and StreamableHTTP now use :infinity for the GenServer.call timeout. The server already manages the full request lifecycle — deferred reply resolution, cancellation, caller DOWN monitoring, and session sweep — so a transport-level timeout is redundant and actively harmful for deferred replies.

Went with :infinity over a separate deferred_call_timeout option because the transport can't know in advance which requests will be deferred; the decision is made inside the server handler.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good rationale — the transport can't know upfront which calls will defer, so a per-request opt-in timeout wouldn't work cleanly anyway. The server-side lifecycle management (monitor + sweep + cancel notification) covers all the failure modes.

…d replies

STDIO transport now dispatches requests via Task.Supervisor.async instead
of blocking the stdin read loop with GenServer.call. This allows
notifications/cancelled to arrive while a deferred request is in-flight.

Both STDIO and StreamableHTTP now use :infinity for the GenServer.call
timeout — the server manages request lifecycle via deferred reply
resolution, cancellation, caller DOWN monitoring, and session sweep.
The previous 30s transport timeout silently dropped deferred replies for
long-running work.
Copy link
Copy Markdown

@cloudwalk-review-agent cloudwalk-review-agent bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review (d890f3a)

Both previous findings addressed:

  • ✅ STDIO blocking issue — requests are now dispatched via Task.Supervisor.async, keeping stdin reading independent of long-running tool calls.
  • ✅ Transport timeout — both STDIO and StreamableHTTP now use :infinity, with the server managing lifecycle via deferred reply resolution, caller monitoring, and session sweep.

One new item worth noting below.

{:name, {:custom, &Hermes.genserver_name/1}},
{:registry, {:atom, {:default, Hermes.Server.Registry}}},
{:request_timeout, {:integer, {:default, to_timeout(second: 30)}}}
{:request_timeout, {:integer, {:default, to_timeout(second: 30)}}},
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Breaking change: task_supervisor is now required

{:required, ...} means any existing STDIO transport instantiation that doesn't pass :task_supervisor will crash at startup with a missing-option error. Users who were previously starting the transport with just server:, name:, and registry: will need a migration.

Consider defaulting to a module-level supervisor started as part of the transport's own supervision tree (similar to how DynamicSupervisor is embedded in some transports), or at least add a note in the module doc / changelog about the new required dependency.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant