feat: add deferred tool-call reply support#250
feat: add deferred tool-call reply support#250Yagi-Michael wants to merge 5 commits intocloudwalk:mainfrom
Conversation
Message.decode/1 returns a list of decoded messages, but process_message/2 expects a single message map. This caused a BadMapError on every incoming STDIO message from real MCP clients. Cherry-picked from cloudwalk#241.
Tool error responses should not be validated against the output schema — they contain error information, not structured tool output. Cherry-picked from cloudwalk#225.
Allow server handler to return {:defer, ref, opts, frame} from
handle_tool_call instead of an immediate reply. The transport caller
blocks naturally via GenServer.call semantics until the reply is
resolved via Hermes.Server.deferred_reply/3 or cancelled via
Hermes.Server.cancel_deferred/2.
Key changes:
- Add deferred_replies map to Base state, tracking pending refs
- Thread {:defer, ...} through Handlers.Tools.forward_to, Handlers.handle,
server_request, and handle_call
- Add handle_cast for :deferred_reply (resolve) and :cancel_deferred
- Handle caller DOWN by cleaning up deferred entries and notifying
- Sweep deferred replies on session termination and expiry
- Add server_pid to Frame.private for handler access
- Update handle_tool_call callback typespec
- Add public API: deferred_reply/3 and cancel_deferred/2
- In Base.handle_notification for notifications/cancelled: after the
existing session request completion, scan deferred_replies for any
entry whose request_id matches the cancelled id. If found, demonitor,
GenServer.reply with {:error, :cancelled}, notify cancel_notify pid,
and remove the entry — preventing the transport caller from blocking
indefinitely.
- In Server.handle_tool_call @callback doc: expand the defer tuple
description with an "opts" section listing :cancel_notify and its
semantics. Tighten the typespec from bare map() to a typed map with
optional :cancel_notify key.
- Add a test in DeferredReplyTest covering the new cancellation path:
send a deferred tool call, then send notifications/cancelled for the
same request_id, and assert the caller gets {:error, :cancelled},
cancel_notify receives {:deferred_cancelled, ref}, and the deferred
entry is removed from state.
There was a problem hiding this comment.
Summary
Solid implementation of deferred tool-call replies — the GenServer mechanics, monitor-based caller tracking, session sweep, idempotency, and the notifications/cancelled wiring are all well thought out. Test coverage is thorough for the direct GenServer path.
Two structural issues worth addressing, both rooted in how the STDIO (and StreamableHTTP) transports interact with the blocking GenServer.call:
lib/hermes/server/transport/stdio.ex
Outdated
| case Message.decode(data) do | ||
| {:ok, messages} -> | ||
| process_message(messages, state) | ||
| Enum.each(messages, &process_message(&1, state)) |
There was a problem hiding this comment.
STDIO blocks on deferred calls, making cancellation unreachable
The STDIO reading loop is single-threaded: handle_info calls process_message synchronously and only spawns the next stdin Task after it returns. When a tool returns {:defer, ...}, GenServer.call here blocks — and the transport reads no new stdin until the deferred reply resolves. A notifications/cancelled sent by the client while the call is in-flight will never arrive; the whole cancellation path in handle_notification is dead for STDIO.
For this to work, the request dispatch in process_message needs to be async (e.g. Task.start + reply routed back to transport), so stdin keeps being read independently.
There was a problem hiding this comment.
Fixed in d890f3a. STDIO transport now dispatches requests via Task.Supervisor.async (the TaskSupervisor was already started in the supervision tree but not wired into STDIO). The read loop returns immediately after spawning the task, so new stdin data — including notifications/cancelled — is processed without waiting for the request to complete.
The GenServer.call inside the spawned task uses :infinity timeout (see second comment for rationale).
There was a problem hiding this comment.
Makes sense — is the right call here since the transport has no visibility into whether a request will be deferred. The server already owns the full lifecycle. Thanks for the detailed rationale.
| if request_id, do: Session.complete_request(session.name, request_id) | ||
| {:reply, {:error, error}, new_state} | ||
|
|
||
| {:defer, ref, opts, frame} -> |
There was a problem hiding this comment.
Transport timeout silently drops deferred replies after 30 s
Both STDIO and StreamableHTTP call GenServer.call(server, {:request, ...}, timeout) with a default 30-second timeout. The primary motivation for deferred replies is long-running work (file indexing, batch embedding, external API fan-out) that can easily exceed 30 s. When the timeout fires, the transport's catch :exit branch swallows the error, the caller unblocks with nothing, and the eventual deferred_reply/3 call silently no-ops against a dead from.
This should at minimum be called out in the deferred_reply docs. A cleaner fix: transport layers pass :infinity for the GenServer.call timeout when a request is flagged as potentially deferrable, or expose a dedicated deferred_call_timeout transport option.
There was a problem hiding this comment.
Fixed in d890f3a. Both STDIO and StreamableHTTP now use :infinity for the GenServer.call timeout. The server already manages the full request lifecycle — deferred reply resolution, cancellation, caller DOWN monitoring, and session sweep — so a transport-level timeout is redundant and actively harmful for deferred replies.
Went with :infinity over a separate deferred_call_timeout option because the transport can't know in advance which requests will be deferred; the decision is made inside the server handler.
There was a problem hiding this comment.
Good rationale — the transport can't know upfront which calls will defer, so a per-request opt-in timeout wouldn't work cleanly anyway. The server-side lifecycle management (monitor + sweep + cancel notification) covers all the failure modes.
…d replies STDIO transport now dispatches requests via Task.Supervisor.async instead of blocking the stdin read loop with GenServer.call. This allows notifications/cancelled to arrive while a deferred request is in-flight. Both STDIO and StreamableHTTP now use :infinity for the GenServer.call timeout — the server manages request lifecycle via deferred reply resolution, cancellation, caller DOWN monitoring, and session sweep. The previous 30s transport timeout silently dropped deferred replies for long-running work.
There was a problem hiding this comment.
Follow-up review (d890f3a)
✅ Both previous findings addressed:
- ✅ STDIO blocking issue — requests are now dispatched via
Task.Supervisor.async, keeping stdin reading independent of long-running tool calls. - ✅ Transport timeout — both STDIO and StreamableHTTP now use
:infinity, with the server managing lifecycle via deferred reply resolution, caller monitoring, and session sweep.
One new item worth noting below.
| {:name, {:custom, &Hermes.genserver_name/1}}, | ||
| {:registry, {:atom, {:default, Hermes.Server.Registry}}}, | ||
| {:request_timeout, {:integer, {:default, to_timeout(second: 30)}}} | ||
| {:request_timeout, {:integer, {:default, to_timeout(second: 30)}}}, |
There was a problem hiding this comment.
Breaking change: task_supervisor is now required
{:required, ...} means any existing STDIO transport instantiation that doesn't pass :task_supervisor will crash at startup with a missing-option error. Users who were previously starting the transport with just server:, name:, and registry: will need a migration.
Consider defaulting to a module-level supervisor started as part of the transport's own supervision tree (similar to how DynamicSupervisor is embedded in some transports), or at least add a note in the module doc / changelog about the new required dependency.
Summary
handle_tool_callto return{:defer, ref, opts, frame}instead of an immediate reply. The transport caller blocks naturally viaGenServer.calluntil resolved viaHermes.Server.deferred_reply/3or cancelled viaHermes.Server.cancel_deferred/2.notifications/cancelledto deferred entries — prevents the transport caller from blocking indefinitely when a client cancels a pending request.fix(stdio): iterate over decoded messages list (fix(stdio): iterate over decoded messages list #241)fix(tools): skip output schema validation on error responses (fix: don't validate output schema on tool error responses #225)Motivation
Long-running tool calls (file indexing, batch embedding, external API fan-out) can't return immediately. Today the only option is to block the handler process. Deferred replies let the server acknowledge the request, do async work, and resolve later — matching how real MCP servers need to operate.
This also enables cancellation: if the client sends
notifications/cancelledfor a deferred request, the server cleans up and unblocks the caller with{:error, :cancelled}.Changes
lib/hermes/server.exdeferred_reply/3,cancel_deferred/2lib/hermes/server/base.exdeferred_repliesmap in state, handle_cast for resolve/cancel, caller DOWN monitoring, session termination sweep, cancellation notification handlinglib/hermes/server/handlers.ex{:defer, ...}throughhandle/3lib/hermes/server/handlers/tools.ex{:defer, ...}throughforward_to/3lib/hermes/server/transport/stdio.extest/hermes/server/deferred_reply_test.exstest/support/deferred_stub_server.exTest plan
notifications/cancelledcancels deferred entry, caller gets{:error, :cancelled}cancel_notifypid receives{:deferred_cancelled, ref}on cancellation