feat: add deferred tool-call reply support by Yagi-Michael · Pull Request #250 · cloudwalk/hermes-mcp

Yagi-Michael · 2026-04-04T23:13:18Z

Summary

Add deferred tool-call replies — allow handle_tool_call to return {:defer, ref, opts, frame} instead of an immediate reply. The transport caller blocks naturally via GenServer.call until resolved via Hermes.Server.deferred_reply/3 or cancelled via Hermes.Server.cancel_deferred/2.
Wire notifications/cancelled to deferred entries — prevents the transport caller from blocking indefinitely when a client cancels a pending request.
Cherry-pick two upstream bugfixes needed as prerequisites:
- fix(stdio): iterate over decoded messages list (fix(stdio): iterate over decoded messages list #241)
- fix(tools): skip output schema validation on error responses (fix: don't validate output schema on tool error responses #225)

Motivation

Long-running tool calls (file indexing, batch embedding, external API fan-out) can't return immediately. Today the only option is to block the handler process. Deferred replies let the server acknowledge the request, do async work, and resolve later — matching how real MCP servers need to operate.

This also enables cancellation: if the client sends notifications/cancelled for a deferred request, the server cleans up and unblocks the caller with {:error, :cancelled}.

Changes

File	What
`lib/hermes/server.ex`	Public API: `deferred_reply/3`, `cancel_deferred/2`
`lib/hermes/server/base.ex`	`deferred_replies` map in state, handle_cast for resolve/cancel, caller DOWN monitoring, session termination sweep, cancellation notification handling
`lib/hermes/server/handlers.ex`	Thread `{:defer, ...}` through `handle/3`
`lib/hermes/server/handlers/tools.ex`	Thread `{:defer, ...}` through `forward_to/3`
`lib/hermes/server/transport/stdio.ex`	Iterate decoded message list (cherry-pick #241)
`test/hermes/server/deferred_reply_test.exs`	338-line test suite covering resolve, cancel, caller DOWN, session expiry
`test/support/deferred_stub_server.ex`	Test helper server

Test plan

Deferred reply resolves and unblocks caller
notifications/cancelled cancels deferred entry, caller gets {:error, :cancelled}
cancel_notify pid receives {:deferred_cancelled, ref} on cancellation
Caller DOWN cleans up deferred entry
Session termination sweeps all deferred replies
Error responses skip output schema validation

Message.decode/1 returns a list of decoded messages, but process_message/2 expects a single message map. This caused a BadMapError on every incoming STDIO message from real MCP clients. Cherry-picked from cloudwalk#241.

Tool error responses should not be validated against the output schema — they contain error information, not structured tool output. Cherry-picked from cloudwalk#225.

Allow server handler to return {:defer, ref, opts, frame} from handle_tool_call instead of an immediate reply. The transport caller blocks naturally via GenServer.call semantics until the reply is resolved via Hermes.Server.deferred_reply/3 or cancelled via Hermes.Server.cancel_deferred/2. Key changes: - Add deferred_replies map to Base state, tracking pending refs - Thread {:defer, ...} through Handlers.Tools.forward_to, Handlers.handle, server_request, and handle_call - Add handle_cast for :deferred_reply (resolve) and :cancel_deferred - Handle caller DOWN by cleaning up deferred entries and notifying - Sweep deferred replies on session termination and expiry - Add server_pid to Frame.private for handler access - Update handle_tool_call callback typespec - Add public API: deferred_reply/3 and cancel_deferred/2

@callback

- In Base.handle_notification for notifications/cancelled: after the existing session request completion, scan deferred_replies for any entry whose request_id matches the cancelled id. If found, demonitor, GenServer.reply with {:error, :cancelled}, notify cancel_notify pid, and remove the entry — preventing the transport caller from blocking indefinitely. - In Server.handle_tool_call @callback doc: expand the defer tuple description with an "opts" section listing :cancel_notify and its semantics. Tighten the typespec from bare map() to a typed map with optional :cancel_notify key. - Add a test in DeferredReplyTest covering the new cancellation path: send a deferred tool call, then send notifications/cancelled for the same request_id, and assert the caller gets {:error, :cancelled}, cancel_notify receives {:deferred_cancelled, ref}, and the deferred entry is removed from state.

cloudwalk-review-agent

Summary

Solid implementation of deferred tool-call replies — the GenServer mechanics, monitor-based caller tracking, session sweep, idempotency, and the notifications/cancelled wiring are all well thought out. Test coverage is thorough for the direct GenServer path.

Two structural issues worth addressing, both rooted in how the STDIO (and StreamableHTTP) transports interact with the blocking GenServer.call:

cloudwalk-review-agent · 2026-04-04T23:46:23Z

lib/hermes/server/transport/stdio.ex

    case Message.decode(data) do
      {:ok, messages} ->
-        process_message(messages, state)
+        Enum.each(messages, &process_message(&1, state))


STDIO blocks on deferred calls, making cancellation unreachable

The STDIO reading loop is single-threaded: handle_info calls process_message synchronously and only spawns the next stdin Task after it returns. When a tool returns {:defer, ...}, GenServer.call here blocks — and the transport reads no new stdin until the deferred reply resolves. A notifications/cancelled sent by the client while the call is in-flight will never arrive; the whole cancellation path in handle_notification is dead for STDIO.

For this to work, the request dispatch in process_message needs to be async (e.g. Task.start + reply routed back to transport), so stdin keeps being read independently.

Fixed in d890f3a. STDIO transport now dispatches requests via Task.Supervisor.async (the TaskSupervisor was already started in the supervision tree but not wired into STDIO). The read loop returns immediately after spawning the task, so new stdin data — including notifications/cancelled — is processed without waiting for the request to complete.

The GenServer.call inside the spawned task uses :infinity timeout (see second comment for rationale).

Makes sense — is the right call here since the transport has no visibility into whether a request will be deferred. The server already owns the full lifecycle. Thanks for the detailed rationale.

cloudwalk-review-agent · 2026-04-04T23:46:23Z

lib/hermes/server/base.ex

          if request_id, do: Session.complete_request(session.name, request_id)
          {:reply, {:error, error}, new_state}
+
+        {:defer, ref, opts, frame} ->


Transport timeout silently drops deferred replies after 30 s

Both STDIO and StreamableHTTP call GenServer.call(server, {:request, ...}, timeout) with a default 30-second timeout. The primary motivation for deferred replies is long-running work (file indexing, batch embedding, external API fan-out) that can easily exceed 30 s. When the timeout fires, the transport's catch :exit branch swallows the error, the caller unblocks with nothing, and the eventual deferred_reply/3 call silently no-ops against a dead from.

This should at minimum be called out in the deferred_reply docs. A cleaner fix: transport layers pass :infinity for the GenServer.call timeout when a request is flagged as potentially deferrable, or expose a dedicated deferred_call_timeout transport option.

Fixed in d890f3a. Both STDIO and StreamableHTTP now use :infinity for the GenServer.call timeout. The server already manages the full request lifecycle — deferred reply resolution, cancellation, caller DOWN monitoring, and session sweep — so a transport-level timeout is redundant and actively harmful for deferred replies.

Went with :infinity over a separate deferred_call_timeout option because the transport can't know in advance which requests will be deferred; the decision is made inside the server handler.

Good rationale — the transport can't know upfront which calls will defer, so a per-request opt-in timeout wouldn't work cleanly anyway. The server-side lifecycle management (monitor + sweep + cancel notification) covers all the failure modes.

…d replies STDIO transport now dispatches requests via Task.Supervisor.async instead of blocking the stdin read loop with GenServer.call. This allows notifications/cancelled to arrive while a deferred request is in-flight. Both STDIO and StreamableHTTP now use :infinity for the GenServer.call timeout — the server manages request lifecycle via deferred reply resolution, cancellation, caller DOWN monitoring, and session sweep. The previous 30s transport timeout silently dropped deferred replies for long-running work.

cloudwalk-review-agent

Follow-up review (`d890f3a`)

✅ Both previous findings addressed:

✅ STDIO blocking issue — requests are now dispatched via Task.Supervisor.async, keeping stdin reading independent of long-running tool calls.
✅ Transport timeout — both STDIO and StreamableHTTP now use :infinity, with the server managing lifecycle via deferred reply resolution, caller monitoring, and session sweep.

One new item worth noting below.

cloudwalk-review-agent · 2026-04-05T03:44:31Z

lib/hermes/server/transport/stdio.ex

    {:name, {:custom, &Hermes.genserver_name/1}},
    {:registry, {:atom, {:default, Hermes.Server.Registry}}},
-    {:request_timeout, {:integer, {:default, to_timeout(second: 30)}}}
+    {:request_timeout, {:integer, {:default, to_timeout(second: 30)}}},


Breaking change: task_supervisor is now required

{:required, ...} means any existing STDIO transport instantiation that doesn't pass :task_supervisor will crash at startup with a missing-option error. Users who were previously starting the transport with just server:, name:, and registry: will need a migration.

Consider defaulting to a module-level supervisor started as part of the transport's own supervision tree (similar to how DynamicSupervisor is embedded in some transports), or at least add a note in the module doc / changelog about the new required dependency.

Yagi-Michael added 4 commits March 31, 2026 10:14

fix(stdio): iterate over decoded messages list

993339c

Message.decode/1 returns a list of decoded messages, but process_message/2 expects a single message map. This caused a BadMapError on every incoming STDIO message from real MCP clients. Cherry-picked from cloudwalk#241.

fix(tools): skip output schema validation on error responses

ade0aed

Tool error responses should not be validated against the output schema — they contain error information, not structured tool output. Cherry-picked from cloudwalk#225.

cloudwalk-review-agent bot reviewed Apr 4, 2026

View reviewed changes

cloudwalk-review-agent bot approved these changes Apr 5, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add deferred tool-call reply support#250

feat: add deferred tool-call reply support#250
Yagi-Michael wants to merge 5 commits intocloudwalk:mainfrom
Yagi-Michael:main

Yagi-Michael commented Apr 4, 2026

Uh oh!

cloudwalk-review-agent bot left a comment

Uh oh!

cloudwalk-review-agent bot Apr 4, 2026

Uh oh!

Yagi-Michael Apr 5, 2026

Uh oh!

cloudwalk-review-agent bot Apr 5, 2026

Uh oh!

cloudwalk-review-agent bot Apr 4, 2026

Uh oh!

Yagi-Michael Apr 5, 2026

Uh oh!

cloudwalk-review-agent bot Apr 5, 2026

Uh oh!

cloudwalk-review-agent bot left a comment

Uh oh!

cloudwalk-review-agent bot Apr 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Yagi-Michael commented Apr 4, 2026

Summary

Motivation

Changes

Test plan

Uh oh!

cloudwalk-review-agent bot left a comment

Choose a reason for hiding this comment

Summary

Uh oh!

cloudwalk-review-agent bot Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

Yagi-Michael Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

cloudwalk-review-agent bot Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

cloudwalk-review-agent bot Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

Yagi-Michael Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

cloudwalk-review-agent bot Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

cloudwalk-review-agent bot left a comment

Choose a reason for hiding this comment

Follow-up review (d890f3a)

Uh oh!

cloudwalk-review-agent bot Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Follow-up review (`d890f3a`)