Skip to content

fix: prevent GenServer.call timeout crash in StreamableHTTP Plug#246

Open
larskluge wants to merge 1 commit intocloudwalk:mainfrom
larskluge:fix/streamable-http-genserver-call-timeout
Open

fix: prevent GenServer.call timeout crash in StreamableHTTP Plug#246
larskluge wants to merge 1 commit intocloudwalk:mainfrom
larskluge:fix/streamable-http-genserver-call-timeout

Conversation

@larskluge
Copy link
Copy Markdown

Problem

handle_message_for_sse/4 and handle_message/4 in StreamableHTTP use the default 5s GenServer.call timeout. When a tool handler takes longer than 5 seconds (common for LLM-backed tools, database queries, or network calls), the GenServer.call raises an unhandled :exit that crashes the Plug process. The MCP client receives a generic HTTP connection error with no indication of what went wrong.

The transport already has a configurable request_timeout (default 30s) that's correctly used for the internal forward_request_to_server call — but the outer GenServer.call from the public API functions times out first.

Fix

  1. Default GenServer.call timeout to :infinity for handle_message/5 and handle_message_for_sse/5, since the actual timeout is already enforced internally via the transport's request_timeout option. A new optional timeout parameter is added for callers that want explicit control.

  2. Catch :exit in the Plug for both handle_sse_request and handle_json_request, returning a proper JSON-RPC internal error response instead of silently crashing the connection.

Test plan

  • Existing streamable_http_test.exs tests pass (9 tests)
  • Existing streamable_http/plug tests pass (15 tests)
  • Manual verification: tool calls taking >5s return proper JSON-RPC error responses instead of connection drops

🤖 Generated with Claude Code

The handle_message_for_sse/4 and handle_message/4 functions used the
default 5s GenServer.call timeout. When tool execution takes longer
(e.g. LLM calls), this caused an unhandled exit that crashed the HTTP
connection — the client received no error response at all.

Two fixes:
1. Default GenServer.call timeout to :infinity for handle_message and
   handle_message_for_sse, since the actual timeout is already enforced
   internally via the transport's request_timeout option.
2. Catch :exit in the Plug's handle_sse_request and handle_json_request
   so that any unexpected exits return a proper JSON-RPC error response
   instead of silently crashing the connection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@cloudwalk-review-agent cloudwalk-review-agent bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Solid fix for a real production pain point. The :infinity default is the right call — the actual timeout budget is already governed by the transport's internal request_timeout, so the outer GenServer.call timeout was just a landmine waiting to go off on any slow tool. The catch :exit guards in the Plug are good defensive layering on top.

Two observations worth addressing in a follow-up:

{:error, error} ->
handle_request_error(conn, error, body)
end
catch
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The catch :exit block here and the identical one in handle_json_request (line ~281) are copy-pasted. If the error handling logic ever changes (e.g. adding a different log level or shaping the error differently), both need updating. Consider extracting into a private handle_exit(conn, reason, body) helper to keep it DRY.

{:error, error} ->
handle_request_error(conn, error, body)
end
catch
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no automated test exercising this catch :exit path — the PR description leaves the manual verification checkbox unchecked. Worth adding an ExUnit test with a stubbed transport that exits (Process.exit(self(), :kill) from a mock handler) to verify the Plug returns a proper JSON-RPC internal_error response rather than dropping the connection. This path is easy to accidentally break in a refactor.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant