Drain HTTP server before relaying termination signal to upstream by alexspeller · Pull Request #132 · basecamp/thruster

alexspeller · 2026-06-04T11:07:07Z

Problem

On SIGTERM/SIGINT, Thruster relays the signal straight to the upstream process and only closes its own HTTP listener afterwards, via the deferred server.Stop() in Service.Run — which doesn't run until upstream.Run() returns, i.e. until the upstream has already exited.

That leaves a window, lasting the entire duration of the upstream's shutdown (typically tens of seconds while it drains in-flight requests), in which Thruster still accepts new connections on its HTTP port although the upstream has stopped listening. Each such connection fails at the proxy hop with connection refused and is returned to the client as a 502. On a busy service this produces a burst of user-facing 502s on every deploy.

Change

Reverse the order: intercept the signal in the Service, drain the HTTP server first — closing the listener so new connections are refused at the TCP level (where an upstream load balancer routes them elsewhere) while in-flight requests finish against the still-running upstream — and only then relay the signal to the upstream.

Signal handling moves from UpstreamProcess to Service, which owns both the server and the upstream and so is the natural place to coordinate their shutdown order.
Server.Stop is made idempotent, since it is now called both from the signal handler and from the deferred cleanup in Run.
The drain timeout was previously hardcoded at 5s — which never mattered, since the listener wasn't closed until the upstream had already exited. Now that it is meaningful, it is configurable via HTTP_DRAIN_TIMEOUT, defaulting to 30s to match kamal-proxy's --drain-timeout.

Before / after

With Thruster wrapping a backend that releases its port on SIGTERM and then takes a few seconds to exit (as Puma does while draining), and a client hammering the front end throughout the shutdown window:

Before: every request during the window → 502 (dial tcp ...: connect: connection refused)
After: every request during the window → TCP connection refused, no 5xx; in-flight requests continue to drain via http.Server.Shutdown

Tests

Adds service_test.go asserting the HTTP listener stops accepting connections before the upstream is signalled, plus config coverage for the new HTTP_DRAIN_TIMEOUT setting.

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Implements a more reliable graceful shutdown flow by draining the HTTP server before relaying termination signals to the wrapped upstream process, with a configurable drain timeout.

Changes:

Move SIGINT/SIGTERM handling from UpstreamProcess into Service to coordinate server drain + upstream signaling order.
Add HTTP_DRAIN_TIMEOUT/HttpDrainTimeout config and use it for Server.Stop() shutdown deadlines.
Add a regression test covering “stop accepting connections before signaling upstream”, plus README docs.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| internal/upstream_process.go | Removes standalone signal forwarding from the upstream wrapper so shutdown can be coordinated at the service level. |
| internal/service.go | Adds signal handling + `gracefulShutdown` that drains the server before signaling upstream. |
| internal/server.go | Makes `Stop()` idempotent and uses configurable drain timeout. |
| internal/config.go | Adds `HttpDrainTimeout` with env var + default. |
| internal/config_test.go | Extends config tests to cover the new drain timeout. |
| internal/service_test.go | Adds a test asserting the server listener is closed before upstream signaling. |
| README.md | Documents `HTTP_DRAIN_TIMEOUT`. |

On SIGTERM/SIGINT, Thruster relayed the signal straight to the upstream process and only closed its own HTTP listener afterwards, via the deferred server.Stop() in Service.Run -- which doesn't run until upstream.Run() returns, i.e. until the upstream has already exited.

That leaves a window for the entire duration of the upstream's shutdown (typically tens of seconds while it drains in-flight requests) where Thruster still accepts new connections on its HTTP port but the upstream has stopped listening. Each such connection fails at the proxy hop with 'connection refused' and is returned to the client as a 502. On a busy service this produces a burst of user-facing 502s on every deploy.

Reverse the order: intercept the signal in the Service, drain the HTTP server first (closing the listener so new connections are refused at the TCP level, where an upstream load balancer will route them elsewhere, while in-flight requests finish against the still-running upstream), and only then relay the signal to the upstream.

Signal handling moves from UpstreamProcess to Service, which owns both the server and the upstream and so is the natural place to coordinate their shutdown order. Server.Stop is made idempotent so it can be called both here and from the deferred cleanup.

The drain timeout was previously hardcoded at 5 seconds -- which never mattered before, since the listener wasn't closed until the upstream had already exited. Now that it is meaningful, make it configurable via HTTP_DRAIN_TIMEOUT, defaulting to 30 seconds to match kamal-proxy's drain-timeout.

Copilot AI review requested due to automatic review settings June 4, 2026 11:07

claude Bot reviewed Jun 4, 2026

Copilot AI reviewed Jun 4, 2026

Comment thread internal/service.go Outdated

Comment thread internal/service.go Outdated

Comment thread internal/service.go

Comment thread internal/service_test.go Outdated

alexspeller force-pushed the drain-before-signaling-upstream branch from 7353a0a to 310147e Compare June 4, 2026 11:24

alexspeller wants to merge 1 commit into basecamp:main from alexspeller:drain-before-signaling-upstream
