Drain HTTP server before relaying termination signal to upstream#132
Open
alexspeller wants to merge 1 commit into
Open
Drain HTTP server before relaying termination signal to upstream#132alexspeller wants to merge 1 commit into
alexspeller wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Implements a more reliable graceful shutdown flow by draining the HTTP server before relaying termination signals to the wrapped upstream process, with a configurable drain timeout.
Changes:
- Move SIGINT/SIGTERM handling from
UpstreamProcessintoServiceto coordinate server drain + upstream signaling order. - Add
HTTP_DRAIN_TIMEOUT/HttpDrainTimeoutconfig and use it forServer.Stop()shutdown deadlines. - Add a regression test covering “stop accepting connections before signaling upstream”, plus README docs.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| internal/upstream_process.go | Removes standalone signal forwarding from the upstream wrapper so shutdown can be coordinated at the service level. |
| internal/service.go | Adds signal handling + gracefulShutdown that drains the server before signaling upstream. |
| internal/server.go | Makes Stop() idempotent and uses configurable drain timeout. |
| internal/config.go | Adds HttpDrainTimeout with env var + default. |
| internal/config_test.go | Extends config tests to cover the new drain timeout. |
| internal/service_test.go | Adds a test asserting the server listener is closed before upstream signaling. |
| README.md | Documents HTTP_DRAIN_TIMEOUT. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
On SIGTERM/SIGINT, Thruster relayed the signal straight to the upstream process and only closed its own HTTP listener afterwards, via the deferred server.Stop() in Service.Run -- which doesn't run until upstream.Run() returns, i.e. until the upstream has already exited. That leaves a window for the entire duration of the upstream's shutdown (typically tens of seconds while it drains in-flight requests) where Thruster still accepts new connections on its HTTP port but the upstream has stopped listening. Each such connection fails at the proxy hop with 'connection refused' and is returned to the client as a 502. On a busy service this produces a burst of user-facing 502s on every deploy. Reverse the order: intercept the signal in the Service, drain the HTTP server first (closing the listener so new connections are refused at the TCP level, where an upstream load balancer will route them elsewhere, while in-flight requests finish against the still-running upstream), and only then relay the signal to the upstream. Signal handling moves from UpstreamProcess to Service, which owns both the server and the upstream and so is the natural place to coordinate their shutdown order. Server.Stop is made idempotent so it can be called both here and from the deferred cleanup. The drain timeout was previously hardcoded at 5 seconds -- which never mattered before, since the listener wasn't closed until the upstream had already exited. Now that it is meaningful, make it configurable via HTTP_DRAIN_TIMEOUT, defaulting to 30 seconds to match kamal-proxy's drain-timeout.
7353a0a to
310147e
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
On
SIGTERM/SIGINT, Thruster relays the signal straight to the upstream process and only closes its own HTTP listener afterwards, via the deferredserver.Stop()inService.Run— which doesn't run untilupstream.Run()returns, i.e. until the upstream has already exited.That leaves a window, for the entire duration of the upstream's shutdown (typically tens of seconds while it drains in-flight requests), where Thruster still accepts new connections on its HTTP port but the upstream has stopped listening. Each such connection fails at the proxy hop with
connection refusedand is returned to the client as a502. On a busy service this produces a burst of user-facing 502s on every deploy.Change
Reverse the order: intercept the signal in the
Service, drain the HTTP server first — closing the listener so new connections are refused at the TCP level (where an upstream load balancer routes them elsewhere) while in-flight requests finish against the still-running upstream — and only then relay the signal to the upstream.UpstreamProcesstoService, which owns both the server and the upstream and so is the natural place to coordinate their shutdown order.Server.Stopis made idempotent, since it is now called both from the signal handler and from the deferred cleanup inRun.HTTP_DRAIN_TIMEOUT, defaulting to 30s to match kamal-proxy's--drain-timeout.Before / after
Wrapping a backend that releases its port on
SIGTERMand then takes a few seconds to exit (as Puma does while draining), while hammering the front-end during the shutdown window:502(dial tcp ...: connect: connection refused)http.Server.ShutdownTests
Adds
service_test.goasserting the HTTP listener stops accepting connections before the upstream is signalled, plus config coverage for the newHTTP_DRAIN_TIMEOUTsetting.