
fix(webservice): health-monitor skips projects with an in-flight deploy #2730

Merged
lukemarsden merged 1 commit into main from fix/health-monitor-skip-inflight-deploy
Jun 25, 2026


@lukemarsden

Collaborator

The web-service health-monitor probed active services and triggered recovery after 3 consecutive failures, but a fresh deploy's `docker compose up --build` legitimately returns 502s for minutes. As a result, the monitor raced the initial deploy and fired a redundant restart-in-place (the benign "superseded" we saw on find-ai's first prod deploy); for a slow build, it could interrupt the build repeatedly.

Fix: skip recovery for any project whose latest `web_service_deploy` is `pending` or `building`, and clear accrued failures so that a genuine post-deploy failure still recovers.

Unit test covers the status gate (`pending`/`building` → skip; `live`/`failed`/`superseded`/none → probe normally). `go test ./pkg/webservice/` is green.
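For readers without the diff open, here is a minimal sketch of the gate described above. It is not the code from this PR: the `monitor` type, the `observe` and `deployInFlight` names, and the status string values are all assumptions made for illustration. Only the 3-failure threshold and the pending/building rule come from the description.

```go
package main

// Deploy statuses named in the PR description. The string values are
// assumptions; the real constants live in the webservice package.
const (
	statusPending    = "pending"
	statusBuilding   = "building"
	statusLive       = "live"
	statusFailed     = "failed"
	statusSuperseded = "superseded"
)

// failureThreshold mirrors the "3 consecutive failures" rule from the PR.
const failureThreshold = 3

// deployInFlight reports whether the latest deploy is still running.
// An empty status stands for "no deploy recorded" and is not in flight.
func deployInFlight(latestStatus string) bool {
	return latestStatus == statusPending || latestStatus == statusBuilding
}

// monitor is a hypothetical stand-in for the health-monitor's state.
type monitor struct {
	failures map[string]int // consecutive probe failures per project
}

func newMonitor() *monitor {
	return &monitor{failures: make(map[string]int)}
}

// observe records one probe result for a project and reports whether
// recovery should fire on this tick.
func (m *monitor) observe(projectID, latestStatus string, probeOK bool) bool {
	if deployInFlight(latestStatus) {
		// Skip recovery and drop accrued failures, so errors seen during
		// the build do not count toward the threshold afterwards.
		delete(m.failures, projectID)
		return false
	}
	if probeOK {
		delete(m.failures, projectID)
		return false
	}
	m.failures[projectID]++
	if m.failures[projectID] >= failureThreshold {
		// Reset the counter so the next recovery needs a fresh run of failures.
		delete(m.failures, projectID)
		return true
	}
	return false
}
```

In this sketch, two failed probes followed by a `building` tick leave the counter at zero, so a service that is still failing once the deploy goes live needs three further failed probes before recovery fires.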

🤖 Generated with Claude Code

The health-monitor probed live web services and recovered after 3 failures, but
a fresh deploy's `docker compose up --build` legitimately returns errors for
minutes — so the monitor raced the initial deploy and fired a redundant
restart-in-place (the benign "superseded" seen on find-ai's first prod deploy).
Worse, for a slow build it could repeatedly interrupt it.

Skip recovery for any project whose latest web_service_deploy is pending or
building; clear accrued failures so a real failure after the deploy still
recovers normally.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@lukemarsden lukemarsden merged commit 2a54532 into main Jun 25, 2026
5 checks passed
@lukemarsden lukemarsden deleted the fix/health-monitor-skip-inflight-deploy branch June 25, 2026 13:24
