
fix(tasks): release pool state when HA finalize sees persisted End #3933

Draft
cursor[bot] wants to merge 1 commit into develop from cursor/critical-bug-investigation-3d69



cursor[bot] commented Jun 9, 2026


Bug and impact

In HA mode, when a remote runner task is finalized on one node and that node crashes after persisting End to the database but before onTaskStop runs, the shared task pool state (running tasks, active-by-project counts, claims) is never released.

User-visible impact: parallel task limits and runner slot accounting stay exhausted until the Semaphore process restarts, blocking new tasks from running even though the DB shows the task as finished.
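To illustrate why a leaked count blocks new work, here is a minimal sketch of a per-project parallel limit check. The projectLimiter type and its canStart method are hypothetical and only model the admission decision; they are not Semaphore's actual code.

```go
package main

import "fmt"

// projectLimiter is a hypothetical model of a per-project parallel task limit.
type projectLimiter struct {
	limit  int
	active map[int]int // project ID -> tasks the pool believes are running
}

// canStart reports whether another task for projectID fits under the limit.
func (l *projectLimiter) canStart(projectID int) bool {
	return l.active[projectID] < l.limit
}

func main() {
	l := &projectLimiter{limit: 1, active: map[int]int{}}

	// A task finished in the DB, but its slot was never released.
	l.active[42]++

	// The stale count keeps the project at its limit.
	fmt.Println(l.canStart(42)) // false
}
```

Nothing in this model ever decrements the stale entry, which matches the described symptom: the limit stays exhausted until the in-memory or shared state is reset (for example by a process restart).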

Introduced by the HA duplicate-finalize guard in #3925 (Ha remove owner).

Root cause

FinalizeRemoteTask refreshes the task from the DB and returns early when End is already set, intentionally skipping duplicate finishRun / autorun work. That early return also skipped onTaskStop, which is the only path that releases pool/Redis bookkeeping.
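A simplified model of the pre-fix control flow may make the leak easier to see. The task and taskPool types, the EndSet flag, and finalizeBuggy are hypothetical stand-ins; a single in-memory pool represents the shared (Redis-backed) state that both nodes see.

```go
package main

import "fmt"

// task is a hypothetical stand-in for a remote runner task.
type task struct {
	ID        int
	ProjectID int
	EndSet    bool // models "End is already persisted in the DB"
}

// taskPool models the shared bookkeeping: running tasks and active-by-project counts.
type taskPool struct {
	running         map[int]bool
	activeByProject map[int]int
}

func newTaskPool() *taskPool {
	return &taskPool{running: map[int]bool{}, activeByProject: map[int]int{}}
}

func (p *taskPool) start(t *task) {
	p.running[t.ID] = true
	p.activeByProject[t.ProjectID]++
}

// onTaskStop is the only code path that releases pool state.
func (p *taskPool) onTaskStop(t *task) {
	delete(p.running, t.ID)
	p.activeByProject[t.ProjectID]--
}

// finalizeBuggy mirrors the pre-fix flow: the early return that skips
// duplicate finishRun/autorun work also skips onTaskStop.
func (p *taskPool) finalizeBuggy(t *task) {
	if t.EndSet {
		return
	}
	// finishRun / autorun work would happen here.
	t.EndSet = true
	p.onTaskStop(t)
}

func main() {
	p := newTaskPool()
	t := &task{ID: 1, ProjectID: 7}
	p.start(t)

	// Node A persists End, then crashes before calling onTaskStop.
	t.EndSet = true

	// Node B sees End already set and returns early.
	p.finalizeBuggy(t)

	fmt.Println(len(p.running), p.activeByProject[7]) // 1 1: the slot is leaked
}
```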

Fix

When End is already persisted, call onTaskStop(tsk) before returning. This is idempotent when the owning node already cleaned up, and repairs the leak when it did not.
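The same hypothetical model, with the fix applied, shows both cases from the paragraph above. The guard inside onTaskStop is an assumption about how idempotence could be achieved. Without it, the duplicate call in the second case would drive the project count negative.

```go
package main

import "fmt"

// Hypothetical stand-ins for the task and the shared pool bookkeeping.
type task struct {
	ID        int
	ProjectID int
	EndSet    bool // models "End is already persisted in the DB"
}

type taskPool struct {
	running         map[int]bool
	activeByProject map[int]int
}

func newTaskPool() *taskPool {
	return &taskPool{running: map[int]bool{}, activeByProject: map[int]int{}}
}

func (p *taskPool) start(t *task) {
	p.running[t.ID] = true
	p.activeByProject[t.ProjectID]++
}

// onTaskStop releases pool state. The guard turns a second call into a no-op.
func (p *taskPool) onTaskStop(t *task) {
	if !p.running[t.ID] {
		return
	}
	delete(p.running, t.ID)
	p.activeByProject[t.ProjectID]--
}

// finalizeFixed still skips re-finalization when End is set,
// but releases pool state before returning.
func (p *taskPool) finalizeFixed(t *task) {
	if t.EndSet {
		p.onTaskStop(t)
		return
	}
	// finishRun / autorun work would happen here.
	t.EndSet = true
	p.onTaskStop(t)
}

func main() {
	p := newTaskPool()

	// Case 1: the owning node crashed after persisting End.
	crashed := &task{ID: 1, ProjectID: 7, EndSet: true}
	p.start(crashed)
	p.finalizeFixed(crashed)
	fmt.Println(len(p.running), p.activeByProject[7]) // 0 0

	// Case 2: the owning node already cleaned up, so the extra release is a no-op.
	done := &task{ID: 2, ProjectID: 7}
	p.start(done)
	p.finalizeFixed(done) // owner finalizes normally
	p.finalizeFixed(done) // duplicate finalize on another node
	fmt.Println(len(p.running), p.activeByProject[7]) // 0 0
}
```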

Validation

  • Added TestTaskPool_FinalizeRemoteTask_ReleasesPoolStateWhenEndAlreadyPersisted
  • go test ./services/tasks/... ./services/runners/...

When FinalizeRemoteTask refreshed a task from the DB and found End already
set, it returned early to avoid duplicate finishRun/autorun work. That leaked
the running/active Redis bookkeeping if the node that persisted End crashed
before onTaskStop ran.

Still skip re-finalization, but call onTaskStop to release pool capacity.

Co-authored-by: Denis Gukov <fiftin@outlook.com>
