fix(tasks): release pool state when HA finalize sees persisted End #3933
Draft
cursor[bot] wants to merge 1 commit into
When FinalizeRemoteTask refreshed a task from the DB and found End already set, it returned early to avoid duplicate finishRun/autorun work. That early return leaked the running/active Redis bookkeeping if the node that persisted End crashed before onTaskStop ran. Still skip re-finalization, but call onTaskStop to release pool capacity.

Co-authored-by: Denis Gukov <fiftin@outlook.com>
Bug and impact
In HA mode, when a remote runner task is finalized on one node and that node crashes after persisting `End` to the database but before `onTaskStop` runs, the shared task pool state (running tasks, active-by-project counts, claims) is never released.

User-visible impact: parallel task limits and runner slot accounting stay exhausted until the Semaphore process restarts, blocking new tasks from running even though the DB shows the task as finished.
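To make the impact concrete, here is a minimal Go sketch of how one leaked entry keeps a parallel limit exhausted. It is not Semaphore's code: `taskPool`, `canStart`, and `start` are hypothetical, the Redis-backed state is modeled as in-memory maps, and a per-project limit is assumed purely for illustration.

```go
package main

import "fmt"

// taskPool is a hypothetical, in-memory stand-in for the shared pool state.
// In HA mode the real state lives in Redis and the real types differ.
type taskPool struct {
	maxParallel     int
	running         map[int]bool // task ID -> currently counted as running
	activeByProject map[int]int  // project ID -> number of active tasks
}

func newTaskPool(maxParallel int) *taskPool {
	return &taskPool{
		maxParallel:     maxParallel,
		running:         map[int]bool{},
		activeByProject: map[int]int{},
	}
}

// canStart reports whether one more task in the project fits under the limit.
func (p *taskPool) canStart(projectID int) bool {
	return p.activeByProject[projectID] < p.maxParallel
}

// start records a task as running and counts it against its project.
func (p *taskPool) start(taskID, projectID int) {
	p.running[taskID] = true
	p.activeByProject[projectID]++
}

func main() {
	p := newTaskPool(1)
	p.start(42, 7)
	// The owning node persists End for task 42 and crashes here, so nothing
	// ever removes task 42 from running/activeByProject.
	fmt.Println("can start another task in project 7:", p.canStart(7))
}
```

With a limit of 1, the program prints `can start another task in project 7: false`: nothing in this flow releases task 42, mirroring the leak described above.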
Introduced by the HA duplicate-finalize guard in #3925 (Ha remove owner).

Root cause
`FinalizeRemoteTask` refreshes the task from the DB and returns early when `End` is already set, intentionally skipping duplicate `finishRun`/autorun work. That early return also skipped `onTaskStop`, which is the only path that releases pool/Redis bookkeeping.

Fix
When `End` is already persisted, call `onTaskStop(tsk)` before returning. This is idempotent when the owning node already cleaned up, and repairs the leak when it did not.

Validation
- `TestTaskPool_FinalizeRemoteTask_ReleasesPoolStateWhenEndAlreadyPersisted`
- `go test ./services/tasks/... ./services/runners/...`