Skip to content

fix(worker): per-commit concurrency limit for upload_finisher tasks#754

Closed
thomasrockhu-codecov wants to merge 1 commit intomainfrom
fix/finisher-per-commit-concurrency-limit
Closed

fix(worker): per-commit concurrency limit for upload_finisher tasks#754
thomasrockhu-codecov wants to merge 1 commit intomainfrom
fix/finisher-per-commit-concurrency-limit

Conversation

@thomasrockhu-codecov
Copy link
Copy Markdown
Contributor

@thomasrockhu-codecov thomasrockhu-codecov commented Mar 10, 2026

Summary

  • Add a Redis-based per-commit concurrency gate to upload_finisher tasks. Only MAX_CONCURRENT_FINISHERS_PER_COMMIT (3) finisher tasks can actively work on a given commit at any time. Excess tasks exit in milliseconds, freeing their worker slots immediately.

Why

Large CI matrices fire one finisher task per upload via chord callbacks. All these finishers fight for the same per-commit Redis lock (UPLOAD_PROCESSING). With blocking_timeout=30s, each excess task wastes a worker slot for 30 seconds before failing, retrying with backoff, and going back to the queue. This creates a cascading worker starvation effect.

How it works

run_impl():
    active_count = redis.incr("upload_finisher_active:{repoid}:{commitid}")
    redis.expire(key, 660)  # safety TTL, slightly beyond soft_time_limit

    if active_count > MAX_CONCURRENT_FINISHERS_PER_COMMIT:
        redis.decr(key)
        return {concurrency_limited: True}   # exits in ~1ms

    try:
        ... existing logic ...
    finally:
        redis.decr(key)
  • INCR on entry: atomically claims a slot
  • Limit check: if over the limit, immediately decrement and exit
  • Finally block: guarantees decrement on all exit paths (success, error, timeout, retry)
  • TTL safety net: 660s TTL ensures counter resets even if a task crashes without reaching finally
  • Limit of 3 (not 1): allows failover if the active finisher crashes

Impact estimate

For a commit with N concurrent finishers:

  • Before: N tasks x 30s lock wait = significant wasted worker time per commit
  • After: 3 tasks proceed, (N-3) exit in ~1ms = near-zero overhead

Test plan

  • test_exits_early_when_concurrency_limit_reached — verifies tasks over the limit return {concurrency_limited: True} and decrement the counter
  • test_proceeds_when_under_concurrency_limit — verifies tasks under the limit proceed normally and decrement on exit
  • test_counter_decremented_on_exception — verifies the counter is decremented even when the task throws an exception
  • Existing tests pass

Large CI matrices (e.g. stacks-network with 167 uploads, mixpanel with
100+) fire one finisher task per upload. All of them fight for the same
per-commit Redis lock, blocking worker slots for up to 30s each before
retrying. This starves the worker pool and causes a cascading queue
buildup (44k+ tasks).

Add a Redis INCR-based concurrency gate: only
MAX_CONCURRENT_FINISHERS_PER_COMMIT (3) tasks can actively work on a
given commit at any time. Excess tasks exit in milliseconds, freeing
their worker slots immediately. The counter is decremented in a finally
block with a 660s TTL safety net.

Made-with: Cursor
@thomasrockhu-codecov thomasrockhu-codecov deleted the fix/finisher-per-commit-concurrency-limit branch March 10, 2026 23:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant