0.14.0: listener can remain deleted after AutoscalingRunnerSet patch when minRunners > 0, leaving jobs queued #4432

@tonypottera24

Description

Controller Version

0.14.0

Deployment Method

Helm

Checks

  • This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes.

To Reproduce

  1. Deploy the official gha-runner-scale-set-controller / gha-runner-scale-set charts at 0.14.0.
  2. Configure an org-level scale set with minRunners > 0 (in our case minRunners: 2, maxRunners: 4).
  3. Wait until the pool has settled with warm idle runners.
  4. Patch the AutoscalingRunnerSet spec (in our case we changed minRunners / maxRunners a few times while tuning the pool).
  5. The controller deletes the out-of-date listener.
  6. If warm runners are still present, the listener is not recreated and new GitHub jobs targeting that scale set remain queued.
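Step 4 can be reproduced with a patch along these lines (a sketch only; the namespace and scale-set name are taken from this report and should be substituted for your install):

```shell
# Sketch of repro step 4: bump minRunners on the AutoscalingRunnerSet.
# Namespace and resource names here are assumptions based on this report.
patch_min_runners() {
  ns="$1"; ars="$2"; min="$3"
  kubectl -n "$ns" patch autoscalingrunnerset "$ars" \
    --type merge -p "{\"spec\":{\"minRunners\":$min}}"
}

# Example invocation against the affected cluster:
# patch_min_runners github-actions-runners build-scale-set 3
```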

Describe the bug

We are seeing a repeated listener deadlock on 0.14.0 with a warm ARC build pool.

After patching the build AutoscalingRunnerSet, the controller deletes the listener and then gets stuck in this state:

  • "AutoscalingListener does not exist."
  • "Creating a new AutoscalingListener is waiting for the running and pending runners to finish."

The important detail is that the remaining runners are not active jobs anymore; they are idle warm runners from minRunners > 0.

When this happens:

  • the scale set has no listener pod
  • at least one warm EphemeralRunner is still present / Running
  • new workflow jobs targeting the scale set stay queued indefinitely

Manual workaround: deleting the idle EphemeralRunner immediately allows ARC to recreate the listener, and the queued workflow starts running.
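A minimal sketch of that workaround, assuming kubectl access to the affected namespace (the namespace is from this report; the idle runner's name has to be looked up first):

```shell
# Workaround sketch: delete the idle EphemeralRunner so ARC can recreate
# the listener. The runner name passed in is whichever runner is idle.
unblock_listener() {
  ns="$1"; runner="$2"
  # Inspect current runners to identify the idle one.
  kubectl -n "$ns" get ephemeralrunners
  # Deleting it immediately unblocks listener creation in our experience.
  kubectl -n "$ns" delete ephemeralrunner "$runner"
}

# Example: unblock_listener github-actions-runners <idle-runner-name>
```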

This looks very similar to #4200, but we are reproducing it on 0.14.0, which should already include #4289.

Describe the expected behavior

Patching an AutoscalingRunnerSet with minRunners > 0 should not leave the scale set without a listener while idle warm runners still exist.

ARC should either:

  • recreate the listener while warm runners still exist, or
  • correctly distinguish idle warm runners from runners that should block listener recreation

New jobs should not remain queued indefinitely after a scale set patch.

Additional Context

Relevant AutoscalingRunnerSet config:

spec:
  runnerScaleSetName: <build-scale-set>
  githubConfigUrl: https://github.com/<redacted-org>
  minRunners: 2
  maxRunners: 4
  listenerTemplate:
    spec:
      nodeSelector:
        kubernetes.io/os: linux
        kubernetes.io/arch: amd64
  template:
    spec:
      nodeSelector:
        kubernetes.io/os: linux
        kubernetes.io/arch: amd64
      containers:
        - name: runner
          image: ghcr.io/actions/actions-runner@<redacted-digest>
          resources:
            requests:
              cpu: "4"
              memory: 8Gi
            limits:
              cpu: "16"
              memory: 32Gi
        - name: dind
          image: docker:dind@<redacted-digest>
          resources:
            requests:
              cpu: "4"
              memory: 8Gi
            limits:
              cpu: "16"
              memory: 32Gi
          securityContext:
            privileged: true

Observed on 2026-04-04.

One concrete affected workflow run was a build job in a private repository that stayed queued until we manually deleted the idle EphemeralRunner.

Controller Logs

Most relevant excerpt:

2026-04-04T15:28:00Z INFO AutoscalingRunnerSet RunnerScaleSetListener is out of date. Deleting it so that it is recreated {"autoscalingrunnerset":{"name":"<build-scale-set>","namespace":"github-actions-runners"},"name":"<build-scale-set>-listener"}
2026-04-04T15:28:00Z INFO AutoscalingRunnerSet Deleted RunnerScaleSetListener since existing one is out of date {"autoscalingrunnerset":{"name":"<build-scale-set>","namespace":"github-actions-runners"}}
2026-04-04T15:28:02Z INFO AutoscalingRunnerSet AutoscalingListener does not exist. {"autoscalingrunnerset":{"name":"<build-scale-set>","namespace":"github-actions-runners"}}
2026-04-04T15:28:02Z INFO AutoscalingRunnerSet Creating a new AutoscalingListener is waiting for the running and pending runners to finish. Waiting for the running and pending runners to finish: {"autoscalingrunnerset":{"name":"<build-scale-set>","namespace":"github-actions-runners"},"running":2,"pending":0}
2026-04-04T15:31:01Z INFO AutoscalingRunnerSet AutoscalingListener does not exist. {"autoscalingrunnerset":{"name":"<build-scale-set>","namespace":"github-actions-runners"}}
2026-04-04T15:31:01Z INFO AutoscalingRunnerSet Creating a new AutoscalingListener is waiting for the running and pending runners to finish. Waiting for the running and pending runners to finish: {"autoscalingrunnerset":{"name":"<build-scale-set>","namespace":"github-actions-runners"},"running":1,"pending":0}
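A quick way to confirm this specific stuck state from a saved controller log dump (a sketch; the sample line is copied from the excerpt above):

```shell
# Detect the deadlock signature in a saved controller log: listener
# recreation is blocked while pending is 0, i.e. only warm "running"
# runners are holding it back.
cat > controller.log <<'EOF'
2026-04-04T15:31:01Z INFO AutoscalingRunnerSet Creating a new AutoscalingListener is waiting for the running and pending runners to finish. Waiting for the running and pending runners to finish: {"autoscalingrunnerset":{"name":"<build-scale-set>","namespace":"github-actions-runners"},"running":1,"pending":0}
EOF

grep 'waiting for the running and pending runners to finish' controller.log |
  grep '"pending":0'
```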

If additional controller or runner logs would be useful, I can provide a further-redacted subset privately in follow-up.

Runner Pod Logs

The runner pods themselves were not crash-looping. In our case the issue was that idle warm runners remained present while the listener was gone, which blocked new job assignment until we manually deleted the idle EphemeralRunner.
