0.14.0: listener can remain deleted after AutoscalingRunnerSet patch when minRunners > 0, leaving jobs queued #4432
Description
Checks
- I've already read https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors and I'm sure my issue is not covered in the troubleshooting guide.
- I am using charts that are officially provided
Controller Version
0.14.0
Deployment Method
Helm
Checks
- This isn't a question or user support case (For Q&A and community support, go to Discussions).
- I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes.
To Reproduce
- Deploy the official `gha-runner-scale-set-controller` / `gha-runner-scale-set` charts at `0.14.0`.
- Configure an org-level scale set with `minRunners > 0` (in our case `minRunners: 2`, `maxRunners: 4`).
- Let the pool keep warm idle runners.
- Patch the `AutoscalingRunnerSet` spec (in our case we changed `minRunners`/`maxRunners` a few times while tuning the pool).
- The controller deletes the out-of-date listener.
- If warm runners are still present, the listener is not recreated and new GitHub jobs targeting that scale set remain queued.
Describe the bug
We are seeing a repeated listener deadlock on 0.14.0 with a warm ARC build pool.
After patching the build AutoscalingRunnerSet, the controller deletes the listener and then gets stuck in this state:
AutoscalingListener does not exist. Creating a new AutoscalingListener is waiting for the running and pending runners to finish
The important detail is that the remaining runners are no longer active jobs; they are idle warm runners kept alive by `minRunners > 0`.
When this happens:
- the scale set has no listener pod
- at least one warm `EphemeralRunner` is still present / `Running`
- new workflow jobs targeting the scale set stay `queued` indefinitely
Manual workaround: deleting the idle EphemeralRunner immediately allows ARC to recreate the listener, and the queued workflow starts running.
This looks very similar to #4200, but we are reproducing it on 0.14.0, which should already include #4289.
Describe the expected behavior
Patching an AutoscalingRunnerSet with minRunners > 0 should not leave the scale set without a listener while idle warm runners still exist.
ARC should either:
- recreate the listener while warm runners still exist, or
- correctly distinguish idle warm runners from runners that should block listener recreation
New jobs should not remain queued indefinitely after a scale set patch.
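To make the second option concrete, here is a hypothetical sketch (not the actual controller code; the real `EphemeralRunner` status fields differ) of a listener-recreation gate that counts only runners with an assigned job as blocking, so idle warm runners from the `minRunners` pool would not prevent the listener from coming back:

```go
package main

import "fmt"

// EphemeralRunner is a simplified, illustrative model of the CRD;
// field names here are assumptions, not the real API.
type EphemeralRunner struct {
	Phase        string // e.g. "Running", "Pending"
	JobRequestID int64  // 0 when no job is assigned (idle warm runner)
}

// countBlockingRunners counts only runners that are actually working a job.
// Idle warm runners (JobRequestID == 0) are not counted, so they would not
// block listener recreation after an AutoscalingRunnerSet patch.
func countBlockingRunners(runners []EphemeralRunner) int {
	n := 0
	for _, r := range runners {
		if (r.Phase == "Running" || r.Phase == "Pending") && r.JobRequestID != 0 {
			n++
		}
	}
	return n
}

func main() {
	runners := []EphemeralRunner{
		{Phase: "Running", JobRequestID: 0},    // idle warm runner
		{Phase: "Running", JobRequestID: 4217}, // runner with an assigned job
	}
	fmt.Println(countBlockingRunners(runners)) // prints 1
}
```

Under a gate like this, the state we hit (two `Running` but idle warm runners, `pending: 0`) would report zero blocking runners and the listener would be recreated immediately.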
Additional Context
Relevant AutoscalingRunnerSet config:
```yaml
spec:
  runnerScaleSetName: <build-scale-set>
  githubConfigUrl: https://github.com/<redacted-org>
  minRunners: 2
  maxRunners: 4
  listenerTemplate:
    spec:
      nodeSelector:
        kubernetes.io/os: linux
        kubernetes.io/arch: amd64
  template:
    spec:
      nodeSelector:
        kubernetes.io/os: linux
        kubernetes.io/arch: amd64
      containers:
        - name: runner
          image: ghcr.io/actions/actions-runner@<redacted-digest>
          resources:
            requests:
              cpu: "4"
              memory: 8Gi
            limits:
              cpu: "16"
              memory: 32Gi
        - name: dind
          image: docker:dind@<redacted-digest>
          resources:
            requests:
              cpu: "4"
              memory: 8Gi
            limits:
              cpu: "16"
              memory: 32Gi
          securityContext:
            privileged: true
```

Observed on 2026-04-04.
One concrete affected workflow run was a build job in a private repository that stayed queued until we manually deleted the idle EphemeralRunner.
Controller Logs
Most relevant excerpt:
```text
2026-04-04T15:28:00Z INFO AutoscalingRunnerSet RunnerScaleSetListener is out of date. Deleting it so that it is recreated {"autoscalingrunnerset":{"name":"<build-scale-set>","namespace":"github-actions-runners"},"name":"<build-scale-set>-listener"}
2026-04-04T15:28:00Z INFO AutoscalingRunnerSet Deleted RunnerScaleSetListener since existing one is out of date {"autoscalingrunnerset":{"name":"<build-scale-set>","namespace":"github-actions-runners"}}
2026-04-04T15:28:02Z INFO AutoscalingRunnerSet AutoscalingListener does not exist. {"autoscalingrunnerset":{"name":"<build-scale-set>","namespace":"github-actions-runners"}}
2026-04-04T15:28:02Z INFO AutoscalingRunnerSet Creating a new AutoscalingListener is waiting for the running and pending runners to finish. Waiting for the running and pending runners to finish: {"autoscalingrunnerset":{"name":"<build-scale-set>","namespace":"github-actions-runners"},"running":2,"pending":0}
2026-04-04T15:31:01Z INFO AutoscalingRunnerSet AutoscalingListener does not exist. {"autoscalingrunnerset":{"name":"<build-scale-set>","namespace":"github-actions-runners"}}
2026-04-04T15:31:01Z INFO AutoscalingRunnerSet Creating a new AutoscalingListener is waiting for the running and pending runners to finish. Waiting for the running and pending runners to finish: {"autoscalingrunnerset":{"name":"<build-scale-set>","namespace":"github-actions-runners"},"running":1,"pending":0}
```

If additional controller or runner logs would be useful, I can provide a further-redacted subset privately in follow-up.
Runner Pod Logs
The runner pods themselves were not crash-looping. The issue was that idle warm runners remained present while the listener was gone, which blocked new job assignment until we manually deleted the idle EphemeralRunner.