fix: skip ephemeral volume topology requirements for bound pods during consolidation #2907

Open
moko-poi wants to merge 1 commit into kubernetes-sigs:main from moko-poi:fix/ephemeral-volume-consolidation-tsc

Conversation

@moko-poi
Contributor

@moko-poi moko-poi commented Mar 11, 2026

Summary

Fix for #2803: ephemeral volumes combined with zone TopologySpreadConstraints block consolidation and emit Unconsolidatable events.

Skip ephemeral volume topology requirements when evaluating bound pods for consolidation, since ephemeral PVCs are deleted with the pod and a new PVC will be created on rescheduling. This primarily targets WaitForFirstConsumer StorageClasses, which is the common case for ephemeral volumes. See inline comment for discussion on Immediate binding mode.

Test plan

  • Unit tests: bound pod + ephemeral volume returns no topology requirements
  • Unit tests: unbound pod + ephemeral volume returns topology requirements (unchanged behavior)
  • Unit tests: bound pod + regular PVC returns topology requirements (unchanged behavior)
  • Integration test: consolidation replaces node with ephemeral volumes and zone TSC
  • Full scheduling test suite passes (339 tests)
  • Full disruption test suite passes (233 tests)
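
The three unit-test cases above can be sketched as a small table-driven check. This is only an illustration under stated assumptions: `Pod`, `Volume`, and `volumeZoneRequirements` are hypothetical stand-ins written for this sketch, not Karpenter's or Kubernetes' actual types or functions.

```go
package main

import "fmt"

// Volume is a simplified, hypothetical stand-in for a pod volume.
type Volume struct {
	Ephemeral bool     // true for a generic ephemeral volume
	Zones     []string // zones the volume's current PVC is pinned to
}

// Pod is a simplified, hypothetical stand-in for a Kubernetes pod.
type Pod struct {
	NodeName string // empty while the pod is not bound to a node
	Volume   Volume
}

// volumeZoneRequirements returns the zones the pod's volume restricts it to.
// A bound pod with an ephemeral volume yields none, because that PVC is
// deleted with the pod and a new one is created on rescheduling.
func volumeZoneRequirements(p Pod) []string {
	if p.Volume.Ephemeral && p.NodeName != "" {
		return nil
	}
	return p.Volume.Zones
}

func main() {
	zones := []string{"zone-a"}
	cases := []struct {
		name string
		pod  Pod
		want int // expected number of zone requirements
	}{
		{"bound pod + ephemeral volume", Pod{"node-1", Volume{true, zones}}, 0},
		{"unbound pod + ephemeral volume", Pod{"", Volume{true, zones}}, 1},
		{"bound pod + regular PVC", Pod{"node-1", Volume{false, zones}}, 1},
	}
	for _, c := range cases {
		got := len(volumeZoneRequirements(c.pod))
		fmt.Printf("%s: got %d, want %d\n", c.name, got, c.want)
	}
}
```

Only the first case changes behavior; the other two guard against the skip applying too broadly.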

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 11, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: moko-poi
Once this PR has been reviewed and has the lgtm label, please assign tzneal for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested review from jmdeal and tallaxes March 11, 2026 09:28
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Mar 11, 2026
@k8s-ci-robot
Contributor

Hi @moko-poi. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Tip

We noticed you've done this a few times! Consider joining the org to skip this step and gain /lgtm and other bot rights. We recommend asking approvers on your previous PRs to sponsor you.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Mar 11, 2026
@moko-poi moko-poi marked this pull request as ready for review April 2, 2026 01:42
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 2, 2026
@moko-poi
Contributor Author

moko-poi commented Apr 2, 2026

@jmdeal Thanks for the analysis on this issue! I wanted to get your thoughts on one thing.

In your comment on #2803, you noted:

One nuance that I'll note is that if the VolumeBindingMode is Immediate, we would still want to honor the ephemeral volume constraints.

The current implementation skips topology requirements for all ephemeral volumes on bound pods, regardless of VolumeBindingMode:

if volume.Ephemeral != nil && pod.Spec.NodeName != "" {
    return nil, nil
}

For WaitForFirstConsumer, this is correct — the new PVC will bind to whatever zone the pod is scheduled to, so the old PVC's zone constraint is irrelevant.

For Immediate, the situation is more nuanced. If we skip the constraint, Karpenter may create a replacement node in a zone that differs from where the provisioner binds the new PVC, which could leave the pod temporarily unschedulable. However, if we don't skip, the old PVC's zone constraint will continue to cause incorrect TSC counts, blocking consolidation entirely (which is the current bug). Since the old PVC is deleted with the pod and the new PVC's zone is determined independently by the provisioner, honoring the old constraint doesn't guarantee correctness either.

Given that Immediate + ephemeral volumes is an uncommon combination and skipping is still an improvement over the current behavior (permanently blocked consolidation), I went with the simpler approach for now. That said, if you think Immediate mode warrants additional handling, I'd like to discuss what the right approach would be. Let me know your thoughts.
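
For reference, one possible shape for Immediate-aware handling is to skip the ephemeral volume's topology requirements only when the StorageClass uses WaitForFirstConsumer. The sketch below is an assumption-laden illustration, not what this PR implements: `BindingMode`, its constants, and `skipEphemeralTopology` are hypothetical names defined locally, and looking up the StorageClass for the volume is left out entirely.

```go
package main

import "fmt"

// BindingMode is a local, hypothetical stand-in for a StorageClass's
// volumeBindingMode; the string values mirror the two Kubernetes modes.
type BindingMode string

const (
	Immediate            BindingMode = "Immediate"
	WaitForFirstConsumer BindingMode = "WaitForFirstConsumer"
)

// skipEphemeralTopology reports whether an ephemeral volume's zone
// requirements could be ignored for a pod. In this sketch they are skipped
// only for bound pods whose StorageClass delays binding until scheduling,
// since the replacement PVC then follows the pod's new zone.
func skipEphemeralTopology(podBound bool, mode BindingMode) bool {
	return podBound && mode == WaitForFirstConsumer
}

func main() {
	for _, mode := range []BindingMode{WaitForFirstConsumer, Immediate} {
		fmt.Printf("bound pod, %s: skip=%v\n", mode, skipEphemeralTopology(true, mode))
	}
}
```

The trade-off described above still applies: with this variant, Immediate-mode ephemeral volumes would keep honoring the old PVC's zone, so the consolidation blockage from #2803 would remain for that combination.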

@moko-poi moko-poi force-pushed the fix/ephemeral-volume-consolidation-tsc branch from 9ee5051 to 148ee66 Compare April 2, 2026 02:06
