fix: Track pending nodeclaims in provisioning state #2947
w1t wants to merge 1 commit into kubernetes-sigs:main
Conversation
Seed cluster state with a tracked copy of newly created NodeClaims before lifecycle status arrives so pending claims are visible to scheduling and count against nodepool accounting during the async gap. Add a regression test covering repeated scheduling after an unlaunched NodeClaim remains.
Welcome @w1t!
Hi @w1t. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. Regular contributors should join the org to skip this step. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: w1t. Needs approval from an approver in each of these files. The full list of commands accepted by this bot can be found here.
Summary
Track newly created pending `NodeClaims` in cluster state immediately after `Create`, before lifecycle/status updates populate the real provider ID and resolved capacity fields.

This addresses kubernetes-sigs/karpenter#2854, where repeated scheduling rounds could create additional `NodeClaims` while an earlier claim remained pending and outside the state used by provisioning. The change also adds a regression test covering repeated scheduling while an unlaunched `NodeClaim` remains pending.

Problem
The reported failure mode is:

1. Scheduling creates a `NodeClaim`.
2. The `NodeClaim` remains pending, without a provider ID.
3. A later scheduling round does not see it and creates another `NodeClaim`.

That can repeat when nodes fail to register, allowing over-creation relative to the intended
`NodePool` budget.

Fix
After `Create` persists the `NodeClaim`, update cluster state with a tracked copy when the claim does not yet have a `ProviderID`. That tracked copy:

- sets `Status.Capacity` to the max capacity across `InstanceTypeOptions`
- sets `Status.Allocatable` to the max allocatable across `InstanceTypeOptions`

This makes the pending claim visible to the same state consumed by provisioning before lifecycle/status reconciliation catches up.
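The max-across-options aggregation can be sketched as follows. This is a simplified, self-contained model: plain `int64` millicores/bytes stand in for Karpenter's `resource.Quantity` values, and the type and function names are hypothetical, not the PR's actual code:

```go
package main

import "fmt"

// instanceType is a simplified stand-in for an InstanceTypeOption;
// only CPU (millicores) and memory (bytes) are modeled here.
type instanceType struct {
	name     string
	cpuMilli int64
	memBytes int64
}

// pessimisticResources returns, per resource, the maximum value across the
// candidate instance types -- the reservation recorded on the tracked copy
// of a pending NodeClaim in this sketch.
func pessimisticResources(options []instanceType) (cpuMilli, memBytes int64) {
	for _, it := range options {
		if it.cpuMilli > cpuMilli {
			cpuMilli = it.cpuMilli
		}
		if it.memBytes > memBytes {
			memBytes = it.memBytes
		}
	}
	return cpuMilli, memBytes
}

func main() {
	opts := []instanceType{
		{"c-small", 2000, 4 << 30},
		{"m-large", 4000, 16 << 30},
		{"c-xlarge", 8000, 8 << 30},
	}
	cpu, mem := pessimisticResources(opts)
	fmt.Println(cpu, mem>>30) // 8000 16
}
```

Note that the maxima for different resources may come from different instance types, which is why the result is a pessimistic envelope rather than the footprint of any single candidate.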
Depending on the pod's constraints, duplicate creation is prevented in one of two ways: either the pending pod is scheduled against the tracked in-flight claim, or `NodePool` limits are reached before another `NodeClaim` is produced.

Regression test
The new regression test covers the reported loop:

1. Create a `NodePool` with limits.
2. A first scheduling round creates a `NodeClaim`.
3. The `NodeClaim` remains unlaunched.
4. A second scheduling round runs, and no additional `NodeClaim` is created.

The important end-to-end behavior is that the first pending claim is already represented in cluster state, so the second scheduling round does not create another claim.
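The loop the test guards against can be sketched with a toy accounting model. Everything here (the `claim` type, the `schedule` function, and the numbers) is illustrative and not Karpenter's actual code:

```go
package main

import "fmt"

// claim models a NodeClaim in this simplified sketch.
type claim struct {
	cpuMilli int64 // requested CPU in millicores
	tracked  bool  // whether cluster state knows about this claim
}

// schedule simulates one provisioning round against a CPU limit: it only
// counts claims visible in cluster state, so an untracked pending claim
// does not block creation of a duplicate.
func schedule(existing []claim, podCPU, limitMilli int64) (created bool) {
	var used int64
	for _, c := range existing {
		if c.tracked {
			used += c.cpuMilli
		}
	}
	return used+podCPU <= limitMilli
}

func main() {
	// Round 1: no claims yet, so a claim for the pod is created.
	fmt.Println(schedule(nil, 4000, 4000)) // true

	// Round 2, before the fix: the pending claim is invisible, so another
	// claim is created even though the budget is already spoken for.
	pending := claim{cpuMilli: 4000, tracked: false}
	fmt.Println(schedule([]claim{pending}, 4000, 4000)) // true (the bug)

	// Round 2, after the fix: the tracked pending claim blocks a duplicate.
	pending.tracked = true
	fmt.Println(schedule([]claim{pending}, 4000, 4000)) // false
}
```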
Tradeoff
This uses a pessimistic reservation model for pending claims by taking the max capacity/allocatable across the candidate instance types. That can temporarily over-reserve in tight-limit pools when the eventual launched instance would have been smaller, but it is the intended tradeoff for this fix: prefer bounding provisioning during the async gap over allowing repeated over-creation while pending claims are invisible.
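As a hypothetical worked example of the over-reservation window (the limit and instance sizes are made up, not from the PR):

```go
package main

import "fmt"

// headroom returns the remaining CPU (millicores) under a limit after a
// reservation; a tiny helper for illustrating the over-reserve window.
func headroom(limitMilli, reservedMilli int64) int64 {
	return limitMilli - reservedMilli
}

func main() {
	const limit = 16000               // NodePool CPU limit: 16 cores
	candidates := []int64{4000, 8000} // candidate instance sizes
	reserved := candidates[0]
	for _, c := range candidates {
		if c > reserved {
			reserved = c
		}
	}
	// While the claim is pending, the max candidate (8 cores) is reserved,
	// even if the 4-core instance ultimately launches.
	fmt.Println(headroom(limit, reserved))      // 8000: pessimistic window
	fmt.Println(headroom(limit, candidates[0])) // 12000: once launch resolves
}
```

The gap between the two values is temporary: once lifecycle reconciliation records the launched instance's real capacity, the reservation shrinks to the actual footprint.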
Issue
Fixes #2854
PR #2526 addressed node-count limit enforcement once a pending claim is represented in `StateNode` capacity (by adding `Node` as a static resource). This PR addresses the earlier gap for resource limits (CPU, memory), before a newly created pending `NodeClaim` has a real `ProviderID` and enters state in a form provisioning can reuse.