fix: Track pending nodeclaims in provisioning state #2947
w1t wants to merge 1 commit into kubernetes-sigs:main
Conversation
Seed cluster state with a tracked copy of newly created NodeClaims before lifecycle status arrives so pending claims are visible to scheduling and count against nodepool accounting during the async gap. Add a regression test covering repeated scheduling after an unlaunched NodeClaim remains.
Welcome @w1t!
Hi @w1t. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. Regular contributors should join the org to skip this step. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: w1t. Needs approval from an approver in each of these files. The full list of commands accepted by this bot can be found here.
Summary
Track newly created pending `NodeClaims` in cluster state immediately after `Create`, before lifecycle/status updates populate the real provider ID and resolved capacity fields.

This addresses kubernetes-sigs/karpenter#2854, where repeated scheduling rounds could create additional `NodeClaims` while an earlier claim remained pending and outside the state used by provisioning. The change also adds a regression test covering repeated scheduling while an unlaunched `NodeClaim` remains pending.

Problem
The reported failure mode is:

1. Scheduling creates a `NodeClaim`.
2. The `NodeClaim` remains pending, without a provider ID.
3. A later scheduling round does not see it and creates another `NodeClaim`.

That can repeat when nodes fail to register, allowing over-creation relative to the intended
`NodePool` budget.

Fix
After `Create` persists the `NodeClaim`, update cluster state with a tracked copy when the claim does not yet have a `ProviderID`. That tracked copy:

- sets `Status.Capacity` to the max capacity across `InstanceTypeOptions`
- sets `Status.Allocatable` to the max allocatable across `InstanceTypeOptions`

This makes the pending claim visible to the same state consumed by provisioning before lifecycle/status reconciliation catches up.
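The max-across-options aggregation can be sketched as follows. This is a simplified, self-contained model: plain `int64` millicores/bytes stand in for Karpenter's `resource.Quantity` values, and the type and function names are hypothetical, not the PR's actual code:

```go
package main

import "fmt"

// instanceType is a simplified stand-in for an InstanceTypeOption;
// only CPU (millicores) and memory (bytes) are modeled here.
type instanceType struct {
	name     string
	cpuMilli int64
	memBytes int64
}

// pessimisticResources returns, per resource, the maximum value across the
// candidate instance types -- the reservation recorded on the tracked copy
// of a pending NodeClaim in this sketch.
func pessimisticResources(options []instanceType) (cpuMilli, memBytes int64) {
	for _, it := range options {
		if it.cpuMilli > cpuMilli {
			cpuMilli = it.cpuMilli
		}
		if it.memBytes > memBytes {
			memBytes = it.memBytes
		}
	}
	return cpuMilli, memBytes
}

func main() {
	opts := []instanceType{
		{"c-small", 2000, 4 << 30},
		{"m-large", 4000, 16 << 30},
		{"c-xlarge", 8000, 8 << 30},
	}
	cpu, mem := pessimisticResources(opts)
	fmt.Println(cpu, mem>>30) // 8000 16
}
```

Note that the maxima for different resources may come from different instance types, which is why the result is a pessimistic envelope rather than the footprint of any single candidate.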
Depending on the pod's constraints, duplicate creation is prevented in one of two ways: either the pending pod is scheduled against the tracked in-flight claim, or `NodePool` limits are reached before another `NodeClaim` is produced.

Regression test
The new regression test covers the reported loop:

1. Create a `NodePool` with limits.
2. A first scheduling round creates a `NodeClaim`.
3. The `NodeClaim` remains unlaunched.
4. A second scheduling round runs, and no additional `NodeClaim` is created.

The important end-to-end behavior is that the first pending claim is already represented in cluster state, so the second scheduling round does not create another claim.
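The loop the test guards against can be sketched with a toy accounting model. Everything here (the `claim` type, the `schedule` function, and the numbers) is illustrative and not Karpenter's actual code:

```go
package main

import "fmt"

// claim models a NodeClaim in this simplified sketch.
type claim struct {
	cpuMilli int64 // requested CPU in millicores
	tracked  bool  // whether cluster state knows about this claim
}

// schedule simulates one provisioning round against a CPU limit: it only
// counts claims visible in cluster state, so an untracked pending claim
// does not block creation of a duplicate.
func schedule(existing []claim, podCPU, limitMilli int64) (created bool) {
	var used int64
	for _, c := range existing {
		if c.tracked {
			used += c.cpuMilli
		}
	}
	return used+podCPU <= limitMilli
}

func main() {
	// Round 1: no claims yet, so a claim for the pod is created.
	fmt.Println(schedule(nil, 4000, 4000)) // true

	// Round 2, before the fix: the pending claim is invisible, so another
	// claim is created even though the budget is already spoken for.
	pending := claim{cpuMilli: 4000, tracked: false}
	fmt.Println(schedule([]claim{pending}, 4000, 4000)) // true (the bug)

	// Round 2, after the fix: the tracked pending claim blocks a duplicate.
	pending.tracked = true
	fmt.Println(schedule([]claim{pending}, 4000, 4000)) // false
}
```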
Tradeoff
This uses a pessimistic reservation model for pending claims by taking the max capacity/allocatable across the candidate instance types. That can temporarily over-reserve in tight-limit pools when the eventual launched instance would have been smaller, but it is the intended tradeoff for this fix: prefer bounding provisioning during the async gap over allowing repeated over-creation while pending claims are invisible.
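As a hypothetical worked example of the over-reservation window (the limit and instance sizes are made up, not from the PR):

```go
package main

import "fmt"

// headroom returns the remaining CPU (millicores) under a limit after a
// reservation; a tiny helper for illustrating the over-reserve window.
func headroom(limitMilli, reservedMilli int64) int64 {
	return limitMilli - reservedMilli
}

func main() {
	const limit = 16000               // NodePool CPU limit: 16 cores
	candidates := []int64{4000, 8000} // candidate instance sizes
	reserved := candidates[0]
	for _, c := range candidates {
		if c > reserved {
			reserved = c
		}
	}
	// While the claim is pending, the max candidate (8 cores) is reserved,
	// even if the 4-core instance ultimately launches.
	fmt.Println(headroom(limit, reserved))      // 8000: pessimistic window
	fmt.Println(headroom(limit, candidates[0])) // 12000: once launch resolves
}
```

The gap between the two values is temporary: once lifecycle reconciliation records the launched instance's real capacity, the reservation shrinks to the actual footprint.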
Issue
Fixes #2854
PR #2526 addressed node-count limit enforcement once a pending claim is represented in `StateNode` capacity (by adding `Node` as a static resource). This PR addresses the earlier gap for resource limits (CPU, memory), before a newly created pending `NodeClaim` has a real `ProviderID` and enters state in a form provisioning can reuse.