docs: RFC for building a pod deletion cost controller to enable Karpenter to work better with the ReplicaSet Controller #2935

Open
nathangeology wants to merge 1 commit into kubernetes-sigs:main from nathangeology:Pod-Deletion-Cost-RFC
nathangeology commented Mar 26, 2026

Description

Add RFC design document for the Pod Deletion Cost Controller — a new feature-gated controller that reduces voluntary pod disruption rates of replica pods during Karpenter consolidation.

Today, the ReplicaSet controller and Karpenter's consolidation controller operate independently. When a Deployment scales down, the ReplicaSet controller spreads pod deletions evenly across nodes. This keeps all nodes partially occupied and leaves Karpenter without good consolidation targets. The effect is especially painful for ConsolidateWhenEmpty NodePools, where a node must be completely free of pods before it can be removed.

This RFC proposes a controller that bridges the gap by setting controller.kubernetes.io/pod-deletion-cost annotations on pods based on their node's consolidation priority. Pods on nodes Karpenter wants to consolidate first get the lowest deletion costs, so the ReplicaSet controller naturally concentrates scale-down deletions on those nodes. The result is a positive feedback loop: targeted nodes drain faster, become empty sooner, and Karpenter consolidates them with less (or zero) disruption.
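
As a rough illustration of the mechanism described above, the following sketch maps a node ranking to per-pod deletion costs. The annotation key is the real one consumed by the ReplicaSet controller; the `Pod` type, the function names, and the use of the node's rank as the cost value are assumptions made for illustration and are not taken from the RFC. The victim selection is deliberately simplified: the real ReplicaSet controller applies other criteria (such as readiness) before it considers deletion cost.

```python
from dataclasses import dataclass

# Real annotation key read by the ReplicaSet controller; lower values are deleted first.
DELETION_COST = "controller.kubernetes.io/pod-deletion-cost"


@dataclass(frozen=True)
class Pod:
    name: str
    node: str


def deletion_costs(ranked_nodes, pods):
    """Map each pod name to a cost equal to its node's rank.

    ranked_nodes[0] is the node to consolidate first (hypothetical input),
    so its pods receive the lowest cost.
    """
    rank = {node: i for i, node in enumerate(ranked_nodes)}
    return {p.name: rank[p.node] for p in pods if p.node in rank}


def as_annotations(costs):
    """Render costs as annotation maps; annotation values must be strings."""
    return {name: {DELETION_COST: str(cost)} for name, cost in costs.items()}


def scale_down_victims(pods, costs, count):
    """Simplified stand-in for ReplicaSet victim selection: lowest cost first.

    Pods without a computed cost are treated as cost 0, the annotation's
    documented implicit value.
    """
    ordered = sorted(pods, key=lambda p: (costs.get(p.name, 0), p.name))
    return [p.name for p in ordered[:count]]
```

With a ranking that puts one node first, a scale-down of N replicas removes that node's pods before touching any other node, which is the concentration effect the RFC relies on.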

Key design points:

- Feature-gated behind PodDeletionCostManagement (defaults to false), no CRD changes
- Nodes partitioned into consolidate-able (Group A) vs do-not-disrupt (Group B) before ranking
- SHA-256 change detection skips reconcile cycles when cluster state is unchanged (zero API writes in steady state)
- Two-annotation protocol (pod-deletion-cost + karpenter.sh/managed-deletion-cost) ensures customer-set annotations are never overwritten
- Pluggable ranking strategies: pod count (recommended), unallocated vCPU per pod, node size, random
- New Prometheus metrics and Kubernetes events for observability
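
The SHA-256 change detection in the list above could look roughly like the sketch below. It assumes the controller can summarize the inputs to its ranking as a plain dictionary per node; the `state_digest` and `ChangeDetector` names and the shape of that dictionary are hypothetical and not taken from the RFC.

```python
import hashlib
import json


def state_digest(node_state):
    """Return a SHA-256 hex digest of a canonical JSON encoding of the state.

    sort_keys makes the digest independent of dictionary insertion order.
    """
    payload = json.dumps(node_state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ChangeDetector:
    """Remembers the last digest so an unchanged state can skip reconciliation."""

    def __init__(self):
        self._last = None

    def changed(self, node_state):
        digest = state_digest(node_state)
        if digest == self._last:
            return False  # nothing to do: no ranking, no API writes
        self._last = digest
        return True
```

A reconcile loop would call `changed(...)` first and return early when it reports `False`, which is how the steady state can cost zero API writes.
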
Full RFC:
pod-deletion-cost-controller.md
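
One way to read the two-annotation protocol from the key design points is sketched below. Both annotation keys appear in this description; the marker value `"true"` and the exact ownership rule (a cost annotation without the marker is treated as customer-set) are assumptions for illustration, and the RFC document is the authority on the actual rules.

```python
COST = "controller.kubernetes.io/pod-deletion-cost"
MANAGED = "karpenter.sh/managed-deletion-cost"


def apply_cost(annotations, cost):
    """Return the annotations a pod should carry after reconciliation.

    Assumed rule: a cost annotation that lacks the managed marker was set
    by the customer, so it is returned untouched.
    """
    if COST in annotations and MANAGED not in annotations:
        return dict(annotations)
    updated = dict(annotations)
    updated[COST] = str(cost)
    updated[MANAGED] = "true"  # assumed marker value
    return updated


def release(annotations):
    """Remove only controller-owned annotations, e.g. when the gate is disabled."""
    if MANAGED not in annotations:
        return dict(annotations)
    return {k: v for k, v in annotations.items() if k not in (COST, MANAGED)}
```

Keeping the ownership marker separate from the cost itself means cleanup can be done without guessing which values the customer wrote.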

Related Issues

- #2123
- https://github.com/aws/karpenter/issues/4356
- aws/karpenter-provider-aws#3785
- aws/karpenter-provider-aws#3927

Does this change impact any existing behavior?

No. This PR adds a design document only. The proposed controller is feature-gated off by default and requires explicit opt-in.

NOTE: I will be proposing changes to the ReplicaSet controller in parallel and will link that KEP here once it is submitted and ready. Only one of the two changes is needed (here or in the ReplicaSet controller) to see the benefits; having both does no harm.
ReplicaSet controller issue: kubernetes/enhancements#5982

k8s-ci-robot added the do-not-merge/work-in-progress (Indicates that a PR should not merge because it is a work in progress.) and cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.) labels Mar 26, 2026
@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: nathangeology
Once this PR has been reviewed and has the lgtm label, please assign tzneal for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added the needs-ok-to-test label (Indicates a PR that requires an org member to verify it is safe to test.) Mar 26, 2026
@k8s-ci-robot

Hi @nathangeology. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Tip

We noticed you've done this a few times! Consider joining the org to skip this step and gain /lgtm and other bot rights. We recommend asking approvers on your previous PRs to sponsor you.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot added the size/L label (Denotes a PR that changes 100-499 lines, ignoring generated files.) Mar 26, 2026
nathangeology

This comment was marked as resolved.

nathangeology added a commit to nathangeology/karpenter-core that referenced this pull request Apr 2, 2026
…n-aware scale-in (kp-1tj)

Add proposed updates document addressing review feedback on the pod
deletion cost controller RFC:

1. Fill in the ReplicaSet Controller Strategy section (was a TODO)
   covering short-term pod-deletion-cost approach and long-term
   ConsolidatingScaleDown KEP (kubernetes/enhancements#5982)
2. New Placement and Spreading Constraints section addressing
   reviewer concerns about topology spread and scheduling constraints
3. New risk entry for topology spread constraint interaction

This is a review-only document for human review before pushing
changes to the upstream PR.
nathangeology marked this pull request as ready for review April 2, 2026 21:14
k8s-ci-robot removed the do-not-merge/work-in-progress label (Indicates that a PR should not merge because it is a work in progress.) Apr 2, 2026
nathangeology added a commit to nathangeology/karpenter-core that referenced this pull request Apr 2, 2026
nathangeology force-pushed the Pod-Deletion-Cost-RFC branch from b980166 to 3897e53 April 3, 2026 16:51
k8s-ci-robot added the size/XXL label (Denotes a PR that changes 1000+ lines, ignoring generated files.) and removed the size/L label (Denotes a PR that changes 100-499 lines, ignoring generated files.) Apr 3, 2026
Proposes a feature-gated controller that ranks nodes by consolidation
preference (with drifted nodes prioritized) and propagates that ranking
to pods via pod-deletion-cost annotations. Three-tier ranking:
drifted nodes → normal nodes → do-not-disrupt nodes, each sorted by
pod count ascending to follow Karpenter's consolidation candidate sorting.
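
The three-tier ordering in this commit message can be sketched as a single sort with a composite key. The `Node` fields, the choice to let do-not-disrupt take precedence over drift when both apply, and the name tie-break are assumptions for illustration; the commit message only states the tier order and the ascending pod-count sort.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    pod_count: int
    drifted: bool = False
    do_not_disrupt: bool = False


def tier(node):
    """0 = drifted, 1 = normal, 2 = do-not-disrupt (assumed to win over drift)."""
    if node.do_not_disrupt:
        return 2
    return 0 if node.drifted else 1


def rank_nodes(nodes):
    """Order nodes by tier, then pod count ascending, then name for stability."""
    return sorted(nodes, key=lambda n: (tier(n), n.pod_count, n.name))
```

The position of a node in the returned list is what a cost-assignment step would translate into pod deletion costs, with earlier nodes receiving lower costs.
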
nathangeology force-pushed the Pod-Deletion-Cost-RFC branch from 3897e53 to eb6c4c2 April 3, 2026 16:55
k8s-ci-robot added the size/L label (Denotes a PR that changes 100-499 lines, ignoring generated files.) and removed the size/XXL label (Denotes a PR that changes 1000+ lines, ignoring generated files.) Apr 3, 2026
Labels

cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
needs-ok-to-test (Indicates a PR that requires an org member to verify it is safe to test.)
size/L (Denotes a PR that changes 100-499 lines, ignoring generated files.)
