docs: Support Capacity Buffers in Karpenter#2898

Open
sumukha-radhakrishna wants to merge 7 commits into kubernetes-sigs:main from sumukha-radhakrishna:capacity-buffers-rfc

Conversation

@sumukha-radhakrishna
Contributor

Fixes #749 #2571

Description

Support sig-autoscaling CapacityBuffer API

How was this change tested?

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sumukha-radhakrishna
Once this PR has been reviewed and has the lgtm label, please assign maciekpytel for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the "cncf-cla: yes" label (Indicates the PR's author has signed the CNCF CLA.) Mar 9, 2026
@k8s-ci-robot requested review from tallaxes and tzneal Mar 9, 2026 20:49
@k8s-ci-robot added the "needs-ok-to-test" label (Indicates a PR that requires an org member to verify it is safe to test.) Mar 9, 2026
@k8s-ci-robot
Contributor

Hi @sumukha-radhakrishna. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Tip

We noticed you've done this a few times! Consider joining the org to skip this step and gain /lgtm and other bot rights. We recommend asking approvers on your previous PRs to sponsor you.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot added the "size/XL" label (Denotes a PR that changes 500-999 lines, ignoring generated files.) Mar 9, 2026
@sumukha-radhakrishna changed the title from "Capacity buffers rfc" to "docs: Support Capacity Buffers in Karpenter" Mar 9, 2026

**Ephemeral Capacity Strategy:**

After the initial active buffer implementation, we will implement an ephemeral capacity strategy to support batch systems like Kueue. The ephemeral strategy will provide:
Collaborator

Nit: To harden support for batch systems. Kueue can work with active buffers; it's just a bit racy.

- Ephemeral capacity strategy: One-time capacity request pattern for batch systems like Kueue (deferred to future work)
- Adding `expireAfter` field to the API in initial implementation (requires upstream sig-autoscaling consensus)

# Future Work
Collaborator

Nit: this should probably all go toward the end; we want to talk about what we are working on first, rather than what we will work on later.

- No → Create NodeClaims, keep `Provisioning: False` until nodes are available
4. Only sets buffer status to `Provisioning: True` when virtual pods can be successfully placed on existing cluster capacity without creating new NodeClaims

**Key Point:** Virtual pods are reconstructed every provisioning loop from buffer status. No pod objects are created. The `Provisioning: True` status reflects actual available capacity in the cluster, ensuring the status accurately represents whether buffer capacity is ready for use even if NodeClaims fail to provision.
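
To make the flow above concrete, here is a minimal Go sketch of one provisioning-loop pass, under stated assumptions. It is not the RFC's implementation: `BufferStatus`, `Resources`, `Node`, `virtualPods`, `placeOnExisting`, and `reconcile` are hypothetical names invented for illustration, resources are reduced to CPU and memory, and placement is a naive first-fit rather than Karpenter's scheduler. It only shows the two properties the text calls out: virtual pods are rebuilt in memory from buffer status on every pass, and `Provisioning` is true only when all of them fit on existing capacity.

```go
package main

import "fmt"

// Resources is a simplified resource vector (assumption: the real code
// would use Kubernetes resource lists, not two integers).
type Resources struct {
	CPUMilli int64
	MemoryMi int64
}

// BufferStatus is a hypothetical stand-in for the resolved buffer status:
// how many pod-shaped chunks of capacity the buffer should hold.
type BufferStatus struct {
	Replicas int
	PodShape Resources
}

// Node describes the free capacity left on an existing node.
type Node struct {
	Name string
	Free Resources
}

// virtualPods rebuilds the in-memory virtual pods from buffer status.
// Nothing is written to the API server; the slice is discarded after the pass.
func virtualPods(s BufferStatus) []Resources {
	pods := make([]Resources, s.Replicas)
	for i := range pods {
		pods[i] = s.PodShape
	}
	return pods
}

// fits reports whether a pod's request fits in the given free capacity.
func fits(pod, free Resources) bool {
	return pod.CPUMilli <= free.CPUMilli && pod.MemoryMi <= free.MemoryMi
}

// placeOnExisting runs first-fit placement against a copy of the nodes'
// free capacity and returns how many virtual pods could not be placed.
func placeOnExisting(pods []Resources, nodes []Node) int {
	free := make([]Resources, len(nodes))
	for i, n := range nodes {
		free[i] = n.Free
	}
	unplaced := 0
	for _, p := range pods {
		placed := false
		for i := range free {
			if fits(p, free[i]) {
				free[i].CPUMilli -= p.CPUMilli
				free[i].MemoryMi -= p.MemoryMi
				placed = true
				break
			}
		}
		if !placed {
			unplaced++
		}
	}
	return unplaced
}

// reconcile performs one pass: Provisioning is true only if every virtual
// pod fits on existing capacity; otherwise it reports how many pods would
// need new NodeClaims.
func reconcile(s BufferStatus, nodes []Node) (provisioning bool, needNodeClaims int) {
	unplaced := placeOnExisting(virtualPods(s), nodes)
	return unplaced == 0, unplaced
}

func main() {
	status := BufferStatus{Replicas: 3, PodShape: Resources{CPUMilli: 1000, MemoryMi: 2048}}
	nodes := []Node{{Name: "node-a", Free: Resources{CPUMilli: 2000, MemoryMi: 4096}}}
	ok, pending := reconcile(status, nodes)
	fmt.Printf("Provisioning=%t, virtual pods needing NodeClaims=%d\n", ok, pending)
}
```

Because `reconcile` derives the condition from current free capacity on each pass, a NodeClaim that fails to launch simply leaves the virtual pods unplaced, and the status stays `Provisioning: False` until real capacity appears.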
Collaborator

I assume we'll cache the recreated pods?

Labels

cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
needs-ok-to-test (Indicates a PR that requires an org member to verify it is safe to test.)
size/XL (Denotes a PR that changes 500-999 lines, ignoring generated files.)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Mega Issue: Manual node provisioning

4 participants