docs: RFC to introduce node replacement strategies during drift, starting with optionally not requiring replacements #2906
@DerekFrank as discussed at the working group meeting, I moved the issue over from the AWS provider to the main Karpenter project. I've attached an early RFC for review and would appreciate any early feedback :)
Summarizing notes from working group:
@DerekFrank Updated the proposal to help answer the questions raised, summarizing here:
I've proposed a small tweak to the current disruption budget calculation to include Terminating and Initializing nodes in the list of total nodes in the NodePool AND in the 'disrupting' Nodes section. I believe this change more accurately reflects the spirit of the disruption budget to track all disrupted nodes, not necessarily just ones disrupted by Karpenter.
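To make the proposed tweak concrete, here is a minimal sketch of the calculation as described above. It is not Karpenter's actual code: the `NodeState` type, the `allowedDisruptions` function, and the percentage-only budget with round-up are simplifying assumptions for illustration.

```go
package main

import "fmt"

// NodeState is a hypothetical, simplified node lifecycle state used only
// for this sketch; it is not a Karpenter type.
type NodeState int

const (
	Ready NodeState = iota
	Initializing
	Terminating
)

// allowedDisruptions returns how many more nodes may be disrupted under a
// percentage budget. Under the proposed tweak, Initializing and Terminating
// nodes count toward the NodePool total AND toward the disrupting count.
// Rounding the percentage up is an assumption of this sketch.
func allowedDisruptions(nodes []NodeState, budgetPercent int) int {
	total := len(nodes) // includes Initializing and Terminating nodes
	disrupting := 0
	for _, state := range nodes {
		if state == Initializing || state == Terminating {
			disrupting++
		}
	}
	// Integer ceiling of total * budgetPercent / 100.
	budget := (total*budgetPercent + 99) / 100
	if remaining := budget - disrupting; remaining > 0 {
		return remaining
	}
	return 0
}

func main() {
	nodes := []NodeState{
		Ready, Ready, Ready, Ready, Ready, Ready, Ready, Ready,
		Initializing, Terminating,
	}
	// 30% of 10 nodes allows 3 disruptions; 2 are already in flight.
	fmt.Println(allowedDisruptions(nodes, 30)) // prints 1
}
```

In this sketch, a node that is already terminating or still initializing uses up budget regardless of what caused that state, which is the behavior the tweak is meant to capture.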
Did a quick read of the code and I think the answer is no. Since the disruption budget calculation is the same for both static and dynamic capacity, that change can be made in one common place.
I think it's a bit tricky to reason about for Consolidation since sometimes the decision to consolidate is based on one of the following:
As a result, I propose we don't touch consolidation for now but we can control it later if we want with changes to the API. For now, I'd like to have
I renamed to 'Terminate' since that is the only action the disruption controller takes and accurately captures the setting
Fixes #2905
Description
This change adds a small design for supporting different node replacement strategies during drift resolution, starting with not requiring replacements at all. The design attempts to extend the API such that other replacement strategies can be introduced over time.
How was this change tested?
Docs update only so far, no code yet.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.