KEP-5726: TopologyManager CPU-attached NUMA filter option #5992
fanzhangio wants to merge 1 commit into kubernetes:master
/ok-to-test
Signed-off-by: Fan Zhang <fanzhang@nvidia.com>
855131c to 8b8723b
@ffromani Could we add this KEP to the SIG-Node agenda for discussion? I’d be happy to present it in the meeting (or follow up asynchronously over email if that’s easier). Thanks.
Yes, it is 1.37 material, but the 1.37 cycle has not begun yet (we haven't finished 1.36). That said, you can present earlier to sig-node if you wish.
What type of PR is this?
/kind documentation
/kind cleanup
/kind kep
What this PR does / why we need it:
Adds a draft KEP for an alpha TopologyManager policy option:
restrict-to-cpu-numa-nodes
This option is intended to help kubelet handle large NUMA systems that expose many CPU-less NUMA nodes, such as NVIDIA Grace Blackwell / Vera Rubin-class platforms.
When enabled, TopologyManager computes an effective NUMA-node set containing only NUMA nodes with CPUs attached and propagates that same view to topology-aware hint providers.
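The filtering step described above can be illustrated with a minimal Go sketch. This is not the kubelet's implementation: the numaNode type and the cpuAttachedNUMANodes function are hypothetical names invented for illustration, and the sketch only shows the idea of keeping NUMA nodes that have at least one CPU attached.

```go
package main

import (
	"fmt"
	"sort"
)

// numaNode is a hypothetical, simplified view of one NUMA node.
// It is not a kubelet type.
type numaNode struct {
	ID   int
	CPUs []int // logical CPU IDs attached to this node; empty for CPU-less nodes
}

// cpuAttachedNUMANodes returns the sorted IDs of the NUMA nodes that have
// at least one CPU attached. CPU-less nodes are dropped from the result.
func cpuAttachedNUMANodes(nodes []numaNode) []int {
	ids := make([]int, 0, len(nodes))
	for _, n := range nodes {
		if len(n.CPUs) > 0 {
			ids = append(ids, n.ID)
		}
	}
	sort.Ints(ids)
	return ids
}

func main() {
	// Assumed example topology: two nodes with CPUs, two CPU-less nodes.
	nodes := []numaNode{
		{ID: 0, CPUs: []int{0, 1, 2, 3}},
		{ID: 1, CPUs: []int{4, 5, 6, 7}},
		{ID: 2}, // CPU-less
		{ID: 3}, // CPU-less
	}
	fmt.Println(cpuAttachedNUMANodes(nodes)) // prints [0 1]
}
```

In this sketch the effective set is computed once, so the same result could be handed to every hint provider, which matches the "same view" wording in the description. How the actual proposal propagates that view is defined in the KEP, not here.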
The proposal is intended to address the root issue discussed in:
This proposal also reworks and supersedes the earlier code-only approach discussed in:
Which issue(s) this PR fixes:
Part of kubernetes/kubernetes#135541
Special notes for your reviewer:
Main points of the proposal:
There is also related broader enhancement discussion in:
This KEP is intentionally scoped more narrowly around a kubelet TopologyManager option.
Does this PR introduce a user-facing change?