vpa/admission-controller: limit request payload size to 5MB#9690

Open
sophieliu15 wants to merge 1 commit into
kubernetes:masterfrom
sophieliu15:fix-511328366

@sophieliu15 sophieliu15 commented May 25, 2026

Add a defensive 5MB payload size cap on the HTTP request body in the VPA Admission Controller webhook.

This prevents arbitrary/maliciously large payload requests from exhausting memory resources and triggering Out-of-Memory (OOM) crashes (Denial of Service). The webhook continues to fail open on reading or unmarshaling errors to protect cluster scheduling availability.

What type of PR is this?

/kind bug

What this PR does / why we need it:

This PR adds a defensive 5MB payload size limit to the Vertical Pod Autoscaler (VPA) admission controller webhook server.

Prior to this change, the webhook handler used io.ReadAll(r.Body) to read the incoming HTTP request body into memory without bounds checking. A maliciously large payload (e.g. an endless stream of data) sent to the webhook endpoint could consume all memory resources and trigger an Out-of-Memory (OOM) crash. Since the admission controller is a critical component for scheduling, its crash could lead to a Denial of Service (DoS) blocking all new pod creation in the cluster if configured to fail closed.

By wrapping r.Body with io.LimitReader(r.Body, 5MB), we safely cap the maximum memory allocation during the read step. If the payload is truncated or fails parsing, the webhook gracefully respects its permissive "fail-open" contract to allow scheduling to proceed, while protecting its own server process from OOM.
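
As a rough illustration of the bounded read described above, the following Go sketch reads from a never-ending stream through io.LimitReader. The names endlessReader and readCapped and the 1 KiB cap are hypothetical, chosen only for the demonstration; they are not taken from the VPA code, which uses a 5MB constant.

```go
package main

import (
	"fmt"
	"io"
)

// endlessReader is a hypothetical stand-in for a client that never stops
// sending data: every Read fills the buffer and never returns io.EOF.
type endlessReader struct{}

func (endlessReader) Read(p []byte) (int, error) {
	for i := range p {
		p[i] = 'x'
	}
	return len(p), nil
}

// readCapped reads at most limit bytes from r. io.LimitReader reports io.EOF
// once the cap is reached, so io.ReadAll returns instead of growing forever.
func readCapped(r io.Reader, limit int64) ([]byte, error) {
	return io.ReadAll(io.LimitReader(r, limit))
}

func main() {
	data, err := readCapped(endlessReader{}, 1024)
	fmt.Println(len(data), err)
}
```

Note that the truncation is silent: the caller receives a nil error and a cut-off payload, so the problem only surfaces later when the JSON fails to parse.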

Special notes for your reviewer:

The 5MB limit was selected to comfortably accommodate large but valid Kubernetes update requests (which contain both the object and oldObject payloads, each up to etcd's default 1.5MB limit plus metadata overhead) while remaining well below standard container memory limits to eliminate OOM risk.

Unit tests have been added to server_test.go to cover both normal and oversized payload scenarios.

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot
Contributor

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. area/vertical-pod-autoscaler Issues or PRs related to the Vertical Pod Autoscaler component labels May 25, 2026
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If SIG Autoscaling contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sophieliu15
Once this PR has been reviewed and has the lgtm label, please assign kwiesmueller for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added do-not-merge/needs-area Indicates that a PR should not merge because it lacks an area label. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. and removed do-not-merge/needs-area Indicates that a PR should not merge because it lacks an area label. labels May 25, 2026
@k8s-ci-robot
Contributor

Hi @sophieliu15. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 25, 2026
Member

@adrianmoisey adrianmoisey left a comment

Heya,
Thanks for the PR.
I've got two small comments for you

{
        name:           "Oversized payload exceeding 5MB limit",
        isEndless:      true,
        expectedStatus: http.StatusOK, // Fails open, returning 200 OK with unmarshal error rather than crashing
Member

Should we rather be expecting a 413 from the server here?

Author

@sophieliu15 sophieliu15 May 25, 2026

Good question! I chose to let the read error "fail open" (proceeding with a 200 OK rather than a 413) for two reasons:

  • Preserving Webhook Behavior: This design represents a minimal change that strictly preserves the VPA webhook's existing, permissive "fail-open" philosophy for JSON parsing/unmarshaling errors (which defaults to Allowed: true). It guarantees the webhook is protected from OOM crashes without changing VPA's business logic or breaking backward compatibility.

  • Preventing Cluster Scheduling Outages Under DoS: Mutating webhooks are inline to all Pod creations/updates. If the webhook is under a heavy DoS flood and we return a 413 HTTP error:

    • The API Server treats non-200 responses as webhook infrastructure failures.
    • If the cluster operator has VPA configured with failurePolicy: Fail, the API Server will immediately block and reject all legitimate user Pod scheduling requests.
    • By using http.MaxBytesReader combined with 200 OK, we discard oversized payloads with minimal resource consumption and do not report this as infrastructure failure to the API Server, ensuring minimal impact on cluster-wide scheduling under DoS.

Member

I'm not sure that's true. This changes the logic to respond with a 200 but without the expected payload; see https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#response
The API server will treat this as a failure, no matter the HTTP code.
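
For context, the linked documentation describes a webhook response as an AdmissionReview object whose response stanza echoes the request's uid and sets allowed. The Go sketch below shows that minimal shape. The struct and function names are hypothetical stand-ins defined locally so the example is self-contained; real webhook code would normally use the types from k8s.io/api/admission/v1 instead.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// admissionResponse and admissionReview are simplified, hypothetical mirrors
// of the documented response fields; they are not the upstream API types.
type admissionResponse struct {
	UID     string `json:"uid"`
	Allowed bool   `json:"allowed"`
}

type admissionReview struct {
	APIVersion string             `json:"apiVersion"`
	Kind       string             `json:"kind"`
	Response   *admissionResponse `json:"response,omitempty"`
}

// allowResponse builds the smallest "allow" answer for the given request UID.
func allowResponse(uid string) ([]byte, error) {
	return json.Marshal(admissionReview{
		APIVersion: "admission.k8s.io/v1",
		Kind:       "AdmissionReview",
		Response:   &admissionResponse{UID: uid, Allowed: true},
	})
}

func main() {
	out, err := allowResponse("example-uid")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

The uid has to come from the parsed request, which is why a body that was cut off before it could be decoded makes a well-formed "allow" response hard to produce.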

 var body []byte
 if r.Body != nil {
-        if data, err := io.ReadAll(r.Body); err == nil {
+        if data, err := io.ReadAll(io.LimitReader(r.Body, maxAdmissionPayloadSize)); err == nil {
Member

Does it make sense to rather use http.MaxBytesReader ? It seems to be purpose built for HTTP

Author

@sophieliu15 sophieliu15 May 25, 2026

Great catch! I have updated the PR to use http.MaxBytesReader instead of io.LimitReader.
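
The practical difference between the two readers can be sketched as follows: io.LimitReader truncates silently and returns no error, while http.MaxBytesReader surfaces a typed *http.MaxBytesError (available since Go 1.19) once the limit is exceeded. The helper names and the 4-byte limit are hypothetical and exist only for this illustration.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// readWithLimitReader returns how many bytes were read and the error from
// io.ReadAll. An oversized payload is simply cut off with a nil error.
func readWithLimitReader(payload string, limit int64) (int, error) {
	data, err := io.ReadAll(io.LimitReader(strings.NewReader(payload), limit))
	return len(data), err
}

// exceedsWithMaxBytesReader reports whether reading the payload through
// http.MaxBytesReader fails with *http.MaxBytesError, and the limit it carries.
func exceedsWithMaxBytesReader(payload string, limit int64) (bool, int64) {
	w := httptest.NewRecorder()
	body := io.NopCloser(strings.NewReader(payload))
	_, err := io.ReadAll(http.MaxBytesReader(w, body, limit))
	var maxBytesErr *http.MaxBytesError
	if errors.As(err, &maxBytesErr) {
		return true, maxBytesErr.Limit
	}
	return false, 0
}

func main() {
	n, err := readWithLimitReader("0123456789", 4)
	fmt.Println("LimitReader:", n, err)

	tooLarge, limit := exceedsWithMaxBytesReader("0123456789", 4)
	fmt.Println("MaxBytesReader:", tooLarge, limit)
}
```

With the typed error, the handler can tell an oversized body apart from any other read failure instead of guessing from a later parse error.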

Add a defensive 5MB payload size cap on the HTTP
request body in the VPA Admission Controller webhook.

This prevents arbitrary/maliciously large payload requests from exhausting
memory resources and triggering Out-of-Memory (OOM) crashes (Denial of
Service). The webhook continues to fail open on reading or unmarshaling
errors to protect cluster scheduling availability.
Comment on lines +200 to +203
if data, err := io.ReadAll(http.MaxBytesReader(w, r.Body, maxAdmissionPayloadSize)); err == nil {
        body = data
} else {
        klog.ErrorS(err, "Failed to read admission request body (payload may exceed 5MB limit)")
Member

How can we be sure that the error here is because of that reason?
Also, printing out 5MB without a reference to the const value can cause errors.
I would expect something like this:

diff --git a/vertical-pod-autoscaler/pkg/admission-controller/logic/server.go b/vertical-pod-autoscaler/pkg/admission-controller/logic/server.go
index d4ee79e64..47c74d431 100644
--- a/vertical-pod-autoscaler/pkg/admission-controller/logic/server.go
+++ b/vertical-pod-autoscaler/pkg/admission-controller/logic/server.go
@@ -19,6 +19,7 @@ package logic
 import (
        "context"
        "encoding/json"
+       "errors"
        "fmt"
        "io"
        "net/http"
@@ -200,7 +201,12 @@ func (s *AdmissionServer) Serve(w http.ResponseWriter, r *http.Request) {
                if data, err := io.ReadAll(http.MaxBytesReader(w, r.Body, maxAdmissionPayloadSize)); err == nil {
                        body = data
                } else {
-                       klog.ErrorS(err, "Failed to read admission request body (payload may exceed 5MB limit)")
+                       var maxBytesErr *http.MaxBytesError
+                       if errors.As(err, &maxBytesErr) {
+                               klog.ErrorS(err, "Admission request body exceeds size limit", "limit", maxAdmissionPayloadSize)
+                       } else {
+                               klog.ErrorS(err, "Failed to read admission request body")
+                       }
                }
        }

}

func TestServePayloadLimit(t *testing.T) {
        tests := []struct {
Member

Can we add more tests here for my above comment?
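
One possible shape for the additional cases, sketched as a standalone program rather than as the PR's actual test: it exercises a body at the limit, a body one byte over the limit, and a body whose read fails for an unrelated reason. Everything here (demoLimit, readBody, failingReader, outcomeForSize, outcomeForBrokenBody) is a hypothetical name for illustration and does not come from server.go or server_test.go.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// demoLimit is a deliberately tiny, hypothetical limit so cases stay small.
const demoLimit = 16

// readBody mirrors the branching suggested in the review: it separates
// "body too large" from every other read failure.
func readBody(w http.ResponseWriter, r *http.Request, limit int64) string {
	_, err := io.ReadAll(http.MaxBytesReader(w, r.Body, limit))
	if err == nil {
		return "ok"
	}
	var maxBytesErr *http.MaxBytesError
	if errors.As(err, &maxBytesErr) {
		return "too-large"
	}
	return "read-error"
}

// failingReader simulates a connection that breaks before any data arrives.
type failingReader struct{}

func (failingReader) Read([]byte) (int, error) {
	return 0, errors.New("simulated read failure")
}

func outcomeFor(body io.Reader) string {
	req := httptest.NewRequest(http.MethodPost, "/", body)
	return readBody(httptest.NewRecorder(), req, demoLimit)
}

// outcomeForSize sends a zero-filled body of n bytes.
func outcomeForSize(n int) string {
	return outcomeFor(bytes.NewReader(make([]byte, n)))
}

// outcomeForBrokenBody sends a body whose first read fails.
func outcomeForBrokenBody() string {
	return outcomeFor(failingReader{})
}

func main() {
	fmt.Println("at limit:  ", outcomeForSize(demoLimit))
	fmt.Println("over limit:", outcomeForSize(demoLimit+1))
	fmt.Println("broken:    ", outcomeForBrokenBody())
}
```

The sketch only classifies the read outcome; which HTTP status or AdmissionReview payload each branch should produce is the question still open in this thread.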

4 participants