fix: Add explicit HTTP request timeouts to all AWS SDK clients#1374
fix: Add explicit HTTP request timeouts to all AWS SDK clients#1374howard-junec wants to merge 2 commits intokubernetes:masterfrom
Conversation
Configure WithHTTPClient on all LoadDefaultConfig calls to prevent clock skew overcorrection from slow API responses (COE-389792). Add AST test to ensure future SDK clients include timeout.
|
This issue is currently awaiting triage. If cloud-provider-aws contributors determine this is a relevant issue, they will accept it by applying the The DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Welcome @howard-junec! |
|
Hi @howard-junec. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Regular contributors should join the org to skip this step. Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
| if region != "" { | ||
| cfg, err = config.LoadDefaultConfig(ctx, | ||
| config.WithRegion(region), | ||
| config.WithHTTPClient(defaultHTTPClient), |
There was a problem hiding this comment.
Can we just override the the default client's timeout since aws sdk has other default for transport which are good to take https://github.com/aws/aws-sdk-go-v2/blob/main/aws/transport/http/client.go#L44
There was a problem hiding this comment.
thanks, I will update to use awshttp.NewBuildableClient().WithTimeout(30 * time.Second) to preserve the SDK's default transport settings.
pkg/providers/v1/aws_sdk.go
Outdated
| // HTTP request timeout and reuse connection pools. Without a timeout, a single | ||
| // slow response can trigger the Go SDK's clock skew overcorrection and break all | ||
| // subsequent API calls (COE-389792). | ||
| var defaultHTTPClient = &http.Client{Timeout: 30 * time.Second} |
There was a problem hiding this comment.
is 30 sec enough to list all the instances or does pagination take care of that ?
There was a problem hiding this comment.
The 30s timeout is per HTTP request, not per operation. DescribeInstances paginates via NextToken, each page is a separate HTTP call with its own 30s window.
Replace http.Client with awshttp.NewBuildableClient().WithTimeout() to preserve SDK default transport settings.
|
/ok-to-test |
|
@howard-junec: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Add explicit HTTP request timeouts (30s) to all AWS SDK clients using the SDK's BuildableClient via WithHTTPClient on every LoadDefaultConfig call. Using BuildableClient instead of a bare http.Client preserves the SDK's default transport settings (TLS 1.2, connection pooling, dialer config). This prevents the Go SDK's clock skew overcorrection from breaking all API calls when a single response is slow. Added AST test to fail if any future LoadDefaultConfig is added without WithHTTPClient.
/kind bug
What this PR does / why we need it:
Add explicit HTTP request timeouts (30s) to all AWS SDK clients by configuring
WithHTTPClienton everyLoadDefaultConfigcall. Without a timeout, a singleslow API response can trigger the Go AWS SDK v2's clock skew overcorrection,
breaking all subsequent API calls with HTTP 401 errors.
Also adds an AST-based test that scans source files and fails if any
LoadDefaultConfigis added withoutWithHTTPClient.Which issue(s) this PR fixes:
NONE
Special notes for your reviewer:
tests/e2e/loadbalancer.gohas oneLoadDefaultConfigwithout timeout —intentionally excluded since it's test tooling, not production controller code.
The AST test skips the
tests/directory accordingly.Does this PR introduce a user-facing change?: