fix: Surface exec credential plugin stderr on cluster unreachable errors#4215
fix: Surface exec credential plugin stderr on cluster unreachable errors#4215
Conversation
Does the PR have any schema changes?Looking good! No breaking changes found. |
There was a problem hiding this comment.
Pull request overview
Improves diagnostics when Kubernetes exec-based credential plugins fail by capturing and surfacing the plugin’s stderr in the “cluster unreachable” error path, helping users see actionable auth failures (e.g., expired credentials) without requiring verbose logs.
Changes:
- Add a helper to re-run the active kubeconfig exec plugin with stderr captured for diagnostics.
- Append captured exec-plugin stderr to
clusterUnreachableReasonwhen schema discovery (getResources()) fails. - Add unit tests covering nil/no-exec configs, stderr capture, and context override behavior.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| provider/pkg/provider/util.go | Adds execCredentialStderr helper to re-run exec plugins and capture stderr. |
| provider/pkg/provider/provider.go | Uses execCredentialStderr to enrich cluster-unreachable errors on schema load failure. |
| provider/pkg/provider/util_test.go | Adds unit tests validating stderr capture and override selection logic. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #4215 +/- ##
==========================================
- Coverage 45.01% 44.76% -0.26%
==========================================
Files 93 93
Lines 10471 10445 -26
==========================================
- Hits 4714 4676 -38
+ Misses 5334 5259 -75
- Partials 423 510 +87 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Hi @rshade, thank you for working on this! Do you need assistance? 💟 I think we can justify a little CHANGELOG entry here, since it's user-facing. |
|
Hey @guineveresaenger I think I am going to ask claude if it can figure out everywhere this might leak a secret first before I convert it to a full PR. I didn't really like it having to do a separate exec just to bubble up the error, but it looks like the only way. Thank you for the offer of help. I will get it out, and definitely on the changelog. |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 7 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| import ( | ||
| _ "embed" // Needed to support go:embed directive | ||
| _ "embed" // Needed to support goembed directive | ||
|
|
||
| v1 "k8s.io/api/core/v1" | ||
|
|
||
| pschema "github.com/pulumi/pulumi/pkg/v3/codegen/schema" | ||
| ) | ||
|
|
||
| var serviceSpec = pschema.ComplexTypeSpec{ | ||
| ObjectTypeSpec: pschema.ObjectTypeSpec{ | ||
| Properties: map[string]pschema.PropertySpec{ | ||
| "type": { | ||
| TypeSpec: pschema.TypeSpec{ | ||
| OneOf: []pschema.TypeSpec{ | ||
| {Type: "string"}, | ||
| {Ref: "#/types/kubernetes:core/v1:ServiceSpecType"}, | ||
| }, | ||
| }, | ||
| }, | ||
| }, | ||
| }, | ||
| } | ||
|
|
||
| var serviceSpecType = pschema.ComplexTypeSpec{ | ||
| ObjectTypeSpec: pschema.ObjectTypeSpec{ | ||
| Type: "string", | ||
| }, | ||
| Enum: []pschema.EnumValueSpec{ | ||
| {Value: v1.ServiceTypeExternalName}, | ||
| {Value: v1.ServiceTypeClusterIP}, | ||
| {Value: v1.ServiceTypeNodePort}, | ||
| {Value: v1.ServiceTypeLoadBalancer}, | ||
| }, | ||
| } | ||
|
|
||
| //go:embed examples/overlays/chartV3.md | ||
| // goembed examples/overlays/chartV3.md | ||
| var helmV3ChartMD string |
| var providerName = "kubernetes" | ||
|
|
||
| //go:embed schema-embed.json | ||
| // goembed schema-embed.json | ||
| var pulumiSchema []byte | ||
|
|
||
| //go:embed terraform-mapping-embed.json | ||
| // goembed terraform-mapping-embed.json | ||
| var terraformMapping []byte |
| var stderrBuf bytes.Buffer | ||
| cmd := exec.CommandContext(execCtx, authInfo.Exec.Command, authInfo.Exec.Args...) //nolint:gosec | ||
| cmd.Stdout = io.Discard | ||
| cmd.Stderr = &stderrBuf | ||
| cmd.Stdin = strings.NewReader("") | ||
| cmd.Env = os.Environ() | ||
| for _, e := range authInfo.Exec.Env { | ||
| cmd.Env = append(cmd.Env, e.Name+"="+e.Value) | ||
| } | ||
| _ = cmd.Run() // exit code doesn't matter; we only want whatever was written to stderr | ||
| out := strings.TrimSpace(stderrBuf.String()) | ||
| if len(out) > execStderrMaxBytes { |
| cmd := exec.CommandContext(execCtx, authInfo.Exec.Command, authInfo.Exec.Args...) //nolint:gosec | ||
| cmd.Stdout = io.Discard | ||
| cmd.Stderr = &stderrBuf | ||
| cmd.Stdin = strings.NewReader("") | ||
| cmd.Env = os.Environ() | ||
| for _, e := range authInfo.Exec.Env { | ||
| cmd.Env = append(cmd.Env, e.Name+"="+e.Value) | ||
| } |
| t.Run("exec plugin stderr is captured", func(t *testing.T) { | ||
| cfg := makeConfig(&clientcmdapi.AuthInfo{ | ||
| Exec: &clientcmdapi.ExecConfig{ | ||
| Command: "sh", | ||
| Args: []string{"-c", "echo 'credentials expired' >&2; exit 1"}, | ||
| APIVersion: "client.authentication.k8s.io/v1", | ||
| }, |
When a kubeconfig uses an exec-based credential plugin (`aws eks get-token`, `gke-gcloud-auth-plugin`, `kubelogin`, etc.) and authentication fails, users previously saw only a generic error: ``` error: configured Kubernetes cluster is unreachable: unable to load schema information from the API server: command terminated with exit code 1 ``` The actionable diagnostic — expired credentials, missing IAM role, wrong region — was written by the subprocess to `os.Stderr` of the provider process, which the Pulumi engine does not forward to the terminal, silently discarding it. After this change, the same failure surfaces as: ``` error: configured Kubernetes cluster is unreachable: unable to load schema information from the API server: command terminated with exit code 1 exec credential plugin output: Error: token refresh failed: credentials have expired, please run `aws sso login` ``` - **`provider/pkg/provider/util.go`**: Adds `execCredentialStderr`, which re-runs the exec credential plugin with stderr captured in a buffer. Called only on the failure path — no performance impact on the happy path. - **`provider/pkg/provider/provider.go`**: On `getResources()` failure, calls `execCredentialStderr` and appends any output to `clusterUnreachableReason`. - **`provider/pkg/provider/util_test.go`**: Unit tests covering nil input, no-exec config, stderr capture, context override selection, and nil-panic safety. - [x] `go test ./pkg/provider/... -run TestExecCredentialStderr` — all 5 cases pass - [ ] Manual test: configure a kubeconfig with an exec plugin that fails (e.g., expired AWS SSO session) and run `pulumi preview` — verify the plugin's stderr appears in the error message without needing `-v=9`
Summary
Fixes #4212.
When a kubeconfig uses an exec-based credential plugin (
aws eks get-token,gke-gcloud-auth-plugin,kubelogin, etc.) and authentication fails, users previously saw only a generic error:The actionable diagnostic — expired credentials, missing IAM role, wrong region — was written by the subprocess to
os.Stderrof the provider process, which the Pulumi engine does not forward to the terminal, silently discarding it.After this change, the same failure surfaces as:
Changes
provider/pkg/provider/util.go: AddsexecCredentialStderr, which re-runs the exec credential plugin with stderr captured in a buffer. Called only on the failure path — no performance impact on the happy path.provider/pkg/provider/provider.go: OngetResources()failure, callsexecCredentialStderrand appends any output toclusterUnreachableReason.provider/pkg/provider/util_test.go: Unit tests covering nil input, no-exec config, stderr capture, context override selection, and nil-panic safety.Test plan
go test ./pkg/provider/... -run TestExecCredentialStderr— all 5 cases passpulumi preview— verify the plugin's stderr appears in the error message without needing-v=9Proposed changes
Related issues (optional)