Skip to content

Block server-side apply upsert of existing objects on create#4270

Open
guineveresaenger wants to merge 12 commits intomasterfrom
worktree-issue-2926-upsert-behavior
Open

Block server-side apply upsert of existing objects on create#4270
guineveresaenger wants to merge 12 commits intomasterfrom
worktree-issue-2926-upsert-behavior

Conversation

@guineveresaenger
Copy link
Copy Markdown
Contributor

@guineveresaenger guineveresaenger commented Apr 8, 2026

This pull request addresses the issues laid out in #2926.

Specifically, here we are approaching #2926 as a bug. When a user attempts to delete an object that is in Pulumi state, such a delete should succeed. Because Create defaults to using server-side apply, we can wind up with undeleteable objects in Pulumi state. I would argue that this is a bug.

Therefore, this PR does the following:

  • Add a GET call before each Create to assert that an object already exists. This is perhaps the most controversial choice since it adds a full ; however it allows us to keep using Server-Side Apply on Create, rather than reverting to client-side apply, which I understand has some migration issues wrt field managers.

  • Fail Create when an object already exists. The user is suggested three options:

    • use an alias if the operation is a rename so we don't accidentally delete objects
    • use DeleteBeforeReplace resource option for renames
    • use a Patch resource to only manage certain object fields.
      We don't tell the user to explicitly put the object into state via Import, because that will result in Deletes hitting the Forbidden erorr.
  • Relinquish on Forbidden delete: When the API server forbids deletion (e.g. protected namespaces like kube-system), fall back to relinquishing managed fields instead of failing. This is for situations where a user explicitly imported a protected object. This was too broad. Users should be directed towards patch resources, which can be safely deleted.

  • Also adds upsertExistingObjects provider config: Users have reported they like this behavior for bulk import purposes. Therefore, we also provide an opt-in flag for those users.

Fixes #2926
Refs #2948

Test plan

  • Unit tests for create-side existence check (existing object blocks create, preview also blocks, new objects proceed, Patch resources skip check, upsertExistingObjects skips check)
  • E2E test: creating the built-in default namespace fails with "already exists" error
  • E2E test: upsert + destroy of protected namespace succeeds via Relinquish fallback
  • Existing Creation/Update/Deletion unit tests pass

🤖 Generated with Claude Code

guineveresaenger and others added 5 commits April 7, 2026 14:57
Add a pre-flight GET before server-side apply during resource creation.
If the target object already exists in the cluster, return an error
directing users to aliases (for renames), `pulumi import`, or Patch
resources, instead of silently updating the existing object.

This prevents the dangerous create-then-delete pattern where renaming a
Pulumi resource with the same metadata.name causes silent data loss
(#2948), and the confusing "stuck in state" scenario when targeting
protected resources like kube-system (#2926).

The check runs on both preview and up, and is skipped for Patch
resources (which are designed for existing objects).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When the API server forbids deletion (e.g. protected namespaces like
kube-system) and server-side apply is enabled, fall back to relinquishing
managed fields instead of failing. This cleans up fields Pulumi added
while leaving the protected resource intact, and allows the resource to
be removed from Pulumi state without manual state surgery.

This is a safety net for cases where a protected resource ends up in
state via `pulumi import`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
For users who rely on the upsert behavior (e.g. bulk-adopting existing
cluster resources into Pulumi), add a provider config flag
upsertExistingObjects that skips the pre-create existence check.

Can be set via:
- Provider config: upsertExistingObjects: true
- Environment variable: PULUMI_K8S_UPSERT_EXISTING_OBJECTS=true

Defaults to false — the safe behavior where creating a resource that
already exists in the cluster returns an error.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tests against a live cluster that creating the built-in "default"
namespace fails with an "already exists" error, preventing the silent
upsert behavior.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 8, 2026

Does the PR have any schema changes?

Looking good! No breaking changes found.
No new resources/functions.

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 8, 2026

Codecov Report

❌ Patch coverage is 24.67532% with 58 lines in your changes missing coverage. Please review.
✅ Project coverage is 45.77%. Comparing base (c2f6707) to head (029b0b2).
⚠️ Report is 4 commits behind head on master.

Files with missing lines Patch % Lines
provider/pkg/provider/provider.go 4.25% 42 Missing and 3 partials ⚠️
provider/pkg/gen/schema.go 0.00% 13 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4270      +/-   ##
==========================================
+ Coverage   45.74%   45.77%   +0.02%     
==========================================
  Files          93       93              
  Lines       10363    10404      +41     
==========================================
+ Hits         4741     4762      +21     
- Misses       5211     5229      +18     
- Partials      411      413       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

The "already exists" error also fires during create-before-delete
replacements of explicitly-named resources, where the same data loss
pattern applies. Mention deleteBeforeReplace as a resolution option.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@guineveresaenger guineveresaenger requested review from blampe and pgavlin and removed request for pgavlin April 8, 2026 18:10
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@guineveresaenger guineveresaenger requested a review from a team April 8, 2026 18:12
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@guineveresaenger
Copy link
Copy Markdown
Contributor Author

/review-again

Copy link
Copy Markdown
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two blocking issues found in provider/pkg/await/await.go:

  1. Silent bypass of the existence guard on non-NotFound GET errors (line 208-211): Any GET error that isn't 404 (e.g. a 403 from a service account with write-but-not-read RBAC, a transient 503) is logged at V(1) and ignored, allowing the SSA upsert to proceed. This directly undermines the PR's goal — a routine RBAC split where the SA has patch but not get means the guard is bypassed on every create, reproducing the data-loss scenario of #2926.

  2. Relinquish fires on any 403, not just protected resources (line 984-989): The IsForbidden && ServerSideApply condition catches all 403 responses — including a service account that simply has no delete RBAC permission. Any SSA stack where delete is denied for any reason will silently have its managed fields relinquished and the resource removed from state rather than surfacing the permission error. This is a silent state-corruption risk that affects all SSA users, not just those using upsertExistingObjects.

Reviewed by Internal PR Re-Review (Slash Command)

To install this agentic workflow, run

gh aw add pulumi-labs/gh-aw-internal/.github/workflows/gh-aw-pr-rereview.md@734ef41746387a6818fd8ac3e619c9fd81ac6957

Comment on lines +208 to +211
if !apierrors.IsNotFound(getErr) {
logger.V(1).Infof("pre-create GET for %s/%s failed: %v",
c.Inputs.GetNamespace(), name, getErr)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking: Non-404 GET errors silently bypass the existence guard

When the pre-create GET returns an error that is not NotFound (e.g. a 403 Forbidden because the service account has patch but not get, a 503, or a transient timeout), the code logs at V(1) and falls through — allowing the SSA Patch to proceed as if the object does not exist.

This directly undermines the PR's stated goal: a common RBAC configuration where the service account has write-but-not-read permission means every create silently bypasses the guard. If the cluster object actually exists in that scenario, the upsert happens undetected, and the subsequent delete causes the exact data loss described in #2926.

Consider returning the GET error (or a wrapped error surfacing that the check could not be completed) instead of silently proceeding:

if !apierrors.IsNotFound(getErr) {
    return fmt.Errorf("pre-create existence check for %s/%s failed: %w — "+
        "cannot confirm object does not exist; use upsertExistingObjects=true to skip this check",
        c.Inputs.GetNamespace(), name, getErr)
}

https://github.com/pulumi/pulumi-kubernetes/blob/d23451e829c9c07664bad420915d2ba0a22ce6cc/provider/pkg/await/await.go#L208-L211

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed.

@guineveresaenger guineveresaenger marked this pull request as draft April 8, 2026 21:48
The Relinquish fallback fired on any 403 Forbidden, not just protected
resources — including RBAC denials and admission webhook rejections.
This could silently corrupt state by removing resources the user
intended to keep.

Users who need to manage fields on protected resources (like kube-system)
should use Patch resources, which already use Relinquish on delete.
Users who opt into upsertExistingObjects accept the risks of delete
failures on protected resources.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@guineveresaenger guineveresaenger force-pushed the worktree-issue-2926-upsert-behavior branch 2 times, most recently from d355ea1 to c8cfe1b Compare April 8, 2026 22:55
@guineveresaenger
Copy link
Copy Markdown
Contributor Author

/review-again

- Return error on non-NotFound GET failures instead of silently
  bypassing the existence guard. A 403 or 503 on GET now surfaces
  a clear error suggesting upsertExistingObjects as a workaround.
- Add deleteBeforeReplace to TestPreviewReplacements Job, since
  create-before-delete with an explicit name hits the existence check.
- Add upsertExistingObjects to TestServerSideApply provider, since
  the test explicitly exercises upsert behavior.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@guineveresaenger guineveresaenger force-pushed the worktree-issue-2926-upsert-behavior branch from c8cfe1b to e24bc47 Compare April 8, 2026 23:35
@guineveresaenger
Copy link
Copy Markdown
Contributor Author

/review-again

@guineveresaenger guineveresaenger force-pushed the worktree-issue-2926-upsert-behavior branch from 7ec4304 to 74e8efb Compare April 9, 2026 18:43
@guineveresaenger
Copy link
Copy Markdown
Contributor Author

/review-again

@guineveresaenger guineveresaenger marked this pull request as ready for review April 10, 2026 16:56
@guineveresaenger
Copy link
Copy Markdown
Contributor Author

/review-again

@@ -0,0 +1,11 @@
name: existing-object
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

YAML tests make me happy :)

return enabled == trueStr
}
if enabled, exists := os.LookupEnv("PULUMI_K8S_UPSERT_EXISTING_OBJECTS"); exists {
return enabled == trueStr
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nitpick) If I understand this correctly, would using in bashexport PULUMI_K8S_UPSERT_EXISTING_OBJECTS=1 would fail this check since we are comparing exclusively for == true. Is that desirable? I would expect =1 to be a valid way to set it to true. What do you think?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the way that the other environment variables are set up. :/ I think to change it we should change it everywhere.

Copy link
Copy Markdown
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Summary

I reviewed the full diff. Below are the definite issues found, ordered by severity.


note This is not true, server-side apply defaults to true. The check hits the code path just fine.

Issue 1 — The new E2E test does not exercise the new SSA code path

File: tests/sdk/yaml/existing_object_test.go + tests/sdk/yaml/testdata/existing-object/Pulumi.yaml

Problem: The TestExistingObjectBlocksCreate E2E test is intended to verify that the new SSA blocking behavior works end-to-end. However, testdata/existing-object/Pulumi.yaml does not configure enableServerSideApply: true on the provider. Because serverSideApplyMode defaults to false, the test exercises the old non-SSA code path (client.Create() → API server returns AlreadyExists), not the new pre-create GET check added in await.go.

The gate condition in await.go is:

if c.ServerSideApply && !c.Preview && !c.UpsertExistingObjects && !kinds.IsPatchResource(...) {

Without enableServerSideApply: true, c.ServerSideApply is false and the new check is never reached. The test passes, but it would continue passing even if the new SSA guard were completely removed. There is no E2E test that would catch a regression in the new code path.

Why it's definitely a real problem: The E2E test was added specifically to validate the new blocking behavior, but it silently validates existing (unrelated) behavior instead. Any regression in the new SSA guard would go undetected by E2E tests.

Fix: Add config: kubernetes:enableServerSideApply: "true" (or a provider resource with enableServerSideApply: true) to testdata/existing-object/Pulumi.yaml.


Issue 2 — Existing SSA E2E tests (step1/2/3) now mask regressions in normal SSA creates

Files:

  • tests/sdk/nodejs/server-side-apply/step1/index.ts line 27
  • tests/sdk/nodejs/server-side-apply/step2/index.ts line 27
  • tests/sdk/nodejs/server-side-apply/step3/index.ts line 27

Problem: All three steps now use upsertExistingObjects: true. Step 1's comment at line 3 explicitly states: "3. Upsert a Deployment resource that already exists." — the nginx-upsert resource (lines 130–147) intentionally creates a second Pulumi resource pointing at the same named cluster object as nginx.

With upsertExistingObjects: true, the pre-create existence check is bypassed for all SSA creates in these tests, including creates that are not intentional upserts. This means if the new existence check logic were accidentally inverted (blocking creates of non-existing objects, or incorrectly firing on auto-named resources), these tests would not catch it because the bypass is always active.

Why it's definitely a real problem: The three step tests are the primary integration tests for SSA create behavior. With the bypass on globally, they no longer cover the default (and new default-safe) code path. A logic error in the existence check that causes false positives for new objects would be invisible to these tests.

The correct fix would be to split the nginx-upsert scenario out into its own test that explicitly uses upsertExistingObjects: true, and restore the remaining tests to upsertExistingObjects: false (or omit the flag entirely since false is the default). This would mean the main step1/2/3 tests continue to exercise SSA creates against non-existing objects under the new default behavior.


Issue 3 — Existence check is inside the retry loop and runs on every retry

File: provider/pkg/await/await.go, lines 190–217

Problem: The pre-create GET check runs inside the SleepingRetry closure. On the very first iteration (i == 0), the client variable is nil and gets initialized. On the next retry (if one somehow occurs), client is non-nil so the check runs again. This is benign for the "already exists" case (the error is not IsNotFound or IsNoMatchError, so the retrier stops after the first failure and the check runs exactly once).

However, for the non-404, non-NoMatch error branch (lines 208–215), the error returned IS a plain fmt.Errorf — not IsNotFound, not IsNoMatchError — so the retrier also stops immediately. This is correct behavior.

The placement is safe for all currently covered cases, but it does mean that if client.Get fails transiently (e.g., a momentary network issue), the entire create is hard-failed with a message telling the user to set upsertExistingObjects=true. This is a definite behavioral regression: a transient GET failure is not the same as "cannot confirm the object does not exist," but the error message frames it that way and blocks the create.

Why it's definitely a real problem: A transient API server blip during the GET — which is not retried — permanently blocks the create with a misleading error message directing users toward a flag that bypasses the safety check. The GET should either be retried or its transient-failure error should be clearly distinguished from a permanent one.


go.sum — No suspicious entries

All additions to sdk/go.sum are /go.mod-only indirect dependency hashes or well-known packages (Azure SDK, AWS SDK, Google Cloud, HashiCorp Vault, etc.) consistent with a Pulumi SDK dependency update. No suspicious entries were found.

Reviewed by Internal PR Re-Review (Slash Command)

To install this agentic workflow, run

gh aw add pulumi-labs/gh-aw-internal/.github/workflows/gh-aw-pr-rereview.md@734ef41746387a6818fd8ac3e619c9fd81ac6957

@guineveresaenger guineveresaenger force-pushed the worktree-issue-2926-upsert-behavior branch from 74e8efb to d37ae0a Compare April 10, 2026 21:15
During preview, the engine plans operations but doesn't execute them.
A planned delete-before-create for a replacement won't have run yet,
so the object is still in the cluster — making the existence check a
false positive.

Skip the GET when c.Preview is true. The check still runs on the
actual up (and up --skip-preview), since Creation() is called for
both phases.

Tradeoff: preview no longer fails fast on the rename/upsert case.
Users get the error on apply instead.

Also revert the deleteBeforeReplace addition to TestPreviewReplacements
testdata, since it's no longer needed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@guineveresaenger guineveresaenger force-pushed the worktree-issue-2926-upsert-behavior branch from d37ae0a to e554ce8 Compare April 10, 2026 21:28
Use a separate provider with upsertExistingObjects for the nginx-upsert
resource that intentionally targets an existing object. The main
provider uses the default behavior so that other SSA creates in the
test exercise the new existence check code path.

Also remove debug panic from await.go.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@guineveresaenger
Copy link
Copy Markdown
Contributor Author

/review-again

Copy link
Copy Markdown
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

Three bugs found. One is blocking; two are non-blocking.

1. Blocking — GET errors silently bypass the existence check (provider/pkg/await/await.go, line 196)

The guard only acts on getErr == nil. Any error other than "object found" — including 403 Forbidden, network failures, and 503 ServiceUnavailable — is silently swallowed and the create proceeds as if the object does not exist.

A user whose RBAC allows create/patch but not get (a common least-privilege setup) will completely bypass this guard on every run. The original #2926 silent-upsert-then-delete bug is still fully reproducible for those users after this PR merges.

Fix: treat any non-NotFound GET error as a hard failure rather than a silent pass-through.


2. Non-blocking — PULUMI_K8S_UPSERT_EXISTING_OBJECTS=1 is silently ignored (provider/pkg/provider/provider.go, lines 633–641)

The env var is compared only to the literal string "true". Setting it to 1, TRUE, or yes (all standard Unix truthy values) leaves the flag disabled with no warning. This is load-bearing for an escape hatch: a user who sets the env var in a normal CI/shell style and sees the new blocking error will have no idea the flag is being ignored.

Fix: use strconv.ParseBool, or at minimum document in the schema description and release notes that only the exact string "true" is accepted.


3. Non-blocking — generateName resources silently skip the existence check (provider/pkg/await/await.go, line 195)

The if name := c.Inputs.GetName(); name != "" guard skips the check for any resource without an explicit name. For purely generated names this is harmless. However, there is no assertion that generateName is actually set in the empty-name case, and the code path is undocumented. A future refactor or an unusual resource shape could reach this branch unintentionally and silently re-introduce the #2926 bug.

Fix: add a comment explaining the deliberate skip and, optionally, assert that GetGenerateName() != "" when the name is empty.


Other focus areas (no issues found)

  • Flag propagation: UpsertExistingObjects is correctly set on CreateConfig (line 1740), UpdateConfig (line 2249), and DeleteConfig (line 2417).
  • Schema type: upsertExistingObjects is typed as boolean and the description accurately describes the opt-in nature and the #2926 scenario. The DefaultInfo.Environment entry correctly points to PULUMI_K8S_UPSERT_EXISTING_OBJECTS. No schema inconsistencies found.
  • Security (resource existence exposure): The GET is performed with the provider's own credentials, so it does not expose resource existence to end-users beyond what their RBAC already grants. No additional attack surface introduced.

Reviewed by Internal PR Re-Review (Slash Command)

To install this agentic workflow, run

gh aw add pulumi-labs/gh-aw-internal/.github/workflows/gh-aw-pr-rereview.md@734ef41746387a6818fd8ac3e619c9fd81ac6957

Comment on lines +196 to +212
_, getErr := client.Get(c.Context, name, metav1.GetOptions{})
if getErr == nil {
msg := fmt.Sprintf(
"resource %s/%s (%s) already exists in the cluster. "+
"To resolve this, you can:\n"+
" 1. Use aliases if renaming a Pulumi resource.\n"+
" 2. Use deleteBeforeReplace if replacing a resource with an explicit name.\n"+
" 3. Use a Patch resource to manage specific fields on existing objects.\n"+
" 4. Set upsertExistingObjects on the provider to update existing objects.",
c.Inputs.GetNamespace(), name, c.Inputs.GetKind(),
)
if c.Preview {
_ = c.Host.Log(c.Context, diag.Warning, c.URN, msg)
} else {
return fmt.Errorf("%s", msg)
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug (blocking): GET errors other than "not found" silently bypass the existence check

The existence check only acts when getErr == nil (object found). Any other error — Forbidden, network timeout, ServiceUnavailable, etc. — is silently ignored and the create proceeds as if the object does not exist.

_, getErr := client.Get(c.Context, name, metav1.GetOptions{})
if getErr == nil {
    // block/warn
}
// getErr != nil → falls through, create proceeds

A user whose RBAC permits create/patch but not get (a common least-privilege pattern) will bypass this guard entirely on every run. The original #2926 bug — silent upsert followed by destructive delete — will still occur for those users even after this PR.

The fix is to propagate non-NotFound errors rather than silently proceeding:

_, getErr := client.Get(c.Context, name, metav1.GetOptions{})
if getErr == nil {
    // existing block/warn logic
} else if !apierrors.IsNotFound(getErr) {
    return fmt.Errorf("failed to check whether resource %s/%s (%s) already exists: %w",
        c.Inputs.GetNamespace(), name, c.Inputs.GetKind(), getErr)
}

Comment on lines +633 to +638
upsertExistingObjects := func() bool {
if enabled, exists := vars["kubernetes:config:upsertExistingObjects"]; exists {
return enabled == trueStr
}
if enabled, exists := os.LookupEnv("PULUMI_K8S_UPSERT_EXISTING_OBJECTS"); exists {
return enabled == trueStr
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug (non-blocking): PULUMI_K8S_UPSERT_EXISTING_OBJECTS=1 does not enable the flag

The env var is compared only against the string literal "true":

upsertExistingObjects := func() bool {
    if enabled, exists := vars["kubernetes:config:upsertExistingObjects"]; exists {
        return enabled == trueStr   // trueStr = "true"
    }
    if enabled, exists := os.LookupEnv("PULUMI_K8S_UPSERT_EXISTING_OBJECTS"); exists {
        return enabled == trueStr   // only "true" accepted
    }
    return false
}

Standard Unix convention (and the Pulumi docs for other env vars) treats 1, yes, TRUE, and true as truthy. A user who sets PULUMI_K8S_UPSERT_EXISTING_OBJECTS=1 (following normal shell/CI conventions) gets the new blocking behavior with no indication why the flag appears to be ignored.

This is consistent with how all other PULUMI_K8S_* env vars are handled in this file, so it is a pre-existing pattern — but it is newly load-bearing here because the flag's only purpose is to opt out of a breaking change. Consider using strconv.ParseBool so that 1, TRUE, true, yes all work, or at minimum document in the schema description that only "true" is accepted.

// prevents silent upsert of existing objects, which can lead
// to data loss when the engine later deletes a resource that
// points to the same cluster object (see #2926, #2948).
if name := c.Inputs.GetName(); name != "" {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug (non-blocking): Resources using generateName silently skip the existence check

The guard if name := c.Inputs.GetName(); name != "" means any resource that uses generateName (no explicit metadata.name) bypasses the check entirely. This is intentional and correct for truly new generated names, but the comment does not document this assumption, and the condition does not distinguish between "name is empty because generateName is set" and "name is empty for some other reason."

More importantly: a Pulumi resource that is renamed in the program may have been tracked under an explicit name in state but arrive here without a name in c.Inputs if the name was computed. In that edge case the existence guard silently does nothing and the original #2926 silent-upsert-then-delete bug resurfaces.

This warrants a comment explaining the deliberate skip, and possibly an assertion that generateName is set when GetName() is empty.

Comment on lines +196 to +213
_, getErr := client.Get(c.Context, name, metav1.GetOptions{})
if getErr == nil {
msg := fmt.Sprintf(
"resource %s/%s (%s) already exists in the cluster. "+
"To resolve this, you can:\n"+
" 1. Use aliases if renaming a Pulumi resource.\n"+
" 2. Use deleteBeforeReplace if replacing a resource with an explicit name.\n"+
" 3. Use a Patch resource to manage specific fields on existing objects.\n"+
" 4. Set upsertExistingObjects on the provider to update existing objects.",
c.Inputs.GetNamespace(), name, c.Inputs.GetKind(),
)
if c.Preview {
_ = c.Host.Log(c.Context, diag.Warning, c.URN, msg)
} else {
return fmt.Errorf("%s", msg)
}
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking: non-404 GET errors silently bypass the existence guard

The check at line 197 only acts when getErr == nil (confirmed present). Any non-nil error — including 403 Forbidden (operator lacks get but has create/patch), 503 Service Unavailable, or a transient network failure — falls through and proceeds to the SSA Patch with no error or log.

This permanently disables the guard for any least-privilege setup that allows create/patch but not get on the resource — the exact cluster posture this PR is meant to protect.

Suggested fix: only treat a confirmed IsNotFound as safe-to-create; return an error on anything else:

_, getErr := client.Get(c.Context, name, metav1.GetOptions{})
if getErr == nil {
    // object exists — warn or fail
    ...
} else if !apierrors.IsNotFound(getErr) {
    return fmt.Errorf("failed to check whether %s/%s already exists: %w",
        c.Inputs.GetNamespace(), name, getErr)
}
// only reach here if IsNotFound(getErr) — safe to create

Context:

if name := c.Inputs.GetName(); name != "" {
_, getErr := client.Get(c.Context, name, metav1.GetOptions{})
if getErr == nil {
msg := fmt.Sprintf(
"resource %s/%s (%s) already exists in the cluster. "+
"To resolve this, you can:\n"+
" 1. Use aliases if renaming a Pulumi resource.\n"+
" 2. Use deleteBeforeReplace if replacing a resource with an explicit name.\n"+
" 3. Use a Patch resource to manage specific fields on existing objects.\n"+
" 4. Set upsertExistingObjects on the provider to update existing objects.",
c.Inputs.GetNamespace(), name, c.Inputs.GetKind(),
)
if c.Preview {
_ = c.Host.Log(c.Context, diag.Warning, c.URN, msg)
} else {
return fmt.Errorf("%s", msg)
}
}
}

Note: this was raised in a previous review thread (marked "Addressed" by the author) but the if getErr == nil form is still present in the current code at HEAD 029b0b2d5. The unit tests in TestCreationSSAExistingObject do not include a case where the GET itself returns a non-404 error, so the regression is not caught by the test suite.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

v5: Fail-fast when a program attempts to create an object that already exists.

2 participants