Skip to content

fix(labels): merge cluster_template labels into sparse cluster.labels#1016

Open
Rico Lin (ricolin) wants to merge 3 commits into
mainfrom
f3-fill-missing-labels
Open

fix(labels): merge cluster_template labels into sparse cluster.labels#1016
Rico Lin (ricolin) wants to merge 3 commits into
mainfrom
f3-fill-missing-labels

Conversation

@ricolin

@ricolin Rico Lin (ricolin) commented May 8, 2026

Copy link
Copy Markdown
Member

Summary

Magnum cluster create --labels replaces, rather than merges, cluster-template labels. When an operator creates a cluster with sparse custom labels, the stored cluster row can miss labels required by the Rust ClusterLabels extractor, such as kube_tag.

This PR fills those missing template-derived labels before the Rust create/delete paths and persists the completed label dict on create.

Fix

  • Add utils.fill_missing_labels_from_template(cluster) to fill missing labels without overriding user-provided cluster labels.
  • Prefer cluster.labels_skipped before cluster.cluster_template.labels, matching Magnum's stored representation for template-only labels on sparse-label creates.
  • Invoke the helper before Rust create/delete paths.
  • Persist the filled label dict during create when it changes, so later status/health monitor reads see the completed labels from Magnum storage.
  • Tolerate CAPI Cluster objects that do not yet have status.conditions during early reconciliation.

This PR is rebased on current main, which already includes PR #1064.

Repro

  1. Create a cluster template with labels like kube_tag, fixed_subnet_cidr, and octavia_provider.
  2. Run openstack coe cluster create --labels audit_log_enabled=true while omitting those template labels.
  3. The sparse cluster row can later fail Rust label extraction with KeyError: 'kube_tag', especially in status/health monitor or delete paths.

Validation

tox -e py310
# 221 passed, 58 warnings

git diff --check origin/main...HEAD
# passed

Previously validated on Atmosphere main and stable AIOs with the equivalent sparse-label/status-condition fixes:

  • sparse labels filled from labels_skipped first, then template labels
  • missing CAPI status.conditions produced an empty status map instead of KeyError
  • fresh cluster create passed after updated mCAPI image rollout
  • old/default cluster create, mCAPI update, then resize of the same existing cluster passed
  • delete path passed with sparse labels present

Notes

Signed-off-by: Rico Lin rlin@vexxhost.com

Rico Lin (ricolin) added a commit that referenced this pull request May 8, 2026
…content guard

Two independent fixes that surfaced while running the G2 bare-metal smoke
test against PR #1015 + PR #1014.  See
#1015 for the full
discussion; the README at notes/logs/ironic/mcapi-bm/README.md captures the
end-to-end repro.

F1 — `_decode_extra_files_label` accepts plain YAML/JSON in addition to
base64
  Operators using YAML manifests (Heat, Terraform, kubectl) typically have
  the platform handle base64 wrapping, so being forced to manually wrap the
  list a second time was a frequent foot-gun.  The decoder now tries
  base64 first (the documented/recommended path) and falls back to plain
  YAML/JSON, with a single clear error message when neither parses.

F2 — `_normalize_extra_file` rejects pre-encoded content that omits
`encoding: base64`
  Previously, an entry with `content: <base64>` and no `encoding`
  field was silently re-encoded, landing on disk as a base64 string and
  breaking any later `bash <script>` step.  A narrow heuristic now
  detects this case (canonical base64 charset, length >= 16, multiple of
  4, decodes to valid UTF-8) and raises `InvalidParameterValue` with a
  hint pointing operators at the `encoding` field.  Tests cover plain
  scripts, short strings, and multi-line content to confirm no false
  positives.

(F3 — `fill_missing_labels_from_template` — was originally bundled here
but has been split out into its own PR (#1016) against `main` so it can
land independently of this PR's larger scope.)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Rico Lin (ricolin) added a commit that referenced this pull request May 8, 2026
…content guard

Two independent fixes that surfaced while running the G2 bare-metal smoke
test against PR #1015 + PR #1014.  See
#1015 for the full
discussion; the README at notes/logs/ironic/mcapi-bm/README.md captures the
end-to-end repro.

F1 — `_decode_extra_files_label` accepts plain YAML/JSON in addition to
base64
  Operators using YAML manifests (Heat, Terraform, kubectl) typically have
  the platform handle base64 wrapping, so being forced to manually wrap the
  list a second time was a frequent foot-gun.  The decoder now tries
  base64 first (the documented/recommended path) and falls back to plain
  YAML/JSON, with a single clear error message when neither parses.

F2 — `_normalize_extra_file` rejects pre-encoded content that omits
`encoding: base64`
  Previously, an entry with `content: <base64>` and no `encoding`
  field was silently re-encoded, landing on disk as a base64 string and
  breaking any later `bash <script>` step.  A narrow heuristic now
  detects this case (canonical base64 charset, length >= 16, multiple of
  4, decodes to valid UTF-8) and raises `InvalidParameterValue` with a
  hint pointing operators at the `encoding` field.  Tests cover plain
  scripts, short strings, and multi-line content to confirm no false
  positives.

(F3 — `fill_missing_labels_from_template` — was originally bundled here
but has been split out into its own PR (#1016) against `main` so it can
land independently of this PR's larger scope.)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: ricolin <rlin@vexxhost.com>
@ricolin Rico Lin (ricolin) force-pushed the f3-fill-missing-labels branch from 0569744 to b7364ee Compare May 8, 2026 02:56
@ricolin

Copy link
Copy Markdown
Member Author

Test report (test env redacted)

Scope: utils.fill_missing_labels_from_template — fill template-only label keys back into cluster.labels before invoking the Rust driver, so sparse --labels on coe cluster create no longer crashes the deserialiser.

Env: Atmosphere AIO + 3-node BM emulator, kube image v1.34.7.

Phase Test Result
Repro (pre-fix) coe cluster create with only extra_* labels (no kube_tag etc.) → CREATE_FAILED with KeyError: 'kube_tag' from the Rust ClusterLabels deserialiser; orphan then stuck in DELETE_FAILED (delete path tripped on the same KeyError) ❌ as expected (gap-22 reproduced)
Post-fix Same sparse---labels create, with this PR applied ✅ CREATE_COMPLETE on first try
Post-fix cluster delete on a sparse-labels cluster ✅ DELETE_COMPLETE — no KeyError on the delete path
Regression Round-1 BM cluster lifecycle (create → upgrade → scale) on a fully-labelled cluster ✅ Did not regress

This was originally bundled inside #1015's commit 1930573 alongside two extra_cloud_init fixes; split out here so the label-merge fallback can land on main independently of #1015's larger scope. Same E2E evidence covers both PRs.

@ricolin

Copy link
Copy Markdown
Member Author

recheck

@ricolin

Copy link
Copy Markdown
Member Author

@codex[agent] review this PR

@chatgpt-codex-connector

Copy link
Copy Markdown

Codex Review: Didn't find any major issues. More of your lovely PRs please.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "Codex (@codex) review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "Codex (@codex) address that feedback".

@Codex

Codex AI commented May 26, 2026

Copy link
Copy Markdown

@codex[agent] review this PR

Reviewed and did a local validation pass. pre-commit run -a passes, and the label-merge helper is wired into both create_cluster and delete_cluster as intended. I pushed a small follow-up commit 83f7593 to fix a mismatched default expectation in test_utils and add missing tox deps (lark, setuptools<82) so unit tests can run; tox -e unit still has existing failures in test_driver (update_nodegroup request mocks) that appear unrelated to this PR.

@chatgpt-codex-connector

Copy link
Copy Markdown

To use Codex here, create a Codex account and connect to github.

Magnum's cluster create --labels path replaces the cluster_template labels
instead of merging them into the cluster row.  That can leave stored
cluster.labels without template-derived keys such as kube_tag, which the Rust
ClusterLabels deserializer expects on create and delete paths.

Fill missing keys from labels_skipped first, then from cluster_template.labels,
without overriding operator-provided cluster labels.  Persist the completed
labels on create so later status and monitor paths read the same label set from
Magnum, and apply the same fallback before delete so existing sparse clusters
can still be removed.

Also tolerate early CAPI Cluster objects that do not have status.conditions yet
while waiting for reconciliation.

Signed-off-by: Rico Lin <rlin@vexxhost.com>
Assisted-By: Codex <noreply@openai.com>
Cover the label fallback helper, including labels_skipped taking precedence over
cluster_template.labels for missing keys.  Add driver tests proving create
persists the completed label set and both create and delete pass label-complete
cluster objects to the Rust driver.

Mock the MachineSet list used by nodegroup update tests so the unit suite keeps
matching the requests made by current main.

Signed-off-by: Rico Lin <rlin@vexxhost.com>
Assisted-By: Codex <noreply@openai.com>
The tox unit environment installs its own deps instead of the pyproject dev
dependency group.  Add lark so the Magnum imports used by the unit tests are
available, and pin setuptools below the pkg_resources removal boundary used by
current Magnum dependencies.

Signed-off-by: Rico Lin <rlin@vexxhost.com>
Assisted-By: Codex <noreply@openai.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants