Skip to content

feat: support server_type=bm in cluster_api driver (auto-default boot_volume_size=0)#1014

Open
Rico Lin (ricolin) wants to merge 6 commits into
mainfrom
add-server-type-bm
Open

feat: support server_type=bm in cluster_api driver (auto-default boot_volume_size=0)#1014
Rico Lin (ricolin) wants to merge 6 commits into
mainfrom
add-server-type-bm

Conversation

@ricolin

@ricolin Rico Lin (ricolin) commented May 6, 2026

Copy link
Copy Markdown
Member

Problem

Magnum's cluster_api drivers only advertised server_type=vm. Baremetal templates using Ironic through Nova failed driver lookup, and baremetal boot also needed safer defaults around root volumes and networking.

Solution

This PR now carries the server_type=bm behavior as three focused pieces:

  1. feat(driver): advertise server_type=bm alongside vm
    • Adds server_type=bm entries for the shipped cluster_api drivers.
  2. feat(resources): default boot_volume_size=0 for server_type=bm
    • Defaults BM clusters away from Cinder root volumes unless the operator explicitly sets a boot volume size.
  3. fix(pr1014): disable CAPO managed security groups for bm
    • Adds disableManagedSecurityGroups and removes CAPO managedSecurityGroups only for server_type=bm ClusterClasses.
    • This is Issue 41 from the live OVN BM replay. CAPO managed security groups caused Neutron PortSecurityAndIPRequiredForSecurityGroups on the disabled-port-security baremetal VLAN network, before Nova/Ironic server create.

VM clusters keep the existing CAPO managed-security-group behavior.

Validation

Latest scoped validation after moving Issue 41 into this PR:

  • cargo test managed_security_groups -> 2 passed
  • cargo test test_convert_values_to_cluster_topology_variables -> 1 passed
  • uv run pytest magnum_cluster_api/tests/unit/test_resources.py::test_cluster_variables_disable_managed_security_groups_for_bm -q -> 1 passed, 8 warnings
  • uv run ruff check magnum_cluster_api/resources.py magnum_cluster_api/tests/unit/test_resources.py -> passed
  • rustfmt --check src/features/managed_security_groups.rs -> passed

Review Notes

The managed-security-groups change is deliberately scoped to BM. The live failure came from a BM VLAN network with port_security_enabled=False; VM tenant-network behavior should remain unchanged.

Rico Lin (ricolin) and others added 2 commits May 6, 2026 10:48
Magnum's driver dispatch matches a cluster_template against the
(server_type, os, coe) tuples returned by each driver's provides()
property. Until now the cluster_api drivers only advertised
server_type=vm, so any cluster_template targeting baremetal nodes
(e.g. Ironic via Nova) failed driver lookup with NoValidDriver even
though Nova would happily route the boot via nova-compute-ironic when
the flavor's resource_class matches a baremetal node.

Add a parallel server_type=bm entry to all four shipped drivers
(Ubuntu, UbuntuFocal, Flatcar, RockyLinux) so the same cluster_api
driver handles both VM and BM clusters. This unblocks Magnum/CAPI on
top of the Ironic emulator AIO scenario described in
vexxhost/atmosphere#bm-mcapi-fixes (gap #2 in that branch's README).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Rico Lin <rlin@vexxhost.com>
Baremetal nodes provisioned through Ironic boot from the local disk
via iPXE and cannot use a Cinder root volume. When a cluster_template
has server_type=bm but the operator did not explicitly set the
boot_volume_size label, the previous code defaulted to
CONF.cinder.default_boot_volume_size, which caused
OpenStackMachineTemplate to request a Cinder root volume that Ironic
refuses to attach during enroll, breaking cluster creation with an
opaque flavor/scheduling error.

Add utils.get_default_boot_volume_size(cluster, default) that returns
0 when cluster.cluster_template.server_type == 'bm' and the supplied
default otherwise, and use it as the default at the three call sites
in resources.py (autoscaling annotation, machine_deployment override,
cluster-level override).

Operators can still override explicitly by setting boot_volume_size
on the cluster_template/cluster/node_group; the auto-default only
kicks in when the label is unset.

This closes the BM cluster_template foot-gun documented as gap #17 in
vexxhost/atmosphere bm-mcapi-fixes branch — operators no longer need
to remember to pass --labels boot_volume_size=0 alongside
server_type=bm.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Rico Lin <rlin@vexxhost.com>
Rico Lin (ricolin) added a commit that referenced this pull request May 6, 2026
Address review feedback on PR #1014.

For server_type=bm, the boot volume default returns 0 (Ironic boots
from local disk, not Cinder), and the fallback flavor.disk is
typically also 0 because root_gb lives on the Ironic node rather
than the Nova flavor. Previously the autoscaler annotation
capacity.cluster-autoscaler.kubernetes.io/ephemeral-disk was set
to "0", which causes cluster-autoscaler to reject any pod that
requests ephemeral-storage even if the Ironic node has plenty of
local disk.

Skip the annotation entirely in that case so the autoscaler can fall
back to cpu/memory-only scheduling decisions, which is correct for
BM nodes whose disk capacity we cannot reliably determine here.

Also document the auto-default in docs/user/labels.md.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Address review feedback on PR #1014.

For server_type=bm, the boot volume default returns 0 (Ironic boots
from local disk, not Cinder), and the fallback flavor.disk is
typically also 0 because root_gb lives on the Ironic node rather
than the Nova flavor. Previously the autoscaler annotation
capacity.cluster-autoscaler.kubernetes.io/ephemeral-disk was set
to "0", which causes cluster-autoscaler to reject any pod that
requests ephemeral-storage even if the Ironic node has plenty of
local disk.

Skip the annotation entirely in that case so the autoscaler can fall
back to cpu/memory-only scheduling decisions, which is correct for
BM nodes whose disk capacity we cannot reliably determine here.

Also document the auto-default in docs/user/labels.md.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Rico Lin <rlin@vexxhost.com>
Rico Lin (ricolin) added a commit that referenced this pull request May 6, 2026
Address review feedback on PR #1014.

For server_type=bm, the boot volume default returns 0 (Ironic boots
from local disk, not Cinder), and the fallback flavor.disk is
typically also 0 because root_gb lives on the Ironic node rather
than the Nova flavor. Previously the autoscaler annotation
capacity.cluster-autoscaler.kubernetes.io/ephemeral-disk was set
to "0", which causes cluster-autoscaler to reject any pod that
requests ephemeral-storage even if the Ironic node has plenty of
local disk.

Skip the annotation entirely in that case so the autoscaler can fall
back to cpu/memory-only scheduling decisions, which is correct for
BM nodes whose disk capacity we cannot reliably determine here.

Also document the auto-default in docs/user/labels.md.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: ricolin <rlin@vexxhost.com>
Rico Lin (ricolin) pushed a commit to vexxhost/atmosphere that referenced this pull request May 6, 2026
When patching mcapi via configMap subPath overlay, only the Python
files are replaced -- shipped data files (manifests/calico/v*.yaml,
etc.) still come from the installed image. If the feature branch has
bumped CALICO_TAG past what the image ships, cluster create fails
instantly with No such file or directory on the calico manifest.

Documents the workaround (sed CALICO_TAG down to a shipped version
before applying the configMap) and the proper fix (image rebuild or
add the missing manifest to the configMap).

Hit while validating PR vexxhost/magnum-cluster-api#1014 (head 6301116)
on add-server-type-bm: resources.py references v3.31.5 but the
deployed image only ships up to v3.31.3.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Change-Id: I3da483cd29c53d04dec58e41a002b4cc3627a877
Signed-off-by: Rico Lin <rico@vexxhost.com>
@ricolin

Rico Lin (ricolin) commented May 6, 2026

Copy link
Copy Markdown
Member Author

Result: ✅ End-to-end pass with PR #1014

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Rico Lin <rlin@vexxhost.com>
Rico Lin (ricolin) added a commit that referenced this pull request May 7, 2026
…fallback

Three independent fixes that surfaced while running the G2 bare-metal smoke
test against PR #1015 + PR #1014.  See
#1015 for the full
discussion; the README at notes/logs/ironic/mcapi-bm/README.md captures the
end-to-end repro.

F1 — `_decode_extra_files_label` accepts plain YAML/JSON in addition to
base64
  Operators using YAML manifests (Heat, Terraform, kubectl) typically have
  the platform handle base64 wrapping, so being forced to manually wrap the
  list a second time was a frequent foot-gun.  The decoder now tries
  base64 first (the documented/recommended path) and falls back to plain
  YAML/JSON, with a single clear error message when neither parses.

F2 — `_normalize_extra_file` rejects pre-encoded content that omits
`encoding: base64`
  Previously, an entry with `content: <base64>` and no `encoding`
  field was silently re-encoded, landing on disk as a base64 string and
  breaking any later `bash <script>` step.  A narrow heuristic now
  detects this case (canonical base64 charset, length ≥ 16, multiple of
  4, decodes to valid UTF-8) and raises `InvalidParameterValue` with a
  hint pointing operators at the `encoding` field.  Tests cover plain
  scripts, short strings, and multi-line content to confirm no false
  positives.

F3 — `fill_missing_labels_from_template` merges template labels into
sparse cluster.labels
  Magnum's `cluster create --labels` REPLACES rather than merges the
  cluster_template labels.  When a freshly-created cluster row was
  missing keys the Rust `ClusterLabels` deserialiser requires (most
  notably `kube_tag`), both create and delete crashed with
  `KeyError: 'kube_tag'` and the cluster ended up stuck in
  DELETE_FAILED.  The new helper fills any template-only keys back in
  without overriding values the operator explicitly set, and is invoked
  before `rust_driver.create_cluster` and `rust_driver.delete_cluster`.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

@mnaser Mohammed Naser (mnaser) left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, why do we need boot_volume_size=0 -- what happens if it's not set? It seems/feels like it should just work out of the box, no?

Rico Lin (ricolin) added a commit that referenced this pull request May 8, 2026
…content guard

Two independent fixes that surfaced while running the G2 bare-metal smoke
test against PR #1015 + PR #1014.  See
#1015 for the full
discussion; the README at notes/logs/ironic/mcapi-bm/README.md captures the
end-to-end repro.

F1 — `_decode_extra_files_label` accepts plain YAML/JSON in addition to
base64
  Operators using YAML manifests (Heat, Terraform, kubectl) typically have
  the platform handle base64 wrapping, so being forced to manually wrap the
  list a second time was a frequent foot-gun.  The decoder now tries
  base64 first (the documented/recommended path) and falls back to plain
  YAML/JSON, with a single clear error message when neither parses.

F2 — `_normalize_extra_file` rejects pre-encoded content that omits
`encoding: base64`
  Previously, an entry with `content: <base64>` and no `encoding`
  field was silently re-encoded, landing on disk as a base64 string and
  breaking any later `bash <script>` step.  A narrow heuristic now
  detects this case (canonical base64 charset, length >= 16, multiple of
  4, decodes to valid UTF-8) and raises `InvalidParameterValue` with a
  hint pointing operators at the `encoding` field.  Tests cover plain
  scripts, short strings, and multi-line content to confirm no false
  positives.

(F3 — `fill_missing_labels_from_template` — was originally bundled here
but has been split out into its own PR (#1016) against `main` so it can
land independently of this PR's larger scope.)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@ricolin

Copy link
Copy Markdown
Member Author

Out of curiosity, why do we need boot_volume_size=0 -- what happens if it's not set? It seems/feels like it should just work out of the box, no?

Mohammed Naser (@mnaser)
so boot_volume_size defaults to CONF.cinder.default_boot_volume_size, that makes when we try to provide bm for magnum cluster it will failed because machine try to boot the ironic bm from cinder volume.
A workaround is to make sure we alway specifc set boot volume size to 0 from label.
WDYT?

@ricolin Rico Lin (ricolin) marked this pull request as ready for review May 8, 2026 02:38
Rico Lin (ricolin) added a commit that referenced this pull request May 8, 2026
…content guard

Two independent fixes that surfaced while running the G2 bare-metal smoke
test against PR #1015 + PR #1014.  See
#1015 for the full
discussion; the README at notes/logs/ironic/mcapi-bm/README.md captures the
end-to-end repro.

F1 — `_decode_extra_files_label` accepts plain YAML/JSON in addition to
base64
  Operators using YAML manifests (Heat, Terraform, kubectl) typically have
  the platform handle base64 wrapping, so being forced to manually wrap the
  list a second time was a frequent foot-gun.  The decoder now tries
  base64 first (the documented/recommended path) and falls back to plain
  YAML/JSON, with a single clear error message when neither parses.

F2 — `_normalize_extra_file` rejects pre-encoded content that omits
`encoding: base64`
  Previously, an entry with `content: <base64>` and no `encoding`
  field was silently re-encoded, landing on disk as a base64 string and
  breaking any later `bash <script>` step.  A narrow heuristic now
  detects this case (canonical base64 charset, length >= 16, multiple of
  4, decodes to valid UTF-8) and raises `InvalidParameterValue` with a
  hint pointing operators at the `encoding` field.  Tests cover plain
  scripts, short strings, and multi-line content to confirm no false
  positives.

(F3 — `fill_missing_labels_from_template` — was originally bundled here
but has been split out into its own PR (#1016) against `main` so it can
land independently of this PR's larger scope.)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: ricolin <rlin@vexxhost.com>
@ricolin

Copy link
Copy Markdown
Member Author

recheck

@ricolin

Copy link
Copy Markdown
Member Author

Test report (test env redacted)

Scope: --server-type bm driver dispatch, boot_volume_size=0 auto-default for BM, autoscaler ephemeral-disk annotation skip-when-0.

Env: Atmosphere AIO with 3-node BM emulator (libvirt + OVS PXE bridge), kube image ubuntu-2204-kube-v1.34.{3,7}-raw.

Phase Test Result
C2 coe cluster create --server-type bm (no boot_volume_size label) ✅ PASS — first try, CREATE_COMPLETE / HEALTHY
U1 Rolling upgrade kube_tag v1.34.3 → v1.34.7 (KCP surge + MD rollout) ✅ PASS
U3 Worker scale-up node_count 1 → 2 immediately after U1 ✅ PASS — UPDATE_COMPLETE / HEALTHY
G1 Fake-GPU device-plugin DaemonSet + demo pod on the upgraded cluster ✅ PASS
Gap boot_volume_size=0 auto-default ✅ Held across create + upgrade + scale
Gap Autoscaler ephemeral-disk annotation skip-when-0 ✅ Did not regress
Gap BM driver dispatch (no VM-only specifics on BM path) ✅ Did not regress

Overall: 5/5 executed phases PASS. Round-1 report archived locally with the full reproducer commands, libvirt domain XML, and ovs-vsctl wiring.

Caveat: Driver().provides is the public API for "is server_type=bm supported"; no separate helper added.

@ricolin

Copy link
Copy Markdown
Member Author

recheck

When a cluster_template advertises `server_type=bm` (this PR), the
operator's intent is to provision both the master and worker nodes on
Ironic via Nova's ironic compute driver.  However, Magnum does not
enforce that `master_flavor_id` / `flavor_id` actually resolve to an
Ironic-backed Nova flavor: a flavor with empty `extra_specs` (e.g.
the stock `m1.small`) is silently accepted and Nova schedules the
server on a libvirt/KVM hypervisor instead of an Ironic node.  The
cluster reaches CREATE_COMPLETE and looks healthy, but every
server_type=bm code path added by this PR (boot_volume_size auto-zero,
network-interface=flat downstream, etc.) is bypassed.  Operators
believe they are testing the BM control plane when they are not.

Reproduced on a baremetal AIO with this branch deployed:

  $ openstack coe cluster create --cluster-template k8s-bm-gpu \\
       --master-flavor m1.small --master-count 1 --node-count 0 \\
       gap-master-vm
  Request to create cluster ... accepted

  $ openstack server show kube-...-master -c flavor -c hypervisor_hostname
  flavor                       {'name': 'm1.small', ..., 'extra_specs': {}}
  OS-EXT-SRV-ATTR:hypervisor_hostname  instance   # ← AIO host KVM, not Ironic

This commit adds `validate_baremetal_flavors()` and a small
`_is_baremetal_flavor()` helper.  A flavor is considered Ironic-backed
when its extra-specs contain a `resources:CUSTOM_<NAME>=1` entry,
which is the convention Nova's Ironic driver requires for the
scheduler to route the request to the Ironic compute service.
`validate_cluster()` now calls the new helper, raising
`InvalidParameterValue` with an actionable message when either
flavor is missing the required extra-spec.  The validator is a no-op
for `server_type=vm` templates, so the existing virtual-cluster path
is unaffected.

Unit tests cover:
  * detection of the baremetal extra-spec (positive, negative, and
    fallback to the cached `extra_specs` attribute when
    `get_keys()` is not eagerly resolvable);
  * no-op for vm templates;
  * pass when both flavors are baremetal-backed;
  * reject for a virtual master_flavor_id;
  * reject for a virtual flavor_id.

  $ pytest magnum_cluster_api/tests/unit/test_utils.py::TestValidateBaremetalFlavors -q
  8 passed, 8 warnings in 0.03s

Signed-off-by: Rico Lin <rlin@vexxhost.com>
@ricolin

Copy link
Copy Markdown
Member Author

Follow-up: validate server_type=bm flavors are Ironic-backed (c2257ff)

While integration-testing this PR end-to-end on a baremetal AIO, I hit a silent failure mode that I think this PR should close before merge.

Repro on this branch (no patch applied)

Cluster template with server_type=bm and BM-backed flavors. Override --master-flavor with a stock virtual flavor on cluster create:

$ openstack coe cluster create --cluster-template k8s-bm-gpu \
    --keypair admin-key --master-count 1 --node-count 0 \
    --master-flavor m1.small \
    --labels kube_tag=v1.34.3,octavia_provider=ovn \
    gap-master-vm
Request to create cluster 23fdc519-d4f9-4f36-9e2a-d6980d8a03c6 accepted

The cluster reaches CREATE_IN_PROGRESS immediately (no validation error) and proceeds to schedule the master. Inspecting the resulting Nova server:

$ openstack server show kube-84lu2-rrtgr-r6zkn -c flavor -c hypervisor_hostname -f value
{'name': 'm1.small', 'original_name': 'm1.small', ..., 'extra_specs': {}, ...}
instance        # ← this is the AIO host's KVM hypervisor, not an Ironic node

So the master silently lands on a libvirt/KVM hypervisor with flavor=m1.small, extra_specs={} — none of the server_type=bm code paths added by this PR fire (get_default_boot_volume_size returns 0 only for BM, the chart's network_interface=flat decision downstream depends on the same predicate, etc.). The cluster looks healthy but is not actually testing the BM control plane.

For comparison, the Ironic-backed flavor on the same AIO does have the expected extra-spec:

$ openstack flavor show baremetal -f value -c properties
{'capabilities:boot_mode': 'uefi', 'cpu_arch': 'x86_64',
 'resources:CUSTOM_BAREMETAL': '1', 'resources:DISK_GB': '0',
 'resources:MEMORY_MB': '0', 'resources:VCPU': '0', 'trait:CUSTOM_GOLD': 'required'}

Fix in c2257ff

Adds validate_baremetal_flavors() (and a small _is_baremetal_flavor() helper) to utils.py, hooked into validate_cluster(). A flavor is considered Ironic-backed when its extra-specs contain a resources:CUSTOM_<NAME>=1 entry, which is the convention Nova's Ironic driver requires for the scheduler to route the request to the Ironic compute service.

The validator is a no-op for server_type=vm templates, so the existing virtual-cluster path is unaffected.

Tests

$ pytest magnum_cluster_api/tests/unit/test_utils.py::TestValidateBaremetalFlavors -q
8 passed, 8 warnings in 0.03s

Coverage:

  • positive / negative / unrelated-extra-specs detection
  • fallback to cached extra_specs attribute when get_keys() is not eagerly resolvable
  • no-op for server_type=vm
  • pass when both flavors are baremetal-backed
  • reject when only master_flavor_id is virtual
  • reject when only flavor_id is virtual

The whole tests/unit/test_utils.py module: 27 passed, 1 pre-existing failure (TestGenerateCloudControllerManagerConfig.test_generate_cloud_controller_manager_config — fixture expects lb-provider=amphora but the test environment now returns amphorav2; reproduces on f9a753e without my changes).

Candidate upstream home: #1014.

PR #1014 owns the server_type=bm path. Bare-metal tenant networks can have
port security disabled so Ironic can use pre-created ports, but CAPO managed
security groups cause generated OpenStackServer ports to include a security
group. Neutron rejects that combination before Nova/Ironic can create the
server.

Add a ClusterClass variable that removes managedSecurityGroups only for
server_type=bm clusters, keep VM clusters on the existing managed security
group behavior, and cover the remove-patch path in the Rust test helper.

Validation:
- cargo test managed_security_groups
- cargo test test_convert_values_to_cluster_topology_variables
- uv run pytest magnum_cluster_api/tests/unit/test_sync.py magnum_cluster_api/tests/unit/test_utils.py magnum_cluster_api/tests/unit/test_resources.py magnum_cluster_api/tests/unit/test_driver.py -q

Assisted-By: ChatGPT <noreply@openai.com>
(cherry picked from commit 93e3406ff794f6e1bdd5cbee42f0ec61112691d5)
(cherry picked from commit 86b807549c8a845c5e037e89ada46e1bd99e325c)
Signed-off-by: Rico Lin <rlin@vexxhost.com>
Rico Lin (ricolin) added a commit that referenced this pull request May 28, 2026
Candidate upstream home: #1014.

PR #1014 owns the server_type=bm path. Bare-metal tenant networks can have
port security disabled so Ironic can use pre-created ports, but CAPO managed
security groups cause generated OpenStackServer ports to include a security
group. Neutron rejects that combination before Nova/Ironic can create the
server.

Add a ClusterClass variable that removes managedSecurityGroups only for
server_type=bm clusters, keep VM clusters on the existing managed security
group behavior, and cover the remove-patch path in the Rust test helper.

Validation:
- cargo test managed_security_groups
- cargo test test_convert_values_to_cluster_topology_variables
- uv run pytest magnum_cluster_api/tests/unit/test_sync.py magnum_cluster_api/tests/unit/test_utils.py magnum_cluster_api/tests/unit/test_resources.py magnum_cluster_api/tests/unit/test_driver.py -q

Assisted-By: ChatGPT <noreply@openai.com>
(cherry picked from commit 93e3406ff794f6e1bdd5cbee42f0ec61112691d5)
Signed-off-by: ricolin <rlin@vexxhost.com>
(cherry picked from commit 86b807549c8a845c5e037e89ada46e1bd99e325c)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants