feat(ci): add Ubuntu 24.04 hydrophone canaries #1087

Open
Rico Lin (ricolin) wants to merge 2 commits into main from codex/ubuntu-2404-support

@ricolin Rico Lin (ricolin) commented Jun 18, 2026

Member

Depends-On: #1084

Summary

  • parameterize Hydrophone guest image URLs with an image_prefix variable
  • add Ubuntu 24.04 Hydrophone canary jobs for the newest generated Kubernetes version, currently v1.36.1, for both Calico and Cilium
  • update DevStack and user/developer image examples to consume capo-image-elements Ubuntu 24.04 artifacts with os_distro=ubuntu
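The image_prefix parameterization in the first bullet can be pictured with a minimal Python sketch. The helper names `image_name` and `image_url` and the `base_url` parameter are illustrative assumptions and are not taken from hack/bump/kubernetes.py; only the `<prefix>-v<version>.qcow2` naming and the download URL come from this PR.

```python
# Hypothetical helpers for illustration; not the code in hack/bump/kubernetes.py.

def image_name(image_prefix: str, version: str) -> str:
    """Build an artifact name such as ubuntu-24.04-v1.36.1.qcow2."""
    if not version.startswith("v"):
        raise ValueError(f"expected a version like v1.36.1, got {version!r}")
    return f"{image_prefix}-{version}.qcow2"


def image_url(image_prefix: str, version: str, base_url: str) -> str:
    """Join a download base URL with the artifact name for one version."""
    return f"{base_url.rstrip('/')}/{image_name(image_prefix, version)}"


if __name__ == "__main__":
    # Base URL as it appears in the DevStack example under review.
    base = "https://github.com/vexxhost/capo-image-elements/releases/latest/download"
    for version in ("v1.33.12", "v1.34.8", "v1.35.5", "v1.36.1"):
        print(image_url("ubuntu-24.04", version, base))
```

Keeping the prefix as a single variable means a later switch of image family only touches one value instead of every job definition.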

Rationale

vexxhost/capo-image-elements already publishes ubuntu-24.04-v<version>.qcow2 artifacts. This keeps the existing Ubuntu 22.04 Hydrophone matrix intact while adding a narrow Magnum-specific Noble validation path before widening the CI matrix.

Validation

  • uv run --no-project --script hack/bump/kubernetes.py --self-test
  • bash -n hack/bump/kubernetes.sh hack/run-integration-tests.sh
  • YAML parse check for zuul.d/jobs.yaml, zuul.d/project.yaml, and zuul.d/hydrophone-jobs.yaml
  • git diff --check
  • uvx --python 3.12 pre-commit run --all-files
  • dry run of hack/bump/kubernetes.py --image-prefix ubuntu-24.04 against temporary output files

Add image family support to the Kubernetes bump helper and keep
Ubuntu 24.04 Hydrophone coverage for the newest generated Kubernetes
version. Update the user, developer, and DevStack image examples to use
capo-image-elements Ubuntu 24.04 artifacts with os_distro=ubuntu.

Signed-off-by: Rico Lin <rlin@vexxhost.com>
Assisted-By: Codex <noreply@openai.com>

Rico Lin (ricolin) commented Jun 18, 2026

Member Author

Ubuntu 24.04 validation report

Validated PR head 7b495eda8666349a2c78ed4b856ac60a3db93f36 with Ubuntu 24.04 and Kubernetes v1.36.1.

Passed:

  • Baseline Ubuntu 24.04 cluster create reached CREATE_COMPLETE.
  • Nodes came up Ready on Ubuntu 24.04.4 LTS with Kubernetes v1.36.1.
  • Fresh create/delete validation against the PR head passed:
    • Fresh cluster reached CREATE_COMPLETE.
    • Delete completed and cleaned up the associated compute/load-balancer resources.
  • Upgrade path: Ubuntu 22.04.5 LTS + Kubernetes v1.35.5 to Ubuntu 24.04.4 LTS + Kubernetes v1.36.1 passed:
    • Magnum cluster status reached UPDATE_COMPLETE.
    • Cluster template and nodegroup image moved to the Ubuntu 24.04 target.
    • Control-plane and worker machines were replaced.
    • Final nodes were all Ready on Ubuntu 24.04.4 LTS with Kubernetes v1.36.1.
    • No non-running pods after convergence.
  • Upgrade path: Ubuntu 22.04.5 LTS + Kubernetes v1.36.1 to Ubuntu 24.04.4 LTS + Kubernetes v1.36.1 passed:
    • Magnum cluster status reached UPDATE_COMPLETE.
    • Control-plane and worker machines were replaced.
    • Final nodes were all Ready on Ubuntu 24.04.4 LTS with Kubernetes v1.36.1.
    • No non-running pods after convergence.
  • Worker resize through the supported Magnum path passed on Ubuntu 24.04:
    • openstack coe cluster resize --nodegroup default-worker ... 2
    • Result: nodegroup stayed UPDATE_COMPLETE, second worker became Ready on Ubuntu 24.04.4 LTS.
  • Worker resize down through the same supported path passed:
    • Explicit node removal returned the worker nodegroup to 1 worker.
    • Result: UPDATE_COMPLETE, remaining worker Ready, no non-running pods.
  • Magnum API/conductor restart passed while Ubuntu 24.04 clusters existed:
    • API and conductor rolled successfully.
    • PR overlay still imported from the expected path after restart.
    • Cluster and nodegroup reads still returned UPDATE_COMPLETE.
  • Workload networking checks passed:
    • Pod scheduling across workers.
    • CoreDNS service discovery.
    • ClusterIP Service routing.
    • Direct pod-to-pod traffic across workers.
  • Service type=LoadBalancer passed with the default Octavia provider path:
    • Service reached EnsuredLoadBalancer and received a load-balancer address.
    • No non-running pods after the test.
  • Cinder CSI checks passed:
    • PVC provisioned and bound with the default Cinder storage class.
    • Pod mounted the volume and successfully wrote/read data.
    • Cross-worker reattach passed: data written from a pod on one worker was read from a pod on another worker after pod deletion/recreate.
    • Test PVC/volume cleanup completed.
  • Cilium smoke test passed with Ubuntu 24.04:
    • Cilium cluster reached CREATE_COMPLETE.
    • Cilium daemonset and operator were ready.
    • Nodes were Ready on Ubuntu 24.04.4 LTS with Kubernetes v1.36.1.
    • In-cluster Service connectivity succeeded.
    • No non-running pods after convergence.
  • Temporary upgrade/Cilium test clusters and templates were cleaned up after validation.
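Several of the checks above end with "no non-running pods". One way such a check could be scripted is sketched below. It is not the procedure used for this report. It assumes the input is a Kubernetes PodList as returned by `kubectl get pods -A -o json`, and it assumes "non-running" means any pod phase other than Running or Succeeded; the function name `non_running_pods` is hypothetical.

```python
# Assumption: completed pods (phase Succeeded) do not count as "non-running".
HEALTHY_PHASES = {"Running", "Succeeded"}


def non_running_pods(pod_list: dict) -> list[str]:
    """Return "namespace/name" for every pod whose phase is not healthy."""
    unhealthy = []
    for pod in pod_list.get("items", []):
        # A pod with no reported phase is treated as Unknown, i.e. unhealthy.
        phase = pod.get("status", {}).get("phase", "Unknown")
        if phase not in HEALTHY_PHASES:
            meta = pod.get("metadata", {})
            unhealthy.append(f"{meta.get('namespace', '?')}/{meta.get('name', '?')}")
    return unhealthy


if __name__ == "__main__":
    # Hand-written sample in the shape of a PodList; not real cluster output.
    sample = {
        "items": [
            {"metadata": {"namespace": "kube-system", "name": "coredns-a"},
             "status": {"phase": "Running"}},
            {"metadata": {"namespace": "default", "name": "web-1"},
             "status": {"phase": "Pending"}},
        ]
    }
    print(non_running_pods(sample))
```

An empty result corresponds to the "no non-running pods" condition reported in the checks above.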

Observed non-blocking issues:

  • Generic openstack coe cluster update ... replace node_count=3 is not implemented in the mCAPI driver path and raises NotImplementedError from update_cluster(). The supported coe cluster resize --nodegroup ... path works and was used for successful update/resize validation.
  • Service type=LoadBalancer did not complete when the cluster cloud config requested Octavia provider ovn but that provider was unavailable. Retesting with the default provider path passed, so this still looks like a provider/template config mismatch rather than an Ubuntu 24.04 regression.

Final state after validation: the retained Ubuntu 24.04 validation cluster is healthy at 1 control-plane + 2 workers, with all nodes Ready and no non-running pods from the test suite.

Switch the shared Hydrophone image prefix to Ubuntu 24.04 so all
Hydrophone jobs use the newer guest image by default. Drop the temporary
Ubuntu 24.04-specific canary jobs now that they are redundant.

Signed-off-by: Rico Lin <rlin@vexxhost.com>
Assisted-By: Codex <noreply@openai.com>
Rico Lin (ricolin) force-pushed the codex/ubuntu-2404-support branch from 483c263 to df31cf9 on June 18, 2026 06:45
Rico Lin (ricolin) marked this pull request as ready for review on June 18, 2026 06:45

@mnaser Mohammed Naser (mnaser) left a comment

Member

Let's just switch everything to 24.04.

In theory, we are not testing whether the images work here; we already do that in capo-image-elements by deploying a cluster with them. We're only using the images as an artifact to validate against.


 enable_plugin magnum https://opendev.org/openstack/magnum
-MAGNUM_GUEST_IMAGE_URL=https://static.atmosphere.dev/artifacts/magnum-cluster-api/ubuntu-jammy-kubernetes-1-31-1-1728920853.qcow2
+MAGNUM_GUEST_IMAGE_URL=https://github.com/vexxhost/capo-image-elements/releases/latest/download/ubuntu-24.04-v1.36.1.qcow2

Member

This should be a pinned release. We can use hack/bump to update it over time; otherwise, it will eventually break.
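To make the pinning suggestion concrete: a `releases/latest/download/<asset>` link resolves against whichever release is newest at download time, so a versioned asset name presumably stops resolving once a newer release no longer ships it. A pinned link names the release tag explicitly. The sketch below rewrites one form into the other. The function name `pin_release_url` and the tag `EXAMPLE-TAG` are placeholders, not an actual capo-image-elements release or anything in hack/bump.

```python
LATEST_SEGMENT = "/releases/latest/download/"


def pin_release_url(url: str, tag: str) -> str:
    """Rewrite a 'latest' release asset URL to point at a specific tag."""
    if not tag:
        raise ValueError("a release tag is required to pin the URL")
    if LATEST_SEGMENT not in url:
        raise ValueError("URL does not use releases/latest/download")
    # Pinned release assets live under /releases/download/<tag>/<asset>.
    return url.replace(LATEST_SEGMENT, f"/releases/download/{tag}/", 1)


if __name__ == "__main__":
    floating = (
        "https://github.com/vexxhost/capo-image-elements"
        "/releases/latest/download/ubuntu-24.04-v1.36.1.qcow2"
    )
    # EXAMPLE-TAG is a placeholder; substitute a real release tag.
    print(pin_release_url(floating, "EXAMPLE-TAG"))
```

A bump helper could then update the tag and the Kubernetes version together, so the documented URL always refers to an asset that exists in the named release.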

Comment on lines +50 to +54
export OS_DISTRO=ubuntu
export IMAGE_FAMILY=ubuntu-24.04
for version in v1.33.12 v1.34.8 v1.35.5 v1.36.1; do \
IMAGE_NAME="${IMAGE_FAMILY}-${version}"; \
curl -LO https://github.com/vexxhost/capo-image-elements/releases/latest/download/${IMAGE_NAME}.qcow2; \

Member

Same issue here: these downloads should also use a pinned release.


2 participants