OCPBUGS-83826: deploy-from-self when skopeo < 1.22.2 #5867

Open
QiWang19 wants to merge 1 commit into openshift:main from QiWang19:ostree-in-container

Conversation

@QiWang19
Member

@QiWang19 QiWang19 commented Apr 21, 2026

Use deploy-from-self when skopeo is older than 1.22.2 and the boot image is multi-arch, a combination that causes skopeo Sigstore verification to fail.

- What I did

- How to verify it

This can be verified manually on a non-multi-arch release: the skopeo version check runs and triggers the in-place update with the older version of skopeo.
4.22.0-0.nightly-2026-04-23-021021, #5867 cluster:

/registry-build10-ci-openshift-org-ci-ln-2dmblvb-stable-sha256-ed400db8213c7f0e23d62efa30436f976cb6b4434ba8cbd42b1696bdccd1f6ad/host_service_logs/masters/machine-config-daemon-firstboot_service.log

423 14:40:49.877077    1912 update.go:2965] skopeo version output: skopeo version 1.22.0 commit: 1a41cccfa63d5bd923b57e85986041d7283e720c

423 14:40:49.877116    1912 update.go:2994] "rpm-ostree or skopeo is not new enough for new-format image; forcing an update via container and queuing immediate reboot"

- Description for the changelog

Summary by CodeRabbit

  • Bug Fixes
    • Update now probes host tooling and image metadata to detect multi-arch Sigstore support; when host tooling or the target image lacks support, the updater falls back to container-based updates and forces an immediate reboot. Logging was clarified to reference either host tooling or image capabilities for improved observability.

@openshift-merge-bot
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci
Contributor

openshift-ci Bot commented Apr 21, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) Apr 21, 2026
@coderabbitai

coderabbitai Bot commented Apr 21, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ca30a886-b2ba-4756-b265-08323f75b706

📥 Commits

Reviewing files that changed from the base of the PR and between 34856ed and 30e3e49.

📒 Files selected for processing (2)
  • pkg/daemon/daemon.go
  • pkg/daemon/update.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • pkg/daemon/daemon.go
  • pkg/daemon/update.go

Walkthrough

Adds cached host skopeo capability detection and expands the condition that forces the privileged in-place container update to run when either the host updater is not new enough or host skopeo lacks multi-arch Sigstore support; when skopeo is old, a skopeo inspect --raw fallback checks manifest type.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Daemon control flow: pkg/daemon/daemon.go | Broadened condition to force the privileged in-place update when !newEnough OR skopeoSupportsMultiArchSigstore(mc.Spec.OSImageURL) is false; updated log message to reference both rpm-ostree and skopeo. |
| Skopeo capability detection & fallback: pkg/daemon/update.go | Added cached host probing for skopeo --version (semver >= 1.22.2), skopeoSupportsMultiArchSigstore(imageURL) helper, and a timeout-protected skopeo inspect --raw docker://<image> fallback that treats manifest-list/OCI index media types as multi-arch. Failures default to “unsupported.” |
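
As an illustration of the version gate summarized above, here is a minimal sketch that parses output shaped like the `skopeo version 1.22.0 commit: ...` log lines in this thread and compares it against 1.22.2. The diff in this PR uses `github.com/coreos/go-semver/semver`; this sketch swaps in a hand-rolled three-part comparison so it stays standard-library only. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSkopeoVersion extracts major, minor, patch from output shaped like
// "skopeo version 1.22.0 commit: <sha>".
func parseSkopeoVersion(output string) ([3]int, error) {
	var v [3]int
	fields := strings.Fields(output)
	if len(fields) < 3 || fields[0] != "skopeo" || fields[1] != "version" {
		return v, fmt.Errorf("unexpected skopeo version output: %q", output)
	}
	parts := strings.Split(fields[2], ".")
	if len(parts) != 3 {
		return v, fmt.Errorf("expected major.minor.patch, got %q", fields[2])
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return v, fmt.Errorf("bad version component %q: %w", p, err)
		}
		v[i] = n
	}
	return v, nil
}

// atLeast reports whether v >= floor, comparing component by component.
func atLeast(v, floor [3]int) bool {
	for i := range v {
		if v[i] != floor[i] {
			return v[i] > floor[i]
		}
	}
	return true
}

// supportsMultiArchSigstore applies the 1.22.2 minimum named in this PR.
// Any parse failure is reported as unsupported.
func supportsMultiArchSigstore(output string) bool {
	v, err := parseSkopeoVersion(output)
	if err != nil {
		return false
	}
	return atLeast(v, [3]int{1, 22, 2})
}

func main() {
	fmt.Println(supportsMultiArchSigstore("skopeo version 1.22.0 commit: 1a41cccfa63d5bd923b57e85986041d7283e720c")) // false
	fmt.Println(supportsMultiArchSigstore("skopeo version 1.22.2 commit: 4db1c56e8003ad07759a4d899254643d445dbe04")) // true
}
```

In this sketch a suffixed version string such as `1.22.2-rc1` fails to parse and is therefore reported as unsupported; that is a simplification chosen here, not necessarily what the PR does.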

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Daemon
  participant Host as Host (skopeo/rpm-ostree)
  participant Registry as Image Registry
  participant Updater as In-place Updater

  Daemon->>Host: Check NodeUpdaterClient.IsNewEnoughForLayering()
  alt host not new enough
    Daemon->>Updater: Force privileged in-place update
    Updater-->>Daemon: Start update and schedule reboot
  else host new enough
    Daemon->>Host: skopeoSupportsMultiArchSigstore(imageURL)?
    alt skopeo >= 1.22.2
      Host-->>Daemon: supported
      Daemon->>Updater: Proceed with layering in-place update
    else skopeo older
      Daemon->>Registry: skopeo inspect --raw docker://<image> (timeout)
      Registry-->>Daemon: manifest JSON (mediaType)
      alt mediaType is manifest-list or oci/index
        Daemon-->>Host: treat as multi-arch (support)
        Daemon->>Updater: Proceed with layering in-place update
      else
        Daemon->>Updater: Force privileged in-place update
        Updater-->>Daemon: Start update and schedule reboot
      end
    end
  end
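
The manifest-type check in the diagram can be sketched as follows. This is an assumption-laden illustration: `isMultiArchManifest` and `inspectRaw` are hypothetical helpers, the two media-type strings are the standard Docker manifest-list and OCI image-index types, and the sketch only classifies the manifest without deciding what the daemon does with the answer.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"os/exec"
	"time"
)

const (
	dockerManifestList = "application/vnd.docker.distribution.manifest.list.v2+json"
	ociImageIndex      = "application/vnd.oci.image.index.v1+json"
)

// isMultiArchManifest reports whether raw manifest JSON declares a
// manifest-list or OCI-index media type. A manifest that omits mediaType
// is classified as single-arch by this sketch.
func isMultiArchManifest(raw []byte) (bool, error) {
	var m struct {
		MediaType string `json:"mediaType"`
	}
	if err := json.Unmarshal(raw, &m); err != nil {
		return false, fmt.Errorf("parsing manifest: %w", err)
	}
	return m.MediaType == dockerManifestList || m.MediaType == ociImageIndex, nil
}

// inspectRaw runs `skopeo inspect --raw docker://<image>` and gives up
// once the timeout expires. It is not exercised in main because it needs
// a skopeo binary and registry access.
func inspectRaw(image string, timeout time.Duration) ([]byte, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return exec.CommandContext(ctx, "skopeo", "inspect", "--raw", "docker://"+image).Output()
}

func main() {
	index := []byte(`{"schemaVersion":2,"mediaType":"application/vnd.oci.image.index.v1+json","manifests":[]}`)
	single := []byte(`{"schemaVersion":2,"mediaType":"application/vnd.oci.image.manifest.v1+json"}`)
	fmt.Println(isMultiArchManifest(index))
	fmt.Println(isMultiArchManifest(single))
}
```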

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 11 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 60.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (11 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title specifically references the bug ID and the core technical change (deploy-from-self when skopeo < 1.22.2), directly corresponding to the main control-flow modification in daemon.go that adds multi-arch Sigstore support detection. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | No test files were modified in this PR; only source code files (pkg/daemon/daemon.go and pkg/daemon/update.go) were changed, so dynamic test name requirement does not apply. |
| Test Structure And Quality | ✅ Passed | PR contains only production code changes to daemon.go and update.go with no test code modifications. |
| Microshift Test Compatibility | ✅ Passed | This PR modifies daemon logic files without adding any new Ginkgo e2e tests. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | PR modifies operational code only; no new Ginkgo e2e tests are being added, making this check not applicable. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | PR contains only Go source code changes to daemon process logic with no deployment manifests, scheduling constraints, or operator controller modifications. |
| Ote Binary Stdout Contract | ✅ Passed | PR modifies daemon library code invoked from machine-config-daemon process, not OTE test extension binary. New skopeo version checking functions execute at runtime within daemon process, capturing command output via CombinedOutput() and logging via klog to stderr. OTE test extension is separate binary that does not invoke these daemon functions. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | This PR modifies production code in pkg/daemon/ without adding Ginkgo e2e tests, making this IPv6/disconnected network e2e test check not applicable. |


@QiWang19 QiWang19 changed the title from "deploy-from-self when skopeo < 1.22.2" to "OCPBUGS-83826: deploy-from-self when skopeo < 1.22.2" Apr 21, 2026
@openshift-ci-robot openshift-ci-robot added labels Apr 21, 2026: jira/severity-important (referenced Jira bug's severity is important for the branch this PR is targeting), jira/valid-reference (indicates that this PR references a valid Jira ticket of any type), jira/valid-bug (indicates that a referenced Jira bug is valid for the branch this PR is targeting)
@openshift-ci-robot
Contributor

@QiWang19: This pull request references Jira Issue OCPBUGS-83826, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

uses deploy-from-self when skopeo is actually older than 1.22.2 that has issue
with multi-arch sigstore verification

- What I did

- How to verify it

- Description for the changelog

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@QiWang19
Member Author

/test all

@openshift-ci-robot
Contributor

@QiWang19: This pull request references Jira Issue OCPBUGS-83826, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)
Details

In response to this:

uses deploy-from-self when skopeo is actually older than 1.22.2 that has issue
with multi-arch sigstore verification

- What I did

- How to verify it

- Description for the changelog

Summary by CodeRabbit

  • Bug Fixes
  • Enhanced system update logic to better handle version compatibility for rpm-ostree and skopeo, improving reliability of container-based updates by checking additional version requirements.


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/daemon/update.go`:
- Around line 2972-2983: The code currently strips everything after the first
'-' from versionStr which converts prerelease versions like "1.22.2-rc1" into
the stable "1.22.2"; stop removing the '-' portion so semver.NewVersion parses
full prerelease identifiers (remove the dashIdx trimming), then use
semver.NewVersion(versionStr) and compare against
minSkopeoVersionForMultiArchSigstore via
skopeoVersion.Compare(*semver.New(minSkopeoVersionForMultiArchSigstore)) to set
skopeoMultiArchSigstoreSupportedValue; if parsing fails for distro
package-release suffixes (e.g., "1.22.2-3.el9"), attempt a conservative
secondary parse by stripping only a numeric package-release pattern (e.g.,
"-<digits>(..*)?$") before calling semver.NewVersion so you still reject true
prerelease identifiers while tolerating distro release metadata referenced by
updateLayeredOS and RunFirstbootCompleteMachineconfig.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: ea8b93d1-1bbf-4dda-8529-764ff5ad88b2

📥 Commits

Reviewing files that changed from the base of the PR and between c3a9db7 and 485b74b.

📒 Files selected for processing (2)
  • pkg/daemon/daemon.go
  • pkg/daemon/update.go

Comment thread pkg/daemon/update.go Outdated
@QiWang19 QiWang19 force-pushed the ostree-in-container branch from 485b74b to e9c2dce Compare April 21, 2026 16:19
@QiWang19
Member Author

/test all

@QiWang19 QiWang19 force-pushed the ostree-in-container branch from e9c2dce to b5cf5df Compare April 21, 2026 18:45
@QiWang19
Member Author

/test all

@QiWang19 QiWang19 marked this pull request as ready for review April 21, 2026 18:48
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) Apr 21, 2026
@coderabbitai coderabbitai Bot left a comment

♻️ Duplicate comments (1)
pkg/daemon/update.go (1)

2973-2985: ⚠️ Potential issue | 🟠 Major

Don't normalize prerelease skopeo builds to stable.

Line 2975 strips everything after the first -, so 1.22.2-rc1 / 1.22.2-dev become 1.22.2 and incorrectly report support. That can skip the container fallback here and in pkg/daemon/daemon.go:1241, reintroducing the multi-arch Sigstore failure on prerelease builds. Please distinguish true prereleases from distro package-release suffixes instead of truncating all suffixes.

Suggested fix
-		// strip off any potential suffix, we only need to compare the major version
-		versionStr := fields[2]
-		if dashIdx := strings.Index(versionStr, "-"); dashIdx != -1 {
-			versionStr = versionStr[:dashIdx]
-		}
-		skopeoVersion, err := semver.NewVersion(versionStr)
+		versionStr := fields[2]
+		baseVersion := versionStr
+		prereleaseLike := false
+		if dashIdx := strings.Index(versionStr, "-"); dashIdx != -1 {
+			baseVersion = versionStr[:dashIdx]
+			suffix := strings.ToLower(versionStr[dashIdx+1:])
+			prereleaseLike = strings.HasPrefix(suffix, "dev") ||
+				strings.HasPrefix(suffix, "rc") ||
+				strings.HasPrefix(suffix, "alpha") ||
+				strings.HasPrefix(suffix, "beta")
+		}
+		skopeoVersion, err := semver.NewVersion(baseVersion)
 		if err != nil {
-			klog.Errorf("failed to parse skopeo version %s: %v", versionStr, err)
+			klog.Errorf("failed to parse skopeo version %s: %v", baseVersion, err)
 			skopeoMultiArchSigstoreSupportedValue = false
 			return
 		}
 		minSkopeoVersionForMultiArchSigstore := "1.22.2"
-		skopeoMultiArchSigstoreSupportedValue = skopeoVersion.Compare(*semver.New(minSkopeoVersionForMultiArchSigstore)) >= 0
+		cmp := skopeoVersion.Compare(*semver.New(minSkopeoVersionForMultiArchSigstore))
+		skopeoMultiArchSigstoreSupportedValue = cmp > 0 || (cmp == 0 && !prereleaseLike)
For github.com/coreos/go-semver/semver, how are `1.22.2-rc1`, `1.22.2-dev`, and `1.22.2-3.el9` compared against `1.22.2`? Specifically, should prerelease builds be treated as older than `1.22.2` while distro package-release suffixes are ignored for a minimum-version gate?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/daemon/update.go` around lines 2973 - 2985, The code currently strips any
suffix after the first '-' which turns prerelease builds like 1.22.2-rc1 into
1.22.2; instead, change the logic in pkg/daemon/update.go around
versionStr/skopeoVersion so that you only remove a trailing package-release
suffix (e.g. a pure numeric package iteration like "-3" or common distro
patterns such as "-3.el9" / "-1.fc34") but do not remove true semver prerelease
identifiers (like "-rc1" or "-dev"); implement this by examining the substring
after the first '-' and only truncating when it matches a package-release regex
(digits with optional dot/alpha segments typical of distro releases), otherwise
keep the suffix and pass the full string to semver.NewVersion so prereleases are
parsed and compared properly against minSkopeoVersionForMultiArchSigstore when
setting skopeoMultiArchSigstoreSupportedValue.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@pkg/daemon/update.go`:
- Around line 2973-2985: The code currently strips any suffix after the first
'-' which turns prerelease builds like 1.22.2-rc1 into 1.22.2; instead, change
the logic in pkg/daemon/update.go around versionStr/skopeoVersion so that you
only remove a trailing package-release suffix (e.g. a pure numeric package
iteration like "-3" or common distro patterns such as "-3.el9" / "-1.fc34") but
do not remove true semver prerelease identifiers (like "-rc1" or "-dev");
implement this by examining the substring after the first '-' and only
truncating when it matches a package-release regex (digits with optional
dot/alpha segments typical of distro releases), otherwise keep the suffix and
pass the full string to semver.NewVersion so prereleases are parsed and compared
properly against minSkopeoVersionForMultiArchSigstore when setting
skopeoMultiArchSigstoreSupportedValue.
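
The suffix handling the review asks for could look roughly like this. It is a sketch of one possible reading, not the PR's code: `normalizeVersion` and the `packageRelease` pattern are hypothetical, and the pattern is only a guess at what "distro package release" suffixes look like (digits first, then optional dot-separated segments).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// packageRelease matches suffixes that start with digits and may continue
// with dot-separated segments, for example "3", "3.el9", or "1.fc34".
var packageRelease = regexp.MustCompile(`^[0-9]+(\.[0-9A-Za-z_]+)*$`)

// normalizeVersion drops a distro package-release suffix but keeps
// anything else after the first '-', such as "rc1" or "dev", so a semver
// parser can still see it as a prerelease.
func normalizeVersion(v string) string {
	base, suffix, found := strings.Cut(v, "-")
	if found && packageRelease.MatchString(suffix) {
		return base
	}
	return v
}

func main() {
	for _, v := range []string{"1.22.2", "1.22.2-3.el9", "1.22.2-1.el9_8", "1.22.2-rc1", "1.22.2-dev"} {
		fmt.Printf("%s -> %s\n", v, normalizeVersion(v))
	}
}
```

One caveat with this heuristic: a purely numeric suffix such as `-1` is also a valid semver prerelease identifier, so the two cases cannot be told apart from the string alone. The sketch sides with the package-release interpretation.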

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: c107fb66-a7aa-4a02-9e1c-6cb595b07e58

📥 Commits

Reviewing files that changed from the base of the PR and between 485b74b and b5cf5df.

📒 Files selected for processing (2)
  • pkg/daemon/daemon.go
  • pkg/daemon/update.go
✅ Files skipped from review due to trivial changes (1)
  • pkg/daemon/daemon.go

@QiWang19
Member Author

@yuqi-zhang could you review?

@wking
Member

wking commented Apr 23, 2026

Not sure about the ovn "Waiting for status to be reported - Waiting for pipeline condition to trigger this job" status:

[screenshot]

Maybe this will wiggle Prow/Deck loose:

/retest-required

@wking
Member

wking commented Apr 23, 2026

/test e2e-aws-ovn

@wking
Member

wking commented Apr 23, 2026

ah, the /test ... seems to have done it, moving us from "Waiting for pipeline condition to trigger..." to "Job triggered":

[screenshot]

Do the rest:

/test e2e-aws-ovn-upgrade
/test e2e-gcp-op-part1
/test e2e-gcp-op-part2
/test e2e-gcp-op-single-node
/test e2e-hypershift

@QiWang19
Member Author

/verified later @QiWang19

The failure case of multi-arch will be verified post-merge.

Verified that the older skopeo version falls back to the in-place update case on a non-multi-arch cluster.
must-gather log:

/registry-build10-ci-openshift-org-ci-ln-2dmblvb-stable-sha256-ed400db8213c7f0e23d62efa30436f976cb6b4434ba8cbd42b1696bdccd1f6ad/host_service_logs/masters/machine-config-daemon-firstboot_service.log

423 14:40:49.877077    1912 update.go:2965] skopeo version output: skopeo version 1.22.0 commit: 1a41cccfa63d5bd923b57e85986041d7283e720c

423 14:40:49.877116    1912 update.go:2994] "rpm-ostree or skopeo is not new enough for new-format image; forcing an update via container and queuing immediate reboot"

@openshift-ci-robot openshift-ci-robot added the verified-later and verified (signifies that the PR passed pre-merge verification criteria) labels Apr 23, 2026
@openshift-ci-robot
Contributor

@QiWang19: This PR has been marked to be verified later by @QiWang19.


@QiWang19
Member Author

/test e2e-aws-ovn

@wking
Member

wking commented Apr 23, 2026

e2e-aws-ovn seems unrelated to this pull:

(microdnf:39): libdnf-WARNING **: 17:06:46.628: Skipping refresh of rhel-9-codeready-builder-rpms-aarch64: cannot update repo 'rhel-9-codeready-builder-rpms-aarch64': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried; Last error: Status code: 403 for http://base-4-22-rhel9.ocp.svc/rhel-9-codeready-builder-rpms-aarch64/repodata/repomd.xml (IP: 172.30.25.0)
error: No package matches 'nmstate >= 2.2.10'
error: build error: building at STEP "RUN if [ "${TAGS}" = "fcos" ]; then     sed -i '/- name: rhel-coreos-/,+3 s/^/#/' /manifests/image-references &&     sed -i '/- name: rhel-coreos-10/,+3 s/^/#/' /manifests/image-references &&     sed -i '/baseOSExtensionsContainerImage:/ s/^/#/' /manifests/0000_80_machine-config_05_osimageurl.yaml &&     sed -i 's/rhel-coreos/fedora-coreos/g' /manifests/*;     elif [ "${TAGS}" = "scos" ]; then     sed -i '/- name: rhel-coreos-10/,+3 s/^/#/' /manifests/* &&     sed -i 's/rhel-coreos/stream-coreos/g' /manifests/*; fi &&     dnf -y install 'nmstate >= 2.2.10' &&     if ! rpm -q util-linux; then dnf install -y util-linux; fi &&     if ! rpm -q buildah; then dnf install -y buildah fuse-overlayfs cpp --exclude container-selinux; fi &&     if ! id -u "build" >/dev/null 2>&1; then useradd --uid 1000 build; fi &&     dnf clean all &&     rm -rf /var/cache/dnf/*": while running runtime: exit status 1

I've reported this internally, but am not yet sure what's going on there.

@wking
Member

wking commented Apr 23, 2026

Still working on the No package matches 'nmstate >= 2.2.10' issue. Check with fresh runs:

/test images
/test ci/prow/e2e-aws-ovn

Contributor

@yuqi-zhang yuqi-zhang left a comment

Main concerns are inline, about the number of affected clusters that would need double reboots.

> Not sure about the ovn "Waiting for status to be reported - Waiting for pipeline condition to trigger this job" status:

That's a general requirement across openshift to move to pipeline stages. We were just an early adopter. Unit tests will always run, but e2es would only run if lgtm has been applied. Of course you can still trigger them normally as you did @wking

> Still working on the No package matches 'nmstate >= 2.2.10' issue. Check with fresh runs:

I think that should be good now via openshift/release#78290

Comment thread pkg/daemon/update.go Outdated
Comment thread pkg/daemon/update.go Outdated
return
}

// strip off any potential suffix, we only need to compare the major version
Contributor

I think this strips it to e.g. 1.21.0? which isn't just the major version (unless I'm mis-interpreting something here in the logic)

Member Author

I think we can drop the suffix-stripping code, as the official skopeo repo does not release versions with a dash suffix.

Comment thread pkg/daemon/update.go Outdated
logSystem("rpm-ostree is not new enough for layering; forcing an update via container")
// If Skopeo used by rpm-ostree is an older version with issue supporting multi-arch Sigstore verification, run as a privileged container which has updated skopeo.
// See https://redhat.atlassian.net/browse/OCPBUGS-83826 and https://redhat.atlassian.net/browse/OCPBUGS-81187
if !newEnough || !skopeoSupportsMultiArchSigstore() {
Contributor

I'm wondering if this is actually needed here. updateLayeredOS should just be doing OS updates in-cluster when we do an upgrade, which generally should have a new enough rpm-ostree (actually, in which release did we introduce the updated skopeo version? It looks like it's in 4.22 only? Should we try to backport that so older releases upgrading to this would already have the right rpm-ostree/skopeo?).

Basically, I think we should try to minimize the times we do a deploy-from-self operation to only when absolutely necessary. ( also see comment in the bootstrap case)

Side note: I think the comment above, "This currently will incur a double reboot", is actually incorrect for the in-cluster case because we don't explicitly reboot here; we reboot just once as part of the normal update flow. So maybe it's fine to leave it here, now that I think about it.

Member Author

> Should we try to backport that so older releases upgrading to this would already have the right rpm-ostree/skopeo

The skopeo fix is included in 1.22.2; OCP 4.21 ships skopeo 1.22.0. I think we can backport the fix to skopeo and cut an upgrade edge for the OCP version that picks up the backport. We can clean up this workaround after the upgrade edge is added. I think we need this workaround given the 4.22 release timeline?

Member Author

Reran the above reproducer, upgrading from 4.21.12 (skopeo 1.22.0) multi-arch -> 4.22.0-rc.0 (skopeo 1.22.2) multi-arch; the cluster upgraded successfully. https://prow.ci.openshift.org/view/gs/test-platform-results/logs/release-openshift-origin-installer-launch-gcp-modern/2049222498062438400

Comment thread pkg/daemon/daemon.go Outdated
logSystem("rpm-ostree is not new enough for new-format image; forcing an update via container and queuing immediate reboot")
// If Skopeo used by rpm-ostree is an older version with issue supporting multi-arch Sigstore verification, run as a privileged container which has updated skopeo.
// See https://redhat.atlassian.net/browse/OCPBUGS-83826 and https://redhat.atlassian.net/browse/OCPBUGS-81187
if !newEnough || !skopeoSupportsMultiArchSigstore() {
Contributor

I'm wondering if we can only target cases where this would be broken, i.e. multi-arch only? Unlike the in-cluster case this actually does incur a double reboot which increases node join time, which is not great in the general case if it can be avoided. If only 4.22 has the right version this means that all clusters not having bootimage updates on by default would have to double reboot to join nodes, which seems excessive.
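
One way to read the "multi-arch only" suggestion is as a narrower predicate for the firstboot path. The sketch below is hypothetical: `needsContainerUpdate` and its parameter names are invented for illustration, and it only encodes the reviewer's proposal that an old skopeo should force the container path when the image is actually multi-arch.

```go
package main

import "fmt"

// needsContainerUpdate is a hypothetical predicate for the reviewer's
// proposal: always fall back when rpm-ostree is too old, but let an old
// skopeo force the fallback only for multi-arch images.
func needsContainerUpdate(rpmOstreeNewEnough, skopeoFixed, imageIsMultiArch bool) bool {
	if !rpmOstreeNewEnough {
		return true
	}
	return imageIsMultiArch && !skopeoFixed
}

func main() {
	// Old skopeo with a single-arch image: no extra reboot.
	fmt.Println(needsContainerUpdate(true, false, false)) // false
	// Old skopeo with a multi-arch image: container path.
	fmt.Println(needsContainerUpdate(true, false, true)) // true
}
```

Compared with `!newEnough || !skopeoSupportsMultiArchSigstore()`, this form would spare single-arch clusters the double reboot described in the comment above.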

@wking
Member

wking commented Apr 24, 2026

/test images

@wking
Member

wking commented Apr 24, 2026

/test e2e-aws-ovn

@QiWang19 QiWang19 force-pushed the ostree-in-container branch from b5cf5df to 34856ed Compare April 24, 2026 16:40
@openshift-ci-robot openshift-ci-robot removed the verified (signifies that the PR passed pre-merge verification criteria) and verified-later labels Apr 24, 2026
@openshift-ci-robot
Contributor

@QiWang19: This pull request references Jira Issue OCPBUGS-83826, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Summary by CodeRabbit

  • Bug Fixes
  • Improved update logic to trigger the container fallback when either the host tooling is too old or the image tool lacks multi-arch Sigstore verification. Adds runtime detection and logging for the image tool’s capability and updates messages to reference either rpm-ostree or the image tool, improving reliability and observability of container-based updates.


@QiWang19
Member Author

/test e2e-aws-ovn

@wking
Member

wking commented Apr 24, 2026

It's stale with the recent push, but the previous commit's e2e-aws-ovn run had gather-extra node logs like:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_machine-config-operator/5867/pull-ci-openshift-machine-config-operator-main-e2e-aws-ovn/2047508480398462976/artifacts/e2e-aws-ovn/gather-extra/artifacts/nodes/ip-10-0-0-77.us-west-2.compute.internal/journal | zgrep -1 skopeo
Apr 24 03:41:54.106135 ip-10-0-0-77 silly_poincare[1914]: I0424 03:41:54.106103    1916 command_runner.go:24] Running captured: rpm-ostree --version
Apr 24 03:41:55.788183 ip-10-0-0-77 silly_poincare[1914]: I0424 03:41:55.788109    1916 update.go:2965] skopeo version output: skopeo version 1.22.0 commit: 1a41cccfa63d5bd923b57e85986041d7283e720c
Apr 24 03:41:55.788197 ip-10-0-0-77 silly_poincare[1914]: I0424 03:41:55.788140    1916 update.go:2994] "rpm-ostree or skopeo is not new enough for new-format image; forcing an update via container and queuing immediate reboot"
Apr 24 03:41:55.788836 ip-10-0-0-77 podman[1902]: I0424 03:41:55.788109    1916 update.go:2965] skopeo version output: skopeo version 1.22.0 commit: 1a41cccfa63d5bd923b57e85986041d7283e720c
Apr 24 03:41:55.788836 ip-10-0-0-77 podman[1902]: I0424 03:41:55.788140    1916 update.go:2994] "rpm-ostree or skopeo is not new enough for new-format image; forcing an update via container and queuing immediate reboot"
Apr 24 03:41:55.827449 ip-10-0-0-77 root[2019]: machine-config-daemon[1916]: "rpm-ostree or skopeo is not new enough for new-format image; forcing an update via container and queuing immediate reboot"
Apr 24 03:41:55.827816 ip-10-0-0-77 silly_poincare[1914]: I0424 03:41:55.827776    1916 update.go:2907] Running: setenforce 0
--
Apr 24 03:43:29.462719 ip-10-0-0-77 podman[3168]:   runc 4:1.4.0-2.el9 -> 4:1.4.2-1.el9_8
Apr 24 03:43:29.462719 ip-10-0-0-77 podman[3168]:   skopeo 2:1.22.0-2.el9 -> 2:1.22.2-1.el9_8
Apr 24 03:43:29.462719 ip-10-0-0-77 podman[3168]:   sssd 2.9.8-1.el9 -> 2.9.8-4.el9_8
--
Apr 24 03:43:29.462280 ip-10-0-0-77 vigorous_kilby[3189]:   runc 4:1.4.0-2.el9 -> 4:1.4.2-1.el9_8
Apr 24 03:43:29.462286 ip-10-0-0-77 vigorous_kilby[3189]:   skopeo 2:1.22.0-2.el9 -> 2:1.22.2-1.el9_8
Apr 24 03:43:29.462292 ip-10-0-0-77 vigorous_kilby[3189]:   sssd 2.9.8-1.el9 -> 2.9.8-4.el9_8
--
Apr 24 03:44:14.918722 ip-10-0-0-77 podman[1239]: I0424 03:44:14.918628    1253 command_runner.go:24] Running captured: rpm-ostree --version
Apr 24 03:44:15.078762 ip-10-0-0-77 great_dubinsky[1251]: I0424 03:44:15.078681    1253 update.go:2965] skopeo version output: skopeo version 1.22.2 commit: 4db1c56e8003ad07759a4d899254643d445dbe04
Apr 24 03:44:15.078776 ip-10-0-0-77 great_dubinsky[1251]: I0424 03:44:15.078712    1253 daemon.go:1259] rpm-ostree has container feature
Apr 24 03:44:15.079021 ip-10-0-0-77 podman[1239]: I0424 03:44:15.078681    1253 update.go:2965] skopeo version output: skopeo version 1.22.2 commit: 4db1c56e8003ad07759a4d899254643d445dbe04
Apr 24 03:44:15.079021 ip-10-0-0-77 podman[1239]: I0424 03:44:15.078712    1253 daemon.go:1259] rpm-ostree has container feature

which looks good to me (other than the setenforce 0 bit, but that's orthogonal to this pull; git blame ... says it was 0451275 with a 2022 "but for now..." 😅).
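
For readers following the log lines above, here is a minimal Go sketch of how a version gate like this could work: parse the `skopeo version X.Y.Z` string and compare it against 1.22.2. The helper names `parseSkopeoVersion` and `olderThan` and the regular expression are assumptions made for illustration; the actual code in pkg/daemon/update.go is not shown in this thread and may differ.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// versionRe pulls "X.Y.Z" out of output such as
// "skopeo version 1.22.0 commit: 1a41ccc...".
var versionRe = regexp.MustCompile(`skopeo version (\d+)\.(\d+)\.(\d+)`)

// parseSkopeoVersion is a hypothetical helper (not the MCO's function)
// that returns the major, minor, and patch numbers from the version output.
func parseSkopeoVersion(out string) ([3]int, error) {
	var v [3]int
	m := versionRe.FindStringSubmatch(out)
	if m == nil {
		return v, fmt.Errorf("unrecognized skopeo version output: %q", out)
	}
	for i := range v {
		n, err := strconv.Atoi(m[i+1])
		if err != nil {
			return v, err
		}
		v[i] = n
	}
	return v, nil
}

// olderThan reports whether a < b, comparing major, minor, patch in order.
func olderThan(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

func main() {
	minVersion := [3]int{1, 22, 2}
	outputs := []string{
		"skopeo version 1.22.0 commit: 1a41cccfa63d5bd923b57e85986041d7283e720c",
		"skopeo version 1.22.2 commit: 4db1c56e8003ad07759a4d899254643d445dbe04",
	}
	for _, out := range outputs {
		v, err := parseSkopeoVersion(out)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%d.%d.%d older than 1.22.2: %t\n", v[0], v[1], v[2], olderThan(v, minVersion))
	}
}
```

With the two version strings from the journal, the sketch reports 1.22.0 as older than the threshold and 1.22.2 as not older, which matches the first boot taking the container path and the second boot proceeding normally.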

uses deploy-from-self when skopeo is actually older than 1.22.2 that has issue
 with multi-arch sigstore verification

Signed-off-by: Qi Wang <qiwan@redhat.com>
@QiWang19 QiWang19 force-pushed the ostree-in-container branch from 34856ed to 30e3e49 Compare April 24, 2026 20:54
@QiWang19
Member Author

https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_machine-config-operator/5867/pull-ci-openshift-machine-config-operator-main-e2e-aws-ovn/2047728019010752512

Apr 24 17:52:39.564931 ip-10-0-23-19 podman[1890]: I0424 17:52:39.564664    1904 update.go:2960] skopeo version output: skopeo version 1.22.0 commit: 1a41cccfa63d5bd923b57e85986041d7283e720c
Apr 24 17:52:40.149103 ip-10-0-23-19 stoic_swartz[1902]: W0424 17:52:40.149027    1904 update.go:2986] Failed to determine if image quay-proxy.ci.openshift.org/openshift/ci@sha256:d141f8d4fc762f855715d4676e9c46738ea498b8f983bc102bb9b805554ef008 is multi-arch: skopeo inspect failed for quay-proxy.ci.openshift.org/openshift/ci@sha256:d141f8d4fc762f855715d4676e9c46738ea498b8f983bc102bb9b805554ef008: exit status 1
Apr 24 17:52:40.149118 ip-10-0-23-19 stoic_swartz[1902]: I0424 17:52:40.149054    1904 update.go:3016] "rpm-ostree or skopeo is not new enough for new-format image; forcing an update via container and queuing immediate reboot"
Apr 24 17:52:40.149907 ip-10-0-23-19 podman[1890]: W0424 17:52:40.149027    1904 update.go:2986] Failed to determine if image quay-proxy.ci.openshift.org/openshift/ci@sha256:d141f8d4fc762f855715d4676e9c46738ea498b8f983bc102bb9b805554ef008 is multi-arch: skopeo inspect failed for quay-proxy.ci.openshift.org/openshift/ci@sha256:d141f8d4fc762f855715d4676e9c46738ea498b8f983bc102bb9b805554ef008: exit status 1
Apr 24 17:52:40.149907 ip-10-0-23-19 podman[1890]: I0424 17:52:40.149054    1904 update.go:3016] "rpm-ostree or skopeo is not new enough for new-format image; forcing an update via container and queuing immediate reboot"
Apr 24 17:52:40.179155 ip-10-0-23-19 root[2013]: machine-config-daemon[1904]: "rpm-ostree or skopeo is not new enough for new-format image; forcing an update via container and queuing immediate reboot"
Apr 24 17:52:40.179545 ip-10-0-23-19 stoic_swartz[1902]: I0424 17:52:40.179502    1904 update.go:2904] Running: setenforce 0
--
Apr 24 17:53:53.055595 ip-10-0-23-19 podman[3165]:   samba-common-libs 4.23.5-6.el9 -> 4.23.5-8.el9_8
Apr 24 17:53:53.055595 ip-10-0-23-19 podman[3165]:   skopeo 2:1.22.0-2.el9 -> 2:1.22.2-1.el9_8
Apr 24 17:53:53.055595 ip-10-0-23-19 podman[3165]:   sssd 2.9.8-1.el9 -> 2.9.8-4.el9_8
--
Apr 24 17:53:53.055217 ip-10-0-23-19 lucid_hofstadter[3185]:   samba-common-libs 4.23.5-6.el9 -> 4.23.5-8.el9_8
Apr 24 17:53:53.055225 ip-10-0-23-19 lucid_hofstadter[3185]:   skopeo 2:1.22.0-2.el9 -> 2:1.22.2-1.el9_8
Apr 24 17:53:53.055233 ip-10-0-23-19 lucid_hofstadter[3185]:   sssd 2.9.8-1.el9 -> 2.9.8-4.el9_8

The multi-arch check failed, apparently because no authfile was passed to skopeo. Rebased.

/test e2e-aws-ovn

@QiWang19
Member Author

/verified later @QiWang19
e2e-aws-ovn

$ curl https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_machine-config-operator/5867/pull-ci-openshift-machine-config-operator-main-e2e-aws-ovn/2047782636453105664/artifacts/e2e-aws-ovn/gather-extra/artifacts/nodes/ip-10-0-13-29.us-west-1.compute.internal/journal | zgrep -1 skopeo

Apr 24 21:42:02.591952 ip-10-0-13-29 podman[1899]: I0424 21:42:02.591827    1913 command_runner.go:24] Running captured: rpm-ostree --version
Apr 24 21:42:04.170861 ip-10-0-13-29 hungry_hugle[1911]: I0424 21:42:04.170772    1913 update.go:2962] skopeo version output: skopeo version 1.22.0 commit: 1a41cccfa63d5bd923b57e85986041d7283e720c
Apr 24 21:42:04.171476 ip-10-0-13-29 podman[1899]: I0424 21:42:04.170772    1913 update.go:2962] skopeo version output: skopeo version 1.22.0 commit: 1a41cccfa63d5bd923b57e85986041d7283e720c
Apr 24 21:42:04.851471 ip-10-0-13-29 hungry_hugle[1911]: I0424 21:42:04.851137    1913 update.go:3015] Image quay-proxy.ci.openshift.org/openshift/ci@sha256:96aa746ecac7f52aaeb40c5dbc06438f4e7d1659b50ce15648bd9f9eca3abd51 mediaType: application/vnd.docker.distribution.manifest.v2+json, multi-arch: false
--
Apr 24 21:43:48.291493 ip-10-0-13-29 podman[1899]:   samba-common-libs 4.23.5-6.el9 -> 4.23.5-8.el9_8
Apr 24 21:43:48.291493 ip-10-0-13-29 podman[1899]:   skopeo 2:1.22.0-2.el9 -> 2:1.22.2-1.el9_8
Apr 24 21:43:48.291493 ip-10-0-13-29 podman[1899]:   sssd 2.9.8-1.el9 -> 2.9.8-4.el9_8
--
Apr 24 21:43:48.290953 ip-10-0-13-29 hungry_hugle[1911]:   samba-common-libs 4.23.5-6.el9 -> 4.23.5-8.el9_8
Apr 24 21:43:48.290959 ip-10-0-13-29 hungry_hugle[1911]:   skopeo 2:1.22.0-2.el9 -> 2:1.22.2-1.el9_8
Apr 24 21:43:48.290965 ip-10-0-13-29 hungry_hugle[1911]:   sssd 2.9.8-1.el9 -> 2.9.8-4.el9_8
--
Apr 24 21:43:53.561939 ip-10-0-13-29 podman[1899]:   samba-common-libs 4.23.5-6.el9 -> 4.23.5-8.el9_8
Apr 24 21:43:53.561939 ip-10-0-13-29 podman[1899]:   skopeo 2:1.22.0-2.el9 -> 2:1.22.2-1.el9_8
Apr 24 21:43:53.561939 ip-10-0-13-29 podman[1899]:   sssd 2.9.8-1.el9 -> 2.9.8-4.el9_8
--
Apr 24 21:43:53.561461 ip-10-0-13-29 hungry_hugle[1911]:   samba-common-libs 4.23.5-6.el9 -> 4.23.5-8.el9_8
Apr 24 21:43:53.561466 ip-10-0-13-29 hungry_hugle[1911]:   skopeo 2:1.22.0-2.el9 -> 2:1.22.2-1.el9_8
Apr 24 21:43:53.561472 ip-10-0-13-29 hungry_hugle[1911]:   sssd 2.9.8-1.el9 -> 2.9.8-4.el9_8

@openshift-ci-robot openshift-ci-robot added verified-later verified Signifies that the PR passed pre-merge verification criteria labels Apr 25, 2026
@openshift-ci-robot
Contributor

@QiWang19: This PR has been marked to be verified later by @QiWang19.

@QiWang19
Member Author

@yuqi-zhang PTAL

@QiWang19
Member Author

@wking PTAL

Contributor

@yuqi-zhang yuqi-zhang left a comment

Some more questions - the general idea looks better, thanks for making it conditional

Comment thread pkg/daemon/update.go
args = append(args, "--authfile", kubeletAuthFile)
}
args = append(args, "docker://"+imageURL)
cmd := exec.CommandContext(ctx, "skopeo", args...)
Contributor

Want to check: this uses the skopeo currently on disk, which is whatever version shipped in the bootimage (for example, whatever we shipped in 4.13, if the cluster is not on a managed bootimage path).

Do you expect that inspect to fail at all, either from problems similar to the one that caused the initial failure or from other changes in the meantime that would be incompatible with an old skopeo?

Member Author

skopeo inspect does not validate signatures the way skopeo copy does, so running inspect with an old skopeo here should not hit the initial failure.

Comment thread pkg/daemon/daemon.go
// This currently will incur a double reboot; see https://github.com/coreos/rpm-ostree/issues/4018
if !newEnough {
logSystem("rpm-ostree is not new enough for new-format image; forcing an update via container and queuing immediate reboot")
// If skopeo is < 1.22.2 on a multi-arch image, run as a privileged container which has updated skopeo.
Contributor

I forgot if I asked this: does rpm-ostree use the skopeo binary on disk directly, or does it use a vendored library or similar?

Member Author

https://redhat.atlassian.net/browse/OCPBUGS-81187?focusedCommentId=16667251 and https://redhat-internal.slack.com/archives/CB95J6R4N/p1775744756600979
The earlier diagnosis pointed to the skopeo experimental-image-proxy subcommand used by rpm-ostree, which indicates that rpm-ostree uses the skopeo on the host, but I don't know much about the related code.

Comment thread pkg/daemon/update.go
if err := json.Unmarshal(out, &manifest); err != nil {
return false, fmt.Errorf("failed to parse manifest for %s: %w", imageURL, err)
}
multiArch := manifest.MediaType == "application/vnd.docker.distribution.manifest.list.v2+json" ||
Contributor

Can you help me understand where these are defined and why these are guaranteed to be multi-arch?

Member Author

manifest.list in the mediaType means this is a manifest list. skopeo inspect --raw outputs the raw manifest; for a multi-arch image, that is a manifest list containing an entry for each architecture.
The mediaType of a single-arch image would be application/vnd.oci.image.manifest.v1+json (or application/vnd.docker.distribution.manifest.v2+json for a Docker schema 2 image). https://github.com/distribution/distribution/blob/v2.8.3/docs/spec/manifest-v2-2.md#manifest-list
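
To make the mediaType check concrete, here is a small Go sketch that classifies a raw manifest of the kind `skopeo inspect --raw` prints. The function name `isMultiArchManifest` is hypothetical, and treating the OCI image index type (`application/vnd.oci.image.index.v1+json`) as multi-arch is an assumption on my part: the quoted diff line is cut off after the Docker manifest list comparison, so the PR's second condition is not visible here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

const (
	// Docker schema 2 manifest list, as named in the quoted diff line.
	dockerManifestList = "application/vnd.docker.distribution.manifest.list.v2+json"
	// OCI image index. Assumption: included here as the OCI equivalent of a manifest list.
	ociImageIndex = "application/vnd.oci.image.index.v1+json"
)

// isMultiArchManifest is a hypothetical helper that inspects only the
// top-level mediaType field of a raw manifest document.
func isMultiArchManifest(raw []byte) (bool, error) {
	var manifest struct {
		MediaType string `json:"mediaType"`
	}
	if err := json.Unmarshal(raw, &manifest); err != nil {
		return false, fmt.Errorf("failed to parse manifest: %w", err)
	}
	return manifest.MediaType == dockerManifestList || manifest.MediaType == ociImageIndex, nil
}

func main() {
	samples := []string{
		`{"schemaVersion": 2, "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json", "manifests": []}`,
		`{"schemaVersion": 2, "mediaType": "application/vnd.docker.distribution.manifest.v2+json"}`,
	}
	for _, s := range samples {
		multiArch, err := isMultiArchManifest([]byte(s))
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("multi-arch: %t\n", multiArch)
	}
}
```

One caveat with this approach: a manifest that omits the top-level mediaType field would be classified as single-arch by this sketch, so a stricter check might also look for a `manifests` array.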

Comment thread pkg/daemon/update.go

multiArch, err := isMultiArchImage(imageURL)
if err != nil {
klog.Warningf("Failed to determine if image %s is multi-arch: %v", imageURL, err)
Contributor

Hmm, this would mean that if something went wrong we just blanket update skopeo and hope for the best (since this is early boot only, any logging doesn't actually help I think).

What real failure scenarios would we hit? And would a skopeo update + reboot help or make it worse?

Member Author

The logic is:
If the skopeo version check errors, only do the in-place deploy if the image is multi-arch.
If the multi-arch check errors, only do the in-place deploy if the skopeo version is < 1.22.2.
If both error, fall back to the in-place deploy.

A skopeo version check failure should be rare, possibly caused by a broken skopeo on the host, so the skopeo update would help since it uses the skopeo from the container.
The multi-arch check may fail on a network issue or registry authentication. In that case I think the in-place deploy may not help, since it would fail on the same network/authentication error. But falling back to the in-place deploy makes sure we do not fail on the original sigstore verification failure when we cannot determine the skopeo version or whether the image is multi-arch.

Contributor

@yuqi-zhang yuqi-zhang left a comment

Thanks for the clarifications - lgtm, if we can also validate through CI that'd be good.
/lgtm
/hold

To let the tests run, but also give @wking a chance for any final feedback

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 28, 2026
@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Apr 28, 2026
@openshift-merge-bot
Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aws-ovn
/test e2e-aws-ovn-upgrade
/test e2e-gcp-op-part1
/test e2e-gcp-op-part2
/test e2e-gcp-op-single-node
/test e2e-hypershift

@openshift-ci
Contributor

openshift-ci Bot commented Apr 28, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: QiWang19, yuqi-zhang

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 28, 2026
@QiWang19
Member Author

/retest-required

@openshift-ci
Contributor

openshift-ci Bot commented Apr 29, 2026

@QiWang19: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-gcp-op-ocl b5cf5df link false /test e2e-gcp-op-ocl
ci/prow/e2e-gcp-op-single-node 30e3e49 link true /test e2e-gcp-op-single-node
ci/prow/e2e-hypershift 30e3e49 link true /test e2e-hypershift
ci/prow/e2e-aws-ovn-upgrade 30e3e49 link true /test e2e-aws-ovn-upgrade

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. verified Signifies that the PR passed pre-merge verification criteria verified-later
