Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 4 additions & 2 deletions pkg/daemon/daemon.go
Original file line number Diff line number Diff line change
Expand Up @@ -1236,8 +1236,10 @@ func (dn *Daemon) RunFirstbootCompleteMachineconfig(machineConfigFile string) er
// If the host isn't new enough to understand the new container model natively, run as a privileged container.
// See https://github.com/coreos/rpm-ostree/pull/3961 and https://issues.redhat.com/browse/MCO-356
// This currently will incur a double reboot; see https://github.com/coreos/rpm-ostree/issues/4018
if !newEnough {
logSystem("rpm-ostree is not new enough for new-format image; forcing an update via container and queuing immediate reboot")
// If skopeo is < 1.22.2 on a multi-arch image, run as a privileged container which has updated skopeo.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot if I asked this - does the rpm-ostree directly use the skopeo binary on-disk or does it use some vendored library, etc. ?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://redhat.atlassian.net/browse/OCPBUGS-81187?focusedCommentId=16667251 and https://redhat-internal.slack.com/archives/CB95J6R4N/p1775744756600979
The previous diagnosis pointed out the skopeo experimental-image-proxy subcommand used by rpm-ostree, which indicates rpm-ostree uses the skopeo on the host, but I don't know much about the code related to it.

// See https://redhat.atlassian.net/browse/OCPBUGS-83826 and https://redhat.atlassian.net/browse/OCPBUGS-81187
if !newEnough || !skopeoSupportsMultiArchSigstore(mc.Spec.OSImageURL) {
logSystem("rpm-ostree or skopeo is not new enough for new-format image; forcing an update via container and queuing immediate reboot")
if err := dn.InplaceUpdateViaNewContainer(mc.Spec.OSImageURL); err != nil {
return err
}
Expand Down
73 changes: 72 additions & 1 deletion pkg/daemon/update.go
Original file line number Diff line number Diff line change
Expand Up @@ -2762,7 +2762,6 @@ func (dn *Daemon) updateLayeredOS(config *mcfgv1.MachineConfig) error {
}
// If the host isn't new enough to understand the new container model natively, run as a privileged container.
// See https://github.com/coreos/rpm-ostree/pull/3961 and https://issues.redhat.com/browse/MCO-356
// This currently will incur a double reboot; see https://github.com/coreos/rpm-ostree/issues/4018
if !newEnough {
logSystem("rpm-ostree is not new enough for layering; forcing an update via container")
return dn.InplaceUpdateViaNewContainer(newURL)
Expand Down Expand Up @@ -2943,6 +2942,78 @@ func podmanSupportsSigstore() bool {
return podmanSigstoreSupportedValue
}

var (
skopeoVersionChecked sync.Once
skopeoVersionNewEnough bool
)

// skopeoVersionSupportsMultiArchSigstore checks (once) whether the host skopeo is >= 1.22.2.
// Returns false on any version check or parse failure (OCPBUGS-81187).
func skopeoVersionSupportsMultiArchSigstore() bool {
skopeoVersionChecked.Do(func() {
cmd := exec.Command("skopeo", "--version")
out, err := cmd.CombinedOutput()
if err != nil {
klog.Errorf("failed to run skopeo --version: %v", err)
return
}
klog.Infof("skopeo version output: %s", strings.TrimSpace(string(out)))
// Output format: "skopeo version X.Y.Z commit: abcdefg"
fields := strings.Fields(strings.TrimSpace(string(out)))
if len(fields) < 3 {
klog.Errorf("unexpected skopeo version output format: %s", string(out))
return
}
skopeoVersion, err := semver.NewVersion(fields[2])
if err != nil {
klog.Errorf("failed to parse skopeo version %s: %v", fields[2], err)
return
}
minVersion := "1.22.2"
skopeoVersionNewEnough = skopeoVersion.Compare(*semver.New(minVersion)) >= 0
})
return skopeoVersionNewEnough
}

// skopeoSupportsMultiArchSigstore returns false when skopeo < 1.22.2 and the image is a multi-arch
func skopeoSupportsMultiArchSigstore(imageURL string) bool {
if skopeoVersionSupportsMultiArchSigstore() {
return true
}

multiArch, err := isMultiArchImage(imageURL)
if err != nil {
klog.Warningf("Failed to determine if image %s is multi-arch: %v", imageURL, err)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this would mean that if something went wrong we just blanket update skopeo and hope for the best (since this is early boot only, any logging doesn't actually help I think).

What real failure scenarios would we hit? And would a skopeo update + reboot help or make it worse?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the logic is:
If skopeo version check errors, only do inplace deploy if image is multi-arch
If multi-arch check errors, only do inplace deploy if skopeo version < 1.22.2
If both error, fallback to do inplace deploy.

skopeo version check failure should be rare, possiblely caused by the skopeo on the host is broken, so skopeo update would help since it uses the skopeo from the container.
mutli arch check may fail on network issue or registry authication, in this case I think in place deploy may not help, it would also fail on the same network/authentication error. But falling back to in place deploy make sure we do not fail on the original sigstore verification failure when we can not determine the skopeo version or if image is multi-arch.

return false
}
return !multiArch
}

func isMultiArchImage(imageURL string) (bool, error) {
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
args := []string{"inspect", "--raw"}
if _, err := os.Stat(kubeletAuthFile); err == nil {
args = append(args, "--authfile", kubeletAuthFile)
}
args = append(args, "docker://"+imageURL)
cmd := exec.CommandContext(ctx, "skopeo", args...)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Want to check: this is using the current skopeo on-disk, which is of whatever bootimage version is in the bootimage (let's say for example, whatever we shipped in 4.13, if the cluster is not in a managed bootimage path).

Do you expect that inspect to fail at all due to similar problems that caused the initial failure, or any other changes in the meantime that would be incompatible with an old skopeo?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

skopeo inspect does not validate the signature the way skopeo copy does, so old skopeo inspect here should not cause the initial failure.

out, err := cmd.CombinedOutput()
if err != nil {
return false, fmt.Errorf("skopeo inspect failed for %s: %w", imageURL, err)
}
var manifest struct {
MediaType string `json:"mediaType"`
}
if err := json.Unmarshal(out, &manifest); err != nil {
return false, fmt.Errorf("failed to parse manifest for %s: %w", imageURL, err)
}
multiArch := manifest.MediaType == "application/vnd.docker.distribution.manifest.list.v2+json" ||
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you help me understand where these are defined and why these are guaranteed to be multi-arch?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

manifest.list in the mediaType means this is a manifest list. skopeo inspect --raw output the manifest, if it is multi arch images, skopeo returns a manifest list containing entries for each architecture.
mediaType of single arch image would be application/vnd.oci.image.manifest.v1+json. https://github.com/distribution/distribution/blob/v2.8.3/docs/spec/manifest-v2-2.md#manifest-list

manifest.MediaType == "application/vnd.oci.image.index.v1+json"
klog.Infof("Image %s mediaType: %s, multi-arch: %v", imageURL, manifest.MediaType, multiArch)
return multiArch, nil
}

// Log a message to the systemd journal as well as our stdout
func logSystem(format string, a ...interface{}) {
message := fmt.Sprintf(format, a...)
Expand Down